The Command Interpreter Tool

This chapter describes the part of VDAT responsible for the reduction of the data.

Introduction

The main purpose of the VDAT GUI is to allow users to select, visualize and reduce VIRUS data. VDAT relies mostly on cure to execute the reduction steps. cure is a C++ library that provides a number of executables operating on a single FITS file or on groups of FITS files.

For each of the reduction steps, VDAT must collect (i.e. generate a list of) the input files and command line options according to the directories and IFUs selected by the user and run the appropriate cure tool.

Although cure is the main library of tools, some of the reduction steps are not implemented there. We also want to allow users to execute generic commands without any prior knowledge of the file names and command signatures.

We have solved those requirements by designing a command line tool based on these two building blocks:

  1. an interpreter that parses an input command string, containing placeholders, and executes the command in a loop replacing the placeholders with the correct values; we use the standard python string Template to define placeholders;
  2. one or more yaml configuration files to instruct the interpreter on how to expand the placeholders for any provided command. In this documentation we’ll refer to entries in the configuration as keys.
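The first building block can be illustrated with the standard library Template class; the option and file values below are made-up example values, not something VDAT ships with:

```python
from string import Template

# A command string with three placeholder fields: args, biassec, fits
tmpl = Template("subtractfits $args -o $biassec $fits")

# In VDAT the values come from the yaml configuration; here they are
# hard-coded for the example
cmd = tmpl.substitute(args="-s -a -k 2.8 -t -z",
                      biassec="2065:2132,1:1032",
                      fits="20180219T071318.8_073LL_sci.fits")
print(cmd)
```

The interpreter performs one such substitution for every iteration of the loop, with the placeholder values recomputed for each primary file.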

The interpreter

The public interface of the interpreter is defined by the constructor of the class CommandInterpreter and its method run().

Constructor

The constructor has the following signature

class vdat.command_interpreter.CommandInterpreter(command, command_config, selected=None, multiprocessing=False, processes=None)[source]

Interpret and execute the command.

See The interpreter section in the documentation for more details

All the custom errors are defined in vdat.command_interpreter.exceptions. The ones raised in the constructor derive from CIValidationError.

Parameters:
command : string

command to parse

command_config : dict

dictionary containing the instructions to execute the command. A deep copy of it is made

selected : list-like, optional

None or a list of the selected items; if None no filtering of the primary files is done; otherwise must be an object supporting the membership test operator in.

multiprocessing : bool

run the command using multiprocessing

processes : int

number of processors to use

Raises:
CINoExeError

if the executable does not exist

CIParseError

if there is some error when extracting the keywords

CIKeywordError

for missing keywords or for keywords of wrong type

CIKeywordTypeError

if the type of the keywords is not among the known ones

  1. command is a string with the command to execute. The command contains what we will refer to as fields that will be substituted. For example, the fields in the command string below are args, biassec and fits

    subtractfits $args -o $biassec $fits
    
  2. command_config: the relevant part of the parsed yaml configuration file containing the instructions on how to expand fields like args, biassec and fits while running the command subtractfits. The part of the configuration file necessary to run the above command is

    subtractfits:
        # mandatory fields
        mandatory: [fits, ]
    
        # primary key: the interpreter collects files according to
        # the instructions in the `fits` key, then loops over them,
        # replacing all the fields and executing the command
        primary: fits
    
        # looks for all the files matching the pattern in the `selected_dir`
        fits: '[0-9]*.fits'
    
        # Get the `BIASSECT` value from the header of every file
        # and from it extract the part within square brackets
        biassec:
            type: header
            keyword: BIASSEC
            extract:
                - \[(.*)\]
                - \1
    
        args: '-s -a -k 2.8 -t -z'
    

    The syntax for the expression given for the fits and extract keys will be explained later.
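The field names can be recovered from the command string with the same regular expression that string.Template uses internally. This is only a sketch of the idea, not necessarily how VDAT implements the extraction:

```python
from string import Template

def extract_fields(command):
    """Return the placeholder names found in a command string.

    Sketch only: it reuses the compiled regex that string.Template
    employs for substitutions ($name and ${name} forms).
    """
    names = []
    for match in Template.pattern.finditer(command):
        name = match.group("named") or match.group("braced")
        if name is not None and name not in names:
            names.append(name)
    return names

print(extract_fields("subtractfits $args -o $biassec $fits"))
```

The first word of the command, here subtractfits, is the name used to look up the relevant section in the configuration.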

    Both the GUI and the command line interface inject into the command_config the following keys:

    • target_dir: the directory selected by the user; in the above examples, the fits files are searched for in this directory
    • cal_dir: the reference calibration directory
    • zro_dir: the reference bias directory

    If no cal or zro directory has been explicitly selected in the GUI, the default ones are added.

    Warning

    If any of these entries is already present in the configuration file, it will be overwritten

  3. selected: list of selected items, or None to select all. It tells the interpreter which of the primary elements must be run. E.g. the VDAT GUI passes as selected the list of IFUs selected by the user. The instructions on how to extract, from the files, the information to match against selected are defined in the command configuration file.

    Note

    VDAT passes the IFU head mount plate IDs (ihmpid) to the command interpreter. This id is a 3 digit number stored in the file headers under the IFUSLOT key.


In the constructor the following steps are performed:

  1. the configuration object is copied and saved in local variables: this allows the user to enqueue multiple commands;
  2. validations:
    1. the command executable, e.g. subtractfits, is searched in the path to check if it exists
    2. check that all the mandatory fields are present in the command (see the mandatory key in the command_config above)
    3. check that all the required keys are present in the configuration
    4. check that all the required keys are of known type
    5. map all the types to the functions implementing them
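A minimal sketch of the first two validation steps follows; the real constructor raises CINoExeError and CIKeywordError, while plain ValueError is used here for illustration:

```python
import shutil
from string import Template

def validate(command, command_config):
    """Sketch of the constructor checks: executable lookup and
    mandatory-field verification."""
    exe = command.split()[0]
    # resolve aliases before looking up the executable on the PATH
    exe = command_config.get("is_alias_of", exe)
    if shutil.which(exe) is None:
        raise ValueError("executable not found: %s" % exe)
    # fields present in the command string
    fields = [m.group("named") or m.group("braced")
              for m in Template.pattern.finditer(command)]
    for name in command_config.get("mandatory", []):
        if name not in fields:
            raise ValueError("mandatory field missing: %s" % name)
```

Using sh as a stand-in executable, `validate("sh $args $fits", {"mandatory": ["fits"]})` passes, while dropping $fits from the command raises the error.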

The configuration file

To allow for flexibility and extendability, the instructions on how to expand fields come from one or more configuration files, written using the yaml standard.

When validating the command string, the fields and the name of the command are extracted and the corresponding keys are searched for in the configuration, under the section specific to that command. The value of a key can be either a string or a python dictionary. If it’s a string, like '-a -b', it is converted into a key of type plain: {'type': 'plain', 'value': '-a -b'}. If it is a dictionary, it must contain an entry called type, whose value defines the type of the key.
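The string-to-dictionary normalization can be sketched as follows (a simplified stand-in, not VDAT's actual helper):

```python
def normalize_key(value):
    """Convert a bare-string key into an explicit ``plain`` key.

    Dictionaries are passed through, but must declare a ``type``.
    """
    if isinstance(value, str):
        return {"type": "plain", "value": value}
    if "type" not in value:
        raise KeyError("key dictionaries must contain a 'type' entry")
    return value

print(normalize_key("-a -b"))
```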

Non-substituted keys

These are keys that are understood and used by the interpreter, but do not represent fields that will be expanded/substituted in the command line calls.

is_alias_of

If it exists, its value is the real name of the executable. This allows the creation of multiple commands using the same underlying executable. If e.g. the command is:

do_something $args -o $ofile $ifiles

and the configuration file contains

is_alias_of: an_executable
args: "-a -b"
ofile: outfile
ifiles: file[1-9].txt
primary: ifiles

then the interpreter will loop through all the files matching the ifiles pattern in target_dir. For the first file, it will execute:

an_executable -a -b -o outfile file1.txt

mandatory

List of mandatory fields; field names defined under mandatory must exist in the provided command. Omit this key or leave it empty to disable the check.

mandatory: [field1, field2]
# or equivalently
mandatory:
    - field1
    - field2

primary

Name of the field to use as primary. A primary field has a special status: files are collected from the target_dir according to the type of the underlying key; the interpreter then loops over them and, for each one, creates and executes the command string. If the value of any other key or field needs to be built at run time, it is built from the primary files. VDAT ships with a few primary types.

This key can have either a single value or a list of values. If it has a single value, the corresponding field must be present in the command. If it is a list, exactly one of the listed fields must be present in the command; multiple primary fields are not allowed.

# single primary
primary: fits
# multiple primaries
primary: [fits1, fits2]
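The "exactly one" rule can be sketched as (hypothetical helper, illustrating the validation only):

```python
def pick_primary(primary, command_fields):
    """Return the single primary field present in the command.

    ``primary`` is the configuration value (string or list);
    ``command_fields`` are the field names found in the command.
    """
    candidates = primary if isinstance(primary, list) else [primary]
    found = [name for name in candidates if name in command_fields]
    if len(found) != 1:
        raise ValueError("exactly one primary field must be present, "
                         "found %d" % len(found))
    return found[0]

print(pick_primary(["fits1", "fits2"], ["args", "fits2"]))
```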

filter_selected

Tells the interpreter how to filter the list of primary files. If this option is not found in the configuration or the selected keyword in CommandInterpreter is None, no filtering is performed. Otherwise, for each element in the primary list:

  • uses the instructions from the value of filter_selected to extract a string
  • checks whether the string is in selected.

The value of filter_selected can be any available key type, e.g. the built-in ones described below.

With the following settings:

# Use the value of the header keyword ``IFUSLOT`` to decide whether to
# keep the primary field or not
filter_selected:
    type: header
    keyword: IFUSLOT

the content of the fits header keyword IFUSLOT is extracted and compared with the list provided via the selected option of CommandInterpreter
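The filtering logic can be sketched with in-memory headers; the dictionary below is a hypothetical stand-in for the real interpreter, which reads the IFUSLOT keyword from each file on disk:

```python
# hypothetical stand-in for the FITS headers of two primary files
headers = {
    "20180219T071318.8_073LL_sci.fits": {"IFUSLOT": "073"},
    "20180219T072418.2_106RL_sci.fits": {"IFUSLOT": "106"},
}

def filter_primaries(primaries, selected):
    """Keep only the primaries whose extracted ID is in ``selected``."""
    if selected is None:              # None disables the filtering
        return list(primaries)
    return [p for p in primaries if headers[p]["IFUSLOT"] in selected]

kept = filter_primaries(sorted(headers), selected=["073"])
print(kept)
```

With selected=["073"] only the 073 file survives; with selected=None both are kept.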

execute

For each iteration over the primary files, it tells the interpreter whether to run the command or not. If the option is not found, no filtering is performed. VDAT ships with some execute types.

If the handling of this key raises an exception, it is logged as a warning and the command is executed as if the key returned True.

Built-in primary key/field types

plain

Search for files matching the given pattern in the target directory. If the value of a key is a string, it is interpreted as a plain type. These three definitions are equivalent:

keyword: 20*.fits
---
keyword: &plain
    type: plain
    value: 20*.fits
---
keyword: {type: plain, value: 20*.fits}

By default, the keyword values are interpreted as shell-style wildcards. As in fnmatch, the only special characters are:

Pattern   Meaning
*         matches everything
?         matches any single character
[seq]     matches any character in seq
[!seq]    matches any character not in seq
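The standard library fnmatch module implements exactly these rules; the file names below are made-up examples:

```python
import fnmatch

files = ["20180219T071318.8_073LL_sci.fits",
         "masterbias_073LL.fits",
         "20180219T072418.2_106RL_sci.fits"]

# '[0-9]*.fits' keeps only names starting with a digit, as used by
# the `fits` key in the subtractfits example above
matched = fnmatch.filter(files, "[0-9]*.fits")
print(matched)
```

The master bias file is rejected because its name does not start with a digit.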

If you need more complex matches, it’s possible to use python regular expressions. To make the interpreter aware of it you can add the optional key is_regex and set it to True. For example:

keyword:
    type: plain
    value: '(?:e.)?jpes[0-9].*fits'
    is_regex: True

will get all the files in the target_dir whose names match e.jpes[0-9]*fits or jpes[0-9]*fits, but not, e.g., FEjpes[0-9]*fits.

If, rather than returning the file name, we just want to extract some part of it, e.g. the time stamp, we can add the returns option with the corresponding instructions. The content of returns can be any available secondary keyword:

keyword:
    <<: *plain
    returns:
        type: regex
        match: '.*(\d{8}T\d{6}).*'
        replace: \1

here the \1 refers to the first regex group returned from the expression in match.
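The extraction above boils down to a re.sub call; the file name is a made-up example:

```python
import re

fname = "/path/to/20180219T071318.8_073LL_sci.fits"
# the group captures the time stamp; \1 in ``replace`` refers to it
stamp = re.sub(r'.*(\d{8}T\d{6}).*', r'\1', fname)
print(stamp)
```

Since the leading and trailing .* consume the rest of the name, the whole string is replaced by the captured group, leaving only the time stamp.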

loop

This is designed to loop over, for example, IFUs, channels and/or amplifiers.

  1. collects the keys which have been stored under a yaml key called (a little confusingly) keys
  2. cycles through all the possible combinations of them
  3. for each combination, replaces the corresponding entries in value (see example below) using the standard python format string syntax
  4. looks for all the files matching the resulting strings
  5. if any files are found, constructs a string; if multiple files are found, they are joined into a single string separated by spaces
  6. if the returns option is given, uses it to manipulate the string with the file names (as explained above)
  7. yields the string

The entries stored under the keys key are maps between the names of the entries, e.g. ifu, and the values that they can have in the loop described in step (2) above. Their value can be either a list or three comma-separated numbers: start, stop, step. The latter case is converted into the list of numbers from start to stop (excluded) in steps of step.

The following configuration:

keyword:  &loop
    type: loop
    value: 's[0-9]*{ifu:03d}{channel}{amp}_*.fits'
    keys:   # dictionary of keys to expand in ``value``
        ifu: 1, 100, 1     # start, stop, step values of a slice
        channel: [L, R]    # a list of possible values
        amp:               # alternative syntax for the list
            - L
            - U

cycles through all the possible combinations of the three lists: [1, 2, .., 99], ['L', 'R'] and ['L', 'U']. For the first combination we get ifu: 1, channel: L, amp: L, and value becomes s[0-9]*001LL_*.fits. Then all the files matching this pattern are collected.

As with the plain primary keyword, it’s possible to interpret the strings resulting from filling in the fields in value as regexes by providing the optional key is_regex. As before, one can also extract some part from the file name(s) with the returns key.
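The combination/formatting machinery of steps (2) and (3) can be sketched with itertools.product (a simplified stand-in for the actual implementation):

```python
import itertools

value = 's[0-9]*{ifu:03d}{channel}{amp}_*.fits'
keys = {"ifu": range(1, 100),      # '1, 100, 1': start, stop, step
        "channel": ["L", "R"],
        "amp": ["L", "U"]}

names = list(keys)
# one glob pattern per combination of ifu, channel and amp
patterns = [value.format(**dict(zip(names, combo)))
            for combo in itertools.product(*keys.values())]
print(patterns[0], len(patterns))
```

The first combination yields the pattern quoted above, and 99 * 2 * 2 = 396 patterns are generated in total.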

groupby

  1. collects all the files matching value and loops through them
  2. for each file, replaces match with each of the values in replace, using the regex secondary keyword implementation.

The following configuration:

keyword:
    type: groupby
    value: 'p*[0-9][LR]L_*.fits'
    match: (.*p.*\d[LR])L(_.*\.fits)
    replace:
        - \1U\2

cycles through all the files matching value in the target_dir, e.g. “p2LL_sci.fits”, and for each of them creates a new file name by replacing the last “L” with “U”, e.g. “p2LU_sci.fits”. The two files are then returned.

To create multiple files out of the first one, it’s enough to provide other entries to replace. E.g.:

replace: [\1U\2, \1A\2, \2_\1]

will create three new files: “p2LU_sci.fits”, “p2LA_sci.fits” and “_sci.fits_p2L”
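The group construction reduces to repeated re.sub calls with back-references, as sketched here for the example file above:

```python
import re

match = r'(.*p.*\d[LR])L(_.*\.fits)'
first = 'p2LL_sci.fits'

# single entry under ``replace``: derive the U-amplifier companion
companion = re.sub(match, r'\1U\2', first)
print(companion)

# several entries under ``replace`` yield several companion names
group = [first] + [re.sub(match, repl, first)
                   for repl in (r'\1U\2', r'\1A\2', r'\2_\1')]
print(group)
```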

As with the plain primary keyword, it’s possible to interpret the value as a regex providing the optional key is_regex. All the keywords recognised by regex secondary keyword are also supported.

all_files

This primary type has the same interface as the plain primary keyword. The behaviour is however different: while the plain primary keyword returns an iterator (or list) of file names or strings, all_files returns a list containing a single string of space-separated file names or, when the returns option is used, values.

The following configuration collects all the files matching value, as explained in plain, and returns a list with a single element:

keyword: &all_files
    type: all_files
    value: 20*.fits

If e.g. there are four files matching the pattern, the type returns something like:

['/path/to/20180219T071318.8_073LL_sci.fits /path/to/20180219T071318.8_073LU_sci.fits /path/to/20180219T072418.2_106RL_sci.fits /path/to/20180219T072418.2_106RU_sci.fits']

For comparison, the plain primary type would return:

['/path/to/20180219T071318.8_073LL_sci.fits',
 '/path/to/20180219T071318.8_073LU_sci.fits',
 '/path/to/20180219T072418.2_106RL_sci.fits',
 '/path/to/20180219T072418.2_106RU_sci.fits']

The is_regex and returns options are interpreted as described in plain.
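The difference between the two return shapes is simply a join; the paths are made-up examples:

```python
files = ['/path/to/20180219T071318.8_073LL_sci.fits',
         '/path/to/20180219T071318.8_073LU_sci.fits']

# plain would return ``files`` itself; all_files joins the names
# into a list with a single space-separated element
joined = [' '.join(files)]
print(len(joined), joined[0])
```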

Warning

The filter_selected option selects which of the elements returned by the primary key are used; it does not filter substrings within those elements. Using filter_selected with all_files may therefore lead to unexpected results, and we suggest avoiding the option

Built-in keyword types

plain

A static string. These three definitions are equivalent:

keyword: '-a -b --long option'
---
keyword:
    type: plain
    value: '-a -b --long option'
---
keyword: {type: plain, value: '-a -b --long option'}

regex

Returns a string obtained from the primary by replacing match with replace, using re.subn() to do the substitution. If e.g. the primary is file_001_LL.fits file_001_RL.fits, the following entry returns L001

keyword:
    type: regex
    match: \S*?_(\d{3})_([LR]).*?\.fits
    replace: \2\1

If the substitution fails because of a regex mismatch or because more than one substitution is performed, a CIKeywordError is raised. It is possible to declare the expected number of substitutions or to disable the check altogether via the optional n_subs key:

  • if not present, defaults to one if do_split is True, or to the number of input primary files otherwise;

  • if a negative number: the check is disabled;

  • positive integer: exactly n_subs must be performed. E.g:

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        n_subs: 2
    

    will fail because it requires two substitutions;

  • list of integers: the number of substitutions must be one of the list entries:

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        n_subs: [1,2]
    

    will accept either one or two substitutions;

  • string: interpreted as a slice [start]:[stop][:step] or a comma separated list of [start],[stop][,step]. The string is used to initialize a SliceLike instance and then to check if the number of substitutions is within the allowed range as defined in the class documentation. E.g the following will succeed:

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        n_subs: 1:10:2
    

    but using n_subs: 0:10:2 will raise an error.

Finally, the do_split optional key instructs the function whether to split the primary on white space or not. E.g.:

keyword:
    type: regex
    match: \S*?_(\d{3})_([LR]).*?\.fits
    replace: \2\1
    do_split: False

will return L001 R001, as a single string, from the files Sfile_001_L.fits Sfile_001_R.fits. If not provided, do_split defaults to True.

Examples and more information about the python regex syntax can be found in the official python documentation
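The two do_split behaviours can be reproduced directly with re.subn (a sketch of the mechanics, not VDAT's code):

```python
import re

match = r'\S*?_(\d{3})_([LR]).*?\.fits'
primary = 'file_001_LL.fits file_001_RL.fits'

# do_split: True (the default): each white-space separated file is
# substituted on its own, giving one substitution per file
split_results = [re.subn(match, r'\2\1', part)[0]
                 for part in primary.split()]
print(split_results)

# do_split: False: one pass over the whole string, two substitutions
joined, n = re.subn(match, r'\2\1', primary)
print(joined, n)
```

This also shows why the default n_subs differs: one substitution per piece when splitting, the number of primary files otherwise.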

format

Creates a new string by formatting value using the keys. The keys can be of any secondary type known to VDAT at loading time, except format, to avoid recursion. Assuming you have a fits file called file_001_LL.fits with a header keyword DATE-OBS = 2013-01-01, the following configuration instructs the interpreter to extract the id key, a three digit number, from the file name and the date key from the DATE-OBS fits header value.

keyword:
    type: format
    value: file_{id}_{date}.fits
    keys:
        id:
            type: regex
            match: .*_(\d{3}).*\.fits
            replace: \1
        date:
            type: header
            keyword: DATE-OBS

The resulting value is the string file_001_2013-01-01.fits. If the types for the keys do not exist, a CIKeywordTypeError will be raised at run time. If one of the keys has a string as value, it will be interpreted as of type plain.

As in the previous cases, if do_split is present and False, the formatting is applied to all the elements in the primary and a concatenated string of white-space separated results is returned.
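The two-stage process, resolve the sub-keys then format value, can be sketched as follows; the header dictionary stands in for the real FITS header:

```python
import re

primary = 'file_001_LL.fits'
header = {'DATE-OBS': '2013-01-01'}   # stand-in for the FITS header

# resolve the sub-keys first: ``id`` via a regex on the file name,
# ``date`` via a header lookup...
keys = {'id': re.sub(r'.*_(\d{3}).*\.fits', r'\1', primary),
        'date': header['DATE-OBS']}
# ...then apply python string formatting to ``value``
result = 'file_{id}_{date}.fits'.format(**keys)
print(result)
```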

fplane_map

This type maps one type of ID to another using the fplane file. The following code shows all the mandatory keys; their explanation can be found below.

keyword:  &fplane_map
    type: fplane_map
    fplane_file: /path/to/fplane.txt
    in_id:
        type: regex
        match: '.*?/dither_(\d{3})\.txt'
        replace: \1
    in_id_type: ifuslot
    out_id_type: ifuid

where:

  • fplane_file points to the fplane file
  • in_id can be any of the available keyword types and is used to extract the ID from the primaries.
  • in_id_type is the type of ID returned by in_id and can be any of the values supported by pyhetdex.het.fplane.FPlane.by_id().
  • out_id_type is the type of ID to return and can be any of the ones supported by pyhetdex.het.fplane.IFU.

If the primary is /path/to/dither_073.txt and the fplane file contains the following IFU:

# IFUSLOT X_FP   Y_FP   SPECID SPECSLOT IFUID IFUROT PLATESC
073 150.0   150.0   04  136 023 0.0 1.0

the above configuration returns the value '023'

Similarly to the header keyword, by default the id is cast to a string. The formatter keyword can be used to control the formatting of the output id. In the following example:

keyword:
    <<: *fplane_map
    out_id_type: specid
    formatter: '{:03d}'  # or '{0:03d}'

the return value is '004'. Without the formatter keyword the output would be '4'.

As in the previous cases, if do_split is present and False, the ids are extracted from all the primaries and converted; the resulting IDs are concatenated.
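The mapping amounts to a row lookup plus optional formatting. The dictionary below is a hypothetical stand-in for one parsed row of the fplane table shown above; the real implementation uses pyhetdex.het.fplane:

```python
# one row of the example fplane table, keyed by id type;
# specid is numeric, which is why the formatter matters
row = {'ifuslot': '073', 'specid': 4, 'ifuid': '023'}

def fplane_map(in_id, in_id_type, out_id_type, formatter='{}'):
    """Sketch: map an input ID to another ID type for a single row."""
    if row[in_id_type] != in_id:
        raise KeyError('no IFU with %s = %s' % (in_id_type, in_id))
    return formatter.format(row[out_id_type])

print(fplane_map('073', 'ifuslot', 'ifuid'))
print(fplane_map('073', 'ifuslot', 'specid', '{:03d}'))
```

Without the formatter, specid comes back as '4'; with '{:03d}' it becomes '004', as stated above.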

For information about the fplane parser, follow this link.

Built-in execute types

new_file

Following the instructions provided, this type builds a string and checks whether the file referenced by that string exists on the filesystem.

The only mandatory option is value: it is used to build the string from the primary and can be any of the available keyword types. E.g. given a primary like /path/to/123T456_001LL_sci.fits, the following instruction will create the string /path/to/masterbias_001LL.fits and check if it exists.

execute:  &execute
    type: new_file
    value:
        type: regex
        match: (.*?)/\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits
        replace: \1/masterbias_\2.fits

In some cases it might not be possible to build the path of the output file directly from the primary files. In such cases you can add the path keyword to the type definition; its return value is joined with the file name returned by value. path can be either of the following:

  • any of the available keyword types: the path is then constructed in the same way as value. E.g.:

    execute:
        <<: *execute
        path: /other/path
    
  • a string like $key: this will get the value of path from the key keyword from the command configuration. The following behaves in the same way as the above example:

    other_dir: /other/path
    execute:
        <<: *execute
        path: $other_dir
    
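The new_file logic, with and without the path keyword, can be sketched as (hypothetical helper name; the regex is the one from the example above):

```python
import os
import re

def masterbias_exists(primary, path=None):
    """Sketch of the ``new_file`` execute type: build the expected
    output name from the primary and test for its existence."""
    name = re.sub(r'(.*?)/\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits',
                  r'\1/masterbias_\2.fits', primary)
    if path is not None:              # the optional ``path`` keyword
        name = os.path.join(path, os.path.basename(name))
    return name, os.path.exists(name)

name, exists = masterbias_exists('/path/to/123T456_001LL_sci.fits')
print(name)
```

The returned boolean then decides whether the command runs for this primary.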

Add new types

For every type, be it primary or not, there is a corresponding function that implements how to handle it.

All the types are implemented as plugins, discovered and dynamically loaded at run time.

The command interpreter looks for three entry points:

  • vdat.cit.primary: for the definition of primary types
  • vdat.cit.keyword: for the definition of other types
  • vdat.cit.execute: for the definition of types to decide whether to execute or not the command

Each entry point is defined as a string, like:

type = package.module:func

where type is the name of the type and func is the function handling the keyword of type; func is implemented in the module module of the package package.

The functions implementing primary and secondary keywords have the following signature:

vdat.command_interpreter.types.primary_template(target_dir, key_val)[source]

Template for a function that deals with a primary keyword.

It collects the files from the target_dir according to the instructions in key_val, if any, and either yields the values or returns an iterable.

Parameters:
target_dir : string

directory in which the files must be collected

key_val : dictionary

configuration for the key handle

Yields:
yield a string or iterable of strings
Raises:
CIPrimaryError

if something goes wrong when handling the primary key

vdat.command_interpreter.types.keyword_template(primary, key_val)[source]

Template for a function that deals with a non-primary keyword.

A keyword either has its value statically stored in key_val, or its value needs to be extracted from the value of the primary file(s).

Parameters:
primary : string

the value of one of the items returned by primary_template()

key_val : dictionary

configuration for the key handle

Returns:
string

value to associate to the keyword

Raises:
CIKeywordError

if something goes wrong when handling the key

vdat.command_interpreter.types.execute_template(primary, config)[source]

For each of the primary entries, this function is called to decide whether to execute or skip the command.

Parameters:
primary : string

the value of one of the items returned by primary_template()

config : dictionary

configuration for the command

Returns:
bool

True: the command is executed; False: the command is skipped

The run method

Invoking

CommandInterpreter.run()[source]

Collect the files, expand and run the required command

All the custom errors raised here derive from CIRunError.

Raises:
CICommandFmtError

if the substitution of the command doesn’t work

will:

  1. collect all the primary files
  2. filter them according to the list of selected items
  3. loop over the primary files
  4. check whether the step must be executed or not
  5. for each step in the loop replace the relevant fields in the input command according to the instructions from the configuration
  6. execute the command
  7. report execution progress
  8. collect and send out execution results

The execution of each step in the loop uses the worker-based interface provided by pyhetdex.tools.processes. Within the command interpreter only the worker named command_interpreter is used. Multiprocessing is enabled if the multiprocessing keyword argument is set.
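Ignoring multiprocessing, progress reporting and signals, the loop in run() can be sketched as follows; resolve and runner are hypothetical stand-ins for the field expansion and for the worker that launches the subprocess:

```python
from string import Template

def run(command, primaries, resolve, execute=lambda primary: True,
        runner=print):
    """Sketch of the run() loop.

    ``resolve`` maps a primary file to its substitution dictionary;
    ``execute`` mimics the ``execute`` key; ``runner`` receives the
    finished command string.
    """
    done = skipped = 0
    for primary in primaries:
        if not execute(primary):          # skip this iteration?
            skipped += 1
            continue
        cmd = Template(command).substitute(resolve(primary))
        runner(cmd)
        done += 1
    return done, skipped

out = []
run("subtractfits $args $fits", ["a.fits", "b.fits"],
    resolve=lambda f: {"args": "-s", "fits": f},
    runner=out.append)
print(out)
```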

Communication

The command interpreter communicates with the rest of the world through different channels.

  • Upon errors directly handled by the interpreter, one of the errors defined in vdat.command_interpreter.exceptions is raised. Most of those errors are notified to the user via pop-up windows. Check the documentation of CommandInterpreter for more details.

  • During normal execution of the command, the resolved command string, standard output, standard error and any exception raised while executing the code are logged to a logger with the name of the executable. In VDAT, these loggers are set to write to files located in the directory defined in the VDAT configuration file; the names of those files are the executable name with a .log extension. These loggers are set in the main VDAT code, not in the command interpreter sub-package.

    Warning

    in a future release, logging will also be performed via the signal mechanism

  • CommandInterpreter uses PyQt-like signals to communicate with the external world.

    The names of the arguments of the emit methods are the type of the parameter, followed by an underscore and, optionally, an explanatory name.

    Available signals are:

    • command_string: accepts an int and a string;

      CICommandString.emit(int_, string_)[source]
      Parameters:
      int_ : int

      loop number

      string_ : string

      string of the command

    • progress: accepts four integers: the total expected number of jobs, and the numbers of finished, skipped and failed jobs;

      CIProgress.emit(int_tot, int_done, int_skipped, int_fail)[source]
      Parameters:
      int_tot : int

      total number of jobs

      int_done : int

      number of finished jobs; the number of successful jobs is int_done - int_skipped - int_fail

      int_skipped : int

      number of skipped jobs

      int_fail : int

      number of failed jobs

    • command_done: accepts a boolean: True for the end of the command interpreter, False for the end of one single command;

      CICommandDone.emit(bool_global)[source]
      Parameters:
      bool_global : boolean

      if True, the command interpreter is done, if False a single command is done

    • global_logger: accepts an integer and a string

      CIGlobalLogger.emit(int_level, string_msg)[source]
      Parameters:
      int_level : integer

      logging level; see the logging documentation for more information

      string_msg : string

      string to log

    • n_primaries: accepts an integer

      CINPrimaries.emit(int_)[source]
      Parameters:
      int_ : integer

      number of primary files

    • commands: accepts six strings and a dictionary. The first string is the primary value; the second is the command with all the substitutions in place if the execution finished, or with the placeholders if some exception was raised; the third and fourth are the stdout and stderr of the executed command; the fifth is non-empty if the command has a non-null return code; the sixth is non-empty if the execution of the command crashed for some reason; the seventh argument is the configuration dictionary passed to the CommandInterpreter

    The list of available signal names can be retrieved with vdat.command_interpreter.signals.get_signal_names(), while the signals can be accessed with vdat.command_interpreter.signals.get_signal(). Callbacks can be connected to or disconnected from each signal using the methods connect() and disconnect().