.. Command line tool documentation

The Command Interpreter Tool
****************************

This chapter describes the part of ``VDAT`` responsible for the reduction of the
data.

Introduction
============

The main scope of the ``VDAT`` GUI is to allow users to select, visualize and
reduce ``VIRUS`` data. ``VDAT`` relies mostly on ``cure`` to execute the
reduction steps. ``cure`` is a C++ library that provides a number of executables
that operate on a single fits file or on a group of them.

For each of the reduction steps, ``VDAT`` must collect (i.e. generate a list of)
the input files and command line options according to the directories and IFUs
selected by the user and run the appropriate ``cure`` tool.

Although ``cure`` is the main library of tools to use, some of the steps of the
reduction are not implemented there. We also want to allow users to execute
generic commands without any prior knowledge of the signature and name of the
files.

We have met these requirements by designing a command line tool based on two
building blocks:

1) an interpreter that parses an input `command string`, containing
   placeholders, and executes the command in a loop replacing the placeholders
   with the correct values; we use the standard `python string Template `_ to
   define placeholders;
2) one or more `yaml `_ configuration files to instruct the interpreter on how
   to expand the placeholders for any provided command. In this documentation
   we'll refer to entries in the configuration as ``keys``.

.. _interpreter:

The interpreter
===============

The public interface of the interpreter is defined by the constructor of the
class :class:`~vdat.command_interpreter.CommandInterpreter` and its method
:meth:`~vdat.command_interpreter.CommandInterpreter.run`.

Constructor
-----------

The constructor has the following signature

.. autoclass:: vdat.command_interpreter.CommandInterpreter
   :noindex:

1) ``command`` is a string with the command to execute.
   The command contains what we will refer to as ``fields`` that will be
   substituted. For example, the ``fields`` in the command string below are
   ``args``, ``biassec`` and ``fits`` ::

        subtractfits $args -o $biassec $fits

2) ``command_config``: the relevant part of the parsed ``yaml`` configuration
   file containing the instructions on how to expand fields like ``args``,
   ``biassec`` and ``fits`` while running the command ``subtractfits``. The
   part of the configuration file necessary to run the above command is

   .. code-block:: yaml

        subtractfits:
            # mandatory fields
            mandatory: [fits, ]
            # primary key: the interpreter collects files according to
            # the instructions in the `fits` key, then loops over them,
            # replacing all the fields and executing the command
            primary: fits
            # looks for all the files matching the pattern in the `selected_dir`
            fits: '[0-9]*.fits'
            # Get the `BIASSEC` value from the header of every file
            # and from it extract the part within square brackets
            biassec:
                type: header
                keyword: BIASSEC
                extract:
                    - \[(.*)\]
                    - \1
            args: '-s -a -k 2.8 -t -z'

   The syntax for the expression given for the ``fits`` and ``extract`` keys
   will be explained later.

   Both the GUI and the command line interface inject into the
   ``command_config`` the following keys:

   * ``target_dir``: the directory selected by the user; in the above example,
     the ``fits`` files are searched in this directory
   * ``cal_dir``: the reference calibration directory
   * ``zro_dir``: the reference bias directory

   If no ``cal`` or ``zro`` directory has been explicitly selected in the GUI,
   the default ones are added.

   .. warning::
        If any of these entries is already in the configuration file, it will
        be overwritten

3) ``selected``: list of selected items or ``None``, for selecting all. It tells
   the interpreter which of the ``primary`` elements must be run. E.g. the
   ``VDAT`` GUI passes as ``selected`` the list of IFUs selected by the user.
   The instructions on how to extract the information to match against
   ``selected`` from the files while running the command are defined in the
   :ref:`command configuration file `.

   .. note::
        VDAT passes the IFU head mount plate IDs (ihmpid) to the command
        interpreter. This id is a 3 digit number stored in the file headers
        under the IFUSLOT key.

----

In the constructor the following steps are performed:

1) the configuration object is copied and saved in local variables: this allows
   the user to enqueue multiple commands;
2) validations:

   a) the command executable, e.g. ``subtractfits``, is searched in the path to
      check if it exists
   b) check that all the mandatory ``fields`` are present in the command (see
      the ``mandatory`` key in the ``command_config`` above)
   c) check that all the required ``keys`` are present in the configuration
   d) check that all the required ``keys`` are of known type
   e) map all the types to the functions implementing them

.. _command_conf:

The configuration file
======================

To allow for flexibility and extensibility, the instructions on how to expand
``fields`` come from one or more configuration files, written using the ``yaml``
standard. When validating the ``command`` string, the ``fields`` and the name of
the command are extracted and the corresponding ``keys`` are searched for in the
configuration, under the section specific to that command.

The value of a ``key`` can be either a string or a python dictionary. If it's a
string, like ``'-a -b'``, it is converted into a ``key`` of type ``plain``:
``{'type': 'plain', 'value': '-a -b'}``. If it is a dictionary, it must contain
an entry called ``type``, whose value defines the type of the key.

.. _special_keys:

Non-substituted keys
--------------------

These are ``keys`` that are understood and used by the interpreter, but do not
represent ``fields`` that will be expanded/substituted in the command line
calls.
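The distinction between ``fields`` and non-substituted ``keys`` can be
illustrated with a short Python sketch. It is not the ``VDAT`` implementation:
the helpers ``extract_fields`` and ``split_config`` and the ``SPECIAL_KEYS`` set
are hypothetical names introduced here, and only the key names themselves come
from this documentation. The sketch relies on the regular expression that the
standard :class:`string.Template` class uses to find placeholders.

.. code-block:: python

    from string import Template

    # Names of the non-substituted keys documented in this section
    SPECIAL_KEYS = {"is_alias_of", "mandatory", "primary",
                    "filter_selected", "execute"}


    def extract_fields(command):
        """Return the placeholder names found in ``command``, in order."""
        fields = []
        for match in Template.pattern.finditer(command):
            # ``$name`` fills the "named" group, ``${name}`` the "braced" one
            name = match.group("named") or match.group("braced")
            if name and name not in fields:
                fields.append(name)
        return fields


    def split_config(command, config):
        """Separate special keys from fields and list fields without a key."""
        fields = extract_fields(command)
        special = {k: v for k, v in config.items() if k in SPECIAL_KEYS}
        missing = [f for f in fields if f not in config]
        return fields, special, missing


    # Toy configuration: ``biassec`` is deliberately left out
    config = {"mandatory": ["fits"], "primary": "fits",
              "fits": "[0-9]*.fits", "args": "-s -a"}
    fields, special, missing = split_config(
        "subtractfits $args -o $biassec $fits", config)

With this toy configuration, ``mandatory`` and ``primary`` end up in ``special``
and are never substituted, while ``biassec`` is reported as a field that has no
corresponding ``key``.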
``is_alias_of``
^^^^^^^^^^^^^^^

If it exists, its value is the real name of the executable. This allows the
creation of multiple commands using the same underlying executable.

If e.g. the command is::

    do_something $args -o $ofile $ifiles

and the configuration file contains

.. code-block:: yaml

    is_alias_of: an_executable
    args: "-a -b"
    ofile: outfile
    ifiles: file[1-9].txt
    primary: ifiles

then the interpreter will loop through all the files matching the ``ifiles``
pattern in ``target_dir``. For the first file, it will execute::

    an_executable -a -b -o outfile file1.txt

``mandatory``
^^^^^^^^^^^^^

List of mandatory ``fields``; field names defined under ``mandatory`` must exist
in the provided command. Do not provide this key or leave it empty to disable
these checks.

.. code-block:: yaml

    mandatory: [field1, field2]
    # or equivalently
    mandatory:
        - field1
        - field2

``primary``
^^^^^^^^^^^

Name of the ``field`` to use as primary. A primary ``field`` has a special
status: files are collected from the ``target_dir`` according to the type of the
underlying ``key``, then they are looped over and for each step the command
string is created and executed. If the value of any other ``key`` or ``field``
needs to be built at run time, it will use the ``primary`` files to do it.
``VDAT`` ships with a few :ref:`primary types `.

This key can have either a single value or a list of values. If it has a single
value, the corresponding ``field`` must be present in the command. If it is a
list of values, **one and only one** of the ``fields`` must be present in the
command. Multiple primary ``fields`` in the same command are not allowed.

.. code-block:: yaml

    # single primary
    primary: fits
    # multiple primaries
    primary: [fits1, fits2]

.. _filter_selected:

``filter_selected``
^^^^^^^^^^^^^^^^^^^

Tells the interpreter how to filter the list of primary files.
If this option is not found in the configuration or the ``selected`` keyword in
:class:`~vdat.command_interpreter.core.CommandInterpreter` is ``None``, no
filtering is performed. Otherwise, for each element in the primary list, the
interpreter:

* uses the instructions from the value of ``filter_selected`` to extract a
  string
* checks whether the string is in ``selected``.

The value of ``filter_selected`` can be any available ``key`` type, e.g.
:ref:`the built-in ones ` described below.

With the following settings:

.. code-block:: yaml

    # Use the value of the header keyword ``IFUSLOT`` to decide whether to
    # keep the primary field or not
    filter_selected:
        type: header
        keyword: IFUSLOT

the content of the fits header keyword ``IFUSLOT`` is extracted and compared
with the list provided with the ``selected`` option in
:class:`~vdat.command_interpreter.core.CommandInterpreter`

``execute``
^^^^^^^^^^^

For each iteration of the ``primary``, it tells the interpreter whether to run
the command or not. If the option is not found, the command is always run.
``VDAT`` ships some :ref:`execute types `.

If the handling of this ``key`` raises an exception, it is logged as a warning
and the command is executed as if the ``key`` returned ``True``.

.. _primary_types:

Built-in primary key/field types
--------------------------------

.. _primary_plain:

``plain``
^^^^^^^^^

Search for files matching the given pattern in the target directory. If the
value of a ``key`` is a string, it is interpreted as a ``plain`` type. These
three definitions are equivalent:

.. code-block:: yaml

    keyword: 20*.fits
    ---
    keyword: &plain
        type: plain
        value: 20*.fits
    ---
    keyword: {type: plain, value: 20*.fits}

By default, the keyword values are interpreted as shell-style wildcards.
As in the `fnmatch `_ the only special characters are:

+------------+------------------------------------+
| Pattern    | Meaning                            |
+============+====================================+
| ``*``      | matches everything                 |
+------------+------------------------------------+
| ``?``      | matches any single character       |
+------------+------------------------------------+
| ``[seq]``  | matches any character in *seq*     |
+------------+------------------------------------+
| ``[!seq]`` | matches any character not in *seq* |
+------------+------------------------------------+

If you need more complex matches, it's possible to use `python regular
expressions `_. To make the interpreter aware of it you can add the optional key
``is_regex`` and set it to ``True``. For example:

.. code-block:: yaml

    keyword:
        type: plain
        value: '(?:e.)?jpes[0-9].*fits'
        is_regex: True

will get all the files in the ``target_dir`` whose name matches
``e.jpes[0-9]*fits`` or ``jpes[0-9]*fits``, but not, e.g.,
``FEjpes[0-9]*fits``.

If rather than returning the filename we just want to extract some part of it,
e.g. just the time stamp, we can add the ``returns`` option with the
corresponding instructions. The content of ``returns`` can be any available
secondary keyword:

.. code-block:: yaml

    keyword:
        <<: *plain
        returns:
            type: regex
            match: '.*(\d{8}T\d{6}).*'
            replace: \1

here the ``\1`` refers to the first ``regex`` group returned from the expression
in ``match``.

``loop``
^^^^^^^^

This is designed to loop over, for example, IFUs, channels and/or amplifiers.
This type:

1) collects the ``keys`` which have been stored under a ``yaml`` key called (a
   little confusingly) ``keys``
2) cycles through all the possible combinations of them
3) for each combination replaces the corresponding entries in ``value`` (see
   example below) using the standard python `format string syntax `_
4) looks for all the files matching the resulting strings
5) if any files are found, constructs a string.
   If multiple files are found, constructs a single string with the different
   files separated by a space.
6) if the ``returns`` option is given, uses it to manipulate the string with the
   file names (as explained above)
7) yields the string

The entries stored under the ``keys`` key are maps between the names of the
entries, e.g. ``ifu``, and the values that they can have in the loop described
in step (2) above. Their value can be either a list or three comma separated
numbers: ``start, stop, step``. The latter case is converted into a list of
numbers from ``start`` to ``stop`` (excluded) in increments of ``step``.

The following configuration:

.. code-block:: yaml

    keyword: &loop
        type: loop
        value: 's[0-9]*{ifu:03d}{channel}{amp}_*.fits'
        keys:  # dictionary of keys to expand in ``value``
            ifu: 1, 100, 1  # start, stop, step values of a slice
            channel: [L, R]  # a list of possible values
            amp:  # alternative syntax for the list
                - L
                - U

cycles through all the possible combinations of the three lists:
``[1, 2, .., 99]``, ``['L', 'R']`` and ``['L', 'U']``. For the first combination
we get ``ifu``: 1, ``channel``: L, ``amp``: L, and ``value`` becomes
``s[0-9]*001LL_*.fits``. Then all the files matching this pattern are collected.

As with the :ref:`plain primary keyword `, it's possible to interpret the
strings resulting from filling in the fields in ``value`` as regexes by
providing the optional key ``is_regex``. As before, one can also extract some
part from the file name(s) with the ``returns`` key.

``groupby``
^^^^^^^^^^^

This type:

1) collects all the files matching ``value`` and loops through them
2) for each of the files replaces ``match`` with all the values in ``replace``
   using the :ref:`keyword_regex` secondary keyword implementation.

The following configuration:

.. code-block:: yaml

    keyword:
        type: groupby
        value: 'p*[0-9][LR]L_*.fits'
        match: (.*p.*\d[LR])L(_.*\.fits)
        replace:
            - \1U\2

cycles through all the files matching ``value`` in the ``target_dir``, e.g.
"p2LL_sci.fits", and for each of them creates a new file name replacing the last
"L" with "U", e.g. "p2LU_sci.fits". The two file names are then returned.

To create multiple file names out of the first one, it's enough to provide other
entries to ``replace``. E.g.:

.. code-block:: yaml

    replace: [\1U\2, \1A\2, \2_\1]

will create three new file names: "p2LU_sci.fits", "p2LA_sci.fits" and
"_sci.fits_p2L".

As with the :ref:`plain primary keyword `, it's possible to interpret the
``value`` as a regex by providing the optional key ``is_regex``. All the
keywords recognised by the :ref:`keyword_regex` secondary keyword are also
supported.

``all_files``
^^^^^^^^^^^^^

This primary type has the same interface as the :ref:`plain primary keyword `.
The behaviour is however different: while the ``plain`` primary keyword returns
an iterator (or list) of file names or strings, ``all_files`` returns a list
containing a single string of space separated file names or, when using the
``returns`` option, values.

The following configuration collects all the files matching ``value`` as
explained in :ref:`primary_plain` and returns a list with a single element:

.. code-block:: yaml

    keyword: &all_files
        type: all_files
        value: 20*.fits

If e.g. there are four files matching the pattern, the type returns something
like::

    ['/path/to/20180219T071318.8_073LL_sci.fits /path/to/20180219T071318.8_073LU_sci.fits /path/to/20180219T072418.2_106RL_sci.fits /path/to/20180219T072418.2_106RU_sci.fits']

For comparison, the ``plain`` primary type would return::

    ['/path/to/20180219T071318.8_073LL_sci.fits',
     '/path/to/20180219T071318.8_073LU_sci.fits',
     '/path/to/20180219T072418.2_106RL_sci.fits',
     '/path/to/20180219T072418.2_106RU_sci.fits']

The ``is_regex`` and ``returns`` options are interpreted as described in
:ref:`primary_plain`

.. warning::
   The :ref:`filter_selected` option is used to select which of the elements
   returned by the primary key are to be used.
   It is not used to filter substrings of the elements returned by the primary
   key. So using ``filter_selected`` with ``all_files`` might lead to
   unexpected results and we suggest avoiding this combination.

.. _keyword_types:

Built-in keyword types
----------------------

``plain``
^^^^^^^^^

A static string. These three definitions are equivalent:

.. code-block:: yaml

    keyword: '-a -b --long option'
    ---
    keyword:
        type: plain
        value: '-a -b --long option'
    ---
    keyword: {type: plain, value: '-a -b --long option'}

.. _keyword_regex:

``regex``
^^^^^^^^^

Returns a string obtained from the primary by replacing ``match`` with
``replace``. It uses :func:`re.subn` to do the substitution.

If e.g. the primary is ``file_001_LL.fits file_001_RL.fits``, the following
entry returns ``L001``

.. code-block:: yaml

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1

If the substitution fails because of a regex mismatch or because more than one
substitution is performed, a
:class:`~vdat.command_interpreter.exceptions.CIKeywordError` is raised.

It is possible to declare the expected number of substitutions or to disable
the check altogether via the optional ``n_subs`` key:

* if not present, it defaults to one if ``do_split`` is ``True``, or to the
  number of input primary files otherwise;
* negative number: the check is disabled;
* positive integer: exactly ``n_subs`` substitutions must be performed. E.g::

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        n_subs: 2

  will fail because it requires **two** substitutions;
* list of integers: the number of substitutions must be one of the list
  entries::

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        n_subs: [1,2]

  will accept either one or two substitutions;
* string: interpreted as a slice ``[start]:[stop][:step]`` or a comma separated
  list of ``[start],[stop][,step]``.
  The string is used to initialize a
  :class:`~vdat.command_interpreter.utils.SliceLike` instance and then to check
  if the number of substitutions is within the allowed range as defined in the
  class documentation. E.g. the following will succeed::

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        n_subs: 1:10:2

  but using ``n_subs: 0:10:2`` will raise an error.

Finally the ``do_split`` optional key instructs the function whether to split
the primary on white spaces or not. E.g.::

    keyword:
        type: regex
        match: \S*?_(\d{3})_([LR]).*?\.fits
        replace: \2\1
        do_split: False

will return ``L001 R001`` from the files ``Sfile_001_L.fits Sfile_001_R.fits``
as a single string. If not provided, ``do_split`` defaults to ``True``.

Examples and more information about the python regex syntax can be found `in the
official python documentation `_

``header``
^^^^^^^^^^

Extract and manipulate the fits header keyword named in ``value`` from the
primary files. If the optional keyword ``do_split`` is ``True`` (the default) it
splits the primary on white spaces and gets ``value`` only from the first file.
Otherwise ``value`` is extracted from every file, converted to a string and
concatenated with white spaces.

Assuming that the primary consists of two files containing
``BIASSEC = [1:32,1:1032]`` in the header, the following instruction:

.. code-block:: yaml

    keyword: &header
        type: header
        value: BIASSEC

will return ``[1:32,1:1032]``.

By default the value of the header keyword is cast to a string. However
sometimes it is desirable or necessary to format it, e.g. padding an integer
with zeros. Via the ``formatter`` key, it is possible to format the header
keyword value according to the standard python `format string syntax `_. E.g. it
is possible to convert the integer header keyword ``IFUSLOT`` (``42``) to a
zero-padded three digit string (``042``) with the following definition:

.. code-block:: yaml

    keyword:
        type: header
        value: IFUSLOT
        formatter: '{:03d}'  # or '{0:03d}'

.. warning::
    the ``:`` or ``0:`` part is mandatory, otherwise a ``KeyError`` will be
    raised. If the formatting code is wrong for the type, a ``ValueError`` is
    raised with a message similar to "Unknown format code 'd' for object of
    type 'str'"

It is also possible to manipulate the return value using the
:ref:`keyword_regex` secondary keyword. To do this, add an ``extract`` keyword,
whose value is a two element list containing the regex pattern to match and the
desired return value, which can reference the matched regex groups. E.g.:

.. code-block:: yaml

    keyword:
        <<: *header
        extract:
            - \[(.*?)\]
            - \1

will return ``1:32,1:1032`` as ``\1`` will return the first regex group, i.e.
whatever is matched within the round brackets of the pattern.

If the above instructions contained ``do_split: False``, the return values would
have been ``[1:32,1:1032] [1:32,1:1032]`` and ``1:32,1:1032 1:32,1:1032``
respectively.

``format``
^^^^^^^^^^

Creates a new string `formatting `_ ``value`` using the ``keys``. They can be of
any secondary type known to VDAT at loading time, except ``format`` to avoid
circular recursion.

Assuming you have a fits file called ``file_001_LL.fits``, with a header keyword
``DATE-OBS = 2013-01-01``, the following configuration instructs the
interpreter to extract the ``id`` key, a three digit number, from the file name
and the ``date`` key from the ``DATE-OBS`` fits header value.

.. code-block:: yaml

    keyword:
        type: format
        value: file_{id}_{date}.fits
        keys:
            id:
                type: regex
                match: .*_(\d{3}).*\.fits
                replace: \1
            date:
                type: header
                value: DATE-OBS

The resulting value is the string ``file_001_2013-01-01.fits``.

If the types for the keys do not exist, a ``CIKeywordTypeError`` will be raised
at run time. If one of the keys has a string as value, it will be interpreted as
of type ``plain``.

As in the previous cases, if ``do_split`` is present and ``False``, the
formatting is applied to all the elements in the primary and a concatenated
string of white-space separated results is returned.
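The way a ``format`` key combines other keys can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the ``VDAT``
code: ``regex_key`` and ``format_key`` are hypothetical helpers, a plain
dictionary (``fake_header``) stands in for the fits header, and the template
uses ``{id}`` and ``{date}`` placeholders matching the names of the two keys.

.. code-block:: python

    import re


    def regex_key(primary, match, replace):
        """Apply one regex substitution with :func:`re.subn`."""
        result, n_subs = re.subn(match, replace, primary)
        if n_subs != 1:
            raise ValueError("expected one substitution, got %d" % n_subs)
        return result


    def format_key(primary, value, keys):
        """Resolve every entry of ``keys`` and fill them into ``value``."""
        resolved = {name: resolve(primary) for name, resolve in keys.items()}
        return value.format(**resolved)


    # Stand-in for the header of ``file_001_LL.fits`` (assumption)
    fake_header = {"DATE-OBS": "2013-01-01"}

    result = format_key(
        "file_001_LL.fits",
        "file_{id}_{date}.fits",
        {
            "id": lambda name: regex_key(name, r".*_(\d{3}).*\.fits", r"\1"),
            "date": lambda name: fake_header["DATE-OBS"],
        },
    )

    # The ``formatter`` option follows the same format string syntax
    padded = "{:03d}".format(42)

Here ``result`` is ``file_001_2013-01-01.fits`` and ``padded`` is ``042``.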
``fplane_map``
^^^^^^^^^^^^^^

This type maps one type of ID to another using the fplane file. The following
code shows all the mandatory keys; their explanation can be found below.

.. code-block:: yaml

    keyword: &fplane_map
        type: fplane_map
        fplane_file: /path/to/fplane.txt
        in_id:
            type: regex
            match: '.*?/dither_(\d{3})\.txt'
            replace: \1
        in_id_type: ifuslot
        out_id_type: ifuid

where:

* ``fplane_file`` points to the fplane file
* ``in_id`` can be any of the available keyword types and is used to extract the
  ID from the primaries.
* ``in_id_type`` is the type of ID returned by ``in_id`` and can be any of the
  values supported by :meth:`pyhetdex.het.fplane.FPlane.by_id`.
* ``out_id_type`` is the type of ID to return and can be any of the ones
  supported by :class:`pyhetdex.het.fplane.IFU`.

If the primary is ``/path/to/dither_073.txt`` and the fplane file contains the
following IFU::

    # IFUSLOT X_FP   Y_FP   SPECID SPECSLOT IFUID IFUROT PLATESC
    073       150.0  150.0  04     136      023   0.0    1.0

the above configuration returns the value ``'023'``

Similarly to the ``header`` keyword, by default the id is cast to a string. The
``formatter`` keyword can be used to format the output id. In the following
example:

.. code-block:: yaml

    keyword:
        <<: *fplane_map
        out_id_type: specid
        formatter: '{:03d}'  # or '{0:03d}'

the return value is ``'004'``. Without the ``formatter`` keyword the output
would be ``'4'``.

As in the previous cases, if ``do_split`` is present and ``False``, the ids are
extracted from all the primaries and converted; the resulting IDs are
concatenated.

For information about the fplane parser, follow `this link `_.

.. _execute_types:

Built-in execute types
----------------------

``new_file``
^^^^^^^^^^^^

Following the instructions provided, this type builds a string and checks
whether the file referenced by that string exists on the filesystem.
The only mandatory option is ``value``: it is used to build the string from the
primary and can be any of the available keyword types. E.g. given a primary like
``/path/to/123T456_001LL_sci.fits``, the following instruction will create the
string ``/path/to/masterbias_001LL.fits`` and check if it exists.

.. code-block:: yaml

    execute: &execute
        type: new_file
        value:
            type: regex
            match: (.*?)/\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits
            replace: \1/masterbias_\2.fits

In some cases it might not be possible to build the path of the output file
directly from the primary files. In this case you can add the ``path`` keyword
to the type definition; its return value is joined with the name of the file
returned by ``value``. ``path`` can be either of the following:

* any of the available keyword types: the path is then constructed in the same
  way as ``value``. E.g.:

  .. code-block:: yaml

        execute:
            <<: *execute
            path: /other/path

  .. **

  will check that the file ``/other/path/masterbias_001LL.fits`` exists
* a string like ``$key``: this will get the value of path from the ``key``
  keyword from the *command configuration*. The following behaves in the same
  way as the above example:

  .. code-block:: yaml

        other_dir: /other/path
        execute:
            <<: *execute
            path: $other_dir

  .. **

.. _plugin_types:

Add new types
=============

Every type, be it primary or not, has a corresponding function that implements
how to handle it. All the types are implemented as plugins, `discovered `_ and
`dynamically loaded `_ at run time.
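As a rough illustration of what dynamic loading involves, the following Python
sketch resolves a ``package.module:func`` string into a callable and stores it
in a registry keyed by type name. It is a simplified stand-in: in practice the
packaging machinery performs the discovery, the helper ``load_callable`` and the
``registry`` dictionary are hypothetical, and :func:`os.path.basename` is used
only as a convenient function that is always importable.

.. code-block:: python

    import importlib


    def load_callable(spec):
        """Import and return the object named by ``package.module:func``."""
        module_name, _, func_name = spec.partition(":")
        module = importlib.import_module(module_name)
        return getattr(module, func_name)


    # Hypothetical registry mapping a type name to the function handling it
    registry = {"basename": load_callable("os.path:basename")}

    handler = registry["basename"]
    name = handler("/path/to/file.fits")

Looking up the handler by type name and calling it is then all that is needed to
process a key of that type.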
The command interpreter looks for three entry points:

* ``vdat.cit.primary``: for the definition of primary types
* ``vdat.cit.keyword``: for the definition of other types
* ``vdat.cit.execute``: for the definition of types that decide whether or not
  to execute the command

Each entry point is defined as a string, like::

    type = package.module:func

where ``type`` is the name of the type and ``func`` is the function handling the
keyword of ``type``; ``func`` is implemented in the ``module`` module of the
package ``package``.

The functions implementing primary, secondary and execute keywords have the
following signatures:

.. autofunction:: vdat.command_interpreter.types.primary_template
   :noindex:

.. autofunction:: vdat.command_interpreter.types.keyword_template
   :noindex:

.. autofunction:: vdat.command_interpreter.types.execute_template
   :noindex:

The ``run`` method
------------------

Invoking

.. automethod:: vdat.command_interpreter.CommandInterpreter.run
   :noindex:

will:

1) collect all the ``primary`` files
2) filter them according to the list of selected items
3) loop over the ``primary`` files
4) check whether the step must be executed or not
5) for each step in the loop replace the relevant ``fields`` in the input
   command according to the instructions from the configuration
6) execute the command
7) report execution progress
8) collect and send out execution results

The execution of each step in the loop is done using the worker-based interface
provided by :mod:`pyhetdex.tools.processes`. Within the command interpreter only
the worker named ``command_interpreter`` is used. Multiprocessing is enabled if
the ``multiprocessing`` keyword argument is given.

Communication
==============

The command interpreter communicates with the rest of the world through
different channels.

* Upon errors directly handled by the interpreter, one of the errors defined in
  :mod:`vdat.command_interpreter.exceptions` is raised. Most of those errors are
  notified to the user via pop-up windows.
  Check the documentation of
  :class:`~vdat.command_interpreter.CommandInterpreter` for more details.
* During normal execution of the command, the resolved command string, standard
  output, standard error and any exception raised while executing the code are
  logged to a logger with the name of the executable. In ``VDAT``, these loggers
  are set to write to files located in the directory defined in the ``VDAT``
  configuration file; the names of those files are the executable name with a
  ``.log`` extension. These loggers are set in the main ``VDAT`` code, not in
  the command interpreter sub-package.

  .. warning::
        in a future release the logging will also be performed via the signal
        mechanism

* .. automodule:: vdat.command_interpreter.signals
     :noindex:

  The list of available signal names can be retrieved with
  :func:`vdat.command_interpreter.signals.get_signal_names`, while the signals
  can be accessed with :func:`vdat.command_interpreter.signals.get_signal`.
  Callbacks can be connected to or disconnected from each signal using the
  methods :meth:`~vdat.command_interpreter.signals.BaseCISignal.connect` or
  :meth:`~vdat.command_interpreter.signals.BaseCISignal.disconnect`.
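For code that uses the command interpreter outside of ``VDAT``, the
per-executable logging described above could be set up along the following
lines. This is a minimal sketch using only the standard :mod:`logging` module:
the helper ``make_command_logger``, the message format and the temporary
directory are assumptions made for the example, not the configuration used by
``VDAT``.

.. code-block:: python

    import logging
    import os
    import tempfile


    def make_command_logger(executable, log_dir):
        """Attach a file handler writing to ``<log_dir>/<executable>.log``."""
        logger = logging.getLogger(executable)
        logger.setLevel(logging.INFO)
        log_file = os.path.join(log_dir, executable + ".log")
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
        return logger, handler


    # Temporary directory standing in for the configured log directory
    log_dir = tempfile.mkdtemp()
    logger, handler = make_command_logger("subtractfits", log_dir)
    logger.info("subtractfits -s -a -o 1:32,1:1032 0001.fits")

    # Detach and close the handler so the file is released
    logger.removeHandler(handler)
    handler.close()

    with open(os.path.join(log_dir, "subtractfits.log")) as log:
        logged = log.read()

Because the logger is named after the executable, any message sent to a logger
called ``subtractfits`` ends up in ``subtractfits.log``.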