Please send comments, questions, and bug reports about the online calibration with MRTCAL to gildas@iram.fr.

========================================================================
===              Online data processing with Mrtcal                  ===
===                 $Date: 2017/02/20 17:25:56 $                     ===
========================================================================

The MRTCAL Online Data Processing is composed of two parts: 1) the operator
interface (odp.sh script), and 2) the MRTCAL background process that
calibrates any new IMBFITS file that appears in a specific directory of the
file system. odp.sh can be launched only once for a long time. The MRTCAL
background process should be restarted regularly, e.g., at the start of a
new project.

* Interface to the operator: Standard uses
  - The odp.sh script is an interface that allows the operator to interact
    with the MRTCAL process that runs the online calibration.
  - To start this interface script, you must:
      + log in as mrt@mrt-lx3
      + execute the odp.sh script, i.e.,
           shell-prompt> /vis/mrt/mrtcal-dir/pico/online/odp.sh
  - The odp.sh script gives you by default the current status of the MRTCAL
    Online Data Processing. There are two standard use cases:

      1. The most common case is that the MRTCAL OdP process is already
         running. There are two possibilities:

           + MRTCAL is processing one scan observed recently. You will get:

             ----------------------------------------------------------------
             Running mrtcal session(s):
               PID %CPU %MEM   RSS S  STARTED     TIME COMMAND
             22974 72.0  1.3 324548 R 12:32:05 00:00:01 mrtcal @ pipeline-odp.cal /simu /log file:./logs/20170214-123159.log

             Status:
              - PROCESSING project t06-16 09-FEB-2017 WILMA
             Inputs found in
              - IMB-FITS      : ./imbfits
             Outputs stored in
              - Index         : ./index and ./index/t06-16
              - Class *.30m   : ./mrtcal
              - TAPAS XML     : ./VOXML (or ./VOXML/YYYYMMDD/scans/NN/)
              - Plots and html: ./plots and ./oplots
             Log file is      : ./logs/20170214-123159.log

             What do you want to do?
              0) Let mrtcal running and leave the current script
              2) Soft kill (let mrtcal finish current calibrations if any, this may
             take a while)
              3) Soft kill and restart
              4) Advanced options
              *) Any other key: print the status again
             ?>
             ----------------------------------------------------------------

           + MRTCAL is idle, waiting for new data:

             ----------------------------------------------------------------
             Running mrtcal session(s):
               PID %CPU %MEM   RSS S  STARTED     TIME COMMAND
             22974 39.6  2.4 600928 S 12:32:05 00:00:13 mrtcal @ pipeline-odp.cal /simu /log file:./logs/20170214-123159.log

             Status:
              - WAITING for new data
             Inputs found in
              - IMB-FITS      : ./imbfits
             Outputs stored in
              - Index         : ./index and ./index/t06-16
              - Class *.30m   : ./mrtcal
              - TAPAS XML     : ./VOXML (or ./VOXML/YYYYMMDD/scans/NN/)
              - Plots and html: ./plots and ./oplots
             Log file is      : ./logs/20170214-123159.log

             What do you want to do?
              0) Let mrtcal running and leave the current script
              2) Soft kill (let mrtcal finish current calibrations if any, this may
             take a while)
              3) Soft kill and restart
              4) Advanced options
              *) Any other key: print the status again
             ?>
             ----------------------------------------------------------------        

         In both cases, the interface is divided into a status section and
         an operator interaction section. In the status section, you get:
           + All the information about the current MRTCAL processes. Note
             that %CPU is not zero even when the process states that it is
             "WAITING for new data" because it is the average CPU usage
             since the start of the process, not the current CPU load.
           + Where the input IMBFITS files are searched for and where the
             output calibration products (PNG plot file, HTML web page, XML
             TAPAS files) are stored.
           + The exact name of the current log file, which is written in the
             /vis/mrt/mrtcal-dir/logs/ directory. A convenient way to
             follow what the background MRTCAL process is doing is to type
             at the prompt of another XTERM:
                shell-prompt> tail -f /vis/mrt/mrtcal-dir/logs/20170214-123159.log
             The last lines of the log file will be automatically updated
             on the screen as the log file is written on disk.
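
         If the exact log name is not at hand, the newest log can be picked
         from a directory listing. The sketch below assumes that all log
         names follow the YYYYMMDD-HHMMSS.log pattern of the example above,
         so that sorting by name is the same as sorting by date. The
         function name newest_log is hypothetical (it is not part of odp.sh
         or MRTCAL).

```python
import re

# Assumed naming pattern, inferred from the example 20170214-123159.log.
LOG_NAME = re.compile(r"^\d{8}-\d{6}\.log$")


def newest_log(names):
    """Return the most recent log name in `names`, or None if there is none.

    Names that do not match the assumed pattern are ignored. With
    fixed-width timestamps, the alphabetical maximum is the newest file.
    """
    candidates = [name for name in names if LOG_NAME.match(name)]
    return max(candidates) if candidates else None
```

         For instance, after "import os", the call
         newest_log(os.listdir("/vis/mrt/mrtcal-dir/logs")) gives the file
         name to pass to tail -f.
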
         The sole difference between the two cases above is the current
         calibration status of the MRTCAL background process:
         "PROCESSING" or "WAITING".

         In the operator interaction section, you are offered several
         choices:
           + Choice 0 will leave the odp.sh interface script without
             modifying the status of the background MRTCAL process. For
             instance, it will leave it alive if it already runs.
           + Choice 2 will gently stop the background MRTCAL process after
             the current scan calibration. In this case, the MRTCAL
             background process finishes processing the current scan and
             exits, and the odp.sh script also exits. The operator then gets
             the standard shell prompt.
           + Choice 3 starts by gently stopping the background MRTCAL
             process (as in choice 2) and then automatically restarts a new
             background MRTCAL process. This choice is the recommended way
             to restart the MRTCAL background process. We recommend doing
             this at the start of each project to ensure that memory
             consumption does not drift with time (in standard operation,
             this should not happen as there should not be any memory leak)
             and that the state of MRTCAL is reset regularly.
           + Choice 4 will display another menu of actions that can be used
             in some rare but useful cases (see below).
           + Any other typed key will update the status section with
             current information.
         Note that the MRTCAL background process will process any new
         IMBFITS files acquired in the last 24h that appeared since the
         last MRTCAL background process exited. This allows us to process
         scans that arrive late on disk (e.g., long OTF scans), independent
         of their project number.

      2. For some reason, the MRTCAL background process is not running
         because it was stopped when there were no observations (e.g., in
         case of bad weather or maintenance period). The operator will then
         get a slightly different operator interaction section:

         ----------------------------------------------------------------
         No mrtcal process running.

         What do you want to do?
          0) Do nothing and leave the current script
          1) Start mrtcal (calibrate old pending data and future new data)
          4) Advanced options
          *) Any other key: print the status again
         ?>
         ----------------------------------------------------------------

         To start properly, the MRTCAL background process has to index all
         the IMBFITS files acquired in the last 24 to 48 hours. As there
         were no observations during this time that were not processed by a
         previous MRTCAL background process, there will be no new IMBFITS
         files to index and calibrate, and a standard start of the MRTCAL
         background process (choice 1) is the right thing to do.

 * Interface to the operator: Exceptional uses
   - There are a few exceptional situations that require different ways of
     starting/stopping the MRTCAL background process.

      1. The MRTCAL background process did not run for a large fraction of
         the last 24 to 48 hours while there were observations at the
         30m. This should only happen the first time ever the MRTCAL
         background process is launched (i.e., at the time of the swap from
         MIRA to MRTCAL). Here is nevertheless the solution for such
         exceptional situations.
           + Choice 1 is the wrong solution because the MRTCAL background
             process always tries to calibrate all the available IMBFITS
             files that are not yet calibrated. In this particular case,
             this would mean calibrating between 24 and 48 hours of
             observations. It would typically take a few hours before
             currently acquired data are processed!
           + As this is an exceptional situation, you should use choice 4,
             which leads to the following menu:

             ----------------------------------------------------------------
             Advanced options are:
             5) Start mrtcal (***SKIP*** old pending data and calibrate future new data)
             *) Any other key: back to standard options
             ?>
             ----------------------------------------------------------------

             You should use choice 5, which starts the MRTCAL background
             process in a special mode where all the IMBFITS files acquired
             in the last 24 to 48 hours are marked as SKIPPED so that
             MRTCAL will not try to calibrate them. In other words, you
             skip pending data and just calibrate newly acquired
             data. SKIPPED files can still be calibrated manually offline
             as required.

      2. For some reason, the operator suspects that the MRTCAL background
         process has run wild and a soft kill is not enough to stop it. This
         should never happen, but here is the solution in this case. Choice 4
         (advanced options) will bring you to the following menu:

             ----------------------------------------------------------------
             Advanced options are:
             6) Hard kill (same as kill -9, faster but leaves uncalibrated data or failed calibrations)
             *) Any other key: back to standard options
             ?>
             ----------------------------------------------------------------

         Choice 6 will kill the MRTCAL background process whether it is
         working or idle. If MRTCAL is calibrating a scan when choice 6 is
         selected, this will interrupt the processing and the writing of the
         results in the CLASS file. The processing of this scan will
         normally not be restarted when the MRTCAL background process is
         restarted, but the calibration products for this scan will be in
         an ill-defined state. This choice should thus be used with extreme
         caution.

* Additional operational information
  - Only one MRTCAL background process must run at a time to avoid messing
    up output CLASS files. That's why the odp.sh script always offers to
    stop the current MRTCAL process before running another one. Note that
    observers can safely run their own version of MRTCAL because they have
    no right to write into the observationdata directory.
  - When the background MRTCAL process is running, you can disconnect from
    the terminal used to launch the odp.sh script without consequence.
  - The yearly online version of GILDAS is used by default (feb17
    automatically selected by gag_feb17 as of 2017-02-14).
  - The script can be started from any directory. It will automatically
    move to /vis/mrt/mrtcal-dir/ and work from there.
  - The calibration products (PNG plot file, HTML web page, XML TAPAS
    files) are first produced in subdirectories of this directory before
    being copied into the project account. CLASS files are directly written
    into the observationdata/mrtcal directory of the project account.
  - Once an IMBFITS file has been successfully calibrated, the MRTCAL
    background process will *never* try again to process the corresponding
    IMBFITS file.
  - To ensure the correct behavior of the MRTCAL background process, no
    manual intervention should be made on the MRTCAL index files, whose
    extension is .mrt.

* Troubleshooting:
  - MRTCAL will stop running in case of
      + catastrophic error (e.g. segmentation fault),
      + standard error (since MRTCAL is executed in batch mode, an error
        will not return to a prompt, but will abruptly stop the program
        instead).
    In this case, you should check the log file for a clue about what
    happened. The IMBFITS file which was being processed, if any, will be
    set as FAILED in the index and will not be recalibrated when MRTCAL is
    restarted. This avoids getting stuck in a vicious circle. It should be
    safe to restart MRTCAL through odp.sh.
  - We will provide in the future the possibility to reprocess cleanly
    a subset of IMBFITS files (with selection criteria) from scratch to
    deal with exceptional cases.
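
  The rule that a FAILED file is never retried can be pictured with the
  minimal sketch below. It is not the MRTCAL implementation: the in-memory
  dictionary, the function name calibrate_pending, the DONE status, the
  sample file names, and the idea of marking a file as FAILED before
  processing it are all assumptions made for the illustration. Only the
  NONE and FAILED status names come from this page.

```python
def calibrate_pending(index, calibrate):
    """Calibrate every file whose status is NONE, in index order.

    `index` maps an IMBFITS file name to its status (illustration only,
    this is not the format of the real .mrt index files).
    `calibrate` is any callable taking a file name; it may raise.
    """
    pending = [name for name, status in index.items() if status == "NONE"]
    for name in pending:
        # Assumption: mark the file as FAILED first, so that a crash in
        # the middle of the calibration leaves it flagged in the index.
        index[name] = "FAILED"
        calibrate(name)          # an error here stops the whole session
        index[name] = "DONE"     # hypothetical status for a success


def crash_on_b(name):
    """Stand-in calibration that fails on one (made-up) file name."""
    if name == "b.fits":
        raise RuntimeError("calibration error on " + name)


index = {"a.fits": "NONE", "b.fits": "NONE", "c.fits": "NONE"}
try:
    calibrate_pending(index, crash_on_b)    # first session stops on b.fits
except RuntimeError:
    pass
retried = []
calibrate_pending(index, retried.append)    # restart: only c.fits is left
print(index, retried)
```

  In this toy run, the restarted session never touches b.fits again, which
  is the behavior described above for a file set as FAILED.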

***************************************************************************

Behind the curtains:
--------------------

* At the beginning of the session:
  - MRTCAL will look in the main IMB-FITS directory:
     /ncsServer/mrt/ncs/data/imbfits/het/
  - It will update (or create if missing) one index of ALL files (all
    projects) per day, for yesterday and today (only). The files can be
    found under the ./index/ subdirectory, and are named following the
    format YYYYMMDD.mrt
  - If you have chosen to skip the old uncalibrated files, they are
    marked as SKIPPED in the MRTCAL index file.
  - As soon as the update of the index is finished, i.e., new IMBFITS files
    are indexed, MRTCAL calibrates the files with a NONE calibration status
    (i.e., not yet calibrated), for yesterday and today.
  - Once this is done, MRTCAL enters a watching loop (see below).
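
  The start-up bookkeeping described above can be sketched as follows. The
  function names, the dictionary used as index, and the sample file names
  are hypothetical; the YYYYMMDD.mrt naming and the NONE and SKIPPED
  statuses are those given in this section.

```python
from datetime import date, timedelta


def index_names(today):
    """Names of the two daily index files, for yesterday and today."""
    yesterday = today - timedelta(days=1)
    return [day.strftime("%Y%m%d") + ".mrt" for day in (yesterday, today)]


def skip_pending(index):
    """Return a copy of `index` where every NONE status becomes SKIPPED.

    `index` maps a file name to a status; it only stands in for the
    content of a real index file.
    """
    return {name: "SKIPPED" if status == "NONE" else status
            for name, status in index.items()}


print(index_names(date(2017, 2, 14)))
print(skip_pending({"old-1.fits": "NONE", "old-2.fits": "FAILED"}))
```

  Returning a copy rather than modifying the index in place is only a
  choice made to keep the example easy to check.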

* During the session:
  - MRTCAL watches for the appearance of new IMBFITS files in the master
    directory (where they appear at once thanks to a hard link, i.e., this
    should avoid "inconsistent" file issues).
  - When a new IMBFITS file appears in this directory, the index is updated.
  - Once the index is updated, MRTCAL calibrates all the IMBFITS files with
    a NONE calibration status (i.e. not yet calibrated), for any project,
    for yesterday and today. This can be useful in case IMBFITS files are
    produced offline by A. Sievers. The output products are put into the
    account of the project whose ID is stored in the IMBFITS file.
  - Once they are calibrated, MRTCAL re-enters the watching loop.
    If other files appeared in the meantime, the index is updated and the
    new IMBFITS files are processed, and so on.
  - When nothing is to be done, MRTCAL gently waits for new data.
  - All the reduction procedures take special care of midnight. In
    particular, the procedures support the following conditions:
     + The day has changed but the IMBFITS file that appeared is from the
       day before (this is why we always check whether new data from
       yesterday are to be calibrated; this is a minor cost anyway).
     + The scan was started the day before but it provides data integrated
       on the next day (note for developers: it could be an issue if
       relying on the Class header instead of the IMB-FITS date).
     + The day changes during the processing of new files, i.e., keywords
       YESTERDAY and TODAY are resolved at the correct place so that they
       do not change during the calibration, which would mess up the
       processing.
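
  One pass of the watching loop, with the midnight precaution, can be
  sketched as below. This is an illustration under stated assumptions, not
  the content of pipeline-odp.cal: the function names, the dictionary keyed
  by (day, file name), and the DONE status are hypothetical. The point
  shown is that YESTERDAY and TODAY are derived from a single clock reading
  at the start of the pass, so they cannot change while files are being
  calibrated. The real process repeats such passes and waits for new data
  in between.

```python
from datetime import date, datetime, timedelta


def resolve_days(now):
    """Freeze YESTERDAY and TODAY from one single clock reading."""
    today = now.date()
    return today - timedelta(days=1), today


def one_pass(index, now, calibrate):
    """Calibrate the NONE entries of yesterday and today, then return.

    `index` maps (day, file name) to a status; `calibrate` is any
    callable taking a file name. Older days are left untouched.
    """
    yesterday, today = resolve_days(now)
    for (day, name), status in list(index.items()):
        if day in (yesterday, today) and status == "NONE":
            calibrate(name)
            index[(day, name)] = "DONE"   # hypothetical success status
    return yesterday, today


# Made-up entries: a pass that starts two seconds before midnight.
index = {
    (date(2017, 2, 12), "too-old.fits"): "NONE",
    (date(2017, 2, 13), "late.fits"): "NONE",
    (date(2017, 2, 14), "new.fits"): "NONE",
}
calibrated = []
days = one_pass(index, datetime(2017, 2, 14, 23, 59, 58), calibrated.append)
print(days, calibrated)
```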

* Calibration products:
  - The project id (e.g. t06-16) (named "<project-id>" below) is
    found in the IMB-FITS file, so that MRTCAL knows where the
    products should go, even if several projects are mixed during the
    same reduction process.
  - Class files are written in:
      /mrt-lx3/vis/<project-id>/observationData/MRTCAL/
    One file per day and per backend (and obviously per project) is
    produced. The files are created if missing. After creation, the new
    spectra of the day are simply appended to them.
  - VOXML files are written in:
      /ncsServer/mrt/ncs/data/YYYYMMDD/scans/NN/
    (where YYYYMMDD is the date and NN the scan number)
  - OdP plots and html pages are written in:
      /mrt-lx3/vis/<project-id>/observationData/MRTCAL/plots/
    There is also a second directory where they are duplicated:
      /ncsServer/mrt/ncs/monitor/plots/MRTCAL
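
  The layout above can be summarized by a small helper that builds the
  output locations from a project id, a date, and a scan number. The
  function name and the dictionary keys are hypothetical, and the scan
  number is inserted as given because this page does not say how NN is
  formatted.

```python
def product_paths(project_id, yyyymmdd, scan):
    """Return the output locations listed above for one scan."""
    base = "/mrt-lx3/vis/{}/observationData/MRTCAL/".format(project_id)
    return {
        "class": base,
        "voxml": "/ncsServer/mrt/ncs/data/{}/scans/{}/".format(yyyymmdd, scan),
        "plots": base + "plots/",
        "plots_copy": "/ncsServer/mrt/ncs/monitor/plots/MRTCAL",
    }


for kind, path in product_paths("t06-16", "20170209", 12).items():
    print(kind, path)
```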

* Settings:
  - MRTCAL and procedure settings, directories, and so on can be found in
    the procedure gildas/packages/MRTCAL/pro/pipeline-odp.cal of the GILDAS
    version in use. Each change must be committed in the correct CVS branch,
    compiled and reinstalled (same as Fortran code).
  - There is no plan for an external setting file, i.e., everything should
    be version-controlled for clarity.

* Watching what MRTCAL OdP is doing:
  - If you invoke again the script odp.sh while an MRTCAL reduction
    is active, the current status of this process will be displayed.
  - Basically there are 2 possible statuses:
     1/ Waiting for new data (idle)
     2/ Processing a project (details will be given)

* Stopping the reduction script:
  - As the MRTCAL process runs an infinite loop in batch mode, ending the
    MRTCAL process has to be done in a specific way. The script odp.sh
    offers 2 possibilities:
      + Soft kill: This option activates a trigger which tells the
        current session to stop "when possible".
          1) If MRTCAL is idle, it will stop within the next second.
          2) If MRTCAL is calibrating data files, it will first finish
             the whole remaining queue before stopping. This may take a
             while, and you won't be able to start another MRTCAL process
             before the previous one has stopped as concurrent MRTCAL
             sessions are not authorized/supported in this context.
      + Hard kill: It will just send a SIGKILL signal (kill -9) to the
        process. This should be instantaneous BUT this has undesired
        consequences:
         1) This may leave uncalibrated files.
         2) This may leave the currently calibrated file as FAILED, and
            it will not be recalibrated the next time the calibration
            pipeline is resumed.
         3) This may leave the output files (index file, Class file, or
            others) in a bad state (corruption...) if the process is
            killed at a critical moment.
        You should avoid this option as much as possible.
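
  The soft kill behavior can be pictured with the toy model below. It is a
  sketch under assumptions, not the MRTCAL mechanism: the class Session,
  its attributes, and the scan names are hypothetical, and the way the real
  trigger is implemented is not described on this page. The model only
  reproduces the two cases listed above: an idle session stops at once,
  and a busy session first empties its queue.

```python
from collections import deque


class Session:
    """Toy stand-in for the background process (not the real MRTCAL)."""

    def __init__(self, queue):
        self.queue = deque(queue)
        self.stop_requested = False    # the "trigger" set by a soft kill
        self.done = []

    def soft_kill(self):
        """Ask the session to stop when possible."""
        self.stop_requested = True

    def step(self):
        """Run one loop iteration; return False once the session stops."""
        if self.queue:
            # Pending work always goes first, even after a soft kill.
            self.done.append(self.queue.popleft())
            return True
        # The queue is empty: the trigger is only honored here.
        return not self.stop_requested


busy = Session(["scan-1", "scan-2"])
busy.soft_kill()
while busy.step():
    pass
print(busy.done)
```

  A hard kill has no counterpart in such a model: SIGKILL cannot be caught
  or handled by the target process, so no clean-up code of any kind runs,
  which is why output files may be left in an inconsistent state.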

MrtCal (last edited 2017-02-21 05:38:42 by ManuelRuiz)