- Data Archiving Overview
- Datastream directories
- NCS Scan directories
- IMBFits directory
- Project directories
- Pool Projects
- Data recovery procedures
This page is moderated by WB. Please note:
- This page is not yet complete, but I release it now for revision. Feel free to update.
- Some links are not yet linking to existing pages.
- Last essential modification 2008-03-24 by WB.
Unless stated otherwise, "NCS data" means observation data. The NCS of course produces more data than that, but always writing "NCS observation data" is a bit clumsy.
Please note: While working on this document, I (wb) realized that we do not use the terms archive and backup correctly. Archive should mean that something is stored permanently, and backup should mean that we create a copy. Some names of scripts, etc. are therefore incorrect and confusing. I hope to clarify this while working on this document.
This wiki page describes the directory structure used by the NCS. Details of the file structure are given in specific pages. The directories are distributed over several computers, but are normally accessible via NFS from all user computers as well as from specific processors used for telescope control.
- Datastream directories
- NCS scan directories
- IMBFits directories
- Project scan directories
- Project directory to store results of online dataprocessing
Data is produced in the NCS in:
Datastreams: Processes in the NCS can produce datastreams. Datastreams are data records that are created either at a fixed rate or created upon a specific event. Examples:
- antennaMountDrive produces the records of its datastream at a fixed rate, recording current antenna positions, tracking errors, etc. See FileOrganizationDataStreamsAntMD.
- backend drives produce records of their datastreams when the switching state changes.
Scan Directories: The telescope control software (coordinator,...) creates a directory, called a scan directory, for each observation scan. A scan directory contains, among other things:
- the observer specification for the scan,
- the telescope state at the beginning and at the end of the scan, and
- FITS files in the [IMBFIts FileOrganizationIMBFits] format for each backend used.
A copy of a scan directory is also put into the project directory structure. We introduced this to facilitate the management of NCS files on the one hand and of projects on the other.
IMBFits directories: IMBFits files are collected also in specific directories to facilitate dataprocessing.
Online Dataprocessing: Online dataprocessing for heterodyne receivers creates a CLASS file in a project directory.
Data Archiving Overview
The data of a scan is kept in the NcsScanDirectory, e.g. /ncsServer/mrt/ncs/data/20080324/scans/1/ and at the end of the scan a copy is put into the NcsProject, e.g. /vis/100-07/observationData/20080324/1.
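The two example paths above follow a regular pattern. A minimal sketch of that mapping, assuming the roots /ncsServer/mrt/ncs/data and /vis from the examples; the function names are hypothetical and not part of the NCS:

```python
from pathlib import PurePosixPath

# Roots taken from the example paths in the text.
NCS_DATA_ROOT = PurePosixPath("/ncsServer/mrt/ncs/data")
PROJECT_ROOT = PurePosixPath("/vis")


def scan_directory(date: str, scan: int) -> PurePosixPath:
    """Path of a scan directory in the NcsScanDirectory (date as YYYYMMDD)."""
    return NCS_DATA_ROOT / date / "scans" / str(scan)


def project_copy(project: str, date: str, scan: int) -> PurePosixPath:
    """Path of the copy that is put into the project at the end of the scan."""
    return PROJECT_ROOT / project / "observationData" / date / str(scan)


if __name__ == "__main__":
    print(scan_directory("20080324", 1))
    print(project_copy("100-07", "20080324", 1))
```

Note that the project copy has no scans level: the scan number follows the date directly.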
The data produced for an observation is archived three times:
- All directories of one day in the NcsScanDirectory are copied as a tar file to a disk, NcsScanDirectoryArchiveDisk. We keep about one year of data on disk. After that, the tar files are moved to a tape, NcsScanDirectoryArchiveTape. Note: the tar files also contain the datastreams of that day, but not the NcsIMBFits files, as these can be reproduced from the datastream files and the scan specification.
- Directories in the NcsScanDirectory older than two weeks are written to tape during maintenance and then removed from the NcsScanDirectory, including datastreams and NcsIMBFits files.
- The observer normally gets a copy of the project directory at the end of an observing session, and a copy is kept in the ProjectArchive.
Datastream directories
Several subsystems of the NCS can produce so-called [NcsDatastreams datastreams]. The datastreams are written into directories under /mrt-lx1/ncsServer/mrt/ncs/data/datastreams/. Currently the following datastreams are produced:
mrt@mrt-lx1:datastreams>pwd
/mrt-lx1/ncsServer/mrt/ncs/data/datastreams
mrt@mrt-lx1:datastreams>ls -1F
100khz/
1MHz/
1mhz -> 1MHz/
4mhz/
abba/
antmd/
continuum/
fts/
FTS -> fts/
hera/
secondary/
vespa/
Vespa -> vespa/
wilma/
Wilma -> wilma/
mrt@mrt-lx1:datastreams>
Note: The above directory listing is not literal
Example of a datastream directory:
mrt@mrt-lx1:datastreams>ls -l 1MHz/ | tail -3
-rw-r--r-- 1 root root 5760 2007-01-10 16:38 iram30m-1mhz-20070110t163848.fits
-rw-r--r-- 1 root root 5760 2007-01-10 16:39 iram30m-1mhz-20070110t163949.fits
-rw-r--r-- 1 root root 5760 2007-01-10 16:40 iram30m-1mhz-20070110t164049.fits
Most datastream files are in FITS format. The filename has the format iram30m-<datastream>-<dateOfCreation>.fits. Files in other formats might be present for test purposes.
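The file name format can be taken apart mechanically. The following is a sketch only: it assumes the creation stamp always looks like YYYYMMDDtHHMMSS, as in the listing above, and the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumption: the stamp is <YYYYMMDD>t<HHMMSS>, as seen in the example listing.
_DATASTREAM_FILE = re.compile(
    r"^iram30m-(?P<stream>[A-Za-z0-9]+)-(?P<stamp>\d{8}t\d{6})\.fits$"
)


class DatastreamFile(NamedTuple):
    datastream: str
    created: datetime


def parse_datastream_filename(name: str) -> Optional[DatastreamFile]:
    """Return datastream name and creation time, or None for other files."""
    match = _DATASTREAM_FILE.match(name)
    if match is None:
        return None  # e.g. a test file in another format
    created = datetime.strptime(match.group("stamp"), "%Y%m%dt%H%M%S")
    return DatastreamFile(match.group("stream"), created)


if __name__ == "__main__":
    print(parse_datastream_filename("iram30m-1mhz-20070110t163848.fits"))
```

Returning None for non-matching names keeps files of other formats out of any further processing.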
Moving datastream files to day directory
Every 6 hours via cron, datastream files older than one day are moved to a day directory /ncsServer/mrt/ncs/data/<date>/datastreams/<datastream>. This is done to avoid directories with too many files. Example:
du /ncsServer/mrt/ncs/data/20070108/datastreams/
11540   /ncsServer/mrt/ncs/data/20070108/datastreams/100khz
140832  /ncsServer/mrt/ncs/data/20070108/datastreams/4mhz
552916  /ncsServer/mrt/ncs/data/20070108/datastreams/antmd
88428   /ncsServer/mrt/ncs/data/20070108/datastreams/continuum
23112   /ncsServer/mrt/ncs/data/20070108/datastreams/secondary
149644  /ncsServer/mrt/ncs/data/20070108/datastreams/Vespa
185224  /ncsServer/mrt/ncs/data/20070108/datastreams/1mhz
4496    /ncsServer/mrt/ncs/data/20070108/datastreams/hera
1156196 /ncsServer/mrt/ncs/data/20070108/datastreams/
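To illustrate the move, here is a sketch of how a target path in the day directory could be derived. It is not the actual cron script: it assumes the age is judged from the time stamp in the file name (the real job may well use the file modification time), and the function name and the `subdir` parameter are hypothetical.

```python
import re
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import Optional

DATA_ROOT = PurePosixPath("/ncsServer/mrt/ncs/data")
_STAMP = re.compile(r"-(\d{8})t(\d{6})\.fits$")


def day_directory_target(
    filename: str, subdir: str, now: datetime
) -> Optional[PurePosixPath]:
    """Return where a datastream file should go, or None if it stays put.

    subdir is the datastream directory name to use inside the day
    directory (for example "1mhz" or "Vespa" in the listing above).
    """
    match = _STAMP.search(filename)
    if match is None:
        return None  # not a datastream FITS file; leave it alone
    created = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    if now - created <= timedelta(days=1):
        return None  # not yet older than one day
    return DATA_ROOT / match.group(1) / "datastreams" / subdir / filename


if __name__ == "__main__":
    name = "iram30m-1mhz-20070110t163848.fits"
    print(day_directory_target(name, "1mhz", datetime(2007, 1, 12)))
```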
Archiving day directories
Every day at 07:00 via cron, day directories (/ncsServer/mrt/ncs/data/<date>/...) older than 2 days are archived on /mrt-lx1/ltmpDataStreams/archives as a gzipped tar file. By then they also contain the datastreams of that day, as explained above.
Note: the name ltmpDataStreams is misleading; the archives also contain the NCS scan directories.
/mrt-lx1/ltmpDataStreams is a USB disk and we have three of these disks for the scan directory and datastream archive.
- Do we have listings of the disk contents?
- With the disks we use now, we have about 6 months of day directories on disk. Although the directories are in gzipped tar format, recovering them is easy. Shall we extend this period by using more disks? We will also have them on a backup tape, but access is more cumbersome, and once the day directory disks are overwritten only one copy of the datastreams (the backup on tape) is left. See below "Recovery scenarios".
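As an illustration of why recovery from the gzipped tar archives is easy, the sketch below packs a day directory and then extracts a single scan directory from it, using Python's standard tarfile module. The archive name <date>.tar.gz, the layout inside the archive, and both function names are assumptions made for this example; the real archive script may differ.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_day_directory(day_dir: Path, archive_dir: Path) -> Path:
    """Pack one day directory into <archive_dir>/<date>.tar.gz (assumed name)."""
    archive = archive_dir / f"{day_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(day_dir, arcname=day_dir.name)
    return archive


def recover_scan(archive: Path, date: str, scan: int, dest: Path) -> Path:
    """Extract only <date>/scans/<scan> from the archive into dest."""
    prefix = f"{date}/scans/{scan}"
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        wanted = [
            member
            for member in tar.getmembers()
            if member.name == prefix or member.name.startswith(prefix + "/")
        ]
        tar.extractall(dest, members=wanted)
    return dest / date / "scans" / str(scan)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scan_dir = root / "data" / "20070110" / "scans" / "299"
        scan_dir.mkdir(parents=True)
        (scan_dir / "scanoverview.txt").write_text("demo\n")
        (root / "archives").mkdir()
        archive = archive_day_directory(root / "data" / "20070110", root / "archives")
        restored = recover_scan(archive, "20070110", 299, root / "restored")
        print(sorted(p.name for p in restored.iterdir()))
```

Selecting members by path prefix avoids unpacking a whole day when only one scan is needed.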
Moving day directories to tape
- Every maintenance day, day directories (which contain the scan and datastream directories) are moved to tape if they are older than 14 days. These tapes are kept forever.
Note: whereas we have copies of the scan directories on the project disk (/mrt-lx3/vis), the only additional copy we have of the datastreams is the archive on /mrt-lx1/ltmpDataStreams. This means that if we lose a backup tape by accident, we might not be able to recreate the imbfits files after about 6 months.
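The age thresholds for day directories described in this section can be summarized in a small sketch. It assumes that "older than N days" means an age of strictly more than N whole days; the function name and the stage labels are hypothetical.

```python
from datetime import date

# Thresholds taken from the text; the strict ">" comparison is an assumption.
ARCHIVE_TO_DISK_AFTER_DAYS = 2
MOVE_TO_TAPE_AFTER_DAYS = 14


def day_directory_stage(day: str, today: date) -> str:
    """Return the furthest stage a day directory (YYYYMMDD) qualifies for.

    "live":         still only in the NCS data directory
    "disk-archive": due to be archived as gzipped tar by the daily cron job
    "tape":         due to be moved to tape on the next maintenance day
    """
    created = date(int(day[:4]), int(day[4:6]), int(day[6:8]))
    age_days = (today - created).days
    if age_days > MOVE_TO_TAPE_AFTER_DAYS:
        return "tape"
    if age_days > ARCHIVE_TO_DISK_AFTER_DAYS:
        return "disk-archive"
    return "live"


if __name__ == "__main__":
    for today in (date(2007, 1, 11), date(2007, 1, 13), date(2007, 1, 25)):
        print(today, day_directory_stage("20070110", today))
```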
NCS Scan directories
The NCS creates a directory and several subdirectories per scan. The root directory is called the scan directory. All directories of one day are put into directory: /mrt-lx1/ncsServer/mrt/ncs/data/<date>/scans. Example:
mrt@mrt-lx1:scans>pwd
/mrt-lx1/ncsServer/mrt/ncs/data/20070110/scans
mrt@mrt-lx1:scans>ls -trl
total 1200
drwxr-xr-x 4 mrt mrt 4096 2007-01-10 00:05 1
drwxr-xr-x 4 mrt mrt 4096 2007-01-10 00:08 2
drwxr-xr-x 4 mrt mrt 4096 2007-01-10 00:09 3
...
drwxr-xr-x 4 mrt mrt 4096 2007-01-10 16:15 298
drwxr-xr-x 4 mrt mrt 4096 2007-01-10 16:21 300
drwxr-xr-x 4 mrt mrt 4096 2007-01-10 16:21 299
mrt@mrt-lx1:scans>
An example listing of a scan directory is:
mrt@mrt-lx1:scans>ls -ltr 299/*
-rw-r--r-- 1 mrt mrt    18679 2007-01-10 16:15 299/iram30m-scan-20070110s299.xml
-rw-r--r-- 2 mrt mrt  6082560 2007-01-10 16:21 299/iram30m-4mhz-20070110s299-imb.fits
-rw-r--r-- 2 mrt mrt 16208640 2007-01-10 16:21 299/iram30m-wilma-20070110s299-imb.fits

299/debug:
total 4
-rw-r--r-- 1 mrt mrt 2006 2007-01-10 16:21 antmd.commands

299/log:
total 896
-rw-r--r-- 1 mrt mrt  65537 2007-01-10 16:15 iram30m-statebefore-20070110s299.xml
-rw-r--r-- 1 mrt mrt  72842 2007-01-10 16:15 iram30m-statebefore-20070110s299.html
-rw-r--r-- 1 mrt mrt 180290 2007-01-10 16:15 iram30m-statebefore-20070110s299.dbm
-rw-r--r-- 1 mrt mrt  46795 2007-01-10 16:15 iram30m-scan-20070110s299.pickle
-rw-r--r-- 1 mrt mrt  65548 2007-01-10 16:21 iram30m-stateafter-20070110s299.xml
-rw-r--r-- 1 mrt mrt  72853 2007-01-10 16:21 iram30m-stateafter-20070110s299.html
-rw-r--r-- 1 mrt mrt 180359 2007-01-10 16:21 iram30m-stateafter-20070110s299.dbm
-rw-r--r-- 1 mrt mrt    523 2007-01-10 16:21 scanoverview.txt
-rw-r--r-- 1 mrt mrt   6393 2007-01-10 16:21 iram30m-sync-20070110s299.xml
-rw-r--r-- 1 mrt mrt   7153 2007-01-10 16:21 iram30m-sync-20070110s299.html
-rw-r--r-- 1 mrt mrt  17126 2007-01-10 16:21 iram30m-sync-20070110s299.dbm
-rw-r--r-- 1 mrt mrt    683 2007-01-10 16:21 scanoverview.html
-rw-r--r-- 1 mrt mrt   9448 2007-01-10 16:21 logMessages.xml
-rw-r--r-- 1 mrt mrt   4603 2007-01-10 16:21 makeimbfits4mhz.err
-rw-r--r-- 1 mrt mrt  51057 2007-01-10 16:21 makeimbfits4mhz.log
-rw-r--r-- 1 mrt mrt   6259 2007-01-10 16:21 makeimbfitswilma.err
-rw-r--r-- 1 mrt mrt  50214 2007-01-10 16:21 makeimbfitswilma.log
mrt@mrt-lx1:scans>
See [NcsScanDirectories] for details on the files in a scan directory.
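Many file names in the listing above encode the date and scan number as <date>s<scan>. The sketch below decodes them. The pattern is inferred from the example listing only, so treat it as an assumption; the function name is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as iram30m-wilma-20070110s299-imb.fits.
_SCAN_FILE = re.compile(
    r"^iram30m-(?P<kind>[A-Za-z0-9]+)-(?P<date>\d{8})s(?P<scan>\d+)"
    r"(?P<imb>-imb)?\.(?P<ext>[A-Za-z0-9]+)$"
)


class ScanFile(NamedTuple):
    kind: str          # e.g. "scan", "statebefore", or a backend such as "wilma"
    date: str          # YYYYMMDD
    scan: int
    is_imbfits: bool   # True for <...>-imb.fits files
    extension: str


def parse_scan_filename(name: str) -> Optional[ScanFile]:
    """Decode a scan-directory file name, or return None if it does not match."""
    match = _SCAN_FILE.match(name)
    if match is None:
        return None  # e.g. scanoverview.txt or logMessages.xml
    ext = match.group("ext")
    is_imbfits = match.group("imb") is not None and ext == "fits"
    return ScanFile(
        match.group("kind"), match.group("date"), int(match.group("scan")),
        is_imbfits, ext,
    )


if __name__ == "__main__":
    print(parse_scan_filename("iram30m-wilma-20070110s299-imb.fits"))
```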
Archiving and moving scan directories
The scan directories are archived on /mrt-lx1/ltmpDataStreams and moved to a backup tape at the same time as the datastream files. See details above.
However, the scan directories are also duplicated to the project directory at the end of a scan. As the project directories are moved to a CD or DVD and the observer gets another copy, we end up with at least three copies of the scan directories.
IMBFits directory
At the end of scans, and possibly subscans, [FileOrganizationIMBFits IMBFits files] are created. The "imbfits" files are stored in the scan directory (see listing above) and, in addition, via a Unix hard link, in the directories /mrt-lx1/ncsServer/mrt/ncs/data/imbfits/het and .../bol.
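A hard link gives the same file a second name without copying the data, which is why the imbfits files in the scan directory listing show a link count of 2. Hard links only work within one filesystem. The sketch below demonstrates the mechanism in a temporary directory; the function name and the directory names in the demo are made up for this example.

```python
import os
import tempfile
from pathlib import Path


def publish_imbfits(scan_file: Path, collection_dir: Path) -> Path:
    """Hard-link an imbfits file from a scan directory into a collection directory."""
    collection_dir.mkdir(parents=True, exist_ok=True)
    target = collection_dir / scan_file.name
    os.link(scan_file, target)  # second name for the same data, no copy
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scan_dir = root / "scans" / "299"
        scan_dir.mkdir(parents=True)
        original = scan_dir / "iram30m-4mhz-20070110s299-imb.fits"
        original.write_bytes(b"demo")
        linked = publish_imbfits(original, root / "imbfits" / "het")
        # Both names refer to the same inode; the link count is now 2.
        print(os.path.samefile(original, linked), original.stat().st_nlink)
```

Removing one of the two names leaves the data reachable through the other, so cleaning up a scan directory does not by itself delete the collected imbfits file.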
Note: "imbfits" files can also be created offline if the other files of the scan directory and the datastream files covering the subscan periods are available.
Note: We also store all imbfits files of a project in /mrt-lx3/vis/<project>/observationData/imbfits/
Archiving and backup of IMBFits files
IMBFits files are archived like the other project files, on the observer's DVD and on the IRAM DVD.
Note: we currently also keep an archive of imbfits files per month.
Project directories
Each project has its own home directory /mrt-lx3/vis/<project>/.
ls /mrt-lx3/vis/204-06
3mm_10.30m  Desktop1  gopako  mira  mydata           observationData  PaKo  tmp
Desktop     FDveSv    goPako  Mira  observationdata  pako             RCS
mrt@mrt-lx1:het>
During the creation of a project ([CreateProjects]), a directory observationData is created in the project home. The owner of this directory is "mrt", and access protection is set up such that only "mrt" can read and write.
Please note: read access to project files should only be possible from the account mrt and the project account. However, currently the world can also read them. We have to investigate this.
ls -ltr /mrt-lx3/vis/204-06/observationData
total 136
drwxr-xr-x 3 mrt 204-06   4096 2007-01-06 01:06 mira
drwxr-xr-x 3 mrt mrt      4096 2007-01-06 01:06 20070106
...
drwxr-xr-x 3 mrt mrt      4096 2007-01-11 00:04 20070111
drwxr-xr-x 2 mrt mrt    106496 2007-01-11 07:41 imbfits
mrt@mrt-lx1:het>
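The drwxr-xr-x mode in the listing is what makes the directories readable by everybody. A minimal sketch of how the "other" read bit can be tested with Python's standard stat module; the helper name is hypothetical and this is not an existing NCS tool:

```python
import os
import stat
import tempfile


def world_readable(mode: int) -> bool:
    """True if the 'other' read bit is set in a file mode."""
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    # The mode shown in the listing above: directory with rwxr-xr-x.
    listed = stat.S_IFDIR | 0o755
    print(stat.filemode(listed), world_readable(listed))

    # A mode that excludes the world but keeps owner and group access.
    restricted = stat.S_IFDIR | 0o750
    print(stat.filemode(restricted), world_readable(restricted))

    # The same check applied to a real directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(world_readable(os.stat(tmp).st_mode))
```

Whether mode 750 is the right target depends on which group the project account belongs to, which this page does not state.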
In the directory
- observationData we create copies of the "NCS scan directories" at the end of scans,
- mira we put results from [NcsOnlineDataProcessing],
- imbfits we collect all imbfits files of that project to facilitate the use of [NcsMIRA MIRA] (find command?).
Observers can get a copy of the project directories on CD or DVD. They may also transfer the data themselves via FTP. We also encourage observers to bring their own USB disk or memory stick.
Archiving project directories
After the monthly backup, projects that finished at least 2 weeks earlier are removed from disk:
- The project files are put on a CD or DVD that is kept forever at IRAM.
- After this, the project is deleted from disk.
- Some projects can stay longer on disk.
- Pool projects are handled differently. See below.
Results of online dataprocessing
We also store results of online dataprocessing in a project directory. The observer can read those files but not write them. We store:
- spectra, etc. in ... observationData/mira/spectraOdp.30m
- the plots created during online dataprocessing in ... observationData/mira/plots, e.g. ... observationData/mira/plots/lastScan.html
mrt@mrt-lx1:observationData>pwd
/vis/wbtest/observationData
mrt@mrt-lx1:observationData>ls
20010203  20060510  20060531  20060805  20060926  20070103
20051122  20060529  20060620  20060808  20061107  imbfits
20060509  20060530  20060804  20060810  20061214  mira
mrt@mrt-lx1:observationData>ls -l mira
total 1636
drwxr-xr-x 3 mrt wbtest   20480 2007-01-03 16:49 plots
-rw-r--r-- 1 mrt mrt    1649664 2007-01-03 16:49 spectraOdp.30m
- We have the problem that spectraOdp.30m can become very large and, as the owner is mrt, the observer cannot rename the file in order to start a new one. This has to be requested from the operator.
Pool Projects
Here we shall describe the specific organization of pool projects.
Data recovery procedures
Here we shall explain how to create IMBFits files offline.
Here we shall explain how to recalibrate observations.
Here we shall list observed problems.
Performance can be documented here.
The observer shall be able to rename spectraOdp.30m files.