wiki:DC3bPT1ProductionOperatorGuide

PT1 Production Operator's Guide Rev 0.01

This document provides a cookbook procedure enabling the non-expert to run the orca-enabled PT1 pipelines from ISR through SFM, followed by use of the SimpleStageTester SourceAssoc pipeline.

The operator should refer to RunningOrcaDC3b for detailed information on the orca scheduler operation and use.

Quick Start

$ mkdir myRun; cd myRun

$ PREFIX=cfht
# or
$ PREFIX=imsim

$ setup datarel
$ cp $DATAREL_DIR/bin/runOrca/${PREFIX}-setupForOrcaUse.sh ./setup.sh
$ source ./setup.sh

$ cp -r $DATAREL_DIR/pipeline .
# and edit any needed policies in ./pipeline
# or
$ ln -s $DATAREL_DIR/pipeline .
# if only standard policies are to be used.

$ cp ./pipeline/${PREFIX}-orca.paf ./orca.paf
# Edit Orca policy as needed.
# Generate list of amps into input.txt (see below).

$ RUNID=raa2010053101
$ orca.py -r ./pipeline -e ./setup.sh -V 10 -P 10 ./orca.paf $RUNID
# Wait for pipelines to start... then (perhaps in another window):
$ announceDataset.py -r $RUNID -b lsst8.ncsa.uiuc.edu -t RawAvailable input.txt
# When all done...
$ shutprod.py 1 $RUNID

Directory Structure and Files

When a policy file is specific to a dataset, the filename will be prefaced with the dataset type, e.g. cfht-orca.paf.

  • datarel/pipeline/
    • set of pipeline orchestration policy files - <dataset>-orca.paf
      • cfht-orca.paf
      • imsim-orca.paf
      • platform/ - defines network configuration and root directory for all I/O deployment
        • abecluster.paf
        • lsstcluster_lustre.paf
        • cfht-lsstcluster-nfs.paf
        • imsim-lsstcluster-nfs.paf
    • set of production pipeline job office policy files - <dataset>-joboffice.paf
      • cfht-joboffice.paf
      • imsim-joboffice.paf
      • individual pipelines' job office files are in each pipeline's policy directory
    • set of pipeline management policy files - <dataset>-<pipeline>-master.paf
      • cfht-isr-master.paf
      • cfht-ca-master.paf
      • cfht-crSplit-master.paf
      • cfht-imgChar-master.paf
      • cfht-sfm-master.paf
      • cfht-sa-master.paf
      • imsim-isr-master.paf
      • imsim-ca-master.paf
      • imsim-crSplit-master.paf
      • imsim-imgChar-master.paf
      • imsim-sfm-master.paf
      • imsim-sa-master.paf
    • pipeline policy subdirectories
  • datarel/bin/runOrca
    • scripts to build input data list
      • buildIsrSkyTileInput.sh
      • SkyTileCcds.py
      • VisitCcds.py
    • scripts to setup orca run
      • cfht-setupForOrcaUse.sh
      • imsim-setupForOrcaUse.sh
      • deployData.sh
      • cfht-joboffice.sh
      • imsim-joboffice.sh
    • convenience scripts to build commands to initiate run
      • cfht-buildRunCommands.sh
      • imsim-buildRunCommands.sh

Sample Pipeline Policy Set

The policies specified in the datarel/pipeline/<pipeline> subdirectories are invoked in the order specified in the <pipeline> management file, datarel/pipeline/<dataset>-<pipeline>-master.paf.

For example, the following files are listed in the ISR master policy file, datarel/pipeline/imsim-isr-master.paf (a quick way to inspect the ordering is sketched after the list):

  • ISR job office policies
    • cfht-isr-joboffice.paf
    • imsim-isr-joboffice.paf
  • ISR pipeline stage policies
    • 000-getajob.paf
    • 010-imsim-input.paf (or for cfht: 010-cfht-input.paf)
    • 040-saturation.paf
    • 050-overscan.paf
    • 060-bias.paf
    • 070-dark.paf
    • 080-flat.paf
    • 090-fringe.paf
    • 100-imsim-exposureOutput.paf (or for cfht: 100-cfht-exposureOutput.paf)
    • 110-sdqa.paf
    • 120-sdqaOutput.paf
    • 130-jobDone.paf
    • 140-failure.paf
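
Because the master file both names these stage policies and fixes their execution order, a quick way to review the order is to pull the .paf references straight out of the master file. This is a minimal sketch, assuming the stage file names appear one per line in the master policy:

$ setup datarel
$ grep -n '\.paf' $DATAREL_DIR/pipeline/imsim-isr-master.paf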

Production Run Setup

Tuning Network Orchestration

Refer to RunningOrcaDC3b for detailed information on network options.

  • In datarel/pipeline/cfht-orca.paf or datarel/pipeline/imsim-orca.paf,
    • modify repositoryDirectory to specify the full pathname to the datarel policy files;
    • modify workflow.platform to specify network configuration policy file
  • In datarel/pipeline/platform/cfht-lsstcluster-nfs.paf or datarel/pipeline/platform/imsim-lsstcluster-nfs.paf
    • modify dir.defaultRoot to define the directory root where the job's datasets and management files are located
    • modify deploy.nodes to reconfigure the pipeline allocation across cluster nodes.
  • In datarel/bin/runOrca/cfht-joboffice.sh or datarel/bin/runOrca/imsim-joboffice.sh
    • modify defaultRoot to match the directory specified in platform/<datatype>-lsstcluster-nfs.paf (a quick consistency check is sketched below).
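
Since defaultRoot must agree between the platform policy and the job office setup script, a quick consistency check can catch a misconfigured run before it starts. A minimal sketch, assuming the standard datarel layout described above:

$ grep -n defaultRoot \
    $DATAREL_DIR/pipeline/platform/cfht-lsstcluster-nfs.paf \
    $DATAREL_DIR/bin/runOrca/cfht-joboffice.sh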

Tuning Pipeline Management

  • In datarel/pipeline/cfht-joboffice.paf and datarel/pipeline/imsim-joboffice.paf,
    • modify framework.environment to specify a script that sets up the system environment for pipeline execution
    • modify framework.exec to specify the Job Office script that starts each pipeline's Job Office.

Tuning Dataset Selection

Provide a complete set of Amp (or Channel) data for at least one CCD so that processing can transition through the full chain of PT1 pipelines.

Building Input Data Lists

Generally, production runs should be set up to process SkyTile groups instead of Visit groups, since the final pipeline, Source Association, collects and operates on all Sources within a given Tile over all Visits.

SkyTile Coverage

  • Use: datarel/bin/runOrca/SkyTileCcds.py to create a single user-selected SkyTile input list.
% ./SkyTileCcds.py

Usage: ./SkyTileCcds.py  option  SKYTILE
where
   SKYTILE - identifies region whose overlapping CCDs are selected
Example: ./SkyTileCcds.py --cfht 100477
Example: ./SkyTileCcds.py --imsim 93687
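
To produce the input.txt referenced in the Quick Start from a single SkyTile, capture the script's list to a file. This is a minimal sketch that assumes SkyTileCcds.py writes the amp list to standard output (if it writes its own output file instead, use that file as input.txt):

$ ./SkyTileCcds.py --cfht 100477 > input.txt
$ wc -l input.txt     # rough sanity check on the number of entries selected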



  • In order to build the entire set of per-SkyTile input lists at once, use the script: datarel/bin/runOrca/buildIsrSkyTileInput.sh. One file per SkyTile will be written using the naming convention: ./<dataset type>-isr-skyTileInput-<Skytile#>.txt
% ./buildIsrSkyTileInput.sh 

Use: buildIsrSkyTileInput.sh -c | -i 
where 
 -c indicates cfht data
 -i indicates ImSim data
Example: buildIsrInput.sh -c
Example: buildIsrInput.sh -i
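
For example, to generate the full CFHT set and check how many entries each per-SkyTile list contains (one entry per line is assumed; file names follow the convention described above):

$ ./buildIsrSkyTileInput.sh -c
$ wc -l cfht-isr-skyTileInput-*.txt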

Visit Coverage

  • Use: datarel/bin/runOrca/VisitCcds.py to create a single user-selected Visit input list.
% ./VisitCcds.py 

Usage: VisitCcds.py  option  VISIT
where
   option - one of '--cfht' or '--imsim' to designate dataset type
   VISIT - identifies the visit selected
Example: VisitCcds.py --cfht 793310
Example: VisitCcds.py --imsim 85471048
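
The per-visit list used in the run examples later in this guide (V793310-input.txt) can be generated this way; the sketch below assumes VisitCcds.py writes the list to standard output:

$ ./VisitCcds.py --cfht 793310 > V793310-input.txt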

Production Run

Environment Setup

Set up the LSST stack, based on the dataset to be processed, in two terminal sessions. One session will be used to initiate the pipeline managers; the other will be used for in-progress status checks.

export LSST_HOME=/lsst/DC3/stacks/default
source ${LSST_HOME}/loadLSST.sh
setup datarel
cd $DATAREL_DIR/bin/runOrca
source cfht-setupForOrcaUse.sh     <<<<< choose 'cfht-' or 'imsim-'

Build Run Commands

The convenience script <dataset>-buildRunCommands.sh generates the orca.py command, the announceDataset.py command, and the shutprod.py termination command to be used for the production run.

$ ./cfht-buildRunCommands.sh 

cfht-buildRunCommands.sh <runid> <inputlist>
where
   runid : unique ID for run
   inputlist : full pathname to list of visit/ccd/amp to process.

Example: cfht-buildRunCommands.sh raa20100521_01 /tmp/TestCfhtInput.txt

$ ./cfht-buildRunCommands.sh raa20100526_00 $DATAREL_DIR/bin/runOrca/V793310-input.txt


moving into: /lsst/home/rallsman/test/datarel
============================

     cd /lsst/home/rallsman/test/datarel/pipeline; \
     orca.py -r /lsst/home/rallsman/test/datarel/pipeline \
         -e /lsst/home/rallsman/test/datarel/bin/runOrca/cfht-setupForOrcaUse.sh \
         -V 10 -P 10 cfht-orca.paf raa20100526_00 
============================

     announceDataset.py -r raa20100526_00 -b lsst8.ncsa.uiuc.edu \
         -t RawAvailable /lsst/home/rallsman/test/datarel/bin/runOrca/V793310-input.txt
===========================
     shutprod.py 1 raa20100526_00
===========================

Starting Pipeline Processing

Start Job Office Managers for all Pipelines

  • Use the first command displayed by '<dataset>-buildRunCommands.sh'.

Decide whether you want informational messages displayed in the terminal session, captured in a file, or discarded, then add the appropriate ending to the command (e.g. run in the background with '&', capture output with '> /tmp/MyOrca.log 2>&1').
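
For example, to run the command shown below in the background while capturing its messages to a log file (same paths and run ID as in the examples below):

$ cd /lsst/home/rallsman/test/datarel/pipeline; \
orca.py -r /lsst/home/rallsman/test/datarel/pipeline \
-e /lsst/home/rallsman/test/datarel/bin/runOrca/cfht-setupForOrcaUse.sh \
-V 10 -P 10 cfht-orca.paf raa20100526_00 > /tmp/MyOrca.log 2>&1 &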

$ cd /lsst/home/rallsman/test/datarel/pipeline; \
orca.py -r /lsst/home/rallsman/test/datarel/pipeline \
-e /lsst/home/rallsman/test/datarel/bin/runOrca/cfht-setupForOrcaUse.sh \
-V 10 -P 10 cfht-orca.paf raa20100526_00 

Sample console output (captured here from an ImSim run, raa20100526_03):

/bin/runOrca/imsim-setupForOrcaUse.sh -V 10 -P 10 imsim-orca.paf raa20100526_03 
          orca DEBUG: pipelinePolicyFile = imsim-orca.paf
          orca DEBUG: runId = raa20100526_03
          orca.manager DEBUG: Running production: raa20100526_03
          orca.manager DEBUG: ProductionRunManager:createConfigurator
          orca.manager DEBUG: ProductionRunConfigurator:__init__
          orca.manager.config DEBUG: ProductionRunConfigurator:configure
self._prodPolicyFile =  /lsst/home/rallsman/test/datarel/pipeline/imsim-orca.paf
          orca.manager.config DEBUG: ProductionRunConfigurator:createDatabaseConfigurator

  ..... lots more, then ending with following ....

>>>  ['/home/rallsman/test/datarel/bin/runOrca/deployData.sh', '/lsst/DC3/data/datarel-test/ImSim/raa20100526_03', '/lsst/DC3/data/obstest', 'ImSim']
          orca.manager.config.workflow DEBUG: GenericPipelineWorkflowLauncher:__init__
          orca.manager.launch DEBUG: StatusListener:__init__
          orca.manager.config.workflow DEBUG: WorkflowManager:runWorkflow
          orca.manager.config.workflow DEBUG: WorkflowManager:isRunnable
          orca.manager.config.workflow DEBUG: WorkflowManager:isDone
          orca.manager.config.workflow DEBUG: GenericPipelineWorkflowLauncher:launch
          orca.manager.config.workflow.monitor DEBUG: GenericPipelineWorkflowMonitor:__init__
          orca.manager.config.workflow.monitor DEBUG: WorkflowMonitor:addStatusListener
          orca.manager.config.workflow.monitor DEBUG: GenericPipelineWorkflowMonitor Thread started
          orca.manager DEBUG: listening for shutdown event at 0.2 s intervals
                    orca.manager DEBUG: checking for shutdown event
          orca.manager DEBUG: self._timeout = 10
  • Wait until the first pipeline in the sequence is set up and ready to receive input events. When the first pipeline's work/<pipeline>_1/launch.log reports the message: "DEBUG: Told JobOffice, I'm ready!", you may send the announceDataset.py command generated by '<dataset>-buildRunCommands.sh'. (A simple way to poll for this message is sketched after the log excerpt below.)
$ cd /lsst/DC3/data/datarel/CFHTLS/raa20100526_00/work/isr_1

$ tail launch.log
          harness.slice.threadBarrier DEBUG: Slice 0 done waiting; signaling back 1274887893.622478
          harness.slice.threadBarrier DEBUG: Slice 0 sent signal back. Exit threadBarrier  1274887893.622921
   harness.slice.visit.stage.tryProcess DEBUG: Starting tryProcess
   harness.slice.visit.stage.tryProcess DEBUG: Getting process signal from Pipeline
 harness.slice.visit.stage.process DEBUG: Starting process
  harness.slice DEBUG: Told JobOffice, I'm ready!
          harness.pipeline.threadBarrier DEBUG: Done waiting for signal from Slice 0 1274887893.722551
          harness.pipeline.threadBarrier DEBUG: Entry time 1274887893.723013
          harness.pipeline.threadBarrier DEBUG: Signal to Slice  0 1274887893.723343
          harness.pipeline.threadBarrier DEBUG: Wait for signal from Slice 0
[rallsman@lsst5 isr_1]$ 
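
Rather than re-running tail by hand, you can poll launch.log for the readiness message with standard shell tools. A minimal sketch, using the same run directory as above:

$ cd /lsst/DC3/data/datarel/CFHTLS/raa20100526_00/work/isr_1
$ until grep -q "Told JobOffice, I'm ready!" launch.log; do sleep 5; done; echo "isr_1 ready"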

Provide Input Events to First Pipeline

$ announceDataset.py -r raa20100526_00 -b lsst8.ncsa.uiuc.edu -t RawAvailable \
     /lsst/home/rallsman/test/datarel/bin/runOrca/V793310-input.txt

announceDataset: sending event for raw-amp0-ccd0-filterr-visit793310
announceDataset: sending event for raw-amp1-ccd0-filterr-visit793310
announceDataset: sending event for raw-amp0-ccd1-filterr-visit793310
announceDataset: sending event for raw-amp1-ccd1-filterr-visit793310
announceDataset: sending event for raw-amp0-ccd2-filterr-visit793310
announceDataset: sending event for raw-amp1-ccd2-filterr-visit793310
 ....

$ tail launch.log
  harness.pipeline.visit.stage DEBUG: Starting 060-bias loop
   harness.pipeline.visit.stage.handleEvents DEBUG: Starting handleEvents
          harness.pipeline.visit.stage.handleEvents DEBUG: No event to handle
   harness.pipeline.visit.stage.handleEvents DEBUG: Ending handleEvents
          harness.pipeline.threadBarrier DEBUG: Entry time 1274888015.121552
          harness.pipeline.threadBarrier DEBUG: Signal to Slice  0 1274888015.121885
          harness.pipeline.threadBarrier DEBUG: Wait for signal from Slice 0
          harness.slice.threadBarrier DEBUG: Slice 0 done waiting; signaling back 1274888015.122594
          harness.slice.threadBarrier DEBUG: Slice 0 sent signal back. Exit threadBarrier  1274888015.123078
          harness.slice.threadBarrier DEBUG: Slice 0 waiting for signal from Pipeline 1274888015.123551
$ 

Watch Pipeline Progress

Watch the output products directories get populated.

$ cd  /lsst/DC3/data/datarel/CFHTLS/raa20100526_00/input

$ ls 
calib  D1  D2  D3  D4  postISR  postISRCCD  registry.sqlite3  visitim


$ ls -R visitim
visitim:
v793310-fr

visitim/v793310-fr:
c00.fits  c01.fits
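
To follow the products as they appear without repeatedly typing ls, the standard watch utility (if available on the node) can refresh the listing periodically; a minimal sketch using the same run directory:

$ watch -n 30 'ls -R /lsst/DC3/data/datarel/CFHTLS/raa20100526_00/input/visitim'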

Check for Processing Errors

Scan the pipelines' Slice0.log files under the run's work directory for 'FATAL' and 'WARNING' messages.

$ cd /lsst/DC3/data/datarel/<dataset>/<runID>/work/

$ grep "FATAL\|WARN" *_1/Slice0.log


$ cd /lsst/DC3/data/datarel/<dataset>/<runId>/input 
$ ls
calexp  D1  D3  postISR     psf               src
calib   D2  D4  postISRCCD  registry.sqlite3  visitim
$ 

Cleanly End Pipeline Managers

The shutprod.py code sends shutdown events to all pipeline and joboffice servers on all systems operating on behalf of the specified runId.

Confirm that the final pipeline in the processing sequence has produced all of its output products. You may need to check the final pipeline's Slice0.log to see whether abnormal terminations prevented generation of the derived output products; earlier pipelines' logs may need the same check.

$ cd /lsst/DC3/data/datarel/CFHTLS/raa20100526_01/input 

$ ls
calexp  D1  D3  postISR     psf               src
calib   D2  D4  postISRCCD  registry.sqlite3  visitim

$ ls -R calexp psf  #<<<<< final ImgChar products

$ ls -R src         #<<<< final SFM product

$ cd /lsst/DC3/data/datarel/CFHTLS/raa20100526_01/work
$ grep FATAL *_1/Slice0.log

$ shutprod.py 1 raa20100526_01

Continuing through SourceAssoc

The Simple Stage Tester (SST) for Source Association will acquire the SFM's boost output and process it.

Modify SourceAssoc SST

Change the definitions for root and outRoot to specify where the SFM output products are located. Based on the examples we've used elsewhere in this document, the change would be:

rootDir = "/lsst/DC3/data/datarel/CFHTLS/raa20100526_01/input"
outRootDir = "/lsst/DC3/data/datarel/CFHTLS/raa20100526_01/input"
sourceAssocProcess(root=rootDir, outRoot=outRootDir)

Run Source Assoc SST

cd datarel/bin/sst
./SourceAssoc_ImSim.py 

Convert SourceAssoc Boost Output to CSV

Serge provided the following procedure:

Date: Tue, 18 May 2010 14:10:28 -0700
From: "Serge Monkewitz" <smm@ipac.caltech.edu>
Subject: [LSST-data] PT1: converting boost persisted Sources to CSV


At the stand-up today, I was asked how to convert boost persisted sources
to CSV (and load them into mysql).

To convert from boost persisted sources to CSV:

     python boostPt1Source2CSV.py input_sources.boost output_sources.csv

where boostPt1Source2CSV.py is in the bin directory of ap/trunk.

To load the CSV into a mysql table (assuming you are using the lsst10
mysql instance):

     CREATE TABLE MySources LIKE test_source_assoc.SourceTemplate;
     LOAD DATA INFILE '/path/to/output_sources.csv' INTO TABLE MySources
FIELDS TERMINATED BY ',';

Using our example, again:

cd /lsst/DC3/data/datarel/CFHTLS/raa20100526_01/input/results/<skytile>/
python $AP_DIR/bin/boostPt1Source2CSV.py <source>.boost <source>.csv

where param 1 points directly to the boost file to convert
      param 2 points wherever you want the resulting CSV written
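
To convert every SkyTile's boost output in one pass, a simple shell loop over the results tree can be used. This is a minimal sketch that assumes each .boost file sits directly under its <skytile> directory (layout as in the example above):

$ cd /lsst/DC3/data/datarel/CFHTLS/raa20100526_01/input/results
$ for f in */*.boost; do
      python $AP_DIR/bin/boostPt1Source2CSV.py "$f" "${f%.boost}.csv"
  done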