Last modified on 01/11/2011 12:39:06 PM

DC3b PT1.1 Production Run Status

This document lists runs of the DC3b PT1.1 Release Production pipelines.


Production Runs 2000 series ("Dec 17 Run")

RunID Collection # of Visits Started Run On Original Product Directory Comments Backup
pt1prod_imAll ImSim 444 / 444 (currently on the LSST cluster) abe /lsst2/datarel-runs/2000/pt1prod_imAll/ Central root directory of 2000 series
pt1prod_im2002 ImSim 12 Dec 17 2010 abe /lsst2/datarel-runs/2000/pt1prod_im2002/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2003 ImSim 52 Dec 17 2010 abe /lsst2/datarel-runs/2000/pt1prod_im2003/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2004 ImSim 61 Dec 17 2010 abe /lsst4/datarel-runs/2000/pt1prod_im2004/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2005 ImSim 44 Dec 17 2010 abe /lsst4/datarel-runs/2000/pt1prod_im2005/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2006 ImSim 59 Dec 17 2010 abe /lsst2/datarel-runs/2000/pt1prod_im2006/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2007 ImSim 60 Dec 17 2010 abe /lsst4/datarel-runs/2000/pt1prod_im2007/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2008 ImSim 61 Dec 17 2010 abe /lsst4/datarel-runs/2000/pt1prod_im2008/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2009 ImSim 48 Dec 17 2010 abe /lsst4/datarel-runs/2000/pt1prod_im2009/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
pt1prod_im2010 ImSim 47 Dec 17 2010 abe /lsst4/datarel-runs/2000/pt1prod_im2010/ 189 Worker Pipelines on 32 nodes
Transferred calexp/ psf/ src/ icSrc/ icMatch/
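
As a quick consistency check, the per-run visit counts in the table above can be summed and compared with the 444 visits listed for pt1prod_imAll. The sketch below hard-codes the counts copied from the table; the variable names are chosen here for illustration only.

```python
# Visit counts per run, copied from the 2000 series table above.
visits_per_run = {
    "pt1prod_im2002": 12,
    "pt1prod_im2003": 52,
    "pt1prod_im2004": 61,
    "pt1prod_im2005": 44,
    "pt1prod_im2006": 59,
    "pt1prod_im2007": 60,
    "pt1prod_im2008": 61,
    "pt1prod_im2009": 48,
    "pt1prod_im2010": 47,
}

# Total across the nine sub-runs; the pt1prod_imAll row lists 444.
total_visits = sum(visits_per_run.values())
print(total_visits)  # 444
```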



Precursor Runs

RunID Collection # of Visits Started Run On Original Product Directory Comments Backup
pt1prod_im0011 ImSim 1 (v85661762-fr) 13 Oct 2010 abe /lsst2/datarel-runs/pt1prod_im0011/ Non-science quality run (using old App software tags on Abe, see attached file.)
47 Worker Pipelines on 8 nodes process a visit in ~32 minutes.
Some of the output work/ space was corrupted afterwards by an accidental overwrite.
Rerunning this case with some updated middleware.
pt1prod_im0014 ImSim 1 (v85661762-fr) 14 Oct 2010 abe /lsst2/datarel-runs/pt1prod_im0014/ Non-science quality run ; testing new orca 3.7.5
47 Worker Pipelines on 8 nodes; again takes ~32 minutes.
One worker process (_37) appears to have exited early.
One dataset may have been dropped in the process.
pt1prod_im0015 ImSim 1 (v85661762-fr) 19 Oct 2010 abe Non-science quality run ; test reformatted input file to avoid double work
23 Worker Pipelines on 4 nodes;
pt1prod_im0019 ImSim 1 (v85661762-fr) 19 Oct 2010 abe /lsst2/datarel-runs/pt1prod_im0019/ 4.1.x.x Stack on Abe; 23 Worker Pipelines on 4 nodes;
Common Error (for each ccd):
http://dev.lsstcorp.org/trac/wiki/DC3bPT11ProductionRunStatus#pt1prod_im0019Notes
pt1prod_im0022 ImSim 1 (v85661762-fr) 21 Oct 2010 abe /lsst2/datarel-runs/pt1prod_im0022/ 4.1.x.x Stack on Abe; 23 Worker Pipelines on 4 nodes;
ip-pipeline-4.1.0.1 added to the stack on Abe
All CCDs appear to be processed; No reported errors.
pt1prod_imsimshear021 ImSim 1 (v85545806-fr) 23 Oct 2010 LSST /lsst2/datarel-runs-imsim/pt1prod_imsimshear021 First run-through of sheared images. Not with the unified stack.
pt1prod_imsimphottest010 ImSim 5 partial focal planes 25 Oct 2010 LSST /lsst2/datarel-runs-imsim/pt1prod_imsimphottest010 Had to modify my meas_utils trunk since my requested undersamplestyle for background fitting was getting overridden somehow. Using unified pipe. Only partially completed since 011 is faster.
pt1prod_imsimphottest013 ImSim 5 partial focal planes 25 Oct 2010 LSST /lsst2/datarel-runs-imsim/pt1prod_imsimphottest013 Same as imsimphottest010 but using multiple nodes
pt1prod_im0034 ImSim 6 visits [v85661762-fr v85661927-fr v85748120-fg
v85748227-fg v85755500-fi v85755645-fi]
30 Oct 2010 abe /lsst2/datarel-runs/pt1prod_im0034/ 4.1.x.x Stack on Abe;
95 Worker Pipelines on 16 nodes;
All CCDs appear to be processed; No reported errors.
6 visits processed in ~90 minutes execution time on Abe; 10 minutes of startup time.
SourceAssoc failed early on because SFM generated a source with a negative flux error. See ticket #1504 for details; the result is very incomplete SourceAssoc output for this run.
pt1prod_im0089 ImSim 6 visits [v85661762-fr v85661927-fr v85748120-fg
v85748227-fg v85755500-fi v85755645-fi]
20 Nov 2010 abe /lsst2/datarel-runs/pt1prod_im0089/ 189 Worker Pipelines on 32 nodes;
All CCDs appear to be processed; No reported errors.
Software stack on Abe: eups-env-0089.txt
6 visits processed in ~35 minutes execution time on Abe
Level 0 Logging used
http://dev.lsstcorp.org/trac/wiki/DC3bPT11ProductionRunStatus#pt1prod_im0089Notes



Work spaces

LSST runs on the NCSA Abe Cluster generate outputs underneath the base directory /cfs/projects/lsst/DC3/data/datarel-runs/.
Results are transferred to the LSST cluster underneath the shared space /lsst2/datarel-runs/.
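
The directory convention above can be sketched as a small helper that builds both locations for a given RunID. This is a minimal sketch, assuming the run directory is named after the RunID directly under each base directory; that matches the precursor run paths listed on this page for the LSST cluster, but the layout on Abe is an assumption. The function name `run_directories` is hypothetical.

```python
import posixpath

# Base directories quoted in the text above.
ABE_BASE = "/cfs/projects/lsst/DC3/data/datarel-runs/"
LSST_BASE = "/lsst2/datarel-runs/"


def run_directories(run_id):
    """Return (abe_dir, lsst_dir) for a RunID.

    Assumes <base>/<run_id>/ on both systems (hypothetical helper).
    """
    abe_dir = posixpath.join(ABE_BASE, run_id) + "/"
    lsst_dir = posixpath.join(LSST_BASE, run_id) + "/"
    return abe_dir, lsst_dir


abe_dir, lsst_dir = run_directories("pt1prod_im0089")
print(abe_dir)
print(lsst_dir)
```

Note that the 2000 series runs do not follow this simple layout exactly: their product directories sit under a 2000/ subdirectory, and several are on /lsst4 rather than /lsst2.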

pt1prod_im0089 Notes

The pt1prod_im0089 results show that overall system scaling currently degrades as pipeline logging verbosity increases. At level 2 logging we observed no scaling improvement going from [16 nodes, 95 workers] to [32 nodes, 189 workers]. At level 0 logging, pt1prod_im0089 [32 nodes, 189 workers] processes 6 visits in ~35 minutes, while pt1prod_im0088 [16 nodes, 95 workers] processes 6 visits in ~54 minutes (more detailed timing statistics will come from the database log scripts), so we are beginning to achieve proper scaling. We continue to probe the source of the verbosity-related slowdown (broker configuration, file system performance, etc.).
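
To put numbers on the level 0 comparison, the speedup and parallel efficiency can be estimated from the two approximate run times quoted above. This is a rough sketch using the ~54 and ~35 minute figures and the node counts as the measure of resources; the function name `scaling` is introduced here for illustration, and the result should be read as an estimate until the detailed timing statistics are available.

```python
def scaling(t_small, t_large, n_small, n_large):
    """Return (speedup, efficiency) when going from n_small to n_large nodes.

    speedup    = time on the smaller allocation / time on the larger one
    efficiency = speedup / ideal speedup, where ideal = n_large / n_small
    """
    speedup = t_small / t_large
    ideal = n_large / n_small
    return speedup, speedup / ideal


# pt1prod_im0088: 16 nodes, ~54 minutes; pt1prod_im0089: 32 nodes, ~35 minutes.
speedup, efficiency = scaling(54.0, 35.0, 16, 32)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# speedup: 1.54x, efficiency: 77%
```

Doubling the nodes therefore gives roughly a 1.5x speedup at level 0 logging, compared with no improvement at level 2.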

pt1prod_im0019 Notes

The following exception was observed for each CCD:

harness.slice.visit.stage.tryProcess FATAL: Traceback (most recent call last):
  File "/cfs/projects/lsst/DC3/stacks/gcc44/15oct2010/Linux64/pex_harness/4.1.0.0/python/lsst/pex/harness/Slice.py", line 546, in tryProcess
    stageObject.applyProcess()
  File "/cfs/projects/lsst/DC3/stacks/gcc44/15oct2010/Linux64/pex_harness/4.1.0.0/python/lsst/pex/harness/stage.py", line 353, in applyProcess
    self.process(clipboard)
  File "/cfs/projects/lsst/DC3/stacks/gcc44/15oct2010/Linux64/meas_pipeline/4.1.0.0/python/lsst/meas/pipeline/psfDeterminationStage.py", line 66, in process
    psf, cellSet = Psf.getPsf(exposure, sourceSet, self.psfDeterminationPolicy, sdqaRatings)
  File "/cfs/projects/lsst/DC3/stacks/gcc44/15oct2010/Linux64/meas_algorithms/4.1.0.0/python/lsst/meas/algorithms/Psf.py", line 366, in getPsf
    kernelSize, nStarPerCell, constantWeight)
  File "/cfs/projects/lsst/DC3/stacks/gcc44/15oct2010/EupsBuildDir/Linux64/meas_algorithms-4.1.0.0/meas_algorithms-4.1.0.0/python/lsst/meas/algorithms/algorithmsLib.py", line 890, in createKernelFromPsfCandidates
LsstCppException: 0: lsst::pex::exceptions::RuntimeErrorException thrown at include/lsst/afw/image/Mask.h:178 in void lsst::afw::image::Mask<MaskPixelT>::checkMaskDictionaries(const lsst::afw::image::Mask<MaskPixelT>&) const [with MaskPixelT = short unsigned int]
0: Message: Mask dictionary versions do not match; 2 v. 1
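
When triaging a traceback like the one above, it can help to list which stack packages and versions appear in the file paths. The following is a minimal sketch that pulls the package name and version out of each path with a regular expression; the pattern and the helper name `package_versions` are assumptions made for this example, based only on the path layout seen in this traceback.

```python
import re

# File paths copied from the traceback above.
STACK = "/cfs/projects/lsst/DC3/stacks/gcc44/15oct2010"
TRACEBACK_PATHS = [
    STACK + "/Linux64/pex_harness/4.1.0.0/python/lsst/pex/harness/Slice.py",
    STACK + "/Linux64/pex_harness/4.1.0.0/python/lsst/pex/harness/stage.py",
    STACK + "/Linux64/meas_pipeline/4.1.0.0/python/lsst/meas/pipeline/psfDeterminationStage.py",
    STACK + "/Linux64/meas_algorithms/4.1.0.0/python/lsst/meas/algorithms/Psf.py",
    STACK + "/EupsBuildDir/Linux64/meas_algorithms-4.1.0.0/meas_algorithms-4.1.0.0"
            "/python/lsst/meas/algorithms/algorithmsLib.py",
]

# Matches ".../Linux64/<package>/<version>/" and ".../Linux64/<package>-<version>/".
PKG_VERSION = re.compile(r"/Linux64/([A-Za-z_]+)[/-](\d+(?:\.\d+)+)/")


def package_versions(paths):
    """Map each package name found in the paths to its sorted list of versions."""
    found = {}
    for path in paths:
        match = PKG_VERSION.search(path)
        if match:
            name, version = match.groups()
            found.setdefault(name, set()).add(version)
    return {name: sorted(versions) for name, versions in found.items()}


versions = package_versions(TRACEBACK_PATHS)
for name, vs in sorted(versions.items()):
    print(name, ", ".join(vs))
```

For this traceback the Python-side packages all report 4.1.0.0. The exception itself is raised from include/lsst/afw/image/Mask.h, which is given as a relative path, so the afw version involved cannot be read from the traceback this way.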

Attachments