wiki:Summer2012/PreProductionRuns
Last modified 6 years ago Last modified on 07/26/2012 02:16:06 PM

Summer 2012 Production Runs are found on the ProductionRunStatus page

June Production

Preparation Stripe82 runs on SDSC Gordon

We target the processing of the 19260 CCDs organized by Simon (/nfs/lsst3/weekly/data/obs_sdss-2012-06-20-Annis).

processCcdSdss with multiShapelet turned on

s2012prod_im0060

  • 2 processCcdSdss.py executions per Condor job; (192 slots appear utilized for the most part)
  • Condor configuration of s2012prod_im0045
  • 32 nodes of Gordon, 192 available processing slots.
  • 18723/19260 CCDs processed (18723 src/ output products)
  • processing dag takes 3 hours 05 minutes total (effective 1 min 50 sec per CCD)

s2012prod_im0052

  • Condor configuration of s2012prod_im0045
  • 32 nodes of Gordon, 192 available processing slots.
  • 18702/19260 CCDs processed (18702 src/ output products)
  • processing dag takes 3 hours 30 minutes total (effective 2 min 06 sec per CCD)

Within 32 nodes 192 processing slots test runs one can clearly see that our Condor configuration does not keep all 192 processing slots filled with jobs, as the rate of job turnover is faster than Condor's response. The number of filled processing slots is a varying number in between 96 and 192, say typically ~150. This is the apparent cause of the diminished processing perfromance wrt s2012prod_im0045. Our choices appear to be

  • push Condor config further (lower _INTERVAL's)
  • Run longer jobs, e.g., two fields per job (modify newDagGenerate.py)

2000 CCD Tests
The following test runs attempt to process 2000 CCDs of /nfs/lsst3/weekly/data/obs_sdss-2012-06-20-Annis.

s2012prod_im0045

Improvements in our approach for this run are:

  • EventLog takes logs to events/database, and not files on disk
  • preJob prepares environment script env.sh for all workers, reducing EUPS overheads
  • Condor tuned for rapid submission, quick response

POLLING_INTERVAL       = 5
SCHEDD_INTERVAL        = 30
NEGOTIATOR_INTERVAL    = 30
UPDATE_INTERVAL        = 40

DAGMAN_SUBMIT_DELAY = 0
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 1000

SUBMIT_SEND_RESCHEDULE = False
SUBMIT_SKIP_FILECHECKS = True

Results:

  • 16 nodes of Gordon, 96 simultaneous processing slots.
  • 1971/2000 CCDs processed (1971 src/ output products)
  • processing dag takes 29 minutes total, or ~ 1 min 24 sec per CCD

s2012prod_im0015

  • 16 nodes of Gordon, 96 simultaneous processing slots.
  • 1971/2000 CCDs processed (1971 src/ output products)
  • 29 Tracebacks in the logs, and hence apparent success rate is 1971/2000.
  • produces 60.2 GB of sci-results/.
  • processing dag takes 1 hour 15 minutes

s2012prod_im0016

  • 32 nodes of Gordon, 192 simultaneous processing slots
  • 1971 src/ output products
  • 29 Tracebacks in the logs, and hence apparent success rate is 1971/2000.
  • produces 60.2 GB of sci-results/.
  • processing dag takes 52 minutes; (scaling behavior: related to writing logs to disk?)

First Test Run: multiShapelet was left off

The first test runs on Gordon try to execute processCcdSdss.py on a small subset of 10 of these

run=1033 filter=r camcol=4 field=124
run=1033 filter=i camcol=4 field=106
run=1033 filter=u camcol=4 field=105
run=1033 filter=r camcol=4 field=121
run=1033 filter=g camcol=4 field=100
run=1033 filter=r camcol=4 field=113
run=1033 filter=u camcol=4 field=111
run=1033 filter=g camcol=4 field=112
run=1033 filter=z camcol=4 field=104
run=1033 filter=z camcol=4 field=114

We setup v5_1 tags of the stack on the LSST cluster and on Gordon within Condor jobs.

We set up a condor glide-in for two nodes of Gordon, 12 processing slots available. In test run s2012prod_im0011 submitted via ctrl_orca we appear to execute through to the end:

% pwd
/oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output

% ls -l -R  sci-results/1033/4/*/src
sci-results/1033/4/g/src:
total 604
-rw-r--r-- 1 ux453102 use310 351360 Jun 23 10:26 src-001033-g4-0100.fits
-rw-r--r-- 1 ux453102 use310 264960 Jun 23 10:25 src-001033-g4-0112.fits

sci-results/1033/4/i/src:
total 448
-rw-r--r-- 1 ux453102 use310 457920 Jun 23 10:26 src-001033-i4-0106.fits

sci-results/1033/4/r/src:
total 1128
-rw-r--r-- 1 ux453102 use310 380160 Jun 23 10:26 src-001033-r4-0113.fits
-rw-r--r-- 1 ux453102 use310 408960 Jun 23 10:26 src-001033-r4-0121.fits
-rw-r--r-- 1 ux453102 use310 362880 Jun 23 10:26 src-001033-r4-0124.fits

sci-results/1033/4/u/src:
total 216
-rw-r--r-- 1 ux453102 use310 141120 Jun 23 10:26 src-001033-u4-0105.fits
-rw-r--r-- 1 ux453102 use310  77760 Jun 23 10:25 src-001033-u4-0111.fits

sci-results/1033/4/z/src:
total 412
-rw-r--r-- 1 ux453102 use310 213120 Jun 23 10:25 src-001033-z4-0104.fits
-rw-r--r-- 1 ux453102 use310 204480 Jun 23 10:25 src-001033-z4-0114.fits

Some sample command lines within Condor jobs:

/oasis/scratch/ux453102/temp_project/lsst/beta-0618/lsst_home/Linux64/pipe_tasks/5.1.2.0+1/bin/processCcdSdss.py 
sdss /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/runs 
--id run=1033 filter=g camcol=4 field=100 
--output /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output  
> /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/logs/S2012Pipe-run-1033:filter-g:camcol-4:field-100.log
/oasis/scratch/ux453102/temp_project/lsst/beta-0618/lsst_home/Linux64/pipe_tasks/5.1.2.0+1/bin/processCcdSdss.py sdss 
/oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/runs 
--id run=1033 filter=z camcol=4 field=114 
--output /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output  
> /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/logs/S2012Pipe-run-1033:filter-z:camcol-4:field-114.log

Sample log : S2012Pipe-run-1033:filter-g:camcol-4:field-100.log

: Loading config overrride file '/oasis/scratch/ux453102/temp_project/lsst/beta-0618/lsst_home/Linux64/obs_sdss/5.1.1.2+1/config/processCcd.py'
: Config override file does not exist: '/oasis/scratch/ux453102/temp_project/lsst/beta-0618/lsst_home/Linux64/obs_sdss/5.1.1.2+1/config/sdss/processCcd.py'
: input=/oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/runs
: calib=None
: output=/oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output
CameraMapper: Loading registry registry from /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output/registry.sqlite3
processCcd: Processing {'filter': 'g', 'field': 100, 'camcol': 4, 'run': 1033, 'rerun': 40}
processCcd: Performing Calibrate on fpC {'filter': 'g', 'field': 100, 'camcol': 4, 'run': 1033, 'rerun': 40}
processCcd.calibrate: installInitialPsf fwhm=2.5253857716 pixels; size=15 pixels
processCcd.calibrate: backgroundStartCpuTime=4; backgroundStartUTime=2; backgroundStartSTime=1
processCcd.calibrate: backgroundStartMaxRss=125180; backgroundStartMinFlt=43079; backgroundStartMajFlt=1
processCcd.calibrate: backgroundStartInBlock=8; backgroundStartOuBlock=80; backgroundStartNVCsw=63690; backgroundStartNIvCsw=612
processCcd.calibrate: backgroundEndCpuTime=4; backgroundEndUTime=3; backgroundEndSTime=1
processCcd.calibrate: backgroundEndMaxRss=165748; backgroundEndMinFlt=53595; backgroundEndMajFlt=1
processCcd.calibrate: backgroundEndInBlock=8; backgroundEndOuBlock=80; backgroundEndNVCsw=63690; backgroundEndNIvCsw=614
processCcd.calibrate.detection: Detected 148 positive sources to 5 sigma.
processCcd.calibrate.detection: Resubtracting the background after object detection
processCcd.calibrate.initialMeasurement: Measuring 148 sources
processCcd.calibrate.measurePsf: Measuring PSF
processCcd.calibrate.measurePsf: PSF star selector found 46 candidates
processCcd.calibrate.measurePsf: PSF determination using 20/46 stars.
processCcd.calibrate: backgroundStartCpuTime=6; backgroundStartUTime=5; backgroundStartSTime=1
processCcd.calibrate: backgroundStartMaxRss=208356; backgroundStartMinFlt=70459; backgroundStartMajFlt=1
processCcd.calibrate: backgroundStartInBlock=8; backgroundStartOuBlock=80; backgroundStartNVCsw=63695; backgroundStartNIvCsw=819
processCcd.calibrate: Fit and subtracted background
processCcd.calibrate: backgroundEndCpuTime=7; backgroundEndUTime=5; backgroundEndSTime=1
processCcd.calibrate: backgroundEndMaxRss=208356; backgroundEndMinFlt=76383; backgroundEndMajFlt=1
processCcd.calibrate: backgroundEndInBlock=8; backgroundEndOuBlock=80; backgroundEndNVCsw=63695; backgroundEndNIvCsw=892
processCcd.calibrate.measurement: Measuring 148 sources
processCcd.calibrate.ApertureCorrection: flux.psf to flux.sinc
processCcd.calibrate.ApertureCorrection: numGoodStars: 20
processCcd.calibrate.ApertureCorrection: numAvailStars: 20
processCcd.calibrate.ApertureCorrection: mean apCorr: 0.9914 +/- 0.0067
processCcd.calibrate: Aperture correction using 20/20 stars: 0.991377 +/- 0.004899
processCcd.calibrate.measurement: Applying aperture correction to 148 sources
processCcd.calibrate.astrometry WARNING: No CCD associated with exposure; assuming null distortion
processCcd.calibrate.astrometry: Null distortion correction
processCcd.calibrate.astrometry: Solving astrometry
processCcd.calibrate.astrometry: Using filter: 'g'
processCcd.calibrate.astrometry: 122 astrometric matches
processCcd.calibrate.astrometry: Removing distortion correction.
processCcd.calibrate.astrometry: Refitting WCS with distortion removed
processCcd.calibrate.astrometry: Astrometric scatter: 0.043523 arcsec (with non-linear terms)
processCcd.calibrate.photocal: Magnitude zero point: 28.605553 +/- 0.017630 from 103 stars
processCcd.calibrate: Photometric zero-point: 28.605553
processCcd.detection: Detected 684 positive sources to 5 sigma.
processCcd.detection: Resubtracting the background after object detection
processCcd.measurement: Measuring 684 sources
processCcd.measurement: Applying aperture correction to 684 sources

Sample Log : S2012Pipe-run-1033:filter-z:camcol-4:field-114.log

: Loading config overrride file '/oasis/scratch/ux453102/temp_project/lsst/beta-0618/lsst_home/Linux64/obs_sdss/5.1.1.2+1/config/processCcd.py'
: Config override file does not exist: '/oasis/scratch/ux453102/temp_project/lsst/beta-0618/lsst_home/Linux64/obs_sdss/5.1.1.2+1/config/sdss/processCcd.py'
: input=/oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/runs
: calib=None
: output=/oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output
CameraMapper: Loading registry registry from /oasis/scratch/ux453102/temp_project/lsst/s2012prod_im0011/output/registry.sqlite3
processCcd: Processing {'filter': 'z', 'field': 114, 'camcol': 4, 'run': 1033, 'rerun': 40}
processCcd: Performing Calibrate on fpC {'filter': 'z', 'field': 114, 'camcol': 4, 'run': 1033, 'rerun': 40}
processCcd.calibrate: installInitialPsf fwhm=2.5242788065 pixels; size=15 pixels
processCcd.calibrate: backgroundStartCpuTime=3; backgroundStartUTime=1; backgroundStartSTime=1
processCcd.calibrate: backgroundStartMaxRss=125172; backgroundStartMinFlt=43030; backgroundStartMajFlt=1
processCcd.calibrate: backgroundStartInBlock=152; backgroundStartOuBlock=80; backgroundStartNVCsw=63486; backgroundStartNIvCsw=587
processCcd.calibrate: backgroundEndCpuTime=3; backgroundEndUTime=2; backgroundEndSTime=1
processCcd.calibrate: backgroundEndMaxRss=165736; backgroundEndMinFlt=53546; backgroundEndMajFlt=1
processCcd.calibrate: backgroundEndInBlock=152; backgroundEndOuBlock=80; backgroundEndNVCsw=63486; backgroundEndNIvCsw=588
processCcd.calibrate.detection: Detected 109 positive sources to 5 sigma.
processCcd.calibrate.detection: Resubtracting the background after object detection
processCcd.calibrate.initialMeasurement: Measuring 109 sources
processCcd.calibrate.measurePsf: Measuring PSF
processCcd.calibrate.measurePsf: PSF star selector found 24 candidates
processCcd.calibrate.measurePsf: PSF determination using 15/24 stars.
processCcd.calibrate: backgroundStartCpuTime=5; backgroundStartUTime=3; backgroundStartSTime=1
processCcd.calibrate: backgroundStartMaxRss=208344; backgroundStartMinFlt=70376; backgroundStartMajFlt=1
processCcd.calibrate: backgroundStartInBlock=152; backgroundStartOuBlock=80; backgroundStartNVCsw=63493; backgroundStartNIvCsw=1326
processCcd.calibrate: Fit and subtracted background
processCcd.calibrate: backgroundEndCpuTime=5; backgroundEndUTime=4; backgroundEndSTime=1
processCcd.calibrate: backgroundEndMaxRss=208344; backgroundEndMinFlt=76300; backgroundEndMajFlt=1
processCcd.calibrate: backgroundEndInBlock=152; backgroundEndOuBlock=80; backgroundEndNVCsw=63493; backgroundEndNIvCsw=3746
processCcd.calibrate.measurement: Measuring 109 sources
processCcd.calibrate.ApertureCorrection: flux.psf to flux.sinc
processCcd.calibrate.ApertureCorrection: numGoodStars: 15
processCcd.calibrate.ApertureCorrection: numAvailStars: 15
processCcd.calibrate.ApertureCorrection: mean apCorr: 0.9876 +/- 0.0072
processCcd.calibrate: Aperture correction using 15/15 stars: 0.993468 +/- 0.002543
processCcd.calibrate.measurement: Applying aperture correction to 109 sources
processCcd.calibrate.astrometry WARNING: No CCD associated with exposure; assuming null distortion
processCcd.calibrate.astrometry: Null distortion correction
processCcd.calibrate.astrometry: Solving astrometry
processCcd.calibrate.astrometry: Using filter: 'z'
processCcd.calibrate.astrometry: 101 astrometric matches
processCcd.calibrate.astrometry: Removing distortion correction.
processCcd.calibrate.astrometry: Refitting WCS with distortion removed
processCcd.calibrate.astrometry: Astrometric scatter: 0.055031 arcsec (with non-linear terms)
processCcd.calibrate.photocal: Magnitude zero point: 26.028034 +/- 0.014482 from 98 stars
processCcd.calibrate: Photometric zero-point: 26.028034
processCcd.detection: Detected 398 positive sources to 5 sigma.
processCcd.detection: Resubtracting the background after object detection
processCcd.measurement: Measuring 398 sources
processCcd.measurement: Applying aperture correction to 398 sources