wiki:DM/buildbot/Weekly_Production/July2011
Last modified on 09/04/2011 05:49:40 PM

DM/buildbot/Weekly_Production

Detailed discussion of the Weekly Production Runs and Not-Runs

  • Arranged sequentially with Most Recent first.
  • Codeset: Trunk indicates the trunk was used for the lsst stack. In the future, a tagged set may be specified.
  • Dataset: Full indicates the complete set of data specified in $SVN/DMS/datarel/pipeline/<date>_weekly.input. Short indicates a test run using a small subset of available image data.
  • To check on the status of an in-progress run, determine <date_time current run> from run details below, then:
    % cd /lsst3/weekly/datarel-runs/wp_trunk_<date_time current run> # eg 2011_0507_160801
    % /home/buildbot/slave/trunkVsTrunk_lsst/work/RunStatus.sh
    % /home/buildbot/slave/trunkVsTrunk_lsst/work/FindAllErrors.sh
    % ~jbosch/runStatus.py -a -r .   
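
The run entries below appear to follow one naming pattern: the output directory is wp_<codeset>_<date_time> under /lsst3/weekly/datarel-runs, the database name adds the prefix buildbot_PT1_2_u_, and each entry's four-digit heading is the HHMM part of the timestamp. A minimal sketch that derives these from a timestamp, assuming that pattern holds (the function name describe_run is hypothetical, and entries with a suffix such as _BAD are not covered):

```python
from datetime import datetime

RUNS_ROOT = "/lsst3/weekly/datarel-runs"  # output root as it appears in the entries below
DB_PREFIX = "buildbot_PT1_2_u_"           # database prefix as it appears in the entries below


def describe_run(codeset, stamp):
    """Derive names for one run.

    codeset: 'trunk' or 'tags'; stamp: e.g. '2011_0727_140816'.
    Hypothetical helper; the naming pattern is inferred from this page.
    """
    started = datetime.strptime(stamp, "%Y_%m%d_%H%M%S")
    name = "wp_%s_%s" % (codeset, stamp)
    return {
        "started": started,
        "heading": started.strftime("%H%M"),  # matches the per-run heading, e.g. 1408
        "output": "%s/%s" % (RUNS_ROOT, name),
        "database": DB_PREFIX + name,
    }


if __name__ == "__main__":
    info = describe_run("tags", "2011_0727_140816")
    for key in ("started", "heading", "output", "database"):
        print("%-9s %s" % (key, info[key]))
```

The "output" value is the directory to cd into before running the status scripts listed above.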
    

For a status overview of all daily buildbot runs see DM/buildbot/Daily_Status
Past Months' Daily_Status pages: May June July

27 July 2011

1408

  • Why: Debug run using sensors failing in 3000 series run
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: pipeline_2011_0727_140816
  • Codeset: tagged; see /lsst3/weekly/datarel-runs/wp_tags_2011_0727_140816/config/weekly.tags
  • Dataset: one-off of sensors failing in 3000 series run
  • Output: /lsst3/weekly/datarel-runs/wp_tags_2011_0727_140816
  • Database: buildbot_PT1_2_u_wp_tags_2011_0727_140816
  • Status: Status compliments of K-T:
    > The run completed. Looks like 3 records failed to process
    > successfully. One traceback in
    > <rundir>/work/PT1PipeB_1/Slice0.log
    
    Actually, there are four Tracebacks there, all for PSF determination
    failures ("No candidate PSF sources").
    
    887252941 3,4 1,2
    887252941 4,3 2,1
    886915171 4,1 2,1
    888082761 1,1 0,0
    
    These match with exactly the four production failures that issued this
    message.
    
    Curiously and disturbingly, the production failures in PSF determination
    that issued other messages (like "You only have 2 eigen images" or
    "invalid index" or "Please provide at least one Image for me to update")
    did not fail this time.
    
    Three CCDs died in icRemeasure, as they had done in production:
    
    886998681 1,2 2,1
    886236101 2,0 1,2
    887477741 1,4 0,2
    
    One CCD died in icSourceMeasure, as it had in production:
    
    886894861 3,0 1,2
    
  • Resolution: One-off run passed to developers.
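
The failure lists in the status report above use three whitespace-separated fields per line. A minimal sketch for parsing and grouping them, assuming the fields are visit, raft, and sensor in that order (the field names and both function names are assumptions, not taken from this page):

```python
from collections import defaultdict


def parse_failures(text):
    """Return (visit, raft, sensor) tuples from lines like '887252941 3,4 1,2'.

    The field names are an assumption; the page only shows the raw triples.
    """
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank lines and prose
        records.append((int(parts[0]), parts[1], parts[2]))
    return records


def group_by_visit(records):
    """Map each visit number to the list of (raft, sensor) pairs that failed."""
    grouped = defaultdict(list)
    for visit, raft, sensor in records:
        grouped[visit].append((raft, sensor))
    return dict(grouped)


# The four PSF determination failures quoted in the status report.
PSF_FAILURES = """
887252941 3,4 1,2
887252941 4,3 2,1
886915171 4,1 2,1
888082761 1,1 0,0
"""

if __name__ == "__main__":
    for visit, ccds in sorted(group_by_visit(parse_failures(PSF_FAILURES)).items()):
        print(visit, ccds)
```

Grouping by the first field makes it easy to see when several failures share one visit, as the first two lines of the PSF list do.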

1325

  • Why: Debug run using sensors failing in 3000 series run
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_05_
  • Codeset: tagged; see /lsst3/weekly/datarel-runs/wp_trunk_2011_xxxx_xxxxxx/config/weekly.tags
  • Dataset: one-off of sensors failing in 3000 series run
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_06......
  • Database: buildbot_PT1_2_u_wp_trunk_2011_06.......
  • Status: Failed due to bad input list filename.
  • Resolution: Fixed input parameter typo

22 July 2011

2204

  • Why: Test: new r-band images, new ap module, new sourceAssocIngest.py refObject csv file, AND new input sensor list
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline_2011_0722_220407
  • Codeset: tagged; see /lsst3/weekly/datarel-runs/wp_tags_2011_0722_220407/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_tags_2011_0722_220407
  • Database: buildbot_PT1_2_u_wp_tags_2011_0722_220407
  • Status: Successfully completed all sensor data
  • Resolution: pipeQA analysis required.

1940

  • Why: Test: new r-band images, new ap module, new sourceAssocIngest.py refObject csv file
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: pipeline_2011_0722_194036
  • Codeset: tagged; see /lsst3/weekly/datarel-runs/wp_tags_2011_0722_194036/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_tags_2011_0722_194036
  • Database: buildbot_PT1_2_u_wp_tags_2011_0722_194036
  • Status: Failed; need to rebuild the input sensor list to match the new input sensors provided
  • Resolution: Rebuild list and then rerun.

20 Jul 2011

1524

  • Why: Full TAGGED run for the 3000 weekly run with vigcorrdata set up. CAVEAT: Serge may need to provide a new /lsst/DC3/data/obs/ImSim/ref/simRefObject-2011-07-20-0.csv to add some extra fields. If so, the source association DB ingest will be rerun after the tagged weekly run completes and after the incomplete src assoc DB table is removed in preparation for use of the new simRefObject.csv.
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline_2011_0720_152434
  • Codeset: TAGGED; see /lsst3/weekly/datarel-runs/wp_tags_2011_0720_152434/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_tags_2011_0720_152434
  • Database: buildbot_PT1_2_u_wp_tags_2011_0720_152434
  • Status: Successful run; 502 out of 502 sensors processed. However, source association ingest (only) will be rerun by Serge once the new csv file is fabricated.
  • Resolution: Time for Serge to rerun the src assoc ingest; time for analysts to run pipeQA.

1012

  • Why: DEBUG TAGGED for 3000 weekly run w/vigcorrdata setup
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline_2011_0720_152434
  • Codeset: TAGGED; see /lsst3/weekly/datarel-runs/wp_tags_2011_0720_152434/config/weekly.tags
  • Dataset: debug: 10 sensors
  • Output: /lsst3/weekly/datarel-runs/wp_tags_2011_0720_152434
  • Database: buildbot_PT1_2_u_wp_tags_2011_0720_152434
  • Status: Successful debug run. All 10 sensors processed without error.
  • Resolution: Time for full scale tagged weekly run

0956

  • Why: Debug run in prep for 3000 weekly run. Failed because vigcorrdata was not set up in the input directory.
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline_2011_0720_095620_BAD
  • Codeset: TAGGED; see /lsst3/weekly/datarel-runs/wp_tags_2011_0720_095620_BAD/config/weekly.tags
  • Dataset: debug: 10 sensors
  • Output: /lsst3/weekly/datarel-runs/wp_tags_2011_0720_095620_BAD
  • Database: buildbot_PT1_2_u_wp_tags_2011_0720_095620
  • Status: Failed due to missing vignetting input data.
  • Resolution: Link the vignetting input data into the input directory newly created for this run.

18 Jul 2011

2105

  • Why: Test updated versions of vignetting stage and PSF algorithm
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_0718_210516
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0718_210516/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0718_210516
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0718_210516
  • Status: In progress, no errors so far, 502 sensors to be processed
  • Resolution:

16 July 2011

1040

  • Why: Test vignetting stage and new PSF algorithm over complete test data set
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: wp_trunk_2011_0716_104009
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0716_104009/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0716_104009
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0716_104009
  • Status: Run complete, no serious (exceptions/algorithm fatal/etc) errors detected in logs; 502 records processed
  • Resolution: Turned over to analysts.

13 July 2011

2124

  • Why: Test new flats
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_0713_212440
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0713_212440/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0713_212440
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0713_212440
  • Status: in progress. Many error messages are streaming; will not try to determine problem until the morrow.
    • Looked at sample traceback error in <RunDir>/work/PT1PipeB_1/Slice0.log:
      harness.slice.visit.stage.tryProcess FATAL: Traceback (most recent call last):
        File "/home/buildbot/buildbotSandbox/Linux64/pex_harness/svn22998/python/lsst/pex/harness/Slice.py", line 575, in tryProcess
          stageObject.applyProcess()
        File "/lsst/home/buildbot/slave/trunkVsTrunk_lsst/work/svn/pex_harness_22998/python/lsst/pex/harness/stage.py", line 353, in applyProcess
          self.process(clipboard)
        File "/lsst/home/rhl/sss/LSST/daf/meas/pipeline/python/lsst/meas/pipeline/psfDeterminationStage.py", line 70, in process
        File "/lsst/home/buildbot/slave/trunkVsTrunk_lsst/work/svn/meas_algorithms_22987/python/lsst/meas/algorithms/secondMomentStarSelector.py", line 96, in selectStars
          clumps = psfHist.getClumps(display=display)
        File "/lsst/home/buildbot/slave/trunkVsTrunk_lsst/work/svn/meas_algorithms_22987/python/lsst/meas/algorithms/secondMomentStarSelector.py", line 330, in getClumps
          raise RuntimeError(msg)
      RuntimeError: Failed to determine center of PSF clump
      
      
      
  • Resolution:
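
When many error messages are streaming, a tally of the final exception line of each traceback helps show whether one failure dominates. A minimal sketch, assuming tracebacks in the slice logs follow the standard Python layout seen in the sample above: a header line containing "Traceback (most recent call last):", more deeply indented frame lines, then the exception line back at the header's indentation. The function name tally_tracebacks is hypothetical.

```python
from collections import Counter

TRACEBACK_MARK = "Traceback (most recent call last):"


def tally_tracebacks(lines):
    """Count the closing exception line of each Python traceback in a log.

    Assumes frames are indented deeper than the header line and the
    exception line returns to the header's indentation (or less).
    """
    counts = Counter()
    header_indent = None  # indentation of the current traceback header, if any
    for line in lines:
        text = line.rstrip("\n")
        if TRACEBACK_MARK in text:
            header_indent = len(text) - len(text.lstrip())
            continue
        if header_indent is None or not text.strip():
            continue
        indent = len(text) - len(text.lstrip())
        if indent <= header_indent:
            counts[text.strip()] += 1
            header_indent = None
    return counts


# Abbreviated version of the traceback quoted above.
SAMPLE = [
    "harness.slice.visit.stage.tryProcess FATAL: Traceback (most recent call last):",
    '  File "secondMomentStarSelector.py", line 330, in getClumps',
    "    raise RuntimeError(msg)",
    "RuntimeError: Failed to determine center of PSF clump",
]

if __name__ == "__main__":
    for message, count in tally_tracebacks(SAMPLE).most_common():
        print(count, message)
```

To apply it to a real log, pass an open file object instead of SAMPLE, since the function only iterates over lines.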

1844

  • Why: Debug setup to ensure that new datarel scripts are working OK
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_0713_184454
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0713_184454/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0713_184454
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0713_184454
  • Status: Scripts are fine; new flats upon which the run was based were not fully installed yet.
  • Resolution: Remove all the DB detritus

9 July 2011

1811

  • Why: AP/Cat, meas_algorithms, use of 6/1/2011 flats
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline ( TBD later: _2011_0709_181154)
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0709_181154/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0709_181154
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0709_181154
  • Status: 502 out of 502 processed; all DB processing successful except linkDB.py. RAllsman needs to update buildbot run software to use latest stack version (plus extra tag/trunk input param). It is unknown why a buildbot-owned process, prov.py, is still running on lsst6.
  • Resolution:
    • RAllsman needs to run linkDB.py by hand and ask KT/Serge about prov.py.
      • Later update: prov.py residual process is from days ago and also has a 'gdb' tied to it. Prov.py is a non-issue for this run.
      • Even later update: linkDB.py has been modified to work with new table names. Rerun worked without error. Need to check into SVN asap.

7 July 2011

2051

  • Why: Still trying for good weekly run
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_0707_205101
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0707_205101/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0707_205101
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0707_205101
  • Status:
    • A single slice remains running (hung) in sfmSourceMeasurement. Last log entry was 2:58 am 7/8/2011. All other slices exited. Process will be terminated so the overall job will complete DB ingestion, etc.
    • ...Later... The hung job was killed; processing continues through the Association Pipeline and DB ingest.
    • Issues:
      • Single slice process (work/PT1PipeB_1/PT1PipeB_1.log) hung in sfmSourceMeasurement on data: "886257211 2,2 1,1 ------ 80 reads, 34 writes, 1 calexp persisted"
      • Assertion posted for: '886257211 1,0 1,2 ------ 80 reads, 34 writes, 1 calexp persisted'
        ./work/PT1PipeB_2/launch.log:python: /lsst/DC3/stacks/gcc443/15oct2010/Linux64/external/eigen/2.0.15/include/Eigen/src/Core/Coeffs.h:96: typename Eigen::ei_traits<T>::Scalar& Eigen::MatrixBase<Derived>::operator()(int, int) [with Derived = Eigen::Matrix<double, 10000, 10000, 2, 10000, 10000>]: Assertion `row >= 0 && row < rows() && col >= 0 && col < cols()' failed.
        
      • Assertion posted for: '886257211 2,2 0,1 ------ 80 reads, 34 writes, 1 calexp persisted'
        ./work/PT1PipeA_1/launch.log:python: /lsst/DC3/stacks/gcc443/15oct2010/Linux64/external/eigen/2.0.15/include/Eigen/src/Core/Coeffs.h:96: typename Eigen::ei_traits<T>::Scalar& Eigen::MatrixBase<Derived>::operator()(int, int) [with Derived = Eigen::Matrix<double, 10000, 10000, 2, 10000, 10000>]: Assertion `row >= 0 && row < rows() && col >= 0 && col < cols()' failed.
        
      • DB Ingest for Source Association reported a number of errors in the log: ingestSourceAssoc.log. The list may be incomplete since the ingest was on-going.
        ./ingestSourceAssoc.log:lsst.pex.exceptions.exceptionsLib.LsstCppException: 0: lsst::pex::exceptions::IoErrorException thrown at src/utils/Csv.cc:488 in void lsst::ap::utils::CsvReader::_ioError(const char*) const
        ./ingestSourceAssoc.log:lsst.pex.exceptions.exceptionsLib.LsstCppException: 0: lsst::pex::exceptions::IoErrorException thrown at src/utils/Csv.cc:342 in lsst::ap::utils::CsvReader::CsvReader(const std::string&, const lsst::ap::utils::CsvDialect&, bool)
        ./ingestSourceAssoc.log:0: Message: failed to open file /lsst3/weekly/datarel-runs/wp_trunk_2011_0707_205101/csv-SourceAssoc/refFilt.csv for reading
        ./ingestSourceAssoc.log:lsst.pex.exceptions.exceptionsLib.LsstCppException: 0: lsst::pex::exceptions::IoErrorException thrown at src/utils/Csv.cc:342 in lsst::ap::utils::CsvReader::CsvReader(const std::string&, const lsst::ap::utils::CsvDialect&, bool)
        ./ingestSourceAssoc.log:0: Message: failed to open file /lsst3/weekly/datarel-runs/wp_trunk_2011_0707_205101/csv-SourceAssoc/refFilt.csv for reading
        ./ingestSourceAssoc.log:ERROR 2 (HY000) at line 1: File '/lsst3/weekly/datarel-runs/wp_trunk_2011_0707_205101/csv-SourceAssoc/refFilt.csv' not found (Errcode: 2)
        ./ingestSourceAssoc.log:    raise CalledProcessError(retcode, cmd)
        ./ingestSourceAssoc.log:subprocess.CalledProcessError: Command '['mysql', '-h', 'lsst10.ncsa.uiuc.edu', '-P', '3306', '-u', 'buildbot', '-pbb4lsst', '-D', 'buildbot_PT1_2_u_wp_trunk_2011_0707_205101', '-vvv', '-e', 'LOAD DATA LOCAL INFILE \'/lsst3/weekly/datarel-runs/wp_trunk_2011_0707_205101/csv-SourceAssoc/refFilt.csv\' REPLACE INTO TABLE SimRefObject\n                    FIELDS TERMINATED BY \',\' OPTIONALLY ENCLOSED BY \'"\' (refObjectId,isStar,varClass,ra,decl,htmId20,gLat,gLon,sedName,uMag,gMag,rMag,iMag,zMag,yMag,muRa,muDecl,parallax,vRad,redshift,semiMajorBulge,semiMinorBulge,semiMajorDisk,semiMinorDisk,uCov,gCov,rCov,iCov,zCov,yCov);\n                 ']' returned non-zero exit status 1
        
      • The numbers reported from RunStatus.py (501) and JB's runStatus.py input count (502) differ by 1. Need to determine why.
  • Resolution:
    • The keywords used to flag the various errors and warnings must be standardized.
    • The sfmSourceMeasurement assertions need review so that the stage catches them and cleanly terminates processing of the single affected record.
    • The errors reported by the source association DB ingest need review.
    • The numbers reported from RunStatus.py (501) and JB's runStatus.py input count (502) differ by 1. Need to determine why.
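
One way to chase an off-by-one count such as the 501 versus 502 discrepancy above is to compare the two ID lists as sets rather than as totals. A minimal sketch, assuming each script's records can be reduced to hashable ID tuples; the function name diff_ids and the toy data are invented for illustration, and how the IDs would be extracted from either script's output is not shown on this page:

```python
def diff_ids(expected, processed):
    """Return (missing, unexpected) as sorted lists.

    missing: IDs in the input list that were never processed.
    unexpected: IDs that were processed but are not in the input list.
    """
    expected_set, processed_set = set(expected), set(processed)
    missing = sorted(expected_set - processed_set)
    unexpected = sorted(processed_set - expected_set)
    return missing, unexpected


# Toy data, invented for illustration only; not taken from any run.
EXPECTED = [(1, "0,0", "0,0"), (1, "0,0", "0,1"), (2, "1,1", "2,2")]
PROCESSED = [(1, "0,0", "0,0"), (2, "1,1", "2,2")]

if __name__ == "__main__":
    missing, unexpected = diff_ids(EXPECTED, PROCESSED)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

A duplicated line in the input list would also shift a raw count by one without any record being lost; comparing len(expected) with len(set(expected)) would reveal that case.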

1032

  • Why: Get a Trunk Weekly Run Processed
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0707_103232/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0707_103232.
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0707_103232
  • Status: Killed...something failed
  • Resolution:
    • Remove the DB detritus

6 July 2011

1837

  • Why: Trunk Weekly Run
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_0706_183755
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0706_183755/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0706_183755
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0706_183755
  • Status: Foiled again. Re K-T: "It looks like datarel svn22786 was used, which is before MulftiFlagIngestStage was removed in svn22845. As a result, no src datasets are being produced."
  • Resolution: Resume attempt later.
    • Remove the DB detritus

1757

  • Why: Trunk Weekly Run
  • Setup: /home/buildbot/slave/trunkVsTrunk_lsst/work/weeklyPR/pipeline TBD after run: _2011_0706_175713
  • Codeset: trunk; see /lsst3/weekly/datarel-runs/wp_trunk_2011_0706_175713/config/weekly.tags
  • Dataset: full
  • Output: /lsst3/weekly/datarel-runs/wp_trunk_2011_0706_175713
  • Database: buildbot_PT1_2_u_wp_trunk_2011_0706_175713
  • Status: Killed job because I didn't follow my own operator's manual to rebuild the eups cache prior to run.
  • Resolution: RTFM
    • Remove the DB detritus