
Data Challenge 3b Performance Test 1.2 (PT1.2)

(formerly known as PT1.5)

Background

This page covers the second phase of the PT1 Improvement Plan prior to PDR (following PT1.1). It is based on the "six month plan" discussed at the PT2 Scoping Breakout at the 2010 LSST All Hands Meeting, plus subsequent discussions within the DM team.

Goals

(Open questions are noted in bold italic. Please comment on them in the wiki.)

Comment from KTL: Overall goal was originally "everything up to multifit", including difference imaging, DayMOPS, and deep detection. Since MOPS doesn't seem to be in the direct line anymore, it could be dropped. Does this mean diffim and hence template generation can/should be dropped, too?

Comment from gpdf: SST has decided to include diffim in production, but only within snap pairs. Coadds will be exercised in pipette only. See below.

  • All new data for PT1.2 will be generated by ImSim. The minimum requirement is 50% of the sky area of PT1.1, to the same depth, probably drawn from a later year of the simulated survey to get a better filter distribution.
    • Most developers prefer a cloud-free sample. A sample with clouds, of similar size, is desirable for an initial test of global photometric calibration.
    • A deep-drilling sample is also needed, covering something like 6-8 CCDs over ten years, for astrometric studies. There are also some requirements for "blind" insertion of "interesting things" / "easter eggs".
  • Defer running the CFHT-LS deep fields D1-D4; this was judged not realistic for PT1.2.
    • Related: fringe corrections are now out of scope.
  • Run all current stages through single frame measurement and source association (even though the operational system may do this differently; see the pipeline and stage list below for the complete set of stages)
  • The Jarvis PSF model will be integrated into the system using the existing abstraction layer (although this has been found to require some tweaks)
  • Perform galaxy photometry, in two branches
    • Revised Multifit will be developed and integrated with afw
    • "Conventional" elliptical aperture and elliptical gaussian codes will be made available as well, including the computation of Petrosian magnitudes
  • Co-addition will be tested, but only in pipette. The baseline is to build PSF-matched monochromatic coadds on tangent planes and evaluate a variety of quality metrics. The resulting images will not be run through additional production. Templates will not be generated. No attempt will be made at a synoptic sky pixelization.
  • Difference imaging will be re-validated by differencing snap pairs, but it will not be used against templates.
  • Required capabilities/data quality metrics:
    • Additional plots/reports TBD
    • Isolated point source photometry: 0.05 mag
    • Galaxy photometry: 0.05 mag for color or repeatability, measured on an ensemble (one possible form of such a repeatability metric is sketched after this list)
      • Galaxy photometry will be limited by the fact that it is measured independently in each filter band
    • Revisit DC3b data quality requirements/metrics scorecard values
  • Demonstration of selected features relating to provenance and fault tolerance
    • Provenance: demonstrate ability to recreate a specified calibrated image. Permitted to assume that the original software versions are still available in the stack.
    • Fault tolerance: precise tests are still being defined; the basic idea is to demonstrate the bookkeeping necessary to recover from the loss of a node during production.
  • Gradual ("adiabatic") development from PT1.1 via continuous integration, with continuous demonstration of the ability to run full pipelines as features are added incrementally.
    • This goal is now supported by weekly runs of trunk-vs-trunk against a reference dataset.
  • Pick up left-overs from PT1.1 that slipped
    • Finish Jarvis PSF implementation, change meas_algorithms to support multiple PSFs, update PSF/exposure formatters
    • Automated movement of data between cluster and MSS (during production execution?)
    • Improve shutdown/stop mechanism (if not done in PT1.1)
  • "On the side" there will be an attempt to gather short-exposure-pair data from Gemini and process it through LSST code.

"Personal Pipelines"

The science collaborations need "personal pipelines" (pipelines that run on a single PC) for processing a small amount of data.

  • This is now supported by pipette. The original comments are preserved (for now) for context.

Original comments

Comment from Dick Shaw: IMHO this requirement could result in a huge user support load, and may significantly impact the resources needed to prepare for PT2. What are we trying to achieve here? If the answer is that users want to modify or replace code in the stack, do we offer to help them debug their changes?

Comment from KTL: There was a huge demand for users to be able to run pipelines on their own machines in order to have rapid turnaround times for experiments such as modifying the simulation input parameters, testing particular aspects of the data reduction using artificial inputs, and tweaking various parameters for existing algorithms. We would not necessarily support incorporation of other algorithms in the stack/pipelines at this time.

The following is a proposed list of requirements:

Comment from Dick Shaw: Why not just serve up the chopped raw images (which we have now, I think)? Or is the intent that users retrieve raw CFHT-LS data from CADC and do their own data staging & preparation?

Comment from KTL: If we have preprocessed all interesting images and are willing to serve them, this script would not be necessary.

  • DM provides a Python script ("SST") that can process 32 ImSim channel images or 2 CFHT amp images into calexp, psf, icSrc, and src output datasets.
  • DM provides a Python script that can process a set of overlapping src datasets into object, source, badSource, sourceHist, and badSourceHist datasets.
  • DM provides Python scripts to convert src, icSrc, object, source, and badSource datasets into CSV form (see the conversion sketch after this list).
  • DM provides a Python script to convert psf into FITS table form (if that is possible and appropriate) or another "standard" form.
  • All of the above scripts must not rely on any resources that are not on the local machine. (Note: Most of these scripts already exist and just need a little cleanup for more widespread use.)
  • ImSim catalogs for comparing to Source and ForcedSource catalogs are TBD
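
The CSV conversion scripts are the most mechanical of the requirements above; a minimal sketch of such a local-only converter follows. It assumes the dataset is persisted as a FITS binary table and uses astropy for reading, which is an assumption rather than anything specified on this page; any locally installed FITS reader would serve equally well.

{{{#!python
import csv
import sys

from astropy.io import fits  # assumed local FITS reader; not mandated here

def fits_table_to_csv(fits_path, csv_path, hdu=1):
    """Dump one table HDU of a FITS file (e.g. a src catalog) to CSV, using
    only local resources as the requirements above demand."""
    with fits.open(fits_path) as hdus:
        table = hdus[hdu].data
        names = table.names
        with open(csv_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(names)
            for row in table:
                writer.writerow([row[name] for name in names])

if __name__ == "__main__":
    fits_table_to_csv(sys.argv[1], sys.argv[2])
}}}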

Schedule

  • code changes through mid-October
    • supported by weekly runs to demonstrate continuous integration, avoid surprises
  • attempts at full-scale production begin mid-October
  • stable production begins November
  • "quick look" results should be available by end of Nov to support PDR
  • 1 month for runs (December)
  • 6 weeks for analysis and report drafting (overlapping with the runs)
  • 2 weeks for a more formal writeup

Data required

  • ImSim only; the new production is not compatible with that for PT1.1. See above for details.
  • No CFHT-LS.
  • Possibility of on-the-side analysis of Gemini and/or Subaru data in pipette.

Performance & Reliability

  • Automated way to produce data quality metrics scorecard
  • Targeted scaling tests to clarify job-coordination and I/O contributions to scaling limits and inform the development of a scaling model
  • Demonstration of provenance use to recreate a calibrated image on demand
  • Demonstration of fault tolerance features in DR production (nature of the demonstration is TBD)
  • Continuation of qserv scaling tests in parallel with other PT1.2 activities.

Environment and Tools

  • Data service plans remain similar to those for PT1.1, including both Gator and direct database access, as appropriate (this still needs explicit confirmation)
  • At least twice-weekly buildbot runs for performance testing / performance regression testing
  • Weekly runs
  • Automated data quality analysis of data from weekly runs
  • Still need to document decisions on updates to underlying external software

PT1.2 Pipelines and Stages (Data Release Production)

Pipeline: Instrument Signature Removal

  • isr_initialize: Acquire a single channel's image data
  • isr_saturation: Do saturation correction
  • isr_overscan: Do overscan subtraction
  • isr_bias: Do bias subtraction
  • isr_variance: Calculate variance from image counts
  • isr_dark: Do dark subtraction
  • isr_flat: Do flat-field correction
  • isr_sdqa: Generate SDQA metrics
  • isr_output: Output initial corrected channel image (post-ISR) and SDQA metrics
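
Schematically, the ISR stages above amount to a fixed sequence of per-pixel corrections. The following is a simplified sketch of that arithmetic using plain numpy arrays (the inputs, the Poisson-only variance model, and the scalar overscan level are simplifying assumptions; the actual stage implementations are richer than this):

{{{#!python
import numpy as np

def isr_sketch(raw, bias, dark, flat, sat_level, overscan_level, gain, exptime):
    """Simplified instrument signature removal for one channel image."""
    image = raw.astype(float)

    saturated = image >= sat_level              # isr_saturation: flag saturated pixels
    image -= overscan_level                     # isr_overscan: subtract overscan level
    image -= bias                               # isr_bias: subtract bias frame
    variance = np.clip(image, 0, None) / gain   # isr_variance: variance from counts
    image -= dark * exptime                     # isr_dark: subtract scaled dark frame
    image /= flat                               # isr_flat: flat-field correction
    variance /= flat ** 2

    return image, variance, saturated           # isr_output: corrected image + masks
}}}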

Pipeline: CCD Assembly

  • ca_initialize: Acquire a CCD's worth of post-ISR channel image data
  • ca_assembleCCD: Assemble appropriate channels into a CCD
  • ca_isrCcdDefect: Mask out CCD defects
  • ca_isrCcdSdqa: Calculate additional metrics characterizing assembled image
  • ca_sdqa: Package metrics for output
  • ca_output: Output assembled image and SDQA metrics
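
The assembly step is essentially a mosaic of trimmed channel images into CCD coordinates, followed by masking of known bad regions. A toy version follows (the 2x8 amplifier layout is an assumption about the ImSim sensor geometry, and a real implementation would set mask planes rather than overwrite pixels):

{{{#!python
import numpy as np

def assemble_ccd(channels, layout=(2, 8)):
    """Toy ca_assembleCCD: paste equally sized, trimmed channel arrays into a
    single CCD image, row-major in the assumed amplifier grid."""
    ny, nx = layout
    rows = [np.hstack(channels[r * nx:(r + 1) * nx]) for r in range(ny)]
    return np.vstack(rows)

def mask_defects(image, defect_boxes, fill_value=np.nan):
    """Toy ca_isrCcdDefect: blank out known bad regions, given as
    (y0, y1, x0, x1) pixel ranges in CCD coordinates."""
    for y0, y1, x0, x1 in defect_boxes:
        image[y0:y1, x0:x1] = fill_value
    return image
}}}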

Pipeline: Cosmic Ray Split

  • cs_initialize: Acquire a visit's worth of post-ISR CCD image data
  • cs_backgroundEstimation: Do background estimation
  • cs_reject: Mask single-frame-detectable cosmic rays from exposure
  • cs_diffim: Use DC3a Diffim algorithm on 15-sec exposure pairs
  • cs_sourceDetection: Detect cosmic rays from difference image
  • cs_crSplitCombine: Combine two 15-sec exposures while masking difference-detected CRs
  • cs_output: Output final modified image and SDQA metrics
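
The heart of this pipeline is the snap-pair logic: difference the two 15-sec exposures, flag pixels that are bright in only one snap as cosmic rays, and combine the rest. A schematic numpy version follows (the 5-sigma threshold and the choice to keep the lower-valued snap at flagged pixels are illustrative, not taken from this page):

{{{#!python
import numpy as np

def cr_split_combine(snap1, snap2, variance, nsigma=5.0):
    """Schematic cs_diffim + cs_sourceDetection + cs_crSplitCombine.

    snap1, snap2 -- background-subtracted 15-sec exposures (2-D arrays)
    variance     -- per-pixel variance of a single snap
    """
    diff = snap1 - snap2
    # A cosmic ray hits only one snap, so it appears as a large |difference|.
    cr_mask = np.abs(diff) > nsigma * np.sqrt(2.0 * variance)

    combined = 0.5 * (snap1 + snap2)
    # Where a CR was detected, fall back to the lower of the two snap values.
    combined[cr_mask] = np.minimum(snap1, snap2)[cr_mask]
    return combined, cr_mask
}}}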

Pipeline: Image Characterization

  • ic_initialize: Acquire visit image data
  • ic_sourceDetection: Detect 'best and brightest' sources on an exposure
  • ic_sourceMeasurement: Measure 'best and brightest' sources on an exposure
  • ic_psfDetermination: Given exposure and sources measured on that exposure, determine a PSF for that exposure
  • ic_apCorrect: Determine aperture correction
  • ic_wcsDetermination: Validate the WCS for the image using the astrometry.net package and calculate distortion coefficients
  • ic_wcsVerification: Compute statistics that indicate whether the calculated WCS is a good fit
  • ic_photocal: Calculate the magnitude zero point for an image by matching its SourceSet to the corresponding SourceSet from a reference catalogue
  • ic_output: Output measurements and SDQA metrics
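
Of the steps above, ic_photocal reduces to estimating a single number per exposure: the zero point that ties instrumental fluxes to catalogue magnitudes for the matched sources. A minimal, robust sketch (the median estimator and input layout are illustrative assumptions):

{{{#!python
import numpy as np

def magnitude_zero_point(inst_flux, cat_mag):
    """Estimate the zero point ZP such that mag = ZP - 2.5 * log10(flux),
    from matched instrumental fluxes and catalogue magnitudes.  The median
    keeps a few bad matches from biasing the result."""
    inst_flux = np.asarray(inst_flux, dtype=float)
    cat_mag = np.asarray(cat_mag, dtype=float)
    good = inst_flux > 0
    return float(np.median(cat_mag[good] + 2.5 * np.log10(inst_flux[good])))
}}}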

Pipeline: Single Frame Measurement

  • sfm_initialize: Acquire visit image data
  • sfm_sourceDetection: Detect all sources on an exposure
  • sfm_sourceMeasurement: Measure all sources on an exposure using aperture and PSF magnitudes, the "conventional" elliptical aperture and elliptical Gaussian codes, and Petrosian magnitudes
  • sfm_apCorrectApply: Apply aperture corrections to PSF magnitudes
  • sfm_computeSourceSkyCoords: Compute the sky coordinates of sources
  • sfm_output: Output source catalog and SDQA metrics
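
The aperture-correction step is a simple rescaling of the PSF fluxes onto the large-aperture system using the correction determined during image characterization. A schematic version (the multiplicative convention is an assumption for illustration):

{{{#!python
import numpy as np

def apply_aperture_correction(psf_flux, ap_corr):
    """Toy sfm_apCorrectApply: scale PSF fluxes by the (possibly
    position-dependent) aperture correction factor.  In magnitudes this is
    the additive offset m_corr = m_psf - 2.5 * log10(ap_corr)."""
    return np.asarray(psf_flux, dtype=float) * np.asarray(ap_corr, dtype=float)
}}}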

Pipeline: Source Association

  • sa_initialize: Acquire source catalog
  • sa_SourceClustering: Determine which sources belong to the same object
  • sa_SourceClusterAttributes: Characterize the objects
  • sa_output: Output object catalog and SDQA metrics
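
The clustering step groups per-visit sources whose positions agree to within a small match radius into a single object. A much-reduced sketch of that idea, using a friends-of-friends grouping with an O(n^2) pair search (the 1-arcsecond radius is illustrative; a real implementation would use a spatial index and handle RA wrap-around):

{{{#!python
import numpy as np

def cluster_sources(ra_deg, dec_deg, radius_arcsec=1.0):
    """Toy sa_SourceClustering: friends-of-friends grouping of sources by
    angular proximity.  Returns one cluster id per input source."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    n = len(ra)
    parent = list(range(n))

    def find(i):  # union-find root lookup with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    max_sep = np.radians(radius_arcsec / 3600.0)
    for i in range(n):
        for j in range(i + 1, n):
            # small-angle separation, adequate at arcsecond scales
            dra = (ra[i] - ra[j]) * np.cos(0.5 * (dec[i] + dec[j]))
            if np.hypot(dra, dec[i] - dec[j]) <= max_sep:
                parent[find(i)] = find(j)

    return np.array([find(i) for i in range(n)])
}}}

sa_SourceClusterAttributes would then aggregate quantities over each cluster (mean position, per-band fluxes, and so on) to build the object record.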

Other roadmap items beyond the PT1 Improvement Plan

(The items below are to be carried forward to future plans.)

PT2 or later

  • Sky pixelization
  • Multi-fit on stacks
  • Deblending