wiki:Summer2012/Deliverables
Last modified on 04/03/2012 09:40:17 PM

This page defines the criteria by which Summer 2012 activities are considered done.

Applications

For Summer 2012, we plan to deliver:

  • Detect on single-epoch ImSim and S82 images (with extended source photometry), with performance expected to be comparable to or better than SDSS.
  • Produce (at least) PSF-matched monochromatic S82 and ImSim co-adds (producing these may be a good starting point for further clarifying this deliverable)
  • Detect on co-adds and produce a catalog of detections
  • Perform forced photometry on single-epoch images based on detections on co-adds
  • Run Stack-Fit on ImSim and Stripe82 co-adds (with S82 as a stretch goal?)
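As a rough illustration of the PSF-matched co-add deliverable, a minimal sketch assuming Gaussian PSFs and scalar per-image variances (real matching kernels must be solved for per image; everything here is a simplification, not the stack's coadd code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def psf_matched_coadd(images, variances, psf_sigmas, target_sigma):
    """Convolve each epoch to a common (worst-seeing) Gaussian PSF,
    then combine with inverse-variance weights. Toy model: real PSFs
    are not Gaussian and the matching kernel must be solved for."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, var, sig in zip(images, variances, psf_sigmas):
        # kernel width needed to degrade sigma -> target_sigma (in quadrature)
        ksig = np.sqrt(max(target_sigma ** 2 - sig ** 2, 0.0))
        matched = gaussian_filter(img, ksig)
        w = 1.0 / var  # scalar per-image weight for simplicity
        num += w * matched
        den += w
    return num / den
```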

Activities:

  • Stage S82 at NCSA
  • Write butler mapper for S82
    • may have been done at UW
  • cameraGeom for SDSS
    • may have been done at UW
  • Run a weekly on S82
  • Improve background estimation
    • WHAT: Better spline fitting; possibly mask the ghosts. Resolve oversensitivity to bright sources/galaxies. Implement the Lupton-Hirata algorithm (described in Huff et al.) for background estimation on co-adds
  • Get extended source photometry working well on single exposures
    • If imsim is needed to get that done, get that first
    • Need something that works much better than what we have now
    • Must be competitive with sextractor
    • would be aided by progress on "interactive analysis package"
  • Co-addition of images
    • NOTES: Code appears to mostly exist; it needs some work (including some Butler changes)
    • Task: Import and look at HSC code ("driver" code)
    • Task: Write a co-add pipe_tasks script, and take all camera specific stuff out of pipe_tasks
  • Improve PSF estimation
    • WHAT: Determine the number of required PCA components and spatial structure
  • Ubercalibration of S82 outputs
    • WHAT: Needed to match the backgrounds on single exposures of S82 (and ImSim) before the co-adds are done. May be able to run Nikhil's ubercal code if in a bind.
  • Stack-Fit Port
    • WHAT: Complete LSST Stack-Fit port, including a pipe_task wrapper for it. Perform a null-test on ImSim. Verify it produces the same result as James Jee's IDL implementation on DLS. Stretch goal: run it on Stripe 82.
  • Prioritize the order of S82 processing
    • WHAT: Identify what runs+fields of S82 we will process first, to meet the AHM deadline. These should be the scientifically most interesting runs/fields (e.g., overlapping other surveys, with good seeing, etc.).
  • Detect on coadds
    • WHAT: Run source detection on co-adds
  • Forced photometry based on coadd detections
    • WHAT: Perform forced photometry on single-epoch S82 and ImSim data. Perform any required schema adaptations.
    • NOTE: Objects detected in a co-add of any band should have forced photometry performed in all bands
  • Produce RGB co-adds
    • WHAT: Produce a (properly coloured) RGB co-add of S82 and ImSim, for "quick-look" and publicity purposes.
  • GPU code benchmarks
    • WHAT: Install GPU-enabled pipeline on moya.dev.lsstcorp.org, benchmark GPU vs. CPU processCcd task. The same with warping. Run a mini-DRP (10,000 chips) on Forge and compare GPU vs. CPU speed and outputs. Benchmark speedups of co-add code when GPUs are used (and test equality of outputs).
  • Interactive analysis package
    • WHAT: We need an interactive analysis package that goes beyond pipe-QA and allows the developers to quickly open images, overlay PSFs, catalogs, explore them interactively, etc. RHL has an existing code-base that he will start with, and others will add to it on an as-needed basis. Possible cross-pollination with some parts of pipeQA or SUI.
  • Image subtraction
    • WHAT: Update the existing codebase to process a visit
    • NOTE: Needed for ... (?ask RHL?)
  • Improve PipeQA
    • Improve performance
    • Add functionality (what?)
    • Summarize results into traffic light or histograms
      • WHAT: More drill-down capabilities. More summary plots/stats/capabilities. Make it more configurable
  • Processing some of the Pan-STARRS data using LSST stack
    • WHAT: Develop the capability to process PS1 data using LSST stack
  • Add LSST-specific metadata to SDSS-derived output image headers
  • Relative astrometry
    • WHAT: Incorporate Monet or HSC relative astrometry code. Needed to do co-adds properly.
    • NOTE: Robert will talk to ZI on how to organize this
  • Test compression of output images
  • Various application infrastructure jobs
    • WHAT: whatever minor tasks come up (e.g., any fixes to pipe_tasks, etc.)
    • NOTE: It may be good to explicitly have a task like this to monitor how much time is spent here
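A toy sketch of the "improve background estimation" task: sigma-clip bright pixels, take binned medians, and fit a smooth bicubic spline. Ghost masking and the coadd-specific (Lupton-Hirata) handling are omitted; all parameter values are illustrative:

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def estimate_background(image, nbin=8, clip=3.0):
    """Mask bright sources by sigma-clipping, take robust per-cell
    medians on an nbin x nbin grid, then fit a bicubic spline."""
    med, std = np.median(image), np.std(image)
    masked = np.where(image > med + clip * std, np.nan, image.astype(float))
    ny, nx = image.shape
    grid = np.zeros((nbin, nbin))
    for i in range(nbin):
        for j in range(nbin):
            cell = masked[i * ny // nbin:(i + 1) * ny // nbin,
                          j * nx // nbin:(j + 1) * nx // nbin]
            grid[i, j] = np.nanmedian(cell)  # robust per-cell estimate
    ys = [(i + 0.5) * ny / nbin for i in range(nbin)]
    xs = [(j + 0.5) * nx / nbin for j in range(nbin)]
    return RectBivariateSpline(ys, xs, grid)(np.arange(ny), np.arange(nx))
```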
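For the PSF-estimation task, one simple way to "determine the number of required PCA components" is a cumulative explained-variance cut on star postage stamps; a sketch (the threshold value is illustrative, and the real code must also model spatial variation):

```python
import numpy as np

def n_pca_components(stamps, variance_goal=0.99):
    """Given star postage stamps flattened to rows of a matrix, return
    how many principal components capture `variance_goal` of the
    variance -- one way to pick the PSF model's component count."""
    X = stamps - stamps.mean(axis=0)          # center the stamps
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative variance
    return int(np.searchsorted(frac, variance_goal) + 1)
```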
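The ubercalibration task above amounts, at its core, to a simultaneous least-squares solve for per-exposure zero-points and per-star magnitudes from repeat detections. A toy version of that linear solve (Nikhil's actual ubercal code handles flat fields, errors, and outlier rejection; none of that is modeled here):

```python
import numpy as np

def solve_zeropoints(observations, n_exposures, n_stars):
    """Least-squares solve m_obs = m_true[star] + zp[exposure] from
    repeat detections; observations = list of (exp_id, star_id, mag).
    Fixes zp[0] = 0 to break the overall magnitude degeneracy."""
    rows, rhs = [], []
    for exp, star, mag in observations:
        row = np.zeros(n_exposures + n_stars)
        row[exp] = 1.0                   # zero-point term
        row[n_exposures + star] = 1.0    # true-magnitude term
        rows.append(row)
        rhs.append(mag)
    gauge = np.zeros(n_exposures + n_stars)
    gauge[0] = 1.0                       # gauge constraint: zp[0] = 0
    rows.append(gauge)
    rhs.append(0.0)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[:n_exposures], sol[n_exposures:]
```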
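The forced-photometry rule ("objects detected in a co-add of any band get photometered in all bands") can be illustrated with a toy aperture version; real forced photometry fits the PSF model at the fixed coadd position rather than summing an aperture:

```python
import numpy as np

def forced_photometry(band_images, positions, radius=3):
    """Sum flux in a fixed circular aperture at coadd-detected
    positions in every band, regardless of per-band detection."""
    ny, nx = next(iter(band_images.values())).shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    results = {}
    for band, img in band_images.items():
        fluxes = []
        for (y, x) in positions:
            aperture = (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
            fluxes.append(float(img[aperture].sum()))
        results[band] = fluxes
    return results
```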
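For the "properly coloured" RGB co-add task, the standard choice is the asinh stretch of Lupton et al. (2004), which preserves colour ratios while compressing dynamic range. A minimal sketch (stretch and Q values are illustrative):

```python
import numpy as np

def asinh_rgb(r, g, b, stretch=5.0, Q=8.0):
    """Map three co-adds onto an RGB cube using a luminance-based
    asinh stretch: each channel is scaled by asinh(Q*I/stretch)/(Q*I),
    where I is the mean intensity, so colour ratios are preserved."""
    intensity = np.maximum((r + g + b) / 3.0, 1e-12)
    scaled = np.arcsinh(Q * intensity / stretch) / (Q * intensity)
    rgb = np.stack([r * scaled, g * scaled, b * scaled], axis=-1)
    return np.clip(rgb, 0.0, 1.0)
```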
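The GPU benchmark task needs both timing and output-equality checks between CPU and GPU variants. A generic harness sketch (the `cpu_fn`/`gpu_fn` callables are placeholders, not actual pipeline entry points):

```python
import time
import numpy as np

def benchmark(fn, *args, repeat=5):
    """Return the best wall-clock time over `repeat` runs, plus the
    result, so outputs of two variants can be compared for equality."""
    best, result = float("inf"), None
    for _ in range(repeat):
        t0 = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best, result

def compare(cpu_fn, gpu_fn, *args, rtol=1e-5):
    """Check output equality and return the GPU speedup factor."""
    t_cpu, out_cpu = benchmark(cpu_fn, *args)
    t_gpu, out_gpu = benchmark(gpu_fn, *args)
    assert np.allclose(out_cpu, out_gpu, rtol=rtol), "outputs differ"
    return t_cpu / t_gpu
```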

Middleware

  • Refactor persistence
    • Move mapper configuration into repository using pex_config (2 weeks, KTL/DS)
    • Use pickle as interface to Boost serialization (1 week, KTL)
    • Move Formatter registry into Python (3 weeks, KTL/DS)
    • Add co-adds to butler (2 weeks, KTL)
  • Refactor logging
    • Improve performance of database ingest (SMM/JB/KTL)
      • Currently, the dynamic insertion of log events into the database is too slow. Investigate the bottlenecks and experiment with different approaches to speeding this up.
  • Refactor pipe_tasks
    • Collect requirements (???)
    • Redesign components (???)
    • Reimplement components (???)
  • Replace pex_policy with pex_config
    • Incorporate pex_config into pex_harness (SRP)
    • Incorporate pex_config into ctrl_sched (SRP)
    • Incorporate pex_config into ctrl_orca (SRP)
  • Automate file transfers to and from XSEDE (3 weeks, GD)
  • [DRP implementations]
  • Investigate efficiency of ingest from remote sites (GD)
    • Time ingest to lsst10 from remote XSEDE machine and LSST cluster to see whether we should do ingest remotely or transfer output files and ingest locally.
  • Decide which implementations from those below make sense (SRP/GD/KTL)
  • Implement DRP+coadds+AP(+ingest?) in pex_harness/ctrl_sched (SRP/GD)
    • Ingest only if reasonably efficient (see task above).
    • Coadds and AP may require modification to pipe task stage or new stage.
    • Coadds and AP have different events (datasets) since they execute over sky tiles (quite possibly of different sizes).
  • Implement DRP+coadds+AP(+ingest?) as Condor DAG (SRP/GD)
  • Implement DRP+coadds+AP(+ingest?) in NHPPS (GD/SRP)
  • Develop DRP mockup with resource consumers (SRP/GD)
    • The goal is to have an executable to replace the processCcdLsstSim.py pipe_task (and eventually coadd and AP tasks) with minimal dependencies on the LSST stack (ideally zero) that still has the same I/O and compute patterns.
    • The cost of developing the mockup must be balanced against the cost of making the LSST stack easy to port. If the latter can be done almost as cheaply, it is the better alternative.
  • Demonstrate hot fail-over (SRP)
    • Kill a running job; observe it being rescheduled.
    • Kill a running node; observe its tasks moving to other nodes.

Comment by srp: How do we expect to kill a running job? How do we expect to kill a running node? What component is expected to realize that the job got killed (pex_harness?) or that it needs to be killed? There are several different ways any of this can happen, so we need to clarify where this needs to be implemented so we can schedule resources appropriately. Estimate based on changes to the current pex_harness/ctrl_sched implementation only.
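The DRP mockup task above (a stand-in for processCcdLsstSim.py with the same I/O and compute pattern and no stack dependencies) could start as small as this sketch; all sizes and durations are hypothetical knobs to be calibrated against the real task:

```python
import os
import time

def mock_stage(input_path, output_path, read_bytes, compute_seconds, write_bytes):
    """Mimic a pipeline stage's shape: read some input, burn CPU for a
    fixed duration, write some output. Zero LSST-stack dependencies."""
    with open(input_path, "rb") as f:
        f.read(read_bytes)                     # input I/O pattern
    deadline = time.perf_counter() + compute_seconds
    x = 0
    while time.perf_counter() < deadline:      # CPU burn
        x += 1
    with open(output_path, "wb") as f:
        f.write(os.urandom(min(write_bytes, 1 << 20)))  # output I/O pattern
    return x
```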

  • Demonstrate cluster monitoring (SRP?)
    • List all jobs pending, running, completed, failed.
    • List all nodes busy, available, offline.
    • Optional visualization (by visit/CCD, by node).

Comment by srp: The mechanism we currently use to do visualizations needs to be looked at. Currently, it involves running a daemon that converts messages into a format that JavaScript can parse. The implementation is not all that clean, due to the limitations of JavaScript, and is fragile as-is. There is no way to retain any history information, so switching between pages for the 'Optional visualization (by visit/CCD, by node)' is problematic. The implementation will vary greatly between DRP methods, and really depends on what sort of "hooks" into the implementations we can get in order to issue appropriate messages so the visualization can take place. Estimate based on changes to ctrl_sched and to the current JavaScript implementation and its associated message daemon (and possibly on Greg making changes to pex_harness).

  • Evaluate OWL, produce comparison document (GD)
    • Based on documentation/source review only, execution not required.
    • Depends on sufficient documentation from Francesco.
  • Execute mini-productions in March, April, May (GD)
    • Use latest integration-tested tagged beta stack for each.
    • Prepare input data (10K CCDs or more), including latest ImSim results.
    • Use automated data transfer mechanism.
    • Run each mini-production on XSEDE including pipeQA.
    • Run at least monthly; more often if possible.
    • Each mini-production should be more and more automated, so that running one becomes push-button.

Image Data Management

  • FUSE filesystem access to REDDnet
    • needed for production run data access? is this a hard requirement?
  • GridFTP access to NFS disks
  • Benchmark Globus Online performance
  • Scaling up the image retrieval service
    • planning on REDDnet, but need a fallback position

Qserv and Beyond

  • Revisit SQL coverage (#1934) [3 weeks, JB]
  • Implement new master-worker dispatch messaging (#1863) [1 week DW]
  • Rework export directory structure for qserv worker (#1846) [2 weeks JB]
  • Cluster configuration and installation (#1935) [1 month DS]
  • Handle Ref*Match tables
    • Extend partitioner to handle Ref*Match tables (#1848) - [1 week SMM]
    • Integrate Ref*Match partitioning into qserv (#1933) - [2 weeks DW]
  • Improve error handling (#1861, #1847, #1895) [2 weeks JB]
  • Query management [5 weeks JB]
    • Master-side list of queued/running queries (#1920)
    • Safe-kill running queries on master (#1936)
    • Worker-side query interruption facility (#1937)
  • Performance/scale tests [3 weeks DS]
  • Porting qserv to lsst10 (#1938) [2 weeks DS]
  • Rewrite parser (#1939) [2 months, DW]
  • Fix architectural concurrency problems on master and worker (resource isolation, subchunk conflicts, thread mgmt)
    • setup test environment [1 week DS]
    • debug it [2 weeks, DW]

Non-qserv:

  • Merge spatial udfs (scisql) with JHU HTM (#1941) [2 weeks SMM]

[Legend: JB - Jacek, DW - Daniel, DS - Douglas, SMM - Serge]

Modeling

Note: include only level one bullets as project tasks; additional bullets provide SOW for time estimation.

  • UML DC model: Prepare EA:LSST_DM for next Data Challenge - Su2012
    • Copy baseline model to LSST_DM as Su2012
    • Reverse Engineer final source for W2012 into Su2012 Logical model
    • Apply EA add-on tool to remove programming language constructs
    • Review distilled diagrams and record any new objects which might be classified as Domain objects but which are missing from the baseline Domain model.
    • Provide Software Design Architect with list of potential new baseline Domain objects; he will determine if they should be added to the baseline Domain model.
  • UML baseline model: Update EA components changed/added (LOE)
    • Review baseline model to determine if components need further elaboration (especially MW)
  • UML baseline model: Flow down DM System Requirements into upper level baseline use cases (LOE)
  • UML baseline model: Generate specifications for User & System Integration tests (LOE)
    • Determine if DougR's tool to extract test specification from system requirements is portable to EA:SysML from EA:UML
      • if so,
        • arrange for it to be ported to EA_SysML
        • start extracting test specifications
    • Determine if a canned Test Management system is suitable
      • check EA's and Parasoft's test management "solutions".
  • Infrastructure sizing model: Incorporate updates since PDR

SQA Development

Note: include only level one bullets as project tasks; additional bullets provide SOW for time estimation.

  • Parasoft installation
    • Install raw tool -- step 1 should be done ASAP to satisfy our vendor agreement
    • Tune installation to conform to LSST build environment (SOW for Summer hire?)
      • integrate scons and eups into Eclipse build paradigm
    • Update previous DM C++ Parasoft rules to new DM C++ standards by adding, revising, removing as appropriate for the new Standards
  • Implement ABI checker (SOW for Summer hire?)
    • Find and check in depth one or more ABI checkers
      • Do a by-hand build of the ABI XML file for both a simple and a complex source package
      • Create ABI digests for builds of variant versions of the packages used to create the ABI XML files.
      • Perform an ABI check of the same source builds
      • Summarize the process and the ABI checker reports
    • Select the best ABI checker tool
    • Plan the method of integration into buildbot
      • management of ABI XML data (creation, fetch, vetting, etc.)
      • management of ABI digests (creation, fetch, mapping, etc.)
      • integrating the ABI check into the scons build process
    • Implement
  • Update standards documents on Trac and docushare (LOE)
    • Ensure all standards docs on Trac are represented in Docushare and referenced in the SDP.
    • Ensure all current policy statements on both Trac and Docushare are the same versions or that they follow the Policy for documents under revision.
    • Review DM standards docs for conformance to (emerging) SE standards.
  • Review of DM policies (LOE)
    • If the DM policy on Docushare documentation is unclear, revise it.
      • Elaborate on
        • the acceptable formats for Docushare;
        • whether or not deviation of format is acceptable between equivalent Docushare and Trac document;
        • the process allowing for transient deviation of content between Docushare & Trac during document content updating.
      • And, possibly, provide 2-way translation tools to move Documents between Docushare and Trac document formats.
    • Ensure all policy statements on Trac are represented in Docushare.
    • Ensure all current policy statements on both Trac and Docushare are the same versions or follow the documents-under-revision policy.
  • Master document to organize all documents (LOE)
    • Should be an Overview for DM team members
    • Should be at a lower-level than PDR document; i.e. more detailed
    • Will require editing and restructuring of Trac

Buildbot

Note: include only level one bullets as project tasks; additional bullets provide SOW for time estimation.

  • Make integration tests run as separate pseudo-daemon
    • Ensure invocation of drpRun does not block process exit of buildslave
  • Developers able to initiate builds via buildbot forms
    • Select and institute a buildbot authorization interface allowing selected users to initiate buildbot runs using the buildbot webform UI.
  • Build and Run Production Test against master or beta or stable tags
    • For a master build: compile & test all master packages, picking up any missing external dependencies from the beta:stable tags in the stack (now implemented, but not in an extensible manner that supports the following cases)
    • For a beta build: compile&test all beta-tagged packages, pick up any missing external dependencies from the stable tags in the stack.
    • For a stable build: compile&test all stable-tagged packages or exit in error if any external dependencies are unavailable.
    • Support tag manifest git extraction
  • On-demand integration tests
    • Provide a web UI allowing developers to configure a local NCSA cluster production run. This UI does not necessarily need to be implemented within a buildbot environment. For versatility, it should allow input of the stack manifest to use for the Run or use the latest {master / beta / released} stack generated by Buildbot.
  • Merge-triggering on well-known branches
    • implement a buildmaster interface which triggers a stack build and mini production run whenever an event occurs on a set of well-known branches such as: master, Winter2012x, 4.8.
      • implement in such a way that a buildslave managing a new branch's build history is simple to add and later retire.
      • test implementation by adding 4.10 to the set of well-known branches
      • Support branch builds (master, plus an additional well-known branch (e.g., S2012), and on-demand builds of any user-specified git branch)
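The master/beta/stable fallback logic described in the production-test bullets above can be sketched as a preference-ordered lookup; the data structures here are hypothetical stand-ins, not the actual buildbot/eups interfaces:

```python
def resolve_packages(packages, prefs, tagged):
    """For each package pick the first tag in `prefs` that provides it.
    E.g. master builds use ["master", "beta", "stable"], beta builds
    ["beta", "stable"], stable builds ["stable"]. `tagged` maps
    tag -> {package: version}. Raises KeyError when nothing matches
    (the stable-build "exit in error" case)."""
    resolved = {}
    for pkg in packages:
        for tag in prefs:
            if pkg in tagged.get(tag, {}):
                resolved[pkg] = (tag, tagged[tag][pkg])
                break
        else:
            raise KeyError(f"{pkg} unavailable under tags {prefs}")
    return resolved
```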

System Administration

  • ds33 replacement
    • requirements in flux right now
  • System monitoring
  • System/cluster configuration tools
  • Review backup tools and strategy

MOPS

  • XSEDE allocation on gordon (ember?)
  • Update MOPS requirements to satisfy NEO science
  • Write report of MOPS results to date

System Engineering

  • N-squared diagram for external subsystem interfaces
  • N-squared diagram for DM internal interfaces
  • Risk Registry Update
  • OSS Requirements Review
  • Common standards working groups
    • SDP
    • electrical
    • networking

Science User Interface

  • SUI Workshop
  • SUI use case expansion
  • Improved catalog and image access
    • Direct MySQL access
    • qserv access
    • Gator improvement
    • Firefly on subset at IPAC
    • Firefly on complete data at NCSA

Early Production Runs

  • XSEDE proposal due Apr. 15
  • Define input data for both sets of runs
    • Stripe82 more important than imsim for production
    • finish Stripe 82 in the second set of runs?
  • Define technical requirements for both sets of runs (compute time, storage req., img retrieval svc)
    • image storage requirements (TB, location and access mechanisms)
    • database storage requirement (TB)
  • Acquire new storage if necessary for both sets of runs
    • depends on requirements - but a background need is there for other reasons
  • Define SUI for S12 for both sets of runs
  • Integration testing on HPC platform (hopefully short due to mini-productions)
  • Production runs on HPC platform including data transfer and pipeQA
  • Load data into SUI systems (gator, etc.)
  • Data analysis of output data
  • Update DC handbook

Late Production Runs

  • Production runs on HPC platform including data transfer and pipeQA
  • Load data into SUI systems (gator, etc.)
  • Data analysis of output data
  • Update DC handbook