wiki:DC3bMiddlewareTasks
Last modified on 10/16/2009 06:44:58 PM

Middleware: Tasks, Issues, and Actions for DC3b Planning

Tasks

* Early delivery considered important for continuous integration by application developers

1. Miscellaneous Middleware

  1. Complete support for validated policy files [Baker, 2.4 wks]*
    1. Finish support for policy validation
    2. Document the recommended pattern for loading and using policies (a sketch of one possible pattern follows this list)
    3. Update existing policy use (in application code) to follow the recommended pattern
  2. Logging: automatically attach additional properties to log messages [Plante, 1 wk]*
    1. Develop and document usage patterns for ensuring the attachment of needed properties (see the logging sketch after this list).
    2. Update pex_logging and pex_harness to support this pattern.
    3. Update existing use of logging to use this pattern.
  3. Create factory-based interface to persistence [Lim (see PersistenceRedesign)]*
  4. DC3b should do concept verification of the ideas in the fault tolerance document, including FT-MPI, and produce a separate deliverable; this work is not expected to be incorporated into the harness that produces the DC3b science outputs. [Daues, 3wk]
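
A minimal sketch of the kind of loading-and-validation pattern item 1 asks us to document: read a policy file, fill in defaults from a dictionary, and reject unknown or mis-typed parameters before any application code sees them. The function and exception names (and the JSON file format) are illustrative assumptions, not the pex_policy API or the pattern we will actually recommend.

{{{
#!python
import json

class PolicyError(Exception):
    """Raised when a policy fails validation against its dictionary."""

def load_policy(policy_path, defaults):
    """Load a policy (here, a plain dict parsed from a JSON file) and validate it."""
    with open(policy_path) as f:
        policy = json.load(f)
    # Fill in defaults for parameters the caller did not set, and type-check the rest.
    for name, spec in defaults.items():
        policy.setdefault(name, spec["default"])
        if not isinstance(policy[name], spec["type"]):
            raise PolicyError("parameter %r must be of type %s"
                              % (name, spec["type"].__name__))
    # Reject parameters that the dictionary does not know about.
    unknown = set(policy) - set(defaults)
    if unknown:
        raise PolicyError("unrecognized parameters: " + ", ".join(sorted(unknown)))
    return policy

# Application code asks for a validated policy once, up front, instead of
# reading raw parameters throughout the stage.  The file name is hypothetical.
defaults = {
    "nIterations": {"type": int,   "default": 3},
    "threshold":   {"type": float, "default": 5.0},
}
# policy = load_policy("myStagePolicy.json", defaults)
}}}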
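For item 2, the pattern of attaching run-level properties to every log message can be pictured with Python's standard logging module: a LoggerAdapter carries the extra properties so individual log calls never have to name them. This is only an illustration of the usage pattern; pex_logging's own mechanism (and the property names) will differ.

{{{
#!python
import logging

# Illustration only: stamp run-level properties (runId, sliceId) on every
# message a stage emits, without each log call supplying them.
logging.basicConfig(format="%(runId)s slice=%(sliceId)s %(levelname)s: %(message)s")
base = logging.getLogger("pipeline.stage")

def stage_logger(run_id, slice_id):
    """Return a logger that attaches run_id and slice_id to every record."""
    return logging.LoggerAdapter(base, {"runId": run_id, "sliceId": slice_id})

log = stage_logger("rlp0130", 4)          # runId value is hypothetical
log.warning("detection threshold not set; using default")
# -> rlp0130 slice=4 WARNING: detection threshold not set; using default
}}}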

2. Pipeline Harness and Orchestration

  1. Update UML and documentation to reflect suggested usage of Stage code for LSST-specific control, with science in non-LSST-specific Python functions. [Plante, 0.2wk; Done, see summary]*
  2. Re-factor the Harness to: [Daues, 0.5wk]*
    • separate the MPI-specific parts of the harness implementation into a new package
    • re-factor the Stage class into two subclasses, one executing in the Pipeline and one executing in the Slices
  3. Implement proper pipeline shutdown mechanisms [Daues, 1wk]*
  4. Create new IO stage(s) optimized for "interactive" launching of pipelines [Pietrowicz, 1wk]*
  5. Re-factor Orca code to: [Pietrowicz, 3wks]*
    • incorporate lessons of DC3a (e.g. PlatformRun-level database configuration)
    • enable safer and more extensible interface
    • incorporate on-going pipeline monitoring
    • support interactive and command-line launching
  6. Improve support for simple, interactive harness execution [Daues, 2wks]
    1. Improve SimpleStageTester, extending it to handle multiple (simple) slices and stages in one core (no MPI)
    2. Support simplified MPI-based pipelines
  7. Advanced Inter-slice communication support
    1. Complete flexible mapping of data to slices [Baker, 1wk]
    2. Implement policy-configured model for logical access to inter-slice shared data (Cross-talk or ap driven) [Daues, 2wk]
  8. Provide a means of documenting, in a machine-readable format, the required inputs (needed on the input clipboard) and outputs (that will be placed on the clipboard) for a stage (one possible shape is sketched after this list). ??
  9. Create the capability for "freeze-drying" clipboard state at entry to any stage (a serialization sketch follows this list)
    1. Design [Lim, 1wk]
    2. Support policy-driven and/or programmatic over-ride control in pex_harness [Daues, 1wk]
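
One possible shape for the machine-readable input/output declarations asked for in item 8: each stage carries a small specification of the clipboard keys it reads and writes, which a checker (or Orca) can walk before a pipeline runs, giving the "automated validation of pipelines" mentioned under Issues. Every name below (the spec attributes, the stage names, the checker) is a hypothetical sketch, not an agreed design.

{{{
#!python
# Hypothetical per-stage clipboard I/O declarations.
class IsrStageSpec:
    requires = {"rawExposure", "biasExposure", "flatExposure"}   # read from the clipboard
    provides = {"calibratedExposure"}                            # placed on the clipboard

class DetectStageSpec:
    requires = {"calibratedExposure"}
    provides = {"sourceSet"}

def validate_pipeline(stage_specs, initial_keys):
    """Statically check that every stage's required inputs will be present."""
    available = set(initial_keys)
    for spec in stage_specs:
        missing = spec.requires - available
        if missing:
            raise ValueError("%s is missing inputs: %s"
                             % (spec.__name__, ", ".join(sorted(missing))))
        available |= spec.provides
    return True

validate_pipeline([IsrStageSpec, DetectStageSpec],
                  initial_keys={"rawExposure", "biasExposure", "flatExposure"})
}}}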
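The "freeze-drying" of item 9 amounts to serializing the clipboard contents on entry to a stage so that a failure can be reproduced offline, which supports the debugging and minimal-repackaging goals listed under Design Points. A minimal sketch using pickle, with the snapshot switchable per stage (the function names and file-name convention are assumptions):

{{{
#!python
import pickle

def freeze_dry(clipboard, run_id, stage_name, enabled=True):
    """Snapshot the clipboard (a plain dict here) to disk before the stage runs."""
    if not enabled:          # the over-ride would come from policy or from code
        return None
    path = "%s-%s-clipboard.pickle" % (run_id, stage_name)
    with open(path, "wb") as f:
        pickle.dump(clipboard, f)
    return path

def rehydrate(path):
    """Reload a snapshot so the stage can be re-run interactively, e.g. under pdb."""
    with open(path, "rb") as f:
        return pickle.load(f)
}}}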

3. Event Infrastructure

  1. Expose a typed Event API to the event handling framework (sketched after this list) to: [Pietrowicz, 1wk]*
    • automatically tag events with type-specific properties
    • enable finer control over event listening, including selection based on runId
      (subsumes "Publish/Receive? runId-specific data trigger events")
  2. Provide event monitor scripts that look for simple node/pipeline failures. [Pietrowicz, 1wk]
  3. Provide higher-level, python-based, declarative interface for easier definition of event monitor scripts. [Pietrowicz, 3wk]
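
The "typed Event API" of item 1 can be pictured as event classes that automatically stamp type-specific properties onto the published message, plus receivers that select on those properties (including runId). The sketch below is purely illustrative and is not the ctrl_events interface; all class and property names are assumptions.

{{{
#!python
class PipelineEvent(object):
    TYPE = "pipeline"

    def __init__(self, run_id, **props):
        # Type-specific properties are attached automatically.
        self.properties = {"eventType": self.TYPE, "runId": run_id}
        self.properties.update(props)

class LogEvent(PipelineEvent):
    TYPE = "log"

class ShutdownEvent(PipelineEvent):
    TYPE = "shutdown"

class Receiver(object):
    """Deliver only events whose properties match the given selector."""
    def __init__(self, **selector):
        self.selector = selector

    def matches(self, event):
        return all(event.properties.get(k) == v for k, v in self.selector.items())

# Listen only for shutdown events belonging to one run: the finer-grained,
# runId-specific selection the task calls for.  The runId value is hypothetical.
recv = Receiver(eventType="shutdown", runId="rlp0130")
assert recv.matches(ShutdownEvent("rlp0130", exitCode=0))
assert not recv.matches(LogEvent("anotherRun"))
}}}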

Totals

  • Plante: 1.2 wks*
  • Baker: 2.4* + 1 = 3.4 wks; / 50% = 4.8* + 2 = 6.8 wks
  • Daues: 2.5* + 8 = 10.5 wks; / 50% = 5* + 16 = 21 wks
  • Pietrowicz: 5* + 4 = 9 wks
  • Lim: 1 wk

Issues

  1. Tasks 6-9 are most easily supported when built on Tasks 4 and 5.
  2. The design for Task 4 needs good documentation of the capabilities we want it to support (e.g. automated validation of pipelines), even if we do not implement support for all of those capabilities in DC3b.

Actions

  1. Update Harness UML model to reflect separated components

Design Points

  1. It was resolved that the pipeline harness, from a design point of view, would explicitly be a Python-based tool; C++ is used expressly for interprocess communication.
  2. The goals of tasks 3-9 are:
    • Be able to debug a stage running in a pipeline via pdb/gdb.
    • Be able to easily package up the minimum portion of a pipeline required to re-create a failure.
    • Be able to manually run a pipeline from the command line on one node (without MPI or an event broker) or multiple nodes (with MPI but possibly without an event broker) with minimal configuration in order to produce science outputs.
    • Be able to substitute another communication and coordination mechanism for MPI.
  3. Task 13: It was resolved that Stage instances should contain the application code that is specific to LSST processing, including clipboard and policy manipulation. This code should call Python functions that are not specific to LSST and that perform the actual science processing (see the sketch below).
  4. Task 14: Since the two types of Stage execute in different environments, do not share all member variables, and have different methods, it was felt best to split them into two separate classes.
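
Design points 3 and 4 together suggest a structure like the one below: a Slice-side Stage subclass does the LSST-specific clipboard and policy plumbing and delegates the science to a plain Python function that knows nothing about LSST. The class and method names are assumptions for illustration, not the pex_harness interface.

{{{
#!python
def measure_background(pixels):
    """Science code: ordinary Python with no clipboard, policy, or LSST types."""
    # Placeholder computation standing in for a real background estimate.
    return sum(pixels) / float(len(pixels))

class SliceStage(object):
    """Base class for the per-Slice half of a split Stage (Task 14)."""
    def __init__(self, policy):
        self.policy = policy

class BackgroundStage(SliceStage):
    """LSST-specific glue (Task 13): pull inputs off the clipboard, call the
    science function, and put the result back on the clipboard."""
    def process(self, clipboard):
        pixels = clipboard[self.policy.get("inputKey", "calibratedExposure")]
        clipboard["backgroundModel"] = measure_background(pixels)
        return clipboard

# A clipboard is just a dict in this sketch.
stage = BackgroundStage({"inputKey": "calibratedExposure"})
out = stage.process({"calibratedExposure": [1.0, 2.0, 3.0]})
}}}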