Ticket #795 (closed new functionality: fixed)

Opened 10 years ago

Last modified 7 years ago

Framework for mapping chunks of data to slice workers

Reported by: bbaker
Owned by: daues
Priority: normal
Milestone:
Component: pex_harness
Keywords:
Cc: rplante
Blocked By:
Blocking:
Project: LSST
Version Number:
How to repeat:

not applicable

Description

Create a mechanism in the harness that allows a pipeline to assign data to a slice worker based on the worker's host, so that we can control data locality and improve efficiency.

Change History

comment:1 Changed 10 years ago by bbaker

  • Status changed from new to assigned

comment:2 Changed 10 years ago by bbaker

  • Status changed from assigned to inTicketWork

Currently, data is assigned based on MPI rank, which gives us little control over data locality and hence over efficiency. We would like to be able to choose a strategy for our data assignments.

Instead of assigning by rank, we would like to map chunks of data to compute nodes in a way that is optimized for a particular pipeline. One way to do that is to map each hostname to a data ID, which the slice worker can use to fetch its data from the file system. The mapping needs to be configurable. Here's our current design:

  1. Add to the Stage interface a method that receives a data ID (which will be a PropertySet for now).
  2. Implement a mechanism for the pipeline to query for hostnames (gather) and broadcast data IDs (scatter) via MPI.
    • Slice tells pipeline its compute node hostname
    • Pipeline uses mapper to implement assignment strategy
    • Pipeline sends data ID back to slice
    • Slice uses data ID to ask for CCD, Amp IDs, etc.
  3. Create a Mapper interface that the pipeline can use to map hostnames to data IDs. The pipeline will instantiate it based on a class name given in its policy file; each Mapper implementation will have its own policy file schema. Create a simple Mapper implementation.
  4. Slices pass data ID to all of their stages.
    • Currently, the clipboard is emptied at start; should data ID be preset each time the clipboard is initialized by the slice?
  5. Alter the implementation of lsst.pex.harness.IOStage.InputStage to:
    • Get the Data ID
    • Use it to determine CCD, Amplifier ID, etc.
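
The gather/map/scatter flow in the design above can be sketched in plain Python. This is only an illustration under stated assumptions, not the pex_harness implementation: the names Mapper, HostLocalMapper, Slice, set_data_id, and distribute are hypothetical, a dict stands in for the PropertySet data ID, an in-memory dict stands in for the mapper's policy file, and ordinary list operations stand in for the MPI gather and scatter.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Mapper(ABC):
    """Hypothetical interface: maps slice hostnames to data IDs."""

    @abstractmethod
    def assign(self, hostnames):
        """Return one data ID (a dict here) per slice, in slice order."""


class HostLocalMapper(Mapper):
    """Simple strategy: give each slice a chunk stored on its own host.

    `chunks_by_host` plays the role of the mapper's policy file: it lists,
    per hostname, the data IDs whose files are local to that host.
    """

    def __init__(self, chunks_by_host):
        self.chunks_by_host = chunks_by_host

    def assign(self, hostnames):
        used = defaultdict(int)  # chunks already handed out per host
        data_ids = []
        for host in hostnames:
            local = self.chunks_by_host.get(host, [])
            index = used[host]
            # None means no local chunk is left for this slice.
            data_ids.append(local[index] if index < len(local) else None)
            used[host] += 1
        return data_ids


class Slice:
    """Stand-in for a slice worker running on a compute node."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.data_id = None

    def set_data_id(self, data_id):
        # In the design, the slice would pass this on to all of its stages.
        self.data_id = data_id


def distribute(slices, mapper):
    # Gather: each slice reports its hostname to the pipeline.
    hostnames = [s.hostname for s in slices]
    # Map: the pipeline applies the configured assignment strategy.
    data_ids = mapper.assign(hostnames)
    # Scatter: the pipeline sends each slice its data ID.
    for s, data_id in zip(slices, data_ids):
        s.set_data_id(data_id)


# Assumed example layout: which chunks live on which host.
policy = {
    "node01": [{"ccd": 0, "amp": 0}, {"ccd": 0, "amp": 1}],
    "node02": [{"ccd": 1, "amp": 0}],
}
slices = [Slice("node01"), Slice("node02"), Slice("node01")]
distribute(slices, HostLocalMapper(policy))
assigned = [s.data_id for s in slices]
```

Keeping the strategy behind a single assign method means a pipeline could swap in a different Mapper by class name without changing the gather and scatter steps.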

comment:3 Changed 10 years ago by bbaker

Note: Currently, mapping is done in ctrl_dc3pipe/pipeline/IPSD/01-...

comment:4 Changed 10 years ago by bbaker

  • Blocking 951 added

comment:5 Changed 10 years ago by bbaker

  • Blocking 951 removed

comment:6 Changed 9 years ago by mfreemon

  • Owner changed from bbaker to daues

comment:7 Changed 8 years ago by daues

  • Status changed from inTicketWork to inStandardsReview

For the pex_harness package (which is now free of MPI dependence) the job office performs the mapping of data to workers.

comment:8 Changed 8 years ago by daues

  • Status changed from inStandardsReview to closed
  • Resolution set to fixed

comment:9 Changed 7 years ago by robyn

  • Milestone DC3b MW Pipeline Harness Extensions deleted

