Last modified on 07/18/2008 07:07:01 AM

Fault Tolerance Workshop

A workshop will be held at SLAC on July 15 and 16, 2008, to discuss how to incorporate fault tolerance into the architecture, design, implementation, and schedule of the LSST middleware, and how fault-tolerance responsibilities are shared between the middleware and both the infrastructure and the applications.

Location

Kavli Building (051), Room 222, SLAC.

Dinner location TBD based on interest.

Attendees

  • SLAC: Jacek Becla, Gregory Dubois-Felsmann, Kian-Tat Lim, Steffen Luitz
  • NCSA: Greg Daues, Ray Plante
  • IPAC/Applications: Russ Laher
  • Audio-only: Jeff Kantor, Deborah Levine, Francesco Pierfederici

Goal

Prepare a draft document describing a fault-tolerant system architecture, high-level designs for fault-tolerant components, and a plan for implementing them. After consultation and review, the document will be presented at PDR, and its plan will be incorporated into the middleware and application development schedules for DC3 and DC4.

Agenda

July 15, 2008

| Time (PDT) | Topic | Homework |
| --- | --- | --- |
| 09:30-09:45 | Welcome and logistics | |
| 09:45-10:30 | Middleware/infrastructure and middleware/application interfaces | Strawman proposal - KTL: FaultToleranceInterfaces |
| 10:30-11:00 | Explicit science requirements | Summary - GDF |
| 11:00-11:15 | DC3 teleconference | |
| 11:15-12:00 | Derived requirements | Summary - GDF |
| 12:00-13:00 | Lunch | |
| 13:00-13:30 | Failure types | Summary - JB: FaultToleranceUseCases |
| 13:30-14:30 | Use cases (specific combinations of failure types) | Summary - JB: FaultToleranceUseCases |
| 14:30-14:45 | Break | |
| 14:45-15:45 | "Peer" philosophy and implications | Proposal/Discussion - KTL/RP: FaultToleranceStrategies |
| 15:45-16:45 | "Master" philosophy and implications | Proposal/Discussion - KTL/RP: FaultToleranceStrategies |
| 16:45-17:00 | Summary/Wrap-up | |
| 18:00-20:00 | Dinner | |

July 16, 2008

| Time (PDT) | Topic | Homework |
| --- | --- | --- |
| 09:00-09:15 | Gathering/Review | |
| 09:15-10:30 | Possible combinations of "Peer" and "Master" | Proposal/Discussion - KTL/RP: FaultToleranceStrategies |
| 10:30-10:45 | Break | |
| 10:45-12:00 | Final decisions on architecture/design | |
| 12:00-13:00 | Lunch | |
| 13:00-14:00 | Impact on algorithms | |
| 14:00-14:30 | Impact on pipeline manager/orchestration | |
| 14:30-15:00 | Impact on pipeline harness/framework | |
| 15:00-15:15 | Break | |
| 15:15-15:45 | Impact on pipeline stages | |
| 15:45-16:45 | Implementation plan and schedule | |
| 16:45-17:00 | Summary/Wrap-up/Assignments | |

Notes From Meeting