wiki:InfrastructureWGMeetingD20090914
Last modified on 09/15/2009 08:54:35 AM

We will be having our regular bi-weekly Infrastructure WG telecon today, Monday, September 14, at 12 Noon CT.

Agenda

This meeting will be focused on DC3b hardware requirements.

  • DC3b schedule
    • Feb: porting
    • Mar/Apr?: large cluster runs
  • TG allocations deadlines
    • requests due Sep15-Oct15 for Jan-Dec 2010
  • Updated the hardware platforms page
    • see link below
    • updated details on the development cluster, abe, lincoln
    • note: lincoln has nvidia GPUs
  • BW test machine?
  • New requirements page
    • see link below
    • adding/updating is encouraged

Links

Notes

Attendees: RP, CC, KT, Jacek, GDF, MF

Please correct any errors or omissions directly on this page!

  • DC3b schedule
    • Feb: porting
    • Mar/Apr?: large cluster runs
    • Everyone agreed this matches existing plans and expectations
  • TG allocations deadlines
    • requests due Sep15-Oct15 for Jan-Dec 2010
    • we are currently running on startup/friendly-user allocations. We will almost certainly need (or at least strongly prefer) an explicit allocation. MF will coordinate.
  • Updated the hardware platforms page
    • DC3Platforms
    • updated details on the development cluster, abe, lincoln
    • lincoln has nvidia GPUs
    • the GPUs are interesting not for the "DC3b large cluster runs" task, but for the "DC3b GPU testing" task.
    • other NCSA clusters (e.g. mercury, cobalt) go away in March 2010
  • BW test machine?
    • in response to Jeff's note some time ago, two ideas have surfaced for which Blue Waters would be required
      • a "correlation engine" running alongside the LSST science database, searching for correlations, patterns, trends, principal components, and other features across all combinations of science database parameters. Assuming 200 attributes in the science database (for each of 50 billion objects), there are about 20,000 pairs of attributes to correlate, about 1.3 million combinations of 3 attributes, about 65 million combinations of 4 attributes, etc. This quickly becomes a petascale computing challenge.
      • Running multi-fit across the whole image collection.
  • MF met with a BW manager to discuss this and the possible use of their prototype hardware for DC3b.
    • there is a PRAC allocation process in place
    • the next round of allocation proposals is due March 2010
    • the awards fund travel only (to collaborate more closely with the BW team). They do not fund labor to port, test, etc.
    • awardees will sign NDAs -- working with real information and access to BW simulators
    • the BW simulators predict BW performance, and are used to port and tune the science codes.
    • "provisional" allocation on BW when it comes up (provisional in the sense that intermediate testing may show that the science codes under consideration do not warrant running on BW).
    • BW is expected to go live in the summer of 2011
    • the BW team is hosting a BOF at SC09 to discuss the allocations and collaboration process
    • This topic is not applicable to DC3b. Neither the timeframes nor the nature of the work match up.
  • New requirements page
    • DC3bHardwareRequirements
    • contains straw man of key attributes required of the hardware environment
      • need to arrive at numbers for these
    • need to discuss DC3bSimInputData further to understand its implications better
      • are there intermediate milestones that can serve as checkpoints along the way?
    • need to split requirements between "large cluster runs" and "data serving" activities. They are separate and independent.
    • need to float the idea of not having direct database access during the cluster runs
      • this simplifies a lot from the infrastructure perspective but has system design and code implications
    • this is a high-priority issue to resolve
      • need lead time for allocations process
      • risk to DC3b results/deliverables
    • main areas
      • scope database requirements
      • space needs
      • cpu needs
    • will be discussed on the DC3 call
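
The attribute-combination counts quoted for the "correlation engine" idea under "BW test machine?" can be checked with binomial coefficients. The following is a minimal sketch, not part of the meeting discussion: it assumes the 200 attributes mentioned in the notes, and the function and variable names are hypothetical, chosen only for illustration.

```python
from math import comb  # available in Python 3.8+

# Attribute count assumed in the notes for the science database.
N_ATTRIBUTES = 200


def combination_counts(n_attributes, max_order):
    """Return {k: number of distinct k-attribute combinations} for k = 2..max_order."""
    return {k: comb(n_attributes, k) for k in range(2, max_order + 1)}


counts = combination_counts(N_ATTRIBUTES, 4)
for k, c in counts.items():
    print(f"{k} attributes: {c:,} combinations")
```

The counts grow by roughly a factor of 50 to 65 with each added attribute, and each combination would still have to be evaluated over the object catalog, which is why the notes describe this as a petascale problem.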