
Platforms Available for Executing DC3

DC3b: details are under construction; see DC3bHardwareRequirements.

DC3a: execution will take place on the platforms listed below.

The baseline runs for the DC3 pipeline will be executed on the LSST Development Cluster at NCSA, using a MySQL database hosted on lsst10.ncsa.uiuc.edu. The advantage of the LSST cluster is the availability of significant amounts of long-term, large-scale data storage. Because the cluster has relatively few nodes, we will run the pipeline at a comparatively small scale.
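
For orientation, pipeline processes on the cluster nodes reach this database over the network. A minimal sketch of such a connection is shown below; the database name, credentials, and query are placeholders for illustration, not the actual DC3 schema.

{{{#!python
import MySQLdb  # classic mysql-python bindings; any MySQL client library would do

# Connect to the DC3 database host (user, password, and database are placeholders).
conn = MySQLdb.connect(host="lsst10.ncsa.uiuc.edu",
                       user="dc3_user",
                       passwd="********",
                       db="dc3a")
cur = conn.cursor()
cur.execute("SHOW TABLES")          # placeholder query
for (table,) in cur.fetchall():
    print(table)
conn.close()
}}}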

The LSST Development Cluster will also be used, in preparation for DC3, for some preprocessing of the data -- i.e., splitting multi-extension FITS (MEF) images into individual CCD-level images and adjusting their metadata accordingly (a sketch of this step is shown below).
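
As an illustration of that preprocessing step (not the actual DC3 code), the following sketch splits an MEF exposure into one file per image extension. The CCDNUM keyword and the output naming scheme are assumptions, and all extensions are assumed to be image HDUs.

{{{#!python
from astropy.io import fits  # pyfits was the contemporary equivalent

def split_mef(mef_path, out_pattern="ccd_{:02d}.fits"):
    """Write one FITS file per image extension of an MEF exposure."""
    with fits.open(mef_path) as hdus:
        # Carry the original primary header along with each CCD image.
        primary = fits.PrimaryHDU(header=hdus[0].header.copy())
        for i, hdu in enumerate(hdus[1:], start=1):
            ext = fits.ImageHDU(data=hdu.data, header=hdu.header.copy())
            ext.header["CCDNUM"] = (i, "CCD index within the original MEF")  # assumed keyword
            fits.HDUList([primary, ext]).writeto(out_pattern.format(i), overwrite=True)
}}}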

Other platforms listed here -- UW, Abe, HECToR -- will be used for short-term runs to explore how the pipeline behaves at a higher level of parallelism than the LSST Development Cluster provides.

LSST SC08 and Unstructured Data Management (including data transfers):

Pipelines
Two platforms will be configured for SC08-related pipeline runs: the LSST Development Cluster at NCSA and the HECToR cluster at Edinburgh.
LSST Data Grid
Five sites -- NCSA, SDSC, Edinburgh, IN2P3, and Chile -- will act as the data centers (or simply storage servers) during this experiment. These sites are connected to form the LSST Data Grid, powered by iRODS servers. (More information on the in-kind hardware provided by each site will be added here.) A central database will be used for this data grid during SC08 and DC3; later, during construction, we plan to move to distributed/federated databases. Data transfer tests of the TCP-based iRODS data movement protocol and the UDP-based protocol (RBUDP) are planned as part of SC08 and DC3 (see the sketch below).
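
A rough sketch of how such a transfer test might be scripted around the iRODS icommands follows. The iRODS collection path is a placeholder, and the -Q option for selecting RBUDP is an assumption to be checked against the deployed iRODS version; this is not the SC08/DC3 test plan itself.

{{{#!python
import subprocess

def iput(local_path, irods_path, use_rbudp=False):
    """Copy a local file into the data grid, optionally requesting RBUDP."""
    cmd = ["iput", "-f"]              # -f: overwrite an existing data object
    if use_rbudp:
        cmd.append("-Q")              # assumed option selecting the UDP-based RBUDP transfer
    cmd += [local_path, irods_path]
    subprocess.check_call(cmd)

# Time the same transfer over the default TCP path and over RBUDP (paths hypothetical):
# iput("raw/v1234.fits", "/lsstZone/home/dc3/raw/v1234.fits")
# iput("raw/v1234.fits", "/lsstZone/home/dc3/raw/v1234.fits", use_rbudp=True)
}}}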

LSST Development Cluster at NCSA

Current as of September 2009

Processing (Nodes 1-4)
Architecture/OS: i386; RHEL 4; kernel 2.6.9
Number of Nodes: 4
Processor Type: Xeon 3.6 GHz, dual dual-core (2 nodes)
Xeon 3.6 GHz, single dual-core (2 nodes)
Memory: 4 GB
Processing (Nodes 5-10)
Architecture/OS: Intel 64 x86_64; RHEL 5; kernel 2.6.18
Number of Nodes: 6
Processor Type: Xeon E5335 @ 2.0 GHz, dual quad-core
Memory: 4 GB (nodes 5-9); 16 GB (node 10)
Storage
Local Disk: 20 - 60 GB
Shared Disk: 1.5 TB (SAN /scr); 6.9 TB (SAN /lsst); 3.6 TB (Lustre /lustre)
Interface: 2Gb Fiber Channel
Networking
Interconnect/Bandwidth: Gigabit Ethernet

NCSA Intel 64 Cluster: Abe

Current as of September 2009

Processing
Architecture/OS: Intel 64 x86_64; RHEL 4; kernel 2.6.18
Number of Nodes: 1200 (9600 cores)
Processor Type: Dell PowerEdge 1955 dual quad core Intel Xeon E5345 @ 2.33 GHz; 2x4 MB L2 cache
Memory: 8 GB (600 nodes) = 1 GB per core; 16 GB (600 nodes) = 2 GB per core
Storage
Local Disk: n/a
Shared Disk: Lustre: 400 TB
Interface/Bandwidth:
Networking
Interconnect: InfiniBand
Notes
Generally, we should expect little or no persistent storage; input data will need to be staged prior to execution and the results cached to mass storage afterward.
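
A rough illustration of that stage-in / stage-out pattern follows; the directory names and the pipeline launch command are placeholders, not part of the Abe configuration.

{{{#!python
import os, shutil, subprocess

ARCHIVE = "/path/to/mass/storage/dc3"                       # placeholder mass-storage area
SCRATCH = os.path.join(os.environ.get("SCRATCH", "/tmp"), "dc3_run")

def stage_in(filenames):
    """Copy input data from mass storage to scratch before the run."""
    os.makedirs(SCRATCH, exist_ok=True)
    for name in filenames:
        shutil.copy(os.path.join(ARCHIVE, name), SCRATCH)

def stage_out(results_dir):
    """Cache results back to mass storage after the run."""
    shutil.copytree(results_dir, os.path.join(ARCHIVE, os.path.basename(results_dir)))

# Hypothetical usage:
# stage_in(["v1234.fits"])
# subprocess.check_call(["runPipeline.sh", SCRATCH])        # placeholder pipeline launch
# stage_out(os.path.join(SCRATCH, "output"))
}}}
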
References

http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64Cluster/TechSummary/

NCSA Intel 64 Tesla Cluster: Lincoln

Current as of September 2009

Processing
Architecture/OS: Intel 64 x86_64; RHEL 4; kernel 2.6.18
Number of Nodes: 192 (1536 cores)
Processor Type: Dell PowerEdge 1950 dual quad core Intel Xeon Exxxx @ 2.33 GHz; 2x6 MB L2 cache
GPU: 96 NVIDIA Tesla S1070 Accelerator Units
Each server is connected to 2 Tesla GPUs via PCI-e Gen2 x8 slots
Memory: 16 GB per node = 2 GB per core
Storage
Local Disk: n/a
Shared Disk: Lustre: 400 TB (shared with Abe)
Interface/Bandwidth:
Networking
Interconnect: InfiniBand
Notes
Generally, we should expect little or no persistent storage; input data will need to be staged prior to execution and the results cached to mass storage afterward.
References

http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64TeslaCluster/TechSummary/

UW Astro Cluster

Processing
Architecture/OS: Xeon / CentOS 5.1
Number of Nodes: 128 nodes x 2 quad-core processors = 1024 cores
Processor Type: Xeon quad-core, 2.33 GHz
Memory: 1 GB per core
Storage
Local Disk: n/a
Shared Disk: see below
Interface/Bandwidth:
/share/data1    # FiberChannel 6.3TB -- Phys: 2.5TB, INT: 2.5TB, Astro: 1TB
/share/scratch1 # FiberChannel 1.6TB -- ALL (files older than 2 weeks are wiped immediately once utilization reaches 80%)
/share/scratch2 # FiberChannel 1.6TB -- ALL (files older than 2 weeks are wiped immediately once utilization reaches 80%)
/share/sdata1   # SATA 11TB -- Phys 2.5TB, INT 2.5TB, Astro: 6TB
/share/sdata2   # SATA 2.7TB -- CENPA 1TB, eScience 1.7TB
Networking
Interconnect: InfiniBand
Notes
LSST will need to share the Astro storage with other UW astronomy projects, although the amount available to us is likely to be negotiable. Generally, we should expect little or no persistent storage; input data will need to be staged prior to execution and the results cached to mass storage afterward.

HECToR, Edinburgh (proposed)

Processing
Architecture: Cray XT4
OS: Unicos/lc. It consists of two main components: Compute Node Linux for the compute nodes and a fully featured Linux distribution for the service nodes. The service nodes include the login, I/O and system nodes. The Compute Node Linux kernel deployed on the compute nodes is a stripped down version of Linux, designed to be extremely lightweight to limit the number of interruptions to the compute tasks by the operating system.
Number of Nodes: 5664 (one dual-core processor per node; 11,328 cores total)
Processor Type: 2.8 GHz dual core AMD Opteron
Memory: 6 GB main memory per processor. AMD's HyperTransport technology is used to connect the processors and their memory. In single node (SN) mode, all of this memory is available to a single compute task, leaving the second core idle. In virtual node (VN) mode, compute tasks are placed on both cores of the processor; the memory is then shared between the two cores, giving 3 GB of memory to each compute task.
Storage
Local Disk:
Shared Disk: There are 576 TB of high-performance RAID disk. The service deploys the Lustre distributed parallel file system to access the disks.
Networking
Interconnect: The processors are connected with a high-bandwidth interconnect using Cray SeaStar2 communication chips. The SeaStar2 chips are arranged in a three-dimensional mesh of dimensions 20 x 12 x 24. Each dual-core Opteron has its own private SeaStar2 chip, directly connected to the processor's HyperTransport system.
Notes

http://www.hector.ac.uk/support/documentation/userguide/hectoruser/Architecture_Overview.html


Platform Available for DC3 Database Scalability Testing

Brookhaven Cluster

Processing
Architecture/OS: Scientific Linux 4.4 + patches
Number of Nodes: 4
Processor Type: 2.6 GHz Xeon
Memory: 8 GB
Storage
Local Disk: 4 TB
Shared Disk: a few TB
Networking
Interconnect: ?

Note that this cluster is behind the BNL firewall; a BNL account/login is required to access it.


Platform Template

Processing
Architecture/OS:
Number of Nodes:
Processor Type:
Memory:
Storage
Local Disk:
Shared Disk:
Networking
Interconnect: