wiki:InfrastructureWGMeetingD20100426
Last modified on 07/23/2010 10:18:06 AM

We will be having our regular biweekly Infrastructure WG telecon on Monday, April 26, at 12:00 noon CT (10:00 AM PT).

Agenda

For this meeting, we have special guest speakers from a company called ScaleMP. Their vSMP architecture aggregates multiple x86 systems into a single virtual x86 system, delivering an industry-standard, high-end symmetric multiprocessor (SMP) computer.

http://www.scalemp.com

This is potentially interesting to us for the following reasons:

  • Run distributed-memory (MPI) pipelines and shared-memory pipelines at the same time
  • Eliminate the need for a parallel filesystem
  • Easily reprovision/reconfigure the hardware to accommodate different workloads
    • Run in an I/O-balanced configuration for nightly processing
    • Run in a "memory-intensive" configuration (e.g. 64 cores with all main memory) for MOPS (for example)
    • Run in a third configuration for Annual DR Production

The purpose of this meeting is to introduce this technology to the team and gather feedback. It is an exploratory discussion.

The agenda for the meeting on Monday will be ~30 minutes of presentation by our guests, followed by Q&A.

Notes

Attendees: RayP, TomH, BillB, DanielW, KTL, TerryS, MikeF, Jeff Stemler (ScaleMP), Nir Paikowsky (ScaleMP)

  • presenters from ScaleMP
    • Jeff Stemler, channel development manager, formerly SGI
    • Nir Paikowsky, director of application engineering
    • contact info available from MikeF
  • ScaleMP
    • founded 2003
    • shipping product since 2006
    • 150 production deployments
  • virtualization used to aggregate, as opposed to partitioning (e.g. VMware, Xen)
  • many individual servers connected with an InfiniBand interconnect (10G Ethernet support coming soon)
  • aggregates all resources: processors/cores, disk drives, NICs
  • creates a single system with lots of cores, lots of memory, lots of disk drives
    • most deployments run CentOS or RHEL, but any Linux is supported
    • can use the most current CPUs (unlike proprietary SMP machines such as Altix)
  • we get a large SMP server for the price of a distributed memory cluster using commodity hardware
  • live demo during call
    • saw live system with 600GB RAM
    • 128 cores
    • 16 blades
    • system had 768GB of physical memory
      • 10GB for vSMP
      • 10% (118GB) for cache to achieve high performance
  • disk
    • all disk drives are visible to the single system
    • can combine in any way, such as software RAID, etc.
      • this is why a parallel filesystem is not needed
    • can get 600-800MB/s aggregate bandwidth to storage by striping individual 100MB/s drives
  • can leverage ramfs for scratch filesystems to boost performance
  • maximum of 64TB of RAM will be supported by vSMP later this year
  • accelerates I/O for sequential streaming applications too
  • underlying hardware flexibility
    • SGI (and other SMP vendors) use proprietary interconnects -- these have pros and cons, but they are costly for vendors to port to newer hardware, so the hardware options are fewer and older
    • ScaleMP can leverage all the latest (best) hardware: CPUs, interconnects, etc.
    • newer hardware could mean better power and cooling characteristics as well
  • can we mix and match hardware? yes
    • different disk drives, yes; different amounts of memory, yes
    • the only exception is mixing CPU speeds, but vSMP can designate slower CPUs as "memory nodes" or "disk/storage nodes" and run processes on the fastest CPUs to optimize the resulting system
  • cache coherency
    • 4K cache lines
    • efficiency = 1 - (access * latency)
      • can't do anything about latency
      • vSMP good at optimizing/minimizing accesses
    • smarter algorithms with every release of ScaleMP software, so system performance can increase over time without buying new hardware
  • trades backplane latency for redundant RAM
  • Often, the hardest part of performance debugging/tuning is finding the problem.
  • ScaleMP provides system-wide profiling tool for finding bottlenecks and tuning the app
  • consulting service available from ScaleMP as well
  • MPI apps run equally fast on a vSMP system
  • vSMP Foundation for Cloud
    • can aggregate on-the-fly
    • works with any job scheduler (PBS Pro, etc.); open interface
    • scenario:
      • hardware: say, 1000 nodes, 48GB/node, InfiniBand
      • launching job consists of
        • taking (say) 10 blades from the pool
        • aggregating them into a single SMP machine with 480GB
        • booting the O/S
        • running the application (such as MOPS or DR Production)
  • could it be used as database server(s)?
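The cloud-provisioning scenario above boils down to simple pool arithmetic. A minimal sketch, assuming a scheduler that carves blades out of a shared pool; the helper name `aggregate` is hypothetical and not part of any ScaleMP API, though the pool figures (1000 nodes, 48GB/node, 10 blades) come from the notes:

```python
# Sketch of the on-the-fly aggregation scenario from the notes.
# `aggregate` is a hypothetical helper, not a real ScaleMP/vSMP call.

def aggregate(pool_nodes: int, node_mem_gb: int, blades: int) -> dict:
    """Take `blades` nodes from the pool and report the resulting vSMP system."""
    if blades > pool_nodes:
        raise ValueError("not enough free nodes in the pool")
    return {
        "remaining_pool": pool_nodes - blades,
        "total_mem_gb": blades * node_mem_gb,  # e.g. 10 blades x 48GB = 480GB
    }

vm = aggregate(pool_nodes=1000, node_mem_gb=48, blades=10)
print(vm["total_mem_gb"])    # 480
print(vm["remaining_pool"])  # 990
```

After the job finishes, the blades would return to the pool for the next aggregation.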
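The cache-coherency efficiency heuristic quoted above (efficiency = 1 - access * latency) can be illustrated numerically. The access rates and latency below are made-up values for illustration only; the point, per the notes, is that latency is fixed by hardware while software releases can reduce accesses:

```python
# Illustration of the efficiency heuristic from the notes:
#   efficiency = 1 - (access * latency)
# where `access` is the rate of remote cache-line accesses and `latency`
# is their relative cost. All numbers below are invented for illustration.

def efficiency(remote_access_rate: float, relative_latency: float) -> float:
    return 1.0 - remote_access_rate * relative_latency

# Latency is fixed by the interconnect; a smarter software release that
# cuts the remote access rate raises efficiency on the same hardware.
before = efficiency(remote_access_rate=0.05, relative_latency=10.0)
after = efficiency(remote_access_rate=0.02, relative_latency=10.0)
print(round(before, 2), round(after, 2))  # prints: 0.5 0.8
```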

  • several very nice benefits from the system administration perspective that were not discussed during the call