Acceleration of an Asynchronous Message-driven Runtime System over MPI

Ralf Gunter Correa Carvalho
Seminar

The recent increase in the number of supercomputer architectures and networks poses a challenge to runtime developers, who want to achieve the best possible performance on each machine. The target runtime of this paper, Charm++, may employ any of a myriad of network-specific APIs for handling communication, which are usually promoted as being faster than its catch-all MPI module. Such a performance difference not only causes development effort to be spent on tuning vendor-specific APIs but also discourages hybrid Charm++/MPI applications. We investigate this disparity across several machines and applications, ranging from small InfiniBand clusters to Blue Gene/Q supercomputers and from synthetic benchmarks to large-scale biochemistry codes. We also demonstrate two features from the recent MPI-3 standard that can bridge this gap, and discuss what can be done today with MPI-2.

Bio: Ralf is a research associate at the University of Chicago, working in Dr. Pavan Balaji's group, whose current focus is on optimizing runtime systems that sit on top of MPI. He was formerly a member of the Parallel Programming Laboratory at UIUC, where he worked on network optimizations for NAMD and Charm++.