Blasting Through the 10 PFlops Barrier with HACC on the BG/Q

Salman Habib
Seminar

Remarkable observational advances have established a compelling cross-validated model of the Universe.
Yet, two key pillars of this model -- dark matter and dark energy -- remain mysterious.
Sky surveys that map billions of galaxies to explore the `Dark Universe' demand a corresponding
extreme-scale simulation capability; the HACC (Hybrid/Hardware Accelerated Cosmology Code) framework
has been designed to deliver this level of performance now and into the future. With its novel algorithmic
structure, HACC allows flexible tuning across diverse architectures, including accelerated and multi-core
systems.

On the IBM BG/Q, HACC attains unprecedented scalable performance -- currently 13.94 PFlops at 69.2% of peak
and 90% parallel efficiency on 1,572,864 cores with an equal number of MPI ranks, and a concurrency of 6.3
million. This level of performance was achieved at extreme problem sizes, including a benchmark run with more
than 3.6 trillion particles, significantly larger than any cosmological simulation yet performed. The
largest-ever cosmological simulation science run is currently underway on Mira.
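
The performance figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of HACC. The hardware parameters (1.6 GHz clock, 8 double-precision flops per cycle per core, 4 hardware threads per core) are assumptions taken from publicly documented BG/Q specifications; they are not stated in this abstract.

```python
# Assumed BG/Q hardware parameters (NOT stated in the abstract).
CLOCK_HZ = 1.6e9                 # core clock frequency
FLOPS_PER_CYCLE_PER_CORE = 8     # 4-wide double-precision fused multiply-add
THREADS_PER_CORE = 4             # hardware threads per core

# Figures quoted in the abstract.
CORES = 1_572_864
SUSTAINED_PFLOPS = 13.94

# Theoretical peak of the full machine, in PFlops.
peak_pflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE_PER_CORE / 1e15

# Sustained performance as a fraction of that peak.
fraction_of_peak = SUSTAINED_PFLOPS / peak_pflops

# Total hardware threads if every core runs all of its threads.
concurrency = CORES * THREADS_PER_CORE

print(f"peak: {peak_pflops:.2f} PFlops")
print(f"fraction of peak: {fraction_of_peak:.1%}")
print(f"hardware threads: {concurrency:,}")
```

Under these assumptions the peak comes to about 20.13 PFlops, so 13.94 PFlops is 69.2% of peak, and 1,572,864 cores with four threads each give 6,291,456 hardware threads, consistent with the quoted concurrency of 6.3 million.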