Shortly after Mira, the ALCF’s new 10-petaflops supercomputer, entered production mode in April, Robert Moser and his University of Texas team initiated one of the first full-scale runs on the system, continuing their work with the largest-ever direct numerical simulations (DNS) of fluid dynamics.
Moser’s project involves investigating the physics of turbulent flow phenomena, and his team initially gained access to pre-production time on Mira through the ALCF’s Early Science Program (ESP). The ESP allocated time to 16 projects, both to help fine-tune Mira for science and to help researchers prepare their codes for the architecture and scale of the IBM Blue Gene/Q supercomputer.
“The ESP was essential in preparing our INCITE project to hit the ground running,” said Moser, professor and deputy director of the Institute for Computational Engineering and Sciences (ICES) at the University of Texas at Austin. “Mira’s Blue Gene/Q architecture possesses a unique hardware and software stack that required us to develop and modify our code to fully leverage the system capabilities. The ESP gave us sufficient lead time to propagate the lessons learned from porting compute kernels to these machines into our production code before jumping into our INCITE allocation.”
In the short time since Mira went live, Moser and his colleagues, namely ICES researcher Nicholas Malaya and PhD student Myoungkyu Lee, have used more than 55 percent of their INCITE allocation of 175 million core-hours. At this rate, they expect to exhaust the remaining 78 million core-hours over the next few months. This is in addition to the 60 million core-hours the team was awarded through the ESP.
Manipulating Turbulence for More Energy-Efficient Transportation
A substantial fraction of the energy a vehicle consumes as it moves through air or water goes to overcoming drag and is dissipated by turbulence. The same forces are at work when air or liquid moves through ducts or pipes. It is estimated that 20 percent of world energy consumption is traceable to such turbulence and the energy it dissipates.
The goal of Moser’s work is to use Mira to perform DNS of high Reynolds number (the dimensionless ratio of inertial forces to viscous forces) fluid flow to examine the complex physics of wall-bounded turbulence. This phenomenon is at the heart of the interaction between solid surfaces (vehicles or pipes) and the fluid flowing past them.
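As a rough illustration of the ratio described above, the Reynolds number is commonly written as Re = U L / ν, where U is a characteristic velocity, L a characteristic length, and ν the kinematic viscosity of the fluid. The sketch below is a minimal example, not part of the team's code; the function name and the sample values (air at roughly room temperature, ν near 1.5e-5 m²/s) are assumptions chosen only for illustration.

```python
def reynolds_number(velocity, length, kinematic_viscosity):
    """Return the dimensionless Reynolds number Re = U * L / nu.

    velocity            -- characteristic flow speed U, in m/s
    length              -- characteristic length L, in m
    kinematic_viscosity -- nu, in m^2/s (must be positive)
    """
    if kinematic_viscosity <= 0:
        raise ValueError("kinematic viscosity must be positive")
    return velocity * length / kinematic_viscosity


# Illustrative values only (not from the article): air at roughly room
# temperature has a kinematic viscosity near 1.5e-5 m^2/s.
re_air = reynolds_number(velocity=30.0, length=1.0, kinematic_viscosity=1.5e-5)
print(f"Re = {re_air:.3g}")
```

A larger value means inertial forces dominate viscous ones, which is the regime in which wall-bounded flows become turbulent.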
“While we have descriptive characterizations of these effects, we do not have a sufficiently detailed understanding of the mechanisms to allow them to be manipulated,” Moser said. “Our goal is to fill this gap in understanding so that new tools might be developed for the prediction and manipulation of wall-bounded turbulence, and the reduction of drag.”
Once the simulations are complete, the resulting data will provide insights necessary for the development of more accurate turbulence models. Ultimately, this work could lead to more energy-efficient transportation through the design of improved vehicle surfaces and reduced-drag piping and ducts.
Mira Enables Largest DNS for Science to Date
Of particular interest to Moser’s team is the overlap region, where the viscous near-wall turbulence interacts with the outer-layer turbulence. The region is currently not well understood because simulations to date have not allowed for a high enough Reynolds number to obtain the scale separation needed to shed light on the complexity of this multi-scale turbulent structure.
However, due to the substantial power of Mira, the researchers believe the DNS currently being performed have a high enough Reynolds number to generate sufficient scale separation.
The Blue Gene/Q’s architecture, coupled with the team’s software performance increases, is allowing them to run simulations at Reτ = 5200 on a 15360 x 1536 x 11520 mesh. Given the mesh size, this is believed to be the largest production scientific DNS ever conducted.
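To give a sense of scale, the mesh dimensions quoted above multiply out to roughly 2.7 × 10^11 points. The sketch below works through that arithmetic and estimates the storage for a single scalar field. The 8-bytes-per-point figure is an assumption (one double-precision value per point); the article does not say how the team's code stores its data, and the helper names are hypothetical.

```python
NX, NY, NZ = 15360, 1536, 11520  # mesh dimensions quoted in the article
BYTES_PER_VALUE = 8              # assumption: one double-precision real per point


def grid_points(nx, ny, nz):
    """Total number of mesh points in an nx x ny x nz grid."""
    return nx * ny * nz


def field_terabytes(points, bytes_per_value=BYTES_PER_VALUE):
    """Storage for one scalar field, in decimal terabytes (1 TB = 1e12 bytes)."""
    return points * bytes_per_value / 1e12


points = grid_points(NX, NY, NZ)
print(f"{points:,} mesh points")
print(f"{field_terabytes(points):.2f} TB per scalar field")
```

Under that assumption, a single field occupies a little over 2 TB, and a simulation carries several such fields, which is why a run of this size needs to be spread across a machine on the scale of Mira.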
“Currently, the most complete sequence of cases includes turbulent channels in the friction-Reynolds-number range Reτ = 180−2000, which now constitute the standard reference data set in the field,” Moser said. “Supplementing these data with Reτ = 5200 from our simulations will establish a reference data set that will remain useful for the turbulence research community for many years.”
Developing Code to Take Advantage of Mira’s Unique Capabilities
During the ESP period, Moser’s team was able to port their code to Blue Gene/Q systems with little effort. However, they soon realized that their code needed some work to make the most of the Blue Gene/Q architecture. Their old code had a fixed global communication pattern and its data structures were not well suited for multi-threading.
The researchers attended ESP workshops and worked with their ALCF project catalyst Ramesh Balakrishnan and his colleagues, notably Jeff Hammond, to overcome software issues and better understand the performance metrics while developing their code. With this information, Moser’s PhD student, Myoungkyu Lee, wrote the new channel code from scratch to exploit the hardware features in the Blue Gene/Q architecture.
At the core level, the team achieved efficient memory access by unrolling loops by hand and fusing loops to improve cache reuse. At the node level, implementing OpenMP threading with loop fusion techniques allowed them to maximize the size of threaded blocks. Mira’s threading, cache, and memory characteristics enabled the researchers to employ a hybrid OpenMP/MPI model that takes advantage of the natural concurrency in their algorithm. This provided flexibility in on-node execution, resulting in efficient management of cache and execution threads. By minimizing memory access across OpenMP threads, they achieved near-perfect OpenMP scalability (99 percent). At the system level, they were able to take full advantage of the 5D torus network by replacing the existing library for 3D global Fast Fourier Transforms (P3DFFT) with a new library they developed using the communication routines of the FFTW 3.3 library. All of this work resulted in a 2x performance increase compared to the old code.
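The loop transformations mentioned above can be illustrated with a small sketch. It is not the team's code, which the article does not show; the function names and the update u + dt * v are hypothetical stand-ins for a real kernel. Python is used only to make the transformation easy to read: in an interpreted language the speedup does not appear, whereas in compiled code the fused version avoids writing and re-reading a temporary array, which is where the cache benefit comes from.

```python
def unfused(u, v, dt):
    """Two separate passes: a temporary array is written, then read again."""
    n = len(u)
    tmp = [0.0] * n
    for i in range(n):            # pass 1: scale v by dt
        tmp[i] = dt * v[i]
    out = [0.0] * n
    for i in range(n):            # pass 2: add the scaled values to u
        out[i] = u[i] + tmp[i]
    return out


def fused(u, v, dt):
    """Loop fusion: one pass over the data and no temporary array."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        out[i] = u[i] + dt * v[i]
    return out


def fused_unrolled(u, v, dt):
    """Fused loop, unrolled by hand by a factor of four, with a tail loop."""
    n = len(u)
    out = [0.0] * n
    main = n - n % 4              # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        out[i] = u[i] + dt * v[i]
        out[i + 1] = u[i + 1] + dt * v[i + 1]
        out[i + 2] = u[i + 2] + dt * v[i + 2]
        out[i + 3] = u[i + 3] + dt * v[i + 3]
    for i in range(main, n):      # remaining 0 to 3 elements
        out[i] = u[i] + dt * v[i]
    return out


u = [float(i) for i in range(10)]
v = [2.0 * i for i in range(10)]
print(fused(u, v, 0.5) == unfused(u, v, 0.5) == fused_unrolled(u, v, 0.5))
```

All three functions perform the same arithmetic on each element, so they return identical results; only the number of passes over memory and the loop overhead differ.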
Balakrishnan has been impressed with the team’s successful approach of crafting code to take full advantage of Mira’s capabilities.
“Rather than working within the narrow confines of a pre-existing code, built on top of a pre-existing library, Moser’s team did a systematic paper and pencil analysis of the governing equations and underlying algorithms,” Balakrishnan said. “They identified those kernels in the code that could benefit from the hardware features in Mira, and developed a strategy to exploit the same to effect a significant improvement in performance.”
“The fact that much of this work was done by a graduate student (Lee) should serve as an example to other students and should help reinforce the truism that software engineering is an integral part of research in computational science,” he added.