Preparing Aurora for science on day one

Argonne’s upcoming 2 exaflops supercomputer, Aurora, will leverage several technological innovations to support machine learning and data science workloads alongside more traditional modeling and simulation runs. (Image: Argonne National Laboratory)

To celebrate Exascale Day, we take a look at some of the efforts underway to prepare applications and software for the ALCF's Aurora exascale supercomputer.

When the supercomputer Aurora comes online at the U.S. Department of Energy’s (DOE) Argonne National Laboratory, it will be one of the fastest systems in the world, with a theoretical peak performance of greater than 2 exaflops.

However, the process of planning and preparing for a new leadership-class supercomputer takes years of collaboration and coordination. It requires partnerships with vendors and the broader high-performance computing (HPC) community to test and develop hardware and software components, validating that their performance and functionality meet the needs of the scientific computing community.

Argonne researchers contribute to a broad range of activities to prepare Aurora for science on day one. To ensure key software is ready to run on the exascale system, scientists and developers continue work to port and optimize dozens of scientific computing applications. With access to the Aurora software development kit and early hardware, teams strive to improve the performance and functionality of various codes, frameworks, and libraries using the programming models that will be supported on Aurora.

Here's a look at some of the work being carried out to ready the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility at Argonne, and its user community for science in the exascale era.

Preparing Aurora for scientific visualization

As ALCF’s Joseph Insley explains, often the best—and sometimes the only—way to make sense of the unprecedented amounts of data HPC resources and scientific instruments now produce is to transform the information into images and visualizations.

Furthermore, as the scale of compute resources continues to grow, the rate at which data can be computed has greatly outpaced the rate at which that data can be saved to disk or tape. While this disparity is not unique to exascale, the gap between what is computed and what can be stored will continue to widen, resulting in lost data and lost science. To address this challenge, the ALCF’s Visualization and Data Analysis team has devoted significant effort to enabling in situ visualization and analysis on Aurora—that is, performing visualization and analysis while the simulation is still running and the data are still in memory, before they are discarded.
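To illustrate the concept, here is a minimal, hypothetical Python sketch (not the ALCF's actual in situ tooling): instead of writing every timestep's full field to disk, the simulation loop reduces the field in memory to a few summary statistics and keeps only those.

```python
import math
import random

def simulate_step(state):
    """Toy stand-in for one simulation timestep: perturb the field in place."""
    return [x + random.uniform(-0.01, 0.01) for x in state]

def in_situ_analyze(state, step):
    """Reduce the full in-memory field to a few summary statistics."""
    n = len(state)
    mean = sum(state) / n
    rms = math.sqrt(sum(x * x for x in state) / n)
    return {"step": step, "mean": mean, "rms": rms}

def run(n_steps=100, n_cells=10_000, analysis_interval=10):
    random.seed(0)
    state = [math.sin(i / n_cells * 2 * math.pi) for i in range(n_cells)]
    summaries = []
    for step in range(n_steps):
        state = simulate_step(state)
        if step % analysis_interval == 0:
            # Only the small summary survives; the full field is never
            # written out, so I/O no longer limits what can be analyzed.
            summaries.append(in_situ_analyze(state, step))
    return summaries

summaries = run()
```

The trade-off is that any quantity not computed in situ is lost with the discarded data, which is why deciding what to analyze while the simulation runs is itself a research question.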

Enabling machine learning capabilities

ALCF computer scientists Bethany Lusch and Murali Emani lead efforts to ready crucial machine learning and data science features for use on Aurora. Their work includes preparing the Intel data analytics programming library oneDAL and the open-source Python machine learning package scikit-learn.

Argonne teams provide input and feedback to Intel engineers, helping to prioritize various aspects of development. Other challenges include enabling distributed implementation across multiple GPUs and at full-scale on Aurora, and facilitating interoperability with other data science libraries.
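The pattern behind such distributed implementations can be illustrated with a small, hypothetical Python sketch (plain Python, not oneDAL or scikit-learn itself): each "device" reduces its shard of the data to a few partial sums, and only those small partials are combined globally.

```python
def partition(data, n_workers):
    """Split the dataset into near-equal shards, one per (hypothetical) device."""
    k, r = divmod(len(data), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        end = start + k + (1 if i < r else 0)
        shards.append(data[start:end])
        start = end
    return shards

def local_moments(shard):
    """Each device reduces its shard to three partial sums."""
    return len(shard), sum(shard), sum(x * x for x in shard)

def distributed_mean_var(data, n_workers=4):
    """Combine the small per-device partials into global statistics; only the
    partials, never the raw shards, cross device boundaries."""
    partials = [local_moments(s) for s in partition(data, n_workers)]
    n = sum(p[0] for p in partials)
    s1 = sum(p[1] for p in partials)
    s2 = sum(p[2] for p in partials)
    mean = s1 / n
    return mean, s2 / n - mean * mean

data = [float(i) for i in range(1, 9)]
mean, var = distributed_mean_var(data)  # mean 4.5, variance 5.25
```

The result is independent of how many workers the data is split across, which is what allows the same algorithm to scale from one GPU to a full machine.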

Software development and testing

Readying new computing systems involves lots of porting, compiling, testing, and evaluating—not just applications, but libraries, modules, and frameworks as well.

ALCF computational scientist Abhishek Bagusetty discusses his research to support Exascale Computing Project work in the application development domain, spanning computational fluid dynamics, domain-specific languages, and molecular simulations of materials.

Porting deep learning software to explore fusion energy

Exascale systems stand to enable fusion researchers to train increasingly large-scale deep learning models able to predict with greater accuracy the onset of plasma instabilities in tokamak reactors such as ITER. The increased processing and predictive powers of exascale will permit more exhaustive hyperparameter tuning campaigns that in turn can lead to better-optimized configurations for the AI models.
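A hyperparameter tuning campaign is, at heart, many independent training runs over a grid of configurations. The following hypothetical Python sketch (the toy `train_and_score` function stands in for an expensive deep learning training run) shows why such campaigns parallelize naturally across an exascale machine.

```python
import itertools
import math
import random

def train_and_score(learning_rate, batch_size):
    """Hypothetical stand-in for an expensive training run that returns a
    validation score. A real campaign would train a deep learning model here."""
    rng = random.Random(hash((learning_rate, batch_size)))
    # Toy response surface that peaks near lr = 1e-3, batch_size = 256.
    base = (0.9
            - 0.05 * abs(math.log10(learning_rate) + 3)
            - abs(batch_size - 256) / 5120)
    return base + rng.uniform(-0.005, 0.005)

# Every combination in the grid is an independent trial, so at scale the
# trials can run concurrently across many nodes.
grid = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [128, 256, 512]}

def grid_search(grid):
    best_score, best_params = None, None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = train_and_score(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

score, params = grid_search(grid)
```

More compute lets researchers enlarge the grid, or replace it with smarter search strategies, without changing this basic structure.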

ALCF computational scientist Kyle Felker leads efforts to port FusionDL—the primary application of the Aurora Early Science Program project “Accelerated Deep Learning Discovery in Fusion Energy Science,” which uses AI methods to improve predictive capabilities and mitigate large-scale disruptions in burning plasmas in tokamak systems—to Aurora.

Optimizing a computational fluid dynamics solver

Researchers use the computational fluid dynamics solver NekRS—a GPU-oriented successor to the Nek5000 application, built on the open-source, vendor-neutral OCCA parallel programming framework and library—for mission-critical DOE problems such as simulating coolant flow inside small modular reactors.

Kris Rowe, a computational scientist at the ALCF, leads efforts to deploy NekRS on Aurora.

Bringing the PETSc library to Aurora

Argonne software engineer Junchao Zhang leads a team of researchers working to prepare PETSc, a math library for the scalable solution of scientific models described by partial differential equations, for use on the nation's exascale supercomputers, including Aurora.

As researchers from both science and industry seek to generate increasingly high-fidelity simulations and apply them to increasingly large-scale problems, PETSc stands to directly benefit from the advances of exascale computing power.
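To give a flavor of what "scalable solution of PDE models" means, here is a deliberately tiny Python sketch: a Jacobi iteration for a 1-D Poisson problem. It is a toy analogue, not PETSc's API; PETSc provides far more sophisticated Krylov and multigrid solvers that scale the same basic idea, repeated local updates of a discretized field, across thousands of nodes.

```python
def jacobi_poisson_1d(f, n_iters=5000):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 by Jacobi iteration
    on a uniform grid of len(f) interior points."""
    n = len(f)
    h = 1.0 / (n + 1)
    u = [0.0] * n
    for _ in range(n_iters):
        # Each point is updated from its neighbors; in a parallel solver
        # these updates are distributed across processes.
        u = [0.5 * ((u[i - 1] if i > 0 else 0.0)
                    + (u[i + 1] if i < n - 1 else 0.0)
                    + h * h * f[i])
             for i in range(n)]
    return u

# Constant forcing f = 2 has the exact solution u(x) = x * (1 - x).
n = 31
u = jacobi_poisson_1d([2.0] * n)
```

Plain Jacobi converges slowly as grids grow, which is precisely why libraries like PETSc, with preconditioners and multigrid methods, matter at exascale problem sizes.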

==========

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science
