Aurora software development: Preparing profiling tools for exascale

In this series, we examine the range of activities and collaborations that ALCF staff undertake to guide the facility and its users into the next era of scientific computing.

Preparing for Aurora

JaeHyuk Kwack, computational scientist at the Argonne Leadership Computing Facility (ALCF), is responsible for ensuring the readiness of a number of major scientific applications for performant use on the U.S. Department of Energy’s (DOE) forthcoming exascale systems.

JaeHyuk Kwack is responsible for ensuring the readiness of a number of major scientific applications for performant use on forthcoming exascale systems. (Image: Argonne National Laboratory)

For a number of years, Intel has provided the Advisor and VTune performance profiling software for CPU-based Intel architectures. In advance of the delivery of the ALCF’s Intel-HPE exascale system, Aurora, in 2022, Intel has been extending those tools to its Xe GPU architecture. Because the Aurora software development kit (SDK) incorporates multiple major programming models (such as DPC++/SYCL, OpenMP target offloading for C, C++, and Fortran, OpenCL, Kokkos, and RAJA), the tools need to be tested with each of those programming models. Many important applications being developed under DOE’s Exascale Computing Project (ECP) integrate Intel’s optimized math libraries to maximize their performance; it is therefore crucial that Advisor and VTune are able to capture their performance characteristics seamlessly.

Employing principal ECP and ECP-proxy applications across a wide span of science domains, ranging from molecular structure systems to atmospheric boundary layer flow simulations, ALCF’s Kwack has validated performance data provided by Advisor and VTune on Aurora testbed systems. Through collaboration with ECP application developers and ALCF performance engineers, he has also provided Intel teams with feedback in the form of feature requests, reports of critical bugs, and comments on the tools’ user interfaces.

Roofline analysis

Kwack has recently focused on promoting Advisor’s roofline analysis feature. Roofline analysis measures an application’s performance characteristics and relates them to the peak performance achievable on the hardware. With knowledge of these characteristics, application developers can identify performance bottlenecks in their applications and optimize code for Aurora testbed systems (and eventually for the finalized Aurora system itself). Kwack leads roofline analysis tutorials, regularly updated to reflect the most current technologies and trends, at a variety of conferences and events, including SC, Aurora user training events, and the ECP Annual Meeting.
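
The reasoning behind a roofline chart can be sketched in a few lines of Python. This is a minimal illustration of the generic roofline model (attainable performance is bounded by the lower of peak compute throughput and arithmetic intensity times memory bandwidth), not a description of how Advisor is implemented. The function names and the hardware figures below are hypothetical values chosen for illustration; they are not Aurora or Ponte Vecchio specifications.

```python
def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(intensity: float, peak_gflops: float,
                      bandwidth_gbs: float) -> float:
    """Generic roofline bound: min(compute roof, memory roof)."""
    memory_roof = intensity * bandwidth_gbs  # GFLOP/s if limited by memory traffic
    return min(peak_gflops, memory_roof)


def classify(intensity: float, peak_gflops: float,
             bandwidth_gbs: float) -> str:
    """Label a kernel by which roof limits it."""
    if intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical device: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    PEAK, BW = 1000.0, 100.0
    # Hypothetical kernels with assumed arithmetic intensities (FLOP/byte).
    for name, ai in [("stream-like kernel", 0.25), ("dense matrix kernel", 40.0)]:
        bound = attainable_gflops(ai, PEAK, BW)
        print(f"{name}: AI={ai} FLOP/byte, bound={bound} GFLOP/s, "
              f"{classify(ai, PEAK, BW)}")
```

A kernel that falls well below its bound has room for optimization, and the classification suggests whether to focus on memory traffic or on compute throughput.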

Performance projection

Kwack’s recent efforts also include work on the performance projection feature of Advisor. Ponte Vecchio GPUs, the processors expected to power Aurora, are currently unavailable, but application developers still need estimates of how their applications will perform on the exascale system’s architecture. By using measurements taken on existing testbed systems to generate projections for the Ponte Vecchio GPUs, Advisor provides a systematic approach to estimating application performance on Aurora. Kwack has used an array of applications to provide several validation cases for the performance projection feature; on this front he continues to collaborate with the Intel team to develop improved and more reliable capabilities.

The collaborations with Intel tool developers also inform partnerships with colleagues at institutions such as Rice University and the University of Oregon, which aim to have tools including HPCToolkit and TAU ready for production science as soon as Aurora is deployed to users.