Understanding and Improving the I/O Behavior of Scientific Computing Applications

Shane Snyder, Argonne National Laboratory

I/O is a well-established pain point for many scientific applications running on high performance computing (HPC) systems. These systems deploy massive, state-of-the-art storage subsystems that applications use to manage their scientific data, typically through an increasingly deep and complex I/O software stack. This complexity poses a significant challenge to users, who are often ill-equipped to understand and improve their applications' I/O workloads, which in turn hampers system efficiency and scientific productivity.

In this talk, I will present Darshan, an I/O characterization and analysis tool commonly employed by application users, facility staff, and researchers at HPC centers across the world to better understand and improve storage access patterns. I will focus on recent advancements to Darshan's instrumentation capabilities and on recent efforts to develop a Python-based analysis framework for Darshan log data. I will also cover future directions for Darshan so that it may remain a fundamental tool for understanding HPC I/O as scientific computing continues to evolve.

Shane Snyder is a software engineer in the Mathematics and Computer Science Division of Argonne National Laboratory. He received his master's degree in computer engineering from Clemson University in 2013. His research interests primarily include the design of high-performance distributed storage systems and the characterization and analysis of I/O workloads on production HPC systems. Shane is a contributor to two R&D 100 Award-winning software projects: the Darshan I/O characterization tool (2018) and the Mochi data services project (2021).