Mochi in Practice: Data Services for High-Energy Physics and Elastic In Situ Visualization Workflows

Matthieu Dorier, Argonne National Laboratory
Webinar

This seminar is the second of a two-part series on Mochi. The first part, presented by Phil Carns, introduced the Mochi project and how a collection of composable building blocks could be used to build specialized HPC data services. In this talk, we will dive into the Mochi methodology of HPC data service “composition” and focus on two of its success stories: the HEPnOS storage system for high-energy physics (HEP), and the Colza elastic in situ visualization service.

Traditional HEP workflows exchange data between tasks via files. This method incurs a large I/O overhead and forces tasks to work at file granularity, hindering scalability. To address these problems, Argonne and Fermilab developed HEPnOS in the context of the SciDAC "HEP on HPC" project. HEPnOS is a Mochi-based distributed storage system aggregating storage capacity from compute nodes and providing an API that works on native HEP data. A HEPnOS-based version of the NOvA workflow showed an order of magnitude higher throughput than its file-based counterpart on the Theta supercomputer. We will also show how AI was used to autotune and optimize this workflow.

Colza is an in situ analysis framework developed in the context of the SciDAC "RAPIDS2" collaboration with Rutgers University. Unlike existing in situ frameworks, Colza can dynamically change its number of nodes to accommodate varying workloads, while still relying on state-of-the-art libraries such as Catalyst and Ascent for its rendering pipelines. It does so by replacing MPI with a Mochi-based, elastic communication mechanism in the aforementioned libraries. Colza proved that in situ visualization can be done under time constraints by adapting resource usage.

Bio: Matthieu Dorier

Matthieu Dorier is a software development specialist in Argonne’s MCS Division. He obtained his PhD from Ecole Normale Supérieure de Rennes, France, in 2014. Matthieu then completed a two-year postdoc at Argonne, before becoming a permanent member of the RADIX-IO team. Matthieu’s interests include I/O, storage, in situ analysis, networking, and software engineering for HPC. He has been a core member of the R&D100-winning Mochi project since its start in 2015 and has developed many of its software components.
