A widening gap separates CPU performance from IO bandwidth on large-scale systems. In some fields, such as weather forecasting and nuclear fusion, numerical models generate so much data that classical post hoc processing is no longer feasible, given the limits of both storage capacity and IO performance. In situ approaches are attractive in these cases because they bypass disk accesses and fully leverage the HPC platform. They are, however, often complex to set up and can require redeveloping parallel versions of the analysis from scratch.
In this talk, we present our work on coupling the bulk synchronous parallel paradigm for simulation with a distributed task-based one for analysis. This reduces complexity and leverages the best of each of these two powerful paradigms.
We propose a bridging model between the two paradigms and validate it through a prototype called DEISA, which supports coupling MPI parallel codes with analyses written using Dask. The bridging model requires minimal modifications of both the simulation and analysis codes compared to their post hoc counterparts. It makes an existing rich ecosystem available in situ, including the parallel versions of NumPy, pandas, and scikit-learn. The results are encouraging and show good performance with minimal coding effort.
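To make the coupling idea concrete, here is a minimal sketch of a bulk synchronous producer feeding a task-based analysis. It is not DEISA's API and uses neither MPI nor Dask: a plain loop stands in for the simulation ranks, and Python's standard `concurrent.futures` thread pool stands in for the task scheduler. All names (`simulate_block`, `partial_sum`, `global_mean`, `N_RANKS`, `LOCAL_SIZE`, `N_STEPS`) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

N_RANKS = 4      # hypothetical number of simulation processes
LOCAL_SIZE = 3   # hypothetical number of values owned by each rank
N_STEPS = 2      # hypothetical number of simulation time steps


def simulate_block(rank, step):
    """Stand-in for the local data one rank holds at one time step."""
    start = rank * LOCAL_SIZE
    return [float(step * 100 + start + i) for i in range(LOCAL_SIZE)]


def partial_sum(block):
    """Analysis task on a single block: returns (sum, element count)."""
    return sum(block), len(block)


def global_mean(partials):
    """Reduction task: combines per-block results into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def run():
    means = []
    with ThreadPoolExecutor(max_workers=N_RANKS) as pool:
        for step in range(N_STEPS):
            # Bulk synchronous side: every rank produces its block
            # before the step is considered complete.
            blocks = [simulate_block(r, step) for r in range(N_RANKS)]
            # Task-based side: one analysis task per block, followed
            # by a reduction over the task results.
            futures = [pool.submit(partial_sum, b) for b in blocks]
            means.append(global_mean([f.result() for f in futures]))
    return means


if __name__ == "__main__":
    print(run())
```

The point of the sketch is the division of labor: the producer only hands over blocks at step boundaries, while the analysis is written as independent tasks over those blocks. In the setting described above, the analysis side would be expressed with Dask rather than a thread pool.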
BIO – Amal Gueroudji
Amal Gueroudji holds an engineering degree and a master’s degree in computer systems from the Higher National School of Computer Science (ESI, ex INI) in Algeria. Her final-year project consisted of automating CPU/GPU communications in the GPU backend of the Tiramisu Compiler. She then joined the HPC research community through her PhD at the French Atomic Energy and Alternative Energies Commission, where she worked on coupling MPI with the distributed task-based model as applied to in situ workflows. After her PhD, Dr. Gueroudji joined the MCS division at Argonne, where she works on workflows and data services for science within the Radix-io team.
See upcoming and previous presentations at CS Seminar Series.