Handling Data During AI Training

Taylor Childers, ALCF
Riccardo Balin, ALCF

Trainees will learn to make effective use of hybrid computers that run CPUs and GPUs concurrently for deep learning. We will introduce TensorFlow's data management API and show how to use the CPU for data preparation while the GPU performs AI computations.
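The idea of preparing data on the CPU while the accelerator computes can be sketched without any framework. The example below is a minimal illustration in standard-library Python, not the session's material: a background thread prepares batches into a bounded buffer while the main loop consumes them. The names `prepare`, `prefetching_batches`, and `train_step` are hypothetical stand-ins. In TensorFlow, `tf.data.Dataset.map` and `tf.data.Dataset.prefetch` provide this kind of overlap.

```python
import queue
import threading


def prepare(sample):
    # Hypothetical stand-in for CPU-side work (decode, augment, normalize).
    return sample * 2


def prefetching_batches(samples, batch_size, buffer_size=2):
    """Yield prepared batches while a background thread prepares the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded, so the producer cannot run far ahead
    done = object()  # sentinel that marks the end of the stream

    def producer():
        batch = []
        for sample in samples:
            batch.append(prepare(sample))
            if len(batch) == batch_size:
                buffer.put(batch)  # blocks while the buffer is full
                batch = []
        if batch:
            buffer.put(batch)  # final, possibly smaller, batch
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            break
        yield item


def train_step(batch):
    # Hypothetical stand-in for the accelerator-side computation.
    return sum(batch)


results = [train_step(b) for b in prefetching_batches(range(10), batch_size=4)]
print(results)  # [12, 44, 34]
```

The bounded buffer is the key design choice: it lets preparation run a few batches ahead of the consumer without holding the whole dataset in memory. This sketch only shows the pattern; a real pipeline would use parallel workers rather than a single Python thread.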

This week's science talk introduces the software infrastructure developed to perform online learning, as well as inference, on current and future supercomputers. Machine learning (ML) approaches have been gaining popularity in the field of computational fluid dynamics (CFD) because they offer encouraging solutions to a wide range of problems. However, as we embark on the era of exascale computations, saving all the necessary training data produced by a simulation to disk can become a significant efficiency and storage bottleneck. Online (in situ) learning addresses this limitation: the models are trained concurrently with the simulation producing the data, which avoids writing any training data to disk.
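The streaming aspect of online learning can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the infrastructure presented in the talk: a hypothetical `simulation` generator stands in for a solver and yields one sample per time step, and `train_online` updates a two-parameter linear model with stochastic gradient descent as each sample arrives. No sample is stored, in memory or on disk. A real in situ setup would run the simulation and the training concurrently; here they are simply interleaved.

```python
import random


def simulation(num_steps, seed=0):
    """Hypothetical stand-in for a solver: yields one (input, target) pair per time step."""
    rng = random.Random(seed)
    for _ in range(num_steps):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x + 0.5  # noiseless linear relation the model should recover


def train_online(stream, lr=0.1):
    """Fit y = w * x + b by SGD, consuming each sample once and then discarding it."""
    w, b = 0.0, 0.0
    seen = 0
    for x, y in stream:
        err = (w * x + b) - y  # prediction error on the current sample
        w -= lr * err * x      # gradient step for the squared error
        b -= lr * err
        seen += 1
    return w, b, seen


w, b, seen = train_online(simulation(5000))
print(f"w={w:.3f} b={b:.3f} after {seen} samples")
```

Because the model only ever sees the current sample, the memory footprint is independent of how long the simulation runs, which is the property that makes the approach attractive when the full dataset is too large to store.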

About the Speaker

Taylor Childers has a Ph.D. in Physics from the University of Minnesota. He worked at the CERN laboratory in Geneva, Switzerland for six years as a member of the ATLAS experiment and was a co-author of the Higgs boson discovery paper in July 2012. He has worked in physics analysis, workflows, and simulation, with experience ranging from scaling on DOE supercomputers to fast custom electronics (ASIC/FPGA). He applies deep learning to science domain problems, including using Graph Neural Networks to perform semantic segmentation that associates each of the 100 million pixels of the ATLAS detector with particles originating from the proton collisions. He is currently working with scientists from different domains to apply deep learning to their datasets and take advantage of exascale supercomputers arriving in the next few years.

Riccardo Balin is a Postdoctoral Appointee under the Aurora Early Science Program. His work focuses on coupling large-scale, high-fidelity simulations of turbulent flows with machine learning in order to perform in situ training of, as well as inference with, data-driven turbulence closure models. Riccardo obtained a BS/MS degree in Aerospace Engineering from the University of Colorado Boulder in 2016, followed by a Ph.D. in Computational Fluid Dynamics and Turbulence Modeling from the same institution in 2020.