Do-It-Yourself Parallel Data Analysis

Tom Peterka
Seminar

Scalable data analysis and visualization (collectively called analysis) depends on lightweight, custom tasks that can be tightly integrated with both computational science applications and existing analysis tools. DIY (Do-It-Yourself Analysis) is a scalable library of data-movement algorithms for domain decomposition, parallel I/O, and efficient communication that allows a data analysis to be parallelized and executed as a data-parallel program. Parallel analyses can then be executed in situ with full-scale simulations and in tandem with existing visualization and analysis packages. DIY has enabled the parallelization of serial analysis algorithms that were previously considered difficult to scale efficiently or that had no parallel counterpart. These include parallel particle tracing for steady and unsteady flows, information-entropy analysis, topological construction, computation of Lagrangian coherent structures, and mesh generation from N-body particle simulations. This seminar covers the basics of DIY: data decomposition into blocks, assignment of blocks to processes, support for multiple domains, grouping of blocks into neighborhoods, creation of custom DIY datatypes, communication mechanisms, and integration with other libraries.
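To make the first three topics concrete (decomposition into blocks, assignment of blocks to processes, and grouping of blocks into neighborhoods), here is a minimal sketch in plain Python. It does not use DIY's API: the function names `decompose`, `assign_round_robin`, and `face_neighbors` are hypothetical, and the regular grid decomposition, round-robin assignment, and face-adjacent neighborhoods are assumptions chosen for illustration, not a description of how DIY implements them.

```python
from itertools import product


def decompose(domain, blocks_per_dim):
    """Split an integer domain [0, n) in each dimension into a regular grid of blocks.

    Returns a list of dicts with a global block id (gid), the block's grid
    coordinates, and its bounds as inclusive min / exclusive max.
    Hypothetical helper, not part of DIY.
    """
    edges_per_dim = []
    for n, b in zip(domain, blocks_per_dim):
        # Spread any remainder over the first blocks so sizes differ by at most 1.
        base, extra = divmod(n, b)
        edges = [0]
        for i in range(b):
            edges.append(edges[-1] + base + (1 if i < extra else 0))
        edges_per_dim.append(edges)

    blocks = []
    # product() varies the last dimension fastest, so gids are row-major.
    for gid, idx in enumerate(product(*(range(b) for b in blocks_per_dim))):
        mins = tuple(edges_per_dim[d][i] for d, i in enumerate(idx))
        maxs = tuple(edges_per_dim[d][i + 1] for d, i in enumerate(idx))
        blocks.append({"gid": gid, "coords": idx, "min": mins, "max": maxs})
    return blocks


def assign_round_robin(num_blocks, num_procs):
    """Map each block gid to a process rank in round-robin order (an assumed policy)."""
    return {gid: gid % num_procs for gid in range(num_blocks)}


def face_neighbors(block, blocks_per_dim):
    """Return the gids of blocks sharing a face with `block` (no wraparound)."""
    # Row-major strides matching the gid numbering used in decompose().
    strides = []
    stride = 1
    for b in reversed(blocks_per_dim):
        strides.append(stride)
        stride *= b
    strides.reverse()

    neighbors = []
    for d in range(len(blocks_per_dim)):
        for step in (-1, 1):
            c = block["coords"][d] + step
            if 0 <= c < blocks_per_dim[d]:
                neighbors.append(block["gid"] + step * strides[d])
    return sorted(neighbors)


# Example: a 10 x 10 x 10 domain split into 2 x 2 x 2 blocks on 4 processes.
grid = (2, 2, 2)
blocks = decompose((10, 10, 10), grid)
owner = assign_round_robin(len(blocks), num_procs=4)
for blk in blocks:
    print(blk["gid"], blk["min"], blk["max"],
          "rank", owner[blk["gid"]],
          "neighbors", face_neighbors(blk, grid))
```

With more blocks than processes, each process owns several blocks, which is why the sketch keeps the block-to-rank map separate from the decomposition itself.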