Parallel I/O Programming: Present and Future

Saba Sehrish
Seminar

Parallel I/O solutions, at both the high-level I/O library (PnetCDF, HDF5) and middleware (MPI-IO) levels, have significantly improved the performance of many large-scale computational science applications. In this talk, we will revisit some of the commonly used I/O middleware and libraries (MPI-IO, PnetCDF, HDF5) and their well-known optimizations (collective I/O, non-blocking I/O, chunking, etc.). In the first part of the talk (Present), we will make a case for improving the middleware, focusing specifically on the two-phase I/O implementation of the collective I/O optimization in MPI-IO. We have observed that in many large-scale runs, applications spend a significant amount of time in the request aggregation phase. We have implemented a pipeline mechanism that overlaps the request aggregation and file I/O phases, and we observe a performance improvement. In the second part of the talk (Future), we will argue that, as both computational models and hardware grow more complex while I/O libraries still rely on vectors of variables, there is a need for a higher-level, data-model-based API that supports more sophisticated data models. We will describe DAMSEL, a data-model-based storage library, and its design, followed by some use cases that support the library design.