Optimizing Complex Scientific Workflows Using a Reconfigurable Heterogeneous-Aware Storage System for Extreme-Scale Computing

Hariharan Devarajan, Illinois Institute of Technology

Abstract: Traditionally, scientific discovery has been driven by the computational power of computer systems. Applications therefore treated I/O as a sparse task, used mainly for occasional checkpoints. This has led to a growing gap between computation power and storage capabilities. In the era of data explosion, however, where data analysis is essential for scientific discovery, slow storage systems have given rise to the research problem known as the I/O bottleneck. Additionally, the explosion of data has led to a proliferation of both applications and storage technologies, creating a complex matching problem between diverse application requirements and storage technology features. In this proposal, we introduce Jal, a dynamic, reconfigurable, and heterogeneous-aware storage system. Jal takes a layered approach comprising an application model, a data model, and a storage model. The application model uses a source-code-based profiler that identifies the causes of an application's I/O behavior. The data model translates each application's I/O requirements into an underlying storage configuration in order to extract maximum performance for that application. Finally, the storage model builds a heterogeneous-aware storage system that can be dynamically reconfigured into different storage configurations. Our evaluations show that these models can accelerate application I/O while transparently and efficiently utilizing diverse storage systems.

Bio: Hariharan is a fifth-year PhD student and Graduate Research Assistant in the Scalable Computing Software Laboratory at Illinois Institute of Technology. His research interests include scalable scientific data management, parallel I/O, data management systems for scientific data, heterogeneous computing, and the convergence of Big Data and HPC storage systems.