RADAR: Runtime Asymmetric Data Access-driven Replication

Xiaocheng Zou and Houjun Tang
Seminar

Efficient data access on High Performance Computing (HPC) architectures is difficult to achieve. Aside from the ever-increasing scale of both architectures and datasets, an important driver of this difficulty is the non-trivial relationship between application access patterns, data distribution, and resulting performance. Specifically, access patterns that vary across applications, core counts, and usage scenarios complicate the choice of a single, effective data distribution. In this work, we present RADAR (Runtime Asymmetric Data Access-driven Replication), a step towards adaptive layout optimization that integrates intelligent pattern detection with a set of layout modification operators to accelerate data access across time-varying, heterogeneous access patterns. Our layout optimization is based on partial replication: it trades additional storage for I/O performance by replicating data within a user-controlled storage bound. RADAR consists of the following components: ADIO Tracer, which produces datatype-aware, collective-aware I/O traces by capturing I/O requests at ROMIO’s ADIO layer; Pattern Analyzer, which analyzes the I/O traces and outputs access patterns of interest, such as strided access patterns; Replica Layout Manager, which determines what data to replicate and in what layout; and Replica-aware ADIO Driver, which accelerates I/O requests by remapping them into the replication space. Nearly all aspects of our system reside within a single file container, without modifying the original data layout, through the use of EOF, an extension of PVFS2 that allows direct object-based access. In this talk, we will explain each component of RADAR in detail, discuss the project’s current status and some preliminary results, and finally discuss future work.
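
To make the idea of a strided access pattern concrete, the following is a minimal sketch of how such a pattern might be recognized in a sequence of traced requests. It is not RADAR's Pattern Analyzer: the trace record format (a list of byte offset and length pairs) and the function name detect_stride are assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical trace record: (byte offset, request length).
# The real ADIO Tracer format is not described in the abstract.
Request = Tuple[int, int]


def detect_stride(requests: List[Request]) -> Optional[Tuple[int, int, int, int]]:
    """Return (start, stride, length, count) if all requests form a single
    fixed-stride run of equal-sized accesses; otherwise return None."""
    if len(requests) < 3:
        # Too few requests to call it a pattern.
        return None
    start, length = requests[0]
    stride = requests[1][0] - start
    if stride <= 0:
        return None
    for i, (offset, req_len) in enumerate(requests):
        # Every request must have the same size and sit exactly
        # i strides away from the first one.
        if req_len != length or offset != start + i * stride:
            return None
    return (start, stride, length, len(requests))
```

Under these assumptions, three 4-byte requests at offsets 0, 16, and 32 would be summarized as one strided run with stride 16, while a sequence with an irregular gap would not be reported as strided.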

Bio:

Xiaocheng Zou, PhD student in Computer Science at North Carolina State University

Xiaocheng Zou is a Computer Science PhD student at North Carolina State University (NCSU), entering his third year in the coming semester. He works with Professor Nagiza Samatova at NCSU. Before coming to NCSU, he received his Master's degree in Computer Science from the University of North Carolina at Greensboro in May 2011. His general research interest is High Performance Computing. More specifically, he focuses on improving the performance of parallel I/O and large-scale data analysis.

Houjun Tang, PhD student in Computer Science at North Carolina State University

Houjun Tang received his B.E. degree in Computer Science and Technology from Shenzhen University, China, in 2012. He is currently a PhD student in Computer Science at North Carolina State University. His research interests include high performance computing, parallel I/O, and data mining.