Layout Optimization Techniques for Extreme-scale Analytics

Jonathan Jenkins
Seminar

Recent trends in high-performance computing (HPC) I/O present significant, multi-dimensional challenges: coping with huge increases in the volume and complexity of data produced, making effective use of increasingly complex I/O subsystems and hardware configurations, and enabling fast data analysis under varying access patterns, to name a few. To address these challenges, advanced data reorganization and analytics techniques must be explored, with particular focus on data reduction as a first-order constraint. To this end, I will present two technologies designed to accelerate different data access workloads. First, I will introduce a precision-based technique for multiresolution analysis, which obtains favorable performance and accuracy characteristics by exploiting the floating-point data format. Second, I will discuss a parallel system for query-driven analysis that drives down storage and query processing costs by operating directly in a compressed data space. To conclude, I will present ongoing work that aims to tame the data complexity problem inherent in data layout optimizations: how transparent can these tools be made to the end user, and can I/O libraries support them effectively?