The 'Get-Compute-Update' template is the primary programming model used by large-scale applications built on Global Arrays (GA), especially in quantum chemistry, where examples include NWChem, GAMESS-UK, and Molpro. GA allows the programmer to create large, multidimensional shared arrays that span the memory of multiple nodes, and it supports asynchronous one-sided operations such as get, put, and accumulate as well as high-level parallel mathematical routines. However, applications that follow this template often fail to fully exploit the reuse available in the data transferred by remote get and accumulate operations. We investigate and analyze how incorporating 1-D and 2-D blocking strategies into this template serves as an optimization and substantially affects the performance of such codes. Blocking the get and accumulate operations allows us to minimize data movement by fetching multiple blocks of data at a time into local memory. We implement and evaluate this approach using several microbenchmarks (each capturing a different blocking strategy variant) and a full-scale application (NWChem). In this talk, we present our findings and discuss the scope for future optimizations.
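As an illustration only, the blocking idea described above can be sketched with a toy, single-process stand-in for a shared array. This is not the GA API and not the implementation evaluated in the talk: `ToyGlobalArray`, `get_compute_update`, `run`, and the `jblock` parameter are hypothetical names, and "1-D blocking" is read here as fetching several adjacent tiles along one dimension so that an already fetched tile is reused. The sketch computes C += A*B tile by tile and counts how many elements of A are transferred.

```python
class ToyGlobalArray:
    """Hypothetical n x n shared array that counts simulated remote traffic."""

    def __init__(self, n, fill=0.0):
        self.n = n
        self.data = [[fill] * n for _ in range(n)]
        self.calls = 0      # number of get/acc operations issued
        self.elements = 0   # number of elements moved by those operations

    def get(self, r0, r1, c0, c1):
        """Simulated one-sided get of the patch [r0:r1, c0:c1]."""
        self.calls += 1
        self.elements += (r1 - r0) * (c1 - c0)
        return [row[c0:c1] for row in self.data[r0:r1]]

    def acc(self, r0, c0, patch):
        """Simulated one-sided accumulate of patch at offset (r0, c0)."""
        self.calls += 1
        self.elements += len(patch) * len(patch[0])
        for i, prow in enumerate(patch):
            for j, v in enumerate(prow):
                self.data[r0 + i][c0 + j] += v


def matmul(a, b):
    """Dense product of two small local patches (lists of lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def get_compute_update(A, B, C, tile, jblock):
    """C += A*B; each step gets one tile of A and jblock tiles of B.

    Assumes n is divisible by tile. jblock=1 is the unblocked template;
    jblock>1 reuses each fetched A tile across jblock column tiles.
    """
    n = A.n
    width = tile * jblock
    for i in range(0, n, tile):
        for k in range(0, n, tile):
            for j in range(0, n, width):
                a = A.get(i, i + tile, k, k + tile)              # get
                b = B.get(k, k + tile, j, min(j + width, n))     # get
                C.acc(i, j, matmul(a, b))                        # compute, update


def run(jblock, n=8, tile=2):
    """Run the toy kernel on all-ones inputs and report traffic on A."""
    A, B, C = ToyGlobalArray(n, 1.0), ToyGlobalArray(n, 1.0), ToyGlobalArray(n)
    get_compute_update(A, B, C, tile, jblock)
    return A.elements, A.calls, C.data


if __name__ == "__main__":
    for jb in (1, 4):
        moved, calls, _ = run(jb)
        print(f"jblock={jb}: A elements moved={moved}, get calls on A={calls}")
```

In this toy setting, widening the block along one dimension leaves the result unchanged while cutting the traffic on A, because each fetched tile of A is reused for several tiles of B. A 2-D variant would apply the same idea along the row dimension as well.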
Bio:
Priyanka Ghosh is a PhD student in the HPCTools group at the University of Houston. She completed her MS in computer science at the University of Houston in 2012. Her current research interests lie in parallel programming and task-parallel, data-driven programming models.