Accelerated Simulations of Cosmic Dust Heating on Intel Xeon Phi Coprocessors

Andrey Vladimirov
Seminar

Cosmic dust absorbs starlight in the optical and ultraviolet ranges and re-emits it in the infrared, making the Galaxy optically thick with respect to this process. In numerical models of Galactic radiative transfer, the heating of small grains (tens to thousands of atoms) is computationally demanding, because the thermal equilibrium approximation is not applicable and a temperature distribution must be computed using a matrix formalism similar to that for atomic level excitation. Yet the stochastic heating of small grains is an essential piece of a larger project that aims to reconstruct the 3D structure of the Milky Way from the multi-wavelength 2D maps (sky surveys) obtained by space-based observations over the past two decades, using numerical simulation of radiative transport in the Galaxy and Bayesian inference. This talk will present a case study on using Intel Xeon Phi coprocessors (also known as the Intel Many Integrated Core, or MIC, architecture) to accelerate HEATCODE, an astrophysical library for computing the stochastic emissivity of small cosmic dust grains.
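
For context on the matrix formalism mentioned above: under the common assumption that a small grain cools only to the adjacent lower enthalpy bin, the steady-state probability distribution over the discretized bins can be obtained with the recursion of Guhathakurta & Draine (1989). The C++ sketch below is illustrative only; the array layout, names, and the continuous-cooling assumption are mine, and it is not the HEATCODE implementation.

  // Minimal sketch (not the HEATCODE source): steady-state probability
  // distribution of grain internal energy over M discretized bins,
  // following the Guhathakurta & Draine (1989) recursion.
  // A[f][i] is the transition rate from bin i to bin f; the only nonzero
  // downward (cooling) rates are assumed to be A[j-1][j].
  #include <vector>
  #include <cstddef>

  std::vector<double> SteadyStateDistribution(
      const std::vector<std::vector<double>>& A)  // A[f][i], size M x M
  {
    const std::size_t M = A.size();
    // B[f][i] = sum of upward rates from bin i into bins f, f+1, ..., M-1
    std::vector<std::vector<double>> B(M, std::vector<double>(M, 0.0));
    for (std::size_t i = 0; i < M; ++i) {
      B[M - 1][i] = A[M - 1][i];
      for (std::size_t f = M - 1; f > i + 1; --f)
        B[f - 1][i] = B[f][i] + A[f - 1][i];
    }
    // Recursion for the unnormalized occupation numbers X[j]
    std::vector<double> X(M, 0.0);
    X[0] = 1.0;
    for (std::size_t j = 1; j < M; ++j) {
      double sum = 0.0;
      for (std::size_t i = 0; i < j; ++i) sum += B[j][i] * X[i];
      X[j] = sum / A[j - 1][j];  // A[j-1][j]: cooling rate from bin j to j-1
    }
    // Normalize so the probabilities sum to unity
    double norm = 0.0;
    for (double x : X) norm += x;
    for (double& x : X) x /= norm;
    return X;
  }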

The nature of the Xeon Phi-accelerated calculations in HEATCODE is representative of a general class of physics problems with voxelized simulation spaces, dense linear algebra, and discretized or analytically fitted functional dependencies of physical quantities. I will describe the programming methods and optimization practices for the Intel MIC architecture used in HEATCODE, and present optimized code examples based on our work, along with benchmarks of our library on Intel Xeon Phi coprocessors at each step of the optimization process. The most important aspect of optimizing legacy codes for Xeon Phi that we observed is that the same methods apply to optimization for general-purpose multi-core CPU architectures and for the MIC architecture. The code improvements that led to a 600x speedup on the coprocessor with respect to the legacy baseline also improved the CPU version by 100x. As a result, a single implementation of the performance-critical code needs to be developed, used, and maintained for the CPU and for the coprocessor.
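
As an illustration of the "single implementation" point, the sketch below shows the style of kernel this approach leads to: a dense matrix-vector product written once with standard OpenMP threading and vectorization, and optionally shipped to the coprocessor with the Intel compiler's offload annotations of that era. All function and array names are hypothetical; the code is not taken from HEATCODE.

  // Illustrative sketch: one threaded + vectorized kernel serves both the
  // host CPU and the Xeon Phi coprocessor. The offload annotations require
  // the Intel compiler with MIC offload support; the kernel itself is plain
  // C++ with OpenMP.

  // Mark the function for cross-compilation to the coprocessor as well as
  // the host (Intel Language Extensions for Offload).
  __attribute__((target(mic)))
  void EmissivityKernel(const float* kernelMatrix,  // nBins x nBins, row-major
                        const float* absorbed,      // absorbed energy per bin
                        float* emissivity,          // output spectrum
                        int nBins) {
    // One code path: thread parallelism over output bins, SIMD within a row.
  #pragma omp parallel for
    for (int b = 0; b < nBins; ++b) {
      const float* row = &kernelMatrix[(long)b * nBins];
      float acc = 0.0f;
  #pragma omp simd reduction(+:acc)
      for (int i = 0; i < nBins; ++i)
        acc += row[i] * absorbed[i];
      emissivity[b] = acc;
    }
  }

  // Optional offload wrapper: the same kernel is shipped to the coprocessor
  // together with its data; without offload support this reduces to a plain
  // host call.
  void ComputeEmissivity(const float* kernelMatrix, const float* absorbed,
                         float* emissivity, int nBins) {
  #pragma offload target(mic) in(kernelMatrix : length((long)nBins * nBins)) \
                              in(absorbed : length(nBins)) \
                              out(emissivity : length(nBins))
    EmissivityKernel(kernelMatrix, absorbed, emissivity, nBins);
  }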

Andrey Vladimirov, PhD, is Head of HPC Research at Colfax International. His primary interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, A. Vladimirov was involved in computational astrophysics research at Stanford University, North Carolina State University, and the Ioffe Institute (Russia), where he studied cosmic rays, collisionless plasmas, and the interstellar medium using computer simulations. He is a co-author of the book Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, a regular contributor to the online resource Colfax Research, and an author or co-author of over 10 peer-reviewed publications in the fields of theoretical astrophysics and scientific computing.