Memory movement optimized multigrid algorithms

Jed Brown, Postdoc, MCS
Mark Adams, Columbia University
Seminar

Current and future generations of large-scale computers will be constrained by power -- this is driving an epochal change in computer architectures that rivals the transition to distributed-memory machines in the late 1980s. Data movement, though important since the 1980s, will be central to the cost of computing on future machines because memory use and data movement account for most of the power budget of large-scale computers. Future architectures are far from well understood, but we can assume that data movement and massive concurrency will be central to effective algorithms on these machines. With this in mind, we propose an equation solver that is a parallel extension of a low-memory multigrid method proposed by Brandt in the 1970s (segregated refinement, with log^d(N) memory complexity). This method has the potential to radically reduce memory usage, and consequently data movement, and thereby to use future machines effectively. The algorithm possesses massive concurrency, is amenable to data- or task-driven programming models and to loop fusion, enforces good data locality, and is generally attractive in memory-centric computer cost models. We describe the algorithm and techniques for implementing it, and show numerical results for a simple model problem to verify the method's correctness. We further consider parallel data models on an example exascale machine model, along with case studies on applying these methods.
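As background for the memory argument above: a conventional geometric multigrid V-cycle keeps the solution, right-hand side, and residual on every level, so its memory footprint is O(N) on the finest grid -- precisely the cost that segregated refinement reduces to log^d(N). The following is a minimal illustrative sketch of such a conventional V-cycle (in Python/NumPy, for the 1D Poisson model problem -u'' = f with zero boundary conditions); it is background material only, not the proposed parallel method.

```python
import numpy as np

def vcycle(u, f, h):
    """One V-cycle for -u'' = f on a uniform grid with n = len(u)-1 intervals.

    Note the O(N) storage: u, f, and the residual live on every level.
    """
    n = len(u) - 1
    # Pre-smooth: two sweeps of weighted Jacobi (omega = 2/3).
    for _ in range(2):
        u[1:-1] = (1/3) * u[1:-1] + (2/3) * 0.5 * (u[:-2] + u[2:] + h*h*f[1:-1])
    if n <= 2:
        # Coarsest grid: one interior unknown, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] + h*h*f[1])
        return u
    # Residual r = f - A u for the standard 3-point stencil.
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / (h*h)
    # Restrict the residual to the coarse grid by full weighting.
    rc = np.zeros(n//2 + 1)
    rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
    # Recursive coarse-grid correction.
    ec = vcycle(np.zeros_like(rc), rc, 2*h)
    # Prolong the correction by linear interpolation and add it in.
    u[2:-1:2] += ec[1:-1]
    u[1::2] += 0.5 * (ec[:-1] + ec[1:])
    # Post-smooth.
    for _ in range(2):
        u[1:-1] = (1/3) * u[1:-1] + (2/3) * 0.5 * (u[:-2] + u[2:] + h*h*f[1:-1])
    return u

# Model problem: exact solution sin(pi x) on [0, 1].
n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n + 1)
for _ in range(20):
    u = vcycle(u, f, h)
err = np.max(np.abs(u - np.sin(np.pi * x)))
```

After a handful of V-cycles the algebraic error is negligible and `err` is dominated by the O(h^2) discretization error; the point of segregated refinement is to obtain comparable solver behavior without ever holding all of these per-level arrays simultaneously.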