Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters

Junchao Zhang
Seminar

The Barnes-Hut (BH) algorithm is a fast algorithm for the n-body problem, which simulates the evolution of a system of n particles. Thanks to the non-uniform distribution of input particles, the communication in BH is computation dependent and irregular, which makes it very hard to program it in MPI two-sided. In a previous work, we optimized BH in UPC -- a partitioned global address space (PGAS) language. We gained high performance but also showed what lacks in current PGAS languages to make them really productive and proposed some high level language abstracts. In this talk, I will introduce a new multithreaded BH design for multicore clusters in this direction. In the new code, we decompose computations into tasks and spawn multiple threads in a node. Threads switches between tasks based on their data availability and forward remote data requests to a specially designated communication thread, which fetches data on remote nodes using one-sided communication. I will use experimental data to show benefits of this approach, including less memory consumption, less inter-node communication, better load-balancing, and better performance than before. Finally, I will also compare it with other BH implemented in MPI or Charm++.