In recent years, GPUs have gained great popularity in high-performance computing due to their high computational power and energy efficiency. Energy efficiency in particular is becoming more and more important for ecological, economic, and technical reasons. Despite this popularity, data transfer in GPU-based clusters remains a challenging problem because GPU memory and host memory are disjoint. New technologies such as GPUDirect RDMA help to improve data transfer among multiple GPUs and enable new data transfer and communication models for distributed GPUs. Data transfer will be one of the main factors for scaling and energy efficiency in next-generation high-performance computing systems.
In my work, I evaluate different communication methods for distributed GPUs. The most common approach is a hybrid one, in which compute-intensive tasks are offloaded to the GPU while the CPU controls the communication. As an example of this approach, I will talk about GPI2 for GPUs, a PGAS-based framework for one-sided communication in GPU-accelerated clusters.
Another approach, however, is to let the GPU control the communication itself, avoiding context switches between the GPU and the host CPU. In my talk, I will illustrate which steps are required to enable communication control from the GPU and which problems are associated with GPU-controlled communication. I will introduce two possible frameworks for GPU-controlled communication: a one-sided communication interface based on put/get operations, and a global address space approach that allows thread-collective communication with simple load and store instructions on the GPU.
These different communication approaches are evaluated with a focus on performance and energy efficiency. Although GPU-controlled communication does not necessarily mean better performance, enabling the GPU to control the communication and relieving the CPU of this work can increase the performance per watt.