Scalable Parallel Graph Algorithms

Mostafa Patwary
Seminar

Graph algorithms have been playing a crucial role for solving problems in several domains including scientific computing, data mining, social networks, computational biology, and image decomposition. To deal with the emerging big data, it has been of significant importance to develop scalable parallel graph algorithms capable of utilizing the parallelism of todays multicore, many-core, and heterogeneous machines. In this talk, I will present the Union-Find (UF) algorithm, which is popularly known as one of the best ways to solve the minimum spanning tree (MST) problem. We performed a comprehensive study to compare all of the different versions of the UF algorithm and proposed the best way of implementing the UF algorithm in general and demonstrated that a somewhat forgotten simple algorithm is the fastest, in spite of the fact that its worst-case time complexity is inferior to that of the commonly accepted “best” algorithms. To solve the MST problem using big data, I will then briefly present scalable frameworks for the UF algorithm both on shared memory and distributed memory computers with reasonable speedup.

Later I will talk on a well-known density based clustering algorithm, named DBSCAN, capable of discovering arbitrary shaped clusters and eliminating noise data. DBSCAN has been popular in image processing for many scientific domains, such as astronomy, biology, and earth science. I will show how a graph algorithmic technique (the parallel UF algorithm) can be used to break the access sequentiality of DBSCAN. Using data sets from millennium simulation runs consisting of billions of stars and galaxies, I show that parallel DBSCAN significantly out-performs previous algorithms, achieving speedups up to 25.97 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.