Computing Strongly Connected Components in Parallel on CUDA (full version)
July 2010, 24 pages.
Available as Postscript,
The problem of decomposition of a directed graph into its strongly connected
components is a fundamental graph problem inherently present in many
scientific and commercial applications. In this paper we show how existing
parallel algorithms can be reformulated in order to be accelerated by NVIDIA
CUDA technology. In particular, we design a new CUDA-aware procedure for pivot
selection and we redesign the parallel algorithms in order to allow for CUDA
accelerated computation. We also experimentally demonstrate that with a single
GTX 280 GPU card we can easily outperform optimal serial CPU implementation,
which is particularly interesting result as unlike the serial CPU case, the
asymptotic complexity of the parallel algorithms is not optimal.