Computational & Technology Resources
an online resource for computational,
engineering & technology publications
Civil-Comp Proceedings
ISSN 1759-3433
CCP: 94
PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by:
Paper 29

Parallelization Strategies for Computing PageRank

H. Migallón1, V. Migallón2, J.A. Palomino2 and J. Penadés2

1Department of Physics and Computer Architectures, University Miguel Hernández, Elche, Alicante, Spain
2Department of Computer Science and Artificial Intelligence, University of Alicante, Spain

Full Bibliographic Reference for this paper
, "Parallelization Strategies for Computing PageRank", in , (Editors), "Proceedings of the Seventh International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 29, 2010. doi:10.4203/ccp.94.29
Keywords: PageRank, parallel algorithms, Power method, shared memory, distributed memory, sychronous and asynchronous iterations.

Summary
One of the most difficult problems in web search is the ranking of the results recalled in response to a user query. Since contemporary web crawls discover billions of pages, broad topic queries may result in recalling hundreds of thousand of pages containing the query terms. Moreover, these results are presented in order of relevance. A variety of ranking features are used by internet search engines to determine with a good order. Some of the query-independent features utilize the hyperlink structure of the web. The PageRank algorithm [1] is one of those that introduces a content neutral-ranking function over web pages. PageRank uses the Power method to compute successive iterates that converge to the principal eigenvector of the Markov chain representing the web link graph. As a result of the great size and sparsity of the transition matrix of the Markov chain, methods based on decomposition are considered infeasible; instead, iterative methods are used, where the computation is dominated by matrix vector products [2]. In recent years opportunities for parallel execution have broadened their scope. In this paper different parallel implementations of the Power method are proposed for computing PageRank.

Synchronous and asynchronous implementations of the Power method are analyzed using different data distribution strategies. Concretely three strategies of data distribution were used: row-wise partitioning, nonzero elements partitioning, and a distribution that combines the previous ones.

The proposed algorithms have been implemented on two parallel computers. The first platform is a DELL PowerEdge 2900 with two Quad-Core Intel Xeon 5320 sequence processors at up to 1.86 GHz. The second platform is a Beowulf cluster of 6 nodes Intel x86 connected through a 1 Gb/sec. Each node consists of two Intel Xeon Quad-Core processors at up to 2.60 GHz. The reported experiments show the behaviour of the designed algorithms for realistic test data on both shared and distributed architectures. The results show that we achieve a considerable speed-up with the parallel (synchronous and asynchronous) implementations. Also, the best strategies of data distribution were the nonzero elements partitioning and the mixed partitioning. On the other hand, in order to deal with larger problems and to use all the available memory in a distributed system, we conclude that the best strategy of parallelization needs to use at the same time the benefits of shared and distributed memory multiprocessors.

References
1
L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank citation ranking: Bringing order to the Web", Technical Report, Stanford Digital Library Technologies Project, 1998.
2
P. Berkhin, "A Survey on PageRank Computing", Internet Mathematics, 2(1), 73-120, 2005. doi:10.1080/15427951.2005.10129098

purchase the full-text of this paper (price £20)

go to the previous paper
go to the next paper
return to the table of contents
return to the book description
purchase this book (price £125 +P&P)