Civil-Comp Proceedings
ISSN 1759-3433, CCP: 95
Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering
Edited by:
Paper 74
Parallel Matrix Algorithms for Image Processing P. Kotas, V. Vondrák, P. Praks and M. Stachon
Department of Applied Mathematics, Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, Czech Republic
P. Kotas, V. Vondrák, P. Praks, M. Stachon, "Parallel Matrix Algorithms for Image Processing", in "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 74, 2011. doi:10.4203/ccp.95.74
Keywords: singular value decomposition, latent semantic indexing, matrix algorithms, image retrieval.
Summary
In this work we present a parallel version of Householder bidiagonalization, which we consider the most computationally demanding phase of the singular value decomposition. The singular value decomposition consists of three consecutive steps: (i) bidiagonalization, (ii) computation of the singular values and vectors of the bidiagonal matrix, and (iii) post-multiplication of the results from the previous two steps. We have implemented a parallel version of the bidiagonalization in C++ using the Message Passing Interface (MPI) on a computing cluster, which allows us to process huge multimedia data directly in distributed memory. Our algorithm uses a cyclic row distribution for better load balancing, which reduces communication over the fast interconnect and allows us to use our implementation of the BLAS-2 routines efficiently. Our parallel version of Householder bidiagonalization is based on the standard algorithm defined in [2], which is widely used in several numerical libraries, including LAPACK.
The main advantage of our implementation is its efficiency in handling large-scale data, which makes our algorithm well suited to problems in multimedia processing, where document matrices are usually very large [3]. In our experiments, we decomposed several very large matrices. The largest matrix decomposed by our algorithm had dimensions 32768 x 32768, which occupies 8.1 GB in memory. Using 32 processors, our algorithm ran for 32.32 hours, of which 1.79 hours were required for MPI communication.