Civil-Comp Proceedings
ISSN 1759-3433
CCP: 95
Edited by:
Paper 24

GPU-Based Parallel Nonlinear Conjugate Gradient Algorithms

V. Galiano¹, H. Migallón¹, V. Migallón² and J. Penadés²

¹Department of Physics and Computer Architectures, University Miguel Hernández, Elche, Alicante, Spain
²Department of Computer Science and Artificial Intelligence, University of Alicante, Spain

Full Bibliographic Reference for this paper
V. Galiano, H. Migallón, V. Migallón, J. Penadés, "GPU-Based Parallel Nonlinear Conjugate Gradient Algorithms", in , (Editors), "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 24, 2011. doi:10.4203/ccp.95.24
Keywords: GPGPU, GPU libraries, multicore, nonlinear conjugate gradient algorithms, parallel preconditioners, ILU factorizations, two-stage methods, Bratu problem.


The algorithms described here have been implemented on a platform with an Intel Core 2 Quad Q6600 processor and an NVIDIA GTX 280 GPU. We report the numerical results obtained using CUDA on the GPU and compare them with those obtained on the shared-memory platform using an OpenMP model. Furthermore, a mixed model is considered in order to exploit the characteristics of both parallel systems. The reported numerical experiments analyze the behavior of these algorithms in a fine-grain parallel environment compared with a thread-based environment. We have analyzed the proposed algorithms in order to identify their main operations, and we have implemented some optimizations and tested some libraries in order to perform these operations efficiently. The CUBLAS and CUSPARSE libraries offer good performance, and the sparse matrix format should be chosen according to the parallel architecture, with ELLPACK-R [3] being the most efficient format.
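To make the role of the sparse matrix format concrete, the following is a minimal sketch of ELLPACK-R storage and the corresponding sparse matrix-vector product, written in plain Python for readability rather than in CUDA. It is not the authors' implementation: the function names to_ellpack_r and spmv_ellpack_r and the input layout (one list of (column, value) pairs per row) are assumptions made for illustration. The sketch stores the padded entries in column-major order and keeps an extra array rl with the number of nonzeros in each row, which is the distinguishing feature of ELLPACK-R as described in [3].

```python
def to_ellpack_r(rows):
    """Build ELLPACK-R arrays from a list of rows.

    rows[i] is a list of (column, value) pairs for matrix row i
    (a hypothetical input layout chosen for this sketch).
    """
    n = len(rows)
    rl = [len(r) for r in rows]      # nonzeros per row: the "R" in ELLPACK-R
    width = max(rl) if rl else 0     # every row is padded to this length
    # Column-major layout: entry k of row i is stored at index k * n + i,
    # so consecutive rows (consecutive GPU threads) read consecutive addresses.
    val = [0.0] * (width * n)
    col = [0] * (width * n)
    for i, r in enumerate(rows):
        for k, (j, a) in enumerate(r):
            val[k * n + i] = a
            col[k * n + i] = j
    return val, col, rl


def spmv_ellpack_r(val, col, rl, x):
    """Compute y = A x for a matrix stored in ELLPACK-R form."""
    n = len(rl)
    y = [0.0] * n
    for i in range(n):               # on a GPU, one thread would handle row i
        acc = 0.0
        for k in range(rl[i]):       # stop at the true row length, skipping padding
            acc += val[k * n + i] * x[col[k * n + i]]
        y[i] = acc
    return y
```

In a CUDA kernel the outer loop over i would disappear, with each thread computing one entry of y. The rl array lets each thread stop at the true length of its row instead of iterating over the zero padding.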

On the other hand, we have shown that the two methods adapt differently to the fine-grain GPU architecture. We would like to point out that the use of the GPU improves the results obtained with either of the proposed methods. Moreover, the nonlinear conjugate gradient (NLCG) method exploits the parallelism offered by the GPU better than the nonlinear preconditioned conjugate gradient (NLPCG) method.
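For readers unfamiliar with the NLCG iteration, the sketch below shows the textbook Fletcher-Reeves update [1] in plain Python. It is an illustration under stated assumptions, not the algorithm as implemented in the paper: the backtracking (Armijo) line search, the steepest-descent restart, the tolerances, and the helper names dot, axpy and nlcg_fletcher_reeves are all choices made for this sketch. Each iteration consists of dot products and vector updates, plus whatever the gradient evaluation requires. These are the kinds of operations that BLAS-style GPU libraries such as CUBLAS provide.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]


def nlcg_fletcher_reeves(f, grad, x0, tol=1e-8, max_iter=1000):
    """Minimize f by nonlinear CG with the Fletcher-Reeves coefficient."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                 # first direction: steepest descent
    for _ in range(max_iter):
        gg = dot(g, g)
        if math.sqrt(gg) < tol:           # gradient small enough: stop
            break
        slope = dot(g, d)
        if slope >= 0.0:                  # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -gg
        # Backtracking (Armijo) line search: an assumption of this sketch.
        alpha, fx = 1.0, f(x)
        while f(axpy(alpha, d, x)) > fx + 1e-4 * alpha * slope and alpha > 1e-16:
            alpha *= 0.5
        x = axpy(alpha, d, x)             # x <- x + alpha * d
        g_new = grad(x)
        beta = dot(g_new, g_new) / gg     # Fletcher-Reeves coefficient
        d = axpy(beta, d, [-gi for gi in g_new])   # d <- -g_new + beta * d
        g = g_new
    return x
```

A call such as nlcg_fletcher_reeves(f, grad, x0) takes the objective f, its gradient grad, and a starting vector x0, all supplied by the user as plain Python callables and lists.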

[1] R. Fletcher, C. Reeves, "Function Minimization by Conjugate Gradients", The Computer Journal, 7, 149-154, 1964. doi:10.1093/comjnl/7.2.149
[2] R. Bru, V. Migallón, J. Penadés, D.B. Szyld, "Parallel, Synchronous and Asynchronous Two-Stage Multisplitting Methods", Electronic Transactions on Numerical Analysis, 3, 24-38, 1995.
[3] F. Vázquez, J.J. Fernández, E.M. Garzón, "A new approach for sparse matrix vector product on NVIDIA GPUs", Concurrency and Computation: Practice and Experience, 2010. doi:10.1002/cpe.1658
