Computational Science, Engineering & Technology Series
TRENDS IN PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi, B.H.V. Topping
Parallel Approximate Inverse Preconditioning using the Finite Difference Method: The General Purpose Graphics Processing Unit Approach
G.A. Gravvanis1, C.K. Filelis-Papadopoulos1 and K.M. Giannoutakis2
1Department of Electrical and Computer Engineering, School of Engineering, Democritus University of Thrace, Xanthi, Greece
G.A. Gravvanis, C.K. Filelis-Papadopoulos, K.M. Giannoutakis, "Parallel Approximate Inverse Preconditioning using the Finite Difference Method: The General Purpose Graphics Processing Unit Approach", in P. Iványi, B.H.V. Topping, (Editors), "Trends in Parallel, Distributed, Grid and Cloud Computing for Engineering", Saxe-Coburg Publications, Stirlingshire, UK, Chapter 13, pp 291-319, 2011. doi:10.4203/csets.27.13
Keywords: sparse linear systems, parallel approximate inverses, parallel preconditioned conjugate gradient type methods, parallel computations, GPGPU, CUDA programming.
During recent decades, explicit approximate inverse preconditioning methods have been extensively used for efficiently solving sparse linear systems on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on the use of efficient preconditioners that are close approximants to the coefficient matrix and are fast to compute in parallel.
A new class of parallel computational techniques is proposed for the parallelization of the explicit approximate inverse and the explicit preconditioned conjugate gradient type method [4,5,9] on a graphics processing unit (GPU). The proposed parallel methods have been implemented using the compute unified device architecture (CUDA) developed by NVIDIA [1,7,10].
For the parallel construction of the approximate inverse, a "fish bone" computational approach is introduced that respects the anti-diagonal data dependency pattern: the massively parallel environment of the GPU allows simultaneous calculation of the elements of the inverse through a pipeline schedule that assigns each inverted L-shaped block to a hardware thread on the GPU [3,6]. The inherently parallel linear operations between vectors and matrices involved in the explicit preconditioned bi-conjugate gradient schemes exhibit significant loop-level parallelism, because the matrix-vector and vector-vector products can achieve high performance gains on GPU systems, which are specifically designed for such computations.
Finally, numerical results are presented for the performance of the explicit approximate inverse and the explicit preconditioned conjugate gradient type method when solving characteristic two-dimensional problems, discretized by the finite difference method, on the massively parallel multiprocessor environment of the GPU. The CUDA implementation issues of the proposed method are also discussed.