Computational Science, Engineering & Technology Series
ISSN 1759-3158, CSETS: 27
TRENDS IN PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi, B.H.V. Topping
Chapter 13
Parallel Approximate Inverse Preconditioning using the Finite Difference Method: The General Purpose Graphics Processing Unit Approach
G.A. Gravvanis^{1}, C.K. Filelis-Papadopoulos^{1} and K.M. Giannoutakis^{2}
^{1}Department of Electrical and Computer Engineering, School of Engineering, Democritus University of Thrace, Xanthi, Greece
G.A. Gravvanis, C.K. Filelis-Papadopoulos, K.M. Giannoutakis, "Parallel Approximate Inverse Preconditioning using the Finite Difference Method: The General Purpose Graphics Processing Unit Approach", in P. Iványi, B.H.V. Topping, (Editors), "Trends in Parallel, Distributed, Grid and Cloud Computing for Engineering", Saxe-Coburg Publications, Stirlingshire, UK, Chapter 13, pp 291-319, 2011. doi:10.4203/csets.27.13
Keywords: sparse linear systems, parallel approximate inverses, parallel preconditioned conjugate gradient type methods, parallel computations, GPGPU, CUDA programming.
Summary
During recent decades, explicit approximate inverse preconditioning methods have been used extensively for the efficient solution of sparse linear systems on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on preconditioners that are close approximations of the inverse of the coefficient matrix and are fast to compute in parallel.
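As a concrete, CPU-side illustration of the kind of scheme the chapter parallelizes, the following Python sketch pairs a 5-point finite-difference Laplacian with an explicit approximate inverse built from a truncated Neumann series, applied inside a preconditioned conjugate gradient loop. This is an assumed, minimal construction for illustration only, not the authors' optimized approximate inverse; note that the preconditioning step is a plain matrix-vector product z = M r, which is precisely the operation that maps well to a GPU.

```python
import numpy as np

def fd_laplacian_2d(n):
    """5-point finite-difference Laplacian on an n x n grid (dense, for illustration)."""
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k, k] = 4.0
            if i > 0:     A[k, k - n] = -1.0
            if i < n - 1: A[k, k + n] = -1.0
            if j > 0:     A[k, k - 1] = -1.0
            if j < n - 1: A[k, k + 1] = -1.0
    return A

def neumann_approx_inverse(A, terms=2):
    """Explicit approximate inverse via a truncated Neumann series:
    with A = D - B (D the diagonal), A^-1 ~= (I + G + G^2 + ...) D^-1, G = D^-1 B."""
    D_inv = np.diag(1.0 / np.diag(A))
    B = np.diag(np.diag(A)) - A
    G = D_inv @ B
    M = np.eye(A.shape[0])
    P = np.eye(A.shape[0])
    for _ in range(terms):
        P = P @ G
        M = M + P
    return M @ D_inv

def preconditioned_cg(A, M, b, tol=1e-10, max_iter=500):
    """Preconditioned CG where the preconditioner is applied explicitly as z = M r."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M @ r                    # explicit preconditioning: one matvec
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

Because the approximate inverse is applied by multiplication rather than by triangular solves, every step of the iteration reduces to matrix-vector and vector-vector products, which is what makes the approach attractive on massively parallel hardware.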
A new class of parallel computational techniques is proposed for the parallelization of the explicit approximate inverse and of the explicit preconditioned conjugate gradient type method, [4,5,9], on a graphics processing unit (GPU). The proposed parallel methods have been implemented using the Compute Unified Device Architecture (CUDA) developed by NVIDIA, [1,7,10].

For the parallel construction of the approximate inverse, a "fish bone" computational approach is introduced that respects the anti-diagonal data dependency pattern: the massively parallel environment of the GPU allows the elements of the inverse to be computed simultaneously through a pipelined schedule that assigns each inverted L-shaped block to a hardware thread [3,6].

The inherently parallel linear operations between vectors and matrices involved in the explicit preconditioned bi-conjugate gradient schemes exhibit significant loop-level parallelism, since the matrix-vector and vector-vector products can yield high performance gains on GPU systems, which are specifically designed for such computations, [8].

Finally, numerical results are presented for the performance of the explicit approximate inverse and the explicit preconditioned conjugate gradient type method when solving characteristic two-dimensional problems with the finite difference method on the massively parallel architecture of the GPU. The CUDA implementation issues of the proposed method are also discussed.
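The anti-diagonal dependency pattern behind the "fish bone" approach can be sketched in a few lines. The wavefront traversal below is a CPU-side Python illustration, not the authors' CUDA kernel: each cell (i, j) depends on (i-1, j) and (i, j-1), so all cells on the anti-diagonal i + j = d are mutually independent and, in the GPU version, each would be assigned to a separate hardware thread while the anti-diagonals themselves form the sequential pipeline. The lattice-path fill used to exercise it is an assumed toy workload with exactly this dependency structure.

```python
import numpy as np

def wavefront_sweep(n, update):
    """Visit the cells of an n x n grid anti-diagonal by anti-diagonal.
    Cells on the same anti-diagonal are independent: the outer loop is the
    sequential pipeline, the inner loop is parallel in a CUDA kernel."""
    for d in range(2 * n - 1):            # pipeline over anti-diagonals
        i_lo = max(0, d - n + 1)
        i_hi = min(d, n - 1)
        for i in range(i_lo, i_hi + 1):   # one thread per cell on the GPU
            update(i, d - i)

# Toy workload: T[i, j] = number of monotone lattice paths reaching (i, j),
# a computation with exactly the (i-1, j) / (i, j-1) dependency pattern.
n = 4
T = np.zeros((n, n), dtype=int)

def update(i, j):
    up = T[i - 1, j] if i > 0 else 0
    left = T[i, j - 1] if j > 0 else 0
    T[i, j] = 1 if (i == 0 and j == 0) else up + left

wavefront_sweep(n, update)
```

The same traversal order is what lets the elements of the approximate inverse be produced in a pipelined fashion: parallel width grows toward the main anti-diagonal and shrinks again, which is why the schedule of L-shaped blocks matters for keeping the GPU busy.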