Civil-Comp Proceedings
ISSN 1759-3433
CCP: 89
PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: M. Papadrakakis and B.H.V. Topping
Paper 2

Acceleration of an Element-by-Element Preconditioned Conjugate Gradient Solver for Three-Dimensional Tetrahedral Finite Elements using Field Programmable Gate Arrays

J. Hu (1), S.F. Quigley (1) and A.H.C. Chan (2)

(1) Department of Electronic, Electrical and Computer Engineering,
(2) Department of Civil Engineering,
University of Birmingham, United Kingdom

Full Bibliographic Reference for this paper
J. Hu, S.F. Quigley, A.H.C. Chan, "Acceleration of an Element-by-Element Preconditioned Conjugate Gradient Solver for Three-Dimensional Tetrahedral Finite Elements using Field Programmable Gate Arrays", in M. Papadrakakis, B.H.V. Topping, (Editors), "Proceedings of the Sixth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 2, 2008. doi:10.4203/ccp.89.2
Keywords: reconfigurable computing, hardware acceleration, field programmable gate array, finite element method, preconditioned conjugate gradient.

Summary
Three-dimensional finite element analyses require a large amount of time and memory to solve a large, sparse system of linear equations. One approach to parallelizing the solution is to use reconfigurable hardware, such as field programmable gate arrays (FPGAs), to create custom co-processors that accelerate the algorithm. However, to realize the full potential of such approaches, the underlying algorithms must be inherently parallelizable.

Recently, FPGAs have reached the speed and logic density required to implement highly complex systems. The latest Virtex 5 series produced by Xilinx Corporation has up to 330,000 logic cells (equivalent to 55 million basic 2-input logic gates) capable of operation at speeds exceeding 550 MHz [1]. FPGA co-processors have much lower cost and greater flexibility than ASIC hardware. For the right type of application, a reconfigurable hardware-software co-processor can rival expensive parallel computers in accelerating computationally expensive algorithms.

This paper presents results for the implementation of an element-by-element preconditioned conjugate gradient FEM solver using single precision floating-point arithmetic on a reconfigurable computing platform. The platform consists of two Celoxica RC2000 PCI bus plug-in cards, one equipped with a single Xilinx Virtex 2V 6000 FPGA and the other with a single Xilinx Virtex 4VLX160 FPGA. The 4VLX160 FPGA contains 152,064 logic cells, each of which consists of a look-up table and a flip-flop, plus 288 blocks of 18 Kb RAM and 96 Xtreme DSP slices. A software host implements an element-by-element scheme in which the whole problem is divided into a series of sub-domains that can be downloaded onto the reconfigurable computing boards. As a result, the large, sparse global stiffness matrix no longer needs to be assembled in the FPGA. Because the element matrix-vector calculations are completely independent, the computational resources of the reconfigurable hardware can be used efficiently and in a scalable manner, even as the matrix size increases.
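The element-by-element idea can be illustrated with a minimal software sketch. It is not the authors' FPGA design: it runs in plain Python, uses double precision rather than single precision, and assumes a diagonal (Jacobi) preconditioner, which the summary does not specify. The function names (`ebe_matvec`, `ebe_diagonal`, `ebe_pcg`), the 4x4 element matrix `KE` (one scalar unknown per tetrahedron node) and the two-element mesh are all hypothetical, chosen only to show that the product of the stiffness matrix with a vector can be formed by independent gather, multiply and scatter-add steps per element, without assembling the global matrix.

```python
def ebe_matvec(elements, x):
    """Return y = K x without assembling K (gather, multiply, scatter-add)."""
    y = [0.0] * len(x)
    for dofs, ke in elements:
        xe = [x[d] for d in dofs]  # gather the element's entries of x
        for i, di in enumerate(dofs):
            # local dense product, accumulated into the global result
            y[di] += sum(ke[i][j] * xe[j] for j in range(len(dofs)))
    return y


def ebe_diagonal(elements, n):
    """Accumulate diag(K) element by element (assumed Jacobi preconditioner)."""
    diag = [0.0] * n
    for dofs, ke in elements:
        for i, di in enumerate(dofs):
            diag[di] += ke[i][i]
    return diag


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def ebe_pcg(elements, f, tol=1e-10, max_iter=200):
    """Solve K u = f by preconditioned conjugate gradients using ebe_matvec."""
    n = len(f)
    inv_diag = [1.0 / d for d in ebe_diagonal(elements, n)]
    u = [0.0] * n
    r = list(f)                                   # residual for u = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    f_norm = dot(f, f) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * f_norm:
            break
        kp = ebe_matvec(elements, p)              # the only use of K
        alpha = rz / dot(p, kp)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, kp)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u


# Illustrative data only: two 4-node elements sharing nodes 1, 2 and 3.
KE = [[4.0, -1.0, -1.0, -1.0],
      [-1.0, 4.0, -1.0, -1.0],
      [-1.0, -1.0, 4.0, -1.0],
      [-1.0, -1.0, -1.0, 4.0]]
elements = [((0, 1, 2, 3), KE), ((1, 2, 3, 4), KE)]
f = [1.0, 2.0, 3.0, 4.0, 5.0]
u = ebe_pcg(elements, f)
```

In this sketch each pass through the loop in `ebe_matvec` touches only one element's data, which is the property the summary relies on when it states that the element matrix-vector calculations are independent.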

The 32-bit floating-point hardware-software co-processor for three-dimensional tetrahedral finite elements using the preconditioned conjugate gradient method can achieve a speed-up of a factor of 40 on a single FPGA board (based on a 4VLX160 FPGA) compared with a software implementation of the same algorithm on a fast PC.

References