Computational & Technology Resources
an online resource for computational,
engineering & technology publications
Civil-Comp Proceedings
ISSN 1759-3433
CCP: 77
PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON CIVIL AND STRUCTURAL ENGINEERING COMPUTING
Edited by: B.H.V. Topping
Paper 137

A Low-Cost Parallel Architecture, the Hybrid System, for Solving a Large Linear Matrix System

C.S. Leo+, G. Leedham+, C.J. Leo* and H. Schroder$

+School of Computer Engineering, Nanyang Technological University, Singapore
*School of Engineering and Industrial Design, University of Western Sydney, Australia
$School of Computer and IT, Royal Melbourne Institute of Technology, Australia

Full Bibliographic Reference for this paper
C.S. Leo, G. Leedham, C.J. Leo, H. Schroder, "A Low-Cost Parallel Architecture, the Hybrid System, for Solving a Large Linear Matrix System", in B.H.V. Topping, (Editor), "Proceedings of the Ninth International Conference on Civil and Structural Engineering Computing", Civil-Comp Press, Stirlingshire, UK, Paper 137, 2003. doi:10.4203/ccp.77.137
Keywords: parallelization, speedup, Gauss-LU, hybrid system, cluster.

Summary
In this paper, a high-speed computation algorithm is proposed for solving a large NxN matrix system using the MIMD-SIMD Hybrid System. The MIMD-SIMD Hybrid System (also denoted as Hybrid System in this paper) is a new parallel architecture consisting of a combination of Cluster of Workstations (COWs) and SIMD systems working concurrently to produce an optimal parallel computation.

The proposed Hybrid System is an extended form of parallel architecture in that it is a combination of both Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD) systems.

Figure 137.1: Hybrid Architecture (4PE Cluster)
leo.eps

The concept of the Hybrid System architecture is to make use of standard `Off-The-Shelf' components and thereby making the hardware easily maintainable and also low cost. We have implemented such a prototype Hybrid System and have named this new cluster the Hybrid 4PE Cluster. As the name implies, the Hybrid 4PE Cluster, involves using 4 PCs networked together. In each PC, a Systola 1024 card is installed for fine granularity (See Figure 137.1). The Systola 1024 make use of the Instruction Systolic Array concept for processing and details are given in the paper.

In introducing the Hybrid System, therefore, also comes a need to develop or at the very least modify existing algorithms, for solving a large NxN matrix system that are routinely required in analysis of engineering problems of practical interest. This paper presents the implementation a parallel algorithm known as the Gauss-LU algorithm. The algorithm integrates a modified Gauss-Jordan Elimination routine and a subsequent routine performing the LU factorization. Table 137.1 shows the results of the Hybrid 4PE Cluster system processing time using the Gauss-LU algorithm.


Table 137.1: Performance Results of Hybrid 4PE Cluster using Gauss-LU algorithm
Matrix Sub-Matrix Hybrid 4PE 1 PC with no Speedup
Order (N) Array (32x32) Cluster (sec) Systola (sec.)  
128 4x4 0.007 0.069 9.8
256 8x8 0.023 0.670 29.1
384 12x12 0.049 3.130 63.8
512 16x16 0.084 8.315 98.9


Currently, the Systola 1024 is still in a prototype stage and there are plans to enhance it further. With increased performance of this SIMD or any other SIMD system, it can be expected that the performance of speedup can be increased significantly more. In the design of this algorithm, most of the computational task is performed in the SIMD with limited task in the MIMD. The task in the MIMD is confined mainly to the computation of the matrix inverse and loading data into the SIMD. As much of these tasks are performed in parallel with the SIMD (double parallelization), there are therefore little overheads incurred by the PC clusters. A detailed derivation of the Hybrid Speedup performance has been derived and established previously [1].

References
1
C.S. Leo, H. Schroder and G. Leedham, MIMD-SIMD Hybrid System - Towards a New Low Cost Parallel System, Journal of Parallel Computing 29, (2003) pp 21-36. doi:10.1016/S0167-8191(02)00182-5

purchase the full-text of this paper (price £20)

go to the previous paper
go to the next paper
return to the table of contents
return to the book description
purchase this book (price £123 +P&P)