Computational & Technology Resources
an online resource for computational,
engineering & technology publications
Civil-Comp Proceedings
ISSN 1759-3433
CCP: 89
PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: M. Papadrakakis and B.H.V. Topping
Paper 6

A Parallel Implementation for Finding the Longest Common Subsequence

P.D. Michailidis and K.G. Margaritis

Department of Applied Informatics, University of Macedonia, Greece

Full Bibliographic Reference for this paper
P.D. Michailidis, K.G. Margaritis, "A Parallel Implementation for Finding the Longest Common Subsequence", in M. Papadrakakis, B.H.V. Topping, (Editors), "Proceedings of the Sixth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 6, 2008. doi:10.4203/ccp.89.6
Keywords: LCS computation, bioinformatics, cluster of heterogeneous workstations, Message Passing Interface, master-worker, parallel algorithms.

Summary
Searching for the longest common subsequence (LCS) is one of the most fundamental tasks in bioinformatics. The LCS problem had been extensively studied in the literature on sequential and parallel algorithms. For the LCS problem, the time complexity tends to grow very fast as the size of biological databases increases. For this reason, in recent years several attempts at implementing the LCS and other similar problems have been made on a cluster of workstations. Recently [1] parallel efficient parallel implementations were developed to run a biological sequence alignment algorithm on a PC cluster.

The primary goal of this paper is to implement and analyze the performance of the parallel implementation for the LCS problem on a cluster of heterogeneous workstations. The key technique of our implementation is based on the dynamic master-worker programming model. This model consists of a master workstation and a collection of worker workstations. The master workstation is used to partition a given sequence database into a set of several smaller databases and distribute them to all worker workstations dynamically and collect the local results from the worker workstations. The worker workstations are mainly performing a sequential dynamic programming algorithm on their respective databases. It is the first parallel implementation using a cluster environment for this problem.

We implemented our proposed parallel implementation in the C language using the MPI communication library [2] and tested this on a cluster of heterogeneous workstations running Linux. The communication between workstations is performed through a fast ethernet switch. From the performance results we observed that using different values of block size leads to different performances. The worst performance is obtained for very small and large values of block size. This is because small values of block size increase the inter-workstation communication overhead, while large values of block size produce a poorly balanced load. However, from experiments it is obtained that there is an optimal value of block size of nearly 10,000 characters and that this value produces a better performance. Further, the experimental results showed that the total running time of the proposed implementation is decreased as the number of workstations is increased. Also, the communication time of the parallel implementation is lower than the total running time. It is therefore more important to improve local computations than communication rounds. Moreover, we observed that there is a significant effect of the query length on the overall performance of the LCS implementation. Finally, the results show that the proposed parallel implementation still scales well though the problem size has been increased two times.

Therefore, as a general conclusion we can say that the proposed parallel implementation is efficient with a low communication overhead.

References
1
A. Boukerche, A.C.M.A. de Melo, E.F. de Oliveira Sandes, M. Ayala-Rinco'n, "An exact parallel algorithm to compare very long biological sequences in clusters of workstations", Cluster Computing, 10, 187-202, 2007. doi:10.1007/s10586-007-0020-0
2
P. Pacheco, "Parallel programming with MPI", San Francisco, CA, Morgan Kaufmann, 1997.

purchase the full-text of this paper (price £20)

go to the previous paper
go to the next paper
return to the table of contents
return to the book description