Civil-Comp Proceedings
ISSN 1759-3433
CCP: 101
PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by:
Paper 32

Dynamic Horizontal Job Clustering in Grid Environments

Y. Yudin, Y. Dorozhko, N. Currle-Linde and M. Resch

High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Germany

Full Bibliographic Reference for this paper
Y. Yudin, Y. Dorozhko, N. Currle-Linde, M. Resch, "Dynamic Horizontal Job Clustering in Grid Environments", in , (Editors), "Proceedings of the Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 32, 2013. doi:10.4203/ccp.101.32
Keywords: grid, high-performance computing, job clustering, middleware, quantum chemistry, workflow.

Summary
The continuous development of high performance computing technologies leads to an increase in the complexity of computations. Along with the development of mathematical and computational models and the increasing demand for more precise results, the volume of data to be processed is growing. Handling large volumes of data is an increasingly common problem in the field of simulation technologies. The complexity of individual tasks grows with ever more sophisticated simulation experiments, in which it is important to support dynamic data scaling over a wide range.

In this paper we present an approach to clustering large numbers of computational jobs in an HPC environment. Job submission on HPC resources is usually based on the Portable Batch System (PBS), which is well suited to running a single job on a resource. However, a simulation scenario often consists of a large number of independent computational jobs with different input parameter values, e.g. in a parameter sweep or an optimization run. Using PBS for such scenarios usually leads to a large cumulative wait time, since every job must wait in the queue separately, so the total wait time can make up a significant part of the overall workflow execution time. Furthermore, the number of jobs a single user may have in a queue is typically limited; the limit depends on the resource and computing center and ranges from dozens to several thousand jobs. We therefore propose an approach based on dynamic job clustering: instead of submitting each job individually, a single cluster master job is submitted, which then dynamically distributes its allocated resources among the smaller jobs.

In this paper we demonstrate job clustering on a quantum chemistry experiment, a good example of a complex, data-intensive simulation workflow. In the experiment, MOLPRO, a package of ab initio programs for molecular electronic and vibrational structure calculations, was interfaced to SEGL in order to compute accurate multidimensional potential energy surfaces from ab initio electronic structure single-point calculations. The results clearly show that this concept of job clustering can be used successfully in a real production environment.
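The clustering idea can be illustrated with a small sketch. This is not the authors' implementation; it assumes a master job that has already been granted a fixed number of slots and hands each pending job to whichever slot becomes free first (greedy list scheduling), plus a deliberately simplified queue model in which every submitted batch job waits the same hypothetical `queue_wait`. The function names `run_cluster_master` and `summed_queue_wait` are illustrative only.

```python
import heapq
from typing import List, Tuple


def run_cluster_master(runtimes: List[float], slots: int) -> Tuple[float, List[int]]:
    """Dynamically hand jobs to the slots of one cluster master job.

    Whenever a slot becomes free it takes the next pending job (greedy list
    scheduling in submission order). Returns the time, measured from the start
    of the master job, at which the last job finishes, and the slot used by
    each job.
    """
    if slots < 1:
        raise ValueError("slots must be >= 1")
    # Min-heap of (time the slot becomes free, slot index).
    free_at = [(0.0, s) for s in range(slots)]
    heapq.heapify(free_at)
    assignment: List[int] = []
    finish = 0.0
    for rt in runtimes:
        t, s = heapq.heappop(free_at)  # earliest-available slot
        end = t + rt
        assignment.append(s)
        finish = max(finish, end)
        heapq.heappush(free_at, (end, s))
    return finish, assignment


def summed_queue_wait(n_jobs: int, queue_wait: float, clustered: bool) -> float:
    """Total time spent waiting in the batch queue under a simplified model.

    Assumption: every submitted batch job waits `queue_wait`. Individually
    submitted jobs each pay this wait; a clustered run pays it only once.
    """
    return queue_wait if clustered else n_jobs * queue_wait


if __name__ == "__main__":
    finish, slots_used = run_cluster_master([4, 2, 2, 3, 1, 4], slots=2)
    print(finish)      # 9.0
    print(slots_used)  # [0, 1, 1, 0, 1, 1]
    print(summed_queue_wait(6, 600.0, clustered=False))  # 3600.0
    print(summed_queue_wait(6, 600.0, clustered=True))   # 600.0
```

With two slots and six jobs, the master finishes all work 9 time units after it starts, and the summed queue wait shrinks from six waits to one. The summed figure counts each job's wait separately even where waits overlap in wall-clock time. A real master job would also have to handle job failures and the wall-time limit of its own allocation, which this sketch ignores.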

The paper also discusses the problem of optimally distributing jobs across HPC resources in a grid in order to minimize the overall time of a computational experiment. If an experiment interacts with multiple resources, as in a grid or a virtual organization (VO) consisting of a number of HPC resources, the optimal selection of computational resources plays an important role for long-running computational experiments. Here, the possibility of optimizing load balancing for a large number of jobs within a VO is considered, based on dynamic job clustering and continuous load balancing.
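The abstract does not specify the load-balancing algorithm, so the following is only a sketch of one plausible approach under stated assumptions: a greedy earliest-finish-time heuristic (longest job first) that assigns each job to the VO resource on which it would complete earliest. Each resource is modeled by a hypothetical `Resource` record holding an estimated queue wait for its master job, the number of slots that job obtains, and a relative speed; these fields, the function `distribute`, and the example numbers are all illustrative assumptions rather than the paper's method.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Resource:
    """Hypothetical model of one HPC resource in a VO."""
    name: str
    queue_wait: float   # estimated queue wait before this resource's master job starts
    slots: int          # slots (e.g. cores) granted to the master job
    speed: float = 1.0  # relative speed: runtime here = work / speed
    # Absolute times at which each slot becomes free; filled in __post_init__.
    free_at: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        if self.slots < 1 or self.speed <= 0:
            raise ValueError("slots must be >= 1 and speed > 0")
        # No slot is usable before the master job leaves the queue.
        self.free_at = [self.queue_wait] * self.slots
        heapq.heapify(self.free_at)


def distribute(work: List[float], resources: List[Resource]) -> Tuple[Dict[str, List[int]], float]:
    """Greedy earliest-finish-time assignment of jobs to resources.

    Jobs are taken longest first; each goes to the resource whose next free
    slot would finish it earliest. Mutates each resource's free_at heap.
    Returns the job indices per resource and the overall completion time.
    """
    plan: Dict[str, List[int]] = {r.name: [] for r in resources}
    finish = 0.0
    order = sorted(range(len(work)), key=lambda i: work[i], reverse=True)
    for i in order:
        # Estimated finish time of job i on each resource's earliest-free slot.
        best = min(resources, key=lambda r: r.free_at[0] + work[i] / r.speed)
        start = heapq.heappop(best.free_at)
        end = start + work[i] / best.speed
        heapq.heappush(best.free_at, end)
        plan[best.name].append(i)
        finish = max(finish, end)
    return plan, finish


if __name__ == "__main__":
    resources = [
        Resource("A", queue_wait=10.0, slots=2, speed=1.0),
        Resource("B", queue_wait=0.0, slots=1, speed=0.5),
    ]
    plan, finish = distribute([8.0, 4.0, 4.0, 2.0], resources)
    print(plan)    # {'A': [1, 2, 3], 'B': [0]}
    print(finish)  # 16.0
```

In this small example, the slower but immediately available resource `B` takes the longest job, while the faster resource `A`, which must first wait in its queue, runs the remaining three. Continuous load balancing would repeat such an assignment as queue-wait estimates change during the experiment instead of fixing the plan once.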
