Civil-Comp Proceedings
ISSN 1759-3433
CCP: 90
PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING FOR ENGINEERING
Edited by:
Paper 21

A Predictive File Replication Strategy for Grid Computing

C.H. Liao1, F.Z. Wang1, S.N. Wu1, M.M. Rashid1 and N. Helian2

1Department of Applied Mathematics and Computing, School of Engineering, Cranfield University, United Kingdom
2Department of Computer Science, University of Hertfordshire, United Kingdom

Full Bibliographic Reference for this paper
C.H. Liao, F.Z. Wang, S.N. Wu, M.M. Rashid, N. Helian, "A Predictive File Replication Strategy for Grid Computing", in , (Editors), "Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 21, 2009. doi:10.4203/ccp.90.21
Keywords: data grids, replication strategies, decision tree, predictive model.

Summary
In grid environments, file replication strategies are critical to the overall performance of large-scale, data-intensive applications. However, owing to the dynamic nature of grids, file replication decisions are conventionally made reactively, by monitoring changes in the popularity of a file. Although prompt replication can prevent future increases in access latency, the replication traffic and the current accesses to the related files may conflict with each other and hence increase access latency. Ideally, if changes in file popularity could be predicted, files could be replicated in advance and the access latency smoothed. In this paper, we propose a predictive file replication strategy that addresses this problem by forecasting the future popularity of files.

The simulations, based on real system traces, were conducted in OptorSim, the European Data Grid simulation environment. Simulation of the three replication strategies shows that the proposed predictive replication strategy outperforms the LRU and Economic models under sequential and Zipf access patterns. In addition, the Queue Length scheduling algorithm strikes a balance between mean job time and resource usage. No strategy, however, delivers the best performance in every circumstance: choosing a good replication strategy requires considering trace skewness, storage capacity and the maximum computing power.
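
To illustrate why trace skewness and storage capacity matter, the following is a minimal sketch of an LRU store driven by a sequential and a Zipf access trace. It is not OptorSim and does not reproduce the paper's experiments; the function names, the file count, the capacity and the Zipf exponent `s` are all assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


def zipf_trace(n_files, n_requests, s=1.0, seed=0):
    """Draw file ids with probability proportional to 1 / rank**s (assumed Zipf form)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_files + 1)]
    return rng.choices(range(n_files), weights=weights, k=n_requests)


def sequential_trace(n_files, n_requests):
    """Visit files 0, 1, ..., n_files - 1 repeatedly, in order."""
    return [i % n_files for i in range(n_requests)]


def lru_hit_rate(trace, capacity):
    """Fraction of requests served from a local store that evicts the least recently used file."""
    cache = OrderedDict()
    hits = 0
    for file_id in trace:
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)      # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used file
            cache[file_id] = True
    return hits / len(trace)


if __name__ == "__main__":
    # Hypothetical sizes: 100 files, storage for 10 of them.
    print("sequential:", lru_hit_rate(sequential_trace(100, 1000), 10))
    print("zipf:      ", lru_hit_rate(zipf_trace(100, 1000), 10))
```

With these assumed sizes, the sequential trace never hits the LRU store, because every file is evicted before it is requested again, whereas the skewed Zipf trace keeps returning to a small set of popular files. This is one simple reason a single strategy is unlikely to be best for every access pattern and storage capacity.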

As a policy generator, the decision-tree-based predictive model should link the file characteristics to the future popularity of the file and pass the resulting rules to a replication manager, which decides when and where to create replicas. Evaluating files in advance in this way can smooth the access latency caused by geographically distributed resources.
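
The policy-generator idea can be sketched with a one-level decision tree (a decision stump) whose learned rule is handed to a replication check. This is only an illustration under stated assumptions, not the paper's model: the `FileStats` features, the `fit_stump` and `should_replicate` names and the sample data are hypothetical, and the sketch covers only when to replicate, not where.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    recent_accesses: int  # hypothetical feature: accesses in the last window
    growth: float         # hypothetical feature: relative change vs. the previous window


def fit_stump(samples, labels):
    """Learn a one-level decision tree: the growth threshold with the fewest
    misclassifications, where label True means 'became popular later'."""
    best_threshold, best_errors = None, len(samples) + 1
    for candidate in sorted({s.growth for s in samples}):
        errors = sum((s.growth >= candidate) != y for s, y in zip(samples, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def should_replicate(stats, threshold):
    """Rule passed to the (hypothetical) replication manager."""
    return stats.growth >= threshold


if __name__ == "__main__":
    # Made-up training history, for illustration only.
    history = [
        FileStats("a.dat", 5, -0.5),
        FileStats("b.dat", 8, 0.0),
        FileStats("c.dat", 12, 0.4),
        FileStats("d.dat", 30, 0.9),
    ]
    became_popular = [False, False, True, True]
    threshold = fit_stump(history, became_popular)
    print("replicate when growth >=", threshold)
    print(should_replicate(FileStats("e.dat", 9, 0.5), threshold))
```

A full decision tree would split on several file characteristics rather than one; the stump is used here only to keep the link between learned rule and replication decision visible.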
