Civil-Comp Proceedings
ISSN 1759-3433
CCP: 86
PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON CIVIL, STRUCTURAL AND ENVIRONMENTAL ENGINEERING COMPUTING
Edited by: B.H.V. Topping
Paper 63

A High Performance Tool for Parallel Access to netCDF Files from High Level Languages

V. Galiano (1), H. Migallón (1), V. Migallón (2) and J. Penadés (2)

(1) Department of Physics and Computer Architectures, University Miguel Hernández, Elche, Alicante, Spain
(2) Department of Computational Science and Artificial Intelligence, University of Alicante, Spain

Full Bibliographic Reference for this paper
V. Galiano, H. Migallón, V. Migallón, J. Penadés, "A High Performance Tool for Parallel Access to netCDF Files from High Level Languages", in B.H.V. Topping, (Editor), "Proceedings of the Eleventh International Conference on Civil, Structural and Environmental Engineering Computing", Civil-Comp Press, Stirlingshire, UK, Paper 63, 2007. doi:10.4203/ccp.86.63
Keywords: parallel distribution, dataset, performance, MPI, netCDF, Python interface.

Summary
In scientific and engineering applications, two obstacles hinder the full use of heterogeneous networks of powerful workstations: data access and data representation. Differences in data representation often make it difficult to distribute applications across networks or to display output from programs running on different system architectures. The network Common Data Form (netCDF) [1] is a data abstraction for storing and retrieving multidimensional data. NetCDF is distributed as a free software library that provides a concrete implementation of that abstraction. The library provides a machine-independent format for representing the large datasets that scientific applications create and use. The netCDF software includes C and Fortran interfaces for accessing netCDF data. In addition, netCDF interfaces are available for high-level languages, which make the library easier to use from Matlab, Ruby, Java and, in particular, Python [2]. Python is a dynamic object-oriented programming language that can be used for many kinds of software development.

Today, most scientific applications are programmed to run in parallel environments because of their growing data volumes and computational demands. It is therefore highly desirable to develop a set of parallel APIs for accessing netCDF files that employ appropriate parallel I/O techniques. To this end, PnetCDF [3] provides a high-performance parallel interface for accessing netCDF files from C using the MPI standard.

Our goal is to create an easy-to-use and powerful tool for Python, which we have called PyPnetCDF, that can manage netCDF properties in a way similar to the serial version in ScientificPython while hiding the parallelism from the user. PyPnetCDF has been constructed so that the parallel environment and the data distribution are managed internally by the interfaces. Moreover, since Python users are accustomed to using ScientificPython for managing netCDF files, the layout of the wrappers follows that of the serial version in ScientificPython. Python examples have demonstrated that PyPnetCDF can manage huge netCDF files without the user having to worry about data distribution. Performance tests show that PyPnetCDF scales well with the number of processors and that the Python interface does not incur a performance penalty. In summary, PyPnetCDF is an intuitive, handy and powerful tool for managing netCDF files from Python on a parallel architecture. Future work involves completing the production-quality parallel PyPnetCDF package, providing new functionality and integrating it with other Python packages.
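To make the idea of internally managed data distribution concrete, the following is a minimal sketch of one common scheme: a contiguous block decomposition of a variable's leading dimension across processes. The summary does not state which scheme PyPnetCDF actually uses, so the partitioning rule and the function names block_partition and partition_all are assumptions introduced purely for illustration. Each process would obtain a (start, count) pair of the kind used for netCDF hyperslab access.

```python
def block_partition(n_items, n_procs, rank):
    """Return (start, count) for the contiguous block owned by `rank`.

    Hypothetical helper: assumes a block decomposition in which the
    first `n_items % n_procs` ranks receive one extra item each.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    if not 0 <= rank < n_procs:
        raise ValueError("rank must satisfy 0 <= rank < n_procs")
    base, extra = divmod(n_items, n_procs)
    # Ranks below `extra` hold base + 1 items; the rest hold base items.
    count = base + (1 if rank < extra else 0)
    # Every earlier rank contributes `base` items, plus one extra item
    # for each earlier rank that is below `extra`.
    start = rank * base + min(rank, extra)
    return start, count


def partition_all(n_items, n_procs):
    """Return the (start, count) pairs for every rank, in rank order."""
    return [block_partition(n_items, n_procs, r) for r in range(n_procs)]


if __name__ == "__main__":
    # Distribute a leading dimension of length 10 over 4 processes.
    for rank, (start, count) in enumerate(partition_all(10, 4)):
        print(f"rank {rank}: start={start}, count={count}")
```

In a real parallel run, the rank and the number of processes would come from the MPI environment rather than being passed by hand; a tool that hides parallelism from the user would perform a calculation of this kind behind an ordinary array-style read or write.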

References
[1] R. Rew and G. Davis, "The Unidata netCDF: software for scientific data access", in "Sixth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography and Hydrology", Anaheim, CA, 2001.
[2] G. van Rossum and F.L. Drake Jr., "An Introduction to Python", Network Theory Ltd, 2003.
[3] J. Li, W. Liao, A. Choudhary, R. Ross, R. Thakur, W. Gropp, and R. Latham, "Parallel netCDF: A high-performance scientific I/O interface", in "Proceedings of SC2003: High Performance Networking and Computing", Phoenix, AZ, 2003.
