PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON CIVIL, STRUCTURAL AND ENVIRONMENTAL ENGINEERING COMPUTING
Edited by: B.H.V. Topping
A High Performance Tool for Parallel Access to netCDF Files from High Level Languages
V. Galiano1, H. Migallón1, V. Migallón2 and J. Penadés2
1Department of Physics and Computer Architectures, University Miguel Hernández, Elche, Alicante, Spain
V. Galiano, H. Migallón, V. Migallón, J. Penadés, "A High Performance Tool for Parallel Access to netCDF Files from High Level Languages", in B.H.V. Topping, (Editor), "Proceedings of the Eleventh International Conference on Civil, Structural and Environmental Engineering Computing", Civil-Comp Press, Stirlingshire, UK, Paper 63, 2007. doi:10.4203/ccp.86.63
Keywords: parallel distribution, dataset, performance, MPI, netCDF, Python interface.
In scientific and engineering applications, two obstacles hinder the full use of heterogeneous networks of powerful workstations: data access and data representation. Differences in data representation usually make it difficult to distribute applications across networks or to display output from programs running on different system architectures. The Network Common Data Form (netCDF) is a data abstraction for storing and retrieving multidimensional data. NetCDF is distributed as a free software library that provides a concrete implementation of that abstraction. The library provides a machine-independent format for representing large datasets that are created and used by scientific applications. The netCDF software includes C and Fortran interfaces for accessing netCDF data. In addition, netCDF interfaces are available for high-level languages, which make it easier to use from Matlab, Ruby, Java and, in particular, Python. Python is a dynamic object-oriented programming language that can be used for many kinds of software development.
Today most scientific applications are programmed to run in parallel environments because of their growing demands for data volume and computational resources. It is therefore highly desirable to develop a set of parallel APIs for accessing netCDF files that employ appropriate parallel I/O techniques. To this end, PnetCDF provides a high-performance parallel interface for accessing netCDF files from C using the MPI standard.
Our goal is to create an easy and powerful tool for Python, which we have called PyPnetCDF, that is able to manage netCDF properties in a similar way to the serial version from ScientificPython while hiding the parallelism from the user. PyPnetCDF has been constructed so that the parallel environment and the data distribution are managed internally by the interfaces. Moreover, since Python users are accustomed to using ScientificPython for managing netCDF files, the layout of the wrappers follows that of the serial version from ScientificPython. Python examples have demonstrated that PyPnetCDF can manage huge netCDF files without the user having to worry about data distribution. Performance tests show that PyPnetCDF scales well with the number of processors and that the Python interface does not incur a performance penalty. In summary, PyPnetCDF is an intuitive, handy, parallel and powerful tool for managing netCDF files from Python on a parallel architecture. Future work involves completing the production-quality parallel PyPnetCDF package, providing new functionality and integrating it with other Python packages.
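The abstract does not state how PyPnetCDF distributes data among processes. As an illustration only, the following minimal sketch shows one common scheme that such an interface could apply internally: a contiguous block partition of a dimension across MPI ranks. The function name block_partition and its parameters are hypothetical and are not part of PyPnetCDF, PnetCDF or ScientificPython.

```python
def block_partition(n_items, n_procs, rank):
    """Return (start, count) for the contiguous block owned by `rank`.

    Hypothetical helper, assumed for illustration: it splits `n_items`
    indices of one dimension as evenly as possible among `n_procs`
    processes. The first `n_items % n_procs` ranks get one extra item.
    """
    if n_procs <= 0 or not 0 <= rank < n_procs:
        raise ValueError("rank must satisfy 0 <= rank < n_procs")
    base, extra = divmod(n_items, n_procs)
    # Ranks below `extra` each own one additional index, so every
    # later block is shifted by the number of extras before it.
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


if __name__ == "__main__":
    # Show which slice of a 10-element dimension each of 4 ranks owns.
    for rank in range(4):
        start, count = block_partition(10, 4, rank)
        print(f"rank {rank}: indices [{start}, {start + count})")
```

In a scheme of this kind, each process would read or write only its own (start, count) slice of a variable, which is how the user can work with the whole dataset without handling the distribution explicitly.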