Computational Science, Engineering & Technology Series
TRENDS IN PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi, B.H.V. Topping
Network Services for High Performance Distributed Computing and Data Management
W.E. Johnston, C. Guok, J. Metzger and B. Tierney
ESnet and Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
W.E. Johnston, C. Guok, J. Metzger, B. Tierney, "Network Services for High Performance Distributed Computing and Data Management", in P. Iványi, B.H.V. Topping, (Editors), "Trends in Parallel, Distributed, Grid and Cloud Computing for Engineering", Saxe-Coburg Publications, Stirlingshire, UK, Chapter 4, pp 83-104, 2011. doi:10.4203/csets.27.4
Keywords: high performance distributed computing and data management, high throughput networks, network services, science use of networks.
Much of modern science depends on high performance distributed computing and data handling. This distributed infrastructure depends, in turn, on the high performance operation of high speed networks and services, especially when the science infrastructure is widely distributed geographically. This is true for small science groups in fields such as materials, nanotechnology, molecular biology and genomics that involve remote instruments and/or large amounts of data, as well as for the obvious cases of the large science collaborations in fields such as high energy physics, astronomy and cosmology, and large-scale computational science. In all of these cases, sophisticated and highly tuned network services are needed because the science depends on high throughput: the distributed computing and data management systems must be able to analyze data as quickly as the instruments produce it.
Two network services have emerged as essential for supporting high performance distributed applications: guaranteed bandwidth and multi-domain monitoring. Guaranteed bandwidth service, typically supplied as a virtual circuit, is essential for time-critical distributed applications, as most science applications are. Detailed monitoring and active diagnosis are critical to isolating degraded network elements that inhibit "high performance use of the network", that is, the very low packet loss at very high data rates (typically 10 gigabits/second) that is necessary for high network throughput over long (national and intercontinental) distances.
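To illustrate why very low packet loss matters over long distances, the following minimal sketch uses the well-known Mathis et al. approximation for steady-state TCP throughput, rate <= C * MSS / (RTT * sqrt(p)). This model is not taken from the chapter; it is a commonly cited rule of thumb for classic loss-based TCP, and the function names, the 1460-byte segment size and the 100 ms round-trip time are assumptions chosen only for illustration.

```python
import math

# Constant from the Mathis et al. model; sqrt(3/2) is the commonly quoted value.
MATHIS_C = math.sqrt(1.5)


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=MATHIS_C):
    """Approximate upper bound on single-stream TCP throughput in bits/second."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0 or mss_bytes <= 0:
        raise ValueError("rtt_s and mss_bytes must be positive")
    return c * (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))


def max_loss_for_rate(mss_bytes, rtt_s, target_bps, c=MATHIS_C):
    """Largest loss rate at which the model still permits target_bps."""
    if rtt_s <= 0 or mss_bytes <= 0 or target_bps <= 0:
        raise ValueError("all arguments must be positive")
    return (c * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


if __name__ == "__main__":
    # Assumed example: intercontinental path, 100 ms RTT, 1460-byte segments.
    for loss in (1e-3, 1e-5, 1e-7):
        rate = mathis_throughput_bps(1460, 0.100, loss)
        print(f"loss={loss:.0e}  model bound = {rate / 1e6:.1f} Mb/s")
    needed = max_loss_for_rate(1460, 0.100, 10e9)
    print(f"loss needed for 10 Gb/s: {needed:.2e}")
```

Under these assumptions, the model suggests that a single stream can only approach 10 gigabits/second on a 100 ms path if the loss rate is on the order of 1e-10, which is why even slightly degraded network elements are significant. Newer congestion control algorithms and parallel streams behave differently, so the numbers should be read as indicative only.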
Implementations of both of these capabilities, one providing guaranteed bandwidth and one doing network monitoring, are being fairly widely deployed in the campus, regional, national and international networks that support distributed science and the research and education communities, including ESnet.
The OSCARS implementation of reservable virtual circuits is a production service in ESnet that provides virtual circuits for collaborative science. The service offers reservable, end-to-end guaranteed throughput across the multiple network domains used by large science collaborations.
OSCARS is jointly developed by an informal international consortium, and several dozen research and education networks around the world use OSCARS or variations of it.
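The idea of a reservable, end-to-end bandwidth guarantee can be sketched as a simple admission-control check: a request is accepted only if every link on the path has enough uncommitted capacity for the whole requested time window. The sketch below is a toy model and not the OSCARS API or its actual algorithm; the names `Link`, `committed` and `reserve` are hypothetical, and a real multi-domain service would also have to handle path computation, authorization and inter-domain signalling.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical link with a fixed capacity and a list of reservations."""
    capacity_gbps: float
    # Each reservation is (start, end, gbps) over the half-open window [start, end).
    reservations: list = field(default_factory=list)

    def committed(self, start, end):
        """Peak bandwidth already reserved at any instant in [start, end)."""
        overlapping = [r for r in self.reservations if r[0] < end and start < r[1]]
        # Load is piecewise constant and only rises where a reservation begins,
        # so the peak occurs at `start` or at a reservation start inside the window.
        points = {start} | {r[0] for r in overlapping if r[0] > start}
        return max(
            sum(g for s, e, g in overlapping if s <= t < e) for t in points
        )


def reserve(path, start, end, gbps):
    """Reserve gbps on every link in path for [start, end), or reserve nothing."""
    if end <= start or gbps <= 0:
        raise ValueError("need end > start and gbps > 0")
    # Check all links first so a rejected request leaves no partial state.
    if any(link.committed(start, end) + gbps > link.capacity_gbps for link in path):
        return False
    for link in path:
        link.reservations.append((start, end, gbps))
    return True


if __name__ == "__main__":
    # Two links standing in for segments in two different network domains.
    a, b = Link(10.0), Link(10.0)
    print(reserve([a, b], 0, 10, 6.0))   # accepted on an empty path
    print(reserve([a, b], 5, 15, 6.0))   # overlaps the first request in time
    print(reserve([a, b], 10, 20, 6.0))  # starts when the first request ends
```

Checking every link before committing any reservation keeps the guarantee end-to-end: either the whole path is reserved or none of it is.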
The perfSONAR network monitoring system is the first effective multi-domain network testing and monitoring capability. It is widely deployed in the research and education community and is routinely used to debug the network paths (IP and circuit) used by the international science community. perfSONAR is also developed by an informal international consortium.
Together, these services have provided critical infrastructure for the functioning of large science experiments such as those at CERN's LHC.