Computational & Technology Resources
an online resource for computational,
engineering & technology publications
PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GPU AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi and B.H.V. Topping
Fine-grained application tuning on OpenPOWER HPC systems
L. Riha¹, A. Bartolini² and O. Vysocky¹
¹IT4Innovations, VSB – Technical University of Ostrava, Czech Republic
L. Riha, A. Bartolini, O. Vysocky, "Fine-grained application tuning on OpenPOWER HPC systems", in P. Iványi, B.H.V. Topping, (Editors), "Proceedings of the Sixth International Conference on Parallel, Distributed, GPU and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 33, 2019. doi:10.4203/ccp.112.33
Keywords: high performance computing, energy efficient computing, performance analysis, DiG, Power Architecture, MERIC.
The energy consumption of HPC centers is becoming a major limitation in building new petascale or exascale systems. In this paper we evaluate, on IBM's Power8 system, the dynamic application tuning approach for reducing the energy consumption of HPC systems that was introduced in the Horizon 2020 READEX project.
The D.A.V.I.D.E. HPC system installed at CINECA was selected for these experiments because of its advanced power monitoring system, which samples power at a high rate of 1 kHz; the samples are stored in internal memory and can be read on request.
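For illustration, the energy of a run or of a single code region can be obtained from such a sample stream by numerically integrating power over time. The sketch below assumes evenly spaced samples at 1 kHz and uses the trapezoidal rule; the function names and the assumption that sample i is taken at time i/1000 s are ours, not part of the D.A.V.I.D.E. interface.

```python
def energy_joules(power_w, sample_rate_hz=1000.0):
    """Trapezoidal estimate of energy (J) from evenly spaced power samples (W)."""
    dt = 1.0 / sample_rate_hz  # 1 ms between samples at 1 kHz
    return sum(0.5 * (p0 + p1) * dt for p0, p1 in zip(power_w, power_w[1:]))


def region_energy(power_w, t_start_s, t_end_s, sample_rate_hz=1000.0):
    """Energy of the interval [t_start_s, t_end_s], assuming sample i was taken at i / rate."""
    i0 = round(t_start_s * sample_rate_hz)
    i1 = round(t_end_s * sample_rate_hz)
    return energy_joules(power_w[i0:i1 + 1], sample_rate_hz)


if __name__ == "__main__":
    samples = [200.0] * 1001  # one second of a constant 200 W load
    print(energy_joules(samples))              # close to 200 J
    print(region_energy(samples, 0.25, 0.75))  # close to 100 J
```

A higher sampling rate shortens the interval between samples, which is what makes it practical to attribute energy to short code regions rather than only to whole runs.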
To perform the presented evaluations, we enhanced our in-house C/C++ library MERIC to support both the tuning of hardware parameters on the Power8 system and the energy measurement system of D.A.V.I.D.E.
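MERIC's C/C++ interface is not described in this abstract, so the following is only a hypothetical Python analogue of region-based tuning: when execution enters a named region, the configuration stored for that region is applied, and the previous configuration is restored on exit. The class `HypotheticalTuner` and its methods are illustrative names, and `_apply` merely records the change instead of touching hardware.

```python
from contextlib import contextmanager


class HypotheticalTuner:
    """Illustrative region-based tuner; this is NOT the MERIC API."""

    def __init__(self, region_configs, default_config):
        self.region_configs = region_configs  # region name -> configuration
        self.current = default_config
        self.log = []  # every configuration that was applied, in order

    def _apply(self, config):
        # A real tool would set the core frequency and thread count here;
        # this sketch only records the change.
        self.current = config
        self.log.append(config)

    @contextmanager
    def region(self, name):
        """Apply the region's configuration on entry and restore the previous one on exit."""
        previous = self.current
        self._apply(self.region_configs.get(name, previous))
        try:
            yield
        finally:
            self._apply(previous)


if __name__ == "__main__":
    tuner = HypotheticalTuner({"solver": (3.0, 8)}, default_config=(3.5, 1))
    with tuner.region("solver"):
        print("inside solver:", tuner.current)  # (3.0, 8)
    print("after solver:", tuner.current)       # (3.5, 1)
```

Restoring the previous configuration in a `finally` clause keeps nested regions consistent even if the region body raises an exception.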
We tuned two hardware parameters that are available to the user on this system: (1) dynamic voltage and frequency scaling (DVFS) and (2) concurrency throttling, with a focus on hyperthreading (simultaneous multithreading, SMT), which plays a significant role in performance on Power8 systems.
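Together, the two parameters span a small discrete search space. The sketch below enumerates it as the Cartesian product of core frequencies and SMT levels; the frequency values are hypothetical placeholders, while the SMT levels 1, 2, 4 and 8 are the modes a POWER8 core supports. The `Config` class and `search_space` function are illustrative names.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical DVFS steps for illustration; the actual frequencies available
# on the D.A.V.I.D.E. nodes are not given here.
CORE_FREQS_GHZ = (2.0, 2.5, 3.0, 3.5)
# Simultaneous multithreading modes of a POWER8 core (ST, SMT2, SMT4, SMT8).
SMT_LEVELS = (1, 2, 4, 8)


@dataclass(frozen=True)
class Config:
    core_freq_ghz: float   # set via DVFS
    threads_per_core: int  # concurrency throttling through the SMT level

    def total_threads(self, cores: int) -> int:
        """Number of software threads to run on `cores` physical cores."""
        return cores * self.threads_per_core


def search_space(freqs=CORE_FREQS_GHZ, smt_levels=SMT_LEVELS):
    """All (frequency, SMT level) combinations a tuner could evaluate."""
    return [Config(f, t) for f, t in product(freqs, smt_levels)]


if __name__ == "__main__":
    space = search_space()
    print(len(space), "configurations")            # 4 frequencies x 4 SMT levels = 16
    print(Config(3.0, 8).total_threads(cores=16))  # 128 threads
```

Because the space is small, exhaustively measuring every configuration for every region remains feasible.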
As a test application we used the ESPRESO FEM library, a fully fledged application with several regions that exhibit different kinds of behavior (compute-, memory-, I/O- or communication-bound). Each region may have its own optimal configuration of the tuned parameters, one that utilizes the system fully without wasting resources. In this way we were able to reduce the application runtime by 23.7% when tuning for minimal runtime, or to save 27.3% of energy together with a 9.9% runtime reduction when tuning for minimal energy consumption.
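The per-region idea can be sketched as follows: given the measured runtime and energy of every region under every configuration, pick for each region the configuration that minimizes the chosen objective, then compare the totals against a default run using saving = 100 · (baseline − tuned) / baseline. This sketch assumes that regions are independent, that whole-run cost is the sum of region costs and that switching overhead is negligible; the function names, region names and numbers are invented and are not results from the paper.

```python
from typing import Dict, Tuple

# region -> configuration name -> (runtime in s, energy in J)
Measurements = Dict[str, Dict[str, Tuple[float, float]]]


def best_configs(meas: Measurements, objective: str = "energy") -> Dict[str, str]:
    """For each region, choose the configuration minimizing runtime or energy."""
    if objective not in ("runtime", "energy"):
        raise ValueError("objective must be 'runtime' or 'energy'")
    idx = 0 if objective == "runtime" else 1
    return {region: min(results, key=lambda cfg: results[cfg][idx])
            for region, results in meas.items()}


def totals(meas: Measurements, choice: Dict[str, str]) -> Tuple[float, float]:
    """Whole-run runtime and energy, assuming they are sums over regions."""
    runtime = sum(meas[r][choice[r]][0] for r in meas)
    energy = sum(meas[r][choice[r]][1] for r in meas)
    return runtime, energy


def saving_pct(baseline: float, tuned: float) -> float:
    """Relative saving in percent; negative means the tuned run is worse."""
    return 100.0 * (baseline - tuned) / baseline


# Invented example data, not measurements from the paper.
EXAMPLE: Measurements = {
    "assembler": {"default": (10.0, 2000.0), "lowfreq": (11.0, 1700.0), "smt8": (8.0, 1900.0)},
    "solver": {"default": (20.0, 4000.0), "lowfreq": (21.0, 3300.0), "smt8": (17.0, 3200.0)},
}

if __name__ == "__main__":
    base_t, base_e = totals(EXAMPLE, {r: "default" for r in EXAMPLE})
    for objective in ("runtime", "energy"):
        choice = best_configs(EXAMPLE, objective)
        t, e = totals(EXAMPLE, choice)
        print(objective, choice,
              f"runtime saving {saving_pct(base_t, t):.1f} %",
              f"energy saving {saving_pct(base_e, e):.1f} %")
```

With these invented numbers the runtime objective selects `smt8` for both regions, whereas the energy objective picks `lowfreq` for `assembler` and `smt8` for `solver`, which illustrates why a single static configuration for the whole application can be suboptimal.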