Civil-Comp Proceedings
ISSN 1759-3433
CCP: 101
PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Paper 14

Parallel Performance of Fast Fourier Transform Routines in PRACE

A. Sunderland (1), C. Moulinec (1) and R. Sandberg (2)

(1) STFC Daresbury Laboratory, Warrington, United Kingdom
(2) Engineering and the Environment, University of Southampton, United Kingdom

Full Bibliographic Reference for this paper
A. Sunderland, C. Moulinec, R. Sandberg, "Parallel Performance of Fast Fourier Transform Routines in PRACE", in , (Editors), "Proceedings of the Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 14, 2013. doi:10.4203/ccp.101.14
Keywords: fast Fourier transform, FFT, parallel performance, PRACE, Hartree Centre.

Summary
The Fast Fourier Transform (FFT) is one of the most widely used algorithms in engineering and scientific applications, so its performance on large-scale computing platforms matters to a wide range of research fields. In computational fluid dynamics, fast and efficient FFTs enable ever larger direct numerical simulations and large-eddy simulations, in which Reynolds numbers can approach those found in reality.
Under the European Community's Seventh Framework Programme, the PRACE [1] 'Tier-0' systems [2,3] have been made available to high-end computing researchers and code developers. These parallel computing environments provide a great deal of processing power, either through large numbers of CPU cores or through computational accelerators such as GPUs. Recently, high-end computing resources (IBM Blue Gene/Q) have also been made available to researchers in the UK through the Hartree Centre at STFC Daresbury Laboratory [4].
This paper analyses parallel three-dimensional FFT performance on these high-end resources using routines from the numerical libraries FFTW [5], FFTE [6] and DAFT [7]. The FFT implementations investigated range from pure MPI versions to hybrid MPI-OpenMP approaches that can exploit simultaneous multithreading on multicore architectures. Alternative three-dimensional data distributions (slab, pencil and block) are also investigated to assess their impact on parallel performance. The paper extends earlier work by testing the various FFT methods on the large datasets often used in simulations with the High-Performance Solver for Turbulence and Aeroacoustic Research (HiPSTAR), which is developed at the University of Southampton, UK [8]. Performance results from benchmark runs on the three architectures listed above are presented, compared and analysed.
The authors conclude that although new implementations and techniques can now extend performance scalability to several thousands of cores, parallel scalability is ultimately limited by the all-to-all nature of the underlying communications.
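To illustrate why the communication pattern is all-to-all, the following is a minimal single-process sketch of a slab-decomposed three-dimensional FFT. It is not taken from FFTW, FFTE or DAFT: the function name `slab_fft3d` and the parameter `num_ranks` are hypothetical, NumPy stands in for the library FFT kernels, and the MPI exchange is only simulated with Python lists.

```python
import numpy as np


def slab_fft3d(global_array, num_ranks):
    """Simulate a slab-decomposed 3D FFT on `num_ranks` hypothetical ranks.

    Illustrative sketch only: real codes would keep each slab on a separate
    MPI process and replace the list shuffling below with an all-to-all call.
    """
    n0, n1, _ = global_array.shape
    if n0 % num_ranks or n1 % num_ranks:
        raise ValueError("axes 0 and 1 must be divisible by num_ranks")

    # Each simulated rank owns a slab of contiguous planes along axis 0.
    slabs = np.split(global_array, num_ranks, axis=0)

    # Axes 1 and 2 are fully local to a slab, so transform them without
    # any communication.
    slabs = [np.fft.fftn(slab, axes=(1, 2)) for slab in slabs]

    # Global transpose: rank p cuts its slab into num_ranks blocks along
    # axis 1 and sends block q to rank q. Every rank exchanges data with
    # every other rank, which is the all-to-all step.
    send = [np.split(slab, num_ranks, axis=1) for slab in slabs]
    recv = [
        np.concatenate([send[p][q] for p in range(num_ranks)], axis=0)
        for q in range(num_ranks)
    ]

    # Axis 0 is now local to each rank, so the last 1D transforms can run.
    recv = [np.fft.fft(block, axis=0) for block in recv]

    # Gather the pieces only so the result can be checked in one process.
    return np.concatenate(recv, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((8, 8, 4)) + 1j * rng.standard_normal((8, 8, 4))
    result = slab_fft3d(data, num_ranks=4)
    print(np.allclose(result, np.fft.fftn(data)))
```

In this layout the number of ranks cannot exceed the number of planes along the distributed axis, which is one motivation for pencil distributions that split two axes at the cost of an additional transpose.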

References
[1] Partnership for Advanced Computing in Europe, PRACE, www.prace-ri.eu
[2] JUGENE - IBM Blue Gene/P at Jülich Supercomputing Centre, Germany, www.fz-juelich.de/jsc/jugene
[3] CURIE - Bull Intel Supercomputer at GENCI, France, www-hpc.cea.fr/en/complexe/tgcc-curie.htm
[4] The Hartree Centre, www.stfc.ac.uk/hartree
[5] "Fastest Fourier Transform in the West", FFTW Home Page, www.fftw.org
[6] "Fastest Fourier Transform in the East", FFTE Home Page, www.ffte.jp
[7] I.J. Bush, I.T. Todorov, "A DAFT DL_POLY distributed memory adaptation of the smoothed particle mesh Ewald method", Comp. Phys. Commun., 175, (5), 323-329, 2006.
[8] High-Performance Solver for Turbulence and Aeroacoustic Research (HiPSTAR), Computational Modelling Group, University of Southampton, cmg.soton.ac.uk/research/categories/simulation-software/hipstar/
