Computational Technology Resources - CCP

Keywords: computational fluid dynamics, Code_Saturne, OpenMP, sparse matrix vector product, benchmarks, renumbering algorithms.

Summary

The scale of computational fluid dynamics (CFD) simulation problems is increasing rapidly because of the requirements for higher spatial resolution, varied turbulence models and more detailed physics. As with many CFD Navier-Stokes tools, EDF's Code_Saturne, which is also one of the two CFD software packages in the PRACE benchmark suite, is parallelized using domain partitioning and MPI. On large systems with thousands of compute nodes, a pure MPI approach will not be able to take full advantage of the multiple levels of parallelism and of the steady increase in the number of cores per processor, even for simulations employing multi-billion-cell meshes. The most popular way to tackle this problem is a hybrid MPI/OpenMP approach. Code_Saturne implements a general three-dimensional finite volume solver for conformal and non-conformal meshes. The computation time is dominated by the linear equation solvers, mainly for the pressure, and to a lesser degree by gradient reconstruction. Thread-level parallelism was mainly applied to computational loops that iterate over the cells or faces in the cell-centred formulation. A general loop transformation was implemented that allows a wide range of methods to be used to avoid conflicts between threads caused by indirect memory addressing, while minimizing code changes. In this paper, different mesh renumbering algorithms are presented for dividing the work among threads (a multipass approach using METIS or SCOTCH partitioning or space-filling Morton curves, and a Cuthill-McKee approach), while exploiting the overlap of communication with computation. Performance, scalability and comparison results are presented on an Intel x86 cluster (with three generations of Intel Xeon processors: Westmere, Ivy Bridge and Haswell) and on IBM Blue Gene/Q systems. A very significant part of the total execution time is spent in sparse matrix-vector products. It is shown that this product can behave like a STREAM benchmark kernel and therefore depends on the performance of the memory system. Per-core performance is shown to degrade significantly depending on the number of cores used per node. Results on several Intel Xeon generations are provided, together with a hardware counter analysis.
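
The thread-conflict problem in face loops, and its interaction with the sparse matrix-vector product, can be sketched as follows. In a cell-centred finite volume code, a loop over interior faces scatters each face's contribution into its two adjacent cells, so two threads processing faces that share a cell would race on the same entry. This sketch assumes a face-based matrix layout (a diagonal array plus one symmetric extra-diagonal coefficient per interior face). It also assumes the cells have already been renumbered so that each thread owns a contiguous block. Faces are then grouped greedily so that, within one group, no cell is touched by two threads, and groups are processed one after another with a barrier between them. All function names (`build_face_groups`, `spmv_face_based`, `grid_faces`, `spmv_serial`) are hypothetical. The greedy grouping is a simplified stand-in, not the paper's multipass METIS/SCOTCH/Morton or Cuthill-McKee renumbering. Python threads are used only to show the structure; in a compiled code the per-thread loop would typically be an OpenMP `parallel for`.

```python
from concurrent.futures import ThreadPoolExecutor


def build_face_groups(n_cells, face_cells, n_threads):
    """Greedily assign interior faces to (group, thread) slots.

    Assumes cells are already renumbered for locality, so that thread t owns
    the contiguous block of cells c with c * n_threads // n_cells == t.
    Within one group, every cell is touched by at most one thread.
    Returns groups, where groups[g][t] is the list of face ids for thread t.
    """
    def owner(c):
        return c * n_threads // n_cells

    unassigned = list(range(len(face_cells)))
    groups = []
    while unassigned:
        mark = [-1] * n_cells  # thread that has claimed each cell in this group
        slots = [[] for _ in range(n_threads)]
        deferred = []
        for f in unassigned:
            c0, c1 = face_cells[f]
            t = min(owner(c0), owner(c1))
            if mark[c0] in (-1, t) and mark[c1] in (-1, t):
                slots[t].append(f)
                mark[c0] = mark[c1] = t
            else:
                # A cell is already claimed by another thread: retry in a later group.
                deferred.append(f)
        groups.append(slots)
        unassigned = deferred  # the first face of each pass always fits, so this terminates
    return groups


def spmv_face_based(da, xa, face_cells, groups, x, n_threads):
    """Compute y = A x, with A stored as a diagonal `da` plus one symmetric
    extra-diagonal coefficient xa[f] per interior face f = (c0, c1)."""
    y = [d * xi for d, xi in zip(da, x)]  # diagonal part: one write per cell

    def scatter(faces):
        # Indirect (gather/scatter) access: the source of write conflicts.
        for f in faces:
            c0, c1 = face_cells[f]
            y[c0] += xa[f] * x[c1]
            y[c1] += xa[f] * x[c0]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for slots in groups:
            # Slots of one group touch disjoint cells, so no locks are needed;
            # list() waits for every slot (a barrier) before the next group.
            list(pool.map(scatter, slots))
    return y


def grid_faces(nx, ny):
    """Interior faces of an nx-by-ny structured grid, cells numbered row by row."""
    faces = []
    for j in range(ny):
        for i in range(nx):
            c = j * nx + i
            if i + 1 < nx:
                faces.append((c, c + 1))
            if j + 1 < ny:
                faces.append((c, c + nx))
    return faces


def spmv_serial(da, xa, face_cells, x):
    """Single-threaded reference in the original face order."""
    y = [d * xi for d, xi in zip(da, x)]
    for f, (c0, c1) in enumerate(face_cells):
        y[c0] += xa[f] * x[c1]
        y[c1] += xa[f] * x[c0]
    return y


if __name__ == "__main__":
    nx, ny, n_threads = 8, 6, 4
    n_cells = nx * ny
    faces = grid_faces(nx, ny)
    groups = build_face_groups(n_cells, faces, n_threads)
    da = [4.0] * n_cells  # Laplacian-like test matrix
    xa = [-1.0] * len(faces)
    x = [float(c % 7) for c in range(n_cells)]
    y = spmv_face_based(da, xa, faces, groups, x, n_threads)
    err = max(abs(a - b) for a, b in zip(y, spmv_serial(da, xa, faces, x)))
    print(f"{len(groups)} groups, max difference vs serial: {err}")
```

Each group ends with a barrier, so fewer groups means less synchronization. The locality of the initial cell numbering, which renumbering algorithms aim to improve, determines how many faces cross thread blocks and must be deferred to later groups.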

Civil-Comp Proceedings, ISSN 1759-3433
CCP: 107
PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi and B.H.V. Topping

Paper 30
Mesh Renumbering Methods and Performance with OpenMP/MPI in Code Saturne

P. Trespeuch¹, Y. Fournier², C. Evangelinos³ and P. Vezolle⁴
¹CS Information Systems, Le Plessis Robinson, France
²EDF R&D, Département Mécanique des Fluides, Energies et Environnement, Chatou Cedex, France
³IBM Research, Cambridge, Massachusetts, United States of America
⁴IBM France, Montpellier, France

doi:10.4203/ccp.107.30

Full Bibliographic Reference for this paper
P. Trespeuch, Y. Fournier, C. Evangelinos, P. Vezolle, "Mesh Renumbering Methods and Performance with OpenMP/MPI in Code Saturne", in P. Iványi, B.H.V. Topping, (Editors), "Proceedings of the Fourth International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 30, 2015. doi:10.4203/ccp.107.30

©Civil-Comp Limited 2023