Civil-Comp Proceedings, ISSN 1759-3433, CCP: 101
PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by:
Paper 12

Testing, Deploying, and Evolving the Code_Saturne CFD Toolchain for Billion-Cell Calculations

Y. Fournier1, C. Moulinec2 and P. Vezolle3

1EDF R&D, MFEE, Chatou, France
2STFC Daresbury Laboratory, Warrington, United Kingdom
3IBM France, La Pompignane, Montpellier, France

Full Bibliographic Reference for this paper
Y. Fournier, C. Moulinec, P. Vezolle, "Testing, Deploying, and Evolving the Code_Saturne CFD Toolchain for Billion-Cell Calculations", in , (Editors), "Proceedings of the Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 12, 2013. doi:10.4203/ccp.101.12
Keywords: CFD, PRACE, Code_Saturne, meshes, unstructured, HPC, petascale.

Summary
As with many Navier-Stokes CFD tools, EDF's Code_Saturne software is parallelised using domain partitioning and MPI. Over recent years, an increasing portion of the toolchain has been parallelised, from mesh modification operations to post-processing output generation, and complex configurations with large unstructured meshes of several billion grid cells can now be run on a few thousand cores.
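
To illustrate this parallel model, the following minimal halo-exchange sketch shows the kind of ghost-value update a domain-decomposed solver performs between subdomains; the data layout and names (n_halo_ranks, halo_rank, send_count, ...) are illustrative and do not reflect Code_Saturne's actual internal structures.

    #include <mpi.h>

    /* Minimal halo exchange for one scalar field: each rank posts
     * non-blocking receives for its ghost values and sends for the
     * boundary values its neighbours need.  All names are illustrative. */
    void
    halo_exchange(int        n_halo_ranks,  /* number of neighbouring ranks  */
                  const int  halo_rank[],   /* MPI rank of each neighbour    */
                  const int  send_count[],  /* values sent to each neighbour */
                  const int  recv_count[],  /* values received from each one */
                  double     send_buf[],    /* packed boundary-cell values   */
                  double     recv_buf[],    /* packed ghost-cell values      */
                  MPI_Comm   comm)
    {
      MPI_Request req[2 * n_halo_ranks];    /* C99 variable-length array */
      int send_shift = 0, recv_shift = 0;

      for (int i = 0; i < n_halo_ranks; i++) {
        MPI_Irecv(recv_buf + recv_shift, recv_count[i], MPI_DOUBLE,
                  halo_rank[i], 0, comm, &req[i]);
        recv_shift += recv_count[i];
      }
      for (int i = 0; i < n_halo_ranks; i++) {
        MPI_Isend(send_buf + send_shift, send_count[i], MPI_DOUBLE,
                  halo_rank[i], 0, comm, &req[n_halo_ranks + i]);
        send_shift += send_count[i];
      }
      MPI_Waitall(2 * n_halo_ranks, req, MPI_STATUSES_IGNORE);
    }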

Mainly developed by EDF, Code_Saturne is distributed under the GPL licence and has been one of the PRACE benchmark codes for several years. Several collaborations concerning the code are under way, with both PRACE partners and IBM. For an industrial code developed in an industry R&D laboratory, collaboration with other laboratories having quite different backgrounds, domains of expertise, IT organizations, and goals is complex from a technical standpoint. Optimal conditions, such as read/write access to shared version control systems or remote connections with access to highly parallel debuggers, are often not available, and running even minor tests on a very large number of processors may quickly exhaust allocated resources, so these tests must be carefully chosen and prepared.

Just as importantly, as an industrial code, the toolchain must remain relatively easy to use and must not differ significantly between a laptop and a supercomputer. We try to detail which architectural, algorithmic, and implementation choices have been essential in allowing such heterogeneous collaborations in mixed environments to provide code, or essential feedback, testing, and debugging loops, to the main development team. After describing the essential parallel implementation features of the code, we focus on a few domains as examples:

  • parallel distribution and partitioning: parallel partitioning options have been extended and their use streamlined. We provide a general overview of the current possibilities (a partitioning sketch follows this list);

  • progress on parallel mesh modification (especially joining): parallel mesh joining has so far enabled the largest computations run with the code, enabling cases too large to be built by most meshing tools. It leverages many mesh and connectivity management and distribution algorithms (one such distribution building block is sketched after this list). Recent tests provide insight into how scalability may be further improved with localized algorithm changes;

  • implementation of hybrid MPI/OpenMP parallelism and mesh renumbering: OpenMP parallelism has been tested by IBM teams and recently introduced into the mainline code. We provide an update on the current status of the renumbering and threading features, which use alternative renumbering schemes to those described in our previous papers (a thread-safe face-loop sketch follows this list);

  • parallel I/O: although Code_Saturne has strived to find a good compromise between the recommendations of hardware vendors and file-system developers and usability requirements, parallel I/O performance has often been both disappointing and very heterogeneous across file systems. Having so far avoided adding intermediate layers, we have instead added further options to our I/O strategy, including the possibility of using both hints and sub-communicators for MPI-IO (sketched after this list).
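
As background to the partitioning item above, the following serial sketch illustrates one common family of methods, partitioning along a space-filling (Morton) curve: cell centres are quantised, interleaved into a curve index, sorted, and split into equal contiguous slices. All names are illustrative, a truly parallel variant would replace the serial sort with a distributed one, and no claim is made that this matches the specific options discussed in the full paper.

    #include <stdint.h>
    #include <stdlib.h>

    /* Interleave the lower 21 bits of x, y, z into a 63-bit Morton code. */
    static uint64_t
    morton3d(uint32_t x, uint32_t y, uint32_t z)
    {
      uint64_t code = 0;
      for (int b = 0; b < 21; b++) {
        code |= ((uint64_t)((x >> b) & 1)) << (3*b);
        code |= ((uint64_t)((y >> b) & 1)) << (3*b + 1);
        code |= ((uint64_t)((z >> b) & 1)) << (3*b + 2);
      }
      return code;
    }

    typedef struct { uint64_t code; int id; } sfc_key_t;

    static int
    compare_keys(const void *a, const void *b)
    {
      uint64_t ca = ((const sfc_key_t *)a)->code;
      uint64_t cb = ((const sfc_key_t *)b)->code;
      return (ca > cb) - (ca < cb);
    }

    /* Assign each cell to one of n_parts parts by sorting along a Morton
     * curve; coords[] holds cell centres already scaled to [0, 1]^3, and
     * part[] receives the part id of each cell.  Serial sketch only. */
    void
    sfc_partition(int n_cells, const double coords[], int n_parts, int part[])
    {
      sfc_key_t *keys = malloc(sizeof(sfc_key_t) * n_cells);

      for (int i = 0; i < n_cells; i++) {
        uint32_t q[3];
        for (int j = 0; j < 3; j++)
          q[j] = (uint32_t)(coords[3*i + j] * ((1u << 21) - 1));
        keys[i].code = morton3d(q[0], q[1], q[2]);
        keys[i].id = i;
      }

      qsort(keys, n_cells, sizeof(sfc_key_t), compare_keys);

      /* Equal-sized contiguous slices of the curve become the parts. */
      for (int i = 0; i < n_cells; i++)
        part[keys[i].id] = (int)(((int64_t)i * n_parts) / n_cells);

      free(keys);
    }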
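
For the mesh-joining item, the sketch below shows the kind of all-to-all data-distribution building block such algorithms rely on: each face carries a destination rank, counts are exchanged first, and the payload is then moved with MPI_Alltoallv. The function and variable names are illustrative and are not Code_Saturne's internal API.

    #include <stdlib.h>
    #include <mpi.h>

    /* Redistribute one double value per face to the rank given in
     * dest_rank[]; the received values are returned through the output
     * arguments.  Illustrative sketch of a distribution building block. */
    void
    redistribute_faces(int n_faces, const int dest_rank[], const double val[],
                       int *n_recv_out, double **recv_val_out, MPI_Comm comm)
    {
      int n_ranks;
      MPI_Comm_size(comm, &n_ranks);

      int *send_count = calloc(n_ranks, sizeof(int));
      int *recv_count = malloc(n_ranks * sizeof(int));
      int *send_shift = malloc((n_ranks + 1) * sizeof(int));
      int *recv_shift = malloc((n_ranks + 1) * sizeof(int));

      for (int i = 0; i < n_faces; i++)
        send_count[dest_rank[i]]++;

      /* Exchange counts so each rank knows how much it will receive. */
      MPI_Alltoall(send_count, 1, MPI_INT, recv_count, 1, MPI_INT, comm);

      send_shift[0] = recv_shift[0] = 0;
      for (int r = 0; r < n_ranks; r++) {
        send_shift[r + 1] = send_shift[r] + send_count[r];
        recv_shift[r + 1] = recv_shift[r] + recv_count[r];
      }

      /* Pack values by destination rank, then exchange them. */
      double *send_val = malloc(send_shift[n_ranks] * sizeof(double));
      double *recv_val = malloc(recv_shift[n_ranks] * sizeof(double));
      int *idx = calloc(n_ranks, sizeof(int));

      for (int i = 0; i < n_faces; i++) {
        int r = dest_rank[i];
        send_val[send_shift[r] + idx[r]++] = val[i];
      }

      MPI_Alltoallv(send_val, send_count, send_shift, MPI_DOUBLE,
                    recv_val, recv_count, recv_shift, MPI_DOUBLE, comm);

      *n_recv_out = recv_shift[n_ranks];
      *recv_val_out = recv_val;

      free(send_count); free(recv_count); free(send_shift); free(recv_shift);
      free(send_val); free(idx);
    }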
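
For the hybrid MPI/OpenMP item, the following sketch shows one reason renumbering matters for threading a face-based loop: if faces are grouped so that no two faces in the same group touch the same cell, each group can be split over threads without atomic updates. The group structure shown (n_groups, group_face_idx, ...) is illustrative only and is not necessarily the scheme used in the code.

    /* Threaded assembly of a face-based flux into per-cell sums (compile
     * with OpenMP enabled, e.g. -fopenmp).  Faces are assumed to have been
     * renumbered into groups such that no two faces in the same group
     * touch the same cell, so each group's loop can be split over threads
     * without atomic updates.  All names are illustrative. */
    void
    assemble_face_flux(int           n_groups,
                       const int     group_face_idx[],  /* size n_groups + 1 */
                       const int     face_cell[][2],    /* adjacent cell ids */
                       const double  face_flux[],
                       double        cell_sum[])
    {
      for (int g = 0; g < n_groups; g++) {
        #pragma omp parallel for
        for (int f = group_face_idx[g]; f < group_face_idx[g + 1]; f++) {
          int c0 = face_cell[f][0];
          int c1 = face_cell[f][1];
          cell_sum[c0] += face_flux[f];  /* safe: no other face in this  */
          cell_sum[c1] -= face_flux[f];  /* group touches cells c0 or c1 */
        }
      }
    }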
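
Finally, for the parallel I/O item, here is a minimal sketch of collective MPI-IO combining hints with a sub-communicator of writer ranks; the hint names, their values, and the one-writer-in-four split are examples only, not recommended settings.

    #include <mpi.h>

    /* Write a contiguous block of doubles per rank with collective MPI-IO,
     * using hints and an optional sub-communicator of "writer" ranks.
     * Non-writer ranks would first need to aggregate their data onto a
     * writer, which is omitted here. */
    void
    write_block(const char *path, const double *buf, int n_vals,
                MPI_Offset global_offset, MPI_Comm comm)
    {
      int rank;
      MPI_Comm_rank(comm, &rank);

      MPI_Comm io_comm;
      MPI_Comm_split(comm, (rank % 4 == 0) ? 0 : MPI_UNDEFINED, rank, &io_comm);

      if (io_comm != MPI_COMM_NULL) {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "striping_factor", "48");  /* example hint */
        MPI_Info_set(info, "cb_nodes", "16");         /* example hint */

        MPI_File fh;
        MPI_File_open(io_comm, path, MPI_MODE_WRONLY | MPI_MODE_CREATE,
                      info, &fh);
        MPI_File_write_at_all(fh, global_offset * (MPI_Offset)sizeof(double),
                              buf, n_vals, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Info_free(&info);
        MPI_Comm_free(&io_comm);
      }
    }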

We also provide a few insights into the current roadmap, explaining how things are expected to be further improved and streamlined, and where specialists may find it worthwhile to experiment.
