PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi and B.H.V. Topping
Multiparticle Collision Dynamics on the Cell Broadband Engine using CellSs
A. Schiller¹, G. Sutmann¹, L. Martinell², P. Bellens² and R. Badia²,³
¹Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany
A. Schiller, G. Sutmann, L. Martinell, P. Bellens, R. Badia, "Multiparticle Collision Dynamics on the Cell Broadband Engine using CellSs", in P. Iványi, B.H.V. Topping, (Editors), "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 69, 2011. doi:10.4203/ccp.95.69
Keywords: particle simulations, multiparticle collision dynamics, multi-core architectures, Cell Broadband Engine, parallel computing, Cell Superscalar.
To accelerate the multiparticle collision dynamics (MPC) algorithm, the Cell Broadband Engine (Cell/BE) is considered as the target platform for a parallel implementation. The Cell/BE is a heterogeneous multicore processor that combines different types of execution units, SIMD processing engines, fast local stores and a software-managed memory hierarchy.
The purpose of this work is to use a more general programming model, so that the code can also be executed on other architectures with only minor changes. Therefore, the high-level programming model Cell Superscalar (CellSs), developed by the Barcelona Supercomputing Center, is used to port the application to the Cell/BE. CellSs is based on pragmas similar to those of OpenMP. The programmer writes sequential code, and CellSs exploits the concurrency it contains and distributes the work across the components of the Cell/BE (PPE and SPEs) by parallelizing automatically at execution time.
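The idea of parallelizing sequential code at execution time can be illustrated with a toy model. The sketch below is not the CellSs API: the class `ToyTaskGraph`, its methods, and the task and data names (`stream_0`, `pos_0`, ...) are hypothetical, and the streaming and collision steps stand in for whatever tasks the real code defines. It only shows how declared inputs and outputs of sequentially submitted tasks are enough to find groups of tasks that may run concurrently. Only read-after-write dependencies are tracked here; a real runtime must also deal with write-after-read and write-after-write hazards.

```python
from collections import defaultdict


class ToyTaskGraph:
    """Toy stand-in for a task-based runtime (hypothetical, not CellSs)."""

    def __init__(self):
        self.tasks = []        # list of (name, set of task indices it depends on)
        self.last_writer = {}  # data name -> index of the task that last wrote it

    def submit(self, name, inputs=(), outputs=()):
        """Register a task in program order and derive its dependencies."""
        # A task must wait for the last writer of every datum it reads.
        deps = {self.last_writer[d] for d in inputs if d in self.last_writer}
        index = len(self.tasks)
        self.tasks.append((name, deps))
        for d in outputs:
            self.last_writer[d] = index
        return index

    def wavefronts(self):
        """Group tasks into levels; tasks in one level are independent."""
        levels = []
        for _, deps in self.tasks:
            # Level 0 if there are no dependencies, else one past the deepest one.
            levels.append(1 + max((levels[d] for d in deps), default=-1))
        groups = defaultdict(list)
        for (name, _), level in zip(self.tasks, levels):
            groups[level].append(name)
        return [groups[level] for level in sorted(groups)]


def demo_wavefronts(num_blocks):
    """Submit one streaming and one collision task per block, sequentially."""
    graph = ToyTaskGraph()
    for b in range(num_blocks):
        pos, vel = f"pos_{b}", f"vel_{b}"
        # Streaming reads positions and velocities and updates positions.
        graph.submit(f"stream_{b}", inputs=[pos, vel], outputs=[pos])
        # Collision reads the updated positions and updates velocities.
        graph.submit(f"collide_{b}", inputs=[pos, vel], outputs=[vel])
    return graph.wavefronts()


if __name__ == "__main__":
    for level, names in enumerate(demo_wavefronts(4)):
        print(level, names)
```

Although the tasks are submitted one after another, the toy graph places all streaming tasks in one level and all collision tasks in the next, because each collision task depends only on the streaming task of its own block.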
Previous attempts to develop a Cell/BE implementation based on the original MPC algorithm suffered from the small fraction of the program that could be parallelized. Therefore, the MPC algorithm was redesigned using a domain decomposition approach that takes the specific requirements of the Cell/BE and CellSs into account, so that all program parts can be computed in parallel on the SPEs. The results show a significant increase in parallelism: SPE idle times are reduced to a minimum, and a speedup factor of five is achieved on eight SPEs.
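To put the reported speedup in perspective, the short sketch below computes the parallel efficiency for a speedup of five on eight SPEs and, as a rough reading only, the parallel fraction that Amdahl's law would imply. The use of Amdahl's law is an assumption made here for illustration; the abstract reports only the speedup, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup that is actually achieved."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup / workers


def amdahl_parallel_fraction(speedup, workers):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) for the fraction p.

    Assumption: a fixed problem size and no overheads other than the
    serial part, which is a simplification of any real run.
    """
    if workers < 2 or speedup <= 0:
        raise ValueError("need at least 2 workers and a positive speedup")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    s, n = 5.0, 8  # speedup and SPE count as stated in the abstract
    print(f"efficiency: {parallel_efficiency(s, n):.3f}")
    print(f"implied parallel fraction: {amdahl_parallel_fraction(s, n):.3f}")
```

Under this simplified model, a speedup of five on eight SPEs corresponds to an efficiency of 62.5% and an implied parallel fraction of about 91%.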
The CellSs implementation can also be executed in parallel on shared-memory architectures by compiling the CellSs code with the SMP Superscalar (SMPSs) compiler. Like CellSs, SMPSs is an instantiation of the Star Superscalar (StarSs) programming model, which enables developers to write portable code for multi-core and many-core architectures. The next step is to port the redesigned MPC algorithm to GPUs. For that purpose, the programming model GPUSs will be used, which is also part of the StarSs family and similar to CellSs and SMPSs.