Computational & Technology Resources
an online resource for computational,
engineering & technology publications 

Computational Science, Engineering & Technology Series
ISSN 1759-3158 CSETS: 21
PARALLEL, DISTRIBUTED AND GRID COMPUTING FOR ENGINEERING
Edited by: B.H.V. Topping, P. Iványi
Chapter 21
A Parallel Approach for Solving a Wide Class of Structural Non-Linear Problems
J.Y. Cognard<sup>1</sup> and P. Verpeaux<sup>2</sup>
<sup>1</sup>Brest Laboratory of Mechanics and Systems, ENSIETA - University of Brest - ENIB, France
J.Y. Cognard, P. Verpeaux, "A Parallel Approach for Solving a Wide Class of Structural Non-Linear Problems", in B.H.V. Topping, P. Iványi, (Editors), "Parallel, Distributed and Grid Computing for Engineering", Saxe-Coburg Publications, Stirlingshire, UK, Chapter 21, pp 455-482, 2009. doi:10.4203/csets.21.21
Keywords: nonlinear computations, parallel strategies, algorithms, large scale problems, load balancing, industrial environment.
Summary
Reducing the time and cost of mechanical design requires taking material and geometrical nonlinearities into account when simulating the behaviour of structures. Moreover, realistic models have to be used to obtain accurate numerical predictions, especially to satisfy the safety design constraints that are increasingly required in the high-tech nuclear, space and naval industries. Unfortunately, these simulations combine accurate meshes involving a large number of elements, highly nonlinear material behaviour with softening and localisation, and complex loading histories requiring a large number of time steps, so their numerical cost is often too high for widespread use in industry. The joint use of powerful algorithms and parallel computers is necessary to greatly reduce the cost of these complex simulations [1,2]. The aim of this research project was to extend the possibilities of the finite element code CAST3M (developed at CEA, France), whose purpose is to facilitate the development of new algorithms. To strongly reduce the elapsed time for solving complex nonlinear problems, it is important to take into account the possibilities of different parallel computers, in particular efficient and economical configurations of multicore 64-bit PCs. Moreover, to let programmers focus on program design, which is critical for application efficiency, the parallel environment has to free them from the intricacies of parallel programming (management of data, coherence of data, ...). The challenge is to merge these advanced features with the traditional requirements of an industrial code: robustness and flexibility, ease of use, and predictability of computational resource usage.
The purpose of this paper is to present a parallel approach suited to the simulation of a wide class of nonlinear problems with quasi-static response. The starting point is to exploit the mechanical properties of the different types of equations to be solved in order to distribute computations over the processors of a parallel computer. The approach is based on two domain decompositions, with the goal of balancing the computational load over the processors while limiting the redistribution of tasks. Good load balancing and communications kept as low as possible are key to an effective parallel algorithm. The algorithm is implemented by extending GIBIANE, the user language of the code CAST3M. We have created a parallel language environment, built on the development environment of the finite element code CAST3M, that eases the development of parallel algorithms at both the programming level and the user level. The parallel language is based on an object-based virtual shared memory system and offers the user the view of a single global address space over the individual memories [2]. It ensures data coherence and hides data exchanges between processors, and a large part of the sequential code can be reused. The proposed system can be implemented on most parallel computers, as it is developed with machine-independent programming techniques, and the underlying concepts can be used in other object-based parallel languages. Nonlinear problems are usually solved by means of Newton methods, which mainly involve two types of subproblems. The proposed parallel strategy exploits the mechanical properties of these subproblems.
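The split of a Newton iteration into two kinds of subproblem can be illustrated on a toy model. The sketch below is not CAST3M or GIBIANE code: it assumes a chain of nonlinear springs fixed at one end and loaded at the other, with a hypothetical hardening law, and all function names are invented for the illustration. The local stage evaluates the constitutive law element by element (independent evaluations, hence easy to distribute), and the global stage solves one linear system per iteration.

```python
def integrate_element(elongation, k=1.0, a=0.5):
    """Local subproblem: response of one element.

    Hypothetical law f(e) = k * (e + a * e**3); returns the internal
    force and the tangent stiffness df/de.
    """
    force = k * (elongation + a * elongation ** 3)
    tangent = k * (1.0 + 3.0 * a * elongation ** 2)
    return force, tangent


def solve_tridiagonal(lower, diag, upper, rhs):
    """Global subproblem: direct tridiagonal solve (Thomas algorithm).

    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i;
    lower[0] and upper[-1] are unused and expected to be 0.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def newton_solve(n, load, tol=1e-10, max_iter=50):
    """Return (displacements of the n free nodes, number of linear solves)."""
    u = [0.0] * n
    for n_solves in range(max_iter):
        # Local stage: one independent evaluation per element.
        elong = [u[0]] + [u[j] - u[j - 1] for j in range(1, n)]
        local = [integrate_element(e) for e in elong]
        forces = [f for f, _ in local]
        tangents = [t for _, t in local]

        # Residual = internal forces minus external load (applied at the tip).
        residual = []
        for i in range(n):
            f_next = forces[i + 1] if i + 1 < n else 0.0
            f_ext = load if i == n - 1 else 0.0
            residual.append(forces[i] - f_next - f_ext)
        if max(abs(r) for r in residual) < tol:
            return u, n_solves

        # Global stage: assemble the tangent matrix and solve K du = -R.
        diag = [tangents[i] + (tangents[i + 1] if i + 1 < n else 0.0)
                for i in range(n)]
        lower = [0.0] + [-tangents[i] for i in range(1, n)]
        upper = [-tangents[i + 1] for i in range(n - 1)] + [0.0]
        du = solve_tridiagonal(lower, diag, upper, [-r for r in residual])
        u = [ui + dui for ui, dui in zip(u, du)]
    raise RuntimeError("Newton iteration did not converge")
```

With the assumed law, a tip load of 1.5 stretches every spring by exactly 1, so `newton_solve(4, 1.5)` should return displacements close to 1, 2, 3 and 4. In a real code the list comprehension over elements is the part whose cost varies unpredictably in space, while the linear solve is the part that needs a decomposition with good locality.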
On the one hand, a domain decomposition technique with a direct resolution of the condensed problem is proposed to solve the linear global problems, in order to remain compatible with BFGS-type convergence acceleration. One can also use a parallel direct solver associated with a "nested dissection" ordering, which limits fill-in during the factorization of the matrix [3]. In fact, this strategy is similar to a domain decomposition technique and gives nearly optimal performance on shared memory computers. On the other hand, it is almost impossible to predict how the CPU time spent integrating the constitutive laws will evolve in space. Therefore, to obtain a well-balanced load without communication, we propose the use of a second domain decomposition [4]. Optimising the communications between the two domain decompositions is necessary to obtain good performance. Various recent developments were carried out to facilitate the parallel resolution of a wide class of problems in a way that is transparent for the user while remaining effective. The asynchronous execution of calculations at the user level was simplified by using, on the one hand, a "container" object gathering the decompositions of the objects used by a calculation and, on the other hand, an operator that distributes the tasks over the processors starting from these decompositions. Moreover, the data structuring proposed for this "data parallel" technique, which distributes the data and calculations over the processors of a parallel machine, ensures compatibility with sequential simulations. The resolution of large scale problems requires intensive use of virtual memory (swap to disk for unused objects). The management of objects that can be shared between applications has been optimised to ensure data coherence and to limit the blocking phases of the parallel applications.
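The reason for a second decomposition can be made concrete with a small experiment. The summary does not say how the second decomposition is built, so the sketch below makes an assumption: elements are dealt out to processors cyclically (interleaved), a static scheme that needs no communication to stay balanced. The cost map and all function names are hypothetical.

```python
def block_partition(n_elements, n_procs):
    """Contiguous sub-domains: a stand-in for the decomposition used by
    the global linear solve, where locality matters."""
    size = -(-n_elements // n_procs)  # ceiling division
    return [list(range(p * size, min((p + 1) * size, n_elements)))
            for p in range(n_procs)]


def interleaved_partition(n_elements, n_procs):
    """Assumed second decomposition: element e goes to processor e % n_procs."""
    return [list(range(p, n_elements, n_procs)) for p in range(n_procs)]


def imbalance(partition, cost):
    """Ratio of the heaviest processor load to the mean load (1.0 is ideal)."""
    loads = [sum(cost[e] for e in part) for part in partition]
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical cost map: 100 elements, of which elements 40..59 lie in a
# localised nonlinear zone and cost ten times more to integrate.
cost = [10.0 if 40 <= e < 60 else 1.0 for e in range(100)]

block_ratio = imbalance(block_partition(100, 4), cost)
interleaved_ratio = imbalance(interleaved_partition(100, 4), cost)
```

On this cost map the contiguous decomposition leaves two processors with a load of 115 against a mean of 70, while the interleaved one gives every processor exactly 70. The price is that the interleaved sub-domains have no spatial locality, which is why data must be exchanged between the two decompositions and why the summary stresses optimising those communications.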
A version based on the standard POSIX threads ("pthread") library was initially developed to ensure the performance of the parallel programming environment and to allow code portability on shared memory computers. The extension of this strategy to shared-distributed memory computers is underway. Numerical examples involving large scale industrial problems with material and geometrical nonlinearities are presented to validate the proposed parallel approach.
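The shared memory, thread-based pattern can be sketched as follows. This is an illustration only, not the CAST3M environment: Python's standard `threading` module stands in for POSIX threads, and `run_on_threads` is a hypothetical name. One thread is started per sub-domain, each thread writes its results into a shared list, and the caller waits for all threads before reading it.

```python
import threading


def run_on_threads(partition, task, n_results):
    """Apply task(e) to every element e, using one thread per sub-domain.

    Results go into a list shared by all threads. Each element index is
    owned by exactly one sub-domain, so the writes need no lock.
    """
    results = [None] * n_results

    def worker(elements):
        for e in elements:
            results[e] = task(e)

    threads = [threading.Thread(target=worker, args=(part,))
               for part in partition]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every sub-domain, as pthread_join would
    return results


# Usage: three sub-domains covering 12 elements in an interleaved fashion.
partition = [list(range(p, 12, 3)) for p in range(3)]
squares = run_on_threads(partition, lambda e: e * e, 12)
```

In standard CPython builds the global interpreter lock serialises pure-Python bytecode, so this shows the structure of the approach rather than its speed-up. The point it illustrates is the one made in the summary: with a single address space the sequential task function is reused unchanged, and only the distribution step is new.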
