Civil-Comp Proceedings
ISSN 1759-3433
CCP: 112
PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GPU AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi and B.H.V. Topping
Paper 17

Distributed asynchronous convergence detection without detection protocol

G. Gbikpi-Benissan¹ and F. Magoules²

¹ RUDN University, Russia
² Centrale Supelec, Universite Paris-Saclay, France

Full Bibliographic Reference for this paper
G. Gbikpi-Benissan, F. Magoules, "Distributed asynchronous convergence detection without detection protocol", in P. Iványi, B.H.V. Topping, (Editors), "Proceedings of the Sixth International Conference on Parallel, Distributed, GPU and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 17, 2019. doi:10.4203/ccp.112.17
Keywords: asynchronous iterations, convergence detection, global residual, parallel computing.

Summary
One of the major questions that arise when implementing asynchronous iterations is how to detect that convergence has been reached. In terms of efficiency, centralized detection protocols suffer from scaling limits, and more elaborate mechanisms may introduce termination delays. On the other hand, effective convergence is hardly guaranteed when resorting to assumption-based protocols. One thus has to work out which choice is most appropriate for a given parallel configuration.

To be more precise, let a sequence of vectors be generated by asynchronous iterations to find the solution of a fixed-point problem. In such a context, this sequence of vectors is actually implicit, and one only explicitly handles parallel sequences of local subvectors. The asynchronous convergence detection problem therefore consists of determining, in a non-blocking way and as quickly as possible, the moment when a residual error evaluation function would nearly vanish if applied to a gathered potential solution. The main distributed approaches consist of: modifying the iterative procedure to ensure finite-time termination; explicitly evaluating residual errors from global state snapshots; approximating the number of iterations required to reach convergence; monitoring both the consistency and the persistence of local convergence; and evaluating the diameter of nested sets of solutions by means of “macro-iterations”.
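The setting described above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the method of the paper: the fixed-point map is taken to be a Jacobi map for a small diagonally dominant system, delays are modelled by random message delivery, and all names (`jacobi_component`, `global_residual`, `async_iterations`) are hypothetical.

```python
import random

def jacobi_component(A, b, x, i):
    """i-th component of the Jacobi fixed-point map f(x) for A x = b."""
    s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
    return (b[i] - s) / A[i][i]

def global_residual(A, b, x):
    """Max-norm residual ||x - f(x)|| of a gathered candidate solution."""
    return max(abs(x[i] - jacobi_component(A, b, x, i)) for i in range(len(x)))

def async_iterations(A, b, blocks, steps, seed=0):
    """Simulate asynchronous iterations (assumed toy model).

    Each process owns one block of x and keeps a private view of the
    whole vector; the entries it does not own may be stale.
    """
    rng = random.Random(seed)
    n = len(b)
    views = [[0.0] * n for _ in blocks]
    for _ in range(steps):
        # A randomly chosen process relaxes its own block from its own view.
        p = rng.randrange(len(blocks))
        new = {i: jacobi_component(A, b, views[p], i) for i in blocks[p]}
        for i, v in new.items():
            views[p][i] = v
        # Its block reaches a random process; q == p models a delayed message.
        q = rng.randrange(len(blocks))
        if q != p:
            for i in blocks[p]:
                views[q][i] = views[p][i]
    # The global vector is implicit: gather it from the owned blocks only.
    x = [0.0] * n
    for p, blk in enumerate(blocks):
        for i in blk:
            x[i] = views[p][i]
    return x

# Assumed test problem: a strictly diagonally dominant tridiagonal system.
A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
blocks = [[0, 1], [2, 3]]

x = async_iterations(A, b, blocks, steps=2000)
print(global_residual(A, b, x))
```

In the simulation, `global_residual` is applied after the fact to a gathered vector. The detection problem is to decide, during the iterations and without blocking them, when this quantity would be small enough.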

Modifying the iterative procedure is intrusive and even requires additional assumptions on the asynchronous iterative model. The use of nested sets has been investigated only from a mathematical standpoint, and suggests the need for intrusive piggybacking techniques. The monitoring-based and the prediction-based approaches can lead to untimely termination, which requires a post-detection final check. The snapshot method introduces computation data into snapshot messages, which leads to an O(n) communication overhead. In our earlier work, an O(1) snapshot message size is achieved, but at the cost of assuming a bound on communication delays. The analysis therein shows that an approximate residual error is evaluated, with an explicit bound on the approximation. Roughly speaking, it tolerates an inconsistent snapshot. We therefore investigate, in this paper, to what extent such a snapshot may be inconsistent, which even allows us to consider no control at all, that is, performing no prior snapshot protocol. We performed several experiments on a supercomputer, with up to 504 processor cores, solving a convection-diffusion equation on a regular 3D grid geometry by means of an asynchronous iterative method based on a mixed Jacobi and Gauss-Seidel relaxation scheme.
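To make the notion of an inconsistent snapshot concrete, the following toy sketch compares a residual assembled from uncoordinated local contributions with the residual of the gathered vector. It is an illustration under assumptions, not the algorithm of the paper: the Jacobi map, the max norm, the data, and the function names are all hypothetical, and in a real distributed code the final `max` would presumably be a non-blocking reduction.

```python
def f_i(A, b, x, i):
    """i-th component of an assumed Jacobi fixed-point map for A x = b."""
    s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
    return (b[i] - s) / A[i][i]

def local_residual(A, b, view, block):
    """Residual contribution a process can compute from its own view only."""
    return max(abs(view[i] - f_i(A, b, view, i)) for i in block)

def unsynchronized_estimate(A, b, views, blocks):
    """Combine local contributions as they are, with no snapshot protocol."""
    return max(local_residual(A, b, v, blk) for v, blk in zip(views, blocks))

def gathered_residual(A, b, views, blocks):
    """Reference value: residual of the vector gathered from owned blocks."""
    x = [0.0] * len(b)
    for v, blk in zip(views, blocks):
        for i in blk:
            x[i] = v[i]
    return max(abs(x[i] - f_i(A, b, x, i)) for i in range(len(x)))

# Assumed toy data: two processes, each owning two components.
A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0, 1.0]
blocks = [[0, 1], [2, 3]]

# Consistent case: both processes see the same global vector.
consistent = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]]
# Inconsistent case: each process still holds stale zeros for the other block.
stale = [[0.1, 0.2, 0.0, 0.0], [0.0, 0.0, 0.3, 0.4]]

for name, views in (("consistent", consistent), ("stale", stale)):
    est = unsynchronized_estimate(A, b, views, blocks)
    ref = gathered_residual(A, b, views, blocks)
    print(name, est, ref)
```

With consistent views the two quantities coincide. With the stale views they differ (about 0.55 against 0.6 on this data), and it is the size of such a gap that has to be controlled when no snapshot protocol is run.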
