Computational & Technology Resources
an online resource for computational, engineering & technology publications
Civil-Comp Conferences
ISSN 2753-3239, CCC: 12
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GPU AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi, J. Kruis and B.H.V. Topping
Paper 2.4
GPU Parallelization for Analytical Hierarchical Tucker Representation Using Binary Trees
Z. Qiu1,2, F. Magoulès2 and D. Peláez1
1Institut des Sciences Moléculaires d'Orsay, Université Paris-Saclay, Orsay, Île-de-France, France
Full Bibliographic Reference for this paper
Z. Qiu, F. Magoulès, D. Peláez, "GPU Parallelization for Analytical Hierarchical Tucker Representation Using Binary Trees", in P. Iványi, J. Kruis, B.H.V. Topping (Editors), "Proceedings of the Eighth International Conference on Parallel, Distributed, GPU and Cloud Computing for Engineering", Civil-Comp Press, Edinburgh, UK, Online volume: CCC 12, Paper 2.4, 2025.
Keywords: functional Hierarchical Tucker format, finite basis representation, deep learning, multi-GPU parallelism, DDP, high-dimensional fitting.
Abstract
In this contribution, we present a Distributed Data Parallel (DDP) approach for optimizing an analytical hierarchical Tucker representation in finite basis representation (HT-FBR) of high-order multivariate functions. Training on 1, 2, 4 and 8 GPUs, we achieve up to a 10× speedup over the benchmark on a 6D proof-of-concept dataset and a 2× speedup on a 12D ethene trajectory dataset. This speedup is attributed to our implementation of an element-wise GPU parallelization algorithm for both the forward and backward propagations, together with a customized dataset and data loader configuration. We also show that the model trained on multiple GPUs suffers less from overfitting than the one trained on a single CPU/GPU. This paves the way towards a large-scale, high-performance training scheme and towards model parallelism in a multi-GPU setting, in particular parallelism over the levels of the nodes based on their binary-tree dependency, as well as towards improving accuracy and reducing overfitting as a function of the number of GPUs.
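The abstract refers to Distributed Data Parallel training with a customized dataset and data loader. The following is a minimal sketch of such a multi-GPU setup in PyTorch; the HTFBRModel class, the synthetic 6D data, and all hyperparameters are illustrative assumptions and do not reproduce the paper's actual HT-FBR model or its element-wise kernels.

```python
# Minimal multi-GPU DDP training sketch (placeholder model, not the paper's HT-FBR network).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import TensorDataset, DataLoader, DistributedSampler


class HTFBRModel(torch.nn.Module):
    """Placeholder regression model standing in for the HT-FBR representation."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Synthetic 6D data as a stand-in for the proof-of-concept dataset.
    x = torch.rand(10_000, 6)
    y = torch.sin(x.sum(dim=1))
    dataset = TensorDataset(x, y)

    # DistributedSampler shards the samples so each GPU sees a disjoint subset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=256, sampler=sampler)

    model = HTFBRModel(dim=6).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()  # DDP all-reduces gradients across GPUs here
            opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=8 train.py`, each process drives one GPU, and gradients are averaged across processes during the backward pass.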
Download the full-text of this paper (PDF, 12 pages, 707 KB)