Computational & Technology Resources - CSETS

Keywords: GPU, Cuda, OpenCL, OpenMP, MPI, HMPP, Curie, big data.

Abstract

This chapter describes a typical example of porting a legacy application to GPU architectures. The application, named MetaProf, aims to reveal correlation patterns in metagenomic catalogues and to help identify new species. What sets the application apart is the large volume of data and computation involved: the input is a matrix of 8 million genes by 800 samples, and the computation has quadratic complexity. Processing such data with a sequential, single-core implementation takes more than a month on a commodity server. In this chapter, we first describe a parallel version of the algorithm for multi-core architectures based on hybrid MPI-OpenMP. We then show that emerging GPU architectures are an attractive alternative to these early implementations in terms of raw performance, scalability and power efficiency. Several programming models, such as Cuda, OpenCL and HMPP, are evaluated and compared. Our implementations were tested both on large-scale GPU clusters, such as TGCC Titane and Curie, and on GPU-based workstations. Our results confirm that the Cuda implementations are the fastest on Nvidia GPUs, whereas their OpenCL and HMPP counterparts perform slightly worse but are valuable for their portability and longevity. For the initial use case, a GPU cluster such as Curie allowed us to reduce the processing time for a 3 M matrix to a few minutes.
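The quadratic cost mentioned above comes from comparing every gene profile with every other one. The following is a minimal sketch of that all-pairs pattern under stated assumptions: it uses Pearson correlation as the similarity measure (the abstract does not name the measure), and the names `pair_count`, `standardize_rows` and `correlated_pairs`, along with the `threshold` and `block` parameters, are hypothetical and not taken from MetaProf. It splits the upper triangle of the gene-by-gene matrix into tiles, which is the kind of independent unit of work that a parallel version could distribute across MPI ranks, OpenMP threads or GPU thread blocks.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Matrix = List[List[float]]


def pair_count(n_genes: int) -> int:
    """Number of distinct gene pairs: grows quadratically with n_genes."""
    return n_genes * (n_genes - 1) // 2


def standardize_rows(data: Sequence[Sequence[float]]) -> Matrix:
    """Center each gene profile and scale it to unit Euclidean norm.

    After this step the Pearson correlation of two rows is just their dot
    product. Constant rows (zero variance) are mapped to all zeros.
    """
    out: Matrix = []
    for row in data:
        mean = sum(row) / len(row)
        centered = [x - mean for x in row]
        norm = math.sqrt(sum(c * c for c in centered))
        out.append([c / norm for c in centered] if norm > 0 else [0.0] * len(row))
    return out


def correlated_pairs(
    data: Sequence[Sequence[float]], threshold: float, block: int = 256
) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, r) for every gene pair i < j with |r| >= threshold.

    The upper triangle of the gene-by-gene matrix is split into
    block x block tiles; each tile is an independent unit of work that a
    parallel version could hand to an MPI rank, an OpenMP thread or a
    GPU thread block (hypothetical mapping, not MetaProf's actual one).
    """
    z = standardize_rows(data)
    n = len(z)
    for i0 in range(0, n, block):
        # Skip tiles below the diagonal: r(i, j) == r(j, i).
        for j0 in range(i0, n, block):
            for i in range(i0, min(i0 + block, n)):
                # On a diagonal tile, start at i + 1 to avoid self-pairs.
                for j in range(max(j0, i + 1), min(j0 + block, n)):
                    r = sum(a * b for a, b in zip(z[i], z[j]))
                    if abs(r) >= threshold:
                        yield i, j, r


# Toy usage: 4 genes x 4 samples.
genes = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0
    [2.0, 4.0, 6.0, 8.0],  # gene 1: scaled copy of gene 0
    [4.0, 3.0, 2.0, 1.0],  # gene 2: reversed gene 0
    [1.0, 3.0, 1.0, 3.0],  # gene 3: weakly related to the others
]
for i, j, r in correlated_pairs(genes, threshold=0.9, block=2):
    print(i, j, round(r, 3))
```

On the toy input, only the pairs (0, 1), (0, 2) and (1, 2) pass the 0.9 threshold, with correlations of +1 and -1. At full scale, `pair_count(8_000_000)` is about 3.2 × 10^13 distinct pairs, each requiring a dot product over 800 samples (roughly 2.6 × 10^16 multiply-add operations), which illustrates the scale of the problem. Standardizing the rows once up front turns every correlation into a plain dot product, a form that maps naturally onto tiled, matrix-multiply-style GPU kernels.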

Computational Science, Engineering & Technology Series
ISSN 1759-3158
CSETS: 34
Patterns for Parallel Programming on GPUs
Edited by: F. Magoulès

Chapter 12
Migrating a Big-Data Grade Application to Large GPU Clusters
D. Tello¹, V. Ducrot¹, J.-M. Batto², S. Monot¹, F. Boumezbeur², V. Arslan¹ and T. Saidani¹
¹Alliance Services Plus, Groupe EOLEN, Malakoff, France
²Unité MICALIS, INRA, Jouy-en-Josas, France
doi:10.4203/csets.34.12

Full Bibliographic Reference for this chapter
D. Tello, V. Ducrot, J.-M. Batto, S. Monot, F. Boumezbeur, V. Arslan, T. Saidani, "Migrating a Big-Data Grade Application to Large GPU Clusters", in F. Magoulès (Editor), "Patterns for Parallel Programming on GPUs", Saxe-Coburg Publications, Stirlingshire, UK, Chapter 12, pp. 281-310, 2014. doi:10.4203/csets.34.12
©Civil-Comp Limited 2023