Civil-Comp Proceedings
ISSN 1759-3433
CCP: 111
PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING
Edited by: P. Iványi, B.H.V. Topping and G. Várady
Paper 4

Speed up of Volumetric Non-local Transform-Domain Filter

P. Strakos, M. Jaros and T. Karasek

IT4Innovations, VSB-Technical University of Ostrava, Czech Republic

Full Bibliographic Reference for this paper
P. Strakos, M. Jaros, T. Karasek, "Speed up of Volumetric Non-local Transform-Domain Filter", in P. Iványi, B.H.V. Topping, G. Várady, (Editors), "Proceedings of the Fifth International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 4, 2017. doi:10.4203/ccp.111.4
Keywords: volumetric data, image denoising, parallel implementation, BM4D, medical imaging, OpenMP, MPI.

Summary
We present a parallel implementation of the volumetric non-local transform-domain filter (BM4D). The effectiveness of this implementation is demonstrated on the denoising of 3D images from Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) scans. BM4D performs grouping and collaborative filtering: mutually similar data within the image are stacked together and filtered. Cubes of voxels, called patches, are the basic image elements for filtering. Because the data consist of voxels instead of pixels, the area that must be searched for similar patches is large. This, together with the multi-dimensional transforms that are applied, makes BM4D's computation time extremely long. Despite that, only a single-threaded implementation is presented in the literature. To speed up the filtering process, a multi-core or even multi-node parallel implementation is necessary. In this paper, we present an original parallel version of the filter. To parallelize BM4D, the filtering process is decomposed into smaller parts that can be solved concurrently. Our implementation uses hybrid parallelization that combines OpenMP and MPI: OpenMP for parallelization within one computational node and MPI for parallelization across multiple computational nodes. The speed-up of our parallel implementation is demonstrated on several numerical examples.
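
To make the grouping step described above more concrete, the following is a minimal Python sketch of how cubes similar to a reference cube might be collected from a small search window. It is an illustration only, not the authors' implementation: the function names, the `size`, `radius` and `threshold` parameters, the mean-squared-difference measure and the exhaustive search are all assumptions made for this example, and the collaborative filtering in the transform domain is not shown.

```python
from itertools import product


def cube(volume, origin, size):
    """Return the size^3 voxels of the cube at origin (z, y, x) as a flat list."""
    z0, y0, x0 = origin
    return [volume[z][y][x]
            for z in range(z0, z0 + size)
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]


def group_similar_cubes(volume, ref_origin, size=2, radius=2, threshold=1.0):
    """Collect origins of cubes near ref_origin that resemble the reference cube.

    Hypothetical helper: similarity is the mean squared voxel difference,
    and a cube joins the group when that value is at most `threshold`.
    """
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    ref = cube(volume, ref_origin, size)
    # Clamp the search window so every candidate cube fits inside the volume.
    ranges = [range(max(0, c - radius), min(d - size, c + radius) + 1)
              for c, d in zip(ref_origin, dims)]
    group = []
    for origin in product(*ranges):
        cand = cube(volume, origin, size)
        dist = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
        if dist <= threshold:
            group.append((dist, origin))
    group.sort()  # most similar cubes first
    return [origin for _, origin in group]


if __name__ == "__main__":
    # Toy 4x4x4 volume whose intensity varies only along x.
    volume = [[[10 * x for x in range(4)] for _ in range(4)] for _ in range(4)]
    print(group_similar_cubes(volume, (0, 0, 0), threshold=0.0))
```

In this sketch the search for one reference cube reads the volume but never modifies it, so searches for different reference cubes do not depend on one another. That independence is one plausible place where work could be divided among threads or nodes, although the summary does not state how the authors decompose the filter.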
