Computational & Technology Resources: an online resource for computational, engineering & technology publications
Civil-Comp Conferences
ISSN 2753-3239
CCC: 2
PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: B.H.V. Topping and P. Iványi
Paper 6.2

Enhancing Lecture Capture with Deep Learning

R.M. Sales1,2 and S. Giani2

1Whittle Laboratory, Cambridge University, Cambridge, United Kingdom
2Engineering Department, Durham University, Durham, United Kingdom

Full Bibliographic Reference for this paper
R.M. Sales, S. Giani, "Enhancing Lecture Capture with Deep Learning", in B.H.V. Topping, P. Iványi, (Editors), "Proceedings of the Eleventh International Conference on Engineering Computational Technology", Civil-Comp Press, Edinburgh, UK, Online volume: CCC 2, Paper 6.2, 2022, doi:10.4203/ccc.2.6.2
Keywords: convolutional neural network, semantic image segmentation, binary human segmentation, learning rate optimisation, lecture capture technology.

Abstract
This paper describes the development of a state-of-the-art video processing system that addresses limitations of Durham University's 'Encore' lecture capture solution. The aim of the research is to digitally remove the presenter from the view of a whiteboard, giving students a more effective online learning experience. The system comprises a 'human entity detection module', which uses a remodelled version of the Fast Segmentation Neural Network to perform efficient binary image segmentation, and a 'background restoration module', which introduces a novel procedure to retain only background pixels across consecutive video frames. The segmentation network is trained from scratch with a Tversky loss function on a dataset of images extracted from TikTok dance videos. The most effective training techniques are described in detail; they are found to produce asymptotic convergence to within 5% of the final loss in only 40 training epochs. A cross-validation study concludes that a Tversky parameter of 0.9 is optimal for balancing recall and precision in the context of this work. Finally, it is demonstrated that the system successfully removes the human form from the view of the whiteboard in a real lecture video. Although the system is believed to be capable of real-time operation, this could not be demonstrated owing to hardware limitations. The conclusions also suggest wider applications of this work.
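The Tversky loss mentioned in the abstract generalises the Dice loss by weighting false negatives and false positives asymmetrically, which is how a single parameter can trade recall against precision. A minimal NumPy sketch is given below; the function name, the formulation, and the convention that the parameter weights false negatives (so 0.9 favours recall) are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def tversky_loss(pred, target, beta=0.9, eps=1e-7):
    """Tversky loss for binary segmentation masks (sketch).

    beta weights false negatives and (1 - beta) weights false positives,
    so beta > 0.5 penalises missed foreground pixels more heavily,
    favouring recall over precision. With beta = 0.5 this reduces to
    a Dice-style loss. eps guards against division by zero.
    """
    pred = pred.ravel().astype(float)
    target = target.ravel().astype(float)
    tp = np.sum(pred * target)            # true positives
    fp = np.sum(pred * (1.0 - target))    # false positives
    fn = np.sum((1.0 - pred) * target)    # false negatives
    tversky_index = (tp + eps) / (tp + beta * fn + (1.0 - beta) * fp + eps)
    return 1.0 - tversky_index
```

For a perfect prediction the loss is zero, and a false positive is penalised only a tenth as much as a false negative at beta = 0.9, which matches the abstract's emphasis on not missing any part of the human form.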

