Turkish lip-reading using Bi-LSTM and deep learning models

dc.authorid: ATILA, UMIT/0000-0002-1576-9977
dc.contributor.author: Atila, Uemit
dc.contributor.author: Sabaz, Furkan
dc.date.accessioned: 2024-09-29T15:57:36Z
dc.date.available: 2024-09-29T15:57:36Z
dc.date.issued: 2022
dc.department: Karabük Üniversitesi [en_US]
dc.description.abstract: In recent years, lip-reading has become one of the research topics whose importance has increased considerably, especially with the spread of deep learning applications. In this field, researchers try to recognize what a person says from video frames without sound. When previous studies are analysed, it is seen that automatic lip-reading systems have been developed for various languages such as Chinese, Korean, English and German. However, these studies reveal that developing such a system is difficult because lip-reading from video frames without audio data depends on many parameters such as lighting, shooting distance, and the gender of the speaker. Lip-reading systems were first developed using classical machine learning methods. However, especially in recent years, with the popularity of deep learning applications, this subject has been studied more than before, and studies reveal that, in general, deep learning-based lip-reading gives more successful results. Even though there are studies in this field in different languages, there is no current study or dataset for Turkish. Therefore, this study aims to investigate the performance of state-of-the-art deep learning models on Turkish lip-reading. To this aim, two new datasets, one with 111 words and the other with 113 sentences, were created using image processing techniques. The model used in this study to perform lip-reading extracts features from video frames using CNN-based models and performs classification using a Bidirectional Long Short-Term Memory (Bi-LSTM) network. Results of experiments reveal that the ResNet-18 and Bi-LSTM pair gives the best results on both the word and sentence datasets, with accuracy values of 84.5% and 88.55%, respectively. It is also observed that better performance is obtained in sentence recognition than in word recognition with almost every model implemented. (c) 2022 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). [en_US]
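The abstract describes a two-stage pipeline: a CNN (e.g. ResNet-18) extracts per-frame features, and a Bi-LSTM classifies the resulting sequence. The paper's implementation is not reproduced here; the following is a minimal NumPy sketch of the Bi-LSTM stage only, where the dimensions (512-d frame features, 256 hidden units, 25 frames per clip) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gate order in z: input, forget, cell, output."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))   # forget gate
    g = np.tanh(z[2 * H:3 * H])             # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))    # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def lstm_pass(seq, W, U, b, H):
    """Run one directional LSTM over a (T, D) sequence; returns (T, H)."""
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
        outs.append(h)
    return np.stack(outs)

def bilstm(seq, fwd_params, bwd_params, H):
    """Bidirectional LSTM: concatenate forward and time-realigned backward passes."""
    fwd = lstm_pass(seq, *fwd_params, H)
    bwd = lstm_pass(seq[::-1], *bwd_params, H)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # shape (T, 2*H)

rng = np.random.default_rng(0)
D, H, T = 512, 256, 25  # assumed: CNN feature dim, hidden size, frames per clip

def init_params():
    return (rng.normal(0.0, 0.05, (4 * H, D)),  # input weights W
            rng.normal(0.0, 0.05, (4 * H, H)),  # recurrent weights U
            np.zeros(4 * H))                    # biases b

frame_features = rng.normal(size=(T, D))  # stand-in for per-frame CNN output
seq_encoding = bilstm(frame_features, init_params(), init_params(), H)
print(seq_encoding.shape)  # (25, 512)
```

In a pipeline like the one the abstract describes, this per-frame sequence encoding would then be pooled or fed to a final softmax layer over the 111 word classes or 113 sentence classes.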
dc.identifier.doi: 10.1016/j.jestch.2022.101206
dc.identifier.issn: 2215-0986
dc.identifier.scopus: 2-s2.0-85133350584 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.uri: https://doi.org/10.1016/j.jestch.2022.101206
dc.identifier.uri: https://hdl.handle.net/20.500.14619/4911
dc.identifier.volume: 35 [en_US]
dc.identifier.wos: WOS:000892526300009 [en_US]
dc.identifier.wosquality: Q1 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Elsevier - Division Reed Elsevier India Pvt Ltd [en_US]
dc.relation.ispartof: Engineering Science and Technology-An International Journal-Jestech [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Lip-reading [en_US]
dc.subject: Bi-LSTM [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Dataset [en_US]
dc.subject: Turkish [en_US]
dc.title: Turkish lip-reading using Bi-LSTM and deep learning models [en_US]
dc.type: Article [en_US]
