Facial Movement Recognition Using CNN-BiLSTM in Vowel for Bahasa Indonesia

Authors

  • Muhammad Daffa Abiyyu Rahman, Institut Teknologi Sepuluh Nopember
  • Alif Aditya Wicaksono, Institut Teknologi Sepuluh Nopember
  • Eko Mulyanto Yuniarno, Institut Teknologi Sepuluh Nopember
  • Supeno Mardi Susiki Nugroho, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.12962/jaree.v8i1.372

Abstract

Speaking is a multimodal phenomenon with both verbal and non-verbal cues. One non-verbal cue is the speaker's facial movement, which can be used to identify the letter being spoken. Previous research has shown that lip movement can be translated into vowels for Bahasa Indonesia, but recognition from whole-face movement had yet to be covered. This research aimed to establish a CNN-BiLSTM model that learns spoken vowels by reading the subject's facial movements. The CNN-BiLSTM model yielded 98.66% validation accuracy, with over 94% accuracy for each of the five vowels. The model can also recognize whether the subject is silent or speaking a vowel with 98.07% accuracy.
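The page does not include code, so the following is a minimal sketch, assuming TensorFlow/Keras, of the general CNN-BiLSTM pattern the abstract describes: a per-frame CNN feature extractor applied across a sequence of face frames, a bidirectional LSTM over the resulting features, and a softmax over six classes (five vowels plus silence). The input shape, sequence length, and layer sizes are illustrative assumptions, not the authors' configuration.

```python
# Illustrative CNN-BiLSTM sketch; all shapes and sizes are assumptions,
# not the configuration reported in the paper.
from tensorflow.keras import layers, models

NUM_CLASSES = 6      # five Indonesian vowels (a, i, u, e, o) plus "silent"
SEQ_LEN = 30         # assumed number of video frames per sample
H, W, C = 64, 64, 1  # assumed size of the cropped face region per frame

# Per-frame CNN feature extractor, applied to every frame below.
frame_encoder = models.Sequential([
    layers.Input(shape=(H, W, C)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, C)),
    # Run the CNN on each frame independently to get one feature vector per frame.
    layers.TimeDistributed(frame_encoder),
    # BiLSTM reads the frame features forward and backward in time.
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The bidirectional layer is what distinguishes a BiLSTM from a plain LSTM: each prediction can use facial-movement context both before and after a given frame, which suits classifying a complete utterance rather than streaming input.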

Author Biographies

Muhammad Daffa Abiyyu Rahman, Institut Teknologi Sepuluh Nopember

Electrical Engineering

Alif Aditya Wicaksono, Institut Teknologi Sepuluh Nopember

Computer Engineering

Eko Mulyanto Yuniarno, Institut Teknologi Sepuluh Nopember

Electrical Engineering, Supervisor

Supeno Mardi Susiki Nugroho, Institut Teknologi Sepuluh Nopember

Electrical Engineering, Supervisor


Published

2024-01-31

Issue

Section

Articles (old JAREE)