Facial Movement Recognition Using CNN-BiLSTM in Vowel for Bahasa Indonesia
References
G. Vigliocco, P. Perniss, and D. Vinson, "Language as a multimodal phenomenon: implications for language learning, processing, and evolution," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 369, no. 1651, p. 20130292, Sep. 19, 2014, doi: 10.1098/rstb.2013.0292.
R. Sultana and R. Palit, "A survey on Bengali speech-to-text recognition techniques," 2014 9th International Forum on Strategic Technology (IFOST), Cox's Bazar, Bangladesh, 2014, pp. 26–29.
I. Papadimitriou, A. Vafeiadis, A. Lalas, K. Votis, and D. Tzovaras, "Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations," Electronics, 2020.
S. Prom-on and M. Onsri, "Effects of Facial Movements to Expressive Speech Productions: A Computational Study," 2019 IEEE 2nd International Conference on Knowledge Innovation and Invention (ICKII), Seoul, Korea (South), 2019, pp. 481–484.
Z. Lu and L. Czap, "Modelling the tongue movement of Chinese Shaanxi Xi'an dialect speech," 2018 19th International Carpathian Control Conference (ICCC), Szilvasvarad, Hungary, 2018, pp. 98–103, doi: 10.1109/CarpathianCC.2018.8399609.
K. Kumatani and R. Stiefelhagen, "State Synchronous Modeling on Phone Boundary for Audio Visual Speech Recognition and Application to Muti-View Face Images," 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing - ICASSP '07, Honolulu, HI, USA, 2007, pp. IV-417–IV-420.
N. K. Mudaliar, K. Hegde, A. Ramesh, and V. Patil, "Visual Speech Recognition: A Deep Learning Approach," 2020 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 2020, pp. 1218–1221, doi: 10.1109/ICCES48766.2020.9137926.
S. Isobe et al., "GAMVA: A Japanese Audio-Visual Multi-Angle Speech Corpus," 2021 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), 2021, pp. 134–139.
T. Tasaka and N. Hamada, "Speaker dependent visual word recognition by using sequential mouth shape codes," 2012 International Symposium on Intelligent Signal Processing and Communications Systems, Tamsui, Taiwan, 2012, pp. 96–101, doi: 10.1109/ISPACS.2012.6473460.
Maxalmina, S. Kahfi, K. N. Ramadhani, and A. Arifianto, "Lip Motion Recognition for Indonesian Vowel Phonemes Using 3D Convolutional Neural Networks," 2020 3rd International Conference on Computer and Informatics Engineering (IC2IE), Yogyakarta, Indonesia, 2020, pp. 157–161, doi: 10.1109/IC2IE50715.2020.9274562.
Google Inc., "MediaPipe," 2022. [Online]. Available: https://github.com/google/mediapipe.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
S. Kanai, Y. Fujiwara, Y. Yamanaka, and S. Adachi, "Sigsoftmax: Reanalysis of the Softmax Bottleneck," arXiv [stat.ML], 2018.
J. D. O'Connor and J. L. M. Trim, "Vowel, Consonant, and Syllable—A Phonological Definition," Word, vol. 9, no. 2, pp. 103–122, Aug. 1953.
Kementerian Pendidikan dan Kebudayaan Indonesia, "Huruf Vokal - EYD V" [Vowel letters - EYD V], 2023. [Online]. Available: https://ejaan.kemdikbud.go.id/eyd/penggunaan-huruf/huruf-vokal/.
Kementerian Pendidikan dan Kebudayaan Indonesia, "Kata Pengantar - EYD V" [Foreword - EYD V], 2023. [Online]. Available: https://ejaan.kemdikbud.go.id/eyd/.
A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models," 2013.
B. Xu, N. Wang, T. Chen, and M. Li, "Empirical Evaluation of Rectified Activations in Convolutional Network," arXiv:1505.00853, May 2015.
D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," International Conference on Learning Representations (ICLR), 2015.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014.
DOI: https://doi.org/10.12962/jaree.v8i1.372
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.