Multiple Face Tracking using Kalman and Hungarian Algorithm to Reduce Face Recognition Computational Cost

Willy Achmat Fauzi, Supeno M Susiki Nugroho, Eko Mulyanto Yuniarno, Wiwik Anggraeni, Mauridhi Hery Purnomo


Current research on face recognition systems relies mainly on deep learning to achieve high accuracy. With deep learning as the base platform, detecting and recognizing faces in every frame is computationally expensive, especially in video surveillance systems where a large number of mounted cameras stream video to the system simultaneously. The idea behind this research is that the system does not need to recognize every occurrence of a face in every frame. We use MobileNet SSD to detect faces, a Kalman filter to predict each face's location in the next frame when detection fails, and the Hungarian algorithm to maintain the identity of each face. In our results, the number of faces that must be recognized drops from 87,832 to only 204, and the system runs in real time. By reducing the computational cost, the method is shown to be suitable for surveillance systems.

Keywords: Hungarian algorithm, Kalman filter, multiple face tracking, video surveillance system.
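
The association step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a constant-velocity motion model, uses the Euclidean distance between face centres as the assignment cost, and picks arbitrary noise values. The names AxisKalman, hungarian, and associate are hypothetical, and the MobileNet SSD detector and the recognizer are left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (hypothetical helper)."""
    pos: float
    vel: float = 0.0
    P: list = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])
    q: float = 0.01  # process noise, assumed value
    r: float = 1.0   # measurement noise, assumed value

    def predict(self, dt=1.0):
        # x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        (a, b), (c, d) = self.P
        self.pos += dt * self.vel
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.pos

    def update(self, z):
        # Only the position is measured, so H = [1, 0].
        (a, b), (c, d) = self.P
        s = a + self.r
        k0, k1 = a / s, c / s
        y = z - self.pos
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def hungarian(cost):
    """Minimum-cost assignment; needs len(cost) <= len(cost[0]). Returns {row: col}."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j] is the row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the start
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}


def associate(predicted, detections):
    """Match predicted track centres to detected centres; returns {track: detection}."""
    if not predicted or not detections:
        return {}
    if len(predicted) <= len(detections):
        return hungarian([[math.dist(t, d) for d in detections] for t in predicted])
    # More tracks than detections: solve the transposed problem and invert it.
    flipped = hungarian([[math.dist(t, d) for t in predicted] for d in detections])
    return {t: d for d, t in flipped.items()}


# Two tracked faces; the detector reports them in a different order.
matches = associate([(10, 10), (50, 50)], [(52, 49), (11, 12)])
print(matches)
```

In this sketch, a track that receives no match in a frame would fall back on its Kalman prediction, and only a detection that matches no existing track would need to be passed to the recognizer, which is the mechanism by which the number of recognized faces is reduced.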








Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.