Obstacle Detection Using Monocular Camera with Mask R-CNN Method

Ari Santoso, Rafif Artono Darmawan, Mohamad Abdul Hady, Ali Fatoni


An autonomous car is a car that can operate without human control. It must detect obstacles so that it does not hit objects on the path it is about to traverse, which requires a variety of sensors to determine the surrounding conditions. The sensors commonly used in autonomous cars are cameras and LiDAR. Compared to LiDAR, a camera has a relatively long detection distance, costs less, and can be used to classify objects. In this final project, a monocular camera and the Mask R-CNN algorithm are used to build a system that detects obstacles in the form of cars, motorcycles, and humans. For each detected object, the system generates an instance segmentation mask, a bounding box, a classification, and estimates of distance and width. The system uses a manually created custom dataset so that it fits the surrounding environment. The system achieves a Mean Average Precision of 0.81, a Mean Average Recall of 0.89, an F1 score of 0.86, and a Mean Absolute Percentage Error of 13.4% for the distance estimator. The average detection time per image is 0.29 seconds.
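
The distance estimator is evaluated with the Mean Absolute Percentage Error. As a minimal sketch of how that metric is commonly computed, the Python snippet below averages the absolute relative error over a set of measurements. The function name and the sample distances are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
def mean_absolute_percentage_error(true_distances, estimated_distances):
    """Return the MAPE in percent between reference and estimated distances."""
    if len(true_distances) != len(estimated_distances):
        raise ValueError("both sequences must have the same length")
    if len(true_distances) == 0:
        raise ValueError("at least one measurement is required")

    total = 0.0
    for true_value, estimate in zip(true_distances, estimated_distances):
        if true_value == 0:
            raise ValueError("reference distances must be non-zero")
        # Relative error of one measurement, as a fraction of the reference.
        total += abs(true_value - estimate) / abs(true_value)

    return 100.0 * total / len(true_distances)


# Hypothetical reference and estimated distances in meters (illustration only).
reference_m = [5.0, 10.0, 20.0]
estimated_m = [5.5, 9.0, 22.0]

mape = mean_absolute_percentage_error(reference_m, estimated_m)
print(f"MAPE: {mape:.1f}%")
```

Each sample estimate here is off by 10% of its reference distance, so the sketch reports a MAPE of about 10%. Under this definition, the reported 13.4% means the estimated distances deviate from the reference distances by 13.4% on average.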





DOI: https://doi.org/10.12962/jaree.v6i2.325



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.