Md. Samin Rahman, Asif Ahammad Miazee, Md. Mamun Ahmed, Md. Aktarujjaman
Keywords: Object recognition, SSD, MobileNet
Abstract: Real-time object detection is one of the challenging tasks in artificial intelligence because content must be recognized as it appears, which demands fast computation. Here, a novel object recognition approach is proposed that combines the Single Shot MultiBox Detector (SSD) with a lightweight network model known as MobileNet. SSD speeds up the classification of sub-windows by formulating the problem as a sequential decision process. Additionally, MobileNet provides better multi-scale handling, so objects of all sizes can be detected without rescaling the input image. This speed-up builds on the scale invariance of natural image statistics, which offers a powerful relationship for approximating feature responses at adjacent scales. Experimental results show that combining MobileNet with the SSD framework, which is the novelty proposed in this research, improves accuracy when recognizing household objects in real time.
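To illustrate how an SSD-style detector can cover objects of different sizes without rescaling the input image, the sketch below computes one default-box scale per feature map using the linear spacing rule from the original SSD formulation. The function names, the choice of six feature maps, and the values s_min = 0.2 and s_max = 0.9 are assumptions for illustration only and are not taken from the configuration used in this work.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Return one default-box scale per feature map, spaced linearly.

    Scales are fractions of the input image size. The defaults follow the
    original SSD formulation and are assumed here, not taken from this paper.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def box_shape(scale, aspect_ratio):
    """Return (width, height) of a default box, relative to the image size."""
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


if __name__ == "__main__":
    # Six feature maps is an assumed, typical choice for illustration.
    for k, s in enumerate(default_box_scales(6), start=1):
        w, h = box_shape(s, 2.0)
        print(f"map {k}: scale={s:.2f}, box(ar=2) = {w:.3f} x {h:.3f}")
```

Early, high-resolution feature maps receive small scales and therefore match small objects, while later, coarser maps receive large scales. This is why a single forward pass over one input resolution can respond to objects of many sizes.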