{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T00:26:33Z","timestamp":1770337593482,"version":"3.49.0"},"reference-count":46,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,23]],"date-time":"2021-02-23T00:00:00Z","timestamp":1614038400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>One of the biggest challenges of training deep neural networks is the need for massive data annotation. To train a neural network for object detection, millions of annotated training images are required. However, there are currently no large-scale thermal image datasets that could be used to train state-of-the-art neural networks, while voluminous RGB image datasets are available. This paper presents a method for creating hundreds of thousands of annotated thermal images using an RGB pre-trained object detector. A dataset created in this way can be used to train object detectors with improved performance. The main contribution of this work is a novel method for fully automatic thermal image labeling. The proposed system uses an RGB camera, a thermal camera, a 3D LiDAR, and a pre-trained neural network that detects objects in the RGB domain. Using this setup, it is possible to run a fully automated process that annotates the thermal images and creates an automatically annotated thermal training dataset. As a result, we created a dataset containing hundreds of thousands of annotated objects. This approach makes it possible to train deep learning models whose performance is similar to that of models trained with common human-annotation-based methods. This paper also proposes several improvements to fine-tune the results with minimal human intervention. 
Finally, the evaluation of the proposed solution shows that the method gives significantly better results than training the neural network with standard small-scale hand-annotated thermal image datasets.<\/jats:p>","DOI":"10.3390\/s21041552","type":"journal-article","created":{"date-parts":[[2021,2,23]],"date-time":"2021-02-23T20:19:36Z","timestamp":1614111576000},"page":"1552","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Fully Automated DCNN-Based Thermal Images Annotation Using Neural Network Pretrained on RGB Data"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6813-4318","authenticated-orcid":false,"given":"Adam","family":"Ligocki","sequence":"first","affiliation":[{"name":"Robotics and AI Research Group, Faculty of Electrical Engineering, Brno University of Technology, 61600 Brno, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7519-2092","authenticated-orcid":false,"given":"Ales","family":"Jelinek","sequence":"additional","affiliation":[{"name":"Cybernetics and Robotics Research Group, Central European Institute of Technology, Brno University of Technology, 61600 Brno, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2993-7772","authenticated-orcid":false,"given":"Ludek","family":"Zalud","sequence":"additional","affiliation":[{"name":"Robotics and AI Research Group, Faculty of Electrical Engineering, Brno University of Technology, 61600 Brno, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8767-0864","authenticated-orcid":false,"given":"Esa","family":"Rahtu","sequence":"additional","affiliation":[{"name":"Artificial Intelligence and Vision Research Group, Department of Computer Science, Tampere University, 33101 Tampere, Finland"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zalud, L., and Kocmanova, P. 
(2013, January 21\u201326). Fusion of thermal imaging and CCD camera-based data for stereovision visual telepresence. Proceedings of the 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Link\u00f6ping, Sweden.","DOI":"10.1109\/SSRR.2013.6719344"},{"key":"ref_2","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ligocki, A., Jelinek, A., and Zalud, L. 
(2019). Brno Urban Dataset\u2013The New Data for Self-Driving Agents and Mapping Tasks. arXiv.","DOI":"10.1109\/ICRA40945.2020.9197277"},{"key":"ref_9","unstructured":"Ligocki, A., Jelinek, A., and Zalud, L. (2020). Atlas Fusion\u2013Modern Framework for Autonomous Agent Sensor Data Fusion. arXiv."},{"key":"ref_10","unstructured":"FLIR Systems, I. (2020, June 01). FREE FLIR Thermal Dataset for Algorithm Training. Available online: https:\/\/www.flir.com\/oem\/adas\/adas-dataset-form\/."},{"key":"ref_11","unstructured":"FLIR Systems, I. (2020, June 01). Enhanced San Francisco Dataset. Available online: https:\/\/www.flir.eu\/oem\/adas\/dataset\/san-francisco-dataset\/."},{"key":"ref_12","unstructured":"FLIR Systems, I. (2020, June 01). FLIR European Regional Thermal Dataset for Algorithm Training. Available online: https:\/\/www.flir.eu\/oem\/adas\/dataset\/european-regional-thermal-dataset\/."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/nature21056","article-title":"Dermatologist-level classification of skin cancer with deep neural networks","volume":"542","author":"Esteva","year":"2017","journal-title":"Nature"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hwang, S., Park, J., Kim, N., Choi, Y., and Kweon, I.S. (2015, January 7\u201312). Multispectral Pedestrian Detection: Benchmark Dataset and Baselines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298706"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.cviu.2011.10.006","article-title":"An iterative integrated framework for thermal\u2013visible image registration, sensor fusion, and people tracking for video surveillance applications","volume":"116","author":"Torabi","year":"2012","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Khellal, A., Ma, H., and Fei, Q. (2015). 
Pedestrian classification and detection in far infrared images. International Conference on Intelligent Robotics and Applications, Springer.","DOI":"10.1007\/978-3-319-22879-2_47"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Portmann, J., Lynen, S., Chli, M., and Siegwart, R. (June, January 31). People detection and tracking from aerial thermal views. Proceedings of the 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, China.","DOI":"10.1109\/ICRA.2014.6907094"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.cviu.2006.06.010","article-title":"Background-subtraction using contour-based fusion of thermal and visible imagery","volume":"106","author":"Davis","year":"2007","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wu, Z., Fuller, N., Theriault, D., and Betke, M. (2014, January 23\u201328). A thermal infrared video benchmark for visual analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA.","DOI":"10.1109\/CVPRW.2014.39"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_21","unstructured":"Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., and Darrell, T. (2018). Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2702","DOI":"10.1109\/TPAMI.2019.2926463","article-title":"The apolloscape open dataset for autonomous driving and its application","volume":"42","author":"Huang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/0278364916679498","article-title":"1 year, 1000 km: The Oxford RobotCar dataset","volume":"36","author":"Maddern","year":"2017","journal-title":"Int. J. Robot. Res."},{"key":"ref_24","unstructured":"Nyberg, A. (2021, February 20). Transforming Thermal Images to Visible Spectrum Images Using Deep Learning. Available online: https:\/\/www.diva-portal.org\/smash\/get\/diva2:1255342\/FULLTEXT01.pdf."},{"key":"ref_25","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1837","DOI":"10.1109\/TIP.2018.2879249","article-title":"Synthetic data generation for end-to-end thermal infrared tracking","volume":"28","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kniaz, V.V., Knyaz, V.A., Hladuvka, J., Kropatsch, W.G., and Mizginov, V. (2018, January 8\u201314). Thermalgan: Multimodal color-to-thermal image translation for person re-identification in multispectral dataset. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-11024-6_46"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tumas, P., and Serackis, A. (2018, January 8\u201310). Automated image annotation based on YOLOv3. Proceedings of the 2018 IEEE 6th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE), Vilnius, Lithuania.","DOI":"10.1109\/AIEEE.2018.8592167"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Iva\u0161i\u0107-Kos, M., Kri\u0161to, M., and Pobar, M. (2019, January 16\u201317). Human detection in thermal imaging using YOLO. 
Proceedings of the 2019 5th International Conference on Computer and Technology Applications, Istanbul, Turkey.","DOI":"10.1145\/3323933.3324076"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Gomez, A., Conti, F., and Benini, L. (2018, January 8\u201310). Thermal image-based CNN\u2019s for ultra-low power people recognition. Proceedings of the 15th ACM International Conference on Computing Frontiers, Ischia, Italy.","DOI":"10.1145\/3203217.3204465"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Park, J., Chen, J., Cho, Y.K., Kang, D.Y., and Son, B.J. (2020). CNN-based person detection using infrared images for night-time intrusion warning systems. Sensors, 20.","DOI":"10.3390\/s20010034"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ghose, D., Desai, S.M., Bhattacharya, S., Chakraborty, D., Fiterau, M., and Rahman, T. (2019, January 16\u201320). Pedestrian Detection in Thermal Images using Saliency Maps. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00130"},{"key":"ref_33","unstructured":"Adam, L., and Ales, J.L.Z. (2021, January 20). Brno Urban Dataset. Available online: https:\/\/github.com\/Robotics-BUT\/Brno-Urban-Dataset."},{"key":"ref_34","unstructured":"Ultralytics (2021, January 20). Yolov5. Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_38","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32, Curran Associates, Inc."},{"key":"ref_39","unstructured":"Adam, L., and Ales, J.L.Z. (2021, January 20). Atlas Fusion. Available online: https:\/\/github.com\/Robotics-BUT\/Atlas-Fusion."},{"key":"ref_40","unstructured":"Merriaux, P., Dupuis, Y., Boutteau, R., Vasseur, P., and Savatier, X. (2017). LiDAR point clouds correction acquired from a moving car based on CAN-bus data. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhang, B., Zhang, X., Wei, B., and Qi, C. (2019, January 8\u201312). A Point Cloud Distortion Removing and Mapping Algorithm based on Lidar and IMU UKF Fusion. Proceedings of the 2019 IEEE\/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Hong Kong, China.","DOI":"10.1109\/AIM.2019.8868647"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft coco: Common objects in context. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_43","unstructured":"Jung, A.B. (2018, October 30). imgaug. 
Available online: https:\/\/github.com\/aleju\/imgaug."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zoph, B., Cubuk, E.D., Ghiasi, G., Lin, T.Y., Shlens, J., and Le, Q.V. (2020). Learning data augmentation strategies for object detection. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-58583-9_34"},{"key":"ref_45","unstructured":"Oksuz, K., Cam, B.C., Kalkan, S., and Akbas, E. (2019). Imbalance problems in object detection: A review. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1552\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:27:14Z","timestamp":1760160434000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1552"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,23]]},"references-count":46,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["s21041552"],"URL":"https:\/\/doi.org\/10.3390\/s21041552","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,23]]}}}