
Deep Learning and Computer Vision in Remote Sensing

Edited by
Fahimeh Farahnakian, Jukka Heikkonen and Pouya Jafarzadeh

Printed Edition of the Special Issue Published in Remote Sensing

[Link]/journal/remotesensing

Editors
Fahimeh Farahnakian
Jukka Heikkonen
Pouya Jafarzadeh

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin
Editors

Fahimeh Farahnakian
University of Turku
Turku, Finland

Jukka Heikkonen
University of Turku
Turku, Finland

Pouya Jafarzadeh
University of Turku
Turku, Finland

Editorial Office
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal
Remote Sensing (ISSN 2072-4292) (available at: [Link]
special issues/deep learning computer vision remote sensing).

For citation purposes, cite each article independently as indicated on the article page online and as
indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Volume Number,
Page Range.

ISBN 978-3-0365-6368-8 (Hbk)


ISBN 978-3-0365-6369-5 (PDF)

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative
Commons Attribution (CC BY) license, which allows users to download, copy and build upon
published articles, as long as the author and publisher are properly credited, which ensures maximum
dissemination and a wider impact of our publications.
The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons
license CC BY-NC-ND.
Contents

About the Editors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

José Francisco Guerrero Tello, Mauro Coltelli, Maria Marsella, Angela Celauro and José
Antonio Palenzuela Baena
Convolutional Neural Network Algorithms for Semantic Segmentation of Volcanic Ash Plumes
Using Visible Camera Imagery
Reprinted from: Remote Sens. 2022, 14, 4477, doi:10.3390/rs14184477 . . . . . . . . . . . . . . . . . 1

Hoàng-Ân Lê, Heng Zhang, Minh-Tan Pham, and Sébastien Lefèvre


Mutual Guidance Meets Supervised Contrastive Learning: Vehicle Detection in Remote Sensing
Images
Reprinted from: Remote Sens. 2022, 14, 3689, doi:10.3390/rs14153689 . . . . . . . . . . . . . . . . . 19

Nisha Maharjan, Hiroyuki Miyazaki, Bipun Man Pati, Matthew N. Dailey, Sangam Shrestha
and Tai Nakamura
Detection of River Plastic Using UAV Sensor Data and Deep Learning
Reprinted from: Remote Sens. 2022, 14, 3049, doi:10.3390/rs14133049 . . . . . . . . . . . . . . . . . 37

Qiang Zhou and Chaohui Yu


Point RCNN: An Angle-Free Framework for Rotated Object Detection
Reprinted from: Remote Sens. 2022, 14, 2605, doi:10.3390/rs14112605 . . . . . . . . . . . . . . . . . 67

Mingming Wang, Qingkui Chen and Zhibing Fu


LSNet: Learned Sampling Network for 3D Object Detection from Point Clouds
Reprinted from: Remote Sens. 2022, 14, 1539, doi:10.3390/rs14071539 . . . . . . . . . . . . . . . . . 89

Jianxiang Li, Yan Tian, Yiping Xu and Zili Zhang


Oriented Object Detection in Remote Sensing Images with Anchor-Free Oriented Region
Proposal Network
Reprinted from: Remote Sens. 2022, 14, 1246, doi:10.3390/rs14051246 . . . . . . . . . . . . . . . . . 111

Chuan Xu, Chang Liu, Hongli Li, Zhiwei Ye, Haigang Sui and Wei Yang
Multiview Image Matching of Optical Satellite and UAV Based on a Joint Description Neural
Network
Reprinted from: Remote Sens. 2022, 14, 838, doi:10.3390/rs14040838 . . . . . . . . . . . . . . . . . 133

Omid Abdi, Jori Uusitalo and Veli-Pekka Kivinen


Logging Trail Segmentation via a Novel U-Net Convolutional Neural Network and
High-Density Laser Scanning Data
Reprinted from: Remote Sens. 2022, 14, 349, doi:10.3390/rs14020349 . . . . . . . . . . . . . . . . . 153

Yuxiang Cai, Yingchun Yang, Qiyi Zheng, Zhengwei Shen, Yongheng Shang, Jianwei Yin
and Zhongtian Shi
BiFDANet: Unsupervised Bidirectional Domain Adaptation for Semantic Segmentation of
Remote Sensing Images
Reprinted from: Remote Sens. 2022, 14, 190, doi:10.3390/rs14010190 . . . . . . . . . . . . . . . . . 175

Zewei Wang, Pengfei Yang, Haotian Liang, Change Zheng, Jiyan Yin, Ye Tian and Wenbin
Cui
Semantic Segmentation and Analysis on Sensitive Parameters of Forest Fire Smoke Using
Smoke-Unet and Landsat-8 Imagery
Reprinted from: Remote Sens. 2022, 14, 45, doi:10.3390/rs14010045 . . . . . . . . . . . . . . . . . . 203

Bo Huang, Zhiming Guo, Liaoni Wu, Boyong He, Xianjiang Li and Yuxing Lin
Pyramid Information Distillation Attention Network for Super-Resolution Reconstruction of
Remote Sensing Images
Reprinted from: Remote Sens. 2021, 13, 5143, doi:10.3390/rs13245143 . . . . . . . . . . . . . . . . . 223

Zhen Wang, Nannan Wu, Xiaohan Yang, Bingqi Yan and Pingping Liu
Deep Learning Triplet Ordinal Relation Preserving Binary Code for Remote Sensing Image
Retrieval Task
Reprinted from: Remote Sens. 2021, 13, 4786, doi:10.3390/rs13234786 . . . . . . . . . . . . . . . . . 245

Xiangkai Xu, Zhejun Feng, Changqing Cao, Mengyuan Li, Jin Wu, Zengyan Wu, et al.
An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and
Instance Segmentation
Reprinted from: Remote Sens. 2021, 13, 4779, doi:10.3390/rs13234779 . . . . . . . . . . . . . . . . . 263

Weisheng Li, Minghao Xiang and Xuesong Liang


A Dense Encoder–Decoder Network with Feedback Connections for Pan-Sharpening
Reprinted from: Remote Sens. 2021, 13, 4505, doi:10.3390/rs13224505 . . . . . . . . . . . . . . . . . 283

Xue Rui, Yang Cao, Xin Yuan, Yu Kang and Weiguo Song
DisasterGAN: Generative Adversarial Networks for Remote Sensing Disaster Image Generation
Reprinted from: Remote Sens. 2021, 13, 4284, doi:10.3390/rs13214284 . . . . . . . . . . . . . . . . . 313

Wenjie Zi, Wei Xiong, Hao Chen, Jun Li and Ning Jing
SGA-Net: Self-Constructing Graph Attention Neural Network for Semantic Segmentation of
Remote Sensing Images
Reprinted from: Remote Sens. 2021, 13, 4201, doi:10.3390/rs13214201 . . . . . . . . . . . . . . . . . . . . . . 331

Javier Marín and Sergio Escalera


SSSGAN: Satellite Style and Structure Generative Adversarial Networks
Reprinted from: Remote Sens. 2021, 13, 3984, doi:10.3390/rs13193984 . . . . . . . . . . . . . . . . . 351

Lei Fan, Yang Zeng, Qi Yang, Hongqiang Wang and Bin Deng
Fast and High-Quality 3-D Terahertz Super-Resolution Imaging Using Lightweight SR-CNN
Reprinted from: Remote Sens. 2021, 13, 3800, doi:10.3390/rs13193800 . . . . . . . . . . . . . . . . . 373

Jian Wang, Le Yang and Fan Li


Predicting Arbitrary-Oriented Objects as Points in Remote Sensing Images
Reprinted from: Remote Sens. 2021, 13, 3731, doi:10.3390/rs13183731 . . . . . . . . . . . . . . . . . 393

Xu He, Shiping Ma, Linyuan He, Le Ru and Chen Wang


Learning Rotated Inscribed Ellipse for Oriented Object Detection in Remote Sensing Images
Reprinted from: Remote Sens. 2021, 13, 3622, doi:10.3390/rs13183622 . . . . . . . . . . . . . . . . . 413

Yutong Jia, Gang Wan, Lei Liu, Jue Wang, Yitian Wu, Naiyang Xue, Ying Wang and Rixin
Yang
Split-Attention Networks with Self-Calibrated Convolution for Moon Impact Crater Detection
from Multi-Source Data
Reprinted from: Remote Sens. 2021, 13, 3193, doi:10.3390/rs13163193 . . . . . . . . . . . . . . . . . . . . . . 439

Zhongwei Li, Xue Zhu, Ziqi Xin, Fangming Guo, Xingshuai Cui and Leiquan Wang
Variational Generative Adversarial Network with Crossed Spatial and Spectral Interactions for
Hyperspectral Image Classification
Reprinted from: Remote Sens. 2021, 13, 3131, doi:10.3390/rs13163131 . . . . . . . . . . . . . . . . . 459

Ming Li, Lin Lei, Yuqi Tang, Yuli Sun and Gangyao Kuang
An Attention-Guided Multilayer Feature Aggregation Network for Remote Sensing Image
Scene Classification
Reprinted from: Remote Sens. 2021, 13, 3113, doi:10.3390/rs13163113 . . . . . . . . . . . . . . . . . 483

Shengjing Tian, Xiuping Liu, Meng Liu, Yuhao Bian, Junbin Gao and Baocai Yin
Learning the Incremental Warp for 3D Vehicle Tracking in LiDAR Point Clouds
Reprinted from: Remote Sens. 2021, 13, 2770, doi:10.3390/rs13142770 . . . . . . . . . . . . . . . . . 505

Yuhao Qing, Wenyi Liu, Liuyan Feng and Wanjia Gao


Improved YOLO Network for Free-Angle Remote Sensing Target Detection
Reprinted from: Remote Sens. 2021, 13, 2171, doi:10.3390/rs13112171 . . . . . . . . . . . . . . . . . 527

Shanchen Pang, Pengfei Xie, Danya Xu, Fan Meng, Xixi Tao, Bowen Li, Ying Li and Tao Song
NDFTC: A New Detection Framework of Tropical Cyclones from Meteorological Satellite
Images with Deep Transfer Learning
Reprinted from: Remote Sens. 2021, 13, 1860, doi:10.3390/rs13091860 . . . . . . . . . . . . . . . . . 547

About the Editors
Fahimeh Farahnakian
Fahimeh Farahnakian is currently an adjunct professor (docent) in the Algorithms and
Computational Intelligence Research Lab, Department of Future Technologies, University of Turku,
Finland. Her research interests include the theory and algorithms of machine learning, computer
vision, and data analysis methods, and their applications in various fields. She has published
more than 30 articles in journals and conference proceedings. She is a member of the IEEE and has
also served on the program committees of numerous scientific conferences.

Jukka Heikkonen
Jukka Heikkonen is a full professor and head of the Algorithms and Computational Intelligence
Research Lab, University of Turku, Finland. His research focuses on data analytics, machine learning,
and autonomous systems. He has worked at top-level research laboratories and Centers of Excellence
in Finland and international organizations (the European Commission and Japan) and has led many
international and national research projects. He has authored more than 150 peer-reviewed scientific
articles. He has served as an organizing/program committee member in numerous conferences and
has acted as a guest editor in five Special Issues of scientific journals.

Pouya Jafarzadeh
Pouya Jafarzadeh received an MS degree in Technological Competence Management from the
University of Applied Sciences, Turku, Finland. He is currently working toward a PhD degree in the
Algorithms and Computational Intelligence Research Lab, University of Turku, Finland. His research
interests include artificial intelligence, machine learning, deep learning, computer vision, and data
analysis. He is a frequent reviewer for research journals.

remote sensing
Article
Convolutional Neural Network Algorithms for Semantic
Segmentation of Volcanic Ash Plumes Using Visible
Camera Imagery
José Francisco Guerrero Tello 1, *, Mauro Coltelli 1 , Maria Marsella 2 , Angela Celauro 2
and José Antonio Palenzuela Baena 2

1 Istituto Nazionale di Geofisica e Vulcanologia, Osservatorio Etneo, Piazza Roma 2, 95125 Catania, Italy
2 Department of Civil, Building and Environmental Engineering, Sapienza University of Rome, Via Eudossiana 18,
00184 Roma, Italy
* Correspondence: [Link]@[Link]

Abstract: In the last decade, video surveillance cameras have experienced a great technological
advance, making capturing and processing of digital images and videos more reliable in many fields
of application. Hence, video-camera-based systems appear as one of the techniques most widely used
in the world for monitoring volcanoes, providing a low cost and handy tool in emergency phases,
although the processing of large data volumes from continuous acquisition still represents a challenge.
To make these systems more effective in cases of emergency, each pixel of the acquired images must be
assigned to class labels to categorise them and to locate and segment the observable eruptive activity.
This paper is focused on the detection and segmentation of volcanic ash plumes using convolutional
neural networks. Two well-established architectures, the SegNet and the U-Net, have been used for
the processing of in situ images to validate their usability in the field of volcanology. The dataset
fed into the two CNN models was acquired from in situ visible video cameras from a ground-based
network (Etna_NETVIS) located on Mount Etna (Italy) during the eruptive episode of 24th December
2018, when 560 images were captured from three different stations: CATANIA-CUAD, BRONTE, and
Mt. CAGLIATO. In the preprocessing phase, data labelling for computer vision was used, adding
one meaningful and informative label to provide eruptive context and the appropriate input for the
training of the machine-learning neural network. Methods presented in this work offer a generalised
toolset for volcano monitoring to detect, segment, and track ash plume emissions. The automatic
detection of plumes helps to significantly reduce the storage of useless data, starting to register and
save eruptive events at the time of unrest when a volcano leaves the rest status, and the semantic
segmentation allows volcanic plumes to be tracked automatically and allows geometric parameters
to be calculated.

Keywords: ANN; automatic classification; risk mitigation; machine learning

Citation: Guerrero Tello, J.F.; Coltelli, M.; Marsella, M.; Celauro, A.; Palenzuela Baena, J.A.
Convolutional Neural Network Algorithms for Semantic Segmentation of Volcanic Ash Plumes Using
Visible Camera Imagery. Remote Sens. 2022, 14, 4477. [Link]

Academic Editors: Jukka Heikkonen, Fahimeh Farahnakian and Pouya Jafarzadeh

Received: 4 July 2022; Accepted: 29 August 2022; Published: 8 September 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://[Link]/licenses/by/4.0/).

1. Introduction
Volcano monitoring is composed of a set of techniques that enable the measurement of different
parameters (geochemical, seismic, thermal, deformational, etc.) [1]. Keeping these parameters under
surveillance is essential for risk mitigation and guarantees security to the population. These
parameters allow us to know the state of internal and external activity of a volcano, to know if
there are changes in its behaviour that could lead to an eruption, and to understand if there are
changes during an eruptive event. Although seismic and geodetic instruments permit quasi-real-time
monitoring, video cameras are also currently a standard and necessary tool for effective volcano
observation [2,3].
Explosive volcanic eruptions eject a large quantity of pyroclastic products into the atmosphere.
In these events, continuous surveillance is mandatory to avoid significant damage in rural and
metropolitan areas [4] that may disrupt surface and air traffic [5],



and even may cause negative impacts on human health [6]. In 1985, the eruption of “Nevado
del Ruiz” volcano in Colombia ejected more than 35 tons of pyroclastic flow that reached
30 km in height. This eruption melted the ice and created four lahars that descended
through the slopes of the volcano and destroyed a whole town called “Armero” located
50 km from the volcano, with a loss of 24,800 lives [7]. To counteract further disasters, it is
fundamental to create new methodologies and instruments based on innovation for risk
mitigation. Video cameras have proven suitable for tracking those pyroclastic products in
many volcanoes in the world, whether with visible (0.4–0.7 μm) or near-infrared (~1 μm)
wavelength. Both sensors are suitable to collect and analyse information at a long distance.
Video cameras installed on volcanoes often offer limited performance during crisis episodes. They
are programmed to capture images at a fixed interval (e.g., one capture per minute or one capture
every two minutes); these settings lead to the storage of unnecessary data that an operator must
delete manually through time-consuming work. Moreover, video cameras do not have internal software
to analyse images in depth in real time. Such analysis is carried out after the images are
downloaded, by applying computer vision techniques to calibrate the sensor [8] and by extracting
relevant information with edge-detection algorithms and GIS-based methods, such as contour
detection and statistical classification (e.g., PCA) [9]. All these postprocessing procedures
involve semi-automatic and time-consuming tasks.
These limitations can be addressed with machine-learning techniques for computer vision. In the
last decade, technological innovation has increased dramatically in the fields of artificial
intelligence (AI) and machine learning (ML), in parallel with video cameras [10].
Convolutional neural networks (CNNs) became popular because they outperformed other network
architectures on computer vision tasks [11]. Specifically, the U-Net architecture is nowadays
routinely and successfully used in image processing, reaching an accuracy
similar to or even higher than other existing ANNs, for example, of the FCN type [12–14],
providing multiple applications where pattern recognition and feature extraction play
an essential role. CNNs have been applied to find solutions to mitigate risk in different
environmental fields, such as for the detection and segmentation of smoke and forest
fires [15,16], flood detection [17], and to find solutions regarding global warming, for
example, through monitoring of the ice of the poles [18,19]. CNNs have been applied in
several studies in the field of volcanology for earthquake detection and classification [20,21],
for the classification of volcanic ash particles [22], and to validate their capability for real-
time monitoring of the persistent explosive activity of Stromboli volcano [23], for video
data characterisation [2], detection of volcanic unrest [24], and volcanic eruption detection
using satellite images [25–27]. Thus, the importance of applying architectures based on
CNN could be an alternative to improve the results obtained in the different scientific
works performed till now.
This research aims to create deep-learning-based computer vision algorithms for the detection and
segmentation of the volcanic plume, providing risk management practitioners with an effective tool
for emergency management. The tool is built around a neural network fed with data from the 24th to 27th December
2018 eruptive event. The eruption that began at noon was preceded by 130 earthquake
tremors, the two strongest of which measured 4.0 and 3.9 on the Richter scale. From this
eruptive event, 560 images were collected and then preprocessed and split into 80% training
and 20% validation. The training dataset was used in the training of two very consolidated
models: the SegNet Deep Convolutional Encoder-Decoder and U-net architectures. In this
groundwork phase, more consolidated models were sought to have a large comparative
pool and to substantiate their use in the volcanological field. As a result, a trained model
is generated to automatically detect the beginning of eruptive activity and to track
the entire eruptive episode. Automatic detection of the volcanic plume supports volcano
monitoring by storing only useful information, enabling real-time tracking of the plume and the
extraction of the relevant geometric parameters. By developing a comprehensive and
reliable approach, it is possible to extend it to many other explosive volcanoes. The current


results encourage a broader research objective that will be oriented towards the creation
of more advanced neural networks [2], deepening the real-time monitoring for observing
precursors, such as change in degassing state.

2. Geological Settings
Mt. Etna is a basaltic volcano located in Sicily in the middle of Gela-Catania foredeep,
at the front of the Hyblean Foreland [28] (Figure 1). This volcano is one of the most
active in the world with its nearly continuous eruptions and lava flow emissions and,
with its dimensions, it represents a major potential risk to the community inhabiting
its surroundings.

Figure 1. Location of Etna volcano.

The geological map, updated in 2011 [29] at the scale of 1:50,000, is a dataset of the
Etna eruptions that occurred throughout its history (Figure 2, from [29], with modifications).
This information is fundamental for land management and emergency planning.

Figure 2. Geological map of Mt. Etna.


3. Etna_NETVIS Network
Mt. Etna has become one of the best-monitored volcanoes in the world through the use of sev-
eral instrumental networks. One of them is the permanent terrestrial Network of Thermal
and Visible Sensors of Mount Etna, which comprises thermal and visible cameras located at
different sites on the southern and eastern flanks of Etna. The network, initially composed
of CANON VC-C4R visible (V) and FLIR A40 Thermal (T) cameras installed in Etna Cuad
(ECV), Etna Milo (EMV), Etna Montagnola (EMOV and EMOT), and Etna Nicolosi (ENV
and ENT), has been recently upgraded (since 2011) by adding high-resolution (H) sensors
(VIVOTEK IP8172 and FLIR A320) at the Etna Mt. Cagliato (EMCT and EMCH), Etna
Montagnola (EMOH), and Etna Bronte (EBVH) sites [3]. Visible spectrum video cameras
used in this work and examples of field of view (FOV), Bronte, Catania, and Mt. Cagliato
are shown in Figure 3. These surveillance cameras do not allow 3D model extraction due to
poor overlap, unfavourable baseline, and low image resolution. Despite this, simulation of
the camera network geometry and sensor configuration have been carried out in a previous
project (MEDSUV project [3]) and will be adopted as a reference for future implementation
of the Etna Network.

Figure 3. Etna_Netvis surveillance network.

The technical specifications of Etna_NETVIS network cameras used in this work, such
as pixel resolution, linear distance to the vent, and horizontal and vertical field of view
(HFOV and VFOV), are described in Table 1.


Table 1. Characteristics of the ETNA NETVIS cameras.

ETNA NETVIS
Station Name     Resolution (pixels)  Distance to the Vent  Images Captured per Minute  Model     Angular FOV (deg)
BRONTE           760 × 1040           13.78 km              1                           VIVOTEK   33°~93° (horizontal), 24°~68° (vertical)
CATANIA          2560 × 1920          27 km                 1
MONTE CAGLIATO   2560 × 1920          8 km                  2                           VIVOTEK   33°~93° (horizontal), 24°~68° (vertical)

4. Materials and Methods


4.1. Materials: Data Preparation
The paradigm used for this work was supervised learning based on a set of samples, each
consisting of a pair of data: input variables (x) and output labelled variables (y). Data
labelling is the crucial part of the data preprocessing in the workflow to build a neural
network model, which requires large volumes of high-quality training data. The processes
for creating label data are expensive, complicated, and time-consuming. Many open-source
resources, such as the MNIST dataset shipped with Keras, offer full datasets ready to use, but they
cover neither all types of objects nor labelled data for volcanic ash plume shapes. Thus, the 560 images
collected were manually labelled using the open-source image editor “GIMP” to delineate
the boundaries of volcanic plumes and generate the ground truth masks (Figure 4). The
samples were split into two sets: training and validation in a proportion of 80% and 20%,
respectively. As this research deals with a binary classification problem, labels are assigned at
the pixel level within the volcanic plume shapes: pixels inside the ash column contour are assigned
a value of 255 and all other pixels a value of 0. Inputs with large integer values could collapse
the bias value or slow down the learning process, so, to avoid this effect, pixels were normalised
between 0 and 1 by applying Equation (1):

x = (x − xmin) / (xmax − xmin)    (1)

where x is the pixel to normalise, xmin is the minimum pixel value of the image, and
xmax is the maximum pixel value of the image. To keep size consistency across the dataset
while reducing memory consumption, images were resized to (768px × 768px) by applying
bilinear interpolation.
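A minimal preprocessing sketch of the resizing and the min–max normalisation of Equation (1) is given below; the use of OpenCV for reading and bilinear resizing is an assumption, as the paper does not name the library used:

```python
import cv2
import numpy as np

def preprocess(image_path, target_size=(768, 768)):
    """Resize with bilinear interpolation and apply the min-max normalisation of Equation (1)."""
    img = cv2.imread(image_path).astype(np.float32)                      # H x W x 3 array
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)   # bilinear resize
    x_min, x_max = img.min(), img.max()
    return (img - x_min) / (x_max - x_min)                               # values in [0, 1]
```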
Finally, to improve the robustness of the inputs, the training data were augmented
through a technique called “data augmentation”. It was applied with the Keras
“ImageDataGenerator” class, which artificially expands the size of the dataset by creating
perturbations of the images such as horizontal flips, zoom, random noise, and rotations (Figure 5).
Data augmentation helps to avoid overfitting in the training stage.
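The exact augmentation settings are not listed in the paper; a sketch with the Keras ImageDataGenerator class, using illustrative parameter values and assuming train_images and train_masks are NumPy arrays of images and ground-truth masks, might look like this (random noise would require a custom preprocessing_function):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Flips, zoom, and rotations as described in the text; the values are illustrative.
data_gen_args = dict(horizontal_flip=True,
                     vertical_flip=True,
                     zoom_range=0.2,
                     rotation_range=15)

# The same seed keeps augmented images and their masks aligned.
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
image_generator = image_datagen.flow(train_images, batch_size=4, seed=42)
mask_generator = mask_datagen.flow(train_masks, batch_size=4, seed=42)
train_generator = zip(image_generator, mask_generator)
```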

4.2. Methods: ANN and UNET


The perceptron, a core concept of deep learning and convolutional neural networks
introduced by Rosenblatt [30], consists, in brief, of a single-layer neural network whose base
algorithms are the threshold function and gradient descent [31]. The latter method is the
most popular algorithm that performs parametrisation and optimisation of the parameters
in the artificial neural network (ANN), by means of labelled samples and process iterations
for the prediction of accurate outputs [31].
The optimisation minimises the loss function (or cost function), represented by the
cross-entropy as a measure of the difference between the actual and predicted classes.
Finally, the learning rate is an important parameter, used in the following sections to
control the time of the algorithm and the network parameter training at every iteration,
which is crucial to reach the expected results of the refined model. These parameters are
here briefly introduced, leaving the theoretical digression to dedicated sources [30,31].


Figure 4. Examples of variable pairs (in (A) the real images are shown and (B) represents the ground
truth mask).

Figure 5. Example of data augmentation with vertical and horizontal flips ((A) is a vertical right
flipped image of 60 inclination degrees, (B) is a horizontal and vertical flipped and (C) is a horizontal
and vertical flipped with distortion).


Convolutional Neural Network Architectures


Segmentation is a fundamental task for image analysis. Semantic segmentation de-
scribes the process of associating each pixel in an image with a class label. Segmenting
images of volcanic plumes is a complicated task, different from segmenting other objects,
such as people, cars, roads, buildings, and other entities that are well differentiated from
their background. Those types of objects are considered homogeneous and regular in form
and radiometry, but a volcanic plume can have very different physical properties [32], such
as shapes, colour, and density. In deep learning, CNN appears as a class of ANN based on
the shared-weight architecture of the convolution kernels [11] and proved very efficient
for pattern recognition, feature extraction for applications in computer vision analysis and
image recognition [33], classification [34], and segmentation [35]. This is useful to solve
problems as faced in this paper. Thus, this paper presents developed models based on
specific CNN architectures.
Different algorithms were implemented to develop a tool able to segment a vol-
canic ash plume from in situ images, creating two models based on architectures of Seg-
Net [36] and U-Net [37]. Those models were trained using TensorFlow GPU
version 2.12 [38], the Python 3.6 language, and Keras 2.9 [39], all open-source
libraries built on the TensorFlow framework. Keras is used here as the core library for
ANN programming, as it contains numerous implementations of commonly used neural
network building blocks, such as layers, activation functions, optimizers, metrics, and tools,
to preprocess images.
The U-net (Figure 6) is a CNN architecture for the segmentation of images, developed
by Olaf Ronneberger et al. [37] and used for medical scope, but now applied in several
other fields [40–43]. It is built upon the symmetric fully convolutional network and is
made up of two parts. The down-sampling network (encoder) reduces dimensionality of
the features while losing spatial information; instead, the up-sampling network (decoder)
enables the up-sampling of an input feature map to a desired output feature map using
some learnable parameters based on transposed convolutions. Thus, it is an end-to-end
fully convolutional network (FCN) that makes it possible to accept images of any size.

Figure 6. U-net architecture.

On the other hand, the SegNet architecture [36] is an FCN based on a decoupled encoder–
decoder design, where the encoder network is composed of convolutional layers, while the decoder is
based on up-sampling layers. The architecture of this model is shown in Figure 7. It is a symmetric
network where each layer of encoder has a corresponding layer in the decoder.


Figure 7. SegNet architecture.

Loss functions are used to optimise the model during the training stage, the aim being to
minimise the loss (error): the lower the value of the loss function, the better the model.
Cross-entropy loss is the most important loss function for classification problems. The
problem tackled in this work is a binary classification problem, and the loss function applied
was the binary cross-entropy (Equation (2)):

Loss = −(1/N) Σ_{i=1}^{N} [y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i)]    (2)

where ŷ_i is the i-th scalar value in the model output, y_i is the corresponding target value,
and N is the number of scalar values in the model output.
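For illustration, Equation (2) can be computed directly on flattened prediction and target arrays (a NumPy sketch; in practice the built-in binary_crossentropy loss listed in Table 2 is used):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Equation (2): mean binary cross-entropy over all N output values."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)          # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
```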
A deep learning model is highly dependent on hyperparameters, and hyperparameter
optimisation is essential to reach good results. In this work, a CNN based on U-net
architecture was built, capable of segmenting volcanic plumes from visible cameras. The
values assigned to model parameters are shown in Table 2.

Table 2. Hyperparameters required for the training phase for both CNN architectures.

Hyperparameters Required for Training


Learning Rate 0.0001
Batch_Size 4
Compile networks
Optimiser adam
Loss binary_crossentropy
Metrics Accuracy; iou_score
Fit Generator
Step_per_epoch 112
Validation_steps 28
epochs 100
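Assuming a compiled Keras model and the image/mask generators from the data-preparation step (the names model, train_generator, and val_generator are illustrative), the hyperparameters of Table 2 translate into roughly the following calls; the paper lists Keras' fit-generator interface, while recent TensorFlow versions accept generators directly in model.fit:

```python
from tensorflow.keras.optimizers import Adam

# Learning rate 0.0001, binary cross-entropy loss, accuracy metric (Table 2).
# The iou_score metric would come from an extra package or a custom implementation.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_generator,
                    steps_per_epoch=112,        # 448 training images / batch size 4
                    validation_data=val_generator,
                    validation_steps=28,        # 112 validation images / batch size 4
                    epochs=100)
```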

The encoder and decoder networks each contain five layers with the configuration shown
in Table 3.


Table 3. Convolutional layers description for U-Net architecture.

Input Layer A 2D Image with Shape (768, 768, 3)


Encoder Network
Convolutional Layer Filters Kernel Size Pooling Layer Activations Kernel Initialiser Stride Dropout
Conv1 16 3×3 yes ReLU he_normal 1×1 No
Conv2 32 3×3 yes ReLU he_normal 1×1 No
Conv3 64 3×3 yes ReLU he_normal 1×1 No
Conv4 128 3×3 yes ReLU he_normal 1×1 No
Conv5 256 3×3 yes ReLU he_normal 1×1 No
Bottle neck 512 3×3 No ReLU he_normal 0.5
Decoder Network
Convolutional Layer Filters Kernel Size Concatenate Layer Up-Sampling Activations Kernel Initializer Stride
Conv6 256 3×3 Conv5-Conv6 yes ReLU he_normal 1×1
Conv7 128 3×3 Conv4-Conv7 yes ReLU he_normal 1×1
Conv8 64 3×3 Conv3-Conv8 yes ReLU he_normal 1×1
Conv9 32 3×3 Conv2-Conv9 yes ReLU he_normal 1×1
Conv10 16 3×3 Conv1-Conv10 yes ReLU he_normal 1×1
Output layer 1 1×1 No No Sigmoid he_normal
Total trainable params 7,775,877
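A compact Keras sketch of a U-Net following the Table 3 configuration (3 × 3 kernels, he_normal initialisation, encoder filters 16–256, a 512-filter bottleneck with dropout 0.5, and a 1 × 1 sigmoid output layer) is given below. The number of convolutions per block is an assumption, so the trainable parameter count will not match Table 3 exactly:

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU activation and he_normal initialisation."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same",
                          activation="relu", kernel_initializer="he_normal")(x)
    return x

def build_unet(input_shape=(768, 768, 3)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: Conv1-Conv5 (16 -> 256 filters), each followed by 2x2 max pooling.
    skips, x = [], inputs
    for filters in (16, 32, 64, 128, 256):
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    # Bottleneck: 512 filters with dropout 0.5.
    x = conv_block(x, 512)
    x = layers.Dropout(0.5)(x)

    # Decoder: Conv6-Conv10, up-sampling and concatenation with the matching encoder output.
    for filters, skip in zip((256, 128, 64, 32, 16), reversed(skips)):
        x = layers.UpSampling2D(2)(x)
        x = layers.concatenate([x, skip])
        x = conv_block(x, filters)

    # Output layer: 1x1 convolution with sigmoid activation for the binary plume mask.
    outputs = layers.Conv2D(1, 1, activation="sigmoid",
                            kernel_initializer="he_normal")(x)
    return Model(inputs, outputs)
```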

The encoder and decoder networks of the SegNet model each contain five layers with the
configuration shown in Table 4.

Table 4. Convolutional layers description for SegNet architecture.

Input Layer A 2D Image with Shape (768, 768, 3)


Encoder Network
Convolutional Layer Filters Kernel Size Pooling Layer Activations Stride Dropout
Conv1 16 3×3 yes ReLU 1×1 No
Conv2 32 3×3 yes ReLU 1×1 No
Conv3 64 3×3 yes ReLU 1×1 No
Conv4 128 3×3 yes ReLU 1×1 0.5
Conv5 256 3×3 yes ReLU 1×1 0.5
Bottle neck 512 3×3 No ReLU 0.5
Decoder Network
Convolutional Layer Filters Kernel Size Up-Sampling Activations Stride Dropout
Conv6 256 3×3 yes ReLU 1×1 No
Conv7 128 3×3 yes ReLU 1×1 No
Conv8 64 3×3 yes ReLU 1×1 No
Conv9 32 3×3 yes ReLU 1×1 No
Conv10 16 3×3 yes ReLU 1×1 No
Output layer 1 1×1 No Sigmoid No
Total trainable params 11,005,841

In order to visualise the models built and the differences between the architectures used in this
work, Keras provides a function to create a plot of the neural network graph, which makes
more complex models easier to understand, as shown in Figure 8.
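The Keras utility referred to here is plot_model; a typical call (output file name chosen for illustration) looks like this:

```python
from tensorflow.keras.utils import plot_model

# Requires the pydot and graphviz packages to be installed.
plot_model(model, to_file="unet_architecture.png",
           show_shapes=True, show_layer_names=True)
```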


Figure 8. Left sketch of the U-net model with Deepest 4, right sketch of the SegNet model (the images
are available with higher resolution at the links in [44,45]).

4.3. Evaluation of the Proposed Model


Various evaluation metrics are used to calculate the performance of the model. The
evaluation metrics used in this research are explained below:
Accuracy score: the ratio of the number of correct pixel predictions to the total number
of predictions (Equation (3)).

Accuracy = TP / TNP    (3)

where TP is the number of correct pixel predictions and TNP is the total number of predictions.


Jaccard index: a loss based on the Intersection over Union (Equation (4)), for which a perfect
intersection gives the minimum value of zero.

L(A, B) = 1 − (|A ∩ B| / |A ∪ B|)    (4)

where |A ∩ B| / |A ∪ B| is the overlap between the predicted masks and the real (ground truth)
masks divided by their union.
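As an illustration only (a NumPy sketch, not the authors' code), the pixel accuracy of Equation (3) and the Jaccard loss of Equation (4) can be computed from binarised prediction and ground-truth masks as follows:

```python
import numpy as np

def pixel_accuracy(y_true, y_pred):
    """Equation (3): correct pixel predictions over the total number of predictions."""
    return np.mean(y_true == y_pred)

def jaccard_loss(y_true, y_pred):
    """Equation (4): 1 - IoU; equals 0 for a perfect overlap between the masks."""
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return 1.0 - intersection / max(union, 1)
```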
Validation curves: the trend of a learning curve can be used to evaluate the behaviour
of a model and, in turn, it suggests the type of configuration changes that may be made to
improve learning performance [46]. On these curve plots, both the training error (blue line)
and the validation error (orange line) of the model are shown. By visually analysing both
of these errors, it is possible to diagnose if the model is suffering from high bias or high
variance. There are three common trends in learning curves: underfitting (high bias, low
variance), overfitting (low bias, high variance) and best fitting (Figure 9).

Figure 9. Underfitting, overfitting, and best fit example.

Figure 10 shows a trend graph of the cross-entropy loss of both architectures (Y axis)
over number of epochs (X axis) for the training (blue) and validation (orange) datasets. For
the U-Net architecture, the plot shows that the training process of our model converges
well and that the plot of training loss decreases to a point of stability. Moreover, the plot of
validation loss decreases to a point of stability and has a small gap with the training loss.
On the other hand, for the SegNet architecture, the plot shows that the training process of
our model converged well until epoch 30, after which the validation loss showed an increase in
variance, pointing to possible overfitting. This means that the model pays too much attention to
the training data and does not generalise to data that it has not seen before. As a result, the SegNet model
performs very well on training data but has a higher error rate than the U-Net model on test data.

Figure 10. Trend curve of loss function.
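The curves in Figures 10 and 11 can be reproduced from the History object returned by Keras during training (a sketch; the authors' plotting code is not given):

```python
import matplotlib.pyplot as plt

# history is the object returned by model.fit; "loss"/"val_loss" hold the
# binary cross-entropy per epoch on the training and validation sets.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Binary cross-entropy loss")
plt.legend()
plt.show()
```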


The loss for the U-Net architecture is 0.026 on the training dataset and 0.316 on the validation
dataset, while for SegNet it is 0.018 on the training dataset and 0.142 on the validation dataset.
Figure 11 shows a trend graph of the accuracy metric (Y axis) over the number of
epochs (X axis) for the training (blue) and validation (orange) datasets. At epoch 100, the
accuracy reached by the U-Net architecture is 98.35% on the training dataset and 98.28% on the
validation dataset, while for SegNet it is 98.15% on the training dataset and 97.56% on the
validation dataset.

Figure 11. Trend curve of accuracy metric of training and validation dataset.
