0% found this document useful (0 votes)
42 views11 pages

Satellite Image Segmentation

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views11 pages

Satellite Image Segmentation

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Exploring Satellite Image Segmentation Techniques: A Comparative Study

Chandril Adhikary1, Soubik Ghosh2, Monalisa Dey3


1
Department of CSE, Institute of Engineering and Management, University of Engineering
and Management, Kolkata, India
2
Department of CSE, Institute of Engineering and Management, University of Engineering
and Management, Kolkata, India
3
Department of CSE, Institute of Engineering and Management, University of Engineering
and Management, Kolkata, India

Email: [email protected] 1; [email protected] 2


;
[email protected] 3

Abstract
Satellite Image Segmentation is crucial in sectors like defense, finance, disaster
management, agriculture, and urban planning. This paper aims to explore the latest
advancements in this field and provide a comprehensive study by examining various
methodologies. By understanding the techniques used for satellite image segmentation, this
study intends to offer valuable insights into how these images are analyzed. This knowledge
can guide practitioners and researchers in choosing the right methods for their specific needs.
The ultimate objective is to enhance the decision-making procedures and results in various
domains, including city development, financial planning, catastrophe response, defense tactics,
and crop monitoring. Our goal in doing this research is to help a variety of sectors and
companies by advancing the useful application of satellite imagery.

Keywords: Satellite Image Segmentation, Advancements, Methodologies, Analysis, Results,


Application.

1.1 Introduction

Segmentation of satellite images is a very important measure in unlocking the valuable


insights hidden in the enormous amounts of data captured by satellites as they orbit the Earth.
These orbiting sentinels provide a bird’s eye view of our planet, capturing intricate details of
its surface, ranging from natural landscapes to human-made infrastructure. However, extracting
meaningful information from these images is a very difficult task owing to the volume and

1
complexity of the data involved.
Traditionally, segmentation techniques relied on rule-based and hand-crafted algorithms,
which found it difficult to cope with the diverse and dynamic nature of satellite imagery. These
early methods often fell short in capturing the complex patterns and structures present in the
images, which limited their utility in real-world applications. However, the emergence of
remote sensing applications marked a significant paradigm shift in the field, spurring the
adoption of more advanced image analysis techniques.
The introduction of machine learning, particularly deep learning algorithms, emerged as a
game-changer in satellite image analysis. Convolutional Neural Networks (CNNs), in
particular, revolutionized image analysis tasks by effectively extracting hierarchical features
from images. CNNs excel at identifying subtle structures and patterns in satellite imagery,
making them well-suited for tasks such as land cover classification, object detection, and
semantic segmentation. Semantic segmentation, a technique enabled by CNNs, has become a
cornerstone in satellite image analysis. This approach involves categorizing each pixel in an
image, allowing for detailed classification of land use patterns, land cover types, and other
features. The granularity provided by semantic segmentation makes it easier to comprehend
Earth’s surface dynamics and enables a plethora of applications across diverse domains,
including planning urban landscape, agriculture management, monitoring the environment and
managing disasters.
Moreover, advancements in transfer learning and domain adaptation have further boosted
the evolution of satellite image segmentation algorithms. Transfer learning leverages models
which come pre-trained on large datasets to increase performance on tasks with limited training
data, a common challenge in satellite imagery analysis. Domain adaptation techniques enable
models to adapt effectively to various datasets, enhancing their generalization capability across
diverse satellite imagery.
Despite these advancements, satellite image segmentation still faces formidable challenges.
Variability in image resolution, complex landscapes, and atmospheric conditions pose hurdles
to achieving accurate and reliable segmentation results. Furthermore, ensuring that deep
learning models are transparent and interpretable remains a pressing concern, especially in
applications where decisions based on segmentation results have significant implications on
society.
In conclusion, a new era of satellite image analysis has been brought about by the
convergence of deep learning techniques with the abundance of satellite imagery data.
Continuous advancements in segmentation algorithms hold the promise of unlocking
2
invaluable insights into our planet’s dynamics and informing evidence-based decision-making
across various sectors. However, addressing the remaining challenges in satellite image
segmentation will be imperative to fully harness the transformative potential of satellite
imagery for scientific, environmental, and societal applications.

1.2 A Comparative Overview

In this section we compare different methods of satellite segmentation and try to figure out
the advancements made in this field of work with respect to the application areas and efficiency
figures.

1.2.1 T-Clustering Self Organizing Map Based Technique

Self-organizing map [1] and threshold technique work hand in hand one after another to
complete the segmentation process. Self-Organizing Map is a technique which falls under
artificial neural network method. It recognizes patterns and translates them into responses from
a two-dimensional neuron array, regardless of the pattern’s dimensions. One of the main
characteristics of SOM, the neighborhood relations of the input pattern, are preserved with the
aid of the feature map. After SOM is completed threshold the T-Cluster technique is used to
decrease over segmentation and remove tiny clusters. These two techniques need to work
together to complete the whole process. To arrange the pixels in various groups, the SOM first
receives the satellite image. The ultimate number of cluster centers is then determined from the
arranged pixels using the T-Cluster technique. The problems of over and under segmentation
which is generated by the SOM is handled separately by the TSOM. Segmentation is an
important stage in image processing. This model, which uses SOM and T-Cluster Segmentation
sequentially for satellite image processing, was created due to a lack of unsupervised neural
network techniques. The threshold value and number of iterations compute the efficiency of
the given selected process. However individual values can be determined easily for individual
images and these values can be used further in segmenting the next set of images.

1.2.2 Decision Tree, SVM and K-NN Based Technique

This technique is based on the usage of algorithms such as Decision Tree, KNN: K Nearest
Neighbour and SVM: Support Vector Mechanism [2]. The KNN algorithm, commonly referred
to as the Nearest Neighbor Algorithm, is a machine learning technique that categorizes objects
into groups according to the feature space’s nearest training instances. The label of the

3
unlabeled data point is simply assigned through the classification procedure based on its k-
nearest neighbors. Using the divide and conquer strategy, the decision tree classifier serves as
an inductive learning technique in machine learning that creates a classification tree from the
training set of data or sample. Selecting high-resolution and low-resolution photos for
categorization is the first step in the procedure. Following that, neighborhood pixel correlation
is used to segment these images. Seventy percent of all pixels are chosen in the subsequent step
of classification data selection to produce accurate classification outcomes. Two methods are
used in this process, KNN and SVM, which need to select a lot of parameters to run. Lastly the
ground level truth data is compared to the segmented images to get the accuracy result. In
conclusion KNN, SVM and decision tree data mining algorithms are used to classify the the
input images. The accuracy in the SVM method is higher as compared to the KNN and Decision
Tree algorithms. High resolution images have a accuracy of 78.60% and the low-resolution
images have an accuracy of 82.34% respectively. The kappa coefficient in SVM method is
quite higher than in the other two methods.

1.2.3 Attention Dilation-LinkNet (AD-LinkNet) Neural Network Based Technique

To accomplish semantic segmentation, Attention Dilation-LinkNet [3], or AD-LinkNet is


an integration of an encoder decoder structure, a channel-wise attention process, a dilated
convolution in a series-parallel combination, and a encoder which was previously trained.
Dilated convolution in series–parallel combination broadens the receptive field and creates
multi-scale properties for objects that have multiple scales, such a little pool and a long-span
road. Extending on the prior D-LinkNet34, AD LinkNet is a neural network that brings together
the benefits of several networks. To produce a more comprehensive semantic segmentation
network, AD-LinkNet incorporates an attention mechanism and dilated convolution in a series
parallel combination within the network. A structure that is a series-parallel combination is
also produced by the residual network’s "parallel expansion" properties and the dilated
convolution formed by using the short-cut connection. The variety of convolutional structures
are enabled to be used by parallel dilated convolution structures. It then combines the different
branches of information through stacking and succeeds in achieving multi-scale feature fusion.
The depth of a branch in a parallel structure is the same and there is just a single convolution
layer in each of them. The significance of several efficient feature layers is first weighed and
then their usage is enhanced. This is called the attention mechanism. To sum up, this model
uses loss function and network design to improve segmentation results.

4
1.2.4 Deep Learning Segmentation Using U-Net Based Technique

This technique is based on the usage of the Unet Model for segmentation using deep learning
[4]. Semantic Segmentation can be defined as labelling all the pixels of an image and then
assigning those pixels into different classes. The encoder-decoder architecture, on which U-
Net is built, is highly well-liked because of its excellent performance, versatility, and
efficiency. The basic operations of U-Net are (1) the application of convolutional layers, which
are used to extract features by applying several 3 × 3 convolution kernels (referred to as
convolution); (2) the application of the a batch normalizing layer, which is represented by batch
normalization and speeds up convergence during training; (3) the layer of the activation
function in which the nonlinear modification of the feature maps is handled .In this method the
frequently utilized RELU i.e. rectified linear unit is used , usually used activation, for this layer;
(4) the max-pooling layer (represented by max-pooling) for feature map down sampling; (5)
the use of the up-sampling layer, in which the size of the previously down sampled feature
maps are then increased with the help of the max-pooling layer (represented with up-sampling);
and (6) the concatenation layer, which combined the deep layers’ up sampled feature map with
the pertinent shallow layer feature map. In a nutshell, the U-Net architecture is as follows: after
receiving input images, each one undergoes two convolution processes making use of the ReLu
activation function. After that, the encoded picture goes via the layer used for pooling. The
feature maps size is decreased as a result of this several-time iterative approach. The model
initiates sampling after obtaining all of the sample maps. The feature map’s layers are
concatenated during the down sampling procedure. After sampling up the feature maps, the
segmentation maps are ultimately displayed as an output having the same dimensions as the
original picture.

1.2.5 Active Learning for enhanced Semi-Supervised Semantic Segmentation in Satellite


Images Based Technique

Using the most representative and instructive examples of the data, active learning labels
[5] the unlabeled data set according to some information metric. A conditional Generative
Adversarial Network (GAN) is used in which the input is a randomly selected sample of the
subset of images and labels from the dataset. The sample is repeated using a sampling strategy
based on active learning to provide a more diverse collection of training data that demonstrates
a relative performance increase. The semi-supervised semantic segmentation model would be
able to make better use of the representative set of labeled data in the event that active learning

5
strategies were applied to choose a labelled and informative data subset. To improve the semi-
supervised semantic segmentation performance tasks used in land cover categorization in
satellite pictures, this work provides a modeling strategy based on active learning. Using coarse
image classification based active-learning algorithms, the pixel-by-pixel labelled samples are
obtained and subsequently picked. Semi-supervised semantic segmentation network helps to
obtain appropriate amount of initial data to learn suitable representatives starting from the best
registered models. In a semi-supervised semantic segmentation network prototype based on
GAN, pool-based active learning methods were employed to select the tagged pictures. The
present study presents an innovative approach that makes use of sampling techniques based on
active learning to increase the efficiency of semantic segmentation using semi-supervised
techniques in the realm of land cover classification for satellite imagery. The key innovation
involves strategically selecting samples for pixel-wise labeling, guided by active-learning
strategies rooted in coarse image classification. The objective is to provide an ideal set of
labeled instances to the semi-supervised semantic segmentation network, which will aid in the
acquisition of essential preliminary data for learning an appropriate representation.
Implemented as a prototype for a semi-supervised semantic segmentation network founded on
GANs, or Generative Adversarial Networks, our approach selects labeled images through pool
based active learning strategies. Two distinct satellite image datasets is used to showcase the
method’s effectiveness, leveraging both quantitative metrics and qualitative assessments to
showcase significant performance improvements.

1.2.6 Graph based Segmentation using MST Based Technique

Graph-based segmentation using Minimum Spanning Trees (MST) [6] is a technique which
is often used in satellite image analysis to divide images into various classes which are
dependent on the connectivity and similarity of pixels or super pixels. One common approach
within graph-based segmentation for satellite images is the use of Minimum Spanning Trees
(MSTs) or Minimum Spanning Forests (MSFs). The satellite image is initially divided into
smaller, perceptually homogeneous regions known as super pixels. Super pixels group together
pixels with similar features like color, texture, and intensity. Various algorithms like SLIC
(Simple Linear Iterative Clustering) or Felzenszwalb’s method are commonly used for super
pixel generation. Once the super pixels are generated, a graph is constructed, where each node
represents a super pixel, and edges connect neighboring super pixels. The edges are weighted
based on the dissimilarity between the features of connected super pixels. This dissimilarity
can be calculated using color difference, texture features, spatial proximity, or a combination

6
of these factors. An MST (Minimum spanning tree) or MSF (Minimum Spanning Forest) is
then computed from the graph. The MST then connects all the nodes (super pixels) with the
minimum total edge weight. On the other hand, in the case of a forest, multiple disjoint trees
are generated. The minimum spanning tree or forest is used to group the superpixels into
segments or clusters. This grouping is based on the connectivity of super pixels in the tree or
forest. Each segment corresponds to a distinct region in the satellite image. The quality of
results is further increased by applying a post processing unit. This can include merging or
splitting segments based on criteria such as shape, size, or semantic information. Additionally,
smoothing filters or morphological operations may be applied to enhance the segmentation
output. Graph-based segmentation methods offer benefits such as computational efficiency, the
ability to capture spatial coherence, and flexibility in incorporating various image features.
They are widely used in satellite image processing for tasks like object identification, change
detection, urban planning, and land cover categorization.

1.2.7 Fuzzy Clustering Using FCM-Fuzzy C-Means Based Technique

This technique is based on Fuzzy Clustering using FCM-Fuzzy C-Means [7]. Fuzzy
clustering in satellite image segmentation involves clustering pixels in satellite images into
several groups based on their similarity, with a degree of uncertainty or fuzziness associated
with each pixel’s membership to a particular cluster. This method is particularly useful when
traditional methods which are usually known as hard clustering methods, e.g.-means clustering,
becomes insufficient because of the complicated and overlapping nature of satellite image data.
Satellite images are pre-processed to remove noise, normalize intensities, and enhance features
that are important for segmentation. One of the frequently used fuzzy clustering techniques for
segmenting images is FCM. It extends the K-means algorithm to allocate membership values
to each pixel for each cluster rather than assigning it exclusively to one cluster. In FCM, each
cluster has a membership value which is then assigned to every pixel and is used to represent
the degree of belongingness which in turn determines the extent to which that pixel is part of
each cluster. These membership values are bounded between the lower value of 0 which
signifies no membership and 1 which signifies full membership. The FCM algorithm iteratively
optimizes an objective function to minimize the fuzzy membership values’ distance from the
cluster centroids and enhance the clustering result. This is typically done using iterative
optimization techniques like gradient descent. Like K-means, FCM requires initial cluster
center positions. These centers can be randomly selected or initialized based on prior
knowledge of the image. FCM also effectively introduces a fuzziness parameter (m) which

7
controls how fuzzily the clustering procedure is done. Higher values of the fuzziness parameter
led to softer boundaries between clusters, allowing pixels to have higher memberships to
multiple clusters. Once the algorithm converges, each pixel is assigned into the cluster which
has the highest membership value. This results in a segmented image with each and every pixel
has been assigned a single or multiple such clusters with specific membership levels. To
enhance the quality of the segmented image, post processing methods like morphological
operations or spatial smoothing might be used. Fuzzy clustering in satellite image segmentation
can give more robust results compared to traditional hard clustering methods, particularly in
scenarios where pixels may belong to multiple land cover classes simultaneously or when there
are gradual transitions between different land cover types.

1.2.8 Mask R-CNN segmentation method for Satellite Imagery Based Technique

Mask R-CNN, which is also known as Mask Region-based convolutional neural network
[8] has been proven very effective in segmenting satellite images. It is a deep learning model
that combines the instance segmentation capabilities of the Mask R CNN and fast R-CNN for
object recognition, two well-liked deep learning architectures. With respect to satellite image
analysis, Mask R-CNN excels at precisely delineating objects of strategic value, like roads,
buildings, vegetation, and water bodies, while also providing pixel-level segmentation masks
that outline the boundaries of these objects with remarkable accuracy. The process begins with
the input satellite image being fed into the network, whereby a sequence of convolutional layers
is applied to extract hierarchical features at various scales. Afterwards, boxes which are more
likely to useful information known as region proposals—candidate bounding boxes are built
with the help of these attributes. Next, the network refines these proposals through regression
of these bounding boxes and then classifies them to accurately localize and classify objects
within the image. But Mask R-CNN stands out because it can give detailed segmentation of
instances in addition to object detection. In this step, the network employs a parallel branch to
predict segmentation masks for each object detected, assigning a binary mask to every pixel
within the bounding box, indicating whether it belongs to the object or background. This fine-
grained segmentation enables precise delineation of object boundaries, even in complex and
cluttered scenes commonly encountered in satellite imagery. The process of training the Mask
R-CNN typically involves a huge dataset containing annotated satellite images, where each
object of interest is meticulously labeled with its corresponding class and segmentation mask.
Through an iterative process of backpropagation and optimization, the network learns to

8
accurately predict object locations and segmentation masks, gradually improving its
performance over time. Once trained, Mask R-CNN is deployed for many applications in
satellite image analysis, including urban planning, disaster response, environmental
monitoring, and agricultural assessment.

1.2.9 Shadowed C-Means Based Satellite image segmentation Technique

The algorithmic and conceptual link between FCM i.e. fuzzy c-means and RCM i.e rough
c-means is provided by shadowed c-means (SCM)[9], which combines the strength of both
methods. To achieve effective segmentation and visualization, the technique is especially well-
suited for modeling and grouping huge data types, like photographs. Two of the important
features of the SCM [10] algorithm are its (i) ability to reduce the total number of parameters
which have been fixed by the users and (ii) stability in the face of outliers. According to a
threshold, the exclusion, shadow, and core regions are automatically identified. The Xie–Beni
(X)[12] and Davies–Bouldin(DB) clustering validity indices[11] are used to determine the most
suitable number of clusters (c).SCM technique offers an effective way to represent the
segments in which overlap in an image (such as satellite imagery) while organically addressing
the uncertainty and ambiguity present along the area borders. This could be used as an
instrument for improved visualization and later image analysis. SCM increases the algorithm’s
robustness and dependability by giving the core members the most weight during clustering.
Using satellite imagery, the SCM mapping may be utilized to efficiently classify several
overlapping landcover regions unsupervisedly. Certain regions, such as water, concrete, and
vegetation, occupy larger areas than others, such as roads and bridges. It is a difficult task to
automatically recognize clusters of such widely different sizes efficiently. But in the end it
depends on the user to assess an image’s suitability for a given application once it has been
segmented for visual interpretation. The algorithm’s performance is truly evaluated when a
synthetic image is created that overlaps between the regions and known pixel label. This makes
it easier to evaluate how well the segmentation worked with the help of true data from the
ground. The homogeneity of the regions was evaluated using the F-measure[13], along with
DB[11] and X[12], and the segmentation was compared to the ground truth regions using the
Jaccard coefficient[1].The outcomes of the experiments also show how well SCM segments
the distinct landcover types from the photos obtained through the Indian Remote Sensing
satellite (IRS) of Calcutta and Bombay, as well as the Systeme Probatoire d’Observation de la
Terre which is also known as SPOT image surrounding Kolkata.

9
Table 1.1 Comparison of various techniques

Computation
Technique Accuracy Complexity Suitability
Time
T-Clustering Self
Moderate Moderate Medium Reducing over-segmentation
Organizing Map
Decision Tree, SVM, High (82.34%
Moderate Medium Supervised classification
K-NN for low res)
AD-LinkNet Neural
High High Complex Complex multi-scale segmentation
Network
U-Net Based Deep
High High Complex Pixel-wise semantic segmentation
Learning
Active Learning with
High High Complex Semi-supervised segmentation
GAN
Graph-Based Object identification and land
Moderate Low Medium
Segmentation (MST) cover
Fuzzy Clustering Overlapping data, suitable for
Moderate Moderate Medium
(FCM) noisy images
Object-level segmentation and
Mask R-CNN Very High High Very Complex
boundary detection
Shadowed C-Means Effective for outliers, suitable
Moderate Low Medium
(SCM) for large datasets

1.3 Conclusion
By comparing various approaches and strategies in the domain of satellite image
segmentation, a comparative study was presented and the comprehensive analysis was
provided. Through meticulous examination of different methodologies, we evaluated the
evolution of satellite image segmentation through different approach throughout the time
period. Each methodology presented and solved different set of challenges through their
different approaches and manifested unique strengths and weakness. The comparative analysis
highlighted the contextual nuances that researchers and practitioners should weigh in. As this
comparative study is concluded, it is imperative to acknowledge the inherit complexity of
Satellite Image Segmentation and the evolving nature of the research methodologies which
would serve as a stepping stone for advancements and applications in sectors such as defense,
disaster management, agriculture, urban planning, etc.

Acknowledgement

We extend our sincerest gratitude to our professor and mentor, Prof. Monalisa Dey, whose
guidance and support throughout the entire process, along with her valuable insights and
constructive feedback, significantly contributed to the enhanced quality of our work. We also
want to sincerely thank our college for allowing us to work on this fascinating topic and for
providing the tools we required to complete this paper.

10
References

[1] M. Awad, “An unsupervised artificial neural network method for satellite image
segmentation,” Int. Arab J. Inf. Technol., vol. 7, no. 2, pp. 199–205, 2010.
[2] I. Nurwauziyah, S. UD, I. G. B. Putra, and M. I. Firdaus, “Satellite image classification
using decision tree, SVM and k-nearest neighbor,” July 2018.
[3] M. Wu, C. Zhang, J. Liu, L. Zhou, and X. Li, “Towards accurate high resolution satellite
image semantic segmentation,” IEEE Access, vol. 7, pp. 55609 55619, 2019.
[4] Z. Pan, J. Xu, Y. Guo, Y. Hu, and G. Wang, “Deep learning segmentation and classification
for urban village using a worldview satellite image based on U-Net,” Remote Sensing, vol. 12,
no. 10, p. 1574, 2020.
[5] S. Desai and D. Ghose, “Active learning for improved semi-supervised semantic
segmentation in satellite images,” in Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision, 2022, pp. 553–563.
[6] K. Ravali, M. V. Kumar, and K. Rao, “Graph-based high-resolution satellite image
segmentation for object recognition,” ISPRS-Int. Arch. Photogramm. Remote Sens. Spatial Inf.
Sci., vol. XL-8, pp. 913–917, 2014.
[7] S. Naz, H. Majeed, and H. Irshad, “Image segmentation using fuzzy clustering: A survey,”
in Proceedings of the 2010 6th International Conference on Emerging Technologies (ICET),
2010.
[8] S. Liang, F. Qi, Y. Ding, R. Cao, Q. Yang, and W. Yan, “Mask R-CNN based segmentation
method for satellite imagery of photovoltaics generation systems,” in 2020 39th Chinese
Control Conference (CCC), 2020, pp. 5343–5348.
[9] S. Mitra and P. P. Kundu, “Satellite image segmentation with shadowed c-means,” Inf. Sci.,
vol. 181, no. 17, pp. 3601–3613, 2011.
[10] S. Mitra, W. Pedrycz, and B. Barman, “Shadowed c-means: Integrating fuzzy and rough
clustering,” Pattern Recognit., vol. 43, no. 4, pp. 1282–1291, 2010.
[11] L. A. Zadeh, “Is there a need for fuzzy logic?,” Inf. Sci., vol. 178, no. 13, pp. 2751–2779,
2008.
[12] X. L. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 13, no. 8, pp. 841–847, 1991.

[13] T. Y. Chen, F. C. Kuo, and R. Merkel, “On the statistical properties of the f-measure,” in
Proceedings of the Fourth International Conference on Quality Software (QSIC), 2004, pp.
146–153.

11

You might also like