Multisensor Data Fusion and Machine Learning for Environmental Remote Sensing
Ni-Bin Chang
Kaixu Bai
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable effort has been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access [Link] or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
[Link]
and the CRC Press Web site at
[Link]
Contents
Preface.............................................................................................................................................. xv
Acknowledgments...........................................................................................................................xvii
Authors.............................................................................................................................................xix
Chapter 1 Introduction...................................................................................................................1
1.1 Background.........................................................................................................1
1.2 Objectives and Definitions.................................................................................3
1.3 Featured Areas of the Book................................................................................5
References..................................................................................................................... 7
5.9 Summary.......................................................................................................... 89
References................................................................................................................... 89
Chapter 7 Feature Extraction with Machine Learning and Data Mining Algorithms.............. 127
7.1 Introduction.................................................................................................... 127
7.2 Genetic Programming.................................................................................... 130
7.2.1 Modeling Principles and Structures.................................................. 130
7.2.2 Illustrative Example.......................................................................... 132
7.3 Artificial Neural Networks............................................................................. 137
7.3.1 Single-Layer Feedforward Neural Networks and Extreme Learning Machine............ 138
7.3.2 Radial Basis Function Neural Network............................................. 142
7.4 Deep Learning Algorithms............................................................................ 144
7.4.1 Deep Learning Machine................................................................... 144
7.4.2 Bayesian Networks............................................................................ 146
7.4.3 Illustrative Example.......................................................................... 150
7.5 Support Vector Machine................................................................................. 153
7.5.1 Classification Based on SVM............................................................ 153
7.5.2 Multi-Class Problem......................................................................... 156
7.5.3 Illustrative Example.......................................................................... 156
7.6 Particle Swarm Optimization Models............................................................ 158
7.7 Summary........................................................................................................ 160
References................................................................................................................. 161
Chapter 9 Major Techniques and Algorithms for Multisensor Data Fusion.............................. 195
9.1 Introduction.................................................................................................... 195
9.2 Data Fusion Techniques and Algorithms....................................................... 196
9.2.1 Pan-Sharpening................................................................................. 197
9.2.1.1 Component Substitution.................................................... 198
9.2.1.2 Relative Spectral Contribution...........................................200
9.2.1.3 High Frequency Injection..................................................200
9.2.1.4 Multi-Resolution Transformation...................................... 201
9.2.1.5 Statistical and Probabilistic Methods................................ 201
9.2.2 Statistical Fusion Methods................................................................202
9.2.2.1 Regression-Based Techniques...........................................202
9.2.2.2 Geostatistical Approaches.................................................202
Chapter 10 System Design of Data Fusion and the Relevant Performance Evaluation Metrics...... 229
10.1 Introduction.................................................................................................... 229
10.2 System Design of Suitable Data Fusion Frameworks..................................... 230
10.2.1 System Design for Data Fusion—Case 1.......................................... 230
10.2.2 System Design for Data Fusion—Case 2.......................................... 232
10.2.3 The Philosophy for System Design of Data Fusion.......................... 234
10.3 Performance Evaluation Metrics for Data Fusion.......................................... 234
10.3.1 Qualitative Analysis.......................................................................... 235
10.3.2 Quantitative Analysis........................................................................ 235
10.3.2.1 Without Reference Image.................................................. 235
10.3.2.2 With Reference Image....................................................... 237
10.4 Summary........................................................................................................ 241
References................................................................................................................. 241
Chapter 13 Integrated Data Fusion and Machine Learning for Intelligent Feature Extraction....... 301
13.1 Introduction.................................................................................................... 301
13.1.1 Background....................................................................................... 301
13.1.2 The Pathway of Data Fusion.............................................................302
13.2 Integrated Data Fusion and Machine Learning Approach.............................304
13.2.1 Step 1—Data Acquisition.................................................................. 305
13.2.2 Step 2—Image Processing and Preparation......................................306
13.2.3 Step 3—Data Fusion.........................................................................307
13.2.4 Step 4—Machine Learning for Intelligent Feature Extraction.........308
13.2.5 Step 5—Water Quality Mapping....................................................... 312
13.3 Summary........................................................................................................ 317
Appendix 1: Ground-Truth Data............................................................................... 318
Appendix 2................................................................................................................ 319
References................................................................................................................. 319
Index............................................................................................................................................... 489
Preface
Earth observation and environmental monitoring require data to be collected for assessing various types of natural systems and the man-made environment at varying scales. Such research enables us to deepen our understanding of a wealth of geophysical, geochemical, hydrological, meteorological, and ecological processes of interest. For the past few years, the scientific community has realized that obtaining a better understanding of interactions between natural systems and the man-made environment across different scales demands more research effort in remote sensing. The key research questions include: (1) how to properly fuse multisensor images with different spatial, temporal, and spectral resolutions to minimize data gaps and create long-term, consistent, and cohesive observations for further feature extraction, and (2) how feature extraction can be adequately performed by traditional algorithms and advanced computational intelligence methods to overcome barriers when complex features are embedded in or constrained by heterogeneous images. From a systems engineering perspective, integrating the latest advances in multisensor data fusion and feature extraction with the aid of advanced computational intelligence methods is of primary importance to achieve our common goal—"the whole is greater than the sum of its parts."
The aim of this book is thus to elucidate the essence of integrating multisensor data fusion and machine learning to promote environmental sustainability. It emphasizes the concept of the "System of Systems Engineering" approach with implications for both art and science. Such
an endeavor can accommodate an all-inclusive capability of sensing, monitoring, modeling, and
decision making to help mitigate the natural and human-induced stresses on the environment.
On this foundation, many new remote sensing image processing tools and feature extraction methods in concert with existing space-borne, air-borne, and ground-based measurements
have been collectively presented across five distinctive topical areas in this book. This initiative
leads to a thorough discussion of possible future research with synergistic functionality across space
and time at the end of the book. The book will be a useful reference for graduate students, academic scholars, and working professionals involved in the study of "earth systems science" and "environmental science and engineering" in support of environmental sustainability.
MATLAB® is a registered trademark of The MathWorks, Inc.
Acknowledgments
This book grew out of a series of research grants and contracts as well as a wealth of international
collaborative work. This book could not have been written without the valuable assistance of several
people. The authors are indebted to Mr. Benjamin Vannah, Ms. Xiaoli Wei, Mr. Chandan Mostafiz, and Dr. Zhibin Sun for their collective contributions. Their helpful research and/or thesis work is gratefully acknowledged. We also extend our gratitude to Ms. Rachel Winter, who served as the language editor of this book. Finally, without encouragement from Ms. Irma Shagla Britton, senior editor of Environmental Sciences, Remote Sensing & GIS at CRC Press/Taylor & Francis Group, we could not have made up our minds to complete this lengthy work. Special thanks are extended to her as well.
Authors
Ni-Bin Chang has been a professor of environmental systems engineering in the United States of America since 2002. He received his BS in Civil Engineering from National Chiao-Tung University in Taiwan in 1983, and his MS and PhD in Environmental Systems Engineering from Cornell University in the United States of America in 1989 and 1991, respectively. Dr. Chang's highly interdisciplinary research lies at the intersection of environmental sustainability, green engineering, and systems analysis. He is director of the Stormwater Management Academy and a professor in the Department of Civil, Environmental, and Construction Engineering at the University of Central Florida in the United States of America. From August 2012 to August 2014, Professor Chang served as program director of the Hydrologic Sciences Program and the Cyber-Innovated Sustainability Science and Engineering Program at the National Science Foundation in the United States of America. He was elevated to Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2017, and he has been active with the IEEE Geoscience and Remote Sensing Society, the IEEE Systems, Man, and Cybernetics Society, and the IEEE Computational Intelligence Society. His selectively awarded distinctions include induction as a Fellow of the European Academy of Sciences in 2008 and election as a Fellow of the American Society of Civil Engineers in 2009, the American Association for the Advancement of Science in 2011, the International Society for Optics and Photonics in 2014, and the Royal Society of Chemistry (the United Kingdom) in 2015. He has been editor-in-chief of the SPIE Journal of Applied Remote Sensing since 2014. He is currently an editor, associate editor, or editorial board member of more than 20 international journals.
1 Introduction
1.1 BACKGROUND
Remote sensing is defined as the acquisition and analysis of remotely sensed images to gain information about the state and condition of an object through sensors that are not in physical contact with it, and to discover relevant knowledge for decision making. Remote sensing for environmental monitoring and Earth observation can be defined as:

Remote sensing is the art and science of obtaining information about the surface or subsurface of Earth without needing to be in contact with it. This can be achieved by sensing and recording emitted or reflected energy, then processing, analyzing, and interpreting the retrieved information for decision making.
The remote sensing process involves the use of various imaging systems where the following seven
elements are involved for environmental monitoring and earth observations: (1) illumination by
the sun or moon; (2) travel through the atmosphere; (3) interactions with the target; (4) recording
of energy by the sensor; (5) transmission, absorption, reflection, and emission; (6) retrieval,
interpretation, and analysis; and (7) decision making for applications.
Types of remote sensing technologies include air-borne, space-borne, ground-based, and sea-based systems with a wealth of sensors onboard different platforms. These
sensors are designed to observe electromagnetic, acoustic, ultrasonic, seismic, and magnetic energy
for environmental monitoring and earth observation. This book focuses on remote sensing sensors
making use of the electromagnetic spectrum for environmental decision making. These sensors generally detect reflected and emitted energy at wavelengths ranging from the ultraviolet through the optical and infrared to the microwave regions of the electromagnetic spectrum.
Over the last few decades, satellite remote sensing that observes solar radiation has become an invaluable tool for providing spatially and temporally resolved estimates of environmental variables with electromagnetic sensors. Traditional image-processing algorithms often involve image
restoration, image enhancement, image segmentation, image transformation, image fusion, and data
assimilation with feature extraction/classification models. With the availability of field observations,
such image-processing efforts enable us to provide our society with an unprecedented learning
capacity to observe, monitor, and quantify the fluxes of water, sediment, solutes, and heat through
varying pathways at different scales on the surface of Earth. Environmental status and ecosystem
state can then be assessed through a more lucid and objective approach. Yet this requires linking
remote sensing image processing with change detection in a more innovative way.
In an attempt to enlarge the application potential, sensor and data fusion with improved spatial,
temporal, and spectral resolution has become a precious decision support tool that helps observe
complex and dynamic Earth systems at different scales. The need to build more comprehensive and
predictive capabilities requires intercomparing earth observations across remote sensing platforms
and in situ field sites, making it possible to cohesively explore multiscale earth observations from local up to regional or global extents for scientific investigation (CUAHSI, 2011). Recent advancements
in artificial intelligence techniques have motivated a significant initiative of advanced image
processing for better feature extraction, information retrieval, classification, pattern recognition,
and knowledge discovery. In concert with image and data fusion, the use of machine learning and
an ensemble of classifiers to enhance such an initiative is gaining more attention. The progress in
this regard will certainly help answer more sophisticated and difficult science questions as to how
[Figure: a remote sensing information system, with labels for source, sensors, atmospheric interaction, target, receiving unit, processing unit, hardware, software, data, information, processes, and people.]
and control with the aid of wired or wireless communication systems to produce multidimensional
information and to monitor the presence of unique events. The SoSE approach may certainly provide
sound routine monitoring, early warning, and emergency response capacity in our society, which is
facing climate change, globalization, urbanization, economic development, population growth, and
resource depletion.
An integrated Earth system observatory that merges surface-based, air-borne, space-borne, and
even underground sensors with comprehensive and predictive capabilities indicates promise for
revolutionizing the study of global water, energy, and carbon cycles as well as land use and land
cover changes. This may especially be true if these multisensor data fusion and machine learning
technologies are developed and deployed in a coordinated manner and the synergistic data are further
screened, synthesized, analyzed, and assimilated into appropriate numerical simulation models for
advanced decision analysis. Thus, the aim of this book is to present a suite of relevant concepts,
tools, and methods of the integrated multisensor data fusion and machine learning technologies to
promote environmental sustainability.
1.2 OBJECTIVES AND DEFINITIONS

Data fusion is a multi-level, multifaceted process dealing with the automatic registration, detection, association, correlation, and combination of data and information from multiple sources to achieve refined state and identity estimation, and complete and timely assessments of situations, including both threats and opportunities.
The data fusion process proposed by the JDL was classified into five processing levels, an associated database, and an information bus that connects the five levels (Castanedo, 2013). The five levels of processing are defined as follows (Figure 1.2): Level 0, source pre-processing; Level 1, object refinement; Level 2, situation refinement; Level 3, impact refinement; and Level 4, process refinement.
In this book, machine learning or data mining algorithms are emphasized to help feature extraction
of remote sensing images. The discussion of data mining in this book is based on the following
definition:
FIGURE 1.2 Structure of the JDL data fusion. (From Chang, N. B. et al., 2016. IEEE Systems Journal, 1–17.)
Data mining, sometimes called knowledge discovery, is a big data analytic process designed to explore
or investigate data from different perspectives in search of useful patterns, consistent rules, systematic
relationships, and/or applicable information embedded among various types of flexibly grouped system
variables of interests. It requires subsequent validation of the findings by applying these findings to new
subsets of data.
The discussion of machine learning in this book is based on the following definition:
Machine learning, born from artificial intelligence, is a computer science discipline that enables computers to learn from different data analyses and even automate analytical or empirical model building without being explicitly coded to do so. By using various statistics, decision science, evolutionary computation, and optimization techniques to learn from data iteratively, machine learning allows computers to identify or discover hidden rules, inherent patterns, possible associations, and unknown interactions without being explicitly programmed where to look or what to search for.
The major difference between data mining and machine learning is that the former has no clue about what the patterns or rules in a system are, whereas the latter has some clues in advance about what the system looks like based on local or labeled samples. In image classification and feature extraction for remote sensing studies, this distinction rests on whether or not the system of interest has some ground or sea truth data to draw on. Since ground or sea truth data may or may not be available in different types of remote sensing studies, and since collecting such data is favored for image classification and feature extraction toward much better prediction accuracy, the emphasis of this book is placed on machine learning rather than data mining, although both cases are discussed in context.
The process of data mining or machine learning consists of four general stages: (1) the initial
exploration (i.e., data collection and/or sampling); (2) model building or pattern identification and
recognition; (3) model verification and validation; and (4) application of the model to new data in
order to generate predictions.
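To make these four stages concrete, the following minimal MATLAB sketch walks through them with synthetic data (the linear model, noise level, and 70/30 split are illustrative choices of ours, not prescriptions from this book):

```matlab
% Four general stages of a data mining/machine learning workflow (toy example)
rng(1);                                       % reproducibility
x = linspace(0, 10, 100)';                    % Stage 1: initial exploration --
y = 2*x + 1 + randn(100, 1);                  % collect/sample (synthetic) data
idx = randperm(100);
train = idx(1:70);  test = idx(71:100);       % hold out 30% for validation
p = polyfit(x(train), y(train), 1);           % Stage 2: model building (linear fit)
rmse = sqrt(mean((y(test) - polyval(p, x(test))).^2));  % Stage 3: validation
fprintf('Validation RMSE: %.2f\n', rmse);
y_new = polyval(p, 12.5);                     % Stage 4: prediction on new data
```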
The niche for integrating data fusion and machine learning for remote sensing rests upon the
creation of a new scientific architecture in remote sensing science that is designed to support
numerical as well as symbolic data fusion managed by several cognitively oriented machine learning
tasks. Whereas the former is represented by the JDL framework, the latter is driven by a series of
image restorations, reconstructions, enhancements, segmentations, transformations, and fusions for
intelligent image processing and knowledge discovery (Figure 1.3) (Chang et al., 2016). Well-known
machine learning methods include but are not limited to genetic algorithms, genetic programming,
artificial neural networks, particle swarm optimization, support vector machines, and so on (Zilioli and Brivio, 1997; Volpe et al., 2007; Chen et al., 2009; Bai et al., 2015; Chang et al., 2015). They can be used for data mining as well if we do not have in situ observations.

[Figure content: the data fusion domain spans Level 0 source pre-processing, Level 1 object refinement, Level 2 situation refinement, Level 3 impact refinement, and Level 4 process refinement, supported by a database management system with fusion databases and local, distributed, and national sources.]
FIGURE 1.3 The new architecture of integrated data fusion and machine learning.

1.3 FEATURED AREAS OF THE BOOK
• Part I—Fundamental Principles of Remote Sensing: This part of the discussion will
demonstrate a contemporary coverage of the basic concepts, tools, and methods associated
with remote sensing science. The relationship between electromagnetic radiation and remote
sensing will be discussed in Chapter 2. Then the types of sensors and platforms that provide
the image acquisition capacity will be delineated and emphasized in Chapter 3. Finally, the
method of pre-processing raw images and how pre-processed images can be integrated with
a geographical information system for different types of analyses via differing software
packages will be introduced in Chapter 4.
• Part II—Feature Extraction for Remote Sensing: With the foundation of Part I, Part II
aims to introduce basic feature extraction skills and discussion of the latest machine
learning techniques for feature extraction. In this context, an overview of concepts
and basics for feature extraction will be introduced for readers who do not have such a
background (Chapter 5). The statistics or probability-based and machine-learning-based
feature extraction methods will be described in sequence to entail how remote sensing
products can be analyzed and interpreted for different purposes (Chapters 6 and 7). These
machine learning analyses need to be validated with the aid of ground- or sea-based
sensor networks that help collect reference data. Along this line, the final product can be
integrated further with numerical simulation models in support of high-end research for
environmental science and Earth system science.
• Part III—Image and Data Fusion for Remote Sensing: Following the logical sequence of Parts I and II, Part III will focus on the concepts of image fusion with
respect to current technology hubs (Chapter 8) toward an all-inclusive coverage of the
most important image or data fusion algorithms (Chapter 9). These image and data fusion
efforts may lead to the derivation of a series of new remote sensing data products with
sound system design and the corresponding advancements must be specifically addressed
and evaluated by a suite of performance-based evaluation metrics (Chapter 10). Situation
refinement can be made possible with iterative work based on the knowledge developed
in this part.
• Part IV—Integrated Data Merging, Data Reconstruction, Data Fusion, and Machine Learning: Starting in Part IV, focus is placed on large-scale, complex, and integrated data merging, fusion, image reconstruction, and machine learning scenarios.
Chapter 11 offers intensive information about the concepts and tools of data merging
with respect to multiple satellite sensors. Whether using data fusion or data merging,
cloudy pixels cannot be fully recovered. Thus, cloudy pixel reconstruction at appropriate stages can elevate the accomplishments of data fusion and merging in support of machine learning (Chapter 12). With the inclusion of Chapter 12, regardless of
intensity and spatial variability of cloudy pixels regionwide, the nature of cloudy pixel
reconstruction with signal processing and machine learning techniques may greatly
expand the general utility of fused or merged images. Chapter 13 presents a holistic
discussion of integrated data fusion and machine learning for intelligent feature
extraction. Chapter 14 demonstrates the highest level of synergy with a SoSE approach
that enables readers to comprehend the sophistication of integrated cross-mission data
merging, fusion, and machine learning algorithms flexibly toward better environmental
surveillance. From various SoSE approaches, the integrated data fusion, merging, and
machine learning processes may be evaluated by a set of indices for performance
evaluation.
• Part V—Remote Sensing for Environmental Decision Analysis: Expanding upon the predictive capabilities from multisensor data fusion, merging, image reconstruction, and machine learning, the areas of environmental application include but are not limited to air resources management (Chapter 15), water quality management (Chapter
16), ecosystem toxicity assessment (Chapter 17), land use and land cover change detection
(Chapter 18), and air quality monitoring in support of public health assessment (Chapter
19). These case studies from Chapter 15 to Chapter 19 will be systematically organized
in association with current state-of-the-art remote sensing sensors, platforms, tools, and
methods applied for environmental decision making.
REFERENCES
Bai, K. X., Chang, N. B., and Chen, C. F., 2015. Spectral information adaptation and synthesis scheme for
merging cross-mission consistent ocean color reflectance observations from MODIS and VIIRS. IEEE
Transactions on Geoscience and Remote Sensing, 99, 1–19.
Castanedo, F., 2013. A review of data fusion techniques. The Scientific World Journal, 2013, 704504.
Chang, N. B., Bai, K. X., and Chen, C. F., 2015. Smart information reconstruction via time-space-spectrum
continuum for cloud removal in satellite images. IEEE Journal of Selected Topics in Applied Earth
Observations, 99, 1–19.
Chang, N. B., Bai, K. X., Imen, S., Chen, C. F., and Gao, W., 2016. Multi-sensor satellite image fusion,
networking, and cloud removal for all-weather environmental monitoring. IEEE Systems Journal, 1–17,
DOI: 10.1109/JSYST.2016.2565900.
Chen, H. W., Chang, N. B., Yu, R. F., and Huang, Y. W., 2009. Urban land use and land cover classification
using the neural-fuzzy inference approach with Formosat-2 Data. Journal of Applied Remote Sensing,
3, 033558.
Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), 2011. [Link], accessed May 2011.
Hall, D. L. and Llinas, J., 1997. An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1),
6–23.
United States Department of Defense, 2008. Systems Engineering Guide for Systems of Systems. [Link], accessed February 13, 2016.
Volpe, G., Santoleri, R., Vellucci, V., d’Alcalà, M. R., Marullo, S., and D’Ortenzio, F., 2007. The colour of
the Mediterranean Sea: Global versus regional bio-optical algorithms evaluation and implication for
satellite chlorophyll estimates. Remote Sensing of Environment, 107, 625–638.
Zilioli, E. and Brivio, P. A., 1997. The satellite derived optical information for the comparative assessment of
lacustrine water quality. Science of the Total Environment, 196, 229–245.
Part I
Fundamental Principles of Remote Sensing
2 Electromagnetic Radiation and Remote Sensing
2.1 INTRODUCTION
Identified by Einstein in 1905, quanta, or photons, are packets of pure energy; such particles have no mass when at rest. While developing the blackbody radiation law, the German physicist Max Planck realized that the supposition that electromagnetic energy could be emitted only in "quantized" form was the key to interpreting electromagnetic waves consistently; this scientific discovery is the reason he was awarded the Nobel Prize in Physics in 1918. Einstein, in turn, pointed out in 1905 that light must consist of bullet-like tiny particles, now known as photons. The photoelectric effect that Einstein explained successfully supplemented the quantization supposition Planck proposed, and it is the reason Einstein was awarded the 1921 Nobel Prize in Physics. When possessing a
certain quantity of energy, a photon is said to be quantized by that quantity of energy. Therefore,
the well-known “wave-particle” duality entails the findings of Planck and Einstein that all forms
of electromagnetic radiation (EMR) and light behave as waves and particles simultaneously in
quantum mechanics. These findings imply that every quantic entity or elementary particle exhibits
the properties of waves and particles, from which the properties of light may be characterized.
Photons as quanta thus show a wide range of discrete energies, forming a basis for the spectrum
of EMR. Quanta may travel in the form of electromagnetic waves, which provide remote sensing a
classical basis for data collection.
Sunlight refers to the portion of the EMR spectrum given off by the sun, particularly in the range of infrared, visible, and ultraviolet light. On Earth, sunlight is filtered by the atmosphere before it can reach ground level. The interactions among solar radiation, atmospheric
scattering and reflections, and terrestrial absorption and emission play a key role in the ecosystem
conditions at the surface of Earth. Atmospheric radiative transfer processes with the effect of
transmission, absorption, reflection, and scattering have collectively affected the energy budget
of the atmospheric system on Earth. For example, absorption by several gas-phase species in the
atmosphere (e.g., water vapor, carbon dioxide, or methane) defines the so-called greenhouse effect
and determines the general behavior of the atmosphere, which results in a surface temperature
higher than zero degrees Celsius (273.15 K). In addition to the natural system, human activities
have had a profound impact on the energy budget of the earth system. To some extent, air pollutants
emitted by anthropogenic activities also affect the atmospheric radiative transfer processes and
result in environmental effects and public health impact.
Following this deepened understanding of EMR, wavelength-dependent analyses for remote
sensing data collection are often highlighted with respect to the given band specifications in
the literature. Remote sensing sensor design based on specified bands and center wavelengths
thus becomes feasible for collecting various images for processing, information extraction, and
interpretation. Depending on the goals of each individual application, satellites onboard different
sensors may be regarded as a cohesive task force to achieve a unique mission for earth observation
and environmental monitoring. It is the aim of this chapter to establish a foundation by introducing
a series of basic concepts and methods along this line.
E = hν = hc/λ (2.1)

in which E is the radiant energy of a photon (in joules), h is Planck's constant (6.63 × 10⁻³⁴ J·s, or W·s²), c is the speed of light (3 × 10⁸ m·s⁻¹), λ is the wavelength (in meters), and ν is the frequency (in hertz).
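As a quick numeric check of Equation 2.1 (a minimal MATLAB sketch; the 550 nm wavelength is an arbitrary example in the visible range):

```matlab
% Photon energy E = h*c/lambda (Equation 2.1)
h      = 6.63e-34;       % Planck's constant, J*s
c      = 3.0e8;          % speed of light, m/s
lambda = 550e-9;         % example wavelength: 550 nm (green light)
nu = c / lambda;         % frequency, ~5.45e14 Hz
E  = h * nu;             % radiant energy per photon, ~3.6e-19 J
fprintf('nu = %.3e Hz, E = %.3e J\n', nu, E);
```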
[Figure: an electromagnetic wave, with labels for the electric field, magnetic field, wavelength (λ), frequency (ν), amplitude, reference point, and transmission direction.]
FIGURE 2.2 Comparison of wavelength, frequency, and energy for the electromagnetic spectrum. (NASA’s
Imagine the Universe, [Link] accessed March, 2013.)
TABLE 2.1
Approximate Frequency, Wavelength, and Energy Limits of the Various Regions of the EM Spectrum

Region       | Wavelength (m)        | Frequency (Hz)         | Energy (J)
Radio waves  | >1 × 10⁻¹             | <3 × 10⁹               | <2 × 10⁻²⁴
Microwave    | 1 × 10⁻³ ∼ 1 × 10⁻¹   | 3 × 10⁹ ∼ 3 × 10¹¹     | 2 × 10⁻²⁴ ∼ 2 × 10⁻²²
Infrared     | 7 × 10⁻⁷ ∼ 1 × 10⁻³   | 3 × 10¹¹ ∼ 4 × 10¹⁴    | 2 × 10⁻²² ∼ 3 × 10⁻¹⁹
Optical      | 4 × 10⁻⁷ ∼ 7 × 10⁻⁷   | 4 × 10¹⁴ ∼ 7.5 × 10¹⁴  | 3 × 10⁻¹⁹ ∼ 5 × 10⁻¹⁹
Ultraviolet  | 1 × 10⁻⁸ ∼ 4 × 10⁻⁷   | 7.5 × 10¹⁴ ∼ 3 × 10¹⁶  | 5 × 10⁻¹⁹ ∼ 2 × 10⁻¹⁷
X-ray        | 1 × 10⁻¹¹ ∼ 1 × 10⁻⁸  | 3 × 10¹⁶ ∼ 3 × 10¹⁹    | 2 × 10⁻¹⁷ ∼ 2 × 10⁻¹⁴
All parts of the EM spectrum consist of EM radiation produced through different processes. They
are detected in different ways by remote sensing, although they are not fundamentally different
relative to the nature of EM radiation.
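To make the limits in Table 2.1 operational, a small MATLAB helper (our own sketch, saved as a hypothetical file emRegion.m; the boundaries are taken directly from the table) can map a wavelength to its approximate spectral region:

```matlab
function region = emRegion(lambda)
% Classify a wavelength (in meters) into its EM region per Table 2.1.
edges = [1e-11 1e-8 4e-7 7e-7 1e-3 1e-1 Inf];   % region boundaries, m
names = {'X-ray','Ultraviolet','Optical','Infrared','Microwave','Radio'};
idx = find(lambda >= edges(1:end-1) & lambda < edges(2:end), 1);
if isempty(idx)
    region = 'outside tabulated range';
else
    region = names{idx};
end
end
```

For example, emRegion(550e-9) returns 'Optical' and emRegion(0.03) returns 'Microwave'.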
[Figure: radiative transfer interactions between two media, with panels for transmission, reflection (angle of incidence θ₁ equals angle of reflection θ₂), scattering, absorption, and emission.]
For energy conservation, the summation of the fraction of the total radiation energy associated with
transmission, absorption, and reflection must be equal to 1.
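In symbols (our notation, not the book's): writing τ, α, and ρ for the transmitted, absorbed, and reflected fractions of the incident energy at wavelength λ, this conservation statement reads

$$\tau(\lambda) + \alpha(\lambda) + \rho(\lambda) = 1.$$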
Similarly, the atmosphere filters the energy delivered by the sun and emitted from Earth while
performing radiative transfer. A series of radiative transfer processes may collectively describe
the interaction between matter and radiation; such interactions might involve matter such as gases,
aerosols, and cloud droplets in the atmosphere and the four key processes of absorption, reflection,
emission, and scattering. Whereas scattering of an incident radiation by the atmospheric matter
results in a redistribution of the radiative energy in all directions, absorption of an incident radiation
by the atmospheric matter results in a decrease of radiative energy in the incident direction. Figure
2.4 conceptually illustrates the radiative transfer processes through an atmospheric layer. In such
[Figure content: aerosol scattering, atmospheric emission and absorption, surface-reflected radiation, surface radiation, and surface characterization by temperature, albedo, emissivity, and composition.]
FIGURE 2.4 Radiative transfer processes through an atmospheric layer. (NASA Goddard Space Flight
Center, PSG, 2017. [Link] accessed December, 2017.)
an environment, whether radiation is absorbed or transmitted depends on the wavelength and the
surface properties of the matter in the atmospheric environment. When an air-borne or a space-
borne sensor views Earth, reflection and refraction are two key radiative transfer processes; these
two processes are discussed in detail below.
2.4.2 Reflection
Our ability to see luminous objects with our eyes depends on the reflective properties of light, as
does an air-borne or a space-borne sensor. In the earth system, types of reflections at the surface
of Earth include specular and diffuse reflection (Figure 2.5). Factors affecting surface reflectance
on Earth include absorption features (e.g., water, pigments, and minerals) at ground level, surface
roughness, and observation and illumination angles. Specular reflection occurs on smooth surfaces
(Figure 2.5a) whereas varying surface roughness may result in Lambertian or diffuse reflectance
(Figure 2.5b). Note that diffuse reflection, also termed Lambertian reflection and characterized by the bidirectional reflectance distribution function (BRDF), occurs on rough surfaces (Figure 2.5b) such as forests or agricultural fields. In diffuse reflection, the roughness of the surface results in variations of the normals along the surface; however, all of the reflected rays still behave according to the law of reflection, which states that the incident ray, the reflected ray, and the normal to the reflecting surface all lie in the same plane. The BRDF depends on wavelength and on the illumination and viewing geometry, which is determined by the optical and structural
properties of the surface. These properties include but are not limited to: multiple scattering, facet
orientation distribution, facet density, shadow casting, mutual shadowing, reflection, absorption,
transmission, and emission by surface objects. Hence, BRDF is related to Lambertian reflection,
which defines how light reflected at an opaque surface differs from what we may see with our
eyes with respect to the same scene when Earth moves over different positions relative to the sun.
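For the ideal Lambertian case just described, the BRDF reduces to a constant, ρ/π, so the reflected radiance is the same from every view angle. A minimal MATLAB sketch (the reflectance, irradiance, and sun angle are assumed values for illustration; real surfaces need a full wavelength-dependent BRDF model):

```matlab
% Ideal (Lambertian) diffuse reflection: BRDF = rho/pi, independent of view angle
rho     = 0.25;     % hemispherical surface reflectance (assumed)
E_sun   = 1000;     % irradiance on a sun-normal plane, W/m^2 (assumed)
theta_i = 30;       % solar incidence angle from the surface normal, deg
f_brdf  = rho / pi;                   % Lambertian BRDF, 1/sr
L = f_brdf * E_sun * cosd(theta_i);   % reflected radiance, W/(m^2*sr), ~69 here
fprintf('Reflected radiance: %.1f W/(m^2 sr)\n', L);
```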
2.4.3 Refraction
Refraction is a light-bending effect that happens between transparent media of different densities, in which the transparent media can be air, water, or even snow on Earth (Robinson, 1997). The
bending effect of light in association with the refraction media is a physical representation of the
longer time it takes for light to move through the denser of two media (Figure 2.6). The level of
refraction is dependent on the incident angle with which the ray of light strikes the surface of the
medium. Given that the temperature of the atmospheric layers may vary with height, this effect
could affect the density of air, and bias of the signals collected by remote sensing sensors could
impact the measurement accuracy.
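The level of refraction described here is quantified by Snell's law, n₁ sin θ₁ = n₂ sin θ₂, which is not written out in this excerpt; a minimal MATLAB sketch for light passing from air into water (standard refractive indices assumed):

```matlab
% Snell's law: n1*sin(theta1) = n2*sin(theta2)
n1 = 1.000;     % refractive index of air (approximate)
n2 = 1.333;     % refractive index of water (approximate)
theta1 = 45;    % incidence angle, deg
theta2 = asind((n1/n2) * sind(theta1));   % refraction angle, ~32 deg
fprintf('Refraction angle: %.1f deg\n', theta2);
```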
[Figure: refraction of light at air-glass and air-water interfaces; the bending makes a submerged object appear displaced from its actual position.]
[Figure content: the EM spectrum from gamma rays and X-rays through ultraviolet, visible, infrared, and microwave to radio waves, shown over the atmospheric layers (troposphere, stratosphere, mesosphere, thermosphere), with the optical and radio transmission "windows" marked.]
FIGURE 2.7 Atmospheric windows and radiative transfer. (Modification of work by STScI/JHU/NASA.)
The illustration of Figure 2.7 shows how far different portions of the EM spectrum can move
forward before being absorbed in the atmosphere. It is notable that only portions of visible light,
infrared, and some ultraviolet light can reach Earth’s ground surface or make it to sea level. EM
radiation from space that is able to reach the surface of Earth through the atmosphere window
provides a wealth of ground-leaving or water-leaving reflectance data for remote sensing sensors
to collect.
Such atmospheric windows deeply affect the assessment of the environmental sustainability of the
earth system. For instance, the stratosphere, located in the upper atmosphere, and the troposphere,
located in the lower atmosphere, are chemically identical in terms of ozone molecules. However,
these ozone molecules have very different roles in these two layers and very different effects on
life systems on the surface of Earth. As shown in Figure 2.7, stratospheric ozone filters most of the
solar ultraviolet radiation, playing a beneficial role by absorbing most of the biologically damaging
ultraviolet sunlight, known as UV-B. Ozone near the ground surface in the tropospheric layer not
only lacks the filtering action of the ozone layer, but is also toxic to life on Earth. In addition, the
change of water-leaving reflectance associated with different wavelengths of visible light can be
regarded as a surrogate index for monitoring water pollution. Terrestrial thermal emissions are
correlated with the evapotranspiration process through all plant species, which may be monitored
by remote sensing to understand the ecosystem status.
[Figure content: reflected sunlight and thermal emission measured by a radiance sensor across the blue, green, red, near-infrared, shortwave-infrared, midwave-infrared, and longwave-infrared regions.]
FIGURE 2.8 Definition of specific spectral regions and visible and infrared radiation.
TABLE 2.2
Band Distribution of Microwave Remote Sensing and Related Properties
Designation Wavelength Range Frequency (GHz) Applications
[Figure content: log-log plot of energy emitted, W/(m²·micron·sr), versus wavelength from 0.1 to 100 microns, with the UV and infrared radiation regions marked.]
FIGURE 2.9 Comparisons of the spectral emittance from a 6,000 K Blackbody with a candle and total
irradiance at Earth’s surface at the TOA level. (NASA Data Center, 2016)
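The 6,000 K blackbody curve compared in Figure 2.9 follows Planck's law, B(λ, T) = (2hc²/λ⁵)/[exp(hc/(λk_BT)) − 1], which this excerpt does not reproduce explicitly; a short MATLAB sketch of that curve (the constants are standard physical values):

```matlab
% Spectral radiance of a 6,000 K blackbody (Planck's law), W/(m^2 * sr * m)
h  = 6.626e-34;                        % Planck's constant, J*s
c  = 2.998e8;                          % speed of light, m/s
kB = 1.381e-23;                        % Boltzmann constant, J/K
T  = 6000;                             % temperature, K (approximately solar)
lambda = linspace(0.1e-6, 10e-6, 500); % wavelengths, 0.1 to 10 microns
B = (2*h*c^2 ./ lambda.^5) ./ (exp(h*c ./ (lambda*kB*T)) - 1);
loglog(lambda*1e6, B);                 % compare with the shape in Figure 2.9
xlabel('Wavelength, microns'); ylabel('Spectral radiance');
```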
heat it takes to increase the unitary temperature of the soil layer. The lower the value of the
thermal diffusivity, the less the temperature rises further into the soil, and the higher the reflected
radiation into the atmosphere. Net radiation in this context is the amount of energy actually added
to the earth system.
Earth’s net radiation is the balance between outgoing and incoming energy at the TOA level. The
solar energy arriving at the surface can vary from 550 W/m2 with cirrus clouds to 1025 W/m2 with
a clear sky (Krivova et al., 2011). Earth and the atmosphere absorb 341 W/m2 of solar radiation on
average annually (Johnson, 1954). In view of the energy budget delineation in Figure 2.4, outgoing
longwave radiation is EMR emitted from Earth and its atmosphere out to space in the form of
thermal radiation through both soil layers and atmospheric layers. Most of the outgoing longwave
radiation has wavelengths (from 4 to 100 µm) in the thermal infrared part of the electromagnetic
spectrum. In fact, our planet’s climate is driven by absorption, reflection, shortwave or longwave
emission, and scattering of radiation within the atmosphere due to the presence of thin clouds,
aerosol, and some gases. Cases can be seen in some extreme weather events such as tropical storms
and hurricane assessment.
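As a back-of-the-envelope complement to these budget figures (a sketch using canonical values we assume, not numbers taken from this chapter), balancing absorbed shortwave radiation against Stefan-Boltzmann emission yields Earth's effective radiating temperature:

```matlab
% TOA energy balance: (S/4)*(1 - A) = sigma*T^4  =>  effective temperature
S     = 1361;        % total solar irradiance at TOA, W/m^2 (canonical value)
A     = 0.30;        % planetary (Bond) albedo (assumed canonical value)
sigma = 5.67e-8;     % Stefan-Boltzmann constant, W/(m^2*K^4)
T_eff = ((S/4)*(1 - A)/sigma)^(1/4);   % ~255 K; the gap to the ~288 K observed
fprintf('T_eff = %.0f K\n', T_eff);    % surface mean reflects the greenhouse effect
```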
[Figure: solar position geometry defined by the zenith, altitude, and azimuth angles.]
• Top of the atmosphere (TOA): TOA is defined as the outermost layer of Earth’s atmosphere,
which is the upper limit of the atmosphere—the boundary of Earth that receives sun
radiation.
• Albedo: Albedo means whiteness in Latin, and refers to the fraction of the incident sunlight
that the surface reflects. The residual radiation not reflected is then absorbed by the surface.
• Spherical albedo of the atmosphere: Spherical albedo is the average of the plane albedo
over all sun angles. Spherical albedo of the atmosphere is the effective albedo of an entire
planet that is the average of the plane albedo over all sun angles at TOA.
2.8 SUMMARY
In this chapter, some basic properties of light and concepts of EMR are introduced sequentially to
support the basic understanding of remote sensing. The discussion is followed by remote sensing
data collection conditional to atmospheric windows and specified band regions. In addition, the
global energy budget in relation to thermal radiation is presented to provide a complementary
view of thermal emission relative to sunlight reflection. The chapter ends by introducing the basic
terminologies of remote sensing for environmental monitoring and earth observation.
REFERENCES
Eismann, M. T., 2012. Hyperspectral Remote Sensing. SPIE Press, Bellingham, Washington, USA.
Johnson, F. S., 1954. The solar constant. Journal of Meteorology, 11, 431–439.
Krivova, N. A., Solanki, S. K., and Unruh, Y. C., 2011. Towards a long-term record of solar total and spectral irradiance. Journal of Atmospheric and Solar-Terrestrial Physics, 73, 223–234.
National Aeronautics and Space Administration (NASA) Data Center, [Link], accessed June, 2016.
National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, [Link], accessed June, 2016.
National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, [Link], accessed August, 2016.
National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, PSG, 2017. [Link], accessed December, 2017.
Robinson, D. A., 1997. Hemispheric snow cover and surface albedo for model validation. Annals of Glaciology, 25, 241–245.
3 Remote Sensing Sensors and Platforms
3.1 INTRODUCTION
Solar energy moves from the sun to Earth and finally to the satellite sensors onboard a variety of
platforms for measurement. Remote sensing images and data provide critical information about
how the natural system can be sustained over time. Such radiative transfer processes reveal how
the solar energy is partitioned into different compartments in the natural system. Remote sensing
images and data with different spatial, spectral, radiometric, and temporal resolution need to
be pre-processed, retrieved, analyzed, interpreted, and mapped in an iterative and holistic way
to support various types of decision analysis for sustainable development. In applications that require more than one satellite, automated data merging and/or fusion processes
for dealing with challenging problems are critical for supporting human decision making, which
requires linking data with information, knowledge discovery, and decision analysis to achieve
timely and reliable projections of a given situation in a system (Figure 3.1), such as climate change
impact.
Consequently, as mentioned in Chapter 2, the following energy partition terminologies in the
radiative transfer processes in the natural environment deeply influence the system design of both
sensors and platforms and deserve our attention:
• Transmitted energy—The energy that passes through a medium with a change in the
velocity of the light as determined by the refraction index for two adjacent media of interest.
• Absorbed energy—The energy that is surrendered to the target through electron or even
molecular reactions.
• Reflected energy—The energy bounced back with an angle of incidence equal to the angle
of reflection.
• Scattered energy—The energy that is diffused into the air with directions of energy
propagation in a randomly changing condition. Rayleigh and Mie scattering are the two
major types of scattering in the atmosphere.
• Emitted energy—The energy that is first absorbed, then re-emitted as thermal emissions at
longer wavelengths while the target, such as the ground level, heats up.
As mentioned, remote sensing images and data play a critical role in understanding the solar
energy paths and extracting features associated with targets. Before conducting data merging and/
or fusion, which fuses or merges images of different spatial, temporal, and spectral resolution, there
is a need to understand important functionalities of different sensors and platforms individually or
collectively. This chapter thus aims to investigate quality sensors and platforms capable of supporting
multisensor data merging and/or fusion based on synthetic aperture radar, infrared, and optical
remote sensing images and data. The following sections present different classification principles of
sensors and platforms to establish a fundamental understanding of remote sensing systems as well
as the current, historic, and future missions of remote sensing with inherent connections. These
sensors and platforms are regarded as enabling technologies for monitoring solar radiation and
improving our comprehension of the interactions between solar energy and materials over different
wavelengths at the ground level or in atmospheric layers. The sensor, bands, spatial resolution, swath
width, spectral range, and temporal resolution associated with a suite of major multispectral remote
sensing platforms are highlighted for demonstration. An understanding of this relevant knowledge may lead to optimizing the system planning and design of data merging and/or fusion, meeting the overarching goal of the system of systems engineering, as more sensors and platforms with different features become available and join the synergistic endeavor.

FIGURE 3.1 Contribution of remote sensing images and data to sustainable development.
3.2.1 Space-Borne Platforms
The most popular platforms for remote sensing aloft are space-borne satellites. Over three thousand remote sensing satellites have been launched since 1957, when the Soviet Union launched Sputnik 1, the first man-made satellite. In addition, the space shuttle, which can function as a remote sensing platform, belongs to this category; however, unlike satellites, the space shuttle can be reused for multiple missions. The path of a satellite in space is referred to as its orbit. Satellites can be classified
based on either orbital geometry or timing for image acquisition. Two types of orbits, including
geostationary/equatorial and polar/Sun synchronous, are commonly used as a broad guideline
for the classification of remote sensing satellites (Natural Resources Canada, 2017). These orbits
are fixed after launch and can be only slightly adjusted to maintain their anticipated position for
environmental monitoring and earth observation over time. The type of orbit, which affects the design of the onboard sensor, determines the satellite's altitude with respect to Earth and the limit of its instantaneous field of view (i.e., the area on Earth that can be viewed at any moment in time).
In general, geostationary or equatorial satellites are designed to have an orbital period of about 24 hours, the same as Earth's rotation, making these satellites stay consistently over the same location on Earth. These geostationary satellites must be placed at a very high altitude (∼36,000 km) to maintain an orbital period equal to that of Earth's rotation and appear stationary with respect to Earth, as illustrated in Figure 3.2. Because of this stationarity, any sensor onboard these satellites always views the same area of Earth, albeit a very large area because of the high altitude. Such satellites
normally circle Earth at a low inclination in an equatorial orbit (i.e., inclination is defined as the
angle between the orbital plane and the equatorial plane). This type of system design of geostationary
orbits can meet the needs for communications and weather monitoring, hence many of them are
located over the equator. However, the space shuttle chose an equatorial orbit with an inclination
of 57 degrees. The space shuttle has a low orbital altitude of 300 km, whereas other common polar
satellites typically maintain orbits ranging from 200 to 1,000 km.
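The ∼36,000 km altitude quoted above follows from Kepler's third law; a minimal MATLAB check (the constants are standard values, not figures from the text):

```matlab
% Geostationary orbital radius from Kepler's third law: a^3 = mu*T^2/(4*pi^2)
mu  = 3.986e14;      % Earth's gravitational parameter, m^3/s^2
T   = 86164;         % one sidereal day, s
R_E = 6.371e6;       % mean Earth radius, m
a = (mu*T^2/(4*pi^2))^(1/3);                   % orbital radius, ~4.216e7 m
fprintf('Altitude: %.0f km\n', (a - R_E)/1e3); % ~35,800 km above the surface
```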
Polar-orbiting or sun-synchronous satellites are designed to pass above (i.e., polar) or nearly
above (i.e., sun-synchronous or near-polar orbits) each of Earth’s poles periodically. Polar or sun-
synchronous orbits are thus the most common orbits for remote sensing due to the need to provide
illumination for passive sensors. Note that although active sensors such as LiDAR and radar do not
need the Sun’s illumination for image acquisition, passive sensors count on solar energy as a source
of power. We will define active and passive sensors later in this chapter. Both types of polar-orbiting
satellites with similar polar orbits can pass over the equator at a different longitude at the same
local sun time on each revolution, as illustrated in Figure 3.2. The satellite revisit time (or revisit
interval or revisit period) is the time elapsed between two successive observations of the same
point on Earth, and this time interval is called the repeat cycle of the satellite. Each repeat cycle
enables a polar-orbiting satellite to eventually see every part of Earth’s surface. A satellite with a
near-polar orbit that passes close to the poles can cover nearly the whole earth surface in a repeat
cycle depending on sensor and orbital characteristics. For most polar-orbiting or sun-synchronous
satellites, the repeat cycle ranges from twice a day to once every 16 days. Real-world examples
include Landsat and the well-known Earth Observing System (EOS) series satellites such as Terra
and Aqua. Such an attribute of global coverage is often required for holistic earth observation.
Data collected by most remote sensing satellites can be transmitted to ground receiving stations
immediately or can be temporarily stored on the satellite in a compressed form. This option depends
on whether the receiving station has a line of sight to the satellite when the satellite wishes to
transmit the data. If there are not enough designated receiving stations around the world to be in
line with the satellite, data can be temporarily stored onboard the satellite until acquiring direct
contact with the ground-level receiving station. Nowadays, there is a network of geosynchronous
(geostationary) communications satellites deployed to relay data from satellites to ground receiving
stations, and they are called the Tracking and Data Relay Satellite System (TDRSS). With the
availability of TDRSS, data may be relayed from the TDRSS toward the nearest receiving stations
without needing to be stored temporarily onboard the satellite.
3.2.2 Air-Borne Platforms
Air-borne platforms collect aerial images with cameras or sensors; airplanes are currently the most common air-borne platforms. When altitude and stability requirements are not limiting factors for a sensor, simple, low-cost aircraft such as Unmanned Aerial Vehicles (UAVs) can also serve as platforms. If instrument stability and/or higher altitude become essential, more sophisticated high-altitude aircraft that can fly at altitudes greater than 10,000 meters above sea level must be employed; these include fixed-wing, propeller-driven planes flown by pilots. Although they are limited to relatively small areas, air-borne platforms are well suited to acquiring high spatial resolution data. Remote sensing instruments may be mounted on the underside of the airplane or simply hung out the door using simple mounts. Mid-altitude aircraft, which operate below 10,000 meters above sea level, are used when stability is demanded and when imagery is required that cannot be acquired from low-altitude platforms such as helicopters or UAVs. A real-world example is the C-130 air-borne platform, owned by the National Center for Atmospheric Research (NCAR) and the National Science Foundation (NSF) in the United States.
FIGURE 3.3 The difference between (a) passive sensors and (b) active sensors. (National Aeronautics and Space Administration (NASA), 2012. [Link] funfacts/txt_passive_active.html, accessed May 2017.)
Passive sensors detect natural radiation that is emitted by, or reflected from, the observed scene across various frequency bands. Active sensors, on the other hand, send out their own energy for illumination; that is, the sensor emits radiation directed toward the target of interest, and the radiation reflected from that target is detected and measured by the sensor (Figure 3.3b).
For any radiation-detection imager, the resulting images, in which each pixel of each band stores a digital number (DN), are characterized by several kinds of resolution, as summarized below.
• Spatial resolution: Spatial resolution is usually expressed as pixel size, which depends on focal length, detector size, and sensor altitude. Spatial resolution is a key factor for the discrimination of essential features.
• Spectral resolution: Spectral resolution is the density of the spectral bands across the electromagnetic spectrum for multispectral or hyperspectral sensors; each band corresponds to an image.
• Radiometric resolution: Radiometric resolution, usually measured in binary digits (bits), is the range of available brightness values, corresponding to the maximum range of DNs in the image; it specifies the ability of a sensor to distinguish differences in brightness (or grey-scale values) while acquiring an image. For example, an image with 8-bit resolution has 256 levels of brightness (Richards and Jia, 2006); see the brief sketch after this list.
• Temporal resolution: Temporal resolution is the time required for revisiting the same area
of Earth (NASA, 2013).
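As a quick illustration of radiometric resolution, the following minimal MATLAB sketch relates bit depth to the number of distinguishable brightness levels and rescales DNs between bit depths; the synthetic image and variable names are illustrative assumptions, not tied to any particular sensor.

% An n-bit sensor can distinguish 2^n brightness levels.
bits = 8;                                  % radiometric resolution in bits
levels = 2^bits;                           % 256 levels for an 8-bit image
dn8 = uint8(randi([0 255], 100, 100));     % synthetic 8-bit DN image
% Stretch the 8-bit DNs onto a 16-bit range for comparison or display:
dn16 = uint16(double(dn8) * (65535 / 255));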
Most of the current satellite platforms for remote sensing follow polar or near-polar (i.e., sun-synchronous) orbits. These satellites travel toward the North Pole on one side of Earth (ascending passes) and then toward the South Pole on the second half of their orbital paths (descending passes). If the orbit is sun-synchronous rather than purely polar, the descending pass is normally on the sunlit side of Earth, while the ascending pass is on the shadowed side. Passive optical sensors onboard these satellites record reflected radiation from the surface on descending passes, when solar illumination is available. Passive sensors that record emitted radiation (e.g., thermal radiation) do not require solar illumination and can therefore also image the surface of Earth on ascending passes, as can active sensors, which rely on their own illumination. Active and passive sensors are further classified in detail below.
FIGURE 3.4 Typical spectral reflectance curves of soil, vegetation, and water, plotted as reflectance versus wavelength (0.5–2.5 µm).
In addition, these spectrometers may include, but are not limited to:
The difference between multispectral and hyperspectral imaging is illustrated in Figure 3.5. Broadband sensors produce panchromatic images with very wide bandwidths, typically 400–500 nanometers. For example, WorldView-1 produced panchromatic images with a high spatial resolution of 50 centimeters. Most multispectral imagers have four basic spectral bands: blue, green, red, and near-infrared. Some multispectral imaging satellites, such as Landsats 7 and 8, have additional spectral bands in the shortwave infrared (SWIR) region of the spectrum. Hyperspectral imaging systems are designed to obtain imagery over hundreds of narrow, contiguous spectral bands with typical bandwidths of 10 nanometers or less. For example, the NASA JPL AVIRIS air-borne hyperspectral imaging sensor obtains spectral data over 224 contiguous channels, each with a bandwidth of 10 nm, over a spectral range from 400 to 2,500 nanometers. Ultraspectral sensors represent the future design of hyperspectral imaging technology.
FIGURE 3.5 The comparison among broadband, multispectral, hyperspectral, and ultraspectral remote sensing.
3.3.2 Active Sensors
An active sensor in a remote sensing system is a radar, laser, or light detection and ranging (LiDAR) instrument used for detecting, measuring, and analyzing signals transmitted by the sensor,
which are reflected, refracted, or scattered back by the surface of Earth and/or its atmosphere. The majority of active sensors operate in the microwave portion of the electromagnetic spectrum, and the frequency allocations of active sensors from Ka band to L band (Table 2.2) are common to other radar systems. Some active sensors are specifically designed to detect precipitation, aerosol, and clouds based on their radar echoes. Active sensors have a variety of applications in hydrology, meteorology, ecology, environmental science, and atmospheric science. For example, precipitation radars, whether ground-based (Zrnic and Ryzhkov, 1999; Wood et al., 2001), air-borne (Atlas and Matejka, 1985), or space-borne (Anagnostou and Kummerow, 1997), measure the radar echo intensity (reflectivity, expressed in dBZ) returned by rainfall droplets in order to estimate rainfall rates over the surface of Earth. These radar, LiDAR, and laser sensors may include, but are not limited to:
• LiDAR: A Light Detection And Ranging (LiDAR) active sensor measures the distance to a target by illuminating that target with pulsed laser (light amplification by stimulated emission of radiation) light and measuring the backscattered or reflected pulses with a receiver equipped with sensitive detectors. Because distance equals velocity multiplied by time, the distance to the target is calculated as half the product of the speed of light and the time elapsed between the transmitted and backscattered pulses; the factor of one-half accounts for the round trip (see the sketch after this list).
• Laser altimeter: Mounted on a spacecraft or aircraft, a laser altimeter is a remote sensing instrument that uses a LiDAR to measure the height of Earth's surface (either sea level or ground level). It works by emitting short flashes of laser light toward the surface of Earth. The height of the sea level or ground level relative to the mean surface of Earth is then derived from the two-way travel time between emitted and reflected pulses, again multiplied by the speed of light and halved, to produce the topography of the underlying surface.
• Radar: An active radar sensor, whether air-borne or space-borne, emits microwave radiation in a series of pulses from an antenna using its own source of electromagnetic energy. When the energy hits a target in the air or at the ground/sea level, some of the energy is reflected back toward the sensor. This backscattered or reflected microwave radiation is detected, measured, and analyzed. Half the product of the speed of light and the time required for the energy to travel to the target and back determines the distance, or range, to the target. A two-dimensional image of the surface can therefore be produced by calculating the range to all targets as the remote sensing system passes overhead.
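All three instruments above share the same time-of-flight ranging principle, sketched below in minimal MATLAB; the pulse travel time is a made-up illustrative value.

c = 299792458;            % speed of light (m/s)
t = 6.7e-6;               % hypothetical round-trip pulse travel time (s)
range = c * t / 2;        % halve the round-trip path to get the one-way range
fprintf('Target range: %.1f m\n', range);   % about 1004 m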
Microwave instruments that are designed for finer environmental monitoring and earth
observation may include, but are not limited to:
The Television Infrared Observation Satellite (TIROS-1), launched in 1960, carried special television cameras used to observe Earth's cloud cover from a 720 km (450 mile) orbit, and was the first experimental attempt of the National Aeronautics and Space Administration (NASA) to study Earth with satellite instruments. Since then, satellites have become the primary platforms carrying remote sensing instruments for earth observation and environmental monitoring. These space-borne instruments take advantage of large spatial coverage and regular revisit periods. From 1964 to 1970, a series of four meteorological research satellites, named Nimbus, were launched into space and had profound impacts due to their synoptic views of Earth; they provided information on issues such as weather dynamics and vegetation patterns.
With the rapid progress of remote sensing technologies in each successive generation of new satellites, remote sensing instruments and platforms have become increasingly sophisticated, generating imagery with finer temporal, spectral, and spatial resolution on a routine basis. Abundant remote sensing instruments have flown onboard different platforms since passive remote sensing techniques were first applied to satellite Earth observation in the 1970s. The chronological history of a set of well-known satellite remote sensing platforms is illustrated in Figure 3.6.
One of the world's best-known families of remote sensing satellites is Landsat, which is operated by the USA and has evolved over the past 40 years. Landsat 1, called the "Earth Resources Technology Satellite" until 1975, was the first satellite in the Landsat family; launched in 1972, it was dedicated to periodic environmental monitoring. Landsat 1 carried two sensors, the Return Beam Vidicon (RBV) and the Multispectral Scanner (MSS). The MSS sensor captured images in the green, red, and near-infrared spectra (Table 3.2) at 60-m resampled resolution over four separate spectral bands between 500 and 1,100 nm. The two successive satellites, Landsats 2 and 3, were launched in 1975 and 1978, respectively. The same sensors were deployed onboard Landsat 2, while the spectral capability of the MSS sensor on Landsat 3 was extended to measure radiation between 1,050 and 1,240 nm. Following the success of Landsats 1–3, Landsat 4 was launched in 1982 with improved spectral and spatial resolution. The RBV was replaced by the Thematic Mapper (TM) sensor, which provided seven spectral bands: six reflective bands from 450 to 2,350 nm with 30-m pixels plus one thermal band (Table 3.3). In addition, the revisit time of the satellite was improved from 18 days to 16 days.
FIGURE 3.6 The timeline of some well-known Earth science satellite systems since the 1950s; satellites shown include Nimbus-1, Landsat-1, SPOT-1, RADARSAT-1, Terra, Aqua, Aura, GOSAT, Envisat, Suomi-NPP, GPM, OCO-2, Sentinel-2A, and Fengyun-4A.
Launched in 1984, Landsat 5 is a duplicate of Landsat 4, and its TM sensor remained active some 25 years beyond its designed lifetime.
The next two satellites, Landsats 6 and 7, were launched in 1993 and 1999, respectively; however, Landsat 6 did not reach its orbit due to a launch failure. These satellites carried the Enhanced Thematic Mapper (ETM, on Landsat 6) and the Enhanced Thematic Mapper Plus (ETM+, on Landsat 7), providing 15-m panchromatic and 30-m multispectral images. In addition, the latest generation Landsat satellite, Landsat 8, was launched in 2013 with a two-sensor payload comprising the Operational Land Imager (OLI) and the Thermal InfraRed Sensor (TIRS). Landsat 8 OLI images comprise nine spectral bands with a spatial resolution of 30 m for bands 1 to 7 and 9 (Table 3.1) (Barsi et al., 2014). The ultra-blue band 1 is useful for coastal and aerosol studies, and band 9 is useful for cirrus cloud detection (Barsi et al., 2014). Thermal bands 10 and 11, collected by TIRS at 100-m resolution, provide more accurate surface temperatures (Barsi et al., 2014).
Developed by NASA in the USA since the early 1970s, the Landsat program is the longest-running remote sensing program in the world, providing over 40 years of calibrated, moderate-resolution data about Earth's surface to a broad user community. In summary, Landsat 1–3 images comprise four spectral bands with 60-m spatial resolution, and the approximate scene size is 170 km north-south by 185 km east-west (USGS, 2017). Specific band designations differ between Landsats 1–3 and Landsats 4–5 (Tables 3.2 and 3.3). Landsat 4–5 images comprise seven spectral bands with a spatial resolution of 30 m for bands 1 to 5 and 7 (Table 3.3); the approximate scene size is 170 km north-south by 183 km east-west (USGS, 2017). ETM+ images comprise eight spectral bands with a spatial resolution of 30 m for bands 1 to 7 (Table 3.4), while the 15-m resolution of band 8 (panchromatic) provides a niche for data fusion; the approximate scene size is again 170 km north-south by 183 km east-west (USGS, 2017). Overlapping bands provide a critical basis for information consistency, which is essential for cross-checking the continuity of the multispectral data coverage provided by Landsat missions (Figure 3.7).
Besides Landsat satellites, the second important remote sensing satellite family, SPOT (Satellite
Pour l’Observation de la Terre), was designed and subsequently launched by a French–Belgian–Swedish
TABLE 3.1
Comparison of Corresponding Basic Properties of Landsat 8 OLI
and TIRS Images
Landsat 8 Bands Wavelength (µm) Resolution (m)
Band 1—Ultra Blue (coastal/aerosol) 0.435–0.451 30
Band 2—Blue 0.452–0.512 30
Band 3—Green 0.533–0.590 30
Band 4—Red 0.636–0.673 30
Band 5—Near Infrared (NIR) 0.851–0.879 30
Band 6—Shortwave Infrared (SWIR) 1 1.566–1.651 30
Band 7—Shortwave Infrared (SWIR) 2 2.107–2.294 30
Band 8—Panchromatic 0.503–0.676 15
Band 9—Cirrus 1.363–1.384 30
Band 10—Thermal Infrared (TIRS) 1 10.60–11.19 100 (resampled to 30)
Band 11—Thermal Infrared (TIRS) 2 11.50–12.51 100 (resampled to 30)
TABLE 3.2
Comparison of Corresponding Basic Properties of Landsats
1–3 Multispectral Scanner (MSS) Images
Landsat 1–3 MSS Bands Wavelength (µm) Resolution (m)
Band 4—Green 0.5–0.6 60a
Band 5—Red 0.6–0.7 60a
Band 6—Near Infrared (NIR) 0.7–0.8 60a
Band 7—Near Infrared (NIR) 0.8–1.1 60a
a Original MSS pixel size was 79 × 57 meters; data products are resampled to 60 meters.
TABLE 3.3
Comparison of Corresponding Basic Properties of Landsats 4–5 Thematic
Mapper (TM) Images
Landsat 4–5 TM Bands Wavelength (µm) Resolution (m)
Band 1—Blue 0.45–0.52 30
Band 2—Green 0.52–0.60 30
Band 3—Red 0.63–0.69 30
Band 4—Near Infrared (NIR) 0.76–0.90 30
Band 5—Shortwave Infrared (SWIR) 1 1.55–1.75 30
Band 6—Thermal 10.40–12.50 120 (resampled to 30)
Band 7—Shortwave Infrared (SWIR) 2 2.08–2.35 30
TABLE 3.4
Comparison of Corresponding Basic Properties of Landsat 7 Enhanced
Thematic Mapper Plus (ETM+) Images
Landsat 7 ETM+ Bands Wavelength (µm) Resolution (m)
Band 1—Blue 0.45–0.52 30
Band 2—Green 0.52–0.60 30
Band 3—Red 0.63–0.69 30
Band 4—Near Infrared (NIR) 0.77–0.90 30
Band 5—Shortwave Infrared (SWIR) 1 1.57–1.75 30
Band 6—Thermal 10.40–12.50 60 (resampled to 30)
Band 7—Shortwave Infrared (SWIR) 2 2.09–2.35 30
Band 8—Panchromatic 0.52–0.90 15
FIGURE 3.7 Continuity of multispectral data coverage provided by Landsat missions, comparing the band layouts and scene widths of Landsat MSS (82-m bands, 185-km swath), Landsat 4–5 TM (30-m bands plus a 120-m thermal band, 185-km swath), and Landsat 7 ETM+ (30-m bands, a 60-m thermal band, and a 15-m panchromatic band, 183-km swath). (United States Geological Survey (USGS), 2014. Landsat 8 (L8) Data Users Handbook. [Link] data-users-handbook-section-1, accessed May 2017.)
joint program beginning in 1986. SPOT 1, launched that year, was equipped with a high-resolution visible (HRV) sensor that offered 10-m panchromatic and 20-m multispectral images with a 26-day revisit interval (i.e., repeat cycle). The improved spatial resolution of SPOT 1 provided more accurate data for understanding and monitoring the surface of Earth. SPOT 2, SPOT 3, and SPOT 4 were launched with the same instruments in 1990, 1993, and 1998, respectively. SPOT 5 was launched in 2002 with 2.5- or 5-m panchromatic and 10-m multispectral image resolution, with the same 26-day revisit interval.
IKONOS, the first commercial remote sensing satellite launched by a private entity, was launched in 1999 and was capable of providing imagery with high spatial resolution (1 m) and high temporal resolution (1.5- to 3-day revisit). During the same year, Terra, the flagship satellite of the Earth Observing System (EOS), was launched with five instruments aboard: the Clouds and the Earth's Radiant Energy System (CERES), the Multi-angle Imaging SpectroRadiometer (MISR), the Moderate-Resolution Imaging Spectroradiometer (MODIS), the Measurements of Pollution in the Troposphere (MOPITT), and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). These sensors were designed to monitor the state of Earth's environment and ongoing changes in its climate system around the globe, with spatial resolutions ranging from 15 m (ASTER) to 250–1,000 m (MODIS). Aqua is another EOS satellite, similar to Terra but with a different equator-crossing time: Aqua crosses the equator daily at the local time of 1:30 p.m. as it heads north (ascending mode), whereas Terra crosses the equator daily at the local time of 10:30 a.m. (descending mode). The MODIS sensors onboard both Aqua and Terra provide the ability for near real-time environmental monitoring on a daily basis.
In addition to instruments such as MOPITT, other dedicated instruments provide information on atmospheric composition and air quality. The Total Ozone Mapping Spectrometer (TOMS), first deployed onboard NASA's Nimbus-7 satellite in 1978, was the first instrument designed to monitor total column ozone at the global scale in order to track ozone depletion from space. Successor instruments, such as the Ozone Monitoring Instrument (OMI) onboard Aura, launched in 2004, the Total Ozone Unit (TOU) onboard the Chinese FY-3 series satellites since 2008, and the Ozone Mapping and Profiler Suite (OMPS) onboard the Suomi-NPP satellite, launched in 2011, are all dedicated to monitoring ozone variability at the global scale. Similarly, the Greenhouse Gases Observing SATellite (GOSAT), launched in
2009 by Japan, was designed to monitor global carbon dioxide (CO2) and methane (CH4) variability. The Orbiting Carbon Observatory-2, NASA's first satellite dedicated to studying atmospheric carbon dioxide from space, was launched in 2014. With the aid of these satellite platforms and instruments, more detailed information about Earth is available to help us better understand our changing world.
The third family of remote sensing satellites is the National Oceanic and Atmospheric Administration (NOAA) family of polar-orbiting platforms (POES). The Coastal Zone Color Scanner (CZCS) was launched in 1978 to measure ocean color from space. Following the success of the CZCS, similar ocean color sensors were launched, including the Modular Optoelectronic Scanner (MOS), the Ocean Color and Temperature Scanner (OCTS), the Polarization and Directionality of the Earth's Reflectances (POLDER) instrument, and the Sea-Viewing Wide Field-of-View Sensor (SeaWiFS). The most widely used space-borne sensor of this series, the Advanced Very High Resolution Radiometer (AVHRR), was carried on these satellites to remotely determine cloud cover and surface temperature. With the technological advances of the late 20th century, satellite sensors with improved spatiotemporal and spectral resolutions were designed and utilized for different Earth observation purposes.
In addition, the Medium Resolution Imaging Spectrometer (MERIS), one of the main payloads onboard Europe's Environmental Satellite (ENVISAT-1), provided hyperspectral rather than multispectral remote sensing images with a relatively high spatial resolution (300 m). Although contact with ENVISAT-1 was lost in 2012, its onboard instruments, such as MERIS, had provided an ample record of Earth's environment. Meanwhile, more and more commercial satellites managed by the private sector in the USA, such as QuickBird-2, WorldView-1, and WorldView-2, provided remotely sensed optical imagery with enhanced spatial and spectral detail.
In the 1990s, microwave sensors such as synthetic aperture radar (SAR) were deployed onboard a series of satellites forming the fourth family of remote sensing satellites; they were designed to provide high-resolution microwave sensing capability in all weather conditions. They include JERS-1 (Japan), ERS-1 and ERS-2 (Europe), and RADARSAT-1 (Canada). With increased spatial, spectral, and temporal resolutions, more detailed information on Earth's changing environment and climate can be provided. As the importance of microwave remote sensing became generally recognized, new generations of satellites of this kind continued the advancement of SAR remote sensing with multiple polarization (HH, HV, VH, and VV) modes, including ALOS (Japan), RADARSAT-2 (Canada), and TerraSAR-X (Germany).
The Canadian Space Agency (CSA) helped fund the construction and launch of the RADARSAT satellites and recovers this investment through the supply of RADARSAT-1 and RADARSAT-2 data to the Government of Canada and various user communities during the lifetime of the missions. The Japan Aerospace Exploration Agency (JAXA) also performs various space activities related to Earth observation; JAXA operates the Advanced Land Observing Satellite, known as ALOS.
The following summary tables (Tables 3.5 through 3.8) present most of the current satellites, as of July 2017, relevant to environmental monitoring and Earth observation. These remote sensing systems are operated mainly by space agencies in many countries, such as NASA (USA), ESA (European Union), DLR (Germany), CNES (France), JAXA (Japan), and CSA (Canada). In addition, IKONOS and GeoEye/RapidEye are commercial optical-NIR (near-infrared) systems providing high-resolution satellite imagery.
TABLE 3.5
Current Important Missions Using Passive Spectrometers for Environmental Applications
Platform | Sensor | Type | Feature
Aircraft | Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) | Imaging Spectrometer (Passive Sensor) | AVIRIS has 224 contiguous channels; measurements are used for studying water vapor, ocean color, vegetation classification, mineral mapping, and snow and ice cover.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Cross-Track Infrared Sounder (CrIS) | Spectrometer (Passive Sensor) | CrIS produces high-resolution, three-dimensional temperature, pressure, and moisture profiles.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Ozone Mapping and Profiler Suite (OMPS) | Spectrometer (Passive Sensor) | OMPS is an advanced suite of two hyperspectral instruments; it extends the 25+ year total-ozone and ozone-profile records.
Terra | Multi-angle Imaging SpectroRadiometer (MISR) | Imaging Spectrometer (Passive Sensor) | MISR obtains images in four spectral bands at nine different angles; it provides aerosol, cloud, and land surface data.
Sentinel-2 | MultiSpectral Imager (MSI) | Imaging Spectrometer (Passive Sensor) | Sentinel-2A and 2B provide satellite image data to support generic land cover, land use and change detection, leaf area index, leaf chlorophyll content, and leaf water content.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017.
TABLE 3.6
Current Important Missions Using Passive Multispectral Radiometers for Environmental
Applications
Platform | Sensor | Type | Feature
Aqua | Advanced Microwave Scanning Radiometer (AMSR-E) | Multichannel Microwave Radiometer (Passive Sensor) | AMSR-E measures precipitation, oceanic water vapor, cloud water, near-surface wind speed, sea and land surface temperature, soil moisture, snow cover, and sea ice.
Aqua | Moderate-Resolution Imaging Spectroradiometer (MODIS) | Imaging Spectroradiometer (Passive Sensor) | MODIS measures ocean and land surface properties, surface reflectance and emissivity, and air properties.
Landsat 7 | Enhanced Thematic Mapper Plus (ETM+) | Scanning Radiometer (Passive Sensor) | The ETM+ instrument provides high-resolution imaging information of Earth's surface.
Landsat 8 | Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) | Radiometer (Passive Sensor) | OLI and TIRS are designed similarly to the Landsat 7 sensors for the same applications.
Soil Moisture Active Passive (SMAP) | L-Band Radiometer (LBR) | Radiometer (Passive Sensor) | The SMAP-LBR advanced radiometer monitors water and energy fluxes and improves flood prediction and drought monitoring.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Visible Infrared Imaging Radiometer Suite (VIIRS) | Radiometer (Passive Sensor) | VIIRS collects water-leaving reflectance and land-reflective data.
Terra | Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) | Multispectral Radiometer (Passive Sensor) | ASTER measures surface radiance, reflectance, emissivity, and temperature; it provides spatial resolutions of 15 m, 30 m, and 90 m.
Terra | Clouds and the Earth's Radiant Energy System (CERES) | Broadband Scanning Radiometer (Passive Sensor) | CERES measures atmospheric and surface energy fluxes.
Terra | Moderate-Resolution Imaging Spectroradiometer (MODIS) | Imaging Spectroradiometer (Passive Sensor) | The same as MODIS on Aqua.
Aura | Ozone Monitoring Instrument (OMI) | Multispectral Radiometer (Passive Sensor) | OMI collects 740 wavelength bands in the visible and ultraviolet portions of the electromagnetic spectrum; it measures total ozone and profiles of ozone, N2O, SO2, and several other chemical species.
SPOT | High Resolution Visible (HRV) Imaging Spectroradiometer | Multispectral Radiometer (Passive Sensor) | SPOT provides high-resolution maps for change detection of Earth's surface.
IKONOS | High Resolution Visible (HRV) Imaging Spectroradiometer | Multispectral and Panchromatic Radiometer (Passive Sensor) | IKONOS provides high-resolution maps for change detection of Earth's surface.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017; CNES; and private-sector web sites.
Landsat 8 was launched in February 2013, while Landsat 7 remains in operation. This continuity extends the Landsat Earth observation mission in the optical and infrared remote sensing regime over the past four decades. GOES-R was deployed as GOES 1–12 retired from geostationary orbit. The mission of TOMS aboard Nimbus-7 for total ozone mapping was continued by OMI aboard Aura and later by OMPS aboard Suomi-NPP. The new Sentinel-1 satellite continues the mission of ERS-1 and ERS-2 for SAR imaging from space.
TABLE 3.7
Current Important Missions Using Radar, LiDAR, Gravimeter, and Laser for Environmental
Applications
Platform | Sensor | Type | Mission
Airborne Microwave Observatory of Subcanopy and Subsurface (AirMOSS) | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | The P-band SAR provides calibrated polarimetric measurements to retrieve root-zone soil moisture.
Ice, Cloud, and land Elevation Satellite (ICESat) | Geoscience Laser Altimeter System (GLAS) | Radar (Active Sensor) | ICESat measures ice sheet elevations and changes in elevation through time, in addition to measuring cloud and aerosol height profiles, land elevation and vegetation cover, and sea ice thickness.
Cloud-Aerosol LiDAR and Infrared Pathfinder Satellite Observations (CALIPSO) | Cloud-Aerosol LiDAR with Orthogonal Polarization (CALIOP) | Cloud and Aerosol LiDAR (Active Sensor) | CALIOP is a two-wavelength polarization-sensitive LiDAR that provides high-resolution vertical profiles of aerosols and clouds.
Cloud-Aerosol Transport System on the International Space Station (CATS) | Light Detection and Ranging (LiDAR) | LiDAR (Active Sensor) | The LiDAR provides range-resolved profile measurements of atmospheric aerosols and clouds.
Global Precipitation Measurement (GPM) | Dual-Frequency Precipitation Radar (DPR) | Radar (Active Sensor) | The DPR provides information regarding rain and snow worldwide.
Ocean Surface Topography Mission/Jason-2 (OSTM/Jason-2) | Poseidon-3 Altimeter (PA) | Altimeter (Active Sensor) | The PA provides sea surface heights for determining ocean circulation, climate change, and sea level rise.
Sentinel-1 | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | The Sentinel-1 SAR provides land and ocean monitoring regardless of the weather.
Sentinel-3 | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | Sentinel-3 supports marine observation and will study sea-surface topography, sea and land surface temperature, and ocean and land color.
Soil Moisture Active Passive (SMAP) | L-Band Radar (LBR) | Radar (Active Sensor) | The SMAP-LBR radar measures the amount of water in the top 5 cm of soil everywhere on Earth's surface.
Advanced Land Observing Satellite (ALOS) | L-band ALOS PALSAR | Phased Array L-band Synthetic Aperture Radar (Active Sensor) | ALOS expands SAR data utilization by enhancing its performance.
TerraSAR-X | X-band SAR sensor | Radar (Active Sensor) | TerraSAR-X provides SAR images with high resolution.
TanDEM-X | X-band SAR sensor | Radar (Active Sensor) | TanDEM-X provides land subsidence, digital elevation models, and other land cover conditions.
Gravity Recovery and Climate Experiment (GRACE) | Low-Earth-orbit satellite gravimetry | Passive Sensor | GRACE measures gravity changes to infer water storage at the surface of Earth.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017.
TABLE 3.8
Current Important Missions of Scatterometers and Sounding Instruments for
Environmental Applications
Platform | Sensor | Type | Feature
Cyclone Global Navigation Satellite System (CYGNSS) | Delay Doppler Mapping Instrument (DDMI) | Scatterometer (Active Sensor) | The DDMI measures ocean surface wind speed in all precipitating conditions.
Aqua | Atmospheric Infrared Sounder (AIRS) | Sounder (Passive Sensor) | AIRS measures air temperature, humidity, clouds, and surface temperature.
Aqua | Advanced Microwave Sounding Unit (AMSU) | Sounder (Passive Sensor) | AMSU measures temperature profiles in the upper atmosphere.
Aura | High-Resolution Dynamics Limb Sounder (HIRDLS) | Sounder (Passive Sensor) | HIRDLS measures profiles of temperature, ozone, CFCs, and various other gases affecting ozone chemistry.
Aura | Microwave Limb Sounder (MLS) | Sounder (Passive Sensor) | MLS derives profiles of ozone, SO2, N2O, OH, and other atmospheric gases, as well as temperature, pressure, and cloud ice.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Ozone Mapping and Profiler Suite (OMPS) | Sounder (Passive Sensor) | OMPS provides operational ozone measurements.
Terra | Measurements of Pollution in the Troposphere (MOPITT) | Sounder (Passive Sensor) | MOPITT measures carbon monoxide and methane in the troposphere.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017.
ICESat-2 continues the mission of the laser altimeter ICESat and is scheduled for launch in 2018. The ICESat-2 mission will provide multi-year elevation data needed to determine ice sheet mass balance as well as cloud property information, especially for stratospheric clouds common over polar areas. The Gravity Recovery and Climate Experiment Follow-On (GRACE-FO, a.k.a. GFO) mission, part of the US-German GRACE consortium (NASA/Jet Propulsion Laboratory, Center for Space Research/University of Texas, DLR, and GFZ Helmholtz Centre Potsdam), is heavily focused on maintaining data continuity from GRACE and minimizing any data gap after GRACE. In concert with the functionality of SWOT, this effort leads toward the final closure of the water balance in the hydrological system.
Specifically, the future German satellite mission EnMAP (Environmental Mapping and Analysis Program) addresses the growing need for hyperspectral remote sensing (Stuffler et al., 2007). It aims to measure, derive, and analyze diagnostic parameters for the vital processes on Earth's land and water surfaces. EnMAP hyperspectral products are images of 1,024 × 1,024 pixels (∼30 × 30 km2), generated by the processing system on demand and delivered to the user community.
TABLE 3.9
Historic Important Missions for Environmental Applications
Platform | Sensor | Type | Mission
Advanced Land Observing Satellite (ALOS) | Phased Array L-band Synthetic Aperture Radar (PALSAR) | Radar (Active Sensor) | PALSAR provided mapping of regional land coverage, disaster monitoring, and resource surveying.
Advanced Land Observing Satellite (ALOS) | Panchromatic Remote Sensing Instrument for Stereo Mapping (PRISM) | Spectrometer (Passive Sensor) | PRISM provided panchromatic images with 2.5-m spatial resolution for digital surface model (DSM) generation.
Radar Satellite (RADARSAT-1) | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | RADARSAT-1 collected data on resource management; ice, ocean, and environmental monitoring; and Arctic and off-shore surveillance.
Nimbus-7 | Coastal Zone Color Scanner (CZCS) | Radiometer (Passive Sensor) | CZCS attempted to discriminate between organic and inorganic materials in the water.
Nimbus-7 | Earth Radiation Budget Experiment (ERBE) | Radiometer (Passive Sensor) | ERBE attempted to test infrared limb scanning radiometry to sound the composition and structure of the middle atmosphere.
Nimbus-7 | Stratospheric Aerosol Measurement II (SAM II) | Photometer (Passive Sensor) | SAM II measured stratospheric aerosols and provided vertical profiles of aerosol extinction in both the Arctic and Antarctic polar regions.
Nimbus-7 | Solar Backscatter Ultraviolet (SBUV) and Total Ozone Mapping Spectrometer II (TOMS II) | Spectrometer (Passive Sensor) | The SBUV and TOMS sensors provided first-hand data on UV-B and total column ozone.
Nimbus-7 | Scanning Multichannel Microwave Radiometer (SMMR) | Multispectral Microwave Radiometer (Passive Sensor) | SMMR measured sea surface temperatures, ocean near-surface winds, water vapor and cloud liquid water content, sea ice extent, sea ice concentration, snow cover, snow moisture, rainfall rates, and differences in ice types.
Ice, Cloud, and land Elevation Satellite (ICESat) | Geoscience Laser Altimeter System (GLAS) | Laser Altimeter (Active Sensor) | GLAS measured ice sheet elevations and changes in elevation through time, as well as cloud and aerosol height profiles, land elevation and vegetation cover, and sea ice thickness.
European Remote Sensing Satellite (ERS-1, ERS-2) | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | The ERS SAR emitted radar pulses with spherical wavefronts that reflected from the surface.
Cosmo/SkyMed 1, 2, 3, 4 | Synthetic Aperture Radar (SAR) | SAR 2000 (Active Sensor) | The Cosmo/SkyMed SAR emitted radar pulses.
European Remote Sensing Satellite (ERS-1, ERS-2) | Active Microwave Instrument (AMI-WIND) | Microwave (Active Sensor) | The ERS AMI-WIND emitted radar pulses with spherical wavefronts that reflected from the surface.
European Remote Sensing Satellite (ERS-1, ERS-2) | Radar Altimetry (RA) | Radar (Active Sensor) | The ERS RA emitted radar pulses with spherical wavefronts that reflected from the surface.
Geostationary Operational Environmental Satellite (GOES 1–12) | Advanced Very High Resolution Radiometer (AVHRR) | Radiometer (Passive Sensor) | AVHRR can be used to remotely determine cloud cover and surface temperature.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017; CNES; DLR; European Space Agency (ESA), 2017. The Copernicus Programme, [Link] Observing_the_Earth/Copernicus/Overview3; CSA (2017).
TABLE 3.10
Future Important Missions for Environmental Applications
Platform | Sensor | Type | Mission
Surface Water Ocean Topography (SWOT) | Advanced Microwave Radiometer (AMR) | Radiometer (Passive Sensor) | SWOT will provide sea surface heights and terrestrial water heights over a 120-km-wide swath with a ±10-km gap at the nadir track.
Sentinel-5P | TROPOspheric Monitoring Instrument (TROPOMI) | Imaging Spectrometer (Passive Sensor) | Sentinel-5P will aim to fill the data gap and provide data continuity between the retirement of the ENVISAT satellite and NASA's Aura mission and the launch of Sentinel-5.
Biomass | P-band Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | Biomass will address the status and dynamics of tropical forests.
EnMAP | Imaging Spectroradiometer | Hyperspectral Radiometer (Passive Sensor) | EnMAP will address the dynamics of the land and water surfaces.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017; CNES; DLR; European Space Agency (ESA), 2017. The Copernicus Programme, [Link]; CSA (2017).
In essence, data fusion involves combining information to estimate or predict the state of some aspect of a system. This can be geared toward much better Earth observations and toward tackling a few challenging problems. For instance, in light of global climate change, data fusion missions can be organized groupwise from existing and future satellites, such as ESA Sentinels 1 and 2 and Landsat 8, the gravimetry missions, TerraSAR-X, SWOT, SMAP, GRACE-FO, GOES-R, and TanDEM-X, to produce data fusion products. Data fusion across the different remote sensing sensors above can be carried out to blend different modalities of satellite imagery into a single image for various Earth observation applications over temporal and spatial scales, leading to better environmental decision making. Moreover, a satellite constellation such as the A-Train, a joint program between NASA, CNES, and JAXA, may group several satellites by design, providing insightful and complementary support to this type of research. Note that the A-Train (from Afternoon Train) is a constellation of six Earth observation satellites of varied nationalities in sun-synchronous orbit at an altitude of 705 km above Earth (Figure 3.8); as of July 2014, they included OCO-2, GCOM-W1 (SHIZUKU), Aqua, CloudSat, CALIPSO, and Aura.
In addition, to fill various data gaps in space-borne remote sensing and to facilitate the system planning goals of providing low-cost, full-coverage images, the community has further adopted a standard dubbed CubeSat (Heidt et al., 2000). A CubeSat is a spacecraft sized in units, or Us, typically up to 12U (a unit is defined as a volume of about 10 cm × 10 cm × 10 cm), that is launched fully enclosed in a container, enabling ease of launch vehicle system integration and thus easing access to space (National Academies of Sciences, Engineering, and Medicine, 2016). The continuous creation of customized nano-satellites and cube-satellites in the optical, microwave, and radio frequency domains has become a major initiative and a giant niche for different types of environmental applications. This fast evolution in Earth observation may disrupt conventional ways of environmental monitoring.
Possible data merging and data fusion opportunities to tackle permafrost remote sensing
studies are shown in Figure 3.9. This system planning diagram exhibits the possible space-time
plot of selected near-term (2013–2020) satellite sensor observations with potential relevance for
permafrost. These parameters are linked to ALOS-2 (L-band SAR), Biomass Earth Explorer
mission, Landsat 8, RADARSAT (C-band radar data), IKONOS and GeoEye/RapidEye, SWOT,
FIGURE 3.8 The A-Train, which consisted of six satellites in the constellation as of 2014. (National Aeronautics and Space Administration (NASA), 2017. NASA A-Train portal, [Link], accessed May 2017.)
FIGURE 3.9 Possible data fusion opportunities to tackle permafrost remote sensing studies; selected sensors (ALOS, Biomass, Landsat (LDCM), SWOT, RADARSAT, MODIS, VIIRS, Sentinel-1, Sentinel-2, ICESat-2, TerraSAR-X/TanDEM-X, IKONOS/GeoEye, ASCAT, SMAP, and AMSR) are arranged by temporal fidelity, from revisit intervals of 30 days or more down to one day or less. (Adapted from The National Academies, 2014. Opportunities to Use Remote Sensing in Understanding Permafrost and Related Ecological Characteristics: Report of a Workshop. ISBN 978-0-309-30121-3.)
VIIRS, ICESat-2 (LiDAR), ASCAT (Advanced SCATterometer), SMAP (L-band SAR), AMSR
(Advanced Microwave Scanning Radiometer), ESA Sentinels 1 and 2, GRACE-FO, X-band SAR,
TerraSAR-X, and TanDEM-X.
3.7 SUMMARY
Many remote sensing studies have been carried out at national and international levels by government agencies, academia, research institutions, and industry to investigate new techniques for observing Earth from space or the sky. The close-knit relationships across many satellite missions have forged several families of satellites that can support data merging or create hidden niches for data fusion, albeit not necessarily through a satellite constellation. The connection between the EOS and SPOT families lives on as well, from past missions to current and future missions, such as the Copernicus program, spinning off even more potential for long-term data fusion and machine learning research. Future missions will strengthen cross-mission ties through which past, current, and future missions can be tailored cohesively for large-scale research dealing with specific earth system science problems. These kinds of linkages exemplify the principle of systems engineering that "the whole is greater than the sum of its parts." Various data merging, data fusion, and machine learning algorithms have played a key behind-the-scenes role in expanding that sum further; these algorithms are introduced in subsequent chapters.
REFERENCES
Agren, A., Jansson, M., Ivarsson, H., Bishop, K., and Seibert, J., 2008. Seasonal and runoff-related changes in total organic carbon concentrations in the River Ore, Northern Sweden. Aquatic Sciences, 70(1), 21–29.
Anagnostou, E. N. and C. Kummerow, 1997. Stratiform and convective classification of rainfall using SSM/I
85-GHz brightness temperature observations. Journal of Atmospheric and Oceanic Technology, 14, 570–575.
Arenz, R., Lewis, W., and Saunders III, J., 1995. Determination of chlorophyll and dissolved organic carbon from reflectance data for Colorado reservoirs. International Journal of Remote Sensing, 17(8), 1547–1566.
Atlas, D. and Matejka, T. J., 1985. Airborne Doppler radar velocity measurements of precipitation seen in
ocean surface reflection. Journal of Geophysical Research-Atmosphere, 90, 5820–5828.
Barsi, J. A., Lee, K., Kvaran, G., Markham, B. L., and Pedelty, J. A., 2014. The spectral response of the Landsat-8 Operational Land Imager. Remote Sensing, 6, 10232–10251.
Chang, N. B., Vannah, B., Yang, Y. J., and Elovitz, M., 2014. Integrated data fusion and mining techniques
for monitoring total organic carbon concentrations in a lake. International Journal of Remote Sensing,
35, 1064–1093.
Chang, N., Xuan, Z., and Yang, Y., 2013. Exploring spatiotemporal patterns of phosphorus concentrations in
a coastal bay with MODIS images and machine learning models. Remote Sensing of Environment, 134,
100–110.
European Space Agency (ESA), 2017. The Copernicus Programme, [Link]
Observing_the_Earth/Copernicus/Overview3
Heidt, H., Puig-Suari, J., Moore, A. S., Nakasuka, S., and Twiggs, R. J., 2000. CubeSat: A New Generation
of Picosatellite for Education and Industry Low-Cost Space Experimentation. In: Proceedings of the
14th Annual AIAA/USU Conference on Small Satellites, Lessons Learned-In Success and Failure,
SSC00-V-5. [Link]
King, M. D. and Byrne, D. M., 1976. A method for inferring total ozone content from the spectral variation of
total optical depth obtained with a solar radiometer. Journal of the Atmospheric Sciences, 33, 2242–2251.
King, M. D., Kaufman, Y. J., Tanré, D., and Nakajima, T., 1999. Remote sensing of tropospheric aerosols from
space: Past, present, and future. Bulletin of the American Meteorological Society, 80, 2229–2259.
Lee, H. J., Coull, B. A., Bell, M. L., and Koutrakis, P., 2012. Use of satellite-based aerosol optical depth and
spatial clustering to predict ambient PM2.5 concentrations. Environmental Research, 118, 8–15.
Li, J., Carlson, B. E., and Lacis, A. A., 2015. How well do satellite AOD observations represent the spatial
and temporal variability of PM2.5 concentration for the United States? Atmospheric Environment, 102,
260–273.
Li, Q., Li, C., and Mao, J., 2012. Evaluation of atmospheric aerosol optical depth products at ultraviolet bands
derived from MODIS products. Aerosol Science and Technology, 46, 1025–1034.
National Academies of Sciences, Engineering, and Medicine. 2016. Achieving Science With CubeSats:
Thinking Inside the Box. The National Academies Press, Washington, DC. doi:10.17226/23503
National Aeronautics and Space Administration (NASA), 2012. [Link]
communications/outreach/funfacts/txt_passive_active.html, accessed May 2017.
National Aeronautics and Space Administration (NASA), 2013. Landsat 7 Science Data User’s Handbook.
Available at [Link]
National Aeronautics and Space Administration (NASA), 2017. EOSDIS - Remote Sensors. [Link]
[Link]/user-resources/remote-sensors, accessed May 2017.
National Aeronautics and Space Administration (NASA), 2017. NASA A-Train portal. [Link]
gov/ accessed May 2017.
Natural Resources Canada, 2017. In the “Fundamentals of Remote Sensing” tutorial, by the Canada Centre for
Remote Sensing (CCRS), Natural Resources Canada. [Link]
satellite-imagery-air-photos/satellite-imagery-products/educational-resources/9283, accessed May 2017.
Richards, J. A. and Jia, X., 2006. Remote Sensing Digital Image Analysis: An Introduction. Springer, Berlin,
Germany.
Stuffler, T., Kaufmann, C., Hofer, S., Förster, K.-P., Schreier, G., Mueller, A., and Eckardt, A. et al., 2007. The EnMAP hyperspectral imager—An advanced optical payload for future applications in Earth observation programmes. Acta Astronautica, 61, 115–120.
The National Academies, 2014. Opportunities to Use Remote Sensing in Understanding Permafrost and
Related Ecological Characteristics: Report of a Workshop. ISBN 978-0-309-30121-3
United States Geological Survey (USGS), 2017. [Link]
satellites, accessed December 2017.
United States Geological Survey (USGS), 2014. Landsat 8 (L8) Data Users Handbook. [Link]
gov/landsat-8-l8-data-users-handbook-section-1, accessed May 2017.
Wood, V. T., Brown, R. A., and Sirmans, D., 2001. Technique for improving detection of WSR-88D mesocyclone
signatures by increasing angular sampling. Weather and Forecasting, 16, 177–184.
Zhang, Y., 2010. Ten years of remote sensing advancement & the research outcome of the CRC-AGIP Lab.
Geomatica, 64, 173–189.
Zrnic, D. S. and Ryzhkov, A. V., 1999. Polarimetry for weather surveillance radars. Bulletin of the American Meteorological Society, 80, 389–406.
4 Image Processing Techniques
in Remote Sensing
4.1 INTRODUCTION
Remote sensing involves the collection of data by sensors located far from the target (e.g., space-borne instruments onboard satellites); the information is collected without making physical contact with the object. Data collected by remote sensing instruments can be recorded in either analog (e.g., photographic film) or digital format. Compared to the digital format, the conventional analog format suffers from several drawbacks, such as limits on the amount of data that can be transmitted at any given time and inconvenience of manipulation. Therefore, the digital format is commonly used to archive remotely sensed data, especially in the form of images.
In remote sensing, data recorded by sensors are commonly archived in formats convenient for storage and transfer, such as the Hierarchical Data Format (HDF), the network Common Data Format (netCDF), and so forth. Data archived in such formats are hard to manipulate for visualization and interpretation without the aid of professional image processing software and tools. In addition, remotely sensed data may contain noise or other deficiencies arising from various sources, such as abnormal vibration of the observing system. Further processing procedures should therefore be conducted to deal with these flaws. Since remotely sensed data are usually archived as two-dimensional images, any further processing performed on the raw remotely sensed images can generally be interpreted as image processing.
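As a brief illustration of working with these archive formats, the following MATLAB sketch reads variables from a netCDF file and an HDF5 file; the file names, variable names, and dataset path are hypothetical placeholders, not actual product identifiers.

% Read named variables from a netCDF archive (hypothetical file/variables):
ncfile = 'modis_aod_sample.nc';
aod = ncread(ncfile, 'AOD_550nm');    % a 2-D geophysical variable
lat = ncread(ncfile, 'latitude');
lon = ncread(ncfile, 'longitude');
% HDF5 products are read similarly, using a dataset path:
sst = h5read('ocean_product_sample.h5', '/geophysical_data/sst');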
Toward establishing a definition of image processing, two classic definitions are provided below:
As suggested therein, the overarching goal of image processing is to produce a better image that aids visualization or information extraction by performing specific operations on raw images. In general, image processing is intended to make the raw image more interpretable for a better understanding of the object of interest.
With the development of remote sensing and computer science, various image processing techniques have been developed to aid interpretation and information extraction from remotely sensed images. Although the choice of image processing techniques depends on the goals of each individual application, some basic techniques are common to most remote sensing applications. In this chapter, a set of basic image processing techniques, as well as several software packages and programming languages commonly applied to image processing and analysis in remote sensing, will be introduced.
correction, enhancement, transformation, and classification. In this section, techniques associated with each type of application to process remotely sensed images will be introduced.
FIGURE 4.1 Comparison of a Landsat 8 Operational Land Imager (OLI) true color image (RGB composite of bands 4, 3, 2) on October 6, 2014, (a) before and (b) after atmospheric correction using Fast Line-of-sight Atmospheric Analysis of Hypercube (FLAASH).
Although atmospheric correction methods provide a way toward more accurate remotely sensed images, atmospheric correction should be conducted carefully because many factors must be estimated; if these estimates are not properly derived, the correction might introduce even larger biases than the atmospheric effects themselves. An illustrative example of atmospheric correction is given in Figure 4.1. As suggested therein, significant atmospheric effects due to smog and aerosols are observed in the Landsat 8 Operational Land Imager (OLI) scene of October 6, 2014, and these effects were largely removed after performing atmospheric correction with the Fast Line-of-sight Atmospheric Analysis of Hypercube (FLAASH) method.
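Model-based tools such as FLAASH are typically run from dedicated software, but the basic idea of removing an additive atmospheric offset can be sketched with the much simpler dark-object subtraction (DOS) technique, a different method shown here purely for illustration; the built-in MATLAB sample image stands in for a single spectral band.

% Dark-object subtraction: the darkest pixels are assumed to owe their
% brightness entirely to atmospheric path radiance (haze).
band = double(imread('cameraman.tif'));   % stand-in for one band of DNs
dark = min(band(:));                      % estimated additive haze offset
corrected = band - dark;                  % subtract the offset from every pixel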
FIGURE 4.2 Landsat OLI scenes (a) before and (b) after radiometric correction.
• Radiometric correction of noise due to sun angle and topography: In remote sensing, particularly over water surfaces, observed scenes may be contaminated by diffusion of sunlight, resulting in lighter areas in an image (i.e., sun glint). This effect can be corrected by estimating a shading curve, determined by Fourier analysis, to extract a low-frequency component (Kay et al., 2009). In addition, topographic effects, especially in mountainous areas, can produce another kind of radiometric distortion, making shaded areas darker than normal. To remove shading effects, corrections can be conducted using the angle between the solar radiation direction and the normal vector to the ground surface (Dozier and Frew, 1990; Essery and Marks, 2007) (Figure 4.2); a minimal sketch follows.
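One widely used angle-based variant is the cosine correction, sketched below in MATLAB under simplifying assumptions; the angles and reflectance value are made-up placeholders, and real applications would compute the local incidence angle from a digital elevation model.

% Cosine topographic correction: scale each pixel by the ratio of the
% cosine of the solar zenith angle to the cosine of the local incidence angle.
sz = deg2rad(35);                 % solar zenith angle (rad), placeholder value
inc = deg2rad(50);                % local solar incidence angle on the slope (rad)
L = 0.18;                         % observed reflectance of a shaded pixel
Lc = L * cos(sz) / cos(inc);      % brightens slopes tilted away from the Sun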
for the final knowledge gained through the integration of information from multiple data sources.
Thus, it is widely used in remote sensing, medical imaging, computer vision, and cartography.
The traditional approach to geometric correction depends mainly on the manual identification of many ground control points to align the raster data, which is labor-intensive and time-consuming (Goshtasby, 1987). In addition, the number of remotely sensed images has grown tremendously, reinforcing the need for highly efficient, automatic correction methods. With the development of computer science and remote sensing technologies, a variety of methods have been developed to advance automatic geometric correction, such as automated ground control point extraction (Gianinetto and Scaioni, 2008), the scale-invariant feature transform (Deng et al., 2013), and contour-based image matching (Eugenio et al., 2002).
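The control point workflow can be sketched in MATLAB as follows; the point coordinates are fabricated placeholders, and fitgeotrans (Image Processing Toolbox) estimates the transformation in a least-squares sense.

% Estimate a geometric correction from matched ground control points (GCPs).
moving = [10 20; 200 30; 50 180; 220 210];  % GCPs in the uncorrected image (pixels)
fixed  = [12 25; 205 28; 47 185; 223 206];  % the same GCPs in the reference image
tform = fitgeotrans(moving, fixed, 'affine');  % least-squares affine fit
% corrected = imwarp(raw, tform);             % then resample the raw image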
With advancements in remote sensing technologies, high-resolution satellite imagery (and aerial imagery as well) has become popular in real-world applications. Because each pixel covers so little ground, horizontal accuracy is of critical importance: a tiny geometric variation from either systematic sensor errors or terrain-related errors can produce significant distortions in the observed imagery. Orthorectification, the process of removing inaccuracies caused by the sensor, satellite/aircraft motion, and terrain-related geometric distortions from raw imagery to improve horizontal accuracy, is therefore also essential in geometric correction. Orthorectified imagery is required for most applications involving multiple image analyses, especially for tasks that overlay images on existing data sets and maps, such as data fusion, change detection, and map updating. Compared to the original imagery, orthorectified imagery is planimetric at every location, with a consistent scale across all parts of the image, so that features are represented in their "true" positions, allowing accurate direct measurement of distances, angles, and areas.
• Euclidean transformations: Euclidean transformations, the most commonly used transformations, can be a translation, a rotation, or a reflection. Euclidean transformations do not change length or angle measures, and they preserve the shape of a geometric object (e.g., lines transform to lines, and circles transform to circles). In other words, only the position and orientation of the object change.
• Affine transformations: Affine transformations are generalizations of Euclidean transformations. An affine transformation (or affinity) is any transformation that preserves collinearity (i.e., all points initially lying on a line still lie on a line after transformation) and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after transformation) (Weisstein, 2017). In general, an affine transformation is a type of linear mapping; operations such as scaling, resampling, shearing, and rotation are all affine transformations. Every Euclidean transformation is affine, but not every affine transformation is Euclidean (a minimal sketch follows this list).
• Projective transformations: Projective transformations are commonly applied to remotely sensed imagery to transform observed data from one coordinate system to another using given projection information. Certain properties remain invariant under projective transformations, including collinearity, concurrency, tangency, and incidence.
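A minimal MATLAB sketch of an affine transformation is shown below, assuming the Image Processing Toolbox and using a built-in sample image; the rotation angle and scale factor are arbitrary illustrative choices.

% Apply an affine transformation (rotation plus scaling) to an image.
I = imread('cameraman.tif');        % built-in sample image
theta = deg2rad(15); s = 1.5;       % rotate by 15 degrees, scale by 1.5
T = [ s*cos(theta)  s*sin(theta) 0
     -s*sin(theta)  s*cos(theta) 0
      0             0            1 ];  % 3-by-3 matrix, row-vector convention
tform = affine2d(T);                % collinearity and distance ratios preserved
J = imwarp(I, tform);               % resampling happens here as well
imshowpair(I, J, 'montage');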
Image resampling and resizing are techniques used to manipulate a digital image and transform it into a more consistent form by changing its spatial resolution or orientation for visualization or data analysis (Gurjar and Padmanabhan, 2005). Due to limitations imposed by imaging systems, remotely sensed images captured by different remote sensing instruments may have different spatial resolutions. In real-world applications, the initial spatial resolution of a remotely sensed image may not be sufficient, or may need to be made consistent with other images. In such cases, resampling should be applied to transform the original image into another form that satisfies the application's needs.
Resampling
Mathematically, resampling involves interpolation and sampling to produce new estimates for
pixels at different grids (Parker et al., 1983; Baboo and Devi, 2010). To date, a variety of methods
have been developed for resampling, and the choice of resampling kernel is highly application-dependent. The three most common resampling kernels, compared in the sketch following this list, are nearest neighbor, bilinear interpolation, and cubic convolution.
• Nearest neighbor: Nearest neighbor is a method frequently used for resampling in remote
sensing, which estimates a new value for each “corrected” pixel (i.e., new grid) using data
values from the nearest “uncorrected” pixels (i.e., original grids). The advantages of nearest
neighbor are its simplicity and capability to preserve original values in the unaltered scene.
Nevertheless, the disadvantages of nearest neighbor are also significant, in particular its blocky effects (Baboo and Devi, 2010). An example of image resampling with nearest
neighbor is shown in Figure 4.3.
• Bilinear interpolation: Bilinear interpolation is an image smoothing method that uses the values of the four nearest pixels, located in diagonal directions from a given output location, to estimate the value at that location (Parker et al., 1983; Baboo and Devi,
2010). In general, bilinear interpolation takes a weighted average of the closest 2 × 2
neighborhood of known pixel values surrounding the corresponding pixel to produce an
interpolated value. Weights assigned to the four pixel values are normally based on the
computed pixel’s distance (in 2D space) from each of the known points.
• Cubic convolution: Cubic convolution computes a weighted average of the 16 pixels nearest the corresponding input location using a cubic function. Compared to bilinear interpolation, cubic convolution performs better, and the result does not have the blocky appearance produced by nearest neighbor (Keys, 1981; Reichenbach and Geng, 2003). However, the computational time required by cubic convolution is about 10 times that of the nearest neighbor method (Baboo and Devi, 2010).
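The trade-offs among the three kernels can be seen directly with MATLAB's imresize; the minimal sketch below (assuming the Image Processing Toolbox and a hypothetical input file scene.tif) downsamples the same image with each kernel for side-by-side comparison.

% Resample the same image with the three common kernels.
img = imread('scene.tif');                 % hypothetical input image
nn = imresize(img, 0.15, 'nearest');       % value-preserving but blocky
bl = imresize(img, 0.15, 'bilinear');      % weighted 2 x 2 average
cc = imresize(img, 0.15, 'bicubic');       % 4 x 4 cubic kernel, analogous to cubic convolution
montage({nn, bl, cc});                     % compare the three results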
FIGURE 4.3 Comparison of Landsat Thematic Mapper (TM) image (RGB composite of bands 4, 3, 2) on
October 17, 2009, at (a) 30-meter and (b) 200-meter (resampled) spatial resolution.
In addition to the three aforementioned commonly used resampling kernels, there are some other
methods for resampling, such as the fast Fourier transformation resampling (Li, 2014) and quadratic
interpolation (Dodgson, 1997).
Mosaicking
Due to constraints of imaging systems, observations within a single scene may be incapable of providing full coverage of the targets of interest. Therefore, assembling different images to form one image with larger spatial coverage is desirable. In image processing, such a blending process is referred to as mosaicking (Inampudi, 1998; Abraham and Simon, 2013).
Generally, mosaicking relies on the identification of control points or features in different images, and then blends these images based on the overlap of the extracted common control points or features (Inampudi, 1998). The most straightforward mosaicking is to blend images collected from the same or adjacent satellite paths, because radiometric differences between these images are minimal (e.g., Figure 4.2). However, when images are collected from different paths at significantly different times, radiometric corrections should be conducted prior to mosaicking; otherwise, new radiometric distortions might be introduced into the blended images. A naive mosaicking sketch is given below.
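As a minimal sketch (assuming two hypothetical single-band tiles, tile_a.tif and tile_b.tif, already co-registered with a known column offset), two tiles can simply be placed onto a common canvas; a production mosaic would additionally feather or blend the overlap region.

% Naive mosaic of two single-band tiles with a known column offset.
A = imread('tile_a.tif');                       % hypothetical left tile
B = imread('tile_b.tif');                       % hypothetical right tile
colOffset = 900;                                % B starts 900 columns right of A
rows = max(size(A,1), size(B,1));
cols = colOffset + size(B,2);
mosaic = zeros(rows, cols, 'like', A);          % empty canvas of matching class
mosaic(1:size(A,1), 1:size(A,2)) = A;
mosaic(1:size(B,1), colOffset+1:cols) = B;      % B overwrites A in the overlap
imshow(mosaic, []);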
FIGURE 4.4 An example of Landsat 7 ETM+ SLC-off image (RGB composite of bands 4, 3, 2) on February
12, 2015 (a) before and (b) after gap filling.
Image enhancement techniques are commonly grouped into two broad categories, as illustrated in the sketch following this list:
• Spatial domain methods: Spatial domain methods operate directly on image pixels through different operations, such as histogram equalization (Hummel, 1977) and contrast stretching (Yang, 2006). An overview of spatial domain methods can be found in the literature (Maini and Aggarwal, 2010; Bedi and Khandelwal, 2013). An example of image enhancement through histogram equalization is shown in Figure 4.5. Contrast is the difference in visual properties that makes an object distinguishable from other objects and the background; in visual perception, it is determined by the difference in color and brightness between the object and its surroundings. Methods such as contrast stretching are often used to increase the contrast between different objects in order to make the objects of interest distinguishable (Starck et al., 2003; Yang, 2006).
• Frequency domain methods: Frequency domain methods operate on the Fourier transform of an image: enhancement operations are performed on the transform, and the final output image is obtained by applying the inverse Fourier transform. Filtering is the most commonly applied enhancement method in this domain; filtering out unnecessary information (or noise) highlights certain frequency components (Chen et al., 1994; Silva Centeno and Haertel, 1997).
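Both categories are illustrated in the following MATLAB sketch (assuming the Image Processing Toolbox and a hypothetical single-band file band3.tif): the first part equalizes and stretches the histogram in the spatial domain, while the second applies a simple ideal low-pass filter in the Fourier domain.

% Spatial-domain enhancement on a single-band image.
gray = imread('band3.tif');                          % hypothetical grayscale band
eq = histeq(gray);                                   % histogram equalization
st = imadjust(gray, stretchlim(gray, [0.02 0.98]));  % 2%-98% contrast stretch

% Frequency-domain enhancement: ideal low-pass filtering via the FFT.
F = fftshift(fft2(double(gray)));                    % centered Fourier transform
[r, c] = size(F);
[X, Y] = meshgrid(1:c, 1:r);
mask = hypot(X - c/2, Y - r/2) < 40;                 % keep only low frequencies
smoothed = real(ifft2(ifftshift(F .* mask)));        % inverse transform of the filtered image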
FIGURE 4.5 Landsat TM image (RGB composite of bands 3, 2, 1) on October 17, 2009 (a) before and (b)
after enhancement by performing histogram equalization.
Nevertheless, there is no general theory for determining the quality of image enhancement,
which means that most enhancements are empirical and require interactive procedures to obtain
satisfactory results.
A typical example of image transformation through simple band arithmetic is the normalized difference vegetation index (NDVI):

NDVI = (NIR − Red) / (NIR + Red)    (4.1)

where NIR and Red denote reflectance collected from a near-infrared band and a red band, respectively. In real-world applications, various vegetation indices have been developed to aid in the monitoring of vegetation; most of them rely on the absorption differences of vegetation between the red and near-infrared wavelengths, such as the soil-adjusted vegetation index (Huete, 1988) and the enhanced vegetation index (Huete et al., 2002).

FIGURE 4.6 Normalized difference vegetation index generated from Landsat TM image on October 17, 2009. (a) Landsat TM image (RGB composite of bands 4, 3, 2) and (b) NDVI.
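A minimal MATLAB sketch of Equation 4.1 (assuming hypothetical reflectance rasters tm_band3.tif and tm_band4.tif for the red and near-infrared bands):

% Compute NDVI from red (TM band 3) and near-infrared (TM band 4) reflectance.
red = double(imread('tm_band3.tif'));     % hypothetical reflectance raster
nir = double(imread('tm_band4.tif'));     % hypothetical reflectance raster
ndvi = (nir - red) ./ (nir + red + eps);  % eps guards against division by zero
imagesc(ndvi, [-1 1]); colorbar;          % display within the valid NDVI range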
In addition to simple arithmetic operations, principal component analysis (PCA) is another procedure frequently applied for image transformation, especially for information reduction in multispectral and, in particular, hyperspectral images, as multispectral imagery tends to be correlated from one band to another (Cheng and Hsia, 2003; Pandey et al., 2011). In image processing, the essence of PCA is to apply a linear transformation to the multispectral band data that rotates and translates the original coordinate system (Batchelor, 1978). Normally, PCA is performed on all bands of a multispectral image without a priori information about the image's spectral characteristics. The derived principal components represent the spectral information more efficiently than the original bands. The first principal component always accounts for the largest portion of variance, while other
principal components subsequently account for the remaining variance. Due to its efficiency and
information reduction characteristics, PCA has been frequently used for spectral pattern recognition
and image enhancement (Cheng and Hsia, 2003; KwangIn et al., 2005; Pandey et al., 2011).
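A compact PCA sketch in MATLAB (assuming the Statistics and Machine Learning Toolbox and a hypothetical multiband file tm_stack.tif) treats every pixel as a B-dimensional spectral sample:

% PCA on a multispectral stack: rows = pixels, columns = bands.
cube = double(imread('tm_stack.tif'));        % hypothetical H x W x B stack
[h, w, b] = size(cube);
X = reshape(cube, h*w, b);                    % one row per pixel
[coeff, score, ~, ~, explained] = pca(X);     % linear band-space rotation
pc1 = reshape(score(:, 1), h, w);             % first principal component image
fprintf('PC1 explains %.1f%% of total variance\n', explained(1));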
FIGURE 4.7 Unsupervised classification of Landsat–5 TM image on October 17, 2009, using the ISODATA
method. (a) Landsat–5 TM image with RGB composite of bands 4, 3, 2 and (b) classified image.
Image classification is the process of categorizing all pixels in an image into classes based on their distinctive features (Acharya and Ray, 2005; Gonzalez and Woods, 2008; Giri, 2012). Land covers are identified and assigned to different categories based on differences in their spectral features. In general, techniques developed for image classification in remote sensing can be divided
into unsupervised classification and supervised classification.
• Unsupervised classification: Pixels in an image are automatically classified and grouped into separate clusters, depending on the similarities of the spectral features of each pixel, without human intervention (Lee et al., 1999; Fjortoft, 2003). These kinds of classifications are also termed clustering, and the representative algorithms are K-means (MacQueen, 1967) and the Iterative Self-Organizing Data Analysis Technique (ISODATA) (Ball and Hall, 1964). Classification with unsupervised methods is simple and fast, since it involves only statistical calculation on the input image. However, the final output depends heavily on the number of clusters specified by the operator, and frequently results in feature mixtures, especially for objects with similar spectral characteristics, such as water and shadows. In addition to ISODATA, a variety of algorithms have been developed for unsupervised classification, such as neural network-based methods (Hara et al., 1994; Yuan et al., 2009), probabilistic methods (Fjortoft, 2003), and even hybrid methods (Lee et al., 1999) (Figure 4.7); a K-means sketch is given after this list.
• Supervised classification: Compared to unsupervised approaches, supervised classifications
require the user to select representative samples for each cluster as training sites (i.e., samples)
beforehand, and the identified clusters thus highly depend on these predetermined training
sites (Khorram et al., 2016). Therefore, the final output depends heavily on the cognition
and skills of the image specialist for training site selection. Commonly used supervised
classification algorithms include maximum likelihood (Ahmad and Quegan, 2012) and
minimum-distance classification (Wacker and Landgrebe, 1972). Despite this dependence, results from supervised classification are generally much more accurate than those from unsupervised approaches.
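A minimal unsupervised-classification sketch in MATLAB (assuming the Statistics and Machine Learning Toolbox and the hypothetical stack tm_stack.tif; K-means stands in here for ISODATA, which additionally splits and merges clusters):

% Unsupervised classification of a multispectral stack with k-means.
cube = double(imread('tm_stack.tif'));            % hypothetical H x W x B stack
[h, w, b] = size(cube);
X = reshape(cube, h*w, b);                        % one spectral sample per pixel
k = 6;                                            % cluster count set by the operator
labels = kmeans(X, k, 'MaxIter', 300, 'Replicates', 3);
classmap = reshape(labels, h, w);                 % back to image layout
imagesc(classmap); colorbar;                      % display the classified map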
4.3.1 ENVI
ENVI, an acronym for “the ENvironment for Visualizing Images,” is a software application
developed by the Exelis Visual Information Solutions (Exelis VIS) company, which specializes in
remote sensing imagery processing and analysis. ENVI was first released in 1994 and is written in IDL (Interactive Data Language). In contrast to the text-based IDL, ENVI provides a suite of user-friendly graphical user interfaces (GUIs) with a number of advanced scientific algorithms and wizard-based tools embedded for imagery visualization, analysis, and processing (Figure 4.8).
As shown in Figure 4.8, ENVI provides various algorithms and tools for image processing and
analysis, including basic imagery reading modules to visualize images collected from different
platforms in different formats, as well as pre-processing functions and further advanced spatial and
spectral transformations. Compared to other image processing software, one of the advantages of
ENVI lies in its distinct combination of spectral-based and file-based techniques through interactive
manipulations which enables users to easily manipulate more than one image simultaneously for
advanced processing steps. In addition, ENVI provides extension interfaces to external tools and
functions, which enables users to create customized or application-oriented tools for different
purposes. Due to its outstanding performance in image processing, ENVI has been used in a
variety of industries, particularly in remote sensing.
4.3.2 ERDAS IMAGINE
ERDAS IMAGINE, a geospatial image processing application with raster graphics editor capabilities
designed by ERDAS Inc., has also been widely applied to process and analyze remotely sensed imagery from different platforms and sensors such as AVHRR, Landsat, SPOT, and LiDAR. Before
the ERDAS IMAGINE suite, various products were developed by ERDAS Inc. under the name of
ERDAS to assist in processing imagery collected from most optical and radar mapping sensors.
Similar to most image processing applications, ERDAS IMAGINE also provides a user-friendly
GUI to support imagery visualization, mapping, and so forth.
The first version of ERDAS was released in 1978, whereas ERDAS IMAGINE followed in 1991. The latest version of ERDAS IMAGINE was released in 2015. Like all the previous products, ERDAS IMAGINE aims mainly at processing geospatial raster data by providing many solutions associated with image visualization, mapping, and data (e.g., raster, vector, LiDAR point) analysis in one package, allowing users to perform numerous operations on imagery toward specific goals.
It supports optical panchromatic, multispectral, and hyperspectral imagery, as well as radar and
LiDAR data in a wide variety of formats.
By integrating multiple geospatial technologies, ERDAS IMAGINE can be used as a powerful
package to process remotely sensed imagery supporting consolidated workflows. In addition,
ERDAS IMAGINE is flexible, depending on users’ needs. It provides three product tiers (i.e.,
Essentials, Advantage, and Professional) designed for all levels of users, which enables handling
any geospatial analysis task. Due to the robust multicore and distributed batch processing, ERDAS
IMAGINE is capable of handling tasks with a remarkable processing performance through dynamic
modeling, even when dealing with massive datasets from any sensor.
4.3.4 ArcGIS
ArcGIS is a leading geographic information system (GIS) application that allows users to work with geospatial maps and perform geoprocessing on raw input data to produce valuable information. The first ArcGIS suite was released in late 1999 by ESRI (Environmental
Systems Research Institute). Prior to ArcGIS, ESRI had developed various products focusing
mainly on the development of ArcInfo workstation and several GUI-based products such as the
ArcView. However, these products did not integrate well with one another. Within this context,
ESRI revamped its GIS software platform toward a single integrated software architecture, which
finally resulted in the ArcGIS suite.
ArcGIS provides a comprehensive platform to manage, process, and analyze the input raster or
vector data to extract valuable information. It is capable of managing geographic information in a
database, creating and analyzing geospatial maps, discovering and sharing geographic information,
and so forth. Key features of ArcGIS include: (1) a variety of powerful spatial analysis tools,
(2) automated advanced workflows, (3) high-quality maps creation, (4) geocoding capabilities, and
(5) advanced imagery support.
With the development of remote sensing, ArcGIS provides a suite of image processing and analysis
tools enabling users to better understand the information locked in the imagery pixels. At present,
ArcGIS is capable of efficiently managing and processing time-variant, multi-resolution imagery from multiple sources (e.g., satellite, aerial, LiDAR, and SAR), formats (e.g., GeoTIFF, HDF, GRIB [General Regularly-distributed Information in Binary form], and netCDF), and projections. In addition
to the basic viewing and editing modules, ArcGIS provides a number of extensions that can be added
to aid in complex tasks, including spatial analyst, geostatistical analyst, network analyst, 3D analyst,
and so forth, which are capable of geoprocessing, data conversion, and analysis.
Due to its multiple functionalities, ArcGIS has been widely applied to process geospatial
imagery in remote sensing. One of the significant features of ArcGIS is that it provides a model
builder tool, which can be used to create, edit, and manage workflows for automatic sequential
execution of geoprocessing tools. In other words, outputs of one tool are fed into another tool as
input (Figure 4.9). The established model can be thought of as a new tool for batch processing, and it
is of great help in handling large volumes of data (e.g., long-term satellite imagery) for multiple processing purposes.
4.3.5 MATLAB®
MATLAB is a high-level proprietary programming language developed by MathWorks Inc. that
integrates computation, visualization, and programming in a user-friendly interactive environment.
MATLAB has been widely used across disciplines for numeric computation, data analysis and
visualization, programming and algorithm development, creation of user interfaces, and so forth.
Since the basic data element of MATLAB is an array, it allows fast solution formulations for many
numeric computing problems, in particular those involving matrix representations, such as images
(i.e., two-dimensional numerical arrays). This means image processing operations can be easily
expressed in a compact and clear manner toward a quick solution of image processing problems
(Gonzalez et al., 2004).
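For instance (a minimal sketch assuming a hypothetical input file scene.tif), several common image operations reduce to one-line array expressions:

img = imread('scene.tif');         % an image is just an H x W (x B) numeric array
flipped = img(end:-1:1, :, :);     % vertical flip by reversing row indices
darker = img / 2;                  % halve brightness (saturating integer division)
sub = img(101:200, 101:200, :);    % crop a 100 x 100 pixel subset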
With the development of an extensive set of algorithms and functions specializing in manipulating
images, the capability of MATLAB is extended to the image processing domain. These comprehensive
algorithms and functions are achieved through a toolbox termed the Image Processing Toolbox. With
the aid of this toolbox, MATLAB can be easily applied to perform image analysis and processing
including image segmentation, enhancement, registration, and transformations, as well as noise
reduction and so forth. In addition, many algorithms and functions provided in the toolbox support
multicore processors and even GPUs (i.e., graphics processing units), resulting in the acceleration of
image processing, especially for computationally intensive workflows.
At present, MATLAB supports a diverse set of image types in different formats. Images stored in standard data and image formats, as well as a number of specialized file formats such as HDF and netCDF, can be read directly into a matrix in MATLAB for visualization and further manipulation. Meanwhile, results or matrices acquired after processing can also be exported as raster datasets or images.
4.3.6 IDL
IDL, short for Interactive Data Language, is a scientific programming language with similar capabilities to
MATLAB, also developed by Exelis VIS. It has been commonly used along with ENVI, an image
processing software package built in IDL, for data analysis and image processing, particularly in
remote sensing and medical imaging. Similar to other programming languages, IDL incorporates
three essential capabilities including interactivity, graphics display, and array-oriented operation
for data analysis. Its vectorized nature makes IDL capable of performing fast array computations,
especially for numerically heavy computations, by taking advantage of the built-in vector operations.
With the capability of handling a large volume of data, IDL has been widely applied for image
processing and analysis. In addition to the built-in iTools widgets for interactive image display,
hundreds of algorithms and functions are provided for further advanced image manipulation and
processing with capabilities including segmentation, enhancement, filtering, Fourier transform and
wavelet transform, spectral analysis, and so forth. A distinctive feature of IDL is that it can be used
to develop customized tools for use as extended modules in ENVI for specific purposes.
4.4 SUMMARY
In this chapter, a variety of commonly used image pre-processing techniques including atmospheric
correction, radiometric correction, geometric correction, resampling, mosaicking, and gap filling
are discussed, as well as advanced processing methods including image enhancement, image
transformation, and image classification. In addition, image processing software and programming
languages such as ENVI, ArcGIS, MATLAB, and IDL are also briefly introduced. In the next
chapter, concepts of feature extraction in remote sensing will be formally introduced to expand the
theoretical foundation of remote sensing.
REFERENCES
Abraham, R. and Simon, P., 2013. Review on mosaicing techniques in image processing. International Journal
of Software Engineering Research and Practices, 3, 63–68.
Acharya, T. and Ray, A. K., 2005. Image Processing: Principles and Applications. Wiley InterScience, New
Jersey.
Addink, E. A., 1999. A comparison of conventional and geostatistical methods to replace clouded pixels in
NOAA-AVHRR images. International Journal of Remote Sensing, 20, 961–977.
Ahmad, A. and Quegan, S., 2012. Analysis of maximum likelihood classification on multispectral data.
Applied Mathematical Sciences, 6, 6425–6436.
Baboo, D. S. S. and Devi, M. R., 2010. An analysis of different resampling methods in Coimbatore district.
Journal of Computer Science and Technology, 10, 61–66.
Baboo, D. S. S. and Devi, M. R., 2011. Geometric correction in recent high resolution satellite imagery: A case
study in Coimbatore, Tamil Nadu. International Journal of Computer Applications, 14, 32–37.
Ball, G. H. and Hall, D. J., 1964. Some fundamental concepts and synthesis procedures for pattern recognition
preprocessors. In: International Conference on Microwaves, Circuit Theory, and Information Theory,
September, Tokyo, 113–114.
Batchelor, B. G., 1978. Digital image processing. Electronics & Power, 24, 863.
Bedi, S. S. and Khandelwal, R., 2013. Various image enhancement techniques—a critical review. International
Journal of Advanced Research in Computer Engineering, 2, 1605–1609.
Berk, A., Bernstein, L. S., and Robertson, D. C., 1989. MODTRAN: A Moderate Resolution Model for
LOWTRAN 7. Technical Report, May 12, 1986–May 11, 1987. Spectral Sciences, Inc., Burlington, MA.
Chander, G., Markham, B. L., and Helder, D. L., 2009. Summary of current radiometric calibration coefficients
for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment, 113, 893–903.
Chang, N.-B., Bai, K., and Chen, C.-F., 2015. Smart information reconstruction via time-space-spectrum
continuum for cloud removal in satellite images. IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, 8, 1898–1912.
Chavez, P. S., 1988. An improved dark-object subtraction technique for atmospheric scattering correction of
multispectral data. Remote Sensing of Environment, 24, 459–479.
Chen, H., Li, A., Kaufman, L., and Hale, J., 1994. A fast filtering algorithm for image enhancement. IEEE Transactions on Medical Imaging, 13, 557–564.
Chen, J., Zhu, X., Vogelmann, J. E., Gao, F., and Jin, S., 2011. A simple and effective method for filling gaps
in Landsat ETM+ SLC-off images. Remote Sensing of Environment, 115, 1053–1064.
Cheng, S.-C. and Hsia, S.-C., 2003. Fast algorithms for color image processing by principal component
analysis. Journal of Visual Communication and Image Representation, 14, 184–203.
Deng, H., Wang, L., Liu, J., Li, D., Chen, Z., and Zhou, Q., 2013. Study on application of scale invariant feature
transform algorithm on automated geometric correction of remote sensing images. In: Computer and
Computing Technologies in Agriculture VI, 352–358. Edited by Li, D. and Chen, Y., Zhangjiajie, China.
Dodgson, N. A., 1997. Quadratic interpolation for image resampling. IEEE Transactions on Image Processing,
6, 1322–1326.
Dozier, J. and Frew, J., 1990. Rapid calculation of terrain parameters for radiation modeling from digital
elevation data. IEEE Transactions on Geoscience and Remote Sensing, 28, 963–969.
Du, Y., Teillet, P. M., and Cihlar, J., 2002. Radiometric normalization of multitemporal high-resolution
satellite images with quality control for land cover change detection. Remote Sensing of Environment,
82, 123–134.
Duggin, M. J. and Piwinski, D., 1984. Recorded radiance indices for vegetation monitoring using NOAA
AVHRR data; atmospheric and other effects in multitemporal data sets. Applied Optics, 23, 2620.
Essery, R. and Marks, D., 2007. Scaling and parametrization of clear-sky solar radiation over complex
topography. Journal of Geophysical Research-Atmospheres, 112, D10122.
Eugenio, F., Marques, F., and Marcello, J., 2002. A contour-based approach to automatic and accurate
registration of multitemporal and multisensor satellite imagery. In: IEEE International Geoscience and
Remote Sensing Symposium, 3390–3392. Toronto, Ontario, Canada.
Figueiredo, M. A. T. and Nowak, R. D., 2003. An EM algorithm for wavelet-based image restoration. IEEE
Transactions on Image Processing, 12, 906–916.
Fjortoft, R., 2003. Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields. IEEE Transactions on Geoscience and Remote Sensing, 41, 675–686.
Gao, F., Morisette, J. T., Wolfe, R. E., Ederer, G., Pedelty, J., Masuoka, E., Myneni, R., Tan, B., and Nightingale,
J., 2008. An algorithm to produce temporally and spatially continuous MODIS-LAI time series. IEEE
Geoscience and Remote Sensing Letters, 5, 60–64.
Gianinetto, M. and Scaioni, M., 2008. Automated geometric correction of high-resolution pushbroom satellite
data. Photogrammetric Engineering & Remote Sensing, 74, 107–116.
Giri, C. P., 2012. Remote Sensing of Land Use and Land Cover: Principles and Applications, CRC Press,
Boca Raton, FL, pp. 1–469.
Gonzalez, R. C. and Woods, R. E., 2008. Digital Image Processing. 3rd edition, Pearson Prentice Hall, Upper
Saddle River, NJ.
Gonzalez, R. C., Woods, R. E., and Eddins, S. L., 2004. Digital Image Processing Using MATLAB. Gatesmark
Publishing, Knoxville, TN.
Gopalan, K., Jones, W. L., Biswas, S., Bilanow, S., Wilheit, T., and Kasparis, T., 2009. A time-varying
radiometric bias correction for the TRMM microwave imager. IEEE Transactions on Geoscience and
Remote Sensing, 47, 3722–3730.
Goshtasby, A., 1987. Geometric correction of satellite images using composite transformation functions. In:
The 21st International Symposium on Remote Sensing of Environment, Ann Arbor, Michigan.
Gunturk, B. K., Li, X., 2013. Image Restoration Fundamentals and Advances. CRC Press, Boca Raton, FL.
Gurjar, S. B. and Padmanabhan, N., 2005. Study of various resampling techniques for high-resolution remote
sensing imagery. Journal of the Indian Society of Remote Sensing, 33, 113–120.
Hadjimitsis, D. G. and Clayton, C., 2009. Darkest pixel atmospheric correction algorithm: A revised procedure
for environmental applications of satellite remotely sensed imagery. Environmental Monitoring and
Assessment, 159, 281–292.
Hadjimitsis, D. G., Papadavid, G., Agapiou, A., Themistocleous, K., Hadjimitsis, M. G., Retalis, A.,
Michaelides, S. et al. 2010. Atmospheric correction for satellite remotely sensed data intended for
agricultural applications: impact on vegetation indices. Natural Hazards and Earth System Sciences,
10, 89–95.
Hall, F. G., Strebel, D. E., Nickeson, J. E., and Goetz, S. J., 1991. Radiometric rectification: Toward a common
radiometric response among multidate, multisensor images. Remote Sensing of Environment, 35, 11–27.
Hara, Y., Atkins, R., Yueh, S., Shin, R., and Kong, J., 1994. Application of neural networks to radar image
classification. IEEE Transactions on Geoscience and Remote Sensing, 32, 1994.
Herman, B. M., Browning, S. R., and Curran, R. J., 1971. The effect of atmospheric aerosols on scattered
sunlight. Journal of the Atmospheric Sciences, 28, 419–428.
Huete, A., 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of Environment, 25, 295–309.
Huete, A., Didan, K., Miura, T., Rodriguez, E., Gao, X., and Ferreira, L., 2002. Overview of the radiometric
and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment, 83,
195–213.
Hummel, R., 1977. Image enhancement by histogram transformation. Computer Graphics and Image
Processing, 6, 184–195.
Inampudi, R. B., 1998. Image Mosaicing. In: IGARSS ‘98. Sensing and Managing the Environment. 1998
IEEE International Geoscience and Remote Sensing. Symposium Proceedings, 2363–2365.
Janzen, D. T., Fredeen, A. L., and Wheate, R. D., 2006. Radiometric correction techniques and accuracy assessment
for Landsat TM data in remote forested regions. Canadian Journal of Remote Sensing, 330–340.
Kandasamy, S., Baret, F., Verger, A., Neveux, P., and Weiss, M., 2013. A comparison of methods for smoothing
and gap filling time series of remote sensing observations—application to MODIS LAI products.
Biogeosciences, 10, 4055–4071.
Kang, S., Running, S. W., Zhao, M., Kimball, J. S., and Glassy, J., 2005. Improving continuity of MODIS
terrestrial photosynthesis products using an interpolation scheme for cloudy pixels. International
Journal of Remote Sensing, 26, 1659–1676.
Kaufman, Y. J. and Sendra, C., 1988. Algorithm for automatic atmospheric corrections to visible and near-IR
satellite imagery. International Journal of Remote Sensing, 9, 1357–1381.
Kaufman, Y. J. and Tanré, D., 1996. Strategy for direct and indirect methods for correcting the aerosol effect
on remote sensing: From AVHRR to EOS-MODIS. Remote Sensing of Environment, 55, 65–79.
Kay, S., Hedley, J. D., and Lavender, S., 2009. Sun glint correction of high and low spatial resolution images
of aquatic scenes: A review of methods for visible and near-infrared wavelengths. Remote Sensing, 1,
697–730.
Keys, R., 1981. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics,
Speech, and Signal Processing, 29, 1153–1160.
Khorram, S., Nelson, S. A. C., Cakir, H., and van der Wiele, C. F., 2013. Digital image acquisition:
Preprocessing and data reduction, in: Pelton, J. N., Madry, S., and Camacho-Lara, S. (Eds.) Handbook
of Satellite Applications, 809–837.
Khorram, S., van der Wiele, C. F., Koch, F. H., Nelson, S. A. C., and Potts, M. D., 2016. Principles of Applied
Remote Sensing. Springer, New York.
Kim, W., He, T., Wang, D., Cao, C., and Liang, S., 2014. Assessment of long-term sensor radiometric
degradation using time series analysis. IEEE Transactions on Geoscience and Remote Sensing, 52,
2960–2976.
Kneizys, F. X., Shettle, E. P., and Gallery, W. O., 1981. Atmospheric transmittance and radiance: The
LOWTRAN 5 code, in: Fan, R. W. (Ed.), SPIE 0277, Atmospheric Transmission. 116 (July 28, 1981),
116–124. SPIE, Washington D.C., United States.
KwangIn, K., Franz, M., and Scholkopf, B., 2005. Iterative kernel principal component analysis for image
modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1351–1366.
Lagendijk, R. and Biemond, J., 1999. Basic methods for image restoration and identification, in: Bovik, A.
(Ed.), Handbook of Image and Video Processing, 1–25. Academic Press, Massachusetts, USA.
Lee, J. S., Grünes, M. R., Ainsworth, T. L., Du, L. J., Schuler, D. L., and Cloude, S. R., 1999. Unsupervised
classification using polarimetric decomposition and the complex wishart classifier. IEEE Transactions
on Geoscience and Remote Sensing, 37, 2249–2258.
Li, Z., 2014. Fast Fourier transformation resampling algorithm and its application in satellite image processing.
Journal of Applied Remote Sensing, 8, 83683.
MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 231–297.
Maini, R. and Aggarwal, H., 2010. A comprehensive review of image enhancement techniques. Journal of
Computing, 2, 39–44.
Maxwell, S. K., Schmidt, G. L., and Storey, J. C., 2007. A multi-scale segmentation approach to filling gaps in
Landsat ETM+ SLC-off images. International Journal of Remote Sensing, 28, 5339–5356.
Nielsen, A. A., Conradsen, K., and Simpson, J. J., 1998. Multivariate alteration detection (MAD) and MAF
postprocessing in multispectral, bitemporal image data: New approaches to change detection studies.
Remote Sensing of Environment, 64, 1–19.
Obata, K., Tsuchida, S., and Iwao, K., 2015. Inter-band radiometric comparison and calibration of ASTER
visible and near-infrared bands. Remote Sensing, 7, 15140–15160.
Pandey, P. K., Singh, Y., and Tripathi, S., 2011. Image processing using principle component analysis.
International Journal of Computer Applications, 15, 37–40.
Parker, J. A., Kenyon, R.V., and Troxel, D. E., 1983. Comparison of interpolating methods for image resampling.
IEEE Transactions on Medical Imaging, 2, 31–39.
Pons, X. and Solé-Sugrañes, L., 1994. A simple radiometric correction model to improve automatic mapping
of vegetation from multispectral satellite data. Remote Sensing of Environment, 48, 191–204.
Pons, X., Pesquer, L., Cristóbal, J., and González-Guerrero, O., 2014. Automatic and improved radiometric
correction of Landsat imagery using reference values from MODIS surface reflectance images.
International Journal of Applied Earth Observation and Geoinformation, 33, 243–254.
Reichenbach, S. E. and Geng, F., 2003. Two-dimensional cubic convolution. IEEE Transactions on Image
Processing, 12, 857–865.
Richter, R., 1996a. A spatially adaptive fast atmospheric correction algorithm. International Journal of
Remote Sensing, 17, 1201–1214.
Richter, R., 1996b. Atmospheric correction of satellite data with haze removal including a haze/clear transition
region. Computers & Geosciences, 22, 675–681.
Roerink, G. J., Menenti, M., and Verhoef, W., 2000. Reconstructing cloudfree NDVI composites using Fourier analysis of time series. International Journal of Remote Sensing, 21, 1911–1917.
Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W., 1974. Monitoring vegetation systems in the Great Plains with ERTS. In: Third Earth Resources Technology Satellite (ERTS) Symposium, pp. 309–317. Texas, United States.
Schott, J. R., Salvaggio, C., and Volchok, W. J., 1988. Radiometric scene normalization using pseudoinvariant
features. Remote Sensing of Environment, 26, 1–16.
Silva Centeno, J. A. and Haertel, V., 1997. An adaptive image enhancement algorithm. Pattern Recognition,
30, 1183–1189.
Song, C., Woodcock, C. E., Seto, K. C., Lenney, M., and Macomber, S. A., 2001. Classification and change detection using Landsat TM data: When and how to correct atmospheric effects. Remote Sensing of
Environment, 75, 230–244.
Starck, J.-L., Murtagh, F., Candes, E. J., and Donoho, D. L., 2003. Gray and color image contrast enhancement
by the curvelet transform. IEEE Transactions on Image Processing, 12, 706–717.
Teillet, P. M., Fedosejevs, G., Thome, K. J., and Barker, J. L., 2007. Impacts of spectral band difference
effects on radiometric cross-calibration between satellite sensors in the solar-reflective spectral domain.
Remote Sensing of Environment, 110, 393–409.
Thome, K., Markham, B., Slater, P., and Biggar, S., 1997. Radiometric calibration of Landsat. Photogrammetric Engineering & Remote Sensing, 63, 853–858.
Toutin, T., 2004. Review article: Geometric processing of remote sensing images: Models, algorithms and
methods. International Journal of Remote Sensing, 25, 1893–1924.
Verger, A., Baret, F., Weiss, M., Kandasamy, S., and Vermote, E., 2013. The CACAO method for smoothing,
gap filling, and characterizing seasonal anomalies in satellite time series. IEEE Transactions on
Geoscience and Remote Sensing, 51, 1963–1972.
Vermote, E. F., Tanré, D., Deuzé, J. L., Herman, M., and Morcrette, J. J., 1997. Second simulation of the
satellite signal in the solar spectrum, 6s: an overview. IEEE Transactions on Geoscience and Remote
Sensing, 35, 675–686.
Vicente-Serrano, S. M., Pérez-Cabello, F., and Lasanta, T., 2008. Assessment of radiometric correction
techniques in analyzing vegetation variability and change using time series of Landsat images. Remote
Sensing of Environment, 112, 3916–3934.
Wacker, A. G. and Landgrebe, D. A., 1972. Minimum distance classification in remote sensing. LARS Technical Reports, paper 25.
Wang, D., Morton, D., Masek, J., Wu, A., Nagol, J., Xiong, X., Levy, R., Vermote, E., and Wolfe, R., 2012.
Impact of sensor degradation on the MODIS NDVI time series. Remote Sensing of Environment, 119,
55–61.
Weiss, D. J., Atkinson, P. M., Bhatt, S., Mappin, B., Hay, S. I., and Gething, P. W., 2014. An effective approach
for gap-filling continental scale remotely sensed time-series. ISPRS Journal of Photogrammetry and
Remote Sensing, 98, 106–118.
Weisstein, E. W. Affine Transformation. From MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/AffineTransformation.html. Accessed 2017.
Yang, C.-C., 2006. Image enhancement by modified contrast-stretching manipulation. Optics & Laser
Technology, 38, 196–201.
Yang, X. and Lo, C. P., 2000. Relative radiometric normalization performance for change detection from
multi-date satellite images. Photogrammetric Engineering & Remote Sensing, 66, 967–980.
Yuan, H., Van Der Wiele, C. F., and Khorram, S., 2009. An automated artificial neural network system for land
use/land cover classification from Landsat TM imagery. Remote Sensing, 1, 243–265.
Zhang, C., Li, W., and Travis, D., 2007. Gaps-fill of SLC-off Landsat ETM+ satellite image using a
geostatistical approach. International Journal of Remote Sensing, 28, 5103–5122.
Zhu, X., Liu, D., and Chen, J., 2012. A new geostatistical approach for filling gaps in Landsat ETM+ SLC-off
images. Remote Sensing of Environment, 124, 49–60.
Zitová, B. and Flusser, J., 2003. Image registration methods: a survey. Image and Vision Computing, 21,
977–1000.
Part II
Feature Extraction for Remote Sensing
5 Feature Extraction and
Classification for Environmental
Remote Sensing
5.1 INTRODUCTION
Human efforts to detect and extract information from imagery date back to the time when the first photographic images were acquired, as early as the mid-nineteenth century. Motivated by the
subsequent advances in photogrammetry, the invention of the airplane, improvements in the relevant
instrumentations and techniques, the advent of digital imagery, and the capabilities of electronic
processing, interest in efficiently extracting information from imagery to help with learning and
decision-making has increased significantly (Wolf et al., 2000; Quackenbush, 2004).
With the advancement of remote sensing, a wealth of instruments has been deployed onboard
various satellite and space-borne platforms dedicated to providing versatile remotely sensed data
to monitor Earth's environment. As many remotely sensed images with high spatial, temporal, and spectral resolutions become available on a daily basis at the global scale, the data volume increases by many orders of magnitude, making it even harder to convert images into actionable information and knowledge through conventional manual interpretation approaches for further decision-making (Momm and Easson, 2011). Manual interpretation is time-consuming and labor-intensive; in addition, it is difficult to cope with the large volume of information embedded in remotely sensed data, particularly
for remotely sensed images with fine resolutions in spectral (e.g., hyperspectral images) and spatial
(e.g., panchromatic images) domains.
Along this line, many statistical and geophysical methods were developed to help retrieve
information from different types of remote sensing imageries. Machine learning and/or data
mining are relatively new methods for feature extraction. When performing feature extraction
with machine learning or data mining in search of geospatial intelligence for a complex dataset,
one of the major problems is the low efficiency issue stemming from the large number of variables
involved. With more learning algorithms becoming available, feature extraction not only requires a huge amount of memory and computational power, but may also result in a slow learning process with possible overfitting of the training samples and poor generalization of predictions to new samples (Zena and Gillies, 2015). Generally, the large amount of information retrieved from remotely sensed data makes it difficult to perform classification or pattern recognition for environmental decision-making, because the observed information can be miscellaneous and highly overlapping, although some of it is complementary. Overall, these problems can be mainly attributed to the large amount of redundant or complementary information embedded in data in the spatial, temporal, or spectral domains, requiring tremendous data analysis and synthesis effort.
When the input data to an algorithm are highly redundant or too large to be managed, it is desirable to transform the raw data into a reduced form by keeping only the primary
characteristics of the raw data. Key information embedded in such reduced forms can be well
represented or preserved by a set of features to facilitate the subsequent learning process and
improve generalization and interpretability toward efficient decision making. In image processing
and pattern recognition, the techniques designed to construct a compact feature vector well
representing the raw observations are referred to as feature extraction, which is largely related
to dimensionality reduction (Sharma and Sarma, 2016). In some cases, the reduced form (i.e.,
feature vector) could even lead to better human interpretations, especially for hyperspectral data, which can comprise from several hundred up to a thousand bands. Due to significant advantages
in reducing the size and dimensionality of the raw data, feature extraction has been widely
used to help with the problem of constructing and identifying certain types of features from the
given input data to solve various problems via the use of machine learning, data mining, image
compression, pattern recognition, and classification.
In fields of computational intelligence and information management such as machine learning
and pattern recognition, feature extraction has become the most critical step prior to classification
and decision-making, as the final performance of analysis is highly dependent on the quality of
extracted features. A typical workflow of image processing and pattern recognition for environmental
monitoring is presented in Figure 5.1, from which we can see that feature extraction is the first
essential step of processing after the pre-processing. With the fast development of computer sciences
and other relevant information technologies, having all the features of interest in an observed scene
automatically identified at the push of a button, namely, a process of automatic feature extraction, is
truly appealing and plausible. The ultimate goal is to develop automatic and intelligent techniques
to cope with the problem of detecting and extracting informative features from the input data
effectively and efficiently.
In this chapter, basic concepts and fundamentals associated with feature extraction, as well as a
wealth of commonly applied feature extraction techniques that can be used to help with classification
problems in remote sensing, will be introduced to aid in environmental decision-making. Different
learning strategies summarized below will be thoroughly discussed:
• Supervised Learning: This involves a set of target values that are fed into the learning model, allowing the model to adjust according to its errors (a brief sketch contrasting the supervised and unsupervised strategies follows this list).
• Unsupervised Learning: This is required when there is not a set of target values for a model
to learn, such as searching for a hidden pattern in a big dataset. Often, clustering analysis
is conducted by dividing the big data set into groups according to some unknown pattern.
• Semi-supervised Learning: This is a class of supervised learning processes that make use
of very small amounts of labeled data within a large amount of unlabeled data for training.
In this way, we may guess the shape of the underlying data distribution and generalize
better to new samples. These algorithms can perform well when we have a very small
amount of labeled points and a large amount of unlabeled points.
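To make the contrast concrete, a minimal MATLAB sketch (assuming the Statistics and Machine Learning Toolbox; the feature matrix X, labels y, and index vector labeled are hypothetical) fits a supervised k-nearest-neighbor model on a labeled subset while clustering the same data without any targets:

% Contrast of the two strategies on one feature matrix X (n samples x p features).
% 'labeled' indexes a small labeled subset with targets y (both assumed available).
mdl = fitcknn(X(labeled, :), y(labeled));   % supervised: learns from target values
yhat = predict(mdl, X);                     % predictions generalized to all samples
clusters = kmeans(X, 5);                    % unsupervised: no targets required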
In addition, metrics that can be used to evaluate the performance of feature extraction methods as
well as perspectives of feature extraction will be presented.
FIGURE 5.1 A typical workflow of image processing and pattern recognition for environmental monitoring: remotely sensed imagery passes through pre-processing (atmospheric, radiometric, and geometric correction, and re-projection), feature extraction (low-level, unsupervised, and supervised), pattern recognition (clustering, unsupervised and supervised classification, and machine learning), and post-processing (statistical computation, modeling, generalization, and upscaling) toward decision making (impact assessment, risk analysis, risk management, mitigation strategy, and the precautionary principle).

Feature extraction has been defined in many ways across the literature, for example:
• Feature extraction is the process of transforming raw data into more informative signatures
or characteristics of a system, which will most efficiently or meaningfully represent the
information that is important for analysis and classification (Elnemr et al., 2016).
• Feature extraction is a process for extracting relevant information from an image. After
detecting a face, some valuable information is extracted from the image which is used in
the next step for identifying the image (Bhagabati and Sarma, 2016).
• Feature extraction is the process of transforming the input data into a set of features which
can very well represent the input data. It is a special form of dimensionality reduction
(Sharma and Sarma, 2016).
• Feature extraction is a process of deriving new features from the original features in order
to reduce the cost of feature measurement, increase classifier efficiency, and allow higher
classification accuracy (Akhtar and Hassan, 2015).
• Feature extraction is a process of extracting the important or relevant characteristics
that are enclosed within the input data. Dimensionality or size of the input data will be
subsequently reduced to preserve important information only (Ooi et al., 2015).
• Feature extraction is a special form of dimensionality reduction aiming at transforming the
input data into a reduced representation set of features (Kumar and Bhatia, 2014).
• Feature extraction is one of the important steps in pattern recognition, aiming at extracting a set of descriptors, various characteristic attributes, and the relevant associated information to form a representation of the input pattern (Ashoka et al., 2012; Jain et al., 2000).
• Feature extraction is the process of extracting and building features from raw data. Feature
functions are utilized to extract and process informative features that are useful for
prediction (Gopalakrishnan, 2009).
• Feature extraction is a dimensionality reduction method that finds a reduced set of features
that are a combination of the original ones (Sánchez-Maroño and Alonso-Betanzos, 2009).
• Feature extraction refers to the extraction of linguistic items from the documents to provide
a representative sample of their content. Distinctive vocabulary items found in a document
are assigned to the different categories by measuring the importance of those items to the
document content (Durfee, 2006).
• Feature extraction can be viewed as finding a set of vectors that represent an observation
while reducing the dimensionality (Benediktsson et al., 2003).
• Feature extraction is a process that extracts a set of new features from the original features
through some functional mapping (Wyse et al., 1980).
The above-listed definitions are all meaningful and informative, indicating that the key to
feature extraction is to construct a compact feature vector to well represent the original data in a
lower dimensionality space. However, it is clear that the definition varies among research domains
and applications. By summarizing previous definitions, we may define feature extraction broadly
as a general term referring to “the process of constructing a set of compact feature vectors by
extracting the most relevant features from the input data to facilitate further decision-making by
using the reduced representation (i.e., feature vector) instead of the original full-size data while still
maintaining sufficient accuracy.”
In general, a good feature is expected to possess the following properties:
• Informative: The resulting feature should be expressive and perceptually meaningful, and be able to explain a certain level of information embedded in the input data.
• Distinctive: The neighborhood around the feature center varies enough to allow for a
reliable discrimination between the features.
• Nonredundant: Features derived from different samples of the same class should be
grouped in the same category, and each type of feature should represent a unique property
of the input data.
• Repeatable detections: The resulting features should be the same in two different images
of the same scene. In other words, the features should be resistant to changes in viewing
conditions and noise, such as the presence of rotation and scaling effect.
• Localizable: The feature should have a unique location assigned to it, and changes in
viewing conditions or directions should not affect its location.
FIGURE 5.2 An illustrative example of selecting features from a given input data.
Nevertheless, the aforementioned properties are not the only criteria that can be used to evaluate
a feature vector, and features should not be limited to these characteristics, because the resulting
features are highly dependent on the specific problems at hand.
Features can be broadly categorized as low-level features and high-level features, although there
is no distinct gap between them (Elnemr et al., 2016). In general, low-level features are fundamental
features such as edges and lines as well as many other basic descriptors that can be easily detected
without performing any complex manipulation, which can be further divided into general features
and domain-specific features. The so-called “general features” mainly refer to those common
features that can be directly detected from any given image. In other words, general features should
be universal and application-independent. Three general features commonly used are:
• Color features: Color is one of the most important features of images because it is visually
intuitive to human perception. Color features are often defined subject to a particular
color space or model, and the most popular color spaces are RGB (red-green-blue) and
HSV (hue-saturation-value). Based on these color spaces, a variety of color features
including color histogram (Wang et al., 2009), color moment (Huang et al., 2010), color
coherence vector (Pass et al., 1998), and color correlogram (Huang et al., 1997) can be
then extracted.
• Texture features: In image processing, texture refers to a set of metrics designed to quantify
the perceived information about the spatial arrangement of color or intensities in an image
or selected region of an image (Haralick et al., 1973). As opposed to color, which is usually
represented by the brightness of each individual pixel, texture is often measured based
on a set of pixels by considering spatial or spectral similarities among pixels. Based on the domain from which features are extracted, textures can be divided into spatial texture features and spectral texture features (a co-occurrence-based sketch follows this list).
• Shape features: Shape is an important geometrical cue used by human beings to
discriminate real-world objects. A shape can be described by different parameters such as
rectangularity, circularity ratio, eccentricity ratio, and center of gravity.
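For example, spatial texture can be summarized with gray-level co-occurrence matrix (GLCM) statistics (Haralick et al., 1973). The MATLAB sketch below (assuming the Image Processing Toolbox and a hypothetical single-band file band4.tif) computes four common GLCM properties:

% Gray-level co-occurrence matrix (GLCM) texture features for one band.
gray = imread('band4.tif');                        % hypothetical single-band image
glcm = graycomatrix(gray, 'Offset', [0 1; -1 1]);  % two pixel-pair offsets
stats = graycoprops(glcm, {'Contrast', 'Correlation', 'Energy', 'Homogeneity'});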
A list of object-based features that have been categorized into three classes, namely spectral, textural, and shape features, is summarized in Table 5.1.

TABLE 5.1
List of Object-based Features Categorized into Three Classes: Spectral, Textural, and Shape Features (Chen et al., 2014)

Spectral features: Mean, Standard Deviation, Skewness, Ratio, Maximum, Minimum, Mean of Inner Border, Mean of Outer Border, Mean Diff. to Darker Neighbors, Mean Diff. to Brighter Neighbors, Contrast to Neighbor Pixels, Edge Contrast of Neighbor Pixels, Std. Dev. to Neighbor Pixels, Circular Mean, Circular Std. Dev., Mean Diff. to Neighbors, Mean Diff. to Scene, Ratio to Scene

Texture (GLCM) features: Angular Second Moment, Contrast, Correlation, Dissimilarity, Entropy, Homogeneity, Mean, Std. Dev.

Texture (GLDV) features: Angular Second Moment, Contrast, Entropy, Mean

Shape features: Area, Asymmetry, Border Index, Border Length, Compactness, Density, Elliptic Fit, Length, Main Direction, Radius of Largest Enclosed Ellipse, Radius of Smallest Enclosing Ellipse, Rectangular Fit, Roundness, Shape Index, Width

Source: Chen, X., Li, H., and Gu, Y., 2014. 2014 Fourth International Conference on Instrumentation and Measurement, Computer, Communication and Control, Harbin, China, 539–543.

Most of the object-based features in Table 5.1
are low-level features. In contrast, the so-called “domain-specific features” mainly refer to those
application-dependent features, and thus are highly related to practical applications in the domain.
For instance, fingerprints can be used as good features in human identity identification. Analogously,
human faces can be detected for face recognition, while chlorophyll-a content can be considered a
good feature for vegetation detection. In general, domain-specific features are confined to certain
specific applications that are not universal across domains.
High-level features refer to those feature vectors further derived from the low-level features by
using certain extra algorithms after basic feature extraction in the sense that hybrid algorithms are
often employed. The primary difference between low- and high-level features lies in the complexity
of the process used to extract the advanced features based on the low-level features. In Table 5.1,
for example, shape features can be application-dependent, with fingerprints, faces, and body gestures used as basic evidence for high-level human pattern recognition, whereas spectral features usually serve as low-level features for subsequent high-level feature extraction. Although it
is sometimes harder to obtain a high-level feature, it is of more help in understanding the designated
target embedded in those low-level features.
The goal of feature selection is to identify a subset of features that still well represents the input data. This is achievable by removing redundant or irrelevant features and reducing the dimensionality
of feature vectors, which facilitates the advanced learning process and improves the generalization
(Zena and Gillies, 2015). Thus, feature selection techniques are frequently used in many domains
to cope with problems with large input spaces, such as data mining, pattern recognition, and image
processing, because too much information can reduce the effectiveness of further data manipulation
due to complexities. Nevertheless, we should be aware that feature selection is different from feature
construction, as the latter generates new feature vectors, whereas the former focuses on selecting a
subset of features. In addition, applying a feature selection technique relies primarily on the input
data, which contains a variety of either redundant or irrelevant features, as the dimensionality or
space can be substantially reduced while still maintaining sufficient information to well represent
the original target (Bermingham et al., 2015).
A feature selection method can be considered an integrated process of feature subset screening
and quality assessment (Peng et al., 2005). By examining the manner of combining the screening
algorithm and the model building, the feature selection methods can be divided into three primary
categories including (1) wrappers, (2) filters, and (3) embedded methods (Das, 2001; Guyon and
Elisseeff, 2003; Zena and Gillies, 2015). Several feature selection methods and their characteristics are summarized in Table 5.2; these methods are differentiated mostly by the preselected evaluation criterion (Guyon and Elisseeff, 2006).
The wrapper methods use a predictive model to score feature subsets, whereas the filter methods
only use a proxy measure (e.g., correlation and mutual information) instead of the error rate to
rank subsets of features (Guyon and Elisseeff, 2003). More specifically, the wrapper methods take advantage of the prediction performance of the given learning machine to evaluate the importance of each feature subset (Kohavi and John, 1997); the learning machine itself is simply treated as a black box (Phuong et al., 2005). Compared to the wrapper
methods, filter-type methods are usually less computationally intensive, as the rationale of filter
methods is mainly based on the filter metrics (e.g., mutual information) without incorporating learning
to detect the similarity between a candidate feature subset and the desired output (Hall, 1999; Peng
et al., 2005; Nguyen et al., 2009). Nevertheless, filter methods are vulnerable to redundant features,
as the interrelationships between candidate features are not taken into account. Many experimental
results show that although the wrapper methods have the disadvantage of being computationally
inefficient, they often yield better performances (Zhuo et al., 2008). Embedded methods aim at
reducing the computational complexity by incorporating feature selection as part of the training
process, which is usually specific to the given learning machine (Guyon and Elisseeff, 2003; Duval
et al., 2009; Zare et al., 2013). In general, extracting features from a given input data set is associated
with combining various attributes into a reduced set of features, which is a combination of art and
science, as the whole process involves the integration of advanced computational algorithms as well
as the knowledge of the professional domain expert.
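To make the filter idea concrete, the following minimal base-MATLAB sketch ranks features by a proxy measure rather than by a trained model. It assumes X is an N-by-B feature matrix and y an N-by-1 target vector; the absolute Pearson correlation used here is a stand-in proxy for illustration, not any specific method from Table 5.2.

B = size(X, 2);
score = zeros(1, B);
for j = 1:B
    r = corrcoef(X(:, j), y);          % 2-by-2 correlation matrix
    score(j) = abs(r(1, 2));           % proxy relevance of feature j
end
[~, ranked] = sort(score, 'descend');  % most relevant features first
selected = ranked(1:min(10, B));       % keep an assumed top-10 subset

Because the proxy is computed feature by feature, this sketch also illustrates the weakness noted above: redundant features that are individually correlated with the target would all receive high scores.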
Overall, there are three primary aspects associated with feature extraction:
• Feature detectors: A good feature detector is of critical importance to the final extracted
features, as the detected inherent objects in the original input data are fundamental
elements for the construction of the initial set of raw features.
• Feature construction: The process of constructing features is the key to feature extraction,
and how well the constructed features can represent the original target determines the final
performance of the whole feature extraction process.
• Dimensionality reduction: Selecting a subset of features from the initial set of raw features
by removing those redundant or irrelevant features may significantly improve the learning
and generalization efficiency, which in turn advances the development and application of
feature extraction techniques, particularly in domains dealing with large feature spaces,
such as remote sensing applications.
TABLE 5.2
Different Feature Selection Methods and Their Characteristics

Minimum-redundancy-maximum-relevance (mRMR) feature selection (Filter): Selects good features according to the maximal statistical dependency criterion based on mutual information. Peng et al. (2005)
Bayesian network (Filter): Can be viewed as a search and optimization procedure in which features are evaluated based on their likelihood. Castro and Von Zuben (2009); Hruschka et al. (2004)
Correlation Feature Selection (CFS) (Filter): Features are evaluated on the basis of their correlation with the class. Haindl et al. (2006); Hall (1999); Yu and Liu (2003)
Cascade Correlation Feature Selection (C2FS) (Wrapper): An internal wrapper method that selects features at the same time hidden units are added to the growing C2 network architecture. Backstrom and Caruana (2006)
Genetic algorithm (Wrapper): Uses an evolutionary search to optimize the feature subset. Zhuo et al. (2008)
Sequential search (Wrapper): Candidate features are sequentially added to the subset until further additions do not increase the classification performance. Glass and Cooper (1965); Nakariyakul (2014)
Particle Swarm Optimization (PSO) (Wrapper): Features are selected according to the likelihood calculated by PSO. Xue et al. (2013)
Support Vector Machine-Recursive Feature Elimination (SVM-RFE) (Embedded): Looks for the features that lead to the maximum margin of separation between the classes; features are ranked based on certain ranking criteria. Guyon et al. (2002)
Kernel-Penalized SVM (Embedded): Uses the scaling-factors principle to penalize the use of features in the dual formulation of the SVM via an additional term that penalizes the zero norm of the scaling factors. Maldonado and Weber (2011)
Random Forests (Embedded): Combines binary decision trees built on several bootstrap samples; each decision tree is grown to maximal depth without pruning, and different algorithms are used to improve generalization. Genuer et al. (2010)
Laplacian Score ranking + a modified Calinski–Harabasz index (Hybrid): Sorts the features according to their relevance and evaluates them as a subset rather than individually, based on a modified Calinski–Harabasz index. Solorio-Fernández et al. (2016)
Information gain + wrapper subset evaluation + genetic algorithm (Hybrid): Uses a combination of sample domain filtering and resampling to refine the sample domain, plus two feature subset evaluation methods to select reliable features. Naseriparsa et al. (2013)
Although many high-end computational algorithms can substantially advance feature extraction,
the knowledge of the domain expert remains critical, because it is often difficult to quantitatively
assess the accuracy or performance of each processing step. For instance, the selection of feature
detectors and the number of features to be selected are often determined according to human
intuition and interpretation.
FIGURE 5.3 Clouds extracted from one observed Landsat-8 OLI scene by using the thresholding method.
(a) Original image; (b) extracted clouds.
The process of extracting features based on a threshold is simple and straightforward, but the
key is to determine an optimal threshold value. For example, by setting a proper threshold value for
the cloud-contaminated surface reflectance imagery observed by the Landsat 8 Operational Land
Imager (OLI), clouds can be easily detected and extracted from the original imagery (Figure 5.3).
The whole process can be modeled as
DN_i' = \begin{cases} 0, & DN_i < \theta \\ 1, & DN_i \geq \theta \end{cases} \qquad (5.1)
where DN_i and DN_i' denote the digital number at pixel i in the original image and in the
segmented binary image (e.g., Figure 5.3b), respectively; θ is the threshold value to be determined
by an expert with a priori knowledge or by other advanced methods.
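For illustration, a minimal MATLAB sketch of Equation 5.1 follows; the file name and the threshold value are hypothetical and would need to be tuned for each scene.

img = imread('landsat8_oli_band.tif');   % hypothetical single-band file name
DN  = double(img);                       % pixel values DN_i
theta = 0.35;                            % assumed threshold for bright clouds
mask = DN >= theta;                      % Eq. 5.1: 1 where DN_i >= theta
% mask is the segmented binary image DN' (cf. Figure 5.3b)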
Although the thresholding method is capable of extracting certain features effectively from a
given input image, determining an optimal threshold is not easy without the aid of a series
of experiments or a priori knowledge. Furthermore, a single threshold value may not suffice
to handle all features in one image with various properties (Lv et al., 2017), for example, land
use classification in complex urban regions. To cope with such a complex problem, two or more
thresholds can be used to separate each type of feature sequentially.
PCA is a classic statistical technique that has been commonly used to decorrelate a set of
possibly correlated variables by projecting the original space onto orthogonal spaces
(Pearson, 1901). After the transformation, the resulting vectors form an uncorrelated orthogonal
basis set whose members are termed principal components (PCs). Theoretically, the total number of
resulting PCs should not be greater than the dimension of the input data set, with the first PC
accounting for the largest variance of the input data and each subsequent component in turn
accounting for the highest remaining variance while being orthogonal to the preceding components.
Because of this property, PCA has been extensively used in processing remotely sensed imagery,
especially for the purpose of dimensionality reduction.
Because the resulting PCs are orthogonal to one another, PCA can be applied for feature
extraction by capturing the largest variance in one specific space; each PC can be considered a
unique feature that integrates most of the relevant spectral information. Compared to other methods,
PCA has great advantages due to its low complexity, the absence of predetermined parameters and,
last but not least, the fact that PCs are orthogonal to each other.
An illustrative example of applying PCA to remotely sensed imagery is demonstrated in Figure 5.4.
The first PC explained a total variance of 71.3%, mainly representing the texture characteristics of
the observed scene, whereas the second PC explained a total variance of 23.4%, mainly emphasizing
rivers and built-up areas.
FIGURE 5.4 Principal component analysis performed on multispectral images. (a) Landsat-5 TM true color
image; (b) the first principal component; (c) the second principal component.
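A minimal sketch of this use of PCA in base MATLAB follows; it assumes the multispectral scene has already been reshaped into an N-by-B matrix X of N pixels and B bands (the variable names are ours, for illustration only).

Xc = X - mean(X, 1);                   % center each band
C  = (Xc' * Xc) / (size(Xc, 1) - 1);   % B-by-B covariance matrix
[V, D] = eig(C, 'vector');             % eigenvectors and eigenvalues
[D, order] = sort(D, 'descend');       % order PCs by explained variance
V = V(:, order);
explained = 100 * D / sum(D);          % percent variance per component
PCs = Xc * V;                          % PCs(:, 1) is the first component

For a scene like that of Figure 5.4, explained(1) and explained(2) would correspond to the reported 71.3% and 23.4%.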
Band math can be linked to the development of some bio-optical models that are functions of
certain bands. Remotely sensed imagery recorded at multiple wavelengths provides a synergistic
opportunity to better monitor the changing Earth environment because of its increased spectral
information content. Based on differences in absorption spectra, objects viewed in one scene can be
separated by generating composite images in which specific features are emphasized through
mathematical calculations between images observed at different wavelengths. The process of
generating a composite image from spectral information at different wavelengths with mathematical
tools is referred to as band math. Compared to the original imagery with multiple observations, the
composite image is more intuitive. One representative example is the Normalized Difference
Vegetation Index (NDVI), which is commonly used to assess whether the target being observed
contains live green vegetation. NDVI is calculated from the difference in absorption by green
vegetation at the red and near-infrared wavelengths, which can be modeled as follows (Rouse
et al., 1974):
NDVI = \frac{NIR - Red}{NIR + Red} \qquad (5.2)
where NIR and Red stand for the surface spectral reflectance measurements acquired in the
near-infrared and red wavelength regions, respectively.
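As a short MATLAB sketch of Equation 5.2, assuming the red and near-infrared reflectance bands of a Landsat-5 TM scene have been exported to the hypothetical files named below:

nir  = double(imread('LT05_band4.tif'));  % near-infrared band (assumed file)
red  = double(imread('LT05_band3.tif'));  % red band (assumed file)
ndvi = (nir - red) ./ (nir + red);        % element-wise NDVI, Eq. 5.2
vegetation = ndvi > 0.4;                  % dense green vegetation mask
% note: pixels where nir + red == 0 yield NaN and should be masked out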
Statistically, NDVI values vary between −1.0 and +1.0, with green vegetation typically
possessing positive values. Negative values of NDVI (<−0.1) often correspond to non-vegetation
objects, for example, water bodies, and values close to zero (−0.1 to +0.1) commonly correspond to
barren areas of rock, sand, or snow. By contrast, low positive values of NDVI (0.2 to 0.4)
correspond to sparsely vegetated areas such as shrub and grassland, while high values (>0.4)
indicate areas covered with dense green vegetation (e.g., temperate and tropical rainforests).
Based on these characteristics, NDVI has been considered a good graphic indicator for measuring
green vegetation coverage on Earth's surface. Hence, vegetation features can be easily extracted
from a given remotely sensed image after calculating the NDVI values. The spatial distribution of
green vegetation shown in Figure 5.4a can be easily detected and separated from other targets
within the scene based on the calculated NDVI values (Figure 5.5). Similarly, by calculating
different water feature-related indexes from a series of multitemporal Landsat 5-TM, 7-ETM+, and
8-OLI images, the spatiotemporal changes of the surface area of Lake Urmia in the Middle East
during 2000–2013 were investigated (Rokni et al., 2014).
FIGURE 5.5 Calculated NDVI based on Landsat-5 TM surface reflectance at red and near-infrared
wavelengths (color scale ranging from −0.43 to 0.35).
FIGURE 5.6 (a) One Worldview observed scene on August 13, 2010; (b) a segmented image from (a) based
on shape, texture, and spectral characteristics.
Many spatial feature extraction methods have been developed, such as spatial structural features for
urban areas (Huang et al., 2007), the shape-size index (Han et al., 2012), and morphological profiles
(Benediktsson et al., 2003; Huang et al., 2014; Chunsen et al., 2016). Although the fundamental
theory of each method differs, the basic process is almost the same, since these methods work
mainly in the spatial domain to detect and extract features of the targets being observed.
In real-world applications, spatial and spectral attributes are often used in conjunction with
one another to provide synergistic capability for complex feature extraction tasks, such as
high-resolution imagery classification (Shahdoosti and Mirzapour, 2017). For instance, urban
morphology is commonly characterized by a complex and variable coexistence of diverse,
spatially and spectrally heterogeneous objects (Boltz, 2004); hence the classification of land use
types in urban regions is by no means an easy task. To cope with such a complex problem, both
spatial and spectral attributes should be considered simultaneously to extract multiple land use
types effectively. Because features are detected and extracted for individual objects, these methods
are also referred to as object-based feature extraction (Taubenböck et al., 2010; Shruthi et al., 2011;
Lv et al., 2017). For example, by considering both the spatial and spectral attributes of objects
observed in one Worldview image, different types of ground targets can be detected and extracted
with high accuracy (Figure 5.6).
FIGURE 5.7 Examples of supervised and unsupervised methods to automate the feature extraction process:
classification (logistic regression, classification trees, random forests), regression (linear regression,
decision trees, fuzzy classification), clustering (K-means clustering, K-nearest neighbor, hierarchical
clustering), and dimensionality reduction (principal component analysis, linear discriminant analysis,
tensor decomposition).
In supervised learning, a function inferred from labeled training samples is expected to generalize to
new inputs. In general, a robust learning algorithm should work well on unseen data, and this
motivates the use of cross-validation for performance evaluation.
In order to perform feature extraction in a supervised learning manner, one must go through the
following primary steps:
1. Determine what data should be used: This is essential for any exploratory data analysis; the
user must first have a clear idea about which data are to be used as training inputs. In the case of
land cover classification, for example, a possible training set could be constructed from remotely
sensed panchromatic, multispectral, or hyperspectral imagery.
2. Construct an optimal training set: A supervised learning algorithm generalizes a function from
the given training set (i.e., pairwise training samples and targets) and then applies this inferred
function to map the unseen data. Thus, the training set should be comprehensive and representative,
because the final accuracy of the generalized function depends largely on how well the input-output
correspondence is modeled. To facilitate the advanced learning processes, the number of features is
usually determined based on expert knowledge, so that redundant information can be significantly
reduced in the training sets. Otherwise, the learning burden can become very heavy due to a large
amount of irrelevant information. Moreover, the learning algorithm might fail to generalize a proper
function for the given input because of high dimensionality. On the other hand, the number of
features should not be too small, as the training inputs should contain adequate information to
represent all possible cases of the target so as to accurately predict the unseen data.
3. Select a suitable learning algorithm: To date, a wide range of learning algorithms is available,
and the user can select among them for a specific application by considering the strengths and
weaknesses of each. In addition, the structure of the learning algorithm must be determined
simultaneously; for example, the number of hidden layers and hidden neurons defines the structure
of an Artificial Neural Network (ANN) model. Since no single method suits all types of problems,
the user should select the algorithm most suitable for the specified real-world application.
4. Choose a stopping criterion for the learning process: Once a training set and learning algorithm
are determined, the learning process can be started by generating a set of models with the given
learning algorithm to build relationships between dependent and independent variables in the
training set. In supervised learning, however, certain control parameters (or stopping criteria) are
required to stop the learning process, particularly for machine learning algorithms such as ANN
and Genetic Programming (GP). These parameters can be tuned through an optimization algorithm
or defined by the user by setting certain criteria via cross-validation.
5. Examine the performance of the learned function: After parameter optimization, the
performance of the inferred function should be carefully evaluated. The accuracy is commonly
assessed by mapping a separate subset that differs from the training set. Statistical comparisons are
performed between the predicted output and the desired values to check the overall accuracy of the
inferred function. Once the accuracy meets the anticipated level, the whole learning process is
finished and the inferred function can then be applied to map unseen data (a minimal sketch of this
workflow follows this list).
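The following base-MATLAB sketch illustrates the workflow above under stated assumptions: X is an N-by-B matrix of pixel features, y is an N-by-1 label vector, and a simple nearest-centroid rule stands in for the learning algorithm of step 3. It is an illustration only, not a recommended classifier.

n = size(X, 1);
idx = randperm(n);                         % step 2: assumed random 70/30 split
nTr = round(0.7 * n);
Xtr = X(idx(1:nTr), :);     ytr = y(idx(1:nTr));
Xte = X(idx(nTr+1:end), :); yte = y(idx(nTr+1:end));
classes = unique(ytr);
C = zeros(numel(classes), size(X, 2));
for k = 1:numel(classes)                   % steps 3-4: "training" reduces to
    C(k, :) = mean(Xtr(ytr == classes(k), :), 1);   % one centroid per class
end
% step 5: label each held-out pixel by its nearest centroid (squared
% Euclidean distance, expanded so no toolbox functions are needed)
d = sum(Xte.^2, 2) + sum(C.^2, 2)' - 2 * Xte * C';
[~, nearest] = min(d, [], 2);
yhat = classes(nearest);
accuracy = mean(yhat == yte);              % overall accuracy on unseen data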
Supervised learning has been widely used in environmental remote sensing, and the most
common applications of feature extraction in remote sensing are remotely sensed image classification
and environmental modeling via machine learning tools. For example, Zhao and Du (2016)
developed a spectral–spatial feature-based classification framework to advance hyperspectral image
classification by using the trained multiple-feature-based classifier. In this framework, a balanced
local discriminant-embedding algorithm was proposed for spectral feature extraction from high-
dimensional hyperspectral data sets while a convolutional neural network was utilized to automatically
find spatial-related features at high levels. With the aid of GP, Chang et al. (2014) successfully
predicted the total organic carbon concentration in William H. Harsha Lake during 2008–2012 based
on the in situ measurements and the fused satellite-based remote sensing reflectance imagery. The
feature extraction performed in these two examples is realized in terms of supervised learning.
Despite the effectiveness of supervised learning algorithms, several major issues with respect to
supervised learning should be noted as well. The first is the tradeoff between bias and variance, as
the prediction error of a learned function is related to the sum of the bias and the variance of the
learning algorithm (Geman et al., 1992; James, 2003). Generally, a learning algorithm with low
bias should be flexible enough to fit data well. However, a large variance will be observed in the
predicted output if the algorithm is too flexible (e.g., fits each training set differently). Thus, a good
learning algorithm should be able to adjust this tradeoff automatically.
The second issue concerns the amount of training data relative to the complexity of the true function.
A small amount of training data will suffice if the inherent function is simple; otherwise, a large
volume of data is required if the true function is highly complex. The third issue is related to the
dimensionality of the input space. If the dimensionality of the input features is high, the learning
process can be very difficult, because high-dimensional inputs can confuse the learning algorithm,
causing it to generalize poorly or become trapped in local optima. High input dimensionality
therefore typically requires the learning algorithm to be tuned toward low variance and high bias,
and this motivates the development of dimensionality reduction algorithms.
The fourth issue is the noise level embedded in the desired output. When the provided values of the
desired output are often incorrect, the learning algorithm should not attempt to fit a function to the
training set exactly, so as to avoid overfitting the noise. In such a situation, early stopping as
well as removing noisy samples from the training set prior to the learning process can be of help.
In addition to the above-mentioned four major issues, other aspects such as the redundancy and
heterogeneity of data should also be considered in performing supervised learning tasks.
Semi-supervised learning falls between supervised and unsupervised learning. More accurately, it
can be considered a class of supervised learning tasks to a certain extent, because it makes use of
labeled data in the learning process. In other words, the desired output values are provided for a
subset of the training data, whereas the remainder is unlabeled. Typically, the amount of labeled
data used is much smaller than that of the unlabeled data. Despite this, the use of a small amount of
labeled data in conjunction with unlabeled data may result in considerable improvement in learning
accuracy (Zhu, 2008). Therefore, semi-supervised learning can be of great practical value in
real-world applications.
By taking advantage of the combined information from labeled and unlabeled data,
semi-supervised learning attempts to surpass the performance that could be obtained from either
supervised or unsupervised learning on each individual data set. In order to make use of unlabeled
data, the structure of the input data must satisfy at least one of the following assumptions
(Chapelle et al., 2006):
• Continuity assumption: Data close to each other are more likely to be labeled in the same
class. This is generally assumed in supervised learning and should also be obeyed in the
case of semi-supervised learning. This assumption yields a preference for geometrically
simple decision boundaries even in low-density regions to guarantee that fewer points in
different classes are close to each other.
• Cluster assumption: Discrete clusters can be formed and data in the same cluster tend to
be labeled in the same class. This is a special case of the continuity assumption that gives
rise to clustering-based feature learning.
• Manifold assumption: The data tend to lie on a manifold of much lower dimension than
that of the input data. Learning can proceed using distances and densities defined on the
manifold with both the labeled and unlabeled data to avoid dimensionality issues. The
manifold assumption is practical, especially for high-dimensional data with a few degrees
of freedom that are hard to model directly.
Due to the difficulty in acquiring a large volume of labeled data and the availability of vast
amounts of unlabeled data, semi-supervised learning has recently gained more popularity. In the
case of semi-supervised learning, unlabeled data are commonly used to either modify or reprioritize
hypotheses obtained from labeled data alone to aid in feature extraction (Zhu, 2008). Many different
semi-supervised methods and algorithms have been developed over the past decades, such as
generative models, which are considered the oldest semi-supervised learning approach, grounded
in probabilistic theory (Zhu, 2008). Many other methods can also be applied, such as the
transductive support vector machine (Vapnik, 1998; Yu et al., 2012), information regularization
(Corduneanu and Jaakkola, 2002; Szummer and Jaakkola, 2002), and graph-based methods
(Camps-Valls et al., 2007). More details of these methods will be introduced in the following chapters.
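As an illustration only, the following base-MATLAB sketch implements a simple self-training loop, one generic way of exploiting unlabeled data; it is not one of the cited algorithms. Xl and yl are assumed to hold a small labeled set (features and a column vector of labels), and Xu the unlabeled features.

for iter = 1:5                                  % assumed number of rounds
    classes = unique(yl);
    C = zeros(numel(classes), size(Xl, 2));
    for k = 1:numel(classes)
        C(k, :) = mean(Xl(yl == classes(k), :), 1);  % class centroids
    end
    % squared distance from each unlabeled sample to each centroid
    d = sum(Xu.^2, 2) + sum(C.^2, 2)' - 2 * Xu * C';
    [dmin, lab] = min(d, [], 2);
    keep = dmin < median(dmin);                 % pseudo-label the closer half
    Xl = [Xl; Xu(keep, :)];                     % grow the labeled set
    yl = [yl; classes(lab(keep))];
    Xu(keep, :) = [];                           % shrink the unlabeled pool
    if isempty(Xu), break; end
end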
Neural computing methods are data-driven methods that generally have high fault tolerance
(Giacinto and Roli, 2001; Canty, 2009; Kavzoglu, 2009). Many successful applications of remote
sensing image classification use classifiers based on various ANNs, such as Backward Propagation
(BP), Radial Basis Function (RBF), and Self-Organized Mapping (SOM) networks (Heermann and
Khazenie, 1992; Hoi-Ming and Ersoy, 2005; Suresh et al., 2008), as well as global optimization
techniques such as the Support Vector Machine (SVM) (Foody and Mathur, 2004a,b). In ANNs
such as BP, RBF, and SOM, an input space is mapped onto a feature space through the hidden layer,
resulting in a nonlinear classifier that outperforms most traditional statistical methods. However,
these ANNs are all "black box" models whose classification mechanisms are difficult to interpret.
Problems such as overfitting, local minima, and slow convergence are quite common for neural
computing methods. SVM differs radically from ANNs because SVM training always converges to
a global minimum, and its simple geometric interpretation provides opportunities for advanced
optimization. While ANNs are limited by multiple local minima, the solution to an SVM is global
and unique; moreover, ANNs use empirical risk minimization, whereas SVMs adopt structural
risk minimization.
Classifiers based on fuzzy logic are much easier to interpret, because the classification is usually
implemented according to rules summarized from the training data set. Most fuzzy logic methods
are hybrid methods; for example, the Fuzzy C-Means (FCM) algorithm (Fan et al., 2009) is a hybrid
of fuzzy logic and a statistical algorithm (c-means). Classifiers based on Fuzzy Neural Networks
(FNN) (Chen et al., 2009) and Fuzzy ARTMAP (FA) (Han et al., 2004), which combine fuzzy logic
and neural networks, have also been reported. However, involving fuzzy logic in these hybrid
algorithms (i.e., FCM, FNN, and FA) may enlarge the uncertainty in the final classification.
Evolutionary algorithms are another category of machine learning techniques that have
been widely used in remote sensing image classification. Genetic Algorithm (GA), Evolutionary
Programming (EP), and GP are several classical evolutionary algorithms with many successful
applications (Agnelli et al., 2002; Ross et al., 2005; Makkeasorn et al., 2006; Awad et al., 2007; Chang
et al., 2009; Makkeasorn and Chang, 2009). Classifiers based on Artificial Immune System (AIS)
(Zhong et al., 2007) and swarm intelligence (Daamouche and Melgani, 2009) can also be included
in this category. In addition, classifiers such as those based on expert system theory (Stefanov et al.,
2001) and decision tree techniques (Friedl and Brodley, 1997) are also representative and important
classification methods. The current progress in the literature can be summarized based on the above
findings (Table 5.3). Hybrid learning algorithms integrating ML, FCM, FNN, FA, or KNN with
ANN, SOM, RBF, BP, SVM, or GP to form unique learning systems for specific feature extraction
can be anticipated.

TABLE 5.3
Summary of Classification Methods in Image Processing

Statistical Methods: ML, KNN, K-means
Intelligence (Artificial Neural Networks): BP, RBF, SOM
Intelligence (Global Optimization): SVM
Intelligence (Fuzzy Logic): FCM, FA, FNN
Intelligence (Evolutionary Algorithms): GA, EP, GP, AIS
Other Methods: Expert system, Decision tree

Source: Chang, N. B. (Ed.), 2012. Environmental Remote Sensing and Systems Analysis. CRC Press, Boca Raton, FL.
Note: Maximum Likelihood (ML), K-Nearest Neighbor (KNN), Backward Propagation (BP), Radial Basis Function (RBF), Self-Organized Mapping (SOM), Support Vector Machine (SVM), Fuzzy C-Means (FCM), Fuzzy ARTMAP (FA), Fuzzy Neural Network (FNN), Genetic Algorithm (GA), Evolutionary Programming (EP), Genetic Programming (GP), Artificial Immune System (AIS).
One of the most commonly used performance measures is the Overall Accuracy (OA), which can be
computed from the confusion matrix shown in Figure 5.8 as:

OA = \frac{TP + TN}{P + N} \qquad (5.3)
Another measure is the Kappa coefficient, which measures the agreement between two raters
each of which classifies N items into C mutually exclusive categories (Galton, 1892; Smeeton, 1985).
FIGURE 5.8 A 2 × 2 confusion matrix with P positive instances and N negative instances. The true
condition (positive or negative) is tabulated against the predicted condition, yielding true positives (TP),
false negatives (FN), false positives (FP), and true negatives (TN). P is the total number of true positive
instances (equivalent to TP + FN) and N is the total number of true negative instances (equivalent to
FP + TN).
In other words, it is a measure of how the classification results compare to values assigned by
chance. In contrast to the simple percent agreement calculation, the Kappa coefficient is generally
thought to be more robust, because it accounts for the possibility of agreement occurring by chance
(Fauvel et al., 2008). Conceptually, it can be defined as:
KC = \frac{p_o - p_e}{1 - p_e} \qquad (5.4)
where p_o denotes the observed proportionate agreement (equivalent to the OA) and p_e denotes the
overall probability of random agreement, which can be calculated as:
p_e = \frac{TP + FN}{P + N} \cdot \frac{TP + FP}{P + N} + \frac{FP + TN}{P + N} \cdot \frac{FN + TN}{P + N} \qquad (5.5)
Kappa values range from 0 to 1. A value of 0 means that the agreement between the predicted
condition and the actual condition is no better than would be expected by chance, while a value of 1
indicates that the predicted condition and the actual condition are identical (i.e., perfect agreement).
Hence, the larger the value of the Kappa coefficient, the more accurate the result. However, some
researchers have expressed concern that the Kappa coefficient is an overly conservative measure of
agreement, because it tends to take the observed category frequencies as givens, making it
unreliable for measuring agreement in situations with limited observations (Wu and Yang, 2005;
Strijbos et al., 2006).
Apart from the overall accuracy and the Kappa coefficient to measure the general performance,
the accuracy of class identification should also be assessed. Within such a context, statistics like
errors of commission and/or errors of omission can be computed. Errors of commission are a
measure of false positives, representing the fraction of values that were predicted to be in a class
but do not belong to that class. In contrast, errors of omission are a measure of false negatives,
representing the fraction of values that belong to a class but were predicted to be in a different class.
Hence, for the condition-positive class shown in Figure 5.8, the Commission Error (CE) and the
Omission Error (OE) can be calculated as:

CE = \frac{FP}{TP + FP} \qquad (5.6)

OE = \frac{FN}{P} \qquad (5.7)
In addition, Producer Accuracy (PA) and User Accuracy (UA) are two other performance
measures commonly computed for the performance assessment of classification.
The PA shows the probability that a value in a given class was classified correctly; it is
calculated as the number of pixels correctly classified in a particular category divided by
the total number of pixels actually belonging to that category.
PA = \frac{TP}{TP + FN} \qquad (5.8)
The UA shows the probability that a value predicted to be in a certain class really belongs to that
class, which can be calculated as the fraction of correctly predicted values to the total number of
values predicted to be in a class.
UA = \frac{TP}{TP + FP} \qquad (5.9)
It is clear that the PA and OE complement each other, as they sum to one; the same holds for the
UA and CE.
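To make these measures concrete, the following base-MATLAB sketch computes Equations 5.3 through 5.9 from the four confusion-matrix counts of Figure 5.8; the counts themselves are hypothetical values chosen only for illustration.

TP = 90; FN = 10; FP = 20; TN = 80;   % assumed example counts (Figure 5.8)
P = TP + FN;                          % condition-positive total
N = FP + TN;                          % condition-negative total
OA = (TP + TN) / (P + N);             % overall accuracy, Eq. 5.3
po = OA;                              % observed proportionate agreement
pe = ((TP+FN)/(P+N))*((TP+FP)/(P+N)) + ((FP+TN)/(P+N))*((FN+TN)/(P+N)); % Eq. 5.5
KC = (po - pe) / (1 - pe);            % Kappa coefficient, Eq. 5.4
CE = FP / (TP + FP);                  % commission error, Eq. 5.6
OE = FN / P;                          % omission error, Eq. 5.7
PA = TP / P;                          % producer accuracy, Eq. 5.8
UA = TP / (TP + FP);                  % user accuracy, Eq. 5.9
% consistency checks: PA + OE and UA + CE should both equal 1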
Furthermore, statistics like Partition Coefficient (Bezdek, 1973), Fukuyama-Sugeno index
(Fukuyama and Sugeno, 1989), Fuzzy Hyper Volume (Gath and Geva, 1989), β index (Pal et al.,
2000), Xie-Beni index (Xie and Beni, 1991), and many others (Congalton, 1991; Wu and Yang, 2005)
can also be applied. To assess the statistical significance of differences in classification results,
methods such as McNemar's test can be further applied, which is based on a standardized normal
test statistic (Foody, 2004). The parameter Z12 in McNemar's test is defined as:

Z_{12} = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}} \qquad (5.10)
where f12 denotes the number of samples classified correctly by classifier 1 but incorrectly by
classifier 2, and f21 denotes the number classified correctly by classifier 2 but incorrectly by
classifier 1. A positive Z12 indicates that classifier 1 outperforms classifier 2, while a negative value
indicates the opposite. The difference in accuracy between two classifiers is considered to be
statistically significant when |Z12| > 1.96 (Fauvel et al., 2008; Imani and Ghassemian, 2016).
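As a minimal sketch of Equation 5.10 in base MATLAB, assuming yTrue holds the reference labels and yhat1 and yhat2 hold the maps produced by the two classifiers being compared (all three variable names are hypothetical):

c1 = (yhat1 == yTrue);                 % correctness of classifier 1
c2 = (yhat2 == yTrue);                 % correctness of classifier 2
f12 = sum(c1 & ~c2);                   % correct by 1, wrong by 2
f21 = sum(~c1 & c2);                   % correct by 2, wrong by 1
Z12 = (f12 - f21) / sqrt(f12 + f21);   % Eq. 5.10
significant = abs(Z12) > 1.96;         % 5% significance level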
5.9 SUMMARY
In this chapter, the basic concepts and fundamentals of feature extraction were introduced, including
the definitions of feature, feature selection, and feature extraction. Based on the domain of interest,
feature extraction methods can be grouped into spectral- and spatial-based approaches, while, based
on their working modes, feature extraction techniques can be divided into supervised,
semi-supervised, and unsupervised methods. Illustrative examples were also provided for
demonstration purposes. In addition, a set of statistical indicators was introduced as performance
measures for evaluating the resulting outcomes.
It is clear that developing a robust feature extraction workflow is by no means a simple task, since
it requires us to gain a thorough understanding of the input data, to devote the requisite time for
pre-processing, and to effectively apply the elements of image interpretation for decision analysis.
In the next chapter, a wealth of traditional methods and approaches that were proposed for feature
extraction will be introduced and discussed.
REFERENCES
Agnelli, D., Bollini, A., and Lombardi, L., 2002. Image classification: An evolutionary approach. Pattern
Recognition Letters, 23, 303–309.
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P., 2005. Automatic subspace clustering of high
dimensional data. Data Mining and Knowledge Discovery, 11, 5–33.
Ahmad, A., 2012. Analysis of maximum likelihood classification on multispectral data. Applied Mathematical
Sciences, 6, 6425–6436.
Akhtar, U. and Hassan, M., 2015. Big data mining based on computational intelligence and fuzzy clustering.
In: Zaman, N., Seliaman, M. E., Hassan, M. F. and Marquez, F. P. G. (Eds.), Handbook of Research on
Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 130–148.
Ashoka, H. N., Manjaiah, D. H., and Rabindranath, B., 2012. Feature extraction technique for neural network
based pattern recognition. International Journal of Computer Science and Engineering, 4, 331–340.
Awad, M., Chehdi, K., and Nasri, A., 2007. Multicomponent image segmentation using a genetic algorithm
and artificial neural network. IEEE Geoscience and Remote Sensing Letters, 4, 571–575.
Backstrom, L. and Caruana, R., 2006. C2FS: An algorithm for feature selection in cascade neural networks.
In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, Vancouver, BC,
Canada, 4748–4753.
Baraldi, A. and Parmiggiani, F., 1995. An investigation of the textural characteristics associated with gray
level cooccurrence matrix statistical parameters. IEEE Transactions on Geoscience and Remote Sensing,
33, 293–304.
Belkin, M., Niyogi, P., and Sindhwani, V., 2006. Manifold regularization: A geometric framework for learning
from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.
Benediktsson, J. A., Pesaresi, M., and Arnason, K., 2003. Classification and feature extraction for remote
sensing images from urban areas based on morphological transformations. IEEE Transactions on
Geoscience and Remote Sensing, 41, 1940–1949.
Bermingham, M. L., Pong-Wong, R., Spiliopoulou, A., Hayward, C., Rudan, I., Campbell, H., Wright, A. F.,
et al., 2015. Application of high-dimensional feature selection: Evaluation for genomic prediction in
man. Scientific Reports, 5, 10312.
Bezdek, J. C., 1973. Cluster validity with fuzzy sets. Cybernetics and Systems, 3, 58–73.
Bhagabati, B. and Sarma, K. K., 2016. Application of face recognition techniques in video for biometric
security. In: Gupta, B., Dharma, P., Agrawal, D. P., and Yamaguchi, S. (Eds.), Handbook of Research on
Modern Cryptographic Solutions for Computer and Cyber Security, IGI Global, 460–478.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer-Verlag, New York.
Blanzieri, E. and Melgani, F., 2008. Nearest neighbor classification of remote sensing images with the maximal
margin principle. IEEE Transactions on Geoscience and Remote Sensing, 46, 1804–1811.
Boltz, S., 2004. Statistical region merging code. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 26, 1452–1458.
Bouman, C. A. and Shapiro, M., 1994. A multiscale random field model for Bayesian image segmentation.
IEEE Transactions on Image Processing, 3, 162–177.
Camps-Valls, G., Bandos Marsheva, T. V., and Zhou, D., 2007. Semi-supervised graph-based hyperspectral
image classification. IEEE Transactions on Geoscience and Remote Sensing, 45, 3044–3054.
Canty, M. J. 2009. Boosting a fast neural network for supervised land cover classification. Computers &
Geosciences, 35, 1280–1295.
Carson, C., Belongie, S., Greenspan, H., and Malik, J., 2002. Blobworld: Image segmentation using expectation-
maximization and its application to image querying. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 24, 1026–1038.
Castro, P. A. D. and Von Zuben, F. J., 2009. Learning Bayesian networks to perform feature selection. In: 2009
International Joint Conference on Neural Networks, Atlanta, GA, USA, 467–473.
Cattell, R. B., 1943. The description of personality: Basic traits resolved into clusters. Journal of Abnormal &
Social Psychology, 38, 476–506.
Chang, N. B., Daranpob, A., Yang, J. and Jin, K. R., 2009. Comparative data mining analysis for information
retrieval of MODIS images: Monitoring lake turbidity changes at Lake Okeechobee, Florida. Journal of
Applied Remote Sensing, 3, 033549.
Chang, N.-B., Vannah, B. W., Yang, Y. J., and Elovitz, M., 2014. Integrated data fusion and mining techniques
for monitoring total organic carbon concentrations in a lake. International Journal of Remote Sensing,
35, 1064–1093.
Chapelle, O., Schölkopf, B., and Zien, A., 2006. Semi-supervised Learning. MIT Press, Cambridge, MA.
Chen, H. W., Chang, N. B., Yu, R. F., and Huang, Y. W., 2009. Urban land use and land cover classification
using the neural-fuzzy inference approach with Formosat-2 data. Journal of Applied Remote Sensing,
3, 033558.
Chen, X., Li, H., and Gu, Y., 2014. Multiview Feature Selection for Very High Resolution Remote Sensing
Images. In: 2014 Fourth International Conference on Instrumentation and Measurement, Computer,
Communication and Control, Harbin, China, 539–543.
Chunsen, Z., Yiwei, Z., and Chenyi, F., 2016. Spectral–spatial classification of hyperspectral images using
probabilistic weighted strategy for multifeature fusion. IEEE Geoscience and Remote Sensing Letters,
13, 1562–1566.
Congalton, R. G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote
Sensing of Environment, 37, 35–46.
Corduneanu, A. and Jaakkola, T., 2002. On information regularization. In: Proceedings of the Nineteenth
Conference on Uncertainty in Artificial Intelligence, Acapulco, Mexico, 151–158.
Daamouche, A. and Melgani, F., 2009. Swarm intelligence approach to wavelet design for hyperspectral image
classification. IEEE Geoscience and Remote Sensing Letters, 6(4), 825–829.
Das, S., 2001. Filters, wrappers and a boosting-based hybrid for feature selection. In: Proceedings of the
Eighteenth International Conference on Machine Learning (ICML’01), San Francisco, CA, USA,
Morgan Kaufmann Publisher, 74–81.
Durfee, A., 2006. Text mining. In: Garson, G. D. and Khosrow-Pour, M. (Eds.), Handbook of Research on Public
Information Technology, IGI Global, 592–603.
Duval, B., Hao, J.-K., and Hernandez Hernandez, J. C., 2009. A memetic algorithm for gene selection and
molecular classification of cancer. In: Proceedings of the 11th Annual Conference on Genetic and
Evolutionary Computation—GECCO ’09, New York, New York, USA, ACM Press.
Elnemr, H. A., Zayed, N. M., and Fakhreldein, M. A., 2016. Feature extraction techniques: Fundamental
concepts and survey. In: Kamila, N. K. (Ed.), Handbook of Research on Emerging Perspectives in
Intelligent Pattern Recognition, Analysis, and Image Processing, IGI Global, 264–294.
Ester, M., Kriegel, H. P., Sander, J., and Xu, X., 1996. A density-based algorithm for discovering clusters in
large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge
Discovery and Data Mining, Portland, Oregon, USA, 226–231.
Everitt, B. S., Landau, S., Leese, M., and Stahl, D., 2011. Hierarchical Clustering, in Cluster Analysis, 5th
Edition. John Wiley & Sons, Ltd, Chichester, UK.
Fan, J., Han, M., and Wang, J., 2009. Single point iterative weighted fuzzy C-means clustering algorithm for
remote sensing image segmentation. Pattern Recognition, 42, 2527–2540.
Fauvel, M., Benediktsson, J. A., Chanussot, J., and Sveinsson, J. R., 2008. Spectral and spatial classification
of hyperspectral data using SVMs and morphological profiles. IEEE Transactions on Geoscience and
Remote Sensing, 46, 3804–3814.
Foody, G. M., 2004. Thematic map comparison: Evaluating the statistical significance of differences in
classification accuracy. Photogrammetric Engineering & Remote Sensing, 70, 627–633.
Foody, G. M. and Mathur, A., 2004a. A relative evaluation of multiclass image classification by support vector
machines. IEEE Transactions on Geoscience and Remote Sensing, 42, 1335–1343.
Foody, G. M. and Mathur, A., 2004b. Toward intelligent training of supervised image classifications: Directing
training data acquisition for SVM classification. Remote Sensing of Environment, 93, 107–117.
Forgy, E. W., 1965. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications.
Biometrics, 21, 768–769.
Friedl, M. A. and Brodley, C. E., 1997. Decision tree classification of land cover from remotely sensed data.
Remote Sensing of Environment, 61, 399–409.
Fukuyama, Y. and Sugeno, M., 1989. A new method of choosing the number of clusters for the fuzzy C-means
method. In: Proceedings of the 5th Fuzzy Systems Symposium, 247–250.
Galton, F., 1892. Finger Prints. Macmillan, London.
Gath, I. and Geva, A. B., 1989. Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 11, 773–780.
Geman, S., Bienenstock, E., and Doursat, R., 1992. Neural Networks and the Bias/Variance Dilemma. Neural
Computation, 4, 1–58.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C., 2010. Variable selection using random forests. Pattern
Recognition Letters, 31, 2225–2236.
Giacinto, G. and Roli, F., 2001. Design of effective neural network ensembles for image classification purposes.
Image and Vision Computing, 19, 699–707.
Glass, H. and Cooper, L., 1965. Sequential search: A method for solving constrained optimization problems.
Journal of the ACM, 12, 71–82.
Gopalakrishnan, V., 2009. Computer aided knowledge discovery in biomedicine. In: Daskalaki, A. (Ed.),
Handbook of Research on Systems Biology Applications in Medicine, IGI Global, 126–141.
Guyon, I. and Elisseeff, A., 2003. An introduction to variable and feature selection. Journal of Machine
Learning Research, 3, 1157–1182.
Guyon, I. and Elisseeff, A., 2006. An Introduction to Feature Extraction, in: Feature Extraction: Foundations
and Applications. Springer, Berlin, Heidelberg, 1–25.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V., 2002. Gene selection for cancer classification using support
vector machines. Machine Learning, 46, 389–422.
Haindl, M., Somol, P., Ververidis, D., and Kotropoulos, C., 2006. Feature selection based on mutual correlation.
In: Progress in Pattern Recognition, Image Analysis and Applications, Havana, Cuba, 569–577.
Hall, M. A., 1999. Correlation-based Feature Selection for Machine Learning. University of Waikato,
Hamilton, New Zealand.
Han, Y., Kim, H., Choi, J., and Kim, Y., 2012. A shape–size index extraction for classification of high resolution
multispectral satellite images. International Journal of Remote Sensing, 33, 1682–1700.
Han, M., Tang, X., and Cheng, L., 2004. An improved fuzzy ARTMAP network and its application in wetland
classification. In: Proceedings of 2004 IEEE International Geoscience and Remote Sensing Symposium,
Alaska, USA.
Haralick, R. M., Shanmugam, K., and Dinstein, I., 1973. Textural features for image classification. IEEE
Transactions on Systems, Man, and Cybernetics, SMC-3, 610–621.
Heermann, P. D. and Khazenie, N., 1992. Classification of multispectral remote sensing data using a back-
propagation neural network. IEEE Transactions on Geoscience and Remote Sensing, 30, 81–88.
Hoi-Ming, C. and Ersoy, O. K., 2005. A statistical self-organizing learning system for remote sensing
classification. IEEE Transactions on Geoscience and Remote Sensing, 43, 1890–1900.
Hruschka, E. R., Hruschka, E. R., and Ebecken, N. F. F., 2004. Feature selection by Bayesian networks. In:
Tawfik, A. Y. and Goodwin, S. D. (Eds.), Advances in Artificial Intelligence, Cairns, Australia, 370–379.
Huang, Z. C., Chan, P. P. K., Ng, W. W. Y., and Yeung, D. S., 2010. Content-based image retrieval using
color moment and Gabor texture feature. In: 2010 International Conference on Machine Learning and
Cybernetics (ICMLC), Qingdao, China.
Huang, X., Guan, X., Benediktsson, J. A., Zhang, L., Li, J., Plaza, A., and Dalla Mura, M., 2014. Multiple
morphological profiles from multicomponent-base images for hyperspectral image classification. IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7, 4653–4669.
Huang, J., Kumar, S. R., Mitra, M., Zhu, W., and Zabih, R., 1997. Image indexing using color correlograms.
In: Conference on Computer Vision and Pattern Recognition (CVPR’97), Puerto Rico, USA, 762–768.
Huang, X., Zhang, L., and Li, P., 2007. Classification and extraction of spatial features in urban areas using
high-resolution multispectral imagery. IEEE Geoscience and Remote Sensing Letters, 4, 260–264.
Imani, M. and Ghassemian, H., 2016. Binary coding based feature extraction in remote sensing high
dimensional data. Information Sciences, 342, 191–208.
Jain, A. K., Duin, R. P. W., and Jianchang, M., 2000. Statistical pattern recognition: A review. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 22, 4–37.
James, G. M., 2003. Variance and bias for general loss functions. Machine Learning, 51, 115–135.
Johnson, S. C., 1967. Hierarchical clustering schemes. Psychometrika, 32, 241–254.
Jordan, M. and Bishop, C., 2004. Neural networks. In: Tucker, A. B. (Ed.), Computer Science Handbook,
Second Edition, Chapman and Hall/CRC, Florida, USA, 1–16.
Kavzoglu, T., 2009. Increasing the accuracy of neural network classification using refined training data.
Environmental Modelling & Software, 24, 850–858.
Kohavi, R. and John, G. H., 1997. Wrappers for feature subset selection. Artificial Intelligence, 97, 273–324.
Kriegel, H.-P., Kröger, P., Sander, J., and Zimek, A., 2011. Density-based clustering. WIREs Data Mining and
Knowledge Discovery, 1, 231–240.
Kumar, G. and Bhatia, P. K., 2014. A detailed review of feature extraction in image processing systems. In:
2014 IEEE 4th International Conference on Advanced Computing & Communication Technologies,
Rohtak, India, 5–12.
Kuo, B. C., Chang, C. H., Sheu, T. W., and Hung, C. C., 2005. Feature extractions using labeled and unlabeled data.
In: International Geoscience and Remote Sensing Symposium (IGARSS), Seoul, South Korea, 1257–1260.
Lillesand, T. M., Kiefer, R. W., and Chipman, J. W., 1994. Remote Sensing and Image Interpretation. John
Wiley and Sons, Inc., Toronto.
Liu, Z., Li, H., Zhou, W., and Tian, Q., 2012. Embedding spatial context information into inverted file for
large-scale image retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia,
New York, USA, ACM Press, 199.
Lv, Z., Zhang, P., and Atli Benediktsson, J., 2017. Automatic object-oriented, spectral-spatial feature extraction
driven by Tobler’s first law of geography for very high resolution aerial imagery Classification. Remote
Sensing, 9, 285.
Makkeasorn, A. and Chang, N. B., 2009. Seasonal change detection of riparian zones with remote sensing
images and genetic programming in a semi-arid watershed. Journal of Environmental Management,
90, 1069–1080.
Makkeasorn, A., Chang, N. B., Beaman, M., Wyatt, C., and Slater, C., 2006. Soil moisture prediction in a
semi-arid reservoir watershed using RADARSAT satellite images and genetic programming. Water
Resources Research, 42, 1–15.
Maldonado, S. and Weber, R., 2011. Embedded feature selection for support vector machines: State-of-the-
Art and future challenges. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and
Applications, Pucón, Chile, 304–311.
Mohri, M., 2012. Foundations of Machine Learning. The MIT Press, Massachusetts, USA.
Momm, H. and Easson, G., 2011. Feature extraction from high-resolution remotely sensed imagery using
evolutionary computation. In: Kita, E. (Ed.), Evolutionary Algorithms, InTech, 423–442.
Moon, T. K., 1996. The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13, 47–60.
Moser, G. and Serpico, S. B., 2013. Combining support vector machines and Markov random fields in an
integrated framework for contextual image classification. IEEE Transactions on Geoscience and
Remote Sensing, 51, 2734–2752.
Murtagh, F. and Contreras, P., 2012. Algorithms for hierarchical clustering: An overview. WIREs Data Mining
and Knowledge Discovery, 2, 86–97.
Nakariyakul, S., 2014. Improved sequential search algorithms for classification in hyperspectral remote
sensing images. In: Proceedings SPIE 9273, Optoelectronic Imaging and Multimedia Technology III,
927328, Beijing, China.
Naseriparsa, M., Bidgoli, A.-M., and Varaee, T., 2013. A hybrid feature selection method to improve
performance of a group of classification algorithms. International Journal of Computer Applications,
69, 28–35.
Nguyen, H., Franke, K., and Petrovic, S., 2009. Optimizing a class of feature selection measures. In: NIPS
2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra
(DISCML), Vancouver, Canada.
Nixon, M. S. and Aguado, A. S., 2012. Feature Extraction and Image Processing, 2nd Edition. Academic
Press, London, UK.
Ooi, C. S., Seng, K. P., and Ang, L.-M., 2015. Automated technology integrations for customer satisfaction
assessment. In: Kaufmann, H.-R. (Ed.), Handbook of Research on Managing and Influencing Consumer
Behavior, IGI Global, Hershey, Pennsylvania, USA, 606–620.
Pal, S. K., Ghosh, A., and Shankar, B. U., 2000. Segmentation of remotely sensed images with fuzzy
thresholding, and quantitative evaluation. International Journal of Remote Sensing, 21, 2269–2300.
Pass, G., Zabih, R., and Miller, J., 1998. Comparing images using color coherence vectors. In: Proceedings of
the Fourth ACM International Conference on Multimedia, Massachusetts, USA, 1–14.
Pearson, K., 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2,
559–572.
Peng, H., Long, F., and Ding, C., 2005. Feature selection based on mutual information criteria of max-
dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 27, 1226–1238.
Phillips, S. J., 2002. Acceleration of K-means and related clustering algorithms. In: Mount, D. M. and
Stein, C. (Eds.), Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Germany,
166–177.
Phuong, T. M., Lin, Z., and Altman, R. B., 2005. Choosing SNPs using feature selection. In: Proceedings—2005
IEEE Computational Systems Bioinformatics Conference, California, USA, 301–309.
Powers, D. M. W., 2011. Evaluation: From Precision, Recall and F-Measure to Roc, Informedness, Markedness
& Correlation. Journal of Machine Learning Technologies, 2, 37–63.
Quackenbush, L. J., 2004. A review of techniques for extracting linear features from imagery. Photogrammetric
Engineering & Remote Sensing, 70, 1383–1392.
Rokni, K., Ahmad, A., Selamat, A., and Hazini, S., 2014. Water feature extraction and change detection using
multitemporal landsat imagery. Remote Sensing, 6, 4173–4189.
Ross, B. J., Gualtieri, A. G., and Budkewitsch, P., 2005. Hyperspectral image analysis using genetic
programming. Applied Soft Computing, 5, 147–156.
Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W., 1974. Monitoring vegetation systems in the
Great Plains with ERTS. In: Third Earth Resources Technology Satellite-1 Symposium, Texas, USA,
325–333.
Sánchez-Maroño, N. and Alonso-Betanzos, A., 2009. Feature selection. In: Shapiro, S. C. (Ed.), Encyclopedia
of Artificial Intelligence, IGI Global, Hershey, Pennsylvania, USA, 632–638.
Shahdoosti, H. R. and Mirzapour, F., 2017. Spectral–spatial feature extraction using orthogonal linear
discriminant analysis for classification of hyperspectral data. European Journal of Remote Sensing,
50, 111–124.
Sharma, M. and Sarma, K. K., 2016. Soft-computational techniques and Spectro-temporal features for
telephonic speech recognition. In: Bhattacharyya, S., Banerjee, P., Majumdar, D., and Dutta, P. (Eds.),
Handbook of Research on Advanced Hybrid Intelligent Techniques and Applications, IGI Global,
Hershey, Pennsylvania, USA, 161–189.
Shruthi, R. B. V., Kerle, N., and Jetten, V., 2011. Object-based gully feature extraction using high spatial
resolution imagery. Geomorphology, 134, 260–268.
Smeeton, N. C., 1985. Early history of the Kappa statistic (response letter). Biometrics, 41, 795.
Solorio-Fernández, S., Carrasco-Ochoa, J. A., and Martínez-Trinidad, J. F., 2016. A new hybrid filter–wrapper
feature selection method for clustering based on ranking. Neurocomputing, 214, 866–880.
Stefanov, W. L., Ramsey, M. S., and Christensen, P. R., 2001. Monitoring urban land cover change: An expert
system approach to land cover classification of semiarid to arid urban centers. Remote Sensing of
Environment, 77, 173–185.
Stehman, S. V., 1997. Selecting and interpreting measures of thematic classification accuracy. Remote Sensing
of Environment, 62, 77–89.
Strijbos, J.-W., Martens, R. L., Prins, F. J., and Jochems, W. M. G., 2006. Content analysis: What are they
talking about? Computers & Education, 46, 29–48.
Suresh, S., Sundararajan, N., and Saratchandran, P., 2008. A sequential multi-category classifier using radial
basis function networks. Neurocomputing, 71, 1345–1358.
Szummer, M. and Jaakkola, T., 2002. Information regularization with partially labeled data. In: Proceedings
of Advances in Neural Information Processing Systems, 15, 1025–1032.
Taubenböck, H., Esch, T., Wurm, M., Roth, A., and Dech, S., 2010. Object-based feature extraction using high
spatial resolution satellite data of urban areas. Journal of Spatial Science, 55, 117–132.
Tian, D. P., 2013. A review on image feature extraction and representation techniques. International Journal
of Multimedia and Ubiquitous Engineering, 8, 385–395.
Tryon, R. C., 1939. Cluster Analysis: Correlation Profile and Orthometric (factor) Analysis for the Isolation
of Unities in Mind and Personality. Edwards Brothers, Inc.
Tso, B. C. K. and Mather, P. M., 1999. Classification of multisource remote sensing imagery using a genetic
algorithm and Markov random fields. IEEE Transactions on Geoscience and Remote Sensing, 37, 1255–1260.
Vapnik, V. N., 1998. Statistical Learning Theory. Wiley, New York.
Wang, X.-Y., Wu, J.-F., and Yang, H.-Y., 2009. Robust image retrieval based on color histogram of local feature
regions. Multimedia Tools and Applications, 49, 323–345.
Wolf, P., Dewitt, B., and Mikhail, E., 2000. Elements of Photogrammetry with Applications in GIS. McGraw-
Hill Education, New York, Chicago, San Francisco, Athens, London, Madrid, Mexico City, Milan, New
Delhi, Singapore, Sydney, Toronto.
Wu, K.-L. and Yang, M.-S., 2005. A cluster validity index for fuzzy clustering. Pattern Recognition Letters,
26, 1275–1291.
Wyse, N., Dubes, R., and Jain, A. K., 1980. A critical evaluation of intrinsic dimensionality algorithms. In:
Gelsema, E. S. and Kanal, L. N. (Eds.), Pattern Recognition in Practice, North-Holland Publishing
Company, Amsterdam, Netherlands, 415–425.
Xie, X. L. and Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 13, 841–847.
Xue, B., Zhang, M., and Browne, W. N., 2013. Particle swarm optimization for feature selection in classification:
A multi-objective approach. IEEE Transactions on Cybernetics, 43, 1656–1671.
Yu, L. and Liu, H., 2003. Feature selection for high-dimensional data: A fast correlation-based filter solution.
In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003),
Washington, DC, USA, 1–8.
Yu, X., Yang, J., and Zhang, J., 2012. A transductive support vector machine algorithm based on spectral
clustering. AASRI Procedia, 1, 384–388.
Zare, H., Haffari, G., Gupta, A., and Brinkman, R. R., 2013. Scoring relevancy of features based on
combinatorial analysis of Lasso with application to lymphoma diagnosis. BMC Genomics, 14, S14.
Zena, M. H. and Gillies, D. F., 2015. A review of feature selection and feature extraction methods applied on
microarray data. Advances in Bioinformatics, 2015, Article ID 198363, 1–13.
Zhao, W. and Du, S., 2016. Spectral-spatial feature extraction for hyperspectral image classification: A
dimension reduction and deep learning approach. IEEE Transactions on Geoscience and Remote
Sensing, 54, 4544–4554.
Zhao, H., Sun, S., Jing, Z., and Yang, J., 2006. Local structure based supervised feature extraction. Pattern
Recognition, 39, 1546–1550.
Zhao, J., Zhong, Y., and Zhang, L., 2015. Detail-preserving smoothing classifier based on conditional random
fields for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote
Sensing, 53, 2440–2452.
Zhong, Y., Zhang, L., Gong, J., and Li, P., 2007. A supervised artificial immune classifier for remote-sensing
imagery. IEEE Transactions on Geoscience and Remote Sensing, 45, 3957–3966.
Zhu, X., 2008. Semi-Supervised Learning Literature Survey. Computer Sciences TR 1530. University of
Wisconsin—Madison.
Zhuo, L., Zheng, J., Li, X., Wang, F., Ai, B., and Qian, J., 2008. A genetic algorithm based wrapper feature
selection method for classification of hyperspectral images using support vector machine. In: Proceedings of SPIE 7147, The International Society for Optical Engineering, Guangzhou, China, 71471J.
6 Feature Extraction with Statistics and Decision Science Algorithms
6.1 INTRODUCTION
With the fast development of air-borne and space-borne remote sensing technologies, large volumes
of remotely sensed multispectral, hyperspectral, and microwave images have become available to
the public. Such massive data sources often require a large amount of memory for storage and
computational power for processing. With proper feature extraction techniques, these images may
provide a huge amount of information to help better understand Earth’s environment. Traditional
feature extraction methods involve regression, filtering, clustering, transformation, and probabilistic theory, as opposed to modern feature extraction methods that rely heavily on machine learning and data mining. Nevertheless, these massive and varied data sources are prone to be
redundant, which in turn may complicate traditional feature extraction processes and even result in
overfitting issues in machine learning or data mining (Liao et al., 2013; Huang et al., 2014; Romero
et al., 2016). Hence, the extraction of specific features of interest from complex and redundant data
inputs is of great importance to the exploitation of these data sources on a large scale.
Due to its efficacy in transforming the original redundant and complex inputs into a set of
informative and nonredundant features, feature extraction has long been considered a crucial step in
image processing and pattern recognition, as well as remote sensing and environmental modeling,
because it facilitates the subsequent data manipulation and/or decision making (Elnemr et al.,
2016). In the remote sensing community, feature extraction techniques have been widely used for
image processing, typically for pattern recognition and image classification. In image classification
and pattern recognition, feature extraction is often considered a special form of dimensionality
reduction which aims to construct a compact and informative feature space by removing the
irrelevant and redundant information from the original data space (Elnemr et al., 2016). Practical
applications typically involve road extraction (Gamba et al., 2006), urban and building detection
(Bastarrika et al., 2011), oil spill detection (Brekke and Solberg, 2005), change detection (Celik,
2009), burned areas mapping (Bastarrika et al., 2011), surface water body mapping (Feyisa et al.,
2014), hyperspectral image classification (Chen et al., 2013; Qian et al., 2013), and so forth.
As elaborated in Chapter 5, feature extraction can be performed either on the spatial or
spectral domain; hence, methods of feature extraction can be developed by making use of various
theories such as simple filtering, mathematical morphology, clustering, regression, spatial/
spectral transformation, classification, and so on. Traditional feature extraction approaches work
in practice based on conventional mathematical theories like filtering, regression, spatial/spectral
transformation, and others requiring less computational resources. Yet those advanced methods
taking advantage of artificial intelligence (e.g., artificial neural network, genetic algorithm, support
vector machine, genetic programming), as well as other advanced optimization theories (e.g., particle
swarm optimization), demand far more computational resources. Therefore, those advanced methods inherently involve high-performance computing issues (i.e., compression, storage, and
performance-driven load distribution for heterogeneous computational grids) in various real-world
applications.
In this chapter, a suite of traditional feature extraction approaches that rely on statistics and
decision science principles, such as filtering, morphology, decision trees, transformation, regression,
and probability theory, will be introduced with specific focus on the mathematical foundations of
each kind of method. Since numerous methods and approaches found in the literature share similar
principles, these analogous approaches will be grouped into the same category in a logical order.
Chapter 7, which focuses on machine learning and data mining for advanced feature extraction, will
follow these traditional feature extraction techniques.
6.2.1 Filtering Operation
Remotely sensed data are collected in the form of panchromatic, multispectral, or hyperspectral images at various spatiotemporal scales. The embedded color, shape, and textural characteristics
are three typical features that can be extracted to represent the property of an image (Tian, 2013).
These retrieved features can be further referred to as the “fingerprint” or “signature” associated
with a given image. In general, color features are defined with respect to a specific color space or model, such as RGB (red-green-blue), HSV (hue, saturation, value), also known as HSB (hue, saturation, brightness), and LUV. Note that LUV is a non-RGB color space that decouples the "color" (chromaticity, the UV part) and "lightness" (luminance, the L part) of color to
improve object detection. These fixed color spaces will in turn limit further explorations of spectral
information at other wavelengths (e.g., multispectral and hyperspectral), because only three spectral components are represented in these color spaces. To overcome this constraint, other
kinds of techniques are often used to better detect and extract the relevant features embedded in
each image.
In practice, filtering is generally considered the simplest method for feature extraction, which has
also been routinely used in image processing to detect and extract the targets of interest from a given
remotely sensed image. Thresholding, which can be considered a representative of such a technique,
is actually a form of low-level feature extraction method performed as a point operation on the input
image by applying a single threshold to transform any greyscale (or color image) into a binary map
(Nixon and Aguado, 2012). An illustrative example of thresholding-based feature extraction was
given in the previous chapter, as shown in Figure 5.3. For instance, clouds can be easily detected
with the aid of human vision, since clouds are brighter relative to other terrestrial objects in RGB
color space. Evidenced by this property, an empirical threshold value can then be applied to one
spectral band (e.g., blue band) to extract the observed clouds by simply using a Boolean operator
(see Equation 5.2 for details).
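To make the operation concrete, the following MATLAB sketch applies a single empirical threshold to one band of a multispectral scene to produce a binary cloud map; the file name, the band index, and the threshold value T are illustrative assumptions rather than operational settings.

img  = imread('scene.tif');        % multispectral image, bands stacked along dim 3 (assumed)
blue = double(img(:,:,1));         % assume band 1 holds the blue band
T    = 0.8 * max(blue(:));         % empirical threshold value (assumption)
cloudMask = blue > T;              % Boolean operator yields a binary cloud map
imshow(cloudMask);                 % bright (cloud) pixels appear white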
Technically, thresholding can also be treated as a decision-based feature extraction method.
In most cases, one or two empirical threshold values will fulfill the need to detect and extract the
desired features with good accuracy. However, this does not hold for some extreme conditions
such as the extraction of water bodies from remotely sensed images in mountainous areas, where
shadows are often an obstacle. This is mainly due to the fact that both targets always show similar
spectral properties optically. In other words, it is difficult to obtain a satisfying result by simply
applying threshold values to one or two fixed spectral bands. Thus, more complex thresholding
networks should be developed by making use of external information such as elevation data and
microwave satellite images. This often leads to the creation of a multilayer stacked decision tree
framework. More details regarding decision tree classifiers will be introduced in the following
subsections.
Thresholding techniques work mainly by relying on spectral differences between various
targets to extract the desired features; hence the threshold values for the same target could even
vary between images due to radiometric distortions caused by various factors such as illumination
conditions. To account for such drawbacks, more flexible approaches should be applied. In
image interpretation, the shapes of targets are often considered good features for further pattern
recognition, since the perimeter of an object can be easily perceived by human vision. Hence,
detecting the shape features from a given image is critical to the subsequent feature extraction.
Essentially, the shape of an object is commonly treated as a step change in the intensity levels
(Nixon and Aguado, 2012).
In the remote sensing community, filtering approaches have been widely used in image processing
to detect and extract shape features, for example, linear features such as roads and rivers. In order
to extract the perimeter of an object or linear features like roads and rivers from a remotely sensed
image, a suite of convolutional filters has been proposed for the extraction of edge features. Among
them, “Roberts cross operator” and “Sobel operator” are the two most well-known filters, and they
have been extensively used in practical applications. As one of the first edge detectors, the Roberts
cross operator was initiated by Lawrence Roberts in 1963 (Davis, 1975), with two convolutional
kernels (or operators) formulated as:
t_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{and} \quad t_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}  (6.1)
According to Roberts (1963), the produced edges from an edge detector should be well defined,
while the intensity of edges should correspond closely to what a human would perceive with little
noise introduced by the background (Davis, 1975).
The Roberts cross operator works as a differential filter aiming to approximate the gradient of an image through discrete differentiation, computing the square root of the sum of the squared differences between diagonally adjacent pixels. Let I(i,j) be a pixel (at the location (i,j)) in the original image X, while G_x is the convolved pixel value with the first kernel (e.g., t_1) and G_y is the convolved pixel value with the second kernel (e.g., t_2); the gradient can then be defined as:

\nabla I(i,j) = G(x,y) \cong \sqrt{G_x^2 + G_y^2}  (6.2)
It is clear that this operation will highlight changes in intensity in a diagonal direction, hence
it enables the detection of changes between targets (i.e., edges). An example of enhanced edge
features in an observed scene through the application of Roberts filters as well as two other edge
detectors can be used to sharpen our understanding (Figure 6.1). The results indicate that the edge
features and linear features have been better characterized compared to the original image without
applying Roberts cross operation (Figure 6.1a). Despite the simplicity and capability of this filter, it
is observed that the Roberts cross suffers greatly from sensitivity to noise due to its convolutional
nature (Davis, 1975).
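As a minimal sketch of Equations 6.1 and 6.2, the two Roberts kernels can be applied with base-MATLAB 2-D convolution; the input file name is a placeholder, and no noise suppression is attempted.

I  = double(rgb2gray(imread('scene.png')));  % grayscale intensity image (placeholder file)
t1 = [1 0; 0 -1];                            % first Roberts kernel (Equation 6.1)
t2 = [0 1; -1 0];                            % second Roberts kernel (Equation 6.1)
Gx = conv2(I, t1, 'same');                   % first diagonal difference
Gy = conv2(I, t2, 'same');                   % second diagonal difference
G  = sqrt(Gx.^2 + Gy.^2);                    % gradient magnitude (Equation 6.2)
imshow(G, []);                               % enhanced edge features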
FIGURE 6.1 Edge features extracted by three different edge detectors. (a) RGB composite image; (b) edge
features detected by performing Roberts cross operation; (c) features detected by Sobel operation; and (d)
features detected by Laplacian operation.
In order to better detect edge features, an enhanced discrete differentiation operator, the Sobel
filter (also known as Sobel–Feldman operator), was developed by Irwin Sobel and Gary Feldman in
1968 with the aim of computing an approximation of the gradient of the image intensity function.
Differing from the Roberts cross operator, the Sobel filter is an isotropic image gradient operator
that uses two separable and integer-valued 3 × 3 kernels to calculate approximations of the
derivatives by convolving with the input image in the horizontal and vertical directions, respectively.
Two kernels are formulated as follows:
t_1 = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \quad \text{and} \quad t_2 = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}  (6.4)
Since the Sobel kernels have a larger window size than the Roberts cross operator, the Sobel operator yields higher accuracy in detecting and extracting edge features. More
precisely, the derived edge features from the Sobel filters will be much clearer and brighter (i.e.,
with larger contrast) to human vision (Figure 6.1c). Aside from the Roberts and Sobel operators,
there also exist other analogue filters, such as the Laplacian filters. A commonly used convolutional
kernel is
t = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}  (6.5)
Theoretically, the Laplacian filters approximate a second-order derivative on the original image,
which in turn highlights regions of rapid intensity change in particular (Figure 6.1d). Because the
second derivatives are very sensitive to noise, a Gaussian smoothing is often performed on the
original image before applying the Laplacian filter to counter this constraint (Reuter et al., 2009).
Since all these filtering-based techniques are commonly used to enhance certain specific features,
they also fall into the image enhancement category in the image processing community to some
extent.
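A hedged MATLAB sketch of the three detectors discussed above is given below, using the Image Processing Toolbox edge function for the Roberts and Sobel cases and an explicit Laplacian kernel after Gaussian smoothing; the file name and the smoothing standard deviation are assumptions.

I   = double(rgb2gray(imread('scene.png')));       % placeholder input image
BWr = edge(I, 'roberts');                          % Roberts cross edges
BWs = edge(I, 'sobel');                            % Sobel edges
Is  = imgaussfilt(I, 2);                           % Gaussian smoothing (sigma = 2, assumed)
L   = conv2(Is, [0 1 0; 1 -4 1; 0 1 0], 'same');   % Laplacian kernel of Equation 6.5
subplot(1,3,1); imshow(BWr);              title('Roberts');
subplot(1,3,2); imshow(BWs);              title('Sobel');
subplot(1,3,3); imshow(mat2gray(abs(L))); title('Laplacian of Gaussian');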
Due to their simplicity and relatively low computational cost, filtering techniques have been widely exploited in practical applications. For instance, Gabor filters were
also implemented for edge detection (Mehrotra et al., 1992) and texture classification (Clausi and
Ed Jernigan, 2000). In addition to the direct application of filter approaches for feature extraction,
various filters were also used in conjunction with other techniques to aid in feature extraction
practices. For instance, Gamba et al. (2006) proposed an adaptive directional filtering procedure
to enhance urban road extraction from high-resolution optical and Synthetic Aperture Radar
(SAR) images, in which the filtering scheme was used to capture the predominant directions of
roads. Similarly, Kang et al. (2014) used an edge-preserving filter to aid in hyperspectral image
classification, and the results indicated that the incorporation of the edge-preserving filtering in the
classification scheme resulted in higher classification accuracy.
6.2.2 Mathematical Morphology
Similar to filtering operators, morphological operators are another kind of promising filter that
have been widely used in computer vision for geometric structure information analysis. Essentially,
the foundation of morphological processing is in the mathematically rigorous field of describing
shapes using set theory, geometry, and topology, hence such processing procedures are generally
termed mathematical morphology (Serra, 1992; Soille and Pesaresi, 2002; Soille, 2004). In image
processing, morphological operators refer to a variety of image filters that process images based
on morphological information (e.g., size and shape). As opposed to many methods based on the
spectral property of pixels, mathematical morphology concentrates on the spatial relationships
between groups of pixels and treats the objects present in an image as sets (Soille and Pesaresi,
2002).
In image processing, mathematical morphology is commonly used to examine interactions
between an image and a set of structuring elements using certain operations, while the structuring
element acts as a probe for extracting or suppressing specific structures of the image objects (Plaza,
2007). More specifically, morphological operations apply a structuring element to filter an image,
while the value of each pixel in the output image is based on a comparison of the corresponding pixel
in the input image with its neighbors. By choosing a proper size and shape of the neighborhood, a
morphological operation that is sensitive to specific shapes in the input image can be constructed.
The output of the filtering process depends fully on the match between the input image and the structuring element, and on the operation being performed (Quackenbush, 2004).
A structuring element is a small binary image which is actually a small matrix of pixels with
values of ones or zeros. Technically, in morphological operations, structuring elements play the
same role as convolutional kernels in traditional linear image filtering, yet the basic operations of
(a) Square-shaped element:      (b) Cross-shaped element:      (c) Diamond-shaped element:
1 1 1 1 1                       0 0 1 0 0                      0 0 1 0 0
1 1 1 1 1                       0 0 1 0 0                      0 1 1 1 0
1 1 1 1 1                       1 1 1 1 1                      1 1 1 1 1
1 1 1 1 1                       0 0 1 0 0                      0 1 1 1 0
1 1 1 1 1                       0 0 1 0 0                      0 0 1 0 0
FIGURE 6.2 An example of three simple structuring elements with different shapes. The blue square denotes
the origin of the structuring elements. (a) Square-shaped 5 × 5 element, (b) Cross-shaped 5 × 5 element, and
(c) Diamond-shaped 5 × 5 element.
morphology are nonlinear in nature (Davies, 2012). An example of three simple structuring elements is shown in Figure 6.2. There are two basic aspects associated with
the structuring element. One is related to the size and the other to the shape. As indicated, the size
of an element is determined by the dimension of a matrix in general. A common practice is to have
a structuring matrix with an odd dimension, since the origin of the element is commonly defined
as the center of the matrix (Pratt, 2007). The shape of an element depends fully on the pattern of
ones and zeros distributed over the matrix grid. In practical usages, the structuring element may be
applied over the input image, acting as a filter to compare with the input image block by block based
on certain operations for the detection and extraction of specified geometrical features similar to the
given structuring element (Tuia et al., 2009).
Aside from the structuring element, another critical factor in morphological image analysis is
the morphological operation. The two most fundamental morphological operations are dilation and
erosion (Soille, 2004). Conceptually, both operations rely on translating the structuring element
to various points over the input image and then examining the intersection between the translated
element coordinates and the input image coordinates. If g is a binary image to analyze and B is a
structuring element, dilation (\delta_B(g)) and erosion (\varepsilon_B(g)) can be mathematically represented as (Tuia et al., 2009):

\delta_B(g) = \bigcup_{b \in B} g_b  (6.6)

\varepsilon_B(g) = \bigcap_{b \in B} g_{-b}  (6.7)

where g_b denotes the translation of g by the vector b.
As indicated, dilation expands the image by adding pixels in the structuring element, that is,
a union between g and B. On the contrary, erosion is used to perform an intersection between
them. This kind of analysis (based on binary images) is often called binary morphology, which
can also be extended to grayscale images by considering them as a topographic relief. However, in
grayscale morphology, the pointwise minimum and maximum operators will be used instead of the
intersection and union, respectively (Tuia et al., 2009). More specifically, dilation adds pixels to the
boundaries of objects in an image (i.e., grows boundary regions), while erosion is used to remove
pixels on object boundaries (i.e., shrinks boundary regions). According to this principle, the number
of pixels added or removed from the objects depends totally on the size and shape of the given
structuring element.
The graphs in Figure 6.3 show a schematic illustration of dilation and erosion, comparatively.

FIGURE 6.3 Schematic illustration of the dilation and erosion rules.

A practical application of these two operations to an image is shown in Figure 6.4. It is clear that
the dilation operation has a unique effect; gaps between different regions are reduced and small
intrusions into boundaries of a region are filled in (Figure 6.4a). For example, the road shown at the
bottom of the Figure 6.4a confirms this observation. In contrast, the erosion operation shrinks the
objects’ boundaries, resulting in holes and gaps between different regions which become larger when
small details are eliminated (Figure 6.4b). In addition to the two basic operations of dilation and
erosion, many morphological operations in practical use are represented as compound operations
based on dilation and erosion, such as opening, closing, hit and miss transform, thickening, thinning,
and so forth (Tuia et al., 2009).
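The basic and compound operations can be reproduced with Image Processing Toolbox routines, as in the minimal sketch below; the binary input file is a placeholder, and the square 5 × 5 structuring element mirrors the one used in Figure 6.4.

BW = imread('binary_map.png') > 0;   % binary input image (placeholder)
se = strel('square', 5);             % 5 x 5 square structuring element
D  = imdilate(BW, se);               % dilation: grows boundary regions
E  = imerode(BW, se);                % erosion: shrinks boundary regions
O  = imopen(BW, se);                 % opening = erosion followed by dilation
C  = imclose(BW, se);                % closing = dilation followed by erosion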
In the remote sensing community, morphological operators have been widely used in various
practical applications; common usages include edge detection, noise removal, image enhancement,
and image segmentation. In most cases, mathematical morphology is used to advance automatic
pattern recognition, in particular the detection of targets in urban areas since an accurate extraction
of shape and size features is essential to an automatic extraction process (Benediktsson et al.,
2005; Chaudhuri et al., 2016). Due to its pronounced efficacy, mathematical morphology has been
extensively used in detection and extraction of various terrestrial targets from remotely sensed high-
resolution optical/SAR imageries, such as roads (Mohammadzadeh et al., 2006; Valero et al., 2010),
FIGURE 6.4 Morphologically transformed images by performing (a) dilation and (b) erosion operations based on a square-shaped 5 × 5 structuring element.
bridges (Chen et al., 2014b), rivers (Sghaier et al., 2017), buildings (Chaudhuri et al., 2016), dwelling
structures (Kemper et al., 2011), and so on.
In addition to these ordinary applications, morphological concepts have also been applied to
aid in disaster management practices. Most recently, Lee et al. (2016) developed a mathematical
morphology method for automatically extracting the hurricane eyes from C-band SAR data to
advance understanding of hurricane dynamics. The results indicated that the morphology-based
analyses of the subsequent reconstructions of the hurricane eyes showed a high degree of agreement
with results derived from reference data based on National Oceanic and Atmospheric Administration
(NOAA) manual work. Similarly, Chen et al. (2017) developed an object-oriented framework for
landslide mapping based on Random Forests (RF) and mathematical morphology. The RF was used
as a dimensionality reduction tool to extract landslides’ relevant features, while a set of four closing
and opening morphology operations were subsequently applied to optimize the RF classification
results to map the landslides with higher accuracy. Moreover, morphological operators have
also been applied in astronomy. For instance, Aragón-Calvo et al. (2007) developed a multiscale
morphology filter to automatically segment cosmic structure into a set of basic components. Due to
the distinct advantage of scale independence in segmentation, anisotropic features such as filaments
and walls were well identified in this cosmic structure study.
6.2.3 Decision Tree

FIGURE 6.5 An illustrative example of a binary decision tree structure, in which a root node is split by decision rules into internal nodes (intermediate results) and terminal nodes (leaves) that carry the final class labels.
In general, a decision tree consists of three essential components: a root node, several internal
nodes, and a set of terminal nodes (also called leaves). An illustrative example of a decision tree
structure is shown in Figure 6.5. As indicated, for each internal and terminal node (child node),
there should exist a parent node showing the data source. Meanwhile, regarding the root node
and each internal node (parent node), two or more child nodes will be generated from these
parent nodes based on various decision rules. If each parent node is split into two descendants,
the decision tree is often known as a binary tree (e.g., Figure 6.5), and the inherent decision rule
can be expressed as a dyadic Boolean operator such that the data points are split simply based
on whether the condition rule is satisfied or not. Among these three types of nodes, the root
node involves the input data space, while the other two kinds of nodes correspond to partitioned
subspaces. As opposed to root and internal nodes, the terminal nodes (i.e., leaves) refer to the final
determined outputs of the whole decision-making process, which cannot be further partitioned;
corresponding class labels (the majority class) will then be assigned. When developing a decision
tree, the most critical process is to split each internal node and the root node with various decision
rules or learning algorithms. In practice, there exist various learning algorithms of which the
most well known is the CART algorithm, which is a binary recursive partitioning procedure
(Breiman et al., 1984).
In the CART algorithm, a splitting rule is inherently defined as a determination function used
to maximize the purity (or homogeneity) of the training data as represented by the resulting
descendant nodes. Typically, an impurity function is defined to examine the goodness of split
for each node, and the Gini diversity index is commonly used as a popular measure for the
impurity function. Mathematically, the impurity measurement of the node t is usually defined as
follows:

i(t) = 1 - \sum_{j=1}^{K} P_j(t)^2  (6.8)
where Pj (t) denotes the posterior probability of class j presenting in node t. This probability is often
defined as the proportion between the number of training samples that go to node t labeled as class
j and the total number of training samples within node t:
P_j(t) = \frac{N_j(t)}{N(t)}, \quad j = 1, 2, \ldots, K  (6.9)
Taking a binary node for example, the goodness of the split s for the node t can be calculated as the resulting decrease in impurity:

\Delta i(s,t) = i(t) - P_R\, i(t_R) - P_L\, i(t_L)  (6.10)

where P_R and P_L are the proportions of the samples in node t that go to the right descendant t_R and the left descendant t_L, respectively. Essentially, the goodness of the split s should be maximized to
eventually achieve the lowest impurity in each step toward the largest purity in the terminal nodes.
Analogous to a machine learning process, the stopping criterion is also required by decision trees
to stop the split process. In the CART algorithm, the stopping criterion is commonly defined as:

\max_{s} \Delta i(s,t) < \beta  (6.11)

where β is a predetermined threshold. The split process will continue until it meets such a stopping
criterion. In other words, the decision tree will stop growing, which implies that the training process
of the decision tree classifier is complete. In general, a decision tree classifier-based process starts
from the root node, and the unclassified data points in the root node are partitioned into different
internal nodes following a set of splitting rules before they finally arrive at terminal nodes (leaves)
where a class label will be assigned to each of them.
Decision tree has been extensively used in support of data mining and machine learning, aiming
to extract a target variable based on several input variables and a set of decision rules (Friedl et al.,
1999; Liu et al., 2008). Due to their nonparametric and top-down framework, decision trees have
been widely used in many practical remote sensing applications for feature extraction such as
hyperspectral image classification (Chen and Wang, 2007), snow cover extraction (Liu et al., 2008),
invasive plant species detection (Ghulam et al., 2014), and so on.
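As a hedged sketch of a CART-style classifier, the Statistics and Machine Learning Toolbox function fitctree grows a binary tree using the Gini diversity index of Equation 6.8 as its split criterion; the synthetic features and labels below are placeholder training data.

X    = rand(500, 6);                             % 500 samples with 6 spectral features (synthetic)
Y    = randi(4, 500, 1);                         % 4 land cover classes (synthetic labels)
tree = fitctree(X, Y, 'SplitCriterion', 'gdi');  % 'gdi' = Gini diversity index
view(tree, 'Mode', 'graph');                     % inspect the learned splitting rules
Ypred = predict(tree, rand(10, 6));              % assign leaf class labels to new samples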
6.2.3.1 RF Classifier
RF, or random decision forests, is a suite of decision trees created by drawing a subset of training
data through a bagging approach (Breiman, 2001). More specifically, RF consists of a combination
of decision trees where each tree is constructed using an independently sampled random vector
from the input set, while all trees in the forest maintain a consistent distribution (Pal, 2005; Tian
et al., 2016). In practice, about two-thirds of the original input will be randomly selected to train
these trees through the bagging process, while the remaining one-third will not be used for training.
Instead, that portion of the data is used for internal cross-validation in order to check the performance
of the trained trees (Belgiu and Drăguţ, 2016). In other words, there is no need to perform cross-
validation to get an unbiased estimate of the test set error since it has already been done in the
process of constructing RF. In general, RF tries to construct multiple CART models by making use
of different samples and different initial variables, which in turn enables RF to overcome the inherent drawback of overfitting associated with conventional decision trees (Hastie et al., 2009).
In general, two parameters are required to perform an RF-based classification, namely, the
number of trees and the number of variables randomly chosen at each split (Winham et al., 2013).
Each node in a tree will be split with a given number of randomly sampled variables from the input
feature space. In RF, the Shannon entropy (or Gini index) is routinely used as the splitting function
(or attribute selection measure) to measure the impurity of an attribute with respect to the classes
(Pal, 2005). In prediction, each tree votes for a class membership for each test sample, and the class
with maximum votes will be considered the final class (Ni et al., 2017).
Unlike many other classification methods typically based on one classifier, hundreds of classifiers
can be constructed in RF and a final prediction is always obtained by combining all these decisions
with an optimal function (e.g., plurality vote). In traditional and advanced feature extraction practices,
ensemble learning methods use multiple learning algorithms to obtain better predictive performance
than could be obtained from any single learning algorithm. In fact, RF is regarded as an ensemble
classifier of decision tree in which decision tree plays the role of a meta model. This ensemble learning
nature renders RF many desirable advantages, for example, high accuracy, robustness against
overfitting the training data, and integrated measures of variable importance (Chan and Paelinckx,
2008; Guo et al., 2011; Stumpf and Kerle, 2011). In addition, no distribution assumption is required
for the input data, hence it can be used to process various data sets. Nevertheless, like many other
statistical learning techniques, RF is also observed to be prone to bias once the number of instances
is distributed unequally among the classes of interest (Winham et al., 2013). However, because of
its outstanding advantages, RF has been widely used for remote sensing classification in terms of
various applications, for example, laser data point clouds classification (Ni et al., 2017), LiDAR and
multispectral image-based urban scene classification (Guo et al., 2011), land cover classification and
mapping (Fan, 2013; Tian et al., 2016), hyperspectral image classification (Ham et al., 2005), and
landslide mapping (Stumpf and Kerle, 2011; Chen et al., 2014a).
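A minimal sketch of RF training with the TreeBagger class is given below; the number of trees and the synthetic training data are illustrative assumptions. Note how the out-of-bag samples provide the internal validation described above, so no extra cross-validation loop is needed.

X  = rand(1000, 8);                          % training features (synthetic)
Y  = randi(5, 1000, 1);                      % class labels (synthetic)
rf = TreeBagger(200, X, Y, 'Method', 'classification', ...
                'OOBPrediction', 'on');      % keep out-of-bag predictions
oobErr = oobError(rf);                       % internal error estimate from the unused third
Ypred  = predict(rf, rand(10, 8));           % majority vote across the 200 trees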
6.2.4 Cluster Analysis
Cluster analysis, or clustering, was initiated in anthropology in 1932 and then introduced to
psychology in the late 1930s; its usage for trait theory classification in personality psychology in the
1940s caused it to be more widely known to the public (Tryon, 1939; Cattell, 1943; Bailey, 1994).
Essentially, clustering itself is not a specific algorithm, but is instead a general term referring to
various tasks or processes of grouping a set of targets/objects with similar characteristics into the
same class while isolating those with different properties into other classes. The core of clustering
is related to various algorithms that have the capability of detecting and isolating distinct features
into different groups. Therefore, the difference between various clustering algorithms primarily lies
in their notion of what constitutes a cluster and how to detect clusters efficiently. Clustering can thus
be considered a knowledge discovery or multi-objective optimization problem.
To date, more than one hundred clustering algorithms have been developed for various
applications found in the literature. The reason for such numerous clustering algorithms can be
ascribed to the fact that the notion of a cluster is difficult to define because it varies significantly
in properties between algorithms (Estivill-Castro, 2002). In other words, different clustering
algorithms are produced by employing different cluster models. Thus, understanding cluster models
is critical to the realization of the differences between the various clustering algorithms as cluster
models act as the core of each algorithm. As found in the literature, popular notions of clusters
include groups with small distances among the cluster members, dense regions in the data space,
intervals or particular statistical distributions. Typical cluster models associated with these notions
include connectivity models, centroid models, density models, distribution models, and many others
(Estivill-Castro, 2002). They are described below.
In connectivity-based clustering (also known as hierarchical clustering), objects are grouped according to their pairwise distances; a cluster can be described largely by the maximum distance needed to connect objects within this cluster, and hence different clusters will be formed under different distances (Everitt, 2011). Therefore, connectivity-based clustering
methods will differ largely by the distance functions used in each method. In addition to the
selection of distance functions, the linkage criteria also need to be decided. Popular choices include
single linkage clustering (Sibson, 1973) and complete linkage clustering (Defays, 1977). Despite
the efficacy of clustering objects into different groups, it has been observed that connectivity-based
clustering methods are prone to outliers (e.g., resulting in additional clusters or causing other clusters
to merge) in practical applications. Moreover, the computational burden of manipulating large data
sets will be huge since it is difficult to compute an optimal distance due to the high dimensionality
(Estivill-Castro, 2002; Everitt, 2011).
In centroid-based clustering, such as the well-known k-means algorithm, the n observations are partitioned into k sets S = {S_1, S_2, \ldots, S_k} by minimizing the within-cluster sum of squares:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \| x - u_i \|^2  (6.12)
where u_i is the mean value of S_i. Compared to other clustering methods, finding an optimal solution to
k-means clustering is often computationally complex. Commonly, an iterative refinement technique
is used to solve the problem. More details related to the modeling process can be found in MacKay
(2003). Despite the computational complexity, k-means clustering is still featured in several distinct
applications, including the Voronoi structure-based data partitioning scheme, the nearest neighbor
classification concept, and model-based clustering basis.
An experimental example of k-means clustering for land use and land cover classification can
be seen in Figure 6.6. The results show an adequate accuracy in classifying different land cover
types. Compared to the true color image, the classified map exhibits more contrast between different
features, in particular the water bodies, which are largely highlighted in the classified map. In recent
years, a set of analogs has been developed based on the foundation of k-means clustering, including
X-means clustering (Ishioka, 2000), G-means clustering (Hamerly and Elkan, 2004), and the most
widely used fuzzy clustering (Dell’Acqua and Gamba, 2001; Modava and Akbarizadeh, 2017).
FIGURE 6.6 One observed Landsat TM scene and the corresponding classified result from k-means
clustering. (a) true color image by given bands 3, 2, and 1 to the RGB space, respectively; (b) the classified
result (6 classes) from the k-means method.
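A hedged sketch of the workflow behind Figure 6.6 follows: each pixel of a multiband image is treated as a feature vector and grouped into six clusters by iteratively refining Equation 6.12; the input file name and k = 6 are assumptions.

img = double(imread('landsat_tm.tif'));   % rows x cols x bands (placeholder file)
[r, c, b] = size(img);
X   = reshape(img, r*c, b);               % one row per pixel
idx = kmeans(X, 6, 'MaxIter', 500);       % iterative refinement of Equation 6.12
map = reshape(idx, r, c);                 % classified map with 6 classes
imagesc(map); axis image;                 % display the cluster labels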
In density-based clustering, such as the well-known Density-Based Spatial Clustering of Applications with Noise (DBSCAN), clusters are defined as regions of markedly higher point density than the remainder of the data set, and a density criterion is also required. In other words, only connecting points satisfying the density criterion will be retained.
In addition to DBSCAN, some other types of density-based clustering methods are detailed in the
literature as well, such as the Ordering Points to idenTify the Clustering Structure (OPTICS) (Ankerst
et al., 1999), a generalized form of DBSCAN, which works regardless of an appropriate value for the
range parameter. Nevertheless, such methods share one key drawback: they expect some kind of density drop to detect cluster borders. In contrast to many other clustering methods, density-based clustering methods only consider density-connected objects to form a cluster, so the shape of a cluster is often arbitrary. On the other hand, such methods may perform poorly in dealing with data sets drawn from Gaussian mixtures, since it is hard for them to model such data sets precisely.
The choice of an appropriate clustering algorithm and of its parameter settings (e.g., the distance function, a density threshold, or the number of expected clusters) depends largely
on the input data set (i.e., an algorithm specifically for one kind of model would generally fail
on a data set involving different kinds of models) as well as the further usage or objective of the
derived results. Overall, the clustering method with respect to a particular problem often needs to
be selected experimentally or with a priori knowledge about the data set as well as the intended use
of the results.
Clustering methods have long been widely used for feature learning and feature extraction to aid
in remote sensing applications; they include building extraction from panchromatic images (Wei and
Zhao, 2004), aerial laser cloud data (Tokunaga and Thuy Vu, 2007), SAR image segmentation by
making use of spectral clustering (Zhang et al., 2008), fuzzy c-means clustering (Tian et al., 2013),
and many other practices such as street tracking (fuzzy clustering) (Dell’Acqua and Gamba, 2001),
coastline extraction (fuzzy clustering) (Modava and Akbarizadeh, 2017), geometrical structure
retrieval (density-distance-based clustering) (Wu et al., 2017), and so on. In recent years, with the
advances of big remote sensing data such as high-resolution hyperspectral imageries, many of the
existing methods could fail in handling these data sets due to the curse of high dimensionality, which
in turn stimulates the development of new clustering algorithms that focus on subspace clustering
(Kriegel et al., 2012). An example of such a clustering algorithm is Clustering in QUEst (CLIQUE)
(Agrawal et al., 2005). In order to advance hyperspectral imagery classification, Sun et al. (2015)
proposed an improved sparse subspace clustering method to advance the band subset selection based
on the assumption that band vectors can be sampled from the integrated low-dimensional orthogonal
subspaces and each band can be sparsely represented as a linear or affine combination of other bands
within its subspace. The experimental results indicated that such a subspace clustering method could
significantly reduce the computational burden while improving the classification accuracy.
6.2.5 Regression Analysis

A straightforward way to extract a specific feature of interest is to combine several spectral bands into an empirical index. A well-known example is the Normalized Difference Vegetation Index (NDVI), which contrasts the near-infrared (NIR) and red reflectances to emphasize vegetation:

NDVI = \frac{NIR - Red}{NIR + Red}  (6.13)
In most cases, this kind of process is also considered to be image enhancement or data mining,
since the vegetation information is emphasized through a data mining scheme. Nevertheless, we
prefer to consider such a framework to be a feature extraction process, since informative features
(e.g., vegetation) are extracted from multiple spectral bands toward a dimensionality reduction.
Such a method has been widely used in many remote sensing applications. For instance,
McFeeters (1996) proposed the Normalized Difference Water Index (NDWI) based on the spectral
differences of water bodies in the green and near-infrared wavelength ranges for surface water
extraction purposes. The NDWI is formulated as follows (McFeeters, 1996):
NDWI = \frac{Green - NIR}{Green + NIR}  (6.14)
where pixels with positive NDWI values (NDWI > 0) are considered to be covered by water and
negative values are nonwater. In recent years, to account for drawbacks associated with NDWI,
a set of enhanced water indexes has been introduced seeking possible accuracy improvements,
such as Modified Normalized Difference Water Index (MNDWI) (Xu, 2006) and Automated Water
Extraction Index (AWEI) (Feyisa et al., 2014). Despite the usage of different spectral bands, these
indexes are still derived in a manner of regression and extrapolation based on several empirical
models. In the literature, there are many applications using such derivatives for specific feature
extraction purposes, for example, red edge position extraction (Cho and Skidmore, 2006), flower
coverage estimation (Chen et al., 2009), and burned areas mapping (Bastarrika et al., 2011).
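The following minimal sketch computes NDWI (Equation 6.14) and extracts a binary water map; the band indices for green and NIR are placeholder assumptions that depend on the sensor.

img   = double(imread('scene.tif'));      % multispectral image (placeholder)
green = img(:,:,2);                       % assume band 2 = green
nir   = img(:,:,4);                       % assume band 4 = near-infrared
ndwi  = (green - nir) ./ (green + nir);   % Equation 6.14
water = ndwi > 0;                         % positive NDWI flags water pixels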
Multivariate regression is another popular technique commonly used for detection and
extraction of certain specific features. An example is the empirical algorithm operationally used for
deriving chlorophyll-a concentrations in aquatic environments. Based on the in situ chlorophyll-a
concentration measurements and the relevant spectral reflectance observations, a fourth-order
polynomial empirical relationship was established for chlorophyll-a concentration estimation from
a suite of optical remotely sensed images. The algorithm can be formulated as (O’Reilly et al., 1998):
\log_{10}(\text{chl-a}) = a_0 + \sum_{i=1}^{4} a_i R^i  (6.15)

R = \log_{10}\!\left( \frac{R_{rs}443 > R_{rs}488}{R_{rs}547} \right)  (6.16)

where the operator > means to find the largest reflectance value of R_{rs}443 and R_{rs}488. The chl-a maps over the dry and wet seasons of Lake Nicaragua and Lake Managua are presented in
Figure 6.7. These four subdiagrams in Figure 6.7 exhibit a seasonality effect in a comparative way.
In general, the water quality in the dry season is worse than that of the wet season. However,
based on the Probability Density Function (PDF) of all the band values, the input band values
of Equation 6.15 do not follow the normality assumption as a linear regression equation in (6.15)
implies (Figures 6.8 and 6.9). Hence, the predicted values of chl-a concentrations do not follow the
normality assumption closely either (Figures 6.10 and 6.11). This finding gives rise to some insights
about the inadequacy of using a linear regression model to infer the water quality conditions of the
two tropical shallow lakes.
FIGURE 6.7 Chl-a concentration maps of Lake Managua and Lake Nicaragua for dry season and wet season,
respectively. (a) Lake Managua (Dry Season/March 04, 2016); (b) Lake Managua (Wet Season/September 08,
2016); (c) Lake Nicaragua (Dry Season/March 01, 2016); and (d) Lake Nicaragua (Wet Season/September 03, 2016).
FIGURE 6.8 Band PDF for Lake Managua, (a) dry season; (b) wet season. (Note that the X axes do not stand
for the original reflectance value, and they have been multiplied by a scale factor for convenience of expression.)
FIGURE 6.9 Band PDF for Lake Nicaragua, (a) dry season; (b) wet season. (Note that the X axes do not
stand for the original reflectance value, and they have been multiplied by a scale factor for convenience of
expression.)
FIGURE 6.10 PDF of Chl-a concentrations in Lake Managua, (a) dry season; (b) wet season.
FIGURE 6.11 PDF of Chl-a concentrations in Lake Nicaragua, (a) dry season; (b) wet season.
Logistic regression is another statistical technique widely used for classification and feature extraction. Conceptually, a simplified form of a logistic regression problem can be written as follows (binomial):

y = \begin{cases} 1, & \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k > 0 \\ 0, & \text{else} \end{cases}  (6.17)
For a multinomial problem with J classes, the log-odds of class j can be modeled as a linear combination of the input features:

\log\!\left( \frac{p_j}{1 - p_j} \right) = X \cdot W = \beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{kj} x_k  (6.18)

p_j = \frac{ e^{\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{kj} x_k} }{ 1 + \sum_{l=1}^{J-1} e^{\beta_{0l} + \beta_{1l} x_1 + \beta_{2l} x_2 + \cdots + \beta_{kl} x_k} }  (6.19)
where β is the weight matrix to be optimized. If the Jth class is the baseline, the logistic regression
model can be written in terms of J – 1 logit transformations as:
\log\!\left( \frac{p_1}{p_J} \right) = \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2 + \cdots + \beta_{k1} x_k

\log\!\left( \frac{p_2}{p_J} \right) = \beta_{02} + \beta_{12} x_1 + \beta_{22} x_2 + \cdots + \beta_{k2} x_k

\vdots

\log\!\left( \frac{p_{J-1}}{p_J} \right) = \beta_{0(J-1)} + \beta_{1(J-1)} x_1 + \beta_{2(J-1)} x_2 + \cdots + \beta_{k(J-1)} x_k  (6.20)
and hence

p_J = \frac{1}{ 1 + \sum_{l=1}^{J-1} e^{\beta_{0l} + \beta_{1l} x_1 + \beta_{2l} x_2 + \cdots + \beta_{kl} x_k} }  (6.21)
The model's prediction is thus the class with maximal probability:

\arg\max_{j} \frac{ e^{\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{kj} x_k} }{ 1 + \sum_{l=1}^{J-1} e^{\beta_{0l} + \beta_{1l} x_1 + \beta_{2l} x_2 + \cdots + \beta_{kl} x_k} }  (6.22)
and the optimal weight matrix β * can in turn be estimated using the maximum likelihood method
(Hosmer and Lemeshow, 2000).
In feature extraction, logistic regression is commonly used to reduce dimensionality of the input
feature space by extracting the most relevant features based on the predicted probability of each
feature class. Cheng et al. (2006) developed a systematic approach based on logistic regression
for the feature selection and classification of remotely sensed images. The experimental results
performed on both multispectral (Landsat ETM+) and hyperspectral (Airborne Visible/Infrared
Imaging Spectrometer) images showed that the logistic regression enabled the reduction of the
number of features substantially without any significant decrease in the classification accuracy.
Similar work can be also found in Khurshid and Khan (2015). In addition, logistic regression can
be further extended to structured sparse logistic regression by adding a structured sparse constraint
(Qian et al., 2012). On the other hand, more advanced regression analyses can be conducted through either data mining or machine learning, although the implementation of these methods is not straightforward, they are always physically meaningful.
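A hedged sketch of multinomial logistic regression with the Statistics and Machine Learning Toolbox is shown below; mnrfit estimates the weight matrix by maximum likelihood (with the last class as the baseline), and mnrval evaluates the class probabilities of Equation 6.19. The synthetic data are placeholders.

X = rand(300, 5);                 % 300 samples, 5 features (synthetic)
Y = randi(3, 300, 1);             % 3 classes; class 3 acts as the baseline
B = mnrfit(X, Y);                 % maximum likelihood estimate of the weights
P = mnrval(B, X);                 % per-class probabilities (Equation 6.19)
[~, Ypred] = max(P, [], 2);       % class with maximal probability (Equation 6.22)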
6.2.6 Linear Transformation
One of the essential tasks of feature extraction is to detect and extract a set of the most relevant
features from the original data set to reduce the dimensionality. Toward such a goal, many techniques
have been developed. The most popular methods are those that attempt to project or decompose the
original data inputs into a set of components, and then only the most relevant components will be
extracted and used for dimensionality reduction purposes. In this section, three popular methods
working with such a principle will be introduced, including principal component analysis, linear discriminant analysis, and wavelet transform.
In principal component analysis (PCA), the original data are linearly projected onto a set of orthogonal weight vectors W_k, and the score of the ith observation on the kth component is given by:

t_k^{(i)} = W_k^T x_i  (6.23)
In order to maximize the variance, the first weight W1 thus has to satisfy the following condition:
W_1 = \arg\max_{\|W\|=1} \left\{ \sum_i (x_i \cdot W)^2 \right\}  (6.24)

W_1 = \arg\max_{\|W\|=1} \left\{ \|XW\|^2 \right\} = \arg\max_{\|W\|=1} \left\{ W^T X^T X W \right\}  (6.25)
For a symmetric matrix like X^T X, this maximization is solved by finding the largest eigenvalue of the matrix, as W_1 is the corresponding eigenvector. Once W_1 is obtained, the first principal component can
be derived by projecting the original data matrix X onto the W1 in the transformed space. The
further components can be acquired in a similar manner after subtracting the previously derived
components.
Since the number of principal components is usually determined by the number of significant
eigenvalues with respect to the global covariance matrix, the derived components always have a
lower dimension than the original data set (Prasad and Bruce, 2008). These components often
retain as much of the variance in the original dataset as possible. A set of six principal components
derived from a PCA analysis based on Landsat TM multispectral imageries is shown in Figure
6.12. The first two principal components (Figure 6.12a,b) have explained more than 95% of the
variances of these multispectral images. The remaining four components are thus considered to
be noise, which can be discarded for dimensionality reduction. Compared to each individual band
of the original multispectral image, the information content of the first principal component is
more abundant, which makes it a good data source for further data analysis such as classification.
FIGURE 6.12 Principal components derived from the Landsat TM multispectral image shown in Figure 6.6a. A total of six components are shown in (a)–(f) sequentially, with explained variances of 68.5%, 27.2%, 3.3%, 0.6%, 0.3%, and 0.1%, respectively.

Because of this unique feature, PCA has been extensively used in various data analyses for dimensionality reduction, especially in manipulating high-dimensional data sets (Farrell and Mersereau, 2005; Celik, 2009; Lian, 2012). However, several drawbacks and constraints have been
observed associated with PCA, for example, the scaling effects (principal components are not
scale invariant) (Rencher, 2003; Prasad and Bruce, 2008). In recent years, many enhanced PCA
methods have been proposed toward various applications such as kernel PCA (Schölkopf et al.,
1997), scale-invariant PCA (Han and Liu, 2012, 2014), and even more advanced techniques like
independent component analysis (Stone, 2004; Wang and Chang, 2006) and projection pursuit
(Friedman and Tukey, 1974; Chiang et al., 2001).
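A minimal sketch of PCA-based dimensionality reduction of a multispectral image, analogous to Figure 6.12, is given below; the input file name is a placeholder.

img = double(imread('landsat_tm.tif'));   % rows x cols x bands (placeholder)
[r, c, b] = size(img);
X = reshape(img, r*c, b);                 % pixels as observations
[coeff, score, latent] = pca(X);          % loadings, component scores, eigenvalues
explained = 100 * latent / sum(latent);   % percent variance per component
pc1 = reshape(score(:,1), r, c);          % first principal component image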
Linear discriminant analysis (LDA) seeks a linear transformation that best separates predefined classes; each input sample x_i is projected into the discriminant space as:

z_i = f(x_i) = W^T x_i  (6.26)
Analogous to many other feature extraction methods, the transformation matrix W can be
computed by solving an optimization problem in terms of fulfilling a given maximization criterion
of separability among classes. In LDA, this equals finding the best discrimination of the investigated
groups by maximizing the ratio of the interclass variance to intraclass variance to measure the
disparity of the groups (Wurm et al., 2016). The transformation matrix W can be optimized as:
W^* = \arg\max_{W} \frac{ W^T S_b W }{ W^T S_w W }  (6.27)

where S_b denotes the interclass (between-class) variance and S_w the intraclass (within-class) variance, which can be modeled as follows:
S_b = \sum_{k=1}^{C} n_k (u_k - u)(u_k - u)^T  (6.28)

S_w = \sum_{k=1}^{C} \sum_{i=1}^{n_k} (x_{ik} - u_k)(x_{ik} - u_k)^T  (6.29)
where nk is the number of samples in the kth class, u is the mean of the entire training set, uk is the
mean of the kth class, and xik is the ith sample in the kth class.
It is clear that the interclass variance is calculated as the square sum of the dispersion of the mean
discriminant variables of each class (uk) from the mean of all discriminant variable elements, and
the intraclass variance is defined as the square sum of the dispersion of the discriminant variables
of single objects from their class means (Wurm et al., 2016). The solution to Equation 6.27 can be
obtained by solving the following eigenvalue problem:
S_b W = \Lambda S_w W  (6.30)

S_w^{-1} S_b W = \Lambda W  (6.31)
In remote sensing, LDA has been widely used for land cover classification from various remotely
sensed imageries, in particular hyperspectral images because of their high-dimensional space (Liao
et al., 2013; Yuan et al., 2014; Shahdoosti and Mirzapour, 2017). Despite its good performance in
many applications, conventional LDA has the inherent limitation of becoming intractable when
the number of input features exceeds the training samples’ size (Bandos et al., 2009; Shahdoosti
and Mirzapour, 2017). In order to extend the application of LDA to many practical cases, a number
of adaptations have been implemented to conventional LDA, which in turn yields many enhanced
LDA, such as regularized LDA (Bandos et al., 2009), orthogonal LDA (Duchene and Leclercq,
1988), uncorrelated LDA (Bandos et al., 2009), stepwise LDA (Siddiqi et al., 2015), two-dimensional
LDA (Imani and Ghassemian, 2015b), and so on.
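A hedged sketch of LDA in MATLAB uses fitcdiscr, which fits Gaussian classes with a pooled within-class covariance matrix, the classification counterpart of the discriminant criterion in Equation 6.27; the synthetic training data are placeholders.

X   = randn(400, 6);                              % training features (synthetic)
Y   = randi(4, 400, 1);                           % class labels (synthetic)
lda = fitcdiscr(X, Y, 'DiscrimType', 'linear');   % pooled within-class covariance
Ypred = predict(lda, randn(10, 6));               % classify new samples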
In the wavelet transform, a family of basis functions is generated from a single mother wavelet \varphi(t) through scaling and shifting:

\varphi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \varphi\!\left( \frac{t - b}{a} \right)  (6.32)
where a (a > 0) and b are scaling and shifting factors, respectively. It is clear that the wavelet functions will be dilated when a > 1 and contracted when a < 1 relative to the mother wavelet. The 1/\sqrt{a} term is used as a modulation coefficient to normalize the energy of the wavelets (Bruce et al.,
2001). The most popular mother wavelet is the Morlet wavelet (also called Gabor wavelet), because
it is closely related to human perception in both hearing and vision (Bernardino and Santos-Victor,
2005). The Morlet wavelet is modeled as a wavelet composed of a complex exponential multiplied
by a Gaussian window, which can be expressed as:
\varphi_\sigma(t) = c_\sigma \pi^{-1/4} e^{-(1/2)t^2} \left( e^{i\sigma t} - e^{-(1/2)\sigma^2} \right)  (6.33)

c_\sigma = \left( 1 + e^{-\sigma^2} - 2e^{-(3/4)\sigma^2} \right)^{-1/2}  (6.34)
Based on the admissibility criterion, all wavelet functions must oscillate with an average value
of zero and finite support. Given an input signal x(t), the projection onto the subspace of one wavelet
function yields:
x_a(t) = \int_R W\{x, a, b\} \cdot \varphi_{a,b}(t)\, db  (6.35)

W\{x, a, b\} = \langle x, \varphi_{a,b} \rangle = \int_R x(t) \cdot \varphi_{a,b}(t)\, dt  (6.36)
Because of its multi-resolution capability, wavelet transform has been widely used for remotely
sensed data analysis; however, many applications were confined to image compression (e.g., DeVore
et al., 1992; Walker and Nguyen, 2001) and image fusion (e.g., Zhou et al., 1998; Nunez et al.,
1999). Later, wavelet transform was introduced into the feature extraction domain, and was then
extensively used for various practices. A fundamental reason why the wavelet transform is an
excellent tool for feature extraction is its inherent multi-resolution properties, which enable it to
project a signal onto a basis of wavelet functions to separate features at different scales by changing
the scaling and shifting parameters with respect to the features to be extracted (Mallat, 1989; Bruce
et al., 2002). The challenge here is related to how the wavelet coefficients can be interpreted to
represent various features, and a common approach is to compute coefficient distribution over the
selected wavelet functions (Ghazali et al., 2007). In feature extraction practices, wavelet transform
has been widely used for various applications, for example, target detection (Bruce et al., 2001),
dimensionality reduction of hyperspectral image (Bruce et al., 2002), forest mapping (Pu and Gong,
2004), vegetation phenology feature extraction (Martínez and Gilabert, 2009), hyperspectral image
classification (Qian et al., 2013), and so on.
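A minimal sketch of a continuous wavelet transform with the analytic Morlet wavelet is given below, assuming the Wavelet Toolbox is available; the synthetic 1-D signal stands in for, say, a reflectance spectrum.

x = cumsum(randn(1, 512));    % synthetic 1-D signal (placeholder)
[wt, f] = cwt(x, 'amor');     % 'amor' selects the analytic Morlet wavelet
imagesc(abs(wt));             % coefficient magnitudes across scales and positions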
6.2.7 Probabilistic Methods

The maximum likelihood classifier (MLC) assigns each pixel to the class with which it has the highest probability of membership. Mathematically, the likelihood that a pixel with feature vector X belongs to class k can be defined as a posterior probability (Ahmad and Quegan, 2012):

P(k|X) = \frac{ P(k)\, P(X|k) }{ P(X) }  (6.37)
where P( X |k ) is the conditional probability to observe X from class k (or probability density function).
P(k) is the prior probability of class k, the values of which are usually assumed to be equal to each
other due to the lack of sufficient reference data. P(X) is the probability that the X is observed, which
can be further written as follows:
P(X) = \sum_{k=1}^{N} P(k)\, P(X|k)  (6.38)
where N is the total number of classes. Commonly, P(X) is assumed to be a normalization constant in order to ensure \sum_{k=1}^{N} P(k|X) sums to 1 (Ahmad and Quegan, 2012). A pixel x will be assigned to the class k once it satisfies the following criterion:

P(k|X) > P(j|X) \quad \text{for all } j \neq k  (6.39)
For mathematical reasons, MLC often assumes the distribution (or probability density function) of the data in a given class to be a multivariate Gaussian distribution; the likelihood can then be
expressed as follows:
P(k|X) = \frac{1}{ (2\pi)^{N/2} |\Sigma_k|^{1/2} } \exp\!\left[ -\frac{1}{2} (X - u_k)\, \Sigma_k^{-1} (X - u_k)^T \right]  (6.40)
where N is the number of data sets (e.g., bands of a multispectral image), X is the whole data set of
N bands, uk is the mean vector of class k, and Σk is the variance-covariance matrix of class k. |Σk | is
thus the determinant of Σk .
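A minimal MATLAB sketch of this decision rule is given below. It assumes equal class priors, so maximizing the likelihood of Equation 6.40 also maximizes the posterior of Equation 6.37; the function and variable names are our own illustrative choices:

function yhat = ml_classify(Xtrain, ytrain, Xtest)
% Gaussian maximum likelihood classification (Eqs. 6.37 through 6.40).
% Xtrain: M-by-N training pixels (N bands); ytrain: labels 1..K; Xtest: P-by-N.
K = max(ytrain);
N = size(Xtrain, 2);
logL = zeros(size(Xtest, 1), K);
for k = 1:K
    Xk = Xtrain(ytrain == k, :);           % training pixels of class k
    mu = mean(Xk, 1);                      % mean vector u_k
    S  = cov(Xk);                          % variance-covariance matrix Sigma_k
    D  = Xtest - mu;                       % deviations from the class mean
    logL(:, k) = -0.5*N*log(2*pi) - 0.5*log(det(S)) ...
                 - 0.5*sum((D / S) .* D, 2);   % log of Eq. 6.40
end
[~, yhat] = max(logL, [], 2);              % assign the most likely class
end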
Owing to its grounding in probability theory, MLC has been widely used in remote sensing for
classification. Applications include, but are not limited to, forest encroachment mapping (Tiwari
et al., 2016), rice crop mapping (Chen et al., 2011), land cover change detection (Otukei and Blaschke,
2010), salt farm mapping (Hagner and Reese, 2007), and water quality mapping (Jay and Guillaume,
2014). The performance of MLC has been compared thoroughly with many other classification
methods in the literature, such as decision trees, logistic regression, artificial neural networks, and
support vector machines; more details can be found in Frizzelle and Moody (2001), Hagner and
Reese (2007), Kavzoglu and Reis (2008), and Hogland et al. (2013). Further investigations showed
that MLC may be ineffective in some cases, for example, when classifying spectrally similar categories
or classes containing subclasses (Kavzoglu and Reis, 2008). To address these problems, methods
such as PCA can be used to aid the classification process. In addition, many extended MLC
methods have been developed, such as hierarchical MLC (Ediriwickrema and Khorram, 1997) and
calibrated MLC (Hagner and Reese, 2007).
In the Naive Bayes classifier, a feature vector X is assigned to class Ck once it yields the largest
posterior probability P(Ck|X). Based on Bayes' theorem, the conditional probability P(Ck|X) can
be calculated as:

P(Ck|X) = P(X|Ck)P(Ck) / P(X)    (6.41)

where P(Ck) is the prior probability of class Ck, P(X|Ck) is the likelihood (or conditional probability)
of the feature vector X falling into class Ck, and P(X) is the prior probability of the predictor X.
Since P(X) is independent of the class vector C and the feature values, it is effectively a constant.
The critical part of calculating the conditional probability therefore lies in estimating the prior
probability P(Ck) and the class-conditional probability P(X|Ck). In practice, the prior probability
P(Ck) can be estimated from the training dataset as the proportion of training samples carrying the
class label Ck:
P(Ck) = N_Ck / N    (6.42)
where N is the total number of training samples and N_Ck is the number of training samples with
class label Ck.
The conditional probability P(X|Ck) of a feature vector X = (x1, x2, …, xp) can be defined as a
joint probability as follows:

P(X|Ck) = P(x1, x2, …, xp | Ck)    (6.43)

To simplify the computation, Naive Bayes assumes that the presence of a feature in a given class
is conditionally independent of the other features:

P(xi | Ck, x1, …, xi−1, xi+1, …, xp) = P(xi | Ck)    (6.44)

so that the posterior is proportional to a product of per-feature conditional probabilities:

P(Ck|X) ∝ P(Ck) ∏_(i=1)^p P(xi|Ck)    (6.45)
Based on the independence assumption, the conditional probability P(Ck|X) can be further
written as:

P(Ck|X) = (P(Ck) / P(X)) ∏_(i=1)^p P(xi|Ck)    (6.46)
In practice, the Naive Bayes classifier can handle both discrete and continuous variables (Chang
et al., 2012). If X contains a finite number of discrete features xi, then P(xi|Ck) can be estimated as
the proportion of training samples within class Ck that take the value xi:

P(xi|Ck) = N_Ck(xi) / N_Ck    (6.47)
If xi is a continuous variable, P(xi|Ck) is commonly modeled with a Gaussian distribution:

P(xi|Ck) = (1 / √(2π σ_Ck²)) exp(−(xi − µ_Ck)² / (2σ_Ck²))    (6.48)

where µ_Ck and σ_Ck are the mean and standard deviation of feature xi within class Ck; these
parameters can be estimated directly from the training dataset.
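The pieces above assemble into a complete classifier. The following MATLAB sketch of a Gaussian Naive Bayes estimates the priors as in Equation 6.42, fits per-feature Gaussians, and assigns each sample to the class with the largest posterior; the function and variable names are our own, chosen for illustration only:

function yhat = nb_classify(Xtrain, ytrain, Xtest)
% Gaussian Naive Bayes: priors per Eq. 6.42, independent per-feature
% Gaussian likelihoods, and the factorized posterior of Eq. 6.46.
K = max(ytrain);
logpost = zeros(size(Xtest, 1), K);
for k = 1:K
    Xk = Xtrain(ytrain == k, :);           % training samples of class Ck
    prior = size(Xk, 1) / numel(ytrain);   % P(Ck) = N_Ck / N  (Eq. 6.42)
    mu = mean(Xk, 1);                      % per-feature means
    sd = std(Xk, 0, 1);                    % per-feature standard deviations
    ll = -0.5*log(2*pi*sd.^2) - ((Xtest - mu).^2) ./ (2*sd.^2);
    logpost(:, k) = log(prior) + sum(ll, 2);   % log of Eq. 6.46 up to log P(X)
end
[~, yhat] = max(logpost, [], 2);           % class with largest posterior
end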
In remote sensing, the Naive Bayes classifier is another popular probabilistic method for
classification problems. Its popularity rests on several properties (Wu et al., 2008). First, the Naive
Bayes classifier does not require a complicated iterative parameter estimation scheme; it is therefore
easy to construct and can handle very large datasets. Second, the Bayesian scheme makes the
classification process easy to understand and interpret, even for users unskilled in classifier
technology. Finally, despite its simplicity, it often delivers competitive classification accuracy.
In practice, applications of the Naive Bayes classifier remain relatively few compared with methods
such as MLC. In past years, the Naive Bayes classifier has been successfully used for multi-label
learning (Zhang et al., 2009), image classification (Liu et al., 2011), text classification (Feng et al.,
2015), and so forth.
6.3 SUMMARY
In this chapter, a suite of feature extraction methods based on statistics and decision science principles
was introduced, focusing primarily on their theoretical foundations, with some illustrative examples
of practical applications. More specifically, the methods discussed in this chapter include filtering
operations, morphology, decision trees, clustering algorithms, linear regression, PCA, the wavelet
transform, MLC, and the Naive Bayes classifier. All of these techniques have been used extensively in
remote sensing for feature extraction, mainly for dimensionality reduction and feature selection.
In the next chapter, a set of artificial intelligence-based methods that are widely applied for feature
extraction will be described in detail.
REFERENCES
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P., 2005. Automatic subspace clustering of high
dimensional data. Data Mining and Knowledge Discovery, 11, 5–33.
Ahmad, A. and Quegan, S., 2012. Analysis of maximum likelihood classification on multispectral data.
Applied Mathematical Sciences, 6, 6425–6436.
Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander, J., 1999. OPTICS: Ordering points to identify the
clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on
Management of Data, 49–60, Pennsylvania, USA.
Aragón-Calvo, M. A., Jones, B. J. T., van de Weygaert, R., and van der Hulst, J. M., 2007. The multiscale
morphology filter: identifying and extracting spatial patterns in the galaxy distribution. Astronomy &
Astrophysics, 474, 315–338.
Bailey, K., 1994. Numerical taxonomy and cluster analysis. In: K. D. Bailey (Ed.) Typologies and Taxonomies:
An Introduction to Classification Techniques, SAGE Publications Ltd., Thousand Oaks, California,
USA, 34, 24.
Bandos, T., Bruzzone, L., and Camps-Valls, G., 2009. Classification of hyperspectral images with regularized
linear discriminant analysis. IEEE Transactions on Geoscience and Remote Sensing, 47, 862–873.
Bastarrika, A., Chuvieco, E., and Martín, M. P., 2011. Mapping burned areas from Landsat TM/ETM+
data with a two-phase algorithm: Balancing omission and commission errors. Remote Sensing of
Environment, 115, 1003–1012.
Belgiu, M. and Drăguţ, L., 2016. Random forest in remote sensing: A review of applications and future
directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31.
Benediktsson, J. A., Palmason, J. A., and Sveinsson, J. R., 2005. Classification of hyperspectral data from
urban areas based on extended morphological profiles. IEEE Transactions on Geoscience and Remote
Sensing, 43, 480–491.
Bernardino, A. and Santos-Victor, J., 2005. A real-time Gabor primal sketch for visual attention. In: 2nd
Iberian Conference on Pattern Recognition and Image Analysis, 335–342, Estoril, Portugal.
Breiman, L., 2001. Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J., 1984. Classification and Regression Trees.
Wadsworth & Brooks/Cole Advanced Books & Software.
Brekke, C. and Solberg, A. H. S., 2005. Oil spill detection by satellite remote sensing. Remote Sensing of
Environment, 95, 1–13.
Bruce, L. M., Koger, C. H., and Li, J., 2002. Dimensionality reduction of hyperspectral data using discrete
wavelet transform feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 40,
2331–2338.
Bruce, L. M., Morgan, C., and Larsen, S., 2001. Automated detection of subpixel hyperspectral targets with
continuous and discrete wavelet transforms. IEEE Transactions on Geoscience and Remote Sensing,
39, 2217–2226.
Cattell, R. B., 1943. The description of personality: Basic traits resolved into clusters. Journal of Abnormal
Psychology, 38, 476–506.
Celik, T. 2009. Unsupervised change detection in satellite images using principal component analysis and
k-Means clustering. IEEE Geoscience and Remote Sensing Letters, 6, 772–776.
Chan, J. C. W. and Paelinckx, D., 2008. Evaluation of Random Forest and Adaboost tree-based ensemble
classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery.
Remote Sensing of Environment, 112, 2999–3011.
Chang, N.-B., Han, M., Yao, W., and Chen, L.-C., 2012. Remote sensing assessment of coastal land
reclamation impact in Dalian, China, using high-resolution SPOT images and support vector machine.
In: Environmental Remote Sensing and Systems Analysis. CRC Press, Boca Raton, FL, USA, 249–276.
Chaudhuri, D., Kushwaha, N. K., Samal, A., and Agarwal, R. C., 2016. Automatic building detection from
high-resolution satellite images based on morphology and internal gray variance. IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing, 9, 1767–1779.
Chen, W., Li, X., Wang, Y., Chen, G., and Liu, S., 2014a. Forested landslide detection using LiDAR data and
the random forest algorithm: A case study of the Three Gorges, China. Remote Sensing of Environment,
152, 291–301.
Ch