Multisensor Data Fusion and Machine Learning for Environmental Remote Sensing
Ni-Bin Chang
Kaixu Bai
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable effort has been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access [Link] or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
[Link]
and the CRC Press Web site at
[Link]
Contents
Preface.............................................................................................................................................. xv
Acknowledgments...........................................................................................................................xvii
Authors.............................................................................................................................................xix
Chapter 1 Introduction...................................................................................................................1
1.1 Background.........................................................................................................1
1.2 Objectives and Definitions.................................................................................3
1.3 Featured Areas of the Book................................................................................5
References..................................................................................................................... 7
5.9 Summary.......................................................................................................... 89
References................................................................................................................... 89
Chapter 7 Feature Extraction with Machine Learning and Data Mining Algorithms.............. 127
7.1 Introduction.................................................................................................... 127
7.2 Genetic Programming.................................................................................... 130
7.2.1 Modeling Principles and Structures.................................................. 130
7.2.2 Illustrative Example.......................................................................... 132
7.3 Artificial Neural Networks............................................................................. 137
7.3.1 Single-Layer Feedforward Neural Networks and Extreme Learning Machine............ 138
7.3.2 Radial Basis Function Neural Network............................................. 142
7.4 Deep Learning Algorithms............................................................................ 144
7.4.1 Deep Learning Machine................................................................... 144
7.4.2 Bayesian Networks............................................................................ 146
7.4.3 Illustrative Example.......................................................................... 150
7.5 Support Vector Machine................................................................................. 153
7.5.1 Classification Based on SVM............................................................ 153
7.5.2 Multi-Class Problem......................................................................... 156
7.5.3 Illustrative Example.......................................................................... 156
7.6 Particle Swarm Optimization Models............................................................ 158
7.7 Summary........................................................................................................ 160
References................................................................................................................. 161
Chapter 9 Major Techniques and Algorithms for Multisensor Data Fusion.............................. 195
9.1 Introduction.................................................................................................... 195
9.2 Data Fusion Techniques and Algorithms....................................................... 196
9.2.1 Pan-Sharpening................................................................................. 197
9.2.1.1 Component Substitution.................................................... 198
9.2.1.2 Relative Spectral Contribution...........................................200
9.2.1.3 High Frequency Injection..................................................200
9.2.1.4 Multi-Resolution Transformation...................................... 201
9.2.1.5 Statistical and Probabilistic Methods................................ 201
9.2.2 Statistical Fusion Methods................................................................202
9.2.2.1 Regression-Based Techniques...........................................202
9.2.2.2 Geostatistical Approaches.................................................202
Chapter 10 System Design of Data Fusion and the Relevant Performance Evaluation Metrics...... 229
10.1 Introduction.................................................................................................... 229
10.2 System Design of Suitable Data Fusion Frameworks..................................... 230
10.2.1 System Design for Data Fusion—Case 1.......................................... 230
10.2.2 System Design for Data Fusion—Case 2.......................................... 232
10.2.3 The Philosophy for System Design of Data Fusion.......................... 234
10.3 Performance Evaluation Metrics for Data Fusion.......................................... 234
10.3.1 Qualitative Analysis.......................................................................... 235
10.3.2 Quantitative Analysis........................................................................ 235
10.3.2.1 Without Reference Image.................................................. 235
10.3.2.2 With Reference Image....................................................... 237
10.4 Summary........................................................................................................ 241
References................................................................................................................. 241
Chapter 13 Integrated Data Fusion and Machine Learning for Intelligent Feature Extraction....... 301
13.1 Introduction.................................................................................................... 301
13.1.1 Background....................................................................................... 301
13.1.2 The Pathway of Data Fusion.............................................................302
13.2 Integrated Data Fusion and Machine Learning Approach.............................304
13.2.1 Step 1—Data Acquisition.................................................................. 305
13.2.2 Step 2—Image Processing and Preparation......................................306
13.2.3 Step 3—Data Fusion.........................................................................307
13.2.4 Step 4—Machine Learning for Intelligent Feature Extraction.........308
13.2.5 Step 5—Water Quality Mapping....................................................... 312
13.3 Summary........................................................................................................ 317
Appendix 1: Ground-Truth Data............................................................................... 318
Appendix 2................................................................................................................ 319
References................................................................................................................. 319
Index............................................................................................................................................... 489
Preface
Earth observation and environmental monitoring require data to be collected for assessing various types of natural systems and the man-made environment at varying scales. Such research enables us to deepen our understanding of a wealth of geophysical, geochemical, hydrological, meteorological, and ecological processes of interest. For the past few years, the scientific community has realized that obtaining a better understanding of interactions between natural systems and the man-made environment across different scales demands more research effort in remote sensing. The key research questions include: (1) how to properly fuse multisensor images with different spatial, temporal, and spectral resolutions to minimize data gaps and create long-term, consistent, and cohesive observations for further feature extraction, and (2) how feature extraction can be adequately performed by traditional algorithms and advanced computational intelligence methods to overcome barriers when complex features are embedded in or constrained by heterogeneous images. From a systems engineering perspective, integrating the latest advances in multisensor data fusion and feature extraction with the aid of advanced computational intelligence methods is of primary importance to achieve our common goal—"the whole is greater than the sum of its parts."
The aim of this book is thus to elucidate the essence of integrating multisensor data fusion and machine learning to promote environmental sustainability. It emphasizes the concept of the "System of Systems Engineering" approach with implications for both art and science. Such
an endeavor can accommodate an all-inclusive capability of sensing, monitoring, modeling, and
decision making to help mitigate the natural and human-induced stresses on the environment.
On this foundation, many new remote sensing image processing tools and feature extraction methods in concert with existing space-borne, air-borne, and ground-based measurements
have been collectively presented across five distinctive topical areas in this book. This initiative
leads to a thorough discussion of possible future research with synergistic functionality across space
and time at the end of the book. The book will be a useful reference for graduate students, academic scholars, and working professionals involved in the study of "earth systems science" and "environmental science and engineering" in support of environmental sustainability.
MATLAB® is a registered trademark of The MathWorks, Inc.
Acknowledgments
This book grew out of a series of research grants and contracts as well as a wealth of international
collaborative work. This book could not have been written without the valuable assistance of several
people. The authors are indebted to Mr. Benjamin Vannah, Ms. Xiaoli Wei, Mr. Chandan Mostafiz, and Dr. Zhibin Sun for their collective contributions. Their helpful research and/or thesis work is gratefully acknowledged. We also extend our gratitude to Ms. Rachel Winter, who served as the language editor of this book. Finally, without encouragement from Ms. Irma Shagla Britton, senior editor of Environmental Sciences, Remote Sensing & GIS at CRC Press/Taylor & Francis Group, we could not have made up our minds to complete this lengthy work. Special thanks are extended to her as well.
Authors
Ni-Bin Chang has been a professor of environmental systems engineering in the United States of America since 2002. He received his BS in Civil Engineering from National Chiao-Tung University in Taiwan in 1983, and his MS and PhD in Environmental Systems Engineering from Cornell University in the United States of America in 1989 and 1991, respectively. Dr. Chang's highly interdisciplinary research lies at the intersection of environmental sustainability, green engineering, and systems analysis. He is director of the Stormwater Management Academy and a professor in the Department of Civil, Environmental, and Construction Engineering at the University of Central Florida in the United States of America. From August 2012 to August 2014, Professor Chang served as program director of the Hydrologic Sciences Program and the Cyber-Innovated Sustainability Science and Engineering Program at the National Science Foundation in the United States of America. He was elevated to Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2017, and he has been active with the IEEE Geoscience and Remote Sensing Society, the IEEE Systems, Man, and Cybernetics Society, and the IEEE Computational Intelligence Society. His selectively awarded distinctions include induction as a Fellow of the European Academy of Sciences in 2008 and election as a Fellow of the American Society of Civil Engineers in 2009, the American Association for the Advancement of Science in 2011, the International Society for Optics and Photonics in 2014, and the Royal Society of Chemistry (the United Kingdom) in 2015. He has been editor-in-chief of the SPIE Journal of Applied Remote Sensing since 2014. He is currently an editor, associate editor, or editorial board member of more than 20 international journals.
1 Introduction
1.1 BACKGROUND
Remote sensing is defined as the acquisition and analysis of remotely sensed images to gain information about the state and condition of an object through sensors that are not in physical contact with it, and to discover relevant knowledge for decision making. Remote sensing for environmental monitoring and Earth observation can be defined as:

Remote sensing is the art and science of obtaining information about the surface or subsurface of Earth without needing to be in contact with it. This can be achieved by sensing and recording emitted or reflected energy, then processing, analyzing, and interpreting the retrieved information for decision making.
The remote sensing process involves the use of various imaging systems where the following seven
elements are involved for environmental monitoring and earth observations: (1) illumination by
the sun or moon; (2) travel through the atmosphere; (3) interactions with the target; (4) recording
of energy by the sensor; (5) transmission, absorption, reflection, and emission; (6) retrieval,
interpretation, and analysis; and (7) decision making for applications.
Types of remote sensing technologies include air-borne, space-borne, ground-based, and sea-based systems with a wealth of sensors onboard different platforms. These
sensors are designed to observe electromagnetic, acoustic, ultrasonic, seismic, and magnetic energy
for environmental monitoring and earth observation. This book focuses on remote sensing sensors
making use of the electromagnetic spectrum for environmental decision making. These sensors generally detect reflected and emitted energy at wavelengths ranging from the ultraviolet through the optical and infrared to the microwave regions of the electromagnetic spectrum.
Over the last few decades, satellite remote sensing that observes solar radiation has become an invaluable tool for providing spatially and temporally resolved estimates of environmental variables with electromagnetic sensors. Traditional image-processing algorithms often involve image
restoration, image enhancement, image segmentation, image transformation, image fusion, and data
assimilation with feature extraction/classification models. With the availability of field observations,
such image-processing efforts enable us to provide our society with an unprecedented learning
capacity to observe, monitor, and quantify the fluxes of water, sediment, solutes, and heat through
varying pathways at different scales on the surface of Earth. Environmental status and ecosystem
state can then be assessed through a more lucid and objective approach. Yet this requires linking
remote sensing image processing with change detection in a more innovative way.
In an attempt to enlarge the application potential, sensor and data fusion with improved spatial,
temporal, and spectral resolution has become a precious decision support tool that helps observe
complex and dynamic Earth systems at different scales. The need to build more comprehensive and
predictive capabilities requires intercomparing earth observations across remote sensing platforms
and in situ field sites, making it possible to cohesively explore multiscale earth observations from local up to regional or global extents for scientific investigation (CUAHSI, 2011). Recent advancements
in artificial intelligence techniques have motivated a significant initiative of advanced image
processing for better feature extraction, information retrieval, classification, pattern recognition,
and knowledge discovery. In concert with image and data fusion, the use of machine learning and
an ensemble of classifiers to enhance such an initiative is gaining more attention. The progress in
this regard will certainly help answer more sophisticated and difficult science questions as to how
[Figure: a remote sensing information system, with labels for source, sensors, atmospheric interaction, target, receiving unit, processing unit, hardware, software, data, information, processes, and people.]
and control with the aid of wired or wireless communication systems to produce multidimensional
information and to monitor the presence of unique events. The SoSE approach may certainly provide
sound routine monitoring, early warning, and emergency response capacity in our society, which is
facing climate change, globalization, urbanization, economic development, population growth, and
resource depletion.
An integrated Earth system observatory that merges surface-based, air-borne, space-borne, and
even underground sensors with comprehensive and predictive capabilities indicates promise for
revolutionizing the study of global water, energy, and carbon cycles as well as land use and land
cover changes. This may especially be true if these multisensor data fusion and machine learning
technologies are developed and deployed in a coordinated manner and the synergistic data are further
screened, synthesized, analyzed, and assimilated into appropriate numerical simulation models for
advanced decision analysis. Thus, the aim of this book is to present a suite of relevant concepts,
tools, and methods of the integrated multisensor data fusion and machine learning technologies to
promote environmental sustainability.
1.2 OBJECTIVES AND DEFINITIONS

Data fusion is a multi-level, multifaceted process dealing with the automatic registration, detection, association, correlation, and combination of data and information from multiple sources to achieve refined state and identity estimation, and complete and timely assessments of situations, including both threats and opportunities.
The data fusion process proposed by the JDL was classified into five processing levels, an associated database, and an information bus that connects the five levels (Castanedo, 2013). The five levels of processing are defined as follows (Figure 1.2): Level 0, source pre-processing; Level 1, object refinement; Level 2, situation refinement; Level 3, impact refinement; and Level 4, process refinement.
In this book, machine learning or data mining algorithms are emphasized to help feature extraction
of remote sensing images. The discussion of data mining in this book is based on the following
definition:
FIGURE 1.2 Structure of the JDL data fusion. (From Chang, N. B. et al., 2016. IEEE Systems Journal, 1–17.)
Data mining, sometimes called knowledge discovery, is a big data analytic process designed to explore
or investigate data from different perspectives in search of useful patterns, consistent rules, systematic
relationships, and/or applicable information embedded among various types of flexibly grouped system
variables of interests. It requires subsequent validation of the findings by applying these findings to new
subsets of data.
The discussion of machine learning in this book is based on the following definition:
Machine learning, born from artificial intelligence, is a computer science discipline that enables computers to learn from different data analyses and even automate analytical or empirical model building without being explicitly coded to do so. By using various statistics, decision science, evolutionary computation, and optimization techniques to learn from data iteratively, machine learning allows computers to identify or discover hidden rules, inherent patterns, possible associations, and unknown interactions without being explicitly programmed where to look or what to search for.
The major difference between data mining and machine learning is that the former has no clue about what the patterns or rules in a system are, whereas the latter has some clues in advance about what the system looks like based on local or labeled samples. In image classification and feature extraction for remote sensing studies, this distinction rests on whether or not the system of interest has some ground or sea truth data to draw on. Since ground or sea truth data may or may not be available in different types of remote sensing studies, and since collecting such data is favored for image classification and feature extraction toward much better prediction accuracy, the emphasis of this book is placed on machine learning rather than data mining, although both cases are discussed in context.
The process of data mining or machine learning consists of four general stages: (1) the initial
exploration (i.e., data collection and/or sampling); (2) model building or pattern identification and
recognition; (3) model verification and validation; and (4) application of the model to new data in
order to generate predictions.
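To make these four stages concrete, the following minimal MATLAB sketch walks through them with synthetic data (the linear model, noise level, and 70/30 split are illustrative choices of ours, not prescriptions from this book):

```matlab
% Four general stages of a data mining/machine learning workflow (toy example)
rng(1);                                       % reproducibility
x = linspace(0, 10, 100)';                    % Stage 1: initial exploration --
y = 2*x + 1 + randn(100, 1);                  % collect/sample (synthetic) data
idx = randperm(100);
train = idx(1:70);  test = idx(71:100);       % hold out 30% for validation
p = polyfit(x(train), y(train), 1);           % Stage 2: model building (linear fit)
rmse = sqrt(mean((y(test) - polyval(p, x(test))).^2));  % Stage 3: validation
fprintf('Validation RMSE: %.2f\n', rmse);
y_new = polyval(p, 12.5);                     % Stage 4: prediction on new data
```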
The niche for integrating data fusion and machine learning for remote sensing rests upon the
creation of a new scientific architecture in remote sensing science that is designed to support
numerical as well as symbolic data fusion managed by several cognitively oriented machine learning
tasks. Whereas the former is represented by the JDL framework, the latter is driven by a series of
image restorations, reconstructions, enhancements, segmentations, transformations, and fusions for
intelligent image processing and knowledge discovery (Figure 1.3) (Chang et al., 2016). Well-known
machine learning methods include but are not limited to genetic algorithms, genetic programming,
artificial neural networks, particle swarm optimization, support vector machines, and so on (Zilioli and Brivio, 1997; Volpe et al., 2007; Chen et al., 2009; Bai et al., 2015; Chang et al., 2015). They can be used for data mining as well if we do not have in situ observations.

[Figure content: the data fusion domain spans Level 0 source pre-processing, Level 1 object refinement, Level 2 situation refinement, Level 3 impact refinement, and Level 4 process refinement, supported by a database management system with fusion databases and local, distributed, and national sources.]
FIGURE 1.3 The new architecture of integrated data fusion and machine learning.

1.3 FEATURED AREAS OF THE BOOK
• Part I—Fundamental Principles of Remote Sensing: This part of the discussion will
demonstrate a contemporary coverage of the basic concepts, tools, and methods associated
with remote sensing science. The relationship between electromagnetic radiation and remote
sensing will be discussed in Chapter 2. Then the types of sensors and platforms that provide
the image acquisition capacity will be delineated and emphasized in Chapter 3. Finally, the
method of pre-processing raw images and how pre-processed images can be integrated with
a geographical information system for different types of analyses via differing software
packages will be introduced in Chapter 4.
• Part II—Feature Extraction for Remote Sensing: With the foundation of Part I, Part II
aims to introduce basic feature extraction skills and discussion of the latest machine
learning techniques for feature extraction. In this context, an overview of concepts
and basics for feature extraction will be introduced for readers who do not have such a
background (Chapter 5). The statistics or probability-based and machine-learning-based
feature extraction methods will be described in sequence to entail how remote sensing
products can be analyzed and interpreted for different purposes (Chapters 6 and 7). These
machine learning analyses need to be validated with the aid of ground- or sea-based
sensor networks that help collect reference data. Along this line, the final product can be
integrated further with numerical simulation models in support of high-end research for
environmental science and Earth system science.
• Part III—Image and Data Fusion for Remote Sensing: Following the logical sequence of Parts I and II, Part III will focus on the concepts of image fusion with
respect to current technology hubs (Chapter 8) toward an all-inclusive coverage of the
most important image or data fusion algorithms (Chapter 9). These image and data fusion
efforts may lead to the derivation of a series of new remote sensing data products with
sound system design and the corresponding advancements must be specifically addressed
and evaluated by a suite of performance-based evaluation metrics (Chapter 10). Situation
refinement can be made possible with iterative work based on the knowledge developed
in this part.
• Part IV—Integrated Data Merging, Data Reconstruction, Data Fusion, and Machine Learning: Starting in Part IV, focus is placed on large-scale, complex, and integrated data merging, fusion, image reconstruction, and machine learning scenarios.
Chapter 11 offers intensive information about the concepts and tools of data merging
with respect to multiple satellite sensors. Whether using data fusion or data merging,
cloudy pixels cannot be fully recovered. Thus, cloudy pixel reconstruction at appropriate stages can elevate the accomplishments of data fusion and merging in support of machine learning (Chapter 12). With the inclusion of Chapter 12, regardless of
intensity and spatial variability of cloudy pixels regionwide, the nature of cloudy pixel
reconstruction with signal processing and machine learning techniques may greatly
expand the general utility of fused or merged images. Chapter 13 presents a holistic
discussion of integrated data fusion and machine learning for intelligent feature
extraction. Chapter 14 demonstrates the highest level of synergy with a SoSE approach
that enables readers to comprehend the sophistication of integrated cross-mission data
merging, fusion, and machine learning algorithms flexibly toward better environmental
surveillance. From various SoSE approaches, the integrated data fusion, merging, and
machine learning processes may be evaluated by a set of indices for performance
evaluation.
• Part V—Remote Sensing for Environmental Decision Analysis: Expanding upon the predictive capabilities from multisensor data fusion, merging, image reconstruction, and machine learning, the areas of environmental application include but are not limited to air resources management (Chapter 15), water quality management (Chapter
16), ecosystem toxicity assessment (Chapter 17), land use and land cover change detection
(Chapter 18), and air quality monitoring in support of public health assessment (Chapter
19). These case studies from Chapter 15 to Chapter 19 will be systematically organized
in association with current state-of-the-art remote sensing sensors, platforms, tools, and
methods applied for environmental decision making.
REFERENCES
Bai, K. X., Chang, N. B., and Chen, C. F., 2015. Spectral information adaptation and synthesis scheme for
merging cross-mission consistent ocean color reflectance observations from MODIS and VIIRS. IEEE
Transactions on Geoscience and Remote Sensing, 99, 1–19.
Castanedo, F., 2013. A review of data fusion techniques. The Scientific World Journal, 2013, 704504.
Chang, N. B., Bai, K. X., and Chen, C. F., 2015. Smart information reconstruction via time-space-spectrum
continuum for cloud removal in satellite images. IEEE Journal of Selected Topics in Applied Earth
Observations, 99, 1–19.
Chang, N. B., Bai, K. X., Imen, S., Chen, C. F., and Gao, W., 2016. Multi-sensor satellite image fusion,
networking, and cloud removal for all-weather environmental monitoring. IEEE Systems Journal, 1–17,
DOI: 10.1109/JSYST.2016.2565900.
Chen, H. W., Chang, N. B., Yu, R. F., and Huang, Y. W., 2009. Urban land use and land cover classification
using the neural-fuzzy inference approach with Formosat-2 Data. Journal of Applied Remote Sensing,
3, 033558.
Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), 2011. [Link], accessed May 2011.
Hall, D. L. and Llinas, J., 1997. An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1),
6–23.
United States Department of Defense, 2008. Systems Engineering Guide for Systems of Systems. [Link], accessed February 13, 2016.
Volpe, G., Santoleri, R., Vellucci, V., d’Alcalà, M. R., Marullo, S., and D’Ortenzio, F., 2007. The colour of
the Mediterranean Sea: Global versus regional bio-optical algorithms evaluation and implication for
satellite chlorophyll estimates. Remote Sensing of Environment, 107, 625–638.
Zilioli, E. and Brivio, P. A., 1997. The satellite derived optical information for the comparative assessment of
lacustrine water quality. Science of the Total Environment, 196, 229–245.
Part I
Fundamental Principles of Remote Sensing
2 Electromagnetic Radiation and Remote Sensing
2.1 INTRODUCTION
Identified by Einstein in 1905, quanta, or photons, are packets of pure energy; such particles have no mass when at rest. While developing the blackbody radiation law, the German physicist Max Planck realized that the supposition that electromagnetic energy could be emitted only in "quantized" form was the key to interpreting electromagnetic waves consistently; this scientific discovery is the reason he was awarded the Nobel Prize in Physics in 1918. Einstein, in turn, pointed out in 1905 that light must consist of bullet-like tiny particles, now known as photons. The photoelectric effect that Einstein explained successfully supplemented the quantization supposition Planck proposed, and it is the reason Einstein was awarded the 1921 Nobel Prize in Physics. When possessing a
certain quantity of energy, a photon is said to be quantized by that quantity of energy. Therefore,
the well-known “wave-particle” duality entails the findings of Planck and Einstein that all forms
of electromagnetic radiation (EMR) and light behave as waves and particles simultaneously in
quantum mechanics. These findings imply that every quantic entity or elementary particle exhibits
the properties of waves and particles, from which the properties of light may be characterized.
Photons as quanta thus show a wide range of discrete energies, forming a basis for the spectrum
of EMR. Quanta may travel in the form of electromagnetic waves, which provide remote sensing a
classical basis for data collection.
Sunlight refers to the portion of the EMR spectrum given off by the sun, particularly in the range of infrared, visible, and ultraviolet light. On Earth, sunlight is filtered by the atmosphere before it can reach ground level. The interactions among solar radiation, atmospheric
scattering and reflections, and terrestrial absorption and emission play a key role in the ecosystem
conditions at the surface of Earth. Atmospheric radiative transfer processes with the effect of
transmission, absorption, reflection, and scattering have collectively affected the energy budget
of the atmospheric system on Earth. For example, absorption by several gas-phase species in the
atmosphere (e.g., water vapor, carbon dioxide, or methane) defines the so-called greenhouse effect
and determines the general behavior of the atmosphere, which results in a surface temperature
higher than zero degrees Celsius (273.15 K). In addition to the natural system, human activities
have had a profound impact on the energy budget of the earth system. To some extent, air pollutants
emitted by anthropogenic activities also affect the atmospheric radiative transfer processes and
result in environmental effects and public health impact.
Following this deepened understanding of EMR, wavelength-dependent analyses for remote
sensing data collection are often highlighted with respect to the given band specifications in
the literature. Remote sensing sensor design based on specified bands and center wavelengths
thus becomes feasible for collecting various images for processing, information extraction, and
interpretation. Depending on the goals of each individual application, satellites onboard different
sensors may be regarded as a cohesive task force to achieve a unique mission for earth observation
and environmental monitoring. It is the aim of this chapter to establish a foundation by introducing
a series of basic concepts and methods along this line.
E = hν = hc/λ (2.1)

in which E is the radiant energy of a photon (in joules), h is Planck's constant (6.63 × 10⁻³⁴ J·s, or W·s²), c is the speed of light (3 × 10⁸ m·s⁻¹), λ is the wavelength (in meters), and ν is the frequency (in hertz).
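As a quick numeric check of Equation 2.1 (a minimal MATLAB sketch; the 550 nm wavelength is an arbitrary example in the visible range):

```matlab
% Photon energy E = h*c/lambda (Equation 2.1)
h      = 6.63e-34;       % Planck's constant, J*s
c      = 3.0e8;          % speed of light, m/s
lambda = 550e-9;         % example wavelength: 550 nm (green light)
nu = c / lambda;         % frequency, ~5.45e14 Hz
E  = h * nu;             % radiant energy per photon, ~3.6e-19 J
fprintf('nu = %.3e Hz, E = %.3e J\n', nu, E);
```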
[Figure: an electromagnetic wave, with labels for the electric field, magnetic field, wavelength (λ), frequency (ν), amplitude, reference point, and transmission direction.]
FIGURE 2.2 Comparison of wavelength, frequency, and energy for the electromagnetic spectrum. (NASA’s
Imagine the Universe, [Link] accessed March, 2013.)
TABLE 2.1
Approximate Frequency, Wavelength, and Energy Limits of the Various Regions of the EM Spectrum

Region       | Wavelength (m)        | Frequency (Hz)         | Energy (J)
Radio waves  | >1 × 10⁻¹             | <3 × 10⁹               | <2 × 10⁻²⁴
Microwave    | 1 × 10⁻³ ∼ 1 × 10⁻¹   | 3 × 10⁹ ∼ 3 × 10¹¹     | 2 × 10⁻²⁴ ∼ 2 × 10⁻²²
Infrared     | 7 × 10⁻⁷ ∼ 1 × 10⁻³   | 3 × 10¹¹ ∼ 4 × 10¹⁴    | 2 × 10⁻²² ∼ 3 × 10⁻¹⁹
Optical      | 4 × 10⁻⁷ ∼ 7 × 10⁻⁷   | 4 × 10¹⁴ ∼ 7.5 × 10¹⁴  | 3 × 10⁻¹⁹ ∼ 5 × 10⁻¹⁹
Ultraviolet  | 1 × 10⁻⁸ ∼ 4 × 10⁻⁷   | 7.5 × 10¹⁴ ∼ 3 × 10¹⁶  | 5 × 10⁻¹⁹ ∼ 2 × 10⁻¹⁷
X-ray        | 1 × 10⁻¹¹ ∼ 1 × 10⁻⁸  | 3 × 10¹⁶ ∼ 3 × 10¹⁹    | 2 × 10⁻¹⁷ ∼ 2 × 10⁻¹⁴
All parts of the EM spectrum consist of EM radiation produced through different processes. They
are detected in different ways by remote sensing, although they are not fundamentally different
relative to the nature of EM radiation.
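To make the limits in Table 2.1 operational, a small MATLAB helper (our own sketch, saved as a hypothetical file emRegion.m; the boundaries are taken directly from the table) can map a wavelength to its approximate spectral region:

```matlab
function region = emRegion(lambda)
% Classify a wavelength (in meters) into its EM region per Table 2.1.
edges = [1e-11 1e-8 4e-7 7e-7 1e-3 1e-1 Inf];   % region boundaries, m
names = {'X-ray','Ultraviolet','Optical','Infrared','Microwave','Radio'};
idx = find(lambda >= edges(1:end-1) & lambda < edges(2:end), 1);
if isempty(idx)
    region = 'outside tabulated range';
else
    region = names{idx};
end
end
```

For example, emRegion(550e-9) returns 'Optical' and emRegion(0.03) returns 'Microwave'.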
[Figure: radiative transfer interactions between two media, with panels for transmission, reflection (angle of incidence θ₁ equals angle of reflection θ₂), scattering, absorption, and emission.]
For energy conservation, the summation of the fraction of the total radiation energy associated with
transmission, absorption, and reflection must be equal to 1.
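In symbols (our notation, not the book's): writing τ, α, and ρ for the transmitted, absorbed, and reflected fractions of the incident energy at wavelength λ, this conservation statement reads

$$\tau(\lambda) + \alpha(\lambda) + \rho(\lambda) = 1.$$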
Similarly, the atmosphere filters the energy delivered by the sun and emitted from Earth while
performing radiative transfer. A series of radiative transfer processes may collectively describe
the interaction between matter and radiation; such interactions might involve matter such as gases,
aerosols, and cloud droplets in the atmosphere and the four key processes of absorption, reflection,
emission, and scattering. Whereas scattering of an incident radiation by the atmospheric matter
results in a redistribution of the radiative energy in all directions, absorption of an incident radiation
by the atmospheric matter results in a decrease of radiative energy in the incident direction. Figure
2.4 conceptually illustrates the radiative transfer processes through an atmospheric layer. In such
[Figure content: aerosol scattering, atmospheric emission and absorption, surface-reflected radiation, surface radiation, and surface characterization by temperature, albedo, emissivity, and composition.]
FIGURE 2.4 Radiative transfer processes through an atmospheric layer. (NASA Goddard Space Flight
Center, PSG, 2017. [Link] accessed December, 2017.)
an environment, whether radiation is absorbed or transmitted depends on the wavelength and the
surface properties of the matter in the atmospheric environment. When an air-borne or a space-
borne sensor views Earth, reflection and refraction are two key radiative transfer processes; these
two processes are discussed in detail below.
2.4.2 Reflection
Our ability to see luminous objects with our eyes depends on the reflective properties of light, as
does an air-borne or a space-borne sensor. In the earth system, types of reflections at the surface
of Earth include specular and diffuse reflection (Figure 2.5). Factors affecting surface reflectance
on Earth include absorption features (e.g., water, pigments, and minerals) at ground level, surface
roughness, and observation and illumination angles. Specular reflection occurs on smooth surfaces
(Figure 2.5a) whereas varying surface roughness may result in Lambertian or diffuse reflectance
(Figure 2.5b). Note that diffuse reflection, also termed Lambertian reflection and characterized by the bidirectional reflectance distribution function (BRDF), occurs on rough surfaces (Figure 2.5b) such as forests or agricultural fields. In diffuse reflection, the roughness of the surface results in variations of the normals along the surface; however, all of the reflected rays still behave according to the law of reflection, which states that the incident ray, the reflected ray, and the normal to the reflecting surface all lie in the same plane. The BRDF depends on wavelength and on the illumination and viewing geometry, which is determined by the optical and structural
properties of the surface. These properties include but are not limited to: multiple scattering, facet
orientation distribution, facet density, shadow casting, mutual shadowing, reflection, absorption,
transmission, and emission by surface objects. Hence, BRDF is related to Lambertian reflection,
which defines how light reflected at an opaque surface differs from what we may see with our
eyes with respect to the same scene when Earth moves over different positions relative to the sun.
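For the ideal Lambertian case just described, the BRDF reduces to a constant, ρ/π, so the reflected radiance is the same from every view angle. A minimal MATLAB sketch (the reflectance, irradiance, and sun angle are assumed values for illustration; real surfaces need a full wavelength-dependent BRDF model):

```matlab
% Ideal (Lambertian) diffuse reflection: BRDF = rho/pi, independent of view angle
rho     = 0.25;     % hemispherical surface reflectance (assumed)
E_sun   = 1000;     % irradiance on a sun-normal plane, W/m^2 (assumed)
theta_i = 30;       % solar incidence angle from the surface normal, deg
f_brdf  = rho / pi;                   % Lambertian BRDF, 1/sr
L = f_brdf * E_sun * cosd(theta_i);   % reflected radiance, W/(m^2*sr), ~69 here
fprintf('Reflected radiance: %.1f W/(m^2 sr)\n', L);
```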
2.4.3 Refraction
Refraction is a light-bending effect that happens between transparent media of different densities, in which the transparent media can be air, water, or even snow on Earth (Robinson, 1997). The
bending effect of light in association with the refraction media is a physical representation of the
longer time it takes for light to move through the denser of two media (Figure 2.6). The level of
refraction is dependent on the incident angle with which the ray of light strikes the surface of the
medium. Given that the temperature of the atmospheric layers may vary with height, this effect
could affect the density of air, and bias of the signals collected by remote sensing sensors could
impact the measurement accuracy.
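The level of refraction described here is quantified by Snell's law, n₁ sin θ₁ = n₂ sin θ₂, which is not written out in this excerpt; a minimal MATLAB sketch for light passing from air into water (standard refractive indices assumed):

```matlab
% Snell's law: n1*sin(theta1) = n2*sin(theta2)
n1 = 1.000;     % refractive index of air (approximate)
n2 = 1.333;     % refractive index of water (approximate)
theta1 = 45;    % incidence angle, deg
theta2 = asind((n1/n2) * sind(theta1));   % refraction angle, ~32 deg
fprintf('Refraction angle: %.1f deg\n', theta2);
```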
[Figure: refraction of light at air-glass and air-water interfaces; the bending makes a submerged object appear displaced from its actual position.]
[Figure content: the EM spectrum from gamma rays and X-rays through ultraviolet, visible, infrared, and microwave to radio waves, shown over the atmospheric layers (troposphere, stratosphere, mesosphere, thermosphere), with the optical and radio transmission "windows" marked.]
FIGURE 2.7 Atmospheric windows and radiative transfer. (Modification of work by STScI/JHU/NASA.)
The illustration of Figure 2.7 shows how far different portions of the EM spectrum can move
forward before being absorbed in the atmosphere. It is notable that only portions of visible light,
infrared, and some ultraviolet light can reach Earth’s ground surface or make it to sea level. EM
radiation from space that is able to reach the surface of Earth through the atmosphere window
provides a wealth of ground-leaving or water-leaving reflectance data for remote sensing sensors
to collect.
Such atmospheric windows deeply affect the assessment of the environmental sustainability of the
earth system. For instance, the stratosphere, located in the upper atmosphere, and the troposphere,
located in the lower atmosphere, are chemically identical in terms of ozone molecules. However,
these ozone molecules have very different roles in these two layers and very different effects on
life systems on the surface of Earth. As shown in Figure 2.7, stratospheric ozone filters most of the
solar ultraviolet radiation, playing a beneficial role by absorbing most of the biologically damaging
ultraviolet sunlight, known as UV-B. Ozone near the ground surface in the tropospheric layer not
only lacks the filtering action of the ozone layer, but is also toxic to life on Earth. In addition, the
change of water-leaving reflectance associated with different wavelengths of visible light can be
regarded as a surrogate index for monitoring water pollution. Terrestrial thermal emissions are
correlated with the evapotranspiration process through all plant species, which may be monitored
by remote sensing to understand the ecosystem status.
[Figure content: reflected sunlight and thermal emission measured by a radiance sensor across the blue, green, red, near-infrared, shortwave-infrared, midwave-infrared, and longwave-infrared regions.]
FIGURE 2.8 Definition of specific spectral regions and visible and infrared radiation.
TABLE 2.2
Band Distribution of Microwave Remote Sensing and Related Properties
Designation Wavelength Range Frequency (GHz) Applications
[Figure content: log-log plot of energy emitted, W/(m²·micron·sr), versus wavelength from 0.1 to 100 microns, with the UV and infrared radiation regions marked.]
FIGURE 2.9 Comparisons of the spectral emittance from a 6,000 K Blackbody with a candle and total
irradiance at Earth’s surface at the TOA level. (NASA Data Center, 2016)
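The 6,000 K blackbody curve compared in Figure 2.9 follows Planck's law, B(λ, T) = (2hc²/λ⁵)/[exp(hc/(λk_BT)) − 1], which this excerpt does not reproduce explicitly; a short MATLAB sketch of that curve (the constants are standard physical values):

```matlab
% Spectral radiance of a 6,000 K blackbody (Planck's law), W/(m^2 * sr * m)
h  = 6.626e-34;                        % Planck's constant, J*s
c  = 2.998e8;                          % speed of light, m/s
kB = 1.381e-23;                        % Boltzmann constant, J/K
T  = 6000;                             % temperature, K (approximately solar)
lambda = linspace(0.1e-6, 10e-6, 500); % wavelengths, 0.1 to 10 microns
B = (2*h*c^2 ./ lambda.^5) ./ (exp(h*c ./ (lambda*kB*T)) - 1);
loglog(lambda*1e6, B);                 % compare with the shape in Figure 2.9
xlabel('Wavelength, microns'); ylabel('Spectral radiance');
```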
heat it takes to increase the unitary temperature of the soil layer. The lower the value of the
thermal diffusivity, the less the temperature rises further into the soil, and the higher the reflected
radiation into the atmosphere. Net radiation in this context is the amount of energy actually added
to the earth system.
Earth’s net radiation is the balance between outgoing and incoming energy at the TOA level. The
solar energy arriving at the surface can vary from 550 W/m2 with cirrus clouds to 1025 W/m2 with
a clear sky (Krivova et al., 2011). Earth and the atmosphere absorb 341 W/m2 of solar radiation on
average annually (Johnson, 1954). In view of the energy budget delineation in Figure 2.4, outgoing
longwave radiation is EMR emitted from Earth and its atmosphere out to space in the form of
thermal radiation through both soil layers and atmospheric layers. Most of the outgoing longwave
radiation has wavelengths (from 4 to 100 µm) in the thermal infrared part of the electromagnetic
spectrum. In fact, our planet’s climate is driven by absorption, reflection, shortwave or longwave
emission, and scattering of radiation within the atmosphere due to the presence of thin clouds,
aerosol, and some gases. Cases can be seen in some extreme weather events such as tropical storms
and hurricane assessment.
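As a back-of-the-envelope complement to these budget figures (a sketch using canonical values we assume, not numbers taken from this chapter), balancing absorbed shortwave radiation against Stefan-Boltzmann emission yields Earth's effective radiating temperature:

```matlab
% TOA energy balance: (S/4)*(1 - A) = sigma*T^4  =>  effective temperature
S     = 1361;        % total solar irradiance at TOA, W/m^2 (canonical value)
A     = 0.30;        % planetary (Bond) albedo (assumed canonical value)
sigma = 5.67e-8;     % Stefan-Boltzmann constant, W/(m^2*K^4)
T_eff = ((S/4)*(1 - A)/sigma)^(1/4);   % ~255 K; the gap to the ~288 K observed
fprintf('T_eff = %.0f K\n', T_eff);    % surface mean reflects the greenhouse effect
```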
[Figure: solar position geometry defined by the zenith, altitude, and azimuth angles.]
• Top of the atmosphere (TOA): TOA is defined as the outermost layer of Earth’s atmosphere,
which is the upper limit of the atmosphere—the boundary of Earth that receives sun
radiation.
• Albedo: Albedo means whiteness in Latin, and refers to the fraction of the incident sunlight
that the surface reflects. The residual radiation not reflected is then absorbed by the surface.
• Spherical albedo of the atmosphere: Spherical albedo is the average of the plane albedo
over all sun angles. Spherical albedo of the atmosphere is the effective albedo of an entire
planet that is the average of the plane albedo over all sun angles at TOA.
2.8 SUMMARY
In this chapter, some basic properties of light and concepts of EMR are introduced sequentially to
support the basic understanding of remote sensing. The discussion is followed by remote sensing
data collection conditional to atmospheric windows and specified band regions. In addition, the
global energy budget in relation to thermal radiation is presented to provide a complementary
view of thermal emission relative to sunlight reflection. The chapter ends by introducing the basic
terminologies of remote sensing for environmental monitoring and earth observation.
REFERENCES
Eismann, M. T., 2012. Hyperspectral Remote Sensing. SPIE Press, Bellingham, Washington, USA.
Johnson, F. S., 1954. The solar constant. Journal of Meteorology, 11, 431–439.
Krivova, N. A., Solanki, S. K., and Unruh, Y. C., 2011. Towards a long-term record of solar total and spectral irradiance. Journal of Atmospheric and Solar-Terrestrial Physics, 73, 223–234.
National Aeronautics and Space Administration (NASA) Data Center, [Link], accessed June, 2016.
National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, [Link], accessed June, 2016.
National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, [Link], accessed August, 2016.
National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, PSG, 2017. [Link], accessed December, 2017.
Robinson, D. A., 1997. Hemispheric snow cover and surface albedo for model validation. Annals of Glaciology, 25, 241–245.
3 Remote Sensing Sensors and Platforms
3.1 INTRODUCTION
Solar energy moves from the sun to Earth and finally to the satellite sensors onboard a variety of
platforms for measurement. Remote sensing images and data provide critical information about
how the natural system can be sustained over time. Such radiative transfer processes reveal how
the solar energy is partitioned into different compartments in the natural system. Remote sensing
images and data with different spatial, spectral, radiometric, and temporal resolution need to
be pre-processed, retrieved, analyzed, interpreted, and mapped in an iterative and holistic way
to support various types of decision analysis for sustainable development. In applications that require more than one satellite, automated data merging and/or fusion processes
for dealing with challenging problems are critical for supporting human decision making, which
requires linking data with information, knowledge discovery, and decision analysis to achieve
timely and reliable projections of a given situation in a system (Figure 3.1), such as climate change
impact.
Consequently, as mentioned in Chapter 2, the following energy partition terminologies in the
radiative transfer processes in the natural environment deeply influence the system design of both
sensors and platforms and deserve our attention:
• Transmitted energy—The energy that passes through a medium with a change in the
velocity of the light as determined by the refraction index for two adjacent media of interest.
• Absorbed energy—The energy that is surrendered to the target through electron or even
molecular reactions.
• Reflected energy—The energy bounced back with an angle of incidence equal to the angle
of reflection.
• Scattered energy—The energy that is diffused into the air with directions of energy
propagation in a randomly changing condition. Rayleigh and Mie scattering are the two
major types of scattering in the atmosphere.
• Emitted energy—The energy that is first absorbed, then re-emitted as thermal emissions at
longer wavelengths while the target, such as the ground level, heats up.
As mentioned, remote sensing images and data play a critical role in understanding the solar
energy paths and extracting features associated with targets. Before conducting data merging and/
or fusion, which fuses or merges images of different spatial, temporal, and spectral resolution, there
is a need to understand important functionalities of different sensors and platforms individually or
collectively. This chapter thus aims to investigate quality sensors and platforms capable of supporting
multisensor data merging and/or fusion based on synthetic aperture radar, infrared, and optical
remote sensing images and data. The following sections present different classification principles of
sensors and platforms to establish a fundamental understanding of remote sensing systems as well
as the current, historic, and future missions of remote sensing with inherent connections. These
sensors and platforms are regarded as enabling technologies for monitoring solar radiation and
improving our comprehension of the interactions between solar energy and materials over different
wavelengths at the ground level or in atmospheric layers. The sensor, bands, spatial resolution, swath
width, spectral range, and temporal resolution associated with a suite of major multispectral remote
sensing platforms are highlighted for demonstration. An understanding of this relevant knowledge may lead to optimizing the system planning and design of data merging and/or fusion, meeting the overarching goal of the system of systems engineering, as more sensors and platforms with different features become available and join the synergistic endeavor.

FIGURE 3.1 Contribution of remote sensing images and data to sustainable development.
3.2.1 Space-Borne Platforms
The most popular platforms for remote sensing aloft are space-borne satellites. Over three thousand remote sensing satellites have been launched since 1957, when the Soviet Union launched Sputnik 1, the first man-made satellite. In addition, the space shuttle, which can function as a remote sensing platform, belongs to this category; however, unlike satellites, the space shuttle can be reused for multiple missions. The path of a satellite in space is referred to as its orbit. Satellites can be classified
based on either orbital geometry or timing for image acquisition. Two types of orbits, including
geostationary/equatorial and polar/Sun synchronous, are commonly used as a broad guideline
for the classification of remote sensing satellites (Natural Resources Canada, 2017). These orbits
are fixed after launch and can be only slightly adjusted to maintain their anticipated position for
environmental monitoring and earth observation over time. The type of orbit, which affects the design of the onboard sensor, determines the satellite's altitude with respect to Earth and the limit of its instantaneous field of view (i.e., the area on Earth that can be viewed at any moment in time).
In general, geostationary or equatorial satellites are designed to have an orbital period of about 24 hours, the same as Earth's rotation, making these satellites stay consistently over the same location on Earth. These geostationary satellites must be placed at a very high altitude (∼36,000 km) to maintain an orbital period equal to that of Earth's rotation and appear stationary with respect to Earth, as illustrated in Figure 3.2. Because of this stationarity, any sensor onboard these satellites always views the same area of Earth, albeit a very large area because of the high altitude. Such satellites
normally circle Earth at a low inclination in an equatorial orbit (i.e., inclination is defined as the
angle between the orbital plane and the equatorial plane). This type of system design of geostationary
orbits can meet the needs for communications and weather monitoring, hence many of them are
located over the equator. However, the space shuttle chose an equatorial orbit with an inclination
of 57 degrees. The space shuttle has a low orbital altitude of 300 km, whereas other common polar
satellites typically maintain orbits ranging from 200 to 1,000 km.
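The ∼36,000 km altitude quoted above follows from Kepler's third law; a minimal MATLAB check (the constants are standard values, not figures from the text):

```matlab
% Geostationary orbital radius from Kepler's third law: a^3 = mu*T^2/(4*pi^2)
mu  = 3.986e14;      % Earth's gravitational parameter, m^3/s^2
T   = 86164;         % one sidereal day, s
R_E = 6.371e6;       % mean Earth radius, m
a = (mu*T^2/(4*pi^2))^(1/3);                   % orbital radius, ~4.216e7 m
fprintf('Altitude: %.0f km\n', (a - R_E)/1e3); % ~35,800 km above the surface
```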
Polar-orbiting or sun-synchronous satellites are designed to pass above (i.e., polar) or nearly
above (i.e., sun-synchronous or near-polar orbits) each of Earth’s poles periodically. Polar or sun-
synchronous orbits are thus the most common orbits for remote sensing due to the need to provide
illumination for passive sensors. Note that although active sensors such as LiDAR and radar do not
need the Sun’s illumination for image acquisition, passive sensors count on solar energy as a source
of power. We will define active and passive sensors later in this chapter. Both types of polar-orbiting
satellites with similar polar orbits can pass over the equator at a different longitude at the same
local sun time on each revolution, as illustrated in Figure 3.2. The satellite revisit time (or revisit
interval or revisit period) is the time elapsed between two successive observations of the same
point on Earth, and this time interval is called the repeat cycle of the satellite. Each repeat cycle
enables a polar-orbiting satellite to eventually see every part of Earth’s surface. A satellite with a
near-polar orbit that passes close to the poles can cover nearly the whole earth surface in a repeat
cycle depending on sensor and orbital characteristics. For most polar-orbiting or sun-synchronous
satellites, the repeat cycle ranges from twice a day to once every 16 days. Real-world examples
include Landsat and the well-known Earth Observing System (EOS) series satellites such as Terra
and Aqua. Such an attribute of global coverage is often required for holistic earth observation.
Data collected by most remote sensing satellites can be transmitted to ground receiving stations
immediately or can be temporarily stored on the satellite in a compressed form. This option depends
on whether the receiving station has a line of sight to the satellite when the satellite wishes to
transmit the data. If there are not enough designated receiving stations around the world to be in
line with the satellite, data can be temporarily stored onboard the satellite until acquiring direct
contact with the ground-level receiving station. Nowadays, there is a network of geosynchronous
(geostationary) communications satellites deployed to relay data from satellites to ground receiving
stations, and they are called the Tracking and Data Relay Satellite System (TDRSS). With the
availability of TDRSS, data may be relayed from the TDRSS toward the nearest receiving stations
without needing to be stored temporarily onboard the satellite.
3.2.2 Air-Borne Platforms
Air-borne platforms collect aerial images with cameras or sensors; airplanes are currently the most common air-borne platforms. When altitude and stability requirements are not limiting factors for a sensor, simple, low-cost aircraft such as Unmanned Aerial Vehicles (UAVs) can also serve as platforms. If instrument stability and/or higher altitude become essential, more sophisticated high-altitude aircraft that can fly at altitudes greater than 10,000 meters above sea level must be employed; these include fixed-wing, propeller-driven planes flown by pilots. Although they are limited to relatively small areas, air-borne platforms are well suited to acquiring high spatial resolution data. Remote sensing instruments may be mounted on the underside of the airplane or simply hung out the door using simple mounts. Mid-altitude aircraft, which operate below 10,000 meters above sea level, are used when stability is demanded and when imagery is required that cannot be acquired from low-altitude platforms such as helicopters or UAVs. A real-world example is the C-130 air-borne platform, owned by the National Center for Atmospheric Research (NCAR) and the National Science Foundation (NSF) in the United States.
FIGURE 3.3 The difference between (a) passive sensors and (b) active sensors. (National Aeronautics and Space Administration (NASA), 2012. [Link] funfacts/txt_passive_active.html, accessed May 2017.)
Passive sensors detect natural radiation that is emitted by, or reflected from, the observed scene across various frequency bands. Active sensors, on the other hand, send out their own energy for illumination; that is, the sensor emits radiation directed toward the target of interest, and the radiation reflected from that target is detected and measured by the sensor (Figure 3.3b).
For any radiation-detection imager, the resulting images, in which each pixel of each band stores a digital number (DN), are characterized by several kinds of resolution, as summarized below.
• Spatial resolution: Spatial resolution is usually expressed as pixel size, which depends on focal length, detector size, and sensor altitude. Spatial resolution is a key factor for the discrimination of essential features.
• Spectral resolution: Spectral resolution is the density of the spectral bands across the electromagnetic spectrum for multispectral or hyperspectral sensors; each band corresponds to an image.
• Radiometric resolution: Radiometric resolution, usually measured in binary digits (bits), is the range of available brightness values, corresponding to the maximum range of DNs in the image; it specifies the ability of a sensor to distinguish differences in brightness (or grey-scale values) while acquiring an image. For example, an image with 8-bit resolution has 256 levels of brightness (Richards and Jia, 2006); see the brief sketch after this list.
• Temporal resolution: Temporal resolution is the time required for revisiting the same area
of Earth (NASA, 2013).
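As a quick illustration of radiometric resolution, the following minimal MATLAB sketch relates bit depth to the number of distinguishable brightness levels and rescales DNs between bit depths; the synthetic image and variable names are illustrative assumptions, not tied to any particular sensor.

% An n-bit sensor can distinguish 2^n brightness levels.
bits = 8;                                  % radiometric resolution in bits
levels = 2^bits;                           % 256 levels for an 8-bit image
dn8 = uint8(randi([0 255], 100, 100));     % synthetic 8-bit DN image
% Stretch the 8-bit DNs onto a 16-bit range for comparison or display:
dn16 = uint16(double(dn8) * (65535 / 255));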
Most of the current satellite platforms for remote sensing follow polar or near-polar (i.e., sun-synchronous) orbits. These satellites travel toward the North Pole on one side of Earth (ascending passes) and then toward the South Pole on the second half of their orbital paths (descending passes). If the orbit is sun-synchronous rather than purely polar, the descending pass is normally on the sunlit side of Earth, while the ascending pass is on the shadowed side. Passive optical sensors onboard these satellites record reflected radiation from the surface on descending passes, when solar illumination is available. Passive sensors that record emitted radiation (e.g., thermal radiation) do not require solar illumination and can therefore also image the surface of Earth on ascending passes, as can active sensors, which rely on their own illumination. Active and passive sensors are further classified in detail below.
FIGURE 3.4 Typical spectral reflectance curves of soil, vegetation, and water, plotted as reflectance versus wavelength (0.5–2.5 µm).
In addition, these spectrometers may include, but are not limited to:
The difference between multispectral and hyperspectral imaging is illustrated in Figure 3.5. Broadband sensors produce panchromatic images with very wide bandwidths, typically 400–500 nanometers. For example, WorldView-1 produced panchromatic images with a high spatial resolution of 50 centimeters. Most multispectral imagers have four basic spectral bands: blue, green, red, and near-infrared. Some multispectral imaging satellites, such as Landsats 7 and 8, have additional spectral bands in the shortwave infrared (SWIR) region of the spectrum. Hyperspectral imaging systems are designed to obtain imagery over hundreds of narrow, contiguous spectral bands with typical bandwidths of 10 nanometers or less. For example, the NASA JPL AVIRIS air-borne hyperspectral imaging sensor obtains spectral data over 224 contiguous channels, each with a bandwidth of 10 nm, over a spectral range from 400 to 2,500 nanometers. Ultraspectral sensors represent the future design of hyperspectral imaging technology.
FIGURE 3.5 The comparison among broadband, multispectral, hyperspectral, and ultraspectral remote sensing.
3.3.2 Active Sensors
An active sensor in a remote sensing system is a radar, laser, or light detection and ranging (LiDAR) instrument used for detecting, measuring, and analyzing signals transmitted by the sensor,
which are reflected, refracted, or scattered back by the surface of Earth and/or its atmosphere. The majority of active sensors operate in the microwave portion of the electromagnetic spectrum, and the frequency allocations of active sensors from Ka band to L band (Table 2.2) are common to other radar systems. Some active sensors are specifically designed to detect precipitation, aerosol, and clouds based on their radar echoes. Active sensors have a variety of applications in hydrology, meteorology, ecology, environmental science, and atmospheric science. For example, precipitation radars, whether ground-based (Zrnic and Ryzhkov, 1999; Wood et al., 2001), air-borne (Atlas and Matejka, 1985), or space-borne (Anagnostou and Kummerow, 1997), measure the radar echo intensity (reflectivity, expressed in dBZ) returned by rainfall droplets in order to estimate rainfall rates over the surface of Earth. These radar, LiDAR, and laser sensors may include, but are not limited to:
• LiDAR: A Light Detection And Ranging (LiDAR) active sensor measures the distance to a target by illuminating that target with pulsed laser (light amplification by stimulated emission of radiation) light and measuring the backscattered or reflected pulses with a receiver equipped with sensitive detectors. Because distance equals velocity multiplied by time, the distance to the target is calculated as half the product of the speed of light and the time elapsed between the transmitted and backscattered pulses; the factor of one-half accounts for the round trip (see the sketch after this list).
• Laser altimeter: Mounted on a spacecraft or aircraft, a laser altimeter is a remote sensing instrument that uses a LiDAR to measure the height of Earth's surface (either sea level or ground level). It works by emitting short flashes of laser light toward the surface of Earth. The height of the sea level or ground level relative to the mean surface of Earth is then derived from the two-way travel time between emitted and reflected pulses, again multiplied by the speed of light and halved, to produce the topography of the underlying surface.
• Radar: An active radar sensor, whether air-borne or space-borne, emits microwave radiation in a series of pulses from an antenna using its own source of electromagnetic energy. When the energy hits a target in the air or at the ground/sea level, some of the energy is reflected back toward the sensor. This backscattered or reflected microwave radiation is detected, measured, and analyzed. Half the product of the speed of light and the time required for the energy to travel to the target and back determines the distance, or range, to the target. A two-dimensional image of the surface can therefore be produced by calculating the range to all targets as the remote sensing system passes overhead.
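All three instruments above share the same time-of-flight ranging principle, sketched below in minimal MATLAB; the pulse travel time is a made-up illustrative value.

c = 299792458;            % speed of light (m/s)
t = 6.7e-6;               % hypothetical round-trip pulse travel time (s)
range = c * t / 2;        % halve the round-trip path to get the one-way range
fprintf('Target range: %.1f m\n', range);   % about 1004 m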
Microwave instruments that are designed for finer environmental monitoring and earth
observation may include, but are not limited to:
The Television Infrared Observation Satellite (TIROS-1), launched in 1960, carried special television cameras used to observe Earth's cloud cover from a 720 km (450 mile) orbit, and was the first experimental attempt of the National Aeronautics and Space Administration (NASA) to study Earth with satellite instruments. Since then, satellites have become the primary platforms carrying remote sensing instruments for earth observation and environmental monitoring. These space-borne instruments take advantage of large spatial coverage and regular revisit periods. From 1964 to 1970, a series of four meteorological research satellites, named Nimbus, were launched into space and had profound impacts due to their synoptic views of Earth; they provided information on issues such as weather dynamics and vegetation patterns.
With the rapid progress of remote sensing technologies in each successive generation of new satellites, remote sensing instruments and platforms have become increasingly sophisticated, generating imagery with finer temporal, spectral, and spatial resolution on a routine basis. Abundant remote sensing instruments have flown onboard different platforms since passive remote sensing techniques were first applied to satellite Earth observation in the 1970s. The chronological history of a set of well-known satellite remote sensing platforms is illustrated in Figure 3.6.
One of the world's best-known families of remote sensing satellites is Landsat, which is operated by the USA and has evolved over the past 40 years. Landsat 1, called the "Earth Resources Technology Satellite" until 1975, was the first satellite in the Landsat family; launched in 1972, it was dedicated to periodic environmental monitoring. Landsat 1 carried two sensors, the Return Beam Vidicon (RBV) and the Multispectral Scanner (MSS). The MSS sensor captured images in the green, red, and near-infrared spectra (Table 3.2) at 60-m resampled resolution over four separate spectral bands between 500 and 1,100 nm. The two successive satellites, Landsats 2 and 3, were launched in 1975 and 1978, respectively. The same sensors were deployed onboard Landsat 2, while the spectral capability of the MSS sensor on Landsat 3 was extended to measure radiation between 1,050 and 1,240 nm. Following the success of Landsats 1–3, Landsat 4 was launched in 1982 with improved spectral and spatial resolution. The RBV was replaced by the Thematic Mapper (TM) sensor, which provided seven spectral bands: six reflective bands from 450 to 2,350 nm with 30-m pixels plus one thermal band (Table 3.3). In addition, the revisit time of the satellite was improved from 18 days to 16 days.
FIGURE 3.6 The timeline of some well-known Earth science satellite systems since the 1950s; satellites shown include Nimbus-1, Landsat-1, SPOT-1, RADARSAT-1, Terra, Aqua, Aura, GOSAT, Envisat, Suomi-NPP, GPM, OCO-2, Sentinel-2A, and Fengyun-4A.
Launched in 1984, Landsat 5 is a duplicate of Landsat 4, and its TM sensor remained active some 25 years beyond its designed lifetime.
The next two satellites, Landsats 6 and 7, were launched in 1993 and 1999, respectively; however, Landsat 6 did not reach its orbit due to a launch failure. These satellites carried the Enhanced Thematic Mapper (ETM, on Landsat 6) and the Enhanced Thematic Mapper Plus (ETM+, on Landsat 7), providing 15-m panchromatic and 30-m multispectral images. In addition, the latest generation Landsat satellite, Landsat 8, was launched in 2013 with a two-sensor payload comprising the Operational Land Imager (OLI) and the Thermal InfraRed Sensor (TIRS). Landsat 8 OLI images comprise nine spectral bands with a spatial resolution of 30 m for bands 1 to 7 and 9 (Table 3.1) (Barsi et al., 2014). The ultra-blue band 1 is useful for coastal and aerosol studies, and band 9 is useful for cirrus cloud detection (Barsi et al., 2014). Thermal bands 10 and 11, collected by TIRS at 100-m resolution, provide more accurate surface temperatures (Barsi et al., 2014).
Developed by NASA in the USA since the early 1970s, the Landsat program is the longest-running remote sensing program in the world, providing over 40 years of calibrated, moderate-resolution data about Earth's surface to a broad user community. In summary, Landsat 1–3 images comprise four spectral bands with 60-m spatial resolution, and the approximate scene size is 170 km north-south by 185 km east-west (USGS, 2017). Specific band designations differ between Landsats 1–3 and Landsats 4–5 (Tables 3.2 and 3.3). Landsat 4–5 images comprise seven spectral bands with a spatial resolution of 30 m for bands 1 to 5 and 7 (Table 3.3); the approximate scene size is 170 km north-south by 183 km east-west (USGS, 2017). ETM+ images comprise eight spectral bands with a spatial resolution of 30 m for bands 1 to 7 (Table 3.4), while the 15-m resolution of band 8 (panchromatic) provides a niche for data fusion; the approximate scene size is again 170 km north-south by 183 km east-west (USGS, 2017). Overlapping bands provide a critical basis for information consistency, which is essential for cross-checking the continuity of the multispectral data coverage provided by Landsat missions (Figure 3.7).
Besides Landsat satellites, the second important remote sensing satellite family, SPOT (Satellite
Pour l’Observation de la Terre), was designed and subsequently launched by a French–Belgian–Swedish
TABLE 3.1
Comparison of Corresponding Basic Properties of Landsat 8 OLI
and TIRS Images
Landsat 8 Bands Wavelength (µm) Resolution (m)
Band 1—Ultra Blue (coastal/aerosol) 0.435–0.451 30
Band 2—Blue 0.452–0.512 30
Band 3—Green 0.533–0.590 30
Band 4—Red 0.636–0.673 30
Band 5—Near Infrared (NIR) 0.851–0.879 30
Band 6—Shortwave Infrared (SWIR) 1 1.566–1.651 30
Band 7—Shortwave Infrared (SWIR) 2 2.107–2.294 30
Band 8—Panchromatic 0.503–0.676 15
Band 9—Cirrus 1.363–1.384 30
Band 10—Thermal Infrared (TIRS) 1 10.60–11.19 100 (resampled to 30)
Band 11—Thermal Infrared (TIRS) 2 11.50–12.51 100 (resampled to 30)
TABLE 3.2
Comparison of Corresponding Basic Properties of Landsats
1–3 Multispectral Scanner (MSS) Images
Landsat 1–3 MSS Bands Wavelength (µm) Resolution (m)
Band 4—Green 0.5–0.6 60a
Band 5—Red 0.6–0.7 60a
Band 6—Near Infrared (NIR) 0.7–0.8 60a
Band 7—Near Infrared (NIR) 0.8–1.1 60a
a Original MSS pixel size was 79 × 57 meters; data products are resampled to 60 meters.
TABLE 3.3
Comparison of Corresponding Basic Properties of Landsats 4–5 Thematic
Mapper (TM) Images
Landsat 4–5 TM Bands Wavelength (µm) Resolution (m)
Band 1—Blue 0.45–0.52 30
Band 2—Green 0.52–0.60 30
Band 3—Red 0.63–0.69 30
Band 4—Near Infrared (NIR) 0.76–0.90 30
Band 5—Shortwave Infrared (SWIR) 1 1.55–1.75 30
Band 6—Thermal 10.40–12.50 120 (resampled to 30)
Band 7—Shortwave Infrared (SWIR) 2 2.08–2.35 30
TABLE 3.4
Comparison of Corresponding Basic Properties of Landsat 7 Enhanced
Thematic Mapper Plus (ETM+) Images
Landsat 7 ETM+ Bands Wavelength (µm) Resolution (m)
Band 1—Blue 0.45–0.52 30
Band 2—Green 0.52–0.60 30
Band 3—Red 0.63–0.69 30
Band 4—Near Infrared (NIR) 0.77–0.90 30
Band 5—Shortwave Infrared (SWIR) 1 1.57–1.75 30
Band 6—Thermal 10.40–12.50 60 (resampled to 30)
Band 7—Shortwave Infrared (SWIR) 2 2.09–2.35 30
Band 8—Panchromatic 0.52–0.90 15
FIGURE 3.7 Continuity of multispectral data coverage provided by Landsat missions, comparing the band layouts and scene widths of Landsat MSS (82-m bands, 185-km swath), Landsat 4–5 TM (30-m bands plus a 120-m thermal band, 185-km swath), and Landsat 7 ETM+ (30-m bands, a 60-m thermal band, and a 15-m panchromatic band, 183-km swath). (United States Geological Survey (USGS), 2014. Landsat 8 (L8) Data Users Handbook. [Link] data-users-handbook-section-1, accessed May 2017.)
joint program beginning in 1986. SPOT 1, launched that year, was equipped with a high-resolution visible (HRV) sensor that offered 10-m panchromatic and 20-m multispectral images with a 26-day revisit interval (i.e., repeat cycle). The improved spatial resolution of SPOT 1 provided more accurate data for understanding and monitoring the surface of Earth. SPOT 2, SPOT 3, and SPOT 4 were launched with the same instruments in 1990, 1993, and 1998, respectively. SPOT 5 was launched in 2002 with 2.5- or 5-m panchromatic and 10-m multispectral image resolution, with the same 26-day revisit interval.
IKONOS, the first commercial remote sensing satellite launched by a private entity, was launched in 1999 and was capable of providing imagery with high spatial resolution (1 m) and high temporal resolution (1.5- to 3-day revisit). During the same year, Terra, the flagship satellite of the Earth Observing System (EOS), was launched with five instruments aboard: the Clouds and the Earth's Radiant Energy System (CERES), the Multi-angle Imaging SpectroRadiometer (MISR), the Moderate-Resolution Imaging Spectroradiometer (MODIS), the Measurements of Pollution in the Troposphere (MOPITT), and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). These sensors were designed to monitor the state of Earth's environment and ongoing changes in its climate system around the globe, with spatial resolutions ranging from 15 m (ASTER) to 250–1,000 m (MODIS). Aqua is another EOS satellite, similar to Terra but with a different equator-crossing time: Aqua crosses the equator daily at the local time of 1:30 p.m. as it heads north (ascending mode), whereas Terra crosses the equator daily at the local time of 10:30 a.m. (descending mode). The MODIS sensors onboard both Aqua and Terra provide the ability for near real-time environmental monitoring on a daily basis.
In addition to instruments such as MOPITT, other dedicated instruments provide information on atmospheric composition and air quality. The Total Ozone Mapping Spectrometer (TOMS), first deployed onboard NASA's Nimbus-7 satellite in 1978, was the first instrument designed to monitor total column ozone at the global scale in order to track ozone depletion from space. Successor instruments, such as the Ozone Monitoring Instrument (OMI) onboard Aura, launched in 2004, the Total Ozone Unit (TOU) onboard the Chinese FY-3 series satellites since 2008, and the Ozone Mapping and Profiler Suite (OMPS) onboard the Suomi-NPP satellite, launched in 2011, are all dedicated to monitoring ozone variability at the global scale. Similarly, the Greenhouse Gases Observing SATellite (GOSAT), launched in
2009 by Japan, was designed to monitor global carbon dioxide (CO2) and methane (CH4) variability. The Orbiting Carbon Observatory-2, NASA's first satellite dedicated to studying atmospheric carbon dioxide from space, was launched in 2014. With the aid of these satellite platforms and instruments, more detailed information about Earth is available to help us better understand our changing world.
The third family of remote sensing satellites is the National Oceanic and Atmospheric Administration (NOAA) family of polar-orbiting platforms (POES). The Coastal Zone Color Scanner (CZCS) was launched in 1978 to measure ocean color from space. Following the success of the CZCS, similar ocean color sensors were launched, including the Modular Optoelectronic Scanner (MOS), the Ocean Color and Temperature Scanner (OCTS), the Polarization and Directionality of the Earth's Reflectances (POLDER) instrument, and the Sea-Viewing Wide Field-of-View Sensor (SeaWiFS). The most widely used space-borne sensor of this series, the Advanced Very High Resolution Radiometer (AVHRR), was carried on these satellites to remotely determine cloud cover and surface temperature. With the technological advances of the late 20th century, satellite sensors with improved spatiotemporal and spectral resolutions were designed and utilized for different Earth observation purposes.
In addition, the Medium Resolution Imaging Spectrometer (MERIS), one of the main payloads onboard Europe's Environmental Satellite (ENVISAT-1), provided hyperspectral rather than multispectral remote sensing images with a relatively high spatial resolution (300 m). Although contact with ENVISAT-1 was lost in 2012, its onboard instruments, such as MERIS, had provided an ample record of Earth's environment. Meanwhile, more and more commercial satellites managed by the private sector in the USA, such as QuickBird-2, WorldView-1, and WorldView-2, provided remotely sensed optical imagery with enhanced spatial and spectral detail.
In the 1990s, microwave sensors such as synthetic aperture radar (SAR) were deployed onboard a series of satellites forming the fourth family of remote sensing satellites; they were designed to provide high-resolution microwave sensing capability in all weather conditions. They include JERS-1 (Japan), ERS-1 and ERS-2 (Europe), and RADARSAT-1 (Canada). With increased spatial, spectral, and temporal resolutions, more detailed information on Earth's changing environment and climate can be provided. As the importance of microwave remote sensing became generally recognized, new generations of satellites of this kind continued the advancement of SAR remote sensing with multiple polarization (HH, HV, VH, and VV) modes, including ALOS (Japan), RADARSAT-2 (Canada), and TerraSAR-X (Germany).
The Canadian Space Agency (CSA) helped fund the construction and launch of the RADARSAT satellites and recovers this investment through the supply of RADARSAT-1 and RADARSAT-2 data to the Government of Canada and various user communities during the lifetime of the missions. The Japan Aerospace Exploration Agency (JAXA) also performs various space activities related to Earth observation; JAXA operates the Advanced Land Observing Satellite, known as ALOS.
The following summary tables (Tables 3.5 through 3.8) present most of the current satellites, as of July 2017, relevant to environmental monitoring and Earth observation. These remote sensing systems are operated mainly by space agencies in many countries, such as NASA (USA), ESA (European Union), DLR (Germany), CNES (France), JAXA (Japan), and CSA (Canada). In addition, IKONOS and GeoEye/RapidEye are commercial optical-NIR (near-infrared) systems providing high-resolution satellite imagery.
TABLE 3.5
Current Important Missions Using Passive Spectrometers for Environmental Applications
Platform | Sensor | Type | Feature
Aircraft | Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) | Imaging Spectrometer (Passive Sensor) | AVIRIS has 224 contiguous channels; measurements are used for studying water vapor, ocean color, vegetation classification, mineral mapping, and snow and ice cover.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Cross-Track Infrared Sounder (CrIS) | Spectrometer (Passive Sensor) | CrIS produces high-resolution, three-dimensional temperature, pressure, and moisture profiles.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Ozone Mapping and Profiler Suite (OMPS) | Spectrometer (Passive Sensor) | OMPS is an advanced suite of two hyperspectral instruments; it extends the 25+ year total-ozone and ozone-profile records.
Terra | Multi-angle Imaging SpectroRadiometer (MISR) | Imaging Spectrometer (Passive Sensor) | MISR obtains images in four spectral bands at nine different angles; it provides aerosol, cloud, and land surface data.
Sentinel-2 | MultiSpectral Imager (MSI) | Imaging Spectrometer (Passive Sensor) | Sentinel-2A and 2B provide satellite image data to support generic land cover, land use and change detection, leaf area index, leaf chlorophyll content, and leaf water content.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017.
TABLE 3.6
Current Important Missions Using Passive Multispectral Radiometers for Environmental
Applications
Platform | Sensor | Type | Feature
Aqua | Advanced Microwave Scanning Radiometer (AMSR-E) | Multichannel Microwave Radiometer (Passive Sensor) | AMSR-E measures precipitation, oceanic water vapor, cloud water, near-surface wind speed, sea and land surface temperature, soil moisture, snow cover, and sea ice.
Aqua | Moderate-Resolution Imaging Spectroradiometer (MODIS) | Imaging Spectroradiometer (Passive Sensor) | MODIS measures ocean and land surface properties, surface reflectance and emissivity, and air properties.
Landsat 7 | Enhanced Thematic Mapper Plus (ETM+) | Scanning Radiometer (Passive Sensor) | The ETM+ instrument provides high-resolution imaging information of Earth's surface.
Landsat 8 | Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) | Radiometer (Passive Sensor) | OLI and TIRS are designed similarly to the Landsat 7 sensors for the same applications.
Soil Moisture Active Passive (SMAP) | L-Band Radiometer (LBR) | Radiometer (Passive Sensor) | The SMAP-LBR advanced radiometer monitors water and energy fluxes and improves flood prediction and drought monitoring.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Visible Infrared Imaging Radiometer Suite (VIIRS) | Radiometer (Passive Sensor) | VIIRS collects water-leaving reflectance and land-reflective data.
Terra | Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) | Multispectral Radiometer (Passive Sensor) | ASTER measures surface radiance, reflectance, emissivity, and temperature; it provides spatial resolutions of 15 m, 30 m, and 90 m.
Terra | Clouds and the Earth's Radiant Energy System (CERES) | Broadband Scanning Radiometer (Passive Sensor) | CERES measures atmospheric and surface energy fluxes.
Terra | Moderate-Resolution Imaging Spectroradiometer (MODIS) | Imaging Spectroradiometer (Passive Sensor) | The same as MODIS on Aqua.
Aura | Ozone Monitoring Instrument (OMI) | Multispectral Radiometer (Passive Sensor) | OMI collects 740 wavelength bands in the visible and ultraviolet portions of the electromagnetic spectrum; it measures total ozone and profiles of ozone, N2O, SO2, and several other chemical species.
SPOT | High Resolution Visible (HRV) Imaging Spectroradiometer | Multispectral Radiometer (Passive Sensor) | SPOT provides high-resolution maps for change detection of Earth's surface.
IKONOS | High Resolution Visible (HRV) Imaging Spectroradiometer | Multispectral and Panchromatic Radiometer (Passive Sensor) | IKONOS provides high-resolution maps for change detection of Earth's surface.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017; CNES; and private-sector web sites.
Landsat 8 was launched in February 2013, while Landsat 7 remains in operation. This continuity extends the Landsat Earth observation mission in the optical and infrared remote sensing regime over the past four decades. GOES-R was deployed as GOES 1–12 retired from geostationary orbit. The mission of TOMS aboard Nimbus-7 for total ozone mapping was continued by OMI aboard Aura and later by OMPS aboard Suomi-NPP. The new Sentinel-1 satellite continues the mission of ERS-1 and ERS-2 for SAR imaging from space.
TABLE 3.7
Current Important Missions Using Radar, LiDAR, Gravimeter, and Laser for Environmental
Applications
Platform | Sensor | Type | Mission
Airborne Microwave Observatory of Subcanopy and Subsurface (AirMOSS) | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | The P-band SAR provides calibrated polarimetric measurements to retrieve root-zone soil moisture.
Ice, Cloud, and land Elevation Satellite (ICESat) | Geoscience Laser Altimeter System (GLAS) | Radar (Active Sensor) | ICESat measures ice sheet elevations and changes in elevation through time, in addition to measuring cloud and aerosol height profiles, land elevation and vegetation cover, and sea ice thickness.
Cloud-Aerosol LiDAR and Infrared Pathfinder Satellite Observations (CALIPSO) | Cloud-Aerosol LiDAR with Orthogonal Polarization (CALIOP) | Cloud and Aerosol LiDAR (Active Sensor) | CALIOP is a two-wavelength polarization-sensitive LiDAR that provides high-resolution vertical profiles of aerosols and clouds.
Cloud-Aerosol Transport System on the International Space Station (CATS) | Light Detection and Ranging (LiDAR) | LiDAR (Active Sensor) | The LiDAR provides range-resolved profile measurements of atmospheric aerosols and clouds.
Global Precipitation Measurement (GPM) | Dual-Frequency Precipitation Radar (DPR) | Radar (Active Sensor) | The DPR provides information regarding rain and snow worldwide.
Ocean Surface Topography Mission/Jason-2 (OSTM/Jason-2) | Poseidon-3 Altimeter (PA) | Altimeter (Active Sensor) | The PA provides sea surface heights for determining ocean circulation, climate change, and sea level rise.
Sentinel-1 | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | The Sentinel-1 SAR provides land and ocean monitoring regardless of the weather.
Sentinel-3 | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | Sentinel-3 supports marine observation and will study sea-surface topography, sea and land surface temperature, and ocean and land color.
Soil Moisture Active Passive (SMAP) | L-Band Radar (LBR) | Radar (Active Sensor) | The SMAP-LBR radar measures the amount of water in the top 5 cm of soil everywhere on Earth's surface.
Advanced Land Observing Satellite (ALOS) | L-band ALOS PALSAR | Phased Array L-band Synthetic Aperture Radar (Active Sensor) | ALOS expands SAR data utilization by enhancing its performance.
TerraSAR-X | X-band SAR sensor | Radar (Active Sensor) | TerraSAR-X provides SAR images with high resolution.
TanDEM-X | X-band SAR sensor | Radar (Active Sensor) | TanDEM-X provides land subsidence, digital elevation models, and other land cover conditions.
Gravity Recovery and Climate Experiment (GRACE) | Low-Earth-orbit satellite gravimetry | Passive Sensor | GRACE measures gravity changes to infer water storage at the surface of Earth.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017.
TABLE 3.8
Current Important Missions of Scatterometers and Sounding Instruments for
Environmental Applications
Platform | Sensor | Type | Feature
Cyclone Global Navigation Satellite System (CYGNSS) | Delay Doppler Mapping Instrument (DDMI) | Scatterometer (Active Sensor) | The DDMI measures ocean surface wind speed in all precipitating conditions.
Aqua | Atmospheric Infrared Sounder (AIRS) | Sounder (Passive Sensor) | AIRS measures air temperature, humidity, clouds, and surface temperature.
Aqua | Advanced Microwave Sounding Unit (AMSU) | Sounder (Passive Sensor) | AMSU measures temperature profiles in the upper atmosphere.
Aura | High-Resolution Dynamics Limb Sounder (HIRDLS) | Sounder (Passive Sensor) | HIRDLS measures profiles of temperature, ozone, CFCs, and various other gases affecting ozone chemistry.
Aura | Microwave Limb Sounder (MLS) | Sounder (Passive Sensor) | MLS derives profiles of ozone, SO2, N2O, OH, and other atmospheric gases, as well as temperature, pressure, and cloud ice.
Suomi National Polar-orbiting Partnership (Suomi-NPP) | Ozone Mapping and Profiler Suite (OMPS) | Sounder (Passive Sensor) | OMPS provides operational ozone measurements.
Terra | Measurements of Pollution in the Troposphere (MOPITT) | Sounder (Passive Sensor) | MOPITT measures carbon monoxide and methane in the troposphere.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017.
ICESat-2 continues the mission of the laser altimeter ICESat and is scheduled for launch in 2018. The ICESat-2 mission will provide multi-year elevation data needed to determine ice sheet mass balance as well as cloud property information, especially for stratospheric clouds common over polar areas. The Gravity Recovery and Climate Experiment Follow-On (GRACE-FO, a.k.a. GFO) mission, part of the US-German GRACE consortium (NASA/Jet Propulsion Laboratory, Center for Space Research/University of Texas, DLR, and GFZ Helmholtz Centre Potsdam), is heavily focused on maintaining data continuity from GRACE and minimizing any data gap after GRACE. In concert with the functionality of SWOT, this effort leads toward the final closure of the water balance in the hydrological system.
Specifically, the future German satellite mission EnMAP (Environmental Mapping and Analysis Program) addresses the growing need for hyperspectral remote sensing (Stuffler et al., 2007). It aims to measure, derive, and analyze diagnostic parameters for the vital processes on Earth's land and water surfaces. EnMAP hyperspectral products are images of 1,024 × 1,024 pixels (∼30 × 30 km2), generated by the processing system on demand and delivered to the user community.
TABLE 3.9
Historic Important Missions for Environmental Applications
Platform | Sensor | Type | Mission
Advanced Land Observing Satellite (ALOS) | Phased Array L-band Synthetic Aperture Radar (PALSAR) | Radar (Active Sensor) | PALSAR provided mapping of regional land coverage, disaster monitoring, and resource surveying.
Advanced Land Observing Satellite (ALOS) | Panchromatic Remote Sensing Instrument for Stereo Mapping (PRISM) | Spectrometer (Passive Sensor) | PRISM provided panchromatic images with 2.5-m spatial resolution for digital surface model (DSM) generation.
Radar Satellite (RADARSAT-1) | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | RADARSAT-1 collected data on resource management; ice, ocean, and environmental monitoring; and Arctic and off-shore surveillance.
Nimbus-7 | Coastal Zone Color Scanner (CZCS) | Radiometer (Passive Sensor) | CZCS attempted to discriminate between organic and inorganic materials in the water.
Nimbus-7 | Earth Radiation Budget Experiment (ERBE) | Radiometer (Passive Sensor) | ERBE attempted to test infrared limb scanning radiometry to sound the composition and structure of the middle atmosphere.
Nimbus-7 | Stratospheric Aerosol Measurement II (SAM II) | Photometer (Passive Sensor) | SAM II measured stratospheric aerosols and provided vertical profiles of aerosol extinction in both the Arctic and Antarctic polar regions.
Nimbus-7 | Solar Backscatter Ultraviolet (SBUV) and Total Ozone Mapping Spectrometer II (TOMS II) | Spectrometer (Passive Sensor) | The SBUV and TOMS sensors provided first-hand data on UV-B and total column ozone.
Nimbus-7 | Scanning Multichannel Microwave Radiometer (SMMR) | Multispectral Microwave Radiometer (Passive Sensor) | SMMR measured sea surface temperatures, ocean near-surface winds, water vapor and cloud liquid water content, sea ice extent, sea ice concentration, snow cover, snow moisture, rainfall rates, and differences in ice types.
Ice, Cloud, and land Elevation Satellite (ICESat) | Geoscience Laser Altimeter System (GLAS) | Laser Altimeter (Active Sensor) | GLAS measured ice sheet elevations and changes in elevation through time, as well as cloud and aerosol height profiles, land elevation and vegetation cover, and sea ice thickness.
European Remote Sensing Satellite (ERS-1, ERS-2) | Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | The ERS SAR emitted radar pulses with spherical wavefronts that reflected from the surface.
Cosmo/SkyMed 1, 2, 3, 4 | Synthetic Aperture Radar (SAR) | SAR 2000 (Active Sensor) | The Cosmo/SkyMed SAR emitted radar pulses.
European Remote Sensing Satellite (ERS-1, ERS-2) | Active Microwave Instrument (AMI-WIND) | Microwave (Active Sensor) | The ERS AMI-WIND emitted radar pulses with spherical wavefronts that reflected from the surface.
European Remote Sensing Satellite (ERS-1, ERS-2) | Radar Altimetry (RA) | Radar (Active Sensor) | The ERS RA emitted radar pulses with spherical wavefronts that reflected from the surface.
Geostationary Operational Environmental Satellite (GOES 1–12) | Advanced Very High Resolution Radiometer (AVHRR) | Radiometer (Passive Sensor) | AVHRR can be used to remotely determine cloud cover and surface temperature.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017; CNES; DLR; European Space Agency (ESA), 2017. The Copernicus Programme, [Link] Observing_the_Earth/Copernicus/Overview3; CSA (2017).
TABLE 3.10
Future Important Missions for Environmental Applications
Platform | Sensor | Type | Mission
Surface Water Ocean Topography (SWOT) | Advanced Microwave Radiometer (AMR) | Radiometer (Passive Sensor) | SWOT will provide sea surface heights and terrestrial water heights over a 120-km-wide swath with a ±10-km gap at the nadir track.
Sentinel-5P | TROPOspheric Monitoring Instrument (TROPOMI) | Imaging Spectrometer (Passive Sensor) | Sentinel-5P will aim to fill the data gap and provide data continuity between the retirement of the ENVISAT satellite and NASA's Aura mission and the launch of Sentinel-5.
Biomass | P-band Synthetic Aperture Radar (SAR) | Radar (Active Sensor) | Biomass will address the status and dynamics of tropical forests.
EnMAP | Imaging Spectroradiometer | Hyperspectral Radiometer (Passive Sensor) | EnMAP will address the dynamics of the land and water surfaces.
Source: National Aeronautics and Space Administration (NASA), 2017. EOSDIS – Remote Sensors. [Link]/user-resources/remote-sensors, accessed May 2017; CNES; DLR; European Space Agency (ESA), 2017. The Copernicus Programme, [Link]; CSA (2017).
In essence, data fusion involves combining information to estimate or predict the state of some aspect of a system. This can be geared toward much better Earth observations and toward tackling a few challenging problems. For instance, in light of global climate change, data fusion missions can be organized groupwise from existing and future satellites, such as ESA Sentinels 1 and 2 and Landsat 8, the gravimetry missions, TerraSAR-X, SWOT, SMAP, GRACE-FO, GOES-R, and TanDEM-X, to produce data fusion products. Data fusion across the different remote sensing sensors above can be carried out to blend different modalities of satellite imagery into a single image for various Earth observation applications over temporal and spatial scales, leading to better environmental decision making. Moreover, a satellite constellation such as the A-Train, a joint program between NASA, CNES, and JAXA, may group several satellites by design, providing insightful and complementary support to this type of research. Note that the A-Train (from Afternoon Train) is a constellation of six Earth observation satellites of varied nationalities in sun-synchronous orbit at an altitude of 705 km above Earth (Figure 3.8); as of July 2014, they included OCO-2, GCOM-W1 (SHIZUKU), Aqua, CloudSat, CALIPSO, and Aura.
In addition, to fill various data gaps in space-borne remote sensing and to facilitate the system planning goals of providing low-cost, full-coverage images, the community has further adopted a standard dubbed CubeSat (Heidt et al., 2000). A CubeSat is a spacecraft sized in units, or Us, typically up to 12U (a unit is defined as a volume of about 10 cm × 10 cm × 10 cm), that is launched fully enclosed in a container, enabling ease of launch vehicle system integration and thus easing access to space (National Academies of Sciences, Engineering, and Medicine, 2016). The continuous creation of customized nano-satellites and cube-satellites in the optical, microwave, and radio frequency domains has become a major initiative and a giant niche for different types of environmental applications. This fast evolution in Earth observation may disrupt conventional ways of environmental monitoring.
Possible data merging and data fusion opportunities to tackle permafrost remote sensing
studies are shown in Figure 3.9. This system planning diagram exhibits the possible space-time
plot of selected near-term (2013–2020) satellite sensor observations with potential relevance for
permafrost. These parameters are linked to ALOS-2 (L-band SAR), Biomass Earth Explorer
mission, Landsat 8, RADARSAT (C-band radar data), IKONOS and GeoEye/RapidEye, SWOT,
FIGURE 3.8 The A-Train, which consisted of six satellites in the constellation as of 2014. (National Aeronautics and Space Administration (NASA), 2017. NASA A-Train portal, [Link], accessed May 2017.)
FIGURE 3.9 Possible data fusion opportunities to tackle permafrost remote sensing studies; selected sensors (ALOS, Biomass, Landsat (LDCM), SWOT, RADARSAT, MODIS, VIIRS, Sentinel-1, Sentinel-2, ICESat-2, TerraSAR-X/TanDEM-X, IKONOS/GeoEye, ASCAT, SMAP, and AMSR) are arranged by temporal fidelity, from revisit intervals of 30 days or more down to one day or less. (Adapted from The National Academies, 2014. Opportunities to Use Remote Sensing in Understanding Permafrost and Related Ecological Characteristics: Report of a Workshop. ISBN 978-0-309-30121-3.)
VIIRS, ICESat-2 (LiDAR), ASCAT (Advanced SCATterometer), SMAP (L-band SAR), AMSR
(Advanced Microwave Scanning Radiometer), ESA Sentinels 1 and 2, GRACE-FO, X-band SAR,
TerraSAR-X, and TanDEM-X.
3.7 SUMMARY
Many remote sensing studies have been carried out at national and international levels by government agencies, academia, research institutions, and industry to investigate new techniques for observing Earth from space or the sky. The close-knit relationships across many satellite missions have forged several families of satellites that can support data merging or create hidden niches for data fusion, albeit not necessarily through a satellite constellation. The connection between the EOS and SPOT families lives on as well, from past missions to current and future missions, such as the Copernicus program, spinning off even more potential for long-term data fusion and machine learning research. Future missions will strengthen cross-mission ties through which past, current, and future missions can be tailored cohesively for large-scale research dealing with specific earth system science problems. These kinds of linkages exemplify the principle of systems engineering that "the whole is greater than the sum of its parts." Various data merging, data fusion, and machine learning algorithms have played a key behind-the-scenes role in expanding that sum further; these algorithms are introduced in subsequent chapters.
REFERENCES
Agren, A., Jansson, M., Ivarsson, H., Bishop, K., and Seibert, J., 2008. Seasonal and runoff-related changes in total organic carbon concentrations in the River Ore, Northern Sweden. Aquatic Sciences, 70(1), 21–29.
Anagnostou, E. N. and C. Kummerow, 1997. Stratiform and convective classification of rainfall using SSM/I
85-GHz brightness temperature observations. Journal of Atmospheric and Oceanic Technology, 14, 570–575.
Arenz, R., Lewis, W., and Saunders III, J., 1995. Determination of chlorophyll and dissolved organic carbon from reflectance data for Colorado reservoirs. International Journal of Remote Sensing, 17(8), 1547–1566.
Atlas, D. and Matejka, T. J., 1985. Airborne Doppler radar velocity measurements of precipitation seen in
ocean surface reflection. Journal of Geophysical Research-Atmosphere, 90, 5820–5828.
Barsi, J. A., Lee, K., Kvaran, G., Markham, B. L., and Pedelty, J. A., 2014. The spectral response of the Landsat-8 Operational Land Imager. Remote Sensing, 6, 10232–10251.
Chang, N. B., Vannah, B., Yang, Y. J., and Elovitz, M., 2014. Integrated data fusion and mining techniques
for monitoring total organic carbon concentrations in a lake. International Journal of Remote Sensing,
35, 1064–1093.
Chang, N., Xuan, Z., and Yang, Y., 2013. Exploring spatiotemporal patterns of phosphorus concentrations in
a coastal bay with MODIS images and machine learning models. Remote Sensing of Environment, 134,
100–110.
European Space Agency (ESA), 2017. The Copernicus Programme, [Link]
Observing_the_Earth/Copernicus/Overview3
Heidt, H., Puig-Suari, J., Moore, A. S., Nakasuka, S., and Twiggs, R. J., 2000. CubeSat: A New Generation
of Picosatellite for Education and Industry Low-Cost Space Experimentation. In: Proceedings of the
14th Annual AIAA/USU Conference on Small Satellites, Lessons Learned-In Success and Failure,
SSC00-V-5. [Link]
King, M. D. and Byrne, D. M., 1976. A method for inferring total ozone content from the spectral variation of
total optical depth obtained with a solar radiometer. Journal of the Atmospheric Sciences, 33, 2242–2251.
King, M. D., Kaufman, Y. J., Tanré, D., and Nakajima, T., 1999. Remote sensing of tropospheric aerosols from
space: Past, present, and future. Bulletin of the American Meteorological Society, 80, 2229–2259.
Lee, H. J., Coull, B. A., Bell, M. L., and Koutrakis, P., 2012. Use of satellite-based aerosol optical depth and
spatial clustering to predict ambient PM2.5 concentrations. Environmental Research, 118, 8–15.
Li, J., Carlson, B. E., and Lacis, A. A., 2015. How well do satellite AOD observations represent the spatial
and temporal variability of PM2.5 concentration for the United States? Atmospheric Environment, 102,
260–273.
Li, Q., Li, C., and Mao, J., 2012. Evaluation of atmospheric aerosol optical depth products at ultraviolet bands
derived from MODIS products. Aerosol Science and Technology, 46, 1025–1034.
National Academies of Sciences, Engineering, and Medicine. 2016. Achieving Science With CubeSats:
Thinking Inside the Box. The National Academies Press, Washington, DC. doi:10.17226/23503
National Aeronautics and Space Administration (NASA), 2012. [Link]
communications/outreach/funfacts/txt_passive_active.html, accessed May 2017.
National Aeronautics and Space Administration (NASA), 2013. Landsat 7 Science Data User’s Handbook.
Available at [Link]
National Aeronautics and Space Administration (NASA), 2017. EOSDIS - Remote Sensors. [Link]
[Link]/user-resources/remote-sensors, accessed May 2017.
National Aeronautics and Space Administration (NASA), 2017. NASA A-Train portal. [Link]
gov/ accessed May 2017.
Natural Resources Canada, 2017. In the “Fundamentals of Remote Sensing” tutorial, by the Canada Centre for
Remote Sensing (CCRS), Natural Resources Canada. [Link]
satellite-imagery-air-photos/satellite-imagery-products/educational-resources/9283, accessed May 2017.
Richards, J. A. and Jia, X., 2006. Remote Sensing Digital Image Analysis: An Introduction. Springer, Berlin,
Germany.
Stuffler, T., Kaufmann, C., Hofer, S., Förster, K.-P., Schreier, G., Mueller, A., and Eckardt, A. et al., 2007. The EnMAP hyperspectral imager—An advanced optical payload for future applications in Earth observation programmes. Acta Astronautica, 61, 115–120.
The National Academies, 2014. Opportunities to Use Remote Sensing in Understanding Permafrost and
Related Ecological Characteristics: Report of a Workshop. ISBN 978-0-309-30121-3
United States Geological Survey (USGS), 2017. [Link]
satellites, accessed December 2017.
United States Geological Survey (USGS), 2014. Landsat 8 (L8) Data Users Handbook. [Link]
gov/landsat-8-l8-data-users-handbook-section-1, accessed May 2017.
Wood, V. T., Brown, R. A., and Sirmans, D., 2001. Technique for improving detection of WSR-88D mesocyclone
signatures by increasing angular sampling. Weather and Forecasting, 16, 177–184.
Zhang, Y., 2010. Ten years of remote sensing advancement & the research outcome of the CRC-AGIP Lab.
Geomatica, 64, 173–189.
Zrnic, D. S. and Ryzhkov, A. V., 1999. Polarimetry for weather surveillance radars. Bulletin of the American Meteorological Society, 80, 389–406.
4 Image Processing Techniques
in Remote Sensing
4.1 INTRODUCTION
Remote sensing involves the collection of data by sensors located far from the target (e.g., space-borne instruments onboard satellites); the information is collected without making physical contact with the object. Data collected by remote sensing instruments can be recorded in either analog (e.g., photographic film) or digital format. Compared to the digital format, the conventional analog format suffers from several drawbacks, such as limits on the amount of data that can be transmitted at any given time and inconvenience of manipulation. Therefore, the digital format is commonly used to archive remotely sensed data, especially in the form of images.
In remote sensing, data recorded by sensors are commonly archived in formats convenient for storage and transfer, such as the Hierarchical Data Format (HDF), the network Common Data Format (netCDF), and so forth. Data archived in such formats are hard to manipulate for visualization and interpretation without the aid of professional image processing software and tools. In addition, remotely sensed data may contain noise or other deficiencies arising from various sources, such as abnormal vibration of the observing system. Further processing procedures should therefore be conducted to deal with these flaws. Since remotely sensed data are usually archived as two-dimensional images, any further processing performed on the raw remotely sensed images can generally be interpreted as image processing.
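As a brief illustration of working with these archive formats, the following MATLAB sketch reads variables from a netCDF file and an HDF5 file; the file names, variable names, and dataset path are hypothetical placeholders, not actual product identifiers.

% Read named variables from a netCDF archive (hypothetical file/variables):
ncfile = 'modis_aod_sample.nc';
aod = ncread(ncfile, 'AOD_550nm');    % a 2-D geophysical variable
lat = ncread(ncfile, 'latitude');
lon = ncread(ncfile, 'longitude');
% HDF5 products are read similarly, using a dataset path:
sst = h5read('ocean_product_sample.h5', '/geophysical_data/sst');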
Toward establishing a definition of image processing, two classic definitions are provided below:
As suggested therein, the overarching goal of image processing is to produce a better image that aids visualization or information extraction by performing specific operations on raw images. In general, image processing is intended to make the raw image more interpretable for a better understanding of the object of interest.
With the development of remote sensing and computer science, various image processing techniques have been developed to aid interpretation and information extraction from remotely sensed images. Although the choice of image processing techniques depends on the goals of each individual application, some basic techniques are common to most remote sensing applications. In this chapter, a set of basic image processing techniques, as well as several software packages and programming languages commonly applied to image processing and analysis in remote sensing, will be introduced.
correction, enhancement, transformation, and classification. In this section, techniques associated with each type of application to process remotely sensed images will be introduced.
FIGURE 4.1 Comparison of a Landsat 8 Operational Land Imager (OLI) true color image (RGB composite of bands 4, 3, 2) on October 6, 2014, (a) before and (b) after atmospheric correction using Fast Line-of-sight Atmospheric Analysis of Hypercube (FLAASH).
Although atmospheric correction methods provide a way toward more accurate remotely sensed images, atmospheric correction should be conducted carefully because many factors must be estimated; if these estimates are not properly derived, the correction might introduce even larger biases than the atmospheric effects themselves. An illustrative example of atmospheric correction is given in Figure 4.1. As suggested therein, significant atmospheric effects due to smog and aerosols are observed in the Landsat 8 Operational Land Imager (OLI) scene of October 6, 2014, and these effects were largely removed after performing atmospheric correction with the Fast Line-of-sight Atmospheric Analysis of Hypercube (FLAASH) method.
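Model-based tools such as FLAASH are typically run from dedicated software, but the basic idea of removing an additive atmospheric offset can be sketched with the much simpler dark-object subtraction (DOS) technique, a different method shown here purely for illustration; the built-in MATLAB sample image stands in for a single spectral band.

% Dark-object subtraction: the darkest pixels are assumed to owe their
% brightness entirely to atmospheric path radiance (haze).
band = double(imread('cameraman.tif'));   % stand-in for one band of DNs
dark = min(band(:));                      % estimated additive haze offset
corrected = band - dark;                  % subtract the offset from every pixel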
FIGURE 4.2 Landsat OLI scenes (a) before and (b) after radiometric correction.
• Radiometric correction of noise due to sun angle and topography: In remote sensing, particularly over water surfaces, observed scenes may be contaminated by diffusion of sunlight, resulting in lighter areas in an image (i.e., sun glint). This effect can be corrected by estimating a shading curve, determined by Fourier analysis, to extract a low-frequency component (Kay et al., 2009). In addition, topographic effects, especially in mountainous areas, can produce another kind of radiometric distortion, making shaded areas darker than normal. To remove shading effects, corrections can be conducted using the angle between the solar radiation direction and the normal vector to the ground surface (Dozier and Frew, 1990; Essery and Marks, 2007) (Figure 4.2); a minimal sketch follows.
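One widely used angle-based variant is the cosine correction, sketched below in MATLAB under simplifying assumptions; the angles and reflectance value are made-up placeholders, and real applications would compute the local incidence angle from a digital elevation model.

% Cosine topographic correction: scale each pixel by the ratio of the
% cosine of the solar zenith angle to the cosine of the local incidence angle.
sz = deg2rad(35);                 % solar zenith angle (rad), placeholder value
inc = deg2rad(50);                % local solar incidence angle on the slope (rad)
L = 0.18;                         % observed reflectance of a shaded pixel
Lc = L * cos(sz) / cos(inc);      % brightens slopes tilted away from the Sun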
for the final knowledge gained through the integration of information from multiple data sources.
Thus, it is widely used in remote sensing, medical imaging, computer vision, and cartography.
The traditional approach to geometric correction depends mainly on the manual identification of many ground control points to align the raster data, which is labor-intensive and time-consuming (Goshtasby, 1987). In addition, the number of remotely sensed images has grown tremendously, reinforcing the need for highly efficient, automatic correction methods. With the development of computer science and remote sensing technologies, a variety of methods have been developed to advance automatic geometric correction, such as automated ground control point extraction (Gianinetto and Scaioni, 2008), the scale-invariant feature transform (Deng et al., 2013), and contour-based image matching (Eugenio et al., 2002).
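The control point workflow can be sketched in MATLAB as follows; the point coordinates are fabricated placeholders, and fitgeotrans (Image Processing Toolbox) estimates the transformation in a least-squares sense.

% Estimate a geometric correction from matched ground control points (GCPs).
moving = [10 20; 200 30; 50 180; 220 210];  % GCPs in the uncorrected image (pixels)
fixed  = [12 25; 205 28; 47 185; 223 206];  % the same GCPs in the reference image
tform = fitgeotrans(moving, fixed, 'affine');  % least-squares affine fit
% corrected = imwarp(raw, tform);             % then resample the raw image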
With advancements in remote sensing technologies, high-resolution satellite imagery (and aerial imagery as well) has become popular in real-world applications. Because each pixel covers so little ground, horizontal accuracy is of critical importance: a tiny geometric variation from either systematic sensor errors or terrain-related errors can produce significant distortions in the observed imagery. Orthorectification, the process of removing inaccuracies caused by the sensor, satellite/aircraft motion, and terrain-related geometric distortions from raw imagery to improve horizontal accuracy, is therefore also essential in geometric correction. Orthorectified imagery is required for most applications involving multiple image analyses, especially for tasks that overlay images on existing data sets and maps, such as data fusion, change detection, and map updating. Compared to the original imagery, orthorectified imagery is planimetric at every location, with a consistent scale across all parts of the image, so that features are represented in their "true" positions, allowing accurate direct measurement of distances, angles, and areas.
• Euclidean transformations: Euclidean transformations, the most commonly used transformations, can be a translation, a rotation, or a reflection. Euclidean transformations do not change length or angle measures, and they preserve the shape of a geometric object (e.g., lines transform to lines, and circles transform to circles). In other words, only the position and orientation of the object change.
• Affine transformations: Affine transformations are generalizations of Euclidean transformations. An affine transformation (or affinity) is any transformation that preserves collinearity (i.e., all points initially lying on a line still lie on a line after transformation) and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after transformation) (Weisstein, 2017). In general, an affine transformation is a type of linear mapping; operations such as scaling, resampling, shearing, and rotation are all affine transformations. Every Euclidean transformation is affine, but not every affine transformation is Euclidean (a minimal sketch follows this list).
• Projective transformations: Projective transformations are commonly applied to remotely sensed imagery to transform observed data from one coordinate system to another using given projection information. Certain properties remain invariant under projective transformations, including collinearity, concurrency, tangency, and incidence.
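A minimal MATLAB sketch of an affine transformation is shown below, assuming the Image Processing Toolbox and using a built-in sample image; the rotation angle and scale factor are arbitrary illustrative choices.

% Apply an affine transformation (rotation plus scaling) to an image.
I = imread('cameraman.tif');        % built-in sample image
theta = deg2rad(15); s = 1.5;       % rotate by 15 degrees, scale by 1.5
T = [ s*cos(theta)  s*sin(theta) 0
     -s*sin(theta)  s*cos(theta) 0
      0             0            1 ];  % 3-by-3 matrix, row-vector convention
tform = affine2d(T);                % collinearity and distance ratios preserved
J = imwarp(I, tform);               % resampling happens here as well
imshowpair(I, J, 'montage');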
Image resampling and resizing are techniques used to manipulate a digital image and transform it into a more consistent form by changing its spatial resolution or orientation for visualization or data analysis (Gurjar and Padmanabhan, 2005). Due to limitations imposed by imaging systems, remotely sensed images captured by different remote sensing instruments may have different spatial resolutions. In real-world applications, the initial spatial resolution of a remotely sensed image may not be sufficient, or may need to be made consistent with other images. In such cases, resampling should be applied to transform the original image into another form that satisfies the application's needs.
Resampling
Mathematically, resampling involves interpolation and sampling to produce new estimates for
pixels at different grids (Parker et al., 1983; Baboo and Devi, 2010). To date, a variety of methods
have been developed for resampling, and the choice of resampling kernel is highly application-dependent. The three most common resampling kernels, compared in the sketch following this list, are nearest neighbor, bilinear interpolation, and cubic convolution.
• Nearest neighbor: Nearest neighbor is a method frequently used for resampling in remote
sensing, which estimates a new value for each “corrected” pixel (i.e., new grid) using data
values from the nearest “uncorrected” pixels (i.e., original grids). The advantages of nearest
neighbor are its simplicity and capability to preserve original values in the unaltered scene.
Nevertheless, the disadvantages of nearest neighbor are also significant, in particular its blocky effects (Baboo and Devi, 2010). An example of image resampling with nearest
neighbor is shown in Figure 4.3.
• Bilinear interpolation: Bilinear interpolation is an image smoothing method that uses the values of the four nearest pixels, located in diagonal directions from a given output location, to estimate the value at that location (Parker et al., 1983; Baboo and Devi,
2010). In general, bilinear interpolation takes a weighted average of the closest 2 × 2
neighborhood of known pixel values surrounding the corresponding pixel to produce an
interpolated value. Weights assigned to the four pixel values are normally based on the
computed pixel’s distance (in 2D space) from each of the known points.
• Cubic convolution: Cubic convolution computes a weighted average of the 16 pixels nearest the corresponding input location using a cubic function. Compared to bilinear interpolation, cubic convolution performs better, and the result does not have the blocky appearance produced by nearest neighbor (Keys, 1981; Reichenbach and Geng, 2003). However, the computational time required by cubic convolution is about 10 times that of the nearest neighbor method (Baboo and Devi, 2010).
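The trade-offs among the three kernels can be seen directly with MATLAB's imresize; the minimal sketch below (assuming the Image Processing Toolbox and a hypothetical input file scene.tif) downsamples the same image with each kernel for side-by-side comparison.

% Resample the same image with the three common kernels.
img = imread('scene.tif');                 % hypothetical input image
nn = imresize(img, 0.15, 'nearest');       % value-preserving but blocky
bl = imresize(img, 0.15, 'bilinear');      % weighted 2 x 2 average
cc = imresize(img, 0.15, 'bicubic');       % 4 x 4 cubic kernel, analogous to cubic convolution
montage({nn, bl, cc});                     % compare the three results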
FIGURE 4.3 Comparison of Landsat Thematic Mapper (TM) image (RGB composite of bands 4, 3, 2) on
October 17, 2009, at (a) 30-meter and (b) 200-meter (resampled) spatial resolution.
In addition to the three aforementioned commonly used resampling kernels, there are some other
methods for resampling, such as the fast Fourier transformation resampling (Li, 2014) and quadratic
interpolation (Dodgson, 1997).
Mosaicking
Due to constraints of imaging systems, observations within a single scene may be incapable of providing full coverage of the targets of interest. Therefore, assembling different images to form one image with larger spatial coverage is desirable. In image processing, such a blending process is referred to as mosaicking (Inampudi, 1998; Abraham and Simon, 2013).
Generally, mosaicking relies on the identification of control points or features in different images, and then blends these images based on the overlap of the extracted common control points or features (Inampudi, 1998). The most straightforward mosaicking is to blend images collected from the same or adjacent satellite paths, because radiometric differences between these images are minimal (e.g., Figure 4.2). However, when images are collected from different paths at significantly different times, radiometric corrections should be conducted prior to mosaicking; otherwise, new radiometric distortions might be introduced into the blended images. A naive mosaicking sketch is given below.
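As a minimal sketch (assuming two hypothetical single-band tiles, tile_a.tif and tile_b.tif, already co-registered with a known column offset), two tiles can simply be placed onto a common canvas; a production mosaic would additionally feather or blend the overlap region.

% Naive mosaic of two single-band tiles with a known column offset.
A = imread('tile_a.tif');                       % hypothetical left tile
B = imread('tile_b.tif');                       % hypothetical right tile
colOffset = 900;                                % B starts 900 columns right of A
rows = max(size(A,1), size(B,1));
cols = colOffset + size(B,2);
mosaic = zeros(rows, cols, 'like', A);          % empty canvas of matching class
mosaic(1:size(A,1), 1:size(A,2)) = A;
mosaic(1:size(B,1), colOffset+1:cols) = B;      % B overwrites A in the overlap
imshow(mosaic, []);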
FIGURE 4.4 An example of Landsat 7 ETM+ SLC-off image (RGB composite of bands 4, 3, 2) on February
12, 2015 (a) before and (b) after gap filling.
Image enhancement techniques are commonly grouped into two broad categories, as illustrated in the sketch following this list:
• Spatial domain methods: Spatial domain methods operate directly on image pixels through different operations, such as histogram equalization (Hummel, 1977) and contrast stretching (Yang, 2006). An overview of spatial domain methods can be found in the literature (Maini and Aggarwal, 2010; Bedi and Khandelwal, 2013). An example of image enhancement through histogram equalization is shown in Figure 4.5. Contrast is the difference in visual properties that makes an object distinguishable from other objects and the background; in visual perception, it is determined by the difference in color and brightness between the object and its surroundings. Methods such as contrast stretching are often used to increase the contrast between different objects in order to make the objects of interest distinguishable (Starck et al., 2003; Yang, 2006).
• Frequency domain methods: Frequency domain methods operate on the Fourier transform of an image: enhancement operations are performed on the transform, and the final output image is obtained by applying the inverse Fourier transform. Filtering is the most commonly applied enhancement method in this domain; filtering out unnecessary information (or noise) highlights certain frequency components (Chen et al., 1994; Silva Centeno and Haertel, 1997).
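Both categories are illustrated in the following MATLAB sketch (assuming the Image Processing Toolbox and a hypothetical single-band file band3.tif): the first part equalizes and stretches the histogram in the spatial domain, while the second applies a simple ideal low-pass filter in the Fourier domain.

% Spatial-domain enhancement on a single-band image.
gray = imread('band3.tif');                          % hypothetical grayscale band
eq = histeq(gray);                                   % histogram equalization
st = imadjust(gray, stretchlim(gray, [0.02 0.98]));  % 2%-98% contrast stretch

% Frequency-domain enhancement: ideal low-pass filtering via the FFT.
F = fftshift(fft2(double(gray)));                    % centered Fourier transform
[r, c] = size(F);
[X, Y] = meshgrid(1:c, 1:r);
mask = hypot(X - c/2, Y - r/2) < 40;                 % keep only low frequencies
smoothed = real(ifft2(ifftshift(F .* mask)));        % inverse transform of the filtered image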
FIGURE 4.5 Landsat TM image (RGB composite of bands 3, 2, 1) on October 17, 2009 (a) before and (b)
after enhancement by performing histogram equalization.
Nevertheless, there is no general theory for determining the quality of image enhancement,
which means that most enhancements are empirical and require interactive procedures to obtain
satisfactory results.
A typical example of image transformation through simple band arithmetic is the normalized difference vegetation index (NDVI):

NDVI = (NIR − Red) / (NIR + Red)    (4.1)

where NIR and Red denote reflectance collected from a near-infrared band and a red band, respectively. In real-world applications, various vegetation indices have been developed to aid in the monitoring of vegetation; most of them rely on the absorption differences of vegetation between the red and near-infrared wavelengths, such as the soil-adjusted vegetation index (Huete, 1988) and the enhanced vegetation index (Huete et al., 2002).

FIGURE 4.6 Normalized difference vegetation index generated from Landsat TM image on October 17, 2009. (a) Landsat TM image (RGB composite of bands 4, 3, 2) and (b) NDVI.
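A minimal MATLAB sketch of Equation 4.1 (assuming hypothetical reflectance rasters tm_band3.tif and tm_band4.tif for the red and near-infrared bands):

% Compute NDVI from red (TM band 3) and near-infrared (TM band 4) reflectance.
red = double(imread('tm_band3.tif'));     % hypothetical reflectance raster
nir = double(imread('tm_band4.tif'));     % hypothetical reflectance raster
ndvi = (nir - red) ./ (nir + red + eps);  % eps guards against division by zero
imagesc(ndvi, [-1 1]); colorbar;          % display within the valid NDVI range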
In addition to simple arithmetic operations, principal component analysis (PCA) is another procedure frequently applied for image transformation, especially for information reduction in multispectral and, in particular, hyperspectral images, as multispectral imagery tends to be correlated from one band to another (Cheng and Hsia, 2003; Pandey et al., 2011). In image processing, the essence of PCA is to apply a linear transformation to the multispectral band data that rotates and translates the original coordinate system (Batchelor, 1978). Normally, PCA is performed on all bands of a multispectral image without a priori information about the image's spectral characteristics. The derived principal components represent the spectral information more efficiently than the original bands. The first principal component always accounts for the largest portion of variance, while other
principal components subsequently account for the remaining variance. Due to its efficiency and
information reduction characteristics, PCA has been frequently used for spectral pattern recognition
and image enhancement (Cheng and Hsia, 2003; KwangIn et al., 2005; Pandey et al., 2011).
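A compact PCA sketch in MATLAB (assuming the Statistics and Machine Learning Toolbox and a hypothetical multiband file tm_stack.tif) treats every pixel as a B-dimensional spectral sample:

% PCA on a multispectral stack: rows = pixels, columns = bands.
cube = double(imread('tm_stack.tif'));        % hypothetical H x W x B stack
[h, w, b] = size(cube);
X = reshape(cube, h*w, b);                    % one row per pixel
[coeff, score, ~, ~, explained] = pca(X);     % linear band-space rotation
pc1 = reshape(score(:, 1), h, w);             % first principal component image
fprintf('PC1 explains %.1f%% of total variance\n', explained(1));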
FIGURE 4.7 Unsupervised classification of Landsat–5 TM image on October 17, 2009, using the ISODATA
method. (a) Landsat–5 TM image with RGB composite of bands 4, 3, 2 and (b) classified image.
Image classification is the process of categorizing all pixels in an image into classes based on their distinctive features (Acharya and Ray, 2005; Gonzalez and Woods, 2008; Giri, 2012). Land covers are identified and assigned to different categories based on differences in their spectral features. In general, techniques developed for image classification in remote sensing can be divided
into unsupervised classification and supervised classification.
• Unsupervised classification: Pixels in an image are automatically classified and grouped into separate clusters, depending on the similarities of the spectral features of each pixel, without human intervention (Lee et al., 1999; Fjortoft, 2003). These kinds of classifications are also termed clustering, and the representative algorithms are K-means (MacQueen, 1967) and the Iterative Self-Organizing Data Analysis Technique (ISODATA) (Ball and Hall, 1964). Classification with unsupervised methods is simple and fast, since it involves only statistical calculation on the input image. However, the final output depends heavily on the number of clusters specified by the operator, and frequently results in feature mixtures, especially for objects with similar spectral characteristics, such as water and shadows. In addition to ISODATA, a variety of algorithms have been developed for unsupervised classification, such as neural network-based methods (Hara et al., 1994; Yuan et al., 2009), probabilistic methods (Fjortoft, 2003), and even hybrid methods (Lee et al., 1999) (Figure 4.7); a K-means sketch is given after this list.
• Supervised classification: Compared to unsupervised approaches, supervised classifications
require the user to select representative samples for each cluster as training sites (i.e., samples)
beforehand, and the identified clusters thus highly depend on these predetermined training
sites (Khorram et al., 2016). Therefore, the final output depends heavily on the cognition
and skills of the image specialist for training site selection. Commonly used supervised
classification algorithms include maximum likelihood (Ahmad and Quegan, 2012) and
minimum-distance classification (Wacker and Landgrebe, 1972). Despite this dependence, results from supervised classification are generally much more accurate than those from unsupervised approaches.
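A minimal unsupervised-classification sketch in MATLAB (assuming the Statistics and Machine Learning Toolbox and the hypothetical stack tm_stack.tif; K-means stands in here for ISODATA, which additionally splits and merges clusters):

% Unsupervised classification of a multispectral stack with k-means.
cube = double(imread('tm_stack.tif'));            % hypothetical H x W x B stack
[h, w, b] = size(cube);
X = reshape(cube, h*w, b);                        % one spectral sample per pixel
k = 6;                                            % cluster count set by the operator
labels = kmeans(X, k, 'MaxIter', 300, 'Replicates', 3);
classmap = reshape(labels, h, w);                 % back to image layout
imagesc(classmap); colorbar;                      % display the classified map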
4.3.1 ENVI
ENVI, an acronym for “the ENvironment for Visualizing Images,” is a software application
developed by the Exelis Visual Information Solutions (Exelis VIS) company, which specializes in
remote sensing imagery processing and analysis. ENVI was first released in 1994 and is written in IDL (Interactive Data Language). In contrast to the text-based IDL, ENVI provides a suite of user-friendly graphical user interfaces (GUIs) with a number of advanced scientific algorithms and wizard-based tools embedded for imagery visualization, analysis, and processing (Figure 4.8).
As shown in Figure 4.8, ENVI provides various algorithms and tools for image processing and
analysis, including basic imagery reading modules to visualize images collected from different
platforms in different formats, as well as pre-processing functions and further advanced spatial and
spectral transformations. Compared to other image processing software, one of the advantages of
ENVI lies in its distinct combination of spectral-based and file-based techniques through interactive
manipulations which enables users to easily manipulate more than one image simultaneously for
advanced processing steps. In addition, ENVI provides extension interfaces to external tools and
functions, which enables users to create customized or application-oriented tools for different
purposes. Due to its outstanding performance in image processing, ENVI has been used in a
variety of industries, particularly in remote sensing.
4.3.2 ERDAS IMAGINE
ERDAS IMAGINE, a geospatial image processing application with raster graphics editor capabilities
designed by ERDAS Inc., has also been widely applied to process and analyze remotely sensed imagery from different platforms and sensors such as AVHRR, Landsat, SPOT, and LiDAR. Before
the ERDAS IMAGINE suite, various products were developed by ERDAS Inc. under the name of
ERDAS to assist in processing imagery collected from most optical and radar mapping sensors.
Similar to most image processing applications, ERDAS IMAGINE also provides a user-friendly
GUI to support imagery visualization, mapping, and so forth.
The first version of ERDAS was released in 1978, whereas ERDAS IMAGINE followed in 1991. The latest version of ERDAS IMAGINE was released in 2015. Like all the previous products, ERDAS IMAGINE aims mainly at processing geospatial raster data by providing many solutions associated with image visualization, mapping, and data (e.g., raster, vector, LiDAR point) analysis in one package, allowing users to perform numerous operations on imagery toward specific goals.
It supports optical panchromatic, multispectral, and hyperspectral imagery, as well as radar and
LiDAR data in a wide variety of formats.
By integrating multiple geospatial technologies, ERDAS IMAGINE can be used as a powerful
package to process remotely sensed imagery supporting consolidated workflows. In addition,
ERDAS IMAGINE is flexible, depending on users’ needs. It provides three product tiers (i.e.,
Essentials, Advantage, and Professional) designed for all levels of users, which enables handling
any geospatial analysis task. Due to the robust multicore and distributed batch processing, ERDAS
IMAGINE is capable of handling tasks with a remarkable processing performance through dynamic
modeling, even when dealing with massive datasets from any sensor.
4.3.4 ArcGIS
ArcGIS is a leading geographic information system (GIS) application that allows users to work with geospatial maps and perform geoprocessing on raw input data to produce valuable information. The first ArcGIS suite was released in late 1999 by ESRI (Environmental
Systems Research Institute). Prior to ArcGIS, ESRI had developed various products focusing
mainly on the development of ArcInfo workstation and several GUI-based products such as the
ArcView. However, these products did not integrate well with one another. Within this context,
ESRI revamped its GIS software platform toward a single integrated software architecture, which
finally resulted in the ArcGIS suite.
ArcGIS provides a comprehensive platform to manage, process, and analyze the input raster or
vector data to extract valuable information. It is capable of managing geographic information in a
database, creating and analyzing geospatial maps, discovering and sharing geographic information,
and so forth. Key features of ArcGIS include: (1) a variety of powerful spatial analysis tools,
(2) automated advanced workflows, (3) high-quality maps creation, (4) geocoding capabilities, and
(5) advanced imagery support.
With the development of remote sensing, ArcGIS provides a suite of image processing and analysis
tools enabling users to better understand the information locked in the imagery pixels. At present,
ArcGIS is capable of efficiently managing and processing time-variant, multi-resolution imagery from multiple sources (e.g., satellite, aerial, LiDAR, and SAR), formats (e.g., GeoTIFF, HDF, GRIB [General Regularly-distributed Information in Binary form], and netCDF), and projections. In addition
to the basic viewing and editing modules, ArcGIS provides a number of extensions that can be added
to aid in complex tasks, including spatial analyst, geostatistical analyst, network analyst, 3D analyst,
and so forth, which are capable of geoprocessing, data conversion, and analysis.
Due to its multiple functionalities, ArcGIS has been widely applied to process geospatial
imagery in remote sensing. One of the significant features of ArcGIS is that it provides a model
builder tool, which can be used to create, edit, and manage workflows for automatic sequential
execution of geoprocessing tools. In other words, outputs of one tool are fed into another tool as
input (Figure 4.9). The established model can be thought of as a new tool for batch processing, and it
is of great help in handling large volumes of data (e.g., long-term satellite imagery) for multiple processing purposes.
4.3.5 MATLAB®
MATLAB is a high-level proprietary programming language developed by MathWorks Inc. that
integrates computation, visualization, and programming in a user-friendly interactive environment.
MATLAB has been widely used across disciplines for numeric computation, data analysis and
visualization, programming and algorithm development, creation of user interfaces, and so forth.
Since the basic data element of MATLAB is an array, it allows fast solution formulations for many
numeric computing problems, in particular those involving matrix representations, such as images
(i.e., two-dimensional numerical arrays). This means image processing operations can be easily
expressed in a compact and clear manner toward a quick solution of image processing problems
(Gonzalez et al., 2004).
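For instance (a minimal sketch assuming a hypothetical input file scene.tif), several common image operations reduce to one-line array expressions:

img = imread('scene.tif');         % an image is just an H x W (x B) numeric array
flipped = img(end:-1:1, :, :);     % vertical flip by reversing row indices
darker = img / 2;                  % halve brightness (saturating integer division)
sub = img(101:200, 101:200, :);    % crop a 100 x 100 pixel subset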
With the development of an extensive set of algorithms and functions specializing in manipulating
images, the capability of MATLAB is extended to the image processing domain. These comprehensive
algorithms and functions are achieved through a toolbox termed the Image Processing Toolbox. With
the aid of this toolbox, MATLAB can be easily applied to perform image analysis and processing
including image segmentation, enhancement, registration, and transformations, as well as noise
reduction and so forth. In addition, many algorithms and functions provided in the toolbox support
multicore processors and even GPUs (i.e., graphics processing units), resulting in the acceleration of
image processing, especially for computationally intensive workflows.
At present, MATLAB supports a diverse set of image types in different formats. Images stored in standard data and image formats, as well as a number of specialized file formats such as HDF and netCDF, can be read directly into a matrix in MATLAB for visualization and further manipulation. Meanwhile, results or matrices acquired after processing can also be exported as raster datasets or images.
4.3.6 IDL
IDL, short for Interactive Data Language, is a scientific programming language with similar capabilities to
MATLAB, also developed by Exelis VIS. It has been commonly used along with ENVI, an image
processing software package built in IDL, for data analysis and image processing, particularly in
remote sensing and medical imaging. Similar to other programming languages, IDL incorporates
three essential capabilities including interactivity, graphics display, and array-oriented operation
for data analysis. Its vectorized nature makes IDL capable of performing fast array computations,
especially for numerically heavy computations, by taking advantage of the built-in vector operations.
With the capability of handling a large volume of data, IDL has been widely applied for image
processing and analysis. In addition to the built-in iTools widgets for interactive image display,
hundreds of algorithms and functions are provided for further advanced image manipulation and
processing with capabilities including segmentation, enhancement, filtering, Fourier transform and
wavelet transform, spectral analysis, and so forth. A distinctive feature of IDL is that it can be used
to develop customized tools for use as extended modules in ENVI for specific purposes.
4.4 SUMMARY
In this chapter, a variety of commonly used image pre-processing techniques including atmospheric
correction, radiometric correction, geometric correction, resampling, mosaicking, and gap filling
are discussed, as well as advanced processing methods including image enhancement, image
transformation, and image classification. In addition, image processing software and programming
languages such as ENVI, ArcGIS, MATLAB, and IDL are also briefly introduced. In the next
chapter, concepts of feature extraction in remote sensing will be formally introduced to expand the
theoretical foundation of remote sensing.
REFERENCES
Abraham, R. and Simon, P., 2013. Review on mosaicing techniques in image processing. International Journal
of Software Engineering Research and Practices, 3, 63–68.
Acharya, T. and Ray, A. K., 2005. Image Processing: Principles and Applications. Wiley InterScience, New
Jersey.
Addink, E. A., 1999. A comparison of conventional and geostatistical methods to replace clouded pixels in
NOAA-AVHRR images. International Journal of Remote Sensing, 20, 961–977.
Ahmad, A. and Quegan, S., 2012. Analysis of maximum likelihood classification on multispectral data.
Applied Mathematical Sciences, 6, 6425–6436.
Baboo, D. S. S. and Devi, M. R., 2010. An analysis of different resampling methods in Coimbatore district.
Journal of Computer Science and Technology, 10, 61–66.
Baboo, D. S. S. and Devi, M. R., 2011. Geometric correction in recent high resolution satellite imagery: A case
study in Coimbatore, Tamil Nadu. International Journal of Computer Applications, 14, 32–37.
Ball, G. H. and Hall, D. J., 1964. Some fundamental concepts and synthesis procedures for pattern recognition
preprocessors. In: International Conference on Microwaves, Circuit Theory, and Information Theory,
September, Tokyo, 113–114.
Batchelor, B. G., 1978. Digital image processing. Electronics & Power, 24, 863.
Bedi, S. S. and Khandelwal, R., 2013. Various image enhancement techniques—a critical review. International
Journal of Advanced Research in Computer Engineering, 2, 1605–1609.
Berk, A., Bernstein, L. S., and Robertson, D. C., 1989. MODTRAN: A Moderate Resolution Model for
LOWTRAN 7. Technical Report, May 12, 1986–May 11, 1987. Spectral Sciences, Inc., Burlington, MA.
Chander, G., Markham, B. L., and Helder, D. L., 2009. Summary of current radiometric calibration coefficients
for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment, 113, 893–903.
Chang, N.-B., Bai, K., and Chen, C.-F., 2015. Smart information reconstruction via time-space-spectrum
continuum for cloud removal in satellite images. IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, 8, 1898–1912.
Chavez, P. S., 1988. An improved dark-object subtraction technique for atmospheric scattering correction of
multispectral data. Remote Sensing of Environment, 24, 459–479.
Chen, H., Li, A., Kaufman, L., and Hale, J., 1994. A fast filtering algorithm for image enhancement. IEEE Transactions on Medical Imaging, 13, 557–564.
Chen, J., Zhu, X., Vogelmann, J. E., Gao, F., and Jin, S., 2011. A simple and effective method for filling gaps
in Landsat ETM+ SLC-off images. Remote Sensing of Environment, 115, 1053–1064.
Cheng, S.-C. and Hsia, S.-C., 2003. Fast algorithms for color image processing by principal component
analysis. Journal of Visual Communication and Image Representation, 14, 184–203.
Deng, H., Wang, L., Liu, J., Li, D., Chen, Z., and Zhou, Q., 2013. Study on application of scale invariant feature
transform algorithm on automated geometric correction of remote sensing images. In: Computer and
Computing Technologies in Agriculture VI, 352–358. Edited by Li, D. and Chen, Y., Zhangjiajie, China.
Dodgson, N. A., 1997. Quadratic interpolation for image resampling. IEEE Transactions on Image Processing,
6, 1322–1326.
Dozier, J. and Frew, J., 1990. Rapid calculation of terrain parameters for radiation modeling from digital
elevation data. IEEE Transactions on Geoscience and Remote Sensing, 28, 963–969.
Du, Y., Teillet, P. M., and Cihlar, J., 2002. Radiometric normalization of multitemporal high-resolution
satellite images with quality control for land cover change detection. Remote Sensing of Environment,
82, 123–134.
Duggin, M. J. and Piwinski, D., 1984. Recorded radiance indices for vegetation monitoring using NOAA
AVHRR data; atmospheric and other effects in multitemporal data sets. Applied Optics, 23, 2620.
Essery, R. and Marks, D., 2007. Scaling and parametrization of clear-sky solar radiation over complex
topography. Journal of Geophysical Research-Atmospheres, 112, D10122.
Eugenio, F., Marques, F., and Marcello, J., 2002. A contour-based approach to automatic and accurate
registration of multitemporal and multisensor satellite imagery. In: IEEE International Geoscience and
Remote Sensing Symposium, 3390–3392. Toronto, Ontario, Canada.
Figueiredo, M. A. T. and Nowak, R. D., 2003. An EM algorithm for wavelet-based image restoration. IEEE
Transactions on Image Processing, 12, 906–916.
Fjortoft, R., 2003. Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields. IEEE Transactions on Geoscience and Remote Sensing, 41, 675–686.
Gao, F., Morisette, J. T., Wolfe, R. E., Ederer, G., Pedelty, J., Masuoka, E., Myneni, R., Tan, B., and Nightingale,
J., 2008. An algorithm to produce temporally and spatially continuous MODIS-LAI time series. IEEE
Geoscience and Remote Sensing Letters, 5, 60–64.
Gianinetto, M. and Scaioni, M., 2008. Automated geometric correction of high-resolution pushbroom satellite
data. Photogrammetric Engineering & Remote Sensing, 74, 107–116.
Giri, C. P., 2012. Remote Sensing of Land Use and Land Cover: Principles and Applications, CRC Press,
Boca Raton, FL, pp. 1–469.
Gonzalez, R. C. and Woods, R. E., 2008. Digital Image Processing. 3rd edition, Pearson Prentice Hall, Upper
Saddle River, NJ.
Gonzalez, R. C., Woods, R. E., and Eddins, S. L., 2004. Digital Image Processing Using MATLAB. Gatesmark
Publishing, Knoxville, TN.
Gopalan, K., Jones, W. L., Biswas, S., Bilanow, S., Wilheit, T., and Kasparis, T., 2009. A time-varying
radiometric bias correction for the TRMM microwave imager. IEEE Transactions on Geoscience and
Remote Sensing, 47, 3722–3730.
Goshtasby, A., 1987. Geometric correction of satellite images using composite transformation functions. In:
The 21st International Symposium on Remote Sensing of Environment, Ann Arbor, Michigan.
Gunturk, B. K., Li, X., 2013. Image Restoration Fundamentals and Advances. CRC Press, Boca Raton, FL.
Gurjar, S. B. and Padmanabhan, N., 2005. Study of various resampling techniques for high-resolution remote
sensing imagery. Journal of the Indian Society of Remote Sensing, 33, 113–120.
Hadjimitsis, D. G. and Clayton, C., 2009. Darkest pixel atmospheric correction algorithm: A revised procedure
for environmental applications of satellite remotely sensed imagery. Environmental Monitoring and
Assessment, 159, 281–292.
Hadjimitsis, D. G., Papadavid, G., Agapiou, A., Themistocleous, K., Hadjimitsis, M. G., Retalis, A.,
Michaelides, S. et al. 2010. Atmospheric correction for satellite remotely sensed data intended for
agricultural applications: impact on vegetation indices. Natural Hazards and Earth System Sciences,
10, 89–95.
Hall, F. G., Strebel, D. E., Nickeson, J. E., and Goetz, S. J., 1991. Radiometric rectification: Toward a common
radiometric response among multidate, multisensor images. Remote Sensing of Environment, 35, 11–27.
Hara, Y., Atkins, R., Yueh, S., Shin, R., and Kong, J., 1994. Application of neural networks to radar image
classification. IEEE Transactions on Geoscience and Remote Sensing, 32, 1994.
Herman, B. M., Browning, S. R., and Curran, R. J., 1971. The effect of atmospheric aerosols on scattered
sunlight. Journal of the Atmospheric Sciences, 28, 419–428.
Huete, A., 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of Environment, 25, 295–309.
Huete, A., Didan, K., Miura, T., Rodriguez, E., Gao, X., and Ferreira, L., 2002. Overview of the radiometric
and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment, 83,
195–213.
Hummel, R., 1977. Image enhancement by histogram transformation. Computer Graphics and Image
Processing, 6, 184–195.
Inampudi, R. B., 1998. Image Mosaicing. In: IGARSS ‘98. Sensing and Managing the Environment. 1998
IEEE International Geoscience and Remote Sensing. Symposium Proceedings, 2363–2365.
Janzen, D. T., Fredeen, A. L., and Wheate, R. D., 2006. Radiometric correction techniques and accuracy assessment
for Landsat TM data in remote forested regions. Canadian Journal of Remote Sensing, 330–340.
Kandasamy, S., Baret, F., Verger, A., Neveux, P., and Weiss, M., 2013. A comparison of methods for smoothing
and gap filling time series of remote sensing observations—application to MODIS LAI products.
Biogeosciences, 10, 4055–4071.
Kang, S., Running, S. W., Zhao, M., Kimball, J. S., and Glassy, J., 2005. Improving continuity of MODIS
terrestrial photosynthesis products using an interpolation scheme for cloudy pixels. International
Journal of Remote Sensing, 26, 1659–1676.
Kaufman, Y. J. and Sendra, C., 1988. Algorithm for automatic atmospheric corrections to visible and near-IR
satellite imagery. International Journal of Remote Sensing, 9, 1357–1381.
Kaufman, Y. J. and Tanré, D., 1996. Strategy for direct and indirect methods for correcting the aerosol effect
on remote sensing: From AVHRR to EOS-MODIS. Remote Sensing of Environment, 55, 65–79.
Kay, S., Hedley, J. D., and Lavender, S., 2009. Sun glint correction of high and low spatial resolution images
of aquatic scenes: A review of methods for visible and near-infrared wavelengths. Remote Sensing, 1,
697–730.
Keys, R., 1981. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics,
Speech, and Signal Processing, 29, 1153–1160.
Khorram, S., Nelson, S. A. C., Cakir, H., and van der Wiele, C. F., 2013. Digital image acquisition:
Preprocessing and data reduction, in: Pelton, J. N., Madry, S., and Camacho-Lara, S. (Eds.) Handbook
of Satellite Applications, 809–837.
Khorram, S., van der Wiele, C. F., Koch, F. H., Nelson, S. A. C., and Potts, M. D., 2016. Principles of Applied
Remote Sensing. Springer, New York.
Kim, W., He, T., Wang, D., Cao, C., and Liang, S., 2014. Assessment of long-term sensor radiometric
degradation using time series analysis. IEEE Transactions on Geoscience and Remote Sensing, 52,
2960–2976.
Kneizys, F. X., Shettle, E. P., and Gallery, W. O., 1981. Atmospheric transmittance and radiance: The
LOWTRAN 5 code, in: Fan, R. W. (Ed.), SPIE 0277, Atmospheric Transmission. 116 (July 28, 1981),
116–124. SPIE, Washington D.C., United States.
KwangIn, K., Franz, M., and Scholkopf, B., 2005. Iterative kernel principal component analysis for image
modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1351–1366.
Lagendijk, R. and Biemond, J., 1999. Basic methods for image restoration and identification, in: Bovik, A.
(Ed.), Handbook of Image and Video Processing, 1–25. Academic Press, Massachusetts, USA.
Lee, J. S., Grünes, M. R., Ainsworth, T. L., Du, L. J., Schuler, D. L., and Cloude, S. R., 1999. Unsupervised
classification using polarimetric decomposition and the complex wishart classifier. IEEE Transactions
on Geoscience and Remote Sensing, 37, 2249–2258.
Li, Z., 2014. Fast Fourier transformation resampling algorithm and its application in satellite image processing.
Journal of Applied Remote Sensing, 8, 83683.
MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 231–297.
Maini, R. and Aggarwal, H., 2010. A comprehensive review of image enhancement techniques. Journal of
Computing, 2, 39–44.
Maxwell, S. K., Schmidt, G. L., and Storey, J. C., 2007. A multi-scale segmentation approach to filling gaps in
Landsat ETM+ SLC-off images. International Journal of Remote Sensing, 28, 5339–5356.
Nielsen, A. A., Conradsen, K., and Simpson, J. J., 1998. Multivariate alteration detection (MAD) and MAF
postprocessing in multispectral, bitemporal image data: New approaches to change detection studies.
Remote Sensing of Environment, 64, 1–19.
Obata, K., Tsuchida, S., and Iwao, K., 2015. Inter-band radiometric comparison and calibration of ASTER
visible and near-infrared bands. Remote Sensing, 7, 15140–15160.
Pandey, P. K., Singh, Y., and Tripathi, S., 2011. Image processing using principle component analysis.
International Journal of Computer Applications, 15, 37–40.
Parker, J. A., Kenyon, R.V., and Troxel, D. E., 1983. Comparison of interpolating methods for image resampling.
IEEE Transactions on Medical Imaging, 2, 31–39.
Pons, X. and Solé-Sugrañes, L., 1994. A simple radiometric correction model to improve automatic mapping
of vegetation from multispectral satellite data. Remote Sensing of Environment, 48, 191–204.
Pons, X., Pesquer, L., Cristóbal, J., and González-Guerrero, O., 2014. Automatic and improved radiometric
correction of Landsat imagery using reference values from MODIS surface reflectance images.
International Journal of Applied Earth Observation and Geoinformation, 33, 243–254.
Reichenbach, S. E. and Geng, F., 2003. Two-dimensional cubic convolution. IEEE Transactions on Image
Processing, 12, 857–865.
Richter, R., 1996a. A spatially adaptive fast atmospheric correction algorithm. International Journal of
Remote Sensing, 17, 1201–1214.
Richter, R., 1996b. Atmospheric correction of satellite data with haze removal including a haze/clear transition
region. Computers & Geosciences, 22, 675–681.
Roerink, G. J., Menenti, M., and Verhoef, W., 2000. Reconstructing cloudfree NDVI composites using Fourier analysis of time series. International Journal of Remote Sensing, 21, 1911–1917.
Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W., 1974. Monitoring vegetation systems in the Great Plains with ERTS. In: Third Earth Resources Technology Satellite (ERTS) Symposium, pp. 309–317. Texas, United States.
Schott, J. R., Salvaggio, C., and Volchok, W. J., 1988. Radiometric scene normalization using pseudoinvariant
features. Remote Sensing of Environment, 26, 1–16.
Silva Centeno, J. A. and Haertel, V., 1997. An adaptive image enhancement algorithm. Pattern Recognition,
30, 1183–1189.
Song, C., Woodcock, C. E., Seto, K. C., Lenney, M., and Macomber, S. A., 2001. Classification and change detection using Landsat TM data: When and how to correct atmospheric effects. Remote Sensing of
Environment, 75, 230–244.
Starck, J.-L., Murtagh, F., Candes, E. J., and Donoho, D. L., 2003. Gray and color image contrast enhancement
by the curvelet transform. IEEE Transactions on Image Processing, 12, 706–717.
Teillet, P. M., Fedosejevs, G., Thome, K. J., and Barker, J. L., 2007. Impacts of spectral band difference
effects on radiometric cross-calibration between satellite sensors in the solar-reflective spectral domain.
Remote Sensing of Environment, 110, 393–409.
Thome, K., Markham, B., Slater, P., and Biggar, S., 1997. Radiometric calibration of Landsat. Photogrammetric Engineering & Remote Sensing, 63, 853–858.
Toutin, T., 2004. Review article: Geometric processing of remote sensing images: Models, algorithms and
methods. International Journal of Remote Sensing, 25, 1893–1924.
Verger, A., Baret, F., Weiss, M., Kandasamy, S., and Vermote, E., 2013. The CACAO method for smoothing,
gap filling, and characterizing seasonal anomalies in satellite time series. IEEE Transactions on
Geoscience and Remote Sensing, 51, 1963–1972.
Vermote, E. F., Tanré, D., Deuzé, J. L., Herman, M., and Morcrette, J. J., 1997. Second simulation of the
satellite signal in the solar spectrum, 6s: an overview. IEEE Transactions on Geoscience and Remote
Sensing, 35, 675–686.
Vicente-Serrano, S. M., Pérez-Cabello, F., and Lasanta, T., 2008. Assessment of radiometric correction
techniques in analyzing vegetation variability and change using time series of Landsat images. Remote
Sensing of Environment, 112, 3916–3934.
Wacker, A. G. and Landgrebe, D. A., 1972. Minimum distance classification in remote sensing. LARS Technical Reports, paper 25.
Wang, D., Morton, D., Masek, J., Wu, A., Nagol, J., Xiong, X., Levy, R., Vermote, E., and Wolfe, R., 2012.
Impact of sensor degradation on the MODIS NDVI time series. Remote Sensing of Environment, 119,
55–61.
Weiss, D. J., Atkinson, P. M., Bhatt, S., Mappin, B., Hay, S. I., and Gething, P. W., 2014. An effective approach
for gap-filling continental scale remotely sensed time-series. ISPRS Journal of Photogrammetry and
Remote Sensing, 98, 106–118.
Weisstein, E. W. Affine Transformation. From MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/AffineTransformation.html. Accessed 2017.
Yang, C.-C., 2006. Image enhancement by modified contrast-stretching manipulation. Optics & Laser
Technology, 38, 196–201.
Yang, X. and Lo, C. P., 2000. Relative radiometric normalization performance for change detection from
multi-date satellite images. Photogrammetric Engineering & Remote Sensing, 66, 967–980.
Yuan, H., Van Der Wiele, C. F., and Khorram, S., 2009. An automated artificial neural network system for land
use/land cover classification from Landsat TM imagery. Remote Sensing, 1, 243–265.
Zhang, C., Li, W., and Travis, D., 2007. Gaps-fill of SLC-off Landsat ETM+ satellite image using a
geostatistical approach. International Journal of Remote Sensing, 28, 5103–5122.
Zhu, X., Liu, D., and Chen, J., 2012. A new geostatistical approach for filling gaps in Landsat ETM+ SLC-off
images. Remote Sensing of Environment, 124, 49–60.
Zitová, B. and Flusser, J., 2003. Image registration methods: a survey. Image and Vision Computing, 21,
977–1000.
Part II
Feature Extraction for Remote Sensing
5 Feature Extraction and
Classification for Environmental
Remote Sensing
5.1 INTRODUCTION
Human efforts to detect and extract information from imagery date back to the time when the first photographic images were acquired, as early as the mid-nineteenth century. Motivated by the
subsequent advances in photogrammetry, the invention of the airplane, improvements in the relevant
instrumentations and techniques, the advent of digital imagery, and the capabilities of electronic
processing, interest in efficiently extracting information from imagery to help with learning and
decision-making has increased significantly (Wolf et al., 2000; Quackenbush, 2004).
With the advancement of remote sensing, a wealth of instruments has been deployed onboard
various satellite and space-borne platforms dedicated to providing versatile remotely sensed data
to monitor Earth's environment. As many remotely sensed images with high spatial, temporal, and spectral resolutions become available on a daily basis at the global scale, the data volume increases by many orders of magnitude, making it even harder to convert images into actionable information and knowledge through conventional manual interpretation approaches for further decision-making (Momm and Easson, 2011). Manual interpretation is time-consuming and labor-intensive; in addition, it is difficult to cope with the large volume of information embedded in remotely sensed data, particularly
for remotely sensed images with fine resolutions in spectral (e.g., hyperspectral images) and spatial
(e.g., panchromatic images) domains.
Along this line, many statistical and geophysical methods were developed to help retrieve
information from different types of remote sensing imageries. Machine learning and/or data
mining are relatively new methods for feature extraction. When performing feature extraction
with machine learning or data mining in search of geospatial intelligence for a complex dataset,
one of the major problems is the low efficiency issue stemming from the large number of variables
involved. With more learning algorithms becoming available, feature extraction not only requires a huge amount of memory and computational power, but may also result in a slow learning process with possible overfitting of the training samples and poor generalization of predictions to new samples (Zena and Gillies, 2015). Generally, the large amount of information retrieved from remotely sensed data makes it difficult to perform classification or pattern recognition for environmental decision-making, because the observed information can be miscellaneous and highly overlapping, although some of it is complementary. Overall, these problems can be mainly attributed to the large amount of redundant or complementary information embedded in data in the spatial, temporal, or spectral domains, requiring tremendous data analysis and synthesis effort.
When the input data to an algorithm are highly redundant or too large to be managed, it is desirable to transform the raw data into a reduced form by keeping only the primary
characteristics of the raw data. Key information embedded in such reduced forms can be well
represented or preserved by a set of features to facilitate the subsequent learning process and
improve generalization and interpretability toward efficient decision making. In image processing
and pattern recognition, the techniques designed to construct a compact feature vector well
representing the raw observations are referred to as feature extraction, which is largely related
to dimensionality reduction (Sharma and Sarma, 2016). In some cases, the reduced form (i.e.,
feature vector) could even lead to better human interpretations, especially for hyperspectral data, which can comprise from several hundred up to a thousand bands. Due to significant advantages
in reducing the size and dimensionality of the raw data, feature extraction has been widely
used to help with the problem of constructing and identifying certain types of features from the
given input data to solve various problems via the use of machine learning, data mining, image
compression, pattern recognition, and classification.
In fields of computational intelligence and information management such as machine learning
and pattern recognition, feature extraction has become the most critical step prior to classification
and decision-making, as the final performance of analysis is highly dependent on the quality of
extracted features. A typical workflow of image processing and pattern recognition for environmental
monitoring is presented in Figure 5.1, from which we can see that feature extraction is the first
essential step of processing after the pre-processing. With the fast development of computer sciences
and other relevant information technologies, having all the features of interest in an observed scene
automatically identified at the push of a button, namely, a process of automatic feature extraction, is
truly appealing and plausible. The ultimate goal is to develop automatic and intelligent techniques
to cope with the problem of detecting and extracting informative features from the input data
effectively and efficiently.
In this chapter, basic concepts and fundamentals associated with feature extraction, as well as a
wealth of commonly applied feature extraction techniques that can be used to help with classification
problems in remote sensing, will be introduced to aid in environmental decision-making. Different
learning strategies summarized below will be thoroughly discussed:
• Supervised Learning: This involves a set of target values that are fed into the learning model, allowing the model to adjust according to its errors (a brief sketch contrasting the supervised and unsupervised strategies follows this list).
• Unsupervised Learning: This is required when there is not a set of target values for a model
to learn, such as searching for a hidden pattern in a big dataset. Often, clustering analysis
is conducted by dividing the big data set into groups according to some unknown pattern.
• Semi-supervised Learning: This is a class of supervised learning processes that make use
of very small amounts of labeled data within a large amount of unlabeled data for training.
In this way, we may guess the shape of the underlying data distribution and generalize
better to new samples. These algorithms can perform well when we have a very small
amount of labeled points and a large amount of unlabeled points.
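To make the contrast concrete, a minimal MATLAB sketch (assuming the Statistics and Machine Learning Toolbox; the feature matrix X, labels y, and index vector labeled are hypothetical) fits a supervised k-nearest-neighbor model on a labeled subset while clustering the same data without any targets:

% Contrast of the two strategies on one feature matrix X (n samples x p features).
% 'labeled' indexes a small labeled subset with targets y (both assumed available).
mdl = fitcknn(X(labeled, :), y(labeled));   % supervised: learns from target values
yhat = predict(mdl, X);                     % predictions generalized to all samples
clusters = kmeans(X, 5);                    % unsupervised: no targets required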
In addition, metrics that can be used to evaluate the performance of feature extraction methods as
well as perspectives of feature extraction will be presented.
FIGURE 5.1 A typical workflow of image processing and pattern recognition for environmental monitoring: remotely sensed imagery passes through pre-processing (atmospheric, radiometric, and geometric correction, and re-projection), feature extraction (low-level, unsupervised, and supervised), pattern recognition (clustering, unsupervised and supervised classification, and machine learning), and post-processing (statistical computation, modeling, generalization, and upscaling) toward decision making (impact assessment, risk analysis, risk management, mitigation strategy, and the precautionary principle).

Feature extraction has been defined in many ways across the literature, for example:
• Feature extraction is the process of transforming raw data into more informative signatures
or characteristics of a system, which will most efficiently or meaningfully represent the
information that is important for analysis and classification (Elnemr et al., 2016).
• Feature extraction is a process for extracting relevant information from an image. After
detecting a face, some valuable information is extracted from the image which is used in
the next step for identifying the image (Bhagabati and Sarma, 2016).
• Feature extraction is the process of transforming the input data into a set of features which
can very well represent the input data. It is a special form of dimensionality reduction
(Sharma and Sarma, 2016).
• Feature extraction is a process of deriving new features from the original features in order
to reduce the cost of feature measurement, increase classifier efficiency, and allow higher
classification accuracy (Akhtar and Hassan, 2015).
• Feature extraction is a process of extracting the important or relevant characteristics
that are enclosed within the input data. Dimensionality or size of the input data will be
subsequently reduced to preserve important information only (Ooi et al., 2015).
• Feature extraction is a special form of dimensionality reduction aiming at transforming the
input data into a reduced representation set of features (Kumar and Bhatia, 2014).
• Feature extraction is one of the important steps in pattern recognition, aiming at extracting a set of descriptors, various characteristic attributes, and the relevant associated information to form a representation of the input pattern (Ashoka et al., 2012; Jain et al., 2000).
• Feature extraction is the process of extracting and building features from raw data. Feature
functions are utilized to extract and process informative features that are useful for
prediction (Gopalakrishnan, 2009).
• Feature extraction is a dimensionality reduction method that finds a reduced set of features
that are a combination of the original ones (Sánchez-Maroño and Alonso-Betanzos, 2009).
• Feature extraction refers to the extraction of linguistic items from the documents to provide
a representative sample of their content. Distinctive vocabulary items found in a document
are assigned to the different categories by measuring the importance of those items to the
document content (Durfee, 2006).
• Feature extraction can be viewed as finding a set of vectors that represent an observation
while reducing the dimensionality (Benediktsson et al., 2003).
• Feature extraction is a process that extracts a set of new features from the original features
through some functional mapping (Wyse et al., 1980).
The above-listed definitions are all meaningful and informative, indicating that the key to
feature extraction is to construct a compact feature vector to well represent the original data in a
lower dimensionality space. However, it is clear that the definition varies among research domains
and applications. By summarizing previous definitions, we may define feature extraction broadly
as a general term referring to “the process of constructing a set of compact feature vectors by
extracting the most relevant features from the input data to facilitate further decision-making by
using the reduced representation (i.e., feature vector) instead of the original full-size data while still
maintaining sufficient accuracy.”
In general, a good feature is expected to possess the following properties:
• Informative: The resulting feature should be expressive and perceptually meaningful, and be able to explain a certain level of information embedded in the input data.
• Distinctive: The neighborhood around the feature center varies enough to allow for a
reliable discrimination between the features.
• Nonredundant: Features derived from different samples of the same class should be
grouped in the same category, and each type of feature should represent a unique property
of the input data.
• Repeatable detections: The resulting features should be the same in two different images
of the same scene. In other words, the features should be resistant to changes in viewing
conditions and noise, such as the presence of rotation and scaling effect.
• Localizable: The feature should have a unique location assigned to it, and changes in
viewing conditions or directions should not affect its location.
FIGURE 5.2 An illustrative example of selecting features from a given input data.
Nevertheless, the aforementioned properties are not the only criteria that can be used to evaluate
a feature vector, and features should not be limited to these characteristics, because the resulting
features are highly dependent on the specific problems at hand.
Features can be broadly categorized as low-level features and high-level features, although there
is no distinct gap between them (Elnemr et al., 2016). In general, low-level features are fundamental
features such as edges and lines as well as many other basic descriptors that can be easily detected
without performing any complex manipulation, which can be further divided into general features
and domain-specific features. The so-called “general features” mainly refer to those common
features that can be directly detected from any given image. In other words, general features should
be universal and application-independent. Three general features commonly used are:
• Color features: Color is one of the most important features of images because it is visually
intuitive to human perception. Color features are often defined subject to a particular
color space or model, and the most popular color spaces are RGB (red-green-blue) and
HSV (hue-saturation-value). Based on these color spaces, a variety of color features
including color histogram (Wang et al., 2009), color moment (Huang et al., 2010), color
coherence vector (Pass et al., 1998), and color correlogram (Huang et al., 1997) can be
then extracted.
• Texture features: In image processing, texture refers to a set of metrics designed to quantify
the perceived information about the spatial arrangement of color or intensities in an image
or selected region of an image (Haralick et al., 1973). As opposed to color, which is usually
represented by the brightness of each individual pixel, texture is often measured based
on a set of pixels by considering spatial or spectral similarities among pixels. Based on the domain from which features are extracted, textures can be divided into spatial texture features and spectral texture features (a co-occurrence-based sketch follows this list).
• Shape features: Shape is an important geometrical cue used by human beings to
discriminate real-world objects. A shape can be described by different parameters such as
rectangularity, circularity ratio, eccentricity ratio, and center of gravity.
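For example, spatial texture can be summarized with gray-level co-occurrence matrix (GLCM) statistics (Haralick et al., 1973). The MATLAB sketch below (assuming the Image Processing Toolbox and a hypothetical single-band file band4.tif) computes four common GLCM properties:

% Gray-level co-occurrence matrix (GLCM) texture features for one band.
gray = imread('band4.tif');                        % hypothetical single-band image
glcm = graycomatrix(gray, 'Offset', [0 1; -1 1]);  % two pixel-pair offsets
stats = graycoprops(glcm, {'Contrast', 'Correlation', 'Energy', 'Homogeneity'});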
A list of object-based features that have been categorized into three classes, namely spectral, textural, and shape features, is summarized in Table 5.1.

TABLE 5.1
List of Object-based Features Categorized into Three Classes: Spectral, Textural, and Shape Features (Chen et al., 2014)

Spectral features: Mean, Standard Deviation, Skewness, Ratio, Maximum, Minimum, Mean of Inner Border, Mean of Outer Border, Mean Diff. to Darker Neighbors, Mean Diff. to Brighter Neighbors, Contrast to Neighbor Pixels, Edge Contrast of Neighbor Pixels, Std. Dev. to Neighbor Pixels, Circular Mean, Circular Std. Dev., Mean Diff. to Neighbors, Mean Diff. to Scene, Ratio to Scene

Texture (GLCM) features: Angular Second Moment, Contrast, Correlation, Dissimilarity, Entropy, Homogeneity, Mean, Std. Dev.

Texture (GLDV) features: Angular Second Moment, Contrast, Entropy, Mean

Shape features: Area, Asymmetry, Border Index, Border Length, Compactness, Density, Elliptic Fit, Length, Main Direction, Radius of Largest Enclosed Ellipse, Radius of Smallest Enclosing Ellipse, Rectangular Fit, Roundness, Shape Index, Width

Source: Chen, X., Li, H., and Gu, Y., 2014. 2014 Fourth International Conference on Instrumentation and Measurement, Computer, Communication and Control, Harbin, China, 539–543.

Most of the object-based features in Table 5.1
are low-level features. In contrast, the so-called “domain-specific features” mainly refer to those
application-dependent features, and thus are highly related to practical applications in the domain.
For instance, fingerprints can be used as good features in human identity identification. Analogously,
human faces can be detected for face recognition, while chlorophyll-a content can be considered a
good feature for vegetation detection. In general, domain-specific features are confined to certain
specific applications that are not universal across domains.
High-level features refer to those feature vectors further derived from the low-level features by
using certain extra algorithms after basic feature extraction in the sense that hybrid algorithms are
often employed. The primary difference between low- and high-level features lies in the complexity
of the process used to extract the advanced features based on the low-level features. In Table 5.1,
for example, shape features can be application-dependent, with fingerprints, faces, and body gestures used as basic evidence for high-level human pattern recognition, whereas spectral features usually serve as low-level features for subsequent high-level feature extraction. Although it
is sometimes harder to obtain a high-level feature, it is of more help in understanding the designated
target embedded in those low-level features.
The goal of feature selection is to identify a subset of features that still well represents the input data. This is achievable by removing redundant or irrelevant features and reducing the dimensionality
of feature vectors, which facilitates the advanced learning process and improves the generalization
(Zena and Gillies, 2015). Thus, feature selection techniques are frequently used in many domains
to cope with problems with large input spaces, such as data mining, pattern recognition, and image
processing, because too much information can reduce the effectiveness of further data manipulation
due to complexities. Nevertheless, we should be aware that feature selection is different from feature
construction, as the latter generates new feature vectors, whereas the former focuses on selecting a
subset of features. In addition, applying a feature selection technique relies primarily on the input
data, which contains a variety of either redundant or irrelevant features, as the dimensionality or
space can be substantially reduced while still maintaining sufficient information to well represent
the original target (Bermingham et al., 2015).
A feature selection method can be considered an integrated process of feature subset screening
and quality assessment (Peng et al., 2005). By examining the manner of combining the screening
algorithm and the model building, the feature selection methods can be divided into three primary
categories including (1) wrappers, (2) filters, and (3) embedded methods (Das, 2001; Guyon and
Elisseeff, 2003; Zena and Gillies, 2015). Several feature selection methods and their characteristics are summarized in Table 5.2; these methods are differentiated mostly by the preselected evaluation criterion (Guyon and Elisseeff, 2006).
The wrapper methods use a predictive model to score feature subsets, whereas the filter methods
only use a proxy measure (e.g., correlation and mutual information) instead of the error rate to
rank subsets of features (Guyon and Elisseeff, 2003). More specifically, the wrapper methods take advantage of the prediction performance of the given learning machine to evaluate the importance of each feature subset (Kohavi and John, 1997); the learning machine itself is simply treated as a black box (Phuong et al., 2005). Compared to the wrapper
methods, filter-type methods are usually less computationally intensive, as the rationale of filter
methods is mainly based on the filter metrics (e.g., mutual information) without incorporating learning
to detect the similarity between a candidate feature subset and the desired output (Hall, 1999; Peng
et al., 2005; Nguyen et al., 2009). Nevertheless, filter methods are vulnerable to redundant features,
as the interrelationships between candidate features are not taken into account. Many experimental
results show that although the wrapper methods have the disadvantage of being computationally
inefficient, they often yield better performances (Zhuo et al., 2008). Embedded methods aim at
reducing the computational complexity by incorporating feature selection as part of the training
process, which is usually specific to the given learning machine (Guyon and Elisseeff, 2003; Duval
et al., 2009; Zare et al., 2013). In general, extracting features from a given input data set is associated
with combining various attributes into a reduced set of features, which is a combination of art and
science, as the whole process involves the integration of advanced computational algorithms as well
as the knowledge of the professional domain expert.
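To make the filter idea concrete, the following minimal base-MATLAB sketch ranks features by a proxy measure rather than by a trained model. It assumes X is an N-by-B feature matrix and y an N-by-1 target vector; the absolute Pearson correlation used here is a stand-in proxy for illustration, not any specific method from Table 5.2.

B = size(X, 2);
score = zeros(1, B);
for j = 1:B
    r = corrcoef(X(:, j), y);          % 2-by-2 correlation matrix
    score(j) = abs(r(1, 2));           % proxy relevance of feature j
end
[~, ranked] = sort(score, 'descend');  % most relevant features first
selected = ranked(1:min(10, B));       % keep an assumed top-10 subset

Because the proxy is computed feature by feature, this sketch also illustrates the weakness noted above: redundant features that are individually correlated with the target would all receive high scores.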
Overall, there are three primary aspects associated with feature extraction:
• Feature detectors: A good feature detector is of critical importance to the final extracted
features, as the detected inherent objects in the original input data are fundamental
elements for the construction of the initial set of raw features.
• Feature construction: The process of constructing features is the key to feature extraction,
and how well the constructed features can represent the original target determines the final
performance of the whole feature extraction process.
• Dimensionality reduction: Selecting a subset of features from the initial set of raw features
by removing those redundant or irrelevant features may significantly improve the learning
and generalization efficiency, which in turn advances the development and application of
feature extraction techniques, particularly in domains dealing with large feature spaces,
such as remote sensing applications.
TABLE 5.2
Different Feature Selection Methods and Their Characteristics

Minimum-redundancy-maximum-relevance (mRMR) feature selection (Filter): Selects good features according to the maximal statistical dependency criterion based on mutual information. Peng et al. (2005)
Bayesian network (Filter): Can be viewed as a search and optimization procedure in which features are evaluated based on their likelihood. Castro and Von Zuben (2009); Hruschka et al. (2004)
Correlation Feature Selection (CFS) (Filter): Features are evaluated on the basis of their correlation with the class. Haindl et al. (2006); Hall (1999); Yu and Liu (2003)
Cascade Correlation Feature Selection (C2FS) (Wrapper): An internal wrapper method that selects features at the same time hidden units are added to the growing C2 network architecture. Backstrom and Caruana (2006)
Genetic algorithm (Wrapper): Uses an evolutionary search to optimize the feature subset. Zhuo et al. (2008)
Sequential search (Wrapper): Candidate features are sequentially added to the subset until further additions do not increase the classification performance. Glass and Cooper (1965); Nakariyakul (2014)
Particle Swarm Optimization (PSO) (Wrapper): Features are selected according to the likelihood calculated by PSO. Xue et al. (2013)
Support Vector Machine-Recursive Feature Elimination (SVM-RFE) (Embedded): Looks for the features that lead to the maximum margin of separation between the classes; features are ranked based on certain ranking criteria. Guyon et al. (2002)
Kernel-Penalized SVM (Embedded): Uses the scaling-factors principle to penalize the use of features in the dual formulation of the SVM via an additional term that penalizes the zero norm of the scaling factors. Maldonado and Weber (2011)
Random Forests (Embedded): Combines binary decision trees built on several bootstrap samples; each decision tree is grown to maximal depth without pruning, and different algorithms are used to improve generalization. Genuer et al. (2010)
Laplacian Score ranking + a modified Calinski–Harabasz index (Hybrid): Sorts the features according to their relevance and evaluates them as a subset rather than individually, based on a modified Calinski–Harabasz index. Solorio-Fernández et al. (2016)
Information gain + wrapper subset evaluation + genetic algorithm (Hybrid): Uses a combination of sample domain filtering and resampling to refine the sample domain, plus two feature subset evaluation methods to select reliable features. Naseriparsa et al. (2013)
Although many high-end computational algorithms can substantially advance feature extraction,
the knowledge of the domain expert remains critical, because it is often difficult to quantitatively
assess the accuracy or performance of each processing step. For instance, the selection of feature
detectors and the number of features to be selected are often determined according to human
intuition and interpretation.
FIGURE 5.3 Clouds extracted from one observed Landsat-8 OLI scene by using the thresholding method.
(a) Original image; (b) extracted clouds.
The process of extracting features based on a threshold is simple and straightforward, but the
key is to determine an optimal threshold value. For example, by setting a proper threshold value for
the cloud-contaminated surface reflectance imagery observed by the Landsat 8 Operational Land
Imager (OLI), clouds can be easily detected and extracted from the original imagery (Figure 5.3).
The whole process can be modeled as
DN_i' = \begin{cases} 0, & DN_i < \theta \\ 1, & DN_i \geq \theta \end{cases} \qquad (5.1)
where DN_i and DN_i' denote the digital number at pixel i in the original image and in the
segmented binary image (e.g., Figure 5.3b), respectively; θ is the threshold value to be determined
by an expert with a priori knowledge or by other advanced methods.
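For illustration, a minimal MATLAB sketch of Equation 5.1 follows; the file name and the threshold value are hypothetical and would need to be tuned for each scene.

img = imread('landsat8_oli_band.tif');   % hypothetical single-band file name
DN  = double(img);                       % pixel values DN_i
theta = 0.35;                            % assumed threshold for bright clouds
mask = DN >= theta;                      % Eq. 5.1: 1 where DN_i >= theta
% mask is the segmented binary image DN' (cf. Figure 5.3b)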
Although the thresholding method is capable of extracting certain features effectively from a
given input image, determining an optimal threshold is not easy without the aid of a series
of experiments or a priori knowledge. Furthermore, a single threshold value may not suffice
to handle all features in one image with various properties (Lv et al., 2017), for example, land
use classification in complex urban regions. To cope with such a complex problem, two or more
thresholds can be used to separate each type of feature sequentially.
PCA is a classic statistical technique that has been commonly used to decorrelate a set of
possibly correlated variables by projecting the original space onto orthogonal spaces
(Pearson, 1901). After the transformation, the resulting vectors form an uncorrelated orthogonal
basis set whose members are termed principal components (PCs). Theoretically, the total number of
resulting PCs should not be greater than the dimension of the input data set, with the first PC
accounting for the largest variance of the input data and each subsequent component in turn
accounting for the highest remaining variance while being orthogonal to the preceding components.
Because of this property, PCA has been extensively used in processing remotely sensed imagery,
especially for the purpose of dimensionality reduction.
Because the resulting PCs are orthogonal to one another, PCA can be applied for feature
extraction by capturing the largest variance in one specific space; each PC can be considered a
unique feature that integrates most of the relevant spectral information. Compared to other methods,
PCA has great advantages due to its low complexity, the absence of predetermined parameters and,
last but not least, the fact that PCs are orthogonal to each other.
An illustrative example of applying PCA to remotely sensed imagery is demonstrated in Figure 5.4.
The first PC explained a total variance of 71.3%, mainly representing the texture characteristics of
the observed scene, whereas the second PC explained a total variance of 23.4%, mainly emphasizing
rivers and built-up areas.
FIGURE 5.4 Principal component analysis performed on multispectral images. (a) Landsat-5 TM true color
image; (b) the first principal component; (c) the second principal component.
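A minimal sketch of this use of PCA in base MATLAB follows; it assumes the multispectral scene has already been reshaped into an N-by-B matrix X of N pixels and B bands (the variable names are ours, for illustration only).

Xc = X - mean(X, 1);                   % center each band
C  = (Xc' * Xc) / (size(Xc, 1) - 1);   % B-by-B covariance matrix
[V, D] = eig(C, 'vector');             % eigenvectors and eigenvalues
[D, order] = sort(D, 'descend');       % order PCs by explained variance
V = V(:, order);
explained = 100 * D / sum(D);          % percent variance per component
PCs = Xc * V;                          % PCs(:, 1) is the first component

For a scene like that of Figure 5.4, explained(1) and explained(2) would correspond to the reported 71.3% and 23.4%.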
Band math can be linked to the development of some bio-optical models that are functions of
certain bands. Remotely sensed imagery recorded at multiple wavelengths provides a synergistic
opportunity to better monitor the changing Earth environment because of its increased spectral
information content. Based on differences in absorption spectra, objects viewed in one scene can be
separated by generating composite images in which specific features are emphasized through
mathematical calculations between images observed at different wavelengths. The process of
generating a composite image from spectral information at different wavelengths with mathematical
tools is referred to as band math. Compared to the original imagery with multiple observations, the
composite image is more intuitive. One representative example is the Normalized Difference
Vegetation Index (NDVI), which is commonly used to assess whether the target being observed
contains live green vegetation. NDVI is calculated from the difference in absorption by green
vegetation at the red and near-infrared wavelengths, which can be modeled as follows (Rouse
et al., 1974):
NDVI = \frac{NIR - Red}{NIR + Red} \qquad (5.2)
where NIR and Red stand for the surface spectral reflectance measurements acquired in the
near-infrared and red wavelength regions, respectively.
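As a short MATLAB sketch of Equation 5.2, assuming the red and near-infrared reflectance bands of a Landsat-5 TM scene have been exported to the hypothetical files named below:

nir  = double(imread('LT05_band4.tif'));  % near-infrared band (assumed file)
red  = double(imread('LT05_band3.tif'));  % red band (assumed file)
ndvi = (nir - red) ./ (nir + red);        % element-wise NDVI, Eq. 5.2
vegetation = ndvi > 0.4;                  % dense green vegetation mask
% note: pixels where nir + red == 0 yield NaN and should be masked out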
Statistically, NDVI values vary between −1.0 and +1.0, with green vegetation typically
possessing positive values. Negative values of NDVI (<−0.1) often correspond to non-vegetation
objects, for example, water bodies, and values close to zero (−0.1 to +0.1) commonly correspond to
barren areas of rock, sand, or snow. By contrast, low positive values of NDVI (0.2 to 0.4)
correspond to sparsely vegetated areas such as shrub and grassland, while high values (>0.4)
indicate areas covered with dense green vegetation (e.g., temperate and tropical rainforests).
Based on these characteristics, NDVI has been considered a good graphic indicator for measuring
green vegetation coverage on Earth's surface. Hence, vegetation features can be easily extracted
from a given remotely sensed image after calculating the NDVI values. The spatial distribution of
green vegetation shown in Figure 5.4a can be easily detected and separated from other targets
within the scene based on the calculated NDVI values (Figure 5.5). Similarly, by calculating
different water feature-related indexes from a series of multitemporal Landsat 5-TM, 7-ETM+, and
8-OLI images, the spatiotemporal changes of the surface area of Lake Urmia in the Middle East
during 2000–2013 were investigated (Rokni et al., 2014).
FIGURE 5.5 Calculated NDVI based on Landsat-5 TM surface reflectance at red and near-infrared
wavelengths (color scale ranging from −0.43 to 0.35).
FIGURE 5.6 (a) One Worldview observed scene on August 13, 2010; (b) a segmented image from (a) based
on shape, texture, and spectral characteristics.
Many spatial feature extraction methods have been developed, such as spatial structural features for
urban areas (Huang et al., 2007), the shape-size index (Han et al., 2012), and morphological profiles
(Benediktsson et al., 2003; Huang et al., 2014; Chunsen et al., 2016). Although the fundamental
theory of each method differs, the basic process is almost the same, since these methods work
mainly in the spatial domain to detect and extract features of the targets being observed.
In real-world applications, spatial and spectral attributes are often used in conjunction with
one another to provide synergistic capability for complex feature extraction tasks, such as
high-resolution imagery classification (Shahdoosti and Mirzapour, 2017). For instance, urban
morphology is commonly characterized by a complex and variable coexistence of diverse,
spatially and spectrally heterogeneous objects (Boltz, 2004); hence the classification of land use
types in urban regions is by no means an easy task. To cope with such a complex problem, both
spatial and spectral attributes should be considered simultaneously to extract multiple land use
types effectively. Because features are detected and extracted for individual objects, these methods
are also referred to as object-based feature extraction (Taubenböck et al., 2010; Shruthi et al., 2011;
Lv et al., 2017). For example, by considering both the spatial and spectral attributes of objects
observed in one Worldview image, different types of ground targets can be detected and extracted
with high accuracy (Figure 5.6).
FIGURE 5.7 Examples of supervised and unsupervised methods to automate the feature extraction process:
classification (logistic regression, classification trees, random forests), regression (linear regression,
decision trees, fuzzy classification), clustering (K-means clustering, K-nearest neighbor, hierarchical
clustering), and dimensionality reduction (principal component analysis, linear discriminant analysis,
tensor decomposition).
In supervised learning, a function inferred from labeled training samples is expected to generalize to
new inputs. In general, a robust learning algorithm should work well on unseen data, and this
motivates the use of cross-validation for performance evaluation.
In order to perform feature extraction in a supervised learning manner, one must go through the
following primary steps:
1. Determine what data should be used: This is essential for any exploratory data analysis; the
user must first have a clear idea about which data are to be used as training inputs. In the case of
land cover classification, for example, a possible training set could be constructed from remotely
sensed panchromatic, multispectral, or hyperspectral imagery.
2. Construct an optimal training set: A supervised learning algorithm generalizes a function from
the given training set (i.e., pairwise training samples and targets) and then applies this inferred
function to map the unseen data. Thus, the training set should be comprehensive and representative,
because the final accuracy of the generalized function depends largely on how well the input-output
correspondence is modeled. To facilitate the advanced learning processes, the number of features is
usually determined based on expert knowledge, so that redundant information can be significantly
reduced in the training sets. Otherwise, the learning burden can become very heavy due to a large
amount of irrelevant information. Moreover, the learning algorithm might fail to generalize a proper
function for the given input because of high dimensionality. On the other hand, the number of
features should not be too small, as the training inputs should contain adequate information to
represent all possible cases of the target so as to accurately predict the unseen data.
3. Select a suitable learning algorithm: To date, a wide range of learning algorithms is available,
and the user can select among them for a specific application by considering the strengths and
weaknesses of each. In addition, the structure of the learning algorithm must be determined
simultaneously; for example, the number of hidden layers and hidden neurons defines the structure
of an Artificial Neural Network (ANN) model. Since no single method suits all types of problems,
the user should select the algorithm most suitable for the specified real-world application.
4. Choose a stopping criterion for the learning process: Once a training set and learning algorithm
are determined, the learning process can be started by generating a set of models with the given
learning algorithm to build relationships between dependent and independent variables in the
training set. In supervised learning, however, certain control parameters (or stopping criteria) are
required to stop the learning process, particularly for machine learning algorithms such as ANN
and Genetic Programming (GP). These parameters can be tuned through an optimization algorithm
or defined by the user by setting certain criteria via cross-validation.
5. Examine the performance of the learned function: After parameter optimization, the
performance of the inferred function should be carefully evaluated. The accuracy is commonly
assessed by mapping a separate subset that differs from the training set. Statistical comparisons are
performed between the predicted output and the desired values to check the overall accuracy of the
inferred function. Once the accuracy meets the anticipated level, the whole learning process is
finished and the inferred function can then be applied to map unseen data (a minimal sketch of this
workflow follows this list).
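The following base-MATLAB sketch illustrates the workflow above under stated assumptions: X is an N-by-B matrix of pixel features, y is an N-by-1 label vector, and a simple nearest-centroid rule stands in for the learning algorithm of step 3. It is an illustration only, not a recommended classifier.

n = size(X, 1);
idx = randperm(n);                         % step 2: assumed random 70/30 split
nTr = round(0.7 * n);
Xtr = X(idx(1:nTr), :);     ytr = y(idx(1:nTr));
Xte = X(idx(nTr+1:end), :); yte = y(idx(nTr+1:end));
classes = unique(ytr);
C = zeros(numel(classes), size(X, 2));
for k = 1:numel(classes)                   % steps 3-4: "training" reduces to
    C(k, :) = mean(Xtr(ytr == classes(k), :), 1);   % one centroid per class
end
% step 5: label each held-out pixel by its nearest centroid (squared
% Euclidean distance, expanded so no toolbox functions are needed)
d = sum(Xte.^2, 2) + sum(C.^2, 2)' - 2 * Xte * C';
[~, nearest] = min(d, [], 2);
yhat = classes(nearest);
accuracy = mean(yhat == yte);              % overall accuracy on unseen data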
Supervised learning has been widely used in environmental remote sensing, and the most
common applications of feature extraction in remote sensing are remotely sensed image classification
and environmental modeling via machine learning tools. For example, Zhao and Du (2016)
developed a spectral–spatial feature-based classification framework to advance hyperspectral image
classification by using the trained multiple-feature-based classifier. In this framework, a balanced
local discriminant-embedding algorithm was proposed for spectral feature extraction from high-
dimensional hyperspectral data sets while a convolutional neural network was utilized to automatically
find spatial-related features at high levels. With the aid of GP, Chang et al. (2014) successfully
predicted the total organic carbon concentration in William H. Harsha Lake during 2008–2012 based
on the in situ measurements and the fused satellite-based remote sensing reflectance imagery. The
feature extraction performed in these two examples is realized in terms of supervised learning.
Despite the effectiveness of supervised learning algorithms, several major issues with respect to
supervised learning should be noted as well. The first is the tradeoff between bias and variance, as
the prediction error of a learned function is related to the sum of the bias and the variance of the
learning algorithm (Geman et al., 1992; James, 2003). Generally, a learning algorithm with low
bias should be flexible enough to fit data well. However, a large variance will be observed in the
predicted output if the algorithm is too flexible (e.g., fits each training set differently). Thus, a good
learning algorithm should be able to adjust this tradeoff automatically.
The second issue concerns the amount of training data relative to the complexity of the true function.
A small amount of training data will suffice if the inherent function is simple; otherwise, a large
volume of data is required if the true function is highly complex. The third issue is related to the
dimensionality of the input space. If the dimensionality of the input features is high, the learning
process can be very difficult, because high-dimensional inputs can confuse the learning algorithm,
causing it to generalize poorly or become trapped in local optima. High input dimensionality
therefore typically requires the learning algorithm to be tuned toward low variance and high bias,
and this motivates the development of dimensionality reduction algorithms.
The fourth issue is the noise level embedded in the desired output. When the provided values of the
desired output are often incorrect, the learning algorithm should not attempt to fit a function to the
training set exactly, so as to avoid overfitting the noise. In such a situation, early stopping as
well as removing noisy samples from the training set prior to the learning process can be of help.
In addition to the above-mentioned four major issues, other aspects such as the redundancy and
heterogeneity of data should also be considered in performing supervised learning tasks.
Semi-supervised learning falls between supervised and unsupervised learning. More accurately, it
can be considered a class of supervised learning tasks to a certain extent, because it makes use of
labeled data in the learning process. In other words, the desired output values are provided for a
subset of the training data, whereas the remainder is unlabeled. Typically, the amount of labeled
data used is much smaller than that of the unlabeled data. Despite this, the use of a small amount of
labeled data in conjunction with unlabeled data may result in considerable improvement in learning
accuracy (Zhu, 2008). Therefore, semi-supervised learning can be of great practical value in
real-world applications.
By taking advantage of the combined information from labeled and unlabeled data,
semi-supervised learning attempts to surpass the performance that could be obtained from either
supervised or unsupervised learning on each individual data set. In order to make use of unlabeled
data, the structure of the input data must satisfy at least one of the following assumptions
(Chapelle et al., 2006):
• Continuity assumption: Data close to each other are more likely to be labeled in the same
class. This is generally assumed in supervised learning and should also be obeyed in the
case of semi-supervised learning. This assumption yields a preference for geometrically
simple decision boundaries even in low-density regions to guarantee that fewer points in
different classes are close to each other.
• Cluster assumption: Discrete clusters can be formed and data in the same cluster tend to
be labeled in the same class. This is a special case of the continuity assumption that gives
rise to clustering-based feature learning.
• Manifold assumption: The data tend to lie on a manifold of much lower dimension than
that of the input data. Learning can proceed using distances and densities defined on the
manifold with both the labeled and unlabeled data to avoid dimensionality issues. The
manifold assumption is practical, especially for high-dimensional data with a few degrees
of freedom that are hard to model directly.
Due to the difficulty in acquiring a large volume of labeled data and the availability of vast
amounts of unlabeled data, semi-supervised learning has recently gained more popularity. In the
case of semi-supervised learning, unlabeled data are commonly used to either modify or reprioritize
hypotheses obtained from labeled data alone to aid in feature extraction (Zhu, 2008). Many different
semi-supervised methods and algorithms have been developed over the past decades, such as
generative models, which are considered the oldest semi-supervised learning approach, grounded
in probabilistic theory (Zhu, 2008). Many other methods can also be applied, such as the
transductive support vector machine (Vapnik, 1998; Yu et al., 2012), information regularization
(Corduneanu and Jaakkola, 2002; Szummer and Jaakkola, 2002), and graph-based methods
(Camps-Valls et al., 2007). More details of these methods will be introduced in the following chapters.
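As an illustration only, the following base-MATLAB sketch implements a simple self-training loop, one generic way of exploiting unlabeled data; it is not one of the cited algorithms. Xl and yl are assumed to hold a small labeled set (features and a column vector of labels), and Xu the unlabeled features.

for iter = 1:5                                  % assumed number of rounds
    classes = unique(yl);
    C = zeros(numel(classes), size(Xl, 2));
    for k = 1:numel(classes)
        C(k, :) = mean(Xl(yl == classes(k), :), 1);  % class centroids
    end
    % squared distance from each unlabeled sample to each centroid
    d = sum(Xu.^2, 2) + sum(C.^2, 2)' - 2 * Xu * C';
    [dmin, lab] = min(d, [], 2);
    keep = dmin < median(dmin);                 % pseudo-label the closer half
    Xl = [Xl; Xu(keep, :)];                     % grow the labeled set
    yl = [yl; classes(lab(keep))];
    Xu(keep, :) = [];                           % shrink the unlabeled pool
    if isempty(Xu), break; end
end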
Neural computing methods are data-driven methods that generally have high fault tolerance
(Giacinto and Roli, 2001; Canty, 2009; Kavzoglu, 2009). Many successful applications of remote
sensing image classification use classifiers based on various ANNs, such as Backward Propagation
(BP), Radial Basis Function (RBF), and Self-Organized Mapping (SOM) networks (Heermann and
Khazenie, 1992; Hoi-Ming and Ersoy, 2005; Suresh et al., 2008), as well as global optimization
techniques such as the Support Vector Machine (SVM) (Foody and Mathur, 2004a,b). In ANNs
such as BP, RBF, and SOM, an input space is mapped onto a feature space through the hidden layer,
resulting in a nonlinear classifier that outperforms most traditional statistical methods. However,
these ANNs are all "black box" models whose classification mechanisms are difficult to interpret.
Problems such as overfitting, local minima, and slow convergence are quite common for neural
computing methods. SVM differs radically from ANNs because SVM training always converges to
a global minimum, and its simple geometric interpretation provides opportunities for advanced
optimization. While ANNs are limited by multiple local minima, the solution to an SVM is global
and unique; moreover, ANNs use empirical risk minimization, whereas SVMs adopt structural
risk minimization.
Classifiers based on fuzzy logic are much easier to interpret, because the classification is usually
implemented according to rules summarized from the training data set. Most fuzzy logic methods
are hybrid methods; for example, the Fuzzy C-Means (FCM) algorithm (Fan et al., 2009) is a hybrid
of fuzzy logic and a statistical algorithm (c-means). Classifiers based on Fuzzy Neural Networks
(FNN) (Chen et al., 2009) and Fuzzy ARTMAP (FA) (Han et al., 2004), which combine fuzzy logic
and neural networks, have also been reported. However, involving fuzzy logic in these hybrid
algorithms (i.e., FCM, FNN, and FA) may enlarge the uncertainty in the final classification.
Evolutionary algorithms are another category of machine learning techniques that have
been widely used in remote sensing image classification. Genetic Algorithm (GA), Evolutionary
Programming (EP), and GP are several classical evolutionary algorithms with many successful
applications (Agnelli et al., 2002; Ross et al., 2005; Makkeasorn et al., 2006; Awad et al., 2007; Chang
et al., 2009; Makkeasorn and Chang, 2009). Classifiers based on Artificial Immune System (AIS)
(Zhong et al., 2007) and swarm intelligence (Daamouche and Melgani, 2009) can also be included
in this category. In addition, classifiers such as those based on expert system theory (Stefanov et al.,
2001) and decision tree techniques (Friedl and Brodley, 1997) are also representative and important
classification methods. The current progress in the literature can be summarized based on the above
findings (Table 5.3). Hybrid learning algorithms integrating ML, FCM, FNN, FA, or KNN with
ANN, SOM, RBF, BP, SVM, or GP to form unique learning systems for specific feature extraction
can be anticipated.

TABLE 5.3
Summary of Classification Methods in Image Processing

Statistical Methods: ML, KNN, K-means
Intelligence (Artificial Neural Networks): BP, RBF, SOM
Intelligence (Global Optimization): SVM
Intelligence (Fuzzy Logic): FCM, FA, FNN
Intelligence (Evolutionary Algorithms): GA, EP, GP, AIS
Other Methods: Expert system, Decision tree

Source: Chang, N. B. (Ed.), 2012. Environmental Remote Sensing and Systems Analysis. CRC Press, Boca Raton, FL.
Note: Maximum Likelihood (ML), K-Nearest Neighbor (KNN), Backward Propagation (BP), Radial Basis Function (RBF), Self-Organized Mapping (SOM), Support Vector Machine (SVM), Fuzzy C-Means (FCM), Fuzzy ARTMAP (FA), Fuzzy Neural Network (FNN), Genetic Algorithm (GA), Evolutionary Programming (EP), Genetic Programming (GP), Artificial Immune System (AIS).
One of the most commonly used performance measures is the Overall Accuracy (OA), which can be
computed from the confusion matrix shown in Figure 5.8 as:

OA = \frac{TP + TN}{P + N} \qquad (5.3)
Another measure is the Kappa coefficient, which measures the agreement between two raters
each of which classifies N items into C mutually exclusive categories (Galton, 1892; Smeeton, 1985).
FIGURE 5.8 A 2 × 2 confusion matrix with P positive instances and N negative instances. The true
condition (positive or negative) is tabulated against the predicted condition, yielding true positives (TP),
false negatives (FN), false positives (FP), and true negatives (TN). P is the total number of true positive
instances (equivalent to TP + FN) and N is the total number of true negative instances (equivalent to
FP + TN).
In other words, it is a measure of how the classification results compare to values assigned by
chance. In contrast to the simple percent agreement calculation, the Kappa coefficient is generally
thought to be more robust, because it accounts for the possibility of agreement occurring by chance
(Fauvel et al., 2008). Conceptually, it can be defined as:
KC = \frac{p_o - p_e}{1 - p_e} \qquad (5.4)
where p_o denotes the observed proportionate agreement (equivalent to the OA) and p_e denotes the
overall probability of random agreement, which can be calculated as:
p_e = \frac{TP + FN}{P + N} \cdot \frac{TP + FP}{P + N} + \frac{FP + TN}{P + N} \cdot \frac{FN + TN}{P + N} \qquad (5.5)
Kappa values range from 0 to 1. A value of 0 means that the agreement between the predicted
condition and the actual condition is no better than would be expected by chance, while a value of 1
indicates that the predicted condition and the actual condition are identical (i.e., perfect agreement).
Hence, the larger the value of the Kappa coefficient, the more accurate the result. However, some
researchers have expressed concern that the Kappa coefficient is an overly conservative measure of
agreement, because it tends to take the observed category frequencies as givens, making it
unreliable for measuring agreement in situations with limited observations (Wu and Yang, 2005;
Strijbos et al., 2006).
Apart from the overall accuracy and the Kappa coefficient to measure the general performance,
the accuracy of class identification should also be assessed. Within such a context, statistics like
errors of commission and/or errors of omission can be computed. Errors of commission are a
measure of false positives, representing the fraction of values that were predicted to be in a class
but do not belong to that class. In contrast, errors of omission are a measure of false negatives,
representing the fraction of values that belong to a class but were predicted to be in a different class.
Hence, for the condition-positive class shown in Figure 5.8, the Commission Error (CE) and the
Omission Error (OE) can be calculated as:

CE = \frac{FP}{TP + FP} \qquad (5.6)

OE = \frac{FN}{P} \qquad (5.7)
In addition, Producer Accuracy (PA) and User Accuracy (UA) are two other performance
measures commonly computed for the performance assessment of classification.
The PA shows the probability that a value in a given class was classified correctly; it is
calculated as the number of pixels correctly classified in a particular category divided by
the total number of pixels actually belonging to that category.
PA = \frac{TP}{TP + FN} \qquad (5.8)
The UA shows the probability that a value predicted to be in a certain class really belongs to that
class, which can be calculated as the fraction of correctly predicted values to the total number of
values predicted to be in a class.
UA = \frac{TP}{TP + FP} \qquad (5.9)
It is clear that the PA and OE complement each other, as they sum to one; the same holds for the
UA and CE.
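To make these measures concrete, the following base-MATLAB sketch computes Equations 5.3 through 5.9 from the four confusion-matrix counts of Figure 5.8; the counts themselves are hypothetical values chosen only for illustration.

TP = 90; FN = 10; FP = 20; TN = 80;   % assumed example counts (Figure 5.8)
P = TP + FN;                          % condition-positive total
N = FP + TN;                          % condition-negative total
OA = (TP + TN) / (P + N);             % overall accuracy, Eq. 5.3
po = OA;                              % observed proportionate agreement
pe = ((TP+FN)/(P+N))*((TP+FP)/(P+N)) + ((FP+TN)/(P+N))*((FN+TN)/(P+N)); % Eq. 5.5
KC = (po - pe) / (1 - pe);            % Kappa coefficient, Eq. 5.4
CE = FP / (TP + FP);                  % commission error, Eq. 5.6
OE = FN / P;                          % omission error, Eq. 5.7
PA = TP / P;                          % producer accuracy, Eq. 5.8
UA = TP / (TP + FP);                  % user accuracy, Eq. 5.9
% consistency checks: PA + OE and UA + CE should both equal 1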
Furthermore, statistics like Partition Coefficient (Bezdek, 1973), Fukuyama-Sugeno index
(Fukuyama and Sugeno, 1989), Fuzzy Hyper Volume (Gath and Geva, 1989), β index (Pal et al.,
2000), Xie-Beni index (Xie and Beni, 1991), and many others (Congalton, 1991; Wu and Yang, 2005)
can also be applied. To assess the statistical significance of differences in classification results,
methods such as McNemar's test can be further applied, which is based on a standardized normal
test statistic (Foody, 2004). The parameter Z12 in McNemar's test is defined as:

Z_{12} = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}} \qquad (5.10)
where f12 denotes the number of samples classified correctly by classifier 1 but incorrectly by
classifier 2, and f21 denotes the number classified correctly by classifier 2 but incorrectly by
classifier 1. A positive Z12 indicates that classifier 1 outperforms classifier 2, while a negative value
indicates the opposite. The difference in accuracy between two classifiers is considered to be
statistically significant when |Z12| > 1.96 (Fauvel et al., 2008; Imani and Ghassemian, 2016).
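As a minimal sketch of Equation 5.10 in base MATLAB, assuming yTrue holds the reference labels and yhat1 and yhat2 hold the maps produced by the two classifiers being compared (all three variable names are hypothetical):

c1 = (yhat1 == yTrue);                 % correctness of classifier 1
c2 = (yhat2 == yTrue);                 % correctness of classifier 2
f12 = sum(c1 & ~c2);                   % correct by 1, wrong by 2
f21 = sum(~c1 & c2);                   % correct by 2, wrong by 1
Z12 = (f12 - f21) / sqrt(f12 + f21);   % Eq. 5.10
significant = abs(Z12) > 1.96;         % 5% significance level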
5.9 SUMMARY
In this chapter, the basic concepts and fundamentals of feature extraction were introduced, including
the definitions of feature, feature selection, and feature extraction. Based on the domain of interest,
feature extraction methods can be grouped into spectral- and spatial-based approaches, while, based
on their working modes, feature extraction techniques can be divided into supervised,
semi-supervised, and unsupervised methods. Illustrative examples were also provided for
demonstration purposes. In addition, a set of statistical indicators was introduced as performance
measures for evaluating the resulting outcomes.
It is clear that developing a robust feature extraction workflow is by no means a simple task, since
it requires us to gain a thorough understanding of the input data, to devote the requisite time for
pre-processing, and to effectively apply the elements of image interpretation for decision analysis.
In the next chapter, a wealth of traditional methods and approaches that were proposed for feature
extraction will be introduced and discussed.
REFERENCES
Agnelli, D., Bollini, A., and Lombardi, L., 2002. Image classification: An evolutionary approach. Pattern
Recognition Letters, 23, 303–309.
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P., 2005. Automatic subspace clustering of high
dimensional data. Data Mining and Knowledge Discovery, 11, 5–33.
Ahmad, A., 2012. Analysis of maximum likelihood classification on multispectral data. Applied Mathematical
Sciences, 6, 6425–6436.
Akhtar, U. and Hassan, M., 2015. Big data mining based on computational intelligence and fuzzy clustering.
In: Zaman, N., Seliaman, M. E., Hassan, M. F. and Marquez, F. P. G. (Eds.), Handbook of Research on
Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 130–148.
Ashoka, H. N., Manjaiah, D. H., and Rabindranath, B., 2012. Feature extraction technique for neural network
based pattern recognition. International Journal of Computer Science and Engineering, 4, 331–340.
Awad, M., Chehdi, K., and Nasri, A., 2007. Multicomponent image segmentation using a genetic algorithm
and artificial neural network. IEEE Geoscience and Remote Sensing Letters, 4, 571–575.
Backstrom, L. and Caruana, R., 2006. C2FS: An algorithm for feature selection in cascade neural networks.
In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, Vancouver, BC,
Canada, 4748–4753.
Baraldi, A. and Parmiggiani, F., 1995. An investigation of the textural characteristics associated with gray
level cooccurrence matrix statistical parameters. IEEE Transactions on Geoscience and Remote Sensing,
33, 293–304.
Belkin, M., Niyogi, P., and Sindhwani, V., 2006. Manifold regularization: A geometric framework for learning
from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.
Benediktsson, J. A., Pesaresi, M., and Arnason, K., 2003. Classification and feature extraction for remote
sensing images from urban areas based on morphological transformations. IEEE Transactions on
Geoscience and Remote Sensing, 41, 1940–1949.
Bermingham, M. L., Pong-Wong, R., Spiliopoulou, A., Hayward, C., Rudan, I., Campbell, H., Wright, A. F.,
et al., 2015. Application of high-dimensional feature selection: Evaluation for genomic prediction in
man. Scientific Reports, 5, 10312.
Bezdek, J. C., 1973. Cluster validity with fuzzy sets. Cybernetics and Systems, 3, 58–73.
Bhagabati, B. and Sarma, K. K., 2016. Application of face recognition techniques in video for biometric
security. In: Gupta, B., Dharma, P., Agrawal, D. P., and Yamaguchi, S. (Eds.), Handbook of Research on
Modern Cryptographic Solutions for Computer and Cyber Security, IGI Global, 460–478.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer-Verlag, New York.
Blanzieri, E. and Melgani, F., 2008. Nearest neighbor classification of remote sensing images with the maximal
margin principle. IEEE Transactions on Geoscience and Remote Sensing, 46, 1804–1811.
Boltz, S., 2004. Statistical region merging code. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 26, 1452–1458.
Bouman, C. A. and Shapiro, M., 1994. A multiscale random field model for Bayesian image segmentation.
IEEE Transactions on Image Processing, 3, 162–177.
Camps-Valls, G., Bandos Marsheva, T. V., and Zhou, D., 2007. Semi-supervised graph-based hyperspectral
image classification. IEEE Transactions on Geoscience and Remote Sensing, 45, 3044–3054.
Canty, M. J. 2009. Boosting a fast neural network for supervised land cover classification. Computers &
Geosciences, 35, 1280–1295.
Carson, C., Belongie, S., Greenspan, H., and Malik, J., 2002. Blobworld: Image segmentation using expectation-
maximization and its application to image querying. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 24, 1026–1038.
Castro, P. A. D. and Von Zuben, F. J., 2009. Learning Bayesian networks to perform feature selection. In: 2009
International Joint Conference on Neural Networks, Atlanta, GA, USA, 467–473.
Cattell, R. B., 1943. The description of personality: Basic traits resolved into clusters. Journal of Abnormal &
Social Psychology, 38, 476–506.
Chang, N. B., Daranpob, A., Yang, J. and Jin, K. R., 2009. Comparative data mining analysis for information
retrieval of MODIS images: Monitoring lake turbidity changes at Lake Okeechobee, Florida. Journal of
Applied Remote Sensing, 3, 033549.
Chang, N.-B., Vannah, B. W., Yang, Y. J., and Elovitz, M., 2014. Integrated data fusion and mining techniques
for monitoring total organic carbon concentrations in a lake. International Journal of Remote Sensing,
35, 1064–1093.
Chapelle, O., Schölkopf, B., and Zien, A., 2006. Semi-supervised Learning. MIT Press, Cambridge, MA.
Chen, H. W., Chang, N. B., Yu, R. F., and Huang, Y. W., 2009. Urban land use and land cover classification
using the neural-fuzzy inference approach with Formosat-2 data. Journal of Applied Remote Sensing,
3, 033558.
Chen, X., Li, H., and Gu, Y., 2014. Multiview Feature Selection for Very High Resolution Remote Sensing
Images. In: 2014 Fourth International Conference on Instrumentation and Measurement, Computer,
Communication and Control, Harbin, China, 539–543.
Chunsen, Z., Yiwei, Z., and Chenyi, F., 2016. Spectral–spatial classification of hyperspectral images using
probabilistic weighted strategy for multifeature fusion. IEEE Geoscience and Remote Sensing Letters,
13, 1562–1566.
Congalton, R. G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote
Sensing of Environment, 37, 35–46.
Corduneanu, A. and Jaakkola, T., 2002. On information regularization. In: Proceedings of the Nineteenth
Conference on Uncertainty in Artificial Intelligence, Acapulco, Mexico, 151–158.
Daamouche, A. and Melgani, F., 2009. Swarm intelligence approach to wavelet design for hyperspectral image
classification. IEEE Geoscience and Remote Sensing Letters, 6(4), 825–829.
Das, S., 2001. Filters, wrappers and a boosting-based hybrid for feature selection. In: Proceedings of the
Eighteenth International Conference on Machine Learning (ICML’01), San Francisco, CA, USA,
Morgan Kaufmann Publisher, 74–81.
Durfee, A., 2006. Text mining. In: Garson, G. D. and Khosrow-Pour, M. (Eds.), Handbook of Research on Public
Information Technology, IGI Global, 592–603.
Duval, B., Hao, J.-K., and Hernandez Hernandez, J. C., 2009. A memetic algorithm for gene selection and
molecular classification of cancer. In: Proceedings of the 11th Annual Conference on Genetic and
Evolutionary Computation—GECCO ’09, New York, New York, USA, ACM Press.
Elnemr, H. A., Zayed, N. M., and Fakhreldein, M. A., 2016. Feature extraction techniques: Fundamental
concepts and survey. In: Kamila, N. K. (Ed.), Handbook of Research on Emerging Perspectives in
Intelligent Pattern Recognition, Analysis, and Image Processing, IGI Global, 264–294.
Ester, M., Kriegel, H. P., Sander, J., and Xu, X., 1996. A density-based algorithm for discovering clusters in
large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge
Discovery and Data Mining, Portland, Oregon, USA, 226–231.
Everitt, B. S., Landau, S., Leese, M., and Stahl, D., 2011. Hierarchical Clustering, in Cluster Analysis, 5th
Edition. John Wiley & Sons, Ltd, Chichester, UK.
Fan, J., Han, M., and Wang, J., 2009. Single point iterative weighted fuzzy C-means clustering algorithm for
remote sensing image segmentation. Pattern Recognition, 42, 2527–2540.
Fauvel, M., Benediktsson, J. A., Chanussot, J., and Sveinsson, J. R., 2008. Spectral and spatial classification
of hyperspectral data using SVMs and morphological profiles. IEEE Transactions on Geoscience and
Remote Sensing, 46, 3804–3814.
Foody, G. M., 2004. Thematic map comparison: Evaluating the statistical significance of differences in
classification accuracy. Photogrammetric Engineering & Remote Sensing, 70, 627–633.
Foody, G. M. and Mathur, A., 2004a. A relative evaluation of multiclass image classification by support vector
machines. IEEE Transactions on Geoscience and Remote Sensing, 42, 1335–1343.
Foody, G. M. and Mathur, A., 2004b. Toward intelligent training of supervised image classifications: Directing
training data acquisition for SVM classification. Remote Sensing of Environment, 93, 107–117.
Forgy, E. W., 1965. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications.
Biometrics, 21, 768–769.
Friedl, M. A. and Brodley, C. E., 1997. Decision tree classification of land cover from remotely sensed data.
Remote Sensing of Environment, 61, 399–409.
Fukuyama, Y. and Sugeno, M., 1989. A new method of choosing the number of clusters for the fuzzy C-means
method. In: Proceedings of the 5th Fuzzy Systems Symposium, 247–250.
Galton, F., 1892. Finger Prints. Macmillan, London.
Gath, I. and Geva, A. B., 1989. Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 11, 773–780.
Geman, S., Bienenstock, E., and Doursat, R., 1992. Neural Networks and the Bias/Variance Dilemma. Neural
Computation, 4, 1–58.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C., 2010. Variable selection using random forests. Pattern
Recognition Letters, 31, 2225–2236.
Giacinto, G. and Roli, F., 2001. Design of effective neural network ensembles for image classification purposes.
Image and Vision Computing, 19, 699–707.
Glass, H. and Cooper, L., 1965. Sequential search: A method for solving constrained optimization problems.
Journal of the ACM, 12, 71–82.
Gopalakrishnan, V., 2009. Computer aided knowledge discovery in biomedicine. In: Daskalaki, A. (Ed.),
Handbook of Research on Systems Biology Applications in Medicine, IGI Global, 126–141.
Guyon, I. and Elisseeff, A., 2003. An introduction to variable and feature selection. Journal of Machine
Learning Research, 3, 1157–1182.
Guyon, I. and Elisseeff, A., 2006. An Introduction to Feature Extraction, in: Feature Extraction: Foundations
and Applications. Springer, Berlin, Heidelberg, 1–25.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V., 2002. Gene selection for cancer classification using support
vector machines. Machine Learning, 46, 389–422.
Haindl, M., Somol, P., Ververidis, D., and Kotropoulos, C., 2006. Feature selection based on mutual correlation.
In: Progress in Pattern Recognition, Image Analysis and Applications, Havana, Cuba, 569–577.
Hall, M. A., 1999. Correlation-based Feature Selection for Machine Learning. University of Waikato,
Hamilton, New Zealand.
Han, Y., Kim, H., Choi, J., and Kim, Y., 2012. A shape–size index extraction for classification of high resolution
multispectral satellite images. International Journal of Remote Sensing, 33, 1682–1700.
Han, M., Tang, X., and Cheng, L., 2004. An improved fuzzy ARTMAP network and its application in wetland
classification. In: Proceedings of 2004 IEEE International Geoscience and Remote Sensing Symposium,
Alaska, USA.
Haralick, R. M., Shanmugam, K., and Dinstein, I., 1973. Textural features for image classification. IEEE
Transactions on Systems, Man, and Cybernetics, SMC-3, 610–621.
Heermann, P. D. and Khazenie, N., 1992. Classification of multispectral remote sensing data using a back-
propagation neural network. IEEE Transactions on Geoscience and Remote Sensing, 30, 81–88.
Hoi-Ming, C. and Ersoy, O. K., 2005. A statistical self-organizing learning system for remote sensing
classification. IEEE Transactions on Geoscience and Remote Sensing, 43, 1890–1900.
Hruschka, E. R., Hruschka, E. R., and Ebecken, N. F. F., 2004. Feature selection by Bayesian networks. In:
Tawfik, A. Y. and Goodwin, S. D. (Eds.), Advances in Artificial Intelligence, Cairns, Australia, 370–379.
Huang, Z. C., Chan, P. P. K., Ng, W. W. Y., and Yeung, D. S., 2010. Content-based image retrieval using
color moment and Gabor texture feature. In: 2010 International Conference on Machine Learning and
Cybernetics (ICMLC), Qingdao, China.
Huang, X., Guan, X., Benediktsson, J. A., Zhang, L., Li, J., Plaza, A., and Dalla Mura, M., 2014. Multiple
morphological profiles from multicomponent-base images for hyperspectral image classification. IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7, 4653–4669.
Huang, J., Kumar, S. R., Mitra, M., Zhu, W., and Zabih, R., 1997. Image indexing using color correlograms.
In: Conference on Computer Vision and Pattern Recognition (CVPR’97), Puerto Rico, USA, 762–768.
Huang, X., Zhang, L., and Li, P., 2007. Classification and extraction of spatial features in urban areas using
high-resolution multispectral imagery. IEEE Geoscience and Remote Sensing Letters, 4, 260–264.
Imani, M. and Ghassemian, H., 2016. Binary coding based feature extraction in remote sensing high
dimensional data. Information Sciences, 342, 191–208.
Jain, A. K., Duin, R. P. W., and Jianchang, M., 2000. Statistical pattern recognition: A review. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 22, 4–37.
James, G. M., 2003. Variance and bias for general loss functions. Machine Learning, 51, 115–135.
Johnson, S. C., 1967. Hierarchical clustering schemes. Psychometrika, 32, 241–254.
Jordan, M. and Bishop, C., 2004. Neural networks. In: Tucker, A. B. (Ed.), Computer Science Handbook,
Second Edition, Chapman and Hall/CRC, Florida, USA, 1–16.
Kavzoglu, T., 2009. Increasing the accuracy of neural network classification using refined training data.
Environmental Modelling & Software, 24, 850–858.
Kohavi, R. and John, G. H., 1997. Wrappers for feature subset selection. Artificial Intelligence, 97, 273–324.
Kriegel, H.-P., Kröger, P., Sander, J., and Zimek, A., 2011. Density-based clustering. WIREs Data Mining and
Knowledge Discovery, 1, 231–240.
Kumar, G. and Bhatia, P. K., 2014. A detailed review of feature extraction in image processing systems. In:
2014 IEEE 4th International Conference on Advanced Computing & Communication Technologies,
Rohtak, India, 5–12.
Kuo, B. C., Chang, C. H., Sheu, T. W., and Hung, C. C., 2005. Feature extractions using labeled and unlabeled data.
In: International Geoscience and Remote Sensing Symposium (IGARSS), Seoul, South Korea, 1257–1260.
Lillesand, T. M., Kiefer, R. W., and Chipman, J. W., 1994. Remote Sensing and Image Interpretation. John
Wiley and Sons, Inc., Toronto.
Liu, Z., Li, H., Zhou, W., and Tian, Q., 2012. Embedding spatial context information into inverted file for
large-scale image retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia,
New York, USA, ACM Press, 199.
Lv, Z., Zhang, P., and Atli Benediktsson, J., 2017. Automatic object-oriented, spectral-spatial feature extraction
driven by Tobler’s first law of geography for very high resolution aerial imagery Classification. Remote
Sensing, 9, 285.
Makkeasorn, A. and Chang, N. B., 2009. Seasonal change detection of riparian zones with remote sensing
images and genetic programming in a semi-arid watershed. Journal of Environmental Management,
90, 1069–1080.
Makkeasorn, A., Chang, N. B., Beaman, M., Wyatt, C., and Slater, C., 2006. Soil moisture prediction in a
semi-arid reservoir watershed using RADARSAT satellite images and genetic programming. Water
Resources Research, 42, 1–15.
Maldonado, S. and Weber, R., 2011. Embedded feature selection for support vector machines: State-of-the-
Art and future challenges. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and
Applications, Pucón, Chile, 304–311.
Mohri, M., 2012. Foundations of Machine Learning. The MIT Press, Massachusetts, USA.
Momm, H. and Easson, G., 2011. Feature extraction from high-resolution remotely sensed imagery using
evolutionary computation. In: Kita, E. (Ed.), Evolutionary Algorithms, InTech, 423–442.
Moon, T. K., 1996. The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13, 47–60.
Moser, G. and Serpico, S. B., 2013. Combining support vector machines and Markov random fields in an
integrated framework for contextual image classification. IEEE Transactions on Geoscience and
Remote Sensing, 51, 2734–2752.
Murtagh, F. and Contreras, P., 2012. Algorithms for hierarchical clustering: An overview. WIREs Data Mining
and Knowledge Discovery, 2, 86–97.
Nakariyakul, S., 2014. Improved sequential search algorithms for classification in hyperspectral remote
sensing images. In: Proceedings SPIE 9273, Optoelectronic Imaging and Multimedia Technology III,
927328, Beijing, China.
Naseriparsa, M., Bidgoli, A.-M., and Varaee, T., 2013. A hybrid feature selection method to improve
performance of a group of classification algorithms. International Journal of Computer Applications,
69, 28–35.
Nguyen, H., Franke, K., and Petrovic, S., 2009. Optimizing a class of feature selection measures. In: NIPS
2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra
(DISCML), Vancouver, Canada.
Nixon, M. S. and Aguado, A. S., 2012. Feature Extraction and Image Processing, 2nd Edition. Academic
Press, London, UK.
Ooi, C. S., Seng, K. P., and Ang, L.-M., 2015. Automated technology integrations for customer satisfaction
assessment. In: Kaufmann, H.-R. (Ed.), Handbook of Research on Managing and Influencing Consumer
Behavior, IGI Global, Hershey, Pennsylvania, USA, 606–620.
Pal, S. K., Ghosh, A., and Shankar, B. U., 2000. Segmentation of remotely sensed images with fuzzy
thresholding, and quantitative evaluation. International Journal of Remote Sensing, 21, 2269–2300.
Pass, G., Zabih, R., and Miller, J., 1998. Comparing images using color coherence vectors. In: Proceedings of
the Fourth ACM International Conference on Multimedia, Massachusetts, USA, 1–14.
Pearson, K., 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2,
559–572.
Peng, H., Long, F., and Ding, C., 2005. Feature selection based on mutual information criteria of max-
dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 27, 1226–1238.
Phillips, S. J., 2002. Acceleration of K-means and related clustering algorithms. In: Mount, D. M. and
Stein, C. (Eds.), Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Germany,
166–177.
Phuong, T. M., Lin, Z., and Altman, R. B., 2005. Choosing SNPs using feature selection. In: Proceedings—2005
IEEE Computational Systems Bioinformatics Conference, California, USA, 301–309.
Powers, D. M. W., 2011. Evaluation: From Precision, Recall and F-Measure to Roc, Informedness, Markedness
& Correlation. Journal of Machine Learning Technologies, 2, 37–63.
Quackenbush, L. J., 2004. A review of techniques for extracting linear features from imagery. Photogrammetric
Engineering & Remote Sensing, 70, 1383–1392.
Rokni, K., Ahmad, A., Selamat, A., and Hazini, S., 2014. Water feature extraction and change detection using
multitemporal landsat imagery. Remote Sensing, 6, 4173–4189.
Ross, B. J., Gualtieri, A. G., and Budkewitsch, P., 2005. Hyperspectral image analysis using genetic
programming. Applied Soft Computing, 5, 147–156.
Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W., 1974. Monitoring vegetation systems in the
Great Plains with ERTS. In: Third Earth Resources Technology Satellite-1 Symposium, Texas, USA,
325–333.
Sánchez-Maroño, N. and Alonso-Betanzos, A., 2009. Feature selection. In: Shapiro, S. C. (Ed.), Encyclopedia
of Artificial Intelligence, IGI Global, Hershey, Pennsylvania, USA, 632–638.
Shahdoosti, H. R. and Mirzapour, F., 2017. Spectral–spatial feature extraction using orthogonal linear
discriminant analysis for classification of hyperspectral data. European Journal of Remote Sensing,
50, 111–124.
Sharma, M. and Sarma, K. K., 2016. Soft-computational techniques and Spectro-temporal features for
telephonic speech recognition. In: Bhattacharyya, S., Banerjee, P., Majumdar, D., and Dutta, P. (Eds.),
Handbook of Research on Advanced Hybrid Intelligent Techniques and Applications, IGI Global,
Hershey, Pennsylvania, USA, 161–189.
Shruthi, R. B. V., Kerle, N., and Jetten, V., 2011. Object-based gully feature extraction using high spatial
resolution imagery. Geomorphology, 134, 260–268.
Smeeton, N. C., 1985. Early history of the Kappa statistic (response letter). Biometrics, 41, 795.
Solorio-Fernández, S., Carrasco-Ochoa, J. A., and Martínez-Trinidad, J. F., 2016. A new hybrid filter–wrapper
feature selection method for clustering based on ranking. Neurocomputing, 214, 866–880.
Stefanov, W. L., Ramsey, M. S., and Christensen, P. R., 2001. Monitoring urban land cover change: An expert
system approach to land cover classification of semiarid to arid urban centers. Remote Sensing of
Environment, 77, 173–185.
Stehman, S. V., 1997. Selecting and interpreting measures of thematic classification accuracy. Remote Sensing
of Environment, 62, 77–89.
Strijbos, J.-W., Martens, R. L., Prins, F. J., and Jochems, W. M. G., 2006. Content analysis: What are they
talking about? Computers & Education, 46, 29–48.
Suresh, S., Sundararajan, N., and Saratchandran, P., 2008. A sequential multi-category classifier using radial
basis function networks. Neurocomputing, 71, 1345–1358.
Szummer, M. and Jaakkola, T., 2002. Information regularization with partially labeled data. In: Proceedings
of Advances in Neural Information Processing Systems, 15, 1025–1032.
Taubenböck, H., Esch, T., Wurm, M., Roth, A., and Dech, S., 2010. Object-based feature extraction using high
spatial resolution satellite data of urban areas. Journal of Spatial Science, 55, 117–132.
Tian, D. P., 2013. A review on image feature extraction and representation techniques. International Journal
of Multimedia and Ubiquitous Engineering, 8, 385–395.
Tryon, R. C., 1939. Cluster Analysis: Correlation Profile and Orthometric (factor) Analysis for the Isolation
of Unities in Mind and Personality. Edwards Brothers, Inc.
Tso, B. C. K. and Mather, P. M., 1999. Classification of multisource remote sensing imagery using a genetic
algorithm and Markov random fields. IEEE Transactions on Geoscience and Remote Sensing, 37, 1255–1260.
Vapnik, V. N., 1998. Statistical Learning Theory. Wiley, New York.
Wang, X.-Y., Wu, J.-F., and Yang, H.-Y., 2009. Robust image retrieval based on color histogram of local feature
regions. Multimedia Tools and Applications, 49, 323–345.
Wolf, P., Dewitt, B., and Mikhail, E., 2000. Elements of Photogrammetry with Applications in GIS. McGraw-
Hill Education, New York, Chicago, San Francisco, Athens, London, Madrid, Mexico City, Milan, New
Delhi, Singapore, Sydney, Toronto.
Wu, K.-L. and Yang, M.-S., 2005. A cluster validity index for fuzzy clustering. Pattern Recognition Letters,
26, 1275–1291.
Wyse, N., Dubes, R., and Jain, A. K., 1980. A critical evaluation of intrinsic dimensionality algorithms. In:
Gelsema, E. S. and Kanal, L. N. (Eds.), Pattern Recognition in Practice, North-Holland Publishing
Company, Amsterdam, Netherlands, 415–425.
Xie, X. L. and Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 13, 841–847.
Xue, B., Zhang, M., and Browne, W. N., 2013. Particle swarm optimization for feature selection in classification:
A multi-objective approach. IEEE Transactions on Cybernetics, 43, 1656–1671.
Yu, L. and Liu, H., 2003. Feature selection for high-dimensional data: A fast correlation-based filter solution.
In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003),
Washington, DC, USA, 1–8.
Yu, X., Yang, J., and Zhang, J., 2012. A transductive support vector machine algorithm based on spectral
clustering. AASRI Procedia, 1, 384–388.
Zare, H., Haffari, G., Gupta, A., and Brinkman, R. R., 2013. Scoring relevancy of features based on
combinatorial analysis of Lasso with application to lymphoma diagnosis. BMC Genomics, 14, S14.
Zena, M. H. and Gillies, D. F., 2015. A review of feature selection and feature extraction methods applied on
microarray data. Advances in Bioinformatics, 2015, Article ID 198363, 1–13.
Zhao, W. and Du, S., 2016. Spectral-spatial feature extraction for hyperspectral image classification: A
dimension reduction and deep learning approach. IEEE Transactions on Geoscience and Remote
Sensing, 54, 4544–4554.
Zhao, H., Sun, S., Jing, Z., and Yang, J., 2006. Local structure based supervised feature extraction. Pattern
Recognition, 39, 1546–1550.
Zhao, J., Zhong, Y., and Zhang, L., 2015. Detail-preserving smoothing classifier based on conditional random
fields for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote
Sensing, 53, 2440–2452.
Zhong, Y., Zhang, L., Gong, J., and Li, P., 2007. A supervised artificial immune classifier for remote-sensing
imagery. IEEE Transactions on Geoscience and Remote Sensing, 45, 3957–3966.
Zhu, X., 2008. Semi-Supervised Learning Literature Survey. Computer Sciences TR 1530. University of
Wisconsin—Madison.
Zhuo, L., Zheng, J., Li, X., Wang, F., Ai, B., and Qian, J., 2008. A genetic algorithm based wrapper feature
selection method for classification of hyperspectral images using support vector machine. In: Proceedings of SPIE 7147, The International Society for Optical Engineering, Guangzhou, China, 71471J.
6 Feature Extraction with Statistics and Decision Science Algorithms
6.1 INTRODUCTION
With the fast development of air-borne and space-borne remote sensing technologies, large volumes
of remotely sensed multispectral, hyperspectral, and microwave images have become available to
the public. Such massive data sources often require a large amount of memory for storage and
computational power for processing. With proper feature extraction techniques, these images may
provide a huge amount of information to help better understand Earth’s environment. Traditional
feature extraction methods involve regression, filtering, clustering, transformation, and probabilistic theory, as opposed to modern feature extraction methods that rely heavily on machine learning and data mining. Nevertheless, these massive and varied data sources are prone to be
redundant, which in turn may complicate traditional feature extraction processes and even result in
overfitting issues in machine learning or data mining (Liao et al., 2013; Huang et al., 2014; Romero
et al., 2016). Hence, the extraction of specific features of interest from complex and redundant data
inputs is of great importance to the exploitation of these data sources on a large scale.
Due to its efficacy in transforming the original redundant and complex inputs into a set of
informative and nonredundant features, feature extraction has long been considered a crucial step in
image processing and pattern recognition, as well as remote sensing and environmental modeling,
because it facilitates the subsequent data manipulation and/or decision making (Elnemr et al.,
2016). In the remote sensing community, feature extraction techniques have been widely used for
image processing, typically for pattern recognition and image classification. In image classification
and pattern recognition, feature extraction is often considered a special form of dimensionality
reduction which aims to construct a compact and informative feature space by removing the
irrelevant and redundant information from the original data space (Elnemr et al., 2016). Practical
applications typically involve road extraction (Gamba et al., 2006), urban and building detection
(Bastarrika et al., 2011), oil spill detection (Brekke and Solberg, 2005), change detection (Celik,
2009), burned areas mapping (Bastarrika et al., 2011), surface water body mapping (Feyisa et al.,
2014), hyperspectral image classification (Chen et al., 2013; Qian et al., 2013), and so forth.
As elaborated in Chapter 5, feature extraction can be performed either on the spatial or
spectral domain; hence, methods of feature extraction can be developed by making use of various
theories such as simple filtering, mathematical morphology, clustering, regression, spatial/
spectral transformation, classification, and so on. Traditional feature extraction approaches work
in practice based on conventional mathematical theories like filtering, regression, spatial/spectral
transformation, and others requiring less computational resources. Yet those advanced methods
taking advantage of artificial intelligence (e.g., artificial neural network, genetic algorithm, support
vector machine, genetic programming), as well as other advanced optimization theories (e.g., particle
swarm optimization), demand far more computational resources. Therefore, those advanced methods inherently involve high-performance computing issues (i.e., compression, storage, and
performance-driven load distribution for heterogeneous computational grids) in various real-world
applications.
In this chapter, a suite of traditional feature extraction approaches that rely on statistics and
decision science principles, such as filtering, morphology, decision trees, transformation, regression,
and probability theory, will be introduced with specific focus on the mathematical foundations of
each kind of method. Since numerous methods and approaches found in the literature share similar
principles, these analogous approaches will be grouped into the same category in a logical order.
Chapter 7, which focuses on machine learning and data mining for advanced feature extraction, will
follow these traditional feature extraction techniques.
6.2.1 Filtering Operation
Remotely sensed data are collected in the form of panchromatic, multispectral, or hyperspectral images at various spatiotemporal scales. The embedded color, shape, and textural characteristics
are three typical features that can be extracted to represent the property of an image (Tian, 2013).
These retrieved features can be further referred to as the “fingerprint” or “signature” associated
with a given image. In general, color features are defined with respect to a specific color space or model, such as RGB (red-green-blue), HSV (hue, saturation, value), also known as HSB (hue, saturation, brightness), and LUV. Note that LUV is a non-RGB color space that decouples the "color" (chromaticity, the UV part) and "lightness" (luminance, the L part) of color to
improve object detection. These fixed color spaces will in turn limit further explorations of spectral
information at other wavelengths (e.g., multispectral and hyperspectral), because only three spectral components are represented in these color spaces. To overcome this constraint, other
kinds of techniques are often used to better detect and extract the relevant features embedded in
each image.
In practice, filtering is generally considered the simplest method for feature extraction, which has
also been routinely used in image processing to detect and extract the targets of interest from a given
remotely sensed image. Thresholding, which can be considered a representative of such a technique,
is actually a form of low-level feature extraction method performed as a point operation on the input
image by applying a single threshold to transform any greyscale (or color image) into a binary map
(Nixon and Aguado, 2012). An illustrative example of thresholding-based feature extraction was
given in the previous chapter, as shown in Figure 5.3. For instance, clouds can be easily detected
with the aid of human vision, since clouds are brighter relative to other terrestrial objects in RGB
color space. Evidenced by this property, an empirical threshold value can then be applied to one
spectral band (e.g., blue band) to extract the observed clouds by simply using a Boolean operator
(see Equation 5.2 for details).
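To make the operation concrete, the following MATLAB sketch applies a single empirical threshold to one band of a multispectral scene to produce a binary cloud map; the file name, the band index, and the threshold value T are illustrative assumptions rather than operational settings.

img  = imread('scene.tif');        % multispectral image, bands stacked along dim 3 (assumed)
blue = double(img(:,:,1));         % assume band 1 holds the blue band
T    = 0.8 * max(blue(:));         % empirical threshold value (assumption)
cloudMask = blue > T;              % Boolean operator yields a binary cloud map
imshow(cloudMask);                 % bright (cloud) pixels appear white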
Technically, thresholding can also be treated as a decision-based feature extraction method.
In most cases, one or two empirical threshold values will fulfill the need to detect and extract the
desired features with good accuracy. However, this does not hold for some extreme conditions
such as the extraction of water bodies from remotely sensed images in mountainous areas, where
shadows are often an obstacle. This is mainly due to the fact that both targets always show similar
spectral properties optically. In other words, it is difficult to obtain a satisfying result by simply
applying threshold values to one or two fixed spectral bands. Thus, more complex thresholding
networks should be developed by making use of external information such as elevation data and
microwave satellite images. This often leads to the creation of a multilayer stacked decision tree
framework. More details regarding decision tree classifiers will be introduced in the following
subsections.
Thresholding techniques work mainly by relying on spectral differences between various
targets to extract the desired features; hence the threshold values for the same target could even
vary between images due to radiometric distortions caused by various factors such as illumination
conditions. To account for such drawbacks, more flexible approaches should be applied. In
image interpretation, the shapes of targets are often considered good features for further pattern
recognition, since the perimeter of an object can be easily perceived by human vision. Hence,
detecting the shape features from a given image is critical to the subsequent feature extraction.
Essentially, the shape of an object is commonly treated as a step change in the intensity levels
(Nixon and Aguado, 2012).
In the remote sensing community, filtering approaches have been widely used in image processing
to detect and extract shape features, for example, linear features such as roads and rivers. In order
to extract the perimeter of an object or linear features like roads and rivers from a remotely sensed
image, a suite of convolutional filters has been proposed for the extraction of edge features. Among
them, “Roberts cross operator” and “Sobel operator” are the two most well-known filters, and they
have been extensively used in practical applications. As one of the first edge detectors, the Roberts
cross operator was initiated by Lawrence Roberts in 1963 (Davis, 1975), with two convolutional
kernels (or operators) formulated as:
t_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{and} \quad t_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}  (6.1)
According to Roberts (1963), the produced edges from an edge detector should be well defined,
while the intensity of edges should correspond closely to what a human would perceive with little
noise introduced by the background (Davis, 1975).
The Roberts cross operator works as a differential filter aiming to approximate the gradient of an image through discrete differentiation, computing the square root of the sum of the squared differences between diagonally adjacent pixels. Let I(i,j) be a pixel (at the location (i,j)) in the original image X, while G_x is the convolved pixel value with the first kernel (e.g., t_1) and G_y is the convolved pixel value with the second kernel (e.g., t_2); the gradient can then be defined as:

\nabla I(i,j) = G(x,y) \cong \sqrt{G_x^2 + G_y^2}  (6.2)
It is clear that this operation will highlight changes in intensity in a diagonal direction, hence
it enables the detection of changes between targets (i.e., edges). An example of enhanced edge
features in an observed scene through the application of Roberts filters as well as two other edge
detectors can be used to sharpen our understanding (Figure 6.1). The results indicate that the edge
features and linear features have been better characterized compared to the original image without
applying Roberts cross operation (Figure 6.1a). Despite the simplicity and capability of this filter, it
is observed that the Roberts cross suffers greatly from sensitivity to noise due to its convolutional
nature (Davis, 1975).
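As a minimal sketch of Equations 6.1 and 6.2, the two Roberts kernels can be applied with base-MATLAB 2-D convolution; the input file name is a placeholder, and no noise suppression is attempted.

I  = double(rgb2gray(imread('scene.png')));  % grayscale intensity image (placeholder file)
t1 = [1 0; 0 -1];                            % first Roberts kernel (Equation 6.1)
t2 = [0 1; -1 0];                            % second Roberts kernel (Equation 6.1)
Gx = conv2(I, t1, 'same');                   % first diagonal difference
Gy = conv2(I, t2, 'same');                   % second diagonal difference
G  = sqrt(Gx.^2 + Gy.^2);                    % gradient magnitude (Equation 6.2)
imshow(G, []);                               % enhanced edge features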
FIGURE 6.1 Edge features extracted by three different edge detectors. (a) RGB composite image; (b) edge
features detected by performing Roberts cross operation; (c) features detected by Sobel operation; and (d)
features detected by Laplacian operation.
In order to better detect edge features, an enhanced discrete differentiation operator, the Sobel
filter (also known as Sobel–Feldman operator), was developed by Irwin Sobel and Gary Feldman in
1968 with the aim of computing an approximation of the gradient of the image intensity function.
Differing from the Roberts cross operator, the Sobel filter is an isotropic image gradient operator
that uses two separable and integer-valued 3 × 3 kernels to calculate approximations of the
derivatives by convolving with the input image in the horizontal and vertical directions, respectively.
Two kernels are formulated as follows:
t_1 = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \quad \text{and} \quad t_2 = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}  (6.4)
Since the Sobel kernels have a larger window size than the Roberts cross operator, the Sobel operator yields higher accuracy in detecting and extracting edge features. More
precisely, the derived edge features from the Sobel filters will be much clearer and brighter (i.e.,
with larger contrast) to human vision (Figure 6.1c). Aside from the Roberts and Sobel operators,
there also exist other analogue filters, such as the Laplacian filters. A commonly used convolutional
kernel is
t = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}  (6.5)
Theoretically, the Laplacian filters approximate a second-order derivative on the original image,
which in turn highlights regions of rapid intensity change in particular (Figure 6.1d). Because the
second derivatives are very sensitive to noise, a Gaussian smoothing is often performed on the
original image before applying the Laplacian filter to counter this constraint (Reuter et al., 2009).
Since all these filtering-based techniques are commonly used to enhance certain specific features,
they also fall into the image enhancement category in the image processing community to some
extent.
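A hedged MATLAB sketch of the three detectors discussed above is given below, using the Image Processing Toolbox edge function for the Roberts and Sobel cases and an explicit Laplacian kernel after Gaussian smoothing; the file name and the smoothing standard deviation are assumptions.

I   = double(rgb2gray(imread('scene.png')));       % placeholder input image
BWr = edge(I, 'roberts');                          % Roberts cross edges
BWs = edge(I, 'sobel');                            % Sobel edges
Is  = imgaussfilt(I, 2);                           % Gaussian smoothing (sigma = 2, assumed)
L   = conv2(Is, [0 1 0; 1 -4 1; 0 1 0], 'same');   % Laplacian kernel of Equation 6.5
subplot(1,3,1); imshow(BWr);              title('Roberts');
subplot(1,3,2); imshow(BWs);              title('Sobel');
subplot(1,3,3); imshow(mat2gray(abs(L))); title('Laplacian of Gaussian');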
Due to their simplicity and relatively low computational cost, filtering techniques have been widely exploited in practical applications. For instance, Gabor filters were
also implemented for edge detection (Mehrotra et al., 1992) and texture classification (Clausi and
Ed Jernigan, 2000). In addition to the direct application of filter approaches for feature extraction,
various filters were also used in conjunction with other techniques to aid in feature extraction
practices. For instance, Gamba et al. (2006) proposed an adaptive directional filtering procedure
to enhance urban road extraction from high-resolution optical and Synthetic Aperture Radar
(SAR) images, in which the filtering scheme was used to capture the predominant directions of
roads. Similarly, Kang et al. (2014) used an edge-preserving filter to aid in hyperspectral image
classification, and the results indicated that the incorporation of the edge-preserving filtering in the
classification scheme resulted in higher classification accuracy.
6.2.2 Mathematical Morphology
Similar to filtering operators, morphological operators are another kind of promising filter that
have been widely used in computer vision for geometric structure information analysis. Essentially,
the foundation of morphological processing is in the mathematically rigorous field of describing
shapes using set theory, geometry, and topology, hence such processing procedures are generally
termed mathematical morphology (Serra, 1992; Soille and Pesaresi, 2002; Soille, 2004). In image
processing, morphological operators refer to a variety of image filters that process images based
on morphological information (e.g., size and shape). As opposed to many methods based on the
spectral property of pixels, mathematical morphology concentrates on the spatial relationships
between groups of pixels and treats the objects present in an image as sets (Soille and Pesaresi,
2002).
In image processing, mathematical morphology is commonly used to examine interactions
between an image and a set of structuring elements using certain operations, while the structuring
element acts as a probe for extracting or suppressing specific structures of the image objects (Plaza,
2007). More specifically, morphological operations apply a structuring element to filter an image,
while the value of each pixel in the output image is based on a comparison of the corresponding pixel
in the input image with its neighbors. By choosing a proper size and shape of the neighborhood, a
morphological operation that is sensitive to specific shapes in the input image can be constructed.
The output of the filtering process depends fully on the match between the input image and the structuring element, and on the operation being performed (Quackenbush, 2004).
A structuring element is a small binary image which is actually a small matrix of pixels with
values of ones or zeros. Technically, in morphological operations, structuring elements play the
same role as convolutional kernels in traditional linear image filtering, yet the basic operations of
(a) Square-shaped element:      (b) Cross-shaped element:      (c) Diamond-shaped element:
1 1 1 1 1                       0 0 1 0 0                      0 0 1 0 0
1 1 1 1 1                       0 0 1 0 0                      0 1 1 1 0
1 1 1 1 1                       1 1 1 1 1                      1 1 1 1 1
1 1 1 1 1                       0 0 1 0 0                      0 1 1 1 0
1 1 1 1 1                       0 0 1 0 0                      0 0 1 0 0
FIGURE 6.2 An example of three simple structuring elements with different shapes. The blue square denotes
the origin of the structuring elements. (a) Square-shaped 5 × 5 element, (b) Cross-shaped 5 × 5 element, and
(c) Diamond-shaped 5 × 5 element.
morphology are nonlinear in nature (Davies, 2012). An example of three simple structuring elements is shown in Figure 6.2. There are two basic aspects associated with
the structuring element. One is related to the size and the other to the shape. As indicated, the size
of an element is determined by the dimension of a matrix in general. A common practice is to have
a structuring matrix with an odd dimension, since the origin of the element is commonly defined
as the center of the matrix (Pratt, 2007). The shape of an element depends fully on the pattern of
ones and zeros distributed over the matrix grid. In practical usages, the structuring element may be
applied over the input image, acting as a filter to compare with the input image block by block based
on certain operations for the detection and extraction of specified geometrical features similar to the
given structuring element (Tuia et al., 2009).
Aside from the structuring element, another critical factor in morphological image analysis is
the morphological operation. The two most fundamental morphological operations are dilation and
erosion (Soille, 2004). Conceptually, both operations rely on translating the structuring element
to various points over the input image and then examining the intersection between the translated
element coordinates and the input image coordinates. If g is a binary image to analyze and B is a
structuring element, dilation (\delta_B(g)) and erosion (\varepsilon_B(g)) can be mathematically represented as (Tuia et al., 2009):

\delta_B(g) = \bigcup_{b \in B} g_b  (6.6)

\varepsilon_B(g) = \bigcap_{b \in B} g_{-b}  (6.7)

where g_b denotes the translation of g by the vector b.
As indicated, dilation expands the image by adding pixels in the structuring element, that is,
a union between g and B. On the contrary, erosion is used to perform an intersection between
them. This kind of analysis (based on binary images) is often called binary morphology, which
can also be extended to grayscale images by considering them as a topographic relief. However, in
grayscale morphology, the pointwise minimum and maximum operators will be used instead of the
intersection and union, respectively (Tuia et al., 2009). More specifically, dilation adds pixels to the
boundaries of objects in an image (i.e., grows boundary regions), while erosion is used to remove
pixels on object boundaries (i.e., shrinks boundary regions). According to this principle, the number
of pixels added or removed from the objects depends totally on the size and shape of the given
structuring element.
The graphs in Figure 6.3 show a schematic illustration of dilation and erosion, comparatively.

FIGURE 6.3 Schematic illustration of the dilation and erosion rules.

A practical application of these two operations to an image is shown in Figure 6.4. It is clear that
the dilation operation has a unique effect; gaps between different regions are reduced and small
intrusions into boundaries of a region are filled in (Figure 6.4a). For example, the road shown at the
bottom of the Figure 6.4a confirms this observation. In contrast, the erosion operation shrinks the
objects’ boundaries, resulting in holes and gaps between different regions which become larger when
small details are eliminated (Figure 6.4b). In addition to the two basic operations of dilation and
erosion, many morphological operations in practical use are represented as compound operations
based on dilation and erosion, such as opening, closing, hit and miss transform, thickening, thinning,
and so forth (Tuia et al., 2009).
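The basic and compound operations can be reproduced with Image Processing Toolbox routines, as in the minimal sketch below; the binary input file is a placeholder, and the square 5 × 5 structuring element mirrors the one used in Figure 6.4.

BW = imread('binary_map.png') > 0;   % binary input image (placeholder)
se = strel('square', 5);             % 5 x 5 square structuring element
D  = imdilate(BW, se);               % dilation: grows boundary regions
E  = imerode(BW, se);                % erosion: shrinks boundary regions
O  = imopen(BW, se);                 % opening = erosion followed by dilation
C  = imclose(BW, se);                % closing = dilation followed by erosion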
In the remote sensing community, morphological operators have been widely used in various
practical applications; common usages include edge detection, noise removal, image enhancement,
and image segmentation. In most cases, mathematical morphology is used to advance automatic
pattern recognition, in particular the detection of targets in urban areas since an accurate extraction
of shape and size features is essential to an automatic extraction process (Benediktsson et al.,
2005; Chaudhuri et al., 2016). Due to its pronounced efficacy, mathematical morphology has been
extensively used in detection and extraction of various terrestrial targets from remotely sensed high-
resolution optical/SAR imageries, such as roads (Mohammadzadeh et al., 2006; Valero et al., 2010),
FIGURE 6.4 Morphologically transformed images by performing (a) dilation and (b) erosion operations based on a square-shaped 5 × 5 structuring element.
bridges (Chen et al., 2014b), rivers (Sghaier et al., 2017), buildings (Chaudhuri et al., 2016), dwelling
structures (Kemper et al., 2011), and so on.
In addition to these ordinary applications, morphological concepts have also been applied to
aid in disaster management practices. Most recently, Lee et al. (2016) developed a mathematical
morphology method for automatically extracting the hurricane eyes from C-band SAR data to
advance understanding of hurricane dynamics. The results indicated that the morphology-based
analyses of the subsequent reconstructions of the hurricane eyes showed a high degree of agreement
with results derived from reference data based on National Oceanic and Atmospheric Administration
(NOAA) manual work. Similarly, Chen et al. (2017) developed an object-oriented framework for
landslide mapping based on Random Forests (RF) and mathematical morphology. The RF was used
as a dimensionality reduction tool to extract landslides’ relevant features, while a set of four closing
and opening morphology operations were subsequently applied to optimize the RF classification
results to map the landslides with higher accuracy. Moreover, morphological operators have
also been applied in astronomy. For instance, Aragón-Calvo et al. (2007) developed a multiscale
morphology filter to automatically segment cosmic structure into a set of basic components. Due to
the distinct advantage of scale independence in segmentation, anisotropic features such as filaments
and walls were well identified in this cosmic structure study.
6.2.3 Decision Tree

FIGURE 6.5 An illustrative example of a binary decision tree structure, in which a root node is split by decision rules into internal nodes (intermediate results) and terminal nodes (leaves) that carry the final class labels.
In general, a decision tree consists of three essential components: a root node, several internal
nodes, and a set of terminal nodes (also called leaves). An illustrative example of a decision tree
structure is shown in Figure 6.5. As indicated, for each internal and terminal node (child node),
there should exist a parent node showing the data source. Meanwhile, regarding the root node
and each internal node (parent node), two or more child nodes will be generated from these
parent nodes based on various decision rules. If each parent node is split into two descendants,
the decision tree is often known as a binary tree (e.g., Figure 6.5), and the inherent decision rule
can be expressed as a dyadic Boolean operator such that the data points are split simply based
on whether the condition rule is satisfied or not. Among these three types of nodes, the root
node involves the input data space, while the other two kinds of nodes correspond to partitioned
subspaces. As opposed to root and internal nodes, the terminal nodes (i.e., leaves) refer to the final
determined outputs of the whole decision-making process, which cannot be further partitioned;
corresponding class labels (the majority class) will then be assigned. When developing a decision
tree, the most critical process is to split each internal node and the root node with various decision
rules or learning algorithms. In practice, there exist various learning algorithms of which the
most well known is the CART algorithm, which is a binary recursive partitioning procedure
(Breiman et al., 1984).
In the CART algorithm, a splitting rule is inherently defined as a determination function used
to maximize the purity (or homogeneity) of the training data as represented by the resulting
descendant nodes. Typically, an impurity function is defined to examine the goodness of split
for each node, and the Gini diversity index is commonly used as a popular measure for the
impurity function. Mathematically, the impurity measurement of the node t is usually defined as
follows:

i(t) = 1 - \sum_{j=1}^{K} P_j(t)^2  (6.8)
where Pj (t) denotes the posterior probability of class j presenting in node t. This probability is often
defined as the proportion between the number of training samples that go to node t labeled as class
j and the total number of training samples within node t:
P_j(t) = \frac{N_j(t)}{N(t)}, \quad j = 1, 2, \ldots, K  (6.9)
Taking a binary node for example, the goodness of the split s for the node t can be calculated as the resulting decrease in impurity:

\Delta i(s,t) = i(t) - P_R\, i(t_R) - P_L\, i(t_L)  (6.10)

where P_R and P_L are the proportions of the samples in node t that go to the right descendant t_R and the left descendant t_L, respectively. Essentially, the goodness of the split s should be maximized to
eventually achieve the lowest impurity in each step toward the largest purity in the terminal nodes.
Analogous to a machine learning process, the stopping criterion is also required by decision trees
to stop the split process. In the CART algorithm, the stopping criterion is commonly defined as:

\max_{s} \Delta i(s,t) < \beta  (6.11)

where β is a predetermined threshold. The split process will continue until it meets such a stopping
criterion. In other words, the decision tree will stop growing, which implies that the training process
of the decision tree classifier is complete. In general, a decision tree classifier-based process starts
from the root node, and the unclassified data points in the root node are partitioned into different
internal nodes following a set of splitting rules before they finally arrive at terminal nodes (leaves)
where a class label will be assigned to each of them.
Decision tree has been extensively used in support of data mining and machine learning, aiming
to extract a target variable based on several input variables and a set of decision rules (Friedl et al.,
1999; Liu et al., 2008). Due to their nonparametric and top-down framework, decision trees have
been widely used in many practical remote sensing applications for feature extraction such as
hyperspectral image classification (Chen and Wang, 2007), snow cover extraction (Liu et al., 2008),
invasive plant species detection (Ghulam et al., 2014), and so on.
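As a hedged sketch of a CART-style classifier, the Statistics and Machine Learning Toolbox function fitctree grows a binary tree using the Gini diversity index of Equation 6.8 as its split criterion; the synthetic features and labels below are placeholder training data.

X    = rand(500, 6);                             % 500 samples with 6 spectral features (synthetic)
Y    = randi(4, 500, 1);                         % 4 land cover classes (synthetic labels)
tree = fitctree(X, Y, 'SplitCriterion', 'gdi');  % 'gdi' = Gini diversity index
view(tree, 'Mode', 'graph');                     % inspect the learned splitting rules
Ypred = predict(tree, rand(10, 6));              % assign leaf class labels to new samples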
6.2.3.1 RF Classifier
RF, or random decision forests, is a suite of decision trees created by drawing a subset of training
data through a bagging approach (Breiman, 2001). More specifically, RF consists of a combination
of decision trees where each tree is constructed using an independently sampled random vector
from the input set, while all trees in the forest maintain a consistent distribution (Pal, 2005; Tian
et al., 2016). In practice, about two-thirds of the original input will be randomly selected to train
these trees through the bagging process, while the remaining one-third will not be used for training.
Instead, that portion of the data is used for internal cross-validation in order to check the performance
of the trained trees (Belgiu and Drăguţ, 2016). In other words, there is no need to perform cross-
validation to get an unbiased estimate of the test set error since it has already been done in the
process of constructing RF. In general, RF tries to construct multiple CART models by making use
of different samples and different initial variables, which in turn enables RF to overcome the inherent drawback of overfitting associated with conventional decision trees (Hastie et al., 2009).
In general, two parameters are required to perform an RF-based classification, namely, the
number of trees and the number of variables randomly chosen at each split (Winham et al., 2013).
Each node in a tree will be split with a given number of randomly sampled variables from the input
feature space. In RF, the Shannon entropy (or Gini index) is routinely used as the splitting function
(or attribute selection measure) to measure the impurity of an attribute with respect to the classes
(Pal, 2005). In prediction, each tree votes for a class membership for each test sample, and the class
with maximum votes will be considered the final class (Ni et al., 2017).
Unlike many other classification methods typically based on one classifier, hundreds of classifiers
can be constructed in RF and a final prediction is always obtained by combining all these decisions
with an optimal function (e.g., plurality vote). In traditional and advanced feature extraction practices,
ensemble learning methods use multiple learning algorithms to obtain better predictive performance
than could be obtained from any single learning algorithm. In fact, RF is regarded as an ensemble
classifier of decision tree in which decision tree plays the role of a meta model. This ensemble learning
nature renders RF many desirable advantages, for example, high accuracy, robustness against
overfitting the training data, and integrated measures of variable importance (Chan and Paelinckx,
2008; Guo et al., 2011; Stumpf and Kerle, 2011). In addition, no distribution assumption is required
for the input data, hence it can be used to process various data sets. Nevertheless, like many other
statistical learning techniques, RF is also observed to be prone to bias once the number of instances
is distributed unequally among the classes of interest (Winham et al., 2013). However, because of
its outstanding advantages, RF has been widely used for remote sensing classification in terms of
various applications, for example, laser data point clouds classification (Ni et al., 2017), LiDAR and
multispectral image-based urban scene classification (Guo et al., 2011), land cover classification and
mapping (Fan, 2013; Tian et al., 2016), hyperspectral image classification (Ham et al., 2005), and
landslide mapping (Stumpf and Kerle, 2011; Chen et al., 2014a).
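A minimal sketch of RF training with the TreeBagger class is given below; the number of trees and the synthetic training data are illustrative assumptions. Note how the out-of-bag samples provide the internal validation described above, so no extra cross-validation loop is needed.

X  = rand(1000, 8);                          % training features (synthetic)
Y  = randi(5, 1000, 1);                      % class labels (synthetic)
rf = TreeBagger(200, X, Y, 'Method', 'classification', ...
                'OOBPrediction', 'on');      % keep out-of-bag predictions
oobErr = oobError(rf);                       % internal error estimate from the unused third
Ypred  = predict(rf, rand(10, 8));           % majority vote across the 200 trees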
6.2.4 Cluster Analysis
Cluster analysis, or clustering, was initiated in anthropology in 1932 and then introduced to
psychology in the late 1930s; its usage for trait theory classification in personality psychology in the
1940s caused it to be more widely known to the public (Tryon, 1939; Cattell, 1943; Bailey, 1994).
Essentially, clustering itself is not a specific algorithm, but is instead a general term referring to
various tasks or processes of grouping a set of targets/objects with similar characteristics into the
same class while isolating those with different properties into other classes. The core of clustering
is related to various algorithms that have the capability of detecting and isolating distinct features
into different groups. Therefore, the difference between various clustering algorithms primarily lies
in their notion of what constitutes a cluster and how to detect clusters efficiently. Clustering can thus
be considered a knowledge discovery or multi-objective optimization problem.
To date, more than one hundred clustering algorithms have been developed for various
applications found in the literature. The reason for such numerous clustering algorithms can be
ascribed to the fact that the notion of a cluster is difficult to define because it varies significantly
in properties between algorithms (Estivill-Castro, 2002). In other words, different clustering
algorithms are produced by employing different cluster models. Thus, understanding cluster models
is critical to the realization of the differences between the various clustering algorithms as cluster
models act as the core of each algorithm. As found in the literature, popular notions of clusters
include groups with small distances among the cluster members, dense regions in the data space,
intervals or particular statistical distributions. Typical cluster models associated with these notions
include connectivity models, centroid models, density models, distribution models, and many others
(Estivill-Castro, 2002). They are described below.
In connectivity-based clustering (also known as hierarchical clustering), objects are grouped according to their pairwise distances; a cluster can be described largely by the maximum distance needed to connect objects within this cluster, and hence different clusters will be formed under different distances (Everitt, 2011). Therefore, connectivity-based clustering
methods will differ largely by the distance functions used in each method. In addition to the
selection of distance functions, the linkage criteria also need to be decided. Popular choices include
single linkage clustering (Sibson, 1973) and complete linkage clustering (Defays, 1977). Despite
the efficacy of clustering objects into different groups, it has been observed that connectivity-based
clustering methods are prone to outliers (e.g., resulting in additional clusters or causing other clusters
to merge) in practical applications. Moreover, the computational burden of manipulating large data
sets will be huge since it is difficult to compute an optimal distance due to the high dimensionality
(Estivill-Castro, 2002; Everitt, 2011).
In centroid-based clustering, such as the well-known k-means algorithm, the n observations are partitioned into k sets S = {S_1, S_2, \ldots, S_k} by minimizing the within-cluster sum of squares:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \| x - u_i \|^2  (6.12)
where u_i is the mean value of S_i. Compared to other clustering methods, finding an optimal solution to
k-means clustering is often computationally complex. Commonly, an iterative refinement technique
is used to solve the problem. More details related to the modeling process can be found in MacKay
(2003). Despite the computational complexity, k-means clustering is still featured in several distinct
applications, including the Voronoi structure-based data partitioning scheme, the nearest neighbor
classification concept, and model-based clustering basis.
An experimental example of k-means clustering for land use and land cover classification can
be seen in Figure 6.6. The results show an adequate accuracy in classifying different land cover
types. Compared to the true color image, the classified map exhibits more contrast between different
features, in particular the water bodies, which are largely highlighted in the classified map. In recent
years, a set of analogs has been developed based on the foundation of k-means clustering, including
X-means clustering (Ishioka, 2000), G-means clustering (Hamerly and Elkan, 2004), and the most
widely used fuzzy clustering (Dell’Acqua and Gamba, 2001; Modava and Akbarizadeh, 2017).
FIGURE 6.6 One observed Landsat TM scene and the corresponding classified result from k-means
clustering. (a) true color image by given bands 3, 2, and 1 to the RGB space, respectively; (b) the classified
result (6 classes) from the k-means method.
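A hedged sketch of the workflow behind Figure 6.6 follows: each pixel of a multiband image is treated as a feature vector and grouped into six clusters by iteratively refining Equation 6.12; the input file name and k = 6 are assumptions.

img = double(imread('landsat_tm.tif'));   % rows x cols x bands (placeholder file)
[r, c, b] = size(img);
X   = reshape(img, r*c, b);               % one row per pixel
idx = kmeans(X, 6, 'MaxIter', 500);       % iterative refinement of Equation 6.12
map = reshape(idx, r, c);                 % classified map with 6 classes
imagesc(map); axis image;                 % display the cluster labels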
In density-based clustering, such as the well-known Density-Based Spatial Clustering of Applications with Noise (DBSCAN), clusters are defined as regions of markedly higher point density than the remainder of the data set, and a density criterion is also required. In other words, only connecting points satisfying the density criterion will be retained.
In addition to DBSCAN, some other types of density-based clustering methods are detailed in the
literature as well, such as the Ordering Points to idenTify the Clustering Structure (OPTICS) (Ankerst
et al., 1999), a generalized form of DBSCAN, which works regardless of an appropriate value for the
range parameter. Nevertheless, such methods share one key drawback: they expect some kind of density drop to detect cluster borders. In contrast to many other clustering methods, density-based clustering methods only consider density-connected objects to form a cluster, so the shape of a cluster is often arbitrary. On the other hand, such methods may perform poorly in dealing with data sets drawn from Gaussian mixtures, since it is hard for them to model such data sets precisely.
The choice of an appropriate clustering algorithm and of its parameter settings (e.g., the distance function, a density threshold, or the number of expected clusters) depends largely
on the input data set (i.e., an algorithm specifically for one kind of model would generally fail
on a data set involving different kinds of models) as well as the further usage or objective of the
derived results. Overall, the clustering method with respect to a particular problem often needs to
be selected experimentally or with a priori knowledge about the data set as well as the intended use
of the results.
Clustering methods have long been widely used for feature learning and feature extraction to aid
in remote sensing applications; they include building extraction from panchromatic images (Wei and
Zhao, 2004), aerial laser cloud data (Tokunaga and Thuy Vu, 2007), SAR image segmentation by
making use of spectral clustering (Zhang et al., 2008), fuzzy c-means clustering (Tian et al., 2013),
and many other practices such as street tracking (fuzzy clustering) (Dell’Acqua and Gamba, 2001),
coastline extraction (fuzzy clustering) (Modava and Akbarizadeh, 2017), geometrical structure
retrieval (density-distance-based clustering) (Wu et al., 2017), and so on. In recent years, with the
advances of big remote sensing data such as high-resolution hyperspectral imageries, many of the
existing methods could fail in handling these data sets due to the curse of high dimensionality, which
in turn stimulates the development of new clustering algorithms that focus on subspace clustering
(Kriegel et al., 2012). An example of such a clustering algorithm is Clustering in QUEst (CLIQUE)
(Agrawal et al., 2005). In order to advance hyperspectral imagery classification, Sun et al. (2015)
proposed an improved sparse subspace clustering method to advance the band subset selection based
on the assumption that band vectors can be sampled from the integrated low-dimensional orthogonal
subspaces and each band can be sparsely represented as a linear or affine combination of other bands
within its subspace. The experimental results indicated that such a subspace clustering method could
significantly reduce the computational burden while improving the classification accuracy.
6.2.5 Regression Analysis

A straightforward way to extract a specific feature of interest is to combine several spectral bands into an empirical index. A well-known example is the Normalized Difference Vegetation Index (NDVI), which contrasts the near-infrared (NIR) and red reflectances to emphasize vegetation:

NDVI = \frac{NIR - Red}{NIR + Red}  (6.13)
In most cases, this kind of process is also considered to be image enhancement or data mining,
since the vegetation information is emphasized through a data mining scheme. Nevertheless, we
prefer to consider such a framework to be a feature extraction process, since informative features
(e.g., vegetation) are extracted from multiple spectral bands toward a dimensionality reduction.
Such a method has been widely used in many remote sensing applications. For instance,
McFeeters (1996) proposed the Normalized Difference Water Index (NDWI) based on the spectral
differences of water bodies in the green and near-infrared wavelength ranges for surface water
extraction purposes. The NDWI is formulated as follows (McFeeters, 1996):
NDWI = \frac{Green - NIR}{Green + NIR}  (6.14)
where pixels with positive NDWI values (NDWI > 0) are considered to be covered by water and
negative values are nonwater. In recent years, to account for drawbacks associated with NDWI,
a set of enhanced water indexes has been introduced seeking possible accuracy improvements,
such as Modified Normalized Difference Water Index (MNDWI) (Xu, 2006) and Automated Water
Extraction Index (AWEI) (Feyisa et al., 2014). Despite the usage of different spectral bands, these
indexes are still derived in a manner of regression and extrapolation based on several empirical
models. In the literature, there are many applications using such derivatives for specific feature
extraction purposes, for example, red edge position extraction (Cho and Skidmore, 2006), flower
coverage estimation (Chen et al., 2009), and burned areas mapping (Bastarrika et al., 2011).
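The following minimal sketch computes NDWI (Equation 6.14) and extracts a binary water map; the band indices for green and NIR are placeholder assumptions that depend on the sensor.

img   = double(imread('scene.tif'));      % multispectral image (placeholder)
green = img(:,:,2);                       % assume band 2 = green
nir   = img(:,:,4);                       % assume band 4 = near-infrared
ndwi  = (green - nir) ./ (green + nir);   % Equation 6.14
water = ndwi > 0;                         % positive NDWI flags water pixels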
Multivariate regression is another popular technique commonly used for detection and
extraction of certain specific features. An example is the empirical algorithm operationally used for
deriving chlorophyll-a concentrations in aquatic environments. Based on the in situ chlorophyll-a
concentration measurements and the relevant spectral reflectance observations, a fourth-order
polynomial empirical relationship was established for chlorophyll-a concentration estimation from
a suite of optical remotely sensed images. The algorithm can be formulated as (O’Reilly et al., 1998):
\log_{10}(\text{chl-a}) = a_0 + \sum_{i=1}^{4} a_i R^i  (6.15)

R = \log_{10}\!\left( \frac{R_{rs}443 > R_{rs}488}{R_{rs}547} \right)  (6.16)

where the operator > means to find the largest reflectance value of R_{rs}443 and R_{rs}488. The chl-a maps over the dry and wet seasons of Lake Nicaragua and Lake Managua are presented in
Figure 6.7. These four subdiagrams in Figure 6.7 exhibit a seasonality effect in a comparative way.
In general, the water quality in the dry season is worse than that of the wet season. However,
based on the Probability Density Function (PDF) of all the band values, the input band values
of Equation 6.15 do not follow the normality assumption as a linear regression equation in (6.15)
implies (Figures 6.8 and 6.9). Hence, the predicted values of chl-a concentrations do not follow the
normality assumption closely either (Figures 6.10 and 6.11). This finding gives rise to some insights
about the inadequacy of using a linear regression model to infer the water quality conditions of the
two tropical shallow lakes.
FIGURE 6.7 Chl-a concentration maps of Lake Managua and Lake Nicaragua for dry season and wet season,
respectively. (a) Lake Managua (Dry Season/March 04, 2016); (b) Lake Managua (Wet Season/September 08,
2016); (c) Lake Nicaragua (Dry Season/March 01, 2016); and (d) Lake Nicaragua (Wet Season/September 03, 2016).
FIGURE 6.8 Band PDF for Lake Managua, (a) dry season; (b) wet season. (Note that the X axes do not stand
for the original reflectance value, and they have been multiplied by a scale factor for convenience of expression.)
FIGURE 6.9 Band PDF for Lake Nicaragua, (a) dry season; (b) wet season. (Note that the X axes do not
stand for the original reflectance value, and they have been multiplied by a scale factor for convenience of
expression.)
FIGURE 6.10 PDF of Chl-a concentrations in Lake Managua, (a) dry season; (b) wet season.
FIGURE 6.11 PDF of Chl-a concentrations in Lake Nicaragua, (a) dry season; (b) wet season.
Logistic regression is another statistical technique widely used for classification and feature extraction. Conceptually, a simplified form of a logistic regression problem can be written as follows (binomial):

y = \begin{cases} 1, & \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k > 0 \\ 0, & \text{else} \end{cases}  (6.17)
For a multinomial problem with J classes, the log-odds of class j can be modeled as a linear combination of the input features:

\log\!\left( \frac{p_j}{1 - p_j} \right) = X \cdot W = \beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{kj} x_k  (6.18)

p_j = \frac{ e^{\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{kj} x_k} }{ 1 + \sum_{l=1}^{J-1} e^{\beta_{0l} + \beta_{1l} x_1 + \beta_{2l} x_2 + \cdots + \beta_{kl} x_k} }  (6.19)
where β is the weight matrix to be optimized. If the Jth class is the baseline, the logistic regression
model can be written in terms of J – 1 logit transformations as:
\log\!\left( \frac{p_1}{p_J} \right) = \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2 + \cdots + \beta_{k1} x_k

\log\!\left( \frac{p_2}{p_J} \right) = \beta_{02} + \beta_{12} x_1 + \beta_{22} x_2 + \cdots + \beta_{k2} x_k

\vdots

\log\!\left( \frac{p_{J-1}}{p_J} \right) = \beta_{0(J-1)} + \beta_{1(J-1)} x_1 + \beta_{2(J-1)} x_2 + \cdots + \beta_{k(J-1)} x_k  (6.20)
and hence

p_J = \frac{1}{ 1 + \sum_{l=1}^{J-1} e^{\beta_{0l} + \beta_{1l} x_1 + \beta_{2l} x_2 + \cdots + \beta_{kl} x_k} }  (6.21)
The model's prediction is thus the class with maximal probability:

\arg\max_{j} \frac{ e^{\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{kj} x_k} }{ 1 + \sum_{l=1}^{J-1} e^{\beta_{0l} + \beta_{1l} x_1 + \beta_{2l} x_2 + \cdots + \beta_{kl} x_k} }  (6.22)
and the optimal weight matrix β * can in turn be estimated using the maximum likelihood method
(Hosmer and Lemeshow, 2000).
In feature extraction, logistic regression is commonly used to reduce dimensionality of the input
feature space by extracting the most relevant features based on the predicted probability of each
feature class. Cheng et al. (2006) developed a systematic approach based on logistic regression
for the feature selection and classification of remotely sensed images. The experimental results
performed on both multispectral (Landsat ETM+) and hyperspectral (Airborne Visible/Infrared
Imaging Spectrometer) images showed that the logistic regression enabled the reduction of the
number of features substantially without any significant decrease in the classification accuracy.
Similar work can be also found in Khurshid and Khan (2015). In addition, logistic regression can
be further extended to structured sparse logistic regression by adding a structured sparse constraint
(Qian et al., 2012). On the other hand, more advanced regression analyses can be conducted through either data mining or machine learning, although the implementation of these methods is not straightforward, they are always physically meaningful.
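A hedged sketch of multinomial logistic regression with the Statistics and Machine Learning Toolbox is shown below; mnrfit estimates the weight matrix by maximum likelihood (with the last class as the baseline), and mnrval evaluates the class probabilities of Equation 6.19. The synthetic data are placeholders.

X = rand(300, 5);                 % 300 samples, 5 features (synthetic)
Y = randi(3, 300, 1);             % 3 classes; class 3 acts as the baseline
B = mnrfit(X, Y);                 % maximum likelihood estimate of the weights
P = mnrval(B, X);                 % per-class probabilities (Equation 6.19)
[~, Ypred] = max(P, [], 2);       % class with maximal probability (Equation 6.22)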
6.2.6 Linear Transformation
One of the essential tasks of feature extraction is to detect and extract a set of the most relevant
features from the original data set to reduce the dimensionality. Toward such a goal, many techniques
have been developed. The most popular methods are those that attempt to project or decompose the
original data inputs into a set of components, and then only the most relevant components will be
extracted and used for dimensionality reduction purposes. In this section, three popular methods
working with such a principle will be introduced, including principal component analysis, linear discriminant analysis, and wavelet transform.
In principal component analysis (PCA), the original data are linearly projected onto a set of orthogonal weight vectors W_k, and the score of the ith observation on the kth component is given by:

t_k^{(i)} = W_k^T x_i  (6.23)
In order to maximize the variance, the first weight W1 thus has to satisfy the following condition:
W_1 = \arg\max_{\|W\|=1} \left\{ \sum_i (x_i \cdot W)^2 \right\}  (6.24)

W_1 = \arg\max_{\|W\|=1} \left\{ \|XW\|^2 \right\} = \arg\max_{\|W\|=1} \left\{ W^T X^T X W \right\}  (6.25)
For a symmetric matrix like X^T X, this maximization is solved by finding the largest eigenvalue of the matrix, as W_1 is the corresponding eigenvector. Once W_1 is obtained, the first principal component can
be derived by projecting the original data matrix X onto the W1 in the transformed space. The
further components can be acquired in a similar manner after subtracting the previously derived
components.
Since the number of principal components is usually determined by the number of significant
eigenvalues with respect to the global covariance matrix, the derived components always have a
lower dimension than the original data set (Prasad and Bruce, 2008). These components often
retain as much of the variance in the original dataset as possible. A set of six principal components
derived from a PCA analysis based on Landsat TM multispectral imageries is shown in Figure
6.12. The first two principal components (Figure 6.12a,b) have explained more than 95% of the
variances of these multispectral images. The remaining four components are thus considered to
be noise, which can be discarded for dimensionality reduction. Compared to each individual band
of the original multispectral image, the information content of the first principal component is
more abundant, which makes it a good data source for further data analysis such as classification.
FIGURE 6.12 Principal components derived from the Landsat TM multispectral image shown in Figure 6.6a. A total of six components are shown in (a)–(f) sequentially, with explained variances of 68.5%, 27.2%, 3.3%, 0.6%, 0.3%, and 0.1%, respectively.

Because of this unique feature, PCA has been extensively used in various data analyses for dimensionality reduction, especially in manipulating high-dimensional data sets (Farrell and Mersereau, 2005; Celik, 2009; Lian, 2012). However, several drawbacks and constraints have been
observed associated with PCA, for example, the scaling effects (principal components are not
scale invariant) (Rencher, 2003; Prasad and Bruce, 2008). In recent years, many enhanced PCA
methods have been proposed toward various applications such as kernel PCA (Schölkopf et al.,
1997), scale-invariant PCA (Han and Liu, 2012, 2014), and even more advanced techniques like
independent component analysis (Stone, 2004; Wang and Chang, 2006) and projection pursuit
(Friedman and Tukey, 1974; Chiang et al., 2001).
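A minimal sketch of PCA-based dimensionality reduction of a multispectral image, analogous to Figure 6.12, is given below; the input file name is a placeholder.

img = double(imread('landsat_tm.tif'));   % rows x cols x bands (placeholder)
[r, c, b] = size(img);
X = reshape(img, r*c, b);                 % pixels as observations
[coeff, score, latent] = pca(X);          % loadings, component scores, eigenvalues
explained = 100 * latent / sum(latent);   % percent variance per component
pc1 = reshape(score(:,1), r, c);          % first principal component image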
Linear discriminant analysis (LDA) seeks a linear transformation that best separates predefined classes; each input sample x_i is projected into the discriminant space as:

z_i = f(x_i) = W^T x_i  (6.26)
Analogous to many other feature extraction methods, the transformation matrix W can be
computed by solving an optimization problem in terms of fulfilling a given maximization criterion
of separability among classes. In LDA, this equals finding the best discrimination of the investigated
groups by maximizing the ratio of the interclass variance to intraclass variance to measure the
disparity of the groups (Wurm et al., 2016). The transformation matrix W can be optimized as:
W^* = \arg\max_{W} \frac{ W^T S_b W }{ W^T S_w W }  (6.27)

where S_b denotes the interclass (between-class) variance and S_w the intraclass (within-class) variance, which can be modeled as follows:
S_b = \sum_{k=1}^{C} n_k (u_k - u)(u_k - u)^T  (6.28)

S_w = \sum_{k=1}^{C} \sum_{i=1}^{n_k} (x_{ik} - u_k)(x_{ik} - u_k)^T  (6.29)
where nk is the number of samples in the kth class, u is the mean of the entire training set, uk is the
mean of the kth class, and xik is the ith sample in the kth class.
It is clear that the interclass variance is calculated as the square sum of the dispersion of the mean
discriminant variables of each class (uk) from the mean of all discriminant variable elements, and
the intraclass variance is defined as the square sum of the dispersion of the discriminant variables
of single objects from their class means (Wurm et al., 2016). The solution to Equation 6.27 can be
obtained by solving the following eigenvalue problem:
S_b W = \Lambda S_w W  (6.30)

S_w^{-1} S_b W = \Lambda W  (6.31)
In remote sensing, LDA has been widely used for land cover classification from various remotely
sensed imageries, in particular hyperspectral images because of their high-dimensional space (Liao
et al., 2013; Yuan et al., 2014; Shahdoosti and Mirzapour, 2017). Despite its good performance in
many applications, conventional LDA has the inherent limitation of becoming intractable when
the number of input features exceeds the training samples’ size (Bandos et al., 2009; Shahdoosti
and Mirzapour, 2017). In order to extend the application of LDA to many practical cases, a number
of adaptations have been implemented to conventional LDA, which in turn yields many enhanced
LDA, such as regularized LDA (Bandos et al., 2009), orthogonal LDA (Duchene and Leclercq,
1988), uncorrelated LDA (Bandos et al., 2009), stepwise LDA (Siddiqi et al., 2015), two-dimensional
LDA (Imani and Ghassemian, 2015b), and so on.
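A hedged sketch of LDA in MATLAB uses fitcdiscr, which fits Gaussian classes with a pooled within-class covariance matrix, the classification counterpart of the discriminant criterion in Equation 6.27; the synthetic training data are placeholders.

X   = randn(400, 6);                              % training features (synthetic)
Y   = randi(4, 400, 1);                           % class labels (synthetic)
lda = fitcdiscr(X, Y, 'DiscrimType', 'linear');   % pooled within-class covariance
Ypred = predict(lda, randn(10, 6));               % classify new samples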
In the wavelet transform, a family of basis functions is generated from a single mother wavelet \varphi(t) through scaling and shifting:

\varphi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \varphi\!\left( \frac{t - b}{a} \right)  (6.32)
where a (a > 0) and b are scaling and shifting factors, respectively. It is clear that the wavelet functions will be dilated when a > 1 and contracted when a < 1 relative to the mother wavelet. The 1/\sqrt{a} term is used as a modulation coefficient to normalize the energy of the wavelets (Bruce et al.,
2001). The most popular mother wavelet is the Morlet wavelet (also called Gabor wavelet), because
it is closely related to human perception in both hearing and vision (Bernardino and Santos-Victor,
2005). The Morlet wavelet is modeled as a wavelet composed of a complex exponential multiplied
by a Gaussian window, which can be expressed as:
\varphi_\sigma(t) = c_\sigma \pi^{-1/4} e^{-(1/2)t^2} \left( e^{i\sigma t} - e^{-(1/2)\sigma^2} \right)  (6.33)

c_\sigma = \left( 1 + e^{-\sigma^2} - 2e^{-(3/4)\sigma^2} \right)^{-1/2}  (6.34)
Based on the admissibility criterion, all wavelet functions must oscillate with an average value
of zero and finite support. Given an input signal x(t), the projection onto the subspace of one wavelet
function yields:
x_a(t) = \int_R W\{x, a, b\} \cdot \varphi_{a,b}(t)\, db  (6.35)

W\{x, a, b\} = \langle x, \varphi_{a,b} \rangle = \int_R x(t) \cdot \varphi_{a,b}(t)\, dt  (6.36)
Because of its multi-resolution capability, wavelet transform has been widely used for remotely
sensed data analysis; however, many applications were confined to image compression (e.g., DeVore
et al., 1992; Walker and Nguyen, 2001) and image fusion (e.g., Zhou et al., 1998; Nunez et al.,
1999). Later, wavelet transform was introduced into the feature extraction domain, and was then
extensively used for various practices. A fundamental reason why the wavelet transform is an
excellent tool for feature extraction is its inherent multi-resolution properties, which enable it to
project a signal onto a basis of wavelet functions to separate features at different scales by changing
the scaling and shifting parameters with respect to the features to be extracted (Mallat, 1989; Bruce
et al., 2002). The challenge here is related to how the wavelet coefficients can be interpreted to
represent various features, and a common approach is to compute coefficient distribution over the
selected wavelet functions (Ghazali et al., 2007). In feature extraction practices, wavelet transform
has been widely used for various applications, for example, target detection (Bruce et al., 2001),
dimensionality reduction of hyperspectral image (Bruce et al., 2002), forest mapping (Pu and Gong,
2004), vegetation phenology feature extraction (Martínez and Gilabert, 2009), hyperspectral image
classification (Qian et al., 2013), and so on.
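A minimal sketch of a continuous wavelet transform with the analytic Morlet wavelet is given below, assuming the Wavelet Toolbox is available; the synthetic 1-D signal stands in for, say, a reflectance spectrum.

x = cumsum(randn(1, 512));    % synthetic 1-D signal (placeholder)
[wt, f] = cwt(x, 'amor');     % 'amor' selects the analytic Morlet wavelet
imagesc(abs(wt));             % coefficient magnitudes across scales and positions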
6.2.7 Probabilistic Methods

The maximum likelihood classifier (MLC) assigns each pixel to the class with which it has the highest probability of membership. Mathematically, the likelihood that a pixel with feature vector X belongs to class k can be defined as a posterior probability (Ahmad and Quegan, 2012):

P(k|X) = \frac{ P(k)\, P(X|k) }{ P(X) }  (6.37)
where P( X |k ) is the conditional probability to observe X from class k (or probability density function).
P(k) is the prior probability of class k, the values of which are usually assumed to be equal to each
other due to the lack of sufficient reference data. P(X) is the probability that the X is observed, which
can be further written as follows:
P(X) = \sum_{k=1}^{N} P(k)\, P(X|k)  (6.38)
where N is the total number of classes. Commonly, P(X) is assumed to be a normalization constant in order to ensure \sum_{k=1}^{N} P(k|X) sums to 1 (Ahmad and Quegan, 2012). A pixel x will be assigned to the class k once it satisfies the following criterion:

P(k|X) > P(j|X) \quad \text{for all } j \neq k  (6.39)
For mathematical reasons, MLC often assumes the distribution (or probability density function) of the data in a given class to be a multivariate Gaussian distribution; the likelihood can then be
expressed as follows:
P(k|X) = \frac{1}{ (2\pi)^{N/2} |\Sigma_k|^{1/2} } \exp\!\left[ -\frac{1}{2} (X - u_k)\, \Sigma_k^{-1} (X - u_k)^T \right]  (6.40)
where N is the number of data sets (e.g., bands of a multispectral image), X is the whole data set of
N bands, uk is the mean vector of class k, and Σk is the variance-covariance matrix of class k. |Σk | is
thus the determinant of Σk .
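A minimal MATLAB sketch of this decision rule is given below. It assumes equal class priors, so maximizing the likelihood of Equation 6.40 also maximizes the posterior of Equation 6.37; the function and variable names are our own illustrative choices:

function yhat = ml_classify(Xtrain, ytrain, Xtest)
% Gaussian maximum likelihood classification (Eqs. 6.37 through 6.40).
% Xtrain: M-by-N training pixels (N bands); ytrain: labels 1..K; Xtest: P-by-N.
K = max(ytrain);
N = size(Xtrain, 2);
logL = zeros(size(Xtest, 1), K);
for k = 1:K
    Xk = Xtrain(ytrain == k, :);           % training pixels of class k
    mu = mean(Xk, 1);                      % mean vector u_k
    S  = cov(Xk);                          % variance-covariance matrix Sigma_k
    D  = Xtest - mu;                       % deviations from the class mean
    logL(:, k) = -0.5*N*log(2*pi) - 0.5*log(det(S)) ...
                 - 0.5*sum((D / S) .* D, 2);   % log of Eq. 6.40
end
[~, yhat] = max(logL, [], 2);              % assign the most likely class
end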
Owing to its grounding in probability theory, MLC has been widely used in remote sensing for
classification. Applications include, but are not limited to, forest encroachment mapping (Tiwari
et al., 2016), rice crop mapping (Chen et al., 2011), land cover change detection (Otukei and Blaschke,
2010), salt farm mapping (Hagner and Reese, 2007), and water quality mapping (Jay and Guillaume,
2014). The performance of MLC has been compared thoroughly with many other classification
methods in the literature, such as decision trees, logistic regression, artificial neural networks, and
support vector machines; more details can be found in Frizzelle and Moody (2001), Hagner and
Reese (2007), Kavzoglu and Reis (2008), and Hogland et al. (2013). Further investigations showed
that MLC may be ineffective in some cases, for example, when classifying spectrally similar categories
or classes containing subclasses (Kavzoglu and Reis, 2008). To address these problems, methods
such as PCA can be used to aid the classification process. In addition, many extended MLC
methods have been developed, such as hierarchical MLC (Ediriwickrema and Khorram, 1997) and
calibrated MLC (Hagner and Reese, 2007).
In the Naive Bayes classifier, a feature vector X is assigned to class Ck once it yields the largest
posterior probability P(Ck|X). Based on Bayes' theorem, the conditional probability P(Ck|X) can
be calculated as:

P(Ck|X) = P(X|Ck)P(Ck) / P(X)    (6.41)

where P(Ck) is the prior probability of class Ck, P(X|Ck) is the likelihood (or conditional probability)
of the feature vector X falling into class Ck, and P(X) is the prior probability of the predictor X.
Since P(X) is independent of the class vector C and the feature values, it is effectively a constant.
The critical part of calculating the conditional probability therefore lies in estimating the prior
probability P(Ck) and the class-conditional probability P(X|Ck). In practice, the prior probability
P(Ck) can be estimated from the training dataset as the proportion of training samples carrying the
class label Ck:
P(Ck) = N_Ck / N    (6.42)
where N is the total number of training samples and N_Ck is the number of training samples with
class label Ck.
The conditional probability P(X|Ck) of a feature vector X = (x1, x2, …, xp) can be defined as a
joint probability as follows:

P(X|Ck) = P(x1, x2, …, xp | Ck)    (6.43)

To simplify the computation, Naive Bayes assumes that the presence of a feature in a given class
is conditionally independent of the other features:

P(xi | Ck, x1, …, xi−1, xi+1, …, xp) = P(xi | Ck)    (6.44)

so that the posterior is proportional to a product of per-feature conditional probabilities:

P(Ck|X) ∝ P(Ck) ∏_(i=1)^p P(xi|Ck)    (6.45)
Based on the independence assumption, the conditional probability P(Ck|X) can be further
written as:

P(Ck|X) = (P(Ck) / P(X)) ∏_(i=1)^p P(xi|Ck)    (6.46)
In practice, the Naive Bayes classifier can handle both discrete and continuous variables (Chang
et al., 2012). If X contains a finite number of discrete features xi, then P(xi|Ck) can be estimated as
the proportion of training samples within class Ck that take the value xi:

P(xi|Ck) = N_Ck(xi) / N_Ck    (6.47)
If xi is a continuous variable, P(xi|Ck) is commonly modeled with a Gaussian distribution:

P(xi|Ck) = (1 / √(2π σ_Ck²)) exp(−(xi − µ_Ck)² / (2σ_Ck²))    (6.48)

where µ_Ck and σ_Ck are the mean and standard deviation of feature xi within class Ck; these
parameters can be estimated directly from the training dataset.
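The pieces above assemble into a complete classifier. The following MATLAB sketch of a Gaussian Naive Bayes estimates the priors as in Equation 6.42, fits per-feature Gaussians, and assigns each sample to the class with the largest posterior; the function and variable names are our own, chosen for illustration only:

function yhat = nb_classify(Xtrain, ytrain, Xtest)
% Gaussian Naive Bayes: priors per Eq. 6.42, independent per-feature
% Gaussian likelihoods, and the factorized posterior of Eq. 6.46.
K = max(ytrain);
logpost = zeros(size(Xtest, 1), K);
for k = 1:K
    Xk = Xtrain(ytrain == k, :);           % training samples of class Ck
    prior = size(Xk, 1) / numel(ytrain);   % P(Ck) = N_Ck / N  (Eq. 6.42)
    mu = mean(Xk, 1);                      % per-feature means
    sd = std(Xk, 0, 1);                    % per-feature standard deviations
    ll = -0.5*log(2*pi*sd.^2) - ((Xtest - mu).^2) ./ (2*sd.^2);
    logpost(:, k) = log(prior) + sum(ll, 2);   % log of Eq. 6.46 up to log P(X)
end
[~, yhat] = max(logpost, [], 2);           % class with largest posterior
end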
In remote sensing, the Naive Bayes classifier is another popular probabilistic method for
classification problems. Its popularity rests on several properties (Wu et al., 2008). First, the Naive
Bayes classifier does not require a complicated iterative parameter estimation scheme; it is therefore
easy to construct and can handle very large datasets. Second, the Bayesian scheme makes the
classification process easy to understand and interpret, even for users unskilled in classifier
technology. Finally, despite its simplicity, it often delivers competitive classification accuracy.
In practice, applications of the Naive Bayes classifier remain relatively few compared with methods
such as MLC. In past years, the Naive Bayes classifier has been successfully used for multi-label
learning (Zhang et al., 2009), image classification (Liu et al., 2011), text classification (Feng et al.,
2015), and so forth.
6.3 SUMMARY
In this chapter, a suite of feature extraction methods based on statistics and decision science principles
was introduced, focusing primarily on their theoretical foundations, with some illustrative examples
of practical applications. More specifically, the methods discussed in this chapter include filtering
operations, morphology, decision trees, clustering algorithms, linear regression, PCA, the wavelet
transform, MLC, and the Naive Bayes classifier. All of these techniques have been used extensively in
remote sensing for feature extraction, mainly for dimensionality reduction and feature selection.
In the next chapter, a set of artificial intelligence-based methods that are widely applied for feature
extraction will be described in detail.
REFERENCES
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P., 2005. Automatic subspace clustering of high
dimensional data. Data Mining and Knowledge Discovery, 11, 5–33.
Ahmad, A. and Quegan, S., 2012. Analysis of maximum likelihood classification on multispectral data.
Applied Mathematical Sciences, 6, 6425–6436.
Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander, J., 1999. OPTICS: Ordering points to identify the
clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on
Management of Data, 49–60, Pennsylvania, USA.
Aragón-Calvo, M. A., Jones, B. J. T., van de Weygaert, R., and van der Hulst, J. M., 2007. The multiscale
morphology filter: identifying and extracting spatial patterns in the galaxy distribution. Astronomy &
Astrophysics, 474, 315–338.
Bailey, K., 1994. Numerical taxonomy and cluster analysis. In: K. D. Bailey (Ed.) Typologies and Taxonomies:
An Introduction to Classification Techniques, SAGE Publications Ltd., Thousand Oaks, California,
USA, 34, 24.
Bandos, T., Bruzzone, L., and Camps-Valls, G., 2009. Classification of hyperspectral images with regularized
linear discriminant analysis. IEEE Transactions on Geoscience and Remote Sensing, 47, 862–873.
Bastarrika, A., Chuvieco, E., and Martín, M. P., 2011. Mapping burned areas from Landsat TM/ETM+
data with a two-phase algorithm: Balancing omission and commission errors. Remote Sensing of
Environment, 115, 1003–1012.
Belgiu, M. and Drăguţ, L., 2016. Random forest in remote sensing: A review of applications and future
directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31.
Benediktsson, J. A., Palmason, J. A., and Sveinsson, J. R., 2005. Classification of hyperspectral data from
urban areas based on extended morphological profiles. IEEE Transactions on Geoscience and Remote
Sensing, 43, 480–491.
Bernardino, A. and Santos-Victor, J., 2005. A real-time Gabor primal sketch for visual attention. In: 2nd
Iberian Conference on Pattern Recognition and Image Analysis, 335–342, Estoril, Portugal.
Breiman, L., 2001. Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J., 1984. Classification and Regression Trees.
Wadsworth & Brooks/Cole Advanced Books & Software.
Brekke, C. and Solberg, A. H. S., 2005. Oil spill detection by satellite remote sensing. Remote Sensing of
Environment, 95, 1–13.
Bruce, L. M., Koger, C. H., and Li, J., 2002. Dimensionality reduction of hyperspectral data using discrete
wavelet transform feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 40,
2331–2338.
Bruce, L. M., Morgan, C., and Larsen, S., 2001. Automated detection of subpixel hyperspectral targets with
continuous and discrete wavelet transforms. IEEE Transactions on Geoscience and Remote Sensing,
39, 2217–2226.
Cattell, R. B., 1943. The description of personality: Basic traits resolved into clusters. Journal of Abnormal
Psychology, 38, 476–506.
Celik, T. 2009. Unsupervised change detection in satellite images using principal component analysis and
k-Means clustering. IEEE Geoscience and Remote Sensing Letters, 6, 772–776.
Chan, J. C. W. and Paelinckx, D., 2008. Evaluation of Random Forest and Adaboost tree-based ensemble
classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery.
Remote Sensing of Environment, 112, 2999–3011.
Chang, N.-B., Han, M., Yao, W., and Chen, L.-C., 2012. Remote sensing assessment of coastal land
reclamation impact in Dalian, China, using high-resolution SPOT images and support vector machine.
In: Environmental Remote Sensing and Systems Analysis. CRC Press, Boca Raton, FL, USA, 249–276.
Chaudhuri, D., Kushwaha, N. K., Samal, A., and Agarwal, R. C., 2016. Automatic building detection from
high-resolution satellite images based on morphology and internal gray variance. IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing, 9, 1767–1779.
Chen, W., Li, X., Wang, Y., Chen, G., and Liu, S., 2014a. Forested landslide detection using LiDAR data and
the random forest algorithm: A case study of the Three Gorges, China. Remote Sensing of Environment,
152, 291–301.
Ch