0% found this document useful (0 votes)
35 views73 pages

Gis Algorithms SRM

The document is a book titled 'GIS Algorithms: Theory and Applications for Geographic Information Science & Technology' by Ningchuan Xiao, published in 2016. It focuses on critical algorithms essential for Geographic Information Systems (GIS), aiming to provide a deeper understanding of spatial data processing and algorithm implementation using Python. The book covers various topics including geometric algorithms, spatial indexing, and spatial analysis, while emphasizing the importance of transparency in data and processes.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views73 pages

Gis Algorithms SRM

The document is a book titled 'GIS Algorithms: Theory and Applications for Geographic Information Science & Technology' by Ningchuan Xiao, published in 2016. It focuses on critical algorithms essential for Geographic Information Systems (GIS), aiming to provide a deeper understanding of spatial data processing and algorithm implementation using Python. The book covers various topics including geometric algorithms, spatial indexing, and spatial analysis, while emphasizing the importance of transparency in data and processes.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Sage Research Methods

GIS Algorithms: Theory and Applications for


Geographic Information Science & Technology

For the most optimal reading experience we recommend using our website.
[Link]

Author: Ningchuan Xiao


Pub. Date: 2016
Product: Sage Research Methods
DOI: [Link]
Methods: Algorithms, Nodes, Data mining
Keywords: trees, distance, regions, indexing, indexes, neighbors, coordinate systems
Disciplines: Geography, Technology, Computer Science
Sage Sage Research Methods
© Ningchuan Xiao 2016

Access Date: January 25, 2025


Publisher: SAGE Publications Ltd
City: 55 City Road
Online ISBN: 9781473921498

© 2016 SAGE Publications Ltd All Rights Reserved.

GIS Algorithms: Theory and Applications for Geographic Information


Page 2 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Front Matter

• Copyright
• Acknowledgements
• Acknowledgements
• About the Author
• Preface

Chapters

• Chapter 1 | Introduction
• 1 | Geometric Algorithms
◦ Chapter 2 | Basic Geometric Operations
◦ Chapter 3 | Polygon Overlay
• 2 | Spatial Indexing
◦ Chapter 4 | Indexing
◦ Chapter 5 | k-D Trees
◦ Chapter 6 | Quadtrees
◦ Chapter 7 | Indexing Lines and Polygons
• 3 | Spatial Analysis and Modeling
◦ Chapter 8 | Interpolation
◦ Chapter 9 | Spatial Pattern and Analysis
◦ Chapter 10 | Network Analysis
◦ Chapter 11 | Spatial Optimization
◦ Chapter 12 | Heuristic Search Algorithms

Back Matter

• Postscript
• Appendix A Python: A Primer
• Appendix B GDAL/OGR and PySAL
• Appendix C Code List
• References

GIS Algorithms: Theory and Applications for Geographic Information


Page 3 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Copyright

GIS Algorithms: Theory and Applications for Geographic Information


Page 4 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

SAGE Publications Ltd

1 Oliver’s Yard

55 City Road

London EC1Y 1SP

SAGE Publications Inc.

2455 Teller Road

Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd

B 1/I 1 Mohan Cooperative Industrial Area

Mathura Road

New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd

3 Church Street

#10-04 Samsung Hub

Singapore 049483

© Ningchuan Xiao 2016

First published 2016

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted
under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or
transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the
case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing
Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

GIS Algorithms: Theory and Applications for Geographic Information


Page 5 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Library of Congress Control Number: 2015940434

British Library Cataloguing in Publication data

A catalogue record for this book is available from the British Library

ISBN 978-1-4462-7432-3

ISBN 978-1-4462-7433-0 (pbk)

Editor: Robert Rojek

Editorial assistant : Matt Oldfield

Production editor: Katherine Haw

Copyeditor: Richard Leigh

Proofreader: Richard Hutchinson

Indexer: Bill Farrington

Marketing manager: Michael Ainsley

Cover design: Francis Kenney

Typeset by: C&M Digitals (P) Ltd, Chennai, India

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

GIS Algorithms: Theory and Applications for Geographic Information


Page 6 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Acknowledgements

Geographic Information Science and Technology (GIST) is enjoying profound innovation in its research and
development. SAGE Advances in GIST will provide students with learning resources to support and enrich
their interest in the workings of Geographic Information Science and Technology. These highly visual and
highly applied texts will promote curiosity about developments in the field, as well as provide focused and
up-to-date tools for doing GIS research.

Series edited by Mei-Po Kwan, Department of Geography, University of California, Berkeley

Also in the GIST series:

Spatial Statistics and Geostatistics

Yongwan Chun and Daniel A. Griffith

GIS Algorithms: Theory and Applications for Geographic Information


Page 7 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Acknowledgements

To Alester, for understanding everything.

GIS Algorithms: Theory and Applications for Geographic Information


Page 8 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

About the Author

Ningchuan Xiao is an associate professor in the Department of Geography at the Ohio State University. He
has taught a wide range of courses in cartography, GIS, and spatial analysis and modeling. He previously
served as chair of the Spatial Analysis and Modeling Specialty Group of the Association of American
Geographers from 2009 to 2012. Dr Xiao’s research focuses on the development of effective and efficient
computational methods for mapping and analyzing spatial and temporal data in various application domains,
including spatial optimization, spatial decision support systems, environmental and ecological modeling, and
spatial epidemiology. His research has been published in leading journals in geography and GIScience. His
current projects include designing and implementing novel approaches to analyzing and mapping big data
from social media and other online sources, and developing search algorithms for solving spatial aggregation
problems. He is also working with interdisciplinary teams on projects to map the impacts of human mobility on
transmission of infectious diseases and to model the impacts of environment on social dynamics in the Far
North Region of Cameroon.

GIS Algorithms: Theory and Applications for Geographic Information


Page 9 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Preface

Geographic information systems (GIS) have become increasingly important in helping us understand complex
social, economic, and natural dynamics where spatial components play a key role. In the relatively short
history of GIS development and applications, one can often observe an alienating process that separates GIS
as a “machine” from its human users. In some cases, GIS have been reduced to a black box that can be
used to generate pleasant mapping products serving various purposes. This trend has already encroached
on our GIS education programs where a noticeable portion of teaching has focused on training students how
to use the graphical user interface of GIS packages. Addressing this situation presents a great challenge to
GIS researchers and educators.

In this book we focus on critical algorithms used in GIS that serve as a cornerstone in supporting the many
operations on spatial data. The field of GIS has always been diverse, and many textbooks typically cover
the general topics without providing in-depth discussion for students to fully understand the context of GIS
concepts. Algorithms in GIS are often presented in different ways using different data structures, and the lack
of a coherent representation has made it difficult for students to digest the essence of algorithms. Students
in GIS classes often come from a variety of research and educational backgrounds and may not be fully
comfortable with the terms used in traditional and formal algorithm description. All of these have seemingly
made algorithms a difficult topic to teach in GIS classes. But they should not be excuses for us to avoid
algorithm topics. By examining how spatial data are input into an algorithm and how the algorithm is used to
process the data to yield the output, we can gain substantial understanding of two important components of
GIS: what geospatial data actually are and how these data are actually processed.

This book covers algorithms that are critical in implementing some major GIS functions. The goal, however,
is not to go over an exhaustive and long list of algorithms. Instead, we only include algorithms that either are
commonly used in today’s GIS or have a significant influence on the development of the current algorithms;
as such the choice of topics may appear to be subjective. We look at geospatial data from a minimalist
perspective by simply viewing them as being spatial and locational, meaning that we focus on the coordinates
using an atomic view of geographic information where most geospatial data can be understood as collections
of points. In this sense, we boil down a lot of unnecessary discussion about the difference between vector
and raster to a fundamental data model. Starting from there, we dive into the diversity of GIS algorithms that
can be used to help us carry out some fundamental functions: measuring important spatial properties such as
distance, incorporating multiple data sources using overlay, and speeding up analysis using various indexing
techniques. We also go to certain lengths to cover algorithms for spatial analysis and modeling tasks such as

GIS Algorithms: Theory and Applications for Geographic Information


Page 10 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

interpolation, pattern analysis, and decision-making using optimization models. Admittedly all these functions
are available in many GIS packages, open source or commercial. However, our goal is not to duplicate what
is available out there, but to show how things out there work so that we can implement our own GIS, or at
least GIS functions, without relying on a software system labeled as “GIS.” To gain such freedom, we must go
to the level where data are transparent not packaged, processes are outlined as code not as buttons to click
on, and results are as transparent as input.

This is not a traditional book on algorithms. In a typical computer science book one would expect algorithms
to be presented using pseudo-code that is assumed to concentrate on the critical parts of algorithms but to
ignore some of the details. The pseudo-code approach, however, is not suitable for many of those who study
GIS because the theoretical aspects of the algorithms are often not the main concern. Instead, the drive to
understand algorithms in GIS typically comes from the desire to know “how stuff works.” For this reason,
real, working code is used in this book to present and implement the algorithms. We specifically use Python
as the language in this book mainly because of its simplistic programming style that eliminates much of the
overhead in other programming languages. Careful thought has been given to balancing the use of powerful
but otherwise “packaged” Python modules in order to show the actual mechanism in the algorithms. With that,
we also maintain a coherent geospatial data representation and therefore data structures, starting from the
beginning of the book.

This book would not be in the form it is without help. I am deeply grateful to the open source community that
has tremendously advanced the way we think about and use spatial data today, and in this sense, this book
would not even exist without open source developments. I have on countless occasions consulted with the
Stack Overflow sites for my Python questions. The online LaTeX community ([Link]
and [Link] always had answers to my typesetting questions during the many
all-nighters spent writing this book. Some open source packages are especially crucial to many parts of
the book, including those on k-D trees ([Link] and [Link]
python-kdtree/) and kriging ([Link] My thanks are due to students
in many of the classes I have taught in the past few years in the United States and in China. Their feedback
and sometimes critiques have enabled me to improve my implementations of many of the algorithms. I thank
Mei-Po Kwan for her support, and Richard Leigh, Katherine Haw and Matthew Oldfield for carefully handling
the manuscript. Detailed comments from a number of reviewers of an earlier version of this book have greatly
helped me enhance the text. Finally, I would like to thank my family for their patience and support during the
writing process.

Ningchuan Xiao−82.8650°, 40.0556°December 2014

GIS Algorithms: Theory and Applications for Geographic Information


Page 11 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Introduction

Algorithms1 are designed to solve computational problems. In general, an algorithm is a process that contains
a set of well designed steps for calculation. For example, to correctly calculate the sum of 18 and 19, one
must know how to deal with the fact that 8 plus 9 is more than 10, though different cultures have different
ways of processing that. Even for a simple problem like this, we expect that the steps used can help us get
the answer quickly and correctly. There are many problems that are more difficult than the simple problem
of addition, and solving these problems, again efficiently and correctly, requires more careful design of the
computational steps.

In GIS development and applications, algorithms are important in almost every aspect. When we click on a
map, for example, we expect a quick response from the computer system so that we can pull out relevant
information about the point or area we just clicked on. Such a fundamental daily routine for almost every GIS
application involves a variety of algorithms to ensure a satisfying response. It starts from searching for the
object (point, line, polygon, or pixel) underneath the clicking point. An efficient search algorithm will allow us
to narrow down to the area of interest quickly. While a brute-force approach may work by physically checking
every object in our data, it will not be useful for a large data set that will make the time of finding the target
impractically long. Many spatial indexing and query algorithms are designed to address this issue. While the
search is ongoing, we must check whether the object in our data matches the point clicked. For polygons,
we must decide whether the click point is within a polygon in our data, which requires a special algorithm
to quickly return a yes or no answer to decide whether the point is in the polygon. Geospatial data normally
come from many different sources, and it has been common practice to transform them into the same coor-
dinate system so that different data sets can be processed consistently. Another common application of the
multiple data sources is to overlay them to make the information more useful together.

There are many aspects of an algorithm to be examined. It is straightforward to require that an algorithm
solve the problem correctly. For some algorithms, it is easy to prove their correctness. For example, we will
introduce two search algorithms later in this chapter, and their correctness should be quite straightforward.
Other algorithms, however, are not so obvious, and proving their correctness will require more formal analy-
sis. A second feature of algorithms is their efficiency or running time. Of course we always want an algorithm
to be fast, but there are theoretical limits on how fast or efficient an algorithm can be, as determined by the
problem. We will discuss some of those problems at the end of the book under topics of spatial optimization.

GIS Algorithms: Theory and Applications for Geographic Information


Page 12 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Besides correctness and running time, algorithms are often closely related to how the data are organized to
enable the processes and how the algorithms are actually implemented.

1.1 Computational concerns for algorithms

Let us assume we have a list of n points and the list does not have any order. We want to find a point from the
list. How long will we take to find the point? This is a reasonable question. But the actual time is highly related
to a lot of issues such as the programming language, the skill of the person who codes the program, the plat-
form, the speed and number of the CPUs, and so on. A more useful way to examine the time issue is to know
how many steps we need to finish the job, and then we analyze the total cost of performing the algorithm in
terms of the number of steps used. The cost of each step is of course variable and is dependent on what
constitutes a step. Nevertheless, it is still a more reliable way of thinking about computing time because many
computational steps, such as simple arithmetic operations, logical expressions, accessing computer memo-
ry for information retrieval, and variable value assignment, can be identified and they only cost a constant
amount of time. If we can figure out a way to count how many steps are needed to carry out a procedure,
we will then have a pretty good idea about how much time the entire procedure will cost, especially when we
compare algorithms.

Returning to our list of points, if there is no structure in the list – the points are stored in an arbitrary order
– the best we can do to find a point from the list is to test all the points in the list, one by one, until we can
conclude it is in or not in the list. Let us assume the name of the list is points and we want to find if the list
includes point p0. We can use a simple algorithm to do the search (Listing 1.1).

The algorithm in Listing 1.1 is called a linear search; in it we simply go through all the points, if necessary, to
search for the information we need. How many steps are necessary in this algorithm? The first line is a loop
and, because of the size of the list, it will run as many as n times when the item we are looking for happens
to be the last one in the list. The cost of running just once in the loop part in line 1 is a constant because the
list is stored in the computer memory, and the main operation steps here are to access the information at a
fixed location in the memory and then to move on to the next item in the memory. Suppose that the cost is c1

and we will run it up to to n times in the loop. The second line is a logic comparison between two points. It will

GIS Algorithms: Theory and Applications for Geographic Information


Page 13 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

run up to n times as well because it is inside the loop. Suppose that the cost of doing a logic comparison is c2

and it is a constant too. Line 3 simply returns the value of the point found; it has a constant cost of c and it will
only run once. For the best case scenario, we will find the target at the first iteration of the loop and therefore
the total cost is simply c1 + c2 + c, which can be generalized as a constant b + c. In the worst case scenario,

however, we will need to run all the way to the last item in the list and therefore the total cost becomes c1n +

c2n + c, which can be generalized as bn + c, where b and c are constants, and n is the size of the list (also

the size of the problem). On average, if the list is a random set of points and we are going to search for a
random point many times, we should expect a cost of c1n/2 + c2n/2 + c, which can be generalized as b′n + c,

and we know b′ < b, meaning that it will not cost as much as the worse case scenario does.

How much are we interested in the actual values of b, b′, and c in the above analysis? How will these values
impact the total computation cost? As it turns out, not much, because they are constants. But adding them up
many times will have a real impact and n, the problem size, generally controls how many times these constant
costs will be added together. When n reaches a certain level, the impact of the constants will become minimal
and it is really the magnitude of n that controls the growth of the total computation cost.

Some algorithms will have a cost related to n2, which is significantly different from the cost of n. For example,
the algorithm in Listing 1.2 is a simple procedure to compute the shortest pairwise distance between two
points in the list of n points. Here, the first loop (line 2) will run n times at a cost of t1 each, and the second

loop (line 3) will run exactly n2 times at the same cost of t1 each. The logic comparison (line 4) will run n2

times and we assume each time the cost is t2. The calculation of distance (line 5) will definitely be more costly

than the other simple operations such as logic comparison, but it is still a constant as the input is fixed (with
two points) and only a small finite set of steps will be taken to carry out the calculation. We say the cost of
each distance calculation is a constant t3. Since we do not compute the distance between the point and itself,

the distance calculation will run n2–n times, as will the comparison in line 6 (with time t4). The assignment in

line 7 will cost a constant time of t5 and may run up to n2–n times in the worst case scenario where every

next distance is shorter than the previous one. The last line will only run once with a time of c. Overall, the

total time for this algorithm will be t1n + t1n2 + t2n2 + t3 (n2−n) + t4 (n2−n) + t5 (n2 − n) + c, which can be

generalized as an2 + bn + c. Now it should be clear that this algorithm has a running time that is controlled by

n2.

GIS Algorithms: Theory and Applications for Geographic Information


Page 14 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

In the two example algorithms we have examined so far, the order of n indicates the total cost and we say
that our linear search algorithm has a computation cost in the order of n and the shortest pairwise distance al-

gorithm in the order of n2. For the linear search, we also know that, when n increases, the total cost of search
will always have an upper bound of bn. But is there a lower bound? We know the best case scenario has a

running time of a constant, or in the order of n0, but that does not apply to the general case. When we can
definitely find an upper bound but not a lower bound of the running time, we use the O-notation to denote the
order. In our case, we have O(n) for the average case, and the worst case scenario as well (because again
the constants do not control the total cost). In other words, we say that the running time, or time complexity,
of the linear search algorithm is O(n). Because the O-notation is about the upper bound, which is meant to be
the worst case scenario, we also mean the time complexity of the worst case scenario.

There are algorithms for which we do not have the upper bound of their running time. But we know their lower
bound and we use the Ω-notation to indicate that. A running time of Ω(n) would mean we know the algorithm
will at least cost an order of n in its running time, though we do not know the upper bound of the running time.
For other algorithms, we know both upper and lower bounds of the running time and we use the Θ-notation

to indicate that. For example, a running time of Θ(n2) indicates that the algorithm will take an order of n2 in
running time in all cases, best and worst. This is the case for our shortest distance algorithm because the

process will always run n2 times, regardless of the outcome of the comparison in line 6. It is more accurate

to say the time complexity is Θ(n2) instead of O(n2) because we know the lower bound of the running time of

pairwise shortest distance is always in the order of n2.

Now we reorganize our points in the previous list in a particular tree structure as illustrated in Figure 1.1. This
is a binary tree because each node on the tree can have at most two branches, starting from the root. Here
the root of the tree stores point (6, 7) and we show it at the top. All the points with X coordinates smaller than
or equal to that at the root are stored in the left branches of the root and those with X coordinates greater

GIS Algorithms: Theory and Applications for Geographic Information


Page 15 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

than that of the root point are stored in the right branches of the root. Going down the tree to the second level,
we have two points there, (4, 6) and (9, 4). For each of these points, we make sure that the rest of the points
will be stored on the left if they have a smaller or equal Y coordinate value, and on the right if greater. We
alternate the use of X and Y coordinates going down the tree, until we find the point we are looking for or
reach the end of a branch (a leaf node).

To use the tree structure to search for a point, we start from the root (we always start from the root for a tree
structure) and go down the tree by determining which branch to proceed along using the appropriate coordi-
nate at each level of the tree. For example, to search for a target point of (1, 7), we first go to the left branch
of the root because the X coordinate of our target point is 1, smaller than that in the root. Then we go the the
right branch of the second level node of (4, 6) because the Y coordinate (7) is greater than that in the node.
Now we reach the node of (3, 7) at the third level of the tree and we will go to the left branch there because
the target X coordinate is smaller than that in the node. Finally, we reach the node (2, 9) and we will move to
its left branch because the Y coordinate in the target is smaller than that in the node. In sum, given a tree, we
can write the algorithm in Listing 1.3 to fulfill such a search strategy using a tree structure.

Figure 1.1 A tree structure that stores 29 random points. Each node of the tree is labeled us-
ing a point with X and Y coordinates that range from 0 to 10

This is called a binary search, using the tree structure. Based on our discussion about running time, it is

GIS Algorithms: Theory and Applications for Geographic Information


Page 16 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

straightforward to see that the running time of this search algorithm is determined by the number of times we
have to run the while loop (line 2), which is determined by the height of the tree, as defined by the number of
edges from the root to the farthest leaf node. The above tree has a height of 4 and can hold up to 31 points

(we only have 29 here). In general, for a binary tree with a height of H, we can store up to 20 + 21 + 22 + … +

2H = 2H+1 − 1 items. In other words, if we have n points in total that fill all the nodes in a perfectly balanced

binary tree where all the leaf nodes are at exactly the same level, we have 2H+1 − 1 = n and hence H =
log2 (n + 1) − 1. In this case, when such a tree is given, we have a running time of the order of log2 (n+1),

which is at the same order as log2 n, and we say the running time is O(log2 n). For a balanced but not perfect

tree, meaning the difference between the heights of all leaf nodes is at most 1, we can still achieve a running
time of O(log2 n) since the farthest leaf node has a height of H. When a balanced tree cannot be guaranteed,

however, things can get worse. An example of an unbalanced tree is given in Figure 1.2 where we have ex-
actly the same points but the tree has a height of 8. Therefore, we know the binary search algorithm is more
efficient than linear search because O(log2 n) < O(n), but the actual running time is dependent on how the

tree is constructed and can be longer than O(log2 n) if the tree is unbalanced.

Figure 1.2 An unbalanced tree structure to store 29 random points

Since using a balanced binary tree to store the points can greatly improve the efficiency of search, will that
also help to reduce the running time of calculating the shortest pairwise distance? The brute-force approach

we discussed here has a running time of Θ(n2), which grows quickly as n increases. It would be reasonable
to look for a more efficient way to get the shortest pairwise distance in a list of points. The answer to our
question is yes, and the key lies in the use of tree structures. We will continue to explore this topic in the sec-
ond part of the book where we discuss different tree structures in more depth. Throughout the book, we do
not focus much on the theoretical analysis of running time for each algorithm. Instead, we will conduct more
empirical analysis by actually running the algorithms on different data sets.

GIS Algorithms: Theory and Applications for Geographic Information


Page 17 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

The discussion about search and especially binary search on a tree logically leads to the topic of data struc-
ture: how we store and organize data to facilitate the procedures in an algorithm. A tree structure is a good
example of how the original data stored in a list can be reorganized to achieve better search performance.
Many data structures are problem-specific. Some data structures can be complicated, but the increase in
storage is often compensated by a decrease in running time.

1.2 Coding

Algorithms can be described in different ways. We use verbal statements in this chapter to describe the linear
and binary search algorithms. For theoretical work, a formal description that details the steps but is not nec-
essarily executable will suffice. We call this type of description pseudo-code because it is not real computer
code, though very close. In this book, we take a more practical and explicit route by describing algorithms in
an actual computer programming language. Specifically, we use Python to describe the algorithms covered
in this book.

Writing computer programs (i.e., coding) to describe algorithms has a substantial benefit: all the algorithms
will immediately be executable. In this way, we present everything related to how the algorithms work in the
plain text of the book. The code becomes part of the text and consequently becomes an open source exper-
iment where each line of the process can be examined, modified, and improved. However, this code as text
approach may present too much information, especially when the programming language may need ancil-
lary code to help the main task. For example, many programming languages require end of line symbols and
brackets as part of the code to ensure syntax correctness. These symbols, when added as part of the text,
may hamper the reading process and therefore make it difficult for us to concentrate on the main contents of
the text. We choose Python in this book largely because of its simple syntax, along with many of its popular,
powerful, and well maintained modules. All the programs listed in this book were tested in Python 2.7, which
is a stable and widely adopted version at the time of writing. The majority of the programs in this book only
use the basic Python features, so these programs in principle are likely to be compatible with newer versions
of Python.

Python has become a popular programming language in recent years, including the use of Python for plugins
in GIS packages such as QGIS and ArcGIS. It is important to point out that programming in Python is a skill
that can definitely be acquired through learning and practice. To help readers make a quick start on the lan-

GIS Algorithms: Theory and Applications for Geographic Information


Page 18 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

guage, a short introduction to Python is included as Appendix A. This is not a comprehensive tutorial as many
of the online tutorials have more detailed and in-depth discussion about the language. However, many of
Python’s useful features, especially those related to the main text, are included in the tutorial.

1.3 How to use this book

The main text of this book is divided into three major parts. The general logic here is to start the book with
a discussion on the most fundamental aspects of the data – the geometry – before we move on to more ad-
vanced topics in spatial indexing and spatial analysis and modeling. At the end of each chapter we review
the major literature related to the topics covered in a section called Notes. At the end of the book, we also
include three appendices to help readers understand the Python programming language and the structure of
the programs included in the book.

In Part I, we focus on locations, or more specifically on coordinates that can be used to help us understand
geospatial information. In Chapter 2, we examine a few algorithms to compute different kinds of distance,
such as distances between points and distance from a point to a line. We also look at the calculation of poly-
gon centroids and a widely used algorithm called point-in-polygon that efficiently helps us determine whether
a point is located within a polygon. The final topic in Chapter 2 is about the transformation between coordinate
systems involving map projections. Chapter 3 covers a traditional GIS operation, known as overlay. As “old”
as this topic is, the actual computation of overlaying two polygons can be tedious, though not necessarily
complicated. Many of the topics in this part of the book are related to the field of computational geometry. But
we focus on those that are most relevant to the GIS world.

Part II is centered around the idea of spatial indexing. Spatial information is special. Though the general con-
cept in indexing, divide and conquer, is the same for spatial information, because of the two dimensions (or
more in some cases) in spatial information, more dedicated algorithms must be designed. We first introduce
the basic concepts of indexing in Chapter 4, where we focus on the development of a tree structure. Chapter
5 is devoted to k-D trees that are commonly used to index point data. Chapter 6 covers a popular indexing
technique called quadtrees, for both point and raster data. Chapter 7 extends the discussion to indexing lines
and polygons in spatial data.

Part III of the book focuses on the heart of GIS applications: spatial analysis and modeling. We first explore

GIS Algorithms: Theory and Applications for Geographic Information


Page 19 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

the interpolation methods on point data in Chapter 8 where we compare and contrast two commonly used
interpolation methods: inverse distance weighting and kriging. We also include a data simulation algorithm
called midpoint displacement from the fractal geometry literature. Chapter 9 is devoted to spatial pattern
analysis where the calculation of indices such as Moran’s I is included. Algorithms for network analysis, es-
pecially those calculating the shortest paths, are included in Chapter 10. We devote two chapters to topics in
spatial optimization: in Chapter 11 we focus on the exact methods and in Chapter 12 we explore some of the
heuristic methods.

In addition to the three parts in the main text, we have also included three appendices to cover some of the
technical details about coding. It should be obvious that, while we talk about algorithms most of the time, this
book is about coding as well. For this reason, a short introduction to Python is first included. Then we com-
pile a short introduction on the Python binding of a powerful library called GDAL/OGR and a Python library
for spatial analysis called PySAL. The purpose here is to help readers quickly get started with these libraries
so that they can have a sense how “real-world” data sets can be closely related to the topics (and code of
course) presented in this book.

Most of the programs listed in this book are also given a file name that can be used in other programs. In that
case the name of the program is listed in the caption of each code listing. We use directories to organize the
programs. In the last appendix, we provide an overview of the code by listing all the Python programs and
example data sets and discuss how they can be used.

Each of the chapters in this book could easily be extended into another book to cover the depth of each topic.
This book, then, is a survey of the general topics in GIS algorithms. The best way to grasp the breadth of the

topics presented in this book is coding. A collective Github page2 is under development where readers can
contribute their thoughts on implementing the algorithms included in this book and new algorithms that are
beyond the scope of the book. While theoretical derivation is not the focus of this book, empirical analysis
definitely is. We have included many experiments in the text and have suggested more in the exercises. This,
however, should not limit further experiments, especially those that are innovative and overarch various top-
ics. The book will only be successful if it achieves two goals: first, based on the skills built on understanding
the algorithms and code, that readers are able to develop their own tool sets that fit different data sets and
application requirements; and second, that coding becomes a habit when it comes to dealing with geospatial
data.

GIS Algorithms: Theory and Applications for Geographic Information


Page 20 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

1 The word “algorithm” itself comes from medieval Latin algorismus, which is from the name of al-Khāwrizmī,

a Persian mathematician, astronomer, and geographer, who made great contributions to the knowledge of
algebra and world geography.

2 [Link]

GIS Algorithms: Theory and Applications for Geographic Information


Page 21 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

[Link]

Postscript

Donkey: Are we there yet?

Shrek: No.

Donkey: Are we there yet?

Fiona: Not yet.

Donkey: Are we there yet?

Fiona: No.

Donkey: Are we there yet?

Shrek: No!

Donkey: Are we there yet?

Shrek: Yes.

Donkey: Really?

Shrek: No!!

Shrek 2 (2004)

We started this book with an introduction to spatial data and some basic geometric algorithms. From there
we moved on to discuss indexing methods that can be used to expedite the handling of spatial data. Then we
explored a number of applications in spatial analysis. In each chapter, we focused on theory, methods, and
practice with code. We used examples to show, in a humble way, that by coding we gain the freedom to do
things without relying on large software packages. But are we there yet?

Each chapter in its own right could be extended into a book or even more than one book. We have covered
some of the most important topics, concentrating our discussion around how the concepts work and how to
write a simple program in Python to implement the methods. We discussed the depth of each area in the
notes to each chapter, but only to a limited extent. More efforts, collective ones, will be needed to embrace

GIS Algorithms: Theory and Applications for Geographic Information


Page 22 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

the breadth and depth of the field involving geospatial data. Also, we can certainly do more with the code
included. For example, each of the programs is used individually in this book, for a good reason – we need
to introduce them individually for each algorithm when we go over the computational processes. Now that
we have seen all the programs, we can compile them together into a Python package that can be used
more easily. However, we should point out that the programs in this book are coded in a minimalist fashion
so as to put the algorithms under the spotlight. While all the programs work, they may not work under all
conditions. We do not check the type or validity of variables unless it is absolutely necessary to do so. For
example, we do not make sure to only insert points (as in the Point class) into a k-D tree. These may become
“bugs’’ that may cause a program to crash unexpectedly. Some may consider this to be a limitation of a
weakly typed programming language like Python. However, it also gives us the convenience to think about
the computational process from a more intuitive perspective by not going into excessive details.

Between here and there, then, there is the vast space of the discipline of geographic information science and
spatial analysis, and rigorous software engineering processes. But none of that detracts from the true value of
the chapters in this book: they are part of the process that prepares us to gain the freedom to handle spatial
data.

GIS Algorithms: Theory and Applications for Geographic Information


Page 23 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Appendix A Python: A Primer

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Tim Peters, The Zen of Python

This appendix is mainly for those readers who may not be familiar with some of the Python features used
in this book. It can be treated as a quick way to get started with Python but is not meant to be a thorough
introduction to the language. For a comprehensive understanding of Python, it would be better to use other
tutorials. There are excellent tutorials out there and many of them are free. For example, the official Python

tutorial1 covers a wide range of topics and provides many very effective examples that can be easily extended

to more complicated cases. Other tutorials, such as How to Think Like a Computer Scientist,2 may provide
more specialized content. Nevertheless, here we will look at Python topics that are closely related to many of
the algorithms introduced in this book. Our aim is to keep things simple as more detailed information is easily
available on the Web or in other publications.

Python is a cross-platform programming language. In this book we will assume a successful installation of
Python and many of the modules that come with the installation. We only use the command line of the Python
shell. We will introduce Python using some basic commands, before moving on to discuss some more specific
topics. It should be noted that all Python commands are entered after the prompt >>>. The version of Python
used to prepare this appendix is 2.7.6.

We can use Python as a powerful calculator. However, we must be careful about the type of each item. A
value of 2 will be treated as an integer; the calculation of 2/3 will be treated as an operation between two
integers and the result will be an integer (hence 0) in the output. To ensure a floating point value result we

GIS Algorithms: Theory and Applications for Geographic Information


Page 24 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

must use a floating point value in at least one of the numbers in the operation (e.g., 2.0/3). We can also use
the function float to specify the type of number.

>>> 10 * 10
100
>>> 2/3
0
>>> 2.0/3
0.6666666666666666
>>> 2/3.0
0.6666666666666666
>>> float(2)/3
0.6666666666666666
>>> 2/float(3)
0.6666666666666666

Variables are important in Python (as in all programming languages). However, variables in Python do not
have types. The type of a variable will be determined by Python at run time. In the following example, we
assign an integer to b first and then we change it to a string of text (enclosed in a pair of double quotes or
single quotes). When we want to add a string to an integer, Python reports an error. Also note that in Python
we use the symbol # to indicate comments and everything after the comment symbol in the same line will not
be considered by Python for any operation.

If both variables are strings, we can definitely add them up. Note that a string can be indicated using either

GIS Algorithms: Theory and Applications for Geographic Information


Page 25 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

single or double quotes, as long as they match.

>>> a = “astring”
>>> a + ’ ’ + b
’astring mystring’

Strings can be duplicated using the “*’’ operation.

>>> a * 5
’astringastringastringastringastring’
>>> (a + ’ ’) * 5
’astring astring astring astring astring ’

We can also combine variable values with strings to format powerful and meaningful expressions using the
built-in format method that comes with each string. The first part of the command is a partial string where
the braces are used as placeholders whose values will be filled in by the variables in the parentheses. The
variables are indexed as 0, 1, and 2, and these indices are used in the partial string.

>>> n1, n2, n3 = 10, 1, 2


>>> ’I have {0} apples, {1} pears, and {2} others’.\
... format(n1, n2, n3)
’I have 10 apples, 1 pears, and 2 others’

The first line in the above example represents another useful feature of Python called simultaneous
assignment that allows us to assign multiple values at once. Because of this feature, functions in Python can
return multiple values at the same time.

>>> a,b = 3,4


>>> a,b
(3, 4)
>>> a,b = b,a
>>> a,b
(4, 3)

Another way to format a string is to use the % operation. This is a little more confusing but can be useful.
Using % symbols within quotes indicates placeholders for values; such % symbols are followed by a character
to indicate the type of variable or value. Here the letters df and s denote integer, floating point, and string
types, respectively. For a floating point value, we can also use %.2f to make sure only two decimal points

GIS Algorithms: Theory and Applications for Geographic Information


Page 26 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

are shown. A % symbol after an end quote indicates the variables or values that will be used to replace the
placeholders in the quotes.

>>> n1, n2, n3 = 10, 1, 2


>>> “n1 = %d, n2 = %d, n3 = %d”%(n1, n2, n3)
’n1 = 10, n2 = 1, n3 = 2’
>>> name = “Dr. P”
>>> age = 40
>>> height = 6.1
>>> “%s is %d years old and %.2f feet tall”%(name,age,height)
’Dr. P is 40 years old and 6.10 feet tall’

Finally, the help function can be used to get useful explanations and examples of Python functions. Quotes
must be used for Python symbols in the this function.

>>> help(format)
>>> help(“<”)
>>> help(“==”)
>>> help(“%”)

A.1 List comprehension

A list in Python can be used to store multiple things of any data type, and is the most powerful data structure
in Python. In the following example, a is a list that contains six different values. The function len can be used
to return the length of the list. Each item in a list is referred to using an index that starts from 0.

>>> a = [’str1’, 10, ’str2’, ’str3’, 10, 5]


>>> a
[’str1’, 10, ’str2’, ’str3’, 10, 5]
>>> len(a)
6
>>> a[0] ‘str1’
>>> a[0] + ’ ’ + a[2]
’str1 str2’
>>> a[1] + a[5]
15
GIS Algorithms: Theory and Applications for Geographic Information
Page 27 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

When talking about lists, a very useful tool is the range function that returns a list of numbers. When only one
argument (e.g., 10) is entered, it is the stop condition and the function will yield a list of integers from 0 to the
integer one less than the input.

>>> range(9)
[0, 1, 2, 3, 4, 5, 6, 7, 8]
>>> range(2, 9)
[2, 3, 4, 5, 6, 7, 8]
>>> range(2, 9, 3)
[2, 5, 8]

Creating a list is called list comprehension in Python. There are many ways to create lists. Because a list
contains a set of items, a traditional way to fill in values in these items is through a for loop. We first create an
empty list ([ ]) and then keep adding new items to the list using the append function. The following example
generates a list of 10 values, each being the ith power of 2 for i ranging from 0 to 9. (Note ** in the following
code is the operator for “power of’’ in Python.)

>>> exps = []
>>> for i in range(10):
... [Link](2**i)
...
>>> exps
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

Before we move on to list comprehension, let us focus on a very important rule in Python syntax: indentation.
Instead of using brackets or other symbols to indicate a code block, Python uses indentation. In the above
example, the code that is logically within the for loop must be written with some leading white space (four
spaces in this case). Before the indentation starts, the line above must end with a colon (:). This rule applies
whenever a logical group of lines must be identified, as we will often see in this appendix and in the main text
of the book.

The above example can be replaced using the list comprehension method in Python with one line of code
(shown below) that returns a list with exactly the same contents. We also compare the two lists using
the logical expression == that only returns True if elements in both lists are exactly the same. Each list
comprehension contains at least two kinds of operations: assignment and enumeration. In our example below,

GIS Algorithms: Theory and Applications for Geographic Information


Page 28 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

the assignment (2**i) is done over an enumeration (using the for statement) of each of the i values from a list
of 10 integers generated by the range function.

>>> exps1 = [2**i for i in range(10)]


>>> exps1 == exps
True

We can also put conditions in list comprehension to create more complicated lists. The following example
creates a list with only the members from exps1 that can be evenly divided by 32. Here, the symbol % is
the modulo operator that returns the remainder from a division operation of the first parameter by the second
(e.g., 5%3 returns 2).

>>> [x for x in exps1 if x%32==0]

We can actually plug in the expression for exps1 to make a nested list comprehension.

>>> [x for x in [2**i for i in range(10)] if x%32==0]

In Section 2.8, we use the following two for loops to create a list of points for the parallels where each parallel
contains 37 points ranging from –180 to 180 in steps of 10.

>>> points = []
>>> linenum = 0
>>> for lat in range(-90, 91, 10):
... for lon in range(-180, 181, 10):
... [Link]([linenum, lon, lat])
... linenum += 1
...

This can be simplified using list comprehension into one line of code as follows.

>>> points = [ [(lat+90)/10, lat, lon]


... for lat in range(-90, 91, 10)
... for lon in range(-180, 181, 10) ]

The X and Y coordinates of the points in the 17th parallel (80ºN) can be retrieved using another list
comprehension.

GIS Algorithms: Theory and Applications for Geographic Information


Page 29 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

>>> [[p[1], p[2]] for p in points if p[0]==17]

A.2 Functions, modules, and recursive functions

As in any programming language, we can group a collection of lines of code into a function that can be used
repeatedly. We use the Python keyword def to start defining a function. Immediately after the keyword is the
name of the function followed by a set of input arguments in a pair of parentheses. The line defining a function
ends with a colon. We use a simple example below to demonstrate how functions work.

Here we define a function that calculates the distance between two points with X and Y coordinates given by
x1 and y1 for the first point and x2 and y2 for the second. We will need to use the sqrt function to compute
the square root of the sum of squares. However, this function is not a built-in Python function. Instead, sqrt
is available in a Python module called math that must be specifically imported (in the first line). All the lines
in the function must be indented to indicate that they belong to the function. In this example, we do not show
Python prompts because it is inconvenient to write multiple lines of code in the interactive Python shell where
we cannot edit the previous lines. Instead, we can write the Python commands in a text editor and save them
as a file. There are many good editors for us to choose from. My favorite is Emacs, but Notepad++ is also
a popular and free option. Let us assume we save the following code in a file called [Link]. In this way, the
code we write becomes part of a module and can be imported for use when needed.

import math
def distance(x1, y1, x2, y2):
dx = x1-x2
dy = y1-y2
d = [Link](dx*dx + dy*dy)
print “my name is:”, __name__
return d

The following example shows how to use the code to calculate distance.

>>> from dist import *


>>> distance(1.5, 2.5, 3.5, 4.0)
my name is: dist

GIS Algorithms: Theory and Applications for Geographic Information


Page 30 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

2.5
>>> print __name__
__main__

In this way, we have created a Python module called dist. In general, a Python module includes the code
blocks that can be used somewhere else outside its original file. The above exercise also shows an important
built-in variable in Python called --name--. This variable is assigned to the value of --main-- in the process
that was first executed by the Python interpreter, which in our case is the Python prompt. When a module is
loaded, Python will use the module name to set the value of --name--.

The following is an example showing the calculation of the factorial of an integer using the formula n! = n ×
(n − 1) × (n − 2) × ... × 3 × 2 × 1. Here we simply use a list of integers ranging from 1 to n that repeatedly
multiplies itself.

def factorial_i(n):
f=1
for i in range(1,n+1):
f = f*i
return f

The calculation of n!, however, can be implemented in another manner where the function calls itself
repeatedly. This is what we refer to as a recursive function.

def factorial_r(n):
if n == 1:
return 1
return n * factorial_r(n-1)

Recursive functions are often simple and elegant to write. However, we must do this carefully to avoid
endlessly calling the function. In general, each recursive function must have a base where the calling stops.
In the above example, the last line returns by calling the function itself. If we do not control this, it will end up
calling itself ad infinitum. Therefore, we always have a condition before we call the function itself. Here, if the
value of n becomes 1, we will return a constant instead of another function call. This is the base case in this
example. Also, a recursive function must have a way to move on so as to get closer to the base case. In our
example, we make sure we will reach the base case by calling the function with a decreasing value of n. In

GIS Algorithms: Theory and Applications for Geographic Information


Page 31 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

the main text of this book there are plenty of examples of recursive functions, especially when we are dealing
with trees where a recursive call makes it easy to go down the tree continuously until we reach the end of the
tree (a leaf node). Using recursive functions, however, has a big disadvantage in terms of performance. For
example, when n is 10 and we call function factorial_r(10-1), no answer is returned and we have to wait until
n reaches 1 before we can go back and actually calculate all the values for n. This backtracking requires a
lot of memory storage to hold the unsolved function calls, and by backtracking we will spend more time going
through the loop.

A.3 Lambda functions and sorting

Sometimes, we need a function but do not need a real function with a name. This is an anonymous function
and we use the Python construct lambda for this purpose. For example, below we define a function so that
it returns True if an input is an even number or False otherwise. This is a quick way of defining a function
without using the def keyword and all the syntax rules. It is quite simple: the variable before the colon is the
input and everything after the colon will be calculated and returned.

>>> f = lambda x: x%2==0


>>> f(10)
True
>>> f(1)
False

This simple way of defining functions may seem unnecessary and confusing. Nevertheless, it provides great
expressiveness and abstraction of our code. We demonstrate its use in the context of list sorting. A list in
Python can be sorted using the built-in method sort.

>>> exps1 = [2**i for i in range(10)]


>>> [Link](reverse=True)
>>> exps1
[512, 256, 128, 64, 32, 16, 8, 4, 2, 1]

However, this will not work when each element in the list has a more complicated structure. For example,
if each element in the list is another list that contains the X and Y coordinates of a point, sorting that list

GIS Algorithms: Theory and Applications for Geographic Information


Page 32 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

directly will give results based on the first value in each element. How about if we want to sort it based on the
second value in each element? Here is how the lambda function comes to our assistance. We utilize the key
parameter of the sort function by specifying the second value in each element to be considered.

>>> points=[[10,1], [1,5], [1,2], [3,7], [10,8], [7,8], [9,8]]


>>> [Link]()
>>> points
[[1, 2], [1, 5], [3, 7], [7, 8], [9, 8], [10, 1], [10, 8]]
>>> [Link](key=lambda points: points[1])
>>> points
[[10, 1], [1, 2], [1, 5], [3, 7], [7, 8], [9, 8], [10, 8]]

A.4 NumPy and Matplotlib

NumPy is a powerful Python module specifically developed to operate on arrays and matrices. We use some
of the functions in NumPy extensively in Chapter 8 where interpolation methods are discussed. NumPy is
closely tied to many of the features of Python. For example, we can easily convert a Python list into a NumPy
array. Many of the NumPy functions are built to handle multiple inputs from an array directly. We demonstrate
these features using the following simple example. Here, we apply the sin function in NumPy directly to a list
called angles. It appears that NumPy implicitly converts the list into an array and computes the sine of each
element, using these to form an array for output. The sin function that comes in the math module, however,
does not work in this way because it requires a single float value to be the input.

>>> import numpy as np


>>> angles = [i*[Link]/10 for i in range(1, 10)]
>>> [Link](angles)
array([0.30901699, 0.58778525, 0.80901699, 0.95105652, 1.,
0.95105652, 0.80901699, 0.58778525, 0.30901699])
>>> import math
>>> [Link](angles)
Traceback (most recent call last):
File “<stdin>”, line 1, in <module>
TypeError: a float is required

GIS Algorithms: Theory and Applications for Geographic Information


Page 33 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Matplotlib is a highly useful Python package designed for two-dimensional graphics. Despite its command
line style, it is fairly straightforward and quick to use this package to make impressive plots and graphics. We
show three examples here to demonstrate its use in three distinct areas that are related to our spatial data
handling applications.

The Matplotlib package contains two modes, IPython and pylab. The pylab mode is the main interface to the
functionality of Matplotlib and we import it first in the following example. Note that this way we actually also
import NumPy as np. Then we create a list called angles to hold 60 angles for more than a cycle of 2π. The
plot function creates a plot with the first two inputs as the X and Y coordinates. A number of parameters can
be applied in the plot function to control the look of the plot. Nothing will be displayed until the show function
is called. Figure A.1 shows the result of this example.

from pylab import * # or: import [Link] as plt


angles = [i*pi/20 for i in range(60)]
plot(angles, [Link](angles), color=’grey’)
plot(angles, [Link](angles), linewidth=2, color=’lightblue’)
show()

Figure A.1 Plotting using Matplotlib

The plot function can actually be used to plot line maps. We used this function in Chapter 2 to draw three
map projections using the graticule and world coastlines. To plot polygons, we can use the Polygon function

GIS Algorithms: Theory and Applications for Geographic Information


Page 34 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

in Matplotlib.

The second example illustrates how to draw points in a scatter plot, which can be equivalent to a “point
map.’’ Here we create 100 points, 50 of which are scattered around the point (0,0) in a Cartesian coordinate
system, and the other 50 around the point (10,10). Both X and Y coordinates follow a normal distribution with a
standard deviation of 1. The NumPy function normal can be used to do that by specifying the mean, standard
deviation, and number of samples to be created. We create the X and Y coordinates for each group separately
and then combine them using the NumPy function concatenate for the X and Y coordinates, respectively. We
mark the points in the first group around (0,0) in red and those in the second group around (10,10) in blue
by creating a list called col with 100 elements, where the first 50 are red and the second 50 blue. Finally, the
scatter function is used to create the plot, where we can also specify the colors for each point (col) and the
transparency. A 0.5 level of transparency will allow us to show the effect when multiple points overlap. Figure
A.2 shows the final result of this example. Note that we export the figure into a PDF, which is converted into
the EPS format for publication separately because transparency in EPS is not natively supported in Matplotlib.

import [Link] as plt


import numpy as np
n = 100
X1 = [Link](0, 1, n/2)
X2 = [Link](10, 5, n/2)
Y1 = [Link](0, 1, n/2)
Y2 = [Link](10, 5, n/2)
X = [Link]((X1, X2))
Y = [Link]((Y1, Y2))
col = [’red’ if i<n/2 else ’blue’ for i in range(n)]
[Link](X, Y, color=col, alpha=.5)
[Link](’scaled’)
[Link](’[Link]’,
bbox_inches=’tight’,pad_inches=0)
[Link]()

GIS Algorithms: Theory and Applications for Geographic Information


Page 35 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Figure A.2 Plotting points using Matplotlib

In our last example, we reuse the distance function we developed before to create a raster with two centers,
one at the center of the lower-left quadrangle and the other at center of the upper-right. For each location in
the raster, we find the nearest center to that location and set the value at the location using a biweight kernel
(see Section 9.4). The function K is used to return such a value.

from dist import distance


def K(x, y, c1, c2, r):
d1 = distance(x, y, c1[0], c1[1])
d2 = distance(x, y, c2[0], c2[1])
d, i = d1, 0
if d2<d1:
d,i = d2,1
if d>r:
v=0
else:
u = float(d/r)
v = 15.0/16.0 * (1 - u**2)**2
return v

The following lines of code create a raster of 20 × 20 cells and the bandwidth of the kernel function is set to

GIS Algorithms: Theory and Applications for Geographic Information


Page 36 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

10. We use list comprehension to create a list of 20 values from 1 to 20. The variables c1 and c2 are used
to indicate the two centers. Z is the 2D list (a list of lists) where the value at each cell is generated using the
function K. The Matplotlib function imshow is used to create a 2D image using values in Z. We use a color
map of Greys to make a black-and-white map. By default, the imshow function will map the lowest value in the
data to one end of the color map (white) and the highest to the other end (black). However, this will leave our
map with very high contrast. Here, we force the function to use minimal and maximal values, −0.5 and 1.2,
respectively, that are outside our data range (0 and 0.9375) so that the actual color used in the map will be in
the middle range of the color map. The origin parameter in imshow specifies that the origin of the coordinates
is in the lower-left corner of the image, which is consistent with our data. The colorbar function displays a
legend and we set the total length of the bar to be 99% of the map height. The xticks and yticks functions take
empty lists as input, which helps hide the ticks so that the rendering is more like a real map. Figure A.3 shows
the result of this example.

from pylab import *


n = 20
r = n/2.0
c1 = [n/4, n/4]
c2 = [3*n/4, 3*n/4]
Z = [ [K(x, y, c1, c2, r) for x in range(1, n+1) ]
for y in range(1, n+1)]
imshow(Z, cmap=’Greys’, vmin=-0.5, vmax=1.2, origin=’lower’)
colorbar(shrink=.99)
xticks([]), yticks([])
show()

GIS Algorithms: Theory and Applications for Geographic Information


Page 37 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Figure A.3 Plotting a raster image using Matplotlib

A.5 Classes

We can group data and the functions that are closely associated with data into a class. In most Python
terminology, data are called attributes and functions are called methods when they are encapsulated in a
class. The simplest way to create a class is to use the class keyword in Python. We will create a class called
Shape that stores a number of points. The first line declares the class. We define two methods in this class,
both of them built-in methods. A built-in method in a Python class means it comes with every definition of the
class and it does some specific thing. For example the __init__ method is always called when an instance
of the class is created. For this reason, we also call it the constructor of the class, and we as users can
rewrite it with our own code. For example, here we require a Shape instance to be provided with a set of
points when it is created. Each method in a Python class must include a variable called self as the first input
argument. In this way, we can pass the class itself (along with its data and methods) to the method (function)
for convenience. For example, in this __init__ method, we can then assign the input variable values to the
member attributes of the Shape class. Defining class attributes is straightforward: we simply write the names
there with a self. prefix, where the dot means that whatever comes after the dot is a member of the class. The
other built-in method is __repr__ which defines how the instance of the class will be printed as text. Because
of that, we will need to return a string. Here we simply convert the list of points into a string.
GIS Algorithms: Theory and Applications for Geographic Information
Page 38 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

class Shape():
def __init__(self, points, id=None, t=None):
[Link] = points
[Link] = id
[Link] = t
def __repr__(self):
return str([Link])

To use the above class, we simply define a “variable’’ and assign it the class name as a function, with a set of
input arguments. Such a variable is called an object of the class. Classes are a conceptualization or blueprint
of the object. In other words, we cannot do anything with the class because it is just a definition, but we can
create an object and operate on it.

>>> s = Shape([[1,1], [2,3], [3,7], [2,8]])


>>> print s
[[1, 1], [2, 3], [3, 7], [2, 8]]

There is not much we can do with the Shape object because an actual shape will be more complex than just
a set of points. For example, we may have lines and polygons, and we will desire different behaviors from
them. For this reason, we can extend an existing class into subclasses. This is called inheritance: we define
subclasses that inherit features from their parent classes. We first take a look at an example of a subclass
called Polyline. This time, we still use the class keyword, but we include the name of the parent class in
the parentheses. In the constructor (--init--) of this class, we call the constructor of the parent class first to
pass the necessary information that will be stored with the parent class. Beyond that, we introduce two new
attributes to this new subclass, including name and color. Both of the values get a default value and we can
make changes later. Now we also include a new method called draw where we first use the Polygon function
in Matplotlib to create a line object and then we add it to the plot. Note that we can use the Polygon function
to draw a line by specifying both parameters closed and fill to False. It will be necessary to import Matplotlib
here.

import [Link] as plt


class Polyline(Shape):
def __init__(self, points, id, name=None):
Shape.__init__(self, points, id, “polyline”)

GIS Algorithms: Theory and Applications for Geographic Information


Page 39 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

[Link] = name
[Link] = ’grey’
def draw(self):
l = [Link]([Link], closed=False, fill=False,
edgecolor=[Link])
[Link]().add_line(l)

We then continue to define another class called Polygon below. For this subclass, we have two color
properties, one for the fill (fillcolor) and one for the polygon outline (edgecolor). A Polygon class will have its
own draw function due to the specific features of a polygon. We set the transparency level to 0.5 using the
alpha parameter of the Polygon function.

class Polygon(Shape):
def __init__(self, points, id, name=None):
Shape.__init__(self, points, id, “polygon”)
[Link] = name
[Link] = ’white’
[Link] = ’grey’
def draw(self):
poly = [Link]([Link], closed=True,
fill=True,
facecolor=[Link],
edgecolor=[Link],
alpha=0.5)
[Link]().add_line(poly)

Finally, we create another class called Shapes as a container to include and manage multiple shapes. The
key feature here is a list called shapes and a function that adds different shapes to the list. Now we have
a complete draw function that first calls all the shapes in the list to draw their geometric features (but not
showing at this time). Then the show function in Matplotlib is called to finally draw the data.

class Shapes():
def __init__(self):
[Link] = []
def add_shape(self, shape):

GIS Algorithms: Theory and Applications for Geographic Information


Page 40 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

[Link](shape)
def draw(self):
for s in [Link]:
[Link]()
if len([Link]):
[Link](’scaled’)
[Link]([])
[Link]([])
[Link]()

We create three very simple shapes and draw the final map. Figure A.4 shows the result.

shapes = Shapes()
l1 = Polyline([[1,2], [5,3], [7,2]], 0)
l2 = Polyline([[6,2], [1,1], [5,5]], 1)
[Link] = ’red’
p1 = Polygon([[1,1], [2,3], [3,7], [2,8]], 3)

shapes.add_shape(l1)
shapes.add_shape(l2)
shapes.add_shape(p1)
[Link]()

GIS Algorithms: Theory and Applications for Geographic Information


Page 41 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Figure A.4 Plotting shapes using Matplotlib

1 [Link]

2 [Link]

GIS Algorithms: Theory and Applications for Geographic Information


Page 42 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Appendix B GDAL/OGR and PySAL

Many Python packages have been developed to handle geospatial data. In this appendix we introduce two
sets of libraries for different purposes. GDAL (Geospatial Data Abstraction Library) is a powerful library that
can be used to support various tasks in handling geospatial data sets. GDAL is designed to handle raster

data and supports about 100 raster formats.1 GDAL represents a major advance in free and open source
GIS. As a component of GDAL, OGR supports vector data formats. OGR used to mean OpenGIS Simple
Features Reference Implementation, but the OGR library does not fully comply with the OpenGIS Simple
Features Specification of the Open GIS Consortium (OGC) and therefore was not approved by the OGC.
As a consequence, OGR today technically does not stand for anything. In general, OGR supports many

vector formats2 such as shapefiles, personal geodatabases, ArcSDE, MapInfo, GRASS, TIGER/Line, SDTS,
GML, and KML. It also supports many database connections, including MySQL, PostgreSQL, and Oracle

Spatial. More information about GDAL and OGR can be found on their website.3 GDAL/OGR was not initially
developed as a Python library, but many programming languages support GDAL/OGR, including Python. In
that sense, we are using a Python wrapper of the binary library of GDAL/OGR, which to some extent will have
consequences on the performance of the code.

The second library is an open source software package called PySAL (Python Spatial Analysis Library)

that has been used in a wide range of applications of spatial data.4 Different from GDAL/OGR, PySAL is
developed in Python and supports a wide range of spatial analysis applications. PySAL has its own core data
structure that supports file input and output. A key component of PySAL is spatial weights that are central in
many spatial analysis methods.

In this appendix, we will go through some basic features to demonstrate how these powerful geospatial
libraries can help us implement and understand many of the algorithms covered in this book. This appendix is
mainly based on coding, with a lot of comments. Some of the lines included here are for instructional purposes
and therefore may not be necessary in all cases. However, they should provide a good context in which to
understand the functionality of GDAL/ORG and PySAL. We do not intend to explain the meaning of each and
every line since it should be relatively straightforward to understand (especially with the extensive comments
for most parts of the code). For more applications and details of the use of GDAL/OGR library, there is an

online Python GDAL/ORG Cookbook.5 PySAL has its own tutorials6 that include details of how to use the
library.

GIS Algorithms: Theory and Applications for Geographic Information


Page 43 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

There is a great deal we can do with the power of GDAL/OGR and PySAL because these libraries give us
a convenient way to convert geospatial data in any format into any data structures. Almost every algorithm
introduced in this book can be plugged in here. We should take this opportunity to test and explore those
algorithms using our own data.

Both GDAL/OGR and PySAL can be installed on different operating systems. We do not discuss the
installation process here since the websites of these libraries provide detailed information.

B.1 OGR

To use OGR, we typically should start with the following lines, where we show how OGR works for a shapefile.
In the following example, we assume there is a polygon shapefile called [Link] existing in a folder
called data in the upper level of the current working directory. To make it work for new data, we will need to
have a polygon shapefile and make sure we know the path to that file. We focus on polygon shapefiles here.
For other types of shapefiles (e.g., polyline), the processes are similar though the details will be different.
We have an example of the use of a polyline shapefile (world coastlines) in Chapter 2. The last line in the
following code should return False, indicating successful reading of the shapefile.

B.1.1 Attributes

The variable vector instantiated in the previous code lets us access the information stored in the shapefile. A

GIS Algorithms: Theory and Applications for Geographic Information


Page 44 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

general procedure is to go down the hierarchy of file > layer > feature to access different types of information.
For a shapefile, one file only contains one layer, which in turn contains multiple features. Below are the lines
of code for accessing the attributes associated with a feature.

We can write a short function ([Link]) that returns a list of values in an attribute in a shapefile. We will use
this function later in Section B.3.

GIS Algorithms: Theory and Applications for Geographic Information


Page 45 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

B.1.2 Geometry and coordinates

We will now see how to retrieve the geometry and coordinates from the shapefile we just loaded. We only
show how to get points of one polygon, but it is of course possible to get points for all polygons using a loop,
as we will show later.

GIS Algorithms: Theory and Applications for Geographic Information


Page 46 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

B.1.3 Projecting points

We can transform a set of points in a geographic coordinate system (GCS) expressed in longitudes and
latitudes into a projected coordinate system. Suppose we have a few locations from northern Cameroon that
are recorded in WGS84 datum format, and we want to transform them into UTM zone 33N format. Here we
will use the OSR (OGRSpatialReference) library, an OGR module for processing spatial reference systems,

to transform between coordinate systems. OSR is based on the PROJ.4 Cartographic Projections Library.7
The general flow of work is to start by defining the source coordinate system, which is what we call gcs in the
following code. Then we define the target coordinate system, called utmsr below. There are different ways to
define a projected coordinate system, and we choose the easiest way using the EPSG code that is uniquely
assigned to UTM 33N. With these two coordinate systems, we can define a transformation using the OSR
function CoordinateTransformation, and here we name the transformation coordTransPcsUtm33. The known
coordinates we have for the four locations in Cameroon are stored in a list of lists where the string in each
element list refers to a place name. We first need to convert these coordinates into a point geometry in OGR
and then we can simply use the Transform function to do the coordinate system transformation.

GIS Algorithms: Theory and Applications for Geographic Information


Page 47 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

B.1.4 Projecting and writing geospatial data

Now we turn to our shapefile that is in a GCS and this time we will project it to the Albers equal-area conic
projection that is commonly used for the conterminous United States. Our shapefile comes with a .prj file and
we can read the spatial reference information directly from there. The general idea here is to find each point
and transform it to a new coordinate system and then write it to a ring, which is in turn added into a polygon
feature. We save the transformed data into a new shapefile.

A general polygon in a shapefile may have holes, where each hole and the outline polygon are called rings.
However, many counties in our data also have islands in a river or lake and these islands are not holes. In
this case, they are organized as multipolygons. As the name suggests, each multipolygon includes a set of
polygons, and each of those polygons may still contain holes. To streamline the process of handling such
data, we design a specific function trans_rings that retrieves every ring in the input geometry that must be a
polygon, which must not be a multipolygon but can have holes. (Having holes or not does not affect how this
function works because holes and the outline polygon are all treated as rings and a polygon will have at least
one ring.) This function returns a new polygon that contains transformed points. This polygon is then added
into the feature for the new shapefile. For a feature that is only a simple polygon (with or without holes), we
call this function once and get a new polygon. For a multipolygon feature, we will need to call this function
multiple times, each time for a polygon in the multipolygon. Every time a new polygon is added, we create a

GIS Algorithms: Theory and Applications for Geographic Information


Page 48 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

new feature and add the polygon into the feature.

GIS Algorithms: Theory and Applications for Geographic Information


Page 49 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

GIS Algorithms: Theory and Applications for Geographic Information


Page 50 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Figure B.1 Projecting spatial data. Left: original data in GCS. Right: data projected into
Albers equal-area conic projection

The result is shown in Figure B.1. The two maps in this figure are drawn using Matplotlib using the following
Python code (draw_shp_polygons.py).

GIS Algorithms: Theory and Applications for Geographic Information


Page 51 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

B.1.5 Adjacency matrix

Our final project using OGR is to create an adjacent matrix for a polygon shapefile. The procedure is simple:
we continuously check each pair of polygons and test if they touch each other or one contains another. We

GIS Algorithms: Theory and Applications for Geographic Information


Page 52 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

use the OGR Touches and Contains functions to do so. To speed up the process, we design a function
called env_touch to test if the bounding boxes (envelopes) of two polygons touch each other. We can write
a function called geom_touch to test whether two OGR geometry objects touch each other by sharing a
specified number of points (in parameter n0).

The adjacency_matrix function takes three input parameters: the path to the shapefile, the output format (M
for matrix and L for list), and the number of shared points for polygons to be considered as adjacent (default
is 1). In the test of the function, we use the US county shapefile and dump the final result in matrix form in a
Python pickle file. In Python, pickle is a standard way of translating complicated objects so that we can store
them in a text file and restore them for future use. The pickle file stored here is used in our main text (Section
9.2) for calculating Moran’s I.

GIS Algorithms: Theory and Applications for Geographic Information


Page 53 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

GIS Algorithms: Theory and Applications for Geographic Information


Page 54 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

B.2 GDAL

We now turn to GDAL to explore a few ways to access raster data. In this example we use the 2011 National
Land Cover Database (NLCD) to demonstrate the use of GDAL. Similarly to OGR, we will need a driver to
start using raster data, but in GDAL we also need to call the Register function before we can actually use it
to open the raster file. Driver names (code) are available at [Link] The NLCD
we use here is stored in Erdas Imagine format (.img), and the GDAL code for that is HFA. The purpose here
is to load the NLCD data into a NumPy array using the ReadAsArray function. Once the data are in an array
format, which is an intuitive format for raster data, we can process them using many other methods such as
the contagion metric (Section 9.4).

GIS Algorithms: Theory and Applications for Geographic Information


Page 55 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

B.3 PySAL

PySAL covers a very wide range of topics in spatial analysis and beyond (Rey and Anselin, 2010). Here we
demonstrate how to use it to calculate Moran’s I for an area data set. PySAL provides a lot of options to select
the type of spatial weights. Here we use binary weights, meaning the weight between two polygons will be
either 1 (adjacent) or 0 (non-adjacent). There are also a few different ways to compute the adjacency between
polygons, and here we use the queen criterion, in which two polygons are adjacent if they share one or more
points (the rook criterion, on the other hand, would require that they share an edge). Based on these choices,
we will be able to compare the PySAL results with the code discussed in this book.

GIS Algorithms: Theory and Applications for Geographic Information


Page 56 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

In Section 9.2, we introduced the calculation of Moran’s I based on an adjacency matrix (or list), which is
equivalent to the binary weights determined using the queen criterion. The above code also tests the result of
using the Moran’s I code in this book and the results are almost identical (readers who are interested in this
example can examine the source code and explain what causes the very small difference between the two
programs). To further validate the results, we can test the same data using the spdep package in R. To the
precision reported in the R package, the result is 0.3158373928, which is consistent with the results obtained
by the Python programs.

1 A full list of the supported formats can be found at [Link]

2 A full list of vector data formats supported under OGR can be found at [Link]

ogr_formats.html

3 [Link]

4 [Link]

GIS Algorithms: Theory and Applications for Geographic Information


Page 57 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

5 [Link]

6 [Link]

7 [Link]

GIS Algorithms: Theory and Applications for Geographic Information


Page 58 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Appendix C Code List

The majority of the programs in this book have a file name so that they can be imported as modules or run
from the command line. Each program is typically run in its own directory, and whenever we need to use a
module from another directory we append that directory to the system path using [Link] from the
sys module. Suppose we are in directory interpolation and we need to import the [Link] module from the
geom directory. We can use the following three lines:

import sys
[Link](’../geom’)
from point import *

In Table C.1 we list all the named files, in the order they appear in the text. In addition to these files, there are
code blocks in the text that are not listed here because they are mainly meant for interactive commands. Data
sets used as examples in this book are also listed in the table.

Table C.1 Code and data

Folder Name Section

geom [Link] 2.1

spherical_distance.py 2.2

[Link] 2.3

[Link] 2.4

[Link] 2.5

[Link] 2.6

[Link] 2.7

GIS Algorithms: Theory and Applications for Geographic Information


Page 59 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

point_in_polygon.py 2.8

point_in_polygon_winding.py 2.9

[Link] 2.10

[Link] 2.11

[Link] 2.12

test_projection.py 2.13

[Link] 2.14

line_seg_eventqueue.py 3.2

line_seg_intersection.py 3.3

test_line_seg_intersection.py 3.4

[Link] 3.5

test_overlay_1.py 3.6

indexing [Link] 5.1

[Link] 5.2

[Link] 5.3

[Link] 5.4

GIS Algorithms: Theory and Applications for Geographic Information


Page 60 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

[Link] 5.5

[Link] 5.6

[Link] 5.7

kdtree_performance.py 5.8

[Link] 6.1

[Link] 6.2

[Link] 6.3

[Link] 6.4

[Link] 7.1

[Link] 7.2

[Link] 7.3

[Link] 7.4

[Link] 7.5

[Link] 7.6

[Link] 7.7

[Link] 7.8

use_rtree.py 7.9

GIS Algorithms: Theory and Applications for Geographic Information


Page 61 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

interpolation [Link] 8.1

read_data.py 8.2

prepare_interpolation_data.py 8.3

test_idw.py 8.4

idw_cross_validation.py 8.5

[Link] 8.6

[Link] 8.8

[Link] 8.9

test_fitsemivariance.py 8.10

[Link] 8.11

ordinary_kriging_test.py 8.12

[Link] 8.13

simple_kriging_test.py 8.14

interpolate_surface.py 8.15

kriging_cross_validation.py 8.16

[Link] 8.17

GIS Algorithms: Theory and Applications for Geographic Information


Page 62 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

spatialanalysis [Link] 9.1

nnd_monte_carlo.py 9.2

oval_trees.py 9.3

test_oval_trees_nnd.py 9.4

[Link] 9.5

test_oval_trees.py 9.6

[Link] 9.7

[Link] 9.8

test_moransi.py 9.9

test_moransi_mc.py 9.10

test_moransi_grid.py 9.11

[Link] 9.12

[Link] 9.13

test_contagion.py 9.14

test_contagion_kernel.py 9.15

networks [Link] 10.2

GIS Algorithms: Theory and Applications for Geographic Information


Page 63 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

[Link] 10.3

[Link] 10.4

[Link] 10.5

[Link] 10.6

[Link] 10.7

[Link] 10.8

optimization [Link] 11.1

[Link] 11.2

elzinga_hearn.py 11.3

[Link] 11.4

test_1center.py 11.5

pmed_lpformat.py 11.6

pmed_greedy.py 12.1

teitz_bart.py 12.3

test_orlib_pmed.py 12.4

simulated_annealing.py 12.6

GIS Algorithms: Theory and Applications for Geographic Information


Page 64 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

test_orlib_pmed_simann.py 12.7

gdal [Link] B.1.1

transform_cameroon_points.py B.1.3

transform_coordinate_systems.py B.1.4

draw_shp_polygons.py B.1.4

adjacency_matrix.py B.1.5

[Link] B.2

[Link] B.3

data uscnty48area.*

franklin.*

[Link]

[Link]

data/orlib pmed*.orlib

pmed*.distmatrix

GIS Algorithms: Theory and Applications for Geographic Information


Page 65 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

References

Aluru, S. (2005). Quadtrees and octrees. In D. Metha and S. Sahni (eds), Handbook of Data Structures and
Applications. Boca Raton, FL: Chapman & Hall/CRC.

Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis 27(2), 93–115.

Bailey, T. C. and A. C. Gatrell (1995). Interactive Spatial Data Analysis. Harlow, UK: Pearson Education.

Bayer, R. and E. McCreight (1972). Organization and maintenance of large ordered indexes. Acta Informatica
1, 173–189.

Beasley, J. E. (1985). A note on solving large p-median problems. European Journal of Operational Research
21, 270–273.

Beckmann, N., H. Kriegel, R. Schneider, and B. Seeger (1990). The R*-tree: An efficient and robust method
for points and rectangles. In Proceedings of the ACM SIGMOD Conference on Management of Data, Atlantic
City, pp. 322–331. New York: Association for Computing Machinery.

Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications
of the ACM 18(9), 509.

Bentley, J. L. and T. A. Ottmann (1979). Algorithms for reporting and counting geometric intersections. IEEE
Transactions on Computers C 28(9), 643–647.

Bentley, J. L., D. F. Stanat, and J. E. H. Williams (1977). The complexity of finding fixed-radius near neighbors.
Information Processing Letters 6(6), 209–212.

Boots, B. N. and A. Getis (1988). Point Pattern Analysis. Newbury Park, CA: Sage.

Burrough, P. A. and R. A. McDonnell (1998). Principles of Geographical Information Systems. Oxford: Oxford
University Press.

Carr, J. R. (1995). Numerical Analysis for the Geological Sciences. Englewood Cliffs, NJ: Prentice Hall.

Cerny, V. (1985). Thermodynamical approach to the traveling salesman problem: An efficient simulation
algorithm. Journal of Optimization Theory and Application 45(1), 41–51.

Chakraborty, R. K. and P. K. Chaudhuri (1981). Note on geometrical solution for some minimax location
problems. Transportation Science 15, 164–166.

GIS Algorithms: Theory and Applications for Geographic Information


Page 66 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Chan, T. M. (1994). A simple trapezoid sweep algorithm for reporting red/blue segment intersections. In
Canadian Conference on Computational Geometry, Saskatoon, Saskatchewan, Canada, pp. 263–268.

Chrystal, G. (1885). On the problem to construct the minimum circle enclosing n given points in the plane.
Proceedings of Edinburgh Mathematical Society 3, 30–33.

Clark, P. J. and F. C. Evans (1954). Distance to nearest neighbor as a measure of spatial relationships in
populations. Ecology 35, 445–453.

Cliff, A. D. and J. K. Ord (1981). Spatial Processes: Models and Applications. London: Pion.

Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein (2001). Introduction to Algorithms (2nd edn).
Cambridge, MA: MIT Press.

Cressie, N. (1990). The origins of kriging. Mathematical Geology 22, 239–253.

Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: John Wiley & Sons.

Daskin, M. S. (1995). Network and Discrete Location: Models, Algorithms, and Applications. New York: John
Wiley & Sons.

de Berg, M., M. van Kreveld, M. Overmars, and O. Schwarzkopf (1998). Computational Geometry: Algorithms
and Applications (2nd edn). Berlin: Springer.

Densham, P. J. and G. Rushton (1992). A more efficient heuristic for solving large p-median problems. Papers
in Regional Science 41(3), 307–329.

Deza, M. M. and E. Deza (2010). Encyclopedia of Distances (2nd edn). Berlin: Springer.

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271.

Dorigo, M. and L. M. Gambardella (1997). Ant colony system: A cooperative learning approach to the traveling
salesman problem. IEEE Transactions on Evolutionary Computation 1(1), 53–66.

Drezner, Z. (ed.) (1995). Facility Location: A Survey of Applications and Methods. New York: Springer.

Drezner, Z. and S. Shelah (1987). On the complexity of the Elzinga–Hearn algorithm for the 1-center problem.
Mathematics of Operations Research 12(2), 255–261.

Dueck, G. and T. Scheuer (1990). Threshold accepting: A general purpose optimization algorithm appearing
superior to simulated annealing. Journal of Computational Physics 90, 161–175.

GIS Algorithms: Theory and Applications for Geographic Information


Page 67 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Elzinga, J. and D. W. Hearn (1972). Geometrical solutions for some minimax location problems.
Transportation Science 6, 379–394.

Finkel, R. and J. Bentley (1974). Quad trees: A data structure for retrieval on composite keys. Acta Informatica
4(1), 1–9.

Floyd, R. W. (1962). Algorithm 97: Shortest path. Communications of the ACM 5(6), 345.

Fotheringham, A. S., C. Brunsdon, and M. Charlton (2000). Quantitative Geography: Perspectives on Spatial
Data Analysis. London: Sage.

Fournier, A., D. Fussell, and L. Carpenter (1982). Computer rendering of stochastic models. Communication
of the ACM 25(6), 371–384.

Frank, A. U. (1987). Overlay processing in spatial information systems. In Proceedings, Eighth International
Symposium on Computer-Assisted Cartography (Auto-Carto 8), pp. 12–31. Baltimore, MD: ASPRS, ACSM.

Franklin, W. R., C. Narayanaswami, M. Kankanhalli, D. Sun, M.-C. Zhou, and P. Y. Wu (1989). Uniform grids:
A technique for intersection detection on serial and parallel machines. In Ninth International Symposium on
Computer-Assisted Cartography (Auto-Carto 9), pp. 100–109. Baltimore, MD: ASPRS, ACSM.

Geary, R. C. (1954). The contiguity ratio and statistical mapping. Incorporated Statistician 5, 115–141.

Gendreau, M., G. Laporte, and F. Semet (1997). Solving an ambulance location model by tabu search.
Location Science 5(2), 75–88.

Getis, A. and J. K. Ord (1992). The analysis of spatial association by use of distance statistics. Geographical
Analysis 24(3), 189–206.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA:
Addison-Wesley.

Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford: Oxford University Press.

Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984
ACM SIGMOD International Conference on Management of Data, SIGMOD 84, Boston, pp. 47–57. New York:
ACM.

Haines, E. (1994). Point in polygon strategies. In P. S. Heckbert (ed.), Graphics Gems IV, pp. 24–46. San
Francisco: Morgan Kaufmann.

GIS Algorithms: Theory and Applications for Geographic Information


Page 68 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Hansen, P. and N. Mladenovic (1997). Variable neighborhood search for the p-median. Location Science
5(4), 207–226.

Healey, R. G., S. Sowers, B. M. Gittings, and M. J. Mineter (1998). Parallel Processing Algorithms for GIS.
London: Taylor and Francis.

Huang, C.-W. and T.-Y. Shih (1997). On the complexity of point-in-polygon algorithms. Computers &
Geociences 23(1), 109–118.

Ipbuker, C. (2004). Numerical evaluation of the Robinson projection. Cartography and Geographic
Information Science 31(2), 79–88.

Jordan, C. (1887). Cours d’analyse de l’École polytechnique. Paris: Gauthier-Villars.

Kamel, I. and C. Faloutsos (1994). Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the
20th International Conference on Very Large Data Bases, pp. 500–509. San Francisco: Morgan Kaufmann.

Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi, Jr. (1983). Optimization by simulated annealing. Science 220,
671–680.

Krige, D. G. (1951). A statistical approach to some basic mine valuation problems on the Witwatersrand.
Journal of the Chemical and Metallurgical Mining Society of South Africa 52, 119–139.

Krzanowski, R. M. and J. Raper (2001). Spatial Evolutionary Modeling. Oxford: Oxford University Press.

Lee, D. T. and C. K. Wong (1977). Worst-case analysis for region and partial region searches in
multidimensional binary search trees and balanced quad trees. Acta Informatica 9(1), 23–29.

Lombard, K. and R. L. Church (1993). The gateway shortest path problem: Generating alternative routes for
a corridor location problem. Geographical Systems 1(1), 25–45.

Mandelbrot, B. B. (1967). How long is the coast of Britain? Statistical self-similarity and fractional dimension.
Science 155, 636–638.

Mandelbrot, B. B. (1977a). The Fractal Geometry of Nature. San Francisco: W. H. Freeman.

Mandelbrot, B. B. (1977b). Fractals: Form, Chance, and Dimension. San Francisco: W. H. Freeman.

Manolopoulos, Y., A. Nanopoulos, A. N. Papadopoulos, and Y. Theodoridis (2006). R-Trees: Theory and
Applications. London: Springer.

GIS Algorithms: Theory and Applications for Geographic Information


Page 69 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Matheron, G. (1963). Principles of geostatistics. Economic Geology 58, 1246–1266.

McGarigal, K. and B. J. Marks (1995). FRAGSTATS: Spatial pattern analysis program for quantifying
landscape structure. Technical Report PNW-GTR-351, U.S. Department of Agriculture, Forest Service, Pacific
Northwest Research Station.

Meagher, D. (1980). Octree encoding: A new technique for the representation, manipulation and display of
arbitrary 3-D objects by computer. Technical Report IPL-TR-80-111, Rensselaer Polytechnic Institute.

Meagher, D. (1982). Geometric modeling using octree encoding. Computer Graphics and Image Processing
19, 129–147.

Michalewicz, Z. and D. B. Fogel (2000). How to Solve It: Modern Heuristics. Berlin: Springer.

Miller, H. J. and S.-L. Shaw (2001). Geographic Information Systems for Transportation: Principles and
Applications. New York: Oxford University Press.

Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika 37(1), 17–23.

O’Neill, R., J. Krummel, R. Gardner, G. Sugihara, B. Jackson, D. DeAngelis, B. Milne, M. Turner, B. Zygmunt,
S. Christensen, V. Dale, and R. Graham (1988). Indices of landscape pattern. Landscape Ecology 1(3),
153–162.

O’Rourke, J. (1998). Point in polygon. In Computational Geometry in C (2nd edn), pp. 279–285. Cambridge:
Cambridge University Press.

Peitgen, H.-O., H. Jürgens, and D. Saupe (1992). Chaos and Fractals: New Frontiers of Science. New York:
Springer.

Peitgen, H.-O. and D. Saupe (1988). The Science of Fractal Images. New York: Springer.

Plastria, F. (2002). Continuous covering location problems. In Z. Drezner and H. W. Hamacher (eds), Facility
Location: Applications and Theory, pp. 37–79. Berlin: Springer.

Press, W. H., S. A. Teukosky, W. T. Vetterling, and B. P. Flannery (2002). Numerical Recipes in C++ (2nd
edn). Cambridge: Cambridge University Press.

Resende, M. G. C. and R. F. Werneck (2003). On the implementation of a swap-based local search procedure
for the p-median problem. In R. E. Ladner (ed.), Proceedings of the Fifth Workshop on Algorithm Engineering
and Experiments, pp. 119–127. Philadelphia: SIAM.

GIS Algorithms: Theory and Applications for Geographic Information


Page 70 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Resende, M. G. C. and R. F. Werneck (2004). A hybrid heuristic for the p-median problem. Journal of
Heuristics 10, 59–88.

Rey, S. J. and L. Anselin (2010). PySAL: A Python library of spatial analytical methods. In M. Fischer and A.
Getis (eds), Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications, pp. 175–193.
Berlin: Springer.

Richardson, R. T. (1989). Area deformation on the Robinson projection. American Cartographer 16, 294–296.

Riitters, K. H., R. V. O’Neill, J. D. Wickham, and K. B. Jones (1996). A note on contagion indices for landscape
analysis. Landscape Ecology 11(4), 197–202.

Ripley, B. D. (1981). Spatial Statistics. New York: John Wiley & Sons.

Robinson, A. H. (1974). A new map projection: Its development and characteristics. International Yearbook of
Cartography 14, 145–155.

Rosing, K. E. (1997). An empirical investigation of the effectiveness of a vertex substitution heuristic.


Environment and Planning B: Planning and Design 24(1), 59–67.

Rosing, K. E., E. L. Hillsman, and H. Rosing-Vogelaar (1979). A note comparing optimal and heuristic
solutions to the p-median problem. Geographical Analysis 11(1), 86–89.

Rosing, K. E. and M. J. Hodgson (2002). Heuristic concentration for the p-median: An example demonstrating
how and why it works. Computers & Operations Research 29, 1317–1330.

Samet, H. (1990a). Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS.
Reading, MA: Addison-Wesley.

Samet, H. (1990b). The Design and Analysis of Spatial Data Structures. Reading, MA: Addison-Wesley.

Samet, H. (2006). Foundations of Multidimensional and Metric Data Structures. San Francisco: Morgan
Kaufmann.

Samet, H. and R. E. Webber (1985). Storing a collection of polygons using quadtrees. ACM Transactions on
Graphics 4(3), 182–222.

Sellis, T., N. Roussopoulos, and C. Faloutsos (1987). The R+-tree: A dynamic index for multi-dimensional
objects. Technical report, VLDB Endowments.

Shamos, M. and D. Hoey (1976). Geometric intersection problems. In 17th Annual Symposium on

GIS Algorithms: Theory and Applications for Geographic Information


Page 71 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Foundations of Computer Science, pp. 208–215. IEEE.

Sinnott, R. (1984). Virtues of the Haversine. Sky and Telescope 68, 159.

Snyder, J. P. (1987). Map projections – a working manual. Professional paper 1395, U.S. Geological Survey.

Snyder, J. P. (1990). The Robinson projection: A computation algorithm. Cartography and Geographic
Information Systems 17(4), 301–305.

Sylvester, J. J. (1860). On Poncelet’s approximate linear valuation of Surd forms. Philosophical Magazine 20,
203–222.

Teitz, M. B. and P. Bart (1968). Heuristic methods for estimating the generalized vertex median of a weighted
graph. Operations Research 16, 955–961.

Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography
46, 234–240.

Webster, R. and M. A. Oliver (1990). Statistical Methods in Soil and Land Resource Survey. Oxford: Oxford
University Press.

Welzl, E. (1991). Smallest enclosing disks (balls and ellipsoids). In H. Maurer (ed.), New Results and New
Trends in Computer Science, pp. 359–370. Berlin: Springer.

Whitaker, R. (1983). A fast algorithm for the greedy interchange of large-scale clustering and median location
problems. INFOR 21, 95–108.

Wilford, J. N. (1988). The impossible quest for the perfect map. New York Times, October 25. [Link]
1BcUqAt.

Wolpert, D. and W. Macready (1997). No free lunch theorems for optimization. IEEE Transactions on
Evolutionary Computation 1(1), 67–82.

Xiao, N. (2006). An evolutionary algorithm for site search problems. Geographical Analysis 38(3), 227–247.

Xiao, N. (2008). A unified conceptual framework for geographical optimization using evolutionary algorithms.
Annals of the Association of American Geographers 98(4), 795–817.

Xiao, N. (2012). A parallel cooperative hybridization approach to the p-median problem. Environment and
Planning B: Planning and Design 39, 755–774.

GIS Algorithms: Theory and Applications for Geographic Information


Page 72 of 73
Science & Technology
Sage Sage Research Methods
© Ningchuan Xiao 2016

Xiao, N., D. A. Bennett, and M. P. Armstrong (2002). Using evolutionary algorithms to generate alternatives
for multiobjective site search problems. Environment and Planning A 34(4), 639–656.

Xiao, N., D. A. Bennett, and M. P. Armstrong (2007). Interactive evolutionary approaches to multiobjective
spatial decision making: A synthetic review. Computers, Environment and Urban Systems 30, 232–252.

Xiao, N. and A. T. Murray (2015). Interpolation. In M. Monmonier (ed.), History of Cartography, Volume 6:
Cartography in the Twentieth Century. Chicago: University of Chicago Press.

Young, C., D. Martin, and C. Skinner (2009). Geographically intelligent disclosure control for flexible
aggregation of census data. International Journal of Geographical Information Science 23(4), 457–482.

GIS Algorithms: Theory and Applications for Geographic Information


Page 73 of 73
Science & Technology

You might also like