100% found this document useful (2 votes)
2K views554 pages

Pro Spatial With SQL Server 2012

SQL Server Spatial functionality - as of 2012

Uploaded by

Joseph Riccardo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (2 votes)
2K views554 pages

Pro Spatial With SQL Server 2012

SQL Server Spatial functionality - as of 2012

Uploaded by

Joseph Riccardo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 554

www.it-ebooks.

info
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.

www.it-ebooks.info
Contents at a Glance

Contents..................................................................................................................vii
Foreword ............................................................................................................... xxi
About the Author .................................................................................................. xxii
About the Technical Reviewer ............................................................................. xxiii
Acknowledgments ............................................................................................... xxiv
Introduction .......................................................................................................... xxv
Chapter 1: Spatial Reference Systems.................................................................1
Chapter 2: Spatial Features ...............................................................................21
Chapter 3: Spatial Datatypes .............................................................................51
Chapter 4: Creating Spatial Data........................................................................77
Chapter 5: Importing Spatial Data ...................................................................101
Chapter 6: Geocoding.......................................................................................139
Chapter 7: Precision, Validity, and Errors........................................................163
Chapter 8: Transformation and Reprojection...................................................187
Chapter 9: Examining Spatial Properties .........................................................211
Chapter 10: Modification and Simplification ...................................................253
Chapter 11: Aggregation and Combination ......................................................273
Chapter 12: Testing Spatial Relationships .......................................................293
Chapter 13: Clustering and Distribution Analysis ............................................327

www.it-ebooks.info
Chapter 14: Route Finding ...............................................................................353
Chapter 15: Triangulation and Tesselation ......................................................387
Chapter 16: Visualization and User Interface ..................................................419
Chapter 17: Reporting Services .......................................................................445
Chapter 18: Indexing........................................................................................471
Appendix ..........................................................................................................499

Index.....................................................................................................................519

vi

www.it-ebooks.info
CHAPTER 1

Spatial Reference Systems

Spatial data analysis is a complex subject area, taking elements from a range of academic disciplines,
including geophysics, mathematics, astronomy, and cartography. Although you do not need to
understand these subjects in great depth to take advantage of the spatial features of SQL Server 2012,
it is important to have a basic understanding of the theory involved so that you use spatial data
appropriately and effectively in your applications.
This chapter describes spatial reference systemsways of describing positions in spaceand
shows how these systems can be used to define features on the earth's surface. The theoretical concepts
discussed in this chapter are fundamental to the creation of consistent, accurate spatial data, and are
used throughout the practical applications discussed in later chapters of this book.

What Is a Spatial Reference System?


The purpose of a spatial reference system (sometimes called a coordinate reference system) is to
identify and unambiguously describe any point in space. You are probably familiar with the terms
latitude and longitude, and have seen them used to describe positions on the earth. If this is the case,
you may be thinking that these represent a spatial reference system, and that a pair of latitude and
longitude coordinates uniquely identifies every point on the earth's surface, but, unfortunately, it's
not quite that simple.
What many people don't realize is that any particular point on the ground does not have a unique
latitude or longitude associated with it. There are, in fact, many systems of latitude and longitude, and
the coordinates of a given point on the earth will differ depending on which system was used.
Furthermore, latitude and longitude coordinates are not the only way to define locations: many spatial
reference systems describe the position of an object without using latitude and longitude at all. For
example, consider the following three sets of coordinates:
51.179024688899524, 1.82747483253479
SU 1215642213
581957, 5670386
These coordinates look very different, yet they all describe exactly the same point on the earth's
surface, located in Stonehenge, in Wiltshire, United Kingdom. The coordinates differ because they all
relate to different spatial reference systems: the first set is latitude and longitude coordinates from the
WGS84 reference system, the second is a grid reference from the National Grid of Great Britain, and
the third is a set of easting/northing coordinates from UTM Zone 30U.
Defining a spatial reference system involves not only specifying the type of coordinates used, but
also stating where those coordinates are measured from, in what units, and the shape of the earth over
which those coordinates extend.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Many different spatial reference systems exist, and each has different benefits and drawbacks:
some offer high accuracy but only over a relatively small geographic area; others offer reasonable
accuracy across the whole globe. Some spatial reference systems are designed for particular purposes,
such as for nautical navigation or for scientific use, whereas others are designed for general global use.
One key point to remember is that every set of coordinates is unique to a particular spatial
reference system, and only makes sense in the context of that system.

Modeling the Earth


The earth is a very complex shape. On the surface, we can see by looking around us that there are
irregular topographical features such as mountains and valleys. But even if we were to remove these
features and consider the mean sea-level around the planet, the earth is still not a regular shape. In
fact, it is so unique that geophysicists have a specific word solely used to describe the shape of the earth:
the geoid.
The surface of the geoid is smooth, but it is distorted by variations in gravitational field strength
caused by changes in the earth's composition. Figure 1-1 illustrates a depiction of the shape of the geoid.

Figure 1-1. The irregular shape of the earth.

In order to describe the location of an object on the earths surface with maximum accuracy, we
would ideally define its position relative to the geoid itself. However, even though scientists have
recently developed very accurate models of the geoid (to within a centimeter accuracy of the true
shape of the earth), the calculations involved are very complicated. Instead, spatial reference systems
normally define positions on the earth's surface based on a simple model that approximates the geoid.
This approximation is called a reference ellipsoid.

Note Not only is the geoid a complex shape, but it is also not constant. Astronomical and geophysical forces,
climatic changes, and volcanic activity all contribute to changes in the earth's structure that continuously alter the
shape of the geoid.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Approximating the Geoid


Many early civilizations believed the world to be flat (and a handful of modern day organizations still
do, e.g., the Flat Earth Society, http://www.theflatearthsociety.org). Our current understanding of
the shape of the earth is largely based on the work of Ancient Greek philosophers and scientists,
including Pythagoras and Aristotle, who scientifically proved that the world is, in fact, round. With this
fact in mind, the simplest reference ellipsoid that can be used to approximate the shape of the geoid is a
perfect sphere. Indeed, there are some spatial reference systems that do use a perfect sphere to model
the geoid, such as the system on which many Web-mapping providers, including Google Maps and
Bing Maps, are based.
Although a sphere would certainly be a simple model to use, it doesn't really match the shape of the
earth that closely. A better model of the geoid, and the one most commonly used, is an oblate spheroid. A
spheroid is the three-dimensional shape obtained when you rotate an ellipse about its shorter axis. In
other words, it's a sphere that has been squashed in one of its dimensions. When used to model the
earth, spheroids are always oblatethey are wider than they are highas if someone sat on a beach
ball. This is a fairly good approximation of the shape of the geoid, which bulges around the equator.
The most important property of a spheroid is that, unlike the geoid, a spheroid is a regular shape
that can be exactly mathematically described by two parameters: the length of the semi-major axis
(which represents the radius of the earth at the equator), and the length of the semi-minor axis (the
radius of the earth at the poles). The properties of a spheroid are illustrated in Figure 1-2.

Figure 1-2. Properties of a spheroid.

Note A spheroid is a sphere that has been flattened in one axis, and can be described using only two
parameters. An ellipsoid is a sphere that has been flattened in two axes; that is, the radii of the shape in the x-, y-,
and z-axes are all different. Although referred to as a reference ellipsoid, in practice most models of the earth are
actually spheroids, because ellipsoid models of the world are not significantly more accurate at describing the
shape of the geoid than simpler spheroid models.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

The flattening ratio of an ellipsoid, f, is used to describe how much the ellipsoid has been
squashed, and is calculated as
f = (a b ) / a
where a = length of the semi-major axis; b = length of the semi-minor axis.
In most ellipsoidal models of the earth, the semi-minor axis is only marginally smaller than the
semi-major axis, which means that the value of the flattening ratio is also small, typically around
0.003. As a result, it is sometimes more convenient to state the inverse flattening ratio of an ellipsoid
instead. This is written as 1/f, and calculated as follows.
1 / f = a / (a b)
The inverse-flattening ratio of an ellipsoid model typically has a value of approximately
300.Given the length of the semi-major axis a and any one other parameter, f, 1/f, or b, we have all the
information necessary to describe a reference ellipsoid used to model the shape of the earth.

Regional Variations in Ellipsoids


There is not a single reference ellipsoid that best represents every part of the whole geoid. Some
ellipsoids, such as the World Geodetic System 1984 (WGS84) ellipsoid, provide a reasonable
approximation of the overall shape of the geoid. Other ellipsoids approximate the shape of the geoid
very accurately over certain regions of the world, but are much less accurate in other areas. These local
ellipsoids are normally only used in spatial reference systems designed for use in specific countries,
such as the Airy 1830 ellipsoid commonly used in Britain, or the Everest ellipsoid used in India.
Figure 1-3 provides an (exaggerated) illustration of how different ellipsoid models vary in accuracy
over different parts of the geoid. The dotted line represents an ellipsoid that provides the best accuracy
over the region of interest, whereas the dash-dotted line represents the ellipsoid of best global accuracy.

Figure 1-3. Comparison of cross-sections of different ellipsoid models of the geoid.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

It is important to realize that specifying a different reference ellipsoid to approximate the geoid
will affect the accuracy with which a spatial reference system based on that ellipsoid can describe the
position of features on the earth. When choosing a spatial reference system, we must therefore be
careful to consider one that is based on an ellipsoid suitable for the data in question.
SQL Server 2012 recognizes spatial reference systems based on a number of different reference
ellipsoids, which best approximate the geoid at different parts of the earth. Table 1-1 lists the
properties of some commonly used ellipsoids that can be used.

Table 1-1. Properties of Some Commonly Used Ellipsoids

Ellipsoid Name Semi-Major Axis (m) Semi-Minor Axis (m) Inverse Flattening Area of Use

Airy (1830) 6,377,563.396 6,356,256.909 299.3249646 Great Britain

Bessel (1841) 6,377,397.155 6,356,078.963 299.1528128 Czechoslovakia,


Japan, South Korea

Clarke (1880) 6,378,249.145 6,356,514.87 293.465 Africa

NAD 27 6,378,206.4 6,356,583.8 294.9786982 North America

NAD 83 6,378,137 6,356,752.3 298.2570249 North America

WGS 84 6,378,137 6,356,752.314 298.2572236 Global

Realizing a Reference Ellipsoid Model with a Reference Frame


Having established the size and shape of an ellipsoid model, we need some way to position that model
to make it line up with the correct points on the earth's surface. An ellipsoid is just an abstract
mathematical shape; in order to use it as the basis of a spatial reference system, we need to correlate
coordinate positions on the ellipsoid with real-life locations on the earth. We do this by creating a
frame of reference points.
Reference points are places (normally on the earth's surface) that are assigned known
coordinates relative to the ellipsoid being used. By establishing a set of points of known coordinates,
we can use these to "realize" the reference ellipsoid in the correct position relative to the earth. Once
the ellipsoid is set in place based on a set of known points, we can then obtain the coordinates of any
other points on the earth, based on the ellipsoid model.
Reference points are sometimes assigned to places on the earth itself; The North American Datum
of 1927 (NAD 27) uses the Clarke (1866) reference ellipsoid, primarily fixed in place at Meades Ranch
in Kansas. Reference points may also be assigned to the positions of satellites orbiting the earth, which
is how the WGS 84 datum used by GPS-positioning systems is realized.
When packaged together, the properties of the reference ellipsoid and the frame of reference
points form a geodetic datum. The most common datum in global use is the World Geodetic System of
1984, commonly referred to as WGS 84. This is the datum used in handheld GPS systems, Google Earth,
and many other applications.
A datum, consisting of a reference ellipsoid model combined with a frame of reference points,
creates a usable model of the earth used as the basis for a spatial reference system.
We now need to consider what sort of coordinates to use to describe positions relative to the
chosen datum.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

There are many different sorts of coordinate systems, but when you use geospatial data in SQL
Server 2012, you are most likely to use a spatial reference system based on either geographic or
projected coordinates.

Geographic Coordinate Systems


In a geographic coordinate system, any location on the surface of the earth can be defined using two
coordinates: a latitude coordinate and a longitude coordinate.
The latitude coordinate of a point measures the angle between the plane of the
equator and a line drawn perpendicular to the surface of the earth to that point.
The longitude coordinate measures the angle (in the equatorial plane)
between a line drawn from the center of the earth to the point and a line
drawn from the center of the earth to the prime meridian. The prime meridian
is an imaginary line drawn on the earth's surface between the North Pole and
the South Pole (so technically it is an arc, rather than a line) that defines the
axis from which angles of longitude are measured.
The definitions of the geographic coordinates of latitude and longitude are illustrated in Figure 1-4.

Figure 1-4. Describing positions using a geographic coordinate system.

Caution Because a point of greater longitude lies farther east, and a point of greater latitude lies farther north,
it is a common mistake for people to think of latitude and longitude as measured on the earth's surface itself, but
this is not the case: latitude and longitude are angles measured from the plane of the equator and prime meridian
at the center of the earth.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Geographic Units of Measurement


Coordinates of latitude and longitude are both angles, and are usually measured in degrees (although
they can be measured in radians or any other angular unit of measure). When measured in degrees,
longitude values measured from the prime meridian range from 180 to +180, and latitude values
measured from the equator range from 90 (at the South pole) to +90 (at the North pole).
Longitudes to the east of the prime meridian are normally stated as positive values, or suffixed
with the letter "E". Longitudes to the west of the prime meridian are expressed as negative values, or
using the suffix "W". Likewise, latitudes north of the equator are expressed as positive values or using
letter "N", whereas those south of the equator are negative or denoted with the letter "S".

NOTATION OF GEOGRAPHIC COORDINATES


There are several accepted methods of expressing coordinates of latitude and longitude.

1. The most commonly used method is the DMS (degree, minutes, seconds) system,
also known as sexagesimal notation. In this system, each degree is divided into 60
minutes. Each minute is further subdivided into 60 seconds. A value of 51
degrees, 15 minutes, and 32 seconds is normally written as 511532.
2. An alternative system, commonly used by GPS receivers, displays whole degrees,
followed by minutes and decimal fractions of minutes. This same coordinate value
would therefore be written as 51:15.53333333.
3. Decimal degree notation specifies coordinates using degrees and decimal
fractions of degrees, so the same coordinate value expressed using this system
would be written as 51.25888889.
When expressing geographic coordinate values of latitude and longitude for use in SQL Server 2012, you
must always use decimal degree notation. The advantage of this format is that each coordinate can be
expressed as a single floating-point number. To convert DMS coordinates into decimal degrees you can
use the following rule.
Degrees + (Minutes / 60) + (Seconds / 3600) = Decimal Degrees
For example, the CIA World Factbook (https://www.cia.gov/library/publications/the-world-
factbook/geos/uk.html ) gives the geographic coordinates for London as follows,
51 30 N, 0 10 W
When expressed in decimal degree notation, this is
51.5 (Latitude), 0.166667 (Longitude)

Defining the Origin of a Geographic Coordinate System


Latitude coordinates are always measured relative to the equator: the line that goes around the
"middle" of the earth. But from where should longitude coordinates, which are measured around the
earth, be measured?

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

A common misconception is to believe that there is a universal prime meridian based on some
inherent fundamental property of the earth, but this is not the case. The prime meridian of any spatial
reference system is arbitrarily chosen simply to provide a line of zero longitude from which all other
coordinates of longitude can be calculated. The most commonly used prime meridian is the meridian
passing through Greenwich, England, but there are many others. For example, the RT38 spatial
reference system used in Sweden is based on a prime meridian that passes through Stockholm, some
18 degrees east of the Greenwich Prime Meridian. Prime meridians from which coordinates are
measured in other systems include those that pass through Paris, Jakarta, Madrid, Bogota, and Rome.
If you were to define a different prime meridian, the value of the longitude coordinate of all the
points in a given spatial reference system would change.

Projected Coordinate Systems


Describing the location of positions on the earth using coordinates of latitude and longitude is all very
well in certain circumstances, but it's not without some problems. To start with, you can only apply
angular coordinates onto a three-dimensional, round model of the earth. If you were planning a car
journey you'd be unlikely to refer to a travel globe though, wouldn't you? Because an ellipsoidal
model, by definition, represents the entire world, you can't magnify an area of interest without
enlarging the entire globe. Clearly this would get unwieldy for any applications that required focusing
in detail on a small area of the earths surface.
Fortunately, ancient geographers and mathematicians devised a solution for this problem, and the
art of cartography, or map-making, was born. Using various techniques, a cartographer can project all,
or part, of the surface of an ellipsoidal model onto a flat plane, creating a map. The features on that
map can be scaled or adjusted as necessary to create maps suitable for different purposes.
Because a map is a flat, two-dimensional surface, we can then describe positions on the plane of
that map using familiar two-dimensional Cartesian coordinates in the x- and y-axes. This is known as
a projected coordinate system.

Note In contrast to a geographic coordinate system, which defines positions on a three-dimensional, round
model of the earth, a projected coordinate system describes the position of points on the earths surface as they lie
on a flat, projected, two-dimensional plane.

Creating Map Projections


We see two-dimensional projections of geospatial data on an almost daily basis in street maps, road
atlases, or on our computer screens. Given their familiarity, and the apparent simplicity of working on
a flat surface rather than a curved one, you would be forgiven for thinking that defining spatial data
using a projected coordinate system was somehow simpler than using a geographic coordinate system.
The difficulty associated with a projected coordinate system is that, of course, the world isnt a flat, two-
dimensional plane. In order to be able to represent it as one, we have to use a map projection.
Projection is the process of creating a two-dimensional representation of a three-dimensional
model of the earth, as illustrated in Figure 1-5. Map projections can be constructed either by using
purely geometric methods (such as the techniques used by ancient cartographers) or by using
mathematical algorithms (as used in more modern, complex projections). However, whatever method
is used, it is not possible to project any three-dimensional object onto a two-dimensional plane

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

without distorting the resulting image in some way. Distortions introduced as a result of the projection
process may affect the area, shape, distance, or direction represented by different elements of the map.

Figure 1-5. Projecting a 3D model of the earth to create a flat map.

By altering the projection method, cartographers can reduce the effect of these distortions for
certain features, but in doing so the accuracy of other features must be compromised. Just as there is
not a single "best" reference ellipsoid to model the three-dimensional shape of the earth, neither is
there a single best map projection when trying to project that model onto a two-dimensional surface.
Over the course of time, many projections have been developed that alter the distortions
introduced as a result of projection to create maps suitable for different purposes. For instance, when
designing a map to be used by sailors navigating through the Arctic regions, a projection may be used
that maximizes the accuracy of the direction and distance of objects at the poles of the earth, but
sacrifices accuracy of the shape of countries along the equator.
The full details of how to construct a map projection are outside the scope of this book. However,
the following sections introduce some common map projections and examine their key features.

HammerAitoff Projection
The HammerAitoff map projection is an equal-area map projection that displays the world on an
ellipse. An equal-area map projection is one that maintains the relative area of objects; that is, if you
were to measure the area of any particular region on the map, it would accurately represent the area
of the corresponding real-world region. However, in order to do this, the shapes of features are
distorted. The HammerAitoff map projection is illustrated in Figure 1-6.

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Figure 1-6. The HammerAitoff map projection.

Mercator Projection
The Mercator map projection is an example of a conformal map projection. A conformal map
projection is any projection that preserves the local shape of objects on the resulting map.
The Mercator projection was first developed in 1569 by the Flemish cartographer Gerardus
Mercator, and has been widely used ever since. It is used particularly in nautical navigation because,
when using any map produced using the Mercator projection, the route taken by a ship following a
constant bearing will be depicted as a straight line on the map.
The Mercator projection accurately portrays all points that lie exactly on the equator. However, as
you move farther away from the equator, the distortion of features, particularly the representation of
their area, becomes increasingly severe. One common criticism of the Mercator projection is that, due
to the geographical distribution of countries in the world, many developed countries are depicted with
far greater area than equivalent-sized developing countries. For instance, examine Figure 1-7 to see
how the relative sizes of North America (actual area 19 million sq km) and Africa (actual area 30
million sq km) are depicted as approximately the same size.
Despite this criticism, the Mercator projection is still used by many applications, including Bing
Maps and Google Maps, and it is probably one of the most instantly recognizable of all geographical
images of the world.

10

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Figure 1-7. The Mercator map projection.

Equirectangular Projection
The equirectangular projection is one of the first map projections ever to be invented, being credited to
Marinus of Tyre in about 100 AD. It is also one of the simplest map projections, in which the map displays
equally spaced degrees of longitude on the x-axis, and equally spaced degrees of latitude on the y-axis.
This projection is of limited use in spatial data analysis because it represents neither the accurate
shape nor area of features on the map, although it is still widely recognized and used for such purposes

11

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

as portraying NASA satellite imagery of the world (http://visibleearth.nasa.gov/). Figure 1-8


illustrates a map of the world created using the equirectangular projection method.

Figure 1-8. The equirectangular map projection.

Universal Transverse Mercator Projection


The Universal Transverse Mercator (UTM) projection is not a single projection, but rather a grid
composed of many projections laid side by side. The UTM grid is created by dividing the globe into 60
slices, called zones, with each zone being 6 wide and extending nearly the entire distance between
the North Pole and South Pole (the grid does not extend fully to the polar regions, but ranges from a
latitude of 80S to 84N). Each numbered zone is further subdivided by the equator into north and south
zones. Any UTM zone may be referenced using a number from 1 to 60, together with a suffix of N or S
to denote whether it is north or south of the equator. Figure 1-9 illustrates the grid of UTM zones
overlaid on a map of the world, highlighting UTM Zone 15N.
Within each UTM zone, features on the earth are projected using a transverse Mercator projection.
The transverse Mercator projection is produced using the same method as the Mercator projection, but
rotated by 90. This means that, instead of portraying features that lie along the equator with no
distortion, the transverse Mercator projection represents features that lie along a central northsouth
meridian with no distortion. Because each UTM zone is relatively narrow, any feature on the earth
lies quite close to the central meridian of the UTM zone in which it is contained, and distortion within
each zone is very small.

12

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Figure 1-9. UTM zones of the world.

The UTM projection is universal insofar as it defines a system that can be applied consistently
across the entire globe. However, because each zone within the UTM grid is based on its own unique
projection, the UTM map projection can only be used to represent accurately those features lying
within a single specified zone.

Projection Parameters
In addition to the method of projection used, there are a number of additional parameters that affect
the appearance of any projected map. These parameters are listed in Table 1-2.

Table 1-2. Map Projection Parameters

Parameter Description

Azimuth The angle at which the center line of the projection lies, relative to north

Central meridian The line of longitude used as the origin from which x coordinates are measured

False easting A value added to x coordinates so that stated coordinate values remain positive
over the extent of the map

False northing A value added to y coordinates so that stated coordinate values remain positive
over the extent of the map

Latitude of center The latitude of the point at the center of the map projection

13

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Parameter Description

Latitude of origin The latitude used as the origin from which y coordinates are measured

Latitude of point The latitude of a specific point on which the map projection is based

Longitude of center The longitude of the point at the center of the map projection

Longitude of point The longitude of a specific point on which the map projection is based

Scale factor A scaling factor used to reduce the effect of distortion in a map projection

Standard parallel A line of latitude along which features on the map have no distortion

Projected Units of Measurement


Having done the hard work involved in creating a projection, the task of defining coordinates on that
projection thankfully becomes much easier. If we consider all of the points on the earths surface to lie
on the flat surface of a map then we can define positions on that map using familiar Cartesian
coordinates of x and y, which represent the distance of a point from an origin along the x-axis and y-
axis, respectively. In a projected coordinate system, these coordinate values are normally referred to as
eastings (the x-coordinate) and northings (the y-coordinate). This concept is illustrated in Figure 1-10.

Figure 1-10. Describing position on the earth using a projected coordinate system.

14

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Eastings and northings coordinate values represent a linear distance east and north of a given
origin. Although most projected coordinates are measured in meters, the appropriate units of
measurement for a spatial reference system will vary depending on the uses of that particular system.
Some systems use imperial units of measurement or locally relevant units: the foot, Clarke's foot, the
U.S. survey foot, or the Indian foot, for example (all of which are approximately equal to 30.5 cm,
although subtly different!). The coordinates of a spatial reference system designed for high-accuracy
local surveys may even specify millimeters as a unit of measurement.

Putting It All Together: Components of a Spatial Reference


System
We have examined several components that make up any spatial reference systema system that
allows us to define positions on the earths surfacethat we can use to describe positions on the
surface of the earth. Table 1-3 gives an overview of each component.

Table 1-3. Components of a Spatial Reference System

Component Function

Coordinate system Specifies a mathematical framework for determining the position of items
relative to an origin. Coordinate systems used in SQL Server are generally
either based on geographic or projected coordinate systems.

Datum States a model of the earth onto which we can apply the coordinate system.
Consists of a reference ellipsoid (a three-dimensional mathematical shape
that approximates the shape of the earth) and a reference frame (a set of points
to position the reference ellipsoid relative to known locations on the earth).

Prime meridian Defines the axis from which coordinates of longitude are measured.

Projection a Details the parameters required to create a two-dimensional image of the


earths surface (i.e., a map), so that positions can be defined using projected
coordinates.

Unit of measurement Provides the appropriate unit in which coordinate values are expressed.
a
Projection parameters are only defined for spatial reference systems based on projected coordinate systems.

Through a combination of all these elements, you can define a spatial reference system capable of
uniquely identifying any point on the earth.

15

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Note In order to be able to describe positions on the earth using a projected coordinate system, a spatial
reference system must first specify a three-dimensional, geodetic model of the world (as would be used by a
geographic coordinate system), and then additionally state the parameters detailing how the two-dimensional
projected map image should be created from that model. For this reason, spatial reference systems based on
projected coordinate systems must contain all the same elements as those based on geographic coordinate
systems, together with the additional parameters required for the projection.

Spatial Reference Identifiers (SRIDs)


Every time we state the latitude and longitude, or x- and y-coordinates, that describe the position of a
point on the earth, we must also state the associated spatial reference system from which those
coordinates were obtained. Without the extra information contained in the spatial reference system, a
coordinate tuple is just an abstract set of numbers in a mathematical system. The spatial reference
system takes the abstract coordinates from a geographic or projected system and puts them in a context
so that they can be used to identify a real position on the earths surface.
However, it would be quite cumbersome to have to write out the full details of the datum, the prime
meridian, and the unit of measurement (and any applicable projection parameters) each time we
wrote down a set of coordinates. Fortunately, various authorities allocate easily memorable, unique
integer reference numbers that represent all of the necessary parameters of a spatial reference
system. These reference numbers are called spatial reference identifiers (SRIDs).
One authority that allocates SRIDs is the European Petroleum Survey Group (EPSG), and its
reference identification system is implemented in SQL Server 2012. Whenever you use any of the
spatial functions in SQL Server that involve stating the coordinates of a position, you must also supply
the relevant EPSG SRID of the system from which those coordinates were obtained.
Some examples of SRIDs assigned by the EPSG that can be used in SQL Server are:
4269 (North American Datum 1983)
32601 (UTM Zone 1 North)
4326 (World Geodetic System 1984)
32136 (Tennessee State Plane Coordinate System)
A more comprehensive list of common spatial reference identifiers can be found in the appendix
of this book.

Tip You can view the details of all spatial reference systems administered by the EPSG registry at the following
website, http://www.epsg-registry.org.

16

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Well-Known Text of a Spatial Reference System


SQL Server maintains a catalogue view, sys.spatial_reference_systems, in which it stores the details
of all 392 supported geographic spatial reference systems. The information contained in this table is
required to define the model of the earth on which geographic coordinate calculations take place. Note
that no additional information is required to perform calculations of data defined using projected
coordinates, because these take place on a simple 2D plane. Therefore SQL Server can support data
defined using any projected coordinate system.
The parameters of each geographic spatial reference system in sys.spatial_reference_systems are
stored in the well_known_text column using the Well-Known Text (WKT) format, which is an industry-
standard format for expressing spatial information defined by the Open Geospatial Consortium (OGC).

Note SQL Server only supports geographic coordinate data defined relative to one of the spatial reference
systems listed in sys.spatial_reference_systems. This table contains the additional information required to
construct the model of the earth on which geographic coordinate calculations take place. However, because no
additional information is required to perform calculations on a 2D plane, SQL Server supports projected coordinate
data defined from any projected coordinate reference system, and the details of such systems are not listed in
sys.spatial_reference_systems.

To illustrate how spatial references are represented in WKT format, lets examine the properties
of the EPSG:4326 spatial reference by executing the following query.
SELECT
well_known_text
FROM
sys.spatial_reference_systems
WHERE
authority_name = 'EPSG'
AND
authorized_spatial_reference_id = 4326;
The following is the result (with line breaks and indents added to make the result easier to
read).
GEOGCS[
"WGS 84",
DATUM[
"World Geodetic System 1984",
ELLIPSOID[
"WGS 84",
6378137,
298.257223563
]
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]

17

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

This result contains all the parameters required to define this spatial reference system, as follows.
Coordinate system: The first line of a WKT spatial reference is a keyword to tell
us what sort of coordinate system is used. In this case, GEOGCS tells us that
EPSG:4326 uses a geographic coordinate reference system. If a spatial reference
system is based on projected coordinates then the WKT representation would
instead begin with PROJCS. Immediately following this is the name assigned to
the spatial reference system. In this case, the Well-Known Text is describing the
"WGS 84" spatial reference system.
Datum: The values following the DATUM keyword provide the parameters of
the datum. The first parameter gives us the name of the datum used. In this case,
it is the "World Geodetic System 1984" datum. Then follow the parameters of the
reference ellipsoid. This system uses the "WGS 84" ellipsoid, with a semimajor
axis of 6,378,137 m and an inverse-flattening ratio of 298.257223563.
Prime meridian: The PRIMEM value tells us that this system defines
Greenwich as the prime meridian, where longitude is defined to be 0.
Unit of measurement: The final parameter specifies that the unit in which
coordinates are measured is the "Degree". The value of 0.0174532925199433 is a
conversion factor required to convert from radians into the stated units (1
degree = /180 radians).

Contrasting a Geographic and a Projected Spatial Reference


Lets compare the result in the preceding section to the WKT representation of a spatial reference
system based on a projected coordinate system. The following example shows the WKT representation
of the UTM Zone 10N reference, a projected spatial reference system used in North America. The SRID
for this system is EPSG:26910.

Note Remember that, because this is a projected spatial reference system, you won't find these details in the
sys.spatial_reference_systems table. Instead, you can look up the details of these systems using a site such as
http://www.epsg-registry.org or http://www.spatialreference.org.

PROJCS[
"NAD_1983_UTM_Zone_10N",
GEOGCS[
"GCS_North_American_1983",
DATUM[
"D_North_American_1983",
SPHEROID[
"GRS_1980",
6378137,
298.257222101
]
],
PRIMEM["Greenwich",0],

18

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

UNIT["Degree", 0.0174532925199433]
],
PROJECTION["Transverse_Mercator"],
PARAMETER["False_Easting", 500000.0],
PARAMETER["False_Northing", 0.0],
PARAMETER["Central_Meridian", -123.0],
PARAMETER["Scale_Factor", 0.9996],
PARAMETER["Latitude_of_Origin", 0.0],
UNIT["Meter", 1.0]
]
Notice that the Well-Known Text for this projected coordinate system contains a complete set of
parameters for a geographic coordinate system, embedded within brackets following the GEOGCS
keyword. The reason is that a projected system must first define the three-dimensional, geodetic
model of the earth, and then specify several additional parameters that are required to project that
model onto a plane.

Note The Well-Known Text format in which SQL Server stores the properties of spatial reference systems in the
sys.spatial_reference_systems table is exactly the same format as used in the .PRJ file used to describe
the spatial reference in which the data in an ESRI shapefile are stored.

Summary
After reading this chapter, you should understand how spatial reference systems can be used to
describe positions in space:
A spatial reference system consists of a coordinate system (which describes a
position using either projected or geographic coordinates), a datum (which
describes a model representing the shape of the earth), the prime meridian
(which defines the origin from which units are measured), and the unit of
measurement. When using projected coordinates, the spatial reference system
also defines the properties of the projection used.
A geographic coordinate system defines the position of objects using angular
coordinates of latitude and longitude, which are measured from the equator and
the prime meridian, respectively.
A projected coordinate system defines the position of objects using Cartesian
coordinates, which measure the x and y distance of a point from an origin. These
are also referred to as easting and northing coordinates.
Whenever you state a set of coordinates representing a point on the earth, it is
essential that you also give details of the associated spatial reference system.
The spatial reference system defines the additional information that allows us to
apply the coordinate reference to identify a point on the earth.
For convenience, spatial reference systems may be specified by a single integer
identifier, known as a spatial reference identifier (SRID).

19

www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS

Details of all the geographic spatial reference systems supported by SQL Server
2012 are contained within a system catalogue view called
sys.spatial_reference_systems. SQL Server also supports data defined using any
projected spatial reference system.
The Well-Known Text format is a standard format used to express the properties
of a spatial reference system.
If you are interested in reading further about the topics covered in this chapter, I recommend
checking out the Microsoft white paper, "Introduction to Spatial Coordinate Systems: Flat Maps for a
Round Planet," which can be found in the MSDN SQL Server developer center site, at
http://msdn.microsoft.com/en-us/library/cc749633(SQL.100).aspx.

20

www.it-ebooks.info
CHAPTER 2

Spatial Features

In the last chapter, I stated that the purpose of geospatial data was to describe the shape and location of
objects on the Earth. Although this objective may be simply stated, in practice it is not always so easy to
achieve.
In many cases, although we have a rough understanding of the position and geographic extent of
features on the Earth, they may be hard to define in exact terms. For example, at what point does the
body of water known as the Gulf of Mexico become the Atlantic Ocean? Where exactly do we draw the
line that defines the boundary of a city or forest? In some parts of the world, there is even ambiguity or
contention as to where the border between two countries lies, and there are still significant areas of
land and sea that are subjects of international dispute.
Even if we agree on the precise shape and location of a feature, it may be hard to describe the
properties of that feature with sufficient detail; natural features, such as rivers and coastlines, have
complex irregular shapes. Even man-made structures such as roads are rarely simple straight lines.
It would be very hard, if not impossible, to define the shape of these features exactly. Instead,
spatial data represents these objects by storing simple geometrical shapes that approximate their
actual shape and position. These shapes are called geometries.
The spatial functionality in SQL Server is based on the Open Geospatial Consortiums Simple
Features for SQL Specification, which you can view online at
http://www.opensgeospatial.org/standards/sfs. This standard defines a number of different types of
geometries, each with different associated properties. In this chapter, each of the different types of
geometry is examined and the situations in which it is most appropriate to use each type are described.

Note In the context of spatial data, the word "geometry" can have two distinct meanings. To emphasize the
difference, geometry (code formatting) is used to refer to the geometry datatype, whereas geometry (no
formatting) is used to refer to simple shapes representing features on the Earth.

Geometry Hierarchy
There is a total of 14 standard types of geometries recognized by SQL Server (not counting the special
cases of the FullGlobe or Empty geometries; more on those later). However, only ten of these geometry
types are instantiable (that is to say, you can actually create instances of these geometries); the
remaining four types are abstract classes from which other instantiable classes are derived.
Geometries can be broadly categorized into two groups, as follows.

21

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Single geometries contain one discrete geometric element. The most basic
single geometry is a Point. There are also three types of curve (LineString,
CircularString, and CompoundCurve) and two types of surface (Polygon and
CurvePolygon).
Geometry collections are compound elements, containing one or more of the
individual geometries listed above. Geometry collections may be
homogeneous or heterogeneous. A homogeneous geometry collection
contains several items of the same type of single geometry only (e.g., a
MultiPoint is a geometry collection containing only Points). A heterogeneous
geometry collection contains one or more of several different sorts of
geometry, such as a collection containing a LineString and a Polygon.

Note The Microsoft Books Online documentation refers to these two categories of geometries as Simple
types and Collection types (http://technet.microsoft.com/en-
us/library/bb964711%28SQL.110%29.aspx). The use of the word Simple here has been deliberately avoided
because this has a separate meaning (as used by the STIsSimple() method) that is discussed later.

Figure 2-1 illustrates the inheritance tree of geometry types, which demonstrates how the
different types of geometry are related to each other. Every item of spatial data in SQL Server is an
example of one of the ten instantiable classes shown with a solid border.

Figure 2-1. The inheritance hierarchy of geometry types. Instantiable types (those types from which an
instance of data can be created in SQL Server 2012) are shown with a solid border.

SQL Server 2008 provided only a single instantiable type of Curve (the LineString), and only a
single type of instantiable surface (the Polygon). Both of these geometry types are straight-edged,
linear features. SQL Server 2012 added support for curved geometries, and the CircularString,
CompoundCurve, and CurvePolygon curved geometries shown in Figure 2-1 are new types introduced
in SQL Server 2012.

22

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Note In the OGC Simple Features specification, geometry type names are written using Pascal case (also called
Upper CamelCase) and this is the standard generally used in Microsoft documentation. For this reason, that
convention is also adopted in this book by referring to geometry types as MultiPoint, LineString, and so on.

Interiors, Exteriors, and Boundaries


Once you have defined an instance of any of the types of geometry listed in the previous section, you
can then classify every point in space into one of three areas relative to that geometry: every location
must lie either in the geometry's interior, in its exterior, or on its boundary:
The interior of a geometry consists of all those points that lie in the space
occupied by the geometry. In other words, it represents the "inside" of the
geometry.
The exterior consists of all those points that lie in the area of space not occupied
by the geometry. It can therefore be thought of as representing the outside of
the geometry.
The boundary of a geometry consists of those points that lie on the edge of the
geometry in question.
Generally speaking, every geometry contains at least one point in its interior and also at least
one point lies in its exterior. The only exceptions to this rule are the special cases of the empty
geometry and the full globe geometry: an empty geometry has no interior, and therefore every point
is considered to lie in its exterior, whereas the full globe geometry is exactly the opposite: containing
every point in its interior, with no points in its exterior.
The distinction between these classifications of space becomes very important when considering
the relationship between two or more geometries, because these relationships are defined by
comparing where particular points lie with respect to the interior, exterior, or boundary of the two
geometries in question. For example:
Two geometries are said to intersect each other if there is at least one point that
lies in either the interior or boundary of both geometries in question.
Two geometries are deemed to touch each other if there is at least one shared
point that lies on the boundary of both geometries, but no points common to the
interior of both geometries. Note that this criterion is more specific than the
general case of intersection decribed above, and any geometries that touch must
therefore also intersect.
If two geometries have no interior or boundary points in common then they are
said to be disjoint.
The distance between two geometries is measured as the shortest possible
distance between any two interior points of the two geometries.
These concepts, and other related classifications, are discussed in later chapters of this book when
spatial relationships are explained in more detail. For the remainder of this chapter, I instead
concentrate on examining the various types of geometry in greater detail.

23

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Points
A Point is the most fundamental type of geometry, and is used to represent a singular position in
space.

Example Point Usage


When using geospatial data to define features on the Earth, a Point geometry is generally used to
represent an exact location, which could be a street address, or the location of a bank, volcano, or city,
for instance. Figure 2-2 illustrates a Point geometry used to represent the location of Berlin with
respect to a map of Germany. Berlin has a fascinating and complicated history, the city itself being
politically divided for much of the twentieth century between West Berlin and East Berlin, a division
that famously led to the erection of the Berlin Wall. Despite the fact that West Berlin was, to all intents
and purposes, a part of West Germany, it lay in a region that, for 50 years following the Second World
War, was proclaimed to be the German Democratic Republic (East Germany), and was entirely isolated
from the rest of West Germany.

Figure 2-2. A Point geometry marking the location of Berlin.

24

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Note Inasmuch as a Point geometry represents an infinitely small, singular location in space, it is impossible to
truly illustrate it in a diagram. Throughout this book, Point geometries are represented as small black circles, as in
Figure 2-2.

Defining a Point
A Point is defined by a pair of coordinate values, either an x-coordinate value and a y-coordinate
value from a planar coordinate system, or a latitude and longitude coordinate value from a geographic
coordinate system.
When expressed using the Well-Known Text (WKT) format, a Point located with coordinates x = 5
and y = 3 may be written as follows,
POINT(5 3)
The WKT representation begins with the POINT keyword followed by the relevant coordinate
values, contained within round brackets. The coordinate values are separated by a space (not a
comma, as you might initially expect). Figure 2-3 illustrates the Point geometry represented by this
definition.

Figure 2-3. A Point located at POINT(5 3).

Defining a Point from geographic coordinates follows the same convention, but with one thing to
watch out for: whereas in everyday language it is common to refer to coordinates of "latitude and
longitude" (in that order), when you write geographic coordinates in WKT the longitude coordinate
always comes first, then the latitude coordinate. The WKT syntax for a geography Point located at a
latitude of 40 and longitude of 60 is therefore:
POINT(60 40)
The location of this Point is illustrated in Figure 2-4.

25

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-4. A Point located at geographic coordinates POINT(60 40).

To help remember the correct order for geographic coordinates, try thinking of longitude as being
equivalent to the x-coordinate, because longitude increases as you travel east around the world (until
you cross the 180th meridian). Likewise, latitude is equivalent to the y-coordinate, with increasing
latitude extending farther north. Because you list planar coordinates in (x y) order, the equivalent
order for geographic coordinates is therefore (longitude latitude).

Caution When defining geographic coordinates using WKT the longitude coordinate comes first, then the
latitude coordinate.

Defining Points in 3- and 4-Dimensional Space


In addition to the x- and y- (or longitude and latitude) coordinates required to locate a Point on the
surface of the Earth, the WKT syntax enables you to specify additional z-and m-coordinate values to
position a point in four-dimensional space.
The z-coordinate is the height, or elevation, of a Point. Just as positions on the
Earth's surface are measured with reference to a horizontal datum, the height
of points above or below the surface are measured relative to a vertical datum.
The z-coordinate may represent the height of a point above sea-level, the
height above the underlying terrain, or the height above the reference
ellipsoid, depending on which vertical datum is used.
The m-coordinate represents the measure value of a Point. The fourth
dimension is most commonly thought of as time, although the m-coordinate
can be used to represent any additional property of a point that can be
expressed as a double-precision number. For example, if you were recording
time-based data, you could use the m-coordinate to represent the time at
which the location of a point was recorded. Or, if recording waypoints along a

26

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

route, the m-coordinate could be used to express the distance of how far along
the route each point lay.
The WKT syntax for a Point containing z- and m-coordinates is as follows,
POINT(x y z m)
Or, if using geographic coordinates:
POINT(longitude latitude z m)
However, you should be aware that, although SQL Server 2012 supports the creation, storage, and
retrieval of z- and m-coordinate values, all of the inbuilt methods operate in 2D space only. The z and
m values assigned to a Point instance will therefore not have any effect on the result of any
calculations performed on that instance.
For example, when calculating the distance between the Points located at (0 0 0) and (3 4 12), SQL
Server calculates the result as 5 units (the square root of the sum of the difference in the x and y
dimensions only), and not 13 (the square root of the sum of the difference in the x, y, and z
dimensions). You can, however, retrieve the z and m values associated with any instance and use them
in your own calculations, as is demonstrated in a later chapter.

Characteristics of Points
All Point geometries share the following characteristics.
A Point is zero-dimensional, which means that it has no length in any direction
and there is no area contained within a Point.
A Point has no boundary.
The interior of a Point is the Point itself. Everything other than that Point is the
exterior.
Points are always classified as "simple" geometries.

LineStrings
Having established the ability to define individual Points, we can then create a series of two or more
Points and draw the path segments that directly connect each one to the next in the series. This path
defines a LineString.

Example LineString Usage


In geospatial data, LineStrings are commonly used to represent features such as roads, rivers, delivery
routes, or contours of the Earth. Figure 2-5 illustrates a LineString that represents the route of the
Orient Express railway, which traveled across Europe between Paris and Istanbul. The Orient Express
was one of the worlds most famous luxury railway services, and passed through many of Europes
great cities during its 1200-mile route, including Munich, Vienna, Budapest, and Belgrade.

27

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-5. A LineString representing the route of the Orient Express railway.

Defining a LineString
When expressed using the WKT format, the coordinate values of each Point are separated by a space,
and a comma separates each Point from the next in the LineString, as follows.
LINESTRING(2 3, 4 6, 6 6, 10 4)
The LineString created by this WKT definition is illustrated in Figure 2-6.

Figure 2-6. A LineString geometry.

28

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Note Some GIS systems make a distinction between a LineString and a Line. According to the Open Geospatial
Consortium specification (a standard on which the spatial features of SQL Server 2012 are largely based), a Line
connects exactly two Points, whereas a LineString may connect any number of Points. Because all Lines can be
represented as LineStrings, of these two types SQL Server 2012 only implements the LineString geometry. If you
need to define a table in which only Lines can be stored, you can do so by adding a CHECK constraint that calls the
STNumPoints() method to test whether inserted LineString values contain only two points.

LineStrings created from geographic coordinates follow the same convention: the coordinates of
each Point in the LineString are listed in longitudelatitude order (as they would be for an individual
Point), and each Point in the LineString is separated by a comma.

Characteristics of LineStrings
All LineStrings are one-dimensional geometries: they have an associated length, but do not contain
any area. This is the case even when the ends of the LineString are joined together to form a closed
loop. LineStrings may be described as having the following additional characteristics.
A simple LineString is one where the path drawn between the points of the
LineString does not cross itself.
A closed LineString is one that starts and ends at the same point.
A LineString that is both simple and closed is known as a ring.
The interior of a LineString consists of all the points that lie on the path of the
line. Be aware that, even when a LineString forms a closed ring, the interior of
the LineString does not contain those points in the area enclosed by the ring.
The interior of a LineString consists only of those points that lie on the
LineString itself.
The boundary of a LineString consists of the two points that lie at the start and
end of the line. However, a closed LineString, in which the start and end points
are the same, has no boundary.
The exterior of a LineString consists of all those points that do not lie on the line.
Different examples of LineString geometries are illustrated in Figure 2-7.

Figure 2-7. Examples of LineString geometries. (From leftright) A simple LineString, a simple closed
LineString (a ring), a nonsimple LineString, a nonsimple closed LineString.

29

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

LineStrings and Self-Intersection


It is worth noting that, although the path of a nonsimple LineString may cross itself at one or more
distinct points, it cannot retrace any continuous length of path already covered. Consider Figure 2-8,
which illustrates the shape of a capital letter T:

Figure 2-8. A geometry in the shape of a capital letter T.

The shape illustrated in Figure 2-8 cannot be represented by a single LineString geometry,
because doing so would necessarily involve retracing at least one section of the path twice. Instead,
the appropriate type of geometry to represent this shape is a MultiLineString geometry, discussed later
this chapter.

CircularStrings
As described in the previous section, LineStrings are formed by defining the path segments
connecting a series of Points in order. The line segments that connect consecutive points are
calculated by linear interpolation: each line segment represents the shortest direct route from one
Point to the next in the LineString.
However, this is clearly not the only way to connect a series of Points. An alternative method
would be to define a curve that connects each Point with a smooth line and gently changing gradient,
rather than abrupt angular corners between segments typical of a LineString. The CircularString
geometry, which is a new geometry type introduced in SQL Server 2012, provides one such curved line
by using circular, rather than linear, interpolation between points. In other words, a CircularString is
defined by the paths connecting a series of points in order, where the path segments connecting each
pair of points is an arc formed from part of a circle.

Example CircularString Usage


Every year, teams of rowers from Oxford University and Cambridge University compete in a boat race
on the River Thames in West London. Starting from Putney Bridge, the race course follows the river
upstream for slightly over four miles, ending just before Chiswick Bridge. The course is marked by
three distinctive bends; the crew rowing on the north side of the river has the advantage in the first
and third bends, whereas the crew rowing on the south side of the river has the advantage of being on
the inside for the long second bend.
A CircularString, as illustrated in Figure 2-9, is a suitable geometry to model the course of the race,
because it can represent more accurately the smooth curves of the river than is possible using linear
interpolation as in a LineString.

30

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-9. A CircularString geometry representing the course of the OxfordCambridge University boat
race.

Note Don't be misled by the name: a CircularString geometry does not have to form a complete circle (although
it can); it merely means that the segments joining consecutive points are circular arcs rather than straight lines as
in a LineString.

Defining a CircularString
There are an infinite number of circular arcs that connect two Points. In order to specify which of these
arcs should be created, every CircularString segment actually requires three points: the start and end
points to be connected, and an additional anchor point that lies somewhere on the arc between those
points. The CircularString will follow the edge of the only circle that passes through all three points.
The syntax for the Well-Known Text representation of a CircularString is as follows,
CIRCULARSTRING (1 3, 4 1, 9 4)
The CircularString created from this definition is shown in the solid line illustrated in Figure
2-10. The dashed line illustrates the complete circle from which the CircularString arc has been
created.

31

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-10. CircularString defined by the circular interpolation of three points.

Note The additional anchor point does not need to lie in the middle of the start and end points of a
CircularString; it can be any point that lies on the circular arc between the start and end point.

Like LineStrings, CircularStrings can be created between a series of any number of consecutive
points. Each segment implicitly starts at the endpoint of the previous curved segment. Each additional
segment requires both an anchor point and an endpoint, therefore every valid CircularString contains
an odd number of points, and must contain at least three points.

Note A valid CircularString must have an odd number of points, greater than one.

WHEN IS A CIRCULARSTRING A STRAIGHT LINE?

One interesting point to note is that it is possible to specify a CircularString in which the anchor point lies
exactly on the straight line between the start and end point. The circular arc created in such cases is a
straight line, effectively joining all three points with the arc taken from a circle of infinite radius. The same
result can also be achieved if the anchor point is exactly equal to either the start or end point.
The set of points contained by either a LineString or a "straight" CircularString are identical, which can be
confirmed using SQL Server's STEquals() method as shown in the following code listing.
DECLARE @LineString geometry = 'LINESTRING(0 0, 8 6)';
DECLARE @CircularString1 geometry = 'CIRCULARSTRING(0 0, 4 3, 8 6)';

32

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

DECLARE @CircularString2 geometry = 'CIRCULARSTRING(0 0, 0 0, 8 6)';

SELECT
@LineString.STEquals(@CircularString1), -- Returns 1 (true)
@LineString.STEquals(@CircularString2); -- Returns 1 (true)

Characteristics of CircularStrings
CircularStrings, like LineStrings, inherit from the abstract Curve geometry type, and share many of
the same characteristics.
CircularStrings are one-dimensional geometries; they have an associated
length, but do not contain any area.
A simple CircularString is one where the path drawn between the points of the
CircularString does not cross itself.
A closed CircularString is one that starts and ends at the same point.
The interior of a CircularString consists of all the points that lie on the arc
segments.
The boundary of a CircularString consists of the start and end points only, except
in the case of a closed CircularString, which has no boundary.
The exterior of a CircularString consists of all those points not on the path of the
CircularString.
Every CircularString must be defined by an odd number of points greater than
one.

Drawing Complete Circles


To create a CircularString that forms a complete circle, you might expect that you would need to define
only three points: one point used twice as both the start and end of the CircularString, and one other
anchor point that lies somewhere on the perimeter of the circle. However, the problem with this
definition is that it does not specify the orientation of the created circle; that is, beginning from the
start point, does the path of the CircularString travel in a clockwise or anti-clockwise direction through
the anchor point and back to where it started?
To avoid this ambiguity, in order to create a CircularString that forms a complete circle, five points
are required. As with any closed LineString, the start and end points are the same. The remaining
three points can be any other points that lie on the circle, listed in the desired order. The following
Well-Known Text defines a clockwise circle with a radius of two units, centered about the point at (5 5):
CIRCULARSTRING(3 5, 5 7, 7 5, 5 3, 3 5)
This CircularString is illustrated in Figure 2-11.

33

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-11. Creating a circle using a CircularString geometry.

Note In order to define a full circle, you must define a CircularString containing five points.

Choosing Between LineString and CircularString


Although LineString geometries can be used to approximate a curve by using a number of small
segments, CircularStrings can generally do so more efficiently and with greater accuracy. However,
even though the CircularString geometry may enable you to describe rounded features with greater
precision, the LineString is better supported and more widely used in spatial applications. For this
reason, you will probably find that many sources of spatial data still use LineStrings even in situations
where CircularStrings may provide greater benefit.
You may also find that, when exporting your own data for use in third-party spatial applications,
CircularStrings are not supported and you may have to convert your CircularStrings to LineStrings
instead. Fortunately, SQL Server provides a method to do just thisSTCurveToLine()which is
documented in Books Online at http://msdn.microsoft.com/en-us/library
/ff929272%28v=sql.110%29.aspx.

CompoundCurves
A CompoundCurve is a single continuous path between a set of Points, in which the segments joining
each pair of Points may either be linear (as in a LineString) or curved (as in a CircularString), or a
mixture of both. The CompoundCurve geometry is a new geometry type introduced in SQL Server 2012.

34

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Example CompundCurve Usage


The Daytona International Speedway race track in Daytona Beach, Florida, is recognizable by its
distinctive tri-oval shape, consisting of three straights and three smooth corners. This revolutionary
circuit design, conceived by Bill France, founder of NASCAR, was created to maximize the angle of
vision in which spectators could see cars both approaching and driving away from them.
Figure 2-12 illustrates a CompoundCurve representing the shape of the Daytona racing circuit,
consisting of three CircularString segments and three LineString segments.

Figure 2-12. A CompundCurve representing the Daytona International Speedway racing circuit.

Defining a CompoundCurve
The Well-Known Text for a CompoundCurve geometry begins with the COMPOUNDCURVE keyword
followed by a set of round brackets. Contained within the brackets are the individual LineString or
CircularString segments that are joined together to form the compound curve.
Each CircularString or LineString segment in the CompoundCurve must begin at the point where
the previous segment ended, so that the CompoundCurve defines a single continuous path. The
coordinates of CircularString segments are preceded by the CIRCULARSTRING keyword, whereas
LineString segments are not preceded by any keyword; they are simply a list of coordinates contained
in round brackets.
The following code listing demonstrates the Well-Known Text representation of a
CompoundCurve geometry containing two LineString segments and two CircularString segments:
COMPOUNDCURVE(
(2 3, 2 8),
CIRCULARSTRING(2 8, 4 10, 6 8),
(6 8, 6 3),
CIRCULARSTRING(6 3, 4 1, 2 3)
)
This CompoundCurve geometry is illustrated in Figure 2-13.

35

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-13. A CompoundCurve geometry.

Characteristics of CompoundCurves
CompoundCurves are constructed from one-dimensional LineStrings and CircularStrings, therefore
CompoundCurves are themselves one-dimensional, and contain no area.
A simple CompoundCurve is one that does not intersect itself.
A closed CompoundCurve is one that starts and ends at the same point.

Polygons
A Polygon is a type of surface; that is, a Polygon is a two-dimensional geometry that contains an area
of space. The outer extent of the area of space contained within a Polygon is defined by a closed
LineString, called the exterior ring. In contrast to a simple closed LineString geometry, which only
defines those points lying on the ring itself, the Polygon defined by a ring contains all of the points
that lie either on the line itself, or contained in the area within the exterior ring.

Example Polygon Usage


Polygons are frequently used in spatial data to represent geographic areas such as islands or lakes,
political jurisdictions, or large structures. Figure 2-14 illustrates a Polygon that represents the state of
Texas, United States. The large state (261,797 square miles) has a very distinctive, recognizable shape,
which features straight sides along the northwest border with New Mexico, the meandering path of the
Red River dividing Texas from Oklahoma to the north, and the smooth border of the Gulf Coast to the
southeast.

36

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-14. A Polygon geometry representing the U.S. state of Texas.

Exterior and Interior Rings


Every Polygon must have exactly one external ring that defines the overall perimeter of the geometry.
It may also contain one or more internal rings. Internal rings define areas of space contained within
the external ring, but excluded from the Polygon. They can therefore be thought of as holes that have
been cut out of the main geometry.
Figure 2-15 illustrates a Polygon geometry containing a hole. The Polygon in this case represents
the country of South Africa, and the hole represents the fully enclosed country of Lesotho.

Figure 2-15. A Polygon containing an interior ring, representing South Africa. The interior ring represents
the border with Lesotho.

37

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Defining a Polygon
The Well-Known Text for a Polygon begins with the POLYGON keyword, followed by a set of round
brackets. Within these brackets, each ring of the Polygon is contained within its own set of brackets.
The exterior ring, which defines the perimeter of the Polygon, is always the first ring to be listed.
Following this, any interior rings are listed one after another, with each ring separated by a comma.
The following code listing demonstrates the WKT syntax for a rectangular Polygon, two units wide
and six units high.
POLYGON((1 1, 3 1, 3 7, 1 7, 1 1))
And the following code listing demonstrates the WKT syntax for a triangular Polygon containing
an interior ring.
POLYGON((10 1, 10 9, 4 9, 10 1), (9 4, 9 8, 6 8, 9 4))
These two Polygons are both illustrated in Figure 2-16.

Figure 2-16. Examples of Polygon geometries. (From leftright) A Polygon; a Polygon with an interior ring.

Note Because Polygons are constructed from rings, which are simple closed LineStrings, the coordinates of the
start and end points of each Polygon ring must be the same.

Characteristics of Polygons
All Polygons share the following characteristics.
Because Polygons are constructed from a series of one or more rings, which are
simple closed LineStrings, all Polygons are themselves deemed to be simple
closed geometries.

38

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Polygons are two-dimensional geometries; they have both an associated length


and area.
The length of a Polygon is measured as the sum of the lengths of all the
rings of that Polygon (exterior and interior).
The area of a Polygon is calculated as the area of space within the exterior
ring, less the area contained within any interior rings.

CurvePolygons
The CurvePolygon, like the Polygon, is defined by one exterior ring and, optionally, one or more
interior rings. Unlike the Polygon, however, in which each ring must be a simple closed LineString,
each ring in a CurvePolygon can be any type of simple closed curve. Those curves can be LineStrings,
CircularStrings, or CompoundCurves, so the rings that define the boundary of a CurvePolygon can
have a mixture of straight and curved edges.

Example CurvePolygon Usage


Yankee Stadium, built in 1923 in New York City, hosted over 6581 home games of the New York
Yankees baseball team in its 85-year history, prior to its closure in 2008 (the team now plays in a new
stadium, also named Yankee Stadium, constructed a short distance away from the site of the original
Yankee Stadium). It was the first three-tiered sports facility to be built in the United States, and one of
the first to be officially named a stadium (as opposed to a traditional baseball park, or field).
The large stadium was designed to be a multipurpose facility that could accommodate baseball,
football, and track and field events, and the smooth-cornered, irregularly sided shape of the stadium
can be be modeled as a CurvePolygon whose exterior ring contains four CircularString segments and
four LineString segments, as illustrated in Figure 2-17.

Figure 2-17. A CurvePolygon representing Yankee Stadium in New York.

39

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Defining a CurvePolygon
The WKT representation of a CurvePolygon follows the same general syntax as that for a Polygon.
However, because the CurvePolygon allows rings to be defined as LineStrings, CircularStrings, or
CompoundCurves, you must specify which kind of curve is used for each ring.
The LineString is considered to be the default curve type, and linear rings do not need to be
explicitly preceded by the LINESTRING keyword. In the following code listing, a CurvePolygon is defined
by a linear ring between five points:
CURVEPOLYGON((4 2, 8 2, 8 6, 4 6, 4 2))
The result, shown in Figure 2-18, is a square of width and height 2 units, exactly the same as would
have been created using the following Polygon geometry.
POLYGON((4 2, 8 2, 8 6, 4 6, 4 2))

Figure 2-18. A CurvePolygon with a LineString exterior ring.

In the following code listing, the exterior ring of the CurvePolygon is instead defined using a
CircularString geometry between the same set of points.
CURVEPOLYGON(CIRCULARSTRING(4 2, 8 2, 8 6, 4 6, 4 2))
In this case, rather than creating a square, the resulting CurvePolygon is a circle of radius 2.828
(8) , centered on the point (6 4), as shown in Figure 2-19.

40

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-19. A CurvePolygon with a CircularString exterior ring.

Characteristics of CurvePolygons
With the exception of the method of interpolation between points, CurvePolygons share exactly the
same general characteristics as Polygons:
CurvePolygons are two-dimensional, simple, closed geometries.
They have a length equal to the perimeter of all defined rings.
The area contained by a CurvePolygon is equal to the area of space enclosed
within the exterior ring less any area contained within any interior rings.

MultiPoints
A MultiPoint is a homogeneous collection of Point geometries. Unlike the LineString or CircularString,
which are formed from a series of connected Points, there are no connecting lines between the
individual Points in a MultiPoint: they are distinct and separate.

Example MultiPoint Usage


The Ardrossan windfarm in Ayrshire, Scotland, contains 14 wind turbines providing green electricity
to around 20,000 homes. If each turbine is modeled as a distinct Point geometry, then the collection of
turbines forming the entire windfarm can be modeled as a MultiPoint geometry, as in Figure 2-20.

41

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-20. A MultiPoint instance representing the location of each wind turbine at the site of the
Ardrossan wind farm.

Defining A MultiPoint
To represent a MultiPoint geometry in Well-Known Text, you first declare the MULTIPOINT element
name, followed by a comma-separated list of the coordinate tuples of each point contained in the
instance, contained within round brackets. The coordinates of each point are listed in exactly the same
manner as they would be if used to define an individual Point instance; that is, coordinates are listed
in x, y, z, m order, or longitude, latitude, z, m order, with values separated by spaces.
The following code listing is an example of a MultiPoint containing three Points,
MULTIPOINT(0 0, 2 4, 10 8)
The geometry created by this WKT is shown in Figure 2-21.

Figure 2-21. A MultiPoint geometry.

42

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Be careful with the placement of the comma(s) in the Well-Known Text representation of a
MultiPoint geometry; because each Point may contain between two and four coordinate values
(depending on whether the optional z- and m-coordinates are defined), you must place the comma
carefully to separate each coordinate tuple. Compare the following two WKT representations, which
use the same coordinate values, but vary in their comma placement.
MULTIPOINT(0 0, 2 4, 10 8)
MULTIPOINT(0 0 2 , 4 10 8)
The first geometry represents a MultiPoint geometry containing three Points, each one specified
with only x- and y-coordinates. The second example creates a MultiPoint containing only two Point
geometries, with each one specifying x-, y-, and z-coordinates.

Characteristics of MultiPoints
MultiPoint geometries all share the following characteristics.
MultiPoints, in common with the individual Point geometries from which they
are formed, are always simple closed geometries.
MultiPoints are one-dimensional, and have no length nor do they contain any area.

Many Single Points, or One Multipoint?


There are many occasions in which you will find yourself working with a set of data in which each
individual item can be defined as a Point. However, this does not necessarily mean that it makes sense
to combine all of the elements together into a MultiPoint. A MultiPoint should not generally be used
simply as a means to group a set of Points; instead, it should be used to define a single logical feature
whose geographic location and shape are best described by a set of disparate singular entities.
Consider a university campus consisting of a number of different buildings. In this case, it might
be reasonable to define the entire campus as a MultiPoint geometry, with each Point representing one
of the buildings on the campus.
However, what if you had a table of customers in which each customer's address was represented
as a Point; would it then make sense to create a MultiPoint combining all of your customers' locations
in a single geometry? Probably not. Generally speaking, the only situations in which you should do
this is if it makes sense to treat that collection as a logical single unit, or if there is some operation that
needs to be applied to the collection of Points as a whole. In other cases, you should just leave the data
as a set of otherwise unrelated Points.

MultiLineStrings
A MultiLineString is a homogeneous collection of LineString geometries.

Example MultiLineString Usage


The River Nile Delta, formed where the River Nile joins the Mediterranean Sea, is one of the largest
river deltas in the world. It extends approximately 240 km along the coastline of Northern Egypt, and
begins some 160 km farther south, near Cairo. Approximately half of Egypts population of 83 million
live within the region of the Nile Delta.
The network of distributary channels formed as the River Nile bifurcates can be modeled as a
MultiLineString, as illustrated in Figure 2-22.

43

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Figure 2-22. A MultiLineString geometry representing the Nile River delta.

Defining a MultiLineString
The WKT representation of a MultiLineString geometry is formed by a comma-separated list of
individual LineString geometries, contained within a set of round brackets following the MULTILINESTRING
keyword. Because it is known that the elements within a MultiLineString must all be LineStrings, it is not
necessary to include the individual LINESTRING keyword in front of each element; each LineString is
merely represented by a comma-separated series of coordinate values within round brackets.
The following code listing demonstrates the Well-Known Text representation of a
MultiLineString containing three LineStrings: the first and second containing only two points each,
and the third containing three points.
MULTILINESTRING((0 0, 2 2), (3 2, 6 9), (3 3, 5 3, 8 8))
The MultiLineString geometry represented by this WKT is shown in Figure 2-23.

Figure 2-23. A MultiLineString geometry.

44

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Characteristics of MultiLineStrings
Unlike MultiPoints and MultiPolygons, which generally contain elements that are disjoint from each
other, MutiLineStrings are frequently comprised of a number of intersecting LineString elements. For
example, any road or river that splits into two or more forks cannot be represented by a single
LineString geometry, and must instead be represented as a MultiLineString, with each fork being
represented by a separate LineString within the MultiLineString. Additional properties of
MultiLineStrings are as follows.
As are LineStrings, MultiLineStrings are one-dimensional.
A MultiLineString is simple if all of the LineString elements contained within it
are completely disjoint from each other. If any two LineStrings intersect (even if
they only touch each other) the MultiLineString is not considered to be simple.
A MultiLineString is closed if all of the LineString instances contained within it
are themselves closed (i.e., every LineString forms a closed loop, ending at the
same point from which it started).

MultiPolygons
A MultiPolygon is a geometry collection containing several Polygon geometries.

Example MultiPolygon Usage


MultiPolygons are frequently used to represent countries, because many countries are not defined by
a single continuous border. Take New Zealand, for example, as illustrated in Figure 2-24.

Figure 2-24. A MultiPolygon representing the country of New Zealand.

It is easy to think of other countries that consist of two or more separate geographic islands or
regions, including Japan, the United States of America (Alaska and Hawaii), Australia (Tasmania),
France (Corsica), and many others. In all these cases, the geographic area represented by a particular
political entity is best represented as a MultiPolygon.

45

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Defining a MultiPolygon
The Well-Known Text representation of a MultiPolygon uses the MULTIPOLYGON keyword, followed by
the definition of each Polygon contained in the collection, contained within round brackets.
The following code listing illustrates the WKT syntax required to define a MultiPolygon
containing two Polygons, each one containing only a single exterior ring.
MULTIPOLYGON(((10 20, 30 10, 44 50, 10 20)), ((35 36, 37 37, 38 34, 35 36)))
Take care to place the brackets carefully, because brackets are used both to separate individual
rings within a Polygon and also to separate Polygons within a MultiPolygon. Compare the preceding
code listing to the following, which instead creates a MultiPolygon geometry containing only one
Polygon that contains an interior ring.
MULTIPOLYGON(((10 20, 30 10, 44 50, 10 20), (35 36, 37 37, 38 34, 35 36)))

Characteristics of MultiPolygons
Characteristics of MultiPolygons are as follows.
MultiPolygons are two-dimensional, simple, closed geometries.
The length of a MultiPolygon is defined as the sum of the lengths of all the rings
in all the Polygons it contains.
The area of a MultiPolygon is the sum of the areas of all its Polygons.

GeometryCollections
The MultiPoint, MultiLineString, and MultiPolygon geometries considered previously are examples of
geometry collections containing only a single type of geometry. It is also possible to define a generic,
heterogeneous GeometryCollection, which may contain any number of any type of geometry (with the
exception of the FullGlobe geometry, discussed later). The GeometryCollection is also the only type of
collection that can contain multiple curved objects.

Example GeometryCollection Usage


GeometryCollections are commonly returned as the result of an aggregate query that returns a single
record representing a varied set of features. For example, if you were to run an aggregate query to
return, All those features that lie within one mile of Trafalgar Square, London, you might expect to
see geometries in the results representing Nelsons Column and the Cenotaph (Points), Downing
Street and the Strand (LineStrings), and St. James Park (Polygon). If these results were aggregated
into a single record, the only geometry type capable of representing them all would be a
GeometryCollection.

Defining a GeometryCollection
The Well-Known Text syntax for a GeometryCollection begins with the keyword GEOMETRYCOLLECTION,
followed by the fully formed WKT representation of each element in the collection, contained within a
set of round brackets. The following code listing illustrates the WKT syntax for a GeometryCollection
containing a Polygon and a Point.
GEOMETRYCOLLECTION(POLYGON((5 5, 10 5, 10 10, 5 5)), POINT(10 12))

46

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Characteristics of Geometry Collections


The characteristics of a GeometryCollection depend on the elements it contains.
If all the elements contained in the GeometryCollection are simple then the
GeometryCollection is itself simple. If any of the elements are not simple then
the collection is not simple.
Likewise, if all the elements contained within the GeometryCollection are
closed then the collection itself is closed. If any element is not closed then the
GeometryCollection is not closed.
The number of dimensions occupied by a GeometryCollection is the same as the
element with the highest number of dimensions that it contains. In other words,
any GeometryCollection that contains only Points and MultiPoints will occupy
zero dimensions; a collection containing LineStrings, CircularStrings, or
CompoundCurves will occupy one dimension; and a collection that contains at
least one Polygon or MultiPolygon will occupy two dimensions.

Note Although there are specific MultiLineString and MultiPolygon collections, there is no specific collection
that can contain multiple instances of their equivalent curved forms; there are no MultiCircularString, MultiCurve,
or MultiCurvePolygons, for example. To create a collection that contains more than one of these elements you
must use the generic GeometryCollection type.

FullGlobe
The FullGlobe is a special type of geometry that encompasses the whole surface of the Earth.

Defining a FullGlobe
A FullGlobe geometry covers the entire surface of the Earth, thus there is no need to state any
particular coordinate points in its construction. The Well-Known Text representation of a FullGlobe
geometry is therefore very simply:
FULLGLOBE

Characteristics of the FullGlobe geometry are as follows.


A FullGlobe does not have a boundary nor any exterior, because every point on
the Earth is considered to lie in the interior of a FullGlobe.
A FullGlobe geometry is closed.
A FullGlobe geometry cannot be a member of a GeometryCollection.

47

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Empty Geometries
One final type of geometry to consider is an empty geometry. An empty geometry is one that does not
contain any points. Even though it contains no points, an empty geometry is still nominally assigned a
particular type, so you may have an empty Point or empty LineString geometry, for example.

Figure 2-25. An empty Point geometry, an empty LineString geometry, and an empty Polygon geometry.
(JOKE!)

You may be wondering why you would ever create an empty geometry: how can you represent the
location or position of a feature on the Earth using a shape with no points? One way of thinking about
this is as follows. If geometries represent the position (and therefore, by implication, the presence) of
features on the Earth's surface, then empty geometries denote the absence of any such features.
You don't generally create empty geometries directly, but you do get empty geometries returned
in the results of a spatial query in which no points match the specified criteria. For example, empty
geometries can be used as a response to a question, Where is x? when the answer is, Nowhere on
Earth.

Note An empty geometry is not the same as NULL. A NULL value suggests a result that has not been
evaluated or is undefined. An empty geometry value suggests that a result has been evaluated, but that it does not
represent a location on the Earth.

Defining an Empty Geometry


Empty geometries of any type can be defined using the Well-Known Text syntax by declaring the
name of the type of geometry followed by the word EMPTY.
Some examples of different types of empty geometries are as follows.
POINT EMPTY
LINESTRING EMPTY
GEOMETRYCOLLECTION EMPTY

48

www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES

Characteristics of Empty Geometries


Characteristics of empty geometries are as follows.
Empty geometries are simple geometries.
Empty geometries are not closed.
Empty geometries have no interior and no boundary. Everything is exterior to
an empty geometry.

Choosing the Correct Geometry


There is no correct type of geometry to use to represent any given feature on the Earth. The choice of
which geometry to use will depend on how you plan to use the data. If you are going to analyze the
geographic spread of your customer base, you could define Polygon geometries that represented the
shape of each of your customers' houses, but it would be a lot easier to consider each customer's
address as a single Point. In contrast, when conducting a detailed analysis of a small-scale area for
land-planning purposes, you may want to represent all roads, buildings, and even walls of buildings
as Polygons having both length and area, to ensure that the spatial data represent their actual shape as
closely as possible.

Summary
In this chapter, you learned about the different types of geometries that can be used to store spatial
data in SQL Server 2012.
Points are the most basic type of geometry, representing a singular location in
space. They are used as a building block to construct the more complex types of
geometry.
LineStrings, CircularStrings, and CompoundCurves are all one-dimensional
geometries that are typically used to represent paths, routes, borders, and
similar features.
Polygons and CurvePolygons are two-dimensional geometries. They have a
boundary that contains an interior area, and may also have one or more
interior holes.
Elements may be combined together into collections. Homogeneous collections
are MultiPoints, MultiLineStrings, and MultiPolygons, respectively. SQL Server
also supports a heterogeneous GeometryCollection that may contain any
number of any type of geometry (other than a FullGlobe).
There are two special types of geometry: the FullGlobe geometry, which covers
the entire surface of the Earth, and the Empty geometry, which contains no area
at all.

49

www.it-ebooks.info
CHAPTER 3

Spatial Datatypes

Every variable, parameter, and column in a SQL Server table is defined as being of a particular
datatype. The datatype determines the range of values that can be stored and the ways in which that
data can be used. You are probably already familiar with many SQL Server datatypes, such as those
listed in Table 3-1.

Table 3-1. Common SQL Server Datatypes

Datatype Usage

char Fixed-length character string

datetime Date and time value, accurate to 3.33 ms

float Floating-point numeric data

int Integer value between 2 31 (2,147,483,648) and 2 31 1 (2,147,483,647)

money Monetary or currency data

nvarchar Variable-length unicode character string

In addition to these common datatypes designed to hold numeric, character, or date and time data,
SQL Server 2012 has two datatypes specifically designed to hold spatial data: geometry and geography.
These are listed in Table 3-2.

Table 3-2. SQL Server Spatial Datatypes

Datatype Usage

geography Geodetic vector spatial data

geometry Planar vector spatial data

There are several similarities between the geometry and geography datatypes:

51

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

They both employ a vector model of spatial data, in which features may be
represented using a range of geometries including Points, LineStrings,
Polygons, and collections of these types.
Internally, SQL Server stores values of both datatypes as a serialized stream of
binary data in the same format.
When working with items of data from either type, you use object-oriented
methods based on the .NET framework.
They both provide much of the same standard spatial functionality, such as
calculating the length or area of a feature, the distance between features, or
testing whether two features intersect.
However, there are also a number of important differences between the two types. When you store
spatial data in SQL Server 2012, you must choose whether to store that information using the geometry
datatype or the geography datatype according to the nature of the data in question, and how you plan to
use that data. In this chapter, we explore the features of the two types, and show how to decide which
datatype to use in a given situation.

SQLCLR Foundations
Before looking at the specific differences between the geometry and geography datatypes, its worth
spending a bit of time examining the way in which both datatypes are implemented in SQL Server.
The geometry and geography datatypes are both system-defined CLR datatypes. The abbreviation
CLR in this context refers to the Common Language Runtime, the environment used to execute managed
.NET code. Whereas most SQL Server queries use the T-SQL query engine, when working with CLR
datatypes SQL Server additionally leverages the .NET runtime process hosted by SQL Server known as
SQLCLR. The managed code environment in which SQLCLR operates allows SQL Server to deal with
certain types of complex data objects, such as spatial data, more efficiently than relying on T-SQL alone.

Note SQL Server uses SQLCLR to perform operations on system-defined CLR datatypesgeometry,
geography, and hierarchyidas well as for user-defined CLR datatypes (UDTs). When querying such data,
SQLCLR works alongside the T-SQL engine; it does not replace it.

In all versions of SQL Server since SQL Server 2005, it has been possible to use the SQLCLR to
execute user-defined .NET code. In order to do so, however, the server must first be configured to allow
such behavior, which can be done by calling the sp_configure system stored procedure and setting the
clr_enabled option to 1. In contrast, system-defined CLR datatypes such as geometry and geography
require no additional configuration; they are automatically available for use in all SQL Server 2012
databases. As such, you can start using spatial datatypes straight away in your database just as you
would any other type of data.
Even though they require no configuration, there are still some special considerations of working
with the geometry and geography datatypes (and with CLR datatypes in general) that you should be
aware of, as follows.
Each CLR data item is defined as an object, a serialized representation of a
compound set of values. If you do a simple SELECT query of a column of data
defined using a CLR datatype, your query will return a set of binary values.

52

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

As with classes in object-oriented languages, a CLR datatype defines a number


of methods and properties. Each individual item of geography or geometry data is
an instance of the respective datatype, and inherits the methods of the datatype
from which it is derived. So, methods defined by the geometry datatype can only
be used on items of geometry data, and methods defined by the geography
datatype can only be used on items of geography data.
The syntax required for writing queries involving CLR data is a little bit
different from regular T-SQL query syntax, and may look unfamiliar to you.
Unlike regular T-SQL functions, such as SUM, CAST, or SUBSTRING, the methods and
properties defined by CLR datatypes, such as ToString(), STArea(), or
STIntersects(), are case-sensitive.
CLR datatypes can be used within a .NET application layer as well as within
SQLCLR, therefore with clever coding and application design you can develop
encapsulated spatial routines that can be re-used in both the database layer and
the application layer with only a minimum amount of recoding.
Errors encountered in SQLCLR code need to be handled in a different way from
T-SQL errors.
Because spatial data are not comparable, you can't use geometry or geography
data with any T-SQL operators that make a direct comparison between two
values. For example, you cannot ORDER BY or SELECT DISTINCT values from a
column of spatial data, nor can you UNION two result sets containing a column of
the geography or geometry datatype (although you can UNION ALL two datasets).
You also cant join two tables together using conventional join syntax such as
TableA.GeometryColumn = TableB.GeometryColumn. Instead, you must use the
appropriate comparison operator defined by the datatype itself, for example,
TableA.GeometryColumn.STEquals(TableB.GeometryColumn) = 1
Many of these issues are explored in more detail in the topics covered throughout this book.

Methods and Properties


The range of methods available for a given item of spatial data (i.e., what you can actually do with that
data) is dependent on the datatype in which it is defined. Although in most cases there are methods to
provide equivalent functionality in both the geometry and geography datatypes, there are some methods
that can only be applied to one or the other. For example, the STRelate() method, which allows you to
define and test for a specific pattern of intersection between two geometries, is only available for the
geometry datatype.
In some cases, there are methods that provide roughly equivalent functionality between the two
types, but under a different name: for example, Lat and Long, which return the (geographic) coordinate
values of a geography instance, provide equivalent functionality to STY and STX, which return the
(Cartesian) coordinate values of a geometry instance.
In general, the methods available using either type can be classified into one of two categories:
OGC methods: Methods that adhere to the Open Geospatial Consortium specifications
are prefixed by the letters ST (an abbreviation for spatiotemporal). These methods
provide commonly used, basic functionality for working with spatial instances such
as STIntersects(), used to determine whether one instance intersects another;
STDistance(), used to calculate the shortest distance between two instances; and
STArea(), used to calculate the area contained within a Polygon instance.

53

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

SQL Server also provides a number of extended methods, which provide


additional functionality on top of the OGC standard. These include Reduce(),
which simplifies a geometry; BufferWithTolerance(), which applies a buffer
within a given tolerance limit; and Filter(), which performs an approximate test
of intersection based on a spatial index.

Static Methods
To create an item of geography or geometry data, you must use a static method belonging to the
appropriate datatype. The syntax for using a static method is to state the name of the datatype followed
by a pair of colons, and then the name of the method. For example, the Parse() method is a static
method that can be used to create an instance from a supplied Well-Known Text string. To use the
Parse() method of the geometry datatype to create a Point geometry at coordinates (30,40), you would
call it as shown in the following code listing,
SELECT geometry::Parse('POINT(30 40)');

Note The geometry Parse method treats all supplied coordinate values as defined using SRID 0; that is, they
are abstract coordinates with no relation to a specific model of the Earth.

To provide another example, the following code listing creates a table containing a single column
of the geography datatype, and then inserts three Points into that table created using the geography
Point() static method. The Point() method requires three parameters representing latitude, longitude,
and SRID. The return value of the method is a Point geometry at the specified location.
CREATE TABLE geographypoints (
Location geography
);

INSERT INTO geographypoints VALUES


(geography::Point(51, 1, 4326)),
(geography::Point(52, 2, 4326)),
(geography::Point(50.7, 1.1, 4326));
After executing this code listing, the geographypoints table now contains three rows each
representing a location in the south of England, defined using the EPSG:4326 spatial reference system.

Instance Methods
The process of creating an item of data using a static method is known as instantiation, and the
geometry created is referred to as an instance of the geometry or geography dataype (depending on the
static method from which it was created). Operations performed on individual values of spatial data
are therefore called instance methods. Items of both the geography and geometry datatypes provide a
range of instance methods for performing common calculations, including intersections, measuring
distances, and addition and subtraction of geometries.
The syntax for using an instance method is to state the name of the item (or column) of data on
which the method should be performed followed by a single dot (.) and then the name of the method to

54

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

be called followed by a set of closed brackets (). If the method requires any parameters, these should be
supplied within the brackets.
For example, the ToString() method is an instance method that retrieves the Well-Known Text
representation of any item of geography or geometry data. It requires no parameters. To retrieve the
WKT of each Point in the Location column of the geographypoints table created in the preceding code
listing, you can execute the following.
SELECT
Location.ToString()
FROM
geographypoints;
As another example, the STBuffer() method is an instance method that creates a buffer zone
around a geometry. It requires a single parameter stating the amount by which the geometry should be
buffered. Positive values create an enlarged area around a geometry, whereas negative values create
a reduced area. The following code listing declares a geometry Point variable located at (12 7) using the
geometry Point() static method, and then selects a buffer of 5 units about that geometry by calling the
STBuffer() method on that instance.
DECLARE @point geometry = geometry::Point(12, 7, 0);
SELECT @point.STBuffer(5);
Note that instance methods can be chained together, where the result of one method is passed
directly to the next method. For example, the following code listing creates a buffered Point geometry
using STBuffer() and then returns the area of that buffered geometry by calling by calling the STArea()
method on the buffered instance.
DECLARE @point geometry = geometry::Point(3, 5, 0);
SELECT @point.STBuffer(5).STArea();

Note In the preceding code listings, the coordinates from which the geometry Points were created were
abstract x- and y-coordinates; they werent intended to represent any particular feature on the Earths surface. To
indicate this, a value of 0 has been supplied as the third parameter to the Point() method, which means that
these coordinates do not relate to any particular spatial reference system.

Properties
Certain properties of a geography or geometry instance can be accessed directly using property notation,
which, like the syntax for instance methods, uses the column name followed by a single dot and then the
name of the property to retrieve. However, because you do not need to provide parameters to retrieve
the property of an instance, property names are not followed by a set of brackets.
For example, the Lat and Long properties represent the latitude and longitude coordinates of a
geography Point instance, respectively. The following code listing illustrates how to retrieve the
latitude and longitude coordinates of each Point in the geographypoints table.
SELECT
Location.Lat,
Location.Long
FROM
geographypoints;

55

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Some properties, such as Lat and Long are read-only. That is to say, you cannot update the latitude
coordinate of a Point by running the following query,
UPDATE geographypoints SET Location.Lat = 20;
Attempting to do so will result in the following error.
Msg 6595, Level 16, State 1, Line 1
Could not assign to property 'Lat' for type 'Microsoft.SqlServer.Types.SqlGeography' in
assembly 'Microsoft.SqlServer.Types' because it is read only.
Changing the coordinate values associated with a geometry or geography instance requires you to
create an entirely new geometry from a static method. However, certain properties of existing
geometries can be both retrieved and set. For example, the STSrid property allows you either to return
or update the spatial reference identifier associated with an instance. To change the spatial reference
identifier of the Points in the geographypoints table to use the North American Datum 1983 (SRID
4269), you can execute the following code listing.
UPDATE geographypoints
SET Location.STSrid = 4269;

Note Updating the STSrid property of an instance does not reproject the coordinate values into the specified
spatial reference system; it merely changes the metadata describing the system in which those coordinates are
defined.

Spatial Libraries
All of the functionality of the geography and geometry datatypes is contained in two libraries:
Microsoft.SqlServer.Types.dll and SqlServerSpatial.dll. These assemblies are created when you
install SQL Server 2012, but they can also be installed separately as part of the Microsoft SQL Server
Feature Pack, available for download from http://www.microsoft.com/downloads.
Microsoft.SqlServer.Types.dll contains the managed (.NET) code necessary
to define the spatial datatypes, and is installed by default in the \Program
Files\Microsoft SQL Server\110\SDK\Assemblies directory.
SqlServerSpatial.dll contains additional functionality required to perform
spatial operations, written using native (C++) code. This assembly is installed
by default in the \Windows\System32 directory.
Because these two libraries are redistributable and independent of SQL Server, you can reference
them in your own applications and use exactly the same spatial methods as provided by the database in
any other layer of your architecture: in a client-side app, or in a webservice, for example. The machine
on which these applications are executed doesnt even need to have SQL Server installed, so long as it
has the two libraries listed above (and the prerequisites to execute the code they contain, namely the
.NET Framework and the Microsoft C++ runtime libraries).

56

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Note You can import SQL Servers spatial libraries and use them from within other applications, such as a WPF
application, an ASP.NET webpage, or a console application. However, because the SqlServerSpatial.dll library uses
unmanaged code, these applications must be run on a Windows-based platform, and you cannot use the spatial
datatypes in a Silverlight application, for example.

Whereas the SQL Server spatial datatypes are called geography and geometry, when you call
directly into the SqlServer.Types.dll library from .NET, the corresponding spatial datatypes are called
SqlGeography and SqlGeometry. Throughout this book, I concentrate on examples that use the geography
and geometry types as they are implemented in SQL Server. However, it is worth remembering that
almost all of the same functions can be applied to the equivalent SqlGeography and SqlGeometry types
in a .NET application (one significant exception to this rule is that it is only possible to create and
utilize a spatial index on data held in the database itself).

The geography Datatype


The most important feature of the geography datatype (and its .NET equivalent, SqlGeography) is that it
stores geodetic spatial data, which take account of the curved shape of the Earth.
In order to define positions on a geodetic model, geography data is always stated using angular
coordinates of latitude and longitude from a geographic coordinate system. Not only is geography data
defined and stored on a geodetic model, but when you write spatial queries involving geography data,
SQL Server uses angular computations to work out the result. These computations are calculated based
on the properties of the ellipsoid model in which that data was defined.
For example, if you were to define a LineString connecting two Points on the Earth's surface in the
geography datatype, the line would curve to follow the surface of the reference ellipsoid between the
Points. Every line drawn between two Points in the geography datatype is actually a great elliptic arc:
that is, the line on the surface of the ellipsoid formed by the plane intersecting the start and end Points
of the line and the center of the reference ellipsoid. This concept is illustrated in Figure 3-1.

Figure 3-1. Calculations on the geography datatype account for curvature of the Earth.

57

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Spatial Reference Systems for the geography Datatype


The geography datatype performs calculations with respect to the reference ellipsoid, therefore
geography data can only be defined using one of the geographic spatial reference systems supported by
SQL Server, as listed in the sys.spatial_reference_systems table. This table contains the metadata
required by SQL Server to perform the appropriate ellipsoidal calculations, and its structure is shown
in Table 3-3.

Table 3-3. Structure of the sys.spatial_reference_systems Table

Column Name Description

spatial_reference_id The integer identifier used within SQL Server 2012 to refer to
this system.

authority_name The name of the standards body that administers this reference.

authorized_spatial_reference_id The identifier allocated by the issuing authority to refer to this


system.

well_known_text The parameters of the spatial reference system, expressed in


Well-Known Text format.

unit_of_measure A text description of the unit used to express linear


measurements in this system, such as, "metre" or "foot".

unit_conversion_factor A scale factor for converting from meters into the appropriate
linear unit of measure

Note Currently all but one of the spatial reference systems supported by SQL Server are based on the EPSG
registry, and the value of the internal spatial_reference_id for any system listed in sys.spatial_reference_systems
is the same as the authorized_spatial_reference_id allocated by the EPSG. The only exception is SRID 104001, a
system defined by Microsoft that defines coordinates on a perfect unit sphere.

The parameters that describe a geographic coordinate system (the ellipsoid, prime meridian,
angular unit of measure, and the like) are defined in WKT format in the well_known_text column of the
sys.spatial_reference_systems table. However, you might also have noticed that one of the other
columns of data defined in the sys.spatial_reference_systems table is unit_of_measure. Why does SQL
Server need to know a separate unit of measure, you might ask, when there is already a unit of
measure embedded in the well_known_text definition of the system (which, for latitude and longitude
coordinates as used by a geographic coordinate system, is generally degrees)?
The answer is that, although angular units of latitude and longitude are all very well for describing
the location of Points, they are not that helpful for expressing the distance between Points, nor the area
enclosed within a set of Points. For example, using the spatial reference system EPSG:4326, we can state
the location of Paris, France as a point at 48.87N, 2.33E. Using the same system, the location of Berlin,

58

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Germany could be described as 52.52N, 13.4E. However, if you wanted to know the distance between
Paris and Berlin, it would not be very helpful to state that they were 11.65 apart, with the answer
measured in degrees. You would probably find it much more useful to know that the distance between
them was 880 km, or 546 miles, say.
To account for this, SQL Server defines an additional linear unit of measurement for every
geodetic spatial reference system. When you use the geography datatype, although coordinates must be
supplied in angular latitude and longitude coordinates, the results of any calculations are returned in
the linear unit of measure specified in the unit_of_measure column of the
sys.spatial_reference_systems table for the relevant spatial reference system.
To check the units of measurement corresponding to a particular spatial reference system, you
can run a query as follows (substituting the SRID of the appropriate spatial reference system).
SELECT
unit_of_measure
FROM
sys.spatial_reference_systems
WHERE
authority_name = 'EPSG'
AND
authorized_spatial_reference_id = 4326;
This query gives the following result, which tells us that linear measurements of any geography
data defined using the EPSG:4326 spatial reference system are stated in meters.

metre

With this knowledge, we can use the geography datatype to determine the distance between Paris
and Berlin based on the latitude and longitude coordinates as stated previously, but returning the
answer in meters. This is shown in the following code listing.

DECLARE @Paris geography = geography::Point(48.87, 2.33, 4326);


DECLARE @Berlin geography = geography::Point(52.52, 13.4, 4326);
SELECT @Paris.STDistance(@Berlin);

The result, a distance of just under 880 km, is as follows.

879989.866996421

Note Every time you store an item of data using the geography type, you must supply the SRID of the spatial
reference system from which the coordinates were obtained. SQL Server 2012 uses the information contained in
the spatial reference system to apply the relevant model of curvature of the Earth in its calculations, and also to
express the results of any linear methods in the appropriate units of measurement. The supplied SRID must
therefore correspond to one of the supported spatial references in the sys.spatial_reference_systems table.

59

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Correct Ring Orientation for geography Polygons


Recall that a ring is a closed LineString, and a Polygon geometry is made up of one or more rings that
define the boundary of the area contained within the Polygon. Ring orientation refers to the
direction, or order, in which the Points that make up the ring of a Polygon are listed.
The geography datatype defines features on a geodetic model of the Earth, which is a continuous
round surface. Unlike the image created from a map projection, this geodetic model has no edges; you
can continue going in one direction all the way around the world and get back to where you started.
This becomes significant when defining a Polygon ring because when using a three-dimensional
round model it is ambiguous as to what area is contained inside the ring, and what is outside. Consider
Figure 3-2, which illustrates a Polygon defined using the geography datatype whose exterior ring is a
series of points drawn around the equator; does the interior of this Polygon include the Northern
Hemisphere, or the Southern Hemisphere?

Figure 3-2. The importance of Polygon ring orientation using the geography datatype. Does the Polygon
created from the ring shown here contain the Northern Hemisphere, or the Southern Hemisphere?

To resolve this ambiguity, SQL Server applies a rule known as the "left-hand rule" (or, sometimes,
the "left-foot rule"); if you imagine yourself walking along the ring of a geography Polygon, following
the points in the order in which they are listed, SQL Server 2012 treats the area on the left of the line
drawn between the Points of the ring as the interior of the Polygon, and the area on the right as the
exterior. Another way of thinking about this is to imagine looking down at a point on the surface of the
Earth from space; if that point is enclosed by a Polygon ring in a counterclockwise direction then that
point is contained inside the Polygon, otherwise it is outside.
Applying this rule, we can determine that the Polygon illustrated in Figure 3-2 therefore
represents the Northern Hemisphere. If the order of the points in the Polygon ring were to be
reversed, then this Polygon would instead contain the Southern Hemisphere.

60

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Caution It is a common mistake to list the points of a Polygon ring with incorrect ring orientation, in which case
the resulting Polygon is inside-out: the area that was intended to be contained within the Polygon is outside, and
the interior of the Polygon actually contains the entire rest of the Earths surface. When defining a Polygon ring in
the geography datatype, ensure that the interior is on the "left" of the line connecting the points.

Remember that Polygons can also contain one or more interior rings, which mark out areas of
space not included in the interior of the Polygon. To define an area of space not included in a Polygon
you should therefore enclose it in a ring of Points listed in clockwise order, so that the area to be
excluded lies to the right of the line. The illustration shown in Figure 3-3 demonstrates the correct ring
orientation required to define a geography Polygon containing a hole.

Figure 3-3. A geography Polygon containing an interior ring.

As before, if the Points of each ring were listed in reverse order then the Polygon would become
inverted: the interior of the Polygon would contain the entire surface of the globe except for the area
shaded in gray in Figure 3-3.

61

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Tip The geography datatype defines a ReorientObject() method, which flips the interior and exterior of a
geography Polygon instance, and has the same effect as reversing the coordinate order of each ring.

The geometry Datatype


In contrast to the geography datatype, the geometry datatype operates on a two-dimensional flat plane.
As such, calculations such as the distance between Points are worked out using simple geometrical
methods on a flat surface. This is illustrated in Figure 3-4.

Figure 3-4. Calculations on the planar geometry type operate on a flat plane.

The geometry datatype stores planar spatial data defined using Cartesian (x, y) coordinates, which
makes it ideally suited to storing coordinate data from a projected spatial reference system. In this case,
the act of projection has already mapped the geographic position of features on the Earths surface onto
a flat plane. It can also be used to store unprojected geographic coordinates of latitude and longitude,
where the longitude coordinate value is mapped directly to the x value, and the latitude value is used as
the y value (in doing so, you are implicitly projecting the data using an equirectangular projection).
In fact, the geometry datatype can be used to store any coordinates that can be expressed using x and
y values. Examples of such coordinates might be data collected from a local survey, or topological plans
of a geographically small area where curvature of the Earth can be safely ignored, or geometrical data
obtained from computer-aided design (CAD) packages. For example, Figure 3-5 shows the floorplan of
the European SQLBits conference held at the Grand Hotel in Brighton, England in April 2012, defined
entirely using the geometry datatype and displayed using the SQL Server Management Studio Spatial
Results tab. This data was created from an original plan supplied in PDF format.

62

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Figure 3-5. A floorplan modeled using the geometry datatype.

Because the geometry datatype uses simple planar calculations, the results of any computations
will be expressed in the same units of measurement as the underlying coordinate values. The most
common unit of measurement in which the coordinates of projected spatial reference systems are
stated is the meter; this is the case for the Universal Transverse Mercator system and many National
Grid reference systems, for example. If you use the geometry datatype to store data based on one of
these systems, lengths and distances calculated using the STLength() and STDistance() methods will be
measured in meters. And, if you were to use STArea() to calculate the area of a geometry, the result
would be measured in square meters. In contrast, the coordinates used to create the floorplan shown in
Figure 3-5 were measured in pixels, so using the STArea() method to calculate the area of each room
would lead to a result measured in pixels squared.

Caution Earlier, it was stated that the geometry datatype could be used to store "unprojected" geographic
coordinates of latitude and longitude, directly mapped to the y- and x-coordinates. However, remember that these
are angular coordinates, usually measured in degrees. If you use the geometry datatype to store information in
this way then the distances between points would also be measured in degrees, and the area enclosed within a
Polygon would be measured in degrees squared. This is almost certainly not what you want, so exercise caution
when using the geometry datatype in this way. To return a geometry calculation measured in meters, say, the
input coordinates must also be measured in meters.

63

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Spatial Reference Systems for the geometry Datatype


The geometry datatype does not take account of any curvature of the Earth, nor does it rely on knowledge
of the unit of measurement in which coordinates are defined, therefore stating a different SRID does not
alter the numeric value of any calculations performed on geometry data. This can be a tricky concept to
grasp; the first chapter of this book states that any pair of coordinatesprojected or geographicmust be
stated together with their associated spatial reference in order to refer to a position on the Earth. So why
doesnt it make a difference what spatial reference identifier is supplied to the geometry datatype?
The answer is that choosing a difference spatial reference system does make a difference when
initially determining the coordinates that identify a position on the Earth. However, once those values
have been obtained, all further operations on those data can be performed using basic geometrical
methods. Any decisions concerning how to deal with the curvature of the Earth have already been made
in the process of defining the coordinates that describe where any point lies on the projected image.
When using the geometry datatype, the distance between a Point at (0 0) and a Point located at (30
40) will always be 50 units, whatever spatial reference system was used to obtain those coordinates,
and in whatever units they are expressed. The actual features on the Earth represented by the Points at
(0 0) and (30 40) will be different depending on the system in question, but this does not affect the way
that geometry data is used in calculations. The result will always be accurate relative to the spatial
reference in which the Points themselves were defined.
Consider the following code listing, which defines the Well-Known Text representation of a
straight LineString geometry. The LineString in question represents the Royal Mile, which is the
straight route connecting Edinburgh Castle with the Palace of Holyrood House, running along some of
the oldest streets in Edinburgh. The coordinates in the Well-Known Text string are defined using
SRID:27700, which is a projected coordinate system in which coordinates are measured in meters, so
the result of the STLength() method gives the length of the Royal Mile in meters:
-- Define the WKT of the LineString
DECLARE @WKT nvarchar(max) = 'LINESTRING (325156 673448, 326897 673929)';

-- Construct a geometry LineString from the coordinates using SRID 27700


DECLARE @RoyalMile geometry = geometry::STGeomFromText(@WKT, 27700);

-- Calculate the length of the LineString


SELECT @RoyalMile.STLength();
The result is 1806.22313128805. Now suppose that we were to use exactly the same Well-Known
Text string but, this time, change the SRID supplied to the STGeomFromText() method:
-- Define the WKT of the LineString
DECLARE @WKT nvarchar(max) = 'LINESTRING (325156 673448, 326897 673929)';

-- Construct a geometry LineString from the coordinates using SRID 32039


DECLARE @RoyalMile geometry = geometry::STGeomFromText(@WKT, 32039);

-- Calculate the length of the LineString


SELECT @RoyalMile.STLength();
The result is still 1806.22313128805, exactly as before. As demonstrated, the numeric value of any
calculations performed using geometry data remains unchanged no matter in what spatial reference
system the coordinates are provided. However, that doesn't mean to say that it's not still important to
provide the correct SRID to identify the spatial reference system from which coordinates were derived.
The spatial reference system defines the important additional information that makes those coordinates
relate to a particular position on the Earth, and enable you to interpret the results appropriately:

64

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

The first code listing uses the correct SRID for the coordinates, EPSG:27700,
which defines the Royal Mile as a straight line between two points in
Edinburgh, 1806 meters in length.
The second code listing uses the same coordinate values but suggests that they
are defined relative to the Texas Central coordinate system (EPSG:32039).
Although the numerical value of the result is the same, the interpretation of
this result would imply that the Royal Mile is a straight line of length
1806.22313128805 feet, drawn between two points in the Quitman Mountains
of Texas, United States. This is clearly incorrect!
Supplying an incorrect SRID will lead to many problems farther downstream in your spatial
applications. Once the metadata associated with a set of coordinates has been lost it cannot be
redetermined, because looking at a set of coordinate values in isolation gives very little indication of
the system from which they have been derived. By explicitly stating the SRID with every set of
coordinates, not only will you retain this important metadata, but it will also ensure that you do not
accidentally try to perform a calculation on items of spatial data defined using different spatial
reference systems, which would lead to an invalid result.
So please, I implore you, always, always use the correct SRID with any spatial data in SQL Server,
even when using the geometry datatype!

Note The sys.spatial_references table only contains details of geodetic spatial references, because these
are required to perform calculations using the geography datatype. In order to find the appropriate SRID for a
projected coordinate system, you can look it up on the EPSG website at http://www.epsg-registry.org/.

Storing Nongeodetic Data


The geometry datatype stores planar coordinates and uses standard Euclidean methods to perform
calculations for which no SRID is necessary, therefore it can also be used to store any data that can be
described using pairs of x and y floating point coordinates (or, if using the optional z- and m-
coordinates, up to four coordinate values per item). Such items of data do not necessarily have to relate
to any particular model of the shape of the Earth; for example, you could store the location of items in a
warehouse using x- and y-coordinates relative to a local origin, or describe the location of components
on a printed circuitboard.
When using the geometry type to record data such as these, you should use SRID 0. This SRID
denotes that the coordinates are not derived from any particular spatial reference system, and
coordinate values should be treated as x and y values with no specific units of measurement.

The Insignificance of Ring Orientation in a geometry Polygon


The geometry datatype operates on an infinite flat plane, thus the area contained by a closed line
drawn between any set of points is unambiguous. Therefore, unlike when using the geography
datatype, ring orientation (the direction in which the points of a ring are specified) is unimportant for
geometry Polygons.
In the geometry datatype, a Polygon whose exterior ring is defined by the coordinates
(50,30), (52,30), (52,31), (50,31), (50,30)

65

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

contains exactly the same area as if it were specified by the following coordinates,
(50,30), (50,31), (52,31), (52,30), (50,30)
This applies to both interior and exterior rings of the geometry datatype, so the area contained by a
geometry Polygon remains exactly the same even if the points are supplied in reverse order. This is
illustrated in Figure 3-6, where the highlighted Polygon will always represent the Northern
Hemisphere irrespective of ring orientation.

Figure 3-6. Ring orientation is not significant in the geometry datatype.

Comparing geography and geometry Datatypes


On the face of it, the decision as to which spatial datatype you should use in a given situation is pretty
straightforward: if youve got projected coordinate data, measured in x- and y-coordinates, then use the
geometry datatype; if, on the other hand, youve got geographic coordinate data expressed in latitude and
longitude coordinates, then use the geography datatype. However, although the difference in accepted
coordinate systems is certainly the single most important difference between the types, there are other
factors that you should bear in mind when deciding which datatype to use for a given application.

Consistency
In order to perform operations using different items of spatial data in SQL Server 2012, all of the data
must be defined using the same spatial reference system, and stored using the same datatype. It is not
possible to combine geometry and geography data in the same query, nor perform operations on items
of the same datatype defined using different SRIDs. If you attempt to do so, SQL Server will return a
NULL result.
If you already have existing spatial data that you would like to integrate into your system, you
should therefore use a datatype suitable for the format in which that data has been collected. For

66

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

instance, if you have projected data collected from the National Grid of Great Britain, you should store
the data in a geometry field, using the SRID of 27700. If you are using latitude and longitude coordinate
data collected from a GPS system, then you should choose a geography type, with SRID of 4326. If you
would like to combine multiple sets of data defined in different spatial reference systems, then you
must first transform one or more of the sets of coordinate data in order to make them consistent. For
more information on reprojection and transformation, refer to Chapter 8.

Note Remember that the spatial reference identifier provides descriptive information about the system in which
coordinate values have been defined; it does not dictate the system itself. You therefore cannot simply update the
SRID value relating to a set of coordinates to express them in a different spatial reference system. Instead, to
convert coordinates from one spatial reference system into another you must transform or reproject the data.

Accuracy
Calculations using the geometry datatype are performed on a flat plane. Any geospatial features drawn
on a flat plane must have been projected and, as explained in Chapter 1, the process of projecting any
three-dimensional object onto a surface will always lead to some distortion in the way those features
are represented. This distortion may affect the area, shape, distance, or direction of the data. Therefore,
the results of certain operations using the geometry datatype will inevitably also be distorted, with the
effect of distortion varying depending on the projection used and the particular part of the Earth's
surface on which the calculation is based.
Generally speaking, the greater the surface area being projected, the more distortion occurs.
Although over small areas the effects of these distortions are fairly minimal, in large-scale or global
applications there can be a significant impact on the accuracy of any results obtained using the
geometry datatype when compared to the geography datatype (which is not distorted by projection).
In many applications that cover only a small spatial area, such as those contained within a
particular state of the United States, the results of calculations performed using the geometry type on
the relevant state plane projection will be sufficiently accurate. However, over larger distances, the
computations based on a planar projection will become more and more inaccurate, and the geography
datatype become a more suitable choice.

The End(s) of the World


One particularly extreme example of the distortion occurring as a result of projection is that, unlike the
Earth itself, a projected map has edges. When storing projected spatial data using the geometry
datatype, special consideration therefore needs to be taken in situations where you need to define
data that cross these edges. This typically occurs in the following situations.
Any geometries or calculations that cross the 180th meridian
Any Polygon geometries that enclose the North or South Pole
To demonstrate how these distortions affect calculations using the geometry datatype, consider
how you might calculate the shortest straight line route taken from Vancouver to Tokyo. Using the flat
geometry datatype, the result (based on a map projection centered on the Greenwich meridian) might
look like that shown in Figure 3-7.

67

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Figure 3-7. The shortest line between Tokyo and Vancouver using the geometry datatype.

In contrast, the geography datatype uses a continuous round model of the Earth, which is unaffected
by the edges introduced as a result of projection. The answer obtained for the shortest route between
Tokyo and Vancouver using the geography datatype would instead look like that shown in Figure 3-8.

Figure 3-8. The shortest route between Tokyo and Vancouver using the geography datatype.

68

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

It is obvious that, in cases such as these, the results obtained using the geography datatype give a
more accurate answer based on the real round Earth.
A further demonstration of these issues is the problem of trying to define geometry instances that
extend across the edges of the map in a given projection. Figure 3-9 highlights a Polygon geometry
representing Russia.

Figure 3-9. Polygon geometry representing Russia crossing edges of a projection in the geometry datatype.

Notice that although most of the Polygon is contained in the eastern hemisphere, the most
northeasterly part of Russia (the region of Chukotka) actually crosses the edge of the map, to appear in
the western hemisphere. Using the geometry datatype based on this projection, it would not be possible
to represent Russia using a single Polygon geometry; instead you would need to use a MultiPolygon
geometry containing two elements to represent the two distinct Polygons created where the edge of
the map had caused the original feature to be divided in two.
Both of the problems demonstrated in this section could be mitigated to some extent by choosing
an appropriate projected spatial reference system in which the geometry in question does not cross
the edges of the map. However, although this would avoid the issue for a particular case, it does not
solve it; even if a different projection is chosen there will always be some features that will occur on
the new edges instead.
If you expect to have to deal with geometries that risk extending over the edges of a map
projection, then the geography datatype would be a better choice in which to store your data.

Presentation
Because the geography datatype operates on a three-dimensional model of the Earth, if you want to
present the results of any geography data in a display, they will need to be projected (unless youve got
one of those fancy 3D displays). As we have already discussed, this introduces distortion. In the example
above, although the geography datatype accurately works out the shortest straight line connecting two
points, if we were to display this result on a projected map, this "straight" line would appear distorted

69

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

and curved. The exact effect of this distortion will differ depending on the particular properties of the
map projection used.
Conversely, because as the geometry datatype is based on data that has already been projected
onto a plane, no further calculations or distortion need be introduced to express the results on a map:
"straight" lines in the geometry datatype remain straight when drawn on a map (providing the map is
projected using the same projection as the spatial reference system from which the points were
obtained).
If you are storing spatial data in SQL Server specifically for the purpose of display on a particular
map (say, for creating a tile overlay on Bing Maps or Google Maps), then it might be beneficial to
store that data using the geometry datatype in the same projected spatial reference system as the map
on which it is intended to be displayed. This reduces additional calculations required when the data is
retrieved and projected onto the map, and reduces the need to introduce further distortion in the data
at display time.

Performance
Performing ellipsoidal computations uses more computing resources than Cartesian computations. As
a result, spatial calculations using the geography datatype may take longer to compute than the
equivalent operations using the geometry datatype. This only affects methods where the geography
datatype has to calculate metrics based on the geodetic model of the Earth (such as distances, lengths,
or areas). When using methods that return properties of objects which do not take account of the model
of the Earth, such as returning the number of points in an object, there is no difference in performance
between geography and geometry types.

OGC Standards Compliance


According to their website, the Open Geospatial Consortium (OGC) is:

a non-profit, international, voluntary consensus standards organization that is


leading the development of standards for geospatial and location based services.

http://www.opengeospatial.org/

The OGC administer a number of industrywide standards for dealing with spatial data. By
conforming to these standards, different systems can ensure core levels of common functionality, which
ensures that spatial information can be more easily shared among different vendors and systems.
In October 2007, Microsoft joined the Open Geospatial Consortium (OGC) as a principal member,
and the spatial datatypes implemented in SQL Server 2012 are largely based on the standards defined
by the OGC:
The geometry datatype conforms to the OGC Simple Features for SQL
specifications v1.1.0 (http://www.opengeospatial.org/standards/sfs) and
implements all the required methods to meet that standard.
The geography datatype implements most of the same methods as the geometry
datatype, although it does not completely conform to the required OGC standards.
As such, if it is important to you to use spatial methods in SQL Server 2012 that adhere to accepted
OGC standards (such as if you are replacing the functionality of a legacy system based on those
standards), you should use the geometry datatype.

70

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

General Rules
If you are still unsure which type to use, consider the following general rules.
If you have latitude and longitude coordinate data (collected from a GPS, from
Google Earth, or most sources listed on the Web) use the geography datatype,
normally using the default 4326 SRID.
If you have x- and y-coordinate data (e.g., collected from a flat map), use the
geometry datatype with an SRID to represent the map projection and datum used.
If you have x- and y-coordinate data that are not defined in relation to any
particular model of the Earth, use the geometry datatype with SRID = 0.

Storage Requirements
The datatypes geometry and geography are both of variable length. In contrast to a fixed-length
datatype such as int or datetime, the actual amount of storage required for an item of spatial data
varies depending on the complexity of the object that the data describes. Just as a varchar(max) field
varies in size according to the number of characters in the string being stored, so too does a geometry
or geography field vary in size according to the number of points in the corresponding geometry.
The structure of an item of geometry or geography data begins with a header section, which defines
basic information such as the type of shape being described, the spatial reference system used, and the
overall number of points in the object. This header is immediately followed by the coordinate values of
each x- and y- (or longitude and latitude) coordinate in the geometry, represented in 8-byte binary
format. The more points that an object has in its definition, the longer this binary stream will be, and
therefore the more storage space will be required.
A Point geometry defined with only two coordinates will always occupy 22 bytes
of storage space.
A LineString between two Points, containing the minimum of four coordinates (x
and y values of the start and end Point), requires 38 bytes of storage. This
increases by 16 bytes for every additional line segment added to the LineString.
A Polygon occupies a variable amount of space depending on the number of Points
with which it is defined (not related to the area of space contained by the Polygon).
If a Polygon contains interior rings, these also increase the storage required.
There is no specific maximum size of an item of geometry or geography data. However, SQL Server
2012 has an overall restriction on any kind of large object, which is limited to a size of 231 1 bytes.
This is the same limit as is applied to datatypes such as varbinary(max) and varchar(max), and equates
to approximately 2 Gb for each individual item of data. You would need to store a very complex
geometry object in order to exceed this limit. If necessary, remember that complex geometries can be
broken down into a number of individual objects which each fit within this limit.

Tip You can use the DATALENGTH function to find out the number of bytes used to store the value of any item of
geometry or geography data (or, for that matter, any other item of data).

71

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Internal Data Structure


Generally speaking, you do not need to know the internal format that SQL Server uses to store
geography or geometry data; the dedicated methods provided by each type provide the functionality
required to create and modify instances as appropriate. However, on some occasions it can be useful to
modify the bytes of a geography or geometry instance directly. For example, you may be designing a
spatial application using nonmanaged code (such as C or C++) that cannot use the SqlGeography and
SqlGeometry types directly, and you need to programmatically construct a geography or geometry
instance from its constituent elements.
To investigate the format used by SQL Server, lets first create a simple geography Point instance
using the Point() static method and then SELECT the corresponding native value, as shown in the
following code listing.
SELECT
geography::Point(40, 100, 4269);
The result is:

0xAD100000020C000000000000444000000000000059C0

This value may be broken down into a number of constituent parts, as shown in Table 3-4.

Table 3-4. Elements of a Geography Instance

Element Description

0x Hexadecimal notation identifier

AD100000 Spatial Reference identifier (4 bytes). The integer SRID valuein this case
4269expressed as a 4-byte binary value.

02 Version number (1 byte). SQL Server 2008/SQL Server 2008 R2 uses version 1
serialization, whereas SQL Server 2012 uses version 2 serialization.

OC Serialization properties (1 byte). This value is set from a series of bit flags
representing the following additional properties of the geometry.
Whether the geometry is larger than a hemisphere (0 20)
Whether the geometry is a single line segment (0 10)
Whether the geometry is a single Point (0 08)
Whether the geometry is valid (0 04)
Whether the geometry contains m-coordinates (0 02)
Whether the geometry contains z-coordinates (0 01)
For this case, the flags for a valid (0 04), single Point (0 08) geometry have
been set, leading to the value 0C

72

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Element Description
0000000000004440 Latitude coordinate (40) expressed as 8-byte floating point binary

00000000000059C0 Longitude coordinate (100) expressed as 8-byte floating point binary

Suppose instead that we wanted to construct programmatically a geography Point instance defined
at a latitude of 42 degrees, longitude of 90 degrees, using the SRID 4326. To do so, we could build up
the geography value from the corresponding binary elements, as shown in the following code listing.
DECLARE @point geography =
0xE6100000 + -- SRID (4326)
0x02 + -- Version (2)
0x0C + -- Properties (Single Point [8] + Valid [4])
0x0000000000004540 + -- Latitude (42)
0x00000000008056C0 -- Longitude (90)

SELECT
@point.STSrid,
@point.ToString();
The WKT results returned by the ToString() method confirm that the geography Point has been
created at the appropriate coordinates, using the 4326 SRID:

4326 POINT(90 42)

This is admittedly a fairly contrived example, and there are few situations in which you would
need to do such manual binary manipulation in T-SQL. However, it does demonstrate that it is
certainly possible to do so, and you can reuse the same approach in other application layers.

Note For more information on the serialization format used for SQL CLR datatypes, refer to the following
document: http://download.microsoft.com/download/7/9/3/79326E29-1E2E-45EE-AA73-
74043587B17D/%5BMS-SSCLRT%5D.pdf.

Converting Between Datatypes


Given that the two spatial datatypes are so similar, you might think that it would be an easy task to
convert data between the two. However, you cannot simply CAST or CONVERT between the two types. If
you try to do so, such as in the following query,
DECLARE @geog geography;
SET @geog = geography::STGeomFromText('POINT(23 32)', 4326);
SELECT CAST(@geog AS geometry);
you will receive the error,

73

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

Msg 529, Level 16, State 2, Line 5


Explicit conversion from datatype sys.geography to sys.geometry is not allowed.

Notice the wording used in this error message: conversion is not allowed. This is not a technical
limitation of SQL Server; because geometry and geography both use the same underlying structure,
converting data between the types is incredibly easy from a technical point of view. Rather, this is a
deliberate restriction imposed by SQL Server to ensure that you understand the implications of
working with each datatype, and that you dont casually swap data between them.
There are very few scenarios in which it makes sense to take coordinate data from one spatial
datatype and convert it directly into the other. If you find a requirement to convert data between the
two datatypes then it normally also involves transforming the associated coordinate data from a
geographic coordinate system used in the geography datatype to a projected coordinate system for the
geometry datatype, for example. Nevertheless, there are a few occasions when it is useful to be able to
take geographic coordinates from a geography instance and convert them directly into the geometry
datatype. This might be the case if you want to use one of the methods that is only available to the
geometry datatype, such as STBoundingBox(), or STRelate().
In order to convert from geography to geometry, we can take advantage of the fact that both
datatypes can be represented by, and created from, a binary stream in the Well-Known Binary format.
In the following example the value of the geometry variable, @geom, is created from the STGeomFromWKB()
static method. The arguments passed to this method are the Well-Known Binary representation and
SRID of the geography variable @geog.
-- First, create a geography instance
DECLARE @geog geography;
SET @geog = geography::Point(23,32, 4326);

-- Convert to geometry via WKB


DECLARE @geom geometry;
SET @geom = geometry::STGeomFromWKB(@geog.STAsBinary(), @geog.STSrid);
The resulting geometry Point instance is defined using exactly the same coordinates and SRID as
the original geography instance @geog.
The approach just described can be used to convert any geography instance to the geometry
datatype. Inasmuch as geometry does not enforce any restrictions on the SRID used, any coordinate
data stored using the geography datatype can also be stored using the geometry datatype. However, in
order to perform conversion the other way, from geometry to geography, the coordinate values of the
existing geometry instance must represent latitudelongitude coordinates taken from a supported
geodetic spatial reference system. If you are storing Northing and Easting coordinates from a
projected system, or other nongeodetic data, those data can only be stored using the geometry datatype.

Creating Spatially Enabled Tables


There are no special attributes or configurations required to enable spatial data to be stored in a SQL
server database; all that is required is a table containing at least one geography or geometry column.
The following code listing creates a table containing two columns: CityName, which can hold a 255-
character variable length string, and CityLocation, which can be used to store the location of that city
using the geography datatype:
CREATE TABLE dbo.cities (
CityName varchar(255),
CityLocation geography
);

74

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

New geometry or geography columns can be added to existing tables, enabling spatial information
to be seamlessly integrated alongside existing items of data. Let's suppose that you have an existing
table, customer, that contained the following fields of customer information.
CREATE TABLE dbo.customer (
CustomerID int,
FirstName varchar(50),
Surname varchar (50),
Address varchar (255),
Postcode varchar (10),
Country varchar(32)
);
Now suppose that you want to add an additional spatial field to this table to record the location of
each customer's address. No problem; geography and geometry fields can be added to existing tables just
like any other by using an ALTER TABLE statement as follows.
ALTER TABLE dbo.customer
ADD CustomerLocation geography;
By extending the table in this way, we have enabled the possiblity of using spatial methods in
conjunction with our existing customer data, to find answers to questions such as how many customers
there are within a certain area, and how far a particular customer lives from his closest store.

Enforcing a Common SRID


It is worth nothing that, although all the values stored in a single geography or geometry column must
match the datatype of that column, they do not all have to share the same spatial reference system.
Within a single geometry column, you may have instances defined using the UTM grid, the Alaskan
State Plane coordinate system, and the National Grid of Finland, for example. However, I find that the
flexibility to store instances from different spatial reference systems in the same column is rarely
useful; remember that, in order to perform calculations involving two or more items of data, they must
be defined using the same spatial reference system. Mixing spatial reference systems within the same
column introduces an unnecessary chance of error.
Within a single column of a table, I only store instances defined using a common spatial reference
system. I normally name this column according to the type of data and the spatial reference system
used. For example, for a column containing geography instances defined using SRID 4326, I would
choose the column name geog4326. Of course, you can name your spatial columns using any name
youd like, but I find that using this convention helps me by knowing, at a glance, important metadata
about all the values in that column.
To ensure that all the instances in a column are defined using the same system, you can add a
constraint on the STSrid property of that column. For example, the following code listing demonstrates
how to add a constraint to ensure that only instances defined using SRID 4199 (Egypt 1930) are stored
in the CustomerLocation column of the customer table.
ALTER TABLE dbo.customer
ADD CONSTRAINT enforce_customerlocation_srid4199
CHECK (CustomerLocation.STSrid = 4199);
It is now possible to perform calculations safely using any two data items from this column,
knowing that they will be defined based on the same spatial reference system. You may also find it
useful to enforce additional constraints on the type of spatial data stored in a particular column. For
example, the following code listing creates a constraint that uses the STGeometryType() method to
ensure that only Point geometries may be stored in the CustomerLocation column.

75

www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES

ALTER TABLE dbo.customer


ADD CONSTRAINT enforce_customerlocation_point
CHECK (CustomerLocation.STGeometryType() = 'POINT');

Summary
In this chapter you learned about the two datatypes used for storing spatial data in SQL Server 2012,
geometry and geography, and examined the key differences between the two types.
The geography datatype uses geodetic spatial data, which accounts for the
curvature of the earth.
The geometry datatype uses planar spatial data, in which all points lie on a flat
plane.
You considered the factors influencing the choice of which datatype to use, and
saw some example usage scenarios for each type.
You saw the reasons why, whichever datatype you use, it is important to state the
correct spatial reference identifier associated with any coordinate data.
You examined the structure in which SQL Server 2012 stores spatial data,
represented as a stream of binary values.
You also saw how to add a column of spatial data to a SQL Server table, and add a
constraint to that column to ensure that only data of a certain SRID could be
inserted.

76

www.it-ebooks.info
CHAPTER 4

Creating Spatial Data

In the first three chapters of this book I introduced you to the main components necessary to define
spatial data in SQL Server, namely, a set of coordinate locations, the spatial reference system in which
those coordinates are defined, the type of geometry used to represent a feature, and the datatype in
which that feature is stored.
In this chapter, we apply the knowledge youve gained so far in a practical context, by looking at
the different methods you can use to create items of geometry or geography data. Every method requires
those same key pieces of information: the spatial reference system, type of geometry, and datatype,
together with the coordinates that define that instance.
The way in which you provide those elements varies depending on the method you choose; SQL
Server implements methods that create data from several different standard formats: Well-Known
Text (WKT), Well-Known Binary (WKB), and Geography Markup Language (GML). It also exposes an
API that allows you to construct items of spatial data programmatically using the SqlGeometryBuilder
and SqlGeographyBuilder classes. Each of these methods is examined in turn and the advantages and
disadvantages of each discussed.

Creating Spatial Data from Well-Known Text


Well-Known Text is one of the standard formats defined by the Open Geospatial Consortium for the
exchange of spatial information. It is a simple, text-based format that is easy to examine and
understand. You have already seen several examples of WKT in this book; it is the format SQL Server
2012 uses to store the parameters of supported spatial reference systems in the well_known_text
column of the sys.spatial_reference_systems table, and it is also used in the examples of each of the
types of geometry demonstrated in Chapter 2.
Some of the advantages of the WKT format are:
It is a simple structured format that is easy to store and share between systems.
Because it is text-based, it is easy to examine visibly and identify the
information conveyed in a WKT representation.
However, it also has the following disadvantages.
As with any text-based representation, it is not possible to precisely state the
value of certain floating point coordinate values obtained from binary methods.
The inevitable rounding errors introduced when attempting to do so will lead to
a loss of precision.

77

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

SQL Server must parse the text in a WKT representation into its own internal
binary format, therefore creating objects from WKT may be slower than other
methods.
Because WKT is both simple to read and understand, it is the format used in most of the MSDN
online documentation and code samples, as well as in other resources. It is also the format that is most
widely used throughout this book.

Choosing a WKT Static Method


SQL Server 2012 implements dedicated static methods for creating each of the basic types of geometry
from WKT. It also implements some generic static methods that can create any of the supported kinds
of geometry. Table 4-1 lists the different methods that can be used.

Table 4-1. Methods to Instantiate Spatial Data from Well-Known Text

Geometry Static Method


Point STPointFromText()

LineString STLineFromText()

Polygon STPolyFromText()

MultiPoint STMPointFromText()

MultiLineString STMLineFromText()

MultiPolygon STMPolyFromText()

GeometryCollection STGeomCollFromText()

Any supported geometry STGeomFromText() / Parse()

All of the methods listed in Table 4-1 are implemented by both the geometry and geography
datatypes, with the datatype of any created instance matching the datatype of the method from which it
was created. Note that there are no dedicated methods to create CircularString, CurvePolygon, or
CompoundCurve geometries from WKT; these methods appear to have been overlooked in the OGC
standard and so are not implemented by SQL Server either. To create a curved geometry from WKT
you must use the generic STGeomFromText() or Parse() methods.

Passing WKT to the Method


The syntax for using any of the WKT static methods is the same: first stating the datatype to which the
method belongs followed by the method name, separated by double colons. Each method itself requires
two parameters: the first being the WKT of the geometry to be created, and then the SRID of the spatial
reference system.
This syntax is illustrated as follows.
Datatype::Method( WKT, SRID )

78

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Lets illustrate this with a few examples. To begin, lets take the simple example of a Point located
at geographic coordinates of latitude 27.5 and a longitude of 153, measured using the WGS84 spatial
reference system, SRID 4326. This is the approximate location of Brisbane, Australia.
The Well-Known Text representation of this Point is:
POINT(153 -27.5)
We can create a geography Point instance by supplying this WKT string to the dedicated
STPointFromText() method, together with the associated SRID, as follows.
SELECT
geography::STPointFromText('POINT(153 -27.5)', 4326);
Notice that the WKT parameter is passed as a nvarchar(max) text string, and supplied in single
quotes. If you use the SqlServer.Types.dll library in a .NET application then the WKT is passed as a
SqlChars value instead, as shown in the following C# code listing.
SqlGeography Point = SqlGeography.STPointFromText(
new SqlChars("POINT(153 -27.5)"),
4326);
The STPointFromText() method can be used to create Point instances only. If you supply the WKT
representation of a different sort of geometry to the STPointFromText() method then SQL Server will
throw an exception (System.FormatException 24142). To create a LineString geometry, for example, you
should use the STLineFromText() method instead, supplying the WKT of a LineString as shown in the
following code listing.
SELECT
geometry::STLineFromText('LINESTRING(300500 600150, 310200 602500)', 27700);
If you know in advance that the data you will be creating will only be of a certain type of
geometry, then I generally recommend that you use the method dedicated to that geometry type,
STPointFromText() for Points, STPolyFromText() for Polygons, and so on. Using these methods will
provide a first sanity check of your data by ensuring that it only contains geometries of the expected
type; any other data will throw an exception.
However, there are also occasions when you require a method that will create spatial data from a
WKT string of any sort of geometry. In these situations, you can use the STGeomFromText() method
instead. The following code listing demonstrates how the STGeomFromText() method can be used to
create both the Point and LineString from the previous examples.
SELECT
geography::STGeomFromText('POINT(153 -27.5)', 4326),
geometry::STGeomFromText('LINESTRING(300500 600150, 310200 602500)', 27700);
The results obtained from the STGeomFromText() method are identical to those obtained from the
dedicated STPointFromText() or STLineFromText() methods used previously.

Note There is no performance benefit from using a geometry type-specific method such as
STPointFromText() rather than the generic STGeomFromText() method. The sole advantage is that it restricts the
types of geometry that will be accepted in the WKT input (and therefore, by implication, the type of geometry that
will be returned by the method). If this is your objective, you might also want to consider adding a CHECK
constraint that tests the value returned by the STGeometryType() method at the point that a geometry is inserted
into a table, as demonstrated in Chapter 3.

79

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Another alternative is to use the generic Parse() method. In common with the STGeomFromText()
method, the Parse() method will create an item of spatial data from any supplied WKT representation.
The difference is that, unlike the other static methods, Parse() does not require you to set an SRID; the
spatial reference system is assumed based on the datatype being used. For the geography datatype, the
Parse() method always uses SRID 4326, whereas for the geometry datatype it is SRID 0.
If you attempt to set the value of a geometry or geography column or variable directly from a
character string, the string will be treated as WKT and passed to the Parse() method. As such, the
following T-SQL code listing,
DECLARE @Delhi geography = 'POINT(77.25 28.5)';
produces exactly the same result as
DECLARE @Delhi geography = geography::Parse('POINT(77.25 28.5)';
which, in turn, is equivalent to
DECLARE @Delhi geography = geography::STGeomFromText('POINT(77.25 28.5)', 4326);
As long as you are dealing with the common cases of using the geography datatype with SRID 4326,
or the geometry datatype with SRID 0, you can therefore use the Parse() method as a convenient way of
shortening your code.
To demonstrate the same example using the static methods provided by the SqlGeography or
SqlGeometry classes in a .NET application, you will probably find it easier to write:
SqlGeography Delhi = SqlGeography.Parse("POINT(77.25 28.5)");
compared to
SqlGeography Delhi = SqlGeography.STGeomFromText(
new SqlChars("POINT(77.25 28.5)"), 4326);
One subtle difference worth noting from this example is that, although STGeomFromText() accepts
the supplied WKT input as a SqlChar array, the Parse() method treats the WKT input as a SqlString.
SqlChar values can be streamed, whereas a SqlString cannot. Using Parse() therefore requires a
contiguous block of memory to be allocated for the entire supplied WKT string. This, combined with the
fact that Parse() can only be used for creating instances defined using limited SRIDs, means that you
will need to decide whether it is suitable for use in a particular scenario compared to the more verbose
STGeomFromText() method.

Note All of the static methods that operate on WKT input expect decimal coordinate values to be represented
using a decimal point (.) to separate the integral and fractional parts of the coordinate, for example, 52.61.
Depending on the regional settings of your database or operating system, you may find that coordinate values are
instead displayed and represented using other coordinate separators, such as the comma in 52,61. When
supplying WKT for any of the static methods listed in this section, be sure to check the culture settings of your
application.

80

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Retrieving the WKT Representation of an Instance


Well-Known Text is simply one format in which geometry and geography data can be represented.
Remember that, internally, SQL Server actually stores spatial data in a binary format, not as a text
string. If you want to retrieve the Well-Known Text of an existing geography or geometry instance, you
can use one of the appropriate instance methods: STAsText(), AsTextZM(), or ToString().
STAsText() is the OGC-compliant method for retrieving the Well-Known Text
of an instance. The returned nvarchar (or SqlChars) WKT string will only
contain 2D coordinates (x and y, or latitude and longitude). It will not contain
any z- or m-coordinate values associated with the geometry.
AsTextZM() is an extended method that returns the Well-Known Text of an
instance as an nvarchar(max) or SqlChars value, including any z- and m-
coordinate values defined by the geometry.
ToString() is a method defined by the Object base class in the .NET
framework, and therefore is inherited and implemented by all classes,
including the geometry and geography datatypes. The purpose of the ToString()
method is to convert any object into a string value suitable for display
purposes. When implemented by the geometry and geography datatypes, the
ToString() method retrieves the WKT of an instance including z-and m-
coordinate values as an nvarchar(max) value, just as AsTextZM() method does.
When called on a SqlGeography or SqlGeometry instance in .NET code,
ToString() returns a string rather than SqlChars returned by AsTextZM().
Apart from the datatype of the returned value, there is no difference between
the AsTextZM() and ToString() methods.
The following code listing demonstrates the output of the STAsText(), AsTextZM(), and ToString()
methods when called on a geometry Point instance containing x-, y-, and z-coordinate values.
DECLARE @Point geometry = geometry::STPointFromText('POINT(14 9 7)', 0);
SELECT
@Point.STAsText() AS STAsText,
@Point.AsTextZM() AS AsTextZM,
@Point.ToString() AS ToString;
The results are shown as

STAsText AsTextZM ToString


POINT (14 9) POINT (14 9 7) POINT (14 9 7)

Creating Spatial Data from Well-Known Binary


The Well-Known Binary format, like the WKT format, is a standardized way of representing spatial
data defined by the OGC. In contrast to the text-based WKT format, WKB represents a geometry or
geography instance as a binary value. Every WKB representation begins with a header section that
defines the type of geometry being represented, and the order in which bytes are expressed (big-
endian or little-endian). Depending on the type of geometry, the header may also contain additional
descriptive information such as the number of geometries contained within a multielement instance,
or the number of rings contained in a Polygon geometry. Following the information in the header, a

81

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

WKB representation lists a stream of 8-byte values representing the coordinates of each point in the
geometry.
Although SQL Server 2012 stores spatial data internally as a stream of binary data, it is not the
same as the WKB binary data format. As a result, you cannot directly set the value of an item of
geography or geometry data from a WKB representation. Instead, you must pass that WKB
representation to one of the appropriate static methods. Likewise, if you directly select the internal
binary value that SQL Server uses to store an item of spatial data, it will not be the same as the WKB
representation of that feature. One difference between WKB and SQL Server's own internal binary
format is that SQL Server serializes the spatial reference identifier and other properties related to the
geometry not present in the WKB representation.

Note The WKB format has some similarities with the internal binary format that SQL Server uses to store
geometry and geography data, but they are not the same. As in WKT, creating spatial data from WKB requires
passing that WKB representation to a suitable static method.

The following are some advantages of the WKB format.


Creating objects from WKB is faster than using static methods based on a text-
based representation such as WKT. Each x- and y- (or latitude and longitude)
coordinate value in WKB is stored on 8-byte binary boundaries, as they are in
SQL Server's own internal storage representation. The WKB static methods can
therefore efficiently process and create the associated instance from WKB,
rather than in WKT or GML where the parser must read in the whole text
representation first.
Because it is a binary format, WKB maintains the precision of floating-point
coordinate values calculated from binary operations, without the rounding
errors introduced in a text-based format.
However, WKB also has the following significant disadvantage.
Binary values cannot be easily understood by a human reader; it can therefore
be hard to detect errors in a WKB representation that could have been easily
spotted from examining the equivalent WKT or GML representation.

Choosing a WKB Static Method


Just as for the WKT format, SQL Server 2012 provides a specific method for creating each type of
geometry from a Well-Known Binary representation, as well as a generalized method
STGeomFromWKB() for creating any type of object from valid WKB. The list of static methods that can be
used to create geography or geometry data from WKB is shown in Table 4-2.

82

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Table 4-2. Methods to Instantiate Spatial Data from Well-Known Binary

Geometry Static Method

Point STPointFromWKB()

LineString STLineFromWKB()

Polygon STPolyFromWKB()

MultiPoint STMPointFromWKB()

MultiLineString STMLineFromWKB()

MultiPolygon STMPolyFromWKB()

GeometryCollection STGeomCollFromWKB()

Any supported geometry STGeomFromWKB()

To demonstrate the use of these methods, let's first take a look at an example WKB representation
of a Point geometry:
0x00000000014001F5C28F5C28F6402524DD2F1A9FBE
The elements of this binary string are broken down in Table 4-3.

Table 4-3. Elements Contained Within an Example WKB Geometry Representation

Value Description

0x Hexadecimal notation identifier

00 Byte order marker. 000 indicates little-endian byte order

00000001 This geometry is a Point, denoted as type 1

4001F5C28F5C28F6 x-coordinate (10.572)

402524DD2F1A9FBE y-coordinate (2.245)

To create a geometry Point instance from this WKB representation, using the Qatar National Grid
(SRID 2099), you can use the STPointFromWKB() method as follows.
SELECT
geometry::STPointFromWKB(0x00000000014001F5C28F5C28F6402524DD2F1A9FBE, 2099);
Or, you can use the generic STGeomFromWKB() method, which can be used to create any type of
geometry from WKB:

83

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

SELECT
geometry::STGeomFromWKB(0x00000000014001F5C28F5C28F6402524DD2F1A9FBE, 2099);

Note The spatial reference identifier is not serialized as part of the WKB binary string, so it must be provided as
the second parameter to any static methods that instantiate geometries from WKB.

Representing an Existing Geometry as WKB


Just as SQL Server provides the STAsText() method to retrieve the WKT of a geometry, so too does it
provide a method to retrieve the WKB representation of an instance. In order to retrieve the WKB
representation of a geometry or geography instance you can use the STAsBinary() method, as follows.
DECLARE @g geometry = geometry::STPointFromText('POINT(14 9 7)', 0);
SELECT
@g.STAsBinary();
The result is as follows.

0x01010000000000000000002C400000000000002240

Note that, like its WKT sister, STAsText(), the WKB representation produced by the STAsBinary()
method specifies coordinates in two dimensions only: x and y for the geometry datatype, or latitude and
longitude for the geography datatype. You can confirm this by converting the Well-Known Binary result
above back to Well-Known Text:
-- Declare point containing x, y, and z coordinates
DECLARE @g geometry = geometry::STPointFromText('POINT(14 9 7)', 0);

-- Convert to WKB using STAsBinary()


DECLARE @WKB varbinary(max) = @g.STAsBinary();

-- Now create a new geometry instance from this WKB


DECLARE @h geometry = geometry::STPointFromWKB(@WKB, 0);

--Retrieve the Well-Known Text of the new geometry


SELECT @h.AsTextZM();
The geometry created from the WKB retrieved from the STAsBinary() method is now simply
POINT(14 9), with no z-coordinate value. In order to create a WKB representation that retains the full
fidelity of the original point supplied in this example, which has x-, y-, and z-coordinates, you can use
the AsBinaryZM() method instead. This will serialize any geometry or geography instance into WKB
format including x-, y-, z-, and m-coordinate values, as follows.
DECLARE @g geometry = geometry::STPointFromText('POINT(14 9 7)', 0);
SELECT @g.AsBinaryZM();
The WKB result produced by the AsBinaryZM() method in this example, which is a longer binary
string containing three coordinate values, is:

84

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

0x01E90300000000000000002C4000000000000022400000000000001C40

Creating Spatial Data from Geometry Markup Language


Geometry Markup Language is an XML-based language for representing spatial information. In
common with all XML dialects, GML is a very explicit and highly structured format, where each
property of the geometry is contained within specific element tags within the document structure.
The following code listing demonstrates an example of the GML representation of a Point located
at a latitude of 47.6 degrees North, longitude 122.3 degrees West.
<Point xmlns="http://www.opengis.net/gml">
<pos>47.6 -122.3</pos>
</Point>
Note that, unlike in WKT, the GML format states geographic coordinates in latitudelongitude
order rather than longitudelatitude order (although geometric coordinates are listed in xy order in
both WKT and GML).
GML does not support z- or m-coordinates, so every coordinate pair is made up of exactly two
values, separated by a space. To define a single coordinate pair, as used to create a Point geometry,
those values are listed within the GML <pos> element. To define a LineString or Polygon containing
more than one pair of coordinate points, the values of each coordinate pair are instead listed within
the GML <posList> element, with each coordinate pair also separated by a space.
Note that GML uses a space both as the separator between individual values within a coordinate
tuple, as well as to separate each coordinate tuple from the next. There is no need to differentiate these
delimiters from each other because every GML coordinate tuple contains exactly two values; therefore
the elements in a <posList> can be parsed into an array of points by breaking it apart following every
other space.
The following code listing demonstrates the GML representation of a LineString connecting three
points at (6,4), (3,5), and (10,8).
<LineString xmlns="http://www.opengis.net/gml">
<posList>-6 4 3 -5 10 8</posList>
</LineString>

Tip You can return the GML representation of any existing geography or geometry instance by calling the
AsGml() method.

Some advantages of the GML format are as follows.


GML is text-based, making it relatively easy to examine and understand the
information contained within.
The explicit structure of a GML document mirrors the structure of a geometry
itself; a GML <Polygon> contains an <exterior> element, which specifies a
<LinearRing> containing an array of coordinates in a <posList>, for example.
This makes it easy to understand the structure of a complex geometry by
examining the structure of the associated GML representation.

85

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

GML is very verbose, explicitly stating all values within specific elements.
However, GML also has the following disadvantages.
It is very verbose! Although both WKT and GML are text-based formats, the GML
representation of a geometry requires substantially more space than the
equivalent WKT representation
Because GML is text-based, it too suffers from precision issues caused by
rounding of binary floating-point values.
GML is most commonly used for representing spatial information in an XML-based environment,
including when syndicating spatial data over the Internet.

Note The GML methods implemented in SQL Server are based on a scaled-down version of the GML 3.1.1
schema. You can view the schema used in SQL Server at http://schemas.microsoft.com/sqlserver/profiles
/gml/ or you can find the full GML standards on the OGC website, located at
http://www.opengeospatial.org/standards/gml.

GeomFromGml()The Only GML Static Method


Unlike the WKT and WKB formats, SQL Server does not provide different methods for creating each type
of geometry from GML; every geometry, whether Point, LineString, Polygon, or GeometryCollection, is
created using the same generic static method, GeomFromGml(). The geometry returned by the
GeomFromGml() method is determined by the structure and content of the GML representation supplied.
The GeomFromGml() method is implemented by both geometry and geography datatypes. The
following code listing demonstrates its usage to create a geography Point instance at a latitude of 47.6,
longitude of 122.3 West using the North American datum 1983 (SRID 4269):
DECLARE @gml xml =
'<Point xmlns="http://www.opengis.net/gml">
<pos>47.6 -122.3</pos>
</Point>';

SELECT
geography::GeomFromGml(@gml, 4269);

The GML Namespace


The xmlns attribute of an XML element associates that element with a particular namespace. XML
elements are unique only within a given namespace, so <Point xmlns="http://www.opengis.net/gml">
is different from <Point xmlns="http://www.someothernamespace.com">, or just <Point> with no
associated namespace.
In a GML representation, every element must belong to the GML namespace,
http://www.opengis.net/gml. This unambiguously defines that element as being a GML element,
rather than an element of the same name from any other XML namespace. To ensure that a GML
representation is valid, you should always attach the xmlns="http://www.opengis.net/gml" attribute to
the top-level tag of the GML document. This namespace will then be inherited by all of the child
elements nested within that element.

86

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Using the example of a LineString, the GML representation should therefore always be formed as
follows.
<LineString xmlns="http://www.opengis.net/gml">
<posList>-6 4 3 -5</posList>
</LineString>
If you omit the namespace, you will still have valid, well-formed XML, but it will no longer define
a valid GML geometry. Attempting to create a geometry from such a representation using the
GeomFromGml() method will result in an error as shown in the next example:
DECLARE @NoGMLNameSpace xml =
'<LineString>
<posList>-6 4 3 -5</posList>
</LineString>';

SELECT geometry::GeomFromGml(@NoGMLNameSpace, 0);


The error received is rather confusing, with the FormatException message being given as

System.FormatException: 24129: The given XML instance is not valid because the top-
level tag is LineString. The top-level element of the input Geographic Markup
Language (GML) must contain a Point, LineString, Polygon, MultiPoint,
MultiGeometry, MultiCurve, MultiSurface, Arc, ArcString, CompositeCurve,
PolygonPatch or FullGlobe (geography Data Type only) object.

The text of the error message states that the GML representation passed to the GeomFromGml()
method is invalid because the top-level tag is LineString. Instead, it helpfully suggests a list of possible
valid elements, including LineString?
As explained previously, XML elements are unique only within a particular namespace; what the
preceding error message really should say is that the top-level element of the input GML must be one
of the listed elements from the GML namespace. Declaring the GML namespace as an attribute on the
parent element tag resolves the error, as shown in the following example.
DECLARE @WithGMLNameSpace xml =
'<LineString xmlns="http://www.opengis.net/gml">
<posList>-6 4 3 -5</posList>
</LineString>';

SELECT geometry::GeomFromGml(@WithGMLNameSpace, 0);

(1 row(s) affected)

Representing an Existing Geometry as GML


The GML equivalent to the STAsText() and STAsBinary() methods is called AsGml(). The AsGml() method
can be used to return the GML representation of any instance of geography or geometry data, as shown
in the following code listing.
DECLARE @polygon geography = 'POLYGON((-4 50, 2 50, 2 60, -4 60, -4 50))';
SELECT @polygon.AsGml();

87

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

The result is:


<Polygon xmlns="http://www.opengis.net/gml">
<exterior>
<LinearRing>
<posList>50 -4 50 2 60 2 60 -4 50 -4</posList>
</LinearRing>
</exterior>
</Polygon>
Note that, because this is an instance of the geography datatype, the latitude and longitude values
of each coordinate pair in the resulting GML are listed in reverse order compared to that in the
supplied WKT.

PRACTICAL USES OF GML IN SQL SERVER

GML is a widely accepted standard in the geospatial world, and it provides a rich set of elements capable
of describing spatial features with all their associated metadata. It is used in many professional and
industrial datasets. For example, GML is the native format in which data from the Ordnance Survey (the
British government's national executive mapping agency) is distributed.
Unfortunately, SQL Server implements only a reduced subset of the full GML standard, and lacks many of
its more advanced elements. What's more, the parser used by the GeomFromGml() method will fail to parse
GML documents containing those GML elements not recognized by SQL Server, even if the document itself
adheres to the full published GML schema.
For this reason, I find the GeomFromGml() method to be of little use to create geography or geometry data
from GML documents found "in the wild," because there is no guarantee that SQL Server will be able to
parse them. The best way of knowing that SQL Server will be able to parse a particular GML file is if that
file were itself created from SQL Server's AsGml() method, but that of course implies that the data has
already been successfully imported into the database!

Creating Spatial Data from Dynamically Generated WKT


The static methods introduced thus far in this chapter create instances of geometry or geography data
from fully formed WKT, WKB, or GML representations. However, on many occasions we do not have,
nor necessarily want, such predefined representations to hand. Instead of providing a static WKT or
WKB string, you may want to create a geometry programmatically through code, for example. Creating
geometries programmatically allows you to define spatial features based on a query of an underlying
data source, or take advantage of programming logic such as conditional blocks and control-of-flow
statements to determine the resulting geometry at runtime.
The first approach to creating such "dynamic" geography or geometry data is to consider methods
that construct a WKT string in code before passing it to a static method. Because WKT is just a simple
text string, we can CAST numeric coordinates into nvarchar values and then use the range of T-SQL
string manipulation functions to construct the required WKT string.
To demonstrate this approach, suppose that you had a table containing the log data transmitted
from the GPS device in a moving vehicle. The table records the latitude and longitude of the vehicle
sampled at regular intervals in time, stored in two float columns, together with a timestamp record of
when that measurement was made, stored in a datetime column. You can create a table illustrating this
structure together with some sample data using the following code listing.

88

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

CREATE TABLE GPSLog (


Latitude float,
Longitude float,
LogTime datetime
);
INSERT INTO GPSLog VALUES
(51.868, -1.198, '2011-06-02T13:47:00'),
(51.857, -1.182, '2011-06-02T13:48:00'),
(51.848, -1.167, '2011-06-02T13:49:00'),
(51.841, -1.143, '2011-06-02T13:50:00'),
(51.832, -1.124, '2011-06-02T13:51:00');
Now suppose that we wanted to create Point instances corresponding to each record in the table.
Because the coordinates were recorded from a GPS system, we know that the SRID is 4326, and because
this is a geographic coordinate system we should use the geography datatype.
Therefore, we could use T-SQL string methods to concatenate the relevant elements of a WKT
representation for each Point and pass them to the STGeomFromText() method:
SELECT geography::STGeomFromText(
'POINT(' + CAST(Longitude AS varchar(32)) + ' ' + CAST(Latitude AS varchar(32)) + ')',
4326
)
FROM GPSLog;
The preceding code listing uses the CAST statement to convert each coordinate to a nvarchar value.
It also concatenates the two coordinates, separated by a space, and appends round brackets and the
POINT keyword around the outside. In other words, we've dynamically constructed the WKT
representation of each row in the table, and then used this to create a Point instance using the
STGeomFromText() method.
This example doesn't really provide much value; we could have achieved the same result more
easily by supplying the numeric latitude and longitude values directly to the Point() method, as follows.
SELECT geography::Point(Latitude, Longitude, 4326) FROM GPSLog;
The Point() method is a static method that can be used to create Point geometries from three
parameters: two float coordinate values and an integer SRID. Not only does using Point() make this
code simpler to read than the previous example, but because it doesn't require unnecessary CASTing
and string concatenation to create the WKT representation, it will almost certainly be faster too.

Note When using the geography Point() method, the coordinate parameters are supplied with latitude first,
then longitude, which is opposite to the order in which they are stated in WKT.

But what if we wanted to create something a bit more complex? Suppose that, rather than create
individual Point instances for each row in the GPSLog table, you wanted to create a LineString joining
each of the points in order. There is no inbuilt function that accepts an array of coordinates and returns
the LineString created from them, so dynamically constructing the WKT may be a better choice here.
The following code listing demonstrates one way of achieving this.
-- Declare an empty nvarchar to hold our constructed WKT string
DECLARE @WKT nvarchar(max) = '';

-- Build up the comma-delimited list of coordinate pairs

89

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

SELECT @WKT = @WKT + CAST(Latitude AS varchar(32)) + ' ' + CAST(Longitude AS varchar(32)) +


','
FROM GPSLog
ORDER BY LogTime;

-- Remove the final trailing comma


SET @WKT = LEFT(@WKT, LEN(@WKT) - 1);

-- Append the LINESTRING keyword and enclose the coordinate list in brackets
SET @WKT = 'LINESTRING(' + @WKT + ')';

-- Pass the constructed WKT to the static method


SELECT geography::STGeomFromText(@WKT, 4326);
The preceding code listing creates the desired result, illustrated in the Spatial Results tab of SQL
Server Management Studio as shown in Figure 4-1.

Figure 4-1. A LineString created from dynamically constructed WKT.

However, there are a couple of problems with this method. First, the code required to create it is
pretty ugly. Ugly code is not only unappealing to look at, but it's also hard to maintain, and it's much
more likely to conceal hidden bugs. Without the comments inserted, would it be obvious why you had to
use the LEFT function to trim the last character from the list of coordinates? The code required to
construct the WKT could arguably be cleaned up somewhat if implemented as a recursive CTE, but it
would still involve a degree of manual string manipulation.

90

4
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

It's also slow: string manipulation functions are generally not known for being efficient, and this
method requires both CASTing and concatenating of nvarchar values.
Finally, although this approach is somewhat dynamic, it's hard to include much control-of-flow or
conditional operators. Consider how much more complicated the code would quickly become if the
GPSLog table contained details of multiple vehicles that we wanted to plot as separate LineStrings, or if
we were to try to construct more complicated geometries such as Polygons from the underlying data.
The fact is that, although you can use T-SQL string functions to create dynamic WKT strings, the
primarily procedural-based T-SQL engine is not designed for this kind of operation. Fortunately, SQL
Server provides us with an alternative in the form of the SqlGeometryBuilder and SqlGeographyBuilder
classes available in SqlServer.Types.dll, which we examine in the next section.

Creating Spatial Data Using the Builder Classes


As explained in Chapter 3, the spatial functionality of SQL Server's geometry and geography datatypes is
contained in a compiled .NET assembly, SqlServer.Types.dll. This assembly is installed by SQL Server,
and located in the /100/SDK/Assemblies subdirectory of the directory in which SQL Server is installed.
In addition to the SqlGeometry and SqlGeography classes, which mirror the functionality of the
geometry and geography datatypes in SQL Server, the SqlServer.Types.dll assembly also provides
additional SqlGeometryBuilder and SqlGeographyBuilder classes, which are only accessible via .NET code.
The SqlGeometryBuilder and SqlGeographyBuilder classes provide a simple core set of methods
that can be used to programmatically build geometry and geography instances, respectively. These can
either be created in the application layer or, if exposed via a CLR UDF or stored procedure, in SQL
Server itself.
This section demonstrates how to create spatial data using the SqlGeometryBuilder class in a C#
console application, but the same procedure can be followed whatever the tier in which this code is used.

Tip The SqlGeometryBuilder and SqlGeographyBuilder classes can be used in any .NET application that
references the SqlServer.Types.dll library, and are not dependent on SQL Server being installed.

Configuring a .NET Console Application for Spatial Data


To demonstrate the basic approach required to use the Builder classes, we start by creating a simple C#
console application that creates a geometry, and then prints the Well-Known Text of that geometry to
the screen. To do so:
Load up Visual Studio and, from the File menu, select New Project.
In the following dialogue box, click to expand the set of Visual C# project types,
and highlight Console Application.
Choose a name and location for your project, and click OK.

These steps are illustrated in Figure 4-2.

91

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Figure 4-2. Creating a new spatial console application.

When your project is first created, youll see the default Program.cs code file in which we insert the
code for our application. But, before we do so, we need to include a reference to the
Microsoft.SqlServer.Types.dll library.
Select Project Add Reference.
On the Add Reference dialogue box that appears, ensure that the .NET tab is
selected. Scroll down the list until you find Microsoft.SqlServer.Types and click
to highlight it.
Click OK.
These steps are illustrated in Figure 4-3.

92

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Figure 4-3. Adding a reference to the Microsoft.SqlServer.Types library.

Note Depending on your system configuration, the Microsoft.SqlServer.Types library might not
automatically show up in the list of installed components under the Visual Studio .NET tab. In such cases, you can
locate the library manually by clicking the Browse tab, navigating to \Program Files (x86)\Microsoft SQL
Server\110\SDK\Assemblies and highlighting Microsoft.SqlServer.Types.dll in that directory.

Once the reference has been added to the project, we add a using directive so that we can easily
reference the methods contained in the Microsoft.SqlServer.Types library without having to specify
the namespace each time. Add the following line to the top of your Program.cs file,
using Microsoft.SqlServer.Types;
The project is now set up and ready to use the spatial datatypes, and we can get on with writing the
body of our code.

Constructing a Simple Point


The first thing to do is to create a new instance of the appropriate builder class. There are two builder
classes to choose from: SqlGeographyBuilder constructs SqlGeography instances, whereas
SqlGeometryBuilder, unsurprisingly, constructs SqlGeometry instances.
For this first example, we will create a geometry Point instance, so we use the corresponding
SqlGeometryBuilder class. Add the following code as the first line in the Main method of the application.

93

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

SqlGeometryBuilder gb = new SqlGeometryBuilder();


After instantiating the appropriate builder class, the next thing is to specify the spatial reference
system in which the coordinates of this geometry will be defined. You do this by passing the
appropriate SRID to the SetSrid() method of the builder instance. The following example illustrates
how to set the spatial reference of the geometry to SRID 27700.
gb.SetSrid(27700);

Caution You must call SetSrid() to set the SRID of the builder immediately after it is created, and before
adding any points.

The next step is to specify the type of geometry that will be created. For the SqlGeometryBuilder,
you do this by passing a member of the OpenGisGeometryType enumeration to the BeginGeometry()
method. The equivalent for the SqlGeographyBuilder class is to pass an OpenGisGeographyType to the
BeginGeography() method. There is one enumeration for every type of geometry that can be created.
To begin creating a Point geometry using the SqlGeometryBuilder, you use the following code.
gb.BeginGeometry(OpenGisGeometryType.Point);

Note The SqlGeometryBuilder and SqlGeographyBuilder classes have a near identical set of methods,
except that every occurrence of the word "geometry" becomes "geography". So, BeginGeometry() corresponds
to BeginGeography(), EndGeometry() becomes EndGeography(), and ConstructedGeometry is equivalent to
ConstructedGeography.

The first set of coordinates of the geometry are specified with a call to the BeginFigure() method.
The following code listing creates a point at an x-coordinate of 300500 and y-coordinate of 600200.
gb.BeginFigure(300500, 600200);
If the geometry contains more than a single Point, then additional line segments are added by
calling the AddLine() method. Because the Point we are defining in this example contains only one
pair of coordinates, we do not need to use AddLine(), and we can now end the figure:
gb.EndFigure();
A SqlGeometry Point instance contains only a single figure, so at this point we can end the
geometry:
gb.EndGeometry();
Having ended the geometry, we can then retrieve the constructed SqlGeometry Point via the
ConstructedGeometry property of the SqlGeometryBuilder instance:
SqlGeometry Point = gb.ConstructedGeometry;
This is a simple demonstration application, therefore we then just print the WKT of the
constructed geometry to the console window, using the ToString() method:

94

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Console.WriteLine(Point.ToString());
Here's the full code listing for Program.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Types;

namespace ProSpatial.Ch4
{
class Program
{
static void Main(string[] args)
{

// Create a new instance of the SqlGeographyBuilder


SqlGeometryBuilder gb = new SqlGeometryBuilder();

// Set the spatial reference identifier


gb.SetSrid(27700);

// Declare the type of geometry to be created


gb.BeginGeometry(OpenGisGeometryType.Point);

// Add the coordinates of the first (and only) point


gb.BeginFigure(300500, 600200);

// End the figure


gb.EndFigure();

// End the geometry


gb.EndGeometry();

// Retrieve the constructed geometry


SqlGeometry Point = gb.ConstructedGeometry;

// Print WKT of the geometry to the console window


Console.WriteLine(Point.ToString());

// Wait for user input before exiting


Console.ReadLine();
}
}
}
Hit F5 to debug the application, and you should see the WKT of the constructed geometry instance
displayed as

POINT (300500 600200)

95

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Building Polygons with Multiple Rings


Let's now look at some slightly more complex examples, the first of which is to create a geometry
Polygon containing an interior ring. The steps involved are similar to the preceding example, except
that each ring in the Polygon begins with a new call to the BeginFigure() method, stating the
coordinates of the first point in that ring. Additional coordinates are added to each ring using the
AddLine() method, which adds a straight line segment from the previous point to the given coordinates.
Polygon rings must be closed, therefore the coordinates specified in the final call to AddLine() must
always be the same as those provided to the BeginFigure() method used to commence that ring. The
following code listing demonstrates this approach.
// Create a new instance of the SqlGeometryBuilder
SqlGeometryBuilder gb = new SqlGeometryBuilder();

// Set the spatial reference identifier


gb.SetSrid(0);

// Declare the type of geometry to be created


gb.BeginGeometry(OpenGisGeometryType.Polygon);

// Exterior ring
gb.BeginFigure(0, 0);
gb.AddLine(10, 0);
gb.AddLine(10, 20);
gb.AddLine(0, 20);
gb.AddLine(0, 0);
gb.EndFigure();

// Interior ring
gb.BeginFigure(3, 3);
gb.AddLine(7, 3);
gb.AddLine(5, 17);
gb.AddLine(3, 3);
gb.EndFigure();

// End the geometry and retrieve the constructed instance


gb.EndGeometry();
SqlGeometry Polygon = gb.ConstructedGeometry;
The Polygon instance created by this code listing is equivalent to the following WKT.

POLYGON ((0 0, 10 0, 10 20, 0 20, 0 0), (3 3, 7 3, 5 17, 3 3))

Note To construct a Polygon geometry using the SqlGeometryBuilder class, each ring requires a separate call
to the BeginFigure() method. The exterior ring is the first to be created, and every subsequent figure defines an
interior ring.

96

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Creating Geometry Collections


To build a collection-type geometry, you begin with a call to BeginGeometry() (or BeginGeography()),
specifying one of either the MultiPoint, MultiLineString, MultiPolygon, or GeometryCollection
geometry types. Then, for each element within the collection you call BeginGeometry() (or
BeginGeography()) again, specifying the type of that individual geometry.
To demonstrate, the following example creates a MultiPoint geometry containing three Points.
This time, I use the SqlGeographyBuilder() class, so note that I use the corresponding geography
method names and enumerations. Also note that the coordinates passed to the BeginFigure() method
must be given in latitudelongitude order:
// Create a new instance of the SqlGeographyBuilder
SqlGeographyBuilder gb = new SqlGeographyBuilder();

// Set the spatial reference identifier


gb.SetSrid(4269);

// Declare the type of collection to be created


gb.BeginGeography(OpenGisGeographyType.MultiPoint);

// Create the first point in the collection


gb.BeginGeography(OpenGisGeographyType.Point);
gb.BeginFigure(40, -120);
gb.EndFigure();
gb.EndGeography();

// Create the second point in the collection


gb.BeginGeography(OpenGisGeographyType.Point);
gb.BeginFigure(45, -100);
gb.EndFigure();
gb.EndGeography();

// Create the third point in the collection


gb.BeginGeography(OpenGisGeographyType.Point);
gb.BeginFigure(42, -110);
gb.EndFigure();
gb.EndGeography();

// End the collection geometry and retrieve the constructed instance


gb.EndGeography();
SqlGeography MultiPoint = gb.ConstructedGeography;
The geometry created in this example is equivalent to the following WKT.

MULTIPOINT((-120 40), (-100 45), (-110 42))

Caution Be sure that all figures and geometries are closed (i.e., every BeginFigure() and BeginGeometry()
have matching EndFigure() and EndGeometry() calls) before retrieving the constructed geometry from the
Builder class.

97

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Building Curved Geometries


To add curved segments that are used to define a CircularString, CompoundCurve, or CurvePolygon
geometry, you use the AddCircularArc() method. This works in a similar way to AddLine(), except that it
requires two sets of coordinates: an anchor point that the curve passes through and the final point of
the arc. The following code listing demonstrates how to create a CompoundCurve geometry using the
SqlGeometryBuilder class.
// Create a new instance of the SqlGeometryBuilder
SqlGeometryBuilder gb = new SqlGeometryBuilder();

// Set the spatial reference identifier


gb.SetSrid(0);

// Declare the type of geometry to be created


gb.BeginGeometry(OpenGisGeometryType.CompoundCurve);

// Add the first point of the geometry


gb.BeginFigure(50, 0);

// Define a straight line segment to the point at (50, 10)


gb.AddLine(50, 10);

// Create a circular arc segment that passes through (55,5) and ends at (60,0)
gb.AddCircularArc(55, 5, 60, 0);

// End the figure


gb.EndFigure();

// End the geometry


gb.EndGeometry();

// Retrieve the constructed instance


SqlGeometry Curve = gb.ConstructedGeometry;
The result is equivalent to the following WKT: a CompoundCurve consisting of a single straight
line segment followed by a circular arc segment.

COMPOUNDCURVE((50 0, 50 10), CIRCULARSTRING(50 10, 55 2, 60 0))

Although the examples shown here have used hard-coded coordinate values for simplicity, it is
easy to see how the SqlGeometryBuilder and SqlGeographyBuilder classes provide much more flexibility
for defining dynamic spatial instances than the static methods provided by the geometry and geography
datatypes within SQL Server.
Looking back at the earlier example of the table of GPS points, for example, you could create a CLR
User-Defined Aggregate that used the SqlGeographyBuilder class to construct a LineString representing
the route by reading the coordinate values of each point in the table and passing these to successive calls
to the AddLine() method, returning the constructed geography instance once all points had been added.

98

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

Programmatically Defining Three- and Four-Dimensional


Geometries
The BeginFigure(), AddLine(), and AddCircularArc() methods used to construct geometries with
SqlGeometryBuilder or SqlGeographyBuilder require coordinate tuples containing x and y, or latitude
and longitude values. These methods also provide overloads that will accept x-, y-, z-, and m-
coordinates for the SqlGeometryBuilder, or latitude, longitude, z, and m for the SqlGeographyBuilder.
This is demonstrated in the following code listing, which creates a SqlGeography Point instance
containing a z-coordinate value.
// Create a new instance of the SqlGeographyBuilder
SqlGeographyBuilder gb = new SqlGeographyBuilder();

// Set the spatial reference identifier


gb.SetSrid(4326);

// Declare the type of geometry to be created


gb.BeginGeography(OpenGisGeographyType.Point);

// Specify latitude, longitude, z, m coordinate values


gb.BeginFigure(52, 0.15, 140, null);

// End the figure


gb.EndFigure();

// End the geography


gb.EndGeography();

// Return the constructed instance


SqlGeography PointZ = gb.ConstructedGeography;

Note There is no overload that accepts only latitude, longitude, and z-coordinate values, only one that accepts
latitude, longitude, z-, and m-coordinates. If you do not require an m-coordinate value, simply pass a null
parameter value as in this example.

Summary
In this chapter, you saw different methods of creating instances of the geometry and geography datatype
in SQL Server.
There are static methods dedicated to creating different types of geometries
from Well-Known Text and Well-Known Binary, as well as generic methods to
create any type of geometry.
Static methods must be supplied with a fully formed representation of the
geometry to be created, together with the spatial reference identifier in which
the coordinates of that geometry are defined.

99

www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA

It is possible to dynamically create WKT (or WKB) representations before


passing them to the appropriate static method, but doing so can require some
awkward string manipulation.
You can use the SqlGeometryBuilder and SqlGeographyBuilder classes to
programmatically create geometry or geography instances in .NET. This facilitates
a much more manageable approach to creating dynamic spatial instances at
runtime.

100

www.it-ebooks.info
CHAPTER 5

Importing Spatial Data

In the last chapter, we examined various methods that can be used to create individual items of
geography or geometry data, for example, using a static method that accepts WKT or WKB input, or using
one of the SqlGeometryBuilder or SqlGeographyBuilder classes. Each of these methods creates only a
single item of data at a time. So what if you wanted to import an entire set of spatial data?
In this chapter, I will introduce you to some of the sources from which you can obtain publicly
available sets of spatial information, examine the formats in which those datasets are commonly
supplied, and teach you some of the techniques you can use to import that information into SQL Server.

Sources of Spatial Data


There is a wealth of existing spatial information that you can purchase from various commercial data
vendors, as well as an ever-increasing amount of data made freely available by educational
institutions and government agencies. Table 5-1 provides details of a few sources of spatial data that
you can download over the Internet.

Table 5-1. Sources of Freely Downloadable Spatial Information

Sourcea Description

http://www.geonames.org The GeoNames database contains over 8,000,000 place names


across all countries and can be downloaded free of charge.

http://geodata.grid.unep.ch/ The United Nations Geo Data Portal includes global, national,
regional, and subregional statistics and spatial data, covering
themes such as freshwater, population, forests, emissions,
climate, disasters, health, and GDP.

http://www.diva-gis.org/gData This dataset, originally designed to support the study of


biodiversity, contains a wealth of feature layers including
inland water, administrative areas, altitude, and population
density for almost every country in the world.

101

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Sourcea Description

http://www.census.gov The U.S. Census Bureau Geography Division has lots of high-
quality spatial information, including a gazetteer, Zip Code
Tabulation Areas (ZCTAs), and the TIGER database of streets,
rivers, railroads, and many other geographic entities (United
States only).

http://earth-info.nga.mil/gns/html/ The U.S. National Geospatial-Intelligence Agency (NGA)


GEOnet Names Server (GNS) is the official U.S. repository of
all foreign place names, containing information about
location, administrative division, and quality.

http://www.ordnancesurvey.co.uk The Ordnance Survey is the executive mapping agency of


/products/os-opendata.html Great Britain. Its range of OpenData products includes
transportation, natural features, administrative, and postcode
data for Great Britain.

http://www.openstreetmap.org A collaboratively edited, openly available streetmap of the


whole world (think of it as the "Wikipedia" of Web maps). The
OSM planet database has over 250 Gb of vector data available
for download in XML format.

http://geodata.gov/wps/portal/gos The U.S. government Geospatial One Stop Web page of


geographic data contains classified links to a variety of sources
covering areas including ecology, geology, health,
transportation, and demographics.

a There may be restrictions on the use of data obtained from these sources. Please refer to the respective providers for
specific details.

Spatial datasets obtained from the sources listed in Table 5-1 may be provided in a variety of
different formats, and may contain a significant volume of data. A full download of U.S. census data,
for example, is several hundred gigabytes in size.
The remainder of this chapter describes some of the alternative formats in which spatial data is
commonly supplied, and explains techniques that you can use to import this data into SQL Server 2012.

Importing Tabular Spatial Data


The most abundant (and also the simplest) source of freely available geographic information generally
takes the form of a list of place names, together with separate columns containing the latitude and
longitude coordinates of each location. These tabular sources may also contain other columns of related
information, such as associated demographic or economic measures. Information presented in this
format is commonly known as a gazetteer, a dictionary geographic information. Spatial information in

102

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

gazetteer format resembles the index that can be found at the back of an atlas, which may list place
names, or road names, for example, together with the location of that item on the map.
To demonstrate how to add a column of spatial data to a table containing columns of latitude and
longitude (or Northing and Easting coordinates), let's consider an example using a file of earthquake data
provided by the United States Geological Survey (USGS). The USGS makes a number of datasets freely
available that you can download from their website at http://www.usgs.gov. One such dataset lists real-
time, worldwide earthquakes from the past seven days, which you can download directly from
http://earthquake.usgs.gov/eqcenter/catalogs/eqs7day-M1.txt. This file is a comma-separated list of
data containing various attributes of each earthquake in columnar format, as listed and described in Table
5-2.

Table 5-2. Columns of Datae in the eqs7day-M1.txt File

Column Description e Length

Src The two-character identifier of the source String [DT_STR] 2


network that contributed the data

Eqid The unique identifier for this earthquake String [DT_STR] 8

Version The version number String [DT_STR] 1

Datetime A text string describing the date at which the String [DT_STR] 50
recording was made

Lat The latitude of the earthquake epicenter, stated Float [DT_R4] -


in the EPSG:4326 spatial reference system

Lon The longitude of the earthquake epicenter, stated Float [DT_R4] -


in the EPSG:4326 spatial reference system

Magnitude The magnitude of the earthquake, determined by Float [DT_R4] -


the strength of the seismic waves detected at each
station

Depth The depth of the earthquakes center, measured Float [DT_R4] -


in kilometers

NST The number of reporting stations Two-byte signed -


integer [DT_I2]

Region A text string description of the area in which the String [DT_STR] 255

103

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

earthquake occurred

To obtain a copy of this data, follow these steps.


1. Load your Web browser and, in the address bar, type the following URL address,
http://earthquake.usgs.gov/eqcenter/catalogs/eqs7day-M1.txt. The
browser will show the contents of the latest feed, as demonstrated in the
example in Figure 5-1.
2. Save this file to an accessible location by choosing File Save As (or Save
Page As, depending on your browser). You will be prompted for a file name
and location. For this example, it is assumed that you name the file eqs7day-
M1.txt and save it to the C:\Spatial folder.

Note Because the eqs7day-M1.txt file contains a constantly updated feed of data from the last seven days, the
actual content of the file you download will be different from that demonstrated in this chapter.

Figure 5-1. The USGS earthquake data file.

104

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Importing the Text File


There are a number of different ways to import data into SQL Server 2012. This example uses the
Import and Export Wizard, which allows you to step through the creation of a simple package to move data
from a source to a destination. If you want to create a repeatable load process, or have more control
over the way in which the data is imported, you may instead prefer to create a custom SSIS package, or
use the BCP Bulk Copy tool.
The steps to import the earthquake file using the Import and Export Wizard follow.
1. From the Object Explorer pane in Microsoft SQL Server Management Studio,
right-click the name of the database into which you would like to import the
data, and select Tasks Import Data.
2. The Import and Export Wizard appears. Click Next to begin.
3. The first page of the wizard prompts you to choose a data source. Select Flat
File Source from the Data Source drop-down list at the top of the screen.
4. Click the Browse button and navigate to the eqs7day-M1.txt text file that you
saved earlier. Highlight the file and click Open.
5. By default, the Text Qualifier field for the connection is set to <none>. The
text strings within the eqs7day-M1.txt file are contained within double
quotes, so change this value to be a double quote character (") instead.
6. The eqs7day-M1.txt text file contains headings, so check the Column Names
in the First Data Row check box.
7. Click the Advanced option in the left pane. Click each column in turn and,
from the properties pane on the right side, amend the values of the DataType
and OutputColumnWidth fields to match the values shown in Table 5-1.
8. Once you have made the appropriate changes, click the Next button. The
wizard prompts you to choose a destination.
9. Enter any authentication details required to connect to the chosen SQL
Server 2012 database, and then click Next.
10. The wizard prompts you to select source tables and views. By default, the wizard
automatically creates a destination table called eqs7day-M1. You can leave this
name if you like, but it's generally not a good idea to use table names that
contain nonalphanumeric characters. Instead, I suggest you double-click on the
table name in the destination column and edit it to remove the minus sign,
making the destination table simply eqs7dayM1. Then click Next.
11. On the Save and Run Package screen, click Finish (depending on the edition
of SQL Server 2012 you are using, this may be called Run Package). The
package summary appears, and you are prompted to verify the details.
12. Click Finish again to execute the package.
You will receive a message informing you that the execution was successful, and stating the
number of rows transferred from the text file into the destination table. You may now close the wizard
by clicking the Close button.
Lets check the contents of the new table. You can do this by opening a new query window and
issuing the following command against the database to which you just imported the earthquake data.
SELECT * FROM eqs7dayM1;

105

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

You will see the data inserted from the text file, as shown in Figure 5-2.

Figure 5-2. The data inserted from the eqs7day-M1.txt file.

Creating a Computed Column


The location of each earthquake is currently described in the eqs7dayM1 table by latitude and longitude
coordinate values stored in the Lat and Lon columns. In order to use any of the spatial methods
provided by SQL Server, we need to use these coordinates to create a representation of each
earthquake using the geography or geometry datatype instead. To do so, we can create a new computed
column containing a Point geometry representing the epicenter of each earthquake using the Point()
method of the geography datatype. This is demonstrated in the following code listing.
ALTER TABLE eqs7dayM1
ADD Epicenter AS geography::Point(Lat, Lon, 4326);

To test the contents of the Epicenter column, you can now run the following query,
SELECT TOP 5
Eqid,
Epicenter.STAsText() AS Epicenter
FROM
eqs7dayM1;
The results are as follows.

Eqid Epicenter
10325561 POINT (-150.3317 65.0131)
10325555 POINT (-152.2948 57.4106)
00349540 POINT (-119.8993 39.4092)
10325549 POINT (-149.6373 61.2189)
71655381 POINT (-121.287 36.6595)

106

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Populating a Noncomputed Column


Using the Point() method, we have been able to populate the Epicenter column with Point geometries
representing the latitude and longitude of each earthquakes epicenter, which lies on the surface of the
earth. However, the origin of an earthquake (its hypocenter) normally lies deep within the earth, tens or
hundreds of miles underground. In the eqs7day-M1 dataset, the depth of the hypocenter, in
kilometers, is recorded in the Depth column. To be able to represent the position of the hypocenter of
each earthquake instead, we need to define each Point with an additional z-coordinate based on the
value of the Depth column.
The following code adds a new column, Hypocenter, to the eqs7dayM1 table. Rather than being
computed like the Epicenter column, the Hypocenter column is populated with an UPDATE statement
that sets the value of each row using a dynamically created WKT string based on the latitude,
longitude, and depth of each earthquake. Note that, because the Depth column represents a distance
beneath the earths surface, the z-coordinate of each Point is set based on the negative value of the Depth
column.
-- First, add a new column to the table
ALTER TABLE eqs7dayM1
ADD Hypocenter geography;
GO
-- Populate the column
UPDATE eqs7dayM1
SET Hypocenter =
geography::STPointFromText(
'POINT('
+ CAST(Lon AS varchar(255)) + ' '
+ CAST(Lat AS varchar(255)) + ' '
+ CAST (-Depth AS varchar(255)) + ')',
4326);
You can now select the data contained in the eqs7dayM1 table, including the Point representations
of both the epicenter and hypocenter of each earthquake, as follows.
SELECT TOP 5
Eqid,
Epicenter.ToString() AS Epicenter,
Hypocenter.ToString() AS Hypocenter
FROM
eqs7dayM1;
The results follow.

Eqid Epicenter Hypocenter


10325561 POINT(-150.3317 65.0131) POINT(-150.332 65.0131 -0.1)
10325555 POINT(-152.2948 57.4106) POINT(-152.295 57.4106 -30.1)
00349540 POINT(-119.8993 39.4092) POINT(-119.899 39.4092 -6.8)
10325549 POINT(-149.6373 61.2189) POINT(-149.637 61.2189 -33)
71655381 POINT(-121.287 36.6595) POINT(-121.287 36.6595 -7.9)

107

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Comparing Computed, Noncomputed, and Persisted Columns


The Epicenter and Hypocenter columns of the eqs7dayM1 table both appear to behave in the same way:
you can include them in a SELECT statement, and apply the full range of geography methods on both.
However, there are some important differences between them due to the way they were created.
Epicenter is a computed column, and the Point instances it contains are
calculated only at runtime of a query that involves that column. The
epicenters are created using the Point() method based on the coordinates
contained in the Lat and Lon columns of the eqs7dayM1 table, which creates a
dependency on those columns remaining in the table schema. The values in
the Epicenter column are not stored anywhere, and are disposed of following
every query execution.
Hypocenter is a regular noncomputed column in which the value of every
geography Point instance is materialized and stored in the database. This
eliminates the dependency on the Lat and Lon columns (and the Depth column),
because the individual coordinate values of each earthquake can be retrieved
from the Hypocenter column using the Lat, Lon, and Z properties if necessary.
There are are advantages and disadvantages of both methods. Using a computed column requires
less storage in the database layer, because computed values are never physically stored in the table.
However, they require more overhead in the execution phase in order to determine the value of each
instance when used in a query. An advantage of computed columns is that when dealing with Point data
as in this example, you might find it easier to manage separate columns of numeric coordinates than
having to manage a single compound geography value, especially when dealing with spatial data in
which coordinates are frequently changing. To update any single coordinate of a geography instance, for
example, requires you to completely recreate the instance from a static method, which could be deferred
until query time if using a computed column. However, a drawback of computed columns is that, because
their value is only calculated at runtime, you cannot generally create an index on a computed column.
Spatial indexes are crucial to creating efficient performant queries, as discussed later in this book.
A third option worth considering is the fact that computed columns can be persisted. A persisted
computed column contains values that are defined according to a specified calculation, as in the case of
the Epicenter column, but the values of which are materialized and stored in the database as is the
Hypocenter column. The stored values of a persisted computed column are updated whenever the value
of a column used as part of its calculation changes.
Persisted computed columns can be considered to represent the best of both worlds, allowing you
to easily manage coordinate data in separate numeric columns while also seamlessly providing a
geography Point instance based on those coordinates for use in any spatial queries. And, because the
values are materialized in the database, indexes can be added to persisted computed columns.
The following code listing adds a new persisted computed column, Hypocenter_Persisted, using
the PERSISTED option following the column definition.
ALTER TABLE eqs7dayM1
ADD Hypocenter_Persisted AS geography::STPointFromText(
'POINT('
+ CAST(Lon AS varchar(255)) + ' '
+ CAST(Lat AS varchar(255)) + ' '
+ CAST (-Depth AS varchar(255)) + ')',
4326) PERSISTED;
Note that, even though the Hypocenter_Persisted column is persisted, the values it contains still
have a dependency on the Lat, Lon, and Depth columns, and you will not be able to drop any column
from a table that is referenced by a computed column definition. Persisted computed columns are not
necessarily the right option in every scenario, but it's certainly useful to be aware of them.

108

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Importing Data Using OGR2OGR


OGR2OGR is a component of the Geospatial Data Abstraction Library, more commonly referred to as
GDAL. GDAL is an open-source library and command-line toolkit that provides various utilities to
read and write from different spatial data formats.
The core GDAL library provides raster data functionality for data sources such as GeoTIFF, MrSID,
and JPEG2000, whereas the OGR sublibrary provides support for vector data, including ESRI shapefiles,
MapInfo files, KML, GML, and various spatial databases including PostGIS, Oracle Spatial, and SQL
Server.
The GDAL/OGR library is relatively mature, powerful, and has an active development
community. It's also free, and licensed under a permissive MIT-style open source license. This section
shows how to use OGR2OGR, the command-line component of the OGR library, to import data from
various different sources into SQL Server.

Obtaining and Installing the GDAL/OGR Library


In common with much open source software, the GDAL/OGR library has been developed to work
across many different operating environments and to be relatively platform-independent.
Unfortunately, that means that it is not necessarily as simple to set up compared to an application that
has been tailored specifically for the Windows operating system from the outset. Nevertheless, it is not
too hard to get working, and there are two possible approaches you can follow.
One option is to download the latest GDAL source from http://download.osgeo.org/gdal and
configure and compile it using Visual Studio (or another tool from which you can compile C++). This
will give you the most flexibility to adjust some of GDAL's configuration options, such as linking in
additional optional libraries. This is also the best route to take if you want to understand how GDAL
works behind the scenes, or if you want to extend it or integrate its functionality into part of another
application or larger workflow. However, you should expect to invest a little time and effort to
understand the structure of the source code if you choose to take this route.
Alternatively, if you simply want to get hold of a working copy of the OGR2OGR executable so that
you can follow along with the examples in this section, I suggest instead that you download the
OSGeo4w package from http://osgeo4w.osgeo.org. OSGeo4w (Open Source Geospatial for Windows)
is a package containing ready-compiled Windows binaries of a range of open source geospatial tools,
including GDAL/OGR, MapServer, QGIS, and many more. I recommend OSGeo4W not only as a
convenient way to get hold of GDAL/OGR itself, but you also might find that you can make use of many
of the other included tools to integrate with spatial data in SQL Server.
Whatever option you choose, ensure that your version of the GDAL library corresponds to version
1.8.0 or greater because this is the version at which support for SQL Server was first introduced.

OGR2OGRBasic Syntax
OGR2OGR is the command-line component of the GDAL/OGR library that can be used to convert
vector data from one format to another. You can retrieve a full list of usage options by typing the
following in a command prompt window (assuming that the directory in which OGR2OGR is installed
is located in the command path).
ogr2ogr /?
The basic syntax to load data from a source to a destination is as follows.
ogr2ogr -f {OutputFormat} {Destination} {Source} {Additional Options}
To import data into SQL Server you state the {OutputFormat} as "MSSQLSpatial", and specify the
{Destination} using the connection string of the target SQL Server database. In the code listing below,

109

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

the contents of the {Source} file will be loaded into a new table in the ProSpatial database of the
default localhost SQL Server instance, for example:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
{Source}
{Additional Options}
The default OGR2OGR behavior is to create a new table in the destination database named with
the same filename as the {Source} dataset. The data from a shapefile named MyData.shp will be
imported into the SQL Server table MyData, for example. If a table with that name already exists then the
command will fail. This behavior can be modified by specifying one of the following flags in the
{Additional Options}.
-append appends new records into an existing table with the specified name.
-overwrite deletes the existing destination table and re-creates it.
-nln "tablename" inserts data into the named table rather than the default
table named based on the {Source} dataset name.
There are further {Additional Options}, some of which vary depending on the nature of the source
data. In the following sections we examine some practical scenarios that demonstrate these options, as
well as the syntax required to load from {Source} datasets of several common formats.

Importing ESRI Shapefile Data


The shapefile format was designed and is maintained by Environmental Systems Research Institute,
Inc. (ESRI). Originally developed for use in its ARC/INFO suite of GIS software, the shapefile is now a
very common format used for exchanging spatial information between all kinds of systems, and is the
format in which most commercial spatial data is supplied. Over time, a large body of spatial datasets
has been created in ESRI shapefile format.
Although a set of data provided in shapefile format is commonly referred to as a" shapefile (singular),
this is a slight misnomer because a single shapefile actually consists of several files. Each file relating
to a given shapefile dataset shares the same file name, with one of the following file extensions.
.shp: The SHP file contains the raw geometrical shape data. Each SHP file can
contain items of only one kind of geometry shape: Points, LineStrings, or
Polygons.
.shx: The SHX file maintains the shapefile index, which holds one index entry
for every shape in the shapefile document. Each index entry describes the start
position and length of the associated shape record in the SHP file.
.dbf: The DBF file contains additional nonspatial attributes of each shape. For
instance, in a shapefile containing Polygons representing the states of
America, the DBF file might contain the name of each state, its population, or
the name of its state capital.
.prj: The PRJ file gives details about the projection in which the coordinates of
the geometry data are represented, in the same format as used in the
well_known_text column of the sys.spatial_reference_systems table. When
importing a shapefile into SQL Server, this file contains the information that
is required to determine the correct spatial reference identifier (SRID).

110

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Shapefiles may contain coordinate data defined using either projected or geographic coordinate
reference systems. The following sections show an example of each of these, imported into a column of
the geometry and geography datatype, respectively.

Obtaining Sample Shapefile Data


For this example, we use a projected dataset of Polygons representing precinct outlines of the city of
Boston, Massachussetts. As do many large U.S. cities, the City of Boston maintains a website from
which you can download various spatial datasets related to the city.
To download the precinct dataset, go to the website at http://hubmaps1.cityofboston.gov/datahub/
(you may be asked to accept a license agreement). From the left-hand menu, select GIS Data
Downloads to display various layers, grouped by theme. Under the Political heading, click on the small
blue icon to the right of the Precincts layer, which will bring up a popup window. Click on the link in the
popup window to download the layer and metadata, as shown in Figure 5-3. Then having downloaded
the Precincts.zip file, unzip it to extract the set of files for the Precincts shapefile.

Figure 5-3. Downloading Boston Precinct data.

111

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Loading the Data


To import the shapefile into SQL Server using the default OGR2OGR options, open a command prompt
window, navigate to the folder in which you extracted the Precincts shapefile, and execute the
following command (you need to change the SQL Server connection string as appropriate).
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"precincts.shp"

Note For the purposes of clarity, I've separated the output format, destination, and source parameter values in
this code listing onto separate lines. However, when executing OGR2OGR you should not insert carriage returns
between parameters, but list them all on the same line with a space separating each.

After importing the dataset, you should find that three tables have been created in the specified
destination database (the ProSpatial database, in my case): precincts, geometry_columns, and
spatial_ref_sys.
The geometry_columns and spatial_ref_sys tables store metadata about any spatial data that has
been imported by OGR2OGR into a destination database. If, at some point in the future, you were to
use this same data as the source of another transformation (suppose you were to export the precincts
dataset from SQL Server to KML) then OGR2OGR would refer to these tables to provide additional
information used in the conversion. In essence, spatial_ref_sys is OGR2OGR's own version of SQL
Server's sys.spatial_reference_systems table. However, such information is not essential and can
always be re-created, so you can ignore these tables for now (or even delete them if you choose,
although be aware that they will be re-created the next time you import any data).
What's more interesting to us right now is the precincts table, which contains the data imported
from the precincts shapefile. The structure of the table is as follows.
[ogr_fid] [int] IDENTITY(1,1) NOT NULL,
[ogr_geometry] [geometry] NULL,
[objectid] [numeric](10, 0) NULL,
[prcnts_id] [float] NULL,
[id] [varchar](4) NULL,
[wdpct] [varchar](4) NULL,
[pct] [varchar](2) NULL,
[shape_area] [numeric](19, 11) NULL,
[shape_len] [numeric](19, 11) NULL
The polygon shapes representing the outline of each precinct have been imported into a geometry
column of the table named ogr_geometry. The attributes from the precincts.dbf file have been used to
populate additional columns in the table, containing various identifiers together with the length and
area of each precinct shape. To ensure uniqueness of each row in the table, OGR2OGR has
automatically added an IDENTITY column, ogr_fid, which contains a unique integer reference for each
record.
To test out the data loaded into the table, we can plot the ogr_geometry column on the Management
Studio Spatial Results Tab, as shown in Figure 5-4.

112

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Figure 5-4. The precincts shapefile plotted using the Management Studio Spatial Results tab.

Note After importing a dataset, you may find that OGR2OGR creates additional tables in your database:
geometry_columns and spatial_ref_sys. These tables are used by OGR2OGR to store additional metadata.

Assigning an Output SRID


The result displayed in Figure 5-4 seems pretty good, but there are still a few tweaks that you might
want to make. You could, for example, use the nln "NewTable" option to change the name of the table
into which the data is inserted, or specify select objectid,pct to copy only certain attribute fields
from the shapefile to columns in the destination table.
The most important change to be made, however, is to correct a problem that might not yet be
apparent but which can be demonstrated by executing the following code listing.
SELECT ogr_geometry.STSrid FROM precincts;
The STSrid property returns the spatial reference identifier of an instance, and the results of this
query show that OGR2OGR has loaded every record in the precincts table using a spatial reference
identifier of 32768. This value seems like a perfectly plausible SRID so it might not strike you as
particularly odd, but it's not the correct spatial reference identifier for this data. In fact, it's not a valid

113

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

SRID at all; it's the code that OGR2OGR uses when the destination spatial reference system is
undefined. (Perhaps a code of 99999 would have made this more obvious!)
To assign the correct SRID, we first need to examine the properties of the spatial reference system
in which the coordinates were originally supplied, which is described in the precincts.prj file that
accompanies the shapefile. The PRJ file is a simple text file, so you can load it up in any text editor to
show the contents as follows.

PROJCS[
"NAD_1983_StatePlane_Massachusetts_Mainland_FIPS_2001_Feet",
GEOGCS[
"GCS_North_American_1983",
DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]],
PROJECTION["Lambert_Conformal_Conic"],
PARAMETER["False_Easting",656166.6666666665],
PARAMETER["False_Northing",2460625.0],
PARAMETER["Central_Meridian",-71.5],
PARAMETER["Standard_Parallel_1",41.71666666666667],
PARAMETER["Standard_Parallel_2",42.68333333333333],
PARAMETER["Latitude_Of_Origin",41.0],
UNIT["Foot_US",0.3048006096012192]]

This preceding PRJ text should look familiar to you as the Well-Known Text representation of a
projected spatial reference system, which we examined back in Chapter 1. This particular system is the
state plane coordinate system for Massachussetts Mainland, which uses a Lambert Conformal Conic
projection of the North American 1983 Datum, centered on a central meridian at 71.5 degrees West.
Because this is a projected coordinate system, we can't look for it in SQL Server's
sys.spatial_reference_systems table. Instead, you can go to the EPSG registry website at
http://www.epsg-registry.org and search for spatial reference systems using the name
"Massachusetts Mainland". There will be a few matches returned, but the distinguishing properties of
the system we're looking for is that it's based on the NAD83 datum, with the unit of measure being US
Feet. This leads to only one possible spatial reference system: SRID 2249.
To correct the SRID of the records in the precincts table, you could execute a T-SQL UPDATE query
after importing the data, as follows.
UPDATE precincts
SET ogr_geometry.STSrid = 2249;

A better approach, however, would be to make OGR2OGR assign the correct SRID at the point the
data is inserted. This can be done by specifying the -a_srs parameter, together with the EPSG code of
the spatial reference system to be used. The following example recreates the precincts table using the
a_srs parameter to populate every record with SRID 2249. Note that because I want to replace the
existing precincts table, I've also included the overwrite parameter:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"precincts.shp"
-a_srs "EPSG:2249"
-overwrite

114

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Tip You can find the full list of OGR2OGR options by calling ogr2ogr /? or by looking at
http://www.gdal.org/ogr2ogr.html.

Re-executing the previous code listing to retrieve the SRID of the values in the ogr_geometry
column now shows that all the data have been imported with the correct SRID:
SELECT DISTINCT ogr_geometry.STSrid FROM precincts;

2249

Another Example: Specifying Layer Creation Options


To provide a further example of importing data from shapefile format into SQL Server, we will use data
from the U.S. Census Bureau representing the Zip Code Tabulation Areas (ZCTAs) of the state of
California. These ZCTAs were defined by the U.S. Census Bureau during the U.S. 2010 census, and are
approximately equivalent to the delivery area for a five-digit ZIP code as used by the U.S. Postal
Service. You can download the ESRI shapefile of the Californian ZCTA areas used in this example
from the following URL:
http://www2.census.gov/geo/tiger/TIGER2010/ZCTA5/2010/tl_2010_06_zcta510.zip.

Tip You can download many other interesting shapefile datasets from the U.S. Census by using the Web
download interface at http://www.census.gov/cgi-bin/geo/shapefiles2010/main.

Download this ZIP file and extract its contents. You should see the familiar structure of a shapefile
dataset: the .shp file that defines the coordinate data, the .dbf file containing the various attributes attached
to each shape, the .shx index file, and the .prj file describing the projection parameters. Let's start by
looking at the tl_2010_06_zcta510.prj file, the content of which is shown below:

GEOGCS[
"GCS_North_American_1983",
DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]
]

Notice that, because the Well-Known Text definition in the tl_2010_06_zcta510.prj file begins with
the keyword GEOGCS, we know that the coordinates of this shapefile are defined using a geographic
coordinate system, and should be imported into a column of the geography datatype. Now, we just need to
determine the appropriate SRID for the data in this column.
To find out the SRID, you could search for the parameters listed in the tl_2010_06_zcta510.prj file on
the EPSG registry website, as we did to find the SRID of the Massachussetts Mainland projection in the

115

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

preceding example. However, even though this would allow you to find the correct SRID, it wouldnt
necessarily prove that this data could be imported into SQL Server. Remember that SQL Server only
supports geographic coordinates defined using one of the spatial reference systems listed in the
sys.spatial_reference_systems table. Therefore, instead of looking up the projection on a website, we'll
search for a record in the sys.spatial_reference_systems table that matches the parameters given in the
tl_2010_06_zcta510.prj file. That way, we can look up the corresponding SRID and check that it's a
supported spatial reference system at the same time.
The .prj file states that the ZCTA coordinate data is defined using a geographic coordinate system
based on the NAD 83 datum. We can search for the correct identifier for this spatial reference system
in the sys.spatial_reference_systems table using the query:
SELECT
spatial_reference_id
FROM
sys.spatial_reference_systems
WHERE
well_known_text LIKE 'GEOGCS%"NAD83"%';
The single result returned is as follows.

spatial_reference_id
4269

So, when importing the data contained in the ZCTA shapefile, we should use the a_srs parameter
to assign SRID 4269. Because SRID 4269 is a geographic coordinate reference system, we also need to tell
OGR2OGR to import the data into a column using the geography datatype rather than the geometry datatype
as used in the last example.
OGR2OGR settings that are specific to the destination data format, such as the choice between using the
geometry/geography datatype, are known as layer creation options. You can specify one or more layer
creation options using the lco flag. In the following code listing, two layer creation options are given.
GEOM_TYPE="geography" specifies that the shape data should be inserted into a
column of the geography datatype (the alternative, and default value, is
GEOM_TYPE="geometry").
GEOM_NAME="geog4269" sets the name of the geography column to geog4269 (I've
chosen this column name following my convention of concatenating the datatype
and SRID of the data it contains).
As in the preceding example, I use the a_srs parameter to ensure that the created geometries are
assigned the correct SRID of EPSG:4269. I also use the overwrite option to specify that the requested
destination table should be replaced should it already exist.
I also include two other options:
-nln "CaliforniaZCTA" specifies the name of the destination table in SQL
Server. Without this switch, the table would have been named
tl_2010_06_zcta510 to match the input shapefile, but that's rather difficult to
remember!
-progress is a switch to tell OGR2OGR to display a simple progress bar in the
console window as the data is uploaded. This is particularly useful when
loading larger datasets to give an indication of how far along the import
procedure has progressed (and how much farther it has to go).

116

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Heres the full code listing to load the Californian ZCTA shapefile:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"tl_2010_06_zcta510.shp"
-a_srs "EPSG:4269"
-overwrite
-lco "GEOM_TYPE=geography"
-lco "GEOM_NAME=geog4269"
-nln "CaliforniaZCTA"
-progress
As the records are imported from the shapefile into SQL Server, OGR2OGR will update a
percentage progress bar as illustrated in Figure 5-5.

Figure 5-5. Displaying progress of OGR2OGR import from ESRI shapefile.

Once the progress bar has reached 100%, the import is complete and you can select the data from the
CaliforniaZCTA table, the (partial) content of which is displayed in the SSMS Spatial Results tab as shown in
Figure 5-6.

117

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Figure 5-6. Displaying the ZCTA areas of California.

Importing MapInfo Data


MapInfo Professional is a GIS system for Windows platforms often used in public sector organizations.
MapInfo datasets are saved in an ASCII text format, either using the native .tab file extension, or
saved as a MapInfo Interchange File (MIF), which typically comes as a pair of files with .mif and .mid
extensions. OGR2OGR can read and write MapInfo datafiles in either native (.tab) or interchange
(.mid/.mif) format.

Obtaining Sample MapInfo Data


To demonstrate the process of importing MapInfo data into SQL Server, we will use a dataset from
Geoscience Australia, a department of the Australian government responsible for various geoscientifc
and environmental research and information management.
Starting from the homepage of the Geoscience Australia website at http://www.ga.gov.au/, use the
search box to search for "river basins data". When the search results are displayed, clicking the top hit
should take you to a page of metadata titled "Australia's River Basins 1997". This page tells you various
useful information about this dataset and, around halfway down the page (under the heading "Access"),
contains a link labeled "Free Data Download".

Note At the time of writing, the direct URL link for the river basins dataset is
https://www.ga.gov.au/products/servlet/controller?event=FILE_SELECTION&catno=42343.

118

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

On the File Selection page, click to select the MapInfo Interchange Format (MIF) checkbox in the
Available Files column as shown in Figure 5-7, and then click Continue to File Download. Finally, click
on the dataset title hyperlink to download the file.

Figure 5-7. Downloading a MapInfo dataset from the Geoscience Australia website.

The file download consists of a single ZIP archive containing several files. Extract these files to
find three datasets containing Point (rbasin_point), LineString (rbasin_chain), and Polygon
(rbasin_polygon) features in MapInfo Interchange Format, each having an associated .mid and .mif
file. There is also a copyright notice and a user guide to accompany the dataset.

Determining the SRID


Following the same steps as with the ESRI shapefile examples, before loading the data into SQL Server,
we first need to determine what SRID is associated with the coordinates. But, unlike with the shapefile
data sources, we don't have a .prj file to refer to and, although the header of a MapInfo file can contain
details of the projection used, the river basin files obtained from Geoscience Australia don't have this
information.

119

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Fortunately, a quick glance through the user guide that accompanies the download reveals a
passage in Section 2.3, Coordinate system, as follows.

Australia's River Basins 1997 data is available in geographical coordinates (latitude


and longitude) in decimal degrees using the Australian Geodetic Datum (AGD66).

Perfect. Now, as we did with the U.S. Census data, we just need to take a look in the
sys.spatial_reference_systems table to find the ID for a geographic coordinate system based on the
AGD66 datum:

SELECT * FROM sys.spatial_reference_systems


WHERE well_known_text LIKE '%AGD66%';

There's a single matching result returned, with SRID 4202.

Loading the Data


That's all the information we need, so now we can use OGR2OGR to import the MapInfo data from the
RBasin_Polygon dataset using the following command,
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"RBasin_Polygon.mid"
-a_srs "EPSG:4202"
-lco "GEOM_TYPE=geography"
-lco "GEOM_NAME=geog4202"
Note that this code listing follows exactly the same pattern as used to load data from ESRI
shapefiles; the options set in this case specify that the data should be loaded into a new table in the
tempdb database of the SQL Server instance running on localhost, using SRID 4202, with a geography
column named geog4202. The only difference from the last example, in fact, is that instead of providing
the name of the .shp shapefile, you provide the name of the MapInfo .mid file instead. There are no
additional options to set regarding the input file format; this is determined dynamically from the
structure of the input file itself.
Once the import has finished, select the data in the RBasin_Polygon table in SQL Server and you
should see the familiar shape of Australia, as shown in Figure 5-8.

120

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Figure 5-8. Australian River Basins imported from MapInfo into SQL Server.

Reprojecting Data During Import


In all the preceding examples we've used the a_srs option to assign the SRID to the data in SQL
Server to be the same as the spatial reference system in which the coordinates of the source dataset
were defined. It's worth noting that OGR2OGR can also reproject data from the source coordinate
system into another spatial reference system as it is imported into the destination. To do so, instead
of using the a_srs option to assign a spatial reference system to the output, you use s_srs to specify
the EPSG code of the source coordinate system and t_srs to specify the coordinate system into which
it should be transformed.
For example, the following code listing demonstrates how to import the Boston precincts shapefile
dataset used previously but, this time, reprojects the data from SRID 2249 to SRID 4326 as it is loaded.
This involves conversion from a projected coordinate system to a geographic coordinate system,
therefore I've also used the GEOM_TYPE layer creation option to ensure that a geography column is used
in the destination table, which I've called precincts_reprojected:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"precincts.shp"
-s_srs "EPSG:2249"
-t_srs "EPSG:4326"
-overwrite
-lco "GEOM_TYPE=geography"
-lco "GEOM_NAME=geog4326"
-nln "precincts_reprojected"

121

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

The resulting Polygons, now contained in the geography column geog4326, are illustrated in Figure 5-9.

Figure 5-9. Massachussetts precinct polygons reprojected to SRID 4326 in the geography datatype.

Reprojection and transformation between different coordinate systems are covered in detail in
Chapter 8.

Exporting Spatial Data from SQL Server


Although the main focus of this chapter is concerned with getting data from different sources into SQL
Server, it is worth noting that OGR2OGR can just as easily be used to export data from SQL Server into
another supported format.
To export data from SQL Server with OGR2OGR you follow the same basic pattern as when
importing, except that you provide the SQL Server connection string as the {Source} for the
transformation, and set the filename of the target file as the {Destination}. You also need to set the f
option to specify the format in which the output file should be written.
It is often the case that you don't want to export a complete single table, but possibly a subset of a
table, or the results of a query that combines fields from multiple tables. To do this, you can specify
the sql option, supplying a SQL query to be executed that will return the relevant source data from
the database.
To demonstrate, in the following sections we take the data from the precincts_reprojected table
and export them to a KML file.

Keyhole Markup Language


Keyhole Markup Language, more commonly known as KML, is an XML-based language originally
developed by Keyhole Inc. for use in its EarthViewer application. In 2004, Google acquired Keyhole,
together with EarthViewer, which it used as the foundation on which to develop its popular Google Earth

122

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

platform (http://earth.google.com). Although the KML format has undergone several revisions since
then (at the time of writing, the latest version is KML 2.2), it continues to be the native format for
storing spatial information used in Google Earth.
In 2008, KML was adopted by the Open Geospatial Consortium as a standard format for spatial
information, and you can now find the latest implementation of the KML specification at the OGC
website, at http://www.opengeospatial.org/standards/kml/.
KML has always been used within the Google Earth community to share user-created spatial data,
however, the popularity and accessibility of the Google Earth platform among the wider Internet com-
munity means that KML is becoming increasingly used for educational and research purposes, as well as
in critical applications such as emergency and disaster services. Coupled with its adoption as a standard
by the OGC, KML is becoming an increasingly important format for the interchange of spatial data.

Exporting from SQL Server to KML with OGR2OGR


To use OGR2OGR to write spatial data to a KML file, you set the -f format flag as "KML", and set the
name of the kml file as the {Destination}. Note that KML files can only contain coordinates defined
using SRID 4326, so if your source data uses any other spatial reference system, you will also have to use
the t_srs option to transform the coordinates to SRID 4326 in the process. In this example, we export
data from the precincts_reprojected table, in which the geog4326 column already contains coordinates
in the correct spatial reference system (because they were transformed into SRID 4326 during import).
We use the sql option to retrieve only the ID and shape of each precinct. The shape information
itself is retrieved from the geog4326 column in Well-Known Binary format using the STAsBinary()
method. Although OGR2OGR can read SQL Server's native binary format, I find it sometimes leads to
problems, and using the industry-standard WKB format is a more reliable option.
Here's the code listing to export the relevant data from the precincts_reprojected table to a new
precincts.kml file in the c:\spatial directory:
ogr2ogr
-f "KML"
"C:\spatial\precincts.kml"
"MSSQL:server=localhost;database=tempdb;trusted_connection=yes;"
-sql "SELECT prcnts_id, geog4326.STAsBinary() FROM precincts_reprojected"
-overwrite
The resulting KML file begins as shown below. Notice how the KML structure of a Polygon, in
which a space-separated list of coordinates is listed within a <coordinates> element contained within
<outerBoundaryIs>, closely resembles the GML format in which space-separated Polygon coordinates
are listed within the <posList> element contained within an <exterior> element:
<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Folder><name>SELECT</name>
<Schema name="SELECT" id="SELECT">
<SimpleField name="Name" type="string"></SimpleField>
<SimpleField name="Description" type="string"></SimpleField>
<SimpleField name="prcnts_id" type="float"></SimpleField>
</Schema>
<Placemark>
<Style>
<LineStyle><color>ff0000ff</color></LineStyle>
<PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData>
<SchemaData schemaUrl="#SELECT">
<SimpleData name="prcnts_id">1</SimpleData>

123

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-71.008036474492485,42.387093194669127
-71.010502257322116,42.389034720949425 -71.011425746332051,42.38976473048055
-71.011931948239607,42.390174099511839 -71.01262223043183,42.390907499155361
-71.013103274784129,42.391509391490196
-71.008036474492485,42.387093194669127
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark>

</Placemark>
</Folder></Document></kml>
When viewed in Google Earth, this KML file is displayed as shown in Figure 5-10.

Figure 5-10. Viewing precincts of Boston exported from SQL Server to a KML file in Google Earth.

124

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Spatial ETL Using SQL Server Integration Services


The techniques discussed thus far in this chapter demonstrate some simple methods that allow you to
import spatial data in various formats into SQL Server 2012, but they cannot readily be automated or
used as part of a robust extraction, transformation, and load (ETL) process.
Unfortunately, SQL Server Integration Services, Microsoft's own ETL tool for SQL Server, does not
directly support transformations between spatial data formats such as the ESRI shapefile or MapInfo
file format. However, there are still ways to use spatial data in SSIS, as is demonstrated in this section.

Importing Point Data Using a Custom Script Component


SQL Server does not provide any specific functionality for dealing with spatial data. However, because
geography and geometry are CLR datatypes based on the .Net framework, you can import spatial data in
an SSIS dataflow task by using a custom script component that references the
Microsoft.SqlServer.Types.dll library and makes use of the SqlGeography and SqlGeometry classes.
To demonstrate this approach, we will create an SSIS package that loads the same eqs7day-M1
dataset used right at the beginning of this chapter, but instead of running a T-SQL script to populate a
geography Point column representing the epicenter of each record after loading the table, we'll create
this column as part of the SSIS load process itself.

Note SQL Server Integration Services is not available as part of SQL Server Express. To follow this example you
must be using SQL Server developer, standard or enterprise edition.

Creating a New SSIS Project


To begin, create a new SSIS project by following these steps.
1. From the main menu of Business Intelligence Development Studio (or Visual
Studio), select File New Project...
2. In the New Project dialog box that appears, highlight Integration Services
Project from the Business Intelligence Project types templates.
3. Choose a Name and Location for the new SSIS project, and then click OK.

Creating the Text File Connection


Once the new project has been created, define a connection to the eqs7day-M1.txt source file as
follows.
1. From the Visual Studio main menu, click on SSIS New Connection...
2. In the Add SSIS Connection Manager dialog box that appears, select FLATFILE
Connection manager for flat files, and then click the Add... button.
3. The Flat File Connection Manager Editor dialog will appear. Notice that this
is exactly the same dialog as you used to set the file source for the
Import/Export Wizard at the beginning of this chapter, and you need to choose

125

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

exactly the same settings as before. Start by giving a name to this connection.
I use "Earthquakes Text File".
4. Click the Browse button, and navigate to the eqs7day-M1.txt text file.
Highlight the file and click open.
5. Change the Text qualifier to be a double quote character (").
6. Click to enable the checkbox next to Column names in the first data row.
7. You now need to configure the datatype of each column in the textfile. Select
the Advanced option from the left-hand pane of the Flat File Connection
Manager Editor. As before, click each column in turn and, from the properties
pane on the right-hand side, amend the values of the Datatype and
OutputColumnWidth fields to match the values shown in Table 5-2.
8. Once you have made the appropriate changes as shown in Figure 5-11, click
OK to close the Flat File Connection Manager Editor.

Figure 5-11. SSIS Flat File Connection Manager to the eqs7day-M1.txt file.

The new connection will appear in the Connection Managers pane at the bottom of the screen.

126

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Creating the SQL Server Connection


Next you need to add the details of the SQL Server database that will be the destination into which the
data are inserted. We will use an OLE DB connection to the database, which you can create by
following these steps.
1. From the Visual Studio main menu, click on SSIS New Connection...
2. In the Add SSIS Connection Manager dialog box that appears, select OLEDB
Connection manager for OLE DB connections, and then click the Add...
button.
3. On the next screen, you will see a list of all available data connections. Click
New...
4. From the Provider dropdown at the top of the form, ensure that SQL Server
Native Client 11.0 is selected.
5. Enter the relevant details of the SQL Server instance to which you wish to
upload the data, and click OK.
The connection manager settings that I use for connecting to the SQL Server 2012 instance
running on my local machine are shown in Figure 5-12.

Figure 5-12. Connection Manager settings for SQL Server 2012.

127

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

We've now set up the relevant connections for our project, so it's time to add the task that will
transform the data from our text file into SQL Server.

Adding a New Dataflow Task and Specifying the Data Source


The project will contain a single dataflow task that transforms data from the source text file into the
destination SQL Server table. From the Toolbox on the left-hand side of the screen, drag a new Data
Flow Task onto the Control Flow designer surface in the center of the screen, and then double-click on
the Data Flow Task to edit it.
We will add a data source to the dataflow that uses the connection to our eqs7day-M1.txt file:
1. From the toolbox, under the Other Sources heading, select Flat File Source,
and drag it into the designer surface.
2. Double-click the Flat File Source to bring up the Flat File Source Editor.
3. Select the Earthquakes Text File connection from the dropdown box, and click
OK.

Creating the Script Component


Next we need to add a script component that will use the lat and lon columns from the text file source to
create a new geography column containing a Point instance. Because this script will act upon input
columns and create new output columns, we must use a transformation script.
1. Open the toolbox once again and, under the Common heading, select the
Script Component item and drag it onto the dataflow workspace.
2. In the Select Script Component Type dialog window that appears, select
Transformation, and click OK.
3. Now connect the flat file source as an input to the script component. To do
this, first click once on the Flat File Source item. Then click and drag the
green arrow that appears underneath to connect it to the Script Component
item.
To configure the script, double-click on the Script Component item that you just added to the Data Flow
window. This will bring up the Script Transformation Editor dialog window.
1. Click on the Input Columns tab on the left-hand side. The Available Input
Columns table will show all of the columns available from our source text file.
For this transformation, we need only the Lat and Lon columns, so click once
to select the check boxes next to each of these items.
2. Next, click on the Inputs and Outputs item on the left-hand side.
3. Click once on the small [+] icon next to Output 0, and then click on the Output
Columns item that appears underneath it.
4. Click on the Add Column button to create a new output column.
5. A new output column will be created. By default, this is named Column; while
the name of the column is still highlighted, let's change it to something more
informative, such as Location.
6. In the properties window on the right-hand side, set the DataType of the new
Location column to image [DT_IMAGE].

128

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

The preceding steps are illustrated in Figure 5-13.

Figure 5-13. Configuring the Script Transformation Component.

Note Why do we specify the image [DT_IMAGE] datatype for a column that will contain geography data? SSIS
datatypes are not specific to SQL Server, because SSIS can connect to many different data sources. The image
[DT_IMAGE] datatype, named because it can be used to store image data in databases, can be used to represent
any binary value up to a maximum size of 2^31-1 (2,147,483,647) bytes, including geography or geometry data.

129

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

To add the code that will be executed by the script task, click on the script tab on the left-hand side of
the Script Editor Transformation dialog box. You can create SSIS custom scripts using either Microsoft
Visual C# 2010 or Microsoft Visual Basic 2010. C# is the default, and that's what I use in this example. (If
you prefer to write your own script in Visual Basic, you can do so by changing the ScriptLanguage
property in the right-hand pane.)
Click the Edit Script button. A new Visual Studio editor window will appear containing the
skeleton template for a C# script component. We edit this script to create a geography Point item to
populate the Location column as part of the dataflow.
Before we can create our new script, we first need to add the necessary references to the .NET
libraries that contain the methods we want to use.
1. From the main menu, select Project Add Reference
2. Scroll down the list of components under the .NET tab, and highlight the
Microsoft.SqlServer.Types component (or Browse to the Program Files
(x86)\Microsoft SQL Server\110\SDK\Assemblies directory and highlight it
from there).
3. Click OK. Microsoft.SQLServer.Types will appear in the References list in the
Solution Explorer pane.
Having included a reference to Microsoft.SQLServer.Types in our project, we also need to include
the corresponding namespace in our code so that we can easily reference the methods included in the
assembly. To do so, add this line to the list of references at the top of main.cs, on the line immediately
following using Microsoft.SqlServer.Dts.Runtime.Wrapper:
using Microsoft.SqlServer.Types;
We also use the BinaryWriter() and MemoryStream() methods from the System.IO namespace, so
include a reference to that namespace by adding the following line,
using System.IO;
The main functionality of our script is contained within the ProcessInputRow() method. This
method acts upon the data in each row of the source file as they pass through the script component.
There is already an empty template for ProcessInputRow() method in the script, as follows.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
/*
Add your code here
*/
}
Edit this to be as follows.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Instantiate a new geography object
SqlGeography point = new SqlGeography();

// Use the Point() method of the geography datatype to create a point from the lat and lon
// values of this row
point = SqlGeography.Point(Row.Lat, Row.Lon, 4326);

// Instantiate a new memorystream


MemoryStream ms = new MemoryStream();

130

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

// Instantiate a new binarywriter based on the memorystream


BinaryWriter bw = new BinaryWriter(ms);

// Write the result to the binarywriter


point.Write(bw);

// Insert the resulting object into the Location row


Row.Location.AddBlobData(ms.ToArray());
}
}
We are not performing any postexecute() or preexecute() methods, so you can remove these
empty methods from the file if you wish. When you have made these changes, the resulting main.cs file
should look like that shown in the following code listing.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.SqlServer.Types;
using System.IO;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
SqlGeography point = new SqlGeography();
point = SqlGeography.Point(Row.Lat, Row.Lon, 4326);
MemoryStream ms = new MemoryStream();
BinaryWriter bw = new BinaryWriter(ms);
point.Write(bw);
Row.Location.AddBlobData(ms.ToArray());
}
}
Save the script (Ctrl + Shift + S) and then select File Exit to close the script window and return to
the Script Transformation Editor dialog window. Now Click OK to exit the Script Transformation Editor
and return to the Data Flow tab of the main SSIS project.

131

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Specifying the Destination


Now that we have added the source and the transformation of our dataflow task, we need to set the
destination, the SQL Server table into which the task will insert the data.
1. From the toolbox, under Other Destinations, select and drag a new OLE DB
Destination to the Data Flow workspace.
2. Click once on the Script Component item in the workspace, and drag the
green output connector from the Script Component onto the new OLE DB
destination item.
3. Double-click the OLE DB destination item to open the OLE DB Destination
Editor dialog window.
4. On the Connection Manager tab, ensure that the OLE DB Connection
manager dropdown shows the SQL Server connection that you added to the
project earlier. To distinguish this set of data from the version loaded earlier
through the SQL Server Import/Export Wizard, wewill create a new table for
it, so click the New button next to the Name of the table or the view.
SSIS will generate a CREATE TABLE script for you based on the datatypes of the
columns included in the dataflow; the only modification you need to make is
to name the table and change the type of the Location column from IMAGE to
geography. The modified script I used is as follows.
CREATE TABLE [eqs7dayM1_SSIS] (
[Src] CHAR(2),
[Eqid] VARCHAR(8),
[Version] CHAR(1),
[DateTime] VARCHAR(50),
[Lat] REAL,
[Long] REAL,
[Magnitude] REAL,
[Depth] REAL,
[NST] SMALLINT,
[Region] VARCHAR(255),
[Location] GEOGRAPHY
)
Then click OK.
5. Click "Mapping" from the left-hand pane of the OLE DB Connection Manager
window. You should see that all of the columns from the original text file
together with the new Location column that is created in the script
component are automatically mapped to the appropriate columns in the
destination table, as shown in Figure 5-14. Click OK.

132

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Figure 5-14. Columns from the source text file and the Location column created by the script component
mapped to columns in the SQL Server destination.

The SSIS project is now finished and ready to run. The complete dataflow task is illustrated in
Figure 5-15.

133

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Figure 5-15. The completed SSIS Spatial Data Flow Task.

Running the SSIS package


We have now added all the components of the dataflow, so press F5 to run the project. Once the project
has executed, you can go back to SQL Server Management Studio and run the following T-SQL
statement in a query window.
SELECT
Eqid,
Region,
Location.STAsText() AS WKT
FROM
dbo.Eqs7dayM1_SSIS;
The results show the rows of data successfully imported from the Eqs7day-M1.txt text file, together
with a new column, Location, containing the geography Point created by the SSIS script component, as
follows.

134

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Eqid Region WKT


51204068 Northern California POINT (-122.8238 38.809)
10330433 Southern California POINT (-116.1075 33.7415)
00048942 Central Alaska POINT (-150.3075 63.0547)
2008taaw Greece POINT (21.4639 38.0295)

This end result is identical to that obtained from importing the text file using the SQL Server
Import/Export Wizard and then manually adding and populating the geography column with an UPDATE
query. The difference is that, in this case, the derived column was created by the script component in
an SSIS package, which can be easily repeated as part of an automated load task.

Using Third-Party Source Components


The previous example was relatively simple because it sought only to create Point geometries from a
pair of numeric coordinate columns in the source dataset. To extend this example to create more
complex geometries, or to use SSIS to load data from sources such as ESRI shapefiles, we need a more
flexible data source than a simple text file.

CodePlex SSIS Shapefile Source


There's a project on the CodePlex open source repository that aims to provide an SSIS data source for
loading Points, LineStrings, or Polygons from ESRI shapefiles. You can find the project page at
http://shapefilesource.codeplex.com/.
At the time of writing, this project is still very young, and only in beta release. It's also a little bit
fiddly to set up; instead of being able to directly read shapefile data from the data source into a SQL
Server destination, you still need to use an intermediate custom script transformation. The custom
script must take the DT_IMAGE produced by the shapefile reader and turn it into a WKT string, which is
then used as an input to the STGeomFromText() method on the destination server. Despite its current
shortcomings, the project shows promise and is probably worth keeping an eye on for the future.

Safe FME
If you want to import an ESRI shapefile, MapInfo, or other spatial dataset into SQL Server in a
production environment, another option is to use a commercial third-party component. The most
popular tool for this purpose is the Feature Manipulation Engine (FME) from Safe Software
(http://www.safe.com).
FME can convert data between a huge range of spatial formats (250+), and it can also transform
and manipulate that data in various ways as part of a workflow. In addition to providing a stand-alone
desktop application and server component, FME also provides a set of extensions that integrate with
SQL Server Integration Services, providing additional source readers and destination writers for
working with different formats of spatial data as part of an ETL process.
Figure 5-16 illustrates a simple FME workflow that takes geometry and attribute data from a
shapefile and loads it directly into SQL Server.

135

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Figure 5-16. Loading data from a shapefile to SQL Server using Safe FME.

It is not my intention to describe Safe FME in any further detail here, but I do recommend that you
check it out if you have need to load a large amount of spatial data from many different formats into
SQL Server.

Summary
In this chapter, you learned about a variety of data formats in which existing spatial data may be
provided, and how you can import those data into SQL Server 2012. Specifically, this chapter covered
the following.
There are many alternative file formats in which spatial information is
commonly stored and shared, including tabular geographic information, the
ESRI shapefile format, MapInfo files, KML, and many more.
There are a number of sources from which you can obtain freely available
spatial data over the Internet. The data obtained from these sources range in quality
and in coverage. If you are downloading spatial data for use in a critical
application, be sure to check the accuracy of that data first!

136

www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA

Simple spatial information provided in tabular gazetteer format can be imported


using the Import and Export Wizard. You can then create a computed column or
use the T-SQL UPDATE statement in conjunction with the Point() static method to
populate a geography or geometry column from the coordinate values of each row
of data in the table.
To construct an item of spatial data from tabular information containing a z-
coordinate, you can manually construct the WKT representation of a geometry
based on each coordinate value, and then pass that representation to the
relevant WKT static method, such as STPointFromText().
There are a number of other tools capable of converting and importing spatial
data that are compatible with SQL Server. OGR2OGR is one such tool, which can
read from and write to a variety of formats, and you saw examples of how to load
ESRI shapefiles and MapInfo interchange files into SQL Server, and how to
export data to KML format.
SQL Server Integration Services does not directly support spatial data formats.
To perform ETL of spatial data, you can write a custom script component, or use a
commercial add-in such as FME from Safe Software.

137

www.it-ebooks.info
CHAPTER 6

Geocoding

Even though the dedicated spatial datatypes, geometry and geography, are a relatively recent addition to
SQL Server, almost every existing SQL Server database already contains some form of spatial
information, that is, data that describes the location of some feature or other. This spatial information
might not be of the sort we have considered so far in this book, being described using coordinates from
a spatial reference system, but might instead be the addresses of customers or suppliers, postal codes,
delivery routes, or the names of cities or regions for which a sales manager is responsible. Wouldnt it
be useful if you could conduct spatial analysis based on this sort of common, unstructured spatial
information? That is exactly what geocoding enables you to do.
The process of geocoding involves taking a text-based description of a location or place, such as a
street address, the name of a landmark, or a postal code, and deriving a structured spatial representation
of that feature. In practice, the representation returned by most geocoding methods is a single pair of
coordinates representing the approximate center of the supplied address, but they may also return a
bounding box representing the extent of a feature, or even a Polygon representing its precise shape.
To demonstrate the process of geocoding, let's consider an example. The address of the White
House, the official residence of the president of the United States, is 1600 Pennsylvania Avenue NW,
Washington DC, 20500. If you were to geocode this address, you might obtain the coordinates 38.8980
degrees latitude, 77.0365 degrees longitude, corresponding to a Point located in the WGS84 spatial
reference system. The geocoding process is illustrated in Figure 6-1.

Figure 6-1. Geocoding an address into latitude and longitude coordinates

There are a number of different ways to provide geocoding functionality; some geocoding tools are
desktop-based applications, whereas others are services that you access over the Web. In this chapter, I'll
show you how to create a .NET geocoding function that calls into the Bing Maps REST geocoding service,
and how to integrate this function into SQL Server. I'll then discuss some considerations for using
geocoding in a batch or asynchronous environment, and introduce the concept of reverse-geocoding.

The Bing Maps Geocoding Service


Microsoft provides a number of online spatial services under the "Bing Maps" brand. One of these, the
Bing Maps REST Locations API, exposes a representational state transfer (REST) interface into a

139

www.it-ebooks.info
CHAPTER 6 GEOCODING

geocoding service. You can read more details about the Bing Maps REST Locations API at
http://msdn.microsoft.com/en-us/library/ff701715.aspx.
Because the Locations API is free for most applications, and simple to use, I use this service to
provide geocoding functionality in this chapter. However, you could apply a similar approach to many
other web-based geocoding services.

Obtaining a Bing Maps Key


In order to use the Bing Maps geocoding service, you must provide authenticatation in the form of an
alphanumeric key. If you want to follow any of the code samples in this chapter, you'll therefore need
to sign up for a key, which you can do by following the instructions at
https://www.bingmapsportal.com/.
Registration is free and quick. You'll first need to log in to the Bing Maps portal site with a
Windows Live ID and create a new account by filling in the form shown in Figure 6-2.

Figure 6-2. Creating a Bing Maps account

140

www.it-ebooks.info
CHAPTER 6 GEOCODING

Having entered the required registration details, select the menu option to "Create or view keys".
You'll be prompted to enter a name for your application, as well as the URL at which the application
will be hosted. You'll also need to select the type of application for which the key will be used. For the
purposes of this chapter, you can request a "Developer" application key, with an application URL
pointing to a localhost address, such as http://127.0.0.1, as shown in Figure 6-3.

Figure 6-3. Creating a Bing Maps key

Keys are used to track usage of the Bing Maps service, and are also used for recording certain
billable transactions, as described at http://msdn.microsoft.com/en-us/library/ff859477.aspx.
However, you are only billed for production applications for which you have taken out an enterprise
license agreement. You won't incur any costs for using Bing Maps in a development environment, or
for any production applications that are not covered by an enterprise license. For the rest of this
chapter, I'll assume that you've signed up for a developer key, which will be a 64-character
alphanumeric string such as this:

At7aATG4p6LjyQha9TFGduTh15_i5N0t4R341k3y!Uvs3VIE2QhsOSRx_tFoKURkD5vOeRs

In any of the code samples in this chapter where you see the text ENTERYOURBINGMAPSKEY, be sure to
substitute your key in the appropriate place to be able to access the Bing Maps service.

Calling the Bing Maps REST Locations API


You access the Bing Maps Locations API by making an HTTP request to the URL at
http://dev.virtualearth.net/REST/v1/Locations (using a web browser, or via the .NET HttpWebRequest

141

www.it-ebooks.info
CHAPTER 6 GEOCODING

class, for example). This request is based on a template that contains a number of parameters for the
elements of the address to be geocoded, as in the following example (be sure to enter your Bing Maps
key where indicated):
http://dev.virtualearth.net/REST/v1/Locations?countryRegion=UK&adminDistrict=Norfolk
&locality=Norwich&postalCode=NR2 4TE&addressLine=27 Heigham Street&o=xml&key=
ENTERYOURBINGMAPSKEY
In this example, the service is called to geocode an address at 27 Heigham Street, Norwich,
Norfolk, NR2 4TE, UK. The parameter o=xml is provided to specify that the result should be returned in
XML format (the default is to return results as JSON). The results returned by the service when
geocoding the preceding address with a valid Bing Maps key are as follows:
<Response>
<Copyright>Copyright 2011 Microsoft and its suppliers. All rights reserved. This API
cannot be accessed and the content and any results may not be used, reproduced or
transmitted in any manner without express written permission from Microsoft
Corporation.</Copyright>
<BrandLogoUri>http://dev.virtualearth.net/Branding/logo_powered_by.png</BrandLogoUri>
<StatusCode>200</StatusCode>
<StatusDescription>OK</StatusDescription>
<AuthenticationResultCode>ValidCredentials</AuthenticationResultCode>
<TraceId>ece6495ca47a416f8dc4dad13b64f6a2</TraceId>
<ResourceSets>
<ResourceSet>
<EstimatedTotal>1</EstimatedTotal>
<Resources>
<Location>
<Name>NR2 4TE, Norwich, Norfolk, United Kingdom</Name>
<Point><Latitude>52.634046</Latitude><Longitude>1.286097</Longitude></Point>
<BoundingBox>
<SouthLatitude>52.630183282429321</SouthLatitude>
<WestLongitude>1.2776115753233459</WestLongitude>
<NorthLatitude>52.637908717570674</NorthLatitude>
<EastLongitude>1.2945824246766542</EastLongitude>
</BoundingBox>
<EntityType>Postcode1</EntityType>
<Address>
<AdminDistrict>England</AdminDistrict>
<AdminDistrict2>Norfolk</AdminDistrict2>
<CountryRegion>United Kingdom</CountryRegion>
<FormattedAddress>NR2 4TE, Norwich, Norfolk, United Kingdom</FormattedAddress>
<Locality>Norwich</Locality>
<PostalCode>NR2 4TE</PostalCode>
</Address>
<Confidence>High</Confidence>
</Location>
</Resources>
</ResourceSet>
</ResourceSets>
</Response>
Most geocoding algorithms are approximate. This is because they rely on parsing a number of
free-text user-supplied fields, and are therefore unlikely to find an exact match for the address

142

www.it-ebooks.info
CHAPTER 6 GEOCODING

entered. Instead, they tend to separate out components of an address and find the best match (or
matches) for the elements given.
In the example above, the value of the <EntityType> element, Postcode1, signifies that, in this case,
the result is based on a match of the supplied postcode value only rather than the full street address.
This might occur in situations when the full street address could not be located (because it was
mistyped, for example). The result contains both an approximate center point for the geocode postcode
match, provided in the <Point> element of the <Location>, as well as a <BoundingBox> representing the
extent of this area.

Creating a .NET Geocoding Assembly


To access the Bing Maps Locations API from within SQL Server, we will use Visual Studio to build a
custom .NET assembly that will be executed by the SQLCLR. This is another example of the benefits of
the integration of the .NET CLR within SQL Server: it allows you to extend the functionality of SQL
Server using any of the methods contained in the .NET Base Class Library, including using classes
such as HttpWebRequest to access a web-based resource.
Because this is the first time I've described how to use spatial functionality from within an SQLCLR
assembly, I'll explain the process in detail, step by step. In subsequent chapters, I'll assume that you're
familiar with the basic process of assembly creation and deployment given here.

Note The following steps describe the creation of a .NET assembly using the freely available Visual C# 2010
Express Edition. You may use a different edition of Visual Studio, but be aware that you may find that some of the
menu items appear under different headings than described here.

Creating a New Project


Your first task is to create a new class library project, by following these steps.
1. From the Visual Studio menu bar, select File New Project (or press Ctrl+N).
2. In the New Project dialog box, shown in Figure 6-4, select the Class Library
template and type a name for the new project. For the assembly containing
the code samples in this chapter, I named the project ProSpatialCh6.
3. Click OK.

143

www.it-ebooks.info
CHAPTER 6 GEOCODING

Figure 6-4. Creating a new project using Visual C# 2010 Express Edition

Once the project has been created, the project workspace will appear, and the main window will
show the contents of the default class file within the project.

Configuring the Project


Before adding the geocoding function, we need to make a number of changes to configure the project.
To make these changes, follow these steps.
1. Open the project properties page by selecting Project ProSpatialCh6
Properties from the main menu bar.
2. From the Application tab, ensure that the Target Framework is set to .NET
Framework 3.5 or greater. If it is not already, then change the value (which
might require you to save and reload the project).
3. Select Build from the list of tabs on the left side of the project properties page.
4. In the Output path field, enter the location in which you want the compiled
geocoder assembly to be created. In this example, I set the output path to
C:\Spatial
Theses steps are illustrated in Figure 6-5.

144

www.it-ebooks.info
CHAPTER 6 GEOCODING

Figure 6-5. Setting the compile options for the Geocoder project

Adding a Reference to Microsoft.SqlServer.Types.dll


The coordinates returned from the Locations API are expressed in geographic coordinates measured
using the EPSG:4326 spatial reference system. Our geocoding function will create a geography Point
instance from these coordinates. To be able to do so, we must therefore include a reference to the
Microsoft.SqlServer.Types.dll library that contains the methods of the SqlGeography and
SqlGeometry classes.
1. From the Project menu, select Add Reference.
2. Click on the Browse tab and navigate to the folder in which the
Microsoft.SqlServer.Types.dll assembly is installed. On my system, this is
/Program Files (x86)/Microsoft SQL Server/110/SDK/Assemblies.
3. Highlight the Microsoft.SqlServer.Types.dll assembly and click OK.
These steps are illustrated in Figure 6-6.

145

www.it-ebooks.info
CHAPTER 6 GEOCODING

Figure 6-6. Adding a reference to the Microsoft.SqlServer.Types.dll library

Adding a Simple Geocoding Function


Now that the project is set up, its time to add the method that will perform the geocoding itself. Add a
new class file to the project by selecting Project Add new item Class. Name the new class file
Geocoder.cs and click Add.
When the new class file is created, it will contain an empty template. Replace this with the following
code listing, inserting the value of your Bing Maps key where indicated.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using Microsoft.SqlServer.Types;
using System.Collections.Generic;
using System.Xml; // Used to manipulate XML response
using System.Net; // Used to make HTTP Request
using System.IO; // Used to read stream of data

namespace ProSpatial.Ch6
{
public partial class UserDefinedFunctions
{

146

www.it-ebooks.info
CHAPTER 6 GEOCODING

public static XmlDocument Geocode(


string countryRegion,
string adminDistrict,
string locality,
string postalCode,
string addressLine
)
{
// Variable to hold the geocode response
XmlDocument xmlResponse = new XmlDocument();

// Bing Maps key used to access the Locations API service


string key = "ENTERYOURBINGMAPSKEY";

// URI template for making a geocode request


string urltemplate = "http://dev.virtualearth.net/REST/v1/Locations?countryRegion={0}
&adminDistrict={1}&locality={2}&postalCode={3}&addressLine={4}&key={5}&output=xml";

// Insert the supplied parameters into the URL template


string url = string.Format(urltemplate, countryRegion, adminDistrict, locality,
postalCode, addressLine, key);

// Attempt to geocode the provided address


try {

// Initialise web request


HttpWebRequest webrequest = null;
HttpWebResponse webresponse = null;
Stream stream = null;
StreamReader streamReader = null;

// Make request to the Locations API REST service


webrequest = (HttpWebRequest)WebRequest.Create(url);
webrequest.Method = "GET";
webrequest.ContentLength = 0;

// Retrieve the response


webresponse = (HttpWebResponse)webrequest.GetResponse();
stream = webresponse.GetResponseStream();
streamReader = new StreamReader(stream);
xmlResponse.LoadXml(streamReader.ReadToEnd());

// Clean up
webresponse.Close();
stream.Dispose();
streamReader.Dispose();
}
catch(Exception ex)
{
// Exception handling code here;
}

// Return an XMLDocument with the geocoded results

147

www.it-ebooks.info
CHAPTER 6 GEOCODING

return xmlResponse;
}
}
}
This code is pretty self-explanatory; It defines a single method, Geocode(), which accepts five
string parameters, representing the different elements of the address to be geocoded. The Geocode()
method constructs a URL template for the Locations API service including placeholders for each of the
address elements, and inserts the supplied values in the appropriate places. An HttpWebRequest is used
to call the service, and the response is read into an XmlDocument, which is then returned by the method.

Creating a Geocoding UDF Wrapper


You might have noticed that there's nothing in the preceding code listing that makes it specific to SQL
Server; it's a regular C# method that accepts a number of string parameters (not SqlString
parameters) representing the elements of an address. It returns an XMLDocument containing the
response from the Bing Maps Locations API service, again, not anything unusual.
The reason for creating the code in this manner is to highlight an important fact about working
with spatial functionality in .NET (and, in fact, SQLCLR procedures in general); wherever possible you
should try to encapsulate your code and make it platform-independent rather than tying it to a
particular tier. The geocoding function illustrated previously can be reused in a number of layers of
the application architecturein a SQL Server database, in an ASP.Net website, or in a client
application, for examplewith only a minimal amount of additional coding required. In fact, within
this chapter I demonstrate the flexibility of this approach by creating two different functions that
expose the geocoded result to SQL Server in different ways.
Before we can actually use this function in SQL Server, we therefore need to create an additional
"wrapper" method around this base geocoding function. The wrapper will accept the address
parameters in SQL Server's own SqlString format, pass these to the geocoder method, and construct
and return a geography Point instance from the XML response returned. To keep this example simple,
our wrapper function will return a single scalar value, representing the first matched location returned
by the geocode service. (If no matching address can be found then an empty Point is returned.) Here's
the code, which you should add in to the existing UserDefinedFunctions class in the Geocoder.cs file:
// Declare a UDF wrapper method
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
public static SqlGeography GeocodeUDF(
SqlString addressLine,
SqlString locality,
SqlString adminDistrict,
SqlString postalCode,
SqlString countryRegion
)
{
// Document to hold the XML geocoded location
XmlDocument geocodeResponse = new XmlDocument();

// Attempt to geocode the requested address


try
{
geocodeResponse = Geocode(
(string)countryRegion,
(string)adminDistrict,
(string)locality,

148

www.it-ebooks.info
CHAPTER 6 GEOCODING

(string)postalCode,
(string)addressLine
);
}
// Failed to geocode the address
catch (Exception ex)
{
SqlContext.Pipe.Send(ex.Message.ToString());
}

// Specify the XML namespaces


XmlNamespaceManager nsmgr = new XmlNamespaceManager(geocodeResponse.NameTable);
nsmgr.AddNamespace("ab","http://schemas.microsoft.com/search/local/ws/rest/v1");

// Check that we received a valid response from the Bing Maps geocoding server
if (geocodeResponse.GetElementsByTagName("StatusCode")[0].InnerText != "200")
{
throw new Exception("Didn't get correct response from geocoding server");
}

// Retrieve the list of geocoded locations


XmlNodeList Locations = geocodeResponse.GetElementsByTagName("Location");

// Create a geography Point instance of the first matching location


double Latitude = double.Parse(Locations[0]["Point"]["Latitude"].InnerText);
double Longitude = double.Parse(Locations[0]["Point"]["Longitude"].InnerText);
SqlGeography Point = SqlGeography.Point(Latitude, Longitude, 4326);

// Return the Point to SQL Server


return Point;
}

Compiling the Assembly


Having added the necessary methods, you can now compile the assembly by selecting Build Build
Solution, or press Ctrl+Shift+B. You should see the following message appear in the output window (if
you cannot see the output window, select it from the View menu or press Ctrl+Alt+O).

------ Build started: Project: ProSpatialCh6, Configuration: Release Any CPU ------
ProSpatialCh6 -> C:\Spatial\ProSpatialCh6.dll
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========

Thats all that is required from Visual Studio, so you can go back to SQL Server now.

149

www.it-ebooks.info
CHAPTER 6 GEOCODING

Configuring the Database


Before we can use the geocoding function, we need to make a few configuration changes to the SQL
Server Database Engine to allow it to use the assembly correctly.

Enabling CLR Support


We know that SQL Server runs the .NET CLR process; thats how the geometry and geography datatypes
work. However, although the system-defined CLR datatypes require no additional CLR configuration,
you cannot normally import and run user-defined CLR functions in SQL Server because this feature is
disabled by default. This is a deliberate safety mechanism to ensure that a database administrator has
allowed the use of these powerful (although potentially dangerous) features. In order to use the custom
.NET geocoding function, you first need to configure the database to enable CLR support. You can do
this by running the following T-SQL code.
EXEC sp_configure 'clr enabled', '1';
GO
If the configuration change is successful, you should receive the following message.

Configuration option 'clr enabled' changed from 0 to 1.


Run the RECONFIGURE statement to install.

To complete the change, we need to reconfigure the server to reflect the changed value, by issuing
a T-SQL query with the RECONFIGURE statement as follows.
RECONFIGURE;
GO
The SQL Server configuration settings will now be updated to allow you to run user-defined CLR
code, and you should receive the following message.

Command(s) completed successfully.

Setting Security Permissions


Because we will be using our .NET managed code to access information from a web service, we also
need to set the appropriate security permissions on the database to enable access to external data. The
simplest way of allowing this is to set the database to be trustworthy. This can be done by running the
following T-SQL code (note that you should change the name ProSpatial to match the name of your
database).
ALTER DATABASE ProSpatial SET TRUSTWORTHY ON;
GO
You should receive the following message.

150

www.it-ebooks.info
CHAPTER 6 GEOCODING

Command(s) completed successfully.

The database is now configured and ready to import the geocoding assembly.

Note Although setting a database to be trustworthy might be the simplest way to grant access to external
resources, it is not necessarily representative of security best practice in a production environment. Understanding
the different security levels and permission sets for .NET assemblies within SQL Server can be complicated. If
you'd like to learn more on this subject, try reading Chapter 7 of "Expert SQL Server 2008 Development" (by the
author, Apress, 2009).

Importing the Assembly


Having created and compiled our .NET assembly and made the necessary configuration changes to our
server, we can now import the assembly into the database. You can do this by executing the following
T-SQL script.
CREATE ASSEMBLY ProSpatialCh6
FROM 'C:\Spatial\ProSpatialCh6.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO
This creates an assembly in the database called ProSpatialCh6, from the ProSpatialCh6.dll output
file compiled by Visual Studio. You will need to change the file path specified from
C:\Spatial\ProSpatialCh6.dll to match the build output location and assembly name that you set in
Visual Studio earlier.
The PERMISSION_SET argument specifies the permission level granted to this assembly. By default,
new SQL Server assemblies are marked as SAFE, which means that they can only access restricted, local
resources. This is a security feature to ensure that any code cannot access external (potentially
dangerous) resources to which it is not permitted access. For our geocoding function to work, we need to
explicitly allow our code to access external resources, by specifying PERMISSION_SET =
EXTERNAL_ACCESS.
Once the assembly has been created, it should appear in the SQL Server Management Studio Object
Explorer, listed under Assemblies within the Programmability node of the database into which it was
imported, as shown in Figure 6-7 (you may need to refresh the Object Explorer view before the assembly
becomes visible, by right-clicking the Assemblies node and selecting Refresh).

151

www.it-ebooks.info
CHAPTER 6 GEOCODING

Figure 6-7. The geocoding assemby listed in SQL Server Management Studio Object Explorer

Creating the Geocode Function


Now that we have imported our assembly, we need to define a function so that we can access the
geocoding method from within T-SQL code. The function will specify a number of input parameters
containing different descriptive fields of an address, and return a geography Point instance representing
that address, constructed from the coordinates returned from the Bing Maps geocode service.
To create the function, execute the following T-SQL code.
CREATE FUNCTION dbo.Geocode(
@addressLine nvarchar(max),
@locality nvarchar(max),
@adminDistrict nvarchar(max),

152

www.it-ebooks.info
CHAPTER 6 GEOCODING

@postalCode nvarchar(max),
@countryRegion nvarchar(max)
) RETURNS geography
AS EXTERNAL NAME
ProSpatialCh6.[ProSpatial.Ch6.UserDefinedFunctions].GeocodeUDF;
This code creates a T-SQL function called Geocode that provides an interface to the GeocodeUDF
method contained within the ProSpatialCh6 assembly. It specifies that the parameters that must be
provided when using the Geocode function are an address, the name of a city, the subdivision (i.e.,
county/state), the postal code (or ZIP code), and the country or region. These parameters correspond
exactly to the parameters passed by the .NET method to the Locations API geocoding service. The
return value of the function is a geography instance, containing a Point geometry associated with that
address.

Note When creating a function from a .NET assembly, the syntax for the AS EXTERNAL NAME clause is
AssemblyName.[Namespace.ClassName].FunctionName.

Thats it! Congratulations, youve just used .NET to add a new function to SQL Server, extending
the existing spatial functionality by allowing you to geocode address data.

Using the Geocode Function


Finally, we get to use our geocoding function. To test it out, lets try retrieving the latitude and
longitude of a Point at the Apress head office, located at the following postal address:
Apress, Inc.
233 Spring Street
New York,
NY 10013
To retrieve the WKT representation of a Point based on the Apress office address, you can execute
the following query in SQL Server Management Studio.
SELECT dbo.Geocode('233 Spring Street','New York','NY','10013','USA').ToString();
The result is as follows.

POINT (-74.004799 40.725906)

To check the accuracy of the function, we can plot this result against a road map of New York city
using Bing Maps, as shown in Figure 6-8.

153

www.it-ebooks.info
CHAPTER 6 GEOCODING

Figure 6-8. Plotting the geocoded address location of the Apress offices on Bing Maps

Creating a Geocoding TVF Wrapper


In the previous example, the geocoding wrapper function was created as a scalar UDF; you supply a
single address, and you get a single geocoded result back. However, this approach has some limitations:
First, the Locations API service actually returns several columns of
information about the matched geocoded result. As shown in the example
given near the beginning of this chapter, the XML response may contain not
only the coordinates of a single point at the center of the requested location,
but also the bounding box describing the extent of a feature, and additional
metadata about the match found. A UDF that returns only a single scalar Point
value loses this additional information.
Second, geocoding is not a precise operation: frequently we may find that the
source address is not completely and exactly specified, and there is some scope
for ambiguity as to the exact location to which it refers. Consider the example

154

www.it-ebooks.info
CHAPTER 6 GEOCODING

from the beginning of this chapter, referring to the address of the White House.
Suppose that we had omitted the fact that we were looking for 1600
Pennsylvania Avenue, Washington D.C. Knowing only the street number and
name, and that this address was in the United States, we could have been
referring to Pennsylvania Avenue in Baltimore, or Atlantic City, or one of two
Pennsylvania Avenues in the state of West Virginia, or one of several others. In
these cases, the Bing Maps Locations API could return multiple rows of results,
representing each of the possible matching geocoded locations. Our current
function automatically returns the first match, but it is not necessarily the
correct one.
Because the result of a geocoding operation has the potential to return multiple rows and columns
of data, it sounds like a suitable situation in which to use a table-valued function (TVF) that returns a
table of several possible matched locations, allowing the user to choose which one they meant. Such a
TVF is demonstrated in the following code listing:
[Microsoft.SqlServer.Server.SqlFunction(
Name = "GeocodeTVF",
FillRowMethodName = "GeocodeTVFFillRow",
DataAccess = DataAccessKind.Read,
TableDefinition = @"Name nvarchar(255),
Point geography,
BoundingBox geography")]
public static System.Collections.IEnumerable GeocodeTVF(
SqlString addressLine,
SqlString locality,
SqlString adminDistrict,
SqlString postalCode,
SqlString countryRegion
)
{

// Document to hold the XML geocoded location


XmlDocument geocodeResponse = new XmlDocument();

try
{
geocodeResponse = Geocode(
(string)countryRegion,
(string)adminDistrict,
(string)locality,
(string)postalCode,
(string)addressLine
);
}
// Failed to geocode the address
catch (Exception ex)
{
SqlContext.Pipe.Send(ex.Message.ToString());
}

// Define the default XML namespace


XmlNamespaceManager nsmgr = new XmlNamespaceManager(geocodeResponse.NameTable);
nsmgr.AddNamespace("ab", "http://schemas.microsoft.com/search/local/ws/rest/v1");

155

www.it-ebooks.info
CHAPTER 6 GEOCODING

// Create a set of all <Location>s in the response


XmlNodeList Locations = geocodeResponse.GetElementsByTagName("Location");

// Set up a list to hold results


List<object[]> items = new List<object[]>();

// Loop through each location in the response


foreach (XmlNode locationNode in Locations)
{
// Create a new object for this result
object[] item = new object[3];

// Retrieve the name of this location


string Name = locationNode["Name"].InnerText;
item.SetValue(Name, 0);

// Create a point for this location


double Latitude = double.Parse(locationNode["Point"]["Latitude"].InnerText);
double Longitude = double.Parse(locationNode["Point"]["Longitude"].InnerText);
SqlGeography Point = SqlGeography.Point(Latitude, Longitude, 4326);
item.SetValue(Point, 1);

// Create a polygon for this location's bounding box


if (locationNode.SelectSingleNode("ab:BoundingBox", nsmgr) != null)
{
// Retrieve the latitude/longitude extents of the box
double BBSLatitude = double.Parse(
locationNode.SelectSingleNode("ab:BoundingBox/ab:SouthLatitude", nsmgr).InnerText);

double BBNLatitude = double.Parse(


locationNode.SelectSingleNode("ab:BoundingBox/ab:NorthLatitude", nsmgr).InnerText);

double BBWLongitude = double.Parse(


locationNode.SelectSingleNode("ab:BoundingBox/ab:WestLongitude", nsmgr).InnerText);

double BBELongitude = double.Parse(


locationNode.SelectSingleNode("ab:BoundingBox/ab:EastLongitude", nsmgr).InnerText);

// Build a geography polygon of the box


SqlGeographyBuilder gb = new SqlGeographyBuilder();
gb.SetSrid(4326);
gb.BeginGeography(OpenGisGeographyType.Polygon);
gb.BeginFigure(BBSLatitude, BBWLongitude);
gb.AddLine(BBSLatitude, BBELongitude);
gb.AddLine(BBNLatitude, BBELongitude);
gb.AddLine(BBNLatitude, BBWLongitude);
gb.AddLine(BBSLatitude, BBWLongitude);
gb.EndFigure();
gb.EndGeography();
SqlGeography Polygon = gb.ConstructedGeography;
item.SetValue(Polygon, 2);
}

156

www.it-ebooks.info
CHAPTER 6 GEOCODING

// Add this result to the set of results


items.Add(item);

return items;
}

public static void GeocodeTVFFillRow(


object obj,
out SqlString Name,
out SqlGeography Point,
out SqlGeography BoundingBox)
{
object[] item = (object[])obj;
Name = (SqlString)(item[0].ToString());
Point = (SqlGeography)item[1];
BoundingBox = (SqlGeography)item[2];
}
Note that this function still calls exactly the same underlying Geocode method as the scalar
GeocodeUDF method described previously; it just handles the resulting XMLDocument in a different way,
returning a table of possible results to SQL Server rather than only the first match. I haven't needed to
change the actual code that calls the Bing Maps web service at all, which demonstrates the benefits of
separating out your code into modular reusable units.
Having added the GeocodeTVF() method to the UserDefinedFunctions class of your geocoding project,
recompile the assembly in Visual Studio by selecting Rebuild Solution from the Build menu. Then drop
and re-create the ProSpatialCh6 assembly in SQL Server, and register the new function as follows:
CREATE FUNCTION dbo.GeocodeTVF(
@addressLine nvarchar(255),
@locality nvarchar(255),
@adminDistrict nvarchar(255),
@postalCode nvarchar(255),
@countryRegion nvarchar(255)
) RETURNS table (Name nvarchar(255), Point geography, BoundingBox geography)
AS EXTERNAL NAME
ProSpatialCh6.[ProSpatial.Ch6.UserDefinedFunctions].GeocodeTVF;
Table-valued functions return table result sets, so the syntax for their usage is a bit different than
that used for inline UDF functions. To demonstrate, let's look at the example stated previously by
looking for an address of "1600 Pennsylvania Avenue", somewhere in the United States:
SELECT * FROM dbo.GeocodeTVF('1600 Pennsylvania Avenue', '', '', '', 'USA');
The results from the table-valued function now contain not only a single result, but five matching
results. What's more, we don't only have a single Point geometry associated with each matching
record, but also a Polygon representing the extent of each instance:

Name Point BoundingBox


1600 Pennsylvania Ave, Colton, CA 92324 0xE610000 0xE6100000010405
1600 Pennsylvania Ave, Fairfield, CA 94533 0xE610000 0xE6100000010405
1600 Pennsylvania Ave, Los Angeles, CA 90033 0xE610000 0xE6100000010405

157

www.it-ebooks.info
CHAPTER 6 GEOCODING

1600 Pennsylvania Ave, Richmond, CA 94801 0xE610000 0xE6100000010405


1600 Pennsylvania Ave, West Sacramento, CA 95691 0xE610000 0xE6100000010405

Asynchronous and Batch Geocoding


So far, we've manually called the geocoding function from a SELECT query to geocode a single address
at a time. But suppose that you wanted to automatically geocode every new address record that was
entered into an Addresses table, such as this simple example:
CREATE TABLE Addresses (
AddressID int identity(1,1),
Address nvarchar(255),
Location geography
);
Ideally, every time a new address is inserted into the Address column of the table, we'd like the
Location column to be updated with a geography Point record representing the location of that address.
The logic to do this is not too hard; you could simply create an UPDATE trigger on the table, which calls
the Geocode procedure and updates the corresponding value:
CREATE TRIGGER tgGeocodeAddress
ON Addresses
FOR INSERT, UPDATE AS
BEGIN
SET NOCOUNT ON;
UPDATE Addresses
SET Location = dbo.Geocode(Address)
WHERE AddressID IN (SELECT AddressID FROM inserted);
END;
However, theres a problem here: CLR UDFs in SQL Server can be slow at the best of times, but
our Geocode function is going to suffer especially because it relies on waiting for a response from a
network resource. Every single INSERT transaction is going to be forced to wait for that geocode
function to return a result before completing, which, depending on the latency of the network, the
load on the Bing Maps REST service, and other factors, could cause each INSERT transaction on the
table to take a couple of seconds or more. Even on a database with only a few concurrent users
issuing single row updates at a time, the potential for deadlocks, concurrency issues, and generally
bringing the server to its knees is huge.
One possible solution is to employ an asynchronous trigger, one that will perform the geocoding
task in the background and update the record in the Addresses table when the web service has returned
its response. You can do this with SQL Server Service Broker by creating a procedure that listens for
messages sent to a service queue. Details of addresses to be geocoded are sent to the queue and
processed asynchronously, updating the corresponding record when a result is returned from the
Locations API service rather than hold up the transaction waiting for a response at the point the INSERT
or UPDATE operation is made. For more details on how you can set up such a queueing system using
Service Broker, I recommend reading "Pro SQL Server 2008 Service Broker" (Apress, 2008).
Another point to bear in mind is that, if you are planning to geocode a lot of addresses, it would be
quite inefficient to process each address one at a time through the Locations API. Instead, you can use a
service that allows you to upload an entire set of address data and process it as a batch operation,
downloading the geocoded data when the operation has completed. One such service is the Geocode

158

www.it-ebooks.info
CHAPTER 6 GEOCODING

Dataflow API that is part of the Bing Spatial Data Services, described in more detail at
http://msdn.microsoft.com/en-us/library/ff701734.aspx.

Reverse Geocoding
So far, we've only looked at one-way geocoding, starting with an address and deriving the
corresponding latitude/longitude coordinates, which enables us to create a geography Point
representing that location. But what about performing the reverse operation: starting with a set of
coordinates and returning a description of the closest matching point of interest?
There are several potential applications of such a function. Suppose that you had collected
coordinates from a GPS system, or triangulated the location of a mobile phone call, and wanted to
obtain a description of that location: in what town is this person, for example? Or to what address is this
delivery vehicle closest?
The process of reverse-geocoding is illustrated in Figure 6-9.

10 Downing Street
(51.5035, -0.1278) London
SW1A 2AA
Figure 6-9. Reverse geocoding

Fortunately, the Bing Maps REST services also expose reverse-geocoding functionality. To
perform a reverse-geocode, rather than providing the address to be looked up, you simply provide the
(WGS84) latitude and longitude coordinates instead. The following code listing demonstrates the URL
syntax required to reverse-geocode a Point located at a latitude of 47.64054 degrees and a longitude of
122.12934 west:
http://dev.virtualearth.net/REST/v1/Locations/47.64054,-122.12934?o=xml&key=YOURBINGMAPSKEY
This URL returns the following XML response
<Response xmlns:xsi="http://www.w3.org/2001/XMLSchema instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://schemas.microsoft.com/search/local/ws/rest/v1">
<Copyright>Copyright 2010 Microsoft and its suppliers. All rights reserved. This API
cannot be accessed and the content and any results may not be used, reproduced or transmitted
in any manner without express written permission from Microsoft Corporation.</Copyright>
<BrandLogoUri>http://dev.virtualearth.net/Branding/logo_powered_by.png</BrandLogoUri>
<StatusCode>200</StatusCode>
<StatusDescription>OK</StatusDescription>
<AuthenticationResultCode>ValidCredentials</AuthenticationResultCode>
<TraceId>fbfb8df89423415589eec14c8de7585e</TraceId>
<ResourceSets>
<ResourceSet>
<EstimatedTotal>2</EstimatedTotal>
<Resources>
<Location>
<Name>1 Microsoft Way, Redmond, Washington 98052, United States</Name>
<Point>
<Latitude>47.640568390488625</Latitude>
<Longitude>-122.1293731033802</Longitude>

159

www.it-ebooks.info
CHAPTER 6 GEOCODING

</Point>
<BoundingBox>
<SouthLatitude>47.636705672917948</SouthLatitude>
<WestLongitude>-122.137016420622</WestLongitude>
<NorthLatitude>47.6444311080593</NorthLatitude>
<EastLongitude>-122.1217297861384</EastLongitude>
</BoundingBox>
<EntityType>Address</EntityType>
<Address>
<AddressLine>1 Microsoft Way</AddressLine>
<AdminDistrict>Washington</AdminDistrict>
<AdminDistrict2>King</AdminDistrict2>
<CountryRegion>United States</CountryRegion>
<FormattedAddress>1 Microsoft Way, Redmond, Washington 98052, United
States</FormattedAddress>
<Locality>Redmond</Locality>
<PostalCode>98052</PostalCode>
</Address>
<Confidence>Medium</Confidence>
</Location>
<Location>
<Name>1 Microsoft Way, Redmond, WA 98052 6399</Name>
<Point>
<Latitude>47.639747</Latitude>
<Longitude>-122.129731</Longitude>
</Point>
<BoundingBox>
<SouthLatitude>47.635884282429323</SouthLatitude>
<WestLongitude>-122.13737419709076</WestLongitude>
<NorthLatitude>47.643609717570676</NorthLatitude>
<EastLongitude>-122.12208780290925</EastLongitude>
</BoundingBox>
<EntityType>Address</EntityType>
<Address>
<AddressLine>1 Microsoft Way</AddressLine>
<AdminDistrict>WA</AdminDistrict>
<AdminDistrict2>King County</AdminDistrict2>
<CountryRegion>United States</CountryRegion>
<FormattedAddress>1 Microsoft Way, Redmond, WA 98052 6399</FormattedAddress>
<Locality>Redmond</Locality>
<PostalCode>98052 6399</PostalCode>
</Address>
<Confidence>Medium</Confidence>
</Location>
</Resources>
</ResourceSet>
</ResourceSets>
</Response>
Because the response returned from the reverse-geocoding service follows exactly the same
structure as for the geocoding service, this makes it easy to adapt the UDF and TVF functions
introduced earlier this chapter to cater for reverse-geocoding if required as well.

160

www.it-ebooks.info
CHAPTER 6 GEOCODING

Summary
In this chapter, you learned how to extend the functionality of SQL Server to geocode address data.
Specifically, you learned the following:
Geocoding can be used to derive a structured spatial representation of a feature
on the Earth from descriptive information about that feature.
The Bing Maps Web Services provide a method that can be used to geocode data,
accessible via a REST interface over the Web.
You can create a reusable .NET method to access the REST service, and then
create one or more wrapper classes to expose geocoding functionality in SQL
Server as a UDF or TVF. These methods can be called directly from T-SQL code
to return geography data from a supplied address value.
Reverse-geocoding is the opposite process to geocoding: taking a
latitude/longitude coordinate pair, and deriving the corresponding street
address or location. This functionality is also provided by the Bing Maps REST
service, and can be incorporated into SQL Server using similar methods.
As a final note, bear in mind that, although I've demonstrated that it is possible to geocode address
data directly from the database layer, it does not always make sense to do so. I've highlighted the
benefits of creating modular reusable code, so that if you decide to perform geocoding in an
application layer instead, you can do so using the same code base as described in this chapter.

161

www.it-ebooks.info
CHAPTER 7

Precision, Validity, and Errors

Most software comes supplied with online technical documentation containing examples that describe
and demonstrate all of that software's features, and SQL Server is no different. Microsoft Books
Online includes a reference guide to all of the geography and geometry methods available in SQL
Server 2012, including sample code listings illustrating their usage. You can view the full reference
guide at http://msdn.microsoft.com/en-us/library/bb933790%28v=SQL.110%29.aspx.
It's all very well following the code samples provided by the software manufacturer, and Books
Online is an excellent reference if all you want to know is the syntax required to use each method.
However, many of the examples are quite artificial, demonstrating each function in isolation, using
perfect, compact, fully formed datasets. In the real world, we must accept that the spatial data we are
given to work with is frequently less perfect than in these idealistic examples. You'll often discover
that the textbook samples start to fall down when they are applied to the gnarly ugly data that we, as
developers, all face on a day-to-day basis.
In this chapter, well examine some of the issues that can affect the quality, accuracy, and
reliability of spatial information. I'll also show you techniques to minimize or eliminate the negative
effect of such issues, ultimately leading to more robust, useful spatial applications.

Precision
If you were to ask your bank manager how much money you had in your account and she replied
"somewhere between $800 and $810," you'd probably be a bit concerned. We have come to expect that
all data held in databases should have an exact known value. If that value can't be determined then we
assume that there must be an error in the process used to gather the information or a bug in the system
used to retrieve it. However, one important fact to remember is that all spatial data is only ever an
approximation. The accuracy of that approximation depends on a number of factors:
Firstly, there are approximations and assumptions in the underlying
theoretical approach used to define the data; as explained in Chapter 1, spatial
data is defined relative to a geodetic model of the earth. That model provides
only a rough fit around the true shape of the earth, and doesn't match it
exactly. Projected coordinate systems can introduce further inaccuracies, as
geodetic features must be distorted in order to be represented on a flat plane.
Remember also that vector spatial data stores only simple geometric shapes,
which fail to reflect the true, complex and organic shape of many features on
the Earth.
Secondly, precision errors may be introduced as a result of practical
limitations of the measuring systems and instruments used for collecting and

163

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

recording spatial data. Location information gathered from consumer GPS


devices, for example, is generally only accurate to within a few meters. More
advanced scientific tools and surveying equipment have greater accuracy, but
inevitably still possess a margin of error.
Finally, precision errors may be introduced as a result of the way in which
spatial data is stored, retrieved, and manipulated in systems such as SQL
Server, and at every subsequent stage at which data passes through layers of
an application interface.
It is this last category of errors with which we are most concerned in this chapter, because these
are generally the errors that we, as developers, are able to control (at least to some degree).

Storage Precision
SQL Server stores geography and geometry coordinates as binary values, adhering to the IEEE-754
standard for binary floating-point arithmetic. Based on this standard, each coordinate is represented
as a double precision floating point number that is 64 bits (8 bytes) long. By storing coordinates as
floating point values, SQL Server ensures that a large range of possible coordinate values can be
accommodated, while requiring only a limited amount of storage.
Although SQL Server stores coordinates as binary floating point values, WKT, the format in which
coordinate data is most commonly supplied, is a text-based format in which coordinates are stated in
decimal format. Not all decimal numbers can be represented exactly in floating-point binary format.
Whenever you use a static method to create an instance of geography or geometry data from WKT (or
any other static method that accepts decimal input), the supplied coordinates are therefore implicitly
converted to the closest equivalent binary floating point value. Essentially, each WKT coordinate
value is CAST from nvarchar to binary(8). As with conversion of any other type of data between
datatypes, this presents the possibility of truncation or rounding of the coordinate values.
The range of possible coordinate values that can be stored in an 8-byte binary value is roughly
equivalent to 15 digits of decimal precision. So, for example, the x-coordinate of a geometry Point might
be stored as 0x3FF3C0CA428C59F8, which corresponds to a decimal value of 1.234567890123456. Some of
the geography and geometry static methods allow you to create instances from coordinate values with
greater precision than this; the Point() method, for example, will accept decimal coordinate values
with up to 38 digits of precision. However, those coordinates will ultimately be stored with the same
fixed 64-bit binary precision, and supplying coordinates with greater precision will not lead to any
greater precision of the stored geography or geometry value.
To demonstrate, consider the following code listing, which creates two geometry Point instances
using the Point() static method. The y-coordinate of both instances is the same. The value of the x-
coordinate differs, but only following the sixteenth decimal place:
DECLARE @Precise geometry;
SET @Precise = geometry::Point(10.23456789012345, 0, 0);

DECLARE @SuperPrecise geometry;


SET @SuperPrecise = geometry::Point(10.234567890123456789012345678901234567, 0, 0);

SELECT @Precise.STEquals(@SuperPrecise);
The additional decimal places of precision supplied for the @SuperPrecise Point cannot be
represented in an 8-byte binary value. As a result, the x-coordinate values that SQL Server stores for
both the @Precise and @SuperPrecise instances are exactly the same (0x3C8B514819782440). The
STEquals() method, which compares whether two instances are equal, returns the value 1, which
confirms that both Points are the same.

164

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

The preceding example demonstrated that supplying excess precision has no effect on the
coordinate values of a geometry; they will always be stored in the database as 8-byte binary values.
Likewise, it is worth remembering that coordinates supplied with less decimal precision will still
occupy a fixed 8-bytes when converted to binary and saved to the database. Thus, in the following
example, even though @HighPrecision is instantiated from coordinates with greater precision than
@LowPrecision, the two Points occupy exactly the same amount of space.
DECLARE @LowPrecision geometry;
SET @LowPrecision = geometry::STPointFromText('POINT(1 2)', 0);

DECLARE @HighPrecision geometry;


SET @HighPrecision = geometry::STPointFromText('POINT(1.2345678901234567890123456789
2.3456789012345678)', 0);

SELECT
DATALENGTH(@LowPrecision),
DATALENGTH(@HighPrecision);

22 22

The result demonstrates that you do not create simpler geometries, or require less space to repre-
sent geometries created from less precise coordinates. Both Points in this example (and all Points created
from a single coordinate pair) occupy exactly 22 bytes, as shown by the result of the DATALENGTH function.

Practical Implications of Fixed Binary Precision


What effect does the design decision to store coordinates as 8-byte binary values have on the accuracy
of spatial data stored in SQL Server? The answer to this question depends on whether you are
referring to the geometry or geography datatype.

geometry Precision
Firstly, let's consider the precision of coordinates defined in a planar coordinate system, stored using
the geometry datatype. Suppose you were given the coordinates of a location measured in the
EPSG:26913 spatial reference system. This is the spatial reference identifier corresponding to UTM
Zone 13N, a projected spatial reference system based on the NAD83 datum. The unit of measurement
for coordinates in this system is the meter.
Because geometry coordinates represent the distance of a point from an origin in a linear unit of
measure, it is easy to see the correlation between the precision of the supplied coordinates and the
accuracy of the stored position. Assuming that coordinates were supplied using the full precision
capable of being represented in an 8-byte binary value, equivalent to 15 decimal places of accuracy,
you can be sure that the stored location of any geometry data in SQL Server based on this SRID will be
accurate to within 0.000000000000001 meters of the supplied value, in both the x- and y-axes.

geography Precision
What about the effects of fixed precision in coordinates of the geography datatype? In such cases it's
less obvious what the magnitude of any error would be because, whereas geographic coordinates are
expressed in angular degrees, we tend to think of the "accuracy" of a location in terms of meters away
from its true position. So just how far, in terms of distance measured across the earths surface, does 15

165

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

decimal places of one degree of latitude or longitude correspond? To answer this question, we have to
consider the error in the longitude and latitude coordinates separately.
The distance on the Earths surface represented by one degree of longitude or one degree of
latitude varies depending on the corresponding latitude. This is illustrated in Figure 7-1.

Figure 7-1. The distance covered by one degree of longitude varies with latitude

At the equator, where the distance represented by one degree of longitude is greatest, one degree
of longitude corresponds to about 111.3 km. We can verify this using SQL Server's STDistance()
method to measure the distance between two points on the equator separated by one degree of
longitude, as follows:
DECLARE @EquatorA geography = geography::Point(0,0,4326);
DECLARE @EquatorB geography = geography::Point(0,1,4326);

SELECT @EquatorA.STDistance(@EquatorB);
The result is shown following. As the coordinate values in this case were stated using SRID 4326,
this result is calculated relative to the WGS84 datum and expressed in meters:

111319.490735885

As you approach the poles, the meridian lines representing points of equally spaced longitude
converge, and the distance represented by one degree of longitude decreases. At the Tropic of Cancer
(the circle of latitude at which the sun appears directly overhead during the June Solstice), which lies at
a latitude of approximately 23.5 degrees, the distance covered by one degree of longitude is reduced to
about 102.1 km, which can be verified using the following code listing:
DECLARE @TropicOfCancerA geography = geography::Point(23.5,0,4326);
DECLARE @TropicOfCancerB geography = geography::Point(23.5,1,4326);

SELECT @TropicOfCancerA.STDistance(@TropicOfCancerB);

166

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

102140.828881171

At the Arctic circle, which lies at a latitude of approximately 66.5 degrees North, one degree of
longitude covers only 44.5 km:
DECLARE @ArcticCircleA geography = geography::Point(66.5,0,4326);
DECLARE @ArcticCircleB geography = geography::Point(66.5,1,4326);

SELECT @ArcticCircleA.STDistance(@ArcticCircleB);

44513.5512918299

At the poles, one degree of longitude covers an infinitely small distance. Figure 7-2 depicts a
graph illustrating the distance on the Earth's surface represented by one degree of longitude at any
latitude.

Figure 7-2. The distance on the Earth's surface covered by one degree of longitude at any latitude

167

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

Due to the oblate shape of the earth, the distance covered by one degree of latitude also varies with
latitude, although the variance is much less than that of longitude. One degree of latitude at the equator
corresponds to about 110.6 km whereas at the poles, where the distance is greatest, one degree of
latitude corresponds to about 111.7 km.
So, assuming 15 decimal places of precision and a "worst-case" scenario in which one degree of
latitude/longitude represents 111 km along the ground, coordinates of geography instances are
capable of describing a location with a precision of 111 km/10E15, which still means that any supplied
coordinate value can be stored with submillimeter precision, wherever it lies on the globe. It's
probably safe to say that SQL Server therefore stores spatial data with sufficient accuracy for all but the
most demanding of spatial applications.

Balancing Precision Against Transmission Size


As a general rule, you should always retain the maximum amount of precision possible throughout your
spatial applications. When creating geometry and geography instances, for example, you should provide
coordinates with the full, most precise value available from your data. As demonstrated previously, SQL
Server will represent this as a 64-bit binary value, which will retain the maximum fidelity of your data.
However, when retrieving coordinate values for display purposes, or to pass to another
application in decimal format, you may wish to deliberately limit the precision of those values. Unlike
a fixed-width binary value, WKT strings (or any other text-based representation of decimal
coordinates) vary in length depending on the number of decimal places with which coordinates are
stated. There are occasions when you might deliberately want to reduce the precision of these
coordinate values to reduce the size of data that needs to be transmitted between the database and
other layers of your application. This is most likely to occur when sending data to an end-user UI
system over a network, where extreme accuracy (e.g., at the submillimeter level) is not required, and
performance gains can be made by reducing the transmission size of the data and reducing bandwidth.
Suppose that you have a web-based mapping application that plots a number of points of interest
stored in SQL Server. The locations of these points are stored using the geography datatype, so each
coordinate is natively stored as an 8-byte binary value. However, in order to pass this information to
the web application, that data is to be represented as WKT, and exposed via a web service. To minimize
bandwidth, it is decided to round the coordinate values in the WKT string to only display five decimal
places of precision. The following code listing demonstrates a C# SQLCLR function to achieve this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.SqlServer.Types;
using System.Data;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

namespace ProSQLSpatial.Ch7
{
class RoundGeography : IGeographySink110
{
private readonly IGeographySink110 _target; // the target sink
private readonly int _precision; // the number of fractional digits in the return
value

public RoundGeography(int precision, IGeographySink110 target)


{

168

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

_target = target;
_precision = precision;
}

public void SetSrid(int srid)


{
_target.SetSrid(srid);
}

public void BeginGeography(OpenGisGeographyType type)


{
_target.BeginGeography(type);
}

// Each BeginFigure call rounds the start point to the required precision.
public void BeginFigure(double x, double y, double? z, double? m)
{
_target.BeginFigure(Math.Round(x, _precision), Math.Round(y, _precision), z, m);
}

// Each AddLine call rounds subsequent points to the required precision.


public void AddLine(double x, double y, double? z, double? m)
{
_target.AddLine(Math.Round(x, _precision), Math.Round(y, _precision), z, m);
}

// Each AddCircularArc call rounds subsequent points to the required precision


public void AddCircularArc(double x1, double y1, double? z1, double? m1, double x2,
double y2, double? z2, double? m2)
{
_target.AddCircularArc(Math.Round(x1, _precision), Math.Round(y1, _precision), z1, m1,
Math.Round(x2, _precision), Math.Round(y2, _precision), z2, m2);
}

public void EndFigure()


{
_target.EndFigure();
}

public void EndGeography()


{
_target.EndGeography();
}
}

// Create a wrapper function


public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
public static SqlGeography RoundGeography(SqlGeography g, Int32 precision)
{
SqlGeographyBuilder constructed = new SqlGeographyBuilder();
RoundGeography rounded = new RoundGeography(precision, constructed);
g.Populate(rounded);

169

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

return constructed.ConstructedGeography.MakeValid();
}
}
}
The preceding code listing makes use of an IGeographySink110 interface, which is populated with a
supplied geography instance. When the first point is added to the geography instance (in the
BeginFigure() method), and as each subsequent point is added (in the AddLine() or AddCircularArc()
methods), the coordinate values are rounded to the specified number of decimal places. Finally, the
MakeValid() method is called on the constructed geography instance to ensure that the rounded
coordinate values have not caused the geometry to degenerate, and the resulting value is returned to the
caller.
You can compile this method in an assembly and import it into SQL Server, then register a
corresponding function as follows:
CREATE FUNCTION dbo.RoundGeography (
@g geography,
@precision int
) RETURNS geography
AS EXTERNAL NAME
ProSpatialCh7.[ProSpatial.Ch7.UserDefinedFunctions].RoundGeography;
To test the effectiveness of the new RoundGeography function, we'll use it to round the coordinates
of a Point representing the location of the Eiffel Tower in Paris from 16 decimal places to 5 decimal
places, as follows:
DECLARE @EiffelTower geography = 'POINT(2.2945117950439298 48.858259942745526)';
DECLARE @RoundedEiffelTower geography = dbo.RoundGeography(@EiffelTower, 5);

SELECT
@EiffelTower.ToString() AS WKT,
DATALENGTH(@EiffelTower.ToString()) AS Length
UNION ALL
SELECT
@RoundedEiffelTower.ToString() AS WKT,
DATALENGTH(@RoundedEiffelTower.ToString()) AS Length;
The results are as follows:

WKT Length
POINT (2.2945117950439298 48.858259942745526) 90
POINT (2.29451 48.85826) 48

By rounding the coordinate values to 5 decimal places, we have nearly halved the size of the the
WKT string to be transferred, from 90 bytes to 48 bytes. What effect has this had on the accuracy of the
data? By executing the following query, we can see that we have shifted the location of the Eiffel Tower
by just 13 cm:
SELECT @EiffelTower.STDistance(@RoundedEiffelTower);

0.13187268312192

170

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

Depending on the purpose of the application, this difference of 13 cm may or may not be
significant. It will be a matter of deciding on an individual case-by-case basis the appropriate tradeoff
between accuracy and datasize.

Note Reducing the number of decimal places in a coordinate value will not reduce the size of spatial data stored
in SQL Server, but may reduce the size of data transferred to and from the database if transmitted in text format.

Calculation Precision
In the last section, we looked at issues relating to the precision with which coordinates are supplied,
stored, and retrieved from SQL Server. In practice, it is relatively unlikely that you will encounter too
many precision problems here; as long as you do not have a highly specialized application that
requires greater than 64-bit coordinate accuracy, you can be fairly sure that SQL Server will be able to
store and retrieve it with full fidelity.
A separate, and perhaps more important issue to consider relates to the precision with which SQL
Server perform calculations on coordinates. Every time you use any method that creates or modifies a
geometry or geography instance, or compares the relationship between two geometries, there is the
potential for coordinates to be modified slightly, and this is very important to understand, especially if
you are creating any application that relies on testing for strict equality between two geometries.
To demonstrate the issue, lets look at example. Consider the following two lines:
DECLARE @line1 geometry = 'LINESTRING(0 13, 431 310)';
DECLARE @line2 geometry = 'LINESTRING(0 502, 651 1)';
You can view these lines in SSMS Spatial Results tab as follows:
SELECT @line1
UNION ALL SELECT @line2;
The result is shown in Figure 7-3.

Figure 7-3. Two intersecting lines viewed in the SSMS spatial results tab

171

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

The two lines clearly cross each other, and SQL Server will calculate the point at which they
intersect using the STIntersection() method:
SELECT
@line1.STIntersection(@line2).ToString();
The calculated result of the point of intersection is POINT (335.23450808497148 244.008466128158).
So far, there's nothing unusual. But what about if we run the following query instead:
SELECT
@line1.STIntersection(@line2).STIntersects(@line1), --0
@line1.STIntersection(@line2).STIntersects(@line2); --0
The STIntersects() method returns 0 in both cases. When expressed in words, the result of this
query suggests The point at which line1 intersects line2 does not intersect line1 or line2. Huh?
Lets look at another example, this time involving two overlapping Polygons:
DECLARE @square geometry = 'POLYGON((0 0, 100 0, 100 100, 0 100, 0 0))';
DECLARE @rectangle geometry = 'POLYGON((-10 5, 10 5, 10 15, -10 15, -10 5))';

SELECT
@rectangle.STIntersects(@square),
@rectangle.STIntersection(@square).STArea();
The preceding code listing creates a simple square Polygon, one hundred units high by one
hundred units wide, and a smaller rectangular Polygon that overlaps it on the left-hand side. These
shapes are illustrated in Figure 7-4.

Figure 7-4. Two overlapping Polygon geometries

The region of the overlap between the two geometries is a square 10 units wide by 10 units high,
exactly 100 units in area. However, the result returned by the STArea() method when used to calculate
the area created by the intersection in the preceding code listing is 99.999999999995.
The numbers involved in this example are hardly complex, so you would probably expect the result
to be exactly 100, right? Whats interesting is that, if we increase the size of the large square by
extending it further along the x- and y-axes without changing the region of overlap with the smaller

172

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

rectangle, the calculated area of intersection between the two geometries becomes less and less
precise. The following code listing demonstrates the calculated area of intersection between the
rectangle and increasing-sized square Polygons which it overlaps (the true area of overlap
maintained consistently at 100 units):
DECLARE @rectangle geometry = 'POLYGON((-10 5, 10 5, 10 15, -10 15, -10 5))';

DECLARE @square geometry = 'POLYGON((0 0, 1000 0, 1000 1000, 0 1000, 0 0))';


SELECT @rectangle.STIntersection(@square).STArea();
-- 99.9999999999713

DECLARE @square2 geometry = 'POLYGON((0 0, 100000 0, 100000 100000, 0 100000, 0 0))';


SELECT @rectangle.STIntersection(@square2).STArea();
-- 99.9999999962893

DECLARE @square3 geometry = 'POLYGON((0 0, 1e9 0, 1e9 1e9, 0 1e9, 0 0))';


SELECT @rectangle.STIntersection(@square3).STArea();
-- 99.9999690055833

DECLARE @square4 geometry = 'POLYGON((0 0, 1e12 0, 1e12 1e12, 0 1e12, 0 0))';


SELECT @rectangle.STIntersection(@square4).STArea();
-- 99.9691756255925

DECLARE @square5 geometry = 'POLYGON((0 0, 1e15 0, 1e15 1e15, 0 1e15, 0 0))';


SELECT @rectangle.STIntersection(@square5).STArea();
-- 67.03125

Note Because WKT is a parsed text format, you can state coordinates using scientific notation. In the preceding
code listing, the coordinate value 1e9 is equal to 1,000,000,000.

So whats going on here? Its not the rounding issues caused by conversion from binary to
decimal; as explained earlier, that only occurs when a new geometry is instantiated from, or retrieved
as, WKT. It is, however, another kind of internal coordinate conversion. Although SQL Server stores
coordinates as floating point binary values, it performs spatial calculations using integer arithmetic.
Floating point coordinates are, by the very nature of any floating point system, approximate, and
floating point arithmetic is not robust; calculating the difference between two similar floating point
values, for example, can create results containing errors with large orders of magnitude. Integer
arithmetic, in contrast, is always exact and reliable, whatever the integer values supplied.
Therefore, in order to perform spatial calculations in a robust fashion, SQL Server first snaps all
coordinates to an integer grid. Note that coordinate values are not simply converted to the closest
integer value (which would lose any fractional precision), but they are scaled to an integer value on a
dynamically sized grid. This can be tricky to visualize, but I find a helpful analogy is to think back to
mathematics lessons at school, in the days before MS Excel, when you had to draw graphs by hand on a
piece of graph paper. The skill in drawing such graphs was to choose an appropriate scale, deciding
how many units each cell on the grid should represent. The ideal scale is one in which all the data
values can be plotted exactly on a grid cell boundary, but subject to the fact that you had to get the
whole range of data to fit on a fixed size piece of paper.
SQL Server takes a similar approach; every spatial calculation takes place on an integer grid of
fixed size, equivalent to the fixed size sheet of graph paper. The geometries involved in the calculation
are scaled to fit onto this grid, which is equivalent to adjusting the size of the bars for a bar chart drawn

173

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

on graph paper, say. Having achieved the optimum scaling possible considering the overall fixed grid
size, each coordinate is then snapped to the closest cell on the grid. Some results will lie exactly on a
grid cell boundary, in which case the results returned by SQL Server will be exact but, in other cases,
the result will be subject to some error. Finally, the result of the integer calculation is then converted
back into floating-point coordinates again, in the same scale as originally supplied.
The amount of error introduced in this process is determined by two main factors:
The overall size of the grid on which calculations are performed.
The extent of the geometries involved in a given calculation.
With regard to the first of these factors, all spatial calculations in SQL Server 2012 are performed
on integer values with 42 bits of precision. This 42-bit precision is used across all SQL Server 2012 and
SQL Azure instances, and represents a considerable increase in precision compared to the 27-bit
integer grid used by SQL Server 2008/R2.
As for the second factor, the greater the range of coordinate values that must be covered by the
integer grid, then the more coarse the grid resolution must become in order to accommodate the full
set of data. As each cell becomes larger and the distance between cells increases, there is a greater
distance that coordinates might potentially be snapped. This explains the effect demonstrated in the
preceding code listing, in which the area of intersection between the square and rectangular Polygon
became less and less precise as the overall size of the square became larger and larger, because the
integer grid had to become ever more coarse to accommodate it, leading to less accurate results.

Caution Because SQL Server 2012 performs calculations using a more precise, 42-bit, integer grid than that
used in SQL Server 2008/R2, the results of any spatial computations may differ from those obtained under a
different version of the database server. This is especially important to consider when upgrading a database from
a previous version, because executing a spatial query in SQL Server 2012 may lead to different results than those
previously obtained from running exactly the same query under SQL Server 2008/R2.

Precision and Equality


The 42-bit integer grid is pretty high accuracy and it will only lead to fractional shifts in coordinates
that are very unlikely to have any significant impact on all but very large, very detailed geometries.
The preceding example deliberately used an exaggerated square geometry 1 quadrillion units (1e15)
in height and width in order to emphasize the effect caused.
However, it does raise an important point: even if the coordinates have not shifted much,
performing almost any type of operation can cause the coordinate values of an instance to shift a little.
This means you must be incredibly careful with any operations that require you to test whether two
instances are exactly equal (as in the STEquals() method). If one or both instances have been involved
in a calculation requiring them to be snapped to the integer grid and back, their coordinates may have
changed fractionally.
It is therefore best never to test for exact equality between two geometries, but to test whether two
instances are "equal" within a certain acceptable tolerance. The method for determining an
acceptable tolerance may differ depending on the geometry types being compared, and on the
particular application in question. For example, to compare whether two LineStrings are "equal," you
might check that the start points and end points lie within a certain distance of each other, that the
length of each LineString is roughly the same, and that the path of one LineString never deviates by
more than a certain amount from the path of the other. These checks can be incorporated in a function
as follows:

174

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

CREATE FUNCTION CompareLineStrings (@l1 geometry, @l2 geometry)


RETURNS bit AS
BEGIN

-- Only test LineString geometries


IF NOT (@l1.STGeometryType() = 'LINESTRING' AND @l2.STGeometryType() = 'LINESTRING')
RETURN NULL

-- Startpoints differ by more than 1 unit


IF @l1.STStartPoint().STDistance(@l2.STStartPoint()) > 1
RETURN 0

-- Endpoints differ by more than 1 unit


IF @l1.STEndPoint().STDistance(@l2.STEndPoint()) > 1
RETURN 0

-- Length differs by more than 5%


IF ABS(@l1.STLength() - @l2.STLength() / @l1.STLength()) > 0.05
RETURN 0

-- Any part of l2 lies more than 0.1 units from l1


IF @l1.STBuffer(0.1).STDifference(@l2).STEquals('GEOMETRYCOLLECTION EMPTY') = 0
RETURN 0

-- All tests pass, so return success


RETURN 1
END
This function compares different aspects of similarity between two LineString geometries. If any
of the tests fail then the function returns 0, otherwise the function returns 1.
An alternative method of comparing whether two geometries are equal is to use the STEquals()
method but, rather than directly compare whether the points in two geometries are the same, use it in
conjunction with the RoundGeography() method introduced earlier this chapter to compare whether the
sets of coordinates are the same after having been rounded to a certain number of decimal places.
Clearly, the fewer the number of decimal places to which values are rounded, the greater is the
tolerance within which the instances will be considered equal.

Validity
Whenever you create a new geometry or geography instance from a static method, such as
STGeomFromText(), SQL Server performs a number of checks on the created geometry. Examples of some
of these checks are as follows:
A LineString must have at least two distinct points.
A Polygon must contain at least four points, and the start point and end point
must be the same.
The spatial reference system in which the coordinates of any geography instance
are defined must correspond to one of the supported spatial reference systems in
the sys.spatial_reference_systems table.
If any of these checks fail, SQL Server will throw an exception. Generally speaking, each rule has
its own exception number and corresponding exception message. Some examples of specific error

175

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

messages are demonstrated in the following code listings. Firstly, attempting to create a geometry
LineString containing only one point:
DECLARE @LineMustHave2Points geometry;
SET @LineMustHave2Points = geometry::STLineFromText('LINESTRING(3 2)', 0);

System.FormatException: 24117: The LineString input is not valid because it does not have
enough distinct points. A LineString must have at least two distinct points.

The next code listing demonstrates a Polygon that only contains three points in its exterior ring:
DECLARE @PolygonMustHave4Points geometry;
SET @PolygonMustHave4Points = geometry::STPolyFromText('POLYGON((0 0, 10 2, 0 0))', 0);

System.FormatException: 24305: The Polygon input is not valid because the ring does not have
enough distinct points. Each ring of a polygon must contain at least three distinct points.

And the following example attempts to create a Point from well-formed WKT, but using an invalid
spatial reference identifier:
DECLARE @UnsupportedSRID geography;
SET @UnsupportedSRID = geography::STPointFromText('POINT(52 0)', 123);

System.FormatException: 24204: The spatial reference identifier (SRID) is not valid. The
specified SRID must match one of the supported SRIDs displayed in the
sys.spatial_reference_systems catalog view.

Each of the preceding examples generates an exception message, and the requested geometry is
not created. Fortunately, the exception messages state pretty clearly what the problem is, so finding
and fixing these errors is relatively easy.
However, even if you were to address these errors so that the static method executes successfully, it
does not necessarily mean that the resulting geometry is valid. In other words, it is possible to create
some geometries that do not cause any exceptions to be thrown, but which do not meet the OGC
requirements for that type of geometry. For example, the following code listing creates a geometry
Polygon in which the interior rings crosses the exterior ring. This code listing can be executed and will
successfully create a geometry instance with no exception occurring:
DECLARE @SelfIntersectingPolygon geometry;
SET @SelfIntersectingPolygon = 'POLYGON((0 0, 6 0, 3 5, 0 0), (2 2, 8 2, 8 4, 2 4, 2 2))';
Another example of an invalid geometry is a LineString that retraces itself, defining certain
segments of the LineString twice, as shown in the following code listing:
DECLARE @InvalidLinestring geometry;
SET @InvalidLinestring = 'LINESTRING(0 0, 10 0, 5 0)';
Once again, this geometry can be created and stored in SQL Server, even though it is not valid.
However, although it is possible to define, store, and retrieve invalid geometries without receiving
any exception messages, you will receive an exception if you attempt to use them in any spatial
queries. For example, if you were to try to determine the area of the preceding Polygon using the
STArea() method:

176

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

@SelfIntersectingPolygon.STArea();
you would get the following result:

System.ArgumentException: 24144: This operation cannot be completed because the instance is


not valid. Use MakeValid to convert the instance to a valid instance. Note that MakeValid
may cause the points of a geometry instance to shift slightly.

As the message contained within the exception suggests, in order to perform any operations on an
invalid geometry, you must first make it valid.

Testing Whether a Geometry Is Valid


The first step in dealing with invalid geometry data is to identify those geometries that are invalid. To
do so, you can use the STIsValid() method. The STIsValid() method tests whether a geometry or
geography instance meets all the criteria required to ensure that instance is valid, based on the OGC
definition of the type of geometry in question.
The STIsValid() method requires no parameters, and can be called as follows:
Instance.STIsValid()
If the instance is valid, this method returns a bit value of 1 and, if the instance is invalid, it returns
0. To demonstrate the STIsValid() method, consider the following example code:
DECLARE @Spike geometry
SET @Spike = geometry::STPolyFromText('POLYGON((0 0,1 1,2 2,0 0))', 0)
SELECT
@Spike.STAsText(),
@Spike.STIsValid().

POLYGON ((0 0, 1 1, 2 2, 0 0)) 0

In this case, although SQL Server lets us create the Polygon geometry without exception, it is not
valid according to the OGC specifications because the exterior ring consists of a single spike. Because
the exterior ring intersects itself it is not simple, which is one of the requirements of the Polygon
geometry.
Invalid geometries can only be created by supplying an invalid representation to a static method;
all geometry instances returned by SQL Server's instance methods (such as the result of the STUnion(),
STIntersection(), or STBuffer() methods) will always be valid. There is therefore no need to call
STIsValid() on the results of any of these methods.

Note All geometries returned as the result of a SQL Server method on an existing instance will be valid.

Finding Out Why a Geometry Is Invalid


Having identified that a geometry is invalid, the next logical step is to determine the cause of its
invalidity. In a simple example such as in the preceding code listing, it is relatively easy to glance at

177

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

the Well-Known Text and see the cause of the problem. However, this is not the case when dealing with
invalid geometries constructed from thousands of points, represented in binary format or imported
programmatically! This was a real headache in SQL Server 2008, when data imported from external
sources (such as U.S. census TIGER data) would frequently be declared as "invalid," with no further
information provided as to the cause.
Fortunately, SQL Server 2012 introduces a new method specifically intended to help diagnose
problems with invalid geometries, IsValidDetailed(). Rather than simply returning a bit value
representing if a geometry is valid or not, the IsValidDetailed() method returns a code and text
description of the cause of any invalidity of a geometry, as demonstrated in the following code listing:
DECLARE @g geometry = 'LINESTRING(0 0, 5 10, 8 2)';
DECLARE @h geometry = 'LINESTRING(0 0, 10 0, 5 0)';
DECLARE @i geometry = 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0), (1 0, 3 0, 3 1, 1 1, 1 0))';

SELECT
@g.STIsValid() AS STIsValid, @g.IsValidDetailed() AS IsValidDetailed
UNION ALL SELECT
@h.STIsValid(), @h.IsValidDetailed()
UNION ALL SELECT
@h.STIsValid(), @i.IsValidDetailed();
The results are as follows:

STIsValid IsValidDetailed
1 24400: Valid
0 24413: Not valid because of two overlapping edges in curve (1).
0 24404: Not valid because polygon ring (2) intersects itself or some other ring.

Based on the output of IsValidDetailed(), you can then investigate further the particular problem
identified with the geometry. For a full list of possible problems identified by IsValidDetailed(),
please refer to the appendix.
A final point worth noting is that STIsValid() and IsValidDetailed() check only that a geometry is
well-formed according to OGC definitions. They do not necessarily ensure that a geometry makes
logical sense. In the following code listing, the STIsValid() method confirms that the Point geometry is
valid, even though the coordinates lie outside the bounds of the area of use of the specified spatial
reference system (the minimum x,y bounds of EPSG:27700 are at 0,0):
DECLARE @g geometry;
SET @g = geometry::STGeomFromText('POINT(-300412 -200123)', 27700);
SELECT @g.STIsValid();

Making an Object Valid


Having used STIsValid() to identify those geometries that are invalid, we can now set about making
them valid. SQL Server 2012 comes with a dedicated method, MakeValid(), designed to "fix" invalid
geometry or geography data. The MakeValid() method can be used on any type of geometry, and will
return a valid geometry based on the set of points specified. However, in order to do so, the resulting
geometry may be of a different type than that originally supplied, and the coordinate values of that
geometry may also have shifted fractionally.
To understand how MakeValid() works, let's examine its output when called on the invalid geometries
discussed in the previous section. Firstly, let's consider the case of the self-intersecting LineString:

178

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

DECLARE @InvalidLinestring geometry;


SET @InvalidLinestring = 'LINESTRING(0 0, 10 0, 5 0)';
SELECT @InvalidLinestring.MakeValid().ToString()
In this example, the reason for the invalidity is that the final coordinate (5,0) causes the path of the
LineString to retrace itself. The valid geometry returned by the MakeValid() method therefore simply
omits this line segment, as follows:

LINESTRING(0 0, 10 0)

Note that the result is still a LineString, and still represents exactly the same set of points,
although the end point of the LineString is now different. Now let's consider the example of the
Polygon consisting of a single spike.
DECLARE @Spike geometry = 'POLYGON((0 0,1 1,2 2,0 0))';
SELECT @Spike.MakeValid().ToString()
In this case, the appropriate geometry type to represent the three distinct points in the supplied
WKT is not a Polygon as originally stated, but a LineString. The result of the MakeValid() method in this
case is therefore a valid LineString, as follows:

LINESTRING (2 2, 1 1, 0 0)

Caution Note that in some cases (as demonstrated here), the MakeValid() method may return a different type
of geometry than that originally supplied.

Finally, let's consider the case of the Polygon in which the interior ring crossed over the exterior
ring:
DECLARE @SelfIntersectingPolygon geometry;
SET @SelfIntersectingPolygon = 'POLYGON((0 0, 6 0, 3 5, 0 0), (2 2, 8
2, 8 4, 2 4, 2 2))';
SELECT @SelfIntersectingPolygon.MakeValid().ToString();
In this example, the ambiguous areas of space defined by the rings of the original supplied Polygon
have been divided into four separate Polygons, contained within a MultiPolygon collection. The
segments of interior ring that lay outside the original exterior ring have been treated as defining new
Polygons, as shown in Figure 7-5.

MULTIPOLYGON (
((2.4000000000000004 4, 3.6 4, 3 5, 2.4 4, 2.4000000000000004 4)),
((2 3.3333333333333335, 2.4 4, 2 4, 2 3.3333333333333335)),
((4.8 2, 8 2, 8 4, 3.6000000000000005 4, 4.8 2)),
((0 0, 6 0, 4.8 2, 2 2, 2 3.3333333333333335, 0 0))
)

179

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

Notice also that the result of this example demonstrates how precision issues can arise as a result
of any spatial operations in SQL Server, including the result of the MakeValid() operation (the very
first coordinate, if calculated using decimal arithmetic, would be 2.4 exactly, not 2.4000000000000004).

Figure 7-5. A MultiPolygon created as a result of calling MakeValid() on an invalid Polygon in which the
interior ring intersects the exterior ring.

If called on a existing valid instance, MakeValid() has no effect. Therefore it is a very useful
method that can be safely called to ensure geography or geometry data is valid before passing it on for
further calculations.

INVALID GEOMETRY, OR THE WRONG SORT OF GEOMETRY?


Issues of invalid data generally only occur when you are creating geometry or geography from raw data
imported into SQL Server 2012, or created programmatically through one of the builder classes. SQL Server
will never return invalid geometries from one of its instance methods, so once you've got a set of clean
valid data in your database, you can generally use it without too many worries as to issues of invalidity.
Sometimes, problems apparently caused by the requirement of validity are actually due to the fact that the
wrong geometry type has been selected to represent an item of data. The most common example of this I
see is when people use LineString geometries to represent routes using data gathered from GPS devices.
The problem in such cases is that, as discussed previously, if any segment of a LineString geometry
retraces itself, the LineString will be considered invalid.
Considering that the GPS data is recorded as a set of distinct readings at separate intervals in time, the
most appropriate geometry to use in such a case is arguably a MultiPoint rather than a LineString (because
there is no evidence to suggest that the vehicle actually followed the straight line route between each point
in the log as indicated by the linear interpolation of a LineString geometry). The M-coordinate of each Point
in the MultiPoint can be used to store the order in which those points were recorded. Alternatively, a route
that retraces itself may have to be represented as a number of distinct LineString elements in a
MultiLineString collection.

180

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

Handling Errors
The geometry and geography datatypes are CLR datatypes, so spatial operatons are executed within the
SQLCLR environment, the .NET Framework Common Language Runtime process hosted by SQL
Server. The .NET Framework provides its own exception-handling mechanism, which is quite separate
from the mechanism used to deal with T-SQL exceptions typically encountered in SQL Server. So, how
do the two systems interact when an exception occurs when dealing with geography or geometry data?
SQL Server automatically wraps an exception handler around any SQLCLR managed code
executed from within SQL Server. This includes any user-defined CLR procedures or functions as well
as the system-defined CLR methods used by the geography and geometry datatypes. The purpose of the
wrapper is that, if any managed code throws an exception, it is caught by the wrapper, which then
generates an error, rather than allowing the exception to bubble up further through the system. The
error message created by the wrapper contains details of the SQLCLR exception, together with a stack
trace of the point at which it occurred.

Dissecting a SQLCLR Exception


To understand the way in which SQLCLR errors are handled, let's look at an example by revisiting one
of the code listings from the previous section:
DECLARE @LineMustHave2Points geometry;
SET @LineMustHave2Points = geometry::STLineFromText('LINESTRING(3 2)', 0);
This code listing raises an exception because the supplied WKT specifies a LineString containing
only one point. Executing this code listing in SQL Server Management Studio produces the following
message:

Msg 6522, Level 16, State 1, Line 2


A .NET Framework error occurred during execution of user-defined routine or aggregate
"geometry":
System.FormatException: 24117: The LineString input is not valid because it does not have
enough distinct points.
A LineString must have at least two distinct points.
System.FormatException:
at Microsoft.SqlServer.Types.Validator.Execute(Transition transition)
at Microsoft.SqlServer.Types.ForwardingGeoDataSink.EndFigure()
at Microsoft.SqlServer.Types.OpenGisWktReader.ParseLineStringText()
at Microsoft.SqlServer.Types.OpenGisWktReader.ParseTaggedText(OpenGisType type)
at Microsoft.SqlServer.Types.OpenGisWktReader.Read(OpenGisType type, Int32 srid)
at Microsoft.SqlServer.Types.SqlGeometry.GeometryFromText(OpenGisType type, SqlChars
text, Int32 srid)

There are several components to this error message, so let's dissect it into its separate elements,
starting with the first line: Msg 6522, Level 16, State 1, Line 2. This line actually contains four
pieces of information:
The value immediately following Msg tells us the error number that has occurred;
in this case, it is error number 6522.
The Level tag informs us that the error in this case has been classified as a level 16
error. Every error in SQL Server is assigned a level in the range between 1 and 25,
and this value can be used as an approximate indication of the severity of the
error. A level 16 error falls into the category of "errors that can be corrected by the

181

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

user," as described in Microsoft documentation. The majority of exceptions thrown


by SQL Server are in this category, including constraint violations, parsing and
compilation errors, and many other runtime exceptions.
Each exception also has a State, which contains information about the exception
that is used internally by SQL Server. This error has a State of 1, but because the
values that SQL Server uses for this tag are not documented, this information is
not very helpful!
Line 2 informs us that the error occurred in the second line of the submitted
batch of code, in the attempt to instantiate the geometry LineString.
The second line of the message contains the description of the error that has occurred. In this case,
it informs us that A .NET Framework error occurred during execution of user-defined routine or
aggregate. The important thing to note here is that this message, and the preceding error number,
level, and state, relate to the generic error raised by SQL Server whenever any unhandled exception
occurs within SQLCLR. This is exactly the same error as would be raised if an error were encountered
in any user-defined CLR procedure, or when using the hierarchyid datatype (another system-deifned
CLR datatype), for example.
It is only following the generic error description that we get on to the specific details of the SQLCLR
exception that caused this error to be raised in this case. The next line of the error message reports that
an exception (of type System.FormatException) 24117 was encountered, and we get an unusually verbose
and helpful description of the particular error that occurred: The LineString input is not valid because
it does not have enough distinct points. A LineString must have at least two distinct points.
Following the error message is the stack trace, which reveals where the error occurred in
execution; in this case the exception originally occurred within the Execute() method of the Validator
class, which had been called from the SqlGeometry GeometryFromText() method.
Let's now compare this to the message received from another one of the other earlier examples,
which occurs when trying to use an unsupported SRID for the geography datatype:
DECLARE @UnsupportedSRID geography;
SET @UnsupportedSRID = geography::STPointFromText('POINT(52 0)', 123);
The message received is as follows:

Msg 6522, Level 16, State 1, Line 2


A .NET Framework error occurred during execution of user-defined routine or aggregate
"geography":
System.FormatException: 24204: The spatial reference identifier (SRID) is not valid. The
specified SRID must match one of the supported SRIDs displayed in the
sys.spatial_reference_systems catalog view.
System.FormatException:
at Microsoft.SqlServer.Types.GeographyValidator.ValidateSrid(Int32 srid)
at Microsoft.SqlServer.Types.ForwardingGeoDataSink.SetSrid(Int32 srid)
at Microsoft.SqlServer.Types.CoordinateReversingGeoDataSink.SetSrid(Int32 srid)
at Microsoft.SqlServer.Types.OpenGisWktReader.Read(OpenGisType type, Int32 srid)
at Microsoft.SqlServer.Types.SqlGeography.GeographyFromText(OpenGisType type, SqlChars
taggedText, Int32 srid)

Notice that the error number, level, and state, together with the initial part of the error message are
identical to that received previously, even though the cause of the error in this case was completely

182

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

different. It is only by examining the error message in full that we get to see the underlying CLR
exception that caused the error to be triggered (in this case, System.FormatException 24204).

Error-Handling Mechanisms
Generally speaking, it is best to deal with any exceptions in code at the lowest level possible. In the
case of user-defined CLR functions, this means adding code to handle the exception within the CLR
function itself, in which case it never needs to be caught at the T-SQL level. However, this is not an
option for system-defined CLR types; there is no way to add exception-handling code to the sealed
Microsoft.SqlServer.Types.dll library, so any exceptions will inevitably bubble up and trigger the
error handler in the SQLCLR wrapper demonstrated in the previous section.
How, then, should you create specific code paths to handle such exceptions? The general approach
to error-handling in T-SQL (and in many other programming languages) is to use a TRY/CATCH
construct, containing two blocks of code. The try block contains exception-prone code that is to be
"tried." The second block of code, called the catch block, contains code that should be called in the event
that the code in the try block fails. As soon as any exception occurs within the try block, execution
immediately jumps into the catch block, which is known as "catching" the exception. Within the catch
block, different courses of action can be taken depending on the nature of the exception that occurred,
which can (generally) be determined by examining the value of the ERROR_NUMBER() function as shown
in the following code listing:
BEGIN TRY
SELECT geometry::STPolyFromText('POLYGON((0 0, 10 2, 0 0))', 0);
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 123
-- Code to deal with error 123 here
SELECT 'Error 123 occurred'
ELSE IF ERROR_NUMBER() = 456
-- Code to deal with error 456 here
SELECT 'Error 456 occurred'
ELSE
SELECT ERROR_NUMBER() AS ErrorNumber;
END CATCH
The problem is that this common approach to selecting conditional code paths based on T-SQL
error number won't work for the geography and geometry datatypes, because every exception occurring
within managed code will lead to the same T-SQL error: generic error 6522.
In order to create different code paths according to the CLR exception that occurred, we must parse
the contents of ERROR_MESSAGE() to try to identify the original CLR exception number specified in the
stack trace. The exceptions generated by the system-defined CLR types have five-digit exception
numbers in the range 24000 to 24999, so can be distilled from the ERROR_MESSSAGE() string using the T-
SQL PATINDEX function. The following code listing demonstrates this approach:
BEGIN TRY
SELECT geometry::STPolyFromText('POLYGON((0 0, 10 2, 0 0))', 0);
END TRY
BEGIN CATCH
-- Has a SQLCLR error occurred?
IF ERROR_NUMBER() = 6522
BEGIN
-- Retrieve the error message
DECLARE @errorMsg nvarchar(max) = ERROR_MESSAGE();

183

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

DECLARE @exc int;


-- Distil the SQLCLR exception number from the error message
SET @exc = SUBSTRING(@errorMsg, PATINDEX('%: 24[0-9][0-9][0-9]%', @errorMsg) + 2, 5);
IF @exc = 24305
-- Code to deal with exception 24305 here
SELECT 'Exception 24305 occurred';
ELSE IF @exc = 24000
-- Code to deal with exception 24000 here
SELECT 'Exception 24000 occurred';
ELSE
SELECT '...';
END
END CATCH
With this revised code, the catch block is able to dictate different courses of action depending on
the nature of the SQLCLR exception that caused the try block to fail. In this case, the misformed Polygon
in the try block generates a 24305 exception, and the appropriate code to deal with this is triggered in
the catch block.
As an alternative approach to error-handling, you might choose to create your own custom
wrapper methods around each of the system-defined methods in SqlServer.Types.dll, which check for
and handle any CLR exceptions before passing the result back to SQL Server. An example of such a
wrapper placed around the geography Parse() method is shown in the following code listing:
[Microsoft.SqlServer.Server.SqlFunction()]
public static SqlGeography GeogTryParse(SqlString Input)
{
SqlGeography result = new SqlGeography();
try
{
result = SqlGeography.Parse(Input);
}
catch
{
// Exception Handling code here

}
return result;
}
The preceding code listing wraps a call to the SqlGeography Parse() method in a try block, which
allows the possibility to include appropriate exception handling code in the corresponding catch block.
To use this approach, you could register the GeogTryParse() function in SQL Server and use it in place
of the built-in geography Parse() method.
Table 7-1 lists some of the common exceptions you might want to create code paths to deal with,
and their corresponding exception numbers. For a full list of all exceptions, please see the appendix of
this book.

184

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

Table 7-1. Spatial exception numbers and messages

Exception Number Description

24111 The well-known text (WKT) input is not valid.

24112 The well-known text (WKT) input is empty. To input an empty instance,
specify an empty instance of one of the following types: Point, LineString,
Polygon, MultiPoint, MultiLineString, MultiPolygon, or GeometryCollection.

24114 The label {0} in the input well-known text (WKT) is not valid. Valid labels are
POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING,
MULTIPOLYGON, or GEOMETRYCOLLECTION.

24117 The LineString input is not valid because it does not have enough distinct
points. A LineString must have at least two distinct points.

24118 The Polygon input is not valid because the exterior ring does not have enough
points. Each ring of a polygon must contain at least three distinct points.

24119 The Polygon input is not valid because the start and end points of the exterior
ring are not the same. Each ring of a polygon must have the same start and
end points.

24120 The Polygon input is not valid because the interior ring number {0} does not
have enough points. Each ring of a polygon must contain at least three points.

24121 The Polygon input is not valid because the start and end points of the interior
ring number {0} are not the same. Each ring of a polygon must have the same
start and end points.

24306 The Polygon input is not valid because the start and end points of the ring are
not the same. Each ring of a polygon must have the same start and end points.

Summary
In this chapter, you learned some of the issues that can affect the quality, accuracy, and robustness of
spatial data in SQL Server, and some of the methods that can be used to prevent the negative effect of
such issues.
SQL Server stores coordinates as 8-byte floating-point binary values, according
to the IEEE-754 specification.
When you create a geography or geometry instance from a static method such as
STGeomFromText(), the coordinate values are converted from nvarchar to binary(8).
This has the possibility for truncation or rounding of the supplied values.
When you perform calculations involving geometries, SQL Server uses integer
arithmetic based on a dynamically scaled 42-bit grid. This introduces an
element of approximation into the results of any spatial methods.

185

www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS

You should avoid attempting to perform tests of exact equality between any two
geometries, instead testing whether they lie within a certain tolerance of each
other.
It is possible to create invalid geometries, but these can only be stored and
retrieved. In order to perform other operations, they must be made valid first.
The process of making a geometry valid may cause it to change type (e.g., from a
LineString to a MultiLineString) and may also cause its coordinates to shift
slightly.
Errors encountered when using the geography and geometry datatypes will
initially trigger a CLR exception. This will in turn be caught by a wrapper and
lead to a T-SQL error 6522.
Although you cannot use ERROR_NUMBER() to switch between alternative code
paths for CLR exceptions, you can distill the contents of ERROR_MESSAGE() to
retrieve the number of the underlying CLR exception.

186

www.it-ebooks.info
CHAPTER 8

Transformation and Reprojection

SQL Server supports spatial data defined in a wide variety of spatial reference systems used across the
world. When you create an individual item of geography data, you can use any one of the 392 spatial
reference systems listed in the sys.spatial_reference_systems table. And if you create geometry data,
you can use any spatial reference system you like; you can even define your own coordinate system
that uses coordinates measured from an origin somewhere in your living room.
Coordinate values are only valid for the spatial reference system in which they were defined, and
you can only compare two coordinates if they were obtained from the same coordinate system. In SQL
Server, this means that when you want to perform an operation involving two items of spatial datato
work out whether they intersect, to calculate the distance between them, or to join them together, for
exampleboth items must be of the same data type (i.e., both geography or both geometry), and both
must be defined using the same SRID. If you attempt to compare geometry data with geography data you
will receive an exception. If you attempt to compare two instances of the same type defined using
different SRIDs you will receive a NULL result.
The constraint that every item of data must use the same spatial reference system may not
present too many difficulties if you source your own spatial data; simply choose a reference system
that is appropriate for your application, and ensure that all coordinates are measured and consistently
recorded using that system. Unfortunately, as developers we rarely have such control over how our
data is gathered, and we frequently have to accommodate data provided in a variety of different SRIDs.
For example, data from the U.S. Census bureau (http://www.census.gov) is provided using geographic
coordinates based on the NAD83 datum (SRID 4269), but clearinghouses and GIS departments of
individual states typically use the state plane coordinate system of that state. Government agencies of
some other countries, such as Great Britain and Finland, use spatial reference systems based on the
national grid of those particular countries. And then, of course, there is 4326, the most commonly used
SRID for global applications.
SQL Server does not include any inbuilt methods to convert data between the two spatial
datatypes, or to transform coordinates between different spatial reference systems of the same
datatype. However, in this chapter I'll show you how you can integrate a freely available, open source
projection library, Proj.NET, into SQL Server, and use the methods of this library to transform
coordinates between different spatial reference systems suitable for both the geography and geometry
datatypes. By transforming data from different sources into a common, consistent spatial format, you
can then use the full range of spatial methods across your whole dataset.

Datum Transformation
Every spatial reference system used to define geospatial information is based on an underlying model
of the Earth: the datum. The datum describes the size and shape of the ellipsoid used to approximate the
shape of the Earth and the reference points on the surface of the Earth used to realize that model.

187

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

To compare the definitions of spatial reference systems based on different datums, let's retrieve
the well-known text (WKT) representation of two geographic spatial reference systems listed in SQL
Server's sys.spatial_reference_systems table.

Note The Classical scholars among you may suggest that the plural of datum, referred to in the preceding
sentence, is data. Although this may be true of the Latin term from which the word is derived, when used in the
context of a geodetic datum, the correct English plural is datums. Fortunately, this also prevents confusion with the
more common use of the word "data" with which we are familiar in SQL Server!

First, let's look at the Luxembourg 1930 spatial reference system, SRID 4181. To retrieve the well-
known text for this system, execute the following query:
SELECT well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4181;
The results are shown following (indents added):
GEOGCS[
"Luxembourg 1930",
DATUM[
"Luxembourg 1930",
ELLIPSOID["International 1924", 6378388, 297]
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]
Now let's compare this to the Deutches Hauptdreiecksnetz system used in Germany, which is SRID
4314. To retrieve the well-known text for this system, execute the following query:
SELECT well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4314;
Again, the results have been formatted and shown as follows:
GEOGCS[
"DHDN",
DATUM[
"Deutsches Hauptdreiecksnetz",
ELLIPSOID["Bessel 1841", 6377397.155, 299.1528128]
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]
Examining the preceding WKT output, we can see several similarities between the two spatial
reference systems; they are both geographic coordinate systems, in which coordinates are stated in
angular degrees measured from the Greenwich prime meridian. However, the datum on which those
coordinates are applied is different in each system. The Luxembourg 1930 datum defines a set of points

188

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

relative to the International 1924 ellipsoid, which has a radius of 6,378,388 meters at the equator, and
6,356,911.946 meters at the Poles. The Deutches Hauptdreiecksnetz system uses the Bessel 1841
ellipsoid, which is based on a more conservative estimate of the size of the earth, with an ellipsoid of
equatorial radius 6,377,397.155 meters and polar radius of 6,356,078.965 meters.

Note Instead of giving the dimensions of both the equatorial and polar axes, the WKT definition of a spatial
reference gives the length of the larger, semi-major axis together with the inverse flattening ratio. This ratio, which
generally has a value of around 300, indicates how much larger the earth is assumed to be at the equator than at
the poles, and can be used to derive the polar axis of the ellipsoid model.

The different ellipsoid models used in these datums are illustrated in Figure 8-1.

Figure 8-1. Comparing different geodetic datums

You cannot compare geography data defined using SRID 4181 to geography data defined using SRID
4314 without first transforming the sets of data to be based on a consistent datum. Failure to do so (i.e.,
treating geographic coordinates defined relative to one datum as if they had been defined on another)
can lead to coordinate locations being incorrect by several hundred meters.
Datum transformation involves converting coordinate values defined using one geodetic datum
into the equivalent coordinate values based on a different datum. In the context of the previous
example, we could either transform the coordinates defined using the Luxembourg datum to the DHDN
datum, or vice versa; the important thing is that all the resulting data should be defined relative to the
same datum.

189

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Transformation Algorithms
The process of datum transformation involves applying a mathematical function to adjust coordinate
values based on a specified algorithm and set of parameters. There are several alternative
transformation algorithms available, with varying degrees of complexity and accuracy. Generally
speaking, the greater the number of parameters used in the transformation, the more complex the
algorithm, and the greater the accuracy of the results.
In the following section, I'll examine three different algorithms that can be used to transform
coordinates between geodetic datums, in increasing order of complexity.

Coordinate Offsets (Two Parameters)


The simplest form of transformation involves adding constant values to the latitude and longitude
coordinate values defined relative to one datum to approximate the corresponding coordinates on a
second datum. This sort of transformation algorithm can be expressed in generalized form as follows:
Latitude2 = Latitude1 + Latitude
Longitude2 = Longitude1 + Longitude
This algorithm is very simplistic and requires only two parameters: the Latitude and Longitude
offsets to be added to each supplied coordinate value. For a transformation between any two given
datums, the same latitude and longitude offsets are applied consistently to every coordinate value
across the whole globe, as illustrated in Figure 8-2.

Figure 8-2. Applying geographic coordinate offsets

As an example, the location of Cardiff, Wales, has coordinates of (51.4826, 3.18183) relative to the
WGS84 datum. The same location expressed relative to the OSGB36 datum is (51.4821, 3.18057). Thus
we can define a set of offsets to convert from WGS84 to OSGB36 as follows:
, The change in latitude between "from" datum and "to" datum = 0.0005
degrees.
, The change in longitude between "from" datum and "to" datum = 0.00126
degrees.
However, this method is unable to allow for any change in shape between the datums, and is only
really suitable for low-precision applications in a limited area. The offsets calculated here are based on

190

f
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

the differences between datums calculated at a single point and, as the geographic range of coordinates
to be transformed increases, the accuracy of the results obtained by this method diminishes.

Molodensky Transformation (Five Parameters)


The Molodensky transformation, named after Russian geodesist Mikhail Molodensky, is an algorithm
for converting coordinates between datums based on five parameters. These parameters describe the
translation of the center of the ellipsoid between the two datums in x-, y-, and z-axes (x, y, and z),
together with the difference in the length of the semi-major axis (a) and flattening ratio (f). Unlike
the simple geographic coordinate offset approach described previously, the Molodensky
transformation can account for a change in the shape, size, and position of the ellipsoid model
between two datums.
The Molodensky transformation can be expressed in general form as follows:
Latitude2 = f(Latitude1, x, y, z, a, f)
Longitude2 = f(Longitude1, x, y, z, a, f)
Figure 8-3 illustrates the process of Molodensky transformation of a point between two datums.

Figure 8-3. Molodensky five-parameter transformation applies translation and scaling of datum, but not
rotation.

Molodensky transformations using an appropriate set of parameters can generally achieve


results that are accurate to within a few meters. For example, the following set of parameters can be
used to transform from the local datum used in the Bahamas to the WGS84 datum:
a, Difference in semi-major axis between "from" datum and "to" datum = 69.4
meters
f, Difference in flattening ratio between datums = 0.000037264639
x, Translation along the x-axis = 4 meters
y, Translation along the y-axis = 154 meters
z, Translation along the z-axis = 178 meters
Because the Molodensky transformation only translates and scales parallel to the plane of the
existing defined axes, it is not suitable for cases in which the orientation of the ellipsoid has been rotated.

191

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Helmert Transformation (Seven Parameters)


The Helmert transformation, named after German geodesist Friedrich Helmert, requires seven
parameters to describe the transformation from one datum to another. For this reason, it is also
commonly known as the seven-parameter transformation. The parameters represent a translation
along the three coordinate axes (x, y, and z), as well as a rotation about each of the three axes. A final
parameter provides a scaling factor used across all three axes.
The process involved in the Helmert transformation is illustrated in Figure 8-4.

Figure 8-4. The Helmert transformation rotates and translates coordinate axes between datums

For example, the parameters required to perform a Helmert transformation from the Bessel 1841
datum to the WGS84 datum are as follows:
cx, Translation along the x-axis = 582 meters
cy, Translation along the y-axis = 105 meters
cz, Translation along the z-axis = 414 meters
rx, Rotation about the x-axis = 1.04 arcseconds
ry, Rotation about the y-axis = 0.35 arcseconds
rz, Rotation about the z-axis = 3.08 arcseconds
s, Scaling factor = 8.3 parts per million

Note The preceding section does not provide an exhaustive list of transformation algorithms, but describes
some of those most commonly used in GIS applications. Other more complex algorithms are sometimes used,
particularly in specialized scientific use, where the degree of accuracy required is much greater.

192

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Transforming to and from WGS84


In theory, it would be possible to compile a list of the transformation parameters required to convert
between any datum and any other datum, using any of the transformation algorithms listed
previously. With this knowledge, we could apply the relevant parameters to transform coordinates
defined using the Luxembourg 1930 datum directly into the DHDN datum, for example. However, to
specify the explicit conversion between every combination of the 392 geodetic systems supported by
SQL Server would require 153,272 sets of parameters. This would correspond to over one million
individual Helmert transformation parameters!
Fortunately, this is not generally necessary, and there is a much more efficient way of defining
the parameters required to convert between any two datums. The key to this approach is that all of the
transformation algorithms described in the preceding section are reversible. That is, if applying a
given set of parameters transforms from datum A to datum B, then applying the inverse set of
parameters (i.e., supplying the negative value of each parameter) will transform from datum B back to
datum A again. (Note that although this is true of all the algorithms described here, it is not true of
every transformation algorithm. The MolodenskyBadekas algorithm, for example, is not reversible.)
Therefore, rather than determine the set of parameters to convert from any datum directly into
any other datum, we need only define the parameters required to convert from any datum to a single
common datum. We can then transform coordinates between any two datums using a two-step process:
first applying the parameters to convert from the source datum into the common datum, and then
applying the inverse set of parameters associated with the target datum to convert from the common
datum to the target. Of course, if the common datum happens to be the source or target datum desired,
then one of these steps can be omitted.
The single common intermediate datum into which datums are generally transformed is, as you
may have guessed, WGS84, and the transformation most commonly used is the seven-parameter
Helmert transformation. The seven parameters required to transform into the WGS84 datum are
commonly included in the WKT definition of a datum, following the TOWGS84 keyword. This is
demonstrated in the following code listing, which restates the WKT representation of the Luxembourg
1930 spatial reference system given earlier but now includes the TOWGS84 parameters required to
convert coordinates from this system to WGS84 (highlighted in bold):
GEOGCS[
"Luxembourg 1930",
DATUM[
"Luxembourg_1930",
ELLIPSOID["International 1924", 6378388, 297],
TOWGS84[-193, 13.7, -39.3, -0.41, -2.933, 2.688, 0.43],
],
PRIMEM["Greenwich", 0],
UNIT["Degree",0.01745329251994328]
]
The first three values following the TOWGS84 keyword represent the translation in the x-, y-, and z-
axes, measured in meters. The next three parameters represent the rotation around the x-, y-, and z-
axes, measured in seconds of an arc, and the final parameter represents the scale change, measured in
parts per million. Taken together, these parameters provide all the information required to convert
from this datum to the WGS84 datum.
The seven parameters required to perform a Helmert transformation from the Deutsches
Hauptdreiecksnetz system to WGS84 may be stated as part of the following WKT:
GEOGCS[
"DHDN",
DATUM[
"Deutsches Hauptdreiecksnetz",

193

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

ELLIPSOID["Bessel 1841", 6377397.155, 299.1528128]


TOWGS84[582, 105, 414, -1.04, -0.35, 3.08, 8.3],
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]
Knowing the additional TOWGS84 parameters just given, we can now convert from Luxembourg to
DHDN using two transformations: first applying the TOWGS84 parameters of the Luxembourg datum to
convert from Luxembourg to WGS84, and then applying the inverse TOWGS84 parameters from the
DHDN datum to convert from WGS84 to DHDN. This process is illustrated in Figure 8-5.

Figure 8-5. Applying datum transformation parameters to convert between any two datums via WGS84

Note There are different conventions regarding how to define the three rotation parameters supplied to the
Helmert transformation, which essentially differ in their interpretation of what is being rotated. In a position vector
rotation, the axes remain constant and the frame of points is rotated around the axes. In a coordinate frame
rotation, the points remain constant while the axes themselves are rotated. The practical difference between these
two definitions is that the rotation parameters have changed sign. If you obtain Helmert transformation parameters
it's best to check which convention was used, and reverse the rotation parameters as necessary.

Projection, Unprojection, and Reprojection


In the preceding section, we considered how to transform coordinates defined relative to different
geodetic datums. In SQL Server terminology, this relates to converting geography data defined using
one spatial reference system to geography data using a different spatial reference system. Let's now
turn our attention to consider the process involved in converting geometry data between different
spatial reference systems.
Remember that while geography data is defined using geographic coordinates of latitude and
longitude, geometry data uses Cartesian coordinates, which are generally based on a projected spatial
reference system. You do not normally convert coordinates directly from one projected system into
another; instead, you first convert projected coordinates back to 3D geographic coordinates on their
underlying geodetic model (i.e., you "unproject" them, also known as an inverse projection). Then, you

194

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

reproject them back onto a two-dimensional plane using the set of parameters required by the
destination coordinate system (known as a forward projection).
To illustrate these concepts, lets consider the NAD83 geographic coordinate reference system,
SRID 4269, which is widely used across North America. The well-known text representation of this
spatial reference system is as follows:
GEOGCS["NAD83",
DATUM["North American Datum 1983",
SPHEROID["GRS 1980", 6378137.0, 298.257222101, AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AUTHORITY["EPSG","4269"]]
The Massachusetts Mainland spatial reference system (EPSG:2249) is a projected spatial reference
based on a Lambert Conic Conformal projection of the NAD83 datum. The well-known text
representation of this system is as follows:
PROJCS["NAD83 / Massachusetts Mainland",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980", 6378137.0, 298.257222101, AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AUTHORITY["EPSG","4269"]],
PROJECTION["Lambert Conic Conformal (2SP)", AUTHORITY["EPSG","9802"]],
PARAMETER["central_meridian", -71.5],
PARAMETER["latitude_of_origin", 41.0],
PARAMETER["standard_parallel_1", 42.68333333333334],
PARAMETER["false_easting", 200000.0],
PARAMETER["false_northing", 750000.0],
PARAMETER["standard_parallel_2", 41.71666666666667],
UNIT["US survey foot", 0.3048006096012192, AUTHORITY["EPSG","9003"]],
AXIS["Easting", EAST],
AXIS["Northing", NORTH],
AUTHORITY["EPSG","2249"]
]
The NAD83 Universal Transverse Mercator Zone 18N, SRID 26918, is another projected spatial
reference system based on the NAD83 datum. It shares the same underlying geodetic model of the
Earth as the Massachusetts Mainland system. However, it uses a transverse Mercator projection rather
than a conic projection in which coordinate values are measured in meters rather than feet.
The well-known text for SRID 26918 is as follows:
PROJCS["NAD83 / UTM zone 18N",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980", 6378137, 298.257222101, AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]],
UNIT["degree",0.01745329251994328],
AUTHORITY["EPSG","4269"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],

195

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

PARAMETER["central_meridian",-75],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1, AUTHORITY["EPSG","9001"]],
AXIS["Easting",EAST],
AXIS["Northing",NORTH],
AUTHORITY["EPSG","26918"]
]
To convert coordinate data from the Masachussets State Plane projected system into UTM Zone
18N projection first requires unprojection into NAD83, the underlying geodetic model on which both
projections are based, and then reprojection into UTM Zone 18N. This is illustrated in Figure 8-6.

Figure 8-6. Converting between two projected coordinate systems

In the preceding example, the source and target projected spatial reference systems were both
based on the same underlying geodetic datum: the North American Datum 1983. But what if you
wanted to convert between projected spatial reference systems based on different datums?
To do this, you need to add an additional step to perform a datum conversion. The process is as
follows:
First, unproject from the source projection to geodetic coordinates based on the
source datum.
Then, perform a datum conversion from the source datum to the target datum
using, for example, the Helmert transformation discussed earlier this chapter.
Finally, reproject geographic coordinates from the target datum into the target
projected spatial reference system.
To demonstrate, suppose that you wanted to convert coordinates from the Massachussets Mainland
spatial reference system based on the NAD1983 datum into the CONUS Albers projection based on the
NAD1927 datum. The process required to perform this conversion is illustrated in Figure 8-7.

196

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Figure 8-7. Unprojection, datum transformation, and reprojection from one projected spatial reference
system to another.

Performing Manual Calculations


The processes of datum transformation and projection described previously involve nothing more
than a series of deterministic mathematical conversions. You could, if you wanted, write your own
functions to perform the necessary calculations to convert coordinates between any spatial reference
system of the geography or geometry datatype.
However, converting between geographic coordinates on a three-dimensional, round model and
projected coordinates on a two-dimensional flat surface can involve some pretty tricky mathematics.
For example, the following formulae describe the calculations necessary to project latitude and
longitude coordinates into x- and y-coordinates in a Lambert conformal conic projection, as used in
the Massachussetts Mainland system described in the preceding example:

x = sin[n( 0)]
y = 0 cos[n( 0)]
where

197

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Fortunately, understanding and implementing these detailed formulae is completely


unnecessary. Because clever people have already done the hard work for us, rather than reinvent the
wheel, we can use one of the many available spatial libraries that handle the conversion between
datums and projections. This is exactly what we'll do in the next section.

Creating a Reusable Spatial Conversion Library


There are several libraries available that provide the necessary algorithms for transforming and
projecting coordinates between spatial references systems. In this section, I'll show you how to use the
Proj.NET library, which is an open source project available on the codeplex website at
http://www.codeplex.com/ProjNET. The Proj.NET library contains all the necessary functions to support
datum transformations and projection between a number of common map projections, including
Mercator, Albers, and Lambert Conformal.

Storing Transformation Parameters


Before we get onto the issue of the transformation function itself, we need to define a structure in
which to store the parameters necessary to convert between different spatial reference systems.
Although SQL Server already contains details of many spatial reference systems in the
sys.spatial_reference_systems table, this information is not sufficient for transformation and
projection purposes. Why? Well, firstly, this table stores only those geodetic spatial reference systems
supported by the geography datatype; it does not contain any details of projected spatial reference
systems that can be used by the geometry datatype. Secondly, even those spatial reference systems that
are listed in sys.spatial_reference_systems do not contain the TOWGS84 parameters required to
transform coordinates into a different geodetic datum.
We can't modify or insert additional rows into sys.spatial_reference_systems, so instead we'll
simply create a new table to store this information using a simple structure as follows:
CREATE TABLE prospatial_reference_systems (
spatial_reference_id int,
well_known_text nvarchar(max)
);

198

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

To populate this table, you need to provide the EPSG spatial reference identifier and the well-
known text definition of each spatial reference system between which you want to convert data. You
can find details of spatial reference systems on the Internet, including from http://www.epsg-
registry.org or http://www.spatialreference.org. Once you've obtained the SRID and WKT (including
the TOWGS84 parameter) of the systems you'd like to support, you can insert them into the
prospatial_reference_systems table.
To demonstrate the range of possible conversions between coordinate systems, the following
code listing inserts three records into the prospatial_reference_systems table:
WGS84, a geographic coordinate system based on the WGS84 datum
UTM Zone 31N, a projected coordinate system also based on the WGS84 datum
The British National Grid, a projected coordinate system based on the OSGB36
datum.
INSERT INTO prospatial_reference_systems (
spatial_reference_id,
well_known_text
)
VALUES
(4326,
'GEOGCS["WGS 84",
DATUM[
"World Geodetic System 1984",
SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]
],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AXIS["Geodetic longitude", EAST],
AXIS["Geodetic latitude", NORTH],
AUTHORITY["EPSG","4326"]
]'),
(32631,
'PROJCS["WGS 84 / UTM zone 31N",
GEOGCS["WGS 84",
DATUM[
"WGS_1984",
SPHEROID["WGS 84", 6378137, 298.257223563, AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]
],
PRIMEM["Greenwich", 0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.01745329251994328, AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]
],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",3],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting", 500000],
PARAMETER["false_northing",0],
UNIT["metre", 1, AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","32631"]
]'),

199

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

(27700,
'PROJCS["OSGB 1936 / British National Grid",
GEOGCS["OSGB 1936",
DATUM[
"OSGB 1936",
SPHEROID["Airy 1830", 6377563.396, 299.3249646, AUTHORITY["EPSG","7001"]],
TOWGS84[446.448, -125.157, 542.06, 0.15, 0.247, 0.842, -4.2261596151967575],
AUTHORITY["EPSG","6277"]
],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AXIS["Geodetic longitude", EAST],
AXIS["Geodetic latitude", NORTH],
AUTHORITY["EPSG","4277"]
],
PROJECTION["Transverse Mercator"],
PARAMETER["central_meridian", -2.0],
PARAMETER["latitude_of_origin", 49.0],
PARAMETER["scale_factor", 0.9996012717],
PARAMETER["false_easting", 400000.0],
PARAMETER["false_northing", -100000.0],
UNIT["m", 1.0],
AXIS["Easting", EAST],
AXIS["Northing", NORTH],
AUTHORITY["EPSG","27700"]
]');
The code samples that accompany this book contain a script to populate this table with details of
many other commonly used spatial reference systems.

Compiling the Proj.NET Assembly for SQL Server


To perform the transformation and conversion of coordinates between any two systems based on the
parameters defined in the prospatial_reference_systems table, well leverage the Proj.NET library.
Because the Proj.NET library is coded in C#, it can be compiled into a .dll assembly that can be
imported into SQL Server, and its methods exposed via user-defined functions. This is the approach
we'll follow in the following section. To begin, follow these steps:
1. First, download the Source Code of the latest Proj.Net build from the codeplex
website at http://www.codeplex.com/ProjNET. At the time of writing, the latest
version is changeset 34175, dated May 26th, 2009. Note that you must
download the source code, not the precompiled binary .dll library, as we must
first make a few modifications to the code to make it suitable for use in SQL
Server.
2. Unzip the downloaded Proj.NET archive, projnet-34175.zip, and load the
solution file, ProjNET.sln, in Visual Studio. This is a Visual Studio 2008 solution
so, if you open the file in a more recent edition, the Visual Studio Conversion
Wizard dialog may appear to upgrade the solution. If prompted, do so.
The solution contains a number of different projects, including a demo website, a test framework,
and a Silverlight-compatible library. However, the project that is of interest to us is the ProjNET
project, highlighted in the Visual Studio Solution Explorer pane shown in Figure 8-8.

200

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Figure 8-8. The ProjNET project highlighted in the Visual Studio Solution Explorer pane

The ProjNET project is the core assembly containing all the methods required for coordinate
transformation and conversion. We will create an additional separate assembly that calls into the
methods of the ProjNET assembly and exposes them in a format suitable for geometry and geography
data. However, there is a small issue to address before we do so: SQLCLR assemblies do not generally
allow their methods to be called by any other code, unless they are trusted. There are two possible
solutions to this problem:
We could register the calling assembly as an UNSAFE assembly, one that is
granted full trust to perform pretty much any action in the database. This
would achieve the objective of allowing it to access the methods in the ProjNET
assembly, but it would also introduce an unnecessary level of risk. Code in an
assembly with the UNSAFE permission set has permission to do all sorts of
things, including accessing the file system and network resources, and
registering assemblies as UNSAFE should be avoided if at all possible.
The second, better option is to alter the ProjNET assembly itself to allow
partially trusted callers. Microsoft recommends that all assemblies registered
in SQL Server (except those added to the Global Assembly Cache) should be
decorated with the System.Security.AllowPartiallyTrustedCallers attribute,
so that they can be accessed from other SAFE or EXTERNAL_ACCESS assemblies,
preventing the need to grant full trust where it is not required.

Note For more information about using SQLCLR libraries from partially trusted callers, refer to
http://msdn.microsoft.com/en-us/library/ms345097%28v=SQL.110%29.aspx

Fortunately, implementing the necessary change to the Proj.NET library to allow partially trusted
callers is very simple:

201

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

1. From the Solution Explorer pane, click to expand the ProjNET Properties
folder, and then double-click to edit the AssemblyInfo.cs file contained inside.
2. Decorate the assembly with the AllowPartiallyTrustedCallers attribute, by
adding the line shown below onto the end of the AssemblyInfo.cs file
(immediately after the existing attribute declarations):
[assembly: System.Security.AllowPartiallyTrustedCallers]
3. Now, right-click on the ProjNET project in the Solution Explorer pane and
select "Build" to recompile the assembly.
4. That's it! A message in the output window should report that the build
succeeded, and you can now close the ProjNET solution by clicking on the
Visual Studio File menu and selecting "Close Solution".

Creating Transformation Sinks


Having compiled the ProjNET library to allow partially trusted callers, we'll now create methods that
will call into the ProjNET transformation functions and wrap them in a manner suitable for SQL
Server's geography and geometry datatypes. These will be contained in a separate project, so begin by
creating and configuring the new project, as follows:
1. From the Visual Studio menu bar, select File New Project. Highlight the
Class Library template and name the new library, as shown in Figure 8-9.

Figure 8-9. Adding a new project

202

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

2. Once the project has been created, go to the Project menu and select Add Reference.
3. In the Add Reference dialog box, click on the Browse tab and navigate to the
directory in which the recompiled ProjNet.dll library was just created. (By default
this is in the /SharpMap.CoordinateSystems/Bin/Debug subdirectory of the directory
in which Proj.NET was unarchived.) Highlight the ProjNET.dll and click OK.
4. Click Add Reference again, and this time add a reference to the
Microsoft.SqlServer.Types.dll library. By default, this is installed in the
/SDK/Assemblies subdirectory of your SQL Server installation folder.
Highlight the SQLServer.Types.dll library and click OK.
Having configured the project, the next step is to create the sink interfaces. There are four sink
interfaces required to cover each combination of possible conversions between the geometry and
geography datatypes:
geography to geometry
geography to geography
geometry to geography
geometry to geometry
The sinks will all follow essentially identical structures, with the exception of the expected input
and return types. The following code listing demonstrates the sink to convert coordinates from the
geography datatype to the geometry datatype, so it implements the IGeographySink110 interface. For
each method in the sink, the parameter values passed in are used to populate a corresponding
IGeometrySink110 interface. So, for example, the OpenGisGeographyType used to denote the type of
geography instance passed to the sink (in the BeginGeography() method) is converted to the
corresponding OpenGisGeometryType and passed through to the BeginGeometry() method of the
IGeometrySink110. As each point is passed to the IGeographySink110 BeginFigure() and AddLine()
methods, the latitude and longitude coordinates are converted using a Proj.NET
ICoordinateTransformation class into the corresponding x- and y-coordinates, and these are then
provided to the equivalent IGeometrySink110 BeginFigure() and AddLine() methods.
Here's the code listing:
using System;
using Microsoft.SqlServer.Types; // SqlGeometry and SqlGeography
using ProjNet.CoordinateSystems.Transformations; // Proj.NET

namespace ProSpatial.Ch8
{
class TransformGeographyToGeometrySink : IGeographySink110
{

private readonly ICoordinateTransformation _trans;


private readonly IGeometrySink110 _sink;

public TransformGeographyToGeometrySink(
ICoordinateTransformation trans,
IGeometrySink110 sink
)
{
_trans = trans;
_sink = sink;

203

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

public void BeginGeography(OpenGisGeographyType type)


{
// Begin creating a new geometry of the type requested
_sink.BeginGeometry((OpenGisGeometryType)type);
}

public void BeginFigure(double latitude, double longitude, double? z, double? m)


{
// Use ProjNET Transform() method to project lat,lng coordinates to x,y
double[] startPoint = _trans.MathTransform.Transform(new double[]
{ longitude, latitude });

// Begin a new geometry figure at corresponding x,y coordinates


_sink.BeginFigure(startPoint[0], startPoint[1], z, m);
}

public void AddLine(double latitude, double longitude, double? z, double? m)


{
// Use ProjNET to transform end point of the line segment being added
double[] toPoint = _trans.MathTransform.Transform(new double[]
{ longitude, latitude });

// Add this line to the geometry


_sink.AddLine(toPoint[0], toPoint[1], z, m);
}

public void AddCircularArc(double latitude1, double longitude1, double? z1, double? m1,
double latitude2, double longitude2, double? z2, double? m2
)
{
// Transform both the anchor point and destination of the arc segment
double[] anchorPoint = _trans.MathTransform.Transform(new double[]
{ longitude1, latitude1 });
double[] toPoint = _trans.MathTransform.Transform(new double[]
{ longitude2, latitude2 });

// Add this arc to the geometry


_sink.AddCircularArc(anchorPoint[0], anchorPoint[1], z1, m1,
toPoint[0], toPoint[1], z2, m2);
}

public void EndFigure()


{
_sink.EndFigure();
}

public void EndGeography()


{
_sink.EndGeometry();
}

public void SetSrid(int srid)

204

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

{
// Just pass through
}
}
}
For the code listings of the corresponding geometry to geography, geometry to geometry, and
geography to geography sinks, please see the code samples accompanying this book.
To actually make use of the interfaces, each sink will have a corresponding function, which we will
expose in SQL Server. The function to accompany the TransformGeographyToGeometrySink interface
requires two inputs: an item of geography data and the SRID of a projected coordinate system, whose
parameters are listed in the prospatial_reference_systems table created earlier. The method projects
the supplied geography input into the appropriate coordinate system and returns the corresponding
geometry value.
The method will read the parameters associated with both the source and destination systems
from the prospatial_reference_systems table, so we need to decorate the function with the
DataAccessKind.Read attribute. Here's the function signature:
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
public static SqlGeometry GeographyToGeometry(SqlGeography geog, SqlInt32 toSRID)
Once called, the first thing the method must do is to determine the parameters of the source and
destination reference systems. The source spatial reference identifier is retrieved from the STSrid
property of the supplied geography instance. The target SRID is the integer value supplied as the second
parameter to the function.
The function then retrieves the corresponding WKT for these systems by querying the
prospatial_reference_systems table via the context connection (if you named your spatial reference
table something else then be sure to change the query appropriately).
The well-known text representation of the source and destination systems is used to create a
CoordinateTransformation instance. This instance is then passed to the appropriate sink interface in
order to populate a SqlGeometryBuilder instance with transformed values. Finally, the
ConstructedGeometry instance created by the SqlGeometryBuilder is returned to the client.
The code listing for the GeographyToGeometry function is shown following:
using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;
using System.Data.SqlClient; // Required for context connection
using Microsoft.SqlServer.Server; // SqlFunction Decoration
using Microsoft.SqlServer.Types; // SqlGeometry and SqlGeography
using ProjNet.CoordinateSystems; // ProjNET coordinate systems
using ProjNet.CoordinateSystems.Transformations; // ProjNET transformation functions
using ProjNet.Converters.WellKnownText; //ProjNET WKT functions

namespace ProSpatial.Ch8
{
public partial class UserDefinedFunctions
{

[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
public static SqlGeometry GeographyToGeometry(SqlGeography geog, SqlInt32 toSRID)
{
// Use the context connection to the SQL Server instance on which this is executed
using (SqlConnection conn = new SqlConnection("context connection=true"))

205

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

{
// Open the connection
conn.Open();

// Retrieve the parameters of the source spatial reference system


SqlCommand cmd = new SqlCommand("SELECT well_known_text FROM
prospatial_reference_systems WHERE spatial_reference_id = @srid", conn);
cmd.Parameters.Add(new SqlParameter("srid", geog.STSrid));
object fromResult = cmd.ExecuteScalar();

// Check that details of the source SRID have been found


if (fromResult is System.DBNull || fromResult == null)
{ return null; }

// Retrieve the WKT


String fromWKT = Convert.ToString(fromResult);

// Create the source coordinate system from WKT


ICoordinateSystem fromCS = CoordinateSystemWktReader.Parse(fromWKT) as
ICoordinateSystem;

// Retrieve the parameters of the destination spatial reference system


cmd.Parameters["srid"].Value = toSRID;
object toResult = cmd.ExecuteScalar();

// Check that details of the destination SRID have been found


if (toResult is System.DBNull || toResult == null)
{ return null; }

// Execute the command and retrieve the WKT


String toWKT = Convert.ToString(toResult);

// Clean up
cmd.Dispose();

// Create the destination coordinate system from WKT


ICoordinateSystem toCS = CoordinateSystemWktReader.Parse(toWKT) as
ICoordinateSystem;

// Create a CoordinateTransformationFactory instance


CoordinateTransformationFactory ctfac = new CoordinateTransformationFactory();

// Create the transformation between the specified coordinate systems


ICoordinateTransformation trans = ctfac.CreateFromCoordinateSystems(fromCS, toCS);

// Create a geometry instance to be populated by the sink


SqlGeometryBuilder b = new SqlGeometryBuilder();

// Set the SRID to match the destination SRID


b.SetSrid((int)toSRID);

// Create a sink for the transformation and plug it in to the builder


TransformGeographyToGeometrySink s = new TransformGeographyToGeometrySink(trans, b);

206

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

// Populate the sink with the supplied geography instance


geog.Populate(s);

// Return the transformed geometry instance


return b.ConstructedGeometry;
}
}
}
}
As with the sink interfaces, the functions required for the other conversions from geometry to
geography, geometry to geometry, and geography to geography, are included in the code listings that
accompany this book, which can be downloaded from the Apress website at http://www.apress.com.
Having added all the necessary sinks and functions, save the project and compile the assembly by
selecting Project Build Solution.

Registering the Transformation Assembly and Functions


At this stage, you should have two compiled .dll assemblies:
The ProjNET project library (modified to allow partially trusted callers)
An assembly containing SqlFunction methods that expose the ProjNET
transformations, operating on geometry and geography data
Before registering the assemblies in SQL Server, if you have not already done so, ensure that your
SQL Server instance is configured to enable user-defined CLR routines by executing the following T-
SQL code in SQL Server Management Studio:
EXEC sp_configure 'clr enabled', '1';
GO
To complete the configuration change, reconfigure the server to reflect the new value, by issuing a
T-SQL query with the RECONFIGURE statement as follows:
RECONFIGURE;
GO
You can then register the transformation assemblies into your chosen database by issuing the
following T-SQL command, substituting the name and location where you compiled the custom dll
library:
CREATE ASSEMBLY ProSpatialCh8
FROM 'C:\ProSQLSpatial\Ch8_Transformation\bin\Release\Ch8_Transformation.dll'
WITH PERMISSION_SET = SAFE;
GO

Note You do not need to explicitly register the ProjNET library in SQL Server, only the library containing the
custom SQL functions. Since the ProjNET library is referenced from this assembly, SQL Server will automatically
locate and import the ProjNET assembly at the same time. You only need to make sure that both dlls are present in
the same folder on the server, or deployed to the Global Assembly Cache.

207

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Having imported the assembly, the next step is to register the corresponding function. Here's the
T-SQL code required to create the GeographyToGeometry function:
CREATE FUNCTION dbo.GeographyToGeometry(@geog geography, @srid int)
RETURNS geometry
EXTERNAL NAME ProSpatialCh8.[ProSpatial.Ch8.UserDefinedFunctions].GeographyToGeometry;
GO

Transforming Spatial Data in T-SQL


With the assemblies and functions registered, our transformation functionality is ready to go. To
transform any item of geography or geometry data into another spatial reference system, first ensure
that the prospatial_reference_systems table is populated with the correct parameters corresponding to
both the source and target spatial reference systems. Then call the transformation method, supplying
the instance to be transformed, and the SRID of the desired target system.
For example, the following code listing demonstrates how to convert a geography Point defined
using the EPSG:4326 system into a geometry Point defined using UTM Zone 31N (EPSG:32631):
DECLARE @Norwich geography;
SET @Norwich = geography::STPointFromText('POINT(1.369338 53.035498)', 4326);

SELECT dbo.GeographyToGeometry(@Norwich, 32631).ToString();


The GeographyToGeometry() method reads in the parameters of the source and target spatial
reference system from the well_known_text column of the prospatial_reference_systems table. In this
example, the requested destination system, 32631, is based on a projection of the same datum used by the
source reference system, 4326. Therefore, no datum conversion is necessary, and conversion involves
only projecting the coordinates based on a transverse Mercator projection. The coordinates of each
point in the supplied geography instance (in this case, there is only one) are projected to the destination
reference system, and the well-known text of the resulting geometry instance is returned as follows:

POINT (390659.51922243327 5877462.7522814982)

We can also try another example, this time converting a point from WGS84 to the British National
Grid system, which is based on the OSGB36 datum. In this case, conversion involves both a datum
transformation and also projection. Fortunately, ProjNET determines and performs the required
actions automatically behind the scenes, so no further steps need be added to the code listing:
DECLARE @Oxford geography;
SET @Oxford = geography::STPointFromText('POINT(-1.256804 51.752143)', 4326);

SELECT dbo.GeographyToGeometry(@Oxford, 27700).ToString();

POINT (451400.00276604667 206200.52991363779)

You can confirm both of the preceding results using one of a number of online coordinate
conversion tools, such as that available at http://nearby.org.uk.
As a graphical demonstration of this transformation function, Figure 8-10 illustrates a side-by-
side comparison of three versions of the same MultiPolygon geometry representing Great Britain and

208

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Northern Ireland. The left image shows the original geography instance using SRID 4326, displayed in
the Management Studio Spatial Results tab using a Bonne projection; the center image displays the
result of the GeographyToGeometry() method when used to transform into a geometry MultiPolygon
using SRID 32631; the right image displays the same MultiPolygon when transformed into SRID 27700.

Figure 8-10. Three different projections of Great Britain. From left to right: geographic (EPSG:4326) using
Bonne projection, UTM Zone 31N (EPSG:32631), Ordnance Survey National Grid of Great Britain
(EPSG:27700)

Architecture and Design Considerations


So far in this chapter I've shown you both the theory behind spatial transformation and reprojection,
and also demonstrated a technical solution enabling you to convert spatial data between spatial
reference systems directly in SQL Server. In this final section, I'd like briefly to discuss some practical
considerations to bear in mind when dealing with spatial transformations in a real-world application.
The first thing I'd like to point out is that reprojection or conversion of spatial data, particularly of
complex geometries comprised of many points, is a computationally intensive operation. I therefore
do not recommend that you try to convert data on the fly at runtime, but rather precalculate and store
reprojected versions of each shape in the database in all those systems in which it is likely to be used.
The second fact is that reprojection and transformation inevitably cause a slight loss in fidelity of
the original coordinate data, and each subsequent conversion degrades the accuracy still further.
Therefore, if you are supplied with data that uses a different spatial reference system from the rest of
your data, and therefore needs converting, there may be some value in also retaining the full fidelity
original data (for performing isolated calculations of STArea(), for example), alongside a reprojected
version that is consistent with the spatial reference system used by the rest of your application.
So, you may have several versions of the same geometry: one in the geography datatype, using the
North American Datum 1983 (SRID 4269), and one in the geometry datatype, using the New York Long
Island State Plane Coordinate System (SRID 2263), for example. When using this approach, I tend to
maintain separate columns in my table for each datatype/SRID combination, only storing instances of
the same SRID in that column, and name them according to the pattern <datatype><srid>. For the
example given previously, I might therefore use a table structure such as this:
CREATE TABLE SampleSpatialData (
Id int,
Geog4326 geography,

209

www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION

Geom2263 geometry
);
By adding triggers to the table that call the necessary transformation functions you can ensure
that, should any of the spatial columns be updated, all of the other spatial columns are updated with
transformed versions of the updated geometry, keeping the data in sync. Different stored procedures
can utilize different spatial columns as appropriate, and separate indexes can be added to the columns
to optimize performance.
The final point to note is that, while the SQLCLR makes it possible to perform spatial
transformations within SQL Server, you should always consider whether this is the most appropriate
place at which to do so. CLR procedures are easily moved between tiers of the application hierarchy,
and the Proj.NET library used in this chapter could be leveraged just as readily in a front-end
Silverlight application, or in a middle-tier web service, for example, rather than directly in the
database layer.
If the only time at which you find it necessary to convert data is at the point it is first imported into
SQL Server, you might want to investigate one of the ETL tools that can perform conversion and
reprojection of spatial data as an integrated part of a load process, such as Safe FME or OGR2OGR
discussed in Chapter 5, or create a custom component that calls Proj.NET as part of an SSIS data
transformation instead.

Summary
In this chapter, you learned about the process involved in transforming coordinate data between
different spatial reference systems.
Converting data between spatial reference systems based on different
underlying geodetic models of the earth requires datum transformation.
There are various algorithms used for datum transformation. The most common
method is the Helmert transformation, which requires seven parameters.
Helmert transformation parameters that convert to the WGS84 datum can be
included with the well-known text representation of a datum, following the
TOWGS84 keyword.
By applying the inverse set of TOWGS84 parameters, you can convert from WGS84
into the target datum.
Proj.NET is an open source projection library that implements a variety of
projections and can create transformations between any two datums given the
corresponding WKT.
Custom CLR functions can be written that call into the Proj.NET library, and
expose its functionality to create transformation functions for the geometry and
geography datatypes in SQL Server.
These functions can also be called from CLR code in other layers of an
application: in the client, a web layer, or as part of an SSIS ETL load, for example.

210

www.it-ebooks.info
CHAPTER 9

Examining Spatial Properties

There are many questions that we might ask about an individual item of spatial data: Where is it? How
big is it? What sort of object is it? Where does it start and end? Where does its center lie? In this chapter,
you'll learn about the methods that SQL Server provides to answer questions such as these. Note that
all of the content in this chapter relates to examining individual items of spatial data, considered in
isolation. Methods that analyze the properties of, or relationship between, two or more items of data
will be covered in later chapters.

Property and Method SyntaxA Reminder


Before examining any specific methods or properties, it's worth having a quick reminder about the
general syntax used when writing queries involving the geography and geometry datatypes. All of the
topics in this chapter involve instance methods or properties that are applied directly to an individual
item or column of data. Some can be retrieved directly using property synta