Pro Spatial With SQL Server 2012
Pro Spatial With SQL Server 2012
info
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
www.it-ebooks.info
Contents at a Glance
Contents..................................................................................................................vii
Foreword ............................................................................................................... xxi
About the Author .................................................................................................. xxii
About the Technical Reviewer ............................................................................. xxiii
Acknowledgments ............................................................................................... xxiv
Introduction .......................................................................................................... xxv
Chapter 1: Spatial Reference Systems.................................................................1
Chapter 2: Spatial Features ...............................................................................21
Chapter 3: Spatial Datatypes .............................................................................51
Chapter 4: Creating Spatial Data........................................................................77
Chapter 5: Importing Spatial Data ...................................................................101
Chapter 6: Geocoding.......................................................................................139
Chapter 7: Precision, Validity, and Errors........................................................163
Chapter 8: Transformation and Reprojection...................................................187
Chapter 9: Examining Spatial Properties .........................................................211
Chapter 10: Modification and Simplification ...................................................253
Chapter 11: Aggregation and Combination ......................................................273
Chapter 12: Testing Spatial Relationships .......................................................293
Chapter 13: Clustering and Distribution Analysis ............................................327
www.it-ebooks.info
Chapter 14: Route Finding ...............................................................................353
Chapter 15: Triangulation and Tesselation ......................................................387
Chapter 16: Visualization and User Interface ..................................................419
Chapter 17: Reporting Services .......................................................................445
Chapter 18: Indexing........................................................................................471
Appendix ..........................................................................................................499
Index.....................................................................................................................519
vi
www.it-ebooks.info
CHAPTER 1
Spatial data analysis is a complex subject area, taking elements from a range of academic disciplines,
including geophysics, mathematics, astronomy, and cartography. Although you do not need to
understand these subjects in great depth to take advantage of the spatial features of SQL Server 2012,
it is important to have a basic understanding of the theory involved so that you use spatial data
appropriately and effectively in your applications.
This chapter describes spatial reference systemsways of describing positions in spaceand
shows how these systems can be used to define features on the earth's surface. The theoretical concepts
discussed in this chapter are fundamental to the creation of consistent, accurate spatial data, and are
used throughout the practical applications discussed in later chapters of this book.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Many different spatial reference systems exist, and each has different benefits and drawbacks:
some offer high accuracy but only over a relatively small geographic area; others offer reasonable
accuracy across the whole globe. Some spatial reference systems are designed for particular purposes,
such as for nautical navigation or for scientific use, whereas others are designed for general global use.
One key point to remember is that every set of coordinates is unique to a particular spatial
reference system, and only makes sense in the context of that system.
In order to describe the location of an object on the earths surface with maximum accuracy, we
would ideally define its position relative to the geoid itself. However, even though scientists have
recently developed very accurate models of the geoid (to within a centimeter accuracy of the true
shape of the earth), the calculations involved are very complicated. Instead, spatial reference systems
normally define positions on the earth's surface based on a simple model that approximates the geoid.
This approximation is called a reference ellipsoid.
Note Not only is the geoid a complex shape, but it is also not constant. Astronomical and geophysical forces,
climatic changes, and volcanic activity all contribute to changes in the earth's structure that continuously alter the
shape of the geoid.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Note A spheroid is a sphere that has been flattened in one axis, and can be described using only two
parameters. An ellipsoid is a sphere that has been flattened in two axes; that is, the radii of the shape in the x-, y-,
and z-axes are all different. Although referred to as a reference ellipsoid, in practice most models of the earth are
actually spheroids, because ellipsoid models of the world are not significantly more accurate at describing the
shape of the geoid than simpler spheroid models.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
The flattening ratio of an ellipsoid, f, is used to describe how much the ellipsoid has been
squashed, and is calculated as
f = (a b ) / a
where a = length of the semi-major axis; b = length of the semi-minor axis.
In most ellipsoidal models of the earth, the semi-minor axis is only marginally smaller than the
semi-major axis, which means that the value of the flattening ratio is also small, typically around
0.003. As a result, it is sometimes more convenient to state the inverse flattening ratio of an ellipsoid
instead. This is written as 1/f, and calculated as follows.
1 / f = a / (a b)
The inverse-flattening ratio of an ellipsoid model typically has a value of approximately
300.Given the length of the semi-major axis a and any one other parameter, f, 1/f, or b, we have all the
information necessary to describe a reference ellipsoid used to model the shape of the earth.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
It is important to realize that specifying a different reference ellipsoid to approximate the geoid
will affect the accuracy with which a spatial reference system based on that ellipsoid can describe the
position of features on the earth. When choosing a spatial reference system, we must therefore be
careful to consider one that is based on an ellipsoid suitable for the data in question.
SQL Server 2012 recognizes spatial reference systems based on a number of different reference
ellipsoids, which best approximate the geoid at different parts of the earth. Table 1-1 lists the
properties of some commonly used ellipsoids that can be used.
Ellipsoid Name Semi-Major Axis (m) Semi-Minor Axis (m) Inverse Flattening Area of Use
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
There are many different sorts of coordinate systems, but when you use geospatial data in SQL
Server 2012, you are most likely to use a spatial reference system based on either geographic or
projected coordinates.
Caution Because a point of greater longitude lies farther east, and a point of greater latitude lies farther north,
it is a common mistake for people to think of latitude and longitude as measured on the earth's surface itself, but
this is not the case: latitude and longitude are angles measured from the plane of the equator and prime meridian
at the center of the earth.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
1. The most commonly used method is the DMS (degree, minutes, seconds) system,
also known as sexagesimal notation. In this system, each degree is divided into 60
minutes. Each minute is further subdivided into 60 seconds. A value of 51
degrees, 15 minutes, and 32 seconds is normally written as 511532.
2. An alternative system, commonly used by GPS receivers, displays whole degrees,
followed by minutes and decimal fractions of minutes. This same coordinate value
would therefore be written as 51:15.53333333.
3. Decimal degree notation specifies coordinates using degrees and decimal
fractions of degrees, so the same coordinate value expressed using this system
would be written as 51.25888889.
When expressing geographic coordinate values of latitude and longitude for use in SQL Server 2012, you
must always use decimal degree notation. The advantage of this format is that each coordinate can be
expressed as a single floating-point number. To convert DMS coordinates into decimal degrees you can
use the following rule.
Degrees + (Minutes / 60) + (Seconds / 3600) = Decimal Degrees
For example, the CIA World Factbook (https://www.cia.gov/library/publications/the-world-
factbook/geos/uk.html ) gives the geographic coordinates for London as follows,
51 30 N, 0 10 W
When expressed in decimal degree notation, this is
51.5 (Latitude), 0.166667 (Longitude)
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
A common misconception is to believe that there is a universal prime meridian based on some
inherent fundamental property of the earth, but this is not the case. The prime meridian of any spatial
reference system is arbitrarily chosen simply to provide a line of zero longitude from which all other
coordinates of longitude can be calculated. The most commonly used prime meridian is the meridian
passing through Greenwich, England, but there are many others. For example, the RT38 spatial
reference system used in Sweden is based on a prime meridian that passes through Stockholm, some
18 degrees east of the Greenwich Prime Meridian. Prime meridians from which coordinates are
measured in other systems include those that pass through Paris, Jakarta, Madrid, Bogota, and Rome.
If you were to define a different prime meridian, the value of the longitude coordinate of all the
points in a given spatial reference system would change.
Note In contrast to a geographic coordinate system, which defines positions on a three-dimensional, round
model of the earth, a projected coordinate system describes the position of points on the earths surface as they lie
on a flat, projected, two-dimensional plane.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
without distorting the resulting image in some way. Distortions introduced as a result of the projection
process may affect the area, shape, distance, or direction represented by different elements of the map.
By altering the projection method, cartographers can reduce the effect of these distortions for
certain features, but in doing so the accuracy of other features must be compromised. Just as there is
not a single "best" reference ellipsoid to model the three-dimensional shape of the earth, neither is
there a single best map projection when trying to project that model onto a two-dimensional surface.
Over the course of time, many projections have been developed that alter the distortions
introduced as a result of projection to create maps suitable for different purposes. For instance, when
designing a map to be used by sailors navigating through the Arctic regions, a projection may be used
that maximizes the accuracy of the direction and distance of objects at the poles of the earth, but
sacrifices accuracy of the shape of countries along the equator.
The full details of how to construct a map projection are outside the scope of this book. However,
the following sections introduce some common map projections and examine their key features.
HammerAitoff Projection
The HammerAitoff map projection is an equal-area map projection that displays the world on an
ellipse. An equal-area map projection is one that maintains the relative area of objects; that is, if you
were to measure the area of any particular region on the map, it would accurately represent the area
of the corresponding real-world region. However, in order to do this, the shapes of features are
distorted. The HammerAitoff map projection is illustrated in Figure 1-6.
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Mercator Projection
The Mercator map projection is an example of a conformal map projection. A conformal map
projection is any projection that preserves the local shape of objects on the resulting map.
The Mercator projection was first developed in 1569 by the Flemish cartographer Gerardus
Mercator, and has been widely used ever since. It is used particularly in nautical navigation because,
when using any map produced using the Mercator projection, the route taken by a ship following a
constant bearing will be depicted as a straight line on the map.
The Mercator projection accurately portrays all points that lie exactly on the equator. However, as
you move farther away from the equator, the distortion of features, particularly the representation of
their area, becomes increasingly severe. One common criticism of the Mercator projection is that, due
to the geographical distribution of countries in the world, many developed countries are depicted with
far greater area than equivalent-sized developing countries. For instance, examine Figure 1-7 to see
how the relative sizes of North America (actual area 19 million sq km) and Africa (actual area 30
million sq km) are depicted as approximately the same size.
Despite this criticism, the Mercator projection is still used by many applications, including Bing
Maps and Google Maps, and it is probably one of the most instantly recognizable of all geographical
images of the world.
10
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Equirectangular Projection
The equirectangular projection is one of the first map projections ever to be invented, being credited to
Marinus of Tyre in about 100 AD. It is also one of the simplest map projections, in which the map displays
equally spaced degrees of longitude on the x-axis, and equally spaced degrees of latitude on the y-axis.
This projection is of limited use in spatial data analysis because it represents neither the accurate
shape nor area of features on the map, although it is still widely recognized and used for such purposes
11
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
12
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
The UTM projection is universal insofar as it defines a system that can be applied consistently
across the entire globe. However, because each zone within the UTM grid is based on its own unique
projection, the UTM map projection can only be used to represent accurately those features lying
within a single specified zone.
Projection Parameters
In addition to the method of projection used, there are a number of additional parameters that affect
the appearance of any projected map. These parameters are listed in Table 1-2.
Parameter Description
Azimuth The angle at which the center line of the projection lies, relative to north
Central meridian The line of longitude used as the origin from which x coordinates are measured
False easting A value added to x coordinates so that stated coordinate values remain positive
over the extent of the map
False northing A value added to y coordinates so that stated coordinate values remain positive
over the extent of the map
Latitude of center The latitude of the point at the center of the map projection
13
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Parameter Description
Latitude of origin The latitude used as the origin from which y coordinates are measured
Latitude of point The latitude of a specific point on which the map projection is based
Longitude of center The longitude of the point at the center of the map projection
Longitude of point The longitude of a specific point on which the map projection is based
Scale factor A scaling factor used to reduce the effect of distortion in a map projection
Standard parallel A line of latitude along which features on the map have no distortion
Figure 1-10. Describing position on the earth using a projected coordinate system.
14
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Eastings and northings coordinate values represent a linear distance east and north of a given
origin. Although most projected coordinates are measured in meters, the appropriate units of
measurement for a spatial reference system will vary depending on the uses of that particular system.
Some systems use imperial units of measurement or locally relevant units: the foot, Clarke's foot, the
U.S. survey foot, or the Indian foot, for example (all of which are approximately equal to 30.5 cm,
although subtly different!). The coordinates of a spatial reference system designed for high-accuracy
local surveys may even specify millimeters as a unit of measurement.
Component Function
Coordinate system Specifies a mathematical framework for determining the position of items
relative to an origin. Coordinate systems used in SQL Server are generally
either based on geographic or projected coordinate systems.
Datum States a model of the earth onto which we can apply the coordinate system.
Consists of a reference ellipsoid (a three-dimensional mathematical shape
that approximates the shape of the earth) and a reference frame (a set of points
to position the reference ellipsoid relative to known locations on the earth).
Prime meridian Defines the axis from which coordinates of longitude are measured.
Unit of measurement Provides the appropriate unit in which coordinate values are expressed.
a
Projection parameters are only defined for spatial reference systems based on projected coordinate systems.
Through a combination of all these elements, you can define a spatial reference system capable of
uniquely identifying any point on the earth.
15
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Note In order to be able to describe positions on the earth using a projected coordinate system, a spatial
reference system must first specify a three-dimensional, geodetic model of the world (as would be used by a
geographic coordinate system), and then additionally state the parameters detailing how the two-dimensional
projected map image should be created from that model. For this reason, spatial reference systems based on
projected coordinate systems must contain all the same elements as those based on geographic coordinate
systems, together with the additional parameters required for the projection.
Tip You can view the details of all spatial reference systems administered by the EPSG registry at the following
website, http://www.epsg-registry.org.
16
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Note SQL Server only supports geographic coordinate data defined relative to one of the spatial reference
systems listed in sys.spatial_reference_systems. This table contains the additional information required to
construct the model of the earth on which geographic coordinate calculations take place. However, because no
additional information is required to perform calculations on a 2D plane, SQL Server supports projected coordinate
data defined from any projected coordinate reference system, and the details of such systems are not listed in
sys.spatial_reference_systems.
To illustrate how spatial references are represented in WKT format, lets examine the properties
of the EPSG:4326 spatial reference by executing the following query.
SELECT
well_known_text
FROM
sys.spatial_reference_systems
WHERE
authority_name = 'EPSG'
AND
authorized_spatial_reference_id = 4326;
The following is the result (with line breaks and indents added to make the result easier to
read).
GEOGCS[
"WGS 84",
DATUM[
"World Geodetic System 1984",
ELLIPSOID[
"WGS 84",
6378137,
298.257223563
]
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]
17
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
This result contains all the parameters required to define this spatial reference system, as follows.
Coordinate system: The first line of a WKT spatial reference is a keyword to tell
us what sort of coordinate system is used. In this case, GEOGCS tells us that
EPSG:4326 uses a geographic coordinate reference system. If a spatial reference
system is based on projected coordinates then the WKT representation would
instead begin with PROJCS. Immediately following this is the name assigned to
the spatial reference system. In this case, the Well-Known Text is describing the
"WGS 84" spatial reference system.
Datum: The values following the DATUM keyword provide the parameters of
the datum. The first parameter gives us the name of the datum used. In this case,
it is the "World Geodetic System 1984" datum. Then follow the parameters of the
reference ellipsoid. This system uses the "WGS 84" ellipsoid, with a semimajor
axis of 6,378,137 m and an inverse-flattening ratio of 298.257223563.
Prime meridian: The PRIMEM value tells us that this system defines
Greenwich as the prime meridian, where longitude is defined to be 0.
Unit of measurement: The final parameter specifies that the unit in which
coordinates are measured is the "Degree". The value of 0.0174532925199433 is a
conversion factor required to convert from radians into the stated units (1
degree = /180 radians).
Note Remember that, because this is a projected spatial reference system, you won't find these details in the
sys.spatial_reference_systems table. Instead, you can look up the details of these systems using a site such as
http://www.epsg-registry.org or http://www.spatialreference.org.
PROJCS[
"NAD_1983_UTM_Zone_10N",
GEOGCS[
"GCS_North_American_1983",
DATUM[
"D_North_American_1983",
SPHEROID[
"GRS_1980",
6378137,
298.257222101
]
],
PRIMEM["Greenwich",0],
18
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
UNIT["Degree", 0.0174532925199433]
],
PROJECTION["Transverse_Mercator"],
PARAMETER["False_Easting", 500000.0],
PARAMETER["False_Northing", 0.0],
PARAMETER["Central_Meridian", -123.0],
PARAMETER["Scale_Factor", 0.9996],
PARAMETER["Latitude_of_Origin", 0.0],
UNIT["Meter", 1.0]
]
Notice that the Well-Known Text for this projected coordinate system contains a complete set of
parameters for a geographic coordinate system, embedded within brackets following the GEOGCS
keyword. The reason is that a projected system must first define the three-dimensional, geodetic
model of the earth, and then specify several additional parameters that are required to project that
model onto a plane.
Note The Well-Known Text format in which SQL Server stores the properties of spatial reference systems in the
sys.spatial_reference_systems table is exactly the same format as used in the .PRJ file used to describe
the spatial reference in which the data in an ESRI shapefile are stored.
Summary
After reading this chapter, you should understand how spatial reference systems can be used to
describe positions in space:
A spatial reference system consists of a coordinate system (which describes a
position using either projected or geographic coordinates), a datum (which
describes a model representing the shape of the earth), the prime meridian
(which defines the origin from which units are measured), and the unit of
measurement. When using projected coordinates, the spatial reference system
also defines the properties of the projection used.
A geographic coordinate system defines the position of objects using angular
coordinates of latitude and longitude, which are measured from the equator and
the prime meridian, respectively.
A projected coordinate system defines the position of objects using Cartesian
coordinates, which measure the x and y distance of a point from an origin. These
are also referred to as easting and northing coordinates.
Whenever you state a set of coordinates representing a point on the earth, it is
essential that you also give details of the associated spatial reference system.
The spatial reference system defines the additional information that allows us to
apply the coordinate reference to identify a point on the earth.
For convenience, spatial reference systems may be specified by a single integer
identifier, known as a spatial reference identifier (SRID).
19
www.it-ebooks.info
CHAPTER 1 SPATIAL REFERENCE SYSTEMS
Details of all the geographic spatial reference systems supported by SQL Server
2012 are contained within a system catalogue view called
sys.spatial_reference_systems. SQL Server also supports data defined using any
projected spatial reference system.
The Well-Known Text format is a standard format used to express the properties
of a spatial reference system.
If you are interested in reading further about the topics covered in this chapter, I recommend
checking out the Microsoft white paper, "Introduction to Spatial Coordinate Systems: Flat Maps for a
Round Planet," which can be found in the MSDN SQL Server developer center site, at
http://msdn.microsoft.com/en-us/library/cc749633(SQL.100).aspx.
20
www.it-ebooks.info
CHAPTER 2
Spatial Features
In the last chapter, I stated that the purpose of geospatial data was to describe the shape and location of
objects on the Earth. Although this objective may be simply stated, in practice it is not always so easy to
achieve.
In many cases, although we have a rough understanding of the position and geographic extent of
features on the Earth, they may be hard to define in exact terms. For example, at what point does the
body of water known as the Gulf of Mexico become the Atlantic Ocean? Where exactly do we draw the
line that defines the boundary of a city or forest? In some parts of the world, there is even ambiguity or
contention as to where the border between two countries lies, and there are still significant areas of
land and sea that are subjects of international dispute.
Even if we agree on the precise shape and location of a feature, it may be hard to describe the
properties of that feature with sufficient detail; natural features, such as rivers and coastlines, have
complex irregular shapes. Even man-made structures such as roads are rarely simple straight lines.
It would be very hard, if not impossible, to define the shape of these features exactly. Instead,
spatial data represents these objects by storing simple geometrical shapes that approximate their
actual shape and position. These shapes are called geometries.
The spatial functionality in SQL Server is based on the Open Geospatial Consortiums Simple
Features for SQL Specification, which you can view online at
http://www.opensgeospatial.org/standards/sfs. This standard defines a number of different types of
geometries, each with different associated properties. In this chapter, each of the different types of
geometry is examined and the situations in which it is most appropriate to use each type are described.
Note In the context of spatial data, the word "geometry" can have two distinct meanings. To emphasize the
difference, geometry (code formatting) is used to refer to the geometry datatype, whereas geometry (no
formatting) is used to refer to simple shapes representing features on the Earth.
Geometry Hierarchy
There is a total of 14 standard types of geometries recognized by SQL Server (not counting the special
cases of the FullGlobe or Empty geometries; more on those later). However, only ten of these geometry
types are instantiable (that is to say, you can actually create instances of these geometries); the
remaining four types are abstract classes from which other instantiable classes are derived.
Geometries can be broadly categorized into two groups, as follows.
21
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Single geometries contain one discrete geometric element. The most basic
single geometry is a Point. There are also three types of curve (LineString,
CircularString, and CompoundCurve) and two types of surface (Polygon and
CurvePolygon).
Geometry collections are compound elements, containing one or more of the
individual geometries listed above. Geometry collections may be
homogeneous or heterogeneous. A homogeneous geometry collection
contains several items of the same type of single geometry only (e.g., a
MultiPoint is a geometry collection containing only Points). A heterogeneous
geometry collection contains one or more of several different sorts of
geometry, such as a collection containing a LineString and a Polygon.
Note The Microsoft Books Online documentation refers to these two categories of geometries as Simple
types and Collection types (http://technet.microsoft.com/en-
us/library/bb964711%28SQL.110%29.aspx). The use of the word Simple here has been deliberately avoided
because this has a separate meaning (as used by the STIsSimple() method) that is discussed later.
Figure 2-1 illustrates the inheritance tree of geometry types, which demonstrates how the
different types of geometry are related to each other. Every item of spatial data in SQL Server is an
example of one of the ten instantiable classes shown with a solid border.
Figure 2-1. The inheritance hierarchy of geometry types. Instantiable types (those types from which an
instance of data can be created in SQL Server 2012) are shown with a solid border.
SQL Server 2008 provided only a single instantiable type of Curve (the LineString), and only a
single type of instantiable surface (the Polygon). Both of these geometry types are straight-edged,
linear features. SQL Server 2012 added support for curved geometries, and the CircularString,
CompoundCurve, and CurvePolygon curved geometries shown in Figure 2-1 are new types introduced
in SQL Server 2012.
22
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Note In the OGC Simple Features specification, geometry type names are written using Pascal case (also called
Upper CamelCase) and this is the standard generally used in Microsoft documentation. For this reason, that
convention is also adopted in this book by referring to geometry types as MultiPoint, LineString, and so on.
23
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Points
A Point is the most fundamental type of geometry, and is used to represent a singular position in
space.
24
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Note Inasmuch as a Point geometry represents an infinitely small, singular location in space, it is impossible to
truly illustrate it in a diagram. Throughout this book, Point geometries are represented as small black circles, as in
Figure 2-2.
Defining a Point
A Point is defined by a pair of coordinate values, either an x-coordinate value and a y-coordinate
value from a planar coordinate system, or a latitude and longitude coordinate value from a geographic
coordinate system.
When expressed using the Well-Known Text (WKT) format, a Point located with coordinates x = 5
and y = 3 may be written as follows,
POINT(5 3)
The WKT representation begins with the POINT keyword followed by the relevant coordinate
values, contained within round brackets. The coordinate values are separated by a space (not a
comma, as you might initially expect). Figure 2-3 illustrates the Point geometry represented by this
definition.
Defining a Point from geographic coordinates follows the same convention, but with one thing to
watch out for: whereas in everyday language it is common to refer to coordinates of "latitude and
longitude" (in that order), when you write geographic coordinates in WKT the longitude coordinate
always comes first, then the latitude coordinate. The WKT syntax for a geography Point located at a
latitude of 40 and longitude of 60 is therefore:
POINT(60 40)
The location of this Point is illustrated in Figure 2-4.
25
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
To help remember the correct order for geographic coordinates, try thinking of longitude as being
equivalent to the x-coordinate, because longitude increases as you travel east around the world (until
you cross the 180th meridian). Likewise, latitude is equivalent to the y-coordinate, with increasing
latitude extending farther north. Because you list planar coordinates in (x y) order, the equivalent
order for geographic coordinates is therefore (longitude latitude).
Caution When defining geographic coordinates using WKT the longitude coordinate comes first, then the
latitude coordinate.
26
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
route, the m-coordinate could be used to express the distance of how far along
the route each point lay.
The WKT syntax for a Point containing z- and m-coordinates is as follows,
POINT(x y z m)
Or, if using geographic coordinates:
POINT(longitude latitude z m)
However, you should be aware that, although SQL Server 2012 supports the creation, storage, and
retrieval of z- and m-coordinate values, all of the inbuilt methods operate in 2D space only. The z and
m values assigned to a Point instance will therefore not have any effect on the result of any
calculations performed on that instance.
For example, when calculating the distance between the Points located at (0 0 0) and (3 4 12), SQL
Server calculates the result as 5 units (the square root of the sum of the difference in the x and y
dimensions only), and not 13 (the square root of the sum of the difference in the x, y, and z
dimensions). You can, however, retrieve the z and m values associated with any instance and use them
in your own calculations, as is demonstrated in a later chapter.
Characteristics of Points
All Point geometries share the following characteristics.
A Point is zero-dimensional, which means that it has no length in any direction
and there is no area contained within a Point.
A Point has no boundary.
The interior of a Point is the Point itself. Everything other than that Point is the
exterior.
Points are always classified as "simple" geometries.
LineStrings
Having established the ability to define individual Points, we can then create a series of two or more
Points and draw the path segments that directly connect each one to the next in the series. This path
defines a LineString.
27
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Figure 2-5. A LineString representing the route of the Orient Express railway.
Defining a LineString
When expressed using the WKT format, the coordinate values of each Point are separated by a space,
and a comma separates each Point from the next in the LineString, as follows.
LINESTRING(2 3, 4 6, 6 6, 10 4)
The LineString created by this WKT definition is illustrated in Figure 2-6.
28
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Note Some GIS systems make a distinction between a LineString and a Line. According to the Open Geospatial
Consortium specification (a standard on which the spatial features of SQL Server 2012 are largely based), a Line
connects exactly two Points, whereas a LineString may connect any number of Points. Because all Lines can be
represented as LineStrings, of these two types SQL Server 2012 only implements the LineString geometry. If you
need to define a table in which only Lines can be stored, you can do so by adding a CHECK constraint that calls the
STNumPoints() method to test whether inserted LineString values contain only two points.
LineStrings created from geographic coordinates follow the same convention: the coordinates of
each Point in the LineString are listed in longitudelatitude order (as they would be for an individual
Point), and each Point in the LineString is separated by a comma.
Characteristics of LineStrings
All LineStrings are one-dimensional geometries: they have an associated length, but do not contain
any area. This is the case even when the ends of the LineString are joined together to form a closed
loop. LineStrings may be described as having the following additional characteristics.
A simple LineString is one where the path drawn between the points of the
LineString does not cross itself.
A closed LineString is one that starts and ends at the same point.
A LineString that is both simple and closed is known as a ring.
The interior of a LineString consists of all the points that lie on the path of the
line. Be aware that, even when a LineString forms a closed ring, the interior of
the LineString does not contain those points in the area enclosed by the ring.
The interior of a LineString consists only of those points that lie on the
LineString itself.
The boundary of a LineString consists of the two points that lie at the start and
end of the line. However, a closed LineString, in which the start and end points
are the same, has no boundary.
The exterior of a LineString consists of all those points that do not lie on the line.
Different examples of LineString geometries are illustrated in Figure 2-7.
Figure 2-7. Examples of LineString geometries. (From leftright) A simple LineString, a simple closed
LineString (a ring), a nonsimple LineString, a nonsimple closed LineString.
29
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
The shape illustrated in Figure 2-8 cannot be represented by a single LineString geometry,
because doing so would necessarily involve retracing at least one section of the path twice. Instead,
the appropriate type of geometry to represent this shape is a MultiLineString geometry, discussed later
this chapter.
CircularStrings
As described in the previous section, LineStrings are formed by defining the path segments
connecting a series of Points in order. The line segments that connect consecutive points are
calculated by linear interpolation: each line segment represents the shortest direct route from one
Point to the next in the LineString.
However, this is clearly not the only way to connect a series of Points. An alternative method
would be to define a curve that connects each Point with a smooth line and gently changing gradient,
rather than abrupt angular corners between segments typical of a LineString. The CircularString
geometry, which is a new geometry type introduced in SQL Server 2012, provides one such curved line
by using circular, rather than linear, interpolation between points. In other words, a CircularString is
defined by the paths connecting a series of points in order, where the path segments connecting each
pair of points is an arc formed from part of a circle.
30
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Figure 2-9. A CircularString geometry representing the course of the OxfordCambridge University boat
race.
Note Don't be misled by the name: a CircularString geometry does not have to form a complete circle (although
it can); it merely means that the segments joining consecutive points are circular arcs rather than straight lines as
in a LineString.
Defining a CircularString
There are an infinite number of circular arcs that connect two Points. In order to specify which of these
arcs should be created, every CircularString segment actually requires three points: the start and end
points to be connected, and an additional anchor point that lies somewhere on the arc between those
points. The CircularString will follow the edge of the only circle that passes through all three points.
The syntax for the Well-Known Text representation of a CircularString is as follows,
CIRCULARSTRING (1 3, 4 1, 9 4)
The CircularString created from this definition is shown in the solid line illustrated in Figure
2-10. The dashed line illustrates the complete circle from which the CircularString arc has been
created.
31
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Note The additional anchor point does not need to lie in the middle of the start and end points of a
CircularString; it can be any point that lies on the circular arc between the start and end point.
Like LineStrings, CircularStrings can be created between a series of any number of consecutive
points. Each segment implicitly starts at the endpoint of the previous curved segment. Each additional
segment requires both an anchor point and an endpoint, therefore every valid CircularString contains
an odd number of points, and must contain at least three points.
Note A valid CircularString must have an odd number of points, greater than one.
One interesting point to note is that it is possible to specify a CircularString in which the anchor point lies
exactly on the straight line between the start and end point. The circular arc created in such cases is a
straight line, effectively joining all three points with the arc taken from a circle of infinite radius. The same
result can also be achieved if the anchor point is exactly equal to either the start or end point.
The set of points contained by either a LineString or a "straight" CircularString are identical, which can be
confirmed using SQL Server's STEquals() method as shown in the following code listing.
DECLARE @LineString geometry = 'LINESTRING(0 0, 8 6)';
DECLARE @CircularString1 geometry = 'CIRCULARSTRING(0 0, 4 3, 8 6)';
32
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
SELECT
@LineString.STEquals(@CircularString1), -- Returns 1 (true)
@LineString.STEquals(@CircularString2); -- Returns 1 (true)
Characteristics of CircularStrings
CircularStrings, like LineStrings, inherit from the abstract Curve geometry type, and share many of
the same characteristics.
CircularStrings are one-dimensional geometries; they have an associated
length, but do not contain any area.
A simple CircularString is one where the path drawn between the points of the
CircularString does not cross itself.
A closed CircularString is one that starts and ends at the same point.
The interior of a CircularString consists of all the points that lie on the arc
segments.
The boundary of a CircularString consists of the start and end points only, except
in the case of a closed CircularString, which has no boundary.
The exterior of a CircularString consists of all those points not on the path of the
CircularString.
Every CircularString must be defined by an odd number of points greater than
one.
33
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Note In order to define a full circle, you must define a CircularString containing five points.
CompoundCurves
A CompoundCurve is a single continuous path between a set of Points, in which the segments joining
each pair of Points may either be linear (as in a LineString) or curved (as in a CircularString), or a
mixture of both. The CompoundCurve geometry is a new geometry type introduced in SQL Server 2012.
34
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Figure 2-12. A CompundCurve representing the Daytona International Speedway racing circuit.
Defining a CompoundCurve
The Well-Known Text for a CompoundCurve geometry begins with the COMPOUNDCURVE keyword
followed by a set of round brackets. Contained within the brackets are the individual LineString or
CircularString segments that are joined together to form the compound curve.
Each CircularString or LineString segment in the CompoundCurve must begin at the point where
the previous segment ended, so that the CompoundCurve defines a single continuous path. The
coordinates of CircularString segments are preceded by the CIRCULARSTRING keyword, whereas
LineString segments are not preceded by any keyword; they are simply a list of coordinates contained
in round brackets.
The following code listing demonstrates the Well-Known Text representation of a
CompoundCurve geometry containing two LineString segments and two CircularString segments:
COMPOUNDCURVE(
(2 3, 2 8),
CIRCULARSTRING(2 8, 4 10, 6 8),
(6 8, 6 3),
CIRCULARSTRING(6 3, 4 1, 2 3)
)
This CompoundCurve geometry is illustrated in Figure 2-13.
35
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Characteristics of CompoundCurves
CompoundCurves are constructed from one-dimensional LineStrings and CircularStrings, therefore
CompoundCurves are themselves one-dimensional, and contain no area.
A simple CompoundCurve is one that does not intersect itself.
A closed CompoundCurve is one that starts and ends at the same point.
Polygons
A Polygon is a type of surface; that is, a Polygon is a two-dimensional geometry that contains an area
of space. The outer extent of the area of space contained within a Polygon is defined by a closed
LineString, called the exterior ring. In contrast to a simple closed LineString geometry, which only
defines those points lying on the ring itself, the Polygon defined by a ring contains all of the points
that lie either on the line itself, or contained in the area within the exterior ring.
36
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Figure 2-15. A Polygon containing an interior ring, representing South Africa. The interior ring represents
the border with Lesotho.
37
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Defining a Polygon
The Well-Known Text for a Polygon begins with the POLYGON keyword, followed by a set of round
brackets. Within these brackets, each ring of the Polygon is contained within its own set of brackets.
The exterior ring, which defines the perimeter of the Polygon, is always the first ring to be listed.
Following this, any interior rings are listed one after another, with each ring separated by a comma.
The following code listing demonstrates the WKT syntax for a rectangular Polygon, two units wide
and six units high.
POLYGON((1 1, 3 1, 3 7, 1 7, 1 1))
And the following code listing demonstrates the WKT syntax for a triangular Polygon containing
an interior ring.
POLYGON((10 1, 10 9, 4 9, 10 1), (9 4, 9 8, 6 8, 9 4))
These two Polygons are both illustrated in Figure 2-16.
Figure 2-16. Examples of Polygon geometries. (From leftright) A Polygon; a Polygon with an interior ring.
Note Because Polygons are constructed from rings, which are simple closed LineStrings, the coordinates of the
start and end points of each Polygon ring must be the same.
Characteristics of Polygons
All Polygons share the following characteristics.
Because Polygons are constructed from a series of one or more rings, which are
simple closed LineStrings, all Polygons are themselves deemed to be simple
closed geometries.
38
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
CurvePolygons
The CurvePolygon, like the Polygon, is defined by one exterior ring and, optionally, one or more
interior rings. Unlike the Polygon, however, in which each ring must be a simple closed LineString,
each ring in a CurvePolygon can be any type of simple closed curve. Those curves can be LineStrings,
CircularStrings, or CompoundCurves, so the rings that define the boundary of a CurvePolygon can
have a mixture of straight and curved edges.
39
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Defining a CurvePolygon
The WKT representation of a CurvePolygon follows the same general syntax as that for a Polygon.
However, because the CurvePolygon allows rings to be defined as LineStrings, CircularStrings, or
CompoundCurves, you must specify which kind of curve is used for each ring.
The LineString is considered to be the default curve type, and linear rings do not need to be
explicitly preceded by the LINESTRING keyword. In the following code listing, a CurvePolygon is defined
by a linear ring between five points:
CURVEPOLYGON((4 2, 8 2, 8 6, 4 6, 4 2))
The result, shown in Figure 2-18, is a square of width and height 2 units, exactly the same as would
have been created using the following Polygon geometry.
POLYGON((4 2, 8 2, 8 6, 4 6, 4 2))
In the following code listing, the exterior ring of the CurvePolygon is instead defined using a
CircularString geometry between the same set of points.
CURVEPOLYGON(CIRCULARSTRING(4 2, 8 2, 8 6, 4 6, 4 2))
In this case, rather than creating a square, the resulting CurvePolygon is a circle of radius 2.828
(8) , centered on the point (6 4), as shown in Figure 2-19.
40
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Characteristics of CurvePolygons
With the exception of the method of interpolation between points, CurvePolygons share exactly the
same general characteristics as Polygons:
CurvePolygons are two-dimensional, simple, closed geometries.
They have a length equal to the perimeter of all defined rings.
The area contained by a CurvePolygon is equal to the area of space enclosed
within the exterior ring less any area contained within any interior rings.
MultiPoints
A MultiPoint is a homogeneous collection of Point geometries. Unlike the LineString or CircularString,
which are formed from a series of connected Points, there are no connecting lines between the
individual Points in a MultiPoint: they are distinct and separate.
41
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Figure 2-20. A MultiPoint instance representing the location of each wind turbine at the site of the
Ardrossan wind farm.
Defining A MultiPoint
To represent a MultiPoint geometry in Well-Known Text, you first declare the MULTIPOINT element
name, followed by a comma-separated list of the coordinate tuples of each point contained in the
instance, contained within round brackets. The coordinates of each point are listed in exactly the same
manner as they would be if used to define an individual Point instance; that is, coordinates are listed
in x, y, z, m order, or longitude, latitude, z, m order, with values separated by spaces.
The following code listing is an example of a MultiPoint containing three Points,
MULTIPOINT(0 0, 2 4, 10 8)
The geometry created by this WKT is shown in Figure 2-21.
42
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Be careful with the placement of the comma(s) in the Well-Known Text representation of a
MultiPoint geometry; because each Point may contain between two and four coordinate values
(depending on whether the optional z- and m-coordinates are defined), you must place the comma
carefully to separate each coordinate tuple. Compare the following two WKT representations, which
use the same coordinate values, but vary in their comma placement.
MULTIPOINT(0 0, 2 4, 10 8)
MULTIPOINT(0 0 2 , 4 10 8)
The first geometry represents a MultiPoint geometry containing three Points, each one specified
with only x- and y-coordinates. The second example creates a MultiPoint containing only two Point
geometries, with each one specifying x-, y-, and z-coordinates.
Characteristics of MultiPoints
MultiPoint geometries all share the following characteristics.
MultiPoints, in common with the individual Point geometries from which they
are formed, are always simple closed geometries.
MultiPoints are one-dimensional, and have no length nor do they contain any area.
MultiLineStrings
A MultiLineString is a homogeneous collection of LineString geometries.
43
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Defining a MultiLineString
The WKT representation of a MultiLineString geometry is formed by a comma-separated list of
individual LineString geometries, contained within a set of round brackets following the MULTILINESTRING
keyword. Because it is known that the elements within a MultiLineString must all be LineStrings, it is not
necessary to include the individual LINESTRING keyword in front of each element; each LineString is
merely represented by a comma-separated series of coordinate values within round brackets.
The following code listing demonstrates the Well-Known Text representation of a
MultiLineString containing three LineStrings: the first and second containing only two points each,
and the third containing three points.
MULTILINESTRING((0 0, 2 2), (3 2, 6 9), (3 3, 5 3, 8 8))
The MultiLineString geometry represented by this WKT is shown in Figure 2-23.
44
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Characteristics of MultiLineStrings
Unlike MultiPoints and MultiPolygons, which generally contain elements that are disjoint from each
other, MutiLineStrings are frequently comprised of a number of intersecting LineString elements. For
example, any road or river that splits into two or more forks cannot be represented by a single
LineString geometry, and must instead be represented as a MultiLineString, with each fork being
represented by a separate LineString within the MultiLineString. Additional properties of
MultiLineStrings are as follows.
As are LineStrings, MultiLineStrings are one-dimensional.
A MultiLineString is simple if all of the LineString elements contained within it
are completely disjoint from each other. If any two LineStrings intersect (even if
they only touch each other) the MultiLineString is not considered to be simple.
A MultiLineString is closed if all of the LineString instances contained within it
are themselves closed (i.e., every LineString forms a closed loop, ending at the
same point from which it started).
MultiPolygons
A MultiPolygon is a geometry collection containing several Polygon geometries.
It is easy to think of other countries that consist of two or more separate geographic islands or
regions, including Japan, the United States of America (Alaska and Hawaii), Australia (Tasmania),
France (Corsica), and many others. In all these cases, the geographic area represented by a particular
political entity is best represented as a MultiPolygon.
45
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Defining a MultiPolygon
The Well-Known Text representation of a MultiPolygon uses the MULTIPOLYGON keyword, followed by
the definition of each Polygon contained in the collection, contained within round brackets.
The following code listing illustrates the WKT syntax required to define a MultiPolygon
containing two Polygons, each one containing only a single exterior ring.
MULTIPOLYGON(((10 20, 30 10, 44 50, 10 20)), ((35 36, 37 37, 38 34, 35 36)))
Take care to place the brackets carefully, because brackets are used both to separate individual
rings within a Polygon and also to separate Polygons within a MultiPolygon. Compare the preceding
code listing to the following, which instead creates a MultiPolygon geometry containing only one
Polygon that contains an interior ring.
MULTIPOLYGON(((10 20, 30 10, 44 50, 10 20), (35 36, 37 37, 38 34, 35 36)))
Characteristics of MultiPolygons
Characteristics of MultiPolygons are as follows.
MultiPolygons are two-dimensional, simple, closed geometries.
The length of a MultiPolygon is defined as the sum of the lengths of all the rings
in all the Polygons it contains.
The area of a MultiPolygon is the sum of the areas of all its Polygons.
GeometryCollections
The MultiPoint, MultiLineString, and MultiPolygon geometries considered previously are examples of
geometry collections containing only a single type of geometry. It is also possible to define a generic,
heterogeneous GeometryCollection, which may contain any number of any type of geometry (with the
exception of the FullGlobe geometry, discussed later). The GeometryCollection is also the only type of
collection that can contain multiple curved objects.
Defining a GeometryCollection
The Well-Known Text syntax for a GeometryCollection begins with the keyword GEOMETRYCOLLECTION,
followed by the fully formed WKT representation of each element in the collection, contained within a
set of round brackets. The following code listing illustrates the WKT syntax for a GeometryCollection
containing a Polygon and a Point.
GEOMETRYCOLLECTION(POLYGON((5 5, 10 5, 10 10, 5 5)), POINT(10 12))
46
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Note Although there are specific MultiLineString and MultiPolygon collections, there is no specific collection
that can contain multiple instances of their equivalent curved forms; there are no MultiCircularString, MultiCurve,
or MultiCurvePolygons, for example. To create a collection that contains more than one of these elements you
must use the generic GeometryCollection type.
FullGlobe
The FullGlobe is a special type of geometry that encompasses the whole surface of the Earth.
Defining a FullGlobe
A FullGlobe geometry covers the entire surface of the Earth, thus there is no need to state any
particular coordinate points in its construction. The Well-Known Text representation of a FullGlobe
geometry is therefore very simply:
FULLGLOBE
47
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Empty Geometries
One final type of geometry to consider is an empty geometry. An empty geometry is one that does not
contain any points. Even though it contains no points, an empty geometry is still nominally assigned a
particular type, so you may have an empty Point or empty LineString geometry, for example.
Figure 2-25. An empty Point geometry, an empty LineString geometry, and an empty Polygon geometry.
(JOKE!)
You may be wondering why you would ever create an empty geometry: how can you represent the
location or position of a feature on the Earth using a shape with no points? One way of thinking about
this is as follows. If geometries represent the position (and therefore, by implication, the presence) of
features on the Earth's surface, then empty geometries denote the absence of any such features.
You don't generally create empty geometries directly, but you do get empty geometries returned
in the results of a spatial query in which no points match the specified criteria. For example, empty
geometries can be used as a response to a question, Where is x? when the answer is, Nowhere on
Earth.
Note An empty geometry is not the same as NULL. A NULL value suggests a result that has not been
evaluated or is undefined. An empty geometry value suggests that a result has been evaluated, but that it does not
represent a location on the Earth.
48
www.it-ebooks.info
CHAPTER 2 SPATIAL FEATURES
Summary
In this chapter, you learned about the different types of geometries that can be used to store spatial
data in SQL Server 2012.
Points are the most basic type of geometry, representing a singular location in
space. They are used as a building block to construct the more complex types of
geometry.
LineStrings, CircularStrings, and CompoundCurves are all one-dimensional
geometries that are typically used to represent paths, routes, borders, and
similar features.
Polygons and CurvePolygons are two-dimensional geometries. They have a
boundary that contains an interior area, and may also have one or more
interior holes.
Elements may be combined together into collections. Homogeneous collections
are MultiPoints, MultiLineStrings, and MultiPolygons, respectively. SQL Server
also supports a heterogeneous GeometryCollection that may contain any
number of any type of geometry (other than a FullGlobe).
There are two special types of geometry: the FullGlobe geometry, which covers
the entire surface of the Earth, and the Empty geometry, which contains no area
at all.
49
www.it-ebooks.info
CHAPTER 3
Spatial Datatypes
Every variable, parameter, and column in a SQL Server table is defined as being of a particular
datatype. The datatype determines the range of values that can be stored and the ways in which that
data can be used. You are probably already familiar with many SQL Server datatypes, such as those
listed in Table 3-1.
Datatype Usage
In addition to these common datatypes designed to hold numeric, character, or date and time data,
SQL Server 2012 has two datatypes specifically designed to hold spatial data: geometry and geography.
These are listed in Table 3-2.
Datatype Usage
There are several similarities between the geometry and geography datatypes:
51
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
They both employ a vector model of spatial data, in which features may be
represented using a range of geometries including Points, LineStrings,
Polygons, and collections of these types.
Internally, SQL Server stores values of both datatypes as a serialized stream of
binary data in the same format.
When working with items of data from either type, you use object-oriented
methods based on the .NET framework.
They both provide much of the same standard spatial functionality, such as
calculating the length or area of a feature, the distance between features, or
testing whether two features intersect.
However, there are also a number of important differences between the two types. When you store
spatial data in SQL Server 2012, you must choose whether to store that information using the geometry
datatype or the geography datatype according to the nature of the data in question, and how you plan to
use that data. In this chapter, we explore the features of the two types, and show how to decide which
datatype to use in a given situation.
SQLCLR Foundations
Before looking at the specific differences between the geometry and geography datatypes, its worth
spending a bit of time examining the way in which both datatypes are implemented in SQL Server.
The geometry and geography datatypes are both system-defined CLR datatypes. The abbreviation
CLR in this context refers to the Common Language Runtime, the environment used to execute managed
.NET code. Whereas most SQL Server queries use the T-SQL query engine, when working with CLR
datatypes SQL Server additionally leverages the .NET runtime process hosted by SQL Server known as
SQLCLR. The managed code environment in which SQLCLR operates allows SQL Server to deal with
certain types of complex data objects, such as spatial data, more efficiently than relying on T-SQL alone.
Note SQL Server uses SQLCLR to perform operations on system-defined CLR datatypesgeometry,
geography, and hierarchyidas well as for user-defined CLR datatypes (UDTs). When querying such data,
SQLCLR works alongside the T-SQL engine; it does not replace it.
In all versions of SQL Server since SQL Server 2005, it has been possible to use the SQLCLR to
execute user-defined .NET code. In order to do so, however, the server must first be configured to allow
such behavior, which can be done by calling the sp_configure system stored procedure and setting the
clr_enabled option to 1. In contrast, system-defined CLR datatypes such as geometry and geography
require no additional configuration; they are automatically available for use in all SQL Server 2012
databases. As such, you can start using spatial datatypes straight away in your database just as you
would any other type of data.
Even though they require no configuration, there are still some special considerations of working
with the geometry and geography datatypes (and with CLR datatypes in general) that you should be
aware of, as follows.
Each CLR data item is defined as an object, a serialized representation of a
compound set of values. If you do a simple SELECT query of a column of data
defined using a CLR datatype, your query will return a set of binary values.
52
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
53
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Static Methods
To create an item of geography or geometry data, you must use a static method belonging to the
appropriate datatype. The syntax for using a static method is to state the name of the datatype followed
by a pair of colons, and then the name of the method. For example, the Parse() method is a static
method that can be used to create an instance from a supplied Well-Known Text string. To use the
Parse() method of the geometry datatype to create a Point geometry at coordinates (30,40), you would
call it as shown in the following code listing,
SELECT geometry::Parse('POINT(30 40)');
Note The geometry Parse method treats all supplied coordinate values as defined using SRID 0; that is, they
are abstract coordinates with no relation to a specific model of the Earth.
To provide another example, the following code listing creates a table containing a single column
of the geography datatype, and then inserts three Points into that table created using the geography
Point() static method. The Point() method requires three parameters representing latitude, longitude,
and SRID. The return value of the method is a Point geometry at the specified location.
CREATE TABLE geographypoints (
Location geography
);
Instance Methods
The process of creating an item of data using a static method is known as instantiation, and the
geometry created is referred to as an instance of the geometry or geography dataype (depending on the
static method from which it was created). Operations performed on individual values of spatial data
are therefore called instance methods. Items of both the geography and geometry datatypes provide a
range of instance methods for performing common calculations, including intersections, measuring
distances, and addition and subtraction of geometries.
The syntax for using an instance method is to state the name of the item (or column) of data on
which the method should be performed followed by a single dot (.) and then the name of the method to
54
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
be called followed by a set of closed brackets (). If the method requires any parameters, these should be
supplied within the brackets.
For example, the ToString() method is an instance method that retrieves the Well-Known Text
representation of any item of geography or geometry data. It requires no parameters. To retrieve the
WKT of each Point in the Location column of the geographypoints table created in the preceding code
listing, you can execute the following.
SELECT
Location.ToString()
FROM
geographypoints;
As another example, the STBuffer() method is an instance method that creates a buffer zone
around a geometry. It requires a single parameter stating the amount by which the geometry should be
buffered. Positive values create an enlarged area around a geometry, whereas negative values create
a reduced area. The following code listing declares a geometry Point variable located at (12 7) using the
geometry Point() static method, and then selects a buffer of 5 units about that geometry by calling the
STBuffer() method on that instance.
DECLARE @point geometry = geometry::Point(12, 7, 0);
SELECT @point.STBuffer(5);
Note that instance methods can be chained together, where the result of one method is passed
directly to the next method. For example, the following code listing creates a buffered Point geometry
using STBuffer() and then returns the area of that buffered geometry by calling by calling the STArea()
method on the buffered instance.
DECLARE @point geometry = geometry::Point(3, 5, 0);
SELECT @point.STBuffer(5).STArea();
Note In the preceding code listings, the coordinates from which the geometry Points were created were
abstract x- and y-coordinates; they werent intended to represent any particular feature on the Earths surface. To
indicate this, a value of 0 has been supplied as the third parameter to the Point() method, which means that
these coordinates do not relate to any particular spatial reference system.
Properties
Certain properties of a geography or geometry instance can be accessed directly using property notation,
which, like the syntax for instance methods, uses the column name followed by a single dot and then the
name of the property to retrieve. However, because you do not need to provide parameters to retrieve
the property of an instance, property names are not followed by a set of brackets.
For example, the Lat and Long properties represent the latitude and longitude coordinates of a
geography Point instance, respectively. The following code listing illustrates how to retrieve the
latitude and longitude coordinates of each Point in the geographypoints table.
SELECT
Location.Lat,
Location.Long
FROM
geographypoints;
55
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Some properties, such as Lat and Long are read-only. That is to say, you cannot update the latitude
coordinate of a Point by running the following query,
UPDATE geographypoints SET Location.Lat = 20;
Attempting to do so will result in the following error.
Msg 6595, Level 16, State 1, Line 1
Could not assign to property 'Lat' for type 'Microsoft.SqlServer.Types.SqlGeography' in
assembly 'Microsoft.SqlServer.Types' because it is read only.
Changing the coordinate values associated with a geometry or geography instance requires you to
create an entirely new geometry from a static method. However, certain properties of existing
geometries can be both retrieved and set. For example, the STSrid property allows you either to return
or update the spatial reference identifier associated with an instance. To change the spatial reference
identifier of the Points in the geographypoints table to use the North American Datum 1983 (SRID
4269), you can execute the following code listing.
UPDATE geographypoints
SET Location.STSrid = 4269;
Note Updating the STSrid property of an instance does not reproject the coordinate values into the specified
spatial reference system; it merely changes the metadata describing the system in which those coordinates are
defined.
Spatial Libraries
All of the functionality of the geography and geometry datatypes is contained in two libraries:
Microsoft.SqlServer.Types.dll and SqlServerSpatial.dll. These assemblies are created when you
install SQL Server 2012, but they can also be installed separately as part of the Microsoft SQL Server
Feature Pack, available for download from http://www.microsoft.com/downloads.
Microsoft.SqlServer.Types.dll contains the managed (.NET) code necessary
to define the spatial datatypes, and is installed by default in the \Program
Files\Microsoft SQL Server\110\SDK\Assemblies directory.
SqlServerSpatial.dll contains additional functionality required to perform
spatial operations, written using native (C++) code. This assembly is installed
by default in the \Windows\System32 directory.
Because these two libraries are redistributable and independent of SQL Server, you can reference
them in your own applications and use exactly the same spatial methods as provided by the database in
any other layer of your architecture: in a client-side app, or in a webservice, for example. The machine
on which these applications are executed doesnt even need to have SQL Server installed, so long as it
has the two libraries listed above (and the prerequisites to execute the code they contain, namely the
.NET Framework and the Microsoft C++ runtime libraries).
56
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Note You can import SQL Servers spatial libraries and use them from within other applications, such as a WPF
application, an ASP.NET webpage, or a console application. However, because the SqlServerSpatial.dll library uses
unmanaged code, these applications must be run on a Windows-based platform, and you cannot use the spatial
datatypes in a Silverlight application, for example.
Whereas the SQL Server spatial datatypes are called geography and geometry, when you call
directly into the SqlServer.Types.dll library from .NET, the corresponding spatial datatypes are called
SqlGeography and SqlGeometry. Throughout this book, I concentrate on examples that use the geography
and geometry types as they are implemented in SQL Server. However, it is worth remembering that
almost all of the same functions can be applied to the equivalent SqlGeography and SqlGeometry types
in a .NET application (one significant exception to this rule is that it is only possible to create and
utilize a spatial index on data held in the database itself).
Figure 3-1. Calculations on the geography datatype account for curvature of the Earth.
57
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
spatial_reference_id The integer identifier used within SQL Server 2012 to refer to
this system.
authority_name The name of the standards body that administers this reference.
unit_conversion_factor A scale factor for converting from meters into the appropriate
linear unit of measure
Note Currently all but one of the spatial reference systems supported by SQL Server are based on the EPSG
registry, and the value of the internal spatial_reference_id for any system listed in sys.spatial_reference_systems
is the same as the authorized_spatial_reference_id allocated by the EPSG. The only exception is SRID 104001, a
system defined by Microsoft that defines coordinates on a perfect unit sphere.
The parameters that describe a geographic coordinate system (the ellipsoid, prime meridian,
angular unit of measure, and the like) are defined in WKT format in the well_known_text column of the
sys.spatial_reference_systems table. However, you might also have noticed that one of the other
columns of data defined in the sys.spatial_reference_systems table is unit_of_measure. Why does SQL
Server need to know a separate unit of measure, you might ask, when there is already a unit of
measure embedded in the well_known_text definition of the system (which, for latitude and longitude
coordinates as used by a geographic coordinate system, is generally degrees)?
The answer is that, although angular units of latitude and longitude are all very well for describing
the location of Points, they are not that helpful for expressing the distance between Points, nor the area
enclosed within a set of Points. For example, using the spatial reference system EPSG:4326, we can state
the location of Paris, France as a point at 48.87N, 2.33E. Using the same system, the location of Berlin,
58
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Germany could be described as 52.52N, 13.4E. However, if you wanted to know the distance between
Paris and Berlin, it would not be very helpful to state that they were 11.65 apart, with the answer
measured in degrees. You would probably find it much more useful to know that the distance between
them was 880 km, or 546 miles, say.
To account for this, SQL Server defines an additional linear unit of measurement for every
geodetic spatial reference system. When you use the geography datatype, although coordinates must be
supplied in angular latitude and longitude coordinates, the results of any calculations are returned in
the linear unit of measure specified in the unit_of_measure column of the
sys.spatial_reference_systems table for the relevant spatial reference system.
To check the units of measurement corresponding to a particular spatial reference system, you
can run a query as follows (substituting the SRID of the appropriate spatial reference system).
SELECT
unit_of_measure
FROM
sys.spatial_reference_systems
WHERE
authority_name = 'EPSG'
AND
authorized_spatial_reference_id = 4326;
This query gives the following result, which tells us that linear measurements of any geography
data defined using the EPSG:4326 spatial reference system are stated in meters.
metre
With this knowledge, we can use the geography datatype to determine the distance between Paris
and Berlin based on the latitude and longitude coordinates as stated previously, but returning the
answer in meters. This is shown in the following code listing.
879989.866996421
Note Every time you store an item of data using the geography type, you must supply the SRID of the spatial
reference system from which the coordinates were obtained. SQL Server 2012 uses the information contained in
the spatial reference system to apply the relevant model of curvature of the Earth in its calculations, and also to
express the results of any linear methods in the appropriate units of measurement. The supplied SRID must
therefore correspond to one of the supported spatial references in the sys.spatial_reference_systems table.
59
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Figure 3-2. The importance of Polygon ring orientation using the geography datatype. Does the Polygon
created from the ring shown here contain the Northern Hemisphere, or the Southern Hemisphere?
To resolve this ambiguity, SQL Server applies a rule known as the "left-hand rule" (or, sometimes,
the "left-foot rule"); if you imagine yourself walking along the ring of a geography Polygon, following
the points in the order in which they are listed, SQL Server 2012 treats the area on the left of the line
drawn between the Points of the ring as the interior of the Polygon, and the area on the right as the
exterior. Another way of thinking about this is to imagine looking down at a point on the surface of the
Earth from space; if that point is enclosed by a Polygon ring in a counterclockwise direction then that
point is contained inside the Polygon, otherwise it is outside.
Applying this rule, we can determine that the Polygon illustrated in Figure 3-2 therefore
represents the Northern Hemisphere. If the order of the points in the Polygon ring were to be
reversed, then this Polygon would instead contain the Southern Hemisphere.
60
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Caution It is a common mistake to list the points of a Polygon ring with incorrect ring orientation, in which case
the resulting Polygon is inside-out: the area that was intended to be contained within the Polygon is outside, and
the interior of the Polygon actually contains the entire rest of the Earths surface. When defining a Polygon ring in
the geography datatype, ensure that the interior is on the "left" of the line connecting the points.
Remember that Polygons can also contain one or more interior rings, which mark out areas of
space not included in the interior of the Polygon. To define an area of space not included in a Polygon
you should therefore enclose it in a ring of Points listed in clockwise order, so that the area to be
excluded lies to the right of the line. The illustration shown in Figure 3-3 demonstrates the correct ring
orientation required to define a geography Polygon containing a hole.
As before, if the Points of each ring were listed in reverse order then the Polygon would become
inverted: the interior of the Polygon would contain the entire surface of the globe except for the area
shaded in gray in Figure 3-3.
61
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Tip The geography datatype defines a ReorientObject() method, which flips the interior and exterior of a
geography Polygon instance, and has the same effect as reversing the coordinate order of each ring.
Figure 3-4. Calculations on the planar geometry type operate on a flat plane.
The geometry datatype stores planar spatial data defined using Cartesian (x, y) coordinates, which
makes it ideally suited to storing coordinate data from a projected spatial reference system. In this case,
the act of projection has already mapped the geographic position of features on the Earths surface onto
a flat plane. It can also be used to store unprojected geographic coordinates of latitude and longitude,
where the longitude coordinate value is mapped directly to the x value, and the latitude value is used as
the y value (in doing so, you are implicitly projecting the data using an equirectangular projection).
In fact, the geometry datatype can be used to store any coordinates that can be expressed using x and
y values. Examples of such coordinates might be data collected from a local survey, or topological plans
of a geographically small area where curvature of the Earth can be safely ignored, or geometrical data
obtained from computer-aided design (CAD) packages. For example, Figure 3-5 shows the floorplan of
the European SQLBits conference held at the Grand Hotel in Brighton, England in April 2012, defined
entirely using the geometry datatype and displayed using the SQL Server Management Studio Spatial
Results tab. This data was created from an original plan supplied in PDF format.
62
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Because the geometry datatype uses simple planar calculations, the results of any computations
will be expressed in the same units of measurement as the underlying coordinate values. The most
common unit of measurement in which the coordinates of projected spatial reference systems are
stated is the meter; this is the case for the Universal Transverse Mercator system and many National
Grid reference systems, for example. If you use the geometry datatype to store data based on one of
these systems, lengths and distances calculated using the STLength() and STDistance() methods will be
measured in meters. And, if you were to use STArea() to calculate the area of a geometry, the result
would be measured in square meters. In contrast, the coordinates used to create the floorplan shown in
Figure 3-5 were measured in pixels, so using the STArea() method to calculate the area of each room
would lead to a result measured in pixels squared.
Caution Earlier, it was stated that the geometry datatype could be used to store "unprojected" geographic
coordinates of latitude and longitude, directly mapped to the y- and x-coordinates. However, remember that these
are angular coordinates, usually measured in degrees. If you use the geometry datatype to store information in
this way then the distances between points would also be measured in degrees, and the area enclosed within a
Polygon would be measured in degrees squared. This is almost certainly not what you want, so exercise caution
when using the geometry datatype in this way. To return a geometry calculation measured in meters, say, the
input coordinates must also be measured in meters.
63
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
64
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
The first code listing uses the correct SRID for the coordinates, EPSG:27700,
which defines the Royal Mile as a straight line between two points in
Edinburgh, 1806 meters in length.
The second code listing uses the same coordinate values but suggests that they
are defined relative to the Texas Central coordinate system (EPSG:32039).
Although the numerical value of the result is the same, the interpretation of
this result would imply that the Royal Mile is a straight line of length
1806.22313128805 feet, drawn between two points in the Quitman Mountains
of Texas, United States. This is clearly incorrect!
Supplying an incorrect SRID will lead to many problems farther downstream in your spatial
applications. Once the metadata associated with a set of coordinates has been lost it cannot be
redetermined, because looking at a set of coordinate values in isolation gives very little indication of
the system from which they have been derived. By explicitly stating the SRID with every set of
coordinates, not only will you retain this important metadata, but it will also ensure that you do not
accidentally try to perform a calculation on items of spatial data defined using different spatial
reference systems, which would lead to an invalid result.
So please, I implore you, always, always use the correct SRID with any spatial data in SQL Server,
even when using the geometry datatype!
Note The sys.spatial_references table only contains details of geodetic spatial references, because these
are required to perform calculations using the geography datatype. In order to find the appropriate SRID for a
projected coordinate system, you can look it up on the EPSG website at http://www.epsg-registry.org/.
65
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
contains exactly the same area as if it were specified by the following coordinates,
(50,30), (50,31), (52,31), (52,30), (50,30)
This applies to both interior and exterior rings of the geometry datatype, so the area contained by a
geometry Polygon remains exactly the same even if the points are supplied in reverse order. This is
illustrated in Figure 3-6, where the highlighted Polygon will always represent the Northern
Hemisphere irrespective of ring orientation.
Consistency
In order to perform operations using different items of spatial data in SQL Server 2012, all of the data
must be defined using the same spatial reference system, and stored using the same datatype. It is not
possible to combine geometry and geography data in the same query, nor perform operations on items
of the same datatype defined using different SRIDs. If you attempt to do so, SQL Server will return a
NULL result.
If you already have existing spatial data that you would like to integrate into your system, you
should therefore use a datatype suitable for the format in which that data has been collected. For
66
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
instance, if you have projected data collected from the National Grid of Great Britain, you should store
the data in a geometry field, using the SRID of 27700. If you are using latitude and longitude coordinate
data collected from a GPS system, then you should choose a geography type, with SRID of 4326. If you
would like to combine multiple sets of data defined in different spatial reference systems, then you
must first transform one or more of the sets of coordinate data in order to make them consistent. For
more information on reprojection and transformation, refer to Chapter 8.
Note Remember that the spatial reference identifier provides descriptive information about the system in which
coordinate values have been defined; it does not dictate the system itself. You therefore cannot simply update the
SRID value relating to a set of coordinates to express them in a different spatial reference system. Instead, to
convert coordinates from one spatial reference system into another you must transform or reproject the data.
Accuracy
Calculations using the geometry datatype are performed on a flat plane. Any geospatial features drawn
on a flat plane must have been projected and, as explained in Chapter 1, the process of projecting any
three-dimensional object onto a surface will always lead to some distortion in the way those features
are represented. This distortion may affect the area, shape, distance, or direction of the data. Therefore,
the results of certain operations using the geometry datatype will inevitably also be distorted, with the
effect of distortion varying depending on the projection used and the particular part of the Earth's
surface on which the calculation is based.
Generally speaking, the greater the surface area being projected, the more distortion occurs.
Although over small areas the effects of these distortions are fairly minimal, in large-scale or global
applications there can be a significant impact on the accuracy of any results obtained using the
geometry datatype when compared to the geography datatype (which is not distorted by projection).
In many applications that cover only a small spatial area, such as those contained within a
particular state of the United States, the results of calculations performed using the geometry type on
the relevant state plane projection will be sufficiently accurate. However, over larger distances, the
computations based on a planar projection will become more and more inaccurate, and the geography
datatype become a more suitable choice.
67
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Figure 3-7. The shortest line between Tokyo and Vancouver using the geometry datatype.
In contrast, the geography datatype uses a continuous round model of the Earth, which is unaffected
by the edges introduced as a result of projection. The answer obtained for the shortest route between
Tokyo and Vancouver using the geography datatype would instead look like that shown in Figure 3-8.
Figure 3-8. The shortest route between Tokyo and Vancouver using the geography datatype.
68
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
It is obvious that, in cases such as these, the results obtained using the geography datatype give a
more accurate answer based on the real round Earth.
A further demonstration of these issues is the problem of trying to define geometry instances that
extend across the edges of the map in a given projection. Figure 3-9 highlights a Polygon geometry
representing Russia.
Figure 3-9. Polygon geometry representing Russia crossing edges of a projection in the geometry datatype.
Notice that although most of the Polygon is contained in the eastern hemisphere, the most
northeasterly part of Russia (the region of Chukotka) actually crosses the edge of the map, to appear in
the western hemisphere. Using the geometry datatype based on this projection, it would not be possible
to represent Russia using a single Polygon geometry; instead you would need to use a MultiPolygon
geometry containing two elements to represent the two distinct Polygons created where the edge of
the map had caused the original feature to be divided in two.
Both of the problems demonstrated in this section could be mitigated to some extent by choosing
an appropriate projected spatial reference system in which the geometry in question does not cross
the edges of the map. However, although this would avoid the issue for a particular case, it does not
solve it; even if a different projection is chosen there will always be some features that will occur on
the new edges instead.
If you expect to have to deal with geometries that risk extending over the edges of a map
projection, then the geography datatype would be a better choice in which to store your data.
Presentation
Because the geography datatype operates on a three-dimensional model of the Earth, if you want to
present the results of any geography data in a display, they will need to be projected (unless youve got
one of those fancy 3D displays). As we have already discussed, this introduces distortion. In the example
above, although the geography datatype accurately works out the shortest straight line connecting two
points, if we were to display this result on a projected map, this "straight" line would appear distorted
69
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
and curved. The exact effect of this distortion will differ depending on the particular properties of the
map projection used.
Conversely, because as the geometry datatype is based on data that has already been projected
onto a plane, no further calculations or distortion need be introduced to express the results on a map:
"straight" lines in the geometry datatype remain straight when drawn on a map (providing the map is
projected using the same projection as the spatial reference system from which the points were
obtained).
If you are storing spatial data in SQL Server specifically for the purpose of display on a particular
map (say, for creating a tile overlay on Bing Maps or Google Maps), then it might be beneficial to
store that data using the geometry datatype in the same projected spatial reference system as the map
on which it is intended to be displayed. This reduces additional calculations required when the data is
retrieved and projected onto the map, and reduces the need to introduce further distortion in the data
at display time.
Performance
Performing ellipsoidal computations uses more computing resources than Cartesian computations. As
a result, spatial calculations using the geography datatype may take longer to compute than the
equivalent operations using the geometry datatype. This only affects methods where the geography
datatype has to calculate metrics based on the geodetic model of the Earth (such as distances, lengths,
or areas). When using methods that return properties of objects which do not take account of the model
of the Earth, such as returning the number of points in an object, there is no difference in performance
between geography and geometry types.
http://www.opengeospatial.org/
The OGC administer a number of industrywide standards for dealing with spatial data. By
conforming to these standards, different systems can ensure core levels of common functionality, which
ensures that spatial information can be more easily shared among different vendors and systems.
In October 2007, Microsoft joined the Open Geospatial Consortium (OGC) as a principal member,
and the spatial datatypes implemented in SQL Server 2012 are largely based on the standards defined
by the OGC:
The geometry datatype conforms to the OGC Simple Features for SQL
specifications v1.1.0 (http://www.opengeospatial.org/standards/sfs) and
implements all the required methods to meet that standard.
The geography datatype implements most of the same methods as the geometry
datatype, although it does not completely conform to the required OGC standards.
As such, if it is important to you to use spatial methods in SQL Server 2012 that adhere to accepted
OGC standards (such as if you are replacing the functionality of a legacy system based on those
standards), you should use the geometry datatype.
70
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
General Rules
If you are still unsure which type to use, consider the following general rules.
If you have latitude and longitude coordinate data (collected from a GPS, from
Google Earth, or most sources listed on the Web) use the geography datatype,
normally using the default 4326 SRID.
If you have x- and y-coordinate data (e.g., collected from a flat map), use the
geometry datatype with an SRID to represent the map projection and datum used.
If you have x- and y-coordinate data that are not defined in relation to any
particular model of the Earth, use the geometry datatype with SRID = 0.
Storage Requirements
The datatypes geometry and geography are both of variable length. In contrast to a fixed-length
datatype such as int or datetime, the actual amount of storage required for an item of spatial data
varies depending on the complexity of the object that the data describes. Just as a varchar(max) field
varies in size according to the number of characters in the string being stored, so too does a geometry
or geography field vary in size according to the number of points in the corresponding geometry.
The structure of an item of geometry or geography data begins with a header section, which defines
basic information such as the type of shape being described, the spatial reference system used, and the
overall number of points in the object. This header is immediately followed by the coordinate values of
each x- and y- (or longitude and latitude) coordinate in the geometry, represented in 8-byte binary
format. The more points that an object has in its definition, the longer this binary stream will be, and
therefore the more storage space will be required.
A Point geometry defined with only two coordinates will always occupy 22 bytes
of storage space.
A LineString between two Points, containing the minimum of four coordinates (x
and y values of the start and end Point), requires 38 bytes of storage. This
increases by 16 bytes for every additional line segment added to the LineString.
A Polygon occupies a variable amount of space depending on the number of Points
with which it is defined (not related to the area of space contained by the Polygon).
If a Polygon contains interior rings, these also increase the storage required.
There is no specific maximum size of an item of geometry or geography data. However, SQL Server
2012 has an overall restriction on any kind of large object, which is limited to a size of 231 1 bytes.
This is the same limit as is applied to datatypes such as varbinary(max) and varchar(max), and equates
to approximately 2 Gb for each individual item of data. You would need to store a very complex
geometry object in order to exceed this limit. If necessary, remember that complex geometries can be
broken down into a number of individual objects which each fit within this limit.
Tip You can use the DATALENGTH function to find out the number of bytes used to store the value of any item of
geometry or geography data (or, for that matter, any other item of data).
71
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
0xAD100000020C000000000000444000000000000059C0
This value may be broken down into a number of constituent parts, as shown in Table 3-4.
Element Description
AD100000 Spatial Reference identifier (4 bytes). The integer SRID valuein this case
4269expressed as a 4-byte binary value.
02 Version number (1 byte). SQL Server 2008/SQL Server 2008 R2 uses version 1
serialization, whereas SQL Server 2012 uses version 2 serialization.
OC Serialization properties (1 byte). This value is set from a series of bit flags
representing the following additional properties of the geometry.
Whether the geometry is larger than a hemisphere (0 20)
Whether the geometry is a single line segment (0 10)
Whether the geometry is a single Point (0 08)
Whether the geometry is valid (0 04)
Whether the geometry contains m-coordinates (0 02)
Whether the geometry contains z-coordinates (0 01)
For this case, the flags for a valid (0 04), single Point (0 08) geometry have
been set, leading to the value 0C
72
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Element Description
0000000000004440 Latitude coordinate (40) expressed as 8-byte floating point binary
Suppose instead that we wanted to construct programmatically a geography Point instance defined
at a latitude of 42 degrees, longitude of 90 degrees, using the SRID 4326. To do so, we could build up
the geography value from the corresponding binary elements, as shown in the following code listing.
DECLARE @point geography =
0xE6100000 + -- SRID (4326)
0x02 + -- Version (2)
0x0C + -- Properties (Single Point [8] + Valid [4])
0x0000000000004540 + -- Latitude (42)
0x00000000008056C0 -- Longitude (90)
SELECT
@point.STSrid,
@point.ToString();
The WKT results returned by the ToString() method confirm that the geography Point has been
created at the appropriate coordinates, using the 4326 SRID:
This is admittedly a fairly contrived example, and there are few situations in which you would
need to do such manual binary manipulation in T-SQL. However, it does demonstrate that it is
certainly possible to do so, and you can reuse the same approach in other application layers.
Note For more information on the serialization format used for SQL CLR datatypes, refer to the following
document: http://download.microsoft.com/download/7/9/3/79326E29-1E2E-45EE-AA73-
74043587B17D/%5BMS-SSCLRT%5D.pdf.
73
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Notice the wording used in this error message: conversion is not allowed. This is not a technical
limitation of SQL Server; because geometry and geography both use the same underlying structure,
converting data between the types is incredibly easy from a technical point of view. Rather, this is a
deliberate restriction imposed by SQL Server to ensure that you understand the implications of
working with each datatype, and that you dont casually swap data between them.
There are very few scenarios in which it makes sense to take coordinate data from one spatial
datatype and convert it directly into the other. If you find a requirement to convert data between the
two datatypes then it normally also involves transforming the associated coordinate data from a
geographic coordinate system used in the geography datatype to a projected coordinate system for the
geometry datatype, for example. Nevertheless, there are a few occasions when it is useful to be able to
take geographic coordinates from a geography instance and convert them directly into the geometry
datatype. This might be the case if you want to use one of the methods that is only available to the
geometry datatype, such as STBoundingBox(), or STRelate().
In order to convert from geography to geometry, we can take advantage of the fact that both
datatypes can be represented by, and created from, a binary stream in the Well-Known Binary format.
In the following example the value of the geometry variable, @geom, is created from the STGeomFromWKB()
static method. The arguments passed to this method are the Well-Known Binary representation and
SRID of the geography variable @geog.
-- First, create a geography instance
DECLARE @geog geography;
SET @geog = geography::Point(23,32, 4326);
74
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
New geometry or geography columns can be added to existing tables, enabling spatial information
to be seamlessly integrated alongside existing items of data. Let's suppose that you have an existing
table, customer, that contained the following fields of customer information.
CREATE TABLE dbo.customer (
CustomerID int,
FirstName varchar(50),
Surname varchar (50),
Address varchar (255),
Postcode varchar (10),
Country varchar(32)
);
Now suppose that you want to add an additional spatial field to this table to record the location of
each customer's address. No problem; geography and geometry fields can be added to existing tables just
like any other by using an ALTER TABLE statement as follows.
ALTER TABLE dbo.customer
ADD CustomerLocation geography;
By extending the table in this way, we have enabled the possiblity of using spatial methods in
conjunction with our existing customer data, to find answers to questions such as how many customers
there are within a certain area, and how far a particular customer lives from his closest store.
75
www.it-ebooks.info
CHAPTER 3 SPATIAL DATATYPES
Summary
In this chapter you learned about the two datatypes used for storing spatial data in SQL Server 2012,
geometry and geography, and examined the key differences between the two types.
The geography datatype uses geodetic spatial data, which accounts for the
curvature of the earth.
The geometry datatype uses planar spatial data, in which all points lie on a flat
plane.
You considered the factors influencing the choice of which datatype to use, and
saw some example usage scenarios for each type.
You saw the reasons why, whichever datatype you use, it is important to state the
correct spatial reference identifier associated with any coordinate data.
You examined the structure in which SQL Server 2012 stores spatial data,
represented as a stream of binary values.
You also saw how to add a column of spatial data to a SQL Server table, and add a
constraint to that column to ensure that only data of a certain SRID could be
inserted.
76
www.it-ebooks.info
CHAPTER 4
In the first three chapters of this book I introduced you to the main components necessary to define
spatial data in SQL Server, namely, a set of coordinate locations, the spatial reference system in which
those coordinates are defined, the type of geometry used to represent a feature, and the datatype in
which that feature is stored.
In this chapter, we apply the knowledge youve gained so far in a practical context, by looking at
the different methods you can use to create items of geometry or geography data. Every method requires
those same key pieces of information: the spatial reference system, type of geometry, and datatype,
together with the coordinates that define that instance.
The way in which you provide those elements varies depending on the method you choose; SQL
Server implements methods that create data from several different standard formats: Well-Known
Text (WKT), Well-Known Binary (WKB), and Geography Markup Language (GML). It also exposes an
API that allows you to construct items of spatial data programmatically using the SqlGeometryBuilder
and SqlGeographyBuilder classes. Each of these methods is examined in turn and the advantages and
disadvantages of each discussed.
77
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
SQL Server must parse the text in a WKT representation into its own internal
binary format, therefore creating objects from WKT may be slower than other
methods.
Because WKT is both simple to read and understand, it is the format used in most of the MSDN
online documentation and code samples, as well as in other resources. It is also the format that is most
widely used throughout this book.
LineString STLineFromText()
Polygon STPolyFromText()
MultiPoint STMPointFromText()
MultiLineString STMLineFromText()
MultiPolygon STMPolyFromText()
GeometryCollection STGeomCollFromText()
All of the methods listed in Table 4-1 are implemented by both the geometry and geography
datatypes, with the datatype of any created instance matching the datatype of the method from which it
was created. Note that there are no dedicated methods to create CircularString, CurvePolygon, or
CompoundCurve geometries from WKT; these methods appear to have been overlooked in the OGC
standard and so are not implemented by SQL Server either. To create a curved geometry from WKT
you must use the generic STGeomFromText() or Parse() methods.
78
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Lets illustrate this with a few examples. To begin, lets take the simple example of a Point located
at geographic coordinates of latitude 27.5 and a longitude of 153, measured using the WGS84 spatial
reference system, SRID 4326. This is the approximate location of Brisbane, Australia.
The Well-Known Text representation of this Point is:
POINT(153 -27.5)
We can create a geography Point instance by supplying this WKT string to the dedicated
STPointFromText() method, together with the associated SRID, as follows.
SELECT
geography::STPointFromText('POINT(153 -27.5)', 4326);
Notice that the WKT parameter is passed as a nvarchar(max) text string, and supplied in single
quotes. If you use the SqlServer.Types.dll library in a .NET application then the WKT is passed as a
SqlChars value instead, as shown in the following C# code listing.
SqlGeography Point = SqlGeography.STPointFromText(
new SqlChars("POINT(153 -27.5)"),
4326);
The STPointFromText() method can be used to create Point instances only. If you supply the WKT
representation of a different sort of geometry to the STPointFromText() method then SQL Server will
throw an exception (System.FormatException 24142). To create a LineString geometry, for example, you
should use the STLineFromText() method instead, supplying the WKT of a LineString as shown in the
following code listing.
SELECT
geometry::STLineFromText('LINESTRING(300500 600150, 310200 602500)', 27700);
If you know in advance that the data you will be creating will only be of a certain type of
geometry, then I generally recommend that you use the method dedicated to that geometry type,
STPointFromText() for Points, STPolyFromText() for Polygons, and so on. Using these methods will
provide a first sanity check of your data by ensuring that it only contains geometries of the expected
type; any other data will throw an exception.
However, there are also occasions when you require a method that will create spatial data from a
WKT string of any sort of geometry. In these situations, you can use the STGeomFromText() method
instead. The following code listing demonstrates how the STGeomFromText() method can be used to
create both the Point and LineString from the previous examples.
SELECT
geography::STGeomFromText('POINT(153 -27.5)', 4326),
geometry::STGeomFromText('LINESTRING(300500 600150, 310200 602500)', 27700);
The results obtained from the STGeomFromText() method are identical to those obtained from the
dedicated STPointFromText() or STLineFromText() methods used previously.
Note There is no performance benefit from using a geometry type-specific method such as
STPointFromText() rather than the generic STGeomFromText() method. The sole advantage is that it restricts the
types of geometry that will be accepted in the WKT input (and therefore, by implication, the type of geometry that
will be returned by the method). If this is your objective, you might also want to consider adding a CHECK
constraint that tests the value returned by the STGeometryType() method at the point that a geometry is inserted
into a table, as demonstrated in Chapter 3.
79
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Another alternative is to use the generic Parse() method. In common with the STGeomFromText()
method, the Parse() method will create an item of spatial data from any supplied WKT representation.
The difference is that, unlike the other static methods, Parse() does not require you to set an SRID; the
spatial reference system is assumed based on the datatype being used. For the geography datatype, the
Parse() method always uses SRID 4326, whereas for the geometry datatype it is SRID 0.
If you attempt to set the value of a geometry or geography column or variable directly from a
character string, the string will be treated as WKT and passed to the Parse() method. As such, the
following T-SQL code listing,
DECLARE @Delhi geography = 'POINT(77.25 28.5)';
produces exactly the same result as
DECLARE @Delhi geography = geography::Parse('POINT(77.25 28.5)';
which, in turn, is equivalent to
DECLARE @Delhi geography = geography::STGeomFromText('POINT(77.25 28.5)', 4326);
As long as you are dealing with the common cases of using the geography datatype with SRID 4326,
or the geometry datatype with SRID 0, you can therefore use the Parse() method as a convenient way of
shortening your code.
To demonstrate the same example using the static methods provided by the SqlGeography or
SqlGeometry classes in a .NET application, you will probably find it easier to write:
SqlGeography Delhi = SqlGeography.Parse("POINT(77.25 28.5)");
compared to
SqlGeography Delhi = SqlGeography.STGeomFromText(
new SqlChars("POINT(77.25 28.5)"), 4326);
One subtle difference worth noting from this example is that, although STGeomFromText() accepts
the supplied WKT input as a SqlChar array, the Parse() method treats the WKT input as a SqlString.
SqlChar values can be streamed, whereas a SqlString cannot. Using Parse() therefore requires a
contiguous block of memory to be allocated for the entire supplied WKT string. This, combined with the
fact that Parse() can only be used for creating instances defined using limited SRIDs, means that you
will need to decide whether it is suitable for use in a particular scenario compared to the more verbose
STGeomFromText() method.
Note All of the static methods that operate on WKT input expect decimal coordinate values to be represented
using a decimal point (.) to separate the integral and fractional parts of the coordinate, for example, 52.61.
Depending on the regional settings of your database or operating system, you may find that coordinate values are
instead displayed and represented using other coordinate separators, such as the comma in 52,61. When
supplying WKT for any of the static methods listed in this section, be sure to check the culture settings of your
application.
80
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
81
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
WKB representation lists a stream of 8-byte values representing the coordinates of each point in the
geometry.
Although SQL Server 2012 stores spatial data internally as a stream of binary data, it is not the
same as the WKB binary data format. As a result, you cannot directly set the value of an item of
geography or geometry data from a WKB representation. Instead, you must pass that WKB
representation to one of the appropriate static methods. Likewise, if you directly select the internal
binary value that SQL Server uses to store an item of spatial data, it will not be the same as the WKB
representation of that feature. One difference between WKB and SQL Server's own internal binary
format is that SQL Server serializes the spatial reference identifier and other properties related to the
geometry not present in the WKB representation.
Note The WKB format has some similarities with the internal binary format that SQL Server uses to store
geometry and geography data, but they are not the same. As in WKT, creating spatial data from WKB requires
passing that WKB representation to a suitable static method.
82
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Point STPointFromWKB()
LineString STLineFromWKB()
Polygon STPolyFromWKB()
MultiPoint STMPointFromWKB()
MultiLineString STMLineFromWKB()
MultiPolygon STMPolyFromWKB()
GeometryCollection STGeomCollFromWKB()
To demonstrate the use of these methods, let's first take a look at an example WKB representation
of a Point geometry:
0x00000000014001F5C28F5C28F6402524DD2F1A9FBE
The elements of this binary string are broken down in Table 4-3.
Value Description
To create a geometry Point instance from this WKB representation, using the Qatar National Grid
(SRID 2099), you can use the STPointFromWKB() method as follows.
SELECT
geometry::STPointFromWKB(0x00000000014001F5C28F5C28F6402524DD2F1A9FBE, 2099);
Or, you can use the generic STGeomFromWKB() method, which can be used to create any type of
geometry from WKB:
83
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
SELECT
geometry::STGeomFromWKB(0x00000000014001F5C28F5C28F6402524DD2F1A9FBE, 2099);
Note The spatial reference identifier is not serialized as part of the WKB binary string, so it must be provided as
the second parameter to any static methods that instantiate geometries from WKB.
0x01010000000000000000002C400000000000002240
Note that, like its WKT sister, STAsText(), the WKB representation produced by the STAsBinary()
method specifies coordinates in two dimensions only: x and y for the geometry datatype, or latitude and
longitude for the geography datatype. You can confirm this by converting the Well-Known Binary result
above back to Well-Known Text:
-- Declare point containing x, y, and z coordinates
DECLARE @g geometry = geometry::STPointFromText('POINT(14 9 7)', 0);
84
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
0x01E90300000000000000002C4000000000000022400000000000001C40
Tip You can return the GML representation of any existing geography or geometry instance by calling the
AsGml() method.
85
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
GML is very verbose, explicitly stating all values within specific elements.
However, GML also has the following disadvantages.
It is very verbose! Although both WKT and GML are text-based formats, the GML
representation of a geometry requires substantially more space than the
equivalent WKT representation
Because GML is text-based, it too suffers from precision issues caused by
rounding of binary floating-point values.
GML is most commonly used for representing spatial information in an XML-based environment,
including when syndicating spatial data over the Internet.
Note The GML methods implemented in SQL Server are based on a scaled-down version of the GML 3.1.1
schema. You can view the schema used in SQL Server at http://schemas.microsoft.com/sqlserver/profiles
/gml/ or you can find the full GML standards on the OGC website, located at
http://www.opengeospatial.org/standards/gml.
SELECT
geography::GeomFromGml(@gml, 4269);
86
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Using the example of a LineString, the GML representation should therefore always be formed as
follows.
<LineString xmlns="http://www.opengis.net/gml">
<posList>-6 4 3 -5</posList>
</LineString>
If you omit the namespace, you will still have valid, well-formed XML, but it will no longer define
a valid GML geometry. Attempting to create a geometry from such a representation using the
GeomFromGml() method will result in an error as shown in the next example:
DECLARE @NoGMLNameSpace xml =
'<LineString>
<posList>-6 4 3 -5</posList>
</LineString>';
System.FormatException: 24129: The given XML instance is not valid because the top-
level tag is LineString. The top-level element of the input Geographic Markup
Language (GML) must contain a Point, LineString, Polygon, MultiPoint,
MultiGeometry, MultiCurve, MultiSurface, Arc, ArcString, CompositeCurve,
PolygonPatch or FullGlobe (geography Data Type only) object.
The text of the error message states that the GML representation passed to the GeomFromGml()
method is invalid because the top-level tag is LineString. Instead, it helpfully suggests a list of possible
valid elements, including LineString?
As explained previously, XML elements are unique only within a particular namespace; what the
preceding error message really should say is that the top-level element of the input GML must be one
of the listed elements from the GML namespace. Declaring the GML namespace as an attribute on the
parent element tag resolves the error, as shown in the following example.
DECLARE @WithGMLNameSpace xml =
'<LineString xmlns="http://www.opengis.net/gml">
<posList>-6 4 3 -5</posList>
</LineString>';
(1 row(s) affected)
87
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
GML is a widely accepted standard in the geospatial world, and it provides a rich set of elements capable
of describing spatial features with all their associated metadata. It is used in many professional and
industrial datasets. For example, GML is the native format in which data from the Ordnance Survey (the
British government's national executive mapping agency) is distributed.
Unfortunately, SQL Server implements only a reduced subset of the full GML standard, and lacks many of
its more advanced elements. What's more, the parser used by the GeomFromGml() method will fail to parse
GML documents containing those GML elements not recognized by SQL Server, even if the document itself
adheres to the full published GML schema.
For this reason, I find the GeomFromGml() method to be of little use to create geography or geometry data
from GML documents found "in the wild," because there is no guarantee that SQL Server will be able to
parse them. The best way of knowing that SQL Server will be able to parse a particular GML file is if that
file were itself created from SQL Server's AsGml() method, but that of course implies that the data has
already been successfully imported into the database!
88
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Note When using the geography Point() method, the coordinate parameters are supplied with latitude first,
then longitude, which is opposite to the order in which they are stated in WKT.
But what if we wanted to create something a bit more complex? Suppose that, rather than create
individual Point instances for each row in the GPSLog table, you wanted to create a LineString joining
each of the points in order. There is no inbuilt function that accepts an array of coordinates and returns
the LineString created from them, so dynamically constructing the WKT may be a better choice here.
The following code listing demonstrates one way of achieving this.
-- Declare an empty nvarchar to hold our constructed WKT string
DECLARE @WKT nvarchar(max) = '';
89
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
-- Append the LINESTRING keyword and enclose the coordinate list in brackets
SET @WKT = 'LINESTRING(' + @WKT + ')';
However, there are a couple of problems with this method. First, the code required to create it is
pretty ugly. Ugly code is not only unappealing to look at, but it's also hard to maintain, and it's much
more likely to conceal hidden bugs. Without the comments inserted, would it be obvious why you had to
use the LEFT function to trim the last character from the list of coordinates? The code required to
construct the WKT could arguably be cleaned up somewhat if implemented as a recursive CTE, but it
would still involve a degree of manual string manipulation.
90
4
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
It's also slow: string manipulation functions are generally not known for being efficient, and this
method requires both CASTing and concatenating of nvarchar values.
Finally, although this approach is somewhat dynamic, it's hard to include much control-of-flow or
conditional operators. Consider how much more complicated the code would quickly become if the
GPSLog table contained details of multiple vehicles that we wanted to plot as separate LineStrings, or if
we were to try to construct more complicated geometries such as Polygons from the underlying data.
The fact is that, although you can use T-SQL string functions to create dynamic WKT strings, the
primarily procedural-based T-SQL engine is not designed for this kind of operation. Fortunately, SQL
Server provides us with an alternative in the form of the SqlGeometryBuilder and SqlGeographyBuilder
classes available in SqlServer.Types.dll, which we examine in the next section.
Tip The SqlGeometryBuilder and SqlGeographyBuilder classes can be used in any .NET application that
references the SqlServer.Types.dll library, and are not dependent on SQL Server being installed.
91
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
When your project is first created, youll see the default Program.cs code file in which we insert the
code for our application. But, before we do so, we need to include a reference to the
Microsoft.SqlServer.Types.dll library.
Select Project Add Reference.
On the Add Reference dialogue box that appears, ensure that the .NET tab is
selected. Scroll down the list until you find Microsoft.SqlServer.Types and click
to highlight it.
Click OK.
These steps are illustrated in Figure 4-3.
92
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Note Depending on your system configuration, the Microsoft.SqlServer.Types library might not
automatically show up in the list of installed components under the Visual Studio .NET tab. In such cases, you can
locate the library manually by clicking the Browse tab, navigating to \Program Files (x86)\Microsoft SQL
Server\110\SDK\Assemblies and highlighting Microsoft.SqlServer.Types.dll in that directory.
Once the reference has been added to the project, we add a using directive so that we can easily
reference the methods contained in the Microsoft.SqlServer.Types library without having to specify
the namespace each time. Add the following line to the top of your Program.cs file,
using Microsoft.SqlServer.Types;
The project is now set up and ready to use the spatial datatypes, and we can get on with writing the
body of our code.
93
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Caution You must call SetSrid() to set the SRID of the builder immediately after it is created, and before
adding any points.
The next step is to specify the type of geometry that will be created. For the SqlGeometryBuilder,
you do this by passing a member of the OpenGisGeometryType enumeration to the BeginGeometry()
method. The equivalent for the SqlGeographyBuilder class is to pass an OpenGisGeographyType to the
BeginGeography() method. There is one enumeration for every type of geometry that can be created.
To begin creating a Point geometry using the SqlGeometryBuilder, you use the following code.
gb.BeginGeometry(OpenGisGeometryType.Point);
Note The SqlGeometryBuilder and SqlGeographyBuilder classes have a near identical set of methods,
except that every occurrence of the word "geometry" becomes "geography". So, BeginGeometry() corresponds
to BeginGeography(), EndGeometry() becomes EndGeography(), and ConstructedGeometry is equivalent to
ConstructedGeography.
The first set of coordinates of the geometry are specified with a call to the BeginFigure() method.
The following code listing creates a point at an x-coordinate of 300500 and y-coordinate of 600200.
gb.BeginFigure(300500, 600200);
If the geometry contains more than a single Point, then additional line segments are added by
calling the AddLine() method. Because the Point we are defining in this example contains only one
pair of coordinates, we do not need to use AddLine(), and we can now end the figure:
gb.EndFigure();
A SqlGeometry Point instance contains only a single figure, so at this point we can end the
geometry:
gb.EndGeometry();
Having ended the geometry, we can then retrieve the constructed SqlGeometry Point via the
ConstructedGeometry property of the SqlGeometryBuilder instance:
SqlGeometry Point = gb.ConstructedGeometry;
This is a simple demonstration application, therefore we then just print the WKT of the
constructed geometry to the console window, using the ToString() method:
94
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Console.WriteLine(Point.ToString());
Here's the full code listing for Program.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Types;
namespace ProSpatial.Ch4
{
class Program
{
static void Main(string[] args)
{
95
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
// Exterior ring
gb.BeginFigure(0, 0);
gb.AddLine(10, 0);
gb.AddLine(10, 20);
gb.AddLine(0, 20);
gb.AddLine(0, 0);
gb.EndFigure();
// Interior ring
gb.BeginFigure(3, 3);
gb.AddLine(7, 3);
gb.AddLine(5, 17);
gb.AddLine(3, 3);
gb.EndFigure();
Note To construct a Polygon geometry using the SqlGeometryBuilder class, each ring requires a separate call
to the BeginFigure() method. The exterior ring is the first to be created, and every subsequent figure defines an
interior ring.
96
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Caution Be sure that all figures and geometries are closed (i.e., every BeginFigure() and BeginGeometry()
have matching EndFigure() and EndGeometry() calls) before retrieving the constructed geometry from the
Builder class.
97
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
// Create a circular arc segment that passes through (55,5) and ends at (60,0)
gb.AddCircularArc(55, 5, 60, 0);
Although the examples shown here have used hard-coded coordinate values for simplicity, it is
easy to see how the SqlGeometryBuilder and SqlGeographyBuilder classes provide much more flexibility
for defining dynamic spatial instances than the static methods provided by the geometry and geography
datatypes within SQL Server.
Looking back at the earlier example of the table of GPS points, for example, you could create a CLR
User-Defined Aggregate that used the SqlGeographyBuilder class to construct a LineString representing
the route by reading the coordinate values of each point in the table and passing these to successive calls
to the AddLine() method, returning the constructed geography instance once all points had been added.
98
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
Note There is no overload that accepts only latitude, longitude, and z-coordinate values, only one that accepts
latitude, longitude, z-, and m-coordinates. If you do not require an m-coordinate value, simply pass a null
parameter value as in this example.
Summary
In this chapter, you saw different methods of creating instances of the geometry and geography datatype
in SQL Server.
There are static methods dedicated to creating different types of geometries
from Well-Known Text and Well-Known Binary, as well as generic methods to
create any type of geometry.
Static methods must be supplied with a fully formed representation of the
geometry to be created, together with the spatial reference identifier in which
the coordinates of that geometry are defined.
99
www.it-ebooks.info
CHAPTER 4 CREATING SPATIAL DATA
100
www.it-ebooks.info
CHAPTER 5
In the last chapter, we examined various methods that can be used to create individual items of
geography or geometry data, for example, using a static method that accepts WKT or WKB input, or using
one of the SqlGeometryBuilder or SqlGeographyBuilder classes. Each of these methods creates only a
single item of data at a time. So what if you wanted to import an entire set of spatial data?
In this chapter, I will introduce you to some of the sources from which you can obtain publicly
available sets of spatial information, examine the formats in which those datasets are commonly
supplied, and teach you some of the techniques you can use to import that information into SQL Server.
Sourcea Description
http://geodata.grid.unep.ch/ The United Nations Geo Data Portal includes global, national,
regional, and subregional statistics and spatial data, covering
themes such as freshwater, population, forests, emissions,
climate, disasters, health, and GDP.
101
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Sourcea Description
http://www.census.gov The U.S. Census Bureau Geography Division has lots of high-
quality spatial information, including a gazetteer, Zip Code
Tabulation Areas (ZCTAs), and the TIGER database of streets,
rivers, railroads, and many other geographic entities (United
States only).
a There may be restrictions on the use of data obtained from these sources. Please refer to the respective providers for
specific details.
Spatial datasets obtained from the sources listed in Table 5-1 may be provided in a variety of
different formats, and may contain a significant volume of data. A full download of U.S. census data,
for example, is several hundred gigabytes in size.
The remainder of this chapter describes some of the alternative formats in which spatial data is
commonly supplied, and explains techniques that you can use to import this data into SQL Server 2012.
102
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
gazetteer format resembles the index that can be found at the back of an atlas, which may list place
names, or road names, for example, together with the location of that item on the map.
To demonstrate how to add a column of spatial data to a table containing columns of latitude and
longitude (or Northing and Easting coordinates), let's consider an example using a file of earthquake data
provided by the United States Geological Survey (USGS). The USGS makes a number of datasets freely
available that you can download from their website at http://www.usgs.gov. One such dataset lists real-
time, worldwide earthquakes from the past seven days, which you can download directly from
http://earthquake.usgs.gov/eqcenter/catalogs/eqs7day-M1.txt. This file is a comma-separated list of
data containing various attributes of each earthquake in columnar format, as listed and described in Table
5-2.
Datetime A text string describing the date at which the String [DT_STR] 50
recording was made
Region A text string description of the area in which the String [DT_STR] 255
103
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
earthquake occurred
Note Because the eqs7day-M1.txt file contains a constantly updated feed of data from the last seven days, the
actual content of the file you download will be different from that demonstrated in this chapter.
104
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
105
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
You will see the data inserted from the text file, as shown in Figure 5-2.
To test the contents of the Epicenter column, you can now run the following query,
SELECT TOP 5
Eqid,
Epicenter.STAsText() AS Epicenter
FROM
eqs7dayM1;
The results are as follows.
Eqid Epicenter
10325561 POINT (-150.3317 65.0131)
10325555 POINT (-152.2948 57.4106)
00349540 POINT (-119.8993 39.4092)
10325549 POINT (-149.6373 61.2189)
71655381 POINT (-121.287 36.6595)
106
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
107
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
108
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
OGR2OGRBasic Syntax
OGR2OGR is the command-line component of the GDAL/OGR library that can be used to convert
vector data from one format to another. You can retrieve a full list of usage options by typing the
following in a command prompt window (assuming that the directory in which OGR2OGR is installed
is located in the command path).
ogr2ogr /?
The basic syntax to load data from a source to a destination is as follows.
ogr2ogr -f {OutputFormat} {Destination} {Source} {Additional Options}
To import data into SQL Server you state the {OutputFormat} as "MSSQLSpatial", and specify the
{Destination} using the connection string of the target SQL Server database. In the code listing below,
109
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
the contents of the {Source} file will be loaded into a new table in the ProSpatial database of the
default localhost SQL Server instance, for example:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
{Source}
{Additional Options}
The default OGR2OGR behavior is to create a new table in the destination database named with
the same filename as the {Source} dataset. The data from a shapefile named MyData.shp will be
imported into the SQL Server table MyData, for example. If a table with that name already exists then the
command will fail. This behavior can be modified by specifying one of the following flags in the
{Additional Options}.
-append appends new records into an existing table with the specified name.
-overwrite deletes the existing destination table and re-creates it.
-nln "tablename" inserts data into the named table rather than the default
table named based on the {Source} dataset name.
There are further {Additional Options}, some of which vary depending on the nature of the source
data. In the following sections we examine some practical scenarios that demonstrate these options, as
well as the syntax required to load from {Source} datasets of several common formats.
110
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Shapefiles may contain coordinate data defined using either projected or geographic coordinate
reference systems. The following sections show an example of each of these, imported into a column of
the geometry and geography datatype, respectively.
111
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Note For the purposes of clarity, I've separated the output format, destination, and source parameter values in
this code listing onto separate lines. However, when executing OGR2OGR you should not insert carriage returns
between parameters, but list them all on the same line with a space separating each.
After importing the dataset, you should find that three tables have been created in the specified
destination database (the ProSpatial database, in my case): precincts, geometry_columns, and
spatial_ref_sys.
The geometry_columns and spatial_ref_sys tables store metadata about any spatial data that has
been imported by OGR2OGR into a destination database. If, at some point in the future, you were to
use this same data as the source of another transformation (suppose you were to export the precincts
dataset from SQL Server to KML) then OGR2OGR would refer to these tables to provide additional
information used in the conversion. In essence, spatial_ref_sys is OGR2OGR's own version of SQL
Server's sys.spatial_reference_systems table. However, such information is not essential and can
always be re-created, so you can ignore these tables for now (or even delete them if you choose,
although be aware that they will be re-created the next time you import any data).
What's more interesting to us right now is the precincts table, which contains the data imported
from the precincts shapefile. The structure of the table is as follows.
[ogr_fid] [int] IDENTITY(1,1) NOT NULL,
[ogr_geometry] [geometry] NULL,
[objectid] [numeric](10, 0) NULL,
[prcnts_id] [float] NULL,
[id] [varchar](4) NULL,
[wdpct] [varchar](4) NULL,
[pct] [varchar](2) NULL,
[shape_area] [numeric](19, 11) NULL,
[shape_len] [numeric](19, 11) NULL
The polygon shapes representing the outline of each precinct have been imported into a geometry
column of the table named ogr_geometry. The attributes from the precincts.dbf file have been used to
populate additional columns in the table, containing various identifiers together with the length and
area of each precinct shape. To ensure uniqueness of each row in the table, OGR2OGR has
automatically added an IDENTITY column, ogr_fid, which contains a unique integer reference for each
record.
To test out the data loaded into the table, we can plot the ogr_geometry column on the Management
Studio Spatial Results Tab, as shown in Figure 5-4.
112
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Figure 5-4. The precincts shapefile plotted using the Management Studio Spatial Results tab.
Note After importing a dataset, you may find that OGR2OGR creates additional tables in your database:
geometry_columns and spatial_ref_sys. These tables are used by OGR2OGR to store additional metadata.
113
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
SRID at all; it's the code that OGR2OGR uses when the destination spatial reference system is
undefined. (Perhaps a code of 99999 would have made this more obvious!)
To assign the correct SRID, we first need to examine the properties of the spatial reference system
in which the coordinates were originally supplied, which is described in the precincts.prj file that
accompanies the shapefile. The PRJ file is a simple text file, so you can load it up in any text editor to
show the contents as follows.
PROJCS[
"NAD_1983_StatePlane_Massachusetts_Mainland_FIPS_2001_Feet",
GEOGCS[
"GCS_North_American_1983",
DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]],
PROJECTION["Lambert_Conformal_Conic"],
PARAMETER["False_Easting",656166.6666666665],
PARAMETER["False_Northing",2460625.0],
PARAMETER["Central_Meridian",-71.5],
PARAMETER["Standard_Parallel_1",41.71666666666667],
PARAMETER["Standard_Parallel_2",42.68333333333333],
PARAMETER["Latitude_Of_Origin",41.0],
UNIT["Foot_US",0.3048006096012192]]
This preceding PRJ text should look familiar to you as the Well-Known Text representation of a
projected spatial reference system, which we examined back in Chapter 1. This particular system is the
state plane coordinate system for Massachussetts Mainland, which uses a Lambert Conformal Conic
projection of the North American 1983 Datum, centered on a central meridian at 71.5 degrees West.
Because this is a projected coordinate system, we can't look for it in SQL Server's
sys.spatial_reference_systems table. Instead, you can go to the EPSG registry website at
http://www.epsg-registry.org and search for spatial reference systems using the name
"Massachusetts Mainland". There will be a few matches returned, but the distinguishing properties of
the system we're looking for is that it's based on the NAD83 datum, with the unit of measure being US
Feet. This leads to only one possible spatial reference system: SRID 2249.
To correct the SRID of the records in the precincts table, you could execute a T-SQL UPDATE query
after importing the data, as follows.
UPDATE precincts
SET ogr_geometry.STSrid = 2249;
A better approach, however, would be to make OGR2OGR assign the correct SRID at the point the
data is inserted. This can be done by specifying the -a_srs parameter, together with the EPSG code of
the spatial reference system to be used. The following example recreates the precincts table using the
a_srs parameter to populate every record with SRID 2249. Note that because I want to replace the
existing precincts table, I've also included the overwrite parameter:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"precincts.shp"
-a_srs "EPSG:2249"
-overwrite
114
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Tip You can find the full list of OGR2OGR options by calling ogr2ogr /? or by looking at
http://www.gdal.org/ogr2ogr.html.
Re-executing the previous code listing to retrieve the SRID of the values in the ogr_geometry
column now shows that all the data have been imported with the correct SRID:
SELECT DISTINCT ogr_geometry.STSrid FROM precincts;
2249
Tip You can download many other interesting shapefile datasets from the U.S. Census by using the Web
download interface at http://www.census.gov/cgi-bin/geo/shapefiles2010/main.
Download this ZIP file and extract its contents. You should see the familiar structure of a shapefile
dataset: the .shp file that defines the coordinate data, the .dbf file containing the various attributes attached
to each shape, the .shx index file, and the .prj file describing the projection parameters. Let's start by
looking at the tl_2010_06_zcta510.prj file, the content of which is shown below:
GEOGCS[
"GCS_North_American_1983",
DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]
]
Notice that, because the Well-Known Text definition in the tl_2010_06_zcta510.prj file begins with
the keyword GEOGCS, we know that the coordinates of this shapefile are defined using a geographic
coordinate system, and should be imported into a column of the geography datatype. Now, we just need to
determine the appropriate SRID for the data in this column.
To find out the SRID, you could search for the parameters listed in the tl_2010_06_zcta510.prj file on
the EPSG registry website, as we did to find the SRID of the Massachussetts Mainland projection in the
115
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
preceding example. However, even though this would allow you to find the correct SRID, it wouldnt
necessarily prove that this data could be imported into SQL Server. Remember that SQL Server only
supports geographic coordinates defined using one of the spatial reference systems listed in the
sys.spatial_reference_systems table. Therefore, instead of looking up the projection on a website, we'll
search for a record in the sys.spatial_reference_systems table that matches the parameters given in the
tl_2010_06_zcta510.prj file. That way, we can look up the corresponding SRID and check that it's a
supported spatial reference system at the same time.
The .prj file states that the ZCTA coordinate data is defined using a geographic coordinate system
based on the NAD 83 datum. We can search for the correct identifier for this spatial reference system
in the sys.spatial_reference_systems table using the query:
SELECT
spatial_reference_id
FROM
sys.spatial_reference_systems
WHERE
well_known_text LIKE 'GEOGCS%"NAD83"%';
The single result returned is as follows.
spatial_reference_id
4269
So, when importing the data contained in the ZCTA shapefile, we should use the a_srs parameter
to assign SRID 4269. Because SRID 4269 is a geographic coordinate reference system, we also need to tell
OGR2OGR to import the data into a column using the geography datatype rather than the geometry datatype
as used in the last example.
OGR2OGR settings that are specific to the destination data format, such as the choice between using the
geometry/geography datatype, are known as layer creation options. You can specify one or more layer
creation options using the lco flag. In the following code listing, two layer creation options are given.
GEOM_TYPE="geography" specifies that the shape data should be inserted into a
column of the geography datatype (the alternative, and default value, is
GEOM_TYPE="geometry").
GEOM_NAME="geog4269" sets the name of the geography column to geog4269 (I've
chosen this column name following my convention of concatenating the datatype
and SRID of the data it contains).
As in the preceding example, I use the a_srs parameter to ensure that the created geometries are
assigned the correct SRID of EPSG:4269. I also use the overwrite option to specify that the requested
destination table should be replaced should it already exist.
I also include two other options:
-nln "CaliforniaZCTA" specifies the name of the destination table in SQL
Server. Without this switch, the table would have been named
tl_2010_06_zcta510 to match the input shapefile, but that's rather difficult to
remember!
-progress is a switch to tell OGR2OGR to display a simple progress bar in the
console window as the data is uploaded. This is particularly useful when
loading larger datasets to give an indication of how far along the import
procedure has progressed (and how much farther it has to go).
116
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Heres the full code listing to load the Californian ZCTA shapefile:
ogr2ogr
-f "MSSQLSpatial"
"MSSQL:server=localhost;database=ProSpatial;trusted_connection=yes"
"tl_2010_06_zcta510.shp"
-a_srs "EPSG:4269"
-overwrite
-lco "GEOM_TYPE=geography"
-lco "GEOM_NAME=geog4269"
-nln "CaliforniaZCTA"
-progress
As the records are imported from the shapefile into SQL Server, OGR2OGR will update a
percentage progress bar as illustrated in Figure 5-5.
Once the progress bar has reached 100%, the import is complete and you can select the data from the
CaliforniaZCTA table, the (partial) content of which is displayed in the SSMS Spatial Results tab as shown in
Figure 5-6.
117
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Note At the time of writing, the direct URL link for the river basins dataset is
https://www.ga.gov.au/products/servlet/controller?event=FILE_SELECTION&catno=42343.
118
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
On the File Selection page, click to select the MapInfo Interchange Format (MIF) checkbox in the
Available Files column as shown in Figure 5-7, and then click Continue to File Download. Finally, click
on the dataset title hyperlink to download the file.
Figure 5-7. Downloading a MapInfo dataset from the Geoscience Australia website.
The file download consists of a single ZIP archive containing several files. Extract these files to
find three datasets containing Point (rbasin_point), LineString (rbasin_chain), and Polygon
(rbasin_polygon) features in MapInfo Interchange Format, each having an associated .mid and .mif
file. There is also a copyright notice and a user guide to accompany the dataset.
119
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Fortunately, a quick glance through the user guide that accompanies the download reveals a
passage in Section 2.3, Coordinate system, as follows.
Perfect. Now, as we did with the U.S. Census data, we just need to take a look in the
sys.spatial_reference_systems table to find the ID for a geographic coordinate system based on the
AGD66 datum:
120
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Figure 5-8. Australian River Basins imported from MapInfo into SQL Server.
121
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
The resulting Polygons, now contained in the geography column geog4326, are illustrated in Figure 5-9.
Figure 5-9. Massachussetts precinct polygons reprojected to SRID 4326 in the geography datatype.
Reprojection and transformation between different coordinate systems are covered in detail in
Chapter 8.
122
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
platform (http://earth.google.com). Although the KML format has undergone several revisions since
then (at the time of writing, the latest version is KML 2.2), it continues to be the native format for
storing spatial information used in Google Earth.
In 2008, KML was adopted by the Open Geospatial Consortium as a standard format for spatial
information, and you can now find the latest implementation of the KML specification at the OGC
website, at http://www.opengeospatial.org/standards/kml/.
KML has always been used within the Google Earth community to share user-created spatial data,
however, the popularity and accessibility of the Google Earth platform among the wider Internet com-
munity means that KML is becoming increasingly used for educational and research purposes, as well as
in critical applications such as emergency and disaster services. Coupled with its adoption as a standard
by the OGC, KML is becoming an increasingly important format for the interchange of spatial data.
123
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-71.008036474492485,42.387093194669127
-71.010502257322116,42.389034720949425 -71.011425746332051,42.38976473048055
-71.011931948239607,42.390174099511839 -71.01262223043183,42.390907499155361
-71.013103274784129,42.391509391490196
-71.008036474492485,42.387093194669127
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark>
</Placemark>
</Folder></Document></kml>
When viewed in Google Earth, this KML file is displayed as shown in Figure 5-10.
Figure 5-10. Viewing precincts of Boston exported from SQL Server to a KML file in Google Earth.
124
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Note SQL Server Integration Services is not available as part of SQL Server Express. To follow this example you
must be using SQL Server developer, standard or enterprise edition.
125
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
exactly the same settings as before. Start by giving a name to this connection.
I use "Earthquakes Text File".
4. Click the Browse button, and navigate to the eqs7day-M1.txt text file.
Highlight the file and click open.
5. Change the Text qualifier to be a double quote character (").
6. Click to enable the checkbox next to Column names in the first data row.
7. You now need to configure the datatype of each column in the textfile. Select
the Advanced option from the left-hand pane of the Flat File Connection
Manager Editor. As before, click each column in turn and, from the properties
pane on the right-hand side, amend the values of the Datatype and
OutputColumnWidth fields to match the values shown in Table 5-2.
8. Once you have made the appropriate changes as shown in Figure 5-11, click
OK to close the Flat File Connection Manager Editor.
Figure 5-11. SSIS Flat File Connection Manager to the eqs7day-M1.txt file.
The new connection will appear in the Connection Managers pane at the bottom of the screen.
126
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
127
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
We've now set up the relevant connections for our project, so it's time to add the task that will
transform the data from our text file into SQL Server.
128
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Note Why do we specify the image [DT_IMAGE] datatype for a column that will contain geography data? SSIS
datatypes are not specific to SQL Server, because SSIS can connect to many different data sources. The image
[DT_IMAGE] datatype, named because it can be used to store image data in databases, can be used to represent
any binary value up to a maximum size of 2^31-1 (2,147,483,647) bytes, including geography or geometry data.
129
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
To add the code that will be executed by the script task, click on the script tab on the left-hand side of
the Script Editor Transformation dialog box. You can create SSIS custom scripts using either Microsoft
Visual C# 2010 or Microsoft Visual Basic 2010. C# is the default, and that's what I use in this example. (If
you prefer to write your own script in Visual Basic, you can do so by changing the ScriptLanguage
property in the right-hand pane.)
Click the Edit Script button. A new Visual Studio editor window will appear containing the
skeleton template for a C# script component. We edit this script to create a geography Point item to
populate the Location column as part of the dataflow.
Before we can create our new script, we first need to add the necessary references to the .NET
libraries that contain the methods we want to use.
1. From the main menu, select Project Add Reference
2. Scroll down the list of components under the .NET tab, and highlight the
Microsoft.SqlServer.Types component (or Browse to the Program Files
(x86)\Microsoft SQL Server\110\SDK\Assemblies directory and highlight it
from there).
3. Click OK. Microsoft.SQLServer.Types will appear in the References list in the
Solution Explorer pane.
Having included a reference to Microsoft.SQLServer.Types in our project, we also need to include
the corresponding namespace in our code so that we can easily reference the methods included in the
assembly. To do so, add this line to the list of references at the top of main.cs, on the line immediately
following using Microsoft.SqlServer.Dts.Runtime.Wrapper:
using Microsoft.SqlServer.Types;
We also use the BinaryWriter() and MemoryStream() methods from the System.IO namespace, so
include a reference to that namespace by adding the following line,
using System.IO;
The main functionality of our script is contained within the ProcessInputRow() method. This
method acts upon the data in each row of the source file as they pass through the script component.
There is already an empty template for ProcessInputRow() method in the script, as follows.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
/*
Add your code here
*/
}
Edit this to be as follows.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Instantiate a new geography object
SqlGeography point = new SqlGeography();
// Use the Point() method of the geography datatype to create a point from the lat and lon
// values of this row
point = SqlGeography.Point(Row.Lat, Row.Lon, 4326);
130
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
SqlGeography point = new SqlGeography();
point = SqlGeography.Point(Row.Lat, Row.Lon, 4326);
MemoryStream ms = new MemoryStream();
BinaryWriter bw = new BinaryWriter(ms);
point.Write(bw);
Row.Location.AddBlobData(ms.ToArray());
}
}
Save the script (Ctrl + Shift + S) and then select File Exit to close the script window and return to
the Script Transformation Editor dialog window. Now Click OK to exit the Script Transformation Editor
and return to the Data Flow tab of the main SSIS project.
131
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
132
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Figure 5-14. Columns from the source text file and the Location column created by the script component
mapped to columns in the SQL Server destination.
The SSIS project is now finished and ready to run. The complete dataflow task is illustrated in
Figure 5-15.
133
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
134
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
This end result is identical to that obtained from importing the text file using the SQL Server
Import/Export Wizard and then manually adding and populating the geography column with an UPDATE
query. The difference is that, in this case, the derived column was created by the script component in
an SSIS package, which can be easily repeated as part of an automated load task.
Safe FME
If you want to import an ESRI shapefile, MapInfo, or other spatial dataset into SQL Server in a
production environment, another option is to use a commercial third-party component. The most
popular tool for this purpose is the Feature Manipulation Engine (FME) from Safe Software
(http://www.safe.com).
FME can convert data between a huge range of spatial formats (250+), and it can also transform
and manipulate that data in various ways as part of a workflow. In addition to providing a stand-alone
desktop application and server component, FME also provides a set of extensions that integrate with
SQL Server Integration Services, providing additional source readers and destination writers for
working with different formats of spatial data as part of an ETL process.
Figure 5-16 illustrates a simple FME workflow that takes geometry and attribute data from a
shapefile and loads it directly into SQL Server.
135
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
Figure 5-16. Loading data from a shapefile to SQL Server using Safe FME.
It is not my intention to describe Safe FME in any further detail here, but I do recommend that you
check it out if you have need to load a large amount of spatial data from many different formats into
SQL Server.
Summary
In this chapter, you learned about a variety of data formats in which existing spatial data may be
provided, and how you can import those data into SQL Server 2012. Specifically, this chapter covered
the following.
There are many alternative file formats in which spatial information is
commonly stored and shared, including tabular geographic information, the
ESRI shapefile format, MapInfo files, KML, and many more.
There are a number of sources from which you can obtain freely available
spatial data over the Internet. The data obtained from these sources range in quality
and in coverage. If you are downloading spatial data for use in a critical
application, be sure to check the accuracy of that data first!
136
www.it-ebooks.info
CHAPTER 5 IMPORTING SPATIAL DATA
137
www.it-ebooks.info
CHAPTER 6
Geocoding
Even though the dedicated spatial datatypes, geometry and geography, are a relatively recent addition to
SQL Server, almost every existing SQL Server database already contains some form of spatial
information, that is, data that describes the location of some feature or other. This spatial information
might not be of the sort we have considered so far in this book, being described using coordinates from
a spatial reference system, but might instead be the addresses of customers or suppliers, postal codes,
delivery routes, or the names of cities or regions for which a sales manager is responsible. Wouldnt it
be useful if you could conduct spatial analysis based on this sort of common, unstructured spatial
information? That is exactly what geocoding enables you to do.
The process of geocoding involves taking a text-based description of a location or place, such as a
street address, the name of a landmark, or a postal code, and deriving a structured spatial representation
of that feature. In practice, the representation returned by most geocoding methods is a single pair of
coordinates representing the approximate center of the supplied address, but they may also return a
bounding box representing the extent of a feature, or even a Polygon representing its precise shape.
To demonstrate the process of geocoding, let's consider an example. The address of the White
House, the official residence of the president of the United States, is 1600 Pennsylvania Avenue NW,
Washington DC, 20500. If you were to geocode this address, you might obtain the coordinates 38.8980
degrees latitude, 77.0365 degrees longitude, corresponding to a Point located in the WGS84 spatial
reference system. The geocoding process is illustrated in Figure 6-1.
There are a number of different ways to provide geocoding functionality; some geocoding tools are
desktop-based applications, whereas others are services that you access over the Web. In this chapter, I'll
show you how to create a .NET geocoding function that calls into the Bing Maps REST geocoding service,
and how to integrate this function into SQL Server. I'll then discuss some considerations for using
geocoding in a batch or asynchronous environment, and introduce the concept of reverse-geocoding.
139
www.it-ebooks.info
CHAPTER 6 GEOCODING
geocoding service. You can read more details about the Bing Maps REST Locations API at
http://msdn.microsoft.com/en-us/library/ff701715.aspx.
Because the Locations API is free for most applications, and simple to use, I use this service to
provide geocoding functionality in this chapter. However, you could apply a similar approach to many
other web-based geocoding services.
140
www.it-ebooks.info
CHAPTER 6 GEOCODING
Having entered the required registration details, select the menu option to "Create or view keys".
You'll be prompted to enter a name for your application, as well as the URL at which the application
will be hosted. You'll also need to select the type of application for which the key will be used. For the
purposes of this chapter, you can request a "Developer" application key, with an application URL
pointing to a localhost address, such as http://127.0.0.1, as shown in Figure 6-3.
Keys are used to track usage of the Bing Maps service, and are also used for recording certain
billable transactions, as described at http://msdn.microsoft.com/en-us/library/ff859477.aspx.
However, you are only billed for production applications for which you have taken out an enterprise
license agreement. You won't incur any costs for using Bing Maps in a development environment, or
for any production applications that are not covered by an enterprise license. For the rest of this
chapter, I'll assume that you've signed up for a developer key, which will be a 64-character
alphanumeric string such as this:
At7aATG4p6LjyQha9TFGduTh15_i5N0t4R341k3y!Uvs3VIE2QhsOSRx_tFoKURkD5vOeRs
In any of the code samples in this chapter where you see the text ENTERYOURBINGMAPSKEY, be sure to
substitute your key in the appropriate place to be able to access the Bing Maps service.
141
www.it-ebooks.info
CHAPTER 6 GEOCODING
class, for example). This request is based on a template that contains a number of parameters for the
elements of the address to be geocoded, as in the following example (be sure to enter your Bing Maps
key where indicated):
http://dev.virtualearth.net/REST/v1/Locations?countryRegion=UK&adminDistrict=Norfolk
&locality=Norwich&postalCode=NR2 4TE&addressLine=27 Heigham Street&o=xml&key=
ENTERYOURBINGMAPSKEY
In this example, the service is called to geocode an address at 27 Heigham Street, Norwich,
Norfolk, NR2 4TE, UK. The parameter o=xml is provided to specify that the result should be returned in
XML format (the default is to return results as JSON). The results returned by the service when
geocoding the preceding address with a valid Bing Maps key are as follows:
<Response>
<Copyright>Copyright 2011 Microsoft and its suppliers. All rights reserved. This API
cannot be accessed and the content and any results may not be used, reproduced or
transmitted in any manner without express written permission from Microsoft
Corporation.</Copyright>
<BrandLogoUri>http://dev.virtualearth.net/Branding/logo_powered_by.png</BrandLogoUri>
<StatusCode>200</StatusCode>
<StatusDescription>OK</StatusDescription>
<AuthenticationResultCode>ValidCredentials</AuthenticationResultCode>
<TraceId>ece6495ca47a416f8dc4dad13b64f6a2</TraceId>
<ResourceSets>
<ResourceSet>
<EstimatedTotal>1</EstimatedTotal>
<Resources>
<Location>
<Name>NR2 4TE, Norwich, Norfolk, United Kingdom</Name>
<Point><Latitude>52.634046</Latitude><Longitude>1.286097</Longitude></Point>
<BoundingBox>
<SouthLatitude>52.630183282429321</SouthLatitude>
<WestLongitude>1.2776115753233459</WestLongitude>
<NorthLatitude>52.637908717570674</NorthLatitude>
<EastLongitude>1.2945824246766542</EastLongitude>
</BoundingBox>
<EntityType>Postcode1</EntityType>
<Address>
<AdminDistrict>England</AdminDistrict>
<AdminDistrict2>Norfolk</AdminDistrict2>
<CountryRegion>United Kingdom</CountryRegion>
<FormattedAddress>NR2 4TE, Norwich, Norfolk, United Kingdom</FormattedAddress>
<Locality>Norwich</Locality>
<PostalCode>NR2 4TE</PostalCode>
</Address>
<Confidence>High</Confidence>
</Location>
</Resources>
</ResourceSet>
</ResourceSets>
</Response>
Most geocoding algorithms are approximate. This is because they rely on parsing a number of
free-text user-supplied fields, and are therefore unlikely to find an exact match for the address
142
www.it-ebooks.info
CHAPTER 6 GEOCODING
entered. Instead, they tend to separate out components of an address and find the best match (or
matches) for the elements given.
In the example above, the value of the <EntityType> element, Postcode1, signifies that, in this case,
the result is based on a match of the supplied postcode value only rather than the full street address.
This might occur in situations when the full street address could not be located (because it was
mistyped, for example). The result contains both an approximate center point for the geocode postcode
match, provided in the <Point> element of the <Location>, as well as a <BoundingBox> representing the
extent of this area.
Note The following steps describe the creation of a .NET assembly using the freely available Visual C# 2010
Express Edition. You may use a different edition of Visual Studio, but be aware that you may find that some of the
menu items appear under different headings than described here.
143
www.it-ebooks.info
CHAPTER 6 GEOCODING
Figure 6-4. Creating a new project using Visual C# 2010 Express Edition
Once the project has been created, the project workspace will appear, and the main window will
show the contents of the default class file within the project.
144
www.it-ebooks.info
CHAPTER 6 GEOCODING
Figure 6-5. Setting the compile options for the Geocoder project
145
www.it-ebooks.info
CHAPTER 6 GEOCODING
namespace ProSpatial.Ch6
{
public partial class UserDefinedFunctions
{
146
www.it-ebooks.info
CHAPTER 6 GEOCODING
// Clean up
webresponse.Close();
stream.Dispose();
streamReader.Dispose();
}
catch(Exception ex)
{
// Exception handling code here;
}
147
www.it-ebooks.info
CHAPTER 6 GEOCODING
return xmlResponse;
}
}
}
This code is pretty self-explanatory; It defines a single method, Geocode(), which accepts five
string parameters, representing the different elements of the address to be geocoded. The Geocode()
method constructs a URL template for the Locations API service including placeholders for each of the
address elements, and inserts the supplied values in the appropriate places. An HttpWebRequest is used
to call the service, and the response is read into an XmlDocument, which is then returned by the method.
148
www.it-ebooks.info
CHAPTER 6 GEOCODING
(string)postalCode,
(string)addressLine
);
}
// Failed to geocode the address
catch (Exception ex)
{
SqlContext.Pipe.Send(ex.Message.ToString());
}
// Check that we received a valid response from the Bing Maps geocoding server
if (geocodeResponse.GetElementsByTagName("StatusCode")[0].InnerText != "200")
{
throw new Exception("Didn't get correct response from geocoding server");
}
------ Build started: Project: ProSpatialCh6, Configuration: Release Any CPU ------
ProSpatialCh6 -> C:\Spatial\ProSpatialCh6.dll
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
Thats all that is required from Visual Studio, so you can go back to SQL Server now.
149
www.it-ebooks.info
CHAPTER 6 GEOCODING
To complete the change, we need to reconfigure the server to reflect the changed value, by issuing
a T-SQL query with the RECONFIGURE statement as follows.
RECONFIGURE;
GO
The SQL Server configuration settings will now be updated to allow you to run user-defined CLR
code, and you should receive the following message.
150
www.it-ebooks.info
CHAPTER 6 GEOCODING
The database is now configured and ready to import the geocoding assembly.
Note Although setting a database to be trustworthy might be the simplest way to grant access to external
resources, it is not necessarily representative of security best practice in a production environment. Understanding
the different security levels and permission sets for .NET assemblies within SQL Server can be complicated. If
you'd like to learn more on this subject, try reading Chapter 7 of "Expert SQL Server 2008 Development" (by the
author, Apress, 2009).
151
www.it-ebooks.info
CHAPTER 6 GEOCODING
Figure 6-7. The geocoding assemby listed in SQL Server Management Studio Object Explorer
152
www.it-ebooks.info
CHAPTER 6 GEOCODING
@postalCode nvarchar(max),
@countryRegion nvarchar(max)
) RETURNS geography
AS EXTERNAL NAME
ProSpatialCh6.[ProSpatial.Ch6.UserDefinedFunctions].GeocodeUDF;
This code creates a T-SQL function called Geocode that provides an interface to the GeocodeUDF
method contained within the ProSpatialCh6 assembly. It specifies that the parameters that must be
provided when using the Geocode function are an address, the name of a city, the subdivision (i.e.,
county/state), the postal code (or ZIP code), and the country or region. These parameters correspond
exactly to the parameters passed by the .NET method to the Locations API geocoding service. The
return value of the function is a geography instance, containing a Point geometry associated with that
address.
Note When creating a function from a .NET assembly, the syntax for the AS EXTERNAL NAME clause is
AssemblyName.[Namespace.ClassName].FunctionName.
Thats it! Congratulations, youve just used .NET to add a new function to SQL Server, extending
the existing spatial functionality by allowing you to geocode address data.
To check the accuracy of the function, we can plot this result against a road map of New York city
using Bing Maps, as shown in Figure 6-8.
153
www.it-ebooks.info
CHAPTER 6 GEOCODING
Figure 6-8. Plotting the geocoded address location of the Apress offices on Bing Maps
154
www.it-ebooks.info
CHAPTER 6 GEOCODING
from the beginning of this chapter, referring to the address of the White House.
Suppose that we had omitted the fact that we were looking for 1600
Pennsylvania Avenue, Washington D.C. Knowing only the street number and
name, and that this address was in the United States, we could have been
referring to Pennsylvania Avenue in Baltimore, or Atlantic City, or one of two
Pennsylvania Avenues in the state of West Virginia, or one of several others. In
these cases, the Bing Maps Locations API could return multiple rows of results,
representing each of the possible matching geocoded locations. Our current
function automatically returns the first match, but it is not necessarily the
correct one.
Because the result of a geocoding operation has the potential to return multiple rows and columns
of data, it sounds like a suitable situation in which to use a table-valued function (TVF) that returns a
table of several possible matched locations, allowing the user to choose which one they meant. Such a
TVF is demonstrated in the following code listing:
[Microsoft.SqlServer.Server.SqlFunction(
Name = "GeocodeTVF",
FillRowMethodName = "GeocodeTVFFillRow",
DataAccess = DataAccessKind.Read,
TableDefinition = @"Name nvarchar(255),
Point geography,
BoundingBox geography")]
public static System.Collections.IEnumerable GeocodeTVF(
SqlString addressLine,
SqlString locality,
SqlString adminDistrict,
SqlString postalCode,
SqlString countryRegion
)
{
try
{
geocodeResponse = Geocode(
(string)countryRegion,
(string)adminDistrict,
(string)locality,
(string)postalCode,
(string)addressLine
);
}
// Failed to geocode the address
catch (Exception ex)
{
SqlContext.Pipe.Send(ex.Message.ToString());
}
155
www.it-ebooks.info
CHAPTER 6 GEOCODING
156
www.it-ebooks.info
CHAPTER 6 GEOCODING
return items;
}
157
www.it-ebooks.info
CHAPTER 6 GEOCODING
158
www.it-ebooks.info
CHAPTER 6 GEOCODING
Dataflow API that is part of the Bing Spatial Data Services, described in more detail at
http://msdn.microsoft.com/en-us/library/ff701734.aspx.
Reverse Geocoding
So far, we've only looked at one-way geocoding, starting with an address and deriving the
corresponding latitude/longitude coordinates, which enables us to create a geography Point
representing that location. But what about performing the reverse operation: starting with a set of
coordinates and returning a description of the closest matching point of interest?
There are several potential applications of such a function. Suppose that you had collected
coordinates from a GPS system, or triangulated the location of a mobile phone call, and wanted to
obtain a description of that location: in what town is this person, for example? Or to what address is this
delivery vehicle closest?
The process of reverse-geocoding is illustrated in Figure 6-9.
10 Downing Street
(51.5035, -0.1278) London
SW1A 2AA
Figure 6-9. Reverse geocoding
Fortunately, the Bing Maps REST services also expose reverse-geocoding functionality. To
perform a reverse-geocode, rather than providing the address to be looked up, you simply provide the
(WGS84) latitude and longitude coordinates instead. The following code listing demonstrates the URL
syntax required to reverse-geocode a Point located at a latitude of 47.64054 degrees and a longitude of
122.12934 west:
http://dev.virtualearth.net/REST/v1/Locations/47.64054,-122.12934?o=xml&key=YOURBINGMAPSKEY
This URL returns the following XML response
<Response xmlns:xsi="http://www.w3.org/2001/XMLSchema instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://schemas.microsoft.com/search/local/ws/rest/v1">
<Copyright>Copyright 2010 Microsoft and its suppliers. All rights reserved. This API
cannot be accessed and the content and any results may not be used, reproduced or transmitted
in any manner without express written permission from Microsoft Corporation.</Copyright>
<BrandLogoUri>http://dev.virtualearth.net/Branding/logo_powered_by.png</BrandLogoUri>
<StatusCode>200</StatusCode>
<StatusDescription>OK</StatusDescription>
<AuthenticationResultCode>ValidCredentials</AuthenticationResultCode>
<TraceId>fbfb8df89423415589eec14c8de7585e</TraceId>
<ResourceSets>
<ResourceSet>
<EstimatedTotal>2</EstimatedTotal>
<Resources>
<Location>
<Name>1 Microsoft Way, Redmond, Washington 98052, United States</Name>
<Point>
<Latitude>47.640568390488625</Latitude>
<Longitude>-122.1293731033802</Longitude>
159
www.it-ebooks.info
CHAPTER 6 GEOCODING
</Point>
<BoundingBox>
<SouthLatitude>47.636705672917948</SouthLatitude>
<WestLongitude>-122.137016420622</WestLongitude>
<NorthLatitude>47.6444311080593</NorthLatitude>
<EastLongitude>-122.1217297861384</EastLongitude>
</BoundingBox>
<EntityType>Address</EntityType>
<Address>
<AddressLine>1 Microsoft Way</AddressLine>
<AdminDistrict>Washington</AdminDistrict>
<AdminDistrict2>King</AdminDistrict2>
<CountryRegion>United States</CountryRegion>
<FormattedAddress>1 Microsoft Way, Redmond, Washington 98052, United
States</FormattedAddress>
<Locality>Redmond</Locality>
<PostalCode>98052</PostalCode>
</Address>
<Confidence>Medium</Confidence>
</Location>
<Location>
<Name>1 Microsoft Way, Redmond, WA 98052 6399</Name>
<Point>
<Latitude>47.639747</Latitude>
<Longitude>-122.129731</Longitude>
</Point>
<BoundingBox>
<SouthLatitude>47.635884282429323</SouthLatitude>
<WestLongitude>-122.13737419709076</WestLongitude>
<NorthLatitude>47.643609717570676</NorthLatitude>
<EastLongitude>-122.12208780290925</EastLongitude>
</BoundingBox>
<EntityType>Address</EntityType>
<Address>
<AddressLine>1 Microsoft Way</AddressLine>
<AdminDistrict>WA</AdminDistrict>
<AdminDistrict2>King County</AdminDistrict2>
<CountryRegion>United States</CountryRegion>
<FormattedAddress>1 Microsoft Way, Redmond, WA 98052 6399</FormattedAddress>
<Locality>Redmond</Locality>
<PostalCode>98052 6399</PostalCode>
</Address>
<Confidence>Medium</Confidence>
</Location>
</Resources>
</ResourceSet>
</ResourceSets>
</Response>
Because the response returned from the reverse-geocoding service follows exactly the same
structure as for the geocoding service, this makes it easy to adapt the UDF and TVF functions
introduced earlier this chapter to cater for reverse-geocoding if required as well.
160
www.it-ebooks.info
CHAPTER 6 GEOCODING
Summary
In this chapter, you learned how to extend the functionality of SQL Server to geocode address data.
Specifically, you learned the following:
Geocoding can be used to derive a structured spatial representation of a feature
on the Earth from descriptive information about that feature.
The Bing Maps Web Services provide a method that can be used to geocode data,
accessible via a REST interface over the Web.
You can create a reusable .NET method to access the REST service, and then
create one or more wrapper classes to expose geocoding functionality in SQL
Server as a UDF or TVF. These methods can be called directly from T-SQL code
to return geography data from a supplied address value.
Reverse-geocoding is the opposite process to geocoding: taking a
latitude/longitude coordinate pair, and deriving the corresponding street
address or location. This functionality is also provided by the Bing Maps REST
service, and can be incorporated into SQL Server using similar methods.
As a final note, bear in mind that, although I've demonstrated that it is possible to geocode address
data directly from the database layer, it does not always make sense to do so. I've highlighted the
benefits of creating modular reusable code, so that if you decide to perform geocoding in an
application layer instead, you can do so using the same code base as described in this chapter.
161
www.it-ebooks.info
CHAPTER 7
Most software comes supplied with online technical documentation containing examples that describe
and demonstrate all of that software's features, and SQL Server is no different. Microsoft Books
Online includes a reference guide to all of the geography and geometry methods available in SQL
Server 2012, including sample code listings illustrating their usage. You can view the full reference
guide at http://msdn.microsoft.com/en-us/library/bb933790%28v=SQL.110%29.aspx.
It's all very well following the code samples provided by the software manufacturer, and Books
Online is an excellent reference if all you want to know is the syntax required to use each method.
However, many of the examples are quite artificial, demonstrating each function in isolation, using
perfect, compact, fully formed datasets. In the real world, we must accept that the spatial data we are
given to work with is frequently less perfect than in these idealistic examples. You'll often discover
that the textbook samples start to fall down when they are applied to the gnarly ugly data that we, as
developers, all face on a day-to-day basis.
In this chapter, well examine some of the issues that can affect the quality, accuracy, and
reliability of spatial information. I'll also show you techniques to minimize or eliminate the negative
effect of such issues, ultimately leading to more robust, useful spatial applications.
Precision
If you were to ask your bank manager how much money you had in your account and she replied
"somewhere between $800 and $810," you'd probably be a bit concerned. We have come to expect that
all data held in databases should have an exact known value. If that value can't be determined then we
assume that there must be an error in the process used to gather the information or a bug in the system
used to retrieve it. However, one important fact to remember is that all spatial data is only ever an
approximation. The accuracy of that approximation depends on a number of factors:
Firstly, there are approximations and assumptions in the underlying
theoretical approach used to define the data; as explained in Chapter 1, spatial
data is defined relative to a geodetic model of the earth. That model provides
only a rough fit around the true shape of the earth, and doesn't match it
exactly. Projected coordinate systems can introduce further inaccuracies, as
geodetic features must be distorted in order to be represented on a flat plane.
Remember also that vector spatial data stores only simple geometric shapes,
which fail to reflect the true, complex and organic shape of many features on
the Earth.
Secondly, precision errors may be introduced as a result of practical
limitations of the measuring systems and instruments used for collecting and
163
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Storage Precision
SQL Server stores geography and geometry coordinates as binary values, adhering to the IEEE-754
standard for binary floating-point arithmetic. Based on this standard, each coordinate is represented
as a double precision floating point number that is 64 bits (8 bytes) long. By storing coordinates as
floating point values, SQL Server ensures that a large range of possible coordinate values can be
accommodated, while requiring only a limited amount of storage.
Although SQL Server stores coordinates as binary floating point values, WKT, the format in which
coordinate data is most commonly supplied, is a text-based format in which coordinates are stated in
decimal format. Not all decimal numbers can be represented exactly in floating-point binary format.
Whenever you use a static method to create an instance of geography or geometry data from WKT (or
any other static method that accepts decimal input), the supplied coordinates are therefore implicitly
converted to the closest equivalent binary floating point value. Essentially, each WKT coordinate
value is CAST from nvarchar to binary(8). As with conversion of any other type of data between
datatypes, this presents the possibility of truncation or rounding of the coordinate values.
The range of possible coordinate values that can be stored in an 8-byte binary value is roughly
equivalent to 15 digits of decimal precision. So, for example, the x-coordinate of a geometry Point might
be stored as 0x3FF3C0CA428C59F8, which corresponds to a decimal value of 1.234567890123456. Some of
the geography and geometry static methods allow you to create instances from coordinate values with
greater precision than this; the Point() method, for example, will accept decimal coordinate values
with up to 38 digits of precision. However, those coordinates will ultimately be stored with the same
fixed 64-bit binary precision, and supplying coordinates with greater precision will not lead to any
greater precision of the stored geography or geometry value.
To demonstrate, consider the following code listing, which creates two geometry Point instances
using the Point() static method. The y-coordinate of both instances is the same. The value of the x-
coordinate differs, but only following the sixteenth decimal place:
DECLARE @Precise geometry;
SET @Precise = geometry::Point(10.23456789012345, 0, 0);
SELECT @Precise.STEquals(@SuperPrecise);
The additional decimal places of precision supplied for the @SuperPrecise Point cannot be
represented in an 8-byte binary value. As a result, the x-coordinate values that SQL Server stores for
both the @Precise and @SuperPrecise instances are exactly the same (0x3C8B514819782440). The
STEquals() method, which compares whether two instances are equal, returns the value 1, which
confirms that both Points are the same.
164
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
The preceding example demonstrated that supplying excess precision has no effect on the
coordinate values of a geometry; they will always be stored in the database as 8-byte binary values.
Likewise, it is worth remembering that coordinates supplied with less decimal precision will still
occupy a fixed 8-bytes when converted to binary and saved to the database. Thus, in the following
example, even though @HighPrecision is instantiated from coordinates with greater precision than
@LowPrecision, the two Points occupy exactly the same amount of space.
DECLARE @LowPrecision geometry;
SET @LowPrecision = geometry::STPointFromText('POINT(1 2)', 0);
SELECT
DATALENGTH(@LowPrecision),
DATALENGTH(@HighPrecision);
22 22
The result demonstrates that you do not create simpler geometries, or require less space to repre-
sent geometries created from less precise coordinates. Both Points in this example (and all Points created
from a single coordinate pair) occupy exactly 22 bytes, as shown by the result of the DATALENGTH function.
geometry Precision
Firstly, let's consider the precision of coordinates defined in a planar coordinate system, stored using
the geometry datatype. Suppose you were given the coordinates of a location measured in the
EPSG:26913 spatial reference system. This is the spatial reference identifier corresponding to UTM
Zone 13N, a projected spatial reference system based on the NAD83 datum. The unit of measurement
for coordinates in this system is the meter.
Because geometry coordinates represent the distance of a point from an origin in a linear unit of
measure, it is easy to see the correlation between the precision of the supplied coordinates and the
accuracy of the stored position. Assuming that coordinates were supplied using the full precision
capable of being represented in an 8-byte binary value, equivalent to 15 decimal places of accuracy,
you can be sure that the stored location of any geometry data in SQL Server based on this SRID will be
accurate to within 0.000000000000001 meters of the supplied value, in both the x- and y-axes.
geography Precision
What about the effects of fixed precision in coordinates of the geography datatype? In such cases it's
less obvious what the magnitude of any error would be because, whereas geographic coordinates are
expressed in angular degrees, we tend to think of the "accuracy" of a location in terms of meters away
from its true position. So just how far, in terms of distance measured across the earths surface, does 15
165
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
decimal places of one degree of latitude or longitude correspond? To answer this question, we have to
consider the error in the longitude and latitude coordinates separately.
The distance on the Earths surface represented by one degree of longitude or one degree of
latitude varies depending on the corresponding latitude. This is illustrated in Figure 7-1.
Figure 7-1. The distance covered by one degree of longitude varies with latitude
At the equator, where the distance represented by one degree of longitude is greatest, one degree
of longitude corresponds to about 111.3 km. We can verify this using SQL Server's STDistance()
method to measure the distance between two points on the equator separated by one degree of
longitude, as follows:
DECLARE @EquatorA geography = geography::Point(0,0,4326);
DECLARE @EquatorB geography = geography::Point(0,1,4326);
SELECT @EquatorA.STDistance(@EquatorB);
The result is shown following. As the coordinate values in this case were stated using SRID 4326,
this result is calculated relative to the WGS84 datum and expressed in meters:
111319.490735885
As you approach the poles, the meridian lines representing points of equally spaced longitude
converge, and the distance represented by one degree of longitude decreases. At the Tropic of Cancer
(the circle of latitude at which the sun appears directly overhead during the June Solstice), which lies at
a latitude of approximately 23.5 degrees, the distance covered by one degree of longitude is reduced to
about 102.1 km, which can be verified using the following code listing:
DECLARE @TropicOfCancerA geography = geography::Point(23.5,0,4326);
DECLARE @TropicOfCancerB geography = geography::Point(23.5,1,4326);
SELECT @TropicOfCancerA.STDistance(@TropicOfCancerB);
166
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
102140.828881171
At the Arctic circle, which lies at a latitude of approximately 66.5 degrees North, one degree of
longitude covers only 44.5 km:
DECLARE @ArcticCircleA geography = geography::Point(66.5,0,4326);
DECLARE @ArcticCircleB geography = geography::Point(66.5,1,4326);
SELECT @ArcticCircleA.STDistance(@ArcticCircleB);
44513.5512918299
At the poles, one degree of longitude covers an infinitely small distance. Figure 7-2 depicts a
graph illustrating the distance on the Earth's surface represented by one degree of longitude at any
latitude.
Figure 7-2. The distance on the Earth's surface covered by one degree of longitude at any latitude
167
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Due to the oblate shape of the earth, the distance covered by one degree of latitude also varies with
latitude, although the variance is much less than that of longitude. One degree of latitude at the equator
corresponds to about 110.6 km whereas at the poles, where the distance is greatest, one degree of
latitude corresponds to about 111.7 km.
So, assuming 15 decimal places of precision and a "worst-case" scenario in which one degree of
latitude/longitude represents 111 km along the ground, coordinates of geography instances are
capable of describing a location with a precision of 111 km/10E15, which still means that any supplied
coordinate value can be stored with submillimeter precision, wherever it lies on the globe. It's
probably safe to say that SQL Server therefore stores spatial data with sufficient accuracy for all but the
most demanding of spatial applications.
namespace ProSQLSpatial.Ch7
{
class RoundGeography : IGeographySink110
{
private readonly IGeographySink110 _target; // the target sink
private readonly int _precision; // the number of fractional digits in the return
value
168
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
_target = target;
_precision = precision;
}
// Each BeginFigure call rounds the start point to the required precision.
public void BeginFigure(double x, double y, double? z, double? m)
{
_target.BeginFigure(Math.Round(x, _precision), Math.Round(y, _precision), z, m);
}
169
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
return constructed.ConstructedGeography.MakeValid();
}
}
}
The preceding code listing makes use of an IGeographySink110 interface, which is populated with a
supplied geography instance. When the first point is added to the geography instance (in the
BeginFigure() method), and as each subsequent point is added (in the AddLine() or AddCircularArc()
methods), the coordinate values are rounded to the specified number of decimal places. Finally, the
MakeValid() method is called on the constructed geography instance to ensure that the rounded
coordinate values have not caused the geometry to degenerate, and the resulting value is returned to the
caller.
You can compile this method in an assembly and import it into SQL Server, then register a
corresponding function as follows:
CREATE FUNCTION dbo.RoundGeography (
@g geography,
@precision int
) RETURNS geography
AS EXTERNAL NAME
ProSpatialCh7.[ProSpatial.Ch7.UserDefinedFunctions].RoundGeography;
To test the effectiveness of the new RoundGeography function, we'll use it to round the coordinates
of a Point representing the location of the Eiffel Tower in Paris from 16 decimal places to 5 decimal
places, as follows:
DECLARE @EiffelTower geography = 'POINT(2.2945117950439298 48.858259942745526)';
DECLARE @RoundedEiffelTower geography = dbo.RoundGeography(@EiffelTower, 5);
SELECT
@EiffelTower.ToString() AS WKT,
DATALENGTH(@EiffelTower.ToString()) AS Length
UNION ALL
SELECT
@RoundedEiffelTower.ToString() AS WKT,
DATALENGTH(@RoundedEiffelTower.ToString()) AS Length;
The results are as follows:
WKT Length
POINT (2.2945117950439298 48.858259942745526) 90
POINT (2.29451 48.85826) 48
By rounding the coordinate values to 5 decimal places, we have nearly halved the size of the the
WKT string to be transferred, from 90 bytes to 48 bytes. What effect has this had on the accuracy of the
data? By executing the following query, we can see that we have shifted the location of the Eiffel Tower
by just 13 cm:
SELECT @EiffelTower.STDistance(@RoundedEiffelTower);
0.13187268312192
170
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Depending on the purpose of the application, this difference of 13 cm may or may not be
significant. It will be a matter of deciding on an individual case-by-case basis the appropriate tradeoff
between accuracy and datasize.
Note Reducing the number of decimal places in a coordinate value will not reduce the size of spatial data stored
in SQL Server, but may reduce the size of data transferred to and from the database if transmitted in text format.
Calculation Precision
In the last section, we looked at issues relating to the precision with which coordinates are supplied,
stored, and retrieved from SQL Server. In practice, it is relatively unlikely that you will encounter too
many precision problems here; as long as you do not have a highly specialized application that
requires greater than 64-bit coordinate accuracy, you can be fairly sure that SQL Server will be able to
store and retrieve it with full fidelity.
A separate, and perhaps more important issue to consider relates to the precision with which SQL
Server perform calculations on coordinates. Every time you use any method that creates or modifies a
geometry or geography instance, or compares the relationship between two geometries, there is the
potential for coordinates to be modified slightly, and this is very important to understand, especially if
you are creating any application that relies on testing for strict equality between two geometries.
To demonstrate the issue, lets look at example. Consider the following two lines:
DECLARE @line1 geometry = 'LINESTRING(0 13, 431 310)';
DECLARE @line2 geometry = 'LINESTRING(0 502, 651 1)';
You can view these lines in SSMS Spatial Results tab as follows:
SELECT @line1
UNION ALL SELECT @line2;
The result is shown in Figure 7-3.
Figure 7-3. Two intersecting lines viewed in the SSMS spatial results tab
171
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
The two lines clearly cross each other, and SQL Server will calculate the point at which they
intersect using the STIntersection() method:
SELECT
@line1.STIntersection(@line2).ToString();
The calculated result of the point of intersection is POINT (335.23450808497148 244.008466128158).
So far, there's nothing unusual. But what about if we run the following query instead:
SELECT
@line1.STIntersection(@line2).STIntersects(@line1), --0
@line1.STIntersection(@line2).STIntersects(@line2); --0
The STIntersects() method returns 0 in both cases. When expressed in words, the result of this
query suggests The point at which line1 intersects line2 does not intersect line1 or line2. Huh?
Lets look at another example, this time involving two overlapping Polygons:
DECLARE @square geometry = 'POLYGON((0 0, 100 0, 100 100, 0 100, 0 0))';
DECLARE @rectangle geometry = 'POLYGON((-10 5, 10 5, 10 15, -10 15, -10 5))';
SELECT
@rectangle.STIntersects(@square),
@rectangle.STIntersection(@square).STArea();
The preceding code listing creates a simple square Polygon, one hundred units high by one
hundred units wide, and a smaller rectangular Polygon that overlaps it on the left-hand side. These
shapes are illustrated in Figure 7-4.
The region of the overlap between the two geometries is a square 10 units wide by 10 units high,
exactly 100 units in area. However, the result returned by the STArea() method when used to calculate
the area created by the intersection in the preceding code listing is 99.999999999995.
The numbers involved in this example are hardly complex, so you would probably expect the result
to be exactly 100, right? Whats interesting is that, if we increase the size of the large square by
extending it further along the x- and y-axes without changing the region of overlap with the smaller
172
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
rectangle, the calculated area of intersection between the two geometries becomes less and less
precise. The following code listing demonstrates the calculated area of intersection between the
rectangle and increasing-sized square Polygons which it overlaps (the true area of overlap
maintained consistently at 100 units):
DECLARE @rectangle geometry = 'POLYGON((-10 5, 10 5, 10 15, -10 15, -10 5))';
Note Because WKT is a parsed text format, you can state coordinates using scientific notation. In the preceding
code listing, the coordinate value 1e9 is equal to 1,000,000,000.
So whats going on here? Its not the rounding issues caused by conversion from binary to
decimal; as explained earlier, that only occurs when a new geometry is instantiated from, or retrieved
as, WKT. It is, however, another kind of internal coordinate conversion. Although SQL Server stores
coordinates as floating point binary values, it performs spatial calculations using integer arithmetic.
Floating point coordinates are, by the very nature of any floating point system, approximate, and
floating point arithmetic is not robust; calculating the difference between two similar floating point
values, for example, can create results containing errors with large orders of magnitude. Integer
arithmetic, in contrast, is always exact and reliable, whatever the integer values supplied.
Therefore, in order to perform spatial calculations in a robust fashion, SQL Server first snaps all
coordinates to an integer grid. Note that coordinate values are not simply converted to the closest
integer value (which would lose any fractional precision), but they are scaled to an integer value on a
dynamically sized grid. This can be tricky to visualize, but I find a helpful analogy is to think back to
mathematics lessons at school, in the days before MS Excel, when you had to draw graphs by hand on a
piece of graph paper. The skill in drawing such graphs was to choose an appropriate scale, deciding
how many units each cell on the grid should represent. The ideal scale is one in which all the data
values can be plotted exactly on a grid cell boundary, but subject to the fact that you had to get the
whole range of data to fit on a fixed size piece of paper.
SQL Server takes a similar approach; every spatial calculation takes place on an integer grid of
fixed size, equivalent to the fixed size sheet of graph paper. The geometries involved in the calculation
are scaled to fit onto this grid, which is equivalent to adjusting the size of the bars for a bar chart drawn
173
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
on graph paper, say. Having achieved the optimum scaling possible considering the overall fixed grid
size, each coordinate is then snapped to the closest cell on the grid. Some results will lie exactly on a
grid cell boundary, in which case the results returned by SQL Server will be exact but, in other cases,
the result will be subject to some error. Finally, the result of the integer calculation is then converted
back into floating-point coordinates again, in the same scale as originally supplied.
The amount of error introduced in this process is determined by two main factors:
The overall size of the grid on which calculations are performed.
The extent of the geometries involved in a given calculation.
With regard to the first of these factors, all spatial calculations in SQL Server 2012 are performed
on integer values with 42 bits of precision. This 42-bit precision is used across all SQL Server 2012 and
SQL Azure instances, and represents a considerable increase in precision compared to the 27-bit
integer grid used by SQL Server 2008/R2.
As for the second factor, the greater the range of coordinate values that must be covered by the
integer grid, then the more coarse the grid resolution must become in order to accommodate the full
set of data. As each cell becomes larger and the distance between cells increases, there is a greater
distance that coordinates might potentially be snapped. This explains the effect demonstrated in the
preceding code listing, in which the area of intersection between the square and rectangular Polygon
became less and less precise as the overall size of the square became larger and larger, because the
integer grid had to become ever more coarse to accommodate it, leading to less accurate results.
Caution Because SQL Server 2012 performs calculations using a more precise, 42-bit, integer grid than that
used in SQL Server 2008/R2, the results of any spatial computations may differ from those obtained under a
different version of the database server. This is especially important to consider when upgrading a database from
a previous version, because executing a spatial query in SQL Server 2012 may lead to different results than those
previously obtained from running exactly the same query under SQL Server 2008/R2.
174
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Validity
Whenever you create a new geometry or geography instance from a static method, such as
STGeomFromText(), SQL Server performs a number of checks on the created geometry. Examples of some
of these checks are as follows:
A LineString must have at least two distinct points.
A Polygon must contain at least four points, and the start point and end point
must be the same.
The spatial reference system in which the coordinates of any geography instance
are defined must correspond to one of the supported spatial reference systems in
the sys.spatial_reference_systems table.
If any of these checks fail, SQL Server will throw an exception. Generally speaking, each rule has
its own exception number and corresponding exception message. Some examples of specific error
175
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
messages are demonstrated in the following code listings. Firstly, attempting to create a geometry
LineString containing only one point:
DECLARE @LineMustHave2Points geometry;
SET @LineMustHave2Points = geometry::STLineFromText('LINESTRING(3 2)', 0);
System.FormatException: 24117: The LineString input is not valid because it does not have
enough distinct points. A LineString must have at least two distinct points.
The next code listing demonstrates a Polygon that only contains three points in its exterior ring:
DECLARE @PolygonMustHave4Points geometry;
SET @PolygonMustHave4Points = geometry::STPolyFromText('POLYGON((0 0, 10 2, 0 0))', 0);
System.FormatException: 24305: The Polygon input is not valid because the ring does not have
enough distinct points. Each ring of a polygon must contain at least three distinct points.
And the following example attempts to create a Point from well-formed WKT, but using an invalid
spatial reference identifier:
DECLARE @UnsupportedSRID geography;
SET @UnsupportedSRID = geography::STPointFromText('POINT(52 0)', 123);
System.FormatException: 24204: The spatial reference identifier (SRID) is not valid. The
specified SRID must match one of the supported SRIDs displayed in the
sys.spatial_reference_systems catalog view.
Each of the preceding examples generates an exception message, and the requested geometry is
not created. Fortunately, the exception messages state pretty clearly what the problem is, so finding
and fixing these errors is relatively easy.
However, even if you were to address these errors so that the static method executes successfully, it
does not necessarily mean that the resulting geometry is valid. In other words, it is possible to create
some geometries that do not cause any exceptions to be thrown, but which do not meet the OGC
requirements for that type of geometry. For example, the following code listing creates a geometry
Polygon in which the interior rings crosses the exterior ring. This code listing can be executed and will
successfully create a geometry instance with no exception occurring:
DECLARE @SelfIntersectingPolygon geometry;
SET @SelfIntersectingPolygon = 'POLYGON((0 0, 6 0, 3 5, 0 0), (2 2, 8 2, 8 4, 2 4, 2 2))';
Another example of an invalid geometry is a LineString that retraces itself, defining certain
segments of the LineString twice, as shown in the following code listing:
DECLARE @InvalidLinestring geometry;
SET @InvalidLinestring = 'LINESTRING(0 0, 10 0, 5 0)';
Once again, this geometry can be created and stored in SQL Server, even though it is not valid.
However, although it is possible to define, store, and retrieve invalid geometries without receiving
any exception messages, you will receive an exception if you attempt to use them in any spatial
queries. For example, if you were to try to determine the area of the preceding Polygon using the
STArea() method:
176
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
@SelfIntersectingPolygon.STArea();
you would get the following result:
As the message contained within the exception suggests, in order to perform any operations on an
invalid geometry, you must first make it valid.
In this case, although SQL Server lets us create the Polygon geometry without exception, it is not
valid according to the OGC specifications because the exterior ring consists of a single spike. Because
the exterior ring intersects itself it is not simple, which is one of the requirements of the Polygon
geometry.
Invalid geometries can only be created by supplying an invalid representation to a static method;
all geometry instances returned by SQL Server's instance methods (such as the result of the STUnion(),
STIntersection(), or STBuffer() methods) will always be valid. There is therefore no need to call
STIsValid() on the results of any of these methods.
Note All geometries returned as the result of a SQL Server method on an existing instance will be valid.
177
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
the Well-Known Text and see the cause of the problem. However, this is not the case when dealing with
invalid geometries constructed from thousands of points, represented in binary format or imported
programmatically! This was a real headache in SQL Server 2008, when data imported from external
sources (such as U.S. census TIGER data) would frequently be declared as "invalid," with no further
information provided as to the cause.
Fortunately, SQL Server 2012 introduces a new method specifically intended to help diagnose
problems with invalid geometries, IsValidDetailed(). Rather than simply returning a bit value
representing if a geometry is valid or not, the IsValidDetailed() method returns a code and text
description of the cause of any invalidity of a geometry, as demonstrated in the following code listing:
DECLARE @g geometry = 'LINESTRING(0 0, 5 10, 8 2)';
DECLARE @h geometry = 'LINESTRING(0 0, 10 0, 5 0)';
DECLARE @i geometry = 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0), (1 0, 3 0, 3 1, 1 1, 1 0))';
SELECT
@g.STIsValid() AS STIsValid, @g.IsValidDetailed() AS IsValidDetailed
UNION ALL SELECT
@h.STIsValid(), @h.IsValidDetailed()
UNION ALL SELECT
@h.STIsValid(), @i.IsValidDetailed();
The results are as follows:
STIsValid IsValidDetailed
1 24400: Valid
0 24413: Not valid because of two overlapping edges in curve (1).
0 24404: Not valid because polygon ring (2) intersects itself or some other ring.
Based on the output of IsValidDetailed(), you can then investigate further the particular problem
identified with the geometry. For a full list of possible problems identified by IsValidDetailed(),
please refer to the appendix.
A final point worth noting is that STIsValid() and IsValidDetailed() check only that a geometry is
well-formed according to OGC definitions. They do not necessarily ensure that a geometry makes
logical sense. In the following code listing, the STIsValid() method confirms that the Point geometry is
valid, even though the coordinates lie outside the bounds of the area of use of the specified spatial
reference system (the minimum x,y bounds of EPSG:27700 are at 0,0):
DECLARE @g geometry;
SET @g = geometry::STGeomFromText('POINT(-300412 -200123)', 27700);
SELECT @g.STIsValid();
178
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
LINESTRING(0 0, 10 0)
Note that the result is still a LineString, and still represents exactly the same set of points,
although the end point of the LineString is now different. Now let's consider the example of the
Polygon consisting of a single spike.
DECLARE @Spike geometry = 'POLYGON((0 0,1 1,2 2,0 0))';
SELECT @Spike.MakeValid().ToString()
In this case, the appropriate geometry type to represent the three distinct points in the supplied
WKT is not a Polygon as originally stated, but a LineString. The result of the MakeValid() method in this
case is therefore a valid LineString, as follows:
LINESTRING (2 2, 1 1, 0 0)
Caution Note that in some cases (as demonstrated here), the MakeValid() method may return a different type
of geometry than that originally supplied.
Finally, let's consider the case of the Polygon in which the interior ring crossed over the exterior
ring:
DECLARE @SelfIntersectingPolygon geometry;
SET @SelfIntersectingPolygon = 'POLYGON((0 0, 6 0, 3 5, 0 0), (2 2, 8
2, 8 4, 2 4, 2 2))';
SELECT @SelfIntersectingPolygon.MakeValid().ToString();
In this example, the ambiguous areas of space defined by the rings of the original supplied Polygon
have been divided into four separate Polygons, contained within a MultiPolygon collection. The
segments of interior ring that lay outside the original exterior ring have been treated as defining new
Polygons, as shown in Figure 7-5.
MULTIPOLYGON (
((2.4000000000000004 4, 3.6 4, 3 5, 2.4 4, 2.4000000000000004 4)),
((2 3.3333333333333335, 2.4 4, 2 4, 2 3.3333333333333335)),
((4.8 2, 8 2, 8 4, 3.6000000000000005 4, 4.8 2)),
((0 0, 6 0, 4.8 2, 2 2, 2 3.3333333333333335, 0 0))
)
179
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Notice also that the result of this example demonstrates how precision issues can arise as a result
of any spatial operations in SQL Server, including the result of the MakeValid() operation (the very
first coordinate, if calculated using decimal arithmetic, would be 2.4 exactly, not 2.4000000000000004).
Figure 7-5. A MultiPolygon created as a result of calling MakeValid() on an invalid Polygon in which the
interior ring intersects the exterior ring.
If called on a existing valid instance, MakeValid() has no effect. Therefore it is a very useful
method that can be safely called to ensure geography or geometry data is valid before passing it on for
further calculations.
180
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Handling Errors
The geometry and geography datatypes are CLR datatypes, so spatial operatons are executed within the
SQLCLR environment, the .NET Framework Common Language Runtime process hosted by SQL
Server. The .NET Framework provides its own exception-handling mechanism, which is quite separate
from the mechanism used to deal with T-SQL exceptions typically encountered in SQL Server. So, how
do the two systems interact when an exception occurs when dealing with geography or geometry data?
SQL Server automatically wraps an exception handler around any SQLCLR managed code
executed from within SQL Server. This includes any user-defined CLR procedures or functions as well
as the system-defined CLR methods used by the geography and geometry datatypes. The purpose of the
wrapper is that, if any managed code throws an exception, it is caught by the wrapper, which then
generates an error, rather than allowing the exception to bubble up further through the system. The
error message created by the wrapper contains details of the SQLCLR exception, together with a stack
trace of the point at which it occurred.
There are several components to this error message, so let's dissect it into its separate elements,
starting with the first line: Msg 6522, Level 16, State 1, Line 2. This line actually contains four
pieces of information:
The value immediately following Msg tells us the error number that has occurred;
in this case, it is error number 6522.
The Level tag informs us that the error in this case has been classified as a level 16
error. Every error in SQL Server is assigned a level in the range between 1 and 25,
and this value can be used as an approximate indication of the severity of the
error. A level 16 error falls into the category of "errors that can be corrected by the
181
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
Notice that the error number, level, and state, together with the initial part of the error message are
identical to that received previously, even though the cause of the error in this case was completely
182
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
different. It is only by examining the error message in full that we get to see the underlying CLR
exception that caused the error to be triggered (in this case, System.FormatException 24204).
Error-Handling Mechanisms
Generally speaking, it is best to deal with any exceptions in code at the lowest level possible. In the
case of user-defined CLR functions, this means adding code to handle the exception within the CLR
function itself, in which case it never needs to be caught at the T-SQL level. However, this is not an
option for system-defined CLR types; there is no way to add exception-handling code to the sealed
Microsoft.SqlServer.Types.dll library, so any exceptions will inevitably bubble up and trigger the
error handler in the SQLCLR wrapper demonstrated in the previous section.
How, then, should you create specific code paths to handle such exceptions? The general approach
to error-handling in T-SQL (and in many other programming languages) is to use a TRY/CATCH
construct, containing two blocks of code. The try block contains exception-prone code that is to be
"tried." The second block of code, called the catch block, contains code that should be called in the event
that the code in the try block fails. As soon as any exception occurs within the try block, execution
immediately jumps into the catch block, which is known as "catching" the exception. Within the catch
block, different courses of action can be taken depending on the nature of the exception that occurred,
which can (generally) be determined by examining the value of the ERROR_NUMBER() function as shown
in the following code listing:
BEGIN TRY
SELECT geometry::STPolyFromText('POLYGON((0 0, 10 2, 0 0))', 0);
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 123
-- Code to deal with error 123 here
SELECT 'Error 123 occurred'
ELSE IF ERROR_NUMBER() = 456
-- Code to deal with error 456 here
SELECT 'Error 456 occurred'
ELSE
SELECT ERROR_NUMBER() AS ErrorNumber;
END CATCH
The problem is that this common approach to selecting conditional code paths based on T-SQL
error number won't work for the geography and geometry datatypes, because every exception occurring
within managed code will lead to the same T-SQL error: generic error 6522.
In order to create different code paths according to the CLR exception that occurred, we must parse
the contents of ERROR_MESSAGE() to try to identify the original CLR exception number specified in the
stack trace. The exceptions generated by the system-defined CLR types have five-digit exception
numbers in the range 24000 to 24999, so can be distilled from the ERROR_MESSSAGE() string using the T-
SQL PATINDEX function. The following code listing demonstrates this approach:
BEGIN TRY
SELECT geometry::STPolyFromText('POLYGON((0 0, 10 2, 0 0))', 0);
END TRY
BEGIN CATCH
-- Has a SQLCLR error occurred?
IF ERROR_NUMBER() = 6522
BEGIN
-- Retrieve the error message
DECLARE @errorMsg nvarchar(max) = ERROR_MESSAGE();
183
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
}
return result;
}
The preceding code listing wraps a call to the SqlGeography Parse() method in a try block, which
allows the possibility to include appropriate exception handling code in the corresponding catch block.
To use this approach, you could register the GeogTryParse() function in SQL Server and use it in place
of the built-in geography Parse() method.
Table 7-1 lists some of the common exceptions you might want to create code paths to deal with,
and their corresponding exception numbers. For a full list of all exceptions, please see the appendix of
this book.
184
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
24112 The well-known text (WKT) input is empty. To input an empty instance,
specify an empty instance of one of the following types: Point, LineString,
Polygon, MultiPoint, MultiLineString, MultiPolygon, or GeometryCollection.
24114 The label {0} in the input well-known text (WKT) is not valid. Valid labels are
POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING,
MULTIPOLYGON, or GEOMETRYCOLLECTION.
24117 The LineString input is not valid because it does not have enough distinct
points. A LineString must have at least two distinct points.
24118 The Polygon input is not valid because the exterior ring does not have enough
points. Each ring of a polygon must contain at least three distinct points.
24119 The Polygon input is not valid because the start and end points of the exterior
ring are not the same. Each ring of a polygon must have the same start and
end points.
24120 The Polygon input is not valid because the interior ring number {0} does not
have enough points. Each ring of a polygon must contain at least three points.
24121 The Polygon input is not valid because the start and end points of the interior
ring number {0} are not the same. Each ring of a polygon must have the same
start and end points.
24306 The Polygon input is not valid because the start and end points of the ring are
not the same. Each ring of a polygon must have the same start and end points.
Summary
In this chapter, you learned some of the issues that can affect the quality, accuracy, and robustness of
spatial data in SQL Server, and some of the methods that can be used to prevent the negative effect of
such issues.
SQL Server stores coordinates as 8-byte floating-point binary values, according
to the IEEE-754 specification.
When you create a geography or geometry instance from a static method such as
STGeomFromText(), the coordinate values are converted from nvarchar to binary(8).
This has the possibility for truncation or rounding of the supplied values.
When you perform calculations involving geometries, SQL Server uses integer
arithmetic based on a dynamically scaled 42-bit grid. This introduces an
element of approximation into the results of any spatial methods.
185
www.it-ebooks.info
CHAPTER 7 PRECISION, VALIDITY, AND ERRORS
You should avoid attempting to perform tests of exact equality between any two
geometries, instead testing whether they lie within a certain tolerance of each
other.
It is possible to create invalid geometries, but these can only be stored and
retrieved. In order to perform other operations, they must be made valid first.
The process of making a geometry valid may cause it to change type (e.g., from a
LineString to a MultiLineString) and may also cause its coordinates to shift
slightly.
Errors encountered when using the geography and geometry datatypes will
initially trigger a CLR exception. This will in turn be caught by a wrapper and
lead to a T-SQL error 6522.
Although you cannot use ERROR_NUMBER() to switch between alternative code
paths for CLR exceptions, you can distill the contents of ERROR_MESSAGE() to
retrieve the number of the underlying CLR exception.
186
www.it-ebooks.info
CHAPTER 8
SQL Server supports spatial data defined in a wide variety of spatial reference systems used across the
world. When you create an individual item of geography data, you can use any one of the 392 spatial
reference systems listed in the sys.spatial_reference_systems table. And if you create geometry data,
you can use any spatial reference system you like; you can even define your own coordinate system
that uses coordinates measured from an origin somewhere in your living room.
Coordinate values are only valid for the spatial reference system in which they were defined, and
you can only compare two coordinates if they were obtained from the same coordinate system. In SQL
Server, this means that when you want to perform an operation involving two items of spatial datato
work out whether they intersect, to calculate the distance between them, or to join them together, for
exampleboth items must be of the same data type (i.e., both geography or both geometry), and both
must be defined using the same SRID. If you attempt to compare geometry data with geography data you
will receive an exception. If you attempt to compare two instances of the same type defined using
different SRIDs you will receive a NULL result.
The constraint that every item of data must use the same spatial reference system may not
present too many difficulties if you source your own spatial data; simply choose a reference system
that is appropriate for your application, and ensure that all coordinates are measured and consistently
recorded using that system. Unfortunately, as developers we rarely have such control over how our
data is gathered, and we frequently have to accommodate data provided in a variety of different SRIDs.
For example, data from the U.S. Census bureau (http://www.census.gov) is provided using geographic
coordinates based on the NAD83 datum (SRID 4269), but clearinghouses and GIS departments of
individual states typically use the state plane coordinate system of that state. Government agencies of
some other countries, such as Great Britain and Finland, use spatial reference systems based on the
national grid of those particular countries. And then, of course, there is 4326, the most commonly used
SRID for global applications.
SQL Server does not include any inbuilt methods to convert data between the two spatial
datatypes, or to transform coordinates between different spatial reference systems of the same
datatype. However, in this chapter I'll show you how you can integrate a freely available, open source
projection library, Proj.NET, into SQL Server, and use the methods of this library to transform
coordinates between different spatial reference systems suitable for both the geography and geometry
datatypes. By transforming data from different sources into a common, consistent spatial format, you
can then use the full range of spatial methods across your whole dataset.
Datum Transformation
Every spatial reference system used to define geospatial information is based on an underlying model
of the Earth: the datum. The datum describes the size and shape of the ellipsoid used to approximate the
shape of the Earth and the reference points on the surface of the Earth used to realize that model.
187
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
To compare the definitions of spatial reference systems based on different datums, let's retrieve
the well-known text (WKT) representation of two geographic spatial reference systems listed in SQL
Server's sys.spatial_reference_systems table.
Note The Classical scholars among you may suggest that the plural of datum, referred to in the preceding
sentence, is data. Although this may be true of the Latin term from which the word is derived, when used in the
context of a geodetic datum, the correct English plural is datums. Fortunately, this also prevents confusion with the
more common use of the word "data" with which we are familiar in SQL Server!
First, let's look at the Luxembourg 1930 spatial reference system, SRID 4181. To retrieve the well-
known text for this system, execute the following query:
SELECT well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4181;
The results are shown following (indents added):
GEOGCS[
"Luxembourg 1930",
DATUM[
"Luxembourg 1930",
ELLIPSOID["International 1924", 6378388, 297]
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]
Now let's compare this to the Deutches Hauptdreiecksnetz system used in Germany, which is SRID
4314. To retrieve the well-known text for this system, execute the following query:
SELECT well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4314;
Again, the results have been formatted and shown as follows:
GEOGCS[
"DHDN",
DATUM[
"Deutsches Hauptdreiecksnetz",
ELLIPSOID["Bessel 1841", 6377397.155, 299.1528128]
],
PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]
]
Examining the preceding WKT output, we can see several similarities between the two spatial
reference systems; they are both geographic coordinate systems, in which coordinates are stated in
angular degrees measured from the Greenwich prime meridian. However, the datum on which those
coordinates are applied is different in each system. The Luxembourg 1930 datum defines a set of points
188
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
relative to the International 1924 ellipsoid, which has a radius of 6,378,388 meters at the equator, and
6,356,911.946 meters at the Poles. The Deutches Hauptdreiecksnetz system uses the Bessel 1841
ellipsoid, which is based on a more conservative estimate of the size of the earth, with an ellipsoid of
equatorial radius 6,377,397.155 meters and polar radius of 6,356,078.965 meters.
Note Instead of giving the dimensions of both the equatorial and polar axes, the WKT definition of a spatial
reference gives the length of the larger, semi-major axis together with the inverse flattening ratio. This ratio, which
generally has a value of around 300, indicates how much larger the earth is assumed to be at the equator than at
the poles, and can be used to derive the polar axis of the ellipsoid model.
The different ellipsoid models used in these datums are illustrated in Figure 8-1.
You cannot compare geography data defined using SRID 4181 to geography data defined using SRID
4314 without first transforming the sets of data to be based on a consistent datum. Failure to do so (i.e.,
treating geographic coordinates defined relative to one datum as if they had been defined on another)
can lead to coordinate locations being incorrect by several hundred meters.
Datum transformation involves converting coordinate values defined using one geodetic datum
into the equivalent coordinate values based on a different datum. In the context of the previous
example, we could either transform the coordinates defined using the Luxembourg datum to the DHDN
datum, or vice versa; the important thing is that all the resulting data should be defined relative to the
same datum.
189
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Transformation Algorithms
The process of datum transformation involves applying a mathematical function to adjust coordinate
values based on a specified algorithm and set of parameters. There are several alternative
transformation algorithms available, with varying degrees of complexity and accuracy. Generally
speaking, the greater the number of parameters used in the transformation, the more complex the
algorithm, and the greater the accuracy of the results.
In the following section, I'll examine three different algorithms that can be used to transform
coordinates between geodetic datums, in increasing order of complexity.
As an example, the location of Cardiff, Wales, has coordinates of (51.4826, 3.18183) relative to the
WGS84 datum. The same location expressed relative to the OSGB36 datum is (51.4821, 3.18057). Thus
we can define a set of offsets to convert from WGS84 to OSGB36 as follows:
, The change in latitude between "from" datum and "to" datum = 0.0005
degrees.
, The change in longitude between "from" datum and "to" datum = 0.00126
degrees.
However, this method is unable to allow for any change in shape between the datums, and is only
really suitable for low-precision applications in a limited area. The offsets calculated here are based on
190
f
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
the differences between datums calculated at a single point and, as the geographic range of coordinates
to be transformed increases, the accuracy of the results obtained by this method diminishes.
Figure 8-3. Molodensky five-parameter transformation applies translation and scaling of datum, but not
rotation.
191
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Figure 8-4. The Helmert transformation rotates and translates coordinate axes between datums
For example, the parameters required to perform a Helmert transformation from the Bessel 1841
datum to the WGS84 datum are as follows:
cx, Translation along the x-axis = 582 meters
cy, Translation along the y-axis = 105 meters
cz, Translation along the z-axis = 414 meters
rx, Rotation about the x-axis = 1.04 arcseconds
ry, Rotation about the y-axis = 0.35 arcseconds
rz, Rotation about the z-axis = 3.08 arcseconds
s, Scaling factor = 8.3 parts per million
Note The preceding section does not provide an exhaustive list of transformation algorithms, but describes
some of those most commonly used in GIS applications. Other more complex algorithms are sometimes used,
particularly in specialized scientific use, where the degree of accuracy required is much greater.
192
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
193
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Figure 8-5. Applying datum transformation parameters to convert between any two datums via WGS84
Note There are different conventions regarding how to define the three rotation parameters supplied to the
Helmert transformation, which essentially differ in their interpretation of what is being rotated. In a position vector
rotation, the axes remain constant and the frame of points is rotated around the axes. In a coordinate frame
rotation, the points remain constant while the axes themselves are rotated. The practical difference between these
two definitions is that the rotation parameters have changed sign. If you obtain Helmert transformation parameters
it's best to check which convention was used, and reverse the rotation parameters as necessary.
194
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
reproject them back onto a two-dimensional plane using the set of parameters required by the
destination coordinate system (known as a forward projection).
To illustrate these concepts, lets consider the NAD83 geographic coordinate reference system,
SRID 4269, which is widely used across North America. The well-known text representation of this
spatial reference system is as follows:
GEOGCS["NAD83",
DATUM["North American Datum 1983",
SPHEROID["GRS 1980", 6378137.0, 298.257222101, AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AUTHORITY["EPSG","4269"]]
The Massachusetts Mainland spatial reference system (EPSG:2249) is a projected spatial reference
based on a Lambert Conic Conformal projection of the NAD83 datum. The well-known text
representation of this system is as follows:
PROJCS["NAD83 / Massachusetts Mainland",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980", 6378137.0, 298.257222101, AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AUTHORITY["EPSG","4269"]],
PROJECTION["Lambert Conic Conformal (2SP)", AUTHORITY["EPSG","9802"]],
PARAMETER["central_meridian", -71.5],
PARAMETER["latitude_of_origin", 41.0],
PARAMETER["standard_parallel_1", 42.68333333333334],
PARAMETER["false_easting", 200000.0],
PARAMETER["false_northing", 750000.0],
PARAMETER["standard_parallel_2", 41.71666666666667],
UNIT["US survey foot", 0.3048006096012192, AUTHORITY["EPSG","9003"]],
AXIS["Easting", EAST],
AXIS["Northing", NORTH],
AUTHORITY["EPSG","2249"]
]
The NAD83 Universal Transverse Mercator Zone 18N, SRID 26918, is another projected spatial
reference system based on the NAD83 datum. It shares the same underlying geodetic model of the
Earth as the Massachusetts Mainland system. However, it uses a transverse Mercator projection rather
than a conic projection in which coordinate values are measured in meters rather than feet.
The well-known text for SRID 26918 is as follows:
PROJCS["NAD83 / UTM zone 18N",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980", 6378137, 298.257222101, AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]],
UNIT["degree",0.01745329251994328],
AUTHORITY["EPSG","4269"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
195
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
PARAMETER["central_meridian",-75],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1, AUTHORITY["EPSG","9001"]],
AXIS["Easting",EAST],
AXIS["Northing",NORTH],
AUTHORITY["EPSG","26918"]
]
To convert coordinate data from the Masachussets State Plane projected system into UTM Zone
18N projection first requires unprojection into NAD83, the underlying geodetic model on which both
projections are based, and then reprojection into UTM Zone 18N. This is illustrated in Figure 8-6.
In the preceding example, the source and target projected spatial reference systems were both
based on the same underlying geodetic datum: the North American Datum 1983. But what if you
wanted to convert between projected spatial reference systems based on different datums?
To do this, you need to add an additional step to perform a datum conversion. The process is as
follows:
First, unproject from the source projection to geodetic coordinates based on the
source datum.
Then, perform a datum conversion from the source datum to the target datum
using, for example, the Helmert transformation discussed earlier this chapter.
Finally, reproject geographic coordinates from the target datum into the target
projected spatial reference system.
To demonstrate, suppose that you wanted to convert coordinates from the Massachussets Mainland
spatial reference system based on the NAD1983 datum into the CONUS Albers projection based on the
NAD1927 datum. The process required to perform this conversion is illustrated in Figure 8-7.
196
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Figure 8-7. Unprojection, datum transformation, and reprojection from one projected spatial reference
system to another.
x = sin[n( 0)]
y = 0 cos[n( 0)]
where
197
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
198
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
To populate this table, you need to provide the EPSG spatial reference identifier and the well-
known text definition of each spatial reference system between which you want to convert data. You
can find details of spatial reference systems on the Internet, including from http://www.epsg-
registry.org or http://www.spatialreference.org. Once you've obtained the SRID and WKT (including
the TOWGS84 parameter) of the systems you'd like to support, you can insert them into the
prospatial_reference_systems table.
To demonstrate the range of possible conversions between coordinate systems, the following
code listing inserts three records into the prospatial_reference_systems table:
WGS84, a geographic coordinate system based on the WGS84 datum
UTM Zone 31N, a projected coordinate system also based on the WGS84 datum
The British National Grid, a projected coordinate system based on the OSGB36
datum.
INSERT INTO prospatial_reference_systems (
spatial_reference_id,
well_known_text
)
VALUES
(4326,
'GEOGCS["WGS 84",
DATUM[
"World Geodetic System 1984",
SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]
],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AXIS["Geodetic longitude", EAST],
AXIS["Geodetic latitude", NORTH],
AUTHORITY["EPSG","4326"]
]'),
(32631,
'PROJCS["WGS 84 / UTM zone 31N",
GEOGCS["WGS 84",
DATUM[
"WGS_1984",
SPHEROID["WGS 84", 6378137, 298.257223563, AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]
],
PRIMEM["Greenwich", 0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.01745329251994328, AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]
],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",3],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting", 500000],
PARAMETER["false_northing",0],
UNIT["metre", 1, AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","32631"]
]'),
199
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
(27700,
'PROJCS["OSGB 1936 / British National Grid",
GEOGCS["OSGB 1936",
DATUM[
"OSGB 1936",
SPHEROID["Airy 1830", 6377563.396, 299.3249646, AUTHORITY["EPSG","7001"]],
TOWGS84[446.448, -125.157, 542.06, 0.15, 0.247, 0.842, -4.2261596151967575],
AUTHORITY["EPSG","6277"]
],
PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
UNIT["degree", 0.017453292519943295],
AXIS["Geodetic longitude", EAST],
AXIS["Geodetic latitude", NORTH],
AUTHORITY["EPSG","4277"]
],
PROJECTION["Transverse Mercator"],
PARAMETER["central_meridian", -2.0],
PARAMETER["latitude_of_origin", 49.0],
PARAMETER["scale_factor", 0.9996012717],
PARAMETER["false_easting", 400000.0],
PARAMETER["false_northing", -100000.0],
UNIT["m", 1.0],
AXIS["Easting", EAST],
AXIS["Northing", NORTH],
AUTHORITY["EPSG","27700"]
]');
The code samples that accompany this book contain a script to populate this table with details of
many other commonly used spatial reference systems.
200
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Figure 8-8. The ProjNET project highlighted in the Visual Studio Solution Explorer pane
The ProjNET project is the core assembly containing all the methods required for coordinate
transformation and conversion. We will create an additional separate assembly that calls into the
methods of the ProjNET assembly and exposes them in a format suitable for geometry and geography
data. However, there is a small issue to address before we do so: SQLCLR assemblies do not generally
allow their methods to be called by any other code, unless they are trusted. There are two possible
solutions to this problem:
We could register the calling assembly as an UNSAFE assembly, one that is
granted full trust to perform pretty much any action in the database. This
would achieve the objective of allowing it to access the methods in the ProjNET
assembly, but it would also introduce an unnecessary level of risk. Code in an
assembly with the UNSAFE permission set has permission to do all sorts of
things, including accessing the file system and network resources, and
registering assemblies as UNSAFE should be avoided if at all possible.
The second, better option is to alter the ProjNET assembly itself to allow
partially trusted callers. Microsoft recommends that all assemblies registered
in SQL Server (except those added to the Global Assembly Cache) should be
decorated with the System.Security.AllowPartiallyTrustedCallers attribute,
so that they can be accessed from other SAFE or EXTERNAL_ACCESS assemblies,
preventing the need to grant full trust where it is not required.
Note For more information about using SQLCLR libraries from partially trusted callers, refer to
http://msdn.microsoft.com/en-us/library/ms345097%28v=SQL.110%29.aspx
Fortunately, implementing the necessary change to the Proj.NET library to allow partially trusted
callers is very simple:
201
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
1. From the Solution Explorer pane, click to expand the ProjNET Properties
folder, and then double-click to edit the AssemblyInfo.cs file contained inside.
2. Decorate the assembly with the AllowPartiallyTrustedCallers attribute, by
adding the line shown below onto the end of the AssemblyInfo.cs file
(immediately after the existing attribute declarations):
[assembly: System.Security.AllowPartiallyTrustedCallers]
3. Now, right-click on the ProjNET project in the Solution Explorer pane and
select "Build" to recompile the assembly.
4. That's it! A message in the output window should report that the build
succeeded, and you can now close the ProjNET solution by clicking on the
Visual Studio File menu and selecting "Close Solution".
202
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
2. Once the project has been created, go to the Project menu and select Add Reference.
3. In the Add Reference dialog box, click on the Browse tab and navigate to the
directory in which the recompiled ProjNet.dll library was just created. (By default
this is in the /SharpMap.CoordinateSystems/Bin/Debug subdirectory of the directory
in which Proj.NET was unarchived.) Highlight the ProjNET.dll and click OK.
4. Click Add Reference again, and this time add a reference to the
Microsoft.SqlServer.Types.dll library. By default, this is installed in the
/SDK/Assemblies subdirectory of your SQL Server installation folder.
Highlight the SQLServer.Types.dll library and click OK.
Having configured the project, the next step is to create the sink interfaces. There are four sink
interfaces required to cover each combination of possible conversions between the geometry and
geography datatypes:
geography to geometry
geography to geography
geometry to geography
geometry to geometry
The sinks will all follow essentially identical structures, with the exception of the expected input
and return types. The following code listing demonstrates the sink to convert coordinates from the
geography datatype to the geometry datatype, so it implements the IGeographySink110 interface. For
each method in the sink, the parameter values passed in are used to populate a corresponding
IGeometrySink110 interface. So, for example, the OpenGisGeographyType used to denote the type of
geography instance passed to the sink (in the BeginGeography() method) is converted to the
corresponding OpenGisGeometryType and passed through to the BeginGeometry() method of the
IGeometrySink110. As each point is passed to the IGeographySink110 BeginFigure() and AddLine()
methods, the latitude and longitude coordinates are converted using a Proj.NET
ICoordinateTransformation class into the corresponding x- and y-coordinates, and these are then
provided to the equivalent IGeometrySink110 BeginFigure() and AddLine() methods.
Here's the code listing:
using System;
using Microsoft.SqlServer.Types; // SqlGeometry and SqlGeography
using ProjNet.CoordinateSystems.Transformations; // Proj.NET
namespace ProSpatial.Ch8
{
class TransformGeographyToGeometrySink : IGeographySink110
{
public TransformGeographyToGeometrySink(
ICoordinateTransformation trans,
IGeometrySink110 sink
)
{
_trans = trans;
_sink = sink;
203
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
public void AddCircularArc(double latitude1, double longitude1, double? z1, double? m1,
double latitude2, double longitude2, double? z2, double? m2
)
{
// Transform both the anchor point and destination of the arc segment
double[] anchorPoint = _trans.MathTransform.Transform(new double[]
{ longitude1, latitude1 });
double[] toPoint = _trans.MathTransform.Transform(new double[]
{ longitude2, latitude2 });
204
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
{
// Just pass through
}
}
}
For the code listings of the corresponding geometry to geography, geometry to geometry, and
geography to geography sinks, please see the code samples accompanying this book.
To actually make use of the interfaces, each sink will have a corresponding function, which we will
expose in SQL Server. The function to accompany the TransformGeographyToGeometrySink interface
requires two inputs: an item of geography data and the SRID of a projected coordinate system, whose
parameters are listed in the prospatial_reference_systems table created earlier. The method projects
the supplied geography input into the appropriate coordinate system and returns the corresponding
geometry value.
The method will read the parameters associated with both the source and destination systems
from the prospatial_reference_systems table, so we need to decorate the function with the
DataAccessKind.Read attribute. Here's the function signature:
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
public static SqlGeometry GeographyToGeometry(SqlGeography geog, SqlInt32 toSRID)
Once called, the first thing the method must do is to determine the parameters of the source and
destination reference systems. The source spatial reference identifier is retrieved from the STSrid
property of the supplied geography instance. The target SRID is the integer value supplied as the second
parameter to the function.
The function then retrieves the corresponding WKT for these systems by querying the
prospatial_reference_systems table via the context connection (if you named your spatial reference
table something else then be sure to change the query appropriately).
The well-known text representation of the source and destination systems is used to create a
CoordinateTransformation instance. This instance is then passed to the appropriate sink interface in
order to populate a SqlGeometryBuilder instance with transformed values. Finally, the
ConstructedGeometry instance created by the SqlGeometryBuilder is returned to the client.
The code listing for the GeographyToGeometry function is shown following:
using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;
using System.Data.SqlClient; // Required for context connection
using Microsoft.SqlServer.Server; // SqlFunction Decoration
using Microsoft.SqlServer.Types; // SqlGeometry and SqlGeography
using ProjNet.CoordinateSystems; // ProjNET coordinate systems
using ProjNet.CoordinateSystems.Transformations; // ProjNET transformation functions
using ProjNet.Converters.WellKnownText; //ProjNET WKT functions
namespace ProSpatial.Ch8
{
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read)]
public static SqlGeometry GeographyToGeometry(SqlGeography geog, SqlInt32 toSRID)
{
// Use the context connection to the SQL Server instance on which this is executed
using (SqlConnection conn = new SqlConnection("context connection=true"))
205
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
{
// Open the connection
conn.Open();
// Clean up
cmd.Dispose();
206
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Note You do not need to explicitly register the ProjNET library in SQL Server, only the library containing the
custom SQL functions. Since the ProjNET library is referenced from this assembly, SQL Server will automatically
locate and import the ProjNET assembly at the same time. You only need to make sure that both dlls are present in
the same folder on the server, or deployed to the Global Assembly Cache.
207
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Having imported the assembly, the next step is to register the corresponding function. Here's the
T-SQL code required to create the GeographyToGeometry function:
CREATE FUNCTION dbo.GeographyToGeometry(@geog geography, @srid int)
RETURNS geometry
EXTERNAL NAME ProSpatialCh8.[ProSpatial.Ch8.UserDefinedFunctions].GeographyToGeometry;
GO
We can also try another example, this time converting a point from WGS84 to the British National
Grid system, which is based on the OSGB36 datum. In this case, conversion involves both a datum
transformation and also projection. Fortunately, ProjNET determines and performs the required
actions automatically behind the scenes, so no further steps need be added to the code listing:
DECLARE @Oxford geography;
SET @Oxford = geography::STPointFromText('POINT(-1.256804 51.752143)', 4326);
You can confirm both of the preceding results using one of a number of online coordinate
conversion tools, such as that available at http://nearby.org.uk.
As a graphical demonstration of this transformation function, Figure 8-10 illustrates a side-by-
side comparison of three versions of the same MultiPolygon geometry representing Great Britain and
208
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Northern Ireland. The left image shows the original geography instance using SRID 4326, displayed in
the Management Studio Spatial Results tab using a Bonne projection; the center image displays the
result of the GeographyToGeometry() method when used to transform into a geometry MultiPolygon
using SRID 32631; the right image displays the same MultiPolygon when transformed into SRID 27700.
Figure 8-10. Three different projections of Great Britain. From left to right: geographic (EPSG:4326) using
Bonne projection, UTM Zone 31N (EPSG:32631), Ordnance Survey National Grid of Great Britain
(EPSG:27700)
209
www.it-ebooks.info
CHAPTER 8 TRANSFORMATION AND REPROJECTION
Geom2263 geometry
);
By adding triggers to the table that call the necessary transformation functions you can ensure
that, should any of the spatial columns be updated, all of the other spatial columns are updated with
transformed versions of the updated geometry, keeping the data in sync. Different stored procedures
can utilize different spatial columns as appropriate, and separate indexes can be added to the columns
to optimize performance.
The final point to note is that, while the SQLCLR makes it possible to perform spatial
transformations within SQL Server, you should always consider whether this is the most appropriate
place at which to do so. CLR procedures are easily moved between tiers of the application hierarchy,
and the Proj.NET library used in this chapter could be leveraged just as readily in a front-end
Silverlight application, or in a middle-tier web service, for example, rather than directly in the
database layer.
If the only time at which you find it necessary to convert data is at the point it is first imported into
SQL Server, you might want to investigate one of the ETL tools that can perform conversion and
reprojection of spatial data as an integrated part of a load process, such as Safe FME or OGR2OGR
discussed in Chapter 5, or create a custom component that calls Proj.NET as part of an SSIS data
transformation instead.
Summary
In this chapter, you learned about the process involved in transforming coordinate data between
different spatial reference systems.
Converting data between spatial reference systems based on different
underlying geodetic models of the earth requires datum transformation.
There are various algorithms used for datum transformation. The most common
method is the Helmert transformation, which requires seven parameters.
Helmert transformation parameters that convert to the WGS84 datum can be
included with the well-known text representation of a datum, following the
TOWGS84 keyword.
By applying the inverse set of TOWGS84 parameters, you can convert from WGS84
into the target datum.
Proj.NET is an open source projection library that implements a variety of
projections and can create transformations between any two datums given the
corresponding WKT.
Custom CLR functions can be written that call into the Proj.NET library, and
expose its functionality to create transformation functions for the geometry and
geography datatypes in SQL Server.
These functions can also be called from CLR code in other layers of an
application: in the client, a web layer, or as part of an SSIS ETL load, for example.
210
www.it-ebooks.info
CHAPTER 9
There are many questions that we might ask about an individual item of spatial data: Where is it? How
big is it? What sort of object is it? Where does it start and end? Where does its center lie? In this chapter,
you'll learn about the methods that SQL Server provides to answer questions such as these. Note that
all of the content in this chapter relates to examining individual items of spatial data, considered in
isolation. Methods that analyze the properties of, or relationship between, two or more items of data
will be covered in later chapters.