Topological Features
Value point encoding
• Cells are assigned position numbers starting in the upper-left corner, proceeding from left to right and from top to bottom.
• The position number of the last cell of each run is stored in the point column; the value for each cell in the run is stored in the value column (a minimal sketch follows this list).
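A minimal sketch of this encoding, assuming a raster read in the scan order just described; the (end position, value) pairs stand for the point and value columns, and all names are illustrative:

```python
# Value point (run-end) encoding of a raster read row by row,
# left to right and top to bottom. Each entry stores the position
# number of the LAST cell of a run plus the run's cell value.

def encode_value_point(grid):
    """grid: list of rows of cell values. Returns [(end_position, value), ...]."""
    cells = [v for row in grid for v in row]        # flatten in scan order
    runs = []
    for pos, v in enumerate(cells, start=1):        # positions start at 1 (upper left)
        if runs and runs[-1][1] == v:
            runs[-1] = (pos, v)                     # extend the current run
        else:
            runs.append((pos, v))                   # start a new run
    return runs

def decode_value_point(runs, ncols):
    """Rebuild the grid from (end_position, value) pairs."""
    cells, start = [], 1
    for end, v in runs:
        cells.extend([v] * (end - start + 1))
        start = end + 1
    return [cells[i:i + ncols] for i in range(0, len(cells), ncols)]

grid = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 3, 3]]
runs = encode_value_point(grid)
print(runs)                          # [(2, 1), (4, 2), (6, 1), (8, 2), (12, 3)]
assert decode_value_point(runs, 4) == grid
```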
2.7.3. Quadtree
• The typical raster model divides an area into equal-sized rectangular cells.
• In many cases, however, a variable cell size is used for a more compact raster representation, as shown in Figure 2.13.
• Larger cells are used to represent large homogeneous areas and smaller cells for fine detail.
• The process involves regularly subdividing a map into four equal-sized quadrants. Any quadrant that contains more than one class is subdivided again. Subdivision continues within each quadrant until a square is so homogeneous that it no longer needs to be divided.
• A quadtree is then prepared, resembling an inverted tree with a "root", i.e., the point from which all branches expand; a "leaf" is a lowermost point, and all other points in the tree are nodes (a small sketch of this subdivision follows the list).
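A minimal sketch of the quadtree subdivision described above, assuming a square grid whose side is a power of two; function and field names are illustrative:

```python
# Minimal recursive quadtree subdivision of a square, power-of-two grid.
# A quadrant that contains more than one class is split into four equal
# quadrants; a homogeneous quadrant becomes a leaf.

def build_quadtree(grid, row=0, col=0, size=None):
    size = size if size is not None else len(grid)
    values = {grid[r][c] for r in range(row, row + size)
                          for c in range(col, col + size)}
    if len(values) == 1:                                   # homogeneous: leaf
        return {"leaf": True, "value": values.pop(),
                "row": row, "col": col, "size": size}
    half = size // 2                                       # mixed: subdivide
    return {"leaf": False, "row": row, "col": col, "size": size,
            "children": [build_quadtree(grid, row,        col,        half),
                         build_quadtree(grid, row,        col + half, half),
                         build_quadtree(grid, row + half, col,        half),
                         build_quadtree(grid, row + half, col + half, half)]}

grid = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [1, 1, 3, 3],
        [1, 1, 3, 4]]
tree = build_quadtree(grid)    # root of the tree; leaves are homogeneous squares
```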
b) Topological features
Topology is a mathematical procedure that describes how features are spatially related and ensures the quality of those spatial relationships. Topological relationships include the following three basic elements:
2.8.1. Connectivity
Arc-node topology defines connectivity: arcs are connected to each other if they share a common node. This is the basis for many network tracing and path-finding operations.
Arcs represent linear features and the borders of area features. Every arc has a from-node, which is the first vertex in the arc, and a to-node, which is the last vertex. These two nodes define the direction of the arc. Nodes indicate the endpoints and intersections of arcs. They do not exist independently and therefore cannot be added or deleted except by adding and deleting arcs.
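A minimal sketch of arc-node connectivity under these definitions; the arc table and node labels are purely illustrative:

```python
# Minimal sketch of arc-node topology: each arc stores its from-node and
# to-node; two arcs are connected if they share a common node.

arcs = {                       # arc id -> (from_node, to_node)  (illustrative data)
    1: ("N1", "N2"),
    2: ("N2", "N3"),
    3: ("N4", "N5"),
}

def connected(arc_a, arc_b):
    """True if the two arcs share at least one node."""
    return bool(set(arcs[arc_a]) & set(arcs[arc_b]))

print(connected(1, 2))   # True  - both use node N2
print(connected(1, 3))   # False - no common node
```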
2.8.2. Contiguity
Polygon topology defines contiguity. Two polygons are said to be contiguous if they share a common arc. Contiguity allows the vector data model to determine adjacency.
Polygon A is outside the boundary of the area covered by polygons B, C and D. It is called
the external or universe polygon, and represents the world outside the study area. The universe
polygon ensures that each arc always has a left and right side defined.
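A minimal sketch of how contiguity can be read from polygon topology, assuming each arc records the polygon on its left and on its right; the arc and polygon identifiers are illustrative, with "U" standing for the universe polygon:

```python
# Minimal sketch of polygon topology: two polygons are adjacent (contiguous)
# if some arc has one of them on its left and the other on its right.

arc_sides = {              # arc id -> (left_polygon, right_polygon)
    1: ("U", "B"),
    2: ("B", "C"),
    3: ("C", "D"),
    4: ("D", "U"),
}

def adjacent(poly_a, poly_b):
    return any({left, right} == {poly_a, poly_b}
               for left, right in arc_sides.values())

print(adjacent("B", "C"))   # True  - they share arc 2
print(adjacent("B", "D"))   # False - no common arc
```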
2.8.3. Containment
Geographic features cover distinguishable areas on the surface of the earth. An area is represented by one or more boundaries defining a polygon.
A lake, for example, actually has two boundaries: one defines its outer edge and the other (the island) defines its inner edge. An island defines the inner boundary of a polygon. Polygon D is made up of arcs 5, 6 and 7. The 0 before the 7 indicates that arc 7 creates an island in the polygon.
Polygons are represented as an ordered list of arcs rather than in terms of X,Y coordinates. This is called polygon-arc topology. Since arcs define the boundaries of polygons, arc coordinates are stored only once, thereby reducing the amount of data and ensuring that the boundaries of adjacent polygons do not overlap.
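A minimal sketch of polygon-arc topology using the arc numbering from the example above; the 0 marker convention follows the text, and the helper function is illustrative:

```python
# Minimal sketch of polygon-arc topology: a polygon is an ordered list of
# arc numbers rather than raw coordinates. Following the convention above,
# a 0 in the list signals that the next arc starts an island (inner ring).
# Arc geometry (coordinates) is stored once, in a separate table.

polygons = {
    "D": [5, 6, 0, 7],     # outer boundary = arcs 5 and 6; arc 7 is an island
}

def rings(arc_list):
    """Split a polygon's arc list into its outer ring and island rings."""
    result, current = [], []
    for arc in arc_list:
        if arc == 0:              # island marker: start a new ring
            result.append(current)
            current = []
        else:
            current.append(arc)
    result.append(current)
    return result

print(rings(polygons["D"]))       # [[5, 6], [7]]
```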
Line entities:
Linear features are made by tracing two or more XY coordinate pairs.
• Simple line: requires only a start and an end point.
• Arc: a set of XY coordinate pairs describing a continuous complex line. The shorter the line segments and the greater the number of coordinate pairs, the closer the chain approximates a complex curve.
Simple Polygons:
Enclosed structures formed by joining a set of XY coordinate pairs. The structure is simple, but it carries a few disadvantages, which are listed below:
• Lines between adjacent polygons must be digitized and stored twice; improper digitization gives rise to slivers and gaps.
• They convey no information about neighbours.
• Creating islands is not possible.
Because points can be placed irregularly over a surface, a TIN can have higher resolution in areas where the surface is highly variable. The model incorporates the original sample points, providing a check on the accuracy of the model. The information related to a TIN is stored in a file or a database table. Calculation of elevation, slope and aspect is easy with a TIN, but TINs are less widely available than raster surface models and more time consuming to construct and process.
The TIN model is a vector data model that is stored using relational attribute tables. A TIN dataset contains three basic attribute tables:
• Arc attribute table, which contains the length, from-node and to-node of every edge of every triangle.
• Node attribute table, which contains the x, y coordinates and z (elevation) value of the vertices.
• Polygon attribute table, which contains the areas of the triangles, the identification numbers of the edges and the identifiers of the adjacent polygons.
Storing data in this manner eliminates redundancy, because all the vertices and edges are stored only once even when they are used by more than one triangle (a minimal sketch of these tables follows). Because a TIN stores topological relationships, the datasets can be used in vector-based geoprocessing such as automatic contouring, 3D landscape visualization, volumetric design and surface characterization.
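A minimal sketch of the three attribute tables, with illustrative field values; note that a shared edge is stored once and referenced by both neighbouring triangles:

```python
# Minimal sketch of the three TIN attribute tables described above
# (field names and data are illustrative).

node_table = {                         # node id -> (x, y, z elevation)
    1: (0.0, 0.0, 10.0),
    2: (4.0, 0.0, 12.0),
    3: (2.0, 3.0, 15.0),
    4: (6.0, 3.0, 11.0),
}

arc_table = {                          # edge id -> (from_node, to_node, length)
    "e1": (1, 2, 4.0),
    "e2": (2, 3, 3.6),
    "e3": (3, 1, 3.6),
    "e4": (2, 4, 3.6),
    "e5": (4, 3, 4.0),
}

polygon_table = {                      # triangle id -> (area, edge ids, neighbours)
    "T1": (6.0, ("e1", "e2", "e3"), ("T2",)),
    "T2": (6.0, ("e2", "e4", "e5"), ("T1",)),
}

# Edge e2 is stored once but referenced by both triangles T1 and T2,
# which is how the model avoids redundancy.
```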
Data Model
The data model represents a set of guidelines for converting the real world (called entities) into digitally and logically represented spatial objects consisting of attributes and geometry. The attributes are managed by a thematic or semantic structure, while the geometry is represented by a geometric-topological structure.
Vector data model: a representation of the world using points, lines and polygons (shown in Figure 2.21). Vector models are useful for storing data that has discrete boundaries, such as country borders, land parcels and streets.
Raster data model: a representation of the world as a surface divided into a regular grid of cells. Raster models are useful for storing data that varies continuously, as in an aerial photograph, a satellite image, a surface of chemical concentrations or an elevation surface (a minimal sketch contrasting the two models follows).
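A minimal sketch contrasting the two data models, with purely illustrative coordinates and cell values:

```python
# Minimal sketch contrasting the vector and raster data models.

# Vector: discrete features described by coordinates.
point_feature   = (77.2, 28.6)                                  # a city centre
line_feature    = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]          # a road
polygon_feature = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]      # a land parcel

# Raster: a regular grid of cells, each holding a value
# (for example, elevation in metres).
elevation = [
    [102, 104, 107],
    [101, 103, 106],
    [ 99, 100, 102],
]
cell_size = 30      # metres on the ground represented by each cell
```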
Since the dawn of time, maps have used symbols to represent real-world features. In GIS terminology, real-world features are called spatial entities.
The cartographer decides how much the data need to be generalized in a map. This depends on the scale and on how much detail will be displayed in the map. The decision to use vector points, lines or polygons is governed by the cartographer and the scale of the map.
(1) Points
For example, at a regional scale city extents can be displayed as polygons, because this amount of detail can be seen when zoomed in. At a global scale, however, cities can be represented as points, because the detail of city boundaries cannot be seen.
Vector data are stored as pairs of XY coordinates (latitude and longitude) representing a point. Complementary information, such as a street name or date of construction, can accompany the point in an attribute table.
(2) Lines
Lines usually represent features that are linear in nature. Cartographers can use different line thicknesses to show the size of a feature. For example, a 500 metre wide river may be drawn thicker than a 50 metre wide river. Linear features can exist in the real world, such as roads or rivers, or they can be artificial divisions such as regional borders or administrative boundaries.
Points are simply pairs of XY coordinates (latitude and longitude). When you connect each point or vertex with a line in a particular order, they become a vector line feature.
Networks are line data sets, but they are often considered different because linear networks are topologically connected elements. They consist of junctions and turns with connectivity. If you were to find an optimal route using a traffic line network, it would follow one-way streets and turn restrictions to solve the analysis. Networks are just that smart.
(3) Polygons
Examples of polygons are buildings, agricultural fields and discrete administrative areas. Cartographers use polygons when the map scale is large enough for such features to be represented as polygons.
For example:
Each pixel value in a satellite image has a red, green and blue value. Alternatively, each
value in an elevation map represents a specific height. It could represent anything from
rainfall to land cover.
Raster models are useful for storing data that varies continuously. For example, elevation
surfaces, temperature and lead contamination.
In a discrete raster land cover/use map, you can distinguish each thematic class. Each class
can be discretely defined where it begins and ends. In other words, each land cover cell is
definable and it fills the entire area of the cell.
Discrete data usually consist of integers that represent classes. For example, the value 1 might represent urban areas, the value 2 forest, and so on (a minimal sketch follows).
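A minimal sketch of a discrete land cover raster and a class tally, assuming 30 m cells; the class codes and grid are illustrative:

```python
# Minimal sketch of a discrete land cover raster in which integer codes
# stand for classes (illustrative codes: 1 = urban, 2 = forest, 3 = water).

from collections import Counter

land_cover = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [3, 3, 2, 2],
]

counts = Counter(v for row in land_cover for v in row)
cell_area = 30 * 30                       # m^2 per cell, assuming 30 m cells
for code, n in sorted(counts.items()):
    print(f"class {code}: {n} cells, {n * cell_area} m^2")
```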
A continuous raster surface can be derived from a fixed registration point. For example, digital elevation models use sea level as a registration point: each cell represents a value above or below sea level. As another example, aspect cell values are referenced to fixed directions such as north, east, south or west.
Phenomena can also vary gradually along a continuous raster away from a specific source. A raster depicting an oil spill can show how the fluid moves from high concentration to low concentration: at the source of the spill the concentration is highest, and it diffuses outwards with values diminishing as a function of distance (a minimal sketch follows).
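A minimal sketch of such a distance-decay surface; the decay function and grid size are assumptions made purely for illustration:

```python
# Minimal sketch of a continuous raster in which values diminish with
# distance from a source cell, as in the oil spill example.

import math

nrows, ncols = 5, 5
source = (2, 2)                  # row, column of the spill source
peak = 100.0                     # concentration at the source

def concentration(r, c):
    d = math.hypot(r - source[0], c - source[1])   # distance in cell units
    return peak / (1.0 + d)                        # diminishes with distance (assumed form)

raster = [[round(concentration(r, c), 1) for c in range(ncols)]
          for r in range(nrows)]
for row in raster:
    print(row)
```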
In the end, it really comes down to the way in which the cartographer conceptualizes the
feature in their map.
• Do you want to work with pixels or coordinates? Raster data works with pixels; vector data consists of coordinates.
• What is your map scale? Vectors can scale objects up to the size of a billboard, but you do not get that type of flexibility with raster data.
• Do you have restrictions on file size? Raster file sizes can be larger than vector data sets representing the same phenomenon and area.
Disadvantages:
• The location of each vertex needs to be stored explicitly. For effective analysis, vector
data must be converted into a topological structure. This is often processing intensive
and usually requires extensive data cleaning. As well, topology is static, and any
updating or editing of the vector data requires re-building of the topology.
• Algorithms for manipulation and analysis functions are complex and may be processing intensive. Often, this inherently limits the functionality for large data sets, e.g. a large number of features.
• Continuous data, such as elevation data, is not effectively represented in vector form.
• Usually substantial data generalization or interpolation is required for these data
layers.
• Spatial analysis and filtering within polygons is impossible.
A TIN is a vector based representation of the physical land surface or sea bottom, made up
of irregularly distributed nodes and lines with three dimensional coordinates (x,y, and z) that are
arranged in a network of non-overlapping triangles. TINs are often derived from the elevation data
of a rasterized digital elevation model (DEM).
Edges:
Every node is joined with its nearest neighbours by edges to form triangles that satisfy the Delaunay criterion. Each edge has two nodes, but a node may have two or more edges. Because edges have a node with a z value at each end, it is possible to calculate the slope along the edge from one node to the other (a minimal calculation follows).
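A minimal sketch of the slope calculation along one edge, with illustrative node coordinates:

```python
# Minimal sketch of computing the slope along a TIN edge from the x, y, z
# coordinates of its two nodes.

import math

def edge_slope(node_a, node_b):
    """Return (slope_ratio, slope_degrees) between two (x, y, z) nodes."""
    ax, ay, az = node_a
    bx, by, bz = node_b
    run  = math.hypot(bx - ax, by - ay)      # horizontal distance
    rise = bz - az                           # difference in elevation
    ratio = rise / run
    return ratio, math.degrees(math.atan(ratio))

a = (0.0, 0.0, 100.0)
b = (30.0, 40.0, 110.0)          # 50 m horizontal run, 10 m rise
print(edge_slope(a, b))          # (0.2, ~11.3 degrees)
```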
TIN:
Advantages: the ability to describe the surface at different levels of resolution, and efficiency in storing data.
Disadvantages: in many cases the network requires visual inspection and manual control.
The TIN creates triangles from a set of points called mass points, which always become nodes. The user is not responsible for selecting them; all the nodes are added according to a set of rules. Mass points can be located anywhere; the more carefully they are selected, the more accurate the model of the surface will be. Well-placed mass points occur where there is a major change in the shape of the surface, for example at the peak of a mountain, the floor of a valley, or at the edge (top and bottom) of a cliff. By connecting points on a valley floor or along the edge of a cliff, a linear break in the surface can be defined. These are called break lines. Break lines can control the shape of the surface model.
They always form edges of triangles and, generally, cannot be moved. A triangle always has three and only three straight sides, making its representation rather simple. A triangle is assigned a unique identifier that is defined by its three nodes and its two or three neighbouring triangles.
A TIN is a vector-based topological data model that is used to represent terrain data. It represents the terrain surface as a set of interconnected triangular facets. For each of the three vertices, the XY (geographic location) and Z (elevation) values are encoded.
2.12. GRID/LUNR/MAGI
In this model, each grid cell is referenced or addressed individually and is associated with the identically positioned grid cells in all other coverages, rather like a vertical column of grid cells, each cell dealing with a separate theme. Comparisons between coverages are therefore performed one column at a time. Soil attributes in one coverage can be compared with vegetation attributes in a second coverage: each soil grid cell in one coverage can be compared with the vegetation grid cell in the second coverage. The advantage of this data structure is that it facilitates multiple-coverage analysis for single cells. However, it limits the examination of spatial relationships between entire groups or themes in different coverages (a minimal sketch follows).
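A minimal sketch of this column-by-column comparison across two coverages; the class codes are illustrative:

```python
# Minimal sketch of comparing identically positioned cells across two
# coverages, one vertical "column" (cell location) at a time.

soil = [                 # soil coverage: 1 = clay, 2 = loam (illustrative codes)
    [1, 1, 2],
    [2, 2, 2],
]
vegetation = [           # vegetation coverage: 1 = grass, 2 = forest
    [1, 2, 2],
    [1, 1, 2],
]

# Find all cells that are loam soil AND forest vegetation.
matches = [(r, c)
           for r in range(len(soil))
           for c in range(len(soil[0]))
           if soil[r][c] == 2 and vegetation[r][c] == 2]
print(matches)           # [(0, 2), (1, 2)]
```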
2.13.1. Standards
Most OGC standards depend on a generalized architecture captured in a set of documents collectively called the Abstract Specification, which describes a basic data model for representing geographic features. Atop the Abstract Specification, members have developed and continue to develop a growing number of specifications, or standards, to serve specific needs for interoperable location and geospatial technology, including GIS.
The OGC standards baseline comprises more than thirty standards.
Although the term "garbage in, garbage out" certainly applies to GIS data, there are other
important data quality issues besides the input data that need to be considered.
Position Accuracy
Position accuracy is the expected deviation of the geographical location of an object in the data set (e.g. on a map) from its true ground position. It is usually tested by selecting a specified sample of points in a prescribed manner and comparing their position coordinates with an independent and more accurate source of information. There are two components to position accuracy: the bias and the precision (a minimal computation of both follows).
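A minimal sketch of estimating bias and precision from a sample of test points compared against a more accurate source; the coordinates are illustrative:

```python
# Minimal sketch of the two components of position accuracy: the bias
# (mean error) and the precision (spread of errors about that mean).

import math

# (x, y) from the data set paired with (x, y) from a more accurate source
samples = [((100.0, 200.0), (100.4, 199.8)),
           ((150.0, 250.0), (150.2, 250.1)),
           ((300.0, 120.0), (300.5, 119.7))]

dx = [true_x - x for (x, _), (true_x, _) in samples]
dy = [true_y - y for (_, y), (_, true_y) in samples]

n = len(samples)
bias_x, bias_y = sum(dx) / n, sum(dy) / n                    # systematic shift
prec_x = math.sqrt(sum((d - bias_x) ** 2 for d in dx) / n)   # spread about the bias
prec_y = math.sqrt(sum((d - bias_y) ** 2 for d in dy) / n)

print(f"bias: ({bias_x:.2f}, {bias_y:.2f})")
print(f"precision (std dev): ({prec_x:.2f}, {prec_y:.2f})")
```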
Attribute Accuracy
Attributes may be discrete or continuous variables. A discrete variable can take on only a finite number of values, whereas a continuous variable can take on any value within a range. Categories such as land use class, vegetation type or administrative area are discrete variables. They are, in effect, ordered categories where the order indicates the hierarchy of the attribute.
Logical Consistency
Logical consistency refers to how well logical relations among data elements are maintained. It also refers to the fidelity of relationships encoded in the database; these may concern the geometric structure of the data model (e.g. topological consistency) or the encoded attribute information (e.g. semantic consistency).
(a) Completeness
Completeness refers to the exhaustiveness of the information in terms of spatial and
attribute properties encoded in the database. It may include information regarding feature
selection criteria, definitions and mapping rules, and the deviations from them. Tests of spatial completeness may be obtained from the topological tests used for logical consistency, whereas attribute completeness is tested by comparing a master list of geo-codes with the codes actually appearing in the database.
There are several aspects to completeness as it pertains to data quality. They are grouped
here into three categories: completeness of coverage, classification and verification.
The completeness of coverage is the proportion of data available for the area of interest.
Example:
Demographic information is usually very time sensitive. It can change significantly
over a year. Land cover will change quickly in an area of rapid urbanization.
(c) Lineage
The lineage of a data set is its history, the source data and processing steps used to
produce it. The source data may include transaction records, field notes etc. Ideally, some
indication of lineage should be included with the data set since the internal documents are
rarely available and usually require considerable expertise to evaluate. Unfortunately, lineage
information most often exists as the personal experience of a few staff members and is not
readily available to most users.
Accessibility refers to the ease of obtaining and using the data. The accessibility of a data
set may be restricted because the data are privately held. Access to government-held information
may be restricted for reasons of national security or to protect citizen rights. Census data are
usually restricted in this way. Even when the right to use restricted data can be obtained, the time
and effort needed to actually receive the information may reduce its overall suitability.
The direct cost of a data set purchased from another organization is usually well known: it
is the price paid for the data. However, when the data are generated within the organization, the
true cost may be unknown. Assessing the true cost of these data is usually difficult because the
services and equipment used in their production support other activities as well.
The indirect costs include all the time and materials used to make use of the data. When
data are purchased from another organization, the indirect costs may actually be more significant
than the direct ones.
It may take longer for staff to handle data with which they are unfamiliar, or the data
may not be compatible with the other data sets to be used.
In recent years, however, GIS data have most often been digitized from several sources, including hard-copy maps, rectified aerial photography and satellite imagery. Hard-copy
maps (e.g. paper, vellum and plastic film) may contain unintended production errors as well as
unavoidable or even intended errors in presentation. The following are "errors" commonly found
in maps.
Indistinct Boundaries
Indistinct boundaries typically include the borders of vegetated areas, soil types, wetlands
and land use areas. In the real world, such features are characterized by gradual change, but
cartographers represent these boundaries with a distinct line. Some compromise is inevitable.
Map Scale
Cartographers and photogrammetrists work to accepted levels of accuracy for a given map
scale as per National Map Accuracy Standards. Locations of map features may disagree with
actual ground locations, although the error likely will fall within specified tolerances. Of course,
the problem is compounded by limitations in linear map measurement, typically about 1/100th of an inch at map scale.
Map Symbology
It is impossible to perfectly depict the real world using lines, colors, symbols and patterns.
Cartographers work with certain accepted conventions. As a result, facts and features represented
on maps often must be interpreted or interpolated, which can produce errors. For example, terrain
elevations typically are depicted using topographic contour lines and spot elevations. Elevations
of the ground between the lines and spots must be interpolated. Also, areas symbolized as "forest"
may not depict all open areas among the trees.
A digitizer must accurately discern the centre of a line or point as well as accurately trace
it with a cursor. This task is especially prone to error if the map scale is small and the lines or
symbols are relatively thick or large. The method of digitizing curvilinear lines also affects
accuracy. "Point-mode" digitizing, for example, places sample points at selected locations along a
line to best represent it in a GIS. The process is subject to judgment of the digitizer who selects
the number and placement of data points. "Stream-mode" digitizing collects data points at a pre-set frequency, usually specified as the distance or time between data points. Every time an operator strays from an intended line, the point digitized at that moment will be inaccurate. This method also collects more data points than may be needed to faithfully represent a map feature. Therefore, post-processing techniques often are used to "weed out" unneeded data points (a minimal line-simplification sketch follows).
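A minimal sketch of such weeding using the Douglas-Peucker line-simplification algorithm (one common post-processing choice, not necessarily the one a given package uses); the coordinates and tolerance are illustrative:

```python
# Minimal sketch of "weeding out" stream-mode points with Douglas-Peucker:
# points closer to the chord than a tolerance are dropped.

import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    return num / math.hypot(bx - ax, by - ay)

def simplify(points, tolerance):
    if len(points) < 3:
        return points
    # find the point farthest from the chord joining the end points
    dists = [perpendicular_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    max_d = max(dists)
    if max_d <= tolerance:                      # nothing sticks out: keep ends only
        return [points[0], points[-1]]
    i = dists.index(max_d) + 1                  # split at the farthest point
    return simplify(points[:i + 1], tolerance)[:-1] + simplify(points[i:], tolerance)

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(simplify(line, tolerance=0.5))   # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```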
Heads-up digitizing often is preferred over table digitizing, because it typically yields
better results more efficiently. Keyed data entry of land parcel data is the most precise method.
Moreover, most errors are fairly obvious, because the source data usually are carefully computed
and thoroughly checked. Most keyed data entry errors show as obvious mismatches in the parcel
"fabric."
GIS software usually includes functions that detect several types of database errors. These
error-checking routines can find mistakes in data topology, including gaps, overshoots, dangling
lines and unclosed polygons. An operator sets tolerances that the routine uses to search for errors,
and system effectiveness depends on setting correct tolerances. For example, tolerances too small
may pass over unintentional gaps, and tolerances too large may improperly remove short dangling
lines or small polygons that were intentionally digitized.
The phrasing of spatial and attribute queries also may lead to errors. In addition, the use of Boolean operators can be complicated, and results can differ markedly depending on how a data query is structured or how a series of queries is executed. For example, the query "Find all structures within the 100-year flood zone" yields a different result than "Find all structures touching the 100-year flood zone." The former will find only those structures entirely within the flood zone, whereas the latter also will include structures that are partially within the zone (a minimal sketch follows).
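A minimal sketch of the two queries, assuming the shapely package is available; the geometries are illustrative:

```python
# Minimal sketch of the difference between "within" and "touching"
# (intersecting) flood-zone queries.

from shapely.geometry import Polygon, box

flood_zone = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

structures = {
    "house_a": box(2, 2, 4, 4),      # entirely inside the flood zone
    "house_b": box(8, 8, 12, 12),    # partially inside
    "house_c": box(20, 20, 22, 22),  # entirely outside
}

within   = [n for n, g in structures.items() if g.within(flood_zone)]
touching = [n for n, g in structures.items() if g.intersects(flood_zone)]

print(within)     # ['house_a']
print(touching)   # ['house_a', 'house_b']
```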
Dataset overlay is a powerful and commonly used GIS tool, but it can yield inaccurate
results. To determine areas suitable for a specific type of land development project, one may
overlay several data layers, including natural resources, wetlands, flood zones, land uses, land
ownership and zoning. The result usually will narrow the possible choices down to a few parcels
that would be investigated more carefully to make a final choice. The final result of the analysis
will reflect any errors in the original GIS data. Its accuracy only will be as good as the least
accurate GIS dataset used in the analysis.
It is also common to overlay and merge GIS data to form new layers. In certain
circumstances, this process introduces a new type of error: the polygon "sliver." Slivers often
appear when two GIS datasets with common boundary lines are merged. If the common elements
have been digitized separately, the usual result will be sliver polygons. Most GIS software
products offer routines that can find and fix such errors, but users must be careful in setting search
and correction tolerances.
Many errors can be avoided through proper selection and "scrubbing" of source data
before they are digitized. Data scrubbing includes organizing, reviewing and preparing the source
materials to be digitized. The data should be clean, legible and free of ambiguity. "Owners" of
source data should be consulted as needed to clear up questions that arise.
Data entry procedures should be thoroughly planned, organized and managed to produce
consistent, repeatable results. Nonetheless, a thorough, disciplined quality review and revision
process also is needed to catch and eliminate data entry errors. All production and quality control
procedures should be documented, and all personnel should be trained in these procedures.
Moreover, the work itself should be documented, including a record of what was done, who did it, when it was done, who checked it, what errors were found and how they were corrected.
To avoid misuse of GIS data and misapplication of analytical software, GIS analysts, including casual users, need proper training. Moreover, GIS data should not be provided without metadata indicating the source, accuracy and specifics of how the data were entered.