Etc - 10 - GIS DATA

This document discusses GIS data input and characteristics. It describes the main types of spatial and non-spatial data used in GIS, including data representation through variables like nominal, ordinal, interval, and ratio. Common data sources are also outlined, such as satellite imagery, maps, aerial photographs, surveys, and attribute data from tables. Typical data sets for GIS applications involving natural resource management and regional planning are identified.

Uploaded by

rishav baishya

MODERN SURVEYING TECHNIQUES

GIS – Data Input


Dear Viewers, in this session, I will discuss the
various aspects related to GIS data input and its
characteristics.
INTRODUCTION
• Geographic data in digital form are numerical
representations of the real world.
• It describes real-world features and phenomena
coded in specific ways in support of GIS and
mapping applications using the computer.
• The digital geographic data must be organized
as a geographic database.
• Roughly two-thirds of the total cost of
implementing a GIS goes into building the GIS
database, which must be accurate because it has a
significant impact on the usefulness of the GIS.
GIS DATA TYPES

Geographic data consists of

a) spatial data
b) non-spatial data.
SPATIAL DATA
• It gives information about the geometrical orientation,
shape and size of a feature, and its relative position
with respect to the position of other features.
• Spatial data is described by its x and y coordinates.
• The spatial data is normally available in analog form
as maps but now the maps are also available directly
in digital format.
• In GIS, the two forms of spatial data (analog and
digital) are handled differently.
• Normally the spatial and non-spatial data are stored
separately in a GIS, and links are established
between the two at the time of processing and
analysis.
NON-SPATIAL DATA
• The non-spatial data, also known as attribute data,
are information about various attributes like length,
area, population, acreage, etc.
• The non-spatial data describe the attributes of a
point, along a line, or in a polygon.
• In other words they describe what is at a point (e.g.,
a hospital), along a line (e.g., a canal), or in a
polygon (e.g., a forest).
• The attributes of a soil category may be depth of
soil, texture, type of erosion, or permeability.
• The non-spatial data, mostly available in tabular
form, are also converted into digital format for use
in GIS.
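The link between separately stored spatial and attribute data can be sketched as a join on a shared feature identifier. This is only an illustrative sketch: the feature IDs, field names, and values below are hypothetical and do not come from the lecture.

```python
# Hypothetical geometry records, keyed by feature ID.
spatial = {
    101: {"type": "point", "coords": [(3.0, 4.0)]},
    102: {"type": "line", "coords": [(0.0, 0.0), (5.0, 0.0)]},
}

# Hypothetical attribute table, keyed by the same feature ID.
attributes = {
    101: {"name": "hospital", "beds": 120},
    102: {"name": "canal", "length_km": 5.0},
}

def link(spatial, attributes):
    """Join geometry and attribute records that share a feature ID."""
    return {
        fid: {**geom, **attributes.get(fid, {})}
        for fid, geom in spatial.items()
    }

features = link(spatial, attributes)
print(features[101]["name"], features[101]["coords"])
```

The two stores stay independent; the join is made only when a query or analysis needs both the shape and the description of a feature.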
DATA REPRESENTATION
• Data are represented by different kinds of
variables, also known as scales of measurement, that
can be stored in a GIS. These variables are
(i) Nominal,
(ii) Ordinal,
(iii) Interval,
(iv) Ratio.
Nominal Variables
• Nominal Variables are used when the data are
principally classified into mutually exclusive sets
or levels based on relevant characteristics.
• The nominal variable is commonly used as a
measure for spatial data.
• It can be of two types, as below.

a) Dichotomous
b) Categorical
Dichotomous (Presence or absence)
• These data are mainly a logical definition of a data
characteristic, and are also referred to as Yes/No
data.
• They apply mainly where a data item is to be
classified into one of two categories.
• For example, a village may or may not have a
hospital; a city may or may not have an airport.
Categorical data
• These are used when it is required to classify the
data into one of several categories by name with
no specific order.
• Categories of land use such as residential area,
recreational area, and business area, or trees such
as Quercus agrifolia, Pinus coulteri, and Eucalyptus
calophylla, are different kinds of categorical
variables.
Ordinal Variables
• Ordinal Variables are lists of discrete classes but with
an inherent order or sequence.
• This representation of data is more sophisticated and
orderly as the classes are placed into some form of
rank order based on a logical property of magnitude.
• The ranking of data may be natural such as grades of
agricultural land, or according to some criteria, such
as population density.
• For example, classes of streams (first order,
second order, and so forth) and levels of education
(primary, secondary, college, post-graduate) are
ordinal variables since the discrete classes have a
natural sequence.
Interval Variables
• Interval variables also have a natural sequence,
but in addition, the differences between the
values are quantified.
• For example, the elevation of points is an interval
variable since the difference in elevation
between two points having elevations 55 m and
65 m is the same as for other points having
elevations 80 m and 90 m.
• The representation of population in the same order is
an example of interval variables.
Ratio Variables
• Ratio variables have the same characteristics as
interval variables, but in addition, they have a
natural zero or real origin (i.e., starting point).
• Per capita income and the fraction of the weight of a
soil sample that passes through a sieve are
common ratio variables.
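The four scales build on one another: each permits the comparisons of the scale before it and adds one more. The sketch below records that hierarchy in a small lookup; the operation labels are shorthand chosen for this example, not terms from the lecture.

```python
# The four measurement scales, from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# The one operation that each scale adds to those before it
# (labels are shorthand chosen for this sketch).
NEW_OPERATION = {
    "nominal": "equal / not equal",
    "ordinal": "greater / less than",
    "interval": "difference",
    "ratio": "ratio",
}

def allowed_operations(scale):
    """Return every operation that is meaningful for the given scale."""
    idx = SCALES.index(scale)
    return [NEW_OPERATION[s] for s in SCALES[: idx + 1]]

print(allowed_operations("ordinal"))
```

For instance, land-use names can only be tested for equality, grades of land can also be ranked, and only a ratio variable such as per capita income supports a statement like "twice as much".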
DATA SOURCES
a) Satellite Imagery
b) Existing Maps
c) Aerial Photographs and Digital Ortho-photographs
d) Attribute Data
e) Survey Data and Records
f) Other Sources
Satellite Imagery
• Remote sensing data in the form of satellite
imagery is an important element of the
organization of any GIS database as it makes
possible repetitive coverage of large areas.

• Satellite imagery can be used as a raster
backdrop on vector GIS data.
• Satellite images can support numerous GIS
applications including environmental impact
analysis, site evaluation for large facilities,
highway planning, development and monitoring
of environmental baselines, emergency and
disaster response, agriculture, and forestry.

• Satellite images are also useful for urban
planning and management.
• In addition to image analysis, satellite images are
used to generate thematic information resulting
in thematic maps.
Existing Maps
Paper maps are the most important source of data
for GIS. Maps of various scales, sizes, formats,
and time periods showing different features are
available for a large portion of the Earth, and these
are major sources of data for the GIS database.
The information available on a paper map is
converted into digital form by the process of
digitization for use in GIS.
Some countries, such as the U.S.A., also have
digital maps, which can be used directly in GIS
without going through the process of digitization.
Aerial Photographs and Digital Ortho-photographs
• Another major source of data for a GIS application
is the aerial photographs.
• Aerial photographs rectified for relief displacement
or radial distortions are known as ortho-photos.
An ortho-photo is geometrically equivalent to a
conventional line map, and represents planimetric
features on the ground in their true orthographic
positions.
• Due to this, ortho-photos possess the advantages of
line maps, such as the ability to make measurements
of distances, angles, and areas.
• However, ortho-photos, unlike line maps, also
contain the images of an infinite number of ground
objects, and therefore, most of the time they need
conversion into theme maps.
• With present-day computing power, storage
capacity, and speed, digital ortho-photos have
become commercially available.
• The digital ortho-photos provide all information of a
photograph, but at the same time allow the
registration of vector maps used in GIS.
Attribute Data
• Attribute data for a GIS are mainly tabular data
collected by sampling.

• The tabular data, which are tables consisting of rows
representing samples and columns representing
parameter values, can be incorporated into GIS as
relational tables.
Survey Data and Records
Some survey data and records about rock types,
soil types, elevation, population, and other
features are collected by the related national
agencies of a country and maintained in the form
of maps and tables.
These data can also be incorporated into a GIS.
Other Sources
Conventionally, terrain data can be obtained by
field surveying using grid levelling, stadia
tacheometry or other field surveying methods.
These methods have been replaced by new
generation surveying instruments, such as the
electronic tacheometer or total station, and the
Global Positioning System (GPS) for collecting
locational as well as attribute data.
Another source of GIS data could be the internet.
Almost all analog or digital data available for use
in a GIS may have limitations, and pose
problems while organizing the GIS database.
TYPICAL GIS DATA SETS
• By and large, the most common application of GIS
is the effective management of natural resources
and planning of regions at different levels.
• For such an activity a variety of data sets are
required.
• These data sets can be broadly grouped as below:
a) Natural resource data
b) Demographic data
c) Agro-economic data
d) Socio-economic data
e) Infrastructure data
Natural resource data
• land use
• crop type
• cropping area
• water bodies and drainage
• soil types
• forest types
• groundwater potentials
• mineral resources.
Demographic data
• population,
• age structure,
• sex ratio,
• urban and rural population,
• reserved caste population,
• occupational structure, and
• migration patterns.
Agro-economic data
• cropped and irrigated area,
• agricultural production,
• land holdings,
• livestock population,
• livestock produce,
• market and pricing information.
Socio-economic data
• industrial,
• fishing,
• tourism development,
• beneficiaries of various schemes and
programmes of development.
Infrastructure data
• various facilities, utilities and services, such as
• education,
• health,
• power,
• transport network,
• water supply,
• communication,
• general amenities, and
• drainage.
DATA ACQUISITION
• Data acquisition in GIS refers to all aspects of
collecting spatial data from all available sources
and converting them to a standard digital form.

• This requires tools such as an interactive computer
screen and mouse, digitizer, word processors
and spreadsheet programs, scanners, and
devices for reading data already written on
media such as magnetic tapes or CD-ROMs.
Source of Data
a) Remote Sensing
b) Existing Maps
c) Photogrammetry
d) Field Surveying Methods
e) GPS
f) Internet
REMOTE SENSING
• The terrain data acquired through sensors onboard
satellite platforms, being in digital format, can be
used directly after preprocessing for preparing a
GIS database.
• These data are coded in picture elements called
pixels, and stored in the form of a two-dimensional
matrix in which each cell contains merely a number
representing the amount of reflected electromagnetic
radiation received in a given band.
• The digital images must be located properly with
respect to a geodetic grid; otherwise the data they
contain cannot be related to their true ground
position.
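The relation between the pixel matrix and ground position can be sketched for the simplest case: a north-up image with square pixels and a known upper-left corner. The corner coordinates, the 30 m pixel size, and the band values below are hypothetical; real imagery may also need rotation and resampling.

```python
def pixel_to_ground(row, col, origin_x, origin_y, pixel_size):
    """Ground coordinates of the centre of pixel (row, col).

    Assumes a north-up image: x grows with column, y shrinks with row.
    origin_x, origin_y is the outer corner of the upper-left pixel.
    """
    x = origin_x + (col + 0.5) * pixel_size
    y = origin_y - (row + 0.5) * pixel_size
    return x, y

# A 2 x 3 band of hypothetical reflectance numbers.
band = [
    [12, 15, 40],
    [11, 90, 42],
]

# Hypothetical upper-left corner (easting, northing) and 30 m pixels.
x, y = pixel_to_ground(1, 1, origin_x=500000.0, origin_y=4100000.0, pixel_size=30.0)
print(band[1][1], "at", (x, y))
```

Without the origin and pixel size, the value 90 in the matrix is just a number; with them, it is tied to a position on the ground.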
EXISTING MAPS
• The acquisition of digital data by digitizing
existing maps is comparatively cheaper, and
requires less time compared to other methods.

• The elevation data extracted from contours have
poor accuracy compared to spot heights.
• On their own, the digitized contours cannot be used
for any useful GIS application other than regenerating
the original contours themselves.
• For the digitized contour data to be used for
digital terrain modeling, it is necessary to carry
out a series of post-digitizing processes whereby
sampled points are turned into a Triangulated
Irregular Network (TIN) or a Digital Elevation
Model (DEM).
• The digitization of paper maps is done using a
spatial data capturing device called a digitizer.
• Map data conversion by scanning and
vectorization is often referred to as screen
digitizing or heads-up digitizing to distinguish it
from conventional map data conversion using a
digitizer.
• This approach to digital data conversion is
capable of converting a large number of maps in
a relatively short period of time and at a cost
comparable to or lower than the conventional
method of map digitizing.
By Photogrammetry
• When the area of interest is too extensive or
too rugged, the photogrammetric method is
employed to collect the digital terrain data
using appropriate photogrammetric
instruments such as analog stereoplotter
equipped with encoders, analytical plotter, or
using digital photogrammetry methods.
• Analog stereoplotters provide 3D data from
aerial photographs.
• The digitization of 3D models of the terrain
formed by an analog stereoplotter is done by
equipping the stereoplotter with linear and
rotary encoders.
FIELD SURVEYING METHODS
• Terrain data in digital form can be obtained directly
by field surveying methods by employing instruments
such as an electronic tacheometer or total station, and
GPS.
• An electronic tacheometer or total station is capable of
electronically measuring both angles and distances,
and performing computations to obtain horizontal
distance, slope distance, difference in elevation,
coordinates, and elevation of points.
• These instruments are equipped with internal
memory or an external data recorder for temporary
storage of data, which are subsequently transferred
to a dedicated microcomputer or mainframe computer.
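The computations a total station performs can be sketched from a single observation of slope distance, zenith angle, and azimuth. This is a simplified sketch under stated assumptions: a flat-earth reduction with no curvature or refraction correction, and hypothetical station coordinates and heights.

```python
import math

def reduce_observation(station, slope_dist, zenith_deg, azimuth_deg,
                       instrument_height=0.0, target_height=0.0):
    """Reduce one shot to horizontal distance, height difference,
    and the coordinates of the observed point.

    station is (easting, northing, elevation); angles are in degrees.
    Curvature and refraction are ignored in this sketch.
    """
    z = math.radians(zenith_deg)
    az = math.radians(azimuth_deg)
    horizontal = slope_dist * math.sin(z)
    dh = slope_dist * math.cos(z) + instrument_height - target_height
    e0, n0, h0 = station
    easting = e0 + horizontal * math.sin(az)
    northing = n0 + horizontal * math.cos(az)
    return horizontal, dh, (easting, northing, h0 + dh)

# Hypothetical shot: 100 m, level line of sight, due east.
hd, dh, point = reduce_observation(
    (1000.0, 2000.0, 50.0), 100.0, 90.0, 90.0, 1.5, 1.5
)
print(round(hd, 3), round(dh, 3), [round(v, 3) for v in point])
```

With a level sight due east, the horizontal distance equals the slope distance and only the easting changes.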
GPS
• GPS is a satellite-based surveying system to
obtain highly accurate digital terrain data
electronically in the form of x, y, and z coordinates.
• There are two basic field methods of GPS
measurements: static and differential.
• Static GPS surveying is used to determine
positions of survey control points in areas where
geodetic control is lacking or unreliable.
• It is also used for accurate measurement of
distance between two points using two or more
dual-frequency GPS receivers to record the
observations made on GPS satellites
simultaneously for about six hours.
• Static GPS surveying is mainly employed for
establishing geodetic control and measuring
national and international networks, and not
intended for ordinary detail terrain data
acquisition.
• Differential GPS surveying is used to determine
the positions and heights of ground points by
making use of existing or newly established
control points.
• The differential GPS surveying may be
performed as a kinematic GPS surveying or
real-time GPS surveying.
INTERNET
• The Internet is a vast network of digital
computers.
• They are linked by an array of different data
transmission media such as satellite and radio
links, and fiber optic, unshielded twisted pair,
coaxial, and telephone communication lines.
• Connecting these media involves an array of
different devices ranging from complicated
routers and data switches, through simple signal
amplifying hubs to modems.
• The transfer of data is carried out by using a
standard coupling protocol known as TCP/IP
(Transmission Control Protocol/Internet Protocol).
• For GIS users the WWW provides data, and is a
source of information.
• Whole libraries of vector, raster, and object data
are being offered on the Internet as well as
directory information on different data sets.
• At the moment speed of data transmission and
Internet access are the main limitations of this
data resource.
• In my next session I will focus my attention on
data verification and editing for removal of errors.
THANK
YOU
MODERN SURVEYING TECHNIQUES

Lecture No 28

Lecture Name: Data verification and editing

Prof. S.K.Ghosh
Dept of Civil Engg
• Dear Viewers, in this session, I would like to
focus my attention on the verification of data
and removal of errors, if any.
DATA VERIFICATION & EDITING
• It is important to check the acquired data for errors
due to possible inaccuracies, omissions, and other
factors.
• The errors in the spatial data are generally
checked by printing the data or by taking its
computer plot, preferably on translucent or thin
paper, at the same scale as the original.
• The print out or computer plot is placed over the
original map.
• The two maps are compared visually and the
discrepancies in the form of missing data,
locational errors, and other errors, are clearly
marked on the print out.
• Checking of the attribute data is also done by
visual inspection of the print out.
• A better method of checking the attribute data is
to scan the data files with a computer program
that can locate the gross errors such as text
instead of numbers, numbers exceeding a given
range, and so on.
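A program of the kind described, one that scans attribute records for text in place of numbers and for values outside a given range, might look like the following sketch. The field name, sample rows, and range limits are hypothetical.

```python
def find_gross_errors(records, field, low, high):
    """Return (row_index, value, reason) for each suspicious value."""
    problems = []
    for i, rec in enumerate(records):
        raw = rec.get(field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # Text (or a missing value) where a number was expected.
            problems.append((i, raw, "not a number"))
            continue
        if not low <= value <= high:
            problems.append((i, raw, "out of range"))
    return problems

# Hypothetical attribute rows as they might be read from a data file.
rows = [
    {"id": 1, "elevation_m": "55"},
    {"id": 2, "elevation_m": "sixty"},
    {"id": 3, "elevation_m": "99999"},
]
errors = find_gross_errors(rows, "elevation_m", low=-500, high=9000)
print(errors)
```

A scan like this only catches gross errors; a plausible but wrong value still needs comparison against the source.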
• Errors may arise during the capturing of spatial
and attribute data in the following cases.
ERRORS IN SPATIAL DATA
a) Spatial data are incomplete or double
b) Spatial data are in the wrong place
c) Spatial data are defined using too many
coordinate pairs
d) Spatial data are at the wrong scale
e) Spatial data are distorted
Incomplete or double
• When the data are entered manually, incompleteness
in the spatial data may be due to omissions in the
input of points, lines, or cells.
• In case of scanned data, this error of omission is
usually in the form of gaps between lines where the
raster-vector conversion process fails to join up all
parts of a line.
• The raster-vector conversion of scanned data can
lead to the generation of unwanted spikes.
• Sometimes one line may be digitized twice, and lines
and nodes may be disjointed at the intersections.
In the wrong place
• Spatial data may have minor placement errors to
gross spatial errors due to mislocation of spatial
data.
• Minor placement errors are usually the result of
careless digitizing.
• Gross spatial errors are due to change of origin or
scale that occurs during digitizing, or as a result
of hardware or software faults.
Defined using too many coordinate pairs
• As a result of both digitizing and scanning
processes, lines in the database may be defined
using too many points, resulting in the use of large
storage space in a computer.
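One common way to thin an over-sampled line is the Douglas-Peucker algorithm. The lecture does not name a method, so this is offered only as an illustrative sketch; the sample line and the tolerance are hypothetical.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices closer than `tolerance` to the chord."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the end points.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest vertex and simplify each half separately.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

# Hypothetical digitized line: four nearly collinear points, then a bend.
line = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0), (5, 3)]
thinned = simplify(line, tolerance=0.1)
print(thinned)
```

The nearly collinear vertices are dropped while the bend is kept, so the stored line uses half as many coordinate pairs.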
Spatial data are at the wrong scale
• The digitization at the wrong scale results in
erroneous representation of spatial data.
• In case of scanned data, the problem usually
arises during the geo-referencing process using
incorrect values.
Data is distorted
• The spatial data may be distorted if the base
maps used for digitizing are not at the correct
scale.

• Usually aerial photographs do not have uniform
scale over the whole image because of relief
and tilt distortions, and sometimes due to
aberration in the lens properties.
• Paper maps may suffer from paper stretch, which
is usually greater in one direction than the other.
• In addition, paper maps and field documents
may contain random distortions as a result of
having been exposed to rain, sunshine, frequent
folding, etc.
• Transformation from one coordinate system to
another may also cause error in the spatial data.
• These errors are addressed through various
editing and updating functions supported by most
GIS software.
• Data editing is done visually by viewing the
portion of the map containing the error on the
computer monitor, and correcting them through
the software by using a keyboard, mouse, or
digitizer.
• Data scaling problems may be overcome by
applying simple numerical factors to the data.
• More complex rotating and translating operations
are needed when fitting various data sets
together such as a distorted thematic map to an
accurate base map.
• The faulty map should be corrected against the
base map, and a number of points on the original
map linked by vectors to their correct positions, as
shown in Fig., where T is the transformation
vector.
[Figure: original line and transformed line, linked by transformation vectors T]
Rubber sheeting
• Mathematical transformations stretch or
compress the original map until the linking
vectors have shrunk to zero length and the tie
points are registered with each other.
• It is then assumed that all the other points on the
original map have been relocated correctly. This
process is known as rubber sheeting or warping.
• This method cannot be applied directly on
rasterized data because of the rigidity of the fixed
grid and the structure of the data.
• Attribute values and spatial errors in raster data
are corrected by changing the value of the faulty
cells.
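The tie-point idea can be sketched in its simplest form: a single scale, rotation, and shift fixed by two tie points. This is not full rubber sheeting, which fits many tie points with local stretching; the complex-number formulation and the coordinates below are choices made for this example.

```python
def fit_similarity(src1, src2, dst1, dst2):
    """Scale, rotation and shift that map two source tie points onto
    their correct positions. Points are (x, y) tuples."""
    z1, z2 = complex(*src1), complex(*src2)
    w1, w2 = complex(*dst1), complex(*dst2)
    a = (w2 - w1) / (z2 - z1)   # scale (abs) and rotation (angle)
    b = w1 - a * z1             # translation
    return a, b

def apply_similarity(a, b, point):
    """Move any other point of the faulty map with the fitted transform."""
    w = a * complex(*point) + b
    return (w.real, w.imag)

# Hypothetical tie points: (0, 0) -> (10, 10) and (1, 0) -> (10, 12).
a, b = fit_similarity((0, 0), (1, 0), (10, 10), (10, 12))
moved = apply_similarity(a, b, (1, 1))
print(abs(a), moved)
```

Here the fitted transform doubles the scale and rotates by 90 degrees; every remaining point is assumed to follow the same transform, which is the same assumption rubber sheeting makes locally.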
GEOREFERENCING OF GIS DATA
• A spatial referencing system is required to
handle spatial information.

• The primary aim of a reference system is to
locate a feature on the Earth’s surface or a 2D
representation of this surface such as a map.
• A map portrays accurately real-world features
that occur on the curved surface of Earth.
• The objective of geo-referencing is to provide a
rigid spatial framework by which the positions of
the real-world features are measured, computed,
recorded, and analyzed in terms of length of a
line, size of an area, and shape of a feature.
• Several methods of geo-referencing exist, all of
which can be grouped into three categories as
under:
(i) Geographic Coordinate System,
(ii) Rectangular Coordinate System,
(iii) Non-Coordinate System.
GEOGRAPHIC COORDINATE SYSTEM
• The geographic coordinate system is the only
system that defines the true geographical
coordinates in terms of latitude and longitude.
• Positions on the Earth are defined on a reference
surface using latitude and longitude.
• As shown in Fig., lines of longitude (also known as
meridians) start at one pole and radiate outwards
until they converge at the opposite pole, while
lines of latitude lie at right angles to lines of
longitude and run parallel to one another.
[Figure: Geographical coordinate system, showing a point P, the meridian through P, the Greenwich meridian, the parallel of latitude passing through P, the equator, and the angles φ (latitude) and λ (longitude)]

RECTANGULAR COORDINATE SYSTEM
• Since most of spatial data available for use in
GIS exist in 2D form, a referencing system that
uses rectangular coordinates is most suited.
• This requires a map graticule or grid, placed on
top of the map.
• The graticule is obtained by projecting the lines
of latitude and longitude from our representation of
the world as a globe onto a flat surface using a
map projection.
• The function of a map projection is to define how
positions on the Earth’s curved surface are
transformed onto a flat map surface.
• There are several map projections, and a variety
of these are in common use since no single
projection can meet the requirements of all users.
Universal Transverse Mercator
• The simplest regular square grid is the
most widely used coordinate system for
small areas. For larger areas, certain
established cartographic projections such
as the Universal Transverse Mercator
Projection (UTM) are commonly used.
• This projection uses multiple cylinders that
touch the globe at 6º intervals of longitude,
dividing the globe into 60 projection zones,
avoiding the poles.
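The division into 60 zones of 6º each can be sketched as a small calculation from longitude to zone number and central meridian. The sketch assumes the standard numbering, with zone 1 starting at 180º W, and ignores the special zones defined around Norway and Svalbard; the function names are chosen for this example.

```python
def utm_zone(longitude_deg):
    """UTM zone number (1..60) for a longitude in degrees east.

    Zone 1 spans 180 W to 174 W. Longitude +180 wraps back to zone 1.
    """
    return int((longitude_deg + 180.0) // 6.0) % 60 + 1

def central_meridian(zone):
    """Longitude of the central meridian of a zone, in degrees east."""
    return -183.0 + 6.0 * zone

zone = utm_zone(77.2)          # a longitude of 77.2 E, for illustration
print(zone, central_meridian(zone))
```

A point at 77.2º E falls in zone 43, whose central meridian is 75º E.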
Universal Transverse Mercator Projection (UTM)
[Figure: Universal Transverse Mercator zones, showing 6º-wide longitude zones (Zone 1, Zone 2, and so on, starting at 180º W) and 8º latitude bands lettered C to X, from 80º S to 84º N]

• The UTM projection has been adopted by many
organizations for remote sensing, topographic
mapping, and natural resource inventory.
• Because the UTM is the most popular
coordinate system amongst map users, most of
the digital products in the United States are being
produced on the UTM projection.
• Presently, many GPS receivers offer
this coordinate system as an option, in effect
making it a de facto standard coordinate system
in the spatial data collection industry.
Non-coordinate System
• In non-coordinate system, spatial referencing is
done using descriptive codes rather than
coordinates.
• The widely used postal code, which is
appended to a postal address, is one
example of geo-referencing using codes.
• These codes may be completely numeric, such
as 267667 (PIN Code in India), or alphanumeric,
such as DL3 6KT (Postcode in UK).
• The basic purpose of such codes is to increase
the efficiency of mail sorting and delivery rather
than to be an effective spatial referencing system
for GIS.
Non-Coordinate System
(a) Numeric codes, e.g. 267667
• Postal office zone: 2
• Sub-zone and sorting district: 267
• Sorting office route: 2676
• Delivery post office no.: 267667
(b) Alphanumeric codes, e.g. DL3 6KT
• Postal area code: DL
• Postal district code: DL3
• Postal sector code: DL3 6
• Unit post code: DL3 6KT
• This system has the following advantages:
(i) Provides coverage of all areas where people
reside and work.
(ii) Individual codes do not refer to a single address.
(iii) Provides a degree of confidentiality for data
released using this as a referencing system.
SPATIAL DATA ERRORS
• It is essential that the GIS products are high
quality products.
• This can be achieved by making use of quality
data with minimum errors.
• The data to be used in a GIS may have some
inherent errors and some errors are produced by
the system while working with the data.
• It is important for GIS users to document the
limitations of their source data and the
subsequent output generated.
a) Data Quality and Errors
b) Errors, Accuracy and Precision
c) Bias
d) Completeness, Compatibility, Consistency and
Applicability
Data Quality and Errors
• The term quality describes the overall fitness of a
data set, while errors are discussed in terms of their
sources, propagation, and management.
• Examining issues such as errors, accuracy,
precision, and bias helps in assessing the quality
of data sets.
Errors, Accuracy and Precision
• The errors in a data set refer to the faults including
the statistical concept of error meaning variation.
• Accuracy is defined as the degree of conformity or
closeness of a data value to its true value.
• An accurate GIS database represents the real world.
• On the other hand, precision is the degree of
conformity or closeness of repeated observed values
of the same quantity in the data set to each other.
• It is perfectly feasible to have a GIS data set that is
highly accurate but not very precise, and vice versa.
• Computers store data with a high level of precision,
which does not imply that the data set has a high
level of accuracy.
Bias
• Bias in GIS data indicates the systematic
variation of data from reality, and can be referred
to as a consistent error throughout a data set.
• For example, a badly calibrated digitizer
introduces a consistent overshoot in digitized
data.
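The difference between accuracy, precision, and bias can be illustrated with repeated observations of one quantity whose true value is known. In this sketch the mean error stands for bias, the standard deviation for precision, and the root-mean-square error for overall accuracy; these statistics and the sample values are choices made for this example.

```python
import statistics

def describe_errors(observed, true_value):
    """Summarise repeated observations of one quantity.

    Returns (bias, spread, rmse):
      bias   - mean error, the systematic offset from the true value
      spread - standard deviation of the observations (precision)
      rmse   - root-mean-square error against the true value (accuracy)
    """
    errors = [v - true_value for v in observed]
    bias = statistics.mean(errors)
    spread = statistics.pstdev(observed)
    rmse = statistics.mean([e * e for e in errors]) ** 0.5
    return bias, spread, rmse

# Hypothetical readings of a 100.0 m distance from a miscalibrated device.
bias, spread, rmse = describe_errors([102.0, 102.1, 101.9, 102.0], 100.0)
print(round(bias, 3), round(spread, 3), round(rmse, 3))
```

These readings agree closely with each other (precise) yet all sit about 2 m from the true value (biased, hence not accurate).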
Resolution
• Resolution is an important concept when
dealing with spatial data.
• It is dependent on the scale of the original map,
the point size and line width of the features
represented thereon, and the precision of
digitizing.
Completeness, Compatibility, Consistency and Applicability
• A GIS data set should be complete
spatially and temporally, including with
respect to attribute information.
• Different data sets in the GIS database
should be compatible to produce a
sensible result.
• Compatibility can be ensured by using
similar methods of data capture, storage,
manipulation, and editing in producing
ideal data sets.
• Consistency should also be observed within
individual data sets.
• Inconsistency may arise within a data set when
different sections of a data set come from
different sources or different people digitize them.

• Applicability or suitability of a data set is viewed
in terms of a set of commands, operations or
analysis.
Source of Errors

Errors in GIS data may be categorized as:


(i) Conceptual errors,
(ii) Source data errors,
(iii) Data encoding errors,
(iv) Data editing and conversion errors,
(v) Data processing and analysis errors, and
(vi) Data output errors.
CONCEPTUAL ERRORS
• Conceptual errors arise from our understanding of the
real world and how it is modeled.
• The perception of reality varies from person to person,
and this affects the data.
• Whatever GIS model is adopted, it is a simplification
of the reality, and thus any simplification will introduce
errors of generalization, completeness, and
consistency.
SOURCE DATA ERRORS
• GIS spatial and attribute data collected from various
sources are likely to include errors.

• The errors in survey data can be due to observational
errors, instrumental errors, and personal errors,
whereas the data collected by remote sensing and
aerial photography can have errors due to wrong
spatial referencing, and mistakes in classification and
interpretation.
• The temporal changes in the features also
introduce errors due to time and date of data
acquisition.

• Maps, probably the most frequently used sources
of data, contain errors in both spatial as well as
attribute data caused by human or equipment
failings.
• The cartographic process used in map-making
introduces subtle errors in maps.
DATA ENCODING ERRORS
• The process of transferring the data collected
through maps, remote sensing, or ground
surveys, into a GIS format is referred to as data
encoding.
• Data encoding is probably the greatest source of
error.
• Digitizing a map is one of the processes of data
encoding in GIS.
• It can be done manually or automatically using
suitable computer hardware.
• Despite the availability of hardware for automatic
digitization, most of the digitizing work is done
manually, involving human judgment and
limitations, which is one of the main sources of
error in GIS.
• Translating a continuously curved line on a map
into a digital image involves a sampling process.
• Out of infinite number of points on the line, a very
small number of points along a curve are
sampled due to human limitations.
• Representation of a curved shape by nodes is
shown in Fig.
[Figure: Digitizing a line, comparing the original line with the digitized line and showing nodes, an overshoot (dangling line), an undershoot, and a spike]
• Errors are also introduced due to incorrect
registration of the map document before digitizing
commences, in both manual and automatic digitizing.
• Raster scanners used for automatic digitizing suffer
from resolution problems.
EDITING AND CONVERSION ERRORS

• Since the data input by either manual or
automatic digitizing is never without errors, it
will almost always require editing and cleaning.
• It is difficult to locate the errors precisely
and remove them, but many of them can be
removed by careful scrutiny of data.
• Fig. shows some of the common errors in
digitizing and effect of editing and cleaning
using automated procedures available in some
of the vector GIS.
• The tolerance limit set for cleaning the errors
has its own effect on cleaning depending on its
value.
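The effect of the tolerance limit can be sketched with a single cleaning operation: snapping a dangling endpoint to the nearest node if it lies within the tolerance. The function, coordinates, and tolerance values below are hypothetical and are not taken from any particular GIS package.

```python
import math

def snap_endpoint(endpoint, nodes, tolerance):
    """Move a dangling endpoint onto the nearest node within `tolerance`.

    Returns the endpoint unchanged if no node is close enough.
    """
    nearest = min(nodes, key=lambda n: math.dist(endpoint, n))
    if math.dist(endpoint, nearest) <= tolerance:
        return nearest
    return endpoint

nodes = [(0.0, 0.0), (10.0, 0.0)]
undershoot = (9.8, 0.1)   # should have met the node at (10, 0)
spur_end = (5.0, 2.0)     # an intentional dead end

for tol in (0.1, 0.5, 6.0):
    print(tol, snap_endpoint(undershoot, nodes, tol),
          snap_endpoint(spur_end, nodes, tol))
```

With a tolerance of 0.1 the unintentional undershoot remains, with 0.5 it is closed while the intentional dead end survives, and with 6.0 the intentional dead end is also pulled onto a node.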
[Figure: Editing and conversion errors. (a) Too small tolerance limit: an unintentional unclosed gap and an unintentional dangling line remain, alongside the intentional small polygon and dangling line. (b) Too large tolerance limit: the intentional small polygon and the intentional gap are removed.]

• While dealing with a raster GIS, a different
problem arises when using the automated
techniques for cleaning.
• In raster GIS, the noise, also referred to as
misclassification of cells, can either have a regular
form, which is easier to identify, or be scattered
randomly, in which case it is difficult to locate.
• The noise errors are rectified by employing filters,
which reclassify single cell or a group of cells by
matching them with general trends in the data.
• It is important to choose an appropriate filtering
method as the wrong method may remove
genuine variations in the data or retain too much
of the noise.
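One simple filter of this kind is a majority (modal) filter, which reclassifies each cell to the most common class in its 3 x 3 neighbourhood. The lecture does not prescribe a particular filter, so this is only an illustrative sketch with a hypothetical grid of class codes.

```python
from collections import Counter

def majority_filter(grid):
    """Replace each cell by the most common class in its 3 x 3 window.

    Edge cells use the part of the window that lies inside the grid.
    Ties go to the class met first when scanning the window.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out

# Hypothetical classified raster: forest (F) with one stray water (W) cell.
classified = [
    ["F", "F", "F"],
    ["F", "W", "F"],
    ["F", "F", "F"],
]
cleaned = majority_filter(classified)
print(cleaned)
```

The stray cell is absorbed into the surrounding class. The same filter would also erase a genuine one-cell pond, which is the risk of removing real variation noted above.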
SPATIAL DATA ERRORS
• After cleaning and editing, it may be required to
convert vector data to raster data or vice versa.
• When raster data is converted into vector,
topological ambiguities as shown in Fig. are
introduced.
• When converting vector data into raster data, both
size of the raster cell and the method of
rasterization used have important implications for
positional error and, in some cases, attribute
uncertainty.
TOPOLOGICAL ERRORS IN CONVERSION
[Figure: three panels showing the original vector map, the vector map converted to raster, and the raster converted back to vector, with polygons labelled A, B and C]
• Positional and attribute errors resulting from
generalization are found as classification errors in
cells along the vector polygon boundary (stepped
appearance of the raster version when compared
to the original vector form).
• Besides topological errors, conversion from
vector to raster may lead to loss of small
polygons and to a different raster map, due to
incorrect placement of the grid for rasterization with
respect to orientation and origin.
PROCESSING & ERROR ANALYSIS
• Before processing and analysis of GIS data are taken
up, the GIS users must ensure the following:
(i) The data are suitable and relevant for analysis.
(ii) The data sets are compatible.
(iii) The technique to be employed is appropriate.
• GIS data processes that can introduce errors are
mainly classification of data, aggregation or
disaggregation of area data, and data integration by
overlay techniques.
• A common error arising from overlaying two
polygon maps is the creation of slivers.
• These are very small polygons along correlated or
shared boundaries of the two input maps.
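Slivers can often be recognised by being both small and thin. The sketch below flags a polygon when its area and its compactness (4πA/P², which equals 1 for a circle) both fall below thresholds. The thresholds and the polygons are hypothetical, and this heuristic is only one of several that could be used.

```python
import math

def area_and_perimeter(polygon):
    """Shoelace area and perimeter of a simple polygon of (x, y) vertices."""
    area2, perimeter = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter

def is_sliver(polygon, max_area, max_compactness):
    """Flag small, thin polygons using area and 4*pi*A / P**2."""
    area, perimeter = area_and_perimeter(polygon)
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return area <= max_area and compactness <= max_compactness

# Hypothetical overlay output: a long thin strip and a genuine field.
sliver = [(0, 0), (100, 0), (100, 0.5), (0, 0.5)]
field = [(0, 0), (100, 0), (100, 80), (0, 80)]

print(is_sliver(sliver, max_area=100, max_compactness=0.1))
print(is_sliver(field, max_area=100, max_compactness=0.1))
```

The strip along the shared boundary is flagged, while the field is not. Flagged polygons would still need review before deletion or merging, since a narrow feature may be genuine.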
[Figure: Sliver polygons created in vector overlay. A forest polygon from the original forest map, digitized at time t1 and again at time t2, produces sliver polygons after the overlay operation.]

DATA OUTPUT ERRORS
• Due to inaccuracies in the GIS database
and errors resulting in manipulation and
analysis of data, it is inevitable that all GIS
output, whether in the form of paper maps
or digital database, will have inaccuracies,
the extent depending on attention and care
taken at all stages starting from
construction, manipulation, and analysis of
GIS database.
• In my next session, I will focus my attention on
GIS data models by which data can be stored in
a GIS environment.
THANK
YOU
