Fundamentals of Geographical Information System
BGE II/II
Chapter 4 Data Sources [4 hrs]
1. Sources of Spatial Data
2. Spatial Data Input
3. Data Quality and Standards
4. Major Data Feeds
5. Data Formats
6. Metadata
Asst. Prof., Er. Bikash Sherchan
Sources of Spatial Data
Spatial data – various sources
- Two categories: primary and secondary
Data collection – the most time-consuming and expensive, yet important, task in GIS
Primary Sources:
Collected from scratch
Using spatial data acquisition techniques
In-Situ or Remote sensing
In-situ data – ground-based, collected by people or special instruments (e.g., gauges/sensors/receivers)
- precipitation, temperature, wind, etc.
- social, economic and demographic data
- important characteristic – the location of the observation/instrument is known (e.g., from GPS)
Remote sensing
Usually not fit for immediate use – many sources of errors
and distortions exist
A large portion of GIS data is extracted by analyzing remotely sensed data, e.g., land use, land cover, building footprints, transportation and utility networks, DTM, etc.
Although a primary source, remotely sensed data usually come from other sources (data providers)
Remote sensing instruments – cameras, multispectral and
hyperspectral scanners, thermal-infrared detectors, RADAR
and LiDAR sensors
Platforms – airplanes, helicopters and UAVs
SONAR (sound navigation and ranging) – on ships and submarines for bathymetric surveys
Secondary Sources:
Huge numbers of historical and current maps, aerial photographs, diagrams and other types of geospatial information exist in hard-copy format
Valuable resource – needs to be handled carefully
- often one-of-a-kind
- originals are very fragile
Techniques have been developed to turn historical maps/photographs into digital data
- can then be related to other geospatial information in a GIS
- digitization
Digitizing – 3 methods
- Digitizing tables or tablets with a hand-held cursor or electronic pen
- Heads-up (on-screen) digitization with a cursor or electronic pen
- Raster scanning
Spatial Data Input and Editing
Data encoding – the process of getting data into the computer (Heywood et al., 2011)
Fundamental process in almost every GIS project, e.g.:
Archeology – encoding aerial photographs of ancient remains to integrate them with newly collected field data
Planning – digitizing outlines of new buildings or roads and plotting them on existing topographical data
Ecology – adding new remotely sensed data to a GIS to examine changes in habitats
and many more
Data input – normally takes place before the data are structured or analyzed
Data input depends on the characteristics of the data and the way they are to be modelled
Once fed into the GIS, data always need to be corrected and manipulated to make sure that they can be structured according to the required data model
Activities addressed at this stage are:
Re-formatting of data (e.g., conversion of postal code to grid reference)
Re-projection of data from different sources to
a common projection
Generalization of complex data to provide a
simpler data set
Matching and joining of adjacent sheets once
the data are in digital form
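Re-projection to a common projection can be illustrated with the forward spherical Web Mercator (EPSG:3857) equations. This is a minimal sketch, assuming the input is WGS 84 longitude/latitude in decimal degrees and that Web Mercator has been chosen as the common projection; a real project would normally use a projection library (e.g., PROJ) and a projection suited to the study area. The function names are hypothetical.

```python
import math

# Web Mercator (EPSG:3857) treats the Earth as a sphere with the WGS 84 semi-major axis
R_WEB_MERCATOR = 6378137.0  # metres
MAX_LAT = 85.0511287798  # latitude at which the projected map becomes square


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project one longitude/latitude pair (degrees) to Web Mercator x, y (metres)."""
    if abs(lat_deg) > MAX_LAT:
        raise ValueError("latitude outside the Web Mercator range")
    x = R_WEB_MERCATOR * math.radians(lon_deg)
    y = R_WEB_MERCATOR * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def reproject_layer(points):
    """Re-project every (lon, lat) vertex of a layer so that it can be overlaid
    with other layers already held in Web Mercator."""
    return [lonlat_to_web_mercator(lon, lat) for lon, lat in points]


# Example: two vertices given in geographic coordinates
print(reproject_layer([(0.0, 0.0), (85.3240, 27.7172)]))
```

Every layer passed through the same function ends up in the same coordinate system, which is the precondition for overlaying data from different sources.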
[Figure: Summary of data encoding methods]
Data Streaming (Heywood et al., 2011):
Range of methods to get data into GIS
(keyboard entry, digitizing, scanning and
electronic data transfer)
Data editing and manipulation include re-projection, transformation, edge matching, georeferencing, etc.
The whole process of encoding and editing is called data streaming
[Figure: Possible encoding methods for different data sources (Heywood et al., 2011)]
Data Editing (Heywood et al., 2011):
Problems during data encoding: error-free data are unlikely
Errors may be derived from the original source, or
may be introduced during the encoding process
Errors may occur in coordinate data, or
as inaccuracy and uncertainty in attribute data
It is better to intercept errors before they propagate to higher levels of information
This process is also known as data cleaning
Data editing is covered under the following headings:
Detection and correction of errors
Re-projection, transformation and generalization
Edge matching and rubber sheeting
Updating the spatial databases
Detecting and Correcting Errors
Errors in input data derive from 3 main sources
Errors in source data – errors in the paper map, printing errors, etc. – difficult to identify
Errors during encoding – typing mistakes, digitizing the wrong line, folds and stains mistaken for geographical features
Errors propagated during transfer/conversion – conversion between the different formats required by different software packages may lead to loss of data
Errors in attribute data – relatively easy to identify by manual comparison with the original
Example: a road coded as a river
Various other methods as well, e.g.:
Impossible values – values falling outside the valid range are incorrect
Extreme values – e.g., elevation values greater than the height of Mt. Everest
Internal consistency – totals, means, minima and maxima should agree with those of the original data
Scattergram – errors in variables that are correlated with each other can be detected with a scatterplot
Trend surface – values that depart significantly from the general trend are identified and corrected
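Several of the checks above can be automated. The sketch below is a minimal illustration, assuming elevations in metres, an assumed valid range from 0 m (a lower bound chosen for a hypothetical study area) to the height of Mt. Everest (about 8,849 m), and a first-order (planar) trend surface fitted by ordinary least squares. The function names and the 2-standard-deviation threshold are illustrative choices, not a standard.

```python
from statistics import pstdev

EVEREST_M = 8849.0  # approximate height of Mt. Everest (m), used as an upper bound


def impossible_values(values, low, high):
    """Indices of values falling outside the valid range [low, high]."""
    return [i for i, v in enumerate(values) if not low <= v <= high]


def totals_consistent(values, expected_total, tol=1e-6):
    """Internal consistency: encoded values should add up to the published total."""
    return abs(sum(values) - expected_total) <= tol


def _solve3(a, b):
    """Solve a 3 x 3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def trend_outliers(xs, ys, zs, k=2.0):
    """Fit a first-order trend surface z = a + b*x + c*y by least squares and
    return indices of values departing more than k standard deviations from it."""
    rows = [(1.0, x, y) for x, y in zip(xs, ys)]
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    atz = [sum(row[i] * z for row, z in zip(rows, zs)) for i in range(3)]
    a, b, c = _solve3(ata, atz)
    residuals = [z - (a + b * x + c * y) for x, y, z in zip(xs, ys, zs)]
    sd = pstdev(residuals)
    if sd < 1e-12:  # perfect fit: nothing departs from the trend
        return []
    return [i for i, e in enumerate(residuals) if abs(e) > k * sd]


elevations = [120.0, -5.0, 9000.0, 450.0]
print(impossible_values(elevations, 0.0, EVEREST_M))

grid = [(x, y) for y in range(3) for x in range(3)]
zs = [100 + 2 * x + 3 * y for x, y in grid]
zs[4] += 50  # a wrongly keyed value at (1, 1)
print(trend_outliers([p[0] for p in grid], [p[1] for p in grid], zs))
```

Flagged values are only candidates; as the slide says, they still have to be checked against the source before being corrected.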
Errors in spatial data are more difficult to
detect compared to errors in attribute data
Errors in spatial data take different forms depending on the data model (vector or raster) and the method of data capture, e.g.:
A rain gauge station may be wrongly located
A land use/land cover boundary may be wrongly delineated
A railway line may have been erroneously digitized as a road, etc.
[Figure: Common errors in spatial data (vector)]
Problems in raster data – as in vector data, missing entities and noise
Missing entities – data are difficult to collect in restricted areas or because of obstacles (environmental or cultural), e.g., airports, rainy days, snow-covered areas, etc.
Noise – present in the data itself or introduced during processing
Scattered pixels – pixels whose characteristics do not conform to those of the neighboring pixels
Can be removed by filtering
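One common filter for scattered pixels is a majority (mode) filter: a cell is replaced by the value held by a strict majority of its neighbors. The sketch assumes a categorical raster stored as a list of lists and a 3 × 3 neighborhood; the function name is hypothetical, and other filters (e.g., median or sieve filters) are also used in practice.

```python
from collections import Counter


def majority_filter(grid):
    """Return a copy of a categorical raster in which each cell is replaced by the
    value held by a strict majority of its (up to 8) neighbors, if such a value
    exists and differs from the cell's own value."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if not neighbors:  # a 1 x 1 raster has nothing to compare with
                continue
            value, count = Counter(neighbors).most_common(1)[0]
            if count > len(neighbors) / 2 and value != grid[r][c]:
                out[r][c] = value
    return out


land = [[1] * 5 for _ in range(5)]
land[2][2] = 7  # a single noisy pixel inside a homogeneous class
print(majority_filter(land))
```

Requiring a strict majority keeps genuine class boundaries intact while isolated pixels are absorbed by their surroundings.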
Re-projection, transformation and generalization
Once encoded and edited, data must be processed geometrically so that they share a common reference frame
The scale and resolution of the data sources are also important and must be taken into account when combining data from different sources
Grid systems may have different origins, units of measurement and orientations – data must be transformed onto a common grid system
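Bringing a local grid onto a common grid system amounts to correcting the three differences named above: units, orientation and origin. This is a minimal sketch, assuming the unit factor, rotation angle and origin offsets are already known; in practice they are usually estimated from control points, often with an affine transformation. The function and parameter names are hypothetical.

```python
import math


def to_common_grid(x, y, origin_e, origin_n, unit_to_m, rotation_deg):
    """Convert local grid coordinates (x, y) to a common grid.

    1. scale: convert local units to metres (e.g., 0.3048 for feet)
    2. rotate: turn the local axes counter-clockwise by rotation_deg
    3. translate: shift by the local origin's position (origin_e, origin_n)
       in the common grid
    """
    xs, ys = x * unit_to_m, y * unit_to_m
    t = math.radians(rotation_deg)
    xr = xs * math.cos(t) - ys * math.sin(t)
    yr = xs * math.sin(t) + ys * math.cos(t)
    return origin_e + xr, origin_n + yr


# A point 10 ft along the local x axis; local grid rotated 90 degrees,
# local origin at (1000 m, 2000 m) in the common grid
print(to_common_grid(10.0, 0.0, 1000.0, 2000.0, 0.3048, 90.0))
```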
Data from large-scale maps are generalized
- a generalized map is comparable to a small-scale map
- saves processing time and disk space
Several routines are available in GIS packages
Simplest technique – delete points along a line at a fixed interval, e.g., every third point
- the original shape of the feature may be lost
Raster data generalization – aggregate or amalgamate cells with the same attribute values
- a more appropriate approach is filtering
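Both generalization approaches can be sketched in a few lines. This assumes lines stored as lists of (x, y) vertices and rasters as lists of lists; "delete every third point" is read here as dropping vertices 3, 6, 9 and so on while always keeping the two end points, and raster cells are amalgamated by taking the most frequent value in each block. The function names are hypothetical; GIS packages usually also offer line simplification routines that preserve shape better.

```python
from collections import Counter


def drop_every_nth(points, n=3):
    """Delete every n-th vertex (indices n, 2n, ...) of a line, always keeping
    the first and last vertex."""
    last = len(points) - 1
    return [p for i, p in enumerate(points) if i in (0, last) or i % n != 0]


def aggregate_majority(grid, factor):
    """Amalgamate factor x factor blocks of a categorical raster into single
    cells holding the most frequent value of each block."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c] for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out


line = [(float(i), float(i % 2)) for i in range(10)]
print(drop_every_nth(line))
print(aggregate_majority([[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 3, 1], [3, 1, 1, 1]], 2))
```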
Edge matching and rubber sheeting
Dealing with multiple map sheets – mismatches between adjacent sheets need to be resolved
Each map sheet is digitized separately; adjacent sheets are joined after editing, re-projection, transformation and generalization
This process is known as edge matching
First, mismatches at sheet boundaries are resolved
- such problems are also seen in maps derived from multiple satellite images
Second, in the case of vector data, topology must be reconstructed
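The first step of edge matching, resolving mismatches along a sheet boundary, is often done by snapping together line ends from the two sheets that fall within a tolerance of each other. This is a minimal sketch, assuming the line ends touching the shared boundary have already been extracted from each sheet, that both sheets use the same projected coordinate system, and that a matched pair is moved to its mean position. The greedy nearest-neighbor pairing and the function name are illustrative assumptions; rebuilding topology afterwards is not shown.

```python
import math


def edge_match(ends_a, ends_b, tolerance):
    """Pair each line end from sheet A with the nearest unused line end from
    sheet B within `tolerance`; return (index_a, index_b, snapped_point)
    triples, where the snapped point is the mean of the two ends."""
    matches = []
    used_b = set()
    for i, (ax, ay) in enumerate(ends_a):
        best_j, best_d = None, tolerance
        for j, (bx, by) in enumerate(ends_b):
            if j in used_b:
                continue
            d = math.hypot(ax - bx, ay - by)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used_b.add(best_j)
            bx, by = ends_b[best_j]
            matches.append((i, best_j, ((ax + bx) / 2, (ay + by) / 2)))
    return matches


# Hypothetical line ends on the east edge of sheet A and the west edge of sheet B
sheet_a = [(100.0, 10.0), (100.0, 50.0)]
sheet_b = [(100.4, 10.6), (100.0, 80.0)]
print(edge_match(sheet_a, sheet_b, tolerance=1.0))
```

Line ends left unmatched (here the one at y = 50) are exactly the cases an operator has to inspect manually.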
[Figure: Edge matching and rubber sheeting]
Some data contain internal distortions within individual map sheets
e.g., in aerial photographs – due to aircraft movement and camera lens distortion
Rectified through rubber sheeting (conflation)
- stretching the map in various directions as if it were drawn on a rubber sheet
- points with accurate positions are fixed as control points, while the other points, with wrong coordinates, are stretched to fit them
Can also be used for re-projection
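Rubber sheeting can be implemented in several ways; GIS packages often use a piecewise transformation over a triangulation of the control points. The sketch below swaps in a simpler inverse-distance-weighted (IDW) displacement: each control point moves exactly to its target, and every other point is shifted by a distance-weighted average of the control-point shifts. The control-point format, the power parameter and the function name are assumptions for illustration.

```python
import math


def rubber_sheet(points, controls, power=2.0):
    """Move points by an inverse-distance-weighted average of control-point shifts.

    controls: list of ((source_x, source_y), (target_x, target_y)) pairs.
    A point that coincides with a control point is moved exactly to its target.
    """
    if not controls:
        raise ValueError("at least one control point is required")
    result = []
    for x, y in points:
        sum_w = sum_dx = sum_dy = 0.0
        exact = None
        for (sx, sy), (tx, ty) in controls:
            d = math.hypot(x - sx, y - sy)
            if d == 0.0:
                exact = (tx, ty)
                break
            w = 1.0 / d ** power  # nearer control points pull harder
            sum_w += w
            sum_dx += w * (tx - sx)
            sum_dy += w * (ty - sy)
        if exact is not None:
            result.append(exact)
        else:
            result.append((x + sum_dx / sum_w, y + sum_dy / sum_w))
    return result


# Hypothetical control points: the digitized sheet is shifted 1 unit west of the truth
controls = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 0.0), (11.0, 0.0)),
            ((0.0, 10.0), (1.0, 10.0)), ((10.0, 10.0), (11.0, 10.0))]
print(rubber_sheet([(5.0, 5.0), (0.0, 0.0)], controls))
```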
[Figure: Edge matching and rubber sheeting]
Updating and maintaining spatial databases
It is important to keep data up to date
Dynamic world: places and things change over time
Spatial data can go out of date and need regular updating
Data Quality and Standards
Data Quality
Measures how good the data are
Describes the overall fitness/suitability of data for a specific purpose
Examines issues such as error, accuracy, precision and bias
as well as the resolution and generalization of the source data
Also deals with completeness, compatibility, consistency and applicability for analysis
Data Quality - Errors
Flaws in data – physical differences from the real world
Can be single, definable departures from reality, e.g., the coordinate pair of one bank ATM entered incorrectly, or
persistent, widespread deviations, e.g., a systematic error in which the coordinates of all ATMs are entered as (y, x) instead of (x, y)
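A (y, x) swap like the ATM example can often be caught by testing points against the known extent of the study area. This is a minimal sketch, assuming longitude/latitude in decimal degrees and an approximate bounding box for Nepal (about 80.0 to 88.3° E, 26.3 to 30.5° N) chosen purely for illustration; a point is flagged when it lies outside the box but would lie inside it with its coordinates swapped. The function names and ATM coordinates are hypothetical.

```python
# Approximate bounding box (xmin, ymin, xmax, ymax) of Nepal in decimal degrees;
# illustrative values only
NEPAL_BOX = (80.0, 26.3, 88.3, 30.5)


def inside(x, y, box):
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def find_swapped(points, box):
    """Indices of (x, y) points outside the box that would be inside if swapped."""
    return [i for i, (x, y) in enumerate(points)
            if not inside(x, y, box) and inside(y, x, box)]


def diagnose(points, box):
    """Classify the swap pattern as 'none', 'isolated' or 'systematic'."""
    swapped = find_swapped(points, box)
    if not swapped:
        return "none"
    return "systematic" if len(swapped) == len(points) else "isolated"


atms = [(85.32, 27.70), (27.68, 85.31), (84.00, 28.20)]  # hypothetical ATM locations
print(find_swapped(atms, NEPAL_BOX), diagnose(atms, NEPAL_BOX))
```

An isolated hit points to a single keying error; a systematic result suggests the whole column order was reversed during encoding and can be fixed in one step.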
Sources of Error in GIS
Spatial and attribute errors can occur at any stage of a GIS project
The best way to identify them is to examine them within the context of a typical GIS project
Errors from Conceptualization
Originate from how we perceive, understand and model reality – conceptual errors
Our perception of reality influences how reality is defined and how spatial data are used
Inconsistencies among data collected by
different surveyors
Use of different spatial data models to represent reality (raster, vector, TIN, etc.)
All of these have limitations in portraying reality
Errors in Source Data
Variety of data sources: Survey data, remotely
sensed data, map data, etc.
All are likely to include errors
Human mistakes in device operation or recording
observations
Technical problems with the device/equipment
Examples:
Recording features incorrectly
GPS receiver or leveling machine malfunctioning
Wrong spatial referencing
Mistakes in interpretation and classification
Clouds and shadows obscuring details of interest
Generalization
Errors in Data Encoding
Probably the greatest source of error in most GIS projects
Digitizing (both manual and automatic) – source map errors or operational errors
Requires correct registration of the original map document
In scanning, the cell size is determined by the resolution of the scanner
Always requires editing and cleaning
Errors in Data Editing and Conversion
The last line of defence against errors before the data are used for analysis
Impossible to spot and remove all errors
Many problems can be eliminated by careful examination of the data
Vector GIS software contains automated routines to check and build topology, e.g., detecting open polygons, dangling lines (overshoots), etc.
In raster GIS – noise appears as randomly scattered cells, which can be removed by filtering
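Automated topology checks for dangling lines work by counting how many line ends meet at each node: a node reached by only one line end is a candidate overshoot or undershoot. This is a minimal sketch, assuming lines are lists of (x, y) vertices, that coincident nodes can be matched after rounding to a fixed number of decimals, and that intersections in the interior of lines are ignored. The function name is hypothetical.

```python
from collections import Counter


def dangling_nodes(lines, ndigits=6):
    """Return end nodes touched by exactly one line end (candidate dangles).

    lines: list of polylines, each a list of (x, y) vertices. Coordinates are
    rounded to `ndigits` decimals so that nearly coincident nodes count as one.
    """
    counts = Counter()
    for line in lines:
        for x, y in (line[0], line[-1]):
            counts[(round(x, ndigits), round(y, ndigits))] += 1
    return sorted(node for node, n in counts.items() if n == 1)


# A closed square boundary plus a short segment overshooting past the corner (10, 10)
network = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.0, 0.0), (10.0, 10.0)],
    [(10.0, 10.0), (0.0, 10.0)],
    [(0.0, 10.0), (0.0, 0.0)],
    [(10.0, 10.0), (12.0, 12.0)],
]
print(dangling_nodes(network))
```

Flagged nodes still need review, because genuine dead ends (e.g., cul-de-sacs in a road network) are also dangling.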
Vector-to-raster conversion – the cell size and the method of rasterization matter
Causes positional error and, in some cases, attribute uncertainty
Smaller cell size – greater precision – reduced classification error (a form of attribute error)
Generalization also introduces positional and attribute errors
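The effect of cell size on vector-to-raster conversion can be shown by rasterizing a polygon with the cell-center rule (a cell belongs to the polygon if its center falls inside) and comparing the resulting area with the true area. The sketch assumes this cell-center rule, an even-odd point-in-polygon test and a hypothetical right triangle with a true area of 5,000 m²; other rasterization rules (e.g., dominant area) produce different errors. The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterized_area(polygon, cell, xmin, ymin, xmax, ymax):
    """Area covered by cells whose centers fall inside the polygon."""
    ncols = int(round((xmax - xmin) / cell))
    nrows = int(round((ymax - ymin) / cell))
    count = 0
    for r in range(nrows):
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            cy = ymin + (r + 0.5) * cell
            if point_in_polygon(cx, cy, polygon):
                count += 1
    return count * cell * cell


triangle = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]  # true area 5000 m^2
for cell in (10.0, 1.0):
    area = rasterized_area(triangle, cell, 0.0, 0.0, 100.0, 100.0)
    print(cell, area, abs(area - 5000.0))
```

The coarse raster misrepresents the sloping edge with a staircase of large cells, so its area error is much larger than that of the fine raster.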
[Figure: Errors in data editing and conversion – classification error]
[Figure: Errors in data editing and conversion – loss of connectivity, loss of information]
Errors in Data Processing and Analysis
Inappropriate phrasing of spatial queries
Overlaying maps having different coordinate
systems
Combining maps having attributes measured in
incompatible units
Combining maps from different sources (widely different map scales)
Classification of data, aggregation or
disaggregation of area data
Data Quality - Accuracy
Extent to which an estimated value
approaches its true value (Aronoff, 1991)
Accurate data – a true representation of reality
It is impossible for spatial data to be 100% accurate
Accuracy within a specified tolerance is possible
e.g., the location of an ATM may be accurate to within a 10 m radius
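Checking whether a position is accurate within a stated tolerance comes down to measuring the distance between the recorded and the true (reference) location. This is a minimal sketch, assuming geographic coordinates in decimal degrees and a spherical Earth (haversine formula with a mean radius of 6,371,008.8 m), which is adequate for metre-level tolerances over short distances. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(recorded, true, tolerance_m=10.0):
    """True if the recorded (lat, lon) lies within tolerance_m of the true location."""
    return haversine_m(*recorded, *true) <= tolerance_m


true_atm = (27.70000, 85.32000)
print(within_tolerance((27.70005, 85.32000), true_atm))  # about 5.6 m away
print(within_tolerance((27.70010, 85.32000), true_atm))  # about 11.1 m away
```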
Data Quality - Precision
Precision – the exactness of measurements
Also refers to the number of decimal places, e.g., temperature measured at 1-degree intervals or at half-degree intervals: which one is more precise?
Coordinates measured to 12 decimal places versus coordinates expressed to 3 decimal places
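What decimal places mean on the ground depends on the coordinate units. Assuming coordinates in decimal degrees and a spherical Earth, the sketch below converts one unit in the last decimal place into a ground distance: 3 decimal places resolve about 111 m north-south, whereas 12 decimal places imply sub-micrometre detail that no field instrument can deliver, so extra digits alone do not make data more accurate. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical approximation
M_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # roughly 111.2 km


def last_digit_step_m(decimals, lat_deg=0.0):
    """Ground distance (m) represented by one unit in the last decimal place of a
    decimal-degree coordinate: (north-south step, east-west step at lat_deg)."""
    step_deg = 10.0 ** -decimals
    north_south = step_deg * M_PER_DEGREE
    east_west = north_south * math.cos(math.radians(lat_deg))  # meridians converge
    return north_south, east_west


for d in (3, 12):
    print(d, last_digit_step_m(d, lat_deg=27.7))
```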
Data Quality – Accuracy vs Precision
A data set may be highly accurate but not precise, or vice versa
A – high accuracy, low precision
B – low accuracy, high precision
C – low accuracy, low precision
D – high accuracy, high precision
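The A to D categories can be expressed numerically for repeated measurements of one point: accuracy as the offset of the mean position from the true position (bias), and precision as the spread of the measurements around their own mean. This is a minimal sketch, assuming projected coordinates in metres and a single hypothetical tolerance applied to both properties; the thresholds and function names are illustrative, not a standard.

```python
import math
from statistics import mean, pstdev


def bias_and_spread(measurements, true_point):
    """Return (bias, spread) for repeated (x, y) measurements of one point."""
    xs = [p[0] for p in measurements]
    ys = [p[1] for p in measurements]
    bias = math.dist((mean(xs), mean(ys)), true_point)  # accuracy
    spread = math.hypot(pstdev(xs), pstdev(ys))  # precision
    return bias, spread


def classify(bias, spread, tolerance):
    """Map bias/spread onto the A-D categories listed above."""
    accurate = bias <= tolerance
    precise = spread <= tolerance
    if accurate and precise:
        return "D"
    if accurate:
        return "A"
    if precise:
        return "B"
    return "C"


truth = (0.0, 0.0)
scattered = [(5.0, 0.0), (-5.0, 0.0), (0.0, 5.0), (0.0, -5.0)]
clustered_off = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
print(classify(*bias_and_spread(scattered, truth), tolerance=1.0))
print(classify(*bias_and_spread(clustered_off, truth), tolerance=1.0))
```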
Data Standards
Spatial data standards – methods for
structuring, describing, and delivering
spatially referenced data
Categorized into four areas: media
standards, format standards, accuracy
standards and documentation standards
All are important – the latter two are more complex than the first two
Media standards -
Physical form in which data are transferred
Specific formats for CD-ROM, magnetic tape,
optical or solid state storage or some proprietary
drive or other media type
Standardized formats are specified by the International Organization for Standardization (ISO)
Format standards -
Specify data components and structures
Establish the number of files used to store a spatial data set, including the basic components to be contained in each file
The order, size and range of values of the data elements contained in each file are defined
Information such as spacing, variable types and file encoding may be included
Aid in transferring data between different
computer hardware and software
Accuracy standards -
Document the quality of the positional and
attribute values
Knowledge of data quality is crucial for the effective use of GIS
Field sampling is expensive – collecting additional data requires additional funds
With limited time and funds, users are tempted to use lower-quality data
Accuracy standards ensure that spatial data quality is documented in a well-defined, established manner
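One widely used way of documenting positional accuracy is to compare the coordinates of well-defined check points with an independent source of higher accuracy and report the root-mean-square error (RMSE). The sketch below follows the horizontal formula of the FGDC National Standard for Spatial Data Accuracy (NSSDA), accuracy at 95 % confidence = 1.7308 × RMSE_r, which assumes RMSE_x and RMSE_y are approximately equal. The check-point values and function names are hypothetical.

```python
import math


def horizontal_rmse(measured, reference):
    """RMSE in x, in y and radial (r) from paired check-point coordinates."""
    n = len(measured)
    if n == 0 or n != len(reference):
        raise ValueError("need the same non-zero number of measured and reference points")
    sx = sum((m[0] - r[0]) ** 2 for m, r in zip(measured, reference))
    sy = sum((m[1] - r[1]) ** 2 for m, r in zip(measured, reference))
    rmse_x, rmse_y = math.sqrt(sx / n), math.sqrt(sy / n)
    return rmse_x, rmse_y, math.hypot(rmse_x, rmse_y)


def nssda_horizontal_accuracy(rmse_r):
    """NSSDA horizontal accuracy at 95 % confidence (valid when RMSE_x ~ RMSE_y)."""
    return 1.7308 * rmse_r


reference = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
errors = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
measured = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(reference, errors)]
rx, ry, rr = horizontal_rmse(measured, reference)
print(rx, ry, rr, nssda_horizontal_accuracy(rr))
```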
Documentation standards -
Define how spatial data are described
An agreed-upon way of describing the source, development and form of spatial data
Ensure a complete description of the data origin, methods of development, accuracy and delivery formats
Allow those who maintain or use the data to assess its appropriateness for an intended task
National and International Standards
Standards organizations at national and international levels define and maintain geospatial standards, e.g.:
The Federal Geographic Data Committee (FGDC) – USA
The International Organization for Standardization (ISO)
The Open Geospatial Consortium (OGC), e.g., Web Map Service (WMS) standards
Metadata: Data Documentation
Special type of non-geometric data
Simply defined as ‘data about data’
Essential for the efficient use of spatial data
Describes the content, quality, methods, developer, coordinate system, extent, structure, spatial accuracy, attributes and authority
Allows users to evaluate the suitability of data for their intended use
Provides a record of changes or modifications that have been made
Some metadata are derived automatically by the software, e.g., length, area, extent of data, count, etc.
Others must be collected explicitly, e.g., owner name, quality, original source, etc.
Explicitly collected metadata can be
entered in the same way as other
attributes
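The split between automatically derived and explicitly entered metadata can be sketched as follows. This assumes polygon features stored as lists of (x, y) vertices in a projected coordinate system; feature count, extent and total area (shoelace formula) are computed from the geometry, and explicitly supplied fields are merged in. The field names and values are hypothetical and do not follow any particular metadata standard.

```python
def polygon_area(ring):
    """Shoelace formula; ring is a list of (x, y) vertices without repeating the first."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def derive_metadata(polygons):
    """Metadata the software can compute from the geometry itself."""
    xs = [x for ring in polygons for x, _ in ring]
    ys = [y for ring in polygons for _, y in ring]
    return {
        "feature_count": len(polygons),
        "extent": (min(xs), min(ys), max(xs), max(ys)),
        "total_area": sum(polygon_area(ring) for ring in polygons),
    }


def build_metadata(polygons, **explicit):
    """Combine derived metadata with explicitly entered fields (owner, source, ...)."""
    record = derive_metadata(polygons)
    record.update(explicit)
    return record


parcels = [
    [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    [(20.0, 0.0), (30.0, 0.0), (30.0, 5.0), (20.0, 5.0)],
]
print(build_metadata(parcels, owner="Example Agency", source="hypothetical map sheet"))
```

Because the derived fields are recomputed from the geometry, they stay correct after edits, whereas the explicit fields must be updated by hand whenever the data change.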
Once created, data travel through a network – they are transformed, modified and used for many different applications
and retransmitted to another user, then to another, and so on
It is therefore important to document any change made to a dataset by updating its associated metadata
Standard methods need to be established for reporting metadata
In the US, the Federal Geographic Data Committee (FGDC) has defined a Content Standard for Digital Geospatial Metadata (CSDGM)
There are 10 basic types of information in
the CSDGM:
1) Identification, describing the data set,
2) Data quality,
3) Spatial data organization,
4) Spatial reference coordinate system,
5) Entity and attribute,
6) Distribution and options for obtaining the
data set,
7) Currency of metadata and responsible party,
8) Citation,
9) Time period information, used with other
sections to provide temporal information, and
10) Contact organization or person.