GISBook

This document provides an overview of geographic information systems (GIS). It discusses the history and development of GIS, provides definitions of GIS from various authors, lists the objectives of GIS, and outlines the key components of a GIS which include hardware, software, data, methods, and personnel. The document serves as an introductory chapter to GIS.

Geographic Information System (1st Edition)
Book · July 2012
Authors: Mohamed Hashim Mohamed Rinos and Mohammed Ibrahim Mohamed Kaleel, South Eastern University of Sri Lanka


Chapter 01

OVERVIEW OF GIS
1.1. Introduction

In the twenty-first century, fast-growing trends in computer technology, information systems and the virtual world allow us to obtain data about the physical and cultural worlds, and to use these data for research or to solve practical problems. Current digital and analog electronic devices facilitate the inventory of resources and the rapid execution of arithmetic and logical operations. These information systems are steadily improving, and they can create, manipulate, store and use spatial data far faster than conventional methods.

An information system, a collection of data and tools for working with those data, contains data in analog or digital form about phenomena in the real world. Our perception of the world through selection, generalization and synthesis gives us information, and the representation of this information, that is, the data, constitutes a model of those phenomena. The collection of data, the database, is thus a physical repository of varied views of the real world, representing our knowledge at one point in time. Information is derived from the individual data elements in a database; it is not always directly apparent, but is produced from data by our thought processes and intuition, based on our knowledge. Therefore, in a database context, the terms data, information and knowledge are differentiated. In summary, data is the essential raw material, and value is added as we progress from data to information to knowledge. The data, which has many origins and forms, may be any of the following:

1. Real data (terrain conditions)


2. Captured data (recorded digital data from remote sensing satellites or aerial photographs of an area)
3. Interpreted data (landuse from remote sensing data)
4. Encoded data (recordings of rain-gauge data, depth of well data)
5. Structured or organized data (tables about conditions of particular watershed)

1.2. Developments of Information Systems

Geographic information management technology encompasses many fields, including computer science, cartography, information management, telecommunications, geodesy, photogrammetry and remote sensing, and is applied in engineering, environmental analysis, land-use planning, natural resource development, infrastructure management, and many other areas. Geographic information management technology has almost as many names and acronyms as uses.

One common name is Geographic Information System (GIS). Another is automated mapping/facilities management (AM/FM). Although GIS has recently become more widely accepted as the generic term for the technology, the term Geographic Information System was first published in a Northwestern University discussion paper by Michael Dacey and Duane Marble in 1965. Key terms associated with geographic information management technology include:

 Automated Mapping (A.M.)


 Computer Assisted or Computer Aided Mapping (CAM)
 Computer Aided Drafting (CAD)
 Computer Aided Drafting and Design (CADD)
 Geographic Information System
 Automated Mapping/ Facility Management (AM/FM)
 Geo-processing and Network Analysis
 Land information System
 Multipurpose Cadastre

All these terminologies are often used interchangeably even though they
denote different capabilities and concepts.

1.3. History of GIS

The history of GIS dates back to the 1960s, when computer-based GIS came into use, though the corresponding manual procedures had existed for a hundred years or more. The initial developments originated in North America, with organizations such as the US Bureau of the Census, the US Geological Survey, the Harvard Laboratory for Computer Graphics, and the Environmental Systems Research Institute (commercial). The Canadian Geographic Information System (CGIS) in Canada, and the Natural Experimental Research Center (NREC) and the Department of Environment (DOE) in the U.K., were notable organizations involved in early developments. Developments in other parts of the world are also noticed, but they have taken place in the more recent past. In India, the major developments have happened over the last decade, with significant contributions coming from the Department of Space, emphasizing GIS applications for natural resources management. Recently, commercial organizations have realized the importance of GIS for many applications such as infrastructure development, facility management and business/market applications.

1.4. Definitions of GIS

The definitions of a GIS given by various authors are as follows:

 “A spatial data handling system" (Marble et al., 1983).

 “A computer - assisted system for the capture, storage, retrieval, analysis and
display of spatial data within a particular Organization" (Clarke, 1986).

 “A powerful set of tools for collecting, storing, retrieving at will, transforming


and displaying spatial data from the real world" (Burrough, 1987).

 “An internally referenced, automated, spatial information system" (Berry,


1986).

 "A system which uses a spatial data base to provide answers to queries of a
geographical nature", (Goodchild, 1985).

 “A system for capturing, storing, checking, manipulating, analyzing and


displaying data which are spatially referenced to the Earth” (DOE, 1987:132)

 “Any manual or computer based set of procedures used to store and


manipulate geographically referenced data”( Aronoff, 1989:39)

 “An institutional entity, reflecting an organizational structure that integrates


technology with a database, expertise and continuing financial support over
time”(Carter, 1989:3)

 “An information technology which stores, analyses and display both spatial
and non – spatial data”(Parker,1988:1547)

 “A special case of information systems where the database consists of


observations on spatially distributed features, activities, or events, which are
definable in space as points, lines or areas. A GIS manipulates data about
these points, lines and areas to retrieve data for ad hoc queries and
analyses”(Dueker,1979:106)

 “A database system in which most of the data are spatially indexed, and upon
which a set of procedures operated in order to answer queries about spatial
entities in the database”(Smith,1987:13)

 “An automated set of functions that provides professionals with advanced


capabilities for the storage, retrieval, manipulation and display of
geographically located data”(Ozemoy, Smith and Sicherman, 1981:92)

 “A powerful set of tools for collecting, storing, retrieving at will, transforming


and displaying spatial data from the real world”(Burrough, 1986:6)

 “A decision support system involving the integration of spatially referenced


data in a problem solving environment”(Cowen, 1988:1554)

 “A system with advanced geo – modeling capabilities”(Koshkariov, Tikunov
and Trofimov, 1989:259)

 “A form of MIS (Management Information System) that allows map display


of the general information”(Devine and Field, 1986:18)

Although the above definitions cover a wide range of subjects and activities, all refer to geographical information. GIS is sometimes also termed a Spatial Information System, as it deals with located data, for objects positioned in any space, not just geographical space. Similarly, the term 'non-spatial data' is often used as a synonym for attribute data (i.e. rainfall, temperature, soil chemical parameters, population data, etc. are referred to as attribute data).

1.5. Objectives of GIS

1. Maximize the efficiency of planning and decision making
2. Provide an efficient means for data distribution and handling
3. Eliminate redundant databases and minimize duplication
4. Integrate information from many sources
5. Perform complex analyses and queries involving geographically referenced data to generate new information

1.6. Components of a GIS

A GIS comprises five elements: hardware, software, data, methods and liveware (people) (Figure 1.1).

Figure1.1: Components of GIS

Table 1.1: Details of the different elements of a GIS

Hardware
    Computer platforms: modest personal computers, high-performance workstations, minicomputers, mainframe computers
    Input devices: scanners, digitizers, tape drives, CD drives, keyboard, graphic monitor
    Output devices: plotters, printers

Software
    Input modules, editing, map manipulation/analysis modules, modeling capability

Data
    Attribute data, spatial data, remote sensing data, global databases

Liveware
    People responsible for digitizing, implementing and using the GIS

1.7. GIS Data

Geographical data comprises primarily two types: spatial data and non-spatial data. Spatial data is data that has physical dimensions and geographic location on the surface of the earth (a river, a state boundary, a lake, a state capital, etc.). Attribute or non-spatial data is data that describes some aspect of a spatial feature not specified by its geometry alone.

1.8. Representation of Spatial Information

Geographical features are depicted on a map by points, lines and polygons.

Point feature: a discrete location depicted by a special symbol or label; a single x, y coordinate pair.

Line feature: represents a linear feature; a set of ordered x, y coordinates.

Polygon feature: an area feature whose boundary encloses a homogeneous area.
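As a sketch, the three feature types above reduce to plain coordinate data; the coordinate values below are illustrative, not taken from any real map:

```python
# Illustrative sketch of the three vector feature types described above.
# A point is a single (x, y) pair; a line is an ordered list of pairs;
# a polygon is a closed ring whose first and last vertices coincide.

point = (81.7, 7.3)                                 # a single x, y coordinate
line = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.5)]         # ordered x, y coordinates
polygon = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]  # boundary ring, closed

def is_closed(ring):
    """A polygon boundary must end where it starts."""
    return ring[0] == ring[-1]

print(is_closed(polygon))  # True
```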

Figure 1.2: Elements of a map

1.9. Representation of Attribute information

Attribute information consists of textual descriptions of the properties associated with geographical entities. Attributes are stored as a set of numbers and characters in the form of a table. Many attribute data files can be linked together through the use of a common identifier code.

1.10. Topology

Geographic data describes objects in terms of their location, their attributes and their spatial relationships with each other. Topology is a mathematical procedure that determines the spatial relationships of features.

Some of the advantages of topology are: the polygon network is fully integrated; storage is optimal and redundant information is reduced; neighbours are identified; and a polygon within a polygon can be represented.
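A minimal sketch of how topology identifies neighbours: two polygons are adjacent when their boundary rings share an edge. The polygon coordinates here are invented for illustration:

```python
# Minimal sketch: adjacency from shared edges. Each polygon is a closed
# ring of vertices; an edge is stored as a sorted vertex pair so that
# traversal direction does not matter.

def edges(ring):
    return {tuple(sorted((ring[i], ring[i + 1]))) for i in range(len(ring) - 1)}

A = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
B = [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)]   # shares the edge (2,0)-(2,2) with A
C = [(5, 5), (6, 5), (6, 6), (5, 5)]           # touches neither

def are_neighbours(p, q):
    return bool(edges(p) & edges(q))

print(are_neighbours(A, B))  # True
print(are_neighbours(A, C))  # False
```

Storing each shared edge once is also what gives topological models their storage saving: the common boundary of A and B is held a single time rather than duplicated in both polygons.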

1.11. Data Models

Conversion of real-world geographical variation into discrete objects is done through data models. A data model represents the linkage between the real-world domain of geographic data and the computer representation of these features. The data models discussed here are for representing spatial information.

Data models are of two types: raster and vector (Figure 1.3). In the raster representation of geographical data, a set of cells located by coordinates is used; each cell is independently addressed with the value of an attribute. Each cell contains a single value and every location corresponds to a cell. One set of cells and associated values is a LAYER. Raster models are simple, and spatial analysis with them is easier and faster. However, raster data models require a huge volume of data to be stored, the fineness of the data is limited by the cell size, and the output is less refined.

The vector data model uses points and line segments, represented by their explicit x, y coordinates, to identify locations. Discrete objects are formed by connecting line segments; an area is defined by a set of line segments. Vector data models require less storage space, their outputs are of good quality, estimation of area and perimeter is accurate, and editing is faster and more convenient. However, spatial analysis is more difficult, particularly with respect to writing the software.
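The contrast between the two models can be sketched for a single square feature; the grid values, cell size and coordinates are assumed for illustration:

```python
# Sketch of the two data models for the same square feature.
# Raster: a grid of cells, each holding one attribute value (1 = feature).
raster_layer = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cell_size = 10.0                     # ground units per cell side
raster_area = sum(sum(row) for row in raster_layer) * cell_size ** 2

# Vector: the same feature as an explicit coordinate ring; the shoelace
# formula gives an exact area from the vertices.
ring = [(10, 10), (30, 10), (30, 30), (10, 30), (10, 10)]

def shoelace_area(ring):
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

print(raster_area)          # 400.0 (4 cells of 100 square units each)
print(shoelace_area(ring))  # 400.0 (exact, from coordinates)
```

For this square the two areas agree, but a feature whose boundary cuts through cells would be approximated in the raster layer, while the vector ring would still give the exact figure.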

Figure 1.3: Vector and raster data

1.12. Data Structures

There are a number of different ways to organize the data inside an information system. The choice of data structure affects both data storage volume and processing efficiency. Many GIS have specialized capabilities for storing and manipulating attribute data in addition to spatial information. The three basic data structures are relational, hierarchical and network.

Relational data structure: organizes the data in two-dimensional tables, where each table is a separate file. Each row in the table is a record, and each record has a set of attributes; each column in the table is an attribute. Different tables are related through the use of a common identifier called a KEY, and information is extracted through relations defined by queries.

Hierarchical data structure: stores the data so that a hierarchy is maintained among the data items. Each node can be divided into one or more additional nodes. Stored data gets more and more detailed as one branches further out on the tree.

Network data structure: is similar to the hierarchical structure, except that in this structure a node may have more than one parent. Each node can be divided into one or more additional nodes, and nodes can have many parents. The network data structure has the limitation that the pointers must be updated every time a change is made to the database, causing considerable overhead.

1.13. Errors in GIS

Errors in a GIS environment can be classified into the following major groups:

Age of data: reliability decreases with age
Map scale: non-availability of data at the proper scale, or use of data at different scales
Density of observation: a sparse data set is less reliable
Relevance of data: use of surrogate data leads to errors
Data inaccuracy: positional, elevation, minimum mappable unit
Inaccuracy of contents: attributes are erroneously attached

Errors associated with processing:

Map digitization: errors due to boundary location problems on maps and errors associated with the digital representation of features
Rasterisation: errors due to topological mismatch arising during approximation by a grid
Spatial integration: errors due to map integration, resulting in spurious polygons
Generalization: errors due to the aggregation process when features are abstracted to a smaller scale
Attribute mismatch errors
Misuse of logic

1.14. Spatial Analysis

For any application there are five generic questions a GIS can answer:

 Location - What exists at a particular location?
 Condition - Where do certain conditions exist?
 Trends - What has changed over time?
 Patterns - What spatial patterns exist?
 Modeling - What if ...?

GIS is used to perform a variety of spatial analyses, including overlaying combinations of features and recording the resultant conditions, analyzing flows or other characteristics of networks, and defining districts in terms of spatial criteria. Its uses span many fields: facility management, planning, environmental monitoring, population census analysis, insurance assessment, health service provision, hazard mapping and many other applications. Although GIS and AM/FM systems have similar capabilities, GIS has traditionally referred to systems that emphasize spatial analysis and modeling, while AM/FM systems emphasize the management of geographically distributed facilities. A complete GIS or spatial information system consists of hardware, software and humanware (i.e. trained GIS experts).

A GIS can acquire and store data by import from external sources or by capture from maps and reports. Once in storage, the data must be kept backed up and updated when new information becomes available. Since more than 70% of the cost of a GIS project lies in data capture, the database is the primary asset of a GIS. Spatial data is collected from a variety of sources. Remotely sensed satellite data is a primary data source. Information from modern survey instruments is also a primary data source, as it can be read directly into a GIS, much like remote sensing data. Secondary data capture involves processing information that has already been compiled but requires conversion into a computer-readable format by manual or automatic digitization.

1.15. Applications and trend of GIS

GIS is presently being used in almost all sectors: agricultural development, land evaluation analysis, change detection of vegetated areas, analysis of deforestation and associated environmental hazards, monitoring vegetation health, land degradation mapping, crop acreage and production estimation, wasteland mapping, soil resources mapping, groundwater potential mapping, geological and mineral exploration, snow-melt run-off forecasting, forest fire monitoring, ocean productivity monitoring, etc.

In future, GIS will play a vital role in natural resources management, telecom GIS, automated mapping and facility management (AM/FM), virtual 3-D GIS, Internet GIS, spatial multimedia, etc.

GIS Terminology

Some of the terms used in GIS are briefly explained below:

Data planes are discrete sets of data. For example, an image, a thematic map, a topographic sheet and a page of survey data each constitute a data plane.

Themes are maps containing different types of information. For example, a topo-
sheet contains contours, roads, railways, boundaries of forests etc. Each of these
constitutes a theme.

Registration: the themes in a given data plane are spatially related to each other; we say that the data is registered. Data in different planes may not be so readily related: scales may vary, and there may be translational and rotational errors. The process of correction and development of an invariant spatial relationship between different data planes is called registration.
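The correction can be sketched as a similarity transform (scale, rotation, then translation); the transform parameters below are assumed values, not from the text:

```python
import math

# Hypothetical sketch of registration: bringing coordinates digitized on
# one data plane into the coordinate system of another by correcting
# scale, rotation and translation.

def register(points, scale, theta_deg, dx, dy):
    """Apply a similarity transform: scale, rotate by theta, then shift."""
    t = math.radians(theta_deg)
    out = []
    for x, y in points:
        xr = scale * (x * math.cos(t) - y * math.sin(t)) + dx
        yr = scale * (x * math.sin(t) + y * math.cos(t)) + dy
        out.append((round(xr, 6), round(yr, 6)))
    return out

# A point from plane B mapped into plane A's coordinates:
print(register([(1.0, 0.0)], scale=2.0, theta_deg=90.0, dx=5.0, dy=5.0))
# [(5.0, 7.0)]
```

In practice the parameters are estimated from control points visible in both data planes rather than given directly.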

Database: the spatially registered set of data constitutes a spatial database. In addition, each spatial object has associated attributes. These could be a name, a number, a range of values, etc.; for example, a contour has a number and a road has a name. Such attributes also form part of the database. Further, there may be other associated data sets, such as demographic data.

Spatial objects: all spatial objects can be represented by points, lines and polygons. A city is a point, a road is a line and a forest area is a polygon. The manner in which these fundamental units are represented is defined by the spatial data model. For example, we can have a chain as a set of line segments; a closed chain forms a polygon, an open chain is a line, and a line segment of zero length is a point.

Scale: this is the relationship between distances on the ground and distances on a map. Scale always applies to linear measures, never to areas or elevations.
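For example, on a hypothetical 1:50,000 sheet:

```python
# Scale links map distance to ground distance: multiply the map distance
# by the scale denominator, then convert units (here cm to metres).
scale_denominator = 50_000
map_distance_cm = 2.0
ground_distance_m = map_distance_cm * scale_denominator / 100  # cm -> m
print(ground_distance_m)  # 1000.0 metres, i.e. 1 km
```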

Resolution: this is the smallest element that can be distinguished in a data set. In the case of imagery, this is usually the pixel size or a multiple of the pixel size. In a map, however, the term can be confusing: it may be taken to mean the smallest mappable feature, yet some features are mapped by symbols even if their size is small.

Chapter 02

HARDWARE AND SOFTWARE REQUIREMENTS


2.1. Introduction

The technological advancements of the last two decades have made possible the leaps-and-bounds development of the field of GIS. Switching over to a new technology always invokes different feelings: exhilaration, fright, frustration and more. Exhilaration comes from applying the latest technology to our routine problems. Fear is provoked by the constant rapid advancements. Frustration swells from the tedious pace of implementation and the associated turf battles over information.

The basic, broad and vital components of a GIS are:

1. Hardware
2. Software
3. Brainware

Attracted by the glitter and promises of the industry's marketing strategies, inexperienced users, and even experienced ones, make mistakes in choosing proper hardware and software.

2.2. Design Philosophy of GIS

The implementation process of a GIS can be divided broadly into five major stages.

1. Concept (Requirement analysis, Feasibility analysis)


2. Design (Implementation plan, system design, DB design )
3. Development (System acquisition, DB acquisition, organizing staff &
training, operation procedure preparation, site preparation)
4. Operation (System installation, pilot project, data conversion, application
development, conversion to automated operations)
5. Audit (System review, system expansion)

It is clear that applications drive the system design. Hardware and software play the building-block part and can be bought at any time once the conceptual design is ready, but at the same time they cannot be ignored. Between software and hardware, the general opinion is that computer hardware should be viewed as subordinate to software.

In some situations, the processor and operating system influence, or even control, the selection of GIS software. An organization that has accepted a particular vendor as its supplier sometimes limits its software choices. In other instances, the availability of existing processing resources and the need to avoid additional costs govern the choice of a hardware platform.

2.3. System Design

GIS is typically implemented by acquiring a commercially available hardware and software system. Numerous GIS systems are available in the marketplace, each with individual strengths and weaknesses that should be assessed in light of the organization's requirements.

2.4. Hardware

Generally, hardware is assessed by six factors.

1. Affordability (cost of the machine)
2. Scalability (vertical or horizontal growth)
3. Reliability (system performance, average down/recovery time)
4. Connectivity (network capability)
5. Security (protection from hackers)
6. Accessibility (user friendliness)

The ever-increasing need for sharing information has forced technocrats and techno-architects to build systems that satisfy almost all of the major characteristics above. Nowadays, a computer without the Internet is considered like a car without a road, so network capability must be given significant weight when choosing hardware and peripherals. Even printers now come "network ready", without needing to be physically connected to a computer.

The following hardware is required for a full-fledged GIS:

1. Computer (100/233/300 MHZ Chip, 8/16/32 MB RAM, 2/4/10 GB HDD,


1.44MB FDD, 12/24/32x CD-ROM Drive)
2. Digitizer (A3/A1/A0 size)
3. Graphics Accelerator Card
4. Color Plotter / Printer
5. External Storage Devices (4/6/8mm DAT drives)
6. B/W Scanner (A0 size)

2.5. Software

Before buying software, one should have a clear understanding of what the user really wants the software to do for him or his project. The next step is to make sure that the GIS software can execute on your computer and under the operating system you are using. The next step is to assess its capabilities and its user friendliness. The key criteria for validating software worthiness are:

1. User friendliness (easy graphical interface)
2. Functionality (effectiveness and efficiency)
3. Compatibility (operating system friendliness)
4. Upgradability (changeable with versions)
5. Documentation (help on the software's functions/algorithms)
6. Cost-effectiveness (more functions at a lower price)

The GIS market is growing and the available software packages are innumerable, so the user has to weigh his needs against the criteria above. It is also recommended that organizations or institutes already using GIS systems be asked for help during the procurement and implementation process. The Department of Space is the leading agency in India using and developing GIS-based software and applications. It has developed many application-oriented, user-friendly software packages, derived from existing GIS software functionalities, namely GEO_SMART, DECISION_SPACE, BIO_CAP and more.

Chapter 03

DATABASE STRUCTURES AND FORMATS


3.1. Data Base

Data are the raw material from which every land information system is built. They are gathered and assembled into records and files. A database is a systematic collection of data that can be shared by different users; it is a group of records and files that are organized so that there is little or no redundancy.

3.2. Data Base Structure

A database consists of data in many files. In order to be able to access data from one or more files easily, it is necessary to have some kind of structure or organization. The main kinds of database structure are termed:

1. Hierarchical
2. Network
3. Relational

3.2.1. Hierarchical Data Structure

A hierarchical file is a case of a tree structure. The tree is composed of a hierarchy of nodes; the upper-most node is called the root. With the exception of this root, every node is related to a node at a higher level called its parent. No element can have more than one parent, though each can have one or more lower-level elements called its children. A hierarchical file is one with a tree-structured relationship between the records, for example a master-detail file with two record types. Such a representation is often very convenient because much data tends to be hierarchical in nature, or can easily be cast into this structure.

Department
    Job Description
        Education Background Required
    Employee
        Education Required
        Job History

Figure 3.1: Hierarchical Data Structure

The hierarchical approach is very efficient if all desired access paths follow the parent-child linkages. However, it requires a relatively inflexible structure to be placed on the problem at the outset, when the record types forming the tree structure are defined. The combination of this inflexible setup and the overhead of maintaining or changing the pointer system makes extensive modification of the structure of hierarchical systems to meet new requirements a resource-intensive operation. These reasons have contributed to the lack of adoption of this type of DBMS for flexible GIS requirements.
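The tree of Figure 3.1 can be sketched as nested records in which every node has exactly one parent and every access path descends from the root:

```python
# Sketch of the hierarchical structure of Figure 3.1 as nested dictionaries:
# each node has exactly one parent, and access follows parent-child links.
department = {
    "name": "Department",
    "children": [
        {"name": "Job Description",
         "children": [{"name": "Education Background Required", "children": []}]},
        {"name": "Employee",
         "children": [{"name": "Education Required", "children": []},
                      {"name": "Job History", "children": []}]},
    ],
}

def find(node, target):
    """Descend the tree from the root; the only path in is via the parent."""
    if node["name"] == target:
        return node
    for child in node["children"]:
        hit = find(child, target)
        if hit:
            return hit
    return None

print(find(department, "Job History")["name"])  # Job History
```

Note how any query that does not follow the parent-child path (for example, "which departments require a given education?") would force a full traversal, which illustrates the inflexibility discussed above.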

3.2.2. Network Data Structure

A network structure exists when a child in a data relationship has more than one parent. An item in such a structure can be linked to any other item. The physical data structures to support complex network structures are far more difficult to develop than those for simple structures.

Author 1: Book 1, Book 2
Author 2: Book 2, Book 3

Figure 3.2: Network Structure

Each entity set, with its attributes, is considered to be a node in the network. Relationship sets are represented as linkages, in the form of pointers between individual entities in different entity sets. As a result, all the different forms of mapping (one-to-many, many-to-many, etc.) can be handled directly, with a large number of pointers.

The network approach is powerful and flexible. For many applications, it is


also very fast and efficient in terms of CPU resources. From the implementation
point of view, it may be comparatively difficult to set up the database correctly
and although the query language is comprehensive, it may also be complex and
confusing for less expert users. Major restructuring of the data base may be time
consuming because of the extensive pointer structure that has to be rebuilt.
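The author-book network of Figure 3.2 can be sketched with explicit pointer lists in both directions; the particular author-book links are assumed for illustration:

```python
# Sketch of a network structure: books may have several authors, so a
# child node can have more than one parent. The pointers are modelled
# here as explicit link lists in both directions.
authors = {"Author 1": ["Book 1", "Book 2"],
           "Author 2": ["Book 2", "Book 3"]}

# Derive the reverse pointers (child -> parents):
parents = {}
for author, books in authors.items():
    for book in books:
        parents.setdefault(book, []).append(author)

print(parents["Book 2"])  # ['Author 1', 'Author 2'] - two parents
```

The need to keep both pointer directions consistent whenever a link is added or removed is exactly the maintenance overhead the paragraph above describes.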

3.2.3 Relational Data Structures

In this type, data are organized in two-dimensional tables; such tables are easy for a user to develop and understand. This structure can be described mathematically, a most difficult task for the other types of data structure. These structures are called relational structures because each table represents a relation.

Since different users see different sets of data and different relationships between them, it is necessary to extract sub-sets of the table columns for some users, and to join tables together for others to form larger tables. The underlying mathematics provides the basis for extracting some columns from the tables and for joining various columns. This capability to manipulate relations provides flexibility that is normally not available in hierarchical or network structures.

Figure 3.3: Relational Data Structure

Relational systems are characterized by simplicity, in that all the data are
represented in tables (relations) of rows and columns.

From the data base design viewpoint, entity-relationship modeling fits very closely with relational systems. Each entity set is represented by a table, while each row or 'tuple' in the table represents the data for an individual entity. Each column holds data on one of the attributes of the entity set.

Since relationships between entities are directly represented as tables, there


is no requirement for pointers or linkages between data records to be set up, as
was the case with hierarchical or network systems.

Important features of Relational Data Bases

 Primary key
 Relational joins
 Normal forms

a) The Primary Key

The relational approach has important implications for the design of database tables. Since each table or relation represents a set, it cannot have any rows whose entire contents are duplicated. Secondly, as each row must be different from every other, it follows that a value in a single column, or a combination of values in multiple columns, can be used to define a primary key for the table, which allows each row to be uniquely identified. The uniqueness property allows the primary key to serve as the sole row-level addressing mechanism in the relational database model.
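The uniqueness rule can be sketched as follows; the table and column names are illustrative:

```python
# Sketch of the primary-key rule: no two rows may share a key value.
employees = {}   # primary key (emp_code) -> row

def insert(emp_code, name, designation):
    if emp_code in employees:
        raise ValueError(f"duplicate primary key: {emp_code}")
    employees[emp_code] = {"name": name, "designation": designation}

insert(1107, "MARTIN", "PROFESSOR")
insert(1205, "KEN EASAN", "READER")
try:
    insert(1107, "SOMEONE ELSE", "LECTURER")   # rejected: key already used
except ValueError as e:
    print(e)   # duplicate primary key: 1107
```

Because the key addresses exactly one row, it is also the value other tables store when they need to refer to that row.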

b) Relational joins

The mechanism for linking data in different tables is called a relational join. Values in a column or columns in one table are matched to corresponding values in a column or columns in a second table. Matching is frequently based on a primary key in one table linked to a column in the second table, which is termed a foreign key. An example of the join mechanism is shown below:

Table 1:
Name        Designation   Emp. Code
MARTIN      PROFESSOR     1107
KEN EASAN   READER        1205

Table 2:
Name        Salary          Experience (years)
MARTIN      Rs. 1 Million   12

Joining the two tables on Name links MARTIN's designation, salary and experience into a single row.

c) Normal Forms

A certain amount of necessary data redundancy is implicit in the relational model, because the join mechanism matches column values between tables. Without careful design, however, unnecessary redundancy may be introduced into the database.

All tables must contain rows and columns, and column values must be atomic; that is, they must not contain repeating groups of data, such as multiple values of a census variable for different years.

The second requirement of normal form is that every column, which is not
part of the primary key, must be fully dependent on the primary key.

The third normal form requires that every non-primary key column must be
non-transitively dependent on the primary key.

Nevertheless, the fundamental working rule for most circumstances ensures
that each attribute of a table represents a fact about the primary key, the whole
primary key and nothing but the primary key. While this is entirely valid from the
design viewpoint, it must also be said that practical implementation requirements
may, on occasion, override theoretical considerations and lead to tables being
merged and de-normalized, usually for performance reasons.
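The first normal form rule can be illustrated with the census example from the text. A minimal sketch (the district name and population figures are invented for illustration):

```python
# Unnormalized: one column holds repeating census values for several
# years, violating the atomic-value (first normal form) requirement.
unnormalized = {"Ampara": {"1981": 389000, "2001": 593000}}

# Normalized: one atomic row per (district, year) pair, so every
# column value is a single fact about the row's key.
normalized = [
    (district, year, population)
    for district, by_year in unnormalized.items()
    for year, population in sorted(by_year.items())
]
print(normalized)
```

Each row of the normalized table now states exactly one fact about its composite key (district, year), which is the working rule described above.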

Advantages and disadvantages of relational systems

The advantages can be summarized as follows:
 Rigorous design methodology based on sound theoretical foundations
 All the other data base structures can be reduced to a set of relational tables,
so they are the most general form of data representation
 Ease of use and implementation compared to other types of system
 Modifiability, which allows new tables and new rows of data within tables to
be added without difficulty
 Flexibility in ad-hoc data retrieval because of the relational joins mechanism
and powerful query language facility.

Disadvantages include:

 A greater requirement for processing resources with increasing numbers of
users on a given system than with the other types of database.
 On heavily loaded systems, queries involving multiple relational joins may
give slower response times than are desirable. This problem can largely be
mitigated by effective use of indexing and other optimization strategies,
together with the continued improvements in price/performance of computing
hardware from mainframes to PCs.

3.3. Spatial Data Bases

With continued development in database design, storage methods and
retrieval performance, it is now quite feasible to hold tens of gigabytes of digital
cartographic data, map attribute data or both, using proprietary software and the
more powerful hardware platforms available from a variety of vendors. Some
examples of large spatial databases are shown in Table 1.

The DBMS provides a wide range of ready-made data manipulation tools,
so programming effort can be concentrated on algorithms for spatial analysis and
user interface requirements.

Though a database approach has several advantages over a file-system
approach, GIS designers have often preferred the file-system approach for storage
of digital map coordinates. This has led to the development of two different
approaches to implementation, based on either a hybrid or an integrated data model.

Table 1: Some Large Spatial Data Bases

Data base                              Object types      Coordinates (10^6)   Nature of data
WDDES (World Digital Data for          Polygons, Lines   300                  Contours, rivers,
the Environment Sciences)                                                     boundaries
SOTER (Soil & Terrain Data Base)       Polygons          150                  Soil polygons
CORINE (Coordinated Inform. on         Polygons, Lines   50                   Natural resource
the European Environment)                                                     and political
CGIS-CLI (Canada GIS)                  Polygons          90                   Land use potential
Alberta LRIS (Land Resources           Polygons, Lines   140                  Land tenure
Information System)
Edmonton City                          Polygons, Lines,  4                    Urban
                                       Points                                 infrastructure

3.4. The Hybrid Data Model

In this model, the digital cartographic data are stored in a set of direct access
operating system files for speed of input/output, while attribute data are usually
stored in a standard commercial relational type DBMS, such as INFO, ORACLE,
INGRES or INFORMIX. The GIS software manages linkages between the
cartographic files and the DBMS during different map processing operations, such
as overlay.

While a number of different approaches to the storage of the cartographic
data are used, the linking mechanism to the database is essentially the same: it is
based on unique identifiers stored in a database table of attributes that allow them
to be tied to individual map elements.
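The linking mechanism can be sketched in a few lines. In this hypothetical example, the cartographic "files" and the attribute "table" are plain Python dictionaries keyed by the same unique feature identifier; all names and values are illustrative:

```python
# Hybrid model sketch: geometry lives in file-like storage, attributes
# in a separate table, and a shared unique identifier ties them together.
cartographic_file = {
    101: [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],   # polygon boundary coordinates
    102: [(2.0, 2.0), (3.0, 2.5)],               # line coordinates
}
attribute_table = {
    101: {"landuse": "paddy"},
    102: {"type": "road"},
}

def feature(fid):
    """Join geometry and attributes on the shared unique identifier."""
    return {"id": fid, "coords": cartographic_file[fid], **attribute_table[fid]}

print(feature(101)["landuse"])
```

In a real hybrid system the geometry side is a set of direct-access operating system files and the attribute side a commercial DBMS, but the identifier-based linkage works the same way.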

Hybrid Types

(i) CAD based systems: In this type, map features are held as graphics
elements, but without any topological information. Examples of this type are
INTERGRAPH EGDS/DMRS and MICROSTATION – 32.

(ii) Vector-topological systems: These systems hold the topological map
information in a set of linked files very similar to the structure that might be
expected if the data were inside, rather than outside, the relational DBMS.
Examples of this type are ESRI ARC/INFO, GEOVISION and
INTERGRAPH MICROSTATION GIS systems.

(iii) Quad tree-based systems: These systems do not have the same level of
representation in the commercial market place, as the previous types.
Example is SPANS Systems.

3.5. The Integrated Data Model

The integrated data model approach is also described as the spatial data base
management system approach where GIS serves as the query processor. In this
approach relational tables hold map coordinates data for points/nodes and line
segments, together with other tables containing topological information.
Attributes may be stored in the same table as the map feature database or in
separate tables accessible via relational joins.

In this approach, storage of individual coordinate pairs in different rows of a
database table creates substantial performance overheads if large volumes of data
have to be retrieved quickly for graphical purposes. To achieve satisfactory
retrieval performance, it has been found necessary to store coordinate strings in
long or 'bulk data' columns in tables.
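The two layouts can be contrasted in a small sketch using Python's sqlite3 module. The table names and the semicolon-separated encoding of the bulk column are illustrative assumptions, not any vendor's actual scheme:

```python
import sqlite3

# Layout 1: one row per coordinate pair (many fetches to rebuild a line).
# Layout 2: the whole coordinate string packed into one 'bulk data' column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE coord_rows (fid INTEGER, seq INTEGER, x REAL, y REAL)")
db.execute("CREATE TABLE coord_bulk (fid INTEGER PRIMARY KEY, coords TEXT)")

pairs = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
db.executemany("INSERT INTO coord_rows VALUES (7, ?, ?, ?)",
               [(i, x, y) for i, (x, y) in enumerate(pairs)])
# Bulk layout: the entire string comes back in a single row fetch.
db.execute("INSERT INTO coord_bulk VALUES (7, ?)",
           (";".join(f"{x},{y}" for x, y in pairs),))

bulk = db.execute("SELECT coords FROM coord_bulk WHERE fid=7").fetchone()[0]
print(bulk)
```

Retrieving the row-per-pair layout requires fetching and reassembling as many rows as there are vertices, which is the overhead the text describes; the bulk column returns the whole string in one fetch.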

Considerations in adopting a database approach

The adoption of a database approach to data management is a major
decision that impacts on every facet of a computer-based information system
(CBIS). The decision dictates a virtually irreversible course of system design,
and for existing file-oriented systems it may involve some temporary disruption
of services and no small risk to file integrity during the conversion process.

Advantages of the Data Base Approach

1. Reduces redundancy in data storage
2. Simplifies data maintenance
3. Reduces processing time
4. Improves internal consistency among data
5. Data can be shared by many applications

Disadvantages of Data Base Approach

1. High Cost
2. Security is difficult to maintain
3. Consequences of Security breaches may be severe
4. Greater control over data is required

Chapter 04

SPATIAL DATA MODELS


4.1. Geographical Data

A GIS neither stores a map in any conventional sense, nor stores a particular
image or view of a geographic area. Instead, a GIS stores the data, known as
geographic data, from which we can draw a desired view to suit a particular purpose.

There are two types of data in GIS:

1. Spatial Data
2. Non-spatial or Attribute data

A geographical information system essentially integrates the above two
types of data and allows the user to derive new data for planning.

4.1.1. Spatial Data

All geographical data can be reduced to three basic entity types: the point,
the line and the area. Every geographical phenomenon can in principle be
represented by a point, line or area plus a label saying what it is. So an oil well
could be represented by a point entity consisting of an XY coordinate; a road
could be represented by a line entity consisting of a series of XY coordinates; and
a floodplain could be represented by an area entity covering a set of XY
coordinates plus the label 'floodplain'. The labels could be the actual names as
given here, or they could be special symbols.

The essential feature of any data storage system is that it should allow data
to be accessed and cross-referenced quickly. There are several
ways of achieving this, some of which are more efficient than others.
Unfortunately, there seems to be no one 'best' method that can be used for all
situations. This explains in part the massive investment in labour and money in
effective database management systems, which are the computer programs that
control data input, output, storage, and retrieval from a digital database.

4.1.2. Non-spatial Data

Non-spatial data include information about the features: for example, names
of roads, schools and forests, or population and census data for the region
concerned. Non-spatial or attribute data is data that qualifies the spatial data; it
describes some aspect of the spatial data not specified by its geometry alone.

4.2. Geographical data in the computer

When geographical data are entered into a computer the user will be most
at ease if the geographical information system can accept the phenomenological
data structures that he has always been accustomed to using. But computers are
not organized like human minds and must be programmed to represent
phenomenological structures appropriately. Moreover, the way the geographical
data are visualized by the user is frequently not the most efficient way to structure
the computer database. Finally, the data have to be written and stored on
magnetic devices that need to be addressed in a specific way.

4.3. Geographical Data Base

The data base concept is central to a GIS and is the main difference between
a GIS and drafting or computer mapping systems, which can produce only good
graphic output. All contemporary geographic information systems incorporate a
database management system. Database systems provide the means of storing a
wide range of geographic information and updating it without the need to rewrite
programs. In a GIS, the spatial data models and the non-spatial data models (or
database management system) handle the feature descriptions and how the
features are related to each other.

4.4. Data Model

In order to represent the spatial information and their attributes, a data model
– a set of logical definitions or rules for characterizing the geographical data is
adopted. The data model represents the linkages between the real world domain
of geographical data and the computer and GIS representation of these features.
As a result, the data model not only helps in organizing the real-world
geographical features into a systematic storage/retrieval mechanism, but also helps
in capturing the user's perception of these features.

The model;
a) Structures the data to be amenable to computer storage/retrieval and
manipulation. The data structure is the core of the model and it is based upon
this that features of real world are represented. The ability of the data
structure to totally represent the real world determines the success of the
model.

b) Abstracts the real world into properties as perceived by a specific
application. For example, a land-use map is perceived to be made up of
different classes with symbols and legends. The district information is
perceived to be made up of district maps and different attribute tables.

c) Helps organize a systematic file structure, which is the internal organization
of real world data in a computer.

4.5. Spatial Data Model

Two approaches or models have been widely adopted for representing the spatial
data within GIS;

 The Cartographic Map Model
 The Geo-relational Model

Each approach is based on a specific spatial data model. The Cartographic
Map Model is usually based on a tessellated representation of space, and the Geo-
relational Model is usually associated with a vector representation of space.

4.6. Cartographic Map Model

Typically the grid-cell tessellation, widely known as raster structure is the most
commonly adopted structure in a GIS package. The quadtree tessellation is
another method, which has been adopted by many GIS packages.

4.7. Raster structure

The grid-cell or raster model is a relatively simple approach to data
representation, both conceptually and operationally, and it has therefore been
popular since the earliest days of GIS development. The model is currently
implemented in a large number of raster-based GIS packages. Raster data
perform a discretization of the geometric area of interest: the entire space is
broken into grid cells of a fixed or uniform size. In this type of representation of
geographic data, a set of cells located by coordinates is used, and each cell is
independently addressed by assigning to it the value of an attribute.

The simplest raster data structures consist of an array of grid cells. Each
grid cell is referenced by a row and column number and it contains a number
representing the type or value of the attribute being mapped. In raster structures a
point is represented by a single grid cell; a line by a number of neighbouring cells
strung out in given direction and an area by an agglomeration of neighbouring
cells.

Rasters are limited by the area they can represent and also by the limits of
storage space. The fineness of the data is limited by the cell size; thus the area of
coverage is traded off against the resolution of the coverage. The storage
problems are handled by resorting to coding schemes, such as run-length
encoding, chain coding, block coding, etc.

The capabilities of a raster-based Cartographic modeling system
ultimately arise from functions associated with individual data-transforming
operations and the way in which these operations are combined. This
transformation of data is facilitated by the fact that map layer zones are
represented not by lines or symbols, but by numerical values. It is also facilitated
by the fact that these values are directly associated with individual locations. The
use of numbers here makes it possible to transform geographical characteristics
using mathematical and arithmetical functions.
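Because each layer is just a grid of numbers keyed by cell position, layers can be combined with ordinary arithmetic, which is the essence of raster map algebra. A minimal sketch (the layer names and values are illustrative suitability scores, not real data):

```python
# Two hypothetical map layers covering the same 2 x 3 grid of cells.
landuse = [[1, 1, 2],
           [2, 3, 3]]
slope   = [[0, 1, 1],
           [2, 2, 0]]

# Cell-by-cell addition produces a new derived layer at the same locations.
combined = [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(landuse, slope)]
print(combined)
```

Any arithmetic or logical function can replace the addition here, which is why numerically coded raster layers make map transformation so direct.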

4.8. Quadtree tessellation

A more elegant tessellation is the quadtree, in which the geographical area
is decomposed into four equal quadrants and the decomposition continues until
each quad represents a homogeneous unit. The number of times the
decomposition process may be applied, known as the resolution of
decomposition, may either be fixed a priori or determined purely by the input
data. The storage requirements of a quadtree are much lower than those of a
raster having the resolution of the smallest quad element.

The tessellation into quads results in a tree with each node represented by
four sub-nodes, hence the name quadtree (Figure 4.1). Quadtree variants have
been designed to represent polygon features, line features and point features,
respectively.

Figure 4.1: Quadtree Tessellation
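The recursive decomposition can be sketched in a few lines. This is a minimal illustration on a 4 x 4 binary grid (the grid values are invented), splitting each square into NW, NE, SW and SE quadrants until each quad is homogeneous:

```python
def quadtree(grid, r0, c0, size):
    """Decompose the size x size square at (r0, c0) into a quadtree."""
    cells = [grid[r][c] for r in range(r0, r0 + size)
                        for c in range(c0, c0 + size)]
    if len(set(cells)) == 1:        # homogeneous quad: stop, store one value
        return cells[0]
    half = size // 2                # otherwise recurse on the four sub-quads
    return [quadtree(grid, r0, c0, half),                 # NW
            quadtree(grid, r0, c0 + half, half),          # NE
            quadtree(grid, r0 + half, c0, half),          # SW
            quadtree(grid, r0 + half, c0 + half, half)]   # SE

grid = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
        [1, 1, 1, 1]]
tree = quadtree(grid, 0, 0, 4)
print(tree)
```

Three of the four top-level quadrants are homogeneous and collapse to single values; only the south-west quadrant needs further splitting, which is where the storage saving over a full-resolution raster comes from.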

4.9. The Geo-relational Model

In the second and more intensely developed approach to information
integration, attribute information is associated with points, lines and polygons,
the spatial entities that describe features occurring in the real world. Thus, for
example, a point feature such as a city may have associated with it such items of
information as its total population, number of houses, number of schools and so
on. Similarly, a linear feature such as a river might have associated with it such
information as name, mean discharge, etc. A polygonal feature such as a land-use
category might be linked to information describing its current use, past land use,
its soil type and so forth.

4.10. Vector Data Structures

The vector representation of an object is an attempt to represent the object
as exactly as possible. The coordinate space is assumed to be continuous, not
quantized as with the raster space, allowing all positions, lengths and dimensions
to be defined precisely.

The vector data structure represents each geographical feature by a set of
coordinates: vectors of x, y coordinates define points, lines and polygons. The
basic premise of vector-based structuring is to define a two-dimensional space in
which coordinates on the two axes represent features. Generally, representing
points and lines is straightforward: points are characterized by an x, y coordinate
pair and lines by a set of x, y coordinate pairs with a specific beginning and end.
However, representing polygons in vector storage poses a challenge. The three
vector structures used in geographical information systems (GIS) for the storage
of points, lines and areas are:

Point entities

Point entities can be considered to embrace all geographical and graphical
entities that are positioned by a single XY coordinate pair. Besides the XY
coordinates, other data must be stored to indicate what kind of 'point' it is, and
the other information associated with it.

Line entities

Line entities can be defined as all linear features built up of straight line
segments made up of two or more coordinates. The simplest line requires the
storage of a begin point and an end point (two XY coordinate pairs) plus a
possible record indicating the display symbol to be used.

An 'arc', a 'chain' or a 'string' is a set of XY coordinate pairs
describing a continuous complex line. The shorter the line segments, and the
larger the number of XY coordinate pairs, the closer the chain will approximate
a complex curve. Data storage space can be saved at the expense of processing
time by storing a number that indicates that the display driver routine should fit a
mathematical interpolation function to the stored coordinates when the line data
are sent to the display device.

As with points and simple lines, chains can be stored with data records
indicating the type of display line symbols to be used.

Area entities

Several vector structures are possible for structuring polygons. The
simplest way to represent a polygon is the spaghetti representation, which is
nothing but an extension of the simple chain: each polygon is represented as a set
of XY coordinates on its boundary. That is, the polygons are reduced to the
concept of line representation and are characterized by a set of x, y coordinate
pairs, but with the same vector as both the beginning and ending vector, so that a
self-closing line represents the polygon. Symbol names are used to tell the user
what each polygon is, and are held as a set of simple text entities. While this
method has the advantage of simplicity, it has many disadvantages.

a. This method of storage is not optimal, as no cognizance is taken of the
shared lines of two adjacent polygons; as a result, lines bordering polygons
are structured twice (digitized and stored twice), once for each of the
polygons.

b. There is no neighbourhood information.

c. Islands are impossible except as purely graphical constructions, and

d. There are no easy ways to check if the topology of the boundary is correct or
whether it is incomplete ('dead ends') or makes topologically inaccessible
loops ('weird polygons').
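The first disadvantage, the duplicated shared boundary, can be demonstrated directly. In this illustrative sketch, two adjacent unit squares are each stored as a full closed ring of coordinates, and the shared edge is found in both:

```python
# Spaghetti representation: each polygon is a self-closing coordinate
# string, so adjacent polygons repeat their common boundary.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],   # closed ring
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],   # closed ring, adjacent
}

def edges(ring):
    # Undirected edges of a closed ring, normalised so direction is ignored.
    return {tuple(sorted(pair)) for pair in zip(ring, ring[1:])}

shared = edges(polygons["A"]) & edges(polygons["B"])
print(shared)
```

The edge from (1, 0) to (1, 1) appears in both rings: it would be digitized and stored twice, and nothing in the structure records that A and B are neighbours, which is the second disadvantage listed above.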

The choice between Raster and Vector

The raster and vector models for spatial data structures are distinctly
different approaches to modeling geographical information, but are they mutually
exclusive? Only a few years ago, the conventional wisdom was that raster and
vector data structures were irreconcilable alternatives, because raster methods
required huge computer memories to store and process images at the level of
spatial resolution obtained by vector structures. Certain kinds of data
manipulation, such as polygon intersection or spatial averaging, presented
enormous technical problems: the choice lay between raster methods that allowed
easy spatial analysis but resulted in ugly maps, or vector methods that could
provide databases of manageable size and elegant graphics but in which spatial
analysis was extremely difficult.

Vector methods

Advantages

 Good representation of phenomenological data structure
 Compact data structure
 Topology can be completely described with network linkages
 Accurate graphics
 Retrieval, updating and generalization of graphics and attributes are possible

Disadvantages

 Complex data structures
 Combination of several vector polygon maps, or polygon and raster maps,
through overlay creates difficulties
 Simulation is difficult because each unit has a different topological form
 Display and plotting can be expensive, particularly for high quality, colour
and crosshatching
 The technology is expensive, particularly for the more sophisticated software
and hardware
 Spatial analysis and filtering within polygons are impossible

Raster methods

Advantages

 Simple data structures
 The overlay and combination of mapped data with remotely sensed data is
easy
 Various kinds of spatial analysis are easy
 Simulation is easy because each spatial unit has the same size and shape
 The technology is cheap and is being energetically developed

Disadvantages

 Large volumes of graphic data
 The use of large cells to reduce data volumes means that phenomenologically
recognizable structures can be lost and there can be a serious loss of
information
 Crude raster maps are considerably less beautiful than maps drawn with fine
lines
 Network linkages are difficult to establish
 Projection transformation is time consuming unless special algorithms or
hardware are used.

The problem of raster or vector disappears once it is realized that both are
valid methods for representing spatial data, and that both structures are inter-
convertible. Conversion from vector to raster is the simpler, and there are many
well-known algorithms (e.g. Pavlidis 1982). Vector-to-raster conversions are now
performed automatically in many display screens by inbuilt microprocessors. The
reverse operation, raster to vector, is also well understood (Pavlidis lists four
algorithms for thinning bands of pixels to lines), but it is a much more complex
operation, complicated by the need to reduce the number of coordinates in the
resulting lines by a process known as weeding.
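The vector-to-raster direction can be illustrated with a deliberately naive sketch: a line segment is sampled at sub-cell intervals and each visited grid cell is switched on. Production systems use proper scan-conversion algorithms (such as Bresenham's); this only shows the principle of converting continuous coordinates into cells:

```python
def rasterize_line(p0, p1, rows, cols, steps=100):
    """Mark every grid cell visited by the segment from p0 to p1."""
    grid = [[0] * cols for _ in range(rows)]
    (x0, y0), (x1, y1) = p0, p1
    for i in range(steps + 1):
        t = i / steps                      # parameter along the segment
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        grid[int(y)][int(x)] = 1           # cell containing this sample
    return grid

# A diagonal vector line becomes a diagonal run of raster cells.
grid = rasterize_line((0.5, 0.5), (3.5, 3.5), rows=4, cols=4)
print(grid)
```

Going the other way, from a band of "on" cells back to a thin vector line, is the harder thinning-and-weeding problem the text describes.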

4.11. Suggestions for the use of Raster and Vector Models

1. Use VECTOR data structures for archiving phenomenologically structured
data (e.g. soil areas, land use units, etc.).

2. Use VECTOR methods for network analyses, such as for telephone networks,
or transport network analysis.

3. Use VECTOR data structure and VECTOR display methods for the highest
quality line drawing.

4. Use RASTER methods for quick and cheap map overlay, map combination
and spatial analysis.

5. Use RASTER methods for simulation and modeling when it is necessary to
work with surfaces.

6. Use RASTER and VECTOR in combination for plotting high quality lines
together with efficient area filling in colour. The lines can be held in
VECTOR format and the area filling in compact RASTER structures such as
run-length codes or quadtrees.

7. Preferably use compact VECTOR data structures for digital terrain models,
but don't neglect altitude matrices.

8. Use RASTER-VECTOR and VECTOR-RASTER algorithms to convert data
to the most suitable form for a given analysis or manipulation.

9. Remember that DISPLAY systems can operate in either RASTER or
VECTOR modes independent of the DATA STRUCTURES that are used to
store and manipulate the data.

Chapter 05

METHODS OF DATA INPUTTING IN GIS


Data input is the operation of encoding the data and writing them to the
database. The creation of a clean, digital database is a most important and
complex task upon which the usefulness of the GIS depends. Two aspects of the
data need to be considered separately for geographical information systems; these
are first the positional or geographical data necessary to define where the graphic
or cartographic features occur, and second, the associated attributes that record
what the cartographic features represent. It is this ability to process the
cartographic features in terms of their spatial and non-spatial attributes that is the
main distinguishing criterion between automated cartography (where the non-
spatial data relate mainly to colour, line type, symbolism, etc.) and geographical
information processing (where the non-spatial data may record land use, soil
properties, ownership, vegetation types, disease, and so on).

Data input to a geographical information system can best be described
under three headings:

1. Entering the spatial data (digitizing);
2. Entering the non-spatial, associated attributes; and
3. Linking the spatial and the non-spatial data.

At each stage there should be proper data verification and checking
procedures to ensure that the resulting database is as free as possible from
error.

5.1. Entering Spatial Data

There is no single method of entering the spatial data to a GIS. Rather,
there are several mutually compatible methods that can be used singly or in
combination. The choice of method is governed largely by the application, the
available budget, and the type of data being input. The types of data encountered
are existing maps, including field sheets and hand-drawn documents, aerial
photographs, remotely sensed data from satellite or airborne scanners, point-
sample data (e.g. soil profiles), and data from censuses or other surveys in which
the spatial nature of the data is more implicit than explicit.

The actual method of data input is also dependent on the structure of the database
of the geographical system. Although in an ideal system the user should not have
to worry about whether the data are stored and processed in raster or vector form,
such flexibility is still far from generally available, particularly in low-budget
systems.

5.1.1. Manual input to a vector system

The source data are envisaged as points, lines, or areas. The coordinates of
the data are obtained from the reference grid already on the map, or from
reference to a graticule or overlaid grid. They can then be simply typed into a file
or input to a program.

5.1.2. Manual input to a grid system

For a grid system, all points, lines, and areas are envisaged as sets of cells.
The simplest, and most tedious, method of inputting the data is as follows. First, a
grid of the chosen cell (raster) size is laid over the map. The value of a single map
attribute for each cell is then written down and typed into a text file on the
computer. For example, consider a soil map to be input by gridding. To enter this
map in grid form, not only must we decide on the resolution of the grid (here 5
mm) but also on how the various soil units will be coded. If the data are to be
stored in an integer matrix, for example, it will first be necessary to replace the
As, Bs, and Cs of the original legend by numbers. These numbers used to
represent the mapped units can be thought of as colours or grey shades, but they
could represent the value of any other spatial property.
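The legend-to-integer coding step can be sketched in a few lines. The 3 x 4 sample map and its A/B/C soil units are invented for illustration:

```python
# Hypothetical legend mapping soil-unit letters to integer codes.
legend = {"A": 1, "B": 2, "C": 3}

# The gridded map as written down cell by cell by the operator.
gridded_map = [["A", "A", "B", "B"],
               ["A", "B", "B", "C"],
               ["C", "C", "C", "C"]]

# Replace each legend letter by its code to obtain an integer matrix.
matrix = [[legend[cell] for cell in row] for row in gridded_map]
print(matrix)
```

The resulting integer matrix is what actually gets stored; whether the numbers are later interpreted as grey shades, colours or attribute values is up to the application.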

5.1.3. Digitizing

A digitizer is an electronic or electromagnetic device consisting of a tablet
upon which the map or document can be placed. The most common types
currently used for mapping and high quality graphics are either the electrical
orthogonal fine-wire grid or the electrical wave phase type. Both kinds of
digitizer can be supplied in formats ranging from 11 x 11 in. (27 x 27 cm) to at
least 40 x 60 in. (1 x 1.5 m), as table or free-standing models, with or without
backlighting. Generally speaking, the smaller digitizers are used to choose
computer graphics commands from a menu; these digitizers are called tablets.
Menu areas can be defined on the larger tables used for digitizing maps and
plans. Different sizes of digitizers are as follows:

Digitizer       Active Area
A3 Size         12" x 18"
A1 Size         24" x 36"
A0 Size         36" x 48"
A00 Size        42" x 60"

Digitizer accuracy is limited by the resolution of the digitizer itself and by the
skill of the operator. The actual resolution of a digitizer is never exactly that
specified, but lies mostly within ±1 to 2 of the specified resolution elements. The
deviation of a digitizer should not exceed ±3 to 6 resolution elements.

The coordinates of a point on the surface of the digitizer are sent to the
computer by a hand-held magnetic pen or a simple device called a 'mouse' or a
'puck'. For mapping, where considerable accuracy is required, a puck consisting
of a coil embedded in plastic with accurately located cross-hairs is used; points
are digitized by placing the cross-hairs over them and pressing a control button
on the puck.

Most pucks are equipped with at least one button for point digitizing, but
it is now common practice to find manufacturers offering pucks with 4, 12, 16 or
more additional buttons. These buttons can be used for additional program
control, so that the operator can change from digitizing points to lines, for
example, without having to look up or to move the puck from the map. The
buttons can also be used to add identifying labels to the points, lines or cells being
digitized so that non-spatial data can later be associated with them.

The principal aim of the digitizer is to input quickly and accurately the
coordinates of points and bounding lines. The map to be digitized is secured to
the digitizer surface with tape. The scale of the map must be entered into the
computer, and the operator then digitizes two points at the extreme left-hand
bottom (X-min, Y-min) and right-hand top (X-max, Y-max) to define the area
being used. The absolute coordinates of X-min, Y-min and X-max, Y-max must
then be typed in. All subsequent digitized points within these coordinates can
then be automatically adjusted for alignment and scale.

The digitizer can be used to input vector data, that is points, lines, and the
boundaries of areas, by simply entering the coordinates of these entities. In
combination with the appropriate programs, the digitizer can also allow the
operator to input text or special symbols at selected places. The digitizer can also
be used to enter data from thematic maps in run-length coded form that can be
converted to full raster or vector format. The digitizing operator scans over one
row of data (from left to right) at a time with the digitizer cursor, with the Y-axis
value held constant. Each time there is a change along the X-axis, the operator
records a code value for the area just passed through. The XY coordinates of the
point are simultaneously recorded as a run-length code. The amount of work
involved is related to the resolution required. The conversion to a vector or full
raster database is done by a post-processing program.
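The post-processing expansion of a run-length coded row can be sketched directly. This is a minimal illustration, with invented run data, of the raster side of that conversion:

```python
def expand_row(runs):
    """Expand one scan row given as (code, run_length) pairs to full raster."""
    row = []
    for code, length in runs:
        row.extend([code] * length)   # repeat the code for the length of the run
    return row

# Three run entries recorded by the operator reconstruct nine cells.
row = expand_row([(1, 3), (2, 2), (1, 4)])
print(row)
```

The compression is greatest where map zones are wide and uniform: one entry per change along the row replaces one value per cell.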

Once a map has been digitized it can be stored on tape for future use. As
computer cartography increases in importance, more and more standard
topographic maps and thematic maps of soils, geology, land use, etc. are being
captured in digital form. Digitizing, in spite of modern table digitizers, is time-
consuming. It may take nearly as long to digitize a map accurately as it takes to
draw it by hand. The average digitizing speed is approximately 10 cm min-1, and
a detailed map may have as much as 200 m of line detail. Put another way, it may
take some 20-40 person-hours to digitize the boundaries on a 60 x 40 cm,
1:50,000 soil map.
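The estimate above follows from simple arithmetic, which can be checked in a couple of lines:

```python
# 200 m of line detail at roughly 10 cm per minute of digitizing.
line_detail_cm = 200 * 100        # 200 m expressed in cm
speed_cm_per_min = 10
hours = line_detail_cm / speed_cm_per_min / 60
print(hours)                      # about 33 hours, inside the 20-40 h range
```

Checking, verifying and re-digitizing erroneous lines pushes the raw figure towards the upper end of the quoted range.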

5.1.4. Automated scanning

Scanners can be separated into two types: those that scan the map in a raster
mode, and those that can scan lines by following them directly. The former type
is usually operated in conjunction with programs to convert the raster image to
vector form when used for scanning topographic maps or the polygon outlines of
soil or geological maps.

Raster Scanners

The raster scanner works on the simple principle that a point on any part
of a map may have one of two colours, black or white. The scanner incorporates a
source of illumination (usually a low-power laser) and a television camera with a
high-resolution lens. The camera may be equipped with a special sensor known as
a charge-coupled device or CCD. The scanner is mounted on rails and can be
moved systematically backwards and forwards over the surface of the source
document. The step size, which controls the cell or pixel size, is very small on
modern, high-quality scanners (c. 25-50 μm); on some scanners the step size can
be chosen from a range of sizes to suit the application. The resulting raster data
are a huge number of pixels, registered as 'black' or 'white'. These data must be
further processed to be turned into useful map images, such as contours, roads,
etc. Even with the best possible scanners and excellent software, the resulting
digital image will be far from perfect, for it will contain all the smudges and
defects of the original map, plus mistakes caused by contours lying close together
being run into fat lines instead of several thin lines. The digital image will then
need to be cleaned up interactively, and contour heights must be added to
reproduce the contour lines properly. Most maps that people want to scan contain
not just black-and-white information in the form of continuous lines, but also
colours, text, dashed lines and grey tones.

The CCD is an elegant semiconductor device that is able to translate the photons
of light falling on its surface into counts of electrons. A typical CCD can be
fitted into a 35 mm camera and can be equipped with normal lens systems to
produce a highly sensitive light-collection instrument. Because the CCD actually
counts photons, the line information can be obtained by filtering out all signals
that do not reach, or that exceed, a certain level. The resulting line patterns
can be vectorized, or transformed to another kind of raster structure for direct
input to the GIS. Matching, scale correction, and alignment can be controlled
through the use of fiducial marks on the documents, which can themselves be
scanned and processed automatically. The only complication is that these
correction processes involve considerable computing that can best be done on
specially developed image analysis equipment.
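The level filtering described above can be sketched very simply: keep only pixels whose photon count falls inside a chosen band, separating line work from both background and dark smudges. The grid values and thresholds here are purely illustrative:

```python
# Minimal sketch of level filtering a scanned grid of photon counts.
def filter_lines(grid, low, high):
    """Return a binary grid: 1 where low <= count <= high, else 0."""
    return [[1 if low <= v <= high else 0 for v in row] for row in grid]

scanned = [
    [250, 250, 120, 250],   # 120 ~ a line pixel
    [250, 110, 250, 250],
    [130, 250, 250,  10],   # 10 ~ a dark smudge, filtered out
]
lines = filter_lines(scanned, low=100, high=150)
```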

Vector Scanners

The alternative to scanning lines using a raster device and then restoring the
vector structure by using brute force computing is to attempt to scan line images
directly. The transparent copy of the map to be digitized is projected onto a
screen in front of the operator. Using a light cursor, the operator guides a laser
beam to the starting point of a line. The laser beam then follows the line until it
arrives at a junction, or back at its starting point, as would happen with scanning a
contour envelope. Once a line has been scanned, a second laser is used to 'paint
out' the line on the screen by the principle of illuminating the familiar oil spot on
parchment. The operator then guides the scanning laser to the next starting point,
and so on. Before the scanning laser begins, the operator has the opportunity to
attach a label to the line for subsequent addition of non-spatial attributes.

The system has the great advantage that the line scanning is almost
instantaneous, and the scanned data are directly in a scale-correct vector format.
The greatest disadvantages are that a great deal of operator control is essential for
steering the laser at junctions, and that a transparent copy of the clean map needs
to be made. If the original map is a field map, then one can better digitize it
directly. Consequently, the laser-scan system has found most application for
scanning contour envelopes that have already been accurately scribed, and as a
method for making high-quality microfiche prints of completed maps.

5.1.5. Spatial data already in digital raster form

All satellite sensors and multi-spectral sensing devices used in aeroplanes for
low-altitude surveys use scanners to form an electronic image of the terrain.
These electronic images can be transmitted by radio to ground receiving stations
or stored on magnetic media before being converted to a visual image by a
computer system.

The scanned data are retained in the form of pixels (picture elements). Each pixel
has a value representing the amount of radiation within a given bandwidth
received by the scanner from the area of the earth's surface covered by the pixel.
This value can be represented visually by a grey scale or colour. Because each
cell can only contain a single value, many scanners are equipped with sensors that
are tuned to a range of carefully chosen wavelengths. For example, the scanners
on the original LANDSAT 1 were tuned to four wavebands (band 1, 0.5-0.6 μm;
band 2, 0.6-0.7 μm; band 3, 0.7-0.8 μm; band 4, 0.8-1.1 μm) in order to be able to
record differences in water, vegetation, and rock.

The resolution, or the area covered by a single pixel, depends on the height
of the sensor, the focal length of the lens or focusing system, the wavelength of
the radiation and other inherent characteristics of the sensor itself. Pixel sizes for
terrain data vary from the non-square 80 x 80 m of LANDSAT 1 to a few

centimeters for aeroplane-based high-resolution sensors. Although scanned data
from satellite and aeroplane sensors may be in pixel form, their format may not
be compatible with the format of a given GIS and they may need various kinds of
pre-processing. Pre-processing may involve adjusting the resolution and the pixel
shape (both for skewness and squareness), and the cartographic projection, in order
to ensure topological compatibility with the database. Pre-processing may also
involve several kinds of data-reduction process, such as inferring and classifying
leaf area index or land use by combining data from several wavelengths via
principal component or other transforms, and by semi-automated use of training
algorithms. The resulting mapped images would then be transferred to a more
general GIS for use with other kinds of spatial data.

5.1.6. Other sources of digital spatial data

Many investigations in soil science, hydrology, geology, and ecology rely
on mathematical methods for interpolating the values of measured properties at
unvisited points from observations taken at discrete (point) locations. The
interpolation can be performed according to any of several methods, and the results
can be output either as a matrix of grid cells (pixels) containing values of the
property in question, or as sets of contour lines or boundaries.

Many other kinds of data that are used for geographical analyses within
GIS are not strictly spatially encoded, though they have a strong implicit spatial
character. Examples are census data, lists of telephone numbers, and the number
of schools in a town or district.

5.2. Entering Attribute Data

Non-spatial associated attributes (sometimes called feature codes) are
those properties of a spatial entity that need to be handled in the geographical
information system, but which are not of themselves spatial in kind. For example,
a road can be digitized as a set of continuous pixels, or as a vector line entity. The
road can be represented in the spatial part of the GIS by a certain colour, symbol
or data location (overlay). Data about the kind of road (e.g. motorway or dirt
track) can be included in the range of cartographic symbols normally available.
But if the user also wishes to record data about the width of the road, the type of
surface, the construction method, the date of construction, the presence of
manhole covers, water pipes, electricity lines, any specific traffic regulations, the
estimated number of vehicles per hour, etc., all at once, then it is clear that,
provided all these data refer to a common spatial entity, they can be efficiently
stored and processed apart from the spatial data. By giving each type of data a
common identifier, they can be efficiently linked in any way desired. Similarly,
for points and areas, whether displayed spatially in raster or vector format,
associated attribute data referring to unique geographic entities or areas can be
stored apart so that they can be processed easily.

5.3. Linking spatial and Attribute Data

Although feature codes and identifiers can be attached to graphic entities
directly on input, it is not efficient to enter large numbers of complex non-spatial
attributes interactively. Linking the attribute data to the already digitized points,
lines, and areas can better be done using a special program that requires only that
the digital representations of the points, lines, and areas themselves carry unique
identifiers. Both the identifier and the co-ordinates are thus stored in the database.
Manual entering of simple identifiers as part of normal digitizing is easy and
should not seriously slow down digitizing times, though it is not so easy to
combine with stream digitizing.

For a map that has been produced by raster scanning, however, there is so
far no way to read a unique identifier and automatically associate it with the
geographical entity. The attachment of unique identifiers must be done manually,
usually by displaying the scanned map on an interactive workstation and using a
light pen or similar device to 'pick up' the graphical entities and attach the
identifiers to them.

Unique identifiers can only be directly attached to graphic entities that can
be generated directly in the particular computer system that is being used. For
systems making use of polygon networks, however, once the polygons have been
formed they can be given unique identifiers, either by interactive digitizing using
an interactive workstation or by using 'point-in-polygon' algorithms to transfer
identifier codes from already digitized point or text entities to the surrounding
polygon.

Following are the steps needed to digitize a set of boundaries and non-spatial
attributes and link them together to form a topologically linked database of
polygons. The particular set of steps will vary, depending on the type of mapping
operation involved, on the volume of data to be processed, and on the availability
or otherwise of scanners of the appropriate quality.

Once both the spatial and the non-spatial data have been input, the linkage
operation provides an ideal chance to verify the quality of both spatial and non-
spatial data. Screening routines can check that every graphic entity receives a
single set of non-spatial data; they can also check that none of the spatial attributes
exceeds its expected range of values, and that nonsensical combinations of
attributes, or of attributes and geographical entities, do not occur. All geographical
entities not passing the screening test can be flagged so that the operator can
quickly and easily repair the errors. Assigning contour heights to digitized
contours is less difficult. The contour lines are displayed on an interactive display
screen, and the operator uses the cursor to draw a temporary straight line
perpendicularly across a number of contours. The lowest and highest contours
crossing the line are given their heights manually, and intermediate contour
heights are assigned automatically.
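The semi-automatic height assignment just described can be sketched as follows. The assumption of a constant contour interval between the lowest and highest keyed-in heights is part of this illustration, not a statement about any particular package:

```python
# Sketch of contour height assignment: the operator keys in the heights
# of the lowest and highest contours crossed by the temporary line, and
# the intermediate contours are interpolated at a constant interval.
def assign_heights(n_contours, lowest, highest):
    """Return heights for n_contours crossed in order from lowest to highest."""
    if n_contours < 2:
        raise ValueError("need at least two contours to interpolate")
    step = (highest - lowest) / (n_contours - 1)
    return [lowest + i * step for i in range(n_contours)]

heights = assign_heights(5, lowest=100.0, highest=140.0)
# -> [100.0, 110.0, 120.0, 130.0, 140.0]
```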

Chapter 06

EDITING & TOPOLOGY CREATION IN GIS


6.1. Digitizing Errors

GIS software like ARC/INFO marks potential node errors with special
symbols. The two types of nodes, and the potential errors they represent, are:

1. Pseudo node
2. Dangling node

Pseudo node: drawn with a diamond symbol, it occurs where a single line connects
with itself (an island) or where only two arcs intersect. A pseudo node does not
necessarily indicate an error or a problem. Acceptable pseudo nodes may
represent an island (a spatial pseudo node) or the point where a road changes from
pavement to gravel (an attribute pseudo node).

Dangling node: represented by a square symbol, it refers to the unconnected node
of a dangling arc.

Here are the node and label errors that are identified automatically.

Here are the same errors, as they would be interpreted manually.

6.2. Correcting Errors

Correcting the errors is one of the most important steps in constructing your
database. Unless errors are corrected, your area calculations, any analysis and
subsequent maps will not be valid.

Errors                                    What should be done

Missing arc(s)                            Draw them in
Missing label point(s)                    Mark position and unique User-ID
More than one label                       Identify which one(s) to delete
A gap between two arcs or an unclosed     Indicate which arc to extend or which
polygon (undershoot)                      node to move
An overshoot                              Indicate whether it should be deleted
Incorrect User-ID values                  Mark the correct value

6.3. Tolerance to be checked

ARC/INFO uses tolerances, expressed in digitizer units, for coverage
automation and update steps, such as coverage registration, feature snapping,
coordinate spacing, and so on. These tolerances affect the coverage resolution in
that they specify the amount of coordinate movement allowed during an operation.
The greater the movement allowed, the lower the resulting resolution.

RMS (Root Mean Square) error

It is a measure of tic registration accuracy during digitizing and coverage
transformation. ARC/INFO automatically calculates the root mean square error
(or tic registration error) when tics are used to register a map on the digitizer and
during TRANSFORM operations. The RMS value represents the amount of error
between the original and new coordinate locations calculated by the
transformation process. The lower the RMS error, the more accurate the
digitizing or transformation will be.

To maintain highly accurate geographic data, the RMS error should be kept
under 0.004 inches (or its equivalent measurement in the coordinate system being
used). For less accurate data, the value can be as high as 0.008 inches or its
equivalent measure.
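The RMS computation itself can be sketched from the residuals between expected tic positions and their digitized positions. The coordinates below are invented digitizer inches, and this is an illustration of the standard root-mean-square formula, not ARC/INFO's internal code:

```python
import math

# Sketch of a tic-registration RMS error from positional residuals.
def rms_error(expected, digitized):
    """Root mean square of the distances between paired points."""
    sq = [(ex - dx) ** 2 + (ey - dy) ** 2
          for (ex, ey), (dx, dy) in zip(expected, digitized)]
    return math.sqrt(sum(sq) / len(sq))

expected  = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
digitized = [(0.001, 0.002), (10.002, -0.001), (9.998, 10.002), (0.0, 9.997)]
err = rms_error(expected, digitized)
assert err < 0.004   # within the recommended limit for accurate data
```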

TIC match tolerance

The tic match tolerance is used to ensure accurate map registration on a digitizer.
It is the maximum allowed distance between an existing tic and a tic being
digitized. Beyond this, the digitizing error is unacceptable and requires that the
map be re-registered. The tic match tolerance is used to ensure a low RMS error
during map registration on a digitizer. It measures how accurately each tic

location in the coverage matches the digitized location. A tic registration error is
calculated automatically for each tic when the map is registered on the digitizer.

A recommended tic match tolerance will vary depending on the quality of the data
being automated. The value should be no higher than 0.004 inches (0.01016 cm)
for highly accurate map data, or 0.008 inches (0.02032 cm) for maps requiring less
accuracy.

In this example, tic 3 must be re-registered because it is farther than the
tic match tolerance from the expected tic location.

Fuzzy tolerance

The fuzzy tolerance represents the minimum distance separating all arc
coordinates (nodes and vertices) in a coverage. By definition, it also defines the
distance a coordinate can move during certain operations. The fuzzy tolerance is
an extremely small distance used to resolve inexact intersection locations due to
the limited arithmetic precision of computers. Fuzzy tolerance values typically
range from 1/10,000 to 1/1,000,000 times the width of the coverage extent
defined in the coverage boundary (.BND) file.

The fuzzy tolerance is very useful in cleaning overshoots, undershoots, and
slivers, and in coordinate thinning along arcs. Normally, a small (but not too
small) fuzzy tolerance (less than 0.002 inches) is recommended.

Dangle length values vary with the type of feature to be cleaned. For a linear
feature coverage 0.0 is recommended, and for a polygon coverage 0.05 inches
(0.127 cm) or equivalent coverage units is often recommended.

Node Snap Tolerance

The node snap tolerance is the minimum distance within which two nodes
will be joined (matched) to form one node. The recommended value for
NODESNAP is 0.05 inches (0.127 cm).

Arc Snap Tolerance

The distance within which a new arc will be extended to intersect an existing arc
is called the arc snapping tolerance. A node is created at the new intersection of
the connecting arcs. The recommended value for ARCSNAP tolerance is 0.05
inches.

Weed Tolerance

The weed tolerance is the minimum allowable distance between any two
vertices along an arc. It is used to reduce the number of coordinates in an arc,
and is a parameter that can be set before adding arc features or used to
generalize existing arcs. When adding a new arc, a new vertex within the weed
tolerance of the previous vertex is disregarded. A weed tolerance value of 0.02
inches (0.0508 cm) or equivalent coverage units is recommended.
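The weeding rule for a new arc can be sketched as follows: drop any incoming vertex that lies within the tolerance of the previously kept vertex. This is a simplification for illustration (production generalization routines typically use more sophisticated algorithms such as Douglas-Peucker):

```python
import math

# Sketch of weed-tolerance thinning while an arc is being added.
def weed(vertices, tolerance):
    """Keep a vertex only if it is at least `tolerance` from the last kept one."""
    kept = [vertices[0]]
    for x, y in vertices[1:]:
        px, py = kept[-1]
        if math.hypot(x - px, y - py) >= tolerance:
            kept.append((x, y))
    return kept

arc = [(0.0, 0.0), (0.01, 0.0), (0.03, 0.0), (0.05, 0.0), (0.5, 0.0)]
thinned = weed(arc, tolerance=0.02)   # the 0.01 vertex is disregarded
```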

Grain Tolerance

The grain tolerance controls the number of vertices in an arc and the
distance between them. The smaller the grain tolerance, the closer together
vertices can be. The grain tolerance is also used for densifying the number of
vertices in a curve. While the grain tolerance affects the shape of newly created
curves, it has no effect on shape when used to densify existing arcs. A value of
0.02 inches is recommended.

The fuzzy tolerance for a digitizer precision of 0.002 inches is calculated as
follows:

Fuzzy tolerance = (scale / number of inches per coverage unit) * 0.002

For example, a 1:250,000-scale map in coverage units of feet yields
(250,000 / 12) * 0.002 = 41.66 feet

The following table shows the commonly used fuzzy tolerance in different scales:

Input scale    Coverage units    Fuzzy tolerance (on the ground)

1:250,000      Feet              41.660
               Meters            12.700
1:100,000      Feet              16.620
               Meters             5.080
1:24,000       Feet               4.000
               Meters             1.219
1:6,000        Feet               1.000
               Meters             0.304

Dangle Length

A dangling arc has the same polygon on its left and right sides (as defined
by the polygon internal number) and at least one dangling node. The dangle
length defines the minimum allowed length for a dangling arc in a coverage. The
CLEAN process deletes dangling arcs shorter than the dangle length.

6.4. Calculating a tolerance for a given map scale

The tolerance value often needs to be provided in coverage units
rather than inches. The following formulas can be used to calculate measures in
feet or meters for various input map scales.

If the coverage is stored in feet:

tol (feet) = tol (inches) * scale / 12

If the coverage is stored in meters:

tol (meters) = tol (cm) * scale / 100
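The two conversion formulas can be written as small helper functions; the fuzzy-tolerance table entries for 1:250,000 are then reproduced from a digitizer precision of 0.002 inches (0.00508 cm). This is a sketch of the arithmetic, not ARC/INFO code:

```python
# Convert a tolerance from digitizer units to coverage units.
def tol_feet(tol_inches, scale):
    """Tolerance in feet for a coverage stored in feet."""
    return tol_inches * scale / 12.0

def tol_meters(tol_cm, scale):
    """Tolerance in meters for a coverage stored in meters."""
    return tol_cm * scale / 100.0

# Check against the 1:250,000 entries of the fuzzy tolerance table.
assert round(tol_feet(0.002, 250_000), 2) == 41.67
assert round(tol_meters(0.00508, 250_000), 2) == 12.7
```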

The tolerance (TOL) file of a coverage contains values for the coverage's fuzzy
tolerance, dangle length and tic match tolerance. The TOLERANCE command
displays the previously set tolerance values of the specified coverage.

6.5. Topology Creation

In a GIS, topology is used to represent the spatial relationships that exist between
geographic data. Spatial relationships are the associations between geographic
data based on their relative locations to one another. People intuitively understand
the spatial relationships that exist between objects. If you look at a road map, you
know that road 'X' intersects road 'Y' because you can see that the two roads
cross. The problem for a GIS is to take these intuitive relationships and make the
GIS understand them. The intuitive relationships must be converted into physical
relationships in order to describe them in terms of finite definitions, such as
contiguity, containment (or area definition) and connectivity. This is what
topology does: it defines data so that the intuitive relationships become physical
relationships, and thus the system understands the spatial relationships that exist
between objects on a map. Finally, we can say that topology is the mathematical
representation of the physical relationships that exist between geographical
elements.

The ability to create and store topological relationships has a number of
advantages. Topology stores data more efficiently, which allows the processing of
larger data sets and faster processing. When topological relationships exist, you
can perform analyses such as modeling the flow through connecting lines in a
network, combining adjacent polygons that have similar characteristics, and
overlaying geographic features.

The three major topological concepts of a GIS are:

1. Arcs connect to each other at nodes (Connectivity)


2. Arcs that connect to surround an area define a polygon (Containment or area
definition)
3. Arcs have direction and left and right sides (Contiguity)

6.5.1. Connectivity

The points (x, y pairs) along the arc, called vertices, define the shape of the arc.
The endpoints of the arc are called nodes. Each arc has two nodes: from-node and
to-node. Arcs join only at nodes. By tracking all the arcs that meet at any node,
GIS understands which arcs connect.

In the example given below, arcs 3, 4, 5 and 6 all join at node 3. With this
information, the computer knows that it is possible to travel along arc 5 and turn
onto arc 3 because they share a common node (3), but it is not possible to turn
directly from arc 5 onto arc 9 because arc 5 and arc 9 do not share a common node.
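The connectivity test above can be sketched as a lookup on a from-node/to-node table. Arc numbers and the shared node follow the example in the text; the other node numbers and the dictionary layout are illustrative, not ARC/INFO's internal format:

```python
# Sketch of arc-node connectivity: two arcs connect if they share a node.
arcs = {            # arc number -> (from-node, to-node)
    3: (2, 3), 4: (3, 4), 5: (1, 3), 6: (3, 5), 9: (6, 7),
}

def connected(a, b):
    """True if arcs a and b meet at a common node."""
    return bool(set(arcs[a]) & set(arcs[b]))

assert connected(5, 3)       # share node 3: a turn is possible
assert not connected(5, 9)   # no common node: no direct turn
```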

6.5.2. Containment

Containment is simply area definition. Polygons are represented as
a series of x,y coordinates that connect to enclose an area. Most GIS software
stores polygons as in the figure given below. ARC/INFO, however, stores the
arcs defining the polygon, rather than a closed set of x,y pairs. A list of the arcs
that make up each polygon is also stored and used to construct the polygon when
necessary.

In the example given above, arcs 4, 6, 7 and 8 comprise polygon 2. (The 0
before the 8 indicates that this arc creates an island inside polygon 2.) Though an
arc may appear in the list of arcs for more than one polygon, each is stored only
once. Storing each arc only once reduces the amount of data in the database and
also ensures that the boundaries of adjacent polygons do not overlap.
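The arc-list storage just described can be sketched as follows, using the polygon from the example; the marker convention (0 flagging the following arc as an island) follows the text, while the dictionary layout is an illustration only:

```python
# Sketch of polygon storage by arc lists: each arc's geometry is stored
# once and shared between the polygons that reference it.
polygons = {
    2: [4, 6, 7, 0, 8],      # 0 marks the following arc (8) as an island
}

def boundary_arcs(poly_id):
    """Arc numbers forming the polygon, with island markers removed."""
    return [a for a in polygons[poly_id] if a != 0]

assert boundary_arcs(2) == [4, 6, 7, 8]
```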

6.5.3. Contiguity

Contiguity defines adjacency. Because each arc has a direction (a from-node and a
to-node), the GIS can maintain a list of the polygons on the left and right sides of
each arc. Polygons sharing a common arc are adjacent.
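With left/right polygons recorded per arc, adjacency falls out of a single pass over the arc table. The polygon and arc numbers below are invented for illustration, including the convention of 0 for the "universe" polygon outside the map:

```python
# Sketch of contiguity: each arc records its left and right polygons.
arc_table = {           # arc -> (left polygon, right polygon)
    4: (1, 2),
    6: (2, 3),
    7: (2, 0),          # 0: the universe polygon outside the map
}

def adjacent(p, q):
    """True if some arc has polygons p and q on opposite sides."""
    return any(set(sides) == {p, q} for sides in arc_table.values())

assert adjacent(1, 2)       # arc 4 separates them
assert not adjacent(1, 3)   # no shared arc
```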

6.6. Constructing Topology

To create spatial relationships between the features in a coverage, it is necessary to
construct topology. ARC/INFO assigns an internal number to each feature. These
numbers are then used to determine arc connectivity and polygon contiguity.
Once calculated, these values are recorded and stored in a tabular format called a
feature attribute table.

6.7. Feature Attribute Tables

Feature attribute tables are INFO files associated with each feature type. For
example, constructing topology for a polygon coverage creates a Polygon
Attribute Table (PAT); for a line coverage, an Arc Attribute Table (AAT); and for
a point coverage, a Point Attribute Table (also PAT). Each table is composed of
rows and columns. The columns represent items, such as the perimeter, whereas
the rows represent individual features, such as polygon number 2.

This diagram illustrates how data is stored in a PAT. It highlights the four
standard items that are created for every PAT: AREA, PERIMETER, Cover#,
and Cover-ID. The values for the first six records, which correspond to features
in the polygon coverage, are listed.

NOTE: the actual name of the coverage is substituted wherever you see the
word 'cover'.

The AAT contains seven standard items, created in the following order:

FNODE#     Internal node number for the beginning of an arc (from-node)
TNODE#     Internal node number for the end of an arc (to-node)
LPOLY#     Internal number for the left polygon (the Cover# in a corresponding PAT)
RPOLY#     Internal number for the right polygon (the Cover# in a corresponding PAT)
LENGTH     Length of each arc, measured in coverage units
COVER#     Internal arc number (values assigned by ARC/INFO)
COVER-ID   User-ID (values assigned by the user)

The values for the left and right polygons in an AAT for a coverage
containing only lines are always equal to zero.

Standard Items in a PAT

The PAT contains four standard items, created in the following order:

AREA        Area of each polygon, measured in coverage units
PERIMETER   Length of each polygon boundary, measured in coverage units
COVER#      Internal polygon number (assigned by ARC/INFO)
COVER-ID    User-ID (assigned by the user)

The PAT for a coverage of points always contains zero values for both AREA and
PERIMETER.
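The standard items can be pictured as records, one per feature. The dictionaries below use the item names listed above, with invented values and invented coverage names ('ROADS', 'WELLS') substituted for 'cover'; this is an illustration of the table layout, not the INFO file format itself:

```python
# Illustrative AAT record for an arc in a line-only coverage named ROADS.
aat_record = {
    "FNODE#": 1, "TNODE#": 3,       # from-node and to-node
    "LPOLY#": 0, "RPOLY#": 0,       # always zero in a line-only coverage
    "LENGTH": 152.7,                # in coverage units
    "ROADS#": 5, "ROADS-ID": 101,   # internal number and User-ID
}

# Illustrative PAT record for a point coverage named WELLS.
pat_point_record = {
    "AREA": 0.0, "PERIMETER": 0.0,  # always zero for a point coverage
    "WELLS#": 2, "WELLS-ID": 12,
}

assert aat_record["LPOLY#"] == aat_record["RPOLY#"] == 0
assert pat_point_record["AREA"] == pat_point_record["PERIMETER"] == 0.0
```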

BUILD Versus CLEAN

The commands BUILD and CLEAN construct topology in ARC/INFO. Although
these commands perform similar functions - they both construct topology and
create feature attribute tables - there are important differences. BUILD creates or
updates a feature attribute table for point, line, polygon, node or annotation
coverages. CLEAN, however, performs coordinate editing and creates feature
attribute tables for polygon and line coverages only.

Whether you use BUILD or CLEAN for creating polygon or arc topology depends
on how the data was originally digitized: BUILD recognizes only existing
intersections, whereas CLEAN creates intersections wherever lines cross one
another. BUILD assumes that coordinate data is correct, whereas CLEAN finds
arcs that cross and places a node at each intersection. In addition, CLEAN
corrects undershoots and overshoots within a specified tolerance.

Chapter 7

SPATIAL DATA ANALYSIS

7.1. Introduction

Geographic analysis allows us to study and understand real-world
processes by developing and applying manipulation and analysis criteria and
models, and to carry out integrated modeling. These criteria illuminate underlying
trends in geographic data, making new information available. A GIS enhances this
process by providing tools which can be combined in meaningful sequences to
reveal new or previously unidentified relationships within or between data sets,
thus improving our understanding of the real world. The results of geographic
analysis can be communicated in the form of maps, reports or both. Integration
involves bringing together diverse information from a variety of sources and the
analysis of multi-parameter data to provide answers and solutions to defined
problems.

Spatial analysis is a vital part of GIS. It can be done in two ways: one is
vector-based and the other is raster-based analysis. This chapter deals with
both vector-based and raster-based spatial data analysis. First we shall discuss
general concepts and the various types of spatial analysis possible in GIS;
vector- and raster-based spatial analysis will then be discussed in detail in
separate sections.

7.2. Significance of Spatial Analysis

Spatial analysis is one of the most important uses of GIS. Since the
advent of GIS in the 1980s, many government agencies have invested heavily in
GIS installations, including the purchase of hardware and software and the
construction of mammoth databases. Two fundamental functions of GIS have
been widely realized: generation of maps and generation of tabular reports.
Indeed, GISs provide a very effective tool for generating maps and statistical
reports from a database. However, GIS functionality far exceeds the purposes of
mapping and report compilation. In addition to the basic functions related to
automated cartography and database management systems, the most important
uses of GISs are their spatial analysis capabilities. As spatial information is
organized in a GIS, it should be able to answer complex questions regarding
space.

Making maps alone does not justify the high cost of building a GIS. The
same maps may be produced using a simpler cartographic package. Likewise, if
the purpose is to generate tabular output, then a simpler database management
system or a statistical package may be a more efficient solution. It is spatial
analysis that requires the logical connections between attribute data and map
features, and the operational procedures built on the spatial relationships among

map features. These capabilities make GIS a much more powerful and cost-
effective tool than automated cartographic packages, statistical packages, or
database management systems. Indeed, functions required for performing spatial
analysis that are not available in either cartographic packages or database
management systems are commonly implemented in GIS.

7.3. Using GIS for Spatial Analysis

Spatial analysis in GIS involves three types of operations: attribute query (also
known as aspatial query), spatial query, and the generation of new data sets from
the original database. The scope of spatial analysis ranges from a simple query
about spatial phenomena to complicated combinations of attribute queries, spatial
queries, and alterations of the original data.

Attribute Query: Requires the processing of attribute data exclusive of spatial
information. In other words, it is a process of selecting information by asking
logical questions.

Example: From a database of a city parcel map where every parcel is listed with a
land use code, a simple attribute query may require the identification of all parcels
for a specific land use type. Such a query can be handled through the table without
referencing the parcel map. Because no spatial information is required to answer
this question, the query is considered an attribute query. In this example, the
entries in the attribute table that have a land use code identical to the specified
type are identified.

Parcel No.    Size     Value      Land Use

102           7,500    200,000    Commercial
103           7,500    160,000    Residential
104           9,000    250,000    Commercial
105           6,600    125,000    Residential

Figure 7.1: Sample parcel map and its attribute table

 Listing of parcel number and value with land use = 'Commercial' is an
attribute query

 Identification of all parcels within a 100-m distance is a spatial query
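The attribute query above can be sketched as a simple filter over the parcel table; no spatial information is consulted:

```python
# The parcel attribute table from Figure 7.1 as a list of records.
parcels = [
    {"parcel": 102, "size": 7500, "value": 200_000, "land_use": "Commercial"},
    {"parcel": 103, "size": 7500, "value": 160_000, "land_use": "Residential"},
    {"parcel": 104, "size": 9000, "value": 250_000, "land_use": "Commercial"},
    {"parcel": 105, "size": 6600, "value": 125_000, "land_use": "Residential"},
]

# Attribute query: parcel number and value where land use = 'Commercial'.
commercial = [(p["parcel"], p["value"])
              for p in parcels if p["land_use"] == "Commercial"]
assert commercial == [(102, 200_000), (104, 250_000)]
```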

Spatial Query: Involves selecting features based on location or spatial
relationships, which requires processing of spatial information. For instance, a
question may be raised about the parcels within one mile of the freeway. In this
case, the answer can be obtained either from a hardcopy map or by using a GIS
with the required geographic information.

Example: Let us take a spatial query example. Where a request is submitted for
rezoning, all owners whose land is within a certain distance of all parcels that may
be rezoned must be notified for a public hearing. A spatial query is required to
identify all parcels within the specified distance. This process cannot be
accomplished without spatial information. In other words, the attribute table of the
database alone does not provide sufficient information for solving problems that
involve location.

Parcels for rezoning

Parcels for notification

Figure 7.2: Land owners within a specified distance from the parcel to be rezoned
identified through spatial query
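The notification query can be sketched as a distance test against the parcel to be rezoned. Representing each parcel by a single centroid point is a simplification for illustration (a real GIS would measure distance between parcel boundaries), and all coordinates are invented:

```python
import math

# Sketch of a spatial query: parcels within a given distance of parcel 102.
parcels = {102: (0.0, 0.0), 103: (60.0, 0.0),
           104: (60.0, 80.0), 105: (300.0, 0.0)}

def within(origin, dist):
    """Parcel IDs whose centroid lies within dist of the origin parcel."""
    ox, oy = parcels[origin]
    return sorted(pid for pid, (x, y) in parcels.items()
                  if pid != origin and math.hypot(x - ox, y - oy) <= dist)

notify = within(102, 100.0)   # owners to notify for the public hearing
```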

While basic spatial analysis involves some attribute queries and spatial queries,
complicated analyses typically require a series of GIS operations, including
multiple attribute and spatial queries, alteration of original data, and generation of
new data sets. The methods for structuring and organizing such operations are a
major concern in spatial analysis. An effective spatial analysis is one in which the
best available methods are appropriately employed for different types of attribute
queries, spatial queries, and data alteration. The design of the analysis depends on
the purpose of the study.

7.4. GIS Usage in Spatial Analysis

GIS can interrogate geographic features and retrieve associated attribute
information; this is called identification. It can generate new sets of maps by
query and analysis, and it can derive new information through spatial operations.
Some analytical procedures applied with a GIS are described here. GIS
operational procedures and analytical tasks that are particularly useful for spatial
analysis include:

 Single layer operations
 Multi layer operations/ Topological overlay
 Spatial Modeling
 Geometric modeling
 Calculating the distance between geographic features
 Calculating area, length and perimeter

 Geometric buffers.
 Point pattern Analysis
 Network analysis
 Surface analysis
 Raster/Grid analysis
 Fuzzy Spatial Analysis
 Geo-statistical Tools for Spatial Analysis

1. Single layer operations: procedures that correspond to attribute queries and
alterations of data operating on a single data layer.

Example: Creating a buffer zone around all streets of a road map is a single layer
operation, as shown in Figure 7.3:

Streets

Buffer Zones

Figure 7.3: Buffer zones extended from streets
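A buffer-zone membership test for a street held as a polyline can be sketched as follows: a point lies inside the buffer if its distance to the nearest street segment is at most the buffer width. The geometry and the buffer width are invented for illustration:

```python
import math

# Distance from point p to the segment a-b (standard projection formula).
def seg_dist(p, a, b):
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = 0.0 if dx == dy == 0 else max(0.0, min(1.0,
        ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffer(point, street, width):
    """True if point lies within `width` of any segment of the polyline."""
    return any(seg_dist(point, a, b) <= width
               for a, b in zip(street, street[1:]))

street = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
assert in_buffer((50.0, 20.0), street, width=25.0)
assert not in_buffer((50.0, 60.0), street, width=25.0)
```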

2. Multi layer operations: are useful for the manipulation of spatial data on
multiple data layers. Figure 7.4 depicts the overlay of two input data layers
representing a soil map and a land-use map respectively. The overlay of these two
layers produces a new map in which the different combinations of soil and land
use are delineated.

Figure 7.4: Combined polygons created from overlay of two data layers

3. Topological overlays: These are multi layer operations which allow combining
features from different layers to form a new map, giving new information and
features that were not present in the individual maps. This topic is discussed in
detail in the section on vector-based analysis.

4. Spatial modeling: involves the construction of explanatory and predictive
models for statistical testing. Figure 7.5 shows an example of air pollution spatial
modeling. Emissions of a specific particulate are measured at monitoring stations,
represented as point locations on the bottom layer. The distribution of air
pollution is believed to be related to soils (silt content and other soil
characteristics), agricultural operations, roads, and topography. With a database
containing all required data elements, a spatial model can be constructed to
explain the distribution of air pollution based on these related variables.

Figure 7.5: A spatial model for air pollution study

5. Point pattern analysis: deals with the examination and evaluation of spatial
patterns and the processes of point features.

A typical biological survey map is shown in figure 7.6, in which each point feature
denotes an observation of an endangered species, such as the bighorn sheep of
southern California. The objective of illustrating the point features is to determine
the most favorable environmental conditions for this species. Accordingly, the
spatial distribution of the species can be examined in a point pattern analysis. If the
distribution shows a random pattern, it may be difficult to identify significant
factors that influence the species' distribution. However, if the observed locations
show a systematic pattern, such as the clusters in this diagram, it is possible to
analyze the animals' behavior in terms of environmental characteristics. In
general, point pattern analysis is the first step in studying the spatial distribution
of point features.

Figure 7.6: Distribution of an endangered species

6. Network analysis: designed specifically for line features organized in
connected networks, network analysis typically applies to transportation problems
and location analysis, such as school bus routing, passenger plotting, walking
distance, bus stop optimization, and optimum path finding.

Figure 7.7 shows a common application of GIS-based network analysis. Routing is
a major concern for the transportation industry. For instance, trucking companies
must determine the most cost-effective way of connecting stops for pick-up or
delivery. In this example, a route is to be delineated for a truck to pick up
packages at five locations. A routing application can be developed to identify the
most efficient route for any set of pick-up locations. The highlighted line
represents the most cost-effective way of linking the five locations.

Figure 7.7: The most cost effective route

7. Surface analysis: deals with the spatial distribution of surface information in
terms of a three-dimensional structure.

The distribution of any spatial phenomenon can be displayed in a
three-dimensional perspective diagram for visual examination. A surface may
represent the distribution of a variety of phenomena, such as population, crime,
market potential, and topography, among many others. The perspective diagram in
figure 7.8 represents the topography of the terrain, generated from a digital
elevation model (DEM) through a series of GIS-based operations in surface
analysis.

Figure 7.8: Perspective diagram representing topography of the terrain derived
from a surface analysis

8. Grid analysis: involves the processing of spatial data in a special, regularly
spaced form.

Figure 7.9 shows a grid-based model of fire progression.
The darkest cells in the grid represent the area where a fire is currently underway.
A fire probability model which incorporates fire behavior in response to
environmental conditions such as wind and topography delineates areas that are
most likely to burn in the next two stages. These areas are represented by lighter
shaded cells. Fire probability models are especially useful to fire fighting
agencies for developing quick-response, effective suppression strategies.

Figure 7.9: A fire behavior model delineates areas of fire progression based on a
grid analysis

In most cases, GIS software provides the most effective tool for performing the
above tasks.

7.5. Modeling/Analysis Issues involved in GIS

 Must understand the data in totality and their relationships
 Data accuracy and quality
 Selecting the right parameters for integration
 Criteria formulation depending upon the aim and objective of the analysis

7.6. Steps for Performing Geographic Analysis

Before you can perform geographic analysis, you must define your problem and
identify a sequence of operations that will produce meaningful results. A typical
workflow for performing spatial analysis follows:

Steps

1. Establish the objectives & criteria for the analysis
2. Prepare data for spatial operations
3. Perform spatial operations
4. Prepare data for tabular analysis
5. Perform tabular operations
6. Evaluate & interpret the results
7. Refine the analysis as necessary

7.7. Vector-Based Spatial Data Analysis

7.7.1. Various types of Overlay Operations in GIS

These are multi layer operations which allow combining features from
different layers to form a new map, giving new information and features that
were not present in the individual maps.

Topological overlays: Selective overlay of polygons, lines and points enables
users to generate a map containing the features and attributes of interest, extracted
from different themes or layers (refer to figure 7.4). Overlay operations can be
performed on both raster (grid) and vector maps; in the case of raster maps, a map
calculation tool is used to perform the overlay. We shall discuss the various
overlay operations offered by ARC/INFO.

 In topological overlays, polygon features of one layer can be combined with
the point, line & polygon features of another layer.

 Polygon-in-polygon overlay:

Output is a polygon coverage.
Coverages are overlaid two at a time.
There is no limit on the number of coverages to be combined.
A new FAT (feature attribute table) is created holding information about each newly created feature.

 Point-in-polygon overlay:

Output is a point coverage with additional attributes.
No new point features are created.
No polygon boundaries are copied.

 Line-in-polygon overlay:

Output is a line coverage with additional attributes.
No polygon boundaries are copied.
New arc-node topology is created.


 Logical Operators: Overlay analysis manipulates spatial data organized in
different layers to create combined spatial features according to logical
conditions specified in Boolean algebra, with the help of logical and
conditional operators. The logical conditions are specified with operands (data
elements) & operators (relationships among data elements).

 Note: In vector overlay, arithmetic operations are performed with the help of
logical operators; there is no direct way to do them.

Common logical operators include AND, OR, XOR (exclusive OR), and
NOT. Each operation is characterized by specific logical checks of decision
criteria to determine whether a condition is true or false. The following table shows
the true/false conditions of the most common Boolean operations. In this table, A &
B are two operands; one (1) implies a true condition and zero (0) implies false.
Thus, if the A condition is true while the B condition is false, then the combined
condition A AND B is false, whereas the combined condition A OR B is true.

AND: common area / intersection / clipping operation
OR: union or addition
NOT: difference (erase)
XOR: symmetric difference

Truth table of common Boolean operations

A  B | A AND B | A OR B | A NOT B | B NOT A | A XOR B
0  0 |    0    |   0    |    0    |    0    |    0
0  1 |    0    |   1    |    0    |    1    |    1
1  0 |    0    |   1    |    1    |    0    |    1
1  1 |    1    |   1    |    0    |    0    |    0
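The same truth table can be reproduced in code. The sketch below is illustrative only (not taken from any particular GIS package): each layer is treated as a grid of True/False cells, where True marks cells that satisfy that layer's condition, and the layer names are hypothetical.

```python
# Sketch: Boolean overlay of two binary layers. The operators mirror
# the truth table above; "NOT" here means A NOT B (in A but not in B).

def overlay(a, b, op):
    """Combine two binary layers cell by cell with a Boolean operator."""
    ops = {
        "AND": lambda x, y: x and y,      # intersection / clip
        "OR":  lambda x, y: x or y,       # union
        "NOT": lambda x, y: x and not y,  # difference: in A but not in B
        "XOR": lambda x, y: x != y,       # in exactly one of the layers
    }
    f = ops[op]
    return [[f(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# A = suitable soils, B = existing land use (hypothetical 2 x 2 layers)
A = [[False, False], [True, True]]
B = [[False, True], [False, True]]

print(overlay(A, B, "AND"))  # [[False, False], [False, True]]
print(overlay(A, B, "OR"))   # [[False, True], [True, True]]
print(overlay(A, B, "NOT"))  # [[False, False], [True, False]]
print(overlay(A, B, "XOR"))  # [[False, True], [True, False]]
```

The four cell combinations in A and B cover exactly the four rows of the truth table.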

The most common basic multi layer operations are the union, intersection, &
identity operations. All three operations merge spatial features on separate data
layers to create new features from the original coverages. The main difference
among these operations lies in the way spatial features are selected for processing.

 Conditional Operators:

EQ, =     Equal to
NE, <>    Not equal to
GE, >=    Greater than or equal to
LE, <=    Less than or equal to
GT, >     Greater than
LT, <     Less than
CN        Containing
NC        Not containing

Figure 7.10 tabulates, for each overlay operation (CLIP, ERASE, SPLIT,
IDENTITY, UNION and INTERSECT), the primary layer, the operation layer,
and the result.

Figure 7.10: Overlay operations

7.7.2. Various types of Spatial Operations in GIS

 Spatial Join Operations:

IDENTITY, INTERSECT and UNION provide different types of overlay
operations and give flexibility for geographic data manipulation and analysis. In
polygon overlay, features from two map coverages are geometrically intersected
to produce a new set of information. Attributes for these new features are derived
from the attributes of both original coverages and thereby contain new spatial and
attribute data relationships.

 Feature Extraction Operations:

CLIP, ERASE and RESELECT facilitate the extraction of desired features from a
coverage, either by using a template coverage or by using spatial or logical criteria.

 Feature merging Operations:

DISSOLVE and ELIMINATE enable, respectively, the merging of polygons to
create new polygon features, and the removal of spurious/sliver polygons
resulting from an overlay operation.

 Proximity Operations:

ARC/INFO provides the BUFFER command, which can be used to define a zone
of specified distance around a selected feature. Different sized buffers can be
generated around selected features based on associated attribute data.

 Map database merging and splitting Operations:

MAPJOIN and SPLIT commands facilitate the merging or splitting of maps.

 Coordinate transformation Operations:

PROJECT and TRANSFORM commands enable coordinate transformation using
affine or projective transformations based on a set of control points. PROJECT
supports coordinate transformation between any two projections.

 Note that ESRI ARC/INFO offers all of the above functions and commands for
geographic information manipulation and analysis.

 Recent versions of ARC/INFO [ver. 7.1.1 & 7.1.2 (UNIX based)] also provide
special functions designed for the manipulation and analysis of REGIONS,
offering a host of commands, viz. REGIONBUFFER, REGIONDISSOLVE,
AREAQUERY, REGIONQUERY, REGIONSELECT, etc.

7.8. Buffer Analysis

Spatial searching (also called buffering or proximity analysis) is based on
distances derived from selected features. Area expansion of features is
commonly known as the buffer operation in GIS. It is used to highlight a zone of
interest around a point, line or polygon, which in turn can be used to retrieve
attribute data or generate new features. Both constant and variable width buffers
can be generated. Because the buffer operation expands area, it always results in
polygon features.

 Salient Features of Buffer Operation

Buffer can be used to generate buffer zones around point, line or polygon
features, allowing one to find the areas within the specified buffer distance of a
feature.

 BUFFER creates a new output coverage by generating buffer zones around
input coverage features.
 Input coverage features can be points, lines, polygons or nodes.
 Output coverage features will always be polygons.
 Polygon topology is created for the output coverage. New label points are
created in each polygon.
 Each polygon is flagged according to the type of area it represents & is stored
in an item called INSIDE in the output coverage PAT.

Buffer zones can be controlled in two ways:

 By using a buffer distance to specify a single size for all buffer zones
 By specifying a buffer item and, optionally, a buffer table to generate multiple
buffer sizes

Syntax:

BUFFER <in-cover> <out-cover> {buffer-item} {buffer-table} {buffer-distance} {fuzzy-tolerance} {LINE | POINT | NODE}

Point coverage, buffer zones, and output polygon coverage

 Buffer zones (middle) are generated from a point coverage (left), resulting in a
polygon coverage (right).

Line coverage and output polygon coverage

Buffer zones generated from a line coverage (left) define a polygon coverage (right).

Input polygon coverage, buffer zones, and output polygon coverage

Figure 7.11: Proximity operations

The buffer operation creates an expanded polygon coverage (right) from two
separate polygons (left).
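The point-buffer logic underlying these operations can be sketched independently of ARC/INFO; the function name and the feature coordinates below are hypothetical. A location falls inside the buffer zone of a point feature if its Euclidean distance to that feature is within the buffer distance.

```python
# Sketch of the point-buffer idea only (not the ARC/INFO BUFFER command):
# a query location lies inside the buffer zone of a point coverage if its
# distance to any point feature is within the buffer distance.
import math

def in_buffer(location, features, distance):
    """True if `location` falls inside the buffer of any point feature."""
    return any(math.dist(location, f) <= distance for f in features)

wells = [(2.0, 3.0), (10.0, 1.0)]          # hypothetical point coverage
print(in_buffer((3.0, 4.0), wells, 2.0))   # True  (about 1.41 from the first well)
print(in_buffer((6.0, 6.0), wells, 2.0))   # False (no well within 2 units)
```

A variable-width buffer would simply look up `distance` per feature from an attribute, instead of using one constant.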

Chapter 08

RASTER BASED SPATIAL DATA ANALYSIS


This section discusses operational procedures and quantitative methods
for the analysis of spatial data in raster format. In raster analysis, geographic units
are regularly spaced, and the location of each unit is referenced by row and
column positions. Because geographic units are of equal size and identical shape,
area adjustment of geographic units is unnecessary and the spatial properties of
geographic entities are relatively easy to trace. All cells in a grid have a positive
position reference, following the left-to-right and top-to-bottom data scan as
shown in figure 13. Every cell in a grid is an individual unit and must be assigned
a value. Depending on the nature of the grid, the value assigned to a cell can be an
integer or a floating point number. When data values are not available for particular
cells, they are described as NODATA cells. NODATA cells differ from cells
containing zero in the sense that a zero value is itself data.

The regularity in the arrangement of geographic units allows the
underlying spatial relationships to be formulated efficiently. For instance, the
distance between orthogonal neighbors (neighbors on the same row or column) is
always a constant, whereas the distance between two diagonal units can be
computed as a function of that constant. Therefore, the distance between any pair
of units can be computed from differences in row and column positions.
Furthermore, directional information is readily available for any pair of origin and
destination cells as long as their positions in the grid are known.

Advantages of using the raster format in spatial analysis are listed below:

 Efficient processing: Because geographic units are regularly spaced with
identical spatial properties, multiple layer operations can be processed very
efficiently.

 Numerous existing sources: Grids are the common format for numerous
sources of spatial information including satellite imagery, scanned aerial
photos, and digital elevation models, among others. These data sources have
been adopted in many GIS projects and have become the most common
sources of major geographic databases.

 Different feature types organized in the same layer: For instance, the same
grid may consist of point features, line features, and area features, as long as
different features are assigned different values.

Grid format disadvantages appear below:

 Data redundancy: When data elements are organized in a regularly spaced
system, there is a data point at the location of every grid cell, regardless of
whether the data element is needed or not. Although several compression
techniques are available, the advantages of gridded data are lost whenever the
gridded data format is altered through compression. In most cases, compressed
data cannot be directly processed for analysis; the compressed raster data must
first be decompressed in order to take advantage of spatial regularity.

 Resolution confusion: Gridded data give an unnatural look and an unrealistic
presentation unless the resolution is sufficiently high. At the same time, spatial
resolution dictates spatial properties: some spatial statistics derived from a
distribution may differ if the spatial resolution varies, a consequence of the
well-known scale problem.

 Cell value assignment difficulties: Different methods of cell value
assignment may result in quite different spatial patterns.

8.1. Grid Operations used in Map Algebra

Common operations in grid analysis consist of the following functions, which are
used in Map Algebra to manipulate grid files. Map Algebra is a language
developed to perform cartographic modeling; it supports four basic classes of
operations:

1. Local functions: work on every single cell
2. Focal functions: process the data of each cell based on the information of a
specified neighborhood
3. Zonal functions: operate on each group of cells of identical values
4. Global functions: work on a cell based on the data of the entire grid

The principal functionality of these operations is described below.

8.1.1. Local Functions

Local functions process a grid on a cell-by-cell basis, that is, each cell is
processed based solely on its own values, without reference to the values of other
cells. In other words, the output value is a function of the value or values of the
cell being processed, regardless of the values of surrounding cells.

For single layer operations, a typical example is changing the value of
each cell by adding or multiplying a constant. In the following example, the input
grid contains values ranging from 0 to 4; blank cells represent NODATA cells.
A simple local function multiplies every cell by a constant of 3. The results are
shown in the output grid at the right. When there is no data for a cell, the
corresponding cell of the output grid remains blank.

Input Grid             Output Grid

2 0 1 1                6  0  3  3
2 3 0 4     X 3   =    6  9  0  12
4 2 3                  12 6  9
1 1 2                  3  3  6

Local functions can also be applied to multiple layers represented by
multiple grids of the same geographic area.

Input Grid       Multiplier Grid      Output Grid

2 0 1 1          1 1 2 2              2 0 2 2
2 3 0 4     X    1 2 2 3        =     2 6 0 12
4 2 3            2 3 3 3              8 6 9
1 1 2            2 3 4 4              2 3 8

Local functions are not limited to arithmetic computations. Trigonometric,
exponential, logarithmic and logical expressions are all acceptable for defining
local functions.
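A minimal sketch of local functions follows, assuming (as an illustration, not a package convention) that NODATA cells are represented by None and propagate to the output:

```python
# Sketch of a local function: each output cell depends only on the
# corresponding input cell(s), never on neighboring cells.

def local_times(grid, k):
    """Multiply every cell by a constant; None (NODATA) stays None."""
    return [[None if v is None else v * k for v in row] for row in grid]

def local_multiply(a, b):
    """Cell-by-cell product of two grids of the same shape."""
    return [[None if x is None or y is None else x * y
             for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

grid = [[2, 0, 1], [2, 3, None]]
print(local_times(grid, 3))        # [[6, 0, 3], [6, 9, None]]
print(local_multiply(grid, grid))  # [[4, 0, 1], [4, 9, None]]
```

Any single-cell expression (trigonometric, logarithmic, logical) could replace the multiplication in the same pattern.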

8.1.2. Focal Functions

Focal functions process cell data depending on the values of neighboring
cells. For instance, computing the sum of a specified neighborhood and assigning
the sum to the corresponding cell of the output grid is the "focal sum" function.
Here the neighborhood is defined by a 3 x 3 kernel. For cells close to the edge,
where the regular kernel is not available, a reduced kernel is used and the sum is
computed accordingly. For instance, the upper left corner cell is adjusted to a
2 x 2 kernel; the sum of the four values 2, 0, 2 and 3 yields 7, which becomes the
value of this cell in the output grid. The value in the second row, second column
is the sum of nine elements, 2, 0, 1, 2, 3, 0, 4, 2 and 2, which equals 16.

Input Grid                Output Grid

2 0 1 1                   7  8  9  6
2 3 0 4    Focal Sum =    13 16 16 11
4 2 2 3                   13 18 20 14
1 1 3 2                   8  13 13 10

Another focal function is the mean of the specified neighborhood, the "focal
mean" function. For the same input grid, this function yields the mean of the eight
adjacent cells and the center cell itself. This is a smoothing function used to obtain
a moving average: the value of each cell is changed into the average of its
specified neighborhood.

Input Grid                 Output Grid

2 0 1 1                    1.8 1.3 1.5 1.5
2 3 0 4    Focal Mean =    2.2 1.8 1.8 1.8
4 2 2 3                    2.2 2.0 2.2 2.3
1 1 3 2                    2.0 2.2 2.2 2.5

Other commonly employed focal functions include standard deviation (focal
standard deviation), maximum (focal maximum), minimum (focal minimum), and
range (focal range).
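The focal sum and focal mean described above can be sketched as follows. This is an illustrative implementation: the neighborhood is a 3 x 3 kernel that shrinks at the grid edges, and the input is the example grid from the text.

```python
# Sketch of focal functions: each output cell is a statistic of the cell
# and its available neighbors (a reduced kernel is used at the edges).

def focal(grid, stat):
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # collect the cell and every neighbor that exists in the grid
            hood = [grid[i][j]
                    for i in range(max(0, r - 1), min(rows, r + 2))
                    for j in range(max(0, c - 1), min(cols, c + 2))]
            row.append(stat(hood))
        out.append(row)
    return out

grid = [[2, 0, 1, 1],
        [2, 3, 0, 4],
        [4, 2, 2, 3],
        [1, 1, 3, 2]]

focal_sum = focal(grid, sum)
focal_mean = focal(grid, lambda h: round(sum(h) / len(h), 1))

print(focal_sum[0])   # [7, 8, 9, 6] -- first row of the focal sum grid
print(focal_mean[0])  # [1.8, 1.3, 1.5, 1.5]
```

Focal maximum, minimum, or range drop in the same way by passing `max`, `min`, or `lambda h: max(h) - min(h)` as the statistic.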

8.1.3. Zonal Functions

Zonal functions process the data of a grid in such a way that cells of the
same zone are analyzed as a group. A zone consists of a number of cells that may
or may not be contiguous. A typical zonal function requires two grids: a zone
grid, which defines the size, shape and location of each zone, and a value grid,
which is to be processed for analysis. In the zone grid, cells of the same zone are
coded with the same value, while different zones are assigned different zone values.

The figure below illustrates an example of a zonal function in which the zonal
maximum is identified for each zone. In the input zone grid there are three zones,
with values ranging from 1 to 3. The zone with a value of 1, for example, includes
cells at the upper right corner and at the lower left corner. The procedure involves
finding the maximum value among each zone's cells in the value grid.

Zone Grid        Value Grid            Output Grid

2 2 1 1          1 2 3 4               5 5 8 8
2 3 3 1          5 6 7 8    Zonal      5 7 7 8
3 2 1 2          3 4 7 5    Max =      7 5 8 5
1 1 2 2          5 5 5 5               8 8 5 5

Typical zonal functions include zonal mean, zonal standard deviation, zonal sum,
zonal minimum, zonal maximum, zonal range, and zonal variety. Other statistical
and geometric properties may also be derived from additional zonal functions.
For instance, the zonal perimeter function calculates the perimeter of each zone
and assigns the returned value to each cell of the zone in the output grid.
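A zonal function can be sketched by grouping cell values per zone, computing the statistic, and writing the result back to every cell of the zone. The zone and value grids below follow the example above:

```python
# Sketch of a zonal function: the statistic of each zone's values is
# assigned to every cell belonging to that zone.

def zonal(zone_grid, value_grid, stat=max):
    zones = {}
    for zr, vr in zip(zone_grid, value_grid):
        for z, v in zip(zr, vr):
            zones.setdefault(z, []).append(v)   # collect values per zone
    stats = {z: stat(vals) for z, vals in zones.items()}
    return [[stats[z] for z in row] for row in zone_grid]

zone = [[2, 2, 1, 1],
        [2, 3, 3, 1],
        [3, 2, 1, 2],
        [1, 1, 2, 2]]
value = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [3, 4, 7, 5],
         [5, 5, 5, 5]]

print(zonal(zone, value))  # zone 1 -> 8, zone 2 -> 5, zone 3 -> 7
```

Passing `min`, `sum`, or a mean lambda as `stat` yields zonal minimum, zonal sum, or zonal mean in the same pattern.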

8.1.4. Global Functions

For global functions, the output value of each cell is a function of the entire
grid. As an example, the Euclidean distance function computes the distance from
each cell to the nearest source cell, where source cells are defined in an input grid.
In a square grid, the distance between two orthogonal neighbors is equal to the
size of a cell, or the distance between the centroid locations of adjacent cells.
Likewise, the distance between two diagonal neighbors is equal to the cell size
multiplied by the square root of 2. Distance between non-adjacent cells can be
computed according to their row and column addresses.

In the figure below, the grid at the left is the source grid, in which two clusters of
source cells exist. The source cells labeled 1 form the first cluster, and the cell
labeled 2 is a single-cell source. The Euclidean distance from any source cell is
always equal to 0. For any other cell, the output value is the distance to its nearest
source cell.

Source Grid                 Output Grid

. . 1 1                     2.0 1.0 0.0 0.0
. . . 1    Euclidean        1.4 1.0 1.0 0.0
. 2 . .    distance =       1.0 0.0 1.0 1.0
. . . .                     1.4 1.0 1.4 2.0

(dots mark non-source cells)

In the above example, the measurement of the distance from any cell must include
the entire source grid; therefore this analytical procedure is a global function.
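The Euclidean distance function can be sketched by brute force over all source cells (production GIS packages use faster propagation algorithms). The source grid follows the example above:

```python
# Sketch of the Euclidean distance global function: each output cell
# holds the distance, in cell units, to its nearest source cell.
import math

def euclidean_distance(source_grid):
    rows, cols = len(source_grid), len(source_grid[0])
    sources = [(r, c) for r in range(rows) for c in range(cols)
               if source_grid[r][c] is not None]
    return [[round(min(math.hypot(r - sr, c - sc) for sr, sc in sources), 1)
             for c in range(cols)] for r in range(rows)]

# None = non-source cell; 1 and 2 label the two source clusters
src = [[None, None, 1, 1],
       [None, None, None, 1],
       [None, 2, None, None],
       [None, None, None, None]]

print(euclidean_distance(src))
# [[2.0, 1.0, 0.0, 0.0],
#  [1.4, 1.0, 1.0, 0.0],
#  [1.0, 0.0, 1.0, 1.0],
#  [1.4, 1.0, 1.4, 2.0]]
```

Orthogonal neighbors are 1 cell apart and diagonal neighbors sqrt(2) = 1.4 apart, exactly as described in the text.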

The next example shows the cost distance function. The source grid is
identical to that of the preceding illustration, but this time a cost grid is
employed to weight travel cost. The value in each cell of the cost grid indicates
the cost of travelling through that cell. Thus, the cost of travelling from the cell
located in the first row, second column to its adjacent source cell to the right is
half the cost of travelling through itself plus half the cost of travelling through the
neighboring cell.

Source Grid      Cost Grid          Output Grid

. . 1 1          2 2 4 4            5.0 3.0 0.0 0.0
. . . 1          4 4 3 3      =     3.5 2.5 2.8 0.0
. 2 . .          2 1 4 1            1.5 0.0 2.5 2.0
. . . .          2 5 3 3            2.1 3.0 2.8 4.0

Travel cost for each cell is derived from the distance to the nearest source
cell, weighted by the cost grid (dots mark non-source cells).
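The cost distance function can be sketched as a multi-source Dijkstra search; this is one illustrative implementation, not the algorithm of any particular package. A move between adjacent cells costs the mean of the two cell costs, multiplied by sqrt(2) for diagonal moves, matching the half-plus-half rule described above.

```python
# Sketch of the cost distance global function as a multi-source
# Dijkstra search over the 8-connected grid.
import heapq, math

def cost_distance(sources, cost):
    rows, cols = len(cost), len(cost[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = [(0.0, r, c) for r, c in sources]
    for _, r, c in heap:
        dist[r][c] = 0.0          # source cells cost nothing to reach
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue              # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    step = (cost[r][c] + cost[nr][nc]) / 2
                    if dr and dc:           # diagonal move
                        step *= math.sqrt(2)
                    if d + step < dist[nr][nc]:
                        dist[nr][nc] = d + step
                        heapq.heappush(heap, (d + step, nr, nc))
    return [[round(d, 1) for d in row] for row in dist]

cost = [[2, 2, 4, 4],
        [4, 4, 3, 3],
        [2, 1, 4, 1],
        [2, 5, 3, 3]]
sources = [(0, 2), (0, 3), (1, 3), (2, 1)]  # source cells from the example

print(cost_distance(sources, cost))
# [[5.0, 3.0, 0.0, 0.0],
#  [3.5, 2.5, 2.8, 0.0],
#  [1.5, 0.0, 2.5, 2.0],
#  [2.1, 3.0, 2.8, 4.0]]
```

Retracing the predecessor of each cell in this search would give the cost path function mentioned next.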

Another useful global function is the cost path function, which identifies the least
cost path from each selected cell to its nearest source cell in terms of cost distance.
These global functions are particularly useful for evaluating the connectivity of a
landscape and the proximity of a cell to any given entities.

8.2. Raster Analysis Operations

In this section some important raster-based analyses are dealt with:

 Renumbering areas in a grid file
 Performing a cost surface analysis
 Performing an optimal path analysis
 Performing a proximity search
8.3. Grid-Based Spatial Analysis

Diffusion modeling and connectivity analysis can be conducted effectively on
grid data. Grid analysis is suitable for these types of problems because of the
grid's regular spatial configuration of geographic units.

Diffusion Modeling:

Diffusion modeling deals with the process underlying a spatial distribution. The
constant distance between adjacent units makes it possible to simulate progression
over geographic units at a consistent rate. Diffusion modeling has a variety of
possible applications, including wildfire management, disease vector tracking,
migration studies, and innovation diffusion research, among others.

Connectivity Analysis:

Connectivity analysis evaluates inter-separation distance, which is difficult to
calculate in a polygon coverage but can be obtained much more efficiently in a
grid.

The connectivity of a landscape measures the degree to which surface
features of a certain type are connected. Landscape connectivity is an important
concern in environmental management. In some cases, effective management of
natural resources requires maximum connectivity of specific features. For
instance, a sufficiently large area of dense forest must be well connected to
provide a habitat for some endangered species to survive. In such cases, forest
management policies must be set to maintain the highest possible level of
connectivity. Connectivity analysis is especially useful for natural resource and
environmental management.

8.4. Geo-statistical Tools for Spatial Analysis

Geostatistics studies the spatial variability of regionalized variables: variables that
have an attribute value and a location in a two- or three-dimensional space. Tools
to characterize this spatial variability are:

 Spatial autocorrelation function
 Variogram

A variogram is calculated from the variance of pairs of points at different
separations. For each of several distance classes, or lags, all point pairs matching
that separation are identified and their variance is calculated. Repeating this
process for the various distance classes yields a variogram: a function in which
distance class is plotted versus variance.
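An empirical variogram can be sketched as follows; the station coordinates and attribute values below are hypothetical:

```python
# Sketch of an empirical (semi)variogram: for each distance class (lag),
# average half the squared value differences of all point pairs whose
# separation falls in that class.
import math
from itertools import combinations

def variogram(points, values, lag_width, n_lags):
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        h = math.dist(p, q)                  # pair separation
        k = int(h // lag_width)              # distance class index
        if k < n_lags:
            sums[k] += 0.5 * (values[i] - values[j]) ** 2
            counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical rainfall observations at four stations
pts = [(0, 0), (1, 0), (0, 1), (3, 3)]
vals = [10.0, 12.0, 11.0, 20.0]
print(variogram(pts, vals, lag_width=2.0, n_lags=3))  # [1.0, 36.25, 50.0]
```

Plotting lag against these values gives the variogram curve; variance typically rises with separation, as the text notes.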

Similarly, the spatial autocorrelation can be calculated and plotted in an
autocorrelogram. These functions can be used to measure spatial variability of
point data but also of maps or images.

1. Spatial autocorrelation of point data

The statistical analysis referred to as spatial autocorrelation examines the
correlation of a random process with itself in space. Many variables that have
discrete values measured at specific geographic positions (i.e., individual
observations that can be approximated by dimensionless points) can be considered
random processes and can thus be analyzed using spatial autocorrelation analysis.
Examples of such phenomena are total amount of rainfall, toxic element
concentration, grain size, and elevation at triangulated points.

The spatial autocorrelation function, shown in a graph, is referred to as a spatial
autocorrelogram; it shows the correlation between a series of points, or a map, and
itself for different shifts in space or time, and visualizes the spatial variability of the
phenomenon under study. In general, pairs of points that are close to each other on
average have a lower variance (i.e., are better correlated) than pairs of points at
larger separations. The autocorrelogram quantifies this relationship and provides
insight into the spatial behavior of the phenomenon under study.
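One common way to quantify spatial autocorrelation of point data is Moran's I; the text does not prescribe a specific statistic, so this choice, the inverse-distance weighting, and the sample data are all assumptions for illustration. Values near +1 indicate that similar values cluster in space, values near 0 indicate no spatial correlation.

```python
# Sketch of a global spatial autocorrelation statistic (Moran's I)
# with inverse-distance weights between point locations.
import math

def morans_i(points, values):
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]         # deviations from the mean
    num = w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                w = 1.0 / math.dist(points[i], points[j])
                num += w * dev[i] * dev[j]
                w_sum += w
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)

# Similar values close together -> positive autocorrelation
pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
vals = [1.0, 2.0, 9.0, 10.0]
print(round(morans_i(pts, vals), 2))
```

Computing the correlation separately per distance class, instead of one global value, would give the autocorrelogram described above.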

2. Point interpolation

A point interpolation performs an interpolation on randomly distributed
point values and returns regularly distributed point values. The various
interpolation methods are: Voronoi tessellation (nearest neighbor), moving
average, trend surface, and moving surface.

 Nearest Neighbor

In this method (Voronoi tessellation, or Thiessen polygons), the value,
identifier, or class name of the nearest point is assigned to the pixels. It offers a
quick way to obtain a Thiessen map from point data (see the figure below).

Figure: (a) An input point map; (b) the output map obtained as the result of the
interpolation operation applying the Voronoi tessellation method.

 Moving Average

This method performs a weighted averaging on the point values of a point map with
domain type Value or Identifier. The output value for a pixel is calculated as the
sum of the products of weights and point values, divided by the sum of weights.
Weight values are calculated in such a way that points close to an output pixel
obtain large weights and points further away obtain small weights; thus the
values of points close to an output pixel matter more to the output pixel value
than the values of points that are further away. Two options (inverse distance and
linear) specify the method used to calculate weight values. Furthermore, one also
has to specify a limiting distance: points that are further away from an output
pixel than the limiting distance obtain weight zero and thus have no influence on
the output value for that pixel.
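The moving average with inverse-distance weights and a limiting distance can be sketched for a single output pixel; the point coordinates and values below are hypothetical:

```python
# Sketch of moving-average interpolation for one output pixel:
# weighted mean of nearby point values, weight = 1/distance,
# weight 0 beyond the limiting distance.
import math

def moving_average(pixel, points, values, limit):
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(pixel, p)
        if d == 0:
            return v               # pixel coincides with a data point
        if d <= limit:             # beyond the limit: weight zero
            w = 1.0 / d
            num += w * v
            den += w
    return num / den if den else None   # None = no points within range

pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
vals = [4.0, 8.0, 100.0]
print(moving_average((1.0, 0.0), pts, vals, limit=5.0))  # 6.0
```

The distant point (value 100.0) lies outside the limiting distance, so the result is the equally weighted mean of 4.0 and 8.0; a full interpolation would evaluate this per output pixel.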

 Trend Surface

In this method, pixel values are calculated by fitting a surface through all point
values in the map. The surface may be of the first up to the sixth order. A trend
surface may give a general impression of the data. Surface fitting is performed by
a least square fit.

 Moving Surface

In this method pixel values are calculated by fitting a surface through weighted
point values. Weights for all points are calculated by a weight function. Weights
may, for instance, equal the inverse distance. Points outside a user-specified
limiting distance obtain weight zero by the weight function. Surface fitting is
performed by a least square fit.

3. Pattern analysis

Data in a point map represent a particular phenomenon. The spatial distribution
of the points in the map can be examined to acquire more knowledge about the
phenomenon and the process responsible for it. Point pattern analysis is a technique
used to examine the arrangement of point data in space and to obtain information
about the phenomenon.

Fundamental types of patterns: complete spatial randomness (CSR), clustered, and
regular.

The spatial arrangement of point data in a point map can be one of three
fundamental types (Figure 27): complete spatial randomness (CSR), a clustered
pattern, or a regular pattern. Under complete spatial randomness, no correlation
exists between the locations of points. In a clustered pattern, subgroups of
points tend to be significantly closer to each other than to other subgroups of
points. In a regular pattern, individual points tend to repel each other, and
distances between adjacent points tend to be greater than under CSR.

 The nearest neighbor distance:

This method analyzes the spatial distribution of the points using
characteristics of the distances between individual points in the map. The pattern
is analyzed by calculating the distances between individual points and their first
to sixth nearest neighbor points in the pattern. The results are tested against the
expected distances under CSR. If the individual points are closer than they would
be under CSR, this indicates a clustered pattern. If, on the other hand, individual
points are further apart than they would be under CSR, a more regular pattern is
assumed.

 The reflexive nearest neighbors

By this method, two points are considered first order reflexive nearest neighbors
(RNN) if they are each other's nearest neighbor. This definition can be extended to
higher orders; second order RNNs are points that are each other's second-nearest
neighbors, and so on. The frequencies are calculated for RNNs of first to sixth order.

Chapter 09

DIGITAL ELEVATION MODEL

A continually varying surface can be represented by iso-lines (contours), and these contours can be regarded as sets of closed, nested polygons. Although sets of iso-lines are very suitable for displaying a continually varying surface, they are not particularly suitable for numerical analysis or modeling. Other methods have therefore been developed to represent, and to use effectively, information about the continuous variation of an attribute (usually altitude) over space.

Any digital representation of the continuous variation of relief over space is known as a digital elevation model (DEM). A digital elevation model is an ordered array of numbers that represents the spatial distribution of elevations above some arbitrary datum in the landscape (Moore et al., 1993). A DEM describes the elevation of any point in a given area in digital format and contains information on the so-called 'skeleton' lines. Skeleton lines are lines of slope reversal (drainage lines, crests) and breaks of slope.

The term digital terrain model (DTM) is also commonly used. A digital terrain model includes the spatial distribution of terrain attributes. A DTM is a topographic map in digital format, consisting not only of a DEM but also of the types of land use, settlements, types of drainage lines and so on. Because the term 'terrain' often implies attributes of a landscape other than the altitude of the land-surface, the term DEM is preferred for models containing only elevation data. Although DEMs were originally developed for modeling relief, they can of course be used to model the continuous variation of any other attribute Z over a two-dimensional surface.

9.1. Need of Digital Elevation Model

Digital elevation models have many uses. Among the most important are the
following: -

 Storage of elevation data for digital topographic maps in national databases.
 Cut-and-fill problems in road design and other civil and military engineering projects.
 Three-dimensional display of landforms for military purposes (weapon guidance systems, pilot training) and for landscape design and planning (landscape architecture).
 Analysis of cross-country visibility (for military and for land-use planning purposes).
 Planning routes of roads, locations of dams, etc.
 Statistical analysis and comparison of different kinds of terrain.
 Computing slope maps, aspect maps, and slope profiles that can be used to prepare shaded relief maps, assist geomorphological studies, or estimate erosion and run-off.
 As a background for displaying thematic information, or for combining relief data with thematic data such as soils, land use or vegetation.
 Providing data for image simulation models of landscapes and landscape processes.
 By replacing altitude with any other continuously varying attribute, representing surfaces of travel time, cost, population, indices of visual beauty, levels of pollution, groundwater levels, and so on.

9.2. Various Structures of DEM

Various structures for DEMs are in use, but no single data structure satisfies all requirements. One has to choose a suitable type depending on the purpose and on the computing resources available.

 Line model
 Triangulated Irregular Network
 Grid Network

9.2.1. Line Model

The line model describes the elevation of the terrain by contours, i.e. the x, y coordinate pairs along each contour of specified elevation.

Approximate three-dimensional displays can be made from the contours, and topographic indices can then be derived. Digital line graphs (DLGs), as derived by digitizing contours from a topographic map, are often the only available source of data for the creation of a square-grid elevation model. A DLG may hence serve as a temporary, intermediate product.

9.2.2. Triangulated Irregular Network (TIN)

The Triangulated Irregular Network (or TIN) is a system designed by Peuker and his co-workers (Peuker et al., 1978) for digital elevation modeling that avoids the redundancies of the altitude matrix and is at the same time more efficient for many types of computation (such as slope) than systems based only on digitized contours. It consists of connected triangular facets based on a Delaunay triangulation of irregularly spaced nodes or observation points. Unlike altitude matrices, the TIN allows extra information to be gathered in areas of complex relief without the need for huge amounts of redundant data to be gathered from areas of simple relief. Consequently, the data capture process for a TIN can specifically follow ridges, stream lines, and other important topographic features, which can be digitized to the accuracy required.

The TIN model is a vector topological structure, similar in concept to the fully topologically defined structures for representing polygon networks, with the exception that the TIN does not have to make provision for islands or holes. The TIN model regards the nodes of the network as the primary entities in the database. The topological relations are built into the database by constructing pointers from each node to each of its neighbouring nodes. The neighbour list is sorted clockwise around each node, starting at north. The world outside the area modeled by the TIN is represented by a dummy node on the 'reverse side' of the topological sphere onto which the TIN is projected. This dummy node assists in describing the topology of the border points and simplifies their processing.

The TIN model splits the surface up into triangular elemental planes. The terrain surface is sampled by points (nodes) located at positions that capture the terrain characteristics. The nodes are the reference points for the triangular facets; they will not be changed by procedures such as interpolation. The sampling of the topography must include specific points and lines, such as peaks, pits, lines of slope reversal (i.e. drainage divides or drainage lines), and lines indicating breaks of slope. These skeleton lines should appear as arcs in the TIN structure. The sampling density should vary with the complexity of the terrain or the information available. Increasing the density of the node network gives a better approximation of the true surface, but at the expense of larger data sets and longer computation times.

Digital contours can also be used as input, from which the software will first select points and then generate the TIN from them. The software should avoid constructing 'flat' triangles from points that lie on the same contour.

The advantages of the TIN Structure are;

 The data structure is efficient

 Sizes and shapes of the triangles are variable, as the local terrain conditions require

 It enables the incorporation of break lines along triangle sides

 The 'saddle point' problem in contouring is avoided

 Terrain attributes (slopes, etc.) can be calculated, although with more difficulty than is the case with a raster structure.
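As a small illustration of the last point, the slope and aspect of a single triangular facet can be computed directly from its three nodes via the facet's normal vector. This is a sketch of our own, not from the text; the function name and the (x, y, z) coordinate convention (x east, y north, z up) are assumptions.

```python
import math

def facet_slope_aspect(p1, p2, p3):
    """Slope (degrees) and aspect (compass degrees, clockwise from
    north) of the plane through three TIN nodes given as (x, y, z)."""
    # two edge vectors spanning the facet
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # facet normal = u x v
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:                      # make the normal point upwards
        nx, ny, nz = -nx, -ny, -nz
    slope = math.degrees(math.atan2(math.hypot(nx, ny), nz))
    # the horizontal part of an upward normal points downslope
    aspect = math.degrees(math.atan2(nx, ny)) % 360.0
    return slope, aspect
```

A horizontal facet returns slope 0; a facet dropping 1 m per metre eastwards returns slope 45 with aspect 90 (east).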

Structure of TIN

There are various ways in which TINs are stored in the database. Goodchild and
Kemp (1990) propose two structures, which are based on triangles and on nodes.
The triangle structure is efficient for slope analysis, while the node structure
allows easy contouring and other traversing procedures, requiring little storage
space.

The most widely used structure is based upon the triangle. Lists of coordinates, of nodes and of the three neighbouring triangles are related through pointers. The use of pointers creates a dynamic internal data structure.

McKenna (1987) proposed a flexible data structure, which has tables of nodes,
triangles and connected nodes, without the neighbouring triangles. Other
structures have been devised, such as the TIN side based structure, which is
efficient for contouring.

Triangulation

Starting from an irregularly spaced set of input measurement points (the input nodes), a triangulation method must be applied to create the TIN. Using a graph structure with arcs and vertices, the area is subdivided into triangles in such a way that each measurement point becomes a node. Each triangle satisfies the Delaunay criterion, which means that the circle through the three nodes of a triangle contains no other node. This procedure maximizes the interior angles and minimizes the longest side of each triangle. Depending on the method, Voronoi (Thiessen) polygons may be constructed first as an intermediate step in the procedure. 'Constrained triangulation' (Auerbach and Schaeben, 1990) solves the problem of interpolated lines, e.g. contour lines, crossing the triangle edges of a Delaunay-based triangulation.
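The Delaunay criterion can be checked with the standard 'in-circumcircle' determinant test, sketched below for illustration (our own addition; it assumes the triangle's nodes a, b, c are given in counter-clockwise order):

```python
def in_circumcircle(a, b, c, p):
    """True if point p lies strictly inside the circle through the
    counter-clockwise triangle a, b, c.  The Delaunay criterion fails
    for a triangle whenever some other node makes this True."""
    rows = []
    for (x, y) in (a, b, c):
        dx, dy = x - p[0], y - p[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (ax, ay, ad), (bx, by, bd), (cx, cy, cd) = rows
    # 3x3 determinant; positive exactly when p is inside the circle
    det = (ax * (by * cd - bd * cy)
           - ay * (bx * cd - bd * cx)
           + ad * (bx * cy - by * cx))
    return det > 0
```

For the triangle (0,0), (1,0), (0,1), the point (0.25, 0.25) lies inside its circumcircle, while (2, 2) does not.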

9.2.3. Grid Structure

Grid-based methods may involve the use of a regularly spaced triangular or square grid. The elemental area is the cell bounded by three or four adjacent grid points, depending on the method. Raster-based GIS use the square grid network.

The grid density or cell size must be adjusted to the part of the terrain with the highest irregularity, such as highly dissected terrain. One should bear in mind that when the cell size is relatively large and the terrain is steep, the input may provide more than one height data point per cell; in this case an ad-hoc solution must be found to cope with the problem.

The advantage of the regular grid method is the simplicity of data storage, usually as sequential Z coordinates along the x (or y) direction, with a specified starting point and grid spacing. This simplifies the creation of derivative products from the DEM.

9.3. Data Sources and Sampling Methods for DEMs

Data about the elevation of the earth's surface are usually obtained from stereoscopic aerial photographs using suitable photogrammetric instruments. Alternatively, data may be obtained from ground survey, from sonar or from radar scanning.

Here we discuss methods of photogrammetric sampling for DEMs. Selective sampling is when sample points are selected prior to or during the sampling process; adaptive sampling is when redundant sample points may be rejected during the sampling process on the grounds that they carry too little extra information. Progressive sampling is when sampling and data analysis are carried out together, the results of the data analysis dictating how the sampling should proceed. Sampling can be manual, i.e. a human operator guides the stereo plotter; this is a slow process and liable to error. Semi-automatic systems have been developed to guide the operator, and these result in improved speed and accuracy; they are considered to be better than fully automated systems, which, though fast, may be insufficiently accurate.

Sampling may proceed in various modes, depending on the product required. Purposive sampling is carried out to digitize contour lines, form lines, profiles, and morphological lines. For many purposes, however, a more general DEM based on an altitude matrix is required, and so area sampling is carried out, usually based on a regular or irregular grid. In this respect, sampling aerial photographs for spot heights does not differ greatly from the techniques used for sampling any other spatial property (e.g. random, stratified random or regular grid). A regular sampling grid has a low adaptability to the scale of variation of the surface: in areas of low variation too many points may be sampled, and in areas of large variation the number of sample points may be too small. If the operator is given the freedom to make observations at will, the sampling can be highly subjective.

Progressive sampling involves a series of successive runs, beginning with a coarse grid and then proceeding to grids of higher density. The grid density is doubled on each successive sampling run, and the points to be sampled are determined by a computer analysis of the data obtained on the preceding run.

The computer analysis proceeds as follows: a square patch of nine points on the coarsest grid is selected, and the height differences between each adjacent pair of points along the rows and columns are calculated. The second differences are then calculated; these carry information about the terrain curvature. If the estimated curvature exceeds a certain threshold, it is desirable to increase the density and sample points at the next level of grid density on the next run.
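The second-difference test described above might be sketched as follows. This is an illustrative function of our own; the function name is an assumption, and the threshold is in the same height units as the patch.

```python
import numpy as np

def needs_densification(patch, threshold):
    """Decide whether a 3x3 patch of coarse-grid heights should be
    sampled at double density on the next run.  Second differences
    along rows and columns approximate the terrain curvature."""
    z = np.asarray(patch, dtype=float)
    # second differences: z[i+1] - 2*z[i] + z[i-1]
    d2_rows = z[:, 2:] - 2.0 * z[:, 1:-1] + z[:, :-2]   # along each row
    d2_cols = z[2:, :] - 2.0 * z[1:-1, :] + z[:-2, :]   # along each column
    curvature = max(np.abs(d2_rows).max(), np.abs(d2_cols).max())
    return bool(curvature > threshold)
```

A uniformly tilted patch has zero second differences and is left alone, while a sharp ridge triggers densification.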

Progressive sampling works well when there are no anomalous areas on the photographs, such as cloud regions or man-made objects; it is best for regular or semi-regular terrain with horizontal, slightly tilted or smoothly undulating surfaces. Moderately rough terrain with distinct morphological features and some anomalous areas can be better handled by a modification of progressive sampling called composite sampling, in which abrupt steps in the terrain or the boundaries of natural or anomalous objects are first delineated by hand before sampling within these areas. Rough terrain types with many abrupt changes may not be efficiently covered by any semi-automated progressive or composite sampling approach, and all data may have to be gathered by selective sampling.

Finally, the data collected by progressive and composite sampling must be automatically converted to fill the whole altitude matrix uniformly.

9.4. Products Derived from a DEM

Various derived products can be obtained from DEMs whether the latter are in the
form of altitude matrices, sets of irregular point data or triangulated networks.

Products derived from DEMs are;

1. Block diagrams, profiles, and horizons
2. Volume estimation by numerical integration
3. Contour maps
4. Line of sight maps
5. Maps of slope, convexity, concavity, and aspect
6. Shaded relief maps
7. Drainage network and drainage basin delineation

9.5. DEM Applications

Digital elevation models have very wide applications. They form one of the input maps in many GIS projects and are the basis for a large number of derivative information products. The most important applications of DEMs are:

1. Slope Steepness Maps

Maps showing the steepness of the slope, in degrees, percentages or radians, for each pixel.

2. Slope Direction Maps (Slope Aspect Maps)

Maps showing the compass direction of the slope (between 0 and 360 degrees).

3. Slope Convexity Maps

Maps showing the change of slope angle within a short distance. From these maps you can see whether the slope is straight, concave or convex in form.

4. Hill Shading Maps (Shadow Maps)
Maps showing the terrain under artificial illumination, with bright sides and shadows.

5. 3-Dimensional Views
Maps showing a bird's eye view of the terrain from a user-defined position above the terrain.

6. Cross-Sections
Diagrams indicating the altitude of the terrain along a digitized line.

7. Volume Maps (Cut & Fill)


Maps generated by overlaying two DEMs from different periods, which allow you to quantify the changes in elevation that took place as a result of slope flattening, road construction, landslides, etc.
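The first two applications can be sketched for a square-grid DEM using central differences. This is a minimal illustration of our own, with two assumed conventions: row index increases northwards, column index eastwards, and aspect is the compass bearing of steepest descent.

```python
import numpy as np

def slope_aspect_maps(dem, cell_size):
    """Per-pixel slope (degrees) and aspect (compass degrees, 0 = north)
    from a square-grid DEM.  Assumes rows run south-to-north and
    columns west-to-east; gradients come from central differences."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # aspect: bearing of the downslope (negative gradient) direction
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

A plane falling 1 m per cell towards the east gives a uniform slope of 45 degrees and an aspect of 90 degrees.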

Chapter 10

Remote Sensing & GIS Data Integration (Methods & Issues)


One of the most persistent and pervasive buzzwords in the field of GIS is 'integration'. Indeed, the ability of GIS to integrate diverse information is frequently cited as its major defining attribute and as its major source of power and flexibility in meeting user needs (Maguire, 1991). By integrating information, users can take a unified view of their data, and large organizations can establish a single, coherent, corporate information system.

10.1. The Benefits of Information Integration

The benefits of the integration of diverse information within a GIS are widely
recognized:

 A broader range of operations can be performed on integrated information than on disparate sets of data.

 By linking data sets together, spatial consistency is imposed on them. This adds value to existing data, making them both a more effective and a more marketable commodity.

 Through the integration of data which were previously the domain of individual disciplinary specialists, an interdisciplinary perspective on geographical problem solving is encouraged.

 Users benefit from the perception that they have access to a seamless information environment, uncomplicated by the need to consider differences in data sources, information types, storage devices, computer platforms and so on.

 Further advantages accrue if several organizations pool their individual data into a single integrated database:

 Data acquisition costs are reduced, because of the elimination of duplicate data collection and conversion activities.

 Organizations can draw on a broader base of information than hitherto, and are thus able to address issues that were previously beyond their individual data resources.

 Organizations can cooperate with one another within the context of shared information and thereby make more effective management decisions.

10.2. Problems in integration using reference systems

One way in which data from maps of a region may be integrated is by relating the locations of map features to a reference system, typically a pair of numbers (x, y) defining a feature's distances east and north of a fixed point. Since the earth is a spheroid and not flat, no two-dimensional coordinate system can represent the earth's surface without distortion.

Maps based on latitude and longitude will not necessarily be compatible with each other because of the different projections available for mapping. Even within the same map, problems arise because the length of a degree of longitude is not constant, changing dramatically towards the poles. Most projections also do not represent lines of longitude as straight. Digitizing a map can only be done with the aid of two orthogonal coordinate axes, and points located with reference to curved lines cannot simply be integrated with data using an orthogonal system.

Even those projections with straight-line graticule (of which Mercator is the best
known) cannot easily be integrated with other data sources, because of the
distortions of shape and/or area involved.

Some fundamental GIS operations, such as calculations of polygon areas, will of course be wrong if data have not been input from a map with an equal-area projection. Other projection (reference) systems based on standard meridians or parallels are also subject to increasing error with distance from the centre of the map, leading to obvious errors when maps based on different standard lines are to be integrated. Mapping a large area on the UTM system faces problems because the coordinate system is defined in separate zones. Problems of this nature are dealt with in detail by Maling (1973).

Most of these projection problems are well known to surveyors and cartographers, and for many of them solutions exist and can be operationalized. Any GIS should allow the conversion of table coordinates to a user-defined set, and many include routines for conversion between different projections.
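The kind of routine meant can be sketched for the simple spherical Mercator case. This is an illustrative formula, not a routine from any particular GIS; the earth radius R and the function name are our assumptions.

```python
import math

R = 6378137.0  # assumed spherical earth radius, metres

def mercator_forward(lon_deg, lat_deg):
    """Forward (spherical) Mercator projection: geographic coordinates
    in degrees to plane (x, y) in metres east and north of (0, 0)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4.0 + phi / 2.0))
    return x, y
```

The formula makes the distortion discussed above visible: y grows without bound as latitude approaches the poles, which is why shapes and areas far from the equator cannot simply be overlaid on other projections.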

10.3. Data set problems in integration

A very common problem in data integration is the difference in the area for which data are available for two different variables. The ideal would be for each variable needed in the GIS to be mapped separately at the same scale and for the same areal extent. In practice, map sheets will overlap and data may not be available for all the areas required. If two or more map sheets are being input into the same GIS, problems may occur at the edges of the sheets, even if they are based on a common referencing system. Such problems are likely to be associated with linking up line or area phenomena which cross the boundary between the map sheets. A fundamental operation in any vector-based GIS is polygon creation, in which the GIS operates on a set of line segments to produce a set of well-formed polygons. If two points are intended to be the same but are actually digitized at slightly different locations, the system has major problems in deciding whether or not to treat the points as the same.
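A common remedy is to snap near-coincident digitized points to a single node within a tolerance, sketched below for illustration (our own addition; in practice the tolerance would be derived from the digitizing accuracy, e.g. roughly 0.25 mm at map scale expressed in ground units):

```python
def snap_nodes(points, tolerance):
    """Merge digitized (x, y) points that fall within `tolerance` of an
    already-accepted node, so near-coincident line endpoints become a
    single polygon vertex.  Returns the accepted nodes in input order."""
    nodes = []
    for (x, y) in points:
        for (nx, ny) in nodes:
            if (x - nx) ** 2 + (y - ny) ** 2 <= tolerance ** 2:
                break               # close enough: reuse the existing node
        else:
            nodes.append((x, y))    # genuinely new node
    return nodes
```

Two endpoints digitized 0.05 units apart collapse to one node under a tolerance of 0.1, while distant points remain distinct.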

10.4. Integration of vector and raster (remote sensing) data in a GIS

One of the long-standing problems of operational GIS has been the separation of
information derived from maps and information derived from images. The former
has typically been the preserve of vector-oriented GIS and the latter has been the
preserve of image processing systems.

Recently, designers of remote sensing systems have attempted to bridge the gap, either by adding mapping capabilities to existing image processing functions or by providing users with on-line access to GIS information. Some raster GIS provide vector tracing or annotation functions for raster images. For their part, developers of vector GIS have tried to add a variety of raster/image processing capabilities: displaying raster backgrounds beneath vector maps, vectorizing scanned images and linking images to vector coordinate systems.

Chapter 11

DATA QUALITY AND SOURCES OF ERRORS


11.1. Introduction

The GIS database is a model of the real world, so it goes without saying that there is an inherent discrepancy between the GIS database and the real world it represents: all models are approximations. In the whole business of GIS, geographical data that are collected, entered and processed with sufficient reliability and freedom from error lead to improvements in environmental resource management and control. Here 'error free' does not refer simply to the absence of mistakes, wrong inputs or faulty field surveys, but is also meant in a statistical sense, i.e. free from unexplained variation. The spatial database has further problems in the sense that real-world phenomena are not always discrete. Where is the bank of a shallow river? Where is the boundary between two soil types? These are but a few questions that reveal the fuzzy nature of geographic data as an uncertain data set. Many soil scientists and geographers have noticed that carefully drawn boundaries or iso-lines on maps in general give misrepresentations of change that is often gradual, vague or fuzzy. Traditional cartography as a medium of spatial data handling had no real solution to such problems, and now that data are handled digitally they need to be stored with a certain precision. Moreover, the spatial variation of natural phenomena is not just a local noise function or inaccuracy that can be removed by collecting more data or by increasing the precision of measurement, but is often a fundamental aspect of nature that occurs at all scales (Goodchild, 1980; Mark and Aronson, 1984). This means that residual unmapped variation is a notable problem in geographical information processing. It is therefore a prerequisite to understand fully how errors arise and propagate in GIS, and what the effects of errors on output might be.

11.2. Nature of Geographic Data

A geographic database is a model of the real world, the real world being what the observer perceives. So the first issue in uncertainty in geographic data is the conceptual modeling of the real world. Conceptual models (also termed user views) are developed with the user's requirements in mind. A road may be an area feature for a particular user, a line feature for another user, and only an arc of a graph for yet another user.

Bedard (1987) has distinguished two sources of uncertainty introduced during modeling:

Limitations inherent to the modeling process itself, some examples being:

 Loss of detail due to modeling constraints
 Dependency of the model on the purpose or user requirement
 Problems in the translation from the cognitive to the physical model

Limitations related to the model makers, i.e. dependency on the model maker's cognitive view of the world. A model of the world is highly dependent on the model maker's experience, education, culture, beliefs, values, and personality. It may further depend upon the trade-off between cost and time and the modeling detail.

The definition of the subset S (the class) may be fuzzy. This means that the criteria for assigning a certain terrain object to a certain class may not be clear, which can be described as fuzziness of identification. The problem arises because the world is a continuum while classification or labeling in a database is discrete. This fuzziness of identification or definition introduces uncertainty into the existence of an entity or the classification of an entity type.

The definition of the element x (the terrain object) may be uncertain. Terrain objects are defined through measurements and their processing, and all measurements are subject to limitations. Such limitations on the properties of entities are related to their qualitative (fuzziness) or quantitative (imprecision) character.

There may be insufficient evidence to assign an element x to the subset S.

11.3. Sources of Errors in GIS Database

Different sources of error are encountered in using GIS at different stages, e.g. data collection, data input, data storage, data manipulation, data output, and use of results. However, errors in data collection, data input and data manipulation are the main concerns in a GIS database. Sources of data can be broadly classified as primary and secondary. Primary sources of data collection include the techniques of geodesy, photogrammetry, photo-interpretation, digital image processing of remotely sensed data, and surveying. Secondary sources are existing topographical maps and existing categorical coverage maps. Three main groups of factors govern the errors, as given below.

Obvious sources of errors

 Age of data
 Areal coverage
 Map scale
 Density of observation
 Accessibility
 Cost

Errors resulting from natural variation or from original measurements;

 Positional accuracy
 Accuracy of content –qualitative and quantitative
 Data entry, output faults, natural variation and observed bias

Errors arising through processing

 Numerical errors in the computer
 Faults arising through map overlay and misuse of logic
 Classification and generalization problems

Age of data

Most agencies involved in planning and environmental monitoring generally use old data, such as thematic maps of land use or soil water regimes, even though these have been changing over time. Such data were collected to old standards and may not be acceptable in today's scenario.

Areal coverage

It is desirable to have a uniform cover of information for the whole area under study. Generally this is not the case, and users have to match resource data with only a partial level of information.

Map scale

Map scale plays an important role for a specific purpose. A large-scale map shows finer details and more detailed legends (e.g. 1:25,000). A small-scale soil map (1:250,000) displays only soil associations, but a very large-scale map (1:5,000) depicts detailed soil series legend units. If map scales do not match what is required for a study, then a small-scale map gives insufficient detail, while a large-scale map contains too much information and becomes a burden to handle.

Density of observations

There are still many organizations which produce maps without showing the amount of ground truth information upon which they are based. Sampling density provides a rough guide to data quality: it gives an idea of the optimum density needed to resolve the spatial patterns of interest. Techniques such as nested sampling, auto-covariance studies and geostatistical spatial analysis have gained importance in determining the optimum information needed for mapping an area or estimating the spatial relations of entities.

Accessibility

Not all data are equally accessible. Data about land resources are often easily available, but access to data with military significance, e.g. digital terrain models, can be obstructed by inter-bureau rivalries.

Costs

Collection and input of new data, or conversion and reformatting of old data, cost money. For any project, the project manager should be able to assess the cost and benefit of using existing data compared with new field surveys.

Positional accuracy

The positional accuracy of spatial data depends on the type of data. Topographical data have a high degree of positional accuracy, which is appropriate for objects like roads, houses and land parcel boundaries. But changes in slope class or groundwater regime are unlikely to occur at well-defined boundaries. Positional errors can result from poor field work, distortion or shrinkage of the original paper base map, poor quality vectorizing after raster scanning, manual digitization, map projection and reproduction, or generalization. Errors due to manual digitizing are estimated to be about 0.25 mm at map scale. Scan digitizing can be better in terms of positional accuracy, but with a higher chance of errors in attribute tagging.

Digitizing errors in manual techniques can occur due to the width of the line, the skill of the operator, the complexity of the feature, the resolution of the digitizer and the density of features. Furthermore, human factors such as tiredness or an inappropriate setting of the digitizing table also have a direct effect on the overall accuracy.

Accuracy of content

The accuracy of content determines whether the attributes attached to points, lines and areas in the GIS database are entered correctly or not. Qualitative accuracy shows whether labels are correct, while quantitative accuracy refers to the level of bias associated with the values assigned.

The sources of variation in data

Errors can result from mistakes in data entry. These mistakes are most easily detected when they lead to unexpected values in the data; if they result in allowed values, detection is difficult.

Measurement errors

Poor data can result from inaccurate or biased observers or apparatus. Wrong procedures and mapping methods also result in large differences between maps, and the surveyor's knowledge and experience also make a difference to data collection.

Laboratory errors

Determinations carried out in one laboratory may not be reproducible in another using the same procedure.

Spatial variation and map quality

Many thematic maps of natural properties such as soil and vegetation cannot take account of local variation and impurities within mapping units.

Numerical errors in the computer

The precision with which a computer records numbers has important consequences for arithmetic operations and data storage. The use of computer variables of insufficient precision often leads to errors, especially while processing large numbers. Rounding errors are unlikely to be a problem in statistical calculations on large computers when arrays are defined with sufficient precision. In raster data processing, data are often coded as integers; this leads to inaccurate estimation of area and length due to the quantizing effect. Chrisman (1984) has examined the role of hardware limitations in storing GIS databases to the desired level of precision.
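The word-length limits in Table 11.1 can be queried from the machine itself, and the effect of insufficient floating-point precision demonstrated in a few lines. This is an illustrative snippet of our own; the variable names are assumptions.

```python
import numpy as np

# Word-length limits of Table 11.1, taken from the machine itself.
assert np.iinfo(np.int16).max == 32767          # 16-bit integer
assert np.iinfo(np.int32).max == 2147483647     # ~2 x 10^9, 32-bit integer

# Insufficient precision in action: a 32-bit float has a 24-bit
# mantissa, so adding 1 to 2**24 is silently lost; a 64-bit float
# (18-19 significant digits) keeps it.
lost = np.float32(2 ** 24) + np.float32(1) == np.float32(2 ** 24)
kept = np.float64(2 ** 24) + np.float64(1) > np.float64(2 ** 24)
```

This is exactly the kind of silent error that accumulates when large coordinate values or running sums are stored in single precision.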

Table 11.1: The computer word length and digital range precision

No. of bits     No. of significant digits     Approx. decimal range

16 (integer)    4                             -32768 <= x <= +32767
32 (integer)    9                             -2x10^9 <= x <= +2x10^9
64 (integer)    18                            -9x10^18 <= x <= +9x10^18

Errors arising from overlay and boundary intersection

Many operations in GIS involve the procedure of overlaying two or more spatial
networks, composed of lines, regular grids or irregular polygons. The aim of
overlay may be data conversion (such as converting a vector representation of a
polygon net to raster form by overlaying a grid of given resolution) or data
combination (such as overlay of two or more raster maps or polygon networks).
The purpose can be purely topological manipulation, manipulation solely of the
properties of the cells or polygons, or both. In performing such operations,
errors can result from:

1. Representing a vector polygon map in the form of rasterized grid cells;
2. Logical or arithmetic operations on two or more overlaid grids or polygon
networks;
3. Overlaying and intersecting two polygon networks.

Figure 11.1: Problems rasterizing a vector map.

Figure 11.2: Digitizing a curved line is a sampling process.

Example: Each grid cell in a rasterized map can contain only a single (mean)
value of an attribute. In a LANDSAT image, each cell had a size of 80 m x 80 m,
holding the mean reflectance value averaged over the area of the whole cell. If
a part of a cell covered a highly reflective surface such as a road, the result
would be an over-representation of the area of the road in the whole image.
This is a source of classification error, particularly when large grid cells
are used to represent the many features of a complex landscape (Figure 11.1).
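The over-representation effect can be quantified with a toy rasterization: assign each grid cell wholly to a feature when the cell centre falls inside it, then compare the rasterized area with the true area. The circle, radius and cell sizes below are illustrative, not from the text:

```python
import math

def rasterized_area(radius, cell):
    """Approximate a circle's area by counting grid cells whose centre
    falls inside it; each counted cell contributes its full area."""
    half = int(math.ceil(radius / cell)) + 1
    count = 0
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            if cx * cx + cy * cy <= radius * radius:
                count += 1
    return count * cell * cell

true_area = math.pi * 100.0 ** 2          # circle of radius 100 m
for cell in (5.0, 20.0, 50.0):            # finer to coarser grids
    est = rasterized_area(100.0, cell)
    print(cell, est, abs(est - true_area) / true_area)
```

As the cells get coarser relative to the feature, the relative area error grows, which is the quantizing effect described above.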

Errors Associated with Digitizing and Geo-coding

In practice, a perfect map does not exist. A perfect map would have homogeneous
mapping units and sharp boundaries. Even the best-drawn maps are not perfect,
and the digitizing process introduces extra errors. There are two sources of
potential error:

1. Errors associated with the source map
2. Errors associated with the digital representation

(a) Apart from errors due to paper stretch and distortion, errors also arise at
boundary locations because boundaries are not infinitely thin. A 1 mm thick
line covers a different ground width at different map scales (e.g. 1.25 m at a
scale of 1:1,250 and 100 m at 1:100,000). This suggests that the true dividing
line should be taken as the midpoint of the drawn line. When these boundary
lines are digitized, extra errors arise: with manual digitizing, the cursor
cannot be kept exactly at the middle of the line, and with scanners errors
arise from the data-reduction algorithms used.

(b) The relative errors of digitizing straight lines are much smaller than
those resulting from digitizing complex curves, because the representation of a
curved shape depends on the number of vertices used. Converting a continuous
curved line into a digital representation involves a sampling process: only a
small proportion of the points along the curve are captured by digitizing, as
shown in Figure 11.2.
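The sampling effect can be illustrated numerically: a polyline through sampled vertices always underestimates the length of the curve it represents, and the shortfall grows as fewer vertices are used. A sketch using a semicircular arc (the radius and vertex counts are illustrative):

```python
import math

def digitized_length(radius, n_vertices):
    """Length of a polyline that samples a semicircular arc at
    n_vertices equally spaced points -- a stand-in for digitizing."""
    pts = []
    for k in range(n_vertices):
        t = math.pi * k / (n_vertices - 1)
        pts.append((radius * math.cos(t), radius * math.sin(t)))
    # Sum the straight chords between successive sampled vertices.
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

true_len = math.pi * 100.0            # semicircle of radius 100 m
for n in (3, 5, 9, 33):               # more vertices -> better estimate
    est = digitized_length(100.0, n)
    print(n, round(est, 2), round(true_len - est, 2))
```

With only 3 vertices the 314.16 m arc shrinks to about 282.84 m; the estimate converges towards the true length as the vertex count increases.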

Figure 11.3: Spurious polygons occurring due to overlay

Errors associated with overlaying two or more polygon networks

Spatial relationships between two or more thematic maps of an area are commonly
displayed by laying the polygon boundaries of one map on top of another and
looking for boundary coincidences. Earlier this was done using transparent
sheets; the advent of digital maps promised better results. However, the
overlay capability of GIS has many drawbacks concerning data quality and
boundary mismatch. McAlpine and Cook (1971) showed that overlay may produce a
large number of small polygons on the derived map, and that the number of
derived polygons is more a function of boundary complexity than of the number
of polygons on the overlaid maps. When the boundaries of polygons on the source
maps are highly correlated, serious problems arise through the production of a
large number of spurious polygons (see Figure 11.3). Prominent features such as
district boundaries or rivers often occur as part of polygon boundaries on
several maps.

The spurious polygon problem contains two apparent paradoxes. First, the more
accurately each boundary is digitized on the separate maps, and the more
coordinates are used, the larger the number of spurious polygons produced.

Spurious polygons are in effect equivalent to the mismatch areas resulting from
rasterizing a polygon. Their total area should decrease as digitizing accuracy
increases, but the greater problem is removing them to avoid nonsense on the
final map. The net result of overlaying a soil map (without topologically exact
boundaries) on a country topographic map (with topologically exact boundaries)
is a distorted topological boundary.

Nature of boundaries

Errors generally arise from overlaying maps, such as vegetation, soil or
geology maps, that rarely have sharp, thin boundaries. If we accept that
razor-sharp boundaries in soil and vegetation patterns rarely occur, we face
another problem: overlap zones for which our knowledge is "fuzzy" or unclear.
As far as a soil map is concerned, clear distinctions rarely occur. The most
recognizable boundaries are those associated with abrupt, large changes in the
value of critical soil properties over short distances. Such changes can be
deduced at the edges of river terraces, or at junctions where major changes in
geology coincide with abrupt changes in relief. Less easy to recognize and
locate are the boundaries used to divide a zone of continuous variation into
compartments simply to aid classification and mapping. These changes typically
occur as a result of climatic gradients, or of gradual soil changes resulting
from differential deposition (e.g. across a river flood plain from levees to
back swamps).

11.4. Data Quality Parameters

So far we have treated positional accuracy and attribute accuracy as the main
concerns in the study of uncertainty in GIS databases. But all data having
positional and attribute components have temporal facets as well. In essence,
geographical entities are defined in terms of spatial, thematic and temporal
dimensions (Lanter and Veregin, 1992), and each dimension has its corresponding
dimension of error. A GIS is concerned with geographic information and its
analysis, and as such all the dimensions of error in the information need equal
attention in the judgment of data quality.

Different approaches to classifying data quality parameters have been proposed.
One of the most accepted and quoted descriptions of the data quality parameters
of digital spatial data is that listed by the Digital Cartographic Data
Standards Task Force in the United States (DCDSTF, 1988), which consists of the
following components, each having temporal information:

 Positional accuracy
 Attribute accuracy
 Logical consistency
 Completeness
 Lineage

These components are considered to be an adequate description of data quality
parameters and are explained below.

Positional accuracy refers to the closeness of location information to its true
value. Measures of positional accuracy can be obtained by the:

 Use of internal evidence: the size of gaps, overshoots and undershoots may be
utilized as a measure of positional accuracy.
 Comparison to an independent source of higher accuracy (e.g. a larger-scale
map, GPS or raw survey data).
 Use of deductive estimates from knowledge of the source data, their
associated accuracy and the processes involved.
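The second measure, comparison against an independent source of higher accuracy, is commonly summarized as a root-mean-square error (RMSE) over check points. A sketch with hypothetical map and GPS coordinates (not from the text):

```python
import math

def positional_rmse(digitized, checkpoints):
    """Root-mean-square positional error between digitized points and
    higher-accuracy check points (e.g. GPS), paired in the same order."""
    sq = [(x1 - x2) ** 2 + (y1 - y2) ** 2
          for (x1, y1), (x2, y2) in zip(digitized, checkpoints)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical map coordinates (metres) and their GPS check values.
digitized = [(1000.0, 2000.0), (1500.0, 2100.0), (1250.0, 1900.0)]
gps       = [(1003.0, 1996.0), (1498.0, 2103.0), (1249.0, 1900.0)]
print(round(positional_rmse(digitized, gps), 2))
```

The RMSE summarizes how far, on average, digitized points lie from their independently surveyed positions.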

Attribute accuracy refers to the closeness of attribute values to their true
values. Accuracy assessment for measurements on a continuous scale (continuous
attributes) is done in a way similar to that for positional accuracy. Accuracy
assessment for categorical attributes can be made:

 By deductive estimates, in which case the basis of the deduction should be
explained;
 By tests based on independent samples, reported via a misclassification
matrix as counts of sample units cross-tabulated by the categories of the
sample and of the tested material;
 By polygon overlay.
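The misclassification (confusion) matrix mentioned above can be built by cross-tabulating mapped categories against reference categories for independent sample points; the diagonal counts over the total give the overall accuracy. Class names and samples below are hypothetical:

```python
from collections import Counter

def misclassification_matrix(mapped, reference):
    """Cross-tabulate mapped category vs reference (field-checked)
    category for a set of independent sample points."""
    return Counter(zip(mapped, reference))

# Hypothetical sample points: what the map says vs what the field says.
mapped    = ["forest", "forest", "water", "urban", "forest", "water"]
reference = ["forest", "urban",  "water", "urban", "forest", "forest"]

m = misclassification_matrix(mapped, reference)
# Diagonal (agreeing) counts divided by the total sample size.
correct = sum(n for (a, b), n in m.items() if a == b)
overall_accuracy = correct / len(mapped)
print(m)
print(overall_accuracy)
```

Off-diagonal counts show which categories are being confused with which, which is often more informative than the single overall-accuracy figure.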

Logical consistency refers to the fidelity of relationships encoded in the
database. These relationships may refer to the geometric structure of the data
model (e.g. topological consistency) or to the encoded attribute information
(e.g. semantic consistency).

Completeness refers to the exhaustiveness of the information in terms of the
spatial and attribute properties encoded in the database. It may include
information regarding feature selection criteria, definitions, mapping rules
and deviations from them.

Tests of spatial completeness may be obtained from the topological tests used
for logical consistency, whereas attribute completeness is tested by comparing
a master list of geo-codes with the codes actually appearing in the database.

11.5. Handling Errors in GIS

This section briefly examines practical approaches to handling errors in GIS.

Storing positional uncertainty in GIS

Handling errors in GIS starts with thoughtful storage of data quality
information. There are five potential levels at which positional uncertainty
may be stored in a vector database: map, class of objects, polygon, arc and
point (NCGIA, 1990). For lines and polygons, accuracy can be stored as an
attribute of an arc (e.g. the width of the transition zone between two
polygons), of a class of objects (e.g. the positional error of Class 1 roads,
or the standard error of a class of points), or of the map as a whole (the
general accuracy of data capture).

A fundamental issue in the storage of positional uncertainty is that positional
accuracy at one level cannot exactly define the accuracy at another level. The
positional accuracy of points cannot definitively determine the positional
accuracy of arcs; similarly, the positional accuracy of polygons only confounds
the positional accuracy of a shared arc. The accuracy of an arc depends very
much upon the choice of vertices. To this end, some assumptions and
generalizations have to be accepted: it needs to be assumed, for practical
reasons, that the line segments between observed points are straight.
Furthermore, the accuracy of arcs, given as their epsilon band, can then be
generated from the accuracy of the nodes and vertices defining the arc.
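Once the straight-segment assumption is made, the epsilon band can be approximated: a corridor of half-width ε around the polyline has an area of roughly 2εL plus two semicircular end caps. A sketch with a hypothetical arc and vertex accuracy (corner effects at bends are ignored):

```python
import math

def epsilon_band_area(vertices, eps, end_caps=True):
    """Approximate area of the epsilon band of half-width eps around a
    polyline, assuming straight segments between observed vertices.
    Ignores slight corner effects at bends."""
    length = sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))
    area = 2.0 * eps * length
    if end_caps:
        area += math.pi * eps * eps   # two semicircular end caps
    return area

# Hypothetical arc with a vertex accuracy of +/- 2 m.
arc = [(0.0, 0.0), (30.0, 40.0), (30.0, 100.0)]
print(round(epsilon_band_area(arc, 2.0), 1))
```

The band area grows linearly with both arc length and ε, so long arcs digitized from low-accuracy sources carry a proportionally large zone of positional uncertainty.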

Storing attribute accuracy

The issue of attribute accuracy is more complex than that of positional
uncertainty. Some of the questions are:

 Given a particular location, what is the probability that the location is of
class A?
 Given a location labelled with attribute A, as defined, what is the
probability that it truly is A?

More complexity arises from the multi-valuedness of the attributes of a single
geometric primitive.

In general, attribute accuracy may be stored in the following ways:

 As an attribute of the object, e.g. a polygon is 95% A, or 99% B;
 As an attribute of an entire class of objects, e.g. class F is correctly
classified 90% of the time.

Lineage reporting in spatial databases

The lineage of data in a spatial database can be stored at the level of the
geometric primitives, e.g. at the level of arcs and nodes.

The issue of lineage reporting is fundamental to data quality handling in
spatial databases. All data have a history, or lineage, which has direct
relevance to their quality. The lineage stored in a database may, when properly
analyzed, be used to generate the other data quality parameters of the spatial
data. A system for storing and updating the lineage of data, and tracking it
through successive GIS operations, together with a knowledge base, could be
adopted to handle data quality in GIS. Some work on the use of lineage-based
quality databases in GIS has been reported (e.g. Lanter, 1992, 1993).

Concluding Remarks

Considering GIS as a computerized system of data input, processing, analysis
and output, the issue of data quality needs to be addressed at each phase of
the process.

Since the field of GIS and spatial databases is rather new, the issue of data
quality has received less attention than it deserves. The theory of spatial
databases should incorporate this issue as fundamental: the data quality of the
geometry, as well as of the thematic attributes, should be treated as a
fundamental description of spatial data.

The development of an expert system for handling data quality would provide an
efficient and desirable tool in GIS. As noted above, the lineage of data stored
in a database may, when properly analyzed, be used to generate the other
quality parameters of the spatial data, and a system for storing, updating and
tracking lineage through successive GIS operations, together with a knowledge
base, would be a desirable approach to handling data quality in GIS.

Chapter 12

NETWORK ANALYSIS
One of the important application areas of a GIS is facility management based on
network analysis. Specific applications include determining optimal and
shortest paths (routing analysis) and allocating resources based on demand and
capacity. Much of this analysis is based on a set of connected linear features:
the network, which forms a framework through which resources flow. This
framework could be visualized as a road network through which vehicles move, a
drainage network through which water flows, electric links through which
electricity passes, and so on. The real world is full of such networks, and
they can be well represented as link graphs associated with a set of flow
constraints. Therefore, the representation, management and manipulation of
networks of linear features are an important module of a GIS.

12.1. What is a Network

Networks form the infrastructure of the modern world. A network is a system of
interconnected linear features through which resources are transported or
communication is achieved. The movement of people, transportation, the
distribution of services, and the allocation and delivery of resources all
occur through network systems. The ARC/INFO NETWORK module, for example,
facilitates the modeling of spatial networks.

Figure 12.1: Drainage and road network

The NETWORK module provides tools to:

 Find the shortest or minimum-impedance path through a network (PATH)
 Find the most efficient path to a series of locations (TOUR)
 Assign a portion of the network to a location (ALLOCATION)
 Determine whether one location is connected to another (TRACING)
 Model the accessibility of locations and the interaction between locations
based on the cost of travel and communication (SPATIAL INTERACTION)

12.2. Network Data Model

The network data model is a representation of the components and
characteristics of a real-world network system. The model consists of network
links, turns, stops, facility points, blocks and nodes. The relationships
between the characteristics of the physical network system are represented by
the elements of the network model.

Network Elements in a GIS

Each network consists of different elements, each of which can be associated
with attributes defining the characteristics of that element. These two, in
combination, form an important part of network analysis. The different elements
and their attributes are:

a) Links: The basic element of the network, serving as a conduit for the
movement of resources, is the link. The connectivity of the links is an
important aspect of the network. Links have two types of attributes:

(i) Resistance, which describes the amount of impedance offered by a link to
the flow of resources. The cost associated with traversing an entire network
link is termed the impedance of the link. A network link has two impedances, a
'from-to' impedance and a 'to-from' impedance; which one is used depends upon
the direction in which the arc is traversed. Impedance values are contained in
items defined by the user in the attribute table. A negative link impedance
signifies that the link cannot be traversed in that direction; otherwise, the
higher the value, the more resistance offered to movement. Link resistance may
be unidirectional or bidirectional, is generally a user-defined value for each
link, and depends upon the characteristics of the network.
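The 'from-to'/'to-from' convention can be represented directly in an attribute table. A minimal sketch (all node names and impedance values are hypothetical) that expands arcs into the directed edges that are actually traversable:

```python
# Each arc stores a 'from-to' and a 'to-from' impedance; a negative
# value means the arc cannot be traversed in that direction (e.g. a
# one-way street).  All names and values here are illustrative.
arcs = [
    # (from_node, to_node, from_to_impedance, to_from_impedance)
    ("A", "B", 4.0,  4.0),    # two-way link, equal cost both ways
    ("B", "C", 2.0, -1.0),    # one-way: traversable only B -> C
    ("A", "C", 9.0,  7.0),    # two-way link, asymmetric cost
]

def directed_edges(arcs):
    """Expand arcs into the directed edges that can actually be traversed."""
    edges = []
    for f, t, ft, tf in arcs:
        if ft >= 0:
            edges.append((f, t, ft))
        if tf >= 0:
            edges.append((t, f, tf))
    return edges

for edge in directed_edges(arcs):
    print(edge)
```

The one-way arc B-C expands to a single directed edge, while each two-way arc expands to two edges that may carry different impedances.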

(ii) Resource demand, the number or amount of resources associated with each
line feature: for example, the number of students living along each road, the
amount of water required by people living along each pipeline, or road capacity
or width. Resource demand is especially critical for allocation problems.

b) Turns, which qualify the direction of resource flow from one link to another
connected through a node. A turn is direction-specific, and the flow of
resources can be restricted by turns, for example by restricting traffic from
taking a right or left turn at an intersection. The attribute of a turn is the
additional resistance offered for negotiating the turn from one link to
another; the classic example is that the times taken to negotiate left and
right turns at a traffic intersection differ. As turns are direction-specific,
the attributes can differ for turns in different directions.

c) Stops, representing locations where resources are picked up or dropped off.
For example, a bus stop could be defined as a stop where passengers can be
picked up and/or dropped off. Resource demand is an attribute of stops and is a
measure of the amount of resources to be picked up or dropped off.

d) Facility points, which are locations that have a supply of resources to
distribute to links in the network, e.g. a reservoir that supplies a specific
volume of water through pipe links. Facility points are also called 'centers'.
Resource capacity is an important attribute of a facility point and is a
measure of the total resources that can be supplied to/by the facility point.
Another important attribute is the influence zone of the facility point: a
measure of the limit up to which resources can be supplied from, or received
by, the facility point.

e) Blocks, representing the locations through which no resources will flow.
Blocks are basically obstacles defined on the network to simulate and visualize
a specific condition of resource movement. Blocks do not have any attributes.

f) Nodes: the end-points of network links. Links are always connected at nodes.
Nodes represent the intersections or interchanges of a road network, switches
in a power grid, and so on.

Figure 12.2: Elements in a Network

12.3. Network Analysis

Network analysis in a GIS may depend on the utilities under concern, as each
utility service has customized requirements, as discussed above. Fundamental to
all these requirements, however, are the following analyses:

a) Path determination: Path finding is the process of calculating an optimal
path through a series of points in a network and simulating the flow of
resources along it. Path-finding functions fall into two major groups, whose
applications differ:

i) Source-destination path: an optimal path from a pre-defined source to a
pre-defined destination. In this case, the path of least resistance from source
to destination is determined by evaluating the link resistance and the turn
resistance for all connected links; the minimum resistance is the cumulative
sum of the link and turn resistances.
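The least-resistance search can be sketched with Dijkstra's algorithm over the directed links. Turn resistance is omitted here for brevity (it can be folded in by searching over node-plus-incoming-arc states); all edges below are hypothetical:

```python
import heapq

def least_resistance_path(edges, source, destination):
    """Dijkstra's algorithm over directed links with non-negative
    impedances; returns (total impedance, node sequence) or None."""
    graph = {}
    for f, t, w in edges:
        graph.setdefault(f, []).append((t, w))
    pq, settled = [(0.0, source, [source])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node in settled:
            continue
        settled.add(node)
        if node == destination:
            return cost, path
        for nxt, w in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return None

# Hypothetical directed edges (from, to, impedance).
edges = [("A", "B", 4.0), ("B", "C", 2.0), ("A", "C", 9.0),
         ("C", "A", 7.0), ("B", "A", 4.0)]
print(least_resistance_path(edges, "A", "C"))
```

The direct arc A-C costs 9.0, but the cumulative impedance via B is only 6.0, so the algorithm returns the two-link route.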

ii) Optimal cyclic path: mainly an implementation of the set-covering problem,
an example of which is the Travelling Salesman Problem. Here the problem is to
determine the optimal path visiting all, or a specified set of, links in the
network. The optimal path is determined from a matrix of resistances for each
pair of links in the network; the matrix is evaluated to determine the order of
visiting links and to define the actual path. The optimal cyclic path is also
called a 'TOUR'. It determines the order in which the stops are visited,
accomplished by determining the minimum path between every pair of stops, based
on the impedances.
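The TOUR ordering described above can be sketched by brute force: given a matrix of minimum-path impedances between every pair of stops, try every visiting order. This is practical only for a handful of stops (production systems use heuristics); the stop names and impedance matrix are hypothetical:

```python
from itertools import permutations

def best_tour(stops, impedance, start):
    """Order in which to visit all stops from `start` and return,
    minimizing total impedance.  Brute force over permutations, so
    only suitable for a handful of stops."""
    others = [s for s in stops if s != start]
    best_route, best_cost = None, float("inf")
    for perm in permutations(others):
        route = [start, *perm, start]          # cyclic tour
        cost = sum(impedance[a][b] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_route, best_cost = route, cost
    return best_route, best_cost

# Hypothetical minimum-path impedances between every pair of stops.
imp = {"D": {"E": 1, "F": 4, "G": 3},
       "E": {"D": 1, "F": 2, "G": 6},
       "F": {"D": 4, "E": 2, "G": 5},
       "G": {"D": 3, "E": 6, "F": 5}}
print(best_tour(["D", "E", "F", "G"], imp, "D"))
```

With four stops there are only 3! = 6 candidate orders; the number of permutations grows factorially, which is why real TOUR implementations rely on heuristic orderings.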

b) Resource allocation or distribution analysis: Resource allocation is a type
of districting problem in which links are associated with resource centers. As
links are assigned to a center, a portion of the resources of that center is
allocated to meet the demand of the link. Links are allocated according to the
least-resistance rule (the cumulative link and turn resistance, as in the
path-finding procedure). The allocation procedure proceeds simultaneously for
each facility point, and for all possible turns/links at an intersection, in
successive passes.
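The least-resistance allocation can be sketched as a greedy assignment: candidate link-to-center pairs are considered in order of increasing cumulative impedance, and each link is given to the cheapest center that still has capacity for its demand. A simplified, illustrative sketch (link demands, impedances and capacities are all hypothetical):

```python
def allocate(links, centers):
    """Assign each link to the reachable center of least cumulative
    impedance that still has capacity for the link's demand.
    links:   [(link_id, demand, {center: impedance, ...}), ...]
    centers: {center: capacity}.  Greedy sketch of successive passes."""
    remaining = dict(centers)
    assignment = {}
    demands = {link: d for link, d, _ in links}
    # Consider candidate (impedance, link, center) triples cheapest first.
    candidates = sorted((imp, link, c)
                        for link, demand, costs in links
                        for c, imp in costs.items())
    for imp, link, c in candidates:
        if link not in assignment and remaining[c] >= demands[link]:
            assignment[link] = c
            remaining[c] -= demands[link]
    return assignment, remaining

# Hypothetical links: (id, demand, impedance to each candidate center).
links = [("L1", 30, {"C1": 2, "C2": 5}),
         ("L2", 40, {"C1": 3, "C2": 4}),
         ("L3", 50, {"C1": 6, "C2": 1})]
centers = {"C1": 60, "C2": 100}
print(allocate(links, centers))
```

Note that L2 prefers C1 by impedance, but C1's remaining capacity is exhausted after L1 is assigned, so L2 spills over to C2, exactly the demand-capacity interaction the text describes.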

c) Utility locating or siting analysis: A variation of the allocation problem
is to determine the locations of facility points for a network (as opposed to
the earlier case, where the facility point locations are specified) and also to
determine the capacities needed to meet the demands of the network. This
variation is useful for planning facility locations and capacities. The
locations of facilities are determined from a set of constraints defined for
the facility points, such as facility capacity, influence zone area and number
of facilities, and from flow restrictions such as the demand of each link.

12.4. Application Context of Network in GIS

Network applications in a GIS are oriented towards the planning, administration
and operational management of resource facilities. Some of the crucial
application areas are:

a) Traffic routing, for transportation planners and facility managers, for the
efficient routing of facility movement;

b) Facilities management, for planning demand-capacity ratios for resources and
the optimal allocation of resources based on demand and capacity;

c) Districting or partitioning, for associating links with facilities to define
a service area for each facility;

d) Facility siting, the locating and siting of different facilities.

Chapter 13

CHARACTERISTICS OF LARGE AREA DATABASES
(GLOBAL AND REGIONAL)
Interdisciplinary study of the Earth as a global system has revolutionized the
thinking of earth scientists. Multidisciplinary databases of global thematic
data are currently being collected, developed and integrated to predict the
nature of the processes involved. This concept has been called Earth System
Science (Earth System Sciences Committee, 1988) or, simply, by the phrase that
describes its predictive goal: global change (IGBP Special Committee, 1988).
While supercomputer-based modeling systems are often needed for theoretical
studies, GIS, an extension of computerized data handling and analysis, is a
natural tool for observationally based studies.

13.1. Data Types and Availability

Accurate and comprehensive global databases must be available for use in GIS
for understanding earth processes. Global databases are divided into three
categories:

a) Global Reference Data Sets, which represent normal or long-term averages
(e.g. climatic parameters, soils, population density).

b) Global Monitoring Data Sets (Synoptic), which reflect Earth change on a
temporal and spatial scale. These are snapshots of Earth conditions such as
snow cover, wind and precipitation.

c) Global Monitoring Data Sets (Time Series), e.g. surface heat flux,
vegetation index and sea surface temperature.

For all of the above types of global databases, satellite remote sensing
provides perhaps the most powerful tool for studying the Earth.

13.2. Need for GIS in Global Study

The digital databases involved will be global in nature: multi-spatial,
multi-temporal and multi-disciplinary. Regardless of form and format, the
databases will be large. There is a need to improve the specialized GIS
functions that support analysis of patterns, trends and associations, and to
add the means for linking GIS analysis with theoretical modeling systems.

13.3. Global Programs

Many national and international programs are underway to study global
phenomena. In September 1986, the International Council of Scientific Unions
(ICSU) established the International Geosphere-Biosphere Programme (IGBP). Its
objective is to describe and understand the interactive physical and biological
processes that regulate the Earth system, the unique environment it provides
for life, and the changes that are occurring in the system. The IGBP Working
Group has begun four projects, including an End-to-End Systems Study on Surface
Temperature Data and an IGBP Data Dictionary Project.

The other global research programs include:

1. US Global Change Research Program
2. NASA Earth System Science Program
3. NOAA Climate and Global Change Program
4. International Cartographic Association (ICA) World Digital Database for
Environmental Science
5. Global Topography Database
6. International Soil Science Society (ISSS) Soil and Terrain Database Project
7. UNEP Global Resources Information Database (GRID)

13.4. Global Database in GIS

GIS has two philosophies of implementation for creating global databases. The
first uses a global database as an entire entity in applications, not merely
pieces of it. The other approach, which is currently much more common, is a
regional application drawn from global databases.

Pre-existing regional data sets may serve as sources for global compilations.
However, there are inherent problems in such an approach, for example
reconciling legends designed for regional studies with global contexts, and
filling gaps in coverage between existing regional compilations.

13.5. Regional Database Systems

There are numerous examples of large regional applications of global databases
used for multi-thematic studies. Most of these applications have a requirement
to perform regional assessments but also a broader requirement to address an
issue on a global basis. Therefore, the user must have both a set of global
databases to draw from and a GIS that can handle global aspects such as scale
and projection. Generally, projections are a less critical consideration for
site-specific mapping of small areas; for larger areas of regional and
continental scale, however, specialized projections become desirable for
preserving distance and area values. Presently, the latitude/longitude grid is
the most common and simplest scheme to work with, although other more complex
and efficient methods are being developed.

UNEP's Global Resources Information Database (GRID) is an example of the
regional database concept. Many parts of the Earth will be studied using GRID.
The GIS used by GRID is a combination of commercial and public software.

13.6. Global Database System

NOAA's National Geophysical Data Center has developed the Geophysical Data
System (GEODAS). GEODAS is an automated data assimilation, inventory, quality
control, selection and retrieval system based on customized software. It is a
graphics-based system that provides procedures to assimilate data while
automatically building a directory and detailed inventory of the database. In
addition, the inventory allows selection and retrieval by spatial or temporal
criteria; selection can also be performed by cruise identifier, institution,
survey parameter, etc.

Another application of GIS to global databases is the development of a global
relief database, built from a combination of 10-minute and 5-minute
latitude-longitude gridded data. Some of the data were based on actual
measurements, while others were derived indirectly.

13.7. National (Natural) Resources Information System (NRIS)

The Department of Space (DOS), Govt. of India, has initiated the National
(Natural) Resources Information System (NRIS) project to provide information
for decision makers. It encompasses information on natural resources related to
land, water, forests, minerals, soils, oceans, etc., and socio-economic
information such as demographic data, amenities and infrastructure. The
integration of these sets of data would aid the decision-making process for
systematic resource utilization and would also aid the Integrated Mission for
Sustainable Development (IMSD), the program launched by DOS in 1987, wherein
remote sensing based integrated land and water resources development plans are
being prepared and implemented for 174 problem districts.

The NRIS is visualized as a network of GIS-based nodes covering the districts,
the states and the entire country. These nodes will be the repositories of
resource information in the spatial domain and will provide vital input to
decision making at the district/state/center levels.

13.8. Future Considerations

The real challenge for the future will be the sheer volume of data that GIS
will need to handle to derive usable information. GIS-based systems coupled
with Artificial Intelligence (AI) tools may be the answer that holds promise
for improving GIS activity for global applications. Natural language interfaces
to a GIS would allow the resource scientist to use the power of GIS while
reducing the need for a go-between who is an expert on the software but not on
the scientific application. Expert systems contain querying capabilities based
partly on knowledge acquired from experts in the scientific field under study.
Machine vision involves the detection and identification of patterns in images.

Chapter 14

TRENDS OF GEO-INFORMATICS
Geographic Information Systems, popularly known as GIS, have multiple meanings;
the widely accepted views emphasize the effectiveness of map processing,
databases and spatial analysis (Maguire et al.) with the help of human
expertise. Visualizing what we see on the ground in the digital domain is the
aim of the latest technology, geo-informatics. GIS technology has been making
rapid strides, keeping pace with technological progress. At the same time,
however, developments in the applications and operational use of GIS are not
keeping pace (Dasgupta, 1993). The main reasons for this imbalance are
persistent problems in organizing and sharing information across agencies and
regions: incomplete data directories and semantic incompatibility, as well as
bureaucratic, institutional and legal forces, govern the sharing of
information. Standardization is very important because more and more users seek
to mix and match hardware, software and information resources. The trend
towards OPEN systems has facilitated information sharing by removing many
incompatibilities in hardware interfaces, communication protocols, operating
systems, query languages and graphical user environments (Croswell and Ahner,
1990). Commercial software packages are available to integrate applications
running on UNIX-based workstations and PCs.

In the initial phase, GIS capabilities were exploited only in a limited way.
The present trend is that GIS is opening up to many applications, and
continuous developments are taking place to bring its capabilities to newer
areas. Recent trends in geo-informatics are:

1. Natural Resources Management
2. Telecom GIS
3. Automated Mapping / Facilities Management (AM/FM)
4. OLAP & Data Mining
5. Virtual 3D-GIS
6. Online / Internet GIS
7. OGC
8. Spatial Multimedia

GIS applications are, as is well known, enormous. GIS can identify, locate,
perform change studies, pattern analysis and modeling for natural resources
management, such as action plans for integrated sustainable development,
locating waste disposal sites and natural hazard sites, and identifying and
analyzing mineral, coal, timber and water resources. GIS is also a tool for
decision makers in land and water resources assessment, development and
management.

There is enormous potential for infrastructure development; one area is
telecommunications (Telecom GIS). GIS can be used to advantage to create large
digital databases (spatial and non-spatial) for locating poles (a GPS-based
inventory can be used for precise positioning), network usage, cable fill and
decibel loss, and will help in better decision making. There are indications
that telecommunications GIS is catching on (GITA, 1998). At the concluding
panel of the GITA conference, the following comment from a vendor was heard:
"Users do not want to build the car; they want to drive it." This indicates
that GIS has moved beyond users being expected to figure out the software
before applying it to real tasks.

AM/FM is the combination of Automated Mapping and Facilities Management. AM
produces maps, and FM provides inventories of facilities; AM/FM links the two
to provide geographic access to facility inventories. It can be used for a wide
range of purposes such as planning, dispatching, accounting, marketing and real
estate. Users appreciate generic GIS tools, but they want very specific
industry applications that work without customization. A few software packages
are available to meet the network management needs of electric, gas, water and
wastewater utilities. More AM/FM-specific GIS software packages will become
available in the near future as demand from industry for IT-based solutions
increases.

Online Analytical Processing (OLAP) is one of the advanced developments in
the field of GIS. It deals with database modeling, and the questions it can answer
- what influences sales, and why did it occur - are the type of answers not
possible from relational database models. There is growing interest among
business users wanting to take advantage of OLAP and data mining. Data Mining
is the decision support process in which we search for patterns of information in
data. Advanced predictive modeling predicts new results from existing data in
order to make better decisions. Some people even feel OLAP and data mining are
the same. Either way, they will certainly help users, particularly business
organizations, to use their data effectively.
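The kind of question OLAP-style aggregation answers - "what influences sales?" - can be illustrated with a toy roll-up along one dimension at a time. The records, field names and figures below are invented for illustration, not taken from any OLAP product:

```python
from collections import defaultdict

# Hypothetical sales records: (region, channel, amount)
sales = [
    ("north", "retail", 120), ("north", "online", 80),
    ("south", "retail", 60),  ("south", "online", 200),
]

def rollup(records, key_index):
    """Aggregate sales amounts along one dimension (an OLAP 'roll-up')."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key_index]] += rec[-1]
    return dict(totals)

by_region = rollup(sales, 0)   # totals per region
by_channel = rollup(sales, 1)  # totals per channel
```

Comparing the two roll-ups shows at a glance which dimension drives the differences - the kind of exploratory "slicing" that a plain relational query answers only one question at a time.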

Since the inception of GIS, users have been fascinated and curious to project
terrain and real objects in three-dimensional (3D) views. Initial software packages
came with 2½D capability (i.e. only the height information projected, for display
purposes). Recent developments in GIS have brought Virtual 3D GIS with
capabilities for visualization, spatial analysis, modeling and animation. Specific
hardware components like 3D accelerator cards are also available in the market to
meet the high performance requirements of 3D GIS (PC World, 1998).

On line (Internet) GIS supports two major Internet GIS application types,
i.e. server-side applications and client-side applications, based on the Common
Gateway Interface (CGI) or gateway scripts. Server-side GIS relies completely on
the GIS server to carry out the analysis and generate the output. The user issues a
request from his browser that is sent over the Internet to the GIS server, which
returns the requested output. Anybody with Internet access comes across and uses
CGI-based Internet services, and all the GIS analysis and outputs can be prepared
with the help of GIS server software.
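A server-side request of this kind can be sketched as a CGI-style URL: the browser encodes the user's map request as query parameters, and the GIS server returns the rendered output. The server name and parameter names below are hypothetical, chosen only to illustrate the pattern:

```python
from urllib.parse import urlencode

def build_map_request(base_url, layer, bbox, width, height):
    """Encode a map request as a CGI-style query string for a GIS server."""
    params = {
        "layer": layer,
        "bbox": ",".join(str(v) for v in bbox),  # xmin,ymin,xmax,ymax
        "width": width,
        "height": height,
    }
    return base_url + "?" + urlencode(params)

# A hypothetical request for a 400 x 300 map of a 'roads' layer:
url = build_map_request("http://gis.example.com/cgi-bin/mapserv",
                        "roads", (79.8, 6.9, 80.0, 7.1), 400, 300)
```

The server-side script parses these parameters, performs the analysis, and streams the finished map back - the client needs nothing more than a browser.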

OGC (Open GIS Consortium, Inc), an international consortium of more
than 100 corporations, agencies and universities, co-ordinates collaborative
development of the Open GIS specification, and collaborative business
development, to support full integration of geo-spatial data and geo-processing
resources into mainstream computing. In order to provide access to heterogeneous
geographic data sources, OGC has finalized Simple Feature specifications that
enable programmers to write application software. OGC planned to release
Revision 1.0 of the specifications early in November '98. The whole GIS
community is looking forward to taking advantage of this opportunity to access
and analyze GIS data from different sources (http://www.open.org).

Spatial Multimedia: The availability of high-speed hardware resources and a
series of self-reinforcing cycles of software innovation paved the way for
multimedia, i.e. the technology which combines video, audio and text. Multimedia
in GIS refers to visualization tool-set extensions, the addition of sound and video
data types to GIS, and the development of hypermedia spatial databases.
Multimedia GIS can be defined as "the use of hypertext systems to create webs of
multimedia resources organized by theme or location" (Raper, 1995), thereby
emphasizing the information structuring issues.

The representational ability of present-day GIS has its limits: spatial elements in
vector or raster form are linked to alphanumeric data arranged in tables. It cannot
handle qualitative data which the human eye or ear can easily recognize, and
support for visualization of relationships between data in the spatial database is
generally poor. The ability of multimedia technology to add to the
representational scope of systems has proved attractive to users of spatial data;
this has led to the development of multimedia GIS (Raper, 1995).

14.1. Automated Mapping & Facilities Management

Facilities management is a very influential, well organized GIS application
area with major representation from utility companies - telephone, electricity and
gas. Projects tend to be very large, well funded and critical to the efficient
operation of the utility. The umbrella term used by these organizations is AM/FM
- Automated Mapping and Facilities Management.

AM/FM is primarily distinguished by the context of its applications: utilities and
urban facilities management. AM/FM is an information management tool and an
integration of two tools: automated mapping produces maps, and facilities
management provides digital inventories of facilities. AM/FM links the two to
provide geographical access to facility inventories.

14.2. Automated Mapping

With control of different layers of information, automated mapping provides a
variety of ways to output from a single database. E.g. by turning layers on or off,
a street light map or an electrical feeder map could be produced from the same
database.
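The layer on/off idea can be sketched as selecting features by layer from one common store. The layer names and feature records below are illustrative only:

```python
# One database of features, each tagged with the layer it belongs to.
features = [
    {"id": 1, "layer": "street_lights", "geom": (10, 20)},
    {"id": 2, "layer": "feeders",       "geom": (12, 21)},
    {"id": 3, "layer": "street_lights", "geom": (15, 25)},
]

def render_map(db, layers_on):
    """Return the features visible on a map with the given layers turned on."""
    return [f for f in db if f["layer"] in layers_on]

# Two different maps from the same database:
street_light_map = render_map(features, {"street_lights"})
feeder_map = render_map(features, {"feeders"})
```

Both maps are views over the same data, so a single update to the database corrects every map produced from it - the map maintenance benefit noted below.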

Automated mapping capabilities

1. Better map maintenance is a major benefit of automated mapping
2. Productivity increases 2 to 10 times over manual methods
3. No problem with physical or content deterioration of maps, since they can be produced as needed or as updated
4. Centralized control is a major benefit to major corporations
5. Paper documents are replaced by a central digital store
6. Copies can be produced and distributed as and when necessary
7. Computerization provides easier but better controlled access
8. In the paper world, when a document was checked out no one else could access the information, and elaborate systems were set up to ensure return of the document; in a digital world we can control who can access it and for what purpose (read only, edit, etc.)

Automated mapping shortcomings

1. Provides only graphic output, no means of query
2. Cannot obtain attributes of objects, and cannot access objects by their attributes, because objects are not connected topologically
3. Cannot carry out sophisticated analysis of networks
4. Cannot relate map information to other records

14.3. Facilities Management Systems

Facilities management systems exist in many organizations to manage resources effectively.

Facilities management systems capabilities

1. Consist of computerized inventories of the organization's facilities
2. Capabilities for sorting, maintaining and reporting information (e.g. many utilities have pole files containing information on each pole, such as date of installation)
3. Many types of reports can be generated
4. Can maintain a digital representation of the facility network to allow engineering and network analysis in tabular, numeric form (not spatial)

Facilities management systems shortcomings

1. No geographic capabilities

2. Can generate only alphanumeric reports
3. Cannot access records geographically
4. Cannot generate geographic reports (maps)
5. Redundancy arises if both automated mapping and facilities management systems are maintained, one for mapping and the other for inventory

AM/FM

AM/FM combines automated mapping and facilities management into one system.
Geographic information provides a new window into the facilities database:
information can be retrieved by pointing to a map image, e.g. point to an electrical
cable and retrieve its KVA (kilovolt-ampere) rating, length, mortality, or the list
of transformers connected to it. AM/FM is a very successful marriage of two
traditional concepts.
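Pointing at a cable and retrieving its attributes can be sketched as a nearest-feature lookup against the inventory. The cable IDs, locations and attribute values below are made up for illustration:

```python
import math

# Hypothetical facility inventory: location plus engineering attributes.
cables = {
    "C-101": {"xy": (3.0, 4.0), "kva": 500, "length_m": 240},
    "C-102": {"xy": (8.0, 1.0), "kva": 150, "length_m": 90},
}

def pick_cable(inventory, click_xy):
    """Return (id, attributes) of the cable nearest to the clicked map point."""
    return min(inventory.items(),
               key=lambda kv: math.dist(kv[1]["xy"], click_xy))

# Clicking near (3.2, 4.1) selects the closest cable and exposes its record.
cable_id, attrs = pick_cable(cables, (3.2, 4.1))
```

A production system would index features spatially rather than scan them all, but the principle is the same: the map click resolves to a feature ID, and the ID keys into the facilities inventory.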

AM/FM examples

1. Locating a pole or facility item by street address
2. Generate reports on street lighting - does it meet standards in a specified area?
3. Generate maps of electrical circuits or feeders at a prescribed scale
4. Produce continuing reports on property
5. Provide reports for tax purposes

Benefits of AM/FM systems

1. Reduces the cost to maintain information
2. No physical maps to deteriorate, get lost or be misfiled
3. Data is more accessible and secure
4. Impacts the organization by integrating operations
5. Departments must cooperate because they now share data
6. Reduces potential duplication between departments
7. Ensures consistency of the information base across departments
8. New forms of report become available
9. New information provides a basis for new forms of management

14.4. Characteristics of AM/FM

Scale: Service maps are needed at a scale of 1" to 100'. General systems planning
may require scales down to 1:1,000,000, e.g. for electrical utilities

Data sources: Data are generally collected during construction or maintenance,
using sketches on standard base maps.

Data quality: High data quality is desirable, e.g. accurate positioning of
underground facilities, but is not always attainable in practice; much urban
infrastructure (e.g. water and sewer pipes) may be more than 100 years old, and
many historical records may be missing.

Functionality: AM/FM systems stress the addition of geographical access to
existing databases. The database is likely to remain on a mainframe; geographical
access may be from a workstation, with geographical data maintained locally.
Non-geographical data are characterized by frequent transactions, which requires
access to the database from many workstations. Geographical data are input
independently using a specialized graphics workstation. A backcloth is used for
input: the backcloth is a base map showing the facility locations to be digitized as
well as other geographic details, e.g. streets and parcels. Digitizing may be done
on screen with the backcloth displayed in raster form using video technology;
however, the base map itself is not entered into the database. Some vendors
supplying the AM/FM market argue that AM/FM applications are literally
"geographic information systems" - providing geographically based access to
information - while systems which provide analysis and modeling functions are
better described as "spatial analysis systems".

Organizations

AM/FM International - mostly utilities, with strong representation by vendors and
governments; little involvement as yet in education and research; branches in
many countries.

Chapter 15

OVERVIEW OF CURRENT GIS PACKAGES

15.1. Introduction

As a technology and an industry, GIS is experiencing one of the most active
periods of change in its 30-year history. Users stand to benefit from improved
products, more standardized systems, better integration, easier maintenance,
lower costs of acquisition and maintenance, and greater out-of-the-box
functionality. Software developers and service providers also stand to benefit, but
only if they can navigate the uncertain waters of change and marshal their
resources to meet increasingly demanding user requirements.

The worldwide GIS market enjoyed strong growth in 1998 - possibly as high as
20 percent - and the prospects for 1999 may be just as bright for most areas,
according to analysts. But the numbers alone may not tell the whole story.

15.2. GIS Packages

1. ARC/INFO
2. ILWIS
3. Map Info (The Information Discovery Company)
4. Intergraph MGE
5. Intergraph Geo-media
6. ENVI

15.3. GIS Packages - A Survey

There are a considerable number of GIS packages available in the market. Indian
GIS packages are finding their way into many user applications, and a separate
market segment is identifiable for these. However, foreign GIS packages also
have a market share, built up over their development time and through the
strength of their installations worldwide. In this section, we cover the details of
some leading commercial GIS packages. To provide an overview, Table 1.1 lists
some of the latest packages in the market and their broad specifications. For more
detailed information, it would be appropriate to contact the vendors, who are
listed in Table 1.2. The world scenario for GIS packages appears bright, with a
large number of these developed for specific purposes.

15.4. GIS Packages of World Market

As is seen from Table 1.1, there are a large number of GIS packages available in
the world market - mainly from American, Canadian and British companies.
However, a few of the leading packages are discussed here as part of the survey.

ARC/ INFO GIS

ARC/INFO is one of the first GIS packages that was available commercially and
is used all over the world. ARC/INFO has been developed by the Environmental
Systems Research Institute (ESRI), Redlands, USA.

Data Structure

ARC/INFO is a vector-based GIS package, capable of handling both spatial and
non-spatial data. It organizes geographical data using vector topological models,
and non-spatial data using relational models in a DBMS. Each vector is either a
point feature or a vertex of an arc; each arc is either a line feature or one line of a
polygon feature. The arc-node and polygon topology is organized to identify
point, line and polygon relations. The cartographic data are then linked to the
attribute data through a link-item.
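The arc-node organization can be sketched minimally: each arc references its end nodes and records the polygon on its left and right, which is enough to recover polygon boundaries and adjacency. The node, arc and polygon IDs below are invented, and this is a sketch of the general model, not ESRI's internal format:

```python
# Minimal arc-node topology: arcs reference nodes and carry left/right polygon IDs.
nodes = {1: (0, 0), 2: (4, 0), 3: (4, 3)}
arcs = {
    "a1": {"from": 1, "to": 2, "left": "P1", "right": None},
    "a2": {"from": 2, "to": 3, "left": "P1", "right": "P2"},
    "a3": {"from": 3, "to": 1, "left": "P1", "right": None},
}

def boundary_arcs(polygon_id):
    """All arcs bordering a polygon, from either side."""
    return sorted(a for a, rec in arcs.items()
                  if polygon_id in (rec["left"], rec["right"]))

def neighbours(polygon_id):
    """Polygons sharing at least one arc with the given polygon."""
    out = set()
    for rec in arcs.values():
        pair = {rec["left"], rec["right"]}
        if polygon_id in pair:
            out |= pair - {polygon_id, None}
    return out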

ARC/ INFO Functionalities

ARC/INFO has a wide range of functionality which has been developed on a
tool-box concept, where each function can be visualized as a tool with a specific
utility; thus, based on the user requirement, a specific tool or function can be
utilized. The major modules are discussed below:

ADS and ARCEDIT:

Database creation in ARC/INFO is possible through the process of digitization
using the ARC Digitizing System (ADS) and the ARCEDIT module. ADS is a
menu-driven module for digitizing and performing editing on spatial features.
ARCEDIT is a powerful editing utility with capabilities for feature-based editing.
These modules include functions for coordinate entry using different devices -
digitizers, screen cursors, etc.

INFO

INFO is a complete relational database manager for the tabular data associated
with geographic features in a map coverage. ARC/INFO keeps track of and
updates map feature attribute tables, which are stored as INFO data files. INFO
can be used to manipulate and update each feature's attributes by performing
logical and arithmetic operations on the rows and columns of the table. INFO
provides facilities for data definition of data files, use of existing data files, data
entry and update, and sorting and querying.

Analysis Modules

ARC/INFO offers spatial overlay capabilities based on topological overlay
concepts. Overlays, buffer generation, proximity analysis, feature aggregation,
feature extraction, transformation, nearness functions and other integration
utilities are available.
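A buffer query of the simplest kind - which points fall within a given distance of a feature - can be sketched with plain Euclidean distance. The coordinates below are illustrative, and real packages build buffer polygons rather than testing distances directly:

```python
import math

def within_buffer(points, centre, radius):
    """Select points that fall inside a circular buffer around a feature."""
    return [p for p in points if math.dist(p, centre) <= radius]

# Hypothetical scenario: wells within 1.5 units of a proposed disposal site.
wells = [(1.0, 1.0), (5.0, 5.0), (2.0, 2.5)]
site = (2.0, 2.0)
affected = within_buffer(wells, site, 1.5)
```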

ARCPLOT

This module has capabilities for generating cartographic-quality outputs from
the database. This includes utilities for interactive map composition, editing map
compositions, plotting, printing, etc. The map composition functionalities include
the incorporation of coverage features at the required scale, generalization,
symbolization, transformation, etc. Placement of non-coverage features like
legends, free text, logos, graphic shapes and so on can also be done.

TIN

The TIN module of ARC/INFO can be used to create, store, manage and perform
analysis pertaining to three-dimensional data. The modeling capabilities include
calculation of slope, aspect, iso-lines (contouring), range estimation, perspectives,
volumes, etc. Additional functions for determining spatial visibility zones and
lines of sight are also provided.
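Slope and aspect for one TIN facet follow from the normal of the plane through its three vertices - a standard geometric derivation, sketched here rather than ARC/INFO's actual implementation:

```python
import math

def slope_aspect(p1, p2, p3):
    """Slope (degrees from horizontal) and aspect (degrees clockwise from
    north) of the plane through three (x, y, z) vertices."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Plane normal = u x v (cross product of two edge vectors).
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:  # orient the normal upward
        nx, ny, nz = -nx, -ny, -nz
    slope = math.degrees(math.atan2(math.hypot(nx, ny), nz))
    aspect = math.degrees(math.atan2(nx, ny)) % 360  # downslope direction
    return slope, aspect
```

For a facet on the plane z = x (rising to the east at 45°), this yields a 45° slope facing west (aspect 270°), since downslope is opposite the direction of rise.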

Network

The Network module of ARC/INFO performs two general categories of functions
- network analysis and address geo-coding. Network analysis for optimal path
determination and resource allocation analysis is possible. The geo-coding
module allows addresses to be associated with line networks and the spatial
framework of addresses in an application to be determined.
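Optimal path determination of this kind rests on a shortest-path algorithm. A compact Dijkstra sketch over a toy road network (the node names and edge costs are invented) illustrates the idea; this is the general algorithm, not the module's actual code:

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; graph maps node -> {neighbour: cost}."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None  # goal unreachable

# Toy road network with travel costs on each directed link.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}
```

Here the cheapest route from A to D detours through C and B (cost 8) rather than taking the direct-looking A-B-D route (cost 9).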

COGO

COGO, the Coordinate Geometry module of ARC/INFO, supports the functions
performed by land surveyors and civil engineers for the design and layout of
subdivisions, roads and related facilities, as well as supporting special plotting
requirements. COGO software tools allow the definition, adjustment and closure
of traverses, including adding curves to a traverse, and compute areas, bearings
and azimuths.
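The azimuth and distance computations COGO performs between traverse points can be sketched in plane coordinates, using the survey convention that azimuth is measured clockwise from grid north:

```python
import math

def azimuth(e1, n1, e2, n2):
    """Azimuth in degrees, clockwise from grid north, from point 1 to
    point 2, given plane (easting, northing) coordinates."""
    return math.degrees(math.atan2(e2 - e1, n2 - n1)) % 360

def distance(e1, n1, e2, n2):
    """Plane distance between two traverse points."""
    return math.hypot(e2 - e1, n2 - n1)
```

Note the argument order in atan2: easting difference first, then northing, which rotates the zero direction from east (the mathematical convention) to north (the surveying convention).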

GRID

GRID is a grid-based module of ARC/INFO and is available on mainframe
platforms. GRID has an interface to ARC/INFO, and thus coverages can be
converted to GRID and from GRID back to ARC/INFO. GRID supports powerful
modeling tools.

Apart from the above, ARC/INFO also provides gateways to and from other
systems - the ERDAS system, grid data, the AutoCAD DXF format, the IGES
format and a flat-file format. The ARCVIEW module is a desktop mapping
package oriented towards viewing and querying ARC/INFO databases.

ARC/ INFO Platforms

ARC/INFO is available on a wide range of platforms - PCs, workstations,
PRIME systems, APOLLO systems, etc. ARC/INFO is also available on a variety
of operating systems - DOS on PCs, VMS on VAX machines, and UNIX on
workstations.

PAMAP GIS

PAMAP GIS, a product of PAMAP Graphics Limited, Canada, is an integrated
group of software products designed for an open systems environment. The
package is modular and is designed to address the wide range of mapping and
analysis requirements of the natural resource sector.

PAMAP Data Structure

PAMAP adopts an integrated raster and vector representation of the graphic
elements. It uses vectors for data capture and storage, and raster for analysis
purposes.

PAMAP Functionalities

PAMAP GIS has 7 major modules, described as follows:

GIS Mapper

It is the basic module for data entry, used to create a database of maps. It allows
for the generation and editing of the vector database which forms the base for
subsequent raster-based analysis. It includes 'Planner', which is a quick interactive
report generator. It has a direct link to dBase and data exchange with statistical
packages like RS/1, SPSS and SAS. GIS Mapper also supports pen plotter output
to several plotters.

Analyzer

It is the main analysis engine of the GIS. This module allows the user to perform
data conversion for polygonization (Raster Creation), which generates polygonal
information from the vector boundaries of the polygonal areas; overlay
operations, for performing two or more polygonal overlays to produce a new
output level; proximity analysis, to generate a "distance from cover" layer
containing the distance values to the nearest specified start location; and corridor
analysis around specified map features and polygons of influence, etc.

Topographer

It is for processing three-dimensional data and DEMs. Different products like
slope, aspect, perspective views, visibility from a viewpoint and volume
calculations can be derived from this module.

Interpreter

PAMAP's Interpreter is for importing remotely sensed images from digital image
analysis systems into the GIS as surface covers. The imported image can be
displayed on both 8- and 32-bit machines.

Modeler

This module integrates multiple raster surfaces or multiple database attributes to
make planning decisions quickly and accurately. Modeler consists of three main
functions - combination modeling, regression analysis, and correlation and
covariance analysis.

Networker

This module is used to create, analyze and manage networks. The main network
functions are network formation; location/allocation analysis; network searches;
and path analysis.

File Translator

It is for importing and processing map files created in various data formats like
IGDS, SIF (Intergraph), DLG (Digital Line Graph), DXF (AutoCAD), etc.

PAMAP Platforms

PAMAP GIS is available on a variety of platforms - 486/386/286 PCs; UNIX
workstations (Data General, IBM RISC/6000, Sun SPARCstation) and VAX
systems. PAMAP has also introduced PAMAP GIS for MS-Windows with
multitasking capability.

Spans GIS

Spatial Analysis System (SPANS) is a GIS package developed by TYDAC
Technologies, USA, to provide a flexible working environment for the GIS user.

SPANS Data Structure

SPANS adopts a mixed vector-tessellation approach to GIS and has developed
the region quad-tree data structure as a better way of tessellating data than a
conventional raster structure. It has the ability to read and process vector and
raster formats used by other GIS packages. It internally converts both formats to
the quad-tree structure, which is used to store and manipulate data. SPANS can
handle a maximum of 33000 x 33000 cells through a 15-level quad-tree
tessellation for a theme.
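The region quad-tree idea - recursively splitting a square into four quadrants until each is homogeneous - can be sketched as follows. The land-cover grid is illustrative only, and this is the general structure, not SPANS's internal format:

```python
def build_quadtree(grid, x, y, size):
    """Recursively decompose a 2^n x 2^n grid of class values.
    Returns the value if the block is uniform, otherwise a 4-tuple of
    sub-trees in (NW, NE, SW, SE) order."""
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return first  # homogeneous block: store one value, not size*size cells
    h = size // 2
    return (build_quadtree(grid, x,     y,     h),  # NW
            build_quadtree(grid, x + h, y,     h),  # NE
            build_quadtree(grid, x,     y + h, h),  # SW
            build_quadtree(grid, x + h, y + h, h))  # SE

# A small land-cover grid: large uniform areas collapse to single nodes.
land_cover = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 3],
    [1, 1, 3, 3],
]
tree = build_quadtree(land_cover, 0, 0, 4)
```

Uniform regions are stored once regardless of their size, which is why a quad-tree can be far more compact than a conventional raster of the same extent.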

SPANS Functionalities

SPANS has the following modules:

GIS Module

It is a set of programs forming the core of the SPANS package. It includes
digitization, editing, raster and vector to quad-tree conversion and vice versa,
projection coordinates to latitude/longitude transformation and vice versa, and an
interactive command menu. It also includes map display/browse, hard copy
output and plot output. The core includes polygon analysis, logical overlays,
matrix overlay, indexing/weighted overlay and spatial modeling. It includes
corridor analysis, point query, nearest point, point to area conversion, etc.

TYDIG

It is a standalone digitizing system with extensive editing features for the transfer
of map data to digital format.

Contouring/ DEM Module


It is a standalone module for converting geo-referenced point observations into an
interpolated surface. It can accommodate 2000+ input points and generate
contours, slope, aspect and angle-of-incidence maps.

Potential Mapping Module (POTMAP)

It is a point interpolation program; 7000 data points can be accommodated.
Features include a user-specified distance decay function, weighting schemes, and
various interpolation options such as moving average, slope, aspect, density,
statistical form, etc.

Raster Interface Module

It enables the user to import and export a wide range of raster-based GIS/imagery
data sources. All imported data can be converted to the SPANS internal quad-tree
structure or a standard raster format.

SPANS Platforms

SPANS is a microcomputer-based GIS package and operates on the PC platform
- the AT and 386/486 family.

OTHER GIS PACKAGES

While it is not possible to cover all the GIS packages in this survey, some have
been discussed in the earlier section. A few of the other competitive packages are
discussed below:

Modular Geographic Environment (MGE) by Intergraph Corporation runs on
PC (Windows NT), Macintosh and UNIX platforms. It is a full-featured package
with power-packed functionality. It includes object-oriented and relational spatial
analysis and a unique raster (grid) analysis capability which links raster files to an
RDBMS.

System 9 is a feature-based, "object-oriented", modular GIS built on a toolbox
approach. It is available on IBM AIX, HP-UX, Solaris 2 and Solaris 1 (SunOS).
Commands can be put into scripts (or C code). Its database, Empress, is superior
to many other embedded GIS databases; everything (spatial data and attributes) is
held in Empress tables. It is primarily a vector-based system with the ability to
display geo-referenced raster imagery as a background.

MAPGRAFIX

A Macintosh GIS package with limited raster data handling capabilities, e.g. an
image as a backdrop. This is an interesting package in that it has no internal
DBMS of its own but accesses various commercial databases. In this way, it is
possible for this package to utilize existing databases (as long as the attributes are
geographically referenced) without the need for conversion. This package is not
topologically based and therefore cannot perform such topological functions as
optimal path determination. Additional modules include MAPLINK (a translator
for the conversion of 13 different data types such as DLG, TIGER and DXF) and
MAPVIEW (a map projection module).

IDRISI

The IDRISI GIS package has been developed by Clark University, USA, and is an
inexpensive PC-based package with many advanced features, including good
import/export capabilities, a new digitizing module and some image processing
capabilities. Based on a raster structure, the package is very good for educational
purposes. It runs on straight DOS machines with minimal hardware requirements
(e.g. minimal = a fast 386). The software is modular in design, with strong
support for the manipulation and display of raster images. Manipulation and
import of vector data is also supported.

GENAMAP is a topological vector-based GIS for UNIX platforms. A separate
module called Genacell is available which provides raster/grid modeling.
Hydrologic and TIN modeling is incorporated in another separate package called
Genecivil.

GRASS is a public domain UNIX package with a large established user base
which actually contributes code that is incorporated into new versions. There are
several image processing tools and good support for spatial statistics analysis,
with strong support for raster/vector integration. The latest version (4.1) is
available via ftp (and includes a full sample database containing SPOT, DEM and
other data sets). GRASS is probably the most widely used GIS package for
hydrologic/watershed modeling applications and directly supports two widely
used models: AGNPS and ANSWERS.

MAPINFO

A popular package, translated into several languages and ported to several
platforms (Windows, Macintosh, Sun and HP workstations). MAPINFO has very
good display capabilities. It supports dBase as well as its own DBMS. There are
some complaints about limited data import capabilities and the proprietary
MapInfo format; one such complaint cited the lack of TIGER import capability -
MapInfo itself resells the formatted TIGER database for MapInfo users, but at a
hefty price.

MapInfo server products such as MapInfo SpatialWare enable management of
spatial object data in Oracle and Informix as never before; delivery mechanisms
such as MapInfo MapXtreme deploy mapping across the Internet; and client-side
tools such as MapInfo MapX, an OCX, allow mapping to be embedded in
applications. All of these, together with the core GIS product MapInfo
Professional and MapInfo's geographic data products and services, make for a
winning spatial solution.

ENVI

Environment for Visualizing Images (ENVI) is an environmental monitoring tool
that helps users make better decisions. ENVI's multi- and hyper-spectral analysis
features, slick handling of large data sets and enhanced visualization help
earth-analysis organizations.

Visualization is critically important for decision-making analysts and non-experts,
who need to understand what it is they are looking at but may not have experience
with color-enhanced, multi-spectral images.

For basic to advanced applications, with no limits on file size or number of bands,
a suite of GIS features and a logical interface, you will find ENVI to be the most
powerful and easy-to-use image processing package available (www.Rsinc.com).

Infomaster GIS (better solutions for better business)

State-of-the-art client/server spatial technology providing ORDBMS and
RDBMS graphics storage, raster/vector integration and rules-based editing that
guarantees data integrity. Further, it is fully scalable and open-systems compliant,
with dynamic access to other GIS datasets, CAD systems, video, etc. Users can
build their own mapping environment with the point/click GUI builder, which
puts customized business solutions on desktops or into existing applications,
using the new OLE support, in minutes.

Asset Master

Asset Master utilizes single-source storage for asset and GIS attributes, providing
total data integrity. Asset Master includes asset classification, asset condition
assessment, depreciation modeling, asset valuation, budget forecasting, asset
lifecycle decision support, maintenance management and more. Asset Master can
be integrated with CAD, video or any other corporate system, including other
asset management programs.

Web Master

A combined process of spatial data publishing, web serving and client browsing
which gives Internet/intranet users across the community, or the world, access to
your spatial data. Web Master uses vector streaming with raster backdrop
integration to provide clear display of any source data, including data sets from
other GISs. Full caching and version control for data and application plug-ins
guarantee high performance and ease of client management across small or large
organizations.

Infomaster provides a single, scalable environment for end-to-end processing of
spatial data and asset data, and then delivers business solutions in a browser on
the Internet or intranet. All products are fully OLE-enabled for co-processing
with Microsoft products (www.infomaster.com.au).

PMAP

It is a PC-based package advertised for less than $500. It uses a raster data
structure and a command language which allows for macro file capabilities. It
runs on straight DOS machines and includes such modules as slope/aspect,
contouring, optimal paths and spatial interpolation. Special versions of this
package (aMAP and tMAP) are available for academic or tutorial purposes, but
are limited by file size restrictions.

WINGIS

It is for MS-Windows and MS-Windows/NT, and consists of a number of
modules in a well-devised system which can link geographical elements with
data. The modules are WINMAP, WINVIEW, WINSAT and WIN3D. WINGIS
permits inter-relating of the most varied data streams, even from different
databases, and works through the SQL standard even with mainframes as well as
with network databases. The graphic background can be provided by digitizer, by
imported coordinate data, or by satellite or aerial photographs. The minimum
hardware configuration for WINGIS is a 486 PC with at least 16 MB of RAM,
120 MB of free HD space and Windows or Windows/NT running. The theoretical
limitations of the WINGIS program are mainly a function of disk space. Project
limitations are 42,000 x 42,000 kilometers with a resolution of 0.01 meter, a
distance function of maximally 21,000 km, a maximum of 16354 layers and
16354 points per graphic object, and a maximum of 2.1 billion objects, with
polygons of up to 5000 points per polygon. The database used is Gupta
SQLWindows, which can handle DB2, Informix, Ingres, Sybase, HP AllBase,
AS/400 as well as Oracle, dBase, Lotus, DIF, CSV, SQL or ASCII data.

For a more detailed listing, Table 1.1 may be referred to, as it lists some of the
latest packages in the market and their broad specifications.

GIS PACKAGES OF INDIA

In India too, a number of GIS packages have been developed and are available for
operational applications. A few private agencies have also come up with GIS
packages, while government sector efforts are also considerable.

ISROGIS

ISROGIS is a state-of-the-art Geographical Information System (GIS) package
with efficient tools for the integration and manipulation of spatial and non-spatial
data. It consists of a set of powerful modules for data capture from maps through
a process of digitization and a direct RS data interface; tabular data storage and
management; integrated manipulation and analysis; and display and output
operations. ISROGIS has been developed jointly by the Space Applications
Centre (SAC/ISRO), Ahmadabad; ERA Software Systems (P) Ltd., Hyderabad;
and PEGASUS Software Consultants (P) Ltd., Bangalore, and is licensed by
SAC/ISRO under a Technology Transfer agreement to ERA and PEGASUS.

ISROGIS Data Structure

ISROGIS adopts the PM Quad-tree data structure because of its distinct
advantages over other structures and because of the flexibility it offers for
manipulation. This data model efficiently represents polygonal maps. The PM
Quad-tree is an edge-based structure that decomposes the vectors in a map into
quads so that a simple subset is obtained; this is then organized using the vector
structure. The quad information is stored, and the vector information of the edges
is also recorded. Thus this model has not only the properties of the quad but
also the properties of the vector.
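The decomposition idea behind such quad-tree structures can be sketched in a few lines. This is a simplified illustration of the principle (recursively split a quad until each leaf holds at most one vertex, recording the edges that pass through each leaf), not the ISROGIS implementation; the function names are hypothetical.

```python
# Simplified sketch of a PM quad-tree style decomposition: subdivide the
# map extent into quadrants until each leaf contains at most one polygon
# vertex, while each leaf also records the edges touching it.

def edge_touches(edge, bounds):
    # Coarse test: keep an edge in a quad if either endpoint lies inside.
    x0, y0, x1, y1 = bounds
    return any(x0 <= x < x1 and y0 <= y < y1 for x, y in edge)

def subdivide(bounds, vertices, edges, depth=0, max_depth=8):
    """bounds = (x0, y0, x1, y1); vertices = [(x, y), ...];
    edges = [((xa, ya), (xb, yb)), ...]."""
    x0, y0, x1, y1 = bounds
    inside = [v for v in vertices if x0 <= v[0] < x1 and y0 <= v[1] < y1]
    # Leaf condition: at most one vertex, or maximum depth reached.
    if len(inside) <= 1 or depth == max_depth:
        return {"bounds": bounds, "vertices": inside,
                "edges": [e for e in edges if edge_touches(e, bounds)]}
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quads = [(x0, y0, xm, ym), (xm, y0, x1, ym),
             (x0, ym, xm, y1), (xm, ym, x1, y1)]
    return {"bounds": bounds,
            "children": [subdivide(q, inside, edges, depth + 1, max_depth)
                         for q in quads]}
```

Because each leaf keeps both its quad extent and the vector information of the edges crossing it, the structure combines the spatial indexing of the quad with the geometric precision of the vector, as described above.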

ISROGIS Functionalities

ISROGIS has a set of twelve modules that can be used to perform a variety of
functions: spatial and non-spatial database organization, editing of spatial features,
symbolization of spatial features, integrated analysis of spatial features, querying
of spatial and non-spatial features, map output generation and plotting, etc. The
modules are as follows:

CREATE module of ISROGIS is the basic module that allows for the creation of
MAPS and THEMES either through a process of digitization or through ASCII/
ARC-INFO/RS image file interfaces.

EDIT module of ISROGIS provides a comprehensive set of tools for a systematic
editing of spatial features and removing errors from the spatial data sets.

MAKE module enables the user to provide one's own perception to a THEME by
specifying a symbolization to the point, line and polygon features and annotation
with texts.

ANALYSE provides functions for manipulating and analyzing information and
includes the overlay operations - polygon overlay, geometrical clipping, buffer
generation, MAP generalization, etc. Geometric TRANSFORMATION from one
coordinate system to another is also a part of ANALYSE.
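The simplest geometric transformation of the kind an ANALYSE-style module performs is a 2-D affine mapping between coordinate systems, for example registering digitizer units to ground coordinates. A minimal sketch (illustrative only, not ISROGIS code):

```python
def affine_transform(points, a, b, c, d, e, f):
    """Apply a 2-D affine transformation to a list of (x, y) points:
    x' = a*x + b*y + c,  y' = d*x + e*y + f.
    Scaling, rotation, translation and shear are all special cases."""
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]
```

For instance, `affine_transform(points, 2, 0, 10, 0, 2, 20)` scales both axes by 2 and shifts the result by (10, 20); the six coefficients are usually solved from ground control points in practice.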

ATTR_DB module of ISROGIS enables the user to perform a variety of operations
over the attribute database. This module builds a layer of interface software over
the underlying relational database. This module should be used after the creation
of a MAP and its THEMES.

QUERY module of ISROGIS allows the user to execute query operations for
obtaining information related to spatial and attribute data stored in the database.

LAYOUT module provides cartographic output capabilities for ISROGIS, from
screen displays to plot outputs, and provides facilities for interactively creating
and previewing SHEETS, graphic representations of spatial data sets.

MAPMOSAIC allows for joining or mosaicking a number of maps into a single
spatial entity. The facility to match the edges either interactively or automatically
is also provided under this module.

3D MODULE provides a powerful set of functions for handling the z-axis of
spatial data. Based on TIN and interpolated metrics, the module can generate
slopes, aspects, iso-lines, viewsheds, surface distances, perspective views, etc.

Apart from these, ISROGIS has a Symbol Manager for generating user defined
customized library of symbols; Batch Processor as an alternate interface for users
and utilities for data management. ISROGIS has a powerful Graphical User
Interface (GUI) which makes use of the X Windows/ MS - Windows
environment.

ISROGIS Platform

ISROGIS is available on PC platforms on MS-Windows and on UNIX and Sun
platforms.

GRAM

GRAM is a PC-based, user-friendly GIS tool designed to handle both spatial and
non-spatial attributes. GRAM has been developed by the Indian Institute of
Technology (IIT), Bombay, under a Department of Science and Technology
(DST) sponsored project. The package operates on PC-DOS systems. GRAM can
handle both vector and raster based data and has functionality for raster based
analysis, image analysis modules, etc.

GRAM Data Structure

The GRAM package presently adopts the raster data structure. A vector
topological model version is under development.

GRAM Functionalities

The GRAM package has the following modules:

Input module

'Digitize' is the tool available in the GRAM input module which helps to digitize
point, line and areal features. Within DIGITIZE, options are available to snap a
hanging segment, to delete the overshooting part of a segment, or to move a
segment coordinate to the desired position. CLEAN is a utility which helps to
remove overshooting segments and snap hanging segments automatically.
POL_FILL is a utility which helps to assign the thematic unit for each polygon,
reading from the *.PNT file.

GRAM supports map manipulation on raster data planes. Hence the linear features
and polygon features have to be rasterized at the desired resolution before
proceeding to data manipulation and analysis. POLRAS is the utility which
enables the conversion of polygons into a raster data file. The polygons are
grouped as per their thematic units and rasterized. In addition to these utilities, the
input module also accepts remote sensing data or any raster data file taken from
other GIS packages. These data planes can be geometrically corrected and
transformed for registration with other data planes.
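The polygon-to-raster conversion that a POLRAS-style utility performs can be illustrated with a basic scan-line fill: for each raster row, find where polygon edges cross the row centre and fill cells between crossings with the polygon's thematic value. This is a simplified sketch of the principle, not GRAM's actual code, and the function name is hypothetical.

```python
def rasterize_polygon(polygon, width, height, value, cell=1.0):
    """Scan-line rasterization of one polygon (list of (x, y) vertices)
    onto a width x height grid, burning in the polygon's thematic value."""
    grid = [[0] * width for _ in range(height)]
    n = len(polygon)
    for row in range(height):
        y = (row + 0.5) * cell          # row centre in map units
        crossings = []
        for i in range(n):
            (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
            # Edge crosses this row's centre line (horizontal edges skip).
            if (y1 <= y < y2) or (y2 <= y < y1):
                crossings.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        # Fill cells whose centres lie between pairs of crossings.
        for left, right in zip(crossings[::2], crossings[1::2]):
            for col in range(width):
                x = (col + 0.5) * cell
                if left <= x < right:
                    grid[row][col] = value
    return grid
```

Running each polygon of a theme through such a routine, with its thematic unit as the burn-in value, yields the raster data plane that the analysis modules then operate on.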

Attribute link

The system generates default databases for point, segment and areal units. The
user can append these databases with additional attributes. A facility is available
to view the non-spatial data for any polygon, segment or point feature. Also, using
the attributes in the databases, thematic maps can be prepared.

Analyze Module

The Analyze module supports a series of arithmetic and relational operations that
can be performed on a single or multiple data planes. In addition, the overlay
function helps to overlay two maps of any independent themes. During an overlay
operation, an overlay table is generated indicating the pixel values of the first
map, the second map and the resulting overlaid map. Using this table, one can
regroup and assign proper pixel values to bring out a meaningful interpretation of
the overlaid map.
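The overlay-table mechanism described above amounts to a cross-tabulation of two raster planes followed by a recoding step. A minimal sketch of the principle (not GRAM's implementation; function names are illustrative):

```python
from collections import Counter

def overlay_table(plane_a, plane_b):
    """Cross-tabulate two equally sized raster planes (lists of rows),
    returning {(value_a, value_b): pixel_count} -- the overlay table."""
    counts = Counter()
    for row_a, row_b in zip(plane_a, plane_b):
        for a, b in zip(row_a, row_b):
            counts[(a, b)] += 1
    return counts

def recode(plane_a, plane_b, mapping):
    """Assign a new pixel value to each (a, b) combination, producing
    the overlaid map after the user regroups the table entries."""
    return [[mapping[(a, b)] for a, b in zip(ra, rb)]
            for ra, rb in zip(plane_a, plane_b)]
```

For example, overlaying a soil plane with a land-use plane tabulates every soil/land-use combination and its pixel count; the user then supplies a mapping from combinations to new class values to produce the interpreted output map.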

Terrain Module

Under this module, interpolation utilities are available to generate a surface from
contours and random spot (elevation) data. It can be used for digital terrain
modeling, geochemical mapping, ground water zone mapping, etc. Utilities are
also available to view a 3D projection of the data with another map draped over it.
Also, slope, aspect and relief maps can be generated from the DTM.
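Slope and aspect generation from a gridded DTM can be illustrated with central differences. This is a sketch under assumed conventions (rows running north to south, aspect taken as the compass bearing of steepest ascent; many GIS packages report steepest descent instead), not GRAM's actual algorithm:

```python
import math

def slope_aspect(dem, cell):
    """Slope (degrees) and aspect (degrees clockwise from north, direction
    of steepest ascent) for interior cells of a DEM given as a list of
    rows, using simple central differences with cell spacing `cell`."""
    rows, cols = len(dem), len(dem[0])
    slope = [[0.0] * cols for _ in range(rows)]
    aspect = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dzdx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell)  # eastward
            dzdy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell)  # southward
            slope[i][j] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
            aspect[i][j] = math.degrees(math.atan2(dzdx, -dzdy)) % 360
    return slope, aspect
```

On a plane tilted one unit of rise per unit of eastward run, this yields a 45-degree slope with an east-facing (90-degree) ascent, which is a quick sanity check for the sign conventions.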

Image Processing Module

GRAM has an image processing module consisting of utilities for image
preprocessing, enhancement, classification and geometric transformation. Image
preprocessing covers band separation, window extraction, histogram generation,
scatter plot creation, data compression and data enlargement, while enhancement
covers density slicing, look-up table manipulation, linear/nonlinear stretching,
filtering, edge operators, etc. Both supervised and unsupervised procedures are
included under classification.

Print Module

The Print Module provides capabilities for displaying results in both graphic and
tabular formats. The graphic result can be displayed in suitable colours. Vector
information such as drainage, village boundaries, etc. can be drawn over the raster
image to improve its quality. Legends and titles can be incorporated and the hard
copy can be obtained using an inkjet printer. Tabular data is displayed using the
query and report generating support of the database.

GEOSPACE

GEOSPACE is a PC-based GIS tool designed to handle spatial data and their non-
spatial attributes. GEOSPACE has been developed by the Regional Remote
Sensing Service Centres (RRSSC), DOS. The package operates on the PC-Xenix
operating system and is interfaced to the CGI graphic utilities. GEOSPACE
handles spatial data in raster mode and has functionality for raster based analysis.

GEOSPACE Data Structure

The GEOSPACE package presently adopts the raster data structure.


GEOSPACE Functionalities

GEOSPACE comprises nine modules which facilitate database creation/editing,
integrated analysis, DTM and network analysis. The modules are as follows:

Input/Edit for the digitization of map features into the database. The vectors can
either be converted to raster or taken up for editing. Mosaicking of maps is also
possible to obtain single scaled outputs.

Manipulation module allows for the integration of raster overlays to obtain a
composite display on the screen.

Analyze module allows proximity analysis, buffer generation and logical
operations on raster overlays. The logical operations could be AND, OR, XOR
and NAND.
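Such logical operations are applied pixel by pixel to pairs of binary raster planes. A minimal sketch of the idea (illustrative only, not GEOSPACE code):

```python
def raster_logic(plane_a, plane_b, op):
    """Apply a pixel-wise logical operation (AND/OR/XOR/NAND) to two
    equally sized binary raster planes given as lists of rows of 0/1."""
    ops = {"AND":  lambda x, y: x & y,
           "OR":   lambda x, y: x | y,
           "XOR":  lambda x, y: x ^ y,
           "NAND": lambda x, y: 1 - (x & y)}
    f = ops[op]
    return [[f(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(plane_a, plane_b)]
```

AND yields cells present in both overlays (e.g. "forest AND steep slope"), OR cells present in either, and XOR/NAND their complements, which is the basis of most raster site-selection queries.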

GEONET is designed for the determination of shortest paths between source and
destination.
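Shortest-path determination of this kind is classically done with Dijkstra's algorithm on a weighted network. A generic sketch (not GEONET's implementation) over an adjacency-list graph:

```python
import heapq

def shortest_path(graph, source, dest):
    """Dijkstra's algorithm on a weighted adjacency dict
    {node: [(neighbour, cost), ...]}; returns (total_cost, path).
    Assumes dest is reachable and all costs are non-negative."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dest:
            break                       # settled the destination
        if d > dist.get(node, float("inf")):
            continue                    # stale heap entry
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    # Walk predecessors back from the destination to recover the route.
    path, node = [], dest
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return dist[dest], path[::-1]
```

In a network-analysis setting the nodes would be road junctions and the edge costs distances or travel times digitized from the map layers.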

GEO3D is for the generation of perspective views of the third dimension of the
spatial data, slope computation of the surface model, and also the draping of
overlays on to the perspective view.

Attribute/DB Manager mainly handles the attribute data of the spatial data and
also user defined attributes. Attribute tables can be defined, modified, updated,
deleted and accessed for data retrieval. The Output module allows for the display
of maps on the screen and also for obtaining print outputs on a printer and plot
outputs on a plotter.

GEOSPACE Platforms

GEOSPACE is available on PC platforms operating with the Xenix/Unix
operating system. Graphics is handled around the CGI core. Earlier versions were
developed on VAX-11/780 systems and these are also available with similar
functionalities.

GISNIC

The GISNIC package has been developed by the National Informatics Centre
(NIC) and can automate, manipulate, analyze and display geographic data in
digital form.

GISNIC Data Structure

GISNIC adopts the arc-node topological vector data model for representing
spatial and non-spatial datasets.

GISNIC Functionalities

GISNIC has nine modules as part of the GIS package. These are: DIGITIZ for
digitizing base maps, attribute generation, topology generation and export/import
operations; EDIT for interactive editing of spatial features; PLOT for interactive
map creation and display, query operations and map composition; PRINTMAP for
off-line generation of hardcopies of maps on printers; SHOWMAP for integrating
maps with existing applications; DUMP for obtaining graphic data dumps; GEN
for keyboard entry of features into a map; ANALYSIS for buffer generation,
overlay, dissolve, clip and other integration operations; and DXFGIS for
importing AutoCAD DXF files into the GIS.

GISNIC Platforms

GISNIC is available on Intel 80x86 systems with Xenix and SCO-UNIX
platforms. Graphics is handled around the CGI interface.

THEMAPS

THEMAPS, developed by the Systems Research Institute (SRI), Pune, is a DOS-
based package available in the market. It is based on a vector topology structure
and has interfaces to most DOS-based relational DBMS packages. Functionality
is oriented towards the adaptation of tabular data in spatial format and also
integrated analysis.

Conclusions

An overview of the major commercial GIS packages has been provided in this
section. While the list and the discussion are illustrative, it may be noted that there
are many other GIS packages available and being developed which are not
discussed in this section. This section also provides some details on the
functionalities and is oriented towards providing a basis for comparison of the
packages. Both commercial and in-house GIS package developments have been
covered. While efforts have been made to provide the latest information, it is
advisable to obtain more detailed and up-to-date information on the packages
discussed here.

BIBLIOGRAPHY

1. Aybet, J., 1990; 'Integrated mapping systems – data conversion and
integration', Mapping Awareness 4(6): 18-23.

2. Burrough, P.A., "Principles of Geographical Information Systems for Land
Resources Assessment".

3. Chrisman, N.R. (1984) – The role of quality information in the long-term
functioning of a geographical information system. Cartographic Journal 21:
79-87.

4. Cowen, D.J., 1988; 'GIS versus CAD versus DBMS: what are the differences?',
Photogrammetric Engineering and Remote Sensing 54: 1551-1555.

5. Dangermond, J., 1986; 'The software toolbox approach to meeting the user's
need for GIS analysis', Proceedings of the GIS Workshop, Atlanta, Georgia,
1-4 April 1986, pp. 66-75.

6. Exploring Spatial Analysis in Geographical Information Systems by Yue
Hong Chou.

7. Frolov, Y.S.; Maling, D.H. (1969) – The accuracy of area measurement by
point counting techniques. Cartographic Journal 6: 21-35.

8. Fundamentals of Spatial Information Systems by Robert Laurini & Derek
Thompson.

9. Geocarto International.

10. Geographical Information Systems, Volume 1, edited by Paul A. Longley,
Michael F. Goodchild, David J. Maguire & David W. Rhind.

11. George J. Klir & Bo Yuan – "Fuzzy Sets and Fuzzy Logic".

12. Goodchild, "Geographical Information Systems – Principles", Vol. 1.

13. Goodchild, M.F. (1978) – Statistical aspects of the polygon overlay problem.
In Harvard Papers on Geographic Information Systems (Ed. G. Dutton),
Vol. 6, Addison-Wesley, Reading.

14. ILWIS 2.1 User Guide, ILWIS Department, International Institute for
Aerospace Survey & Earth Sciences, October 1997.

15. International Journal of Geographical Information Science.

16. Ira Becker; "Using Information Technology: About Software", IBM Personal
Computer Handbook.

17. Jackson, M.J., Mason, D.C., 1986; 'The development of integrated geo-
information systems', IJRS 7: 723-740.

18. John C. Antenucci et al.; "Geographic Information Systems: A Guide to the
Technology", Van Nostrand Reinhold, New York, 1991.

19. Lanter, D.P. (1993) – A lineage inter-database optimisation; Cartography and
Geographic Information Systems, Vol. 20, No. 2.

20. Lefteri H. Tsoukalas & Robert E. Uhrig – "Fuzzy and Neural Approaches in
Engineering".

21. Magazines: GIS Asia Pacific, GIS @ Development, GIS India.

22. Maguire, D.J., 1991; 'An overview and definition of GIS', Geographical
Information Systems: Principles, Vol. 1, Longman, London, pp. 9-20.

23. Maling, D.H., 1973; Coordinate Systems and Map Projections, George
Philip, London.

24. Meijerink, A.M.J. (1994); Introduction to the Use of Geographic Information
Systems for Practical Hydrology, ITC & UNESCO Publication.

25. Narasimhan Tupil; "PCs to ride Net Boom", Express Computers, December 7,
1998.

26. Shepherd, I.D.H., 1991; 'Information Integration and GIS', Geographical
Information Systems: Principles, Longman Scientific & Technical
Publications Limited, UK.

27. Wilkinson, G.G., 1996; 'A review of current issues in the integration of GIS
and remote sensing data', IJGIS, Vol. 10, No. 1, pp. 85-101.

28. Mark, D.M. and Aronson, P.B. (1984) – Scale-dependent fractal dimensions
of topographic surfaces: an empirical investigation with applications in
geomorphology and computer mapping. Math. Geol. 16: 671-83.

29. McAlpine, J.R. and Cook, B.G. (1971) – Data reliability from map overlay.
In: Proc. Australian and New Zealand Association for the Advancement of
Science, 43rd Congress, Brisbane.

30. Burrough, P.A. (1986), Principles of Geographical Information Systems for
Land Resources Assessment. Oxford University Press.

31. Rao, Mukund (1993). 3-D GIS: Concepts, lecture notes prepared for the
training course on GIS for Resources Management and Development
Planning held at Space Applications Centre, Ahmedabad, 23 August –
11 September.

32. Clark, D.M., D.A. Hastings and J.J. Kineman, 1991. In GIS: Applications,
eds. David J. Maguire, Michael F. Goodchild and David W. Rhind, Vol. 2,
pp. 217-31, Longman Scientific & Technical, England.

33. Earth System Sciences Committee, 1988. Earth System Science: A Closer
View. NASA, Washington DC.

34. Global change research and geographic information systems requirements,
1997. In Integration of Geographic Information Systems and Remote Sensing,
eds. Star, J.L., J.E. Estes and K.C. McGwire, pp. 158-174.

35. IGBP Special Committee (1988) The International Geosphere-Biosphere
Programme: a study of global change – a plan for action. Report No. 4, IGBP
Secretariat, Stockholm.

36. NRIS: Node Design and Standards, 1997, SAC, Ahmedabad, India.

37. UNEP Global Resource Information Database (1988) Report on the meeting
of the GRID Scientific and Technical Management Advisory Committee,
January 1988.

38. RRSSC, 1993. GEOSPACE GIS – An indigenous effort. Lecture note in
Module 3 of the Training Course on GIS for Resources Management and
Development Planning held at Space Applications Centre, Ahmedabad, from
August 23, 1993 to September 11, 1993.

39. Vaish, Prabhat K., 1993 – GIS in NIC. Lecture note in Module 3 of the
Training Course on GIS for Resources Management and Development
Planning held at Space Applications Centre, Ahmedabad, from August 23,
1993 to September 11, 1993.
