
Data Model & Data Structures GIS Unit

The document discusses geographic data and its representation in Geographic Information Systems (GIS), highlighting the distinction between data and information. It explains the components of geographic information, the importance of layers to avoid redundancy, and the two main types of spatial data models: vector and raster. Additionally, it details the data modeling process and various vector data structures, including spaghetti and topological structures, as well as raster data structures.

Uploaded by Charles Muhoro
Copyright © All Rights Reserved

Geographic data

• Data is the fuel that drives GIS.


• Data is a set of values or elements used to represent something.
For instance, the string 502132N is data. We can interpret that data as being a
geographical reference, in which case it could be a latitude value, in particular
50°21′32″ North. If we interpret it as being a reference to an address (such as
someone's home) associated with a person, the information that we get is completely
different.
• Information is, therefore, the result of data and its interpretation, and in
many cases working with data means trying to extract from it all the
information that it might contain.
• Understanding the meaning of, and the differences between, data and information
allows us to understand, for instance, why the ratio between the size of a
given dataset and the amount of information it contains is not constant.
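To make the data/information distinction concrete, the sketch below applies one interpretation to the string 502132N: it reads it as a latitude written as degrees, minutes and seconds (DDMMSS) followed by a hemisphere letter, and converts it to decimal degrees. The DDMMSS layout and the function name `parse_dms_latitude` are assumptions for illustration; read as an address code, the same characters would yield entirely different information.

```python
import re


def parse_dms_latitude(code: str) -> float:
    """Interpret a string such as '502132N' as a DDMMSS latitude.

    Returns decimal degrees; southern latitudes are negative.
    """
    match = re.fullmatch(r"(\d{2})(\d{2})(\d{2})([NS])", code)
    if match is None:
        raise ValueError(f"not a DDMMSS latitude: {code!r}")
    degrees, minutes, seconds = (int(part) for part in match.groups()[:3])
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes or seconds out of range: {code!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return value if match.group(4) == "N" else -value


print(round(parse_dms_latitude("502132N"), 4))  # 50.3589
```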
Geographic data
• Geographical information has two separate components: spatial and
thematic.
• The spatial component contains the position, expressed in a given reference
system, and it answers the question where.
• The thematic component answers the question what, and it defines the
characteristics of the phenomenon or feature that occurs at the location
indicated by the spatial component.
• In a GIS, the information about a given study area is divided into several levels.
Even if it refers to the same location, the information about different variables
is stored separately, i.e. a set of different blocks of information exists for the
same area, each of them containing a particular variable or set of elements.
Each of these blocks is called a layer.
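A minimal sketch of how layers might be held in memory, assuming each feature pairs a spatial component (an x, y location) with a thematic component (a dictionary of attributes). The `Feature` class, the layer names, the attributes and the coordinates are all hypothetical; the point is that each layer stores one variable, so a question about one place has to visit every layer.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    location: tuple[float, float]  # spatial component: where
    attributes: dict = field(default_factory=dict)  # thematic component: what


# Each layer holds a single variable (type of feature) for the same study area.
layers = {
    "wells": [Feature((10.0, 20.0), {"depth_m": 45})],
    "schools": [Feature((12.5, 18.0), {"pupils": 300})],
}


def features_near(layers, point, radius):
    """Collect (layer name, feature) pairs within `radius` of `point`."""
    px, py = point
    hits = []
    for name, features in layers.items():
        for feature in features:
            fx, fy = feature.location
            if (fx - px) ** 2 + (fy - py) ** 2 <= radius ** 2:
                hits.append((name, feature))
    return hits


print([name for name, _ in features_near(layers, (10.0, 20.0), 5.0)])
# ['wells', 'schools']
```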
Geographic data
• Layers help avoid data redundancy, since each layer just contains information
about a particular variable or type of feature.
• A traditional map always contains a set of different variables, not just a single
one. Some of them are used to provide a general context, such as the names
of the main cities or the main roads, and these appear in most maps. In a GIS,
they exist independently, and the user can add and combine them with other
layers whenever needed.
• Geographical information can also be divided into layers using purely
spatial criteria, cutting it into smaller parts that each cover a smaller area. This is
similar to what happens in traditional cartography, which is divided into map sheets.
• Therefore, a key feature of a GIS is its ability to transparently integrate data
corresponding to different areas and create a seamless mosaic.
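As a simple illustration of the seamless mosaic, the sketch below joins two raster map sheets that share an edge. It assumes both sheets use the same cell size, are aligned row for row and do not overlap; real GIS software also has to handle reprojection, differing resolutions and overlapping edges. `mosaic_east_west` is a hypothetical helper name.

```python
def mosaic_east_west(west, east):
    """Join two raster sheets, given as lists of rows, that share an edge.

    The west sheet lies immediately to the left of the east sheet.
    """
    if len(west) != len(east):
        raise ValueError("sheets must have the same number of rows")
    return [west_row + east_row for west_row, east_row in zip(west, east)]


west_sheet = [[1, 1], [2, 2]]
east_sheet = [[3], [4]]
print(mosaic_east_west(west_sheet, east_sheet))  # [[1, 1, 3], [2, 2, 4]]
```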
Data model & data structures
• In order to visualize natural phenomena, one must first determine how to best
represent geographic space.
• A data model is a set of rules or constructs for describing and
representing selected aspects of the real world in a computer, i.e. features are
structured/ordered to facilitate representation in a GIS. It can also be defined as a
general description of a specific set of entities and the relationships between them.
• The purpose of a data model is to provide a formal means of representing information
and a formal means of manipulating that representation.
• Data structure: the structuring or ordering of features in a GIS to facilitate storage
or retrieval, i.e. it implements a specific data model.
Data model
• Data modeling starts from reality and involves three steps:
i. Reality
ii. Conceptual model: establishing a geographical model, that is, a conceptual
model of reality and its behavior.
iii. Logical model: establishing a representation model, that is, a way of coding
the conceptual model, reducing it to a finite set of elements.
iv. Physical model: establishing a storage model, that is, a storage strategy for
storing the elements of the representation model.
Data model
Data Models
• There are two broad categories of spatial (geographic) data models
encountered in GIS
1. vector data model, which represents real-world phenomena as points,
lines and areas/polygons
2. raster data model (grid model), which represents world phenomena as
cells of a predefined, grid-shaped tessellation
Data model
• Data modeling:
• For example, the steps for modeling spatial data about the height of the ground
above sea level (elevation) are:
i. Conceptual model: the real world must be described in terms of a data
model, for example the raster (grid) model.
ii. Logical model: a data structure must be chosen to represent the data
model, for example a run-length encoded data structure.
iii. Physical model: a file format must be selected that is suitable for the
data structure, for example a TIFF (*.tif) file.
Data model
Or
i. Conceptual model: the real world must be described in terms of a data
model, for example the vector model expressed as polygons bounded by
contour lines.
ii. Logical model: a data structure must be chosen to represent the data
model, for example data arranged in a topological structure.
iii. Physical model: a file format must be selected that is suitable for the
data structure, for example the DLG (Digital Line Graph) file format.
Data model
Vector model
• Uses points and their associated X, Y coordinate pairs to represent the
vertices of spatial features.
• Simple point, line and polygon entities are essentially static representations
of phenomena in terms of XY coordinates.
Data model
Vector model
• A point entity implies that the geographical extent of the object is limited
to a location that can be specified by one set of XY coordinates.
• Points are typically used to model singular, discrete features such as
buildings, wells, power poles, sample locations, and so forth.
• Points have only the property of location. Other types of point features
include the node and the vertex.
Data model & data structures
Vector model
• A line entity implies that the geographical extent of the object may be
adequately represented by sets of XY coordinate pairs that define a
connected path through space.
• Lines are used to represent linear features such as roads, streams, faults,
boundaries, and so forth. Lines have the property of length.
• Lines that directly connect two nodes are sometimes referred to as chains,
edges, segments, or arcs.
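Because a line is an ordered list of XY pairs, its length is the sum of the straight-line distances between consecutive vertices. This sketch assumes planar (projected) coordinates; lengths computed from latitude/longitude would need geodesic formulas instead. The road coordinates are invented.

```python
import math


def line_length(coords):
    """Length of a polyline given as [(x, y), ...] in planar units."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(line_length(road))  # 11.0
```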
Data model
Vector model
• A polygon is defined by a boundary consisting of one or more lines that
form a closed loop.
• In the case of polygons, the first coordinate pair (point) on the first line
segment is the same as the last coordinate pair on the last line segment.
• Polygons are used to represent features such as city boundaries, geologic
formations, lakes, soil associations, vegetation communities, and so forth.
• Polygons have the properties of area and perimeter. Polygons are also
called areas.
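The area and perimeter properties follow directly from the closed ring of coordinates. The sketch below uses the shoelace formula for area, which assumes planar coordinates and a simple (non-self-intersecting) polygon without holes; the lake coordinates are invented.

```python
import math


def polygon_area_perimeter(ring):
    """Area (shoelace formula) and perimeter of a closed ring of (x, y) pairs.

    The first coordinate pair must be repeated as the last one.
    """
    if ring[0] != ring[-1]:
        raise ValueError("ring must be closed (first point == last point)")
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1  # signed cross product of each edge
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2, perimeter


lake = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
print(polygon_area_perimeter(lake))  # (12.0, 14.0)
```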
Data model
Vector model
• Elements can be represented using different types of primitives. For
instance, a city can be represented as a single point or as a polygon
with its perimeter.
• Using one or another geometry should depend on the type of
phenomenon that we want to model or the level of detail that is
needed, among other factors.
• The thematic component in the vector model is defined using
attributes. A layer usually contains multiple attributes. Attributes are
associated with features and can hold information of any type.
• Topology is a key element of a vector layer: it describes the spatial
relationships between its features.
Data model
Vector model
• For example, a roads layer without topology (a) and with topology (b).
Circles in the latter case indicate connections between roads.
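Without topology, a roads layer is just a collection of lines, and the fact that two roads meet is not stored anywhere. The sketch below shows one simple way to recover the connection points (the circles in case b) by matching road endpoints exactly. It is an illustration under that assumption only: a topological GIS stores the nodes explicitly instead of inferring them, and real data usually needs a snapping tolerance. Road ids and coordinates are hypothetical.

```python
from collections import defaultdict


def road_connections(roads):
    """Map each shared endpoint (node) to the sorted ids of roads meeting there.

    `roads` maps a road id to its list of (x, y) vertices. Only endpoints are
    treated as potential nodes, and coordinates must match exactly.
    """
    roads_at = defaultdict(set)
    for road_id, coords in roads.items():
        roads_at[coords[0]].add(road_id)
        roads_at[coords[-1]].add(road_id)
    # Keep only points where two or more roads meet.
    return {node: sorted(ids) for node, ids in roads_at.items() if len(ids) > 1}


roads = {
    "A": [(0, 0), (5, 0)],
    "B": [(5, 0), (5, 5)],
    "C": [(5, 0), (9, 1)],
    "D": [(0, 5), (2, 7)],
}
print(road_connections(roads))  # {(5, 0): ['A', 'B', 'C']}
```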
Data model
Raster model
• The raster data model is widely used in applications ranging far beyond
geographic information systems (GISs).
• Most likely, you are already very familiar with this data model if you have
any experience with digital photographs. The universal JPEG, BMP, and
TIFF file formats (among others) are based on the raster data model.
• Figure: pixelation of a digital picture.
Data model
Raster model
• It is based on a systematic division of space. The whole space is characterized
by a set of elements that cover it, each of them with an associated value.
• The most common raster model is based on a grid of square cells, or
sometimes rectangular ones.
• Knowing the orientation of the grid, the size of the cells (which is the same for
all of them), and the coordinates of at least one of them, it is possible to know
the location of all cells, thanks to the grid's regular structure.
• The cell size is a parameter related to the scale of the layer, since it defines its
resolution and depends on the level of detail used when the corresponding
measures were taken.
• Cells in a raster grid with their associated values.
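The regularity of the grid means cell positions can be computed rather than stored. This sketch assumes a north-up grid with no rotation whose upper-left corner is at (`origin_x`, `origin_y`), with rows counted southwards and columns eastwards; the origin and the 30 m cell size in the example are arbitrary values chosen for illustration.

```python
def cell_center(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of the centre of cell (row, col)."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size  # rows increase southwards
    return x, y


def cell_of_point(x, y, origin_x, origin_y, cell_size):
    """Row and column of the cell containing the point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


ORIGIN_X, ORIGIN_Y, CELL = 500000.0, 9900000.0, 30.0
print(cell_center(2, 3, ORIGIN_X, ORIGIN_Y, CELL))  # (500105.0, 9899925.0)
print(cell_of_point(500105.0, 9899925.0, ORIGIN_X, ORIGIN_Y, CELL))  # (2, 3)
```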
Data model & data structures
Raster model
• The raster data model consists of rows and columns of equally sized pixels
interconnected to form a planar surface.
• It regards space as a grid of cells, each of which is associated with the
phenomenon that occupies it.
• The 2D geometric surface is divided into square cells (known as pixels) whose
size is determined by the resolution that is required to represent the variation
of an attribute for a given purpose.
• Each grid cell may be thought of as a separate entity that differs from vector
polygons only in terms of its regular form.
• A vector point can be represented by a single cell, a vector ‘line’ by a set of
contiguous cells one cell wide having the same attribute value, and a vector
polygon by a set of contiguous cells having the same attribute value.
• These pixels are used as building blocks for creating points, lines, areas,
networks, and surfaces.
• Although pixels may be triangles, hexagons, or even octagons, square pixels
represent the simplest geometric form with which to work.
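Turning a vector polygon into a set of contiguous cells is called rasterization. One common rule, used in the sketch below, assigns a cell to the polygon when the cell's centre lies inside it, tested with the even-odd (ray casting) rule; other rules (any overlap, largest share of area) give slightly different results. The grid is assumed north-up with its lower-left corner at (0, 0), and the triangle is invented.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test; `ring` is a closed list of (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(ring, n_rows, n_cols, cell_size=1.0, value=1):
    """Give `value` to every cell whose centre falls inside the polygon.

    Row 0 is the top row; the grid's lower-left corner is at (0, 0).
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            cx = (c + 0.5) * cell_size
            cy = (n_rows - r - 0.5) * cell_size
            if point_in_polygon(cx, cy, ring):
                grid[r][c] = value
    return grid


triangle = [(0.0, 0.0), (4.5, 0.0), (0.0, 4.5), (0.0, 0.0)]
for row in rasterize_polygon(triangle, 4, 4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```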
Data model
Raster model
• The number of values stored for each cell defines the number of bands in a
raster layer.
• A band contains a single value for each cell.
• Hence, a raster layer with more than one band can be seen as a set of sublayers,
all of them having the same spatial structure (extent and cell size), wrapped as a
single layer.
• We can find a clear example of that in digital color images. A digital image is
composed of a grid of values (called pixels), each of them with an associated
color. In the most common case, that color is expressed with three values,
corresponding to the intensity of colors red, green and blue, which, when
combined, give the pixel color. That is, an image like that is a raster layer with
three bands, each of them containing one of the red, green and blue
components.
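A multiband raster can be sketched as one grid per band, all with the same number of rows and columns. The tiny 2 x 2 image below is invented; reading the red, green and blue values at the same cell gives that pixel's colour.

```python
# A 2 x 2 colour image stored as three bands with identical structure.
red = [[255, 0], [0, 128]]
green = [[0, 255], [0, 128]]
blue = [[0, 0], [255, 128]]
image = {"red": red, "green": green, "blue": blue}

# Every band must share the same spatial structure (rows x columns).
assert len({(len(band), len(band[0])) for band in image.values()}) == 1


def pixel_colour(layer, row, col):
    """Combine the value of each band at one cell into an (R, G, B) tuple."""
    return tuple(layer[band][row][col] for band in ("red", "green", "blue"))


print(len(image))                 # 3 bands
print(pixel_colour(image, 0, 0))  # (255, 0, 0): pure red
print(pixel_colour(image, 1, 1))  # (128, 128, 128): grey
```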
Data structures
Vector Data Structures
• Uses points, lines or polygons to describe geographical phenomena
• Vector units are characterized by the fact that their geographical location
may be independently and precisely defined, as may be their topological
relationships.
• A spatial phenomenon is modeled in terms of its geographical location
and attributes. e.g.
“An oil well could be represented by a point unit consisting of a single XY
coordinate pair and the label ‘oil well’; a section of oil pipeline could be
represented by a line unit consisting of a starting XY coordinate and an end XY
coordinate and the label ‘oil pipeline’; an oil refinery could be represented by a
polygon unit covering a set of XY coordinates and the label ‘oil refinery’.”
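The oil example can be written down almost literally as data. In the sketch below a vector unit is a geometry type, a list of XY pairs and a label; the class name `VectorUnit`, its validation rules and all coordinates are assumptions for illustration.

```python
from dataclasses import dataclass

Coord = tuple[float, float]


@dataclass
class VectorUnit:
    kind: str            # "point", "line" or "polygon"
    coords: list[Coord]  # geographical location
    label: str           # attribute

    def __post_init__(self):
        # Hypothetical checks matching the description of each unit type.
        if self.kind == "point" and len(self.coords) != 1:
            raise ValueError("a point unit has exactly one XY pair")
        if self.kind == "line" and len(self.coords) < 2:
            raise ValueError("a line unit needs a start and an end XY pair")
        if self.kind == "polygon" and (
            len(self.coords) < 4 or self.coords[0] != self.coords[-1]
        ):
            raise ValueError("a polygon unit must be a closed ring")


oil_well = VectorUnit("point", [(2.0, 3.0)], "oil well")
pipeline = VectorUnit("line", [(2.0, 3.0), (8.0, 3.0)], "oil pipeline")
refinery = VectorUnit(
    "polygon",
    [(8.0, 2.0), (10.0, 2.0), (10.0, 4.0), (8.0, 4.0), (8.0, 2.0)],
    "oil refinery",
)
```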
Data structures
Types of Vector Data Structures
1. Spaghetti data structure
• Simplest vector data structure
• The map area is assumed to be a continuous coordinate space translated into
a list of x, y coordinates. Positions can be defined as precisely as desired.
• Features are represented by points, lines and polygons.
• Each point, line, and/or polygon feature is represented as a string of X, Y
coordinate pairs with no inherent structure.
• One could envision each line in this model as a single strand of spaghetti
that is formed into complex shapes by the addition of more and more strands
of spaghetti.
• A point is encoded as a single x, y coordinate, whereas an area is represented by a
polygon and is recorded as a closed loop of x, y coordinates that define its
boundary.
• The common boundary between adjacent polygons must be recorded twice,
once for each polygon.
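The redundancy of the spaghetti structure is easy to see with two adjacent parcels stored as independent rings. In the sketch below (coordinates invented), nothing records that the parcels touch; the shared boundary only shows up because the same coordinates were entered into both rings.

```python
# Two adjacent parcels, each stored as its own closed ring of x, y pairs.
parcel_a = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
parcel_b = [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)]


def boundary_segments(ring):
    """Undirected boundary segments of a closed ring."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


# The common boundary is recorded once in each ring, i.e. twice in total.
shared = boundary_segments(parcel_a) & boundary_segments(parcel_b)
print(len(shared))                            # 1 shared segment
print(sorted(set(parcel_a) & set(parcel_b)))  # [(2, 0), (2, 2)]
print(len(parcel_a) + len(parcel_b))          # 10 coordinate pairs stored
```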
Data structures
Types of Vector Data Structures
1. Spaghetti data structure
• The data structure is the data file of x, y coordinates, which is the form in
which spatial data is stored in the computer.
• Although all the spatial features are recorded, the spatial relationships
between these features are not encoded, e.g. which polygons are adjacent to
each polygon is not recorded.
• Advantages:
i. The model is efficient for digitally reproducing maps, as information unnecessary to the
plotting process, such as spatial relationships, is not stored.
ii. The structure of this model is simple and easy to understand, as the model is a map
expressed in Cartesian coordinates.
• Disadvantages:
i. The main technical problem with this model is that there are no spatial relationships,
which makes analysis difficult. The lack of topological information is
problematic if the user attempts to make measurements or analyses.
ii. A large storage area is also required, as the same information is recorded twice in cases
where features share boundaries. This creates redundancies within the data model
and therefore reduces efficiency.
• CADD (computer-aided design and drafting) systems use the spaghetti model.
Data structures
Types of Vector Data Structures
2. Topological data structure
• It is characterized by the inclusion of topological information within the dataset,
as the name implies.
• Topology is a set of rules that model the relationships between neighboring
points, lines, and polygons and determine how they share geometry.
• Nodes (isolated points) and vertices (points linked to form a line) are used.
• A node is an intersection point where two or more arcs meet, or the beginning
or end of a line.
• The beginning of the line is a special vertex called the start node and the end a
special vertex called an end node.
• A chain/arc/edge is a series of points joined by straight line segments that
start and end at a node.
• In the arc-node topological data model, the basic logical entities are the arc and
the node.
• A polygon is comprised of a closed chain of arcs that represent the boundary
of an area.
Data structures
Types of Vector Data Structures
2. Topological data structure
• Three basic topological precepts are necessary to understand the
topological data model:
i. Connectivity: describes the arc-node topology for the feature dataset.
- In the topological data model, nodes are the intersection points where two or more
arcs meet. In the case of arc-node topology, arcs have both a from-node (i.e.,
starting node) indicating where the arc begins and a to-node (i.e., ending node)
indicating where the arc ends.
- In addition, between each node pair is a line segment, sometimes called a link,
which has its own identification number and references both its from-node and
to-node.
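An arc table with from-node and to-node columns is enough to answer connectivity questions such as which arcs meet at a given node. The arc ids, node ids and vertices below are hypothetical, and only the connectivity part of a full arc-node structure is sketched.

```python
from collections import defaultdict

# Hypothetical arc table: arc id -> (from-node, to-node, intermediate vertices).
arcs = {
    1: ("N1", "N2", [(1.0, 1.0)]),
    2: ("N2", "N3", []),
    3: ("N2", "N4", [(4.0, 2.0), (5.0, 2.5)]),
}


def arcs_at_nodes(arc_table):
    """Invert the arc table: for each node, the arcs that start or end there."""
    at_node = defaultdict(list)
    for arc_id, (from_node, to_node, _vertices) in arc_table.items():
        at_node[from_node].append(arc_id)
        at_node[to_node].append(arc_id)
    return dict(at_node)


print(arcs_at_nodes(arcs)["N2"])  # [1, 2, 3]: three arcs meet at N2
```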
Data structures
Types of Vector Data Structures
2. Topological data structure
• Three basic topological precepts are necessary to understand the
topological data model:
ii. Area definition: states that arcs that connect to surround an area define
a polygon; this is also called polygon-arc topology.
- In the case of polygon-arc topology, arcs are used to construct polygons, and each
arc is stored only once.
- This results in a reduction in the amount of data stored and ensures that adjacent
polygon boundaries do not overlap.
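In polygon-arc topology a polygon is a list of arc references rather than a list of coordinates. The sketch below (arc ids, polygon ids and coordinates all hypothetical) stores the shared boundary arc `b` once and rebuilds each polygon's ring by chaining its arcs, reversing an arc where the polygon traverses it against its stored direction.

```python
# Hypothetical arc table: the vertices of every arc are stored exactly once.
arcs = {
    "a": [(0, 0), (2, 0)],
    "b": [(2, 0), (2, 2)],  # boundary shared by P1 and P2
    "c": [(2, 2), (0, 2), (0, 0)],
    "d": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# Each polygon lists its arcs and whether to follow each arc in its stored
# direction (True) or reversed (False).
polygons = {
    "P1": [("a", True), ("b", True), ("c", True)],
    "P2": [("d", True), ("b", False)],
}


def polygon_ring(poly_id, polygons, arcs):
    """Rebuild a polygon's closed coordinate ring by chaining its arcs."""
    ring = []
    for arc_id, forward in polygons[poly_id]:
        coords = arcs[arc_id] if forward else arcs[arc_id][::-1]
        if ring:
            if coords[0] != ring[-1]:
                raise ValueError(f"arc {arc_id} does not continue {poly_id}")
            coords = coords[1:]  # skip the vertex already added
        ring.extend(coords)
    if ring[0] != ring[-1]:
        raise ValueError(f"arcs of {poly_id} do not close")
    return ring


print(polygon_ring("P2", polygons, arcs))
# [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)]
```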
Data structures
Types of Vector Data Structures
2. Topological data structure
• Three basic topological precepts are necessary to understand the
topological data model:
iii. Contiguity: is based on the concept that polygons that share a boundary are
deemed adjacent.
- Specifically, polygon topology requires that all arcs in a polygon have a direction (a
from-node and a to-node), which allows adjacency information to be determined.
- Polygons that share an arc are deemed adjacent, or contiguous, and therefore the
“left” and “right” side of each arc can be defined.
- This left and right polygon information is stored explicitly within the attribute
information of the topological data model.
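Once every arc carries its left and right polygon, adjacency can be read straight from the arc table without any geometry. In the hypothetical table below, polygon A is the triangle N1-N2-N3, polygon B is N2-N4-N3, and "O" stands for the outside area; only arc 2 separates two real polygons.

```python
# Hypothetical arc attribute table with left and right polygons.
arc_table = [
    {"arc": 1, "from": "N1", "to": "N2", "left": "O", "right": "A"},
    {"arc": 2, "from": "N2", "to": "N3", "left": "B", "right": "A"},
    {"arc": 3, "from": "N3", "to": "N1", "left": "O", "right": "A"},
    {"arc": 4, "from": "N2", "to": "N4", "left": "O", "right": "B"},
    {"arc": 5, "from": "N4", "to": "N3", "left": "O", "right": "B"},
]


def adjacency(table, outside="O"):
    """Polygons sharing an arc are contiguous: read that off left/right."""
    neighbours = {}
    for row in table:
        left, right = row["left"], row["right"]
        if outside in (left, right):
            continue  # arc lies on the outer boundary
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    return neighbours


print(adjacency(arc_table))  # {'B': {'A'}, 'A': {'B'}}
```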
Data structures
Types of Vector Data Structures
Other vector data structures include the Geographic Base File (GBF), Dual
Independent Map Encoding (DIME), the Triangulated Irregular Network (TIN) and
Polygon Converter (POLYVRT).

Assignment 2:
Find out how the following vector data structures work:
i. Geographic Base File (GBF)
ii. Triangulated Irregular Network (TIN)
iii. Polygon Converter (POLYVRT)
Data structures
Raster Data Structures
• In its simplest form, the raster data model is represented by a regular grid
of squares or rectangular cells, where the row and column numbers define
the location of each pixel/or cell. Each cell in the raster model is assigned a
value.
• The value assigned to the cell indicates the value of the attribute it
represents.
• The total number of values to be stored is the product of the total number
of rows and columns. For this reason, large storage files are required to
store this data.
• Data compression is applied to minimize the storage space required for
these vast amounts of data, hence the need for efficient data
structures.
Data structures
Types of Raster Data Structures
1. Full raster coding / cell by cell encoding
• The raster model is stored as a matrix. The cell values are written into a
file by row and column.
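Full raster coding writes every cell value, row by row. The sketch below uses a hypothetical plain-text layout (a header line with the number of rows and columns, then one line per row) to show that the file always holds rows x columns values, however repetitive they are.

```python
def encode_full_raster(grid):
    """Cell-by-cell encoding: header line, then every value row by row."""
    n_rows, n_cols = len(grid), len(grid[0])
    lines = [f"{n_rows} {n_cols}"]
    lines += [" ".join(str(value) for value in row) for row in grid]
    return "\n".join(lines)


def decode_full_raster(text):
    """Read the layout produced by encode_full_raster back into a matrix."""
    header, *rows = text.splitlines()
    n_rows, n_cols = map(int, header.split())
    grid = [[int(value) for value in row.split()] for row in rows]
    if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
        raise ValueError("file does not match its header")
    return grid


grid = [[1, 1, 2], [1, 3, 3]]
print(encode_full_raster(grid))
# 2 3
# 1 1 2
# 1 3 3
```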
Data structures
Types of Raster Data Structures
2. Run length encoding
• Here instead of repeating pixel values, the raster cells are coded as pairs of
numbers - (run length, value).
• It is a widely used compression technique for raster data.
• The primary data elements are pairs of values consisting of a pixel value
and a repetition count, which specifies the number of pixels in the run.
• Data are built by reading successively row by row through the raster,
creating a new entry every time the pixel value changes or the end of the
row is reached.
• If all the cells in a row contain the same value, only one entry is recorded,
thus saving computer memory.
• The method is useful in situations where large groups of neighboring pixels
have the same value (e.g., discrete datasets such as land use/land cover or
habitat suitability).
• It is less useful where neighboring pixel values vary widely (e.g.,
continuous datasets such as elevation or sea-surface temperatures).
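The row-by-row procedure described above can be sketched directly, using the (run length, value) order from the first bullet. The land-use grid is invented; its 18 cells compress to 5 pairs because neighboring cells repeat, whereas a row of all-different values would gain nothing.

```python
def rle_encode(grid):
    """Encode each row as a list of (run length, value) pairs.

    A new pair starts whenever the value changes or a row ends.
    """
    encoded = []
    for row in grid:
        runs = []
        for value in row:
            if runs and runs[-1][1] == value:
                runs[-1][0] += 1          # extend the current run
            else:
                runs.append([1, value])   # start a new run
        encoded.append([tuple(run) for run in runs])
    return encoded


def rle_decode(encoded):
    """Expand (run length, value) pairs back into full rows."""
    return [[value for length, value in runs for _ in range(length)]
            for runs in encoded]


land_use = [
    [1, 1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [3, 3, 3, 3, 3, 3],
]
print(rle_encode(land_use))
# [[(4, 1), (2, 2)], [(3, 1), (3, 2)], [(6, 3)]]
```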
Data structures
Types of Raster Data Structures
3. Chain coding
• Data reduction is achieved by defining the boundary of a region as a
series of cardinal directions and cell counts.
• It is more a data compaction technique than a data model.
• This technique names a particular direction and the number of times the
direction is repeated.

• File structure
4,3 1

N,1 E,4 S,1 E,1 S,1 W,1 S,2 W,1 S,1 W,1 N,3 W,1 N,1 W,1

• The first line in the file structure gives the position of the cell at which the
chain coding started. The value 1 indicates that there is only one chain.
• On the second line, each pair consists of a letter giving the direction,
followed by the number of cells lying in that direction.
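The sketch below reads the file structure shown above and follows the chain cell by cell. It assumes the start position 4,3 is an (x, y) pair, that N increases y and that E increases x; the slides do not state the convention. Because the boundary is closed, the walk should end where it started.

```python
# Unit step for each cardinal direction, assuming N = +y and E = +x.
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

CHAIN_FILE = """4,3 1

N,1 E,4 S,1 E,1 S,1 W,1 S,2 W,1 S,1 W,1 N,3 W,1 N,1 W,1"""


def parse_chain_file(text):
    """Return (start cell, number of chains, [(direction, count), ...])."""
    first, second = [line for line in text.splitlines() if line.strip()]
    start_text, n_chains = first.split()
    start = tuple(int(v) for v in start_text.split(","))
    moves = []
    for token in second.split():
        direction, count = token.split(",")
        moves.append((direction, int(count)))
    return start, int(n_chains), moves


def trace_chain(start, moves):
    """Follow the chain code and return every cell visited, in order."""
    x, y = start
    path = [(x, y)]
    for direction, count in moves:
        dx, dy = STEP[direction]
        for _ in range(count):
            x, y = x + dx, y + dy
            path.append((x, y))
    return path


start, n_chains, moves = parse_chain_file(CHAIN_FILE)
path = trace_chain(start, moves)
print(len(path) - 1)        # 20 unit steps
print(path[-1] == path[0])  # True: the boundary closes
```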
Data structures
Types of Raster Data Structures
4. Block coding
• It is a generalization of run-length encoding to two dimensions. It uses
square blocks to represent the region.
• For each square block, the position of the block and its size (number of cells) are stored.

Block size (cells)   No. of blocks   Cell coordinates
9                    1               6,2
4                    2               4,2  7,5
1                    7               9,3  5,4  9,4  6,5  6,6  6,7  7,7
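The table can be expanded back into individual cells to check what it encodes. The slides do not say which cell of a block the stored coordinate refers to, so the sketch assumes it is the corner cell and that the block extends towards increasing coordinates; under that assumption the ten block entries cover 24 distinct cells with no overlaps.

```python
import math

# Block codes from the table: block size (cells) -> block positions.
block_codes = {
    9: [(6, 2)],
    4: [(4, 2), (7, 5)],
    1: [(9, 3), (5, 4), (9, 4), (6, 5), (6, 6), (6, 7), (7, 7)],
}


def expand_blocks(codes):
    """Return the set of cells covered by the square blocks.

    Assumes each stored coordinate is the corner cell of its block and that
    the block extends in the direction of increasing coordinates.
    """
    cells = set()
    for size, positions in codes.items():
        side = math.isqrt(size)  # a 9-cell block is 3 x 3
        for x0, y0 in positions:
            for dx in range(side):
                for dy in range(side):
                    cells.add((x0 + dx, y0 + dy))
    return cells


region = expand_blocks(block_codes)
print(len(region))  # 24 cells described by 10 block entries
```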
Data structures
Types of Raster Data Structures
5. Quad tree
• Quad tree coding stores the information by subdividing a square region
into quadrants, each of which may be further subdivided into squares until
all the cells in each square have the same value.
• A quadrant that does not need to be subdivided further is called a “leaf node.”
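The recursive subdivision can be sketched in a few lines. This version assumes a square grid whose side is a power of two and scalar cell values; a homogeneous quadrant becomes a leaf holding its value, otherwise the node is a list of four children in NW, NE, SW, SE order (the child ordering is a choice made for this sketch). The 4 x 4 example grid is invented.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively split a square raster into quadrants.

    A quadrant whose cells all share one value becomes a leaf holding that
    value; otherwise it becomes a list of four children (NW, NE, SW, SE).
    """
    if size is None:
        size = len(grid)
        if size & (size - 1) or any(len(r) != size for r in grid):
            raise ValueError("grid must be square with a power-of-two side")
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # leaf node
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]


def count_leaves(node):
    """Number of leaf nodes, i.e. the values the quadtree actually stores."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
tree = build_quadtree(grid)
print(tree)                # [1, [0, 0, 0, 1], 0, 0]
print(count_leaves(tree))  # 7 leaves instead of 16 cells
```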
Data structures
Format of Data Structures
• Formats used for data storage and exchange for vector data include:
Vector (object oriented)
DXF    Drawing Exchange Format (AutoCAD)
DGN    DesiGN file (MicroStation)
HPGL   Hewlett-Packard Graphics Language
EPS    Encapsulated PostScript
PS     PostScript

• Formats used for data storage and exchange for raster data include:
Bit-mapped image (raster)
TIF    Tagged Image File Format
BMP    Windows bitmap
GIF    Graphics Interchange Format
JPG    Joint Photographic Experts Group
Data structures
Advantages and disadvantages of vector and raster data
Vector data
Advantages
• Vector data models tend to be better representations of reality due to the
accuracy and precision of points, lines, and polygons over the regularly
spaced grid cells of the raster model.
• Vector data provide an increased ability to alter the scale of observation and analysis,
as each coordinate pair associated with a point, line, and polygon
represents an infinitesimally exact location.
• Vector data tend to be more compact in data structure, so file sizes are
typically much smaller than their raster counterparts.
• Topology is inherent in the vector model. This topological information
results in simplified spatial analysis (e.g., error detection, network
analysis, proximity analysis, and spatial transformation) when using a
vector model.
Data structures
Advantages and disadvantages of vector and raster data
Vector data
Disadvantages
• The data structure tends to be much more complex than the simple raster
data model. As the location of each vertex must be stored explicitly in the
model, there are no shortcuts for storing data like there are for raster
models.
• The implementation of spatial analysis can also be relatively complicated
due to minor differences in accuracy and precision between the input
datasets.
• The algorithms for manipulating and analyzing vector data are complex
and can lead to intensive processing requirements, particularly when
dealing with large datasets.
Data structures
Advantages and disadvantages of vector and raster data
Vector data

Advantages
• Data can be represented at its original resolution and form without generalization.
• Since most data, e.g. hard copy maps, is in vector form, no data conversion is
required.
• Accurate geographic location of data is maintained.
Disadvantages
• The location of each vertex needs to be stored explicitly.
• Continuous data, such as elevation data, is not effectively represented in vector
form. Usually substantial data generalization or interpolation is required for these
data layers.
• Spatial analysis within basic units such as polygons is impossible without extra
data because they are considered to be internally homogeneous.
Data structures
Advantages and disadvantages of vector and raster data
Raster data
Advantages
• The technology required to create raster graphics is inexpensive and
universal.
• Another advantage of raster graphics is the relative simplicity of the underlying
data structure. Each grid location represented in the raster image correlates to a single
value (or series of values if attribute tables are included).
Data structures
Advantages and disadvantages of vector and raster data
Raster data
Disadvantages
• Raster files are typically very large: particularly in the case of raster images
built from the cell-by-cell encoding methodology, the sheer number of
values stored for a given dataset results in potentially enormous files.
• The output images are less “pretty” than their vector counterparts. This is
particularly noticeable when the raster images are enlarged or zoomed.
• The geometric transformations that arise during map reprojection efforts
can cause problems for raster graphics.
• The raster data model is not suitable for some types of
spatial analysis. For example, difficulties arise when attempting to overlay
and analyze multiple raster graphics produced at differing scales and pixel
resolutions.
Data structures
Advantages and disadvantages of vector and raster data
Raster data
Advantages
• The geographic location of each cell is implied by its position in the cell matrix.
Accordingly, other than an origin point, e.g. the bottom left corner, no geographic
coordinates are stored.
• Continuous data, e.g. elevation data, is accommodated well.
• Mathematical modeling is easy because all spatial entities have a simple, regular
shape.
Disadvantages
• It is difficult to adequately represent linear features, as their boundaries are not
well delineated.
• Cells suffer from the mixed pixel problem.
• Using large grid cells to reduce data volumes reduces spatial resolution, causing loss of
information and an inability to recognize phenomenologically defined structures (the
cell size determines the resolution at which the data is represented).
