NEIGHBORHOOD MODELS:
AN ALTERNATIVE FOR
THE MODELING OF SPATIAL STRUCTURES
Maria del Carmen Reyes-Guerrero
Licenciatura, Universidad Nacional Autónoma de México, 1974
[Link]., Universidad Autónoma Metropolitana, 1981
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
by
Special Arrangements
Maria del Carmen Reyes-Guerrero, 1986
January, 1986
All rights reserved. This work may not be
reproduced in whole or in part, by photocopy
or other means, without permission of the author.
APPROVAL
Name: Maria del Carmen Reyes-Guerrero
Degree: Doctor of Philosophy
Title of Thesis: Neighborhood Models: An Alternative for the
Modeling of Spatial Structures
Examining Committee:
Chairman: Hutchinson
T.K. Poiker
Senior Supervisor
P.L. Brantingham
B.K. Bhattacharya
D.M. Eaves
W.G. Gill
R.L. Morrill
Professor
External Examiner
Department of Geography
University of Washington
Date Approved:
PARTIAL COPYRIGHT LICENSE
I hereby grant to Simon Fraser University the right to lend
my thesis, project or extended essay (the title of which is shown below)
to users of the Simon Fraser University Library, and to make partial or
single copies only for such users or in response to a request from the
library of any other university, or other educational institution, on
its own behalf or for one of its users.
I further agree that permission
for multiple copying of this work for scholarly purposes may be granted
by me or the Dean of Graduate Studies.
It is understood that copying
or publication of this work for financial gain shall not be allowed
without my written permission.
Title of Thesis/Project/Extended Essay
Neighborhood Models: An Alternative for the Modeling of
Spatial Structures
Author:
(signature)
Maria del Carmen Reyes-Guerrero
(date)
Abstract
In the last three decades there has been a widespread use of
quantitative models in geography. The majority of models have
been applied for descriptive, predictive and
hypothesis-testing purposes. Quantitative geography is at a
stage where the benefits and limitations of most of these
models have been tested. Consequently, there has been a
tendency to seek models considered to be more "appropriate"
for geographical analysis. This work examines some of the
most commonly used mathematical tools in human geography and
presents a new family of models as an alternative for the
mathematical representation of spatial structures.
The proposed models are based on the notion of "geographical
neighborhood" and are named neighborhood models. In the
formalization process mathematical concepts such as space and
subspace are used to model the notion of "neighborhood"; two
quasi-mathematical structures (geo-spaces and geo-subspaces)
are defined as an aid in this modeling.
As a first step in the construction of neighborhood models a
set of measures of local variation (heterogeneity indices) is
introduced. The role played by these indices from a
mathematically formal point of view is fundamental, since
through them it is possible to combine and benefit from two
mathematical areas of knowledge: topology and fuzzy set
theory.
In the second part of the thesis the indices are applied to
two geographical problems: 1) the design of classification
algorithms with regionalization purposes; 2) an aid in the
selection of sites for the allocation of resources in an
educational planning environment. In the first case the index
is used to define the degree of membership of an element to
the interior (or border) of a region through the topological
concept of neighborhood and that of fuzzy set. In the second
application several indices are used to describe spatial and
temporal characteristics of the demand for schools in an area
in central Mexico.
Finally, some future areas of research are proposed. Although
these ideas concerning neighborhood models have not been
completely explored and developed, it is clear that the
approach provides a fruitful avenue for research.
To Rodolfo, Fito and Pablo,
Chata and Chato,
Maru, Ara, and Paco.
To my friends.
This thesis was written under the senior supervision of Thomas
K. Poiker from whom I received suggestions and assistance as
well as support and encouragement.
Discussion of the thesis
with Pat Brantingham (supervisor) was also of assistance and
her support and enthusiasm were of great help for the
completion of this work.
Binai Bhattacharya (supervisor) made
valuable comments on the algorithmic aspects as well as on the
general format.
I have also benefitted from discussions with
David Eaves (supervisor) as well as from Armando Bayona and
Rigoberto Quintero (Dirección General de Geografía, México).
In several aspects related to the Mexican educational
planning system I was kindly assisted by
Maria Eugenia Reyes (SEP, México).
David Howard (SEP, México) read the
preliminary versions and made valuable suggestions regarding
the use of the English language.
Dr. Enrique Calderón (UNAM,
México) is gratefully acknowledged for his continuous support
during the Ph.D. program.
Financial assistance for the first four years was given by the
Mexican Government (CONACYT). Later I was awarded a
one-semester S.F.U. stipend, and in the last stage of the
completion of this thesis I received assistance from the
Dirección General de Geografía (México); Néstor Duch is kindly
acknowledged for this support.
TABLE OF CONTENTS
Approval
Abstract
Dedication
Acknowledgments
CHAPTER 1. Introduction
1.1 Mathematical Models
1.2 Quantitative Geography
1.2.1 Galton's Problem
1.2.2 Neighborhoods
CHAPTER 2. Mathematical Models in Human Geography
2.1 Classical Models
2.1.1 Factor Models
2.1.2 Gravity Models
2.1.3 Networks
2.2 The Neighborhood Approach
2.2.1 Geographical and Topological Neighborhoods
2.2.2 Spatial Autocorrelation
2.2.3 Geostatistics
2.2.4 Topological Data Structures
2.3 Discussion
CHAPTER 3. Neighborhood Models
3.1 A General Framework
3.1.1 Intuitive Ideas
3.1.2 Spaces and Subspaces
3.1.3 Geo-spaces
3.1.4 Geo-subspaces
3.2 Definition of Neighborhood Model
3.3 The Heterogeneity Index
3.3.1 Formal Definition
3.3.2 Interpretations
3.4 Fuzzy Topology
3.4.1 Definition of Fuzziness
3.5 Other Indices
3.6 Conclusions
CHAPTER 4. A Topological Approach to Regionalization
4.1 Regionalization as a Classification Problem
4.1.1 Elements of a Regionalization
4.1.2 The Geographic Units
4.1.3 Measures of Homogeneity
4.1.4 Regionalization Constraints
4.1.5 The Number of Regions
4.1.6 The Algorithms
4.2 Spatial Algorithms
4.2.1 Byfuglien and Nordgard Algorithm
4.2.2 Berry's Algorithm
4.2.3 Lankford Algorithm
4.2.4 Brantingham Algorithm
4.3 The Design of Algorithms
4.3.1 A Topological Algorithm
4.3.2 The Regions as Graphs
4.3.3 Heterogeneous Regions
4.3.4 Fuzzy Regions
4.3.5 The Heterogeneity Surface
4.4 A Comparative Example
4.4.1 A Hypothetical Case
4.4.2 A Topological Ward Algorithm
4.4.3 Conclusions
CHAPTER 5. An Application to Educational Planning
5.1 Introduction
5.2 School Location Planning
5.2.1 Models and Methods
5.2.2 Administrative Policies
5.3 Educational Planning in Mexico
5.3.1 Historical Background
5.3.2 Planning Experiences
5.4 A Case Study of the State of San Luis Potosi
5.4.1 The Data
5.4.2 The Geo-space
5.4.3 The Geo-subspaces
5.4.4 Some Spatial Characteristics of the Demand for Preparatory Schools
5.4.5 Additional Considerations
5.5 Summary
CHAPTER 6. Conclusions
6.1 Summary
6.2 Discussion
Appendix A. Mathematical Concepts
A.1 Mathematical Definitions
A.2 Mathematical Discussion
References
List of Tables
Number of Neighbors by Distance
Service of Preparatory Schools for a 20 Km. Travelling Distance (1984 data)
Number of Secondary Graduates per Settlement
Standardized Heterogeneity Index 20 Km. Network
Standardized Heterogeneity Index 15 and 25 Km. Networks (1983)
Standardized Temporal Heterogeneity Index (Excluding County Seats)
Standardized Heterogeneity Index of the Temporal Stability
List of Figures
Complete Graphs of Three and Four Points
Dendrogram
Original Data
Resulting Regions (First Stage)
Subgraphs and Minimal Spanning Trees
Minimal Spanning Tree (Second Stage)
Dendrogram
Minimal Spanning Tree
Single Linkage Method
First Order Neighbors
Study Area
Hypothetical Case. Variables A, B and C
Multivariate Heterogeneity Index
Topological Algorithm (First Stage)
Variables A, B and C. Standardized Heterogeneity Index
Standardized Multivariate Heterogeneity Index
Topological Algorithm (Second Stage). 21 Resulting Regions
Single Linkage Method. 69 Resulting Regions
Single Linkage Method. 21 Resulting Regions
Number of Schools. Number of Students
Distribution of Elementary Schools in the State of San Luis Potosi
Distribution of Secondary Schools in the State of San Luis Potosi
Distribution of Preparatory Schools in the State of San Luis Potosi
State of San Luis Potosi Communication Network
25 Km. Network
20 Km. Network
15 Km. Network
10 Km. Network
Catchment Areas, 20 Km. Network
Heterogeneity Index. 20 Km. Network
Spatial Distribution of the Temporal Heterogeneity Index
Distribution of the Spatial Heterogeneity Index Applied to Temporal Stability
Chapter 1.
INTRODUCTION
Since the beginning of the Quantitative Revolution in
Geography a considerable number of mathematical models have
been used to represent different aspects of the geographical
landscape.
There are, however, certain facets of spatial
structures that have received less attention from modelers.
Such is the case of the geographical notion of neighborhood.
The main aim of this thesis is to build mathematical models
designed to allow the formal representation of the
geographical concept of neighborhood.
In the first part of this chapter, an overview of the role
played by mathematical models in the discipline is given,
while in the second part, intuitive notions of the
geographical concept of neighborhood are introduced.
1.1 Mathematical Models
Models have played a fundamental role in the development of
various areas of knowledge such as physics, engineering,
architecture, computing, economics and geography.
Commonly a
model is said to be a representation of objects, events and
processes of the real world (Johnson, 1963, p.218).
Depending
on their purpose, models are classified into different groups.
For example, Forrester (1973, p.49) distinguishes between
physical and abstract, dynamic and static, linear and
nonlinear and stable and unstable models.
A model is said to
be mathematical when mathematical language is used in its
representation.
Mathematical models have been used extensively in the natural
sciences.
For example, in physics practically all knowledge
is expressed in mathematical language, and theoretical
advancement has been accomplished to a large degree through
mathematical modeling.
On the other hand, in the social
sciences the use of mathematical models has not been as
extensive or successful.
In geography, models such as maps
have always been intimately related to geographic knowledge.
However, mathematical models did not have a notable presence
in the development of geographical theory until the so-called
"Quantitative Revolution" which took place in the 1950's.
As a consequence, a new branch of geography, quantitative
geography, was established.
1.2 Quantitative Geography
As noted by Gregory (1983, p.80) mathematics has been used in
geography for a long time, particularly in the construction
and use of maps.
According to his view, trigonometry,
Euclidean geometry and space transformations are areas in
which geographers are traditionally trained.
The quantitative era in geography is characterized not simply
by the presence of mathematical techniques but by the
intensification and expansion of their use and the
introduction of mathematical modeling and abstract theory.
Although quantitative geography originated in the United
States in the 1950's, its development in other countries shows
distinctive characteristics.
According to recent literature
there are two major schools of quantitative geography: the
Anglo-American and the Continental European schools.
They can
be identified by the facts and circumstances of their
development.
For example, Bennett (1981, p.1) characterizes
the German and French tradition as being more concerned with
deep methodological questioning, while in the English-speaking
countries more importance has been given to the development of
analytical techniques, and a more pragmatic approach has been
pursued.
In the 1980's, three decades after the initiation of the
quantitative revolution, European researchers are engaged in
the historical analysis of the quantitative approach and the
contrastive analysis of the state of the art in the two
principal schools.
Several academic meetings have been
organized to discuss these topics and their relevance to the
future of quantitative and theoretical geography (Haining,
1984, Bennett, 1981 and Beaumont, 1983).
As a result of these meetings, a consensus seems to have
emerged regarding the initial development stages of
quantitative geography:
In early research, too much
importance was given to the techniques and not enough to their
role in the development of geographic theory.
For example,
the main purpose in applying mathematical techniques was often
to test untried tools.
In this respect Bennett and Wrigley's
more theoretical perspective (1981, p.6) regarding "core" and
"frame" disciplines is particularly interesting.
According to these authors, core disciplines are "those areas
of thinking which are providing new systems, concepts, and
developing new explanatory paradigms."
In contrast, frame
disciplines "are those which derive in methodology, object of
study, and terminology, from other external subjects."
Adopting these terms, quantitative geography sought to be a
core discipline in its initial stage while in most recent
years it has become more of a frame discipline.
Consequently,
researchers in various branches of geography are making more
extensive use of quantitative techniques without necessarily
considering themselves "quantitative geographers."
It should be mentioned that quantitative geography has evolved
not only in its approach to the use of mathematical models but
also in the kind of techniques used.
For example, statistical
inference in the 1960's "was seized as a panacea for
geographical methodology" (Bennett and Wrigley, 1981, p.8),
while in the 1970's the appropriateness of this methodology was
questioned.
As a result, statistical techniques for specific
geographical purposes were designed.
Such is the case of
Cliff and Ord's autocorrelation (1973) and Clark's
geostatistics (1979). Bennett and Wrigley (1981, p.9) have
anticipated that statistical inference will continue to be
used in the future but not to the extent that it was in the
past.
In general, early quantitative methods adapted mathematical
models designed in other areas of knowledge to represent the
"geographical landscape."
This is the case of such widely
used models as the gravity model, factor analytic methods,
regression models, cluster analysis as well as a wide range of
statistical techniques.
However, as a result of critical
analysis by both geographers and scientists from other
disciplines, in more recent years more appropriate models and
discipline-specific techniques have been developed.
1.2.1 Galton's Problem
The criticism made by Sir Francis Galton at the end of the
last century of the use of correlation analysis in
anthropological studies is the point of departure of the
present work.
In 1889 at a meeting of the Royal Anthropological Institute,
E. B. Tylor recognized that anthropology needed a scientific
methodology for the analysis of ethnographic data and proposed
a cross-cultural survey methodology.
He applied this method
to data he had collected on different tribes and societies.
The importance of Tylor's work resides in the fact that he was
able to correlate distinct traits which were present in the
various human groups under study.
During the discussion
period of the meeting, a comment made by Sir Francis Galton
had a significant impact on the future development of both
anthropology and geography:
It was extremely desirable for the sake of those who may wish
to study the evidence for Dr. Tylor's conclusions, that full
information should be given as to the degree in which the
customs of the tribes and races which are compared together
are independent. It might be, that some of the tribes had
derived them from a common source, so that they were duplicate
copies of the same original. Certainly, in such an
investigation as this, each of the observations ought, in the
language of the statisticians, to be carefully "weighted." It
would give a useful idea of the distribution of the several
customs and of their relative prevalence in the world, if a
map were so marked by shadings and colour as to present a
picture of their geographical ranges (Tylor, 1889).
As a consequence of this remark spatial dependence became an
issue in both anthropological and geographical studies and
spatial analysts became aware of the need to incorporate the
"spatial dimension" in their mathematical models.
One of the
concepts that has been used in modeling spatial dependence is
that "geographical neighborhood."
1.2.2 Neighborhoods
A retrospective analysis of mathematical models and techniques
makes it clear that certain aspects of the "geographical
landscape" have been much more widely modeled than others.
For example, the modeling of properties such as distance and
shape were part of the "new geometric tradition" of the early
1960's (Haining, 1983, p.86).
There are, however, other
aspects of the geographical landscape which have received
little attention from modelers.
This is the case of the
geographical notion of "neighborhood."
The concept of neighborhood is treated in geographical studies
at different levels and with different meanings.
For example
in urban studies neighborhoods are considered as sub-areas
with homogeneous characteristics such as level of income, age
of population, etc. and are used as the basic elements for
analysis
all, 1583).
In other instances geometric
characteristics of the geographical landscape are studied
through the relationship among neighbors.
Such is the case of
the cardinal neighbor method (Elliott, 1983) where the
geometric patterns of cities and population centers are
identified and analysed, based on the neighborhood
relationship among them.
Neighborhoods also play an important role in other areas of
knowledge.
For example in mathematics, there is a whole
discipline which is based on the formal notion of
neighborhood, namely topology.
In it, concepts that are
usually defined in a Euclidean space are generalized through
the concept of neighborhood and the "topological
characteristics" of mathematical objects are studied (Firby
and Gardiner, 1982).
Furthermore these mathematical concepts
have been applied in other disciplines such as chemistry,
where the topological characteristics of three-dimensional
networks are used in the analysis of the chemical
characteristics of compounds (Springer, 1973) and in the
classification and characterization of gold compounds (Hall,
Gilmour and Mingos, 1984), among other applications.
Although the notion of neighborhood is conceived in different
manners among distinct disciplines, at an intuitive level the
concept is based on the idea that the "space" that surrounds
an entity or a specific portion of "space" is of special
significance.
At elementary levels of analysis in both
mathematics and geography, the concept of neighborhood is
related with the intuitive notion of "surrounding space".
However, in both disciplines different definitions of
neighborhood are used according to the problem at hand.
For
example in some quantitative techniques such as
autocorrelation, geostatistics and topological data
structures, specific meanings are given to the concept of
neighborhood.
There is, however, no general treatment of this
concept in the geographical literature.
To exemplify some of
the methods and techniques, specific definitions of
neighborhood are used throughout the presentation, but
emphasis is placed on the general concept of geographical
neighborhood.
Briefly, the principal objective of this thesis is the design
of a family of formal tools that allow the modeling of the
concept of "geographical neighborhood."
In Chapter 2 some classical models as well as those that have
incorporated the spatial dimension through the concept of
neighborhood are reviewed.
In Chapter 3 the notion of
neighborhood model is introduced, and the mathematical
concepts that are used in the modeling process are defined.
In Chapters 4 and 5 two applications of neighborhood models
are presented.
The first is a theoretical application in
regionalization.
The second is an application to a specific
problem in educational planning.
Finally, the appendix
contains a mathematical discussion of some of the concepts
used in the thesis.
Chapter 2.
MATHEMATICAL MODELS IN HUMAN GEOGRAPHY
From the period of the "Quantitative Revolution" in geography
to the present, a considerable number of mathematical models
and methods have been used for geographical analysis purposes.
Many of them are adaptations of formal tools used in other
sciences such as physics, biology, botany, economics and
psychology.
The development of quantitative geography is at a
stage where it has become necessary to analyse the benefits
and limitations of the techniques used in the past and to
propose new ones.
Mathematical models that have been used in the past have
represented distinct aspects of spatial structures using
various formal tools.
Since the principal objective of this
work is to present neighborhood models as an alternative for
the modeling of spatial structures, it is convenient to study
some of the mathematical structures that have been
incorporated into existing models.
In this chapter some of the mathematical models that can be
considered as classical in human geography are reviewed as
well as those that have incorporated the notion of
neighborhood in the representation of the geographic
landscape.
2.1 Classical Models
Three of the models that have been extensively used in human
geography are: factor models, the gravity model and networks.
According to the purpose of application of each one of them,
the geographical landscape has been modeled in different ways.
Whether the models actually include the entities and the
relationships among them that are most important for
geographical studies remains an open question. The reason for
presenting these three models is to establish which elements
of the geographical landscape have been included and to
discuss to what extent the spatial structure has been modeled.
2.1.1 Factor Models
Factor analysis was originally developed to aid scientists in
testing hypotheses concerned with the organization of mental
ability.
At the beginning of the century, psychologists were
interested in measuring "general intelligence" by defining and
quantifying its components.
The point of departure is an n x m
matrix which contains, for each of "n" persons, the scores
obtained on "m" tests.
By means of factor analysis a
small number (r << m) of hypothetical factors are determined.
These factors allowed psychologists to isolate what they
viewed as fundamental personality components or "factors of
the mind" (Rees, 1971, p.220).
According to Lawley (1971,
p.1) the method was initially restricted to psychometrics, and
for some time it remained the black sheep of statistical
theory.
Although it is
considered to be a "complicated"
technique, its use is increasingly widespread today due to the
existence of easy-to-use computer packages which facilitate
its application.
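The reduction from m test scores to r << m factors can be sketched numerically. The block below is only an illustration, assuming NumPy is available. It uses principal components of the correlation matrix as a simple stand-in for a full factor-analytic fit, and the function name `principal_factors` and the synthetic scores are hypothetical, not taken from the sources cited here.

```python
import numpy as np

def principal_factors(scores, r):
    """Return (loadings, factor_scores) for the r leading principal components.

    This is a principal-component approximation, not a maximum-likelihood
    factor analysis.
    """
    X = np.asarray(scores, dtype=float)
    # Standardize each test (column) to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # m x m correlation matrix of the tests.
    R = (Z.T @ Z) / (Z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:r]      # indices of the r largest
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    factor_scores = Z @ eigvecs[:, order]
    return loadings, factor_scores

# Synthetic example: n = 200 persons, m = 6 tests driven by r = 2 latent factors.
rng = np.random.default_rng(0)
n, m, r = 200, 6, 2
latent = rng.normal(size=(n, r))
weights = rng.normal(size=(r, m))
scores = latent @ weights + 0.1 * rng.normal(size=(n, m))

loadings, factor_scores = principal_factors(scores, r)
print(loadings.shape, factor_scores.shape)
```

Each row of `loadings` describes how strongly one test is associated with each retained factor; in a geographical application the rows of `scores` would be areal units and the columns census variables.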
This mathematical model has been extensively described both
for researchers with a mathematical and statistical background
(Lawley, 1971 and Harman, 1976) and for those who lack such
training (Taylor, 1977 and Berry, 1971).
Since this study is
principally concerned with the specific application of the
model, only those mathematical aspects which are relevant to a
geographic context will be discussed.
Factorial Ecology
Factorial ecology is the branch of quantitative geography
which is concerned with the use of factor models in
geographical studies.
Although principal component analysis
was developed by Pearson (1901) and Hotelling (1933), factor
analysis began with the work of Spearman (1904, 1926).
Research in factorial ecology did not appear until the
mid-fifties (Bell, 1955). However, the technique has become
well-established since then, and it is presently taught as
part of the regular curricula in the geography programs
offered by most universities.
According to Taylor (1977, p.255) factorial ecology developed
in a period in which two opposing groups of researchers were
studying the structure of urban centers.
On the one hand,
there were the human ecologists who were interested in the
ecology of urban areas and had proposed several spatial models
(concentric ring model, sector model and the multiple nuclei
model).
On the other hand, there were the social area
analysts who hypothesized that the urban social structure
could be characterized through three indices or dimensions:
economic status, family status and ethnic status.
In fact,
the initial application of ecological factor analysis was
undertaken by Bell (1955).
Its purpose was to test the
hypothesis that urban populations could be adequately
described by the three-status criteria.
The results obtained by Bell were encouraging enough to cause
widespread acceptance and use of the technique among urban
social geographers.
Studies of various cities in the U.S.
(Salins, 1971, p.235) allowed researchers to undertake
comparative (congruence) analysis among the different
metropolitan areas and over several points in time.
In other
major urban centers in the world, factor models were applied
similarly (Haynes, 1971, p.324, Janson, 1971, Johnston, 1971,
p.315).
The extensive use of the factor analysis technique for
hypothesis testing and as a descriptive tool has also made
researchers increasingly aware of its limitations.
In some
urban ecology studies the use of the technique has been
successful, but, as with any other formal tool, the
appropriate use of factor models depends on the understanding
that the researcher has of both the problem and the technique.
Factorial Ecology as a Geographical Model
Why and how does factorial ecology qualify as a geographical
model?
The fact that the model is applied through the use of
areal units has been considered enough to automatically
classify it as a spatial analysis technique.
There have been
some attempts to introduce the location variable as one of the
variates in the factor model in order to transform it into a
"more" explicitly geographic model.
However, the results
obtained from these procedures have come under strong
criticism.
For example, when latitude and longitude are
included as variables in the factor model, two main
disadvantages have been found: the circularity of method and
the lack of invariance to the selection of axes (Taylor, 1977,
p.275).
The first objection refers to the fact that in the
procedure location variables (latitude and longitude) are used
to calculate scores which are later located on a map,
producing as an effect the location of locations.
In the
second case, criticism is based on the fact that the selection
of orthogonal axes is arbitrary so that the factors obtained
in each case are not necessarily equal.
In other words, the
factorial model is sensitive to the change of axes of the
location variables.
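The sensitivity to the choice of axes can be illustrated with a small numerical experiment. The sketch below is hypothetical: it assumes NumPy, invents synthetic coordinates and one census-type variable, and uses the eigenvalues of the correlation matrix (a principal-component stand-in for the factor model) to compare the same set of locations described in two coordinate systems that differ only by a 45 degree rotation.

```python
import numpy as np

def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix of the columns, largest first."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

rng = np.random.default_rng(0)
n_units = 500
x = rng.normal(scale=2.0, size=n_units)       # hypothetical east-west coordinate
y = rng.normal(scale=1.0, size=n_units)       # hypothetical north-south coordinate
v = x + rng.normal(scale=0.5, size=n_units)   # variable with an east-west trend

# The same locations expressed in axes rotated by 45 degrees.
theta = np.pi / 4
x_rot = np.cos(theta) * x + np.sin(theta) * y
y_rot = -np.sin(theta) * x + np.cos(theta) * y

eig_original = correlation_eigenvalues(np.column_stack([x, y, v]))
eig_rotated = correlation_eigenvalues(np.column_stack([x_rot, y_rot, v]))
print(eig_original)
print(eig_rotated)
```

Nothing about the landscape changes between the two runs, yet the variance attributed to each component does, because standardizing the coordinate variables is not a rotation-invariant operation.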
In order to understand how factorial ecology models the
geographical landscape it is important to review some basic
concepts behind the modeling process.
When the decision to use a mathematical model is made, the
common procedure is to develop or select the most appropriate
from the existing models.
The most relevant objects of the
phenomenon under study, together with the most important known
relationships among them, are usually expected to be included
in the model.
One of the advantages of using a mathematical model is that
once the real environment has been identified with a
mathematical structure, all mathematical knowledge is at the
service of the researcher.
Expected and unexpected relationships among the objects can
emerge as the result of applying the mathematical techniques.
This may allow the researcher to find optimum solutions to
specific problems.
In fact, the possibilities of using
mathematical models as aids in research and in the solution of
specific problems are limited only by the researcher's ability
to apply existing tools or to develop new ones.
The history of the application of the model under discussion
reveals that researchers took no special interest in the
spatial relationships among objects (in this case areal
units). In fact, there has been no real conceptual difference
in the use of the technique by geographers and psychologists.
Once the areal units and the various census variables are
defined and identified with a mathematical structure (in this
case a matrix), all other crucial relationships beyond the
scope of the model tend to be ignored.
For example, factors that are extremely important for certain
geographical analyses such as distance, nearness, relative
position, and contiguity are in no way subject to analysis by
the factorial model.
The model is insensitive to all these
factors.
As applied in Bell's study, the model is unable to
consider the relevance of the size and spatial distribution of
the census tracts.
Whether the census tracts of the city of
Los Angeles were arranged in a regular shape such as a square
or in a chain mode or as they actually are, and whether
similar tracts were close together or far apart, could not
influence the conclusions Bell arrived at based on the
factorial model. In short, the problem is the incapacity of
factorial ecology to model any spatial structure.
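This insensitivity can be made concrete with a toy computation. The sketch below is hypothetical and assumes NumPy: sixteen invented tracts are given two layouts (a square and a chain), and their attribute vectors are also reassigned to different tracts at random. The eigenvalues of the correlation matrix, used here as a principal-component stand-in for the factor model, are computed from the attribute matrix alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tracts = 16
attributes = rng.normal(size=(n_tracts, 3))   # hypothetical census variables

# Two hypothetical layouts of the same tracts: a 4 x 4 square and a chain.
square = np.array([(i % 4, i // 4) for i in range(n_tracts)], dtype=float)
chain = np.array([(i, 0) for i in range(n_tracts)], dtype=float)

def factor_eigenvalues(data):
    """Eigenvalues of the correlation matrix; note that no coordinates enter."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def mean_pairwise_distance(coords):
    """Average Euclidean distance over all pairs of units."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(coords), k=1)].mean()

# Reassign the attribute vectors to different tracts at random.
relocated = attributes[rng.permutation(n_tracts)]

eig_as_observed = factor_eigenvalues(attributes)
eig_relocated = factor_eigenvalues(relocated)
print(np.allclose(eig_as_observed, eig_relocated))
print(mean_pairwise_distance(square), mean_pairwise_distance(chain))
```

The two layouts differ markedly in their geometry, and the relocation changes which tracts are neighbors, but the factor computation returns the same result in every case because location never appears in its input.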
The question that remains is not so much whether introducing
the relative position or absolute location makes factorial
ecology a "more" geographic model, but rather how important it
is that the models used by urban social geographers take
account of those spatial relationships which have been ignored
until now.
This question is still subject to debate.
2.1.2 Gravity Models
The law of universal gravitation announced by Newton in 1666
is one of the cornerstones of modern science.
The law can be
stated as follows:
Every particle in the universe attracts every other particle
with a force that varies directly as the product of the masses
of the two particles and inversely as the square of their
distance apart. The direction of the force is along the
straight line joining the two particles (Fowles, 1970, p.139).
The importance of movement in social phenomena inspired some
social geographers to use models similar to those developed by
physicists.
A family of models analogous to the law of
universal gravitation have been developed in geography to
study such phenomena as migration between population centers
and retail trade areas: the movements of persons (journey to
work, journey to shop, etc.) and the movement of goods.
The first models, such as the one used by Ravenstein in 1885
for migration studies in England and Wales, were strongly
inspired by the Newtonian model.
However, more recently
geographers have developed a whole new family of interaction
models that differ substantially from the physical model
(Wilson, 1971). As Wilson shows (1971, p.1), "gravity model"
has become something of a misnomer.
For example, a conceptual
distinction worth making is that while in physical terms the
gravitational force is exerted in equal magnitude on both
particles, and motion comes as an effect of that force, in
geographical applications the equivalent of force is
identified with movement (flows).
Gravity models have been mainly designed for use as predictive
and descriptive tools.
In the past, planning decisions have
been based on these models' predictions of traffic flows in
several metropolitan areas in the United States, population
migration between cities, and sales in a shopping center
(Taylor, 1977, p.287).
The mathematical expression of the model varies according to
predictive or descriptive objectives. Below are two of the
most common equations.
Migration Model
The migration Tij between an origin i and a destination j is
directly proportional to the product of the sizes of the areas
Oi and Dj and inversely proportional to the distance dij
between them, raised to some power n.

    T_{ij} = k O_i D_j d_{ij}^{-n}    (2.1)

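As a rough illustration of equation (2.1), the following Python sketch evaluates the predicted flow for one pair of areas. The function name and the numerical values are hypothetical and serve only to show how the flow falls as distance grows while the sizes are held fixed.

```python
def migration_flow(k, origin_size, dest_size, distance, n):
    """Equation (2.1): T_ij = k * O_i * D_j * d_ij ** (-n)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * origin_size * dest_size * distance ** (-n)


# Hypothetical sizes; only the distance changes between the two calls.
near = migration_flow(k=1.0, origin_size=100.0, dest_size=50.0, distance=10.0, n=2.0)
far = migration_flow(k=1.0, origin_size=100.0, dest_size=50.0, distance=20.0, n=2.0)
print(near, far)
```

With n = 2, doubling the distance divides the predicted flow by four.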
Interaction Models
The interaction between zone i and j is directly proportional
to the product of the "mass terms" Wi and Wj associated with
the zones and inversely proportional to a measure of distance
or cost of travel, raised to some power n.

    T_{ij} = k W_i W_j C_{ij}^{-n}    (2.2)

where C_{ij} is the measure of distance or cost of travel between
zones i and j.
In these types of models it is common to find additional
constraints.
These restrictions are often related to
knowledge of the total interaction, emerging or arriving flows
of a zone.
The full interaction model can be written as:

    T_{ij} = A_i B_j O_i D_j f(C_{ij})    (2.3)

The difference between equation (2.3) and previous
mathematical expressions of the model is that the constant k
is substituted by the product of Ai and Bj.
Equations (2.4)
and (2.5) are derived from the constraints imposed on the
model.
According to Shepard (1969, p.8) there are two established
methodologies to estimate the parameters of the gravity model:
the regression and entropy maximization approaches.
In the
first case the researcher deals with the model:

    T_{ij} = k O_i D_j d_{ij}^{-n} E_{ij}    (2.8)

where E_{ij} is a stochastic residual.
To estimate the
parameters with the ordinary least squares method, equation
(2.8) is transformed through the logarithmic function.

    \log T_{ij} = \log k + \log (O_i D_j) - n \log d_{ij} + \log E_{ij}    (2.9)

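The regression approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product O_i D_j is taken as known for every pair, so its logarithm is moved to the left-hand side of equation (2.9) and a simple least squares line is fitted to log d_ij. The function name and the synthetic data are hypothetical.

```python
import math


def fit_gravity(flows, masses, distances):
    """Estimate k and n in T = k * (O*D) * d**(-n) by least squares on logs.

    masses[p] is the product O_i * D_j for pair p (assumed known).
    """
    ys = [math.log(t) - math.log(m) for t, m in zip(flows, masses)]
    xs = [math.log(d) for d in distances]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope estimates -n
    intercept = y_mean - slope * x_mean    # intercept estimates log k
    return math.exp(intercept), -slope


# Synthetic, noise-free flows generated with k = 2 and n = 1.5.
distances = [1.0, 2.0, 4.0, 8.0]
masses = [10.0, 20.0, 30.0, 40.0]
flows = [2.0 * m * d ** -1.5 for m, d in zip(masses, distances)]
k_hat, n_hat = fit_gravity(flows, masses, distances)
print(k_hat, n_hat)
```

Because the synthetic flows contain no residual term, the fit should recover the generating parameters up to rounding error. With real data the residuals E_ij would make the estimates approximate.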
With the entropy maximization approach a method similar to the
microcanonical ensemble technique in statistical mechanics is
used.
The details are not discussed in this work and the
reader is referred to (Wilson, 1971, p.4 and Haggett, 1977,
p.40).
The Gravity Model as a Spatial Model
The Newtonian gravity model is undoubtedly extremely simple.
Its geographic counterpart is also simple.
Social processes
such as migration and journey-to-work trips are predicted from
the relationship between two entities: "mass" and "distance".
Depending on the purpose of the application, mass and distance
represent different quantities.
Their relationship is clearly
established in each one of the postulated models.
For
example, in equation (2.1) for a fixed mass the flow of
population increases as the distance decreases (the inverse is
also true).
This is a well-known geographic relation which
has been stated in various ways: "Towns attract more trade
from near than from far locations" (Taylor, 1977, p.207);
"Everything is related with everything else but near things
are more related than distant ones" (Tobler, 1970).
Despite
their simplicity, gravity models allow researchers to work
with important aspects of spatial structure.
It is important to note that by means of the law of universal
gravitation it is possible to calculate the force exerted
between two particles.
However, if more than two particles
are involved in the analysis the problem becomes more
complicated.
Although the motions of the planets in the solar
system have been calculated through numerical solutions
(Symon, 1969, p.185), there is no general solution to the
problem involving the motion of any number of particles under
the forces exerted on one another.
Analogously, some of the gravity models postulated in
geography describe interaction exclusively between two
entities.
For example, in the prediction of migration between
two cities as represented in equation (2.1), it is assumed
that there is no other interaction between either of these two
cities and the rest of the universe of study.
There are,
however, geographic gravity models that have been designed to
account for the effect of other cities in the interaction
between two cities.
For example, in the study of
transportation it is assumed that the interaction between any
two cities is reduced due to the presence of a third one.
This assumption is based on the concept of "intervening
opportunity", which according to Taaffe and Gauthier (1973,
p.95) was first formulated by Stouffer in a study of
intra-urban migration.
This concept is derived from the
following reasoning:
"The number of migrants from any point within a city to a zone
at the periphery of the city was directly related to the
number of opportunities or vacancies in that zone and
inversely related to the number of opportunities between the
originating point within the city and the zone in the
periphery."
In the gravity model that includes this notion it is assumed
that the relationship of intervening opportunity is similar to
that of a distance.
The model can be expressed as follows:

    T_{ij} = k O_i D_j d_{ij}^{-n} / P_{ij}

where P_{ij} is the intervening opportunity.
2.1.3 Networks
One of the branches of mathematics that has had a wide range
of application is graph theory.
It has been used in physics,
chemistry, computer technology, architecture, sociology,
anthropology, linguistics, geography and other disciplines.
According to Harary (1972, p.1), graph theory has been
independently discovered several times.
In the 18th century
Euler gave a solution to the Königsberg Bridge Problem with
the aid of graphs.
In the 19th century Kirchhoff and Cayley
solved problems in physics and chemistry respectively using
elements of graph theory (Harary, 1972, p.2).
Graphs are commonly represented through diagrams that greatly
facilitate their interpretation.
The diagram is composed of a
set of points representing different entities and a set of
lines joining these points, representing a pre-established
relationship among the points.
Formally a graph can be defined as follows:
A graph consists of a finite set V=V(G) of p points together
with a prescribed set X of q ordered pairs of distinct points
of V. Each pair x=(u,v) of points in X is a line of G, and x
is said to join u and v.
(Harary, 1972, p.9)
This simple mathematical structure has allowed researchers to
solve a wide variety of problems and at the same time has
encouraged the development of this branch of mathematics.
In
particular, the use of the model to represent geographic
entities and their relationships is very common.
For example,
graph theory as it relates to the concept of connectivity has
been used to establish the definition of routes between two
or more population centers.
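The use of connectivity to define routes can be illustrated with a small Python sketch. It assumes an undirected, unweighted graph given as a list of links and uses a breadth-first search to find a route with the fewest links between two centers. The center labels and the function name are hypothetical.

```python
from collections import deque


def shortest_route(links, start, goal):
    """Return a route with the fewest links from start to goal, or None."""
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    previous = {start: None}   # also records which points were visited
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical network of five population centers.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
route = shortest_route(links, "A", "D")
print(route)
```

Two centers are connected exactly when the search returns a route. A center that does not appear in any link yields None.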
In a similar context, transport
networks such as railway and road systems have been modeled
with graphs and compared spatially and temporally through
different measures.
Since the economic development of various
countries has been related to the connectivity of railway
networks, the technique has allowed researchers to establish
interesting relationships.
Studies have been carried out at
different levels (urban, regional, international) using the
appropriate measures in each case (beta index, density, etc.)
(Haggett, 1977, pp. 86-92).
Additionally, temporal
comparisons in the growth of transport networks permitted Taaffe,
Morrill and Gould to identify four phases of development in
various countries.
In other cases graphic simulation models
have been developed to predict network growth (Haggett, 1977,
p.301).
Since graph theory is closely related to other branches of
mathematics including linear programming, combinatorics,
matrix theory, topology and probability, it is common to find
many other instances where graphs are used in a geographic
context.
Whenever any of these types of models is applied,
graphs are often used either as part of the analysis or as an
aid in the presentation and interpretation of results.
Networks as Spatial Models
Among the existing mathematical tools, graphs appear to be one
of the most appropriate means to model spatial relationships
among geographic entities.
They have been used as descriptive
and predictive tools and for hypothesis testing.
The diagrammatic representation of a graph gives the researcher
the opportunity to obtain a complete image of the relationship
among the entities under study.
This is extremely important
for spatial analysis purposes since it allows the researcher
to perceive given relationships spatially and, if necessary,
to correlate them with other spatial relationships. The model
is flexible enough to allow various spatial relationships to
be represented.
Metric relations, such as distances, either
measured in the Euclidean plane or in a geographic space (e.g.
traffic flows, route distances) can be represented in the
model by attaching a weight and/or a direction to the
corresponding link.
Non-metric or qualitative relations such
as contiguity are also easily modeled.
2.2 The Neighborhood Approach
During the first stage of development of quantitative
geography there was a strong tendency to use models that had
been designed in other branches of knowledge.
However, as a
consequence of the awareness of spatial analysts of the need
to represent the notion of spatial dependence mathematically,
several models that incorporate this concept have been
developed in different branches of geography.
Neighborhoods have been commonly used as a tool in the
modeling of spatial dependence.
This concept has different
meanings in geographical and mathematical terms.
Both
meanings are fundamental to the development of "neighborhood
models", which is the major aim of the present work.
In the first part of the following section the geographical
and mathematical meanings of neighborhood are presented, and
in the second part three models developed in the latter stage
of the quantitative revolution which include the notion of
neighborhood are discussed.
2.2.1 Geographical and Topological Neighborhoods
Geographical Neighborhoods.
The notion of neighborhood appears frequently in geographical
theory.
For example, as cited by Taylor (1977), in the
central place theory proposed by Christaller (1933) and Lösch
(1940) it is assumed that settlements provide specialized
functions for other settlements.
The size and shape of the
areas served by each of the "central places" have been one of
the main topics studied in this branch of geography.
One of
the best known hypotheses is that under certain conditions,
including aspects such as demand for central goods, purchasing
power, flow of consumers and other factors, the shape of the
trade regions of the central places is hexagonal (Haggett,
1977, p.146).
Similarly, in applied branches of geography
such as school location planning (see chapter 5), the areas
surrounding a population center play an important role in
educational planning.
The areas served by a school or group
of schools are called catchment areas.
The distribution, size
optimize the design of school districts, among other things.
Similar neighborhood concepts have been useful in the planning
of health services, shopping centers and banking facilities.
These and other geographic notions of neighborhood have been
formalized using different mathematical concepts including
geometric entities and graphs.
Commonly, the point of
departure is a set of geographic units which are often
identified with either points or areas in the Euclidean plane.
Geometric Entities
When the Euclidean plane is the model involved, it is common
to use regular geometric entities such as circles, rectangles
or hexagons to delimit neighborhoods.
Other geometric
entities such as Thiessen polygons are also used to define
neighborhoods.
In this case the polygons are constructed so
that given a set of data points in the real plane, all points
inside a polygon centered on a data point are closer to that
point than to any other data point (Peucker et al., 1976,
p.26).
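The defining property of Thiessen polygons can be sketched in Python without constructing the polygons themselves: a location belongs to the polygon of whichever data point is closest to it. The function name and coordinates are hypothetical, and Euclidean distance in the plane is assumed.

```python
import math


def nearest_data_point(location, data_points):
    """Return the index of the data point closest to location.

    Ties are resolved in favor of the lowest index.
    """
    return min(
        range(len(data_points)),
        key=lambda i: math.dist(location, data_points[i]),
    )


# Hypothetical data points in the real plane.
data_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
owners = [nearest_data_point(p, data_points) for p in [(2.0, 1.0), (8.0, 3.0), (1.0, 9.0)]]
print(owners)
```

Locations that are equidistant from two data points lie on a polygon boundary. The tie rule above is an arbitrary choice made for the sketch.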
Geometric entities that satisfy a geographic condition are
also used to define neighborhoods.
For example, in the
delimitation of catchment areas, traveling distances or
existing physical barriers may determine the shape of the
neighborhood (see Chapter 5).
Two common characteristics of the neighborhoods described
above are: a) the unit of interest (either a point or a line)
belongs to the neighborhood and b) the resulting Euclidean
subspace (the neighborhood) is connected in mathematical
terms.
It should be mentioned that in geographical applications it is
common to deal with either point or areal units.
In both cases the treatment is very similar.
When points are the
units of interest they are often assumed to be in the center
of the geometric entity.
When dealing with areal units, each
one is identified with a point (e.g. the centroid) and the
neighborhood is defined exactly as it is in the case of point
units.
Another way of defining the neighbors of an areal unit is
through a contiguity relation.
Among areal units it is said
that two units are contiguous if they share a common boundary.
In more formal terms, two areal units are neighbors if they
have at least one segment in common, and the Euclidean
subspace formed by the set of neighbors of a fixed areal unit
"a" is called the neighborhood of "a".
Graphs
When other models such as networks are used, two points are
said to be neighbors if there is a line connecting them.
For
a given point "p" in the graph, the neighborhood of "p" is
defined as the set of neighbors of "p". When areal units are
involved, it is possible to identify each one with a point and
to draw a line between any two points, provided the
corresponding areal units are neighbors.
The original
structure involving areal units is known in graph theory as
the dual of the graph (Harary, p.113),
and the definition of
neighborhood is very similar to the case where the units are
points.
The relations established in these graphs are sometimes
represented in matrix form.
Given n units, an n x n matrix is
defined as follows:

    m_{ij} = \begin{cases} 1 & \text{if units } i \text{ and } j \text{ are neighbors} \\ 0 & \text{otherwise} \end{cases}

In some applications weights are given to the neighboring
relation.
These quantities represent factors that are
considered important for the phenomena under study.
Examples
might be the length of the boundary between two counties or
the size of flows between two population centers.
In this
case the elements of the matrix are the values of the weights
attached to each pair of units.
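Both the binary and the weighted forms of the matrix can be built from a list of neighboring pairs, as in the following Python sketch. The function name, the pairs and the weights (for instance boundary lengths) are hypothetical, and the neighboring relation is assumed to be symmetric.

```python
def neighbor_matrix(n, pairs, weights=None):
    """Build an n x n neighbor matrix from a list of (i, j) pairs.

    Without weights the entries are 1 for neighbors and 0 otherwise.
    With weights, weights[p] is stored for the p-th pair.
    """
    matrix = [[0] * n for _ in range(n)]
    for index, (i, j) in enumerate(pairs):
        value = 1 if weights is None else weights[index]
        matrix[i][j] = value
        matrix[j][i] = value   # symmetric relation (assumption)
    return matrix


# Four hypothetical areal units; unit 1 borders the other three.
pairs = [(0, 1), (1, 2), (1, 3)]
binary = neighbor_matrix(4, pairs)
weighted = neighbor_matrix(4, pairs, weights=[12.5, 3.0, 7.25])
print(binary)
print(weighted)
```

A directed relation, such as flows that differ by direction, would need the symmetric assignment removed.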
Orders of Neighborhoods
It has sometimes been useful in geographical analysis to
define different orders of neighborhoods and neighbors.
Neighborhoods like those defined in previous paragraphs are
called first order neighborhoods and their elements first
order neighbors.
The points which are neighbors of the
first order neighbors and are not first order neighbors
themselves are called second order neighbors.
The resulting set is called a second order neighborhood.
Third, fourth and
successive orders of neighborhoods can be defined in a similar
manner.
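Successive orders of neighbors can be obtained from a binary neighbor matrix by expanding outward one step at a time, as in the following Python sketch. The function name and the chain of five units are hypothetical. The sketch also assumes that the unit of interest itself is excluded from every order.

```python
def neighbors_by_order(matrix, unit, max_order):
    """Return {order: set of units} for orders 1..max_order."""
    seen = {unit}        # units already assigned to a lower order
    frontier = {unit}
    rings = {}
    for order in range(1, max_order + 1):
        ring = set()
        for i in frontier:
            for j, linked in enumerate(matrix[i]):
                if linked and j not in seen:
                    ring.add(j)
        seen |= ring
        rings[order] = ring
        frontier = ring
    return rings


# Five hypothetical units arranged in a chain 0-1-2-3-4.
chain = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
rings = neighbors_by_order(chain, 2, 3)
print(rings)
```

For the middle unit of the chain the first order neighbors are units 1 and 3, the second order neighbors are units 0 and 4, and the third order is empty.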
Topological Neighborhoods
The concept of neighborhood plays a fundamental role in the
development of topological theory.
Some of this theory's
basic concepts which will be used in subsequent chapters are
discussed in the following paragraphs.
First of all, it should be said that the definition of
topological space is based on the notion of open set. The
topology with which most readers are familiar is the "usual
topology" in the real line.
The Usual Topology
Intervals are sets of real numbers commonly used in calculus
and mathematical analysis.
An interval is determined by two real numbers a, b where a < b.
It is said that the interval
is open if it does not contain its "extreme points" a and b.
Haaser et al. (1959, p.23) define an open interval as
follows:
The open interval determined by two numbers a and b, where
a < b, is the set of all real numbers x for which a<x<b.
This open interval is denoted by (a,b). Another way of
writing this definition is

    (a,b) = \{ x \in R : a < x < b \}.

An open set in the real line can be defined as follows:
A set O is open if for every point x in O there is an open
interval I such that x belongs to the interval I and the
interval is contained in O. The open intervals are examples
of open sets (Royden, 1968, p.39).
A more formal definition is given by Hu (1964, p.39).
A subset U of R is said to be an open set if for an
arbitrarily given point u in U there exists a positive real
number d such that a real number x is in U if |x-u| < d.
In the real plane the open sets are defined in a similar way.
Consider a disk surrounded by a tight ribbon.
The ribbon is the "border" or limit of the disk.
A circle (disk) that does
not contain its border (ribbon) is called an open circle.
Open circles are also examples of open sets in the usual
topology of the real plane RxR.
There are of course other
definitions of open set which vary according to the
topological space under consideration.
The open set concept is crucial for establishing the concept
of topological neighborhood.
It is said that a set is a neighborhood of a point if it
contains an open set that contains the point.
Let X be a given space and p be a given point in X. A set
N \subset X is said to be a neighborhood of the point p in the
space X iff there exists an open set U of X such that

    p \in U \subset N.

This definition comprises the intuitive notion of
neighborhood.
The point of interest belongs to the
neighborhood and the neighborhood is formed by the "space"
that is near or proximal to the point.
For example, in the
real plane an open circle with center in "a" is a neighborhood
of point "a" (see figure 2.1).
Other topological concepts which have been useful in
interpreting some of the results obtained in this study are
those of interior, exterior and boundary points.
Hu (1964,
p.21) gives the following definitions:
The point p is said to be an interior point of the set E
provided that there exists a neighborhood N of p in X
contained in E. The point p is said to be an exterior point
of E if there exists a neighborhood N of p in X which contains
no point of E. Finally, the point p is said to be a boundary
point of E in case every neighborhood N of p in X contains at
least one point in E and at least one point not in E.
By examining a particular case in the Euclidean plane it is
possible to acquire a more intuitive grasp of this abstract
concept. Consider a circle in the real plane RxR.
The points
in the circumference that limit the circle are border points
while those in the circle are interior points (see figure
2.1).
[Figure 2.1: a circle in the real plane with an exterior point, an interior point and a border point labeled.]
In other words, an open circle is defined as the set
C = \{ x \in RxR : |x-r| < d \}.
All the points belonging to the
open circle are interior points of C.
On the other hand, the
points lying on the circumference of the open circle, that is
\{ x \in RxR : |x-r| = d \}, are boundary points of C.
Finally, the
set of points that are neither in the open circle nor in its
circumference are exterior points of C.
2.2.2 Spatial Autocorrelation
A mathematical technique that was specifically designed for
the study of spatial dependence is that of spatial
autocorrelation.
The concept of spatial autocorrelation can
be summarized as follows:
it is said that a set of areas
exhibits positive spatial autocorrelation if high values of a
variable in one area are associated with high values of the
same variable in neighboring areas.
In brief, spatial
autocorrelation is a statistical technique that allows the
researcher to test hypotheses on spatial dependence.
The entity under study is assumed to be a two-dimensional area
which has been partitioned into non-overlapping regions that
are exhaustive of the area.
The basic areal units are called
counties, but the technique is equally valid if the objects of
interest are point units.
Since in the study of spatial dependence the relationship
between an entity and its surrounding is fundamental, the
concept of neighborhood has a key role in the spatial
autocorrelation model.
It is common practice to represent the
neighborhood relationship in this type of analysis through the
use of a matrix.
In some cases this permits the relations to
be weighted and different orders of neighbors to be taken into
consideration.
In order to test hypotheses on spatial autocorrelation various
statistical techniques have been designed.
One of the best
known is the one proposed by Geary (Cliff and Ord, 1973, p.8):

    c = \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_{ij} (x_i - x_j)^2}{4A \sum_{i=1}^{n} z_i^2}

where n is the number of units,
x_i is the value associated to the ith unit,
z_i = x_i - \bar{x}; the deviation with respect to the mean,
\delta_{ij} = 1 if units i and j are linked, and 0 if units i and j are not linked,
A = \frac{1}{2} \sum_{i=1}^{n} L_i; the total number of links in the county system,
L_i = \sum_{j=1}^{n} \delta_{ij}; the number of units linked to unit i.
Clearly, this measure is sensitive to the spatial pattern
induced by the neighboring relation.
However, it ignores
other spatial characteristics of the units such as shape and
size (Cliff and Ord, 1973, p.272).
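A small Python sketch may help to show how the statistic responds to the spatial pattern. It assumes the usual form of Geary's statistic, in which (n-1) times the sum of squared differences over all ordered pairs of linked units is divided by 4A times the sum of squared deviations from the mean, A being the number of links. The function name and the data are hypothetical.

```python
def geary_c(values, links):
    """Geary's c for a list of values and a list of undirected links (i, j)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    total_links = len(links)   # A, the total number of links
    # Each undirected link appears twice in the double sum over i and j.
    double_sum = sum(2 * (values[i] - values[j]) ** 2 for i, j in links)
    return (n - 1) * double_sum / (4 * total_links * sum_sq_dev)


# Four hypothetical units linked in a chain 0-1-2-3.
links = [(0, 1), (1, 2), (2, 3)]
c_smooth = geary_c([1.0, 2.0, 3.0, 4.0], links)       # similar values are adjacent
c_alternating = geary_c([1.0, 4.0, 1.0, 4.0], links)  # dissimilar values are adjacent
print(c_smooth, c_alternating)
```

Values of c below 1 point to positive spatial autocorrelation and values above 1 to negative spatial autocorrelation. Rearranging the same values over the units changes c, while changing the shape or size of the units does not.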
There are many situations in which this type of technique has
proven useful.
Among them are map comparisons with
applications to diffusion processes and the analysis of
regression residuals (Cliff and Ord, 1973, p.69, 105).
2.2.3 Geostatistics
Geostatistics is a field which was developed for, and mainly
applied to, mining problems.
Its relevance is not limited to mining, however.
Again the concept of neighborhood is a
fundamental part of the model.
According to Clark (1979, p.1) geostatistics began in the
early 1960's with the work of George Matheron and was then
introduced as "The Theory of Regionalized Variables."
The
basic problem it addresses is the estimation of a sample at a
particular location in space or time.
A well-known
application of these statistical techniques is the estimation
of ore reserves.
The method is designed to permit local estimation.
Given a
relatively small number of samples in an area, how can the
value of a fixed point belonging to that same area be
estimated?
The relative position of the point with respect to the samples
is assumed to determine its value.
This factor is accounted
for in the model by means of the concept of distance.
In fact
it is often assumed that the difference in value between two
points depends only on the distance between them and their
relative orientation
(Clark, 1979, p.5).
A basic concept in geostatistics is that of the variogram.
Given the set of differences between the values of all the
sample points, the variogram is defined as its variance.
The experimental variogram is expressed as follows:

    2\gamma^*(h) = \frac{1}{n} \sum [g(x) - g(x+h)]^2

where h describes the distance and the relative orientation, g
is the grade (value) associated with the point, x denotes the
position of one sample and x+h the position of the other, and
n is the number of possible pairs in the sample set. \gamma(h) is
called the semi-variogram
(Clark, 1979, p.5).
For a given distance and orientation (e.g. 100m and
north-south) the values of the experimental variogram are
calculated.
The resulting set of values is plotted and used
to calculate "expected" values of the difference between the
grade values of two samples (Clark, 1979, p.18).
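The calculation for one distance and orientation can be sketched in Python. The sketch assumes the commonly used estimator, half the mean squared difference between the grades of all sample pairs separated by h, and it further assumes equally spaced samples along a single line so that h reduces to a whole number of sample steps. The function name and the grades are hypothetical.

```python
def experimental_semivariogram(grades, lag):
    """Half the mean squared grade difference for pairs `lag` steps apart.

    grades: values at equally spaced positions along one line (assumption).
    lag: separation h expressed as a whole number of sample steps.
    """
    if lag < 1 or lag >= len(grades):
        raise ValueError("lag must leave at least one pair of samples")
    pairs = [(grades[i], grades[i + lag]) for i in range(len(grades) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


# Hypothetical grades sampled at regular intervals along a north-south line.
grades = [1.0, 2.0, 4.0, 7.0, 11.0]
gamma_1 = experimental_semivariogram(grades, 1)
gamma_2 = experimental_semivariogram(grades, 2)
print(gamma_1, gamma_2)
```

Repeating the calculation for several lags gives the set of values that is plotted against distance.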
According to Clark (1979, p.6) several semi-variogram models
have been designed, but only a few are regularly used.
These include the spherical and exponential models
(Clark, 1979, p.6).
2.2.4 Topological Data Structures
Another branch of geography in which the concept of
geographical neighborhood has been especially important is
that of Geographic Information Systems (GIS).
There are in
fact several areas where these applications have proven
fruitful.
Examples are image processing, digital terrain
models, computer cartography and census data bases.
A similar process to mathematical modeling must be followed in
systems design.
A set of entities along with their
characteristics and relationships has to be identified with
formal structures that are representable in computer systems.
Fortunately, there are several well-established computer
representations for those mathematical structures such as
graphs and matrices which are often used in geographic
applications.
In the design of a GIS it is particularly common to find
spatial relationships that are easily manipulated through the
use of graphs.
Two examples are street structure in urban
areas and transportation networks.
Consequently, data
structures that allow efficient manipulation of graphs have
been designed in the past.
These types of structures have
been referred to in geographic literature as topological data
structures.
An example of an application of a topological data structure
is the one Peucker et al. (1976) proposed for the treatment
of three-dimensional surfaces.
The data is a set of irregularly distributed points of the
surface.
Each of the points is selected so that it has a high
content of information and is significant for the digital
terrain model (Peucker and Chrisman, 1975, p.64).
The data
set is assumed to be "triangulated" so that every point is a
vertex of a triangle.
This triangular irregular network (TIN)
is composed of triangular facets that cover the study area.
The neighborhood of each point in the TIN is defined as the
set of points that are connected to it by an edge of a
triangle.
The data structure is designed so that the
neighborhood of each point is explicitly stored.
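The idea of storing each neighborhood explicitly can be illustrated with a short Python sketch. It is not the data structure of Peucker et al., only a minimal stand-in: triangles are given as triples of point indices and a dictionary maps every point to the set of points that share a triangle edge with it. The function name and the triangles are hypothetical.

```python
def tin_neighbors(triangles):
    """Map each point to the set of points joined to it by a triangle edge."""
    neighbors = {}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    return neighbors


# Three hypothetical triangles over five points.
triangles = [(0, 1, 2), (1, 3, 2), (2, 3, 4)]
neighborhoods = tin_neighbors(triangles)
print(neighborhoods[2])
```

Once the dictionary is built, the neighborhood of any point is retrieved directly, without searching the list of triangles again.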
This type of structure is important because it adequately
represents a graph such as the TIN.
However, its real value
is that this representation of geographical data effectively
allows the user to manipulate information using spatial
criteria.
The idea behind this type of structure is similar
to the one proposed in this study and applied in a different
context (Peucker and Chrisman, 1975).
Many other GIS based on this type of structure are found in
the literature.
The reader is referred to Dutton (1978) and
Peucker and Chrisman (1975).
The concept of neighborhood is also found in other areas of
geoprocessing.
For example, in image processing when the
purpose is texture discrimination, it is common to replace the
grey level of each point by the average grey level of its
neighborhood (Rosenfeld, 1978, p.3), and in the manipulation
of polygonal data the concept of local processing allows the
user to work with an amount of data which would be impossible
to consider if the whole data set were involved.
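The replacement of each grey level by its neighborhood average can be sketched as follows in Python. The sketch assumes a neighborhood formed by the point and the points immediately surrounding it (a 3 x 3 window, truncated at the image edges). The window size, the function name and the small image are assumptions made for illustration.

```python
def neighborhood_average(image):
    """Replace each grey level by the mean over its 3 x 3 neighborhood."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the window, clipped to the image boundaries.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(sum(window) / len(window))
        result.append(out_row)
    return result


# A hypothetical 3 x 3 image with a single bright point in the center.
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = neighborhood_average(image)
print(smoothed)
```

Only the points inside the window are read for each output value, which is the sense in which the processing is local.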
2.3 Discussion
Mathematical modeling has been used in geographical studies
for the past thirty-five years.
The first stage of
development of geographical quantitative techniques is
characterized by the adaptation of existing techniques and
models in other areas of knowledge, while a second stage can
be identified by the development of models and techniques
specifically designed for geographical purposes.
Presently
both the classical models and those that incorporate the
notion of neighborhood continue to be widely used.
It should
be mentioned that besides those models that have
been described there is a considerable number of mathematical
models and techniques that have been applied in a geographical
context.
Such is the case of regression analysis (Brouwer and
Nijkamp, 1984, Rogerson, 1984), discriminant analysis
(Fotheringham and Reeds, 1979, Yapa and Mayfield, 1978),
probability theory (Morley and Thornes, 1972, Burnett, 1978,
Muckay, 1983), simulation (Phipps and Laverty, 1983, Morrill
and Kelly, 1970) and linear programming (Cromley and Hanink,
1985, Garfinkel and Nemhauser, 1970, Maxfield, 1972).
Currently, three main tendencies of research are found in the
area of quantitative geography: the application of existing
models or techniques to real-world problems,
the examination
of the mathematical properties and characteristics of existing
models and the modification of existing tools so that they
overcome criticisms.
Examples of the application, and in some cases adaptation of
models to specific situations are the works of Brouwer and
Nijkamp (1984) where a regression model is applied to the
study of the regional quality-of-life and residential
preferences in Holland and in that of Mulligan and Gibson
(1984) where the purpose is to calibrate an economic base
model for small communities.
On the other hand, further
studies of the characteristics of models are found in the
research undertaken by Smith (1984) where the main purpose is
to characterize the gravity model in theoretical terms and in
that of Jong, Sprenger and Van Veen (1984) where the extreme
values of two spatial autocorrelation indices are derived.
Finally, efforts to adapt existing models to conditions that
had not been considered in the first design are found in the
works of Bodson and Peeters (1975) and Bivand (1984),
regarding modifications of the linear regression model and the
spatial dependence effect, and in that of Schwab and Smith
(1985) and Slater (1984) where the question of the form of
spatial interaction models regarding the level of spatial
resolution is addressed.
Since the initial stage of development of quantitative
geography criticisms have been made at two different levels.
At the more general level the criticisms are directed to the
general use of quantitative techniques.
The main argument is
based on the idea that the quantitative approach is a
positivist one (Bennett and Wrigley, 1981, p.10, Johnston,
1981). At a second level comments are made around either the
use of the models or the mathematical characteristics of
specific models and techniques.
Most of the criticisms made
at this second level are related to the statistical methods
that are commonly found in geographical studies (Gould, 1970,
Martin, 1974, Sheppard, 1979, Bennett and Wrigley, 1981, p.8).
Quantitative geography is at a mature stage where the initial
enthusiasm provoked by early results has faded, and the
complete rejection of its benefits is not a current tendency.
Mathematical modeling is viewed as a tool for geographical
studies accepting that in some cases the quantitative methods
have proven to be a fruitful approach and at the same time
that their limitations are such that the search for better
models is far from having come to an end.
This last statement
is particularly true in relation to the modeling of spatial
structures.
As mentioned in the description of the classical models,
widely used tools such as the factor models, were not designed
for the representation of spatial structures.
It is a fact
that in the modeling of the geographical landscape two
competing components are often found.
On one hand the
geographer is interested in studying the characteristics of a
phenomenon that are a consequence of the site itself but on the
other hand geographical studies are focused on the spatial
relationships between a site and its surroundings.
Berry
(1968, p.226) describes this fact as the dichotomy within
Geography, the dual concepts of site and situation: "Site is
vertical referring to local, man-made relations, to form and
morphology.
Situation is horizontal and functional, referring
to regional interdependencies and the connections between
places, or what Ullman calls spatial interactions".
These two
competing components are present in the design of models.
In some models the "site" component is
completely dominant over the "situation or Galton's" one,
while in other models, such as that of spatial autocorrelation,
the relationship is reversed.
There is no doubt that to
adequately represent spatial structures it is necessary to
design models with a dominant "Galton's component" without
obliterating the other one.
In this thesis, models that
incorporate Galton's component through the topological concept
of neighborhood are presented.
Chapter 3.
NEIGHBORHOOD MODELS
Among the models presented in Chapter 2, those that use the
notion of neighborhood to model spatial structure are clearly
distinguishable.
In each one of them geographical
neighborhoods are represented through different abstract
entities.
However, a global treatment of the use of
neighborhoods to represent spatial structures is not found in
the literature.
In the first sections of this chapter a quasi-mathematical
structure is proposed as a general framework for the design of
models that follow a neighborhood approach, and the concept of
neighborhood model is presented.
Different characteristics of neighborhoods are of interest for
the spatial analyst.
In the second section of the chapter,
the notion of local variation of a neighborhood is formalized
through various indices, and its geographical and mathematical
interpretations are discussed.
3.1 General Framework
Since the use of mathematical models in human geography is
relatively new, the accumulated experience in formalizing (in
a mathematical sense) geographical concepts is also relatively
small.
In the natural sciences it is a common practice to
establish abstract models based on the scientist's knowledge
of the phenomena under study.
An analogous procedure has been
followed in geographical models.
It is, however, possible to establish explicitly intermediate
stages in the formalization process.
This makes the modeler's
task of selecting appropriate mathematical representations of
the geographical landscape easier.
In this section two quasi-mathematical structures (geo-spaces
and geo-subspaces) are defined as an aid in the design of
models that incorporate neighborhoods.
Both geo-spaces and
geo-subspaces must be defined prior to designing the
mathematical model.
3.1.1. Intuitive Ideas
Whenever the geographical landscape is modeled, geographical
entities are commonly identified with mathematical entities
such as points, lines, areas or surfaces.
There are, however,
other elements of importance to geographical analysis, such as
the relative position of an entity with respect to its
surroundings or neighborhood, that have seldom been dealt with
mathematically.
Maps are excellent examples of the fact that geographers are
usually not interested in the study of isolated entities.
Undoubtedly, the map is the most successful of geographical
models. It allows the geographer to represent the most
relevant spatial relationships including distance, contiguity,
connectivity and shape. However, its most outstanding
characteristic is that the relative position of all its
elements with respect to their neighborhood is explicitly
represented.
This fact allows geographers to manipulate the
information content of maps using spatial criteria which focus
on spatial relationships among the entities rather than on the
entities themselves, although it should be mentioned that for
geographers spatial relationships are often implicit in maps,
and in many cases they deal with them in an intuitive manner.
3.1.2 Spaces and Subspaces
The major aim of this study is to present the development of
formal models with characteristics similar to the ones
mentioned above for maps.
The first step in the design of
such models is to formalize the concept of "geographical
neighborhood."
Since it is a very broad concept, discussion
in the following paragraphs is limited to the case where the
"geographical landscape" has been modeled representing its
entities and relations in a Euclidean space.
Since Euclidean
spaces are subject to very intuitive geometrical
interpretations, their correspondence with geographical space
becomes very natural.
In crude terms a geographical neighborhood is either a
geographical area limited by physical features or
administrative boundaries or an area that surrounds a
geographical entity such as a city, a school or an airport.
In Euclidean space the concept of "geographical neighborhood"
can be identified with that of subspace, a concept which is
often used in mathematical analysis.
In general terms, a
subspace of a Euclidean space is simply a subset of the
original one.
There are, however, other definitions of
subspace depending on the mathematical structure in question.
For example, Royden (1968, pp.127, 137) gives the following
definitions of metric space and subspace:
A metric space $(X, \rho)$ is a nonempty set X of elements (which we
call points) together with a real-valued function $\rho$ defined on
$X \times X$ such that for all x, y and z in X:
i) $\rho(x,y) \ge 0$;
ii) $\rho(x,y) = 0$ if and only if $x = y$;
iii) $\rho(x,y) = \rho(y,x)$;
iv) $\rho(x,y) \le \rho(x,z) + \rho(z,y)$.
The function $\rho$ is called a metric.
If $(X, \rho)$ is a metric space and S is a subset of X, then S
becomes a metric space if we restrict $\rho$ to S, that is to say,
if we take as the distance between two points of S their
distance as points of X. When we consider S as a metric space
with this metric, we call S a subspace of X.
In other words, a set of the space is a subspace if it
inherits the mathematical structure defined in the space.
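As a minimal illustration of this inheritance, the following Python sketch checks the four metric conditions on a finite set of points and then on a subset of it, using the same distance function restricted to the subset. The function names `euclidean` and `is_metric_on`, the tolerance and the sample points are hypothetical choices made for the illustration; they do not come from Royden.

```python
import itertools
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p, q)

def is_metric_on(points, rho, tol=1e-12):
    """Check conditions i)-iv) of a metric for all combinations of points."""
    for x, y in itertools.product(points, repeat=2):
        if rho(x, y) < 0:                          # i) non-negativity
            return False
        if (rho(x, y) <= tol) != (x == y):         # ii) zero only for identical points
            return False
        if abs(rho(x, y) - rho(y, x)) > tol:       # iii) symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if rho(x, y) > rho(x, z) + rho(z, y) + tol:  # iv) triangle inequality
            return False
    return True

# Hypothetical finite sample of a Euclidean space and a subset of it.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
S = X[:3]

print(is_metric_on(X, euclidean), is_metric_on(S, euclidean))
```

The subset S passes the same checks as X because the distance between two of its points is simply their distance as points of X.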
In
many other mathematical spaces, such as vector and
topological, subspaces play an important theoretical role.
Considered
intuitively and transferred to a map context, the
concept of subspace might be expressed as follows: for a
given map, a piece of map is still a map.
However, closer
scrutiny of the analogy makes it clear that this statement is
not always true.
If too small a piece of map is taken, it
ceases to satisfy the function of a map.
In the same way, if
isolated elements of a map are cut out, the result will
probably not be a map.
This observation clearly indicates that if the concept of
geographical neighborhood is to be formalized, precautions
should be taken so that the proposed model preserves certain
features that are essential for spatial analysis.
The conditions imposed on a set of a space to be a subspace
must be analogous to the conditions imposed on the abstract
entities selected to represent a geographical neighborhood
regarding certain spatial conditions such as maximum distance,
minimum area or contiguity constraints.
3.1.3 Geo-Spaces
Having established a resemblance between the concept of
neighborhood in geographical terms and the mathematical
concept of subspace, in order to proceed with the
formalization process it is necessary to introduce the concept
of geo-space.
It is possible to ascribe roles to entities in geographical
theory that are similar to the roles played by space and
subspace in mathematics.
These entities are named geo-spaces
and geo-subspaces.
They are discussed in the following
paragraphs, and although their definition is intuitive and
general, they have proved to be useful for the purposes of
this study.
In the geographic modeling process it is common to find a set
of entities under study (such as rivers, roads, cities or
census tracts) that are identified with mathematical entities
(such as points and lines in a Euclidean space, nodes or links
in a graph or elements of a matrix).
One or more spatial
relations are established among the entities.
These spatial
relations are represented mathematically through equations or
specific mathematical structures.
In the formalization
process it is common to assume that the mathematical entities
of interest are immersed in a mathematical space.
Some of the
most commonly used mathematical spaces in geographic
applications are the Euclidean, matrix and topological spaces.
A quasi-mathematical structure of this type is called a
geo-space.
To come back to the example in Section 2.1.2, in the modeling
of migration it is common to find a set of population centers
identified with a set of points in the Euclidean plane.
The
basic spatial relation is established through distance and is
expressed in an equation such as equation 2.1.
In this case
we say that we have a Euclidean
geo-space.
Clearly, a geo-space is not really a mathematical structure
since it contains elements of the geographical landscape.
Rather it is the quasi-mathematical product of an intermediate
step in the modeling process.
The purpose of using such an
intermediate structure in the formalization process instead of
a purely mathematical one is to ensure the inclusion of all
the spatial relationships that have been identified as
relevant.
A natural way of conceptualizing a subspace of a geo-space is
to consider a subset of the entities under study along with
their mathematical counterpart, where the spatial relations
established in the geo-space are preserved along with their
mathematical expressions.
The subsets considered have to be
part of a subspace of the mathematical space under
consideration whenever the latter is part of the geo-space.
For example, a subspace of the Euclidean geo-space described
in the previous section is simply a subset of the set of
cities considered and the points in the Euclidean space with
which they are identified.
The spatial relation established
among the points is preserved, since in the Euclidean space it
is always possible to calculate the distance between any two
points.
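The idea of a Euclidean geo-space and one of its geo-subspaces can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `EuclideanGeoSpace`, its methods, and the city names and coordinates are all hypothetical, and distance is taken as the single spatial relation of interest.

```python
import math
from dataclasses import dataclass

@dataclass
class EuclideanGeoSpace:
    """Entities (e.g. cities) identified with points of the Euclidean plane."""
    points: dict  # entity name -> (x, y) coordinates

    def distance(self, a, b):
        """The spatial relation of this geo-space: Euclidean distance."""
        return math.dist(self.points[a], self.points[b])

    def subspace(self, names):
        """A geo-subspace: a subset of the entities with their points.

        The distance relation is preserved because it is computed in the
        same way on the retained points.
        """
        return EuclideanGeoSpace({n: self.points[n] for n in names})

# Hypothetical population centers.
cities = EuclideanGeoSpace({"A": (0, 0), "B": (3, 4), "C": (10, 0)})
sub = cities.subspace(["A", "B"])

print(cities.distance("A", "B"), sub.distance("A", "B"))
```

Both calls give the same distance, which is the sense in which the spatial relation established in the geo-space is preserved in the geo-subspace.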
In mathematical metric spaces the constraint imposed on the
subspace is the preservation of the distance function.
Similarly, in the case of geo-subspaces the main constraint is
related to the preservation of the spatial relationships
established in the geo-space.
3.2 Definition of Neighborhood Model
Once the spatial relationships of interest have been
explicitly expressed either verbally or mathematically in the
definition of the geo-space and the corresponding
geo-subspaces have been identified, the next stage is to
formally establish the mathematical model to be used.
As will be seen in the following chapters the concept of
geo-subspace permits the design of formal tools to model the
concept of geographical neighborhood.
Formal tools that are designed to represent mathematically the
spatial structure through the notion of neighborhood are
called Neighborhood Models.
These models are suitable whenever the interest of the study
resides in the characteristics or behavior of sub-spaces
rather than in single or isolated entities.
Autocorrelation,
geostatistics and topological data structures are examples of
neighborhood models.
3.3. The Heterogeneity Index
Section 2.2
shows that the notion of "geographical
neighborhood" figures in the most recent geographical models.
It is vital here to achieve the previously stated objective of
modeling the "geographical landscape" through the concept of
geo-subspace.
One of the characteristics of a subspace that
interests the geographer is "variation."
The similarity (or
difference) between an entity and its surroundings is a measure
of this variation.
Throughout this chapter several measures
of variation of a geo-subspace are proposed and possible
interpretations are indicated.
These measures will be called
heterogeneity indices.
3.3.1 Formal Definition
In order to begin formalizing the idea of local variation it
is assumed that the study area is partitioned into
non-overlapping areal units that completely cover it, and that
a single variable of interest, measured on an interval scale,
is associated with each areal unit.
Additionally, it
is assumed that the geo-subspace of each areal unit is
well-defined.
Thus, the number of geo-subspaces (in this case
neighborhoods) is equal to the number of areal units.
The
definitions would also be valid if the units of study were
points.
The heterogeneity index associated with the neighborhood of
unit "a", Ia, is defined as follows:

\[ I_a = \sum_{i=1}^{k} (X_i - X_a)^2 \qquad (3.1) \]

where Xi is the value associated with the ith neighboring areal unit, Xa
is the value of unit "a" and k is the number of neighbors of
unit "a."
Ia is therefore the sum of squares of deviations
between unit "a" and its k neighbors.
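The verbal definition above (the sum of squared deviations between unit "a" and its k neighbors) can be sketched in Python as follows. The function name `heterogeneity_index`, the dictionary-based representation of values and neighbors, and the sample data are assumptions made for the illustration.

```python
def heterogeneity_index(values, neighbors, a):
    """I_a: sum of squared deviations between unit a and each of its k neighbors."""
    return sum((values[i] - values[a]) ** 2 for i in neighbors[a])

# Hypothetical areal units with one interval-scale value each.
values = {"a": 10.0, "b": 12.0, "c": 7.0, "d": 10.0}
# Hypothetical neighborhood relation (e.g. contiguity).
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}

raw = {u: heterogeneity_index(values, neighbors, u) for u in values}
print(raw)
```

For unit "a" the deviations from its neighbors are 2 and -3, so the index is 4 + 9 = 13.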
Clearly, this index is highly dependent on the units of
measurement.
Therefore, in order to make comparisons between
neighborhoods easier a new index is defined as follows:

\[ H_a = \frac{I_{\max} - I_a}{I_{\max} - I_{\min}} \qquad (3.2) \]

where Ia is the heterogeneity index associated with the
neighborhood of "a",
and Imin and Imax are the minimum and
maximum values of the set of heterogeneity indices associated
with the neighborhoods of the areal units under study.
Ha takes values between zero and one.
The higher the degree of
variation of the neighborhood, the closer the value of Ha to
zero.
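A minimal Python sketch of this rescaling follows, in which the most heterogeneous neighborhood receives the value zero and the least heterogeneous the value one. The function name `normalized_indices` and the sample indices are hypothetical, and raising an error when all indices coincide is an assumption of this sketch, since the text does not discuss that case.

```python
def normalized_indices(raw):
    """Rescale raw indices I_a to H_a = (I_max - I_a) / (I_max - I_min)."""
    i_max, i_min = max(raw.values()), min(raw.values())
    if i_max == i_min:
        # Assumption of this sketch: the rescaling is undefined in this case.
        raise ValueError("all indices are equal; the rescaling is undefined")
    return {unit: (i_max - i) / (i_max - i_min) for unit, i in raw.items()}

# Hypothetical raw heterogeneity indices for four areal units.
raw_indices = {"a": 13.0, "b": 8.0, "c": 18.0, "d": 13.0}
H = normalized_indices(raw_indices)
print(H)
```

Because the rescaled values are dimensionless, neighborhoods measured in different units can be compared on the same zero-to-one scale.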
Another measure of variation which appears natural under the
same assumptions is that of local variance defined as follows:

\[ V_a = \frac{1}{k+1} \sum_{i=1}^{k+1} (X_i - \bar{X}_a)^2 \qquad (3.3) \]

where k is the number of neighbors of unit "a", Xi is the
value associated with the ith unit and X̄a is the mean of
the values associated with unit "a" and its neighbors, i.e.

\[ \bar{X}_a = \frac{1}{k+1} \sum_{i=1}^{k+1} X_i \]

Va is therefore the sum of squares of deviations from the
mean, divided by the number of units in the neighborhood.
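The local variance can be sketched in Python as follows, taking the neighborhood to consist of unit "a" together with its k neighbors (k + 1 values in total). The function name `local_variance` and the sample data are hypothetical.

```python
def local_variance(values, neighbors, a):
    """V_a: variance of the k + 1 values of unit a and its k neighbors."""
    group = [values[a]] + [values[i] for i in neighbors[a]]
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group) / len(group)

# Hypothetical values and neighborhood relation.
values = {"a": 10.0, "b": 12.0, "c": 8.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

print(local_variance(values, neighbors, "b"))
```

Unlike the index based on deviations from unit "a", this measure treats "a" as one more member of its neighborhood and measures dispersion around the neighborhood mean.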
A way to standardize this method has been previously proposed
(Silk, p.20).
Since comparisons are essential in this type of
analysis, a similar standardization is proposed for Va as
follows:
In statistical terms, the index as defined in equation 3.3
corresponds to the variance of the geo-subspace.
Usually, a single mean and a single variance are associated
with a given sample.
In
this case, a set of means and variances is associated with a
given geo-space, one for each of the geo-subspaces under
study.
It is, however, also possible to calculate the mean
and variance of the set of heterogeneity indices associated with
a geo-space.
For example, consider a geo-space represented
via a graph where each point is connected to all the other
points of the graph (i.e. it is a complete graph), as shown in
figure 3.1.
In this case, for each point its neighbors are
the remaining points in the graph and, as will be shown in the
following paragraph, the value of the heterogeneity index as
defined in equation 3.3 is the same for every one of the
points.
Let a1, a2, a3, ..., an be the points of the graph and Xa1,
Xa2, ..., Xan the values associated with each one of them.
The
heterogeneity index of the geo-subspace of each point ai is:

\[ V_{a_i} = \frac{1}{n} \sum_{j=1}^{n} (X_{a_j} - \bar{X})^2 \]

where

\[ \bar{X} = \frac{X_{a_1} + X_{a_2} + \cdots + X_{a_n}}{n} \]
In this case, since all the heterogeneity indices of the
geo-subspaces have the same value, the variance of the indices
for this particular geo-space is always zero.
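This property of complete graphs can be checked numerically. The Python sketch below computes the local variance of every point in a complete graph of four points and, for contrast, in a non-complete graph (a simple path) over the same values. The point names, the values and the path graph are hypothetical; `pvariance` from the standard `statistics` module is used for the population variance.

```python
from statistics import pvariance

def local_variance(values, neighbors, a):
    """Variance of the values of unit a together with its neighbors."""
    return pvariance([values[a]] + [values[i] for i in neighbors[a]])

# Hypothetical values attached to four points.
values = {"a1": 2.0, "a2": 5.0, "a3": 11.0, "a4": 6.0}

# Complete graph: the neighbors of each point are all the remaining points.
complete = {u: [v for v in values if v != u] for u in values}
complete_indices = [local_variance(values, complete, u) for u in values]

# Non-complete graph (a path a1 - a2 - a3 - a4), for contrast.
path = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "a4"], "a4": ["a3"]}
path_indices = [local_variance(values, path, u) for u in values]

print(pvariance(complete_indices))  # variance of the indices, complete graph
print(pvariance(path_indices))      # variance of the indices, path graph
```

In the complete graph every neighborhood contains the same n values, so every index is the overall variance and the indices do not vary; in the path graph the neighborhoods differ and so do the indices.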
However, as shown in section 4.3.2, the most common case in
geographical studies is that of a geo-space represented via a
non-complete graph, so that the variance of its heterogeneity
indices will differ from zero in most cases.
Figure 3.1. Complete graphs of three and four points.
The Multivariate Case
In spatial analysis the researcher often deals with several
traits which characterize each of the units.
Therefore, it
becomes necessary to extend the definition of the
heterogeneity index to the multivariate case.
The purpose of such an index is to summarize, over the different
values associated with each of the spatial units, the
relationship between each of the units and its neighbors.
A feasible way of doing this is by obtaining the heterogeneity
index separately for each one of the variables of interest and
then adding them.
The multivariate heterogeneity index for unit "a" is defined
as follows:

\[ I_a = \sum_{j=1}^{p} \sum_{i=1}^{k} (X_{ij} - X_{aj})^2 \]

where Xij is the value of the jth variable for the ith
neighbor of "a", Xaj is the value of the jth variable
associated with unit "a", p is the number of variables and k
is the number of neighbors of unit "a".
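The additive construction described above (one index per variable, then their sum) can be sketched in Python as follows. The function name `multivariate_heterogeneity`, the tuple-based storage of the p variables and the sample data are assumptions made for the illustration.

```python
def multivariate_heterogeneity(values, neighbors, a):
    """Sum, over the p variables, of the squared deviations between
    unit a and each of its k neighbors."""
    p = len(values[a])
    return sum(
        (values[i][j] - values[a][j]) ** 2
        for j in range(p)
        for i in neighbors[a]
    )

# Hypothetical units described by p = 2 variables each.
values = {"a": (10.0, 1.0), "b": (12.0, 3.0), "c": (7.0, 1.0)}
neighbors = {"a": ["b", "c"]}

print(multivariate_heterogeneity(values, neighbors, "a"))
```

For unit "a" the first variable contributes 4 + 9 = 13 and the second contributes 4 + 0 = 4, giving a total of 17.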
Similarly, as the heterogeneity index was defined as a measure
of the local variance in equation 3.3, it is feasible to
define a multivariate index adding the variances associated with
each variable for a given geo-subspace as follows:

\[ V_a = \sum_{j=1}^{p} \frac{1}{k+1} \sum_{i=1}^{k+1} (X_{ij} - \bar{X}_j)^2 \]

where X̄j is the mean value of the jth variable associated with
unit "a" and its neighbors.
Analogously to the univariate case, the variance of the set of
multivariate heterogeneity indices can be calculated for the
geo-space under study.
There are various problems involved in the multivariate case.
The most obvious one is the fact that the variables are
measured in different units which are often non-comparable.
The usual procedure to overcome this restriction is to apply a
transformation to the original variables.
An example of a
transformation used to equalize the variables is to force them
to have unit variance.
There is no complete agreement on the
merits of these methods, and the discussion of whether to
ignore the problem or to apply a transformation is left to the
judgment of the analyst of the problem at hand.
Nevertheless, it is possible to redefine the index so that
comparisons among the various indices associated with the
different variables become easier.
Let Haj be the index
associated with regional unit "a" according to variable j.
The
new index is defined as follows:

\[ H_a = \sum_{j=1}^{p} H_{aj} \]

The value of this index is between zero and p.
The smaller
the variation of the neighborhood with respect to the p
variables, the closer the value of Ha to p.
Another problem that arises when several variables are
included in the analysis is related to the definition of
neighborhood.
The geo-spaces generated by the study of two variables are not
necessarily the same.
As a consequence a geo-subspace of one
of them is not necessarily a geo-subspace of the other.
In
terms of neighborhoods this means that for a given unit "a"
its neighbors with respect to one variable are not necessarily
the same with respect to another one.
If the neighborhood
relation for each variable were represented by means of a
graph, the generated graphs could be different.
The definition of the multivariate heterogeneity index has to
be altered as follows to consider this contingency:

\[ I_a = \sum_{j=1}^{p} \sum_{i=1}^{k_j} (X_{ij} - X_{aj})^2 \]

where kj is the number of neighbors of unit "a" induced by
the jth variable, with kj = k1, ..., kp.
As with previous indices, the deviation with respect to the
value of unit "a" is calculated for each one of the "p"
variables.
However, in this case the neighbors and their
number can vary from one variable to the other.
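A minimal Python sketch of this variant follows, in which each variable carries its own neighborhood relation, so that both the neighbors of "a" and their number can differ between variables. The function name, the list-of-dictionaries representation and the sample data are hypothetical.

```python
def heterogeneity_by_variable(values, neighbors_by_variable, a):
    """Sum over variables j of the squared deviations between unit a and
    the k_j neighbors that variable j induces for a."""
    total = 0.0
    for j, neighbors in enumerate(neighbors_by_variable):
        total += sum((values[i][j] - values[a][j]) ** 2 for i in neighbors[a])
    return total

# Hypothetical units described by p = 2 variables each.
values = {"a": (10.0, 1.0), "b": (12.0, 3.0), "c": (7.0, 1.0)}

# One neighborhood relation (graph) per variable.
neighbors_by_variable = [
    {"a": ["b", "c"]},  # neighbors of a induced by variable 1 (k_1 = 2)
    {"a": ["c"]},       # neighbors of a induced by variable 2 (k_2 = 1)
]

print(heterogeneity_by_variable(values, neighbors_by_variable, "a"))
```

Here the first variable contributes 4 + 9 = 13 over two neighbors, while the second contributes 0 over its single neighbor.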
3.3.2 Interpretations
In the previous section measures of local variation were
proposed for univariate and multivariate cases.
However, no
geographical meaning was given to their values.
Two possible
interpretations of the heterogeneity indices are described in
the following paragraphs.
The Heterogeneity Index as a Topological Measure
In the interpretation of the heterogeneity index as a
topological measure, knowledge of two branches of mathematics
is combined.
Concepts that are a traditional part of
topological studies such as the interior and boundary of a
region, are combined with concepts from the relatively new
area of fuzzy sets, such as the degree of membership of an
element to a set.
In order to fully understand the role of the heterogeneity
index it is necessary to establish the assumptions upon which
the interpretation rests.
Thus, in the first part of this
section some basic mathematical concepts are mentioned prior
to interpreting the index.
Topological Concepts.- First of all, it is assumed that the
point of departure is a connected graph G that forms part of a
geo-space.
That is, the entities under study have been
identified with the nodes, and the links between them
represent a spatial relationship.
Additionally, the neighbors
of a node are defined as its first order neighbors.
For convenience, a topology is defined on the graph so that
every subgraph formed by a node and its neighbors is a
topological neighborhood.
The mathematical details of this
definition are discussed in section 3.4.
Fuzzy Set Concepts.-
In the classic concept of membership of
an element to a set, the element either belongs or does not
belong to the set.
This concept was expanded by Zadeh (1965)
to reflect more accurately situations which often arise in the
real world.
Whether an element belongs to a class is often a
matter of degree. To model these situations mathematically
Zadeh proposed an entity which he called "fuzzy set."
Zadeh's definition of fuzzy set (1965) follows:
Given X a space of points, a fuzzy set A in X is characterized
by a membership function fa(x) which associates with each
point in X a real number in the interval [0,1]. The nearer
the value of fa(x) to unity, the higher the grade of
membership of x in A.
Based on this definition, operations and concepts similar to
those studied in ordinary sets have been applied to the study
of fuzzy sets.
Examples of such operations include union and
intersection, convexity and algebraic operations.
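As a small illustration of two of these operations, the sketch below represents a fuzzy set over a finite space as a Python dictionary of membership grades and applies the maximum and minimum forms of union and intersection given by Zadeh (1965). The function names and the example grades are hypothetical.

```python
def fuzzy_union(f_a, f_b):
    """Membership in the union: the larger of the two grades at each point."""
    return {x: max(f_a[x], f_b[x]) for x in f_a}

def fuzzy_intersection(f_a, f_b):
    """Membership in the intersection: the smaller of the two grades."""
    return {x: min(f_a[x], f_b[x]) for x in f_a}

# Hypothetical membership grades of three points in two fuzzy sets A and B.
A = {"x1": 0.9, "x2": 0.4, "x3": 0.0}
B = {"x1": 0.2, "x2": 0.7, "x3": 0.0}

print(fuzzy_union(A, B))
print(fuzzy_intersection(A, B))
```

When every grade is either 0 or 1 these operations reduce to the ordinary union and intersection of sets.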
The concept of fuzzy sets has been widely applied in areas
that include metamathematics, numerical taxonomy and pattern
recognition.
A large amount of research has been undertaken
since the concept was formulated by Zadeh in the 1960's.
A Geographical Interpretation.-
One of the problems that has
traditionally worried geographers is the definition of classes
among a set of entities.
It is in this context that the
heterogeneity index becomes meaningful.
In this thesis the
idea is to define the degree of membership in a class for each
one of the nodes of the graph.
Since in this initial process there are no pre-defined
classes, the degree of membership is better understood as the
potential for becoming an interior point of a hypothetical
class.
For a fixed point p in the graph G the heterogeneity index Hp
can be interpreted as a measure of the potential of membership
of p to the interior of a hypothetical region.
According to
the topological definition of the interior of a region, a
point is in the interior if there exists a neighborhood of p
that belongs to the region.
In this case the topological
neighborhood of point p is the set of its first order
neighbors.
It is at this stage that the concept of
fuzzy set
becomes relevant.
Although it can not be established at this point whether the
neighborhood belongs to the region or not, it is possible to
measure the degree of membership of the neighborhood to the
interior of the region.
This measure is given by the
heterogeneity index associated with the neighborhood of p.
The closer the value of Hp to one, the higher the degree of
membership of the neighborhood to the interior of the region.
Inversely, the closer the value of Hp to zero, the lower the
degree of membership of the neighborhood to the interior.
This relationship corresponds to our intuitive conception of
the interior of a region.
If a geo-subspace tends to be
homogeneous; that is, if the similarity between an entity and
its surrounding is high, then it must be in the interior of a
region.
As expected, in this case the value of the index is
close to one.
In the inverse case, if the subspace is highly
heterogeneous, it must belong to the border of a region and
the value of the heterogeneity index is close to zero.
It should be noted that in our problem the notion of the
exterior of a region is meaningless since there are no defined
regions.
Therefore, it is possible to distinguish only
between interior and boundary points.
This interpretation of the heterogeneity index as a measure of
the potential of a point to be either in the interior or
border of a region will be applied in a classification context
in the following chapter.
The Heterogeneity Index as a Geographical Measure
The homogeneity of a geo-subspace has also been a
traditional problem for geographers.
While the number of geographical
studies related to regionalization (see Chapter 4) is quite
large, the study of the heterogeneity of a geo-subspace has
not received much attention.
Intuitively however, the concept
seems to be very important for spatial analysis studies.
For example, in issues related to mapmaking the cartographer
sometimes views the areas of "heterogeneity" as an indicator
of the scales that should be used.
In such cases, the
assumption is that for areas that show a "uniform landscape"
there is, in general terms, less interest in producing larger
scale maps.
A second example comes from social geography,
where the study of urban spatial patterns has received much
attention, particularly regarding the distribution of social
groups (Silk, 1979, p.100).
Urban areas have been
differentiated by the social characteristics of their
population.
Nevertheless, spatial patterns of the boundaries
of the "city neighborhoods" are intuitively equally important.
Two contiguous city neighborhoods most likely interact
significantly through their boundaries.
If this is so,
"highly heterogeneous" boundaries must play a different role
in the study of social interaction than "less heterogeneous"
ones.
A high income residential neighborhood surrounded by a
low income one must interact in a different manner with its
surroundings than a similar high income residential
neighborhood would with a middle-class one.
Heterogeneity indices similar to the ones proposed in this
chapter could be used in the study of geographical
heterogeneity.
An example of an application of this concept
is found in the educational planning problem presented in
Chapter 5.
3.4. Fuzzy Topology
A close examination of the intuitive ideas behind the
mathematical theory of topology clearly points out the strong
resemblance between the geographical problem posed in this
thesis and one commonly encountered in this branch of
mathematics.
Firby and Gardiner (1982) give an excellent
overview of the main ideas that are the basis for the
development of topological theory.
The term "topology" was
originally introduced in the 19th century by one of Gauss'
students and was used in addition to "analysis situs" to refer
to this new branch of mathematics.
Two parallel developments
of topological theory can be identified: point-set or general
topology and algebraic topology.
Point-set topology was first
inspired by Cantor's work (1880) on the general theory of
sets, but its major advancement occurred only in this century
in the work of Fréchet (1906) and Hausdorff (1912).
In general topology, concepts that are usually defined in a
Euclidean space such as "limit" and "continuity" are
generalized to abstract sets through the notion of
neighborhood.
For example, the definition of continuity of a function
defined in the real plane ($\mathbb{R} \times \mathbb{R}$) is based on the notion of open
interval, as can be appreciated from the following formal
definition given by Haaser et al. (1959, p.327).
The function f is continuous at the point X0 in Df if for each
$\varepsilon > 0$ there exists a $\delta > 0$ such that

\[ |f(X) - f(X_0)| < \varepsilon \]

whenever $X \in D_f$ and $|X - X_0| < \delta$.
Df denotes the domain of
the function f.
In a similar manner the concept of continuity is generalized
to abstract sets using the concept of open set and
neighborhood.
For example, a function f from a metric space X to a metric
space Y is continuous if and only if for each open set O in Y,
the set $f^{-1}(O)$ is an open set in X (Royden, 1968, p.132).
In summary, the space surrounding a point or, in other words,
the notion of nearness to a point is formalized in general
topology by means of the concepts of open set and neighborhood
and is used to generalize ideas that had been developed when
the set of interest was the real numbers.
In contrast, algebraic topology, inspired by more geometrical
problems, was introduced by Poincaré between 1895 and 1905.
It should be mentioned that this thesis focusses on the
application of general rather than algebraic or surface
topology.
Nevertheless, it is recognized that concepts
developed in areas where there is a geometrical approach, such
as surface topology, can be of interest for certain
geographical studies.
3.4.1 Definition of Fuzziness
As in many other branches of mathematics, general topology is
based on the traditional concept of membership to a set where
an element either belongs or does not belong to it.
As
mentioned in section 3.3.2 the concept of fuzzy set has been
used in various branches of mathematics to generalize
theories.
For example in traditional systems of formal logic
a proposition is either true or false.
However, the
application of the notion of fuzziness has permitted the
development of a multi-valued logic which has been found
useful in the design of the so-called artificial intelligence
expert systems.
In the case of general topology the idea of fuzziness in a
geographical context appears in a natural manner.
In the same
way as the bivalent notion of membership to a set does not
provide an adequate model for some real problem-solving
situations in applied mathematics, in geography the bivalent
notion of the interior of a set as defined in topology is not
always adequate for regionalization purposes (see section
4.3.4).
With the aid of the heterogeneity index it is possible to
formalize the idea of fuzziness in topological terms.
For
example, once a geo-space under study has been identified with
a graph as defined in section 3.3.2, a topology can be defined
on this set.
Since the operations among graphs are implicit in the concept
of topology, the definitions of union and intersection between
graphs have to be established.
Union: Given two graphs G1 and G2, with their corresponding
sets of nodes V1 and V2, and of links X1 and X2, the union
of G1 and G2 ($G_1 \cup G_2$) is the graph G with $V = V_1 \cup V_2$
and $X = X_1 \cup X_2$ (Harary, 1972, p.21).
Intersection: The intersection $G_1 \cap G_2$ is defined through the
links as follows: $X = X_1 \cap X_2$ and V is the set of all the nodes
represented in X.
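These two operations can be sketched in Python by representing a graph as a pair of sets (nodes, links), with each undirected link stored as a `frozenset` of its two end nodes. The representation, the function names and the sample graphs are assumptions made for the illustration.

```python
def link(u, v):
    """An undirected link between nodes u and v."""
    return frozenset((u, v))

def graph_union(g1, g2):
    """Union: V = V1 U V2 and X = X1 U X2."""
    (v1, x1), (v2, x2) = g1, g2
    return v1 | v2, x1 | x2

def graph_intersection(g1, g2):
    """Intersection through the links: X is the set of common links and
    V is the set of nodes represented in X."""
    (_, x1), (_, x2) = g1, g2
    links = x1 & x2
    nodes = {node for common in links for node in common}
    return nodes, links

# Two hypothetical graphs sharing the link b - c.
g1 = ({"a", "b", "c"}, {link("a", "b"), link("b", "c")})
g2 = ({"b", "c", "d"}, {link("b", "c"), link("c", "d")})

print(graph_union(g1, g2))
print(graph_intersection(g1, g2))
```

Because the intersection is built from the common links, it never contains an isolated node, even when the two graphs share nodes that are not joined by a common link.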
For convenience the following definition of a topology on G is
given:
U is an open set of G if:
i) U is a subgraph of G and
ii) for every point $p \in V(U)$, there
exists a non-empty connected subgraph
N of U such that $V(N) \neq \{p\}$ and $p \in V(N)$.
It can be proven that these open sets satisfy the conditions
required to be a topology of G (see Appendix A).
In particular for every point p in V(G) the subgraph formed by
p and its neighbors is a topological neighborhood.
This can
also be proven (see Appendix A).
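The open-set condition can be explored with a short Python sketch. It reads condition ii) as requiring, for every node p of U, a connected subgraph of U that contains p and at least one other node; under that reading the condition holds exactly when p is an end node of some link of U. Both this reading and the names used below are assumptions of the sketch, not statements taken from Appendix A.

```python
def link(u, v):
    """An undirected link between nodes u and v."""
    return frozenset((u, v))

def is_open(u, g):
    """Check conditions i) and ii) for a candidate open set U of G.

    Graphs are pairs (nodes, links); links are frozensets of two nodes.
    """
    (u_nodes, u_links), (g_nodes, g_links) = u, g
    is_subgraph = (
        u_nodes <= g_nodes
        and u_links <= g_links
        and all(l <= u_nodes for l in u_links)
    )
    linked = {node for l in u_links for node in l}
    return is_subgraph and u_nodes <= linked

def neighborhood(p, g):
    """Subgraph formed by node p, its first-order neighbors and their links to p."""
    _, g_links = g
    links = {l for l in g_links if p in l}
    return {p} | {node for l in links for node in l}, links

# Hypothetical connected graph G: a path a - b - c - d.
G = ({"a", "b", "c", "d"}, {link("a", "b"), link("b", "c"), link("c", "d")})

print(is_open(neighborhood("b", G), G))  # a node with its neighbors
print(is_open(({"a"}, set()), G))        # an isolated node
```

Under these assumptions the subgraph formed by a node and its first-order neighbors satisfies the condition, while a single isolated node does not.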
It should be remembered that other topologies can be defined
on G.
The convenience of this particular definition is that
entities which have been previously used to model the notion
of "geographical neighborhood" such as the subgraph formed by
a node and its first neighbors are also topological
neighborhoods.
As a result, the heterogeneity index associated to a
geographical neighborhood becomes part of a topological space.
The heterogeneity index can be interpreted topologically as
the degree of membership of a neighborhood to the interior of
a set.
3.5 Other Indices
In the definition of the heterogeneity index it was assumed
that the variables involved were of interval scale.
At this
point the question of whether it is possible to define
equivalent indices for other types of variables is considered,
and an equivalent index is proposed for those cases in which
the variables are of nominal type.
The class to which a particular unit belongs can be determined
by a nominal variable.
These types of variables are
encountered in geographical problems in which characteristics
that can only be described through classes are involved.
Such
is the case of spatial analysis problems where variables such
as sex (female, male), income (low, medium, high), religion,
or nationality characterize the spatial units under study.
There are several measures used to assess the similarity
between units with respect to nominal variables.
The
comparison is made in terms of whether the units have the same
or different scores on the variables (Anderberg, 1973,
p.123).
The following "matching coefficient" is one of those
similarity measures:
$$S_{ab} = \frac{N_{ab}}{T}$$
where $S_{ab}$ is the similarity between units "a" and "b", $N_{ab}$
is
the number of variables on which the units match and $T$ is the
total number of variables.
The more similar two units are,
the closer to 1 the value of $S_{ab}$.
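The matching coefficient can be illustrated with a minimal Python sketch. The function name and the example scores are assumptions made for illustration only; each unit is represented as a sequence of nominal scores, one per variable.

```python
def matching_coefficient(unit_a, unit_b):
    """Return S_ab = N_ab / T for two units scored on the same T nominal variables."""
    if len(unit_a) != len(unit_b) or len(unit_a) == 0:
        raise ValueError("both units need the same non-empty list of variables")
    # N_ab: number of variables on which the two units have the same score
    n_ab = sum(1 for x, y in zip(unit_a, unit_b) if x == y)
    # T: total number of variables
    return n_ab / len(unit_a)


# Hypothetical units described by (sex, income, religion)
unit_a = ("female", "low", "catholic")
unit_b = ("female", "high", "catholic")
print(matching_coefficient(unit_a, unit_b))
```

Two units that agree on every variable obtain the value 1, and two units that agree on none obtain the value 0.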
The particular objective at this point is to define a
similarity index that reflects the relationship between a unit
and its neighbors.
The index should be defined so that the
more similar a unit is to its neighbors, the larger the value
of the index;
the more heterogeneous a unit is with respect
to its neighbors, the closer to zero the value of the index.
The proposed index follows:
$$I_a = \frac{1}{k} \sum_{i=1}^{k} \frac{N_{ia}}{T}$$
where $N_{ia}$ is the number of variables on which units "a" and
"i" match, $T$ is the total number of variables and $k$ is the
number of neighbors of "a".
If unit "a" and its k neighbors match in all the variables
then the index Ia equals 1.
If unit "a" does not match in any
of the variables with any neighbor, the value of the index is
zero.
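A small Python sketch may help to fix ideas. It assumes that the index is the average of the matching coefficients between unit "a" and each of its k neighbors, which is consistent with the two extreme values stated above; the function name and the data layout are hypothetical.

```python
def neighbor_similarity_index(unit, neighbors):
    """Average matching coefficient between a unit and its k neighbors.

    unit      -- sequence of nominal scores for unit "a"
    neighbors -- list of sequences, one per neighbor of "a"
    """
    k = len(neighbors)
    if k == 0:
        raise ValueError("the unit needs at least one neighbor")
    t = len(unit)  # T: total number of variables
    total = 0.0
    for neighbor in neighbors:
        # N_ia: number of variables on which "a" and neighbor "i" match
        n_ia = sum(1 for x, y in zip(unit, neighbor) if x == y)
        total += n_ia / t
    return total / k
```

Under this assumption the index stays between 0 and 1, and it decreases as the unit becomes more heterogeneous with respect to its neighbors.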
There are many other matching coefficients.
Therefore, the
heterogeneity index has to be redefined in every case
depending on the measure used.
Whether a particular index is
appropriate or not depends on the problem at hand.
3.6 Conclusions
A general framework for the design of models that represent
spatial structure mathematically using the notion of
neighborhood was established in the first part of this
chapter.
The benefits of this approach can be appreciated in
the design of the two neighborhood models in the following
chapters.
As a first step in the development of neighborhood models
measures of local variation of a geo-subspace were defined and
through them the geographical notion of neighborhood was
related to the topological concept of neighborhood.
These measures, heterogeneity indices, are meaningful in both
geographical and mathematical terms.
From a geographical
point of view, this formalization is important for the modeler
since the geographical entity of neighborhood is identified
with an element of a mathematical structure which has been
broadly studied during this century.
On the other hand, in
mathematical terms the indices allow the definition of
fuzziness in a topological space.
To the
best of the author's knowledge, this development has not been explored
before, and it could lead to the development of a new topology, a
fuzzy topology.
Chapter 4.
A TOPOLOGICAL APPROACH TO REGIONALIZATION
Regionalization is probably one of the best known branches of
geography.
One of the central issues in regionalization
problems is homogeneity.
It seems natural, therefore, to
apply a concept like local variation of a geo-subspace to the
process of region building.
In this chapter an application of
the heterogeneity index in the design of classification
algorithms is presented.
In the first and second sections an
overview of regionalization is given, and several existing
spatial algorithms are discussed.
In the final sections two
regionalization algorithms that use a heterogeneity index are
presented, and a hypothetical case is included.
4.1 Regionalization as a Classification Problem
The identification of areal groups that show a homogeneous
distribution of one or more characteristics but differ from
other groups is one of the central issues in regional
geography (Zobler, 1958, p. 140).
Regionalization is the process by which regions are identified
and classified.
Bunge (1966) clearly recognizes the
definition of regions as a classification or taxonomic
problem.
In taxonomic terminology, a uniform region is
equivalent to an areal class, a single feature region is a
classification using a single category, etc.
From this point
of view a regionalization is a classification of geographic
units.
A whole body of classification techniques has been developed
as an inquiry tool for other sciences such as biology and
botany.
The methods developed in classification or cluster
analysis are in essence formal; that is, they employ a
mathematical frame.
The intent of such methods is to find a
solution to a classification problem similar to the one
produced by a specialist.
Decision rules for classifications
are usually designed in the form of algorithms.
Various
disciplines use algorithms that are in essence equal but have
been adapted to different circumstances.
Regional geography
shares universally accepted methods such as "central
agglomerative procedures."
Geographic studies which adapt a
classification methodology to various contexts have been
reported.
Some examples are the studies of areal patterns in
cities (Jones, 1977), the partition of an area into adequate
zones for the optimal location of service centers such as
hospitals and schools (Scott, 1969) and political districting
(Garfinkel and Nemhauser, 1968).
According to Haggett et al. (1977, p.451) there are three
classificatory approaches that have been used by geographers:
uniform regions, nodal regions and planning or programming
regions.
Uniform regions are those in which places located
within the regions are homogeneous with respect to one or more
properties.
The regions are disjoint, contiguous and
completely exhaust the study area.
Nodal regions measure
interactions between units such as migration and number of
telephone calls.
Planning regions are created to satisfy
specific needs of an institution, to implement policy
decisions or for administrative purposes.
The criteria
selected to define these regions reflect the objectives for
which they were created.
Such is the case of the definition
of enumeration areas for a census.
The resulting regions are
not necessarily contiguous and might not exhaust the study
area.
In addition to the clear differences among types of regions it
should be emphasized that the classification of locations into
regions also serves different purposes.
The main ones are:
hypothesis testing, administration and programming.
In the
case of nodal and uniform regions, classification is often
undertaken as an exercise to substantiate spatial theories.
The definition of programming regions serves specific
purposes.
This does not mean that the types of regions are
not closely related.
In fact, the definition of programming
regions is often constrained by previously defined nodal or
uniform regions.
4.1.1 Elements of a Regionalization
Depending on the purpose of regionalization, different choices
are available to the analyst.
Each one has an impact on the
result of the process.
Therefore, the appropriate selection
of units, algorithms, etc. is of vital importance.
In some
cases choices are almost equivalent, while others differ
drastically.
These decisions often depend on the analyst's
understanding of the problem itself.
It could therefore be
argued that this introduces a subjective factor to the
regionalization process.
When a classification exercise is carried out, the first stage
consists in defining its elements.
A regionalization has the
same basic elements as other classifications, but it also adds
spatial constraints.
The elements of regionalization are:
the units for the regionalization
the properties that characterize the regions
a measure of homogeneity or similarity
spatial constraints such as contiguity and compactness
a grouping criterion
the algorithm to create the regions
the number of regions.
At this point it is worth mentioning the principal factor that
singularizes region-building in comparison to other
classification schemes:
the data units have an implicit
locational characteristic (Bunge, 1966).
In numerical
taxonomy similarity and nearness are equivalent; however in
spatial applications it is important to draw a distinction
between the two terms.
Similarity measures are commonly
applied with a nearness or contiguity constraint.
4.1.2 The Geographic Units
The two main limitations which the analyst faces in the
selection of units are the availability and the level of
aggregation of the data.
According to Sawicki (1973), the
availability of locational data for urban and regional
researchers is very limited.
In most cases it is obtained
from secondary sources such as census and administrative
offices.
Data is often compiled for fixed administrative
areas such as school districts or for street blocks.
The
accessibility of census data and availability of statistical
packages has increased, but spatial analysts have become
increasingly aware that existing data is not always compatible
with the hypothesis under study (Sawicki, 1973, p. 146).
The two most common regionalization units are areal and point
types.
It appears that the most popular areal level used in
urban studies has been the census tract (Sawicki, 1973,
p.110).
Tracts are delineated so that they are homogeneous
with respect to characteristics such as income and
topographical features, as well as constraints such as
population size and contiguity.
It seems to be the case that
analysts have a vast range of levels of aggregation from which
an appropriate selection may be made.
However, spatial
analysis done at different levels of aggregation has shown not
only different but contradictory results (Sawicki, 1973,
p.110).
The fact that the selection of units determines to a
great extent the results of spatial analysis severely
restricts the researcher since s/he often does not have
control of the definition of units used to compile data.
4.1.3 Measures of Homogeneity
In geographic studies regions are considered "areal systems
based on levels of similarities and differences in spatially
distributed traits" (Zobler, 1958, p.140).
Homogeneity is
identified with low areal variance and heterogeneity with high
areal variance (Bunge, 1966, p.22).
Regions must be
internally homogeneous and differentiated from other regions.
It is in this context that the grouping of areas into regions
has been approached as a classification problem.
Identifying homogeneity with similarity as it is understood in
numerical taxonomy has made it possible to define regions
using the same techniques as in cluster analysis.
To carry
out a regionalization it is therefore necessary to establish
the significance of homogeneity or similarity among areas and
regions.
In the following paragraphs some of the measures of similarity
that have been used for grouping purposes
are presented.
The point of departure is a set of units and a
set of variables characterizing them.
Depending on the scale
of measurement the variables can be classified in four groups:
nominal, ordinal, interval and ratio.
1. A nominal scale allows distinctions to be
made between classes.
2. An ordinal scale induces an ordering of the objects.
3. An interval scale allows comparisons of two objects
by means of the differences between them.
4. A ratio scale allows comparisons of two objects by
both a difference and a ratio.
(Andenberg, 1973, p.27)
Some of the similarity measures used for interval scale data
follow:
Minkowski Metric
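The distance between units "i" and "j" according to the
Minkowski metric is:
$$D_q(i,j) = \left[ \sum_{k=1}^{p} |x_{ki} - x_{kj}|^q \right]^{1/q}$$
where $q \geq 1$, $x_{ki}$ is the value of the kth variable for unit "i"
and p is the number of variables.
In particular for
q = 2, Dq is the Euclidean distance.
The greater the dissimilarity between units "i" and "j", the
larger the value of Dq.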
The measure increases with decreasing
similarity and decreases with increasing similarity.
In
geographic applications the Euclidean distance is the most
commonly used metric.
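As an illustration, the Minkowski metric can be sketched in a few lines of Python. The function name and argument layout are assumptions; each unit is given as a sequence of p interval-scale values.

```python
def minkowski_distance(unit_i, unit_j, q=2.0):
    """Minkowski distance of order q between two units described by p variables."""
    if q < 1:
        raise ValueError("q must be greater than or equal to 1")
    if len(unit_i) != len(unit_j):
        raise ValueError("both units need the same number of variables")
    # Sum of the q-th powers of the absolute differences, then the q-th root
    total = sum(abs(a - b) ** q for a, b in zip(unit_i, unit_j))
    return total ** (1.0 / q)
```

With the default q = 2 the function returns the Euclidean distance, and with q = 1 it returns the sum of absolute differences.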
When the Minkowski metric is used, it is assumed that the
variables are immersed in an orthogonal space.
This poses
some limitations in geographic applications, since the
variables are often not orthogonal.
Principal component
analysis has been used to overcome this restriction (Byfulgien
and Nordgard, 1973).
Correlation Analysis
The product moment correlation coefficient can be used as a
measure of association between units.
In geographic terms the
degree of association has been interpreted as a measure of
"regional bonds" (Haggett et al., 1977, p.476).
One of the
central problems
of this method is that the variables
associated to a unit involve different measurement units.
This renders mean and variance meaningless (Andenberg, 1973,
p.113).
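The correlation between data units "j" and "k" is
defined as:
$$R_{jk} = \frac{\sum_{i=1}^{p} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\left[ \sum_{i=1}^{p} (x_{ij} - \bar{x}_j)^2 \sum_{i=1}^{p} (x_{ik} - \bar{x}_k)^2 \right]^{1/2}}$$
where $\bar{x}_j = \frac{1}{p} \sum_{i=1}^{p} x_{ij}$
and p is the number of variables.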
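The correlation between two data units can be computed with a short Python sketch. It assumes that the mean of a unit is taken over its p variables, as in the product moment coefficient applied to units; the function name is hypothetical.

```python
import math


def unit_correlation(unit_j, unit_k):
    """Product moment correlation between two units over their p variables."""
    p = len(unit_j)
    if p != len(unit_k) or p == 0:
        raise ValueError("both units need the same non-empty list of variables")
    mean_j = sum(unit_j) / p
    mean_k = sum(unit_k) / p
    # Numerator: sum of cross products of the deviations from each unit's mean
    cross = sum((a - mean_j) * (b - mean_k) for a, b in zip(unit_j, unit_k))
    # Denominator: square root of the product of the two sums of squares
    ss_j = sum((a - mean_j) ** 2 for a in unit_j)
    ss_k = sum((b - mean_k) ** 2 for b in unit_k)
    if ss_j == 0 or ss_k == 0:
        raise ValueError("a unit with constant scores has no defined correlation")
    return cross / math.sqrt(ss_j * ss_k)
```

Because each mean is taken across variables that may be measured in different units, the sketch also makes visible the limitation noted above: the scores should be standardized before the coefficient is interpreted.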
Analysis of Variance
There are at least four measures of similarity that have been
used in terms of analysis of variance.
The first two
quantities (a and b below) are used in univariate cases while
the last two (c and d below) are used in multivariate cases.
a)
In grouping procedures the following quantity is used as
an objective function:
$$D = \sum_{i=1}^{n} w_i (x_i - \bar{x}_i)^2$$
where $w_i$ is the weight assigned to each data unit, n is the
number of units and
$\bar{x}_i$ denotes the weighted arithmetic mean of
those $x_i$ that are assigned to the subset to which element "i"
belongs (Fisher, 1958, p.789).
D decreases as the groups
become more homogeneous.
D is known as the sum of squares
within groups in the sense of analysis of variance.
In
grouping procedures the objective is to minimize D.
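The following Python sketch computes a weighted within-group sum of squares of this kind for a given grouping. It assumes one value and one weight per unit and a group label per unit; all names are hypothetical.

```python
def within_group_sum_of_squares(values, weights, labels):
    """Weighted sum of squared deviations of each unit from its group mean."""
    weighted_sum = {}
    weight_total = {}
    for x, w, g in zip(values, weights, labels):
        weighted_sum[g] = weighted_sum.get(g, 0.0) + w * x
        weight_total[g] = weight_total.get(g, 0.0) + w
    # Weighted arithmetic mean of the units assigned to each group
    mean = {g: weighted_sum[g] / weight_total[g] for g in weighted_sum}
    return sum(w * (x - mean[g]) ** 2 for x, w, g in zip(values, weights, labels))


values = [1.0, 2.0, 10.0, 11.0]
weights = [1.0, 1.0, 1.0, 1.0]
homogeneous = within_group_sum_of_squares(values, weights, [0, 0, 1, 1])
mixed = within_group_sum_of_squares(values, weights, [0, 1, 0, 1])
print(homogeneous < mixed)
```

The grouping that keeps similar values together produces the smaller quantity, which is why a grouping procedure seeks the partition that minimizes it.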
b)
In building regions it is desirable to have internal
differences minimized and differences between regions
maximized.
That is, homogeneity within regions and
heterogeneity between regions should characterize the
grouping.
The following measure shows these inter and intra-regional
differences.
$$H = \frac{\text{external variation (between regions)}}{\text{internal variation (within regions)}} \qquad (4.5)$$
The closer the grouping fits the desired requirements, the
higher the value of "H". It should be noted that this
quantity is used as a type of objective function rather than
as a crude measure of homogeneity.
c) The Ward Method
Ward defined the following measure based on the idea that
whenever there is a grouping there is a loss of information:
$$E = \sum_{k} E_k$$
$$E_k = \sum_{i=1}^{p} \sum_{j=1}^{m_k} (x_{ijk} - \bar{x}_{ik})^2 \qquad (4.6)$$
$$\bar{x}_{ik} = \frac{1}{m_k} \sum_{j=1}^{m_k} x_{ijk}$$
where
$E$ = total within-region error sum of squares (the sum runs over all regions),
$E_k$ = error sum of squares for region k,
$\bar{x}_{ik}$ = mean of the ith variable for areas in region k,
$x_{ijk}$ = value of the ith variable for the jth area in the kth region,
n = number of areas,
$m_k$ = number of areas in region k,
p = number of variables.
In this case E is used as an objective function that has to be
minimized.
The more information that is lost in a
regionalization, the larger the value of "E".
d) Cliff and Haggett (1970) defined a similar homogeneity
measure as:
$$B = 1 - \frac{E}{\max E}$$
where E is the total within-region error sum of squares, and
the maximum is taken over all the possible values of E.
In
fact the maximum value is obtained when the resulting region
is only one; that is, when all the units are grouped together.
Since B is equal to zero when all areas are grouped into one
region and equal to one when each area is a region, the closer
B is to one the better the regional system performs in terms
of homogeneity.
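The two multivariate measures can be sketched together in Python. The sketch assumes that a regionalization is given as a list of regions, each region being a list of areas and each area a list of p values; the function names are hypothetical, and the maximum of E is taken as the value obtained when all areas form a single region, as stated above.

```python
def error_sum_of_squares(regions):
    """Total within-region error sum of squares E for a regionalization."""
    total = 0.0
    for areas in regions:
        m_k = len(areas)       # number of areas in region k
        p = len(areas[0])      # number of variables
        for i in range(p):
            mean_ik = sum(area[i] for area in areas) / m_k
            total += sum((area[i] - mean_ik) ** 2 for area in areas)
    return total


def homogeneity(regions):
    """B = 1 - E / max E, where max E corresponds to one single region."""
    all_areas = [area for areas in regions for area in areas]
    e_max = error_sum_of_squares([all_areas])
    if e_max == 0:
        raise ValueError("all areas are identical, so B is undefined")
    return 1.0 - error_sum_of_squares(regions) / e_max
```

For four areas with values 1, 2, 10 and 11 on a single variable, grouping the first two and the last two areas gives E = 1 against a maximum of 82, so B is close to one.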
It should be added that there are many other similarity
measures that have been used for grouping purposes.
The
reader is referred to Andenberg (1973), Hartigan (1974) and
Cormack (1971).
4.1.4 Regionalization Constraints
In addition to homogeneity there are other constraints that
are sometimes imposed on regionalizations.
Among the criteria
that have been used for both districting and region building
are:
1. Equality of population
2. Contiguity
3. Compactness
4. Preservation of political or
administrative boundaries
5. Region boundaries should follow
geographic features such as rivers
and mountains (Thoresson, p. 237).
Even though all of these constraints are in essence spatial,
the contiguity constraint is particularly interesting for this
work since it is strongly related to the notion of
geographical neighborhood.
Contiguity
Homogeneity, as understood in classification analysis, means
either similarity or nearness.
Regional homogeneity, however,
refers to both similarity and geographical nearness.
There are two ways in which the contiguity constraint can
be interpreted:
a) When the units are of areal type, a
region is contiguous if for any two
units, a1 and a2, that belong to it,
it is possible to travel from a1 to
a2 through a path wholly contained
in the region.
In mathematical
terms this is called a connected set.
b) In some other instances the need for
contiguity does not necessarily
imply a physical border-to-border
relation but simply a neighborhood
one as described in Chapter 2.
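Both interpretations reduce to checking that a region is a connected set with respect to some adjacency relation, either common borders or a neighborhood relation. A minimal Python sketch of this check follows; the function name and the dictionary layout of the adjacency relation are assumptions.

```python
from collections import deque


def is_contiguous(region, adjacency):
    """Return True if every unit of the region can be reached from any other
    unit through a path wholly contained in the region."""
    region = set(region)
    if not region:
        return False
    start = next(iter(region))
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for other in adjacency.get(unit, ()):
            # Only move through units that belong to the region
            if other in region and other not in seen:
                seen.add(other)
                queue.append(other)
    return seen == region
```

The same function serves for interpretation (a) or (b), depending on whether `adjacency` records shared borders or the neighborhood relation.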
There has been some disagreement among geographers about the
necessity of imposing a contiguity constraint on a
regionalization.
In some instances, such as the definition of
administrative zones or electoral districts, there can be no
doubt concerning the need for such a constraint.
However, the
requirement is less clear in the use of grouping for research
purposes.
There are two basic reasons researchers carry out
regionalizations: as an exploratory tool or to test a
hypothesis.
However, it should be remembered that there is an
important difference between using the grouping itself to test
a hypothesis and testing the hypothesis of whether there are
clusters or not.
Classification or cluster analysis can only
be used as such in the first case.
Two approaches to the building of uniform regions are
possible:
a) A classification without a contiguity
constraint (called a typification) is undertaken,
followed by the mapping of results.
b) A classification with a contiguity
constraint is undertaken.
According to Byfulgien and Nordgard (1974) these two
approaches are "not necessarily conflicting but
complementary."
However, this is not a universally accepted
position.
For example, Johnston (1970) argues that
"regionalization with contiguity constraints over-simplifies
and operates against efficient hypothesis testing."
The problem
arises in the interpretation of results. The
product of a typification is a set of groups that satisfy a
condition of homogeneity, but are not necessarily contiguous.
These results have a value in themselves.
However, when the
resulting groups are mapped, the units are implicitly
classified by another variable, that of location.
Sets of
units that belong to the same group and are contiguous seem to
form regions.
However, it can not be ascertained that these
newly formed regions satisfy the same homogeneity condition as
the original grouping.
4.1.5 The Number of Regions
In a regionalization problem the main task is to find a
grouping of the units that "best" satisfies the needs of the
analyst.
Therefore, the first idea that comes to mind is to
select from all the possible groupings the one that best
satisfies the constraints.
Cliff and Haggett (1970) have
looked into some combinatorial aspects of the regionalization
problem.
They were able to calculate the number of different
aggregations to form "m" regions given "n" areas without a
contiguity constraint.
$$W = \sum \frac{n!}{\left( g_1! \cdots g_j! \right) \prod_{i=1}^{m} f_i!}$$
where gj is the number of regions which comprise j units, fi
is the number of areas combined to form region i and the
summation is over all m element partitions of n.
For example,
if the number of units is four (n=4) and the number of regions
is two (m=2), the m element partitions of n are two: (3,1) and
(2,2).
They also calculated the number for the case where the areas
are under a strong contiguity constraint, i.e. when they form
a chain.
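The count without a contiguity constraint can be checked numerically with a short Python sketch. It assumes the formula is a sum, over the partitions of n into m parts, of n! divided by the product of the factorials of the region sizes and of the factorials of the numbers of regions of equal size; the function names are hypothetical.

```python
from collections import Counter
from math import factorial


def partitions(n, m, max_part=None):
    """Yield the partitions of n into exactly m positive parts, largest part first."""
    if max_part is None:
        max_part = n
    if m == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n - (m - 1), max_part), 0, -1):
        for rest in partitions(n - first, m - 1, first):
            yield (first,) + rest


def count_aggregations(n, m):
    """Number of ways of aggregating n areas into m regions, ignoring contiguity."""
    total = 0
    for parts in partitions(n, m):
        denominator = 1
        for f_i in parts:                      # f_i: areas combined to form region i
            denominator *= factorial(f_i)
        for g_j in Counter(parts).values():    # g_j: regions that comprise j units
            denominator *= factorial(g_j)
        total += factorial(n) // denominator
    return total


print(list(partitions(4, 2)), count_aggregations(4, 2))
```

For n = 4 and m = 2 the partition (3,1) contributes 4 aggregations and (2,2) contributes 3. The count grows very quickly with n, which is the reason exhaustive search is impractical.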
As Cliff and Haggett (1970, p.288) have shown, in both cases
the number is too large to permit approaching the
regionalization problem by exhausting the possibilities.
This
is the reason why it has become necessary to design heuristic
algorithms to find solutions which approximate the "best"
result.
When a classification approach is undertaken the number of
regions to which "n" areas should be aggregated often has to
be defined by the analyst. As will be seen in the following
section, hierarchical methods group "n" units in any number
between one and m.
It is the task of the analyst to decide on
the "best" level of aggregation.
In non-hierarchical methods
the number of seeds determines the number of regions.
In
other instances, such as some of the algorithms described in
section 4.3, the number of regions is a result of the
algorithm.
4.1.6 The Algorithms
Once a similarity measure as well as the constraints for the
grouping have been established, it is necessary, in order to
actually obtain the regions, to define a procedure by which
the areas are to be clustered.
The existing methods can be
classified as hierarchical or non-hierarchical.
Hierarchical Methods
In a hierarchical method the starting point is a set of "n"
data units, and it ends with the universal region in which all
the units are grouped in one region.
In some cases the
procedure is divisive because it starts from the universal
region.
What is common to the hierarchies, whether they are
divided or grouped, is that they remain as such throughout the
entire remaining process.
An easy way to visualize this process is by means of tree
diagrams as shown in figure 4.1.
Each node represents a
region, and the stages of the procedure are shown in the axis
below the tree.
In the first step the two most similar units
are merged, and the number of regions (or units) left is
reduced to "n-1".
After the ith step the number of regions is
"n-i".
The process involves "n-1" steps.
According to Andenberg (1973, p.132) there are three major
hierarchical clustering
methods: linkage, centroid and
variance.
Briefly, in the single linkage method, clusters are
merged using the shortest distance (similarity) between their
elements as a criterion.
In the centroid method, the
similarity between clusters is given by the similarity between
their means.
Finally, in the Ward method the clusters that
produce the minimum increase in the total within-group error
sum of squares (as defined in section 4.1.3) are merged.
Specific hierarchical methods used in geographical studies
will be presented in the next section.
Figure 4.1 Dendrogram (axis: distance between groups)
Non-hierarchical Methods
The difference between hierarchical and nonhierarchical
methods is that in the latter two units that belong to the
same region, at any stage of the process, do not necessarily
remain joined.
In fact, nonhierarchical methods are based on
the assumption that given an initial partition of the units,
subsequent improvements
are feasible.
Usually the first step
is the selection
of a set of units called seeds.
An initial
partition is defined by joining each unit to its most similar
seed.
In the following stages each new partition is defined
by taking the previous one as a point of departure.
The
process ends when the "best" partition is found.
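The first stage of a non-hierarchical method, the initial partition around seeds, can be sketched as follows. The sketch assumes Euclidean distance as the similarity measure and hypothetical names throughout; the later improvement stages depend on the specific method and are not shown.

```python
def initial_partition(units, seeds):
    """Join each unit to its most similar seed (smallest Euclidean distance).

    units -- list of tuples of variable values, one tuple per unit
    seeds -- list of tuples chosen as seeds
    Returns a dictionary mapping each seed index to the list of unit indices.
    """
    def distance(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    groups = {s: [] for s in range(len(seeds))}
    for index, unit in enumerate(units):
        nearest = min(range(len(seeds)), key=lambda s: distance(unit, seeds[s]))
        groups[nearest].append(index)
    return groups
```

The number of seeds fixes the number of groups, which is the sense in which the number of regions is set in advance by the analyst.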
4.2 Spatial Algorithms
The algorithms that geographers have used for regionalization
purposes can be divided into two groups.
The first is
composed of those shared with other disciplines, such as the
Ward and single linkage methods.
The second group is composed
of those algorithms that include specific spatial constraints
such as contiguity and compactness.
The first group has been extensively described in the
literature (see Andenberg, 1973, Cormack, 1971, Hartigan,
1975) and will not be discussed in further detail here.
The
second group is of more interest.
It will be referred to
throughout this study as "spatial algorithms."
From a methodological point of view we can distinguish three
types of spatial algorithms:
a) those in which the contiguity of groups can only
be assured by checking if the units have a common border;
b) those that use the notion of neighborhood to identify
contiguous groups;
c) those in which contiguity is assured together
with other constraints imposed on the resulting
regions.
For comparative purposes some "typical" algorithms of the
first two types will be described.
The third type of spatial
algorithms, which is not described here, uses techniques such
as integer programming and is usually applied to districting
problems (Garfinkel, 1970).
4.2.1 Byfulgien and Nordgard Algorithm
The following method was originally introduced by McQuitty and
later transformed into a spatial algorithm by Byfulgien and
Nordgard ( 1 9 7 3 ) .
This is an example of a hierarchical
algorithm of the first type.
The similarity measure is the
Euclidean distance, and the clustering criterion is of the
single linkage type.
The number of resulting regions is
determined by the algorithm.
The main characteristic of the
resulting regions is that "all basic units have their most
similar contiguous unit within the same region."
Byfulgien and Nordgard applied this method in eastern Norway
to agricultural data and concluded that it can produce regions
with very dissimilar units.
This is because the condition
required to add a unit to a region is its similarity to just
one of the other units of the region.
The Algorithm
Def. 1. "A" is the set of "n" areal units in
which the area of study is subdivided.
That is, A = {a1, a2, ..., an}.
Def. 2. Dij is the distance between areal
units ai and aj.
Def. 3. M is an nxn matrix that contains all the
distances between areal units. That is
mij = Dij; i,j = 1, ..., n.
Step 1. Find the two most similar areal units.
Let ai and aj be these two units.
Step 2. Check if the areal units ai and aj
have a border in common.
If they are
contiguous continue with Step 3. Otherwise
let Dij be a "large number" and continue
with Step 1.
Step 3. Merge units ai and aj to form region R.
Step 4. Let N be the set of areal units that have
a border in common with R.
For each element
of N check if it is closer to one of the
units that belong to R than to any other
of its contiguous units.
Step 5. Let F be the set of units that satisfy
the condition stated in step 4.
If there is
no such unit, i.e.
F = ∅, continue to Step 7.
Step 6. Form a new region merging region R and
the previously defined set F.
That is, take
R ∪ F and call it R.
Continue with Step 4.
Step 7. Region R is one of the resulting regions.
Step 8. Take the set of areal units that do not
belong to any region and obtain its
distance matrix M.
Step 9. If all areal units belong to a region,
then end the grouping process.
Otherwise
repeat the procedure starting from step 1.
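The steps above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the original implementation: the function name and data layout are hypothetical, steps 1 and 2 are combined by searching only among contiguous pairs, contiguous units are restricted to those not yet placed in a finished region (following step 8), and any units left without an ungrouped contiguous partner are returned as single-unit regions.

```python
def contiguous_linkage_regions(dist, adjacency):
    """dist[i][j] -- distance between units i and j
    adjacency[i]  -- units that have a border in common with unit i"""
    unassigned = set(adjacency)
    regions = []
    while unassigned:
        # Steps 1-2: most similar pair of ungrouped units sharing a border
        pairs = [(dist[i][j], i, j) for i in unassigned
                 for j in adjacency[i] if j in unassigned and i < j]
        if not pairs:
            # Assumption: isolated leftovers become regions on their own
            regions.extend({u} for u in sorted(unassigned))
            break
        _, i, j = min(pairs)
        region = {i, j}                                   # Step 3
        while True:
            # Step 4: ungrouped units bordering the current region
            border = {u for r in region for u in adjacency[r]
                      if u in unassigned and u not in region}
            joined = set()
            for u in border:
                candidates = [v for v in adjacency[u] if v in unassigned]
                closest = min(candidates, key=lambda v: dist[u][v])
                if closest in region:                     # Step 5
                    joined.add(u)
            if not joined:
                break
            region |= joined                              # Step 6
        regions.append(region)                            # Step 7
        unassigned -= region                              # Steps 8-9
    return regions
```

Since a unit joins a region as soon as its closest contiguous unit lies in that region, a sequence of gradually changing units can be absorbed one after another, which illustrates how regions with very dissimilar units can arise.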
4.2.2 Berry's Algorithm
Berry (1961) modified a central agglomerative procedure to
include a contiguity constraint and used it for an economic
regionalization.
Most of the central agglomerative procedures
follow the general scheme given by Andenberg (1973, p. 133).
The modified or "spatial" general procedure is as follows:
Def. 1
"A" is the set of "n" areal units in
which the area of study is subdivided.
That is A = {a1, a2, ..., an}.
Def. 2
Dij is the distance between areal
units ai and aj.
Def. 3
M' is an n x n matrix where:
$$m_{ij} = \begin{cases} D_{ij} & \text{if } a_i \text{ and } a_j \text{ are contiguous;} \\ \infty \text{ (or a very large number)} & \text{otherwise.} \end{cases}$$
In this case the matrix M' reflects not only the similarity
between regions but also their contiguity
relation.
Step 1. Begin with n areal units.
Step 2. Search the matrix M' for the most
similar pair of contiguous regions.
Let the chosen regions or units
be labeled ai and aj.
Step 3. Reduce the number of regions (or units)
by one merging regions ai and aj.
Label
the resulting region ai and update matrix
M' to reflect the similarities between ai
and all other existing regions (or units).
Delete the row and column of M' that
corresponds to region (or unit) aj.
Step 4. Perform Steps 2 and 3 a total of
(n-1) times.
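A compact Python sketch of this contiguity-constrained agglomerative scheme is given below. The update rule used here is single linkage (the minimum of the two merged rows), which is only one of the admissible clustering criteria and is chosen as an assumption; the function name, the `n_regions` stopping parameter and the data layout are likewise hypothetical.

```python
import math


def constrained_agglomeration(dist, adjacency, n_regions=1):
    """Merge contiguous regions until n_regions remain (or no contiguous pair is left)."""
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    n = len(dist)
    members = {i: {i} for i in range(n)}                  # Step 1
    # Matrix M': distance for contiguous units, infinity otherwise
    m = {i: {j: (dist[i][j] if j in adjacency[i] else math.inf)
             for j in range(n) if j != i} for i in range(n)}
    while len(members) > n_regions:
        # Step 2: most similar pair of contiguous regions
        best, i, j = min((m[a][b], a, b) for a in members for b in members if a < b)
        if math.isinf(best):
            break                                         # no contiguous pair remains
        members[i] |= members.pop(j)                      # Step 3: merge aj into ai
        for k in members:
            if k != i:
                # Single linkage update; the merged region touches k if either part did
                m[i][k] = m[k][i] = min(m[i][k], m[j][k])
        del m[j]                                          # delete row of aj
        for row in m.values():
            row.pop(j, None)                              # delete column of aj
    return list(members.values())
```

Running the loop down to one region reproduces the full hierarchy, and stopping earlier returns the level of aggregation chosen by the user.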
Both areal and point units can be used with this type of
algorithm.
The range of similarity measures and clustering
criteria that can be incorporated into this algorithm is the
same as that of a non-spatial central agglomerative procedure.
Other constraints such as compactness can be included without
changing the basic structure of the algorithm.
One of the disadvantages of this type of algorithm is that
sometimes it produces a
chaining
process.
This occurs when in
every stage a unit is merged to the same (or to a few)
region(s).
The result is that at any stage there are a few
"large" regions together with ungrouped units.
In this type of algorithm the user decides on the number of
resulting regions, since the procedure starts with n regions
and ends with one.
4.2.3 Lankford Algorithm
An example of an algorithm that uses the notion of
neighborhood to identify regions is described by Lankford
(1969).
This algorithm was not specifically developed for
cases where there are contiguity constraints, and the units
themselves are not necessarily spatial.
Given the set of variables which represent the attributes with
which the units are characterized, it is assumed that they are
immersed in an m-dimensional orthogonal space.
A similarity
measure designed to detect zones of "high density" in the
m-dimensional space is introduced.
The Algorithm
Def.
Let the density W(a) for areal unit "a" be defined
as follows:
$$W(a) = \left( \frac{1}{n} \sum_{x \in N(a)} d(a,x) \right)^{-1}$$
where n is the number of neighbors of unit "a", d(a,x)
denotes the Euclidean distance between units "a" and "x"
and N(a) is the neighborhood of "a."
This measure was designed to detect the high density zones.
A unit immersed in a dense zone has a high "W" value associated.
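A minimal Python sketch of a density measure of this kind follows. It assumes that the density is the inverse of the mean Euclidean distance from a unit to its neighbors, which matches the behavior described above; the function name and the coordinate dictionary are hypothetical.

```python
import math


def density(unit, neighbors, coords):
    """Inverse of the mean Euclidean distance between a unit and its neighbors.

    coords maps each unit to its position in the m-dimensional attribute space.
    """
    n = len(neighbors)
    if n == 0:
        raise ValueError("the unit needs at least one neighbor")
    mean_distance = sum(math.dist(coords[unit], coords[x]) for x in neighbors) / n
    if mean_distance == 0:
        raise ValueError("the unit coincides with all of its neighbors")
    return 1.0 / mean_distance
```

A unit whose neighbors lie close to it in attribute space obtains a large value, and a unit with distant neighbors obtains a small one.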
Def.
Let f(a,b) be the association between units "a" and "b":
This expression resembles the one for gravitational
attraction.
The following definitions are necessary to extend the concepts
previously given to the case of units already grouped.
Def.
Two groups G and H are called neighbors
if there is an element g in G such that
its neighborhood
N(g) intersects H,
i.e. $N(g) \cap H \neq \emptyset$.
Def.
The interface I(G,H) between two groups G
and H is the subset of elements of G (or H)
such that its neighborhood intersects H (or G).
Def.
The association between neighbor groups G
and H is the average of the associations
between all pairs (g,h) in the interface:
$$f(G,H) = \frac{1}{m} \sum_{(g,h) \in I(G,H)} f(g,h)$$
where m is the number of pairs in the
interface, and $(g,h) \in I(G,H)$.
The algorithm itself is a central agglomerative procedure
using "f" as the measure of similarity.
This procedure was
applied in a two-dimensional space.
Apparently, no attempt
has been made to use it in a more general case.
4.2.4 Brantingham Algorithm
Brantingham (1978) has presented an algorithm that uses
topological concepts not only intuitively but in a formal
sense as well.
The procedure is designed in such a way that
at each stage it is decided whether two units should be
separated by a border.
In this sense it differs radically
from previously described algorithms, in which the main
decision leads to the grouping of units.
City blocks were used as basic units in the regionalization.
The similarity measure is based on the difference of absolute
values, and it is a multivariate grouping.
The Algorithm
Def. 1 "A" is the set of "n" areal units into
which the area of study is subdivided.
That is, A = {a1, a2, ..., an}.
Def. 2 Let N(aj) be the set of units
contiguous to aj.
Def. 3 Let fi(aj) be the value of the ith
variable associated with aj.
Def. 4 A basis of a topology T in X is a
subcollection B of T such that every open
set U in T is a union of some open sets
in B (Hu, 1964, p. 17).
Def. 5 A basis set Bi of the topology is the set
of all contiguous units such that the
interunit variation of the variable of
interest is less than some fixed percentage.
Step 1. Fix a maximum percentage of interunit
variation, call it b.
Step 2. Let ak be an element of the basis Bi
(i=1 the first time), where ak is an
arbitrary unit.
Step 3. For each element aj in Bi (which has not
been tested before) perform steps 4 to 7.
Step 4. For each element aj in Bi check among the
neighbors of aj that do not belong to Bi,
if they exceed the maximum percentage
of interunit variation with respect
to aj. That is:
$$|f(a_j) - f(a_i)| > \max[\, b f(a_i),\ b f(a_j) \,]$$
where $a_j \in B_i$, $a_i \in N(a_j)$ and $a_i \notin B_i$.
Call F the set of units that
exceed the maximum percentage of internal
variation and I the complement.
That is:
$$F = \{\, a_i \in N(a_j),\ a_i \notin B_i : |f(a_j) - f(a_i)| > \max[\, b f(a_i),\ b f(a_j) \,] \,\}$$
$$I = \{\, a_i \in N(a_j),\ a_i \notin B_i \,\} \setminus F$$
Step 5. Draw a border between every element
of F and aj.
Step 6. Those units that do not exceed the
interunit variation are added to Bi.
Redefine Bi as Bi U I.
Step 7. If there are elements of Bi that have
not been tested for the percentage of
internal variation, continue with step 3.
Step 8. The resulting set Bi is a basis set.
Step 9. If there are still units that do not
belong to any basis set, then choose
any arbitrary element of this set and
continue to step 2 to create a new basis
set.
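Steps 1 to 9 can be sketched in code for the case of a single variable. The following is only an illustrative sketch, not Brantingham's implementation: the function name brantingham_regions, the dictionary-based inputs and the use of a queue to hold the untested members of Bi are assumptions made here. It also assumes non-negative values and a symmetric contiguity relation.

```python
from collections import deque


def brantingham_regions(values, neighbors, b):
    """Grow basis sets and record borders for one variable.

    values    -- dict: unit -> value of the variable of interest
    neighbors -- dict: unit -> iterable of contiguous units (assumed symmetric)
    b         -- maximum fraction of interunit variation (step 1)
    """
    region_of = {}     # unit -> label of the basis set it belongs to
    borders = set()    # frozenset({u, v}) for every pair separated by a border
    label = 0
    for seed in values:                      # steps 2 and 9: take an unassigned unit
        if seed in region_of:
            continue
        region_of[seed] = label
        untested = deque([seed])             # step 3: members of Bi not yet tested
        while untested:                      # step 7: repeat until all are tested
            aj = untested.popleft()
            for ai in neighbors[aj]:
                if region_of.get(ai) == label:
                    continue                 # ai already belongs to Bi
                diff = abs(values[aj] - values[ai])
                limit = max(b * values[ai], b * values[aj])
                if diff > limit:             # step 4: ai belongs to F
                    borders.add(frozenset((aj, ai)))   # step 5
                elif ai not in region_of:    # ai belongs to I
                    region_of[ai] = label    # step 6
                    untested.append(ai)
        label += 1                           # step 8: Bi is a complete basis set
    return region_of, borders


# Hypothetical data: four units in a row, 20 percent tolerated variation.
values = {0: 10, 1: 11, 2: 20, 3: 21}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
regions, borders = brantingham_regions(values, neighbors, 0.2)
```

In this hypothetical case the only pair whose difference exceeds the limit is (1, 2), so a single border separates two basis sets.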
In this case the basis sets are interpreted as the resulting
regions.
If the pattern that appears from the above procedure
is not satisfactory, a new value for the internal variation is
fixed and the procedure is repeated.
The basis sets are
defined so that the newly defined variation is smaller than
the previous variation used.
This guarantees that once a
border is drawn between two units it will remain as such
through the whole process.
4.3 The Design of Algorithms
From the previous sections it can be seen how cluster analysis
techniques have been used for regionalization purposes.
In
some cases the techniques have been applied without any
modifications, while in others they have been adapted for
spatial analysis purposes.
Considering that one of the interpretations of the
heterogeneity index is as a measure of the degree of
membership of a point to a set, it seems natural to use it in
a region-building process.
In this case the heterogeneity index is assumed to be an
indicator of the potential of an element to belong to the
interior of a region.
The closer the value of the index to 1,
the higher the potential of the element to be an interior
point.
As is common in a regionalization problem, it is assumed that
the elements under consideration are clustered into
homogeneous groups.
That is, every element is either an
interior point of a homogeneous region or is on its border.
4.3.1 A Topological Algorithm
The next objective is to define a procedure that integrates
the heterogeneity index and the grouping process.
A hierarchical method is proposed in which the clustering
criterion is determined by the heterogeneity index.
The
algorithm is designed so that the order in which the elements
are grouped is determined by the index.
Intuitively, those
elements that have a higher potential to be in the interior of
a region should be clustered before the ones that have a high
potential to be on the borders.
The proposed algorithm is
described below:
Def. 1 "A" is the set of "n" areal units into which
the area of study is subdivided. That is:
A = {a1, a2, ..., an}.
Def. 2 A unit that does not belong to any region
is called an "elementary unit."
Def. 3 A region is the union of at least two
elementary units.
Def. 4 A unit is an entity that originally was
elementary but now constitutes part of a
region.
Def. 5 N(ai) is the set of neighbors of entity "ai".
The possible entities are: region, unit
and elementary unit.
Def. 6 The heterogeneity index associated with
a region is similar to the one defined
for elementary units as follows:
where Xij is the value of the jth variable
for the "k" neighboring entities of the region
and Xrj is the value of the variable
associated with the region.
The manner
in which these quantities are calculated
depends on the problem at hand.
Step 1. Select the elementary unit with the highest
value in the heterogeneity index.
Call it ai.
Step 2. Search for the most similar neighbor to ai.
Call it aj.
Step 3. If aj is an elementary unit then group
ai and aj.
Call the region
Rk.
Step 4. If aj is a unit then group ai and the
region that contains aj.
Note that ai
and aj are no longer elementary units.
Step 5. If there are more elementary units left,
return to step 1; otherwise continue with
step 6.
Step 6. If the number of resulting regions is less
than desired then finish the procedure.
Otherwise continue with step 7.
Step 7. Rename the regions as elementary units.
Establish the neighborhood relations for
the new elementary units.
Step 8. Start again with step 1.
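One stage of the procedure (steps 1 to 5) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in this work: the heterogeneity index is taken as a precomputed input, similarity is the absolute difference of a single variable, every unit is assumed to have at least one neighbor other than itself, and the names topological_stage, values, neighbors and index are hypothetical.

```python
def topological_stage(values, neighbors, index):
    """Group elementary units in decreasing order of the heterogeneity index.

    values    -- dict: unit -> value of the variable
    neighbors -- dict: unit -> iterable of neighboring units
    index     -- dict: unit -> heterogeneity index (assumed precomputed)
    Returns a dict: unit -> label of the region it was grouped into.
    """
    region_of = {}      # units absent from this dict are still elementary
    next_label = 0
    # Step 1: the index is fixed within a stage, so one sorted pass visits
    # the remaining elementary units from the highest index downwards.
    for ai in sorted(values, key=lambda u: index[u], reverse=True):
        if ai in region_of:
            continue                       # no longer an elementary unit
        # Step 2: most similar neighbor (smallest absolute difference).
        candidates = [u for u in neighbors[ai] if u != ai]
        aj = min(candidates, key=lambda u: abs(values[ai] - values[u]))
        if aj in region_of:
            region_of[ai] = region_of[aj]  # step 4: join the region of aj
        else:
            region_of[ai] = region_of[aj] = next_label   # step 3: new region
            next_label += 1
    return region_of


# Hypothetical data: six units in a row.
values = {0: 1, 1: 1, 2: 2, 3: 9, 4: 9, 5: 10}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
index = {0: 0.95, 1: 0.8, 2: 0.3, 3: 0.2, 4: 0.9, 5: 0.7}
regions = topological_stage(values, neighbors, index)
```

Steps 6 to 8 would then treat each resulting region as a new elementary unit and call the same function again.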
General Characteristics of the Algorithm
This type of algorithm can be used when the units of interest
are either areas such as census tracts, counties and provinces
of a country or points such as airports in a transportation
network.
The selection of the similarity measure depends on
the problem at hand, but it has to be consistent with the
definition of the heterogeneity index.
That is, if a
neighborhood N "tends" to be in the interior of a region, it
should also be homogeneous with respect to the similarity
measure used.
The heterogeneity index can be redefined to consider this
restriction as follows: Assume S to be the function of
similarity between any two elements.
The heterogeneity index
associated with the neighborhood of unit "a" is of the form:
where s(xi, xa) is a measure of similarity between the value
associated with the ith neighbor and that of unit "a", and k is the
number of neighbors.
This algorithm was designed to assure that the resulting
regions satisfied a contiguity constraint.
However, other
constraints such as that of compactness could be introduced
without altering the basic structure of the algorithm.
The
algorithm is hierarchical, but it differs basically from
others in its clustering criterion.
In this case there are
two clustering criteria: a spatial and a non-spatial one.
The
heterogeneity index determines the order in which the grouping
is going to take place, while the similarity measure
determines which units are to be grouped.
Any of the several
criteria used in hierarchical algorithms as described in
section 4.1.6 could be adapted.
The number of resulting regions obtained by this type of
algorithm is not the same as in a typical hierarchical one.
The number of regions in each stage is determined by the data.
While in typical cases the analyst can choose any number
desired between 1 and n, in this case it can only be selected
from the results.
This constraint can actually be an
advantage, if the analyst has no way of anticipating the
number of regions there may be.
4.3.2 The Regions as Graphs
The structure of the units in a clustering problem can be
viewed as a graph (Anderberg, 1973, p.150).
The nodes of the
graph are the units themselves, and the lengths of the edges
are given by the similarity between the units.
The graph is
complete since all its nodes are adjacent.
The single linkage
method finds the minimal spanning tree; that is, the shortest
tree with (n-1) edges that connects all the nodes (Anderberg,
1973, p.150).
In a regionalization problem the structure of
the data units can also be viewed as a graph.
Again, the
nodes of the graph represent the units.
There is an edge
between two nodes if the units are neighbors, and the length
of the edges is given by the similarity measure.
This graph
is in fact a subgraph of the complete graph G' generated in a
classification problem without a contiguity constraint.
When a single linkage criterion is used in the proposed
algorithm, the resulting regions obtained in a first stage
generate a set of graphs where the nodes represent the
elements that belong to each region and the edges are defined
through the clustering criterion. Each of them is a subgraph
of G'.
Moreover, each one of these subgraphs is a minimal
spanning tree of the subgraph of G' formed by the nodes that
belong to the regions together with their neighboring
relations.
If the algorithm is repeated until all the units are grouped
in a single region, the resulting graph will also be a minimal
spanning tree of G'.
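For reference, a minimal spanning tree of a contiguity graph can be obtained with Kruskal's algorithm, whose successive edge selections correspond to single linkage merges. The sketch below is illustrative only: Kruskal's algorithm is a standard technique brought in here, not the procedure proposed in this chapter, and the function name, the edge-list input and the use of the absolute difference as edge length are assumptions.

```python
def minimal_spanning_tree(values, edges):
    """Kruskal's algorithm on a contiguity graph.

    values -- dict: unit -> value of the variable
    edges  -- iterable of (u, v) pairs of neighboring units
    Returns the list of edges kept in the tree, in order of selection.
    """
    parent = {u: u for u in values}      # union-find forest over the units

    def find(u):
        # Follow parent links to the root, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    # Visit the edges from the shortest to the longest.
    for u, v in sorted(edges, key=lambda e: abs(values[e[0]] - values[e[1]])):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:             # the edge joins two different groups
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical data: four units on a 2 x 2 grid, neighbors share a side.
values = {0: 1, 1: 2, 2: 5, 3: 4}
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
tree = minimal_spanning_tree(values, edges)
```

Stopping the loop once a desired number of groups remains would give a contiguity-constrained single linkage partition instead of the full tree.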
Both the topological algorithm and the original single linkage
method will generate a minimal spanning tree.
The basic
difference is that at any given stage the order in which the
units are grouped is not necessarily the same. A hypothetical
case that exemplifies this is presented in the following
paragraphs.
A Hypothetical Example
To illustrate the issues discussed in this section concerning
the concept of minimal spanning tree, a brief example is
presented.
The purpose of the exercise is to regionalize the
18 areal units shown in figure 4.2.
The areas were grouped
according to two different algorithms:
a single linkage
method with a contiguity constraint and the topological one
presented in section 4.3.1. In both cases two areas are said
to be contiguous if they have at least one segment in common.
The similarity between two areas is given by the absolute
difference.
That is:
$$d_{ij} = |X_i - X_j|$$
where Xi is the value associated with the ith area.
Figure 4.3 shows the resulting regions obtained after the
first stage of the topological algorithm.
Each one of the
subgraphs (one for each region) shown in figure 4.4 is a
minimal spanning tree.
In the second stage of the procedure
all the areal units are grouped into one region.
The minimal
spanning tree generated is shown in figure 4.5.
The dendrogram generated by the usual single linkage method is shown
in figure 4.6.
Since the procedure is hierarchical, in the
final stage all the units are grouped into one region.
The minimal spanning tree generated by this procedure is shown in
figure 4.7.
Figure 4.2 Original Data
Figure 4.3 Resulting Regions (First Stage)
Figure 4.4 Subgraphs and Minimal Spanning Trees
Figure 4.5 Minimal Spanning Tree (Second Stage)
Figure 4.6 Dendrogram
Figure 4.7 Minimal Spanning Tree. Single Linkage Method
As can be appreciated by comparing graph G' (figure 4.8) to
the graphs formed after the first stage of the topological
algorithm (figure 4.4), each one of the latter is a subgraph
of G'.
Analogously, the minimal spanning tree (figure 4.5) is
also a subgraph of G'.
Finally, it should be noted that the
minimal spanning trees generated by the two algorithms
(topological and single linkage) are not necessarily the same.
4.3.3 Heterogeneous Regions
Like homogeneity, heterogeneity may also be used as a
constraint in the definition of regions.
In some instances
researchers seek to identify groups of elements that are
heterogeneous and spatially clustered.
In these cases
measures of dissimilarity and the heterogeneity index can be
used in the design of algorithms, just as similarity and the
homogeneity index were used in the cases mentioned previously.
4.3.4 Fuzzy Regions
In section 4.3.1 a hierarchical algorithm incorporating the
concept of heterogeneity index was proposed.
The index was
used as an indicator of the order in which the units were to
be merged.
There are other ways in which the index can be
used in the design of algorithms.
In this section an
alternative hierarchical algorithm which uses the notion of
fuzzy sets is proposed.
The heterogeneity index associated with the neighborhood of
unit "a" is interpreted as the degree of membership of the
unit to a hypothetical region.
When the decision to group two
units is made, it becomes natural to ask what the degree of
membership of the resulting group is to the hypothetical
region.
Intuitively, the two units that have the higher
degree of membership to a region are the ones that should be
clustered first.
The concept of fuzzy set makes it possible
to assign a value to the degree of membership of the set of
two units to a region.
Given two contiguous units ai and aj, their corresponding
neighborhoods N(ai) and N(aj) and associated heterogeneity
indices Hai and Haj, the degree of membership of the set
{ai,aj} to a hypothetical region R can be defined as follows:
$$H(a_i,a_j) = \min \{ H_{a_i}, H_{a_j} \} \qquad (4.18)$$
The above definition can be extended to the case where the
units are entities as defined in the preceding algorithm,
since both the heterogeneity index and the concept of
neighborhood have been defined for regions (see Def. 6, section 4.3.1).
The terms and assumptions under which the proposed algorithm
is described are exactly the same as the ones presented for
the preceding algorithm except for an additional matrix M'
defined as follows:
mij = H(ai,aj)
where ai and aj are either regions or elementary units as
defined in section 4.2.1.
Step 1. Find the two contiguous entities ai and aj
with the maximum value of H. That is,
find the maximum value in M':
H(ai,aj) = max { H(am,an) : for all pairs of
contiguous units (am,an) }
Step 2. Group entities ai and aj and name the result
ai (i<j).
Step 3. Update the ith row and column of matrix
M', and delete the jth row and column
of matrix M'.
Step 4. If there are elementary units left in M',
then start again with step 1. Otherwise
the remaining elements of M' represent
the resulting regions.
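Steps 1 to 4 can be sketched as follows. This is a minimal sketch, not the implementation used in this work. The names fuzzy_stage and region_index are hypothetical, the entries of the matrix are recomputed from a dictionary at each pass instead of being stored, and, because the formula of Def. 6 is not reproduced here, a merged region is assumed by default to take the lowest index among its members. Unit labels are assumed to be comparable so that the merged entity keeps the smaller label.

```python
def fuzzy_stage(index, neighbors, region_index=None):
    """Merge contiguous entities in decreasing order of H(ai, aj) = min(Hai, Haj).

    index        -- dict: unit -> heterogeneity index
    neighbors    -- dict: unit -> iterable of contiguous units (symmetric)
    region_index -- function: set of member units -> index of the region
    """
    if region_index is None:
        def region_index(unit_set):
            # Assumption: a region takes the lowest index among its members.
            return min(index[u] for u in unit_set)

    members = {u: {u} for u in index}          # entity -> units it contains
    H = dict(index)                            # entity -> current index
    adj = {u: set(neighbors[u]) - {u} for u in index}
    while any(len(m) == 1 for m in members.values()):
        # Step 1: contiguous pair with the maximum degree of membership.
        # Ties are resolved by tuple ordering, which is an arbitrary choice.
        pairs = [(min(H[a], H[b]), a, b) for a in adj for b in adj[a] if a < b]
        if not pairs:
            break                              # nothing contiguous is left
        _, a, b = max(pairs)
        # Step 2: group the two entities and keep the label a (a < b).
        members[a] |= members.pop(b)
        # Step 3: update the relations of a and delete those of b.
        old = adj.pop(b)
        adj[a] = (adj[a] | old) - {a, b}
        for c in old:
            if c != a:
                adj[c].discard(b)
                adj[c].add(a)
        del H[b]
        H[a] = region_index(members[a])
    # Step 4: the remaining entities are the resulting regions.
    return sorted(sorted(m) for m in members.values())


# Hypothetical data: four units in a row.
index = {0: 0.9, 1: 0.8, 2: 0.3, 3: 0.7}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
regions = fuzzy_stage(index, neighbors)
```

Passing a different region_index function is the place where the region index of Def. 6 would be supplied.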
This procedure, like the preceding algorithm, can be repeated
until all the units are clustered in one region.
Given the resulting regions R1, ..., Rk, the heterogeneity index
associated with an element ai in Ri can be interpreted in fuzzy
set terms as the degree of membership of "ai" to the interior
of Ri.
Its complement (1-Hai) can be interpreted as the
degree of membership of "ai" to the border of Ri.
Therefore,
it can be said that the resulting groups are fuzzy regions.
As in many other areas of knowledge, in regionalization
studies, regions are defined so that for every element of the
universe of study it is clearly distinguishable whether or not
it belongs to the region.
There are, however, certain
geographical problems that can benefit from a "fuzzy"
definition of the degree of membership of an element to a
region.
For example, consider an ecological study of an urban
area such as Mexico City.
Although there are no previous
studies of this type for this particular area, an obvious
characteristic of the city is its lack of clear-cut
differences in residential areas as well as in land use.
That
is, it is common to find "border areas" between urban
neighborhoods where middle and low income families or even
high and low income families dwell on contiguous pieces of
land.
Similarly, it is common to find areas with various
simultaneous uses.
Such is the case of zones that are
residential, providers of public services, educational,
medical and industrial.
In a study area with these characteristics a method that
incorporates the definition of fuzzy regions seems to be the
most appropriate choice.
There are precedents for the application of the concept of
fuzzy set in the design of classification algorithms (Dunn,
1974). In the cases Dunn mentions, the result is a set of
regions where each of the elementary units is assigned a
degree of membership to each region.
On the other hand, the
proposed algorithm differs from previous ones, since the
degree of membership is assigned only to the interior and the
border of a given region.
4.3.5 The Heterogeneity Surface
In the design of the previously discussed classification
algorithms it was assumed that the units under study were
clustered in homogeneous or heterogeneous groups.
There are,
however, some instances where researchers do not know the
number of regions or have no previous information on the
spatial patterns of the data.
In these cases they can use the
heterogeneity index to increase their knowledge.
To illustrate the manner in which the heterogeneity index can
be applied in these situations, a heterogeneity "surface" is
defined as follows:
$$F(p) = H_a$$
where p is a point inside the areal unit "a" and Ha is its
associated index.
F is a step function; therefore its graph
is not a continuous surface.
However, for illustrative
purposes it can be said that a "basin" in
the graph is a set
of contiguous areal units that have a value of F close to
zero, and "ridges" are areal units with values close to one.
This graph can provide some significant information.
For
example, basins indicate the presence of the interior of a
region, and ridges indicate boundary points.
If it is assumed
that the elements are clustered into regions, then the graph
must be composed of basins surrounded by ridges.
In such
cases F can be used to estimate the number of resulting
regions.
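One way to make this estimate operational is to count the basins directly. The sketch below is an illustration under stated assumptions: a basin is taken to be a maximal set of contiguous units whose value of F lies below a cut-off chosen by the analyst, and the names count_basins and threshold are hypothetical.

```python
def count_basins(F, neighbors, threshold):
    """Count groups of contiguous units whose F value is below the threshold.

    F         -- dict: unit -> value of the heterogeneity surface
    neighbors -- dict: unit -> iterable of contiguous units
    threshold -- cut-off below which a unit is treated as part of a basin
    """
    low = {u for u in F if F[u] < threshold}
    seen = set()
    basins = 0
    for start in low:
        if start in seen:
            continue
        basins += 1                  # a basin not visited before
        seen.add(start)
        stack = [start]
        while stack:                 # flood-fill the contiguous low units
            u = stack.pop()
            for v in neighbors[u]:
                if v in low and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return basins


# Hypothetical data: five units in a row with a ridge at unit 2.
F = {0: 0.1, 1: 0.05, 2: 0.9, 3: 0.1, 4: 0.2}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
n_basins = count_basins(F, neighbors, 0.5)
```

The count depends on the cut-off, so it is best read as a rough guide to the number of regions rather than as a definitive value.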
F may also be used in cases where the area under study is
composed of both homogeneous and heterogeneous regions.
The "highlands" in F indicate the presence of heterogeneous
regions in the same manner as "lowlands" indicate the
existence of homogeneous ones.
The graph of F can therefore
be used in the identification of such regions.
In other cases the lack of pattern in the heterogeneity
surface might indicate an absence of regions in the study
area.
Finally, the possibility of using the heterogeneity surface in
multivariate cases is worth mentioning.
When the objective of
regionalization is to obtain homogeneous regions with respect
to more than one characteristic, the first question that
arises is whether it is possible to obtain homogeneous regions
with respect to all the variables simultaneously.
For each one of the variables involved, let Fi be the function
associated with the ith variable.
A quick look at a pair of
graphs can aid the analyst in deciding on the compatibility of
the variables.
If the two graphs show that basins and ridges
coincide, then it may be concluded that the variables are in
fact compatible.
However, if for one variable there "tend" to
be basins where there are ridges in the other, grouping the
areas using both variables simultaneously is intuitively
ruled out.
4.4 A Comparative Example
In section 4.3.1 a modified agglomerative single linkage
algorithm was presented.
As mentioned before, the main
difference between the usual single linkage method and the
topological algorithm lies in the order in which the units
are aggregated.
Besides this obvious difference there are
others that are derived from the use of a neighborhood
approach.
To exemplify these ideas as well as to test the
performance of the topological algorithm, the method was
applied to a hypothetical case and the results were compared
with those of a contiguity-constrained single linkage method.
4.4.1 A Hypothetical Case
The hypothetical case is defined on a regular area subdivided
into 240 units as shown in figure 4.9. The classification of
the units is based on three variables and the values of each
variable were assigned under the assumption that the area is
formed by homogeneous regions considering each of them
separately and simultaneously.
That is, if each one of the
variables were to be represented on a map, homogeneous regions
would be present.
Moreover, if the three above-mentioned maps
were combined, the resulting map would also be formed by
homogeneous regions.
The values were assigned so that
variable "A" was used as a basis.
Therefore in the first
stage the values of variable "A" were defined as shown in
figure 4.10.
The values of variable "B" were defined so that
the regions formed under it were sub-regions of variable "A"
(see figure 4.10). Finally, the values of variable "C" were
assigned in such a way that its "border zones" did not
necessarily coincide with those of variables "A" and "B" (see
figure 4.10).
Figure 4.9 Study Area
One of the central issues in the application of a topological
algorithm is the definition of the neighborhood relationship.
The advantage of using a regular structure is the ease of
implementation of the algorithm, since the neighborhood of
each of the areal units is explicitly defined.
In this case
it was decided to define the neighborhood of each of the units
as the set of bordering units together with the unit itself.
The areas that are connected to a unit solely by a point were
excluded to avoid regions linked through points.
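On a regular grid this neighborhood definition can be generated mechanically. The following sketch is illustrative: the function name and the identification of units by (row, column) pairs are assumptions, and the grid dimensions are left as parameters because the layout of the 240 units is not stated in the text.

```python
def rook_neighborhoods(rows, cols):
    """Neighborhood of every cell of a rows x cols grid.

    Each neighborhood holds the cell itself plus the cells that share a
    side with it; cells touching only at a corner are excluded.
    """
    hoods = {}
    for r in range(rows):
        for c in range(cols):
            hood = {(r, c)}                      # the unit itself
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    hood.add((rr, cc))           # unit sharing a side
            hoods[(r, c)] = hood
    return hoods


# Hypothetical 3 x 3 grid.
hoods = rook_neighborhoods(3, 3)
centre = hoods[(1, 1)]     # five units: the centre and its four side neighbors
corner = hoods[(0, 0)]     # three units: the corner and its two side neighbors
```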
Two regionalization algorithms were applied to this
hypothetical problem: the topological algorithm as described
in section 4.3.1 and a contiguity-constrained single linkage
method that follows the general format described in section
4.2.2.
Both algorithms are hierarchical and use the same
grouping criterion but differ in other aspects.
The
topological algorithm is based on the neighborhood approach,
which is reflected in the algorithm through the inclusion of
the heterogeneity index concept,
while the other algorithm
follows the traditional approach where the area is assumed to
be formed by units rather than by geo-subspaces or
neighborhoods.
The Topological Algorithm
The first step in the application of the algorithm to the
multivariate hypothetical case is the calculation of the
standardized heterogeneity index using equations 3.1 and 3.2.
In this case, the heterogeneity index can be interpreted as
the degree of membership of a unit to the interior of a
region.
The closer the value to one, the higher the degree of
membership to the region.
The multivariate heterogeneity
index was calculated according to equation 3.7 as shown in
figure 4.11. Again, the closer the value of the index to
three, the higher the degree of membership of a unit to a
region formed according to the three variables.
The heterogeneity index plays a fundamental role at this stage
of the analysis in the definition of the "grouping pattern".
Assume, for example, that one of the variables of interest is
the column-wise position of each of the units.
That is, all
the units that are positioned in the ith column are assigned
a value of "i".
The value of the heterogeneity index for an
interior unit is a constant.
That is, the degree of
membership of all these units is exactly the same, so that
there is no grouping pattern. Since the algorithm is designed
to be applied to problems where a grouping pattern exists, it
is not advisable to include in the classification a variable
such as the one previously described.
An additional criterion was added to the algorithm to solve
cases where two or more neighbors satisfy the grouping
criterion.
If two or more neighbors are at the same distance
from a unit, then the one selected to be grouped is the one
that satisfies the following conditions: a) it already belongs
to a region and b) it is the first clockwise neighbor.
First Stage.-
Under the assumptions described above, the algorithm was
applied, obtaining as a result the regions shown in figure
4.12.
It should be remembered that the number of resulting
regions in this type of algorithm is part of the results of
the procedure.
That is, contrary to a usual hierarchical
method where the analyst has to decide on the number of
resulting regions, in this method the number of regions is
determined through the heterogeneity index.
In this
application, the number of resulting regions from the first
stage is 69.
Figure 4.11 Multivariate Heterogeneity Index
Since the heterogeneity index can be interpreted as the degree
of membership of a unit to a region,
the result of the
classification procedure is not restricted to the definition
of each of the regions but allows the analyst to gain
information on the "interiority"
of a unit.
In this sense,
the resulting regions are fuzzy.
For example, both units 19
and 48 belong to the same region.
However, according to their
heterogeneity indices, unit 19 has a higher degree of
membership to the region than unit 48. To ease the reading of
this measure, the multivariate heterogeneity index was
standardized as shown in figure 4.13.
In it, the degree of
membership of each of the units to the defined regions can be
clearly appreciated.
The closer the value to one, the higher
the degree of membership, i.e. the more interior to the region
the unit is.
Analogously, the closer the value of the unit to
zero, the lower the degree of membership, i.e. the unit is
characterized more as a "border" element.
It should be noted
that "border" units are not necessarily always on the actual
border of a region.
Second Stage.-
It is possible to iterate this type of algorithm in order to
obtain a smaller number of regions, as described in section
4.3.1. The 69 resulting regions from the first stage were
re-named, values for the three variables were assigned to the
new elementary units and a neighborhood relationship was
established.
This task can be accomplished in several ways.
In this case, the chosen procedure was to assign to each new
unit the mean value of the elementary units that belonged to
it.
This procedure was repeated for the three variables as
shown in figure 4.12.
Figure 4.12 Topological Algorithm (First Stage). Variables A, B and C; Standardized Heterogeneity Index
The topological algorithm was applied to the 69 new units and
as a result 21 regions were formed as shown in figure 4.14.
Again, in this case, the heterogeneity index associated with
each elementary unit can be interpreted as the degree of
membership to the region.
For example, the units formed with
original units {2, 3, 4, 5, 18} and {17, 19, 20, 31, 32, 33,
34, 35, 48, 49, 58} belong to the same region; however, it can
be stated that the first unit is more an interior element of
the region than the second one, as can be appreciated from the
heterogeneity index values shown in figure 4.12.
There are other manners in which the values of the variables
and the neighborhood relationship can be defined.
For
example, instead of using the mean value, the minimum, the
maximum or a weighted mean could have been considered, and even
the original values of the elementary units could have been
preserved.
In this last case, the neighbors of a region could
be defined as the set of original elementary units that have a
border in common with it, and the distance between the regions
could be calculated considering the elementary units in the
border of each region.
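The renaming step chosen here, mean values plus inherited neighbor relations, can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the example: the function name aggregate and the dictionary inputs are hypothetical, and two regions are taken to be neighbors whenever any of their member units are neighbors.

```python
def aggregate(region_of, values, neighbors):
    """Turn the regions of one stage into the elementary units of the next.

    region_of -- dict: unit -> label of the region it belongs to
    values    -- dict: unit -> tuple with one value per variable
    neighbors -- dict: unit -> iterable of neighboring units
    Returns (new_values, new_neighbors), both keyed by region label.
    """
    members = {}
    for unit, label in region_of.items():
        members.setdefault(label, []).append(unit)

    n_vars = len(next(iter(values.values())))
    # Mean of every variable over the units that belong to each region.
    new_values = {
        label: tuple(sum(values[u][j] for u in units) / len(units)
                     for j in range(n_vars))
        for label, units in members.items()
    }
    # Regions are neighbors when at least one pair of their units is.
    new_neighbors = {label: set() for label in members}
    for unit, label in region_of.items():
        for other in neighbors[unit]:
            if region_of[other] != label:
                new_neighbors[label].add(region_of[other])
    return new_values, new_neighbors


# Hypothetical data: three units, two variables, two regions.
region_of = {0: "R1", 1: "R1", 2: "R2"}
values = {0: (10.0, 2.0), 1: (20.0, 4.0), 2: (5.0, 1.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
new_values, new_neighbors = aggregate(region_of, values, neighbors)
```

Replacing the mean by the minimum, the maximum or a weighted mean only changes the expression that builds new_values.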
Figure 4.14 Topological Algorithm (Second Stage). 21 Resulting Regions
The Single Linkage Method
To illustrate the difference between the results obtained by
using the topological algorithm and those of other existing
methods, a contiguity-constrained single linkage procedure was
applied to the same problem.
Since the main purpose was to evaluate the
differences due to the inclusion of the heterogeneity index,
the similarity measure used was also the Euclidean distance
and the contiguity relations and grouping criterion were
preserved.
The single linkage method is a hierarchical procedure where,
given "n" elementary units, there are "n-i" regions in the ith
step.
To have results comparable to those obtained via the
topological algorithm, the procedure was stopped at the 171st
and 219th steps.
The 69 and 21 resulting regions are shown in
figures 4.15 and 4.16 respectively.
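The stopping steps follow directly from the relation between the step number and the number of regions. Writing k_i for the number of regions after step i (a symbol introduced here only for this check), with n = 240 units:

```latex
k_i = n - i, \qquad
k_{171} = 240 - 171 = 69, \qquad
k_{219} = 240 - 219 = 21
```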
A Comparison
Comparison of regionalization algorithms can be undertaken
focussing on different aspects and at various levels.
Murtagh
(1985), for example, compares contiguity-constrained algorithms
under two main aspects: their computational performance and
the differences derived from the inclusion of a contiguity
constraint.
The purpose of the comparison between the
topological algorithm and the single linkage one is to look
into the differences derived from applying a neighborhood
approach to the design of regionalization procedures.
Figure 4.16 Single Linkage Method. 21 Resulting Regions
Both algorithms are hierarchical and follow a single linkage
grouping criterion; however, the order in which the units are
grouped is not necessarily the same. Therefore, the spatial
structures portrayed in the resulting regions are not
necessarily equal.
This fact points to a fundamental issue in the formalization
of regionalization procedures.
It was mentioned before that
one of the advantages of using a mathematical framework in
classification problems is that once an algorithm is
established, the solution becomes reproducible.
However, it
should also be considered that the design of the algorithm
depends on the analyst's knowledge and understanding of the
problem at hand.
Therefore, the resulting regionalization
depends on the assumptions made in the design of the algorithm
and the different regionalizations obtained from different
procedures can be explained under these considerations.
Comparison of the regionalizations obtained by both algorithms
shows that different aspects of the spatial structure of the
data emerge from the two procedures.
While the regions of the topological
algorithm have a tendency to be small and compact and
there are no isolated elementary units, the single linkage
method tends to produce larger regions as well as single-unit
regions.
On the other hand, while in the single linkage
method the regions are well-defined, the resulting groups from
the topological algorithm are "fuzzy regions".
Finally, it should be remembered that since both algorithms
are hierarchical, it is possible to continue the process until
all the units are grouped into one single region.
However, it
was considered, for this particular example, that the
comparison of results at the two presented stages satisfied
the purpose of the exercise, and therefore no further stages
were implemented.
4.4.2 A Topological Ward Algorithm
Two of the most commonly used algorithms in regionalization
problems are the contiguity-constrained Single Linkage and
Ward Methods.
The Ward Method is a hierarchical procedure
where the grouping criterion as described in section 4.1.3 is
defined so that the increase in the within group variance as
defined in equation 4.6 is minimized.
Just as the
neighborhood approach was applied to a single linkage method,
it is possible to include the heterogeneity index concept in
the design of a Ward-type algorithm.
The algorithm is similar to the one proposed in section 4.3.1
except that in this case step 2 has to be modified as follows:
Step 2. Search for the neighbor of ai such that
the increment of the error sum of squares
as defined in equation 4.7, is the smallest.
Call it aj.
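The modified step 2 can be illustrated with a short sketch. Equation 4.7 is not reproduced in this section, so the code assumes the usual Ward increment of the error sum of squares for merging two groups, n_a n_b / (n_a + n_b) times the squared Euclidean distance between the group means. The function names and the dictionaries `sizes` and `means` are hypothetical.

```python
def ess_increase(size_a, mean_a, size_b, mean_b):
    """Increase in the error sum of squares caused by merging groups a and b."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return size_a * size_b / (size_a + size_b) * sq_dist

def best_neighbor(region, neighbors, sizes, means):
    """Return the neighbor whose merge with `region` gives the smallest increase."""
    return min(
        neighbors,
        key=lambda other: ess_increase(
            sizes[region], means[region], sizes[other], means[other]
        ),
    )
```

Note that under this criterion the sizes of the groups matter as well as the distance between their means, so the selected neighbor is not necessarily the closest one.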
Again the main difference between the usual
contiguity-constrained Ward method and the topological one, is
the order in which the units are grouped.
4.4.3 Conclusions
There are many classification algorithms that can be applied
for regionalization purposes, and the selection of the
appropriate procedure depends on the knowledge and
understanding the analyst has of the problem at hand.
This
knowledge is reflected in every decision made regarding the
different elements that are involved in the design of the
algorithm.
For example, in the case of the two algorithms
that were compared, the single linkage and the topological,
the introduction of the heterogeneity index had a significant
effect on the results.
The conclusions that can be drawn from the exercise are that
the algorithms were designed to discover
different aspects of the spatial structure and that the single
linkage method is a "sensitive model" to changes in the
grouping order.
In addition, the different manners in which the two algorithms
group the units clearly point to the importance of an
adequate selection of parameters.
Although the most evident
border lines are identified by both procedures, the final
geometric patterns are clearly different.
If the analyst is
interested in avoiding single-unit regions and regions
dissimilar in size, then a topological algorithm would be an
appropriate approach to a problem similar to the one presented
here.
Moreover, if the analyst needs to measure the degree of
interiority within a region, a topological algorithm would
have to be applied since it is the only existing procedure
that provides this type of information.
To better understand the issues involved in the use of a
classification scheme with regionalization purposes, it should
be remembered that the design of a regionalization algorithm
can be viewed as a modeling process.
In the case of the
neighborhood approach the point of departure is the intuitive
notion that certain aspects of the geographical landscape can
be adequately represented through the topological concept of
neighborhood.
These notions are formalized through the
heterogeneity indices and applied in the design of algorithms.
In this case the form of the algorithms is intimately related
with that of a more general model, where the main elements of
study of the spatial structure are geo-subspaces.
It can
therefore be stated that the application of a topological
algorithm is of interest where Galton's component is an
important factor in the analysis of the problem.
Chapter 5.
AN APPLICATION TO EDUCATIONAL PLANNING
5.1 Introduction
In general, the heterogeneity index may be interpreted as a
measure of the local variation of the geographical landscape.
In particular, the study of the heterogeneity of geo-subspaces
may be used as an aid to solve planning problems.
In this
chapter an application of a neighborhood model in an
educational planning environment is presented.
Background
This application is part of a major project of the Mexican
government to provide planning agencies with cartographic
products and technical support in all aspects related to the
geographic information required for their activities.
The 25-million-student education system was selected for two
reasons.
First, it is one of the current Mexican
administration's highest priorities.
Second, geographic
information has not yet been systematically applied to
educational planning in Mexico.
One of the main concerns of Mexican educational planners is
the location of present and future school services.
In the
past, the decisions made by the government regarding the
location of schools have not included spatial criteria.
However, a geographic information system for educational
planning purposes is currently being developed by the Ministry
of Education.
It is expected that 1986 will be the first year
in which spatial criteria are incorporated into the
decision-making process.
Some methods to solve school location problems have been
developed by the World Bank and the International Institute for
Educational Planning.
However, spatial models and methods
such as those presented in this thesis are believed to improve
the solutions provided by previously used methods.
5.2 School Location Planning
Many different models and methods have been developed for
educational planning purposes.
Most of them have been applied
to regional or national planning, but little emphasis has been
given to spatial aspects.
For example, models and methods for
projecting school enrollments and manpower requirements are
often presented at a national level (Davis, 1980), although
maps and some spatial criteria have been included for local
planning purposes.
School location planning and area planning are the names that
several authors (Davis and Schefelbein, 1980; Gould, 1978)
have given to the set of administrative policies, models and
methods that are used "to plan the distribution, size and
spacing of schools" (Gould, 1978, p. 2).
According to Davis and Schefelbein (1980), the basic purposes
of educational planning for areas are:
- To assess the outreach, or coverage, and
distribution of educational services to
population in areas within a nation state.
- To compare the coverage between and among the
areas, usually on the basis of the percentages
of the relevant population receiving service.
- To compare the coverage of the area with
national norms, standards or plan targets.
- To inventory facilities and resources
allocated to programs in the areas.
- To plan the provision of educational services
so as to expand coverage, enhance equity in the
coverage, and to improve the efficiency and
effectiveness of educational services in the
areas.
5.2.1 Models and Methods
Maps are the spatial models that are most commonly used in
area planning.
Usually, various indicators are mapped to
study their spatial distribution.
Which indicators or variables are represented on a map depends
on the depth of the analysis.
Indicators and variables can be
classified into three major groups:
1) basic indicators such
as population, enrollments and school services;
2) efficiency
and effectiveness indicators such as enrollment ratios and the
percentage of enrollees who graduate; and 3) complementary
variables such as topography, highways and trails, and
potential usage of soil.
The first group of inventory indicators aids the planner in
gaining knowledge of the spatial location of educational
services.
The second group allows the planner to make
comparisons among areas.
The third group of complementary
variables provides information necessary to understand the
spatial behavior of the previous two.
Another group of
indicators and spatial variables provides guidelines for the
location of new school services.
Examples are threshold
population density and range measurements.
Threshold
population density represents the minimum total population
necessary for establishing a school, and range is the maximum
distance children are expected to travel to school (Gould,
1978).
5.2.2 Administrative Policies
In addition to the specific procedures developed for school
location planning, strong emphasis has been given to the
administrative policies that would lead to successful
implementation of a plan.
Gould (1978) gives a detailed description of the role of both
central authorities and local officials according to the World
Bank's guidelines for school location.
Central authorities, through their national ministries, are
expected to do the following: provide norms for the sizes and
costs of schools and classrooms; establish construction
standards;
ensure that adequate data is compiled for area
diagnosis; administer the allocation of resources among the
various regions; and analyse spatial patterns of the services
schools provide.
Local officials are expected to apply the
norms established by the ministries and to provide the data
required by the central authorities.
School location planning has often been applied in Third World
countries.
UNESCO and the International Institute for
Educational Planning, for example, have undertaken studies of
school location planning in Costa Rica (Hallak, 1975), Sri
Lanka (Guruge and Ariyadasa, 1976) and Uganda (Gould, 1973).
5.3 Educational Planning in Mexico
5.3.1 Historical Background
In order to understand the relevance of school location
planning in the overall context of Mexican educational
planning, historical perspective is important.
The
promulgation of the 1917 Constitution and the Lopez Mateos
Eleven-Year Plan are two events of this century considered by
experts as crucial in the development of Mexican education.
When the 1910 Mexican Revolution ended in 1917, a new
Constitution was promulgated.
Article 3 stipulated that
education be compulsory, secular and free for all Mexicans.
More than 40 years later, during the administration of Adolfo
Lopez Mateos (1958-64), an eleven-year plan was developed and
partially carried out.
The main goal of the plan was to completely satisfy the demand
for elementary education throughout the country.
To achieve this there were several major programs.
First, there was
massive construction of classrooms. Between 1958 and 1964,
21,000 classrooms were built at the rate of "... one
classroom every two hours..." (Solana et al., 1981).
Second, a National
Commission in charge of publishing free textbooks for all
elementary school students was organized.
In addition,
important curricular changes were carried out.
Finally,
special attention was given to programs for the in-service
training of elementary school teachers (Solana et al., 1981).
Like Lopez Mateos, virtually all post-Revolutionary Mexican
administrations have dedicated considerable human and
financial resources to elementary education.
This certainly
does not mean that other levels of education have been
abandoned. However, the main targets have been the lower
levels.
5.3.2 Planning Experiences
Before 1970, an educational planning agency did not exist
within the Mexican government.
It was not until the
administration of Luis Echeverria (1970-76) that an
organization for educational planning was formally established
within the Ministry of Education (Secretaría de Educación
Pública).
As would be expected, one of the main goals of the planning
agency was to assure every school-age child access to
elementary education.
For this purpose various quantitative analyses were
undertaken, and in some cases sophisticated mathematical
models were used to predict, among other things, the flow of
students through the lower levels of education.
In order to use quantitative techniques it was necessary to
develop an information system.
This system would contain
reliable, up-to-date data on the number of students and
teachers at the various educational levels as well as data
describing the schools' physical resources.
During the following administration of President José López
Portillo (1976-82), the general tendencies in educational
planning remained the same.
It has only been in the last
three years that planners have started looking more closely
into the quality of education.
This is probably a natural
consequence of the considerable progress the country has made
in quantitative terms (see Figure 5.1).
Figure 5.1
Elementary level: number of schools (in hundreds) and number of
students (in millions), by year
Note: The Mexican educational system comprises various levels.
Among them are elementary, secondary and preparatory levels.
Children are expected to enter elementary school at the age of
6 or 7 and remain there for six years. The following two
levels are secondary and preparatory, with a duration of three
years each. There are different types of secondary and
preparatory schools (technical, general, etc.).
As Figure 5.1 shows, the number of schools has increased
throughout the century.
The early demand for elementary
schools was so great that establishing one almost anywhere was
beneficial to the local community and to the country.
Now
however, the precise location of a school has become
particularly important.
Today's density of schools obliges
the planner to make a detailed and accurate study of the
geographic distribution of resources before making decisions.
Another factor which has become important is the need to
ensure the coordinated growth of the educational levels.
There is no point in building a secondary school where there
is an insufficient flow of students from the elementary
schools or if migration affects the school-age population
significantly.
Geographic information is essential to permit
rational planning of these educational issues.
The use of geographic information in educational planning in
Mexico has not been systematic, but there is increasing
awareness of the need to support educational planning with
spatial analysis methods.
5.4 A Case Study of the State of San Luis Potosi
As mentioned in the previous section, the Mexican government's
investment in education has been mainly directed to the
elementary level.
However, the government has dedicated
considerable attention to the development of the secondary
system during the last six years, especially in the state of
San Luis Potosi.
As can be observed in figures 5.2 and 5.3,
the distribution of both elementary and secondary schools
throughout the area is reasonably uniform;
that is, the
growth of both systems has apparently been coordinated.
This
pattern is not maintained at the next level of education.
The
number of preparatory schools capable of receiving the flow of
secondary graduates is evidently insufficient, as can be seen
in figure 5.4.
The Central Planning Office has become aware
of this problem and has decided to alleviate it through the
allocation of additional resources.
In this section a neighborhood model is proposed as an aid in
the selection of sites for the location of preparatory schools
and in school location planning in general.
Although the
technique is presented within a specific context, an analysis
of the different alternatives for the allocation of additional
resources (such as the establishment of a new school) is not
undertaken and no particular solutions are proposed.
However,
in the presentation of the model, some possible
interpretations are indicated to point out potential uses of
this tool in an educational planning environment.
Figure 5.2
Distribution of elementary schools
in the state of San Luis Potosi.
Figure 5.3
Distribution of secondary schools
in the state of San Luis Potosi.
Figure 5.4
Distribution of preparatory schools
in the state of San Luis Potosi.
5.4.1 The Data
Originally the state of San Luis Potosi had been selected as
the area of study on the basis of the availability of data.
Studies at the local level require disaggregated data.
At a
settlement level, census data has only been processed for a
few areas, among them San Luis Potosi.
In order to test the proposed model in a reasonable period of
time only a portion of the study area was selected.
This area
includes eight counties in the northern part of San Luis
Potosi as shown on figure 5.5.
This area is interesting
because of its apparently "heterogeneous landscape".
The two main sources of information were the Ministry of
Education and the National Institute of Statistics, Geography
and Informatics.
The Ministry has established a nationwide
information system with detailed data on its own human and
material resources as well as on the country's students.
Every elementary, secondary and preparatory school in the
country is registered, and information is stored on the number
of enrollees per group; the total number of groups, teachers
and classrooms; the estimated capacity and location of
schools; the number of students that have graduated, passed to
the following grade, failed or dropped out.
Figure 5.5
State of San Luis Potosi
The National Institute of Statistics, Geography and
Informatics is in charge of the national census and of the
production of diverse cartographic products at a national
level.
5.4.2 The Geo-Space
The entities of interest to the present study are the
settlements inside the selected area that have a secondary
and/or a preparatory school.
The flow of graduates among
these entities is assumed to depend on their spatial
relationship.
In this case the spatial factor considered
decisive was the distance a student has to travel to attend
school ("traveling distance").
Graphs were the mathematical models that were considered
appropriate to represent both the entities and their
relationships.
Each node in the graph represents a
settlement, and the links between them are defined through the
spatial relation of traveling distance.
The Traveling Distance Network
In order to determine the traveling distance between any two
settlements of the geo-space, a network was defined through
the existing communication network.
Although the feasibility
of traveling from one town to another depends on the physical
characteristics of the terrain, it was assumed that the
highway and trail system could provide enough information to
compensate for these differences.
The network was defined with the aid of topographic maps
(scale 1:250,000) as follows:
a link was established between
any two settlements whenever they were connected by any type
of road without passing through a third settlement.
The
resulting network is shown in figure 5.6.
According to the definition of the network, each link
represents a road connecting two settlements.
Since the type
of road is an important factor to be considered in terms of
the ease of transportation, each road was divided into at most
two representative sections.
For example, the road between
two settlements could be composed of a section of highway
together with a section of trail. At most two weights were
attached to each link according to the type of road of its
sections.
The five types of roads together with their weights are:
paved (1), unpaved (1.5), trail (2), unpaved road or
trail in a mountainous area (3) and footpath (4). These
weights were determined by a reliable informant familiar with
the area of study, so the values assigned to the roads are, to
some extent, subjective.
The traveling distance between two settlements is calculated
using the network as follows:
Let ti and tj be two settlements that are connected through a
link. The traveling distance between them, Dij, is:
\[ D_{ij} = w_{ik} d_{ik} + w_{kj} d_{kj} \]
where wik, wkj and dik, dkj are the weights and distances of the
two sections of the link between settlements ti and tj.
If ti and tj are not linked, the traveling distance is
calculated as the sum of the distances along the shortest path
between ti and tj.
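A minimal sketch of this calculation follows. It uses the road-type weights quoted above; everything else is assumed for illustration: the function names, the representation of a link as a list of (road type, length in km) sections, the treatment of roads as two-way, and the use of Dijkstra's algorithm to find the shortest path, since the text does not say which shortest-path method was used.

```python
import heapq

# Road-type weights quoted in the text.
ROAD_WEIGHTS = {"paved": 1.0, "unpaved": 1.5, "trail": 2.0,
                "mountain": 3.0, "footpath": 4.0}

def link_distance(sections):
    """Weighted length of one link; sections holds at most two (road_type, km) pairs."""
    return sum(ROAD_WEIGHTS[road] * km for road, km in sections)

def traveling_distances(links, source):
    """Shortest weighted distance from source to every reachable settlement.

    links -- dict: (settlement_a, settlement_b) -> list of (road_type, km)
    """
    graph = {}
    for (a, b), sections in links.items():
        d = link_distance(sections)
        graph.setdefault(a, []).append((b, d))
        graph.setdefault(b, []).append((a, d))  # roads are assumed two-way

    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist
```

For instance, a link made of 4 km of paved road and 3 km of trail has a traveling distance of 1 x 4 + 2 x 3 = 10, which is shorter than a direct 5 km footpath weighted at 4 x 5 = 20.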
5.4.3 The Geo-Subspaces
From a spatial point of view the flow of secondary school
graduates between settlements is one of the relevant factors
to be considered in the allocation of new schools.
The
subsets of settlements that have the possibility of
interacting through the flow of students are therefore part of
the geo-subspaces of interest in the present problem.
With
this idea in mind a neighborhood of each of the nodes was
defined through the traveling distance.
For a given maximum traveling distance dm, the neighborhood of
the node ti, N(ti), is the set of nodes whose traveling
distance to ti is less than or equal to dm:
\[ N(t_i) = \{\, t_j : D_{ij} \le d_m \,\} \]
The maximum traveling distance can be fixed at different
values.
Consequently, a network can be associated with each
one of them as follows:
the nodes of the network are the set
of settlements of the geo-space, and a link exists between any
two of them if they are neighbors.
In order to test different maximum traveling distances a small
computer system that allows the user to postulate different
distances and to plot the resulting networks was implemented.
Figures 5.7, 5.8, 5.9 and 5.10 show the results obtained by
testing different values for the maximum traveling distance
for the study area of San Luis Potosi.
Each of the networks shows a different spatial pattern
according to the fixed distance.
For example, for the 25 km
threshold, most of the settlements are clustered in two large
networks and only a few of them appear isolated.
In contrast,
in the 10 km case most of the settlements are isolated. The
percentages of settlements without neighbors for each of the
distances considered are 60%, 26.42%,
13.57% and 5.71% for
the 10, 15, 20 and 25 km networks respectively.
These
observations allow the planner to better understand the
spatial dispersion of the units in the area.
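The construction of the neighborhoods and the percentage of isolated settlements can be sketched as follows. This is not the computer system mentioned above, only an illustration; the function names and the nested dictionary of traveling distances are assumptions. A settlement is not counted as its own neighbor here, so that the size of each set matches the number of neighbors of the node.

```python
def neighborhoods(distances, d_max):
    """Map each settlement to the set of other settlements within d_max.

    distances -- dict: settlement -> dict of traveling distances to the others
    """
    return {
        s: {t for t, d in row.items() if t != s and d <= d_max}
        for s, row in distances.items()
    }

def percent_isolated(hoods):
    """Percentage of settlements that have no neighbors."""
    isolated = sum(1 for members in hoods.values() if not members)
    return 100.0 * isolated / len(hoods)
```

Calling `neighborhoods` with several values of `d_max` on the same distance table gives one network per threshold, which is the comparison made in the text.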
Besides the measures commonly used in area planning, (number
of students per classroom, enrollment ratios, etc), some of
the specific characteristics of these networks can be included
as criteria for the location of schools.
The percentage of
isolated units and the number of neighbors of each node are
indicators of the degree of communication of each of the
settlements.
Table 5.1 shows the values of these quantities
associated with each settlement for the different
distances.
The settlements are identified in the table by
their code, as shown in table 5.3.
Catchment Areas
The catchment area associated with a school is simply the area
served by it.
In defining a catchment area, factors such as
transportation facilities and terrain are considered.
Since the interaction relationships are
explicitly represented through the set of links in this case,
each of the networks can be used in the definition of the
catchment areas.
Figure 5.11
shows the catchment areas of
each of the existing preparatory schools, assuming a maximum
traveling distance of 20 km.
The lack of service at the preparatory level was quantified
using the number of secondary graduates, the capacity of each
of the preparatory schools and the catchment areas.
Table 5.1
Number of Neighbors by Distance
(number of neighbors of each settlement, identified by its code,
in the 10, 15, 20 and 25 km networks)
A measure
of the coverage of each of the settlements with preparatory
schools was calculated as follows:
\[ L_j = \frac{\left(\sum_{i=1}^{k} X_i\right) + X_j - C_j}{\left(\sum_{i=1}^{k} X_i\right) + X_j} \]
where Lj is the lack of service associated with the jth
settlement, Xi is the number of secondary school graduates of the
ith settlement in a fixed year, k is the number of neighbors
of the jth settlement and Cj is the capacity of its
preparatory schools.
Table 5.2 shows the results obtained
using a traveling distance of 20 km and 1984 data.
In this
case the lack of service for Charcas is a negative number.
This means that there is a surplus in the service for this
particular settlement.
Settlement    Ci           Li
Charcas       C1 = 392     L1 = -0.0481 (-4.81 %)
Matehuala     C2 = 1242    L2 = 0.1573 (15.73 %)
Table 5.2 Service of preparatory
schools for a 20 km traveling distance
(1984 data)
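The lack-of-service measure can be written as a small function. This is a sketch under the assumption that the demand of a settlement is the sum of its own secondary graduates and those of its neighbors, and that the measure is the share of that demand not covered by the capacity of its preparatory schools. The function name and the figures in the usage note are hypothetical and are not taken from the thesis data.

```python
def lack_of_service(own_graduates, neighbor_graduates, capacity):
    """Share of the demand of a settlement not covered by its preparatory schools.

    own_graduates      -- secondary graduates of the settlement itself
    neighbor_graduates -- list with the graduates of each neighboring settlement
    capacity           -- capacity of the settlement's preparatory schools
    A negative result means that capacity exceeds demand (a surplus).
    """
    demand = own_graduates + sum(neighbor_graduates)
    return (demand - capacity) / demand
```

With made-up figures, a settlement with 100 graduates of its own, two neighbors contributing 30 and 20, and a capacity of 120 places has a lack of service of (150 - 120) / 150 = 0.2, that is, 20% of the demand is not covered.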
5.4.4 Some Spatial Characteristics of the Demand for
Preparatory Schools
One of the factors that is considered crucial in the location
of preparatory school services is the spatial distribution of
the demand.
In this case the study of the demand was carried
out using the number of secondary school graduates for three
consecutive years (1983-85) (see table 5.3).
It was
additionally assumed that all graduates demand preparatory
school services, and that there is no flow of students from
neighboring counties outside the study area.
Interaction Spaces
Besides the local demand generated by a settlement, the
possible flow of secondary graduates among settlements is
another of the issues that has to be considered in the
location of a preparatory school.
In this case the space
where a flow of students to attend a preparatory school is
expected, is delimited by the neighborhood of each of the
settlements.
This space is called the interaction space. A
measure of the expected intensity of interaction among
settlements in a geo-subspace is given by the heterogeneity
index of the neighborhood.
For the study area of San Luis
Potosi, a univariate heterogeneity index (equation 3.2) was
calculated using the number of secondary school graduates for
each settlement and the neighborhood relation established in
the 20 km network.

Table 5.3
Number of Secondary Graduates per Settlement

SETTLEMENT                      CODE
Real de Catorce                 101
Las Adjuntas                    102
Alamitos de los Diaz            103
La Cañada                       104
Cardoncita                      105
Castañon                        106
Los Catorce                     107
Guadalupe del Carnicero         108
La Maroma                       109
El Mastranto                    110
Potrero No. 1                   111
Ranchito de Coronados           112
El Salto y Anexos               113
San Antonio de Coronados        114
San Jose de Coronados           115
Santa Cruz de Carretas          116
Santa Maria del Refugio         117
Tanque de Dolores               118
Vigas de Coronado               119
Wadley                          120
Cedral                          201
El Blanco                       202
Cerro de las Flores             203
La Cruz                         204
Cuarejo                         205
Hidalgo                         206
Jesus Maria                     207
Lagunillas                      208
Palo Blanco                     209
Presa Verde                     210
Refugio de las Monjas           211
El Saladito                     212
San Isidro                      213
San Lorenzo                     214
San Pablo                       215
Santa Rita de Sotol             216
Tanque Nuevo                    217
Zamarripa                       218
Progreso                        219
Charcas                         301
Alvaro Obregon                  302
Cañada Verde                    303
El Capulin                      304
El Cedazo                       305

Table 5.3 (cont.)

SETTLEMENT
Emiliano Zapata
Francisco I. Madero
Guadalupe Victoria
La Borcilla
Lo de Acosta
Miguel Hidalgo
Noria de Cerro Gordo
Pocitos
Presa Santa Gertrudis
San Rafael
El Terrero
Vicente Guerrero
La Zapatilla
Guadalcazar
Amoles
Buenavista
Charco Blanco
Charco Cercado
El Fraile
La Hincada
Huisache
Milagro de Guadalupe
Negritas
Noria del Refugio
Nuñez
Peyote
La Polvora
Potreritos
Pozas de Santa Ana
Pozo de Acuña
Presa de Guadalupe
Presa de Tepetate
Quelital
Realejo
San Antonio de Trojes
San Francisco del Tulillo
San Ignacio
San Jose de Cervantes
San Rafael de los Nietos
Santa Rita del Rocio
Santo Domingo
Ventana
La Rosita
Matehuala
Arroyito de Agua
La Bonita
La Cabra
La Caja
La Carbonera
El Carmen
Concepcion
Encarnacion de Abajo
Estanque de Agua Buena
Guerrero
El Mezquite
Pastoriza
Los Pocitos
Pozo de Santa Clara
Rancho Nuevo
Sacramento
San Antonio de las Barrancas
San Antonio de los Castillo
San Francisco Caleros
San Jose de la Viuda
San Jose de los Guajes
San Miguel
Santa Cruz
Santa Lucia
Tanque Colorado
El Vaquero
Los Cinco Señores
Vanegas
El Gallo
Huertecillas
La Punta
El Salado
San Juan de Vanegas
San Vicente
Tanque de Lopez
El Tepetate
Zaragoza
Villa de Guadalupe
Biznaga
Guadalupito
Llano de Jesus Maria
La Masita
La Presita
Puerto de Magdalenas
Rancho Alegre
San Bartolo
San Francisco
Santa Isabel
Santa Teresa
Zaragoza de Solis
La Paz
San Antonio de las Trojes
As can be observed in table 5.4, the pattern of the demand is
very similar in all three cases. The spatial distribution for
1983 is shown in figure 5.12. Two areas are distinguished by
"less homogeneous" subspaces: the area surrounding Matehuala,
the largest settlement inside the study area, and a smaller
area around Charcas, the second most important urban center.
The heterogeneity index therefore indicates that the
geo-subspaces that form the areas around Matehuala and Charcas
are characterized by relative heterogeneity in the demand.
The standardized heterogeneity index associated with each
settlement shows that the variation of the city of Matehuala
is much more significant than that of the rest of the
settlements. The settlement of Guerrero is the only other one
where the value of the index is smaller than 0.5. In fact,
the values associated with most of these settlements are close
to 1, which means that the geo-subspaces associated with them
are homogeneous in comparison with Matehuala's.
In those areas that are formed by heterogeneous geo-subspaces,
greater interaction among the settlements can be expected than
in areas where homogeneity prevails.
Thus, the impact of the
location of a school on a settlement immersed in a
heterogeneous environment should be analysed in greater
detail.
For example, the settlement of Guerrero is in the
catchment area of two settlements with very different demands:
Matehuala and La Presita.
In this case, before a decision is
taken regarding the location of a school, several alternatives
related to the possible flow of students have to be analysed.
For example, Matehuala could become a point of attraction for
the secondary school graduates of Guerrero.
On the other
hand, the location of a school in La Presita could satisfy the
demand of Guerrero and avoid the overcrowding of Matehuala.
In summary, the measure of the local variation of interaction
spaces allows the planner to identify the degree of expected
interaction in settlements.
This aids in the delimitation of
zones where "intense" interactions are expected.
Similarly,
the index can serve in the definition of zones formed by
geo-subspaces of homogeneous demand.
Finally, to test the impact of a change in the criterion of
maximum traveling distance on the spatial pattern of
variation, the heterogeneity index was calculated using the 15
and 25 km networks for the year 1983. As can be seen in table
5.5, the pattern presented is similar to the one obtained for
the 20 km network.
There are, however, differences in the values associated with
particular settlements. This indicates that although no
significant change should be expected in the general
interaction pattern if any of the three (15, 20, 25 km)
maximum traveling distances is taken as a threshold, in the
analysis of individual settlements attention has to be given
to the heterogeneity values associated in each case.

TABLE 5.5
STANDARDIZED HETEROGENEITY INDEX
15 AND 25 KM NETWORKS (1983)
CODE   15KM   25KM
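
The comparison across the 15, 20 and 25 km networks can be illustrated with a minimal sketch. It assumes straight-line distances between hypothetical settlement coordinates (the thesis works with a travel network) and a variance-like local variation measure standing in for the heterogeneity index, whose exact form is given elsewhere in the thesis. The names `neighborhoods`, `local_variation`, `coords` and `demand` are illustrative only, and all numbers are invented.

```python
from math import hypot

# Hypothetical settlements: (x, y) positions in km and secondary graduates.
coords = {"P": (0.0, 0.0), "Q": (12.0, 0.0), "R": (30.0, 0.0), "S": (52.0, 0.0)}
demand = {"P": 100, "Q": 10, "R": 12, "S": 11}


def neighborhoods(coords, max_km):
    """Map each settlement to the set of others within max_km (assumed straight-line)."""
    result = {}
    for a, (ax, ay) in coords.items():
        result[a] = {
            b for b, (bx, by) in coords.items()
            if b != a and hypot(ax - bx, ay - by) <= max_km
        }
    return result


def local_variation(values, nbrs):
    """Variance-like spread over each settlement and its neighbors (assumed form)."""
    out = {}
    for a, ns in nbrs.items():
        members = [values[a]] + [values[b] for b in ns]
        mean = sum(members) / len(members)
        out[a] = sum((v - mean) ** 2 for v in members) / len(members)
    return out


# Recompute the measure under each maximum traveling distance.
by_threshold = {
    km: local_variation(demand, neighborhoods(coords, km)) for km in (15, 20, 25)
}
for km, index in by_threshold.items():
    print(km, {code: round(h, 1) for code, h in index.items()})
```

In this toy case the large values stay with P and Q under all three thresholds, while the values for Q, R and S change as neighbors enter or leave their neighborhoods. This is the distinction drawn above between the general pattern and individual settlements.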
A Temporal Analysis
Up to this point the whole analysis has focused on the state
of the educational system at a fixed point in time.
However,
the analysis of the evolution of the system is important both
to understand its present state and to evaluate the impact of
planning actions.
Temporal Stability
At this point local variation of temporal subspace was studied
insofar as it could be expected to indicate temporal
"stability" of the demand in each of the settlements.
A heterogeneity index similar to the spatial heterogeneity
index was used to calculate the temporal variation of the
demand for each settlement.
The temporal heterogeneity index is defined as follows:
and
where $k$ is the number of school years considered, $\bar{X}$
is the mean value of the demand and $X_i$ is the demand in the
$i$th year.
This index can be standardized in a similar manner to the
spatial case.
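
As an illustration only, the following sketch computes a temporal index of this kind. It assumes a variance-like form (the concluding chapter notes that the measure resembles statistical variance) and an assumed standardization, 1 - H/H_max, chosen so that values near 1 denote stable demand, which matches how the standardized values are read in this section. Neither formula is taken from the thesis, and the enrolment series are invented.

```python
from statistics import fmean


def temporal_heterogeneity(demand):
    """Mean squared deviation of yearly demand from its mean (assumed form)."""
    mean = fmean(demand)
    return sum((x - mean) ** 2 for x in demand) / len(demand)


def standardize(raw):
    """Rescale to [0, 1]; 1 = most stable, 0 = least stable (assumed convention)."""
    top = max(raw.values())
    if top == 0:
        return {code: 1.0 for code in raw}
    return {code: 1.0 - h / top for code, h in raw.items()}


# Hypothetical enrolment per school year (k = 4) for three settlements.
series = {
    "A": [20, 20, 20, 20],
    "B": [10, 30, 10, 30],
    "C": [18, 22, 18, 22],
}
raw = {code: temporal_heterogeneity(xs) for code, xs in series.items()}
index = standardize(raw)
print(index)
```

Under these assumptions settlement A, with constant demand, receives the value 1, and settlement B, with the widest swings, receives the value 0.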
The index was applied to the study of the temporal stability
of the demand for a specific type of secondary school, the
"TV-secondary."
Three different types of secondary schools can be
distinguished in the school system: general, technical and
TV-secondary.
TV-secondary is the type of school that has
been established in most of the settlements in San Luis
Potosi.
In fact, the only places where there are technical
and general schools are the county seats.
TV-secondaries are
designed to serve communities where the size of the population
is too small to establish a regular school, and the settlement
cannot be served by neighboring ones.
Televised classes
keep the number of teachers required to a minimum.
Besides the deficit of preparatory school service in those
settlements that have regular secondary schools, there is also
a lack of preparatory schools for the students graduating from
a TV-secondary in the area.
The number of students in each of
these schools is in the interval [1,50].
There are, however,
variations in the number of students from one school year to
the next.
The temporal heterogeneity index was used as a tool
to quantify this variation (see table 5.6).
With the aid of this index it is possible to identify those
settlements where temporal variation is significant.
The
value of the index can be interpreted as a measure of
"temporal stability" of demand for preparatory services.
For
planning purposes a preparatory school in a settlement or
catchment area which has an "unstable" demand is not
advisable.
The degree of temporal stability is a measure that has been
associated with each settlement as an isolated entity. There
is, however, an interaction space related to each settlement
that must also be considered. The spatial distribution of
temporal stability, as shown in figure 5.13, presents a
"heterogeneous pattern." A heterogeneity index was applied
using the value of the temporal heterogeneity index as a
variable to quantify this spatial variation and the 20 km
network as a basis to define the neighborhoods (see table
5.7).
The value of the index associated with a settlement provides
a measure of the spatial variation of its neighborhood
according to the temporal variation of the settlements. For
example, values close to zero indicate that the spatial
neighborhood is "highly" heterogeneous with respect to
"temporal stability." On the other hand, values close to 1
indicate that the spatial neighborhood is "much less"
heterogeneous with respect to the "temporal stability" of the
settlements inside it.
The map in figure 5.14 shows the spatial distribution of the
spatial heterogeneity index of temporal stability.
Both maps
(figures 5.13 and 5.14) can be used to identify neighborhoods
where temporal stability is high and spatial heterogeneity is
low.
Assuming that the temporal trend is maintained, this
characteristic of a neighborhood indicates to the planner that
demand in the catchment area of a settlement where a school is
to be located will not have a large temporal variation.
TABLE 5.6
STANDARDIZED TEMPORAL HETEROGENEITY INDEX
(EXCLUDING COUNTY SEATS)

CODE  INDEX      CODE  INDEX      CODE  INDEX
101   XXX        106   .87772     108   .96943
109   .86462     110   .78602     111   .82969
116   .98253     118   .88209     119   .93013
120   .12227     201   XXX        206   .83842
208   .91703     209   .96943     210   .52401
211   .56X 1     216   .51528     218   .96943
219   .52401     301   XXX        302   .57641
307   .56331     309   .89082     310   .65502
311   .96943     312   .73362     317   .79039
401   XXX        402   .70742     403   .00436
404   .41921     409   .93013     411   .0917
412   .75109     413   .65502     414   .96943
419   .99563     421   .0524      422   .81222
423   .96943     424   .87772     429   .98689
501   XXX        502   .75109     503   .86462
504   .83842     509   .99563     511   .59388
512   .87772     513   .96943     514   .86462
519   .98689     521   .79039     522   .75109
523   .87772     524   .96943     601   XXX
603   .72052     604   .35807     605   .88209
606   .91703     608   .51528     609   .79039
610   .98689     701   XXX        702   .94323
703   .89082     704   .98689     705   .94325
706   .20087     708   .82969     709   .93013
710   .67248     711   .70742     712   .72052
713   .97703     801   XXX        802   .90829

Spatial distribution of the temporal heterogeneity index.
Figure 5.13

Distribution of the spatial heterogeneity index applied to
temporal stability.
Figure 5.14

5.4.5 Additional Considerations
Besides the use of the heterogeneity index as an aid in the
study of the spatial characteristics of demand, this same
measure can be used to study other aspects of educational
planning. Some examples follow:
1. Various indicators can be used to compare efficiency in
the settlements studied.
The rate of students graduating from
elementary and secondary schools for a fixed cohort may be an
indicator of the quality of the educational services.
The heterogeneity index can be used to test the uniformity of
the services provided.
Settlements with high values indicate
anomalous conditions either superior or inferior to those of
surrounding settlements.
Homogeneous zones receive similar
services, while heterogeneous zones show disparate services.
2.
The number of inhabitants per school in the different age
groups and at different educational levels is an indicator of
the distribution of the resources among the areas.
The interpretation of the heterogeneity index in this case is
similar to the previous example.
Other variables that
indicate the amount of resources given to a settlement can
receive a similar treatment.
3. If two or more indicators of the equity or efficiency of
the system have been defined, a multivariate heterogeneity
index can aid the planner in identifying zones or settlements
by equity/inequity or efficiency/inefficiency measures.
4. Finally, a time series analysis of the evolution of school
services in such aspects as equity and efficiency can give the
planner important information in order to better understand
the present state of the educational service and to anticipate
future developments.
5.5 Summary
The heterogeneity index as a measure of the degree of
membership of an areal unit to a region was applied in the
design of regionalization algorithms.
There are, however, other possible interpretations of a
measure of local variation of a geo-space. Planning was
selected for the application of the heterogeneity index
because areas of "high contrast" are of special interest for
planners. In particular, in school location planning various
indicators are used in the characterization of the spatial
distribution of human resources and material assets of the
school systems. The usual procedure in school location
problems has two stages: first, measures of interest for the
planner, such as efficiency and effectiveness, are defined;
second, these indicators are represented in maps. As
mentioned before, one of the main differences between the
neighborhood approach presented in this work and previous ones
is that while in the most traditional models the
representation of the geographical landscape is made through
isolated entities such as lines, points and areas, in
neighborhood models the basic units of study are
geo-subspaces.
In the first part of the chapter a "school location problem
scenario" is established. The area of study is the northern
part of the state of San Luis Potosi in central Mexico.
The
Mexican school system has reached a point where there is a
need to assure a coordinated growth among the different levels
of education.
Although a planning system has been in place within the
government since the 1970s, very little emphasis has been
placed on the spatial aspects in the different school system
models that have been implemented. Currently, the information
system that supports decision making is being transformed into
a geographic information system. That is, for the first time,
location variables are being included in the planning system
at a national level.
The area of study is characterized by significant growth of
the secondary school system; as a consequence there is a
greater demand on the higher levels of education. The central
authorities are aware of this phenomenon and have decided to
satisfy the demand by establishing new schools.
It is a common practice to use various indicators as well as
their spatial distribution and catchment areas in the decision
process for the location of schools.
There are, however, besides the above-mentioned spatial
aspects of the school location problem, others that have not
been studied. In the second part of the chapter several
indicators based on the notion of "local variation" are
proposed as tools in a further study of the spatial
characteristics of the problem. However, it is important to
mention that the basic goal is to present the tools rather
than specific solutions for the location of new schools.
It is clear that in a problem of the complexity
of the one presented here, in order to reach the best feasible
solution it is necessary to include in the analysis social,
cultural, administrative and financial factors besides the
geographical aspects.
The geo-space of interest is defined as the set of settlements
inside the study area that have a secondary and/or a
preparatory school, together with the spatial relationships
that are relevant to the problem. The geo-subspaces are
defined as subsets of settlements. The size and shape of the
geo-subspaces are not necessarily fixed; they are in fact
parameters that the planner can use at the decision-making
stage. Such is the case of the traveling distance, which can
be used in the definition of the "optimum" number of schools
to be located if there is a limitation on the number of
schools to be established, since it allows the planner to
determine the size of the catchment areas. For example, if it
is assumed that the "optimum" traveling distance for a
secondary graduate is less than 10 km (figure 5.10), it is
clear that the number of schools is larger than if the optimum
is fixed at 25 km (figure 5.7).
Two indices were applied in the study of the demand for
preparatory schools in the area of interest.
In the first
case the heterogeneity of the geo-subspaces was interpreted as
a measure of the expected degree of interaction within a
catchment area.
Since the degree of interaction is a measure
of the flow of secondary graduates, this characteristic of the
geo-subspaces can be used as a factor in the analysis of the
impact of the location of new schools.
The second index is a
measure of the local variation of the temporal stability of
the demand.
In this case, the stability or instability of the demand in a
region can be used in similar studies.
In brief, the study of the local variation of the
geo-subspaces that form the area of study allows the planner
to include in the decision process spatial aspects that were
not previously considered, such as the degree of communication
of the settlements, the level of interaction within a
neighborhood and the local variation of the temporal
stability.
Chapter 6.
CONCLUSIONS
6.1 Summary
A common criticism of the mathematical models used in human
geography is that in many cases key features that are
considered essential for geographical analysis purposes are
not represented.
This limitation has been discussed with regard to factor
analytic models and statistical inference in section 2.1.1
(see also Haining, 1983). Neighborhood is one of the
geographical concepts that had received little attention from
modelers until the latter part of the quantitative revolution.
Although there are several mathematical models that have been
specifically designed to represent geographical neighborhoods,
there has been no general attempt to establish an overall
framework for the development of these tools.
Therefore, as mentioned in the
introduction, the principal objective of this work has been to
present a general approach to the modeling of the notion of
geographical neighborhood as well as to develop mathematical
representations of it.
Three different levels of modeling are found in the thesis.
In the first and most general, the notion of neighborhood
model has been introduced through the mathematical concepts of
space and subspace. Second, the local variation of
geo-subspaces has been modeled through the heterogeneity
index. Finally, two neighborhood models have been developed
and applied to specific situations:
the design of
regionalization algorithms and the definition of spatial
criteria for school location planning.
The first level of modeling is based on the intuitive notion
that underlying the geographical landscape there are spaces
and subspaces.
In the development of mathematical theory,
spaces and subspaces play a fundamental role.
Since the mathematical concepts of space and subspace satisfy
certain requirements that make them appropriate as elements of
representation of the geographical notion of neighborhood, two
corresponding quasi-mathematical structures, geo-spaces and
geo-subspaces, have been introduced to further the objective
of establishing a general framework for the design of
neighborhood models.
A general approach to the design of neighborhood models has
been discussed, and tools have been developed for the study of
geo-subspaces. Although other existing techniques such as
autocorrelation and geostatistics have incorporated
neighborhoods in models with predictive purposes, the present
research is based on the assumption that the concept of
geographical neighborhood has not been fully modeled
previously.
With this idea in mind a measure of "local variation" of a
geo-subspace has been defined.
Although the measure itself
resembles that of statistical variance, in this case the
treatment is non-inferential.
In fact, formalization has been
carried out in topological rather than in statistical terms.
Applications of topological concepts in spatial analysis are
found in several branches of geography including geomorphology
(Mark, 1977), geographic information systems (~utton, 1968) and
transportation (Haggett and Chorley, 1969).
The topological
entities on which these applications are based are graphs.
There are however other topological concepts of interest for
geographical purposes. One of the main sub-branches in
topology is general, or set, topology.
Based on the
mathematical concept of neighborhood, in set topology,
concepts such as those of limit and continuity are extended to
abstract sets (~irby and Gardiner, 1982).
Topological
concepts such as the boundary or interior of a region, open
set and neighborhood are used to model the geographical
concept of neighborhood.
Here, the introduction of a topology to a specific element of
a geo-space, a graph, has allowed the identification between
the geographical and topological concepts of neighborhood.
This might be expected since both notions have their origin in
the same conception of nearness to a point or entity.
The idea of fuzziness as developed by Zadeh (1965) has been
incorporated into a topological space through the
heterogeneity index.
This result suggests the possible development of a new
mathematical structure, Fuzzy Topology.
In the third level of modeling, neighborhood models have been
applied in two instances:
the design of regionalization
algorithms and the definition of criteria for the location of
schools.
The algorithms designed are presented as examples of the use
of the neighborhood approach.
As mentioned previously,
algorithms have to be designed according to the problem at
hand.
That is, the algorithms that have been presented are
not necessarily adequate for every regionalization problem,
although for the specific case of a central agglomerative
procedure the application of the heterogeneity index has been
fully described.
The fuzzy set algorithm presented provides the analyst with
information which is not available when a bivalent logic is
used.
A degree of membership of an element to a region has
been given as a result of the classification procedure.
It is
important to note that geographical concepts have been
previously modeled using fuzzy sets (Gale, 1972 and Leung,
1982).
However, in the design of classification algorithms
that include a contiguity constraint the use of fuzzy concepts
has posed some special problems.
Contiguity is a
characteristic that has been considered essentially bivalent,
although in some cases (Cliff and Ord, 1973) quantities such
as the length of the border have been used as a measure of
"contiguousness."
In the algorithm discussed here the
fuzziness refers, in topological terms, to the degree of
"interiority" of a point to a set.
As a result, the
"membership" of an element to a region has not been described
in terms of its "membership" to contiguous regions but rather
with respect to its degree of membership to the border or the
interior of the region.
In the second application a neighborhood model has been
designed to aid in a school location problem.
In this
particular area of educational planning the use of spatial
models has been scarce, although indicators that include some
spatial criteria are generally used as an aid in the selection
of sites to establish new schools.
In this case the
heterogeneity index has been applied to aid the study of some
spatial and temporal aspects of the demand for preparatory
schools.
The spatial interaction among settlements and the
temporal stability of demand are two spatial factors that have
been pointed out as important indicators in the analysis of
alternatives for the allocation of resources.
6.2 Discussion
It has been stated that three levels of modeling have been the
concern of this study: 1) the establishment of a general
framework for the design of neighborhood models; 2) the
design of tools for the study of geo-spaces; and 3) the
application of neighborhood models to specific geographical
problems.
This concluding section contains some brief and somewhat
speculative remarks on the general importance of each level of
analysis.
At the third and most detailed level of modeling two
neighborhood models were applied to geographical problems:
regionalization and school location planning.
Although in
neither case was the use of the technique exhaustive, the
results obtained indicate that the mathematical modeling of
geographical landscape through "neighborhoods" constitutes a
fruitful avenue of inquiry for both applied and basic research
purposes.
At the second level, that of tool design, it was the
development of the heterogeneity index as a measure of local
variation of a geo-subspace that made the third-level
applications possible. It would seem to follow that the same
conceptual tool could be applied not only to similar contexts
in the future but also to the study of the set of
neighborhoods that make up a geo-space. One obvious area of
exploration would be to substitute neighborhoods for the
entities used in existing models. For example, a measure of
correlation could be applied to two or more sets of
neighborhoods of one or more geo-spaces. Similarly, the
representation of a geo-space through a topological space with
fuzzy characteristics raises the possibility of using
topological and fuzzy set theory for a thorough study of the
geographical landscape. In this case, even though
formalization was achieved by identifying geographical
entities with mathematical ones, the use of the mathematical
models themselves was not extensive. It must therefore be
acknowledged that the strengths and weaknesses of the
application of topological and fuzzy set theory to
geographical problems remain largely unexplored.
Finally, from a more general point of view, this thesis
illustrates the kind of discoveries that can be expected from
high-level communication and interaction between two or more
fields of inquiry. In this particular case the
geo-mathematician finds, on one hand, a previously unknown
universe of applications of abstract mathematical theory, and
on the other hand, the equally unsuspected possibility of
modeling the fundamental notion of geographical neighborhoods.
APPENDIX
MATHEMATICAL CONCEPTS
This appendix contains the mathematical details of the
definition of a topology in a connected graph, as presented
in chapter 3.
The definitions of some mathematical concepts
necessary for the discussion are given in the first section.
A.1 Mathematical Definitions
1) A graph with $m$ points and $q$ lines is called an
$(m,q)$ graph.
2) Walk of a graph.
A walk of a graph is an alternating sequence of
points and lines
$V_0, X_1, V_1, X_2, \ldots, V_{n-1}, X_n, V_n$
beginning and ending with points, in which each
line is incident with the two points immediately
preceding and following it.
3) Path of a graph.
A path of a graph is a walk where all the points
(and thus all the lines) are distinct.
4) Connected graph.
A graph $G$ is connected if every pair of points
is joined by a path.
5) Subgraph.
A subgraph of $G$ is a graph having all its points
and lines in $G$.
6) Difference.
For this particular application the difference
between two graphs $G_1$ and $G_2$ is defined as
follows:
The lines in $G_1 - G_2$ are all the lines that
belong to $G_1$ and do not belong to $G_2$. That is:
$X(G_1 - G_2) = X(G_1) - X(G_2)$
The points in $G_1 - G_2$ are those that are
represented in $X(G_1 - G_2)$.
7) Topology.
Let $X$ be a given set of objects called the
points of $X$. A topology in $X$ is a non-empty
collection $T$ of subsets of $X$ called open sets
satisfying the following four axioms:
Ax. 1 The empty set is open.
Ax. 2 The set $X$ itself is open.
Ax. 3 The union of any family of open
sets is open.
Ax. 4 The intersection of any two (and hence
of any finite number of) open sets
is open.
A set $X$ is said to be topologized if a topology $T$
has been given in $X$. A topologized set $X$ is
called a topological space and the topology $T$ is
called the topology of the space $X$ (Hu, 1964,
p.16).
In this case the set of interest is a connected graph $G$ with
points $V(G)$ and lines $X(G)$. To apply the concept of topology
to $G$, the concept of subset is identified with that of
subgraph.
To exemplify some of these definitions assume that $G_1$, $G_2$,
$G_3$ and $G_4$ are four graphs as shown in figure A.1.
By definition $G_5 = G_1 \cup G_2$ is such that:
$V(G_1 \cup G_2) = V(G_1) \cup V(G_2) = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7\}$
and
$X(G_1 \cup G_2) = X(G_1) \cup X(G_2) = \{(v_1,v_2),(v_2,v_3),(v_3,v_4),(v_3,v_5),(v_2,v_6),(\ldots$
$G_5$ is shown diagrammatically in figure A.2.
The intersection between $G_1$ and $G_3$ ($G_6 = G_1 \cap G_3$),
as defined in Chapter 3, is such that:
$X(G_1 \cap G_3) = X(G_1) \cap X(G_3) = \{(v_1,v_2),(v_2,v_3)\}$
and its points are those represented in $X(G_1 \cap G_3)$, so that
$V(G_1 \cap G_3) = \{v_1, v_2, v_3\}$.
$G_6$ is shown in figure A.2.
As another example of the intersection of two graphs consider
$G_7 = G_3 \cap G_4$. In this case, since $X(G_7) = \emptyset$,
$G_7$ is the (0,0) graph.
The difference between $G_1$ and $G_3$ ($G_8 = G_1 - G_3$) is such
that:
$X(G_8) = X(G_1) - X(G_3) = \{(v_3,v_4),(v_3,v_5)\}$
and
$V(G_8) = \{v_3, v_4, v_5\} = V(G_4)$
Moreover, as expected, $G_1 - G_4 = G_3$, $G_3 \cup G_4 = G_1$ and
$G_3 - G_1$ is the (0,0) graph since $X(G_3) \subset X(G_1)$.
Figure A.1
Figure A.2
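
The graph operations used in these examples can be checked with a short sketch. It assumes the line sets that the worked example implies, $X(G_1) = \{(v_1,v_2),(v_2,v_3),(v_3,v_4),(v_3,v_5)\}$ and $X(G_3) = \{(v_1,v_2),(v_2,v_3)\}$, represents a graph by its set of undirected lines, and derives the points from the lines as in definition 6. The helper names `line`, `points` and `is_connected` are illustrative, not part of the thesis.

```python
from collections import deque


def line(a, b):
    """Undirected line joining points a and b."""
    return frozenset((a, b))


def points(lines):
    """Points represented in a set of lines."""
    return set().union(*lines)


def is_connected(lines):
    """True if every pair of points is joined by a path (breadth-first search)."""
    pts = points(lines)
    if not pts:
        return True  # convention chosen here for the (0,0) graph
    start = next(iter(pts))
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        for l in lines:
            if p in l:
                for q in l - {p}:
                    if q not in seen:
                        seen.add(q)
                        queue.append(q)
    return seen == pts


# Line sets inferred from the worked example (an assumption, see lead-in).
g1 = {line("v1", "v2"), line("v2", "v3"), line("v3", "v4"), line("v3", "v5")}
g3 = {line("v1", "v2"), line("v2", "v3")}

g6 = g1 & g3  # intersection: lines common to both graphs
g8 = g1 - g3  # difference: lines of G1 that are not in G3

print(sorted(points(g6)))
print(sorted(points(g8)))
print(g3 - g1 == set())  # G3 - G1 has no lines, i.e. the (0,0) graph
print(is_connected(g1))
```

Set operators on the line sets are enough here because, in the definitions above, the points of an intersection or difference are exactly those represented in the resulting lines.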
A.2 Mathematical Discussion
Given a connected $(m,q)$ graph $G$ such that $q \neq 0$, then the
following propositions are true:
Proposition 1.
The collection of open sets as defined in
section 3.4 is a topology of $G$.
Ax.1 The (0,0) graph is an open set. This is true by the
empty condition.
Ax.2 $G$ is an open set.
i) By definition $G$ is a subgraph of $G$.
ii) Let $p \in V(G)$. Since $G$ is connected and $q \neq 0$,
there exists a point $p_1$ such that $(p,p_1) \in X(G)$. Let
subgraph $N$ be defined as: $V(N) = \{p, p_1\}$ and
$X(N) = \{(p,p_1)\}$. $N$ is a non-empty connected subgraph of
$G$ such that $V(N) \neq \{p\}$ and $p \in V(N)$. Therefore $G$
is an open set.
Ax.3 The union of any family of open sets is open.
Let $O_1, O_2, \ldots, O_n$ be a family of open sets of $G$ and
$O = \bigcup_{i=1}^{n} O_i$.
i) By definition $V(O) = \bigcup_{i=1}^{n} V(O_i)$ and
$X(O) = \bigcup_{i=1}^{n} X(O_i)$.
Given $p \in V(O)$ and $(p_1,p_2) \in X(O)$, there exist open sets
$O_i$ and $O_j$ such that $p \in V(O_i)$ and $(p_1,p_2) \in X(O_j)$.
Since $O_i$ and $O_j$ are subgraphs of $G$, then $p \in V(G)$ and
$(p_1,p_2) \in X(G)$. Therefore all the points and lines of $O$
belong to $G$ and $O$ is a subgraph of $G$.
ii) Given $p \in V(O)$ there exists $O_i$ such that $p \in V(O_i)$.
Since $O_i$ is an open set there exists a non-empty
connected subgraph $N$ of $O_i$ such that $V(N) \neq \{p\}$ and
$p \in V(N)$. However, $N$ is also a connected subgraph of $O$
since $V(N) \subset V(O_i) \subset V(O)$ and
$X(N) \subset X(O_i) \subset X(O)$.
Therefore for every point $p$ in $O$ there is a non-empty
connected subgraph $N$ such that $V(N) \neq \{p\}$ and $p \in V(N)$.
Therefore $O$ is an open set.
Ax.4 The intersection of any two open sets is open.
Let $O_1$ and $O_2$ be two open sets and $O = O_1 \cap O_2 = (m,q)$
where $q \neq 0$.
i) By definition $X(O) = X(O_1) \cap X(O_2)$ and $V(O)$ consists of
all the points that belong to at least one pair in $X(O)$.
Let $(p_1,p_2) \in X(O)$, then $(p_1,p_2) \in X(O_1)$ and
$(p_1,p_2) \in X(O_2)$. Since $O_1$ and $O_2$ are subgraphs of $G$,
$(p_1,p_2) \in X(G)$. Let $p \in V(O)$, then $p \in V(O_1)$ and
$p \in V(O_2)$. But $O_1$ and $O_2$ are subgraphs of $G$, then
$p \in V(G)$. Therefore all the points and lines of $O$ belong
to $G$, so $O$ is a subgraph of $G$.
ii) Let $p_1$ be a point in $O$, $p_1 \in V(O)$. By definition
there exists a line in $O$, $(p,p_1)$, such that
$(p,p_1) \in X(O_1)$ and $(p,p_1) \in X(O_2)$. Let $N$ be defined as
follows: $V(N) = \{p, p_1\}$ and $X(N) = \{(p,p_1)\}$. $N$ is a
non-empty connected subgraph of $O$ such that
$V(N) \neq \{p_1\}$ and $p_1 \in V(N)$.
Therefore $O$ is an open set.
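
The closure properties proved above can be tried on a small example. The sketch assumes a reading of the open-set condition used in the proofs: a subgraph is open if it is the (0,0) graph or if every one of its points has at least one line of the subgraph incident to it (for a graph without loops this is equivalent to lying in a connected subgraph $N$ with $V(N) \neq \{p\}$). The authoritative definition is the one in section 3.4; the function names and the example graphs are hypothetical.

```python
def line(a, b):
    """Undirected line joining points a and b."""
    return frozenset((a, b))


def is_open(points, lines):
    """Assumed reading: every point has an incident line within the subgraph.

    The (0,0) graph has no points, so the condition holds vacuously.
    """
    return all(any(p in l for l in lines) for p in points)


def union(g, h):
    """Union of two graphs given as (points, lines) pairs."""
    return (g[0] | h[0], g[1] | h[1])


def intersection(g, h):
    """Common lines; the points are those represented in the common lines."""
    common = g[1] & h[1]
    return (set().union(*common), common)


# Hypothetical subgraphs of the path a-b-c-d.
o1 = ({"a", "b", "c"}, {line("a", "b"), line("b", "c")})
o2 = ({"b", "c", "d"}, {line("b", "c"), line("c", "d")})
isolated = ({"a", "b", "z"}, {line("a", "b")})  # z has no incident line

print(is_open(*o1), is_open(*o2))
print(is_open(*union(o1, o2)))
print(is_open(*intersection(o1, o2)))
print(is_open(*isolated))
```

The last graph is included only to show a subgraph that fails the condition because one of its points is isolated.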
Proposition 2.
For every point $p \in V(G)$ the subgraph formed by
its first order neighbors and the lines joining
them to $p$ is a topological neighborhood of $p$.
Proof:
Let $p_1, p_2, \ldots, p_n$ be the set of neighbors of $p$. Define $N$
as follows: $V(N) = \{p, p_1, p_2, \ldots, p_n\}$ ($V(N) \neq \{p\}$
since $G$ is connected) and
$X(N) = \{(p,p_1), (p,p_2), \ldots, (p,p_n)\}$. Let $p_i$ be an
arbitrary neighbor of $p$ and $U$ a subgraph of $N$ such that
$V(U) = \{p, p_i\}$ and $X(U) = \{(p,p_i)\}$. $U$ is an
open set such that $p$ is a point of $U$ and $U$ is a subgraph
of $N$.
Therefore $N$ is a topological neighborhood of $p$.
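
The construction in this proof can be sketched as follows. The function `first_order_neighborhood` is an illustrative name, and the graph is the $G_1$ of the worked example in section A.1, with its line set inferred from that example (an assumption). The sketch builds $N$ for a chosen point and checks that a single-line subgraph $U$ containing that point lies inside $N$.

```python
def line(a, b):
    """Undirected line joining points a and b."""
    return frozenset((a, b))


def first_order_neighborhood(p, lines):
    """Return (V(N), X(N)): p, its first order neighbors and the joining lines."""
    incident = {l for l in lines if p in l}
    return {p}.union(*incident), incident


# G1 as inferred from the worked example in section A.1 (an assumption).
g1 = {line("v1", "v2"), line("v2", "v3"), line("v3", "v4"), line("v3", "v5")}

v_n, x_n = first_order_neighborhood("v3", g1)
print(sorted(v_n))

# U: the subgraph made of v3, one arbitrary neighbor and the line joining them.
u_lines = {line("v3", "v4")}
u_points = set().union(*u_lines)
print(u_points <= v_n and u_lines <= x_n)  # U is a subgraph of N
```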
REFERENCES
Anderberg, M.R. 1973. Cluster Analysis for Applications. New
York, Academic Press.
Beaumont, J.R. 1983. "Quantitative and Theoretical Geography
in Europe." Area 15: 166-167.
Bell, W. 1955. "Economic, Family and Ethnic Status: An
Empirical Text." American Sociological Review, 20: 45-52.
Bennett, R.J. 1981. "Quantitative and Theoretical Geography in
Western Europe." European Progress in Spatial Analysis,
Bennett R.J. ed. Pion Limited: 1-32.
Bennett, R.J. and Wrigley N. 1981. "Retrospect and Prospect
on British Quantitative Geography." Quantitative Geography: a
British View. Bennett, R.J. and Wrigley N. eds. Roetledge &
Kegan Paul, London, Boston and Henley: 3-11.
b
Berry, Brian J.L. 1971. "Introduction: The Logic and
Limitations of Comparative Factorial Ecology." Economic
Geography, 47 ( ~ u n e:)209-21 9.
--------
Regions. "
1961. "A Method of Deriving Multifactor Uniform
Przelg. Geogr , 33 : 263-282.
-------- .
1968. "Approaches to Regional Analysis." Spatial
Analysis, A Reader in Statistical Geography, Berry B.J.L. and
Marble D.F. eds. Prentice Hall.
-------- . 1973.
"A Paradigm for Modern Geography." Directions
in Geography, Chorley R.J. ed. :3-21.
Bivand R. 1984. "Regression Modeling with Spatial Dependence:
An Application of Some Class Selection and Estimation
Methods." Geographical Analysis, 16:
Bodson P . and Peeters D. 1975. "Estimation of the
Coefficients of a Linear Regression in the Presence of Spatial
Auto~orrelation.'~
Environment and Planning A, 7:455-472.
Brantingham, P.L. and Brantingham P.J. 1978 "A Topological
Technique for Regionalization." Environment and Behavior,
10:335-353*
Brouwer F. and Nijkamp P. 1984. "Linear Logit Models for
Categorical Data in Spatial Mobility Analysis." Economic
Geography, 60:102-110.
Bunge, W. 1966. Theoretical Geography. The Royal University
of Lund Sweden, Department of Geography.
Burnett P. 1978. "Markovian Models of Movement within Urban
Spatial Structures." Geographical Analysis, 10:142-153.
Byfulgien, J. and Nordgard, A. 1973 "Region Building-A
,Comparison of Methods." Nordsk. Geogr Tidsskr , 27: 127-1 51
-------- 1974
"Types or Regions?"
Norsk. Geogr. Tidsskr .,
28: 157-1 66.
Clark, I. 1979. Practical Geostatistics. Applied Science
Publishers, Ltd. London.
Cliff A.D. and Haggett, P. 1970 "On tne Efficiency of
Alternative Aggregations in Region Building Problems."
Environment and Planning, 2:285-294.
Cliff A.D. and Ord J.K. 1973. Spatial Autocorrelation. London
Pion
Cormack, R.M. 1971. "A Heview of Classification."
Statistical Society A, 134:321 -367.
The Royal
Cromley R.G. and Hanink D.M. 1985. "Location Portfolio
Analysis." Geographical Analysis, 173318-330.
Davis, Russell G. and Schiefelbein, E. 1980. Planning
Education for Development, Vol. 11, Models and Methods for
Systematic Planning of Education. Massachussets Institute of
Technology.
De Jong P.,Sprenger C. and Van Veen F. 1984. ItOnExtreme
Values of Moran's I and Gearyls c." Geographical Analysis.
16: 17-24.
Dutton, G., ed. 1978. Harvard Papers on Geographic Information
Systems. First International Advanced Symposium on Topological
Data Structures for Geographic Information Systems. Harvard
University.
Dunn, J .C. 1974. I1Some Recent Investigations of a New Fuzzy
Partitioning Algorithm and its Applications to Pattern
Classification Problems." Journal of Cybernetics 4, 2:1-15.
Elliot, Harold M. 1983. "Surrounding Larger Neighbors and the
Atlantic Coast Cardinal Neighbor Gradient." Economic
Geography 59: 426-444.
Firby, P.A. and Gardiner C.F., 1982. Surface Topology.
Horwood Limited.
Ellis
Fisher, D.W. 1958. "On Grouping for Maximum Homogeneity."
Journal of the American Statistical Association, No. 53, pp.
789-798
,Forrester, J.W. 1973.
eight printing.
Industrial Dynamics. M.I.T. Press,
Fotheringham A.S. and Reeds L.G. 1979. "An Application of
Discriminant Analysis to Agricultural Land Use Prediction."
Economic Geography, 55 : 1 1 4-122.
Fowles, Grant R. 1970. Analytical Mechanics. Holt, Reinhart
and Winston Inc.
Gale, S. 1972. "Inexactness, Fuzzy Sets ,and the Foundations
of Behavioral Geography.'' Geographical Analysis 4: 337-349.
Garfinkel, R.S. and Nemhauser , G.L. 1970. "Optimal Political
Districting by Implicit Enumeration Techniques." Management
Science, 16 :B495-B508.
Gould P. 1970. "Is Statistix Inferens the Geographical Name
for a Wild Goose?I1 Economic Geography, 4b:439-448.
Gould, W.T. 1978. School Location Guidelines.
Office Memorandum.
-------- .
World Bank
1973. Planning the Location of Schools: Ankole
District, Uganda.
Planning, Paris.
International Institute for Educational
Gregory, S. 1983. "Quantitative Geography: the British
Experience and the Role of the Institute." Trans. Inst. Br.
Geogr.. N.S. 8: 80-89.
Guruge, A. and Ariyadasa, K.D. 1976. Planning the Location of
Schools: Case Studies in Sri Lanka. International Institute
for Educational Planning (UNESCO), Paris.
Haaser, N.B., La Salle, J.P. and Sullivan, J.A. 1959.
Introduction to Analysis. Blaisdell Publishing Company.
Yaggett, P. and Chorley, R. 1969. Network Analysis in
Geography. Edward Arnold.
Haggett, P., Cliff, A. and Prey, A. 1977. Locational Analysis
in Human Geography. Edward Arnold, Second Edition.
Haining, R.P. 1983. "Advances in Applied Spatial Analysis."
Area 16: 8.
Hall, B.F. 1983. "Neighborhood Differences in Retail Food
Stores: Income Versus Race and Age of Population." Economic
Geography 59: 282-295.
,Hall, K.P., Gilmour, D.I. and Mingos, D.M.P. 1984. "Molecular
Orbital Analysis of the Bonding in High Nuclearity Gold
Cluster ~om~bunds."Journal of ~r~anometallic
chemistry, 268:
275-293
Hallak, J. et al. 1975. Metodo de Preparacion del Mapa
Escolar: La Region de San Ramon, Costa Kica.
Harary, F. 1972. Graph Theory. Addison Wesley, Publishing
Company.
Harman, H.H. 1976. Modern Factor Analysis. Chicago: University
of Chicago Press 3d ed., rev.
b
Hartigan,J.A. 1975.
Clustering Algorithms. John Wiley & Sons.
Haynes, Kingsley E. 1971. "Spatial Change in 'Jrban Structure:
Alternative Approaches to Ecological Dynamics." Economic
Geography, 47 ( ~ u n e) : 324-335.
Hu, Sze-Tsen 1964. Elements of General Topology. Holden-Day,
Inc., San Franciso, London, Amsterdam.
Janson, C. 1971. "A Preliminary Report on Swedish Urban
Spatial Structure." Economic Geography, 47 (~une):249-265.
Johnson, K. and Rosenzweig 1963. The Theory of Management of
Systems. Kogakusha, Mc Graw Hill.
Johnston R.J. 1981. "Ideology and Quantitative Human
Geography in the English-Speaking World." European Progress in
Spatial Analysis, Bennett R.J. ed. Pion Limited: 35-46.
---_____
. 1971.
ltSome Limitations of Factorial Ecology and
Social Area Analysis." Economic Geography, 47 (~une):314-323.
--______
. 1970. "
Grouping and Regionalization: Some
Methodological and ~echnicalObservations." Economic
Geography, 46,2:293-305
Jones,.E. and Eyles, J. 1977. An Introduction to Social
Geography. Oxford University Press.
Lankford, P.M. 1969. llRegionalization:Theory and Alternative
algorithm^.^^ Geographical Analysis I:196-212.
Lawley, D.N. and Maxwell, A.E. 1971. Factor Analysis as a
Statistical Method. American Elsevier Publishing Co.
,Leung, Y. 1982. "Approximate Characterization of Some
Fundamental Concepts of Spatial Analysis." Geographical
Analysis 14: 29-40.
Martin R. 1974. "On Spatial Dependence, Bias and the Use of
First Spatial Differences on Regression Analysis." Area
6 : 185-1 94.
Maxfield D.W. 1972. "Spatial Planning of School Districts."
Annals Association of American Geographers, 62:582-590.
PIorley C.D. and Thornes J.B. 1972. "A Markov Decision Model
for Network Flow." Geographical Analysis, 4:180-193.
b
Morrill H.L. and Kelly M.B. 1970 "The Simulation of Hospital
Use and the Estimation of Location Efficiency." Geographical
Analysis, 2:283-300.
Muckay D.B. 1983. "Alternative Probabilistic Scaling Models
for Spatial Data." Geographical Analysis, 15:173-189.
Mulligan G.F. and Gibson L.J. 1984. "Regression Estimates of
Economic Base Multipliers for Small Communities." Economic
Geography 60: 225-237.
~urtagh,F - 1985. "A Survey of Algorithms for
contiguity-constrained Clustering and Related Problems."
Computer Journal, 28: 82-88.
The
Peucker, T.K. and Chrisman, N. 1975. "Cartographic Data
structures." The American Cartographer, 1 (~~ril):55-69.
~eucker,T.K., Fowler, R.J., Little, J.J. and Mark, D.M. 1976.
~rian~ulated
Irregular Networks for Representing Three
~imensionalSurfaces- Technical Report #lo: Geography
Department, Simon Fraser University.
Phipps, A.G. and Laverty W.H. 1983. "Optimal Stopping and
Residential Search Behavior." Geographical Analysis,
15: 187-204.
Ravenstein, E.G. 1885, 1889. "The Laws of Migration."
of the Royal Statistical Society, 48: 52.
Journal
Rees, Philip H. 1971 "Factorial Eco1ogy:An Extended
Definition, Survey and Critique of the Field." Economic
Geography, 47 ( ~ u n e :)220-233.
Rogerson, P.A. 1984. "New Directions in the Modelling of
Interregional Migration." Economic Geography, 60:111-120.
Rosenfeld, A. 1978. "Extraction of Topological Information
,from Digital Images." Harvard Papers on Geographic Information
Systems, Vo1.6. First International Symposium on Topological
Data Structures for Geographic Information Systems. Harvard
University.
Royden, H.C. 1968. Real Analysis. Collier-Plcmillan Limited,
London.
Salins, Peter D. 1971. "Household Location Patterns in
American Metropolitan Areas." Economic Geography, 47
( ~ u n e :)234-248.
Sawicki, D.S. 1973. "Studies of Aggregated Areal Data:
Problems of Statistical Inference." Land Economics,
1 2 :237-247.
Schwab, M.G. and Smith T.R. 1985. "Functional Invariance
under Spatial Aggregation from Continuous Spatial Interaction
Models." Geographical Analysis, 17:217-230.
Scott, A. 1969.
Studies in Regional Science. Pion Limited.
Sheppard, Eric S. 1979. "Notes on Spatial Interaction ."
Professional Geographer, 31 (1 ):8-15.
Silk, J. 1979. Statistical Concepts in Geography.
Allen & Unwin, London.
George
Smith, T.E. 1984. "Testable Characterizations of Gravity
Models." Geographical Analysis, 16:74-94.
Slater, P.B. 1985. "Point-to-Point Migration Functions and
Gravity Model Re-Normalization: Approaches to Aggregation in
Spatial Interaction Modeling." Environment and Planning A,
17:1025-1044.
Springer, C.S. 1973. "Role of the Five-Coordinate Intermediate
in the Stereochemistry of Dissociative Reactions of Octahedral
Compounds." Journal of the American Chemical Society, 95:
1459-1 467
Solana, F., Cardiel, R. and BolaRos, R. 1981. Historia de la
Educacidn Phblica en ~ 6 x i c o . Secretaria de Educacidn Publica.
Symon, K.R. 1969. Mechanics. Addison Wesley Publishing
Company.
Taafe, E.J. and Gauthier, H.L. 1973. Geography of
Transportation. Prentice Hall, Inc.
Taylor, P. 1977. Quantitative Methods in Geography. Houghton
Mifflin, Boston.
Tobler, W. 1970. "A Computer Movie Simulating Urban Growth in
the Detroit Region." Proceedings of the I.G.U. Commission on
Quantitative Methods, Economic Geography 46.
Thoresson, J.D. and Liittschwager, J.M. 1967. "Legislative
Districting by Computer Simulation." Behavioral Science,
1 2 :237-247.
Tylor, E.B. 1889. "On a Method of Investigating the
Development of Institutions Applied to Laws of Marriage and
Descent." Journal of the Anthropological Institute of Great
Britain and Ireland, 18:245-272.
b
Wilson, A. G. 1971. "A Family of Spatial Interaction Models,
and Associated Developments." Environment and Planning,
3: 1-32
Yupa L. S and Mayf ield D. 1 978. "Non-Adopt ion of Innovat ions :
Evidence from Discriminant Analysis." Economic Geography
5:145-156Zadeh, L.A.
8:338-353
1965- "Fuzzy Sets."
Inforuation and Control
_____--1976.
" W z z y Sets and their Application to Pattern
classification and Clustering Analysis." Classification and
Clustering, Ed. J * Van Ryzin. Academic Press Inc.
Zobler, Lo 1958. "Decision Making in Regional Construction."
~ n n a l sof the Association of American Geographers, 48:140-148.