
NEIGHBORHOOD MODELS:
AN ALTERNATIVE FOR
THE MODELING OF SPATIAL STRUCTURES

Maria del Carmen Reyes-Guerrero

Licenciatura, Universidad Nacional Autónoma de México, 1974
[Link]., Universidad Autónoma Metropolitana, 1981

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
by
Special Arrangements

Maria del Carmen Reyes-Guerrero, 1986

January, 1986

All rights reserved. This work may not be reproduced in whole
or in part, by photocopy or other means, without permission of
the author.

APPROVAL

Name: Maria del Carmen Reyes-Guerrero

Degree: Doctor of Philosophy

Title of Thesis: Neighborhood Models: An Alternative for the
Modeling of Spatial Structures

Examining Committee:

Chairman: Hutchinson

T. K. Poiker
Senior Supervisor

P.L. Brantingham

B.K. Bhattacharya

D.M. Eaves

W.G. Gill

R.L. Morrill
Professor
External Examiner
Department of Geography
University of Washington

Date Approved:

PARTIAL COPYRIGHT LICENSE

I hereby grant to Simon Fraser University the right to lend my
thesis, project or extended essay (the title of which is shown
below) to users of the Simon Fraser University Library, and to
make partial or single copies only for such users or in response
to a request from the library of any other university, or other
educational institution, on its own behalf or for one of its
users. I further agree that permission for multiple copying of
this work for scholarly purposes may be granted by me or the
Dean of Graduate Studies. It is understood that copying or
publication of this work for financial gain shall not be allowed
without my written permission.

Title of Thesis/Project/Extended Essay

Neighborhood Models: An Alternative for the Modeling of Spatial
Structures

Author:

(signature)

Maria del Carmen Reyes-Guerrero

(date)


Abstract

In the last three decades there has been a widespread use of
quantitative models in geography. The majority of models have
been applied for descriptive, predictive and hypothesis-testing
purposes. Quantitative geography is at a stage where the
benefits and limitations of most of these models have been
tested. Consequently, there has been a tendency to seek models
considered to be more "appropriate" for geographical analysis.
This work examines some of the most commonly used mathematical
tools in human geography and presents a new family of models as
an alternative for the mathematical representation of spatial
structures.

The proposed models are based on the notion of "geographical
neighborhood" and are named neighborhood models. In the
formalization process mathematical concepts such as space and
subspace are used to model the notion of "neighborhood"; two
quasi-mathematical structures (geo-spaces and geo-subspaces)
are defined as an aid in this modeling.

As a first step in the construction of neighborhood models a
set of measures of local variation (heterogeneity indices) is
introduced. The role played by these indices from a
mathematically formal point of view is fundamental, since
through them it is possible to combine and benefit from two
mathematical areas of knowledge: topology and fuzzy set theory.

In the second part of the thesis the indices are applied to two
geographical problems: 1) the design of classification
algorithms with regionalization purposes; 2) an aid in the
selection of sites for the allocation of resources in an
educational planning environment. In the first case the index
is used to define the degree of membership of an element to the
interior (or border) of a region through the topological
concept of neighborhood and that of fuzzy set. In the second
application several indices are used to describe spatial and
temporal characteristics of the demand for schools in an area
in central Mexico.

Finally, some future areas of research are proposed. Although
these ideas concerning neighborhood models have not been
completely explored and developed, it is clear that the
approach provides a fruitful avenue for research.

To Rodolfo, Fito and Pablo,
Chata and Chato,
Maru, Ara, and Paco.

To my friends.

This thesis was written under the senior supervision of Thomas
K. Poiker from whom I received suggestions and assistance as
well as support and encouragement. Discussion of the thesis
with Pat Brantingham (supervisor) was also of assistance and
her support and enthusiasm were of great help for the
completion of this work. Binai Bhattacharya (supervisor) made
valuable comments on the algorithmic aspects as well as on the
general format. I have also benefitted from discussions with
David Eaves (supervisor) as well as from Armando Bayona and
Rigoberto Quintero (Dirección General de Geografía, México).
In several aspects related to the Mexican educational planning
system I was kindly assisted by Maria Eugenia Reyes (SEP,
México). David Howard (SEP, México) read the preliminary
versions and made valuable suggestions regarding the use of the
English language. Dr. Enrique Calderón (UNAM, México) is
gratefully acknowledged for his continuous support during the
Ph.D. program.

Financial assistance for the first four years was given by the
Mexican Government (CONACYT); later I was awarded a
one-semester S.F.U. stipend, and in the last stage of the
completion of this thesis I received assistance from the
Dirección General de Geografía (México). Néstor Duch is kindly
acknowledged for this support.


TABLE OF CONTENTS

Approval
Abstract
Dedication
Acknowledgments

CHAPTER 1. Introduction
1.1 Mathematical Models
1.2 Quantitative Geography
1.2.1 Galton's Problem
1.2.2 Neighborhoods

CHAPTER 2. Mathematical Models in Human Geography
2.1 Classical Models
2.1.1 Factor Models
2.1.2 Gravity Models
2.1.3 Networks
2.2 The Neighborhood Approach
2.2.1 Geographical and Topological Neighborhoods
2.2.2 Spatial Autocorrelation
2.2.3 Geostatistics
2.2.4 Topological Data Structures
2.3 Discussion

CHAPTER 3. Neighborhood Models
3.1 A General Framework
3.1.1 Intuitive Ideas
3.1.2 Spaces and Subspaces
3.1.3 Geo-spaces
3.1.4 Geo-subspaces
3.2 Definition of Neighborhood Model
3.3 The Heterogeneity Index
3.3.1 Formal Definition
3.3.2 Interpretations
3.4 Fuzzy Topology
3.4.1 Definition of Fuzziness
3.5 Other Indices
3.6 Conclusions

CHAPTER 4. A Topological Approach to Regionalization
4.1 Regionalization as a Classification Problem
4.1.1 Elements of a Regionalization
4.1.2 The Geographic Units
4.1.3 Measures of Homogeneity
4.1.4 Regionalization Constraints
4.1.5 The Number of Regions
4.1.6 The Algorithms
4.2 Spatial Algorithms
4.2.1 Byfuglien and Nordgard Algorithm
4.2.2 Berry's Algorithm
4.2.3 Lankford Algorithm
4.2.4 Brantingham Algorithm
4.3 The Design of Algorithms
4.3.1 A Topological Algorithm
4.3.2 The Regions as Graphs
4.3.3 Heterogeneous Regions
4.3.4 Fuzzy Regions
4.3.5 The Heterogeneity Surface
4.4 A Comparative Example
4.4.1 A Hypothetical Case
4.4.2 A Topological Ward Algorithm
4.4.3 Conclusions

CHAPTER 5. An Application to Educational Planning
5.1 Introduction
5.2 School Location Planning
5.2.1 Models and Methods
5.2.2 Administrative Policies
5.3 Educational Planning in Mexico
5.3.1 Historical Background
5.3.2 Planning Experiences
5.4 A Case Study of the State of San Luis Potosi
5.4.1 The Data
5.4.2 The Geo-space
5.4.3 The Geo-subspaces
5.4.4 Some Spatial Characteristics of the Demand for Preparatory Schools
5.4.5 Additional Considerations
5.5 Summary

CHAPTER 6. Conclusions
6.1 Summary
6.2 Discussion

Appendix A. Mathematical Concepts
A.1 Mathematical Definitions
A.2 Mathematical Discussion

References

List of Tables

Number of Neighbors by Distance
Service of Preparatory Schools for a 20 Km. Travelling Distance (1984 data)
Number of Secondary Graduates per Settlement
Standardized Heterogeneity Index 20 Km. Network
Standardized Heterogeneity Index 15 and 25 Km. Networks (1983)
Standardized Temporal Heterogeneity Index (Excluding County Seats)
Standardized Heterogeneity Index of the Temporal Stability


List of Figures

Complete Graphs of Three and Four Points
Dendrogram
Original Data
Resulting Regions (First Stage)
Subgraphs and Minimal Spanning Trees
Minimal Spanning Tree (Second Stage)
Dendrogram
Minimal Spanning Tree
Single Linkage Method
First Order Neighbors
Study Area
Hypothetical Case. Variables A, B and C
Multivariate Heterogeneity Index
Topological Algorithm (First Stage)
Variables A, B and C. Standardized Heterogeneity Index
Standardized Multivariate Heterogeneity Index
Topological Algorithm (Second Stage). 21 Resulting Regions
Single Linkage Method. 69 Resulting Regions
Single Linkage Method. 21 Resulting Regions
Number of Schools. Number of Students
Distribution of Elementary Schools in the State of San Luis Potosi
Distribution of Secondary Schools in the State of San Luis Potosi
Distribution of Preparatory Schools in the State of San Luis Potosi
State of San Luis Potosi Communication Network
25 Km. Network
20 Km. Network
15 Km. Network
10 Km. Network
Catchment Areas, 20 Km. Network
Heterogeneity Index. 20 Km. Network
Spatial Distribution of the Temporal Heterogeneity Index
Distribution of the Spatial Heterogeneity Index Applied to Temporal Stability

Chapter 1.

INTRODUCTION

Since the beginning of the Quantitative Revolution in Geography
a considerable number of mathematical models have been used to
represent different aspects of the geographical landscape.
There are, however, certain facets of spatial structures that
have received less attention from modelers. Such is the case of
the geographical notion of neighborhood. The main aim of this
thesis is to build mathematical models designed to allow the
formal representation of the geographical concept of
neighborhood.

In the first part of this chapter, an overview of the role
played by mathematical models in the discipline is given,
while in the second part, intuitive notions of the

1.1 Mathematical Models

Models have played a fundamental role in the development of
various areas of knowledge such as physics, engineering,
architecture, computing, economics and geography. Commonly a
model is said to be a representation of objects, events and
processes of the real world (Johnson, 1963, p.218). Depending
on their purpose, models are classified into different groups.
For example, Forrester (1973, p.49) distinguishes between
physical and abstract, dynamic and static, linear and
nonlinear, and stable and unstable models. A model is said to
be mathematical when mathematical language is used in its
representation.

Mathematical models have been used extensively in the natural
sciences. For example, in physics practically all knowledge is
expressed in mathematical language, and theoretical advancement
has been accomplished to a large degree through mathematical
modeling. On the other hand, in the social sciences the use of
mathematical models has not been as extensive or successful.
In geography, models such as maps have always been intimately
related to geographic knowledge. However, mathematical models
did not have a notable presence in the development of
geographical theory until the so-called "Quantitative
Revolution" which took place in the 1950's. As a consequence,
a new branch of geography, quantitative geography, was
established.

1.2 Quantitative Geography

As noted by Gregory (1983, p.80) mathematics has been used in
geography for a long time, particularly in the construction and
use of maps. According to his view, trigonometry, Euclidean
geometry and space transformations are areas in which
geographers are traditionally trained.

The quantitative era in geography is characterized not simply
by the presence of mathematical techniques but by the
intensification and expansion of their use and the introduction
of mathematical modeling and abstract theory. Although
quantitative geography originated in the United States in the
1950's, its development in other countries shows distinctive
characteristics. According to recent literature there are two
major schools of quantitative geography: the Anglo-American and
the Continental European schools. They can be identified by the
facts and circumstances of their development. For example,
Bennett (1981, p.1) characterizes the German and French
tradition as being more concerned with deep methodological
questioning, while in the English-speaking countries more
importance has been given to the development of analytical
techniques, and a more pragmatic approach has been pursued.

In the 1980's, three decades after the initiation of the
quantitative revolution, European researchers are engaged in
the historical analysis of the quantitative approach and the
contrastive analysis of the state of the art in the two
principal schools. Several academic meetings have been
organized to discuss these topics and their relevance to the
future of quantitative and theoretical geography (Haining,
1984, Bennett, 1981 and Beaumont, 1983).

As a result of these meetings, a consensus seems to have
emerged regarding the initial development stages of
quantitative geography: in early research, too much importance
was given to the techniques and not enough to their role in the
development of geographic theory. For example, the main purpose
in applying mathematical techniques was often to test untried
tools. In this respect Bennett and Wrigley's more theoretical
perspective (1981, p.6) regarding "core" and "frame"
disciplines is particularly interesting.

According to these authors, core disciplines are "those areas
of thinking which are providing new systems, concepts, and
developing new explanatory paradigms." In contrast, frame
disciplines "are those which derive in methodology, object of
study, and terminology, from other external subjects."
Adopting these terms, quantitative geography sought to be a
core discipline in its initial stage, while in more recent
years it has become more of a frame discipline. Consequently,
researchers in various branches of geography are making more
extensive use of quantitative techniques without necessarily
considering themselves "quantitative geographers."

It should be mentioned that quantitative geography has evolved
not only in its approach to the use of mathematical models but
also in the kind of techniques used. For example, statistical
inference in the 1960's "was seized as a panacea for
geographical methodology" (Bennett and Wrigley, 1981, p.8),
while in the 1970's the appropriateness of this methodology was
questioned. As a result, statistical techniques for specific
geographical purposes were designed. Such is the case of Cliff
and Ord's autocorrelation (1973) and Clark's geostatistics
(1979). Bennett and Wrigley (1981, p.9) have anticipated that
statistical inference will continue to be used in the future
but not to the extent that it was in the past.

In general, early quantitative methods adapted mathematical
models designed in other areas of knowledge to represent the
"geographical landscape." This is the case of such widely used
models as the gravity model, factor analytic methods,
regression models and cluster analysis, as well as a wide range
of statistical techniques. However, as a result of critical
analysis by both geographers and scientists from other
disciplines, in more recent years more appropriate models and
discipline-specific techniques have been developed.

1.2.1 Galton's Problem

The criticism made by Sir Francis Galton at the end of the last
century of the use of correlation analysis in anthropological
studies is the point of departure of the present work.

In 1889 at a meeting of the Royal Anthropological Institute,
E. B. Tylor recognized that anthropology needed a scientific
methodology for the analysis of ethnographic data and proposed
a cross-cultural survey methodology. He applied this method to
data he had collected on different tribes and societies. The
importance of Tylor's work resides in the fact that he was able
to correlate distinct traits which were present in the various
human groups under study. During the discussion period of the
meeting, a comment made by Sir Francis Galton had a significant
impact on the future development of both anthropology and
geography:

It was extremely desirable for the sake of those who may wish
to study the evidence for Dr. Tylor's conclusions, that full
information should be given as to the degree in which the
customs of the tribes and races which are compared together
are independent. It might be, that some of the tribes had
derived them from a common source, so that they were duplicate
copies of the same original. Certainly, in such an
investigation as this, each of the observations ought, in the
language of the statisticians, to be carefully "weighted." It
would give a useful idea of the distribution of the several
customs and of their relative prevalence in the world, if a
map were so marked by shadings and colour as to present a
picture of their geographical ranges (Tylor, 1889).

As a consequence of this remark spatial dependence became an
issue in both anthropological and geographical studies, and
spatial analysts became aware of the need to incorporate the
"spatial dimension" in their mathematical models. One of the
concepts that has been used in modeling spatial dependence is
that of "geographical neighborhood."
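Galton's objection can be illustrated numerically. The sketch below is not taken from Tylor's or Galton's material: the trait scores, the number of copies and the helper names `pearson_r` and `t_statistic` are all hypothetical. It assumes that each of six independent societies has been copied exactly by four others, and compares the correlation coefficient and the usual t statistic computed from the six original cases with those computed from the thirty recorded cases.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    return float(np.corrcoef(x, y)[0, 1])

def t_statistic(r, n):
    """Usual t statistic for testing r = 0; it assumes n independent cases."""
    return float(r * np.sqrt((n - 2) / (1.0 - r ** 2)))

# Hypothetical scores for two traits observed in six independent societies.
trait_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
trait_b = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Assumed scenario: every society appears five times in the survey (the
# original plus four "duplicate copies"), so 30 cases carry the
# information of only six.
copies = 5
dup_a = np.repeat(trait_a, copies)
dup_b = np.repeat(trait_b, copies)

r_independent = pearson_r(trait_a, trait_b)
r_duplicated = pearson_r(dup_a, dup_b)
t_independent = t_statistic(r_independent, trait_a.size)
t_duplicated = t_statistic(r_duplicated, dup_a.size)

print(f"r: {r_independent:.3f} vs {r_duplicated:.3f}")
print(f"t: {t_independent:.2f} vs {t_duplicated:.2f}")
```

Under these assumptions the coefficient itself does not change, but the test statistic grows with the nominal sample size, which is one way to read Galton's request that dependent observations be "weighted."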

1.2.2 Neighborhoods

A retrospective analysis of mathematical models and techniques
makes it clear that certain aspects of the "geographical
landscape" have been much more widely modeled than others. For
example, the modeling of properties such as distance and shape
was part of the "new geometric tradition" of the early 1960's
(Haining, 1983, p.86). There are, however, other aspects of the
geographical landscape which have received little attention
from modelers. This is the case of the geographical notion of
"neighborhood."

The concept of neighborhood is treated in geographical studies
at different levels and with different meanings. For example,
in urban studies neighborhoods are considered as sub-areas with
homogeneous characteristics such as level of income, age of
population, etc. and are used as the basic elements for
analysis

all, 1583).

In other instances geometric characteristics of the
geographical landscape are studied through the relationship
among neighbors. Such is the case of the cardinal neighbor
method (Elliott, 1983) where the geometric patterns of cities
and population centers are identified and analysed, based on
the neighborhood relationship among them.

Neighborhoods also play an important role in other areas of
knowledge. For example, in mathematics there is a whole
discipline which is based on the formal notion of neighborhood,
namely topology. In it, concepts that are usually defined in a
Euclidean space are generalized through the concept of
neighborhood, and the "topological characteristics" of
mathematical objects are studied (Firby and Gardiner, 1982).
Furthermore, these mathematical concepts have been applied in
other disciplines such as chemistry, where the topological
characteristics of three-dimensional networks are used in the
analysis of the chemical characteristics of compounds
(Springer, 1973) and in the classification and characterization
of gold compounds (Hall, Gilmour and Mingos, 1984), among other
applications.

Although the notion of neighborhood is conceived in different
manners among distinct disciplines, at an intuitive level the
concept is based on the idea that the "space" that surrounds an
entity or a specific portion of "space" is of special
significance. At elementary levels of analysis in both
mathematics and geography, the concept of neighborhood is
related to the intuitive notion of "surrounding space".
However, in both disciplines different definitions of
neighborhood are used according to the problem at hand. For
example, in some quantitative techniques such as
autocorrelation, geostatistics and topological data structures,
specific meanings are given to the concept of neighborhood.
There is, however, no general treatment of this concept in the
geographical literature. To exemplify some of the methods and
techniques, specific definitions of neighborhood are used
throughout the presentation, but emphasis is placed on the
general concept of geographical neighborhood.

Briefly, the principal objective of this thesis is the design
of a family of formal tools that allow the modeling of the
concept of "geographical neighborhood."

In Chapter 2 some classical models as well as those that have
incorporated the spatial dimension through the concept of
neighborhood are reviewed. In Chapter 3 the notion of
neighborhood model is introduced, and the mathematical concepts
that are used in the modeling process are defined.

In Chapters 4 and 5 two applications of neighborhood models are
presented. The first is a theoretical application in
regionalization. The second is an application to a specific
problem in educational planning. Finally, the appendix contains
a mathematical discussion of some of the concepts used in the
thesis.

Chapter 2.

MATHEMATICAL MODELS IN HUMAN GEOGRAPHY

From the period of the "Quantitative Revolution" in geography
to the present, a considerable number of mathematical models
and methods have been used for geographical analysis. Many of
them are adaptations of formal tools used in other sciences
such as physics, biology, botany, economics and psychology.
The development of quantitative geography is at a stage where
it has become necessary to analyse the benefits and limitations
of the techniques used in the past and to propose new ones.

Mathematical models that have been used in the past have
represented distinct aspects of spatial structures using
various formal tools. Since the principal objective of this
work is to present neighborhood models as an alternative for
the modeling of spatial structures, it is convenient to study
some of the mathematical structures that have been incorporated
into existing models.

In this chapter some of the mathematical models that can be
considered as classical in human geography are reviewed, as
well as those that have incorporated the notion of neighborhood
in the representation of the geographic landscape.

2.1 Classical Models

Three of the models that have been extensively used in human
geography are: factor models, the gravity model and networks.
According to the purpose of application of each one of them,
the geographical landscape has been modeled in different ways.
Whether the models actually include the entities and the
relationships among them that are most important for
geographical studies remains an open question. The reason for
presenting these three models is to establish which elements
of the geographical landscape have been included and to discuss
to what extent the spatial structure has been modeled.

2.1.1 Factor Models

Factor analysis was originally developed to aid scientists in
testing hypotheses concerned with the organization of mental
ability. At the beginning of the century, psychologists were
interested in measuring "general intelligence" by defining and
quantifying its components. The point of departure is an n x m
matrix which includes, for "n" persons, the scores of "m" tests
associated with each one of them. By means of factor analysis a
small number (r<<m) of hypothetical factors are determined.
These factors allowed psychologists to isolate what they viewed
as fundamental personality components or "factors of the mind"
(Rees, 1971, p.220). According to Lawley (1971, p.1) the method
was initially restricted to psychometrics, and for some time it
remained the black sheep of statistical theory. Although it is
considered to be a "complicated" technique, its use is
increasingly widespread today due to the existence of
easy-to-use computer packages which facilitate its application.
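The reduction from m tests to r factors can be made concrete with a small numerical sketch. Everything in it is an assumption made for illustration: the scores are synthetic, the loading values are invented, and a principal component decomposition of the correlation matrix is used as a simple stand-in for a full factor analysis, which would also estimate test-specific variances.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons, m_tests, r_factors = 200, 6, 2

# Assumed data-generating process: every test score is a weighted sum of
# r hidden factors plus a small amount of test-specific noise.
loadings = np.array([
    [0.9, 0.8, 0.7, 0.1, 0.2, 0.0],   # weights on hypothetical factor 1
    [0.1, 0.0, 0.2, 0.9, 0.8, 0.7],   # weights on hypothetical factor 2
])
factors = rng.normal(size=(n_persons, r_factors))
noise = 0.1 * rng.normal(size=(n_persons, m_tests))
scores = factors @ loadings + noise          # the n x m score matrix

# Principal components of the m x m correlation matrix of the tests.
correlation = np.corrcoef(scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(correlation)[::-1]   # largest first

# Share of the standardized variance carried by the first r components.
explained = eigenvalues[:r_factors].sum() / eigenvalues.sum()
print(f"first {r_factors} of {m_tests} components explain {explained:.1%}")
```

In this synthetic case two components summarize nearly all of the variation in six tests, which is the sense in which a small number of factors (r<<m) can stand for the original variables.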

This mathematical model has been extensively described both for
researchers with a mathematical and statistical background
(Lawley, 1971 and Harman, 1976) and for those who lack such
training (Taylor, 1977 and Berry, 1971). Since this study is
principally concerned with the specific application of the
model, only those mathematical aspects which are relevant to a
geographic context will be discussed.

Factorial Ecology

Factorial ecology is the branch of quantitative geography which
is concerned with the use of factor models in geographical
studies. Although principal component analysis was developed by
Pearson (1901) and Hotelling (1933), factor analysis began with
the work of Spearman (1904, 1926). Research in factorial
ecology does not appear until the mid-fifties (Bell, 1955).
However, the technique has become well-established since then,
and it is presently taught as part of the regular curricula in
the geography programs offered by most universities.

According to Taylor (1977, p.255) factorial ecology developed
in a period in which two opposing groups of researchers were
studying the structure of urban centers. On the one hand, there
were the human ecologists who were interested in the ecology of
urban areas and had proposed several spatial models (concentric
ring model, sector model and the multiple nuclei model). On the
other hand, there were the social area analysts who
hypothesized that the urban social structure could be
characterized through three indices or dimensions: economic
status, family status and ethnic status. In fact, the initial
application of ecological factor analysis was undertaken by
Bell (1955). Its purpose was to test the hypothesis that urban
populations could be adequately described by the three-status
criteria.

The results obtained by Bell were encouraging enough to cause
widespread acceptance and use of the technique among urban
social geographers. Studies of various cities in the U.S.
(Salins, 1971, p.235) allowed researchers to undertake
comparative (congruence) analysis among the different
metropolitan areas and over several points in time. In other
major urban centers in the world, factor models were applied
similarly (Haynes, 1971, p.324, Janson, 1971, Johnston, 1971,
p.315).

The extensive use of the factor analysis technique for
hypothesis testing and as a descriptive tool has also made
researchers increasingly aware of its limitations. In some
urban ecology studies the use of the technique has been
successful, but, as with any other formal tool, the appropriate
use of factor models depends on the understanding that the
researcher has of both the problem and the technique.

Factorial Ecology as a Geographical Model

Why and how does factorial ecology qualify as a geographical
model? The fact that the model is applied through the use of
areal units has been considered enough to automatically
classify it as a spatial analysis technique. There have been
some attempts to introduce the location variable as one of the
variates in the factor model in order to transform it into a
"more" explicitly geographic model. However, the results
obtained from these procedures have come under strong
criticism. For example, when latitude and longitude are
included as variables in the factor model, two main
disadvantages have been found: the circularity of method and
the lack of invariance to the selection of axes (Taylor, 1977,
p.275). The first objection refers to the fact that in the
procedure location variables (latitude and longitude) are used
to calculate scores which are later located on a map, producing
as an effect the location of locations. In the second case,
criticism is based on the fact that the selection of orthogonal
axes is arbitrary so that the factors obtained in each case are
not necessarily equal. In other words, the factorial model is
sensitive to the change of axes of the location variables.
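The second objection can be checked with a small experiment. The sketch below is illustrative only and is not Taylor's procedure: the coordinates and the `income` variable are synthetic, the helper `component_variances` is hypothetical, and the eigenvalues of a correlation matrix stand in for a full factor solution. It describes the same set of locations with two sets of orthogonal axes, one rotated 45 degrees from the other, and compares the results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical areal units in an elongated study area: wide spread along
# the first axis, narrow along the second, plus one census-type variable.
n_units = 500
easting = rng.normal(scale=10.0, size=n_units)
northing = rng.normal(scale=1.0, size=n_units)
income = easting + rng.normal(scale=5.0, size=n_units)

def component_variances(x, y, z):
    """Eigenvalues (largest first) of the correlation matrix of x, y, z."""
    corr = np.corrcoef(np.column_stack([x, y, z]), rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]

# The same locations described with axes rotated by 45 degrees.
theta = np.pi / 4
rot_x = np.cos(theta) * easting + np.sin(theta) * northing
rot_y = -np.sin(theta) * easting + np.cos(theta) * northing

original = component_variances(easting, northing, income)
rotated = component_variances(rot_x, rot_y, income)

print("original axes:", np.round(original, 3))
print("rotated axes: ", np.round(rotated, 3))
```

The rotation leaves every distance between units unchanged, yet the component variances differ between the two runs. In this sketch the difference arises because each coordinate is standardized separately before the decomposition, so the outcome depends on how the axes happen to be oriented.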

In order to understand how factorial ecology models the
geographical landscape it is important to review some basic
concepts behind the modeling process.

When the decision to use a mathematical model is made, the
common procedure is to develop or select the most appropriate
from the existing models. The most relevant objects of the
phenomenon under study, together with the most important known
relationships among them, are usually expected to be included
in the model.

One of the advantages of using a mathematical model is that
once the real environment has been identified with a
mathematical structure, all mathematical knowledge is at the
service of the researcher.

Expected and unexpected relationships among the objects can
emerge as the result of applying the mathematical techniques.
This may allow the researcher to find optimum solutions to
specific problems. In fact, the possibilities of using
mathematical models as aids in research and in the solution of
specific problems are limited only by the researcher's ability
to apply existing tools or to develop new ones.

The history of the application of the model under discussion
reveals that researchers took no special interest in the
spatial relationships among objects (in this case areal
units). In fact, there has been no real conceptual difference
in the use of the technique by geographers and psychologists.
Once the areal units and the various census variables are
defined and identified with a mathematical structure (in this
case a matrix), all other crucial relationships beyond the
scope of the model tend to be ignored.

For example, factors that are extremely important for certain


geographical analyses such as distance, nearness, relative
Position, and contiguity are in no way subject to analysis by
the factorial model.

The model is insensitive to all these

factors.

As applied in Bell's study, the model is unable to

consider the relevance of the size and spatial distribution of


the census tracts.

Whether the census tracts of the city of

Los Angeles were arranged in a regular shape such as a square


or in a chain mode or as they actually are, and whether
similar tracts were close together or far apart, could not
influence the conclusions Bell arrived at based on the
factorial model. In short, the problem is the incapacity of
factorial ecology to model any spatial structure.

The question that remains is not so much whether introducing

the relative position or absolute location makes factorial


ecology a "more" geographic model, but rather how important it
is that the models used by urban social geographers take
account of those spatial relationships which have been ignored

until now.

This question is still subject to debate.

2.1.2 Gravity Models

The law of universal gravitation announced by Newton in 1666

is one of the cornerstones of modern science.

The law can be

stated as follows:
Every particle in the universe attracts every other particle
with a force that varies directly as the product of the masses
of the two particles and inversely as the square of their
distance apart. The direction of the force is along the
straight line joining the two particles (Fowles, 1970, p.139).
The importance of movement in social phenomena inspired some

social geographers to use models similar to those developed by


physicists.

A family of models analogous to the law of universal
gravitation has been developed in geography to study such
phenomena as migration between population centers and retail
trade areas: the movements of persons (journey to work,
journey to shop, etc.) and the movement of goods.

The first models, such as the one used by Ravenstein in 1885


for migration studies in England and Wales, were strongly
inspired by the Newtonian model.

However, more recently

geographers have developed a whole new family of interaction


models that differ substantially from the physical model
(Wilson, 1971). As Wilson shows (1971, p.1), "gravity model"
has become something of a misnomer.

For example, a conceptual

distinction worth making is that while in physical terms the


gravitational force is exerted in equal magnitude on both
particles, and motion comes as an effect of that force, in
geographical applications the equivalent of force is
identified with movement (flows).

Gravity models have been mainly designed for use as predictive


and descriptive tools.

In the past, planning decisions have

been based on these models' predictions of traffic flows in


several metropolitan areas in the United States, population
migration between cities, and sales in a shopping center
(Taylor, 1977, p.287).

The mathematical expression of the model varies according to


predictive or descriptive objectives. Below are two of the
most common equations.

Migration Model

The migration Tij between an origin i and a destination j is
directly proportional to the product of the sizes of the areas
Oi and Dj and inversely proportional to the distance dij
between them, raised to some power n.

$$T_{ij} = k \, O_i \, D_j \, (d_{ij})^{-n} \tag{2.1}$$
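The behavior of equation (2.1) can be illustrated with a minimal sketch. The function name, the sizes, the distances and the values k = 1 and n = 2 are hypothetical and chosen only for illustration; they are not taken from any of the studies cited.

```python
def gravity_flow(origin_size, dest_size, distance, k=1.0, n=2.0):
    """Predicted flow T_ij = k * O_i * D_j * d_ij**(-n), as in equation (2.1)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * origin_size * dest_size * distance ** (-n)


# Hypothetical sizes (thousands of inhabitants) and distances (km).
near = gravity_flow(origin_size=100, dest_size=50, distance=10)
far = gravity_flow(origin_size=100, dest_size=50, distance=20)

# For fixed sizes, doubling the distance divides the flow by 2**n.
print(near, far)
```

With n = 2 the predicted flow at twice the distance is one quarter of the original flow, which is the distance-decay effect the model is meant to capture.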

Interaction Models

The interaction between zones i and j is directly proportional
to the product of the "mass terms" Wi and Wj associated with
the zones and inversely proportional to a measure of distance
or cost of travel, cij, raised to some power n.

$$T_{ij} = k \, W_i \, W_j \, (c_{ij})^{-n} \tag{2.2}$$

In these types of models it is common to find additional
constraints. These restrictions are often related to
knowledge of the total interaction, or of the flows leaving
or arriving at a zone.

The full interaction model can be written as:

$$T_{ij} = A_i \, B_j \, O_i \, D_j \, f(c_{ij}) \tag{2.3}$$

The difference between equation (2.3) and previous


mathematical expressions of the model is that the constant k
is substituted by the product of Ai and Bj.

Equations (2.4)

and (2.5) are derived from the constraints imposed on the


model.

According to Shepard (1969, p.8) there are two established


methodologies to estimate the parameters of the gravity model:
the regression and entropy maximization approaches.

In the first case the researcher deals with the model:

$$T_{ij} = k \, O_i \, D_j \, (d_{ij})^{-n} \, E_{ij} \tag{2.8}$$

where Eij is a stochastic residual. To estimate the
parameters with the ordinary least squares method, equation
(2.8) is transformed through the logarithmic function:

$$\log T_{ij} = \log k + \log (O_i D_j) - n \log d_{ij} + \log E_{ij} \tag{2.9}$$
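The regression approach can be sketched as follows. Because the term log Oi Dj enters equation (2.9) with a coefficient of one, this sketch moves it to the left-hand side and regresses log Tij - log (Oi Dj) on log dij, so that the intercept estimates log k and the slope estimates -n. The function name and the synthetic, noise-free data (generated with k = 0.5 and n = 2) are assumptions made for illustration only.

```python
import math


def fit_gravity(flows, masses, distances):
    """Estimate k and n in T = k * (O*D) * d**(-n) by ordinary least squares.

    y = log T - log(O*D) is regressed on x = log d, so the intercept
    estimates log k and the slope estimates -n.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(t) - math.log(m) for t, m in zip(flows, masses)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic flows generated with hypothetical parameters k = 0.5, n = 2.
masses = [5000.0, 1200.0, 800.0, 3000.0]   # products O_i * D_j
distances = [10.0, 25.0, 40.0, 5.0]
flows = [0.5 * m * d ** -2 for m, d in zip(masses, distances)]

k_hat, n_hat = fit_gravity(flows, masses, distances)
print(k_hat, n_hat)
```

With real data the residual term would not vanish and the estimates would only approximate the underlying parameters.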

With the entropy maximization approach a method similar to the
microcanonical ensemble technique in statistical mechanics is
used. The details are not discussed in this work and the
reader is referred to Wilson (1971, p.4) and Haggett (1977,
p.40).

The Gravity Model as a Spatial Model

The Newtonian gravity model is undoubtedly extremely simple.


Its geographic counterpart is also simple.

Social processes

such as migration and journey-to-work trips are predicted from


the relationship between two entities: "mass" and "distance".

Depending on the purpose of the application, mass and distance


represent different quantities.

Their relationship is clearly

established in each one of the postulated models.

For

example, in equation (2.1) for a fixed mass the flow of


population increases as the distance decreases (the inverse is
also true).

This is a well-known geographic relation which

has been stated in various ways: "Towns attract more trade


from near than from far locations" (Taylor, 1977, p.207);
"Everything is related with everything else but near things
are more related than distant ones" (Tobler, 1970).

Despite

their simplicity, gravity models allow researchers to work

with important aspects of spatial structure.

It is important to note that by means of the law of universal


gravitation it is possible to calculate the force exerted
between two particles.

However, if more than two particles

are involved in the analysis the problem becomes more


complicated.

Although the motions of the planets in the solar system have
been calculated through numerical solutions (Symon, 1969,
p.185), there is no general solution to the

problem involving the motion of any number of particles under


the forces exerted on one another.

Analogously, some of the gravity models postulated in


geography describe interaction exclusively between two
entities.

For example, in the prediction of migration between

two cities as represented in equation (2.1), it is assumed


that there is no other interaction between either of these two
cities and the rest of the universe of study.

There are,

however, geographic gravity models that have been designed to


account for the effect of other cities in the interaction
between two cities.

For example, in the study of

transportation it is assumed that the interaction between any


two cities is reduced due to the presence of a third one.
This assumption is based on the concept of "intervening
opportunity", which according to Taaffe and Gauthier (1973,
p.95) was first formulated by Stouffer in a study of

intra-urban migration.

This concept is derived from the

following reasoning:
"The number of migrants from any point within a city to a zone
at the periphery of the city was directly related

to the

number of opportunities or vacancies in that zone and


inversely related to the number of opportunities between the
originating point within the city and the zone in the
periphery."
In the gravity model that includes this notion it is assumed
that the relationship of intervening opportunity is similar to
that of a distance. The model can be expressed as follows:

$$T_{ij} = k \, O_i \, D_j \, (d_{ij})^{-n} \, P_{ij}$$

where Pij is the intervening opportunity.

2.1.3 Networks

One of the branches of mathematics that has had a wide range


of application is graph theory.

It has been used in physics,

chemistry, computer technology, architecture, sociology,


anthropology, linguistics, geography and other disciplines.
According to Harary (1972, p.1), graph theory has been
independently discovered several times.

In the 18th century

Euler gave a solution to the Königsberg Bridge Problem with
the aid of graphs. In the 19th century Kirchhoff and Cayley
solved problems in physics and chemistry respectively using
elements of graph theory (Harary, 1972, p.2).


Graphs are commonly represented through diagrams that greatly
facilitate their interpretation.

The diagram is composed of a

set of points representing different entities and a set of


lines joining these points, representing a pre-established
relationship among the points.
Formally a graph can be defined as follows:

A graph consists of a finite set V=V(G) of p points together
with a prescribed set X of q unordered pairs of distinct points
of V. Each pair x={u,v} of points in X is a line of G, and x
is said to join u and v.
(Harary, 1972, p.9)
This simple mathematical structure has allowed researchers to
solve a wide variety of problems and at the same time has
encouraged the development of this branch of mathematics.

In

particular, the use of the model to represent geographic


entities and their relationships is very common.

For example, graph theory as it relates to the concept of
connectivity has been used to establish the definition of
routes between two or more population centers.

In a similar context, transport

networks such as railway and road systems have been modeled


with graphs and compared spatially and temporally through
different measures.

Since the economic development of various

countries has been related to the connectivity of railway


networks, the technique has allowed researchers to establish
interesting relationships.

Studies have been carried out at different levels (urban,
regional, international) using the appropriate measures in
each case (beta index, density, etc.) (Haggett, 1977,
pp. 86-92).

Additionally, temporal comparisons in the growth of transport
networks permitted Taaffe, Morrill and Gould to identify four
phases of development in various countries. In other cases
graphic simulation models have been developed to predict
network growth (Haggett, 1977, p.301).

Since graph theory is closely related to other branches of


mathematics including linear programming, combinatorics,
matrix theory, topology and probability, it is common to find
many other instances where graphs are used in a geographic
context.

Whenever any of these types of models is applied,

graphs are often used either as part of the analysis or as an


aid in the presentation and interpretation of results.

Networks as Spatial Models

Among the existing mathematical tools, graphs appear to be one


of the most appropriate means to model spatial relationships
among geographic entities.

They have been used as descriptive

and predictive tools and for hypothesis testing.

The diagrammatic representation of a graph gives the researcher


the opportunity to obtain a complete image of the relationship
among the entities under study.

This is extremely important

for spatial analysis purposes since it allows the researcher

to perceive given relationships spatially and, if necessary,


to correlate them with other spatial relationships. The model
is flexible enough to allow various spatial relationships to
be represented.

Metric relations, such as distances, either measured in the
Euclidean plane or in a geographic space (e.g. traffic flows,
route distances) can be represented in the model by attaching
a weight and/or a direction to the corresponding link.
Non-metric or qualitative relations such as contiguity are
also easily modeled.
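A small weighted graph of this kind can be sketched as follows. The place names, the route distances and the helper name are hypothetical, and the beta index is computed here as the number of lines divided by the number of points, which is the usual definition of that connectivity measure.

```python
# A small weighted, undirected graph; names and distances are hypothetical.
links = {
    ("A", "B"): 12.0,   # route distance in km
    ("B", "C"): 7.5,
    ("A", "C"): 15.0,
    ("C", "D"): 4.0,
}


def neighbors(links, point):
    """Return the points joined to `point` by a line of the graph."""
    result = set()
    for u, v in links:
        if u == point:
            result.add(v)
        elif v == point:
            result.add(u)
    return result


points = {p for pair in links for p in pair}
beta_index = len(links) / len(points)   # lines per point

print(sorted(neighbors(links, "C")), beta_index)
```

A purely qualitative relation such as contiguity would be stored in the same way, simply omitting the weights.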

2.2 The Neighborhood Approach

During the first stage of development of quantitative


geography there was a strong tendency to use models that had
been designed in other branches of knowledge.

However, as a

consequence of the awareness of spatial analysts of the need


to represent the notion of spatial dependence mathematically,
several models that incorporate this concept have been
developed in different branches of geography.
Neighborhoods have been commonly used as a tool in the
modeling of spatial dependence.

This concept has different

meanings in geographical and mathematical terms.

Both

meanings are fundamental to the development of "neighborhood


models", which is the major aim of the present work.

In the first part of the following section the geographical

and mathematical meanings of neighborhood are presented, and


in the second part three models developed in the latter stage
of the quantitative revolution which include the notion of
neighborhood are discussed.

2.2.1

Geographical and Topological Neighborhoods

Geographical Neighborhoods.

The notion of neighborhood appears frequently in geographical


theory.

For example, as cited by Taylor (1977), in the central place
theory proposed by Christaller (1933) and Lösch (1940) it is
assumed that settlements provide specialized functions for
other settlements. The size and shape of the areas served by
each of the "central places" have been among the main topics
studied in this branch of geography.

One of

the best known hypotheses is that under certain conditions,


including aspects such as demand for central goods, purchasing
power, flow of consumers and other factors, the shape of the
trade regions of the central places is hexagonal (Haggett,
1977, p.146).

Similarly, in applied branches of geography

such as school location planning (see chapter 5), the areas


surrounding a population center play an important role in
educational planning.

The areas served by a school or group

of schools are called catchment areas.

The distribution, size and shape of these areas are studied to
optimize the design of school districts, among other things.


Similar neighborhood concepts have been useful in the planning
of health services, shopping centers and banking facilities.

These and other geographic notions of neighborhood have been


formalized using different mathematical concepts including
geometric entities and graphs.

Commonly, the point of

departure is a set of geographic units which are often


identified with either points or areas in the Euclidean plane.

Geometric Entities. When the Euclidean plane is the model involved, it is common


to use regular geometric entities such as circles, rectangles
or hexagons to delimit neighborhoods.

Other geometric

entities such as Thiessen polygons are also used to define


neighborhoods.

In this case the polygons are constructed so that given a set
of data points in the real plane, all points inside a polygon
centered on a data point are closer to that point than to any
other data point (Peucker et al., 1976, p.26).
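The defining property of Thiessen polygons can be used directly, without constructing the polygons, to decide which polygon a location falls in. The sketch below assumes Euclidean distance; the function name and the coordinates are hypothetical.

```python
import math


def thiessen_owner(query, data_points):
    """Return the index of the data point whose Thiessen polygon contains `query`.

    By definition this is the data point closest to `query`; a location
    exactly on a polygon boundary is assigned to the lowest index.
    """
    return min(range(len(data_points)),
               key=lambda i: math.dist(query, data_points[i]))


data_points = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]   # hypothetical sample sites
owner = thiessen_owner((6.0, 1.0), data_points)
print(owner)
```

Here the location (6, 1) is closest to the second data point, so it belongs to the neighborhood of that point.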

Geometric entities that satisfy a geographic condition are


also used to define neighborhoods.

For example, in the

delimitation of catchment areas, traveling distances or


existing physical barriers may determine the shape of the
neighborhood (see Chapter 5).

Two common characteristics of the neighborhoods described


above are: a) the unit of interest (either a point or a line)
belongs to the neighborhood and b) the resulting Euclidean
subspace (the neighborhood) is connected in mathematical
terms.

It should be mentioned that in geographical applications it is


common to deal with either point or areal units. In both
cases the treatment is very similar.

When points are the

units of interest they are often assumed to be in the center


of the geometric entity.

When dealing with areal units, each

one is identified with a point (e.g. the centroid) and the


neighborhood is defined exactly as it is in the case of point
units.

Another way of defining the neighbors of an areal unit is


through a contiguity relation.

Among areal units it is said

that two units are contiguous if they share a common boundary.


In more formal terms, two areal units are neighbors if they
have at least one segment in common, and the Euclidean
subspace formed by the set of neighbors of a fixed areal unit
"a" is called the neighborhood of "a".

Graphs. When other models such as networks are used, two points are
said to be neighbors if there is a line connecting them. For
a given point "p" in the graph, the neighborhood of "p" is
defined as the set of neighbors of "p". When areal units are
involved, it is possible to identify each one with a point and
to draw a line between any two points, provided the
corresponding areal units are neighbors.

The original

structure involving areal units is known in graph theory as
the dual of the graph (Harary, p.113),

and the definition of

neighborhood is very similar to the case where the units are


points.

The relations established in these graphs are sometimes


represented in matrix form. Given "n" units, an n x n matrix
M = (mij) is defined as follows:

$$m_{ij} = \begin{cases} 1 & \text{if units } i \text{ and } j \text{ are neighbors} \\ 0 & \text{otherwise} \end{cases}$$

In some applications weights are given to the neighboring


relation.

These quantities represent factors that are

considered important for the phenomena under study.

Examples

might be the length of the boundary between two counties or


the size of flows between two population centers.

In this

case the elements of the matrix are the values of the weights
attached to each pair of units.
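Both the binary and the weighted form of the matrix can be built from a list of neighboring pairs, as the following sketch shows. The unit labels, the boundary lengths and the function name are hypothetical; the relation is assumed to be symmetric.

```python
def contiguity_matrix(units, neighbor_pairs, weights=None):
    """Build the n x n matrix with m[i][j] = 1 (or a weight) for neighboring units."""
    index = {u: i for i, u in enumerate(units)}
    n = len(units)
    m = [[0] * n for _ in range(n)]
    for a, b in neighbor_pairs:
        w = 1 if weights is None else weights[(a, b)]
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w   # the neighboring relation is symmetric
    return m


units = ["a", "b", "c", "d"]
pairs = [("a", "b"), ("b", "c"), ("c", "d")]

binary = contiguity_matrix(units, pairs)

# Hypothetical lengths (km) of the boundary shared by each pair of units.
boundary_km = {("a", "b"): 3.2, ("b", "c"): 1.1, ("c", "d"): 0.4}
weighted = contiguity_matrix(units, pairs, boundary_km)

for row in binary:
    print(row)
```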

Orders of Neighborhood. It has sometimes been useful in geographical analysis to
define different orders of neighborhoods and neighbors.
Neighborhoods like those defined in previous paragraphs are
called first order neighborhoods and their elements first
order neighbors.

The points which are neighbors of the first order neighbors
and are not first order neighbors themselves are called
second order neighbors. The resulting set is called a second
order neighborhood. Third, fourth and successive orders of
neighborhoods can be defined in a similar manner.
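Successive orders of neighbors can be obtained by expanding outward from the unit of interest one ring at a time. The following is a minimal sketch under two assumptions that are not stated in the text: the unit of interest itself is excluded from every order, and a unit is assigned only to the lowest order at which it is reached. The adjacency data and the function name are hypothetical.

```python
def neighbors_by_order(adjacency, start, max_order):
    """Return {order: set of units} using breadth-first rings around `start`."""
    seen = {start}
    frontier = {start}
    rings = {}
    for order in range(1, max_order + 1):
        ring = set()
        for unit in frontier:
            ring |= adjacency[unit]
        ring -= seen            # drop units already assigned a lower order
        if not ring:
            break               # no units left at this or higher orders
        rings[order] = ring
        seen |= ring
        frontier = ring
    return rings


# Hypothetical neighbor sets for five units.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
rings = neighbors_by_order(adjacency, "a", 3)
print(rings)
```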

Topological Neighborhoods

The concept of neighborhood plays a fundamental role in the
development of topological theory.

Some of this theory's

basic concepts which will be used in subsequent chapters are


discussed in the following paragraphs.

First of all, it should be said that the definition of
topological space is based on the notion of open set. The
topology with which most readers are familiar is the "usual
topology" in the real line.

The Usual Topology. Intervals are sets of real numbers commonly used in calculus
and mathematical analysis. An interval is determined by two
real numbers a, b where a < b. It is said that the interval
is open if it does not contain its "extreme points" a and b.
Haaser et al. (1959, p.23) define an open interval as
follows:
The open interval determined by two numbers a and b, where
a < b, is the set of all real numbers x for which a<x<b.
This open interval is denoted by (a,b). Another way of
writing this definition is

$$(a,b) = \{\, x \in \mathbb{R} : a < x < b \,\}$$

An open set in the real line can be defined as follows:

A set O is open if for every point x in O there is an open
interval I such that x belongs to the interval I and the
interval is contained in O. The open intervals are examples
of open sets (Royden, 1968, p.39).

A more formal definition is given by Hu (1964, p.39).


A subset U of R is said to be an open set if for an
arbitrarily given point u in U there exists a positive real
number d such that a real number x is in U if |x-u| < d.
In the real plane the open sets are defined in a similar way.
Consider a disk surrounded by a tight ribbon. The ribbon is
the "border" or limit of the disk. A circle (disk) that does
not contain its border (ribbon) is called an open circle.


Open circles are also examples of open sets in the usual
topology of the real plane RxR.

There are of course other

definitions of open set which vary according to the



topological space under consideration.

The open set concept is crucial for establishing the concept


of topological neighborhood.
It is said that a set is a neighborhood of a point if it
contains an open set that contains the point. Let X
be a given space and p be a given point in X. A set $N \subset X$
is said to be a neighborhood of the point p in the
space X iff there exists an open set U of X such that
$p \in U \subset N$.

This definition comprises the intuitive notion of


neighborhood.

The point of interest belongs to the neighborhood and the
neighborhood is formed by the "space" that is near or proximal
to the point. For example, in the real plane an open circle
with center in "a" is a neighborhood of point "a" (see
figure 2.1).

Other topological concepts which have been useful in


interpreting some of the results obtained in this study are
those of interior, exterior and boundary points.

Hu (1964,

p.21) gives the following definitions:


The point p is said to be an interior point of the set E
provided that there exists a neighborhood N of p in X
contained in E. The point p is said to be an exterior point
of E if there exists a neighborhood N of p in X which contains
no point of E. Finally, the point p is said to be a boundary
point of E in case every neighborhood N of p in X contains at
least one point in E and at least one point not in E.


By examining a particular case in the Euclidean plane it is
possible to acquire a more intuitive grasp of this abstract
concept. Consider a circle in the real plane RxR. The points
in the circumference that limit the circle are border points
while those in the circle are interior points (see figure
2.1).

Figure 2.1 (labels: exterior point, interior point, border point)

In other words, an open circle is defined as the set

$$C = \{\, x \in \mathbb{R} \times \mathbb{R} \; ; \; |x - r| < d \,\}.$$

All the points belonging to the open circle are interior
points of C. On the other hand, the points lying on the
circumference of the open circle, that is
$\{\, x \in \mathbb{R} \times \mathbb{R} \; ; \; |x - r| = d \,\}$,
are boundary points of C. Finally, the points that are neither
in the open circle nor in its circumference are exterior
points of C.
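For the particular case of a circle in the real plane, the three kinds of points can be told apart by comparing the distance to the center with the radius. The sketch below assumes Euclidean distance; the function name and the small tolerance used to recognize points on the circumference are choices made here for illustration.

```python
import math


def classify(point, center, radius, tol=1e-12):
    """Classify `point` relative to the open circle with the given center and radius."""
    distance = math.dist(point, center)
    if abs(distance - radius) <= tol:
        return "boundary"
    return "interior" if distance < radius else "exterior"


# Three points on the horizontal axis and a circle of radius 1 at the origin.
labels = [classify(p, (0.0, 0.0), 1.0)
          for p in [(0.5, 0.0), (1.0, 0.0), (2.0, 0.0)]]
print(labels)
```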

2.2.2 Spatial Autocorrelation

A mathematical technique that was specifically designed for


the study of spatial dependence is that of spatial
autocorrelation.

The concept of spatial autocorrelation can

be summarized as follows:

it is said that a set of areas

exhibits positive spatial autocorrelation if high values of a


variable in one area are associated with high values of the
same variable in neighboring areas.

In brief, spatial

autocorrelation is a statistical technique that allows the


researcher to test hypotheses on spatial dependence.

The entity under study is assumed to be a two-dimensional area


which has been partitioned into non-overlapping regions that
are exhaustive of the area.

The basic areal units are called

counties, but the technique is equally valid if the objects of


interest are point units.

Since in the study of spatial dependence the relationship


between an entity and its surrounding is fundamental, the
concept of neighborhood has a key role in the spatial
autocorrelation model.

It is common practice to represent the

neighborhood relationship in this type of analysis through the
use of a matrix. In some cases this permits the relations to
be weighted and different orders of neighbors to be taken into
consideration.

In order to test hypotheses on spatial autocorrelation various


statistical techniques have been designed.

One of the best

known is the one proposed by Geary (Cliff and Ord, 1973, p.8):

$$c = \frac{n-1}{4A} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \delta_{ij} (x_i - x_j)^2}{\sum_{i=1}^{n} z_i^2}$$

where n is the number of units,
$x_i$ is the value associated to the ith unit,
$z_i = x_i - \bar{x}$ ; the deviation with respect to the mean,
$\delta_{ij} = 1$ if units i and j are linked, and 0 if they are not,
$A = \frac{1}{2} \sum_{i} L_i$ ; the total number of links in the county system,
$L_i = \sum_{j=1}^{n} \delta_{ij}$ ; the number of units linked to unit i.
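A minimal sketch of the computation follows. It assumes the usual form of Geary's ratio, c = (n-1) ΣΣ δij (xi-xj)² / (4A Σ zi²), with binary links and with A equal to the number of links; the function name, the chain of four units and the values are hypothetical.

```python
def geary_c(values, links):
    """Geary's contiguity ratio for binary links given as unordered pairs (i, j)."""
    n = len(values)
    mean = sum(values) / n
    sum_z2 = sum((x - mean) ** 2 for x in values)
    # Summing over unordered pairs counts each link once, so the ordered
    # double sum over i and j is twice this quantity.
    pair_sum = sum((values[i] - values[j]) ** 2 for i, j in links)
    a = len(links)                       # A, the total number of links
    return (n - 1) * (2 * pair_sum) / (4 * a * sum_z2)


# Four units arranged in a chain 0-1-2-3.
chain = [(0, 1), (1, 2), (2, 3)]
smooth = geary_c([1.0, 2.0, 3.0, 4.0], chain)        # similar neighbors
alternating = geary_c([1.0, 4.0, 1.0, 4.0], chain)   # dissimilar neighbors
print(smooth, alternating)
```

Values of the ratio below one indicate that neighboring units tend to have similar values (positive spatial autocorrelation), while values above one indicate the opposite pattern.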

Clearly, this measure is sensitive to the spatial pattern


induced by the neighboring relation.

However, it ignores

other spatial characteristics of the units such as shape and


size (Cliff and Ord, 1973, p.272).

There are many situations in which this type of technique has


proven useful.

Among them are map comparisons with

applications to diffusion processes and the analysis of

regression residuals (Cliff and Ord, 1973, pp. 69, 105).

2.2.3 Geostatistics

Geostatistics is a field which was developed for, and mainly
applied to, mining problems. Its relevance is not limited to
mining however. Again the concept of neighborhood is a
fundamental part of the model.

According to Clark (1979, p.1) geostatistics began in the


early 1960's with the work of George Matheron and was then
introduced as "The Theory of Regionalized Variables."

The

basic problem it addresses is the estimation of a sample at a


particular location in space or time.

A well-known

application of these statistical techniques is the estimation



of ore reserves.

The method is designed to permit local estimation.

Given a

relatively small number of samples in an area, how can the


value of a fixed point belonging to that same area be
estimated?

The relative position of the point with respect to the samples


is assumed to determine its value.

This factor is accounted

for in the model by means of the concept of distance.

In fact it is often assumed that the difference in value
between two points depends only on the distance between them
and their relative orientation (Clark, 1979, p.5).

A basic concept in geostatistics is that of the variogram.
Given the set of differences between the values of all the
sample points, the variogram is defined as its variance.
The experimental variogram is expressed as follows:

$$\gamma^{*}(h) = \frac{1}{2n} \sum \left[ g(x) - g(x+h) \right]^2$$

where h describes the distance and the relative orientation, g
is the grade (value) associated with the point, x denotes the
position of one sample and x+h the position of the other, and
n is the number of possible pairs in the sample set. γ(h) is
called the semi-variogram (Clark, 1979, p.5).

For a given distance and orientation (e.g. 100m and


north-south) the values of the experimental variogram are
calculated.

The resulting set of values is plotted and used

to calculate "expected" values of the difference between the

grade values of two samples (Clark, 1979, p.18).
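The calculation for a single orientation can be sketched as follows. The sketch assumes the usual form of the experimental semi-variogram, γ*(h) = (1/2n) Σ [g(x) - g(x+h)]², with samples lying along one line so that h reduces to a signed distance. The function name, the positions and the grades are hypothetical.

```python
def experimental_semivariogram(positions, grades, lag, tol=1e-9):
    """gamma*(h) = (1 / 2n) * sum of [g(x) - g(x + h)]**2 over the n pairs
    of samples separated by `lag` along a single line (one orientation)."""
    squared = [
        (grades[i] - grades[j]) ** 2
        for i in range(len(positions))
        for j in range(len(positions))
        if abs((positions[j] - positions[i]) - lag) <= tol
    ]
    if not squared:
        return None          # no pair of samples at this lag
    return sum(squared) / (2 * len(squared))


# Hypothetical grades sampled every 100 m along a north-south line.
positions = [0.0, 100.0, 200.0, 300.0, 400.0]
grades = [1.0, 3.0, 2.0, 5.0, 4.0]

gamma_100 = experimental_semivariogram(positions, grades, 100.0)
gamma_200 = experimental_semivariogram(positions, grades, 200.0)
print(gamma_100, gamma_200)
```

Repeating the calculation for a series of lags gives the set of values that is plotted against distance.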

According to Clark (1979, p.6) several semi-variogram models
have been designed, but only a few are regularly used. These
include the spherical and exponential models (Clark, 1979,
p.6).

2.2.4 Topological Data Structures

Another branch of geography in which the concept of


geographical neighborhood has been especially important is
that of Geographic Information Systems (GIS).

There are in

fact several areas where these applications have proven


fruitful.

Examples are image processing, digital terrain

models, computer cartography and census data bases.

A similar process to mathematical modeling must be followed in


systems design.

A set of entities along with their

characteristics and relationships has to be identified with


formal structures that are representable in computer systems.
Fortunately, there are several well-established computer
representations for those mathematical structures such as
graphs and matrices which are often used in geographic
applications.

In the design of a GIS it is particularly common to find


spatial relationships that are easily manipulated through the
use of graphs.

Two examples are street structure in urban

areas and transportation networks.

Consequently, data

structures that allow efficient manipulation of graphs have


been designed in the past.

These types of structures have

been referred to in geographic literature as topological data

structures.

An example of an application of a topological data structure


is the one proposed by Peucker et al. (1976) for the treatment
of three-dimensional surfaces.

The data is a set of irregularly distributed points of the
surface. Each of the points is selected so that it has a high
content of information and is significant for the digital
terrain model (Peucker and Chrisman, 1975, p.64). The data
set is assumed to be "triangulated" so that every point is a
vertex of a triangle. This triangular irregular network (TIN)
is composed of triangular facets that cover the study area.


The neighborhood of each point in the TIN is defined as the
set of points that are connected to it by an edge of a
triangle.

The data structure is designed so that the

neighborhood of each point is explicitly stored.
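One simple way of storing the neighborhoods explicitly is to derive them from the list of triangles. The following sketch is only an illustration of the idea and is not the data structure of Peucker et al.; the function name and the two triangles are hypothetical.

```python
def tin_neighbors(triangles):
    """Map each vertex to the set of vertices sharing a triangle edge with it."""
    neighborhood = {}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            neighborhood.setdefault(u, set()).add(v)
            neighborhood.setdefault(v, set()).add(u)
    return neighborhood


# Two hypothetical triangles sharing the edge (1, 2); vertices are point ids.
triangles = [(0, 1, 2), (1, 3, 2)]
neighborhood = tin_neighbors(triangles)
print(neighborhood)
```

Once the neighborhoods are stored in this way, the neighbors of any point are retrieved without searching the whole data set.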

This type of structure is important because it adequately


represents a graph such as the TIN.

However, its real value

is that this representation of geographical data effectively


allows the user to manipulate information using spatial
criteria.

The idea behind this type of structure is similar to the one
proposed in this study and applied in a different context
(Peucker and Chrisman, 1975).

Many other GIS based on this type of structure are found in
the literature. The reader is referred to Dutton (1978) and
Peucker and Chrisman (1975).

The concept of neighborhood is also found in other areas of


geoprocessing.

For example, in image processing when the

purpose is texture discrimination, it is common to replace the


grey level of each point by the average grey level of its
neighborhood (Rosenfeld, 1978, p.3), and in the manipulation
of polygonal data the concept of local processing allows the
user to work with an amount of data which would be impossible
to consider if the whole data set were involved.
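The averaging operation mentioned for image processing can be sketched as follows. The neighborhood is assumed here to be the 3 x 3 window centered on each point, clipped at the border of the image; the window size, the function names and the small test image are choices made for illustration.

```python
def neighborhood_mean(image, row, col):
    """Average grey level over the 3 x 3 window centered on (row, col),
    clipped at the image border."""
    rows = range(max(row - 1, 0), min(row + 2, len(image)))
    cols = range(max(col - 1, 0), min(col + 2, len(image[0])))
    window = [image[r][c] for r in rows for c in cols]
    return sum(window) / len(window)


def smooth(image):
    """Replace every grey level by the mean of its neighborhood."""
    return [[neighborhood_mean(image, r, c) for c in range(len(image[0]))]
            for r in range(len(image))]


# A single bright point on a dark background.
image = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
smoothed = smooth(image)
for row in smoothed:
    print(row)
```

Each output value depends only on the points in its own neighborhood, which is what makes this kind of local processing feasible on large data sets.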

2.3 Discussion

Mathematical modeling has been used in geographical studies


for the past thirty-five years.

The first stage of

development of geographical quantitative techniques is


characterized by the adaptation of existing techniques and
models in other areas of knowledge, while a second stage can
be identified by the development of models and techniques
specifically designed for geographical purposes.

Presently

both the classical models and those that incorporate the


notion of

neighborhood continue to be widely used.

It should be mentioned that besides those models that have
been described there is a considerable number of mathematical
models and techniques that have been applied in a geographical
context. Such is the case of regression analysis (Brouwer and
Nijkamp, 1984; Rogerson, 1984), discriminant analysis
(Fotheringham and Heeds, 1979; Yupa and Mayfield, 1978),
probability theory (Morley and Thornes, 1972; Burnett, 1978;
Muckay, 1983), simulation (Phipps and Laverty, 1983; Morrill
and Kelly, 1970) and linear programming (Bromley and Hanink,
1985; Garfinkel and Nemhauser, 1970; Maxfield, 1972).

Currently, three main tendencies of research are found in the


area of quantitative geography: the application of existing
models or techniques to real-world problems,

the examination

of the mathematical properties and characteristics of existing


models and the modification of existing tools so that they
overcome criticisms.

Examples of the application, and in some cases adaptation, of
models to specific situations are the work of Brouwer and
Nijkamp (1984), where a regression model is applied to the
study of regional quality of life and residential preferences
in Holland, and that of Mulligan and Gibson (1984), where the
purpose is to calibrate an economic base model for small
communities.

On the other hand, further studies of the characteristics of
models are found in the research undertaken by Smith (1984),
where the main purpose is to characterize the gravity model in
theoretical terms, and in that of Jong, Sprenger and Van Veen
(1984), where the extreme values of two spatial
autocorrelation indices are derived. Finally, efforts to adapt
existing models to conditions that had not been considered in
the first design are found in the works of Bodson and Peeters
(1975) and Bivand (1984), regarding modifications of the
linear regression model and the spatial dependence effect,
and in those of Schwab and Smith (1985) and Slater (1984),
where the question of the form of spatial interaction models
regarding the level of spatial resolution is addressed.

Since the initial stage of development of quantitative


geography criticisms have been made at two different levels.
At the more general level the criticisms are directed to the
general use of quantitative techniques.

The main argument is based on the idea that the quantitative
approach is a positivist one (Bennett and Wrigley, 1981, p.10;
Johnston, 1981). At a second level comments are made about
either the use of the models or the mathematical
characteristics of specific models and techniques. Most of the
criticisms made at this second level are related to the
statistical methods that are commonly found in geographical
studies (Gould, 1970; Martin, 1974; Sheppard, 1979; Bennett
and Wrigley, 1981, p.8).

Quantitative geography is at a mature stage where the initial
enthusiasm provoked by early results has faded, and the
complete rejection of its benefits is not a current tendency.
Mathematical modeling is viewed as a tool for geographical
studies, accepting that in some cases the quantitative methods
have proven to be a fruitful approach and at the same time
that their limitations are such that the search for better
models is far from having come to an end. This last statement
is particularly true in relation to the modeling of spatial
structures.

As mentioned in the description of the classical models,


widely used tools such as the factor models, were not designed
for the representation of spatial structures.

It is a fact

that in the modeling of the geographical landscape two


competing components are often found.

On one hand the geographer is interested in studying the
characteristics of a phenomenon that are a consequence of the
site itself, but on the other hand geographical studies are
focused on the spatial relationships between a site and its
surroundings.

Berry

(1968, p.226) describes this fact as the dichotomy within


Geography, the dual concepts of site and situation: "Site is
vertical referring to local, man-made relations, to form and
morphology.

Situation is horizontal and functional, referring

to regional interdependencies and the connections between


places, or what Ullman calls spatial interactions".

These two

competing components are present in the design of models.

In some models the "site" component is completely dominant
over the "situation or Galton's" one, while in other models
such as that of spatial autocorrelation the relationship is
reversed.

There is no doubt that to

adequately represent spatial structures it is necessary to


design models with a dominant "Galton's component" without
obliterating the other one.

In this thesis, models that incorporate Galton's component
through the topological concept of neighborhood are presented.

Chapter 3.

NEIGHBORHOOD MODELS

Among the models presented in Chapter 2, those that use the


notion of neighborhood to model spatial structure are clearly
distinguishable.

In each one of them geographical

neighborhoods are represented through different abstract


entities.

However, a global treatment of the use of

neighborhoods to represent spatial structures is not found in


the literature.

In the first sections of this chapter a quasi-mathematical


structure is proposed as a general framework for the design of
models that follow a neighborhood approach, and the concept of
neighborhood model is presented.

Different characteristics of neighborhoods are of interest for


the spatial analyst.

In the second section of the chapter,

the notion of local variation of a neighborhood is formalized


through various indices, and its geographical and mathematical

interpretations are discussed.

3.1 General Framework

Since the use of mathematical models in human geography is


relatively new, the accumulated experience in formalizing (in
a mathematical sense) geographical concepts is also relatively
small.

In the natural sciences it is a common practice to

establish abstract models based on the scientist's knowledge


of the phenomena under study.

An analogous procedure has been

followed in geographical models.

It is, however, possible to establish explicitly, intermediate


stages in the formalization process.

This makes the modeler's task of selecting appropriate
mathematical representations of the geographical landscape
easier.

In this section two quasi-mathematical structures (geo-spaces


and geo-subspaces) are defined as an aid in the design of
models that incorporate neighborhoods.

Both geo-spaces and

geo-subspaces must be defined prior to designing the


mathematical model.

3.1.1. Intuitive Ideas

Whenever the geographical landscape is modeled, geographical

entities are commonly identified with mathematical entities


such as points, lines, areas or surfaces.

There are, however, other elements of importance to
geographical analysis, such as the relative position of an
entity with respect to its surroundings or neighborhood, that
have seldom been dealt with mathematically.

Maps are excellent examples of the fact that geographers are


usually not interested in the study of isolated entities.
Undoubtedly, the map is the most successful of geographical
models. It allows the geographer to represent the most
relevant spatial relationships including distance, contiguity,
connectivity and shape. However, its most outstanding
characteristic is that the relative position of all its
elements with respect to their neighborhood is explicitly
represented.

This fact allows geographers to manipulate the

information content of maps using spatial criteria which focus


on spatial relationships among the entities rather than on the
entities themselves, although it should be mentioned that for
geographers spatial relationships are often implicit in maps,
and in many cases they deal with them in an intuitive manner.

3.1.2 Spaces and Subspaces

The major aim of this study is to present the development of


formal models with characteristics similar to the ones

mentioned above for maps.

The first step in the design of such models is to formalize
the concept of "geographical neighborhood."

Since it is a very broad concept, discussion

in the following paragraphs is limited to the case where the


"geographical landscape" has been modeled representing its
entities and relations in a Euclidean space.

Since Euclidean

spaces are subject to very intuitive geometrical


interpretations, their correspondence with geographical space
becomes very natural.

In crude terms a geographical neighborhood is either a


geographical area limited by physical features or
administrative boundaries or an area that surrounds a
geographical entity such as a city, a school or an airport.

In Euclidean space the concept of "geographical neighborhood"


can be identified with that of subspace, a concept which is
often used in mathematical analysis.

In general terms, a

subspace of a Euclidean space is simply a subset of the


original one.

There are, however, other definitions of

subspace depending on the mathematical structure in question.


For example, Royden (1968, pp.127, 137) gives the following
definitions of metric space and subspace:

A metric space $(X, \rho)$ is a nonempty set $X$ of elements
(which we call points) together with a real-valued function
$\rho$ defined on $X \times X$ such that for all $x$, $y$ and
$z$ in $X$:
i) $\rho(x,y) \geq 0$;
ii) $\rho(x,y) = 0$ if and only if $x = y$;
iii) $\rho(x,y) = \rho(y,x)$;
iv) $\rho(x,y) \leq \rho(x,z) + \rho(z,y)$.
The function $\rho$ is called a metric.
If $(X, \rho)$ is a metric space and $S$ is a subset of $X$,
then $S$ becomes a metric space if we restrict $\rho$ to $S$,
that is to say, if we take as the distance between two points
of $S$ their distance as points of $X$. When we consider $S$ as
a metric space with this metric, we call $S$ a subspace of $X$.
In other words, a set of the space is a subspace if it
inherits the mathematical structure defined in the space.
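
The inheritance of structure can be checked numerically on a small example. The following is only a sketch: the point coordinates, the tolerance and the function names `euclidean` and `is_metric` are illustrative assumptions, and the Euclidean distance is used as the metric.

```python
import math
from itertools import product


def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p, q)


def is_metric(points, rho, tol=1e-9):
    """Check conditions i) to iv) for rho on every triple of points."""
    for x, y, z in product(points, repeat=3):
        if rho(x, y) < 0:                            # i) non-negativity
            return False
        if (rho(x, y) == 0) != (x == y):             # ii) zero only when x = y
            return False
        if abs(rho(x, y) - rho(y, x)) > tol:         # iii) symmetry
            return False
        if rho(x, y) > rho(x, z) + rho(z, y) + tol:  # iv) triangle inequality
            return False
    return True


# Hypothetical points standing for the space X and a subset S of it.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0), (3.0, -1.0)]
S = X[:3]

space_ok = is_metric(X, euclidean)
subspace_ok = is_metric(S, euclidean)  # the same metric, restricted to S
```

Because the four conditions are stated for all points of X, they hold automatically for any subset S; the check on `S` simply makes that visible.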

In many other mathematical spaces, such as vector and
topological spaces, subspaces play an important theoretical
role.

Considered intuitively and transferred to a map context, the
concept of subspace might be expressed as follows: for a
given map, a piece of map is still a map. However, closer
scrutiny of the analogy makes it clear that this statement is
not always true. If too small a piece of map is taken, it
ceases to satisfy the function of a map. In the same way, if
isolated elements of a map are cut out, the result will
probably not be a map.

This observation clearly indicates that if the concept of


geographical neighborhood is to be formalized, precautions
should be taken so that the proposed model preserves certain
features that are essential for spatial analysis.

The conditions imposed on a set of a space to be a subspace

must be analogous to the conditions imposed on the abstract


entities selected to represent a geographical neighborhood
regarding certain spatial conditions such as maximum distance,
minimum area or contiguity constraints.

3.1.3 Geo-Spaces

Having established a resemblance between the concept of


neighborhood in geographical terms and the mathematical
concept of subspace, in order to proceed with the
formalization process it is necessary to introduce the concept
of geo-space.

It is possible to ascribe roles to entities in geographical
theory that are similar to the roles played by space and
subspace in mathematics. These entities are named geo-spaces
and geo-subspaces. They are discussed in the following
paragraphs, and although their definition is intuitive and
general, they have proved to be useful for the purposes of
this study.

In the geographic modeling process it is common to find a set


of entities under study (such as rivers, roads, cities or
census tracts) that are identified with mathematical entities
(such as points and lines in a Euclidean space, nodes or links
in a graph or elements of a matrix).

One or more spatial

relations are established among the entities.

These spatial

relations are represented mathematically through equations or


specific mathematical structures.

In the formalization

process it is common to assume that the mathematical entities


of interest are immersed in a mathematical space.

Some of the

most commonly used mathematical spaces in geographic


applications are the Euclidean, matrix and topological spaces.

A quasi-mathematical structure of this type is called a
geo-space.

To come back to the example in Section 2.1.2, in the modeling
of migration it is common to find a set of population centers
identified with a set of points in the Euclidean plane. The
basic spatial relation is established through distance and is
expressed in an equation such as equation 2.1. In this case
we say that we have a Euclidean geo-space.

Clearly, a geo-space is not really a mathematical structure


since it contains elements of the geographical landscape.
Rather it is the quasi-mathematical product of an intermediate
step in the modeling process.

The purpose of using such an

intermediate structure in the formalization process instead of


a purely mathematical one is to ensure the inclusion of all
the spatial relationships that have been identified as
relevant.

A natural way of conceptualizing a subspace of a geo-space is


to consider a subset of the entities under study along with
their mathematical counterpart, where the spatial relations
established in the geo-space are preserved along with their
mathematical expressions.

The subsets considered have to be

part of a subspace of the mathematical space under


consideration whenever the latter is part of the geo-space.

For example, a subspace of the Euclidean geo-space described


in the previous section is simply a subset of the set of
cities considered and the points in the Euclidean space with
which they are identified.

The spatial relation established

among the points is preserved, since in the Euclidean space it


is always possible to calculate the distance between any two
points.
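
A Euclidean geo-space and one of its subspaces can be sketched as follows. The city names, coordinates and the helper names `distance` and `geo_subspace` are hypothetical; the only point illustrated is that the distance relation is unchanged when the set of cities is restricted.

```python
import math

# Hypothetical Euclidean geo-space: cities identified with points in the plane.
cities = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0), "D": (0.0, 8.0)}


def distance(geo, u, v):
    """Spatial relation of the geo-space: Euclidean distance between two cities."""
    return math.dist(geo[u], geo[v])


def geo_subspace(geo, names):
    """Subset of the cities together with the points they are identified with."""
    return {name: geo[name] for name in names}


sub = geo_subspace(cities, ["A", "B", "C"])

# The relation is preserved: distances inside the subspace equal those in the space.
preserved = all(
    distance(sub, u, v) == distance(cities, u, v) for u in sub for v in sub
)
```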

In mathematical metric spaces the constraint imposed on the


subspace is the preservation of the distance function.
Similarly, in the case of geo-subspaces the main constraint is
related to the preservation of the spatial relationships
established in the geo-space.

3.2 Definition of Neighborhood Model

Once the spatial relationships of interest have been


explicitly expressed either verbally or mathematically in the
definition of the geo-space and the corresponding
geo-subspaces have been identified, the next stage is to
formally establish the mathematical model to be used.

As will be seen in the following chapters the concept of


geo-subspace permits the design of formal tools to model the
concept of geographical neighborhood.

Formal tools that are designed to represent mathematically the


spatial structure through the notion of neighborhood are
called Neighborhood Models.

These models are suitable whenever the interest of the study


resides in the characteristics or behavior of sub-spaces
rather than in single or isolated entities.

Autocorrelation,

geostatistics and topological data structures are examples of


neighborhood models.

3.3. The Heterogeneity Index

Section 2.2

shows that the notion of "geographical

neighborhood" figures in the most recent geographical models.

It is vital here to achieve the previously stated objective of


modeling the "geographical landscape" through the concept of
geo-subspace.

One of the characteristics of a subspace that interests the
geographer is "variation." The similarity (or difference)
between an entity and its surroundings is a measure of this
variation. Throughout this chapter several measures of
variation of a geo-subspace are proposed and possible
interpretations are indicated. These measures will be called
heterogeneity indices.

3.3.1 Formal Definition

In order to begin formalizing the idea of local variation it


is assumed that the study area is partitioned into
non-overlapping areal units that completely cover it and that
the variable of interest associated with each areal unit is
only one and it is of interval scale type.

Additionally, it

is assumed that the geo-subspace of each areal unit is


well-defined.

Thus, the number of geo-subspaces (in this case

neighborhoods) is equal to the number of areal units.

The

definitions would also be valid if the units of study were


points.

The heterogeneity index associated with the neighborhood of
unit "a", Ia, is defined as follows:

$$I_a = \sum_{i=1}^{k} (X_i - X_a)^2$$

where Xi is the value associated with the ith areal unit (the
ith neighbor of unit "a"), Xa is the value of unit "a" and k
is the number of neighbors of unit "a." Ia is therefore the
sum of squares of deviations between unit "a" and its k
neighbors.
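
As a minimal sketch of this definition, the fragment below computes Ia as the sum of squared deviations between a unit and its k neighbors. The values in `x`, the contiguity lists in `neighbors` and the function name `heterogeneity` are illustrative assumptions, not data from this study.

```python
def heterogeneity(a, x, neighbors):
    """I_a: sum of squared deviations between unit a and its k neighbors."""
    return sum((x[i] - x[a]) ** 2 for i in neighbors[a])


# Hypothetical study area of four areal units with assumed contiguities.
x = {"a": 10.0, "b": 12.0, "c": 7.0, "d": 10.0}
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}

I_a = heterogeneity("a", x, neighbors)  # (12 - 10)^2 + (7 - 10)^2
```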

Clearly, this index is highly dependent on the units of


measurement.

Therefore, in order to make comparisons between

neighborhoods easier a new index is defined as follows:


$$H_a = \frac{I_{max} - I_a}{I_{max} - I_{min}}$$

where Ia is the heterogeneity index associated with the
neighborhood of "a", and Imin and Imax are the minimum and
maximum values of the set of heterogeneity indices associated
with the neighborhoods of the areal units under study. Ha
takes values between zero and one. The higher the degree of
variation of the neighborhood, the closer the value of Ha to
zero.
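
The standardization can be sketched as below, reading the definition as Ha = (Imax - Ia) / (Imax - Imin). The data and function names are hypothetical, and raising an error when all indices coincide is an assumption of this sketch, since the text does not treat that case.

```python
def heterogeneity(a, x, neighbors):
    """I_a: sum of squared deviations between unit a and its neighbors."""
    return sum((x[i] - x[a]) ** 2 for i in neighbors[a])


def standardized(x, neighbors):
    """H_a = (I_max - I_a) / (I_max - I_min) for every unit a."""
    raw = {a: heterogeneity(a, x, neighbors) for a in x}
    i_max, i_min = max(raw.values()), min(raw.values())
    if i_max == i_min:
        # Assumption of this sketch: H_a is left undefined when all I_a coincide.
        raise ValueError("all indices are equal; H_a is undefined")
    return {a: (i_max - raw[a]) / (i_max - i_min) for a in raw}


# Hypothetical values and contiguities for four areal units.
x = {"a": 10.0, "b": 12.0, "c": 7.0, "d": 10.0}
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}

H = standardized(x, neighbors)
```

In this example unit "c" has the most varied neighborhood and receives the value zero, while unit "b" has the least varied one and receives the value one.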

Another measure of variation which appears natural under the


same assumptions is that of local variance defined as follows:

$$V_a = \frac{\sum_{i=1}^{k+1} (X_i - \bar{X}_a)^2}{k+1} \qquad (3.3)$$

where k is the number of neighbors of unit "a", Xi is the
value associated with the ith unit and $\bar{X}_a$ is the mean
of the values associated with unit "a" and its neighbors, i.e.

$$\bar{X}_a = \frac{\sum_{i=1}^{k+1} X_i}{k+1}$$

Va is therefore the sum of squares of deviations from the
local mean, divided by the number of units in the
neighborhood.
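
A small sketch of the local variance follows, assuming that the sum runs over unit "a" together with its k neighbors and is divided by k+1. The data and the name `local_variance` are illustrative.

```python
def local_variance(a, x, neighbors):
    """V_a: mean squared deviation from the local mean over a and its k neighbors."""
    values = [x[a]] + [x[i] for i in neighbors[a]]  # k + 1 values
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical values: unit "a" and its two neighbors.
x = {"a": 9.0, "b": 12.0, "c": 6.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

V_a = local_variance("a", x, neighbors)  # local mean 9, deviations 0, 3 and -3
```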

A way to standardize this method has been previously proposed
(Silk, p.20). Since comparisons are essential in this type of
analysis, a similar standardization is proposed for Va as
follows:

In statistical terms, the index as defined in equation 3.3
corresponds to the variance of the geo-subspace. Usually, a
given sample has one mean and one variance associated with it.
In this case, a given geo-space has a set of variances and
means associated with it, one for each of the geo-subspaces
under study. It is, however, also possible to calculate the
mean and variance of the set of heterogeneity indices
associated with a geo-space. For example, consider a geo-space
represented via a graph where each point is connected to all
the other points of the graph (i.e. it is a complete graph) as
shown in figure 3.1. In this case, for each point its
neighbors are the remaining points in the graph, and as will
be shown in the following paragraph, the value of the
heterogeneity index as defined in equation 3.3 is the same for
every one of the points.

Let a1, a2, a3, ..., an be the points of the graph and Xa1,
Xa2, ..., Xan the values associated with each one of them. The
heterogeneity index of the geo-subspace of each point ai is:

$$V_{a_i} = \frac{\sum_{j=1}^{n} (X_{a_j} - \bar{X})^2}{n}$$

where

$$\bar{X} = \frac{X_{a_1} + X_{a_2} + \cdots + X_{a_n}}{n}$$

In this case, since all the heterogeneity indices of the
geo-subspaces have the same value, the variance of the indices
for this particular geo-space is always zero.
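
The argument can be checked on a toy example. In this sketch the values, the two neighbor structures and the function name are hypothetical; the local variance is taken as the mean squared deviation over a point and its neighbors, and a four-point chain is added only for contrast with the complete graph.

```python
from statistics import pvariance


def local_variance(a, x, neighbors):
    """Mean squared deviation from the local mean over a and its neighbors."""
    values = [x[a]] + [x[i] for i in neighbors[a]]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical values attached to four points.
x = {"a1": 2.0, "a2": 4.0, "a3": 4.0, "a4": 6.0}

# Complete graph: every point has all the remaining points as neighbors.
complete = {p: [q for q in x if q != p] for p in x}
V_complete = {p: local_variance(p, x, complete) for p in x}

# Non-complete graph for contrast: a chain a1 - a2 - a3 - a4.
chain = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "a4"], "a4": ["a3"]}
V_chain = {p: local_variance(p, x, chain) for p in x}

spread_complete = pvariance(list(V_complete.values()))
spread_chain = pvariance(list(V_chain.values()))
```

With the complete graph every neighborhood contains the same n values, so every index is the same and their variance is zero; with the chain the indices differ.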

However, as shown in section 4.3.2, the most common case in


geographical studies is that of a geo-space represented via a

non-complete graph, so that the variance of its heterogeneity


indices will differ from zero in most cases.

Figure 3.1 Complete graphs of three and four points.

The Multivariate Case

In spatial analysis the researcher often deals with several


traits which characterize each of the units.

Therefore, it

becomes necessary to extend the definition of the


heterogeneity index to the multivariate case.

The purpose of such an index is to summarize, for the
different values associated with each of the spatial units,
the relationship between each of the units and its neighbors.
A feasible way of doing this is by obtaining the heterogeneity
index separately for each one of the variables of interest and
then adding them.

The multivariate heterogeneity index for unit "a" is defined
as follows:

$$I_a = \sum_{j=1}^{p} \sum_{i=1}^{k} (X_{ij} - X_{aj})^2$$

where Xij is the value of the jth variable for the ith
neighbor of "a", Xaj is the value of the jth variable
associated with unit "a", p is the number of variables and k
is the number of neighbors of unit "a".
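
A sketch of the multivariate index, computed by adding the univariate indices of the p variables, is given below. The two-variable data and the function name `multivariate_heterogeneity` are illustrative assumptions.

```python
def multivariate_heterogeneity(a, x, neighbors):
    """Sum over the p variables of the squared deviations between a and its neighbors."""
    p = len(x[a])
    return sum(
        (x[i][j] - x[a][j]) ** 2
        for j in range(p)
        for i in neighbors[a]
    )


# Hypothetical units, each described by p = 2 variables.
x = {"a": (10.0, 1.0), "b": (12.0, 3.0), "c": (7.0, 1.0)}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

# Variable 1 contributes 4 + 9, variable 2 contributes 4 + 0.
I_a = multivariate_heterogeneity("a", x, neighbors)
```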

Similarly, as the heterogeneity index was defined as a measure
of the local variance in equation 3.3, it is feasible to
define a multivariate index by adding the variances associated
with each variable for a given geo-subspace as follows:

$$V_a = \sum_{j=1}^{p} \frac{\sum_{i=1}^{k+1} (X_{ij} - \bar{X}_j)^2}{k+1}$$

where $\bar{X}_j$ is the mean value of the jth variable
associated with unit "a" and its neighbors.

Analogously to the univariate case, the variance of the set of


multivariate heterogeneity indices can be calculated for the
geo-space under study.

There are various problems involved in the multivariate case.

The most obvious one is the fact that the variables are
measured in different units which are often non-comparable.
The usual procedure to overcome this restriction is to apply a
transformation to the original variables.

An example of a

transformation used to equalize the variables is to force them


to have unit variance.

There is no complete agreement on the

merits of these methods, and the discussion of whether to


ignore the problem or to apply a transformation is left to the
judgment of the analyst of the problem at hand.

Nevertheless, it is possible to redefine the index so that


comparisons among the various indices associated with the
different variables become easier.

Let Haj be the index associated with regional unit "a"
according to variable j. The new index is defined as follows:

$$H_a = \sum_{j=1}^{p} H_{aj}$$

The value of this index is between zero and p. The smaller
the variation of the neighborhood with respect to the p
variables, the closer the value of Ha to p.

Another problem that arises when several variables are


included in the analysis is related to the definition of
neighborhood.

The geo-spaces generated by the study of two variables are not


necessarily the same.

As a consequence a geo-subspace of one

of them is not necessarily a geo-subspace of the other.

In

terms of neighborhoods this means that for a given unit "a"


its neighbors with respect to one variable are not necessarily
the same with respect to another one.

If the neighborhood

relation for each variable were represented by means of a


graph, the generated graphs could be different.

The definition of the multivariate heterogeneity index has to
be altered as follows to consider this contingency:

$$I_a = \sum_{j=1}^{p} \sum_{i=1}^{k_j} (X_{ij} - X_{aj})^2$$

where kj is the number of neighbors of unit "a" induced by
the jth variable, for kj = k1, ..., kp.

As with previous indices, the deviation with respect to the


value of unit "a" is calculated for each one of the "p"
variables.

However, in this case the neighbors and their

number can vary from one variable to the other.
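
When each variable induces its own neighborhood relation, the sum over neighbors is taken separately for each variable. The following sketch assumes one neighbor dictionary per variable; the data and names are hypothetical.

```python
def heterogeneity_by_variable(a, x, neighbors_by_var):
    """Add, for each variable j, the squared deviations over the k_j neighbors it induces."""
    return sum(
        (x[i][j] - x[a][j]) ** 2
        for j, neighbors in enumerate(neighbors_by_var)
        for i in neighbors[a]
    )


# Hypothetical units described by two variables.
x = {"a": (10.0, 1.0), "b": (12.0, 3.0), "c": (7.0, 1.0)}

# Variable 1 relates "a" to both "b" and "c"; variable 2 relates "a" only to "c".
neighbors_by_var = [
    {"a": ["b", "c"], "b": ["a"], "c": ["a"]},
    {"a": ["c"], "b": [], "c": ["a"]},
]

# Variable 1 contributes 4 + 9 (k_1 = 2), variable 2 contributes 0 (k_2 = 1).
I_a = heterogeneity_by_variable("a", x, neighbors_by_var)
```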

3.3.2 Interpretations

In the previous section measures of local variation were


proposed for univariate and multivariate cases.

However, no

geographical meaning was given to their values.

Two possible

interpretations of the heterogeneity indices are described in


the following paragraphs.

The Heterogeneity Index as a Topological Measure

In the interpretation of the heterogeneity index as a


topological measure, knowledge of two branches of mathematics
is combined.

Concepts that are a traditional part of

topological studies such as the interior and boundary of a


region, are combined with concepts from the relatively new
area of fuzzy sets, such as the degree of membership of an
element to a set.

In order to fully understand the role of the heterogeneity
index it is necessary to establish the assumptions upon which
the interpretation rests.

Thus, in the first part of this

section some basic mathematical concepts are mentioned prior


to interpreting the index.

Topological Concepts.- First of all, it is assumed that the


point of departure is a connected graph G that forms part of a
geo-space.

That is, the entities under study have been

identified with the nodes and links between them that


represent a spatial relationship.

Additionally, the neighbors

of a node are defined as its first order neighbors.

For convenience, a topology is defined on the graph so that


every subgraph formed by a node and its neighbors is a
topological neighborhood.

The mathematical details of these definitions are discussed
in section 3.4.

Fuzzy Set Concepts.-

In the classic concept of membership of

an element to a set, the element either belongs or does not


belong to the set.

This concept was expanded by Zadeh (1965)

to reflect more accurately situations which often arise in the


real world.

Whether an element belongs to a class is often a

matter of degree. To model these situations mathematically


Zadeh proposed an entity which he called "fuzzy set."
Zadeh's definition of fuzzy set (1965) follows:

Given X a space of points, a fuzzy set A in X is characterized
by a membership function $f_A(x)$ which associates with each
point in X a real number in the interval [0,1]. The nearer
the value of $f_A(x)$ to unity, the higher the grade of
membership of x in A.
Based on this definition, operations and concepts similar to
those studied in ordinary sets have been applied to the study
of fuzzy sets.

Examples of such operations include union and

intersection, convexity and algebraic operations.
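
A fuzzy set can be sketched as a membership function returning a grade between zero and one. In the fragment below the set "near the city center", its linear form and the 10 km limit are purely hypothetical; union, intersection and complement are taken as the pointwise maximum, minimum and one minus the grade, which is the usual formulation attributed to Zadeh (1965).

```python
def near(distance_km, limit=10.0):
    """Hypothetical membership function: 1 at distance 0, falling linearly to 0 at `limit`."""
    return max(0.0, 1.0 - distance_km / limit)


def complement(f):
    """Membership in the complement: one minus the grade."""
    return lambda x: 1.0 - f(x)


def fuzzy_union(f, g):
    """Membership in the union: pointwise maximum of the two grades."""
    return lambda x: max(f(x), g(x))


def fuzzy_intersection(f, g):
    """Membership in the intersection: pointwise minimum of the two grades."""
    return lambda x: min(f(x), g(x))


far = complement(near)
either = fuzzy_union(near, far)
both = fuzzy_intersection(near, far)

grade_near = near(2.5)    # a site 2.5 km from the center
grade_far = far(2.5)
```

Unlike ordinary sets, a fuzzy set and its complement can overlap: the site at 2.5 km belongs to both with a positive grade.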

The concept of fuzzy sets has been widely applied in areas


that include metamathematics, numerical taxonomy and pattern
recognition.

A large amount of research has been undertaken

since the concept was formulated by Zadeh in the 1960's.

A Geographical Interpretation.-

One of the problems that has

traditionally worried geographers is the definition of classes


among a set of entities.

It is in this context that the

heterogeneity index becomes meaningful.

In this thesis the

idea is to define the degree of membership in a class for each


one of the nodes of the graph.

Since in this initial process there are no pre-defined


classes, the degree of membership is better understood as the
potential for becoming an interior point of a hypothetical
class.

For a fixed point p in the graph G the heterogeneity index Hp


can be interpreted as a measure of the potential of membership
of p to the interior of a hypothetical region.

According to

the topological definition of the interior of a region, a


point is in the interior if there exists a neighborhood of p
that belongs to the region.

In this case the topological

neighborhood of point p is the set of its first order


neighbors.

It is at this stage that the concept of

fuzzy set

becomes relevant.

Although it can not be established at this point whether the


neighborhood belongs to the region or not, it is possible to

measure the degree of membership of the neighborhood to the


interior of the region.

This measure is given by the

heterogeneity index associated with the neighborhood of p.


The closer the value of Hp to one, the higher the degree of
membership of the neighborhood to the interior of the region.
Inversely, the closer the value of Hp to zero, the lower the
degree of membership of the neighborhood to the interior.
This relationship corresponds to our intuitive conception of
the interior of a region.

If a geo-subspace tends to be

homogeneous; that is, if the similarity between an entity and


its surrounding is high, then it must be in the interior of a
region.

As expected, in this case the value of the index is

close to one.

In the inverse case, if the subspace is highly

heterogeneous, it must belong to the border of a region and


the value of the heterogeneity index is close to zero.
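
The interpretation can be illustrated on a chain of nodes. This is only a sketch: the values, the chain-shaped graph and in particular the cut-off of 0.5 are assumptions made for illustration, since the text treats Hp as a degree of membership rather than as a two-way label.

```python
def heterogeneity(a, x, neighbors):
    """I_a: sum of squared deviations between node a and its first order neighbors."""
    return sum((x[i] - x[a]) ** 2 for i in neighbors[a])


def standardized(x, neighbors):
    """H_a = (I_max - I_a) / (I_max - I_min); assumes the indices are not all equal."""
    raw = {a: heterogeneity(a, x, neighbors) for a in x}
    i_max, i_min = max(raw.values()), min(raw.values())
    return {a: (i_max - raw[a]) / (i_max - i_min) for a in raw}


# Hypothetical chain of six nodes: three low values followed by three high ones.
x = {0: 1.0, 1: 1.0, 2: 1.0, 3: 9.0, 4: 9.0, 5: 9.0}
neighbors = {n: [m for m in (n - 1, n + 1) if m in x] for n in x}

H = standardized(x, neighbors)

threshold = 0.5  # illustrative cut-off, not taken from the text
interior_like = sorted(n for n in H if H[n] >= threshold)
boundary_like = sorted(n for n in H if H[n] < threshold)
```

Nodes 2 and 3, where the values change, receive an index of zero and behave as boundary points; the remaining nodes receive an index of one and behave as interior points.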

It should be noted that in our problem the notion of the


exterior of a region is meaningless since there are no defined
regions.

Therefore, it is possible to distinguish only

between interior and boundary points.

This interpretation of the heterogeneity index as a measure of


the potential of a point to be either in the interior or
border of a region will be applied in a classification context
in the following chapter.

The Heterogeneity Index as a Geographical Measure

The homogeneity of a geo-subspace has also been a traditional
problem for geographers.

While the number of geographical

studies related to regionalization (see Chapter 4) is quite


large, the study of the heterogeneity of a geo-subspace has
not received much attention.

Intuitively however, the concept

seems to be very important for spatial analysis studies.

For example, in issues related to mapmaking the cartographer


sometimes views the areas of "heterogeneity" as an indicator
of the scales that should be used.

In such cases, the

assumption is that for areas that show a "uniform landscape"


there is, in general terms, less interest in producing larger
scale maps.

A second example comes from social geography,

where the study of urban spatial patterns has received much


attention, particularly regarding the distribution of social
groups (Silk, 1979, p.100).

Urban areas have been

differentiated by the social characteristics of their


population.

Nevertheless, spatial patterns of the boundaries

of the "city neighborhoods" are intuitively equally important.


Two contiguous city neighborhoods most likely interact
significantly through their boundaries.

If this is so,

"highly heterogeneous" boundaries must play a different role


in the study of social interaction than "less heterogeneous"
ones.

A high income residential neighborhood surrounded by a

low income one must interact in a different manner with its


surroundings than a similar high income residential
neighborhood would with a middle-class one.

Heterogeneity indices similar to the ones proposed in this


chapter could be used in the study of geographical
heterogeneity.

An example of an application of this concept

is found in the educational planning problem presented in


Chapter 5.

3.4. Fuzzy Topology

A close examination of the intuitive ideas behind the


mathematical theory of topology clearly points out the strong
resemblance between the geographical problem posed in this
thesis and one commonly encountered in this branch of
mathematics.

Firby and Gardiner (1982) give an excellent

overview of the main ideas that are the basis for the
development of topological theory.

The term "topology" was originally introduced in the 19th
century by one of Gauss' students and was used in addition to
"analysis situs" to refer to this new branch of mathematics.
Two parallel developments of topological theory can be
identified: point-set or general topology and algebraic
topology. Point-set topology was first inspired by Cantor's
work (1880) on the general theory of sets, but its major
advancement occurred only in this century in the work of
Frechet (1906) and Hausdorff (1912).

In general topology, concepts that are usually defined in a


Euclidean space such as "limit" and "continuity" are
generalized to abstract sets through the notion of
neighborhood.

For example, the definition of continuity of a function


defined in the real plane ($R \times R$) is based on the notion
of open interval as can be appreciated from the following
formal definition given by Haaser et al. (1959, p.327).
The function f is continuous at the point $X_0$ in $D_f$ if for
each $\epsilon > 0$ there exists a $\delta > 0$ such that
$$|f(X) - f(X_0)| < \epsilon$$
whenever $X \in D_f$ and $|X - X_0| < \delta$. $D_f$ denotes
the domain of the function f.

In a similar manner the concept of continuity is generalized


to abstract sets using the concept of open set and
neighborhood.

For example, a function from a metric space X to a metric


space Y is continuous if and only if for each open set $O$ in
Y, the set $f^{-1}(O)$ is an open set in X (Royden, 1968,
p.132).
In summary, the space surrounding a point or, in other words,
the notion of nearness to a point is formalized in general
topology by means of the concepts of open set and neighborhood

and is used to generalize ideas that had been developed when


the set of interest was the real numbers.

In contrast, algebraic topology, inspired by more geometrical


problems, was introduced by Poincare between 1895 and 1905.
It should be mentioned that this thesis focusses on the
application of general rather than algebraic or surface
topology.

Nevertheless, it is recognized that concepts

developed in areas where there is a geometrical approach, such


as surface topology, can be of interest for certain
geographical studies.

3.4.1 Definition of Fuzziness

As in many other branches of mathematics, general topology is


based on the traditional concept of membership to a set where
an element either belongs or does not belong to it.

As

mentioned in section 3.3.2 the concept of fuzzy set has been


used in various branches of mathematics to generalize
theories.

For example in traditional systems of formal logic

a proposition is either true or false.

However, the

application of the notion of fuzziness has permitted the


development of a multi-valued logic which has been found
useful in the design of the so-called artificial intelligence
expert systems.

In the case of general topology the idea of fuzziness in a


geographical context appears in a natural manner.

In the same

way as the bivalent notion of membership to a set does not


provide an adequate model for some real problem-solving
situations in applied mathematics, in geography the bivalent
notion of the interior of a set as defined in topology is not
always adequate for regionalization purposes (see section
4.3.4).
With the aid of the heterogeneity index it is possible to
formalize the idea of fuzziness in topological terms.

For

example, once a geo-space under study has been identified with


a graph as defined in section 3.3.2, a topology can be defined
on this set.

Since the operations among graphs are implicit in the concept


of topology, the definitions of union and intersection between
graphs have to be established.

Union: Given two graphs G1 and G2, with their corresponding


sets of nodes V1 and V2, and of links X1 and X2, the union
between G1 and G2 ($G_1 \cup G_2$) is the graph G with $V = V_1 \cup V_2$ and $X = X_1 \cup X_2$ (Harary, 1972, p.21).

Intersection: The intersection $G_1 \cap G_2$ is defined through the links as follows: $X = X_1 \cap X_2$ and V is the set of all the nodes represented in X.
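The two operations can be illustrated with a minimal sketch. It assumes a graph is stored as a pair (set of nodes, set of links) and that a link is an unordered pair of nodes; the helper names `link`, `graph_union` and `graph_intersection` are hypothetical and are not part of the thesis.

```python
def link(a, b):
    """An undirected link, stored as an unordered pair of nodes (assumed encoding)."""
    return frozenset((a, b))


def graph_union(g1, g2):
    """Union of two graphs: V = V1 U V2 and X = X1 U X2."""
    v1, x1 = g1
    v2, x2 = g2
    return v1 | v2, x1 | x2


def graph_intersection(g1, g2):
    """Intersection defined through the links: X = X1 n X2,
    and V is the set of nodes represented in X."""
    _, x1 = g1
    _, x2 = g2
    links = x1 & x2
    nodes = {node for shared in links for node in shared}
    return nodes, links


# Two small graphs that share the single link b-c.
g1 = ({"a", "b", "c"}, {link("a", "b"), link("b", "c")})
g2 = ({"b", "c", "d"}, {link("b", "c"), link("c", "d")})

union_nodes, union_links = graph_union(g1, g2)
inter_nodes, inter_links = graph_intersection(g1, g2)
```

Because the intersection is built from the links, a node that belongs to both graphs but has no common link does not appear in the result.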

For convenience the following definition of a topology on G is


given:

U is an open set of G if:


i) U is a subgraph of G and
ii) for every point $p \in V(U)$, there exists a non-empty connected subgraph N of U such that $V(N) \neq \{p\}$ and $p \in V(N)$.

It can be proven that these open sets satisfy the conditions


required to be a topology of G (see Appendix A).

In particular for every point p in V(G) the subgraph formed by


p and its neighbors is a topological neighborhood.

This can

also be proven (see Appendix A).
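As an informal check of this definition, the sketch below tests the open-set condition on a small graph. It rests on one reading of condition ii) that is an assumption of this example: a node p has a connected subgraph N of U with more than the single node p exactly when at least one link of U touches p. The graph encoding and the names `is_subgraph`, `is_open` and `first_neighborhood` are hypothetical.

```python
def link(a, b):
    """An undirected link, stored as an unordered pair of nodes (assumed encoding)."""
    return frozenset((a, b))


def is_subgraph(u, g):
    """U is a subgraph of G: its nodes and links belong to G,
    and every link of U joins nodes of U."""
    (u_nodes, u_links), (g_nodes, g_links) = u, g
    endpoints_ok = all(l <= u_nodes for l in u_links)
    return u_nodes <= g_nodes and u_links <= g_links and endpoints_ok


def is_open(u, g):
    """Open-set test under the reading stated above: every node of U
    must be an endpoint of at least one link of U."""
    u_nodes, u_links = u
    if not is_subgraph(u, g):
        return False
    covered = {node for l in u_links if len(l) == 2 for node in l}
    return u_nodes <= covered


def first_neighborhood(p, g):
    """Subgraph formed by p, its first neighbors and the links joining them to p."""
    _, g_links = g
    links = {l for l in g_links if p in l and len(l) == 2}
    nodes = {p} | {node for l in links for node in l}
    return nodes, links


# A chain a-b-c-d.
g = ({"a", "b", "c", "d"}, {link("a", "b"), link("b", "c"), link("c", "d")})
star_b = first_neighborhood("b", g)   # nodes a, b, c
isolated = ({"a", "c"}, set())        # two nodes, no links between them
```

Under this reading the subgraph formed by a node and its first neighbors passes the test whenever the node has at least one neighbor, while a set of mutually unlinked nodes does not.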

It should be remembered that other topologies can be defined


on G.

The convenience of this particular definition is that

entities which have been previously used to model the notion


of "geographical neighborhood" such as the subgraph formed by
a node and its first neighbors are also topological

neighborhoods.

As a result, the heterogeneity index associated to a

geographical neighborhood becomes part of a topological space.


The heterogeneity index can be interpreted topologically as
the degree of membership of a neighborhood to the interior of
a set.

3.5 Other Indices

In the definition of the heterogeneity index it was assumed


that the variables involved were of interval scale.

At this

point the question of whether it is possible to define


equivalent indices for other types of variables is considered,
and an equivalent index is proposed for those cases in which
the variables are of nominal type.

The class to which a particular unit belongs can be determined



by a nominal variable.

These types of variables are

encountered in geographical problems in which characteristics


that can only be described through classes are involved.

Such

is the case of spatial analysis problems where variables such


as sex (female, male), income (low, medium, high), religion,
or nationality characterize the spatial units under study.

There are several measures used to assess the similarity


between units with respect to nominal variables.

The comparison is made in terms of whether the units have the same or different scores on the variables (Andenberg, 1973, p.123).

The following "matching coefficient" is one of those


similarity measures:

$$S_{ab} = \frac{N_{ab}}{T}$$

where Sab is the similarity between units "a" and "b", Nab is

the number of variables on which the units match and T is the


total number of variables.

The more similar two units are,

the closer to 1 the value of Sab.

The particular objective at this point is to define a


similarity index that reflects the relationship between a unit
and its neighbors.

The index should be defined so that the

more similar a unit is to its neighbors, the larger the value


of the index;

the more heterogeneous a unit is with respect

to its neighbors, the closer to zero the value of the index.


The proposed index follows:

$$I_a = \frac{1}{kT} \sum_{i=1}^{k} N_{ia}$$

where Nia is the number of variables on which units "a" and "i" match, T is the total number of variables and k is the number of neighbors of "a".

If unit "a" and its k neighbors match in all the variables

then the index Ia equals 1 .

If unit "a" does not match in any

of the variables with any neighbor, the value of the index is


zero.
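A short sketch of both measures follows. It reads the index Ia as the average of the matching coefficients between "a" and each of its k neighbors, which agrees with the limiting values of one and zero stated above; the function names and the sample data are hypothetical.

```python
def matching_coefficient(unit_a, unit_b):
    """S_ab = N_ab / T: share of the T nominal variables on which two units match."""
    matches = sum(1 for x, y in zip(unit_a, unit_b) if x == y)
    return matches / len(unit_a)


def neighbor_similarity_index(unit, neighbors):
    """I_a = (sum over the k neighbors of N_ia) / (k * T)."""
    total_variables = len(unit)
    k = len(neighbors)
    matched = sum(
        sum(1 for x, y in zip(unit, neighbor) if x == y)
        for neighbor in neighbors
    )
    return matched / (k * total_variables)


# Hypothetical units described by three nominal variables.
a = ("female", "low", "catholic")
b = ("female", "high", "catholic")
c = ("male", "low", "catholic")

s_ab = matching_coefficient(a, b)            # 2 matches out of 3 variables
i_a = neighbor_similarity_index(a, [b, c])   # (2 + 2) matches out of 2 * 3
```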

There are many other matching coefficients.

Therefore, the

heterogeneity index has to be redefined in every case


depending on the measure used.

Whether a particular index is

appropriate or not depends on the problem at hand.

3.6 Conclusions

A general framework for the design of models that represent


spatial structure mathematically using the notion of
neighborhood was established in the first part of this
chapter.

The benefits of this approach can be appreciated in

the design of the two neighborhood models in the following


chapters.

As a first step in the development of neighborhood models


measures of local variation of a geo-subspace were defined and
through them the geographical notion of neighborhood was
related to the topological concept of neighborhood.

These measures, heterogeneity indices, are meaningful in both


geographical and mathematical terms.

From a geographical

point of view, this formalization is important for the modeler

since the geographical entity of neighborhood is identified


with an element of a mathematical structure which has been
broadly studied during this century.

On the other hand, in

mathematical terms the indices allow the definition of


fuzziness in a topological space.

This is a development which, to the best of the author's knowledge, has not been explored before and could lead to a new topology, a fuzzy topology.

Chapter 4.

A TOPOLOGICAL APPROACH TO REGIONALIZATION

Regionalization is probably one of the best known branches of


geography.

One of the central issues in regionalization

problems is homogeneity.

It seems natural, therefore, to

apply a concept like local variation of a geo-subspace to the


process of region building.

In this chapter an application of

the heterogeneity index in the design of classification



algorithms is presented.

In the first and second sections an

overview of regionalization is given, and several existing


spatial algorithms are discussed.

In the final sections two

regionalization algorithms that use a heterogeneity index are


presented, and a hypothetical case is included.

4.1 Regionalization as a Classification Problem

The identification of areal groups that show a homogeneous


distribution of one or more characteristics but differ from
other groups is one of the central issues in regional

geography (Zobler, 1958, p.140).

Regionalization is the process by which regions are identified


and classified.

Bunge (1966) clearly recognizes the

definition of regions as a classification or taxonomic


problem.

In taxonomic terminology, a uniform region is

equivalent to an areal class, a single feature region is a


classification using a single category, etc.

From this point

of view a regionalization is a classification of geographic


units.

A whole body of classification techniques has been developed


as an inquiry tool for other sciences such as biology and
botany.

The methods developed in classification or cluster

analysis are in essence formal; that is, they employ a


mathematical frame.

The intent of such methods is to find a

solution to a classification problem similar to the one


produced by a specialist.

Decision rules for classifications

are usually designed in the form of algorithms.

Various

disciplines use algorithms that are in essence equal but have


been adapted to different circumstances.

Regional geography

shares universally accepted methods such as "central


agglomerative procedures."

Geographic studies which adapt a classification methodology to various contexts have been reported. Some examples are the studies of areal patterns in cities (Jones, 1977), the partition of an area into adequate zones for the optimal location of service centers such as hospitals and schools (Scott, 1969) and political districting (Garfinkel and Nemhauser, 1968).

According to Haggett et al. (1977, p.451) there are three


classificatory approaches that have been used by geographers:
uniform regions, nodal regions and planning or programming
regions.

Uniform regions are those in which places located

within the regions are homogeneous with respect to one or more


properties.

The regions are disjoint, contiguous and

completely exhaust the study area.

Nodal regions measure

interactions between units such as migration and number of


telephone calls.

Planning regions are created to satisfy

specific needs of an institution, to implement policy


decisions or for administrative purposes.

The criteria

selected to define these regions reflect the objectives for


which they were created.

Such is the case of the definition

of enumeration areas for a census.

The resulting regions are

not necessarily contiguous and might not exhaust the study


area.

In addition to the clear differences among types of regions it


should be emphasized that the classification of locations into
regions also serves different purposes.

The main ones are:

hypothesis testing, administration and programming.

In the

case of nodal and uniform regions, classification is often

undertaken as an exercise to substantiate spatial theories.


The definition of programming regions serves specific
purposes.

This does not mean that the types of regions are

not closely related.

In fact, the definition of programming

regions is often constrained by previously defined nodal or


uniform regions.

4.1.1 Elements of a Regionalization

Depending on the purpose of regionalization, different choices


are available to the analyst. Each one has an impact on the result of the process. Therefore, the appropriate selection of units, algorithms, etc. is of vital importance.

In some

cases choices are almost equivalent, while others differ


drastically.

These decisions often depend on the analyst's

understanding of the problem itself.

It could therefore be

argued that this introduces a subjective factor to the


regionalization process.

When a classification exercise is carried out, the first stage


consists in defining its elements.

A regionalization has the

same basic elements as other classifications, but it also adds


spatial constraints.

The elements of regionalization are:

the units for the regionalization
the properties that characterize the regions
a measure of homogeneity or similarity
spatial constraints such as contiguity and compactness
a grouping criterion
the algorithm to create the regions
the number of regions.

At this point it is worth mentioning the principal factor that


singularizes region-building in comparison to other
classification schemes:

the data units have an implicit locational characteristic (Bunge, 1966).

In numerical

taxonomy similarity and nearness are equivalent; however in


spatial applications it is important to draw a distinction
between the two terms.

Similarity measures are commonly

applied with a nearness or contiguity constraint.

4.1.2 The Geographic Units

The two main limitations which the analyst faces in the


selection of units are the availability and the level of
aggregation of the data.

According to Sawicki (1973), the

availability of locational data for urban and regional


researchers is very limited.

In most cases it is obtained

from secondary sources such as census and administrative


offices.

Data is often compiled for fixed administrative

areas such as school districts or for street blocks.

The

accessibility of census data and availability of statistical

packages has increased, but spatial analysts have become


increasingly aware that existing data is not always compatible
with the hypothesis under study (Sawicki, 1973, p.146).

The two most common regionalization units are areal and point
types.

It appears that the most popular areal level used in urban studies has been the census tract (Sawicki, 1973, p.110). Tracts are delineated so that they are homogeneous with respect to characteristics such as income and topographical features, subject to constraints such as population size and contiguity.

It seems to be the case that

analysts have a vast range of levels of aggregation from which


an appropriate selection may be made.

However, spatial

analysis done at different levels of aggregation has shown not


only different but contradictory results (Sawicki, 1973, p.110).

The fact that the selection of units determines to a

great extent the results of spatial analysis severely


restricts the researcher since s/he often does not have
control of the definition of units used to compile data.

4.1.3 Measures of Homogeneity


In geographic studies regions are considered "areal systems
based on levels of similarities and differences in spatially
distributed traits" (Zobler, 1958, p.140).

Homogeneity is

identified with low areal variance and heterogeneity with high

areal variance (Bunge, 1966, p.22).

Regions must be

internally homogeneous and differentiated from other regions.


It is in this context that the grouping of areas into regions
has been approached as a classification problem.

Identifying homogeneity with similarity as it is understood in


numerical taxonomy has made it possible to define regions
using the same techniques as in cluster analysis.

To carry

out a regionalization it is therefore necessary to establish


the significance of homogeneity or similarity among areas and
regions.

In the following paragraphs some of the measures of similarity


that have been used for grouping purposes
are presented.

The point of departure is a set of units and a



set of variables characterizing them.

Depending on the scale

of measurement the variables can be classified in four groups:


nominal, ordinal, interval and ratio.
1. A nominal scale allows distinctions to be made between classes.
2. An ordinal scale induces an ordering of the objects.
3. An interval scale allows comparisons of two objects by means of the differences between them.
4. A ratio scale allows comparisons of two objects by both a difference and a ratio.
(Andenberg, 1973, p.27)

Some of the similarity measures used for interval scale data


follow:

Minkowski Metric

The distance between units "i" and "j" according to the


Minkowski metric is:

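$$D_q(i,j) = \left( \sum_{k=1}^{p} |x_{ki} - x_{kj}|^{q} \right)^{1/q}$$

where $x_{ki}$ is the value of variable k for unit "i", $q \geq 1$ and p is the number of variables. In particular for q = 2, Dq is the Euclidean distance.

The greater the dissimilarity between units "i" and "j", the larger the value of Dq.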

The measure increases with decreasing

similarity and decreases with increasing similarity.

In

geographic applications the Euclidean distance is the most


commonly used metric.
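For illustration, the standard form of the Minkowski metric can be computed as in the sketch below. The function name and the sample units are hypothetical, and each unit is assumed to be given as a sequence of p interval-scale values.

```python
def minkowski_distance(unit_i, unit_j, q=2):
    """Minkowski distance between two units described by p variables.
    q = 1 gives the city-block distance and q = 2 the Euclidean distance."""
    if q < 1:
        raise ValueError("q must be at least 1")
    return sum(abs(x - y) ** q for x, y in zip(unit_i, unit_j)) ** (1 / q)


# Two hypothetical units measured on two variables.
euclidean = minkowski_distance((0.0, 0.0), (3.0, 4.0), q=2)
city_block = minkowski_distance((0.0, 0.0), (3.0, 4.0), q=1)
```

For these two units the Euclidean distance is 5 and the city-block distance is 7, which shows how the choice of q changes the measured dissimilarity.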

When the Minkowski metric is used, it is assumed that the


variables are immersed in an orthogonal space.

This poses

some limitations in geographic applications, since the


variables are often not orthogonal.

Principal component

analysis has been used to overcome this restriction (Byfulgien and Nordgard, 1973).

Correlation Analysis

The product-moment correlation coefficient can be used as a


measure of association between units.

In geographic terms the

degree of association has been interpreted as a measure of "regional bonds" (Haggett et al., 1977, p.476). One of the central problems of this method is that the variables associated with a unit involve different measurement units. This renders mean and variance meaningless (Andenberg, 1973, p.113).

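The correlation between data units "j" and "k" is defined as:

$$R_{jk} = \frac{\sum_{i=1}^{p} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\left[ \sum_{i=1}^{p} (x_{ij} - \bar{x}_j)^2 \sum_{i=1}^{p} (x_{ik} - \bar{x}_k)^2 \right]^{1/2}}$$

where $\bar{x}_j = \frac{1}{p} \sum_{i=1}^{p} x_{ij}$ and p is the number of variables.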

Analysis of Variance

There are at least four measures of similarity that have been


used in terms of analysis of variance.

The first two

quantities (a and b below) are used in univariate cases while


the last two (c and d below) are used in multivariate cases.

a) In grouping procedures the following quantity is used as an objective function:

$$D = \sum_{i=1}^{n} w_i (x_i - \bar{x}_i)^2$$

where $w_i$ is the weight assigned to each data unit, n is the number of units and $\bar{x}_i$ denotes the weighted arithmetic mean of those $x_i$ that are assigned to the subset to which element "i" belongs (Fisher, 1958, p.789). D decreases as the groups become more homogeneous. D is known as the sum of squares within groups in the sense of analysis of variance. In grouping procedures the objective is to minimize D.

b) In building regions it is desirable to have internal differences minimized and differences between regions maximized. That is, homogeneity within regions and heterogeneity between regions should characterize the grouping.

The following measure shows these inter- and intra-regional differences.

$$H = \frac{\text{external variation (between regions)}}{\text{internal variation (within regions)}} \qquad (4.5)$$

The closer the grouping fits the desired requirements, the higher the value of "H". It should be noted that this quantity is used as a type of objective function rather than as a crude measure of homogeneity.

c) The Ward Method

Ward defined the following measure based on the idea that whenever there is a grouping there is a loss of information:

$$E = \sum_{k} E_k \qquad \text{(total within-region error sum of squares)}$$

$$E_k = \sum_{i=1}^{p} \sum_{j=1}^{m_k} (x_{ijk} - \bar{x}_{ik})^2 \qquad \text{(error sum of squares for region } k\text{)} \qquad (4.6)$$

$$\bar{x}_{ik} = \frac{1}{m_k} \sum_{j=1}^{m_k} x_{ijk} \qquad \text{(mean of the ith variable for areas in region } k\text{)}$$

where
$x_{ijk}$ = value of the ith variable for the jth area in the kth region,
n = number of areas,
$m_k$ = number of areas in region k,
p = number of variables,
and the sum over k runs over all regions.

In this case E is used as an objective function that has to be minimized. The more information that is lost in a regionalization, the larger the value of "E".

d) Cliff and Haggett (1970) defined a similar homogeneity measure as:

$$B = 1 - \frac{E}{\max E}$$

where E is the total within-region error sum of squares, and the maximum is taken over all the possible values of E. In fact the maximum value is obtained when the resulting region is only one; that is, when all the units are grouped together. Since B is equal to zero when all areas are grouped into one region and equal to one when each area is a region, the closer B is to one the better the regional system performs in terms of homogeneity.
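The relation between the within-region error sum of squares E and the measure B can be sketched as follows. The example takes max E to be the value of E when all areas form a single region, as stated above, and assumes that the areas are not all identical so that max E is not zero. Function names and data are hypothetical.

```python
def error_sum_of_squares(region):
    """E_k for one region, given as a list of areas; each area is a tuple of p values."""
    m_k = len(region)
    p = len(region[0])
    means = [sum(area[i] for area in region) / m_k for i in range(p)]
    return sum((area[i] - means[i]) ** 2 for area in region for i in range(p))


def total_error(regions):
    """E: the sum of E_k over all regions of a regionalization."""
    return sum(error_sum_of_squares(region) for region in regions)


def homogeneity(regions):
    """B = 1 - E / max E, where max E is obtained with a single universal region."""
    all_areas = [area for region in regions for area in region]
    max_error = error_sum_of_squares(all_areas)
    return 1 - total_error(regions) / max_error


# Four hypothetical areas measured on one variable.
areas = [(1.0,), (2.0,), (10.0,), (11.0,)]
similar_together = [[areas[0], areas[1]], [areas[2], areas[3]]]
dissimilar_together = [[areas[0], areas[2]], [areas[1], areas[3]]]

b_good = homogeneity(similar_together)
b_bad = homogeneity(dissimilar_together)
```

Grouping the similar areas gives a value of B close to one, while pairing dissimilar areas gives a value close to zero.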

It should be added that there are many other similarity


measures that have been used for grouping purposes.

The reader is referred to Andenberg (1973), Hartigan (1974) and Cormack (1971).

4.1.4 Regionalization Constraints

In addition to homogeneity there are other constraints that


are sometimes imposed on regionalizations.

Among the criteria

that have been used for both districting and region building
are:

1. Equality of population
2. Contiguity
3. Compactness
4. Preservation of political or administrative boundaries
5. Region boundaries should follow geographic features such as rivers and mountains (Thoresson, p. 237).

Even though all of these constraints are in essence spatial,


the contiguity constraint is particularly interesting for this
work since it is strongly related to the notion of
geographical neighborhood.

Contiguity

Homogeneity, as understood in classification analysis, means


either similarity or nearness.

Regional homogeneity, however,

refers to both similarity and geographical nearness.

There are two manners in which the contiguity constraint can


be interpreted:
a) When the units are of areal type, a region is contiguous if for any two units, a1 and a2, that belong to it, it is possible to travel from a1 to a2 through a path wholly contained in the region. In mathematical terms this is called a connected set.

b) In some other instances the need for contiguity does not necessarily imply a physical border-to-border relation but simply a neighborhood one as described in Chapter 2.
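Either interpretation can be checked in the same way once the relation between units is stored as an adjacency structure. The sketch below assumes a dictionary that maps each unit to the set of units it borders (or to its neighbors, for the second interpretation) and uses a breadth-first search; the name `is_contiguous` and the sample data are hypothetical.

```python
from collections import deque


def is_contiguous(region, adjacency):
    """True if any two units of the region can be joined by a path
    that stays wholly inside the region (a connected set)."""
    region = set(region)
    if not region:
        return True
    start = next(iter(region))
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for neighbor in adjacency.get(unit, ()):
            if neighbor in region and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == region


# Four hypothetical areal units arranged in a row: a1 - a2 - a3 - a4.
adjacency = {
    "a1": {"a2"},
    "a2": {"a1", "a3"},
    "a3": {"a2", "a4"},
    "a4": {"a3"},
}

connected = is_contiguous({"a1", "a2", "a3"}, adjacency)
split = is_contiguous({"a1", "a3"}, adjacency)
```

In this example the set a1, a2, a3 is contiguous, whereas a1 and a3 alone are not, because the only path between them leaves the region through a2.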

There has been some disagreement among geographers about the


necessity of imposing a contiguity constraint on a
regionalization.

In some instances, such as the definition of

administrative zones or electoral districts, there can be no


doubt concerning the need for such a constraint.

However, the

requirement is less clear in the use of grouping for research


purposes.

There are two basic reasons researchers carry out


regionalizations: as an exploratory tool or to test a
hypothesis.

However, it should be remembered that there is an

important difference between using the grouping itself to test


a hypothesis and testing the hypothesis of whether there are
clusters or not.

Classification or cluster analysis can only

be used as such in the first case.

Two approaches to the building of uniform regions are

possible:
a) A classification without a contiguity constraint (called a typification) is undertaken, followed by the mapping of results.

b) A classification with a contiguity constraint is undertaken.

According to Byfulgien and Nordgard (1974) these two approaches are "not necessarily conflicting but complementary." However, this is not a universally accepted position. For example, Johnston (1970) argues that "regionalization with contiguity constraints over-simplifies and operates against efficient hypothesis testing."

The problem arises in the interpretation of results. The product of a typification is a set of groups that satisfy a condition of homogeneity, but are not necessarily contiguous.


These results have a value in themselves.

However, when the

resulting groups are mapped, the units are implicitly


classified by another variable, that of location.

Sets of

units that belong to the same group and are contiguous seem to
form regions.

However, it cannot be ascertained that these

newly formed regions satisfy the same homogeneity condition as


the original grouping.

4.1.5 The Number of Regions

In a regionalization problem the main task is to find a


grouping of the units that "best" satisfies the needs of the
analyst.

Therefore, the first idea that comes to mind is to

select from all the possible groupings the one that best
satisfies the constraints.

Cliff and Haggett (1970) have looked into some combinatorial aspects of the regionalization problem. They were able to calculate the number of different aggregations to form "m" regions given "n" areas without a contiguity constraint:

$$\sum \frac{n!}{\left( \prod_{j} g_j! \right) \prod_{i=1}^{m} f_i!}$$

where gj is the number of regions which comprise j units, fi is the number of areas combined to form region i and the summation is over all m element partitions of n.

For example, if the number of units is four (n = 4) and the number of regions is two (m = 2), the m element partitions of n are two: (3,1) and (2,2).
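The count can be reproduced with a short sketch. It assumes the reading of the formula in which, for every m element partition of n, n! is divided by the factorials of the region sizes fi and by the factorials of the counts gj of regions of equal size. The function names are hypothetical.

```python
from collections import Counter
from math import factorial


def partitions_into_parts(n, m, largest=None):
    """Yield the partitions of n into exactly m positive parts, largest part first."""
    if largest is None:
        largest = n
    if m == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n - (m - 1), largest), 0, -1):
        for rest in partitions_into_parts(n - first, m - 1, first):
            yield (first,) + rest


def count_aggregations(n, m):
    """Number of ways to group n areas into m regions without a contiguity constraint."""
    total = 0
    for sizes in partitions_into_parts(n, m):
        ways = factorial(n)
        for f in sizes:                        # divide by f_i! for every region
            ways //= factorial(f)
        for g in Counter(sizes).values():      # divide by g_j! for equal-sized regions
            ways //= factorial(g)
        total += ways
    return total


four_into_two = count_aggregations(4, 2)
ten_into_three = count_aggregations(10, 3)
```

For n = 4 and m = 2 the partition (3,1) contributes four groupings and (2,2) contributes three, seven in total. Ten areas grouped into three regions already give 9330 possibilities, which illustrates how quickly the number grows.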

They also calculated the number for the case where the areas
are under a strong contiguity constraint, i.e. when they form
a chain.

As Cliff and Haggett (1970, p.288) have shown, in both cases the number is too large to permit approaching the regionalization problem by exhausting the possibilities. This is the reason why it has become necessary to design heuristic algorithms to find solutions which approximate the "best" result.

When a classification approach is undertaken the number of


regions to which "n" areas should be aggregated often has to
be defined by the analyst. As will be seen in the following section, hierarchical methods group "n" units in any number of regions between one and n. It is the task of the analyst to decide on the "best" level of aggregation.

In non-hierarchical methods

the number of seeds determines the number of regions.

In

other instances, such as some of the algorithms described in


section 4.3, the number of regions is a result of the
algorithm.

4.1.6 The Algorithms

Once a similarity measure as well as the constraints for the


grouping have been established, it is necessary, in order to
actually obtain the regions, to define a procedure by which

the areas are to be clustered.

The existing methods can be

classified as hierarchical or non-hierarchical.

Hierarchical Methods

In a hierarchical method the starting point is a set of "n" data units, and the procedure ends with the universal region in which all the units are grouped in one region. In some cases the procedure is divisive because it starts from the universal region. What is common to the groups of a hierarchy, whether they result from division or from merging, is that once formed they remain as such throughout the remainder of the process.

An easy way to visualize this process is by means of tree diagrams as shown in figure 4.1. Each node represents a region, and the stages of the procedure are shown in the axis below the tree. In the first step the two most similar units are merged, and the number of regions (or units) left is reduced to "n-1". After the ith step the number of regions is "n-i". The process involves "n-1" steps.

Figure 4.1 Dendrogram (axis: Distance Between Groups)

According to Andenberg (1973, p.132) there are three major hierarchical clustering methods: linkage, centroid and variance. Briefly, in the single linkage method, clusters are merged using the shortest distance (similarity) between their elements as a criterion. In the centroid method, the similarity between clusters is given by the similarity between their means. Finally, in the Ward method the clusters that produce the minimum increase in the total within-group error sum of squares (as defined in section 4.1.3) are merged.
Specific hierarchical methods used in geographical studies
will be presented in the next section.

Non-hierarchical Methods

The difference between hierarchical and nonhierarchical methods is that in the latter two units that belong to the same region, at any stage of the process, do not necessarily remain joined. In fact, nonhierarchical methods are based on the assumption that given an initial partition of the units, subsequent improvements are feasible. Usually the first step is the selection of a set of units called seeds. An initial partition is defined by joining each unit to its most similar seed. In the following stages each new partition is defined by taking the previous one as a point of departure. The process ends when the "best" partition is found.
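The first stage of a nonhierarchical method, the construction of the initial partition, can be sketched as below. The example assumes that similarity is measured with the Euclidean distance and that the seeds are chosen by the analyst; the function name and the data are hypothetical, and the later improvement stages are not shown.

```python
from math import dist


def initial_partition(units, seeds):
    """Join every unit to its most similar seed (smallest Euclidean distance)."""
    partition = {seed: [] for seed in seeds}
    for name, values in units.items():
        nearest = min(seeds, key=lambda seed: dist(values, units[seed]))
        partition[nearest].append(name)
    return partition


# Five hypothetical units described by two variables; u1 and u4 are the seeds.
units = {
    "u1": (0.0, 0.0),
    "u2": (1.0, 0.0),
    "u3": (9.0, 9.0),
    "u4": (10.0, 9.0),
    "u5": (2.0, 1.0),
}
partition = initial_partition(units, seeds=["u1", "u4"])
```

With two seeds the partition has two groups, which is why the number of seeds determines the number of regions in this family of methods.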

4.2 Spatial Algorithms

The algorithms that geographers have used for regionalization


purposes can be divided into two groups.

The first is

composed of those shared with other disciplines, such as the

Ward and single linkage methods.

The second group is composed

of those algorithms that include specific spatial constraints


such as contiguity and compactness.

The first group has been extensively described in the literature (see Andenberg, 1973, Cormack, 1971, Hartigan, 1975) and will not be discussed in further detail here. The second group is of more interest. It will be referred to throughout this study as "spatial algorithms."

From a methodological point of view we can distinguish three


types of spatial algorithms:

a) those in which the contiguity of groups can only


be assured by checking if the units have a common border;
b) those that use the notion of neighborhood to identify
contiguous groups;
c) those in which contiguity is assured together
with other constraints imposed on the resulting
regions.

For comparative purposes some "typical" algorithms of the


first two types will be described.

The third type of spatial algorithm, which is not described here, uses techniques such as integer programming and is usually applied to districting problems (Garfinkel, 1970).

4.2.1 Byfulgien and Nordgard Algorithm

The following method was originally introduced by McQuitty and later transformed into a spatial algorithm by Byfulgien and Nordgard (1973). This is an example of a hierarchical algorithm of the first type. The similarity measure is the Euclidean distance, and the clustering criterion is of the single linkage type. The number of resulting regions is determined by the algorithm. The main characteristic of the resulting regions is that "all basic units have their most similar contiguous unit within the same region."

Byfulgien and Nordgard applied this method in eastern Norway to agricultural data and concluded that it can produce regions with very dissimilar units. This is because the condition required to add a unit to a region is its similarity to just one of the other units of the region.

The Algorithm

Def. 1. "A" is the set of "n" areal units in which the area of study is subdivided. That is, $A = \{a_1, \ldots, a_n\}$.

Def. 2. Dij is the distance between areal units ai and aj.

Def. 3. M is an n × n matrix that contains all the distances between areal units. That is, mij = Dij; i, j = 1, ..., n.

Step 1. Find the two most similar areal units. Let ai and aj be these two units.
Step 2. Check if the areal units ai and aj have a border in common. If they are contiguous continue with Step 3. Otherwise let Dij be a "large number" and continue with Step 1.
Step 3. Merge units ai and aj to form region R.
Step 4. Let N be the set of areal units that have a border in common with R. For each element of N check if it is closer to one of the units that belong to R than to any other of its contiguous units.
Step 5. Let F be the set of units that satisfy the condition stated in Step 4. If there is no such unit, i.e. $F = \emptyset$, continue to Step 7.
Step 6. Form a new region merging region R and the previously defined set F. That is, take $R \cup F$ and call it R. Continue with Step 4.
Step 7. Region R is one of the resulting regions.
Step 8. Take the set of areal units that do not belong to any region and obtain its distance matrix M.
Step 9. If all areal units belong to a region, then end the grouping process. Otherwise repeat the procedure starting from Step 1.

4.2.2 Berry's Algorithm

Berry (1961) modified a central agglomerative procedure to


include a contiguity constraint and used it for an economic
regionalization.

Most of the central agglomerative procedures

follow the general scheme given by Andenberg (1973, p.133).

The modified or "spatial" general procedure is as follows:

Def. 1. "A" is the set of "n" areal units in which the area of study is subdivided. That is, $A = \{a_1, a_2, \ldots, a_n\}$.

Def. 2. Dij is the distance between areal units ai and aj.

Def. 3. M' is an n × n matrix where:

$$m_{ij} = \begin{cases} D_{ij} & \text{if } a_i \text{ and } a_j \text{ are contiguous;} \\ \infty & \text{otherwise (or a very large number).} \end{cases}$$

In this case the matrix M' reflects not only the similarity between regions but also their contiguity relation.

Step 1. Begin with n areal units.
Step 2. Search the matrix M' for the most similar pair of contiguous regions. Let the chosen regions or units be labeled ai and aj.
Step 3. Reduce the number of regions (or units) by one by merging regions ai and aj. Label the resulting region ai and update matrix M' to reflect the similarities between ai and all other existing regions (or units). Delete the row and column of M' that correspond to region (or unit) aj.
Step 4. Perform Steps 2 and 3 a total of (n-1) times.
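A compact sketch of this kind of contiguity-constrained agglomeration is given below. It is not Berry's implementation: the distance between two regions is taken here to be the Euclidean distance between their means (a centroid criterion chosen only for illustration), the adjacency relation is assumed to be symmetric, and the procedure stops at a number of regions supplied by the user instead of always running n-1 merges. All names and data are hypothetical.

```python
from math import dist, inf


def constrained_agglomeration(values, adjacency, n_regions):
    """Repeatedly merge the most similar pair of contiguous regions."""
    regions = {name: [name] for name in values}
    borders = {name: set(adjacency[name]) for name in values}

    def centroid(members):
        p = len(values[members[0]])
        return tuple(sum(values[u][i] for u in members) / len(members)
                     for i in range(p))

    while len(regions) > n_regions:
        best, best_pair = inf, None
        for r in regions:
            for s in borders[r]:            # only contiguous pairs are examined
                if r < s:                   # look at each pair once
                    d = dist(centroid(regions[r]), centroid(regions[s]))
                    if d < best:
                        best, best_pair = d, (r, s)
        if best_pair is None:               # no contiguous pair is left
            break
        keep, drop = best_pair
        regions[keep].extend(regions.pop(drop))
        borders[keep] = (borders[keep] | borders.pop(drop)) - {keep, drop}
        for other in borders[keep]:         # relabel the borders of the merged region
            borders[other].discard(drop)
            borders[other].add(keep)
    return sorted(sorted(members) for members in regions.values())


# Five hypothetical units in a row; a5 resembles a1 and a2 but only borders a4.
values = {"a1": (1.0,), "a2": (2.0,), "a3": (10.0,), "a4": (11.0,), "a5": (1.5,)}
adjacency = {
    "a1": {"a2"},
    "a2": {"a1", "a3"},
    "a3": {"a2", "a4"},
    "a4": {"a3", "a5"},
    "a5": {"a4"},
}
three_regions = constrained_agglomeration(values, adjacency, 3)
```

In this example unit a5 stays in a region of its own when three regions are requested: it is very similar to a1 and a2, but the contiguity constraint prevents a merge because it does not border them.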

Both areal and point units can be used with this type of algorithm. The range of similarity measures and clustering criteria that can be incorporated into this algorithm is the same as that of a non-spatial central agglomerative procedure. Other constraints such as compactness can be included without changing the basic structure of the algorithm.

One of the disadvantages of this type of algorithm is that it sometimes produces a chaining process. This occurs when at every stage a unit is merged into the same region (or into one of a few regions). The result is that at any stage there are a few "large" regions together with ungrouped units.

In this type of algorithm the user decides on the number of resulting regions, since the procedure starts with n regions and ends with one.

4.2.3 Lankford Algorithm

An example of an algorithm that uses the notion of neighborhood to identify regions is described by Lankford (1969). This algorithm was not specifically developed for cases where there are contiguity constraints and the units themselves are not necessarily spatial.

Given the set of variables which represent the attributes with which the units are characterized, it is assumed that the units are immersed in an m-dimensional orthogonal space. A similarity measure designed to detect zones of "high density" in the m-dimensional space is introduced.

The Algorithm

Def. Let the density W(a) for areal unit "a" be defined as follows:

$$W(a) = \left( \frac{1}{n} \sum_{x \in N(a)} d(a,x) \right)^{-1}$$

where n is the number of neighbors of unit "a", d(a,x) denotes the Euclidean distance between units "a" and "x", and N(a) is the neighborhood of "a."

This measure was designed to detect the high density zones. A unit immersed in a dense zone has a high "W" value associated with it.

Def. Let f(a,b) be the association between units "a" and "b":

This expression resembles the one for gravitational attraction.

The following definitions are necessary to extend the concepts previously given to the case of units already grouped.

Def. Two groups G and H are called neighbors if there is an element g in G such that its neighborhood N(g) intersects H, i.e. N(g) ∩ H ≠ ∅.

Def. The interface I(G,H) between two groups G and H is the subset of elements of G (or H) whose neighborhoods intersect H (or G).

Def. The association between neighbor groups G and H is the average of the associations between all pairs (g,h) in the interface:

$$f(G,H) = \frac{1}{m} \sum_{(g,h) \in I(G,H)} f(g,h)$$

where m is the number of pairs in the interface, and (g,h) ∈ I(G,H).

The algorithm itself is a central agglomerative procedure using "f" as the measure of similarity. This procedure was applied in a two-dimensional space. Apparently, no attempt has been made to use it in a more general case.

4.2.4 Brantingham Algorithm

Brantingham (1978) has presented an algorithm that uses topological concepts not only intuitively but in a formal sense as well. The procedure is designed in such a way that at each stage it is decided whether two units should be separated by a border. In this sense it differs radically from previously described algorithms, in which the main decision leads to the grouping of units.

City blocks were used as basic units in the regionalization. The similarity measure is based on the difference of absolute values, and it is a multivariate grouping.

The Algorithm

Def. 1 "A" is the set of "n" areal units into which the area of study is subdivided. That is, A = {a1, a2, ..., an}.
Def. 2 Let N(aj) be the set of units contiguous to aj.
Def. 3 Let fi(aj) be the value of the ith variable associated to aj.
Def. 4 A basis of a topology T in X is a subcollection B of T such that every open set U in T is a union of some open sets in B (Hu, 1964, p. 17).
Def. 5 A basis set Bi of the topology is the set of all contiguous units such that the interunit variation of the variable of interest is less than some fixed percentage.

Step 1. Fix a maximum percentage of interunit variation, call it b.
Step 2. Let ak be an element of the basis Bi (i=1 the first time), where ak is an arbitrary unit.
Step 3. For each element aj in Bi (which has not been tested before) perform steps 4 to 7.
Step 4. For each element aj in Bi check, among the neighbors of aj that do not belong to Bi, whether they exceed the maximum percentage of interunit variation with respect to aj. That is:

$$|f(a_j) - f(a_i)| > \max[\, b f(a_i),\; b f(a_j) \,]$$

where aj ∈ Bi, ai ∈ N(aj) and ai ∉ Bi. Call F the set of units that exceed the maximum percentage of internal variation and I the complement. That is:

$$F = \{ a_i \in N(a_j) \setminus B_i : |f(a_j) - f(a_i)| > \max[\, b f(a_i),\; b f(a_j) \,] \}, \qquad I = (N(a_j) \setminus B_i) \setminus F$$

Step 5. Draw a border between every element of F and aj.
Step 6. Those units that do not exceed the interunit variation are added to Bi. Redefine Bi as Bi ∪ I.
Step 7. If there are elements of Bi that have not been tested for the percentage of internal variation, continue with step 3.
Step 8. The resulting set Bi is a basis set.
Step 9. If there are still units that do not belong to any basis set, then choose any arbitrary element of this set and continue to step 2 to create a new basis set.
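A compact way to read these steps is as a region-growing pass over the contiguity graph. The following Python sketch is an illustration under stated assumptions, not Brantingham's code: it handles a single variable with non-negative values, and the names `basis_sets`, `values`, `neighbors` and `b` are hypothetical.

```python
from collections import deque

def basis_sets(values, neighbors, b):
    """Grow basis sets and record borders (single-variable sketch).

    values    -- dict: unit -> value of the variable (assumed >= 0)
    neighbors -- dict: unit -> list of contiguous units
    b         -- maximum fraction of interunit variation (Step 1)
    """
    assigned = {}      # unit -> index of the basis set it belongs to
    borders = set()    # unordered pairs of units separated by a border
    regions = []
    for seed in values:                      # Steps 2 and 9
        if seed in assigned:
            continue
        idx = len(regions)
        region = [seed]
        assigned[seed] = idx
        queue = deque([seed])
        while queue:                         # Steps 3 and 7
            aj = queue.popleft()
            for ai in neighbors[aj]:         # Step 4
                if assigned.get(ai) == idx:
                    continue                 # ai already belongs to Bi
                diff = abs(values[aj] - values[ai])
                limit = max(b * values[ai], b * values[aj])
                if diff > limit:
                    borders.add(frozenset((aj, ai)))   # Step 5
                elif ai not in assigned:
                    assigned[ai] = idx                 # Step 6
                    region.append(ai)
                    queue.append(ai)
        regions.append(sorted(region))       # Step 8
    return regions, borders
```

With values 100, 105, 200, 210 along a chain of four units and b = 0.1, the sketch draws a single border between the second and third units and returns two basis sets.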

In this case the basis sets are interpreted as the resulting regions. If the pattern that appears from the above procedure is not satisfactory, a new value for the internal variation is fixed and the procedure is repeated. The basis sets are defined so that the newly defined variation is smaller than the previous variation used. This guarantees that once a border is drawn between two units it will remain as such through the whole process.

4.3 The Design of Algorithms

From the previous sections it can be seen how cluster analysis


techniques have been used for regionalization purposes.

In

some cases the techniques have been applied without any


modifications, while in others they have been adapted for
spatial analysis purposes.

Considering that one of the interpretations of the


heterogeneity index is as a measure of the degree of
membership of a point to a set, it seems natural to use it in
a region-building process.

In this case the heterogeneity index is assumed to be an


indicator of the potential of an element to belong to the
interior of a region.

The closer the value of the index to 1 ,

the higher the potential of the element to be an interior


point.

As is common in a regionalization problem, it is assumed that the elements under consideration are clustered into homogeneous groups. That is, every element is either an interior point of a homogeneous region or is on its border.

4.3.1 A Topological Algorithm

The next objective is to define a procedure that integrates the heterogeneity index and the grouping process. A hierarchical method is proposed in which the clustering criterion is determined by the heterogeneity index. The algorithm is designed so that the order in which the elements are grouped is determined by the index. Intuitively, those elements that have a higher potential to be in the interior of a region should be clustered before the ones that have a high potential to be on the borders. The proposed algorithm is described below:

Def. 1 "A" is the set of "n" areal units into which the area of study is subdivided. That is: A = {a1, a2, ..., an}.
Def. 2 A unit that does not belong to any region is called an "elementary unit."
Def. 3 A region is the union of at least two elementary units.
Def. 4 A unit is an entity that originally was elementary but now constitutes part of a region.
Def. 5 N(ai) is the set of neighbors of entity "ai". The possible entities are: region, unit and elementary unit.
Def. 6 The heterogeneity index associated with a region is similar to the one defined for elementary units as follows:

where Xij is the value of the jth variable for the "k" neighboring entities of the region and Xrj is the value of the variable associated with the region. The manner in which these quantities are calculated depends on the problem at hand.

Step 1. Select the elementary unit with the highest value in the heterogeneity index. Call it ai.
Step 2. Search for the most similar neighbor to ai. Call it aj.
Step 3. If aj is an elementary unit then group ai and aj. Call the region Rk.
Step 4. If aj is a unit then group ai and the region that contains aj. Note that ai and aj are no longer elementary units.
Step 5. If there are more elementary units left, return to step 1; otherwise continue with step 6.
Step 6. If the number of resulting regions is less than desired then finish the procedure. Otherwise continue with step 7.
Step 7. Rename the regions as elementary units. Establish the neighborhood relations for the new elementary units.
Step 8. Start again with step 1.
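One stage of the procedure (steps 1 to 5) can be sketched as follows. The sketch is illustrative only and rests on several assumptions: the heterogeneity index is supplied precomputed in the hypothetical dictionary `h`, a single variable is used with absolute difference as the similarity measure, each unit's neighbor list excludes the unit itself, and ties between equally similar neighbors are broken by list order.

```python
def topological_stage(values, neighbors, h):
    """Run steps 1-5 once and return the resulting regions (sketch).

    values    -- dict: unit -> value of the variable
    neighbors -- dict: unit -> list of neighboring units (unit excluded)
    h         -- dict: unit -> precomputed heterogeneity index
    """
    region_of = {}   # unit -> label of the region containing it
    regions = {}     # label -> list of member units
    # Step 1, repeated: the index is fixed within a stage, so visiting
    # units in decreasing order of h and skipping grouped ones is
    # equivalent to reselecting the highest remaining elementary unit.
    for ai in sorted(values, key=lambda u: -h[u]):
        if ai in region_of:
            continue                      # no longer elementary
        # Step 2: most similar neighbor of ai.
        aj = min(neighbors[ai], key=lambda u: abs(values[ai] - values[u]))
        if aj in region_of:
            label = region_of[aj]         # Step 4: join aj's region
        else:
            label = len(regions)          # Step 3: open a new region
            regions[label] = [aj]
            region_of[aj] = label
        regions[label].append(ai)
        region_of[ai] = label
    return sorted(sorted(r) for r in regions.values())
```

Steps 6 to 8 would then relabel the returned regions as elementary units, recompute their values, index and neighborhoods, and call the function again.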

General Characteristics of the Algorithm

This type of algorithm can be used when the units of interest


are either areas such as census tracts, counties and provinces
of a country or points such as airports in a transportation
network.

The selection of the similarity measure depends on

the problem at hand, but it has to be consistent with the


definition of the heterogeneity index.

That is, if a

neighborhood N "tends" to be in the interior of a region, it


should also be homogeneous with respect to the similarity
measure used.

The heterogeneity index can be redefined to consider this restriction as follows: Assume S to be the function of similarity between any two elements. The heterogeneity index associated to the neighborhood of unit "a" is of the form:

where s(xi, xa) is a measure of similarity between the value associated to the ith neighbor and unit "a" and k is the number of neighbors.

This algorithm was designed to assure that the resulting


regions satisfied a contiguity constraint.

However, other

constraints such as that of compactness could be introduced


without altering the basic structure of the algorithm.

The

algorithm is hierarchical, but it differs basically from


others in its clustering criterion.

In this case there are

two clustering criteria: a spatial and a non-spatial one.

The

heterogeneity index determines the order in which the grouping


is going to take place, while the similarity measure

determines which units are to be grouped.

Any of the several criteria used in hierarchical algorithms as described in section 4.1.6 could be adapted.

The number of resulting regions obtained by this type of algorithm is not the same as in a typical hierarchical one. The number of regions in each stage is determined by the data. While in typical cases the analyst can choose any number desired between 1 and n, in this case it can only be selected from the results. This constraint can actually be an advantage, if the analyst has no way of anticipating the number of regions there may be.

4.3.2 The Regions as Graphs

The structure of the units in a clustering problem can be viewed as a graph (Anderberg, 1973, p.150). The nodes of the graph are the units themselves, and the lengths of the edges are given by the similarity between the units. The graph is complete since all its nodes are adjacent. The single linkage method finds the minimal spanning tree; that is, the shortest tree with (n-1) edges that connects all the nodes (Anderberg, 1973, p.150).

In a regionalization problem the structure of the data units can also be viewed as a graph. Again, the nodes of the graph represent the units. There is an edge between two nodes if the units are neighbors, and the length of the edges is given by the similarity measure. This graph is in fact a subgraph of the complete graph G' generated in a classification problem without a contiguity constraint.

When a single linkage criterion is used in the proposed algorithm, the resulting regions obtained in a first stage generate a set of graphs where the nodes represent the elements that belong to each region and the edges are defined through the clustering criterion. Each of them is a subgraph of G'. Moreover, each one of these subgraphs is a minimal spanning tree of the subgraph of G' formed by the nodes that belong to the regions together with their neighboring relations.

If the algorithm is repeated until all the units are grouped in a single region, the resulting graph will also be a minimal spanning tree of G'.

Both the topological algorithm and the original single linkage method will generate a minimal spanning tree. The basic difference is that at any given stage the order in which the units are grouped is not necessarily the same. A hypothetical case that exemplifies this is presented in the following paragraphs.

A Hypothetical Example

To illustrate the issues discussed in this section concerning the concept of minimal spanning tree, a brief example is presented. The purpose of the exercise is to regionalize the 18 areal units shown in figure 4.2. The areas were grouped according to two different algorithms: a single linkage method with a contiguity constraint and the topological one presented in section 4.3.1. In both cases two areas are said to be contiguous if they have at least one segment in common. The similarity between two areas is given by the absolute difference. That is:

$$d_{ij} = |X_i - X_j|$$

where Xi is the value associated to the ith area.
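For readers who want to reproduce a minimal spanning tree on a contiguity graph with this distance, a short sketch follows. It uses Kruskal's algorithm, which the text does not name and which is chosen here only for brevity; the function name and the toy data in the usage note are hypothetical and are not the 18 units of figure 4.2.

```python
def minimal_spanning_tree(values, edges):
    """Kruskal's algorithm on a contiguity graph (illustrative sketch).

    values -- dict: unit -> value X_i
    edges  -- list of (u, v) pairs of contiguous units
    Edge length is the absolute difference d_ij = |X_i - X_j|.
    """
    parent = {u: u for u in values}

    def find(u):
        # Follow parent links to the representative, halving the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for u, v in sorted(edges, key=lambda e: abs(values[e[0]] - values[e[1]])):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two separate components
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

For four units with values 1, 2, 5, 6 and contiguity edges (1,2), (2,3), (3,4), (1,4), (1,3), the tree keeps the three edges (1,2), (3,4) and (2,3), with total length 5. When several edges have the same length the tree is not unique, which is one reason two procedures can produce different minimal spanning trees.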

Figure 4.3 shows the resulting regions obtained after the first stage of the topological algorithm. Each one of the subgraphs (one for each region) shown in figure 4.4 is a minimal spanning tree. In the second stage of the procedure all the areal units are grouped into one region. The minimal spanning tree generated is shown in figure 4.5.

The dendrogram generated by the usual linkage method is shown in figure 4.6. Since the procedure is hierarchical, in the final stage all the units are grouped into one region. The minimal spanning tree generated by this procedure is shown in figure 4.7.

Figure 4.2. Original Data
Figure 4.3. Resulting Regions (First Stage)
Figure 4.4. Subgraphs and Minimal Spanning Trees
Figure 4.5. Minimal Spanning Tree (Second Stage)
Figure 4.6. Dendrogram
Figure 4.7. Minimal Spanning Tree. Single Linkage Method

As can be appreciated by comparing graph G' (figure 4.8) to the graphs formed after the first stage of the topological algorithm (figure 4.4), each one of the latter is a subgraph of G'. Analogously, the minimal spanning tree (figure 4.5) is also a subgraph of G'. Finally, it should be noted that the minimal spanning trees generated by the two algorithms (topological and single linkage) are not necessarily the same.

4.3.3 Heterogeneous Regions

Like homogeneity, heterogeneity may also be used as a constraint in the definition of regions. In some instances researchers seek to identify groups of elements that are heterogeneous and spatially clustered. In these cases measures of dissimilarity and the heterogeneity index can be used in the design of algorithms, just as similarity and the homogeneity index were used in the cases mentioned previously.

4.3.4 Fuzzy Regions

In section 4.3.1 a hierarchical algorithm incorporating the


concept of heterogeneity index was proposed.

The index was

used as an indicator of the order in which the units were to


be merged.

There are other ways in which the index can be

used in the design of algorithms.

In this section an

alternative hierarchical algorithm which uses the notion of


fuzzy sets is proposed.

The heterogeneity index associated with the neighborhood of


unit "a" is interpreted as the degree of membership of the
unit to a hypothetical region.

When the decision to group two

units is made, it becomes natural to ask what the degree of


membership of the resulting group is to the hypothetical
region.

Intuitively, the two units that have the higher

degree of membership to a region are the ones that should be


clustered first.

The concept of fuzzy set makes it possible

to assign a value to the degree of membership of the set of


two units to a region.

Given two contiguous units ai and aj, their corresponding neighborhoods N(ai) and N(aj) and associated heterogeneity indices Hai and Haj, the degree of membership of the set {ai, aj} to a hypothetical region R can be defined as follows:

$$H(a_i, a_j) = \min \{ H_{a_i}, H_{a_j} \} \qquad (4.18)$$

The above definition can be extended to the case where the units are entities as defined in the preceding algorithm, since both the heterogeneity index and the concept of neighborhood have been defined for regions (see Def. 6, section 4.3.1).

The terms and assumptions under which the proposed algorithm is described are exactly the same as the ones presented for the preceding algorithm except for an additional matrix M' defined as follows:

$$m_{ij} = H(a_i, a_j)$$

where ai and aj are either regions or elementary units as defined in section 4.2.1.

Step 1. Find the two contiguous entities ai and aj with the maximum value of H. That is, find the maximum value in M':

$$H(a_i, a_j) = \max \{ H(a_m, a_n) : \text{for all pairs of contiguous units} \}$$

Step 2. Group entities ai and aj and name the result ai (i<j).
Step 3. Update the ith row and column of matrix M', and delete the jth row and column of matrix M'.
Step 4. If there are elementary units left in M', then start again with step 1. Otherwise the remaining elements of M' represent the resulting regions.
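The selection rule in step 1 combines a minimum (the membership of a pair) with a maximum (the choice among pairs). A minimal Python sketch is given below; the function names and the dictionary `h` of precomputed indices are hypothetical, and updating the index of a merged region (step 3) is left out because it depends on how the region index is calculated for the problem at hand.

```python
def pair_membership(h, ai, aj):
    """Degree of membership of the set {ai, aj}: min{H_ai, H_aj}."""
    return min(h[ai], h[aj])

def best_pair(h, contiguous_pairs):
    """Step 1: the contiguous pair with the maximum joint membership."""
    return max(contiguous_pairs, key=lambda p: pair_membership(h, p[0], p[1]))
```

For example, with indices a: 0.9, b: 0.4, c: 0.8 and contiguous pairs (a, b), (b, c), (a, c), the joint memberships are 0.4, 0.4 and 0.8, so the pair (a, c) is grouped first.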

This procedure, like the preceding algorithm, can be repeated until all the units are clustered in one region.

Given the resulting regions R1, ..., Rk, the heterogeneity index associated to an element ai ∈ Ri can be interpreted in fuzzy set terms as the degree of membership of "ai" to the interior of Ri. Its complement (1 - Hai) can be interpreted as the degree of membership of "ai" to the border of Ri. Therefore, it can be said that the resulting groups are fuzzy regions.

As in many other areas of knowledge, in regionalization


studies, regions are defined so that for every element of the
universe of study it is clearly distinguishable whether or not
it belongs to the region.

There are however, certain

geographical problems that can benefit from a "fuzzy"


definition of the degree of membership of an element to a
region.

For example, consider an ecological study of an urban

area such as Mexico City.

Although there are no previous

studies of this type for this particular area, an obvious


characteristic of the city is its lack of clear-cut
differences in residential areas as well as in land use.

That is, it is common to find "border areas" between urban neighborhoods where middle and low income families or even high and low income families dwell on contiguous pieces of land. Similarly, it is common to find areas with various simultaneous uses. Such is the case of zones that are residential, providers of public services, educational, medical and industrial.

In a study area with these characteristics a method that incorporates the definition of fuzzy regions seems to be the most appropriate choice.

There are precedents for the application of the concept of fuzzy set in the design of classification algorithms (Dunn, 1974). In the cases Dunn mentions, the result is a set of regions where each of the elementary units is assigned a degree of membership to each region. On the other hand, the proposed algorithm differs from previous ones, since the degree of membership is assigned only to the interior and the border of a given region.

4.3.5 The Heterogeneity Surface

In the design of the previously discussed classification


algorithms it was assumed that the units under study were
clustered in homogeneous or heterogeneous groups.

There are,

however, some instances where researchers do not know the


number of regions or have no previous information on the
spatial patterns of the data.

In these cases they can use the

heterogeneity index to increase their knowledge.

To illustrate the manner in which the heterogeneity index can be applied in these situations, a heterogeneity "surface" is defined as follows:

where p is a point inside the areal unit "a" and Ha is its associated index. F is a step function; therefore its graph is not a continuous surface. However, for illustrative purposes it can be said that a "basin" in the graph is a set of contiguous areal units that have a value of F close to zero, and "ridges" are areal units with values close to one.

This graph can provide some significant information.

For

example, basins indicate the presence of the interior of a


region, and ridges indicate boundary points.

If it is assumed

that the elements are clustered into regions, then the graph
must be composed of basins surrounded by ridges.

In such

cases F can be used to estimate the number of resulting


regions.
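One possible way to turn this reading of the surface into a rough count is to treat every maximal set of contiguous low-valued units as one basin. The sketch below does this with a simple graph traversal; the cut-off `threshold` that decides what counts as "close to zero" is an assumption of the example and is not specified in the text.

```python
def count_basins(f, neighbors, threshold):
    """Count connected groups of contiguous units with F <= threshold.

    f         -- dict: unit -> value of F on that unit
    neighbors -- dict: unit -> list of contiguous units
    threshold -- hypothetical cut-off for "close to zero"
    """
    low = {u for u in f if f[u] <= threshold}
    seen = set()
    basins = 0
    for u in low:
        if u in seen:
            continue
        basins += 1            # a new basin starts at u
        seen.add(u)
        stack = [u]
        while stack:           # flood-fill the rest of the basin
            cur = stack.pop()
            for v in neighbors[cur]:
                if v in low and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return basins
```

The count is sensitive to the chosen cut-off, so in practice it would be reasonable to try several values and see whether the estimate is stable.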

F may also be used in cases where the area under study is composed of both homogeneous and heterogeneous regions. The "highlands" in F indicate the presence of heterogeneous regions in the same manner as "lowlands" indicate the existence of homogeneous ones. The graph of F can therefore be used in the identification of such regions.

In other cases the lack of pattern in the heterogeneity


surface might indicate an absence of regions in the study
area.

Finally, the possibility of using the heterogeneity surface in


multivariate cases is worth mentioning.

When the objective of

regionalization is to obtain homogeneous regions with respect


to more than one characteristic, the first question that
arises is whether it is possible to obtain homogeneous regions
with respect to all the variables simultaneously.

For each one of the variables involved, let Fi be the function


associated with the ith variable.

A quick look at a pair of

graphs can aid the analyst in deciding on the compatibility of


the variables.

If the two graphs show that basins and ridges

coincide, then it may be concluded that the variables are in


fact compatible.

However, if for one variable there "tend" to be basins where there are ridges in the other, grouping the areas using both variables simultaneously is intuitively ruled out.

4.4 A Comparative Example


In section 4.3.1 a modified agglomerative single linkage algorithm was presented. As mentioned before, the main difference between the usual single linkage method and the topological algorithm lies in the order in which the units are aggregated. Besides this obvious difference there are others that derive from the use of a neighborhood approach. To exemplify these ideas as well as to test the performance of the topological algorithm, the method was applied to a hypothetical case and the results were compared with those of a contiguity-constrained single linkage method.

4.4.1 A Hypothetical Case

The hypothetical case is defined on a regular area subdivided into 240 units as shown in figure 4.9. The classification of the units is based on three variables and the values of each variable were assigned under the assumption that the area is formed by homogeneous regions, considering each of them separately and simultaneously. That is, if each one of the variables were to be represented on a map, homogeneous regions would be present. Moreover, if the three above-mentioned maps were combined, the resulting map would also be formed by homogeneous regions. The values were assigned so that variable "A" was used as a basis. Therefore in the first stage the values of variable "A" were defined as shown in figure 4.10. The values of variable "B" were defined so that the regions formed under it were sub-regions of variable "A" (see figure 4.10). Finally, the values of variable "C" were assigned in such a way that its "border zones" did not necessarily coincide with those of variables "A" and "B" (see figure 4.10).

Figure 4.9. Study Area

One of the central issues in the application of a topological algorithm is the definition of the neighborhood relationship. The advantage of using a regular structure is the ease of implementation of the algorithm, since the neighborhood of each of the areal units is explicitly defined. In this case it was decided to define the neighborhood of each of the units as the set of bordering units together with the unit itself. The areas that are connected to a unit solely by a point were excluded to avoid regions linked through points.
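On a regular grid this neighborhood definition is easy to build explicitly. The sketch below is illustrative; the grid dimensions are passed as parameters because the layout of the 240 units is given only in the figure, and the function name is hypothetical.

```python
def grid_neighborhoods(rows, cols):
    """Neighborhood of each cell: the cell itself plus the cells that
    share an edge with it. Cells touching only at a corner are excluded."""
    nbh = {}
    for r in range(rows):
        for c in range(cols):
            cell = [(r, c)]                       # the unit itself
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append((rr, cc))         # edge-sharing neighbor
            nbh[(r, c)] = cell
    return nbh
```

An interior cell therefore has a neighborhood of five units, an edge cell four, and a corner cell three.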

Two regionalization algorithms were applied to this hypothetical problem: the topological algorithm as described in section 4.3.1 and a contiguity-constrained single linkage method that follows the general format described in section 4.2.2. Both algorithms are hierarchical and use the same grouping criterion but differ in other aspects. The topological algorithm is based on the neighborhood approach, which is reflected in the algorithm through the inclusion of the heterogeneity index concept, while the other algorithm follows the traditional approach where the area is assumed to be formed by units rather than by geo-subspaces or neighborhoods.

The Topological Algorithm

The first step in the application of the algorithm to the multivariate hypothetical case is the calculation of the standardized heterogeneity index using equations 3.1 and 3.2. In this case, the heterogeneity index can be interpreted as the degree of membership of a unit to the interior of a region. The closer the value to one, the higher the degree of membership to the region. The multivariate heterogeneity index was calculated according to equation 3.7 as shown in figure 4.11. Again, the closer the value of the index to three, the higher the degree of membership of a unit to a region formed according to the three variables.

The heterogeneity index plays a fundamental role at this stage of the analysis in the definition of the "grouping pattern". Assume, for example, that one of the variables of interest is the column-wise position of each of the units. That is, all the units that are positioned in the ith column are assigned a value of "i". The value of the heterogeneity index for an interior unit is a constant. That is, the degree of membership of all these units is exactly the same, so that there is no grouping pattern. Since the algorithm is designed to be applied to problems where a grouping pattern exists, it is not advisable to include in the classification a variable such as the one previously described.

An additional criterion was added to the algorithm to solve cases where two or more neighbors satisfy the grouping criterion. If two or more neighbors are at the same distance from a unit, then the one selected to be grouped is the one that satisfies the following conditions: a) it already belongs to a region and b) it is the first clockwise neighbor.

First Stage.-

With the assumptions described above, the algorithm was applied, obtaining as a result the regions shown in figure 4.12. It should be remembered that the number of resulting regions in this type of algorithm is part of the results of the procedure. That is, contrary to a usual hierarchical method where the analyst has to decide on the number of resulting regions, in this method the number of regions is determined through the heterogeneity index. In this application, the number of resulting regions from the first stage is 69.

Since the heterogeneity index can be interpreted as the degree of membership of a unit to a region, the result of the classification procedure is not restricted to the definition of each of the regions but allows the analyst to gain information on the "interiority" of a unit. In this sense, the resulting regions are fuzzy. For example, both units 19 and 48 belong to the same region. However, according to their heterogeneity indices, unit 19 has a higher degree of membership to the region than unit 48. To ease the reading of this measure, the multivariate heterogeneity index was standardized as shown in figure 4.13. In it, the degree of membership of each of the units to the defined regions is clearly appreciated. The closer the value to one, the higher the degree of membership, i.e. the more interior to the region is the unit. Analogously, the closer the value of the unit to zero, the lower the degree of membership, i.e. the unit is characterized more as a "border" element. It should be noted that "border" units are not necessarily always in the actual border of a region.

Figure 4.11. Multivariate Heterogeneity Index

Second Stage.-

It is possible to iterate this type of algorithm in order to obtain a smaller number of regions, as described in section 4.3.1. The 69 resulting regions from the first stage were renamed, values for the three variables were assigned to the new elementary units and a neighborhood relationship was established. This task can be accomplished in several ways.

Figure 4.12. Topological Algorithm (First Stage). Variables A, B and C; standardized Heterogeneity Index

In this case, the chosen procedure was to assign to each new
unit the mean value of the elementary units that belonged to
it. This procedure was repeated for the three variables as
shown in figure 4.12.
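The averaging step can be sketched as follows. This is only an illustration: the function name, the dictionary layout and the four sample units are assumptions made for the example and are not taken from the study data.

```python
from collections import defaultdict

def region_means(unit_values, unit_region):
    """Average each variable over the elementary units of every region."""
    sums = {}
    counts = defaultdict(int)
    for unit, values in unit_values.items():
        region = unit_region[unit]
        if region not in sums:
            sums[region] = [0.0] * len(values)
        for k, v in enumerate(values):
            sums[region][k] += v
        counts[region] += 1
    return {r: tuple(s / counts[r] for s in sums[r]) for r in sums}

# Hypothetical values of variables A, B and C for four elementary units.
unit_values = {1: (98.0, 98.0, 20.0), 2: (99.0, 99.0, 21.0),
               3: (33.0, 1.0, 28.0), 4: (33.0, 2.0, 31.0)}
# Hypothetical first-stage result: units 1, 2 form R1 and units 3, 4 form R2.
unit_region = {1: "R1", 2: "R1", 3: "R2", 4: "R2"}
means = region_means(unit_values, unit_region)
```

Each region then enters the second stage as a single unit carrying one mean value per variable.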

The topological algorithm was applied to the 69 new units and
as a result 21 regions were formed as shown in figure 4.14.
Again, in this case, the heterogeneity index associated to
each elementary unit can be interpreted as the degree of
membership to the region. For example, the units formed with
original units {2, 3, 4, 5, 18} and {17, 19, 20, 31, 32, 33,
34, 35, 48, 49, 58} belong to the same region; however, it can
be stated that the first unit is more an interior element of
the region than the second one, as can be appreciated from the
heterogeneity index values shown in figure 4.12.

There are other manners in which the values of the variables
and the neighborhood relationship can be defined. For
example, instead of using the mean value, the minimum, the
maximum or a weighted mean could have been considered and even
the original values of the elementary units could have been
preserved. In this last case, the neighbors of a region could
be defined as the set of original elementary units that have a
border in common with it, and the distance between the regions
could be calculated considering the elementary units in the
border of each region.
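The last alternative, in which the neighbors of a region are the original elementary units that share a border with it, could be sketched as below. The adjacency lists and region labels are hypothetical and only illustrate the definition.

```python
def neighboring_units(region, unit_adjacency, unit_region):
    """Elementary units outside `region` that share a border with it."""
    result = set()
    for unit, adjacent in unit_adjacency.items():
        if unit_region[unit] == region:
            result.update(v for v in adjacent if unit_region[v] != region)
    return result

# Hypothetical chain of four units: 1 - 2 - 3 - 4.
unit_adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
unit_region = {1: "R1", 2: "R1", 3: "R2", 4: "R2"}
border_of_r1 = neighboring_units("R1", unit_adjacency, unit_region)
```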

Figure 4.14

Topological Algorithm (Second Stage). 21 Resulting Regions

The Single Linkage Method

To illustrate the difference between the results obtained by
using the topological algorithm and those of other existing
methods, a contiguity-constrained single linkage procedure was
applied to the same problem. Since the main purpose was to
evaluate the differences due to the inclusion of the
heterogeneity index, the similarity measure used was also the
euclidean distance and the contiguity relations and grouping
criterion were preserved.

The single linkage method is a hierarchical procedure where,
given "n" elementary units, there are "n-i" regions in the ith
step. To have results comparable to those obtained via the
topological algorithm, the procedure was stopped at the 171st
and 219th steps. The 69 and 21 resulting regions are shown in
figures 4.15 and 4.16 respectively.
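A minimal sketch of a contiguity-constrained single linkage procedure is given below. It assumes that the linkage is evaluated only over pairs of adjacent units and that merging stops when a requested number of regions is reached; the function name and the five-unit example are hypothetical. Since every merge removes exactly one region, n units leave n-i regions after i steps, so the step numbers and region counts quoted above imply n = 240 (240 - 171 = 69 and 240 - 219 = 21).

```python
import math

def constrained_single_linkage(values, adjacency, n_regions):
    """Merge contiguous regions joined by the closest pair of adjacent units."""
    region = {u: u for u in values}        # every unit starts as its own region
    # Candidate links: adjacent unit pairs sorted by Euclidean distance.
    links = sorted(
        (math.dist(values[u], values[v]), u, v)
        for u in adjacency
        for v in adjacency[u]
        if u < v
    )
    count = len(values)
    for _, u, v in links:
        if count <= n_regions:
            break
        ru, rv = region[u], region[v]
        if ru != rv:                        # merge only distinct regions
            for w in region:
                if region[w] == rv:
                    region[w] = ru
            count -= 1                      # one region fewer per step
    return region

# Hypothetical one-variable example on a chain of five units.
values = {1: (0.0,), 2: (1.0,), 3: (5.0,), 4: (6.0,), 5: (20.0,)}
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
result = constrained_single_linkage(values, adjacency, 3)
```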

A Comparison

Comparison of regionalization algorithms can be undertaken
focusing on different aspects and at various levels. Murtagh
(1985), for example, compares contiguity-constrained
algorithms under two main aspects: their computational
performance and the differences derived from the inclusion of
a contiguity constraint. The purpose of the comparison between
the topological algorithm and the single linkage one is to
look into the differences derived from applying a neighborhood
approach to the design of regionalization procedures.

Figure 4.16

Single Linkage Method. 21 Resulting Regions

Both algorithms are hierarchical and follow a single linkage
grouping criterion. However, the order in which the units are
grouped is not necessarily the same, and therefore the spatial
structures portrayed in the resulting regions are not
necessarily equal.

This fact points to a fundamental issue in the formalization
of regionalization procedures. It was mentioned before that
one of the advantages of using a mathematical framework in
classification problems is that once an algorithm is
established, the solution becomes reproducible. However, it
should also be considered that the design of the algorithm
depends on the analyst's knowledge and understanding of the
problem at hand. Therefore, the resulting regionalization
depends on the assumptions made in the design of the algorithm
and the different regionalizations obtained from different
procedures can be explained under these considerations.

Comparison of the regionalizations obtained by both algorithms
shows that different aspects of the spatial structure of the
data emerge from the two procedures. While the topological
algorithm regions have a tendency to be small and compact and
there are no isolated elementary units, the single linkage
method tends to produce larger regions as well as single-unit
regions. On the other hand, while in the single linkage
method the regions are well-defined, the resulting groups from
the topological algorithm are "fuzzy regions".

Finally, it should be remembered that since both algorithms
are hierarchical it is possible to continue the process until
all the units are grouped into one single region. However, it
was considered, for this particular example, that the
comparison of results at the two presented stages satisfied
the purpose of the exercise and therefore no further stages
were implemented.

4.4.2 A Topological Ward Algorithm

Two of the most commonly used algorithms in regionalization
problems are the contiguity-constrained Single Linkage and
Ward Methods. The Ward Method is a hierarchical procedure
where the grouping criterion, as described in section 4.1.3,
is defined so that the increase in the within group variance
as defined in equation 4.6 is minimized. Just as the
neighborhood approach was applied to a single linkage method,
it is possible to include the heterogeneity index concept in
the design of a Ward-type algorithm.

The algorithm is similar to the one proposed in section 4.3.1
except that in this case step 2 has to be modified as follows:

Step 2. Search for the neighbor of ai such that
the increment of the error sum of squares,
as defined in equation 4.7, is the smallest.
Call it aj.

Again, the main difference between the usual
contiguity-constrained Ward method and the topological one is
the order in which the units are grouped.
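The modified step 2 can be sketched as follows. Equation 4.7 is not reproduced in this section, so the sketch assumes the usual definition of the error sum of squares (squared deviations of the members of a group from the group centroid) and computes the increment directly as ESS(a ∪ b) - ESS(a) - ESS(b). The function names and sample points are hypothetical.

```python
def ess(points):
    """Error sum of squares of a group of points around its centroid."""
    n = len(points)
    dims = range(len(points[0]))
    centroid = [sum(p[k] for p in points) / n for k in dims]
    return sum((p[k] - centroid[k]) ** 2 for p in points for k in dims)

def ess_increment(group_a, group_b):
    """Increase in ESS caused by merging two groups."""
    return ess(group_a + group_b) - ess(group_a) - ess(group_b)

def best_neighbor(group, neighbor_groups):
    """Name of the neighboring group whose merge adds the least ESS."""
    return min(neighbor_groups,
               key=lambda name: ess_increment(group, neighbor_groups[name]))

# Hypothetical unit ai with two neighbors described by two variables.
a_i = [(0.0, 0.0)]
neighbors = {"aj": [(1.0, 0.0)], "ak": [(3.0, 4.0)]}
chosen = best_neighbor(a_i, neighbors)
```

Only the neighbors of ai are candidates, which is what keeps the procedure contiguity-constrained.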

4.4.3 Conclusions

There are many classification algorithms that can be applied
for regionalization purposes and the selection of the
appropriate procedure depends on the knowledge and
understanding the analyst has of the problem at hand. This
knowledge is reflected in every decision made regarding the
different elements that are involved in the design of the
algorithm. For example, in the case of the two algorithms
that were compared, the single linkage and the topological,
the introduction of the heterogeneity index had a significant
effect on the results. The conclusions that can be drawn from
the exercise are: the algorithms were designed to discover
different aspects of the spatial structure and the single
linkage method is a "sensitive model" to changes in the
grouping order.

Besides the abovementioned, it should be added that the
different manners in which the two algorithms group the units
clearly point to the importance of an adequate selection of
parameters. Although the most evident border lines are
identified by both procedures, the final geometric patterns
are clearly different. If the analyst is interested in
avoiding single-area units and regions dissimilar in size,
then a topological algorithm would be an appropriate approach
in a problem similar to the one presented here. Moreover, if
the analyst needs to measure the degree of interiority within
a region, a topological algorithm would have to be applied
since it is the only existing procedure that provides this
type of information.

To better understand the issues involved in the use of a
classification scheme with regionalization purposes, it should
be remembered that the design of a regionalization algorithm
can be viewed as a modeling process. In the case of the
neighborhood approach the point of departure is the intuitive
notion that certain aspects of the geographical landscape can
be adequately represented through the topological concept of
neighborhood. These notions are formalized through the
heterogeneity indices and applied in the design of algorithms.
In this case the form of the algorithms is intimately related
with that of a more general model, where the main elements of
study of the spatial structure are geo-subspaces. It can
therefore be stated that the application of a topological
algorithm is of interest where Galton's component is an
important factor in the analysis of the problem.

Chapter 5.

AN APPLICATION TO EDUCATIONAL PLANNING

5.1 Introduction

In general, the heterogeneity index may be interpreted as a
measure of the local variation of the geographical landscape.
In particular, the study of the heterogeneity of geo-subspaces
may be used as an aid to solve planning problems. In this
chapter an application of a neighborhood model in an
educational planning environment is presented.

Background

This application is part of a major project of the Mexican


government to provide planning agencies with cartographic
products and technical support in all aspects related to the
geographic information required for their activities.

The 25-million-student education system was selected for two
reasons. First, it is one of the current Mexican
administration's highest priorities. Second, geographic
information has not yet been systematically applied to
educational planning in Mexico.

One of the main concerns of Mexican educational planners is
the location of present and future school services. In the
past, the decisions made by the government regarding the
location of schools have not included spatial criteria.
However, a geographic information system for educational
planning purposes is currently being developed by the Ministry
of Education. It is expected that 1986 will be the first year
when spatial criteria are incorporated into the decision
making process.

Some methods to solve school location problems have been
developed by the World Bank and the International Institute
for Educational Planning. However, spatial models and methods
such as those presented in this thesis are believed to improve
the solutions provided by previously used methods.

5.2 School Location Planning

Many different models and methods have been developed for
educational planning purposes. Most of them have been applied
to regional or national planning, but little emphasis has been
given to spatial aspects. For example, models and methods for
projecting school enrollments and manpower requirements are
often presented at a national level (Davis, 1980), although
maps and some spatial criteria have been included for local
planning purposes.

School location planning and area planning are the names that
several authors (Davis and Schefelbein, 1980; Gould, 1978)
have given to the set of administrative policies, models and
methods that are used "to plan the distribution, size and
spacing of schools" (Gould, 1978, p.2).

According to Davis and Schefelbein (1980), the basic purposes
of educational planning for areas are:

- To assess the outreach, or coverage, and
distribution of educational services to
population in areas within a nation state.
- To compare the coverage between and among the
areas, usually on the basis of the percentages
of the relevant population receiving service.
- To compare the coverage of the area with
national norms, standards or plan targets.
- To inventory facilities and resources
allocated to programs in the areas.
- To plan the provision of educational services
so as to expand coverage, enhance equity in the
coverage, and to improve the efficiency and
effectiveness of educational services in the
areas.

5.2.1 Models and Methods

Maps are the spatial models that are most commonly used in
area planning. Usually, various indicators are mapped to
study their spatial distribution.

Which indicators or variables are represented on a map depends
on the depth of the analysis. Indicators and variables can be
classified into three major groups: 1) basic indicators such
as population, enrollments and school services; 2) efficiency
and effectiveness indicators such as the enrollment ratios and
the percentage of enrollees who graduate; and 3) complementary
variables such as topography, highways and trails, and
potential usage of soil.

The first group of inventory indicators aids the planner in
gaining knowledge of the spatial location of educational
services. The second group allows the planner to make
comparisons among areas. The third group of complementary
variables provides information necessary to understand the
spatial behavior of the previous two. Another group of
indicators and spatial variables provides guidelines for the
location of new school services. Examples are threshold
population density and range measurements. Threshold
population density represents the minimum total population
necessary for establishing a school, and range is the maximum
distance children are expected to travel to school (Gould,
1978).

5.2.2 Administrative Policies

In addition to the specific procedures developed for school
location planning, strong emphasis has been given to the
administrative policies that would lead to successful
implementation of a plan.

Gould (1978) gives a detailed description of the role of both
central authorities and local officials according to the World
Bank's guidelines for school location.

Central authorities, through their national ministries, are
expected to do the following: provide norms for the sizes and
costs of schools and classrooms; establish construction
standards; ensure that adequate data is compiled for area
diagnosis; administer the allocation of resources among the
various regions; and analyse spatial patterns of the services
schools provide. Local officials are expected to apply the
norms established by the ministries and to provide the data
required by the central authorities.

School location planning has often been applied in Third World
countries. UNESCO and the International Institute for
Educational Planning, for example, have undertaken studies of
school location planning in Costa Rica (Hallak, 1975), Sri
Lanka (Guruge and Ariyadasa, 1976) and Uganda (Gould, 1973).

5.3 Educational Planning in Mexico

5.3.1 Historical Background

In order to understand the relevance of school location
planning in the overall context of Mexican educational
planning, historical perspective is important. The
promulgation of the 1917 Constitution and the Lopez Mateos
Eleven-Year Plan are two events of this century considered by
experts as crucial in the development of Mexican education.

When the 1910 Mexican Revolution ended in 1917, a new
Constitution was promulgated. Article 3 stipulated that
education be compulsory, secular and free for all Mexicans.
More than 40 years later, during the administration of Adolfo
Lopez Mateos (1958-64), an eleven-year plan was developed and
partially carried out.

The main goal of the plan was to completely satisfy the demand
for elementary education throughout the country. To achieve
this there were several major programs. First, there was
massive construction of classrooms. Between 1958 and 1964,
21,000 classrooms were built at the rate of "... one
classroom every two hours..." (Solana et al., 1981). Second,
a National Commission in charge of publishing free textbooks
for all elementary school students was organized. In addition,
important curricular changes were carried out. Finally,
special attention was given to programs for the in-service
training of elementary school teachers (Solana et al., 1981).

Like Lopez Mateos, virtually all post-Revolutionary Mexican
administrations have dedicated considerable human and
financial resources to elementary education. This certainly
does not mean that other levels of education have been
abandoned. However, the main targets have been the lower
levels.

5.3.2 Planning Experiences

Before 1970, an educational planning agency did not exist
within the Mexican government. It was not until the
administration of Luis Echeverria (1970-76) that an
organization for educational planning was formally established
within the Ministry of Education (Secretaría de Educación
Pública).

As would be expected, one of the main goals of the planning
agency was to assure every school-age child access to
elementary education.

For this purpose various quantitative analyses were
undertaken, and in some cases sophisticated mathematical
models were used to predict, among other things, the flow of
students through the lower levels of education.

In order to use quantitative techniques it was necessary to
develop an information system. This system would contain
reliable, up-to-date data on the number of students and
teachers at the various educational levels as well as data
describing the schools' physical resources.

During the following administration of President José López
Portillo (1976-82), the general tendencies in educational
planning remained the same. It has only been in the last
three years that planners have started looking more closely
into the quality of education. This is probably a natural
consequence of the considerable progress the country has made
in quantitative terms (see Figure 5.1).

............................................................

The Mexican educational system comprises various levels.
Among them are elementary, secondary and preparatory levels.
Children are expected to enter elementary school at the age of
6 or 7 and remain there for six years. The following two
levels are secondary and preparatory with a duration of three
years each. There are different types of secondary and
preparatory schools (technical, general, etc).

Figure 5.1

Elementary level: number of schools (in hundreds) and number
of students (in millions), by year

As Figure 5.1 shows, the number of schools has increased
throughout the century. The early demand for elementary
schools was so great that establishing one almost anywhere was
beneficial to the local community and to the country. Now,
however, the precise location of a school has become
particularly important. Today's density of schools obliges
the planner to make a detailed and accurate study of the
geographic distribution of resources before making decisions.

Another factor which has become important is the need to
ensure the coordinated growth of the educational levels.
There is no point in building a secondary school where there
is an insufficient flow of students from the elementary
schools or if migration affects the school-age population
significantly. Geographic information is essential to permit
rational planning of these educational issues.

The use of geographic information in educational planning in


Mexico has not been systematic, but there is increasing
awareness of the need to support educational planning with
spatial analysis methods.

5.4 A Case Study of the State of San Luis Potosi

As mentioned in the previous section, the Mexican government's
investment in education has been mainly directed to the
elementary level. However, the government has dedicated
considerable attention to the development of the secondary
system during the last six years, especially in the state of
San Luis Potosi. As can be observed in figures 5.2 and 5.3,
the distribution of both elementary and secondary schools
throughout the area is reasonably uniform; that is, the
growth of both systems has apparently been coordinated. This
pattern is not maintained at the next level of education. The
number of preparatory schools capable of receiving the flow of
secondary graduates is evidently insufficient, as can be seen
in figure 5.4. The Central Planning Office has become aware
of this problem and has decided to alleviate it through the
allocation of additional resources.

In this section a neighborhood model is proposed as an aid in
the selection of sites for the location of preparatory schools
and in school location planning in general. Although the
technique is presented within a specific context, an analysis
of the different alternatives for the allocation of additional
resources (such as the establishment of a new school) is not
undertaken and no particular solutions are proposed. However,
in the presentation of the model, some possible
interpretations are indicated to point out potential uses of
this tool in an educational planning environment.

Figure 5.2

Distribution of elementary schools
in the state of San Luis Potosi.

Figure 5.3

Distribution of secondary schools
in the state of San Luis Potosi.

Figure 5.4

Distribution of preparatory schools
in the state of San Luis Potosi.

5.4.1 The Data

Originally the state of San Luis Potosi had been selected as
the area of study on the basis of the availability of data.
Studies at the local level require disaggregated data. At a
settlement level, census data has only been processed for a
few areas, among them San Luis Potosi.

In order to test the proposed model in a reasonable period of
time only a portion of the study area was selected. This area
includes eight counties in the northern part of San Luis
Potosi as shown in figure 5.5. This area is interesting
because of its apparently "heterogeneous landscape".

The two main sources of information were the Ministry of
Education and the National Institute of Statistics, Geography
and Informatics. The Ministry has established a nationwide
information system with detailed data on its own human and
material resources as well as on the country's students.
Every elementary, secondary and preparatory school in the
country is registered, and information is stored on the number
of enrollees per group; the total number of groups, teachers
and classrooms; the estimated capacity and location of
schools; the number of students that have graduated, passed to
the following grade, failed or dropped out.

Figure 5.5

State of San Luis Potosi

The National Institute of Statistics, Geography and


Informatics is in charge of the national census and of the
production of diverse cartographic products at a national
level.

5.4.2 The Geo-Space

A set of entities of interest to the present study are the
settlements inside the selected area that have a secondary
and/or a preparatory school. The flow of graduates among
these entities is assumed to depend on their spatial
relationship. In this case the spatial factor considered
decisive was the distance a student has to travel to attend
school ("traveling distance").

Graphs were the mathematical models that were considered
appropriate to represent both the entities and their
relationships. Each node in the graph represents a
settlement, and the links between them are defined through the
spatial relation of traveling distance.

The Traveling Distance Network

In order to determine the traveling distance between any two
settlements of the geo-space, a network was defined through
the existing communication network. Although the feasibility
of traveling from one town to another depends on the physical
characteristics of the terrain, it was assumed that the
highway and trail system could provide enough information to
compensate for these differences.

The network was defined with the aid of topographic maps
(scale 1:250,000) as follows: a link was established between
any two settlements whenever they were connected by any type
of road without passing through a third settlement. The
resulting network is shown in figure 5.6.

According to the definition of the network, each link
represents a road connecting two settlements. Since the type
of road is an important factor to be considered in terms of
the ease of transportation, each road was divided into at most
two representative sections. For example, the road between
two settlements could be composed of a section of highway
together with a section of trail. At most two weights were
attached to each link according to the type of road of its
sections. The five types of roads together with their weights
are: paved (1), unpaved (1.5), trail (2), unpaved road or
trail in a mountainous area (3) and footpath (4). These
weights were determined by a reliable informant familiar with
the area of study, so the values assigned to the roads are, to
some extent, subjective.

The traveling distance between two settlements is calculated
using the network as follows:

Let t_i and t_j be two settlements that are connected through
a link. The distance between them, D_{ij}, is:

\[ D_{ij} = w_{ik}\, d_{ik} + w_{kj}\, d_{kj} \]

where w_{ik}, w_{kj} and d_{ik}, d_{kj} are the two weights
and distances attached to the link between settlements t_i
and t_j.

If t_i and t_j are not linked, the traveling distance is
calculated as the sum of the distances along the shortest path
between t_i and t_j.
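The calculation can be sketched as follows. The road weights are the ones listed in the text; the three settlements, the section lengths and the use of Dijkstra's algorithm for the shortest path are assumptions made for the illustration, since the text does not name a particular shortest-path method.

```python
import heapq

# Road-type weights as listed in the text.
WEIGHTS = {"paved": 1.0, "unpaved": 1.5, "trail": 2.0,
           "mountain": 3.0, "footpath": 4.0}

def link_distance(sections):
    """Weighted length of a link: sum of weight * length over its sections."""
    return sum(weight * length for weight, length in sections)

def traveling_distances(links, source):
    """Shortest weighted distances from `source` (Dijkstra's algorithm)."""
    graph = {}
    for (a, b), sections in links.items():
        d = link_distance(sections)
        graph.setdefault(a, []).append((b, d))
        graph.setdefault(b, []).append((a, d))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue                        # stale queue entry
        for other, d in graph.get(node, []):
            new = dist + d
            if new < best.get(other, float("inf")):
                best[other] = new
                heapq.heappush(queue, (new, other))
    return best

# Hypothetical links; each section is (weight, length in km).
links = {
    ("t1", "t2"): [(WEIGHTS["paved"], 6.0), (WEIGHTS["trail"], 2.0)],
    ("t2", "t3"): [(WEIGHTS["unpaved"], 4.0)],
    ("t1", "t3"): [(WEIGHTS["footpath"], 5.0)],
}
distances = traveling_distances(links, "t1")
```

In this example the direct footpath from t1 to t3 has a weighted length of 20, so the path through t2 (10 + 6 = 16) gives the traveling distance.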

5.4.3 The Geo-Subspaces

From a spatial point of view the flow of secondary school
graduates between settlements is one of the relevant factors
to be considered in the allocation of new schools. The
subsets of settlements that have the possibility of
interacting through the flow of students are therefore part of
the geo-subspaces of interest in the present problem. With
this idea in mind a neighborhood of each of the nodes was
defined through the traveling distance.

For a given maximum traveling distance d_m, the neighborhood
of the node t_i, N(t_i), is the set of nodes whose traveling
distance to t_i is less than or equal to d_m:

\[ N(t_i) = \{\, t_j \;;\; D_{ij} \le d_m \,\} \]

The maximum traveling distance can be fixed at different
values. Consequently, a network can be associated with each
one of them as follows: the nodes of the network are the set
of settlements of the geo-space, and a link exists between any
two of them if they are neighbors.
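A minimal sketch of how such networks could be derived for different thresholds is shown below. The distance table and settlement names are hypothetical, and the sketch assumes that a settlement is not counted as its own neighbor, so that a settlement with an empty neighborhood is isolated.

```python
def neighborhoods(distance, d_max):
    """Settlements within traveling distance d_max of each settlement."""
    return {
        i: {j for j, d in row.items() if j != i and d <= d_max}
        for i, row in distance.items()
    }

def percent_isolated(hoods):
    """Percentage of settlements without any neighbor."""
    isolated = sum(1 for members in hoods.values() if not members)
    return 100.0 * isolated / len(hoods)

# Hypothetical symmetric traveling distances (km) among four settlements.
D = {
    "a": {"b": 8.0, "c": 18.0, "d": 40.0},
    "b": {"a": 8.0, "c": 12.0, "d": 35.0},
    "c": {"a": 18.0, "b": 12.0, "d": 30.0},
    "d": {"a": 40.0, "b": 35.0, "c": 30.0},
}
at_10 = neighborhoods(D, 10.0)
at_15 = neighborhoods(D, 15.0)
```

Raising the threshold can only add links, so the share of isolated settlements never increases with the maximum traveling distance.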

In order to test different maximum traveling distances a small


computer system that allows the user to postulate different
distances and to plot the resulting networks was implemented.
Figures 5.7, 5.8, 5.9 and 5.10 show the results obtained by
testing different values for the maximum traveling distance
for the study area of San Luis Potosi.

Each of the networks shows a different spatial pattern
according to the fixed distance. For example, for the 25 km
threshold, most of the settlements are clustered in two large
networks and only a few of them appear isolated. In contrast,
in the 10 km case most of the settlements are isolated. The
percentages of settlements without neighbors for each of the
distances considered are 60%, 26.42%, 13.57% and 5.71% for
the 10, 15, 20 and 25 km networks respectively. These
observations allow the planner to better understand the
spatial dispersion of the units in the area.

Besides the measures commonly used in area planning (number
of students per classroom, enrollment ratios, etc), some of
the specific characteristics of these networks can be included
as criteria for the location of schools. The percentage of
isolated units and the number of neighbors of each node are
indicators of the degree of communication of each of the
settlements. Table 5.1 shows the values of these quantities
associated with each settlement and for the different
distances. The settlements are identified in the table by
their code, as shown in table 5.3.

Catchment Areas

The catchment area associated to a school is simply the area
served by it. In defining a catchment area, factors such as
transportation facilities and terrain are considered. Since
the interaction relationships are explicitly represented
through the set of links in this case, each of the networks
can be used in the definition of the catchment areas. Figure
5.11 shows the catchment areas of each of the existing
preparatory schools, assuming a maximum traveling distance of
20 km.

The lack of service at the preparatory level was quantified

CODE   10KM   15KM   20KM   25KM

425
0
0
0
1

405
0
0
1
5

303
0
1
2
5

202
1
2
4
6

102
1
1
2
2

606
0
1
3
5

607
0
0
0
2

104
8
8
8
8

426
0
1
2
2

406
0
1
1
2

304
0
1
1
1

608
0
0
1
2

409
0
0
0
0

307
1
1
4
6

206
2
2
3
3

106
0
0
0
0

609
0
0
1
2

108
0
0
0
0

410
0
1
1
2

308
0
0
0
1
411
0
1
3
4

309
0
0
0
0

207 208
0
3
1
5
2
7
5 1 1

107
3
3
3
3

412
1
1
3
4

310
0
1
1
1

209
0
1
1
2

109
1
1
1
2

701
1
1
1
4

702
0
3
5
1

703
0
2
4
0

504
1
6
1
15

414
0
2
2
3

312
0
0
1
4

211
1
2
3
6

111
0
0
0
0

TABLE

505
3
3
0
10

415
0
0
1
1

313
0
1
2
4

212
0
2
5
7

112
1
1
4
5

506
2
5
7
14

416
0
0
2
2

314
0
0
1
2

213
0
1
1
5

113
1
3
3
3

0
0
0
0

0
1
1
1
418
0
2
3
3

0
1
2
4

316

215
0
0
2
' 4

115
0
2
4
6

419
0
0
1
2

1
1
3
7

317

216
1
1
2
4

116
1
2
4
4
218
1
3
5
6

118
0
0
0
2

704 705 706


1
0
1
2
0
2
4
0
3
6
5
1
3

5.1

707
0
0
0
1

708
1
1
2
3

219
2
3
3
6

119
0
3
4
6

420
0
0
2
2

421
0
1
1
1

422
0
0
0
0

318 401 402


0
0
0
0
1
0
1
4
1
2
4
2

217
2
3
4
5

117
0
0
0
1

709
0
0
1
2

710 711 712


0
0
0
1
1
2
2
2
2
2
2
3
4

605
0
0
0
3

513
0
2
S
10

423
0
1
2
3

403
0
1
3
3

301
2
3
7
7

120
0
2
5
5

713 801 802


1
3
1
2
4
3
2
8
6
1
1
1
1

604
0
0
2
2

507 508 509 510 511 512


0
1
1
1
0
2
2
4
3
4
0
4
9
3
7
7
6
2
6
7
9
8
12
3
9

417
0
0
3
3

315

214
1
3
4
4

114
1
1
2
4

522 523 524 523 526 527 528 601 602 603
1
0
0
0
1
1
1
1
0
0
0
2
1
2
2
2
1
1
2
1
0
1
4
5
4
4
3
5
1
1
4
2
2
1
8
4
5
7
2
2
6
4
3
1

503
2
6
7
12

413
0
0
1
1

311
0
1
1
1

210
1
2
2
2

110
0
1
4
5

NUMBER OF NEIGHBORS BY DISTANCE

610
0
1
1
1

520 521
0
2
0
3
2
1
0
7
9

428 429 430 501 502


0
0
0
3
2
3
0
1
9
3
3
0
1
1
4
4
4
1
2
21
9

408
0
2
4
5

306
0
0
0
0

205
1
4
8
9

105
0
0
0
1

517 518 519


2
0
1
2
1
1
4
1
3
7
1 1
2 1 0

427
0
0
0
0

407
0
2
2
2

305
1
1
1
1

203 204
1
2
3
3
4
7
7 1 1

103
0
0
0
2

514 515 516


2
0
2
6
2
7
K
M
1
0
14
5 1 6

424
0
1
2

404
1
1
3
4

302
0
2
3
5

201
1
5
9
14

101
1
2
3
3

using the number of secondary graduates, the capacity of each
of the preparatory schools and the catchment areas. A measure
of the coverage of each of the settlements with preparatory
schools was calculated as follows:

$$L_j = \frac{\left(\sum_{i=1}^{k} X_i\right) + X_j - C_j}{\left(\sum_{i=1}^{k} X_i\right) + X_j}$$

where Lj is the lack of service associated with the jth
settlement, Xi is the number of secondary school graduates in
the ith settlement in a fixed year, k is the number of neighbors
of the jth settlement and Cj is the capacity of its
preparatory schools. Table 5.2 shows the results obtained
using a traveling distance of 20 km and 1984 data. In this
case the lack of service for Charcas is a negative number.
This means that there is a surplus in the service for this
particular settlement.

Settlement    Ci           Li              Li (%)
Charcas       C1 = 392     L1 = -0.0481    -4.81 %
Matehuala     C2 = 1242    L2 =  0.1573    15.73 %

Table 5.2 Service of preparatory schools for a 20 km
traveling distance (1984 data)
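As a worked illustration of the lack-of-service measure, the following Python sketch reads it as the share of the demand in a catchment area (graduates of the settlement plus those of its neighbors) that exceeds the capacity of the preparatory schools. The capacity of 392 is the Charcas figure from Table 5.2; the graduate counts are hypothetical and were chosen only so that the result lands near the tabulated value.

```python
def lack_of_service(own_graduates, neighbor_graduates, capacity):
    """Share of the catchment demand not covered by school capacity.

    A negative result means capacity exceeds demand (a surplus).
    """
    demand = own_graduates + sum(neighbor_graduates)
    if demand == 0:
        raise ValueError("no demand in the catchment area")
    return (demand - capacity) / demand

# Capacity from Table 5.2 (Charcas); graduate counts are hypothetical.
value = lack_of_service(own_graduates=300,
                        neighbor_graduates=[40, 34],
                        capacity=392)
print(round(value, 4))  # -0.0481
```

A negative value, as here, corresponds to the surplus reported for Charcas.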

5.4.4 Some Spatial Characteristics of the Demand for
Preparatory Schools

One of the factors that is considered crucial in the location
of preparatory school services is the spatial distribution of
the demand. In this case the study of the demand was carried
out using the number of secondary school graduates for three
consecutive years (1983-85) (see table 5.3). It was
additionally assumed that all graduates demand preparatory
school services, and that there is no flow of students from
neighboring counties outside the study area.

Interaction Spaces

Besides the local demand generated by a settlement, the
possible flow of secondary graduates among settlements is
another issue that has to be considered in the location of a
preparatory school. In this case the space where a flow of
students to attend a preparatory school is expected is
delimited by the neighborhood of each of the settlements.
This space is called the interaction space. A measure of the
expected intensity of interaction among settlements in a
geo-subspace is given by the heterogeneity index of the
neighborhood. For the study area of San Luis Potosi, a
univariate heterogeneity index (equation 3.2) was calculated
using the number of secondary school graduates for each
settlement and the neighborhood relation established in the
20 km network.

Table 5.3 Number of Secondary Graduates per settlement.

SETTLEMENT                        CODE
Real de Catorce                   101
Las Adjuntas                      102
Alamitos de los Diaz              103
La Canada                         104
Cardoncita                        105
Castañon                          106
Los Catorce                       107
Guadalupe del Carnicero           108
La Maroma                         109
El Mastranto                      110
Potrero No.1                      111
Ranchito de Coronados             112
El Salto y Anexos                 113
San Antonio de Coronados          114
San Jose de Coronados             115
Santa Cruz de Carretas            116
Santa Maria del Refugio           117
Tanque de Dolores                 118
Vigas de Coronado                 119
Wadley                            120
Cedral                            201
El Blanco                         202
Cerro de las Flores               203
La Cruz                           204
Cuarejo                           205
Hidalgo                           206
Jesus Maria                       207
Lagunillas                        208
Palo Blanco                       209
Presa Verde                       210
Refugio de las Monjas             211
El Saladito                       212
San Isidro                        213
San Lorenzo                       214
San Pablo                         215
Santa Rita de Sotol               216
Tanque Nuevo                      217
Zamarripa                         218
Progreso                          219
Charcas                           301
Alvaro Obregon                    302
Cañada Verde                      303
El Capulin                        304
El Cedazo                         305
Emiliano Zapata
Francisco I. Madero
Guadalupe Victoria
La Borcilla
Lo de Acosta
Miguel Hidalgo
Noria de Cerro Gordo
Pocitos
Presa Santa Gertrudis
San Rafael
El Terrero
Vicente Guerrero
La Zapatilla
Guadalcazar
Amoles
Buenavista
Charco Blanco
Charco Cercado
El Fraile
La Hincada
Huisache
Milagro de Guadalupe
Negritas
Noria del Refugio
Nuñez
Peyote
La Polvora
Potreritos
Pozas de Santa Ana
Pozo de Acuña
Presa de Guadalupe
Presa de Tepetate
Quelital
Realejo
San Antonio de Trojes
San Francisco del Tulillo
San Ignacio
San Jose de Cervantes
San Rafael de los Nietos
Santa Rita del Rocio
Santo Domingo
Ventana
La Rosita
Matehuala
Arroyito de Agua
La Bonita
La Cabra
La Caja
La Carbonera
El Carmen
Concepcion
Encarnacion de Abajo
Estanque de Agua Buena
Guerrero
El Mezquite
Pastoriza
Los Pocitos
Pozo de Santa Clara
Rancho Nuevo
Sacramento
San Antonio de las Barrancas
San Antonio de los Castillo
San Francisco Caleros
San Jose de la Viuda
San Jose de los Guajes
San Miguel
Santa Cruz
Santa Lucia
Tanque Colorado
El Vaquero
Los Cinco Señores
Vanegas
El Gallo
Huertecillas
La Punta
El Salado
San Juan de Vanegas
San Vicente
Tanque de Lopez
El Tepetate
Zaragoza
Villa de Guadalupe
Biznaga
Guadalupito
Llano de Jesus Maria
La Masita
La Presita
Puerto de Magdalenas
Rancho Alegre
San Bartolo
San Francisco
Santa Isabel
Santa Teresa
Zaragoza de Solis
La Paz
San Antonio de las Trojes

As can be observed in table 5.4, the pattern of the demand is
very similar in all three cases. The spatial distribution for
1983 is shown in figure 5.12. Two areas are distinguished by
"less homogeneous" subspaces. These are the area surrounding
Matehuala, the largest settlement inside the study area, and a
smaller area around Charcas, the second most important urban
center. The heterogeneity index therefore indicates that the
geo-subspaces that form the area around Matehuala and Charcas
are characterized by relative heterogeneity in the demand.
The standardized heterogeneity index associated with each
settlement shows that the variation of the city of Matehuala
is much more significant than that of the rest of the
settlements. The settlement of Guerrero is the only other one
where the value of the index is smaller than 0.5.

In fact, the values associated with most of these settlements
are close to 1, which means that the geo-subspaces associated
with them are homogeneous in comparison with Matehuala's.

In those areas that are formed by heterogeneous geo-subspaces,
greater interaction among the settlements can be expected than
in areas where homogeneity prevails. Thus, the impact of the
location of a school on a settlement immersed in a
heterogeneous environment should be analysed in greater
detail. For example, the settlement of Guerrero is in the
catchment area of two settlements with very different demands:
Matehuala and La Presita. In this case, before a decision is
taken regarding the location of a school, several alternatives
related to the possible flow of students have to be analysed.
For example, Matehuala could become a point of attraction for
the secondary school graduates of Guerrero. On the other
hand, the location of a school in La Presita could satisfy the
demand of Guerrero and avoid the overcrowding of Matehuala.

In summary, the measure of the local variation of interaction
spaces allows the planner to identify the degree of expected
interaction in settlements. This aids in the delimitation of
zones where "intense" interactions are expected. Similarly,
the index can serve in the definition of zones formed by
geo-subspaces of homogeneous demand.
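The kind of computation behind such an index can be sketched as follows. This is not equation 3.2, which is defined elsewhere in the thesis; it is a stand-in that assumes a variance-like measure over each neighborhood and a rescaling in which values near 1 mean homogeneous and values near 0 mean heterogeneous. The settlement names, demands and links are hypothetical.

```python
def local_variation(values, neighbors, j):
    """Mean squared deviation of demand over settlement j and its neighbors."""
    group = [values[j]] + [values[i] for i in neighbors[j]]
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group) / len(group)

def standardized_index(values, neighbors):
    """Rescale so the most heterogeneous neighborhood scores 0 (assumed form)."""
    raw = {j: local_variation(values, neighbors, j) for j in values}
    largest = max(raw.values())
    if largest == 0:
        return {j: 1.0 for j in raw}
    return {j: 1.0 - raw[j] / largest for j in raw}

# Hypothetical demands (graduates) and neighborhood relation.
values = {"big": 500, "a": 20, "b": 25, "c": 22, "d": 24}
neighbors = {"big": ["a", "b"], "a": ["big"], "b": ["big"],
             "c": ["d"], "d": ["c"]}
index = standardized_index(values, neighbors)
for name in sorted(index):
    print(name, round(index[name], 3))
```

In this toy case the neighborhoods that contain the large settlement score low, while the pair of similar settlements scores close to 1.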

Finally, to test the impact of a change in the criterion of
maximum traveling distance on the spatial pattern of
variation, the heterogeneity index was calculated using the 15
and 25 km networks for the year 1983. As can be seen in table
5.5, the pattern presented is similar to the one obtained for
the 20 km network. There are, however, differences in the
values associated with particular settlements. This indicates
that although no significant change should be expected in the
general interaction pattern if any of the three (15, 20, 25 km)
maximum traveling distances is taken as a threshold, in the
analysis of individual settlements attention has to be given
to the heterogeneity values associated in each case.

TABLE 5.5  STANDARDIZED HETEROGENEITY INDEX
15 AND 25 KM NETWORKS (1983)

CODE  15KM  25KM

A Temporal Analysis

Up to this point the whole analysis has focussed on the state
of the educational system at a fixed point in time. However,
the analysis of the evolution of the system is important both
to understand its present state and to evaluate the impact of
planning actions.

Temporal Stability

At this point the local variation of the temporal subspace was
studied insofar as it could be expected to indicate the
temporal "stability" of the demand in each of the settlements.

A heterogeneity index similar to the spatial heterogeneity
index was used to calculate the temporal variation of the
demand for each settlement. The temporal heterogeneity index
is defined as follows:

and

where k is the number of school years considered, X is the
mean value of the demand and Xi is the demand in the ith year.
This index can be standardized in a similar manner to the
spatial case.
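A minimal sketch of a temporal measure of this kind is given below. It assumes a variance-like form built from the quantities named in the text (k school years, the mean demand, and the demand in each year); the exact expression used in the thesis may differ, and the yearly figures are hypothetical.

```python
def temporal_heterogeneity(yearly_demand):
    """Mean squared deviation of yearly demand from its mean (assumed form)."""
    k = len(yearly_demand)
    if k == 0:
        raise ValueError("at least one school year is required")
    mean = sum(yearly_demand) / k
    return sum((x - mean) ** 2 for x in yearly_demand) / k

# Hypothetical graduate counts for three consecutive school years.
stable = temporal_heterogeneity([12, 12, 12])
unstable = temporal_heterogeneity([10, 20, 30])
print(stable, round(unstable, 2))
```

A settlement whose demand hardly changes from year to year gets a value near zero before standardization, while one with large swings gets a large value.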

The index was applied to the study of the temporal stability
of the demand for a specific type of secondary school, the
"TV-secondary." Three different types of secondary schools can
be distinguished in the school system: general, technical and
TV-secondary. TV-secondary is the type of school that has
been established in most of the settlements in San Luis
Potosi. In fact, the only places where there are technical
and general schools are the county seats. TV-secondaries are
designed to serve communities where the size of the population
is too small to establish a regular school, and the settlement
cannot be serviced by neighboring ones. Televised classes
keep the number of teachers required to a minimum.

Besides the deficit of preparatory school service in those
settlements that have regular secondary schools, there is also
a lack of preparatory schools for the students graduating from
a TV-secondary in the area. The number of students in each of
these schools is in the interval [1,50]. There are, however,
variations in the number of students from one school year to
the next. The temporal heterogeneity index was used as a tool
to quantify this variation (see table 5.6).

With the aid of this index it is possible to identify those
settlements where temporal variation is significant. The
value of the index can be interpreted as a measure of the
"temporal stability" of demand for preparatory services. For
planning purposes, locating a preparatory school in a
settlement or catchment area which has an "unstable" demand is
not advisable.

Degree of temporal stability is a measure that has been
associated with each settlement as an isolated entity. There
is, however, an interaction space related to each settlement
that must also be considered. The spatial distribution of
temporal stability, as shown in figure 5.13, presents a
"heterogeneous pattern." A heterogeneity index was applied
using the value of the temporal heterogeneity index as a
variable to quantify this spatial variation and the 20 km
network as a basis to define the neighborhoods (see table
5.7).

The value of the index associated with a settlement provides a
measure of the spatial variation of its neighborhood according
to the temporal variation of the settlements. For example,
values close to zero indicate that the spatial neighborhood is
"highly" heterogeneous with respect to "temporal stability."
On the other hand, values close to 1 indicate that the spatial
neighborhood is "much less" heterogeneous with respect to the
"temporal stability" of the settlements inside it.

The map in figure 5.14 shows the spatial distribution of the
spatial heterogeneity index of temporal stability. Both maps
(figures 5.13 and 5.14) can be used to identify neighborhoods
where temporal stability is high and spatial heterogeneity is
low. Assuming that the temporal trend is maintained, this
characteristic of a neighborhood indicates to the planner that
demand in the catchment area of a settlement where a school is
to be located will not have a large temporal variation.

TABLE 5.6  STANDARDIZED TEMPORAL HETEROGENEITY INDEX
(EXCLUDING COUNTY SEATS)

CODE  INDEX
101   XXX
106   .87772
108   .96943
109   .86462
110   .78602
111   .82969
116   .982!!53
118   .88209
119   .93013
120   .12227
201   XXX
206   .83842
208   .91703
209   .96943
210   .52401
211   .56X 1
216   .51528
218   .96943
219   .52401
301   XXX
302   .57641
307   .56331
309   .89082
310   .65502
311   .96943
312   .73362
317   .79039
401   XXX
402   .70742
403   .00436
404   .41921
409   .93013
411   .0917
412   .75109
413   .65502
414   .96943
419   .99563
421   .0524
422   .81222
423   .96943
424   .87772
429   .98689
501   XXX
502   .75109
503   .86462
504   .83842
509   .99563
511   .59388
512   .87772
513   .96943
514   .86462
519   .98689
521   .79039
522   .75109
523   .87772
524   .96943
601   XXX
603   .72052
604   .35807
605   .88209
606   .91703
608   .51528
609   .79039
610   .98689
701   XXX
702   .94323
703   .89082
704   .98689
705   .94325
706   .20087
708   .82969
709   .93013
710   .67248
711   .70742
712   .72052
713   .97703
801   XXX
802   .90829

Figure 5.13 Spatial distribution of the temporal
heterogeneity index.

Figure 5.14 Distribution of the spatial heterogeneity
index applied to temporal stability.

5.4.5 Additional Considerations

Besides the use of the heterogeneity index as an aid in the
study of the spatial characteristics of demand, this same
measure can be used to study other aspects of educational
planning. Some examples follow:

1. Various indicators can be used to compare efficiency in
the settlements studied. The rate of students graduating from
elementary and secondary schools for a fixed cohort may be an
indicator of the quality of the educational services. The
heterogeneity index can be used to test the uniformity of the
services provided. Settlements with high values indicate
anomalous conditions, either superior or inferior to those of
surrounding settlements. Homogeneous zones receive similar
services, while heterogeneous zones show disparate services.

2. The number of inhabitants per school in the different age
groups and at different educational levels is an indicator of
the distribution of the resources among the areas. The
interpretation of the heterogeneity index in this case is
similar to the previous example. Other variables that
indicate the amount of resources given to a settlement can
receive a similar treatment.

3. If two or more indicators of the equity or efficiency of
the system have been defined, a multivariate heterogeneity
index can aid the planner in identifying zones or settlements
by equity/inequity or efficiency/inefficiency measures.

4. Finally, a time series analysis of the evolution of school
services in such aspects as equity and efficiency can give the
planner important information in order to better understand
the present state of the educational service and to anticipate
future developments.

5.5 Summary

The heterogeneity index as a measure of the degree of
membership of an areal unit to a region was applied in the
design of regionalization algorithms. There are, however,
other possible interpretations of a measure of local variation
of a geo-space. Planning was selected for the application of
the heterogeneity index because areas of "high contrast" are of
special interest for planners. In particular, in school
location planning various indicators are used in the
characterization of the spatial distribution of human
resources and material assets of the school systems. The
usual procedure in school location problems has two stages:
first, measures of interest for the planner, such as
efficiency and effectiveness, are defined, and second, these
indicators are represented in maps. As mentioned before, one
of the main differences between the neighborhood approach
presented in this work and previous ones is that while in the
most traditional models the representation of the geographical
landscape is made through isolated entities such as lines,
points and areas, in neighborhood models the basic units of
study are geo-subspaces.

In the first part of the chapter a "school location problem
scenario" is established. The area of study is the northern
part of the state of San Luis Potosi in central Mexico. The
Mexican school system has reached a point where there is a
need to assure a coordinated growth among the different levels
of education. Although a planning system has been established
within the government since the 1970's, very little emphasis
has been placed on the spatial aspects in the different school
system models that have been implemented. Currently, the
information system that supports decision making is being
transformed into a geographic information system. That is,
for the first time, location variables are being included in
the planning system at a national level.

The area of study is characterized by a significant growth of
the secondary school system; as a consequence there is a
greater demand on the higher levels of education. The central
authorities are aware of this phenomenon and have decided to
satisfy the demand by establishing new schools.

It is common practice to use various indicators as well as
their spatial distribution and catchment areas in the decision
process for the location of schools. There are, however,
besides the above-mentioned spatial aspects of the school
location problem, other ones that have not been studied. In
the second part of the chapter several indicators based on the
notion of "local variation" are proposed as tools in a further
study of the spatial characteristics of the problem. However,
it is important to mention that the basic goal is to present
the tools rather than specific solutions for the location of
new schools. It is clear that in a problem of the complexity
of the one presented here, in order to reach the best feasible
solution it is necessary to include in the analysis social,
cultural, administrative and financial factors besides the
geographical aspects.

The geo-space of interest is defined as the set of settlements
inside the study area that have a secondary and/or a
preparatory school, together with the spatial relationships
that are relevant to the problem. The geo-subspaces are
defined as subsets of settlements. The size and shape of the
geo-subspaces are not necessarily fixed. They are in fact a
parameter that the planner can use at the decision making
stage. Such is the case of the traveling distance, which can
be used in the definition of the "optimum" number of schools
to be located if there is a limitation on the number of
schools to be established, since it allows the planner to
determine the size of the catchment areas. For example, if it
is assumed that the "optimum" traveling distance for a
secondary graduate is less than 10 km (figure 5.10), it is
clear that the number of schools is larger than if the optimum
is fixed at 25 km (figure 5.7).

Two indices were applied in the study of the demand for
preparatory schools in the area of interest. In the first
case the heterogeneity of the geo-subspaces was interpreted as
a measure of the expected degree of interaction within a
catchment area. Since the degree of interaction is a measure
of the flow of secondary graduates, this characteristic of the
geo-subspaces can be used as a factor in the analysis of the
impact of the location of new schools. The second index is a
measure of the local variation of the temporal stability of
the demand. In this case, the stability or instability of the
demand in a region can be used in similar studies.

In brief, the study of the local variation of the
geo-subspaces that form the area of study allows the planner
to include in the decision process spatial aspects that were
not previously considered, such as the degree of communication
of the settlements, the level of interaction within a
neighborhood and the local variation of the temporal
stability.

Chapter 6.

CONCLUSIONS

6.1 Summary

A common criticism of the mathematical models used in human
geography is that in many cases key features that are
considered essential for geographical analysis purposes are
not represented. This limitation has been discussed with
regard to factor analytic models and statistical inference in
section 2.1.1 (see also Haining, 1983). Neighborhoods are one
of the geographical concepts that had received little
attention by modelers until the latter part of the
quantitative revolution. Although there are several
mathematical models that have been specifically designed
to represent geographical neighborhoods, there has been no
general attempt to establish an overall framework for the
development of these tools. Therefore, as mentioned in the
introduction, the principal objective of this work has been to
present a general approach to the modeling of the notion of
geographical neighborhood as well as to develop mathematical
representations of it.

Three different levels of modeling are found in the thesis.
In the first and most general, the notion of neighborhood
model has been introduced through the mathematical concepts of
space and subspace. Second, the local variation of
geo-subspaces has been modeled through the heterogeneity
index. Finally, two neighborhood models have been developed
and applied to specific situations: the design of
regionalization algorithms and the definition of spatial
criteria for school location planning.

The first level of modeling is based on the intuitive notion
that subjacent to the geographical landscape there are spaces
and subspaces. In the development of mathematical theory,
spaces and subspaces play a fundamental role.

Since the mathematical concepts of space and subspace satisfy
certain requirements that make them appropriate as elements of
representation of the geographical notion of neighborhood, two
corresponding quasi-mathematical structures, geo-spaces and
geo-subspaces, have been introduced to further the objective
of establishing a general framework for the design of
neighborhood models.

A general approach to the design of neighborhood models has
been discussed, and tools have been developed for the study of
geo-subspaces. Although other existing techniques such as
autocorrelation and geostatistics have incorporated
neighborhoods in models with predictive purposes, the present
research is based on the assumption that the concept of
geographical neighborhood has not been fully modeled
previously.

With this idea in mind a measure of "local variation" of a
geo-subspace has been defined. Although the measure itself
resembles that of statistical variance, in this case the
treatment is non-inferential. In fact, formalization has been
carried out in topological rather than in statistical terms.

Applications of topological concepts in spatial analysis are
found in several branches of geography including geomorphology
(Mark, 1977), geographic information systems (~utton, 1968) and
transportation (Haggett and Chorley, 1969). The topological
entities on which these applications are based are graphs.
There are, however, other topological concepts of interest for
geographical purposes. One of the main sub-branches in
topology is general, or set, topology. Based on the
mathematical concept of neighborhood, in set topology,
concepts such as those of limit and continuity are extended to
abstract sets (~irby and Gardiner, 1982). Topological
concepts such as the boundary or interior of a region, open
set and neighborhood are used to model the geographical
concept of neighborhood.

Here, the introduction of a topology to a specific element of
a geo-space, a graph, has allowed the identification between
the geographical and topological concepts of neighborhood.
This might be expected since both notions have their origin in
the same conception of nearness to a point or entity.

The idea of fuzziness as developed by Zadeh (1965) has been
incorporated into a topological space through the
heterogeneity index. This result suggests the possible
development of a new mathematical structure, Fuzzy Topology.

In the third level of modeling, neighborhood models have been
applied in two instances: the design of regionalization
algorithms and the definition of criteria for the location of
schools.

The algorithms designed are presented as examples of the use
of the neighborhood approach. As mentioned previously,
algorithms have to be designed according to the problem at
hand. That is, the algorithms that have been presented are
not necessarily adequate for every regionalization problem,
although for the specific case of a central agglomerative
procedure the application of the heterogeneity index has been
fully described.

The fuzzy set algorithm presented provides the analyst with
information which is not available when a bivalent logic is
used. A degree of membership of an element to a region has
been given as a result of the classification procedure. It is
important to note that geographical concepts have been
previously modeled using fuzzy sets (Gale, 1972 and Leung,
1982). However, in the design of classification algorithms
that include a contiguity constraint the use of fuzzy concepts
has posed some special problems. Contiguity is a
characteristic that has been considered essentially bivalent,
although in some cases (Cliff and Ord, 1973) quantities such
as the length of the border have been used as a measure of
"contiguousness." In the algorithm discussed here the
fuzziness refers, in topological terms, to the degree of
"interiority" of a point to a set. As a result, the
"membership" of an element to a region has not been described
in terms of its "membership" to contiguous regions but rather
with respect to its degree of membership to the border or the
interior of the region.

In the second application a neighborhood model has been
designed to aid in a school location problem. In this
particular area of educational planning the use of spatial
models has been scarce, although indicators that include some
spatial criteria are generally used as an aid in the selection
of sites to establish new schools. In this case the
heterogeneity index has been applied to aid the study of some
spatial and temporal aspects of the demand for preparatory
schools. The spatial interaction among settlements and the
temporal stability of demand are two spatial factors that have
been pointed out as important indicators in the analysis of
alternatives for the allocation of resources.

6.2 Discussion

It has been stated that three levels of modeling have been the
concern of this study: 1) the establishment of a general
framework for the design of neighborhood models; 2) the
design of tools for the study of geo-spaces and 3) the
application of neighborhood models to specific geographical
problems.

This concluding section contains some brief and somewhat
speculative remarks on the general importance of each level of
analysis.

At the third and most detailed level of modeling two
neighborhood models were applied to geographical problems:
regionalization and school location planning. Although in
neither case was the use of the technique exhaustive, the
results obtained indicate that the mathematical modeling of
geographical landscape through "neighborhoods" constitutes a
fruitful avenue of inquiry for both applied and basic research
purposes.

At the second, tool design level, it was the development of the
heterogeneity index as a measure of local variation of a
geo-subspace that made the third-level applications possible.
It would seem to follow that the same conceptual tool could be
applied not only to similar contexts in the future but also to
the study of the set of neighborhoods that make up a
geo-space. One obvious area of exploration would be to
substitute neighborhoods for the entities used in existing
models. For example, a measure of correlation could be
applied to two or more sets of neighborhoods of one or more
geo-spaces. Similarly, the representation of a geo-space
through a topological space with fuzzy characteristics raises
the possibility of using topological and fuzzy set theory for
a thorough study of the geographical landscape. In this case,
even though formalization was achieved by identifying
geographical entities with mathematical ones, the use of the
mathematical models themselves was not extensive. It must
therefore be acknowledged that the strengths and weaknesses
of the application of topological and fuzzy set theory to
geographical problems remain largely unexplored.

Finally, from a more general point of view, this thesis
illustrates the kind of discoveries that can be expected from
high-level communication and interaction among two or more
fields of inquiry. In this particular case the
geo-mathematician finds, on one hand, a previously unknown
universe of applications of abstract mathematical theory, and
on the other hand, the equally unsuspected possibility of
modeling the fundamental notion of geographical neighborhoods.

APPENDIX

MATHEMATICAL CONCEPTS

This appendix contains the mathematical details of the
definitions of a topology in a connected graph, as presented
in chapter 3. The definitions of some mathematical concepts
necessary for the discussion are given in the first section.

A.1 Mathematical Definitions

1) A graph with m points and q lines is called a
(m, q) graph.

2) Walk of a graph.
A walk of a graph is an alternating sequence of
points and lines

V0, X1, V1, X2, ..., Vn-1, Xn, Vn

beginning and ending with points, in which each
line is incident with the two points immediately
preceding and following it.

3) Path of a graph.
A path of a graph is a walk where all the points
(and thus all the lines) are distinct.

4) Connected graph.
A graph G is connected if every pair of points
is joined by a path.

5) Subgraph.
A subgraph of G is a graph having all its points
and lines in G.

6) Difference.
For this particular application the difference
between two graphs G1 and G2 is defined as
follows:
The lines in G1 - G2 are all the lines that
belong to G1 and do not belong to G2. That is:

X(G1 - G2) = X(G1) - X(G2)

The points in G1 - G2 are those that are
represented in X(G1 - G2).

7) Topology
Let X be a given set of objects called the points of X.
A topology in X is a non-empty collection T of subsets of X called
open sets satisfying the following four axioms:

Ax. 1 The empty set is open.

Ax. 2 The set X itself is open.

Ax. 3 The union of any family of open sets is open.

Ax. 4 The intersection of any two (and hence of any finite
number of) open sets is open.

A set X is said to be topologized if a topology T has been given
in X. A topologized set X is called a topological space and the
topology T is called the topology of the space X (Hu, 1964, p.16).
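For a finite set the four axioms can be checked mechanically. The following Python sketch is only an illustration and is not part of the original development; the function name is_topology and the way the collection is passed in are assumptions introduced here. Because the collection is finite, closure under pairwise unions and pairwise intersections is enough to establish Ax. 3 and Ax. 4.

```python
from itertools import combinations


def is_topology(points, open_sets):
    """Check Ax. 1 to Ax. 4 for a finite collection of subsets of `points`."""
    points = frozenset(points)
    opens = {frozenset(s) for s in open_sets}

    # Every open set must be a subset of the underlying set X.
    if any(not s <= points for s in opens):
        return False
    # Ax. 1: the empty set is open.
    if frozenset() not in opens:
        return False
    # Ax. 2: the set X itself is open.
    if points not in opens:
        return False
    for a, b in combinations(opens, 2):
        # Ax. 3: unions (pairwise closure suffices for a finite family).
        if a | b not in opens:
            return False
        # Ax. 4: the intersection of any two open sets is open.
        if a & b not in opens:
            return False
    return True


# A nested chain of subsets is a topology; the second collection is not,
# since the union {1} U {2} is missing from it.
chain_ok = is_topology({1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}])
missing_union = is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}])
```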

In this case the set of interest is a connected graph G with
points V(G) and lines X(G). To apply the concept of topology
to G, the concept of subset is identified with that of
subgraph.
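The graph-theoretic definitions 1) to 5) translate directly into a small data structure. The sketch below is a minimal illustration under stated assumptions: a graph is stored as the pair (V, X), lines are treated as unordered pairs of points, a walk is given by its sequence of points only (the lines between consecutive points are implied), and all function names are hypothetical.

```python
from collections import deque


def line(a, b):
    """An undirected line joining the points a and b."""
    return frozenset((a, b))


def make_graph(points, lines):
    """Build a graph as the pair (V, X); each line must join given points."""
    v = frozenset(points)
    x = frozenset(line(a, b) for a, b in lines)
    if any(not l <= v for l in x):
        raise ValueError("a line uses a point that is not in the graph")
    return v, x


def is_subgraph(h, g):
    """Definition 5: all points and lines of h are in g."""
    return h[0] <= g[0] and h[1] <= g[1]


def is_walk(g, seq):
    """Definition 2: consecutive points of seq are joined by lines of g."""
    v, x = g
    return (len(seq) > 0
            and all(p in v for p in seq)
            and all(line(a, b) in x for a, b in zip(seq, seq[1:])))


def is_path(g, seq):
    """Definition 3: a walk in which all points are distinct."""
    return is_walk(g, seq) and len(set(seq)) == len(seq)


def is_connected(g):
    """Definition 4, checked by a breadth-first search from one point."""
    v, x = g
    if not v:
        return True  # convention assumed here for the (0,0) graph
    start = next(iter(v))
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        for l in x:
            if p in l:
                for nb in l - {p}:
                    if nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
    return seen == v
```

With this representation the identification of subset with subgraph amounts to comparing both components of the pair, which is what is_subgraph does.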

To exemplify some of these definitions assume that G1, G2, G3
and G4 are four graphs as shown in figure A.1.

By definition G5 = G1 ∪ G2 is such that:

    V(G1 ∪ G2) = V(G1) ∪ V(G2) = {v1, v2, v3, v4, v5, v6, v7}

and

    X(G1 ∪ G2) = X(G1) ∪ X(G2) = {(v1,v2), (v2,v3), (v3,v4), (v3,v5), (v2,v6), ...}

G5 is shown diagrammatically in figure A.2.

The intersection between G1 and G3 (G6 = G1 ∩ G3) as defined
in Chapter 3 is such that:

    X(G1 ∩ G3) = X(G1) ∩ X(G3) = {(v1,v2), (v2,v3)}

and its points are those represented in X(G1 ∩ G3), so that

    V(G1 ∩ G3) = {v1, v2, v3}.

G6 is shown in figure A.2.

As another example of the intersection of two graphs consider
G7 = G3 ∩ G4. In this case, since X(G7) = ∅, G7 is the (0,0)
graph.

The difference between G1 and G3 (G8 = G1 - G3) is such that:

    X(G8) = X(G1) - X(G3) = {(v3,v4), (v3,v5)}

and

    V(G8) = V(G4) = {v3, v4, v5}.

Moreover, as expected G1 - G4 = G3, G3 ∪ G4 = G1 and
G3 - G1 is the (0,0) graph since X(G3) ⊂ X(G1).
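The union, intersection and difference used in these examples can be reproduced with a few set operations. The sketch below is illustrative only. Figure A.1 is not reproduced here, so the line sets are assumptions: X(G1) and X(G3) are inferred from the intersection and difference displayed above, and X(G4) is assumed to be {(v3,v4), (v3,v5)}. The function names are hypothetical.

```python
def line(a, b):
    """An undirected line joining the points a and b."""
    return frozenset((a, b))


def points_of(lines):
    """The points represented in a set of lines."""
    return frozenset(p for l in lines for p in l)


def graph_from_lines(pairs):
    """A graph (V, X) whose points are exactly those represented in X."""
    x = frozenset(line(a, b) for a, b in pairs)
    return points_of(x), x


def union(g, h):
    return g[0] | h[0], g[1] | h[1]


def intersection(g, h):
    # Lines common to both graphs; points are those the lines represent.
    x = g[1] & h[1]
    return points_of(x), x


def difference(g, h):
    # Lines of g that are not in h; points are those the lines represent.
    x = g[1] - h[1]
    return points_of(x), x


EMPTY = (frozenset(), frozenset())  # the (0,0) graph

# Line sets inferred or assumed as described in the lead-in.
G1 = graph_from_lines([("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5")])
G3 = graph_from_lines([("v1", "v2"), ("v2", "v3")])
G4 = graph_from_lines([("v3", "v4"), ("v3", "v5")])  # assumed

G6 = intersection(G1, G3)
G7 = intersection(G3, G4)
G8 = difference(G1, G3)
```

Under these assumptions G6 coincides with G3, G7 is the (0,0) graph, and G8 has the points v3, v4 and v5.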

Figure A.1

Figure A.2

A.2 Mathematical Discussion

Given a connected (m,q) graph G such that q ≠ 0, the
following propositions are true:

Proposition 1. The collection of open sets as defined in
section 3.4 is a topology of G.

Ax.1 The (0,0) graph is an open set. This is true by the
empty condition.

Ax.2 G is an open set.

i) By definition G is a subgraph of G.

ii) Let p ∈ V(G). Since G is connected and q ≠ 0, there exists
a point p1 such that (p,p1) ∈ X(G). Let the subgraph N be
defined as: V(N) = {p,p1} and X(N) = {(p,p1)}. N is a non-empty
connected subgraph of G such that V(N) ≠ {p} and p ∈ V(N).

Therefore G is an open set.

Ax.3 The union of any family of open sets is open.

Let O1, O2, ..., On be a family of open sets of G and
O = O1 ∪ O2 ∪ ... ∪ On.

i) By definition V(O) = V(O1) ∪ ... ∪ V(On) and
X(O) = X(O1) ∪ ... ∪ X(On). Given p ∈ V(O) and (p1,p2) ∈ X(O),
there exist open sets Oi and Oj such that p ∈ V(Oi) and
(p1,p2) ∈ X(Oj). Since Oi and Oj are subgraphs of G, then
p ∈ V(G) and (p1,p2) ∈ X(G). Therefore all the points and lines
of O belong to G and O is a subgraph of G.

ii) Given p ∈ V(O) there exists Oi such that p ∈ V(Oi). Since
Oi is an open set there exists a non-empty connected subgraph N
of Oi such that V(N) ≠ {p} and p ∈ V(N). However N is also a
connected subgraph of O since V(N) ⊂ V(Oi) ⊂ V(O) and
X(N) ⊂ X(Oi) ⊂ X(O). Therefore for every point p in O there is a
non-empty connected subgraph N such that V(N) ≠ {p} and p ∈ V(N).

Therefore O is an open set.

Ax.4 The intersection of any two open sets is open.

Let O1 and O2 be two open sets and let O = O1 ∩ O2 be an (m,q)
graph where q ≠ 0 (if q = 0, O is the (0,0) graph, which is open
by Ax.1).

i) By definition X(O) = X(O1) ∩ X(O2) and V(O) consists of all
the points that belong to at least one pair in X(O).
Let (p1,p2) ∈ X(O), then (p1,p2) ∈ X(O1) and (p1,p2) ∈ X(O2).
Since O1 and O2 are subgraphs of G, (p1,p2) ∈ X(G). Let
p ∈ V(O), then p ∈ V(O1) and p ∈ V(O2). But O1 and O2 are
subgraphs of G, then p ∈ V(G). Therefore all the points and
lines of O belong to G, so O is a subgraph of G.

ii) Let p1 be a point in O, p1 ∈ V(O). By definition there
exists a line in O, (p,p1), such that (p,p1) ∈ X(O1) and
(p,p1) ∈ X(O2). Let N be defined as follows: V(N) = {p,p1} and
X(N) = {(p,p1)}. N is a non-empty connected subgraph of O such
that V(N) ≠ {p1} and p1 ∈ V(N).

Therefore O is an open set.
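Proposition 1 can also be verified by exhaustive enumeration on a small graph. The sketch below rests on a reading of the open-set condition taken from the proofs above, not from section 3.4 itself: a subgraph O is taken to be open when it is the (0,0) graph or when every point p of O lies in a connected subgraph N of O with V(N) ≠ {p}, which for graphs without loops happens exactly when some line of O is incident with p. That equivalence, the helper names and the example graphs are assumptions of this sketch.

```python
from itertools import chain, combinations


def line(a, b):
    return frozenset((a, b))


def powerset(items):
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))


def is_open(o):
    """Open (as assumed above): every point is an endpoint of a line of o."""
    points, lines = o
    return all(any(p in l for l in lines) for p in points)


def union(a, b):
    return a[0] | b[0], a[1] | b[1]


def intersection(a, b):
    lines = a[1] & b[1]
    return frozenset(p for l in lines for p in l), lines


def open_sets(g):
    """Enumerate every subgraph of g and keep the open ones."""
    points, lines = g
    found = set()
    for chosen in powerset(lines):
        chosen = frozenset(chosen)
        covered = frozenset(p for l in chosen for p in l)
        # A subgraph may also contain points not covered by its lines.
        for extra in powerset(points - covered):
            candidate = (covered | frozenset(extra), chosen)
            if is_open(candidate):
                found.add(candidate)
    return found


def satisfies_axioms(g):
    """Check Ax.1 to Ax.4 for the open subgraphs of g."""
    opens = open_sets(g)
    empty = (frozenset(), frozenset())
    return (empty in opens and g in opens
            and all(union(a, b) in opens and intersection(a, b) in opens
                    for a in opens for b in opens))


# A connected graph with four lines (assumed example).
x1 = frozenset([line("v1", "v2"), line("v2", "v3"),
                line("v3", "v4"), line("v3", "v5")])
G1 = (frozenset(p for l in x1 for p in l), x1)

# A graph with an isolated point c, so the hypothesis of connectedness fails.
G_isolated = (frozenset("abc"), frozenset([line("a", "b")]))
```

On the connected example all four axioms hold, whereas the graph with an isolated point fails Ax.2, which is where the hypotheses that G is connected and q ≠ 0 enter the proof.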

Proposition 2. For every point p ∈ V(G) the subgraph formed by
its first order neighbors and the lines joining them to p is a
topological neighborhood of p.

Proof:

Let p1, p2, ..., pn be the set of neighbors of p. Define N as
follows: V(N) = {p, p1, p2, ..., pn} (V(N) ≠ {p} since G is
connected) and X(N) = {(p,p1), (p,p2), ..., (p,pn)}. Let pi be
an arbitrary neighbor of p and U a subgraph of N such that
V(U) = {p,pi} and X(U) = {(p,pi)}. U is an open set such that
p is a point of U and U is a subgraph of N.

Therefore N is a topological neighborhood of p.
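The construction in this proof can be sketched in a few lines of Python. This is an illustration under assumptions, not the author's implementation: graphs are stored as pairs (V, X) of points and unordered lines, the function names are hypothetical, and the example graph is the four-line graph assumed for G1 in section A.1.

```python
def line(a, b):
    return frozenset((a, b))


def first_order_neighborhood(g, p):
    """N: the point p, its first order neighbors and the lines joining them."""
    points, lines = g
    star = frozenset(l for l in lines if p in l)
    return frozenset(q for l in star for q in l), star


def witness_open_set(g, p):
    """U from the proof: a single line (p, pi) with its two endpoints.

    Returns None when p has no neighbors, which cannot happen in a
    connected graph with q != 0.
    """
    _, star = first_order_neighborhood(g, p)
    for l in sorted(star, key=sorted):  # deterministic choice of pi
        return frozenset(l), frozenset([l])
    return None


x1 = frozenset([line("v1", "v2"), line("v2", "v3"),
                line("v3", "v4"), line("v3", "v5")])
G1 = (frozenset(q for l in x1 for q in l), x1)

N = first_order_neighborhood(G1, "v3")
U = witness_open_set(G1, "v3")
```

Here U contains p, every point of U is an endpoint of its single line, and both components of U are contained in those of N, which is the containment the proof relies on.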

REFERENCES

Anderberg, M.R. 1973. Cluster Analysis for Applications. New York, Academic Press.

Beaumont, J.R. 1983. "Quantitative and Theoretical Geography in Europe." Area 15: 166-167.

Bell, W. 1955. "Economic, Family and Ethnic Status: An Empirical Test." American Sociological Review, 20: 45-52.

Bennett, R.J. 1981. "Quantitative and Theoretical Geography in Western Europe." European Progress in Spatial Analysis, Bennett R.J. ed. Pion Limited: 1-32.

Bennett, R.J. and Wrigley N. 1981. "Retrospect and Prospect on British Quantitative Geography." Quantitative Geography: a British View. Bennett, R.J. and Wrigley N. eds. Routledge & Kegan Paul, London, Boston and Henley: 3-11.

Berry, Brian J.L. 1971. "Introduction: The Logic and Limitations of Comparative Factorial Ecology." Economic Geography, 47 (June): 209-219.

--------. 1961. "A Method of Deriving Multifactor Uniform Regions." Przegl. Geogr., 33: 263-282.

--------. 1968. "Approaches to Regional Analysis." Spatial Analysis, A Reader in Statistical Geography, Berry B.J.L. and Marble D.F. eds. Prentice Hall.

--------. 1973. "A Paradigm for Modern Geography." Directions in Geography, Chorley R.J. ed.: 3-21.

Bivand R. 1984. "Regression Modeling with Spatial Dependence: An Application of Some Class Selection and Estimation Methods." Geographical Analysis, 16:

Bodson P. and Peeters D. 1975. "Estimation of the Coefficients of a Linear Regression in the Presence of Spatial Autocorrelation." Environment and Planning A, 7:455-472.

Brantingham, P.L. and Brantingham P.J. 1978. "A Topological Technique for Regionalization." Environment and Behavior, 10:335-353.

Brouwer F. and Nijkamp P. 1984. "Linear Logit Models for Categorical Data in Spatial Mobility Analysis." Economic Geography, 60:102-110.

Bunge, W. 1966. Theoretical Geography. The Royal University of Lund Sweden, Department of Geography.

Burnett P. 1978. "Markovian Models of Movement within Urban Spatial Structures." Geographical Analysis, 10:142-153.

Byfuglien, J. and Nordgard, A. 1973. "Region Building - A Comparison of Methods." Norsk. Geogr. Tidsskr., 27:127-151.

--------. 1974. "Types or Regions?" Norsk. Geogr. Tidsskr., 28:157-166.

Clark, I. 1979. Practical Geostatistics. Applied Science Publishers, Ltd. London.

Cliff A.D. and Haggett, P. 1970. "On the Efficiency of Alternative Aggregations in Region Building Problems." Environment and Planning, 2:285-294.

Cliff A.D. and Ord J.K. 1973. Spatial Autocorrelation. London, Pion.

Cormack, R.M. 1971. "A Review of Classification." The Royal Statistical Society A, 134:321-367.

Cromley R.G. and Hanink D.M. 1985. "Location Portfolio Analysis." Geographical Analysis, 17:318-330.

Davis, Russell G. and Schiefelbein, E. 1980. Planning Education for Development, Vol. II, Models and Methods for Systematic Planning of Education. Massachusetts Institute of Technology.

De Jong P., Sprenger C. and Van Veen F. 1984. "On Extreme Values of Moran's I and Geary's c." Geographical Analysis, 16:17-24.

Dutton, G., ed. 1978. Harvard Papers on Geographic Information Systems. First International Advanced Symposium on Topological Data Structures for Geographic Information Systems. Harvard University.

Dunn, J.C. 1974. "Some Recent Investigations of a New Fuzzy Partitioning Algorithm and its Applications to Pattern Classification Problems." Journal of Cybernetics 4, 2:1-15.

Elliot, Harold M. 1983. "Surrounding Larger Neighbors and the Atlantic Coast Cardinal Neighbor Gradient." Economic Geography 59: 426-444.

Firby, P.A. and Gardiner C.F. 1982. Surface Topology. Ellis Horwood Limited.

Fisher, D.W. 1958. "On Grouping for Maximum Homogeneity." Journal of the American Statistical Association, No. 53, pp. 789-798.

Forrester, J.W. 1973. Industrial Dynamics. M.I.T. Press, eighth printing.

Fotheringham A.S. and Reeds L.G. 1979. "An Application of Discriminant Analysis to Agricultural Land Use Prediction." Economic Geography, 55:114-122.

Fowles, Grant R. 1970. Analytical Mechanics. Holt, Rinehart and Winston Inc.

Gale, S. 1972. "Inexactness, Fuzzy Sets, and the Foundations of Behavioral Geography." Geographical Analysis 4: 337-349.

Garfinkel, R.S. and Nemhauser, G.L. 1970. "Optimal Political Districting by Implicit Enumeration Techniques." Management Science, 16:B495-B508.

Gould P. 1970. "Is Statistix Inferens the Geographical Name for a Wild Goose?" Economic Geography, 46:439-448.

Gould, W.T. 1978. School Location Guidelines. World Bank Office Memorandum.

--------. 1973. Planning the Location of Schools: Ankole District, Uganda. International Institute for Educational Planning, Paris.

Gregory, S. 1983. "Quantitative Geography: the British Experience and the Role of the Institute." Trans. Inst. Br. Geogr. N.S. 8: 80-89.

Guruge, A. and Ariyadasa, K.D. 1976. Planning the Location of Schools: Case Studies in Sri Lanka. International Institute for Educational Planning (UNESCO), Paris.

Haaser, N.B., La Salle, J.P. and Sullivan, J.A. 1959. Introduction to Analysis. Blaisdell Publishing Company.

Haggett, P. and Chorley, R. 1969. Network Analysis in Geography. Edward Arnold.

Haggett, P., Cliff, A. and Frey, A. 1977. Locational Analysis in Human Geography. Edward Arnold, Second Edition.

Haining, R.P. 1983. "Advances in Applied Spatial Analysis." Area 16: 8.

Hall, B.F. 1983. "Neighborhood Differences in Retail Food Stores: Income Versus Race and Age of Population." Economic Geography 59: 282-295.

Hall, K.P., Gilmour, D.I. and Mingos, D.M.P. 1984. "Molecular Orbital Analysis of the Bonding in High Nuclearity Gold Cluster Compounds." Journal of Organometallic Chemistry, 268: 275-293.

Hallak, J. et al. 1975. Metodo de Preparacion del Mapa Escolar: La Region de San Ramon, Costa Rica.

Harary, F. 1972. Graph Theory. Addison Wesley Publishing Company.

Harman, H.H. 1976. Modern Factor Analysis. Chicago: University of Chicago Press, 3d ed., rev.

Hartigan, J.A. 1975. Clustering Algorithms. John Wiley & Sons.

Haynes, Kingsley E. 1971. "Spatial Change in Urban Structure: Alternative Approaches to Ecological Dynamics." Economic Geography, 47 (June): 324-335.

Hu, Sze-Tsen 1964. Elements of General Topology. Holden-Day, Inc., San Francisco, London, Amsterdam.

Janson, C. 1971. "A Preliminary Report on Swedish Urban Spatial Structure." Economic Geography, 47 (June): 249-265.

Johnson, K. and Rosenzweig 1963. The Theory of Management of Systems. Kogakusha, McGraw Hill.

Johnston R.J. 1981. "Ideology and Quantitative Human Geography in the English-Speaking World." European Progress in Spatial Analysis, Bennett R.J. ed. Pion Limited: 35-46.

--------. 1971. "Some Limitations of Factorial Ecology and Social Area Analysis." Economic Geography, 47 (June): 314-323.

--------. 1970. "Grouping and Regionalization: Some Methodological and Technical Observations." Economic Geography, 46, 2:293-305.

Jones, E. and Eyles, J. 1977. An Introduction to Social Geography. Oxford University Press.

Lankford, P.M. 1969. "Regionalization: Theory and Alternative Algorithms." Geographical Analysis 1:196-212.

Lawley, D.N. and Maxwell, A.E. 1971. Factor Analysis as a Statistical Method. American Elsevier Publishing Co.

Leung, Y. 1982. "Approximate Characterization of Some Fundamental Concepts of Spatial Analysis." Geographical Analysis 14: 29-40.

Martin R. 1974. "On Spatial Dependence, Bias and the Use of First Spatial Differences on Regression Analysis." Area 6:185-194.

Maxfield D.W. 1972. "Spatial Planning of School Districts." Annals Association of American Geographers, 62:582-590.

Morley C.D. and Thornes J.B. 1972. "A Markov Decision Model for Network Flow." Geographical Analysis, 4:180-193.

Morrill R.L. and Kelly M.B. 1970. "The Simulation of Hospital Use and the Estimation of Location Efficiency." Geographical Analysis, 2:283-300.

Muckay D.B. 1983. "Alternative Probabilistic Scaling Models for Spatial Data." Geographical Analysis, 15:173-189.

Mulligan G.F. and Gibson L.J. 1984. "Regression Estimates of Economic Base Multipliers for Small Communities." Economic Geography 60: 225-237.

Murtagh, F. 1985. "A Survey of Algorithms for Contiguity-constrained Clustering and Related Problems." The Computer Journal, 28: 82-88.

Peucker, T.K. and Chrisman, N. 1975. "Cartographic Data Structures." The American Cartographer, 1 (April): 55-69.

Peucker, T.K., Fowler, R.J., Little, J.J. and Mark, D.M. 1976. Triangulated Irregular Networks for Representing Three Dimensional Surfaces. Technical Report #10: Geography Department, Simon Fraser University.

Phipps, A.G. and Laverty W.H. 1983. "Optimal Stopping and Residential Search Behavior." Geographical Analysis, 15:187-204.

Ravenstein, E.G. 1885, 1889. "The Laws of Migration." Journal of the Royal Statistical Society, 48: 52.

Rees, Philip H. 1971. "Factorial Ecology: An Extended Definition, Survey and Critique of the Field." Economic Geography, 47 (June): 220-233.

Rogerson, P.A. 1984. "New Directions in the Modelling of Interregional Migration." Economic Geography, 60:111-120.

Rosenfeld, A. 1978. "Extraction of Topological Information from Digital Images." Harvard Papers on Geographic Information Systems, Vol.6. First International Symposium on Topological Data Structures for Geographic Information Systems. Harvard University.

Royden, H.C. 1968. Real Analysis. Collier-Macmillan Limited, London.

Salins, Peter D. 1971. "Household Location Patterns in American Metropolitan Areas." Economic Geography, 47 (June): 234-248.

Sawicki, D.S. 1973. "Studies of Aggregated Areal Data: Problems of Statistical Inference." Land Economics, 12:237-247.

Schwab, M.G. and Smith T.R. 1985. "Functional Invariance under Spatial Aggregation from Continuous Spatial Interaction Models." Geographical Analysis, 17:217-230.

Scott, A. 1969. Studies in Regional Science. Pion Limited.

Sheppard, Eric S. 1979. "Notes on Spatial Interaction." Professional Geographer, 31 (1): 8-15.

Silk, J. 1979. Statistical Concepts in Geography. George Allen & Unwin, London.

Smith, T.E. 1984. "Testable Characterizations of Gravity Models." Geographical Analysis, 16:74-94.

Slater, P.B. 1985. "Point-to-Point Migration Functions and Gravity Model Re-Normalization: Approaches to Aggregation in Spatial Interaction Modeling." Environment and Planning A, 17:1025-1044.

Springer, C.S. 1973. "Role of the Five-Coordinate Intermediate in the Stereochemistry of Dissociative Reactions of Octahedral Compounds." Journal of the American Chemical Society, 95: 1459-1467.

Solana, F., Cardiel, R. and Bolaños, R. 1981. Historia de la Educación Pública en México. Secretaría de Educación Pública.

Symon, K.R. 1969. Mechanics. Addison Wesley Publishing Company.

Taafe, E.J. and Gauthier, H.L. 1973. Geography of Transportation. Prentice Hall, Inc.

Taylor, P. 1977. Quantitative Methods in Geography. Houghton Mifflin, Boston.

Tobler, W. 1970. "A Computer Movie Simulating Urban Growth in the Detroit Region." Proceedings of the I.G.U. Commission on Quantitative Methods, Economic Geography 46.

Thoresson, J.D. and Liittschwager, J.M. 1967. "Legislative Districting by Computer Simulation." Behavioral Science, 12:237-247.

Tylor, E.B. 1889. "On a Method of Investigating the Development of Institutions Applied to Laws of Marriage and Descent." Journal of the Anthropological Institute of Great Britain and Ireland, 18:245-272.

Wilson, A.G. 1971. "A Family of Spatial Interaction Models, and Associated Developments." Environment and Planning, 3:1-32.

Yapa L.S. and Mayfield D. 1978. "Non-Adoption of Innovations: Evidence from Discriminant Analysis." Economic Geography 5:145-156.

Zadeh, L.A. 1965. "Fuzzy Sets." Information and Control 8:338-353.

--------. 1976. "Fuzzy Sets and their Application to Pattern Classification and Clustering Analysis." Classification and Clustering, Ed. J. Van Ryzin. Academic Press Inc.

Zobler, L. 1958. "Decision Making in Regional Construction." Annals of the Association of American Geographers, 48:140-148.
