Cluster Sampling
It is applicable when the area to be studied is extremely large and it is
not practical to collect information from every individual, so researchers
form small clusters or groups according to area or some other parameter.
Clustered sampling is a type of sampling where an entire
population is first divided into clusters or groups. Then, a random
sample of clusters is selected, and data is collected from the
selected clusters instead of from all the individuals in the entire
population. Cluster sampling is most often used in cases where it is
not practical to get a sample from the entire population.
A few examples of naturally occurring clusters are:
Geographic Clusters: To conduct a national survey, we might first
select a random sample of states or cities, and then survey all
individuals within those selected areas. This reduces the cost and
challenges associated with surveying individuals across the entire
country.
Schools or Classrooms: Generally, in educational research, we
might randomly select a sample of schools or classrooms and
then collect data from all students within those clusters.
Businesses or Organizations: When studying the performance of
businesses or organizations, we could randomly select a sample
of companies and then collect data from all employees within
those companies.
This type of sampling is useful when there is a large population or when
there is a natural grouping of the elements within the entire population,
some of which are mentioned above.
Steps to Perform Clustered Sampling
The steps to perform simple clustered sampling are as follows:
Step-1: Define the Population
First, we need to clearly define the population we want to study. This
can be a geographical area, an organization, or any other group of
interest.
[Figure: Population]
Step-2: Create Groups/Clusters
Now, we divide the population into clusters or groups that do not
overlap, so that every element belongs to exactly one cluster. Ideally,
each cluster should be a small-scale representation of the entire
population. Naturally occurring clusters, such as schools or cities, can
also be used.
[Figure: Create groups/clusters]
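The grouping in Step-2 can be sketched in a few lines of Python. This is only an illustration: the student IDs, school names, and the helper `build_clusters` are hypothetical and not part of the original notes.

```python
from collections import defaultdict

# Hypothetical population: (student_id, school) pairs, invented for illustration.
population = [
    ("s1", "School A"), ("s2", "School A"),
    ("s3", "School B"), ("s4", "School B"), ("s5", "School B"),
    ("s6", "School C"),
]

def build_clusters(units):
    """Group units by cluster label; each unit lands in exactly one cluster."""
    clusters = defaultdict(list)
    for unit_id, label in units:
        clusters[label].append(unit_id)
    return dict(clusters)

clusters = build_clusters(population)

# Sanity check: clusters do not overlap and together cover the whole population.
all_ids = [u for members in clusters.values() for u in members]
assert len(all_ids) == len(set(all_ids)) == len(population)
```

Because every unit carries exactly one label, the resulting clusters cannot overlap, which is the property Step-2 asks for.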
Step-3: Randomly Select Clusters
Because the clusters are assumed to be similar to one another, we can
now apply random sampling, i.e., select a random sample of clusters from
all the clusters formed. It's important that each cluster has a known
chance of being selected; in simple cluster sampling this chance is the
same for every cluster.
[Figure: Randomly select clusters]
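A minimal sketch of Step-3, assuming the clusters are identified by labels and drawn without replacement with equal probability. The function name `select_clusters` and the school labels are hypothetical.

```python
import random

def select_clusters(cluster_labels, n_selected, seed=None):
    """Draw n_selected clusters without replacement.

    Every cluster has the same inclusion probability,
    n_selected / len(cluster_labels).
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(cluster_labels), n_selected)

labels = [f"School {i}" for i in range(1, 11)]  # hypothetical list of 10 clusters
chosen = select_clusters(labels, n_selected=3, seed=42)

# Known and equal chance of selection for each cluster: 3 out of 10.
inclusion_probability = 3 / len(labels)
```

Recording the seed and the inclusion probability keeps the selection auditable, which matters when the results are later weighted or reproduced.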
Step-4: List Elements from Selected Cluster
Within each selected cluster, list all the elements of that cluster.
For example, if the selected cluster is the 8th-grade class in one
school, we need to list all the students in that class. This list makes
the data collection easier to organize and track.
Step-5: Collect Data
Collect data from every individual in the list we made. The data collection
can be done in various ways like surveys, interviews, observations, or
any other method according to the type of population and our topic of
interest.
[Figure: Data collection from clusters]
Step-6: Analyze the Data
The final step after collecting the data is to analyze it and draw
conclusions about the population. This can be done through various
data analysis techniques, and we can make decisions based on the
results obtained.
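The steps above can be put together in a short sketch. It assumes one-stage cluster sampling with equal selection probabilities and estimates the population mean as the mean of all observations in the selected clusters (total of the selected clusters divided by their combined size). The classroom labels, scores, and the function `cluster_sample_mean` are invented for illustration.

```python
import random

# Hypothetical data: test scores keyed by classroom (cluster). Values are invented.
scores_by_class = {
    "8A": [72, 65, 80, 77],
    "8B": [58, 61, 70],
    "8C": [90, 85, 88, 92, 79],
    "8D": [66, 74, 69, 71],
    "8E": [81, 77, 83],
}

def cluster_sample_mean(data, n_clusters, seed=None):
    """Select clusters at random, use every element in them, return the mean."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(data), n_clusters)            # Step-3: pick clusters
    observations = [x for c in chosen for x in data[c]]      # Step-4 and Step-5
    return chosen, sum(observations) / len(observations)     # Step-6: analyze

chosen, estimate = cluster_sample_mean(scores_by_class, n_clusters=2, seed=0)
print(chosen, round(estimate, 2))
```

With only two clusters selected, the estimate can differ noticeably from the true mean depending on which classrooms are drawn, so in practice more clusters are usually sampled.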
Advantages:
Low cost and frequently used: This method is usually
cheaper than other sampling methods, like simple random
sampling or stratified sampling. This is because data
collection is concentrated in a few clusters, which reduces
the effort of reaching individuals scattered across the
entire population.
Practical: It remains feasible when we cannot survey each
individual in a population, because clusters/groups are easier
to identify and access.
Smaller sampling frame: It requires a list of all clusters, but a
list of individuals only within the chosen clusters, which further
reduces cost.
Increased Efficiency: This method increases efficiency in data
collection when the clusters are naturally occurring groups (for
example, households, schools, geographic regions) that are easier
to sample together.
Disadvantages:
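Larger sampling error: For a comparable sample size, the error is
larger than with other probability sampling methods.
Multistage designs: Multistage cluster sampling can be very
expensive, and its validity depends on the other sampling
methods used at each stage.
Risk of Bias: If the clusters are not a good representation of the
entire population, or if the population is not evenly distributed
across them, the results may be biased or wrong.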
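The larger error for a comparable sample size can be illustrated with a small simulation. This is a sketch under stated assumptions, not a general proof: the population is synthetic, units within a cluster share a common cluster-level effect (so they resemble each other), and all numbers are invented.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 50 clusters of 20 units each. Units in the same
# cluster share a cluster-level effect, so they are more alike than units
# from different clusters.
clusters = []
for _ in range(50):
    effect = rng.gauss(0, 5)
    clusters.append([50 + effect + rng.gauss(0, 2) for _ in range(20)])
everyone = [x for c in clusters for x in c]

def cluster_mean():
    """Mean from 5 randomly chosen whole clusters (5 x 20 = 100 units)."""
    picked = rng.sample(clusters, 5)
    return statistics.fmean([x for c in picked for x in c])

def srs_mean():
    """Mean from 100 units drawn individually (simple random sampling)."""
    return statistics.fmean(rng.sample(everyone, 100))

reps = 2000
se_cluster = statistics.stdev([cluster_mean() for _ in range(reps)])
se_srs = statistics.stdev([srs_mean() for _ in range(reps)])
print(f"cluster SE: {se_cluster:.2f}, simple random SE: {se_srs:.2f}")
```

Both designs use 100 units per sample, yet the spread of the cluster-based estimates is wider here because each selected cluster contributes many similar observations. When units within a cluster are as varied as the population itself, the gap shrinks.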