Unit 5 - Assessment
Learning objectives
At the end of this session the student
will be able to:
Define assessment, measurement and evaluation
Differentiate between assessment, measurement and evaluation
Identify types of assessment and evaluation
Describe the qualities of a measuring tool
Is there a difference between assessment, measurement and evaluation?
During the process of gathering information for effective planning and instruction, the words measurement, assessment and evaluation are often used interchangeably.
These words, however, have significantly different meanings.
When defined within an educational setting,
assessment, evaluation, measurement and
testing are all used to measure
◦ how much of the assigned materials students
are mastering,
◦ how well students are learning the materials, and
◦ how well students are meeting the stated goals and objectives.
Assessment
Is the process of documenting, usually in
measurable terms, knowledge, skills, attitudes and
beliefs.
Assessment in education is the process of
gathering, interpreting, recording & using
information about pupils’ responses to an
educational task. (Harlen, Gipps, Broadfoot & Nuttall, 1992)
One of the primary measurement tools in education
is the assessment.
Teachers gather information by giving tests,
conducting interviews and monitoring
behavior.
The assessment should be carefully prepared
and administered to ensure its reliability and
validity.
In other words, an assessment must provide
consistent results and it must measure what it
claims to measure.
Assessment can focus on:
the individual learner
the learning community (class, workshop, or other organized group of learners)
the institution or
the educational system
Assessment is the process of gathering quantitative and qualitative data about what a student can do and how much knowledge a student possesses.
TYPES OF ASSESSMENT
Assessment can be
Formal or informal
Formative or summative
Formal assessments
have data which support the conclusions made from the
test.
We usually refer to these types of tests as standardized measures.
The data are mathematically computed and summarized.
Scores such as percentiles and standard scores are most commonly given from this type of assessment.
Informal assessments
Content- and performance-driven rather than data-driven.
For example, a bedside presentation or a running record is an informal assessment: a running record indicates how well a student is reading a specific book. Scores such as 10 correct out of 15, percent of words read correctly, and most rubric scores come from this type of assessment.
Formative assessment
“A diagnostic use of assessment to provide
feedback to teachers and students over the course
of instruction.”
--- Carol Boston
Gathering of data while a program is being developed.
Provides feedback for the improvement of instruction or of the program.
Formative assessment is administered at the end of the day's lesson.
The goal of formative assessment is to monitor students' learning and to provide ongoing feedback that instructors can use to improve their teaching and students can use to improve their learning.
Determines whether the teacher delivered quality instruction on a particular day.
Summative assessment
attempts to assess student learning over a specific time period.
◦ E.g., a unit test
Used to determine the mastery and achievement of the student.
Done usually at the end of a chapter or unit.
Provides accountability for success or failure.
Used primarily in assigning grades.
Designed to determine the extent to which the instructional objectives have been achieved.
The goal of summative assessment is to
evaluate student learning at the end of an
instructional unit by comparing it against
some standard or benchmark
Summative assessment is used to:
diagnose students' strengths and weaknesses
assign grades
determine the teacher's effectiveness
monitor students' progress
evaluate teachers
Evaluation
Creating valid and reliable assessments is critical to accurately
measuring educational data. Evaluating the information gathered,
however, is equally important to the effective use of the
information for instruction.
Evaluation in education is a systematic process which enables the
extent to which the student has attained the educational
objective to be measured.
Evaluation always includes measurements (quantitative or
qualitative) plus a value judgement.
“The process of making overall judgment
about one’s work or a whole school
work”--- Cameron
“Evaluation is a process of determining
to what extent the educational objectives
are being realized” --- Ralph Tyler
Teachers use this information to judge the
relationship between what was intended by
the instruction and what was learned.
They evaluate the information gathered to
determine what students know and
understand, how far they have progressed
and how fast, and how their scores and
progress compare to those of other students.
Evaluation is concerned with a whole range of
issues in and beyond education; lessons,
programs, and skills can be evaluated.
It produces a global view of achievements
usually based on many different types of
information such as observation of lessons, test
scores, assessment reports, course documents
or interviews with students and teachers
TYPES OF EVALUATION
Process Evaluation
Product Evaluation
Process evaluation
It refers to evaluation taking place during
the program or learning activity.
It is conducted while the event to be evaluated is occurring and focuses on identifying progress towards purposes, objectives, or outcomes in order to improve the activities, courses, curriculum, program, teaching, or student learning.
It is also known as formative evaluation.
Product evaluation
Product evaluation examines the effects or outcomes of some object.
It is conducted at the end of a course.
It is also known as summative evaluation.
It evaluates progress towards established outcomes.
Purpose of evaluation is to:
Clarify and define objectives.
Facilitate the improvement of a program.
Motivate participants.
Establish and maintain standards to meet legal, professional and academic credentialing requirements.
Test the efficiency of teachers.
Measurement
Measurement is an act or process that involves the
assignment of numerical values to whatever is being
tested. So it involves the quantity of something
It simply means determining the attributes or
dimensions of an object, skill or knowledge.
It is the term used to describe the assignment of a
number to a given assessment.
The number can be a raw score or a score based on
a normal distribution curve
We use common objects in the physical world to
measure, such as tape measures, scales and
meters.
These measurement tools are held to standards
and can be used to obtain reliable results.
Some standard measurements in education are
raw scores, percentile ranks and standard
scores.
TEST
A test or quiz is used to examine someone's knowledge of
something to determine what he or she knows or has
learned.
Testing measures the level of skill or knowledge that has
been reached.
An instrument or activity used to accumulate data on a
person’s ability to perform a specified task.
A test may take the form of a tool, a question, or a set of questions.
A test is a form of questioning or measuring tool used to assess the status of one's skill, attitude and fitness.
KINDS OF TEST
Objective Test Vs Subjective Test
Individual Test Vs Group Test
Unstandardized Test Vs Standardized Test
Objective Test
a paper-and-pencil test where students' answers can be compared and quantified to yield a numerical score, because it requires convergent and specific responses.
Subjective Test
a paper-and-pencil test which is not easily quantified, as students are given the freedom to write their answer to a question, such as an essay test.
Thus, the answers to this type of test are divergent.
Individual Test -
a test administered to one student at a time.
Group Test -
a test administered to a group of students.
Unstandardized Test -
a test prepared by teachers for use in the classroom, with no established norms for scoring and interpretation of results.
Standardized Test -
a test prepared by an expert or specialist; it samples behavior under uniform procedures.
PURPOSE OF TEST & MEASUREMENT
For gaining knowledge about students' progress.
For preparing effective planning.
For knowing abilities and capacities.
For giving motivation.
For predicting future achievement.
For research and experimentation.
Qualities of a measuring tool
Among the qualities of a test, whatever its nature, four are essential:
1. Validity
2. Reliability
3. Objectivity
4. Practicability
1. Validity
Is the extent to which the instrument really measures
what it is intended to measure.
The validity of the test concerns what the test measures
and how well it does so.
A valid measurement tool does a good job of measuring the concept that it purports to measure.
For example, to judge the validity of any test it is necessary to know what the test is for and what purpose it serves.
It is important to remember that the validity
of an instrument only applies to a specific
purpose with a specific group of people
For example, a scale is not considered simply “valid” or “invalid”; rather, it might be considered valid for measuring social responsibility outcomes with college freshmen.
No outside factors should be allowed to interfere
with the manner in which the evaluation is
carried out.
The notion of validity is a very relative one. It
implies a concept of degree, i.e., one may speak
of very valid, moderately valid or not very
valid results
Types of validity
Validity of a test is classified into 4
types:
Content validity
Concurrent validity
Predictive validity
Construct Validity
Content validity
refers to the extent to which the content of
the test represents the content of the course.
In addition, a well-constructed test should cover not only the subject matter but also the instructional objectives and the three main domains: cognitive, affective and psychomotor.
Concurrent validity-
refers to the degree to which the test correlates with an accepted criterion measure.
This criterion is available at the time of testing.
Concurrent validity is established statistically, by correlating test results with the criterion.
Predictive validity -
applies when the results of a test are to be used for predicting the performance of a student in another domain or in another situation.
It is determined by questions like:
◦ To what extent do the results obtained in physiology help to predict performance in pathology?
◦ To what extent do the results obtained during the pre-clinical years help in predicting the success of students during the clinical years?
Predictive validity relates a student's actual performance on a test to later achievement, so that future outcomes can be predicted for the test taker.
Such predictions can only be validated after a long period.
Construct Validity
This measures the extent to which the test reflects a theoretical trait.
Test items may therefore involve mental factors such as intelligence, reading comprehension, critical thinking, or mathematical aptitude.
2. Reliability
Reliability means accuracy and consistency.
Reliability refers to the extent to which a test is consistent, stable and dependable.
A reliable test is consistent in its results: taking the test again and again will not change the result; it gives the same result every time.
Example: suppose a student scores 70 marks on a maths paper on Wednesday, and on the same test the next Wednesday scores 25 marks; we cannot rely on this data.
Factors that limit reliability include the inconsistency of a single test administration, sampling of only certain areas of the subject matter, and the disturbed mind of the examinee, all of which can affect the score.
3. Objectivity: this is the extent to which several
independent and competent examiners agree on what
constitutes an acceptable level of performance.
4. Practicability: the extent to which the test can be used without much expenditure of money, effort and time.
Practicability depends upon the time required to construct an examination, to administer and score it, and to interpret the results, and on its overall simplicity of use.
It should never take precedence over the validity of the test.
Factors that determine practicability
Administrability - the test can be administered with clarity, ease, and uniformity.
Directions should be simple, concise, and clear, specifying time limits, sample questions, and oral instructions.
Scoreability - concerns how easily the test is scored.
A good test is easy to score: the scoring directions and scoring key are simple, and an answer key is available.
Most noteworthy, the test score is useful for the evaluation of students.
Testing
Testing consists of four primary steps:
1. Test construction
2. Test administration
3. Test scoring, and
4. Analyzing the test.
1. TEST CONSTRUCTION
Eight basic steps in constructing a test:
1. Defining the purpose
2. Listing the topics
3. Listing types of questions
4. Writing items
5. Reviewing items
6. Writing directions
7. Devising a scoring key
8. Evaluating a test
1. Defining the purpose.
determine who is taking the test,
why the test is being taken, and
how the scores will be used.
2. Listing the topics
representative sampling- once
the purpose and parameters
have been established, specific
topics are listed and examined
for their relative importance in
the section.
3. Listing types of questions.
Different types of material call for different types of test questions.
MCQs, for example, suit testing knowledge of mathematics; essays suit testing a student's understanding of literature or philosophy.
In deciding what types of test
questions to use (short answer,
essay, true/false, matching,
multiple choice, etc.)
the following advantages and
disadvantages should be kept in
mind:
Short Answer
Advantages: can test many facts in a short time; fairly easy to score; excellent format for math.
Disadvantages: difficult to measure complex learning; often ambiguous; tests recall.
Essay
Advantages: can test complex learning; can evaluate thinking process and creativity.
Disadvantages: difficult to score objectively; uses a great deal of testing time; subjective.
True/False
Advantages: tests the most facts in the shortest time; easy to score; objective.
Disadvantages: difficult to measure complex learning; tests recognition; difficult to write reliable items; subject to guessing.
Matching
Advantages: excellent for testing associations and recognition of facts; although terse, can test complex learning (especially concepts); objective.
Disadvantages: difficult to write good items; subject to process of elimination.
Multiple Choice
Advantages: can evaluate learning at all levels of complexity; can be highly reliable; objective; tests a fairly large knowledge base in a short time; easy to score.
Disadvantages: difficult to write; somewhat subject to guessing.
4 important considerations in choosing the types of questions to be used on a test
1. Classroom conditions
Answers to multiple-choice questions can be easily copied in an overcrowded classroom, and blackboards might be impractical for long questions.
2. Administration and scoring
Numbers of students, time constraints,
and other factors might necessitate the
use of questions which can be
administered and scored quickly and
easily.
3. The types of knowledge being
tested
A simplified checklist could be used by
the teacher to determine if students
have been assessed in all relevant areas.
4. Writing items:
Once purpose, topics and types
of questions have been
determined, the teacher is ready
to begin writing the specific
parts, or items, of the test.
Initially, more items should be written than will be included on the test.
Guidelines When writing items:
Cover important material
Items should be independent.
◦ The answer to one item should not be
found in another item; correctly
answering one item should not be
dependent on correctly answering a
previous item.
Write simply and clearly.
◦ Use only terms and examples students
will understand
Be sure students know how to respond.
◦ students who understand the
material will know what type of
answer is required and how to record
their answers.
◦ For example, on essay questions, the
teacher may specify the length and
scope of the answer required.
Include questions of varying
difficulty.
◦ Order items from the easiest to the most difficult so as not to immediately discourage the weaker students.
Be flexible.
◦ Whenever feasible, any test should
contain several types of items.
5. Reviewing items
Regardless of how skilled the
teacher is, not all his/her first efforts
will be perfect or even acceptable.
Items should be reviewed, revising the good ones and eliminating the bad. All items should be evaluated in terms of purpose, standardization, validity, practicality, efficiency, and fairness.
6. Writing directions
Clear and concise directions should
be written for each section.
Whenever possible, an example of a
correctly answered test item should
be provided as a model.
If there is any question as to the
clarity of the directions, the teacher
should "try them out" on someone
else before giving the exam.
7. Devising a scoring key
While the test items are fresh in
his/her mind, the teacher should
make a scoring key.
scoring key -- a list of correct
responses, acceptable variations,
and weights assigned to each
response .
In order to assure representative
sampling, all items should be
assigned values at this time.
For example, if "factoring"
comprised 50% of class material
to be tested and only 25% of the
total number of test questions,
each question should be assigned
double value.
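As an illustration of that weighting rule, here is a minimal sketch (topic names and numbers are hypothetical) that derives the per-question multiplier from the two percentages:

```python
# Hypothetical sketch: weight questions so that each topic's share of the
# total points matches its share of the class material to be tested.
material_share = {"factoring": 0.50, "other topics": 0.50}  # share of material
question_share = {"factoring": 0.25, "other topics": 0.75}  # share of test items

for topic, m_share in material_share.items():
    # factoring: 0.50 / 0.25 = 2.0, so each factoring question gets double value
    weight = m_share / question_share[topic]
    print(f"{topic}: multiply each question's value by {weight:g}")
```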
8. Evaluating A Test
All methods of assessing student
learning should achieve the same
thing: the clear, consistent and
systematic measurement of a behavior
or something that is learned.
Once a test has been constructed, it
should be reviewed to ensure that it
meets six specific criteria:
◦ clarity, consistency, validity,
practicality, efficiency, and fairness.
The following is a checklist of
questions that should be asked
after the test (or any assessment
activity) has been prepared and
before it is administered:
A CLEARLY DEFINED PURPOSE
Who is being assessed?
What material is the test (or activity) measuring?
What kinds of knowledge or skills is the test (or activity) measuring?
Do the tasks or test items relate to the objectives?
STANDARDIZATION OF CONTENT
Are content, administration, and scoring consistent in all groups?
VALIDITY
Is this test (or activity) a representative sampling of the material presented in this section?
Does this test (or activity) faithfully reflect the level of difficulty of material covered in the class?
PRACTICABILITY AND EFFICIENCY
Will the students have enough time to finish the test (or activity)?
Are there sufficient materials available to present the test or complete the activity effectively?
What problems might arise due to structural or material difficulties or shortages?
FAIRNESS
Did the teacher adequately prepare students for this activity/test?
Were they given advance notice?
Did they understand the testing procedure?
How will the scores affect the students' lives?
Activities for evaluation
1. Make a statement of fact.
Now write it as a test item in the
form of multiple choice, matching,
true/false, and short answer.
If you were to include this item on a
test, which format would you
choose?
2. Write directions for the format
you chose in activity one and read
them to someone else.
Are they clear? Concise?
Understandable?
3. Take a test that you have
designed.
Before you administer it, use the checklist to evaluate it.
2. TEST ADMINISTRATION
Once the items, directions, and answer key have been written, the teacher should consider in advance the manner in which the test will be presented.
Factors such as duplication, visual aids, and use of the blackboard should be thought through beforehand to ensure clarity of presentation as well as to avoid technical difficulties.
Principles in test administration
a. Establish Classroom Policy
b. Teaching Test-Taking
Techniques
A. Establish Classroom Policy
discipline is a major factor
Establish a classroom policy concerning
such matters as tardiness, absences, make-ups, leaving the room, and cheating.
Advise students of procedural rules such as:
◦ What to do if they have any questions.
◦ What to do when they are finished taking the
test.
◦ What to do if they run out of paper or need a new pen.
◦ What to do if they run out of time.
B. Teaching Test-Taking Techniques
The teacher should familiarize his/her students
with:
◦ The type of test and how to study for it (proficiency or other).
◦ The types of items and how to respond to them (e.g. matching, fill in the blank, essay questions, etc.).
◦ The types of directions.
3. TEST SCORING
Points under test scoring
Raw Score
Transforming Raw Scores
Weighting Test Items
Deriving Percentages
Assuring Objectivity
Using a scoring key
Assigning a specific value to each
test item or activity component.
Scores can be raw or transformed to fit the requirements of testing within specific contexts.
Raw Score is
the number of items answered correctly.
If a student answers eight out of ten
items correctly, his/her raw score is
eight.
Transforming Raw Scores
Raw scores can be transformed into fractions, decimals, or multiples of their raw value in order to make test totals match a predetermined number.
E.g., if the desired result is a score out of 20 but a test includes 30 questions, score each item as a fraction (2/3 point each) or as a decimal (0.66 each).
Likewise, if a test has only 10 questions, each can be multiplied by two to obtain a score out of 20.
Weighting Test Items
Some questions are more important or more difficult than others, so items may carry different values: for example, 1 point each for major items and 1/4 point (0.25) for lesser ones.
Deriving Percentages
Transforming raw scores into percentages.
Used to compare, on equal terms, tests of varying length and difficulty or tests worth varying numbers of points.
If all items on a test are worth the same
amount
◦ Percent correct = (Number of items correct ) /
(Total number of items) x 100%
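The formula above as a one-line helper (a sketch, not part of the original material):

```python
def percent_correct(num_correct, total_items):
    # Percent correct = (number of items correct / total number of items) x 100%
    return num_correct / total_items * 100

print(percent_correct(8, 10))  # raw score 8 of 10 -> 80.0%
```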
Assuring Objectivity
As with test construction, the key
to successful test scoring is
objectivity.
By setting certain standards and
prescribing certain rules, the
teacher can be sure that scoring
has been objective and students
have been treated fairly.
Three techniques are particularly
helpful in assuring objectivity:
◦ immediate scoring & recording
(to alleviate misunderstanding and
bias)
◦ using a scoring key
◦ having a procedure for comparing
responses to the key
Using a scoring key
can make scoring papers go quickly, reducing the possibility of error and bias.
It simplifies and standardizes the process of scoring if numerous people will be scoring the test.
It increases objectivity.
This technique is particularly useful with essay tests, where it is important to look for key points in each response.
4. ANALYSING TEST RESULTS
After scoring, tests can be analyzed in numerous ways to provide the teacher with information about student performance.
For example:
tests from one semester can be ranked to show
relative areas of strength and weakness
averaged class scores on a given test can be ranked
to compare one class's performance to that of
another.
Importance:
for making decisions about lesson planning
and future testing
To know how to approach different students
and classes.
In order to analyze anything, specific criteria
must be established
In test analysis, three different criteria are
generally used:
the content of the test
◦ Criterion-Referenced Scoring
the norm group taking the test, or
◦ Norm-Referenced Scoring
an individual student
◦ Self-Referenced Scoring
Criterion-Referenced Scoring
Criterion-referenced scoring uses the content of the test itself as the basis of comparison for assessing the student's level of achievement.
Thus, a content-referenced score of 80% means
that the student correctly answered 80% of the
items on the test.
It is the most common of all methods of test analysis.
Criterion- or content-referenced scoring is used:
to determine the level of achievement at
which to begin a student
to determine how much a student has
learned from a given section of material; and
to determine a student's potential in a given
field.
Norm-Referenced Scoring
Sometimes referred to as "grading on a
curve," norm-referenced scoring uses the
class as a whole as a referent.
The class average, or mean, usually serves
as the base score against which all other
grades are judged.
Self-Referenced Scoring
Though it is difficult to do in large classes, self-
referenced scoring measures an individual
student's rate of progress relative to his or her
own past performance.
By comparing past test scores, a teacher can
assess a student's rate of progress in a given
subject area or across subjects to see where
he/she is in need of help.
The advantages and disadvantages of
Criterion-, Norm- and Self-Referenced scoring
are listed below:
Norm-referenced grading
Advantages:
1. Allows for comparisons among students.
2. Classes can be compared to other classes.
3. Allows the teacher to spot students who are dropping behind the class.
Disadvantages:
1. If the class does well, some students still get poor grades.
2. If the class as a whole does poorly, a good grade could be misleading.
3. Does not allow individual progress or individual circumstances to be considered.
4. The whole class (or large portions of it) must be evaluated in the same way.
5. Everyone in the class (or norm group) must be evaluated with the same instrument under the same conditions.
Analysis also can be done through
Percentile Ranking
Charting Student Performance
Percentile Ranking
Just as the raw scores for individual test items can
be transformed to fit a certain testing model (e.g.
Francophone testing - score/20), so can one set of
test results be analyzed in relation to previous
tests as well as other classes' performances.
Percentile ranks offer a way to obtain an image of
class performance on a test by calculating the
percentage of persons who obtain lower scores.
To obtain a percentile rank,
divide the number of students scoring below the passing grade by the total number of students who took the test.
For example, if 10 students out of 30 get passing scores (50% and above), then 20 of the 30 students scored below the passing mark, and the percentile figure for that test would be about 66% -- that is, 66% of that class rank below the passing score.
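A sketch of that class-level calculation, with hypothetical scores chosen to reproduce the example (10 of 30 students at or above the 50% passing mark):

```python
def percent_below_passing(scores, passing=50):
    # Divide the number of students below the passing grade
    # by the total number of students who took the test.
    below = sum(1 for s in scores if s < passing)
    return below / len(scores) * 100

scores = [55] * 10 + [40] * 20   # 10 passing, 20 below passing
print(round(percent_below_passing(scores), 1))  # 66.7 (the "66%" above)
```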
Charting Student Performance
Just as percentile ranking can give a teacher a
comparative measure of class performance,
charting the results of a test can give the
teacher an internal picture of how his/her
class has performed as a whole.
A graph of the score distribution, for example, can clearly and graphically show whether the majority of the students in the class passed or failed the test.
Item Analysis (Evaluation of Objective Tests)
Item Analysis
Useful in preparing question banks and reviewing questions.
Provides quality control for tests and examinations.
Done by calculating the difficulty index and the discrimination index.
Purposes of Item Analysis
1. Evaluates the response pattern of each item within the group tested: what percentage answered each option.
2. Evaluates the mastery of course content.
3. Provides information about the level of difficulty of each question and the ability of the test item to discriminate between good and poor students.
4. It improves the reliability of objective tests.
5. It provides a basis for revising and restructuring tests.
6. It provides a basis for retaining or deleting specific items.
Conditions for application of
Item Analysis
It applies to norm-referenced (relative) tests: the procedure leads to a choice of questions that tend to maximize variance and ensure discriminatory ranking.
It is applicable only to questions scored dichotomously (1 or 0), such as MCQs.
It should not be applied if the total number of students is very small (a minimum of 20 students could be proposed as a “pragmatic” criterion).
Difficulty Index
An index for measuring the
easiness or difficulty of a test
question.
It is the percentage of students
who have correctly answered a
test question (easiness index).
It varies from 0 to 100%.

Difficulty index = (H + L) / N x 100

where:
H = number of correct answers in the high group
L = number of correct answers in the low group
N = total number of students in both groups combined
Discrimination index
An indicator showing how significantly a question discriminates between high- and low-scoring students (varies from -1 to +1).

Discrimination index = 2 x (H - L) / N
Steps in Item Analysis
• Award a score to each student.
• Rank in order of merit, proceeding from the highest to the lowest score.
• Identify the high and low groups (preferably the top and bottom thirds).
• Calculate the difficulty index of each question.
• Calculate the discrimination index of each question.
• Critically evaluate each question; this can be done manually as well as using computers.
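The steps above can be followed mechanically. Below is a minimal sketch (data is hypothetical): students are split into top and bottom thirds by total score, and the item is scored 1/0 as the slides require.

```python
# Each tuple: (student's total test score, 1 if this item was answered correctly else 0)
students = [(95, 1), (90, 1), (88, 1), (75, 1), (70, 0), (65, 1),
            (60, 0), (55, 1), (50, 0), (45, 0), (40, 0), (35, 0)]

# Steps 1-2: award scores and rank in order of merit, highest first.
students.sort(key=lambda s: s[0], reverse=True)

# Step 3: identify the high and low groups (top and bottom thirds).
third = len(students) // 3
high, low = students[:third], students[-third:]

# Steps 4-5: calculate the two indices for this item.
H = sum(item for _, item in high)   # correct answers in the high group
L = sum(item for _, item in low)    # correct answers in the low group
N = len(high) + len(low)            # students in both groups combined

print(f"difficulty index: {(H + L) / N * 100:.1f}%")   # easiness, 0-100%
print(f"discrimination index: {2 * (H - L) / N:.2f}")  # ranges from -1 to +1
```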
For the difficulty index, a higher value denotes an easier question.
In principle, a question with a difficulty index between 25% and 75% is acceptable (in that range, the discrimination index is more likely to be high).
A test with difficulty indexes in the range of 50%-60% is very likely to be reliable.
For the discrimination index, a higher value means the question distinguishes more sharply between high and low scorers.
When a test is composed of questions with high discrimination indexes, it ensures a ranking that clearly discriminates between students according to their level of performance, i.e. it gives no advantage to the low group over the high group.
In other words, it helps you to identify the best students.
Interpreting the discrimination index:
0.4 and over: excellent question.
0.3: good question.
0.2: marginal question - revise.
Under 0.15: poor question - most likely to be discarded.
0: no discrimination at all.
A negative value indicates a bad item (low scorers outperform high scorers).
Exercise:
A group of 21 students is divided into 3 groups of 7 each on the basis of each student's total score. For the first item, 7 students in the high group and 4 in the low group got the right answer.
Calculate the difficulty index and the discrimination index, and give your decision based on the analysis.
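For checking your answer, the exercise's arithmetic can be worked through as follows (the final decision is yours, judged against the thresholds given on the previous slides):

```python
H, L = 7, 4        # correct answers in the high and low groups
N = 7 + 7          # students in the two extreme groups combined (middle group ignored)

difficulty = (H + L) / N * 100    # (7 + 4) / 14 * 100 = 78.6%, just above the 25-75% band
discrimination = 2 * (H - L) / N  # 2 * (7 - 4) / 14 = 0.43, in the "0.4 and over" range

print(f"difficulty index: {difficulty:.1f}%")
print(f"discrimination index: {discrimination:.2f}")
```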
THANK YOU!