Unit:1 Measurement and Evaluation in Education

Educational measurement involves quantifying changes in student behavior and characteristics through tools like tests and scales. It is a type of psychological measurement that is indirect and based on inferring traits from behavior samples rather than directly measuring abstract concepts. While educational measurement aims to be objective, it is inherently subjective due to relative scales, lack of a true zero point, and inability to fully capture the traits being measured.

Uploaded by

Mukul Saikia
  • Unit 1: Measurement and Evaluation in Education
  • Functions of Educational Measurement
  • Evaluation: Its Meaning, Characteristics, Basic Principles
  • Relation Between Measurement and Evaluation
  • Test, Examination and Evaluation
  • Nature of Educational Evaluation
  • Steps of Evaluation in Education
  • Importance of Evaluation in Education
  • Unit 2: Test Construction
  • Principles of Test Construction
  • Test Construction and Standardization
  • Trying Out the Test
  • Item Analysis
  • Unit 3: Measuring Tools
  • Errors in Measurement
  • Factors Affecting Validity of a Test
  • Unit 4: Intelligence Test
  • Unit 5: Personality Test
  • Unit 6: Aptitude, Interest and Attitude Test
  • Unit 7: Educational Achievement Test
  • Unit 8: New Trends in Evaluation

Unit 1: Measurement and Evaluation in Education

• Concept of Educational Measurement: its nature and functions
• Evaluation: its meaning, characteristics, basic principles
• Relationship between measurement and evaluation
• Test, examination and evaluation
• Steps of evaluation in education
• Importance of evaluation in education
Concept of Educational Measurement—
its nature, functions
• Measurement is an indispensable part of human life. Almost all aspects
of our daily life are touched by measurement of different forms.
• Ross & Stanley, in their book Measurement in today's schools wrote,
“From birth to death almost every aspect of our daily lives is touched
by measurement in its numerous forms. At birth the record of that
important event is carefully made according to the nurse’s watch.
During the next few days measurements of the baby’s weight and
temperature are part of the daily routine of the hospital. Ever
afterward, whether in school or outside, watches, clocks, balances,
thermometers, money systems, and other forms of measurement play
prominent roles in the life of every human being.”
• Thus, measurement, in its various forms, controls man’s life to a great
extent. Day-to-day activities like buying and selling, cooking, stitching,
constructing, diagnosing diseases and using medicines, and numerous
other such acts require measurement in its different forms.
• Hence, the concept of measurement is neither new nor difficult for us.
MEANING OF MEASUREMENT
• Measurement is the act or process of finding out the size, dimensions,
amount, degree, or quantity of something.
• It is a process by which some characteristic of an individual or some object
or event is presented in the form of some number or symbol according to
specified rule(s).
• Measurement ascertains dimensions such as: big or small; much or less;
long or short; high or low; of an object, event or other phenomena.
• Different units such as kilogram, metre, foot, inch, etc. and various tools
such as weighing scale, thermometers, measuring tapes etc are used for
making different types of measurement.
• For example, in order to measure the physical dimensions such as the length, breadth and height of a classroom, tools like a measuring tape or a ruler are required, and the measurement is expressed in units like metres, feet, or inches.
• Psychometricians have put forward many definitions of
measurement in order to explain the meaning of measurement. Some
of the popular definitions have been presented below:-
• According to S.S. Stevens, “Measurement means assignment of
numbers to objects or events according to rules”.
• According to James M. Bradfield, “Measurement is the process of
assigning symbols to dimensions in order to characterize the status
of a phenomenon as precisely as possible”.
• Guilford says, “Measurement means the description of data in terms
of numbers and this, in turn, means taking advantage of the many
benefits that operations with numbers and mathematical thinking
provide”.
• Thus, measurement means quantification of data. It involves
the process of presenting some object, event, or their characteristics
in symbolic or numeric form.
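Stevens' definition, quantification according to a rule, can be sketched in a few lines of Python; the answer key and the one-mark-per-correct-item rule here are hypothetical illustrations, not drawn from the text:

```python
# Sketch of S.S. Stevens' definition: measurement assigns numbers to
# events according to a stated rule. The rule here (hypothetical):
# one mark per response that matches the answer key.
ANSWER_KEY = ["b", "d", "a", "c", "b"]

def score_responses(responses, key=ANSWER_KEY):
    """Assign a number to a set of responses by a fixed rule."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)

print(score_responses(["b", "d", "c", "c", "b"]))  # prints 4
```

Changing the rule (for example, negative marking) would change the numbers assigned, which is exactly why measurement is defined by its rule.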
TYPES OF MEASUREMENT
• Physical measurement: Measurement that exists in the physical or
material world
• used to measure various physical or material things of the world.
• dimensions such as weight, height, length, breadth etc. are measured using
units such as gram, metre, foot, etc. which are of definite value and have
universal standard.
• That is why physical measurement is simple and objective.
• Some characteristics of physical measurement are:-
• starts from a true zero point. E.g., when the length of something is
measured to be ten feet, it is implied that the length of the object is ten feet
from zero. Zero, in physical measurement, indicates the absolute absence
of the dimension.
• complete in itself; it is not dependent upon any other thing.
• direct, definite, and objective measurement.
• the object or thing being measured can be measured completely.
• The units and the tools are definite, universal, and of same standard.
• units bearing the same name have the same value and hence they can be
subjected to direct comparison.
• Psychological measurement: exists in the
psychological or mental world and their
characteristics.
• generally qualitative measurement.
• Unlike physical measurement, psychological
measurement is indirect, because psychological
characteristics or traits are abstract and hence
cannot be subjected to direct measurement.
• Such traits can be measured indirectly only when
they find expression in the behaviour of man.
• That is why psychological measurement is a
complex process.
• Moreover, in the absence of definite and standard units, this measurement is generally subjective and incomplete.
• Some of the important characteristics are:-
• does not have a true zero point. Zero, in this type of measurement, is not absolute but relative. E.g., a score of zero on an intelligence test does not indicate a total absence of intelligence.
• not complete in itself; it is relative and is dependent upon various
other factors.
• indirect, indefinite, and subjective.
• incomplete, because the psychological trait or characteristic being
measured can never be measured completely. For example, in order
to measure intelligence of an individual, only a sample of his
behaviour can be measured and his total behaviour cannot be
measured.
• The tools and units are not of definite or universal standard. Hence
psychological measurement does not have universal acceptability.
• Direct comparison between different measurements is not possible
since tools and units are not of definite or universal standard.
MEANING OF EDUCATIONAL MEASUREMENT
• Education refers to the deliberate effort made to
bring desirable change in student behaviour.
• In order to effect desirable change in student
behaviour, selected learning experiences are
provided to students using a curriculum in the
form of teaching.
• At the end of teaching, the act of ascertaining or
determining the changes (that are thought to have
taken place as a result of teaching) in student
behaviour is known as educational measurement.
• In order to ascertain the changes in student behaviour,
characteristics of students, such as their knowledge,
skills, abilities, and interests are assigned with
numbers on the basis of an established set of rules.
• This process of quantifying the influence or
effectiveness of learning experiences in terms of
modification of student behaviour may be called
educational measurement.
• According to R. P. Taneja, “It is the process of
quantifying any human attribute that is pertinent in
the context of education.”
• Thus, educational measurement is the measurement of behavior
change in the field of education.
• For such measurement collection and systematic organization of
data relating to the behavior of an individual in the field of
education is required.
• For this, the activities such as, construction of various educational
tests and their standardization; development of other tools of
measurement and their use; etc. are involved. Measurement of
educational achievement, intelligence, interest, ability, aptitude,
and other traits of students using various tests and other tools falls
within the scope of educational measurement.
• Hence, educational measurement may be understood as a part of
psychological measurement.
NATURE OF EDUCATIONAL MEASUREMENT
• The nature of educational measurement is unique, as revealed by the following characteristics:
• Educational measurement refers to the quantitative presentation of
different aspects of individual’s behaviour and their characteristics
in the field of education.
• In other words, educational measurement is the process of
assigning numbers or symbols to individuals or their characteristics
according to set rules.
• It also refers to the process of testing, scaling, or appraising the
outcomes of learning.
• It includes construction and administration of tools such as tests
and scales, their standardization and validation, use of statistical
techniques in the interpretation of obtained measures or test scores.
• Quantitative values used in educational measurement indicate the
presence, absence, or the intensity of the trait being measured.
• Does not start from a true zero point. Zero in educational
measurement refers only to a relative position.
• The value of units used in educational measurement is neither constant nor definite. That is why educational measurement is not completely free from subjectivity.
• Numerical data obtained by educational measurement does not
convey definite meaning in their primary or crude form. For
example, the score of 60 in an examination does not make proper
sense unless total marks in the examination; minimum marks
required to pass the examination; marks of other students in the
same examination; etc. are known.
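The point that a raw score acquires meaning only in context can be illustrated with a small sketch; the class scores, total marks, and pass mark below are invented for illustration:

```python
# A raw score of 60 gains meaning only alongside the test maximum,
# the pass mark, and the scores of other students (all hypothetical).
raw_score = 60
total_marks = 100
pass_marks = 33
class_scores = [42, 55, 60, 71, 48, 60, 35, 66]

percentage = 100 * raw_score / total_marks  # relative to the maximum
passed = raw_score >= pass_marks            # relative to the pass mark
# percentile rank: share of the group scoring below this student
percentile = 100 * sum(s < raw_score for s in class_scores) / len(class_scores)
print(percentage, passed, percentile)  # 60.0 True 50.0
```

The same raw score of 60 would tell a very different story if the test maximum were 200 or if every classmate had scored above 70.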
• educational measurement is based on samples of behaviours and
hence it is incomplete.
• Indirect and is based on inference. Educational achievement or
other characteristics of students cannot be measured directly and
must be measured indirectly on the basis of their answers or
behaviours in response to the questions asked or tasks assigned
to them.
• Numerical data obtained from different measurements cannot be
subjected to direct comparison since measurement in education
is relative and not absolute.
• Thus, educational measurement is indirect, indefinite, and
subjective unlike physical measurement which is direct,
definite, and objective. Though constant attempts have been
made to minimize the subjectivity and other limitations of
educational measurement by experts, it cannot claim the
accuracy and precision that characterizes physical measurement.
FUNCTIONS OF EDUCATIONAL
MEASUREMENT
(i) Instructional function: selection of objectives of
instruction, their adequacy, determination of their usability,
assessing the progress of students, determining the success
or failure of teaching methods and strategies, necessary
revision and reform of curriculum etc.
(ii) Selection: for admission and particular courses,
promotion or retention, various curricular as well as co-
curricular activities etc.
(iii) Classification: on the basis of abilities, interests, or other criteria; for selecting students for different courses or careers, forming groups or sections, for special instruction or guidance, for advanced studies and other purposes.
(iv) Comparison: on the basis of achievement, ability, aptitude, intelligence, and other traits or characteristics; also comparison between various schemes of instruction, teaching methods, and strategies, etc.
(v) Prediction: success or failure of students in particular
course, career or other related areas of activities.
(vi) Diagnosis: of strengths and weaknesses so that proper
measures may be taken
(vii) Guidance:
(viii) Administration: educational decisions, policy
formulation, assessing the effectiveness of administrative
measures, preparation of progress report, admission,
promotion, and other administrative decisions.
(ix) Research: Measurement in education helps immensely in the conduct of research for finding solutions to educational problems.
(x) Development: Educational measurement is very important for obtaining feedback about the success of teaching-learning and teaching methods, for collecting data to improve learning experiences and teaching strategies, and for making effective schemes of education to achieve the objectives of instruction.
Evaluation-Its meaning,
Characteristics, basic principles
• Evaluation is the process of supplying information for making
decisions.
• Decision making or the act of selecting one out of many
alternatives is based on making judgment or assigning values to
alternatives.
• Such act of value judgment, again, is based on certain
measurements, rules, objectives, or other criteria.
• The act of evaluation thus includes the tasks of collecting,
organizing, analyzing, and processing information so that
value judgment could be formed for making decisions.
• The concept of evaluation in education is relatively new.
• It is a process of making judgement about the worth or value of the
educative process and its outcome with reference to the objectives.
• It involves making judgments about the educational plans including
goals and objectives; curriculum, methods, and personnel; decision
about terminating, continuing, or modifying various programs.
• It is also a process of judging the extent to which change in the behaviour of students has been effected in the light of the set goals.
• It is not just an assessment or measurement of knowledge in a
subject or subjects.
• Rather, it is a comprehensive assessment of students' motivation, aptitude, attitude, interest, values, skills and other traits of personality.
• In fact, evaluation is the assessment of the dynamic changes of the
total personality of the individual.
• The NCERT in its booklet, “The concept of evaluation in
education” (1963), describes evaluation as the process of
determining three important aspects:-
• The extent to which an objective is being attained,
• The effectiveness of the learning experiences provided in
the classroom, and
• How well the goals of education have been accomplished.
• Thus, three important elements are closely interwoven in
the process of evaluation; they are:
– aims and objectives of education,
– learning experiences provided to attain the goals and objectives,
and
– change in student behaviour as a consequence of the learning
experiences provided.
• Thus, evaluation is always carried out with reference to the
aims and objectives.
• Hence, educational evaluation or the process of making
judgment about students’ progress must refer to the
predetermined goals of education.
• According to Bradfield and Murdock, “Evaluation is the
assignment of symbols to phenomenon in order to
characterize the worth or value of a phenomenon usually
with reference to some social, cultural or scientific
standard.”
• According to Cronbach, evaluation is “the collection and use
of information to make decisions about an educational
program.”
• Ralph W. Tyler defined evaluation as, “the process of
determining to what extent educational objectives are
actually being realized by the programme of curriculum and
instruction.”
• Bloom says, “Evaluation is concerned with securing evidence
on the attainment of specific objectives of instruction.”
• Thus, Evaluation is a very comprehensive
process.
• It helps to build an educational program, assess
its effectiveness, and helps to improve it.
• Evaluation is a much wider concept than
traditional examination which takes into
account the cognitive, affective, and psycho-
motor change in behaviour or learning
outcomes.
Relation between measurement and evaluation
• The concepts of measurement and evaluation are often confused and hence used
interchangeably.
• Measurement and evaluation, though similar, have different meanings.
• The concept of evaluation is much wider and more scientific than measurement.
• Measurement is only a part of evaluation. However, evaluation is based on
measurement.
• Measurement is a numerical score of an object, subject, or some other
characteristic, but evaluation is the act of adding subjective judgment or value to
that score or measurement.
• For example, determining the length and breadth of a piece of cloth is an act of
measurement; whereas, assessing the usefulness or adequacy of that piece of cloth
to stitch a dress of definite size is an act of evaluation.
• Similarly, if in a test of Mathematics a student scores 45, this is a measurement of his knowledge in Mathematics. Now, if his score is compared with those of other students appearing in the same test and he is given a rank or grade, then this is evaluation.
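This measurement-versus-evaluation example can be sketched in Python; the student names, scores, and grading cut-offs are hypothetical:

```python
# Measurement supplies the raw number (45); evaluation adds judgement by
# ranking it within the group and applying a grading standard.
scores = {"Anil": 45, "Bina": 62, "Chitra": 38, "Deep": 51}  # measurement

def rank_of(name):
    """Position of a student when scores are ordered highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1

def grade_of(score):
    """Hypothetical grading standard: A >= 60, B >= 45, else C."""
    return "A" if score >= 60 else "B" if score >= 45 else "C"

print(rank_of("Anil"), grade_of(scores["Anil"]))  # 3 B
```

Note that the dictionary alone is pure measurement; only the ranking and grading functions, which compare and judge, constitute evaluation.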
• Acc to Bradfield and Murdock, “Measurement is the process of
assigning symbols to dimensions in order to characterize the
status of a phenomenon as precisely as possible” whereas,
“Evaluation is the assignment of symbols to phenomenon in
order to characterize the worth or value of a phenomenon usually
with reference to some social, cultural or scientific standard.”
• Similarly, Munroe wrote, “In measurement the emphasis is upon
single aspect of subject-matter, achievement of specific skills and
abilities, whereas in evaluation the emphasis is upon broad
personality changes and major objectives of educational
programme.”

• A comparative analysis of the features of measurement and evaluation may help us understand the difference between them:

MEASUREMENT vs. EVALUATION
• Measurement is the act of assigning numerical value to things and ordering them; evaluation is the act of making subjective value judgement about things on the basis of information obtained by measurement.
• Measurement is objective, definite, and impersonal; evaluation is subjective, relatively indefinite, and personally meaningful.
• Measurement is the quantification of objects or their characteristics; evaluation is a qualitative value judgement of the quantity.
• Measurement determines the presence, absence, or intensity of some variable being measured; evaluation gives normative value and interpretation to measures besides determining the presence or absence of a variable.
• Measurement is quantitative and numerical; evaluation is qualitative and normative.
• Measurement supplies the information required for making evaluation; evaluation uses the information supplied by measurement.
• Measurement is a discrete process and its scope is limited; evaluation is an integrated process and an integral part of education, and its scope is comprehensive.
• Measurement is concerned with immediate objectives and hence is present-oriented; evaluation is associated with long-term goals of education and hence is a continuous process.
• Measurement is scientific in nature; evaluation is philosophic in nature.

Test, Examination and Evaluation
• Test, examination, and evaluation are similar terms and are used
interchangeably at times.
• However, these similar terms have different meanings.
• Test and examinations are tools of evaluation and they have
differences in their meaning and scope.
• A test is a set of items or questions to be answered or tasks to be
performed on a certain area or topic.
• The testee is evaluated on the basis of his answers to the questions
or on the basis of his performance on the assigned tasks.
• An examination too, is a set of questions or tasks; however, an
examination includes many areas or topics unlike a test which is
focussed only on a single area or topic.
• Thus, the scope of an examination is wider than a test.
• An examination can be a collection of many interrelated or
interconnected tests.
• An examination may measure various psycho-physical
qualities of examinees, but a test measures a single dimension
of behaviour or personality of the testee.
• In the educational field, a test and an exam both test the knowledge of a student.
• So, in most cases tests and exams are synonyms.
• However, exams are held at the end of the academic session and include all subjects or disciplines.
• On the other hand, tests are held during the academic session and are based on particular units or topics of the curriculum.
• Tests are generally diagnostic and formative that aim at
checking achievements after a lesson or series of lessons,
while examinations are summative and they help in making
decision about promoting or retaining students at the end of
the academic session.
• A test and an exam are both tools of evaluation.
• A test aims at measuring a particular portion or part of an
object, or of a subject.
• Examination, in contrast, aims at measuring the object or
subject in its entirety.
• For example, a physician makes an examination of a patient
to diagnose diseases. Such physical examination involves
many tests, such as, blood test, urine and stool test, etc.
• Evaluation is a concept which is wider than test and
examination.
• Various tests and exams are used as tools for evaluation.
• A test or an examination is usually concerned with any one dimension of the cognitive, affective, or psycho-motor behaviour of the individual, whereas evaluation is a comprehensive process which takes into account all aspects of the individual's behaviour: cognitive, affective, as well as psycho-motor.
• Besides, evaluation is a continuous process which uses
the information obtained from different tests and
examinations held from time to time.
NATURE OF EDUCATIONAL EVALUATION
• Following characteristics reveal the nature of evaluation.
• The concept of evaluation, which is relatively new in the field of
education, is more scientific and comprehensive than
examination or measurement.
• Evaluation is quantitative as well as qualitative measure of a
phenomenon. It is the act of adding value judgement to an
assessment or measurement.
• Although evaluation depends on measurement, evaluation and
measurement are not same; measurement is only a part of
evaluation. Measurement assigns quantitative score and
evaluation makes value judgement on that score.
• Evaluation is a process of making judgement about the value or
worth of something, and hence it is subjective.
• Evaluation is based on some social, cultural, or scientific
standards and/or goals.
• Educational evaluation is closely associated with the goals
and objectives of education.
• Evaluation is a continuous and systematic process which
takes into account all the three domains of students’
behaviour: cognitive, affective, and conative or psycho-
motor.
• Evaluation is not just a process of assessing knowledge in
certain subjects; rather it is an elaborate assessment of
students’ motivation, aptitude, attitude, interest, values,
skills, and other psycho-physical traits. In fact, evaluation
is the assessment of the total personality of the individual.
• Evaluation is a continuous process unlike tests or
examinations that are held periodically or at the end
of academic session.
• Since evaluation is continuous, no value or measure
should be considered final or absolute.
• Three vital elements are closely interwoven in the
process of educational evaluation; they are: aims and
objectives of education, learning experiences
provided to attain the goals and objectives, and
change in student behaviour as a consequence of the
learning experiences provided.
• The basic nature of evaluation is philosophical and
normative.
Steps of evaluation in education
• A systematic program of evaluation must follow the following
steps:
• Determining the objectives of evaluation: Deciding what to
evaluate is the first important step in the evaluation process.
Evaluation starts with formulating and stating the objectives of
evaluation clearly.
• Defining the objectives in behavioural terms: The second
important step in educational evaluation is defining the
objectives in behavioural terms. Because broad or vaguely
stated objectives cannot be measured and evaluated. For
example, if we want to evaluate “study skills” then it needs to be
defined by listing what the students are expected to acquire.
• Selecting and fixing criteria or standards for
evaluation: Evaluation is always done by comparing
observations or data with predetermined criteria or
standards. Hence, it is important at this stage to select
and fix such criteria or standard for making
evaluation.
• Selection and development of tools and techniques
for data collection: Selecting appropriate tools and
techniques for collecting relevant data is next
important step in evaluation. The evaluator may have
to develop his own tool and technique if available
tools do not fit into his purpose of evaluation.
• Collecting and recording data: Data are collected using the selected tools and recorded in an appropriate form.
• Analyses and interpretation of data: After collection and
recording, data are analysed using various statistical and
other procedures so that the data could be interpreted
and valid conclusions can be arrived at. The collected
data are then compared with the previously determined
standards or criteria for evaluation.
• Evaluation and decision making: A decision about the value of the product or process being evaluated is taken, and judgement is made about its adequacy or effectiveness on the basis of analysis, interpretation, and comparison with the standards, criteria, and/or goals.
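The steps above can be condensed into a minimal sketch; the 75% mastery criterion and the observed data are assumed purely for illustration:

```python
# Condensed evaluation cycle: objective -> criterion -> data collection
# -> analysis -> decision against the criterion (all values hypothetical).
criterion = 0.75                      # standard fixed in advance
observed = [1, 1, 0, 1, 1, 1, 0, 1]  # collected data: mastery per pupil

proportion = sum(observed) / len(observed)  # analysis of the data
decision = ("objective attained" if proportion >= criterion
            else "revise instruction")      # judgement against the criterion
print(proportion, decision)  # 0.75 objective attained
```

The essential point is the ordering: the criterion is fixed before the data are collected, so the final judgement compares evidence against a predetermined standard rather than an after-the-fact impression.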
Importance of evaluation in education
• Evaluation is an integral part of the teaching-learning process.
• The report of the Kothari Commission (1964-66), realizing the importance of evaluation, observed, “The new approach to evaluation will attempt to improve the written examination so that it becomes a valid and reliable measure of educational achievement and to devise techniques for measuring those important aspects of the student’s growth that cannot be measured by written examinations.”
• Some important points about the importance of evaluation may be
stated as under:
• Evaluation appraises the success and failure of education.
Evaluation helps in knowing the extent of learner’s progress
towards goals and the success and failure of the methods and
learning experiences in achieving the goals.
• Evaluation analyses and reviews the goals and objectives of
education. Evaluation starts with clarifying the goals and
objectives of education and their statement.
• Traditional examination system is defective and fails to make
proper evaluation of students. Evaluation makes continuous and
comprehensive assessment of learners which is more scientific.
• Evaluation is important in view of its role in discovering
individual differences among students. Such discovery of
differences among students helps to shape objectives, curricula,
and methods to suit different categories of students.
• Evaluation helps in promotion, placement in various courses and
career, and grading of students. It helps modifying learning
experiences or curriculum in order to attain the objectives of
education.
• Evaluation makes comprehensive appraisal of desired changes in
student behaviour and provides valuable insights about the needs,
possibilities, strengths, and weaknesses of students. Such
information about students makes instruction effective and useful.
• Evaluation helps in making necessary modification in the learning
experiences or the curriculum.
• Evaluation provides a basis for guidance. Evaluation provides a
total picture of the individual which helps to guide him in
educational and vocational field. Thus, success of educational,
vocational and personal guidance depends on proper evaluation of
students’ strengths, weakness and other personality characteristics.
Unit 2: Test Construction
General Procedure of Test Construction
and standardization
• Psychological test is one of the most important tools of
measurement and evaluation.
• A test is a collection of some questions to be answered or tasks
to be performed by the testee.
• The questions to be answered or the tasks to be performed are so
selected or framed that the response of the testee to such
question/task could make the trait to be measured explicit and
measureable.
• Thus, a psychological test may be understood as a set of
carefully selected stimuli to elicit a sample of behaviour required
to measure a particular trait or characteristic.
• According to Marshall, “A psychological test may be defined as a pattern of stimuli, selected and organised to elicit responses which reveal certain characteristics in the person who takes them.”
• According to Frank S. Freeman, “A psychological test is a standardized instrument designed to measure objectively one or more aspects of a total personality by means of samples of verbal or non-verbal responses, or by means of other behaviours.”
• Anastasi says, “A psychological test is essentially an objective measure of a sample of behaviour”.
NATURE OF PSYCHOLOGICAL TEST:
• A psychological test is a tool of psychological measurement.
• A psychological test is a set or pattern of stimuli, selected and organised to elicit responses.
• Psychological tests are tools of indirect measurement.
• Psychological tests require a testee to perform some observable and measurable behaviour.
20/04/20 MUKUL SAIKIA, DARRANG COLLEGE
• In a psychological test, the behaviour which is thought to be important in describing or understanding the trait or characteristic is used to measure that trait or characteristic.
• In psychological testing, scores or categories are assigned to individuals to represent abstract dimensions of behaviour, traits, or performance of individuals.
• A psychological test follows a standardized procedure. The procedure for administering a psychological test is uniform in all settings and with all examiners. Psychological tests are accompanied by an instructional manual for administration of the test.
• A psychological test always measures a limited sample of the behaviour. Tests depend on samples of behaviour to make inferences about the total domain of the relevant behaviour.
• Psychological tests aim to measure individual differences in terms of performance, traits, or characteristics.
• Psychological tests are accompanied by norms or standards calculated on the basis of the average performance of a group on the same test. Such norms help in interpreting the score or performance of an individual by comparing it with the norm for the group to which the individual belongs.
• The most important feature of a psychological test is that it makes inferences about the total domain of behaviour on the basis of the measurement of the sampled behaviour. Thus it makes predictions about the non-tested behaviour, performance, or traits of the testee.

PRINCIPLES OF TEST CONSTRUCTION:-
• Proper Planning: the area, objectives, type of items and
their effectiveness, cost, length, time and practicality
must be considered.
• Defining objectives in behavioural terms (what the
student is expected to do)
• Content to be covered should be clearly specified.
• A blueprint must be prepared including the number of
items under each topic.
• The instruction should be clear, unambiguous and
precise.
• Items must help in measuring all instructional objectives.

• Items must be appropriate for particular objectives.
• Items must test higher mental abilities rather than simple
recall of information.
• Test must be valid, reliable and objective.
• Items that are obvious, unimportant, meaningless, or ambiguous must be avoided.
• Language of the items should be clear and straightforward.
• Items without clear cut answer must be avoided.
• Tricky questions or items should be avoided.
• Items which furnish the answers to other items must be
avoided.

TEST CONSTRUCTION AND
STANDARDIZATION:-

• 1. Planning the Test:
– Decision about age, educational level, subject, number of items, size or length of test, time, total marks, printing, etc.
– Determination of the objectives
– Behavioural definition of objectives
– Analysis of content:
• content (maths/language/science/history, etc.)
• mental process (memory/analysis/reasoning, etc.)
• performance (type of performance to be measured)
– Preparation of the Blue-print:
• A test blueprint reflects the content of a test. It contains
the instructional objectives, the questions or tasks to
match the instructional objectives, and the learning
domains and levels therein.
• type of items (MCQ, Alternate response, true-false,
completion etc)
• Weightage /marks to be allotted to each topic/unit

• Preparing the first draft of the test: In this step
– items for the test,
– instructions to the testees, and
– the scoring key are prepared.
– Some guidelines for preparing items :
• Items must relate to the objectives of the test,
• Items must represent the whole content and all instructional
objectives.
• Should be clearly written and should not be ambiguous.
• Items should be of an appropriate difficulty level.
• Items must be grammatically correct and should not give any cue to
testees.
• Items without clear cut answer must not be included.

• More than one type of items should be included to make
the test interesting and representative
• Preliminary draft should contain more items than final
draft
• Items that measure learning rather than memory should
find place
• Items should be equally weighted
• Should be ordered in the increasing order of difficulty
• Items containing answers to other items must be avoided.
• Test items must be technically correct, i.e. reliable, valid
and objective.

• Another important task is preparation of instructions.
• Validity and reliability depend on the clarity of instructions.
• instructions for the test must:
– clarify the purpose of the test,
– mention the time allotted for completing the test,
– provide clear directions to testees about how to answer the items,
– state how the answers to items would be recorded,
– state how the effect of guessing would be dealt with, and
– explain the method of scoring the items (a scoring key is prepared).
• Expert’s opinion
• Small group try-out
• Trying out the test:
• Administered to a large and random representative sample of
the population
• The sample must include poor, good, and brilliant students.
• The physical and psychological condition should be normal.
• There should be proper invigilation and other required
conditions.
• Scoring should be done with the help of the scoring key and
procedure prepared beforehand.

• Trying out helps:
• to determine the validity, reliability, and usability of the test
• to improve defective items and delete unnecessary ones
• to determine the difficulty and discriminating power of items

• Item analysis:
• The main purpose is to improve the quality of test items
for future administration.
• The simple meaning of item analysis is determination of
difficulty level and discriminating power of test items.
• Item analysis helps in discarding the items that are too
easy or too difficult.
• Only items with average or moderate difficulty level
and discriminating power are retained.
• Distracter analysis in case of multiple choice type items.

• Preparing the final draft of the test:
• Items selected on the basis of item analysis are
put into the final draft
• Clear instructions are given
• time required for completing the test is
determined and recorded.

• Standardization:
• The main purpose is to determine the qualitative standard
and norms for the test. The main tasks involved:
– Administering the final draft on a very large group of students,
– Scoring the test booklets and arranging the scores for analyses,
– Calculating Mean and other statistical values of the scores,
– Determining the norm for the test on the basis of the statistical
values,
– Determining the reliability of the test items,
– Determining the validity of the test items,
– Considering the various aspects of usability of the test,
– Preparing test manual or guidelines for administering the test.

Meaning and characteristics of a standardized test:
• A standardized psychological test is a test that is
administered, scored, and interpreted in the same way
for all test-takers. A standardized test is characterized
by uniformity of procedures for administration and
scoring of the test. Owing to its uniformity of
procedure, the scores obtained by different persons on
standardized test are comparable.
• According to L. J. Cronbach, “a standardized test is one
in which procedure, apparatus, and scoring has been
fixed so that precisely the same test can be given at
different times and places”.
Characteristics of a standardized test
• A standardized test is administered under identical conditions; it does not matter where, when, by whom, or to whom,
• A standardized test is always scored objectively and in a
standard or consistent manner; the procedures are specified so
that different scorers arrive at the same scores for the same set of
responses,
• A standardized test requires all test takers to answer the same
questions in the same way,
• Scores obtained by different individuals or groups in
standardized test can be compared in order to assess the relative
performance.
• Standardized tests are usually designed to measure
relative performance of individuals or groups. Hence
establishment of norms for comparing the
performance of individuals or groups forms an
important part of a standardized test construction.
• The formulation of directions for a standardized test
is a major part of its construction. Directions for
standardized tests specify the materials to be used,
time limits to be given, oral instructions and
demonstrations to testees, and all other details of the
testing situation.
• Standardized tests are highly reliable and valid.
ITEM ANALYSIS:-
• A test is basically a collection of items.
• An item is a statement in the form of a question.
• E. Lindquist: “A scoring unit in an exercise”
• K. L. Bean: an item is “a single task or question that usually
cannot be broken down into any smaller unit”.
• the quality of a test depends on the quality of test items.
• validity, reliability, and objectivity of a test depend on the
quality of items.
• Item analysis suggests deletion of unnecessary items to make
it shorter and to increase its validity and reliability.
MEANING OF ITEM ANALYSIS:-
• Item analysis in its simplest meaning is the process of
determining relative difficulty level and discriminating
power of the test items.
• Abodunrin: Item analysis is “the process involved in
examining or analysing testee’s responses to each item on
a test with a basic intent of judging the quality of item
specifically, the difficulty and discriminating ability of the
item as well as effectiveness of each alternative.”
• Guilford: “The major goals of item analysis are the improvement of total score reliability or total score validity and the achievement of better item sequences and types of score distributions. Both the validity and reliability of a test depend on characteristics of its items. Item analysis ensures high reliability and validity into a test in advance.”
OBJECTIVES OR PURPOSES OF ITEM ANALYSIS:-
• improving the qualitative standard of items for future use.
• helping to construct appropriate, effective, high quality items.
• helping in deleting and/or revising the defective or
ineffective items.
• determining the relative difficulty level of each item
• determining the discriminating power of each item.
• Analyzing distracters.
• suggesting ways to improve items
• studying the strengths and weaknesses in educational
achievement of the test-takers.
• Finding out the nature of students’ achievement,
effectiveness of teaching, and the characteristics of
test items.
• Helping to construct better items by way of studying
the incorrect responses of the test-takers.
THE PROCESS OF ITEM ANALYSIS:-
• Mainly involves the tasks of determining difficulty
level and discriminating power of each item besides
analysis of distracters.
• Test experts believe that there is no need for analysing the responses of all the respondents;
• analysing the responses of a section of respondents may serve the purpose well.
• For this, two groups from among the respondents of
the items should be formed and named the ‘High
group’ and the ‘Low group’.
• In order to form these groups, all the respondents should be ranked on the basis of their scores in the test.
• Then, for convenience of statistical calculations, any percentage between the highest 25% and 35% of respondents should be included in the High group, and any percentage between the lowest 25% and 35% of respondents should be included in the Low group (the top and bottom 27% is a common choice).
• It is important to note in this connection that different formulas are used to determine the difficulty level and discriminating power of test items.
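The grouping step described above can be sketched in Python. This is a minimal illustration only; the function name and the 27% default are my own choices, not part of the text.

```python
def high_low_groups(scores, fraction=0.27):
    """Rank respondents by score and take the top and bottom `fraction`
    of them as the High and Low groups (any value between 0.25 and 0.35
    serves, as noted above; 0.27 is a common choice)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]

# Ten hypothetical respondents and their test scores:
scores = {"s0": 55, "s1": 40, "s2": 72, "s3": 65, "s4": 30,
          "s5": 88, "s6": 47, "s7": 59, "s8": 91, "s9": 35}
high, low = high_low_groups(scores)   # three students in each group
```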
DETERMINATION OF DIFFICULTY LEVEL:-
• The difficulty level of item is simply the percentage of
students who answer an item correctly.
• The item difficulty index ranges from 0.00 to 1.00; the
higher the value, the easier the item.
• If an item is answered correctly by everyone, or
everyone gets it wrong, then it must be understood that
the item is too easy or too difficult.
• Such item will not discriminate between good and poor
students.
• Hence items only with average difficulty level should be
included in a test.
• To determine the difficulty level of an item, the total number of students answering the item correctly in both high and low groups should be divided by the total number of students in the two groups. This calculation can be done using the following formula:

DI = (RH + RL) / (NH + NL)

• Where, DI = Difficulty Index
• RH = Number of respondents who attempted the item correctly in the High group
• RL = Number of respondents who attempted the item correctly in the Low group
• NH = Total number of respondents in the High group
• NL = Total number of respondents in the Low group
• For example, if an item is attempted correctly by 12 out of 20 students in the high group and 8 out of 20 students in the low group, then the index of difficulty level for the item would be:

DI = (12 + 8) / (20 + 20) = 20/40 = 0.50

• Difficulty index may range from 0.00 to 1.00; the higher the value, the easier the item.
• According to H.E. Garrett, any index of item
difficulty ranging from 0.40 to 0.60 is
acceptable.
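As a quick check of the arithmetic, the difficulty-index formula can be written as a small Python helper (a sketch; the function name is mine):

```python
def difficulty_index(rh, rl, nh, nl):
    """DI = (RH + RL) / (NH + NL): the proportion of the combined
    High and Low groups answering the item correctly."""
    return (rh + rl) / (nh + nl)

# The worked example from the text: 12/20 correct in the High group,
# 8/20 correct in the Low group.
di = difficulty_index(12, 8, 20, 20)   # (12 + 8) / 40 = 0.50
```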
• However, an item with proper difficulty level may not
be able to distinguish between good and poor learners
sometimes. For example, if an item is attempted
correctly by 10 out of 20 students in the high group and
10 out of 20 students in the low group, then the index of
difficulty level for the item would be 0.50;

• Difficulty level for the item above is acceptable, but the item could not make any discrimination between good and poor learners;
• On the other hand if an item is attempted correctly by all 20
out of 20 students in the high group and none out of 20
students in the low group, then the index of difficulty level
for the item would again be 0.50.

• But the discriminating power of this item would be the highest, because this item is answered correctly by all in the high group and by none in the low group.
• This example shows that an acceptable index of difficulty level alone cannot make for good items.
• This is why determination of discriminating power of an
item is indispensable part of item analysis.
• DETERMINATION OF DISCRIMINATING
POWER:-
• the discrimination index of an item is defined as “the
degree to which it discriminates between students of
high and low achievement”.
• There are various methods for determining
discriminating power of test item;
• one such method is to:
– subtract the number of students giving correct answers in the low group from the number of students giving correct answers in the high group, and
– divide that by the number of students either in the high or in the low group.
– This calculation can be done by using the following formula:

DP = (RH − RL) / N, where N = NH = NL
• For example, if an item is answered correctly by all the 20 students of the high group and by none in the low group, then the index of discriminating power of the item would be:

DP = (20 − 0) / 20 = +1.00

• In the same way, if an item is answered correctly by 10 out of 20 in the high group and 10 out of 20 in the low group, then the index of discriminating power of the item would be:

DP = (10 − 10) / 20 = 0.00
• index of discriminating power of an item may range
between +1.00 and –1.00.
• The highest level of discrimination of an item is
indicated by +1.00 and this happens when an item is
answered correctly by all the students of the high
group and by none in the low group.
• When an equal number of students from both high
and low group answer an item correctly, then the
index of discriminating power would be 0.00.
• And a negative index of discrimination is obtained when more students from the low group than from the high group answer the item correctly.
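The three cases just described can be checked with a small Python helper (a sketch; the negative-case counts are hypothetical):

```python
def discrimination_index(rh, rl, n):
    """DP = (RH - RL) / N, where N is the size of one group
    (High and Low groups are assumed equal in size)."""
    return (rh - rl) / n

dp_perfect  = discrimination_index(20, 0, 20)   # +1.00: highest discrimination
dp_zero     = discrimination_index(10, 10, 20)  # 0.00: no discrimination
dp_negative = discrimination_index(4, 12, 20)   # -0.40: Low group did better
```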
• The following table shows different indices of discriminating power and their interpretations (a commonly cited set of guidelines, after Ebel):

Index of discrimination   Interpretation
0.40 and above            Very good item
0.30 – 0.39               Reasonably good item
0.20 – 0.29               Marginal item, subject to improvement
Below 0.20                Poor item, to be rejected or revised
DISTRACTER ANALYSIS:-
• Distracter analysis is the process of determining
how the distracters were able to function
effectively by drawing the test takers away from
the correct answer.
• The process of distracter analysis involves
counting the number of times each distracter is
selected in order to determine the effectiveness
of the distracter.
• An effective distracter should be selected by a
significant number of respondents.
• Distracter analysis can also be done by using the formula for determining item discrimination. However, instead of a positive value, a negative value is acceptable for a distracter. In this method, the responses of the High and the Low groups to an item are tabulated and the discrimination index calculated for each alternative; in the example discussed below, the indices are: A (key) = +0.40, B = 0.00, C = +0.60, and D = a negative value.
• In the table, Alternative A is the key and hence a positive
value with a discrimination index of 0.40 is acceptable.
• The value for B is 0 and this means the distracter could not
discriminate.
• Hence, effectiveness of this distracter is questionable and
must be revised or removed.
• The positive value (0.60) obtained by C shows that more of
the good students selected this distracter. Hence, this
distracter again is not functioning effectively.
• Distracter D has functioned effectively by attracting more
students in the low group than in the high group.
• A distracter is effective when it seems to be the correct
answer to poor students.
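The per-alternative calculation can be sketched in Python. The response tallies below are hypothetical (they do not reproduce the exact indices in the example above), but they show the same pattern: a positive index for the key, zero and positive values for malfunctioning distracters, and a negative value for an effective one.

```python
def distracter_indices(high_counts, low_counts, n):
    """Apply the discrimination formula (H - L) / N to every alternative;
    for a distracter a negative value is desirable, since it should
    attract more Low-group than High-group students."""
    return {alt: (high_counts[alt] - low_counts[alt]) / n
            for alt in high_counts}

# Hypothetical response tallies (20 students per group; A is the key):
high = {"A": 14, "B": 2, "C": 4, "D": 0}
low  = {"A": 6,  "B": 2, "C": 0, "D": 12}
idx = distracter_indices(high, low, 20)
# A: +0.40 (acceptable for the key); B: 0.00 (non-discriminating);
# C: +0.20 (draws good students, so not functioning); D: -0.60 (effective)
```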
Unit: 3: Measuring tools

• Different types of Tools


• Errors in measurement,
• Characteristics of a good test
• Validity, Reliability, Objectivity and Norms (meaning, types, factors, and methods of determination)
Tools for Measurement:-

• Inventories: Inventories contain certain questions or statements
concerning some personal characteristics, liking, disliking,
interest, hobby etc which are written in the first person.
• Testees are required to respond to such questions or statements in terms of ‘true’ or ‘false’, ‘agree’ or ‘disagree’.
• Inventories are self-report tools.
• Depending upon the objectives, inventories are classified as
interest, adjustment, and personality inventories.
• Examples are: Strong interest inventory, Bell Adjustment
inventory, Minnesota Multiphasic Personality inventory etc.
• Inventories are economical, simple, and objective tools.
• However, inventory data may be biased or superficial. Another problem with inventories is that they are very difficult to validate.
• Attitude scale: a tool used to measure attitude of people towards
some issue, object or subject.
• An attitude is a learned emotional response set for or against some
object, individual, issue or other phenomenon. It is a mental or
neural readiness of an individual to respond or to act either in
positive or in negative way towards something.
• Attitude scale provides a basis for assigning a numerical value to a
person’s attitude.
• It is a tool designed to produce scores indicating the intensity and
direction (positive or negative) of a person’s feelings about an
object or event. 
• Attitude scales consist of statements and the respondent is asked to
respond to the statements and on the basis of his responses a score
is allotted for each item which indicates his position.
• The most popular attitude scales are the Thurstone, Likert, and
Guttman scales.
• Observation: It is the act of watching behaviour of an individual or a group,
events, or noting down physical characteristics in natural or actual setting.
• Observation helps to measure and evaluate external or overt behaviour of
individual.
• Observation can be natural or controlled;
• Natural observation is advantageous because of the fact that people behave
naturally if they do not know that they are being observed. However, natural
observation is not always possible or feasible and hence controlled observations
must also be used.
• Observations can also be participant or non-participant. In participant observation the observer becomes one among the people whose behaviour is being observed and takes part in their activities while observing them. In non-participant observation, the observation is conducted from a distance without interfering in the natural setting and activities of the individuals being observed.
• Observation must be systematic and carefully planned.
• The main advantage of the observation technique is that it allows data to be collected where and when an event or activity is occurring.
• However, observation is susceptible to observer bias. Moreover, people try to
perform better when they know they are being observed.
• Rating scale: Rating scales consist of some statements expressing
opinion or judgment concerning some situation, object, or character
on a scale of values. It consists of some points indicating the
observable trait in its different degrees. The rater is asked to rate
his choice on a 3, 5, or 7 point scale.
• Though there are different forms or types of rating scales, the
Numeric and Graphic rating scales are most commonly used.
• Though rating scales are very popular and widely used tools of
evaluation, they have several limitations. These include:
• Halo effect, or the tendency to rate on the basis of previous
performance or general impression about the person to be rated;
• Generosity error or the tendency of raters to be too generous
• Leniency error or the tendency to assign higher rate to people
known to the rater;
• Central tendency error or the tendency to rate around the average;
and other such errors.
• Check list: Check list is a simple tool of
evaluation. It contains a list of behaviours or
questions and the respondent needs to check
‘yes’ or ‘no’ against the question or behaviour.
It is a very useful tool for identifying the
presence or absence of knowledge, skills, or
behaviours.
• Questionnaire: is a set of questions concerning some social,
psychological or educational topic or issue.
• A questionnaire may either be administered face to face or mailed.
• Questionnaire may be administered individually or in group.
• Questionnaires may be classified into two types: closed and open.
• Closed questionnaires are relatively objective, scoring is easy and fast, and answering such questionnaires is easy for the respondents.
• Open questionnaires can delve deep into the issue and provide
freedom to the respondents to answer in any length and way.
However, analysing and interpreting the open answers is a difficult
task.
• Chief advantages of questionnaire include their economy, ability to
cover wide sample, freedom to respondents, etc.
• However, the questionnaire is a very time-consuming tool and there is no check if respondents misunderstand or omit some questions. Moreover, the rate of return of mailed questionnaires is very low.
• Interview: a means of face to face and direct communication for
collecting required data. Once the interviewer establishes rapport
with interviewee and creates conducive environment for interview it
becomes easy to collect all important data from the interviewee.
Another big advantage of interview is that body language of the
respondent supplies additional information about the subject.
• Interviews may be classified into structured and unstructured types.
Unstructured interview is variously known as ‘focussed’, ‘depth’, or
‘non-directive’ interview.
• Interview is flexible and has high response rate, can elicit deep and
sensitive information.
• However, interview is costly, time consuming, invades privacy of
the respondents, and suffers from inhibitions due to face to face
presence of interviewer. Moreover, information obtained from
interview is difficult to interpret.
Errors in Measurement:
• The measurement error (also called Observational
Error) is defined as the difference between the
true or actual value and the measured value.
• It includes random/unsystematic error
(naturally occurring errors that are to be expected
with any experiment) and systematic error
(caused by a mis-calibrated instrument that affects
all measurements).
• Systematic errors occur due to faulty tools of measurement. For example, suppose you are measuring the weights of 100 athletes. The scale you use is one kg off: this is a systematic error that will result in all athletes’ body-weight measurements being off by one kg (constant error).
• Unsystematic errors occur due to many unpredictable variables (hence also known as variable error). On the other hand, let’s say your scale was accurate. Some athletes might be more dehydrated than others. Some might have wetter (and therefore heavier) clothing or a candy bar in a pocket. These are random errors and are to be expected. In fact, all collected samples will have random errors; they are, for the most part, unavoidable.
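The distinction can be illustrated with a short simulation. The weights and error magnitudes below are hypothetical; the point is that the systematic component shifts every measurement the same way, while the random components tend to cancel out on average.

```python
import random

random.seed(1)  # reproducible illustration

true_weights = [60.0, 72.5, 80.0, 55.5, 68.0]   # hypothetical athletes (kg)

# Systematic error: the scale reads 1 kg heavy for every athlete.
# Random error: small unpredictable fluctuations (hydration, clothing, ...).
measured = [w + 1.0 + random.uniform(-0.5, 0.5) for w in true_weights]

errors = [m - w for m, w in zip(measured, true_weights)]
mean_error = sum(errors) / len(errors)
# The mean error sits near the +1 kg systematic bias, because the
# random components tend to cancel out across measurements.
```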
FACTORS AFFECTING VALIDITY OF A TEST
– Factors inherent in the test itself:
– Clarity of instruction:
– Length of the test:
– Language used by test items:
– Difficulty level of test item: ( when the item difficulty level in a test is
homogeneous, validity of the test is low; on the other hand, when item difficulty
level in a test is heterogeneous, validity of the test is high.)
– Item discrimination power:
– Defective construction of items:
– Inappropriate items:
– Faulty arrangement of items:
– Guessing:
• The Criterion of validation:
• Method of estimating validity:
• Nature of the group: (Validity is always specific to a particular group of
individuals.)
• Factors related to test-takers’ response:
• Factors related to administration and scoring of the test:
TYPES OF VALIDITY OF A TEST
• Face validity
• Content validity (variously known as Curricular validity,
Rational or Logical validity, Internal validity etc. Content
validity is particularly applicable in the tests of educational
achievement.)
• Criterion validity (Criterion validity is also known as Empirical
validity or Statistical validity. ) predictive and concurrent
• Construct validity
• Factorial validity (A test is usually constructed to measure different mental constructs or factors. For measuring such different factors or constructs, different units of items are included in a test. Factorial validity refers to the coefficient of correlation between the scores on the units of items constituting a factor and the scores on the total test.)
FACTORS AFFECTING RELIABILITY OF A TEST
• Length of the test:
• Methods used to estimate reliability:
• Nature of the group: more reliable if the group of students is more
heterogeneous.
• Difficulty level of items: Too difficult or too easy items in a test reduce
reliability of the test.
• Objectivity of scoring:
• Clarity of instruction:
• Interval between testing and retesting:
• Environment of test administration:
• Psycho-physical condition of the test-takers:
• Guessing:
TYPES OR METHODS OF ESTIMATING RELIABILITY OF
A TEST
• The test-retest method: The advantages of the test-retest method of reliability are:
• it is a simple method,
• it is an economical method, since only one form of the test needs to be constructed to estimate this reliability,
• this method does not require the construction of a second test or form,
• this method is applicable to almost all types of tests,
• this method does not involve complex statistical procedures; a simple coefficient of correlation serves.
• The disadvantages of the test-retest method of reliability are:
• it is difficult to decide the interval between the two administrations of the test;
• if the interval between the two administrations is short, immediate memory, habit, and familiarity with the test items may raise the reliability coefficient to a very high level; on the other hand, if the interval between the two administrations is long, changes in behaviour and maturity of students may make the reliability coefficient low,
• if the conditions under which the two administrations of the test are held are different, that results in a low reliability estimate for the test,
• this method is time consuming.
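Computationally, the test-retest coefficient is just the Pearson correlation between scores on the two administrations. A sketch with hypothetical data:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical scores of five students on two administrations:
first  = [12, 15, 18, 10, 20]
second = [13, 14, 19, 11, 19]
r_tt = pearson_r(first, second)   # test-retest reliability coefficient
```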
• The equivalent or parallel form method:
• The advantages of the equivalent form method are as follows:
• In this method there is no need to administer the test twice,
• There is no problem of time interval between two administrations,
• The problem of familiarity with test or immediate memory affecting
reliability is absent here,
• This method is an advancement over the test-retest method.
• The limitations of this method are:
• It is very difficult to prepare two identical forms of a single test,
• This method requires the construction and standardization of a second form of the test, which is time consuming and difficult,
• Since items in the two forms are similar in content, difficulty and form, the
practice and familiarity effect cannot be controlled completely,
• There are many tests that cannot be divided into two forms.
• The split-half or the odd-even method: The major advantages of this
method are:
• The problem of preparing equivalent halves of same test is eliminated,
• Issues of memory, familiarity, practice effects, fatigue, maturity etc are
removed,
• The variations in testing situations are eliminated since only one
administration is required.
• The main disadvantages of this method are:
• A test can be divided into two halves in a number of ways; and hence the
correlation between the scores on the two halves may not have a unique
value,
• It is very difficult to divide a test into two halves in such a way that both
halves are equivalent,
• Splitting a test into two equivalent forms is difficult also because items of a
test measure same trait or ability.
• This method cannot be used with speed tests or heterogeneous tests.
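The odd-even split and the Spearman-Brown step-up to full length, r_full = 2r_half / (1 + r_half), can be sketched as follows; the 0/1 item data are hypothetical.

```python
def _corr(x, y):
    """Pearson correlation (helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown correction r_full = 2 * r_half / (1 + r_half)."""
    odd  = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    r_half = _corr(odd, even)
    return 2 * r_half / (1 + r_half)

# Four examinees, six dichotomously scored items:
data = [[1, 1, 1, 1, 1, 1],
        [1, 1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1, 0]]
rel = split_half_reliability(data)
```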
• The method of rational equivalence: Kuder and Richardson developed different formulas for computing the reliability of a test. The following (KR-20) is the most popularly used formula:

r11 = [n / (n − 1)] × [1 − (∑pq / SD²)]

• Where,
• r11 = reliability coefficient of the whole test
• n = total number of items in the test
• SD = Standard Deviation of the total test scores
• SD² = the variance of the total test scores
• p = the proportion of individuals passing each item
• q = the proportion of individuals not passing each item (q = 1 − p)
• ∑pq = the sum of the products of p and q
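The KR-20 computation can be written directly from these definitions. This is a sketch with hypothetical 0/1 data; the population variance of total scores is used for SD².

```python
def kr20(item_scores):
    """KR-20: r11 = (n/(n-1)) * (1 - sum(p*q) / SD^2), where p is the
    proportion passing each item, q = 1 - p, and SD^2 is the variance
    of the total scores. `item_scores` is one 0/1 list per examinee."""
    n_items = len(item_scores[0])
    n_people = len(item_scores)
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n_people
    var = sum((t - mean) ** 2 for t in totals) / n_people
    pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_people
        pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq / var)

# Four examinees, six dichotomously scored items:
data = [[1, 1, 1, 1, 1, 1],
        [1, 1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1, 0]]
r11 = kr20(data)
```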
• The chief advantages of using this method are:
• The difficulties of the other methods can be overcome by using
this method,
• This method requires single administration of the test,
• The method is simple and economic,
• This method gives accurate results when items are homogeneous,
• This method is scientific.
• The limitations of this method are:
• This method cannot be used successfully when the items of the
test are not homogeneous,
• Different formulas given by Kuder-Richardson give different
estimates of reliability of the same test
• This method does not work with speed tests or tests with heterogeneous items.
Unit: 4-: Intelligence Test
•  Intelligence Test-- meaning
•  Individual and group test of Intelligence –
Binet test ,Army Alpha and Army Beta test,
•  Uses of Intelligence test
Unit: 5 -Personality Test

•  Personality test meaning
•  Questionnaire technique - MMPI,
•  Rating scale,
•  Projective tests
Unit: 6 – Aptitude, Interest and
Attitude Test
•  Aptitude test- Types of Aptitude, uses of
aptitude test
•  Measurement of Interest- Kuder interest
inventory-
•  Measurement of attitude - Thurstone and Likert scales
Unit: 7 - Educational Achievement
Test
•  Educational Achievement Test - meaning
and classification,
•  Construction of Educational Achievement Test
•  Different types of Educational Achievement
Test
•  Uses of Educational Achievement Test
Unit: 8 - New Trends in evaluation

•  Norm-referenced and criterion-referenced tests,
•  Reporting Test result –cumulative record
card,
•  Grading and continuous evaluation,
•  Formative and summative evaluation
