EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
1973, 33, 613-619.
THE EQUIVALENCE OF WEIGHTED KAPPA AND THE
INTRACLASS CORRELATION COEFFICIENT AS
MEASURES OF RELIABILITY
JOSEPH L. FLEISS
Biometrics Research, New York State Department of Mental Hygiene
and Columbia University
JACOB COHEN
Department of Psychology, New York University
An obvious factor restricting the comparability of categorical
(nominal and ordinal) and quantitative (interval and ratio)
data is that their descriptive statistics differ. For appraising
reliability, for example, a useful measure of inter-rater agree-
ment for categorical scales is provided by kappa (Cohen, 1960)
or weighted kappa (Spitzer, Cohen, Fleiss and Endicott, 1967;
Cohen, 1968a).
Kappa is the proportion of agreement corrected for chance, and
scaled to vary from -1 to +1 so that a negative value indicates
poorer than chance agreement, zero indicates exactly chance
agreement, and a positive value indicates better than chance
agreement. A value of unity indicates perfect agreement. The use
of kappa implicitly assumes that all disagreements are equally
serious. When the investigator can specify the relative serious-
ness of each kind of disagreement, he may employ weighted
kappa, the proportion of weighted agreement corrected for chance.
For measuring the reliability of quantitative scales, the
product-moment and intraclass correlation coefficients are widely
used. Correspondences have been established between weighted
kappa and the product-moment coefficient under restricted con-
ditions (Cohen, 1960; 1968a). This paper establishes the
equivalence of weighted kappa with the intraclass correlation coeffi-
cient under general conditions. Krippendorff (1970) demonstrated
essentially the same result.

1. This work was supported in part by Public Health Service Grants MH
08534 and MH 23964 from the National Institute of Mental Health. The
authors are indebted to Dr. Robert L. Spitzer of Biometrics Research for
suggesting the problem leading to this report.
Weighted Kappa
Suppose that rater A distributes a sample of n subjects across the
m categories of a categorical scale, and suppose that rater B inde-
pendently does the same. Let $n_{ij}$ denote the number of subjects
assigned to category $i$ by rater A and to category $j$ by rater B; let
$n_{i.}$ denote the total number of subjects assigned to category $i$ by
rater A; and let $n_{.j}$ denote the total number of subjects assigned to
category $j$ by rater B. Finally, let $v_{ij}$ denote the disagreement
weight associated with categories $i$ and $j$. Typically, $v_{ii} = 0$,
reflecting no disagreement when a subject is assigned to category $i$
by both raters, and $v_{ij} > 0$ for $i \neq j$, reflecting some degree of
disagreement when a subject is assigned to different categories by
the two raters.
The mean observed degree of disagreement is

$$\bar{q}_o = \frac{1}{n} \sum_{i=1}^{m} \sum_{j=1}^{m} v_{ij} n_{ij}, \qquad (1)$$

and the mean degree of disagreement expected by chance (i.e.,
expected if A and B assign patients randomly in accordance
with their respective base rates) is

$$\bar{q}_c = \frac{1}{n^2} \sum_{i=1}^{m} \sum_{j=1}^{m} v_{ij} n_{i.} n_{.j}. \qquad (2)$$

Weighted kappa is then defined by

$$\hat{\kappa}_w = 1 - \frac{\bar{q}_o}{\bar{q}_c}. \qquad (3)$$

Kappa is a special case of weighted kappa when $v_{ij} = 1$ for
all $i \neq j$.
For convenience, weighted kappa has been defined here in
terms of mean disagreement weights. Because weighted kappa
is invariant under linear transformations of the weights (Cohen,
1968a), it is also interpretable as the proportion of weighted
agreement corrected for chance.
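To make equations (1)-(3) concrete, the following brief sketch (an illustration added here, not part of the original article; the Python/numpy code and the example table are assumptions) computes weighted kappa from an m X m table of joint assignments, using quadratic disagreement weights $v_{ij} = (i - j)^2$ when no weights are supplied.

```python
import numpy as np

def weighted_kappa(counts, weights=None):
    """Weighted kappa from an m x m table of joint assignments.

    counts[i, j] = number of subjects placed in category i by rater A
    and in category j by rater B.  If weights is None, quadratic
    disagreement weights v_ij = (i - j)**2 are used.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    m = counts.shape[0]
    if weights is None:
        i, j = np.indices((m, m))
        weights = (i - j) ** 2                        # v_ij = (i - j)^2
    row = counts.sum(axis=1)                          # n_i. (rater A totals)
    col = counts.sum(axis=0)                          # n_.j (rater B totals)
    q_o = (weights * counts).sum() / n                # equation (1)
    q_c = (weights * np.outer(row, col)).sum() / n**2 # equation (2)
    return 1.0 - q_o / q_c                            # equation (3)

# Hypothetical 3-category example: two raters, 100 subjects.
table = np.array([[30,  5,  1],
                  [ 6, 25,  4],
                  [ 2,  7, 20]])
print(round(weighted_kappa(table), 3))
```

Replacing the quadratic weights with $v_{ij} = 1$ for all $i \neq j$ reproduces unweighted kappa, the special case noted above.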
The Problem of Interpretation
Given a sample value of weighted kappa, one may test for
its statistical significance if the sample size n is large by referring
the ratio of weighted kappa to its standard error to tables of
the normal distribution. Exact small sample standard errors are
given by Everitt (1968), and approximate large sample standard
errors by Fleiss, Cohen and Everitt (1969). Assuming that the
value of weighted kappa is significantly greater than zero, and
even given that it is a proportion of agreement, there remains the
problem: can its magnitude be compared with that obtained from a
measure such as the intraclass correlation coefficient, which is
used with quantitative data and which is interpretable as a pro-
portion of variance? An affirmative answer would provide a use-
ful bridge over the gap between these two different levels of
measurement.
Cohen has pointed out how, under certain conditions, kappa
and weighted kappa may be interpreted as product-moment
correlation coefficients. Specifically, for a $2 \times 2$ table whose
marginal distributions are the same, kappa is precisely equal
to the phi coefficient (Cohen, 1960). For a general $m \times m$
table with identical marginal distributions ($n_{i.} = n_{.i}$, $i = 1, \ldots, m$)
and disagreement weights $v_{ij} = (i - j)^2$, weighted kappa
is precisely equal to the product-moment correlation coefficient one
would obtain if the nominal categories were scaled so that the first
category was scored 1, the second category 2, etc. (Cohen,
1968a). Such a scaling is of course valid only when the categories
may be ordered.
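A quick numerical check of this correspondence (an added sketch, not from the original article; the table below is invented but has identical marginal distributions) scores the categories 1, 2, 3, expands the table into one pair of scores per subject, and compares the product-moment correlation with weighted kappa computed from equations (1)-(3) with $v_{ij} = (i - j)^2$:

```python
import numpy as np

# Hypothetical 3 x 3 table with identical marginal distributions.
table = np.array([[20,  8,  2],
                  [ 8, 25,  7],
                  [ 2,  7, 21]])
m = table.shape[0]

# Expand the table into one (rater A score, rater B score) pair per subject.
pairs = np.array([(i + 1, j + 1)
                  for i in range(m) for j in range(m)
                  for _ in range(table[i, j])], dtype=float)
r = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]       # product-moment r

# Weighted kappa with quadratic weights, equations (1)-(3).
i, j = np.indices((m, m))
v = (i - j) ** 2
n = table.sum()
q_o = (v * table).sum() / n
q_c = (v * np.outer(table.sum(axis=1), table.sum(axis=0))).sum() / n**2
kappa_w = 1 - q_o / q_c

print(round(r, 4), round(kappa_w, 4))  # identical because the margins agree
```

With unequal margins the two quantities diverge, which is what motivates the more general result established next.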
This paper establishes a more general property of weighted
kappa. Specifically, if $v_{ij} = (i - j)^2$, and if the categories are
scaled as above, then, irrespective of the marginal distributions,
weighted kappa is identical with the intraclass correlation coeffi-
cient in which the mean difference between the raters is in-
cluded as a component of variability.
Weighted Kappa in the Context of Two-Way Analysis of Variance
Let the ratings on the $n$ subjects be quantified as described
above. Define $X_{k1}$ to be the ordinal number (either 1, 2, ..., or
$m$) of the category to which subject $k$ was assigned by rater A,
and $X_{k2}$ that of the category to which he was assigned by rater
B. With $v_{ij} = (i - j)^2$, it may be shown (see equation 1) that

$$\bar{q}_o = \frac{1}{n} \sum_{k=1}^{n} (X_{k1} - X_{k2})^2 \qquad (4)$$

and (see equation 2) that

$$\bar{q}_c = \frac{1}{n} \sum_{k=1}^{n} (X_{k1} - \bar{X}_1)^2 + \frac{1}{n} \sum_{k=1}^{n} (X_{k2} - \bar{X}_2)^2 + (\bar{X}_1 - \bar{X}_2)^2. \qquad (5)$$

In (5), $\bar{X}_1$ is the mean numerical rating given by rater A and $\bar{X}_2$ is
the mean given by rater B.
Letting $SS_r$ denote the sum of squares for raters in the analysis
of variance of the $X$'s, $SS_s$ the sum of squares for subjects, and
$SS_e$ the error (residual) sum of squares, it is fairly easily checked that

$$\bar{q}_o = \frac{2}{n}(SS_r + SS_e) \qquad (6)$$

and

$$\bar{q}_c = \frac{1}{n}(SS_s + 2\,SS_r + SS_e). \qquad (7)$$

Thus,

$$\hat{\kappa}_w = 1 - \frac{\bar{q}_o}{\bar{q}_c} = \frac{SS_s - SS_e}{SS_s + 2\,SS_r + SS_e}. \qquad (8)$$
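As a numerical check of identity (8) (an illustrative sketch added here, not part of the original article; the rating vectors are hypothetical), the three sums of squares can be computed directly from paired numerical ratings and the ratio formed; the result agrees with the value obtained by applying equations (1)-(3), with $v_{ij} = (i - j)^2$, to the corresponding m X m table.

```python
import numpy as np

def kappa_w_from_ss(x1, x2):
    """Weighted kappa via equation (8): (SS_s - SS_e) / (SS_s + 2*SS_r + SS_e).

    x1, x2 are the numerical scores (1, ..., m) assigned by raters A and B.
    """
    X = np.column_stack([x1, x2]).astype(float)         # n subjects x 2 raters
    grand = X.mean()
    subj_means = X.mean(axis=1)
    rater_means = X.mean(axis=0)
    ss_s = 2 * ((subj_means - grand) ** 2).sum()        # subjects sum of squares
    ss_r = len(X) * ((rater_means - grand) ** 2).sum()  # raters sum of squares
    resid = X - subj_means[:, None] - rater_means[None, :] + grand
    ss_e = (resid ** 2).sum()                           # residual sum of squares
    return (ss_s - ss_e) / (ss_s + 2 * ss_r + ss_e)

# Hypothetical ratings of ten subjects on a 4-point ordinal scale.
a = np.array([1, 2, 2, 3, 4, 1, 3, 4, 2, 3])
b = np.array([1, 3, 2, 3, 4, 2, 3, 3, 2, 4])
print(round(kappa_w_from_ss(a, b), 4))
```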
Suppose, now, that the $n$ subjects are a random sample from a
universe of subjects with variance $\sigma_s^2$, that the two raters are
considered a random sample from a universe of raters with
variance $\sigma_r^2$, and that the ratings are subject to a squared
standard error of measurement $\sigma_e^2$. The sums of squares of the
analysis of variance then estimate (see Scheffé, 1959, chapter 7)

$$E(SS_s) = (n - 1)(2\sigma_s^2 + \sigma_e^2), \qquad E(SS_r) = n\sigma_r^2 + \sigma_e^2,$$

and

$$E(SS_e) = (n - 1)\sigma_e^2.$$

Thus, the numerator of (8) estimates

$$2(n - 1)\sigma_s^2,$$

and the denominator of (8) estimates

$$2(n - 1)\sigma_s^2 + 2n\sigma_r^2 + 2n\sigma_e^2.$$

Therefore, (8) estimates, although not unbiasedly, the quantity

$$\frac{(n - 1)\sigma_s^2}{(n - 1)\sigma_s^2 + n\sigma_r^2 + n\sigma_e^2}.$$

If $n$, the number of subjects, is at all large, then (8) in effect
estimates

$$\rho = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_r^2 + \sigma_e^2}.$$
But $\rho$ is the intraclass correlation coefficient between the ratings
given a randomly selected subject by the randomly selected
raters, for the covariance between two such ratings is $\sigma_s^2$, whereas
the variance of any single such randomly generated rating is
$\sigma_s^2 + \sigma_r^2 + \sigma_e^2$ (see Burdock, Fleiss and Hardesty, 1963). Thus
$\hat{\kappa}_w$ is interpretable (aside from a term which goes to zero as $n$
becomes large) as the intraclass correlation coefficient of reli-
ability when systematic variability between raters is included
as a component of total variation.
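A small simulation (again an added illustration; the variance components, sample size, and number of replications are arbitrary assumptions) suggests how (8), and hence weighted kappa with quadratic weights, behaves as an estimator of $\rho$ when $n$ is large and the pair of raters is drawn at random:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 200                          # subjects per replication, replications
sigma_s, sigma_r, sigma_e = 1.0, 0.3, 0.5    # sd's: subjects, raters, error

vals = []
for _ in range(reps):
    s = rng.normal(0, sigma_s, size=(n, 1))            # subject effects
    r = rng.normal(0, sigma_r, size=(1, 2))            # two randomly drawn raters
    X = s + r + rng.normal(0, sigma_e, size=(n, 2))    # ratings
    grand = X.mean()
    ss_s = 2 * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_r = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_e = ((X - X.mean(axis=1, keepdims=True)
               - X.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    vals.append((ss_s - ss_e) / (ss_s + 2 * ss_r + ss_e))  # equation (8)

rho = sigma_s**2 / (sigma_s**2 + sigma_r**2 + sigma_e**2)
print(round(float(np.mean(vals)), 2), round(rho, 2))  # close, though not unbiased
```

Continuous ratings are used here purely for convenience; identity (8) and its expectation do not depend on the scores being integers.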
Comments
The above development was in terms of the measurement of
agreement only on ordinal scales, and only when disagreement
weights were taken as squared differences. The squaring of dif-
ferences was admittedly arbitrary, but the scaling of errors by
means of their squares is so common (see, e.g., Lehmann, 1959)
that this convention requires little justification.
Of greater importance is the generalization to nominal scales
of the interpretation of weighted kappa as an intraclass cor-
relation coefficient. Such an interpretation will, it seems, be more
or less valid provided that the disagreement weight v for two
categories increases more rapidly than the qualitative difference
between them.
The latter idea provides a perspective from which the intra-
class correlation coefficient may be viewed as a special case
of weighted kappa. If the $v_{ij}$'s are viewed as squared distances
between the categories of a nominal scale, they implicitly define
a space of up to m-1 dimensions (Shepard, 1962). An m-point
equal interval scale, on the other hand, explicitly defines a one-
dimensional array with squared distances equal to $(i - j)^2$.
As shown above, weighted kappa applied to the latter case
necessarily equals its intraclass correlation. Thus the intraclass
correlation coefficient is the special case of weighted kappa when
the categories are equally spaced points along one dimension.
This perspective also makes it clear that in assigning weights
$v_{ij}$ in the nominal case one is in effect quantifying a nominal
scale of m categories, as in multidimensional scaling (Shepard,
1962) or in multiple regression analysis with nominal scale
coding (Cohen, 1968b), by going beyond one dimension to as
many as m-1 dimensions.
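As a hypothetical illustration of this idea (added here; the coordinates and the table are invented), the m categories can be placed at arbitrary points in a low-dimensional space, the squared Euclidean distances between the points taken as the $v_{ij}$, and the same weighted-kappa formula applied:

```python
import numpy as np

# Hypothetical coordinates for four nominal categories in two dimensions;
# the squared Euclidean distances serve as disagreement weights v_ij.
coords = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 2.0],
                   [3.0, 2.0]])
v = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)

# Hypothetical 4 x 4 table of joint assignments by two raters.
table = np.array([[18,  3,  1,  0],
                  [ 4, 15,  2,  1],
                  [ 1,  2, 20,  3],
                  [ 0,  1,  4, 25]])
n = table.sum()
q_o = (v * table).sum() / n                                              # equation (1)
q_c = (v * np.outer(table.sum(axis=1), table.sum(axis=0))).sum() / n**2  # equation (2)
print(round(1 - q_o / q_c, 3))  # weighted kappa with distance-based weights
```

Placing the categories at the equally spaced points 1, ..., m on a single axis recovers the ordinal case treated above.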
Application
A multidimensional structure for nominal scales was implicitly
incorporated into a scaling of differences in psychiatric diagnosis
(Spitzer, Cohen, Fleiss and Endicott, 1967). Two diagnostic
categories which differed markedly in terms of severity or prognosis
were given a disagreement weight appreciably in excess of the
weight associated with two similar categories. Values of weighted
kappa in the interval .4 to .6 have typically been found for
agreement on psychiatric diagnosis (Spitzer and Endicott, 1968;
1969).
Some widely used numerical scales of psychopathology, on
the other hand, typically have reliabilities in the interval .7
to .9 when reliability is measured by the intraclass correlation
coefficient (Spitzer, Fleiss, Endicott and Cohen, 1967). Given
the correspondence established above between weighted kappa and
the intraclass correlation coefficient, and given the rationale used
in scaling diagnostic disagreement, it seems possible to affirm that
agreement on psychiatric diagnosis is poorer than agreement on
numerical descriptions of psychopathology.
REFERENCES
Burdock, E. I., Fleiss, J. L., and Hardesty, A. S. A new view of
inter-observer agreement. Personnel Psychology, 1963, 16,
373-384.
Cohen, J. A coefficient of agreement for nominal scales. EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, 1960, 20, 37-46.
Cohen, J. Weighted kappa: Nominal scale agreement with provision
for scaled disagreement or partial credit. Psychological Bulletin,
1968, 70, 213-220. (a)
Cohen, J. Multiple regression as a general data-analytic system.
Psychological Bulletin, 1968, 70, 426-443. (b)
Everitt, B. S. Moments of the statistics kappa and weighted kappa.
British Journal of Mathematical and Statistical Psychology,
1968, 21, 97-103.
Fleiss, J. L., Cohen, J. and Everitt, B. S. Large sample standard
errors of kappa and weighted kappa. Psychological Bulletin,
1969, 72, 323-327.
Krippendorff, K. Bivariate agreement coefficients for reliability of
data. In E. F. Borgatta and G. W. Bohrnstedt (Eds.), Sociological
Methodology 1970. San Francisco: Jossey-Bass, 1970.
Lehmann, E. L. Testing statistical hypotheses. New York: Wiley,
1959.
Scheffé, H. The analysis of variance. New York: Wiley, 1959.
Shepard, R. N. The analysis of proximities: Multidimensional scal-
ing with an unknown distance function. I. Psychometrika, 1962,
27, 125-140.
Spitzer, R. L., Cohen, J., Fleiss, J. L. and Endicott, J. Quantification
of agreement in psychiatric diagnosis. Archives of General
Psychiatry, 1967, 17, 83-87.
Spitzer, R. L. and Endicott, J. Diagno: A computer program for
psychiatric diagnosis utilizing the differential diagnostic pro-
cedure. Archives of General Psychiatry, 1968, 18, 746-756.
Spitzer, R. L. and Endicott, J. Diagno II: Further developments
in a computer program for psychiatric diagnosis. American
Journal of Psychiatry, 1969, 125 (Jan. supp.), 12-21.
Spitzer, R. L., Fleiss, J. L., Endicott, J. and Cohen, J. Mental status
schedule: Properties of factor-analytically derived scales.
Archives of General Psychiatry, 1967, 16, 479-493.