Understanding Interobserver Agreement: The Kappa Statistic. Anthony J. Viera, MD; Joanne M. Garrett, PhD
The kappa test, or Cohen's kappa, is used to measure interobserver agreement on categorical scales, such as clinical assessment findings of patients. In this article, the authors describe how to interpret the test as well as its limitations.
Limitations of kappa test
- Kappa is influenced by the prevalence of the finding under observation.
- Agreement could be due to chance alone.
- Cohen's kappa cannot be used with more than two raters.
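The prevalence effect in the list above can be illustrated with a minimal sketch. It assumes the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the marginal totals. The function name `cohen_kappa` and both tables of counts are illustrative assumptions, not data taken from the resources listed here.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts.

    Rows are rater 1's categories, columns are rater 2's categories.
    """
    k = len(table)
    n = sum(sum(row) for row in table)

    # Observed agreement: proportion of subjects on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n

    # Chance agreement: product of the marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2

    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts: both tables have 85 agreements out of 100 subjects.
common_finding = [[40, 9], [6, 45]]   # finding present in about half the subjects
rare_finding = [[1, 6], [9, 84]]      # finding present in under 10% of subjects

kappa_common = cohen_kappa(common_finding)
kappa_rare = cohen_kappa(rare_finding)
print(round(kappa_common, 2), round(kappa_rare, 2))
```

Both tables have the same observed agreement (85%), yet kappa is about 0.70 for the common finding and about 0.04 for the rare one, because chance agreement is much higher when nearly every subject falls in one category.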
Kappa cannot exceed 1. A value of 1 indicates perfect agreement, a value greater than 0 indicates agreement better than expected by chance, and a value of 0 indicates agreement no better than chance. To judge whether an observed kappa could itself have arisen by chance, we can use the P value and confidence interval.
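As a rough sketch of the confidence interval idea, the snippet below computes kappa with an approximate 95% interval. It assumes the simple large-sample standard error sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)); statistical packages may use a more refined variance formula, so treat the interval as an approximation. The function name `kappa_with_ci` and the counts are illustrative assumptions.

```python
import math


def kappa_with_ci(table, z=1.96):
    """Return (kappa, lower, upper) for a square table of counts.

    Uses a simple large-sample standard error (an approximation);
    z=1.96 corresponds to a 95% interval.
    """
    k = len(table)
    n = sum(sum(row) for row in table)

    # Observed and chance-expected agreement.
    p_o = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2

    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Hypothetical ratings of 100 subjects by two raters.
kappa, lower, upper = kappa_with_ci([[40, 9], [6, 45]])
print(f"kappa = {kappa:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

If the interval excludes 0, as it does for these counts, the observed agreement is unlikely to be explained by chance alone.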
The help file explains the same concepts as above, with data in Excel.
The article describes five assumptions or conditions that must be met to use kappa: the response data should be on a nominal scale, the observations should be paired (each subject rated by both raters), both raters should use the same number of categories, and the two raters should be fixed and independent of each other.
Enter your own data and calculate kappa.
This article discusses the pros and cons of the test, with links to further reading on the subject.
Detailed help file on the kappa test.
Interrater reliability: the kappa statistic. Mary L. McHugh
A review article on the historical background of the kappa test, its interpretation, and its limitations.
A PowerPoint presentation on which agreement tests to use when data are categorical or nominal, and which to use for inter-rater and intra-rater agreement.
A simple calculator for computing kappa with two or more categories. The calculator is hosted on the GraphPad Software website.
A PPT hosted at Michigan State University.
By University of York Department of Health Sciences
An article published in the Scandinavian Journal of Caring Sciences.
A detailed presentation on the subject.