Inter-rater Reliability

In statistics, inter-rater reliability, inter-rater agreement, or concordance is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining whether a particular scale is appropriate for measuring a particular variable. If the raters do not agree, either the scale is defective or the raters need to be retrained.

There are a number of statistics that can be used to determine inter-rater reliability, and different statistics are appropriate for different types of measurement. Some options are: joint probability of agreement, Cohen's kappa and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, and the intra-class correlation. A minimal sketch of the first two appears below.
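To make the first two of these concrete, the following is a minimal Python sketch, not a definitive implementation; the ratings and variable names are invented purely for illustration. It computes the joint probability of agreement (the fraction of items on which two raters give the same category) and Cohen's kappa, which corrects that observed agreement for the agreement expected by chance using the formula kappa = (p_o - p_e) / (1 - p_e).

    import numpy as np

    # Hypothetical ratings of the same 10 subjects by two raters on a 3-category scale.
    rater_a = np.array([0, 1, 2, 1, 0, 2, 1, 1, 0, 2])
    rater_b = np.array([0, 1, 2, 0, 0, 2, 1, 2, 0, 2])

    # Joint probability of agreement: fraction of items rated identically.
    p_o = np.mean(rater_a == rater_b)

    # Chance agreement p_e: for each category, the product of the two raters'
    # marginal frequencies, summed over categories.
    categories = np.union1d(rater_a, rater_b)
    p_e = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (p_o - p_e) / (1 - p_e)

    print(f"Joint probability of agreement: {p_o:.2f}")  # 0.80 for this data
    print(f"Cohen's kappa: {kappa:.2f}")                 # about 0.71 for this data

Because kappa discounts the agreement that would occur by chance, it is generally more informative than the raw joint probability when some categories are used much more often than others.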

Read more about Inter-rater Reliability: Sources of Inter-rater Disagreement, The Philosophy of Inter-rater Agreement, Joint Probability of Agreement, Kappa Statistics, Correlation Coefficients, Intra-class Correlation Coefficient, Limits of Agreement, Krippendorff's Alpha