In using kappas as a basis of statistical inference, whether or not kappas are consistent with random decision making is usually of minimal importance.

While the kappas that emerged from consideration of agreement between non-ordered categories can be extended to ordinal measures [21–23], there are better alternatives to kappas for ordered categories.

However, many presentations impose restrictive assumptions on the distribution of the $p_i$ that may not represent well what is actually occurring in the population.

Kappa has meaning beyond percentage agreement corrected for chance (PACC).
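For readers who want to see the PACC reading of kappa concretely, the following is a minimal sketch, assuming two ratings per subject on a nominal scale and using the standard Cohen form (observed agreement minus chance agreement, divided by one minus chance agreement). The function name `pacc_kappa` and the small data sets are hypothetical, chosen only for illustration; they are not taken from any study discussed here.

```python
from collections import Counter


def pacc_kappa(ratings_a, ratings_b):
    """Cohen's kappa read as percentage agreement corrected for chance.

    ratings_a and ratings_b are equal-length sequences of category labels,
    one pair of ratings per subject (hypothetical input format).
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two non-empty rating sequences of equal length")
    n = len(ratings_a)

    # Observed proportion of subjects on which the two ratings agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected if the two ratings were independent,
    # given the observed marginal frequencies of each category.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Illustrative data only: four subjects, two binary ratings each.
first = [1, 1, 0, 0]
second = [1, 0, 0, 0]
print(pacc_kappa(first, second))
```

With these illustrative ratings, observed agreement is 0.75 and chance agreement is 0.5, so the sketch returns 0.5. The computation is purely descriptive; it says nothing about which population parameter is being estimated, which is the question that matters for inference.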

The scaling of Mount Everest is one example. Unless such a statistic estimates the same population parameter as does the intraclass kappa, it is not an estimate of the reliability of a binary measure. RBI or batting averages in baseball are purely descriptive statistics, not meant to be used as a basis of statistical inference.

Consequently, we would suggest that: This wry comment by Fleiss et al. Whether or not to use kappa has very little to do with its relationship to PACC.

Since then, the types of research questions in medical research that are well addressed with kappas (for example, reliability and validity of diagnosis, risk factor estimation) abound, and such areas of research have become of ever growing interest and importance [4].

Consequently, there are now two separate and distinct lines of inquiry sharing historical roots: one concerning the use and interpretation of percentage agreement, which will not be addressed here, and one concerning the use and interpretation of kappa, which is the focus here. The simplest way to estimate the reliability of a measure is to obtain a representative sample of N patients from the population to which results are to be generalized.

Such models merely represent special cases, often useful for illustrating certain properties of kappa or for disproving certain general statements regarding kappa, as they will be here.

Kappas were designed to measure correlation between nominal, not ordinal, measures. To them, whether rescaling it to a kappa is appropriate to its understanding and use is a side issue [16–20].

Fortunately, the results were consistent. Thus the ratings might be M ratings by the same pathologist of tissue slides presented over a period of time in a way that ensures blindness. In this case, category 1 is completely discriminated from categories 2 and 3, but the decisions between 2 and 3 are made randomly.

The discovery of the Northwest Passage is a second. Kappas are based on no such limiting assumptions. Even restricted to non-ordered categories, kappas are meant to be used, not only as descriptive statistics, but as a basis of statistical inference.

Sir Alexander Fleming discovered penicillin in 1928 by noticing that bacteria failed to grow on a mouldy Petri dish. However, in summarizing current knowledge of penicillin and its uses, a mouldy Petri dish is at most a historical curiosity, not of current relevance to knowledge about penicillin.

Once one understands how each is computed, it is a matter of personal preference and subjective judgement which statistic would be preferable in evaluating the performance of batters.

Then discussions of bias, standard error, or any other such statistical inference procedures from sample to population are compromised.

In much the same way, Jacob Cohen discovered kappa by noticing that this statistic represented percentage agreement between categories corrected for chance (PACC).

The off-diagonal elements are $\rho_{jj^*} P_j P_{j^*}$, $j \neq j^*$, with $\rho_{jj^*}$ the correlation coefficient between $p_{ij}$ and $p_{ij^*}$. The diagonal elements of $X$ are $P_j P_j'$, and the off-diagonal elements are $-P_j P_{j^*}$.

