PLoS One. 2015 Jul 14;10(7):e0132365.
doi: 10.1371/journal.pone.0132365. eCollection 2015.

How Well Do Raters Agree on the Development Stage of Caenorhabditis elegans?

Annabel A Ferguson et al. PLoS One.

Abstract

Inter-rater reliability is infrequently addressed in Caenorhabditis elegans research, despite the existence of sophisticated statistical methods and the field's strong interest in obtaining reliable and accurate data. This study applies statistical modeling as a robust means of analyzing how well worm researchers score the stage of worm development, in terms of the two independent factors that comprise "agreement": (1) accuracy, representing trueness, a lack of systematic differences, or lack of bias, and (2) precision, representing reliability, or the extent to which random differences are small. In our study, multiple raters assessed the same sample of worms to determine the developmental stage of each animal, and we collected data linking each scorer with their assessment of each worm. To describe the agreement of the raters, we developed a structural equation model with latent variables and thresholds, which assumes that all the raters are jointly scoring each worm. This common factor model separately quantifies the two aspects of agreement. The stage-specific thresholds examine accuracy and characterize the relative biases of each rater during the scoring process. The factor loadings for each rater examine precision and characterize the random error of that rater. Within our group, we found that the overall agreement was good, although certain adjustments by particular raters would have decreased systematic differences. Hence, the use of developmental stage as an experimental outcome can be both accurate and precise.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. A common factor ordinal model to analyze rater agreement.
This model describes the ordinal measurements (R1, R2, and R3) made by three raters (1, 2, and 3), which are observed (manifest) variables denoted by squares. These variables are related to the variables μ and χ, which are latent, meaning that they are not directly observable but are included in the model because they underlie the actual observable values. The latent variable μ corresponds to the true worm stage on a continuous scale. The variable μ is defined as being normally distributed with a mean of zero and a standard deviation of one. This standard deviation is represented by the curved arrow showing the value one (“1”) that is adjacent to μ. Each rater judges the stage of worm development on his or her own continuous scale, shown as the latent variables χ1, χ2, and χ3 in the model. Each rater’s unknown continuous scale is a linear function of μ, as indicated by the single arrow paths pointing from μ to each χ. The slopes (path coefficients) for these linear functions are denoted by ρ1, ρ2, and ρ3, and the intercepts are equal to zero. These functions result in χ1, χ2, and χ3 having a residual variance of 1-ρ1², 1-ρ2², and 1-ρ3², respectively, which is denoted by the labelled curved arrow beside each variable. The directed path from each rater’s continuous scale, χ, to the observed ordinal measurement, R, is nonlinear, as denoted by the sinusoidal path. The nonlinear relationship can be described as a threshold model in which the thresholds (ci1, ci2, ci3, and ci4) for rater i control the marginal probability of each observed ordinal measurement (denoted by P(L1), P(L2), P(dauer), P(L3), and P(L4)) under the assumption that each rater’s continuous judgment is normally distributed with a mean of zero and a standard deviation of one.
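The threshold model in the Fig 1 caption can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the loading `rho` and the four threshold values are hypothetical, and the residual term is scaled by the square root of 1-ρ² so that each rater's continuous judgment has unit variance, as the caption assumes.

```python
import random
from statistics import NormalDist

STAGES = ["L1", "L2", "dauer", "L3", "L4"]

def marginal_probs(thresholds):
    """P(stage) for a standard normal judgment: Phi(c_k) - Phi(c_{k-1})."""
    cdf = NormalDist().cdf
    cuts = [0.0] + [cdf(c) for c in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]

def rate(mu, rho, thresholds, rng):
    """Return the index into STAGES that one rater assigns to a worm."""
    # Rater's continuous scale: linear in mu plus residual noise.
    chi = rho * mu + (1 - rho**2) ** 0.5 * rng.gauss(0, 1)
    # The ordinal score is the number of thresholds that chi exceeds.
    return sum(chi > c for c in thresholds)

rng = random.Random(0)
thresholds = [-1.0, -0.3, 0.3, 1.0]  # hypothetical values for one rater
rho = 0.9                            # hypothetical factor loading

worms = [rng.gauss(0, 1) for _ in range(20000)]  # latent true stages
scores = [rate(mu, rho, thresholds, rng) for mu in worms]

observed = [scores.count(k) / len(scores) for k in range(len(STAGES))]
expected = marginal_probs(thresholds)
for stage, obs, exp in zip(STAGES, observed, expected):
    print(f"{stage:>5}: simulated {obs:.3f}, model {exp:.3f}")
```

In this sketch, shifting one rater's thresholds changes the proportions that rater assigns to each stage (a systematic difference), while lowering `rho` increases that rater's random error.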
Fig 2. Sample still image of the sample of worms used for scoring by the raters.
Each rater was assigned the same sample of worms to score for developmental stage. The worms were shown in both a 40X magnification image (illustrated) and a short video recording of each animal. Each worm was identified by a number so that every rater evaluated identical animals in the same order.
Fig 3. Head-to-head comparison of ratings from pairs of observers.
The 60 animal developmental stage ratings from each pair of reviewers are compared using a pair-wise scatter plot matrix. The axes numbered 1 through 5 represent the animal stage, with 1 representing L1, 2 representing L2, 3 representing dauer, 4 representing L3, and 5 representing L4. The green line represents perfect agreement between the two observers, and points along this line represent animals that were given the same stage by both observers. In contrast, points above or below the line represent disagreement between the raters. The ordinal values are slightly “jittered” to make it easier to discern the varying density of the ratings.
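The jittering and the pairwise comparison described for Fig 3 can be sketched as follows. The ratings, the jitter width, and the helper names `jitter` and `agreement` are hypothetical and serve only as an illustration; the stage coding (1 = L1 through 5 = L4) follows the caption.

```python
import random

def jitter(values, width=0.15, seed=0):
    """Add small uniform noise so that overlapping ordinal points spread out."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

def agreement(a, b):
    """Proportion of animals given the same stage by two raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical ratings of seven worms by two raters (1=L1 ... 5=L4).
rater_a = [1, 2, 3, 4, 5, 2, 4]
rater_b = [1, 2, 3, 5, 5, 2, 3]

x = jitter(rater_a, seed=1)  # plotting coordinates only
y = jitter(rater_b, seed=2)
print(f"exact agreement: {agreement(rater_a, rater_b):.2f}")
```

Jitter is applied to the plotting coordinates only; agreement is computed from the original ordinal ratings.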
Fig 4. Heat map showing the pairwise ratio of the residual error estimates for all raters.
The residual error estimate for the rater indicated in each row was divided by the estimate for the rater in each column, and the resulting ratios were displayed as a heat map to highlight similarities and differences between raters. Each ratio is shown as the number inside the colored box. The brightness of the color indicates the relative strength of the difference between raters, with red representing a ratio greater than one and green representing a ratio less than one.
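The row-over-column ratios behind a heat map like Fig 4 can be computed in a few lines. This is an illustrative sketch only: the residual error values are hypothetical, and `ratio_matrix` and `shade` are names introduced here, not taken from the paper.

```python
def ratio_matrix(errors):
    """ratios[row][col] = errors[row] / errors[col]."""
    return [[row / col for col in errors] for row in errors]

def shade(ratio):
    """Color rule from the caption: red above one, green below one."""
    if ratio > 1:
        return "red"
    if ratio < 1:
        return "green"
    return "neutral"

# Hypothetical residual error estimates for three raters.
residual_errors = [0.20, 0.25, 0.40]
ratios = ratio_matrix(residual_errors)

for row in ratios:
    print("  ".join(f"{v:5.2f} ({shade(v)})" for v in row))
```

By construction the diagonal equals one and each entry is the reciprocal of its mirror image across the diagonal.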
Fig 5. Rater-specific thresholds estimated using the common factor model.
The thresholds classify the worms into the L1, L2, dauer, L3, or L4 stages. Each stage represents an abstract concept encompassing size, morphologic, and behavioral features of the worm that can be perceived by a rater relative to each threshold. Threshold 1 (A) separates the L1 and L2 categories, threshold 2 (B) separates the L2 and dauer categories, threshold 3 (C) separates the dauer and L3 categories, and threshold 4 (D) separates the L3 and L4 categories.
Fig 6. Heat map showing differences between raters for the predicted proportion of worms assigned to each stage of development.
The brightness of the color indicates the relative strength of the difference between raters, with red as positive and green as negative. Results are shown as column minus row for each of raters 1 through 7.
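The "column minus row" convention in the Fig 6 caption can be made concrete with a small sketch. The predicted proportions below are hypothetical and cover a single stage for three raters; `column_minus_row` is a name introduced here for illustration.

```python
def column_minus_row(proportions):
    """diff[row][col] = proportions[col] - proportions[row] for one stage."""
    return [[col - row for col in proportions] for row in proportions]

# Hypothetical predicted proportion of worms assigned to L2 by three raters.
predicted_l2 = [0.20, 0.25, 0.15]
diff = column_minus_row(predicted_l2)

for row in diff:
    print("  ".join(f"{v:+.2f}" for v in row))
```

A positive entry means the rater in the column is predicted to assign a larger share of worms to that stage than the rater in the row; the matrix is antisymmetric, with zeros on the diagonal.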
Fig 7. Comparison of the common factor model with rater behavior.
Bar graphs show, for each reviewer, the percentage of animals predicted to be assigned to each stage by the estimated common factor model (left column) and the observed percentage of animals that the rater assigned to each developmental stage (right column).
