Read Writ. 2017 Jun;30(6):1287-1310.
doi: 10.1007/s11145-017-9724-6. Epub 2017 Feb 6.

Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

Grace Young-Suk Kim et al. Read Writ. 2017 Jun.

Abstract

We examined how raters and tasks influence measurement error in writing evaluation, and how many raters and tasks are needed to reach desirable reliability levels of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written compositions were evaluated using methods widely used with developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, was attributable to true individual differences in writing. Students' scores varied largely by task (30.44% and 28.61% of variance), but not by rater. To reach a reliability of .90, multiple tasks and raters were needed; for a reliability of .80, a single rater and multiple tasks were needed. These findings have important implications for reliably evaluating children's writing skills, given that writing is typically evaluated with a single task and a single rater in classrooms and even in state accountability systems.

Keywords: Generalizability theory; assessment; rater effect; task effect; writing.

Figures

Figure 1
Generalizability and phi coefficients of holistic scores as a function of raters and tasks. The Y axis represents reliability; the X axis represents the number of tasks; lines represent the number of raters, from one rater (lowest line) to seven raters (highest line).
Figure 2
Generalizability and phi coefficients of number of sentences as a function of raters and tasks. The Y axis represents reliability; the X axis represents the number of tasks; lines represent the number of raters, from one rater to seven raters (lines largely overlap because the rater effect is small).
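
The curves described in the figure captions can be approximated with the standard decision-study formulas from generalizability theory. The sketch below assumes a fully crossed person x task x rater random-effects design, which the abstract does not confirm. The class name, function names, and the variance components in `ILLUSTRATIVE` are hypothetical and chosen only for illustration; they are not the study's estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for an assumed crossed person x task x rater design."""
    person: float        # true individual differences (object of measurement)
    task: float          # task main effect
    rater: float         # rater main effect
    person_task: float   # person x task interaction
    person_rater: float  # person x rater interaction
    task_rater: float    # task x rater interaction
    residual: float      # person x task x rater interaction plus error


def g_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Generalizability coefficient (relative decisions)."""
    relative_error = (
        vc.person_task / n_tasks
        + vc.person_rater / n_raters
        + vc.residual / (n_tasks * n_raters)
    )
    return vc.person / (vc.person + relative_error)


def phi_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Phi (dependability) coefficient (absolute decisions)."""
    absolute_error = (
        vc.task / n_tasks
        + vc.rater / n_raters
        + vc.person_task / n_tasks
        + vc.person_rater / n_raters
        + vc.task_rater / (n_tasks * n_raters)
        + vc.residual / (n_tasks * n_raters)
    )
    return vc.person / (vc.person + absolute_error)


# Hypothetical components for illustration only; NOT the paper's estimates.
ILLUSTRATIVE = VarianceComponents(
    person=0.54, task=0.02, rater=0.01,
    person_task=0.30, person_rater=0.01, task_rater=0.0, residual=0.12,
)

if __name__ == "__main__":
    # Tabulate both coefficients over a small grid of tasks and raters.
    for n_raters in (1, 2):
        for n_tasks in (1, 2, 4, 6):
            g = g_coefficient(ILLUSTRATIVE, n_tasks, n_raters)
            phi = phi_coefficient(ILLUSTRATIVE, n_tasks, n_raters)
            print(f"raters={n_raters} tasks={n_tasks} G={g:.3f} phi={phi:.3f}")
```

Under these assumptions, a large person x task component makes adding tasks raise reliability much faster than adding raters, which matches the pattern the captions describe. Phi never exceeds the generalizability coefficient because its error term also includes the task, rater, and task x rater effects.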
