Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

Grace Young-Suk Kim¹, Christopher Schatschneider¹, Jeanne Wanzek¹, Brandy Gatlin¹, Stephanie Al Otaiba¹

PMID: 29075050
PMCID: PMC5653319
DOI: 10.1007/s11145-017-9724-6

Read Writ. 2017 Jun;30(6):1287-1310.

doi: 10.1007/s11145-017-9724-6. Epub 2017 Feb 6.

Affiliation

¹ University of California, Irvine, 3500 Education Building, Irvine, CA 92697, USA.

Abstract

We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliability levels of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) completed three writing tasks each in the narrative and expository genres, and their compositions were evaluated using methods widely used for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, was attributable to true individual differences in writing. Students' scores varied substantially across tasks (30.44% and 28.61% of the variance, respectively) but not across raters. Reaching a reliability of .90 required multiple tasks and multiple raters, whereas a reliability of .80 required multiple tasks but only a single rater. These findings have important implications for evaluating children's writing skills reliably, given that writing is typically evaluated with a single task and a single rater in classrooms and even in state accountability systems.

Keywords: Generalizability theory; assessment; rater effect; task effect; writing.
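
The reliability projections summarized in the abstract follow the decision-study logic of generalizability theory, where error variance shrinks as scores are averaged over more tasks and raters. The following is a minimal sketch of that calculation, assuming a fully crossed person × task × rater design (an assumption; the abstract does not state the design). The names `VarianceComponents`, `g_coefficient`, `phi_coefficient`, and `min_tasks_for` are hypothetical, and the variance shares in `EXAMPLE` are illustrative placeholders rather than the study's estimates; only the 54% person share echoes the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarianceComponents:
    """Variance shares for a fully crossed person x task x rater design."""
    person: float        # true differences between children
    task: float          # task main effect
    rater: float         # rater main effect
    person_task: float   # person x task interaction
    person_rater: float  # person x rater interaction
    task_rater: float    # task x rater interaction
    residual: float      # three-way interaction confounded with error


# ASSUMPTION: illustrative values only, NOT the estimates reported in the paper.
EXAMPLE = VarianceComponents(
    person=0.54, task=0.01, rater=0.00,
    person_task=0.30, person_rater=0.01, task_rater=0.00, residual=0.14,
)


def g_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Generalizability coefficient (relative decisions)."""
    relative_error = (
        vc.person_task / n_tasks
        + vc.person_rater / n_raters
        + vc.residual / (n_tasks * n_raters)
    )
    return vc.person / (vc.person + relative_error)


def phi_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Phi (dependability) coefficient (absolute decisions)."""
    absolute_error = (
        (vc.task + vc.person_task) / n_tasks
        + (vc.rater + vc.person_rater) / n_raters
        + (vc.task_rater + vc.residual) / (n_tasks * n_raters)
    )
    return vc.person / (vc.person + absolute_error)


def min_tasks_for(vc: VarianceComponents, target: float, n_raters: int,
                  max_tasks: int = 50) -> Optional[int]:
    """Smallest number of tasks whose G coefficient meets the target, or None."""
    for n_tasks in range(1, max_tasks + 1):
        if g_coefficient(vc, n_tasks, n_raters) >= target:
            return n_tasks
    return None
```

Calling `min_tasks_for(EXAMPLE, 0.80, n_raters=1)` searches for the smallest task count that reaches the target with one rater. With a small rater-related share, adding raters changes the result much less than adding tasks, which is the qualitative pattern the abstract describes.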

Figures

**Figure 1**
Generalizability and phi coefficients of holistic scores as a function of raters and tasks: Y axis represents reliability; X axis represents number of tasks; lines represent number of raters from one rater (lowest line) to seven raters (highest line)

**Figure 2**
Generalizability and phi coefficients of number of sentences as a function of raters and tasks: Y axis represents reliability; X axis represents number of tasks; lines represent number of raters from one rater to seven raters (lines largely overlap due to small rater effect)

Grants and funding

P50 HD052120/HD/NICHD NIH HHS/United States
