Accuracy and reliability of forensic handwriting comparisons

R Austin Hicklin et al. Proc Natl Acad Sci U S A. 2022 Aug 9;119(32):e2119944119. doi: 10.1073/pnas.2119944119. Epub 2022 Aug 1.

Abstract

Forensic handwriting examination involves the comparison of writing samples by forensic document examiners (FDEs) to determine whether they were written by the same person. Here we report the results of a large-scale study conducted to assess the accuracy and reliability of handwriting comparison conclusions. Eighty-six practicing FDEs each conducted up to 100 handwriting comparisons, resulting in 7,196 conclusions on 180 distinct comparison sets, using a five-level conclusion scale. Erroneous "written by" conclusions (false positives) were reached in 3.1% of the nonmated comparisons, while 1.1% of the mated comparisons yielded erroneous "not written by" conclusions (false negatives). False positive rates were markedly higher for nonmated samples written by twins (8.7%) than by nontwins (2.5%). Notable associations between training and performance were observed: FDEs with less than 2 y of formal training generally had higher error rates, but they also had higher true positive and true negative rates because they tended to provide more definitive conclusions; FDEs with at least 2 y of formal training were less likely to reach definitive conclusions, but the definitive conclusions they did reach were more likely to be correct (higher positive and negative predictive values). We did not observe any association between writing style (cursive vs. printing) and rates of errors or incorrect conclusions. This report also provides details on the repeatability and reproducibility of conclusions and describes how conclusions are affected by the quantity of writing and the similarity of content.
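The headline rates above follow from simple counts over the five-level scale. Below is a minimal sketch of those relationships, assuming (per the abstract's wording) that false positives are definitive "written by" conclusions on nonmated comparisons with all nonmated comparisons in the denominator, and symmetrically for false negatives; the level names and the function are illustrative, not the study's code:

    from collections import Counter

    # Illustrative short names for the five-level conclusion scale.
    LEVELS = ["Written", "ProbWritten", "NoConc", "ProbNot", "NotWritten"]

    def headline_rates(mated, nonmated):
        """Error rates and predictive values from conclusion lists.

        Assumption: errors are definitive wrong calls ("Written" on a
        nonmated comparison, "NotWritten" on a mated one), with all
        comparisons of that type in the denominator; PPV/NPV condition
        on a definitive conclusion having been reached.
        """
        m, n = Counter(mated), Counter(nonmated)
        fp, fn = n["Written"], m["NotWritten"]   # erroneous definitive calls
        tp, tn = m["Written"], n["NotWritten"]   # correct definitive calls
        return {
            "FPR": fp / len(nonmated),
            "FNR": fn / len(mated),
            "TPR": tp / len(mated),
            "TNR": tn / len(nonmated),
            "PPV": tp / (tp + fp) if tp + fp else None,
            "NPV": tn / (tn + fn) if tn + fn else None,
        }

    # Toy counts, not the study's data:
    print(headline_rates(
        mated=["Written"] * 90 + ["NotWritten"] * 2 + ["NoConc"] * 8,
        nonmated=["NotWritten"] * 85 + ["Written"] * 3 + ["ProbNot"] * 12,
    ))

This framing also shows why the two training groups can diverge as described: inconclusive responses lower true positive and true negative rates but are excluded from PPV and NPV, so a more cautious examiner can have lower correct-conclusion rates yet higher predictive values.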

Keywords: decision analysis; documents; error rates; forensics; handwriting.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Distribution of conclusions. Training group results are discussed in section 3.5 (baseline dataset: All, 2,863 mated and 3,713 nonmated responses from 86 participants; group A, 2,026 mated and 2,635 nonmated responses from 63 participants; group B, 837 mated and 1,078 nonmated responses from 23 participants).
Fig. 2.
Decision rates by QKset, sorted by average conclusion. Multiple methods of assessing consensus are shown: average response, majority, plurality, and IQR. Green circles indicate nonmated QKsets in which the writers were twins. Averages were calculated by converting conclusions to an ordinal (1 to 5) scale and rounding to the nearest value. Each QKset received responses from 31 to 48 participants (mean 36.5) (baseline dataset: n = 6,576 responses).
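As a sketch of the consensus-by-average method described in the caption, assuming an ordinal mapping with 1 = "written by" through 5 = "not written by" (the caption does not state the direction of the scale); plurality is included for contrast:

    from collections import Counter

    ORDINAL = {"Written": 1, "ProbWritten": 2, "NoConc": 3,
               "ProbNot": 4, "NotWritten": 5}  # direction assumed

    def average_conclusion(responses):
        # Convert to the 1-5 scale, average, and round to the nearest level.
        mean = sum(ORDINAL[r] for r in responses) / len(responses)
        return min(ORDINAL, key=lambda level: abs(ORDINAL[level] - mean))

    def plurality_conclusion(responses):
        # Most frequent conclusion (ties broken arbitrarily here).
        return Counter(responses).most_common(1)[0][0]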
Fig. 3.
QK472: nonmated QKset that resulted in a total of 23 erroneous Written conclusions (13 in the baseline dataset, and an additional 10 in the repeats [responses from second assignments in the repeatability dataset], 5 of which were repeated errors). The subjects who wrote the Q and K were twins. The Q was cropped to eliminate the date and “From,” making discrimination more difficult. Conclusion rates for this QKset: Written: 13 in baseline data (10 additional in repeats); ProbWritten: 19 in baseline data (10 additional in repeats); NoConc: 5 in baseline data (7 additional in repeats); ProbNot: 7 in baseline data (4 additional in repeats); NotWritten: 4 in baseline data (2 additional in repeats).
Fig. 4.
Comparison of participants by (A) rates of errors and incorrect conclusions and (B) rates of correct conclusions. Means are shown as dashed lines (examiner comparison dataset: n = 70 participants who completed at least 50 total comparisons; rates are calculated from 6,096 responses; markers in A are jittered to minimize superimpositions; rates are calculated based on mean of 37.9 mated and 49.2 nonmated QKsets per participant).
Fig. 5.
Repeatability and reproducibility of conclusions (repeatability dataset: 620 first responses [313 mated, 307 nonmated], 620 second responses [313 mated, 307 nonmated]; reproducibility: 236,366 pairwise combinations of responses [103,398 mated, 132,968 nonmated] derived from the 6,576 individual responses in the baseline dataset [2,863 mated, 3,713 nonmated]).
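Reproducibility here is assessed over pairwise combinations of different examiners' responses to the same QKset. A minimal sketch under that reading (the caption does not specify the pairing or the agreement criterion, so unordered pairs and exact agreement on the five-level conclusion are assumed):

    from itertools import combinations

    def reproducibility(responses_by_qkset):
        """Fraction of unordered response pairs within each QKset that
        reach the same conclusion, pooled across all QKsets."""
        agree = total = 0
        for responses in responses_by_qkset.values():
            for a, b in combinations(responses, 2):
                total += 1
                agree += (a == b)
        return agree / total if total else None

    # Toy example: two QKsets with three responses each -> six pairs.
    print(reproducibility({
        "QK001": ["Written", "Written", "ProbWritten"],
        "QK002": ["NotWritten", "NotWritten", "NotWritten"],
    }))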

