Accuracy and reliability of forensic handwriting comparisons

R Austin Hicklin et al. Proc Natl Acad Sci U S A. 2022 Aug 9;119(32):e2119944119. doi: 10.1073/pnas.2119944119. Epub 2022 Aug 1.

Abstract

Forensic handwriting examination involves the comparison of writing samples by forensic document examiners (FDEs) to determine whether they were written by the same person. Here we report the results of a large-scale study conducted to assess the accuracy and reliability of handwriting comparison conclusions. Eighty-six practicing FDEs each conducted up to 100 handwriting comparisons, resulting in 7,196 conclusions on 180 distinct comparison sets, using a five-level conclusion scale. Erroneous "written by" conclusions (false positives) were reached in 3.1% of the nonmated comparisons, while 1.1% of the mated comparisons yielded erroneous "not written by" conclusions (false negatives). False positive rates were markedly higher for nonmated samples written by twins (8.7%) than by nontwins (2.5%). Notable associations between training and performance were observed: FDEs with less than 2 y of formal training generally had higher error rates, but they also had higher true positive and true negative rates because they tended to provide more definitive conclusions; FDEs with at least 2 y of formal training were less likely to reach definitive conclusions, but the definitive conclusions they did reach were more likely to be correct (higher positive and negative predictive values). We did not observe any association between writing style (cursive vs. printing) and rates of errors or incorrect conclusions. This report also provides details on the repeatability and reproducibility of conclusions and describes how conclusions are affected by the quantity of writing and the similarity of content.
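The headline rates above follow from simple counts over the five-level scale. Below is a minimal sketch of those relationships, assuming (per the abstract's wording) that false positives are definitive "written by" conclusions on nonmated comparisons with all nonmated comparisons in the denominator, and symmetrically for false negatives; the level names and the function are illustrative, not the study's code:

    from collections import Counter

    # Illustrative short names for the five-level conclusion scale.
    LEVELS = ["Written", "ProbWritten", "NoConc", "ProbNot", "NotWritten"]

    def headline_rates(mated, nonmated):
        """Error rates and predictive values from conclusion lists.

        Assumption: errors are definitive wrong calls ("Written" on a
        nonmated comparison, "NotWritten" on a mated one), with all
        comparisons of that type in the denominator; PPV/NPV condition
        on a definitive conclusion having been reached.
        """
        m, n = Counter(mated), Counter(nonmated)
        fp, fn = n["Written"], m["NotWritten"]   # erroneous definitive calls
        tp, tn = m["Written"], n["NotWritten"]   # correct definitive calls
        return {
            "FPR": fp / len(nonmated),
            "FNR": fn / len(mated),
            "TPR": tp / len(mated),
            "TNR": tn / len(nonmated),
            "PPV": tp / (tp + fp) if tp + fp else None,
            "NPV": tn / (tn + fn) if tn + fn else None,
        }

    # Toy counts, not the study's data:
    print(headline_rates(
        mated=["Written"] * 90 + ["NotWritten"] * 2 + ["NoConc"] * 8,
        nonmated=["NotWritten"] * 85 + ["Written"] * 3 + ["ProbNot"] * 12,
    ))

This framing also shows why the two training groups can diverge as described: inconclusive responses lower true positive and true negative rates but are excluded from PPV and NPV, so a more cautious examiner can have lower correct-conclusion rates yet higher predictive values.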

Keywords: decision analysis; documents; error rates; forensics; handwriting.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Distribution of conclusions. Training group results are discussed in section 3.5 (baseline dataset: All, 2,863 mated and 3,713 nonmated responses from 86 participants; group A, 2,026 mated and 2,635 nonmated responses from 63 participants; group B, 837 mated and 1,078 nonmated responses from 23 participants).
Fig. 2.
Decision rates by QKset, sorted by average conclusion. Multiple methods of assessing consensus are shown: average response, majority, plurality, and IQR. Green circles indicate nonmated QKsets in which the writers were twins. Averages were calculated by converting conclusions to an ordinal (1 to 5) scale and rounding to the nearest value. Each QKset received responses from 31 to 48 participants (mean 36.5) (baseline dataset: n = 6,576 responses).
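As a sketch of the consensus-by-average method described in the caption, assuming an ordinal mapping with 1 = "written by" through 5 = "not written by" (the caption does not state the direction of the scale); plurality is included for contrast:

    from collections import Counter

    ORDINAL = {"Written": 1, "ProbWritten": 2, "NoConc": 3,
               "ProbNot": 4, "NotWritten": 5}  # direction assumed

    def average_conclusion(responses):
        # Convert to the 1-5 scale, average, and round to the nearest level.
        mean = sum(ORDINAL[r] for r in responses) / len(responses)
        return min(ORDINAL, key=lambda level: abs(ORDINAL[level] - mean))

    def plurality_conclusion(responses):
        # Most frequent conclusion (ties broken arbitrarily here).
        return Counter(responses).most_common(1)[0][0]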
Fig. 3.
QK472: nonmated QKset that resulted in a total of 23 erroneous Written conclusions (13 in the baseline dataset, and an additional 10 in the repeats [responses from second assignments in the repeatability dataset], 5 of which were repeated errors). The subjects who wrote the Q and K were twins. The Q was cropped to eliminate the date and “From,” making discrimination more difficult. Conclusion rates for this QKset: Written: 13 in baseline data (10 additional in repeats); ProbWritten: 19 in baseline data (10 additional in repeats); NoConc: 5 in baseline data (7 additional in repeats); ProbNot: 7 in baseline data (4 additional in repeats); NotWritten: 4 in baseline data (2 additional in repeats).
Fig. 4.
Comparison of participants by (A) rates of errors and incorrect conclusions and (B) rates of correct conclusions. Means are shown as dashed lines (examiner comparison dataset: n = 70 participants who completed at least 50 total comparisons; rates are calculated from 6,096 responses; markers in A are jittered to minimize superimpositions; rates are calculated based on mean of 37.9 mated and 49.2 nonmated QKsets per participant).
Fig. 5.
Repeatability and reproducibility of conclusions (repeatability dataset: 620 first responses [313 mated, 307 nonmated], 620 second responses [313 mated, 307 nonmated]; reproducibility: 236,366 pairwise combinations of responses [103,398 mated, 132,968 nonmated] derived from the 6,576 individual responses in the baseline dataset [2,863 mated, 3,713 nonmated]).
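Reproducibility here is assessed over pairwise combinations of different examiners' responses to the same QKset. A minimal sketch under that reading (the caption does not specify the pairing or the agreement criterion, so unordered pairs and exact agreement on the five-level conclusion are assumed):

    from itertools import combinations

    def reproducibility(responses_by_qkset):
        """Fraction of unordered response pairs within each QKset that
        reach the same conclusion, pooled across all QKsets."""
        agree = total = 0
        for responses in responses_by_qkset.values():
            for a, b in combinations(responses, 2):
                total += 1
                agree += (a == b)
        return agree / total if total else None

    # Toy example: two QKsets with three responses each -> six pairs.
    print(reproducibility({
        "QK001": ["Written", "Written", "ProbWritten"],
        "QK002": ["NotWritten", "NotWritten", "NotWritten"],
    }))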

