Forensic Sci Int Genet. 2022 Mar:57:102655.
doi: 10.1016/j.fsigen.2021.102655. Epub 2021 Dec 28.

A multi-dimensional evaluation of the 'NIST 1032' sample set across four forensic Y-STR multiplexes


Carolyn R Steffen et al. Forensic Sci Int Genet. 2022 Mar.

Abstract

This manuscript reports Y-chromosomal short tandem repeat (Y-STR) haplotypes for 1032 male U.S. population samples across 30 Y-STR loci characterized by three capillary electrophoresis (CE) length-based kits (PowerPlex Y23 System, Yfiler Plus PCR Amplification Kit, and Investigator Argus Y-28 QS Kit) and one sequence-based kit (ForenSeq DNA Signature Prep Kit): DYF387S1, DYS19, DYS385 a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS449, DYS456, DYS458, DYS460, DYS481, DYS505, DYS518, DYS522, DYS533, DYS549, DYS570, DYS576, DYS612, DYS627, DYS635, DYS643, and Y-GATA-H4. The length-based Y-STR haplotypes include six loci that are not reported in the sequence-based kit (DYS393, DYS449, DYS456, DYS458, DYS518, and DYS627), whereas three loci included in the sequence-based kit are not present in length-based kits (DYS505, DYS522, and DYS612). For the latter, a custom multiplex was used to generate CE length-based data, allowing 1032 samples to be evaluated for concordance across the 30 Y-STR loci included in these four commercial Y-STR typing kits. Discordances between typing methods were analyzed further to assess underlying causes such as primer binding site mutations and flanking region insertions/deletions. Allele-level frequency and statistical information is provided for sequenced loci, excluding the multi-copy loci DYF387S1 and DYS385 a/b, for which locus-specific haplotype-level frequencies are provided instead. The resulting data reveals the degree of information gained through sequencing: 88% of sequenced Y-STR loci contain additional sequence-based alleles compared to length-based data, with the DYS389II locus containing the most additional alleles (51) observed by sequencing. Despite these allelic increases, only minimal improvement was observed in haplotype resolution by sequence, with all four commercial kits providing a similar ability to differentiate length-based haplotypes in this sample set. 
Finally, a subset of 369 male samples was compared with the corresponding father samples, which were also sequenced, revealing the sequence basis for the 50 length-based changes observed and no additional sequence-based mutations. GenBank accession numbers are reported for each unique sequence, and associated records are available in the STRSeq Y-Chromosomal STR Loci National Center for Biotechnology Information (NCBI) BioProject, accession PRJNA380347. Haplotype data for the 'NIST 1032' data set has been updated in the Y-STR Haplotype Reference Database (YHRD) and now reaches the YHRD maximal haplotype level. All supplementary files, including revisions to previously published Y-STR data, are available in the NIST Public Data Repository: U.S. population data for human identification markers, DOI 10.18434/t4/1500024.
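
The cross-kit concordance evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `find_discordances` and the data layout are assumptions made for illustration. The DYS456 calls mirror the pattern described in the Fig. 3 caption (16 for YFP, 16.2 for ForenSeq, null alleles for ArgusY28 and PPY23), while the DYS391 calls are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Allele calls for one sample from one kit: locus -> allele string,
# or None for a null allele (no call). This layout is an assumption.
Calls = Dict[str, Optional[str]]


def find_discordances(
    calls_by_kit: Dict[str, Calls],
) -> List[Tuple[str, Dict[str, Optional[str]]]]:
    """Return (locus, calls per kit) for loci where the kits disagree.

    Only kits that actually type a locus are compared at that locus.
    """
    loci = sorted({locus for calls in calls_by_kit.values() for locus in calls})
    discordant = []
    for locus in loci:
        observed = {
            kit: calls[locus]
            for kit, calls in calls_by_kit.items()
            if locus in calls
        }
        if len(set(observed.values())) > 1:
            discordant.append((locus, observed))
    return discordant


# One sample typed with four kits. DYS391 values are hypothetical.
sample_calls = {
    "YFP": {"DYS391": "10", "DYS456": "16"},
    "ForenSeq": {"DYS391": "10", "DYS456": "16.2"},
    "ArgusY28": {"DYS391": "10", "DYS456": None},
    "PPY23": {"DYS391": "10", "DYS456": None},
}

for locus, observed in find_discordances(sample_calls):
    print(locus, observed)
```

A flagged locus is only a starting point: as the abstract notes, each discordance was analyzed further for causes such as primer binding site mutations and flanking region insertions/deletions.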

Keywords: DNA sequencing; Haplotype frequency; MPS; NGS; Short tandem repeat; Y-STR typing.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic representation of the Y chromosome with the loci analyzed in this study. The relative position of each marker is determined by its position in the GRCh38 human reference genome. The short and long arms of the chromosome are labeled ‘p’ and ‘q’, respectively. The heterochromatic region is represented by grey shading, and the pseudoautosomal regions (PAR) are represented by black caps.
Fig. 2.
Schematic representation of the multipronged data analysis workflow used in this study. On the left a MiSeq FGx represents data generation on this platform. CE kit abbreviations are the same as given in Table 1. Arrows show the progression through the processes to reach the final data set.
Fig. 3.
Schematic representation of a (5′ or 3′) flanking region of DYS456, shown at the top of the graph. The gray box on the right is the DYS456 repeat region, and the attached black line represents the flanking region. The three arrows on the left represent possible primer placements, and the three lines below represent the amplicons of the four kits. These approximate positions are inferred from the allele calls observed with the four commercial kits (Table 4), suggesting the location of indels ‘[]’ in this sample pair. The ‘[AT]’ represents a sequenced indel inside the repeat region, while the ‘[2 bp]’ represents an expected 2 bp deletion that is amplified by YFP but not ForenSeq (resulting in 16 and 16.2 allele calls, respectively) and that lies at the primer binding site of ArgusY28 and PPY23 (resulting in null alleles).
Fig. 4.
Resolution of haplotypes of the ‘NIST 1032’ sample set through the evolution of commercial kits. Fig. 4a shows the histograms of haplotypes for four different series of Y-STR kits, starting with the MiHT for each series and progressing from the earlier to the current kit versions. In each histogram, unique haplotypes in the set are shown in light grey, and haplotypes shared across samples are colored darker. The number of markers in each set is noted at the bottom of each histogram. The first three boxes of the figure show data for the CE kits, while the last box (with black border) depicts the haplotypes described by the ForenSeq marker set, based on derived length alleles in the middle and sequenced alleles on the right. Maximum haplotype resolution would show a single light grey bar, which was not achieved for this sample set by any of these kits. Fig. 4b is an alluvial graph representing the break-down (curved connector lines) of the shared haplotypes of the MiHT (excluding the n = 668 haplotypes that were unique by the MiHT) by the marker sets of commercial CE kits in three stages: starting from the shared haplotypes of the MiHT, followed by early versions of the kits, and the current commercial versions at the third horizontal line for each series/manufacturer. Below the three CE-based kits is the ForenSeq kit, starting from the haplotypes of the MiHT and then progressing via length-based to sequence-based haplotypes. The thickness of the connectors represents the number of samples within the shared haplotypes. The horizontal black bars represent unique haplotypes and the remaining unresolved haplotype pairs in the current kits. Maximum haplotype resolution would show a single horizontal black bar, which was not achieved by any of these kits, leaving one to three unresolved pairs for this sample set.
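
The unique versus shared haplotype counts that Fig. 4 visualizes can be approximated with a short Python sketch. The function name `haplotype_resolution`, the sample identifiers, and all allele values are hypothetical. The sketch also reports discrimination capacity as distinct haplotypes divided by samples, a commonly used summary that this page does not itself define.

```python
from collections import Counter
from typing import Dict, Sequence


def haplotype_resolution(
    samples: Dict[str, Dict[str, str]],
    markers: Sequence[str],
) -> Dict[str, float]:
    """Summarize how well a marker set separates the samples.

    A haplotype is the tuple of allele calls over `markers`.
    """
    counts = Counter(
        tuple(profile[m] for m in markers) for profile in samples.values()
    )
    distinct = len(counts)
    return {
        "distinct": distinct,
        # Haplotypes seen in exactly one sample.
        "unique": sum(1 for c in counts.values() if c == 1),
        # Samples whose haplotype is shared with at least one other sample.
        "shared_samples": sum(c for c in counts.values() if c > 1),
        "discrimination_capacity": distinct / len(samples),
    }


# Hypothetical profiles; not taken from the 'NIST 1032' data.
samples = {
    "S1": {"DYS19": "14", "DYS390": "24", "DYS576": "17"},
    "S2": {"DYS19": "14", "DYS390": "24", "DYS576": "18"},
    "S3": {"DYS19": "15", "DYS390": "23", "DYS576": "17"},
    "S4": {"DYS19": "14", "DYS390": "24", "DYS576": "17"},
}

small_set = ["DYS19", "DYS390"]
extended_set = ["DYS19", "DYS390", "DYS576"]

print(haplotype_resolution(samples, small_set))
print(haplotype_resolution(samples, extended_set))
```

In this toy data the extra marker separates S2 from the shared group but leaves S1 and S4 as an unresolved pair, analogous to the one to three unresolved pairs reported for the current kits.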
Fig. 5.
Allele counts and gene diversity values for both length-based and sequence-based alleles for 25 Y-STRs in the ForenSeq kit (UAS 24 plus the unreported DYS456) for the ‘NIST 1032’ set. The ‘x’ axis lists the Y-STRs, the left ‘y’ axis is the number of alleles, and the right ‘y’ axis is the gene diversity. For each locus, the bars of the histogram consist of up to three sections: grey = number of length-based alleles, orange = number of new alleles resulting from repeat region sequence variations and blue = additional new alleles from flanking region variants. The number inside of each section indicates the count of unique alleles. Gene diversity values are plotted for length- (◯) and sequence-based alleles (×). Gene diversity values for the two multi-copy loci (DYF387S1 and DYS385 a/b) are calculated based on locus-specific haplotype frequencies.
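
The gene diversity values plotted in Fig. 5 can be illustrated with a minimal Python sketch. This page does not state which estimator the authors used; the sketch assumes the widely used unbiased form, GD = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of allele i among n samples. The function name `gene_diversity` and the allele labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gene_diversity(alleles: Sequence[str]) -> float:
    """Gene diversity, assuming GD = n/(n-1) * (1 - sum(p_i^2)).

    `alleles` holds one allele call per sample at a single locus.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical locus: three samples carry length allele 16, one carries 17.
length_based = ["16", "16", "16", "17"]

# Sequencing splits the length allele 16 into two sequence variants
# (labels "16a" and "16b" are illustrative only).
sequence_based = ["16a", "16a", "16b", "17"]

print(gene_diversity(length_based))
print(gene_diversity(sequence_based))
```

Splitting a length allele into sequence variants can only keep gene diversity the same or raise it, which matches the direction of the length-based versus sequence-based comparison in the figure.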
