Forensic Sci Int Genet. 2018 Nov;37:106-115.
doi: 10.1016/j.fsigen.2018.07.013. Epub 2018 Jul 19.

Sequence-based U.S. population data for 27 autosomal STR loci

Katherine Butler Gettings et al. Forensic Sci Int Genet. 2018 Nov.

Abstract

This manuscript reports Short Tandem Repeat (STR) sequence-based allele frequencies for 1036 samples across 27 autosomal STR loci: D1S1656, TPOX, D2S441, D2S1338, D3S1358, D4S2408, FGA, D5S818, CSF1PO, D6S1043, D7S820, D8S1179, D9S1122, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D17S1301, D18S51, D19S433, D20S482, D21S11, Penta D, and D22S1045. Sequence data were analyzed by two bioinformatic pipelines, and all samples have been evaluated for concordance with alleles derived from CE-based analysis at all loci. Each reported sequence includes high-quality flanking sequence and is formatted according to the most recent guidance of the International Society for Forensic Genetics. In addition, GenBank accession numbers are reported for each sequence, and associated records are available in the STRSeq BioProject (https://www.ncbi.nlm.nih.gov/bioproject/380127). The D3S1358 locus demonstrates the greatest average increase in heterozygosity across populations (approximately 10 percentage points). Loci demonstrating an average increase in heterozygosity of between 5 and 10 percentage points include (in descending order) D9S1122, D13S317, D8S1179, D21S11, D5S818, D12S391, and D2S441. The remaining 19 loci each demonstrate an increase of less than 5 percentage points in average heterozygosity. Discussion includes the utility of these data in understanding traditional CE results, such as informing stutter models and understanding migration challenges, as well as considerations for population sampling strategies in light of the marked increase in rare alleles at several of the sequence-based STR loci. This NIST 1036 data set is expected to support the implementation of STR sequencing in forensic casework by providing high-confidence sequence-based allele frequencies for the same sample set that already forms the basis for population statistics in many U.S. forensic laboratories.
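To make the "percentage point increase in heterozygosity" concrete, the following is a minimal Python sketch that scores the same genotypes twice: once comparing only length-based (CE-equivalent) calls, and once comparing full sequences, so that same-length alleles with different sequences count as heterozygous. It assumes observed heterozygosity (fraction of heterozygous individuals) as the metric; the paper may use a different estimator. The genotypes are hypothetical toy values for a D3S1358-like locus, not NIST 1036 data, and the function name is illustrative.

```python
from typing import List, Tuple

# Hypothetical allele representation: (length-based call, bracketed sequence).
Allele = Tuple[str, str]
Genotype = Tuple[Allele, Allele]


def observed_heterozygosity(genotypes: List[Genotype], by_sequence: bool) -> float:
    """Fraction of individuals whose two alleles differ.

    by_sequence=False compares only length-based calls (CE-equivalent);
    by_sequence=True compares full sequences, so same-length alleles with
    different sequences are counted as heterozygous.
    """
    key = 1 if by_sequence else 0
    heterozygotes = sum(1 for a, b in genotypes if a[key] != b[key])
    return heterozygotes / len(genotypes)


# Toy genotypes (assumed, not real data); motif TCTA [TCTG]n [TCTA]m.
genotypes: List[Genotype] = [
    (("15", "TCTA [TCTG]2 [TCTA]12"), ("15", "TCTA [TCTG]3 [TCTA]11")),  # same length, different sequence
    (("16", "TCTA [TCTG]2 [TCTA]13"), ("16", "TCTA [TCTG]2 [TCTA]13")),  # homozygous either way
    (("15", "TCTA [TCTG]2 [TCTA]12"), ("17", "TCTA [TCTG]2 [TCTA]14")),  # heterozygous by length
    (("18", "TCTA [TCTG]3 [TCTA]14"), ("18", "TCTA [TCTG]3 [TCTA]14")),  # homozygous either way
]

h_len = observed_heterozygosity(genotypes, by_sequence=False)
h_seq = observed_heterozygosity(genotypes, by_sequence=True)
gain_pp = (h_seq - h_len) * 100  # increase expressed in percentage points

print(f"length: {h_len:.2f}, sequence: {h_seq:.2f}, gain: {gain_pp:.0f} percentage points")
```

Only the first individual changes category, so the toy gain is 25 percentage points (0.25 by length, 0.50 by sequence); gains at real loci are far smaller because most individuals are already heterozygous by length.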

Keywords: Allele frequency; STR; Sequence.

Conflict of interest statement

None

Figures

Figure 1.
Example of the information provided in Supplementary Table S3: a) locus name, length-based allele, bracketed repeat region, summary of flanking region variants in 5’ to 3’ order; b) allele frequencies and counts for the full set and by population; c) reported sequence divided by 5’ flank, repeat region, 3’ flank, with important features denoted in bold red font (bold black font in printed journal); d) result of flanking region comparison to GRCh38, with differences reported explicitly relative to the chromosomal reference accession number, and the STRSeq accession number and range associated with the sequence; e) FASTA formatted sequence.
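Items (a) and (e) above pair a bracketed repeat-region string with a FASTA-formatted sequence. As a rough illustration only, this Python sketch run-length encodes consecutive 4-bp units into bracketed notation and wraps a sequence as a FASTA record. It assumes the repeat region starts in frame with the repeat unit and simply appends any trailing partial unit; the ISFG guidance followed in the paper involves locus-specific conventions this sketch does not model. Function names and the FASTA header are hypothetical, and a real STRSeq record would include the flanking sequence, not just the repeat region.

```python
from itertools import groupby


def bracket_repeat_region(seq: str, unit_len: int = 4) -> str:
    """Collapse a repeat region into bracketed notation, e.g. 'TCTA [TCTG]2 [TCTA]12'.

    Naive sketch: assumes the region starts in frame with the repeat unit;
    any trailing partial unit is appended unchanged.
    """
    n_full = len(seq) // unit_len
    units = [seq[i * unit_len:(i + 1) * unit_len] for i in range(n_full)]
    parts = []
    for unit, run in groupby(units):  # groups consecutive identical units
        count = len(list(run))
        parts.append(f"[{unit}]{count}" if count > 1 else unit)
    remainder = seq[n_full * unit_len:]
    if remainder:
        parts.append(remainder)
    return " ".join(parts)


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with lines wrapped at `width` bases."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines)


# Hypothetical D3S1358-like repeat region: TCTA + 2 x TCTG + 12 x TCTA (length allele 15).
region = "TCTA" + "TCTG" * 2 + "TCTA" * 12
print(bracket_repeat_region(region))
print(to_fasta("hypothetical_D3S1358_15", region, width=20))
```

The first call prints `TCTA [TCTG]2 [TCTA]12`; the second prints a header line followed by three 20-base lines.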
Figure 2.
Across-population allele frequency distribution per locus, by sequence and by length in N = 1036. Loci are sorted in ascending order of sequence-based frequency of the most common allele at each locus (first column top to bottom followed by second column top to bottom). The first nine alleles at each locus are colored to facilitate comparisons within and across loci, with any remaining alleles shown in grayscale. Sequence data for these samples at the SE33 locus are reported in [9].
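The contrast in the figure between frequencies "by sequence" and "by length" comes from counting the same chromosomes under two allele definitions. This is a minimal Python sketch under assumed toy data (hypothetical, not from the N = 1036 set): it computes both frequency tables and the frequency of the most common allele, the quantity used above to order the loci.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequencies(observations: List[str]) -> Dict[str, float]:
    """Relative frequency of each allele among the observed chromosomes."""
    counts = Counter(observations)
    total = len(observations)
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical (length call, bracketed sequence) for 8 chromosomes (4 individuals).
chromosomes: List[Tuple[str, str]] = [
    ("15", "TCTA [TCTG]2 [TCTA]12"),
    ("15", "TCTA [TCTG]3 [TCTA]11"),
    ("15", "TCTA [TCTG]2 [TCTA]12"),
    ("16", "TCTA [TCTG]2 [TCTA]13"),
    ("16", "TCTA [TCTG]2 [TCTA]13"),
    ("17", "TCTA [TCTG]2 [TCTA]14"),
    ("17", "TCTA [TCTG]3 [TCTA]13"),
    ("18", "TCTA [TCTG]3 [TCTA]14"),
]

by_length = frequencies([length for length, _ in chromosomes])
by_sequence = frequencies([seq for _, seq in chromosomes])

# Most common allele frequency under each definition (the figure sorts by the sequence-based value).
top_length = max(by_length.values())
top_sequence = max(by_sequence.values())
print(top_length, top_sequence)
```

Splitting the length-15 allele into two sequences lowers the most common allele frequency from 0.375 (by length) to 0.25 (by sequence), which is the kind of shift the figure displays per locus.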
Figure 3.
D3S1358 frequency distribution among the primary motifs by length-based allele and population in N = 1036. The motif is defined as follows: the first subunit is a fixed TCTA; the second subunit, TCTG, defines the motif and varies from one to four repeats; and the third subunit contains a widely varying number of TCTA repeats. For simplicity, seven additional rare motif alleles present in the data set have been excluded from this figure.
Figure 4.
D13S317 frequency distribution by population of the nine flanking region motifs identified in N = 1036. The first row of 5’ and 3’ flanking sequence is consistent with GRCh38 and is the most common sequence found in this data set. Dots in subsequent rows represent bases matching the first row. Flanking polymorphisms are identified by numbers one through eight in the bottom row: 1) rs73250432 C>T, 2) rs146621667 G>A, 3) rs9546005 A>T, 4) rs202043589 A>T, 5) rs1442523705 delATCT, 6) ss2137543825 A>G, 7) rs561167308 delTCTG, and 8) rs768323113 C>T. Variation in repeat region length, combined with these flanking region polymorphisms, results in 32 sequence-based alleles at this locus. Three additional D13S317 alleles in this data set result from repeat region sequence variants, each observed once, and have been excluded from this figure.
Figure 5.
Allelic gains by sequence compared to gains in heterozygosity for the 27 auSTR loci in N = 1036. Two y-axes are present: left y-axis = number of alleles, plotted as columns; right y-axis = heterozygosity, plotted as circles. Differential shading in the columns indicates number of alleles by length (black), sequence in the repeat region (dark gray), and sequence in the flanking region (light gray). Black circles represent heterozygosity by length. Colored circles represent heterozygosity by sequence, binned into ranges of heterozygosity: blue > 0.90, green 0.85 to 0.90, yellow 0.75 to 0.85, and orange < 0.75.
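The three column shades and the color bins described above can be reproduced from a list of observed alleles. This Python sketch assumes a hypothetical allele representation (length call, repeat-region sequence, flanking-variant label) and assumes the shades are tallied cumulatively: distinct lengths first, then extra alleles distinguished only by repeat-region sequence, then extra alleles distinguished only by flanking sequence. The toy alleles are invented, and the handling of values exactly at 0.90, 0.85, or 0.75 is an assumption because the caption does not specify it.

```python
from typing import List, Tuple

# Hypothetical allele representation: (length call, repeat-region sequence, flanking-variant label).
Allele = Tuple[str, str, str]


def allele_gains(alleles: List[Allele]) -> Tuple[int, int, int]:
    """Return (alleles by length, extra from repeat region, extra from flanking region)."""
    by_length = len({a[0] for a in alleles})
    by_repeat = len({(a[0], a[1]) for a in alleles})
    by_full = len(set(alleles))
    return by_length, by_repeat - by_length, by_full - by_repeat


def heterozygosity_bin(h: float) -> str:
    """Color bin for sequence-based heterozygosity; boundary inclusion is assumed."""
    if h > 0.90:
        return "blue"
    if h >= 0.85:
        return "green"
    if h >= 0.75:
        return "yellow"
    return "orange"


# Toy alleles (invented for illustration).
alleles: List[Allele] = [
    ("11", "[TATC]11", "ref"),
    ("11", "[TATC]11", "flank SNV"),          # new allele from flanking sequence only
    ("11", "[TATC]5 TGTC [TATC]5", "ref"),    # new allele from repeat-region sequence
    ("12", "[TATC]12", "ref"),
    ("12", "[TATC]12", "ref"),                # repeat observation of an existing allele
]

print(allele_gains(alleles))
print([heterozygosity_bin(h) for h in (0.93, 0.88, 0.80, 0.70)])
```

For these toy alleles the tally is two alleles by length, one extra from the repeat region, and one extra from the flanking region.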

References

    Wendt FR, King JL, Novroski NM, Churchill JD, Ng J, Oldt RF, McCulloh KL, Weise JA, Smith DG, Kanthaswamy S, Budowle B, Flanking region variation of ForenSeq DNA Signature Prep Kit STR and SNP loci in Yavapai Native Americans, Forensic Sci Int Genet 28 (2017) 146–154.
    Wendt FR, Churchill JD, Novroski NM, King JL, Ng J, Oldt RF, McCulloh KL, Weise JA, Smith DG, Kanthaswamy S, Budowle B, Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx forensic genomics system, Forensic Sci Int Genet 24 (2016) 18–23.
    Novroski NM, King JL, Churchill JD, Seah LH, Budowle B, Characterization of genetic sequence variation of 58 STR loci in four major population groups, Forensic Sci Int Genet 25 (2016) 214–226.
    Friis SL, Buchard A, Rockenbauer E, Borsting C, Morling N, Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs, Forensic Sci Int Genet 21 (2016) 68–75.
    Devesse L, Ballard D, Davenport L, Riethorst I, Mason-Buck G, Court DS, Concordance of the ForenSeq™ system and characterisation of sequence-specific autosomal STR alleles across two major population groups, Forensic Sci Int Genet (2017).
