Genes (Basel). 2021 Jan 27;12(2):185.
doi: 10.3390/genes12020185.

Graph Algorithms for Mixture Interpretation

Benjamin Crysup et al. Genes (Basel).

Abstract

The scale of genetic methods is expanding: forensic genetic assays were previously limited to tens of loci, but current technologies allow a transition to forensic genomic approaches that assess thousands to millions of loci. However, there are subtle distinctions between genetic assays and their genomic counterparts, especially in the context of forensics. For instance, forensic genetic approaches tend to describe a locus as a haplotype, be it a microhaplotype or a short tandem repeat with its accompanying flanking information. In contrast, genomic assays tend to provide not haplotypes but sequence variants, which describe how the alleles apparently differ from the reference sequence. By this construction, mitochondrial genetic assays can be thought of as genomic, as they often describe genetic differences in a similar way. The mitochondrial genetics literature makes clear that sequence differences, unlike the haplotypes they encode, are not directly comparable to each other: different alignment algorithms and different variant calling conventions may cause the same haplotype to be encoded in multiple ways. This ambiguity can affect comparisons between evidence and reference profiles, as well as how "match" statistics are computed. In this study, a graph algorithm is described (and implemented in the MMDIT (Mitochondrial Mixture Database and Interpretation Tool) R package) that permits the assessment of forensic match statistics on mitochondrial DNA mixtures in a way that is invariant to both the variant calling conventions followed and the alignment parameters considered. Given a few modest constraints, the algorithm can be used to compute either the "random man not excluded" statistic or the likelihood ratio. The performance of the approach is assessed on in silico mitochondrial DNA mixtures.

Keywords: graph algorithm; massively parallel sequencing; mitochondrial mixtures; mixture interpretation; probabilistic genotyping.
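
The encoding ambiguity described in the abstract can be illustrated with a small Python sketch. The reference sequence `ACCCT` and the helper `apply_deletion` are hypothetical and chosen only for illustration; they are not taken from the paper or from MMDIT. Deleting any one of the three C bases is a different variant call, yet all three calls encode the same haplotype.

```python
def apply_deletion(reference: str, position: int) -> str:
    """Return the haplotype obtained by deleting the base at 1-based `position`."""
    return reference[:position - 1] + reference[position:]


# Hypothetical reference with a run of three C bases.
reference = "ACCCT"

# Three distinct variant calls: "2 del", "3 del", and "4 del".
calls = [2, 3, 4]

# Each call is applied independently to the reference.
haplotypes = {apply_deletion(reference, p) for p in calls}

# All three calls collapse to one haplotype, so comparing the calls
# themselves (rather than the haplotypes) would report false differences.
print(haplotypes)
```

Comparing haplotypes rather than variant calls sidesteps this ambiguity, which is the motivation the abstract gives for a graph-based representation.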

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
A DNA sequence and sequence variants represented as a directed acyclic graph (DAG). A DNA sequence (AACAAGT) can be thought of as a collection of single-nucleotide vertices connected by directed edges (arrows). If AACAAGT is taken as the reference sequence (horizontal path), sequence variants (differing arrows) can be thought of as directed edges that depart from the horizontal path. Three variants are depicted: an A deletion (open diamond arrow), a C to A transversion (open arrow), and an A insertion between the A and the G (solid diamond arrows).
Figure 2
A variant graph for a mixture with two different bases at two adjacent sites. There are four possible sequences that can match this graph (AATTT, AATCT, AACTT, AACCT). All of the nodes in this graph describe sequence data that were in the mixture. Consequently, a suspected set of contributors must (collectively) represent every node in this graph.
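
The Figure 2 caption can be made concrete with a minimal Python sketch. The node numbering, the adjacency-list encoding, and the function names below are assumptions made for illustration and are not the MMDIT implementation. The sketch enumerates the sequences that trace the graph and checks whether a proposed set of contributors collectively visits every node.

```python
# Node-labelled DAG for Figure 2 (node ids are hypothetical).
labels = {0: "A", 1: "A", 2: "T", 3: "C", 4: "T", 5: "C", 6: "T"}
edges = {0: [1], 1: [2, 3], 2: [4, 5], 3: [4, 5], 4: [6], 5: [6], 6: []}


def all_paths(node=0, sink=6):
    """Yield every source-to-sink path as a list of node ids."""
    if node == sink:
        yield [node]
        return
    for nxt in edges[node]:
        for rest in all_paths(nxt, sink):
            yield [node] + rest


# Map each spelled sequence to the set of nodes its path visits.
path_of = {"".join(labels[n] for n in p): set(p) for p in all_paths()}


def explains_mixture(contributors):
    """True if every contributor traces the graph and all nodes are covered."""
    if any(h not in path_of for h in contributors):
        return False
    covered = set().union(*(path_of[h] for h in contributors))
    return covered == set(labels)


print(sorted(path_of))
print(explains_mixture(["AATTT", "AACCT"]))
print(explains_mixture(["AATTT", "AATCT"]))
```

In this sketch the pair AATTT and AATCT is rejected because neither haplotype visits the C node at the third site, even though each haplotype on its own is consistent with the graph.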
Figure 3
A variant graph for a mixture of two individuals involving an indel. A person of interest (POI) (AACCT) is mixed with another POI (AACT), and a variant graph is created to describe this mixture. A matching set of contributors must account for every node on this graph. Additionally, all edges must be accounted for if the indel is to be considered correctly.
Figure 4
A remade version of the graph in Figure 2, using an epsilon node. In this version of the graph, all edges are known to be in the mixture. Note that this graph can be turned into the one from Figure 2 by examining all nodes reachable from a given node without consuming sequence and adding (uncertain) edges.
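
The conversion mentioned in the Figure 4 caption (finding all nodes reachable from a given node without consuming sequence) can be sketched in Python as follows. The layout is an assumption: a single epsilon node, labelled `None`, is placed between the two variable sites, and the node ids are hypothetical. Collapsing the epsilon node reproduces the site-to-site edges of the Figure 2 style graph.

```python
# Assumed layout: A -> A -> {T, C} -> epsilon -> {T, C} -> T.
labels = {0: "A", 1: "A", 2: "T", 3: "C", 4: None, 5: "T", 6: "C", 7: "T"}
edges = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}


def base_successors(node):
    """Base-labelled nodes reachable from `node` by crossing only epsilon nodes."""
    found, seen = set(), set()
    stack = list(edges[node])
    while stack:
        nxt = stack.pop()
        if nxt in seen:
            continue
        seen.add(nxt)
        if labels[nxt] is None:
            # Epsilon nodes consume no sequence, so keep walking through them.
            stack.extend(edges[nxt])
        else:
            found.add(nxt)
    return found


# Edges of the epsilon-free graph; these are the "uncertain" edges, because
# the epsilon graph only records that each base-to-epsilon edge was observed.
collapsed = {
    n: sorted(base_successors(n)) for n in labels if labels[n] is not None
}
print(collapsed)
```

After collapsing, both third-site nodes (2 and 3) point to both fourth-site nodes (5 and 6), which matches the four candidate sequences listed for Figure 2.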
Figure 5
A reference sequence with epsilon nodes (black circles) added between each base (as well as terminal epsilon nodes, in gray, added to denote the start and end).
Figure 6
The graph from Figure 5 with three alleles added to it (3T, 3A, and 6C). Note that one of the alleles (3T) matches the reference sequence. Any variation at a base will delete the reference base.
Figure 7
The graph from Figure 6 with an insertion added (4.1T). Note that this insertion adds an additional epsilon node.
Figure 8
The graph from Figure 7 with two adjacent deletions (5 del, 6 del) added to it. Note that nothing is indicated about the relationship between the insertion and the deletion.
Figure 9
The proportion of haplotype pairs that explain two-person mixtures. Up to 100,000 in silico two-person mixtures were created using haplotypes from the Human Mitochondrial database (HmtDB). Mixtures were created within populations, taking pairs of individuals from the African (AF) and European (EU) continental groups (colors). The number of distinct haplotype pairs that can explain each mixture was tabulated (x-axis), as was the frequency of each count (y-axis, square root scale). Haplotype sampling occurred with replacement; thus, the minimum number of haplotype pairs that can explain the mixture is one.
Figure 10
The number of haplotypes consistent with two-person mixtures. Up to 100,000 in silico two-person mitochondrial DNA mixtures from the African (AF, left pane) and European (EU, right pane) continental groups (colors) were created. The number of distinct haplotypes that could not be excluded (x-axis) was tabulated, along with the frequency of each count (y-axis, square root scale). Consistent haplotypes are defined as those tracing some path in the variant graph.
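
The definition of a consistent haplotype (one that traces some path in the variant graph) lends itself to a short Python sketch. The graph below re-encodes the Figure 2 example with hypothetical node ids, the `database` list is invented toy data, and the counting estimate of the proportion not excluded is a naive illustration; the statistic computed by MMDIT may be defined or estimated differently.

```python
# Node-labelled DAG for the Figure 2 example (node ids are hypothetical).
labels = {0: "A", 1: "A", 2: "T", 3: "C", 4: "T", 5: "C", 6: "T"}
edges = {0: [1], 1: [2, 3], 2: [4, 5], 3: [4, 5], 4: [6], 5: [6], 6: []}
SOURCE, SINK = 0, 6


def traces_path(haplotype):
    """True if `haplotype` spells a complete source-to-sink path."""
    if not haplotype or haplotype[0] != labels[SOURCE]:
        return False
    current = {SOURCE}
    for base in haplotype[1:]:
        # Advance every active node along edges whose target matches the base.
        current = {n for c in current for n in edges[c] if labels[n] == base}
        if not current:
            return False
    return SINK in current


# Toy database of haplotypes, invented for this example only.
database = ["AATTT", "AACCT", "AAGTT", "AATCT",
            "GATTT", "AACT", "AATTT", "AAGTT"]

not_excluded = [h for h in database if traces_path(h)]

# Naive counting estimate of the proportion of haplotypes not excluded.
p_rmne = len(not_excluded) / len(database)
print(not_excluded, p_rmne)
```

Tracking a set of active nodes, rather than a single node, keeps the check correct when several paths share a prefix, and it extends to epsilon nodes by adding an epsilon-closure step after each advance.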
Figure 11
Match statistics of two-person mixtures. The two-person mixtures were created within each of two continental groups (colors, AF: African, EU: European) by sampling haploid sequences from a database. The likelihood of the pair (treating one individual as an unknown, log10 transformed, y-axis) is contrasted against the probability of the random man not excluded (p(RMNE), log10 transformed, x-axis).
