PLoS Comput Biol. 2013;9(12):e1003397.
doi: 10.1371/journal.pcbi.1003397. Epub 2013 Dec 19.

Inferring the source of transmission with phylogenetic data



Erik M Volz et al. PLoS Comput Biol. 2013.

Abstract

Identifying the source of transmission using pathogen genetic data is complicated by numerous biological, immunological, and behavioral factors. A large source of error arises when sampling of cases is incomplete or sparse. Unsampled cases may act either as a common source of infection or as an intermediary in a transmission chain for hosts infected with genetically similar pathogens. The probability of such common-source or intermediate transmission events is difficult to quantify, which has hindered the development of statistical tests to confirm or reject putative transmission pairs with genetic data. We present a method that incorporates additional information about an infectious disease epidemic, such as the incidence and prevalence of infection over time, to estimate the probability that one sampled host is the direct source of infection of another host in a pathogen gene genealogy. These methods enable forensic applications, such as source-case attribution, for infectious disease epidemics with incomplete sampling, which is usually the case for high-morbidity, community-acquired pathogens like HIV, influenza, and dengue virus. They also enable epidemiological applications, such as identifying factors that increase the risk of transmission. We demonstrate these methods in the context of the HIV epidemic in Detroit, Michigan, and evaluate the suitability of current sequence databases for forensic and epidemiological investigations. We find that currently available HIV sequences collected for drug resistance testing are unlikely to be useful in most forensic investigations, but are useful for identifying transmission risk factors.
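As a rough illustration of the epidemiological use described above, the sketch aggregates pairwise infector probabilities into per-host expected transmission counts and averages them by a host covariate. It assumes a hypothetical matrix `W` in which `W[i][j]` is an already-estimated probability that sampled host `i` directly infected sampled host `j`. How those probabilities are estimated from a gene genealogy and epidemic data is the subject of the paper and is not reproduced here. The helper names and the toy numbers are invented for illustration.

```python
from statistics import mean

def expected_transmissions(W):
    """Expected number of sampled hosts infected by each sampled host.

    W[i][j] is assumed (hypothetically) to be an estimated probability that
    host i directly infected host j. By linearity of expectation, the
    expected count for host i is the sum of row i, skipping the diagonal.
    """
    n = len(W)
    return [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]

def mean_by_group(values, groups):
    """Average a per-host quantity within each level of a covariate."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return {group: mean(vs) for group, vs in by_group.items()}

# Made-up probabilities for four sampled hosts (illustration only).
W = [
    [0.0, 0.6, 0.3, 0.1],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.4],
    [0.0, 0.0, 0.1, 0.0],
]
risk = ["high", "low", "high", "low"]  # hypothetical covariate per host
out = expected_transmissions(W)
print(mean_by_group(out, risk))
```

A group-wise mean is only descriptive. Relating transmission risk to covariates formally would require a regression or another statistical model, which this sketch does not attempt.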


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Four transmission trees between hosts i, j and k are shown (center) that are consistent with the pathogen gene genealogy (left).
If host k is not sampled, the resulting gene genealogy is shown at right. Transmission trees where i directly interacts with j are highlighted. The unsampled host k may act either as a common source of infection for i and j or as an intermediate infection between i and j.
Figure 2. A schematic of a gene tree with variables of the coalescent model corresponding to tips, branches, and nodes of the tree.
Figure 3. Model used to simulate HIV phylogenies.
Left: Simulated number of infections over time, aggregated by stage of infection (top) and by diagnosis status (bottom). Right: Flow diagram showing the progression of infected individuals through 5 stages of infection, diagnosis, and death. Compartment fill colors correspond to diagnosis status, and outline colors correspond to stage of infection, matching the prevalence plots on the left. The per-capita rate of each state transition is shown above its arrow.
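The type of model described in this caption can be sketched as a deterministic compartmental model integrated with Euler steps. This is a minimal sketch under stated assumptions, not the authors' simulation. The function name, the frequency-dependent incidence term, the replacement of deaths by births into the susceptible pool, the reduced infectiousness after diagnosis, and all rate values are hypothetical choices, made only to show how 5 stages of infection, diagnosis, and death fit together.

```python
def simulate(beta, gamma, delta, diag_factor, N, I0, t_end, dt):
    """Euler integration of a hypothetical 5-stage HIV model with diagnosis.

    beta[k]     : transmission rate of an undiagnosed host in stage k
    gamma[k]    : per-capita rate of leaving stage k (last stage -> death)
    delta[k]    : per-capita diagnosis rate of an undiagnosed host in stage k
    diag_factor : multiplier on transmission after diagnosis (0 to 1)
    Deaths are balanced by births into S, so S + infected stays equal to N.
    Returns a list of (t, S, undiagnosed, diagnosed) tuples.
    """
    K = len(beta)
    S = float(N - I0)
    U = [float(I0)] + [0.0] * (K - 1)  # undiagnosed infections by stage
    D = [0.0] * K                      # diagnosed infections by stage
    history = [(0.0, S, sum(U), sum(D))]
    for step in range(1, round(t_end / dt) + 1):
        # Frequency-dependent force of infection from all infected stages.
        force = sum(beta[k] * (U[k] + diag_factor * D[k]) for k in range(K)) / N
        incidence = force * S
        deaths = gamma[-1] * (U[-1] + D[-1])
        dU, dD = [0.0] * K, [0.0] * K
        for k in range(K):
            into_U = incidence if k == 0 else gamma[k - 1] * U[k - 1]
            into_D = 0.0 if k == 0 else gamma[k - 1] * D[k - 1]
            dU[k] = into_U - (gamma[k] + delta[k]) * U[k]
            dD[k] = into_D + delta[k] * U[k] - gamma[k] * D[k]
        # All derivatives above use the old state; now advance one step.
        S += dt * (deaths - incidence)
        U = [u + dt * du for u, du in zip(U, dU)]
        D = [d + dt * dd for d, dd in zip(D, dD)]
        history.append((step * dt, S, sum(U), sum(D)))
    return history

# Illustrative per-year rates; these are NOT the values used in the paper.
hist = simulate(beta=[1.0, 0.1, 0.1, 0.1, 0.2],
                gamma=[12.0, 0.3, 0.3, 0.3, 0.5],
                delta=[0.1, 0.2, 0.2, 0.3, 0.5],
                diag_factor=0.5, N=10_000, I0=10, t_end=20.0, dt=0.01)
t, S, undiagnosed, diagnosed = hist[-1]
```

Deterministic integration yields smooth prevalence curves only. Simulating transmission trees and phylogenies, as the figure's model is used for, would also require a stochastic or coalescent-based simulation, which this sketch leaves out.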
Figure 4. Comparison of infector probabilities and frequency of transmission events in simulations.
Left: Infector probabilities calculated for the true transmission genealogy in 20 independent simulated HIV epidemics, each with a sample of 662 individuals. Right: Infector probabilities based on simulated sequence data for a single simulation and a sample of 662 individuals; data are pooled from 50 trees sampled from the Bayesian phylogenetic posterior distribution. Middle: The estimated infector probabilities (x-axis) versus whether a transmission actually occurred (hash marks) for all pairs of sampled individuals in the HIV simulation. The red line shows a local average of the frequency of transmission events. The green line shows a linear regression of true transmission events (coded zero or one) on the estimated infector probability. Histograms show the frequency of estimated infector probabilities when transmissions happen (top) and when they do not (bottom).
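This caption relies on two summaries: a local average of transmission frequency and a linear regression of 0/1 outcomes on estimated probabilities. Both are standard calibration checks. The sketch below assumes a list of estimated infector probabilities and a matching list of 0/1 indicators of whether transmission occurred. Equal-width binning stands in for the local average and may differ from the smoother used in the paper. The function names are hypothetical.

```python
def local_average(probs, outcomes, n_bins=10):
    """Observed transmission frequency within equal-width probability bins.

    Returns (bin_low, bin_high, frequency) for each non-empty bin.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += y
        counts[b] += 1
    return [(b / n_bins, (b + 1) / n_bins, sums[b] / counts[b])
            for b in range(n_bins) if counts[b] > 0]

def linear_fit(probs, outcomes):
    """Ordinary least squares of 0/1 outcomes on estimated probabilities.

    Returns (intercept, slope).
    """
    n = len(probs)
    mx = sum(probs) / n
    my = sum(outcomes) / n
    sxx = sum((x - mx) ** 2 for x in probs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(probs, outcomes))
    slope = sxy / sxx
    return my - slope * mx, slope

# Toy data (made up): estimated probabilities and true 0/1 transmissions.
probs = [0.05, 0.15, 0.95, 0.97]
outcomes = [0, 0, 1, 0]
print(local_average(probs, outcomes))
print(linear_fit(probs, outcomes))
```

For well-calibrated probabilities, the binned frequencies track the bin midpoints and the fitted line has an intercept near 0 and a slope near 1.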
Figure 5. Performance of estimated infector probabilities.
Left: Estimated infector probabilities based on the true transmission genealogy versus infector probabilities based on a sample of trees from the Bayesian phylogenetic posterior distribution. The red line shows y = x. Right: True positive versus false positive rates (ROC curves) for using estimated infector probabilities to classify who infected whom in simulated HIV epidemics. The ROC curves were calculated for 208 pairs of individuals clustered in cherries of the transmission genealogy. Estimates are shown for the true transmission genealogy with a sample of 662 individuals, and for the average infector probability calculated from 50 trees sampled from a Bayesian phylogenetic posterior distribution.
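An ROC curve like the one described here can be computed by ranking candidate pairs by their estimated infector probability and sweeping a classification threshold. The sketch assumes that `scores` holds the estimated probabilities for candidate pairs, for example the cherry pairs mentioned in the caption, and that `labels` marks the true transmission pairs with 1. The function names are hypothetical. The area under the curve is computed with the trapezoid rule.

```python
def roc_curve(scores, labels):
    """(false positive rate, true positive rate) points as the threshold drops.

    labels[i] is 1 if pair i is a true transmission pair, else 0.
    Tied scores are processed together so the curve does not depend on order.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Move every pair with this score above the threshold at once.
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example (made-up scores): two true pairs among four candidates.
pts = roc_curve([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc(pts))
```

Because tied scores are handled together, the curve does not depend on input order. If scikit-learn is available, `sklearn.metrics.roc_auc_score` should give the same area.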
Figure 6. Left: The log of the expected number of transmissions to at least one other sampled unit, shown in aggregated form for two risk groups.
The high-risk category transmits at 10 times the rate of the low-risk category. Right: A quantile-quantile comparison of the distributions of log infector probabilities. A quantile-quantile comparison for undiagnosed and diagnosed individuals is shown at bottom right.
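A quantile-quantile comparison like the one in this figure pairs the same quantiles of two empirical distributions. The sketch assumes two lists of positive infector probabilities, one per group (for example, the two risk groups, or diagnosed versus undiagnosed hosts). It compares the lists on the log scale using linearly interpolated quantiles. The function names and the choice of 99 evenly spaced quantile levels are illustrative assumptions.

```python
import math

def quantile(sorted_xs, q):
    """Linearly interpolated quantile of an already-sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] + frac * (sorted_xs[hi] - sorted_xs[lo])

def qq_points(a, b, n_points=99):
    """Paired quantiles of log(a) and log(b) for a quantile-quantile plot.

    a and b are positive values (e.g. infector probabilities) from two
    groups; the groups may have different sizes.
    """
    la = sorted(math.log(x) for x in a)
    lb = sorted(math.log(x) for x in b)
    levels = [(k + 1) / (n_points + 1) for k in range(n_points)]
    return [(quantile(la, q), quantile(lb, q)) for q in levels]

# Toy example (made-up probabilities for two hypothetical groups).
low_risk = [0.001, 0.004, 0.01, 0.02, 0.05]
high_risk = [0.01, 0.03, 0.08, 0.2, 0.4]
pairs = qq_points(low_risk, high_risk, n_points=9)
```

Points along the diagonal indicate similar distributions. A parallel shift above the diagonal indicates that the second group's probabilities are larger by a roughly constant factor.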

