Review
Genome Biol Evol. 2019 Oct 1;11(10):2750-2766.
doi: 10.1093/gbe/evz184.

Current and Promising Approaches to Identify Horizontal Gene Transfer Events in Metagenomes

Gavin M Douglas et al. Genome Biol Evol. 2019.

Abstract

High-throughput shotgun metagenomics sequencing has enabled the profiling of myriad natural communities. These data are commonly used to identify gene families and pathways that were potentially gained or lost in an environment and which may be involved in microbial adaptation. Despite the widespread interest in these events, there are no established best practices for identifying gene gain and loss in metagenomics data. Horizontal gene transfer (HGT) represents several mechanisms of gene gain that are of particular interest in clinical microbiology due to the rapid spread of antibiotic resistance genes in natural communities. Several additional mechanisms of gene gain and loss, including gene duplication, gene loss-of-function events, and de novo gene birth, are also important to consider in the context of metagenomes but have been less studied. This review is largely focused on detecting HGT in prokaryotic metagenomes, but methods for detecting these other mechanisms are first discussed. For this article to be self-contained, we provide a general background on HGT and the different possible signatures of this process. Lastly, we discuss how improved assembly of genomes from metagenomes would be the most straightforward approach for improving the inference of gene gain and loss events. Several recent technological advances could help improve metagenome assemblies: long-read sequencing, determining the physical proximity of contigs, optical mapping of short sequences along chromosomes, and single-cell metagenomics. The benefits and limitations of these advances are discussed and open questions in this area are highlighted.

Keywords: horizontal gene transfer; lateral gene transfer; metagenome-assembled genomes; microbiome; shotgun metagenomics.

Figures

Fig. 1.
Examples of microbial gene gain and loss. (A) Illustration of operon duplications between two genomes. Arrows indicate genes and colored bars indicate different regions of homologous DNA shared between the two genomes. This simplified example is inspired by the mercury resistance operon duplications identified in Rhodanobacter genomes (Hemme et al. 2016). High levels of mercury resistance genes were reported in groundwater metagenomes dominated by Rhodanobacter, but genomic analyses were required to identify putative duplication events. (B) An example of adaptation through loss-of-function. Tetracycline (indicated by chemical structure) is largely imported through the OmpF porin in Escherichia coli. Deleting the gene encoding this porin allows for higher tetracycline tolerance (Thanassi et al. 1995). (C) Example of de novo gain of the BSC4 gene in Saccharomyces cerevisiae compared with other Saccharomycetaceae (Cai et al. 2008). A simplified visualization of the orthologous region across fungi demonstrates that BSC4 is only present in S. cerevisiae. (D) Distribution of two KEGG orthologs (Kanehisa and Goto 2000) (K08928, K08929), which are responsible for anoxygenic photosystem II (M00597) and are broadly distributed across the prokaryotic tree, likely due to horizontal gene transfer. The presence and absence of these gene families are indicated in blue and gray, respectively. Panel created with AnnoTree (Mendler et al. 2019).
Fig. 2.
Key approaches to infer potential HGT in metagenomics sequencing (MGS) data. (A) Identifying genes that are frequently transferred through HGT and that differ in relative abundance between two sites. HGT is one possible explanation for such observations and is frequently hypothesized in the literature, although not on the basis of direct evidence. (B) Identifying outlier genes in short assembled contigs using a compositional approach (Tamames and Moya 2008). This approach involves tabulating k-mer frequencies within each gene and calculating the pairwise Pearson correlation between all genes within the contig. Outlier genes with atypical k-mer composition can then be identified, which are candidates for HGT (such as gene 4 in this example). (C) Isolate reference genomes have been used with MGS data on several occasions. One example usage is to map the metagenomics reads to existing reference genomes to identify genomic regions not found in the metagenome. HGT is one possible explanation for the absence of reads mapped to a particular region of the reference genome, as shown in this example. (D) Generating high-quality MAGs allows any method for identifying HGT in isolate genomes to be applied. The example shown here is of detecting genomic regions that have divergent k-mer composition compared with the rest of the genome. Note that in this simplified example only one k-mer is being compared, whereas typically the profile of many k-mers would be compared.
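The compositional idea in panel B of Fig. 2 can be illustrated with a small sketch. This is not the implementation of Tamames and Moya (2008): the choice of k = 2, the use of each gene's mean correlation with the other genes as an outlier score, the function names, and the toy sequences are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k DNA k-mers in seq, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers with N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_genes_by_mean_correlation(genes, k=2):
    """Return (gene, mean Pearson r to all other genes), lowest first.

    Assumption: a low mean correlation marks atypical composition; the
    cut-off for calling an outlier is left to the user.
    """
    profiles = {name: kmer_profile(seq, k) for name, seq in genes.items()}
    scores = []
    for name, prof in profiles.items():
        rs = [pearson(prof, p) for other, p in profiles.items() if other != name]
        scores.append((name, sum(rs) / len(rs)))
    return sorted(scores, key=lambda item: item[1])


# Hypothetical toy contig: four AT-rich genes and one GC-rich gene (gene4).
toy_genes = {
    "gene1": "AATTATATAATTTAAATATTAATA",
    "gene2": "TTAATATAAATTTATAATATTAAT",
    "gene3": "ATAAATTTATATTAATAATTTAAT",
    "gene4": "GCCGGCGCGGCCCGCGGGCCGCGC",
    "gene5": "TATTAAATATTTAATTATAAATAT",
}
ranked = rank_genes_by_mean_correlation(toy_genes)
print(ranked[0][0])  # the gene with the most atypical composition
```

On real contigs, genes are longer and larger k values are typically informative; atypical composition alone only marks a gene as an HGT candidate, as the caption notes.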
Fig. 3.
Promising technologies that could improve metagenome assembly. (A) Long-read sequencing as represented by single-molecule real-time (SMRT) sequencing, which takes place in a zero-mode waveguide. Fluorescently labeled nucleotides are added one at a time at the bottom of the well as the new strand of the input DNA is synthesized. The fluorescence of each added nucleotide is measured to determine the sequence. (B) Illustration of how relationships between contigs based on chromosome conformation capture can be visualized. This simplified illustration is based on the previously determined relationship between Escherichia coli contigs (Marbouty et al. 2014). The darker the shade of green, the higher the contact frequency between contigs. This visualization displays how the genomic ordering of contigs can be determined through chromosome conformation capture. The contig contact map can be used to improve the scaffolding step of the genome assembly. (C) Diagram illustrating the principle of optical maps (blue) improving genomic assemblies (green). Solid black bars indicate occurrences of a short DNA sequence along the genome, which can be used to order contigs and correct assembly errors. (D) Simplified protocol for barcoding genomic fragments so that reads originating from the same high molecular weight (MW) DNA molecule can be identified. (E) Key steps required before single-cell sequencing. Individual cells need to be isolated using one of several techniques (e.g., flow cytometry as shown in this panel) and then whole-genome amplification is conducted using multiple displacement amplification. The small arrows indicate amplified regions.
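As a rough illustration of panel B of Fig. 3, the sketch below orders contigs from a contact map with a simple greedy rule. It is not the method of Marbouty et al. (2014) or of any published scaffolder. It assumes a linear layout in which contact frequency decays with genomic distance, and the function name, contig names, and contact values are hypothetical.

```python
def order_contigs(names, contacts):
    """Greedy ordering of contigs from a symmetric contact matrix.

    Start from the pair with the highest contact frequency, then repeatedly
    attach the unplaced contig with the strongest contact to either end of
    the growing path. Assumes higher contact means closer on the genome.
    """
    n = len(names)
    if n < 2:
        return list(names)
    # Seed the path with the strongest-contact pair.
    _, a, b = max((contacts[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    path = [a, b]
    unplaced = set(range(n)) - {a, b}
    while unplaced:
        candidates = []
        for c in unplaced:
            candidates.append((contacts[path[0]][c], 0, c))   # attach before the first contig
            candidates.append((contacts[path[-1]][c], 1, c))  # attach after the last contig
        _, at_end, c = max(candidates)
        if at_end:
            path.append(c)
        else:
            path.insert(0, c)
        unplaced.remove(c)
    return [names[i] for i in path]


# Hypothetical toy data: the true order is c3, c1, c4, c2 and the contact
# frequency is modeled as 100 / (1 + distance between positions).
true_order = ["c3", "c1", "c4", "c2"]
position = {name: idx for idx, name in enumerate(true_order)}
contig_names = ["c1", "c2", "c3", "c4"]
contact_map = [
    [0.0 if x == y else 100.0 / (1 + abs(position[x] - position[y]))
     for y in contig_names]
    for x in contig_names
]
inferred = order_contigs(contig_names, contact_map)
print(inferred)
```

A contact map alone does not fix the direction of the path, so the inferred order may be the reverse of the true one. Real data add noise, circular chromosomes, and contigs from multiple genomes, none of which this sketch handles.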

References

    1. Akhter S, Aziz RK, Edwards RA. 2012. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 40(16):e126.
    2. Akiba T, Koyama K, Ishiki Y, Kimura S, Fukushima T. 1960. On the mechanism of the development of multiple-drug-resistant clones of Shigella. Jpn J Microbiol. 4:219–227.
    3. Albalat R, Cañestro C. 2016. Evolution by gene loss. Nat Rev Genet. 17(7):379–391.
    4. Alneberg J, et al. 2014. Binning metagenomic contigs by coverage and composition. Nat Methods. 11(11):1144–1146.
    5. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic Local Alignment Search Tool. J Mol Biol. 215(3):403–410.