BMC Bioinformatics. 2021 Mar 12;22(1):119.
doi: 10.1186/s12859-021-04038-2.

ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data

Zachary Deng et al. BMC Bioinformatics.

Abstract

Background: Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often highly fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs.

Results: To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools in synthetic, animal, and human metagenomics datasets.

Conclusions: A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.

Keywords: De novo assembly; Metagenomics; Next-Gen Sequencing; Pathogen detection; Viral discovery.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schematic views of the ContigExtender assembly algorithm. (a) Iteratively recruit reads that overlap the edges of input contigs, then generate a consensus sequence from the overlaps to form extended contigs. (b) Multiple strains may form alternative consensus contigs. Branches are created when variant reads are detected. (c) A more detailed demonstration of the overlapping-consensus-branching algorithm, showing the two branches formed by depth-first search (DFS). Two aligned reads have a three-base disagreement region, so two different paths are formed for alternative extension. (d) Reads containing untrimmed adapters or other sequencing errors will not align well with the contig and other reads. (e) Circular genome detection and extension termination
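
The recruit, consensus, and branch steps described in this caption can be illustrated with a toy sketch. This is not the ContigExtender implementation: it only extends to the right, uses exact suffix/prefix matching instead of alignment, and extends one base at a time. The function name `extend_right` and the parameters `min_overlap`, `min_support`, and `max_depth` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def extend_right(contig, reads, min_overlap=4, min_support=1, max_depth=50):
    """Toy right-extension with DFS branching (illustrative assumption only).

    Returns every extended contig reachable from `contig`.
    """
    results = []

    def dfs(seq, depth):
        # Recruit reads whose prefix exactly matches the contig's suffix
        # and keep the part of each read that hangs past the contig edge.
        tails = []
        for read in reads:
            longest = min(len(read) - 1, len(seq))
            for k in range(longest, min_overlap - 1, -1):
                if seq.endswith(read[:k]):
                    tails.append(read[k:])
                    break

        # Count the candidates for the next base; weakly supported bases
        # (for example, sequencing errors) are dropped via min_support.
        counts = Counter(tail[0] for tail in tails)
        branches = sorted(b for b, c in counts.items() if c >= min_support)

        # Stop when no read supports a further base or the depth cap is hit.
        if not branches or depth >= max_depth:
            results.append(seq)
            return

        # One agreed base extends the consensus; several bases create branches.
        for base in branches:
            dfs(seq + base, depth + 1)

    dfs(contig, 0)
    return results


if __name__ == "__main__":
    reads = ["GTACGG", "TACGGA", "GTACTT"]
    for extended in extend_right("ACGTAC", reads):
        print(extended)
```

With these three toy reads, the two reads that overlap the seed disagree on the base that follows `GTAC`, so the search returns two alternative extensions. Raising `min_support` to 2 discards both singly supported branches and leaves the seed unextended. The `max_depth` cap is a crude stand-in for the termination conditions in panel (e), since a repeat or circular sequence would otherwise be extended indefinitely.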
Fig. 2
Pseudocode of the ContigExtender algorithm
Fig. 3
ContigExtender output shown alongside the metaSPAdes seed contig and sequencing depth. Reads mapped to the final contig are shown as wiggle plots (in blue), seed contigs generated by metaSPAdes as a dark brown line, and final contig regions aligned to the reference viral genome as a black line. The y axis is the depth on a log scale and the x axis is the contig length. This figure was generated from native Scalable Vector Graphics (SVG) images plotted using Python 3 scripts, based on reads mapped to the viral reference genomes with blastn
