Cell Rep Methods. 2025 Oct 20;5(10):101186.
doi: 10.1016/j.crmeth.2025.101186. Epub 2025 Sep 26.

Reference-guided assembly of metagenomes with MetaCompass

Tu Luan et al. Cell Rep Methods.

Abstract

Metagenomic studies have primarily relied on de novo assembly for reconstructing genes and genomes from microbial mixtures. While reference-guided approaches have been employed in the assembly of single organisms, they have not been used in a metagenomic context. Here, we develop an effective approach for reference-guided metagenomic assembly that can complement and improve upon de novo metagenomic assembly methods for certain organisms. Such approaches will be increasingly useful as more genomes are sequenced and made publicly available.

Keywords: CP: genetics; CP: microbiology; comparative assembly; metagenome assembly; metagenomics; microbiome; reference-guided assembly.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1. Overview of the MetaCompass pipeline. The MetaCompass pipeline begins with reference selection, where sample-specific reference genomes are identified based on marker gene coverage in the input reads. These genomes are clustered to remove redundancy and assembled in an iterative process based on the order of each cluster’s similarity to the input read set. The detailed workflow of reference culling and assembly is presented in Figure S1. The final output of the MetaCompass pipeline is a set of metagenome-assembled genomes (MAGs).
Figure 2. Comparing the total assembly size produced by MetaCompass to the de novo assembly size. Since MetaCompass can only assemble sequences that align to reference genomes, its assembly size reflects the fraction of the sample that can be “explained” by reference genome collections. The boxes represent the interquartile range (IQR: 25%–75%); whiskers extend to the furthest point above or below the box that lies within 1.5 IQR of the box. Circles represent outliers.
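The box, whisker, and outlier convention described in this caption can be sketched as follows. This is only an illustration of the plotting rule, not code from MetaCompass: the function name box_stats is hypothetical, and the quartile interpolation method ("inclusive") is an assumption, since the caption does not say which quartile estimator was used.

```python
from statistics import quantiles


def box_stats(values):
    """Summarize values using the 1.5 * IQR box-plot convention.

    Assumes at least two values. The "inclusive" quartile method is an
    assumption; other estimators give slightly different quartiles.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Anything beyond the fences is drawn as an individual outlier (circle).
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Note that the whiskers end at actual data points, not at the fence values themselves, which matches the caption's wording.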
Figure 3. Cluster NG25 versus reference depth of coverage for MetaCompass and de novo assembly methods, across all samples. The length of the line connecting a sample cluster’s two NG25 points (one per assembly method) represents the difference between the two NG25 values.
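For readers unfamiliar with the metric, NG25 follows the usual NGx definition: the length of the contig at which contigs of that length or longer together cover at least 25% of the reference genome size (rather than of the assembly size, as in N25). A minimal sketch, assuming a plain list of contig lengths and a known reference size; the function name ngx and its parameters are hypothetical, and how the paper sets the reference size for a cluster is not stated in this caption.

```python
def ngx(contig_lengths, genome_size, x=25):
    """Return NGx for an assembly, or None if the threshold is never reached.

    contig_lengths: iterable of contig lengths (integers, in bases)
    genome_size:    reference genome size in bases (assumed known)
    x:              percentage of the reference that must be covered
    """
    covered = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= genome_size * x:
            return length
    # The assembly is too small to cover x% of the reference.
    return None


if __name__ == "__main__":
    print(ngx([40, 30, 20, 10], genome_size=200))
```

Returning None when the assembly never reaches the threshold keeps an undersized assembly distinguishable from one with a genuinely small NG25.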
Figure 4. The distribution of non-singleton clusters. Top: fraction of non-singleton clusters per body site. Bottom: number of genomes assembled per non-singleton cluster across body sites. The boxes represent the interquartile range (IQR: 25%–75%); whiskers extend to the furthest point above or below the box that lies within 1.5 IQR of the box. Circles represent outliers.
Figure 5. The dependence of read mapping rate on input read count and sample richness. Top: MetaCompass results on the entire dataset. Bottom: comparison with de novo methods on six samples with diverse characteristics. Left: total read count. Right: sample richness. Broadly, read mapping rates increase with the total sequencing effort and decrease with sample richness. Mapping rate is typically higher for de novo methods, since the MetaCompass assemblies only capture the reads that can be mapped to reference genome sequences.
