NAR Genom Bioinform. 2023 Feb 20;5(1):lqad013.
doi: 10.1093/nargab/lqad013. eCollection 2023 Mar.

FrangiPANe, a tool for creating a panreference using left behind reads

Tranchant-Dubreuil Christine et al. NAR Genom Bioinform.

Abstract

We present FrangiPANe, a pipeline developed to build a panreference from short reads through a map-then-assemble strategy. Applying it to 248 African rice genomes using an improved CG14 reference genome, we identified an average of 8 Mb of new sequences and 5290 new contigs per individual. In total, 1.4 Gb of new sequences, consisting of 1 306 676 contigs, were assembled. We validated 97.7% of the contigs from the short-read assembly of the TOG5681 cultivar against a new long-read genome assembly of the same cultivar. FrangiPANe also anchored 31.5% of the new contigs within the CG14 reference genome, with 92.5% accuracy at a 2 kb span. In addition, we annotated 3252 new genes absent from the reference. FrangiPANe was developed as a modular and interactive application to simplify the construction of a panreference using the map-then-assemble approach. It is available as a Docker image containing (i) a Jupyter notebook centralizing code, documentation and interactive visualization of results, (ii) Python scripts and (iii) all the software and libraries required for each step of the analysis. We foresee that our approach will help leverage large-scale Illumina datasets for pangenome studies in GWAS or detection of selection.

Figures

Figure 1.
Summary of the ‘map-then-assemble’ approach implemented in FrangiPANe. Raw paired-end short reads are mapped to the reference genome, separately for each sample, and unmapped reads are assembled. Next, contigs from all individuals are pooled and clustered to reduce redundancy. Finally, non-redundant contigs are anchored on the genome.
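The flow described in the Figure 1 caption can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the function names are hypothetical, "mapping" is reduced to an exact substring test, "assembly" to a greedy overlap merge, and "clustering" to dropping contigs contained in a longer one. FrangiPANe itself relies on dedicated software for each of these steps, and the final anchoring step is not modelled.

```python
from typing import Dict, List


def split_unmapped(reads: List[str], reference: str) -> List[str]:
    """Toy 'mapping': a read maps only if it occurs verbatim in the reference."""
    return [r for r in reads if r not in reference]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def toy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Greedy assembly: repeatedly merge the pair with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def cluster(contigs: List[str]) -> List[str]:
    """Toy redundancy reduction: drop contigs contained in a longer kept contig."""
    kept: List[str] = []
    for c in sorted(set(contigs), key=lambda s: (-len(s), s)):
        if not any(c in k for k in kept):
            kept.append(c)
    return kept


def map_then_assemble(samples: Dict[str, List[str]], reference: str) -> List[str]:
    """Per-sample mapping and assembly, then pooling and clustering."""
    pooled: List[str] = []
    for reads in samples.values():
        pooled.extend(toy_assemble(split_unmapped(reads, reference)))
    return cluster(pooled)


# Made-up toy data, not taken from the study.
reference = "ACGTACGTGGCC"
samples = {
    "sample_A": ["ACGTAC", "TTTAAAC", "AAACCCG"],
    "sample_B": ["GTGGCC", "TTTAAAC"],
}
non_redundant = map_then_assemble(samples, reference)
print(non_redundant)
```

In this toy input, one read per sample maps to the reference and is set aside; the remaining reads are assembled per sample, and the shorter contig from the second sample is absorbed by the longer one from the first during clustering.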
Figure 2.
Location of contigs on the 12 chromosomes of CG14. A total of 152 411 sequences were uniquely anchored, representing 31.5% of the total number of contigs.
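As a rough sanity check on the reported figures, the arithmetic below derives two values that the text does not state directly: the size of the contig set that the 31.5% refers to, and the mean number of assembled contigs per individual. Both are back-of-the-envelope estimates from rounded published numbers, not results from the study, and the variable names are illustrative.

```python
# Figures quoted in the abstract and in the Figure 2 caption.
anchored_contigs = 152_411      # uniquely anchored sequences
anchored_fraction = 0.315       # reported as 31.5% of the total number of contigs
assembled_contigs = 1_306_676   # contigs assembled across all individuals
n_individuals = 248

# Implied size of the contig set used as the denominator (an estimate,
# since 31.5% is itself a rounded value).
implied_total = round(anchored_contigs / anchored_fraction)

# Mean contigs per individual implied by the assembled total.
mean_per_individual = assembled_contigs / n_individuals

print(f"implied total contigs: ~{implied_total:,}")
print(f"mean contigs per individual: ~{mean_per_individual:,.0f}")
```

The implied denominator (roughly 484 000 contigs) is well below the 1 306 676 assembled contigs, which is consistent with the percentage being computed after pooling and clustering, and the per-individual mean lands close to the reported average of 5290.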
