Nucleic Acids Res. 2023 Jan 6;51(D1):D777-D784.
doi: 10.1093/nar/gkac894.

UFCG: database of universal fungal core genes and pipeline for genome-wide phylogenetic analysis of fungi

Dongwook Kim et al. Nucleic Acids Res. 2023.

Abstract

Phylogenomics studies the evolutionary relationships of organisms using their genomic information. A common approach is to extract related genes from each organism, build a multiple sequence alignment and then reconstruct the evolutionary relationships as a phylogenetic tree. This analysis often relies on core genes, a set of highly conserved genes that occur in single copy, because they allow efficient automation within a taxonomic clade. Here we introduce the Universal Fungal Core Genes (UFCG) database and pipeline for genome-wide phylogenetic analysis of fungi. The UFCG database consists of 61 curated fungal marker genes, including a novel set of 41 computationally derived core genes and 20 canonical genes derived from the literature, as well as marker gene sequences extracted from publicly available fungal genomes. Furthermore, we provide an easy-to-use, fully automated and open-source pipeline for marker gene extraction, training and phylogenetic tree reconstruction. The UFCG pipeline can identify marker genes from genomic, proteomic and transcriptomic data, while producing phylogenies consistent with those previously reported, and is publicly available together with the UFCG database at https://ufcg.steineggerlab.com.

Figures

Figure 1.
Schematic illustration of the preparation of the UFCG database and pipeline. (A) The UFCG Gene database consists of the 41 novel core gene markers we defined and 20 canonical marker genes curated from the fungal taxonomy literature. We built profiles for all SwissProt Fungi proteins and searched them against 1587 species-representative genome assemblies using MMseqs2. Only genes that occur as single-copy in at least 95% of species were further refined and filtered by AUGUSTUS-PPX. For each gene, we offer profile hidden Markov models (HMMs) and the seed amino acid sequences, downloadable from the database. The UFCG Species database provides pre-extracted marker sequences from the genome assemblies we obtained. In addition to the marker genes we defined, we extracted ITS and BUSCO sequences from both the 1587 species-representative and the 9397 species-redundant fungal genome assemblies. We compiled the extracted sequences into JSON files, which are downloadable from the database. (B) Graphical representation of the three main modules (profile, tree, and train) of the pipeline. The profile module accepts genomic, proteomic, and transcriptomic data of fungi and extracts marker sequences using a pre-trained set of profile HMMs. The tree module combines the set of extracted marker genes and reconstructs their phylogeny as a maximum likelihood tree using aligned and concatenated marker sequences. The train module converts custom marker sequences into profile HMMs that can be directly utilized by the profile module.
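The concatenation step that the tree module performs before maximum likelihood inference can be illustrated with a minimal Python sketch. This is not the UFCG implementation: the function name `concatenate_alignments`, the taxon labels and the toy sequences are hypothetical, and padding a taxon that lacks a marker with gap characters is an assumption made here for illustration only.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments into one concatenated alignment.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Sequences inside one dict must share the same length.
    Taxa missing from a marker are padded with '-' (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: two hypothetical marker alignments over three hypothetical taxa.
marker_a = {"sp1": "MK-L", "sp2": "MKAL"}
marker_b = {"sp1": "GG", "sp3": "GA"}
supermatrix = concatenate_alignments([marker_a, marker_b])
```

Every taxon ends up with a row of identical length, which is the property a tree inference tool needs from a concatenated alignment.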
Figure 2.
Existence coverage of 61 UFCG marker genes, represented as a proportion of fungal genome assemblies with a valid hit. (A) Coverage against 1587 species-representative assemblies. (B) Coverage against 9397 species-redundant assemblies. Presence of each marker gene against the given set of genome assemblies was identified using an AUGUSTUS-PPX search with their corresponding block profile HMMs. We then tallied the proportion of genome assemblies in which marker genes were (i) present, regardless of copy-number (blue bars) and (ii) present as single-copy (purple bars for canonical genes, green bars for core genes). Genes of mitochondrial origin (as annotated by the Saccharomyces genome database) were marked with a dagger (e.g. COX1). Gene names are sorted by their single-copy coverage against the species-representative assemblies.
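The tally described in this caption can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function names, the input layout (one list of marker hits per assembly) and the gene labels `GENE_A` and `GENE_B` are hypothetical. The 0.95 default mirrors the single-copy threshold stated in the Figure 1 caption.

```python
from collections import Counter


def marker_coverage(hits, n_assemblies):
    """Return {gene: (present_fraction, single_copy_fraction)}.

    hits: dict mapping assembly id -> list of marker gene names,
    with one list entry per detected copy (hypothetical input layout).
    """
    present = Counter()
    single = Counter()
    for genes in hits.values():
        for gene, copies in Counter(genes).items():
            present[gene] += 1          # present, regardless of copy number
            if copies == 1:
                single[gene] += 1       # present as single-copy
    return {
        gene: (present[gene] / n_assemblies, single[gene] / n_assemblies)
        for gene in present
    }


def single_copy_markers(coverage, threshold=0.95):
    """Keep genes whose single-copy fraction reaches the threshold."""
    return sorted(g for g, (_, sc) in coverage.items() if sc >= threshold)


# Toy data: four hypothetical assemblies, two hypothetical genes.
toy_hits = {
    "asm1": ["GENE_A", "GENE_A", "GENE_B"],
    "asm2": ["GENE_A", "GENE_B"],
    "asm3": ["GENE_B"],
    "asm4": [],
}
coverage = marker_coverage(toy_hits, n_assemblies=len(toy_hits))
kept = single_copy_markers(coverage, threshold=0.75)
```

In the toy data, `GENE_A` is present in two of four assemblies but single-copy in only one, so its two fractions differ. That gap is what the two sets of bars in the figure contrast.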
Figure 3.
Maximum likelihood (ML) tree of the concatenated alignment of UFCG marker genes, extracted from genomic, transcriptomic or proteomic data in 34 sequence datasets originating from three species of the order Eurotiales. As an outgroup, we included three species from the order Onygenales (highlighted in grey). Branches of the resulting tree were annotated with their bootstrap support and GSI values. Monophyletic clades clustered by species of origin were highlighted with coloured boxes (yellow, Talaromyces marneffei; purple, Aspergillus nidulans; green, Aspergillus niger). The type of sequence origin was marked with the respective symbol (refer to the legend).
