MycoKeys. 2022 Feb 2:86:177-194.
doi: 10.3897/mycokeys.86.76053. eCollection 2022.

The curse of the uncultured fungus

Kessy Abarenkov et al. MycoKeys. 2022.

Abstract

The international DNA sequence databases abound in fungal sequences not annotated beyond the kingdom level, typically bearing names such as "uncultured fungus". These sequences beget low-resolution mycological results and invite further deposition of similarly poorly annotated entries. What do these sequences represent? This study uses a 767,918-sequence corpus of public full-length fungal ITS sequences to estimate what proportion of the 95,055 "uncultured fungus" sequences represent truly unidentifiable fungal taxa, and what proportion would have been straightforward to annotate to some more meaningful taxonomic level at the time of sequence deposition. Our results suggest that more than 70% of these sequences would have been trivial to identify to at least the order/family level at the time of sequence deposition, hinting that factors other than poor availability of relevant reference sequences explain the low-resolution names. We speculate that researchers' perceived lack of time and lack of insight into the ramifications of this problem are the main explanations for the low-resolution names. We were surprised to find that more than a fifth of these sequences seem to have been deposited by mycologists rather than by researchers unfamiliar with the consequences of poorly annotated fungal sequences in molecular repositories. The proportion of these needlessly poorly annotated sequences does not decline over time, suggesting that this problem must not be left unchecked.

Keywords: DNA barcoding; Data interoperability; data mining; scientific practice; species identification; taxonomic annotation.

Figures

Figure 1.
A screenshot from species hypothesis SH1159264.08FU (Vishniacozyma victoriae; https://dx.doi.org/10.15156/BIO/SH1159264.08FU) in UNITE. Identifying a Vishniacozyma victoriae ITS sequence to at least the genus level is trivial, yet the screenshot hints at the swathes of kingdom level-annotated Vishniacozyma victoriae sequences regularly deposited in the INSDC. SequenceID – INSDC accession number. UNITE taxon name – taxonomic annotation in UNITE. INSD taxon name – original taxonomic annotation in INSDC. RefSeq – indicates a type-derived sequence. More than thirty studies have deposited kingdom-level annotations in this species hypothesis. The ones shown primarily stem from Nishizawa et al. (2010).
Figure 2.
Pie chart representing all 95,055 kingdom-level ITS sequences and the proportion of these that were true-positives (had no, or only very distant, taxonomically better-annotated BLAST matches at the time of sequence deposition/release; red, 10%), false-negatives (had only reasonable matches; green, 17%) and false-negatives (had close matches; blue, 73%). The chart suggests that nearly all kingdom-level fungal ITS sequences in INSDC could have been given a more taxonomically resolved name at the time of sequence deposition/release.
Figure 3.
The top 15 most common countries of collection for the publication-associated sequences annotated at or beyond the phylum level (green), expressed as the proportion of the sequences stemming from each country out of all phylum-level-and-beyond sequences. The same countries are shown for publication-associated sequences annotated only at the kingdom level (orange), similarly expressed as the proportion of sequences stemming from each country out of all kingdom-level sequences. Countries are listed in decreasing order of their share of the phylum-level sequences.
Figure 4.
The proportion of false-negative sequences (had reasonable matches; green) and false-negative sequences (had close matches; blue) out of all kingdom-level sequences over time (2001-2020). The figure suggests that the practice of taking sequence annotation very lightly shows no sign of abating. The data for 2020 extend through early November 2020 and are thus partial.

