The UNITE database for molecular identification and taxonomic communication of fungi and other eukaryotes: sequences, taxa and classifications reconsidered

Kessy Abarenkov¹, R Henrik Nilsson^{2,3}, Karl-Henrik Larsson^{3,4}, Andy F S Taylor^{5,6}, Tom W May⁷, Tobias Guldberg Frøslev⁸, Julia Pawlowska⁹, Björn Lindahl¹⁰, Kadri Põldmaa^{1,11}, Camille Truong⁷, Duong Vu¹², Tsuyoshi Hosoya¹³, Tuula Niskanen¹⁴, Timo Piirmann¹, Filipp Ivanov¹, Allan Zirk¹, Marko Peterson¹¹, Tanya E Cheeke¹⁵, Yui Ishigami¹¹, Arnold Tobias Jansson², Thomas Stjernegaard Jeppesen⁸, Erik Kristiansson¹⁶, Vladimir Mikryukov¹¹, Joseph T Miller⁸, Ryoko Oono¹⁷, Francisco J Ossandon¹⁸, Joana Paupério¹⁹, Irja Saar^{1,11}, Dmitry Schigel⁸, Ave Suija¹, Leho Tedersoo¹¹, Urmas Kõljalg¹¹

Affiliations

¹ Natural History Museum, University of Tartu, Vanemuise 46, 51003 Tartu, Estonia.
² Department of Biological and Environmental Sciences, University of Gothenburg, Box 453, 405 30 Göteborg, Sweden.
³ Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 453, 405 30 Göteborg, Sweden.
⁴ Natural History Museum, University of Oslo, Box 1172 Blindern, 0318 Oslo, Norway.
⁵ The James Hutton Institute, Craigiebuckler, Aberdeen AB15 8QH, UK.
⁶ Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, St Machar Drive, Aberdeen AB24 3UU, UK.
⁷ Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, VIC 3004, Australia.
⁸ Global Biodiversity Information Facility (GBIF), Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark.
⁹ Institute of Evolutionary Biology, Faculty of Biology, University of Warsaw, ul. Zwirki i Wigury 101, 02-089 Warsaw, Poland.
¹⁰ Swedish University of Agricultural Sciences, Department of Soil and Environment, Box 7014, SE-750 07 Uppsala, Sweden.
¹¹ Institute of Ecology and Earth Sciences, University of Tartu, J. Liivi 2, 50409 Tartu, Estonia.
¹² Westerdijk Fungal Biodiversity Institute, The Netherlands.
¹³ National Museum of Nature and Science, Japan.
¹⁴ Botany Unit, Finnish Museum of Natural History, P.O.Box 7, 00014 University of Helsinki, Finland.
¹⁵ School of Biological Sciences, Washington State University, 2710 Crimson Way, Richland, WA 9935, USA.
¹⁶ Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden.
¹⁷ Department of Ecology, Evolution, and Marine Biology, University of California at Santa Barbara, USA.
¹⁸ Biome Makers Inc., Davis, CA, USA.
¹⁹ European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

PMID: 37953409
PMCID: PMC10767974
DOI: 10.1093/nar/gkad1039

Kessy Abarenkov et al. Nucleic Acids Res. 2024 Jan 5;52(D1):D791-D797. doi: 10.1093/nar/gkad1039.

Abstract

UNITE (https://unite.ut.ee) is a web-based database and sequence management environment for molecular identification of eukaryotes. It targets the nuclear ribosomal internal transcribed spacer (ITS) region and offers nearly 10 million such sequences for reference. These are clustered into ∼2.4M species hypotheses (SHs), each assigned a unique digital object identifier (DOI) to promote unambiguous referencing across studies. UNITE users have contributed over 600 000 third-party sequence annotations, which are shared with a range of databases and other community resources. Recent improvements facilitate the detection of cross-kingdom biological associations and the integration of undescribed groups of organisms into everyday biological pursuits. Serving as a digital twin for eukaryotic biodiversity and communities worldwide, the latest release of UNITE offers improved avenues for biodiversity discovery, precise taxonomic communication and integration of biological knowledge across platforms.

Figures

**Figure 1.**
Diagram of the UNITE SH 9.0 calculation steps. The sequences are dereplicated using VSEARCH, and sequences that do not represent the full ITS region according to ITSx are dismissed. Following quality filtering, a series of successive clustering steps of generating subsets of 500 000 (500k) and 30 000 (30k) sequences and selecting core representative sequences (cRepS) is carried out. This yields what are termed ‘compound clusters’, which are sequence clusters roughly at the genus/subgenus level. These are further clustered into species hypotheses (SH). All clustering steps in the SH calculation workflow are performed using the USEARCH tool. The similarity thresholds (97%−95%−90%−80%) for the nested pre-clustering (5c, 6) were chosen to yield clusters at approximately the genus/subgenus level. A dissimilarity threshold (0.5%) for the complete-linkage clustering (5d) was selected to trim the dataset of closely related sequences around the core representative sequences. The core representative sequences undergo the final single-linkage clustering within a dissimilarity range of 0.5−3.0% with a 0.5% step. These dissimilarity thresholds were selected as the most commonly applied in species delimitation and sequence identification. For each SH, a representative sequence is selected, either automatically or based on prior manual curation. The species hypotheses are aligned to form the final SH datasets.
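
The final step described in the Figure 1 caption, single-linkage clustering repeated over a dissimilarity range of 0.5−3.0% in 0.5% steps, can be illustrated with a minimal sketch. This is a toy illustration only, not the UNITE implementation: the actual workflow runs USEARCH on aligned sequences, whereas the function names (`single_linkage_clusters`, `thresholds_percent`), the sequence identifiers, the pairwise dissimilarity values and the inclusive comparison (`<=`) used here are all assumptions made for the example.

```python
def single_linkage_clusters(ids, distances, threshold):
    """Group ids so that any pair with dissimilarity <= threshold (in percent)
    ends up in the same cluster, directly or through a chain of such pairs.
    The inclusive comparison is an assumption of this sketch."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


def thresholds_percent(start=0.5, stop=3.0, step=0.5):
    """Dissimilarity thresholds in percent: 0.5, 1.0, ..., 3.0."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 1) for k in range(n)]


# Hypothetical pairwise dissimilarities (percent) between four sequences.
ids = ["s1", "s2", "s3", "s4"]
distances = {
    ("s1", "s2"): 0.4,
    ("s2", "s3"): 1.2,
    ("s1", "s3"): 1.6,
    ("s1", "s4"): 5.0,
    ("s2", "s4"): 5.0,
    ("s3", "s4"): 5.0,
}

clusters_by_threshold = {
    t: single_linkage_clusters(ids, distances, t) for t in thresholds_percent()
}

for t, clusters in clusters_by_threshold.items():
    print(f"{t:.1f}%: {clusters}")
```

In this toy data, `s3` joins the cluster of `s1` and `s2` once the threshold reaches 1.5%, even though its distance to `s1` is 1.6%. This chaining through an intermediate sequence is characteristic of single-linkage clustering, and it shows how the same set of sequences can yield a different number of clusters at each threshold.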

**Figure 2.**
The number of species hypotheses at the 1.0% and 1.5% between-species distance thresholds through the four latest major versions of UNITE. Each SH is assigned a unique DOI every time the SHs are recomputed, and a versioning system keeps track of DOI names and contents, allowing users to follow how individual SHs are populated with sequences over time.

**Figure 3.**
(A) Treemap of the most abundant taxa (kingdom and phylum) based on the taxonomy of UNITE SHs at the 1.0% between-species distance threshold. (B) The number of UNITE SHs at the 1.0% distance threshold versus species names per fungal phylum in the Catalogue of Life (CoL) checklist from 2023-06-29.
