Genetics. 2024 May 7;227(1):iyae027.
doi: 10.1093/genetics/iyae027.

The Arabidopsis Information Resource in 2024

Leonore Reiser et al. Genetics.

Abstract

Since 1999, The Arabidopsis Information Resource (www.arabidopsis.org) has been curating data about the Arabidopsis thaliana genome. Its primary focus is integrating experimental gene function information from the peer-reviewed literature and codifying it as controlled vocabulary annotations. Our goal is to produce a "gold standard" functional annotation set that reflects the current state of knowledge about the Arabidopsis genome. At the same time, the resource serves as a nexus for community-based collaborations aimed at improving data quality, access, and reuse. For the past decade, our work has been made possible by subscriptions from our global user base. This update covers our ongoing biocuration work, some of our modernization efforts that contribute to the first major infrastructure overhaul since 2011, the introduction of JBrowse2, and the resource's role in community activities such as organizing the structural reannotation of the genome. For gene function assessment, we used gene ontology annotations as a metric to evaluate: (1) what is currently known about Arabidopsis gene function and (2) the set of "unknown" genes. Currently, 74% of the proteome has been annotated to at least one gene ontology term. Of those loci, half have experimental support for at least one of the following aspects: molecular function, biological process, or cellular component. Our work sheds light on the genes for which we have not yet identified any published experimental data and have no functional annotation. Drawing attention to these unknown genes highlights knowledge gaps and potential sources of novel discoveries.

Keywords: biocuration; community resource; model organism database; plant genomics.

Conflict of interest statement

Conflicts of interest: The author(s) declare no conflict of interest.

Figures

Fig. 1.
Histogram showing the annotation status of the Arabidopsis proteome by GO aspect and GO evidence class. The unknown set includes proteins with annotations to the root ontology term using the evidence code No biological Data available (ND). The experimental set includes proteins with at least one annotation using one of these evidence codes: Inferred from Direct Assay (IDA), Inferred from Expression Pattern (IEP), Inferred from Genetic Interaction (IGI), Inferred from Mutant Phenotype (IMP), Inferred from Physical Interaction (IPI), Inferred from High Throughput Direct Assay (HDA), Inferred from High Throughput Expression Pattern (HEP), or Inferred from Experiment (EXP). The non-experimental set includes proteins having only annotations with one or more of the following evidence codes: Inferred from Electronic Annotation (IEA), Inferred from Sequence or Structural Similarity (ISS), Non-traceable Author Statement (NAS), Traceable Author Statement (TAS), Inferred by Curator (IC), Inferred from Reviewed Computational Analysis (RCA), Inferred from Biological aspect of Ancestor (IBA), and Inferred from Sequence Model (ISM).
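The grouping described in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names, the `(locus, aspect, evidence_code)` input shape, the placeholder locus names, and the rule that an experimental code takes precedence over a non-experimental one, which in turn takes precedence over ND, are all assumptions made for the example.

```python
from collections import defaultdict

# Evidence code groups as listed in the caption (standard GO abbreviations).
EXPERIMENTAL = frozenset({"IDA", "IEP", "IGI", "IMP", "IPI", "HDA", "HEP", "EXP"})
NON_EXPERIMENTAL = frozenset({"IEA", "ISS", "NAS", "TAS", "IC", "RCA", "IBA", "ISM"})
UNKNOWN = frozenset({"ND"})


def classify(codes):
    """Assign one status to a set of evidence codes (precedence is an assumption)."""
    codes = set(codes)
    if codes & EXPERIMENTAL:
        return "experimental"      # at least one experimental code
    if codes & NON_EXPERIMENTAL:
        return "non-experimental"  # only non-experimental codes (plus possibly ND)
    if codes & UNKNOWN:
        return "unknown"           # only ND annotations to the root term
    return "unannotated"


def annotation_status(annotations):
    """Map (locus, aspect) -> status from (locus, aspect, evidence_code) rows."""
    codes_by_key = defaultdict(set)
    for locus, aspect, code in annotations:
        codes_by_key[(locus, aspect)].add(code)
    return {key: classify(codes) for key, codes in codes_by_key.items()}


# Hypothetical toy rows; the locus names are placeholders, not real AGI IDs.
toy_rows = [
    ("LOCUS_A", "MF", "IDA"),
    ("LOCUS_A", "MF", "IEA"),
    ("LOCUS_A", "BP", "IBA"),
    ("LOCUS_B", "CC", "ND"),
]
status = annotation_status(toy_rows)
```

Counting the values of `status` per aspect would give the bar heights of a histogram like the one in the figure.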
Fig. 2.
Venn diagram illustrating the overlap among proteins having ND annotations to each aspect. Files containing Arabidopsis Genome Initiative (AGI) locus IDs for each aspect (MF_UNK, unknown molecular function; BP_UNK, unknown biological process; CC_UNK, unknown cellular component; INSERT REF for files) were uploaded to the Vlaams Instituut voor Biotechnologie (VIB) Venn Diagram Generator (http://bioinformatics.psb.ugent.be/webtools/Venn/).
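The overlaps shown in such a three-set diagram can also be computed directly from the locus ID lists with set operations. The sketch below is a minimal stand-in for what a Venn diagram tool calculates, not the VIB tool itself; the function name and the toy identifiers are hypothetical.

```python
def venn_regions(mf_unk, bp_unk, cc_unk):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    mf, bp, cc = set(mf_unk), set(bp_unk), set(cc_unk)
    return {
        "MF only": mf - bp - cc,
        "BP only": bp - mf - cc,
        "CC only": cc - mf - bp,
        "MF and BP only": (mf & bp) - cc,
        "MF and CC only": (mf & cc) - bp,
        "BP and CC only": (bp & cc) - mf,
        "MF, BP and CC": mf & bp & cc,
    }


# Hypothetical toy identifiers standing in for AGI locus IDs.
regions = venn_regions(
    mf_unk=["g1", "g2", "g3", "g4"],
    bp_unk=["g2", "g3", "g5"],
    cc_unk=["g3", "g4", "g6"],
)
counts = {name: len(members) for name, members in regions.items()}
```

The "MF, BP and CC" region corresponds to loci with no known function, process, or component, which is the set of fully unknown genes the caption refers to.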
Fig. 3.
Histogram displaying the distribution of PANTHER 17 gene families containing Arabidopsis unknown proteins grouped by highest taxonomic classification. Plant-specific families are indicated by green bars.
Fig. 4.
Screenshot of the GOAT data submission interface after logging in via ORCID. a) Users add the DOI or PubMed ID for the paper they are curating. b) Users enter locus identifiers and any gene names/symbols. Users can add more genes by clicking the “Add Another Gene” button. c) Users must enter at least one annotation for at least one gene (specified in the above list). d) Users can add as many annotations as desired. e) They can choose different types of annotations from the drop-down menu. The type of annotation determines the set of GO or PO terms available as well as the types of evidence (Method). f) Once all annotations are entered, the user is prompted to review the submission before submitting. Submissions are then reviewed by a TAIR curator before being imported into TAIR and integrated into the GO database on a quarterly basis.
Fig. 5.
JBrowse2 interface showing the locus At1g31860, the structure of the three different gene models, locations of T-DNA insertions, supporting cDNAs, and mRNA-seq expression data as a coverage track.
Fig. 6.
JBrowse2 visualization of syntenic comparison between genomic regions of A. thaliana and A. lyrata. Individual tracks show A. lyrata protein coding genes (version 1.0), “Connectors” indicating syntenic regions between A. thaliana and A. lyrata, and A. thaliana protein coding genes (Araport11 release). Syntenic comparison was performed using MC-Scan and the output PAF file was uploaded into JBrowse2 to create the above syntenic panel.
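For readers unfamiliar with PAF, each line is a tab-separated record with 12 mandatory columns describing one aligned block between a query and a target sequence. The following is a minimal reader sketch; the class and function names and the toy record are hypothetical, and which genome serves as query and which as target in the authors' file is not stated in the caption.

```python
from typing import Iterable, Iterator, NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in file order."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int


def parse_paf(lines: Iterable[str]) -> Iterator[PafRecord]:
    """Yield one PafRecord per non-empty line; optional tag columns are ignored."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected at least 12 columns, got {len(f)}")
        yield PafRecord(
            f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
            f[5], int(f[6]), int(f[7]), int(f[8]),
            int(f[9]), int(f[10]), int(f[11]),
        )


# Hypothetical toy record; sequence names and coordinates are made up.
toy_paf = ["seqA\t1000\t100\t600\t+\tseqB\t2000\t300\t800\t450\t500\t60\n"]
records = list(parse_paf(toy_paf))
identity = records[0].matches / records[0].block_length
```

Each record gives the paired intervals that a synteny viewer draws as a connector between the two genomes.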
Fig. 7.
Phases of genome reannotation.
