Front Immunol. 2018 Aug 16;9:1877.
doi: 10.3389/fimmu.2018.01877. eCollection 2018.

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories

Syed Ahmad Chan Bukhari et al. Front Immunol.

Abstract

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed the Minimal Information about an Adaptive Immune Receptor Repertoire (MiAIRR) standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to find, access, interoperate, and reuse the data sets. To improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies that use data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
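
To illustrate what it can mean for template entries to be controlled by ontology terms, the following minimal Python sketch checks a metadata record against a small controlled vocabulary before submission. It is not CAIRR's implementation: the field names, the term lists, and the `validate_record` helper are all hypothetical, and a real template would draw its permitted values from published ontologies rather than from a hard-coded dictionary.

```python
from typing import Dict, List, Set

# Hypothetical controlled vocabularies (assumption, not from MiAIRR or CEDAR);
# a real template would take permitted values from published ontologies.
CONTROLLED_TERMS: Dict[str, Set[str]] = {
    "organism": {"Homo sapiens", "Mus musculus"},
    "tissue": {"blood", "spleen", "bone marrow"},
}

# Hypothetical required fields for this sketch.
REQUIRED_FIELDS: List[str] = ["sample_id", "organism", "tissue"]


def validate_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the record; an empty list means valid."""
    problems: List[str] = []
    # First pass: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    # Second pass: non-blank values of controlled fields must be known terms.
    for field, allowed in CONTROLLED_TERMS.items():
        value = record.get(field, "").strip()
        if value and value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems
```

In this sketch a free-text value such as `human` is rejected for the `organism` field, which is the kind of inconsistent terminology that ontology-controlled entry is meant to prevent.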

Keywords: B cell receptor; National Center for Biotechnology Information; Rep-seq; T cell receptor; antibody; immune-repertoire sequencing; ontology.

Figures

Figure 1
CAIRR Submission Pipeline Workflow. (1) The CEDAR Template Designer is employed to create a set of templates according to the Minimal Information about an Adaptive Immune Receptor Repertoire (MiAIRR) standard. (2) Scientists can log into the CEDAR Workbench and use these templates to edit ontology-controlled metadata associated with their AIRR-sequencing study. The edited metadata is pre-validated through the National Center for Biotechnology Information (NCBI) validation service. (3) Scientists can start the submission process by accessing the Submission Manager within their CEDAR Workbench workspace. (4) The Submission Manager connects the CEDAR Workbench to the NCBI. (5) The Submission Manager facilitates uploading the metadata and data (FASTQ files) to the NCBI. (6) The CAIRR pipeline periodically checks the submission status at the NCBI. (7) Alert messages from NCBI are received by the Submission Manager. (8) These alert messages provide step-by-step processing detail to the scientists.
Figure 2
The Minimal Information about an Adaptive Immune Receptor Repertoire (MiAIRR) fields are transformed into a CEDAR template using the CEDAR Template Designer. Fields specified by MiAIRR (left panel) are transformed into a CEDAR template (right panel).
Figure 3
Ontology-controlled editing of metadata for an adaptive immune receptor repertoire study. (1) CEDAR’s Metadata Editor presents this web form based on the MiAIRR template produced by the Template Designer. The paging option allows a data submitter to add or delete BioSample and Sequence Read Archive (SRA) records. (2) Some of the BioSample and SRA metadata are controlled through ontologies, which allows auto-completion during data entry. (3) The spreadsheet toggle allows data submitters to edit metadata in a traditional spreadsheet view.
Figure 4
CAIRR data submission. (1) Data submitters choose the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) as the target repository, and then upload the datasets to be submitted. (2) CAIRR provides submission acknowledgment and data-processing-level messages generated by the NCBI system.
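
The status-checking behavior described in the captions for Figures 1 and 4 (periodic checks of the submission status, with messages relayed to the submitter) can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `poll_submission`, the injected `fetch_status` callable, and the status strings are hypothetical and do not correspond to any real CEDAR or NCBI interface.

```python
import time
from typing import Callable, List

# Hypothetical terminal statuses (assumption); the real NCBI status
# vocabulary may differ.
TERMINAL_STATUSES = {"processed", "failed"}


def poll_submission(
    fetch_status: Callable[[], str],
    interval_seconds: float = 60.0,
    max_checks: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until a terminal status or max_checks is reached.

    Returns the sequence of status changes observed, in order.
    """
    history: List[str] = []
    for check in range(max_checks):
        status = fetch_status()
        # Record a status only when it differs from the previous one,
        # mimicking an alert that fires on each change.
        if not history or history[-1] != status:
            history.append(status)
        if status in TERMINAL_STATUSES:
            break
        # Wait before the next check, but not after the final one.
        if check < max_checks - 1:
            sleep(interval_seconds)
    return history
```

Passing the status source and the sleep function as parameters keeps the sketch independent of any particular service and makes it easy to exercise with canned statuses.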

