Nat Commun. 2021 Mar 12;12(1):1652.

doi: 10.1038/s41467-021-21894-x.

Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Ryan Lusk¹, Evan Stene², Farnoush Banaei-Kashani², Boris Tabakoff³, Katerina Kechris⁴, Laura M Saba³

Affiliations

¹ Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. ryan.lusk@cuanschutz.edu.
² Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, USA.
³ Department of Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
⁴ Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

PMID: 33712618
PMCID: PMC7955126
DOI: 10.1038/s41467-021-21894-x

Ryan Lusk et al. Nat Commun. 2021.

Abstract

Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, aptardi takes as input DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify polyadenylation sites expressed in the biological sample and refines transcript 3'-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved more than three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi's input is simple to compile, and its output is readily amenable to downstream analyses such as quantitation and differential expression.
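
The abstract compares models by average precision and recall. The sketch below shows one common way to compute these metrics from a binary label and a predicted probability per 100-base bin (the unit used in Fig. 2). It is an illustration under stated assumptions, not aptardi's code: the function names are hypothetical, a bin is called a polyA site when its probability exceeds 0.50 (the threshold quoted in Fig. 2e), and average precision uses the step-wise definition AP = Σ (R_n − R_{n−1}) P_n.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall when every bin scoring above `threshold` is called a polyA site."""
    tp = sum(1 for y, s in zip(labels, scores) if s > threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s > threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: proportion of true polyA sites that the model detects.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Rank bins from most to least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        # Bins with tied scores share a single threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
        i = j
    return ap


# Hypothetical toy data: 1 = bin contains a polyA site, 0 = it does not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2]
print(precision_recall(labels, scores))
print(average_precision(labels, scores))
```

Average precision summarizes performance over all thresholds, whereas precision and recall describe a single operating point; reporting both, as the abstract and Fig. 2 do, separates ranking quality from the choice of cutoff.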


Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Overview for using aptardi.**
Aptardi requires three files as input: (1) a FASTA file of DNA sequence with headers by chromosome, (2) a sorted Binary Alignment Map (BAM) file of reads aligned to the genome, and (3) a Gene Transfer Format (GTF) file of transcript structures. Blue boxes represent software. Yellow text and boxes indicate where aptardi is incorporated. Note that transcript structures can be derived from a reference transcriptome (e.g., Ensembl annotation) in lieu of the original transcriptome generated by a transcriptome assembler.
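
One of the three inputs is a GTF file of transcript structures, and aptardi works by refining transcript 3' ends. As a hedged illustration (not aptardi's implementation; the function name is hypothetical), the 3' terminus of each transcript can be read from the GTF exon records: GTF coordinates are 1-based and inclusive, so the 3' end is the largest exon end on the + strand and the smallest exon start on the - strand. Skipping unstranded transcripts is an assumption of this sketch.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Attribute column (9th field) of a GTF line, e.g. gene_id "G1"; transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_three_prime_ends(gtf_lines: Iterable[str]) -> Dict[str, Tuple[str, str, int]]:
    """Map transcript_id -> (chromosome, strand, 1-based coordinate of the 3' terminus)."""
    exons: Dict[str, List[Tuple[str, str, int, int]]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are used to locate transcript ends here
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        exons[match.group(1)].append((chrom, strand, start, end))

    ends: Dict[str, Tuple[str, str, int]] = {}
    for tid, rows in exons.items():
        chrom, strand = rows[0][0], rows[0][1]
        if strand == "+":
            # Plus strand: the 3' end is the rightmost exon end.
            ends[tid] = (chrom, strand, max(r[3] for r in rows))
        elif strand == "-":
            # Minus strand: the 3' end is the leftmost exon start.
            ends[tid] = (chrom, strand, min(r[2] for r in rows))
        # Unstranded ('.') transcripts are skipped because their 3' end is ambiguous.
    return ends
```

Because the function accepts any iterable of lines, it can be called directly on an open file handle, for example `with open("transcripts.gtf") as handle: ends = transcript_three_prime_ends(handle)`, where the file name is a placeholder.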

**Fig. 2. DNA sequence and RNA-sequencing (RNA-Seq) features are individually associated with polyadenylation (polyA) sites.**
a Percentage of 100-base bins containing each of the three strong polyA signals, stratified by whether the bin does not contain (blue) or contains (orange) a polyA site. b Distribution of the inter-bin RNA-Seq features for each 100-base bin, stratified by whether the bin does not contain (blue) or contains (orange) a polyA site (RNA-Seq ratio features were standardized using the training set). c RNA-Seq features and DNA sequence features display little correlation across omics types (two-sided Pearson product-moment correlation). Combining RNA-Seq and DNA sequence information improves d average precision and e precision and recall at a specific prediction threshold (probability >0.50) compared with either source alone. For both d and e, data are presented as mean ± standard deviation on the test set (n = 5 random train-validate-test splits). Data shown are from the Human Brain Reference data set.
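
Panel a summarizes a DNA sequence feature per 100-base bin. The sketch below shows how such a summary could be computed; it is not aptardi's feature code. The caption does not name the three strong polyA signals, so the canonical hexamers AATAAA and ATTAAA are used as placeholders; the sequence is assumed to be on the transcript's sense strand, and a hexamer that straddles a bin boundary is not counted. All names are hypothetical.

```python
from typing import Dict, List, Sequence

# Placeholder hexamers: AATAAA and ATTAAA are the canonical polyA signals;
# the caption does not list the three signals aptardi actually uses.
DEFAULT_SIGNALS = ("AATAAA", "ATTAAA")


def bin_signal_flags(sequence: str, signals: Sequence[str] = DEFAULT_SIGNALS,
                     bin_size: int = 100) -> List[Dict[str, bool]]:
    """For each consecutive bin, record whether each signal occurs entirely inside it."""
    seq = sequence.upper()
    flags = []
    for start in range(0, len(seq), bin_size):
        window = seq[start:start + bin_size]
        # A hexamer straddling a bin boundary is not counted (simplifying assumption).
        flags.append({sig: sig in window for sig in signals})
    return flags


def percent_with_signal_by_label(flags: List[Dict[str, bool]], labels: Sequence[int],
                                 signal: str) -> Dict[int, float]:
    """Percent of bins containing `signal`, split by label (0 = no polyA site, 1 = polyA site)."""
    result = {}
    for label in (0, 1):
        group = [f[signal] for f, y in zip(flags, labels) if y == label]
        result[label] = 100.0 * sum(group) / len(group) if group else 0.0
    return result
```

The stratified percentages correspond to the blue (label 0) and orange (label 1) bars described in panel a; a signal that is enriched in label-1 bins is informative on its own, which is the point the panel makes.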

**Fig. 3. The machine learning pipeline used to build aptardi is robust to different data sets, and the aptardi prediction model generated from the Human Brain Reference data set is applicable across diverse data sets.**
Blue bars indicate the performance of the data set-specific prediction model on its own data set, i.e., the model was built and evaluated on a single data set. Orange bars represent the performance of the aptardi prediction model (built from the Human Brain Reference data set) on the given data set (x axis).

**Fig. 4. Incorporating aptardi transcripts into the original transcriptome improves the ratio of true positive to false positive 3′ termini compared with the original transcriptome and compared with the Tool for Alternative Polyadenylation site AnalysiS (TAPAS) analysis on the original transcriptome.**
Results from transcripts that aptardi added to the original transcriptome are shaded dark. A transcript whose 3′ terminus lay within 100 bases (in either direction) of a true polyadenylation site from PolyA-Seq data was counted as a true positive; otherwise it was counted as a false positive. Data shown are from the Human Brain Reference data set.
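
The ±100-base rule in Fig. 4 can be sketched as an interval lookup against sorted true-site coordinates. This is an illustration, not the authors' evaluation script: the function name is hypothetical, and requiring a matching chromosome and strand is an assumption the caption does not state.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

Site = Tuple[str, str, int]  # (chromosome, strand, position)


def classify_three_prime_ends(predicted: Sequence[Site], true_sites: Sequence[Site],
                              window: int = 100) -> Tuple[int, int]:
    """Count (true positives, false positives) among predicted 3' termini."""
    # Index true polyA sites by (chromosome, strand) as sorted position lists.
    index: Dict[Tuple[str, str], List[int]] = {}
    for chrom, strand, pos in true_sites:
        index.setdefault((chrom, strand), []).append(pos)
    for positions in index.values():
        positions.sort()

    tp = fp = 0
    for chrom, strand, pos in predicted:
        positions = index.get((chrom, strand), [])
        # First true site at or after pos - window; a hit if it is also <= pos + window.
        i = bisect.bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            tp += 1
        else:
            fp += 1
    return tp, fp
```

The ratio plotted in Fig. 4 would then be `tp / fp` for each transcriptome; the binary search keeps each lookup logarithmic in the number of true sites, which matters when comparing genome-wide sets of 3' termini.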

**Fig. 5. Aptardi displays sample-specific sensitivity when annotating transcription stop sites.**
RNA-sequencing (RNA-Seq) read densities for a *CCND1*, b *DICER1*, and c *TIMP2* after control (Control) siRNA treatment and CFIm25 knockdown (KD) in HeLa cells. Numbers on the y axis indicate RNA-Seq read coverage. After knockdown, each gene preferentially expresses a proximal alternative polyadenylation (APA) site compared with control conditions. Transcript structures shown are from RefSeq annotation (dark blue), where boxes and lines indicate exons and introns, respectively. Black vertical lines indicate transcript stop sites identified in the original transcriptome; red vertical lines indicate transcript stop sites identified only in the aptardi-modified transcriptome that match the original study’s findings; and blue vertical lines indicate transcript stop sites identified only in the aptardi-modified transcriptome that are not described in the original study. Graphics were generated using the UCSC Genome Browser (https://genome.ucsc.edu/) with the hg38 human genome assembly.

**Fig. 6. Incorporation of aptardi into differential expression analyses.**
RNA-sequencing (RNA-Seq) read densities for six genes in BNLx and SHR inbred rat strains. Numbers on the y axis indicate RNA-Seq read coverage. Read coverage represents the aggregate of three biological samples for each strain. Transcript structures shown are from Ensembl annotation (dark red), where boxes and lines indicate exons and introns, respectively. Black vertical lines denote transcript stop sites identified in the original transcriptome derived using StringTie, and red vertical lines denote transcript stop sites identified only in the aptardi-modified transcriptome. No transcripts were identified as differentially expressed between strains in the original transcriptome (p > 0.001), but at least one differentially expressed transcript per gene was identified in the aptardi-modified transcriptome (p ≤ 0.001). For a *Unc79*, b *Sf3b1*, c *Ptn*, and d *Ap3b1*, the original transcript isoform (black line) was differentially expressed in the aptardi-modified transcriptome, and for e *Zdhhc22* and f *RGD1559441*, the aptardi transcript (red line) was differentially expressed. Graphics were generated using the UCSC Genome Browser (https://genome.ucsc.edu/) with the rn6 rat genome assembly.


Cited by

InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data.
Liu X, Chen H, Li Z, Yang X, Jin W, Wang Y, Zheng J, Li L, Xuan C, Yuan J, Yang Y. Nat Commun. 2024 Mar 22;15(1):2583. doi: 10.1038/s41467-024-46875-8. PMID: 38519498.
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes.
Sethi S, Zhang D, Guelfi S, Chen Z, Garcia-Ruiz S, Olagbaju EO, Ryten M, Saini H, Botia JA. Nat Commun. 2022 Apr 27;13(1):2270. doi: 10.1038/s41467-022-30017-z. PMID: 35477703.
TDP-43 loss induces extensive cryptic polyadenylation in ALS/FTD.
Bryce-Smith S, Brown AL, Mehta PR, Mattedi F, Mikheenko A, Barattucci S, Zanovello M, Dattilo D, Yome M, Hill SE, Qi YA, Wilkins OG, Sun K, Ryadnov E, Wan Y; NYGC ALS Consortium; Vargas JNS, Birsa N, Raj T, Humphrey J, Keuss M, Ward M, Secrier M, Fratta P. bioRxiv [Preprint]. 2024 Jan 23:2024.01.22.576625. doi: 10.1101/2024.01.22.576625. PMID: 38313254.
Hypoviral-regulated HSP90 co-chaperone p23 (CpCop23) determines the colony morphology, virulence, and viral response of chestnut blight fungus Cryphonectria parasitica.
Ko YH, Chun J, Yang HE, Kim DH. Mol Plant Pathol. 2023 May;24(5):413-424. doi: 10.1111/mpp.13308. Epub 2023 Feb 10. PMID: 36762926.
Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data.
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. RNA. 2023 Dec;29(12):1839-1855. doi: 10.1261/rna.079849.123. Epub 2023 Oct 10. PMID: 37816550. Review.
