PLoS One. 2015 Oct 19;10(10):e0140885.
doi: 10.1371/journal.pone.0140885. eCollection 2015.

Increased Transcript Complexity in Genes Associated with Chronic Obstructive Pulmonary Disease

Lela Lackey et al. PLoS One. 2015.

Abstract

Genome-wide association studies aim to correlate genotype with phenotype. Many common diseases, including Type II diabetes, Alzheimer's, Parkinson's and Chronic Obstructive Pulmonary Disease (COPD), are complex genetic traits with hundreds of different loci that are associated with varied disease risk. Identifying common features in the genes associated with each disease remains a challenge. Furthermore, the role of post-transcriptional regulation, and in particular alternative splicing, is still poorly understood in most multigenic diseases. We therefore compiled comprehensive lists of genes associated with Type II diabetes, Alzheimer's, Parkinson's and COPD in an attempt to identify common features of their corresponding mRNA transcripts within each gene set. The SERPINA1 gene is a well-recognized genetic risk factor for COPD and it produces 11 transcript variants, which is exceptional for a human gene. This led us to hypothesize that other genes associated with COPD, and complex disorders in general, are highly transcriptionally diverse. We found that COPD-associated genes have a statistically significant enrichment in transcript complexity stemming from a disproportionately high level of alternative splicing; however, Type II diabetes, Alzheimer's and Parkinson's disease genes were not significantly enriched. We also identified a subset of transcriptionally complex COPD-associated genes (~40%) that are differentially expressed between mild, moderate and severe COPD. Although the genes associated with other lung diseases are not extensively documented, we found preliminary data that idiopathic pulmonary disease genes, but not cystic fibrosis modulators, are also more transcriptionally complex. Interestingly, complex COPD transcripts are more often the product of alternative acceptor site usage.
To verify the biological importance of these alternative transcripts, we used RNA-sequencing analyses to determine that COPD-associated genes are frequently expressed in lung and liver tissues and are regulated in a tissue-specific manner. Additionally, many complex COPD-associated genes are spliced differently between COPD and non-COPD patients. Our analysis therefore suggests that post-transcriptional regulation, particularly alternative splicing, is an important feature specific to COPD disease etiology that warrants further investigation.
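
The enrichment claim above compares the transcript count of a disease gene set against control gene sets. A minimal sketch of one way such a comparison could be run is a resampling test: draw many random control sets of the same size from a background list and ask how often their mean transcript count reaches the observed mean. This is an illustration under stated assumptions, not the authors' pipeline. The function name, the 1000 resamples and the add-one correction are all hypothetical choices, and the length normalization of control loci described in the paper is omitted here.

```python
import random
from statistics import mean


def empirical_p_value(disease_counts, background_counts, n_resamples=1000, seed=0):
    """One-sided empirical p-value for a high mean transcript count.

    disease_counts: transcripts per gene for the disease-associated set.
    background_counts: transcripts per gene for all candidate control genes.
    Assumption: controls are drawn uniformly at random; matching controls
    by gene length, as the paper describes, is NOT done in this sketch.
    """
    rng = random.Random(seed)
    observed = mean(disease_counts)
    set_size = len(disease_counts)
    hits = 0
    for _ in range(n_resamples):
        # Draw a control set of the same size, without replacement.
        control = rng.sample(background_counts, set_size)
        if mean(control) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

With 1000 resamples the smallest value this sketch can return is 1/1001, so a reported p-value near 0.001 would correspond to no random control set reaching the observed mean.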

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Combining sources to identify genes associated with COPD.
(A) The COPD-associated gene, SERPINA1, is alternatively spliced to produce 11 different transcripts. Two transcription start sites are indicated by arrows; splice sites are shown as colored lollipops. Exons are indicated by white bars and introns by horizontal gray bars. The coding sequence start and stop codons are indicated with pink lines. All 11 transcripts are depicted and colored by splice site selection. The 11 transcripts make SERPINA1 a particularly complex gene in terms of alternative splicing; 99.5% of human genes have fewer transcripts. (B) COPD-associated genes were identified by merging disease-associated genes from different literature reviews (left) and combining them with genes from the NHGRI GWAS catalog (right). Other comparative disease lists were compiled in the same manner, including (C) Parkinson’s disease-associated genes, (D) Type 2 diabetes-associated genes and (E) Alzheimer’s disease-associated genes. The number of genes from each source is indicated in the Venn diagrams.
Fig 2. COPD-associated genes are transcriptionally complex.
(A) We calculated the average number of transcripts per locus for length-normalized control loci and each disease-associated gene list. The control sets used as comparisons are labeled. The mean and standard deviation of the control lists are shown (black bar, grey box). Only the COPD-associated gene list is significantly different from the normalized control (p = 0.001). (B) We analyzed the number of gene loci that produce 1, 2, 3, 4, 5, and 6 or more transcripts and plotted these gene loci by their proportion in each list. The COPD-associated gene set has significantly fewer loci with 1 transcript and significantly more loci with more than 5 transcripts (p = 0.005 and p = 0.001, respectively).
Fig 3. The transcript complexity enrichment of COPD-associated genes is robust.
(A) Microarray data from the sputum of ex-smokers diagnosed with COPD stage 2, 3, or 4 identified 85 COPD-associated genes with significant differential expression (gds4265). These 85 genes are shown on the y-axis and individuals are shown on the x-axis with their COPD status marked by disease stage along the top. (B) We calculated the average number of transcripts per locus for these 85 differentially expressed COPD-associated genes along with length-normalized control gene sets. The mean and standard deviation of the control lists are shown (black bar, grey box). These expressed genes are significantly more transcriptionally complex than controls (p = 0.001).
Fig 4. COPD splice junctions have lower GC content than expected.
(A) Diagram of a representative gene with the regions of interest labeled, including the transcription start site (TSS), the first donor splice site (Donor), the first acceptor splice site (Accept) and the subsequent internal donor and acceptor splice sites (DonorMid and AcceptMid). (B) We calculated the average GC content for the 60 nucleotides surrounding each region in COPD-associated genes as well as genes associated with other diseases (PRK, T2D and ALZ). As a comparison, we also analyzed the GC content of genes with disease-associated SNPs within the TSS or splice junctions (black dots).
Fig 5. Skipped exons and alternative acceptor splicing events are differentially enriched in COPD genes.
(A) SERPINA1 contains a number of different splicing events including (B) skipped exons, (C) alternative donors and (D) alternative acceptors. (E) On average, COPD genes contain fewer skipped exons and significantly more alternative acceptor splice events (p = 0.030) per total splice events in comparison with normalized reference genes.
Fig 6. Highly expressed COPD-associated gene transcripts are enriched in lung tissue.
The highest expressed transcript from each COPD-associated gene is shown as a percentage of expression in each of the 16 tissues from BodyMap. COPD-associated gene expression is highest in lung, white blood cells, testes and liver (14.5%, 9.9%, 9.0% and 8.7%, respectively). Other COPD-associated transcripts not in these tissues are still tissue specific and may be detrimental if expressed in the lung. Splice variants were determined as alternative splice sites through ASprofile [95].
Fig 7. Usage of SERPINA1 and AGER splice variants by tissue and COPD status.
(A) The expression of each splice variant in SERPINA1 was normalized across tissues. The pattern of expression shows that these splice variants are not broadly used in every tissue, but are specific to the kidney, liver, lung and white blood cells. (B) Likewise, AGER is expressed in nearly every tissue tested, but its splice variants have specific patterns that imply regulation. Splice variants were determined as alternative splice sites through ASprofile [95]. (C) RNA-seq data from control subjects and COPD patients indicate that in SERPINA1 four exons are significantly differentially used (p < 0.05). (D) In the AGER gene twelve exons are differentially expressed in COPD patients compared to normal controls (p < 0.05). Exon usage was generated with DEXSeq [101].
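
The Fig 4 caption describes averaging GC content over the 60 nucleotides surrounding each transcription start site or splice site. A minimal sketch of that calculation is given below, assuming the 60-nucleotide window is split as 30 nucleotides on each side of a 0-based site coordinate; that split, and all function names, are hypothetical and are not taken from the paper.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_around(sequence, site, flank=30):
    """Return up to `flank` bases on each side of a 0-based site index.

    Assumption: a 60-nt window means 30 nt upstream plus 30 nt downstream.
    Windows are clipped at the ends of the sequence.
    """
    start = max(0, site - flank)
    end = min(len(sequence), site + flank)
    return sequence[start:end]


def mean_gc_around_sites(sequence, sites, flank=30):
    """Average GC fraction over the windows around each listed site."""
    values = [gc_content(window_around(sequence, s, flank)) for s in sites]
    return sum(values) / len(values) if values else 0.0
```

In practice the site coordinates would come from a gene annotation, and one mean would be computed per region type (TSS, Donor, Accept, DonorMid, AcceptMid) for each gene list.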

References

    1. Baralle D, Baralle M. Splicing in action: assessing disease causing sequence changes. Journal of medical genetics. 2005;42(10):737–48. doi: 10.1136/jmg.2004.029538
    2. Chatterjee S, Pal JK. Role of 5'- and 3'-untranslated regions of mRNAs in human diseases. Biology of the cell / under the auspices of the European Cell Biology Organization. 2009;101(5):251–62. doi: 10.1042/BC20080104
    3. Lu ZX, Jiang P, Xing Y. Genetic variation of pre-mRNA alternative splicing in human populations. Wiley interdisciplinary reviews RNA. 2012;3(4):581–92. doi: 10.1002/wrna.120
    4. Singh RK, Cooper TA. Pre-mRNA splicing in disease and therapeutics. Trends in molecular medicine. 2012;18(8):472–82. doi: 10.1016/j.molmed.2012.06.006
    5. Manolio TA. Genomewide association studies and assessment of the risk of disease. The New England journal of medicine. 2010;363(2):166–76. doi: 10.1056/NEJMra0905980
