Proc Natl Acad Sci U S A. 2006 May 16;103(20):7735-40.
doi: 10.1073/pnas.0601893103. Epub 2006 May 9.

Genetic variation in putative regulatory loci controlling gene expression in breast cancer

Vessela N Kristensen et al. Proc Natl Acad Sci U S A. 2006.

Abstract

Candidate single-nucleotide polymorphisms (SNPs) were analyzed for associations with an unselected whole-genome pool of tumor mRNA transcripts in 50 unrelated patients with breast cancer. SNPs were selected from 203 candidate genes of the reactive oxygen species pathway. We describe a general statistical framework for the simultaneous analysis of gene expression data and SNP genotype data measured for the same cohort, which revealed significant associations between subsets of SNPs and transcripts, shedding light on the underlying biology. We identified SNPs in EGF, IL1A, MAPK8, XPC, SOD2, and ALOX12 that are associated with the expression patterns of a significant number of transcripts, indicating the presence of regulatory SNPs in these genes. SNPs were found to act in trans in a total of 115 genes. SNPs in 43 of these 115 genes were found to act both in cis and in trans. Finally, subsets of SNPs that share a significantly large number of common associations with a set of transcripts (biclusters) were identified. The subsets of transcripts that are significantly associated with the same set of SNPs or with a single SNP were shown to be functionally coherent in Gene Ontology and pathway analyses and coexpressed in other independent data sets, suggesting that many of the observed associations are within the same functional pathways. To our knowledge, this article is the first study to correlate SNP genotype data in the germ line with somatic gene expression data in breast tumors. It provides the statistical framework for further genotype-expression correlation studies in cancer data sets.

Conflict of interest statement

No conflicts declared.

Figures

Fig. 1.
Data mining and analysis workflow. (A) A total of 583 SNPs in 203 candidate genes from the ROS metabolizing and signaling pathway were selected from an initial pool of 4,000 SNPs in 233 genes. These 583 selected SNPs were analyzed for associations to 3,351 mRNA transcripts from a whole-genome expression analysis, filtered for signal quality (ratio of spot intensity over background exceeding 1.5 in at least 80% of the experiments in each dye channel). A subset of SNPs and a subset of transcripts that belong to biclusters were identified. (B) A heat map of −log10 (P value) of SNP–transcript associations, with range from 0 to −log10(9.5E-005) = 4.02. Bright yellow indicates significant associations. Rows and columns are reordered to highlight biclusters, subsets of SNPs, and transcripts that share significantly many common significant associations (one example is highlighted with a red oval). (C) GO analysis was used to study the overrepresentation of GO functional classes in these sets of mRNA transcripts. The size of the corresponding node of the GO tree is proportional to the significance of the overrepresentation of the term. [B and C are reproduced with permission from ref. (Copyright 2005, IEEE).]
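The two numeric rules in this caption (the signal-quality filter and the −log10 scale of the heat map) can be illustrated with a minimal Python sketch. The function names and the input layout (one list of spot/background ratios per dye channel) are assumptions made for illustration; the caption does not describe the authors' actual implementation.

```python
import math

def passes_quality_filter(ratios_ch1, ratios_ch2, min_ratio=1.5, min_fraction=0.8):
    """Hypothetical filter: keep a transcript only if, in each dye channel,
    the spot/background ratio exceeds min_ratio in at least min_fraction
    of the experiments (thresholds taken from the caption)."""
    def channel_ok(ratios):
        n_good = sum(1 for r in ratios if r > min_ratio)
        return n_good >= min_fraction * len(ratios)
    return channel_ok(ratios_ch1) and channel_ok(ratios_ch2)

def neg_log10(p_value):
    """Transform a P value to the -log10 scale used for the heat map."""
    return -math.log10(p_value)

# Upper end of the heat-map range quoted in the caption.
print(round(neg_log10(9.5e-5), 2))  # 4.02
```

On this scale, smaller P values map to larger values, so the most significant associations sit at the top of the 0 to 4.02 range.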
Fig. 2.
Overabundance analysis for QMIS-based associations. Left shows a comparison of distributions of observed and expected numbers of SNP–transcript pairs with a certain P value or lower. [Reproduced with permission from ref. (Copyright 2005, IEEE).] Inset shows the same restricted to P values between 1.0E-06 and 1.0E-04. Right shows the corresponding FDR. P values were computed exactly under a null model of uniform distribution of SNP genotype patterns of the same mixture.
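As a rough illustration of how an observed-versus-expected comparison leads to an FDR curve, the sketch below counts pairs at or below each P-value threshold and compares that count with the number expected if P values were uniform under the null. This is a generic textbook estimate (expected divided by observed), not the exact null model named in the caption, and the function name `overabundance` is hypothetical.

```python
def overabundance(p_values, thresholds):
    """For each threshold t, return (t, observed, expected, fdr), where
    observed = number of tests with P <= t,
    expected = t * number of tests (assumes uniform null P values),
    fdr      = expected / observed, capped at 1 (None if observed is 0)."""
    n_tests = len(p_values)
    rows = []
    for t in thresholds:
        observed = sum(1 for p in p_values if p <= t)
        expected = t * n_tests
        fdr = min(1.0, expected / observed) if observed else None
        rows.append((t, observed, expected, fdr))
    return rows

# Toy input: 10 small P values among 100 tests (made-up numbers).
toy = [0.001] * 10 + [0.5] * 90
for row in overabundance(toy, [0.01, 0.6]):
    print(row)
```

An excess of observed over expected counts at small thresholds is what the term "overabundance" refers to; the ratio gives a simple estimate of the fraction of those calls that are false.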
Fig. 3.
Pairwise expression correlation of transcripts associated with the same SNP gene in two data sets. (A) Observed and expected correlation z scores for transcript subsets associated with SNPs in each candidate SNP gene. The expected distribution of z scores is computed as the correlation of n randomly selected transcripts for each transcript set of size n. For each n, 100 random subsets were drawn. Error bars correspond to 1 SD. (B) z scores for the expression correlation of the corresponding subsets of transcripts in another breast cancer data set (17). Sets of transcripts associated with SNPs in ABCB1, BAK1, AKT2, and ABCC1 genes have expression correlation z scores in both data sets. Note that transcript sets are not ordered in the same way in both plots.
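The random-subset baseline described in panel A can be sketched as follows. The caption does not define the correlation statistic, so this example assumes the mean pairwise Pearson correlation of a transcript set; the function names and the `expr` layout (transcript id mapped to a list of expression values across samples) are likewise hypothetical.

```python
import random
import statistics
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_pairwise_correlation(expr, ids):
    """Assumed set-level statistic: average correlation over all pairs."""
    return statistics.fmean(
        pearson(expr[a], expr[b]) for a, b in combinations(ids, 2)
    )

def null_distribution(expr, n, draws=100, seed=0):
    """Draw `draws` random transcript subsets of size n and return the
    mean and standard deviation of their set-level statistic."""
    rng = random.Random(seed)
    all_ids = list(expr)
    scores = [
        mean_pairwise_correlation(expr, rng.sample(all_ids, n))
        for _ in range(draws)
    ]
    return statistics.fmean(scores), statistics.stdev(scores)

def set_z_score(expr, ids, draws=100, seed=0):
    """Observed statistic standardized against the random-subset baseline."""
    mu, sd = null_distribution(expr, len(ids), draws, seed)
    return (mean_pairwise_correlation(expr, ids) - mu) / sd
```

With 100 draws per set size, as in the caption, the standard deviation of the random scores plays the role of the 1 SD error bars, and a set whose observed score lies well above that band is more coexpressed than a random set of the same size.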

References

    1. Morley M., Molony C. M., Weber T. M., Devlin J. L., Ewens K. G., Spielman R. S., Cheung V. G. Nature. 2004;430:743–747.
    2. Bystrykh L., Weersing E., Dontje B., Sutton S., Pletcher M. T., Wiltshire T., Su A. I., Vellenga E., Wang J., Manly K. F., et al. Nat. Genet. 2005;37:225–232.
    3. Chesler E. J., Lu L., Shou S., Qu Y., Gu J., Wang J., Hsu H. C., Mountz J. D., Baldwin N. E., Langston M. A., et al. Nat. Genet. 2005;37:233–242.
    4. Hubner N., Wallace C. A., Zimdahl H., Petretto E., Schulz H., Maciver F., Mueller M., Hummel O., Monti J., Zidek V., et al. Nat. Genet. 2005;37:243–253.
    5. Pastinen T., Sladek R., Gurd S., Sammak A., Ge B., Lepage P., Lavergne K., Villeneuve A., Gaudin T., Brandstrom H., et al. Physiol. Genomics. 2004;16:184–193.
