Biology (Basel). 2013 Dec 2;2(4):1411-37.
doi: 10.3390/biology2041411.

Portraying the Expression Landscapes of B-Cell Lymphoma: Intuitive Detection of Outlier Samples and of Molecular Subtypes

Lydia Hopp et al. Biology (Basel). 2013.

Abstract

We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data from more than 200 patients with mature aggressive B-cell lymphoma. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques, and enables investigation of the similarity relations between the samples. The method also makes it possible to detect and correct outliers caused by contamination. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes that differ in their functional and clinical characteristics.
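
As a rough illustration of the SOM idea summarized in the abstract, the following minimal sketch trains a tiny map in plain Python. It assumes the common setup in which each input vector is one gene's expression profile across samples, each grid cell stores a prototype profile (a "metagene"), and a sample's portrait is the grid of metagene values for that sample. The function names, grid size, and decay schedules are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose prototype is closest (squared Euclidean) to x."""
    return min(
        weights,
        key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], x)),
    )


def train_som(data, grid=(4, 4), epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; returns {(row, col): prototype vector}.

    Assumption: `data` holds one expression profile per gene
    (one value per sample), so each prototype acts as a metagene.
    """
    rng = random.Random(seed)
    rows, cols = grid
    dim = len(data[0])
    # Initialise every prototype with a randomly chosen input profile.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # visit profiles in random order
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def sample_portrait(weights, grid, sample_index):
    """Grid of metagene values for one sample: the sample's 'portrait'."""
    rows, cols = grid
    return [
        [weights[(r, c)][sample_index] for c in range(cols)] for r in range(rows)
    ]
```

For example, `sample_portrait(train_som(profiles, grid=(4, 4)), (4, 4), 0)` would give the 4 × 4 portrait of the first sample, where `profiles` is a hypothetical list of per-gene expression vectors. The published analysis uses a much larger 50 × 50 grid.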

Figures

Figure 1
Self-organizing map (SOM) gallery of lymphoma subtypes with a resolution of 50 × 50 metagenes: The small mosaic images refer to selected individual tumor samples assigned to the mBL, non-mBL and intermediate subtypes. The larger images represent the respective mean subtype portraits (see the methods section). Metagenes colored dark red and dark blue refer to the 90th and 10th percentile of expression in each sample, respectively. The complete gallery of all sample portraits is available in Supplementary File 2.
Figure 2
Spot module characteristics: (a) The over-expression summary map collects all over-expression spots observed in the individual portraits into one map. Subtypes frequently showing the respective spots are indicated. (b) The over-expression spot map defines the spots used for further analysis. Regions beyond the 98th-percentile threshold of metagene expression are selected. The spots are labeled with capital letters. The blue rectangles include highly correlated spots (r > 0.7). The blue and red dashed lines connect correlated (0.4 < r < 0.7) and anti-correlated (r < −0.6) spots, respectively. (c) The over-expression heatmap shows the mean expression of the spots across all samples in the data set. The samples are sorted according to their subtype. (d) The under-expression summary map collects all under-expression spots observed in the individual portraits. Note the antagonistic nature of mBL and non-mBL expression: spots over-expressed in mBL become under-expressed in non-mBL and vice versa (compare with panel a).
Figure 3
Functional analysis: (a) The functional context of the most abundant spots is assigned according to the topmost overexpressed gene sets in each of the spots. (b–d) GSZ-profiles and population maps are shown for gene sets accumulating in the mBL and non-mBL specific overexpression spots as indicated by the red ellipses (b), for mBL-vs-non-mBL signature sets published previously [10] (c) and for sets accumulating in rare spots (d).
Figure 4
Sample similarity analysis: (a) Independent component analysis (ICA) of lymphoma samples. The distribution of the samples is shown in the space spanned by the two leading independent components. (b) The neighbor-joining tree projects the sample similarity relations into a dendrogram. The bush-like structures reveal a finer granularity of subtypes beyond the three classes considered so far.
Figure 5
Pairwise correlation analysis of all lymphoma samples: (a) The pairwise correlation map (PCM) visualizes the correlation coefficients for all pairs of samples. The samples are arranged according to their subtype membership as indicated by the color bars. In the heatmap, red colors indicate positive, blue colors negative correlations between the samples. (b) The correlation network (CN) translates the PCM into a graph structure. The nodes are given by the samples and the edges connect positively correlated sample pairs (r > 0.5). Mean subtype portraits are given within the figure (large maps). Outlier nodes are highlighted by arrows. The SOM portraits of the respective samples are shown by small maps. The red circles and the spot letters indicate the outlier spots differing from the subtype specific patterns (compare these individual sample portraits with the mean subtype portraits).
Figure 6
Correction of outlier samples contaminated with healthy lymph node tissue. The left and right parts of the figure refer to the uncorrected and corrected data, respectively. (a) GSZ-profile and population map of the ‘tonsil’ gene set: The signature is not characteristic of any one subtype, and its genes accumulate in spot ‘S’ of the map. (b) Correlation network of the lymphoma data set. (c) SOM portraits of selected outlier samples. The arrows point to the position of these samples in the CN and in the GSZ-profile. After correction, the expression landscape of the selected samples reveals subtype-specific signatures.
Figure 7
k-Means clustering into four subtypes: (a) Mean expression portraits of the four new subtypes. The green arrows indicate the spot pattern transitions from mBL to non-mBL via intermediate A or B. (b) CN colored according to the new subtypes obtained.
Figure 8
Consensus clustering: (a–c) Cluster-heatmaps of the consensus matrices for class numbers ranging from two to four, respectively. Pairs of samples frequently found in one joint class accumulate in the blue regions along the diagonal of the map. (d) Cumulative distribution function (CDF) for class numbers ranging from two to six.
Figure 9
Kaplan-Meier survival curves for the original three-subtype classification (a) and the new four-subtype classification (b). Tick marks indicate patients alive at the time of last follow-up. Subtype-specific survival curves are compared using the log-rank test, and the respective p-values are indicated within the figures.
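
The correlation network described for Figure 5b, in which edges connect sample pairs with r > 0.5, can be sketched in a few lines of plain Python. This is a minimal illustration assuming each sample is represented by a numeric expression vector and that r is the Pearson coefficient; the helper names are hypothetical. Counting node degrees to spot weakly connected samples is an illustrative heuristic only, not the outlier criterion used in the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant vector: treat the correlation as zero
    return cov / (sx * sy)


def correlation_network(profiles, threshold=0.5):
    """Return edges (a, b, r) for all sample pairs with r > threshold.

    `profiles` maps a sample name to its expression vector
    (assumed representation, e.g. metagene values of the sample).
    """
    names = list(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r > threshold:
                edges.append((a, b, r))
    return edges


def node_degrees(profiles, edges):
    """Number of edges per sample; low-degree nodes may merit inspection
    (hypothetical heuristic, not the paper's outlier definition)."""
    degree = {name: 0 for name in profiles}
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree
```

The pairwise correlation map of Figure 5a corresponds to the full matrix of `pearson` values before thresholding; the network keeps only the pairs above the cutoff.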

References

    1. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. doi: 10.1038/nature07385.
    2. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487:330–337.
    3. Barretina J., Caponigro G., Stransky N., Venkatesan K., Margolin A.A., Kim S., Wilson C.J., Lehár J., Kryukov G.V., Sonkin D., et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. doi: 10.1038/nature11003.
    4. Hudson T.J., Anderson W., Artez A., Barker A.D., Bell C., Bernabé R.R., Bhan M.K., Calvo F., Eerola I., Gerhard D.S., et al. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
    5. Fernald G.H., Capriotti E., Daneshjou R., Karczewski K.J., Altman R.B. Bioinformatics challenges for personalized medicine. Bioinformatics. 2011;27:1741–1748. doi: 10.1093/bioinformatics/btr295.