Vaccine. 2015 Sep 29;33(40):5262-70.
doi: 10.1016/j.vaccine.2015.04.088. Epub 2015 May 6.

Lessons learned in the analysis of high-dimensional data in vaccinomics

Ann L Oberg et al. Vaccine. 2015.

Abstract

The field of vaccinology is increasingly moving toward the generation, analysis, and modeling of extremely large and complex high-dimensional datasets. We have used such data in the development and advancement of the field of vaccinomics to enable prediction of vaccine responses and to develop new vaccine candidates. However, the application of systems biology to what has been termed "big data," or "high-dimensional data," is not without significant challenges, chief among them a paucity of gold-standard analysis and modeling paradigms with which to interpret the data. In this article, we relate some of the lessons we have learned over the last decade of working with high-dimensional, high-throughput data as applied to the field of vaccinomics. The ultimate value of such efforts is to better understand the immune mechanisms by which protective and non-protective responses to vaccines are generated, and to use this information to support a personalized vaccinology approach to creating better and safer vaccines for public health.

Keywords: Data interpretation, statistical; Immunogenetics; Systems biology; Vaccination; Vaccines.

Conflict of interest statement

Conflicts of Interest: ALO, BAM, VSP, RBK and DJS declare no conflicts of interest. Dr. Poland is the chair of a Safety Evaluation Committee for novel investigational vaccine trials being conducted by Merck Research Laboratories. Dr. Poland offers consultative advice on vaccine development to Merck & Co. Inc., CSL Biotherapies, Avianax, Sanofi Pasteur, Dynavax, Novartis Vaccines and Therapeutics, PAXVAX Inc, Emergent Biosolutions, Adjuvance, and Vaxness. Dr. Poland holds two patents related to vaccinia and measles peptide research. These activities have been reviewed by the Mayo Clinic Conflict of Interest Review Board and are conducted in compliance with Mayo Clinic Conflict of Interest policies. This research has been reviewed by the Mayo Clinic Conflict of Interest Review Board and was conducted in compliance with Mayo Clinic Conflict of Interest policies.

Figures

Figure 1
(a) Minus versus Average (MVA) plot demonstrating the effect of a change in reagents and sequencing software. There is one data point for every feature measured on the assay. The x-axis is the average of each feature over all specimens in the study. Generally, the y-axis is the difference of each feature from the mean; thus, if the observations were identical to the mean, all data points would lie on the y=0 line. Here, the y-axis is the difference between the mean before the reagent change and the mean after it. A reference line at y=0 and a loess smoother are included on the plot. If the smoother overlays the y=0 line, no normalization is needed. If the smoother is parallel to the y=0 line but shifted up or down, between-specimen biases are similar at all abundance levels and a linear normalization is needed. Here, the nonlinear smoother shows that nonlinear bias is present. (b) The same study after normalization and after filtering out genes with a median count <32. The smoother is now straight and lies on the y=0 line, demonstrating that the bias has been removed.
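The reading of an MVA plot described in this caption can be sketched in Python. This is a minimal, standard-library sketch under stated assumptions: counts are placed on a log2(count + 1) scale, the loess smoother is replaced by a much cruder binned median (a substitution, not loess), and the tolerance used to call the smoother "flat" is hypothetical. Only the median-count filter of 32 comes from the caption.

```python
import math
import statistics


def filter_low_counts(counts, min_median=32):
    """Keep features whose median count across specimens is at least min_median."""
    return {f: c for f, c in counts.items() if statistics.median(c) >= min_median}


def mva_points(before, after):
    """Return one (A, M) pair per feature on the log2(count + 1) scale (assumed).

    A: mean over all specimens in both groups.
    M: mean after the reagent change minus mean before it.
    """
    points = []
    for feature in before:
        logs_before = [math.log2(x + 1) for x in before[feature]]
        logs_after = [math.log2(x + 1) for x in after[feature]]
        a = statistics.fmean(logs_before + logs_after)
        m = statistics.fmean(logs_after) - statistics.fmean(logs_before)
        points.append((a, m))
    return points


def binned_smoother(points, n_bins=10):
    """Crude stand-in for loess: median M within roughly equal-size bins of A."""
    pts = sorted(points)
    size = max(1, len(pts) // n_bins)
    curve = []
    for i in range(0, len(pts), size):
        chunk = pts[i:i + size]
        curve.append((statistics.fmean(p[0] for p in chunk),
                      statistics.median(p[1] for p in chunk)))
    return curve


def diagnose(curve, tol=0.1):
    """Read the smoother as the caption does; `tol` is a hypothetical cut-off."""
    ys = [m for _, m in curve]
    if all(abs(m) <= tol for m in ys):
        return "no normalization needed"
    if max(ys) - min(ys) <= 2 * tol:
        return "shifted but flat: linear normalization"
    return "curved: nonlinear normalization"
```

In practice a genuine loess fit would replace `binned_smoother`; the binned median is used here only to keep the sketch dependency-free.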
Figure 2
Box-and-whisker plots showing the global distribution of per-gene counts on the log scale (y-axis) by lane (x-axis), sorted by assay order. The top, mid-line and bottom of each box indicate the 75th, 50th and 25th percentiles, respectively. (a) Pre-normalization. The total counts per lane increased from ∼150 million to ∼200 million after the reagent and software upgrades, visible as a general upward shift approximately two-thirds of the way across the plot. A failed specimen, with a median nearly half that of the neighboring specimens, is evident about one-third of the way across the plot; it was deleted from subsequent analyses. (b) Post-normalization. After Conditional Quantile Normalization (CQN) [82], the specimen distributions are aligned exactly at the maximum and at the 75th and 50th percentiles, as expected. The lower counts are not exactly aligned because CQN does not adjust the smallest counts.
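CQN itself also adjusts for covariates such as GC content and gene length and is not reproduced here. As a simpler, related illustration, the sketch below implements plain quantile normalization, which forces every lane to share one distribution, together with a hypothetical rule for flagging a failed lane like the one described in panel (a). The 0.6 ratio and the neighbor window are assumptions, not values from the study.

```python
import statistics


def quantile_normalize(columns):
    """Plain quantile normalization (not CQN).

    `columns` holds one equal-length list of counts per specimen (lane). Each
    value is replaced by the mean, across lanes, of the values with the same
    rank, so every lane ends up with an identical distribution. Ties are
    broken by position, a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all lanes, for each rank k.
    rank_means = [statistics.fmean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=col.__getitem__)
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


def flag_failed_lanes(columns, ratio=0.6, window=2):
    """Return indices of lanes whose median is below `ratio` times the median
    of the medians of up to `window` lanes on either side (hypothetical rule)."""
    medians = [statistics.median(col) for col in columns]
    flagged = []
    for j, m in enumerate(medians):
        lo, hi = max(0, j - window), min(len(medians), j + window + 1)
        neighbors = [medians[k] for k in range(lo, hi) if k != j]
        if neighbors and m < ratio * statistics.median(neighbors):
            flagged.append(j)
    return flagged
```

Flagging lanes before normalizing matters: quantile normalization would otherwise pull a failed lane onto the common distribution and hide the failure.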
Figure 3
Over 450 PBMC specimens from healthy subjects aged 50-74 years were assayed on five bead array plates of the Illumina DNA methylation 450K assay. The assay uses two probe designs, each yielding an M and a U intensity value (the fluorescence intensities of the methylated and unmethylated signals, respectively). These intensity values are mathematically combined to estimate the percent methylation (β-value) in the specimen. (a) Residual MVA plots of the M and U intensities show nonlinear between-specimen biases; each smoother represents one specimen. (b) On the β-value scale, between-specimen biases are nearly linear and not large (left), and they are essentially eliminated by normalization (right).
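The combination of M and U intensities into a β-value, and a linear between-specimen correction on the β scale, can be sketched as follows. The offset of 100 in the denominator is the value conventionally used in Illumina's β-value definition; the regression-against-the-mean correction is an illustrative assumption and not necessarily the strategy used in the study.

```python
import statistics


def beta_value(meth, unmeth, offset=100.0):
    """Methylated fraction from methylated (M) and unmethylated (U) intensities.

    Negative intensities are truncated at 0; `offset` stabilizes the ratio
    when both intensities are small.
    """
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def _fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def linear_normalize(specimens):
    """Remove a linear between-specimen bias on the beta scale (illustrative).

    Each specimen's betas are regressed on the per-probe mean across specimens
    (the reference); the fitted line is inverted and results clipped to [0, 1].
    """
    n_probes = len(specimens[0])
    reference = [statistics.fmean(s[p] for s in specimens) for p in range(n_probes)]
    normalized = []
    for s in specimens:
        slope, intercept = _fit_line(reference, s)
        normalized.append([min(1.0, max(0.0, (b - intercept) / slope)) for b in s])
    return normalized
```

A linear correction suffices here only because, as panel (b) shows, the biases are near linear on the β scale; the nonlinear biases in the raw M and U intensities would need a nonlinear method.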
Figure 4
Power to detect genetic associations as a function of ordinal genotypic effect size for three different analyses at two significance levels. One thousand data sets were generated for each combination of parameters. Panel (a) shows statistical power for α=0.05 and panel (b) for a genome-wide significance threshold (α=5×10⁻⁸).
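A simulation of the kind summarized in this caption can be sketched in pure Python. Only the 1000 data sets per parameter combination and the two significance levels come from the caption; the sample size (200), minor allele frequency (0.3), standard-normal noise, and the single additive trend test using a normal approximation to the slope's t statistic are assumptions. The three analyses compared in the figure are not reproduced.

```python
import math
import random


def simulate_dataset(rng, n, maf, effect):
    """Additive (ordinal) genotype model: y = effect * g + N(0, 1) noise."""
    # Genotype = number of minor alleles, two independent draws (Hardy-Weinberg).
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    y = [effect * gi + rng.gauss(0.0, 1.0) for gi in g]
    return g, y


def trend_test_pvalue(g, y):
    """Two-sided p-value for the OLS slope of y on genotype count (normal approx.)."""
    n = len(g)
    mg, my = sum(g) / n, sum(y) / n
    sxx = sum((a - mg) ** 2 for a in g)
    if sxx == 0:
        return 1.0  # monomorphic sample: no information about the effect
    slope = sum((a - mg) * (b - my) for a, b in zip(g, y)) / sxx
    sse = sum((b - my - slope * (a - mg)) ** 2 for a, b in zip(g, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) without cancellation error.
    return math.erfc(abs(slope / se) / math.sqrt(2))


def power(effect, alpha, n=200, maf=0.3, n_datasets=1000, seed=1):
    """Fraction of simulated data sets in which the trend test rejects at `alpha`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        g, y = simulate_dataset(rng, n, maf, effect)
        if trend_test_pvalue(g, y) < alpha:
            hits += 1
    return hits / n_datasets
```

Because calls with the same seed reuse the same simulated data sets, power at α=5×10⁻⁸ can never exceed power at α=0.05 for the same effect size, mirroring the contrast between panels (a) and (b).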
Figure 5
Bias in estimating an ordinal genotypic effect, as a function of the simulated ordinal genotypic effect size for three different analytical approaches.
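Bias can be estimated from the same kind of simulation by averaging the estimated effect over data sets and subtracting the true value. The sketch below, under an assumed additive model (sample size 200, minor allele frequency 0.3, standard-normal noise), compares the ordinal (0/1/2) coding with a dominant (carrier versus non-carrier) coding; these two codings are illustrative choices, not necessarily the three approaches shown in the figure.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def additive(g):
    """Ordinal 0/1/2 coding, matching the generating model."""
    return g


def dominant(g):
    """Carrier (1) versus non-carrier (0) coding; a hypothetical alternative."""
    return int(g >= 1)


def estimate_bias(effect, coding, n=200, maf=0.3, n_datasets=1000, seed=7):
    """Mean estimated slope minus the true per-allele (ordinal) effect.

    Data follow y = effect * g + N(0, 1); `coding` maps the genotype count
    g in {0, 1, 2} to the predictor used in the regression.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_datasets):
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        y = [effect * gi + rng.gauss(0.0, 1.0) for gi in g]
        estimates.append(ols_slope([coding(gi) for gi in g], y))
    return statistics.fmean(estimates) - effect
```

With a minor allele frequency of 0.3, the dominant-coding slope estimates effect × E[g | g ≥ 1] = effect × 0.60/0.51 ≈ 1.18 × effect, so it overstates the per-allele effect by roughly 18%, while the ordinal coding, which matches the generating model, is unbiased.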

References

    1. Poland GA. Pharmacology, vaccinomics, and the second golden age of vaccinology. Clin Pharmacol Ther. 2007 Dec;82(6):623–6.
    2. Poland GA, Ovsyannikova IG, Jacobson RM. Personalized vaccines: the emerging field of vaccinomics. Expert Opin Biol Ther. 2008 Nov;8(11):1659–67.
    3. Poland GA, Oberg AL. Vaccinomics and bioinformatics: accelerants for the next golden age of vaccinology. Vaccine. 2010 Apr 30;28(20):3509–10.
    4. Poland GA, Kennedy RB, McKinney BA, Ovsyannikova IG, Lambert ND, Jacobson RM, et al. Vaccinomics, adversomics, and the immune response network theory: Individualized vaccinology in the 21st century. Semin Immunol. 2013 Jun 4;
    5. Oberg AL, Kennedy RB, Li P, Ovsyannikova IG, Poland GA. Systems biology approaches to new vaccine development. Curr Opin Immunol. 2011 May 11;

Publication types