PLoS One. 2008 Sep 3;3(9):e3111.
doi: 10.1371/journal.pone.0003111.

Identification of a 5-protein biomarker molecular signature for predicting Alzheimer's disease

Martín Gómez Ravetti et al. PLoS One.

Abstract

Background: Alzheimer's disease (AD) is a progressive brain disease with a huge cost to human lives. Its impact is also a growing concern for the governments of developing countries, in particular because of the increasing number of elderly citizens at risk. Alzheimer's is the most common form of dementia, a general term for memory loss and other cognitive impairments. There is currently no cure for AD, but there are drug and non-drug approaches to its treatment. In general, drug treatments aim to slow the progression of symptoms. They have proved effective in a large group of patients, but success is directly correlated with identifying carriers of the disease at its early stages. This justifies the need for timely and accurate forms of diagnosis by molecular means. We report here a 5-protein biomarker molecular signature that achieves, on average, 96% total accuracy in predicting clinical AD. The signature is composed of the abundances of IL-1alpha, IL-3, EGF, TNF-alpha and G-CSF.

Methodology/principal findings: Our results are based on a recent molecular dataset that has attracted worldwide attention. Our paper shows that improved results can be obtained with the abundances of only five proteins. Our methodology applied an integrative data analysis method. This four-step process comprised: a) abundance quantization, b) feature selection, c) literature analysis, and d) selection of a classifier algorithm that is independent of the feature selection process. These steps were performed without using any sample from the test datasets. For the first two steps, we applied Fayyad and Irani's discretization algorithm for selection and quantization, which in turn creates an instance of the (alpha-beta)-k-Feature Set problem; a numerical solution of this problem led to the selection of only 10 proteins.
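As an illustration of the quantization step, the following is a minimal sketch of one level of entropy-based discretization with the minimum description length (MDL) stopping rule usually attributed to Fayyad and Irani. It is not the authors' code: the function names `entropy` and `best_mdl_cut`, the class labels, and the toy abundance values are assumptions made for the example. The full algorithm applies the same step recursively to each side of an accepted cut. Using the sketch for selection assumes that a feature for which no cut is accepted can be discarded as uninformative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_mdl_cut(values, labels):
    """Return the best binary cut point for one feature, or None if the
    MDL stopping criterion rejects the best candidate cut."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    all_labels = [lab for _, lab in pairs]
    ent_s = entropy(all_labels)
    k = len(set(all_labels))

    best = None  # (gain, cut, left_labels, right_labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between identical values
        left, right = all_labels[:i], all_labels[i:]
        gain = (ent_s
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if best is None or gain > best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint of neighbours
            best = (gain, cut, left, right)

    if best is None:
        return None

    gain, cut, left, right = best
    k1, k2 = len(set(left)), len(set(right))
    # MDL criterion: accept the cut only if the information gain exceeds
    # the cost of encoding the cut point and the class distributions.
    delta = math.log2(3 ** k - 2) - (
        k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    threshold = (math.log2(n - 1) + delta) / n
    return cut if gain > threshold else None


# Toy abundances (hypothetical values, not taken from the dataset).
separable = best_mdl_cut(
    [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4],
    ["AD"] * 4 + ["NDC"] * 4)
uninformative = best_mdl_cut(
    [1, 2, 3, 4, 5, 6, 7, 8],
    ["AD", "NDC"] * 4)
```

In this toy input the first feature separates the two classes and receives a single cut point, while the second, whose labels alternate, receives none and would be dropped.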

Conclusions/significance: The previous study has provided an extremely useful dataset for the identification of AD biomarkers. However, our subsequent analysis also revealed several important facts worth reporting: 1. A 5-protein signature (a subset of the 18-protein signature of Ray et al.) has the same overall performance when the same classifier is used. 2. Across more than 20 different classifiers available in the widely used Weka software package, our 5-protein signature has, on average, a smaller prediction error, indicating that the result does not depend on the classifier and that this set of biomarkers is robust (i.e. 96% accuracy when predicting AD against non-demented control). 3. Using very simple classifiers, such as Simple Logistic or Logistic Model Trees, we achieved the following results on 92 samples: 100% success in predicting Alzheimer's disease and 92% in predicting non-demented control on the AD dataset.
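The per-class success rates and the total accuracy quoted above are different summaries of the same confusion matrix. The sketch below shows how they relate. The function name `class_rates` and the counts are hypothetical and chosen only for illustration; they are not the counts of the study.

```python
def class_rates(tp, fn, tn, fp):
    """Summarise a 2x2 confusion matrix.

    tp: AD samples predicted as AD      fn: AD samples predicted as control
    tn: controls predicted as control   fp: controls predicted as AD
    """
    return {
        "ad_success": tp / (tp + fn),        # fraction of AD samples recovered
        "control_success": tn / (tn + fp),   # fraction of controls recovered
        "total_accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts, NOT taken from the paper.
rates = class_rates(tp=9, fn=1, tn=8, fp=2)
```

Because total accuracy weights each class by its sample count, it generally lies between the two per-class rates, which is one reason to report all three.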

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Histograms of the number of errors of the random forest classifier using 20 randomly selected signatures with 18 proteins.
The arrow indicates the result obtained under the same conditions with the 18-protein signature proposed by Ray et al.
Figure 2. Histograms of the number of errors of the random forest classifier using 20 randomly selected signatures with 5 proteins.
The arrow indicates the result obtained under the same conditions with our 5-protein signature.
Figure 3. Classification and prediction of clinical Alzheimer's diagnosis in subjects with Alzheimer's disease.
(a) An undirected graph in which each node corresponds to a different protein belonging to the 10-protein signature we identified; each edge indicates the existence of a direct relation found by searching the PubMed database using the Pathway Studio software. (b) Identification of the maximum clique of the graph, uncovering a robust 5-protein signature; every node in the clique has a direct relation with every other node in it. Simple Logistic was used to classify and predict the Alzheimer's (AD) and non-Alzheimer's classes in the training set (c) and the blinded test set ‘AD’ (d). All results are shown as confusion matrices. For the training set, 10-fold cross-validation was applied 10 times. In both cases Simple Logistic was used with the default parameters of the Weka package. All p-values were calculated using the Fisher exact test.
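The maximum-clique step described in panel (b) can be sketched with a brute-force search, which is feasible for a graph of only 10 nodes. This is an illustration under stated assumptions, not the authors' procedure: the function name `maximum_clique` is hypothetical, the nodes `X1` to `X5` are placeholders for the remaining proteins of the 10-protein signature (not named in this abstract), and every edge touching a placeholder is invented for the example. Only the edges among the five signature proteins follow from the text, since the members of a clique are pairwise related.

```python
from itertools import combinations


def maximum_clique(nodes, edges):
    """Brute-force maximum clique: test subsets from largest to smallest
    and return the first one whose nodes are all pairwise adjacent."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(frozenset(pair) in adjacent
                   for pair in combinations(subset, 2)):
                return set(subset)
    return set()


signature = ["IL-1alpha", "IL-3", "EGF", "TNF-alpha", "G-CSF"]
placeholders = ["X1", "X2", "X3", "X4", "X5"]  # hypothetical node names
nodes = signature + placeholders

# Pairwise relations among the signature proteins (implied by the clique),
# plus a few invented edges so the search has something to reject.
edges = list(combinations(signature, 2)) + [
    ("X1", "TNF-alpha"), ("X1", "EGF"),
    ("X2", "IL-3"),
    ("X3", "X4"), ("X4", "G-CSF"),
]

clique = maximum_clique(nodes, edges)
```

With these illustrative edges the search returns the five signature proteins. The search is exponential in the number of nodes, so a dedicated clique algorithm would be a better fit for larger graphs.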

References

    1. Ray S, Britschgi M, Herbert C, Takeda-Uchimura Y, Boxer A, et al. Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins. Nat Med. 2007;13:1359–1362.
    2. Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.
    3. Ariadne Genomics Inc. Pathway Studio™. 5.0 ed. 2007.
    4. Bruunsgaard H, Andersen-Ranberg K, Jeune B, Pedersen AN, Skinhoj P, et al. A high plasma concentration of TNF-alpha is associated with dementia in centenarians. J Gerontol A Biol Sci Med Sci. 1999;54:M357–364.
    5. Finch CE, Morgan TE. Systemic Inflammation, Infection, ApoE Alleles, and Alzheimer Disease: A Position Paper. Current Alzheimer Research. 2007;4:185–189.
