Front Bioeng Biotechnol. 2022 Jun 2:10:890901.
doi: 10.3389/fbioe.2022.890901. eCollection 2022.

Identification of Type 2 Diabetes Biomarkers From Mixed Single-Cell Sequencing Data With Feature Selection Methods

Zhandong Li et al. Front Bioeng Biotechnol. 2022.

Abstract

Diabetes is one of the most common diseases and a major threat to human health. Type 2 diabetes (T2D) makes up about 90% of all cases. With the development of high-throughput sequencing technologies, more and more of the fundamental pathogenesis of T2D at the genetic and transcriptomic levels has been revealed. Recent single-cell sequencing can further reveal the cellular heterogeneity of complex diseases in an unprecedented way. To explore the molecular basis of T2D across multiple cell types, we investigated the expression profiles of more than 1,600 single cells (949 cells from T2D patients and 651 cells from normal controls) and identified differential expression patterns and transcriptomic characteristics that can distinguish the two groups at the single-cell level. The expression profiles were analyzed with several machine learning algorithms, including Monte Carlo feature selection, support vector machine, and repeated incremental pruning to produce error reduction (RIPPER). On the one hand, some T2D-associated genes (MTND4P24, MTND2P28, and LOC100128906) were discovered. On the other hand, we revealed novel potential pathogenic mechanisms in the form of rules. These mechanisms involve newly recognized genes and are missed by traditional bulk sequencing techniques. In particular, the newly identified T2D genes were shown to follow specific quantitative rules with potential for diabetes prediction, and these rules further indicated several kinds of potential functional crosstalk involved in T2D.

Keywords: Monte Carlo feature selection; RIPPER; single-cell sequencing; support vector machine; type 2 diabetes.
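
To make the idea of "quantitative rules" concrete, the sketch below shows how an ordered rule list of the kind RIPPER produces can be applied to a single cell's expression values. It is only an illustration: the gene names (GENE_A, GENE_B, GENE_C), the thresholds, and the helper `classify` are hypothetical placeholders and are not the rules or genes reported in the paper.

```python
import operator

# Comparison operators allowed in a rule condition.
OPS = {">=": operator.ge, "<=": operator.le}

# HYPOTHETICAL rule list: each rule is (conditions, predicted label).
# Gene names and thresholds are invented for illustration only.
RULES = [
    ([("GENE_A", ">=", 2.0), ("GENE_B", "<=", 0.5)], "T2D"),
    ([("GENE_C", ">=", 4.0)], "T2D"),
]
DEFAULT_LABEL = "control"  # label used when no rule fires


def classify(cell, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose conditions all hold.

    `cell` maps gene name -> expression value; missing genes count as 0.0.
    """
    for conditions, label in rules:
        if all(OPS[op](cell.get(gene, 0.0), thr) for gene, op, thr in conditions):
            return label
    return default


example_cell = {"GENE_A": 3.0, "GENE_B": 0.1}
example_label = classify(example_cell)
```

Rules are checked in order and the first match wins, so the default label acts as the fall-through class for cells that satisfy no rule.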

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Workflow for key gene identification in type 2 diabetes. The MCFS method was used to evaluate the importance of all features (genes). On the one hand, the IFS method with SVM/RF/KNN was applied to the feature list produced by the MCFS method to extract optimal T2D-associated genes and optimal classifiers. On the other hand, the informative features produced by the MCFS method were fed into the Johnson reducer and RIPPER algorithms to construct optimal T2D-associated rules.
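
The IFS step in this workflow can be sketched as follows: given a feature list already ranked by importance, classifiers are evaluated on the top 1, 2, 3, ... features and the subset with the best F1-measure is kept. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the ranking is supplied by hand rather than computed by MCFS, a simple k-nearest-neighbor classifier stands in for SVM/RF/KNN, evaluation uses leave-one-out (the evaluation scheme is not stated in this excerpt), and all function names and the toy data are hypothetical.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_f1(X, y, feature_idx, k):
    """Leave-one-out F1 using only the columns listed in feature_idx."""
    sub = [[row[j] for j in feature_idx] for row in X]
    preds = [
        knn_predict(sub[:i] + sub[i + 1:], y[:i] + y[i + 1:], sub[i], k)
        for i in range(len(sub))
    ]
    return f1_score(y, preds)


def incremental_feature_selection(X, y, ranked_features, k=1, step=1):
    """Score the top-n ranked features for growing n; return the best n."""
    best_n, best_f1, curve = 0, -1.0, []
    for n in range(step, len(ranked_features) + 1, step):
        score = loo_f1(X, y, ranked_features[:n], k)
        curve.append((n, score))
        if score > best_f1:  # keep the smallest n on ties
            best_n, best_f1 = n, score
    return best_n, best_f1, curve


# Toy data (invented): column 0 separates the classes, columns 1-2 are noise.
X = [
    [0.1, 5.0, 0.9],
    [0.2, 1.0, 0.1],
    [0.0, 3.0, 0.5],
    [0.9, 4.0, 0.2],
    [1.0, 2.0, 0.8],
    [0.8, 0.0, 0.4],
]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 1, 2]  # assumed output of a feature-ranking method

best_n, best_f1, curve = incremental_feature_selection(X, y, ranked, k=1)
```

The `curve` list holds one (number of features, F1) pair per step, which is the kind of data a plot of F1-measure against feature count would be drawn from.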
FIGURE 2
Performance of KNN integrated in IFS with different numbers of features. The y-axis is the F1-measure, and the x-axis is the number of features used. k is the KNN parameter giving the number of nearest neighbors used to make a prediction. KNN yields its best F1-measure of 0.886 when k = 5 and the top 665 features are used.
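
For reference, the F1-measure reported in these figures is conventionally defined as the harmonic mean of precision and recall. The definition below is the standard one and is assumed here, since this excerpt does not spell it out; TP, FP, and FN denote true positives, false positives, and false negatives.

```latex
\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN},
\]
\[
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}.
\]
```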
FIGURE 3
Bar chart showing five performance measurements for the three optimal classifiers, each based on a different classification algorithm.
FIGURE 4
Performance of RF integrated in IFS with different numbers of features. The y-axis is the F1-measure, and the x-axis is the number of features used. I is the RF parameter giving the number of decision trees. RF yields its best F1-measure of 0.907 when I = 100 and the top 305 features are used.
FIGURE 5
Performance of SVM integrated in IFS with different numbers of features. The y-axis is the F1-measure, and the x-axis is the number of features used. SVM yields its best F1-measure of 0.936 when a linear kernel is used with the top 745 features.
