Review
Trends Mol Med. 2018 Feb;24(2):221-235.
doi: 10.1016/j.molmed.2017.12.008. Epub 2018 Feb 4.

Biosignature Discovery for Substance Use Disorders Using Statistical Learning

James W Baurley et al. Trends Mol Med. 2018 Feb.

Abstract

There are limited biomarkers for substance use disorders (SUDs). Traditional statistical approaches are identifying simple biomarkers in large samples, but clinical use cases are still being established. High-throughput clinical, imaging, and 'omic' technologies are generating data from SUD studies and may lead to more sophisticated and clinically useful models. However, analytic strategies suited for high-dimensional data are not regularly used. We review strategies for identifying biomarkers and biosignatures from high-dimensional data types. Focusing on penalized regression and Bayesian approaches, we address how to leverage evidence from existing studies and knowledge bases, using nicotine metabolism as an example. We posit that big data and machine learning approaches will considerably advance SUD biomarker discovery. However, translation to clinical practice will require integrated scientific efforts.

Keywords: artificial intelligence; biomarker; genomics; machine learning; nicotine metabolism; substance use disorders.

Figures

Key Figure 1. Biosignature Development Workflow
Data in which the number of variables vastly exceeds the number of samples (high-dimensional data) are becoming commonplace in studies of substance use disorders (SUDs) and treatment approaches. We present two approaches (penalized regression and Bayesian learning) for detecting the combinations of variables (biosignatures) predictive of SUD phenotypes (e.g., nicotine metabolism). Biosignature detection is followed by validation, then prospective assessment of utility for translation to clinical practice.
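As a rough illustration of why penalized regression suits this setting, the sketch below fits a lasso (L1-penalized least squares) by cyclic coordinate descent on simulated data with more variables than samples. It is not the authors' implementation: the simulated data, the penalty value, and the names soft_threshold, lasso_coordinate_descent, and simulate are assumptions made for this example only.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 if |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Mean product of column j with the residual that ignores j's own term
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def simulate(n=40, p=80, seed=0):
    """Hypothetical data: only variables 0 and 1 truly affect the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


X, y = simulate()
beta = lasso_coordinate_descent(X, y, penalty=0.5)
# Variables with non-zero weights form the candidate biosignature
biosignature = [j for j, b in enumerate(beta) if b != 0.0]
print(biosignature)
```

Ordinary least squares has no unique solution when variables outnumber samples; the L1 penalty sets most weights exactly to zero, so the fit doubles as variable selection. In practice the penalty would be tuned, for example by cross-validation, rather than fixed as it is here.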
Figure 2. Biosignatures of Nicotine Metabolism
Nicotine metabolism biosignatures are learned from genotype (G) and clinical (C) data in laboratory studies of nicotine metabolism. Nicotine metabolism is then predicted (Zpred) in existing or new observations using the biosignatures and corresponding model weights. The predicted nicotine metabolite ratio can then be associated with clinical outcomes Y, such as smoking cessation (1 indicates success). Adapted from [10].
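The prediction step in this caption can be sketched as a weighted sum, assuming a linear model. The variable names, weights, and observations below are invented placeholders, not estimates from [10], and the group comparison is a crude descriptive stand-in for a formal association test.

```python
def predict_metabolite_ratio(intercept, weights, observation):
    """Return Zpred: the intercept plus the weighted sum of biosignature variables."""
    return intercept + sum(w * observation[name] for name, w in weights.items())


def mean_by_outcome(z_pred, outcomes):
    """Average the predicted ratio within each outcome group (1 = success, 0 = failure)."""
    groups = {}
    for z, y in zip(z_pred, outcomes):
        groups.setdefault(y, []).append(z)
    return {y: sum(values) / len(values) for y, values in groups.items()}


# Hypothetical weights learned in a laboratory study (placeholder values)
intercept = 0.5
weights = {"snp_a": -0.25, "snp_b": -0.125, "clinical_c": 0.0625}

# Hypothetical new observations: genotype counts (G) and one clinical factor (C)
observations = [
    {"snp_a": 0, "snp_b": 0, "clinical_c": 2},
    {"snp_a": 1, "snp_b": 0, "clinical_c": 1},
    {"snp_a": 2, "snp_b": 1, "clinical_c": 0},
]
cessation = [0, 1, 1]  # Y, hypothetical outcomes

z_pred = [predict_metabolite_ratio(intercept, weights, obs) for obs in observations]
print(mean_by_outcome(z_pred, cessation))
```

The point of the two-stage design is that the weights come from a study where nicotine metabolism was measured, while Zpred can be computed wherever only genotypes and clinical data are available.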
Figure 3. Genetic Associations with Nicotine Metabolism in the CYP2A6 Region of Human Chromosome 19
The variants selected using penalized regression algorithms are overlaid on the marginal genetic association results (−log10 p-values on the y-axis). This shows how penalized regression algorithms can define biosignatures (red boxes) from complex patterns of marginal associations (stars).
Figure 4. An Ensemble of Models to Define Biosignatures
The rows highlight (unshaded) the sets of SNPs selected by different penalized regression algorithms applied to nicotine metabolism data. Shaded SNPs were not selected as predictors. While there is a core set of SNPs selected by all the approaches, the sets of SNPs selected differ among the models. We define the biosignature as the entire set of variants selected by any of the algorithms.
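The definition in this caption amounts to a set union across models, with the core set being the intersection. A minimal sketch follows; the algorithm names and SNP identifiers are hypothetical placeholders, since the caption does not name them.

```python
def summarise_ensemble(selections):
    """Return (biosignature, core) from a dict of algorithm name -> selected SNP IDs.

    biosignature: SNPs selected by any algorithm (union).
    core: SNPs selected by every algorithm (intersection).
    """
    sets = [set(snps) for snps in selections.values()]
    biosignature = set().union(*sets)
    core = set.intersection(*sets) if sets else set()
    return biosignature, core


# Hypothetical selections from three penalized regression fits
selections = {
    "algorithm_1": ["snp1", "snp2", "snp3"],
    "algorithm_2": ["snp2", "snp3", "snp4"],
    "algorithm_3": ["snp2", "snp3", "snp5"],
}
biosignature, core = summarise_ensemble(selections)
print(sorted(biosignature), sorted(core))
```

Taking the union favors sensitivity: a variant missed by one algorithm is still retained if another selects it, at the cost of a larger biosignature.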
Figure 5. Tree-based Structures Can Represent Complex Relationships in Sets of Variables
Here each derived variable Z is computed from its inputs (genetic variants, clinical factors, or other derived variables) and a pair of edge parameters θ. The regression coefficient β1 represents the net effect of the entire combination of variables on the outcome of interest Y. These structures were explored using Bayesian algorithms to learn biosignatures of nicotine metabolism.
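To make the tree structure concrete, the sketch below evaluates a small tree in which every derived variable has two inputs and a pair of edge parameters θ. The caption does not say how inputs are combined, so the weighted sum used here is purely an assumption for illustration, as are the class names, variable names, and numeric values.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """An observed input: a genetic variant or clinical factor."""
    name: str


@dataclass
class Derived:
    """A derived variable Z with two inputs and a pair of edge parameters theta."""
    left: "Node"
    right: "Node"
    theta: tuple  # (theta_left, theta_right)


Node = Union[Leaf, Derived]


def evaluate(node, observation):
    """Compute a node's value; ASSUMES Z = theta_left*left + theta_right*right."""
    if isinstance(node, Leaf):
        return observation[node.name]
    theta_left, theta_right = node.theta
    return (theta_left * evaluate(node.left, observation)
            + theta_right * evaluate(node.right, observation))


def linear_predictor(root, observation, beta0, beta1):
    """beta1 scales the root Z, i.e. the net effect of the whole combination on Y."""
    return beta0 + beta1 * evaluate(root, observation)


# Hypothetical tree: two variants feed Z1; Z1 and a clinical factor feed the root
z1 = Derived(Leaf("variant_1"), Leaf("variant_2"), (1.0, 1.0))
root = Derived(z1, Leaf("clinical_1"), (0.5, 2.0))
observation = {"variant_1": 1, "variant_2": 2, "clinical_1": 3}
print(linear_predictor(root, observation, beta0=1.0, beta1=2.0))
```

Because derived variables can feed other derived variables, a single coefficient β1 on the root summarizes an arbitrarily nested combination of inputs.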
Figure 6. Joint SNP Effects on Nicotine Metabolism
The effects of combinations of genetic variants on nicotine metabolism can be explored using Bayesian algorithms [58]. This plot shows that many genetic variants (dots) in different genes (colors) can modify the effects of CYP2A6 variants on nicotine metabolism. This presents another way of defining biosignatures from a collection of models for use in prediction or in generating new hypotheses.

References

    1. Boscolo-Berto R, Viel G, Montisci M, Terranova C, Favretto D, Ferrara SD. Ethyl glucuronide concentration in hair for detecting heavy drinking and/or abstinence: a meta-analysis. Int J Legal Med. 2013;127(3):611–619.
    2. Cone EJ, Bigelow GE, Herrmann ES, Mitchell JM, LoDico C, Flegel R, Vandrey R. Nonsmoker exposure to secondhand cannabis smoke. III. Oral fluid and blood drug concentrations and corresponding subjective effects. J Anal Toxicol. 2015;39(7):497.
    3. Benowitz NL, Jain S, Dempsey DA, Nardone N, Helen GS, Jacob P 3rd. Urine cotinine screening detects nearly ubiquitous tobacco smoke exposure in urban adolescents. Nicotine Tob Res. 2017;19(9):1048.
    4. Volkow ND, Koob G, Baler R. Biomarkers in substance use disorders. ACS Chem Neurosci. 2015;6(4):522. doi: 10.1021/acschemneuro.5b00067.
    5. Yi H, Breheny P, Imam N, Liu Y, Hoeschele I. Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits. Genetics. 2015;199(1):205.
