PLoS One. 2016 Feb 10;11(2):e0147215. doi: 10.1371/journal.pone.0147215. eCollection 2016.

When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis

Jack W Scannell et al. PLoS One. 2016.

Abstract

A striking contrast runs through the last 60 years of biopharmaceutical discovery, research, and development. Huge scientific and technological gains should have increased the quality of academic science and raised industrial R&D efficiency. However, academia faces a "reproducibility crisis"; inflation-adjusted industrial R&D costs per novel drug increased nearly 100 fold between 1950 and 2010; and drugs are more likely to fail in clinical development today than in the 1970s. The contrast is explicable only if powerful headwinds reversed the gains and/or if many "gains" have proved illusory. However, discussions of reproducibility and R&D productivity rarely address this point explicitly. The main objectives of the primary research in this paper are: (a) to provide quantitatively and historically plausible explanations of the contrast; and (b) to identify factors to which R&D efficiency is sensitive. We present a quantitative decision-theoretic model of the R&D process. The model represents therapeutic candidates (e.g., putative drug targets, molecules in a screening library) within a "measurement space", with candidates' positions determined by their performance on a variety of assays (e.g., binding affinity, toxicity, in vivo efficacy) whose results correlate to a greater or lesser degree. We apply decision rules to segment the space, and assess the probability of correct R&D decisions. We find that when searching for rare positives (e.g., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (i.e., a 0.1 absolute change in the correlation coefficient between model output and clinical outcomes in man) can offset large (e.g., 10 fold, even 100 fold) changes in models' brute-force efficiency.
We also show how validity and reproducibility correlate across a population of simulated screening and disease models. We hypothesize that screening and disease models with high predictive validity are more likely to yield good answers and good treatments, so tend to render themselves and their diseases academically and commercially redundant. Perhaps there has also been too much enthusiasm for reductionist molecular models which have insufficient predictive validity. Thus we hypothesize that the average predictive validity of the stock of academically and industrially "interesting" screening and disease models has declined over time, with even small falls able to offset large gains in scientific knowledge and brute-force efficiency. The rate of creation of valid screening and disease models may be the major constraint on R&D productivity.


Conflict of interest statement

Competing Interests: The authors of this manuscript have the following competing interests: JWS is a director and shareholder of JW Scannell Analytics Ltd., which sells consulting services related to biopharmaceuticals. JB is a partner and employee of Clerbos LLC which sells consulting services related to systems biology. These companies did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries, dividends, research materials, and publication costs. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Decision theoretic view of biopharma discovery, research, and development.
(A) The process starts with a large set of therapeutic possibilities (light blue oval). These could be putative disease mechanisms or candidate drug targets, in either an academic or commercial setting. However, we discuss them as if they are molecules in a commercial R&D campaign (e.g., compounds in a screening library and the analogues that could reasonably be synthesized to create leads). There are A candidates that, with perfect R&D decision making and an unlimited R&D budget, would eventually be approved by the drug regulator for the indication or indications. There are U candidates that would not succeed given similar skill and investment. In general, U >> A. The Discovery (D), Preclinical (P), and Clinical Trial (C) diamonds are “classifiers” (Table 1). Each takes decision variables (X, Y, Z) from predictive models for some or all of the candidates and tests the variables against a decision threshold, yielding yeses, which receive further scrutiny, or noes, which are abandoned. The unit cost per surviving candidate increases through the process [21]. Given serial decisions, only yeses from C face the gold standard reference test: the drug regulator (e.g., the Food and Drug Administration, or FDA). The other decisions face “imperfect” reference tests [27,33,34], the next steps in the process, which are mere proxies for the gold standard. The imperfect reference test for yeses from D is provided by P. The imperfect reference test for yeses from P is provided by C. (B) Decision variables X, Y, and Z will correlate to a greater or lesser extent with each other and with the gold standard reference variable R. The correlation coefficient between X and Y is ρX,Y, the correlation coefficient between Y and Z is ρY,Z, etc. Most of these correlations will never be measured directly during the R&D process.
If ρX,R is very low, the Discovery stage will not enrich the Preclinical stage for approvable candidates, even if ρX,Y is high and decisions from D initially appear to have been successful.
Fig 2. Quantitative classifier model.
Bivariate normal probability density function determined by the correlation, ρY,R, between decision variable, Y, and reference variable, R. Lighter colours indicate high probability density (candidate molecules are more likely to lie here) and darker colours indicate low probability density (molecules are less likely to lie here). The units on the horizontal and vertical axes are one standard deviation. We apply a decision threshold, yt (vertical dotted line), to the decision variable and then apply a reference test and a reference threshold, rt (horizontal dotted line), to molecules that exceed the decision threshold yt. In the sensitivity analyses (see later), the decision and reference thresholds are varied, as is ρY,R. True positives (TP) and false positives (FP) correspond to the probability mass in the upper right and lower right quadrants, respectively. (A) When ρY,R is high, PPV is high. (B) When ρY,R is low, PPV tends to be low.
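The quadrant masses that define TP, FP, and PPV here can be computed directly from the bivariate normal distribution. The following is a minimal sketch of that calculation (our own illustration, not the paper's code; `classifier_ppv` is a hypothetical helper name), using SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def classifier_ppv(rho, y_t, r_t):
    """PPV of the Fig 2 classifier.

    Decision variable Y and reference variable R are standard bivariate
    normal with correlation rho; a candidate is called positive when
    Y >= y_t, and it is a true positive when additionally R >= r_t.
    """
    biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    # By symmetry of the centred normal, P(Y >= y_t, R >= r_t)
    # equals P(Y <= -y_t, R <= -r_t), i.e. a joint CDF evaluation.
    tp = biv.cdf(np.array([-y_t, -r_t]))   # true-positive mass
    called = norm.sf(y_t)                  # TP + FP: everything right of y_t
    return tp / called

# Same thresholds, different predictive validity (panel A vs panel B):
print(classifier_ppv(0.9, 1.0, 1.0), classifier_ppv(0.3, 1.0, 1.0))
```

With the thresholds held fixed, raising ρY,R from 0.3 to 0.9 raises the PPV substantially, which is the contrast between panels (A) and (B).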
Fig 3. Predictive validity and classifier performance.
(A) The bivariate normal probability density function for decision variable Y (horizontal axis) and reference variable R (vertical axis). The correlation between Y and R is high (ρY,R = 0.95), so the decision variable has high PV. The graph shows only the positive quadrant of the distribution. The reference threshold, expressed here in units of standard deviation, is rt = 0.5 (dotted line), so positives are common, accounting for P(R ≥ rt) ≈ 30% of the probability mass. (B) shows TPR (solid line) and FPR (dotted line) as the decision threshold, yt, varies. At some thresholds, the spread between the TPR and FPR is wide. (C) shows PPV vs. decision threshold, yt. (D) to (F) repeat the analyses with a decision variable with lower PV (ρY,R = 0.4). PPV declines relative to panel (C) but remains high because positives are common. (G) to (I) repeat the analysis at ρY,R = 0.95 but with a high reference threshold (2.5 standard deviation units) and rare positives (P(R ≥ rt) ≈ 0.6% of the probability mass). It is possible to achieve a high PPV, but only at a high decision threshold where the TPR is low, which would require screening a large number of items per positive detected. (J) to (L) show the situation with the same high reference threshold (i.e., rare positives) but with a decision variable with low PV. In this case, PPV is low, even with a very high decision threshold and a very low TPR.
Fig 4. Decision performance as yt (throughput) and ρY,R (predictive validity) vary.
Shading shows the PPV of the classifier (log10 units, with lighter shades showing better performance). The vertical axis represents both decision threshold and screening throughput. The scale is in log10 units: 7 represents a throughput of 10^7 and a decision threshold that accepts only the top 10^7th of candidates (P(Y ≥ yt) = 10^−7, Eq 6); 6 represents a throughput of 10^6 and a decision threshold that accepts only the top 10^6th of candidates (P(Y ≥ yt) = 10^−6, Eq 6); etc. The horizontal axis represents PV as the correlation coefficient, ρY,R, between Y and R, with the right hand end of each axis representing high PV (ρY,R = 0.98) and the left hand end representing low PV (ρY,R = 0). Our choice of scale for each axis is discussed in the main text. In (A), positives are relatively common: P(R ≥ rt) = 0.01, or one percent of the candidates entering the classifier. In (B), positives are relatively rare: P(R ≥ rt) = 10^−5, or one hundred-thousandth of the candidates entering the classifier. The spacing and orientation of the contours show the degree to which PPV changes with throughput and with ρY,R. PPV is relatively sensitive to throughput when ρY,R is high and positives are very rare (lower right of panel B). However, PPV is relatively insensitive to throughput when ρY,R is low (left side of both panels). For much of the parameter space illustrated, an absolute 0.1 change in ρY,R (e.g., from 0.4 to 0.5, or 0.5 to 0.6, on the horizontal axis) has a larger effect on PPV than a 10x change in throughput (e.g., from 4 log10 units to 5 log10 units on the vertical axis).
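This sensitivity can be checked numerically. The sketch below is our own construction (not the paper's code): it computes PPV in the rare-positive regime by integrating over the accepted tail of Y, using the fact that, conditional on Y = y, R is normal with mean ρy and standard deviation √(1 − ρ²):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def ppv(rho, p_yes, p_pos):
    """PPV when the top fraction p_yes of decision variable Y is accepted
    and the top fraction p_pos of reference variable R counts as a true
    positive (the two axes of Fig 4)."""
    y_t, r_t = norm.isf(p_yes), norm.isf(p_pos)
    # Conditional on Y = y, R ~ Normal(rho * y, sqrt(1 - rho^2)), so
    # integrate P(R >= r_t | Y = y) against the density of Y over [y_t, inf).
    integrand = lambda y: norm.pdf(y) * norm.sf(
        (r_t - rho * y) / np.sqrt(1.0 - rho**2))
    # Tight relative tolerance: the tail masses here are tiny (~1e-8).
    tp, _ = quad(integrand, y_t, 12.0, epsabs=0.0, epsrel=1e-10)
    return tp / p_yes   # PPV = P(R >= r_t | Y >= y_t)

# Rare positives, as in panel B: PPV rises with predictive validity
# and, when rho > 0, with a stricter decision threshold.
for rho in (0.4, 0.5, 0.6):
    print(rho, ppv(rho, 1e-4, 1e-5), ppv(rho, 1e-5, 1e-5))
```

Comparing columns against rows of this output shows the figure's trade-off: a 0.1 step in ρY,R can move PPV by as much as, or more than, a 10x step in threshold stringency.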
Fig 5. Effect of multiple classification steps.
(A) Points represent decision performance with one, two, three, or four similar classifiers applied in series. Each line represents the same value of correlation coefficient, ρ, applied to all pairwise relationships between decision variables and between decision variables and R. Thus, within each line, all decision variables are equally correlated with each other and with R. The correlation coefficients between the decision variables (X, Y, W, Z) and R vary from 0.9 (high PV, top right line) to 0.3 (low PV, bottom left line). The top left point on each line shows a single classifier applied to X, with each additional point towards the bottom right of the line showing the effect of adding a further classifier, up to a maximum of 4 classifiers. The top decile of candidates in the starting set exceeds each decision threshold and the reference threshold (i.e., P(X ≥ xt) = P(Y ≥ yt) = P(W ≥ wt) = P(Z ≥ zt) = P(R ≥ rt) = 0.1). In general, adding more steps increases PPV but at the cost of a lower TPR. There are diminishing returns from each additional classifier, particularly when the decision variables are highly correlated with one another. Furthermore, a single classifier that is highly correlated with R (e.g., the uppermost points on the lines with high correlation coefficients) often outperforms a combination of several classifiers with lower correlations with R, in terms of both PPV and TPR. Note the logarithmic vertical axis. (B) is exactly as (A) but shows on the vertical axis the number of candidates screened per TP (Table 1). The number of candidates that must be screened per true positive identified increases as ρ (PV) declines, because positives are wrongly rejected. Increasing ρ (PV) increases search efficiency. Note the logarithmic vertical axis.
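The serial-classifier effect can be reproduced with a short Monte Carlo sketch (our own, not the paper's code; it assumes equicorrelated standard normal variables built from one shared factor, and `serial_ppv_tpr` is a hypothetical name):

```python
import numpy as np

rng = np.random.default_rng(0)

def serial_ppv_tpr(rho, n_steps, n=500_000):
    """Monte Carlo sketch of Fig 5: n_steps decision variables applied in
    series, with the top decile passing each threshold.

    A single shared factor g induces correlation rho between every pair
    of variables (decision variables and the reference R alike).
    """
    g = rng.standard_normal(n)
    v = (np.sqrt(rho) * g[:, None]
         + np.sqrt(1.0 - rho) * rng.standard_normal((n, n_steps + 1)))
    r = v[:, 0]                                   # reference variable R
    t = np.quantile(v, 0.9, axis=0)               # top-decile threshold per variable
    passed = np.all(v[:, 1:] >= t[1:], axis=1)    # survived every classifier
    pos = r >= t[0]                               # true positives by R
    tp = np.sum(passed & pos)
    return tp / passed.sum(), tp / pos.sum()      # PPV, TPR
```

In this simulation, each extra classifier raises PPV and lowers TPR, and a single ρ = 0.9 classifier beats four ρ = 0.3 classifiers on both measures, as the caption notes.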
Fig 6. Decision performance as correlations between decision variables change.
The first decision variable was X, and the correlation coefficient between X and R, ρX,R, was held constant at 0.5. The second decision variable was Y, which varied in terms of its correlation with X (ρY,X, vertical axes) and with the reference variable R (ρY,R, horizontal axes). Some regions of the graphs are empty because certain combinations of correlation coefficients cannot coexist. The top decile of candidates in the starting set exceeds each decision threshold and the reference threshold (i.e., P(X ≥ xt) = P(Y ≥ yt) = P(R ≥ rt) = 0.1). (A) shows PPV. Lighter shades indicate higher PPV. PPV increases as ρY,R increases and as ρY,X declines. The use of Y may depress PPV if Y is highly correlated with X while having a low correlation with R. (B) shows the number of candidates screened per TP. Darker shades indicate fewer candidates per TP. Note the log10 colour scale. The number increases as ρY,R declines and as ρY,X declines.
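The effect of the second classifier can be explored with a Monte Carlo sketch (our own construction, not the paper's code; the correlation values used below are illustrative, chosen so the covariance matrix is valid):

```python
import numpy as np

rng = np.random.default_rng(1)

def two_step_ppv(rho_yr, rho_yx, rho_xr=0.5, n=400_000):
    """Monte Carlo sketch of Fig 6: classifiers on X then Y applied in
    series, the top decile passing each threshold, with rho_xr held at
    0.5 as in the figure."""
    cov = np.array([[1.0,    rho_yx, rho_xr],
                    [rho_yx, 1.0,    rho_yr],
                    [rho_xr, rho_yr, 1.0]])
    x, y, r = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    x_t, y_t, r_t = [np.quantile(v, 0.9) for v in (x, y, r)]
    passed = (x >= x_t) & (y >= y_t)          # survived both classifiers
    return np.mean(r[passed] >= r_t)          # PPV of the two-step series
```

Consistent with panel (A), PPV rises when Y correlates more strongly with R, and falls when Y is largely redundant with X.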
Fig 7. Link between validity and reproducibility across a set of screening and disease models.
The figure shows the results of a Monte Carlo simulation (see S1 File for code). (A) Each small point represents one simulated screening or disease model (PM). When testing therapeutic candidates, each PM yields an expected signal which is the sum of two components. The first component is the signal from the reference test multiplied by a gain parameter (horizontal axis). The second component is a model-specific signal, whose gain is shown on the vertical axis. This component can also be thought of as systematic model-specific bias. It is real, but it tells us nothing about the reference test. (B) Each model’s PV is determined by the relative strength of the reference component versus the model-specific component of the signal. PV is high when the reference component is much larger than the model-specific component, because the output of the PM correlates with the reference test when its signal is dominated by the reference signal. (C) Each PM’s signal to noise ratio increases with the sum of the reference component and the model-specific component. (D) Each point represents the performance of one of the models in Panel A, in two simulated experiments that include sampling and measurement noise. The horizontal axis shows the result of the first experiment: sample predictive validity (the correlation coefficient between the output of the model and the output of the reference test for a sample of therapeutic candidates). The vertical axis shows the result of the second experiment: test-retest reliability using the same sample of therapeutic candidates (calculated as the correlation coefficient between the results of the test and the retest). The symbols (star, diamond, triangle, and cross) show how the space in (A) maps onto the space in (D). The line in (D) shows the best fit for the linear regression between sample PV and test-retest reliability. For the simulation shown, we sampled 400 therapeutic candidates for each PM. Both the reference and model-specific components of each PM’s signal were drawn from a normally distributed random variable, whose mean was zero and whose standard deviations were equal to the respective gains on the horizontal and vertical axes of (A) to (C).
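A compact re-implementation of this style of simulation (our own assumed parameterisation with uniform gains and unit measurement noise, not the authors' S1 File code) illustrates the link between sample PV and test-retest reliability:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_models(n_models=300, n_candidates=400, noise_sd=1.0):
    """Monte Carlo sketch of Fig 7. Each model's expected signal mixes a
    reference-driven component (gain a) with a model-specific bias
    component (gain b); measurement noise is added independently to the
    test and the retest."""
    pv, reliability = [], []
    for _ in range(n_models):
        a, b = rng.uniform(0.0, 1.0, size=2)     # reference / model-specific gains
        r = rng.standard_normal(n_candidates)    # true reference scores
        m = rng.standard_normal(n_candidates)    # model-specific bias per candidate
        signal = a * r + b * m
        test = signal + noise_sd * rng.standard_normal(n_candidates)
        retest = signal + noise_sd * rng.standard_normal(n_candidates)
        pv.append(np.corrcoef(test, r)[0, 1])                # sample predictive validity
        reliability.append(np.corrcoef(test, retest)[0, 1])  # test-retest reliability
    return np.array(pv), np.array(reliability)
```

Across the simulated models, sample PV and test-retest reliability come out positively related, as in panel (D): reliable models are more often valid, although a model can be highly reliable yet invalid when its signal is dominated by model-specific bias.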

References

    1. Scannell J, Blanckley A, Boldon H, Warrington B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov. 2012;11:191–200. doi: 10.1038/nrd3681.
    2. Hogan JC. Combinatorial chemistry in drug discovery. Nat Biotechnol. 1997;15:328–330.
    3. Geysen HM, Schoenen F, Wagner D, Wagner R. Combinatorial compound libraries for drug discovery: an ongoing challenge. Nat Rev Drug Discov. 2003;2:222–230.
    4. Nature Biotechnology. Combinatorial chemistry. Nat Biotechnol. 2000;18(Suppl):IT50–IT52.
    5. Dolle RE. Historical overview of chemical library design. In: Zhou JZ, editor. Chemical Library Design (Methods in Molecular Biology 685). Springer Science; 2011. p. 3–25.
