J Chem Inf Model. 2023 Jun 12;63(11):3288–3306.
doi: 10.1021/acs.jcim.3c00460. Epub 2023 May 19.

Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly


Yiwen Lu et al. J Chem Inf Model.

Abstract

While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from the experimental design is extremely challenging, requiring the time- and labor-intensive creation of empirical phase diagrams whenever self-assemblies of novel monomer pairs are sought for specific applications. To alleviate this burden, we develop here the first data-driven framework for the probabilistic modeling of PISA morphologies, based on a selection and suitable adaptation of statistical machine learning methods. As the complexity of PISA precludes generating large volumes of training data with in silico simulations, we focus on interpretable, low-variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only the 592 training data points we curated from the PISA literature. Among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show decent interpolation performance, with an estimated error rate of around 0.2 and an expected cross-entropy loss (surprisal) of about 1 bit when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When extrapolating to new monomer combinations, performance is weaker, but the best model (random forest) still achieves highly nontrivial prediction performance (0.27 error rate, 1.6 bit surprisal), which makes it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, in three case studies we find that, when used to actively learn phase diagrams, the model selects a smart set of experiments that leads to satisfactory phase diagrams after observing relatively few data points (5–16) for the targeted conditions.
The data set, as well as all model training and evaluation code, is publicly available through the last author's GitHub repository.
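As a rough illustration of the two headline metrics, the sketch below computes an error rate and an average surprisal in bits for mixture predictions. It assumes, for illustration only, that each prediction is a probability distribution over mixture labels (labels such as "s+w" are hypothetical), that the predicted mixture is the most probable label, and that surprisal is minus the base-2 logarithm of the probability assigned to the observed mixture; the authors' exact estimator may differ. Under these assumptions, an average surprisal of 1 bit means the observed mixture received probability 0.5 on geometric average.

```python
import math


def mixture_metrics(pred_probs, observed):
    """Error rate and mean surprisal (bits) of mixture predictions.

    pred_probs: list of dicts mapping a (hypothetical) mixture label -> probability.
    observed: list of observed mixture labels, aligned with pred_probs.
    """
    errors = 0
    surprisal = 0.0
    for probs, obs in zip(pred_probs, observed):
        # Assumed decision rule: predict the most probable mixture.
        best = max(probs, key=probs.get)
        errors += best != obs
        # Surprisal of the observed mixture under the model, in bits.
        surprisal += -math.log2(probs[obs])
    n = len(observed)
    return errors / n, surprisal / n


preds = [{"s": 0.5, "s+w": 0.25, "w": 0.25}, {"v": 0.5, "w+v": 0.5}]
print(mixture_metrics(preds, ["s", "w+v"]))
```

In the second toy prediction the two labels tie, so the decision rule picks "v" and counts an error even though the surprisal is the same as for the first, correctly predicted point.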


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Outline of the aqueous RAFT PISA process and resulting phase diagrams. (A) Illustration of morphology evolution during PISA, starting from the soluble corona polymer block, with the core monomer dispersed or emulsified in the aqueous solvent. As the polymerization proceeds, an amphiphilic block copolymer is formed. At a critical core length, self-assembly into spherical micelles occurs. Typically, these spheres transition into worms, then into vesicles, as the core length grows further. Inset i shows an example RAFT PISA starting from a poly(cysteine methacrylate) (PCysMA) corona polymer and 2-hydroxypropyl methacrylate (HPMA) core monomer, to form a PCysMA-b-PHPMA diblock copolymer. R and Z groups are generic end groups resulting from the RAFT agent of choice. Inset ii shows the different internal structures of spheres, worms, and vesicles. (B) Experimental empirical phase diagram for PCysMA31-b-PHPMAy at 70 °C, where s, w, and v denote spheres, worms, and vesicles, respectively (adapted with permission from Ladmiral and Armes et al. Copyright 2015 Royal Society of Chemistry). (C) Example of an algorithm-derived probabilistic phase diagram for PCysMA31-b-PHPMAy at 70 °C using the proposed framework (with the random forest model). Points surrounded by a diamond indicate training data actively selected by the model for this monomer combination.
Figure 2
Structures of all corona-forming (top) and core-forming (bottom) monomers investigated in this study.
Figure 3
Overview of the data set. Left: counts of monomer combinations in core and corona blocks, where components of copolymers are counted proportionally, resulting in zero counts after rounding for some cross-linkers. Center: counts, empirical probabilities (relative counts), and empirical information entropy of the different phases. The table contrasts the actual empirical probabilities (p) with naïve probabilities (q) obtained by simply multiplying the individual morphology probabilities; the resulting differences between the empirical information entropy $H(p) = -\sum_{i=1}^{16} p_i \log p_i$ and the cross-entropy $H(p, q) = -\sum_{i=1}^{16} p_i \log q_i$ illustrate the potential loss of information when modeling morphologies individually instead of jointly (see Performance Estimation). The last row (gray) gives the column totals, where the totals in the first four columns are the marginal totals of the corresponding morphology. Right: histograms of reaction conditions using 12 equal-width bins (x-axis in log scale).
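The following sketch computes both quantities from the caption for a toy joint distribution over binary morphology indicators, with q built as the product of the marginal presence probabilities. The toy distribution and the use of base-2 logarithms are assumptions for illustration; the example shows how treating mutually exclusive morphologies as independent inflates the cross-entropy relative to the entropy.

```python
import itertools
import math


def entropy_vs_naive(joint):
    """Return (H(p), H(p, q)) in bits.

    joint: dict mapping a tuple of 0/1 indicators (one per morphology) to its
    empirical probability p; q is the naive product of per-morphology marginals.
    """
    k = len(next(iter(joint)))
    # Marginal probability that each morphology is present.
    marg = [sum(p for x, p in joint.items() if x[j] == 1) for j in range(k)]
    h_p = 0.0
    h_pq = 0.0
    for x in itertools.product((0, 1), repeat=k):
        p = joint.get(x, 0.0)
        if p == 0.0:
            continue  # convention: 0 * log 0 = 0
        # Naive probability: multiply individual presence/absence probabilities.
        q = math.prod(marg[j] if x[j] else 1 - marg[j] for j in range(k))
        h_p -= p * math.log2(p)
        h_pq -= p * math.log2(q)
    return h_p, h_pq


# Toy case: two morphologies that never co-occur, each seen half the time.
print(entropy_vs_naive({(1, 0): 0.5, (0, 1): 0.5}))  # (1.0, 2.0)
```

Here the naive model wastes one full bit per observation, because it assigns probability 0.25 to each observed mixture and also spreads mass onto the never-observed "both" and "neither" outcomes.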
Figure 4
Additive model designs. The probability of an individual morphology is modeled via its log odds, which in turn is given as a sum of interpretable terms. Top: linear model with one term per covariate of the form $\beta_j X_j$. Center: GAM, which generalizes the linear terms to univariate transformations $f_j(X_j)$. Bottom: additive rule ensemble, where each term (rule) takes a constant value (the rule consequent) within a rectangular region defined by a subset of covariates (the rule condition).
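A minimal sketch of the three designs, assuming a logistic link so that the probability is the sigmoid of the summed log-odds terms. The covariate names (core_dp, solids_pct), the coefficients, and the fixed log and square-root shapes standing in for learned GAM smooth functions are all hypothetical; fitted models would estimate these terms from data.

```python
import math


def sigmoid(z):
    """Map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def rule(value, condition):
    """Rule term: constant `value` inside the region where `condition` holds, else 0."""
    return lambda x: value if condition(x) else 0.0


def morphology_probability(intercept, terms, x):
    """P(morphology present) = sigmoid(intercept + sum of additive terms)."""
    return sigmoid(intercept + sum(term(x) for term in terms))


# Hypothetical covariates and coefficients, for illustration only.
linear_terms = [lambda x: 0.01 * x["core_dp"],   # beta_j * X_j
                lambda x: 0.05 * x["solids_pct"]]
gam_terms = [lambda x: math.log(x["core_dp"]),   # f_j(X_j); fixed shapes here
             lambda x: math.sqrt(x["solids_pct"])]
rule_terms = [rule(1.5, lambda x: 100 <= x["core_dp"] <= 200
                   and x["solids_pct"] >= 15)]   # rectangular region

x = {"core_dp": 150, "solids_pct": 20}
for name, terms in [("linear", linear_terms), ("GAM", gam_terms), ("rules", rule_terms)]:
    print(name, round(morphology_probability(-2.0, terms, x), 3))
```

Because every design only adds terms on the log-odds scale, each term can be read off and inspected on its own, which is what makes these models interpretable.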
Figure 5
Overall model performance as estimated by cross-validation. Shown are the error rate for predicting the presence of an individual morphology (left) or the correct morphological mixture (center), and the log loss (degree of surprisal, right). Horizontal lines indicate the uninformed baseline (assuming uniform phase probabilities) and the informed baseline (assuming marginal phase probabilities estimated from the full data set). Error bars indicate the standard errors across the 30-fold cross-validation.
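The two reference lines can be sketched as follows, assuming the baselines are defined over mixture outcomes. With 16 possible mixtures (four morphologies, each present or absent, as in the Figure 3 table), a uniform predictor has a log loss of log2 16 = 4 bits regardless of the data. The informed baseline is taken here to predict each mixture's empirical frequency and, for the error rate, always the most frequent mixture; these are illustrative assumptions, and the toy labels are hypothetical.

```python
import math
from collections import Counter


def baseline_scores(observed, n_outcomes):
    """Uninformed and informed log loss (bits) plus the informed error rate.

    observed: list of observed mixture labels; n_outcomes: number of possible mixtures.
    """
    n = len(observed)
    freq = Counter(observed)
    # Uninformed: every mixture gets probability 1 / n_outcomes.
    uninformed_bits = math.log2(n_outcomes)
    # Informed: each mixture gets its empirical frequency in the full data set.
    informed_bits = -sum(math.log2(freq[y] / n) for y in observed) / n
    # Informed error rate: always predict the most frequent mixture.
    informed_error = 1 - max(freq.values()) / n
    return uninformed_bits, informed_bits, informed_error


print(baseline_scores(["s", "s", "w", "w"], n_outcomes=16))  # (4.0, 1.0, 0.5)
```

Evaluated on the same data it was estimated from, the informed log loss equals the empirical entropy of the mixture distribution, which is why it sits below the uninformed line.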
Figure 6
GAM: five most important variables per morphology with their partial dependence plots (importance score after the colon, unit of the variable in brackets where applicable). The y-axis represents the average model probability of the morphology occurring for the corresponding variable value on the x-axis. For specific values of the other variables, actual probabilities can differ only by an additive constant; i.e., the plot faithfully represents the effect of the variable on the modeled probability.
Figure 7
Random forest: five most important variables per morphology with their partial dependence plots (importance score after the colon, unit of the variable in brackets where applicable). The y-axis represents the average model probability of the morphology occurring for the corresponding variable value on the x-axis. The effect can differ for specific values of the other variables.
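The averaging described in both captions can be sketched with the standard partial dependence construction: fix one variable at a grid value in every data row, keep the other covariates as observed, and average the model's predicted probabilities. The toy model, covariate names, and rows below are hypothetical; any fitted model exposing a probability for one morphology could be plugged in as `predict`.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Partial dependence of a model's prediction on one feature.

    predict: callable mapping a covariate dict to a probability.
    rows: covariate dicts (e.g., the training data); grid: values of `feature` to probe.
    """
    curve = []
    for value in grid:
        # Fix `feature` to `value` in every row, keep the other covariates
        # as observed, and average the model's predictions.
        curve.append(mean(predict({**row, feature: value}) for row in rows))
    return curve


# Hypothetical toy model and data, for illustration only.
def toy_predict(r):
    return min(1.0, r["core_dp"] / 400) * (0.5 if r["temp_c"] < 60 else 1.0)


rows = [{"core_dp": 50, "temp_c": 50}, {"core_dp": 300, "temp_c": 70}]
print(partial_dependence(toy_predict, rows, "core_dp", [100, 200]))
```

In this toy model the core_dp effect is multiplied by a temperature factor, so the curve for an individual row can differ in scale from the average, which mirrors the caveat in the random forest caption.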
Figure 8
Snapshots of actively learned phase diagrams and acquisition values (i.e., degree of prediction uncertainty, in orange) for different numbers of condition-specific training points (m). The leftmost column shows the initial phase diagram without any data for the target condition; the rightmost column shows the phase diagram after the smallest number of training points that yields a full phase error of 0.
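A generic uncertainty-sampling loop of the kind the caption describes might look like the sketch below. It assumes, for illustration, that the acquisition value is the entropy of the predicted mixture distribution; the callables `fit` (returns a predictor from the data collected so far) and `oracle` (returns the observed mixture for a condition) are hypothetical placeholders for model training and running an experiment, and the paper's actual acquisition function may differ.

```python
import math


def predictive_entropy(probs):
    """Uncertainty of a predicted distribution over mixtures, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def next_experiment(candidates, predict):
    """Pick the untested condition whose prediction is most uncertain."""
    return max(candidates, key=lambda c: predictive_entropy(predict(c)))


def active_learning(candidates, fit, oracle, budget):
    """Uncertainty sampling: refit, query the most uncertain condition, repeat.

    fit(data) -> predict function; oracle(c) -> observed mixture label.
    """
    data, pool = [], list(candidates)
    for _ in range(min(budget, len(pool))):
        predict = fit(data)
        c = next_experiment(pool, predict)
        pool.remove(c)
        data.append((c, oracle(c)))
    return data
```

Each iteration spends one experiment where the current model is least sure, which is how a handful of targeted points can pin down a phase diagram faster than a uniform grid.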
