Patterns (N Y). 2021 Mar 26;2(4):100225.
doi: 10.1016/j.patter.2021.100225. eCollection 2021 Apr 9.

Machine learning discovery of high-temperature polymers

Lei Tao et al. Patterns (N Y).

Abstract

To formulate a machine learning (ML) model to establish the polymer's structure-property correlation for glass transition temperature Tg, we collect a diverse set of nearly 13,000 real homopolymers from the largest polymer database, PoLyInfo. We train the deep neural network (DNN) model with 6,923 experimental Tg values using Morgan fingerprint representations of chemical structures for these polymers. Interestingly, the trained DNN model can reasonably predict the unknown Tg values of polymers with distinct molecular structures, in comparison with molecular dynamics simulations and experimental results. With the validated transferability and generalization ability, the ML model is utilized for high-throughput screening of nearly one million hypothetical polymers. We identify more than 65,000 promising candidates with Tg > 200°C, which is 30 times more than existing known high-temperature polymers (∼2,000 from PoLyInfo). The discovery of this large number of promising candidates will be of significant interest in the development and design of high-temperature polymers.
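
The fingerprint-then-screen workflow described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code and not the Morgan algorithm as implemented in cheminformatics toolkits: the function names `circular_fingerprint` and `screen_high_tg`, the 64-bit vector length, the SHA-256 hashing, and the hand-written atom and bond lists are all assumptions made for illustration. The sketch only shows the general idea of hashing each atom's growing neighborhood into a fixed-length bit vector, and of keeping candidates whose predicted Tg exceeds 200°C.

```python
import hashlib


def _stable_hash(text):
    """Deterministic integer hash (Python's built-in hash() is salted per run)."""
    return int(hashlib.sha256(text.encode()).hexdigest(), 16)


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy circular fingerprint, loosely in the spirit of Morgan fingerprints.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs; bond orders are ignored here
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: an atom's identifier depends only on its element and degree.
    ids = [_stable_hash(f"{atoms[i]}|{len(neighbors[i])}") for i in range(len(atoms))]
    bits = [0] * n_bits
    for ident in ids:
        bits[ident % n_bits] = 1

    # Each round folds in the sorted identifiers of the direct neighbors,
    # so an identifier describes a neighborhood one bond larger than before.
    for _ in range(radius):
        ids = [
            _stable_hash(f"{ids[i]}|{sorted(ids[j] for j in neighbors[i])}")
            for i in range(len(atoms))
        ]
        for ident in ids:
            bits[ident % n_bits] = 1
    return bits


def screen_high_tg(predicted_tg_c, threshold_c=200.0):
    """Return the names whose predicted Tg (in °C) is strictly above the threshold."""
    return [name for name, tg in predicted_tg_c.items() if tg > threshold_c]


if __name__ == "__main__":
    # Ethanol heavy atoms only: C-C-O (hypothetical input, hydrogens omitted).
    print(circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)]))
    # Made-up predictions, not values from the paper.
    print(screen_high_tg({"candidate_a": 250.0, "candidate_b": 150.0}))
```

Because neighbor identifiers are sorted before hashing, the resulting bit vector does not depend on the order in which atoms are listed. In practice the predicted Tg values passed to `screen_high_tg` would come from a trained model; here they are supplied directly to keep the sketch self-contained.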

Keywords: feature representation; glass transition temperature; high-throughput screening; machine learning; polymer.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Chemical space visualization of dataset-1, dataset-2, and dataset-3. (A) 2D visualization based on descriptors and fingerprints using the t-SNE algorithm. Dataset-1 has reported Tg values, and each data point is colored according to its Tg value. Dataset-2 and dataset-3 do not have reported Tg values and are colored yellow and red, respectively. (B) Set diagram showing representative substructures in dataset-1 (green circle), dataset-2 (yellow circle), and dataset-3 (red circle) based on the Morgan fingerprint. Some substructures are common to all datasets, while others are unique to certain datasets.
Figure 2
Three types of feature representation calculated from the polymer's SMILES notation for ML models: molecular descriptor, Morgan fingerprint, and image.
Figure 3
Performance of four ML models. (A) The Lasso regression model using descriptors as input features (Lasso_Descriptor model). (B) The Lasso regression model using fingerprints as input features (Lasso_Fingerprint model). (C) The DNN model using fingerprints as input features (DNN_Fingerprint model). (D) The CNN model using images as input features (CNN_Image model). (E) Comparison between the MD-simulated Tg and the ML-predicted Tg for 20 polymers randomly selected from dataset-2. The three dashed lines are the unity line and the lines marking a mean absolute error of 40°C. Each of the 20 chemical structures is followed by its MD-simulated Tg value.
Figure 4
Substructures with the highest absolute weight, based on the Morgan fingerprint and the Lasso ML model. The central atom of each substructure is highlighted in blue, aromatic atoms in yellow, and atom connectivity in light gray.
Figure 5
High-throughput screening of high-Tg polymers with the DNN_Fingerprint model. The Tg distributions of dataset-1, dataset-2, and dataset-3 are plotted in green, yellow, and red, respectively. The polymer samples on the right are followed by their predicted Tg and true Tg values. For the sample in dataset-1 (green box), the true Tg is the collected experimental value. For the samples in dataset-2 (yellow box) and dataset-3 (red box), the true Tg is the MD-simulated value. More than 1,000 real polymers and 65,000 hypothetical polymers were discovered with Tg > 200°C.
Figure 6
Comparison of key substructures and functional groups in high-Tg (>200°C) polymers. (A) Comparison of the 18 substructures recognized in Figure 4. (B) Comparison of the six high-Tg-related functional groups recognized in Table 5.
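
The Figure 3 caption compares MD-simulated and ML-predicted Tg values against a unity line and lines at a mean absolute error of 40°C. As a hedged illustration of that kind of comparison, the sketch below computes the mean absolute error between two lists of Tg values and the fraction of points falling within a fixed band around the unity line. The function names, the reading of the dashed lines as a ±40°C band, and the three sample values are assumptions for illustration only; none of the numbers come from the paper.

```python
def mean_absolute_error(reference_c, predicted_c):
    """Mean absolute difference between paired Tg values, in °C."""
    if len(reference_c) != len(predicted_c) or not reference_c:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - p) for r, p in zip(reference_c, predicted_c))
    return total / len(reference_c)


def fraction_within_band(reference_c, predicted_c, band_c=40.0):
    """Share of points whose prediction lies within band_c of the reference."""
    if len(reference_c) != len(predicted_c) or not reference_c:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for r, p in zip(reference_c, predicted_c) if abs(r - p) <= band_c)
    return hits / len(reference_c)


if __name__ == "__main__":
    # Made-up values for illustration, not data from the paper.
    md_simulated = [100.0, 150.0, 250.0]
    ml_predicted = [110.0, 140.0, 300.0]
    print(mean_absolute_error(md_simulated, ml_predicted))
    print(fraction_within_band(md_simulated, ml_predicted))
```

With the sample values above, the absolute errors are 10, 10, and 50°C, so two of the three points fall inside a 40°C band.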
