Sci Data. 2017 Sep 12;4:170127.
doi: 10.1038/sdata.2017.127.

Machine-learned and codified synthesis parameters of oxide materials


Edward Kim et al. Sci Data.

Abstract

Predictive materials design has rapidly accelerated in recent years with the advent of large-scale resources, such as materials structure and property databases generated by ab initio computations. In the absence of analogous ab initio frameworks for materials synthesis, high-throughput and machine learning techniques have recently been harnessed to generate synthesis strategies for select materials of interest. Still, a community-accessible, autonomously compiled synthesis planning resource that spans multiple materials systems has not yet been developed. In this work, we present a collection of aggregated synthesis parameters computed from the text of over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques. We provide a dataset of synthesis parameters, compiled autonomously across 30 different oxide systems, in a format optimized for planning novel syntheses of materials.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Schematic overview of text extraction and database construction.
Each colored object represents a high-level step in the automated workflow for retrieving journal articles and processing their text into codified synthesis parameters. Materials synthesis articles are fed into an NLP pipeline, which computes a machine-readable database of synthesis parameters across numerous materials systems. These parameters can then be queried to produce synthesis planning resources, including empirical distributions of real-valued parameters and ranked lists of keywords.
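The querying step in Figure 1 can be pictured with a small sketch. It assumes a hypothetical record layout (`SynthesisRecord` with `target`, `keywords` and `temperature_c` fields) and a hypothetical helper `query_target`; neither is the schema of the published dataset. The sketch only shows how a table of extracted parameters can yield a ranked keyword list and the raw values behind an empirical distribution.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SynthesisRecord:
    """One row of a hypothetical extracted synthesis-parameter table."""
    target: str                     # target material system, e.g. "TiO2"
    keywords: List[str]             # extracted keywords (precursors, operations, ...)
    temperature_c: Optional[float]  # extracted temperature in degrees Celsius, if any


def query_target(records: List[SynthesisRecord], target: str,
                 top_k: int = 5) -> Tuple[List[Tuple[str, int]], List[float]]:
    """Return a ranked keyword list and all extracted temperatures for one target."""
    hits = [r for r in records if r.target == target]
    keyword_counts = Counter(k for r in hits for k in r.keywords)
    temperatures = [r.temperature_c for r in hits if r.temperature_c is not None]
    # most_common() breaks ties by first occurrence
    return keyword_counts.most_common(top_k), temperatures


if __name__ == "__main__":
    db = [
        SynthesisRecord("TiO2", ["calcination", "titanium isopropoxide"], 450.0),
        SynthesisRecord("TiO2", ["calcination", "ethanol"], 500.0),
        SynthesisRecord("ZnO", ["zinc acetate", "hydrothermal"], 180.0),
    ]
    ranked, temps = query_target(db, "TiO2")
    print(ranked)  # [('calcination', 2), ('titanium isopropoxide', 1), ('ethanol', 1)]
    print(temps)   # [450.0, 500.0]
```

The returned temperature list is the kind of input a density estimate, such as the one in Figure 5, would be computed from.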
Figure 2. Neural-network and parse-based synthesis parameter extraction.
(a) A hierarchical neural network assigns labels (e.g., ‘MATERIAL’) to words one at a time by converting each word to embedding and heuristic vector representations and passing these to a classifier. The embeddings of a five-word window are considered for each prediction. Each layer is densely connected, and the hidden layer concatenates the two input layers. The final layer is a softmax classifier over all possible word categories. (b) A grammatical parse of a sentence is used to resolve word-level labels (below colored bars) into sequential word-chunk-level labels (above colored bars), which are then resolved into word-chunk relations (curved arcs).
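A forward pass matching one reading of panel (a) can be sketched in NumPy. This is an illustration, not the authors' model: the label set, all layer sizes, the tanh activations, the zero-padding at sentence edges and the random weights are assumptions, and the two "input layers" are read here as one dense layer over the five-word embedding window and one over the heuristic features, whose outputs are concatenated into the hidden layer before the softmax.

```python
import numpy as np

rng = np.random.default_rng(0)

LABELS = ["MATERIAL", "NUMBER", "OPERATION", "OTHER"]  # hypothetical label set
EMB_DIM, WINDOW, HEUR_DIM, HIDDEN = 50, 5, 8, 32       # hypothetical sizes


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


class WordClassifier:
    """Forward pass only; weights are random placeholders, not trained values."""

    def __init__(self) -> None:
        self.W_emb = rng.normal(0, 0.1, (HIDDEN, EMB_DIM * WINDOW))
        self.W_heur = rng.normal(0, 0.1, (HIDDEN, HEUR_DIM))
        self.W_out = rng.normal(0, 0.1, (len(LABELS), 2 * HIDDEN))

    def predict_proba(self, window_emb: np.ndarray, heur: np.ndarray) -> np.ndarray:
        # window_emb: (WINDOW, EMB_DIM) embeddings of the centre word and 4 neighbours
        # heur: (HEUR_DIM,) heuristic features of the centre word
        h_emb = np.tanh(self.W_emb @ window_emb.reshape(-1))  # dense layer on embeddings
        h_heur = np.tanh(self.W_heur @ heur)                  # dense layer on heuristics
        hidden = np.concatenate([h_emb, h_heur])              # concatenated hidden layer
        return softmax(self.W_out @ hidden)                   # one probability per label


def label_sentence(model: WordClassifier, embeddings: np.ndarray,
                   heuristics: np.ndarray) -> list:
    """Label each word using a five-word window, zero-padded at sentence edges."""
    half = WINDOW // 2
    pad = np.zeros((half, EMB_DIM))
    padded = np.vstack([pad, embeddings, pad])
    labels = []
    for i in range(len(embeddings)):
        probs = model.predict_proba(padded[i:i + WINDOW], heuristics[i])
        labels.append(LABELS[int(np.argmax(probs))])
    return labels
```

With trained weights, each word would receive the label with the highest softmax probability, which is the word-level labelling that panel (b) then groups into chunks.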
Figure 3. Topic and synthesis target distributions within the database.
Heatmap showing a sample of topic distributions plotted against material systems of interest. Topics are computed by training a Latent Dirichlet Allocation model on 640,000 journal articles and are labelled by their top-ranked keywords. Heatmap values are column-normalized counts across all articles within each material system.
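The column normalization behind this heatmap can be sketched as follows. The sketch assumes a document-topic matrix from an already fitted LDA model (for example, the output of scikit-learn's `LatentDirichletAllocation.fit_transform`) and assumes each article is counted once under its highest-weight topic; the caption does not state how articles are mapped to topic counts. `topic_heatmap` is a hypothetical helper name.

```python
import numpy as np


def topic_heatmap(doc_topic: np.ndarray, doc_system: list, systems: list) -> np.ndarray:
    """Column-normalized topic counts per material system.

    doc_topic: (n_docs, n_topics) document-topic weights from a fitted LDA model.
    doc_system: material-system label of each document.
    Returns an (n_topics, n_systems) array whose non-empty columns sum to 1.
    """
    counts = np.zeros((doc_topic.shape[1], len(systems)))
    col = {s: j for j, s in enumerate(systems)}
    for weights, system in zip(doc_topic, doc_system):
        if system in col:
            counts[int(np.argmax(weights)), col[system]] += 1  # count dominant topic
    totals = counts.sum(axis=0, keepdims=True)
    # Divide each column by its total; columns with no articles stay all-zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


doc_topic = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
print(topic_heatmap(doc_topic, ["TiO2", "TiO2", "ZnO"], ["TiO2", "ZnO", "MgO"]))
```

Normalizing per column makes material systems with very different article counts comparable, since each column then reads as the share of that system's articles under each topic.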
Figure 4. Top occurring material mentions per target material system.
Heatmap showing a sample of co-occurring mentions of materials within synthesis routes for material systems of interest. Values of the heatmap represent column-normalized counts across all articles within a material system. Self-mention counts (e.g., ZnO mentioned in papers synthesizing ZnO) are set to zero before column normalization and are plotted in grey.
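The self-mention handling can be made concrete with a sketch. It assumes each article is given as a pair of its target system and the list of materials mentioned in its synthesis section, and that a material is counted at most once per article; both are assumptions. `cooccurrence_heatmap` is a hypothetical helper that also returns a mask of the zeroed self-mention cells, which a plotting routine could render in grey.

```python
import numpy as np


def cooccurrence_heatmap(papers, mentions_vocab, systems):
    """papers: iterable of (target_system, [mentioned materials]) pairs."""
    row = {m: i for i, m in enumerate(mentions_vocab)}
    col = {s: j for j, s in enumerate(systems)}
    counts = np.zeros((len(mentions_vocab), len(systems)))
    for target, mentions in papers:
        if target not in col:
            continue
        for m in set(mentions):  # count each material once per article
            if m in row:
                counts[row[m], col[target]] += 1
    # Mark cells where the mentioned material is the target system itself.
    self_mask = np.zeros_like(counts, dtype=bool)
    for m, i in row.items():
        if m in col:
            self_mask[i, col[m]] = True
    counts[self_mask] = 0  # fixed to zero before normalizing
    totals = counts.sum(axis=0, keepdims=True)
    normed = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return normed, self_mask
```

For instance, `cooccurrence_heatmap([("ZnO", ["ZnO", "NaOH"])], ["ZnO", "NaOH"], ["ZnO"])` assigns the whole ZnO column to NaOH, because the ZnO self-mention is zeroed before normalization.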
Figure 5. Temperature and time distributions for titania.
(a) Kernel density estimates of calcination and hydrothermal temperatures for titania, normalized to unit area. (b) Kernel density estimates of calcination and hydrothermal times for titania, normalized to unit area. All density estimates use Gaussian kernels applied to the temperatures and times extracted from the synthesis sections of journal articles.
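A Gaussian kernel density estimate normalized to unit area can be written directly in NumPy. The bandwidth is an assumption: the caption does not state one, so the sketch falls back on the normal-reference rule of thumb, h = 1.06·σ·n^(-1/5). The temperatures in the usage lines are made-up values, not data from the paper.

```python
import numpy as np


def gaussian_kde_density(samples, grid, bandwidth=None):
    """Gaussian KDE of `samples` evaluated on `grid`; integrates to 1 over the real line."""
    x = np.asarray(samples, dtype=float)
    g = np.asarray(grid, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed; not specified in the caption)
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    z = (g[:, None] - x[None, :]) / bandwidth          # shape (len(grid), n)
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Averaging n unit-area kernels of width h gives a unit-area density.
    return kernels.sum(axis=1) / (n * bandwidth)


temps_c = [400, 450, 450, 500, 550, 600]  # hypothetical calcination temperatures
grid = np.linspace(0, 1000, 2001)
density = gaussian_kde_density(temps_c, grid)
print(density.sum() * (grid[1] - grid[0]))  # Riemann-sum area, close to 1
```

Dividing by n·h is what enforces unit area, so distributions built from different numbers of extracted values can be overlaid on the same axes.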
Figure 6. Learning curve for neural-network word classifier.
The baseline accuracy and F1 score, plotted as horizontal lines, are computed from the baseline neural network trained on the maximum number of training words. The solid curves show the accuracy and F1 score of the human-trained neural network as a function of training data volume.
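Token-level accuracy and F1 can be computed as in the sketch below. The caption does not say how F1 is averaged over word categories, so macro-averaging (an unweighted mean of per-label F1) is an assumption here; per-label F1 uses the identity F1 = 2TP / (2TP + FP + FN).

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Token-level accuracy and macro-averaged F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    accuracy = sum(tp.values()) / len(y_true)
    labels = set(y_true) | set(y_pred)
    f1s = [2 * tp[l] / (2 * tp[l] + fp[l] + fn[l]) for l in labels]
    return accuracy, sum(f1s) / len(f1s)


acc, f1 = accuracy_and_macro_f1(
    ["MATERIAL", "OTHER", "MATERIAL", "NUMBER"],
    ["MATERIAL", "MATERIAL", "OTHER", "NUMBER"],
)
print(acc, f1)  # 0.5 0.5
```

A learning curve of the kind shown in Figure 6 would come from training on increasing amounts of annotated words and calling a function like this on the same held-out set each time.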

References

Data Citations

    1. Kim E. 2017. figshare. https://doi.org/10.6084/m9.figshare.5221351

