J Appl Crystallogr. 2023 Feb 28;56(Pt 2):409-419.
doi: 10.1107/S1600576723000596. eCollection 2023 Apr 1.

CrystalMELA: a new crystallographic machine learning platform for crystal system determination

Nicola Corriero et al. J Appl Crystallogr.

Abstract

Determination of the crystal system and space group is the first step of crystal structure analysis. Often this turns out to be a bottleneck in the material characterization workflow for polycrystalline compounds, thus requiring manual interventions. This work proposes a new machine-learning (ML)-based web platform, CrystalMELA (Crystallography MachinE LeArning), for crystal systems classification. Two different ML models, random forest and convolutional neural network, are available through the platform, as well as the extremely randomized trees algorithm, available from the literature. The ML models learned from simulated powder X-ray diffraction patterns of more than 280 000 published crystal structures from organic, inorganic and metal-organic compounds and minerals which were collected from the POW_COD database. A crystal system classification accuracy of 70%, which improved to more than 90% when considering the Top-2 classification accuracy, was obtained in tenfold cross-validation. The validity of the trained models has also been tested against independent experimental data of published compounds. The classification options in the CrystalMELA platform are powerful, easy to use and supported by a user-friendly graphic interface. They can be extended over time with contributions from the community. The tool is freely available at https://www.ba.ic.cnr.it/softwareic/crystalmela/ following registration.

Keywords: X-ray diffraction; crystal system determination; machine learning web platform.
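
The Top-1 and Top-2 accuracies quoted in the abstract can be illustrated with a minimal sketch. The function name `top_k_accuracy` and the toy scores below are assumptions made for illustration; they are not taken from the CrystalMELA code or data.

```python
import numpy as np

def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (order within the top k is irrelevant).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical class scores for 4 patterns over 3 classes (illustrative values only).
probs = [
    [0.6, 0.3, 0.1],  # true class 0: correct at Top-1
    [0.5, 0.4, 0.1],  # true class 1: correct only at Top-2
    [0.2, 0.3, 0.5],  # true class 0: missed at both Top-1 and Top-2
    [0.1, 0.7, 0.2],  # true class 1: correct at Top-1
]
labels = [0, 1, 0, 1]

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```

Counting a prediction as correct when the true crystal system is among the two highest-scoring classes can only raise the score relative to Top-1, which is consistent with the improvement reported in the abstract.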

Figures

Figure 1
Distribution of samples among the seven crystal systems (x axis) in the POW_COD (blue) and full (orange) data sets.
Figure 2
Distribution of crystal systems (class label) in organic and inorganic data sets.
Figure 3
Architecture of the CNN, composed of 22 layers. The feature extraction section consists of three convolutional blocks, each formed by a Conv1D layer followed by activation, dropout and average pooling layers. The number of Conv1D filters is 80 in the first block and increases by the same amount in each subsequent block, reaching 240 in the last one. The kernel size starts at 200 and is divided by 2 in the second block and by 4 in the third one. Other parameters are sub-sample length = 2, padding = ‘same’ and activation function = ‘relu’. The dropout rate is 0.3 in each block, and the 1D average pooling layers use a pool size of 3. The flatten layer is followed by the classification section, which consists of four densely connected blocks, each formed by a dense layer followed by a batch normalization layer. The dense layers have 2800, 1400, 700 and 70 neurons. Each dense layer uses an l2 kernel regularizer and the ‘relu’ activation function, except for the last one, which uses ‘tanh’. The last block is followed by the output layer of seven units (one for each crystal system), with the ‘softmax’ activation function, which ensures that the seven output values always sum to 1.
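
The layer count in this caption can be checked with a small, framework-free sketch that lists the layers in order. Two readings are assumptions here, not statements from the source: that "sub-sample length" corresponds to the convolution stride, and that both kernel divisors apply to the initial size of 200 (giving 200, 100 and 50). The helper name `build_layer_plan` is hypothetical.

```python
def build_layer_plan():
    """Return an ordered list of (layer_type, settings) following the Figure 3 caption."""
    layers = []
    base_filters, base_kernel = 80, 200
    kernel_divisors = [1, 2, 4]  # assumption: divisors applied to the initial kernel size

    # Feature extraction: three blocks of Conv1D, activation, dropout, average pooling.
    for block, divisor in enumerate(kernel_divisors, start=1):
        layers.append(("Conv1D", {
            "filters": base_filters * block,        # 80, 160, 240
            "kernel_size": base_kernel // divisor,  # 200, 100, 50
            "strides": 2,                           # assumption: "sub-sample length = 2"
            "padding": "same",
        }))
        layers.append(("Activation", {"function": "relu"}))
        layers.append(("Dropout", {"rate": 0.3}))
        layers.append(("AveragePooling1D", {"pool_size": 3}))

    layers.append(("Flatten", {}))

    # Classification: four blocks of dense layer plus batch normalization.
    dense_units = [2800, 1400, 700, 70]
    for index, units in enumerate(dense_units):
        is_last = index == len(dense_units) - 1
        layers.append(("Dense", {
            "units": units,
            "activation": "tanh" if is_last else "relu",
            "kernel_regularizer": "l2",
        }))
        layers.append(("BatchNormalization", {}))

    # Output: one unit per crystal system, softmax so the outputs sum to 1.
    layers.append(("Dense", {"units": 7, "activation": "softmax"}))
    return layers

plan = build_layer_plan()
conv_layers = [settings for name, settings in plan if name == "Conv1D"]
```

Under this reading, the 12 feature extraction layers, the flatten layer, the 8 classification layers and the output layer add up to the 22 layers stated in the caption.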
Figure 4
Home web page of the CrystalMELA platform.
Figure 5
Results page. Input diffraction pattern and crystal systems classification report.
Figure 6
History page.
Figure 7
Confusion matrices visualizing and summarizing the performance of the three classification models in CrystalMELA on the experimental data set.
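
As a rough illustration of how a confusion matrix such as those in Figure 7 is assembled, the sketch below counts (true, predicted) pairs over the seven crystal systems. The class ordering and the toy label lists are assumptions for illustration only; they do not reproduce the experimental data set or the platform's own code.

```python
import numpy as np

# Assumed class ordering; the ordering used by CrystalMELA is not stated here.
CRYSTAL_SYSTEMS = [
    "triclinic", "monoclinic", "orthorhombic", "tetragonal",
    "trigonal", "hexagonal", "cubic",
]

def confusion_matrix(true_idx, pred_idx, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_idx, pred_idx):
        cm[t, p] += 1
    return cm

# Hypothetical labels for six patterns (illustrative values only).
true_idx = [0, 1, 1, 2, 6, 6]
pred_idx = [0, 1, 2, 2, 6, 5]

cm = confusion_matrix(true_idx, pred_idx, len(CRYSTAL_SYSTEMS))
# Diagonal entries are correct classifications, so their share is the accuracy.
accuracy = np.trace(cm) / cm.sum()
```

Off-diagonal entries show which crystal systems are confused with one another; in this toy example one monoclinic pattern is predicted as orthorhombic and one cubic pattern as hexagonal.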
