Nat Commun. 2025 Aug 26;16(1):7977.
doi: 10.1038/s41467-025-62825-4.

Predictive design of crystallographic chiral separation

Rokas Elijošius et al. Nat Commun.

Abstract

The efficient separation of chiral molecules is a fundamental challenge in the manufacture of pharmaceuticals and light-polarising materials. We developed an approach that combines machine learning with a physics-based representation to predict resolving agents for chiral molecules, using a transformer-based neural network. In retrospective tests, our approach reaches a four- to six-fold improvement over the historical, trial-and-error-based hit rate. We further validate the model in a prospective experiment, where we use the model to design a resolution screen for six unseen racemates. We successfully resolved three of the six mixtures in a single round of experiments and obtained an overall 8-to-1 true positive to false negative ratio. Together with this study, we release a previously proprietary dataset of over 6000 resolution experiments, the largest diastereomeric salt crystallisation dataset to date. More broadly, our approach and open crystallisation data lay the foundation for accelerating and reducing the costs of chiral resolutions.

Conflict of interest statement

Competing interests: A.A.L. is a co-founder and owns equity in PostEra Inc. and Byterat Ltd. L.B., S.B., X.H., J.L.K.M., J.M., N.W.S., Q.Y., and R.M.H. are employed by Pfizer Inc. W.P.F. is employed by Virscidian Inc. and was formerly employed by Pfizer. F.A.F. is employed by AstraZeneca at the time of publication; however, none of the work presented in this manuscript was conducted at or influenced by this affiliation. The remaining authors declare no competing interests.

Figures

Fig. 1. Pair representations allow differentiation of enantiomers.
Our framework focuses on the representation of the acid-base pair, where differences between the two enantiomers naturally arise. A machine learning model uses the representations of both diastereomeric salts to predict the probability of success for a given resolution experiment.
Fig. 2. Low-mass-fraction and/or low-enantiomeric-excess data dominate the distribution.
The margins show the distributions of the mass fraction and enantiomeric excess for the successful and failed resolutions. Successful resolutions make up only 3% of the data. For clarity, we normalise the two distributions independently. The ‘Maximum possible EE’ line indicates the largest possible enantiomeric excess for a given solid mass fraction. Points falling within 5% of the line, highlighted by the shaded area, were kept to allow for experimental error.
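The ‘Maximum possible EE’ bound can be illustrated with a short mass-balance sketch. It is not taken from the paper: it assumes the mass fraction f is the share of the racemic substrate recovered in the solid, so that at most half of the substrate can be a single enantiomer, and it treats the 5% allowance as an additive tolerance. The function names are hypothetical.

```python
def max_possible_ee(solid_mass_fraction: float) -> float:
    """Upper bound on the enantiomeric excess of a solid obtained from a racemate.

    Assumption (not from the source): solid_mass_fraction is the fraction of
    the racemic substrate that ends up in the solid, with 0 < f <= 1.
    """
    f = solid_mass_fraction
    if not 0.0 < f <= 1.0:
        raise ValueError("solid_mass_fraction must be in (0, 1]")
    if f <= 0.5:
        # Up to half of the material can be one pure enantiomer.
        return 1.0
    # Beyond 50%, all additional solid must be the other enantiomer:
    # EE = (0.5 - (f - 0.5)) / f = (1 - f) / f
    return (1.0 - f) / f


def within_tolerance(solid_mass_fraction: float, ee: float, tol: float = 0.05) -> bool:
    """Keep a point if it exceeds the bound by no more than `tol` (assumed additive)."""
    return ee <= max_possible_ee(solid_mass_fraction) + tol
```

Under these assumptions, a solid containing 80% of the substrate can reach an enantiomeric excess of at most 25%, so a measured value of 28% would be kept while 35% would be discarded.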
Fig. 3. Model architecture encodes chemical intuition.
(A) When interacting with a resolving agent, a racemic mixture forms two diastereomeric acid-base pairs. We create representations for both pairs and input their mean and difference into the model. The atom-density for the highlighted nitrogen shows that the mean of the representations focuses on the local environment around each atom, while the difference component emphasises the long-range intermolecular interactions. (B) The model uses transformer blocks to create an informative internal representation from the structures of the diastereomeric salts and the differences between them.
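The mean-and-difference input described in this caption can be sketched as follows. This is a minimal illustration, assuming each diastereomeric pair is already encoded as a fixed-length numeric vector; the function name and the toy vectors are hypothetical, and the actual physics-based representation used in the paper is not reproduced here.

```python
def pair_features(rep_a, rep_b):
    """Combine two diastereomeric-pair representations into (mean, difference).

    rep_a, rep_b: equal-length sequences of floats (hypothetical encodings of
    the two acid-base pairs formed by the two enantiomers).
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("representations must have the same length")
    mean = [(a + b) / 2 for a, b in zip(rep_a, rep_b)]
    diff = [a - b for a, b in zip(rep_a, rep_b)]
    return mean, diff


# Toy vectors, for illustration only.
rep_r = [1.0, 2.0, 4.0]
rep_s = [3.0, 2.0, 0.0]
mean_rs, diff_rs = pair_features(rep_r, rep_s)
mean_sr, diff_sr = pair_features(rep_s, rep_r)
```

One property of this split is that swapping the two enantiomers leaves the mean unchanged and only flips the sign of the difference, so all enantiomer-specific information sits in the difference component.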
Fig. 4. Using the model quadruples the hit rate.
(A) Enrichment factor as a function of the number of selected top predictions, estimated using a 5-fold cross-validation experiment where each fold corresponds to unseen chiral salts. The blue line corresponds to encoding all of the participating molecules as random numbers, the orange line corresponds to encoding the molecular structures with Morgan fingerprints, the red line corresponds to another deep learning approach showing state-of-the-art results for chiral chromatography, and the green line corresponds to this work. The shaded area indicates the standard deviation estimated from an ensemble of five models. Note that the 8% base rate is the proportion of hits in the lower-noise subset of the data. (B) The model trained with the two-step approach outperforms the models directly trained on all training data, either as a classifier or a regressor. (C) Model performance, measured by average precision, systematically improves as more training data are added. Each point represents the mean performance of an ensemble of 10 models, and the error bars correspond to the standard deviation of the ensemble performance.
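The two metrics named in this caption can be computed with the standard definitions sketched below. This assumes the enrichment factor is the hit rate among the top-k ranked predictions divided by the overall base rate, and that average precision is the mean of the precision values at each hit; the paper's exact evaluation code may differ. The scores and labels are made-up toy values, not results from the study.

```python
def enrichment_factor(scores, labels, k):
    """Hit rate among the k highest-scoring experiments, relative to the base rate."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of experiments")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("enrichment is undefined without any hits")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:k])
    return (top_hits / k) / base_rate


def average_precision(scores, labels):
    """Mean of precision@rank taken at the rank of every hit."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data: 10 hypothetical experiments, 2 of which succeeded (label 1).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef_top2 = enrichment_factor(toy_scores, toy_labels, k=2)
ap = average_precision(toy_scores, toy_labels)
```

In the toy data the base rate is 20% and one of the top two predictions is a hit, so the enrichment factor at k = 2 is 0.5 / 0.2 = 2.5. An enrichment factor of 4 on a set with an 8% base rate would correspond to roughly 32% hits among the selected experiments.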
Fig. 5. Attention heads in the model recognise neighbouring atoms in the crystal.
(A) Examples of pairwise attentions in the model identifying close contacts in the crystal structure. Attention is calculated between the reference atom and the remaining atoms. In each example, we have confirmed that the reference atom and the highest attention atom are within 3.5 Å in the crystal structure. (B) The mean attentions between neighbouring pairs and non-neighbouring pairs for each attention head in the model across four different crystal structures. Some heads in the model focus on neighbouring pairs up to 50% more than the uniform baseline; meanwhile, no head focuses on non-neighbouring pairs. This result arises without the model ever seeing a crystal structure during training. We defined pairs as neighbours if they were within 3.5 Å in the crystal. (C) An excerpt from one of the X-ray structures, highlighting the non-trivial packing identified by the model.
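A comparison of this kind can be sketched for a single attention head as below. The 3.5 Å neighbour cutoff comes from the caption; everything else is an assumption made for illustration: the function name, the convention that each row of the attention matrix sums to 1 over the other atoms, and the choice of 1/(n - 1) as the uniform baseline. The coordinates and attention values are toy numbers.

```python
import math


def neighbour_attention_ratio(coords, attention, cutoff=3.5):
    """Mean attention over neighbouring atom pairs, relative to uniform attention.

    coords: list of (x, y, z) positions in angstroms from a crystal structure.
    attention: n x n matrix for one head; attention[i][j] is the weight atom i
    places on atom j (assumed to sum to 1 over j != i).
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two atoms")
    uniform = 1.0 / (n - 1)  # assumed baseline: equal weight on every other atom
    values = [
        attention[i][j]
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(coords[i], coords[j]) <= cutoff
    ]
    if not values:
        raise ValueError("no neighbouring pairs within the cutoff")
    return (sum(values) / len(values)) / uniform


# Toy example: atoms 0 and 1 are 1 A apart, atom 2 is far from both.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_attention = [
    [0.0, 0.75, 0.25],
    [0.75, 0.0, 0.25],
    [0.5, 0.5, 0.0],
]
ratio = neighbour_attention_ratio(toy_coords, toy_attention)
```

A ratio of 1.5, as in this toy case, would correspond to a head that attends to neighbouring pairs 50% more than the uniform baseline.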
Fig. 6. Model identifies successful resolutions in prospective experiments.
(A) Structures of the resolved racemates and the conditions for the successful resolutions. (B) Unresolved substrates. (C) Full confusion matrix of the prospective experiment.

