J Cheminform. 2025 Sep 3;17(1):139.
doi: 10.1186/s13321-025-01047-8.

The first South Korean data challenge for drug discovery using human and mouse liver microsomal stability data

Nam-Chul Cho et al. J Cheminform. 2025.

Abstract

The Korea Chemical Bank (KCB) has generated a dataset of metabolic stability measurements for approximately 4,000 compounds tested on human and mouse liver microsomes. In 2023, the first South Korean data challenge, named the Jump AI Challenge for Drug Discovery (JUMP AI 2023), was launched using the KCB metabolic stability data. The objective of JUMP AI 2023 was to promote and encourage the development of new drugs using artificial intelligence (AI) technology in South Korea. A total of 1,254 teams participated in the competition, developing algorithms to estimate the percentage of each compound remaining after 30 min of incubation with human and mouse liver microsomes. The dataset comprised training and test sets of 3,498 and 483 compounds, respectively. This paper provides an overview of JUMP AI 2023 and its outcomes, highlighting the diverse range of algorithms and AI technologies employed by the competing teams. Among these, five teams stood out and won awards using graph neural network (GNN)-based approaches. As the first AI competition for drug discovery in South Korea, it attracted numerous researchers and played a key role in promoting drug research through the application of AI technologies.

Keywords: JUMP AI 2023; Korea data challenge; Metabolic stability.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
t-SNE visualization of data distribution between training and test sets. The plot displays the two-dimensional projection of high-dimensional data using t-SNE (t-distributed Stochastic Neighbor Embedding). Training samples (pink circles) and test samples (blue circles) show similar distribution patterns across the projected space, spanning approximately from −15 to 15 on both axes. The overlapping distribution suggests good representation of the test set within the training data manifold, indicating appropriate dataset splitting for model development and evaluation.
Fig. 2
Distribution of liver microsomal stability data. A Human liver microsomal stability data showing the number of compounds in each range of percentage remaining (0–100%). Training set (orange bars, n = 3,498) and test set (blue bars, n = 483) distributions are displayed. B Mouse liver microsomal stability data showing a similar distribution pattern, with training set (orange bars, n = 3,498) and test set (blue bars, n = 483). Both species showed the highest numbers of compounds in the 0–10% and 90–100% ranges.
Fig. 3
Schematic diagrams of the model architectures used by the five award-winning teams. Figures were adapted from their presentation materials with minor modifications. A Datu team’s model combining a custom graph neural network with contrastive learning-based pretraining and multi-task learning for human (HLM) and mouse (MLM) liver microsomal stability prediction. B Suleezard team’s model employing a directed message passing neural network (D-MPNN) with extensive molecular and atomic feature engineering. C Silryeokeuro Malhae team’s model based on D-MPNN architecture enhanced by global multi-head attention pooling and multi-task learning. D Dimer team’s GraphGPS transformer model integrating long-range dependency modeling, data pruning, and ensemble learning. E Yakgwa Donut team’s MolCLR-based model incorporating contrastive pretraining, quantum descriptors, and ensemble feature integration.
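
The binning behind the Fig. 2 histograms can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `bin_percent_remaining`, the choice to fold a value of exactly 100% into the last range, and the sample values are all assumptions made here, and none of the numbers come from the KCB dataset.

```python
from collections import Counter


def bin_percent_remaining(values, width=10):
    """Count percent-remaining values in fixed-width ranges (hypothetical helper).

    Ranges are [0, 10), [10, 20), ..., [90, 100]; a value of exactly 100
    is placed in the last range (an assumption, not stated in the source).
    """
    n_bins = 100 // width
    counts = Counter()
    for v in values:
        if not 0 <= v <= 100:
            raise ValueError(f"percent remaining out of range: {v}")
        # Clamp the index so that 100.0 lands in the final range.
        idx = min(int(v // width), n_bins - 1)
        counts[idx] += 1
    return [
        (f"{i * width}-{(i + 1) * width}%", counts.get(i, 0))
        for i in range(n_bins)
    ]


# Made-up illustrative values, NOT taken from the KCB dataset.
example = [2.5, 7.0, 45.0, 93.2, 100.0, 99.1]
histogram = bin_percent_remaining(example)

for label, count in histogram:
    print(f"{label:>8}: {count}")
```

Applying the same function separately to the training and test values for each species would give the paired bar counts that a plot like Fig. 2 displays.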
