Universal feature selection for simultaneous interpretability of multitask datasets

Matt Raymond et al. J Cheminform. 2026 Jan 17;18(1):23. doi: 10.1186/s13321-025-01096-z

Abstract

Extracting meaningful features from complex, high-dimensional datasets across scientific domains remains challenging. Current methods often struggle with scalability, which limits their applicability to large datasets, or make restrictive assumptions about feature-property relationships, which hinders their ability to capture complex interactions. BoUTS's general and scalable feature selection algorithm overcomes these limitations by identifying both universal features relevant to all datasets and task-specific features predictive for subsets of them. Evaluated on seven diverse chemical regression datasets, BoUTS achieves state-of-the-art feature sparsity while generally maintaining prediction accuracy comparable to specialized methods. Notably, BoUTS's universal features enable domain-specific knowledge transfer between datasets, and we expect these results to be broadly useful for manually-guided inverse problems. Beyond its current application, BoUTS holds potential for elucidating data-poor systems by leveraging information from similar data-rich systems.

Scientific contribution: BoUTS selects nonlinear, universally informative features across multiple datasets. We identify crucial "universal features" across seven real-world chemistry datasets, which enhance cross-dataset interpretability and selection stability. BoUTS is highly scalable, is applicable to tabular data from many domains, and our results identify connections between seemingly unrelated chemical domains.

Keywords: Dimensionality reduction; Multi-output; Multi-source; Variable selection.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Overview of the BoUTS algorithm and datasets: a illustrates the BoUTS algorithm for the case where T = 3. The boosted multitask trees are trained on all multitask datasets (circles) to estimate (upper diamonds) each task output. Single-task boosting estimates the residuals of the multitask trees (middle diamonds). We sum over multi- and single-task outputs for the final estimate (lower diamonds). In b, we show the splitting process for multitask trees. The improvement (impurity decrease) is computed for each task/feature combination (squares), and f is selected as the feature with the maximin improvement. f splits each dataset (partial circles), and we repeat until a stopping condition is reached. c illustrates the assignment of datasets to categories, with square size indicating the logarithm of the dataset size. For each dataset (starting at the top row), n = [11,079, 777, 1,185, 2,143, 147, 206, 3,071]. d shows the correlation between dataset outputs. Proteins are not included because we use only one protein dataset. The n values for the lower triangles, grouped by column, are logP: n = [777, 1,185, 2,143]; logHs: n = [479, 614]; Tb: n = [822]; zeta potential: n = [119]. e shows a t-SNE plot of each data point (using the complete feature set), colored by molecule type. For small molecules, NPs, and proteins, n = [11,079, 3,071, 234], respectively
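Panel b describes a maximin rule: for every candidate split, compute the impurity decrease separately on each task, then keep the feature whose worst-case improvement across tasks is largest. A minimal sketch of that rule for regression trees, assuming variance reduction as the impurity measure (the function names and the one-threshold-per-feature simplification are ours, not the paper's; Algorithm 1 is the authoritative version):

```python
import numpy as np

def variance_reduction(y, mask):
    """Impurity decrease for one task from splitting its samples by a boolean mask."""
    n = len(y)
    if mask.sum() == 0 or mask.sum() == n:
        return 0.0  # degenerate split: no improvement
    parent = y.var()
    left, right = y[mask], y[~mask]
    child = (len(left) * left.var() + len(right) * right.var()) / n
    return parent - child

def maximin_split(tasks, thresholds):
    """Pick the split whose *worst* improvement across tasks is best.

    tasks: list of (X, y) pairs sharing a common feature space.
    thresholds: dict mapping feature index -> one candidate threshold
    (a real implementation would scan many thresholds per feature).
    """
    best_feature, best_score = None, -np.inf
    for f, t in thresholds.items():
        # improvement of this split on each task
        scores = [variance_reduction(y, X[:, f] <= t) for X, y in tasks]
        worst = min(scores)        # minimize over tasks ...
        if worst > best_score:     # ... maximize over features
            best_feature, best_score = f, worst
    return best_feature, best_score
```

Maximizing the minimum improvement is what biases these shared splits toward features that help every task at once, i.e., the universal features.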
Fig. 2
Ablation tests and analysis of BoUTS-selected features: Feature ablation tests for BoUTS are shown in a, b, and c, and compare our selected features to specialized prediction methods. Violin plots show the performance distribution; the inner bars indicate the 25th and 75th percentiles, and the outer bars indicate the 5th and 95th percentiles. The white dot indicates the median performance. The top of d shows the dataset size (top axis) and the selection stability of single-task gradient-boosted feature selection (bottom axis). The bars indicate the 95% confidence interval. In the bottom section, the upper bar shows the mean stability across all tasks, and the lower bar shows the stability of BoUTS's universal features, with the 95% confidence interval as a black bar. e shows the absolute Spearman correlation between the universal features as a graph, with clusters indicated by gray circles and node colors indicating the categories that selected that feature. An alternative visualization is provided in Fig. 9
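The caption does not specify how the selection stability in panel d is computed. As an assumption on our part, a common convention for this quantity is the mean pairwise Jaccard similarity between the feature sets selected across resampled runs, which would look roughly like this:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of selected feature indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def selection_stability(feature_sets):
    """Mean pairwise Jaccard similarity across repeated selection runs.

    feature_sets: list of iterables of selected feature indices,
    e.g. one per bootstrap resample or cross-validation fold.
    """
    pairs = list(combinations(feature_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```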
Fig. 3
Comparing the performance of BoUTS and competing selection methods: The top half of a shows the performance of all evaluated feature selection algorithms compared to specialized methods. The violin plots are defined in Fig. 2a. The bottom half shows the number of features selected by each method. No features are indicated for the specialized methods, as they are not selection methods. The hatched section indicates the universal or common features that are selected, and the remaining features are task-specific. Plots b and c are defined similarly for the property and scale categories
Fig. 4
The runtime of each feature selection method (in seconds) on each category of datasets
Fig. 5
Examples of BoUTS and Dirty LASSO on non-chemical datasets: The performance is quantified using R², and the bars are hatched to indicate that universal features are being used
Algorithm 1
Universal splitting condition
Algorithm 2
Task-specific splitting condition
Algorithm 3
BoUTS boosting algorithm
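The algorithm listings themselves are not reproduced on this page. Purely for orientation, here is a rough sketch of the two-stage boosting structure that Fig. 1a describes: multitask rounds that share trees across all tasks, followed by single-task rounds that boost each task's residuals, with predictions summed over both stages. Every identifier below is illustrative; Algorithm 3 in the paper is the authoritative statement.

```python
import numpy as np

def bouts_boosting_sketch(tasks, fit_multitask_tree, fit_tree,
                          n_multi=50, n_single=50, lr=0.1):
    """Schematic of Fig. 1a: boost shared multitask trees on all tasks,
    then boost single-task trees on each task's residuals; the final
    prediction is the sum of both stages.

    tasks: list of (X, y) pairs.
    fit_multitask_tree / fit_tree: caller-supplied fitting routines that
    return objects with a .predict(X) method (e.g., regression trees).
    """
    preds = [np.zeros(len(y)) for _, y in tasks]

    # Stage 1: multitask boosting -- each tree's splits are chosen with the
    # universal (maximin) condition across all tasks (Fig. 1b, Algorithm 1).
    multi_trees = []
    for _ in range(n_multi):
        residuals = [y - p for (_, y), p in zip(tasks, preds)]
        tree = fit_multitask_tree([X for X, _ in tasks], residuals)
        multi_trees.append(tree)
        preds = [p + lr * tree.predict(X) for (X, _), p in zip(tasks, preds)]

    # Stage 2: single-task boosting on whatever the shared trees missed.
    single_trees = [[] for _ in tasks]
    for t, (X, y) in enumerate(tasks):
        for _ in range(n_single):
            tree = fit_tree(X, y - preds[t])
            single_trees[t].append(tree)
            preds[t] = preds[t] + lr * tree.predict(X)

    return multi_trees, single_trees, preds
```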
Fig. 6
Demonstrating that LightGBM outperforms ridge regression: a, b, and c show comparisons between LightGBM and ridge regression for features selected by Dirty LASSO. Violin plots show the performance distribution; the inner bars indicate the 25th and 75th percentiles, and the outer bars indicate the 5th and 95th percentiles. The white dot indicates the median performance. Hatched plots indicate that only the universal features were used, and unhatched plots indicate that universal and task-specific features were used
Fig. 7
Demonstrating that LightGBM is the best overall model on the original feature set: a, b, and c show comparisons between multiple machine learning algorithms on the chemistry datasets used in this study. Here, we evaluate the performance of each method without feature selection to highlight their innate performance. Violin plots show the performance distribution; the inner bars indicate the 25th and 75th percentiles, and the outer bars indicate the 5th and 95th percentiles. The white dot indicates the median performance. “KNN” indicates “k-nearest neighbors,” “Ridge” indicates “ridge regression,” “Kernel Ridge” indicates “kernel ridge regression,” and “NN” indicates a “neural network”
Fig. 8
Demonstrating that LightGBM is the best overall model using the universal features selected by BoUTS: a, b, and c show comparisons between multiple machine learning algorithms on the chemistry datasets used in this study. Here, we evaluate the performance of each method with feature selection to highlight their innate performance. Violin plots show the performance distribution; the inner bars indicate the 25th and 75th percentiles, and the outer bars indicate the 5th and 95th percentiles. The white dot indicates the median performance. “KNN” indicates “k-nearest neighbors,” “Ridge” indicates “ridge regression,” “Kernel Ridge” indicates “kernel ridge regression,” and “NN” indicates a “neural network”
Fig. 9
Alternative visualization of Fig. 2e with included feature names. The edges indicate the absolute Spearman correlation between the universal features selected for each category, with clusters indicated by circular brackets on the outside of the graph. The colors in each node indicate the categories that selected that feature
Fig. 10
An example of conditionally-important universal features: Here, we plot a synthetic 3D dataset with 1000 samples in each cluster. Both tasks share one universally important feature, and each task has one feature that is important for that task but has zero predictive power for the other task. Notably, the universal and task-specific features are only predictive when both features are selected. Red indicates a y value of 1, and blue a y value of −1. The right plot has been rotated around the vertical axis to improve visualization
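The geometry described in Fig. 10 is straightforward to reproduce. Below is a minimal sketch under our own assumptions about the generating process (an XOR-style coupling between the universal and task-specific features; the paper's exact construction may differ): 1000 samples per cluster, labels in {−1, 1}, and features that are predictive only jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n_per_cluster=1000, task=0):
    """Two tasks share a universal feature u; each task also has its own
    specific feature. The label is the XOR of the signs of (u, specific),
    so neither feature is predictive alone -- only the pair is."""
    clusters = [(su, ss) for su in (-1, 1) for ss in (-1, 1)]  # 4 clusters
    X, y = [], []
    for su, ss in clusters:
        u = su + 0.3 * rng.standard_normal(n_per_cluster)   # universal feature
        s = ss + 0.3 * rng.standard_normal(n_per_cluster)   # this task's feature
        other = rng.standard_normal(n_per_cluster)          # other task's feature: pure noise here
        cols = [u, s, other] if task == 0 else [u, other, s]
        X.append(np.column_stack(cols))
        y.append(np.full(n_per_cluster, su * ss))           # y in {-1, +1}
    return np.vstack(X), np.concatenate(y)

X0, y0 = make_task(task=0)  # columns: [universal, task-0 feature, task-1 feature]
X1, y1 = make_task(task=1)
```

A model given only the universal feature, or only one task-specific feature, performs at chance on this data; the pairing carries the signal, which is why such conditional importance matters for feature selection.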

