Review
Brief Bioinform. 2023 Nov 22;25(1):bbad479.
doi: 10.1093/bib/bbad479.

A review of genetic variant databases and machine learning tools for predicting the pathogenicity of breast cancer

Rahaf M Ahmad et al. Brief Bioinform.

Abstract

Studies continue to uncover risk factors that contribute to breast cancer (BC) development, including genetic variants. Advances in machine learning and the big data generated by genetic sequencing can now be used to predict BC pathogenicity. However, it is unclear which of the tools developed for pathogenicity prediction is best suited to predicting the impact and pathogenicity of variants. A significant challenge is determining the most suitable data source for each tool, since different tools can yield different predictions from different data inputs. To this end, this work reviews genetic variant databases and tools used specifically for the prediction of BC pathogenicity. We describe existing genetic variant databases and, where appropriate, the diseases for which they were established. Through examples, we illustrate how they can be used to predict BC pathogenicity and discuss their advantages and disadvantages. We conclude that tools specialized by training on multiple diverse datasets from different databases for the same disease have enhanced accuracy and specificity and are therefore more helpful to clinicians in predicting and diagnosing BC as early as possible.

Keywords: artificial intelligence; breast cancer; data science; genetic variants database; machine learning; pathogenicity prediction.

Figures

Figure 1. The types of variants based on cell type and alteration type.
Figure 2. The classification of variants, their definitions and their clinical effects.
Figure 3. Examples of cancer and general variant databases and their applications.
Figure 4. Applying ML in pathogenicity prediction research. This figure was modified from Won et al. [96].
Figure 5. The main workflow of pathogenicity prediction research using ML.
