Sci Rep. 2023 Aug 14;13(1):13204.
doi: 10.1038/s41598-023-38110-z.

Predicting congenital renal tract malformation genes using machine learning

Mitra Kabir et al. Sci Rep.

Abstract

Congenital renal tract malformations (RTMs) are the major cause of severe kidney failure in children. Studies to date have identified defined genetic causes for only a minority of human RTMs. While some RTMs may be caused by poorly defined environmental perturbations affecting organogenesis, it is likely that numerous causative genetic variants have yet to be identified. Unfortunately, the pace of discovering further genetic causes for RTMs is limited by the difficulty of prioritising candidate genes that harbour sequence variants. Here, we used supervised machine learning, a computer-based artificial intelligence methodology, to identify genes with a high probability of being involved in renal development. These genes, when mutated, are promising candidates for causing RTMs. With this methodology, the machine learning classifier learns which attributes are shared by known renal development genes and identifies other genes possessing these attributes. We report the validation of an RTM gene classifier and provide predictions of RTM association status for all protein-coding genes in the mouse genome. Our predictions, whilst not definitive, can inform the prioritisation of genes when evaluating patient sequence data for genetic diagnosis. This knowledge of renal developmental genes will accelerate the process of reaching a genetic diagnosis for patients born with RTMs.
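The abstract proposes using the classifier's output to prioritise genes when evaluating patient sequence data. A minimal sketch of that step, assuming a hypothetical table that maps gene symbols to predicted RTM probabilities and a list of genes in which a patient carries variants, might simply rank the patient's genes by score. The function name, gene names and probabilities below are illustrative placeholders, not results from the study.

```python
from typing import Dict, List, Tuple


def prioritise_candidates(
    variant_genes: List[str],
    rtm_probability: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank a patient's variant-bearing genes by predicted RTM probability (hypothetical helper).

    Genes absent from the prediction table get 0.0 so they sort last.
    Duplicate gene entries are collapsed, and ties are broken alphabetically
    so the ordering is deterministic.
    """
    unique_genes = set(variant_genes)
    scored = [(gene, rtm_probability.get(gene, 0.0)) for gene in unique_genes]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical classifier output: gene symbol -> predicted probability of RTM association
    predictions = {"geneA": 0.91, "geneB": 0.12, "geneC": 0.67}
    # Hypothetical genes carrying variants in one patient (geneD has no prediction)
    patient_variants = ["geneB", "geneC", "geneA", "geneD", "geneC"]
    for gene, prob in prioritise_candidates(patient_variants, predictions):
        print(f"{gene}\t{prob:.2f}")
```

Ranking is only a triage aid: as the abstract notes, the predictions are not definitive, so a low score does not exclude a gene.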

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The workflow for predicting mouse RTM genes by integrating genomic and protein features in a Random Forest classification model. First, features of mouse genes were collated from public databases. Statistical analyses and feature selection were then performed to identify the features most informative for distinguishing known RTM genes from non-RTM genes. A Random Forest classifier was built to predict RTM and non-RTM genes from these features. Finally, this classifier was used to predict RTM association status for all protein-coding genes in the mouse genome that were not included in classifier development.
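The Figure 1 workflow (feature table, feature selection, Random Forest, prediction on genes not used in development) can be sketched with scikit-learn, assuming it is installed. This is not the authors' code: the data are random synthetic numbers standing in for the real feature table, and the ANOVA F-test (`f_classif`) inside `SelectKBest` is a swapped-in feature selection method, since the caption does not specify which statistics were used. The values of `k`, `n_estimators` and the number of folds are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the labelled feature table (hypothetical):
# rows are genes, columns are features such as gene length or residue percentages.
n_genes, n_features = 200, 10
X_labelled = rng.normal(size=(n_genes, n_features))
y_labelled = rng.integers(0, 2, size=n_genes)  # 1 = RTM gene, 0 = non-RTM gene
X_labelled[y_labelled == 1, 0] += 1.5  # make feature 0 informative for the toy example

# Feature selection and the Random Forest are wrapped in one pipeline so that
# selection is refitted inside every cross-validation fold (avoids leakage).
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Validation step: cross-validated ROC AUC on the labelled genes.
scores = cross_val_score(model, X_labelled, y_labelled, cv=5, scoring="roc_auc")
print(f"Cross-validated ROC AUC: {scores.mean():.2f}")

# Prediction step: fit on all labelled genes, then score unlabelled genes.
model.fit(X_labelled, y_labelled)
X_unlabelled = rng.normal(size=(50, n_features))  # hypothetical genes outside the training set
rtm_probability = model.predict_proba(X_unlabelled)[:, 1]  # probability of the RTM class
predicted_status = model.predict(X_unlabelled)  # 1 = predicted RTM, 0 = predicted non-RTM
```

Keeping selection inside the pipeline matters: selecting features on the full labelled set before cross-validation would let the held-out folds influence which features are kept and inflate the validation score.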
Figure 2
Distributions of total gene length, exon length, intron length and protein length in the RTM and non-RTM datasets. These violin plots show the distributions of (a) gene length, (b) exon length, (c) intron length and (d) protein length, with overlaid boxplots. The width of each violin plot represents the proportion of the data at that value; the top and bottom of each boxplot denote the upper and lower quartiles, and the line inside the box denotes the median. P-values from Mann–Whitney U tests are reported below the respective graphs.
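The Mann–Whitney U test used for Figure 2 compares two samples by their ranks rather than their raw values, which suits skewed quantities such as gene length. The following plain-Python sketch computes U from rank sums and a two-sided p-value from the large-sample normal approximation with a tie correction; it is an illustration, not the authors' code. In practice a tested routine such as `scipy.stats.mannwhitneyu` is preferable, and its p-values may differ slightly because it can use exact distributions for small samples and applies a continuity correction by default.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values in sorted order.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected, no continuity correction).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # rank sum of x minus its minimum possible value
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_two_sided


if __name__ == "__main__":
    # Hypothetical toy lengths, not data from the study
    rtm_lengths = [52_000, 87_500, 140_200, 66_300, 98_100]
    non_rtm_lengths = [12_400, 30_900, 25_000, 61_700, 18_800]
    u, p = mann_whitney_u(rtm_lengths, non_rtm_lengths)
    print(f"U = {u}, two-sided p = {p:.4f}")
```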
Figure 3
Distributions of the percentages of selected amino acid residues in RTM and non-RTM mouse proteins. These violin plots show the distributions of the proportions of (a) glycine, (b) asparagine, (c) proline, (d) isoleucine, (e) leucine and (f) glutamine residues, with overlaid boxplots. The width of each violin plot represents the proportion of the data at that value; the top and bottom of each boxplot denote the upper and lower quartiles, and the line inside the box denotes the median. P-values from Mann–Whitney U tests are reported below the respective graphs.
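Residue percentages such as those in Figure 3 are simple composition features. One plausible way to compute them is sketched below, assuming one-letter protein sequences and assuming the denominator counts only the 20 standard residues (so ambiguous letters such as X and a trailing `*` stop symbol are ignored); the caption does not state the exact definition used. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from typing import Dict

# The 20 standard amino acids in one-letter code
STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def residue_percentages(sequence: str) -> Dict[str, float]:
    """Percentage of each standard amino acid in a one-letter protein sequence (hypothetical helper)."""
    seq = sequence.upper().rstrip("*")  # drop a trailing stop symbol if present
    counts = Counter(seq)
    # Denominator: standard residues only (an assumption; non-standard letters are ignored)
    total = sum(counts[aa] for aa in STANDARD_RESIDUES)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_RESIDUES}


if __name__ == "__main__":
    toy_protein = "MGGPNLQI"  # made-up sequence for illustration
    pct = residue_percentages(toy_protein)
    for aa in "GNPILQ":  # the residues shown in Figure 3
        print(f"{aa}: {pct[aa]:.1f}%")
```

Per-protein values computed this way could then be compared between the RTM and non-RTM sets with a rank-based test, as in the figure.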
