Int J Mol Sci. 2024 Jul 3;25(13):7306.
doi: 10.3390/ijms25137306.

Identifying Key Genes Involved in Axillary Lymph Node Metastasis in Breast Cancer Using Advanced RNA-Seq Analysis: A Methodological Approach with GLMQL and MAS

Mostafa Rezapour et al. Int J Mol Sci. 2024.

Abstract

Our study aims to address the methodological challenges frequently encountered in RNA-Seq data analysis within cancer studies. Specifically, it enhances the identification of key genes involved in axillary lymph node metastasis (ALNM) in breast cancer. We employ Generalized Linear Models with Quasi-Likelihood (GLMQL) to manage the inherently discrete and overdispersed nature of RNA-Seq data, marking a significant improvement over conventional methods such as the t-test, which assumes a normal distribution and equal variances across samples. We utilize the Trimmed Mean of M-values (TMM) method for normalization to address library-specific compositional differences effectively. Our study focuses on a distinct cohort of 104 untreated patients from the TCGA Breast Invasive Carcinoma (BRCA) dataset to maintain an untainted genetic profile, thereby providing more accurate insights into the genetic underpinnings of lymph node metastasis. This strategic selection paves the way for developing early intervention strategies and targeted therapies. Our analysis is exclusively dedicated to protein-coding genes, enriched by the Magnitude Altitude Scoring (MAS) system, which rigorously identifies key genes that could serve as predictors in developing an ALNM predictive model. Our novel approach has pinpointed several genes significantly linked to ALNM in breast cancer, offering vital insights into the molecular dynamics of cancer development and metastasis. These genes, including ERBB2, CCNA1, FOXC2, LEFTY2, VTN, ACKR3, and PTGS2, are involved in key processes like apoptosis, epithelial-mesenchymal transition, angiogenesis, response to hypoxia, and KRAS signaling pathways, which are crucial for tumor virulence and the spread of metastases.
Moreover, the approach has also emphasized the importance of the small proline-rich protein family (SPRR), including SPRR2B, SPRR2E, and SPRR2D, recognized for their significant involvement in cancer-related pathways and their potential as therapeutic targets. Important transcripts such as H3C10, H1-2, PADI4, and others have been highlighted as critical in modulating the chromatin structure and gene expression, fundamental for the progression and spread of cancer.
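
The abstract's point about overdispersed count data can be illustrated with a small sketch. It is not the quasi-likelihood estimation used in the study; it is a plain method-of-moments estimate that assumes the common negative binomial form Var = mu + phi * mu^2. The function name `moment_dispersion` and the read counts are hypothetical.

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Method-of-moments estimate of phi, assuming Var = mu + phi * mu**2."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    if mu == 0:
        return 0.0
    # A Poisson-like gene has var close to mu, so phi is clipped at zero.
    return max(0.0, (var - mu) / mu ** 2)

# Hypothetical read counts for one gene across eight samples.
gene_counts = [12, 45, 7, 88, 30, 5, 61, 19]
phi = moment_dispersion(gene_counts)
print(f"mean={mean(gene_counts):.2f} variance={variance(gene_counts):.2f} phi={phi:.3f}")
```

A phi well above zero means the variance exceeds what a Poisson model would allow, which is the situation a t-test on raw counts handles poorly.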

Keywords: RNA sequencing (RNA-Seq); axillary lymph node metastasis (ALNM); breast cancer; gene expression analysis; generalized linear models; quasi-likelihood F test.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
t-SNE projection of TMM-normalized gene expression data, illustrating the logistic regression decision boundary between ALNM− and ALNM+ samples. The red background represents areas predicted as ALNM+, while the blue background indicates ALNM−. The accompanying confusion matrix shows how difficult the samples are to distinguish without GLMQL-MAS filtering.
Figure 2
Volcano plots showcasing BH-significant genes (with threshold of |LogFC|>1) identified by the GLMQL-MAS system in the comparison of 42 ALNM+ samples against 65 ALNM− samples, highlighting 309 upregulated and 559 downregulated genes.
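
The selection rule behind such a volcano plot can be sketched as follows: adjust the p-values with the Benjamini-Hochberg procedure, then keep genes that pass both the adjusted p-value cutoff and the |LogFC| > 1 threshold. The cutoff `alpha=0.05`, the function names, and the four example genes are assumptions for illustration and are not taken from the paper.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def split_by_direction(genes, log_fc, pvalues, alpha=0.05, lfc_cut=1.0):
    """Return (upregulated, downregulated) genes passing both thresholds."""
    adj = bh_adjust(pvalues)
    up = [g for g, l, q in zip(genes, log_fc, adj) if q < alpha and l > lfc_cut]
    down = [g for g, l, q in zip(genes, log_fc, adj) if q < alpha and l < -lfc_cut]
    return up, down

# Hypothetical genes, log fold changes, and raw p-values.
genes = ["G1", "G2", "G3", "G4"]
log_fc = [2.1, -1.8, 0.4, 1.5]
pvals = [0.001, 0.004, 0.0005, 0.2]
up, down = split_by_direction(genes, log_fc, pvals)
print("up:", up, "down:", down)
```

In this toy input G3 is significant but has too small a fold change, and G4 has a large fold change but is not significant, so neither is kept.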
Figure 3
t-SNE projection using only GLMQL-MAS BH-significant genes, with the logistic regression decision boundary indicating improved separability between ALNM+ and ALNM− samples. The red background represents areas predicted as ALNM+, while the blue background indicates ALNM−. The included confusion matrix further illustrates the efficacy of the GLMQL-MAS filtering process.
Figure 4
The figure demonstrates the robustness and independence of the GLMQL-MAS methodology from the |LogFC| threshold, emphasizing its capability to consistently select genes that maintain the largest possible values for both |LogFC| and the absolute log(BH-adjusted p-value), ensuring significant and impactful gene identification.
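
A scoring rule that rewards large values of both |LogFC| and the absolute log of the BH-adjusted p-value could look like the sketch below. The product form, the base-10 logarithm, and the exponents `M` and `A` are assumptions made for illustration; the caption does not give the exact MAS formula, and the gene values are invented.

```python
import math

def mas_score(log_fc, bh_p, M=1, A=1):
    """Assumed score: |LogFC|**M * |log10(BH-adjusted p)|**A."""
    return abs(log_fc) ** M * abs(math.log10(bh_p)) ** A

# Hypothetical genes: (log fold change, BH-adjusted p-value).
candidates = {
    "G1": (2.0, 1e-4),
    "G2": (3.0, 1e-2),
    "G3": (1.2, 1e-6),
}

# Rank genes from highest to lowest score.
ranked = sorted(candidates, key=lambda g: mas_score(*candidates[g]), reverse=True)
for gene in ranked:
    print(gene, round(mas_score(*candidates[gene]), 2))
```

Under this assumed form, a gene with a moderate fold change and very strong significance (G3) can outrank one with a large fold change and weak significance (G2), so the ranking does not hinge on a single hard |LogFC| cutoff.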
Figure 5
This figure evaluates the performance of a logistic regression model classifying samples as ALNM+ or ALNM− based on principal components derived from gene expression data. It compares metrics before and after applying the GLMQL-MAS methodology to highlight its effectiveness in refining gene selection. The analysis shows improvements in sensitivity, specificity, F1 score, and accuracy across principal components ranging from 2 to 20. The goal is not to develop a predictive model but to demonstrate the enhanced separation of disease states and the utility of GLMQL-MAS in biological data interpretation.
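
The four metrics named in this caption all follow from a binary confusion matrix, as the short sketch below shows. ALNM+ is treated as the positive class; the function name and the counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on ALNM+
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on ALNM-
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }

# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=8, fp=5, tn=15, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```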
Figure 6
Distribution of the Top 20 Genes by BH-Significance Occurrence Using GLMQL-MAS. This figure shows the 20 genes that were BH-significant most often across 500 iterations, with the |LogFC| threshold set at 1. It highlights the genes that consistently demonstrate significant differential expression in lymph node metastasis of breast cancer.
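
Counting how often each gene is BH-significant across repeated runs is a simple tally, sketched below. The per-iteration gene sets would come from the differential expression step, which is not reproduced here; the function name `top_recurrent` and the placeholder gene names are hypothetical.

```python
from collections import Counter

def top_recurrent(significant_sets, k=20):
    """Return the k genes that appear in the most iterations, with counts."""
    tally = Counter()
    for genes in significant_sets:
        tally.update(set(genes))  # count each gene at most once per iteration
    return tally.most_common(k)

# Three hypothetical iterations instead of 500, with placeholder gene names.
iterations = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B", "GENE_D"},
]
top = top_recurrent(iterations, k=2)
print(top)
```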
Figure 7
Top 10 GO Processes Related to Lymph Node Metastasis. This figure details the GO processes most intimately connected with the pathology of lymph node metastasis, providing insights into the molecular functions and cellular components affected.
Figure 8
This figure illustrates a selection of the significant pathways identified from the GSEA using Hallmark gene sets, highlighting the predominant biological mechanisms influenced by the GLMQL-MAS selected genes between 42 ALNM positive (ALNM+) and 65 ALNM negative (ALNM−) breast cancer samples.
Figure 9
Top 100 consistent genes in random selections meeting GLMQL-MAS criteria displayed in one of the 50 GSEA Hallmark sets, specifically highlighting the key genes in cancer progression and metastasis in red.
Figure 10
This figure displays common gene ontology (GO) processes and their associated upregulated BH-significant genes from analyses of 42 ALNM+ samples compared to ALNM− samples in two scenarios: Case 1, comparing 65 ALNM− versus 42 ALNM+; and Case 2, involving a random sampling of 42 ALNM− to compare against the same 42 ALNM+.
Figure 11
This figure illustrates a network that maps the most relevant Hallmark pathways to breast cancer, highlighting the common GLMQL-MAS selected genes from both Case 1 and Case 2.
Figure 12
The flowchart details the criteria used to refine the initial dataset of 1078 patients down to 104 untreated individuals based on the availability of ALNM information and absence of prior treatment. The final cohort is categorized into two groups: 65 without ALNM (ALNM−) and 42 with ALNM (ALNM+), facilitating the study of genetic and molecular markers associated with lymph node metastasis.


