Machine Learning‑Based Analysis of Breast Cancer Data from University of Calabar Teaching Hospital Patients
Received 29 Oct, 2025 | Accepted 20 Jan, 2026 | Published 31 Mar, 2026
Background and Objective: Breast cancer remains a leading cause of morbidity among women, and early detection is critical for improving outcomes. This study aimed to identify significant predictors of malignancy and to evaluate the effectiveness of machine learning models in diagnostic classification using patient data from the University of Calabar Teaching Hospital.
Materials and Methods: A retrospective analysis was conducted on 213 patients treated between January 2019 and August 2024. Demographic and clinical variables were collected, including age, menopause status, tumor size, invasive lymph nodes, metastasis, breast quadrant, personal/family history of breast disease, and diagnosis outcome. Descriptive statistics, density plots, and inferential tests (Chi-square, t-test, ANOVA) were used to examine differences between benign and malignant cases. A Random Forest classifier was trained to predict malignancy, and feature importance was analyzed to determine the key contributors to model performance.
Results: Of the patients analyzed, 117 had benign and 90 had malignant diagnoses, with peak incidence in the 45-55 year age range. The distributions of tumor size, lymph node involvement, and metastasis were right-skewed, indicating early-stage presentation for most patients; malignant tumors were larger and occurred in older women. Significant differences were observed between the benign and malignant groups in age, tumor size, metastasis, and lymph node involvement (p < 0.05). Menopause status was significantly associated with tumor size. The Random Forest model achieved >90% accuracy and a kappa statistic of 84.54%, with tumor size, invasive nodes, metastasis, and age identified as the most important predictive features. Other variables, including breast quadrant, menopause status, and family history, contributed complementary diagnostic information.
Conclusion: This study demonstrates that integrating classical statistical methods with machine learning can provide actionable insights for early breast cancer detection and risk stratification. Tumor size and lymph node status were reaffirmed as key clinical predictors. Limitations include missing values and data confined to a single institution. Future studies should use larger, multicenter datasets to enhance generalizability and refine predictive performance. Findings support the potential of data-driven models to assist in diagnostic decisions and personalized care pathways.
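The accuracy and kappa statistic reported above are both derived from a classifier's confusion matrix. The following is a minimal sketch of that calculation in plain Python, not the authors' implementation. The function name `accuracy_and_kappa` and the example counts are hypothetical and are not taken from the study's data.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no cases")

    # Observed agreement: fraction of cases on the diagonal (plain accuracy).
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts for illustration only (rows: true benign, true malignant).
example = [[45, 5],
           [5, 45]]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy = {acc:.2%}, kappa = {kappa:.2%}")
```

Because kappa discounts the agreement expected by chance, it is lower than raw accuracy whenever chance agreement is above zero, which is consistent with a kappa in the mid-80% range alongside accuracy above 90%.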
How to Cite this paper?
APA-7 Style
Omini, A.A., & Concord, D.L. (2026). Machine Learning‑Based Analysis of Breast Cancer Data from University of Calabar Teaching Hospital Patients. Trends in Biological Sciences, 2(1), 61-73. https://doi.org/10.21124/tbs.2026.61.73
ACS Style
Omini, A.A.; Concord, D.L. Machine Learning‑Based Analysis of Breast Cancer Data from University of Calabar Teaching Hospital Patients. Trends Biol. Sci. 2026, 2, 61-73. https://doi.org/10.21124/tbs.2026.61.73
AMA Style
Omini AA, Concord DL. Machine Learning‑Based Analysis of Breast Cancer Data from University of Calabar Teaching Hospital Patients. Trends in Biological Sciences. 2026; 2(1): 61-73. https://doi.org/10.21124/tbs.2026.61.73
Chicago/Turabian Style
Omini, Abam Ayeni, and Diala Leona Concord. 2026. "Machine Learning‑Based Analysis of Breast Cancer Data from University of Calabar Teaching Hospital Patients." Trends in Biological Sciences 2, no. 1: 61-73. https://doi.org/10.21124/tbs.2026.61.73

This work is licensed under a Creative Commons Attribution 4.0 International License.


