A practical framework for early detection of diabetes using ensemble machine learning models

dc.authorid: Sonuc, Emrullah/0000-0001-7425-6963
dc.contributor.author: Saihood, Qusay
dc.contributor.author: Sonuc, Emrullah
dc.date.accessioned: 2024-09-29T16:09:55Z
dc.date.available: 2024-09-29T16:09:55Z
dc.date.issued: 2023
dc.department: Karabük Üniversitesi [en_US]
dc.description.abstract: The diagnosis of diabetes, a prevalent global health condition, is crucial for preventing severe complications. In recent years, there has been a growing effort to develop intelligent diagnostic systems for diabetes using machine learning (ML) algorithms. Despite these efforts, achieving high accuracy with such systems remains a significant challenge. Recent advances in ensemble ML methods offer promising opportunities for early detection of diabetes, as they are known to be faster and more cost-effective than traditional approaches. Therefore, this study proposes a practical three-stage framework for diagnosing diabetes. The first stage, data preprocessing, covers handling missing values, identifying outliers, balancing the data, normalizing the data, and selecting relevant features. In the second stage, the hyperparameters of the ML algorithms are tuned using grid search to improve their performance. In the final stage, the framework employs ensemble techniques (bagging, boosting, and stacking) to combine multiple ML algorithms and further enhance their predictive capability. The open-access Pima Indians Diabetes Database was used to test the performance of the proposed models. The experimental results indicate that ensemble methods are superior to individual ML models in diagnosing diabetes. Stacking achieved the best accuracy among the ensemble methods, with the stacked random forest (RF) and support vector machine (SVM) model attaining an accuracy of 97.50%. Among the bagging methods, the RF model yielded the highest accuracy (97.20%), while among the boosting methods, the eXtreme Gradient Boosting (XGB) model achieved the highest accuracy (97.10%). Moreover, a comparison with other ML models confirms that the proposed framework outperforms them.
The study demonstrates that ensemble methods are crucial for accurate diabetes diagnosis, enabling early detection through efficient preprocessing and calibrated models. [en_US]
dc.identifier.doi: 10.55730/1300-0632.4013
dc.identifier.endpage: 738 [en_US]
dc.identifier.issn: 1300-0632
dc.identifier.issn: 1303-6203
dc.identifier.issue: 4 [en_US]
dc.identifier.scopus: 2-s2.0-85169681199 [en_US]
dc.identifier.scopusquality: Q3 [en_US]
dc.identifier.startpage: 722 [en_US]
dc.identifier.trdizinid: 1194009 [en_US]
dc.identifier.uri: https://doi.org/10.55730/1300-0632.4013
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1194009
dc.identifier.uri: https://hdl.handle.net/20.500.14619/7844
dc.identifier.volume: 31 [en_US]
dc.identifier.wos: WOS:001043194400003 [en_US]
dc.identifier.wosquality: Q4 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.indekslendigikaynak: TR-Dizin [en_US]
dc.language.iso: en [en_US]
dc.publisher: Tubitak Scientific & Technological Research Council Turkey [en_US]
dc.relation.ispartof: Turkish Journal of Electrical Engineering and Computer Sciences [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Machine learning [en_US]
dc.subject: ensemble learning [en_US]
dc.subject: diabetes diagnosis [en_US]
dc.subject: classification [en_US]
dc.title: A practical framework for early detection of diabetes using ensemble machine learning models [en_US]
dc.type: Article [en_US]
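
The three stages described in the abstract (preprocessing, grid-search tuning, and ensembling) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic stand-in data instead of the Pima Indians Diabetes Database, reduces preprocessing to min-max normalization, and picks small hypothetical parameter grids and a logistic-regression meta-learner purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 8 numeric features, binary label.
# The real study uses the Pima Indians Diabetes Database instead.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def tune(estimator, param_grid):
    """Stage 2: grid-search with 3-fold CV and return the refitted best model."""
    search = GridSearchCV(estimator, param_grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    return search.best_estimator_


# Hypothetical grids, kept tiny so the sketch runs quickly.
rf = tune(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 5]},
)

# Stage 1 (normalization only) lives inside the SVM pipeline, so the scaler
# is fitted on the training folds only and never sees held-out data.
svm = tune(
    make_pipeline(MinMaxScaler(), SVC(random_state=0)),
    {"svc__C": [0.1, 1, 10], "svc__kernel": ["rbf", "linear"]},
)

# Stage 3: stack the tuned RF and SVM; the meta-learner is an assumption.
stack = StackingClassifier(
    estimators=[("rf", rf), ("svm", svm)],
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Stacked RF + SVM test accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects the synthetic data only and is not comparable to the figures reported in the abstract. The remaining preprocessing steps (missing values, outliers, class balancing, feature selection) would be added as further pipeline steps before the classifiers.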
