A practical framework for early detection of diabetes using ensemble machine learning models

Saihood, Qusay; Sonuc, Emrullah

dc.authorid	Sonuc, Emrullah/0000-0001-7425-6963
dc.contributor.author	Saihood, Qusay
dc.contributor.author	Sonuc, Emrullah
dc.date.accessioned	2024-09-29T16:09:55Z
dc.date.available	2024-09-29T16:09:55Z
dc.date.issued	2023
dc.department	Karabük Üniversitesi	en_US
dc.description.abstract	The diagnosis of diabetes, a prevalent global health condition, is crucial for preventing severe complications. In recent years, there has been a growing effort to develop intelligent diagnostic systems for diabetes using machine learning (ML) algorithms. Despite these efforts, achieving high accuracy with such systems remains a significant challenge. Recent advancements in ensemble ML methods offer promising opportunities for early detection of diabetes, as they are known to be faster and more cost-effective than traditional approaches. Therefore, this study proposes a practical three-stage framework for diagnosing diabetes. The first stage, data preprocessing, comprises handling missing values, identifying outliers, balancing the data, normalizing the data, and selecting relevant features. In the second stage, the hyperparameters of the ML algorithms are tuned using grid search to improve their performance. In the final stage, the framework employs ensemble techniques such as bagging, boosting, and stacking to combine multiple ML algorithms and further enhance their predictive capability. The open-access Pima Indians Diabetes Database was used to test the performance of the proposed models. The experimental results indicate that ensemble methods outperform individual ML models in diagnosing diabetes. The stacking method achieved the best accuracy among the ensemble methods, with the stacked random forest (RF) and support vector machine (SVM) model attaining an accuracy of 97.50%. Among the bagging methods, the RF model yielded the highest accuracy (97.20%), while among the boosting methods, the eXtreme Gradient Boosting (XGB) model achieved the highest accuracy (97.10%). Moreover, the comparison confirms that the proposed framework outperforms other ML models. 
The study demonstrates that ensemble methods are crucial for accurate diabetes diagnosis, enabling early detection through efficient preprocessing and calibrated models.	en_US
dc.identifier.doi	10.55730/1300-0632.4013
dc.identifier.endpage	738	en_US
dc.identifier.issn	1300-0632
dc.identifier.issn	1303-6203
dc.identifier.issue	4	en_US
dc.identifier.scopus	2-s2.0-85169681199	en_US
dc.identifier.scopusquality	Q3	en_US
dc.identifier.startpage	722	en_US
dc.identifier.trdizinid	1194009	en_US
dc.identifier.uri	https://doi.org/10.55730/1300-0632.4013
dc.identifier.uri	https://search.trdizin.gov.tr/tr/yayin/detay/1194009
dc.identifier.uri	https://hdl.handle.net/20.500.14619/7844
dc.identifier.volume	31	en_US
dc.identifier.wos	WOS:001043194400003	en_US
dc.identifier.wosquality	Q4	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.indekslendigikaynak	TR-Dizin	en_US
dc.language.iso	en	en_US
dc.publisher	Tubitak Scientific & Technological Research Council Turkey	en_US
dc.relation.ispartof	Turkish Journal of Electrical Engineering and Computer Sciences	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Machine learning	en_US
dc.subject	ensemble learning	en_US
dc.subject	diabetes diagnosis	en_US
dc.subject	classification	en_US
dc.title	A practical framework for early detection of diabetes using ensemble machine learning models	en_US
dc.type	Article	en_US
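
The three-stage framework summarized in the abstract (preprocessing, grid-search tuning, ensembling) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, substitutes synthetic data for the Pima Indians Diabetes Database, reduces preprocessing to normalization only, and uses hypothetical hyperparameter grids. The logistic-regression meta-learner and the helper name build_stacked_model are also assumptions, since the record does not state how the RF and SVM models were stacked.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_model(X_train, y_train, random_state=0):
    """Tune RF and SVM with grid search, then stack them (illustrative only)."""
    # Stage 1 (preprocessing) is reduced to normalization here; the SVM is
    # wrapped in a pipeline so scaling is refit inside every CV fold.
    svm_pipe = make_pipeline(StandardScaler(), SVC(random_state=random_state))

    # Stage 2: grid search over small, hypothetical hyperparameter grids.
    svm_search = GridSearchCV(svm_pipe, {"svc__C": [0.1, 1, 10]}, cv=3)
    rf_search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        {"n_estimators": [50, 100]},
        cv=3,
    )
    svm_search.fit(X_train, y_train)
    rf_search.fit(X_train, y_train)

    # Stage 3: stack the tuned base models under an assumed meta-learner.
    stack = StackingClassifier(
        estimators=[
            ("rf", rf_search.best_estimator_),
            ("svm", svm_search.best_estimator_),
        ],
        final_estimator=LogisticRegression(),
        cv=3,
    )
    return stack.fit(X_train, y_train)


# Synthetic binary-classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacked_model(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy describes the synthetic data only and is not comparable to the figures reported in the abstract. Steps such as missing-value handling, outlier detection, class balancing, and feature selection would need to be added before the first stage resembles the one described.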

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
TR-Dizin Indexed Publications Collection