A novel data clustering algorithm based on gravity center methodology

dc.authorid: kuwil, Farag/0000-0001-6630-3918
dc.authorid: ATILA, UMIT/0000-0002-1576-9977
dc.contributor.author: Kuwil, Farag Hamed
dc.contributor.author: Atila, Umit
dc.contributor.author: Abu-Issa, Radwan
dc.contributor.author: Murtagh, Fionn
dc.date.accessioned: 2024-09-29T15:57:10Z
dc.date.available: 2024-09-29T15:57:10Z
dc.date.issued: 2020
dc.department: Karabük Üniversitesi [en_US]
dc.description.abstract: Clustering separates data into groups such that similarity within a cluster is greater than similarity between clusters. Similarity rests on two principles: connectivity and cohesion. In partitional clustering, however, some algorithms, such as K-means and K-medians, divide the dataset points according to the first principle (connectivity) alone, based on cluster centroids and without regard to the second principle (cohesion), while others, such as K-medoids, partially consider cohesion in addition to connectivity. This prevents them from discovering clusters with convex shape, and their results are negatively affected by outliers. In this paper, a new Gravity Center Clustering (GCC) algorithm is proposed, which depends on a critical distance (lambda) to define the threshold between clusters. The algorithm falls under partitional clustering and is based on the gravity center, a point within a cluster that verifies both connectivity and cohesion in determining the similarity of each point in the dataset. Therefore, the proposed algorithm deals with data of any shape better than K-means, K-medians and K-medoids. Furthermore, the GCC algorithm does not need any parameters beforehand to perform clustering, but it provides two coefficients and an indicator that help the user improve control over the clustering results and deal with overlapping and outliers. In this study, 22 experiments are conducted using different types of synthetic and real healthcare datasets. The results show that the proposed algorithm satisfies the concept of clustering and provides great flexibility in reaching the optimal solution, especially since clustering is considered an optimization problem. (C) 2020 Elsevier Ltd. All rights reserved. [en_US]
dc.description.sponsorship: Department of Computer Engineering at Karabuk University [en_US]
dc.description.sponsorship: We would like to express our gratitude to the management of the Department of Computer Engineering at Karabuk University for supporting our research by providing the use of the Big Data laboratory. Also, special thanks to Hamed Atia and Ali Belal for their support. [en_US]
dc.identifier.doi: 10.1016/j.eswa.2020.113435
dc.identifier.issn: 0957-4174
dc.identifier.issn: 1873-6793
dc.identifier.scopus: 2-s2.0-85084338374 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.uri: https://doi.org/10.1016/j.eswa.2020.113435
dc.identifier.uri: https://hdl.handle.net/20.500.14619/4629
dc.identifier.volume: 156 [en_US]
dc.identifier.wos: WOS:000542130000002 [en_US]
dc.identifier.wosquality: Q1 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Pergamon-Elsevier Science Ltd [en_US]
dc.relation.ispartof: Expert Systems With Applications [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Algorithm [en_US]
dc.subject: Cluster analysis [en_US]
dc.subject: Euclidean distance [en_US]
dc.subject: Gravity center [en_US]
dc.subject: Partitional clustering [en_US]
dc.title: A novel data clustering algorithm based on gravity center methodology [en_US]
dc.type: Article [en_US]
