Examining the Impact of Feature Selection Methods on Text Classification

dc.authoridKARACA, Mehmet Fatih/0000-0002-7612-1437
dc.contributor.authorKaraca, Mehmet Fatih
dc.contributor.authorBayir, Safak
dc.date.accessioned2024-09-29T16:11:36Z
dc.date.available2024-09-29T16:11:36Z
dc.date.issued2017
dc.departmentKarabük Üniversitesien_US
dc.description.abstractFeature selection, which aims to identify and select the distinctive terms that best represent a document, is one of the most important steps of classification. Feature selection reduces the dimension of the document vectors and consequently shortens processing time. In this study, feature selection methods were examined in terms of dimension reduction rates, classification success rates, and the relation between dimension reduction and classification success. kNN (k-Nearest Neighbors) and SVM (Support Vector Machines) were used as classifiers. Five standard (Odds Ratio-OR, Mutual Information-MI, Information Gain-IG, Chi-Square-CHI and Document Frequency-DF), two combined (Union of Feature Selections-UFS and Correlation of Union of Feature Selections-CUFS) and one new (Sum of Term Frequency-STF) feature selection methods were tested. The experiments were performed by selecting 100 to 1000 terms (in increments of 100 terms) from each class. kNN was found to produce much better results than SVM. STF was found to be the most successful feature selection method considering the average values in both datasets. CUFS, a combined model, was found to reduce the dimension the most; accordingly, CUFS classified the documents more successfully, with fewer terms and in a shorter time, than many of the standard methods.en_US
dc.identifier.endpage388en_US
dc.identifier.issn2158-107X
dc.identifier.issn2156-5570
dc.identifier.issue12en_US
dc.identifier.startpage380en_US
dc.identifier.urihttps://hdl.handle.net/20.500.14619/8554
dc.identifier.volume8en_US
dc.identifier.wosWOS:000423921400050en_US
dc.identifier.wosqualityN/Aen_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.language.isoenen_US
dc.publisherScience & Information Sai Organization Ltden_US
dc.relation.ispartofInternational Journal of Advanced Computer Science and Applicationsen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectFeature selectionen_US
dc.subjecttext classificationen_US
dc.subjecttext miningen_US
dc.subjectk-Nearest Neighborsen_US
dc.subjectsupport vector machinesen_US
dc.titleExamining the Impact of Feature Selection Methods on Text Classificationen_US
dc.typeArticleen_US
