Examining the Impact of Feature Selection Methods on Text Classification

Karaca, Mehmet Fatih; Bayir, Safak

dc.authorid	KARACA, Mehmet Fatih/0000-0002-7612-1437
dc.contributor.author	Karaca, Mehmet Fatih
dc.contributor.author	Bayir, Safak
dc.date.accessioned	2024-09-29T16:11:36Z
dc.date.available	2024-09-29T16:11:36Z
dc.date.issued	2017
dc.department	Karabük Üniversitesi	en_US
dc.description.abstract	Feature selection, which aims to identify and select the distinctive terms that best represent a document, is one of the most important steps of classification. Feature selection reduces the dimension of document vectors and, consequently, shortens processing time. In this study, feature selection methods were examined in terms of dimension reduction rates, classification success rates, and the relation between dimension reduction and classification success. kNN (k-Nearest Neighbors) and SVM (Support Vector Machines) were used as classifiers. Five standard methods (Odds Ratio-OR, Mutual Information-MI, Information Gain-IG, Chi-Square-CHI and Document Frequency-DF), two combined methods (Union of Feature Selections-UFS and Correlation of Union of Feature Selections-CUFS) and one new method (Sum of Term Frequency-STF) were tested. The experiments were performed by selecting 100 to 1000 terms (in increments of 100 terms) from each class. kNN produced much better results than SVM. Considering the average values in both datasets, STF was the most successful feature selection method. CUFS, a combined method, reduced the dimension the most; accordingly, it classified documents more successfully, with fewer terms and in less time, than many of the standard methods.	en_US
dc.identifier.endpage	388	en_US
dc.identifier.issn	2158-107X
dc.identifier.issn	2156-5570
dc.identifier.issue	12	en_US
dc.identifier.startpage	380	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.14619/8554
dc.identifier.volume	8	en_US
dc.identifier.wos	WOS:000423921400050	en_US
dc.identifier.wosquality	N/A	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.language.iso	en	en_US
dc.publisher	Science & Information SAI Organization Ltd	en_US
dc.relation.ispartof	International Journal of Advanced Computer Science and Applications	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Feature selection	en_US
dc.subject	text classification	en_US
dc.subject	text mining	en_US
dc.subject	k-Nearest Neighbors	en_US
dc.subject	support vector machines	en_US
dc.title	Examining the Impact of Feature Selection Methods on Text Classification	en_US
dc.type	Article	en_US
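
The abstract describes selecting a fixed number of terms from each class with each scoring method. The following is a minimal sketch of that idea for two of the standard methods, CHI and DF, using the textbook 2x2 contingency-table form of the chi-square statistic. It is not the authors' code. The function names `chi_square` and `select_terms`, the binary (set-of-terms) document representation, computing DF within each class, and taking the union of the per-class selections as the reduced vocabulary are all assumptions made for illustration. OR, MI, IG, STF, UFS and CUFS are not implemented here.

```python
from collections import Counter
from typing import List, Set


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for one term/class 2x2 contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that do not contain the term
    n00: documents outside the class that do not contain the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs: List[Set[str]], labels: List[str], k: int,
                 method: str = "chi") -> Set[str]:
    """Hypothetical helper: pick the k top-scoring terms per class, return their union.

    Documents are treated as sets of terms (binary presence), so every
    count below is a document frequency, not a raw term count.
    """
    if method not in ("chi", "df"):
        raise ValueError(f"unknown method: {method}")
    n = len(docs)
    df_all = Counter(t for d in docs for t in d)  # documents containing each term
    selected: Set[str] = set()
    for c in sorted(set(labels)):
        in_class = [d for d, y in zip(docs, labels) if y == c]
        n_c = len(in_class)
        df_c = Counter(t for d in in_class for t in d)  # in-class document frequency
        scores = {}
        for t in df_all:
            n11 = df_c[t]
            n10 = df_all[t] - n11
            if method == "chi":
                scores[t] = chi_square(n11, n10, n_c - n11, (n - n_c) - n10)
            else:
                scores[t] = float(n11)  # assumption: DF counted within the class
        # Highest score first; ties broken alphabetically so the result is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.update(ranked[:k])
    return selected


# Toy corpus (hypothetical data, not from the study).
docs = [{"goal", "match", "team"}, {"team", "score", "goal"},
        {"vote", "party", "election"}, {"party", "vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]
print(sorted(select_terms(docs, labels, k=1, method="df")))  # ['goal', 'party']
```

To mirror the study's sweep, `k` would be varied with `for k in range(100, 1001, 100)`. Note that this chi-square form is two-sided: in the toy corpus `vote` scores as highly for `sport` as `goal` does, because a strong negative association with a class raises the statistic just as a positive one does.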

Collection

WoS Indexed Publications Collection