SentiDariPers: Sentiment Analysis of Dari-Persian Tweets Based on People’s Views and Opinion

Date

2023

Publisher

Springer Science and Business Media Deutschland GmbH

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In the research area of sentiment analysis, there is a noticeable gap when it comes to the Dari-Persian dialect. To bridge this gap, our research aimed to curate a comprehensive dataset encompassing people’s opinions in this specific language variant. This paper presents the development of a benchmark sentiment-annotated dataset for the Dari dialect of Persian, which serves as an official language of Afghanistan. The dataset, named “SentiDariPers”, comprises 43,089 tweets posted between August 2021 and April 2023. It has been manually annotated with four sentiment classes: Negative, Positive, Neutral, and Mixed. We applied a range of models: Support Vector Machine (SVM), Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (Bi-LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN). Additionally, we developed an ensemble model that combines different sets of sentiment classes for each system. We present a detailed comparative analysis of the results obtained from these models. Experimental findings demonstrate that the ensemble model achieves the highest accuracy, 91%. We provide insights into the data collection and annotation process, offer relevant dataset statistics, discuss the experimental results, and provide further analysis of the data. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Description

9th International Conference on Technologies and Innovation, CITI 2023 -- 13 November 2023 through 16 November 2023 -- Guayaquil -- 303389

Keywords

Dari-Persian, Dataset Creation, Deep Learning, Sentiment Analysis

Source

Communications in Computer and Information Science

Scopus Q Value

Q4

Volume

1873 CCIS
