SENTIMENT ANALYSIS OF CORETAX USER REVIEWS ON THE TWITTER APPLICATION USING THE NAIVE BAYES AND SUPPORT VECTOR MACHINE ALGORITHMS
Abstract
This study evaluates user sentiment toward Coretax, the latest tax application, and compares the effectiveness of two classification algorithms: Support Vector Machine (SVM) and Naive Bayes. Data were collected from the X platform (formerly Twitter) by web scraping and then passed through a series of preprocessing steps: data cleansing, case folding, normalization, stopword removal, stemming, labeling, and visualization. Features were then weighted with the Term Frequency-Inverse Document Frequency (TF-IDF) method and selected with SelectKBest using the Chi-Square test.
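The preprocessing and weighting steps described above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the stopword list, the sample tweets, and the function names `preprocess` and `tf_idf` are hypothetical, normalization, stemming, and feature selection are omitted, and the weighting uses the textbook form tf × log(N / df), which may differ from the variant the authors applied.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration; the study's actual list is not given.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}


def preprocess(text):
    """Apply case folding, cleaning, and stopword removal (no stemming)."""
    text = text.lower()                                 # case folding
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)   # drop URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # drop digits and punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tweets, not taken from the study's dataset.
tweets = [
    "Coretax error lagi, gagal login! https://t.co/x",
    "Login Coretax lancar dan cepat",
    "@pajak Coretax gagal lagi",
]
docs = [preprocess(t) for t in tweets]
weights = tf_idf(docs)
print(docs[0])
print(weights[0])
```

In this toy corpus the term "coretax" appears in every tweet, so its weight is zero, while a term found in only one tweet, such as "error", receives the largest weight. That is the behavior that makes TF-IDF useful before a selection step such as SelectKBest.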
The evaluation showed that both algorithms classified sentiment well. SVM reached 92% accuracy, with precision, recall, and F1-score of 93%, 94%, and 94% for negative sentiment and 90%, 88%, and 89% for positive sentiment. Naive Bayes matched that accuracy at 92%, with 90% precision, 99% recall, and a 94% F1-score for negative sentiment, and 97% precision, 80% recall, and an 88% F1-score for positive sentiment. Both algorithms are therefore reliable at recognizing negative sentiment, while detection of positive sentiment leaves room for improvement, most visibly in the 80% positive recall of Naive Bayes.
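Because the F1-score is the harmonic mean of precision and recall, the reported figures can be cross-checked against each other. The sketch below recomputes F1 from the rounded precision and recall values quoted in the abstract. The helper name `f1_score` and the dictionary layout are illustrative assumptions, and since the inputs are already rounded to whole percentages, small deviations from the reported F1 values are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, as quoted in the abstract.
reported = {
    ("SVM", "negative"): (93, 94, 94),
    ("SVM", "positive"): (90, 88, 89),
    ("Naive Bayes", "negative"): (90, 99, 94),
    ("Naive Bayes", "positive"): (97, 80, 88),
}

recomputed = {}
for (model, label), (p, r, f1) in reported.items():
    recomputed[(model, label)] = 100 * f1_score(p / 100, r / 100)
    print(f"{model:12s} {label:9s} reported={f1}  "
          f"recomputed={recomputed[(model, label)]:.1f}")
```

Every recomputed value lands within one percentage point of the reported F1-score, which is the margin that rounding of the inputs allows.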
Keywords: Coretax, Sentiment Analysis, Support Vector Machine, Naive Bayes
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.