A COMPARISON OF CLASSIFICATION ALGORITHMS IN DATA MINING: DECISION TREE C4.5 AND K-NEAREST NEIGHBOR (KNN) BASED ON A LITERATURE ANALYSIS
Abstract
This study is a comparative analysis that evaluates the performance of Decision Tree C4.5 and K-Nearest Neighbor (KNN) based on a review of twelve scientific journal articles that applied these algorithms across various domains, including disease classification, graduation prediction, scholarship classification, and sentiment analysis. The evaluation compared key performance metrics, namely accuracy, AUC (area under the ROC curve), precision, and recall, and also considered interpretability and computational efficiency. The analysis shows that Decision Tree C4.5 excels in interpretability and achieves higher accuracy on categorical and complex datasets. In contrast, KNN performs better on small datasets with non-linear distributions, although its results depend heavily on the choice of k and on data preprocessing. The study concludes that the choice of classification algorithm should be tailored to the characteristics of the data, the need for interpretable results, and computational efficiency.
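To make the compared criteria concrete, the following is a minimal sketch in plain Python (standard library only, Python 3.8+ for `math.dist`). It computes a C4.5-style gain ratio for one categorical attribute, a KNN majority vote with optional min-max scaling fitted on the training rows, and accuracy, precision, and recall for a binary task. All function names and the toy data are hypothetical and chosen for illustration. This is not the implementation used in any of the reviewed studies: it omits C4.5's tree construction, pruning, and handling of continuous attributes, and it does not compute AUC.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio for splitting on one categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the child nodes produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def fit_min_max(rows):
    """Per-feature minimum and maximum, computed on the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(row, lo, hi):
    """Rescale one row with training-set bounds (training values land in [0, 1])."""
    return [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows closest to query (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Toy data: feature 0 is informative (range 0-1); feature 1 has a much larger scale.
    train_x = [[0.0, 500], [0.1, 900], [1.0, 520], [0.9, 100]]
    train_y = [0, 0, 1, 1]
    query = [0.95, 490]

    # Unscaled, feature 1 dominates the distance and the prediction changes with k.
    print(knn_predict(train_x, train_y, query, k=1))  # 0
    print(knn_predict(train_x, train_y, query, k=3))  # 1

    # With min-max scaling fitted on the training rows, k=1 picks the row [1.0, 520].
    lo, hi = fit_min_max(train_x)
    scaled_x = [apply_min_max(r, lo, hi) for r in train_x]
    print(knn_predict(scaled_x, train_y, apply_min_max(query, lo, hi), k=1))  # 1

    # A perfectly separating attribute gets gain ratio 1.0.
    print(gain_ratio(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In the toy run, the same query gets a different KNN label depending on k and on whether the features are scaled, which illustrates why KNN results hinge on both choices. The scaling bounds are fitted on the training rows only so that no information from the query leaks into preprocessing. Note that library decision trees are not necessarily C4.5; scikit-learn's `DecisionTreeClassifier`, for example, implements an optimized version of CART.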
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.