PHISHING EMAIL DETECTION USING LARGE LANGUAGE MODELS (LLMS): A PERFORMANCE EVALUATION OF QWEN AND GEMINI
Abstract
The growing complexity of network infrastructure and the rising sophistication of phishing attacks call for advanced cybersecurity solutions. Artificial Intelligence for IT Operations (AIOps) integrates big data analytics, machine learning, and automation to improve real-time detection of and response to security threats. This study evaluates the zero-shot performance of three Large Language Models (LLMs), Gemini 2.5 Pro, Gemini 2.5 Flash, and Qwen 3, in detecting phishing emails in an AIOps environment at Institut Teknologi Sepuluh Nopember (ITS). The findings show complementary strengths: Gemini 2.5 Pro achieved 99.8% accuracy in identifying legitimate emails, which minimizes false positives and workflow disruption, while Gemini 2.5 Flash was the strongest at detecting phishing attempts, with 89.1% accuracy, which favors threat prevention. Qwen 3 performed poorly, most likely because it is less well aligned with the nuances of English-language phishing. Because these results were achieved zero-shot, without any task-specific tuning, they highlight the out-of-the-box efficacy of LLMs for cybersecurity and point to an accessible, high-performance tool for organizations with limited AI resources. The study underscores the potential of LLMs in AIOps to improve automated security monitoring and incident response, and it advocates a layered approach that combines smart technology, user training, and organizational policies to combat evolving phishing threats effectively.
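The abstract reports accuracy separately for legitimate emails and for phishing emails. As a minimal sketch of how such per-class figures could be computed, assuming each email has a true label and a model-predicted label, the snippet below uses a hypothetical helper `per_class_accuracy` and a small made-up toy sample. Neither the helper nor the data comes from the study, and the numbers it produces are unrelated to the reported results.

```python
from typing import Sequence


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Fraction of emails whose true class is `label` that were also predicted as `label`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Keep only the predictions for emails that truly belong to `label`.
    relevant = [pred for true, pred in zip(y_true, y_pred) if true == label]
    if not relevant:
        raise ValueError(f"no examples with true label {label!r}")
    return sum(pred == label for pred in relevant) / len(relevant)


# Toy, hand-made labels for illustration only (not data from the study).
y_true = ["legitimate"] * 4 + ["phishing"] * 4
y_pred = [
    "legitimate", "legitimate", "legitimate", "phishing",  # one false positive
    "phishing", "phishing", "legitimate", "legitimate",    # two missed phishing emails
]

legit_acc = per_class_accuracy(y_true, y_pred, "legitimate")
phish_acc = per_class_accuracy(y_true, y_pred, "phishing")
print(f"legitimate: {legit_acc:.2f}, phishing: {phish_acc:.2f}")
```

Reporting the two classes separately makes the trade-off described above visible: a model can rarely flag legitimate mail while still missing a share of phishing attempts, and a single overall accuracy figure would hide that.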
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.