IEEE-CSR - Peer Review & Conference Management System

Summary:

Prompt injection attacks manipulate language
model inputs to bypass intended constraints, extract sensitive
information, or generate misleading responses, posing a significant security risk in real-world applications. To address
this challenge, we propose a Graph Neural Network (GNN)-
based approach that integrates sentiment features and Bidirectional Encoder Representations from Transformers (BERT)
embeddings to effectively detect malicious prompt injections. By
transforming textual data into structured graph representations,
our approach captures both semantic and contextual relationships that conventional models often overlook. We evaluated
our approach against traditional machine learning techniques,
including Random Forest, Logistic Regression, and XGBoost,
demonstrating its superior performance. Experimental results
show that our approach achieves a high detection accuracy of
98.70% and an F1-score of 0.9799, significantly outperforming
conventional methods. Additionally, we provide an in-depth
analysis of computational efficiency, highlighting the trade-offs
between detection effectiveness and model complexity, ensuring
a practical balance between security and performance.

Author(s):

Gaurav Jadhav
University of Essex
United Kingdom

Amit Kumar Singh
University of Essex
United Kingdom

Zeba Khanam
BT Security Research
United Kingdom

Robert Hercock
BT Security Research
United Kingdom