CodeBuster

Aaron Yi

CUCAI 2026 Proceedings - 2026

Published 2026/03/07

Abstract

With the advent of artificial intelligence, academic dishonesty in programming-related study has risen dramatically, as students use AI tools to mask plagiarism through variable renaming, structural refactoring, and logical changes. We present CodeBuster, a multilayered plagiarism detection system designed to combat these forms of plagiarism using a trained model that computes the similarity between two C++ programs more effectively than traditional token-only methods. Our approach combines a token-based similarity check, a GraphCodeBERT semantic embedding model, and an output similarity check; their results are aggregated into a feature vector and passed through a Multilayer Perceptron (MLP). On our evaluation set, the model achieves an overall accuracy of 0.798, with weighted precision, recall, and F1 scores of 0.801, 0.798, and 0.798, respectively. The confusion matrix (TN=1267, FP=392, FN=247, TP=1253) indicates balanced detection performance across the plagiarized and non-plagiarized classes, with strong true positive identification and a controlled false negative rate. These results suggest that incorporating semantic and behavioural analysis can significantly improve the detection of sophisticated plagiarism, and they point to a multilayered framework that could bridge the gap between traditional token-based detection and semantic-aware, behaviour-aware analysis.
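The reported metrics can be checked directly against the confusion matrix. The sketch below is illustrative only: it assumes that "weighted" means support-weighted averaging of the per-class scores (the convention used by common toolkits such as scikit-learn) and that the plagiarized class is the positive class. The helper name `precision_recall_f1` is hypothetical and not taken from the paper.

```python
# Confusion-matrix counts quoted in the abstract.
TN, FP, FN, TP = 1267, 392, 247, 1253


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = TN + FP + FN + TP
accuracy = (TP + TN) / total

# Assumption: plagiarized = positive class, non-plagiarized = negative class.
pos = precision_recall_f1(TP, FP, FN)
# For the negative class the roles swap: its "true positives" are TN,
# its "false positives" are FN, and its "false negatives" are FP.
neg = precision_recall_f1(TN, FN, FP)

# Support = number of true members of each class.
support_pos = TP + FN
support_neg = TN + FP

# Assumption: "weighted" = average of per-class scores weighted by support.
weighted_precision, weighted_recall, weighted_f1 = (
    (support_pos * p + support_neg * n) / total for p, n in zip(pos, neg)
)

print(f"accuracy           = {accuracy:.3f}")
print(f"weighted precision = {weighted_precision:.3f}")
print(f"weighted recall    = {weighted_recall:.3f}")
print(f"weighted F1        = {weighted_f1:.3f}")
print(f"plagiarized class: precision = {pos[0]:.3f}, recall = {pos[1]:.3f}")
```

Under these assumptions the computed values round to the figures stated in the abstract (0.798, 0.801, 0.798, 0.798). For the plagiarized class alone, recall is about 0.835 and precision about 0.762, which reflects that false positives (392) outnumber false negatives (247).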