Comprehensive Evaluation of Explainable AI in Misinformation Detection: An Integrated Framework for Transformer-Based, Retrieval-Augmented, and LLM-Enhanced Approaches
Sudrisha Sarkar
CUCAI 2026 Proceedings
Abstract
Accurate and interpretable misinformation detection models are essential for mitigating the spread of false information on online platforms. While transformer-based architectures such as DistilBERT and RoBERTa achieve strong classification performance, their opaque decision processes limit user trust and adoption. This work presents TruthLens, an explainable AI framework for claim classification and evidence retrieval across political and medical domains. We fine-tune domain-specific transformers on the LIAR and FakeHealth datasets and apply LIME to generate token-level explanations for each prediction. To support counter-evidence generation, we implement a retrieval-augmented generation (RAG) pipeline that combines BM25 retrieval over curated sources (Wikipedia, WHO, Snopes) with LLM-based summarization under strict citation constraints. We evaluate the classifiers using macro-F1 and calibration error, and assess explanation faithfulness via deletion tests. Our findings highlight the trade-offs between classification accuracy, explanation interpretability, and evidence quality, providing practical guidance for deploying explainable misinformation detection tools in high-stakes digital environments.
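As a concrete illustration of the explanation step, the sketch below shows how token-level LIME attributions can be produced for a fine-tuned transformer classifier. The checkpoint name truthlens-distilbert-liar is a hypothetical placeholder rather than a released model, and a binary label set is assumed for brevity (LIAR itself uses six veracity labels); the Hugging Face and LIME calls are standard library usage.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from lime.lime_text import LimeTextExplainer

MODEL = "truthlens-distilbert-liar"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def predict_proba(texts):
    # LIME passes a list of perturbed strings and expects an
    # (n_samples, n_classes) array of class probabilities back.
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=["false", "true"])
exp = explainer.explain_instance(
    "Vitamin C cures COVID-19.", predict_proba, num_features=8
)
print(exp.as_list())  # [(token, weight), ...] ranked by |weight|
```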
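The retrieval side of the RAG pipeline can be sketched with the rank_bm25 package. The three corpus snippets stand in for the curated Wikipedia/WHO/Snopes index, and summarize_with_citations is a hypothetical placeholder for the citation-constrained LLM call; only the BM25 usage itself is standard library API.

```python
from rank_bm25 import BM25Okapi

# Toy stand-in for the curated evidence index (hypothetical snippets).
corpus = [
    "WHO: there is no evidence that vitamin C prevents COVID-19.",
    "Snopes: the claim that 5G towers spread the virus is false.",
    "Wikipedia: vitamin C is an essential nutrient for humans.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

claim = "Vitamin C cures COVID-19"
top_docs = bm25.get_top_n(claim.lower().split(), corpus, n=2)

# The prompt restricts the summarizer to citing retrieved passages only.
prompt = (
    "Summarize the evidence below, citing passages only as [1], [2].\n"
    + "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(top_docs))
    + f"\nClaim: {claim}"
)
# summary = summarize_with_citations(prompt)  # hypothetical LLM call
```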
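Finally, the deletion test for explanation faithfulness can be operationalized as below: remove the highest-weighted tokens and measure how far the predicted-class probability falls, with larger drops indicating more faithful explanations. This reuses predict_proba and exp from the LIME sketch above; the top-k ablation and probability-drop scoring shown are one common variant, not necessarily the paper's exact protocol.

```python
def deletion_score(text, token_weights, k=3):
    # Ablate the k tokens with the largest absolute LIME weight and
    # measure the probability drop on the originally predicted class.
    # Whitespace splitting is a simplification; in practice the
    # ablation tokenizer should match the explainer's.
    base = predict_proba([text])[0]
    label = base.argmax()
    top = {tok for tok, _ in sorted(token_weights,
                                    key=lambda tw: abs(tw[1]),
                                    reverse=True)[:k]}
    ablated = " ".join(t for t in text.split() if t not in top)
    return base[label] - predict_proba([ablated])[0][label]

print(deletion_score("Vitamin C cures COVID-19.", exp.as_list()))
```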