CUCAI 2026

DeepfakeGuard: An Open-Source Multi-Modal Toolkit for Detecting AI-Generated Video and Audio

Aryan Biswas

CUCAI 2026 Proceedings - 2026

Published 2026/03/07

Abstract

AI-generated video and audio have become an everyday attack surface for fraud, harassment, and political manipulation, yet effective defensive tooling remains largely confined to research labs. The 2026 International AI Safety Report documents that human observers correctly flag high-quality synthetic media only a small fraction of the time, and industry threat intelligence confirms hundreds of thousands of deepfakes already circulating on major platforms. We present DeepfakeGuard, an open-source Python library that unifies three complementary detection modalities into a single, pip-installable framework: (1) a DINOv3 Vision Transformer detector that achieves 0.88 AUROC under strict cross-dataset evaluation via parameter-efficient LayerNorm tuning; (2) a LipFD audio-visual detector that identifies lip-sync deepfakes by exploiting temporal inconsistency between audio and visual lip movements; and (3) a D3 temporal-volatility detector that requires zero training and flags AI-generated video through second-order motion features. A domain-aware ensemble system fuses all three modalities using trust-weighted scoring with applicability gating and outlier veto, and an optional post-hoc Vision-Language Model module provides human-readable forensic explanations grounded in visual evidence. All components are exposed through a unified Python API and a Streamlit demonstration interface. We argue that open, multi-modal defensive tooling is a necessary counterweight to the democratization of generative AI.
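The trust-weighted fusion described above can be sketched in a few lines. The snippet below is an illustrative reading of the abstract, not DeepfakeGuard's actual API: the function name, the trust weights, the use of the median as consensus, and the veto threshold are all assumptions made for the example.

```python
# Sketch of trust-weighted score fusion with applicability gating and
# outlier veto, as described in the abstract. All names and thresholds
# here are illustrative assumptions, not DeepfakeGuard's real API.

def fuse_scores(scores, trust, applicable, veto_margin=0.4):
    """Fuse per-detector fake probabilities (0..1) into one score.

    scores      -- dict: detector name -> probability the clip is fake
    trust       -- dict: detector name -> trust weight (e.g. validation AUROC)
    applicable  -- dict: detector name -> bool; False gates a detector out
                   (e.g. an audio-visual detector on a clip with no audio)
    veto_margin -- a detector whose score deviates from the median of the
                   applicable scores by more than this is vetoed as an outlier
    """
    active = [n for n in scores if applicable.get(n, False)]
    if not active:
        raise ValueError("no applicable detector for this input")

    # Consensus: median of the applicable detectors' scores.
    ordered = sorted(scores[n] for n in active)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])

    # Outlier veto: drop detectors that strongly disagree with the consensus.
    kept = [n for n in active if abs(scores[n] - median) <= veto_margin]
    if not kept:
        kept = active  # if everything was vetoed, fall back to all active

    # Trust-weighted mean of the surviving scores.
    total = sum(trust[n] for n in kept)
    return sum(trust[n] * scores[n] for n in kept) / total
```

For instance, with scores `{"dino_vit": 0.9, "lipfd": 0.85, "d3": 0.1}`, the lone dissenting D3 score sits far from the median and is vetoed, so the fused result is the trust-weighted mean of the two agreeing detectors; gating out the audio-visual detector on a silent clip simply removes it from `active` before any veto is considered.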