

We Taught an AI to Argue With Itself to Catch Audio Deepfakes—and It Works

Surya Koppisetti

Director, Applied AI (Audio)

We’re excited to share that our paper, ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection, has been accepted at ACL 2026. Here’s the story behind it: what the problem is, why existing solutions fall short, and what we did differently.

The Problem: Detectors That Can’t Leave the Lab

Generative models can now clone a voice from just a few seconds of audio. The downstream risks — fraud, impersonation, disinformation — are well documented. So naturally, there’s been a lot of work on audio deepfake detection.

The catch? Most state-of-the-art detectors are trained on scripted studio speech — clean, controlled, noise-free recordings collected under lab conditions. Datasets like ASVspoof 2019 have been invaluable for early benchmarking. Still, they don’t reflect how audio actually sounds in the wild: phone calls, social media clips, noisy environments, spontaneous conversation with all its hesitations and stumbles. More recent datasets such as ASVspoof 5 and MLAAD address this gap, but only in part.

When you take a detector trained on pristine studio data and point it at real-world audio, performance drops sharply. We and others have documented this generalization gap extensively, and it’s a serious obstacle to practical deployment. Retraining on new data every time a new attack emerges isn’t sustainable — especially when real-world speech data is hard to collect and varies significantly across use cases.

Our Approach: In-Context Learning for Audio

Rather than fine-tuning yet another supervised model, we asked: can we use the emergent reasoning capabilities of audio language models (ALMs) to generalize to unseen deepfakes without any retraining at all?

The short answer is yes, but it required careful thought about how to prompt these models.

In a zero-shot setting, ALMs are nearly useless for deepfake detection. They default to predicting one class almost exclusively, and simple few-shot prompting (just interleaving audio clips with labels) performs no better than chance. On its own, the model cannot pick out the complex acoustic signatures that distinguish real from fake speech.

So we built ICLAD, a two-phase framework centered on a technique we call Pairwise Comparative Reasoning (PCR).

The Core Idea: Force the Model to Argue Both Sides

Here’s the observation that motivated PCR: when you ask an ALM to justify why a clip is real or fake, it will confidently do so — but the same acoustic cue (say, a glitch at a three-second mark) can show up as evidence for both classes depending on how you frame the question. The model hallucinates justifications rather than identifying genuinely discriminative features.

PCR addresses this directly. For each audio sample in our offline database, we prompt the ALM, without revealing the ground-truth label, to generate evidence for both the real and fake hypotheses simultaneously. Then we reveal the ground truth and ask the model to reconcile the contradiction: to figure out which of its reasons actually hold up, and which were noise or hallucination.

The reconciled explanation is what gets stored in our RAG database. It’s a curated, hallucination-aware rationale that the model can actually learn from during inference.
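The two-step PCR flow can be sketched roughly as follows. Note that `query_alm` here is a placeholder stub standing in for a real audio language model call, and the prompt wording is illustrative, not the paper’s actual prompts.

```python
# Sketch of Pairwise Comparative Reasoning (PCR) over an offline database.
# `query_alm` is a stand-in for a real audio language model API call.

def query_alm(prompt: str, audio_clip: str) -> str:
    """Placeholder ALM call; a real system would send the prompt and audio."""
    return f"[ALM response to: {prompt[:40]}... on {audio_clip}]"

def pcr_explain(audio_clip: str, true_label: str) -> dict:
    # Step 1: without revealing the label, ask for evidence for BOTH hypotheses.
    dual_evidence = query_alm(
        "List acoustic evidence that this clip is REAL, then evidence that it is FAKE.",
        audio_clip,
    )
    # Step 2: reveal the ground truth and ask the model to reconcile its own
    # contradictory arguments, keeping only cues that genuinely support the label.
    reconciled = query_alm(
        f"The clip is actually {true_label}. Reconcile your two arguments: "
        f"which cues truly support {true_label}, and which were hallucinated?\n"
        f"Your earlier evidence:\n{dual_evidence}",
        audio_clip,
    )
    # The reconciled rationale is what gets stored for retrieval at inference time.
    return {"clip": audio_clip, "label": true_label,
            "dual_evidence": dual_evidence, "explanation": reconciled}

database = [pcr_explain("clip_001.wav", "fake"), pcr_explain("clip_002.wav", "real")]
```

The key structural point is that the dual-evidence step happens label-blind, so the reconciliation step has a genuine contradiction to resolve.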

Phase 2: Retrieval-Augmented Detection at Inference Time

At inference time, we extract Wav2Vec2-AASIST embeddings from the query audio and retrieve the K most acoustically similar examples from our database — along with their paired real/fake evidence and reconciled explanations. These get concatenated into an ICL prompt, and the ALM reasons through the query audio using those examples as context.
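The retrieval step is essentially nearest-neighbor search over embeddings. Here is a minimal pure-Python sketch using cosine similarity; real embeddings would come from Wav2Vec2-AASIST, while the toy three-dimensional vectors and explanations below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_top_k(query_emb, database, k=2):
    """Return the k database entries most similar to the query embedding."""
    scored = sorted(database, key=lambda e: cosine(query_emb, e["emb"]), reverse=True)
    return scored[:k]

# Toy database: each entry pairs an embedding with its reconciled PCR explanation.
db = [
    {"emb": [1.0, 0.0, 0.0], "label": "real", "explanation": "natural breathing"},
    {"emb": [0.0, 1.0, 0.0], "label": "fake", "explanation": "metallic artifacts"},
    {"emb": [0.9, 0.1, 0.0], "label": "real", "explanation": "consistent room tone"},
]

neighbors = retrieve_top_k([1.0, 0.05, 0.0], db, k=2)
print([n["label"] for n in neighbors])  # -> ['real', 'real']
```

The retrieved labels, evidence, and explanations would then be concatenated into the ICL prompt alongside the query audio.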

We also pair ICLAD with a dynamic routing mechanism. Not all audio needs the ALM: specialized detectors still outperform it on in-distribution, studio-like audio. So we use a k-NN OOD detector to route in-distribution samples to the specialized detector (Wav2Vec2-AASIST) and out-of-distribution samples to ICLAD. This hybrid approach gets the best of both worlds.
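The routing decision can be sketched as a k-NN distance test against in-distribution training embeddings. The threshold and toy 2-D embeddings below are arbitrary illustrative values, not the ones used in the paper.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def route(query_emb, train_embs, k=3, threshold=1.0):
    """Route to the specialist if the query looks in-distribution, else to ICLAD.

    The OOD score here is the mean distance to the k nearest training embeddings.
    """
    dists = sorted(euclidean(query_emb, t) for t in train_embs)
    ood_score = sum(dists[:k]) / k
    return "specialist" if ood_score <= threshold else "iclad"

# Toy in-distribution embeddings (stand-ins for Wav2Vec2-AASIST features).
train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]

print(route([0.05, 0.05], train))  # near the training data -> "specialist"
print(route([5.0, 5.0], train))    # far from anything seen  -> "iclad"
```

The design choice is the usual one for hybrid systems: pay the cost of the large model only where the cheap specialist is known to be unreliable.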

Results

We evaluated ICLAD across five datasets totaling 126,348 clips in 42 languages, including two scripted studio sets (ASVspoof 2021, MLAAD-v3) and three in-the-wild sets (ITW, SpoofCeleb, DFEval 2024).

On in-the-wild data, ICLAD consistently outperforms the specialized baseline. The most striking result is on SpoofCeleb, where ICLAD achieves a macro F1 of 0.665 versus the baseline’s 0.334 — nearly a 2× improvement. The dynamic routing further boosts performance on studio datasets (e.g., +19.6% macro F1 on ASVspoof 2021) by correctly delegating those samples back to the specialist.
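For reference, the macro F1 quoted above is the unweighted mean of per-class F1 scores, which keeps a majority class from dominating the metric. A quick sketch with made-up counts (not the paper’s numbers):

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1; counts are (tp, fp, fn) per class."""
    scores = [f1(*c) for c in per_class_counts]
    return sum(scores) / len(scores)

# Illustrative counts for the two classes (real, fake).
counts = [(80, 20, 10), (40, 10, 20)]
print(round(macro_f1(counts), 3))  # -> 0.785
```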

Beyond accuracy, ICLAD produces textual rationales alongside every decision. These aren’t just post-hoc justifications — they reflect the model’s actual reasoning process, grounded in acoustic evidence. A manual listening test with 22 expert annotators found that PCR reduces hallucination rates from 18.3% (simple prompting) to 10.0%, an 8.3 percentage-point absolute reduction.

Why We Think This Matters

Most deepfake detection research is still stuck in a train-on-scripted, evaluate-on-scripted loop. ICLAD represents a meaningful step toward detectors that can adapt to novel real-world conditions without costly retraining and that can explain their decisions to foster justified trust or skepticism.

The full paper is available, and we welcome community feedback as we work to advance the state of the art in deepfake detection.

This work was conducted as a collaboration between Purdue University and Reality Defender Inc. The paper, “ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection,” was accepted to ACL 2026.

Frequently Asked Questions About ICLAD

What is ICLAD, and how does it detect audio deepfakes? ICLAD (In-Context Learning with Comparison-Guidance for Audio Deepfake Detection) is a framework developed by Reality Defender and Purdue University that uses audio language models (ALMs) to detect deepfakes without retraining. It prompts the model to generate evidence for both the real and fake hypotheses simultaneously, a technique called Pairwise Comparative Reasoning, then reconciles those arguments into a hallucination-aware rationale. At inference time, it retrieves acoustically similar examples from a database to guide the ALM’s decision.

Why do audio deepfake detectors fail on real-world audio? Most detectors are trained on scripted studio recordings: clean, controlled, noise-free. Real-world audio includes phone calls, social media clips, background noise, and spontaneous speech. When a model trained on studio data encounters that variation, performance drops sharply. ICLAD is designed to close that gap, in part, without requiring new training data for every new attack type.

How accurate is ICLAD compared to existing deepfake detection methods? On real-world, out-of-distribution audio, ICLAD consistently outperforms specialized baseline models (Table 1). On the SpoofCeleb dataset, it achieved a macro F1 score of 0.665, compared to the baseline’s 0.334. Tested across 126,348 audio clips in 42 languages, it also reduces hallucinated reasoning by 8.3 percentage points compared to standard prompting approaches.