Post-hoc Selective Classification for Reliable Synthetic Image Detection

By Kaixiang Zheng and Jacob Seidman, PhD

Published on 5/9/26

As synthetic images become increasingly realistic, reliable synthetic image detection techniques are urgently needed to prevent their misuse. Despite satisfactory in-distribution performance, deep neural network-based synthetic image detectors (SIDs) lack reliability in deployment and often fail under common covariate shifts, resulting in poor detection accuracy. To avoid the risk caused by potential errors, we adopt a selective classification (SC) strategy that allows SIDs to abstain from making low-confidence predictions. For practicality, we focus on post-hoc methods that perform confidence estimation on a given SID without retraining. However, we show that conventional logit-based confidence score functions (CSFs) exhibit pathological behavior under covariate shifts, leading to SC performance close to, or even worse than, random guessing. To address this, we propose a simple yet effective SC framework for Reliable Synthetic Image Detection (ReSIDe). First, we generalize the notion of logits to an SID's intermediate layers from a centroid-matching perspective, extending the use of logit-based CSFs to any layer of an SID. Then, we introduce a preference optimization algorithm that aggregates the confidence scores extracted from different layers into a final confidence estimate by minimizing an upper bound of the area under the risk-coverage curve (AURC). Extensive experimental results show that ReSIDe significantly boosts the SC performance of various logit-based CSFs under common covariate shifts, achieving up to a 69.55% reduction in AURC.
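The selective classification setup above can be made concrete with a short NumPy sketch. It is not the authors' code: it implements two common logit-based CSFs (maximum softmax probability and maximum logit), a thresholded abstention rule, and the standard empirical AURC, in which samples are accepted in order of decreasing confidence and the selective risk is averaged over every coverage level. The binary label convention (0 = real, 1 = synthetic), the 0.7 threshold, the toy logits, and all function names are illustrative assumptions.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def msp(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability, a standard logit-based CSF."""
    return softmax(logits).max(axis=-1)


def max_logit(logits: np.ndarray) -> np.ndarray:
    """Largest raw logit, another common logit-based CSF."""
    return logits.max(axis=-1)


def selective_predict(logits, csf, threshold):
    """Return the argmax class, or -1 (abstain) where the confidence is below threshold."""
    conf = csf(logits)
    preds = logits.argmax(axis=-1)
    return np.where(conf >= threshold, preds, -1), conf


def aurc(confidence, correct) -> float:
    """Empirical area under the risk-coverage curve (lower is better).

    Samples are accepted in order of decreasing confidence. At coverage k/n the
    selective risk is the error rate among the k most confident samples, and
    AURC is the mean of those risks over k = 1..n.
    """
    order = np.argsort(-np.asarray(confidence, dtype=float), kind="stable")
    errors = 1.0 - np.asarray(correct, dtype=float)[order]
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())


# Toy binary SID outputs; column 0 = "real", column 1 = "synthetic" (assumed convention).
logits = np.array([[2.0, -1.0], [0.1, 0.2], [-1.5, 1.5], [0.3, 0.0]])
labels = np.array([0, 0, 1, 1])
correct = logits.argmax(axis=-1) == labels

preds, conf = selective_predict(logits, msp, threshold=0.7)
print("selective predictions:", preds)
print("AURC (MSP):", aurc(conf, correct))
print("AURC (max logit):", aurc(max_logit(logits), correct))
```

Under a random ordering, the expected risk at every coverage level equals the overall error rate, so a CSF whose AURC exceeds that error rate is ranking predictions worse than random guessing.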

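The two components of ReSIDe summarized in the abstract can be sketched as follows, under assumptions that go beyond what the abstract states. Layer-wise "logits" are taken here to be negative squared Euclidean distances between a sample's features and per-class mean features (centroids) at that layer, and MSP is applied to them to obtain one confidence score per layer. The scores are then combined with softmax-parameterized (convex) layer weights learned by gradient descent on a pairwise logistic preference loss that favors correctly classified samples over misclassified ones. This loss is a stand-in chosen for illustration, not necessarily the AURC upper bound or optimizer the authors use, and every helper name and the random demo data are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fit_centroids(feats, labels, num_classes=2):
    """Per-class mean feature vectors at one layer (ideally from held-out data)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])


def centroid_logits(feats, centroids):
    """Assumed centroid-matching logits: negative squared distance to each class centroid."""
    d2 = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return -d2


def layer_scores(layer_feats, layer_centroids):
    """Apply a logit-based CSF (MSP here) to each layer's centroid logits -> (n, L)."""
    return np.stack(
        [softmax(centroid_logits(f, c)).max(axis=-1)
         for f, c in zip(layer_feats, layer_centroids)],
        axis=1,
    )


def pairwise_loss_and_grad(w, S, correct):
    """Pairwise logistic preference loss over (correct, misclassified) pairs and its gradient in w."""
    correct = np.asarray(correct, dtype=bool)
    a = S @ w                                             # aggregated confidence per sample
    Sc, Sw = S[correct], S[~correct]
    diff = a[correct][:, None] - a[~correct][None, :]     # (m, q) score gaps
    loss = np.logaddexp(0.0, -diff).mean()                # mean log(1 + exp(-gap))
    g = 0.5 * (1.0 + np.tanh(-diff / 2.0))                # sigmoid(-gap), computed stably
    grad_w = -(g.sum(axis=1) @ Sc - g.sum(axis=0) @ Sw) / diff.size
    return loss, grad_w


def fit_layer_weights(S, correct, steps=500, lr=1.0):
    """Learn convex layer weights by gradient descent on the preference loss."""
    correct = np.asarray(correct, dtype=bool)
    if correct.all() or not correct.any():
        raise ValueError("need both correctly classified and misclassified samples")
    theta = np.zeros(S.shape[1])                          # softmax-parameterized weights
    for _ in range(steps):
        w = softmax(theta)
        _, grad_w = pairwise_loss_and_grad(w, S, correct)
        # Chain rule through the softmax: dL/dtheta = w * (g - <w, g>).
        theta -= lr * w * (grad_w - w @ grad_w)
    return softmax(theta)


# Hypothetical demo: random features stand in for two intermediate layers of an SID.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
layer_feats = [rng.normal(labels[:, None] * sep, 1.0, size=(200, 8)) for sep in (0.3, 0.6)]
cents = [fit_centroids(f, labels) for f in layer_feats]
S = layer_scores(layer_feats, cents)                      # (n, L) per-layer confidence scores
# Stand-in for the SID's own final predictions; only their correctness is used.
preds = centroid_logits(layer_feats[-1], cents[-1]).argmax(axis=-1)
w = fit_layer_weights(S, preds == labels)
final_conf = S @ w                                        # aggregated confidence estimate
print("layer weights:", w)
```

For a fixed set of predictions, AURC is minimized when every correctly classified sample receives a higher confidence than every misclassified one, which is why a pairwise preference objective is a natural surrogate. In practice the centroids and the layer weights would be fit on held-out data distinct from the evaluation images.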