Not All Deepfake Detectors Are Created Equal

Written by sumsub | Published 2023/11/28
Tech Story Tags: deepfakes | fraud-detection | identity-verification | online-identity | deepfake-detection | ai-image-reconstruction | face-swapping | good-company

TL;DR: Sumsub's engineers reveal that most state-of-the-art deepfake detectors falter in real-world scenarios and are vulnerable to even basic fraudster tactics.

Deepfakes have been on the rise for the last few years, with multiple face swap tools gaining popularity among fraudsters and even organized criminal groups.

According to the Europol report “Facing Reality? Law enforcement and the challenge of deepfakes,” deepfakes could even be used in more serious crimes such as CEO fraud, evidence tampering, and the production of non-consensual pornography.

However, as with everything related to AI, it is an ongoing arms race between fraudsters and modern deepfake detectors. Coming out of International Fraud Awareness Week, we wanted to provide a reality check on the capabilities and advancements of deepfake detectors over the last few years - a reality check that is needed precisely because deepfake fraud remains such a vast issue.

In our internal research, we analyzed the performance of open-source modern state-of-the-art deepfake detectors published since 2020.

Here’s our fundamental observation: when it comes to distinguishing between real and fake content, computers have long outperformed humans. This finding underscores the need to harness the power of cutting-edge algorithms and methods.

Almost all of the leading works in this field prominently feature face detection as a fundamental element of their algorithms. Face detection is close to a solved problem, characterized by high accuracy: not perfect, but close.

When a face is prominently positioned in an image and looks forward, modern detection models excel at fast and reliable identification.
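As a quick illustration of how accessible face detection has become, a few lines of OpenCV are enough to locate a frontal face. This is a minimal sketch using the stock Haar cascade bundled with opencv-python, not the detectors used by the papers discussed below, and the image file name is a hypothetical placeholder.

```python
# Minimal face detection sketch using OpenCV's bundled Haar cascade.
# Only an illustration of how accessible face detection is; the papers
# discussed below use their own (stronger) detectors.
import cv2

# Load the frontal-face cascade that ships with opencv-python.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("photo.jpg")                      # any image with a frontal face
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns a list of (x, y, w, h) boxes, one per detected face.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", image)
```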

And while there are several ways to create deepfake images, one method stands out as both popular and robust: one-shot face swapping. This technique uses two images, a source and a target, to transfer facial features from the former to the latter.

In the current landscape, it’s considered the most powerful approach to creating deepfake images and videos.

You can try our Deepfake game yourself and see how well you can tell real from fake.

Studies

The lack of readily available code and weights in the majority of related work underscores a common challenge in the field of deepfake detection.

This landscape often prioritizes business applications over scientific dissemination, resulting in limited access to the tools and resources that are essential for academic and research communities.

This lack of openly shared code and model weights has been a significant barrier to the broader advancement of deepfake detection methods.

There are numerous approaches to deepfake detection, and with each conference, new articles appear.

Some of these articles focus primarily on the model architecture for deepfake detection, drawing considerable inspiration from the transformer model and attempting to adapt it to the challenge.

Meanwhile, other articles focus on training methods, particularly on synthetic datasets filled with fake images. The field is rich with benchmarks, and in the following section, we will discuss some of the most powerful among them, emphasizing those with open-source code and weights available.

FaceForensics++

The most prominent baseline for all modern deepfake detection methods is the research published in the paper FaceForensics++: Learning to Detect Manipulated Facial Images. The authors' primary contribution is an extensive dataset of over 1.8 million images from 1000 YouTube videos, provided in raw, high-quality, and low-quality versions.

They used human observers to validate these distinctions. The deepfake classification model in the paper is a binary classifier based on an XceptionNet backbone with ImageNet weights, fine-tuned on their dataset.

By employing a simple voting mechanism based on model responses, the authors achieved a significant impact in the field of deepfake detection despite the model's architectural simplicity.
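A minimal sketch of that recipe is below: a binary classifier on top of an ImageNet-pretrained Xception backbone, with a simple majority vote over per-frame predictions for a video-level decision. It assumes the timm package is installed and exposes an Xception variant (the exact model name may differ between timm versions); it illustrates the idea, not the authors' training code.

```python
# Sketch of the FaceForensics++-style recipe: an ImageNet-pretrained
# Xception backbone used as a binary real/fake classifier, plus a
# majority vote over per-frame predictions for a whole video.
# Assumes `timm` and `torch` are installed; the exact Xception model
# name in timm may differ between versions.
import timm
import torch

model = timm.create_model("xception", pretrained=True, num_classes=1)
model.eval()

@torch.no_grad()
def frame_scores(frames: torch.Tensor) -> torch.Tensor:
    """frames: (N, 3, 299, 299) batch of cropped face frames -> fake probabilities."""
    return torch.sigmoid(model(frames)).squeeze(1)

def video_is_fake(frames: torch.Tensor, threshold: float = 0.5) -> bool:
    """Simple majority vote over per-frame decisions."""
    votes = frame_scores(frames) > threshold
    return votes.float().mean().item() > 0.5

# Example with random tensors standing in for preprocessed face crops.
dummy_frames = torch.rand(8, 3, 299, 299)
print(video_is_fake(dummy_frames))
```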

Multi-attention Deepfake Detection

The authors highlight a common problem in previous deepfake detection models: their reliance on a simple binary classifier.

This basic binary classifier approach doesn't account for subtle distinctions between real and fake images. The authors propose an alternative inspired by fine-grained classification, using a multi-attention network with multiple attention heads to focus on different artifact regions.

This network combines low-level texture features and high-level semantic features to create image representations and a distinctive attention-guided data augmentation mechanism for training.

This approach addresses the limitations of existing models, making it a promising method for deepfake detection.
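The sketch below illustrates the multi-head spatial attention idea in rough form: several attention maps are predicted from a shared CNN feature map, and each head pools features from the region it highlights. It is a simplified illustration, not the paper's exact architecture.

```python
# Rough sketch of multi-head spatial attention over a CNN feature map:
# each head predicts its own attention map and pools features from the
# region it highlights, so different heads can specialize on different
# artifact regions. An illustration, not the paper's exact model.
import torch
import torch.nn as nn

class MultiAttentionPooling(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # One 1x1 conv produces `num_heads` spatial attention maps.
        self.attn = nn.Conv2d(channels, num_heads, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) -> (B, num_heads * C) pooled representation
        maps = torch.softmax(self.attn(feats).flatten(2), dim=-1)  # (B, heads, H*W)
        flat = feats.flatten(2)                                    # (B, C, H*W)
        pooled = torch.einsum("bkn,bcn->bkc", maps, flat)          # (B, heads, C)
        return pooled.flatten(1)

pooling = MultiAttentionPooling(channels=256, num_heads=4)
features = torch.rand(2, 256, 14, 14)   # stand-in for backbone features
print(pooling(features).shape)          # torch.Size([2, 1024])
```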

Multi-modal Multi-scale Transformers for Deepfake Detection

The authors of "M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection" emphasize the importance of focusing on specific regions in images where fake content may appear.

They introduce a multi-modal approach with a multi-scale structure, using a frequency filter to detect artifacts that may not be visible after compression.

They further employ a Cross-Modality Fusion block inspired by self-attention to merge RGB and frequency features into a unified representation, enhancing their deepfake detection method.
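To give a flavor of the frequency branch, here is a sketch that extracts a crude high-frequency map with an FFT high-pass filter and stacks it with the RGB channels. The actual M2TR uses learned multi-scale filters and a self-attention-style fusion block, so treat this only as an illustration of the extra modality.

```python
# Illustration of adding a frequency modality: compute a high-pass
# filtered version of the image with an FFT and stack it with the RGB
# channels. M2TR itself uses learned frequency filters and a
# cross-modality fusion block; this only shows the idea.
import torch

def high_frequency_map(gray: torch.Tensor, cutoff: int = 8) -> torch.Tensor:
    """gray: (H, W) grayscale image -> high-frequency component of the same shape."""
    spectrum = torch.fft.fftshift(torch.fft.fft2(gray))
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    # Zero out the low-frequency square around the spectrum center.
    spectrum[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(spectrum)).real

rgb = torch.rand(3, 256, 256)                   # stand-in face crop
gray = rgb.mean(dim=0)                          # simple grayscale conversion
freq = high_frequency_map(gray).unsqueeze(0)    # (1, H, W)
fused_input = torch.cat([rgb, freq], dim=0)     # (4, H, W): RGB + frequency channel
print(fused_input.shape)
```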

End-to-End Reconstruction-Classification Learning for Face Forgery Detection

In "End-to-End Reconstruction-Classification Learning for Face Forgery Detection," the authors address a common issue in deepfake detection methods that focus on specific forgery patterns, which may not encompass all possible manipulations.

They propose an approach based on two components, reconstruction learning and classification learning:

  • Reconstruction learning enhances the representations to detect unknown forgery patterns.

  • Classification learning identifies disparities between real and fake images.

The authors employ a multi-scale approach to improve these representations, using a dedicated reconstruction network to model real faces and a metric-learning loss to enhance the detection of previously unknown forgery patterns.
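A schematic of how such a reconstruction-plus-classification objective can be combined during training is sketched below; the encoder, decoder, and classifier modules are assumed placeholders, and the paper's metric-learning loss is omitted.

```python
# Schematic of a reconstruction + classification objective: an encoder
# feeds both a decoder (trained to reconstruct real faces) and a binary
# classifier. `encoder`, `decoder`, and `classifier` are hypothetical
# modules standing in for the paper's actual networks.
import torch
import torch.nn as nn

def training_loss(encoder: nn.Module,
                  decoder: nn.Module,
                  classifier: nn.Module,
                  images: torch.Tensor,
                  labels: torch.Tensor,
                  recon_weight: float = 1.0) -> torch.Tensor:
    """labels: 1 for fake, 0 for real."""
    features = encoder(images)
    logits = classifier(features).squeeze(1)
    cls_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())

    # Reconstruction is supervised only on real faces, so the decoder
    # learns what genuine faces look like and fakes reconstruct poorly.
    real_mask = labels == 0
    recon_loss = torch.tensor(0.0, device=images.device)
    if real_mask.any():
        recon = decoder(features[real_mask])
        recon_loss = nn.functional.mse_loss(recon, images[real_mask])

    return cls_loss + recon_weight * recon_loss
```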

Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization

In the work, "Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization," the authors address a significant issue concerning deepfake detection. They point out that many deepfake models are based on face-swapping techniques, which can lead to a unique challenge.

These models tend to memorize the distributions of genuine IDs, which means that a fake image can sometimes appear as a blend of two different IDs. This problem becomes especially challenging when such models are applied to new, unseen, or cross-domain datasets: the model struggles to decipher the true identity in the image because it has not encountered it before.

To address this issue, which the authors refer to as "Implicit Identity Leakage," they endeavor to find solutions that improve the generalization of deepfake detection models beyond the confines of their training datasets.

To provide evidence of this phenomenon, the authors initially took pretrained deepfake classifiers and froze all layers except the last one. They replaced the last layer with a linear layer and fine-tuned it for an ID classification task.

This experiment showed that a single linear layer could be effectively trained to classify IDs with high accuracy, demonstrating the potential for identity leakage. Then the authors created a new method for swapping parts of the face at different scales, with a primary focus on swapping specific facial regions.

They then trained a multi-scale detection model by utilizing images generated from this process. This model scrutinizes feature maps of different sizes in diverse layers to detect the existence of artifact areas, delivering a thorough observation of the likely signals of deepfake manipulation.
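The probing experiment itself is easy to reproduce in spirit: freeze everything in a pretrained classifier except a freshly initialized linear head, then train only that head on an identity-classification task. In the sketch below, a torchvision ResNet stands in for an actual pretrained deepfake detector, and the number of identities is a made-up placeholder.

```python
# Sketch of the identity-leakage probe: freeze everything in a
# pretrained classifier except a new linear head, then train only that
# head to predict identities. A torchvision ResNet stands in for an
# actual pretrained deepfake detector.
import torch
import torch.nn as nn
from torchvision import models

num_identities = 100                                 # hypothetical number of IDs
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

for param in backbone.parameters():                  # freeze the backbone
    param.requires_grad = False

# Replace the final layer with a trainable linear ID classifier.
backbone.fc = nn.Linear(backbone.fc.in_features, num_identities)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
images = torch.rand(4, 3, 224, 224)
ids = torch.randint(0, num_identities, (4,))
loss = criterion(backbone(images), ids)
loss.backward()
optimizer.step()
print(float(loss))
```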

Detecting Deepfakes with Self-Blended Images

The latest notable paper in the field of deepfake detection is "Detecting Deepfakes with Self-Blended Images." In this research, the authors have taken a novel approach by training their own model using a unique dataset.

This dataset consists of images generated through the blending of pseudo source and target images derived from individual pristine images. This process effectively replicates common forgery artifacts often encountered in deepfakes.

The key insight behind this approach is that by using more general and less easily recognizable fake samples, classifiers can learn more generic and robust representations without succumbing to overfitting to manipulation-specific artifacts.

The authors identify four primary types of common deepfake artifacts: landmark mismatch, blending boundary, color mismatch, and frequency inconsistency. They then synthesize these artifacts using a specialized model.

For the model architecture, the authors took EfficientNet-b4, pre-trained on the ImageNet dataset. They fine-tune this model on their Self-Blended Images (SBI) dataset, ensuring that the model becomes adept at detecting deepfakes by learning from these blended images with common forgery artifacts.
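The core trick can be sketched in a few lines: derive a pseudo source and pseudo target from the same pristine photo (here via a slight color shift and a small spatial shift) and blend them with a soft mask, so that a blending boundary and a color mismatch appear. The real SBI pipeline uses landmark-driven masks and richer augmentations, and the file names below are placeholders.

```python
# Toy illustration of a self-blended image: a pseudo "source" and
# "target" are derived from one pristine photo (small color jitter and
# a spatial shift), then blended with a soft mask so that a blending
# boundary and color mismatch appear. The actual SBI pipeline uses
# facial-landmark masks and richer augmentations.
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def self_blend(path: str) -> Image.Image:
    pristine = Image.open(path).convert("RGB")
    w, h = pristine.size

    # Pseudo source: slightly recolored copy of the same image.
    source = ImageEnhance.Color(pristine).enhance(1.3)
    # Pseudo target: the original, shifted by a couple of pixels.
    target = pristine.transform((w, h), Image.AFFINE, (1, 0, 2, 0, 1, 2))

    # Soft elliptical mask roughly covering the central face region.
    yy, xx = np.mgrid[0:h, 0:w]
    ellipse = ((xx - w / 2) / (w * 0.3)) ** 2 + ((yy - h / 2) / (h * 0.4)) ** 2 <= 1
    mask = Image.fromarray((ellipse * 255).astype(np.uint8)).filter(
        ImageFilter.GaussianBlur(radius=max(1, w // 20)))

    # Paste the recolored "source" face onto the shifted "target".
    return Image.composite(source, target, mask)

fake = self_blend("pristine_face.jpg")
fake.save("self_blended.jpg")
```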

Our Experiments: Metrics and Datasets

We have analyzed the performance of modern state-of-the-art deepfake detectors that were published after 2020 and have their code and model weights available for public and research use.

We calculated the relevant metrics for each model on the same public datasets to see how the quality reported by the authors transfers to a similar domain. Then we applied simple transformations that fraudsters frequently use to slip deepfakes past verification checks (downscaling and image enhancement, described below) and measured how efficiently the deepfake detectors perform.

We used CelebA-HQ and LFW as base datasets for ground-truth real images. Both are widely used in research and development, and images from these two datasets can be considered in-domain for most computer vision tasks.

To build ground-truth fake image datasets, we used a state-of-the-art deepfake model from 2021 called SimSwap. Many still consider it the best and most popular single-photo deepfake generator.

To generate a sufficient number of images, we used random pairs of source and reference photos from the datasets to create Fake-CelebaHQ and Fake-LFW. Each fake dataset contains exactly 10,000 images.

For simplicity, the main metric we used for measuring model quality is 1-class accuracy with a default threshold of 0.5. In other words, for each dataset we calculated the percentage of correctly predicted labels. Additionally, we calculated an overall ROC-AUC metric over the combined real and fake datasets.
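Concretely, the numbers reported per model reduce to the following computation, sketched here with scikit-learn on hypothetical per-image fake probabilities:

```python
# How the reported numbers are computed: per-dataset 1-class accuracy
# at a fixed 0.5 threshold, plus ROC-AUC over the combined real + fake
# sets. `scores_real` and `scores_fake` are hypothetical per-image
# fake probabilities produced by a detector.
import numpy as np
from sklearn.metrics import roc_auc_score

def one_class_accuracy(scores: np.ndarray, is_fake: bool, threshold: float = 0.5) -> float:
    """Fraction of images in a single-label dataset that are classified correctly."""
    predicted_fake = scores > threshold
    return float(np.mean(predicted_fake == is_fake))

scores_real = np.random.rand(10_000)   # detector scores on e.g. LFW (real)
scores_fake = np.random.rand(10_000)   # detector scores on e.g. Fake-LFW

acc_real = one_class_accuracy(scores_real, is_fake=False)
acc_fake = one_class_accuracy(scores_fake, is_fake=True)

labels = np.concatenate([np.zeros_like(scores_real), np.ones_like(scores_fake)])
auc = roc_auc_score(labels, np.concatenate([scores_real, scores_fake]))
print(acc_real, acc_fake, auc)
```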

Experiment 1:


Model | LFW  | CelebaHQ | Fake-LFW | Fake-CelebaHQ | AUC score
SBI   | 0.82 | 0.57     | 0.82     | 0.96          | 0.84
CADDM | 0.49 | 0.69     | 0.80     | 0.54          | 0.67
RECCE | 0.01 | 0.00     | 0.98     | 0.00          | 0.54
MAT   | 0.00 | 0.74     | 1.00     | 1.00          | 0.75
FF++  | 0.13 | 0.67     | 0.88     | 0.53          | 0.57
M2TR  | 0.42 | 0.56     | 0.69     | 0.51          | 0.56

Table 1. 1-class accuracy and AUC for real/fake datasets without changes

As expected, most of the models had some issues detecting SimSwap deepfakes. The best model is SBI, scoring 82% on Fake-LFW and 96% on Fake-CelebaHQ, with a promising 0.84 AUC score.

What is unexpected is that many otherwise capable models had difficulties classifying images from the real datasets as real:

  • RECCE scored most of the real images as fakes.

  • MAT, FF++, and M2TR classified less than half of the faces from LFW as real.

Three models have an AUC score close to 0.5. This raises questions about how well these models transfer to a more realistic domain and how easily fraudsters can bypass them.

Experiment 2:

To test how these models behave in a more realistic domain, we tried two different techniques fraudsters usually employ when using deepfakes.

The first thing they do to hide most of the artifacts and irregularities is downscaling. Since most liveness and deepfake checks place no requirements on video quality, fraudsters usually compress the deepfaked video.

To simulate this approach, we used the same datasets but downscaled each image to a much smaller resolution (128x128) using bilinear interpolation. Ideally, deepfake detectors should still detect deepfakes even if the resolution at inference differs from the resolution used during training.
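Reproducing this degradation takes only a few lines with Pillow; the folder names below are hypothetical:

```python
# Simulating the fraudster's compression trick: bilinear downscaling of
# every image to 128x128 before it reaches the detector.
from pathlib import Path
from PIL import Image

def downscale(src_dir: str, dst_dir: str, size: int = 128) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        img.resize((size, size), Image.BILINEAR).save(out / path.name)

downscale("Fake-CelebaHQ", "Fake-CelebaHQ-128")  # hypothetical folder names
```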


Model   | LFW  | CelebaHQ | Fake-LFW | Fake-CelebaHQ | AUC score
SBI     | 0.82 | 0.82     | 0.43     | 0.23          | 0.6
CADDM   | 0.55 | 0.46     | 0.62     | 0.65          | 0.6
RECCE   | 0.83 | 0.89     | 0.13     | 0.08          | 0.54
MAT c40 | 1.00 | 1.00     | 0.00     | 0.00          | 0.5

Figure 2: Best of the deepfake detectors’ metrics on a low-quality dataset

Here, the results are even more troubling. Models that previously achieved more or less competitive performance now show near-zero accuracy on the fake datasets. The MAT model simply scored everything as a real image, and the RECCE model came very close to the same decision.

Experiment 3:

The second fraud practice is to upscale and retouch deepfaked images, removing the imperfections that could give away a fabricated image to detectors. Eyes are one of many such examples: most deepfaked images lack round pupils and realistic light reflections.

So a fraudster usually runs the image through beautification or "enhancement" software, similar to the filters used in Instagram or TikTok, to mask these flaws.

To simulate the effect of such software, we used a closely related open-source analog, GPEN: a GAN-based enhancer that upscales and restores faces.
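The evaluation pipeline is the same as in Experiment 2, just with the enhancer in place of the downscaler. In the sketch below, enhance_face is a crude stand-in (upscale plus sharpen) rather than GPEN's actual API, and the folder names are hypothetical:

```python
# Pipeline sketch for Experiment 3: run every fake image through a
# face enhancer before handing it to the detector. `enhance_face` is a
# crude stand-in for a GAN-based enhancer such as GPEN, whose real API
# is defined by its own repository.
from pathlib import Path
from PIL import Image, ImageFilter

def enhance_face(img: Image.Image) -> Image.Image:
    """Crude stand-in for a GAN-based enhancer: upscale 2x and sharpen."""
    up = img.resize((img.width * 2, img.height * 2), Image.BICUBIC)
    return up.filter(ImageFilter.SHARPEN)

def build_enhanced_dataset(src_dir: str, dst_dir: str) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        enhanced = enhance_face(Image.open(path).convert("RGB"))
        enhanced.save(out / path.name)

build_enhanced_dataset("Fake-CelebaHQ", "Fake-CelebaHQ-enhanced")  # hypothetical folders
```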


Model   | LFW  | CelebaHQ | Fake-LFW | Fake-CelebaHQ | AUC score
SBI     | 0.76 | 0.63     | 0.38     | 0.58          | 0.62
CADDM   | 0.52 | 0.71     | 0.59     | 0.38          | 0.57
RECCE   | 0.18 | 0.00     | 0.80     | 1.00          | 0.52
MAT c40 | 0.99 | 1.00     | 0.00     | 0.00          | 0.5

Figure 3: Best of the deepfake detectors’ metrics on an enhanced dataset

Here, one can see the same trend as in Experiment 2. The MAT model scored everything as real, and RECCE scored almost everything as fake. The performance of SBI and CADDM is better than random, but they still missed more than half of the deepfakes across the Fake-LFW and Fake-CelebaHQ datasets.

Conclusion

The outcome of this research is gloomy: no open-source deepfake detector is 100% reliable, while deepfake fraud is expected to grow further as its generation becomes easier and cheaper. According to Sumsub's internal statistics, the prevalence of deepfake fraud grew considerably from 2022 to Q1 2023:

  • From 2022 to Q1 2023, the proportion of deepfakes among all fraud types increased by 4,500% in Canada, by 1,200% in the U.S., by 407% in Germany, and by 392% in the UK.

  • In Q1 2023, the largest share of deepfakes came from Great Britain and Spain, with 11.8% and 11.2% of global deepfake fraud respectively, followed by Germany (6.7%) and the Netherlands (4.7%). The U.S. held 5th place, representing 4.3% of global deepfake fraud cases.

Our experiments show that there are still a lot of things to be done about deepfake detection. Even the best open-source deepfake detection models are not prepared for the real world and can’t combat fraudsters.

There are a great number of papers about deepfake detectors, but most of them don't have code or model weights available.

This lack of openness creates a barrier to the improvement of deepfake detection methods.

Therefore, we at Sumsub:

  • Offer our in-house set of four distinct machine-learning-driven models for deepfake and synthetic fraud detection, called For Fake's Sake. It is available to everyone for free download and use, and Sumsub welcomes feedback from the AI research community to further improve the models' capabilities.

Still, the main responsibility for protecting Internet users' images online lies with the users themselves. Remember to be cautious about sharing personal photos online. Better to use stylish avatars instead, just like our authors did.

And here, you can find other useful tips on how to stay away from deepfake abuse.

Written by Maksim Artemev, Lead Computer Vision Engineer, and Slava Pirogov, Computer Vision Engineer, at Sumsub


Written by sumsub | Sumsub is a global full-cycle verification platform that secures the whole user journey
Published by HackerNoon on 2023/11/28