Detecting fake video needs to start with video authentication

Written by ShamirAllibhai | Published 2018/10/18
Tech Story Tags: deepfakes | artificial-intelligence | video-authentication | fake-video | detecting-fake-video


There is an arms race in Artificial Intelligence capabilities. Instead let’s counter “deepfakes” with cryptographic signing of video.

Recent developments in artificial intelligence-based image synthesis have endowed machines with the ability to generate photos and videos of the real world with an accuracy that would have seemed impossible only a few years ago. While the incredible applications for this technology are only starting to be explored, recent press coverage of fake videos shows us that malicious use of this type of technology is around the corner.

Although creating a fake video of sufficient quality to fool most people most of the time is both expensive and time-consuming, year-on-year developments in capabilities are seeing this technology commoditized at breakneck speed. Current efforts seem laughable at times, but the mere existence of deep learning-based fakery (or “deepfakes”) has already started casting doubt on our own senses, on criminal evidence, and on the institutions we trust.

In parallel with the development of deepfake technology, AI is also being developed to counter this threat: machines trained to detect malicious alterations in video for the inevitable future where we find ourselves unable to detect the forgeries ourselves.

An arms race between two branches of AI research, one that may very well go on forever.

But can we tackle this challenge through a different lens?

We’ve been dealing with faked electronic data for as long as we’ve had computer networks.

  • The browser you are reading this article on is likely showing a little green padlock, letting you know the page is coming from the source it’s claiming to come from and hasn’t been altered by a third party in the journey it took to get to your screen.
  • The celebrity Twitter profiles you follow likely have a blue tick, communicating to you the tweets are from the person they claim to be.
  • When you last entered your credit card details on a webpage, you had to provide additional personal information such as the postal address linked to the card to prove you were the rightful owner of it.
  • Your government’s website may have asked you to electronically provide details of your passport or another form of official ID when you’re using online services.

These are all forms of electronic identification and authentication which we come across in our everyday lives. Infrastructure around us — healthcare, finance, justice systems and many others — also relies on electronic identification and authentication. Digital certificates and electronic signatures are constantly being exchanged in the background of our lives, all but invisible to us.

While most of the conversation to counter the looming threat of fake video to society centers around detecting malicious alterations in video by forensically analyzing the characteristics of the video — deploying good AI to fight bad AI — we can instead protect ourselves from fake video by making a few changes to the way we record and consume video content.

Cryptographic signing of video at the source provides us with evidence that a video came from the device that recorded it, untouched.

Software to analyze and detect fake video is easy to integrate because it sits at the distribution layer, between when a video is uploaded and when it is played. Authenticating video, by comparison, is harder to coordinate: numerous stakeholders, from camera makers through to distributors, have to agree on and adopt an authentication standard. Yet bringing organizations together to collaborate is how we arrived at interoperability standards in numerous areas of society. Just because it is hard doesn’t mean we shouldn’t try — especially if the risks justify the effort.

What’s at stake?

  • You interview for a job but fail at the final stage. All you are told is that you are “not a good fit” and they are “moving forward with another candidate”. Unbeknownst to you, HR discovered an alleged audio recording of you spewing racist epithets in the course of a routine background check.
  • The parents of a teenager sit by her bedside as she lies motionless in hospital after an attempted suicide. They are distraught: their daughter’s class had been sharing a supposed pornographic video of her.
  • A country’s population is galvanized as the newspaper headlines call for war. “We must strike first” they say in response to alleged footage of another country’s President declaring war on their nation. Is the footage real? Or was it their own country’s intelligence services trying to create the pretext for war?

There are real lives at stake.

Some argue that we’ve had “fake” text and “Photoshopped” images for a while, and that video is no different, fundamentally: humans will adapt to fake video the same way they adapted to altered images. Even granting the skeptical view that society eventually reorients itself, the consequences in the short term are real, as we saw with election meddling through fake news in 2016.

While it seems like video is just another step in an inevitable sequence (text → images → video), the human relationship to video is far more powerful and unique than to the other forms (we will expound on this in a future blog). To gauge the scale of the risk and impact of fake video, we need to consider the confluence of three trends: a video-first world, social networks providing targeted distribution at scale, and sophisticated deep learning AI creating realistic content at scale.

We have not yet seen a scaled, coordinated deployment of AI to create and release a mass proliferation of fakes, but the foundation for it has been laid.

In a world where we can’t trust our eyes and ears, how will we interpret events and situations?

Deepfakes that seek to deceive and spread disinformation are a problem that needs to be combated, so what are our options?

The current conversation: Good AI vs. Bad AI, an Arms Race.

The most commonly discussed approach is to counter deepfakes with software that forensically analyzes video. Such software may examine the characteristics of the audio and video data itself, looking for artifacts, abnormal compression signatures, or camera or microphone noise patterns. Beyond the data itself, AI may also analyze the video metadata, or even perform behavior-pattern analysis on the subjects of the video.

So-called good AI used to police the bad AI.

The challenge with this approach is that the bad AI gets a feedback loop: it learns which of the videos it generated were flagged as fake or penalized in search results.

And it will continuously learn from that knowledge.

A deepfake-generating system can keep producing content, learning from the certainty it has obtained of what is and isn’t detectable, while the good AI will always be one step behind, slowly learning but never basing its newfound knowledge on the certainty of what content was, or wasn’t, fake.

Good AI is like Norton Antivirus software: it will catch most of the viruses but not all.

There will be false positives and there will be false negatives. Ultimately, the bad AI only needs to get one video through as a false negative to deceive its intended target.

Countering fake video doesn’t have to be an arms race.

Instead of focusing on a remedy, let’s look at the problem from the prevention side: authentication.

YouTube: https://www.youtube.com/watch?time_continue=39&v=MVBe6_o4cMI

Video authentication and how it could work.

Video authentication works by processing video data through a hashing algorithm, which maps a collection of video data (for instance, a file) to a small string of text, or “fingerprint”. This fingerprint can travel with the file throughout the life of that video, from capture through to distribution. At playback, the fingerprint is recomputed and compared, proving the authenticity of the video data — confirming it is the same video that was originally recorded.
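
As a rough sketch, fingerprinting a file with a standard cryptographic hash might look like the following. This is an illustration, not Amber’s actual implementation; the helper name is ours:

```python
import hashlib

def fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Map a video file to a short hex string (its "fingerprint") using SHA-256.

    Reading in fixed-size chunks keeps memory usage flat even for
    multi-gigabyte recordings.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because SHA-256 is a cryptographic hash, changing even a single byte of the file produces a completely different fingerprint, which is what makes the playback-time comparison meaningful.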

This fingerprint could also be digitally signed by the recording device, providing evidence of where the content originally came from, along with device details and other metadata, whether that device is a CCTV camera, a first responder’s body camera, a journalist’s registered equipment, or the mobile app of a concerned citizen.

When a viewer watches a recording, they could inspect an authenticity certificate, reviewing the chain of custody for that file. During playback, there could even be a visual representation of where in a video the fingerprints match the original recorded content (authentic) and where they don’t (authenticity cannot be guaranteed). This is similar to how a browser’s lock icon simply and clearly communicates the authenticity of the site you are visiting.

Amber’s UI uses a border around the video to communicate where it has been altered

This form of video authentication would entail:

  1. Hashing and signing technology to be integrated via an SDK into a recording app or into the firmware of a recording device. It is imperative to fingerprint the video as close as possible to the time of recording, as any delay increases the risk of exploitation. (In the future, we may need fingerprinting built into the on-board video encoder chip or the image sensor itself.)
  2. The generated fingerprints, along with details such as author, location, time, and equipment, would be stored in a secure, signed and immutable manner. An example could be a robust blockchain where a smart contract logs the hashes and the additional details of the video.
  3. On playback, videos are rehashed and compared to the fingerprints retrieved from the immutable store.
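
The three steps above can be sketched end to end. This is a toy illustration, not Amber’s implementation: the “immutable store” is a plain in-memory dict standing in for a blockchain, and the device signature uses an HMAC with a per-device secret as a simplified stand-in for a real asymmetric scheme such as Ed25519:

```python
import hashlib
import hmac

# Stand-in for an immutable store (e.g. a smart contract logging hashes).
LEDGER = {}

def record(video_bytes: bytes, device_key: bytes, metadata: dict) -> str:
    """Steps 1-2: fingerprint at capture time, sign, and log to the store."""
    fp = hashlib.sha256(video_bytes).hexdigest()
    signature = hmac.new(device_key, fp.encode(), hashlib.sha256).hexdigest()
    LEDGER[fp] = {"signature": signature, **metadata}
    return fp

def verify(video_bytes: bytes, fp: str, device_key: bytes) -> bool:
    """Step 3: rehash at playback and compare against the stored record."""
    entry = LEDGER.get(fp)
    if entry is None:
        return False  # no record: authenticity cannot be guaranteed
    if hashlib.sha256(video_bytes).hexdigest() != fp:
        return False  # content has been altered since recording
    expected = hmac.new(device_key, fp.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(entry["signature"], expected)
```

In a real deployment the signing key never leaves the device’s secure hardware, and verification uses the device’s public key rather than a shared secret; the control flow, however, is the same.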

If the fingerprints do not match, the video has been altered.

If the fingerprints match, the video is authentic and unaltered.

The results are binary: match or non-match. Because this approach relies on a cryptographic hash function, it is computationally infeasible to maliciously alter a video so that it produces the same fingerprint while still being playable.

The net result would be analogous to Transport Layer Security technology used in browsers, creating a “truth layer” for media files.

It is critical that these fingerprints are stored in an immutable yet transparent database so multiple stakeholders can have confidence in the veracity of the video. If not, a bad actor could alter both the video and the original fingerprints to make an altered video seem authentic. Or they may alter just the fingerprints themselves, to sow doubt as to the legitimacy of a genuine video.

For authenticated video to become commonplace in the relevant categories of video, such as news or video evidence — that is, most video does not need to be authenticated and signed — the key challenge is bringing on board a number of key participants. These groups include camera manufacturers (including smartphone makers) and distributors such as traditional TV and radio broadcasters and new media platforms like Twitter, Facebook and YouTube.

In a world where we should be skeptical of our own eyes and ears (and what they are interpreting), authentication is a system design where truth is baked in at a foundational layer.

Trust through design:

Of course, video authentication, or fake video detection, won’t stop:

  • Teenagers bullying each other;
  • News editors selecting shots that skew a story in a certain way;
  • Conspiracy theorists claiming evidence has been fabricated.

The choices in a system’s design could, though, prevent the spread of an altered video, or allow a viewer to compare the edited footage they’re watching against the source footage, removing some of the fuel from the flames of those who seek to discredit or misinform.

If the Rodney King beating happened today, would people doubt the legitimacy of the video, dismissing it as a deepfake? The existence of deepfakes will cast a shadow over authentic videos.

Authentication technology is not required for the latest blockbuster or the latest image filter on our social media posts. But fact-based content, especially where the stakes are high, should be recorded with trust taken into account from the outset.

At Amber, we have created a video authentication system, or “truth layer”, for video where trust is built into the foundational layer. This system includes a video recording app, an immutable blockchain record storing fingerprints and an audit trail, and a site to play back reconfirmed videos.

Parallels with the Apple App Store

Our video authentication system mirrors the Apple App Store in a number of important respects. An iPhone (unless jailbroken) can only download and run apps from the official App Store. Apps on the App Store have been assessed for compliance with security policies. Each app was submitted by a verified developer, vetted at a stringency level based on whether the submitter is an individual developer or a company.

Apple has created a security ring around apps in its ecosystem and thus instilled confidence in its customers of the safety and efficacy of the applications they choose to download from the App Store.

If you download an app on your iPhone, you can be almost certain that it will function as claimed in its App Store description. The app (and its developer) were authenticated in the Apple ecosystem. Your phone is part of that ecosystem. And there is far more trust within the ecosystem.

Authenticate + Detect: a hybrid approach.

Just as a transaction on an e-commerce site is passed through fraud-detection software even after authentication checks have passed, Amber takes a combined approach: we not only generate fingerprints of content, but also deploy the latest good-AI developments to detect fakes. We perform various analyses on videos uploaded to our platform, providing a measure not only of the likelihood that a video was altered, but also of where in the image the alteration likely occurred.

Video authentication is an important approach, but it is not foolproof on its own: for example, a camera with video fingerprinting technology could be used to film a screen playing a manipulated video. Using software to detect fakes and manipulations is therefore an important complement to authentication.
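
A hybrid pipeline might run the authentication check first and carry an AI detection score alongside it. The sketch below is illustrative only: `detect_fake` is a placeholder for any detection model, and the ledger is simply a set of known-good fingerprints:

```python
import hashlib

def assess(video_bytes: bytes, ledger: set, detect_fake) -> dict:
    """Hybrid check: authenticate against a fingerprint ledger, and run
    AI detection as a complement.

    detect_fake is a placeholder scoring function returning the estimated
    probability (0.0-1.0) that the video is synthetic or manipulated.
    """
    fp = hashlib.sha256(video_bytes).hexdigest()
    authenticated = fp in ledger
    # Detection runs even for authenticated video: a genuine camera could
    # have filmed a screen that was itself playing a manipulated video.
    return {
        "fingerprint": fp,
        "authenticated": authenticated,
        "fake_score": detect_fake(video_bytes),
    }
```

The design choice here is that the two signals stay independent: authentication answers “is this the file that was recorded?”, while detection answers “does the content itself look manipulated?”.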

At Amber, we strongly believe that one of the greatest impacts on humanity has been the adoption of the scientific method and its premise of evidence-based conclusions. We are on a mission to protect the truth of aural and visual evidence.

Deepfakes, and malicious AI in general, can cause great harm in the near future. We need to preempt the challenge today and video authentication is a critical piece of the solution.

Want to participate?

  1. Sign up to the wait list to try Amber Authenticate: https://app.ambervideo.co/
  2. Contact us to find out if our verification products could be right for you: https://ambervideo.co
  3. Leave us your thoughts below or send us a message: https://twitter.com/ambervid

About | Amber: video veracity, at scale.

Amber is developing the “truth layer” for video. Amber Authenticate fingerprints source videos and tracks their provenance. Amber Detect analyzes videos of unknown origin using advanced AI and deepfake-detection systems.

Contact us if you would like to find out more about what we do and how you can seamlessly integrate Authenticate and Detect products into your workflow for frictionless video veracity, at scale.

Thank you to Roderick Hodgson, Shaan Puri, and Sikander Mohammed Khan for all your input and feedback on this post.

