The Danger of Deepfakes: Responding to Bobby Chesney and Danielle Citron

Herb Lin

Cybersecurity & Tech

Tuesday, February 27, 2018, 7:00 AM

Bobby Chesney and Danielle Citron have painted a truly depressing picture of a future in which faked video and audio cannot be distinguished from the real thing. And I think they are right to be depressed about it, though I want to discuss a possible technological solution that they did not address.

(Photo: Youtube/Analytics India Magazine)

Their post addresses the present infeasibility of technology that can distinguish between fake and real video and audio, and it suggests that it will be a long time, if ever, before such technology is possible. I agree with that assessment, but there is another way to obtain many of the benefits of such technology.

Consider the technology of digital signatures, which enable a party to sign a digital object in a way that proves he or she was the one who signed it. Now imagine that a vendor produces cameras and sound recorders (i.e., devices) that digitally sign every video or audio file the user creates. Further, the vendor keeps records so that the purchaser of any given device is known in the future; that is, the device and its public signature key are registered in a database accessible to anyone. Any video or audio file released in the future, accompanied by a digital signature, could then be associated with a specific purchaser.

Purchase of these devices would be strictly voluntary, and many other devices would be available without signature-generation capability. Likely purchasers would be parties that have a stake in proving to the world that they indeed were responsible for capturing specific videos and audios released to the public (such as news organizations).

Viewers would then be exposed to a mix of video and audio clips, some with a valid digital signature and some without. Viewers could trust with a high degree of confidence that a given video or audio clip, if and only if signed, had actually been recorded on a specific device and, assuming the device had not been stolen from its registered owner, recorded by that owner. This scheme does not produce 100-percent confidence: a party seeking to pass off fake video or audio under a valid signature could physically obtain the device associated with that signature and compromise its electronics to feed the fake material to the signature engine inside. But the scheme would certainly provide more confidence in the authenticity of a signed video or audio clip than is possible for one not accompanied by a signature traceable to a registered device.
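
To make the mechanics concrete, here is a minimal sketch in Python of the sign, register, and verify steps described above. It is an illustration under stated assumptions, not a description of how any real camera works. To stay within the standard library it uses a Lamport one-time signature built from SHA-256, which is a genuine signature scheme but one in which each key pair can safely sign only a single file; an actual device would presumably use a reusable scheme and keep the private key in tamper-resistant hardware. The registry layout, the device ID "CAM-0001", the owner name, and the function names are all hypothetical.

```python
import hashlib
import secrets

BITS = 256  # length of a SHA-256 digest in bits


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _digest_bits(message: bytes) -> list:
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(BITS)]


def generate_keypair():
    """Lamport key pair: 256 pairs of random secrets and the hashes of those secrets."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, message: bytes) -> list:
    """Reveal one secret per digest bit. Safe for ONE message per key pair only."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != BITS:
        return False
    bits = _digest_bits(message)
    return all(_h(signature[i]) == public[i][bits[i]] for i in range(BITS))


def attributed_owner(registry, device_id: str, media: bytes, signature):
    """Return the registered owner if the signature checks out, else None."""
    entry = registry.get(device_id)
    if entry is None or not verify(entry["public_key"], media, signature):
        return None
    return entry["owner"]


# Hypothetical public registry kept by the vendor: device ID -> owner and public key.
private_key, public_key = generate_keypair()
registry = {"CAM-0001": {"owner": "Example News Desk", "public_key": public_key}}

clip = b"raw bytes of a recorded video clip"
signature = sign(private_key, clip)  # done inside the device at capture time

print(attributed_owner(registry, "CAM-0001", clip, signature))               # Example News Desk
print(attributed_owner(registry, "CAM-0001", clip + b" edited", signature))  # None
```

Note what the check does and does not establish: a successful verification ties the exact bytes of the clip to a registered device, and any alteration of the clip makes verification fail. It says nothing about whether the device itself was stolen or tampered with, which is the residual risk noted above.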

The idea described here has been implemented in certain cameras made by Canon and Nikon, though both implementations have been cracked—ironically, by a well-known Russian company. Nonetheless, the fundamental concept is sound, and both Canon and Nikon recognized a commercial opportunity in the value of authenticating images for evidentiary purposes.

That’s my ray of hope. But an understanding of psychology dims that hope pretty fast (though I hope that someone can tell me why I am wrong). To wit, Chesney and Citron argue that cognitive biases underpin resistance to facts and that deepfakes are likely to encourage such resistance. Indeed, that’s the very purpose of most of the claims that verifiable facts constitute “fake news.”

But faked video and audio present a harder problem than fake news. Deepfakes will be more likely to arouse strong emotion than simple text on a page, making it less likely that humans so exposed will care at all about the sourcing of these video and audio inputs. (Explanations for the stronger emotional arousal for sensory inputs vis-a-vis textual inputs vary, but one common explanation is that human beings evolved to treat sensory inputs as warnings of immediate danger, whereas textual inputs must first go through sensory input channels and are subsequently processed in higher-order parts of the brain that are not responsible for generating warnings of danger.)

A deepfake video may depict children being held in the basement of a pizza parlor for pedophilic purposes with Hillary Clinton supervising their captivity, and even if it is debunked, many viewers will dismiss the debunking with a claim such as “well, she would do it if she could.” That is, the video may not be truthful, but it would certainly be plausible to a certain segment of viewers, and the difference between plausible and actual would matter very little in this context.

So we have to hope that there remain in the population politically significant numbers of people who do care about reality as the necessary and essential foundation for discourse. For those people, authentication technologies such as the one I described above may matter a great deal. The question is whether members of this group outnumber those for whom truth is no longer a consideration.
