SIMON ADLER: All right. Hello? Hello? Hello? Hello? Hello? Hello? I can hear you. They can't hear me.
ROBERT KRULWICH: We can hear you. We can hear you.
SIMON: But you can't.
ROBERT: But what we can also hear is us twice -- us twice.
JAD ABUMRAD: Hey, I'm Jad Abumrad.
ROBERT: I'm Robert Krulwich.
JAD: This is Radiolab. And today ...
ROBERT: Oops. Oops. You don't -- you don't hear -- you don't hear us ...
JAD: We have a story about how the echoes of you can go out into the world and come back and bite you, and all of us really, in the butt.
ROBERT: Oh, wait. Maybe we're fine now.
SIMON: Is my echo gone?
SIMON: Okay. Okay, we're good.
JAD: We're good?
ROBERT: And it comes to us from our producer, Simon Adler.
SIMON: Yeah, Nick. Hello!
NICK BILTON: I'm sorry.
SIMON: Okay, so, this is Nick Bilton.
NICK BILTON: My name is Nick Bilton. I'm a special correspondent for Vanity Fair.
SIMON: And his beat, you could say, is trying to predict the future of technology.
NICK BILTON: To look into the future, and into this kind of crystal ball, and try to predict what the next 5, 10, 15 years would look like for the media industry.
SIMON: Do you have a good batting record? Like did you -- did you call some big ones?
NICK BILTON: Oh, yeah. You know, phones in our pockets that would be like super computers, that social media would drive news not newspapers and so on, and things like that. So it's been pretty good.
SIMON: I reached out to you because I came across this article that you wrote, an article that sent shivers down my spine. And I'm not one to typically be given shivers by articles. So I guess, how did you stumble into all of this? And where does this start for you?
NICK BILTON: So I was sitting around with some friends in my living room, and a friend of mine mentioned, "Oh, did you see this thing that Adobe put out recently?"
[ADOBE SPOKESPERSON: We live in a time when more people than ever before believe that they can change the world.]
SIMON: And that conversation led Nick to a video. A video online of the Adobe Max 2016 conference. There are tons and tons of people in the audience.
[ADOBE SPOKESPERSON: This is amazing!]
SIMON: And up in front of them, it looks like the stage of an Apple product launch, but sort of beach-themed?
JAD: Why beach?
SIMON: I have absolutely no idea.
[KIM CHAMBERS: It's a little TMI, don't you think?]
SIMON: There are two hosts that are sitting in these, like, lifeguard chairs.
[JORDAN PEELE: Say you, say me!]
SIMON: Comedian Jordan Peele.
JAD: Jordan Peele as in Key & Peele, Jordan Peele?
SIMON: Yes. And then the other host is this woman Kim Chambers, who is a marathon swimmer and an Adobe employee. And then ...
[KIM CHAMBERS: Please welcome to the stage ...]
SIMON: On walks ...
[KIM CHAMBERS: Zeyu.]
SIMON: Zeyu Jin.
[ZEYU JIN: Hello, everyone!]
SIMON: Young guy. Glasses.
[ZEYU JIN: You guys have been making weird stuff online! With photo editing!]
SIMON: And he says Adobe is known for Photoshop, we're known for editing photos and doing magical things visually.
[ZEYU JIN: Well, we'll do the next thing today. Let's do something to human speech.]
SIMON: Pulls up a screen on a Mac computer.
[ZEYU JIN: Well, I have obtained this piece of audio where there's Michael Key talking to Peele about his feeling after getting nominated.]
SIMON: Keegan-Michael Key had been nominated for an Emmy, and he and Jordan Peele were talking about it.
[ZEYU JIN: There's a pretty interesting joke here. So let's -- let's just hear it.]
[CLIP OF KEEGAN-MICHAEL KEY: I jumped on the bed, and I kissed my dogs and my wife in that order.]
SIMON: Not a bad joke.
[ZEYU JIN: So let's do something here. Okay, so suppose Michael Key wants to send this audio to his wife.]
SIMON: In other words, what if Keegan-Michael Key was feeling like, "That was a little bit rough on my wife, that was a little bit mean," you know, maybe he wanted to go and rewrite history and say that he kissed his wife before the dogs.
[ZEYU JIN: So he actually wants his wife to go before the dogs. So, okay. So what do we do easily?]
SIMON: So Zeyu clicks a button, and the program automatically generates a transcript of the audio and projects it up on the screen behind him. You know, just text of what Keegan-Michael Key said.
[ZEYU JIN: Okay. Let me zoom in a little bit.]
SIMON: And then ...
[ZEYU JIN: Copy. Paste.]
SIMON: He just highlights the word "wife," and pastes it over in front of "dogs."
[ZEYU JIN: Okay, let's listen to it.]
SIMON: Clicks play.
[CLIP OF KEEGAN-MICHAEL KEY: And I kiss my wife and my dogs.]
JAD: Oh, so he was able to move the -- edit the audio by moving the text around in the text box.
SIMON: Yes, exactly.
JAD: Okay. Well, that's kind of cool.
SIMON: Kind of impressive.
[ZEYU JIN: Wait.]
SIMON: But then ...
[ZEYU JIN: Here's more. Here's more. We can actually type something that's not here. So ...]
JAD: Wait, wait, what?
SIMON: Just hang on. Just hang on.
[ZEYU JIN: I heard that actually that on that day, Michael actually kissed our Jordan. So ...]
[JORDAN PEELE: Sorry?]
[ZEYU JIN: To recover the truth, let's do it.]
SIMON: He goes back into that little word box.
[ZEYU JIN: So let's remove the word "my" here.]
[KIM CHAMBERS: Your secret's out, Jordan.]
[ZEYU JIN: And also just type the word "Jordan."]
SIMON: So he types it out J-O-R-D-A-N. And just to be clear, Keegan-Michael Key did not say Jordan anywhere in this clip.
[ZEYU JIN: And here we go.]
[CLIP OF KEEGAN-MICHAEL KEY: And I kissed Jordan and my dogs.]
JAD: Wait, he just typed in a word that the guy never said, and it made the guy say the word that he never said as if he actually said it.
[ZEYU JIN: Well ...]
[JORDAN PEELE: You're a witch!]
SIMON: Jordan jumps out of his lifeguard chair, starts sort of stomping around the stage.
[JORDAN PEELE: You're a demon.]
[ZEYU JIN: Oh, yeah. I'm magic. And the last magic I'm gonna show you guys is we can actually type small phrases. So let's say, okay so we remove ...]
SIMON: He deletes the words "my dogs," and he types "three times."
[KIM CHAMBERS: Oh!]
[ZEYU JIN: And playback!]
[CLIP OF KEEGAN-MICHAEL KEY: And I kiss Jordan three times.]
[KIM CHAMBERS: Oh!]
ROBERT: All right, wait a sec. You're saying that Keegan-Michael Key never said, ever said "Jordan," never said "three," never said "times," never said any of those words, and somehow just from the typing in of it, the guy is now saying them and we're hearing them in his voice. That's what just happened?
SIMON: Yep. That is exactly what the demo claims.
NICK BILTON: It's essentially Photoshop for audio.
SIMON: Nick Bilton again.
NICK BILTON: You could take as little as 20 minutes of someone's voice, type the words and it creates in that voice that sentence.
SIMON: With just 20 minutes of the guy talking.
NICK BILTON: Yes.
ROBERT: But how? How in heaven do you do this?
SIMON: And so we're here at Adobe. What exactly do you do here?
DURIN GLEAVES: Sure. I'm the product manager for audio.
SIMON: This is Durin Gleaves. I flew out to Seattle and tracked him down to ask him exactly that question.
DURIN GLEAVES: So essentially what it does is it does an analysis of the speech and it creates models. And it basically ...
SIMON: And he explained to me that this program, which they call VoCo by the way, what it does is it takes 20 minutes, or actually 40 if you want the best results, of you talking and it figures out all of the phonetics of your speech, all of the sounds that you make.
DURIN GLEAVES: Finds each little block of sound and speech that is in the recordings.
SIMON: Chops them all up. And then when you go and type things in ...
DURIN GLEAVES: It will recombine those into that new word.
JAD: But what if it encounters a sound that I've never made?
SIMON: Well, the theory is in 40 minutes of speech, which is the amount they recommend you feed in, you're gonna probably say just about every sound.
SIMON: In the English language.
JAD: So if -- really? So, like, phonetically I go -- I run through the gamut in 40 minutes?
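The cut-and-recombine idea Durin describes can be sketched in a few lines. Everything below is invented for illustration — the phoneme labels and "snippet" placeholders stand in for real audio, and the actual VoCo builds statistical models from 20-40 minutes of speech rather than a simple lookup table:

```python
# Toy sketch of concatenative speech editing: chop recorded speech
# into labeled sound units, then reassemble them into words the
# speaker never said.

def build_unit_inventory(transcribed_clips):
    """Map each phoneme to the audio snippets where the speaker made that sound."""
    inventory = {}
    for phonemes, snippets in transcribed_clips:
        for phoneme, snippet in zip(phonemes, snippets):
            inventory.setdefault(phoneme, []).append(snippet)
    return inventory

def synthesize(word_phonemes, inventory):
    """Rebuild a new word by concatenating a stored snippet of each phoneme."""
    return [inventory[p][0] for p in word_phonemes]

# Two words actually spoken, chopped into (phoneme, snippet) pairs.
clips = [
    (["K", "IH", "S"], ["k0", "ih0", "s0"]),   # from the word "kiss"
    (["D", "AO", "G"], ["d0", "ao0", "g0"]),   # from the word "dog"
]
inv = build_unit_inventory(clips)

# "dig" was never recorded, but all of its sounds were.
print(synthesize(["D", "IH", "G"], inv))   # ['d0', 'ih0', 'g0']
```

The catch Jad is about to raise is visible here too: if a phoneme never appears in the recordings, the lookup fails — hence the recommendation to feed in enough speech to cover every sound.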
SIMON: Well, and like, what would you, or what are you hoping people would use a product like VoCo for?
DURIN GLEAVES: So for the video production tools and for what Audition is used for, a lot is dialogue editing.
SIMON: The whole idea Durin said is to help people that work in movies and TV.
DURIN GLEAVES: A lot of our customers record great audio on set. The actors and the dialogues and everything. And when they come back, if sometimes there's a mistake or they make a change.
SIMON: Like the actor on set said "shoe," but what he was pointing at was obviously a boot.
DURIN GLEAVES: And right now there's -- they do what's called ADR. They'll bring the actor in, they'll re-record some lines and they'll try and drop that into the video.
SIMON: But you're not using the same microphones, you're not in the same location, the actor might be sick that day so his voice sounds different.
DURIN GLEAVES: And things -- a lot of times you can really hear that stand out in productions if they don't get it just right.
SIMON: But with VoCo, you just delete the word "shoe," type in "boot," and boom! There it is.
DURIN GLEAVES: Using the same source media and the same characteristics, and have it just sound seamless and natural.
SIMON: And so it -- it's going to be a sort of -- the hope is that it will make the lives of professional post-production editors easier the world over.
DURIN GLEAVES: That's our hope right now, yeah.
SIMON: But that's not exactly ...
NICK BILTON: It's -- I mean, it's ...
SIMON: ... what Nick Bilton thought when he saw this video.
NICK BILTON: It could be Donald Trump's voice, or Vladimir Putin. So I saw that and I thought, "Wow, if -- imagine if audio clips start getting shared around the internet as fake news of a fake conversation between, you know, Vladimir Putin and Paul Manafort about trying to get Trump into the White House or something like that."
NICK BILTON: And I was like, "Whoa, this is -- this is scary stuff."
SIMON: But we're just getting started. In the words of John Raymond Arnold, played by Samuel L. Jackson in the movie Jurassic Park in his own voice.
[CLIP OF SAMUEL L. JACKSON: Hold on to your butts.]
SIMON: Things are about to get a lot crazier.
SIMON: So forget voices for a second, because now ...
SIMON: 1 2 3 4 5. 1 2 3 4 5.
SIMON: It's face time.
SIMON: All right, we are at the Paul G. Allen Center at the University of Washington in Seattle.
SIMON: So I left Adobe and went across town to talk to the head of the Grail Lab.
IRA KEMELMACHER-SHLIZERMAN: Hello.
SIMON: Hello, Ira?
IRA KEMELMACHER-SHLIZERMAN: Yeah.
SIMON: Simon. Very nice to meet you.
IRA KEMELMACHER-SHLIZERMAN: Nice to meet you.
SIMON: Dr. Ira Kemelmacher-Shlizerman.
IRA KEMELMACHER-SHLIZERMAN: So I'm a professor in the computer science department at the University of Washington, and also work at Facebook.
SIMON: Can I just have you come a little closer?
SIMON: Okay, just to back up for a second. When Nick first saw the VoCo demonstration, he started to wonder, okay like, how could this be used down the road?
NICK BILTON: My original thesis was oh well, maybe what will happen is that you will be able to create 3D actors, just like you did in Star Wars.
SIMON: Then join it with the VoCo stuff to ...
NICK BILTON: Create a fake Hillary Clinton and, you know, Donald Trump having a conversation or making out, or whatever it is you want to do.
SIMON: And that led him to investigate the type of work that Ira does.
SIMON: So I've been using these terms like facial reenactment and facial manipulation. Are those -- are those the right words? And then what the hell do these words mean?
IRA KEMELMACHER-SHLIZERMAN: Yeah. So I mean, it's all -- it's all a way of animating faces. And it started from the movies, right?
[AVATAR CLIP: The concept is to drive these remotely-controlled bodies called avatars.]
SIMON: Think like the aptly-named movie Avatar. Or ....
[TOY STORY CLIP: Sergeant?]
[TOY STORY CLIP: Yes?]
SIMON: Going a little further back.
[TOY STORY CLIP: No sign of intelligent life anywhere.]
SIMON: Toy Story. And to make the characters come alive, what you need is the expressions of the actors playing them.
IRA KEMELMACHER-SHLIZERMAN: This is a movie space. It means that you will bring a person to a studio.
SIMON: Then you cover their face with these sticky sensory marker things.
IRA KEMELMACHER-SHLIZERMAN: And then you will spend hours, hours, hours capturing the person's little dynamics.
SIMON: Like smiles.
IRA KEMELMACHER-SHLIZERMAN: Open mouth.
IRA KEMELMACHER-SHLIZERMAN: Closed mouth.
SIMON: No teeth.
IRA KEMELMACHER-SHLIZERMAN: Sad.
SIMON: Surprised. Disturbed.
IRA KEMELMACHER-SHLIZERMAN: Things like that.
SIMON: Angry. Bloated. Frustrated.
IRA KEMELMACHER-SHLIZERMAN: Yeah.
SIMON: And from that, they create a virtual character capable of emoting all those expressions. And to make that character believable, the animators sometimes have to model a bone structure and muscles. And as you can imagine, this can get very, very expensive. And so what people like Ira started to wonder was like, can this be done on a budget? So she and others in the field started feeding videos of faces into computers, and trained those computers to break down the face into a series of points.
IRA KEMELMACHER-SHLIZERMAN: Our models are about 250 by 250.
SIMON: That is 62,500 points on one human face.
IRA KEMELMACHER-SHLIZERMAN: And once we know that, right, we can track the points.
SIMON: So once you can track how my face moves through a video clip by these 250 by 250 points, what can you then do with that information?
IRA KEMELMACHER-SHLIZERMAN: Well, I can apply the points on the face on a different model of a different person.
SIMON: Now, this is -- this is where things get quite strange, because instead of being able to map all of your facial movements onto a computer-generated virtual character or person, what Ira and others in this field of facial reenactment have figured out how to do, is to map your facial movements onto a real person. A pre-recorded real person.
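The core trick behind that mapping can be sketched very simply: measure how the source actor's tracked points move away from their resting positions, then shift the target face's points by the same amount. Real systems like Face2Face fit dense parametric face models (Ira's "250 by 250" grid is 62,500 points); in this illustration each face is just two mouth-corner landmarks with made-up coordinates:

```python
# Minimal sketch of expression transfer between two tracked faces.
# Each landmark is an (x, y) point; all coordinates are invented.

def transfer_expression(src_neutral, src_current, tgt_neutral):
    """Move each target landmark by the source actor's displacement."""
    return [
        (tx + (cx - nx), ty + (cy - ny))
        for (nx, ny), (cx, cy), (tx, ty) in zip(src_neutral, src_current, tgt_neutral)
    ]

src_neutral = [(0.0, 0.0), (1.0, 0.0)]   # source mouth corners at rest
src_current = [(0.0, 0.2), (1.0, 0.2)]   # source smiles: corners lift
tgt_neutral = [(5.0, 5.0), (6.0, 5.0)]   # target face at rest

print(transfer_expression(src_neutral, src_current, tgt_neutral))
# [(5.0, 5.2), (6.0, 5.2)] -- the target face "smiles" too
```

The hard parts the sketch skips — tracking points robustly from a webcam, re-rendering the mouth interior, blending the result back into the video frame — are exactly what the research systems spend their effort on.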
JAD: What? What does that even mean?
ROBERT: Yeah, how does that work?
SIMON: Well, the best example of this is this piece of software that Nick showed us.
NICK BILTON: This software that I found from these university students.
SIMON: Called Face2Face.
[VIDEO CLIP: We present a novel, real-time facial reenactment method that works with any commodity webcam.]
SIMON: There's a video demo of this, and when you open it up this very monotone voice comes in saying ...
[VIDEO CLIP: Since our method only uses RGB data for both the source and target actor ...]
SIMON: And you're like, what the heck is this? And this screen pops up.
[VIDEO CLIP: Here we demonstrate our method in a live set-up.]
SIMON: On the right, you've got this heavyset man. Goatee, spiked hair.
[VIDEO CLIP: On the right, a source actor is captured with a standard webcam.]
SIMON: He's arching his eyebrows, he's pursing his lips, he's opening his mouth widely.
JAD: Sort of like if you're making funny faces for a two-year-old kind of thing?
SIMON: Yeah. And then ...
[VIDEO CLIP: This input drives the animation of the face in the video shown on the monitor to the left.]
SIMON: On the left, you've got this Dell computer screen displaying a CNN clip of George Bush. This is a real clip of Bush back from 2013. And his face is there looking right at the camera, occupies most of that screen.
[VIDEO CLIP: A significant difference to previous methods ...]
SIMON: And what you start to notice is, when the man with the goatee smiles, George Bush in the CNN clip also smiles. And when the man raises his eyebrows, George Bush raises his eyebrows. And you realize this man is controlling George Bush's face.
JAD: Wait, so this is a guy in -- in the present controlling a past George Bush? A real George Bush from an old video clip?
SIMON: Okay. I pulled up a video for you here.
ANDREW MARANTZ: Okay, cool.
SIMON: And a little while back when we were just learning about this, we happened to have our friend Andrew Marantz who writes for the New Yorker in the studio.
ANDREW MARANTZ: So that is George Bush's face.
ANDREW MARANTZ: What? Oh God! Oh God! That's terrifying. His -- okay. So yeah, I cannot stop watching George Bush's face. Oh, they're doing it with Putin now. Holy God! So I just have a guy just sort of going [mouth noises], and then that's what Putin is doing.
ANDREW MARANTZ: Uh-oh. Now it's Trump.
NICK BILTON: You know I mean, those videos online had my mouth agape.
SIMON: Again, Nick Bilton.
ROBERT: This is -- this is a form of puppetry where ...
NICK BILTON: Your face is the -- is the puppeteer. And the only thing is that George W. Bush is the puppet.
ROBERT: So I sit in front of a camera, I smile and the business is taken care of?
NICK BILTON: That's real time. This isn't like you have to render some software on your computer. It's literally you download a clip or you take a clip from cable news and you turn on your webcam, and however long it takes you to do it you're done. It's the same as just shooting a video on your phone.
JAD: What is this for?
SIMON: So what are the applications of this?
IRA KEMELMACHER-SHLIZERMAN: I want to be able to help -- help develop tele-presence.
SIMON: This is Ira again.
IRA KEMELMACHER-SHLIZERMAN: So ...
IRA KEMELMACHER-SHLIZERMAN: Tele-presence, yeah.
SIMON: What does that mean?
IRA KEMELMACHER-SHLIZERMAN: So for example, so my mom lives in Israel and I'm here, and wouldn't it be cool if I could have some -- it's kind of crazy, right? But if I could have some kind of hologram of her sitting on my couch here, and we can have a conversation.
SIMON: And going one step further, one of Ira's colleagues, a guy by the name of Steve Seitz.
STEVE SEITZ: I'm a professor at the University of Washington, and I also work part-time at Google.
SIMON: He told me that they see this technology as like a building block that could one day be used to essentially virtually bring someone back from the dead.
STEVE SEITZ: I just think this technology, combined with virtual reality and other innovations could help me, you know, just be there in the room with Albert Einstein or Carl Sagan. You know, that's sort of the motivation.
ROBERT: That's what they want to do?
JAD: That's the motivation?
ROBERT: Talk to ghosts?
SIMON: Well, for them. Yes. And when I was talking to some folks who work in commercials, they're developing their own version of this. And the idea is that they're gonna make a million or a billion dollars off of this, because say you bring, I don't know, Jennifer Aniston in to film some makeup commercial. And in the makeup commercial, in English she says, "So come and buy this product. This is the best sort of whatever product around." Right now you've got China, which is a booming market. You maybe want to market things to China, and you'd really like to be able to use Jennifer Aniston. Problem is, Jennifer Aniston doesn't speak Mandarin. So either you use the same audio clip and you have someone come in and speak Mandarin over her, and the lips don't line up. Or you have to hire a Mandarin-speaking actor to come in and do the part of Jennifer Aniston. With this technology, all you have to do is record Jennifer Aniston once. You can hire a Mandarin speaker, and the Mandarin speaker's voice will be coming out of Jennifer Aniston's mouth as if she had said it in front of the camera.
JAD: Her lips would be moving as if she were a perfect Mandarin speaker?
SIMON: Exactly. Exactly.
SIMON: I think that part of it is actually incredible.
JAD: That's -- that's amazing.
JAD: Oh, my God. I'm amazed, and completely frightened by what you're telling me.
SIMON: And that's the whole point of what Nick was writing about that that gave me shivers. That someday, if you join the video manipulation with the VoCo voice manipulation ...
NICK BILTON: You're -- you're the ultimate puppeteer. You can create anyone talking about anything that you want.
SIMON: In their own voice.
NICK BILTON: And having any kind of emotion around it.
SIMON: And you'd have it right there for everyone to see in video.
NICK BILTON: And all you need to do is take that and put it on Twitter or Facebook. And if it's shocking enough, minutes later it's everywhere.
SIMON: Like, the timing of you guys making this thing, and then this explosion of fake news. Like, how do you guys think about -- about how it could be used for nefarious purposes?
IRA KEMELMACHER-SHLIZERMAN: Yeah, it's a good question.
SIMON: Again, Ira Kemelmacher-Shlizerman.
IRA KEMELMACHER-SHLIZERMAN: I feel like when every technology is developed, then there is this danger of with our technology, you -- you can create fake videos and so on. I don't want to call it fake videos, but like, to create video from audio, right?
SIMON: But they are fake videos.
IRA KEMELMACHER-SHLIZERMAN: Yeah, yeah. But the way that I think about it is that, like scientists are doing their job and showing -- like, inventing the technology and showing it off, and then we all need to, like, think about the next steps, obviously. I mean, people should work on that, and the answer is not clear. Maybe it's in education. Maybe it's every video should come up with some code now, that this is -- this is, like, authentic video or authentic text. You don't believe anything else. I mean, yeah.
SIMON: But like, maybe it was the timing more than anything, but I saw this video and it really felt like, "Oh my God! Like, America can't handle this right now." Like, we're in a moment where -- where truth seems to be sort of a -- an open disc -- what is true is -- has become an open discussion. And this seems to be adding fuel on the fire of sort of competing narratives in a way that I find troubling. And I'm just curious that you don't.
IRA KEMELMACHER-SHLIZERMAN: I think that -- I think that people -- if people know that such technology exists, then they will be more skeptical. My guess, I don't know. But if people know that fake news exists, if they know that fake texts exists, fake videos exist, fake photos exist, then everyone is more skeptical in what they read and see.
SIMON: But like, a man in North -- I think he was from North Carolina, believed from a fake print article that Hillary Clinton was running a sex ring out of a pizza parlor in DC, which is, like, insane. This man believed it and showed up with a gun. And if people are at a moment where they are willing to believe stories as ludicrous as that, like, I don't expect them to wonder if this video is real or not.
IRA KEMELMACHER-SHLIZERMAN: So what are you asking?
SIMON: I'm asking -- well, I'm asking, do you -- are you afraid of the power of this? And if not, why?
IRA KEMELMACHER-SHLIZERMAN: Just -- I'm just giving my -- I don't know. It just -- I'm answering your questions, but I'm a technologist, I'm a computer scientist. So not really, because I know how to -- and I know that -- because I know that this technology's reversible. I mean, nobody -- well, there is not -- not worried too much.
SIMON: Have you seen these videos? Otherwise, I can text it.
HANY FARID: I have, yeah.
HANY FARID: Yeah, I have.
SIMON: And as we were feeling worried, and more than that surprised that the folks making these technologies weren't, we decided to do a sort of gut check to see if we were totally off base and get in touch with one of the guys who's on the front lines of this.
ROBERT: Can you describe what was going through your head when you were watching Bush's face?
HANY FARID: I can tell you exactly what I was thinking. I was thinking, how are we gonna develop a forensic technique to detect this?
SIMON: This is Hany Farid.
HANY FARID: I am a professor of computer science at Dartmouth College.
SIMON: He's sort of like a Sherlock Holmes of digital misdeeds, which means that he spends a lot of time sitting around looking at pictures and videos ...
HANY FARID: Trying to understand where has this come from, has it been manipulated and should we trust it?
SIMON: He's done work for all sorts of organizations.
HANY FARID: The AP, The Times, Reuters ...
SIMON: Who want to know if, say, a picture is fake or not.
HANY FARID: They often will ask me, you know, particularly when -- like, this just happened actually yesterday. Images came out of North Korea. And every time images come out of these regimes where there's a history of photo manipulation, there are real concerns about this. So I was asked to determine if they'd been manipulated in some way, and if so, how had they been manipulated.
SIMON: And how -- how the heck would you do that?
HANY FARID: Well, every time you manipulate data, you're gonna leave something behind.
SIMON: So let's say you do some funny business to a photo. You might create some noticeable distortion in the picture itself, but you also might distort the data.
HANY FARID: And we're in the business of basically finding those distortions in the data.
SIMON: For example, imagine he gets sent a photo. It's probably a jpeg.
HANY FARID: Jpeg, which now is 99% of the image formats that we see out there, is what is called a lossy compression scheme.
SIMON: Just a fancy way to say that when a photo is taken and stored as a jpeg, the camera, you know, just to save space, throws a little bit of the data away.
HANY FARID: So for example, if I went out to the Dartmouth Green right now and took a picture of the grass.
SIMON: The camera isn't gonna store all those millions of little variations of green hidden in the grass, because that would be just a huge file. It's going to save space by throwing some of those greens away.
HANY FARID: You just don't notice if it changes, like, a lot or a little bit less than that. It's just grass as far as you can tell.
SIMON: Now, here's Hany's trick. Every camera has a subtly different palette of greens that it's going to keep and greens that it's going to throw away.
HANY FARID: This varies tremendously from device to device. An iPhone compresses the image much more.
SIMON: So less greens.
HANY FARID: Than a high-end Nikon or a high-end Canon.
SIMON: Which would keep more of those variations of green. Now, if you hold these two pictures side by side, you might not be able to tell the difference. But Hany says when you look at the underlying pixels, there are different recognizable patterns.
HANY FARID: If you take an image off of your iPhone, I should be able to go into that jpeg and look at the packaging and say, "Ah, yes, this should have come out of an iPhone." But if that image is uploaded to Facebook and then re-downloaded, or put into Photoshop and re-saved, it will not look like a jpeg consistent with an iPhone.
SIMON: So basically, he can see at the level of the pixels or data whether the picture has been messed with in any way.
SIMON: And this is, of course, just one of many different ways that Hany can spot a fake.
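Hany's device-fingerprint idea can be caricatured in a few lines. Real forensics inspects JPEG quantization tables and DCT coefficient statistics; in this toy version, "compression" is just rounding pixel values to a device-specific step size, and both the device names and step sizes are invented for illustration:

```python
# Toy sketch: each device throws away color detail in its own
# recognizable way, and an edit breaks the pattern.

DEVICE_STEPS = {"phone_cam": 16, "dslr": 4}   # the phone compresses more: fewer greens kept

def compress(pixels, device):
    """Simulate lossy compression: snap each value onto the device's grid."""
    step = DEVICE_STEPS[device]
    return [step * round(p / step) for p in pixels]

def is_consistent(pixels, device):
    """Does every pixel sit on this device's quantization grid?"""
    step = DEVICE_STEPS[device]
    return all(p % step == 0 for p in pixels)

original = compress([37, 121, 200, 254], "phone_cam")
print(is_consistent(original, "phone_cam"))   # True: matches the phone's pattern

edited = list(original)
edited[1] = 100   # a retouched value no phone_cam "jpeg" would ever produce
print(is_consistent(edited, "phone_cam"))     # False: the edit left a trace
```

The single out-of-grid pixel is the cartoon version of the "distortions in the data" Hany hunts for — invisible to the eye, but statistically inconsistent with the claimed camera.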
HANY FARID: Yeah. Yeah.
ROBERT: Well, let me ask. Like, if you could go up against the top 100 best counterfeiters, do you think you'd catch them 10 percent of the time? 50 percent of the time? Just out of curiosity, what's your sense?
HANY FARID: I would say we could probably catch 75 percent of the fakes. But I would say that would take a long time to do. This is not an easy task. And so, you know, the pace at which the media moves does not lend itself to careful forensic analysis of images. I'm always amazed that, you know, you get these emails, you're like, "All right, you got 20 minutes." And you would need, you know, half a day, a day per image.
HANY FARID: So a very manual and a very human process.
SIMON: So is this video editing and this audio editing that's coming down the pipeline here.
HANY FARID: Yeah.
SIMON: I guess, should I be -- should I be terrified?
HANY FARID: Um, yes, you should.
ROBERT: Oh, no. Do you really mean that?
HANY FARID: Yeah, I think it's -- I think it's going to raise the fake news thing to a whole new level. I did see some artifacts by the way in the videos, they are not perfect. But that's neither here nor there, because the ability of technology to manipulate and alter reality is growing at a breakneck speed. And the ability to disseminate that information is phenomenal. So I can't stop that, by the way, because at the end of the day, it's always going to be easier to create a fake than to detect a fake.
JON KLEIN: Thank you very much. Jad himself just handed me a cup of water, which shows none of you have gotten too big for your britches.
SIMON: And that could be a serious problem ...
JON KLEIN: I would like to have seen Peter Jennings do that, ever.
SIMON: ... for this guy.
JON KLEIN: My name is Jon Klein, co-founder and CEO of Tapp Media.
SIMON: Before that ...
JON KLEIN: President of CNN US.
JAD: Oh, wow.
JON KLEIN: Before that, I was Executive Vice President of CBS News, where I was executive in charge of 60 Minutes, 48 Hours and a bunch of other things.
SIMON: And he's had to react to some serious evolutions in the media industry. He was manning the helm as social media exploded, as smartphones became ubiquitous, and consequently he had to deal with figuring out how and if to trust thousands of hours of video taken on these smartphones and sent in by viewers. What to broadcast and what not to. And so we wanted to know how someone in his position would think about these fake videos. So we sent him all of the different demos and videos we'd come across, just to see what he thought.
JON KLEIN: First thought was that this is the kind of thing that a James Bond villain would put to use. Or The Joker in Batman. Or an eighth-grade girl who, right, wants to be most popular.
ROBERT: Yeah, exactly.
JON KLEIN: You know, I mean this is -- there's so many ways to abuse this, blows your mind. I mean, it goes to the very core of communication of any sort, whether it's television or radio or interpersonal. Is what I'm seeing true? Is what I'm hearing real?
SIMON: In your -- over the course of your career, you've seen multiple technological developments that have impacted the media in rather profound ways. Where is your terror level right now or your fear level caused by this, relative to all of the other sort of advancements that have occurred over --over your career?
JON KLEIN: It's terrifying. And it hurtles us even faster toward that point where no one believes anything. How do you have a democracy in a country where people can't trust anything that they see or read anymore?
NICK BILTON: What -- what we saw happen with the fake news during the election cycle was that it doesn't -- it didn't even need to matter if anyone, you know, would rebuff it afterwards.
SIMON: This is Nick Bilton again.
NICK BILTON: It would reach millions and millions of people in mere seconds. And -- and that was it. You'd done -- it had done its job. And I think that with this audio stuff, and the video stuff that's gonna come down online in the next few years, it's gonna do the same thing, but no one's gonna know what's real and what's not.
[CLIP OF DONALD TRUMP: I moved on her actually. You know, she was down in Palm Beach. I moved on her and I failed.]
SIMON: And what's more, Nick says ...
NICK BILTON: If you think about the video that came out of Donald Trump from Access Hollywood.
[CLIP OF DONALD TRUMP: I'm automatically attracted to beautiful -- I just start kissing them.]
NICK BILTON: The thing that was really interesting about that video ...
[CLIP OF DONALD TRUMP: Hey, when you're a star, they let you do it. You can do anything. Grab them by the p***y.]
NICK BILTON: ... you don't actually see Donald Trump until the very last second when he gets off the bus.
[CLIP OF DONALD TRUMP: Hello, how are you? Hi!]
NICK BILTON: You only hear him.
[CLIP OF DONALD TRUMP: Make me a soap star.]
NICK BILTON: And so if that technology existed today, I can guarantee you that Donald Trump would have responded by saying, "Oh, it's fake. It's fake news, it's fake audio. You can't see me. I didn't say that." And it would just be this video's word against his.
JAD: Okay, actually that's kind of like for me, that's sort of the real problem here. Like, you create this possibility for, like, plausible deniability. It's so broad. You know what I mean? It's like -- you know, it's like the tobacco industry in the '60s and '70s. You know, I was just reading this great article by the writer Tim Harford about this. In the '60s and '70s, the tobacco industry led this very calculated effort to sort of push back against cancer science by, you know, just injecting a little bit of doubt here, a little bit of doubt there.
ROBERT: Right, but on the other hand ...
JAD: On the other hand this, and on the other hand that. And the idea was to create just enough wiggle room that nothing happens.
ROBERT: They do that with climate change, too.
JAD: Exactly. And it's that little bit of doubt that creates a paralysis. And is that what's gonna happen? That, like, there's gonna be paralysis now writ large, because now we're talking about the very things we see, the very things we hear.
ROBERT: But wait. But don't you think that before we get completely carried away with the threat of this technology, you know, maybe we should just find out literally where we are now.
ROBERT: We should give it a spin.
JAD: Yeah. Mm-hmm.
SIMON: So at this moment, do you think making one of these clips is possible?
NICK BILTON: Yeah. I think it's entirely possible. It just -- I would be careful what it is.
JAD: After the break, things get fake.
[ANGELA: Howdy everyone. It's Angela calling from Dallas, Texas. Radiolab is supported in part by the Alfred P. Sloan Foundation, enhancing public understanding of science and technology in the modern world. More information about Sloan at www.sloan.org. Thanks, Radiolab.]
ROBERT: So, we're back. We're going to now fake something. We're going to build our own video from scratch.
JAD: Fake words, fake faces. Because we wanted to know, like, in use, how dangerous are these technologies, really? Can they make a convincing fake? Are they as easy as advertised?
ROBERT: So we will find out by giving the assignment as always to our long-suffering Simon Adler.
SIMON: So while I was in Seattle talking to Durin Gleaves, I not so subtly hinted that I would really like to give VoCo a whirl.
SIMON: Let's say I had my hands on it somehow.
DURIN GLEAVES: Absolutely.
SIMON: What can I do with it?
DURIN GLEAVES: Well, right now nothing, because we haven't shared it with anybody.
SIMON: At first, I just thought he didn't want *me* to be able to play around with it, but then I realized ...
DURIN GLEAVES: But, I don't even have a personal copy for myself yet.
SIMON: Oh, so it's not even on the premises here.
DURIN GLEAVES: No, it's still very much contained to research.
SIMON: But ...
MATTHEW AYLETT: Hiya. Hi, are you there?
SIMON: Hey, Matt.
MATTHEW AYLETT: Yeah, I'm here.
SIMON: Eventually, I got in touch with this guy.
MATTHEW AYLETT: So I'm Dr. Matthew Aylett. I'm the Chief Science Officer at CereProc Limited.
SIMON: Which is a vocal synthesis research company based in Edinburgh.
MATTHEW AYLETT: Yeah.
SIMON: Okay, so I called you up because I was hoping that you could help me to make a video clip that has, I don't know, like George Bush or Barack Obama saying things that they have never said.
MATTHEW AYLETT: Yep, that sounds great.
JAD: That's it? He was just game?
SIMON: Yeah. Now see, the thing is what his company does is not quite the same as VoCo. What they do is, like for a client they'll create a voice that you can then just type in words or sentences and make that voice say whatever you want it to say.
[SYNTHESIZED SPEECH CLIP: I feel sad. That's an interesting idea.]
SIMON: They've created voices with a variety of accents.
[SYNTHESIZED SPEECH CLIP: Great rooted blossomer.]
SIMON: In a variety of languages.
[SYNTHESIZED SPEECH CLIP: [speaking Spanish.]]
[SYNTHESIZED SPEECH CLIP: [speaking Japanese.]]
SIMON: And in their spare time, when they're not making voices for clients ...
[SYNTHESIZED SPEECH CLIP: This is Governor Arnold Schwarzenegger. I think ...]
SIMON: ... they're building celebrity voices. And it just so happens they've got a Barack Obama and a George Bush bot.
MATTHEW AYLETT: Yeah.
SIMON: How did you create a George Bush robot?
MATTHEW AYLETT: Well, a great thing about George Bush is that he was President of the United States for some time.
[CLIP OF GEORGE W. BUSH: Good morning. Good morning. Good morning.]
SIMON: Which means he had to give ...
MATTHEW AYLETT: The weekly presidential address.
[CLIP OF GEORGE W. BUSH: A week ago today, I received a great honor.]
MATTHEW AYLETT: And the other great thing about the address is it's completely copyright-free, so we're allowed to do anything we like with that audio.
[CLIP OF GEORGE W. BUSH: For the people of America.]
MATTHEW AYLETT: Maybe things that they haven't envisaged that we're going to do with it.
SIMON: Real quick digression here, just because it's absolutely fascinating. It looks like we're actually about to enter this really sticky gray area when it comes to voice ownership.
MATTHEW AYLETT: For example, an audio book ...
SIMON: So if you record an audio book and you've signed over the rights to those audio files to the publisher ...
MATTHEW AYLETT: The publisher has the copyright.
SIMON: You don't own it. You do not own your own voice.
JAD: Is that really true?
MATTHEW AYLETT: Yeah.
SIMON: Anyway, back to Bush.
MATTHEW AYLETT: So I took all those weekly addresses.
SIMON: About six hours worth, which is a lot more tape than VoCo's 20 minutes. But what he did with it is pretty similar.
MATTHEW AYLETT: Right.
SIMON: He fed them into this machine-learning algorithm along with their transcripts, and then what the program will do ...
MATTHEW AYLETT: It will take the text and it will analyze it in terms of the linguistics. It will say this is the word ...
[CLIP OF GEORGE W. BUSH: Social Security.]
MATTHEW AYLETT: Social. The word "social" is made up of the sounds suh-oh-sh-ul, right?
MATTHEW AYLETT: And so we'll cut those sounds up into lots of little tiny pieces.
SIMON: And it did that for all of the words in all of these addresses. Around 80,000 in total. Put them all in this database with tons of info about what sound came before it, after it, etcetera.
MATTHEW AYLETT: And ...
SIMON: Once that database is built, all that's left to do ...
MATTHEW AYLETT: I type in some text and then I push go and it will try and find a set of little sounds which will join together really nicely. And then I push play and see how well they came out.
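[EDITOR'S NOTE: The pipeline Aylett describes -- cut the recordings into tiny phoneme-sized pieces, tag each piece with where it came from, then search for a sequence of pieces that "join together really nicely" -- is known as unit-selection synthesis. Here is a minimal toy sketch of the search step. All names, data, and costs are illustrative, not CereProc's actual system, and a real synthesizer stores audio snippets with far richer context features:]

```python
# Toy unit-selection synthesizer: pick one recorded "unit" per target
# phoneme so that the total join cost is minimized. A join is free when
# two units were adjacent in the same original recording (a seamless
# splice); any other cut incurs a fixed penalty.

from itertools import product

# Each unit: (phoneme, source_utterance_id, position_in_utterance).
# Illustrative stand-in for a database built from the weekly addresses.
DATABASE = [
    ("s",  "addr01", 4), ("s",  "addr07", 12),
    ("oh", "addr01", 5), ("oh", "addr03", 2),
    ("sh", "addr01", 6), ("sh", "addr07", 13),
    ("ul", "addr01", 7), ("ul", "addr05", 9),
]

def join_cost(a, b):
    """0.0 if unit b directly followed unit a in the same recording,
    else 1.0 for the audible splice."""
    contiguous = a[1] == b[1] and b[2] == a[2] + 1
    return 0.0 if contiguous else 1.0

def select_units(target_phonemes):
    """Brute-force search over candidate sequences, minimizing total
    join cost. (Real systems use a Viterbi search instead.)"""
    candidates = [
        [u for u in DATABASE if u[0] == p] for p in target_phonemes
    ]
    best, best_cost = None, float("inf")
    for seq in product(*candidates):
        cost = sum(join_cost(a, b) for a, b in zip(seq, seq[1:]))
        if cost < best_cost:
            best, best_cost = list(seq), cost
    return best, best_cost

# "Social" as suh-oh-sh-ul, per Aylett's example.
units, cost = select_units(["s", "oh", "sh", "ul"])
print(units, cost)  # picks the contiguous run from addr01, cost 0.0
```

[The search prefers long contiguous runs from a single recording, which is why six hours of consistent source tape makes the output sound natural: the fewer splices, the fewer audible seams.]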
SIMON: So what we did was we found an old video of former Presidents George Bush and Barack Obama together.
[CLIP OF GEORGE W. BUSH: I want to thank the President-Elect.]
SIMON: They're shaking hands, making generic statements. The exact clip isn't important. But we wondered, could we turn that clip from a boring meet-and-greet into a scenario where Bush is telling Obama a joke? So we convinced a comedy writer, Rachel Axler, who works on the show Veep, to write us a few jokes, sent them off to Matt, and this is what the computer spat out.
[GEORGE W. BUSH SYNTHESIZED VOICE: And well, it goes something like, knock knock.]
[BARACK OBAMA SYNTHESIZED VOICE: Who's there?]
[GEORGE W. BUSH SYNTHESIZED VOICE: Oval.]
[BARACK OBAMA SYNTHESIZED VOICE: Oval who?]
[GEORGE W. BUSH SYNTHESIZED VOICE: Oval. I think it's something about the Oval Office, probably.]
[BARACK OBAMA SYNTHESIZED VOICE: That was a -- a very good joke, Mr. President.]
[GEORGE W. BUSH SYNTHESIZED VOICE: My wife Laura tells it better.]
JAD: [laughs] What the hell was that?
SIMON: Wait, what?
ROBERT: That was terrible!
ROBERT: Technically, that was, like ...
JAD: I don't ever get -- a) I don't understand that joke at all. And that's literally what the computer spat out?
SIMON: That is what the computer spat out and truth be told, I don't think it's anywhere -- it is not worthy of the negative response that you are giving it.
JAD: That's terrible.
ROBERT: That's terrible.
SIMON: Let me show you another one.
[GEORGE W. BUSH SYNTHESIZED VOICE: So happy to be joining forces with this good man to put cortisol in your drinking water.]
JAD: Put what?
[BARACK OBAMA SYNTHESIZED VOICE: Cortisol?]
[GEORGE W. BUSH SYNTHESIZED VOICE: It's to help protect people's teeth so they don't get fillings.]
[BARACK OBAMA SYNTHESIZED VOICE: Isn't that fluoride?]
[GEORGE W. BUSH SYNTHESIZED VOICE: Oh, shoot. I think I signed the wrong bill.]
SIMON: Pretty good!
JAD: No, the robots are terrible.
ROBERT: I couldn't hear cortisol.
JAD: But the joke is funny. I like the joke. But the robots just massacred that joke. Which is in itself kind of a joke.
SIMON: Well, I do think that -- yeah, let me get into it. Well, I think that you two are far more critical than you should be, and you are far more critical than the average listener. However, Matt ...
JAD: You're so wrong about that.
ROBERT: But anyway ...
SIMON: Matt did tell me that conversations, getting people to talk back and forth to each other are still really difficult for a synthesizer to do.
MATTHEW AYLETT: You know, conversational stuff is always difficult. And in fact, we're going to -- it's going to be a long time before we get really, really easy conversational synthesis. There's all sorts of barriers to that.
SIMON: There's a human quality to a conversation that the synthesizers can't quite capture yet. But he also told us that, you know, if we add -- once we add the video or if we add a video to this, it will smooth out a lot of the problems.
MATTHEW AYLETT: When you have the faces as well speaking, people are not focusing on the audio.
MATTHEW AYLETT: And you can't hear the errors in the same way.
SIMON: So ...
KYLE OLSZEWSKI: Hello?
SIMON: Hey, is this Kyle?
KYLE OLSZEWSKI: Oh, yeah.
SIMON: Great. Great.
SIMON: I found these two grad students.
SHUNSUKE SAITO: My name is Shunsuke Saito.
KYLE OLSZEWSKI: I am Kyle Olszewski.
SIMON: From ...
KYLE OLSZEWSKI: The University of Southern California.
SHUNSUKE SAITO: USC.
SIMON: They also do a lot of facial reenactment research, and agreed to help us. But making these visuals also turned out to be way harder than we thought. Turned out the clip we chose posed some serious challenges. There were too many side shots of Obama's face, the lighting was all wrong. And eventually I got an email one late Sunday night saying it's not gonna work.
ROBERT: Okay. So now I think I can draw a line here, and I can point out that this -- that we maybe got overexcited about this technology. It is not yet ready for true deceit. You have been fumbling and fumbling and fumbling here.
SIMON: I have not been fumbling. I am not the running back here.
JAD: I find it interesting psychologically that Simon feels like it's a personal failure.
SIMON: I don't like to fail.
ROBERT: You should. This is ...
JAD: So okay. Just -- just on Simon's behalf, on behalf of actually trying to answer the question, we felt like okay, maybe -- maybe we should try this one last time. Let's find a simpler Obama video and with the audio, rather than like whole phrases, let's just do a couple of word replacements here or there. By the way, the only reason we're using Obama is that he seems to be the guy all these technologies are built around. In any case, we chose the video of Obama's last weekly address, and we chose the audio from a talk he'd given in Chicago after he'd left office.
[CLIP OF BARACK OBAMA: So, uh, what's been going on while I've been gone?]
JAD: In this speech, he sort of talks about what he's gonna do next, how he's still gonna keep fighting for what he believes is right.
[CLIP OF BARACK OBAMA: Filled with idealism, and absolutely certain that somehow I was gonna change the world.]
JAD: But we thought what if in an alternate reality, he didn't want to keep fighting? What if he could at that moment see the divisions ahead, and he was just like, "Ah, that's too much. I give up." Now truth is, we didn't think too hard about this, because we didn't have much time. We just whipped it together. Did a script based on words Obama used with a few changes, sent it off to the guys at USC.
SIMON: And I videotaped myself saying this new script so that we could use that video of my face to puppetize the former president. And when we got the final video back ...
JAD: I have to say it was -- I was expecting it to be horrible, and we were to have a good laugh, but I -- it went from, like, laughy, giggly to -- to, "Oh, wait. This is creepy!"
SIMON: Well, yeah. No, I was suddenly -- I had been gangbusters, we got to release this thing and not tell anybody and try to fake out the entire world. But when I saw it, there was a reluctance.
ROBERT: You mean you went, "Oh, no?"
SIMON: I went, "Oh, God." Yeah. Yeah. I thought, "Oh, this -- this ..."
JAD: You know, my personal thought was like, it was convincing enough that I got genuinely spooked. But, you know, just in fairness, we shouldn't sit around talking about something people can't see. Go to FutureOfFakeNews.com and check it out for yourself. It's all one word. FutureOfFakeNews.com and it'll pop right up. You can see, tell us what you think. You can see how Simon made the video. Check it out. Anyhow, the whole process got us all thinking like, "Oh, wow. If we, a bunch of idiots, can do this for no money very, very quickly, what will this mean to, like, a newsroom, for example? Just to start there."
JON KLEIN: We're at the level now with this kind of thing where we need technologists to verify or knock down, and ...
SIMON: Again, news executive Jon Klein.
JON KLEIN: I don't think journalists, English majors, are gonna be the ones to solve this, you know? You may have been editor of your school paper, but this is beyond your -- your capability. But if you're good at collaborating with engineers and scientists, you know, you'll have a good chance of working together to figure it out. So we need -- we need technical expertise more than we ever have.
JAD: Can I ask you, in your heart -- let me compare your heart to my heart for a second. In my heart, I want somebody to tell the researchers, "Yeah, sorry you can't do that. Sorry, you know, I know it's really cool. I know you -- I know you probably are really proud of that algorithm. But some men in black are gonna walk in right now and they're going to take your computers away. And you just can't. Sorry. Society is going to overrule you right now." Do you -- is there a part of you that just dictatorially wants to just, like, squash this?
JON KLEIN: Well, sure. But wouldn't you still have the, what are they? The FSB in Moscow or the CIA utilizing this and developing it anyway? Weaponizing it, so to speak?
JON KLEIN: And I think that the top-down model could never contain that.
SIMON: Jon says ultimately what's happening is probably going to be bigger than any one organization or any one newsroom can solve. He said it'll probably end up coming down to the 14- and 15-year-olds of tomorrow who will grow up using this technology, making fake videos, being the victims of fake videos, and that maybe in the maze of them having to parse truth from fiction in such a personal way, some kind of code will develop.
JON KLEIN: I'm an optimist by nature. I do -- I look at this and I say, "Well, somebody's gonna figure it out." What worries me is the larger context within which this takes place. This is all occurring within a context of massive news illiteracy, and the -- the consumers seem to be just throwing their hands up and tiring of trying to even figure it out. And so just the work involved in getting to the bottom of the truth is unappealing to a growing percentage of the audience. And I'm not sure where Gen Z, the teenagers of today, come out on this. Let's hope that they are more willing to do the work, maybe out of self-interest, maybe so that they're not dissed by, you know, the girl in social studies. But that's our best hope for overcoming it, because everybody else seems to be sick of trying.
JAD: Reporter Simon Adler. This piece was produced by Simon and Annie McEwen. Very special thanks to Kyle Olszewski and the entire team at USC's Institute for Creative Technology for all their work manipulating that video of President Obama. And thanks to Matthew Aylett for synthesizing so, so many words for us. Rachel Axler for writing us the jokes that we tried to use. Sohum Pawar for building us an amazing website, Angus Kneale, Amy Pearle, everybody in the WNYC newsroom for advising us and giving us reaction shots to the face-to-face video.
ROBERT: And to David Carroll for putting us in touch with Nick Bilton in the first place. And to Nick Bilton for inspiring this whole story with his article. He's got a new one -- a book actually, American Kingpin, about the founder of a black market website called The Silk Road. And to Supasorn Suwajanakorn, a computer scientist who works in Ira's lab who helped us understand what the heck was going on.
JAD: And finally, you can see the video that we created as well as a bunch of other kind of crazy clips that we mentioned throughout this episode. It's at FutureOfFakeNews.com. It's all one word. FutureOfFakeNews.com. And with that, my real co-host and I will bid you adieu.
ROBERT: I'm Jad Abumrad.
JAD: I'm Robert Krulwich.
ROBERT: That's who we really are.
JAD: I'm glad we could finally be honest about that.
ROBERT: Yeah. All these years.
[ANSWERING MACHINE: Message two. New. From an external number.]
[JON KLEIN: This is Jon Klein, calling from the frontiers of media.]
[MATTHEW AYLETT: My name is Dr. Matthew Aylett, and I am the Chief Science Officer of CereProc Limited.]
[HANY FARID: I am Hany Farid, professor of computer science at Dartmouth College. Radiolab was created by Jad Abumrad.]
[JON KLEIN: Jad Abumrad. And produced by Soren Wheeler. Dylan Keefe is our Director of Sound Design.]
[MATTHEW AYLETT: Our staff includes Simon Adler, David Gebel, Tracie Hunte, Matt Kielty ...]
[JON KLEIN: Robert Krulwich, Annie McEwen, Latif Nasser, Malissa O'Donnell, Arianne Wack and Molly Webster.]
[MATTHEW AYLETT: With help from Soma Sowa, Rebecca Chaisson ...]
[JON KLEIN: Rebecca Chaisson, Nigel Batali, Phoebe Wang and Katie Ferguson.]
[MATTHEW AYLETT: Our fact-checker is Michelle Harris.]
[ANSWERING MACHINE: End of message. To hear the message again, press 2. To delete it, press 76.]