Woman 1: Hey!
Woman 2: Hi! How was work?
Woman 1: Ugh, exhausting, but fine. I think Gina finally admitted that we need to hire another person for the team, so that’s good. Let’s just hope it happens sooner rather than later. How was your day?
Woman 2: Eh, it was okay. I got some writing done, but I’m still stuck on that one plot thing. What do you want for dinner?
Woman 1: Mmmm, not sure. Not pizza, we ordered in for lunch.
Woman 2: Is Alia doing something in the car?
Woman 1: What?
Woman 2: Alia, she didn’t follow you in.
Woman 1: I thought you were picking her up.
Woman 2: Um, no, you said you would this morning.
Woman 1: YOU definitely said you were picking her up.
Woman 2: No I didn’t! You did! I’m 100% sure.
Woman 1: Okay let’s look at the transcript!
Woman 2: How about we just pick up our daughter instead?
Woman 1: SEE HERE IT SAYS…. oh …. shoot…
MZ: Ugh. I have had this argument myself with my husband. C’mon man, you said you were going to figure out what’s for dinner. I don’t think so. That was yesterday. And sometimes I have to admit I’m not sure what day it is. Rose, have you ever had this happen to you before?
RE: Oh yes, I fight with all sorts of people about all sorts of things. And I’m always right, I would like to note. But I wish I had a transcript to say so.
MZ: It’s Note to Self, the tech show about being human. I’m Manoush Zomorodi. And this week, we are joined by Rose Eveleth. She’s the host of “Flash Forward,” great podcast about what it might be like to live in the future … and actually why the future is often way way closer than you think it might be. Today we’re going to talk about a future where every conversation is transcribed. Everything you say is written down. It is searchable, it is recorded. And Rose, why would any ordinary person even want this?
RE: Well first you could win arguments. That’s useful.
MZ: Right. That we’ve established.
RE: Yeah, you could capture funny things your kids say. Right? Like that’s a nice little thing. But there are other companies that are working on things like this speech recognition to be able to open up your phone, open your bank account as a biometric security thing, or if you’re at a doctor’s appointment and you have a conversation with your doctor and then afterwards, instead of the doctor going and writing down everything they said, it just autopopulates your medical records. So it seems like this sci-fi future that might never happen. And there are all these companies that are trying to develop speech recognition so basically you could unlock your phone with your voice, unlock your bank account with your voice, and as soon as they can solve the problems required to make that work, the next step to transcription is really really easy. All those problems that are solved for speech recognition would allow us to have this future.
MZ: I mean in some ways, the future is already here in that Siri is listening for instructions and Amazon’s Echo thing that you know, you can say out loud Alexa turn on Radiolab and it will start playing it. So they are listening all the time in some ways. So that means there’s privacy implications there. And that it’s a very small step from listening all the time to recording, transcribing, and keeping.
MZ: So we actually did test out if someone’s life was taped and put on the record. And some intense things occurred. Some very boring things -
RE: A lot of boring things.
MZ: But also a crazy moment, which we’re gonna get to in just a little bit. But for now we are ready to go back to the future with you, Rose.
RE: We start in California with someone who is already using speech to text systems in her daily life, all the time.
HR: My name is Heather Ratcliff and I work in the cannabis industry but more importantly I’m disabled, I have a disease called Ehlers-Danlos disease which is a collagen disorder and it causes my joints to dislocate really easily, so I end up using voice to text really often because my fingers will dislocate.
RE: And it’s super painful when her fingers dislocate. So Heather uses speech to text in a couple of different ways. She’ll use it when her fingers or wrists have dislocated because she can’t type. But she also likes to tape interactions she has, to have a record of them.
HR: I’ll do that pretty often if I’m going to the doctor’s office, if I’m seeing a specialist. I’ve used it at work before if I’m having a disagreement with a boss. One of the things that happens with Ehlers-Danlos disease is you get something called brain fog, all of the things that happen in my body in the background don’t actually regulate properly, like the body temperature or my blood pressure. So what ends up happening is the blood pools in my feet instead of going to my brain, so I have serious memory issues, and so the ability to record the things that I’m going through in my day is really integral to me, because I don’t always remember the conversations that I’ve had because I have memory issues.
RE: So you can kind of see why she’d want something like this. In Heather’s ideal world she’d just carry her phone along with her all the time and record everything. But right now, she can’t do that for a couple of reasons: her phone’s battery just doesn’t last that long, for one. And if you’ve ever used speech to text systems like Siri you kind of know that they don’t exactly know what you’re saying … and that happens to her all the time.
[CLIP Siri Siri Siri Siri please find the note to self, where can I find the note to self note to self podcast? Playing the very best of the Velvet Underground. Sorry Jenna I didn’t get that. Starting Facetime. No no]
RE: So the systems we have now aren’t perfect. But there are lots and lots of people working on making them better. Like this guy.
SR: Hi, my name is Steve Renals and I'm a Professor of Speech Technology at the University of Edinburgh.
RE: The error rate of speech-to-text systems today hovers around 7 percent.
SR: For some people the error rate will be down to 2 or 3%; for other people it will be much higher, and this is even beyond just accent. There are just some people who are just well matched to speech recognition. It's kind of a sheep and goats type of thing.
RE: A sheep and goats type of thing?
SR: So we think of sheep, sheep are people who a system recognizes well and goats are people that systems don't recognize well.
RE: Is that like a common expression in the field?
SR: I think colloquially yeah colloquially.
RE: I've never heard that before.
RE: But when it comes to accuracy, you don’t really want goats, you want everybody to be sheep. You want the system to be really good at understanding everybody. And Steve says that there are a couple of things standing in the way of these systems being really good.
SR: So all of speech recognition basically works using machine-learning techniques, and we're just trying to improve the mathematical models and the algorithms that we use just to work better. And so that's sort of part of the really core activity, but then there are other things, issues such as how you can deal with overlap. At the moment I'm in a very nice acoustic environment. There's just me and a microphone very close to me and there’s no competing acoustic sources. But if you're recording your daily life then you're going in a metro, you're on a bus, you're walking down the street. There's many, many acoustic sources: some of them are other people talking, talking to you, other conversations around you, just kind of street noise and just stuff happening. Being able to deal with all that, being able to kind of robustly transcribe things in the presence of all these other sounds, is really challenging.
RE: Okay so, how long will getting all of that stuff figured out, actually take?
SR: This is a question you should never ask people in AI, because they always say 5 years. They have since the late 1950s, and it's always a pretty good guess. So the future is always very difficult to predict.
RE: You very well avoided giving me a number there. That was good.
SR: I gave you a number to start. I said 5 years.
RE: So let's say instead of trying to guess how long it will take, let's say in 10 years from now what does it look like?
SR: So in 10 years' time I think we will certainly have routinely very good speech recognition on all our devices, and I think we will get systems that can actually start to deal with speech with multiple talkers and multiple acoustic sources and so on.
MZ: When we come back, we’ll put the transcribed life to a real test. And we’ll talk about some of the pitfalls of this technology. And how horrible some people can be. Even when they know they are being recorded.
[CLIP So this is a strange question, okay, but I’m doing something for a podcast where I’m recording my whole day, do you mind if I record?]
MZ: We’re back and you’re listening to Note to Self. I’m Manoush Zomorodi. And we have a guest host, reporter, producer, all around awesome science technology person. Rose Eveleth, she is the host of a great podcast you should check out, Flash Forward. And she’s here this week. And Rose, so as I was getting ready for the transcribed life, which is what we’re talking about, there’s this great crazy stat. It is estimated that we spend 40% of our waking lives talking. I would like to add though that if you are my 9 year old son - yes, Kai, I'm talking to you - it's more like 95% of all your waking hours talking. That's a crazy statistic. We are running at the mouth, non-stop.
RE: Yeah, I work from home, by myself - does talking to your dog count as talking?
MZ: Yeah dude.
RE: That counts. It’s human to animal talking. Alright. Then it’s probably like 95% for me.
MZ: Definitely. And if you’re talking so much, that sounds like a looooot to be sucked up into this transcription world that you are delving into ... Where are we going now?
RE: We’re going to do the big picture. And we’re going to go back to Steve, the professor of speech technology, and he says living the transcribed life would be like living life on the witness stand, but your whole life. Yeah.
SR: For me that would be one of the potential dystopias where, where you have to be careful every sentence you say. I can kind of live with it at work to some degree but I wouldn't want it in my personal life. I don't think any of us would. Right?
RE: I go back and forth because I, one of things, I had not thought of this, a friend of mine when I was talking to her, she was saying often women get told don't say this, say this. Don't talk like that, talk like this. Don't apologize so much. And she was like, oh, I can go through my transcript and see how often I'm saying those things and try to kind of be better about it. And that sounds like my worst nightmare, to go back and read through all of the verbal tics that I've developed to like stop doing them for some reason because it makes other people happier.
SR: Yeah, don't.
RE: Thank you.
SR: The flip side of it is, there's a lot of, particularly based around gender actually, there's a lot of misperception about how women talk differently to men, and if you had this kind of large scale transcription then you could actually do the statistics to see if it's true, so you can find out who interrupts more and so on and so on and so on. But you know, I would hate to see this as being viewed as somehow a self improvement thing.
SR: This isn't my future.
RE: And that actually brings me to someone else who has spent a lot of time thinking about what this future life would be like.
SW: I am trying to imagine what the like metadata around the transcripts look like.
RE: Sara Watson is a tech critic and research fellow at the Tow Center for Digital Journalism.
SW: Is there a way to algorithmically detect sarcasm? Is there a recording of the laughter, like in parentheses, laughing? Do you know who was laughing? All of these things can really color the conversation, and if all we have is a transcript and not necessarily the audio, how would that be useful information?
RE: And, going back to the sheep and goats thing that Steve mentioned, some people are going to be better understood by these systems than others.
SW: I think about different accents or like patois or just kind of new cultural, like what does on fleek look like, or the future whatever version of that is, how does that get translated? And is there a way to capture those trending verbal expressions quickly? How much cultural appropriation or like meme-worthiness would come out of these transcripts that wouldn't necessarily be surfaced otherwise?
RE: But my big question for Sara is whether or not keeping these kinds of records is going to change the way that we talk to each other.
SW: Does it mean that we care exactly how you said it and have an ability to look back and say, well, you said this?
RE: And I wonder if that sort of enhances the feeling of, oh god, I can't speak because I might say something off the cuff that I don't want to say, or that I don't want to be on the record.
RE: So Manoush, we wanted to test whether people really would ask themselves these questions that Sara and I had been talking about.
MZ: And there are a lot of questions-
RE: Tons of questions. So we asked Heather Ratcliff, who you heard earlier, to record THREE WHOLE DAYS of her life-- so that we could transcribe them and take a closer look.
MZ: Yeah. I mean I can’t believe you got her to do this. But it was really just to see, what is it really like when you listen to someone’s life that closely-- and transcribe every last word…
RE: I’m glad it wasn’t my life.
HR: Okay! So, hi! I am recording everything that I’m doing for the next few days, and I just got up and I got a breakdown of how to use this crazy fancy recorder that I have in my hand that looks like some sort of tech from Star Trek, honestly I look like I could zap somebody with this.
RE: She recorded all this stuff-- seventy-two hours of it.
HR: Will someone come make the bed for me, question mark? CUT TO One of the things I’m a little worried about in this recording is that there is going to be dead air. Just tons and tons of dead air while I sit here and do this.
RE: And she was right, there’s lots of dead air. I listened to so much dead air. And driving, and typing. Also-- a lot of muffled conversations, when the recorder was in a bag, or nearly drowned out by the car or when her dog peed on it. But there were also important little moments-- moments when she was counseling her sister through a break up, deciding what kind of drugs her dog might need, and even talking to her therapist... All kinda dull….but then, something more...telling happened…
[CLIP hi hello, so this is a strange question okay. But I’m doing something for a podcast where I record my whole day, do you mind if I record? No. okay.]
RE: She was meeting a potential client when he launched into a story and started using language that Heather was really uncomfortable with.
[CLIP I was robbed three times, pistol whipped once and that’s where they fractured my skull.]
MZ: Hey, it’s Manoush jumping in here-- we were going to play the tape of what this man actually said. We kind of went back and forth about this, but we decided that the language he uses was incredibly offensive, so we’re not going to.
[CLIP all three times. Ah, you’re killing me with that word. Sorry]
RE: Heather told me about what exactly happened afterwards.
HR: He started using words that are not appropriate and I was acutely uncomfortable and I had to be like whoa I can’t, we’re recording!
RE: You didn’t feel like you were glad you had it on tape as proof or something?
HR: It just ended up making me feel uncomfortable because I couldn’t decide if I was being vehement enough in like my, my discomforts and speaking up and being like this is not cool I’m not okay with this language around me. And so I just, was afraid that I was coming off as too meek and timid. I ended up doing a voice diary after the fact.
HR: Oh my god the ways in which I was not expecting the casual use of the n-word in that conversation. Hoooo. That’s so stressful.
HR: So, I hope everyone knows that I’m not okay with that language, I tried to make it clear.
RE: That’s so interesting, there’s the pro and con right there, where it’s like you have proof that this person said something that’s really bad, but there’s the flip side of you having to worry whether you were vocal or assertive enough on the record to prove that you’re not okay with it, and you’re not complicit in this conversation?
HR: Yeah, it was really strange how much I was focused on my side of things and what people were going to think when they heard how I was going about my day.
MZ: Ugh. Rose.
MZ: I mean that moment. There's so many layers to it. Like on the one hand, this is a reprehensible -
MZ: Ugh. And then like to me, I started thinking about this idea that Heather immediately went to what will people think of my reaction. The sort of performative aspect of not being even in the moment. Does that make sense?
RE: Yeah. And she - I mean it clearly bothered her a lot. 'Cause she was talking about it not just in that voice diary right after but even in the voice diary before going to bed. She talked about it again, where she was like - you know it's that moment: you're laying in bed and you're replaying the conversations that you had that day. And you're like, oh my god, I said this thing, I probably shouldn't have, but now there's a record of you saying it.
MZ: Literally replay the moment.
MZ: And you can't take it back like in your mind, what I would have - been like and I would have said this and then that would have happened and that would have made it better. And you're like well, nobody knows that that isn't what happened.
RE: One of the things that she said was she felt like she didn't put her foot down hard enough and then when I listened to it, I sort of felt like maybe she didn't. You know, she was quickly like don't say that and then he was like I'm sorry and then she was like it's okay. And you know, I was like, maybe she could've - what could she have done? And then I started thinking like I don't think I would have done any better necessarily. But it's so easy to analyze other people's words without sort of putting yourselves in their shoes in a way that's like empathetic. So it's easy, I can totally imagine especially with a controversial public figure or somebody who's like a public or even other journalists, if you read someone's transcript and you could say, well I would have done this and I would have done this and you could do this and you could do that. But in the moment. It's so hard. It feels kind of paralyzing to me like if I were to think about you know, listening to all this tape that Heather had and even not this conversation necessarily, but other conversations that were totally benign it definitely made me feel that I wouldn't I wouldn't just want to have this all the time. Because it's so easy for someone to just read something that you say and judge you for it, maybe because they don't know the person that you're talking to, they don't know your inside jokes. Like I can be very sort of sarcastic and I don't think I'm being mean but when you read it, it might be like wow, she's really mean. I don't think I'm a mean person. But the transcript might make me look really mean.
MZ: People are complicated.
MZ: And we all have nuance that would get lost.
RE: Almost certainly, yeah.
HR: So it’s 9:15 now and I’m actually believe it or not going to get to bed. Um. It’s so early. I’m such a wuss. Uh yeah, weird day to do a recording and weird weird conversation to get stuck in at work. Just like, aah why do people think they can be racist around other white people? And now I’m like not feeling like I put my foot down hard enough. Ugh. Anyways that’s a weird conversation to have on a recording. Alright goodnight.
RE: What do you think Manoush? If this is a device that you could wear around, which is something that technologists are in fact working on. Would you wear it?
MZ: I’m conflicted because on the one hand obviously I am a documenter of life, right? It’s why I became a journalist, it’s why I take a million pictures, it’s why I try to keep a diary about my kids like cute things that they say. On the other hand, the idea of a tech company or any company having access or the ability to look at everything I’ve ever said, ugh. that definitely creeps me out. What about you? Would you?
RE: I’m the same way. I’m the kind of person who like wants to know everything and always wants to record everything. But at the same time what is the use that I’m getting out of this? If I have this thing, do the pros outweigh the cons? I don’t know that they do.
MZ: But I don’t think you’re going to get to make a choice. You know what I mean? Like I really do think Alexa and the Amazon Echo are the most interesting example of how this is just going to seep into everything that we’re doing. Like, okay, I was at my parents’ house and there’s a Consumer Reports magazine lying around - yes - and so I picked it up and I was reading the reviews of all the different types of you know - the latest thing on Siri and the latest thing that the Echo has. But there was no mention of the privacy implications - there was no mention that like who knows the government could maybe subpoena Amazon for the records of everything you’ve been saying - like there was no mention in a Consumer Reports. And to me that is a crucial thing that needs to be folded into how we go about rating the efficacy of all these new technologies.
RE: Yeah, I mean, I think that the key to getting this to be widely accepted is making it really convenient, right? And that’s how you get facial recognition systems trained by Facebook - you play this little game, is this your friend Mike? Yes. Oh wow, look, you’ve trained Facebook to recognize faces extremely accurately. As soon as companies make this a convenience play, they make this a game where it’s like did you say this? Clean up the transcript. It’ll be fun. And no one is going to put a little asterisk at the bottom and say you know by the way, this is subpoenable, this is something that could be published outside of your control - you have no idea who is going to read this in the future.
MZ: But then on the other hand, I don’t know, how many times have I had a meeting with my boss where I’m like, wait what did he say. I gotta write it down. He had some really good ideas in there, I should have been taking closer notes. If only I had a transcript-
RE: Or I just had that idea and you took the credit for it.
MZ: Yes. That’s a good point too. And I would just also say, we have a beautiful example of Gretchen Carlson, the Fox News anchor, who it turns out, recorded her conversations with Roger Ailes, the head of Fox News, for a year and a half, and it looks like those iPhone audio recordings are what finally put the boot into him and got her a 20 million dollar settlement. So pros and cons. Pros and cons.
RE: Pros and cons.
MZ: Yeah man. Alright. That’s tough. I want to put it out there, Note to Self listeners, pros and cons. Pros and cons. I wonder if you were like the victim of a crime, or somebody made a promise to you that they didn’t keep, would you think that yeah, transcribed life sounds pretty good? Or if you feel like, ah, I don’t want to be - like I like being able to make mistakes out loud without facing repercussions for it or being held to all the stupid crap that I say. We want to know what you think. So email us at firstname.lastname@example.org.
MZ: Listen to Flash Forward with Rose Eveleth. Rose, where can they find you?
RE: Flash Forward pod dot com is me.
MZ: Alright. And will you come back and do more for us?
RE: Of course, this was fun.
MZ: And will you answer people’s questions? If they have them.
RE: Oh totally.
MZ: So CC - who are you on Twitter?
RE: Rose Eveleth
MZ: Perfect. We want to talk more about this on places where it will be recorded and transcribed - it should be noted. You can leave comments for us on Note to Self Radio dot org. Another great episode coming to you next week. Until then. The Note to Self team is Jen Poyant, Jenna Kagel, Joe Plourde, and Mythili Rao. Many thanks to you Rose for being here. Alright everybody, talk to you next week. Note to Self is a production of WNYC Studios. And I am Manoush Zomorodi.