BROOKE GLADSTONE: Even the most neutral words can carry subtle messages. A team of researchers has developed an algorithm that can predict a writer's sex about 80 percent of the time based solely on how often they use several simple words like "you," "the," "some" and "with." Clive Thompson wrote an article in the Boston Globe about this so-called "gender detector."
CLIVE THOMPSON: The basic idea behind it was the discovery that women use personal pronouns a lot more than men do -- words like "you," "they," "he," "she." That's actually one of the most revealing indicators. If you were just to count those, you would probably have at least a half chance at being able to figure out whether it was a man or a woman that had written it.
BROOKE GLADSTONE: But don't you have a half chance of figuring whether [LAUGHS] it's a man or a woman even if you don't count any words at all?
CLIVE THOMPSON: Okay. Yeah, granted. Granted. But that's why they go on to count other things -- for example-- you look at how many what are called "determiners" are used. Those are the things like "the," "some," "more" -- and men use those more. The basic idea behind it if you talk to the linguists who've looked at the study is that it's like that old saw that women write about people and men write about things! And, and it seems crazy and completely hackneyed, and I, I was sort of resistant to the idea a little bit when I first heard about it, but the numbers do add up! If you even go back a couple hundred years and look at the writings of women clerics, they'd be more likely to write sentences that said "I think this is true," whereas a man would just state it -- he would just state "this is a fact." You know, he [LAUGHS] had complete confidence in himself. And so it's the inclusion of those things like "I think" that begin to introduce more personal pronouns into the way that women have historically written.
BROOKE GLADSTONE: So hubris, thy name is -- man. [LAUGHS]
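The counting Thompson describes, personal pronouns on one side and determiners on the other, can be illustrated with a minimal sketch. The word lists below are assumptions built from the examples named in the interview plus a few common additions; they are not the researchers' actual lists, and the function names `tokenize` and `rates` are hypothetical. The sketch only measures how often each class of word appears; it does not attempt to classify a writer.

```python
import re

# Illustrative word lists only (an assumption): the examples named in the
# interview plus a few common additions. The researchers' real lists are
# not given in the transcript.
PERSONAL_PRONOUNS = {"i", "you", "he", "she", "we", "they"}
DETERMINERS = {"the", "a", "an", "some", "more"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rates(text):
    """Return pronoun and determiner counts per 100 words of text."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {"pronouns": 0.0, "determiners": 0.0}
    pronouns = sum(1 for w in words if w in PERSONAL_PRONOUNS)
    determiners = sum(1 for w in words if w in DETERMINERS)
    return {
        "pronouns": 100.0 * pronouns / total,
        "determiners": 100.0 * determiners / total,
    }


if __name__ == "__main__":
    print(rates("I think this is true"))
    print(rates("The report lists some facts"))
```

On Thompson's own example, "I think this is true" contains one personal pronoun in five words, while a sentence such as "The report lists some facts" contains two determiners and no pronouns. A real detector would need many more features and a trained model to reach the accuracy quoted in the interview.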
BROOKE GLADSTONE: Actually the first journal that the researchers submitted this research [LAUGHS] to rejected it on ideological grounds!
CLIVE THOMPSON: That's right. They thought it was potentially kind of sexist because they said you are suggesting that there are innate differences between men and women and we don't know if that's true. I, I gotta say I started off being a little skeptical of this, partly because gender science has a long history of starting off with kind of a weird basic premise which is that men and men [sic] and women are women and never the twain shall meet, and so when you go out looking for differences between men and women with that as your starting point, well you're, you're going to find them! I mean in the same way that you could do a study saying well maybe old people and young people write differently. You'd probably find that too, quite frankly. Now the scientists themselves are very careful - they're walking on eggshells here. They're very careful to say that they're not suggesting any reasons why this happens. They're, they're all - you know - we're just about the numbers. We'll let the-- We'll leave it to the philosophers to figure out why men and women do this differently, and that's where the linguists come in and say well, men and women are you know, sort of, are trained to, to talk in different ways.
BROOKE GLADSTONE: Now what's really interesting is that words that you think would actually indicate gender -- words that have meaning - words that have an emotional context or words that pertain to certain subjects -- those were eliminated before this evaluation of these pieces began!
CLIVE THOMPSON: Yeah, that's a kind of weird thing about this language analysis stuff -- you would think, you know, if you're going to look at something I've written like an article that you'd be interested in looking at the major words, like the article I wrote - you, you'd think you might pay attention to words like "computer" or "gender" or whatever. But when people do these analyses to try and figure out who it was that wrote something, they strip all those big words out -- they just leave the, the little stuff. The theory behind that is that when I'm writing an article, I'm paying attention to the big words, I'm paying attention to the words like "computer" or "artificial intelligence," but I'm not paying attention to how I use words like "the," and "and" and "but." Those things are literally unconscious! And because they're unconscious, they leave a fingerprint of who I am.
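The filtering step Thompson describes, stripping the "big" topic words and keeping only the little ones, might look roughly like the sketch below. The `FUNCTION_WORDS` set is a small hypothetical sample for illustration, and `function_word_profile` is an assumed name; neither comes from the research discussed here.

```python
import re
from collections import Counter

# A small hypothetical sample of function words (an assumption); a real
# analysis would use a much longer list.
FUNCTION_WORDS = {"the", "and", "but", "you", "some", "with", "a", "of", "to", "in"}


def function_word_profile(text):
    """Drop every word that is not a function word, then return the
    relative frequency of each function word that remains."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w in FUNCTION_WORDS]
    if not kept:
        return {}
    counts = Counter(kept)
    total = len(kept)
    return {word: counts[word] / total for word in sorted(counts)}


if __name__ == "__main__":
    sample = "The computer and the network, but not the gender detector"
    print(function_word_profile(sample))
```

Topic words such as "computer" and "gender" never reach the profile, so two articles on entirely different subjects can still be compared on the same small set of words.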
BROOKE GLADSTONE: Well how much does the intended audience of a piece of writing play into the gender specificity of the text? In other words if you're writing for a men's magazine or you're writing for a women's magazine, does that muck up the question of these gender signifiers?
CLIVE THOMPSON: Yes, it does actually. Deborah Tannen, the linguist who writes those books You Just Don't Understand: Men and Women in Conversation --she studied a lot of how men and women talk differently, and her students once took a look at men's magazines and women's magazines, and it was very easy to spot when someone was trying to write for a men's magazine, because they would use these very short sentences, and with women's magazines they would be longer and they would have more of these pronouns. But the really weird thing was it didn't matter whether a man or a woman had written it which is to say when women wrote for the men's magazines, they wrote as if they were trying to be a man communicating to men! So obviously there's some part of, you know, being a man or a woman that's about performing it as opposed to being it.
BROOKE GLADSTONE: Okay! Clive thank you very much!
CLIVE THOMPSON: Thank you.
BROOKE GLADSTONE: Clive Thompson is a writer for Slate, the Boston Globe, the New York Times, and others. The computer scientists whose research we just discussed are Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni, and Jonathan Fein. They wrote two papers about their work-- one in the journal Text and one in the journal Literary and Linguistic Computing.
BOB GARFIELD: Coming up - I share my pain in a shaggy dog story from Hollywood.