BROOKE GLADSTONE: So maybe we don't like the idea that Internet news sites and social network sites intrude on our privacy and collect our data. We nevertheless do like that those sites are free, and they're free because they make money selling our data. But now there’s something else to dislike – a process called data scraping, whereby third-party companies use automated software to try to scrape our information from our favorite sites and Internet forums and even résumé sites like Monster.com, and the scrapers offer no free service in exchange for that data. Our information is just swiped. Julia Angwin, Senior Technology Editor for the Wall Street Journal, has written about data scraping. Julia, welcome to the show.
JULIA ANGWIN: Thank you very much.
BROOKE GLADSTONE: So could you start with the opening anecdote in your piece earlier this month? Tell us what happened on May 7th.
JULIA ANGWIN: There’s a website called PatientsLikeMe, where patients with a whole host of different diseases, ranging from really serious terminal cancer to depression, gather to talk about their experiences with their disease and the drugs they're using. On May 7th at 1 a.m. a new member joined the “Mood” forum, which is a place where people talk about pretty serious depression, and started cutting and pasting all the posts from that forum. This turned out to be a robot. The site traced the scraping to the Nielsen Company, which most people know as a TV ratings service, but which also has a service called buzz monitoring, where they, quote, listen to the conversations on the Web and tell their clients how they're being talked about. Their clients include all of the top pharmaceutical companies. But since they were caught doing it, Nielsen said they weren't going to use the data.
BROOKE GLADSTONE: How is the data that’s been scraped most often used?
JULIA ANGWIN: Well, the thing about scraping is that it can be legitimate. In some ways what Google does when it searches the Web to make its index of links is a kind of scraping. But there are people out there trying to get every single different type of data. So Nielsen is trying to, quote, listen to conversations and monitor them. Then you have companies that are trying to build up their databases about individuals, so they might be looking for your name and address so they can put it on some mailing list for junk mail, or there might be a spammer who’s trying to get your email address. So the world of scrapers is an unruly one, and it ranges from completely legitimate to completely illegitimate.
BROOKE GLADSTONE: What about legal recourse, is there any?
JULIA ANGWIN: There are three different ways you can fight against a scraper. You can allege that they've stolen copyrighted material. You can allege essentially that they've trespassed on your property; if you’re the website owner, your website is sort of your property. And you can also allege that they've broken a contract. So if they click – you know the little things where it says “I agree” and you, there’s some long list –
BROOKE GLADSTONE: Terms of service.
JULIA ANGWIN: – the thing that you never read? [LAUGHS] If they click that in order to get into an area of the site, and it said “No scraping,” then you can also allege breach of contract. So there are a lot of cases, but the problem is that the rulings have been contradictory. Sometimes courts have said, you know, it’s not really trespass because you actually want people to come to your website, so you can't choose which people you want.
BROOKE GLADSTONE: The data scrapers definitely come off as the bad guys in your piece but, as you note, Google engages in a kind of data scraping. What’s the harm?
JULIA ANGWIN: I mean, the case with PatientsLikeMe is a perfect example, right, where you feel like you’re safe. You've entered some “captcha,” you've joined an account, you've signed some terms of service. And then if some scraper breaks in, steals all that and puts it up for sale, it feels like a violation.
BROOKE GLADSTONE: But it’s still all public information, right? I mean, if we found our way to PatientsLikeMe, then so can anyone else. There’s no lock on the door. That’s the nature of the Web.
JULIA ANGWIN: It used to be that if you were sitting alone in your house, basically everything you did was automatically private. But now we basically have these transmission lines where things you do while you’re sitting alone and you feel like you’re alone are actually going out [LAUGHS] into the universe instantaneously, in real time, before you have a chance to have second thoughts about whether it was a good idea to hit [BROOKE LAUGHS] “Post.” And so now we have to decide as a society what kind of world we want to live in. Like, shouldn't we have takedown notices for our own data? [LAUGHTER] You know, Warner Music can go and get a takedown notice for any video that has their song in it.
BROOKE GLADSTONE: Some might say, look, you don't have to use your real name. And yet there’s this company called PeekYou LLC which has just applied for a patent that would use information about you that’s scattered all over the Web to match your real name with whatever pseudonym you might use on a place like Twitter. Does that presage the end of Internet anonymity?
JULIA ANGWIN: There’s a world of people out there who sell data about individuals, and every additional piece of data they can get means they can put a higher price tag on it. So for them, the effort it takes to match your Twitter name or your MySpace name, which you think is anonymous, to your real name is a business opportunity, and they're going to pursue it.
BROOKE GLADSTONE: In this chaotic Internet ecosystem, where there’s more value to a targeted ad than there may be to a 30-second broadcast commercial, isn't this just part of the natural evolution of the commercial, which supports most of the free media that we love?
JULIA ANGWIN: It may be that the future of advertising is that every ad says your name in it [LAUGHS] and they know everything about you. That would be, I think, the dream of a lot of [LAUGHING] advertisers. The question is, is it a dream for us as consumers? I worry about a world where everything is known about you, so all you see reflected back is yourself. Where is the serendipity? Isn't the point of going on the Internet to find out about the rest of the world? [LAUGHS] So I worry about that, that you would live in a hall of mirrors.
BROOKE GLADSTONE: Julia, thank you so much.
JULIA ANGWIN: Thank you.
BROOKE GLADSTONE: Julia Angwin is the Senior Technology Editor for WallStreetJournal.com.