Transcript
BOB GARFIELD: Nine years ago, Brewster Kahle embarked on a project of monumental proportions: archiving the internet. A recent profile in Slate reminded us that it's probably time to check in with Kahle. When we did, we found out that archiving the largest information machine in the history of the world wasn't enough for the man Slate called an "evangelical librarian." He wants to archive everything - the billions of pages on the internet, but also music, movies and books. It's an awesome undertaking, and Brewster Kahle plans to get it done.
BREWSTER KAHLE: Yes. Universal access to all knowledge is our mission. We're working towards that, and interestingly, it's within our grasp.
BOB GARFIELD: All right. Now that's kind of hard to accept on the face of it, so tell me what you mean by universal access to all knowledge and how you are going about giving us that.
BREWSTER KAHLE: Well, if you take just the published works of humankind, which is a subset of all knowledge, but if you take all the books, music, video, television, software, web pages ever produced by humans - preserve it for the long term - and make it available to people, technologically, we have the ability to do that. If we have the political will to do this, and that's the goal of The Enlightenment, the universal education, then we can actually achieve this without busting the bank or inventing new technologies.
BOB GARFIELD: So how much have you accumulated so far that I have access to?
BREWSTER KAHLE: The Internet Archive started by taking snapshots of all publicly available web sites, since 1996. We take a snapshot every two months and store them on hard drives. It's now about 500 terabytes of information, and it's growing at about 50 terabytes each month. So, in each month, just from the worldwide web, we collect as much as in all the words that are in the Library of Congress. Then, we started moving on to collecting movies and music and software releases.
BOB GARFIELD: I understand we're just talking about electrons, which have no mass, but you're storing an awful lot of electrons, and given the state of technology, must require just football fields full of servers or whatever to keep these electrons in storage for future access, or am I missing something?
BREWSTER KAHLE: Oh, no, you've got it right. It takes a lot of storage, but we're using normal tape systems and disc drives. Let's try to quantize it. If you were to take, say, all of the books in the Library of Congress (there are 26 million books in the Library of Congress) - if you were to have a book in a Microsoft Word file, that's about one megabyte. So 26 million megabytes is 26 terabytes. It goes mega, giga, tera. 26 terabytes fits in a small bookshelf, these days, and costs about 60,000 dollars. So there's a job of trying to get them into the computer, but storing them is actually completely reasonable.
BOB GARFIELD: Now, that's everything in the Library of Congress, which is a lot, but you mentioned taking snapshots of every publicly available web site every two months. Well, there are millions!
BREWSTER KAHLE: Yes. There are about 50 million web sites that we gather every two months, and we gather about four billion pages, and you - not only do you want to put them on hard drives - you want to put them on more than one, cause I guess one of the main lessons from the Library of Alexandria, version one, is: Don't just have one copy. So, you want to have more than one copy in multiple locations.
BOB GARFIELD: Well, let's talk about the Library of Alexandria, version one, because they began with fundamentally the same goal, but it all went up in smoke.
BREWSTER KAHLE: Yes. The loss of the Library of Alexandria is one of the great catastrophes of humankind. They set out in 300 B.C. to collect all the books of all the peoples of the world, and by some measurements they got 75 percent of the way there. It's amazing. But by around the year zero, there was a, a fire that brought down some of it, and then by the first couple hundred years, the idea of universal knowledge was not in vogue any more, and it decayed and was burned. Our job is to do something better this time, so what we want is to have copies in different places.
BOB GARFIELD: Is there some sort of external event - I don't know, sun spots, some sort of electro-magnetic disaster, that could actually do to a digital archive what fire did to the Library of Alexandria 2400 years ago?
BREWSTER KAHLE: Yes. We could lose it all, and those of us that live on top of the San Andreas Fault Line are pretty conscious of this, and also just the ups and downs of politics. What happens to libraries in the long haul is, they're burned. And they tend to be burned by governments. You know, it's just the new guys don't like the old stuff around. They're sorry about it a hundred years later, but it's, it's too late by then. So, what we have to do is build a system that can work with changes in governments and political structures and culture over the long haul.
BOB GARFIELD: The irony of all of this is that, in addition to being maybe the world's foremost collector of the priceless treasures of human knowledge, you're also, almost by definition, going to be the largest amasser of crap, maybe in human history. [LAUGHTER] You have accumulated an awful lot of crap, no?
BREWSTER KAHLE: Oh, yes. We've got stuff that - not just of bad taste, but of just - stuff that should never have been really collected, if we had have been better at it in the first place.
BOB GARFIELD: I'm not just talking about bad writing; I'm talking about pornography, hate speech…
BREWSTER KAHLE: Oh, yes. The materials on the web. I- Douglas Adams had a line that I, I just love. He said, "The wonderful thing about the net is it's just us." It's just who we are - the good, the bad, the ugly. What I do like about the net is it really shows that people are very particular and very peculiar. So we've really dispelled that sort of idea that we're all just a generic blob.
BOB GARFIELD: Brewster, thanks so much.
BREWSTER KAHLE: Thank you very much.
BOB GARFIELD: Internet archivist Brewster Kahle is accumulating all human knowledge, which you can locate at www.archive.org. [MUSIC]