Transcript
BROOKE GLADSTONE:
We often talk about how new media, the unbounded vistas of cyberspace, can change how we form opinions and communities, how we understand the world. But the Internet doesn't change how you write a novel or compose an opera or build a house, right?
Actually, it can, if you let it. It can change everything, even practices considered fundamental and immutable, like how we do science. The scientific method is built around testable hypotheses, experiments that confirm or reject theoretical models of how the world works.
This, writes Wired editor Chris Anderson, in a recent issue of his magazine, is the way science has worked for hundreds of years. But now new media is making the old scientific method obsolete. Chris, welcome back to the show.
CHRIS ANDERSON:
It’s great to be here.
BROOKE GLADSTONE:
Before I go any further, just confirm that I have that right.
CHRIS ANDERSON:
You have [LAUGHING] accurately recounted what my article said. And at this point I need to make a slight confession as a magazine editor, that we sometimes pump up our headlines
BROOKE GLADSTONE:
[LAUGHS]
CHRIS ANDERSON:
and sometimes engage in a little overstatement for effect. So let me just take all of that and delete the word “obsolete.” I think what we have is a new way to do science which adds an option that wasn't available before the ages of massive data and what we call petabyte-scale computing.
We have, in a sense, a new scientific tool, like a microscope or a telescope, which, in a sense, challenges the form that you've just described which is theory, then experiment.
BROOKE GLADSTONE:
Well, let's talk about that challenge. Despite the fact that I'm nearly apoplectic that you have backed away from your headline, thus depriving me of the ability to hoist you on your own petard, but [LAUGHS] nevertheless, as you say, we have entered the age of the petabyte. And it would be useful, I think, if you gave us a sense of scale.
Mathematician Martin Wattenberg observed in Wired that the sum total of all the words you'll hear in your lifetime amount to less than a terabyte of text. So then how much is a petabyte?
CHRIS ANDERSON:
A petabyte is, mathematically it is, you know, 1,000 terabytes, but we have a hard time understanding that scale. We usually use the sort of, you know, the Library of Congress, as an example. The Library of Congress is sort of, you know, on the - you know, on a couple of terabytes scale; a petabyte’s a thousand of those.
We've never seen petabyte-scale data aggregations before. There’s never been anything like that because we're still relatively early in, you know, the digital age. But Google has just hit that state. Google processes about a petabyte of information every 72 minutes, and a year from now it'll process a petabyte every half an hour, and so on.
BROOKE GLADSTONE:
So then connect the dots for us and explain how that volume of data can eliminate the need for the traditional scientific method.
CHRIS ANDERSON:
You know, the old way of understanding who we are and what we do was to use kind of conventional human techniques, what’s called semantic analysis. So the old form of search, for example, was to try to understand, you know, what is this page about? And Google sort of said, give up, that, you know, you could do that once or twice but it doesn't scale to the huge volume of the Internet.
The way PageRank works is they say, we don't know anything about this page but we do know that these other sites link to it. So what they're saying is there was a connection between this site and these highly ranked sites and those sites that are connected to those other sites.
And what we have here is a correlation but we don't know anything about causation. We don't know why they link to each other.
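[A minimal sketch of the link-based ranking idea described above, offered as an illustration under stated assumptions rather than as Google's actual implementation. The function name, the four-page example graph, the damping value of 0.85, and the iteration count are all hypothetical choices made for this sketch. It scores each page only from who links to it, without looking at what any page says.]

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by incoming links alone (simplified power iteration).

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not Google's values.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

[In this toy graph, page c comes out on top because three pages link to it, and page a outranks b and d only because the highly ranked c links to it. The scores record that connection without saying anything about why the pages link to each other.]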
BROOKE GLADSTONE:
But how do we apply this notion that correlation can replace theory in other spheres besides search engines? Give me an example.
CHRIS ANDERSON:
One of the best examples is genetics. In the same way that the Internet has taken human knowledge and digitized it into a volume that Google can treat as a huge statistical database, gene sequencing is taking biology, taking the life — taking the world around us and digitizing that.
Scientists like Craig Venter, the biologist, are sequencing the oceans. They're sequencing the air. Now, what does that actually mean? What that means is you take a sample of a bucket of water or a sample of air and you just sort of throw it in the machines, and what you get is millions or trillions or, you know, or petabytes of just individual sequences.
Now, this is a mashup of all the animals, you know, bacteria, etc., that were in that bucket of water or all the various viruses and things like that that just happened to be flying through the air.
And there’s no way to analyze it the way a traditional biologist would. Traditional biologists would be, well, let's look at the animal. Let's look at it under a microscope. Let's, you know, dissect it, etc. This new one says we don't know what it looks like. We don't know how it lives. We don't know anything about it except that it has a unique sequence of genes, and that popped up as a blip in our statistical view of biology.
Now, this is very unsatisfying to us as people. We intuitively are built around stories. Before science came mythology. You know, you looked up in the stars and you invented stories, you know, the constellations, the gods, etc. Those were the stories we used to explain nature. As we got better with our analytical tools, we didn't need those stories. We had theories.
Now we're at that stage where our theories really don't explain biology in its fullest. Our theories really can't explain physics in its fullest. We're now, what is a string theory? What are the grand unified theories? We really don't have the experiments that we can do to test these theories. They're just math. They're just abstract.
In the case of biology, what we do have is an extraordinary volume of data. And now some biologists, like Craig Venter, can sort of think like Google does and say, okay, I will not understand how this species evolved. I won't understand how this species lived.
But I can use supercomputers and petabyte-scale databases to actually identify them as species, to count them as species, to maybe find some connection between species and say, these two distinct blips seem to both use photosynthesis. Hmm, I wonder what that tells us? Or these two distinct blips seem to have a connection to some blips that I found in China in a rice paddy. I wonder why that is?
And so you end up thinking abstractly and statistically and not like traditional science and a human does, but by doing so you can scale to an entirely new class of scientific questions that can be answered by data alone.
BROOKE GLADSTONE:
Genetics seems to be essentially mathematics whereas psychology is not, and yet you say if we have enough data about how humans behave in a particular way, we don't have to know why. Can you give me an example of that?
CHRIS ANDERSON:
Sure, so you say psychology wouldn't lend itself to this approach. Well, let me ask a question. Would you consider behavioral economics a form of psychology? I mean, markets are nothing but millions of people acting out of personal interest involving their own psychology. We're all different.
The reality is, is that you can understand one person, maybe. Can you understand everybody simultaneously? No. But what you can do is you can measure their actions. And, you know, behavioral economics applied to market theory on a kind of, you know, a NASDAQ scale, is essentially the Google method at work.
Obviously the traditional scientific method is ideal. Newton’s laws, you know, quantum mechanics, Darwin’s theories are fantastic and they got us where we are today, and they cut through a lot of confusing data and gave an explanation that did have predictive power. But they all reached their limits.
BROOKE GLADSTONE:
But if we had petabytes of data back then, are you saying we really wouldn't have needed those theories?
CHRIS ANDERSON:
Those petabytes of data might have helped us get those theories more quickly. I mean, obviously it’s hard to imagine what petabytes of data in Darwin’s age would have meant.
BROOKE GLADSTONE:
[LAUGHS]
CHRIS ANDERSON:
But the point is that he did use data to help develop his theories. I mean, remember, Darwin was not considered a scientist for the first part of his career. He was a naturalist. He was collecting specimens and categorizing them.
And so, in a sense, what we're doing is we're returning to the earlier stage of science where scientists were data collectors. And the world is too complex in general for an ultimately granular understanding of how it works, and that’s why we increasingly fall back on statistical methods that use our ability to gather data on an unprecedented scale and then just treat it as a math exercise.
We don't start with, you know, the requirement that we understand what’s going on in every element. We start with the fact that we have data. And then we statistically analyze that data, and out of that data comes correlations. And at that point, it’s enough, correlation is enough. Causation some day will come. Theory some day will follow.
BROOKE GLADSTONE:
[LAUGHS]
CHRIS ANDERSON:
But the notion of data-led science as opposed to theory-led science is the new model that’s increasingly being explored.
BROOKE GLADSTONE:
We don't have a widely accepted unified field theory yet. We don't even know what the universe is made of. We don't know what makes us and dolphins and elephants and a couple of great apes self-aware. Does this Google model know?
CHRIS ANDERSON:
The Google model does not know and the Google model maybe can't know. But what the Google model might be able to do is to allow us to act in the absence of knowledge. And I think we now have a question — what are we going to do with that?
When you have access to infinite amounts of data and infinite amounts of processing power, how are you going to use it? How are you going to change the way you ask questions and the way you look for answers? How are you going to think like Google with this unprecedented opportunity to ask questions in a new way?
BROOKE GLADSTONE:
Chris, thank you so much.
CHRIS ANDERSON:
Thank you.
BROOKE GLADSTONE:
Chris Anderson is the editor of Wired Magazine.