Horizon (1964) s52e06 Episode Script
The Age of Big Data
In Los Angeles, a remarkable experiment is under way.
Face the wall, face the wall before I put you in handcuffs.
The police are trying to predict crime before it even happens.
It actually gives us a forecast about where crime is most likely to happen in the next 12 hours.
In the City of London, this scientist-turned-trader believes he's found the secret of making millions, with maths.
The potential to do things with data is fantastic, fantastic.
And in South Africa, this star-gazer has set out to catalogue the entire cosmos, by listening to every single star.
What unites these different worlds is an explosion in data.
The volume of it, the dynamic nature of the data is changing how we live our lives.
In just the last few years, we've produced more data than in all of human history.
In this film, we follow the people who are mining this data.
It's set to become one of the greatest sources of power in the 21st century.
6am, Los Angeles.
The start of shift in the Foothill division.
Officer Steve Nunes, a 12-year veteran of the LAPD, and his partner Danny Fraser head out to patrol.
Right now, we're north of downtown Los Angeles, in the San Fernando Valley area.
Their beat is one of LA's toughest neighbourhoods.
There's a lot of BFMVs, burglary from motor vehicles.
There's a lot of robberies, there's a lot of gang and narcotic activity over here.
There's a lot of people selling drugs.
The gang that's in this area are called the Project Boys.
They're a Hispanic gang.
Despite their experience and intimate knowledge of the neighbourhood, today, their patrol is being controlled by a computer algorithm.
You know, I wasn't really too happy about it, you know, especially as a police officer. You know, we kind of go off of what we know from our training.
We weren't too happy about a computer telling us where we need to do our police work and what area we need to drive around.
Steve and Danny are part of a ground-breaking trial.
An equation is being used to predict where crime will occur on their watch.
I saw some people hanging out by the laundry, like the little laundry.
I guess...
If its predictions are correct, the system will be rolled out across all of LA...
Hey, stop, yeah, stop.
Stop, stop, stop.
Put your hands on your head.
...and the computer algorithm will become a routine part of Steve's working life.
Spread your feet, face forward.
Have anything on you? Stop moving.
Face the wall, face the wall before I put you in handcuffs.
You a Project Boy too or no?
The ambition to predict crime was born out of a remarkable collaboration between the LAPD and the University of California.
Jeff Brantingham might seem an unlikely crime fighter.
A professor of anthropology, he is an expert on remote hunter-gatherer tribes in China, but he's convinced that from remote China to gangland LA, all human behaviour is far more predictable than you might like to believe.
We all like to think that we are in control of everything, but in fact all of our behaviour is very regular, very patterned, in ways that are often frightening to us.
Offenders are no different.
They do exactly the same things over and over and over again, and their criminal offending patterns emerge right out of that regularity of their behaviour.
Jeff believed he could find repeating patterns of criminal behaviour in the LAPD's vast dataset - 13 million crimes recorded over 80 years.
The LAPD have droves and droves of data about where and when crimes have been occurring.
It represents a treasure trove of potential information for understanding the nature of crime.
The LAPD already use their crime data to identify hotspots of crime, but that only tells them where crime has already struck.
We've gotten very good at looking at dots on a map, at where crime has occurred, and the problem with that is that sometimes you're making an assumption that today is the same as yesterday.
Jeff Brantingham planned to do something more radical and more useful - predict the future.
He believed he could use patterns in the crime data to predict where and when crime was likely to occur.
We've long used the patterns in nature to make predictions.
From the setting sun, we learned when to expect the new day.
The phases of the moon allowed us to forecast the ebb and flow of the tides.
And from observing the patterns of the stars, we mastered the art of navigation.
But Jeff Brantingham wanted to do something far more ambitious.
He wanted to tease out patterns in the apparent chaos of human behaviour, to uncover them in the LAPD's vast dataset of 13 million past crimes.
You can have gut feelings about the crime but, ultimately, you need to think about working in a mathematical framework because mathematics gives you the ability to understand exactly why things are happening within the data in a way that gut feelings do not.
Jeff needed an expert in pattern detection.
He turned to his colleague, UCLA mathematician George Mohler.
As mathematicians, we're interested in understanding what's around you so, you know, how do waves propagate if you throw a pebble into the water? The distribution of trees in a forest.
So mathematical models can help you understand those types of things.
George could use mathematical tools to see what was hidden in the crime data.
And there were hints of a pattern in it.
What you see is that after a crime occurs, there's an elevated risk and that risk travels to neighbouring regions.
So what we wanted to do is develop a model to take that into account so police could maybe use that information to prevent those crimes from occurring.
He started with a mathematical model that was already being used, right here on the west coast of America.
California is earthquake country.
Sitting on the San Andreas Fault, it experiences an average of 10,000 earthquakes and after-shocks every year.
The biggest for 100 years was the Loma Prieta earthquake of 1989.
Its epicentre was here, just outside Santa Cruz, California.
There is quite simply no mathematical model that can predict an earthquake like this one.
But after the earthquake come the after-shocks and that's a different matter.
So we're several hundred metres from the epicentre.
Nearby was one of the after-shocks of the original Loma Prieta earthquake.
After a large earthquake occurs, there is a probability that another earthquake will follow nearby in space and time.
George discovered seismologists had found a pattern to earthquake after-shocks and developed an algorithm to predict these after-shock clusters.
These types of clustering patterns are also seen in crime data.
So, after a crime occurs, you will see an increased likelihood of future events nearby in space and time.
You can think of them as after-shocks of crime.
George and Jeff took the equation for predicting earthquake after-shocks and began to adapt it to predict crime.
So the model is broken into several parts. The overall rate of crime, which we'll call lambda (λ), models the rate of events in space and time.
We use the Greek letter mu (μ) to represent the background amount of crime that's going on.
The second component of lambda is G.
G models the distribution of crimes following an initial event.
This whole term overall describes what we call self-excitation, that a crime that occurs today actually self-excites the possibility of future crimes.
So lambda equals mu plus G, is that right?
Well, sort of. Lambda equals mu plus G positioned at all the past events in your dataset.
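Written out, the spoken description corresponds to a self-exciting point process of the kind used for after-shock clustering. One common way to express it, assuming each past crime i is recorded at location (x_i, y_i) and time t_i, and that the triggering term G depends only on the separation from that event, is:

```latex
\lambda(x, y, t) = \mu(x, y) + \sum_{i \,:\, t_i < t} G\left(x - x_i,\; y - y_i,\; t - t_i\right)
```

Here μ is the background rate, and the sum places a copy of G at every earlier event, which is what "G positioned at all the past events" amounts to.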
George and Jeff took their algorithm back to the streets of LA.
When they plugged the old crime data into the equation, it generated predictions that fitted what had happened in the past.
But could it also predict the future? They began to produce daily crime forecasts, identifying hotspots where crime was likely to strike in the future.
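A minimal sketch of how a model like this could turn past events into a daily forecast: evaluate the intensity at the centre of each grid cell and send patrols to the highest-scoring cells. The exponential decay in time, the Gaussian spread in space, the parameter names (`mu`, `theta`, `omega`, `sigma`), the 150 m cell spacing and all values below are illustrative assumptions, not the parameters or code of the software described in the film.

```python
import math
from typing import List, Tuple

# (x, y, t): position in metres, time in days (units are assumptions)
Event = Tuple[float, float, float]
Point = Tuple[float, float]


def intensity(x: float, y: float, t: float, events: List[Event],
              mu: float, theta: float, omega: float, sigma: float) -> float:
    """Background rate mu plus a triggered contribution from every earlier event."""
    rate = mu
    for xi, yi, ti in events:
        if ti >= t:
            continue  # only events strictly before t can trigger
        dt = t - ti
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        temporal = omega * math.exp(-omega * dt)  # assumed exponential decay in time
        spatial = math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)  # assumed Gaussian spread
        rate += theta * temporal * spatial  # theta: assumed expected offspring per event
    return rate


def top_boxes(centres: List[Point], events: List[Event], t: float, k: int,
              **params: float) -> List[Point]:
    """Rank cell centres by predicted intensity at time t and return the k highest."""
    scored = [(intensity(cx, cy, t, events, **params), (cx, cy)) for cx, cy in centres]
    scored.sort(reverse=True)
    return [centre for _, centre in scored[:k]]


# Hypothetical 5 x 5 grid of cells 150 m apart, two recent events near the origin
# and one older event in the far corner.
grid = [(i * 150.0, j * 150.0) for i in range(5) for j in range(5)]
past = [(0.0, 0.0, 9.5), (20.0, 10.0, 9.8), (600.0, 600.0, 2.0)]
print(top_boxes(grid, past, t=10.0, k=3, mu=1e-7, theta=0.5, omega=0.2, sigma=100.0))
```

With these toy inputs, the two recent events near the origin dominate, so the origin cell and its nearest neighbours rank highest, while the older event in the far corner has largely decayed.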
11 Nunes, there.
Sir.
23 Fowler.
Wallier.
Sir.
Let's go to the mission maps if you would, please.
Today, the LAPD is putting these predictions to the test.
The cops in Foothill are assigned boxes just 500 feet by 500 feet where the algorithm predicts crime is most likely to occur in their 12-hour watch.
Right, predictive mission for today is, we've got a few boxes here to address, in Adam 11's area, 12260 Foothill Boulevard.
They're instructed to hit their boxes as often as they can.
Osborne and Foothill Boulevard.
So you've got your mission for the day? So let's go out there, have fun and be safe.
Yeah, there is a homicide blinking up there.
The trial is monitored at the real-time crime centre in downtown LA.
What we're looking at here is the forecast that was produced by the PredPol software.
So if you see on the centre of this map, we've got three nearly contiguous forecast boxes around this area, and then an adjacent one.
So this is good information for the officers.
They can go out there, work up and down that street, Sheldon, and some of those side streets, and look for criminal activity or evidence that criminal activity might be afoot.
OK, Roger, we'll take it.
SIREN WAILS
Steve and Danny have got the word to go.
The model has predicted car crime in a box on their beat.
It's a kid.
Yeah, it's the same address as that kid that we had yesterday.
When they reach their assigned hotspot, they find a cold-plated car.
The licence plates don't match the vehicle.
They're getting what they need, huh?
When they call the number in, it turns out the car's been stolen.
It was an area where there's a lot of GTAs, which is "grand theft auto", people were stealing cars.
Right out of roll call, right when we got down one of the boxes they went into, one of the areas they started patrolling, right away they ran a car and it came back stolen.
In Foothill, they found using the algorithm led to a 12% decrease in property crime and a 26% decrease in burglary.
At first I said we weren't big on it, you know, and it came to the point where, little by little, you start to see crime in certain areas deteriorate because of us being in that box for, you know, even ten minutes, twenty minutes, even five minutes.
So, we definitely see how it is working.
The model is continuously updated with new crime data, helping to make the predictions ever more accurate.
This whole year since January, Foothill area has been leading the city of Los Angeles in crime reduction, week to week. So the officers, once it started working, then we had buy-in from them, and now it's just a regular course of how they do business.
Predictive policing will be rolled out right across the city of Los Angeles, and is being trialled in over 150 cities across America.
And predicting crime from crime data is just one way the data miners are changing our world.
In fact, the tools that Jeff used to mine the LAPD data can be applied to any dataset.
The vast complexity of the universe... the diversity of human behaviour... even the data we create ourselves every day.
The data miners are reaching into every area of our lives, from medicine to advertising, to the world of high finance.
Professor Phil Beales is a geneticist at the forefront of this data revolution.
The methods he uses today can be traced back to an extraordinary man living in London 300 years ago.
The first data miner, the amateur scientist, John Graunt.
Graunt was living through the greatest health threat of his day, the bubonic plague.
Its causes were an utter mystery.
Graunt began searching for patterns in the parish death records, known as the Bills of Mortality.
The Bills of Mortality were essentially random sets of information, which he brought together, organised and made sense of, and Graunt realised that this information was essentially a gold mine.
Graunt wanted to know who had died of the plague and who had died of something else.
He compiled all the death records together.
And this dataset allowed him to see patterns that no-one else had seen.
He listed a number of the causes of death and categorised them in such a way that one can now look back and see exactly what people died of.
For example, 38 people had King's Evil, which is actually tuberculosis of the neck, otherwise called scrofula.
One patient was bit with a mad dog, another 12 had French Pox, which is actually syphilis.
And in the plague deaths, Graunt found a revealing pattern.
It overturned an idea that everyone shared at the time about what caused the disease.
He was able to refute the widely-held belief that plague might have been caused by person-to-person contact, and he was also able to refute the widely-held belief at that time that plague tended to increase during the first year of the reign of a new king.
And the more Graunt looked at the data, the more hidden patterns he discovered.
People started to see the city of London in an entirely new way.
He was the first to estimate its population.
He proved more boys were born than girls, but that higher male mortality meant the population was soon evenly balanced.
He showed that surprising and rather useful ideas could be mined from data, if you knew how to examine it.
This was a completely new way of looking at the information and extracting really useful data from it, so Graunt was essentially a pioneer.
Graunt was the founding father of statistics and epidemiology, the study of the patterns, causes and effects of disease.
And it's this same power of data that has become fantastically valuable in modern medicine.
Today, Professor Phil Beales is mining a new human dataset, the three billion bits of genetic information that make up the human genome.
He's searching our DNA for clues to help him diagnose and treat illness.
Let me just take a quick look at you.
Jake Pickett is one of his patients.
When Jake was born, there were no extra skin tags or extra toes or fingers or anything like that?
I had a skin tag on my arm.
For 14 years, Jake has lived with an unusual range of symptoms, including learning difficulties, obesity, and poor eyesight.
You had an earring in there?
Yeah.
Oh, OK, you weren't born with that!
His unidentified condition has baffled his parents and doctors.
We've had a lot of tests over the years, and actually, my paediatrician of the time had said to me, "He's such a happy, lovely young boy.
"Why do you want to keep sticking him with needles?" and it made me a bit frightened to keep asking for help, because then I thought maybe the medics would think there's something wrong with me.
But in the course of Jake's lifetime, medicine has changed.
Professor Beales now has the tools that may help Jake and his family unravel this mystery.
...because they know it's difficult for him.
As part of the blood test today, we will take some of that and from that blood take the DNA, extract the DNA, and then we will do the genetic testing on those.
Are you happy with that?
Yeah, yeah.
It will take a few weeks.
So the key really is to try to nail down the diagnosis in this particular situation, if we can.
OK, that's great.
This is just to clean it.
He will search Jake's DNA, hunting for the tiny telltale variations in his genes that may have caused his condition.
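At its very simplest, hunting for variations means lining a patient's sequence up against a reference and listing where they differ. The sketch below assumes the two sequences are already aligned and of equal length and reports only single-letter substitutions; the sequences are invented, and real clinical pipelines also handle alignment, insertions, deletions and sequencing errors, which this does not.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) wherever the sequences differ.

    Assumes the sequences are already aligned to the same length; insertions,
    deletions and sequencing errors are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]


# Invented 12-base fragments, not real patient data.
reference_seq = "ATGGCCTTAGAC"
patient_seq = "ATGGCCTTGGAC"
print(find_substitutions(reference_seq, patient_seq))  # [(9, 'A', 'G')]
```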
Just hold still for me.
Every patient whose genes are analysed adds to the growing database of DNA.
It helps doctors devise new treatments and identify previously mysterious conditions.
Well done, it's all done.
OK? Phew! OK? It wasn't that bad.
Over the last ten years, this technique has successfully revealed the genetic basis of many diseases.
We have got here the coverage and...
Good, OK, well it looks like we've got our gene then, doesn't it?
I hope so.
OK.
Being able to identify a disease is often the first step in helping patients.
So patients live with the uncertainty of a lack of diagnosis for many, many years, and we mustn't underestimate the benefits and the importance of having this diagnosis. Through molecular testing such as this, we're able to provide those patients with a certain level of comfort when it comes to a diagnosis and, in a sense, closure, so they can move on to the next chapter.
Teasing out the patterns in the human dataset is transforming medicine.
Data is becoming a powerful commodity.
It's leading to scientific insights and new ways of understanding human behaviour.
And data can also make you rich, very rich.
TRADERS SHOUT
When it comes to making money out of data, David Harding's rather good at it.
30 years ago, he set out to bring data analysis and algorithms to the trading floors of the City.
This is how all trading used to be done.
All trading used to be done in rooms full of people like this.
They are shouting the prices they will buy and sell at, they are agreeing the deals, the rises and falls in the prices are almost like the rises and falls in the noise level.
Today, the London Metal Exchange is the only trading pit of its kind in Europe.
Noisy, emotional and chaotic.
To a science graduate from Cambridge, it came as a bit of a surprise.
When I went into the City, I assumed because it was the world of banking and high finance, I assumed that it would all be very, very rational and very efficient and very disciplined and well-organised, rather like the body of knowledge I had been taught at Cambridge in physics and chemistry.
These bodies of knowledge were organised and rational, and it wasn't at all like I expected.
Instead, it was, you know, somewhat chaotic, in a way.
Buying and selling strategy in those days tended to be governed by instinct and intuition.
I watched the prices going up and down on the board up there.
I plotted graphs by hand, standing at the edge and followed these graphs and I became convinced that there was a pattern to the rises and falls in prices.
David Harding wanted to bring mathematics to the problem.
He believed that if he had enough data, he could predict patterns in the prices and make money, but the prevailing wisdom was that this was an impossible task.
According to the financial orthodoxy, the rises and falls in prices that take place here are completely random.
Nobody can ever predict them, however clever they are or however much foresight they have.
Essentially, cutting to the chase, the idea is that you can't beat the market.
Like all data miners, Harding needed two things.
Data, a lot of it, and computer algorithms to spot the patterns.
In the mid-1980s, the introduction of computers to the City made data about prices accessible.
Harding had to develop the tools to analyse it.
At that stage in my life, I could program a computer!
HE LAUGHS
I could program a computer, I could read the data from the new exchange, I could conduct analysis of that data and that, to me, was rather an elementary thing to do.
I was surprised that other people hadn't done it first.
You'd have thought that, where all the millions and billions are all sloshing around, you'd have thought that lots of rational, intelligent people would have done these sorts of things.
The company David Harding founded 20 years ago now invests billions of pounds on the basis of data.
That is a lovely dataset you've created, that's why I was waxing rather lyrical.
You might just find a pattern! And that's a large dataset.
That's a lot of stocks on a lot of dates.
Harding is now far from the only scientist in the City.
His company alone employs over 100 scientifically trained data hunters, from astrophysicists to cosmologists, to mathematicians and meteorologists.
They've become known as quants.
Well, there's the joke which is, what do you call a nerd in 20 years' time? And the answer is "Boss," you know! It reminds me of Bill Gates who said at any other point in history he would have been sabre-toothed tiger food.
His company is built around the idea that if you have enough data and the expertise to read it, you can spot trends and links that no-one else has noticed.
He and his analysts can seek out patterns in anything that is bought and sold.
Take, for example, coffee.
Obviously, they will probably almost certainly sell less coffee on a Sunday.
Now that's not a revelation, or that they sell more coffee in winter, because people are indoors more often in winter, but there is an art or a science or a skill which is using the data to find out more interesting things and I'm sure that if my analysts went to work, we could find out much more interesting things than that.
The process begins with data, collecting any information that might be relevant to the cost of coffee.
The data, you can't hear it and you can't see it.
You need specialised tools to interrogate and take decisions about that data and those tools are not the eye and the ear.
They are the modern computer.
Algorithms can then search the data, looking for factors that link to the rises and falls in coffee prices.
These might include the yield of coffee bean harvests, the strength of the economies and currencies of coffee-producing countries, and consumer demand for coffee.
In the vast dataset, tiny but significant signals appear, and it is these signals which hold the clues to when to sell and when to buy.
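To make the idea of a signal concrete, here is a minimal sketch that measures how strongly a candidate factor in one period lines up with the price change in the next, using the Pearson correlation. The factor, the one-period lag and the numbers are invented for illustration and deliberately exaggerated (real signals of this kind are typically far weaker); this is not a description of any firm's actual models.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_signal(factor: Sequence[float], price_changes: Sequence[float], lag: int = 1) -> float:
    """Correlate the factor at period t with the price change at period t + lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return pearson(factor[:-lag], price_changes[lag:])


# Invented toy series: change in harvest yield, and change in coffee price.
yield_change = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0, 1.0, 0.0]
price_change = [0.0, -1.0, 2.0, -3.0, 1.0, -2.0, 3.0, -1.0]
r = lagged_signal(yield_change, price_change)
print(round(r, 3))  # -1.0: in this toy data a bigger harvest is always followed by a price fall
```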
The idea of the exercise is to read in the data on all the companies around the world, analyse that data using rigorous scientific methods and make sensible, rational inferences from that data, not just take decisions on the basis of human feelings and how you feel today and what you heard from your friend and so on and so forth, but really bringing to bear the scientific method much more.
It's a strange mathematical social science, but science, it is.
Here, they gather data across hundreds of markets going right back in time.
Daily metal prices from 1910, food prices dating to the Middle Ages, and London Stock Exchange prices stretching back to 1690.
And every day, they collect new data on 28,000 companies across the world.
We have data coming in almost 24 hours a day for nearly all the markets we trade, and the last time I looked, we had something like 40 terabytes of data in our database, and that's the equivalent of about 70 million King James Bibles.
The ambition is that somewhere in this 40 terabytes of data there are patterns that can be used to predict price rises and falls, and you don't need to predict price changes with pinpoint accuracy.
The odds just need to be a bit better than even.
If you throw a coin and there's a 50/50 chance of it landing heads or tails, then clearly, there's no way of profiting from that.
If however, we had the ability to know that heads was going to come up 52% of the time or 53% of the time, then that would be a great investment business.
If you look closer at the data, then there is something which looks a bit bizarre.
If you have the resources and can make enough investments, spotting even a tiny variation can lead to large profits.
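The coin-toss arithmetic can be checked directly. Assuming independent, even-money bets that each win with probability p, and ignoring costs (simplifications, not a model of real trading), the expected gain per unit staked is 2p - 1, and the chance of finishing ahead overall grows with the number of bets. A minimal sketch:

```python
import math


def expected_gain(p: float, n_bets: int) -> float:
    """Expected net gain from n even-money bets of one unit each."""
    return n_bets * (2 * p - 1)


def prob_ahead(p: float, n_bets: int) -> float:
    """Exact binomial probability of strictly more wins than losses (0 < p < 1)."""
    q = 1 - p
    total = 0.0
    for k in range(n_bets // 2 + 1, n_bets + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (math.lgamma(n_bets + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bets - k + 1)
                    + k * math.log(p) + (n_bets - k) * math.log(q))
        total += math.exp(log_term)
    return total


for n in (101, 1001, 10001):
    print(n, round(expected_gain(0.52, n), 2), round(prob_ahead(0.52, n), 4))
```

For p = 0.52, the probability of ending ahead climbs from about two in three over roughly a hundred bets to near-certainty over ten thousand, which is why the scale of investment matters.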
Over the last 20 years, this approach has paid handsomely for David Harding.
There's never really a point at which you can relax and sit back and go, "There, I have proved my point!" Of course, you know, over the years the ideas have been successful, the company has grown.
It gives me great pride and satisfaction.
Of course, investing in financial markets remains a gamble.
There is no universal law of finance.
Stock market crashes, recessions, they're clearly not easy to predict.
The patterns in the data are constantly shifting and changing.
There is no one right answer.
Every day, week or month, you are being proven wrong by having your ideas put to the test, and that is a gift because it enables you to maintain a level of humility that people may, in other situations, lose, and humility is actually a vital ingredient of proper scientific investigation.
I think most good scientists tend to be quite humble people.
The world of finance has been changed forever by the data revolution.
The effects have spilled over into everyday life.
And the data revolution is set to become even more personal.
The fastest growing dataset of all is the one being created by you.
Every time we call, text, search, travel, buy, we add to the data mountain.
All told, it's growing by 2.5 billion gigabytes every day.
All that data is valuable, and it's brought out the data hunters, like Mike Baker.
The volume of it, the dynamic nature of the data is changing how we live our lives and if you collect this information over millions of people, you can start to guess what they may be interested in next.
He saw an opportunity to bring the data revolution to the world of advertising.
Instead of relying on customers seeing a billboard, it was now possible to beam the adverts directly to them.
We started to look and think about all of the data.
If we collected enough about past behaviour, could it be predictive in a way that would be useful for a business, in terms of trying to connect to people?
Mike wanted to mine this data, to predict what people might want to buy.
His first hurdle was how to search through the vast amount of data we produce every day to find the tiny signals of our consumer interest.
I quickly realised that a big part of the problem was actually the math.
It was clear there were no systems, not even really mathematical constructs, where you could capture the information, make sense of it and then turn around and create actions across hundreds of millions of people simultaneously.
As if capturing the vast dataset created by mobile computing wasn't challenge enough, Mike also wanted to mine it virtually instantaneously.
He wanted to find hints of what people might want to buy even before they'd realised it themselves.
He needed to find a collaborator.
The ideal partner for Mike came from a completely different world.
Bill Simmons was an aerospace engineer at MIT.
He was working on one of NASA's most ambitious tasks of all time, a potential manned mission to Mars.
A mission to Mars is extremely complex, especially if you include people, and it gets very hard if you want to bring the people back.
Bill's team started to work out how to plan all the elements necessary for a manned Mars mission, and discovered the real problem was that there were so many different options to choose from.
We found there were about 35 different major decisions, and many, many, small decisions that follow.
For things like how many crew, what kind of propellant to use, how many rockets, big ones or small ones, what kind of orbit trajectory? So you add all those up and all the different possible choices you can make was 35 billion different possible Mars missions.
And that would have taken, if we were to go through all 35 billion, it would have taken infinite time to find one that works.
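For a sense of scale, suppose purely for illustration that each of the 35 major decisions were a simple two-way choice, and that checking one candidate mission took one second (both are assumptions, not figures from the film):

```latex
2^{35} = 34\,359\,738\,368 \approx 3.4 \times 10^{10}
\qquad
\frac{2^{35}\ \text{s}}{3.156 \times 10^{7}\ \text{s/year}} \approx 1\,089\ \text{years}
```

Under those assumptions the count lands close to the 35 billion quoted, and exhaustive checking would take over a thousand years.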
NASA needed a way to narrow down the possibilities.
Bill turned to decision theory.
It's a complex branch of maths but the principle is the same as something really quite simple - shopping.
Even buying dinner for two, you've got thousands of decisions to make.
You could take all day.
You could try every food, and it would take you hundreds of years to see every combination of apples and, I don't know, mustard or pears and bananas.
To make it simple, you can apply the principle of decision theory.
You can make decisions about things in many different orders.
If you want to decide what to make for dinner, you can decide what food you like first or you can decide what tools you're going to use.
So you could say, "I'm going to cook things with a spatula," and then you have... it doesn't really narrow things down for you.
The trick is to put your decisions in the right order.
If you take big decisions first, you eliminate a lot of smaller decisions and speed up the process.
I did bring a plan.
I'll show it to you.
This is, um... I have three different kinds of recipes.
I can either make salmon, a white fish or branzini, three of my favourite recipes.
If I choose salmon, I'll need mustard and capers and lemon.
If I choose white fish, parsley, eggs and lemon.
And branzini, lemon and rosemary.
So here we are at the seafood section.
Looking around, I see they have some very nice fresh Atlantic salmon and I think that's what I'll buy.
PROGRAMME-MAKER: You strike me as a very organised guy.
Is that a typical Bill thing to do a list like that?
Yes, this is.
You know, studying decision theory, this is how I think about things.
So now the rest of my plan is set in motion.
All I need to do is buy mustard, capers, lemon and some salad, and possibly a side dish, if I see something I like.
Decision theory, which works so well on a shopping trip, can also be applied to the 35 billion possible designs for a manned Mars mission.
Say the first decision only had two choices: you could have two crew or three crew. If you find after a few more decisions that two crew is not possible, that it won't work because you need at least two people in the lander and one person in orbit, then, if you made that decision first, early enough in the process, you've eliminated half of the permutations you need to look at.
So this doubles your speed, and if you continue to use this process over and over again, you continue to speed up your decision process, doubling every time, for example, so it becomes exponentially faster.
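The effect of putting a decisive choice first can be seen in a small depth-first search that abandons a partial plan as soon as it breaks a rule. This is a generic constraint-pruning sketch written for illustration, not Bill Simmons's algorithm; the ten placeholder decisions (`d0` to `d9`) and the single rule that a two-person crew cannot work are simplified, hypothetical stand-ins for the crew example in the text.

```python
from typing import Callable, Dict, List, Tuple

Decision = Tuple[str, List[str]]
Rule = Tuple[Tuple[str, ...], Callable[[Dict[str, str]], bool]]


def count_nodes(decisions: List[Decision], rules: List[Rule]) -> Tuple[int, int]:
    """Depth-first search over decisions taken in the given order.

    Each rule lists the decisions it depends on and is checked as soon as all of
    them have been made, so an infeasible partial plan is abandoned immediately.
    Returns (partial plans visited, complete feasible plans).
    """
    visited = 0
    feasible = 0

    def dfs(depth: int, chosen: Dict[str, str]) -> None:
        nonlocal visited, feasible
        visited += 1
        for needs, ok in rules:
            if all(dep in chosen for dep in needs) and not ok(chosen):
                return  # prune: no completion of this partial plan can work
        if depth == len(decisions):
            feasible += 1
            return
        name, options = decisions[depth]
        for option in options:
            chosen[name] = option
            dfs(depth + 1, chosen)
            del chosen[name]

    dfs(0, {})
    return visited, feasible


# Hypothetical placeholders: ten unconstrained two-way decisions plus crew size,
# with one rule saying a two-person crew cannot work.
others: List[Decision] = [(f"d{i}", ["a", "b"]) for i in range(10)]
crew: Decision = ("crew", ["2", "3"])
rules: List[Rule] = [(("crew",), lambda c: c["crew"] == "3")]

print(count_nodes([crew] + others, rules))  # crew size decided first: (2049, 1024)
print(count_nodes(others + [crew], rules))  # crew size decided last: (4095, 1024)
```

Both orders find the same 1,024 workable plans, but deciding crew size first visits 2,049 partial plans instead of 4,095, roughly halving the work.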
Bill created a decision-making algorithm which was able to process information, putting the decisions that narrowed down the most options first.
The 35 billion possible missions fell to just over 1,000.
It was a revolution in the speed of data processing.
Mike Baker realised Bill's decision-making model was just what he had been looking for.
They joined forces and adapted Bill's super fast decision-making machine.
Now it scans the billions of bits of data we produce, quickly finding clues to what we might buy, then sends a personalised advert from one of their advertising clients.
We're processing hundreds of thousands of advertisements per second, potential advertisements, and determining within 100 milliseconds, so one tenth of a second, much faster than the blink of an eye, whether that advertisement is good for any one of our clients.
The models learn what you might be tempted to buy, and where and when you might buy it.
They all work in concert and they pick up on patterns, so they see the same anonymised user triggering similar behaviours over and over again.
The machine learns this is a person who likes Italian food, is interested in sedans and likes rock music from the '60s.
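A deliberately simple stand-in for the pattern-spotting described here: count how often each anonymised user triggers each interest category and keep the categories that recur. The user ID, category labels and threshold of three are hypothetical, and production systems use statistical models rather than a fixed count.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def interest_profile(events: Iterable[Tuple[str, str]],
                     min_count: int = 3) -> Dict[str, List[str]]:
    """Map each anonymised user ID to the categories it triggered at least min_count times."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, category in events:
        counts[user][category] += 1
    return {user: sorted(cat for cat, n in tally.items() if n >= min_count)
            for user, tally in counts.items()}


# Hypothetical event stream for one anonymised user.
events = ([("u123", "italian_food")] * 3
          + [("u123", "sedans")] * 3
          + [("u123", "60s_rock")] * 4
          + [("u123", "gardening")])
print(interest_profile(events))
```

The one-off "gardening" event falls below the threshold, so only the repeated behaviours end up in the profile.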
The data analysts predicting what you might buy are creating a world of personalised advertising.
If you choose not to personalise the advertising, you'll still get advertising.
It's not a choice to have no advertising.
It's just that it'll be less relevant to you and, you know, potentially more annoying.
We're all familiar with what it's like to see something very annoying.
I saw some today at my house.
I think it was erectile dysfunction.
Totally irrelevant to me!
And advertising is just the start of exploiting our personal data mines.
Even the most insignificant data of everyday life is being mined, with potentially life-saving consequences.
Cathy Sigona is a retired school principal in San Francisco.
She has a condition called atrial fibrillation, which makes her heart beat irregularly.
It felt like a big fish in my chest.
And it was one side here, and then it would just bounce back and forth, and what can happen is the blood can pool and that can cause a clot which then can cause a stroke.
So that's where the real seriousness lies, is the fact that I could stroke out.
The causes of atrial fibrillation are often unknown, so predicting when episodes may occur is vital.
Hi, Nanette, this is Cathy.
So Cathy is about to take part in a trial.
Her doctor is going to monitor her symptoms using data extracted from how she uses her mobile phone.
Dr Jeff Olgin is Cathy's cardiologist.
Because the mobile phone has become such an integral part of people's lives, it's with them most of the day and most of the time, so that becomes a very good real-time data collector for them.
Dr Olgin is trialling software that will record Cathy's daily behaviour.
Any changes to her usual routine might indicate she's unwell.
As a really practical, simple example, let's say you get up and go to work every weekday at 7 o'clock.
If all of a sudden that's changed, we'll notice a difference in your behavioural pattern that might trigger us to say, you know, "What's going on?" And there's lots of fun things that sort of pop up.
Algorithms in the software will search Cathy's data, and if they find signals of abnormal behaviour, they will trigger an alert to Dr Olgin.
It could be a life-saver.
Hopefully in relation to atrial fibrillation in particular, hopefully we will be able to identify behaviours or behavioural patterns that might predict an episode.
Our personal data trails can be used to peer into our behaviour, discovering clues to illness.
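Dr Olgin's example is a routine, such as getting up at 7 o'clock, that suddenly changes. In data terms, this is anomaly detection against a personal baseline. The sketch below is a hypothetical illustration, not the trial's software. It assumes a single daily signal (the time of first phone use, in minutes after midnight), a rolling 14-day baseline and a z-score threshold of 3. All three are chosen only for illustration.

```python
from statistics import mean, stdev


def routine_alerts(first_use_minutes, baseline_days=14, z_threshold=3.0):
    """Return indices of days whose value deviates sharply from the preceding baseline.

    first_use_minutes: one value per day, e.g. the time of first phone use in
    minutes after midnight (a hypothetical signal used only for illustration).
    """
    alerts = []
    for day in range(baseline_days, len(first_use_minutes)):
        window = first_use_minutes[day - baseline_days:day]
        mu = mean(window)
        # Floor the spread at 1 minute so a perfectly regular routine
        # does not cause a division by zero.
        sigma = max(stdev(window), 1.0)
        z = abs(first_use_minutes[day] - mu) / sigma
        if z > z_threshold:
            alerts.append(day)
    return alerts


# Two weeks of getting up around 7:00 (420 minutes), a normal day, then 10:30.
history = [418, 421, 420, 419, 422, 420, 417, 421, 420, 419, 423, 420, 418, 421]
history += [420, 630]
print(routine_alerts(history))  # [15]
```

Each day is compared only with the two weeks before it. The baseline can therefore follow gradual changes in routine, while a sudden jump still stands out.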
And so if we can find a cause that we can fix down the road, and I'm not talking the next couple of weeks, but in the next couple of years, that we can start alleviating some of the stresses that cause me to have atrial fib, I would be extremely pleased.
I have a lot of life left.
The idea of predictive and personalised medicine is coming closer than ever before.
And it's the data we have from the moment we're conceived that will make this idea a reality.
Professor Beales' clinic relies on the biggest human dataset of all, the human genome.
Just 20 years ago, his work would have been all but impossible but now he can analyse his patients' DNA to pinpoint the genetic mutations causing disease.
We still have a myriad of diseases, particularly at this hospital, where there are many, many children who do not have yet a diagnosis for their often rare condition, and I think at the moment, one of the things we really need to do is to be able to sequence as many of these children as possible so that we can begin to unravel a lot of these mysteries.
Genetic diagnosis has already helped identify new conditions, allowing doctors to devise new treatments and research cures that promise to improve our lives.
And so far, we only really understand about one percent of our genome.
These volumes represent the whole of the human genome, the coding element of the human genome.
In other words, the sequence of all of the letters that go to make up a single human being.
This is a huge discovery.
However, it is just the tip of the iceberg.
The medical use of our DNA data is in its infancy.
We're just beginning to glimpse the 99% of the genome which we used to think was junk, but now realise is vitally important.
So the 99% of the genome that's left for us to understand is going to represent a huge task.
There's an enormous amount of information in there and we have to be able to relearn, we have to actually be able to develop new tools to be able to understand the code that's hidden within that vast chunk of the genome.
But even the huge dataset of the human genome is dwarfed by the one that has its roots in the very first data science.
Astronomy.
For centuries, astronomers like Simon Ratcliffe have been collecting data from the billions of stars and galaxies in the night sky.
In many ways, astronomy was the first of the natural sciences, and it was the Babylonians who kicked that off and they started to notice that it wasn't just random.
There were patterns.
There's certain things in the sky that seem to move over and they're always fixed, relative to each other.
Those were the stars.
Then they noticed that certain objects in the sky seemed to wander.
That was the planets.
And so what they did, as you do, is you record the movements.
They wrote down this data and in recording that data over long periods of time, they were able to tease out the patterns inherent and that gave the ability to start to understand the universe.
The science of astronomy was founded on data hunting.
Astronomers use the patterns of nature, the predictability of stars, to unlock the secrets of the universe.
At the moment, we have the Southern Cross to the left.
We have Scorpio right in ascendance above us.
Scorpio was first identified and named over 5,000 years ago.
And if you look closely, you can see a bright red star there called the Heart of Scorpio.
That's a star called Antares, which is a super-giant.
Now with more data, scientific equations and mathematical models, astronomers can forecast the fate of Antares.
This is a fairly massive star that's getting towards the end of its life.
What's going to happen is it's going to expend its nuclear fuel and basically collapse in on itself, and then form a black hole.
So, if we look at this night sky, at this epic splendour above us, you don't just see stars.
You see this kind of potential for discovery.
Astronomers are only just beginning to unlock the potential of this vast dataset.
Today, astronomers like Simon are using a new set of tools to mine the eternal dataset of the stars.
And as these tools improve, they can detect more and more detail in the patterns of the universe.
In some ways, beach-combing for shells is a bit like great astronomy at the moment.
You know, we have a sort of wide plain, but we pick the low-hanging fruit.
A big shell like this is pretty easy to pick up.
You know this might be representative of what we could do 50 years ago.
And then we start to get down into smaller stuff, right down into the sand, into the heart of the matter, to a point where we can see something deeply hidden that we're really interested in.
And the key to getting there is the next generation of data, really big data.
Simon's challenge is to find new unmined data about the universe that will reveal new discoveries.
His latest project promises to deliver exactly that.
The key to it is a site deep in the Karoo, a broad semi-desert in South Africa's Northern Cape.
We're about 200-odd kilometres away from Cape Town and there's still another maybe 500 to go before we get to the site, and as you can see, it's the road to nowhere, really.
The KAT-7 array of radio telescopes are listening for electrical signals that have travelled billions of light years and are infinitesimally weak.
We need to be really far away from people and the things that they do because anything modern really interferes with our observations.
So people, their microwaves, their cell phones, their cars, all these things really drive us further and further away.
The data KAT-7 has already catalogued has increased our knowledge of the universe.
We've been imaging neutral hydrogen in our galaxy.
We've been looking at transient events.
We've looked at pulsars, but really, we're limited by data.
We need more data to do better science.
The signals Simon looks for are so small that, despite a combined detecting area of over 1,000 square metres, these seven telescopes capture just two megabits of data per second, and Simon's ambitions go far beyond that.
I think really understanding how galaxies came to be the way they are, you know the evolution of the universe, I think that's one of the most exciting things we can anticipate addressing and to really answer the questions of how did the universe get to be as it is and where is it going? It's only achievable through big data.
We really need to catalogue the entire universe.
We have to figure out what it was like at every epoch and that's the only way to really understand how it evolved and where it's going.
Life in the Karoo is about to change.
These telescopes are set to be joined by more, thousands more.
A new telescope array will fill the valley, covering a square kilometre, the biggest array in the world.
Over the next ten to fifteen years, this valley is going to fill up with telescopes.
As far as the eye can see, you'll see telescopes forming a vast array, bringing data, siphoning back into the Karoo where science is going to be done on an unprecedented scale.
Work has now begun on the array.
The new telescopes will receive 30 terabytes of data per second.
It will be the biggest data collector ever built.
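The jump from the existing seven telescopes to the new array can be put in numbers. The calculation below uses the film's figures of two megabits per second and 30 terabytes per second. It assumes decimal units (1 megabit = 10^6 bits, 1 terabyte = 10^12 bytes) and 8 bits per byte:

```latex
\begin{aligned}
\text{Seven-dish array:} \quad & 2\ \text{Mbit/s} = 2 \times 10^{6}\ \text{bit/s} \\
\text{New array:} \quad & 30\ \text{TB/s} = 30 \times 10^{12} \times 8\ \text{bit/s} = 2.4 \times 10^{14}\ \text{bit/s} \\
\text{Ratio:} \quad & \frac{2.4 \times 10^{14}}{2 \times 10^{6}} = 1.2 \times 10^{8} \\
\text{Per day:} \quad & 30 \times 10^{12}\ \text{B/s} \times 86\,400\ \text{s} \approx 2.6 \times 10^{18}\ \text{B} \approx 2.6\ \text{EB}
\end{aligned}
```

On these assumptions, the new array would collect data roughly 120 million times faster than the seven dishes do today.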
We're moving into the regime of unprecedented amounts of information.
We have to take a step back from the data and think, "What are we trying to extract from the data? What is the information that's actually contained therein?" And make sure that our tools and our techniques that we bring to bear look for the patterns in the data.
This really requires a new breed of astronomers to see how we're going to change from where we are now to this next big shift.
Simon Ratcliffe and his team have to develop a way to extract the important patterns from a huge flood of telescopic data.
If they can do it, they will discover the greatest secrets of our universe.
So it's pretty easy to get lost in the challenge and the grand endeavour of the whole thing and feel, you know, you're kind of the master of the universe, sucking down and unlocking the secrets out there.
You sort of sit here and think, "I'm this little small human and what right do I have to go and pull these secrets out of the universe?" But that's our task.
You know, that's what we're going to do and I think that this project and these data challenges really offer us that opportunity to understand fully our universe, where it came from and where it's going.
The data revolution is transforming our world.
We're devising ever more complex ways of gathering data and ever more ingenious ways of mining it.
Data is becoming the most valuable commodity of the 21st century.
The world of big data has arrived.
Face the wall, face the wall before I put you in handcuffs.
The police are trying to predict crime before it even happens.
It actually gives us a forecast about where crime is most likely to happen in the next 12 hours.
In the City of London, this scientist-turned-trader believes he's found the secret of making millions, with maths.
The potential to do things with data is fantastic, fantastic.
And in South Africa, this star-gazer has set out to catalogue the entire cosmos, by listening to every single star.
What unites these different worlds is an explosion in data.
The volume of it, the dynamic nature of the data is changing how we live our lives.
In just the last few years, we've produced more data than in all of human history.
In this film, we follow the people who are mining this data.
It's set to become one of the greatest sources of power in the 21st century.
6am, Los Angeles.
The start of shift in the Foothill division.
Officer Steve Nunes, a 12-year-veteran of the LAPD, and his partner Danny Fraser head out to patrol.
Right now, we're north of Los Angeles, downtown Los Angeles, in the San Fernando Valley area.
Their beat is one of LA's toughest neighbourhoods.
There's a lot of BFMVs, burglary from motor vehicles.
There's a lot of robberies, there's a lot of gang and narcotic activity over here.
There's a lot of people selling drugs.
The gang that's in this area are called the Project Boys.
They're a Hispanic gang.
Despite their experience and intimate knowledge of the neighbourhood, today, their patrol is being controlled by a computer algorithm.
You know, I wasn't really too happy about it, you know, specially as a police officer you know, we kind of go off of what we know from our training.
We weren't too happy about a computer telling us where we need to do our police work and what area we need to drive around.
Steve and Danny are part of a ground-breaking trial.
An equation is being used to predict where crime will occur on their watch.
I saw some people hanging out by the laundry, like the little laundry.
I guess If its predictions are correct, the system will be rolled out across all LA Hey, stop, yeah, stop.
Stop, stop, stop.
Put your hands on your head.
.
.
and the computer algorithm will become a routine part of Steve's working life.
Spread your feet, face forward.
Have anything on you? Stop moving.
Face the wall, face the wall before I put you in handcuffs.
You a Project Boy too or no? The ambition to predict crime was born out of a remarkable collaboration between the LAPD .
.
and the University of California.
Jeff Brantingham might seem an unlikely crime fighter.
A professor of anthropology, he is an expert on remote hunter-gatherer tribes in China, but he's convinced that from remote China to gangland LA, all human behaviour is far more predictable than you might like to believe.
We all like to think that we are in control of everything, but in fact all of our behaviour is very regular, very patterned in ways that is often frightening to us.
Offenders are no different.
They do exactly the same things over and over and over again, and their criminal offending patterns emerge right out of that regularity of their behaviour.
Jeff believed he could find repeating patterns of criminal behaviour in the LAPD's vast dataset - 13 million crimes recorded over 80 years.
The LAPD have droves and droves of data about where and when crimes have been occurring.
It represents a treasure trove of potential information for understanding the nature of crime.
The LAPD already use their crime data to identify hotspots of crime, but that only tells them where crime has already struck.
We've gotten very good at looking at dots on a map, and where, where crime has occurred and the problem with that is that, sometimes, you're making an assumption that today is the same as yesterday.
Jeff Brantingham planned to do something more radical and more useful - predict the future.
He believed he could use patterns in the crime data to predict where and when crime was likely to occur.
We've long used the patterns in nature to make predictions.
From the setting sun, we learned when to expect the new day.
The phases of the moon allowed us to forecast the ebb and flow of the tides.
And from observing the patterns of the stars, we mastered the art of navigation.
But Jeff Brantingham wanted to do something far more ambitious.
He wanted to tease out patterns in the apparent chaos of human behaviour, to uncover them in the LAPD's vast dataset of 13 million past crimes.
You can have gut feelings about the crime but, ultimately, you need to think about working in a mathematical framework because mathematics gives you the ability to understand exactly why things are happening within the data in a way that gut feelings do not.
Jeff needed an expert in pattern detection.
He turned to his colleague, UCLA mathematician George Mohler.
As mathematicians, we're interested in understanding what's around you so, you know, how do waves propagate if you throw a pebble into the water? The distribution of trees in a forest.
So mathematical models can help you understand those types of things.
George could use mathematical tools to see what was hidden in the crime data.
And there were hints of a pattern in it.
What you see is that after a crime occurs, there's an elevated risk and that risk travels to neighbouring regions.
So what we wanted to do is develop a model to take that into account so police could maybe use that information to prevent those crimes from occurring.
He started with a mathematical model that was already being used, right here on the west coast of America.
Southern California is earthquake country.
Sitting on the San Andreas Fault, there's an average of 10,000 earthquakes and after-shocks every year.
The biggest for 100 years was the Loma Prieta earthquake of 1989.
Its epicentre was here, just outside Santa Cruz, California.
There is quite simply no mathematical model that can predict an earthquake like this one.
But after the earthquake come the after-shocks and that's a different matter.
So we're several hundred metres from the epicentre.
Nearby was one of the after-shocks of the original Loma Prieta earthquake.
After a large earthquake occurs, there is a probability that another earthquake will follow nearby in space and time.
George discovered seismologists had found a pattern to earthquake after-shocks and developed an algorithm to predict these after-shock clusters.
These types of clustering patterns are also seen in crime data.
So, after a crime occurs, you will see an increased likelihood of future events nearby in space and time.
You can think of them as after-shocks of crime.
George and Jeff took the equation for predicting earthquake after-shocks and began to adapt it to predict crime.
So the model is broken into several parts, so the overall rate of crime, which we'll call Lamda, models the rate of events in space and time.
We use the Greek letter Myu to represent the background amount of crime that's going on.
The second component to Lamda is G.
G models the distribution of crimes following an initial event.
This whole term overall describes what we call self-excitation, that a crime that occurs today actually self-excites the possibility of future crimes.
So Lamda equals Myu plus G, is that right? Well, sort of, so Lamda equals Myu plus G positioned at all the past events in your dataset.
George and Jeff took their algorithm back to the streets of LA.
When they plugged the old crime data into the equation, it generated predictions that fitted what had happened in the past.
But could it also predict the future? They began to produce daily crime forecasts, identifying hotspots where crime was likely to strike in the future.
11 Nunes, there.
Sir.
23 Fowler.
Wallier.
Sir.
Let's go to the mission maps if you would, please.
Today, the LAPD is putting these predictions to the test.
The cops in Foothill are assigned boxes of just 500 square feet where the algorithm predicts crime is most likely to occur in their 12-hour watch.
Right, predictive mission for today is, we've got a few boxes here to address, in Adam 11's area, 12260 Foothill Boulevard.
They're instructed to hit their boxes as often as they can.
Osborne and Foothill Boulevard.
So you've got your mission for the day? So let's go out there, have fun and be safe.
Yeah, there is a homicide blinking up there.
The trial is monitored at the real time crime centre in downtown LA.
What we're looking at here is the forecast that was produced by the PredPol software.
So if you see on the centre of this map, we've got three nearly contiguous forecast boxes around this area, and then an adjacent one.
So this is good information for the officers.
They can go out there, work up and down that street, Sheldon, and some of those side streets, and look for criminal activity or evidence that criminal activity might be afoot.
OK, Roger, we'll take it.
SIREN WAILS Steve and Danny have got the word to go.
The model has predicted car crime in a box on their beat.
It's a kid.
Yeah, it's the same address as that kid that we had yesterday.
When they reach their assigned hotspot, they find a cold-plated car.
The licence plates don't match the vehicle.
They're getting what they need, huh? When they call the number in, it turns out the car's been stolen.
It was an area where there's a lot of GTAs, which is "grand theft auto", people were stealing cars.
Right out of roll call, right when we got down one of the boxes they went into, one of the areas they started patrolling, right away they ran a car and it came back stolen.
In Foothill, they found using the algorithm led to a 12% decrease in property crime and a 26% decrease in burglary.
At first I said we weren't big on it, you know, and it came to the point where, little by little, you start to see crime in certain areas deteriorate because of us being in that box for, you know, even ten minutes, twenty minutes, even five minutes.
So, we definitely see how it is working.
The model is continuously updated with new crime data, helping to make the predictions ever more accurate.
This whole year since January, Foothill area has been leading the city of Los Angeles in crime reduction, week to week, so the officers, once it started working, then we had buy-in from them and now it's just a regular course of how they do business.
Predictive policing will be rolled out right across the city of Los Angeles, and is being trialled in over 150 cities across America.
And predicting crime from crime data is just one way the data miners are changing our world.
In fact, the tools that Jeff used to mine the LAPD data can be applied to any dataset.
The vast complexity of the universe .
.
the diversity of human behaviour .
.
even the data we create ourselves every day.
The data miners are reaching into every area of our lives, from medicine to advertising, to the world of high finance.
Professor Phil Beales is a geneticist at the forefront of this data revolution.
The methods he uses today can be traced back to an extraordinary man living in London 300 years ago.
The first data miner, the amateur scientist, John Graunt.
Graunt was living through the greatest health threat of his day, the bubonic plague.
Its causes were an utter mystery.
Graunt began searching for patterns in the parish death records, known as the Bills of Mortality.
The Bills of Mortality were essentially random sets of information which he brought together and organised and made sense of that information, so Graunt realised that this information was essentially a gold mine.
Graunt wanted to know who had died of the plague and who had died of something else.
He compiled all the death records together.
And this dataset allowed him to see patterns that no-one else had seen.
He listed a number of the causes of death and categorised them in such a way that one can now look back and see exactly what people died of.
For example 38 people had King's Evil, which is actually tuberculosis of the neck or otherwise called scrofula.
One patient was bit with a mad dog, another 12 had French Pox, which is actually syphilis.
And in the plague deaths, Graunt found a revealing pattern.
It overturned an idea that everyone shared at the time about what caused the disease.
He was able to refute the widely-held belief that plague might have been caused by person-to-person contact, and he was also able to refute the widely-held belief at that time that plague tended to increase during the first year of the reign of a new king.
And the more Graunt looked at the data, the more hidden patterns he discovered.
People started to see the city of London in an entirely new way.
He was the first to estimate its population.
He proved more boys were born than girls, but that higher male mortality meant the population was soon evenly balanced.
He showed that surprising and rather useful ideas could be mined from data, if you knew how to examine it.
This was a completely new way of looking at the information and from extracting really useful data, so Graunt was essentially a pioneer.
Graunt was the founding father of statistics and epidemiology, the study of the patterns, causes and effects of disease.
And it's this same power of data that has become fantastically valuable in modern medicine.
Today, Professor Phil Beales is mining a new human dataset, the three billion bits of genetic information that make up the human genome.
He's searching our DNA for clues to help him diagnose and treat illness.
Let me just take a quick look at you.
Jake Pickett is one of his patients.
When Jake was born, there were no extra skin tags or extra toes or fingers or anything like that? I had a skin tag on my arm.
For 14 years, Jake has lived with an unusual range of symptoms, including learning difficulties, obesity, and poor eyesight.
You had an earring in there? Yeah.
Oh, OK, you weren't born with that! His unidentified condition has baffled his parents and doctors.
We've had a lot of tests over the years, and actually, my paediatrician of the time had said to me, "He's such a happy, lovely young boy.
"Why do you want to keep sticking him with needles?" and it made me a bit frightened to keep asking for help, because then I thought maybe the medics would think there's something wrong with me.
But in the course of Jake's lifetime, medicine has changed.
Professor Beales now has the tools that may help Jake and his family unravel this mystery.
.
.
because they know it's difficult for him.
As part of the blood test today, we will take some of that and from that blood take the DNA, extract the DNA, and then we will do the genetic testing on those.
Are you happy with that? Yeah, yeah.
It will take a few weeks.
So the key really is to try to nail down the diagnosis in this particular situation, if we can.
OK, that's great.
This is just to clean it.
He will search Jake's DNA, hunting for the tiny telltale variations in his genes that may have caused his condition.
Just hold still for me.
Every patient whose genes are analysed adds to the growing database of DNA.
It helps doctors devise new treatments and identify previously mysterious conditions.
Well done, it's all done.
OK? Phew! OK? It wasn't that bad.
Over the last ten years, this technique has successfully revealed the genetic basis of many diseases.
We have got here the coverage and Good, OK, well it looks like we've got our gene then, doesn't it? I hope so.
OK.
Being able to identify a disease is often the first step in helping patients.
So patients live with the uncertainty of a lack of diagnosis for many, many years and we can't underestimate the benefits and the importance of having this diagnosis, so through molecular testing such as this, we're able to provide those patients with a certain level of comfort when it comes to a diagnosis, and, in a sense, closure, so they can move on to the next chapter.
Teasing out the patterns in the human dataset is transforming medicine.
Data is becoming a powerful commodity.
It's leading to scientific insights and new ways of understanding human behaviour.
And data can also make you rich, very rich.
TRADERS SHOU When it comes to making money out of data, David Harding's rather good at it.
30 years ago, he set out to bring data analysis and algorithms to the trading floors of the City.
This is how all trading used to be done.
All trading used to be done in rooms full of people like this.
They are shouting the prices they will buy and sell at, they are agreeing the deals, the rises and falls in the prices are almost like the rises and falls in the noise level.
Today, the London Metals Exchange is the only trading pit of its kind in Europe.
Noisy, emotional and chaotic.
To a science graduate from Cambridge, it came as a bit of a surprise.
When I went into the City, I assumed because it was the world of banking and high finance, I assumed that it would all be very, very rational and very efficient and very disciplined and well-organised, rather like the body of knowledge I had been taught at Cambridge in physics and chemistry.
These bodies of knowledge were organised and rational, and it wasn't at all like I expected.
But that it was, you know, somewhat chaotic, in a way.
Buying and selling strategy in those days tended to be governed by instinct and intuition.
I watched the prices going up and down on the board up there.
I plotted graphs by hand, standing at the edge and followed these graphs and I became convinced that there was a pattern to the rises and falls in prices.
David Harding wanted to bring mathematics to the problem.
He believed that if he had enough data, he could predict patterns in the prices and make money, but the prevailing wisdom was that this was an impossible task.
According to the financial orthodoxy, the rises and falls in prices that take place here are completely random.
Nobody can ever predict them, however clever they are or however much foresight they have.
Essentially, cutting to the chase, the idea is that you can't beat the market.
Like all data miners, Harding needed two things.
Data, a lot of it, and computer algorithms to spot the patterns.
In the mid-1980s, the introduction of computers to the City made data about prices accessible.
Harding had to develop the tools to analyse it.
At that stage in my life, I could program a computer! HE LAUGHS I could program a computer, I could read the data from the new exchange, I could conduct analysis of that data and that, to me, was rather an elementary thing to do.
I was surprised that other people hadn't done it first.
You'd have thought that, where all the millions and billions are all sloshing around, you'd have thought that lots of rational, intelligent people would have done these sorts of things.
The company David Harding founded 20 years ago now invests billions of pounds on the basis of data.
That is a lovely dataset you've created, that's why I was waxing rather lyrical.
You might just find a pattern! And that's a large dataset.
That's a lot of stocks on a lot of dates.
Harding is now far from the only scientist in the City.
His company alone employs over 100 scientifically trained data hunters, from astrophysicists to cosmologists, to mathematicians and meteorologists.
They've become known as quants.
Well, there's the joke which is, what do you call a nerd in 20 years' time? And the answer is "Boss," you know! It reminds me of Bill Gates who said at any other point in history he would have been sabre-toothed tiger food.
His company is built around the idea that if you have enough data and the expertise to read it, you can spot trends and links that no-one else has noticed.
He and his analysts can seek out patterns in anything that is bought and sold.
Take, for example, coffee.
Obviously, they will probably almost certainly sell less coffee on a Sunday.
Now that's not a revelation, or that they sell more coffee in winter, because people are indoors more often in winter, but there is an art or a science or a skill which is using the data to find out more interesting things and I'm sure that if my analysts went to work, we could find out much more interesting things than that.
The process begins with data, collecting any information that might be relevant to the cost of coffee.
The data, you can't hear it and you can't see it.
You need specialised tools to interrogate and take decisions about that data and those tools are not the eye and the ear.
They are the modern computer.
Algorithms can then search the data, looking for factors that link to the rises and falls in coffee prices.
The yield of coffee bean harvests for example, the strengths of the economies and currencies of coffee-producing countries, as well as consumer demand for coffee.
In the vast dataset, tiny significant signals appear and it is these signals which hold the clues to when to sell and when to buy.
The idea of the exercise is to read in the data on all the companies around the world, analyse that data using rigorous scientific methods and make sensible, rational inferences from that data, not just take decisions on the basis of human feelings and how you feel today and what you heard from your friend and so on and so forth, but really bringing to bear the scientific method much more.
It's a strange mathematical social science, but science, it is.
Here, they gather data across hundreds of markets going right back in time.
Daily metal prices from 1910, food prices dating to the Middle Ages, and London Stock Exchange prices stretching back to 1690.
And every day, they collect new data on 28,000 companies across the world.
We have data coming in almost 24 hours a day for nearly all the markets we trade, and the last time I looked, we had something like 40 terabytes of data in our database, and that's the equivalent of about 70 million King James Bibles.
The ambition is that somewhere in this 40 terabytes of data there are patterns that can be used to predict price rises and falls, and you don't need to predict price changes with pinpoint accuracy.
The odds just need to be a bit better than even.
If you throw a coin and there's a 50/50 chance of it landing heads or tails, then clearly, there's no way of profiting from that.
If however, we had the ability to know that heads was going to come up 52% of the time or 53% of the time, then that would be a great investment business.
You should look closer to the data, then there is something which looks a bit bizarre.
First If you have the resources and can make enough investments, spotting even a tiny variation can lead to large profits.
Over the last 20 years, this approach has paid handsomely for David Harding.
There's never really a point at which you can relax and sit back and go, "There, I have proved my point!" Of course, you know, over the years the ideas have been successful, the company has grown.
It gives me great pride and satisfaction.
Of course, investing in financial markets remains a gamble.
There is no universal law of finance.
Stock market crashes, recessions, they're clearly not easy to predict.
The patterns in the data are constantly shifting and changing.
There is no one right answer.
Every day, week or month, you are being proven wrong by having your ideas put to the test, and that is a gift because it enables you to maintain a level of humility that people may, in other situations, lose, and humility is actually a vital ingredient of proper scientific investigation.
I think most good scientists tend to be quite humble people.
The world of finance has been changed forever by the data revolution.
The effects have spilled over into everyday life.
And the data revolution is set to become even more personal.
The fastest growing dataset of all is the one being created by you.
Every time we call, text, search, travel, buy, we add to the data mountain.
All told, it's growing by 2.5 billion gigabytes every day.
All that data is valuable, and it's brought out the data hunters, like Mike Baker.
The volume of it, the dynamic nature of the data is changing how we live our lives and if you collect this information over millions of people, you can start to guess what they may be interested in next.
He saw an opportunity to bring the data revolution to the world of advertising.
Instead of relying on customers seeing a billboard, it was now possible to beam the adverts directly to them.
We started to look and think about all of the data.
If we collected enough about past behaviour, could it be predictive in a way that would be useful for a business, in terms of trying to connect to people?
Mike wanted to mine this data, to predict what people might want to buy.
His first hurdle was how to search through the vast amount of data we produce every day to find the tiny signals of our consumer interest.
I quickly realised that a big part of the problem was actually the math.
It was clear there were no systems, not even really mathematical constructs, where you could capture the information, make sense of it and then turn around and create actions across hundreds of millions of people simultaneously.
As if capturing the vast dataset created by mobile computing wasn't challenge enough, Mike also wanted to mine it virtually instantaneously.
He wanted to find hints of what people might want to buy even before they'd realised it themselves.
He needed to find a collaborator.
The ideal partner for Mike came from a completely different world.
Bill Simmons was an aerospace engineer at MIT.
He was working on one of NASA's most ambitious tasks of all time, a potential manned mission to Mars.
A mission to Mars is extremely complex, especially if you include people, and it gets very hard if you want to bring the people back.
Bill's team started to work out how to plan all the elements necessary for a manned Mars mission, and discovered the real problem was that there were so many different options to choose from.
We found there were about 35 different major decisions, and many, many, small decisions that follow.
For things like how many crew, what kind of propellant to use, how many rockets, big ones or small ones, what kind of orbit trajectory? So you add all those up and all the different possible choices you can make was 35 billion different possible Mars missions.
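The size of the search space follows from multiplying together the number of options for each decision. The film does not give the option counts behind the 35 billion figure. The sketch below therefore uses a hypothetical simplification in which each of the 35 major decisions is a plain two-way choice. That gives 2^35, about 34 billion, which is the same order of magnitude as the figure quoted. The one-second evaluation time is also an assumption. It is included only to show why checking every mission one by one is impractical.

```python
from math import prod

def count_missions(options_per_decision):
    """Every combination of choices is a distinct candidate mission, so the
    total is the product of the number of options for each decision."""
    return prod(options_per_decision)

# Hypothetical: treat each of the 35 major decisions as a two-way choice.
n = count_missions([2] * 35)
print(f"{n:,} candidate missions")  # 34,359,738,368 candidate missions

# Hypothetical cost: one second to evaluate each mission, checked one after another.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
print(f"about {n / SECONDS_PER_YEAR:,.0f} years to check them all")
```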
And that would have taken, if we were to go through all 35 billion, it would have taken infinite time to find one that works.
NASA needed a way to narrow down the possibilities.
Bill turned to decision theory.
It's a complex branch of maths but the principle is the same as something really quite simple - shopping.
Even buying dinner for two, you've got thousands of decisions to make.
You could take all day.
You could try every food, and it would take you hundreds of years to see every combination of apples and, I don't know, mustard or pears and bananas.
To make it simple, you can apply the principle of decision theory.
You can make decisions about things in many different orders.
If you want to decide what to make for dinner, you can decide what food you like first or you can decide what tools you're going to use.
So you could say, "I'm going to cook things with a spatula," and then you have... it doesn't really narrow things down for you.
The trick is to put your decisions in the right order.
If you take big decisions first, you eliminate a lot of smaller decisions and speed up the process.
I did bring a plan.
I'll show it to you.
This is, um, I have three different kinds of recipes.
I can either make salmon, a white fish or branzini, three of my favourite recipes.
If I choose salmon, I'll need mustard and capers and lemon.
If I choose white fish, parsley, eggs and lemon.
And branzini, lemon and rosemary.
So here we are at the seafood section.
Looking around, I see they have some very nice fresh Atlantic salmon and I think that's what I'll buy.
PROGRAMME-MAKER: You strike me as a very organised guy.
Is that a typical Bill thing to do a list like that?
Yes, this is.
You know, studying decision theory, this is how I think about things.
So now the rest of my plan is set in motion.
All I need to do is buy mustard, capers, lemon and some salad, and possibly a side dish, if I see something I like.
Decision theory, which works so well on a shopping trip, can also be applied to narrowing down the 35 billion possible Mars missions.
If the first decision only had two choices, you could have two crew or three crew, if you find after a few more decisions that two crew is not possible, it won't work, because you need at least two people in the lander and one person in orbit, then you've eliminated essentially, if you made that decision first, early enough in the process, you've eliminated half of the permutations you need to look at.
So this increases your speed by half, and if you continue to use this process over and over again, you continue to speed up your decision process, doubling every time, for example, so it becomes exponentially faster.
Bill created a decision-making algorithm which was able to process information, putting the decisions that narrowed down the most options first.
The 35 billion possible missions fell to just over 1,000.
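The film does not describe Bill's algorithm in detail. The following is only a generic sketch of the principle he explains: walk the tree of decisions, and abandon a branch as soon as the choices made so far rule it out. The toy problem is hypothetical. It has a crew-size decision, where two crew is infeasible as in Bill's lander-and-orbit example, plus ten other yes/no choices. Putting the crew decision first cuts the work roughly in half. The final lines show the arithmetic of repeated halving. Twenty-five successive halvings take 35 billion down to about 1,043, which is "just over 1,000". This is an illustration of the scale, not a claim about how the real figure was reached.

```python
import math

def nodes_visited(order, is_infeasible, partial=None):
    """Depth-first walk of a decision tree that abandons a branch as soon as
    the choices made so far are infeasible. Returns how many partial or
    complete plans were looked at."""
    if partial is None:
        partial = {}
    if is_infeasible(partial) or len(partial) == len(order):
        return 1  # dead end or finished plan: nothing further to expand
    name, options = order[len(partial)]
    return 1 + sum(
        nodes_visited(order, is_infeasible, {**partial, name: opt})
        for opt in options
    )

# Hypothetical toy problem: a crew-size decision plus ten other yes/no choices.
crew = ("crew", (2, 3))
others = [(f"choice_{i}", (0, 1)) for i in range(10)]

def too_few_crew(plan):
    # Two crew cannot staff both the lander (needs two) and the orbiter (needs one).
    return plan.get("crew") == 2

print(nodes_visited([crew] + others, too_few_crew))  # crew decided first: 2049
print(nodes_visited(others + [crew], too_few_crew))  # crew decided last: 4095

# Each early decision that rules out half the options halves what is left to search.
halvings = int(math.log2(35e9 / 1_000))
print(halvings, round(35e9 / 2 ** halvings))  # 25 1043
```

When the crew question comes last, the infeasible branch is discovered only after every other choice has been explored, so nothing is saved. Asking it first discards that half of the tree immediately.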
It was a revolution in the speed of data processing.
Mike Baker realised Bill's decision-making model was just what he had been looking for.
They joined forces and adapted Bill's super fast decision-making machine.
Now it scans the billions of bits of data we produce, quickly finding clues to what we might buy, then sends a personalised advert from one of their advertising clients.
We're processing hundreds of thousands of advertisements per second, potential advertisements, and determining within 100 milliseconds, so one tenth of a second, much faster than the blink of an eye, whether that advertisement is good for any one of our clients.
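A rough sense of what that rate and time limit imply comes from Little's law. In a steady system, the average number of requests in progress equals the arrival rate multiplied by the time each one spends being handled. "Hundreds of thousands" is not a precise figure, so the 300,000 per second below is a hypothetical example. Because 100 milliseconds is the maximum decision time, the result is an upper bound on how many ad decisions would be in progress at any moment.

```python
def ads_in_flight(requests_per_second: float, latency_ms: float) -> float:
    """Little's law: average requests in progress = arrival rate x time each spends in the system."""
    return requests_per_second * latency_ms / 1000

# Hypothetical load: 300,000 potential ads per second, each decided within the 100 ms budget.
print(ads_in_flight(300_000, 100))  # 30000.0
```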
The models learn what you might be tempted to buy, and where and when you might buy it.
They all work in concert and they pick up on patterns, so they see the same anonymised user triggering similar behaviours over and over again.
The machine learns that this is a person who likes Italian food, is interested in sedans and likes rock music from the '60s.
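The idea of spotting "the same anonymised user triggering similar behaviours over and over again" can be sketched with a simple frequency count. This is a deliberately simple stand-in for the learned models described in the film, not their method. The event log, the user ID and the threshold of three repeats are all hypothetical.

```python
from collections import Counter, defaultdict

def infer_interests(events, min_count=3):
    """Map each anonymised user ID to the categories they triggered at least min_count times."""
    counts = defaultdict(Counter)
    for user_id, category in events:
        counts[user_id][category] += 1
    return {
        user_id: sorted(cat for cat, n in tally.items() if n >= min_count)
        for user_id, tally in counts.items()
    }

# Hypothetical event log for one anonymised user.
events = (
    [("user-7f3a", "italian food")] * 3
    + [("user-7f3a", "sedans")] * 4
    + [("user-7f3a", "60s rock")] * 5
    + [("user-7f3a", "gardening")]  # seen once: not yet a pattern
)
print(infer_interests(events))
# {'user-7f3a': ['60s rock', 'italian food', 'sedans']}
```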
The data analysts predicting what you might buy are creating a world of personalised advertising.
If you choose not to personalise the advertising, you'll still get advertising.
It's not a choice to have no advertising.
It's just that it'll be less relevant to you and, you know, potentially more annoying.
We're all familiar with what that's like to see something very annoying.
I saw some today at my house.
I think it was erectile dysfunction.
Totally irrelevant to me!
And advertising is just the start of exploiting our personal data mines.
Even the most insignificant data of everyday life is being mined, with potentially life-saving consequences.
Cathy Sigona is a retired school principal in San Francisco.
She has a condition called atrial fibrillation, which makes her heart beat irregularly.
It felt like a big fish in my chest.
And it was one side here, and then it would just bounce back and forth, and what can happen is the blood can pool and that can cause a clot which then can cause a stroke.
So that's where the real seriousness lies, is the fact that I could stroke out.
The causes of atrial fibrillation are unknown, so predicting when episodes may occur is vital.
Hi, Nanette, this is Cathy.
So Cathy is about to take part in a trial.
Her doctor is going to monitor her symptoms using data extracted from how she uses her mobile phone.
Dr Jeff Olgin is Cathy's cardiologist.
Because the mobile phone has become such an integral part of people's lives, it's with them most of the day and most of the time, so that becomes a very good real-time data collector for them.
Dr Olgin is trialling software that will record Cathy's daily behaviour.
Any changes to her usual routine might indicate she's unwell.
As a really practical, simple example, let's say you get up and go to work every week day at 7 o'clock.
If all of a sudden that's changed, we'll notice a difference in your behavioural pattern that might trigger us to say, you know, "What's going on?" And there's lots of fun things that sort of pop up.
Algorithms in the software will search Cathy's data, and if they find signals of abnormal behaviour, they will trigger an alert to Dr Olgin.
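Dr Olgin's wake-up example can be illustrated with a basic anomaly rule. The rule compares today's value with the usual routine and flags it if it lies far outside the normal spread. The trial's actual algorithms are not described in the film. This z-score check on a single feature is an assumed, simplified stand-in, and the wake-up times and the threshold of three standard deviations are hypothetical.

```python
from statistics import mean, stdev

def is_unusual(baseline, today, threshold=3.0):
    """Flag today's value if it lies more than `threshold` standard deviations
    from the mean of the baseline days (a simple z-score rule)."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline days
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# Hypothetical wake-up times, in minutes after midnight, for a week of ~7:00 starts.
week = [420, 425, 415, 422, 418, 421, 419]
print(is_unusual(week, 423))  # False: 7:03 is within the usual routine
print(is_unusual(week, 540))  # True: 9:00 would prompt "What's going on?"
```

A real system would track many behaviours at once and would need to tell harmless changes, such as a holiday, apart from signs of illness. That is the hard part of the research.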
It could be a life-saver.
Hopefully in relation to atrial fibrillation in particular, hopefully we will be able to identify behaviours or behavioural patterns that might predict an episode.
Our personal data trails can be used to peer into our behaviour, discovering clues to illness.
And so if we can find a cause that we can fix down the road, and I'm not talking the next couple of weeks, but in the next couple of years, that we can start alleviating some of the stresses that cause me to have atrial fib, I would be extremely pleased.
I have a lot of life left.
The idea of predictive and personalised medicine is coming closer than ever before.
And it's the data we have from the moment we're conceived that will make this idea a reality.
Professor Beales' clinic relies on the biggest human dataset of all, the human genome.
Just 20 years ago, his work would have been all but impossible but now he can analyse his patients' DNA to pinpoint the genetic mutations causing disease.
We still have a myriad of diseases, particularly at this hospital, where there are many, many children who do not have yet a diagnosis for their often rare condition, and I think at the moment, one of the things we really need to do is to be able to sequence as many of these children as possible so that we can begin to unravel a lot of these mysteries.
Genetic diagnosis has already helped identify new conditions, allowing doctors to devise new treatments and research cures that promise to improve our lives.
And so far, we only really understand about one percent of our genome.
These volumes represent the whole of the human genome, the coding element of the human genome.
In other words, the sequence of all of the letters that go to make up a single human being.
This is a huge discovery.
However, it is just the tip of the iceberg.
The medical use of our DNA data is in its infancy.
We're just beginning to glimpse the 99% of the genome which we used to think was junk, but now realise is vitally important.
So the 99% of the genome that's left for us to understand is going to represent a huge task.
There's an enormous amount of information in there and we have to be able to relearn, we have to actually be able to develop new tools to be able to understand the code that's hidden within that vast chunk of the genome.
But even the huge dataset of the human genome is dwarfed by the one that has its roots in the very first data science.
Astronomy.
For centuries, astronomers like Simon Ratcliffe have been collecting data from the billions of stars and galaxies in the night sky.
In many ways, astronomy was the first of the natural sciences, and it was the Babylonians who kicked that off and they started to notice that it wasn't just random.
There were patterns.
There's certain things in the sky that seem to move over and they're always fixed, relative to each other.
Those were the stars.
Then they noticed that certain objects in the sky seemed to wander.
That was the planets.
And so what they did, as you do, is you record the movements.
They wrote down this data and in recording that data over long periods of time, they were able to tease out the patterns inherent and that gave the ability to start to understand the universe.
The science of astronomy was founded on data hunting.
Astronomers use the patterns of nature, the predictability of stars, to unlock the secrets of the universe.
At the moment, we have the Southern Cross to the left.
We have Scorpio right in ascendance above us.
Scorpio was first identified and named over 5,000 years ago.
And if you look closely, you can see a bright red star there called the Heart of Scorpio.
That's a star called Antares, which is a supergiant.
Now with more data, scientific equations and mathematical models, astronomers can forecast the fate of Antares.
This is a fairly massive star that's getting towards the end of its life.
What's going to happen is it's going to expend its nuclear fuel and basically collapse in on itself, and then form a black hole.
So, if we look at this night sky, at this epic splendour above us, you don't just see stars.
You see this kind of potential for discovery.
Astronomers are only just beginning to unlock the potential of this vast dataset.
Today, astronomers like Simon are using a new set of tools to mine the eternal dataset of the stars.
And as these tools improve, they can detect more and more detail in the patterns of the universe.
In some ways, beach-combing for shells is a bit like great astronomy at the moment.
You know, we have a sort of wide plain, but we pick the low-hanging fruit.
A big shell like this is pretty easy to pick up.
You know this might be representative of what we could do 50 years ago.
And then we start to get down into smaller stuff, right down into the sand, into the heart of the matter, to a point where we can see something deeply hidden that we're really interested in.
And the key to getting there is the next generation of data, really big data.
Simon's challenge is to find new unmined data about the universe that will reveal new discoveries.
His latest project promises to deliver exactly that.
The key to it is a site deep in the Karoo, a broad semi-desert in South Africa's Northern Cape.
We're about 200-odd kilometres away from Cape Town and there's still another maybe 500 to go before we get to the site, and as you can see, it's the road to nowhere, really.
The KAT-7 array of radio telescopes is listening for radio signals that have travelled billions of light years and are infinitesimally weak.
We need to be really far away from people and the things that they do because anything modern really interferes with our observations.
So people, their microwaves, their cell phones, their cars, all these things really drive us further and further away.
The data KAT-7 has already catalogued has increased our knowledge of the universe.
We've been imaging neutral hydrogen in our galaxy.
We've been looking at transient events.
We've looked at pulsars, but really, we're limited by data.
We need more data to do better science.
The signals Simon looks for are so small that, despite a combined detecting area of over 1,000 square metres, these seven telescopes capture just two megabits of data per second, and Simon's ambitions go far beyond that.
I think really understanding how galaxies came to be the way they are, you know the evolution of the universe, I think that's one of the most exciting things we can anticipate addressing and to really answer the questions of how did the universe get to be as it is and where is it going? It's only achievable through big data.
We really need to catalogue the entire universe.
We have to figure out what it was like at every epoch and that's the only way to really understand how it evolved and where it's going.
Life in the Karoo is about to change.
These telescopes are set to be joined by more, thousands more.
A new telescope array will fill the valley, with a combined collecting area of about a square kilometre, making it the biggest array in the world.
Over the next ten to fifteen years, this valley is going to fill up with telescopes.
As far as the eye can see, you'll see telescopes forming a vast array, bringing data, siphoning back into the Karoo where science is going to be done on an unprecedented scale.
Work has now begun on the array.
The new telescopes will receive 30 terabytes of data per second.
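The two data rates in the film use different units: two megabits per second for the seven existing telescopes, and 30 terabytes per second for the new array. Converting both to common units shows the size of the jump. The sketch assumes decimal units (1 terabyte = 10^12 bytes) and 8 bits per byte. Under those assumptions, the new array's rate is about 120 million times higher. Running for a day, it would gather about 2.6 exabytes, which is comparable to the 2.5 billion gigabytes (2.5 exabytes) a day quoted earlier for everyone's data combined.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60

seven_dish_bits_per_s = 2e6      # "two megabits of data per second"
new_array_bytes_per_s = 30e12    # "30 terabytes of data per second" (decimal units assumed)

seven_dish_gb_per_day = seven_dish_bits_per_s / BITS_PER_BYTE * SECONDS_PER_DAY / 1e9
ratio = new_array_bytes_per_s * BITS_PER_BYTE / seven_dish_bits_per_s
new_array_eb_per_day = new_array_bytes_per_s * SECONDS_PER_DAY / 1e18

print(f"seven telescopes: about {seven_dish_gb_per_day:.1f} GB per day")  # 21.6
print(f"new array vs seven telescopes: {ratio:,.0f}x")                   # 120,000,000x
print(f"new array: about {new_array_eb_per_day:.2f} EB per day")         # 2.59
```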
It will be the biggest data collector ever built.
We're moving into the regime of unprecedented amounts of information.
We have to take a step back from the data and think, "What are we trying to extract from the data? What is the information that's actually contained therein?" And make sure that our tools and our techniques that we bring to bear look for the patterns in the data.
This really requires a new breed of astronomers to see how we're going to change from where we are now to this next big shift.
Simon Ratcliffe and his team have to develop a way to extract the important patterns from a huge flood of telescopic data.
If they can do it, they will discover the greatest secrets of our universe.
So it's pretty easy to get lost in the challenge and the grand endeavour of the whole thing and feel, you know, you're kind of the master of the universe, sucking down and unlocking the secrets out there.
You sort of sit here and think, "I'm this little small human and what right do I have to go and pull these secrets out of the universe?" But that's our task.
You know, that's what we're going to do and I think that this project and these data challenges really offer us that opportunity to understand fully our universe, where it came from and where it's going.
The data revolution is transforming our world.
We're devising ever more complex ways of gathering data and ever more ingenious ways of mining it.
Data is becoming the most valuable commodity of the 21st century.
The world of big data has arrived.