This blog grew out of a panel discussion on 'Randomness and Order', part of the Humanities and Science series, in which academics in quantum physics, music, probability and medieval history explored what randomness means in their different disciplines. Click here to view the related blogs by Professor Ian Walmsley (physics perspective), Professor Chris Wickham (historical perspective) and Professor Jonathan Cross (musical perspective).
This blog is by Professor Alison Etheridge. Professor Etheridge is a Fellow by Special Election at Magdalen College and works for the Department of Statistics.
Stepping into Professor Alison Etheridge’s office in the Peter Medawar Building at the east end of South Parks Road, you might be surprised to hear that the focus of her work is randomness. Neat piles of paper sit on the table; tidy rows of books line the shelves; two computers sit humming, squarely aligned to the desk. Everything seems very well ordered indeed.
'It didn’t use to be like this,' admits Etheridge. 'It was all much more disorganised, but the health and safety inspectors told me I had to neaten up.' Her choice of words, unlike mine, is careful – because she knows what the word random really means, and it certainly has nothing to do with how tidy her desk is.
What is it that makes something truly random in a mathematical sense?
'That’s actually quite a difficult question, but I’d say that something is random if there are a number of possible outcomes and one doesn’t know before performing an experiment which of those outcomes will occur, but can assign some degree of certainty to the different outcomes.'
Surprisingly, that means that we can think of some things as being more random than others. You could think of something being a little less random if the possible outcomes are more closely aligned to one another, and more random if the possible outcomes are more disparate.
I used to have to commute to Cambridge from Oxford, and I used to say that doing that journey on the back roads through Woburn was in some sense less random than going around the M25. The M25 was on average the fastest, but it could take you four times as long as you expected; if you went the other way it was a bit slower but it rarely took you more than 10% longer than you expected.
That’s a rather silly example, though. I suppose stock markets are a more sensible place to look. The variability in the price of the stock associated with a very large and stable conglomerate is very small compared to, say, some kind of tech start-up, whose finances would be much more volatile.
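Her commuting example is really a statement about spread rather than average: two choices can have similar means but very different variability. The sketch below makes that concrete with invented travel-time figures (the numbers, distributions and 90/10 split are purely illustrative assumptions, not real commute data).

```python
import random
import statistics

random.seed(0)

# Hypothetical travel times in minutes, made up for illustration.
# "Back roads": slightly slower on average, but tightly clustered.
back_roads = [random.gauss(95, 4) for _ in range(10_000)]

# "M25": faster on average, but occasionally far longer (a heavy tail):
# 90% of trips are quick, 10% hit serious congestion.
m25 = [
    random.gauss(80, 5) if random.random() < 0.9 else random.gauss(200, 30)
    for _ in range(10_000)
]

for name, times in [("back roads", back_roads), ("M25", m25)]:
    print(f"{name}: mean {statistics.mean(times):.0f} min, "
          f"st. dev. {statistics.stdev(times):.0f} min")
```

The M25 route comes out faster on average but with a standard deviation many times larger, which is exactly the sense in which one route is "more random" than the other.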
Given randomness is impossible to predict, though, what hope do academics have of being able to study it as a concept?
The way people first started modelling randomness was to, say, flip a coin lots and lots of times, and then plot the distribution of what they saw. They’d look at the results and say: actually, when we perform this experiment, around 50% of the time we get heads. It was impossible to say ahead of each coin toss what the outcome would be, but the results could provide vital information for use in the future.
It’s known as the frequentist approach, because you look at the frequency with which different outcomes occur. Then you can use that as a model for what you expect to happen going forward in time. Of course, that’s only good if you’re modelling something that happens a lot and keeps behaving in the same way. It’s not good for what we call extreme events – it’s not a good way to model earthquakes, for instance.
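The frequentist recipe she describes can be sketched in a few lines: repeat the experiment many times, record the frequency of each outcome, and use that frequency as the model going forward. This is a minimal simulation of the coin-flip case, not anything specific to her research.

```python
import random

random.seed(1)

# Perform the "experiment" many times: toss a fair coin.
tosses = [random.random() < 0.5 for _ in range(100_000)]

# The observed frequency of heads becomes the model's
# probability of heads for future tosses.
freq_heads = sum(tosses) / len(tosses)
print(f"observed frequency of heads: {freq_heads:.3f}")
```

With 100,000 tosses the observed frequency settles very close to 0.5, even though no single toss is predictable – which is the whole frequentist point, and also why the approach needs lots of repetitions of a stable experiment to work.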
It’s also quite dangerous in finance. If you look at the way banks are asked to model their risk, they assign probability to the different outcomes of the stock markets. So, they ask, if this certain thing happens, what will happen to the balance sheet? And they supposedly do this to things that will happen down to a probability of 1% or even 0.1%. But since the stock markets haven’t run under consistent rules for more than a decade, they can’t really use a frequentist approach. One could, perhaps, say that it’s a failure of statistics.
So is the study of random processes just as risky?
No, because often the whole problem, as in my work, is flipped on its head. In the context of my work, I already know the outcome – what I don’t know is how we arrived at it. I’m very interested in the way biological populations evolve, where there is a lot of randomness simply because two individuals choose to mate at random.
Of course, tracing back the random process of procreation over time is bewilderingly complex – so Etheridge and her colleagues start with a simple example that they can gradually make more complex. “At first we might use an individual-based model that supposes the whole population lives in a big melting pot with no structure to it. Then, we can say that everyone’s equally likely to mate with everybody else, and we can sample individuals from the population at random and see what the outcomes would be. If you do that for all the possible outcomes, you can begin to predict how the population might grow. Then we usually turn it around and say: given we know what the population is like now, in the real world, what would be the most likely way to have arrived at this situation?”
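A standard way to make that "melting pot" idea concrete is a Wright-Fisher-style simulation, in which every offspring allele is a copy of an allele drawn uniformly from the parental pool. This is a generic textbook sketch of that kind of individual-based model, not Etheridge's own code, and the population size and generation count are arbitrary choices.

```python
import random

random.seed(2)

N = 500            # number of diploid individuals (2N alleles in the pool)
GENERATIONS = 200

# Start with the two variants equally common.
alleles = ['A'] * N + ['a'] * N

for _ in range(GENERATIONS):
    # Mating is completely random in the unstructured "melting pot":
    # each offspring allele is copied from a uniformly chosen parental allele.
    alleles = [random.choice(alleles) for _ in range(2 * N)]

freq_A = alleles.count('A') / len(alleles)
print(f"frequency of allele A after {GENERATIONS} generations: {freq_A:.2f}")
```

Running this repeatedly shows genetic drift: the allele frequency wanders at random, purely because of who happens to mate with whom, even though no variant has any advantage.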
Does such a quantified way of looking at biological populations require incredibly large and accurate datasets that explain the entire population over time?
“In the kind of work I do, we’re not looking at small-scale populations,” she explains. “If I look at the human population, I’m interested in knowing the relatedness between a human here and a human there, which was probably determined somewhere in Africa, or perhaps Europe, many generations ago. So I’m looking for parameters that I can put into a model that are robust to local variation. I don’t want to make a prediction that depends on knowing this particular individual who lived in Wolverhampton in 1783 – I want a model which is insensitive to that kind of detail, but will still predict the broad population outcomes.”
So how does one create a model that can predict broad trends without knowing the nitty gritty of the population?
We write down what we call stochastic models. At school, we all worked with equations where you could put in one value and get out another. With stochastic models, you put in a value and with some probability you get one answer and with some probability you get another. We track all the possible answers that we could have got, and the probabilities assigned to them.
Here's a very simple example. Imagine I have a growing population where there’s just one parent per child – like bacteria. The world is so big and plentiful that they’ve all got plenty to eat, they don’t fight, and offspring don’t have to interact with each other... it’s utopia. In this mathematical utopia, each individual produces a number of offspring – but instead of assuming each one will have 2, we go for the 2.4 model, so some have none, some have 1, 2, 3, 4… and we assign likelihoods to each of those outcomes. And then some time later we investigate how likely it is that this population consists of 100 individuals, 1,000, 2,000 and so on. The model will give us some kind of probability distribution about that. Then we flip it round and ask how a population could have grown from one individual to 1,000 individuals. Did it expand quickly? Slowly? What can we infer about how the population got there?
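The "utopia" she describes is what probabilists call a branching process (often a Galton-Watson process): each individual independently produces a random number of offspring. The sketch below implements one such model; the particular offspring probabilities are invented here so that the mean works out to 2.4, matching her "2.4 model" remark, and are not taken from her work.

```python
import random

random.seed(3)

# Hypothetical offspring distribution with mean 2.4:
# P(0)=0.1, P(1)=0.2, P(2)=0.2, P(3)=0.2, P(4)=0.3
# mean = 0.2 + 0.4 + 0.6 + 1.2 = 2.4
def offspring_count():
    return random.choices([0, 1, 2, 3, 4], weights=[1, 2, 2, 2, 3])[0]

def population_after(generations, start=1):
    """Grow a population where each individual independently
    produces a random number of offspring, then dies."""
    size = start
    for _ in range(generations):
        size = sum(offspring_count() for _ in range(size))
    return size

# Run the model many times to approximate the probability
# distribution of the population size a few generations on.
sizes = [population_after(5) for _ in range(2_000)]
mean_size = sum(sizes) / len(sizes)
extinct = sum(s == 0 for s in sizes) / len(sizes)

print(f"mean size after 5 generations: {mean_size:.1f}")
print(f"fraction of runs that died out: {extinct:.2f}")
```

The average size grows like 2.4 per generation (about 80 after five generations), yet some runs die out entirely and others explode, which is exactly the spread of outcomes the model is supposed to quantify before the question is flipped around to inference.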
The answer is, of course, nothing. You can’t infer exactly how the population got there. But you can say that most likely it grew exponentially, say, or that it was pretty small and then suddenly exploded. The art is to understand which things matter. So does it really matter that we assumed individuals didn’t interact with each other? Does it matter that we supposed each individual just has one parent? That sort of thing is a fine art, though, and it relies on talking to the scientists in the domain in which you’re modelling.
As a result, I spend much of my time talking to biological scientists rather than other mathematicians, trying to figure out how my models can be tweaked to take the randomness of procreation into account. Often what mathematicians can do is say: well, if the population were infinite, this would happen – then that’s where the biologists come in: they can test the models we create on real populations. So we talk to biologists and geneticists, trying to understand what they think matters most. Then we take a model, which is crude in other respects, and add in some extra effect that we’ve talked to them about. Then we test it to see if it makes the model more or less accurate. Almost always the biologists are right, of course!
Have the models that you've helped to create become better and better over time? What impact have they had in the biological sciences?
What other people have done with these models is incredibly important. They have allowed geneticists to identify which genes cause disease and to map variation in the population through differences in DNA. Nowadays, of course, they have so much data that they don’t have to use terribly sophisticated models, but the sort of mathematical modelling we do underlies all that – without it, they’d never have got to this point.
Do you ever get annoyed when people misuse the term random?
I normally try not to show that I’m annoyed. I do in fact have a teenage daughter and a sub-teenage son, so the word random is misused all the time.
These blogs were originally posted by the University of Oxford as part of their Research and Conversation series.