Hannah and her sweets have been a hot topic of discussion over the last few days, since the following question appeared on a GCSE mathematics paper, sat by 500,000 teenagers last Thursday: “There are n sweets in a bag. Six of the sweets are orange. The rest of the sweets are yellow. Hannah takes at random a sweet from the bag. She eats the sweet. Hannah then takes at random another sweet from the bag. She eats the sweet. The probability that Hannah eats two orange sweets is \({1\over3}\). Show that \(n^2-n-90=0\).”

Why were the sweets so sticky? Probably because the question was asked in an unusual way. Had we been told how many sweets were in the bag and then been asked to find the probability of Hannah eating two orange sweets, the question would have looked more familiar. As it stands, we must calculate the probabilities in terms of \(n\), to get \({6\over n}\times{5\over n-1}={1\over3}\), which can then be rearranged to give the equation that we’re asked for. Solving this equation, we find that there are 10 sweets in the bag.
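For readers who want the missing algebra, the rearrangement and solution run as follows. Cross-multiplying gives the quadratic, which factorises. The negative root is discarded because \(n\) counts sweets.

```latex
\[
{6\over n}\times{5\over n-1} = {1\over 3}
\;\Longrightarrow\; 90 = n(n-1)
\;\Longrightarrow\; n^2 - n - 90 = 0
\;\Longrightarrow\; (n-10)(n+9) = 0
\;\Longrightarrow\; n = 10 .
\]
```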

In fact the question as set is more reminiscent of problems that arise in real-world applications than the version that the students might have hoped to see. By telling us that Hannah’s probability of eating two orange sweets is \({1\over3}\), the examiner is effectively telling us about the outcomes of a very large number of experiments, each involving taking and eating two sweets. If Hannah, with some help from her friends, drew and ate two sweets from each of a large number of identical bags, then saying that there is probability \({1\over3}\) of eating two orange sweets is equivalent to saying that about one-third of bags have two orange sweets drawn from them (and this fraction gets closer to one-third the more bags that are used).
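This frequency interpretation is easy to try out. The Python sketch below is a minimal simulation, assuming a bag of 10 sweets with 6 orange ones (the answer found above). Two sweets are eaten from each of many fresh bags, and the fraction of bags giving two orange sweets is reported. The function name `fraction_two_orange` and the fixed seed are illustrative choices.

```python
import random


def fraction_two_orange(n_bags, n=10, orange=6, seed=1):
    """Eat two sweets from each of n_bags identical bags (hypothetical helper)
    and return the fraction of bags in which both sweets were orange."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    bag = ["orange"] * orange + ["yellow"] * (n - orange)
    hits = 0
    for _ in range(n_bags):
        # random.sample draws without replacement, matching Hannah eating each sweet
        first, second = rng.sample(bag, 2)
        if first == second == "orange":
            hits += 1
    return hits / n_bags


# The fraction should drift towards 1/3 as the number of bags grows.
for n_bags in (100, 1_000, 100_000):
    print(n_bags, fraction_two_orange(n_bags))
```

The exact probability is \({6\over10}\times{5\over9}={1\over3}\). With only 100 bags the printed fraction can easily be some way off, which is the point of "the more bags that are used".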

How can we extend this point of view to more challenging problems? Before making things harder, let’s make them a little simpler by supposing that Hannah loses her sweet tooth and so replaces each sweet in the bag after it is drawn, instead of eating it. This makes the algebra a little easier. For example, if we were told that there were 6 orange sweets in the bag and that sampling two sweets with replacement results in two orange sweets with probability \({1\over4}\), then we could conclude that there are 12 sweets in the bag.
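With replacement, each draw is orange with the same probability \(6/n\), so the calculation is a one-line worked equation. It takes the positive square root, since \(n>0\).

```latex
\[
\left({6\over n}\right)^2 = {1\over 4}
\;\Longrightarrow\; {6\over n} = {1\over 2}
\;\Longrightarrow\; n = 12 .
\]
```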

What if we were not given any probabilities, but instead given the experimental results directly? This is getting closer to real-world situations. Suppose that, still sampling with replacement, a series of experiments leads to two orange sweets being drawn \(p\) times, one orange and one yellow sweet being drawn \(q\) times and two yellow sweets being drawn \(r\) times. It is no longer possible to deduce the exact number of sweets in the bag, but we can determine the most likely number, which turns out to be \(12{p+q+r \over 2p+q}\). This is a calculation that might be familiar to a first- or second-year undergraduate rather than a GCSE student.
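Here is where the formula comes from. Let \(\theta = 6/n\) be the chance that a single draw is orange. The counts then have likelihood proportional to \(\theta^{2p+q}(1-\theta)^{q+2r}\). This is maximised at \(\theta = {2p+q \over 2(p+q+r)}\), and \(n = 6/\theta\) gives the expression above.

The Python sketch below checks that closed form against a brute-force search over whole numbers of sweets. It assumes sampling with replacement and treats a mixed pair as one outcome whatever the order. The function names are hypothetical. The closed form need not be a whole number, whereas the search returns the best whole number.

```python
import math

ORANGE = 6  # number of orange sweets, known in advance


def mle_closed_form(p, q, r, orange=ORANGE):
    """Real-valued most likely total: 2*orange*(p+q+r)/(2p+q), i.e. 12(p+q+r)/(2p+q).
    Undefined (division by zero) if no orange sweet was ever drawn."""
    return 2 * orange * (p + q + r) / (2 * p + q)


def log_likelihood(n, p, q, r, orange=ORANGE):
    """Log-likelihood of the counts for a bag of n sweets, sampling with
    replacement; the constant factor 2**q for mixed pairs is dropped."""
    theta = orange / n           # chance that one draw is orange
    orange_draws = 2 * p + q     # total orange sweets seen across all pairs
    yellow_draws = q + 2 * r     # total yellow sweets seen across all pairs
    ll = 0.0
    if orange_draws:
        ll += orange_draws * math.log(theta)
    if yellow_draws:
        if theta == 1.0:         # an all-orange bag cannot produce a yellow sweet
            return -math.inf
        ll += yellow_draws * math.log(1 - theta)
    return ll


def mle_brute_force(p, q, r, orange=ORANGE, n_max=10_000):
    """Most likely whole number of sweets, found by trying every n from orange to n_max."""
    return max(range(orange, n_max + 1),
               key=lambda n: log_likelihood(n, p, q, r, orange))


print(mle_closed_form(10, 40, 50), mle_brute_force(10, 40, 50))
```

For example, \(p=10\), \(q=40\) and \(r=50\) give \(\theta = 60/200 = 0.3\), so both functions point to a bag of 20 sweets.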

If we find \(r>p\) in our experiments, this is evidence that there are more yellow sweets than orange sweets. The greater the difference between \(r\) and \(p\), the more evidence there is for a large number of yellow sweets and so a large number of sweets in total. This intuition is consistent with the formula above for the most likely value of \(n\). Remember that we do still know there are 6 orange sweets, and this information provides a valuable benchmark or calibration against which to estimate the total number. In other words, knowing how many orange sweets there are allows us to infer the total number of sweets from the relative frequencies of orange and yellow.

Now let’s take things one step further by supposing we are still interested in how many sweets there are in the bag, but they are all the same colour! We have now lost the calibration that we had from knowing that there were 6 orange sweets. The trick now is to create our own calibration mechanism as we go along, by marking each item in some way before replacing it in the bag. This way, we know whether we have seen the same item before.

A few years ago, local mice took to visiting my garage. The squeamish in my family were reluctant to “sample without replacement” and so devised a method of using a humane trap and then correction fluid to paint a white stripe on the mouse before releasing it in a nearby field. I imagine the mouse viewed being painted as a modest price to pay for being sampled with replacement, apart perhaps from some residual risk of being mistaken for a miniature badger. Sure enough, we soon started catching mice with white stripes, at which point they were lucky enough to get a second stripe, and so on. Once we realised we were dealing with only a few mice, we simply put them in a more distant field.

This type of method for estimating the size of a population has firm mathematical foundations and has been surveyed by Wilson and Collins [1], among others. Imagine that we are catching fish in a lake, one at a time, and throwing each one back after we mark it. If we cast our net \(m\) times then we will catch some number of fish exactly once – call this number \(f_1\) – and some, generally smaller, number of fish exactly twice – call this number \(f_2\). Likewise for \(f_3\), \(f_4\) and so on, so that \(\sum_{i\geq1}f_i\) is the number of different fish we have caught. The big unknown is the number of fish that we don’t catch at all. Among the estimates of the total number of fish in the lake recommended by Wilson and Collins is \({m\sum_{i\geq1}f_i\over m-f_1}\).
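The estimator only needs the frequencies \(f_i\), so it can be computed directly from a log of which marked fish came up on each cast. The Python sketch below does this. It assumes each cast catches exactly one fish (samples of size one) and that every fish carries a unique mark. The function name `wilson_collins_estimate` is hypothetical, not taken from the paper. Until some fish has been caught twice, \(m = f_1\) and the formula gives no finite answer, so the sketch returns infinity.

```python
from collections import Counter


def wilson_collins_estimate(catches):
    """Estimate population size from one-fish-per-cast catch data.

    `catches` lists the mark (ID) of the fish caught on each cast, in order.
    Returns m * D / (m - f1), where m is the number of casts, D (the sum of
    the f_i) is the number of distinct fish seen and f1 is the number of
    fish seen exactly once.
    """
    m = len(catches)
    times_caught = Counter(catches)          # fish ID -> times caught
    f = Counter(times_caught.values())       # i -> f_i, fish caught exactly i times
    distinct = sum(f.values())               # sum over i >= 1 of f_i
    f1 = f[1]
    if m == f1:                              # no fish caught twice yet: no finite estimate
        return float("inf")
    return m * distinct / (m - f1)


print(wilson_collins_estimate([1, 2, 1, 3, 4, 2, 5]))
```

In that small example \(m=7\), fish 1 and 2 are caught twice and fish 3, 4 and 5 once. So \(f_1=3\), \(f_2=2\) and the estimate is \(7\times5/(7-3) = 8.75\).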

Having no information to begin with, i.e. gathering information only by actually catching fish, does mean that some time is needed to achieve a good estimate. The graph below shows a simulation of catching fish at random from a lake containing 1,000 fish in total. The fisherman does not know how many fish there are, but he is familiar with the Wilson-Collins paper. He therefore knows that he should mark each fish caught and estimate the number in the lake from his catches so far. The estimates can oscillate wildly to begin with, but settle down close to the true value of 1,000 beyond about 500 total catches. Even at the right-hand edge of the graph, after 2,000 catches, more than 10% of fish have still not been caught at all.
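A simulation of this kind takes only a few lines. The Python sketch below is a minimal re-creation, not the code behind the graph. It assumes each cast catches one of the 1,000 fish uniformly at random, and it updates the frequencies \(f_i\) incrementally so the estimate is available after every cast. The function name `simulate_fisherman` and the seed are hypothetical. Early estimates are infinite until some fish has been caught twice.

```python
import random
from collections import Counter


def simulate_fisherman(true_size=1000, n_catches=2000, seed=0):
    """Catch one fish per cast uniformly at random, mark it and throw it back.
    Returns the Wilson-Collins estimate after each cast and the fraction of
    fish never caught."""
    rng = random.Random(seed)
    times_caught = Counter()  # fish ID -> times caught so far (caught fish only)
    f = Counter()             # i -> number of fish caught exactly i times
    estimates = []
    for m in range(1, n_catches + 1):
        fish = rng.randrange(true_size)
        k = times_caught[fish]        # 0 if never caught; Counter does not insert the key
        if k:
            f[k] -= 1                 # this fish moves out of the "caught k times" group
        times_caught[fish] = k + 1
        f[k + 1] += 1
        distinct = len(times_caught)  # equals the sum of the f_i
        f1 = f[1]
        estimates.append(m * distinct / (m - f1) if m > f1 else float("inf"))
    uncaught = 1 - len(times_caught) / true_size
    return estimates, uncaught


estimates, uncaught = simulate_fisherman()
print(round(estimates[499]), round(estimates[-1]), round(uncaught, 3))
```

The never-caught fraction is consistent with the text. Each fish escapes a single cast with probability \(0.999\), so it escapes all 2,000 casts with probability \(0.999^{2000} \approx e^{-2} \approx 0.135\).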

If we have more prior information than the fisherman then we should use it, but at least we have a means of making progress if not. As well as estimating animal populations, this type of technique can be used for estimating other populations where records are incomplete, for example incidence of crime or prevalence of medical conditions.

Although this year’s GCSE cohort might feel hard done by, perhaps some of them will in future look back and see how Hannah opened the door on a whole range of challenging problems in the real world.

[1] Richard Wilson and Mark Collins, “Capture-Recapture Estimation with Samples of Size One Using Frequency Data”, Biometrika, Vol. 79, pp. 543-553 (1992).