Earlier today I set these two puzzles, which were given to me by a computer scientist at Microsoft. They are an analogy for how companies protect data centres from the random failures of hard drives. Here they are again with solutions and workings.
The disappearing boxes
You have 100 boxes. Each box contains a single number in it, and no two boxes have the same number.
1. You are told that one of the boxes at random will be removed. But before it is removed you are given an extra box, and allowed to put a single number in it. What number do you put in the extra box that guarantees you will be able to recover the number of whichever box is removed?
[A clarification, as I think some readers of the puzzle were unclear: your own boxes are not laid out at random. You know which box is which. Just as in a data centre, the hard drives are all positioned in order and do not move around. What is random is the the box that is removed.]
Solution The sum of all the numbers in the 100 boxes.
Here’s an explanation with some more working. Let the number in box 1 be n1, the number in 2 be n2, and so on. So, we put n1 + n2 + … n100 in the extra box. Now let’s say that box r is removed, so the information that is lost is the number nr. In order to recover nr we add up all the numbers in the 99 boxes that remain, and subtract it from the number in the extra box. That number is nr.
2. You are told that two of the boxes at random will be removed. But before it is removed you are given two extra boxes, and allowed to put one number in each of them. What (different) numbers do you put in these two boxes that guarantees you will be able to recover the numbers of both removed boxes?
Solution In one box we do what we did in the previous example, which is put the sum of all the numbers in the 100 boxes. Let’s call this number A. In the other box, we need another sum that combines all the digits in the 100 boxes, such as:
n1 + 2n2 + 3n3 + … 100n100
Let’s call this number B.
So let’s say the two boxes that are removed are boxes r and s. If we add up all the 98 numbers that are left, and subtract this from A, we have a value for nr + ns.
Likewise, we add up the values n1 + 2n2 + 3n3 + … 100n100 but without rnr and sns we are left with the value of rnr + sns.
This gives us two equations in two unknowns (nr and ns ) which can be solved to find both nr and ns. (As you hopefully remember from learning basic ‘simultaneous equations’ at school.)
Ta dah! We have managed to find an extremely efficient method to recover our data. And as a bonus, you have learned the basic idea of how “error correcting codes” work, which is the technology that makes sure that the digital world can handle random noise and failures.
I hope you enjoyed today’s puzzle. I’ll be back in two weeks.
Thanks to Sivakanth Gopi, a Principal Researcher at Microsoft, for today’s puzzle.
I’ve been setting a puzzle here on alternate Mondays since 2015. I’m always on the look-out for great puzzles. If you would like to suggest one, email me.