Cracking Code Locks
When I was still a student in Linköping University, my friends were staying in student apartments where the main entrance used a code lock. The lock required a four digit passcode, but there was something unusual about it: it lacked the green "OK"/"Enter" button to confirm once choice. Instead, the code was automatically verified as the digits were inserted. This may not seem like such a big deal, but, as we will see, it is actually.
Consider that the code is 1234. Let us now say that you are returning home on a Saturday morning at 3.30 AM and wish to enter your building. Since you are not at your best, you happen to start by pressing a "2" instead of the "1". It's not such a big deal, you think, and you now press "1234", upon which the door opens. It works as you expected, but there is one crucial detail here: you actually inserted the string "21234", so the first four digits are in fact "2123", which is not the right code. When you then inserted the "4", the verification mechanism clearly knew that it should check the last three digits "123", and append your newly inserted "4".
Clearly then, you have just tried two code: "2123" and "1234", and you have in fact done so with only 5 key presses instead of the 4*2=8 you would normally expect. This is due to the missing "OK" button (and perhaps a "Cancel"/"Restart" button).
This convenience thus comes at a price: if someone wishes to crack the lock (by guessing the code that is, not physically breaking it), that cracker could reuse previously inserted digits as part of the new code. For instance, consider the following. First, we insert "0000", which tries precisely that code. Then, we insert a "1", upon which we are trying the code "0001". If we insert a "0" again now, we are trying "0010" (as the code lock is constantly remembering the last three digits inserted). We can clearly take shortcuts, but how many? If we could constantly insert new codes, without ever having to return to a previous code, we would effectively try all 10^4 = 10000 codes by pressing only 10003 digits (the 10000 codes + 3 digits we need to start the process).
At first glance, it's not even clear if it's possible to find such a sequence. Consider a code that does not deal with digits from 0 to 9, but only the binary 0 and 1. If the code is two digits long, then there are four different possible codes: 00, 01, 10, and 11 (not a very useful code lock for practical purposes, but it serves as an easy example for us to understand the problem). Normally, trying each code means we need to press 2*4=8 buttons (excluding the "OK" in between each). But if we don't have the "OK" button, so that the code lock uses a memory, we can in fact try all four codes by pressing "00110" (five presses instead of eight). To see why, consider each two consecutive digits in the sequence: 00, 01, 11, 10.
Now consider using binary digits again (only two buttons are available: 0 and 1), but with a code of length 3. We now have 8 possible codes: 000, 001, 010, 011, 100, 101, 110, 111. If we start with 000, we need to then append a 1 (otherwise we retry 000), so we begin with 0001. We could then start with 0 again, as the sequence 00010 would try 000, 001, 010 (just look at three consecutive digits in the sequence). Let's add another 0 and see what happens. The sequence is now 000100, which tries 000, 001, 010, 100. These are unique codes, which is great. However, we now run into problems, as the last 2 digits are 00, exactly what we began with. We have already tried both 000 and 001, so we are now forced into retrying a code!
The point is that some sequences will repeat previously tried codes, which is a waste of time. The sequences that do not repeat previous codes are known as De Bruijn sequences. So with binary digits and a length of 2, the sequence 00110 is a De Bruijn sequence, because it tries the codes: 00, 01, 11, 10, that is, all possible combinations exactly once each. The sequence 00011 is not a De Bruijn sequence, as it tries 00 two times (in the first two trials) and does not try the code 10.
Put differently, De Bruijn sequences are the shortest possible sequence of button presses needed to try all codes on doors with code memory (without "OK" button).
Do De Bruijn sequences exist for all number of digits and lengths (k and n)? We succeeded in finding such for binary digits with length 2 (k=2, n=2), but not for binary digits with length 3 (k=2,n=3). And what about the real world door codes, with has 10 digits and typically length 4 (k=10, n=4)?
It turns out that De Bruijn sequences exist for all possible digits k and length n. This means that a real world door code can be cracked in 10003 key presses instead of the expected 40000 (4 per code, 10000 codes). That is, it can be cracked four times faster!
To see that De Bruijn sequences always exist (and how to find them), we first make some observations:
- At any point, we have a state, which are the n-1 digits previously inserted (the "memory").
- We then make k different choices, each giving us a code. Put differently: at each point (after initializing with n-1 digits), we have n-1 digits in state and we choose one more digit (k choices) to obtain a new code of length n.
- To get a code, say 1234, there is in fact only one way to reach it: from the state 123, and adding the digit 4.
We can picture this process in a graph, where the nodes represent states, and the links represent codes. Thus, note that attempts to try a code are on the links (edges), not nodes (vertices). Below you can see such a graph for k=2 (only binary digits) and n=4 (all codes are composed for four binary digits). The correct code might thus be e.g. 1001 or 0111.
Clearly, the question is now: can we traverse this graph in a way that passes each link once and only once? Note that we may be in the same state multiple times. In fact this is always necessary: to try both 0000 and 0001, we need to be in the state 000 first. However, we don't want to try any code more than once, so each link should only be visited once. This is known as a Eulerian path (or cycle, if we finish at the same node as we started). Now, each node in the graph will have exactly k links going out and k links going in (k=2 in the depicted graph), since there are k digits to choose from in each state, and k ways to throw away a digit to obtain the state. For example, in the graph above, there are two ways out of the state 000: by adding another 0, or by adding a 1. This corresponds to trying code 0000 and 0001. There are also two ways to reach 000: from 0000, and from 1000, as we throw away the left most digit (oldest digit).
Since there are equally many links pointing in and out (namely, k), we can be sure there is an Eulerian cycle, so our problem is indeed always solvable! In fact, all solutions are given precisely by all Eulerian cycles, so any algorithm to find cycles will do (Hierholzer's algorithm is not only an efficient and intuitive way of finding such cycles, but also provides a proof of existence).
Cracking suitcases is similar (k=10, n=3), except that we can now rotate each wheel independently. Thus, in one "click" we can go from 000 to 001, 010, 100, 009, 090, and 900. These six "next codes" corresponding to rotate one of the three wheels one step "up" or "down" (to their adjacent values). If we were to depict the solutions as a graph, where each node again corresponds to state and links to solutions, we have an important difference: the state is now the previous solution (we don't throw away any digits). From each state there are 6 links going out, and in fact, all these links are also going in, since we can always turn the wheel back one step. Thus we may consider the undirected graph where each node has 6 links, and, since every node has an even degree, it is guaranteed that an Eulerian cycle exists.
Note that the "obvious" solution of enumerating does not work: 000, 001, 002, 003, ... 008, 009 is fine, but going from 009 to 010 requires two switches: turning the least significant 9 to 0, and the middle 0 to 1.
In this manner, we only need to rotate the "wheels" a total of 1000 times, as each rotation tries a new code (and there are precisely 1000 codes, namely 000 to 999).