Monday, December 9, 2013

Solving Logic Riddles using AI Programming

Consider three persons, Caesar, Brutus, and Cassius. One is a king, another is a bureaucrat, and the third is a spy. The king never lies, whereas the bureaucrat always lies. The spy either lies or tells the truth. Caesar claims that Cassius is a bureaucrat, Brutus claims that Caesar is a king, Cassius claims that he is a spy.

Who is who?

The solution, by hand

Let's start by solving this without computer assistance. Cassius cannot be a king: he is asserting that he is a spy, so if he were king, he would be lying - which is impossible for the king to do. Likewise, Brutus cannot be king, since he claims Caesar is, and so would be lying. So Caesar is the king. Brutus is then telling the truth, so he cannot be a bureaucrat (the Bureaucrat always lies). So Brutus must be a spy, and, by exclusion, Cassius is a bureaucrat (which is consistent, since Cassius is lying when asserting he is a spy). To summarize, there is a unique solution, which is:

Caesar: king.
Brutus: spy.
Cassius: bureaucrat.

The solution, AI programming

How would you encode information about this problem in such a way as to let the computer solve it? Sure, in this case the solution was easy to obtain, but if you had 100+ rules and a lot more people it would not be easy to solve the problem by hand (think of the problem as an equation with values true/false). Logic programming, as the name suggest, provides an intuitive and natural way to express logical riddles (among other things). The following in Prolog:

First, tell the computer Caesar, Brutus, and Cassius are persons:


Next, we would like to assign King, Liar (Bureaucrat is too long to type), and spy to these persons:

assign(K,L,S) :-
person(K), % king is a person
person(L), % liar is a person
person(S), % spy is a person
king(K,L,S), % constraints for king (to be defined later)
liar(K,L,S),  % constraints for liar
spy(K,L,S), % constraints for spy
different(K,L,S). % king, liar, and spy must be different people

We must thus defined the constraints imposed by virtue of being King, Liar, and Spy:

king(K,L,S) :- says(K,K,L,S). % Whatever King says is true
liar(K,L,S) :- \+ says(L,K,L,S). % Whatever Liar says is false (think of "\+" as "not true")
spy(_,_,_). % Anybody can be a spy

Now onto the claims that each of them make:

says(caesar,K,L,S) :- L = cassius. % Caesar says Cassius is Liar
says(brutus,K,L,S) :- K = caesar. % Brutus says Caesar is King
says(cassius,K,L,S) :- S = cassius. % Cassius says he himself is Spy

Finally, we define what "different" means:

different(X,Y,Z) :- X \= Y, X \= Z, Y \= Z. % Not the same: X,Y,Z 

Now we may query Prolog to obtain all solutions:

?- assign(K,L,S).
K = caesar, <--- Caesar is King
L = cassius, <--- Liar is Cassius
S = brutus ; <--- Spy is Brutus
false. <---- This means there are no more solutions

An interesting change

To see that this really works, consider what would happen if we don't require that there must be a unique King, Liar, and Spy. For example, one person could then be both King and Spy. This is effectively done by removing the requirement that they are all different: either remove different(K,L,S) from assign(K,L,S), or define it trivially as different(_,_,_), meaning there are no constraints).

Before executing the query, let's make some observations: a person cannot both the king and liar, assuming that person makes at least one claim. Since everybody makes a claim here, nobody can be both king and liar. So at least two people must be assigned the properties {king, liar, spy}. Clearly anybody can in principle be both king and spy, or liar and spy, since spy adds no new constraints. Furthermore, our original solution should still be part of the new solution set, since we've only removed one constraint.

How many solutions do you count? This problem is clearly more difficult than the previous one. Our query will reveal all solutions:

?- assign(K,L,S).
Solution 1:
K = S, S = caesar, <--- Caesar is both king and spy
L = cassius  <--- Cassius is liar

Solution 2 (This is the original solution):
K = caesar,
L = cassius,
S = brutus 

Solution 3:
K = S, S = cassius, <--- Cassius is king and spy
L = caesar <--- Caesar is Liar

Solution 4:
K = S, S = cassius, <--- As above, Cassius is king and spy
L = brutus <--- This time Brutus is the liar (instead of Caesar)

false. <--- No more solutions.

There is an asymmetry of solutions: when Cassius is both king and spy (solution 3 and 4), the liar can be Caesar or Brutus. But when Caesar is king and spy (solution 1), only Cassius can be a liar, not Brutus. This is because Brutus asserts that Caesar is king, and liars are never allowed to tell the truth.

Sunday, December 1, 2013

Which shape gives the largest area?

The Isoperimetric Inequality

Given a rope of length L, say 100 meters, what is the largest closed area you can make?
If we try a square, each side has to be L/4 (25 meters), and we'll end up with an area of (L/4)^2 = 625 square meters. If we however try a circle, the circumference is L=2πR, where R is the radius. Thus R = L/(2π), and the area is π(L/2π)^2 = L^2/(4π) = 796 square meters, which is better than the square.

But there's plenty more shapes to try, in fact, infinitely many. So which should we pick to maximize the area?
The ancients knew the answer but could not prove it: it's the circle. The proof of this is surprisingly simple using calculus. Let L be the length of the rope, and let A be the enclosed area.

Note that for a circle, the relationship between A and L is L=2πR and A=πR^2, where R is the radius, so that:

Given the belief that the circle gives the largest area, we expect that for any shape,

This is known as the isoperimetric inequality.

Hurwitz's Proof

Assume that the curve (x(t),y(t)) is parametrized so that it does one full revolution when t goes from 0 to π. The arc length is defined by s = Lt/(2π). Note that when t=0, s=0, and when t=2π, s=L.

We then have:

As for the area, we use Green's theorem:

Now, simply take the difference we want to show is always non-negative:

We want to conclude that this expression is non-negative for any curve. The first integral is clearly non-negative, since its integrand is a square. As for the second integral, the result is not obvious until one consider its Fourier series (this is called Wirtinger's inequality):

As we will see soon, for this to work, we must place the curve in coordinates such that:

This can always be done, as the placement of the curve is arbitrary: if the integral of y(t) is V, then the translated function y(t)-V/(2π) will have integral 0.

Next, differentiate y(t) to obtain:

Then, by Parseval's identity:

which concludes the proof. (The requirement a0=0 is to ensure that the last inequality holds for n=0).
The inequality thus obtained is known as the isoperimetric inequality:

Only the Circle Yields Maximum Area

As we saw in the beginning, the circle turns the inequality into an equality, thus yielding the maximum possible area. But this still leaves open the possibility that other shapes may be equally good. We can show that only the circle yields max area by considering when the inequality becomes an equality: precisely when the two integrals are zero.

Starting with the second integral, the inequality was obtained from comparing the square sums after applying Parseval's identity. To have equality, the terms in those sums must be 0 except when n=1, which means that

The first integral must also be zero, which means that

Integrating, we get

So indeed we must have a circle centered at (D,0).

The fact that the circle must be centered around y=0 is due to our previous choice of fixing the integral of y(t) over one period. The center of mass must be such that equally much is positive and negative, which for a circle clearly means fixing it around y=0.

How general is this proof?

The use of Green's theorem requires that we deal with a curve (x(t),y(t)) that has a continuous derivative except in finitely many points (piecewise smooth). Piecewise smoothness is also enough to ensure that the Fourier series can be differentiated term by term. Any area you can physically construct will certainly satisfy these conditions, as will anything you draw with a pen, since it is impossible to create or draw infinitely many edged points.

Tuesday, November 12, 2013

Mathematical Realism and Anti-realism: all the way down to the Nature of the Universe itself

There's this age old question of where mathematics come from: is it a product of the human mind, or does it exist independently of it? Here's the labels:

Realism: A realist believes that math exists outside of our  minds. We discover mathematics, we don't invent it. Mathematics is real.

Anti-realism: An anti-realist believes that mathematics exists within our minds. We invent mathematics, we don't discover it. Mathematics is not physically real.

Realists argue that mathematicians make discoveries long before they become useful in physics, and that can't be a coincidence. When Einstein wanted to develop general relativity, non-Euclidean geometry had already been developed. Quantum mechanics is based on Hilbert spaces (functional analysis), which had also already been developed by mathematicians.

Anti-realists, on the other hand, argue that the truth of mathematical statements depend on assumptions, i.e. mathematics does not have any unconditional truths (apart from logical tautologies, such as "A implies A"). For example, let's reconsider geometry: there are various kinds of them, the Euclidean, and many non-Euclidean. For a long time the ancients saw Euclidean geometry as self evident, although we today know that this "self evident fact" is incorrect.

But let's take one question very seriously: why are the properties of the universe so mathematical?

It's almost as if the universe is made of mathematics, given that the laws of physics are mathematical relations (and moreover, they may all be computable). This hints towards a possible physical answer to the great debate between these two camps.

The universe may appear to be mathematical because it IS mathematical. The universe is a formal axiomatic system, and we exist as part of those axioms. The assertion may in fact be physically provable: if we discover a theory of everything (i.e. the ultimate laws of physics), then the universe is obviously mathematical, and realism turns out to be true: pi is not in the sky, but in the formalization of the laws of nature. As a special case, we it turns out that all physical processes of nature are computable, then the universe is not only mathematical, but a Turing-equivalent computer.

On the other hand, it may turn out that the universe is not mathematical, and we "discover" mathematical properties of the universe only because our mind creates them. This may sound crazy at first, but one may note that many concept are in fact purely man made (although indispensable to everyday life): concepts such as human, car, alive/dead, and temperature all rely on human interpretation of a huge amount of matter ("a macroscopic phenomena"). It is not fundamental to the universe: a car is just an arrangement of particles/energy ("a state"), no more special than a pile of junk. Put differently, the universe may be "blind" to our high level concepts, and that may actually include most of physics, possibly with the exception of quantum mechanics (although that is not certain either). From this perspective, the universe is not mathematical: we interpret it as having mathematical properties, and that approximate some things very well, but will never grasp the full extent of it (that is the case right now). In that case, anti-realists win: mathematics is only in the mind (yet very useful to our existence).

Monday, October 21, 2013

How to iterate over the values of a std::map

EDIT: Changed example to better reflect the generality of the problem = title of this post. END EDIT.

Now that C++ has lambdas, the C++ STL algorithms are practically useful, but how do you iterate over the values of a std::map?

Say you have a function template:

template <typename Iter> void f(Iter curr, Iter end);

f assumes the iterators point to std::strings:

std::vector<std::string> v;
f(v.begin(),v.end()); // ok

std::map<int,std::string> m;
f(m.begin(),m.end()); // Won't work: *m.begin() is a pair<Key,Value>

What we thus need is a way of converting the map::iterator (and map::const_iterator) to an iterator to always dereferences the Value type of the pair.

Solution using Boost

The solution is easy using boost:

#include <boost/iterator/transform_iterator.hpp>

// First, define a lambda to get the second element of a pair:
auto get_second = [](const std::pair<const int,std::string>& p){ return p.second; };

// Then, we can convert a map iterator into an iterator that automatically dereferences the second element
auto beg = boost::make_transform_iterator(m.begin(),get_second);
auto end = boost::make_transform_iterator(m.end(),get_second);

f(beg,end); // ok, works!

The line
is needed so inform the boost libraries that the result type (Value in this case) should be inferred using decltype(), rather than by requiring a result_of typedef (prior to C++11, decltype did not exist).

Solution without Boost

If you can't use boost, you'll need to implement the dereferencing by hand. Here's the code:

#include <map>
#include <iterator>

template <typename Iter>
class map_iterator : public std::iterator<std::bidirectional_iterator_tag,typename Iter::value_type::second_type> {
 map_iterator() {}
 map_iterator(Iter j) : i(j) {}
 map_iterator& operator++() { ++i; return *this; }
 map_iterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }
 map_iterator& operator--() { --i; return *this; }
 map_iterator operator--(int) { auto tmp = *this; --(*this); return tmp; }
 bool operator==(map_iterator j) const { return i == j.i; }
 bool operator!=(map_iterator j) const { return !(*this == j); }
 reference operator*() { return i->second; }
 pointer operator->() { return &i->second; }
 Iter i;

template <typename Iter>
inline map_iterator<Iter> make_map_iterator(Iter j) { return map_iterator<Iter>(j); }

// We can now do:

 std::map<int,std::string> m;
 // ...

Sunday, October 13, 2013

STL Container Performance

STL Container Performance Table

There's already a good table at Stack Overflow listing time complexity (in Big O notation) of common operations with C++ STL containers for comparison, although it's structured in a more abstract way and a little hard to read because it's not a HTML table. Here's my version, which also includes priority_queue (technically an adaptor).

Persistent iterators means that the iterators are not invalidated by insertions and erase (except when erasing the element referred to by the iterator, which is necessarily invalidated).

ContainerInsertionAccessEraseFindPersistent Iterators
vector / stringBack: O(1) or O(n)
Other: O(n)
O(1)Back: O(1)
Other: O(n)
Sorted: O(log n)
Other: O(n)
dequeBack/Front: O(1)
Other: O(n)
O(1)Back/Front: O(1)
Other: O(n)
Sorted: O(log n)
Other: O(n)
Pointers only
list / forward_listBack/Front: O(1)
With iterator: O(1)
Index: O(n)
Back/Front: O(1)
With iterator: O(1)
Index: O(n)
Back/Front: O(1)
With iterator: O(1)
Index: O(n)
set / mapO(log n)-O(log n)O(log n)Yes
unordered_set / unordered_mapO(1) or O(n)O(1) or O(n)O(1) or O(n)O(1) or O(n)Pointers only
priority_queueO(log n)O(1)O(log n)--

Always O(1): begin(), end(), empty(), size(), push_back()

The following operations are always O(1) when they exist:
  1. begin(), end()
  2. empty()
  3. size() (note that list::size() was not necessarily O(1) prior to C++11)
  4. push_front() (note that std::vector does not have push_front(), as it would not be O(1))
  5. push_back()

Some Additional Notes

Adaptors: queue and stack

For std::queue and std::stack, complexity depends on the underlying container used (by default std::deque).


std::vector has constant time (O(1)) back insertion provided no reallocation needs to take place (use reserve/capacity to allocate/check). When reallocation is necessary, all elements are copied (or moved, if possible) to a need memory location. It is guaranteed that back insertion is amortized constant, meaning: "if we perform a large amount of back insertions, the average time for back insertion is constant".

Insertion does not invalidate iterators as long as no reallocation is performed (when reallocating, all iterators become invalid). Deletion invalidates all iterators after the deleted element, iterators to elements before are still valid.


Insertion and deletion of elements in a std::deque may invalidate all its iterators. Pointers are however persistent. In practice accessing / iterating over a std::vector is faster than std::deque.

All iterators may become invalid after an insertion or deletion, but pointers/references are always valid.


If you have an iterator to an element, you can insert right after (or before) that element in constant time (O(1)). Of course, you can also erase it or access it directly (O(1)) using the iterator (or any adjacent element, as ++iterator / --iterator are constant time operations).

If you only know the index, e.g. that you wish to insert/retrieve/erase the 4th element, you'll need to iterate the list until you reach that element. Put differently: std::list does not provide random access.

sorted vector and deque

To search for an element in a sorted std::vector or std::deque, use std::equal_range. If only the first element is needed, there is std::lower_bound. If you only want to know whether an element exists or not, there is std::binary_search.

set and map

Requires a less-than comparison function. Complexities also apply to multiset and multimap.

unordered_set and unordered_map (hash tables)

unordered_set and unordered_map has constant time performance on all operations provided no collisions occur. When collisions occur, traversal of a linked list containing all elements of the same bucket (those that hash to the same value) is necessary, and in the worst case, there is only one bucket; hence O(n).

Requires a hash function and equality comparison function. Complexities also apply to unordered_multiset and unordered_multimap.

Deletion does not invalidate any iterators (other than erased element). Insertion keeps all iterators valid as long as no rehashing is done. When rehashing is performed, all iterators become invalid. Pointers/references to elements always remain valid.

multiset, multimap, unordered_multiset, unordered_multimap

std::multiset and std::multimap follow the same rules as std::set and std::map.
std::unordered_multiset and std::unordered_multimap follow the same rules as std::unordered_set and std::unordered_map.

The only reason they are not listed in the table/throughout this document is to save space.


Strictly speaking std::string and std::wstring are typedefs for basic_string, which is a container. What applies to string above applies more generally to basic_string (and hence, to std::wstring too).

Note that prior to C++11 basic_string was not required to store its elements (characters) contiguously. Now it acts as std::vector, except its only valid for POD types and some tricks that don't violate the constraints may be employed (in practice: small string optimization).


Requires a less-than comparison function. Always gives the greatest element (according to comparison function) when top() is called. top() is constant time, but pop() requires O(log n) as the queue needs to be rearranged (to ensure next top() correctly gives greatest element).

Sunday, October 6, 2013

How to Crack Door Codes and Suitcases Fast

Cracking Code Locks

When I was still a student in Linköping University, my friends were staying in student apartments where the main entrance used a code lock. The lock required a four digit passcode, but there was something unusual about it: it lacked the green "OK"/"Enter" button to confirm once choice. Instead, the code was automatically verified as the digits were inserted. This may not seem like such a big deal, but, as we will see, it is actually.

Consider that the code is 1234. Let us now say that you are returning home on a Saturday morning at 3.30 AM and wish to enter your building. Since you are not at your best, you happen to start by pressing a "2" instead of the "1". It's not such a big deal, you think, and you now press "1234", upon which the door opens. It works as you expected, but there is one crucial detail here: you actually inserted the string "21234", so the first four digits are in fact "2123", which is not the right code. When you then inserted the "4", the verification mechanism clearly knew that it should check the last three digits "123", and append your newly inserted "4".

Clearly then, you have just tried two code: "2123" and "1234", and you have in fact done so with only 5 key presses instead of the 4*2=8 you would normally expect. This is due to the missing "OK" button (and perhaps a "Cancel"/"Restart" button).

This convenience thus comes at a price: if someone wishes to crack the lock (by guessing the code that is, not physically breaking it), that cracker could reuse previously inserted digits as part of the new code. For instance, consider the following. First, we insert "0000", which tries precisely that code. Then, we insert a "1", upon which we are trying the code "0001". If we insert a "0" again now, we are trying "0010" (as the code lock is constantly remembering the last three digits inserted). We can clearly take shortcuts, but how many? If we could constantly insert new codes, without ever having to return to a previous code, we would effectively try all 10^4 = 10000 codes by pressing only 10003 digits (the 10000 codes + 3 digits we need to start the process).

At first glance, it's not even clear if it's possible to find such a sequence. Consider a code that does not deal with digits from 0 to 9, but only the binary 0 and 1. If the code is two digits long, then there are four different possible codes: 00, 01, 10, and 11 (not a very useful code lock for practical purposes, but it serves as an easy example for us to understand the problem). Normally, trying each code means we need to press 2*4=8 buttons (excluding the "OK" in between each). But if we don't have the "OK" button, so that the code lock uses a memory, we can in fact try all four codes by pressing "00110" (five presses instead of eight). To see why, consider each two consecutive digits in the sequence: 00, 01, 11, 10.

Now consider using binary digits again (only two buttons are available: 0 and 1), but with a code of length 3. We now have 8 possible codes: 000, 001, 010, 011, 100, 101, 110, 111. If we start with 000, we need to then append a 1 (otherwise we retry 000), so we begin with 0001. We could then start with 0 again, as the sequence 00010 would try 000, 001, 010 (just look at three consecutive digits in the sequence). Let's add another 0 and see what happens. The sequence is now 000100, which tries 000, 001, 010, 100. These are unique codes, which is great. However, we now run into problems, as the last 2 digits are 00, exactly what we began with. We have already tried both 000 and 001, so we are now forced into retrying a code!

The point is that some sequences will repeat previously tried codes, which is a waste of time. The sequences that do not repeat previous codes are known as De Bruijn sequences. So with binary digits and a length of 2, the sequence 00110 is a De Bruijn sequence, because it tries the codes: 00, 01, 11, 10, that is, all possible combinations exactly once each. The sequence 00011 is not a De Bruijn sequence, as it tries 00 two times (in the first two trials) and does not try the code 10.

Put differently, De Bruijn sequences are the shortest possible sequence of button presses needed to try all codes on doors with code memory (without "OK" button).

Do De Bruijn sequences exist for all number of digits and lengths (k and n)? We succeeded in finding such for binary digits with length 2 (k=2, n=2), but not for binary digits with length 3 (k=2,n=3). And what about the real world door codes, with has 10 digits and typically length 4 (k=10, n=4)?

It turns out that De Bruijn sequences exist for all possible digits k and length n. This means that a real world door code can be cracked in 10003 key presses instead of the expected 40000 (4 per code, 10000 codes). That is, it can be cracked four times faster!

To see that De Bruijn sequences always exist (and how to find them), we first make some observations:
  1. At any point, we have a state, which are the n-1 digits previously inserted (the "memory").
  2. We then make k different choices, each giving us a code. Put differently: at each point (after initializing with n-1 digits), we have n-1 digits in state and we choose one more digit (k choices) to obtain a new code of length n.
  3. To get a code, say 1234, there is in fact only one way to reach it: from the state 123, and adding the digit 4.
We can picture this process in a graph, where the nodes represent states, and the links represent codes. Thus, note that attempts to try a code are on the links (edges), not nodes (vertices). Below you can see such a graph for k=2 (only binary digits) and n=4 (all codes are composed for four binary digits). The correct code might thus be e.g. 1001 or 0111.

Clearly, the question is now: can we traverse this graph in a way that passes each link once and only once? Note that we may be in the same state multiple times. In fact this is always necessary: to try both 0000 and 0001, we need to be in the state 000 first. However, we don't want to try any code more than once, so each link should only be visited once. This is known as a Eulerian path (or cycle, if we finish at the same node as we started). Now, each node in the graph will have exactly k links going out and k links going in (k=2 in the depicted graph), since there are k digits to choose from in each state, and k ways to throw away a digit to obtain the state. For example, in the graph above, there are two ways out of the state 000: by adding another 0, or by adding a 1. This corresponds to trying code 0000 and 0001. There are also two ways to reach 000: from 0000, and from 1000, as we throw away the left most digit (oldest digit).

Since there are equally many links pointing in and out (namely, k), we can be sure there is an Eulerian cycle, so our problem is indeed always solvable! In fact, all solutions are given precisely by all Eulerian cycles, so any algorithm to find cycles will do (Hierholzer's algorithm is not only an efficient and intuitive way of finding such cycles, but also provides a proof of existence).

Cracking Suitcases

Cracking suitcases is similar (k=10, n=3), except that we can now rotate each wheel independently. Thus, in one "click" we can go from 000 to 001, 010, 100, 009, 090, and 900. These six "next codes" corresponding to rotate one of the three wheels one step "up" or "down" (to their adjacent values). If we were to depict the solutions as a graph, where each node again corresponds to state and links to solutions, we have an important difference: the state is now the previous solution (we don't throw away any digits). From each state there are 6 links going out, and in fact, all these links are also going in, since we can always turn the wheel back one step. Thus we may consider the undirected graph where each node has 6 links, and, since every node has an even degree, it is guaranteed that an Eulerian cycle exists.

Note that the "obvious" solution of enumerating does not work: 000, 001, 002, 003, ... 008, 009 is fine, but going from 009 to 010 requires two switches: turning the least significant 9 to 0, and the middle 0 to 1.

In this manner, we only need to rotate the "wheels" a total of 1000 times, as each rotation tries a new code (and there are precisely 1000 codes, namely 000 to 999).

Thursday, September 26, 2013

Measuring Execution Time Accurately and Setting Deadlines

If you do benchmarking, a common task is to measure the execution time of a particular block of code.

The canonical and portable way to do this in C++ is as follows:

#include <chrono>
// Starting timepoint
const auto start_time = std::chrono::steady_clock::now();
// ... Code here ...
double time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
  (std::chrono::steady_clock::now() - start_time).count() / 1000.0;

This will measure execution time in seconds, with millisecond precision. If you want better precision, you can of course go for say microsecond resolution:

// Starting timepoint
const auto start_time = std::chrono::steady_clock::now();
// ... Code here ...
double time_in_seconds = std::chrono::duration_cast<std::chrono::microseconds>
  (std::chrono::steady_clock::now() - start_time).count() / 1000000.0;

If your code block may throw exceptions, and the execution time is actually updated to some variable that is outside of this scope, you may want to catch the exception:

// time_in_seconds reference to variable declared outside this scope
// Starting timepoint
const auto start_time = std::chrono::steady_clock::now();
try {
  // ... Code here ...
} catch (...) {
  // Update execution time and rethrow
  time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
    (std::chrono::steady_clock::now() - start_time).count() / 1000.0;
// Normal execution ended, update time
time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
  (std::chrono::steady_clock::now() - start_time).count() / 1000.0;

steady_clock is a monotonic clock: its value never decreases. This can be compared to std::chrono::system_clock, which can in fact decrease if the user changes the value. There's also std::chrono::high_resolution_clock which may or may not be monotonic (as the name suggests, it is intended to primarily be used as a high resolution clock, i.e. ideally with nanosecond precision).

The start_time is a timepoint, and taking the difference of two timepoints (the two now()s) yields a duration. We then convert the duration into millisecond "ticks", which are obtained using count(). Alternatively, we can convert the duration into microsecond "ticks" (as shown above), or even nanosecond ticks. Finally, we refactor the numerical value from milliseconds to seconds (this step is obviously not needed, but is used because it is many times easier to use SI units).

Deadlines can be created in a similar way:

// Define type used for deadline
typedef std::chrono::steady_clock::time_point deadline_t;

deadline_t soon = std::chrono::steady_clock::now() + std::chrono::minutes(2);
// or, equivalently: now() + std::chrono::seconds(2*60);

for (const auto& e : vec) {
  if (std::chrono::steady_clock::now() > soon) throw out_of_time();
  // ... process vector elements ...

Monday, August 19, 2013

Fast (and pretty easy) Currency Conversion without Calculators

With the prevalence of mobile phones, it's easy to convert between currencies by simply using the phone's calculator. If you're like most people, however, you won't be doing this while shopping on the street markets of Thailand or anywhere else abroad, provided the sum is not too large (say anything below 100 USD).

Here's a practical way to convert between currencies fast and easy, all done in your head.

Consider some practical examples.

USD to EUR conversion
We want to convert from USD to Euros (EUR).
Google tells me that today, 1 USD = 0.75 EUR.
How much is 67 USD in EUR?

The answer is 67*0.75 EUR, but how do you actually do that calculation in your head?
Note that 0.75 = 3/4, so we have 67 * 3 / 4. From a mathematical point of view, it doesn't matter if we multiply 67 by 3 first, and then divide it by 4, or if we divide 67 by 4 first, then multiply by 3. It is however much easier from humans (and computers!) to deal with small numbers, so we start by dividing to get a smaller number.

How do you divide 67 by 4? The answer is that you half it two times: 67 / 4 = 67 / (2*2) = 67 / 2 / 2.
How do you halve 67? 67 is 60+7, and if we take half of each and then add them, we'll have the correct answer. Half of 60 is 30, and half of 7 is approximately 4 (we stick to integers). Thus the answer is 30+4 = 34. Repeating once more (remember, we halve two times), 34 = 30+4, and halving each gives 15+2 = 17.

Now, we multiply 17 by 3 using the same technique of splitting 17 into 10+7 and then multiplying (instead of dividing) by 3: 3*10 = 30, and 3*7 = 21, so you get 30+21 = 51.

In one go:

67 USD is approximately 51 EUR. The exact answer is 50.25 EUR.

What you need to remember is thus:
  1. The general operations: halve the number two times, then multiply by 3
  2. How to halve and multiply by 3 using the "splitting" trick.
Let's we want to do the opposite now: convert euros into USD. 1/0.75 = 1.33, so 1 EUR = 1.33 USD.

Let's say we have 247 EUR. This is 247*1.33 USD, but how do we calculate it easily?

First, we omit the second decimal, i.e. 1.33 is approximately 1.3. So we only need to do 247 * 1.3. First, using the splitting trick, we see that:

So the hard part is a little easier now: we need to estimate 247*0.3. Noting that 0.3 = 3/10, and doing the division before multiplication as before, we get:

So the answer is 247 + 75. A rough estimate is given by 250+75 = 250 + 50 + 25 = 325.

So 247 EUR is approximately 325 USD. The exact answer is 328.51 USD.

It seems then that the trick is to come up with ways to multiply and divide by decimal numbers, such as 0.75 and 0.3. This can always be done, since all conversion rates are rational numbers, meaning they can be written in the form Integer/Integer. By using a fairly simple numerator and denumerator, we can then perform the estimation fast. Multiplication is in general easy, division much harder. Here's a couple of tricks:

Divide by 2 (halving)
This is simply taking half the quantity. If you need to divide say 573 by 2, then use the splitting technique: 573 = 500 + 70 + 3. Halving each gives approximately 250 + 35 + 2 = 250 + 37 = 287.

Divide by 4 and 8 (halving many times)
Also very easy: dividing by 4 is the same as halving twice. Dividing by 8 is the same as halving three times.

Divide by 10
The easiest to do. 128 divided by 10 is 12.8. In general, XYZ divided by 10 will be XY.Z, where X,Y and Z are digits (0,1,2,3,...,9). We then simply round to nearest integer> 12.8 is closest to 13.

Divide by 5 ("divide by 10 and double")
This one's really easy too: x/5 = x/10*2, so all you need to do is divide by 10 (using the method above), and then double the result. Example: 128/5 can be calculated first dividing 128 by 10, giving 13. Doubling gives 26.

Divide by 3 ("take average of 2 and 5")
This one's a bit harder. A nice way to estimate x/3 that builds on the techniques already mentioned is to take the average of x/2 and x/5.
Example: 175 divided by 3 can be approximated by noting that:
175/5 = 18*2 = 36
175/2 = (100 + 75 + 5) / 2 = 50 + 37 + 3 = 50 + 40 = 90
Taking averages, we get (36+90)/2 = 126/2 = (100+20+6)/2 = 50 + 10 + 3 = 63.
The exact answer is 58.333.
In general, this technique gives (x/2 + x/5)/2 = 7x/20 = 0.35x, instead of 0.3333x.
(It may seem more intuitive to take average of x/2 and x/4, but this gives a worst estimate of 0.375x.)

Divide by 7 ("take average of 5 and 10")
Also a bit harder. When doing x/7, we can take the average of dividing by 5 and 10.
384/7 is somewhere between:
384/10 = 38
384/5 = 38*2 = (30+8)*2 = 60 + 16 = 76
Taking averages, we get (38+76)/2 = (100 + 14)/2 = 50 + 7 = 57
The exact answer is close to 55.
This techniques calculates (x/5 + x/10)/2 = 3x/20 = 0.15x instead of x/7 = 0.143, so it's a good estimate for relatively small sums of money.

Here's another example. 1 THB (Thai Baht) is 0.032 USD. How much is the saleswoman asking for in USD when she says 650 THB? The answer is 650*0.032 USD. To estimate this quantity, we manipulate the conversion rate 0.032 into rules of the form above. Note that 0.032 = 32/1000 which we approximate as 30/1000 = 3/100 (32/1000 is (30+2)/1000 = 3/100 + 2/1000, so we're skipping a term containing two parts in 1000).

Calculating 650 * 3/100 is easy: do the division first (dividing by 100 is dividing by 10 two times, since 100 = 10*10). We then get 6.5, which we round to 7. Multiplying by 3 gives 21. Here's the full estimate in one line:

Let's now convert from GBP (Brittish pound) to USD. 1 GBP = 1.56 USD. How much is Harrods asking you in USD when they say 489 GBP?

First we deal with the conversion rate. 1.56. This is close to 1.6, which we instead use. 1.6 = 16/10, and looking at prime factors we have:

This works: we can divide by 5 and then double 3 times. It's however rather tedious, as if we allow a little more rounding error, 1.56 is also close to 1.5, which gives:

A much simpler calculation: we half, then multiply by 3.
Thus, estimating 489 GBP is done as follows:
489/2 = (400 + 80 + 9)/2 = 200 + 40 + 5 = 245.
245 * 3 = (200 + 40 + 5)*3 = 600 + 120 + 15 = 720 + 15 = 735.

489 GBP is approximately 735 USD (although for such large amounts I would suggest you use a calculator!).

Monday, June 24, 2013

Intuitive Explanation for False Positives and False Negatives

Let's say you go to your favorite local bar, and do what most people do in bars: propose marriage to anyone of the opposite sex (that you find attractive enough, whatever that means to you).

When everything's said and done, there's only two ways it can go: Yes (success) or No (failure).
Furthermore, let us, for the sake of argument, assume that you only approach those you believe will say yes (perhaps to save yourself some embarrassment, although the reason doesn't matter).

Let's also say you'd like to keep track of your success rate, so you count the number of people you've approached, as well as the number of "Yes" you got (you don't stop as soon as you get a Yes, perhaps because you want the best Yes possible).

The hit rate is the proportion of success you've had:

Hit Rate = (Number of Yes) / (Number of Yes + Number of No) = (Number of Yes) / (Number of People Asked)

This seems like a good measure for success, except you've forgot something crucial: there's two ways things can go wrong:
  1. You asked someone and got a No (that's the above situation).
  2. You didn't asked someone that would have said Yes.

When you ask someone and they say Yes, that's called a true positive. When you ask someone and they say No, it's a false positive: positive because you thought they would say Yes, and false because you were wrong.

When you don't ask someone (because you think they will say No), and they would have said no, it's called a true negative. If, however, you never asked but they would have said Yes, that's a missed opportunity, a false negative: negative because you thought they would say No, false because you were wrong.

It may seem strange to talk about what would happen if we did something we never did, but the context in which this is performed in science is usually an instrument that is supposed to detect something. For example, if we do a test for a certain disease, a "positive" means that the diseases is there, and a "negative" means that the disease is not there (which to call "positive" and "negative" is largely a matter of definition, it is just a binary classification). We could thus verify the correctness of the "guesses" by using samples for which we know the true state (or by repeating the experiment multiple times, assuming we know the probability of success is more than 50%).

If the experiment succeeds it's a "true", otherwise "false". We can visualize the four options: The instrument (or human) "guesses", and the classification happens according to that guess and the real world state.

PositiveGuess Yes, Real YesGuess Yes, Real No
NegativeGuess No, Real NoGuess No, Real Yes

In this more formal language, the hit rate is defined as:

Hit Rate = (True Positives) / (True Positives + False Negatives) = (True Positives) / |"Everything Actually Positive"|.

The danger in situations like the marriage proposals is that we don't observe the Outcome of the Predicted No, since we never experience that alternate reality. Thus the True Negatives and False Negatives are hidden from view, and we can't measure the proportion of missed opportunities:

Miss Opportunities = (False Negatives) / (False Negatives + True Negatives).

This measure is interesting in our bar example but usually not used in other contexts. In general, a table of confusion can be used to list all prediction/outcomes. What should be clear is that the Hit Rate is not necessarily the best measure of success in general, as it doesn't account for False Positives.

Likewise, an instrument that is good at detecting a disease has few False Negatives, but what about the amount of people that are healthy but the instrument incorrectly classifies as having the disease (False Positives)? This is usually less of a priority in this case, since it is seems less bad to classify healthy people as sick than sick people as healthy.

Whether it is the False Positives or False Negatives we would like to detect best depends on how we describe the problem: in this case we call healthy people "negatives" and sick people "positives" (it thus has nothing to do with having a "positive" or "negative" outlook on things!).

It is usually a matter of prioritizing: if the machine can aggressively detect traces of a disease, it may also incorrectly find patterns that are in fact not due to the disease. Put differently, the machine is sensitive, meaning that has a high hit rate.

The "opposite" of sensitivity (hit rate) is specificity: how many healthy people that are classified as being healthy.

Specificity = |"Guessed Healthy"| / |"All Healthy"| = (True Negative) / (True Negative + False Positive)

One measure that accounts for everything (False Positives as well as False Negatives) is Accuracy:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives) = |"Correct Predictions"| / |"All Predictions"|

Thursday, May 9, 2013

How to Estimate the Number of Candies in a Jar (and Pack Tightly)

Two related questions:
  1. How do you pack balls (spheres) as tightly as possible?
  2. How do you estimate the number of candies in a jar?

Packing Spheres Tightly

The relevant quantity to study is the following: given a certain amount of volume of space T, how much volume of balls S can we put into that space? The ratio S/T is called the packing factor, and it gives the proportion of space that is occupied by balls.

There are various ways to pack. The following picture shows three suggestions (the balls are sliced so that you only see the parts that are inside the cube):

In the left most figure (simple cubic), we see the "corners" of eight different balls; the packing is essentially putting balls on top of each other (and next to each other) in a grid like fashion.

In the right most (body centered cubic), we put one ball int he middle and surround it by eight other balls. So instead of stacking balls like a grid, they "sink in to each other".

The middle one (face centered cubic) is similar to body centered cubic in that the balls "sink in", but here we "smash them together" not only at their corners, but also on the sides (the blue balls).

It should be intuitively clear that the simple cubic is not the best way to pack balls as densely as possible.
In fact, the picture suggests that the face centered cubic is better than the body centered cubic.
More interesting, we would like to know the percentage of wasted space (or equivalently, used space) for each configuration, as well as the best we can in principle do.

Define some useful variables:
L = Length of one side in the cube.
R = Radius of a ball.

Packing Factor for Body Centered Cubic

Now we calculate the packing factor for the body centered cubic:

The factor 2 in S comes the fact that inside the volume T, we have 2 balls: the one in the middle, and the 8 other pieces, which are one eight in side each (it is quite easy to see that they can be put together into a ball by looking at the picture).

Dividing S by T should give the answer, except that we haven't figured out the relationship between the length of the cube, L, and the radius of the balls, R.

The diagonal from side corner to the opposite (farthest away) will be of length , which can be seen as follow:  a diagonal across one surface of the cube is  by the Pythagorean theorem, and so  .

Looking at the picture (c) above, the balls on each corner touch the middle one, so looking at a straight line from one corner to the other (of length D), we are traversing 4R of length: . From our two identities involving D, we get:

We can now express our T in terms of R too, which gives:

That is, using body centered cubic packing (picture c), we use up about 68% of the available space. Equivalently, we thus waste 32%.

Packing Factor for Face Centered Cubic

Looking at the blue balls in picture b, we see that there's 6 of them (four on the sides, one on top and bottom). Each is half a ball, so this gives 3 balls. We get one more ball by putting together the 8 corner pieces, for a total of 4 balls. Thus:

The relationship between T and R may however have changed, as we are no longer centering around a specific ball. This time, going from one corner to another (D above) won't work since there are gaps in between the balls when doing so (this can be seen in the picture). However, looking at one side, we see that the side diagonal crosses a length of 4R (with no gaps), and this length is also . We thus get:

Face centered packing is thus more compact than body centered (as the picture suggests), using 6% more of the space.

Can we do even better?

There are other ways of packing which gives the 74% above, but are there even better ways?
Kepler postulated that the answer should be no, and there is a computer generated proof (a proof by exhaustion) that this is impossible: Hales' proof. Thus the answer is no, except that mathematicians are somewhat suspicious of the proof given the prevalence of bugs in computer code (the non-computer parts of the proof are considered reliable).

So to answer: almost surely not.

How do you estimate the number of candies in a jar?

Ignoring more intrusive methods, such as weighting the jar or x-raying the bottle and counting each candy, we obtain a way based on packing factors:
  1. Estimate the volume of the jar
  2. Estimate the percentage of the jar that is filled with candies (the packing factor defined above)
  3. This gives the volume of candies. Divide that volume by the estimated volume of one candy.
You may be tempted to use the 74% above as your packing factor in step 2, but this is in fact not usually the right thing to do, because there's no guarantee that pouring down (round) candies into a jar will give a perfect packing. In fact, it isn't so: simply pouring down balls will give a packing factor of around 64%, that is, closer to something like the body rather than face centered packing. So it is 64% you should use as your packing factor - under the reasonable assumption that candies were simply poured into the jar, as opposed to, say, carefully put into their optimal position one by one.

Also, if the candies are not round, the packing factor may be somewhat altered. The exact mathematics easily becomes forbiddingly complex (say for teddybear shaped candies), so you're left with two choices: use the approximation 64% (perhaps adjusted according to some reasonable logic), or buy the same jar, fill it with the same candy, pour out all the candies again, and count them.

Monday, April 29, 2013

Intuitive Explanation why Horn and Dual Horn Clause Logics are Linear Time Solvable

Propositional Logic

In propositional logic (equivalently, boolean algebras), a common problem is to find a true/false assignment to all boolean variables such that all formulas are satisfied.

In mathematics, we would call this a solution, in logic, it's called a model:

2x + 3y = 16 has an integer solution {x=5, y = 2} (there are infinitely many integer solutions).
x and not y  has a model {x = true, y = false}

In general, the problem of finding models to propositional formulas is NP-complete.
It is by many believed to be exponential: in the worst case, the best known algorithms do no better than the truth table method of trying out all 2^N possibilties, where N is the number of boolean variables (in practice DPLL based algorithm like Chaff perform significantly better across almost all problems).

There are however a few subclasses of problems that are linear time solvable. Two such that have practical value are Horn and Dual Horn Clause logic, which we consider here.

If you don't know it yet, all propositional formulas can be written in Conjunctive Normal Form, which you'll need to know about before reading on.

Horn Clause Logic (propositional)

Horn clauses are clauses that contain at most one positive literal.
I'll assume that the empty clause is not part of the database: if it is, there's obviously no model (this can be checked in linear time by simply traversing all clauses).

Let's start with the case where all clauses have no positive literals. Then all literals in all clauses are negative, so there is no conflict between any literals: all we need to do in order to satisfy all clauses is set all variables to false.

Now assume some (or all) clauses have a positive literal. If they also have a negative literal, we're back to our previous situation: by setting all literals to false, all clauses (include these new ones) are satisfied.

So the problem is when we have unit clause with one positive literal (and therefore, no negative literals). Clearly, we have no choice but to set these literals to true. This triggers a cascade of inferences by which other literals become unit. For example:

If we have the unit clause [X] and [ -X, Y ], we have no choice but to set X=1.
Then the second clause reduces to [ Y ], which in turn propagates Y=1.
And so on. This process is known as unit propagation.

After we've finished unit propagating, all necessary variable assignments have been performed. For the remaining variables, we have the same situation as before: they occur in non-unit clauses, so we can trivially handle them by assigning everything else to false. The algorithm is thus:

  1. Unit Propagate (this may fail, e.g. if we have [X] and [-X], whereby there is no model).
  2. Set remaining variables to false.

Linear Time Unit Propagation

The obvious algorithm for unit propagation is to pick all unit clauses, put the literal in a queue (or stack, it doesn't matter), and then go through the queue of literals doing:

1. Eliminate all clauses containing the literal (this step can be omitted if we use two literal watch, see below).
2. Remove negated literal from all clauses.
3. Scan for new unit clauses, put literal in queue and add to model.
4. Repeat until queue is empty.

This is a loop nested inside another loop: for each literal (in the queue), we linearly scan all clauses to simplify them (step 1 and 2 above). Thus the complexity is O(N^2). The second step, of scanning through all literals, can be avoided by keeping a mapping from all literals (positive and negative literals are distinct) to clauses.

For example:
C1 = [X]
C2 = [ -X, Y ]
C3 = [ -X, -Y ]

Will give the table:
X : C1
Y : C2
-X: C2, C3
-Y: C3

In step 1 and 2 above, we can thus directly go to the clauses containing the literal (step 1) and its negation (step 2). Note that if this table is stored as a std::map, it will have lookup time log(N). If we use a hash map (with linear chaining, i.e. std::unordered_map), we will ideally get O(1) but worst case will be O(N). A way to get a truly O(1) lookup is to store the literals in a jump table so that we can immediately get a list of the clauses (std::vector<std::vector<clause*>>).

This presumes that we remove clauses/literals. If we use lazy datastructures, e.g. two literal watch, the obviously implementation of scanning for another available literal to watch yields a worst case of O(N) operations per literal deletion, so we end up having a O(N^2) worst case anyway. (That is, the obvious algorithm that had O(N^2) would actually be O(N^3) using two literal watch.) This is perhaps mostly academic, as collisions between assignments and watch literals are rare in large clauses. One could add another table to keep track of all available literals for each clause, making updating watched literals O(1). However, this benefit probably translate into worst performance when using the full DPLL algorithm on general problems, as backtracking would be more expensive: now we need to add/remove literals from this new table during every assignment/backtracking.

There is paper about linear time unit propagation here. It uses two literal watch and assumes that literals are updated in constant time.

What about Dual Horn Logic?

In dual Horn logic, all clauses have at most one negative literal (instead of one positive, as in Horn logic).
So if we have non-unit literals, there must be at least one positive literal, and setting all variable assignments to true will thus be a model. If we have unit clauses, we unit propagate. The algorithm is:

  1. Unit Propagate (may fail, see Horn clause example).
  2. Assign True to all remaining variables.
So the only difference from Horn logic is that after unit propagation, we assign true to all remaining literals instead of false.

Can DPLL solve Horn and Dual Horn Logic in Linear time?

Yes, but only if we use the selection strategy that (first) sets all variables to false for Horn logic, and true for dual Horn logic. This works since the DPLL is essentially:
  1. Unit Propagate.
  2. If propagation fails, backtrack. Go to step 3.
  3. If there are no more literals to assign, return the model.
  4. Select unassigned variable (according to selection strategy). Go to step 1.
If DPLL's backtrack fails, it will exit with a "no models available". It's easy to see that DPLL properly handled the Horn (Dual Horn) logic cases for these particular selection strategies, although with an overhead: after each assignment in step 4, DPLL tries to unit propagate, although no such propagation actually takes place. It then re-enters step 4 and assigns false (or true) to the next literal. This process is repeated until all remaining literals are set to false (true). In the efficient algorithms above: all remaining variables are directly set to false (true) with no unnecessary unit propagation between every assignment.

As mentioned before, there may be issues with true linear time solvability when using two literal watch with schemes that are efficient for backtracking (O(N) updating) vs schemes that are efficient for quickly updating watched literals (O(1) updating, but constant time penalty when propagating and backtracking).

The problem is that DPLL doesn't know that these assignments are safe in the Horn and Dual Horn cases (they are not "safe" in general, unit propagation is necessary, and this may trigger backtracking or solve the problem entirely).

So intuitively, the reason why Horn and Dual horn logic are linear time solvable whereas the general case is exponential is because of choice points and backtracking: in the general case, we need to make guesses, and if these guesses are wrong, we need to backtrack and try the other branch. So we are forking the possible paths repeatedly. With Horn and Dual horn logic, no guesses need to be made, and hence, no backtracking occurs.

What about First-order Horn Clause Logic?

It is semi-decidable.

Proof: one can encode integers and basic arithmetic operations in Prolog. Thus general diophantine equations can be expressed, and a query can be made that succeeds if and only if it has a solution. If queries would be decidable, this would give a decision procedure for diophantine equations, contradicting Matiyasevich's theorem. So it is undecidable. That it is semi-decidable now follows from the Church-Turing theorem.