Monday, April 29, 2013

Intuitive Explanation why Horn and Dual Horn Clause Logics are Linear Time Solvable

Propositional Logic

In propositional logic (equivalently, boolean algebras), a common problem is to find a true/false assignment to all boolean variables such that all formulas are satisfied.

In mathematics, we would call this a solution, in logic, it's called a model:

2x + 3y = 16 has an integer solution {x=5, y = 2} (there are infinitely many integer solutions).
x and not y  has a model {x = true, y = false}

In general, the problem of finding models to propositional formulas is NP-complete.
It is believed by many to require exponential time: in the worst case, the best known algorithms do no better than the truth table method of trying out all 2^N possibilities, where N is the number of boolean variables (in practice, DPLL-based algorithms like Chaff perform significantly better across almost all problems).

There are however a few subclasses of problems that are linear time solvable. Two such subclasses with practical value are Horn and dual Horn clause logic, which we consider here.

If you don't know it yet, all propositional formulas can be written in Conjunctive Normal Form, which you'll need to know about before reading on.

Horn Clause Logic (propositional)

Horn clauses are clauses that contain at most one positive literal.
I'll assume that the empty clause is not part of the database: if it is, there's obviously no model (this can be checked in linear time by simply traversing all clauses).

Let's start with the case where all clauses have no positive literals. Then all literals in all clauses are negative, so there is no conflict between any literals: all we need to do in order to satisfy all clauses is set all variables to false.

Now assume some (or all) clauses have a positive literal. If they also have a negative literal, we're back to our previous situation: by setting all variables to false, all clauses (including these new ones) are satisfied.

So the problem is when we have unit clauses consisting of a single positive literal (and therefore no negative literals). Clearly, we have no choice but to set these literals to true. This triggers a cascade of inferences by which other clauses become unit. For example:

If we have the unit clause [X] and the clause [ -X, Y ], we have no choice but to set X=1.
Then the second clause reduces to [ Y ], which in turn propagates Y=1.
And so on. This process is known as unit propagation.

After we've finished unit propagating, all forced variable assignments have been made. Every remaining unsatisfied clause still contains at least one negative literal on an unassigned variable (a clause with only a positive literal left would be unit and would have been propagated), so we are back in the earlier situation: we can trivially handle the rest by assigning everything else to false. The algorithm is thus (a small C++ sketch follows the list):

  1. Unit Propagate (this may fail, e.g. if we have [X] and [-X], in which case there is no model).
  2. Set remaining variables to false.
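
To make the two steps concrete, here is roughly what they could look like in C++ (a minimal sketch of mine, not code from any particular solver). I'm assuming a DIMACS-style representation: a clause is a std::vector<int> of nonzero literals, where v stands for variable v and -v for its negation, and the assignment vector uses -1 for unassigned, 0 for false and 1 for true. The unitPropagate helper is only declared here; one possible implementation is sketched in the next section.

#include <vector>

// Assumed helper (one possible implementation is sketched in the next
// section): returns false if propagation derives a conflict such as
// [X] and [-X]; otherwise it records all forced values in 'assignment'.
bool unitPropagate(std::vector<std::vector<int>>& clauses,
                   std::vector<int>& assignment);

// Horn satisfiability: unit propagate, then default everything else to false.
// On success, 'assignment' holds a model.
bool solveHorn(std::vector<std::vector<int>> clauses, int numVars,
               std::vector<int>& assignment) {
    assignment.assign(numVars + 1, -1);        // variables are numbered 1..numVars
    if (!unitPropagate(clauses, assignment))   // step 1: may fail
        return false;                          // no model exists
    for (int v = 1; v <= numVars; ++v)         // step 2: remaining variables -> false
        if (assignment[v] == -1)
            assignment[v] = 0;
    return true;
}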

Linear Time Unit Propagation

The obvious algorithm for unit propagation is to pick all unit clauses, put their literals in a queue (or stack; it doesn't matter), and then go through the queue of literals doing the following (a C++ sketch follows the list):

1. Eliminate all clauses containing the literal (this step can be omitted if we use two literal watch, see below).
2. Remove negated literal from all clauses.
3. Scan for new unit clauses; put their literals in the queue and add them to the model.
4. Repeat until queue is empty.
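
Here is a C++ sketch of this queue-driven propagation (my own illustration, and still the quadratic version: every popped literal triggers a scan over all clauses). It uses the same representation assumed in the Horn sketch above: clauses as vectors of signed literals, assignment values -1/0/1 for unassigned/false/true.

#include <cstdlib>
#include <deque>
#include <vector>

// Naive queue-driven unit propagation, i.e. the O(N^2) version described above.
// Returns false if a conflict (an empty clause) is derived.
bool unitPropagate(std::vector<std::vector<int>>& clauses,
                   std::vector<int>& assignment) {
    std::deque<int> queue;
    std::vector<bool> satisfied(clauses.size(), false);

    // Seed the queue with the literals of all initial unit clauses.
    for (const auto& c : clauses)
        if (c.size() == 1) queue.push_back(c[0]);

    while (!queue.empty()) {
        int lit = queue.front();
        queue.pop_front();
        int var = std::abs(lit);
        int val = (lit > 0) ? 1 : 0;

        if (assignment[var] != -1) {
            if (assignment[var] != val) return false;  // e.g. [X] and [-X]
            continue;                                  // already propagated
        }
        assignment[var] = val;                         // add to the model

        // Steps 1 and 2: the inner linear scan over all clauses.
        for (std::size_t i = 0; i < clauses.size(); ++i) {
            if (satisfied[i]) continue;
            std::vector<int>& c = clauses[i];
            bool sat = false;
            std::vector<int> reduced;
            for (int l : c) {
                if (l == lit) { sat = true; break; }   // clause is satisfied: eliminate it
                if (l != -lit) reduced.push_back(l);   // drop the negated literal
            }
            if (sat) { satisfied[i] = true; continue; }
            std::size_t oldSize = c.size();
            c.swap(reduced);
            if (c.empty()) return false;               // conflict: empty clause derived
            if (c.size() == 1 && oldSize > 1)          // step 3: a new unit clause
                queue.push_back(c[0]);
        }
    }
    return true;
}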

This is a loop nested inside another loop: for each literal in the queue, we linearly scan all clauses to simplify them (steps 1 and 2 above). Thus the complexity is O(N^2). The inner scan over all clauses can be avoided by keeping a mapping from each literal (positive and negative literals are kept distinct) to the clauses it occurs in.

For example:
C1 = [X]
C2 = [ -X, Y ]
C3 = [ -X, -Y ]

Will give the table:
X : C1
Y : C2
-X: C2, C3
-Y: C3

In steps 1 and 2 above, we can thus go directly to the clauses containing the literal (step 1) and its negation (step 2). Note that if this table is stored as a std::map, it will have lookup time O(log N). If we use a hash map (with separate chaining, i.e. std::unordered_map), we will ideally get O(1), but the worst case will be O(N). A way to get truly O(1) lookup is to index the literals into a jump table so that we can immediately get the list of clauses (std::vector<std::vector<clause*>>).
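
A sketch of that jump table (again mine, with clause indices instead of clause pointers for brevity): variable v's positive literal is mapped to index 2*v and its negation to 2*v + 1, so a lookup is plain array indexing.

#include <cstdlib>
#include <vector>

// Map a signed literal to an array index: v -> 2*v, -v -> 2*v + 1.
inline std::size_t litIndex(int lit) {
    return 2 * static_cast<std::size_t>(std::abs(lit)) + (lit < 0 ? 1 : 0);
}

// Build the literal-to-clauses jump table: occurs[litIndex(l)] lists the
// indices of all clauses containing literal l. Unlike std::map (O(log N))
// or std::unordered_map (expected O(1), worst case O(N)), lookup here is
// always O(1).
std::vector<std::vector<std::size_t>>
buildOccurrences(const std::vector<std::vector<int>>& clauses, int numVars) {
    std::vector<std::vector<std::size_t>> occurs(2 * (numVars + 1));
    for (std::size_t i = 0; i < clauses.size(); ++i)
        for (int lit : clauses[i])
            occurs[litIndex(lit)].push_back(i);
    return occurs;
}

// During propagation, assigning X = true then means visiting only
// occurs[litIndex(X)]  (clauses that become satisfied, step 1) and
// occurs[litIndex(-X)] (clauses from which -X is removed, step 2),
// instead of scanning the entire clause database.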

This presumes that we remove clauses/literals. If we use lazy data structures, e.g. two literal watch, the obvious implementation of scanning for another available literal to watch yields a worst case of O(N) operations per literal deletion, so we end up with an O(N^2) worst case anyway. (That is, the naive algorithm that was O(N^2) would actually be O(N^3) using two literal watch.) This is perhaps mostly academic, as collisions between assignments and watched literals are rare in large clauses. One could add another table keeping track of all available literals for each clause, making the update of watched literals O(1). However, this benefit would probably translate into worse performance when running the full DPLL algorithm on general problems, as backtracking would become more expensive: we would now need to add/remove literals from this extra table on every assignment and every backtrack.

There is a paper about linear time unit propagation here. It uses two literal watch and assumes that literals are updated in constant time.

What about Dual Horn Logic?

In dual Horn logic, all clauses have at most one negative literal (instead of one positive, as in Horn logic).
So every non-unit clause must contain at least one positive literal, and setting all variables to true therefore satisfies it. If we have unit clauses, we unit propagate first. The algorithm is:

  1. Unit Propagate (may fail, see Horn clause example).
  2. Assign True to all remaining variables.
So the only difference from Horn logic is that, after unit propagation, we assign true to all remaining variables instead of false.
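
In code, the only change from the Horn sketch above is the default value used in step 2 (again my own sketch, reusing the assumed unitPropagate):

#include <vector>

bool unitPropagate(std::vector<std::vector<int>>& clauses,
                   std::vector<int>& assignment);   // as sketched earlier

// Dual Horn satisfiability: identical to solveHorn except that the
// remaining variables default to true instead of false.
bool solveDualHorn(std::vector<std::vector<int>> clauses, int numVars,
                   std::vector<int>& assignment) {
    assignment.assign(numVars + 1, -1);
    if (!unitPropagate(clauses, assignment))   // step 1: may fail
        return false;
    for (int v = 1; v <= numVars; ++v)         // step 2: remaining variables -> true
        if (assignment[v] == -1)
            assignment[v] = 1;
    return true;
}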

Can DPLL solve Horn and Dual Horn Logic in Linear time?

Yes, but only if we use the selection strategy that assigns false first to every decision variable for Horn logic, and true first for dual Horn logic. This works since DPLL is essentially:
  1. Unit Propagate.
  2. If propagation fails, backtrack. Go to step 3.
  3. If there are no more variables to assign, return the model.
  4. Select and assign an unassigned variable (according to the selection strategy). Go to step 1.
If DPLL's backtracking is exhausted, it will exit with "no models available". It's easy to see that DPLL properly handles the Horn (dual Horn) cases with these particular selection strategies, although with an overhead: after each assignment in step 4, DPLL tries to unit propagate, even though no such propagation actually takes place. It then re-enters step 4 and assigns false (or true) to the next variable. This process is repeated until all remaining variables are set to false (true). In the efficient algorithms above, all remaining variables are set directly to false (true), with no unnecessary unit propagation between assignments.
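
For comparison, here is one way to write that DPLL loop recursively in C++ (a sketch of mine, not an actual solver: state is copied at every decision so that backtracking is simply returning from the call, and decisions are asserted by adding a unit clause so that the earlier unitPropagate sketch picks them up). The polarity parameter is the selection strategy's preferred value: 0 for the Horn strategy, 1 for the dual Horn one.

#include <vector>

bool unitPropagate(std::vector<std::vector<int>>& clauses,
                   std::vector<int>& assignment);   // as sketched earlier

// Recursive DPLL with a fixed decision polarity. Returns true and fills
// 'assignment' with a model, or false if there is no model.
bool dpll(std::vector<std::vector<int>> clauses,
          std::vector<int>& assignment, int numVars, int polarity) {
    if (!unitPropagate(clauses, assignment))        // steps 1-2: propagate, maybe fail
        return false;
    int var = 0;
    for (int v = 1; v <= numVars; ++v)              // step 3: find an unassigned variable
        if (assignment[v] == -1) { var = v; break; }
    if (var == 0) return true;                      // everything assigned: model found

    const int vals[2] = {polarity, 1 - polarity};   // step 4: preferred value first
    for (int val : vals) {
        std::vector<std::vector<int>> c = clauses;  // copy state: the "choice point"
        std::vector<int> a = assignment;
        c.push_back(std::vector<int>{val == 1 ? var : -var});  // assert decision as a unit clause
        if (dpll(std::move(c), a, numVars, polarity)) {
            assignment = a;                         // pass the model back up
            return true;
        }
    }
    return false;                                   // both branches failed: backtrack
}

// Usage, e.g. for Horn:
//   assignment.assign(numVars + 1, -1);
//   bool sat = dpll(clauses, assignment, numVars, /*polarity=*/0);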

As mentioned before, there may also be issues with true linear time solvability when using two literal watch: schemes that are efficient for backtracking update watched literals in O(N) in the worst case, whereas schemes that update watched literals in O(1) pay a constant-factor penalty on every propagation and backtrack.

The problem is that DPLL doesn't know that these assignments are safe in the Horn and dual Horn cases (they are not "safe" in general: unit propagation is necessary, and it may trigger backtracking or solve the problem entirely).

So, intuitively, the reason why Horn and dual Horn logic are linear time solvable whereas the general case appears to require exponential time is choice points and backtracking: in the general case, we need to make guesses, and if these guesses are wrong, we need to backtrack and try the other branch, repeatedly forking the possible paths. With Horn and dual Horn logic, no guesses need to be made, and hence no backtracking occurs.

What about First-order Horn Clause Logic?

It is semi-decidable.

Proof: one can encode the integers and basic arithmetic operations in Prolog. Thus general Diophantine equations can be expressed, and a query can be made that succeeds if and only if the equation has a solution. If queries were decidable, this would give a decision procedure for Diophantine equations, contradicting Matiyasevich's theorem. So it is undecidable. That it is semi-decidable follows from the completeness of SLD resolution: all derivations can be enumerated, so a proof will eventually be found whenever one exists.


Wednesday, April 17, 2013

Paradoxes and their Solutions

You've almost certainly heard some of these (logical) paradoxes. They're fun brain puzzles, and people like to share them. Surprisingly, most people who share them do not know their solution, nor try to solve them. Here are some of the most well known paradoxes and their solutions.

Note that these are paradoxes of more logical nature (i.e. "mind puzzles"), and consequently, their solutions explore logic and language.

There are also paradoxes within scientific theories, with scientific solutions, which I might write about another time.

Liar Paradox

Problem


Perhaps the most well known paradox. The clearest version of the paradox is the following. Consider the sentence: "This sentence is false". We are wondering whether the sentence itself is true or false.

Assume the sentence is true.
Then clearly the sentence is false, a contradiction.

Now assume the sentence is false.
Then the sentence is true, also a contradiction.

It seems we cannot assign a binary truth value (true/false) to the sentence.

This paradox is sometimes given in the form: A person says that he is lying. Is he telling the truth or a lie?

Solution

There are two things that are unusual about the sentence.

First, it makes a reference to itself, by using the word "this sentence". This is known as self reference, and the statement is said to be self-referential.

Second, the sentence assumes that it can express the truth/falsehood of other statements. 

To see this, imagine that we were to express the sentence in a more formal language. It would look something like:

Sentence 1: not is_true(get_sentence(1))

In other words, we are assuming that we can get a sentence by its label ("get_sentence(1)", which is the sentence itself, since we gave it label 1) and that we have a predicate "is_true" which accurately returns true/false (we need not assume anything about the computability of these functions, only that they are definable within the language).

If you haven't studied logic you might be inclined to think that the first point is a pretty serious problem whereas the second is trivial. You'd be wrong, however, because it's the other way around: assuming we can express "is false" is a much more problematic issue than using self references.

Here's why:
Self-referential statements are everywhere. I can easily get rid of the self reference in the liar paradox by doing the following:

Sentence 1: "Sentence 2 is true"
Sentence 2: "Sentence 1 is false"

If Sentence 1 is true, then both are true, which in turn means that Sentence 1 is false, a contradiction.
If Sentence 1 is false, then both are false, which in turn means that Sentence 1 is true, a contradiction.

You might feel like this point is moot, because all I've done is replace "this" with a reference to another sentence, which in turn references the first sentence. After all, decoupling the sentences doesn't solve much, since we can still express everything using get_sentence and is_true:

Sentence 1: is_true(get_sentence(2))
Sentence 2: not(is_true(get_sentence(1)))

That's a good point, except that the problem doesn't go away even if you forbid all kinds of references (to the language itself). In particular, almost any logical system can be re-encoded so that its sentences can be seen to implicitly contain self references. This is done using something called Gödel encodings, which is roughly the equivalent of assigning every sentence over the English alphabet a unique number in a systematic way. get_sentence is easy to construct, even in languages where it was never intended to exist.

The same cannot be said for is_true: it does not in general exist. For example, arithmetic does not admit a truth predicate: there is no arithmetical formula expressing "the arithmetical sentence with this encoding is true". This result is known as Tarski's undefinability theorem.

You might wonder what all of this has to do with the English language, in which the paradox is presented. Well the point is that if we allow unrestricted use of natural languages (such as English), we can express pretty much anything. The problem is that many things we can express in English won't even be logical sentences, in the sense that they will not have truth values (such as the sentence under consideration). Unrestricted use of language also opens up the possibility for expressing nonsense.

What we can learn from the liar paradox is that we must deliberately handicap our "language" if we wish to have a "no nonsense" language (e.g. predicate logic, modal logics, Horn clause logic). How do we handicap such languages to make sure the liar paradox doesn't appear? Gödel has shown us that there isn't much point in trying to remove self references; they occur very easily. Tarski has shown us that a truth predicate "is true" is usually not available anyway, and it is this restriction that actually blocks the paradox.

In arithmetic, you cannot express sentences such as the liar paradox, not because they are self referential, but because they assert the truth of arithmetical sentences. Put differently, in arithmetic, is_true is not definable.

Barber's Paradox / Russell's Paradox

Problem

The barber's paradox arises from the following assumption and question: a barber shaves exactly those people who do not shave themselves. Does he shave himself?

As we did for the liar paradox, we check to see which is true:

Assume the barber shaves himself.
Then he doesn't shave himself, since he only shaves those who don't shave themselves.
This is a contradiction.

Now assume the barber doesn't shave himself.
Then he shaves himself, a contradiction again.

Solution

So this sentence is neither true nor false. As mentioned for the liar paradox, that is not so much of a problem in English, because we really shouldn't expect all sentences to even make sense. There is no problem in imagining a barber who shaves everyone who doesn't shave themselves, except that he also shaves himself. Or a barber who goes to another barber. The problem exists only within the confines of language; physically, there is no way to create the contradiction.

The lesson to learn from this was a hard one historically. Russell's paradox is a more formal (mathematical) version of the paradox, which has to do with set theory. Imagine that a set is a box. We can have an empty box, or put boxes inside other boxes. So denote the empty box with { }. If we have something, X, inside a box, we'll write {X}. And if we have two things, like X and Y, we'll write {X,Y}, and so on. Assume we're allowed to create boxes by (perhaps infinitely) repeating the following:

The empty box {} is a box.
{X,Y,...} is a box if X,Y,... are all boxes.

So we know that {} is a box, and { {} } is a box (the box that contains an empty box). Then we can put that box inside a box: { {{}} }. But these are not the only ways we can make boxes. We can put all three of these boxes into one box: { {}, {{}}, {{{}}} }. If you're wondering why this is useful in mathematics, it's because from this you can construct the natural numbers. For example, define:

0 = {}
1 = {0} = { {} }
2 = {0,1} = { {}, {{}} }
3 = {0,1,2} = { {}, {{}}, {{},{{}}} }
...

This all seems fine, until you define the following box (set):

B is the box containing all boxes that do not contain themselves. Does it contain itself?

What is meant by a box that contains itself? It quite literally means that you can find a copy of the box inside the box. If you think that's impossible, it's because you're thinking only of finite boxes. If we allow infinite nesting, such as X = {{{...}}}, then X contains a copy of itself: peel off the outermost { } and what is left inside is again {{{...}}}, that is, X itself, so X = {X}. Other examples can be built along the same lines, for instance a box Y whose contents are a copy of Y together with an empty box, Y = {Y, {}}. So there is not just one box that contains itself, but infinitely many.

Now back to the paradox: Does B contain itself?
Assume it does. Then B is inside B, i.e. B = {B, ...}. But B is the box containing all boxes (call them X) that do not contain themselves. Since B contains itself, B cannot be one of these X. Therefore B does not contain itself.
Now assume B does not contain B. Then, since B contains every X that does not contain itself, and B is precisely such an X, B contains itself.

So we have: B in B if and only if B not in B, i.e. the set contains itself if and only if it does not contain itself. It's usually said to be a contradiction, but it is technically not: you can't prove "B in B" or "B not in B", only that "B in B iff B not in B". It's only a contradiction if you assume that the sentence must be true or false.

Regardless, it is an undesirable feature of a logical system to say the least.
So these simple rules to create boxes actually lead to the possibility of creating a sentence that is neither true nor false (or both). What set theorists learnt from this is that the creation of boxes (sets) must be restricted (see Von Neumann Universe for three intuitive rules that create sets without this paradox).

Technical note: Boxes are defined so that they can contain multiple copies, e.g. { {}, {} }, whereas (non-fuzzy) sets either contain or do not contain an element. Boxes are in fact multisets. None of this changes the argument however: from sets one can create multisets, and vice versa.

Zeno's Paradox

Problem

Zeno's paradox comes in many forms; here is one that is easy to understand: a man is going to walk 100 meters. To get there, he must first cross 50 meters. But to reach that point, in turn, he must first cross 25 meters. It's easy to see that as we keep halving the distances, we get smaller and smaller numbers (but never exactly zero). This suggests that the man must move "infinitely much" in order to cover the 100 meters (in fact, in order to get anywhere at all!).

Solution

The paradox itself makes some questionable assumptions about the nature of reality. More specifically, it assumes that distances can be divided into halves indefinitely. The possible existence of a smallest meaningful length, the Planck length, casts doubt on this.

But even if we accept the premise, one could ask where the problem really lies.
Is it that we find it unreasonable that he can move an infinite number of times? We have already accepted that there is infinitely much space to move through, so perhaps we should also accept that he can make infinitely many moves (in finite time), especially since the distances get smaller and smaller. Mathematically, he must move 100/2 + 100/4 + 100/8 + 100/16 + ... = 100 meters (a geometric series with ratio 1/2, summing to exactly 100). So if we can complete the infinite series, he'll be just fine. From this point of view, Zeno's paradox really illuminates its own assumptions:

1. Space is continuous (we have infinitely many steps to move through).
2. Time is discrete (we are not allowed to pass through all infinite steps).

If both are discrete, as some of modern physics hints, there is no problem: at each discrete time step, the person moves forward a finite number of discrete space steps. And as shown above, if both are continuous, there isn't any problem either.

Turning the paradox against itself, we can observe that we can indeed move, so the combination of continuous space and discrete time seems incorrect as far as reality goes.