Wednesday, March 19, 2014

Inductive Reasoning Visualized

Ever wondered what inductive reasoning (technically, using inductive logic programming) would look like if you could draw a picture?
Here's how:



I'll explain what you're looking at. Let us say you know the following facts:
Gaia is a parent of Cronus.
Cronus is a parent of Zeus.
Zeus is a parent of Athena.

You also know that:
Gaia is a grandparent of Zeus.
Zeus is a grandparent of Athena.
Gaia is not a grandparent of herself.
Gaia is not a grandparent of Cronus.
Cronus is not a grandparent of Gaia.
Athena is not a grandparent of Cronus.

Now you ask the computer to induce a definition of grandparenthood based on these facts.

To do this, the machine needs to try different possibilities, and these are what you see in the graph.

On top, you see:
grandparent(_,_)
Now the sentence "X is a grandparent of Y" is what the machine writes as "grandparent(X,Y)", and an underscore simply means "anybody". So this is the hypothesis that "anybody is a grandparent of anybody".
The machine knows this to be false because of what you see in the red square: 4. Four is the number of "problems" found with this hypothesis: namely, it predicts that "Gaia is a grandparent of herself", which we know to be false. It predicts every instance of "is not a grandparent of" above, and there are 4 of them. Thus this hypothesis is not consistent with observations (its predictions are falsified).

Next we have
grandparent(A,_) :- parent(A,_)
which in English reads "A is a grandparent of anyone if A is a parent of anyone". As you can see, the red box says 3, because it has 3 problems: it predicts that Gaia is a grandparent of herself, since Gaia is a parent (of whom does not matter, but it happens to be Zeus), which we know to be false. For the same reason, it predicts that Gaia is a grandparent of Zeus, which is also false. Finally, it predicts that Cronus is a grandparent of Gaia, since Cronus is a parent (again, of whom does not matter). The negative example "Athena is not a grandparent of Cronus" is not (incorrectly) predicted to be true, since Athena is not a parent.

This is the basic idea in Inductive Logic Programming: we construct lots of hypotheses and test them against the examples we have. There are two solutions that look promising (green boxes):
grandparent(A,B) :- parent(A,C), parent(C,B)
which states that A is a grandparent of B if A is a parent of some C and that C is a parent of B. This is indeed the correct definition, and it does cover both positive examples ("Gaia is a grandparent of Zeus", "Zeus is a grandparent of Athena"), and does not cover any of the 4 negative examples.

The other promising solution is
grandparent(A,B) :- parent(A,C), parent(C,B), parent(B,_)
which gets the grandparent relation slightly wrong: on top of requiring the correct conditions (A is a parent of some C, which is a parent of B), it also requires the grandchild B to be a parent. So according to this definition, Athena is not the grandchild of Gaia because she does not (yet) have any children, but when she does, she'll satisfy the conditions. The machine knows this definition is worse than the right one because it cannot explain the fact that Zeus is a grandparent of Athena. Hence it only explains one of the two facts (that's the 1 in the green gox).

I'll leave it as an exercise to interpret all the other hypotheses in the graph.

The picture was produced using my ILP system Atom.