Monday, April 29, 2013

Intuitive Explanation why Horn and Dual Horn Clause Logics are Linear Time Solvable

Propositional Logic

In propositional logic (equivalently, boolean algebras), a common problem is to find a true/false assignment to all boolean variables such that all formulas are satisfied.

In mathematics, we would call this a solution; in logic, it's called a model:

2x + 3y = 16 has an integer solution {x=5, y = 2} (there are infinitely many integer solutions).
x and not y  has a model {x = true, y = false}

In general, the problem of finding models to propositional formulas is NP-complete.
Many believe it to require exponential time: in the worst case, the best known algorithms do no better than the truth table method of trying out all 2^N possibilities, where N is the number of boolean variables (in practice, DPLL-based algorithms like Chaff perform significantly better across almost all problems).

There are, however, a few subclasses of problems that are solvable in linear time. Two that have practical value are Horn and dual Horn clause logic, which we consider here.

If you don't know it yet, all propositional formulas can be written in Conjunctive Normal Form, which you'll need to know about before reading on.

Horn Clause Logic (propositional)

Horn clauses are clauses that contain at most one positive literal.
I'll assume that the empty clause is not part of the database: if it is, there's obviously no model (this can be checked in linear time by simply traversing all clauses).

Let's start with the case where all clauses have no positive literals. Then all literals in all clauses are negative, so there is no conflict between any literals: all we need to do in order to satisfy all clauses is set all variables to false.

Now assume some (or all) clauses have a positive literal. If they also have a negative literal, we're back to our previous situation: by setting all variables to false, all clauses (including these new ones) are satisfied.

So the only problematic case is a unit clause consisting of a single positive literal (and therefore no negative literals). Clearly, we have no choice but to set these literals to true. This may trigger a cascade of inferences by which other clauses become unit. For example:

If we have the unit clause [X] and the clause [ -X, Y ], we have no choice but to set X=1.
Then the second clause reduces to [ Y ], which in turn propagates Y=1.
And so on. This process is known as unit propagation.

After we've finished unit propagating, all necessary variable assignments have been performed. For the remaining variables, we have the same situation as before: every clause that is not yet satisfied still contains a negative literal over an unassigned variable, so we can handle them trivially by assigning everything else to false. The algorithm is thus:

  1. Unit Propagate (this may fail, e.g. if we have [X] and [-X], in which case there is no model).
  2. Set remaining variables to false.

Linear Time Unit Propagation

The obvious algorithm for unit propagation is to pick all unit clauses, put their literals in a queue (or stack, it doesn't matter), and then go through the queue of literals doing the following:

1. Eliminate all clauses containing the literal (this step can be omitted if we use two-literal watching, see below).
2. Remove the negated literal from all clauses.
3. Scan for new unit clauses, put their literals in the queue and add them to the model.
4. Repeat until the queue is empty.
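To make this concrete, here is a minimal C++ sketch of the whole Horn satisfiability check: the queue-based unit propagation in the steps above, followed by defaulting every remaining variable to false. The clause representation (signed integers, DIMACS-style) and all names are my own illustrative choices; this is the straightforward version in which every propagated literal rescans all clauses, which is exactly the inefficiency discussed next.

```cpp
// Sketch only: a Horn-SAT check as described above (propagate, then default to false).
#include <cstddef>
#include <iostream>
#include <optional>
#include <queue>
#include <vector>

using Clause = std::vector<int>;  // DIMACS-style literals: +v is positive, -v is negative.

// Returns an assignment (indexed 1..num_vars) if the Horn formula is satisfiable,
// std::nullopt otherwise.
std::optional<std::vector<bool>> horn_sat(std::vector<Clause> clauses, int num_vars) {
    std::vector<bool> value(num_vars + 1, false);   // Step 2: everything defaults to false.
    std::vector<bool> forced(num_vars + 1, false);  // Variables forced to true by propagation.
    std::queue<int> pending;                        // True variables not yet propagated.

    for (const Clause& c : clauses) {
        if (c.empty()) return std::nullopt;         // Empty clause: no model.
        if (c.size() == 1 && c[0] > 0) pending.push(c[0]);  // Initial positive unit clauses.
    }

    // Step 1: unit propagation (naive version: rescan all clauses for each literal).
    while (!pending.empty()) {
        int v = pending.front();
        pending.pop();
        if (forced[v]) continue;                    // Already processed.
        forced[v] = value[v] = true;

        for (Clause& c : clauses) {
            // Remove the now-false literal -v from the clause (if present).
            for (std::size_t i = 0; i < c.size(); ++i) {
                if (c[i] == -v) { c[i] = c.back(); c.pop_back(); break; }
            }
            if (c.empty()) return std::nullopt;     // All of its literals falsified: conflict.
            if (c.size() == 1 && c[0] > 0 && !forced[c[0]]) pending.push(c[0]);
        }
    }
    return value;  // Remaining variables stay false, which satisfies all remaining clauses.
}

int main() {
    // [X], [-X, Y], [-X, -Y] with X=1, Y=2: propagation forces X and Y to true,
    // which falsifies the last clause, so this Horn formula has no model.
    auto model = horn_sat({{1}, {-1, 2}, {-1, -2}}, 2);
    std::cout << (model ? "satisfiable" : "unsatisfiable") << "\n";
}
```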

This is a loop nested inside another loop: for each literal in the queue, we linearly scan all clauses to simplify them (steps 1 and 2 above). Thus the complexity is O(N^2). The inner scan through all clauses can be avoided by keeping a mapping from each literal (positive and negative literals are distinct) to the clauses containing it.

For example:
C1 = [X]
C2 = [ -X, Y ]
C3 = [ -X, -Y ]

Will give the table:
X : C1
Y : C2
-X: C2, C3
-Y: C3

In steps 1 and 2 above, we can thus go directly to the clauses containing the literal (step 1) and its negation (step 2). Note that if this table is stored as a std::map, lookups take O(log N) time. If we use a hash map with chaining (i.e. std::unordered_map), we ideally get O(1), but the worst case is O(N). A way to get truly O(1) lookup is to store the literals in a jump table, so that we can immediately get a list of the clauses (std::vector<std::vector<clause*>>).
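As a rough sketch of that jump table (my own naming, same signed-integer clause representation as in the sketch above): map variable v's positive literal to slot 2*(v-1) and its negative literal to slot 2*(v-1)+1, so that finding all clauses containing a given literal is a single array access.

```cpp
// Sketch of the literal-to-clauses jump table (clause indices rather than pointers).
#include <cstdlib>
#include <vector>

using Clause = std::vector<int>;

std::vector<std::vector<int>> build_occurrence_table(const std::vector<Clause>& clauses,
                                                     int num_vars) {
    std::vector<std::vector<int>> occurs(2 * num_vars);      // One slot per literal.
    for (int ci = 0; ci < static_cast<int>(clauses.size()); ++ci) {
        for (int lit : clauses[ci]) {
            int v = std::abs(lit);
            int slot = 2 * (v - 1) + (lit < 0 ? 1 : 0);       // +v and -v get distinct slots.
            occurs[slot].push_back(ci);                        // Clause ci contains this literal.
        }
    }
    return occurs;
}

int main() {
    // The table from the example above: C1=[X], C2=[-X,Y], C3=[-X,-Y] with X=1, Y=2.
    auto occurs = build_occurrence_table({{1}, {-1, 2}, {-1, -2}}, 2);
    // occurs[0] = {0}    ->  X appears in C1
    // occurs[1] = {1, 2} -> -X appears in C2, C3
    // occurs[2] = {1}    ->  Y appears in C2
    // occurs[3] = {2}    -> -Y appears in C3
}
```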

This presumes that we remove clauses/literals. If we use lazy data structures, e.g. two-literal watching, the obvious implementation of scanning for another available literal to watch yields a worst case of O(N) operations per literal deletion, so we end up with an O(N^2) worst case anyway. (That is, the obvious algorithm that had O(N^2) would actually be O(N^3) using two-literal watching.) This is perhaps mostly academic, as collisions between assignments and watched literals are rare in large clauses. One could add another table to keep track of all available literals for each clause, making the update of watched literals O(1). However, this benefit probably translates into worse performance when using the full DPLL algorithm on general problems, as backtracking would become more expensive: we would then need to add/remove literals from this new table during every assignment and every backtrack.

There is a paper about linear-time unit propagation here. It uses two-literal watching and assumes that literals are updated in constant time.

What about Dual Horn Logic?

In dual Horn logic, all clauses have at most one negative literal (instead of one positive, as in Horn logic).
So any non-unit clause must contain at least one positive literal, and setting all variables to true will thus be a model. If we have unit clauses, we unit propagate. The algorithm is:

  1. Unit Propagate (may fail, see Horn clause example).
  2. Assign True to all remaining variables.
So the only difference from Horn logic is that, after unit propagation, we assign true to all remaining variables instead of false.
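If you already have the Horn solver, one way to see the symmetry in code (my own observation, not from the post, and reusing the horn_sat sketch from earlier, so this fragment is not standalone) is a polarity flip: negating every literal turns a dual Horn formula into a Horn formula, and negating the resulting model back yields exactly "propagate, then default everything to true".

```cpp
// Sketch: dual Horn solved by flipping polarities and reusing horn_sat from above.
std::optional<std::vector<bool>> dual_horn_sat(std::vector<Clause> clauses, int num_vars) {
    for (Clause& c : clauses)
        for (int& lit : c) lit = -lit;             // Flip every literal: dual Horn -> Horn.
    auto model = horn_sat(clauses, num_vars);      // Solve the flipped (Horn) formula.
    if (!model) return std::nullopt;
    for (int v = 1; v <= num_vars; ++v)
        (*model)[v] = !(*model)[v];                // Flip the model back for the original formula.
    return model;
}
```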

Can DPLL solve Horn and Dual Horn Logic in Linear time?

Yes, but only if we use the selection strategy that (first) sets all variables to false for Horn logic, and to true for dual Horn logic. This works since DPLL is essentially:
  1. Unit Propagate.
  2. If propagation fails, backtrack. Go to step 3.
  3. If there are no more literals to assign, return the model.
  4. Select unassigned variable (according to selection strategy). Go to step 1.
If DPLL's backtracking fails (there is nothing left to backtrack to), it exits with "no models available". It's easy to see that DPLL properly handles the Horn (dual Horn) cases with these particular selection strategies, although with an overhead: after each assignment in step 4, DPLL tries to unit propagate, even though no propagation (or only propagation that repeats what the strategy would do anyway) takes place. It then re-enters step 4 and assigns false (or true) to the next variable. This process is repeated until all remaining variables are set to false (true). In the efficient algorithms above, all remaining variables are directly set to false (true), with no unnecessary unit propagation attempt between assignments.
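For reference, here is a compact recursive DPLL sketch matching the four steps above. The representation and the "false first" decision order are my own illustrative choices (a real solver would add watched literals, decision heuristics, clause learning, and so on); with this order, a Horn formula is decided without the second branch ever being explored, which is the point made above.

```cpp
// Sketch only: recursive DPLL with unit propagation and a fixed selection strategy.
#include <cstdlib>
#include <iostream>
#include <vector>

using Clause = std::vector<int>;

int lit_value(int lit, const std::vector<int>& val) {
    int v = val[std::abs(lit)];                     // +1 true, -1 false, 0 unassigned.
    return lit > 0 ? v : -v;
}

// Unit propagation to a fixpoint; returns false on conflict.
bool propagate(const std::vector<Clause>& clauses, std::vector<int>& val) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Clause& c : clauses) {
            int unassigned = 0, last = 0;
            bool satisfied = false;
            for (int lit : c) {
                int v = lit_value(lit, val);
                if (v == 1) { satisfied = true; break; }
                if (v == 0) { ++unassigned; last = lit; }
            }
            if (satisfied) continue;
            if (unassigned == 0) return false;      // Every literal is false: conflict.
            if (unassigned == 1) {                  // Unit clause: forced assignment.
                val[std::abs(last)] = last > 0 ? 1 : -1;
                changed = true;
            }
        }
    }
    return true;
}

bool dpll(const std::vector<Clause>& clauses, std::vector<int> val) {
    if (!propagate(clauses, val)) return false;     // Steps 1-2: propagate, backtrack on failure.
    int var = 0;
    for (int v = 1; v < static_cast<int>(val.size()); ++v)
        if (val[v] == 0) { var = v; break; }
    if (var == 0) return true;                       // Step 3: everything assigned, model found.
    // Step 4: selection strategy. "False first" makes Horn problems backtrack-free;
    // "true first" does the same for dual Horn problems.
    val[var] = -1;
    if (dpll(clauses, val)) return true;
    val[var] = 1;
    return dpll(clauses, val);
}

int main() {
    // [X], [-X, Y], [-X, -Y] with X=1, Y=2: an unsatisfiable Horn formula.
    std::vector<Clause> clauses = {{1}, {-1, 2}, {-1, -2}};
    std::vector<int> val(3, 0);                      // Index 0 unused.
    std::cout << (dpll(clauses, val) ? "satisfiable" : "unsatisfiable") << "\n";
}
```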

As mentioned before, there may be issues with true linear-time solvability when using two-literal watching: schemes that are efficient for backtracking (O(N) watch updating) versus schemes that are efficient for quickly updating watched literals (O(1) updating, but a constant-time penalty when propagating and backtracking).

The problem is that DPLL doesn't know that these assignments are safe in the Horn and Dual Horn cases (they are not "safe" in general, unit propagation is necessary, and this may trigger backtracking or solve the problem entirely).

So intuitively, the reason why Horn and dual Horn logic are linear time solvable whereas the general case is exponential comes down to choice points and backtracking: in the general case, we need to make guesses, and if these guesses are wrong, we need to backtrack and try the other branch, so we keep forking the possible paths. With Horn and dual Horn logic, no guesses need to be made, and hence no backtracking occurs.

What about First-order Horn Clause Logic?

It is semi-decidable.

Proof: one can encode integers and basic arithmetic operations in Prolog. Thus general Diophantine equations can be expressed, and a query can be made that succeeds if and only if the equation has a solution. If queries were decidable, this would give a decision procedure for Diophantine equations, contradicting Matiyasevich's theorem. So it is undecidable. That it is semi-decidable follows from the fact that proofs can be systematically enumerated (e.g. via SLD-resolution, as in Prolog): if a query has a proof, we will eventually find it.


Monday, April 22, 2013

Why Math is Useful in Natural Sciences and not in Social Sciences

I'll touch on some fundamental questions here:
  1. Why use Mathematics in Science?
  2. Why does it work so poorly in Social Sciences (e.g. Economics, Psychology)?
  3. Why does it work so well in Natural Sciences (e.g. Physics and Molecular Biology)?

Why use Mathematics in Science?

You might think that mathematics is just about being more precise than words can convey. For example, instead of conveying nuances such as "it's very warm today" and "it's a little warm today", we can be more precise and say "it's 32 degrees today" and "it's 23 degrees today". So it seems like numbers are useful to remove the subjective interpretation of "very warm" and "a little warm".

But this is not the main strength of using numbers. The main strength is not that you can quantify things more precisely, but that you can reach conclusions that are otherwise completely inaccessible. Consider this:

An artillery piece fires a shell right up into the sky. Will the shell return to earth? You might think it's obvious that it will, in which case you'd be wrong. There are two opposing things here: gravity, which pulls the shell back to earth, and the velocity at which the shell was fired.

And here's where the true power of science comes in: one can calculate that the velocity needed to escape earth's gravity is 40 000 km/h. That's a lot, but not impossibly much: in 1957, during Operation Plumbbob, the Pascal-B test detonated an extremely weak nuke (300 tons) which propelled a metal plate into the sky at 240 000 km/h, about 6 times escape velocity. That piece of metal is believed to have disintegrated in the atmosphere, but one thing's for sure: it did not return to earth.
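For reference (this calculation is not in the original post), the 40 000 km/h figure is the standard escape-velocity calculation: equate the projectile's kinetic energy with the gravitational potential energy it must overcome, using Earth's mass and radius, and note that the projectile's own mass m cancels:

\[
\tfrac{1}{2} m v_{\mathrm{esc}}^2 = \frac{GMm}{R}
\;\Rightarrow\;
v_{\mathrm{esc}} = \sqrt{\frac{2GM}{R}}
\approx \sqrt{\frac{2 \times 6.67\times 10^{-11} \times 5.97\times 10^{24}}{6.37\times 10^{6}}}\ \mathrm{m/s}
\approx 11.2\ \mathrm{km/s} \approx 40\,000\ \mathrm{km/h}.
\]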

This is the strength of using quantities: we can tell that a pistol bullet will come back to earth, whereas a nuke-propelled piece of metal will not. There are two opposing "forces": gravity and velocity, down and up. And because we have a formula to balance them, we can actually get an answer.

Now compare this to the following:
Alice is offered $20 to eat a fresh cockroach. Does she say yes or no? Two opposing forces: money (good) and eating fresh cockroach (bad). Naturally, everybody's different, so the answer depends on the person. There's certainly validity to that argument, and I'll return to this issue when answering the next question.

For now, I want to point out why quantitative science is useful: not only because we can quantify results, but because we would not even be able to reach some conclusions without using quantitative science. In particular, we can't balance two or more opposing things without using quantities.

Sometimes numbers are not necessary. Here's an example from evolutionary psychology: on average, who is a person closest to among these four people: his/her maternal grandmother, maternal grandfather, paternal grandmother, or paternal grandfather? And which is a person the least close to? Evolutionary psychology is based on a set of basic assumptions (much like Newton's laws, except they're not quantitative but rather purely logical), from which one can derive that people tend to be closest to their maternal grandmothers and least close to their paternal grandfathers. And indeed the predictions are correct. What is the logic behind this? That people know for sure that their mothers are theirs, but not so for their fathers. Thus there is a certainty gap that arises every time a parent is a male. Because people tend to favor their own kin (also a pillar of evolutionary psychology, which follows from the modern insight that the unit of survival and replication is the gene), the conclusion follows.

The argument is statistical in nature, but no quantities are involved (it is qualitative). You might argue that, due to the genetic nature of the assumptions, the argument is essentially one of probabilities over shared genes. It would be so if the assumptions of evolutionary psychology were that narrow, but they are not: evolutionary psychology studies the interaction of genes in environments (and thus the assumptions are weaker too, but presumably more accurate).

So in this example it works great, but we quickly run into dead ends again if we start balancing different aspects that are in conflict. The reason it works here is that the conclusions all point the same way: a person should be closer to their mother than their father, and the parent should in turn be closer to their own mother than father. So:

closeness(mother(A)) > closeness(father(A)) => closeness(mother(mother(A))) > closeness(father(father(A)))
In short, there is no conflict: the inequalities go in the same direction both times. In form, it is similar to when arguing that if Alice is taller than Bob and Bob taller than Eve, then Alice is taller than Eve:

height(A) > height(B) and height(B) > height(E) => height(A) > height(E).

This holds true of numbers too, but numbers are not necessary for such arguments to hold, as in this case, where we don't know the actual heights of anyone, yet can still draw a conclusion based on the fact that heights are ordered. Logically, any transitive relation (for example, a partial order) is sufficient for this reasoning to go through.

In general, the more we assume, the more we can conclude. And quantitative sciences (those based on mathematics) make far more assumptions about their domain of study. Consequently, the theories are richer in predictions, and, as a result, research becomes more fruitful as more experiments can be performed.

This is the reason why social sciences do not live up to the impressive achievements of physics: they are not quantitative. This, of course, is something to blame social scientists for. Or is it?

Why does math work so poorly in Social Sciences?

We know that social sciences cannot make strong predictions when many opposing features need to be considered together, due to their lack of quantitative precision. But why can't we get a quantitative precision?

Here's a couple of explanations:
  1. Social scientists are less smart than natural scientists, so they can't find a mathematical foundation.
  2. Social sciences are more complex than natural sciences.
  3. Social sciences are by nature non-mathematical, no mathematical foundation exists.
The main reason why explanation (1) doesn't work is that a lot of mathematicians have worked in the fields of economic theory and behavioral sciences. In particular, John von Neumann, widely regarded as one of the greatest mathematicians to ever live, attempted to create a solid mathematical foundation for social sciences: Game Theory. He even showed that randomly bluffing in poker is not a non-mathematical aspect of the game, but actually falls out from the calculations as the optimal strategy.

Von Neumann is not the only person who has made significant theoretical contributions to (theoretical studies of) social sciences, so we can safely rule out (1).

This laid a foundation upon which countless economists have built their academic careers, but there's a reason why almost none have made it in the industry: it is useless for practical purposes. It may well be that game theory can capture all of social behavior, but the problem lies in the fact that it is not much easier to formalize the "game" correctly than it is to simply write down mathematical equations for human behaviors. We have created a mathematical framework that is no simpler than mathematics itself in terms of describing human behavior.

So is there a mathematical foundation? Everybody knows how well math works in physics. The laws of nature seem to be mathematical. At the very least they are very well approximated by mathematics. Here's the key insight: social sciences have a mathematical foundation if, and only if, the laws of nature are truly mathematical (i.e. there are laws that not only approximate reality well but actually describe reality exactly). This follows from straightforward reduction of everything (brain states, body movements) to the laws of nature. This is of course a horrible theory for social sciences due to the microscopic view of things (humans are seen as a pile of atoms), and would be completely useless. But it gives a partial answer to (3).

In fact, (3) isn't even the right question to ask. It may be that the laws of nature are not mathematical, but we still have good approximations today. Perhaps there are no exact laws of social science, but we should nevertheless be able to find good approximations.

An important comparison can be made to physics: the laws of physics are not necessarily "real". Temperature is not a real concept of nature: it is the speed at which atoms bump into each other. Viewed from a microscopic perspective (imagine we could see every atom individually) there is no "heat", just movement of atoms. From a macroscopic perspective, it is the amount of such movement that causes heat in a gas. But "macroscopic perspective" is certainly a human-centric view. It may even be a natural point of view for all intelligent biological lifeforms, but it is certainly a creation of our perception, a way to simplify reality.

The same goes for many other quantities: pressure is the amount of "push" atoms have towards the enclosing surface (e.g. the bottle containing the gas). Again, it's all about the atoms and their movements. 

Why can't we define concepts like "temperature" and "pressure" in social sciences? We could; it's just that they tend not to make any non-trivial predictions. So why is that? Because human brains are not like atoms. In classical physics, atoms only have three useful properties: location, mass, and velocity. A brain needs significantly more than a couple of numbers to describe its state.

In mathematics, there is a notion of complexity as "the smallest possible computer program that describes it". This captures the intuitive notion that the string "1212121212" has lower complexity than "7387904583".
The reason is that the first string is described by a computer program that says "output 12 5 times", whereas the second says "output 7387904583". Intuitively, we need a very short program to describe the state of an atom ("location = (x,y,z), mass = m, velocity = v") but a very long and complicated description for the state of a brain.

Complexity in this sense is not the only reason. A second factor is memory: atoms are memoryless, in that what an atom did yesterday does not affect its current state. But brains have memory: they remember the past, and their behavior today may have been significantly altered by it.

A third factor is that atoms influence each other on a very basic level: they knock each other around (they can also be entangled in quantum mechanics, but then they have a relatively simple relation, e.g. opposite spins, essentially a single correlated quantity). Brains can have enormously complex influences on each other.

Once you mix plenty of atoms, you get predictable behavior. The idea that, once you mix plenty of brains, you get predictable behavior is, well, simply not true, as is made obvious to anyone who has invested in the stock market.

We don't need to move all the way from atoms to brains to see the effects of complex interactions. An example from statistics will illuminate the problem: if you want to know the behavior of throwing 100 dice, that's pretty straightforward: the sum will be approximated by a normal distribution with average 350 and standard deviation of about 17. But if you start mixing different kinds of dice (some land on 6 more often than 1, or vice versa) as well as interactions between the dice (if one gets a 4, then the others are more likely to land on the same number), then you can't say anything even about what the average is going to be. The reason you can't is that you would need to specify the probability distribution of each die and how they interact.
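As a quick check of those numbers (assuming fair, independent six-sided dice, so that the variances of the individual dice add):

\[
\mu = 100 \times \frac{1+2+\dots+6}{6} = 350,
\qquad
\sigma^2_{\mathrm{die}} = \frac{1^2+\dots+6^2}{6} - 3.5^2 = \frac{35}{12} \approx 2.92,
\qquad
\sigma_{\mathrm{sum}} = \sqrt{100 \times \tfrac{35}{12}} \approx 17.1.
\]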

The ease of solving the first case lies in one hidden assumption that is extremely useful: the outcome of each individual die does not affect the others (the dice are independent). Once you have dependency, you have two choices: if you know all the correlations between different outcomes, there's still a chance you might get a mathematical solution. If not, you're out of luck. For the sum of 100 brains (whatever that means, say voting), you're going to have B1+B2+...+B100, and that's where the math begins and ends: no useful interpretation, no useful conclusion.

It is in this sense that I think explanation (2) is the correct one: social sciences are more complex than natural sciences. It's not that it is impossible to analyze such complex situations: it may be possible with a mind that can handle extremely complicated equations and long, counter-intuitive lines of reasoning by combining a wide variety of techniques (numerical optimization, computer algebra systems, automated theorem provers, simulations, and lots of methods that don't even exist today). It's just that our brains are incapable of such herculean tasks, or we would have done it already - it's not for lack of trying that we've failed.


Why does it work so well in Natural Sciences?

Nobody knows the answer to this. We live in a world filled with science and technology (it is tempting to call it the age of science and tech, except that we'll have even more in the future). We've gotten so used to the fact that science can accurately predict the motion of planets, when a comet is going to pass by and whether it will be visible to the naked eye, that thousands of commercial airliners take off and land every day (much of it handled by autopilot), that satellites orbit our planet pinpointing our exact location, and that robots crawl on the surface of Mars.

All of this is possible because of one thing: the laws of nature are possible to approximate with mathematics. Put differently: we are capable of creating an approximate representation of the real world in our minds. We can compress reality.

We could turn the question around and instead ask: what could go wrong when trying to represent reality in our minds?

For one thing, the laws of nature could be hopelessly complex ("incompressible"), just like social sciences are. All of the laws of nature as we know them are simple, in the sense that they admit a very short description: Newton's laws, Maxwell's equations, Schrödinger's wave equation, the equations of general relativity. Now, the laws of nature are at the fundamental level of reality: there is by definition nothing "underneath" them. But why would they have short descriptions? Apparently the description is not as short as we once thought, since Newton's laws had to be replaced by general relativity and quantum mechanics. The latter two contain more equations, and the equations have longer descriptions.

It is in principle conceivable that there are no final laws of nature: the universe is not mathematical, and we only approximate it by inventing (not discovering) laws that are useful for our purposes. Science is essentially an engineering task from that point of view: all that counts is what works. That's not so bad, since it apparently works really well, but a more interesting question now is: how could it be otherwise?

It could be that the universe is actually mathematical. The laws of physics as we currently know them are not nature itself; they only approximate nature. But perhaps that's only because we haven't found the true laws yet, and we're on the right track (those are two different assumptions: mathematically correct laws of nature exist, and we will find them in the future). This would be the case if the universe worked like a computer program (perhaps because we truly live in a Turing-complete simulation). Another explanation, provided by Tegmark, is that all logically consistent theories are actualized as separate universes. One problem with this explanation, however, is that the vast majority of universes would have extremely complex logical/mathematical laws, making it impossibly unlikely for us to end up in our current universe with simple laws (assuming they ultimately turn out to be simple; perhaps they're not).

This is essentially the concern Wigner had when he wrote a paper titled "The Unreasonable Effectiveness of Mathematics in the Natural Sciences". Why does math work so well in physics?

Clearly, even if the universe is non-mathematical, there's the mystery of why mathematics approximates reality so well. An interesting thought is that perhaps it doesn't: perhaps all mathematics does is describe our understanding of the universe, as we perceive it with our limited minds. As mentioned before, temperature and pressure are not properties of the universe; they are properties of the world as we perceive it macroscopically. A hypothetical divine being that perceives quantum states directly may not have any notion of temperature, since it would not be needed: temperature is, after all, about compressing lots of information into just one number, so we can make sense of it. We keep repeating this pattern over and over: in statistics, "all information" is all the available samples. We then compress this data into mean, median, standard deviation, etc. We even replace real world data with known statistical distributions because they approximate the data well. But that is something we do to comprehend reality; the fact that it throws away information makes it perfectly obvious that it is man-made.

But those notions are not the most fundamental. At the most basic level, we have something like quantum mechanics (general relativity is not the correct fundamental theory). The rest of science (even physics) is built on top of that. The wave equation describing reality does not throw away information. The laws of physics seem reversible, so that we can freely "move" between past and future in the mathematical formulation. That makes it clear that we're no longer throwing away information, so perhaps this is something like reality itself. So even though most of science is "engineering" in the sense that it defines concepts useful to humans, it may also be the case that reality is mathematical: if we ever find unified mathematical laws that describe everything  in the universe, so there is no need for further refinements, we will know that the answer is "yes, the universe is mathematical". The question of why, however, may still remain a mystery.

Wednesday, April 17, 2013

Paradoxes and their Solutions

You've almost certainly heard some of these (logical) paradoxes. They're fun brain puzzles, and people like to share them. Surprisingly, most people who share them do not know their solution, nor try to solve them. Here are some of the most well known paradoxes and their solutions.

Note that these are paradoxes of more logical nature (i.e. "mind puzzles"), and consequently, their solutions explore logic and language.

There are also paradoxes within scientific theories, with scientific solutions, which I might write about another time.

Liar Paradox

Problem


Perhaps the most well known paradox. The clearest version of it is the following. Consider the sentence: "This sentence is false". We are wondering whether the sentence itself is true or false.

Assume the sentence is true.
Then clearly the sentence is false, a contradiction.

Now assume the sentence is false.
Then the sentence is true, also a contradiction.

It seems we cannot assign a binary truth value (true/false) to the sentence.

This paradox is sometimes given in the form: A person says that he is lying. Is he telling the truth or a lie?

Solution

There are two things that are unusual about the sentence.

First, it makes a reference to itself, by using the word "this sentence". This is known as self reference, and the statement is said to be self-referential.

Second, the sentence assumes that it can express the truth/falsehood of other statements. 

To see this, imagine that we were to express the sentence in a more formal language. It would look something like:

Sentence 1: not is_true(get_sentence(1))

In other words, we are assuming that we can get a sentence by its label ("get_sentence(1)", which is the sentence itself, since we gave it label 1) and that we have a predicate "is_true" which returns true/false accurately (we need not assume anything about the computability of these functions, only that they are definable within the language).

If you haven't studied logic you might be inclined to think that the first point is a pretty serious problem whereas the second is trivial. You'd be wrong, however, because it's the other way around: assuming we can express "is false" is a much more problematic issue than using self references.

Here's why:
Self-referential statements are everywhere. I can easily get rid of the self reference in the liar paradox by doing the following:

Sentence 1: "Sentence 2 is true"
Sentence 2: "Sentence 1 is false"

If Sentence 1 is true, then Sentence 2 is true, which in turn means that Sentence 1 is false, a contradiction.
If Sentence 1 is false, then Sentence 2 is false, which in turn means that Sentence 1 is true, also a contradiction.

You might feel like this point is moot, because all I've done is replace "this" with a reference to another sentence, which in turn references back to the first sentence. After all, decoupling the sentence doesn't solve much, since we can still express everything using get_sentence and is_true:

Sentence 1: is_true(get_sentence(2))
Sentence 2: not(is_true(get_sentence(1)))

That's a good point, except that the problem doesn't go away even if you forbid all kinds of references (to the language itself). In particular, almost any logical system can be re-encoded so that its sentences can be seen to implicitly contain self references. This is done using something called Gödel encodings, and it is roughly the equivalent of assigning every sentence over the English alphabet a unique number in a systematic way. get_sentence is easy to construct, even in languages where it was not intended to exist.

The same cannot be said for is_true: it does not in general exist. For example, arithmetic does not admit a truth predicate, so there is no general way of expressing "sentence S is true" for arithmetical sentences S within arithmetic itself. This result is known as Tarski's undefinability theorem.

You might wonder what all of this has to do with the English language, in which the paradox is presented. Well the point is that if we allow unrestricted use of natural languages (such as English), we can express pretty much anything. The problem is that many things we can express in English won't even be logical sentences, in the sense that they will not have truth values (such as the sentence under consideration). Unrestricted use of language also opens up the possibility for expressing nonsense.

What we can learn from the liar paradox is that we must clearly handicap our "language" if we wish to have a "no nonsense" language (e.g. predicate logic, modal logics, Horn clause logic). How do we handicap such languages to make sure the liar paradox doesn't appear? Gödel has shown us that there isn't much point in trying to remove self references; they occur very easily. Tarski has shown us that a truth predicate "is true" is usually not available, which is more of a comfort.

In arithmetic, you cannot express sentences such as the liar paradox, not because it is self referential, but because it asserts the truth of arithmetical sentences. Put differently, in arithmetic, is_true is not definable.

Barber's Paradox / Russell's Paradox

Problem

The barber's paradox arises from the following assumption and question: a barber shaves exactly those who do not shave themselves. Does he shave himself?

As we did for the liar paradox, we check to see which is true:

Assume the barber shaves himself.
Then he doesn't shave himself, since he only shaves those that don't shave themselves.
This is a contradiction.

Now assume the barber doesn't shave himself.
Then he shaves himself, a contradiction again.

Solution

So this sentence is neither true nor false. As mentioned in the liar paradox, that is not so much of a problem in English, because we really shouldn't expect all sentences to even make sense. There is no problem in imagining a barber that shaves everyone who doesn't shave themselves, except that he shaves himself too. Or a barber that goes to another barber. The problem exists only within the confines of language; physically, there is no way to create the contradiction.

The lesson to learn from this was a hard one historically. Russell's paradox is a more formal (mathematical) version of the paradox, which has to do with set theory. Imagine that a set is a box. We can have an empty box, or put boxes inside other boxes. So denote the empty box with { }. If we have something, X, inside a box, we'll write {X}. And if we have two things, like X and Y, we'll write {X,Y}, and so on. Assume we're allowed to create boxes by (perhaps infinitely) repeating the following:

The empty box {} is a box.
{X,Y,...} is a box if X,Y,... are all boxes.

So we know that {} is a box, and { {} } is a box (the box that contains an empty box). Then we can put that box inside a box: { {{}}}. But these are not the only ways we can make boxes. We can put all three of these boxes into one box: { {}, {{}}, {{{}}} }. If you're wondering why this is useful to do in mathematics, it's because from this you can create all numbers. For example, define:

0 = {}
1 = {0} = { {} }
2 = {0,1} = { {}, {{}} }
3 = {0,1,2} = { {}, {{}}, {{},{{}}} }
...

This all seems good, until you define the following box (set):

B is the box containing all boxes that do not contain themselves. Does it contain itself?

What is meant by a box that contains itself? It quite literally means that you can find a copy of the box inside the box. If you think that's impossible, it's because you're thinking only about finite boxes. If we have infinite nesting, such as {{{{...}}}}, then if we "box it", i.e. wrap it around one more { }, we can see that the box has a "copy of itself". So let X = {{{....}}}. Then {X} is a box containing itself, but so is {X,X}, {X,{}}, {X,{},{{}}}, and so on. So there's not just one box that contains itself, but an infinite number.

Now back to the paradox: Does B contain itself?
Assume it does. Then B is inside B, i.e. B = {B,...}. But B is the box containing exactly the boxes that do not contain themselves. Since B contains itself, B is not such a box, so B cannot be a member of B. Therefore B does not contain itself, a contradiction.
Now assume B does not contain B. Then, since B contains all boxes X that do not contain themselves, and B is precisely such a box, B contains itself, again a contradiction.

So we have: B in B if and only if B not in B, i.e. the set contains itself if and only if it does not contain itself. It's usually said to be a contradiction, but it is technically not: you can't prove "B in B" or "B not in B", only that "B in B iff B not in B". It's only a contradiction if you assume that the sentence must be true or false.

Regardless, it is an undesirable feature of a logical system to say the least.
So these simple rules to create boxes actually lead to the possibility of creating a sentence that is neither true nor false (or both). What set theorists learnt from this is that the creation of boxes (sets) must be restricted (see Von Neumann Universe for three intuitive rules that create sets without this paradox).

Technical note: Boxes are defined so that they can contain multiple copies, e.g. { {}, {} }, whereas (non-fuzzy) sets either contain or do not contain an element. Boxes are in fact multisets. None of this changes the argument however: from sets one can create multisets, and vice versa.

Zeno's Paradox

Problem

Zeno's paradox has many forms; here's one that is easy to understand: a man is going to walk 100 meters. To get there, he must first cross 50 meters. But to reach that point, in turn, he needs to cross 25 meters. It's easy to see that when we keep halving the distances, we get increasingly smaller numbers (but never exactly zero). This suggests that the man must move "infinitely much" in order to reach the 100 meters (in fact, in order to reach anywhere at all!).

Solution

The paradox itself makes some questionable assumptions about the nature of reality. More specifically, it assumes that we can divide distances into halves indefinitely. The existence of a smallest measurable length, the Planck length, certainly puts doubt on this.

But even if we accept the premise, one could ask where the problem really lies anyway.
Is it that we find it unreasonable that he can move an infinite number of times? We have already accepted that there's infinitely much space to move in, so perhaps we should also accept that he can move infinitely much (in finite time), especially since the distances are getting smaller and smaller. Mathematically, he must move: 100/2 + 100/4 + 100/8 + 100/16 + ... = 100 meters. So if we can complete the infinite series, he'll be just fine. From this point of view, Zeno's paradox really illuminates its own assumptions:

1. Space is continuous (we have infinitely many steps to move through).
2. Time is discrete (we are not allowed to pass through all infinite steps).

Modern science suggests both are discrete, in which case there's no problem: at each discrete time step, the person moves forward a finite number of discrete steps. And as shown above, if both are continuous, there isn't any problem either.

Turning the paradox against itself, we can observe that we can indeed move, so the combination of continuous space and discrete time seems incorrect as far as reality goes.

Tuesday, April 9, 2013

Physics of Projectiles: What You Always Wanted to Learn in School

Here's what you've always wanted to know about physics but were never taught in school.

In an attempt to make things more interesting, I'll first state the conclusion, and then explain why it is so.

We'll consider a firearm bullet, but the same physics can be used for kinetic energy penetrators ("tank bullets"), asteroid impacts, rail guns, and pretty much any projectile that is not propelled by an engine. This excludes rocket propelled grenades and missiles.

Fact 1: The recoil you feel when firing your gun is exactly as big as the stagger that will be exerted on your target.


This comes from a conservation law, more precisely, conservation of momentum. Momentum is effectively "push" (the shooter calls it recoil). Momentum is mass times velocity (speed with direction).
If you double the mass of the bullet, recoil is twice as big. Likewise, if you double the speed of the bullet, recoil will be twice as big.

Conservation of momentum means that the momentum carried away by the bullet as it leaves your gun is the same (in magnitude) as the recoil momentum you receive; the bullet's velocity and yours point in opposite directions:

bullet momentum = your momentum (recoil)

Which, according to what I just said, is:

(bullet mass) x (bullet velocity) = (your mass) x (your velocity)

A 9mm bullet weighs approximately 9 grams. Muzzle velocity (speed upon exiting the muzzle of the gun) is about 365 meters per second. From this, we can calculate at what speed you would move backward when firing the gun:

(your velocity) = (bullet mass) x (bullet velocity) / (your mass)

If you weigh 75 kg, this gives a speed of about 4 cm per second.
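Plugging the numbers in (the bullet's 9 grams written as 0.009 kg):

\[
v_{\mathrm{you}} = \frac{0.009\ \mathrm{kg} \times 365\ \mathrm{m/s}}{75\ \mathrm{kg}} \approx 0.044\ \mathrm{m/s} \approx 4\ \mathrm{cm/s}.
\]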

Assuming that the bullet doesn't slow down before hitting your target (which it will due to air resistance), the bullet's momentum will be the same upon impact, and again by the law of conservation, this momentum will be transferred to your target (assuming the bullet stops upon impact, otherwise only the proportion of the momentum that was lost is transferred).

Yes, this means that if your target weighs 75 kg, it is going to be pushed back by precisely the same amount as the recoil you felt.

It also means that if your favorite bulletproof superhero is hit by a round from a tank's main gun, he's gonna fly back quite the distance through the skyline. The reason a tank doesn't fly back when hit is that it weighs a lot.

Momentum is not what causes damage upon impact. It is only the "push" exerted by the projectile.

Momentum = Recoil (or Stagger)

Technically, air resistance slows down a bullet, thus staggering the target even less than your recoil.

Fact 2: Transfer of Energy is what causes damage, mainly through conversion from kinetic energy (due to movement) into shock waves and heat release.

Energy, like momentum, can be derived from Newton's laws. Like momentum, kinetic energy is dependent on mass and speed, but the relationship is different:

Kinetic Energy = 1/2 x (bullet mass) x (bullet velocity)^2

The fact that Energy depends on the square of bullet velocity (velocity times itself) means the following:
when doubling the mass of the bullet, we double the kinetic energy (= the damage caused, as we will see soon). This is just the same as for momentum. But when doubling the speed of the bullet, we get four times more energy!

If we get ten times more mass, we get ten times more energy. But, again due to the square, if we get ten times more speed, we get 100 times (=10x10) more energy!
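As an illustration, using the 9 mm figures from Fact 1, the kinetic energy works out to roughly:

\[
E = \tfrac{1}{2} m v^2 = \tfrac{1}{2} \times 0.009\ \mathrm{kg} \times (365\ \mathrm{m/s})^2 \approx 600\ \mathrm{J}.
\]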

There is a law of conservation of energy. It is very fundamental, and it means that, assuming the bullet stops upon impact, all of the energy from the bullet will be delivered to the target. In practice, this means that the kinetic energy ("movement energy") from the bullet is entirely converted into shock waves and heat.

"Energy = Damage".

Technically, it's not really true that energy = damage, but rather (energy per time unit) = damage. If you convert nuclear fuel into electricity slowly, that's good, because it gives us heat in our homes. If you instead release all that energy fast (a nuclear detonation), then it's destructive. The difference is that in one case we release the energy over a long period of time, whereas in the other we release a similar amount of energy over a very short period of time. However, all projectiles cause damage very fast, so this is not really an issue with bullets (as the example above makes clear, it does however matter when it comes to explosions).

Power = Energy per time = Damage.

Here's an interesting question: if the bullet's momentum is transferred to the shooter when firing, why isn't the energy also transferred to the shooter? Part of the answer is that, while the momenta are equal in magnitude, kinetic energy for a given momentum p is p^2 / (2 x mass), so the much heavier shooter receives only a tiny fraction of the energy. And of course the energy doesn't come from nowhere: bullets are fired using a modern day version of gun powder, and the energy needed to speed up the bullet is taken from that powder (chemically stored energy).

Speeding up a bullet is thus done by converting energy stored inside the cartridge into kinetic energy. You still feel recoil, however, because the bullet has mass (weight). It would in principle be possible to avoid that recoil by firing another bullet backwards at the same time, thus preserving momentum, but this is not done for obvious reasons. When (if?) we develop handheld high energy firearms (e.g. lasers), we may get (almost entirely) recoil-free weapons.

Fact 3: Firearms bullets are small and fast because that minimizes recoil while maximizing damage output.


But for now, we don't have high energy weapons, so we have an optimization problem (constraint problem) to solve:
Minimize recoil.
Maximize damage.

Which, after what we've said:
Minimize (bullet mass) x (bullet velocity) [Momentum]
Maximize (bullet mass) x (bullet velocity)^2 [Energy]

The solution should be obvious: since increasing velocity increases damage much faster than increasing mass does, we should (see the comparison below):
Minimize bullet mass.
Maximize bullet velocity.
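To see why in formulas, compare a hypothetical bullet of half the mass fired at twice the velocity: the recoil (momentum) is unchanged, but the energy delivered doubles:

\[
\frac{m}{2} \times 2v = m v,
\qquad
\frac{1}{2} \times \frac{m}{2} \times (2v)^2 = m v^2 = 2 \times \left(\tfrac{1}{2} m v^2\right).
\]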

There's one catch however: a bullet that travels too fast can go right through the target, thus not delivering all of its kinetic energy to the target, and risking wounding bystanders. When hunting large game, this is not usually an issue: a moose, being very thick, is likely to absorb all the kinetic energy. Some police forces use hollow point bullets (see below), which decreases the risk of firing through a target.

Ways to increase bullet mass include using larger calibers (and longer bullets) and very dense materials (such as tungsten or depleted uranium, not used in firearms). Increasing velocity includes using full metal jacket (pointy) bullets and rifling. More on these below.

Fact 4: Large Caliber = Bigger Hole = More Damage.

Sir Isaac Newton took us pretty far down the road, but there's more to the story. As you may know, caliber seems to be the single most mentioned aspect when it comes to firearms. In part this is because caliber constrains mass, but that's not all. A bigger hole causes more bleeding, which is obviously more damaging to organic targets. If you're in the forest facing a bear, the difference between a 9 mm bullet and a 12 gauge shotgun slug makes the difference between you being alive and dead.

Moreover, a larger caliber means more absorption of the kinetic energy upon impact, which in turn means a greater likelihood of not passing through (recall that if the bullet doesn't stop, not all energy will be delivered), and also that the time for converting kinetic energy into damage will be shorter: recall that damage = energy per second, so to maximize damage we need to transfer the energy as fast as possible.

Fact 5: Hollow Point bullets = More Damage. Pointy bullets = More Accuracy.

You've most likely seen pointy bullets in movies. Those are "standard", but not necessarily the best. Pointy means they penetrate the air better (i.e. they are less affected by air resistance). The downside is that they can be too good at penetrating, going right through the target, thus not delivering all of their energy.

One solution is to use non-pointy bullets (i.e. bullets that are "hollow", so that they easily get stuck in solids). This quickens the energy transfer (just as for large calibers above, but even more dramatically so) and makes it more likely that the bullet will get stuck, but it also decreases accuracy, as the bullet flies through the air less well.

A bullet type that attempts to take the best from both worlds is the plastic-tipped bullet: essentially a hollow point bullet with a plastic tip that easily falls off upon impact.

Fact 6: Rifling = More Accuracy.

The aerodynamics of a bullet can be significantly improved by imparting a spin on the bullet. This is due to the fact that bullets travel through air (sometimes water), and air is thick compared to the resistance free vacuum. Air resistance slows down the bullet, which means less kinetic energy (indeed, the energy is absorbed by the air). By spinning the bullet, it "cuts" through the air, much like a drill spins to penetrate wood or metal. Imparting spin on a bullet is achieved by simply rifling the barrel's inside.


Monday, April 1, 2013

Discoveries that Killed Entire Philosophies

EDIT: Just realized it's 1st of April today. Had I known that before I wrote this post, I may actually have tried to come up with some clever trick. However, that didn't happen: this post is completely serious, no hidden jokes or traps.

Every belief - no matter how innocent looking - implies certain conclusions. Put differently, every belief tells us something about what to expect (or not expect) in a certain situation. This makes it possible to make predictions based on beliefs, that is, to experimentally test our beliefs. This lies at the heart of the scientific method.

Yesterday's philosophy is today's science. Physics used to be called "natural philosophy". Cosmology has traditionally been considered part of religion and philosophy. Today they are quantitative science, making accurate predictions down to decimals. This transition occurred because we humans have historically underestimated how much we would one day be able to discover by rational means.

Here's some big philosophical beliefs that have been shattered by scientific discoveries.

Clockwork Universe - A Predetermined World

Birth: 1687 (Newton)
Death: 1905-1930 (Einstein, Planck, Bohr)
History:
After Newton discovered Newtonian Mechanics in 1687, it became clear from his equations that any future state of the universe could in principle be predicted from the current state. This is best explained by Laplace in 1814:

"We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes."

A clockwork universe is thus characterized by determinism.
Einstein's discovery of Special Relativity in 1905 cast doubt on this world view, as it coupled space and time, and the discovery of Quantum Mechanics finally destroyed it, as it became clear that, on the atomic scale, the best we can do is give probabilities for the location / momentum of particles. A fundamentally non-deterministic universe had to be accepted.

The implications of Quantum Mechanics (QM) are still a matter of ongoing research (and will be until all but one interpretation of QM has been eliminated). Bell's theorem demonstrates that reality cannot be both local and described by pre-existing hidden values: the remaining options include a multiverse (all possible outcomes happen in separate branches) and non-locality (distant measurements are correlated in a way no local mechanism can explain, although this cannot be used to transmit information).

Divine Origin - The Gift of Life

Birth: 500 B.C. (Empedocles)
Death: 1859 (Darwin)
History:
The idea that man was created in his full form by divine power is very old. It can be traced back to the ancient Greek philosopher Empedocles, although the idea most likely existed long before that. The beginning of the end for such a world view came with Darwin's discovery that species evolve into other species over time, as they adapt to their environment. Man, like all other species, thus developed from some earlier "seed of life".

Darwin also destroyed another idea: the belief that physical traits acquired during an organism's life are passed on to its offspring. Lamarckian evolution is a theory in which giraffes grow longer necks because they keep trying to reach higher fruits (they stretch out their necks) and pass the stretched necks on. Darwin made this idea obsolete by correctly explaining that giraffes with necks too short simply die out, so that long-necked giraffes become more common in the next generation.

Today, evolution can be observed directly in microorganisms (as their evolution is much faster than that of larger species). The mechanism by which the information gets passed on was discovered to be the double-helix DNA strands. The decoding of DNA information (genotypes) into observable traits (phenotypes) is ongoing research.

Formalism - A Dream of Mechanization

Birth: 1920 (Hilbert)
Death: 1931 (Gödel)
History:
At a time when mathematicians rose to the challenge of mechanizing all of mathematical reasoning (and, in doing so, founded what is today known as Artificial Intelligence), Hilbert had a vision: a system of rules to go from one fact to the next, thus discovering all truths. The end result? A machine that, given some input such as "does this equation have any integer solutions?", would output "yes" or "no".

This dream came to an abrupt end in 1931 when Gödel proved two things: first, that any machine of the kind Hilbert envisioned would never be able to answer some questions, and, second, that such a machine cannot reliably answer the question "Is every answer you give me correct?". These are Gödel's famous incompleteness theorems.

One may then ask an interesting question: if the machine cannot answer some question, what would happen? We can imagine that in the best case, the machine would answer "Don't Know". So for every question, we have three possible answers: "Yes", "No", and "Don't Know". Church and Turing later proved that even this is not possible: in the "Don't Know" case, the machine will "think" forever (this is the undecidability of the Entscheidungsproblem, proved by Church and Turing).

Static Universe - Eternal Existence

Birth: 1917 (Einstein)
Death: 1929 (Hubble)
History:
The idea of a universe that has always existed, and will always exist, is probably very old, but Einstein made it explicit when he introduced the so-called "Cosmological Constant" into his equations to force the universe to be static in size (non-shrinking and non-growing). This made for great disappointment in 1929, when Hubble observed that everything in space is moving away from us - the universe is expanding!

If we reverse the expansion of the universe, we see that it all seems to have begun at a single point; Hubble's discovery thus paved the way for the Big Bang theory.

It was initially thought that the expansion of the universe would be slowing down, due to the gravitational pull of all the matter in it (including dark matter). This left open the possibility that the universe would one day stop expanding and start contracting back into the big bang. Later discoveries showed the opposite: the expansion of the universe is speeding up all the time. The reason for this is completely unknown, and is called "dark energy" (or sometimes the "cosmological constant", although it need not be constant). What is clear, however, is that the universe will thus continue expanding forever, implying higher entropy / lower temperature / lower energy per cubic meter of space. In short, the universe is dying.