Thursday, April 19, 2012

SCARY iterators and multimap alternative

I don't find std::multimap particularly useful, mostly because I can't order the duplicates however I want. There has been a proposal to order duplicates according to insertion order (like push_back), but this wouldn't be enough for many needs I have. Instead, I find myself rewriting a particular intance of multimap using something like std::map<Key, std::vector<T>>. Here, I'll show how to write a multimap replacement container.

The class has its own iterators. Every class that can should have them, as the generic algorithms make code vastly more elegant. I'll also take the opportunity to discuss SCARY iterators with code, since Microsoft introduced them in MSVC 11 beta (GCC already has them in libstdc++).

First, we define the multimap class itself, called mmap (for lack of imagination):

template <
class Key,
class T,
class Compare = std::less<Key>,
class Vector = std::list<T>,
class MapAllocator = typename std::map<Key,Vector,Compare>::allocator_type
>
class mmap {
typedef Vector vec_type;
typedef std::map<Key,Vector,Compare,MapAllocator> map_type;


The first three template arguments are the same as for a std::multimap.
The third is the underlying vector to be used to store all duplicates, which can be a vector, list, queue, and perhaps something more fancy (the restrictions will become obvious as we implement member functions). Perhaps a bit surprising, I have chosen list instead of vector as the default, since it guarantees that no iterator is invalidated when inserting elements, something that will carry over to mmap (erasing won't with equal_range, unfortunately).

We also need some rather mechanical work to get the proper typedefs:

 typedef Key key_type;
typedef T mapped_type;
typedef typename std::pair<const Key,T> value_type;
typedef typename map_type::size_type size_type;
typedef Compare key_compare;
typedef MapAllocator allocator_type;
typedef value_type& reference;
typedef const value_type& const_reference;
typedef typename map_type::pointer pointer;
typedef typename map_type::const_pointer const_pointer;


Here's where things get interesting, we want iterators too. Now, the iterator can either be defined as a nested class, or a separate one. What does the iterator need to contain? It needs to contain two map_type::iterator: one for the current key, and one for the end of the map. Our iterators also need to reference a certain vector element, so we need one vec_type::iterator. We don't need any vec_type::iterator that points to the end of the vector, since this is easily retrieved by the current map_type::iterator.

Clearly, our iterator class depends on the template arguments Key and T, but also on the Vector (since we need a vec_type::iterator). So we have two options here:

Either we define it as a nested class:
class mmap_iterator {
But this would mean that mmap_iterator implicitly has the same template arguments as mmap, so that the template instantiations mmap<int,double> and mmap<int,double,custom_compare> would yield two difference mmap_iterator classes. As we saw, the iterator doesn't need the comparator to be implemented, so this would seem like unnecessary bloat (modern C++ compilers won't generate dead code for member functions of mmap_iterator that we never use, but this doesn't help us much since generic algorithms include pretty much everything our iterators will be able to do anyway).

A better way would be to write our mmap_iterator class separately from mmap, and conveniently type define it as an iterator:

template <class Key, class T, class Vector> class mmap_iterator; // before class mmap
typedef mmap_iterator<Key,T,Vector> iterator; // inside class mmap


And then define it outside. This seems like a great idea, since our instances mmap<int,int> and mmap<int,int,custom_compare> can now use the same iterator class definition, namely mmap_iterator<int,double,std::list<double>>. This technique, of making iterators independent of template arguments that it doesn't need (in particular, STL comparators and allocators), is known as SCARY iterators. Our mmap class allows the flexibility of using different Vector types (std::list is the default), but this provides no overhead as long as we use the same Vector type everywhere.

Let us then define the mmap_iterator class after the mmap class definition:

template <class Key, class T, class Vector>
class mmap_iterator {
public:
// Friend declaration
template <class Key, class T, class Compare, class Vector, class MapAllocator> friend class mmap;
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef const T& const_reference;
typedef const T* const_pointer;
typedef std::bidirectional_iterator_tag iterator_category;
// Note: use default comparator and allocator:
typedef typename mmap<Key,T,std::less<Key>,Vector>::map_type::iterator iterator;
typedef typename mmap<Key,T,std::less<Key>,Vector>::range_iterator r_iterator;
typedef typename mmap<Key,T,std::less<Key>,Vector>::map_type::difference_type difference_type;


This is mostly necessary typedefs for generic algorithms, and a friend declaration so that mmap can access the private members of mmap_iterator. The data member part of mmap_iterator looks like this:

protected:
iterator cbeg,cend; // current key, end of map
r_iterator ri; // current element
};


We can then construct a mmap_iterator by supplying the arguments:



// Constructors
mmap_iterator(iterator i, iterator e, r_iterator j) : cbeg(i), cend(e), ri(j) {}


It would be convenient to have a shorthand for end iterators, so we provide such a constructor too:

mmap_iterator(iterator e) : cbeg(e), cend(e), ri(???) {}


Here, e is intended to be the end iterator, which will then be both the current pointer (cbeg), the end pointer (cend). But what will the "range iterator" j be, that is, the iterator that points into the elements of the Vector? After all, there is no such vector now, since cbeg == cend.

One way would be to simply not assign ri to anything, but this formally leads to undefined behavior when we perform assignments (MSVC 10 in debug mode will complain). There's also an efficiency reason not to do this, which I'll mention when relevant.

So instead we declare an empty dummy vector, which we will always point to when map is at the end:

// Data members of mmap_iterator
protected:
iterator cbeg,cend;
r_iterator ri;
static Vector dummy_vec; // static, always empty. ri points to dummy_end() when cbeg == cend
};

template <class Key, class T, class Vector>
Vector mmap_iterator<Key,T,Vector>::dummy_vec = Vector(); // just an empty vector


And back to our mmap_iterator for end():

mmap_iterator(iterator e) : cbeg(e), cend(e), ri(dummy_vec.end()) {}


We are now ready to define begin() and end(). We've pretty much already covered end(), so let's define it:

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::end() -> iterator
{
return iterator(m.end()); // use one argument version

}

And here's begin():

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::begin() -> iterator
{
auto e = m.begin();
if (e != m.end()) {
return iterator(m.begin(),m.end(),m.begin()->second.begin());
} else {
return end(); // empty map
}
}


The iterator is created by initalizing the three data members we just discussed: the "pointer" to the first key m.begin(), the pointer to the end of keys m.end(), and the first element of the Vector (note that the vector itself is m.begin()->second): m.begin()->second.begin().

Back to the iterator class, we need to traverse it:

// Iteration
mmap_iterator& operator++() {
if (++ri == cbeg->second.end()) { // move to next list element
if (++cbeg == cend) { // move to next group
ri = dummy_vec.end();
} else {
ri = cbeg->second.begin();
}
}
return *this;
}
mmap_iterator operator++(int) { auto it = *this; ++(*this); return it; }


The code for traversing backwards is even easier, because we never reach the end:

mmap_iterator& operator--() {
if (cbeg == cend || ri == cbeg->second.begin()) {
--cbeg; // move to previous group
ri = --(cbeg->second.end());
} else {
--ri; // move to previous list element
}
return *this;
}
mmap_iterator operator--(int) { auto it = *this; --(*this); return it; }


So far so good, but we also need to compare iterators. The correct solution should be obvious:

// Comparisons

bool operator==(const mmap_iterator& it) const { return cbeg == it.cbeg && ri == it.ri; }
bool operator!=(const mmap_iterator& it) const { return !(*this == it); }


Note that if cbeg == it.cbeg, ri and it.ri always reference the same vector, the entire comparison is well defined due to short circuiting (we never reach ri == it.ri if cbeg != it.cbeg, which is important).

Note that when cbeg == it.cbeg == cend (don't write that in real code), ri and it.ri both point to the end of the dummy vector. However, had I not defined the dummy vector, and assuming there's no problem in using uninitialized iterators/variables, these comparisons would have needed modification, since ri and it.ri are uninitialized and could hence point to different things. The correct version would then be:

// Comparisons (bad way, without dummy vector)
bool operator==(const mmap_iterator& it) const
{
return cbeg == it.cbeg && (cbeg == cend || ri == it.ri); // Less efficient!
}

bool operator!=(const mmap_iterator& it) const { return !(*this == it); }

Obviously less efficient, and inelegant too.

Continuing, dereferencing is easy:

 // Dereferencing
reference operator*() { return *ri; }
pointer operator->() { return &(*ri); }

We can now define some useful members of mmap. For example, to push_front a Key/Value and get back the inserted element:

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::push_front(const value_type& p) -> iterator
{
auto at = m.find(p.first);
if (at == m.end()) {
// First entry of this kind
at = m.insert(map_type::value_type(p.first,Vector(1,p.second))).first;
} else {
// Entries already exist, push to front
at->second.push_front(p.second);
}
return iterator(at, m.end(), at->second.begin());
}


And rvalue version wouldn't hurt:

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::push_front(value_type&& p) -> iterator
{
auto at = m.find(p.first);
if (at == m.end()) {
// First entry of this kind
Vector v;
v.push_back(std::move(p.second));
at = m.insert(map_type::value_type(std::move(p.first),std::move(v))).first;
} else {
// Entries already exist, push to front
at->second.push_front(std::move(p.second));
}
return iterator(at, m.end(), at->second.begin());
}


push_back is similar:

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::push_back(const value_type& p) -> iterator
{
auto at = m.find(p.first);
if (at == m.end()) {
// First entry of this kind
at = m.insert(map_type::value_type(p.first,Vector(1,p.second))).first;
return iterator(at, m.end(), at->second.begin());
} else {
// Entries already exist, push back
at->second.push_back(p.second);
return iterator(at, m.end(), --at->second.end());
}
}


equal_range is a bit trickier, since the "end" iterator will be the first element of the next key:

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::equal_range(const Key& k) -> std::pair<iterator,iterator>
{
auto at = m.find(k);
if (at != m.end()) {
auto next = std::next(at);
if (next != m.end()) { // "default" case: key found, next element exists
return make_pair(
iterator(at,m.end(),at->second.begin()),
iterator(next,m.end(),next->second.begin()));
} else { // key found, but there is no next element
return make_pair(
iterator(at,m.end(),at->second.begin()),
end());
}
} else { // key not found
return make_pair(
iterator(end()),
iterator(end()));
}
}


Erase works as you think, but it also has to remove the Vector if it was a singleton:

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::erase(iterator it) -> iterator
{
auto next = std::next(it);
auto& vref = it.cbeg->second;
vref.erase(it.ri);
if (vref.empty()) {
m.erase(it.cbeg);
}
return next;
}


Calculating the sizes can be done by taking std::distance(begin(),end()), but a more efficient way would be to exploit the fact that Vector already knows its size (list::size is O(1) in C++11):

template <class Key, class T, class Compare, class Vector, class MapAllocator>
auto mmap<Key,T,Compare,Vector,MapAllocator>::size() const -> size_type
{
return std::accumulate(m.begin(),m.end(),size_type(0),
[](size_type a, const map_type::value_type& p){ return a + p.second.size(); });
}



We can now test the mmap implementation:

#include "mmap.h"
#include <iostream>
#include <string>
#include <algorithm>

int main()
{
typedef mmap<std::string,std::string> mymap;
mymap m;
auto print = [&](mymap& mm){ // should be const mymap&, but we haven't defined const iterators
std::cout << "Content of map (size: " << mm.size() << "):\n";
for (auto i = mm.begin(); i != mm.end(); ++i) {
std::cout << *i << "\n";
}
std::cout << "\n";
};

m.push_back(mymap::value_type("J","Josef"));
m.push_back(mymap::value_type("A","Anna"));
m.push_back(mymap::value_type("J","Jens"));

print(m);

m.push_front(mymap::value_type("A","Anita"));
m.push_front(mymap::value_type("J","John"));

print(m);

auto rng = m.equal_range("A");
std::cout << "All entries in A:\n";
std::for_each(rng.first,rng.second,[&](const mymap::mapped_type& s){ std::cout << s << "\n"; });
std::cout << "\n";

rng = m.equal_range("J");
std::cout << "All entries in J:\n";
std::for_each(rng.first,rng.second,[&](const mymap::mapped_type& s){ std::cout << s << "\n"; });
std::cout << "\n";

// Erase all names containing 'n'
for (auto i = m.begin(); i != m.end(); ) {
if (i->find('h') == std::string::npos) {
i = m.erase(i);
} else {
++i;
}
}

std::cout << "After erasing all names without 'h':\n";
print(m);

m.clear();

std::cout << "After clearing\n";
print(m);
}


This prints:

Content of map (size: 3):
Anna
Josef
Jens

Content of map (size: 5):
Anita
Anna
John
Josef
Jens

All entries in A:
Anita
Anna

All entries in J:
John
Josef
Jens

After erasing all names without 'h':
Content of map (size: 1):
John

After clearing
Content of map (size: 0):

Now let us return to SCARY iterators. My implementation of mmap_iterator was independent of the comparator (template argument Compare) and map allocator (template argument MapAllocator). (The Vector allocator is implicitly defined in Vector, so there was no need to provide it separately.)

Now, in the test suite I use a mmap<std::string,std::string>, so it uses the default arguments for Compare and MapAllocator. But what if I use a mmap<std::string,std::string,my_compare>? My code will still use the same mmap_iterator instance: mmap_iterator<std::string,std::string,std::list<std::string>>.

Notably, my mmap_iterator has the following type definitions:
typedef typename mmap<Key,T,std::less<Key>,Vector>::map_type::iterator iterator; typedef typename mmap<Key,T,std::less<Key>,Vector>::range_iterator r_iterator;

Put differently, I'm relying on the default comparator and allocator for cbeg, cend, and ri.
If I now use mmap<std::string,std::string,my_compare>, things will not go smoothly, as cbeg is of type mmap<std::string,std::string,defaults here...>, but, since MSVC 10 doesn't support comparator independence in iterators (SCARY iterators), it generates code which looks like an attempt at illegal type conversion due to the perceived difference between using std::less<Key> and my_compare.

Let's try it:

// in main.cpp

// No trivial lambda to pointer conversion in MSVC 10, need to define functor outside
struct my_compare {
bool operator()(const std::string& s, const std::string& t) {
return std::lexicographical_compare(s.rbegin(),s.rend(),t.rbegin(),t.rend());
}
};

int main()
{
mmap<std::string,std::string,my_compare> rm;
std::for_each(rm.begin(),rm.end(),[&](const std::string& s){ std::cout << s << "\n"; });
}

Trying to compile, as expected we get:
error C2665: 'mmap_iterator<Key,T,Vector>::mmap_iterator' : none of the 3 overloads could convert all the argument types
(and a lot more of the barely comprehensible template output)

My point here is the following: there is a proposal to make SCARY iterators part of the standard, so that my code above is standard compliant. It is not now, because I'm assuming that the underlying implementation of iterators (in particular, std::map and Vector's) satisfy the "SCARY condition". MSVC 11 will implement SCARY iterators, so the code will be fine.

But look at what happens if SCARY iterators do NOT get standardized: users of C++, like me, who want to wrap some composition of STL containers into a nice class and provide iterators for access to generic algorithms, won't have the chance to provide an efficient implementation using SCARY iterators, unless the STL itself does so. Of course, one could always implement my mmap from scratch, but that's not really what C++ programming is about. The containers are meant to be used; they increase productivity.

Why not help the user increase performance and reduce bloat too, then?
I hope SCARY iterators will be standardized.

Wednesday, April 11, 2012

new/delete vs malloc/free for POD types

C++ typically allocates memory using the new/delete pair, whereas C uses malloc/free.
New/delete is necessarily when dealing with non-POD types, as the constructor/destructor needs to be invoked. However, this is not necessary on POD types, so the question I was having is: How well does C++ live up to its philosophy of "you don't pay for what you don't use" here?

This of course depends on the compiler, and I'm testing it for MSVC 10.

The first benchmark simply allocates and deallocates 100 million ints using new/delete vs malloc/free.

We also check whether the default constructor is invoked (which would effectively zero the allocated value).
Here's the result on my machine (time in seconds):

malloc/free:

Execution time [s]: 6.49
zero: 0
nonzero: 100000000

new/delete:
Execution time [s]: 6.876
zero: 0
nonzero: 100000000

The difference is rather small: malloc is about 6% faster.
It turns out that my previously made assumption of new/delete calling a default constructor for POD types (in this case, int(), to zero the value) is false, as the elements are clearly non-zero after allocation. Thanks to Antti Tuppurainen for actually looking at the assembly output to demonstrate this.

However, the fact remains that new/delete is slower than malloc/free on PODs.

I can only guess that the overhead lies in the definition of new, which requires std::get_new_handler to be called upon allocation failure until the handler fails too. Then std::bad_alloc is thrown. Due to it's support for mixing C++ with managed code, MSVC 10 has a slight overhead in entering a try block (and having a throw statement, I think), so this may explain the slight penalty. (It would be informative if someone could run the same benchmark with other compilers, e.g. GCC, which does not have an overhead when entering a try block).

On a related note, there's a subtle difference between C++98 and C++03: the latter allows you to decide whether a POD should be zeroed out when calling new or not:

struct pod {
int i;
double d;
};

pod* p = new pod; // does not zero-initialize members
pod* q = new pod(); // zero initializes members



I've tried the benchmarks with and without zero initialization:
malloc/free:

Execution time [s]: 6.483
zero: 0
nonzero: 100000000

new/delete:
Execution time [s]: 7.026
zero: 0
nonzero: 100000000

new/delete is about 8% slower here. As you can see, this is not due to zero-initialization, which is correctly handled. With zero initialization, things get a little worse:

new/delete with zero initialization (i.e. new pod() instead of new pod)

Execution time [s]: 7.543
zero: 100000000
nonzero: 0

I'll still use new/delete though: most sensible objects have constructors, because they can conveniently be initialized at the point of declaration.

It should be noted that in C++11, some structs with constructors can still be POD. It basically boils down to "if there's not C++ magic involving hidden data being inserted into the object, then it can be made POD". It would be nice to have C++ perform as well as C in dynamic memory allocation for POD types, but is there such an implementation, given that C++ needs proper exception handling?

I hope someone will do this benchmark for GCC and Intel C++.

Sunday, April 8, 2012

Wine Tasting and the Probability of Murphy's Law

When I was still in secondary school, I encountered a situation in which a person was at a wine competition - the kind where you are supposed to guess which wine is which, as opposed to ranking them based on taste - and failed to recognize all of the four wines.

It works like this: in front of you, you see four bottles and four glasses of wine. You don't know which glass of wine belongs to which bottle, only that they are poured from all four bottles. Your job is to drink the wine glasses and guess which wine is which, by assigning each wine to a unique bottle.

Assuming you don't know anything about wines, what is the probability that you will get all guesses wrong?
In other words: what's the worst case probability of complete failure?

I solved the problem with a friend by drawing a probability tree for conditional probabilities, and came to the conclusion that the probability is surprisingly high: 37.5% to be exact. One way to see this is as follow:
Assume the bottles have labels 1, 2, 3, 4. You now need to put them in the right order, which we may simply assume to be 1234 (of course, the wine taster doesn't know that, and in practice labels are names, not numbers).

To make things a bit simpler, consider the case where we have 3 bottles of wine:
Here's all the possible 3!=6 permutations:
123, 132, 213, 231, 312, 321.
Of these, here's the ones that are completely deranged, that is, every label has been put at the wrong place:
231, 312.
So we have 2 derangements out of 6 possible permutations, and the probability of getting them all wrong is 2/6 = 33%.

This approach works for solving our case with 4 bottles, but it does not generalize to more bottles: for every situation, we need to recompute all permutations, and then find all derangements by trying them all. Except for being extremely tedious (computationally expensive), it does not give much clarity into the problem.

For example, if our wine competition has 10 bottles, what's the probability of guessing them all wrong?
There are 10!, which is more than 3.6 million permutations, of which we need to manually check whether it's deranged. That's gonna take a while, even with a computer program.

Besides from being inefficient and uninsightful, there's a third problem with this approach: it does not tell us what happens when the number of bottles increase to large numbers. Is it almost surely the case that we are doomed to fail? Or succeed? With 3 bottles, our probability of complete failure is 33%. With 4 bottles, it is 37.5%. So you might think that it keeps increasing up to 100%, that is, almost surely complete failure.

Here's a solution to the general problem which is pretty straight forward. It does not assume you know about the inclusion/exclusion principle (which will otherwise give a short and elegant solution too); instead it produces a recursive formula which is then simplified.

Let's call the number of derangements, given k bottles, d(k). If we have only one bottle, we can't guess it wrong, so d(1) = 0. If we have two bottles, there's one way to guess it right (12) and one to guess it wrong (21), so d(2) = 1.

Now, consider the following case: for our first choice, we drink wine 1 and guess it belongs to the bottle numbered c. There are k bottles, but we must get it all wrong, so we can't assign c = 1. Hence we have k-1 choices. Now, it doesn't matter in which order we drink and guess the wines (we just need to get them all wrong), so let's say we drink glass c next.

We will now break it down into two cases, so we have the following formula so far:

$d(k) = (k-1)(simpleCase + otherCases)$

The simplest case is if we guess that glass c belongs to bottle 1, so that the two errors cancel each other out. This observation is illuminating, because it shows what derangements are all about: once we get one bottle wrong, we necessarily get another one wrong too, since we are mixing up at least two bottles. For example, if you guess that wine 1 belongs to bottle 2, then not only is that a mistake, but you have no chance of guessing that wine 2 belongs to bottle 2, since you've already picked bottle 2.
In the simplest case, you are guessing that wine 2 belongs to bottle 1, so that you swapped (mixed up) bottle 1 and 2. Now, none of the other bottles are affected by this mistake, so after having mixed up two bottles, you end up in a situation identical to what you had in the beginning, except there's now k-2 bottles left:

$d(k) = (k-1)(d(k-2) + otherCases)$

In the other cases, we did not swap two bottles. Instead, what should have been the first bottle became bottle c, and wine c is paired with any other bottle but the first one. We are thus propagating our error forward. What's interesting here is the following: we have already chosen the wrong bottle for the first wine (bottle c instead of 1). If wine c is not going to be paired with the first bottle, there's precisely one option which c cannot have: c itself (since we would not have a derangement then). c can go with any other bottle, as long as it is not c (and we already know it is not 1, since that belongs to the simple case).
But this is precisely true of every other wine: we can select any bottle for them, except the correct one. So we have the same situation as in the beginning, except with k-1 bottles left:

$d(k) = (k-1)(d(k-2) + d(k-1))$

This is a recursive formula that allows us to compute the number of derangements for any number of bottles in a much faster way that enumeration. Since d(1)=0 and d(2)=1, we can retrieve our previous result for 3 bottles in the following way:

$d(3) = 2(d(1) + d(2)) = 2$

And for my high school problem:

$d(4) = 3(d(2)+d(3)) = 9$

The probability of getting them all wrong is  $d(4)/4! = 0.375$.

This formula is nice, but still quite slow: to get the number of derangements for a certain number of bottles k, we need to compute all previous number of derangements. A better way would be to get rid of the recursion entirely, and we can do this. Rewrite

$d(k) = (k-1)(d(k-2) + d(k-1))$

into the equivalent

$d(k)-kd(k-1) = -\big ( d(k-1)-(k-1)d(k-2) \big )$

Now, if we set the left hand side to a function f(k), we see that:

$f(k) = -f(k-1)$

Since  $f(2) = d(2)-2d(1) = 1$,  we get an alternation between 1 and -1. This can be written as

$f(k) = (-1)^k$

So we now have the simplified formula:

$d(k)-kd(k-1) = (-1)^k$

This relation is a lot nicer than the previous one, although it is still recursive. To solve it, we note that the left hand side would be a telescope sum, if it wasn't for the factor k. This annoyance is easily remedied by dividing everything by k!:

$\frac{d(k)}{k!}-\frac{d(k-1)}{(k-1)!} = \frac{(-1)^k}{k!}$

We can now sum both sides from k = 2 to n (or equivalently, use a characteristic function):

$\sum_2^n \left( \frac{d(k)}{k!}-\frac{d(k-1)}{(k-1)!}\right) = \frac{d(n)}{n!}-\frac{d(1)}{1!} = \sum_2^n \frac{(-1)^k}{k!}$

Since d(1)=0 and the terms for k=0 and k=1 cancel out, we arrive at an elegant solution:
$\frac{d(n)}{n!} = \sum_0^n \frac{(-1)^k}{k!}$

This is a non-recursive formula that computes the derangements (or probability of getting them, depending on which side we put n!) efficiently. Moreover, we note that the right hand side is the partial sums of the maclaurin expansion of 1/e. So as n tends to infinity (that is, as the number of wine bottles grows to large numbers), we see that

$\frac{d(n)}{n!} \rightarrow \frac{1}{e} \approx 0.37$

In words: as the number of bottles grows, our risk of complete failure approaches 37%.

This is a good solution to the problem, but we can in fact do even better than this. The formula for d(n) is non-recursive but it still needs to sum across all k=0,...,n and compute factorials. Going back to the maclaurin expansion, we can estimate the error bound using the Lagrange remainder:

$n!\left|\frac{1}{e}-\sum_0^n \frac{(-1)^k}{k!}\right| \leq n! \frac{e^\epsilon}{(n+1)!} \leq \frac{1}{n+1} \leq 0.5$

The error between using the sum and simply taking n!/e is less than one half, so we can compute the derangements by simply rounding to the nearest integer:
$d(n) = round\left(\frac{n!}{e}\right)$
This shows that the number of derangements is always close to a third (= 1/e) of the number of permutations.

For example:

$d(4) = round(8.8\ldots) = 9$, and

$\frac{d(4)}{4!} = 9/24 = 0.375$

This is the original problem with 4 bottles: the risk of being really embarrassed is 37.5%.

If we have 10 bottles, the number of ways to get everything wrong is:

$d(10) = round(10!/e) = round(1334960.92) = 1334961$

And the probability of total embarrassment is:

$\frac{d(10)}{10!} = 1334961/10! \approx 0.368$

As expected, the probabilities are all quite close to the limit of 37%.

It would seem then, the probability of everything going completely wrong is an almost constant 37%. No wonder it has a name: Murphy's Law.