Wednesday, April 11, 2012

new/delete vs malloc/free for POD types

C++ typically allocates memory using the new/delete pair, whereas C uses malloc/free.
New/delete is necessary when dealing with non-POD types, as the constructor and destructor need to be invoked. This is not necessary for POD types, however, so the question I had is: how well does C++ live up to its philosophy of "you don't pay for what you don't use" here?

This of course depends on the compiler, and I'm testing it for MSVC 10.

The first benchmark simply allocates and deallocates 100 million ints using new/delete vs malloc/free.

We also check whether the default constructor is invoked (which would effectively zero the allocated value).
Here's the result on my machine (time in seconds):

malloc/free:

Execution time [s]: 6.49
zero: 0
nonzero: 100000000

new/delete:
Execution time [s]: 6.876
zero: 0
nonzero: 100000000

The difference is rather small: malloc is about 6% faster.
It turns out that my earlier assumption, that new/delete calls a default constructor for POD types (in this case int(), to zero the value), is false, as the elements are clearly non-zero after allocation. Thanks to Antti Tuppurainen for actually looking at the assembly output to demonstrate this.
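
For reference, a benchmark along the lines described above could be sketched as follows. This is not the original benchmark code: the function names are made up for illustration, it assumes a C++11 compiler with <chrono> (which MSVC 10 does not ship), and it assumes that all pointers are kept alive until the deallocation pass. Reading an int obtained from malloc or plain new int formally reads an indeterminate value, so the sketch only counts zeros for new int(), where the result is well defined.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Times `count` allocations followed by `count` deallocations.
// `alloc` must return an int*; `release` must free it again.
template <typename Alloc, typename Release>
double time_alloc_release(std::size_t count, Alloc alloc, Release release) {
    std::vector<int*> ptrs(count);  // keeps every pointer alive until released
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) ptrs[i] = alloc();
    for (std::size_t i = 0; i < count; ++i) release(ptrs[i]);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();  // seconds
}

// malloc/free variant.
double time_malloc_free(std::size_t count) {
    return time_alloc_release(
        count,
        [] { return static_cast<int*>(std::malloc(sizeof(int))); },
        [](int* p) { std::free(p); });
}

// new/delete variant without an initializer.
double time_new_delete(std::size_t count) {
    return time_alloc_release(
        count, [] { return new int; }, [](int* p) { delete p; });
}

// Counts how many ints come back as zero from `new int()`.
// With value-initialization every one of them should be zero.
std::size_t count_zeroed(std::size_t count) {
    std::size_t zero = 0;
    for (std::size_t i = 0; i < count; ++i) {
        int* p = new int();
        if (*p == 0) ++zero;
        delete p;
    }
    return zero;
}
```

Calling time_malloc_free(100000000) and time_new_delete(100000000) from main and printing the results would give numbers comparable in spirit to the ones above, although holding 100 million live pointers needs several gigabytes of RAM.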

However, the fact remains that new/delete is slower than malloc/free on PODs.

I can only guess that the overhead lies in the definition of operator new: upon allocation failure it has to call the installed new handler (see std::get_new_handler) and retry, and once no handler is installed it throws std::bad_alloc. Due to its support for mixing C++ with managed code, MSVC 10 has a slight overhead when entering a try block (and when having a throw statement, I think), so this may explain the slight penalty. (It would be informative if someone could run the same benchmark with other compilers, e.g. GCC, which does not have an overhead when entering a try block.)
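
The behaviour required of operator new, as described above, can be sketched roughly like this. The function name sketch_operator_new is hypothetical, the use of malloc as the underlying allocator is an assumption (the standard does not prescribe it), and this is not MSVC's actual implementation. std::get_new_handler requires C++11.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for ::operator new(std::size_t).
void* sketch_operator_new(std::size_t size) {
    if (size == 0) size = 1;  // a zero-byte request must still yield a valid pointer
    for (;;) {
        if (void* p = std::malloc(size)) return p;  // the common, successful path

        // Allocation failed: ask the currently installed new handler for help.
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();  // nobody left to ask

        // The handler may free up memory, install another handler,
        // throw std::bad_alloc itself, or terminate the program.
        handler();
    }
}
```

Compared with a bare malloc call, the extra work on the successful path is small: one size check and the loop setup. Whether the mere presence of the throw costs anything depends on the compiler's exception handling model.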

On a related note, there's a subtle point about initialization, which C++03 formalized as value-initialization: the form of the new expression decides whether a POD is zeroed out or not:


struct pod {
  int i;
  double d;
};

pod* p = new pod; // does not zero-initialize members
pod* q = new pod(); // zero initializes members



I've tried the benchmarks with and without zero initialization:
malloc/free:

Execution time [s]: 6.483
zero: 0
nonzero: 100000000

new/delete:
Execution time [s]: 7.026
zero: 0
nonzero: 100000000


new/delete is about 8% slower here. As you can see, this is not due to zero-initialization: plain new pod leaves the members uninitialized, as it should. With explicit zero initialization, things get a little worse:

new/delete with zero initialization (i.e. new pod() instead of new pod)

Execution time [s]: 7.543
zero: 100000000
nonzero: 0

I'll still use new/delete though: most sensible objects have constructors, because that lets them be conveniently initialized at the point of declaration.

It should be noted that in C++11, some structs with constructors can still be POD. It basically boils down to "if there's no C++ magic involving hidden data being inserted into the object, then it can be made POD". It would be nice to have C++ perform as well as C in dynamic memory allocation for POD types, but is there such an implementation, given that C++ needs proper exception handling?
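
To illustrate the C++11 rule, here is a small sketch using <type_traits>. The types point and shape and the helper is_pod_like are made up for this example. In C++11, a POD is a type that is both trivial and standard-layout, which is what the helper checks.

```cpp
#include <type_traits>

// Has a user-provided constructor, but the default constructor is
// explicitly defaulted, so the type stays trivial and standard-layout.
struct point {
    int x;
    int y;
    point() = default;
    point(int x_, int y_) : x(x_), y(y_) {}
};

// The virtual destructor makes the compiler insert hidden data
// (typically a vtable pointer), so this type cannot be POD.
struct shape {
    virtual ~shape() {}
    int id;
};

// C++11 definition of POD: trivial and standard-layout.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_pod_like<point>(), "point should be POD in C++11");
static_assert(!is_pod_like<shape>(), "shape has a virtual function, so it is not POD");
```

Note that point would not have been a POD under the C++03 rules, where any user-declared constructor disqualifies a class.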

I hope someone will do this benchmark for GCC and Intel C++.


2 comments:

  1. Hi John,
    I know this post is already very old, but I actually did the measurements with GCC 4.8.2 and ICPC 14.0.2 on Kubuntu 14.04 on an Intel Core i7-2670QM @ 2.2 GHz.

    My code can be found here: http://pastebin.com/tj3G1Jjm
    (Watch out, it needs 4 GB RAM)

    I did eleven measurements and threw away the first one. Because of Turbo Boost, the CPU needs some 'warm-up' time before it runs at maximum clock frequency.

    I took the average from the remaining ten results and computed the standard error to a confidence level of 95%. In the results, 1.345 (12) means 1.345 plus / minus 0.012.

    The numbers to the left of the vertical line correspond to GCC, the ones to the right to ICPC. In each pair, the first number corresponds to allocation and the second to deletion. Unfortunately, it's very hard to format a readable table with this comment box.

    Method      GCC alloc   GCC delete | ICPC alloc  ICPC delete
    new int:    1.763 (10)  1.531 (12) | 1.783 (5)   1.4956 (29)
    new int():  1.773 (6)   1.426 (5)  | 1.766 (5)   1.568 (4)
    malloc:     1.597 (7)   1.310 (7)  | 1.600 (5)   1.3122 (18)
    Obviously, malloc and free are slightly faster than new and delete.
    A remarkable (and reproducible) result is that with GCC, deletion becomes faster with 'new int()' than with 'new int', while with ICPC it's the reverse.

    Have fun with the numbers.

    Bye Wolfgang
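
The averaging described in the comment above (mean of the ten remaining runs plus a 95% error estimate) could be computed roughly as follows. This is only a sketch: the names are hypothetical, and it assumes the interval is taken from Student's t distribution with 9 degrees of freedom, which the comment does not actually specify.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: the mean and the +/- half-width of the interval.
struct interval {
    double mean;
    double half_width;
};

// Mean and 95% confidence half-width for exactly ten timing samples.
// Assumption: two-sided Student's t value for 9 degrees of freedom (2.262).
interval mean_with_ci95(const std::vector<double>& runs) {
    const std::size_t n = runs.size();  // expected to be 10
    double sum = 0.0;
    for (double r : runs) sum += r;
    const double mean = sum / static_cast<double>(n);

    double squared_dev = 0.0;
    for (double r : runs) squared_dev += (r - mean) * (r - mean);
    const double stddev = std::sqrt(squared_dev / static_cast<double>(n - 1));  // sample std. dev.
    const double std_error = stddev / std::sqrt(static_cast<double>(n));

    const double t_95_df9 = 2.262;  // only valid for n == 10
    return interval{mean, t_95_df9 * std_error};
}
```

A result of {1.345, 0.012} would then be written as 1.345 (12) in the notation used in the comment.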
