Topic : C++ Performance
Author : Glen McCluskey & Associates LLC
Page : << Previous 4  Next >>
Go to page :


      #endif
        };

        #ifdef USE_ND
        A* A::freelist = 0;

        inline void* A::operator new(size_t sz)
        {
                // get free node from freelist if any

                if (freelist) {
                        A* p = freelist;
                        freelist = freelist->next;
                        return p;
                }

                // call malloc() otherwise

                return malloc(sz);
        }

        inline void A::operator delete(void* vp)
        {
                A* p = (A*)vp;

                // link freed node onto freelist

                p->next = freelist;
                freelist = p;
        }
        #endif

        A::A() {}

        A::~A() {}

        #ifdef DRIVER
        const int N = 1000;
        A* aptr[N];

        int main()
        {
                int i;
                int j;

                // repeatedly allocate / deallocate A objects

                for (i = 1; i <= N; i++) {
                        for (j = 0; j < N; j++)
                                aptr[j] = new A();
                        for (j = 0; j < N; j++)
                                delete aptr[j];
                }

                return 0;
        }
        #endif


We've also included a driver program. For this example, that recycles the memory for object instances, the new approach is about 4-5X faster than the standard approach.


When new() is called for an A type, the overloaded function checks the free list to see if any old recycled instances are around, and if so one of them is used instead of calling malloc(). The free list is shared across all object instances (the freelist variable is static). delete() simply returns a no-longer-needed instance to the free list.


This technique is useful only for dynamically-created objects. For static or local objects, the storage has already been allocated (on the stack, for example).


We have again sidestepped the issue of whether a failure in new() should throw an exception instead of returning an error value. This is an area in transition in the language.


There are other issues with writing your own storage allocator. For example, you have to make sure that the memory for an object is aligned correctly. A double of 8-byte length may need to be aligned, say, on a 4-byte boundary for performance reasons or to avoid addressing exceptions ("bus error - core dumped" on a UNIX system). Other issues include fragmentation and support for program threads.




Part VIII
Duplicate Inlines


Suppose that you have a bit of code such as:


        inline long fact(long n)
        {
                if (n < 2)
                        return 1;
                else
                        return n * fact(n - 1);
        }

        int main()
        {
                long x = fact(23);

                return 0;
        }


to compute the factorial function via a recursive algorithm. Will fact() actually be expanded as an inline? In many compilers, the answer is no. The "inline" keyword is simply a hint to the compiler, which is free to ignore it.


So what happens if the inline function is not expanded as inline? The answer varies from compiler to compiler. The traditional approach is to lay down a static copy of the function body, one copy for each translation unit where the inline function is used, and with such copies persisting throughout the linking phase and showing up in the executable image. Other approaches lay down a provisional copy per translation unit, but with a smart linker to merge the copies.


Extra copies of functions in the executable can be quite wasteful of space. How do you avoid the problem? One way is to use inlines sparingly at first, and then selectively enable inlining based on program profiling that you've done. Just because a function is small, with a high call overhead at each invocation, doesn't necessarily mean that it should be inline. For example, the function may be called only rarely, and inlining might not make any difference to the total program execution time.


Another approach diagnoses the problem after the fact. For example, here's a simple script that finds duplicate inlines on UNIX systems:


        #!/bin/sh

        nm $@ |
        egrep ' t ' |
        awk '{print $3}' |
        sort |
        uniq -c |
        sort -nr |
        awk '$1 > = 2{print}' |
        demangle


nm is a tool for dumping the symbol tables of objects or executables. A " t " indicates a static text (function) symbol. A list of such symbols is formed and those with a count of 2 or more filtered out and displayed after demangling their C++ names ("demangle" has various names on different systems).


This technique is simply illustrative and not guaranteed to work on every system.


Note also that some libraries, such as the Standard Template Library, rely heavily on inlining. STL is distributed as a set of header files containing inline templates, with the idea being that the inlines are expanded per translation unit.


Much of the time such

Page : << Previous 4  Next >>