
Performance Programming Applied to C++
by Joris Timmermans (aka MadKeithV)


We all want speed in this business - whether we like to admit it or not. Frames Per Second ratings are thrown around like there is no tomorrow, and everyone in the PR department is always ranting about how their new engine is "faster" than everyone else's and can do "more" as well.

I'm not here to talk about how to make your code "faster" than anyone else's. I just want to teach you to make it faster, and more efficient, than the code you usually write.

I'd like to cover three things that are intricately tied together:

- Code Execution Time
- Code / Program Size
- Programming effort

I'm a strong believer in keeping a balance between those three, and only in certain cases should you sacrifice parts of the latter two to improve the first.

In this article, I'm going to point out some things that may help you improve the efficiency and performance of your code. I will start with the easiest ways of optimizing and work my way up to more complex and involved techniques. First, I'll begin with what may seem a not-so-obvious point: the compiler.

For the more advanced programmers among you, please keep in mind that I'm trying to write as simply as possible, without cluttering my text with too many details.

Part 1 : Know your tools

It may seem trivial, but how much do you REALLY know about your compiler? Do you know what processors it can generate code for? Do you know what kinds of optimizations it can perform? Do you know about its language incompatibilities?

It helps to know these things when you are implementing anything, and especially when you are trying to make something that goes fast.

For example - in a recent question on the GameDev boards, someone asked about Microsoft Visual C++'s "Release Mode". This is a standard compiler option; if you use this specific compiler, you should know what it does. If you don't, you've spent a lot of money on something you don't fully know how to use. In short, Release Mode removes all the debugging code and applies whatever code optimizations the compiler has available, producing a much smaller executable that runs faster. There is slightly more to it than that, but if you're interested, read the documentation that comes with the compiler.

See - if you didn't know this already, I've just handed you a way to make your code a lot faster, without you having to program a single thing!

The target platform is also important. These days, the lowest you would aim for is probably still an Intel Pentium processor, but if you're using a ten-year-old compiler, it's not going to give you Pentium-optimized code. Getting a more recent compiler may really improve the speed, once again without making a single code change.

Some other things to keep in mind: does your compiler have a profiling tool? If you don't know, you can't expect to make fast code. If you don't know what a profiling tool IS, you need to learn. A profiling tool is something that tracks where the execution time goes in your program. You run your code using the profiler, perform some operations with it, and after you exit your application, you get back a report of the times spent in each function. You can use this to find your speed "bottlenecks", the parts in your code where you spend the most time. Optimizing these parts is much more effective in raising the overall speed of your application than randomly applying optimizations everywhere.

Don't say "But I know where my bottlenecks are!" They can be very unexpected, especially when working with third-party APIs and libraries. I ran into a problem like that only a few weeks ago, when it turned out that a silly state change, performed (unnecessarily) every frame, was taking up 25% of total execution time. A single line of code adding a test to see if that state was already set dropped that function out of the top 50 most expensive functions in my profiling.

Using the profiler is, in most cases, deceptively simple. Interpreting the results isn't always that simple. You have to try to identify the critical path in your application - that is, the path where most of the application's time is spent. Optimizing that path will result in a noticeable performance improvement for your users.

An example: loading a particular file shows up as the largest single time expenditure in the profile, but you know it happens only once, at load time. Optimizing this function might save you a few seconds in the total run-time of the program, but it won't result in a performance increase during "normal" use. In fact, it's safe to say that your profiling run wasn't long enough: during normal operation, the time spent in that function relative to the total run-time would shrink further and further, while your critical-path functions would gradually float up to the top of the list.

I hope that gives you a start in using these tools.

The profiling tool is definitely A Good Thing. Use it.

If you don't have a profiler, there is at least one that you can try out for free: Intel's VTune profiler, available from Intel on a one-month trial basis. I haven't used it yet, but as soon as I have time to try it, I'll try to write up some pointers on how to use it.

In the next sections, I'm moving on to making your C/C++ compiler do what you want it to do.

Part 2: Inlining, and the inline keyword

What is inlining? I'll answer that through the description of the inline keyword.

It tells the compiler to "expand the function inline", which is a lot like the way macros (#define's) work in C and C++, but with a small difference: inline functions are type-safe, and subject to further compiler optimization. This is a REALLY good thing, because you get the speed of a macro (meaning you won't suffer from function-call overhead) and the type-safety of a function, plus a bunch of other benefits.

What are those benefits? Well, most compilers can only optimize code within a single translation unit at a time - usually one .cpp file together with the headers it includes. Using inline functions means that the function body actually ends up inside the calling module, enabling certain optimizations, such as eliminating return-value copying, eliminating superfluous temporary variables, and a host of other possibilities. If you want to learn more, have a look at my references, specifically the performance C++ book, at the end of this article.

The dreaded inline keyword. I have to mention this, because there seem to be a lot of misconceptions about it. The inline keyword does not force the compiler to inline that particular function, but rather gently requests it. Quoting from MSDN:

"The inline keyword tells the compiler that inline expansion is preferred. However, the compiler can create a separate instance of the function (instantiate) and create standard calling linkages instead of inserting the code inline."
Things that usually make the compiler ignore your request are: using loops in the inline function, calling other inline functions from within the function, and recursion.

The above quote also hints at something else: the compiler needs to see the body of an inline function in every translation unit that calls it. That means your linker will choke on an inline function implemented in another object file, making inline functions far less useful than I'd like. In human language: the good old pattern of putting the declaration in the .h file and the implementation in the .cpp file generally doesn't work with inline functions. The ANSI standard has a way to do it, but unfortunately, Visual C++ does not implement it.

So, you ask, what is the solution? Simple: implement the inline functions within the same module. This is easy enough to do - just write the entire function in the .h file, and include that header wherever you need the function. Not as clean as you might want, but it works.

I don't like it much, in the interest of implementation hiding (I'm an Object Orientation freak), but I have been using it in a lot of my classes lately. The good part is, I don't even have to write the inline keyword - if you define a function entirely within a class declaration, the compiler will automatically attempt to inline it. This way, I've ended up with entire classes contained in a single header file, since all of their functions needed to be inline. I'd suggest you ONLY do this when you really need the speed, and when you're not about to share the code with too many people - because the visibility of the implementation may lead to very annoying assumptions from the rest of your team. Trust me, it happens.

Part 3: To
