Thursday, February 11, 2010

GDI leak hunting season

In addition to my create-an-installer task, I've spent the past several days hunting an extremely nasty bug in our project. The game would run fine for several minutes, but inevitably it would begin to exhibit strange behavior: random text strings would stop rendering, or the entire game window (or even the whole monitor display) would flicker wildly. Obviously QA would never allow us to ship a product in that condition, so I had to dive in and figure out what was going on.

To start off, I had never delved very deeply into the Windows API before. The beautiful thing about working with an existing game engine is that most of the low-level OS interactions are abstracted away, so you only need to worry about implementing new, cool features on top of that layer. Unfortunately, this project needed some very low-level interaction with the Windows API in order to fulfill a feature request from our client. In this case, we were using the Windows Graphics Device Interface (GDI) API to handle some custom font rendering. We made good initial progress on the feature, but the programmer working on it suddenly had to take some personal leave for family reasons. This left us in the lurch, because the implementation worked as a prototype but hadn't been completed, cleaned up, or optimized yet. One of the nasty caveats with Windows GDI object handles is that there is a limited number of handles you can have active at once (about 10,000 per process on my XP system). Once you reach that number, new GDI objects can no longer be created and the OS appears to start doing some very odd things, which causes text to disappear and applications to start flickering.

I'll save you the boredom of describing my late nights looking over code and reading GDI documentation, but suffice it to say there were a couple of them :). Thinking back now, I could kick myself for not doing some obvious things until today. My breakthrough came when I finally went looking online for additional resources about GDI leaks and how to find them. There are some interesting articles and techniques out there, but I felt they were all overkill for our particular situation. Most of those techniques are for finding a leak *somewhere*, whereas in our case I knew the leak was in a very specific area of our code (about 400 lines altogether). The real breakthrough came when I found out you can add a 'GDI Objects' column in the Windows Task Manager (View->Select Columns) and watch in real time as the count goes up and down for a particular application. On the surface this doesn't appear incredibly useful for finding a leak *somewhere*, but as I said above, I already knew roughly where to look. So I fired up the trusty debugger and began stepping through the suspect code line by line while watching the GDI object count, and after several minutes I found the offending item. A CreateSolidBrush() call hidden deep within an if block was allocating a new handle but never deleting it. After I fixed the leaking object, I ran the game again and confirmed our GDI object count was now steady.
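For anyone who would rather check this from code than eyeball Task Manager, here is a rough sketch of the same idea. It is not our actual project code: the names gdiObjectCount, gdiObjectDelta, ScopedBrush, drawHighlightLeaky and drawHighlightFixed are all made up for illustration. The real Win32 calls are CreateSolidBrush(), DeleteObject() and GetGuiResources() with GR_GDIOBJECTS, which as far as I can tell reports the same number Task Manager shows. The #else branch is only a counter-based stand-in so the sketch compiles on a non-Windows machine; it is not the real GDI API.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the sketch compiles off Windows. They only keep a counter
// and are NOT the real GDI API.
using HGDIOBJ = void*;
using HBRUSH = void*;
using COLORREF = unsigned long;
static long g_fakeLiveObjects = 0;
static char g_fakeObject;
inline HBRUSH CreateSolidBrush(COLORREF) { ++g_fakeLiveObjects; return &g_fakeObject; }
inline int DeleteObject(HGDIOBJ) { --g_fakeLiveObjects; return 1; }
#endif

// COLORREF is laid out as 0x00bbggrr, so this is pure red.
const COLORREF kRed = 0x000000FF;

// Number of GDI objects currently owned by this process.
long gdiObjectCount() {
#ifdef _WIN32
    return static_cast<long>(GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS));
#else
    return g_fakeLiveObjects;
#endif
}

// Hypothetical RAII wrapper: deletes the brush when it goes out of scope.
class ScopedBrush {
public:
    explicit ScopedBrush(COLORREF color) : brush_(CreateSolidBrush(color)) {}
    ~ScopedBrush() { if (brush_) DeleteObject(brush_); }
    ScopedBrush(const ScopedBrush&) = delete;
    ScopedBrush& operator=(const ScopedBrush&) = delete;
    HBRUSH get() const { return brush_; }
private:
    HBRUSH brush_;
};

// Mirrors the bug: a brush created inside an if block and never released.
void drawHighlightLeaky(bool highlighted) {
    if (highlighted) {
        HBRUSH brush = CreateSolidBrush(kRed);
        (void)brush;  // the real code would draw with the brush here
        // Missing DeleteObject(brush): one handle leaks per call.
    }
}

// Same shape, but the wrapper releases the handle on every path.
void drawHighlightFixed(bool highlighted) {
    if (highlighted) {
        ScopedBrush brush(kRed);
        (void)brush.get();  // the real code would draw with the brush here
    }  // ~ScopedBrush() calls DeleteObject() here
}

// Net change in the GDI object count after calling fn `calls` times.
template <typename Fn>
long gdiObjectDelta(Fn fn, int calls) {
    const long before = gdiObjectCount();
    for (int i = 0; i < calls; ++i) fn();
    return gdiObjectCount() - before;
}
```

Calling gdiObjectDelta([] { drawHighlightLeaky(true); }, 100) should report a count that climbs with every call, while the fixed version should come back at zero. On Windows the sketch needs to link against user32 and gdi32. A plain DeleteObject() at the end of the if block fixes the leak just as well; the wrapper simply makes it harder to forget on an early return.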

Victory!
