A few memory debugging tips that I have found useful over the past few weeks:
  • Always look at ALL process heaps when doing memory debugging. Turns out we still had a rather obvious memleak in Drakensang. It didn't show up in our own memleak dumps because it happened in an external heap created by SpeedTree, and we can only track allocations going through our own memory subsystem and through the CRT. After dumping a summary of all heaps returned by GetProcessHeaps() before and after loading a level, one of the "external" heaps showed a memleak of up to 10 MB per level-load! The heap disappeared after compiling without SpeedTree support, so the culprit was easy to identify (of course it wasn't SpeedTree's fault, but a bug in our own tree instancing code, which was fixed in half an hour, and admittedly it was a very obscure special case to not have shown up as a memleak in our own dumps as well).
  • Use memory allocation hooks in 3rd party libs if they support them. In fact I wish all libs would support a mechanism to re-route memory allocations to my own routines, if only for debugging reasons. There's no way to track memory allocations happening inside XACT for instance (that I'm aware of?!).
  • Write an automatic stress test for your app early in the project. A while ago we started running continuous playthrough sessions with a hot-seat system where our testers hammer the same game session 24/7. The more subtle memory leak bugs often only trigger after 10 or 20 hours of continuous playtime. In addition to this continuous testing I also wrote a little stress test mode, where the game loads a level, lets the level run for 30 seconds and then loads another level, ad infinitum overnight. This may amplify bugs which happen during load time, and may attenuate bugs happening during normal gameplay (the SpeedTree related memleak mentioned above went critical much earlier in the stress test than in normal gameplay sessions). A watertight generic record/replay mechanism in the engine would be helpful as well (we don't have that in Drakensang, but this is a feature we might look into in the future).
  • A fixed memory layout is actually helpful on the PC as well. Drakensang doesn't have this, but I'm becoming more and more a fan of a fixed memory layout. Set aside N megabytes for C++ objects, another chunk of memory as temporary load/save buffer, fixed memory buffers for the different resource types, allocate those blocks at the beginning of the game either as non-growable heaps, or as pre-allocated virtual memory blocks with custom-tailored memory management - and let the game crash if any of the heaps is exhausted. The main reason why this is a good idea is not so much finding memory leaks, but preventing resource usage from growing out of control during development.
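To make the last tip a bit more concrete, here is a minimal sketch of a fixed-budget memory block with a simple bump allocator on top. This is not code from Drakensang or Nebula3: the class name FixedArena and all of its methods are made up for illustration, and a real engine would probably want proper heaps with free lists rather than a bump pointer. The point is only to show the "fixed budget, crash when exhausted" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical fixed-budget arena: one block is reserved up front and never grows.
class FixedArena {
public:
    FixedArena(const char* name, std::size_t capacity)
        : name_(name),
          base_(static_cast<std::uint8_t*>(std::malloc(capacity))),
          capacity_(capacity), used_(0), peak_(0) {
        if (base_ == nullptr) {
            std::fprintf(stderr, "arena '%s': could not reserve %zu bytes\n", name_, capacity_);
            std::abort();
        }
    }
    ~FixedArena() { std::free(base_); }
    FixedArena(const FixedArena&) = delete;
    FixedArena& operator=(const FixedArena&) = delete;

    // Returns nullptr if the request does not fit into the remaining budget.
    // 'align' must be a power of two no larger than alignof(std::max_align_t),
    // because only the malloc'ed base address is guaranteed to be aligned that far.
    void* TryAlloc(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) {
            return nullptr;
        }
        used_ = start + size;
        if (used_ > peak_) {
            peak_ = used_;
        }
        return base_ + start;
    }

    // The "let the game crash" variant: exceeding the budget is a hard error.
    void* Alloc(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        void* p = TryAlloc(size, align);
        if (p == nullptr) {
            std::fprintf(stderr, "arena '%s' exhausted: %zu of %zu bytes used, %zu requested\n",
                         name_, used_, capacity_, size);
            std::abort();
        }
        return p;
    }

    // Throws away everything at once, e.g. when a level is unloaded.
    void Reset() { used_ = 0; }

    std::size_t Used() const { return used_; }
    std::size_t Peak() const { return peak_; }   // high-water mark, survives Reset()
    std::size_t Capacity() const { return capacity_; }

private:
    const char* name_;
    std::uint8_t* base_;
    std::size_t capacity_;
    std::size_t used_;
    std::size_t peak_;
};
```

One arena per budget (say, one for the load/save buffer and one per resource type) would be created at startup. Logging Peak() at the end of a session shows how close each budget came to its limit, which is exactly the kind of number that otherwise grows unnoticed during development.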
I have collected a lot of ideas for the Nebula3 memory subsystem which I will play around with as time permits.
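Coming back to the first tip: a before/after dump of all process heaps could look roughly like the sketch below. This is not the actual Drakensang dump code, and the names HeapSummary, HeapSnapshot, SnapshotProcessHeaps and DiffSnapshots are invented for the example. The Win32 calls used are GetProcessHeaps(), HeapLock(), HeapWalk() and HeapUnlock(); the comparison part is plain C++ so it also compiles on other platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Totals for one heap (only blocks that are currently allocated).
struct HeapSummary {
    std::size_t busyBlocks = 0;
    std::size_t busyBytes = 0;
};

// Key is the heap handle value, so the same heap can be matched across snapshots.
using HeapSnapshot = std::map<std::uintptr_t, HeapSummary>;

#ifdef _WIN32
// Walks every heap of the current process and sums up the allocated blocks.
// Caveat: calling HeapLock() on a heap created with HEAP_NO_SERIALIZE is
// documented as undefined, so treat this as a debugging aid only.
HeapSnapshot SnapshotProcessHeaps() {
    HeapSnapshot snapshot;
    const DWORD count = GetProcessHeaps(0, nullptr);
    assert(count >= 1);  // every process has at least its default heap
    std::vector<HANDLE> heaps(count);
    const DWORD got = GetProcessHeaps(count, heaps.data());
    if (got == 0 || got > count) {
        return snapshot;  // failed, or a heap was created in between: nothing was stored
    }
    for (DWORD i = 0; i < got; ++i) {
        HANDLE heap = heaps[i];
        if (!HeapLock(heap)) {
            continue;
        }
        HeapSummary summary;
        PROCESS_HEAP_ENTRY entry = {};
        entry.lpData = nullptr;  // nullptr starts the walk at the first entry
        while (HeapWalk(heap, &entry)) {
            if (entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) {
                ++summary.busyBlocks;
                summary.busyBytes += entry.cbData;
            }
        }
        HeapUnlock(heap);
        snapshot[reinterpret_cast<std::uintptr_t>(heap)] = summary;
    }
    return snapshot;
}
#endif

// Returns the change in allocated bytes per heap (after minus before).
// Heaps that only exist in 'after' count with their full size; heaps with
// no change, and heaps that vanished, are left out.
std::map<std::uintptr_t, std::int64_t> DiffSnapshots(const HeapSnapshot& before,
                                                     const HeapSnapshot& after) {
    std::map<std::uintptr_t, std::int64_t> growth;
    for (const auto& kv : after) {
        std::int64_t delta = static_cast<std::int64_t>(kv.second.busyBytes);
        const auto it = before.find(kv.first);
        if (it != before.end()) {
            delta -= static_cast<std::int64_t>(it->second.busyBytes);
        }
        if (delta != 0) {
            growth[kv.first] = delta;
        }
    }
    return growth;
}
```

The idea would be to take one snapshot before a level load and one after the level has been unloaded again, then print whatever DiffSnapshots() returns. A heap that keeps growing by a few megabytes per round trip and does not belong to your own memory subsystem is a good place to start looking.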