Lowlevel Optimizations
I'm currently doing memory optimizations in Drakensang, and together with the new ideas from the asynchronous rendering code in N3 I'm going to do a few low-level optimizations in the memory subsystem over the next time in Nebula3. Here's what I'm planning to do:
The general strategy is to get away as far as possible from the general Alloc()/Free() calls, and to make memory management more "context sensitive" and also more static (in the sense that the memory layout doesn't change wildly over the runtime of the game). It's extremely important for a game application to set aside fixed amounts of memory right from the beginning of the project (and to let the game fail hard if the limits are violated), otherwise everything will grow without control, and towards the end of the project much time must be invested to cut everything back to reasonable limits.
- Add a thread-safe flag to the Heap class, currently a heap is always thread-safe, but there will be quite a few cases now where it makes sense to not have the additional thread-safety-overhead in the allocation routines.
- Add some useful higher-level allocators:
- FixedSizeAllocator: This optimizes the allocation of many small same-size objects from the heap, it will pre-allocate big pages, and manage the memory within the pages itself. The main-advantage comes from the fact that all blocks in the page are guaranteed to be the same size.
- BucketAllocator: This is a general allocator which holds a number of buckets for e.g. 16, 32, 48, ...256 byte blocks (the buckets are just normal FixedSizeAllocators). Small allocations can be satisified from the buckets, larger allocation go directly through the heap as usual.
- Overwrite the new-operator in all RefCounted-derived classes to use the a BucketAllocator (however, I'll actually do some profiling whether this is actually faster then Windows' Low Fragmentation Heaps). This stuff will be behind the scenes in the DeclareClass()/ImplementClass()-macros, so there are no changes necessary to the class source code.
- thread-local classes would create their instances from a thread-local BucketAllocator which doesn't have to be thread-safe
- thread-local classes could use normal increment and decrement operations for their refcounting instead of Interlocked::Increment() and Interlocked::Decrement(). Since every Ptr<> assignment changes the refcount of an object this may add up quite a bit.
The general strategy is to get away as far as possible from the general Alloc()/Free() calls, and to make memory management more "context sensitive" and also more static (in the sense that the memory layout doesn't change wildly over the runtime of the game). It's extremely important for a game application to set aside fixed amounts of memory right from the beginning of the project (and to let the game fail hard if the limits are violated), otherwise everything will grow without control, and towards the end of the project much time must be invested to cut everything back to reasonable limits.