While working on the new graphics and scene subsystems I eventually came to the point where some math code was needed (managing the view, world and projections transforms and their combinations, and flattening matrix hierarchies into world space). My original plan was to create a low level functional math library which looks much like HLSL and uses SSE intrinsics for performance. I started this stuff and soon it became clear that it would be quite an undertaking to correctly implement and test all this. And then there would only be an SSE implementation, when there's still SSE2..4, 3DNow around, and completely different intrinsics on the Xbox360 and other platforms. Sure, the problem could be solved by throwing manpower at it. But that's never a good idea for solving programming problems. So I looked around for a more pragmatic solution and found it in the form of the D3DX math library. The D3DX math functions are very complete, specialized for games, and support all current (and presumably future) vector instructions sets under the hood, and the 360 math library basically offers the same feature set (although it's much more down to the metal).

There are 2 disadvantages:
  • additional calling overhead since D3DX functions are not inlined
  • not portable to other platforms except DirectX and Xbox360
But I can live with this because it saves me a lot of work *now*, and porting the math library to another platform with platform-specific optimizations is just as much work with the wrapper approach then with the self-made approach.

There a few other aspects to consider:
  • With C++ math code, performance shouldn't get into the way of convenience. For instance, a proper operator+() always costs some performance because a temporary object must be constructed (the return value). But it's much more convenient and readable to use C++ operator overloading in generic game code instead of using (for instance) intrinsics. The point is to pay special attention to inner loops and use lower level code there when it actually makes sense.
  • There should only be very few places in Nebula where heavy math code is actually executed on the CPU (in Nebula2 these are: particle systems, animation code, computing shadow caster geometry for skinned characters). In Nebula3 these tasks are either offloaded to the GPU, or will be obsolete). In general, the CPU should NEVER touch geometry per-frame and per-vertex.
While I was rewriting the whole thing, I also did one other radical change which I wanted to do all the time, but wasn't possible in Nebula2 because a lot of existing code would be broken: default constructors no longer initialize the low level math objects. Controversial, I know. But I want to see how it turns out in practice. Math constructors tend to show up quite a lot during profiling, the more light weight they are the better.

Another basic change I wanted to do for some time was to differentiate between points and vectors. There is now a Math::point and a Math::vector class which both derive from the generic Math::float4 class. A point describes a position in 3d space and a vector describes a direction and length in 3d space, generalized to homogeneous 4d space (the w component of a point is always 1.0, the w component of a vector is always 0.0). By creating 2 different classes with the right operator overloading one can encode the computation rules for points and vectors right into the C++ code, so that the compiler throws an error when the rules are violated:

  • (point + point) is illegal
  • (point * scalar) is illegal
  • point = point + vector
  • vector = point - point
  • vector = vector + vector
  • vector = vector - vector
  • vector = vector * scalar
  • etc...
Also, class interfaces become much clearer because the programmer immediately knows whether an argument is expected to be a point or a vector.

So the new math lib basically looks like this:

The following low level classes directly call D3DX functions:
* matrix44 (D3DXMatrix functions)
* float4 (D3DXVec4 functions)
* quaternion (D3DXQuaternion functions)
* plane (D3DXPlane functions)

All other classes (like bbox, sphere, line, etc...) are generic and use functionality provided by the above low level classes. There is also a new scalar type (which is just a typedef'ed float), which helps in porting to some platforms (for instance, on NintendoDS, all math code is fixed point, so a scalar would be typedef'ed from one of the fixed point types). I still have to write a complete set of test and benchmark classes for the math library, but for now I'm quite happy that a very big chunk of work has been reduced to about 2 days of implementation time.