Nebula3's Multithreaded Rendering Architecture

Alright! The Application Layer is now running through the new multithreaded rendering pipeline.

Here's how it works:

The former Graphics subsystem has been renamed to InternalGraphics and now runs in its own "fat thread", together with all the lower-level Nebula3 subsystems required for rendering.
There's a new Graphics subsystem running in the application thread with a set of proxy classes which mimic the InternalGraphics subsystem classes.
The main thread is now missing any rendering related subsystems, so trying to call e.g. RenderDevice::Instance() will result in a runtime error.
Extra care has been taken to make the overall design as simple and "fool-proof" as possible.
Very little communication is necessary between the main and render threads: usually just one SetTransform message for each graphics entity which has changed its position.
Communication is done with standard Nebula3 messages through a single message queue in the new GraphicsInterface singleton. This is an "interface singleton" which is visible from all threads. The render thread receives messages from the main thread (or other threads) and never actively sends messages to other threads (with one notable exception on the Windows platform: mouse and keyboard input).
Client-side code doesn't have to deal with creating and sending messages, because it talks to the render thread through proxy objects. Proxy objects provide a typical C++ interface, and since there's a 1:1 relationship between a proxy and its render-thread object, a proxy may cache data on the client side to prevent a round-trip into the render thread (so there's some data duplication, but a lot less locking).
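To make the proxy and message queue idea more concrete, here's a minimal standalone sketch in standard C++. It is not Nebula3 code: MessageQueue, ServerEntity and EntityProxy are hypothetical names, and messages are modelled as plain callables instead of Nebula3 message objects. It only illustrates the pattern described above: the proxy updates its client-side cache, posts a message, and the render thread applies it later.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the server-side entity owned by the render thread.
struct ServerEntity {
    float x = 0.0f, y = 0.0f, z = 0.0f;
};

// Hypothetical single queue visible from all threads (the role the text
// assigns to the GraphicsInterface singleton).
class MessageQueue {
public:
    using Message = std::function<void()>;

    // Client threads: enqueue and return immediately.
    void Send(Message msg) {
        assert(msg && "message must be callable");
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(msg));
    }

    // Render thread: run everything queued so far, return how many ran.
    std::size_t ProcessPending() {
        std::deque<Message> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);  // hold the lock only while swapping
        }
        for (auto& msg : batch) msg();
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::deque<Message> pending_;
};

// Client-side proxy: a plain C++ interface which caches its own state.
class EntityProxy {
public:
    EntityProxy(MessageQueue& queue, ServerEntity& server)
        : queue_(queue), server_(server) {}

    void SetPosition(float x, float y, float z) {
        x_ = x; y_ = y; z_ = z;            // update the client-side cache
        ServerEntity* target = &server_;
        queue_.Send([target, x, y, z] {    // executed later by the render thread
            target->x = x; target->y = y; target->z = z;
        });
    }

    // Answered from the cache, no round-trip into the render thread.
    float GetX() const { return x_; }

private:
    MessageQueue& queue_;
    ServerEntity& server_;
    float x_ = 0.0f, y_ = 0.0f, z_ = 0.0f;
};
```

In this sketch the proxy getter reflects a new position immediately, while the ServerEntity only changes once the render side calls ProcessPending(). That is the data duplication traded for less locking.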
The Graphics subsystem offers the following public proxy classes at the moment:

Graphics::Display: set up and query display properties
Graphics::GraphicsServer: creates and manages Stages and Views
Graphics::Stage: a container for graphics entities
Graphics::View: renders a "view" into a Stage into a RenderTarget
Graphics::CameraEntity: defines a view volume
Graphics::ModelEntity: a typical graphics object
Graphics::GlobalLightEntity: a global, directional light source
Graphics::SpotLightEntity: a local spot light

These proxy classes are just pretty interfaces and don't do much more than create and send messages into the GraphicsInterface singleton.
There are typically 3 types of messages sent into the render thread:

Synchronous messages, which block the caller thread until they are processed. These are just for convenience and only exist for methods which are usually not called while the main game loop is running (like Display::GetAvailableDisplayModes()).
Asynchronous messages, which return immediately but pass a return value back at some later time. These are non-blocking, but the result will only be available in the next graphics frame. The proxy classes do everything possible to hide this fact, either by caching values on the client side so that no communication is necessary at all, or by returning the previous value until the graphics thread gets around to processing the message.
The best and simplest messages are those which don't require a return value. They are just sent off by the client-side proxy and processed at some later time by the render thread. Fortunately, most messages sent during a frame are of this nature (e.g. updating entity transforms).
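The three message flavours can be sketched in standalone standard C++ as follows. Everything here is an assumption made for illustration: RenderQueue, RenderThread, SendSync, CachedQuery and SendAndForget are hypothetical names, messages are plain callables, and the "render thread" is just a worker which drains the queue in a loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical queue drained by the render thread.
class RenderQueue {
public:
    void Post(std::function<void()> msg) {
        assert(msg && "message must be callable");
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(msg));
    }
    std::size_t ProcessPending() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& msg : batch) msg();
        return batch.size();
    }
private:
    std::mutex mutex_;
    std::deque<std::function<void()>> pending_;
};

// Hypothetical render thread: drains the queue until destroyed.
class RenderThread {
public:
    explicit RenderThread(RenderQueue& queue)
        : queue_(queue), worker_([this] { Run(); }) {}
    ~RenderThread() { stop_.store(true); worker_.join(); }
private:
    void Run() {
        while (!stop_.load()) { queue_.ProcessPending(); std::this_thread::yield(); }
        queue_.ProcessPending();  // drain what arrived just before stopping
    }
    RenderQueue& queue_;
    std::atomic<bool> stop_{false};
    std::thread worker_;  // declared last so the other members exist before it starts
};

// 1) Synchronous: block the caller until the render thread has answered.
template <typename Result>
Result SendSync(RenderQueue& queue, std::function<Result()> query) {
    auto promise = std::make_shared<std::promise<Result>>();
    std::future<Result> answer = promise->get_future();
    queue.Post([promise, query] { promise->set_value(query()); });
    return answer.get();  // the caller waits here
}

// 2) Asynchronous with a return value: Get() never blocks and keeps returning
//    the previous value until the render thread has processed the request.
//    The object must outlive any request it has posted.
class CachedQuery {
public:
    explicit CachedQuery(int initial) : value_(initial) {}
    void Refresh(RenderQueue& queue, std::function<int()> query) {
        queue.Post([this, query] { value_.store(query()); });
    }
    int Get() const { return value_.load(); }
private:
    std::atomic<int> value_;
};

// 3) Fire-and-forget: no return value, nothing to wait for.
inline void SendAndForget(RenderQueue& queue, std::function<void()> command) {
    queue.Post(std::move(command));
}
```

A call like SendSync<int>(queue, ...) only returns while something is draining the queue, which is why the synchronous flavour is best kept out of the per-frame path.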

Creation of Graphics entities is an asynchronous operation, but it is possible to manipulate the client-side proxy object immediately after creation even though the server-side entity doesn't exist yet. The proxy classes take care of all these details internally.
There is a single synchronization event per game frame where the game thread waits for the graphics thread. This event is signalled by the graphics thread after it has processed pending messages for the current frame and before culling and rendering. This is necessary to prevent the game thread from running faster than the render thread and thus spamming the render thread's message queue. The game thread may run at a lower - but never at a higher - frame rate than the render thread.
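The frame synchronization can be sketched with a mutex and a condition variable. This is a standalone illustration under my own assumptions, not the Nebula3 implementation: FrameEvent, Signal and Wait are hypothetical names, and the frame counters exist only so that a wait can tell whether a new frame has been signalled since the last one it saw.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical per-frame event: signalled by the render thread,
// waited on by the game thread.
class FrameEvent {
public:
    // Render thread: call after pending messages have been processed,
    // before culling and rendering.
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++signalledFrame_;
        }
        cond_.notify_all();
    }

    // Game thread: block until a frame newer than the last one seen has
    // been signalled, then return its number.
    std::uint64_t Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        assert(signalledFrame_ >= seenFrame_);
        cond_.wait(lock, [this] { return signalledFrame_ > seenFrame_; });
        // Several signals since the last wait collapse into a single wake-up.
        seenFrame_ = signalledFrame_;
        return seenFrame_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::uint64_t signalledFrame_ = 0;
    std::uint64_t seenFrame_ = 0;
};
```

If the render thread signals twice before the game thread waits, the wait returns at once and skips ahead, so the game thread can run slower than the render thread. If nothing new has been signalled, the wait blocks, so the game thread can never run faster.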

Here's some example code from the testviewer application. It actually looks simpler than before since all the setup code has become much tighter:

using namespace Graphics;
using namespace Resources;
using namespace Util;

// set up the render thread
Ptr<GraphicsInterface> graphicsInterface = GraphicsInterface::Create();
graphicsInterface->Open();

// set up and open the display
Ptr<Display> display = Display::Create();
// ... optionally change display settings here...
display->Open();

That's all that is necessary to open a default display and get the render thread up and running. The render thread will now happily run its own render loop.

To actually have something rendered we need at least a Stage, a View, a camera, at least one light and a model:

// create a GraphicsServer, Stage and a default View
Ptr<GraphicsServer> graphicsServer = GraphicsServer::Create();
graphicsServer->Open();

Attr::AttributeContainer dummyStageBuilderAttrs;
Ptr<Stage> stage = graphicsServer->CreateStage(StringAtom("DefaultStage"),
                                               Graphics::SimpleStageBuilder::RTTI,
                                               dummyStageBuilderAttrs);

// the view is created through the same local graphicsServer object
Ptr<View> view = graphicsServer->CreateView(InternalGraphics::InternalView::RTTI,
                                            StringAtom("DefaultView"),
                                            StringAtom("DefaultStage"),
                                            ResourceId("DX9Default"),
                                            true);

// create a camera and make it the active camera for our view
Ptr<CameraEntity> camera = CameraEntity::Create();
camera->SetTransform(matrix44::translation(0.0f, 0.0f, 10.0f));
stage->AttachEntity(camera.cast<GraphicsEntity>());
view->SetCameraEntity(camera);

// create a global light source
Ptr<GlobalLightEntity> light = GlobalLightEntity::Create();
light->SetTransform(matrix44::rotationx(n_deg2rad(-70.0f)));
stage->AttachEntity(light.cast<GraphicsEntity>());

// finally create a visible model
Ptr<ModelEntity> model = ModelEntity::Create();
model->SetResourceId(ResourceId("mdl:examples/eagle.n2"));
stage->AttachEntity(model.cast<GraphicsEntity>());

That's the code to set up a simple graphics world in the asynchronous rendering case. There are a few issues I still want to fix (like the InternalGraphics::InternalView::RTTI thing). The only thing left is to add a call to GraphicsInterface::WaitForFrameEvent() somewhere in the game loop before updating the game objects for the next frame. The classes App::RenderApplication and App::ViewerApplication in the Render layer will actually take care of most of this stuff.

There's some brain-adaption required to work in an asynchronous rendering environment:

there's always a delay of up to one graphics frame until a manipulation actually shows up on screen
it's hard (and inefficient) to get data back from the render thread
it's impossible for client-threads to read, modify and write-back data within one render-frame

For the tricky application specific stuff I'm planning to implement some sort of installable client-handlers. Client threads can install their own custom handler objects which would run completely in the render-thread context. This is IMHO the only sensible way to implement application specific graphics functionality which requires exact synchronization with the render-loop.
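Since the client handlers are only planned at this point, the following is purely a speculative sketch of what "installable from any thread, executed in the render thread" could look like in standalone standard C++. RenderThreadHandler, HandlerRegistry and FrameCounter are hypothetical names and not part of Nebula3.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical handler interface; OnFrame() is only ever called on the render thread.
class RenderThreadHandler {
public:
    virtual ~RenderThreadHandler() = default;
    virtual void OnFrame(std::size_t frameIndex) = 0;
};

class HandlerRegistry {
public:
    // Any thread: hand ownership of a handler over to the render thread.
    void Install(std::unique_ptr<RenderThreadHandler> handler) {
        assert(handler && "handler must not be null");
        std::lock_guard<std::mutex> lock(mutex_);
        incoming_.push_back(std::move(handler));
    }

    // Render thread, once per frame: adopt newly installed handlers,
    // then run all active handlers without holding the lock.
    void RunFrame(std::size_t frameIndex) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (auto& handler : incoming_) active_.push_back(std::move(handler));
            incoming_.clear();
        }
        for (auto& handler : active_) handler->OnFrame(frameIndex);
    }

    // Render thread only: active_ is never touched by other threads.
    std::size_t ActiveCount() const { return active_.size(); }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<RenderThreadHandler>> incoming_;
    std::vector<std::unique_ptr<RenderThreadHandler>> active_;
};

// Example handler: counts the frames it has seen through a shared atomic counter.
class FrameCounter : public RenderThreadHandler {
public:
    explicit FrameCounter(std::shared_ptr<std::atomic<int>> count)
        : count_(std::move(count)) {}
    void OnFrame(std::size_t) override { count_->fetch_add(1); }
private:
    std::shared_ptr<std::atomic<int>> count_;
};
```

The point of the design is that a handler's OnFrame() runs inside the render loop itself, so it can read, modify and write back render-side state within a single frame, which a client thread cannot do through messages.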

I've had to make a few other changes to the existing code base for the asynchronous rendering to work. Mouse and keyboard events under Windows are produced by the application window (which is owned by the render thread), but the input subsystem lives in the game thread. Thus there needs to be a way for the render thread to communicate those input events to the main thread. I decided to derive a ThreadSafeDisplayEventHandler class (and ThreadSafeRenderEventHandler for the sake of completeness). Client threads can install those event handlers to be notified about display and render events coming out of the render thread.
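The general idea behind such a thread-safe event handler can be sketched as a small locked buffer: the render thread puts events in, and the main thread drains them once per frame. This is a standalone illustration, not the actual ThreadSafeDisplayEventHandler. InputEvent and EventMailbox are hypothetical, and the event layout is made up for the example.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical input event; the real Nebula3 event types look different.
struct InputEvent {
    enum class Type { KeyDown, KeyUp, MouseButtonDown, MouseButtonUp };
    Type type;
    int code;  // key or button code, non-negative in this sketch
};

// Hypothetical mailbox between the render thread (producer)
// and the main thread (consumer).
class EventMailbox {
public:
    // Render thread: called while handling window messages.
    void Put(const InputEvent& event) {
        assert(event.code >= 0 && "codes are non-negative in this sketch");
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back(event);
    }

    // Main thread: take everything which arrived since the last call,
    // in arrival order, leaving the mailbox empty.
    std::vector<InputEvent> Drain() {
        std::vector<InputEvent> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(events_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<InputEvent> events_;
};
```

The main thread would typically call Drain() once at the start of its frame and feed the result into its input subsystem, so the lock is held only for a push or a swap.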

The second, bigger change affected the Http subsystem. Previously, HttpRequestHandlers had to live in the same thread as the HttpServer, which isn't very useful anymore now that important functionality has been moved out of the main thread. So I basically moved the whole Http subsystem into its own thread as well, and HttpRequestHandlers may now be attached from any thread. There's a nice side effect: an HTTP request now only stalls the thread of the HttpRequestHandler which processes the request.

There's still more work to do:

need to write some stress-tests to uncover any thread-synchronization bugs
need to do performance investigations and profiling (are there any unintended synchronization issues?)
thread-specific low-level optimization in the Memory subsystem as detailed in one of my previous posts
optimize the messaging system as much as possible (especially creation and dispatching)
I also want to implement some sort of method to run the rendering in the main thread, partly for debugging, partly for platforms with simple single-core CPUs

Phew, that's all for today :)