EVA

I think I have a pretty good plan now for Nebula3’s asynchronous job subsystem, and (most importantly) I have a nifty name for it: EVA (Eingabe-Verarbeitung-Ausgabe, English: Input-Processing-Output). A job object is basically a piece of input data plus some code which asynchronously processes that data, resulting in the output data. Hence EVA. The main motivation is of course to make use of the PS3 stream processing units. (I know, I know, the S in SPU actually stands for synergistic, but I think that’s wishful thinking, at least in the context of a game engine. Processing individual data streams makes more sense than chaining the SPUs together IMHO, but I digress…)

I really don’t want an entire, important subsystem in N3 which only makes sense on the PS3 though. EVA jobs should also work on the CPU or GPU if desired.

The two main inspirations are Insomniac’s “SPU Shaders” (treat an SPU job like a GPU shader) and DX11’s Compute Shaders (use GPU shaders for non-rendering purposes).

The main problem is to provide a simple, generic interface for getting data to and from “external computation units” with as little synchronization as possible. GL and D3D had to solve that problem years ago in order to let the GPU work in parallel to the CPU. Vertex buffers and textures provide the input data, which is processed by shader code, and the output is written to render targets. It’s a simple and intuitive pattern for communicating with an asynchronous processing unit, and best of all, every programmer knows (or should know) those concepts inside out.

EVA will simply wrap existing ideas under a common subsystem, with the emphasis on data compatibility, not code compatibility. It should be relatively easy to “port” a job between the CPU, GPU or SPU. The main problem (how to structure the input and output data) should only have to be solved once (with the general rule of thumb that related data should be placed close together), while the processing code needs to be rewritten for each processing unit type (FX/HLSL for GPU jobs, simple, self-contained C or C++ code for CPU and SPU jobs).

Thus an EVA job object would have the following properties:

one or more input buffers
one output buffer
a small set of input parameters
the actual processing code

Buffers usually contain a stream of uniform data elements (similar to vertices or pixels). The input parameters can be used to cheaply tweak the behavior of an existing job object.
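To make the list above a bit more concrete, a job object along these lines might look roughly like the following C++ sketch. All names here (EvaBuffer, EvaParams, EvaJob, ScaleBias) are made up for illustration and are not actual Nebula3 classes, and Run() simply executes on the calling thread instead of dispatching anything asynchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not Nebula3's actual API.
struct EvaBuffer {
    std::vector<float> elements;  // a stream of uniform data elements
};

// A small parameter block, cheap to change between runs of the same job.
struct EvaParams {
    float scale = 1.0f;
    float bias  = 0.0f;
};

// Processing code: reads the inputs and parameters, writes the output.
using EvaProcessFunc = void (*)(const std::vector<const EvaBuffer*>& inputs,
                                const EvaParams& params,
                                EvaBuffer& output);

struct EvaJob {
    std::vector<const EvaBuffer*> inputs;  // one or more input buffers
    EvaBuffer* output = nullptr;           // exactly one output buffer
    EvaParams params;                      // small set of input parameters
    EvaProcessFunc process = nullptr;      // the actual processing code

    // Runs on the calling thread; a real job system would dispatch this
    // to a worker thread, the GPU or an SPU instead.
    void Run() const {
        assert(!inputs.empty() && output != nullptr && process != nullptr);
        process(inputs, params, *output);
    }
};

// Example processing function: out[i] = in0[i] * scale + bias.
inline void ScaleBias(const std::vector<const EvaBuffer*>& inputs,
                      const EvaParams& params,
                      EvaBuffer& output) {
    const EvaBuffer& in = *inputs[0];
    output.elements.resize(in.elements.size());
    for (std::size_t i = 0; i < in.elements.size(); ++i) {
        output.elements[i] = in.elements[i] * params.scale + params.bias;
    }
}
```

Changing params.scale or params.bias and running the job again is the "cheap tweak" case: the buffers and the processing code stay the same.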

For a CPUJob, the input and output buffers would be simple system memory buffers, and the processing code would be standard C/C++ code running in a thread of a thread pool. The processing code should not call any “non-trivial” external functions, so that it remains somewhat portable to the other job types.
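A minimal sketch of the CPU case could look like this. It assumes std::async as a stand-in for a proper thread pool, and the names SquareElements and StartCpuJob are hypothetical. The kernel itself only touches the memory it is handed, which is the property that would keep it portable.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical kernel: plain, self-contained code with no external calls,
// so it could later be ported to an SPU or rewritten as a shader.
inline void SquareElements(const float* in, float* out, std::size_t count) {
    assert(count == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = in[i] * in[i];
    }
}

// Starts the kernel on another thread. std::async stands in for a real
// thread pool here. The caller must keep both buffers alive and untouched
// until the returned future is ready.
inline std::future<void> StartCpuJob(const std::vector<float>& input,
                                     std::vector<float>& output) {
    output.resize(input.size());
    return std::async(std::launch::async, SquareElements,
                      input.data(), output.data(), input.size());
}
```

The only synchronization point is waiting on the future (wait() or get()) before reading the output buffer.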

A “DX9PixelShaderJob” would use textures as input buffers and a render target as output buffer. The input parameters and the processing code would be described by an FX shader file.

SPU jobs would need a way to manually pull data from the input buffers to local memory, and to write blocks of processed data back to the output buffer, possibly using some sort of double buffering.
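The pull/process/write-back loop with two local buffers could be sketched as follows. This is plain C++ that only emulates the pattern: memcpy stands in for the DMA transfers, so nothing actually overlaps here, and the chunk size and all function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical chunk size; a real SPU job would size this to fit local memory.
constexpr std::size_t kChunkElems = 4;

// Stand-in for a DMA "get" from the input buffer into local memory.
// Returns the number of elements fetched.
inline std::size_t FetchChunk(const float* src, std::size_t total,
                              std::size_t offset, float* local) {
    const std::size_t n = std::min(kChunkElems, total - offset);
    std::memcpy(local, src + offset, n * sizeof(float));
    return n;
}

// Processes `total` elements chunk by chunk, alternating between two
// local buffers: one is being processed while the other is being filled.
inline void ProcessStreamDoubleBuffered(const float* in, float* out,
                                        std::size_t total, float scale) {
    if (total == 0) return;
    assert(in != nullptr && out != nullptr);

    float local[2][kChunkElems];
    std::size_t count[2] = {0, 0};
    int cur = 0;
    std::size_t offset = 0;

    count[cur] = FetchChunk(in, total, offset, local[cur]);
    while (count[cur] > 0) {
        const int next = 1 - cur;
        const std::size_t nextOffset = offset + count[cur];

        // Fetch the next chunk into the other buffer. On real hardware this
        // would be an asynchronous transfer running during the loop below.
        count[next] = (nextOffset < total)
                          ? FetchChunk(in, total, nextOffset, local[next])
                          : 0;

        // Process the current chunk in local memory.
        for (std::size_t i = 0; i < count[cur]; ++i) {
            local[cur][i] *= scale;
        }

        // Stand-in for a DMA "put" back to the output buffer.
        std::memcpy(out + offset, local[cur], count[cur] * sizeof(float));

        offset = nextOffset;
        cur = next;
    }
}
```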

A way to cheaply convert/map buffers between the different job types would be desirable (for instance to use an output buffer directly as a DX texture, etc…). DX10 provides a pattern for this with its “resource views”.
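The general idea of a view (one block of memory, several ways of addressing it, no copying) can be illustrated in a few lines of C++. This is only a loose analogy to DX10’s resource views and does not use the D3D API; EvaMemory, StreamView and GridView are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Owns the memory; knows nothing about how it is interpreted.
struct EvaMemory {
    std::vector<float> data;
};

// Non-owning view: the memory as a linear stream of elements.
struct StreamView {
    EvaMemory* mem;
    std::size_t Size() const { return mem->data.size(); }
    float& At(std::size_t i) const {
        assert(i < Size());
        return mem->data[i];
    }
};

// Non-owning view: the same memory addressed as a width x height grid,
// roughly like a single-channel texture.
struct GridView {
    EvaMemory* mem;
    std::size_t width;
    std::size_t height;
    float& At(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        assert(width * height <= mem->data.size());
        return mem->data[y * width + x];
    }
};
```

A job could write its output through a StreamView while a later stage reads the same memory through a GridView, without any conversion step in between.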

A great plan is the simple part of course; the devil is in the implementation. And once EVA is ready for action, the true challenge will be to “de-fragment” the game loop in order to identify and isolate jobs which can be handled asynchronously.