Cross-Platform Multitouch Input

TL;DR: A look at the low-level touch-input APIs on iOS, Android NDK and emscripten, and how to unify them for cross-platform engines, with links to source code.

Why

Compared to mouse, keyboard and gamepad input, handling multi-touch input is a complex topic because it usually involves gesture recognition, at least for simple gestures like tapping, panning and pinching. When I worked on mobile platforms in the past, I usually tried to avoid processing low-level touch events directly and instead used the gesture recognizers provided by the platform SDKs:

On iOS, gesture recognizers are provided by UIKit. They are attached to a UIView object, and when a gesture recognizer detects a gesture, it invokes a callback method. The details are here: GestureRecognizer_basics.html

The Android NDK itself has no built-in gesture recognizers, but it comes with source code for a few simple gesture detectors in the ndk_helpers source code directory.

There are 2 problems with using SDK-provided gesture detectors. First, iOS and Android detectors behave differently: a pinch in the Android NDK is something slightly different than a pinch in the iOS SDK. Second, the emscripten SDK only provides the low-level touch events of the HTML5 Touch Event API, and no high-level gesture recognizers.

So, to handle all 3 platforms in a common way, there doesn’t seem to be a way around writing your own gesture recognizers and reducing the platform-specific touch event information to a platform-agnostic common subset.

Platform-specific touch events

Let’s first look at the low-level touch events provided by each platform in order to merge their common attributes into a generic touch event:

iOS touch events

On iOS, touch events are forwarded to UIView callback methods (more precisely, methods of UIResponder, which is a parent class of UIView). Multi-touch is disabled by default and must be enabled first by setting the property multipleTouchEnabled to YES.

The callback methods are:

- touchesBegan:withEvent:
- touchesMoved:withEvent:
- touchesEnded:withEvent:
- touchesCancelled:withEvent:

All methods receive an NSSet of UITouch objects as the first argument and a UIEvent as the second argument.

The arguments are a bit non-obvious: the set of UITouches in the first argument does not contain all current touches, but only the touches that have changed their state. So if there are already 2 fingers down and a 3rd finger touches the display, touchesBegan will be called with a single UITouch object in the NSSet argument, which describes the touch of the 3rd finger that just came down. The same goes for touchesEnded and touchesMoved: if one of 3 fingers goes up (or moves), the NSSet will only contain a single UITouch object for the finger that has changed its state.

The overall number of current touches is contained in the UIEvent object, so if 3 fingers are down, the UIEvent object contains 3 UITouch objects. The 4 callback methods and the NSSet argument are actually redundant, since all that information is also contained in the UIEvent object. A single touchesChanged callback method with a single UIEvent argument would have been enough to communicate the same information.

Let’s have a look at the information provided by UIEvent. First, there’s the method allTouches, which returns an NSSet of all UITouch objects in the event, and there’s a timestamp of when the event occurred. The rest is contained in the returned UITouch objects:

The UITouch method locationInView: provides the position of the touch, and the phase value gives the current state of the touch (began, moved, stationary, ended, cancelled). The remaining information is either not needed or specific to the iOS platform.
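To make the “changed touches vs. all touches” distinction concrete, here is a platform-neutral C++ sketch of the conversion step, written so that it compiles without UIKit. FakeTouch is a hypothetical stand-in for a UITouch object, the all vector plays the role of [event allTouches], and the changed set plays the role of the NSSet argument; in a real Objective-C++ callback the position would come from [touch locationInView:view]. The address of each touch object is used as its identifier, which is how touches can be told apart on iOS.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a UITouch object: only the position matters here.
struct FakeTouch {
    float x, y;
};

// Trimmed-down generic touch point (compare the touchEvent struct further down).
struct GenericPoint {
    std::uintptr_t identifier = 0;
    float x = 0.0f, y = 0.0f;
    bool isChanged = false;
};

// Build the generic point list from "all touches" (UIEvent allTouches) and
// "changed touches" (the NSSet passed to touchesBegan/Moved/Ended/Cancelled).
std::vector<GenericPoint> convertTouches(const std::vector<const FakeTouch*>& all,
                                         const std::set<const FakeTouch*>& changed) {
    std::vector<GenericPoint> points;
    for (const FakeTouch* t : all) {
        GenericPoint p;
        // The object stays alive for the whole touch, so its address can
        // double as a unique identifier.
        p.identifier = reinterpret_cast<std::uintptr_t>(t);
        p.x = t->x;
        p.y = t->y;
        p.isChanged = changed.count(t) != 0;
        points.push_back(p);
    }
    return points;
}
```

In the scenario above (a 3rd finger touching down while 2 fingers are already down), all three points are converted, but only the third one has isChanged set.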

Android NDK touch events

On Android, I assume that the Native Activity is used together with the android_native_app_glue.h helper code. The application wrapper struct android_app lets you set a single input event callback function, which is called whenever an input event occurs. Android NDK input events and their accessor functions are defined in the “android/input.h” header. The input event struct AInputEvent itself is opaque and can only be accessed through the accessor functions defined in the same header.

When an input event arrives at the user-defined callback function, first check whether it is actually a touch event:

int32_t type = AInputEvent_getType(aEvent);
if (AINPUT_EVENT_TYPE_MOTION == type) {
  // yep, a touch event
}

Once we know that the event is a touch event, the AMotionEvent_ family of accessor functions must be used to extract the rest of the information. There’s a whole lot of them, but we’re only interested in the attributes that are also provided by the other platforms:

AMotionEvent_getAction();
AMotionEvent_getEventTime();
AMotionEvent_getPointerCount();
AMotionEvent_getPointerId();
AMotionEvent_getX();
AMotionEvent_getY();

Together, these functions provide the same information as the iOS UIEvent object, but the information is harder to extract.

Let’s start with the simple stuff: A motion event contains an array of touch points, called ‘pointers’, one for each finger touching the display. The number of touch points is returned by the AMotionEvent_getPointerCount() function, which takes an AInputEvent* as argument. The accessor functions AMotionEvent_getPointerId(), AMotionEvent_getX() and AMotionEvent_getY() take an AInputEvent* and an index to acquire an attribute of the touch point at the specified index. AMotionEvent_getX()/getY() extract the X/Y position of the touch point, and the AMotionEvent_getPointerId() function returns a unique id which is required to track the same touch point across several input events.
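The per-pointer accessors are meant to be called in a loop over the pointer indices. The sketch below shows that loop in plain C++ so that it builds without the NDK: the hypothetical getId/getX/getY callables stand in for AMotionEvent_getPointerId(), AMotionEvent_getX() and AMotionEvent_getY(), which in real code would be called with the AInputEvent* and the pointer index, for instance wrapped as [aEvent](size_t i) { return AMotionEvent_getX(aEvent, i); }.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Trimmed-down generic touch point.
struct GenericPoint {
    std::uintptr_t identifier = 0;
    float x = 0.0f, y = 0.0f;
    bool isChanged = false;
};

// Hypothetical accessor signatures mirroring the NDK functions, minus the
// AInputEvent* argument (which a real lambda would capture).
using IdFn = std::function<std::int32_t(std::size_t)>;
using PosFn = std::function<float(std::size_t)>;

std::vector<GenericPoint> collectPointers(std::size_t pointerCount,
                                          const IdFn& getId,
                                          const PosFn& getX,
                                          const PosFn& getY) {
    std::vector<GenericPoint> points(pointerCount);
    for (std::size_t i = 0; i < pointerCount; i++) {
        // The index i is only meaningful within this event; the pointer id
        // is what stays stable across events.
        points[i].identifier = static_cast<std::uintptr_t>(getId(i));
        points[i].x = getX(i);
        points[i].y = getY(i);
    }
    return points;
}
```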

AMotionEvent_getAction() provides 2 pieces of information in a single return value: the actual ‘action’, and the index of the touch point this action applies to:

The lower 8 bits of the return value contain the action code for a touch point that has changed state (whether a touch has started, moved, ended or was cancelled):

AMOTION_EVENT_ACTION_DOWN
AMOTION_EVENT_ACTION_UP
AMOTION_EVENT_ACTION_MOVE
AMOTION_EVENT_ACTION_CANCEL
AMOTION_EVENT_ACTION_POINTER_DOWN
AMOTION_EVENT_ACTION_POINTER_UP

Note that there are 2 down events, DOWN and POINTER_DOWN. The NDK differentiates between ‘primary’ and ‘non-primary pointers’. The first finger down generates a DOWN event, the following fingers POINTER_DOWN events. I haven’t found a reason why these should be handled differently, so both DOWN and POINTER_DOWN events are handled the same in my code.

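The next 8 bits (bits 8 to 15, extracted with AMOTION_EVENT_ACTION_POINTER_INDEX_MASK and AMOTION_EVENT_ACTION_POINTER_INDEX_SHIFT) contain the index (not the identifier!) of the touch point that has changed its state.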
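Decoding the combined return value takes two masks: the action code sits in the low byte, and the pointer index in bits 8 to 15 (mask 0xff00). The constants below are copies of the definitions in android/input.h, placed in a hypothetical ndk_mirror namespace only so that the sketch compiles without the NDK; real code should include the header instead. DOWN/POINTER_DOWN and UP/POINTER_UP map to the same generic types, as described above. For MOVE (and CANCEL) the sketch reports no single pointer index, since a move event can carry updated positions for several pointers at once; treating all pointers as changed in that case is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Copies of the values from <android/input.h>, only so this compiles without the NDK.
namespace ndk_mirror {
enum : std::int32_t {
    AMOTION_EVENT_ACTION_MASK = 0xff,
    AMOTION_EVENT_ACTION_POINTER_INDEX_MASK = 0xff00,
    AMOTION_EVENT_ACTION_DOWN = 0,
    AMOTION_EVENT_ACTION_UP = 1,
    AMOTION_EVENT_ACTION_MOVE = 2,
    AMOTION_EVENT_ACTION_CANCEL = 3,
    AMOTION_EVENT_ACTION_POINTER_DOWN = 5,
    AMOTION_EVENT_ACTION_POINTER_UP = 6,
};
constexpr std::int32_t AMOTION_EVENT_ACTION_POINTER_INDEX_SHIFT = 8;
}  // namespace ndk_mirror

enum class TouchType { began, moved, ended, cancelled, invalid };

struct DecodedAction {
    TouchType type;
    int pointerIndex;  // -1: no single pointer (MOVE, CANCEL, unknown)
};

DecodedAction decodeAction(std::int32_t action) {
    using namespace ndk_mirror;
    const std::int32_t code = action & AMOTION_EVENT_ACTION_MASK;
    const int index = (action & AMOTION_EVENT_ACTION_POINTER_INDEX_MASK) >>
                      AMOTION_EVENT_ACTION_POINTER_INDEX_SHIFT;
    switch (code) {
        case AMOTION_EVENT_ACTION_DOWN:          // primary pointer
        case AMOTION_EVENT_ACTION_POINTER_DOWN:  // additional pointers
            return {TouchType::began, index};
        case AMOTION_EVENT_ACTION_UP:
        case AMOTION_EVENT_ACTION_POINTER_UP:
            return {TouchType::ended, index};
        case AMOTION_EVENT_ACTION_MOVE:
            return {TouchType::moved, -1};
        case AMOTION_EVENT_ACTION_CANCEL:
            return {TouchType::cancelled, -1};
        default:
            return {TouchType::invalid, -1};
    }
}
```

For example, a second finger coming down produces the value 0x0105: action code 5 (POINTER_DOWN) in the low byte and pointer index 1 in the next byte.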

emscripten SDK touch events

Touch input in emscripten is provided by the new HTML5 wrapper API in the ‘emscripten/html5.h’ header, which lets you set callback functions for nearly all types of HTML5 events (the complete API documentation can be found here).

To receive touch events, the following 4 functions are relevant:

emscripten_set_touchstart_callback()
emscripten_set_touchend_callback()
emscripten_set_touchmove_callback()
emscripten_set_touchcancel_callback()

These set the application-provided callback functions that are called when a touch event occurs.

There’s a caveat when handling touch input in the browser: usually a browser application doesn’t start in fullscreen mode, and the browser itself uses gestures for navigation (like scrolling, page-back and page-forward). The emscripten API allows restricting the events to specific DOM elements (for instance the WebGL canvas of the application instead of the whole HTML document), and the callback can decide to ‘swallow’ the event so that the browser’s default handling is suppressed.

The first argument to the callback setter functions above is a C-string pointer identifying the DOM element. If this is a null pointer, events from the whole webpage will be received. The most useful value is “#canvas”, which limits the events to the (WebGL) canvas managed by the emscripten app.

In order to suppress default handling of an event, the event callback function should return ‘true’ (false if default handling should happen, but this is usually not desired, at least for games).

The touch event callback function is called with the following arguments:

int eventType,
const EmscriptenTouchEvent* event,
void* userData

eventType will be one of:

EMSCRIPTEN_EVENT_TOUCHSTART
EMSCRIPTEN_EVENT_TOUCHEND
EMSCRIPTEN_EVENT_TOUCHMOVE
EMSCRIPTEN_EVENT_TOUCHCANCEL

The 4 different callbacks are again somewhat redundant (as on iOS), so it often makes sense to route all 4 callbacks to the same handler function and differentiate there through the eventType argument.

The actual touch event data is contained in the EmscriptenTouchEvent structure. Interesting for us are the member int numTouches and an array of EmscriptenTouchPoint structs. A single EmscriptenTouchPoint has the fields identifier, isChanged and the position of the touch in canvasX, canvasY (other members omitted for clarity).
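Routing all four callbacks to one handler could look roughly like the following sketch. To keep it compilable outside emscripten, MockEventType, MockTouchPoint and MockTouchEvent are hypothetical stand-ins that only carry the fields discussed above. In a real build you would include emscripten/html5.h, switch on the EMSCRIPTEN_EVENT_TOUCH* constants, take a const EmscriptenTouchEvent*, pass the output object through the userData pointer, and register the same function with all four emscripten_set_touch*_callback() setters.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the emscripten types (real code: emscripten/html5.h).
enum class MockEventType { touchStart, touchEnd, touchMove, touchCancel };
struct MockTouchPoint {
    long identifier;
    bool isChanged;
    long canvasX, canvasY;
};
struct MockTouchEvent {
    int numTouches;
    MockTouchPoint touches[32];
};

// Trimmed-down generic event.
enum class TouchType { began, moved, ended, cancelled, invalid };
struct GenericPoint {
    std::uintptr_t identifier = 0;
    float x = 0.0f, y = 0.0f;
    bool isChanged = false;
};
struct GenericEvent {
    TouchType type = TouchType::invalid;
    std::vector<GenericPoint> points;
};

// One handler for all four callbacks; eventType selects the generic type.
// Returns true so the browser skips its default handling (scrolling etc.).
bool handleTouch(MockEventType eventType, const MockTouchEvent* e, GenericEvent& out) {
    switch (eventType) {
        case MockEventType::touchStart:  out.type = TouchType::began; break;
        case MockEventType::touchEnd:    out.type = TouchType::ended; break;
        case MockEventType::touchMove:   out.type = TouchType::moved; break;
        case MockEventType::touchCancel: out.type = TouchType::cancelled; break;
    }
    out.points.clear();
    for (int i = 0; i < e->numTouches; i++) {
        const MockTouchPoint& tp = e->touches[i];
        GenericPoint p;
        p.identifier = static_cast<std::uintptr_t>(tp.identifier);
        p.x = static_cast<float>(tp.canvasX);
        p.y = static_cast<float>(tp.canvasY);
        p.isChanged = tp.isChanged;
        out.points.push_back(p);
    }
    return true;
}
```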

Except for the timestamp of the event, this is the same information provided by the iOS and Android NDK touch APIs.

Bringing it all together

The cross-section of all 3 touch APIs provides the following information:

- a notification when the touch state changes:
    - a touch-down was detected (a new finger touches the display)
    - a touch-up was detected (a finger was lifted off the display)
    - a movement was detected
    - a cancellation was detected
- information about all current touch points, and which of them have changed state:
    - the x,y position of the touch
    - a unique identifier in order to track the same touch point over several input events

The touch point identifier is a bit non-obvious in the iOS API, since the UITouch class doesn’t have an identifier member. On iOS, the pointer to a UITouch object serves as the identifier: the same UITouch object is guaranteed to exist as long as the touch is active.
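Point indices are not stable across events (on Android, for instance, the indices of the remaining pointers can shift when another finger is lifted), so gesture code should look touch points up by identifier rather than by position in the array. A minimal helper, assuming a trimmed-down copy of the generic point struct:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Trimmed-down generic touch point.
struct GenericPoint {
    std::uintptr_t identifier = 0;
    float x = 0.0f, y = 0.0f;
    bool isChanged = false;
};

// Returns the index of the point with the given identifier, or -1 if that
// touch is not part of this event (e.g. the finger was lifted).
int findPoint(const std::vector<GenericPoint>& points, std::uintptr_t id) {
    for (std::size_t i = 0; i < points.size(); i++) {
        if (points[i].identifier == id) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```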

Another crucial piece of information is the timestamp of when the event occurred. iOS and Android NDK provide this with their touch events, but the emscripten SDK does not. Since the timestamps on Android and iOS have different meanings anyway, I’m simply tracking my own time when the events are received.
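Tracking your own receive time can be as simple as reading a monotonic clock in each platform callback. This sketch uses std::chrono::steady_clock in place of Oryol’s TimePoint (an assumption for illustration). Note that it records when the application received the event, not when the finger actually touched the display.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Stamp an event with the time it was received by the application.
struct StampedEvent {
    Clock::time_point time;
};

StampedEvent stampNow() {
    return StampedEvent{Clock::now()};
}

// Elapsed time between two events in milliseconds, e.g. for a tap timeout.
double millisBetween(const StampedEvent& a, const StampedEvent& b) {
    return std::chrono::duration<double, std::milli>(b.time - a.time).count();
}
```

A monotonic clock is used rather than the wall clock, so durations between touch events can’t jump if the system time is adjusted.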

My unified, platform-agnostic touchEvent now basically looks like this:

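struct touchEvent {
    enum touchType {
        began,
        moved,
        ended,
        cancelled,
        invalid,
    } type = invalid;
    TimePoint time;                  // time when the event was received
    int32 numTouches = 0;            // number of valid entries in points[]
    static const int32 MaxNumPoints = 8;
    struct point {
        uintptr identifier = 0;      // platform touch id (UITouch* on iOS)
        glm::vec2 pos;               // position of the touch
        bool isChanged = false;      // did this touch change state in this event?
    } points[MaxNumPoints];
};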

TimePoint is an Oryol-style timestamp object. The uintptr datatype for the identifier is an unsigned integer with the size of a pointer (32- or 64-bit depending on platform).

Platform-specific touch events are received, converted to generic touch events, and then fed into custom gesture recognizers:

Simple gesture detector source code:
- tap detector
- panning detector
- pinch detector
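The linked detectors aren’t reproduced here. As a rough illustration of how the generic events can drive a recognizer, here is a hypothetical, stripped-down pinch tracker (not the Oryol implementation): it remembers the identifiers and the distance of the two fingers that started the gesture, and reports the current distance relative to that start distance as the scale factor. Looking the fingers up by identifier keeps it correct even if a platform reports the two touches in a different order from one event to the next.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Trimmed-down generic touch point.
struct GenericPoint {
    std::uintptr_t identifier = 0;
    float x = 0.0f, y = 0.0f;
};

// Hypothetical minimal pinch tracker: scale = current distance / start distance.
class PinchTracker {
public:
    // Feed the touch points of every event; returns true while a pinch is active.
    bool update(const std::vector<GenericPoint>& points) {
        if (points.size() != 2) {
            active = false;  // a pinch needs exactly two fingers
            return false;
        }
        const GenericPoint* a = find(points, id0);
        const GenericPoint* b = find(points, id1);
        if (!active || a == nullptr || b == nullptr) {
            // (Re)start the gesture with the current pair of fingers.
            id0 = points[0].identifier;
            id1 = points[1].identifier;
            startDist = distance(points[0], points[1]);
            active = startDist > 0.0f;
            curScale = 1.0f;
            return active;
        }
        curScale = distance(*a, *b) / startDist;
        return true;
    }
    float scale() const { return curScale; }

private:
    static const GenericPoint* find(const std::vector<GenericPoint>& pts, std::uintptr_t id) {
        for (const GenericPoint& p : pts) {
            if (p.identifier == id) {
                return &p;
            }
        }
        return nullptr;
    }
    static float distance(const GenericPoint& a, const GenericPoint& b) {
        return std::hypot(b.x - a.x, b.y - a.y);
    }

    bool active = false;
    std::uintptr_t id0 = 0, id1 = 0;
    float startDist = 0.0f;
    float curScale = 1.0f;
};
```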

And a simple demo (the WebGL version has only been tested on iOS 8; mobile Safari’s WebGL implementation still has bugs):
- WebGL demo
- Android self-signed APK

And that’s all for today :)

Written with StackEdit.