Motion Capturing

Handling and adapting mocap data from an optical system

Project overview

Problems

Smoothly animating virtual characters is a pretty challenging task; that's why motion capture systems exist, which basically translate real-world movement into virtual environments.

One of the many tracking systems suitable for the job is an optical one, which uses a set of markers placed on the body, tracked by several cameras.

Methods

OptiTrack

Tracking

Predicting

Tools

Python;

OpenCV;

Unreal Engine 5;

Goals

Learn to work with optical tracking data.

Also find a way to make it more robust to information loss: a sensor's data is lost if not enough cameras can capture its position.

Move to a virtual environment such as Unreal Engine 5, understanding the tool and modelling a scene with an animated virtual character.

Extract spatial information from it and achieve a 3D to 2D projection of the skeleton joints onto the camera image plane.

Developed in collaboration with Alessandro Lorenzi (2024).

Context

This project was part of the Computer Vision course at the University of Trento (a.y. 2023/2024), during the first year of my Master's degree in AI systems.

I've always found the possibilities of virtual environments fascinating and practically limitless, so I was quite happy to work in such a context within a university project.

Design process

1. Understanding the data

2. Addressing the data loss problem

3. Unreal Engine simulation

4. 3D to 2D projection

The problem

Optical tracking systems can gather information about the trackers at an impressive speed (those we have in the labs go up to 360 fps).
The information provided by those sensors that we mostly care about is the 3D position of each marker at every frame.

We can use that data to build a simple demo even inside Python's Matplotlib, which is not really designed for such tasks... but it's easy and does the job:
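
As a reference, here is a minimal sketch of such a demo, assuming the tracked positions are stored as a `(n_frames, n_markers, 3)` NumPy array; the file name and loading step are illustrative, and `_offsets3d` is a private Matplotlib detail that is nonetheless the usual way to animate a 3D scatter:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical input: (n_frames, n_markers, 3) array of marker positions.
positions = np.load("markers.npy")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
scat = ax.scatter(*positions[0].T)

# Fixed axis limits, so the view doesn't jump between frames.
mins, maxs = positions.min(axis=(0, 1)), positions.max(axis=(0, 1))
ax.set_xlim(mins[0], maxs[0])
ax.set_ylim(mins[1], maxs[1])
ax.set_zlim(mins[2], maxs[2])

def update(frame):
    # _offsets3d wants three separate coordinate sequences.
    scat._offsets3d = tuple(positions[frame].T)
    return (scat,)

anim = FuncAnimation(fig, update, frames=len(positions), interval=1000 / 30)
plt.show()
```
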
This animation was pretty smooth, without any big defects; let's look at a harder case:
Notice the flickering! It occurs when fewer than 3 cameras can see a marker, causing a complete loss of information... this is what we want to fix.

Addressing the problem

A possible solution to the problem is to apply some sort of external tracking, so that when a sensor's information is lost we can try to predict it. A simple way to achieve this is the application of naive filters! We propose two: a Kalman filter (the most popular and obvious choice) & a particle filter.
Starting with the Kalman filter:
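
For reference, here is a minimal sketch of a constant-velocity Kalman filter for a single marker, built on OpenCV's `cv2.KalmanFilter`; the noise covariances are illustrative values that would need tuning on the real data:

```python
import numpy as np
import cv2

# Constant-velocity model: state is (x, y, z, vx, vy, vz), measurement is (x, y, z).
dt = 1.0 / 360.0  # OptiTrack frame period

kf = cv2.KalmanFilter(6, 3)
A = np.eye(6, dtype=np.float32)
A[:3, 3:] = dt * np.eye(3, dtype=np.float32)   # position += velocity * dt
kf.transitionMatrix = A
kf.measurementMatrix = np.hstack([np.eye(3), np.zeros((3, 3))]).astype(np.float32)
kf.processNoiseCov = 1e-4 * np.eye(6, dtype=np.float32)      # illustrative tuning
kf.measurementNoiseCov = 1e-2 * np.eye(3, dtype=np.float32)  # illustrative tuning
kf.errorCovPost = np.eye(6, dtype=np.float32)

def step(measurement):
    """Advance one frame; `measurement` is a 3-vector, or None when the marker is lost."""
    predicted = kf.predict()[:3].ravel()
    if measurement is None:
        return predicted  # no cameras see the marker: fall back to the prediction
    kf.correct(np.float32(measurement).reshape(3, 1))
    return kf.statePost[:3].ravel()
```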

Some shaking is noticeable, but considering the motion's unpredictability and the fact that there is no processing beyond the filter application, that's quite nice.
Switching to the particle filter version:
The particle filter's stochastic nature makes it not really suited for tracking rigid bodies, especially since we can't retain any information about the body's structure: there is no correlation between the particles tracking one marker and those tracking another.
Other, more complex objective functions regulating the particles' behaviour might of course be better suited for the job; a toy version of the basic filter is sketched below.
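
This is a deliberately simple, hypothetical version for a single marker, with a random-walk motion model, a Gaussian likelihood as the objective function and multinomial resampling; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.normal(0.0, 1.0, size=(N, 3))  # candidate 3D positions
weights = np.full(N, 1.0 / N)

def pf_step(measurement, motion_noise=0.01, meas_noise=0.05):
    """One predict/update cycle; `measurement` is a 3-vector or None when lost."""
    global particles, weights
    # Predict: diffuse particles with random-walk motion noise.
    particles += rng.normal(0.0, motion_noise, size=particles.shape)
    if measurement is not None:
        # Weight: Gaussian likelihood of each particle given the observation.
        d2 = np.sum((particles - measurement) ** 2, axis=1)
        weights = np.exp(-d2 / (2 * meas_noise**2))
        weights /= weights.sum()
        # Resample (multinomial) so low-weight particles are dropped.
        particles = particles[rng.choice(N, size=N, p=weights)]
        weights = np.full(N, 1.0 / N)
    # Estimate: weighted mean of the particle cloud.
    return np.average(particles, weights=weights, axis=0)
```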

Unreal Engine simulation

By simulating a virtual environment inside Unreal Engine 5.4 (UE), the goal is to extract the pose information necessary to achieve a 3D to 2D projection of all the joints, together with the skeletal structure, onto the camera image plane.

That was our first time playing around with UE, but after a while we got the hang of it: UE makes it possible to interact with level (scene) components either via C++ code or via blueprints (BP), a visual representation of code functions through node graphs. As it was our first time with the Engine, we went for the BP approach.

First we properly modelled the scene, inserting 2 core blueprints: one containing our main actor and the other containing the camera. Using the animation retargeting feature provided by UE5, we were also easily able to map the provided animation onto another free skeleton from Adobe Mixamo's characters.

After properly positioning the actors in the scene, a LevelSequencer component allows capturing a video with the virtual camera. Using the blueprint engine together with the Json Blueprint Utility plugin, we implemented a script to extract the camera and joint pose data in JSON format.
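The exact schema is best read from the repository; purely as a hypothetical illustration of the kind of export involved (every field name here is made up), it could look like this:

```python
# Purely illustrative structure, not the project's actual schema:
# camera pose plus per-frame joint positions, as exported from UE.
export = {
    "camera": {
        "location": [520.0, -300.0, 180.0],  # UE world coordinates (cm)
        "rotation": [0.0, -30.0, 0.0],       # pitch, yaw, roll (degrees)
        "fov": 90.0,                         # horizontal field of view (degrees)
    },
    "frames": [
        {"time": 0.0, "joints": {"head": [10.2, 4.1, 172.5],
                                 "hand_l": [35.0, -20.3, 110.8]}},
    ],
}
```
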
You can also play around with the interactive blueprint visualization, available HERE (enter full screen for a better view).

3D to 2D projection

We decided to go with OpenCV as the image processing framework, as it provides everything we need to perform a 3D to 2D projection of a set of points onto the camera image plane. We had some things to deal with first:

UE5 and OpenCV use 2 different coordinate systems: UE5's world frame is left-handed with X forward, Y right and Z up (in centimetres), while OpenCV's camera frame is right-handed with X right, Y down and Z forward.

That means all the position and rotation data we gathered need to be converted by performing a change of basis. Such a task can easily be handled using homogeneous transformation matrices, as well explained here.
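
A minimal sketch of the point conversion, assuming the axis conventions above (the exact mapping depends on how the scene is set up, so treat the matrix as illustrative rather than as the project's actual transform):

```python
import numpy as np

# Change of basis from UE's left-handed, X-forward/Y-right/Z-up world frame (cm)
# to an OpenCV-style right-handed X-right/Y-down/Z-forward frame (m).
UE_TO_CV = np.array([
    [0.0, 1.0,  0.0],  # CV x <-  UE y (right)
    [0.0, 0.0, -1.0],  # CV y <- -UE z (up becomes down)
    [1.0, 0.0,  0.0],  # CV z <-  UE x (forward)
])

def ue_point_to_cv(p_ue_cm):
    """Convert a 3D point from UE world coordinates (cm) to the CV frame (m)."""
    return (UE_TO_CV @ np.asarray(p_ue_cm, dtype=np.float64)) / 100.0

# The same change of basis in homogeneous form, convenient for chaining
# with the camera extrinsics:
T = np.eye(4)
T[:3, :3] = UE_TO_CV / 100.0  # axis swap + cm -> m scaling
```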

Knowing the world coordinate references of both the camera and the joints, we just need to extract the camera intrinsics and compute the projection. Camera intrinsics are generally obtained via camera calibration, but since this is a controlled environment with no distortion we can directly compute them with some algebra. Computing the transformation for each point with cv.projectPoints(), we're able to correctly place the skeleton on the image plane.
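
For instance, a pinhole intrinsic matrix can be derived from the render resolution and the camera's field of view; the 1920x1080 resolution and 90-degree horizontal FOV below are assumed values, not necessarily the project's settings:

```python
import numpy as np
import cv2

# Intrinsics from the virtual camera's settings instead of calibration.
width, height, hfov_deg = 1920, 1080, 90.0           # assumed render settings
fx = (width / 2) / np.tan(np.radians(hfov_deg) / 2)  # focal length in pixels
fy = fx                                              # square pixels on a synthetic camera
K = np.array([[fx, 0.0, width / 2],
              [0.0, fy, height / 2],
              [0.0, 0.0, 1.0]])

# joints_3d: (N, 3) joint positions already converted to the CV frame;
# rvec/tvec are the world -> camera rotation and translation, and the
# distortion coefficients are None because the virtual camera has none.
joints_3d = np.zeros((1, 3))  # placeholder data
rvec = np.zeros(3)
tvec = np.zeros(3)
points_2d, _ = cv2.projectPoints(joints_3d, rvec, tvec, K, None)
```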

Bonus

Instead of evaluating results in Matplotlib, it would be much better to forward the data to a more suitable environment such as Blender. To achieve this, we used the deep-motion-editing repository as a basis, which provides a framework for building skeleton-aware neural networks by interacting with Blender's Python APIs.

Further information & source code available on GitHub!

My contribution to the project

The team worked as a compact unit through almost all the phases of the project.