Skeleton Driven Non-rigid Motion Tracking and 3D Reconstruction

DICTA 2018

Shafeeq Elanattil 1,2
Peyman Moghadam1,2
Simon Denman2
Sridha Sridharan2
Clinton Fookes2

Autonomous Systems Laboratory, CSIRO Data61, Brisbane, Australia1

Queensland University of Technology, Brisbane, Australia2


Qualitative results of live 3D reconstruction from the 'Exercise' data sequence in our synthetic dataset.
The upper row shows images at different frame indices and the lower row shows the respective 3D reconstructions.

This paper presents a method which can track and 3D reconstruct the non-rigid surface motion of human performance using a moving RGB-D camera. 3D reconstruction of marker-less human performance is a challenging problem due to the large range of articulated motions and considerable non-rigid deformations. Current approaches use local optimization for tracking. These methods need many iterations to converge and may get stuck in local minima during sudden articulated movements. We propose a puppet model-based tracking approach using skeleton prior, which provides a better initialization for tracking articulated movements. The proposed approach uses an aligned puppet model to estimate correct correspondences for human performance capture. We also contribute a synthetic dataset which provides ground truth locations for frame-by-frame geometry and skeleton joints of human subjects. Experimental results show that our approach is more robust when faced with sudden articulated motions, and provides better 3D reconstruction compared to the existing state-of-the-art approaches.



Paper

Shafeeq Elanattil, Peyman Moghadam, Simon Denman, Sridha Sridharan, Clinton Fookes

Skeleton Driven Non-rigid Motion Tracking and 3D Reconstruction

DICTA 2018

[pdf] [poster] [bibtex]


Overview


Block diagram of inputs and outputs of our proposed system.
(a), (b) and (c) show the colour, depth and skeleton inputs at the current frame.
(d) and (e) illustrate the puppet model and 3D reconstruction outputs at the previous frame, and (f) and (g) are the puppet model and 3D reconstruction outputs at the current frame.


Our approach operates in a frame-to-frame manner. For each frame we perform three steps in sequence: first, the motion is tracked by a puppet model using the current frame's skeleton prior; second, non-rigid tracking is carried out, with the puppet model's transformations used to initialize tracking and correspondence estimation; third, volumetric fusion is carried out as in state-of-the-art approaches. A block diagram of our proposed system is shown above. Note that, unlike other approaches, our system takes the skeleton prior as a per-frame input alongside the RGB-D data.
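The per-frame loop can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all function bodies are hypothetical stubs standing in for the real puppet-tracking, non-rigid-solving and fusion stages, and only the ordering of the three steps reflects the system described above.

```python
def track_puppet(skeleton_prior, puppet_prev):
    # Step 1 (stub): align the puppet model to the current frame's skeleton prior.
    puppet = dict(puppet_prev)
    puppet["pose"] = skeleton_prior
    return puppet

def nonrigid_track(rgbd, puppet):
    # Step 2 (stub): non-rigid surface tracking, initialized from the
    # puppet model's per-part transformations.
    return rgbd

def volumetric_fuse(volume, surface):
    # Step 3 (stub): volumetric fusion of the tracked surface,
    # as in state-of-the-art approaches.
    volume.append(surface)
    return volume

def process_sequence(frames):
    # Each frame supplies RGB-D data plus a skeleton prior.
    puppet, volume = {"pose": None}, []
    for rgbd, skeleton in frames:
        puppet = track_puppet(skeleton, puppet)
        surface = nonrigid_track(rgbd, puppet)
        volume = volumetric_fuse(volume, surface)
    return puppet, volume
```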



Methodology



Calculation of the initial transformation of each body part.
The initial rigid transformation (Rinit, tinit) of each body part is calculated using the angle between the skeleton bones, as shown above.
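As a rough illustration of this step, a rotation aligning a bone's direction in the previous frame with its direction in the current frame can be recovered via Rodrigues' formula, with the translation then derived from the parent joint. This is a hedged sketch under that assumption; `rotation_between` and `initial_transform` are hypothetical names, not the paper's code.

```python
import numpy as np

def rotation_between(u, v):
    """Rotation matrix taking direction u onto direction v (Rodrigues' formula)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    k = np.cross(u, v)                       # rotation axis (unnormalised)
    s, c = np.linalg.norm(k), float(np.dot(u, v))
    if s < 1e-9:
        if c > 0:                            # bones already parallel
            return np.eye(3)
        # Antiparallel: rotate 180 degrees about any axis perpendicular to u.
        a = np.cross(u, [1.0, 0.0, 0.0])
        if np.linalg.norm(a) < 1e-9:
            a = np.cross(u, [0.0, 1.0, 0.0])
        a = a / np.linalg.norm(a)
        return 2.0 * np.outer(a, a) - np.eye(3)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])       # skew-symmetric cross-product matrix
    return np.eye(3) + K + K @ K * ((1.0 - c) / (s * s))

def initial_transform(joint_prev, joint_cur, child_prev, child_cur):
    """Initial rigid transform (R_init, t_init) for a body part, from the
    bone (parent joint -> child joint) in the previous and current frames."""
    R = rotation_between(child_prev - joint_prev, child_cur - joint_cur)
    t = joint_cur - R @ joint_prev           # so the parent joint maps exactly
    return R, t
```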

Correspondence estimation.
For a point vc in the reconstruction, the nearest neighbour vcp in the puppet model is estimated (shown in the first and second images from the left). The corresponding point vlp of vcp in the aligned puppet is used to find the nearest neighbour in the target cloud vt (shown in the third and fourth images). The correspondence from vc to vt is established in this way.
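This two-hop lookup can be sketched with a brute-force nearest-neighbour search, assuming the canonical and aligned (live) puppet clouds share vertex indices; in practice a KD-tree would replace the linear scan, and the function names below are illustrative rather than taken from the paper.

```python
import numpy as np

def nearest_index(points, query):
    """Index of the nearest neighbour of `query` in `points` (brute force)."""
    return int(np.argmin(np.linalg.norm(points - query, axis=1)))

def correspond(v_c, puppet_canonical, puppet_live, target):
    """Route a reconstruction point v_c to the target cloud via the puppet:
    v_c -> nearest canonical puppet vertex v_cp -> same vertex v_lp in the
    aligned (live) puppet -> nearest point v_t in the target cloud."""
    i = nearest_index(puppet_canonical, v_c)   # v_c -> v_cp
    v_lp = puppet_live[i]                      # v_cp -> v_lp (shared index)
    j = nearest_index(target, v_lp)            # v_lp -> v_t
    return target[j]
```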



The key contributions of our methodology are the calculation of the initial transformation of each body part using the skeleton prior, and correspondence estimation using the puppet model.

Please see the paper for more details.

Results



Qualitative results of motion tracking from the 'Boxing' data sequence from our dataset.
The upper row shows images of different frames and the lower row shows the respective deformed 3D model. The frame index is shown below each image.



Qualitative results of motion tracking from the 'Exercise' data sequence from our dataset.
The upper row shows images of different frames and the lower row shows the respective deformed 3D model. The frame index is shown below each image.



Qualitative results of live 3D reconstruction from the 'Boxing' sequence.
The upper row shows images of different frames and the lower row shows the respective 3D reconstructions. The frame index is shown below each image.


Limitations and Future work

One of the major limitations of this work is that we used ground-truth skeleton joints in our experiments. In a stand-alone system these would have to come from a skeleton joint detector. We did not use a detector because it introduces new challenges for our performance capture method, due to the potential for detection failures or wrong detections caused by self-occlusions. In future work we plan to address these challenges and extend our method to a stand-alone system.

Acknowledgements

We thank Chanoh Park, Agniva Sengupta and Eranda Tennakoon for helpful discussions. This webpage template is taken from humans working on 3D who borrowed it from some colorful folks.