The following video shows the Full Body Pose Estimation Library in action on a live video source:
Pose estimation has become a topic of active interest in computer vision. At RidgeRun we decided to take part and started our own research project on three-dimensional human pose estimation. The full-body pose estimation library is a set of Python modules that carry out the whole pose estimation process, as shown in Figure 1 below. This process includes camera calibration, video capture, the pose estimation itself, and refinement of the estimate. The main objective of the library is to serve as a starting point for human motion analysis in sports. However, most of the modules are designed so that they can be reused in other applications.
Figure 1. Demo example (kinematic fitting) of the final output of the pose estimation process.
The library has been designed for high modularity and extensibility. Figure 2 shows the general design of the library, including the processing modules for each stage of the estimation process. One advantage of this design is that input sources can be extended: the library can consume video from almost any source through a simple, extensible interface.
Figure 2. Library design overview.
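To make the idea of an extensible input interface more concrete, here is a minimal sketch of how such a design could look. It is not the library's actual API: the names `VideoSource`, `ListSource`, and `count_frames` are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod


class VideoSource(ABC):
    """Hypothetical interface: anything that can hand out frames one at a time."""

    @abstractmethod
    def read(self):
        """Return the next frame, or None when the source is exhausted."""


class ListSource(VideoSource):
    """Toy source backed by an in-memory list of frames (illustration only)."""

    def __init__(self, frames):
        self._frames = iter(frames)

    def read(self):
        # next() with a default returns None once the list is exhausted.
        return next(self._frames, None)


def count_frames(source):
    """Consume any VideoSource without knowing where its frames come from."""
    count = 0
    while source.read() is not None:
        count += 1
    return count
```

Under this assumption, supporting a new camera, file, or network stream would only require another subclass that implements `read()`; the processing code that consumes frames would stay unchanged.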
Given the final goal of the library, accuracy is a critical factor and bone length consistency is a must. For this reason, the library includes a skeleton calibration stage that ensures the estimated skeleton matches the subject's dimensions.
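As an illustration of what bone length consistency means in practice, the sketch below rescales each bone of an estimated skeleton to a calibrated length while keeping its estimated direction. This is one simple way to enforce the constraint, not necessarily the method the library uses. The skeleton layout, the function name `enforce_bone_lengths`, and the example lengths are all assumptions made for this example.

```python
import math


def enforce_bone_lengths(joints, parents, target_lengths):
    """Return joint positions whose bones have the calibrated lengths.

    joints:         joint name -> estimated (x, y, z) position
    parents:        joint name -> parent joint name (None for the root)
    target_lengths: joint name -> calibrated length of the bone to its parent
    """
    # Build a parent -> children map so the tree can be walked from the root.
    children = {}
    for joint, parent in parents.items():
        children.setdefault(parent, []).append(joint)

    # Root joints keep their estimated positions.
    fixed = {root: tuple(joints[root]) for root in children.get(None, [])}
    stack = list(fixed)

    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            # Bone direction comes from the original estimate.
            direction = [c - p for c, p in zip(joints[child], joints[parent])]
            norm = math.sqrt(sum(d * d for d in direction))
            if norm == 0:
                raise ValueError(f"zero-length bone at joint {child!r}")
            scale = target_lengths[child] / norm
            # Place the child relative to the already corrected parent.
            fixed[child] = tuple(
                fp + d * scale for fp, d in zip(fixed[parent], direction)
            )
            stack.append(child)
    return fixed


# Hypothetical three-joint leg; any tree-shaped skeleton works the same way.
PARENTS = {"hip": None, "knee": "hip", "ankle": "knee"}
ESTIMATE = {"hip": (0.0, 0.0, 0.0), "knee": (0.0, 0.0, -2.0), "ankle": (0.0, 0.0, -5.0)}
CALIBRATED = {"knee": 1.0, "ankle": 1.5}
corrected = enforce_bone_lengths(ESTIMATE, PARENTS, CALIBRATED)
```

Because the skeleton is described only by a parent map, the same function applies to any tree-shaped skeleton, human or otherwise.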
One of the main advantages of the library is that it is agnostic to the subject: it can be used with any skeleton and is not limited to human poses. Figure 3 below shows an example of the library using a dog skeleton.
Figure 3. Example of the library using a dog skeleton instead of a human skeleton.
The library includes a calibration framework that can calibrate the cameras using either AprilTags or a traditional chessboard, as depicted in Figures 4a and 4b below. This flexibility makes it possible to choose the calibration process that best suits the application.
Figure 4a. Model of a dual-camera system capturing the chessboard calibration object.
Figure 4b. Model of a dual-camera system capturing the AprilTag calibration object.
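Both calibration targets work on the same principle: they provide reference points whose real-world positions are known and can be matched against their detections in each camera image. As a minimal sketch of the chessboard case, the hypothetical helper below generates the 3D coordinates of the board's inner corners, assuming a flat board lying on the Z = 0 plane. The function name and parameters are illustrative and not part of the library.

```python
def chessboard_object_points(cols, rows, square_size):
    """3D coordinates of a flat chessboard's inner corners, listed row by row.

    cols, rows:  number of inner corners along each board axis
    square_size: side length of one square, in the unit wanted for the results
    """
    return [
        (c * square_size, r * square_size, 0.0)
        for r in range(rows)
        for c in range(cols)
    ]


# A small 3x2 grid of inner corners with 25 mm squares, expressed in meters.
points = chessboard_object_points(3, 2, 0.025)
```

In a typical workflow, points like these are paired with the corners detected in each image and passed to a calibration routine such as OpenCV's `cv2.calibrateCamera`.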
The library was tested using the hardware setup depicted in Figure 5. However, it can be used on any system with multi-camera support.
Figure 5. Hardware setup used to test the library on an NVIDIA Jetson TX2.
We are currently working on taking this project even further by using our results as a starting point for sports analysis. The next release will include the following features:
An improved version of the 3D pose estimator that is faster and more accurate on multi-camera systems. We aim to improve the system's accuracy and precision by optimizing its final prediction against a multi-variable error function that encompasses different aspects of the pose estimation process. We are also working on improving performance with techniques that make more efficient use of the GPU and other system resources.
A 2D and 3D posture correction system integrating the new pose estimator. This correction system will analyze the sport, or any other kind of activity, performed by a person in a reference video. It will then tell a person performing that same activity what changes should be made in order to match the correct postures shown in the reference video.
RidgeRun developer wiki page :
Any Questions? : email@example.com