Full Body Pose Estimation for Sports Analysis - Kinematic Fitting
WORK IN PROGRESS. Please contact RidgeRun or email support@ridgerun.com if you have any questions.
Introduction
The kinematic fitting module receives the 3D positions from the pose estimator and adjusts them to the user's limb lengths while respecting the anatomical constraints of the human body. This is done through kinematic chains that model the skeleton pose with three-dimensional rotations and translations.
How it Works
Kinematic fitting is a mathematical procedure that uses physical constraints to improve the measurements describing a process. In this case, we want to adjust the predicted positions from the 3D pose estimation system to the actual skeleton of the subject. This requires a calibrated skeleton first.
Once we have the fixed skeleton with the calibrated bone measures, the positions from the 3D pose estimator must be mapped to this new skeleton. Even if the subject is the same one whose skeleton was calibrated, noise means that the predicted positions may not match this skeleton. To overcome this, we represent the pose with 3D angles, since they are independent of bone lengths. With the angles, the same pose can be mapped between different types of skeletons using the relative pose representation. The difficulty is that obtaining the angles normally involves an optimization procedure. Instead, we use an artificial neural network to provide the angles of a pose, given the positions and translations of the skeleton joints.
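To illustrate why angles are independent of bone lengths, here is a minimal sketch using a planar (2D) chain rather than the full 3D skeleton. The functions `forward_kinematics` and `relative_angles` are hypothetical names introduced for this illustration and are not part of the pose estimation library.

```python
import math

def forward_kinematics(bone_lengths, joint_angles):
    """Return the 2D joint positions of a planar chain rooted at the origin.

    Each angle is relative to the previous bone, so the pose is described
    without any reference to how long the bones are.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions

def relative_angles(positions):
    """Recover the relative joint angles from the chain positions."""
    angles, previous_heading = [], 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        heading = math.atan2(y1 - y0, x1 - x0)
        delta = heading - previous_heading
        # Wrap the difference back into [-pi, pi]
        angles.append(math.atan2(math.sin(delta), math.cos(delta)))
        previous_heading = heading
    return angles

# The same pose (angles) applied to two skeletons with different bone lengths
pose_angles = [math.radians(90), math.radians(-45)]
short_arm = forward_kinematics([0.25, 0.20], pose_angles)
long_arm = forward_kinematics([0.35, 0.30], pose_angles)
```

In this sketch the joint positions of `short_arm` and `long_arm` differ, but `relative_angles` returns the same values for both, which is the property that lets a pose be transferred to the calibrated skeleton.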
Model Architecture
The network architecture is shown in Figure 2. It consists of five fully connected layers and a custom layer for the position-mapping loss calculation. Each of the first four layers has batch normalization and a ReLU activation function. The last fully connected layer, which outputs the angles, has no batch normalization and uses a tanh activation function. Finally, the custom layer computes the loss between the input positions and the positions mapped using the predicted angles. The angles are provided as sine and cosine values to avoid ambiguity in their interpretation.
Here, the numbers below the layer names are the numbers of neurons. For the last fully connected layer, this number is the number of movable degrees of freedom of the skeleton multiplied by two, since the angles are provided as sine-cosine pairs.
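The sine-cosine representation can be sketched as follows. This is only an illustration of the encoding, not the library's code, and `encode_angles` and `decode_angles` are hypothetical helper names. Note that every encoded value lies in [-1, 1], which fits the output range of a tanh activation, and that the output size is twice the number of angles.

```python
import math

def encode_angles(angles):
    """Flatten a list of angles (radians) into [sin, cos, sin, cos, ...]."""
    encoded = []
    for angle in angles:
        encoded.extend([math.sin(angle), math.cos(angle)])
    return encoded

def decode_angles(encoded):
    """Recover each angle in [-pi, pi] from consecutive (sin, cos) pairs."""
    return [math.atan2(encoded[i], encoded[i + 1])
            for i in range(0, len(encoded), 2)]

# 170 and -170 degrees are numerically far apart but only 20 degrees away
# from each other on the circle; their sine-cosine pairs are close.
angles = [math.radians(170), math.radians(-170), 0.0]
encoded = encode_angles(angles)
decoded = decode_angles(encoded)
```

Decoding with `math.atan2` uses both values of each pair, so the quadrant of the angle is recovered without ambiguity.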
The model receives as input the absolute 3D positions and the relative translations of the skeleton. The translations can be calculated from the positions using the skeleton calibrator, so there is no need to provide them explicitly. As output, the model provides the sine-cosine pair for each movable degree of freedom and the 3D positions mapped with those angles. These predicted positions are meant to be used only for loss calculation during training, since they depend on the skeleton measures, which is exactly the dependency we want to avoid.
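As a rough illustration of how translations can be derived from positions, the sketch below assumes that a relative translation is the offset of a joint from its parent joint. That definition, the `PARENTS` map, and both function names are assumptions made for this example; the skeleton calibrator may compute them differently.

```python
import math

# Hypothetical 4-joint skeleton: each joint id maps to its parent id
# (None marks the root). The real structure comes from the skeleton JSON file.
PARENTS = {0: None, 1: 0, 2: 1, 3: 1}

def relative_translations(positions, parents):
    """Return, per joint, its offset from the parent joint.

    The root joint has no parent, so its offset is taken from the origin.
    """
    translations = {}
    for joint, parent in parents.items():
        px, py, pz = positions[parent] if parent is not None else (0.0, 0.0, 0.0)
        x, y, z = positions[joint]
        translations[joint] = (x - px, y - py, z - pz)
    return translations

def bone_lengths(translations, parents):
    """Return the length of each bone (joints that have a parent)."""
    return {joint: math.sqrt(sum(c * c for c in offset))
            for joint, offset in translations.items()
            if parents[joint] is not None}

positions = {
    0: (0.0, 0.0, 0.0),
    1: (0.0, 0.5, 0.0),
    2: (0.3, 0.5, 0.0),
    3: (-0.3, 0.5, 0.4),
}
translations = relative_translations(positions, PARENTS)
lengths = bone_lengths(translations, PARENTS)
```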
Model Training
In this module, we use a Keras model to predict the angles that represent a given pose. We use angles instead of absolute positions to make the pose independent from the skeleton’s bone lengths. This prediction is part of the kinematic fitting applied to the 3D pose estimations in order to improve the accuracy.
In order to train a model, you need the input and output data for training and validation and the training script.
Input Data
The input data must be located in a file named input.txt. Each line of this file contains a list with the absolute positions of the visible joints for a given pose. The joint positions must be ordered inside the list according to their id.
For example, let us say that we have three joints: joint0 and joint2 are visible, but joint1 is not. They should be listed like this:
[joint0_x, joint0_y, joint0_z, joint2_x, joint2_y, joint2_z] //one pose
[joint0_x, joint0_y, joint0_z, joint2_x, joint2_y, joint2_z] //another pose
Note that joint1 is not listed because it is not visible.
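A line in this format could be parsed as in the following sketch. It assumes each line holds only a Python-style list of numbers (the `//one pose` annotations above are taken to be explanatory and not part of the file), and `parse_pose_line` is a hypothetical helper, not a library function.

```python
import ast

def parse_pose_line(line):
    """Parse one input.txt line into a list of (x, y, z) tuples."""
    values = ast.literal_eval(line.strip())
    if len(values) % 3 != 0:
        raise ValueError("expected three coordinates per visible joint")
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]

# Two visible joints, so six values on the line (made-up coordinates)
line = "[0.0, 0.9, 0.0, 0.1, 1.4, 0.05]\n"
joints = parse_pose_line(line)
```

Using `ast.literal_eval` instead of `eval` restricts the input to literals, which is a safer choice when reading data files.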
Output Data
The output data must be located in a file named output.txt. Each line of this file contains a list with the sine-cosine pairs for the rotation angles of each joint that has a movable degree of freedom, for the corresponding pose in input.txt. The joint angles must be ordered inside the list according to their id, following the X, Y, Z order within each joint.
For example, let us say that we have the same three joints from before. In this case, joint0 is movable in the X and Z axes, joint1 is movable in the three axes and joint2 is movable only in the Z axis. They should be listed like this:
[joint0_sinx, joint0_cosx, joint0_sinz, joint0_cosz, joint1_sinx, joint1_cosx, joint1_siny, joint1_cosy, joint1_sinz, joint1_cosz, joint2_sinz, joint2_cosz] //one pose
[joint0_sinx, joint0_cosx, joint0_sinz, joint0_cosz, joint1_sinx, joint1_cosx, joint1_siny, joint1_cosy, joint1_sinz, joint1_cosz, joint2_sinz, joint2_cosz] //another pose
Note that the axes in which the joints don't move are not listed.
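The ordering rule can be sketched as below, following the movable axes described for the three example joints. The `MOVABLE_AXES` table and the `encode_pose` function are hypothetical; in practice the movable axes come from the skeleton JSON file.

```python
import math

# Movable axes per joint id, as described in the example above
MOVABLE_AXES = {0: "xz", 1: "xyz", 2: "z"}

def encode_pose(joint_angles, movable_axes):
    """Build one output.txt row: sine-cosine pairs ordered by joint id,
    then by X, Y, Z, skipping the axes in which a joint does not move."""
    row = []
    for joint_id in sorted(movable_axes):
        for axis in "xyz":
            if axis in movable_axes[joint_id]:
                angle = joint_angles[joint_id][axis]
                row.extend([math.sin(angle), math.cos(angle)])
    return row

# Made-up angles in radians for one pose
pose = {
    0: {"x": 0.1, "z": 0.2},
    1: {"x": 0.3, "y": -0.4, "z": 0.5},
    2: {"z": math.pi / 2},
}
row = encode_pose(pose, MOVABLE_AXES)
```

With six movable degrees of freedom in total, the row holds twelve values.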
Training Script
We provide a training script in this link, that you may use in order to create a new model for your data. This script allows the following options to customize the training:
- train-path -t: the path to the folder where the input.txt and output.txt for training are located. **
- validation-path -v: the path to the folder where the input.txt and output.txt for validation are located. **
- json-path -j: the path to the json file that contains the structure of the type of skeleton you are working with (click here for further reference). **
- input-model -i: the path to the .h5 file in case you want to start the training from pre-trained weights. Default: "".
- model-name -m: the name of the training output model (without extension). Default: "model".
- initial-epoch -e: a number indicating the starting epoch (used when an input model is provided). Default: 0.
- num-epochs -n: a number indicating the total of epochs for the training. Default: 2000.
- batch-size -b: the number of samples to be processed per training batch. Default: 1024.
- optimizer -o: name of one of Keras pre-defined optimizers. Default: "adadelta".
- log-off -l: if this argument is provided there will be no log file for the training. However, it is recommended to store the log to analyze the training process. If the argument is not provided a model-name_log.txt will be created. Default: save log.
**mandatory
Example:
python3 training_script.py -t <path-to-train> -v <path-to-val> -j <path-to-json> -m test_model -n 100000 -b 1024 -l
You may try this script using these training and validation datasets. The input data of these datasets are normalized in order to have the skeletons facing front and their pelvis in the (0,0,0) coordinates. In the examples section, you may see how you can use the trained model in order to predict the angles of a given pose.
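As an illustration of the pelvis part of this normalization, the sketch below translates a pose so that the pelvis lands at (0,0,0). The `pelvis_index` default of 0 is an assumption about the joint ordering, `center_on_pelvis` is a hypothetical helper, and the rotation that makes the skeleton face front is not covered here.

```python
def center_on_pelvis(joints, pelvis_index=0):
    """Translate every joint so that the pelvis sits at the origin."""
    px, py, pz = joints[pelvis_index]
    return [(x - px, y - py, z - pz) for x, y, z in joints]

# Made-up pose with the pelvis as the first joint
pose = [(1.0, 0.9, 2.0), (1.0, 1.4, 2.0), (1.2, 0.9, 2.1)]
centered = center_on_pelvis(pose)
```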
How to Use the Model
Once you have a trained model, you can make predictions using an input.txt file like the one described above. Here we provide a file with data you can use to test the model.
The following example shows how to make predictions and visualize them:
#Import the necessary modules
import numpy
from pose_estimation.pose_representation.skeleton_visualization_3d import *
from pose_estimation.kinematic_fitting.kinematic_fitting_model import *
from pose_estimation.skeleton_calibration.skeleton_calibration import *
#You should assign the path variables according to your environment
DATAPATH = "path/to/the/input.txt"
JSON_PATH = "path/to/the/skeleton/json/file"
COLORS_PATH = "path/to/the/colors/json"
MODELPATH = "path/to/the/model/weights"
#Define the pairs of limbs that we want to be symmetric in the calibration
SYMMETRIC_LIMBS = [[3,9],[4,10],[5,11],[6,12],[7,13],[8,14]]
#Read the data from the input.txt
with open(DATAPATH, "r") as input_file:
    input_data = input_file.readlines()
num_samples = len(input_data)
#Initialize the skeleton, calibrator, visualizer and kinematic fitting model
#objects
skeleton = Skeleton(JSON_PATH)
num_joints = len(skeleton.skeleton_joints)
visible_joints = skeleton.get_visible_joints()
calibrator = SkeletonCalibrator(skeleton, min_samples=50)
visualizer = SkeletonVisualizer3D(skeleton)
kinematic_fitting = KinematicFittingModel(skeleton)
#Visualizer initialization with one figure
visualizer.create_skeleton_scenes(1, [COLORS_PATH])
#Load the model weights with a batch size of 1
kinematic_fitting.load_model(MODELPATH, 1)
#First, we apply a calibration loop to obtain the skeleton measures
for i in range(num_samples):
    #Read the sample
    positions_raw = eval(input_data[i])
    #Reshape it as a numpy array for calibration
    positions = numpy.reshape(positions_raw, (visible_joints, 3))
    #Calibrate the measures
    calibrator.calibrate(positions)
    #Show visualization with a delay of 0.1 seconds
    visualizer.plot_skeletons([positions], ["Calibrating: " + str(i)], 0.1)
    #Check calibration status
    if calibrator.is_calibrated():
        break
#Now, with the calibrated skeleton, we obtain the measures through
#its translations in order to place them as input for the model
calibrated_skeleton = calibrator.get_calibrated_skeleton(SYMMETRIC_LIMBS)
skeleton_translations = calibrated_skeleton.get_all_translations()
#Clear the visualization
visualizer.clear_figure()
#Visualizer initialization with two figures using for
#both of them the same color scheme
visualizer.create_skeleton_scenes(2, [COLORS_PATH, COLORS_PATH])
#Prediction loop
#(continues from the sample index where the calibration loop stopped)
for i in range(i, num_samples):
    #Read the sample
    positions_raw = eval(input_data[i])
    #Predict the coordinates and angles (remember we use only the angles)
    _, angles = kinematic_fitting.get_coords_and_angles(positions_raw,
                                                        skeleton_translations)
    #The transfer pose method of the skeleton receives ALL the angles of
    #each joint. However, since we only predict the movable ones, it is
    #necessary to fill the values of those that were not taken into
    #account in the model training.
    angles = kinematic_fitting.map_angles(angles[0])
    calibrated_skeleton.transfer_pose(angles)
    #Obtain new position with the calibrated skeleton
    pred_positions = calibrated_skeleton.get_pose()
    #Visualize the input vs the prediction
    positions = numpy.reshape(positions_raw, (visible_joints, 3))
    visualizer.plot_skeletons([positions, pred_positions],
                              ["Raw: " + str(i), "Predicted: " + str(i)])
With this example, you may expect a visualization like the one shown in Figure 3.