Action-preserving diversification of animation sequences

Creating good animation sequences is a lengthy process, typically carried out either with a motion-capture system or directly by content creators in 3D modelling software. To increase realism in a virtual scene, a considerable number of animations for each action (e.g., walking) would need to be created. This project addresses this problem by proposing a system which, given a set of animations, generates variations of them while preserving the purpose of the actions being animated.

Essentially, the process involved mapping different humanoid animation sequences into a latent space using different clustering algorithms, and then creating variations by combining animations and varying the influence of each chosen animation on the new one. The dataset of animation sequences mapped into the latent space was sourced from the Carnegie-Mellon Graphics Lab Motion Capture Database, which consists of hundreds of humanoid animations, such as walking, dancing and jumping, as well as other actions that involve hand movement while sitting or squatting.

The latent space holds the mapped animations in such a way that similar animations lie closer to each other. The mapping function operates on a feature vector extracted from each animation sequence. The features considered include: the position and rotation of each component of the human body at 60 evenly spaced frames; the muscle values at 60 evenly spaced frames; and the distribution of translation and rotation values of each body component throughout the whole animation. These alternatives were compared to observe which features would best form groups in the latent space, such that each animation is closer to the other animations in its own group than to those in other groups.
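The frame-sampling step described above can be illustrated with a minimal sketch. It assumes each animation is available as a list of frames, where every frame is a flat list of numeric channel values (positions, rotations or muscle values); the function names `evenly_spaced_indices` and `feature_vector` are hypothetical and not taken from the project.

```python
def evenly_spaced_indices(n_frames, n_samples=60):
    """Pick n_samples frame indices spread evenly from the first to the last frame.

    Assumption: integer division is used for spacing, so indices repeat
    when the animation has fewer frames than n_samples.
    """
    if n_frames < 1 or n_samples < 2:
        raise ValueError("need at least one frame and two samples")
    last = n_frames - 1
    return [(i * last) // (n_samples - 1) for i in range(n_samples)]


def feature_vector(frames, n_samples=60):
    """Flatten the channel values of the sampled frames into one vector.

    frames: list of per-frame lists of floats (hypothetical input layout).
    """
    vector = []
    for index in evenly_spaced_indices(len(frames), n_samples):
        vector.extend(frames[index])
    return vector
```

Sampling a fixed number of frames gives every animation a feature vector of the same length regardless of its duration, which is what allows distances between animations to be compared.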

The groups formed in the latent space were identified in such a way as to produce a hierarchical structure. In practice, the space was divided into large groups, which were further divided until no more subgroups could be identified. In the system, the first groups identified were separated by the movement speed and direction of the whole body. One of these groups, for example, was further subdivided according to where the hand movement was occurring.
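The idea of repeatedly dividing groups until no further subgroups emerge can be sketched as a simple divisive procedure. This is only an illustration under stated assumptions, not the clustering algorithm used in the project: each group is split around its two most distant members, and splitting stops once those members are closer than a hypothetical threshold `min_gap`.

```python
import math


def build_hierarchy(points, min_gap):
    """Recursively split points (tuples of floats) into a tree of groups.

    Assumption: a group is split around its two farthest-apart members and
    kept as a leaf once they are closer than min_gap.
    """
    node = {"members": list(points), "children": []}
    if len(points) < 2:
        return node
    pairs = [(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    seed_a, seed_b = max(pairs, key=lambda pq: math.dist(pq[0], pq[1]))
    gap = math.dist(seed_a, seed_b)
    if gap == 0 or gap < min_gap:
        return node  # no further subgroups can be identified
    near_a = [p for p in points if math.dist(p, seed_a) <= math.dist(p, seed_b)]
    near_b = [p for p in points if math.dist(p, seed_a) > math.dist(p, seed_b)]
    node["children"] = [build_hierarchy(near_a, min_gap),
                        build_hierarchy(near_b, min_gap)]
    return node


def leaves(node):
    """Collect the member lists of all leaf groups, left to right."""
    if not node["children"]:
        return [node["members"]]
    result = []
    for child in node["children"]:
        result.extend(leaves(child))
    return result
```

In this sketch the top-level split plays the role of the large groups (for instance, by whole-body speed and direction), while deeper levels correspond to finer distinctions such as where the hand movement occurs.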

The new animations were created in Unity, the game engine. The humanoid animations were imported into Unity according to how they were grouped in the latent space. The user could then specify which group each limb should be assigned to, along with the strength of that group's influence. The system then creates all possible combinations of the animations in the chosen groups, allowing the user to proceed to choosing the preferred variations.
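The enumeration of combinations can be sketched outside Unity as follows. This is a minimal illustration, not the project's Unity implementation: the group names, animation names and the functions `enumerate_variations` and `blend` are all hypothetical, and the blend shown is a plain linear interpolation of a single channel value (rotations would normally need a proper rotational interpolation).

```python
from itertools import product


def enumerate_variations(groups, limb_assignment):
    """List every combination of animations for the assigned limbs.

    groups: group name -> list of animation names.
    limb_assignment: limb name -> (group name, influence between 0 and 1).
    Returns a list of dicts mapping limb name -> (animation name, influence).
    """
    limbs = sorted(limb_assignment)
    choices = [groups[limb_assignment[limb][0]] for limb in limbs]
    variations = []
    for combo in product(*choices):
        variations.append({
            limb: (animation, limb_assignment[limb][1])
            for limb, animation in zip(limbs, combo)
        })
    return variations


def blend(base_value, source_value, influence):
    """Linearly mix one channel value; influence 0 keeps the base unchanged."""
    return (1.0 - influence) * base_value + influence * source_value
```

For example, assigning the left arm to a dribbling group with two animations and the right arm to a drinking group with one animation yields two candidate variations, from which the user would pick the preferred one.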

The correctness and quality of the results were evaluated through an online survey, in which 120 participants were asked to rate a set of animations, some of which were not created by the system. The results show that some variations were rated higher than animations from the motion-capture library. Additionally, the range of the overall ratings was fairly narrow, which suggests that the variations mixed well with existing animation sequences.

Figure 1. Merging of basketball dribbling and drinking (first 2 rows) into one animation sequence (third row)
Student: Matthias Attard
Course: B.Sc. (Hons.) Computing Science
Supervisor: Dr Sandro Spina