In a paper published this month, human dressing motion is synthesized in animation with reinforcement learning. The paper, titled Learning to Dress: Synthesizing Human Dressing Motion via Deep Reinforcement Learning, is the work of two Ph.D. students from the Georgia Institute of Technology, two of its professors, and a researcher from Google Brain.
Understanding the dressing problem
Dressing, whether putting on a t-shirt or a jacket, is something we do every day. Yet it is a computationally costly and complex task for a machine to perform or for computers to simulate. This paper combines techniques from physics simulation and machine learning to produce the animation: a physics engine is used to simulate character and cloth motion, while deep reinforcement learning on a neural network produces the character motion.
Physics engine and reinforcement learning on a neural network
The authors of the paper introduce a salient representation of haptic information to guide the dressing process. This haptic information is then used in the reward function to provide learning signals when training the network. As the task is too complex to perform in one go, it is separated into several subtasks for better control.
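As a rough illustration of how haptic information can feed a reward signal, the sketch below combines task progress with a penalty on excessive cloth-contact forces. The function name, weights, and force threshold are hypothetical, not taken from the paper:

```python
import numpy as np

def dressing_reward(progress, haptic_forces, force_limit=50.0,
                    w_progress=1.0, w_haptic=0.05):
    """Sketch of a per-step reward for one dressing subtask.

    progress:      scalar task progress, e.g. how far the hand has
                   advanced through the sleeve (larger is better).
    haptic_forces: per-sensor contact-force magnitudes aggregated
                   from the cloth simulation.

    Forces above force_limit are penalized, signalling that the limb
    is snagging or overstretching the garment.
    """
    haptic_forces = np.asarray(haptic_forces, dtype=float)
    excess = np.clip(haptic_forces - force_limit, 0.0, None)
    return w_progress * progress - w_haptic * float(excess.sum())
```

Shaping the reward this way lets the policy trade off speed against gentle handling of the cloth, which is one role haptic observations play in the paper's training signal.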
A policy sequencing algorithm is introduced to match the distribution of output states from one task to the input distribution for the next task. The same approach is used to produce character controllers for various dressing tasks like wearing a t-shirt, wearing a jacket, and robot-assisted dressing of a sleeve.
Dressing is complex, split into several subtasks
The approach taken by the authors splits the dressing task into a sequence of subtasks, and a state machine then guides the transition between these tasks. Dressing a jacket, for example, consists of four subtasks:
- Pulling the sleeve over the first arm.
- Moving the second arm behind the back to get in position for the second sleeve.
- Putting the hand into the second sleeve.
- Finally, returning the body back to a rest position.
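A minimal sketch of how such a state machine might sequence the four subtask policies is shown below; the success predicates and the policy interface are assumptions for illustration, not the paper's code:

```python
# Hypothetical state machine stepping through the four jacket subtasks.
# Each subtask runs its own learned policy until a success predicate fires.

JACKET_SUBTASKS = [
    "first_sleeve",     # pull the sleeve over the first arm
    "second_arm_back",  # move the second arm behind the back
    "second_sleeve",    # put the hand into the second sleeve
    "return_to_rest",   # return the body to a rest position
]

def run_dressing(policies, success, state, max_steps=1000):
    """Execute subtask policies in order, advancing when success[task] holds."""
    for task in JACKET_SUBTASKS:
        policy = policies[task]
        for _ in range(max_steps):
            if success[task](state):
                break           # subtask done; hand off to the next one
            state = policy(state)
        else:
            return task, state  # subtask failed to finish within budget
    return "done", state
```

The state machine itself is simple; the hard part, addressed below, is making sure each policy starts from states the previous policy actually ends in.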
A separate reinforcement learning problem is formulated for each subtask in order to learn a control policy.
The policy sequencing algorithm ensures that these individual control policies lead to a successful dressing sequence when executed sequentially. The algorithm matches the initial state of one subtask with the final state of the previous subtask in the sequence. Applying the resulting control policies produces a variety of successful dressing motions.
Each subtask in the dressing task is formulated as a partially observable Markov Decision Process (POMDP). Character dynamics are simulated with Dynamic Animation and Robotics Toolkit (DART) and cloth dynamics with NVIDIA PhysX.
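Concretely, a subtask's POMDP can be thought of as four pieces: an observation function, a transition (the physics engines), a reward, and a termination condition. The container and episode loop below are an illustrative sketch, not the paper's implementation; the key POMDP property is that the policy only ever sees the partial observation (pose plus haptic readings), never the full simulator state.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubtaskPOMDP:
    """Illustrative container for one dressing subtask as a POMDP."""
    observe: Callable  # full simulator state -> partial observation
    step: Callable     # (state, action) -> next state, via the physics engines
    reward: Callable   # state -> scalar learning signal
    done: Callable     # state -> True when the subtask terminates

def run_episode(pomdp, policy, state, max_steps=100):
    """Roll one subtask policy forward; the policy acts on observations only."""
    total = 0.0
    for _ in range(max_steps):
        action = policy(pomdp.observe(state))  # partial observability here
        state = pomdp.step(state, action)
        total += pomdp.reward(state)
        if pomdp.done(state):
            break
    return total, state
```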
Conclusion and room for improvement
A system that learns to animate a character putting on clothing was successfully created using deep reinforcement learning and physics simulation. The system learns each subtask individually, then connects them with a state machine. Carefully selecting the cloth observations and the reward functions proved to be an important factor in the approach's success.
This system currently performs only upper-body dressing. For the lower body, balance would need to be built into the controller. The number of subtasks might be reduced by using a control policy architecture with memory, which would allow for greater generalization of the learned skills.
You can read the research paper at the Georgia Institute of Technology website.