Abstract
Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human Hand-Informed visual representation learning framework to solve difficult Dexterous manipulation tasks (H-InDex). Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages modify only 0.36% of the pre-trained representation's parameters in total, ensuring that the knowledge from pre-training is preserved to the fullest extent. We empirically study 12 challenging dexterous manipulation tasks and find that our method substantially surpasses the previous state-of-the-art method as well as recent visual foundation models for motor control.
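As a minimal illustration of Stage (iii), the PyTorch sketch below keeps a pre-trained encoder frozen and refreshes only its BatchNorm running statistics via an exponential moving average during forward passes. The function name and momentum value are our own illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

def update_bn_stats_ema(encoder: nn.Module, batch: torch.Tensor,
                        momentum: float = 0.01) -> None:
    """Refresh only the BatchNorm running statistics of a frozen encoder.

    PyTorch BatchNorm updates its running mean/variance as an exponential
    moving average when the layer is in train mode, so a forward pass with
    no gradient step adapts the statistics while leaving all learned
    weights untouched. `momentum` here is a hypothetical hyperparameter.
    """
    encoder.eval()  # keep every other layer in inference mode
    for m in encoder.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()            # let this layer observe batch statistics
            m.momentum = momentum
    with torch.no_grad():
        encoder(batch)           # forward pass updates running_mean/var via EMA
    encoder.eval()
```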
Method Overview
Visualization of Tasks
We show successful trajectories from our dexterous manipulation task suite, generated by policies trained with H-InDex.
Visualization of Self-Supervised Keypoint Detection
We visualize the self-supervised keypoint detection results from Stage 2. The trajectory shown here is taken from the training videos.
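For intuition on how 2D keypoints can be read out from a detector's heatmaps, here is a minimal PyTorch sketch of a spatial soft-argmax, a standard readout in self-supervised keypoint detection; the actual detector head used in Stage 2 may differ.

```python
import torch
import torch.nn.functional as F

def soft_argmax_keypoints(heatmaps: torch.Tensor) -> torch.Tensor:
    """Convert per-keypoint heatmaps (B, K, H, W) into 2D coordinates
    (B, K, 2) in [-1, 1] as the expected position under a spatial softmax.
    This is a generic sketch, not the paper's exact architecture.
    """
    b, k, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=heatmaps.device)
    xs = torch.linspace(-1.0, 1.0, w, device=heatmaps.device)
    # Marginalize the distribution over each axis, then take the expectation.
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # (B, K)
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # (B, K)
    return torch.stack([x, y], dim=-1)
```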
Citation
If you use our method or code in your research, please consider citing the paper as follows: