SkillMimic: Learning Basketball Interaction Skills from Demonstrations

CVPR 2025 🏆 Highlight

Yinhuai Wang*¹,², Qihan Zhao*¹, Runyi Yu*¹,², Hok Wai Tsui¹, Ailing Zeng✉⁶, Jing Lin⁴, Zhengyi Luo⁷, Jiwen Yu³, Xiu Li⁴, Qifeng Chen¹, Jian Zhang✉³, Lei Zhang⁵, Ping Tan¹

¹Hong Kong University of Science and Technology ²Unitree Robotics ³Peking University ⁴Tsinghua University ⁵International Digital Economy Academy ⁶Tencent ⁷Carnegie Mellon University


We propose a novel approach that enables physically simulated humanoids to learn a variety of basketball skills, such as shooting (blue), retrieving (red), and turnaround layups (yellow), from human-object demonstrations. Once acquired, these skills can be reused and combined to accomplish complex tasks such as continuous scoring (green), which involves dribbling toward the basket, timing the dribble and layup to score, retrieving the rebound, and repeating.

Abstract

Mastering basketball skills such as diverse layups and dribbling involves complex interactions with the ball and requires real-time adjustments. Traditional reinforcement learning methods for interaction skills rely on labor-intensive, manually designed rewards that do not generalize well across different skills.

Inspired by how humans learn from demonstrations, we propose SkillMimic, a data-driven approach that mimics both human and ball motions to learn a wide variety of basketball skills. SkillMimic employs a unified configuration to learn diverse skills from human-ball motion datasets, with skill diversity and generalization improving as the dataset grows. This approach allows a single policy to learn multiple skills and to switch smoothly between them, even when such switches are not present in the reference dataset. The skills acquired by SkillMimic can be easily reused by a high-level controller to accomplish complex basketball tasks.

To evaluate our approach, we introduce two basketball datasets, one estimated from monocular RGB videos and the other captured with advanced motion capture equipment, which together contain about 35 minutes of diverse basketball skills. Experiments show that our method can effectively learn all the basketball skills contained in the datasets with a unified configuration, including various styles of dribbling, layups, and shooting. Furthermore, by training a high-level controller to reuse the acquired skills, we can achieve complex basketball tasks such as scoring, which involves dribbling toward the basket, timing the dribble and layup to score, retrieving the rebound, and repeating the process.

Video

Overview

Our system consists of three parts. (a) First, we capture real-world basketball skills to create a large Human-Object Interaction (HOI) motion dataset. (b) Second, we train a skill policy to learn interaction skills by imitating the corresponding HOI data, using a unified HOI imitation reward designed to cover diverse HOI state transitions. (c) Third, we train a High-Level Controller (HLC) that reuses the learned skills to solve complex tasks with extremely simple task rewards.
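To make parts (b) and (c) more concrete, the following is a minimal sketch of what a unified imitation reward and a skill-conditioned policy input could look like. This page does not give the reward formula, so everything here is an assumption for illustration and not the authors' formulation: the `HOIState` layout, the choice of exponentiated squared errors, the multiplicative combination, the weights, and the function names `hoi_imitation_reward` and `condition_on_skill` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HOIState:
    """One frame of human-object interaction state (hypothetical layout)."""
    body: Sequence[float]     # flattened key-body positions, length 3 * num_bodies
    ball: Sequence[float]     # ball position, length 3
    contact: Sequence[float]  # binary contact flags, e.g. [hand-ball contact]


def sq_err(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relative(state: HOIState) -> List[float]:
    """Offset of every key body from the ball (body position minus ball position)."""
    out: List[float] = []
    for i in range(0, len(state.body), 3):
        out.extend(state.body[i + k] - state.ball[k] for k in range(3))
    return out


def hoi_imitation_reward(sim: HOIState, ref: HOIState,
                         w_body: float = 1.0, w_ball: float = 1.0,
                         w_rel: float = 1.0, w_contact: float = 1.0) -> float:
    """Assumed reward: product of exponentiated tracking errors.

    Each term lies in (0, 1]. Multiplying them means a large error in any
    single aspect (body, ball, body-ball relation, contact) pulls the whole
    reward toward zero, so the same weights can be shared across skills.
    """
    r_body = math.exp(-w_body * sq_err(sim.body, ref.body))
    r_ball = math.exp(-w_ball * sq_err(sim.ball, ref.ball))
    r_rel = math.exp(-w_rel * sq_err(relative(sim), relative(ref)))
    r_contact = math.exp(-w_contact * sq_err(sim.contact, ref.contact))
    return r_body * r_ball * r_rel * r_contact


def condition_on_skill(obs: Sequence[float], skill_id: int,
                       num_skills: int) -> List[float]:
    """Append a one-hot skill label to the observation.

    A high-level controller could pick `skill_id` at each decision step,
    while the skill policy consumes the conditioned observation.
    """
    one_hot = [1.0 if i == skill_id else 0.0 for i in range(num_skills)]
    return list(obs) + one_hot
```

In this sketch, a simulated frame that matches the reference frame exactly receives a reward of 1, and any deviation in body pose, ball position, or contact lowers it. Because the reward depends only on the reference HOI frames, the same function could in principle be applied to dribbling, layup, and shooting clips without per-skill tuning, which is the property the overview attributes to a unified reward.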