Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations

1Hong Kong University of Science and Technology 2Shanghai AI Laboratory

Abstract

We address a fundamental challenge in Reinforcement Learning from Interaction Demonstrations (RLID): demonstration noise and coverage limitations. While existing data collection approaches provide valuable interaction demonstrations, they often yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. Our key insight is that despite noisy and sparse demonstrations, there exist infinitely many physically feasible trajectories that naturally bridge between demonstrated skills or emerge from their neighboring states, forming a continuous space of possible skill variations and transitions. Building upon this insight, we present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstrated skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood. To enable effective RLID with augmented data, we develop an Adaptive Trajectory Sampling (ATS) strategy for dynamic curriculum generation and a historical encoding mechanism for memory-dependent skill learning. Our approach enables robust skill acquisition that generalizes significantly beyond the reference demonstrations. Extensive experiments across diverse interaction tasks demonstrate substantial improvements over state-of-the-art methods in convergence stability, generalization capability, and recovery robustness.
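As a hedged illustration of the historical encoding mechanism mentioned above, the sketch below assumes a GRU that compresses the recent observation history into a latent vector consumed by the policy head. The class name, layer sizes, and interface are our own illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of memory-dependent skill learning via history encoding.
# All names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class HistoryEncodedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=256):
        super().__init__()
        # GRU summarizes the past observation window into one latent state.
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + obs_dim, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, act_dim),
        )

    def forward(self, obs_history, obs):
        # obs_history: (batch, T, obs_dim); obs: (batch, obs_dim)
        _, h = self.encoder(obs_history)                  # encode the past
        return self.head(torch.cat([h[-1], obs], dim=-1))  # act on past + present
```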

Motivation - Demonstration Sparsity

Given reference trajectories of shooting and dribbling, there should exist natural transitions between them and countless recovery strategies for potential errors. However, such variations are often missing in collected data. Can we learn this diversity from sparse demonstrations?

Motivation - Demonstration Noise

Imitation from interaction data can be highly sensitive to data degradation - even minor errors in a cup-grasping demonstration may cause imitation learning to fail. For such noisy demonstrations, ideal trajectories must nonetheless exist; these optimal solutions are likely diverse and lie within some neighborhood of the original noisy demonstrations.

Method Overview

Given sparse and noisy demonstrations (e.g., two short trajectories of Shot and Dribble), there exist infinitely many valid but uncaptured trajectories that can either bridge between them or emerge from their neighboring states (illustrated by question marks). Our method discovers and learns these potential trajectories through three key steps:

(1) Constructing a Stitched Trajectory Graph (STG) to identify possible transitions between demonstrations. (2) Expanding STG into a State Transition Field (STF) that establishes connections for arbitrary states within the demonstration neighborhood. (3) Learning a skill policy via Adaptive Trajectory Sampling (ATS) and Reinforcement Learning from Interaction Demonstrations (RLID).

This enables robust skill transitions and generalization far beyond the original sparse demonstrations, which contain no instances of skill transitions or error recovery patterns.
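To make the three steps concrete, here is a minimal, self-contained sketch under our own simplifying assumptions (Euclidean nearest-neighbor stitching and softmax error weighting). Function names such as build_stg, stf_target, and ats_sample are hypothetical stand-ins, not the authors' API.

```python
# Minimal sketch of the three-step pipeline, assuming demonstrations are
# lists of fixed-length state vectors. Distance metrics and thresholds
# below are illustrative assumptions, not the paper's implementation.
import numpy as np

def build_stg(demos, stitch_threshold=0.5):
    """Step (1): Stitched Trajectory Graph. Nodes are demonstration states;
    edges follow each trajectory temporally and additionally stitch nearby
    states across different trajectories."""
    nodes, edges, offsets, offset = [], [], [], 0
    for demo in demos:
        offsets.append(offset)
        nodes.extend(demo)
        edges.extend((offset + i, offset + i + 1) for i in range(len(demo) - 1))
        offset += len(demo)
    nodes = np.array(nodes)
    for a, demo_a in enumerate(demos):
        for b, demo_b in enumerate(demos):
            if a == b:
                continue
            for i, s in enumerate(np.array(demo_a)):
                dists = np.linalg.norm(np.array(demo_b) - s, axis=1)
                j = int(np.argmin(dists))
                if dists[j] < stitch_threshold:  # candidate cross-skill transition
                    edges.append((offsets[a] + i, offsets[b] + j))
    return nodes, edges

def stf_target(state, nodes, edges):
    """Step (2): State Transition Field. For an arbitrary state in the
    demonstration neighborhood, return a unique tracking target: the
    successor of its nearest graph node."""
    i = int(np.argmin(np.linalg.norm(nodes - state, axis=1)))
    successors = [v for (u, v) in edges if u == i]
    return nodes[successors[0]] if successors else nodes[i]

def ats_sample(tracking_errors, temperature=1.0):
    """Step (3): Adaptive Trajectory Sampling. Pick a start node with
    probability increasing in its recent tracking error, yielding a
    dynamic curriculum over poorly mastered regions."""
    logits = np.asarray(tracking_errors, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    return int(np.random.choice(len(probs), p=probs / probs.sum()))
```

One plausible wiring, under the same assumptions: stf_target supplies the per-step imitation target for the RLID tracking reward, while ats_sample chooses where each training rollout begins.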

Results - Skill Recovery Patterns

Household Manipulation - Book Retrieval

(Unstable Initial Pose)

Locomotion - Getup

(Unstable Initial Pose + Attacked)

Locomotion - Run & Getup

(Blocked)

Ball Play - Shot

(Attacked)

Ball Play - Dribble

(Attacked)

Ball Play - Layup

(Unstable Initial Pose + Attacked)

Skill Generalization

More Learned Household Manipulation Skills

Pour-Kettle-Cup

Stand-Chair

Place-Book

Drink-Cup

Place-Kettle

Place-Pan