Research, arXiv

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Lee 2026-05-28
Jusuk Lee, Seungjae Lee, Jonghun Shin

Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trained for static recognition or vision-language alignment, leaving motion understanding to downstream policies. We introduce DynaFLIP, a dynamics-aware multimodal pre-training framework that pushes motion understanding ups

Topics

AI, Research