Coarse-to-Control: Action-Token Planning for Vision-Language-Action Models
One-line summary
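A robotics research paper proposing Coarse-to-Control, a vision-language-action model that first predicts a short plan of coarse action tokens and then generates executable action tokens conditioned on that plan.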
Abstract
Most vision-language-action (VLA) models map observations directly to actions without explicit intermediate planning, which limits performance on long-horizon tasks where early mistakes compound. We propose Coarse-to-Control, a plan-execute VLA that introduces planning natively in the action-token space. The key idea is to let the policy first predict a compact sequence of coarse action tokens that summarize the intended future trajectory, and then generate executable action tokens conditioned on this plan. Because both planning and execution share a unified discrete action vocabulary, the plan stays close to the control manifold and provides directly actionable guidance rather than an abstract hint that must be translated back to motor commands. Experiments on LIBERO, SimplerEnv-WidowX, and real-world manipulation tasks show that action-token planning consistently improves over direct action generation, with the largest gains on long-horizon multi-stage tasks.
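The two-stage decoding described above can be illustrated with a minimal sketch. It is not the paper's implementation: the vocabulary size, the uniform-binning tokenizer, the chunk-averaging rule for building coarse plan targets, and the names `discretize`, `coarse_plan_targets`, and `plan_then_execute` are all assumptions made for illustration. The only property taken from the abstract is that plan tokens and executable action tokens come from one shared discrete vocabulary, and that the plan is decoded first and then conditions the action tokens.

```python
from typing import Callable, List, Sequence, Tuple

VOCAB_SIZE = 256  # assumed size of the shared discrete action vocabulary


def discretize(value: float, low: float = -1.0, high: float = 1.0,
               bins: int = VOCAB_SIZE) -> int:
    """Map a continuous action value in [low, high] to a token id by uniform binning."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * bins)
    return min(idx, bins - 1)  # the upper edge falls into the last bin


def coarse_plan_targets(trajectory: Sequence[float], num_plan_tokens: int) -> List[int]:
    """Summarize a 1-D future trajectory as a few coarse tokens.

    Assumption: each coarse token is the mean of an equal-length chunk,
    tokenized with the same vocabulary as the fine-grained actions.
    """
    if num_plan_tokens <= 0 or len(trajectory) % num_plan_tokens != 0:
        raise ValueError("trajectory length must be a positive multiple of num_plan_tokens")
    chunk = len(trajectory) // num_plan_tokens
    return [
        discretize(sum(trajectory[i:i + chunk]) / chunk)
        for i in range(0, len(trajectory), chunk)
    ]


# Hypothetical policy interface: given the token context so far, return the next token id.
NextToken = Callable[[List[int]], int]


def plan_then_execute(obs_tokens: Sequence[int], next_token: NextToken,
                      num_plan_tokens: int, num_action_tokens: int
                      ) -> Tuple[List[int], List[int]]:
    """Decode coarse plan tokens first, then action tokens conditioned on the plan."""
    context = list(obs_tokens)
    plan: List[int] = []
    for _ in range(num_plan_tokens):
        tok = next_token(context)
        plan.append(tok)
        context.append(tok)  # the plan becomes part of the conditioning context
    actions: List[int] = []
    for _ in range(num_action_tokens):
        tok = next_token(context)
        actions.append(tok)
        context.append(tok)
    return plan, actions


if __name__ == "__main__":
    # Stand-in for a trained policy; a real VLA would run a transformer here.
    toy_policy = lambda ctx: (sum(ctx) + 1) % VOCAB_SIZE
    plan, actions = plan_then_execute([1, 2], toy_policy,
                                      num_plan_tokens=2, num_action_tokens=3)
    print("plan tokens:", plan)
    print("action tokens:", actions)
```

Because the plan and the actions are token ids from the same vocabulary, the plan can be appended directly to the decoding context without a separate translation step, which is the design point the abstract emphasizes.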