CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving

Pei Liu1,2*, Qingtian Ning3*, Xinyan Lu3, Haipeng Liu3, Weiliang Ma3†, Dangen She3, Xianpeng Lang3, Jun Ma1,2‡
1The Hong Kong University of Science and Technology (Guangzhou)
2The Hong Kong University of Science and Technology
3Li Auto Inc.
* Indicates Equal Contribution | † Project Lead | ‡ Corresponding Author

To generate temporally coherent data, we propose a novel Multi-View Spatiotemporal MLLM that processes concurrent multi-view video streams. Its reasoning is guided by our Cognitive Inertia Injection framework, which supplies structured rules and tasks. The generated narrative is then verified against the ground-truth vehicle trajectory through Future History Alignment, which ensures that the final annotations are both causally sound and temporally coherent.
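The page does not specify how Future History Alignment compares a narrative with the trajectory. As a minimal illustration only, the sketch below assumes the narrative has already been reduced to a coarse maneuver label and checks it against the maneuver implied by the ground-truth future waypoints. The function names (`infer_maneuver`, `is_aligned`), the label set, the ego-frame convention (x forward, y left, metres), and both thresholds are hypothetical and are not taken from CogDriver.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # ego frame: x forward, y left, metres (assumed)

# Hypothetical thresholds chosen for illustration; not values from the paper.
TURN_THRESHOLD_RAD = math.radians(30.0)
STOP_DISTANCE_M = 1.0


def infer_maneuver(traj: List[Point]) -> str:
    """Classify a future ego trajectory into a coarse (hypothetical) maneuver label."""
    if len(traj) < 3:
        raise ValueError("need at least three waypoints")
    # Total path length; a near-zero length means the vehicle essentially stops.
    total_dist = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    if total_dist < STOP_DISTANCE_M:
        return "stop"
    # Heading of the first and last trajectory segments.
    h0 = math.atan2(traj[1][1] - traj[0][1], traj[1][0] - traj[0][0])
    h1 = math.atan2(traj[-1][1] - traj[-2][1], traj[-1][0] - traj[-2][0])
    # Wrap the heading change into (-pi, pi]; positive means turning left.
    dh = math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))
    if dh > TURN_THRESHOLD_RAD:
        return "turn_left"
    if dh < -TURN_THRESHOLD_RAD:
        return "turn_right"
    return "go_straight"


def is_aligned(narrated_maneuver: str, future_traj: List[Point]) -> bool:
    """Accept a narrative only if its maneuver matches what the vehicle actually did."""
    return narrated_maneuver == infer_maneuver(future_traj)


# Usage: a left turn whose heading swings by about 90 degrees.
left_turn = [(0.0, 0.0), (5.0, 0.0), (9.0, 2.0), (10.0, 6.0), (10.0, 10.0)]
print(infer_maneuver(left_turn))            # turn_left
print(is_aligned("go_straight", left_turn)) # False
```

A real pipeline would likely use richer narrative parsing and more robust trajectory features; the point here is only that an annotation is rejected when the stated intent contradicts the recorded future motion.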

Abstract

The pursuit of autonomous agents capable of temporally coherent planning is hindered by a fundamental flaw in current vision-language models (VLMs): they lack cognitive inertia. Because they operate on isolated snapshots, these models cannot form a continuous understanding of the environment, which leads to erratic decision jitter and a failure to execute complex, multi-step maneuvers. To remedy this, we introduce CogDriver, a framework that instills this cognitive property in order to build a stable internal representation. Our work makes two key contributions: (1) We present CogDriver-Data, a large-scale vision-language-action dataset whose narrative annotations provide the supervisory signal for learning temporal dynamics and persistent intent. (2) We develop CogDriver-Agent, an architecture featuring a sparse temporal memory that maintains a stable internal state. This capability is enabled by a spatiotemporal knowledge distillation approach that explicitly teaches decision coherence. Comprehensive experiments validate our paradigm: CogDriver-Agent achieves a 22% increase in the closed-loop Driving Score on Bench2Drive and a 21% reduction in mean L2 error on nuScenes, establishing a new state of the art. These gains in both long-term decision-making and imitation accuracy provide strong evidence that our agent maintains a temporally coherent internal state, bridging the gap toward more reliable autonomous driving.
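For reference, a common way to compute the open-loop "mean L2 error" on nuScenes is sketched below, assuming six future waypoints sampled at 2 Hz (a 3 s horizon) and reporting the average over the 1 s, 2 s, and 3 s horizons. Papers differ on whether the error at a horizon is averaged over all earlier waypoints (ST-P3 style, assumed here) or taken at the final waypoint only (UniAD style). The page does not state which convention CogDriver uses, and the function names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # ego-frame (x, y) waypoint in metres


def l2_at_horizons(
    pred: Sequence[Point],
    gt: Sequence[Point],
    dt: float = 0.5,
    horizons_s: Sequence[float] = (1.0, 2.0, 3.0),
) -> Dict[float, float]:
    """Average L2 displacement over all waypoints up to each horizon (ST-P3-style, assumed).

    pred and gt hold waypoints sampled every `dt` seconds, starting one step ahead.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of waypoints")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    result: Dict[float, float] = {}
    for h in horizons_s:
        n = int(round(h / dt))  # number of waypoints within this horizon
        if n < 1 or n > len(errors):
            raise ValueError(f"horizon {h}s is not covered by the trajectory")
        result[h] = sum(errors[:n]) / n
    return result


def mean_l2(per_horizon: Dict[float, float]) -> float:
    """Mean of the per-horizon L2 errors, i.e. the single 'mean L2' number."""
    return sum(per_horizon.values()) / len(per_horizon)


# Usage: the prediction matches ground truth except for a 6 m error at 3 s.
gt = [(0.0, 5.0 * (i + 1)) for i in range(6)]
pred = list(gt)
pred[-1] = (6.0, 30.0)
per_h = l2_at_horizons(pred, gt)
print(per_h)           # {1.0: 0.0, 2.0: 0.0, 3.0: 1.0}
print(mean_l2(per_h))  # about 0.333
```

Under this convention, a late error is diluted by the accurate early waypoints, which is one reason results computed under different conventions should not be compared directly.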


BibTeX

@article{liu2025CogDrive,
  title={CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving},
  author={Pei Liu and Qingtian Ning and Xinyan Lu and Haipeng Liu and Weiliang Ma and Dangen She and Xianpeng Lang and Jun Ma}
}