LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving

1 The Hong Kong University of Science and Technology (Guangzhou)
2 Li Auto Inc.
3 The Hong Kong University of Science and Technology
*Indicates Equal Contribution

Cartesian vs. HCS (Hybrid-Cylindrical-Spherical) coordinates for LiDAR scene representation. Cartesian coordinates partition space into uniform, axis-aligned cubes, ignoring the native ray geometry of LiDAR. HCS coordinates divide space into angular–radial cells centered at the sensor origin, aligning with LiDAR’s ray-based sampling pattern and preserving range-dependent resolution.
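
The contrast in the caption can be made concrete with a small sketch. It is not the paper's HCS definition: it uses a plain spherical binning (azimuth, elevation, range) as a stand-in for "angular–radial cells centered at the sensor origin", and every name and parameter below (`cartesian_cell`, `angular_radial_cell`, `azimuth_cell_width`, the bin counts, and the elevation limits) is an assumption chosen for illustration.

```python
import math


def cartesian_cell(x, y, z, voxel_size=0.5):
    """Index of the axis-aligned cube that contains the point."""
    return (math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size))


def angular_radial_cell(x, y, z, n_azimuth=360, n_elevation=32,
                        elev_min_deg=-25.0, elev_max_deg=3.0,
                        range_bin=0.5):
    """Index (azimuth bin, elevation bin, range bin) of a sensor-centered cell.

    Illustrative spherical binning only; the bin counts and elevation
    limits are hypothetical, not values taken from the paper.
    """
    rng = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x) % (2.0 * math.pi)
    elevation = math.degrees(math.asin(z / rng)) if rng > 0 else 0.0
    # Clamp so that values on the upper boundary stay inside the grid.
    a = min(int(azimuth / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
    frac = (elevation - elev_min_deg) / (elev_max_deg - elev_min_deg)
    e = min(max(int(frac * n_elevation), 0), n_elevation - 1)
    r = int(rng / range_bin)
    return (a, e, r)


def azimuth_cell_width(rng, n_azimuth=360):
    """Approximate lateral width (arc length) of one azimuth bin at a given range."""
    return rng * 2.0 * math.pi / n_azimuth


# Two returns along the same ray direction, at different ranges.
near = (10.0, 2.0, -1.0)
far = (20.0, 4.0, -2.0)
print(cartesian_cell(*near), cartesian_cell(*far))
print(angular_radial_cell(*near), angular_radial_cell(*far))
```

In this sketch, points along one ray share their angular indices and differ only in the range index, whereas their Cartesian indices differ on every axis. The lateral width of an angular cell also grows linearly with range, which is one way to read "range-dependent resolution".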

Abstract

Synthesizing high-fidelity and controllable 4D LiDAR data is crucial for creating scalable simulation environments for autonomous driving. This task is inherently challenging due to the sensor's unique spherical geometry, the temporal sparsity of point clouds, and the complexity of dynamic scenes. To address these challenges, we present LiSTAR, a novel generative world model that operates directly on the sensor's native geometry. LiSTAR introduces a Hybrid-Cylindrical-Spherical (HCS) representation to preserve data fidelity by mitigating quantization artifacts common in Cartesian grids. To capture complex dynamics from sparse temporal data, it utilizes a Spatio-Temporal Attention with Ray-Centric Transformer (START) that explicitly models feature evolution along individual sensor rays for robust temporal coherence. Furthermore, for controllable synthesis, we propose a novel 4D point cloud-aligned voxel layout for conditioning and a corresponding discrete Masked Generative START (MaskSTART) framework, which learns a compact, tokenized representation of the scene, enabling efficient, high-resolution, and layout-guided compositional generation. Comprehensive experiments validate LiSTAR's state-of-the-art performance across 4D LiDAR reconstruction, prediction, and conditional generation, with substantial quantitative gains: it reduces generation MMD by 76%, improves reconstruction IoU by 32%, and lowers prediction L1 Med by 50%. This level of performance provides a strong foundation for creating realistic and controllable simulations of autonomous systems.
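
As a rough illustration of what "modeling feature evolution along individual sensor rays" could look like, the sketch below applies scaled dot-product attention over time separately for each ray, so a ray only ever attends to its own features in other frames. This is not the START architecture: there are no learned projections, no spatial attention, and no multi-head structure, and the function name `ray_temporal_attention` and the `features[t][r]` layout are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ray_temporal_attention(features):
    """Attend over time independently for each ray.

    features[t][r] is the feature vector of ray r at frame t.
    Queries, keys, and values are the raw features (identity
    projections), which is a simplification for illustration.
    """
    n_frames = len(features)
    n_rays = len(features[0])
    dim = len(features[0][0])
    scale = 1.0 / math.sqrt(dim)
    out = [[None] * n_rays for _ in range(n_frames)]
    for r in range(n_rays):
        # The temporal sequence of this one ray.
        seq = [features[t][r] for t in range(n_frames)]
        for t, query in enumerate(seq):
            scores = [scale * sum(q * k for q, k in zip(query, key))
                      for key in seq]
            weights = softmax(scores)
            out[t][r] = [sum(w * v[c] for w, v in zip(weights, seq))
                         for c in range(dim)]
    return out


# Two frames, two rays, two feature channels.
frames = [
    [[1.0, 0.0], [5.0, 5.0]],
    [[0.0, 1.0], [7.0, -3.0]],
]
mixed = ray_temporal_attention(frames)
print(mixed[0][0], mixed[1][0])
```

Because each ray is processed on its own temporal sequence, changing the features of one ray leaves the outputs of every other ray unchanged, and the cost scales with the number of frames squared per ray rather than with the full spatio-temporal token count.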

BibTeX

@article{liu2025listarraycentricworldmodels,
  title={LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving},
  author={Pei Liu and Songtao Wang and Lang Zhang and Xingyue Peng and Yuandong Lyu and Jiaxin Deng and Songxin Lu and Weiliang Ma and Xueyang Zhang and Yifei Zhan and XianPeng Lang and Jun Ma},
  year={2025},
  eprint={2511.16049},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.16049}
}