Physics-Informed BEV World Model

A major challenge in deploying world models is the trade-off between size and performance. Large world models can capture rich physical dynamics but require massive computing resources, making them impractical for edge devices. Small world models are easier to deploy but often struggle to learn accurate physics, leading to poor predictions. To address this, we propose the Physics-Informed BEV World Model (PIWM), a compact model designed to efficiently capture physical interactions in bird’s-eye-view (BEV) representations. PIWM incorporates a Soft Mask mechanism during training to improve dynamic object modeling and future prediction. We also introduce a simple yet effective inference technique called Warm Start, which enhances prediction quality even in zero-shot settings. Experiments demonstrate that, at the same parameter scale (400M), PIWM surpasses the baseline by 60.6% in weighted overall score. Moreover, even when compared with the largest baseline model (400M), the smallest PIWM variant (130M with Soft Mask) achieves a 7.4% higher weighted overall score while running inference 28% faster.

Out on a run and want the gist of our paper? Listen to the following podcast!

Human evaluation scores. The metrics are Interactive Existential Consistency (IEC), Kinematics Response (KIR), Temporal Existential Consistency (TEC), and Weighted Overall (WO). † marks experiments evaluated by 4 human raters; all other experiments were evaluated by 24 human raters. The baseline is DIAMOND.

At the same parameter scale (400M), PIWM surpasses the baseline by 60.6% in weighted overall score. Moreover, even when compared with the largest baseline model (400M), the smallest PIWM variant (130M with Soft Mask) achieves a 7.4% higher weighted overall score while running inference 28% faster.

Check out what the Genie 3 creators say in an interview (timestamp 25:25): "How Do You Measure the Quality of a World Model?"

BibTeX

@misc{anonymous,
  title={Enhancing Physical Consistency in Lightweight World Models},
  author={Anonymous Author(s) for now},
  year={2025},
  eprint={2509.12437},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2509.12437}
}

Typical problems in world models, compared with our solution

Physics-Informed BEV World Model Demonstration

Baseline (DIAMOND)

High visual quality but weak physical consistency

Hard Mask (Simple Geometry Prior)

Over-constrained behavior; lane changes are difficult

Soft Mask (ours)

High physical consistency (interactive & temporal)
