Xiaotao Hu1,2*, Wei Yin2*§, Mingkai Jia1,2, Junyuan Deng1,2, Xiaoyang Guo2
Qian Zhang2, Xiaoxiao Long1†, Ping Tan1
HKUST1, Horizon Robotics2
* Equal Contribution, † Corresponding Author, § Project Leader
We present DrivingWorld, a world model for autonomous driving that enables autoregressive video and ego-state generation with high efficiency. DrivingWorld formulates future-state prediction (ego states and visual frames) in a next-state autoregressive style. DrivingWorld can predict videos of over 40 seconds and achieves high-fidelity controllable generation.
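Conceptually, this is GPT-style next-token prediction applied to interleaved ego-state and frame tokens. Below is a minimal sketch of such a rollout loop under that assumption; every name (`world_model`, the token layout, the decode step) is an illustrative placeholder, not the actual DrivingWorld API.

```python
# Minimal sketch of a GPT-style next-state rollout for a driving world model.
# All names here are illustrative placeholders, not the DrivingWorld API.
import torch

@torch.no_grad()
def rollout(world_model, history_tokens, num_new_tokens):
    """Autoregressively extend a sequence of ego-state/frame tokens.

    history_tokens: (batch, seq) token ids of the conditioning frames and
    ego states; each future frame or ego state is itself a block of tokens.
    """
    tokens = history_tokens
    for _ in range(num_new_tokens):
        logits = world_model(tokens)[:, -1, :]          # next-token logits
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, 1)        # sample one token id
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens  # decode with the VQ decoder to recover frames/ego states
```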
[Dec 2024]
Released the paper, inference code, and a quick-start guide.
- Hugging Face demos
- Complete evaluation code
- Video preprocessing code
- Training code
- 🔥 Novel Approach: GPT-style video and ego-state generation.
- 🔥 State-of-the-art Performance: high-fidelity, long-duration driving-scene video generation.
- 🔥 Controllable Generation: high-fidelity generation controllable with ego poses.
git clone https://github.com/YvanYin/DrivingWorld.git
cd DrivingWorld
pip3 install -r requirements.txt
- Download the pretrained models from Hugging Face and place the weights under
DrivingWorld/pretrained_models/*
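If you prefer to script the download, something like the following should work with the `huggingface_hub` package. The repo id below is a placeholder; substitute the actual model repo from the Hugging Face link above.

```python
# Sketch: fetch the pretrained weights with huggingface_hub.
# NOTE: "<org>/DrivingWorld" is a placeholder, not the real repo id;
# use the actual repo from the Hugging Face link above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/DrivingWorld",     # placeholder repo id
    local_dir="pretrained_models",    # matches the expected layout
)
```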
For the public datasets, we use NuPlan and OpenDV-YouTube for testing. We download the nuPlan Val Split from NuPlan, and we follow OpenDV-YouTube to obtain its validation set. We share the JSON files on Hugging Face.
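These JSON files index the validation clips. A minimal sketch for inspecting one is below; the path and entry structure are hypothetical, so check the downloaded files for the actual schema.

```python
# Sketch: inspect a downloaded validation-split JSON index.
# The path and the entry fields are hypothetical; check the real files.
import json

with open("data/nuplan_val.json") as f:  # placeholder path
    clips = json.load(f)

print(f"{len(clips)} validation clips")
print(clips[0])  # inspect one entry to see its actual fields
```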
Scripts for the default setting (conditioned on 15 frames, on the NuPlan validation set, with top-k sampling; a sketch of top-k sampling follows the commands below):
cd tools
sh demo_test_long_term_nuplan.sh
sh demo_test_long_term_youtube.sh
sh demo_test_change_road.sh
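For reference, top-k sampling keeps only the k most probable next tokens, renormalizes them, and samples among those. Here is a minimal PyTorch sketch of the idea (not the repository's implementation):

```python
# Sketch of top-k sampling, the decoding strategy used by the demos above.
import torch

def sample_top_k(logits: torch.Tensor, k: int = 50) -> torch.Tensor:
    """logits: (batch, vocab) next-token logits -> (batch, 1) sampled ids."""
    topk_logits, topk_ids = torch.topk(logits, k, dim=-1)  # k best tokens
    probs = torch.softmax(topk_logits, dim=-1)             # renormalize
    choice = torch.multinomial(probs, num_samples=1)       # index in top-k
    return topk_ids.gather(-1, choice)                     # back to vocab ids
```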
You can change the settings via the config files in `configs/`.
If the paper and code of DrivingWorld help your research, we kindly ask you to cite our paper ❤️. Additionally, if you appreciate our work and find this repository useful, giving it a star ⭐️ is a wonderful way to support us. Thank you very much.
@article{hu2024drivingworld,
title={DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT},
author={Hu, Xiaotao and Yin, Wei and Jia, Mingkai and Deng, Junyuan and Guo, Xiaoyang and Zhang, Qian and Long, Xiaoxiao and Tan, Ping},
journal={arXiv preprint arXiv:2412.19505},
year={2024}
}
We thank VQGAN, LlamaGen, and Llama 3.1 for their codebases.
This repository is under the MIT License. For more license questions, please contact Wei Yin ([email protected]) and Xiaotao Hu ([email protected]).