Review

Research (Wash D C). 2024 Jul 16:7:0399.

doi: 10.34133/research.0399. eCollection 2024.

Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Jianhua Wu¹, Bingzhao Gao¹,², Jincheng Gao¹, Jianhao Yu¹, Hongqing Chu¹, Qiankun Yu³, Xun Gong⁴, Yi Chang⁴, H Eric Tseng⁵, Hong Chen⁶,⁷, Jie Chen²,⁷

Affiliations

¹ School of Automotive Studies, Tongji University, Shanghai 201804, China.
² Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201210, China.
³ SAIC Intelligent Technology, Shanghai 201805, China.
⁴ College of Artificial Intelligence, Jilin University, Changchun 130012, China.
⁵ Research and Advanced Engineering, Ford Motor Company, Dearborn, MI 48124, USA.
⁶ College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China.
⁷ National Key Laboratory of Autonomous Intelligent Unmanned Systems, Shanghai 201210, China.

PMID: 39015204
PMCID: PMC11249913
DOI: 10.34133/research.0399


Jianhua Wu et al. Research (Wash D C). 2024.


Abstract

With the development of artificial intelligence and breakthroughs in deep learning, large-scale foundation models (FMs), such as the generative pre-trained transformer (GPT) and Sora, have achieved remarkable results in many fields, including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can enhance scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret the various elements of a driving scene and provide cognitive reasoning that yields linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on their understanding of driving scenarios, providing feasible scenes for the rare occurrences in the long-tail distribution that are unlikely to be encountered during routine driving and data collection. This enhancement can in turn improve the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs lies in world models, exemplified by the DREAMER series, which show the ability to comprehend physical laws and dynamics. Learning from massive data under the self-supervised learning paradigm, world models can generate unseen yet plausible driving environments, which helps improve the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.


Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

**Fig. 2.**
Emergent abilities of LLMs [11]. (A) to (H) represent different downstream tasks. (A) 3-digit addition/subtraction, 2-digit multiplication. (B) Transliteration from the international phonetic alphabet. (C) Recovering scrambled words. (D) Persian question-answering. (E) Answering questions truthfully. (F) Mapping conceptual domains. (G) Massive multi-task language understanding. (H) Word in context semantic understanding. Each point is a separate LLM. The dotted line represents random performance.

**Fig. 3.**
Pipeline of a supervised end-to-end autonomous driving system with a pretraining backbone. Multi-modal sensing information is fed to the pretraining backbone for feature extraction; the extracted features then enter an autonomous driving framework, built by various methods, that performs tasks such as planning/control to accomplish end-to-end autonomous driving.

**Fig. 4.**
Pipeline for enhancing autonomous driving with FMs, where FMs refer to language models and vision models. FMs learn from perceptual information and use their strong ability to understand driving scenarios and to reason, giving language-guided instructions and driving actions that enhance autonomous driving.

**Fig. 5.**
A typical pipeline for applying LLMs to decision-making in autonomous driving systems, referenced from DriveMLM [105].

**Fig. 6.**
A typical pipeline for applying LLMs to planning in autonomous driving systems, referenced from LMDrive [119].

**Fig. 7.**
Pipeline for enhancing autonomous driving with world models. The world models first learn the intrinsic evolutionary patterns of the traffic environment by observing it, and then enhance autonomous driving by attaching different decoders adapted to different driving tasks.

**Fig. 8.**
Comparison of the architecture of generative and non-generative methods [184]. (A) Generative architectures reconstruct a signal y from a compatible signal x using a decoder network conditioned on additional (possibly latent) variables z. (B) Joint-embedding predictive architectures predict the embeddings of a signal y from a compatible signal x using a predictor network conditioned on additional (possibly latent) variables z.


References

1. Yurtsever E, Lambert J, Carballo A, Takeda K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access. 2020;8:58443–58469.
2. Grigorescu S, Trasnea B, Cocias T, Macesanu G. A survey of deep learning techniques for autonomous driving. J Field Robot. 2020;37(3):362–386.
3. Chen L, Wu P, Chitta K, Jaeger B, Geiger A, Li H. End-to-end autonomous driving: Challenges and frontiers. arXiv. 2023. doi: 10.48550/arXiv.2306.16927
4. Chib PS, Singh P. Recent advancements in end-to-end autonomous driving using deep learning: A survey. IEEE Trans Intell Veh. 2023;9(1):103–118.
5. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, et al. On the opportunities and risks of foundation models. arXiv. 2021. doi: 10.48550/arXiv.2108.07258

