ADAS / Autonomous Vehicles

Market Research Report

Product Code: 2064024

Intelligent Driving End-to-End Large Model Research Report, 2026

Published: May 8, 2026 | Publisher: ResearchInChina | English, 595 Pages | Delivery: 1-2 business days at the earliest

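Research on Intelligent Driving Large Models: A Critical Period for Technological Competition and Paradigm Integration

As autonomous driving technology advances rapidly from L2 to L3 and L4, intelligent driving systems are undergoing a major shift from traditional rule-driven architectures to a new generation of data-driven and cognition-driven architectures. As the core technology behind this shift, intelligent driving large models have become the focus of industry competition. With the arrival of the Physical AI era, autonomous driving is positioned as its first large-scale application scenario, pushing the automobile to evolve from a traditional means of transport into a "super agent": an all-scenario intelligent hub connecting mobility, mobile office, home life, and third-party ecosystems.

From an industry perspective, Physical AI is still at an early stage, and the global autonomous driving market holds huge untapped potential. According to the data, there are about 1.5 billion passenger cars, 280 million commercial vehicles and trucks, and 18 million taxis in operation worldwide. Total annual driving mileage worldwide reaches 13 trillion kilometers, while autonomous driving mileage is only 700 million kilometers, or only about 0.006% of the total. The room for future growth is large.

Judging from the pace of technology adoption, intelligent driving large models are entering a critical iteration window. Segmented end-to-end solutions entered mass production during 2024-2025, while one-model end-to-end and VLA technologies are being widely deployed during 2025-2026. Together with the continuous improvement of the intelligent driving experience and the accelerating maturity of L3-L4 high-level autonomous driving technology, Physical AI is also developing rapidly. ResearchInChina predicts three major evolution trends for intelligent driving large models.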

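Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes.

Integration Mode 1: One-model End-to-End + World Model + Reinforcement Learning, Representative Suppliers: WeRide, Bosch and Momenta

Features: The one-model end-to-end model serves as the core neural network of intelligent driving, directly connecting sensor input to driving output with zero information loss and therefore a very high performance ceiling. The world model infers future road conditions and can generate large numbers of long-tail scenarios at low cost for simulation training. Reinforcement learning, driven by a reward mechanism, iterates and optimizes within the inference space, outputs the best driving strategy, and copes with sudden situations. Together the three form a powerful closed loop: data generation (world model) -> policy training (reinforcement learning) -> decision and execution (end-to-end model). This lets the intelligent driving system learn from massive driving data and keep evolving.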

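Integration Mode 2: E2E + Foundation Model (VLM/VLA) + Reinforcement Learning + World Model, Representative Suppliers: Horizon Robotics and Afari Technology

Features: The large vision-language model acts as the "cerebrum", responsible for cognitive reasoning, while the small end-to-end model acts as the "cerebellum", responsible for rapid execution.

Horizon Robotics adopts a dual-track intelligent driving architecture of one-model end-to-end + VLM + reinforcement learning + world model that combines "fast thinking" and "slow thinking", with reinforcement learning at its core. On the one hand, it strengthens the end-to-end intuition model through the world model and simulation training, achieving millisecond-level response while improving its handling of rare, short-horizon, long-tail scenarios. On the other hand, it strengthens the VLM cognitive model through reasoning enhancement, improving semantic understanding and logical reasoning for complex, long-horizon scenarios. Finally, it migrates VLM capabilities to the on-vehicle model and achieves lightweight deployment through quantization and distillation, forming a balanced closed loop of "millisecond-level fast response + long-horizon slow reasoning".

Afari Technology adopts a VLA + E2E + world model architecture, in which the VLA model handles reasoning, analogous to high-level decision-making by a slow system, and the E2E algorithm maps inputs to actions, analogous to a fast system. Training proceeds in stages: a 32B-parameter model undergoes large-scale multimodal pre-training (VLM); it is distilled into a 7B lightweight model to balance performance and deployability (VLM); perception is aligned with driving actions and driving domain knowledge is introduced (VLA); supervised fine-tuning teaches high-level driving strategies and behavioral norms; and reinforcement learning aligns the model with human driving styles and safety constraints, achieving closed-loop optimization across perception, decision-making and control.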

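Integration Mode 3: VLA + World Model, Representative Suppliers: Zhuoyu Technology and XPeng

Features: The VLA is responsible for perceiving the current environment, learning historical driving patterns, and deciding the next action. The world model is responsible for inferring how each object on the road will interact over the next 5 to 10 seconds. The VLA is good at understanding the present but weaker at predicting the future; the world model is good at prediction but cannot reflect on or reason about its predictions. Combined, the two form a complete "brain".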

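Trend 2: The VLA and World Model Fusion Paradigm Is Expected to Become One of the Main Ways to Implement Physical AI.

The core of the future evolution of intelligent driving large models is a fundamental reconstruction of the underlying paradigm, from "imitating human driving" to "understanding the physical world". VLA and the world model are not an either-or choice; the future intelligent driving large model will be a fusion of the two. At present, the two routes diverge in that VLA advocates see "understanding" as the premise of driving, while world model advocates see "prediction" as the key.

World model advocates argue that changes in the physical world are continuous and high-dimensional, whereas language is a discrete, low-dimensional symbolic system, so converting physics into language inevitably loses information. The world model operates directly on physical representations at higher bandwidth. VLA advocates argue that the biggest advantage of VLA is that it can be fine-tuned together with a world model or model-based reinforcement learning: VLA can absorb the strengths of the world model, while the world model cannot use the strengths of VLM/VLA. Language is a compressed package of human common sense and therefore generalizes strongly. Through language, VLA gains "common sense reasoning" and Chain-of-Thought (CoT), and with them the ability to explain itself.

Given the strengths and differences of the two routes, the industry has begun working to fuse them. At present there are three mainstream fusion modes for VLA and the world model: latent space unified fusion, in-depth fusion at the architectural level, and modular collaborative fusion (cloud simulator type).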

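Fusion Mode 1: Latent Space Unified Fusion, Representatives: Xiaomi OneVL and Huawei DriveVLA-W0

The core of this approach is to embed the prediction capability of the world model into the training objectives of the VLA, rather than adding extra modules at inference time. Specifically, a future image prediction task is added to the training of the VLA model, so that the model learns not only to predict actions but also the environmental state at future moments (i.e., future images). This design pushes the model to learn the underlying dynamics of the driving environment rather than simply fitting sparse action supervision signals.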

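Case 1 of Latent Space Unified Fusion: Xiaomi OneVL Autonomous Driving Model

On May 13, 2026, Xiaomi officially released Xiaomi OneVL, a fully open-source autonomous driving model that unifies three technical routes (VLA, world model, and latent space reasoning) in a single framework. The core breakthrough of the model is the deep unification of multiple technical paradigms through latent space reasoning. Unlike traditional solutions that decompose the reasoning process into human-readable natural language and generate the deduction word by word, Xiaomi OneVL performs end-to-end logical operations directly in a high-dimensional, vectorized latent space. This latent space combines the scene perception and understanding capability of VLA with the time-series environment prediction capability of the world model, and because all reasoning runs at the vector level rather than the text level, reasoning efficiency improves significantly over traditional VLA solutions.

In terms of implementation, two types of latent variables are first introduced inside the model: visual latent tokens and language latent tokens. The former encode physical relationships and time-series changes in the scene and carry the prediction capability of the world model; the latter express driving intentions and semantic logic and carry the understanding capability of VLA.

Next, OneVL introduces two auxiliary decoders that are used only during training. The language auxiliary decoder reconstructs human-readable CoT text from the language latent tokens, explaining why the model makes a given driving decision. The visual auxiliary decoder predicts visual tokens of future frames (images 0.5 seconds and 1.0 seconds ahead) from the visual latent tokens, enabling the model to predict scene changes. During inference both decoders are removed and the model outputs planning results directly. This achieves one-step reasoning and eliminates the latency accumulation caused by autoregression.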

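Case 2 of Latent Space Unified Fusion: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks

Traditional VLA models face a fundamental problem: a supervision deficit. Although the input of a VLA model is high-dimensional multimodal data (front-view image sequences, language instructions, historical actions, etc.), the supervision signal consists only of low-dimensional action tokens. Most of the model's representational capacity is therefore wasted, so it cannot fully learn the complex dynamics of the driving environment, and the large potential of VLA models cannot be released effectively.

As the figure below shows, the collision rate trends downward as the amount of training data increases from 700,000 frames to 7 million frames and then to 70 million frames. In other words, the more training data, the better the safety. However, in the traditional VLA paradigm without a world model, the decline in collision rate slows when the data grows from 7 million to 70 million frames, indicating that more data alone has a limited effect on the safety performance of VLA.

To address VLA pain points such as sparse supervision, the breakdown of the data scaling law, and the lack of physical time-series prediction capability, Huawei proposed the DriveVLA-W0 training paradigm in its paper. The paradigm introduces a world model that predicts future images as a dense self-supervision signal during training, adding future time-series prediction while preserving the ability to understand environmental dynamics. Compared with traditional VLA, DriveVLA-W0 adds world modeling (predicting future road conditions). The advantage of world modeling grows as the data volume increases, strengthening the data scaling law.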

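Fusion Mode 2: In-depth Fusion at the Architectural Level, Representative: VLA-World

Unlike pre-training fusion (external reinforcement), where the world model acts as an external tool that generates first and then passes its output on, in-depth fusion at the architectural level internalizes world model capability as a native capability of the VLA, so that planning and generation develop together within the same architecture.

VLA-World, jointly proposed by Shanghai Jiao Tong University and Huawei Central Research Institute in April 2026, is an integrated VLA architecture with deeply embedded world model capabilities. In traditional solutions the world model and the VLA are independent of each other: the former generates simulation videos and the latter handles perception reasoning and decision output. VLA-World adopts a single VLA backbone network so that visual generation and decision reasoning share features. It treats trajectory prediction and visual generation as consecutive links in the same decision chain and follows the causal logic of predicting the motion trajectory first and then inferring future images from that trajectory, achieving deep module coupling and a highly coherent reasoning chain.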

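Working Mechanism:

Trajectory-aware Conditioning: VLA-World first predicts the trajectory and then generates future frames conditioned on it. The trajectory prediction result serves directly as the conditioning signal that guides visual generation. This creates a causal dependency: the trajectory determines "where to go", and the image presents "what is seen on arriving there".

Unified Generation and Reasoning: Unlike earlier designs in which the world model and the VLA were two independent modules, VLA-World has both share the same VLA backbone. In other words, visual generation and reasoning are unified in the same VLA structure.

GRPO End-to-End Alignment: In the reinforcement learning stage, the model is optimized with GRPO (Group Relative Policy Optimization). The model generates multiple candidate trajectories with their corresponding future images and rewards results in which the "imagined future" is consistent with the "real safe decision". This mechanism ensures that visual generation is no longer an independent task but always serves the quality of downstream decisions.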

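Trend 3: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter a Period of Competition in the General Cognitive and Reasoning Capabilities of Foundation Models.

2026 is the first year in which autonomous driving foundation models are launched. DeepRoute.ai, Afari Technology, Zhuoyu Technology, Li Auto, and XPeng have all released related products. The core of these foundation models is to build a universal, reusable cognitive base for the physical world, enabling compatibility with all levels of intelligent driving and capability migration across scenarios.

Autonomous driving is essentially a typical scaling problem, and current implementations are constrained mainly by insufficient model capacity and an inefficient data closed loop. First, existing foundation models are limited in scale and do not generalize well enough to handle complex long-tail scenarios. Second, mining high-value data depends on manual screening and review; this fragmentation and lack of automation limit long-term iteration.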

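To address the two bottlenecks of insufficient model capacity and an inefficient data closed loop, DeepRoute.ai proposed a solution: a unified 40B-parameter VLA foundation model. Its core is a "trinity" role design that lets the same model play three roles at once: driver (visual input -> real-time driving decision), analyst (diagnosing key scenarios), and critic/referee (evaluating the safety and rationality of driving behavior). This upgrades the driving system from a pure execution system to an intelligent system with cognitive capabilities.

In the pre-training stage, DeepRoute.ai abandons the traditional end-to-end approach that relies on trajectory supervision (with a data utilization rate of only 0.001%) and adopts a video prediction task instead. The model learns the dynamic structure of the real world by predicting video sequences, turning every pixel into a supervision signal and raising the data utilization rate to nearly 100%.

In the core training stage (mid-training), the model is trained jointly on three tasks: V+A (vision + action) to learn conventional end-to-end driving; V+A->L (explanation after action) to activate the analyst and critic roles; and V->L+A (multimodal logical reasoning) to train a driver with reasoning capability. Using Chain-of-Thought, the model first outputs language descriptions and decision logic for key events and then outputs the specific driving trajectory.

In terms of engineering, DeepRoute.ai uses optimizations such as KV cache, Multi-Token Prediction (MTP), model quantization, and a self-developed inference engine to keep the single-step latency for processing 1,000 visual tokens and dozens of reasoning tokens within 60-85 milliseconds, achieving 10-15 Hz real-time closed-loop control. The foundation model can also be deployed flexibly according to the computing power of the vehicle chip, for example a pure-driving VA model on a 100 TOPS platform and a VLA model with logical reasoning capability on a 500 TOPS platform.

In addition, the foundation model is pre-trained to learn the physical laws and spatial logic of the real world and has native zero-shot transfer capability. With its universal cognitive base, it adapts to every level from L2 assisted driving to L4 autonomous driving through model distillation, computing power tailoring, and capability fine-tuning. It is applied first to autonomous driving and will later extend to fields such as humanoid robots and industrial robots, with the goal of "one foundation making all things intelligent".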

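In 2026, Zhuoyu Technology is fully transforming its strategy. With a native multimodal foundation model as its technical base, it aims to upgrade from an "intelligent driving Tier 1 supplier" to a "mobile physical AI company". It will focus on expanding mass production across all scenarios and vertical domains, covering passenger cars, commercial vehicles, L4 products and overseas expansion, and will extend further into embodied robots.

Zhuoyu launched VLA (VLA World Model, native multimodal FM), which uses a unified backbone to process visual, text, and sensor data, performs physical reasoning in the latent space, and directly outputs driving actions. From the pre-training stage onward, it is trained jointly on image, video, text, driving, and robot data, and predicts and reasons about the physical world in a unified latent space, understanding both semantics and physical laws.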

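2026 will be a critical year for the technological iteration and paradigm fusion of intelligent driving large models. The competition and integration of multiple technical routes, the collaborative deployment of VLA and world models, and the large-scale launch of foundation models will together accelerate the intelligent driving industry's shift from "technological exploration" to "large-scale implementation". Whether in the technical innovation of multi-route fusion or the general deployment of foundation models, the core goal remains "safer, more efficient, and more adaptable to real driving scenarios". The trend towards implementing Physical AI will further drive intelligent driving systems from the "imitating humans" stage to the "understanding the world" stage, ultimately realizing true intelligent driving.

In the future, as technology continues to iterate and the industry chain improves in a coordinated way, intelligent driving large models are expected to gradually overcome existing bottlenecks, become the core support for large-scale deployment of autonomous driving, reshape the development landscape of the mobility sector, and promote the extension of mobile physical AI to more scenarios.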

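Terms and Concepts of End-to-End Autonomous Driving
Explanation of End-to-End Autonomous Driving Terminologies
Correlation and Differences of End-to-End Related Concepts
Introduction to End-to-End Autonomous Driving and Development Status
Classic End-to-End Autonomous Driving Cases
SenseTime UniAD: Path Planning-oriented AI Large Model for End-to-End Commercial Scenario Applications
Technical Principle and Architecture of SenseTime UniAD
Technical Principle and Architecture of Horizon VAD
Technical Principle and Architecture of Horizon VADv2
VADv2 Training
Technical Principle and Architecture of DriveVLM
Li Auto Adopts the Mixture of Experts (MoE) Architecture
MoE and STR2
Shanghai Qi Zhi Institute's End-to-End Autonomous Driving Model SGADS: A Safe and Generalized End-to-End Autonomous Driving System Based on Reinforcement Learning and Imitation Learning
Shanghai Jiao Tong University's ActiveAD Active Learning Case: Solving the Data Labeling Bottleneck from a Data-centric Perspective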
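End-to-End Autonomous Driving Systems Are Mostly Built on Foundation Models
Foundation Models
Vision-Language Models (VLM)
Application of Vision-Language Models (VLM) in Intelligent Driving
Application of Foundation Models in Autonomous Driving
Application of Vision-Language Models (VLM)
Development History of Vision-Language Models (VLM)
Architecture of Vision-Language Models (VLM)
Application Principle of VLM in End-to-End Autonomous Driving
Application of VLM in End-to-End Autonomous Driving
Challenges for VLM in Intelligent Driving
Vision-Language-Action Models (VLA)
VLM -> VLA
VLM+E2E -> VLA
VLA Architecture Analysis
Typical VLA Architecture
VLA Architecture Analysis Example: Breakdown of Li Auto's MindVLA Architecture (1)
VLA Architecture Analysis Example: Breakdown of Li Auto's MindVLA Architecture (2)
Concept of VLA Large Models
Principle of VLA Models
Classification of VLA Models
Interpretation of VLA Technology Evolution
Large Language Models Are One of the Core Components of End-to-End Solutions
Technical Architecture and Key Technologies of VLA
Advantages of VLA
Challenges in VLA Model Deployment: Real-time Response
Challenges in VLA Model Deployment: Real-time Performance and Memory Usage
Challenges in VLA Model Deployment: Data
Challenges in VLA Model Deployment: Long-horizon Task Planning
Evolution Path of VLA Large Models
Representative Models of the VLA Technical Paradigm
VLA Datasets and Benchmarks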
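World Models
Prototype of World Models: Mental Models
Key Definitions and Application Development of World Models
Basic Architecture of World Models
Three Core Values of World Models for Autonomous Driving
Two Main Technical Routes of World Models
Generative World Model DIAMOND: Diffusion Model + Real-time Reinforcement Learning Adaptation + Long-term Stability
Genie: A Generative Interactive World Model That Learns Real-world Physical Laws from Unlabeled Internet Videos
Technical Principle and Development Path of WorldDreamer
Implicit World Models: Technical Principle and Path of V-JEPA2
Implicit World Models: Technical Principle and Development Path of Comma.ai
Difficulties in Building and Implementing World Model Frameworks
Video Generation Methods Based on Transformer and Diffusion Models
World Models May Be One of the Ideal Approaches to End-to-End Autonomous Driving
World Models: Generating Virtual Training Data
World Models: Tesla World Model
World Models: NVIDIA
InfinityDrive: Breaking the Time Limits of Driving World Models
SenseAuto InfinityDrive Parameter Performance
SenseAuto InfinityDrive Pipeline
SenseTime DiT Architecture and Key Video Generation Evaluation Metrics FID/FV
Challenges of Introducing World Models into Autonomous Driving
Comparison of End-to-End Large Model Technical Paradigms
Diffusion Models
Four Mainstream Generative Models
Principle of Diffusion Models
Diffusion Models Optimize the Core Link of Intelligent Driving Trajectory Generation
Intelligent Optimization of Driving Trajectory Generation Based on Diffusion Models
Application of Diffusion Models in Intelligent Driving
Practical Applications of Diffusion Models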

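2 Technical Routes and Development Trends of End-to-End Autonomous Driving

Technology Trends of End-to-End Autonomous Driving
Summary of the Evolution Path of Intelligent Driving End-to-End Large Models
Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes.
Integration Example 1: Afari Technology's Autonomous Driving System Adopts a VLA+E2E Collaborative Closed Loop.
Integration Example 2: The L3-enabling World Action Model (WAM) Builds a Three-part Architecture of "VLA + World Model + Safety Adversarial Model".
Trend 2: The VLA and World Model Fusion Paradigm Is Expected to Become One of the Main Ways to Implement Physical AI.
VLA + World Model Fusion Case 1: Xiaomi OneVL Unifies VLA and World Model in a Single Framework
Breakdown of the Xiaomi OneVL Architecture
VLA + World Model Fusion Case 2: XPeng Launches X-World
VLA + World Model Fusion Case 3: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks
Breakdown of the DriveVLA-W0 Architecture
DriveVLA-W0 Uses the World Model to Amplify the Scaling Law of Autonomous Driving Data.
VLA + World Model Fusion Case 4: Bosch ExploreVLA Implements a World Model Based on VLA+RL and Achieves Three Major Breakthroughs.
Breakdown of the Bosch ExploreVLA Model Architecture
Trend 3: Autonomous Driving Is Entering the Physical AI Stage.
The Ultimate Form of Physical AI Will Connect the Digital and Physical Worlds, and Autonomous Driving Will Be the Best Carrier for Achieving This.
Trend 4: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Is Entering a Period of Competition in the General Cognitive and Reasoning Capabilities of Foundation Models.
Case 1: Hardcore Technical Innovations in the DeepRoute 40B VLA Foundation Model
Case 2: The Core of Zhuoyu Technology's 2026 Strategy: Building a Mobile Intelligence Foundation Model (1)
Case 2: The Core of Zhuoyu Technology's 2026 Strategy: Building a Mobile Intelligence Foundation Model (2)
Case 3: XPeng World Foundation Model
Trend 5: End-to-End Autonomous Driving Enters a Complex Operational Phase, and Competition over the Data Closed Loop Intensifies.
Case: NVIDIA MOSAIC
Trend 6: Robots and Intelligent Driving Will Be the Two Major End-to-End Application Scenarios on the Road to Artificial General Intelligence (AGI).
Market Trends of End-to-End Autonomous Driving
Comparison of End-to-End Autonomous Driving Large Model Layouts of ADAS Tier 1 Suppliers
Comparison of Solution Layouts of Other End-to-End Autonomous Driving System Suppliers
Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (1): Xiaomi, XPeng, Li Auto, NIO
Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (2): Changan, BYD, Leapmotor
Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (3): Chery, Dongfeng, IM Motors
Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (4): GAC, FAW Hongqi, Geely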

3 End-to-End Autonomous Driving Suppliers

4 End-to-End Autonomous Driving Layout of OEMs

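Xiaomi
Profile
Strategic Planning, 2026
Panoramic Analysis of the New Vehicle Plan, 2026
Product Positioning and Parameter Benchmarking of New Vehicles, 2026
Organizational Restructuring of the Intelligent Driving Department
Intelligent Driving Technology Route: Pre-research Across All Routes, Without Relying on a Single Technology
Comparison Between VLA and End-to-End Routes
Evolution Trend of Intelligent Driving Algorithms: From Modular End-to-End Architecture to End-to-End Architecture; Introduction of World Model + Reinforcement Learning
XLA Cognitive Large Model to Be Released in 2026
Roadmap of Intelligent Driving Systems and Large Models
HAD Enhanced Version
Orion: End-to-End VLA Intelligent Driving Solution
REION Framework
Physical World Modeling Architecture
Three-layer Separated Modeling: Multi-model End-to-End
Long Video Generation Framework: MiLA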
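XPeng
Evolution Roadmap of End-to-End Intelligent Driving Large Models
Autonomous Driving Product Planning, 2025-2026
L4 Autonomous Driving Layout in 2026: Robotaxi
Second-generation VLA: A Native Multimodal Large Model of the Physical World
L4 Capability = Model × Computing Power × Data Volume × Vehicle Hardware
Second-generation VLA
World Foundation Model
Core Technical Path of the World Foundation Model
Three Stages of R&D Achievements of the World Foundation Model
Cloud Model Factory
End-to-End System: Architecture
Li Auto
Evolution Roadmap of End-to-End Intelligent Driving Large Models
Next-generation Unified Architecture "MindVLA-o1" to Be Released in 2026
Next-generation Unified Architecture MindVLA-o1
Evolution from the E2E+VLM Dual System to MindVLA
MindVLA Model Architecture
Core Technology 1 of MindVLA: Excellent 3D Spatial Perception
Core Technology 2 of MindVLA: Integration with Large Language Models (LLM)
Core Technology 3 of MindVLA: Combination of Diffusion and RLHF
Core Technology 4 of MindVLA: World Model and NVAIE-accelerated Reinforcement Learning
End-to-End Solution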
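Tesla
Interpretation of the 2024 AI Conference
Development History of AD Algorithms
Summary of Overall Progress, 2023-2024
FSD v13
Development History of AD Algorithms: Entering an Era Emphasizing Perception and Maps
Development History of AD Algorithms
Development History of AD Algorithms: Multi-camera Fusion Algorithm HydraNet
Development History of AD Algorithms: FSD V12
Core Elements of the Full-stack Integrated Perception and Decision Model
End-to-End Algorithm
World Model
Data Engine
Dojo Supercomputing Center: Overview
Dojo Supercomputing Center: Training Modules Integrated from D1 Chips
Dojo Supercomputing Center: Computing Power Development Plan
NIO
Restructuring of the Intelligent Driving Business Unit, 2024-2025
From Model-based to End-to-End, the World Model Becomes the Dominant Technical Paradigm.
Evolution Path of End-to-End Large Models
Detailed Description of the Intelligent Driving System
NIO World Model (NWM)
Imagination-Reconstruction and Group Intelligence Capabilities of the World Model
NSim Simulator (NIO Simulation)
World Model 2.0
Comparison Between End-to-End Models and World Models
Comparison Between VLA Models and World Models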
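Changan
Dubhe Plan 2.0 - Tenju Intelligent Driving
TOPS AD Software Architecture
Brand Layout
ADAS Strategy: "Dubhe Plan"
End-to-End System: BEV+LLM+GoT
Mass-produced Model with the End-to-End System: NEVO E07
Chery
Product Matrix and Vehicle Models
Development History of Intelligent Driving Systems
Four Falcon Pilot Products to Be Launched in 2025.
Progress in End-to-End Intelligent Driving Large Models
GAC Group
Intelligent Driving Large Model Strategy
Evolution Roadmap of the ADiGO Intelligent Driving System (ADiGO 1.0 to ADiGO 6.0)
Five Intelligent Driving Platforms to Be Launched in 2025.
L2.9 Vehicles and Suppliers of Urban NOA Algorithms/Intelligent Driving Systems
The "Dual-tier Intelligent Driving Suppliers + Scenario-based Pricing Match" Strategy Enables Urban NOA to Achieve "High-end Positioning + Mass-market Accessibility".
The "GAC Intelligent Manufacturing + Huawei Intelligence" Model Expands the High-end Market and Strengthens the Brand Matrix.
The First Model, 華王愛達樂園 F03, Is Scheduled for Release in Q2 2026.
Momenta 5.0's One-model End-to-End Algorithm Is Now Installed in 150,000-yuan-class Vehicles and Also Provides Urban NOA.
The Trumpchi Xiangwang S7 Is Equipped with the Momenta R6 Reinforcement Learning Large Model.
Architecture of the ADiGO End-to-End Reasoning Model
Core Technologies of ADiGO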
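Leapmotor
World Model to Be Released in 2026
The D19 Adopts a VLA Large Model and Achieves Door-to-Door NOA in All Scenarios.
Self-developed Intelligent Driving System Adopted.
Leapmotor Pilot Roadmap
End-to-End High-level Intelligent Driving
Application Scenarios of End-to-End High-level Intelligent Driving
IM Motors
Iteration History of Intelligent Driving Systems
Cooperation with Momenta on Intelligent Driving Technology
IM AD End-to-End 2.0 Intelligent Driving Large Model
IM AD End-to-End 2.0 Intelligent Driving Large Model: Core Technologies
IM AD End-to-End 2.0 Intelligent Driving Large Model: Comparison of Application Scenarios
FAW Hongqi
Sinan Intelligent Driving Technical Architecture
Core Technologies of the End-to-End Large Model
Sinan Intelligent Driving Solution
Vehicle Deployment Plan and Future Planning of the Sinan Intelligent Driving Solution
Sinan Intelligent Driving System: Co-developed with DJI Zhuoyu Technology
Vehicles Equipped with the Sinan Intelligent Driving System and Their Main Configurations
Zhuoyu End-to-End 4.0 System to Debut on Sinan Intelligent Driving in 2026.
FAW Hongqi 9-series Models Plan to Adopt Huawei's Advanced Technologies in 2026.
Dongfeng
Intelligent Driving Strategy Planning, 2026-2030
In 2025, Tianyuan Releases a Four-tier Intelligent Driving Product Matrix Covering L2 to L4/L5.
Comparison of Intelligent Driving Configurations of the First Mass-produced Models with Tianyuan T100/T200/T500
R&D of the Tianyuan Intelligent Driving Technical Architecture
Intelligent Driving Strategy: In-house R&D and External Procurement in Parallel in the Short Term; Gradual Replacement with In-house R&D in the Long Term.
BYD
Overview of the 2026 Intelligent Driving Plan
Layout in Intelligent Driving: Preliminary Research on World Models
Restructuring of the Intelligent Driving Team (1): Integrating Two Intelligent Driving Departments and Sharing Resources to Accelerate Intelligent Driving for All.
Restructuring of the Intelligent Driving Team (2): Increasing Investment in Building the Advanced Technology R&D Center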

Product Code: DTT011

Research on Intelligent Driving Large Models: A Critical Period for Technological Competition and Paradigm Integration

As autonomous driving technology iterates rapidly from L2 to L3-L4, intelligent driving systems are shifting profoundly from traditional rule-driven architectures to a new generation of data-driven and cognition-driven architectures. As the underlying core enabler, intelligent driving large models have become the main arena of industry competition. With the accelerated arrival of the Physical AI era, autonomous driving stands as its first large-scale application scenario, pushing automobiles to evolve rapidly into super agents that go beyond traditional transportation tools and become all-scenario intelligent hubs connecting mobility, mobile office, home life, and third-party ecosystems.

From an industrial perspective, Physical AI remains in the early stage of technological fission, and the global autonomous driving market holds massive untapped potential. According to the data, there is a global ownership of about 1.5 billion passenger cars, 280 million commercial vehicles and trucks, and 18 million operating taxis. The total annual global driving mileage reaches 13 trillion kilometers, while the autonomous driving mileage is only 700 million kilometers, accounting for only about 0.006%. The future incremental potential is significant.
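The share quoted above can be checked directly from the two mileage figures. The short Python sketch below uses only the rounded numbers given in the text (the variable names are illustrative). With these inputs the result is about 0.0054%, the same order of magnitude as the roughly 0.006% cited; the small gap presumably reflects rounding in the underlying data.

```python
# Figures as quoted in the paragraph above (both are rounded values).
total_km_per_year = 13e12        # 13 trillion km of total annual driving mileage
autonomous_km_per_year = 700e6   # 700 million km of autonomous driving mileage

# Share of autonomous mileage, expressed as a percentage of the total.
share_pct = autonomous_km_per_year / total_km_per_year * 100

print(f"{share_pct:.4f}%")  # prints 0.0054%
```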

Judging further from the pace of technological implementation, intelligent driving large models are ushering in a critical technological iteration window period. The segmented end-to-end solution has come into mass production during 2024-2025, and the one-model end-to-end and VLA technologies are intensively implemented during 2025-2026. Coupled with the continuous upgrading of intelligent driving experience and the accelerated maturation of L3-L4 high-level autonomous driving technology, physical AI is accelerating. ResearchInChina predicts three major evolution trends of intelligent driving large models.

Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes.

Integration Mode 1: One-model End-to-End + World Model + Reinforcement Learning, Representative Suppliers: WeRide, Bosch and Momenta

Features: The one-model end-to-end model serves as the core neural network of intelligent driving, directly connecting sensor input and driving output with zero information loss and extremely high performance ceiling; the world model is responsible for future deduction of road conditions and can generate massive long-tail scenarios at low cost for simulation training; reinforcement learning iterates and optimizes in the deduction space relying on the reward mechanism, outputs the optimal driving strategy, and copes with various sudden working conditions. The combination of the three forms a powerful closed loop of "data generation (world model) -> policy training (reinforcement learning) -> decision and execution (end-to-end model)". This enables intelligent driving systems to learn from massive driving data and keep evolving.

Integration Mode 2: E2E + Foundation Model (VLM/VLA) + Reinforcement Learning + World Model, Representative Suppliers: Horizon Robotics and Afari Technology

Features: The vision-language large model acts as the "cerebrum" responsible for cognitive reasoning, and the small end-to-end model acts as the "cerebellum" responsible for rapid execution.

Horizon Robotics adopts a "fast thinking + slow thinking" dual-track intelligent driving architecture of one-model E2E + VLM + reinforcement learning + world model, with reinforcement learning as the hub. On the one hand, it strengthens the end-to-end intuition model through the world model and simulation training, enabling millisecond-level response while improving the handling of rare, short-horizon long-tail scenarios. On the other hand, it strengthens the VLM cognitive model through reasoning enhancement, improving semantic understanding and logical reasoning for complex, long-horizon scenarios. Finally, it migrates VLM capabilities to the on-vehicle model and completes lightweight deployment through quantization and distillation, building a balanced closed loop of "millisecond-level fast response + long-horizon slow reasoning".

Afari Technology adopts a VLA + E2E + world model architecture, in which the VLA model handles reasoning, analogous to high-level decision-making by a slow system, and the E2E algorithm maps inputs to actions, analogous to a fast system. Training proceeds in stages: a 32B-parameter model undergoes large-scale multimodal pre-training (VLM); it is distilled into a 7B lightweight model to balance performance and deployability (VLM); perception is aligned with driving actions and driving domain knowledge is introduced (VLA); supervised fine-tuning teaches high-level driving strategies and behavioral norms; and reinforcement learning aligns the model with human driving styles and safety constraints, achieving closed-loop optimization across perception, decision-making and control.

Integration Mode 3: VLA + World Model, Representative Suppliers: Zhuoyu Technology and XPeng

Features: VLA is responsible for perceiving the current environment, learning historical driving patterns, and determining the next action. The world model is responsible for deducing how each target on the road will interact in the next 5 to 10 seconds. VLA is good at understanding the present but not predicting the future; the world model is good at prediction but does not reflect on and reason about the prediction results. The combination of the two constitutes a complete brain.

Trend 2: The VLA and world model fusion paradigm is expected to become one of the main ways for the implementation of Physical AI.

The core of the future evolution of intelligent driving large models is the fundamental reconstruction of the underlying paradigm from "imitating human driving" to "understanding the physical world". VLA and world model are not an either-or choice. The future intelligent driving large model will be a fusion of the two. At present, the divergence between the two routes lies in that VLA advocates believe that "understanding" is the premise of driving, while world model advocates believe that "prediction" is the key.

World model advocates believe that changes in the physical world are continuous and high-dimensional. Language is a discrete, low-dimensional symbolic system - the transformation from physics to language is inevitably accompanied by information loss. The world model directly operates physical representations with higher bandwidth. VLA advocates believe that the biggest advantage of VLA is that it can be fine-tuned with the world model or model-based reinforcement learning. It can absorb the advantages of the world model, while the world model cannot utilize the advantages of VLM/VLA. Language brings strong generalization capability for it is a compressed package of human common sense. VLA possesses "common sense reasoning" capability and Chain-of-Thought (CoT) via language, thus gaining self-explanation capability.

Based on the advantages and divergences of the two routes, the industry has begun to explore the fusion path of the two. At present, there are three mainstream fusion modes for VLA and world model: latent space unified fusion, in-depth fusion at the architectural level, and modular collaborative fusion (cloud simulator type).

Fusion Mode 1: Latent Space Unified Fusion, Representatives: Xiaomi OneVL and Huawei DriveVLA-W0

The core is to embed the prediction capability of the world model into the training objectives of VLA, rather than adding additional modules in the reasoning stage. Specifically, it adds a future image prediction task to the training process of the VLA model, allowing the model to not only learn to predict actions, but also the environmental state (i.e., future images) at future moments. This design forces the model to learn the underlying dynamic laws of the driving environment, rather than just fitting sparse action supervision signals.
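The idea of adding future image prediction to the training objective can be sketched as a weighted sum of two losses. The Python sketch below is illustrative only: the use of mean squared error for both terms, the `world_weight` parameter, and all function names are assumptions made for this example, not details taken from OneVL or DriveVLA-W0.

```python
import numpy as np

def action_loss(pred_actions, target_actions):
    """Sparse supervision: error on the low-dimensional action output."""
    return float(np.mean((pred_actions - target_actions) ** 2))

def future_image_loss(pred_frame, target_frame):
    """Dense supervision: error on every pixel of the predicted future frame."""
    return float(np.mean((pred_frame - target_frame) ** 2))

def joint_loss(pred_actions, target_actions, pred_frame, target_frame,
               world_weight=1.0):
    """Training objective = action loss + weighted world-modeling loss.

    world_weight is a hypothetical hyperparameter; setting it to 0 recovers
    action-only training.
    """
    return (action_loss(pred_actions, target_actions)
            + world_weight * future_image_loss(pred_frame, target_frame))

# Toy data: two action values and a 2x2 "future frame".
pred_a, true_a = np.array([0.0, 1.0]), np.array([0.0, 0.0])
pred_f, true_f = np.zeros((2, 2)), np.ones((2, 2))
loss = joint_loss(pred_a, true_a, pred_f, true_f, world_weight=0.5)
```

Because the image term contributes a gradient for every pixel, the shared backbone receives far more supervision per sample than the action term alone provides, which is the effect the paragraph above describes.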

Case 1 of Latent Space Unified Fusion: Xiaomi OneVL Autonomous Driving Model

On May 13, 2026, Xiaomi officially released Xiaomi OneVL, a fully open-sourced autonomous driving model which unifies the three technical routes of VLA, world model and latent space reasoning into the same framework. The core breakthrough of this model is the in-depth unification of multiple technical paradigms through latent space reasoning. Differing from traditional solutions that decompose the reasoning process into human-readable natural language and generate deduction logic word by word, Xiaomi OneVL directly completes end-to-end logical operations in the high-dimensional vectorized latent space. This latent space integrates both the scenario perception and understanding capability of VLA and the environmental time-series prediction capability of the world model, and all reasoning operations are carried out at the vector level rather than the text level, achieving a significant leap in reasoning efficiency compared with traditional VLA solutions.

In terms of implementation mechanism, firstly, two types of latent variables are introduced inside the model: visual latent token and language latent token. The former is responsible for encoding physical relationships and time-series changes in the scene, carrying the prediction capability of the world model. The latter is responsible for expressing driving intentions and semantic logic, carrying the understanding capability of VLA.

Secondly, OneVL introduces two auxiliary decoders, which are only used in the training stage. The language auxiliary decoder is responsible for restoring human-readable CoT text from the language latent token, explaining why the model makes a certain driving decision. The visual auxiliary decoder is responsible for predicting future frame visual tokens (images after 0.5 seconds and 1.0 seconds) from the visual latent token, allowing the model to predict scene changes. During inference, both decoders are removed, and the model directly outputs planning results, realizing one-step reasoning and completely eliminating the delay accumulation caused by autoregression.
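The training-only role of the auxiliary decoders can be illustrated with a toy model. The Python sketch below is a simplified stand-in under stated assumptions: it uses random linear layers, a single shared latent vector instead of separate visual and language latent tokens, and hypothetical names throughout. It shows only the control flow, in which the auxiliary heads run during training and are skipped at inference without changing the planning output.

```python
import numpy as np

class LatentPlanner:
    """Toy model: backbone -> latent -> planning head, plus training-only heads."""

    def __init__(self, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w_backbone = rng.normal(size=(dim, dim))
        self.w_plan = rng.normal(size=(dim, 2))           # planning head: (x, y) waypoint
        self.w_lang_aux = rng.normal(size=(dim, 4))       # auxiliary: latent -> CoT logits
        self.w_vis_aux = rng.normal(size=(dim, 2 * dim))  # auxiliary: latent -> 2 future frames

    def forward(self, obs, training):
        latent = np.tanh(obs @ self.w_backbone)
        out = {"plan": latent @ self.w_plan}
        if training:
            # Auxiliary decoders provide extra supervision targets during training.
            out["cot_logits"] = latent @ self.w_lang_aux
            # Two future-frame token vectors, standing in for +0.5 s and +1.0 s.
            out["future_tokens"] = (latent @ self.w_vis_aux).reshape(2, -1)
        return out

model = LatentPlanner()
obs = np.ones(8)
train_out = model.forward(obs, training=True)
infer_out = model.forward(obs, training=False)
```

At inference the model performs a single forward pass through the backbone and planning head, so no token-by-token decoding is involved in producing the plan.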

Case 2 of Latent Space Unified Fusion: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks

Traditional VLA models face a fundamental problem: Supervision Deficit. The input of VLA models is high-dimensional multimodal data (front-view image sequences, language instructions, historical actions, etc.), but the supervision signal is only low-dimensional action tokens. Most of the model's representation capacity is wasted, resulting in its inability to fully learn the complex dynamics of the driving environment, and the huge potential of VLA models cannot be effectively released.

As can be seen from the figure below, as the amount of training data increases from 700,000 frames to 7 million frames and then to 70 million frames (ever more data), the collision rate shows a downward trend, that is, the more training data, the better the safety. However, for the traditional VLA technical paradigm without the world model, when the data increases from 7 million frames to 70 million frames, the decline in collision rate slows down, indicating that data has limited effect on improving the safety performance of VLA.

To address the pain points of VLA, such as sparse supervision, the breakdown of the data scaling law, and the lack of physical time-series prediction capability, Huawei proposed the DriveVLA-W0 training paradigm in its paper. It introduces a world model that predicts future images as dense self-supervision signals during training, adding future time-series prediction while maintaining the ability to understand environmental dynamics. Compared with traditional VLA, DriveVLA-W0 adds world modeling (predicting future road conditions): the more data, the more its advantage is magnified, and the data scaling law is strengthened.

Fusion Mode 2: In-depth Fusion at the Architectural Level, Representative: VLA-World

Differing from pre-training fusion (external reinforcement), where the world model acts as an external tool to generate first and then transmit, in-depth fusion at the architectural level internalizes the world model capability into the native capability of VLA, with planning and generation growing together in the same architecture.

VLA-World, jointly proposed by Shanghai Jiao Tong University and Huawei Central Research Institute in April 2026, is an integrated VLA architecture with deeply embedded world model capabilities. In traditional solutions, the world model and VLA are independent of each other, with the former responsible for generating simulation videos and the latter for perception reasoning and decision output. VLA-World adopts a single VLA backbone network for feature sharing between visual generation and decision reasoning. It integrates trajectory prediction and visual generation into continuous links of the same decision chain, and follows the causal logic of predicting motion trajectory first and then deducing future images based on the trajectory, realizing deep module coupling and highly coherent reasoning chain.

Working Mechanism:

Trajectory Perception Conditioning: VLA-World predicts the trajectory first, and then generates future frames conditioned on the trajectory: the trajectory prediction result directly serves as the conditioning signal for visual generation to guide the generation process. In this way, the trajectory determines "where to go", and the image presents "what to see when arriving there", forming a causal dependency.

Unified Generation and Reasoning: Differing from the past when the world model and VLA were two independent modules, VLA-World enables the two to share the same VLA backbone, that is, unifying visual generation and reasoning in the same VLA structure.

GRPO End-to-End Alignment: GRPO (Group Relative Policy Optimization) is used to optimize the model during the reinforcement learning stage. The model generates multiple candidate trajectories and corresponding future images, and rewards those results where the "imagined future" is consistent with the "real safe decision". This mechanism makes visual generation no longer an independent task, but always serves the quality of downstream decisions.
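The group-relative step at the heart of GRPO can be sketched in a few lines of Python. In this minimal example the reward values are hypothetical placeholders for how well each candidate's imagined future agrees with a safe decision; the full algorithm also includes a clipped policy-ratio objective and a KL regularization term, which are omitted here.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each candidate's reward against its own sampling group.

    Candidates that score above the group mean get a positive advantage,
    those below get a negative one; no separate value network is needed.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for four candidate (trajectory, future image) pairs
# sampled for the same scene.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
best_candidate = int(np.argmax(advantages))
```

Normalizing within the group means only the relative ranking of candidates for the same scene drives the policy update, which keeps the signal comparable across easy and hard scenes.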

Trend 3: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter A Competition Period of General Cognitive and Reasoning Capabilities of Foundation Models.

2026 is the first year of the launch of autonomous driving foundation models. DeepRoute.ai, Afari Technology, Zhuoyu Technology, Li Auto, and XPeng have launched related products. The core of foundation models is to build a universal and reusable cognitive base for the physical world, realizing full-level intelligent driving compatibility and cross-scenario capability migration.

Autonomous driving is essentially a typical scaling problem, and current implementation is mainly restricted by insufficient model capacity and an inefficient data closed loop. First, existing foundation models are limited in scale and generalize poorly to complex long-tail scenarios. Second, high-value data mining relies on manual screening and review, which is fragmented and poorly automated, limiting long-term iteration.

To address the two bottlenecks of insufficient model capacity and inefficient data closed-loop, DeepRoute.ai proposed a solution, a unified 40B-parameter VLA foundation model. The core innovation lies in the "trinity" model role design, allowing the same model to play three roles simultaneously: driver (visual input -> real-time driving decision), analyst (diagnostic understanding of key scenarios), and critic/ referee (evaluating the safety and rationality of driving behavior), upgrading the driving system from a simple execution system to an intelligent system with cognitive capabilities.

In the pre-training stage, DeepRoute.ai abandons the traditional approach of the end-to-end model relying on trajectory supervision (data utilization rate is only 0.001%), and instead adopts the video prediction task, enabling the model to learn the dynamic structure of the real world by predicting video sequences, turning every pixel into a supervision signal and increasing the data utilization rate to nearly 100%.

In the core training stage (Mid-train), the model conducts joint training around three tasks: V+A (vision + action) to learn conventional end-to-end driving, V+A->L (explanation after action) to activate the analyst and critic roles, and V->L+A (multimodal logical reasoning) to train a driver with reasoning capability, using Chain-of-Thought to let the model first output language descriptions and decision logic of key events, and then output specific driving trajectories.
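The three mid-training tasks differ only in which modalities are given as input and which are produced as output. The Python sketch below records that mapping as a small data structure; the class, field names, and role labels are illustrative assumptions rather than DeepRoute.ai's actual training configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskSpec:
    name: str
    inputs: tuple    # modalities fed to the model
    outputs: tuple   # modalities the model must produce, in output order
    roles: tuple     # roles this task is meant to train

MID_TRAIN_TASKS = (
    # Conventional end-to-end driving: vision in, action out.
    TaskSpec("V+A", ("vision",), ("action",), ("driver",)),
    # Explanation after action: vision and action in, language out.
    TaskSpec("V+A->L", ("vision", "action"), ("language",), ("analyst", "critic")),
    # Reasoning driver: language description first, then the trajectory.
    TaskSpec("V->L+A", ("vision",), ("language", "action"), ("driver",)),
)

def tasks_for_role(role):
    """Return the names of the tasks that train a given role."""
    return [t.name for t in MID_TRAIN_TASKS if role in t.roles]
```

For example, `tasks_for_role("driver")` returns both the plain V+A task and the reasoning V->L+A task, reflecting that the driver role is trained with and without Chain-of-Thought output.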

In terms of engineering implementation, DeepRoute.ai keeps the single-step latency for processing 1,000 visual tokens and dozens of reasoning tokens within 60-85 milliseconds, using optimizations such as KV cache, Multi-Token Prediction (MTP), model quantization, and a self-developed inference engine, and thereby achieves 10-15 Hz real-time closed-loop control. Moreover, the foundation model can be flexibly distilled according to the computing power of the vehicle chip: a pure-driving VA model is deployed on a 100 TOPS platform, and a VLA model with logical reasoning capability on a 500 TOPS platform.
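The relation between step latency and control rate, and the choice of model variant by chip computing power, can be made concrete with a small Python sketch. The 100 TOPS and 500 TOPS points come from the text; treating them as minimum thresholds, and returning `None` below 100 TOPS, are assumptions made for this example.

```python
def control_rate_hz(latency_ms):
    """Upper bound on the control rate if one step takes latency_ms."""
    return 1000.0 / latency_ms

def select_variant(platform_tops):
    """Pick a model variant for a chip, assuming the quoted TOPS values are minimums."""
    if platform_tops >= 500:
        return "VLA"   # VLA model with logical reasoning capability
    if platform_tops >= 100:
        return "VA"    # pure-driving vision-action model
    return None        # below the platforms mentioned in the text

fastest = control_rate_hz(60)   # about 16.7 Hz
slowest = control_rate_hz(85)   # about 11.8 Hz
```

A 60-85 ms step corresponds to roughly 11.8-16.7 Hz by itself, which is broadly consistent with the quoted 10-15 Hz once other stages of the control loop are allowed for.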

In addition, the foundation model is pre-trained to learn the physical laws and spatial logic of the real world, giving it native zero-shot migration capability. With a universal cognitive base, it adapts to all levels from L2 assisted driving to L4 autonomous driving through model distillation, computing power tailoring, and capability fine-tuning. It is first applied to autonomous driving and will migrate to other tracks such as humanoid robots and industrial robots in the future, realizing "one foundation making all things intelligent".

In 2026, Zhuoyu Technology fully transforms its strategy. Taking the native multimodal foundation model as the technical base, it aims to upgrade from an "intelligent driving Tier 1 supplier" to a "mobile physical AI company", focusing on mass production expansion across all scenarios and vertical domains covering passenger cars, commercial vehicles, L4 products and overseas layout, and extending to the field of embodied robots.

Zhuoyu launched VLA (VLA World Model, native multimodal FM): it uses a unified Backbone to process visual, text, and sensor data, completes physical reasoning in the latent space, and directly outputs driving actions. From the pre-training stage, it conducts joint training with image/video/text/driving/robot data, and performs prediction and reasoning of the physical world in a unified latent space, understanding both semantics and physical laws.

In 2026, a critical year for the technological iteration and paradigm fusion of intelligent driving large models, the competition and integration of multiple technical routes, the collaborative implementation of VLA and world model, and the large-scale launch of foundation models will jointly promote the intelligent driving industry to accelerate from "technological exploration" to "large-scale implementation". Whether it is technological innovation of multi-route integration or generalized layout of foundation models, the core is to revolve around the goal of "safer, more efficient, and more adaptable to real driving scenarios". The trend of "physical AI" implementation will further drive intelligent driving systems to evolve from "imitating humans" to "understanding the world", realizing true intelligent driving.

In the future, with the continuous iteration of technologies and the coordinated improvement of the industry chain, intelligent driving large models will gradually break through existing bottlenecks, become the core support for the large-scale implementation of autonomous driving, reshape the development pattern of the mobility sector, and also facilitate the extension and application of mobile physical AI in more scenarios.

1 Fundamentals of End-to-End Autonomous Driving Technology

1.1 Terms and Concepts of End-to-End Autonomous Driving
Explanation of End-to-End Autonomous Driving Terminologies
Correlation and Differences of End-to-End Related Concepts
1.2 Introduction to End-to-End Autonomous Driving and Development Status
- 1.2.1 Overview
- Emerging Background of End-to-End Autonomous Driving
- Deduced Impacts of Large AI Models on the Pattern of Autonomous Driving Industry
- Reasons for the Emergence of End-to-End Autonomous Driving: Commercial Value
- Transformer Enables Autonomous Driving
- Differences between End-to-End and Traditional Architectures (1)
- Differences between End-to-End and Traditional Architectures (2)
- Evolution of End-to-End Architecture
- Evolution Route of End-to-End Autonomous Driving
- Comparison between One-Model and Two-Model End-to-End
- Performance Parameter Benchmarking of Mainstream One-Model/Segmented End-to-End Systems
- Challenges and Solutions for Large-Scale Mass Production of End-to-End: Computing Power Supply/Data Acquisition
- Challenges and Solutions for Large-Scale Mass Production of End-to-End: Team Building/Interpretability
- Progress and Challenges in End-to-End Systems: World Model Generation + Neural Network Simulator + RL Accelerating Innovation
- Perception Layer under End-to-End Architecture
- 1.2.2 Implementation Methods of End-to-End Models
- Two Implementation Approaches for End-to-End
- End-to-End Implementation Method: Imitation Learning
- End-to-End Implementation Method: Reinforcement Learning
- Basic Architecture and Definition of Reinforcement Learning
- Mainstream Reinforcement Learning Algorithms
- 1.2.3 Verification Methods of End-to-End Models
- Dataset Evaluation Methods for End-to-End Autonomous Driving
- Three Major Simulation Tests for End-to-End Autonomous Driving Models (1) - Bench2Drive
- Three Major Simulation Tests for End-to-End Autonomous Driving Models (2) - HUGSIM
- Three Major Simulation Tests for End-to-End Autonomous Driving Models (3) - DriveArena
1.3 Classic End-to-End Autonomous Driving Cases
SenseTime UniAD: Path Planning-Oriented Large AI Model Provides E2E Commercial Scenario Applications
Technical Principles and Architecture of SenseTime UniAD
Technical Principles and Architecture of Horizon VAD
Technical Principles and Architecture of Horizon VADv2
Training of VADv2
Technical Principles and Architecture of DriveVLM
Li Auto Adopts Mixture-of-Experts (MoE) Architecture
MoE and STR2
Shanghai Qi Zhi Institute's E2E-AD Model SGADS: A Safe and Generalized E2E-AD System Based on Reinforcement Learning and Imitation Learning
Shanghai Jiao Tong University's ActiveAD Active Learning Case: Solving Data Labeling Bottleneck from A Data-centric Perspective
Most End-to-End Autonomous Driving Systems Are Developed Based on Foundation Models
1.4 Foundation Models
- 1.4.1 Introduction to Foundation Models
- Significance of Introducing Multimodal Models into End-to-End Autonomous Driving
- Core of End-to-End Systems - Foundation Models
- Foundation Model 1: Large Language Model (LLM) - Application Cases in Autonomous Driving
- Foundation Model 2: Vision Foundation - Application in Intelligent Driving
- Foundation Model 2: Vision Foundation - Latent Diffusion Models Framework
- Foundation Model 2: Vision Foundation - Wayve GAIA-1
- Foundation Model 2: Vision Foundation - DriveDreamer Framework
- Foundation Model 3: Multimodal Foundation Model - MFM
- Foundation Model 3: Multimodal Foundation Model - Application of GPT-4V in Intelligent Driving
- 1.4.2 Foundation Models - Multimodal Foundation Model
- Development and Overview of Multimodal Foundation Model
- Multimodal Foundation Model vs. Single-Modal Foundation Model (1)
- Multimodal Foundation Model vs. Single-Modal Foundation Model (2)
- Technical Panorama of Multimodal Foundation Model
- Multimodal Information Representation
- 1.4.3 Foundation Models - MLLM
- Multimodal Large Language Model (MLLM)
- Architecture and Core Components of Multimodal Large Language Model
- Mainstream Multimodal Large Language Models
- Application of Multimodal Large Language Model in Intelligent Driving
- CLIP Model
- LLaVA Model
1.5 Vision-Language Model (VLM)
Application of Vision-Language Model (VLM) in Intelligent Driving
Application of Foundation Models in Autonomous Driving
Application of Vision-Language Model (VLM)
Development History of Vision-Language Model (VLM)
Architecture of Vision-Language Model (VLM)
Application Principles of VLM in End-to-End Autonomous Driving
Application of VLM in End-to-End Autonomous Driving
Challenges Faced by VLM Models in Intelligent Driving
1.6 Vision-Language-Action Model (VLA)
VLM -> VLA
VLM + E2E -> VLA
Analysis of VLA Architecture
Typical VLA Architectures
VLA Architecture Analysis Case: Disassembling Li Auto MindVLA Architecture (1)
VLA Architecture Analysis Case: Disassembling Li Auto MindVLA Architecture (2)
Concept of VLA Large Models
Principles of VLA Model
Classification of VLA Models
Interpretation of VLA Technology Evolution
Large Language Model as One of the Cores of End-to-End
Technical Architecture and Key Technologies of VLA
Advantages of VLA (1)
Advantages of VLA (2)
Advantages of VLA (3)
Deployment Challenges of VLA Model - Real-Time Response Capability
Real-Time Performance and Memory Occupancy Challenges of VLA Model Deployment
Deployment Challenges of VLA Model - Data (1)
Deployment Challenges of VLA Model - Data (2)
Deployment Challenges of VLA Model - Long-Term Task Planning Capability
Evolution Route of VLA Large Models
Representative Models of VLA Technical Paradigms
VLA Datasets and Benchmarks
1.7 World Model
World Model Prototype: Mental Model (1)
World Model Prototype: Mental Model (2)
Key Definitions and Application Development of World Model
Basic Architecture of World Model
Three Core Values of World Model Empowering Autonomous Driving
Two Major Technical Routes of World Model
Generative World Model DIAMOND: Diffusion Model + Real-Time RL Adaptation + Long-Term Stability
Generative Interactive World Model Genie: Unsupervised Learning of Real-World Physical Laws from Unlabeled Internet Videos
Technical Principles and Paths of WorldDreamer
Implicit World Model: Technical Principles and Paths of V-JEPA2
Implicit World Model: Technical Principles and Paths of Comma.ai
Framework Setting and Implementation Difficulties of World Model
Video Generation Methods Based on Transformer and Diffusion Models
World Model May Be One of the Ideal Approaches to Realize End-to-End Autonomous Driving
World Model - Generation of Virtual Training Data
World Model - Tesla World Model
World Model - NVIDIA
InfinityDrive: Breaking Time Limits in Driving World Models
Parameter Performance of SenseAuto InfinityDrive
Pipeline of SenseAuto InfinityDrive
SenseTime DiT Architecture and Main Video Generation Evaluation Metrics FID/FVD
Deployment Challenges of World Model in Autonomous Driving
1.8 Comparison between End-to-End Large Model Technical Paradigms
- 1.8.1 Technical Paradigm Comparison: Modular End-to-End vs. One-Model End-to-End vs. VLM/VLM+E2E/VLA
- Summary of Comparison between Three Mainstream Intelligent Driving Models (1): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
- Summary of Comparison between Three Mainstream Intelligent Driving Models (2): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
- Summary of Comparison between Three Mainstream Intelligent Driving Models (3): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
- Definition and Classification of Generalized End-to-End (GE2E)
- Comparison of Different GE2E Autonomous Driving Paradigms: Planning-Only E2E vs. Multi-Task E2E
- Comparison of Different GE2E Autonomous Driving Paradigms: VLM-Driven Cognitive End-to-End Driving
- Comparison between Two Technical Paradigms: VLM + Traditional E2E
- Architecture Summary of Various GE2E Autonomous Driving Models
- Performance Comparison between Various GE2E Autonomous Driving Models
- 1.8.2 Technical Paradigm Comparison: VLA vs. World Model
- VLA vs. World Model: Who Will Win?
- Performance Competition between VLA and World Model
- Summary of Comparison between VLM/VLA/World Models
1.9 Diffusion Models
Four Mainstream Generative Models
Principles of Diffusion Models
Diffusion Models Optimize Core Links of Intelligent Driving Trajectory Generation
Diffusion Models Optimize Intelligent Driving Trajectory Generation
Application of Diffusion Models in Intelligent Driving
Practical Application Cases of Diffusion Model

2 Technical Routes and Development Trends of End-to-End Autonomous Driving

2.1 Technical Trends of End-to-End Autonomous Driving
Summary of Evolution Route of Intelligent Driving End-to-End Large Models
Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes
Integration Case 1: Overall Architecture of Afari Technology's Autonomous Driving System Adopts VLA+E2E Collaborative Closed Loop
Integration Case 2: L3-Capable World Action Model (WAM) Builds Trinity Architecture of "VLA + World Model + Safety Adversarial Model"
Trend 2: VLA and World Model Fusion Paradigm Is Expected to Become One of the Mainstream Approaches for Physical AI Implementation
VLA+World Model Integration Case 1: Xiaomi OneVL Unifies VLA and World Model into One Framework
Disassembly of Xiaomi OneVL Architecture
VLA+World Model Integration Case 2: XPeng Launches X-World
VLA+World Model Integration Case 3: Huawei DriveVLA-W0 Predicts Future Images via World Modeling Tasks
Disassembly of DriveVLA-W0 Architecture
DriveVLA-W0 Leverages World Models to Amplify Autonomous Driving Data Scaling Law
VLA+World Model Integration Case 4: Bosch ExploreVLA Introduces World Model Based on VLA+RL to Achieve Three Major Breakthroughs
Disassembly of Bosch ExploreVLA Model Architecture
Trend 3: Autonomous Driving Is Entering the Physical AI Stage
Ultimate Form of Physical AI Connects Digital and Physical Worlds, and Autonomous Driving Serves as Its Optimal Implementation Carrier
Trend 4: Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter A Competition Period of General Cognitive and Reasoning Capabilities of Foundation Models
Case 1: Hardcore Technological Innovations in DeepRoute 40B VLA Foundation Model
Case 2: Core of 2026 Strategy of Zhuoyu Technology: Building Mobile Intelligent Foundation Model (1)
Case 2: Core of 2026 Strategy of Zhuoyu Technology: Building Mobile Intelligent Foundation Model (2)
Case 3: XPeng World Foundation Model
Trend 5: End-to-End Autonomous Driving Has Entered the Stage of Data Closed-Loop Competition and Refined Operation
Case: NVIDIA MOSAIC
Trend 6: Robots and Intelligent Driving Become Two Mainstream E2E Application Scenarios on the Road to AGI (1)
Trend 6: Robots and Intelligent Driving Become Two Mainstream E2E Application Scenarios on the Road to AGI (2)
2.2 End-to-End Autonomous Driving Market Trends
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (1)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (2)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (3)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (4)
Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (5)
Solution Layout Comparison between Other End-to-End Autonomous Driving System Suppliers
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (1): Xiaomi, XPeng, Li Auto, NIO
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (2): Changan, BYD, Leapmotor
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (3): Chery, Dongfeng, IM Motors
Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (4): GAC, FAW Hongqi, Geely

3 End-to-End Autonomous Driving Suppliers

3.1 Afari Technology - End-to-End Autonomous Driving Model
Profile
Fully Entering the AI-Driven Intelligent Vehicle Era
AI + Vehicle Strategy
Top-Level Strategy and Commercial Closed Loop
Ecosystem Alliance
Judgment on Next-Generation End-to-End Architecture Trend (1)
Judgment on Next-Generation End-to-End Architecture Trend (2)
Judgment on Next-Generation End-to-End Architecture Trend (3)
End-to-End Large Model Architecture: E2E2.0+VLA
E2E Architecture
World Model Closed-Loop Simulation Architecture
Native Intelligent Driving Foundation Model
Three Major Businesses (1)
Three Major Businesses (2): Robotaxi Deployment Plan, 2026-2030
Evolution Route of Intelligent Driving Solutions (ASD1.0 to ASD4.0) and End-to-End Large Model
Mass Production of Chongqing Qianli Intelligent Driving Technology Co., Ltd.
3.2 Horizon Robotics - End-to-End Autonomous Driving Large Model
Ultimate Strategic Roadmap: 2025-2030+
Three Strategic Evolutions
Latest Product Launches in 2026 (1)
Latest Product Launches in 2026 (2)
Adopts One-Model End-to-End + VLM Solution
Introduction of Reinforcement Learning and World Model
Thoughts on One-Model End-to-End Large Models
Urban Driving Assistance System: HSD
Journey 6 Series Chips
SparseDriveV2 (1)
SparseDriveV2 (2)
UMGen: Unified Framework for Multimodal Driving Scene Generation
GoalFlow: Goal-Driven Approach Unlocking New Future of Generative End-to-End Strategies
MomAD: Momentum-Aware Planning in End-to-End Autonomous Driving
DiffusionDrive: Towards Generative Multimodal End-to-End Autonomous Driving
RAD: Post-Training Paradigm of End-to-End Reinforcement Learning Based on 3DGS Digital Twin World
Mass Production
Super Drive High-Level Intelligent Driving and Its Advantages
Architecture and Technical Principles of Super Drive
Senna Intelligent Driving System (Large Model + End-to-End)
Core Technologies and Training Methods of Senna
Core Modules of Senna
3.3 Zhuoyu Technology - Intelligent Driving Large Model
Comparison of Three Intelligent Driving Model Paradigms: One-Model End-to-End, World Model and VLA (1)
Comparison of Three Intelligent Driving Model Paradigms: One-Model End-to-End, World Model and VLA (2)
Launched Mobile Physical AI Foundation Model in 2026: Native Multimodal Foundation Model
Comparison between Three VLA Technical Paradigms and Zhuoyu's 2026 Native Multimodal Foundation Model
Evolution Route of ClixPilot End-to-End Large Model (1)
Evolution Route of ClixPilot End-to-End Large Model (2)
End-to-End World Model Architecture
Two-Stage Training Model for End-to-End World Model
Core Functions of Generative Intelligent Driving GenDrive
Core Technologies of Generative Intelligent Driving
Two-Model End-to-End
Interpretable One-Model End-to-End
Mass Production and Clients of End-to-End
3.4 NVIDIA - Intelligent Driving Large Model
Ten-Year Layout of Autonomous Driving Business
L2++/L4 Intelligent Driving Plan (2026-2030)
L3 and L4 Implementation Roadmap of NVIDIA
DRIVE Full-Stack Driving Assistance Platform: 5-Layer Architecture
Drive Hyperion 10 (1): Hardware Configuration
Drive Hyperion 10 (2): Software Architecture
Building Autonomous Driving Safety and AI Ecosystem Based on Halos OS
DRIVE AV Intelligent Driving Large Model Solution: VLA + Classic Rule-Based Algorithms
E2E+VLM->Drive VLA (1)
E2E+VLM->Drive VLA (2)
VLA On-Vehicle Deployment Solution (1)
VLA On-Vehicle Deployment Solution (2)
Launched Alpamayo 1.5
Drive VLA Technical Route: 10B Large Model Alpamayo 1.5
New-Generation In-Vehicle Computing Platform - Drive Thor
World Foundation Model Development Platform - Cosmos
Cosmos Training Paradigm
NVIDIA DriveOS: Foundation Platform Built for Autonomous Driving
Core Design Concept of NVIDIA Multicast
End-to-End Intelligent Driving Framework - Hydra-MDP
Self-Developed Model Architecture - Model Room
3.5 Momenta - Intelligent Driving Large Model
Profile
R7 Reinforcement Learning World Model
Mass-Produced Vehicles Equipped with R7
R6 Flywheel Large Model
Disassembly of One-Model End-to-End
Algorithm Development Path
Evolution Roadmap of Intelligent Driving Large Models
Intelligent Driving Technology Evolution and Industrial Paradigm Changes
End-to-End Planning Architecture
End-to-End Large Model Mass Production Solutions
3.6 DeepRoute.ai - Intelligent Driving Large Model
Product Layout and Strategic Deployment
Launched Unified Foundation Model in 2026
Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (1)
Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (2)
Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (3)
Value Brought by Foundation Models
End-to-End Intelligent Driving Large Model Evolution, 2023-2026
DeepRoute IO 2.0: VLA 2.0 (1)
DeepRoute IO 2.0: VLA 2.0 (2)
VLA2.0 Designated Mass Production Projects
Adopted End-to-End Intelligent Driving Solutions in 2023
In-Depth Cooperation with Volcano Engine in 2025
Implementation Platform of RoadAGI - AI Spark
End-to-End VLA Model: VLA1.0
End-to-End VLA Model: Architecture of VLA1.0
End-to-End 1.0 Designated Mass Production Projects
Introduction of Hierarchical Hint Tokens
End-to-End Training Solution - DINOv2
Application Value of DINOv2 in Computer Vision
VQA Evaluation Dataset for Intelligent Driving
BLEU Evaluation Metrics and CIDEr Automatic Evaluation Metric for Image Caption Generation Tasks
Score Comparison between DeepRoute HoP and Huawei Solution
3.7 Huawei - End-to-End Intelligent Driving Large Model
Evolution Roadmap of Qiankun Intelligent Driving Large Model (ADS2.0 to ADS5)
ADS 5 (1): WEWA 2.0 Architecture
Comparison between WEWA 2.0 and WEWA 1.0
ADS 5 (2): Computing Power
ADS 5 (3): Benchmarking of Four Versions and Production Vehicle Models
Hierarchical Architecture of Pangu Large Model
Pangu Model Product System (1)
Pangu Model Product System (2)
ADS 4: WEWA 1.0
In-Depth Integration of ADS 4 and XMC, and Cloud Simulation Verification
ADS 4: Commercial L3 Highway Solution
Mass Production of ADS 4 End-to-End
ADS 2.0 (1): End-to-End Concept and Perception Algorithm
ADS 2.0 (2): End-to-End Concept and Perception Algorithm
Summary of ADS 2.0
ADS 3.0 (1): End-to-End
ADS 3.0 (2): End-to-End
ADS 3.0 (3): ADS 3.0 vs. ADS 2.0
ADS 3.0 End-to-End Application Case (1): STELATO S9
ADS 3.0 End-to-End Application Case (2): LUXEED R7
ADS 3.0 End-to-End Application Case (3): AITO Series
Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (1)
Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (2)
Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (3)
Multimodal LLM End-to-End Autonomous Driving Solution
End-to-End Test - VQA Tasks
Architecture of DriveGPT4
End-to-End Training Solution Case
Two Training Stages of DriveGPT4
Comparison between DriveGPT4 and GPT4V
3.8 QCraft - Intelligent Driving Large Model
Product Matrix in Intelligent Driving: Three-Tier Product Matrix of Intelligent Driving System QPilot 2.0
Mass-Produced Urban NOA End-to-End Solution Based on Single Journey 6M Chip
Core Technologies Implementing Urban NOA with Single J6M Chip: Interpretable One-Model End-to-End
Core Technologies Enabling Ultimate Urban NOA Experience: VLA and World Model Architecture
Evolution of Intelligent Driving Large Models
Intelligent Driving Solution Evolution Roadmap
Data and Model Training Closed Loop
Ecosystem Partners Panorama
3.9 Bosch - Intelligent Driving Large Model
Zongheng Driving Assistance Solution
Urban Driving Assistance Solution Based on End-to-End Model
China Strategic Layout of Bosch Mobility
Bosch Mobility Launched New Organizational Restructuring and Strategic Cooperation Based on End-to-End Development Trends
Adopt One-Model End-to-End for Mass Production Solutions
End-to-End Technical Route of Premium Zongheng Driving Assistance Solution
Disassembly of One-Model End-to-End Technical Paradigm
Comparison between End-to-End Mass Production Solutions
Overall Design Idea of CriticVLA
Architecture of CriticVLA (1)
Architecture of CriticVLA (2)
Classification System of Foundation Models for Autonomous Driving Trajectory Planning
Customized Foundation Models for Trajectory Planning: Fine-Tuning
Foundation Model for Autonomous Driving Trajectory Planning: Customized Foundation Models for Trajectory Planning
Foundation Model for Autonomous Driving Trajectory Planning: Models Focused Solely on Trajectory Planning
Models and Core Features of Trajectory Planning Methods with Language Interaction Capability
Core Features of Models with Action Interaction Capability: Training Datasets, Training Methods and Evaluation Metrics
3.10 WeRide - End-to-End Large Model
Profile
Business Model
Financial Overview, 2023-2025
Five Major Product Matrices
Exploration of Business Model for L4 Autonomous Driving Multi-Scenario Application
Traditional Autonomous Driving Architecture: Two Major Problems of Perception-Prediction-Planning-Control Modular Pipeline
Unsolved Problems of One-Model End-to-End
E2E + Traditional Pipeline Dual Architecture
E2E Model Architecture
Evolution Route of End-to-End Autonomous Driving Large Models
Hardware Architecture of Gen8 L4 Autonomous Driving System
HPC 3.0
Self-Developed General Simulation Model: WeRide GENESIS
3.11 Pony.ai - End-to-End Intelligent Driving Large Model
Profile
Three Major Business Lines and Business Model
Robotaxi Business Layout
Business Model of Robotaxi
Revenue Overview, 2024-2025
Comparative Analysis between Pony.ai and WeRide: Market Value, Revenue, Business, Robotaxi Business and Intelligent Driving Models
PonyWorld World Model 2.0 (1)
PonyWorld World Model 2.0 (2)
PonyWorld World Model 2.0 (3)
PonyWorld World Model 2.0 (4)
E2E End-to-End Intelligent Driving Model
Evolution Route of 1st to 7th Generation Robotaxi Products
Released New-Generation Autonomous Driving Domain Controller
Ecosystem Partners
3.12 Baidu - End-to-End
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
Overview of Baidu Apollo
Robotaxi Business Layout
Commercial Implementation Progress of Robotaxi (1): Overseas Markets
Commercial Implementation Progress of Robotaxi (2): Domestic Market
Key Nodes of Robotaxi Deployment in 8 Cities in China, 2021-2026
Two-Model End-to-End: Adopt the Strategy of Segmenting First and Then Joint Training
Production Vehicle Equipped with Two-Model End-to-End Architecture: Jiyue 07
Baidu Automotive Cloud 3.0 Enables End-to-End Systems in Three Aspects (1)
Baidu Automotive Cloud 3.0 Enables End-to-End Systems in Three Aspects (2)
3.13 SenseAuto - End-to-End
Profile
Technical Route Analysis 1: End-to-End Autonomous Driving Evolution Roadmap
Technical Route Analysis 2: Analysis of Generative Intelligent Driving R-UniAD (1)
Technical Route Analysis 3: Analysis of Generative Intelligent Driving R-UniAD (2)
Architecture of R-UniAD
Practical Demonstration of R-UniAD: Complex Scene Mining, 4D Simulation Reproduction, Reinforcement Learning and Generalization Verification
Kaiwu World Model 2.0
Mass Production
Released UniAD End-to-End Solution
DriveAGI: New-Generation Intelligent Driving Large Model and Its Advantages
DiFSD: End-to-End Intelligent Driving System That Simulates Human Driving Behaviors
DiFSD: Technical Interpretation
3.14 Wayve - Intelligent Driving Large Model
Profile
Advantages of AV 2.0
Latest Progress: Architecture of GAIA-1 World Model
GAIA-1 World Model - Token
GAIA-1 World Model - Generation Effects
LINGO-2 Model
3.15 Waymo - Intelligent Driving Large Model
Foundation Model
Building the Driver Algorithm
Validating the Driver Algorithm
Released Multimodal End-to-End Model EMMA
EMMA: Multimodal Input
EMMA: Defining Driving Tasks as Visual Q&A
EMMA: Introducing Chain-of-Thought Reasoning to Enhance Interpretability
Limitations of EMMA Model
Implementation and Operation
3.16 GigaAI - End-to-End
Profile
Evolution Route of World Models
Hierarchical Construction Method for 4D Generative World Models
Application of World Models (1)
Application of World Models (2)
ReconDreamer
World Model: DriveDreamer
World Model: DriveDreamer 2
Overall Framework of DriveDreamer4D
3.17 Nullmax - Intelligent Driving Large Model
Profile
MaxDrive Driving Assistance Solution
New-Generation Intelligent Driving Technology - Nullmax Intelligence
End-to-End Technical Architecture
End-to-End Data Platform
HiP-AD: End-to-End Intelligent Driving Framework Based on Multi-Granularity Planning and Deformable Attention
Mass Production

4 End-to-End Autonomous Driving Layout of OEMs

4.1 Xiaomi
Profile
2026 Strategic Planning
Comprehensive Analysis of New Vehicle Planning in 2026
Product Positioning and Parameter Benchmarking of 2026 New Vehicles (1)
Product Positioning and Parameter Benchmarking of 2026 New Vehicles (2)
Organizational Structure Changes of Intelligent Driving Division
Intelligent Driving Technical Route: Full-Route Pre-Research without Betting on Single Technology
Comparison between VLA and End-to-End Routes
Intelligent Driving Algorithm Evolution Trend: from Modular End-to-End to End-to-End Architecture Introducing World Model + Reinforcement Learning
Launched XLA Cognitive Large Model in 2026
Evolution Roadmap of Intelligent Driving System and Large Models
Enhanced Version of HAD (1)
Enhanced Version of HAD (2)
End-to-End VLA Intelligent Driving Solution ORION
ORION Framework
Physical World Modeling Architecture
Multi-Model End-to-End with Three-Layer Separated Modeling
Long Video Generation Framework - MiLA
4.2 XPeng
Evolution Roadmap of End-to-End Intelligent Driving Large Models
Autonomous Driving Product Planning, 2025~2026
L4 Autonomous Driving Layout in 2026: Robotaxi
Second-Generation VLA: Native Multimodal Physical World Large Model
L4 Capability = Model × Computing Power × Data × Vehicle Hardware
Second-Generation VLA (1)
Second-Generation VLA (2)
World Foundation Model (1)
World Foundation Model (2)
Core Technical Path of World Foundation Model
Three Phased Achievements in R&D of World Foundation Model
Cloud Model Factory (1)
Cloud Model Factory (2)
End-to-End System: Architecture
4.3 Li Auto
Evolution Roadmap of End-to-End Intelligent Driving Large Models (1)
Evolution Roadmap of End-to-End Intelligent Driving Large Models (2)
Launched New-Generation Unified Architecture MindVLA-o1 in 2026 (1)
Launched New-Generation Unified Architecture MindVLA-o1 in 2026 (2)
Next-Generation Unified Architecture MindVLA-o1 (1)
Next-Generation Unified Architecture MindVLA-o1 (2)
Next-Generation Unified Architecture MindVLA-o1 (3)
Evolution from E2E+VLM Dual System to MindVLA
Architecture of MindVLA Model
Core Technology 1 of MindVLA: Great 3D Physical Spatial Perception Capability
Core Technology 2 of MindVLA: Integration with Large Language Model (LLM)
Core Technology 3 of MindVLA: Combination of Diffusion and RLHF
Core Technology 4 of MindVLA: World Model and NVAIE Accelerated Reinforcement Learning
End-to-End Solution (1): Iterative Evolution of System 1
End-to-End Solution (2): System 1 (End-to-End Model) + System 2 (VLM)
End-to-End Solution (3): Intelligent Driving Technical Architecture
End-to-End Solution (4): DriveVLM Large Model - Architecture
End-to-End Solution (5): DriveVLM Large Model - Rendering Effects
End-to-End Solution (6): DriveVLM Large Model - BEV and Text Feature Processing
4.4 Tesla
Interpretation of 2024 AI Conference
Development History of AD Algorithms
Summary of End-to-End Progress, 2023-2024
FSD v13 (1)
FSD v13 (2)
FSD v13 (3): Subsequent Updates
Development History of AD Algorithms: Entering the Perception-heavy Map-light Era
Development History of AD Algorithms: Shadow Mode
Development History of AD Algorithms: Background of Occupancy Network Adoption
Development History of AD Algorithms: Occupancy Network (1)
Development History of AD Algorithms: Occupancy Network (2)
Development History of AD Algorithms: Occupancy Network (3)
Development History of AD Algorithms: Multi-Camera Fusion Algorithm HydraNet
Development History of AD Algorithms: FSD V12
Core Elements of Perception-Decision Full-Stack Integrated Model
End-to-End Algorithms
World Model (1)
World Model (2)
Data Engine
Dojo Supercomputer Center: Overview
Dojo Supercomputer Center: Training Tile Based on D1 Chip Integration
Dojo Supercomputer Center: Computing Power Development Plan
4.5 NIO
Organizational Structure Adjustment of Intelligent Driving Division, 2024-2025
From Model-Based to End-to-End, World Model Becomes Dominant Technical Paradigm
Evolution Route of End-to-End Large Models
Detailed Explanation of Intelligent Driving System
NIO World Model (NWM) (1)
NIO World Model (NWM) (2)
Imagination Reconstruction Capability and Swarm Intelligence of World Model
NSim Simulator (NIO Simulation)
World Model 2.0
Comparison between End-to-End Model and World Model
Comparison between VLA and World Model
4.6 Changan
Dubhe Plan 2.0 - Tianshu Intelligent Driving
Software Architecture of TOPS AD
Brand Layout
ADAS Strategy: "Dubhe Plan" Strategy
End-to-End System: BEV+LLM+GoT (1)
End-to-End System: BEV+LLM+GoT (2)
Production Vehicle Equipped with End-to-End System: NEVO E07
4.7 Chery
Product Matrix and Vehicle Models
Evolution History of Intelligent Driving System
Launched Four Versions of Falcon Pilot in 2025
Progress of End-to-End Intelligent Driving Large Models (1)
Progress of End-to-End Intelligent Driving Large Models (2)
4.8 GAC Group
Intelligent Driving Large Model Strategy
Evolution Roadmap of ADiGO Intelligent Driving System (ADiGO1.0 to ADiGO6.0)
Launched Five Major Intelligent Driving Platforms in 2025
L2.9 Vehicles and Urban NOA Algorithm/Intelligent Driving System Suppliers
Achieves "High-End Orientation + Mass Popularization" of Urban NOA through "Dual-Gradient Intelligent Driving Suppliers + Scenario-Price Precision Matching" Strategy
Established Huawang Adopting the "GAC Smart Manufacturing + Huawei Intelligence" Model to Expand High-End Market and Improve Brand Matrix
First Model Huawang Aistaland F03 Expected to Be Launched in Q2 2026
Momenta 5.0 One-Model End-to-End Algorithm Is Deployed on RMB150,000-Level Vehicles, and Urban NOA Function Is Also Available
Trumpchi Xiangwang S7 to Be Equipped with Momenta R6 Reinforcement Large Model
Architecture of ADiGO End-to-End Embodied Reasoning Model
Core Technologies of ADiGO
4.9 Leapmotor
Released World Model in 2026
D19 Adopts VLA Large Model to Realize Full-Scenario Door-to-Door NOA
Adopts Intelligent Driving System Self-Development Model
Evolution Roadmap of Leapmotor Pilot (1)
Evolution Roadmap of Leapmotor Pilot (2)
End-to-End High-Level Intelligent Driving
Application Scenarios of End-to-End High-Level Intelligent Driving
4.10 IM Motors
Iteration History of Intelligent Driving System
Cooperation with Momenta on Intelligent Driving
IM AD End-to-End 2.0 Intelligent Driving Large Models
Core Technologies of IM AD End-to-End 2.0 Intelligent Driving Large Models
Application Scenario Comparison between IM AD End-to-End 2.0 Intelligent Driving Large Models
4.11 FAW Hongqi
Technical Architecture of Sinan Intelligent Driving
Core Technologies of End-to-End Large Models
Sinan Intelligent Driving Solution
Vehicle Deployment Schedule and Future Planning of Sinan Intelligent Driving Solution
Sinan Intelligent Driving System: Co-Developed with DJI Zhuoyu Technology (1)
Sinan Intelligent Driving System: Co-Developed with DJI Zhuoyu Technology (2)
Deployed Vehicles and Key Configurations of Sinan Intelligent Driving System
Zhuoyu End-to-End 4.0 System Debuted with Sinan Intelligent Driving in 2026
FAW Hongqi 9 Series Models to Adopt Huawei Hi Mode in 2026
4.12 Dongfeng
Intelligent Driving Strategic Plan 2026-2030
Launched Four-Tier Tianyuan Intelligent Driving Product Matrix in 2025: Full Coverage from L2 to L4/L5
Comparison of Intelligent Driving Configurations between Production Vehicles First Equipped with Tianyuan T100/T200/T500
Tianyuan Intelligent Driving Technical Architecture R-AiD
Intelligent Driving Strategy: Self-development + External Procurement in Parallel in Short Term, and Gradual Self-development for Replacement in Long Term
4.13 BYD
Overview of 2026 Intelligent Driving Planning
Layout in Intelligent Driving Field: Pre-Research on World Models
Organizational Structure Adjustment of Intelligent Driving Team (1): Integration of Dual Intelligent Driving Departments to Pool Resources to Accelerate Universal Intelligent Driving
Organizational Structure Adjustment of Intelligent Driving Team (2): Establishment of Advanced Technology R&D Center to Increase Investment in