Market Research Report
Item Code: 2064024

Intelligent Driving End-to-End Large Model Research Report, 2026

Publisher: ResearchInChina | Language: English | 595 Pages


Research on Intelligent Driving Large Models: A Critical Period for Technological Competition and Paradigm Integration

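As autonomous driving technology iterates rapidly from L2 to L3 and L4, intelligent driving systems are undergoing a major shift from traditional rule-driven architectures to a new generation of data-driven and cognition-driven architectures. As the core technology behind this shift, intelligent driving large models have become the focus of industry competition. With the arrival of the Physical AI era, autonomous driving is positioned as its first large-scale application scenario, pushing the automobile to evolve rapidly from a traditional means of transport into a "super agent": an all-scenario intelligent hub connecting mobility, mobile office, home life, and third-party ecosystems.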

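From an industrial perspective, Physical AI remains at an early stage of technological fission, and the global autonomous driving market holds massive untapped potential. According to the data, the world has about 1.5 billion passenger cars, 280 million commercial vehicles and trucks, and 18 million taxis in operation. Total annual global driving mileage reaches 13 trillion kilometers, while autonomous driving mileage is only 700 million kilometers, or only about 0.006% of the total. The potential for future growth is significant.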
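As a quick order-of-magnitude check, the share can be recomputed from the two rounded mileage figures quoted above. This is a minimal arithmetic sketch with illustrative variable names; because the inputs are rounded, the result only approximates the roughly 0.006% stated in the report.

```python
# Rounded figures quoted in the text (kilometres per year).
total_km = 13e12       # total global driving mileage: 13 trillion km
autonomous_km = 700e6  # autonomous driving mileage: 700 million km

# Share of autonomous mileage in total mileage, expressed in percent.
share_percent = autonomous_km / total_km * 100
print(f"Autonomous share of global mileage: {share_percent:.4f}%")
```

Either way, the share is on the order of a few thousandths of one percent, which is the point the paragraph makes.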

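Judging further by the pace of technology adoption, intelligent driving large models are entering a critical window of technological iteration. Segmented end-to-end solutions entered mass production in 2024-2025, and one-model end-to-end and VLA technologies are being deployed intensively in 2025-2026. Together with the continuous improvement of the intelligent driving experience and the accelerated maturation of L3-L4 high-level autonomous driving technology, Physical AI is also advancing rapidly. ResearchInChina predicts three major evolution trends for intelligent driving large models.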

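Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes.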

Integration Mode 1: One-model End-to-End + World Model + Reinforcement Learning, Representative Suppliers: WeRide, Bosch and Momenta

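Features: The one-model end-to-end model serves as the core neural network of intelligent driving. It directly connects sensor input to driving output with zero information loss, which gives it an extremely high performance ceiling. The world model deduces future road conditions and can generate massive numbers of long-tail scenarios at low cost for simulation training. Reinforcement learning, relying on a reward mechanism, iterates and optimizes within the deduction space, outputs the optimal driving policy, and copes with all kinds of sudden situations. Together the three form a powerful closed loop: data generation (world model) -> policy training (reinforcement learning) -> decision and execution (end-to-end model). This enables intelligent driving systems to learn from massive driving data and keep evolving.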

Integration Mode 2: E2E + Foundation Model (VLM/VLA) + Reinforcement Learning + World Model, Representative Suppliers: Horizon Robotics and Afari Technology

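Features: The vision-language large model acts as the "cerebrum" responsible for cognitive reasoning, and the small end-to-end model acts as the "cerebellum" responsible for rapid execution.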

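Horizon Robotics adopts a one-model E2E + VLM + reinforcement learning + world model approach. Its dual-track "fast thinking + slow thinking" intelligent driving architecture takes reinforcement learning as the hub. On the one hand, it strengthens the end-to-end intuition model through the world model and simulation training, enabling millisecond-level response while improving the handling of rare, short-time-sequence long-tail scenarios. On the other hand, it strengthens the VLM cognitive model through reasoning enhancement, improving semantic understanding and logical reasoning for complex, long-time-sequence scenarios. Finally, it migrates the VLM capabilities to the on-vehicle model and achieves lightweight deployment through quantization and distillation, forming a balanced closed loop of "millisecond-level fast response + long-time-sequence slow reasoning".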

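Afari Technology adopts a VLA + E2E + world model architecture, in which the VLA model handles reasoning, analogous to high-level decision-making by the slow system, and the E2E algorithm handles action mapping, analogous to the fast system. Training proceeds in stages. First, a 32B-parameter large model undergoes large-scale multimodal pre-training (VLM). It is then distilled into a 7B-parameter lightweight model to balance performance against deployment cost. Next, perception and driving actions are aligned and driving-domain knowledge is introduced (VLA), and supervised fine-tuning teaches high-level driving strategies and behavioral norms. Finally, reinforcement learning aligns the model with human driving styles and safety constraints, achieving closed-loop optimization across perception, decision, and control.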

Integration Mode 3: VLA + World Model, Representative Suppliers: Zhuoyu Technology and XPeng

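Features: VLA is responsible for perceiving the current environment, learning historical driving patterns, and determining the next action. The world model is responsible for deducing how each object on the road will interact over the next 5 to 10 seconds. VLA is good at understanding the present but less good at predicting the future; the world model is good at prediction but cannot reflect on or reason about its predictions. Combined, the two constitute a complete "brain".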

Trend 2: The VLA and World Model Fusion Paradigm Is Expected to Become One of the Main Ways to Implement Physical AI.

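The core of the future evolution of intelligent driving large models is a fundamental reconstruction of the underlying paradigm, from "imitating human driving" to "understanding the physical world". VLA and the world model are not an either-or choice; future intelligent driving large models will be a fusion of the two. At present, the two routes diverge in that VLA advocates hold that "understanding" is the premise of driving, while world model advocates hold that "prediction" is the key.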

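World model advocates argue that changes in the physical world are continuous and high-dimensional, whereas language is a discrete, low-dimensional symbolic system, so converting physics into language inevitably loses information. The world model operates directly on physical representations at higher bandwidth. VLA advocates argue that the biggest advantage of VLA is that it can be fine-tuned together with a world model or model-based reinforcement learning: VLA can absorb the advantages of the world model, while the world model cannot make use of the advantages of VLM/VLA. Because language is a compressed package of human common sense, it brings strong generalization capability. Through language, VLA gains "common-sense reasoning" and Chain-of-Thought (CoT), and thus the ability to explain itself.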

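Given the advantages and differences of the two routes, the industry has begun to explore ways to fuse them. At present there are three mainstream fusion modes for VLA and the world model: latent space unified fusion, in-depth fusion at the architectural level, and modular collaborative fusion (cloud simulator type).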

Fusion Mode 1: Latent Space Unified Fusion, Representatives: Xiaomi OneVL and Huawei DriveVLA-W0

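The core of this approach is to embed the prediction capability of the world model into the training objectives of VLA, rather than adding extra modules at the inference stage. Specifically, a future image prediction task is added to the training process of the VLA model, so that the model learns to predict not only actions but also the environmental state at future moments (i.e., future images). This design forces the model to learn the underlying dynamics of the driving environment rather than merely fitting sparse action supervision signals.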

Case 1 of Latent Space Unified Fusion: Xiaomi OneVL Autonomous Driving Model

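On May 13, 2026, Xiaomi officially released Xiaomi OneVL, a fully open-source autonomous driving model that unifies three technical routes, VLA, world model, and latent space reasoning, in a single framework. The core breakthrough of the model is the deep unification of multiple technical paradigms through latent space reasoning. Unlike traditional solutions, which decompose the reasoning process into human-readable natural language and generate the deduction word by word, Xiaomi OneVL completes end-to-end logical operations directly in a high-dimensional vectorized latent space. This latent space combines the scenario perception and understanding capability of VLA with the environmental time-series prediction capability of the world model. Because all reasoning is carried out at the vector level rather than the text level, reasoning efficiency improves significantly over traditional VLA solutions.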

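In terms of implementation, the model first introduces two types of latent variables: visual latent tokens and language latent tokens. The former encode the physical relationships and time-series changes in the scene and carry the prediction capability of the world model. The latter express driving intentions and semantic logic and carry the understanding capability of VLA.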

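Second, OneVL introduces two auxiliary decoders that are used only during training. The language auxiliary decoder reconstructs human-readable CoT text from the language latent tokens, explaining why the model makes a given driving decision. The visual auxiliary decoder predicts the visual tokens of future frames (images 0.5 seconds and 1.0 seconds later) from the visual latent tokens, allowing the model to predict scene changes. During inference both decoders are removed and the model outputs planning results directly. This achieves one-step reasoning and completely eliminates the latency accumulation caused by autoregression.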
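The training-only role of the two auxiliary decoders can be illustrated with a toy model. The sketch below is not Xiaomi's implementation: the class name, layer shapes, and random weights are all hypothetical, and plain matrix products stand in for real networks. It only shows the structural idea that the auxiliary heads add outputs for training losses, while the planning output is the same with or without them.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hypothetical latent width


class ToyLatentPlanner:
    """Toy stand-in for a planner with visual and language latent tokens."""

    def __init__(self):
        self.w_visual = rng.normal(size=(D, D))      # features -> visual latent token
        self.w_language = rng.normal(size=(D, D))    # features -> language latent token
        self.w_plan = rng.normal(size=(2 * D, 6))    # latents -> 3 (x, y) waypoints
        # Auxiliary decoders: only evaluated when training=True.
        self.aux_visual = rng.normal(size=(D, D))    # visual latent -> future-frame tokens
        self.aux_language = rng.normal(size=(D, 8))  # language latent -> CoT token logits

    def forward(self, features, training):
        z_vis = np.tanh(features @ self.w_visual)
        z_lang = np.tanh(features @ self.w_language)
        plan = (np.concatenate([z_vis, z_lang]) @ self.w_plan).reshape(3, 2)
        out = {"plan": plan}
        if training:
            # Extra targets that would feed auxiliary training losses.
            out["future_tokens"] = z_vis @ self.aux_visual
            out["cot_logits"] = z_lang @ self.aux_language
        return out


model = ToyLatentPlanner()
x = rng.normal(size=D)
train_out = model.forward(x, training=True)
infer_out = model.forward(x, training=False)
print(sorted(train_out))  # ['cot_logits', 'future_tokens', 'plan']
print(sorted(infer_out))  # ['plan']
```

In this sketch, dropping the auxiliary heads at inference changes neither the latents nor the plan; it only removes work that was needed to compute training signals.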

Case 2 of Latent Space Unified Fusion: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks

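Traditional VLA models face a fundamental problem: a supervision deficit. The input of a VLA model is high-dimensional multimodal data (front-view image sequences, language instructions, historical actions, and so on), but the supervision signal consists only of low-dimensional action tokens. As a result, most of the model's representation capacity is wasted, the model cannot fully learn the complex dynamics of the driving environment, and the large potential of VLA models cannot be effectively released.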

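As the figure below shows, the collision rate trends downward as the amount of training data increases from 700,000 frames to 7 million frames and then to 70 million frames. In other words, more training data means better safety. However, for the traditional VLA paradigm without a world model, the decline in collision rate slows when the data grows from 7 million to 70 million frames, which indicates that data alone has a limited effect on the safety performance of VLA.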

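To address VLA pain points such as sparse supervision, the breakdown of the data scaling law, and the lack of physical time-series prediction capability, Huawei proposed the DriveVLA-W0 training paradigm in its paper. It introduces a world model that predicts future images as a dense self-supervision signal during training, adding future time-series prediction while preserving the ability to understand environmental dynamics. Compared with traditional VLA, DriveVLA-W0 adds world modeling (predicting future road conditions); the more data there is, the more this advantage is magnified, which strengthens the data scaling law.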

Fusion Mode 2: In-depth Fusion at the Architectural Level, Representative: VLA-World

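Unlike pre-training fusion (external reinforcement), in which the world model acts as an external tool that generates first and then passes results on, in-depth fusion at the architectural level internalizes the world model capability as a native capability of VLA, so that planning and generation develop together within the same architecture.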

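VLA-World, jointly proposed by Shanghai Jiao Tong University and Huawei Central Research Institute in April 2026, is an integrated VLA architecture with deeply embedded world model capabilities. In traditional solutions the world model and VLA are independent of each other: the former generates simulation videos, and the latter handles perception, reasoning, and decision output. VLA-World uses a single VLA backbone network so that visual generation and decision reasoning share features. It treats trajectory prediction and visual generation as consecutive links in the same decision chain and follows the causal logic of predicting the motion trajectory first and then deducing future images from that trajectory, achieving deep module coupling and a highly coherent reasoning chain.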

Working Mechanism:

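Trajectory-Aware Conditioning: VLA-World predicts the trajectory first and then generates future frames conditioned on it. The trajectory prediction result serves directly as the conditioning signal that guides visual generation. This creates a causal dependency: the trajectory determines "where to go", and the image presents "what will be seen on arrival".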

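Unified Generation and Reasoning: Unlike earlier designs in which the world model and VLA were two independent modules, VLA-World has the two share the same VLA backbone. In other words, it unifies visual generation and reasoning in a single VLA structure.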

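GRPO End-to-End Alignment: In the reinforcement learning stage, the model is optimized with GRPO (Group Relative Policy Optimization). The model generates multiple candidate trajectories and their corresponding future images, and rewards the results in which the "imagined future" is consistent with the "real safe decision". This mechanism ensures that visual generation is no longer an independent task but always serves the quality of downstream decisions.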
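The group-relative step that gives GRPO its name can be sketched in a few lines. This is a minimal illustration, not VLA-World's implementation: the reward values are made up, and the assumption that each reward scores how well a candidate's imagined future agrees with a safe decision is taken only from the description above. Each candidate's advantage is its reward normalized against the mean and standard deviation of its own group.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each candidate's reward against its own sampling group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Hypothetical rewards for four candidate (trajectory, future image) pairs
# sampled for the same driving scene.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

best = int(np.argmax(advantages))
print("advantages:", np.round(advantages, 3))
print("candidate reinforced most strongly:", best)
```

Candidates that score above the group average receive positive advantages and are reinforced; those below the average are discouraged, so no separate value network is needed to provide a baseline.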

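Trend 3: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter a Period of Competition on the General Cognitive and Reasoning Capabilities of Foundation Models.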

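2026 is the first year in which autonomous driving foundation models are being launched. DeepRoute.ai, Afari Technology, Zhuoyu Technology, Li Auto, and XPeng have all released related products. The core of these foundation models is to build a universal, reusable cognitive base for the physical world, enabling compatibility with all levels of intelligent driving and the migration of capabilities across scenarios.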

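Autonomous driving is essentially a typical scaling problem, and current implementations are constrained mainly by insufficient model capacity and an inefficient data closed loop. First, existing foundation models are limited in scale and lack the generalization capability needed for complex long-tail scenarios. Second, mining high-value data relies on manual screening and review, which is fragmented and poorly automated and therefore limits long-term iteration.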

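To address these two bottlenecks, DeepRoute.ai proposed a unified 40B-parameter VLA foundation model. Its core innovation is a "trinity" role design that lets the same model play three roles at once: driver (visual input -> real-time driving decisions), analyst (diagnostic understanding of key scenarios), and critic/referee (evaluating the safety and rationality of driving behavior). This upgrades the driving system from a simple execution system to an intelligent system with cognitive capabilities.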

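In the pre-training stage, DeepRoute.ai abandons the traditional end-to-end approach that relies on trajectory supervision (with a data utilization rate of only 0.001%) and adopts a video prediction task instead. By predicting video sequences, the model learns the dynamic structure of the real world, turning every pixel into a supervision signal and raising the data utilization rate to nearly 100%.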

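In the core training stage (mid-training), the model is trained jointly on three tasks: V+A (vision + action), to learn conventional end-to-end driving; V+A->L (explanation after action), to activate the analyst and critic roles; and V->L+A (multimodal logical reasoning), to train a driver with reasoning capability. Using Chain-of-Thought, the model first outputs a language description of key events and its decision logic, and then outputs the specific driving trajectory.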

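In terms of engineering implementation, DeepRoute.ai uses optimizations such as KV cache, Multi-Token Prediction (MTP), model quantization, and a self-developed inference engine to keep the single-step latency for processing 1,000 visual tokens and dozens of reasoning tokens within 60-85 milliseconds, achieving 10-15 Hz real-time closed-loop control. In addition, the foundation model can be distilled flexibly according to the computing power of the vehicle chip: for example, a pure driving VA model is deployed on a 100 TOPS platform, and a VLA model with logical reasoning capability on a 500 TOPS platform.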
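The latency and control-rate figures above can be related with simple arithmetic. The sketch below assumes, purely for illustration, that one inference step runs per control cycle and that nothing else adds delay; the function name is hypothetical.

```python
def max_control_rate_hz(latency_ms: float) -> float:
    """Upper bound on control frequency if each cycle costs one inference step."""
    return 1000.0 / latency_ms


for latency_ms in (60, 85):
    rate = max_control_rate_hz(latency_ms)
    print(f"{latency_ms} ms per step -> at most {rate:.1f} Hz")
```

Under this assumption a 60-85 ms step bounds the loop at roughly 16.7 Hz to 11.8 Hz, which leaves some headroom around the quoted 10-15 Hz range for sensing and actuation overhead.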

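The foundation model is pre-trained to learn the physical laws and spatial logic of the real world and has native zero-shot migration capability. With this universal cognitive base, it adapts to every level from L2 assisted driving to L4 autonomous driving through model distillation, computing power tailoring, and capability fine-tuning. It is applied first to autonomous driving and will later migrate to other tracks such as humanoid robots and industrial robots, realizing the goal of "one foundation making all things intelligent".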

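In 2026, Zhuoyu Technology is fully transforming its strategy. With a native multimodal foundation model as its technical base, it aims to upgrade from an "intelligent driving Tier 1 supplier" to a "mobile physical AI company". It will focus on expanding mass production across all scenarios and vertical domains, covering passenger cars, commercial vehicles, L4 products, and overseas layout, and will extend into embodied robots.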

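Zhuoyu launched VLA (VLA World Model, a native multimodal foundation model), which uses a unified backbone to process visual, text, and sensor data, completes physical reasoning in the latent space, and directly outputs driving actions. From the pre-training stage onward, it is trained jointly on image, video, text, driving, and robot data, and it predicts and reasons about the physical world in a unified latent space, understanding both semantics and physical laws.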

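2026 is a critical year for the technological iteration and paradigm fusion of intelligent driving large models. The competition and integration of multiple technical routes, the collaborative implementation of VLA and the world model, and the large-scale launch of foundation models will together accelerate the industry's move from "technological exploration" to "large-scale implementation". Whether through the technological innovation of multi-route integration or the generalized deployment of foundation models, the core goal is driving that is "safer, more efficient, and better adapted to real driving scenarios". The trend toward implementing Physical AI will further drive intelligent driving systems to evolve from "imitating humans" to "understanding the world", ultimately realizing true intelligent driving.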

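In the future, with the continuous iteration of technologies and the coordinated improvement of the industry chain, intelligent driving large models are expected to gradually break through existing bottlenecks, become the core support for the large-scale implementation of autonomous driving, reshape the development pattern of the mobility sector, and extend mobile Physical AI to more scenarios.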

Table of Contents

Chapter 1: Fundamentals of End-to-End Autonomous Driving Technology

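  • Terms and Concepts of End-to-End Autonomous Driving
  • Explanation of End-to-End Autonomous Driving Terminologies
  • Correlation and Differences of End-to-End Related Concepts
  • Introduction to End-to-End Autonomous Driving and Development Status
  • Typical Cases of End-to-End Autonomous Driving
  • SenseTime UniAD: A Planning-Oriented Large AI Model for End-to-End Commercial Scenario Applications
  • Technical Principle and Architecture of SenseTime UniAD
  • Technical Principle and Architecture of Horizon VAD
  • Technical Principle and Architecture of Horizon VADv2
  • VADv2 Training
  • Technical Principle and Architecture of DriveVLM
  • Li Auto Adopts the Mixture of Experts (MoE) Architecture
  • MoE and STR2
  • Shanghai Qi Zhi Institute's End-to-End Autonomous Driving Model SGADS: A Safe and Generalized End-to-End Autonomous Driving System Based on Reinforcement Learning and Imitation Learning
  • Shanghai Jiao Tong University ActiveAD Active Learning Case: Addressing the Data Labeling Bottleneck from a Data-Centric Perspective
  • Most End-to-End Autonomous Driving Systems Are Built on Foundation Models
  • Foundation Models
  • Vision-Language Model (VLM)
  • Application of Vision-Language Models (VLM) in Intelligent Driving
  • Application of Foundation Models in Autonomous Driving
  • Application of Vision-Language Models (VLM)
  • Development History of Vision-Language Models (VLM)
  • Vision-Language Model (VLM) Architecture
  • Application Principle of VLM in End-to-End Autonomous Driving
  • Application of VLM in End-to-End Autonomous Driving
  • Challenges Facing VLM in Intelligent Driving
  • Vision-Language-Action Model (VLA)
  • VLM → VLA
  • VLM+E2E->VLA
  • VLA Architecture Analysis
  • Typical VLA Architecture
  • VLA Architecture Analysis Example: Breakdown of Li Auto's MindVLA Architecture (1)
  • VLA Architecture Analysis Example: Breakdown of Li Auto's MindVLA Architecture (2)
  • Concept of VLA Large Models
  • Principle of VLA Models
  • Classification of VLA Models
  • Interpretation of VLA Technology Evolution
  • Large Language Models Are One of the Core Components of End-to-End Solutions
  • Technical Architecture and Key Technologies of VLA
  • Advantages of VLA
  • Challenges in VLA Model Deployment: Real-Time Responsiveness
  • Challenges in VLA Model Deployment: Real-Time Performance and Memory Usage
  • Challenges in VLA Model Deployment: Data
  • Challenges in VLA Model Deployment: Long-Horizon Task Planning
  • Evolution Path of VLA Large Models
  • Representative Models of the VLA Technical Paradigm
  • VLA Datasets and Benchmarks
  • World Model
  • Prototype of the World Model: Mental Models
  • Key Definitions and Application Development of World Models
  • Basic Architecture of World Models
  • Three Core Values of World Models for Autonomous Driving
  • Two Main Technical Routes of World Models
  • Generative World Model DIAMOND: Diffusion Model + Real-Time Reinforcement Learning Adaptation + Long-Term Stability
  • Genie: A Generative Interactive World Model That Learns Real-World Physical Laws from Unlabeled Internet Videos
  • Technical Principle and Development Path of WorldDreamer
  • Implicit World Model: Technical Principle and Path of V-JEPA2
  • Implicit World Model: Technical Principle and Development Path of Comma.ai
  • Difficulties in Building and Implementing a World Model Framework
  • Video Generation Methods Based on Transformer and Diffusion Models
  • World Models May Be One of the Ideal Ways to Achieve End-to-End Autonomous Driving
  • World Model: Generating Virtual Training Data
  • World Model: Tesla's World Model
  • World Model: NVIDIA
  • InfinityDrive: Breaking the Time Limits of Driving World Models
  • SenseAuto InfinityDrive: Parameter Performance
  • SenseAuto InfinityDrive: Pipeline
  • SenseTime DiT Architecture and Key Video Generation Evaluation Metrics FID/FV
  • Challenges of Introducing World Models into Autonomous Driving
  • Comparison of End-to-End Large Model Technical Paradigms
  • Diffusion Model
  • Four Mainstream Generative Models
  • Principle of Diffusion Models
  • Diffusion Models Optimize the Core Links of Intelligent Driving Trajectory Generation
  • Diffusion-Model-Based Optimization of Intelligent Driving Trajectory Generation
  • Application of Diffusion Models in Intelligent Driving
  • Practical Applications of Diffusion Models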

Chapter 2: Technical Routes and Development Trends of End-to-End Autonomous Driving

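  • Technology Trends of End-to-End Autonomous Driving
  • Summary of the Evolution Path of End-to-End Intelligent Driving Large Models
  • Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes
  • Integration Case 1: Afari Technology's Intelligent Driving System Adopts a VLA + E2E Collaborative Closed Loop
  • Integration Case 2: The L3-Enabling World Action Model (WAM) Builds a Three-Part Architecture of "VLA + World Model + Safety Adversarial Model"
  • Trend 2: The VLA and World Model Fusion Paradigm Is Expected to Become One of the Mainstream Ways to Implement Physical AI
  • VLA + World Model Fusion Case 1: Xiaomi OneVL Unifies VLA and the World Model in a Single Framework
  • Breakdown of the Xiaomi OneVL Architecture
  • VLA + World Model Fusion Case 2: XPeng Launches X-World
  • VLA + World Model Fusion Case 3: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks
  • Breakdown of the DriveVLA-W0 Architecture
  • DriveVLA-W0 Uses the World Model to Amplify the Data Scaling Law of Autonomous Driving
  • VLA + World Model Fusion Case 4: Bosch ExploreVLA Implements a World Model Based on VLA + RL and Achieves Three Major Breakthroughs
  • Breakdown of the Bosch ExploreVLA Model Architecture
  • Trend 3: Autonomous Driving Is Entering the Physical AI Stage
  • The Ultimate Form of Physical AI Will Connect the Digital and Physical Worlds, and Autonomous Driving Will Be the Best Carrier for It
  • Trend 4: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Enters a Period of Competition on the General Cognitive and Reasoning Capabilities of Foundation Models
  • Case 1: Hard-Core Technical Innovations in the DeepRoute 40B VLA Foundation Model
  • Case 2: The Core of Zhuoyu Technology's 2026 Strategy: Building a Mobile Intelligence Foundation Model (1)
  • Case 2: The Core of Zhuoyu Technology's 2026 Strategy: Building a Mobile Intelligence Foundation Model (2)
  • Case 3: XPeng World Foundation Model
  • Trend 5: End-to-End Autonomous Driving Enters a Complex Operational Stage, and Competition in Data Closed Loops Intensifies
  • Case: NVIDIA MOSAIC
  • Trend 6: Robots and Intelligent Driving Will Be the Two Major End-to-End Application Scenarios on the Road to Artificial General Intelligence (AGI)
  • Market Trends of End-to-End Autonomous Driving
  • Comparison of End-to-End Autonomous Driving Large Model Layouts of ADAS Tier 1 Suppliers
  • Comparison of Solution Layouts of Other End-to-End Autonomous Driving System Suppliers
  • Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (1): Xiaomi, XPeng, Li Auto, and NIO
  • Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (2): Changan, BYD, and Leapmotor
  • Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (3): Chery, Dongfeng, and IM Motors
  • Comparison of End-to-End Autonomous Driving Large Model Layouts of OEMs (4): GAC, FAW Hongqi, and Geely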

Chapter 3: End-to-End Autonomous Driving Suppliers

Chapter 4: OEM Layout in End-to-End Autonomous Driving

  • Xiaomi
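  • Profile
  • Strategic Plan for 2026
  • Comprehensive Analysis of the 2026 New Vehicle Plan
  • Product Positioning and Parameter Benchmarking of 2026 New Vehicles
  • Organizational Restructuring of the Intelligent Driving Department
  • Intelligent Driving Technical Route: Pre-Research on All Routes Without Relying on a Single Technology
  • Comparison between the VLA and End-to-End Routes
  • Evolution Trend of Intelligent Driving Algorithms: From Modular End-to-End to End-to-End Architecture; Introduction of World Model + Reinforcement Learning
  • XLA Cognitive Large Model to Be Released in 2026
  • Development Roadmap of the Intelligent Driving System and Large Models
  • HAD Expansion
  • Orion: End-to-End VLA Intelligent Driving Solution
  • REION Framework
  • Physical World Modeling Architecture
  • Three-Layer Separated Modeling: Multi-Model End-to-End
  • Long Video Generation Framework: MiLA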
  • XPeng
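  • Evolution Roadmap of End-to-End Intelligent Driving Large Models
  • Autonomous Driving Product Planning, 2025-2026
  • L4 Autonomous Driving Layout in 2026: Robotaxi
  • Second-Generation VLA: A Native Multimodal Physical-World Large Model
  • L4 Capability = Model × Computing Power × Data Volume × Vehicle Hardware
  • Second-Generation VLA
  • World Foundation Model
  • Core Technical Path of the World Foundation Model
  • Three Stages of World Foundation Model R&D Results
  • Cloud Model Factory
  • End-to-End System: Architecture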
  • Li Auto
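  • Evolution Roadmap of End-to-End Intelligent Driving Large Models
  • Next-Generation Integrated Architecture "MindVLA-o1" to Be Released in 2026
  • Next-Generation Integrated Architecture MindVLA-o1
  • Evolution from the E2E + VLM Dual System to MindVLA
  • MindVLA Model Architecture
  • Core Technology 1 of MindVLA: Excellent 3D Spatial Perception
  • Core Technology 2 of MindVLA: Integration with Large Language Models (LLM)
  • Core Technology 3 of MindVLA: Combination of Diffusion and RLHF
  • Core Technology 4 of MindVLA: World Model and NVAIE Accelerated Reinforcement Learning
  • End-to-End Solution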
  • Tesla
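  • Interpretation of the 2024 AI Conference
  • Development History of AD Algorithms
  • Summary of Full-Process Progress, 2023-2024
  • FSD v13
  • Development History of AD Algorithms: Entering an Era Emphasizing Perception and Maps
  • Development History of AD Algorithms
  • Development History of AD Algorithms: Multi-Camera Fusion Algorithm HydraNet
  • Development History of AD Algorithms: FSD V12
  • Core Elements of the Full-Stack Integrated Perception and Decision Model
  • End-to-End Algorithm
  • World Model
  • Data Engine
  • Dojo Supercomputing Center: Overview
  • Dojo Supercomputing Center: Training Modules Integrated from D1 Chips
  • Dojo Supercomputing Center: Computing Power Development Plan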
  • NIO
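  • Restructuring of the Intelligent Driving Business Unit, 2024-2025
  • From Model-Based to End-to-End, the World Model Becomes the Dominant Technical Paradigm
  • Evolution Path of End-to-End Large Models
  • Detailed Description of the Intelligent Driving System
  • NIO World Model (NWM)
  • Reconstruction and Imagination Capabilities of the World Model, and Swarm Intelligence
  • NSim Simulator (NIO Simulation)
  • World Model 2.0
  • Comparison between End-to-End Models and World Models
  • Comparison between VLA Models and World Models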
  • Changan
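  • Dubhe Plan 2.0: Tenju Intelligent Driving
  • TOPS AD Software Architecture
  • Brand Layout
  • ADAS Strategy: The "Dubhe Plan"
  • End-to-End System: BEV+LLM+GoT
  • Mass-Produced Model Equipped with the End-to-End System: NEVO E07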
  • Chery
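  • Product Matrix and Vehicle Models
  • Development History of the Intelligent Driving System
  • Four Falcon Pilot Solutions to Launch in 2025
  • Progress in End-to-End Intelligent Driving Large Models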
  • GAC Group
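  • Intelligent Driving Large Model Strategy
  • ADiGO Intelligent Driving System Evolution Roadmap (ADiGO 1.0 to ADiGO 6.0)
  • Five Intelligent Driving Platforms to Launch in 2025
  • Suppliers of Algorithms and Intelligent Driving Systems for L2.9 and Urban NOA Vehicles
  • The "Dual-Tier Intelligent Driving Suppliers + Scenario-Based Price Matching" Strategy Enables Urban NOA to Achieve "High-End Positioning + Mass-Market Accessibility"
  • The "GAC Intelligent Manufacturing + Huawei Intelligence" Model Expands the High-End Market and Strengthens the Brand Matrix
  • The First Model, 華王愛達樂園 F03, Is Scheduled for Release in Q2 2026
  • Momenta 5.0's One-Model End-to-End Algorithm Is Now Installed in RMB150,000-Class Vehicles and Also Provides Urban NOA
  • The Trumpchi Xiangwang S7 Is Equipped with the Enhanced Momenta R6 Large Model
  • Architecture of the ADiGO End-to-End Embodied Reasoning Model
  • Core Technologies of ADiGO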
  • Leapmotor
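  • World Model to Be Released in 2026
  • The D19 Adopts a VLA Large Model and Achieves Door-to-Door NOA Across All Scenarios
  • Adoption of a Self-Developed Intelligent Driving System
  • Leapmotor Pilot Development Roadmap
  • End-to-End Advanced Intelligent Driving
  • Application Scenarios of End-to-End High-Level Intelligent Driving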
  • IM Motors
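  • Iteration History of the Intelligent Driving System
  • Cooperation with Momenta on Intelligent Driving Technology
  • IM AD End-to-End 2.0 Intelligent Driving Large Model
  • IM AD End-to-End 2.0 Intelligent Driving Large Model: Core Technologies
  • IM AD End-to-End 2.0 Intelligent Driving Large Model: Comparison of Application Scenarios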
  • FAW Hongqi
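  • Sinan Intelligent Driving Technical Architecture
  • Core Technologies of the End-to-End Large Model
  • Sinan Intelligent Driving Solution
  • Vehicle Deployment Plan and Future Planning of the Sinan Intelligent Driving Solution
  • Sinan Intelligent Driving System: Jointly Developed with Zhuoyu Technology (DJI)
  • Vehicles Equipped with the Sinan Intelligent Driving System and Their Main Configurations
  • Zhuoyu's End-to-End 4.0 System Will Debut on Sinan Intelligent Driving in 2026
  • FAW Hongqi 9 Series Models Plan to Adopt Huawei's High-Tech Functions in 2026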
  • Dongfeng
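  • Intelligent Driving Strategic Planning, 2026-2030
  • In 2025, Tianyuan Releases a Four-Tier Intelligent Driving Product Matrix Covering L2 to L4/L5
  • Comparison of Intelligent Driving Configurations of the First Mass-Produced Models Equipped with Tianyuan T100/T200/T500
  • R&D of the Tianyuan Intelligent Driving Technical Architecture
  • Intelligent Driving Strategy: In-House R&D and External Procurement in Parallel in the Short Term; Gradual Replacement with In-House R&D in the Long Term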
  • BYD
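  • Overview of the 2026 Intelligent Driving Plan
  • Layout in Intelligent Driving: Preliminary Research on World Models
  • Restructuring of the Intelligent Driving Team (1): Integrating Two Intelligent Driving Departments to Share Resources and Accelerate Intelligent Driving for All
  • Restructuring of the Intelligent Driving Team (2): Increasing Investment in Building the Advanced Technology R&D Center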
Product Code: DTT011

Research on Intelligent Driving Large Models: A Critical Period for Technological Competition and Paradigm Integration

As autonomous driving technology iterates rapidly from L2 to L3-L4, intelligent driving systems are shifting profoundly from traditional rule-driven architectures to a new generation of data-driven and cognition-driven architectures. As the underlying core enabler, intelligent driving large models have become the core track of industry competition. With the accelerated arrival of the Physical AI era, autonomous driving stands as its first large-scale application scenario, pushing automobiles to evolve rapidly into super agents that go beyond traditional transportation tools and become all-scenario intelligent hubs connecting mobility, mobile office, home life, and third-party ecosystems.

From an industrial perspective, Physical AI remains at an early stage of technological fission, and the global autonomous driving market holds massive untapped potential. According to the data, the world has about 1.5 billion passenger cars, 280 million commercial vehicles and trucks, and 18 million taxis in operation. Total annual global driving mileage reaches 13 trillion kilometers, while autonomous driving mileage is only 700 million kilometers, or only about 0.006% of the total. The potential for future growth is significant.

Judging further by the pace of technological implementation, intelligent driving large models are entering a critical window of technological iteration. Segmented end-to-end solutions entered mass production in 2024-2025, and one-model end-to-end and VLA technologies are being rolled out intensively in 2025-2026. Coupled with the continuous upgrading of the intelligent driving experience and the accelerated maturation of L3-L4 high-level autonomous driving technology, Physical AI is accelerating. ResearchInChina predicts three major evolution trends for intelligent driving large models.

Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes.

Integration Mode 1: One-model End-to-End + World Model + Reinforcement Learning, Representative Suppliers: WeRide, Bosch and Momenta

Features: The one-model end-to-end model serves as the core neural network of intelligent driving. It directly connects sensor input to driving output with zero information loss, which gives it an extremely high performance ceiling. The world model deduces future road conditions and can generate massive numbers of long-tail scenarios at low cost for simulation training. Reinforcement learning, relying on a reward mechanism, iterates and optimizes within the deduction space, outputs the optimal driving policy, and copes with all kinds of sudden situations. Together the three form a powerful closed loop: data generation (world model) -> policy training (reinforcement learning) -> decision and execution (end-to-end model). This enables intelligent driving systems to learn from massive driving data and keep evolving.

Integration Mode 2: E2E + Foundation Model (VLM/VLA) + Reinforcement Learning + World Model, Representative Suppliers: Horizon Robotics and Afari Technology

Features: The vision-language large model acts as the "cerebrum" responsible for cognitive reasoning, and the small end-to-end model acts as the "cerebellum" responsible for rapid execution.

Horizon Robotics adopts a one-model E2E + VLM + reinforcement learning + world model approach. Its dual-track "fast thinking + slow thinking" intelligent driving architecture takes reinforcement learning as the hub. On the one hand, it strengthens the end-to-end intuition model through the world model and simulation training, enabling millisecond-level response while improving the handling of rare, short-time-sequence long-tail scenarios. On the other hand, it strengthens the VLM cognitive model through reasoning enhancement, improving semantic understanding and logical reasoning for complex, long-time-sequence scenarios. Finally, it migrates the VLM capabilities to the on-vehicle model and achieves lightweight deployment through quantization and distillation, building a balanced closed loop of "millisecond-level fast response + long-time-sequence slow reasoning".

Afari Technology adopts a VLA + E2E + world model architecture, in which the VLA model handles reasoning, analogous to high-level decision-making by the slow system, and the E2E algorithm handles action mapping, analogous to the fast system. Training proceeds in stages. First, a 32B-parameter large model undergoes large-scale multimodal pre-training (VLM). It is then distilled into a 7B-parameter lightweight model to balance performance against deployment cost. Next, perception and driving actions are aligned and driving-domain knowledge is introduced (VLA), and supervised fine-tuning teaches high-level driving strategies and behavioral norms. Finally, reinforcement learning aligns the model with human driving styles and safety constraints, achieving closed-loop optimization across perception, decision, and control.

Integration Mode 3: VLA + World Model, Representative Suppliers: Zhuoyu Technology and XPeng

Features: VLA is responsible for perceiving the current environment, learning historical driving patterns, and determining the next action. The world model is responsible for deducing how each target on the road will interact in the next 5 to 10 seconds. VLA is good at understanding the present but not predicting the future; the world model is good at prediction but does not reflect on and reason about the prediction results. The combination of the two constitutes a complete brain.

Trend 2: The VLA and world model fusion paradigm is expected to become one of the main ways for the implementation of Physical AI.

The core of the future evolution of intelligent driving large models is the fundamental reconstruction of the underlying paradigm from "imitating human driving" to "understanding the physical world". VLA and world model are not an either-or choice. The future intelligent driving large model will be a fusion of the two. At present, the divergence between the two routes lies in that VLA advocates believe that "understanding" is the premise of driving, while world model advocates believe that "prediction" is the key.

World model advocates believe that changes in the physical world are continuous and high-dimensional. Language is a discrete, low-dimensional symbolic system - the transformation from physics to language is inevitably accompanied by information loss. The world model directly operates physical representations with higher bandwidth. VLA advocates believe that the biggest advantage of VLA is that it can be fine-tuned with the world model or model-based reinforcement learning. It can absorb the advantages of the world model, while the world model cannot utilize the advantages of VLM/VLA. Language brings strong generalization capability for it is a compressed package of human common sense. VLA possesses "common sense reasoning" capability and Chain-of-Thought (CoT) via language, thus gaining self-explanation capability.

Based on the advantages and divergences of the two routes, the industry has begun to explore the fusion path of the two. At present, there are three mainstream fusion modes for VLA and world model: latent space unified fusion, in-depth fusion at the architectural level, and modular collaborative fusion (cloud simulator type).

Fusion Mode 1: Latent Space Unified Fusion, Representatives: Xiaomi OneVL and Huawei DriveVLA-W0

The core is to embed the prediction capability of the world model into the training objectives of VLA, rather than adding additional modules in the reasoning stage. Specifically, it adds a future image prediction task to the training process of the VLA model, allowing the model to not only learn to predict actions, but also the environmental state (i.e., future images) at future moments. This design forces the model to learn the underlying dynamic laws of the driving environment, rather than just fitting sparse action supervision signals.

Case 1 of Latent Space Unified Fusion: Xiaomi OneVL Autonomous Driving Model

On May 13, 2026, Xiaomi officially released Xiaomi OneVL, a fully open-sourced autonomous driving model which unifies the three technical routes of VLA, world model and latent space reasoning into the same framework. The core breakthrough of this model is the in-depth unification of multiple technical paradigms through latent space reasoning. Differing from traditional solutions that decompose the reasoning process into human-readable natural language and generate deduction logic word by word, Xiaomi OneVL directly completes end-to-end logical operations in the high-dimensional vectorized latent space. This latent space integrates both the scenario perception and understanding capability of VLA and the environmental time-series prediction capability of the world model, and all reasoning operations are carried out at the vector level rather than the text level, achieving a significant leap in reasoning efficiency compared with traditional VLA solutions.

In terms of implementation mechanism, firstly, two types of latent variables are introduced inside the model: visual latent token and language latent token. The former is responsible for encoding physical relationships and time-series changes in the scene, carrying the prediction capability of the world model. The latter is responsible for expressing driving intentions and semantic logic, carrying the understanding capability of VLA.

Secondly, OneVL introduces two auxiliary decoders, which are only used in the training stage. The language auxiliary decoder is responsible for restoring human-readable CoT text from the language latent token, explaining why the model makes a certain driving decision. The visual auxiliary decoder is responsible for predicting future frame visual tokens (images after 0.5 seconds and 1.0 seconds) from the visual latent token, allowing the model to predict scene changes. During inference, both decoders are removed, and the model directly outputs planning results, realizing one-step reasoning and completely eliminating the delay accumulation caused by autoregression.

Case 2 of Latent Space Unified Fusion: Huawei DriveVLA-W0 Predicts Future Images Through World Modeling Tasks

Traditional VLA models face a fundamental problem: Supervision Deficit. The input of VLA models is high-dimensional multimodal data (front-view image sequences, language instructions, historical actions, etc.), but the supervision signal is only low-dimensional action tokens. Most of the model's representation capacity is wasted, resulting in its inability to fully learn the complex dynamics of the driving environment, and the huge potential of VLA models cannot be effectively released.

As the figure below shows, the collision rate trends downward as the amount of training data increases from 700,000 frames to 7 million frames and then to 70 million frames. In other words, more training data means better safety. However, for the traditional VLA paradigm without a world model, the decline in collision rate slows when the data grows from 7 million to 70 million frames, which indicates that data alone has a limited effect on the safety performance of VLA.

To address VLA pain points such as sparse supervision, the breakdown of the data scaling law, and the lack of physical time-series prediction capability, Huawei proposed the DriveVLA-W0 training paradigm in its paper. It introduces a world model that predicts future images as a dense self-supervision signal during training, adding future time-series prediction while preserving the ability to understand environmental dynamics. Compared with traditional VLA, DriveVLA-W0 adds world modeling (predicting future road conditions); the more data there is, the more this advantage is magnified, which strengthens the data scaling law.
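The contrast between sparse action supervision and dense future-image supervision can be made concrete with a toy joint objective. This is a sketch under stated assumptions, not the loss used in the DriveVLA-W0 paper: mean squared error stands in for both terms, the `world_weight` parameter and all tensor shapes are hypothetical, and the data is random.

```python
import numpy as np


def joint_training_loss(pred_actions, true_actions, pred_frame, true_frame, world_weight=1.0):
    """Sparse action loss plus a dense future-image loss (MSE as a stand-in for both)."""
    action_loss = float(np.mean((pred_actions - true_actions) ** 2))
    world_loss = float(np.mean((pred_frame - true_frame) ** 2))
    return action_loss + world_weight * world_loss, action_loss, world_loss


rng = np.random.default_rng(0)
true_actions = rng.normal(size=(8, 2))     # 8 future waypoints (x, y): 16 supervised values
true_frame = rng.normal(size=(64, 64, 3))  # one future frame: 12,288 supervised values
pred_actions = true_actions + 0.1          # pretend predictions with a constant error
pred_frame = true_frame + 0.1

total, action_loss, world_loss = joint_training_loss(
    pred_actions, true_actions, pred_frame, true_frame
)
print(f"supervised values per sample: actions={true_actions.size}, frame={true_frame.size}")
print(f"total={total:.4f} (action={action_loss:.4f}, world={world_loss:.4f})")
```

Even at this toy resolution, a single predicted frame supplies several hundred times more supervised values per sample than the action targets, which is the sense in which world modeling makes the training signal dense.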

Fusion Mode 2: In-depth Fusion at the Architectural Level, Representative: VLA-World

Differing from pre-training fusion (external reinforcement), where the world model acts as an external tool to generate first and then transmit, in-depth fusion at the architectural level internalizes the world model capability into the native capability of VLA, with planning and generation growing together in the same architecture.

VLA-World, jointly proposed by Shanghai Jiao Tong University and Huawei Central Research Institute in April 2026, is an integrated VLA architecture with deeply embedded world model capabilities. In traditional solutions, the world model and VLA are independent of each other, with the former responsible for generating simulation videos and the latter for perception reasoning and decision output. VLA-World adopts a single VLA backbone network for feature sharing between visual generation and decision reasoning. It integrates trajectory prediction and visual generation into continuous links of the same decision chain, and follows the causal logic of predicting motion trajectory first and then deducing future images based on the trajectory, realizing deep module coupling and highly coherent reasoning chain.

Working Mechanism:

Trajectory Perception Conditioning: VLA-World predicts the trajectory first, and then generates future frames conditioned on the trajectory: the trajectory prediction result directly serves as the conditioning signal for visual generation to guide the generation process. In this way, the trajectory determines "where to go", and the image presents "what to see when arriving there", forming a causal dependency.

Unified Generation and Reasoning: Differing from the past when the world model and VLA were two independent modules, VLA-World enables the two to share the same VLA backbone, that is, unifying visual generation and reasoning in the same VLA structure.

GRPO End-to-End Alignment: GRPO (Group Relative Policy Optimization) is used to optimize the model during the reinforcement learning stage. The model generates multiple candidate trajectories and corresponding future images, and rewards those results where the "imagined future" is consistent with the "real safe decision". This mechanism makes visual generation no longer an independent task, but always serves the quality of downstream decisions.

Trend 3: The Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter A Competition Period of General Cognitive and Reasoning Capabilities of Foundation Models.

2026 is the first year of the launch of autonomous driving foundation models. DeepRoute.ai, Afari Technology, Zhuoyu Technology, Li Auto, and XPeng have launched related products. The core of foundation models is to build a universal and reusable cognitive base for the physical world, realizing full-level intelligent driving compatibility and cross-scenario capability migration.

Autonomous driving is essentially a typical scaling problem, and current implementations are constrained mainly by insufficient model capacity and an inefficient data closed loop. First, existing foundation models are limited in scale and lack the generalization capability needed for complex long-tail scenarios. Second, mining high-value data relies on manual screening and review, which is fragmented and poorly automated and therefore limits long-term iteration.

To address these two bottlenecks, DeepRoute.ai proposed a unified 40B-parameter VLA foundation model. Its core innovation is a "trinity" role design that lets the same model play three roles at once: driver (visual input -> real-time driving decisions), analyst (diagnostic understanding of key scenarios), and critic/referee (evaluating the safety and rationality of driving behavior). This upgrades the driving system from a simple execution system to an intelligent system with cognitive capabilities.

In the pre-training stage, DeepRoute.ai abandons the traditional approach of the end-to-end model relying on trajectory supervision (data utilization rate is only 0.001%), and instead adopts the video prediction task, enabling the model to learn the dynamic structure of the real world by predicting video sequences, turning every pixel into a supervision signal and increasing the data utilization rate to nearly 100%.

In the core training stage (Mid-train), the model conducts joint training around three tasks: V+A (vision + action) to learn conventional end-to-end driving, V+A->L (explanation after action) to activate the analyst and critic roles, and V->L+A (multimodal logical reasoning) to train a driver with reasoning capability, using Chain-of-Thought to let the model first output language descriptions and decision logic of key events, and then output specific driving trajectories.
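The three mid-training tasks differ only in which modalities serve as input and which as output, so they can be written down as a small table. The sketch below is an illustrative reading of the notation above, not DeepRoute.ai's data format: the `TrainingTask` type is hypothetical, and treating V+A as "vision in, action out" is an assumption based on its description as conventional end-to-end driving.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingTask:
    name: str
    inputs: tuple
    outputs: tuple
    purpose: str


# Assumed input/output split for each task named in the text.
MID_TRAIN_TASKS = (
    TrainingTask("V+A", ("vision",), ("action",),
                 "conventional end-to-end driving"),
    TrainingTask("V+A->L", ("vision", "action"), ("language",),
                 "analyst and critic roles: explain after acting"),
    TrainingTask("V->L+A", ("vision",), ("language", "action"),
                 "driver with reasoning: chain-of-thought, then trajectory"),
)

for task in MID_TRAIN_TASKS:
    print(f"{task.name:8s} {' + '.join(task.inputs):16s} -> {' + '.join(task.outputs)}")

# Tasks in which the model must produce language as part of its output.
language_tasks = [t.name for t in MID_TRAIN_TASKS if "language" in t.outputs]
print("tasks that produce language:", language_tasks)
```

Laid out this way, the first task trains the driver role, the second trains explanation of an action already taken, and the third places the language output before the action, matching the Chain-of-Thought ordering described above.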

In terms of engineering implementation, DeepRoute.ai uses optimizations such as KV cache, Multi-Token Prediction (MTP), model quantization, and a self-developed inference engine to keep the single-step latency for processing 1,000 visual tokens and dozens of reasoning tokens within 60-85 milliseconds, achieving 10-15 Hz real-time closed-loop control. Moreover, the foundation model can be distilled flexibly according to the computing power of the vehicle chip: a pure driving VA model is deployed on a 100 TOPS platform, and a VLA model with logical reasoning capability on a 500 TOPS platform.

The foundation model is pre-trained to learn the physical laws and spatial logic of the real world and has native zero-shot migration capability. With this universal cognitive base, it adapts to every level from L2 assisted driving to L4 autonomous driving through model distillation, computing power tailoring, and capability fine-tuning. It is applied first to autonomous driving and will later migrate to other tracks such as humanoid robots and industrial robots, realizing "one foundation making all things intelligent".

In 2026, Zhuoyu Technology fully transforms its strategy. Taking the native multimodal foundation model as the technical base, it aims to upgrade from an "intelligent driving Tier 1 supplier" to a "mobile physical AI company", focusing on mass production expansion across all scenarios and vertical domains covering passenger cars, commercial vehicles, L4 products and overseas layout, and extending to the field of embodied robots.

Zhuoyu launched VLA (VLA World Model, native multimodal FM): it uses a unified Backbone to process visual, text, and sensor data, completes physical reasoning in the latent space, and directly outputs driving actions. From the pre-training stage, it conducts joint training with image/video/text/driving/robot data, and performs prediction and reasoning of the physical world in a unified latent space, understanding both semantics and physical laws.

In 2026, a critical year for the technological iteration and paradigm fusion of intelligent driving large models, the competition and integration of multiple technical routes, the collaborative implementation of VLA and world model, and the large-scale launch of foundation models will jointly promote the intelligent driving industry to accelerate from "technological exploration" to "large-scale implementation". Whether it is technological innovation of multi-route integration or generalized layout of foundation models, the core is to revolve around the goal of "safer, more efficient, and more adaptable to real driving scenarios". The trend of "physical AI" implementation will further drive intelligent driving systems to evolve from "imitating humans" to "understanding the world", realizing true intelligent driving.

In the future, with the continuous iteration of technologies and the coordinated improvement of the industry chain, intelligent driving large models will gradually break through existing bottlenecks, become the core support for the large-scale implementation of autonomous driving, reshape the development pattern of the mobility sector, and also facilitate the extension and application of mobile physical AI in more scenarios.

Table of Contents

1 Fundamentals of End-to-End Autonomous Driving Technology

  • 1.1 Terms and Concepts of End-to-End Autonomous Driving
  • Explanation of End-to-End Autonomous Driving Terminologies
  • Correlation and Differences of End-to-End Related Concepts
  • 1.2 Introduction to End-to-End Autonomous Driving and Development Status
    • 1.2.1 Overview
    • Emerging Background of End-to-End Autonomous Driving
    • Deduced Impacts of Large AI Models on the Pattern of Autonomous Driving Industry
    • Reasons for the Emergence of End-to-End Autonomous Driving: Commercial Value
    • Transformer Enables Autonomous Driving
    • Differences between End-to-End and Traditional Architectures (1)
    • Differences between End-to-End and Traditional Architectures (2)
    • Evolution of End-to-End Architecture
    • Evolution Route of End-to-End Autonomous Driving
    • Comparison between One-Model and Two-Model End-to-End
    • Performance Parameter Benchmarking of Mainstream One-Model/Segmented End-to-End Systems
    • Challenges and Solutions for Large-Scale Mass Production of End-to-End: Computing Power Supply/Data Acquisition
    • Challenges and Solutions for Large-Scale Mass Production of End-to-End: Team Building/Interpretability
    • Progress and Challenges in End-to-End Systems: World Model Generation + Neural Network Simulator + RL Accelerating Innovation
    • Perception Layer under End-to-End Architecture
    • 1.2.2 Implementation Methods of End-to-End Models
    • Two Implementation Approaches for End-to-End
    • End-to-End Implementation Method: Imitation Learning
    • End-to-End Implementation Method: Reinforcement Learning
    • Basic Architecture and Definition of Reinforcement Learning
    • Mainstream Reinforcement Learning Algorithms
    • 1.2.3 Verification Methods of End-to-End Models
    • Dataset Evaluation Methods for End-to-End Autonomous Driving
    • Three Major Simulation Tests for End-to-End Autonomous Driving Models (1) - Bench2Drive
    • Three Major Simulation Tests for End-to-End Autonomous Driving Models (2) - HUGSIM
    • Three Major Simulation Tests for End-to-End Autonomous Driving Models (3) - DriveArena
  • 1.3 Classic End-to-End Autonomous Driving Cases
  • SenseTime UniAD: Path Planning-Oriented Large AI Model Provides E2E Commercial Scenario Applications
  • Technical Principles and Architecture of SenseTime UniAD
  • Technical Principles and Architecture of Horizon VAD
  • Technical Principles and Architecture of Horizon VADv2
  • Training of VADv2
  • Technical Principles and Architecture of DriveVLM
  • Li Auto Adopts Mixture-of-Experts (MoE) Architecture
  • MoE and STR2
  • Shanghai Qi Zhi Institute's E2E-AD Model SGADS: A Safe and Generalized E2E-AD System Based on Reinforcement Learning and Imitation Learning
  • Shanghai Jiao Tong University's ActiveAD Active Learning Case: Solving Data Labeling Bottleneck from A Data-centric Perspective
  • Most End-to-End Autonomous Driving Systems Are Developed Based on Foundation Models
  • 1.4 Foundation Models
    • 1.4.1 Introduction to Foundation Models
    • Significance of Introducing Multimodal Models into End-to-End Autonomous Driving
    • Core of End-to-End Systems - Foundation Models
    • Foundation Model 1: Large Language Model (LLM) - Application Cases in Autonomous Driving
    • Foundation Model 2: Vision Foundation - Application in Intelligent Driving
    • Foundation Model 2: Vision Foundation - Latent Diffusion Models Framework
    • Foundation Model 2: Vision Foundation - Wayve GAIA-1
    • Foundation Model 2: Vision Foundation - DriveDreamer Framework
    • Foundation Model 3: Multimodal Foundation Model - MFM
    • Foundation Model 3: Multimodal Foundation Model - Application of GPT-4V in Intelligent Driving
    • 1.4.2 Foundation Models - Multimodal Foundation Model
    • Development and Overview of Multimodal Foundation Model
    • Multimodal Foundation Model vs. Single-Modal Foundation Model (1)
    • Multimodal Foundation Model vs. Single-Modal Foundation Model (2)
    • Technical Panorama of Multimodal Foundation Model
    • Multimodal Information Representation
    • 1.4.3 Foundation Models - MLLM
    • Multimodal Large Language Model (MLLM)
    • Architecture and Core Components of Multimodal Large Language Model
    • Mainstream Multimodal Large Language Models
    • Application of Multimodal Large Language Model in Intelligent Driving
    • CLIP Model
    • LLaVA Model
  • 1.5 Vision-Language Model (VLM)
  • Application of Vision-Language Model (VLM) in Intelligent Driving
  • Application of Foundation Models in Autonomous Driving
  • Application of Vision-Language Model (VLM)
  • Development History of Vision-Language Model (VLM)
  • Architecture of Vision-Language Model (VLM)
  • Application Principles of VLM in End-to-End Autonomous Driving
  • Application of VLM in End-to-End Autonomous Driving
  • Challenges Faced by VLM Models in Intelligent Driving
  • 1.6 Vision-Language-Action Model (VLA)
  • VLM -> VLA
  • VLM + E2E -> VLA
  • Analysis of VLA Architecture
  • Typical VLA Architectures
  • VLA Architecture Analysis Case: Disassembling Li Auto MindVLA Architecture (1)
  • VLA Architecture Analysis Case: Disassembling Li Auto MindVLA Architecture (2)
  • Concept of VLA Large Models
  • Principles of VLA Model
  • Classification of VLA Models
  • Interpretation of VLA Technology Evolution
  • Large Language Model as One of the Cores of End-to-End
  • Technical Architecture and Key Technologies of VLA
  • Advantages of VLA (1)
  • Advantages of VLA (2)
  • Advantages of VLA (3)
  • Deployment Challenges of VLA Model - Real-Time Response Capability
  • Real-Time Performance and Memory Occupancy Challenges of VLA Model Deployment
  • Deployment Challenges of VLA Model - Data (1)
  • Deployment Challenges of VLA Model - Data (2)
  • Deployment Challenges of VLA Model - Long-Term Task Planning Capability
  • Evolution Route of VLA Large Models
  • Representative Models of VLA Technical Paradigms
  • VLA Datasets and Benchmarks
  • 1.7 World Model
  • World Model Prototype: Mental Model (1)
  • World Model Prototype: Mental Model (2)
  • Key Definitions and Application Development of World Model
  • Basic Architecture of World Model
  • Three Core Values of World Model Empowering Autonomous Driving
  • Two Major Technical Routes of World Model
  • Generative World Model DIAMOND: Diffusion Model + Real-Time RL Adaptation + Long-Term Stability
  • Generative Interactive World Model Genie: Unsupervised Learning of Real-World Physical Laws from Unlabeled Internet Videos
  • Technical Principles and Paths of WorldDreamer
  • Implicit World Model: Technical Principles and Paths of V-JEPA2
  • Implicit World Model: Technical Principles and Paths of Comma.ai
  • Framework Setting and Implementation Difficulties of World Model
  • Video Generation Methods Based on Transformer and Diffusion Models
  • World Model May be One of the Ideal Approaches to Realize End-to-End Autonomous Driving
  • World Model - Generation of Virtual Training Data
  • World Model - Tesla World Model
  • World Model - NVIDIA
  • InfinityDrive: Breaking Time Limits in Driving World Models
  • Parameter Performance of SenseAuto InfinityDrive
  • Pipeline of SenseAuto InfinityDrive
  • SenseTime DiT Architecture and Main Video Generation Evaluation Metrics FID/FVD
  • Deployment Challenges of World Model in Autonomous Driving
  • 1.8 Comparison between End-to-End Large Model Technical Paradigms
    • 1.8.1 Technical Paradigm Comparison: Modular End-to-End vs. One-Model End-to-End vs. VLM/VLM+E2E/VLA
    • Summary of Comparison between Three Mainstream Intelligent Driving Models (1): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
    • Summary of Comparison between Three Mainstream Intelligent Driving Models (2): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
    • Summary of Comparison between Three Mainstream Intelligent Driving Models (3): Modular / One-Model End-to-End / Foundation Model-Based Autonomous Driving Paradigm
    • Definition and Classification of Generalized End-to-End (GE2E)
    • Comparison of Different GE2E Autonomous Driving Paradigms: Planning-Only E2E vs. Multi-Task E2E
    • Comparison of Different GE2E Autonomous Driving Paradigms: VLM-Driven Cognitive End-to-End Driving
    • Comparison between Two Technical Paradigms: VLM + Traditional E2E
    • Architecture Summary of Various GE2E Autonomous Driving Models
    • Performance Comparison between Various GE2E Autonomous Driving Models
    • 1.8.2 Technical Paradigm Comparison: VLA vs. World Model
    • VLA vs. World Model: Who will Win?
    • Performance Competition between VLA and World Model
    • Summary of Comparison between VLM/VLA/World Models
  • 1.9 Diffusion Models
  • Four Mainstream Generative Models
  • Principles of Diffusion Models
  • Diffusion Models Optimize Core Links of Intelligent Driving Trajectory Generation
  • Diffusion Models Optimize Intelligent Driving Trajectory Generation
  • Application of Diffusion Models in Intelligent Driving
  • Practical Application Cases of Diffusion Model

2 Technical Routes and Development Trends of End-to-End Autonomous Driving

  • 2.1 Technical Trends of End-to-End Autonomous Driving
  • Summary of Evolution Route of Intelligent Driving End-to-End Large Models
  • Trend 1: The Core Focus of Autonomous Driving Large Model Evolution in 2026 Will Be Competition and Deep Integration of Multiple Technical Routes
  • Integration Case 1: Overall Architecture of Afari Technology's Autonomous Driving System Adopts VLA+E2E Collaborative Closed Loop
  • Integration Case 2: L3-Capable World Action Model (WAM) Builds Trinity Architecture of "VLA + World Model + Safety Adversarial Model"
  • Trend 2: VLA and World Model Fusion Paradigm Is Expected to Become One of the Mainstream Approaches for Physical AI Implementation
  • VLA+World Model Integration Case 1: Xiaomi OneVL Unifies VLA and World Model into One Framework
  • Disassembly of Xiaomi OneVL Architecture
  • VLA+World Model Integration Case 2: XPeng Launches X-World
  • VLA+World Model Integration Case 3: Huawei DriveVLA-W0 Predicts Future Images via World Modeling Tasks
  • Disassembly of DriveVLA-W0 Architecture
  • DriveVLA-W0 Leverages World Models to Amplify Autonomous Driving Data Scaling Law
  • VLA+World Model Integration Case 4: Bosch ExploreVLA Introduces World Model Based on VLA+RL to Achieve Three Major Breakthroughs
  • Disassembly of Bosch ExploreVLA Model Architecture
  • Trend 3: Autonomous Driving Is Entering the Physical AI Stage
  • Ultimate Form of Physical AI Connects Digital and Physical Worlds, and Autonomous Driving Serves as Its Optimal Implementation Carrier
  • Trend 4: Evolution of Intelligent Driving AI Towards Foundation Models Accelerates, and the Industry Will Enter A Competition Period of General Cognitive and Reasoning Capabilities of Foundation Models
  • Case 1: Hardcore Technological Innovations in DeepRoute 40B VLA Foundation Model
  • Case 2: Core of 2026 Strategy of Zhuoyu Technology: Building Mobile Intelligent Foundation Model (1)
  • Case 2: Core of 2026 Strategy of Zhuoyu Technology: Building Mobile Intelligent Foundation Model (2)
  • Case 3: XPeng World Foundation Model
  • Trend 5: End-to-End Autonomous Driving Has Entered the Stage of Data Closed-Loop Competition and Refined Operation
  • Case: NVIDIA MOSAIC
  • Trend 6: Robots and Intelligent Driving Become Two Mainstream E2E Application Scenarios on the Road to AGI (1)
  • Trend 6: Robots and Intelligent Driving Become Two Mainstream E2E Application Scenarios on the Road to AGI (2)
  • 2.2 End-to-End Autonomous Driving Market Trends
  • Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (1)
  • Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (2)
  • Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (3)
  • Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (4)
  • Comparison of End-to-End Autonomous Driving Large Model Layout between ADAS Tier 1 Suppliers (5)
  • Solution Layout Comparison between Other End-to-End Autonomous Driving System Suppliers
  • Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (1): Xiaomi, XPeng, Li Auto, NIO
  • Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (2): Changan, BYD, Leapmotor
  • Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (3): Chery, Dongfeng, IM Motors
  • Comparison of End-to-End Autonomous Driving Large Model Layout between OEMs (4): GAC, FAW Hongqi, Geely

3 End-to-End Autonomous Driving Suppliers

  • 3.1 Afari Technology - End-to-End Autonomous Driving Model
  • Profile
  • Fully Entering into AI-Driven Intelligent Vehicle Era
  • AI + Vehicle Strategy
  • Top-Level Strategy and Commercial Closed Loop
  • Ecosystem Alliance
  • Judgment on Next-Generation End-to-End Architecture Trend (1)
  • Judgment on Next-Generation End-to-End Architecture Trend (2)
  • Judgment on Next-Generation End-to-End Architecture Trend (3)
  • End-to-End Large Model Architecture: E2E2.0+VLA
  • E2E Architecture
  • World Model Closed-Loop Simulation Architecture
  • Native Intelligent Driving Foundation Model
  • Three Major Businesses (1)
  • Three Major Businesses (2): Robotaxi Deployment Plan, 2026-2030
  • Evolution Route of Intelligent Driving Solutions (ASD1.0 to ASD4.0) and End-to-End Large Model
  • Mass Production of Chongqing Qianli Intelligent Driving Technology Co., Ltd.
  • 3.2 Horizon Robotics - End-to-End Autonomous Driving Large Model
  • Ultimate Strategic Roadmap: 2025-2030+
  • Three Strategic Evolutions
  • Latest Product Launches in 2026 (1)
  • Latest Product Launches in 2026 (2)
  • Adopts One-Model End-to-End + VLM Solution
  • Introduction of Reinforcement Learning and World Model
  • Thoughts on One-Model End-to-End Large Models
  • Urban Driving Assistance System: HSD
  • Journey 6 Series Chips
  • SparseDriveV2 (1)
  • SparseDriveV2 (2)
  • UMGen: Unified Framework for Multimodal Driving Scene Generation
  • GoalFlow: Goal-Driven Approach Unlocking New Future of Generative End-to-End Strategies
  • MomAD: Momentum-Aware Planning in End-to-End Autonomous Driving
  • DiffusionDrive: Towards Generative Multimodal End-to-End Autonomous Driving
  • RAD: Post-Training Paradigm of End-to-End Reinforcement Learning Based on 3DGS Digital Twin World
  • Mass Production
  • Super Drive High-Level Intelligent Driving and Its Advantages
  • Architecture and Technical Principles of Super Drive
  • Senna Intelligent Driving System (Large Model + End-to-End)
  • Core Technologies and Training Methods of Senna
  • Core Modules of Senna
  • 3.3 Zhuoyu Technology - Intelligent Driving Large Model
  • Comparison of Three Intelligent Driving Model Paradigms: One-Model End-to-End, World Model and VLA (1)
  • Comparison of Three Intelligent Driving Model Paradigms: One-Model End-to-End, World Model and VLA (2)
  • Launched Mobile Physical AI Foundation Model in 2026: Native Multimodal Foundation Model
  • Comparison between Three VLA Technical Paradigms and Zhuoyu's 2026 Native Multimodal Foundation Model
  • Evolution Route of ClixPilot End-to-End Large Model (1)
  • Evolution Route of ClixPilot End-to-End Large Model (2)
  • End-to-End World Model Architecture
  • Two-Stage Training Model for End-to-End World Model
  • Core Functions of Generative Intelligent Driving GenDrive
  • Core Technologies of Generative Intelligent Driving
  • Two-Model End-to-End
  • Interpretable One-Model End-to-End
  • Mass Production and Clients of End-to-End
  • 3.4 NVIDIA - Intelligent Driving Large Model
  • Ten-Year Layout of Autonomous Driving Business
  • L2++/L4 Intelligent Driving Plan (2026-2030)
  • L3 and L4 Implementation Roadmap of NVIDIA
  • DRIVE Full-Stack Driving Assistance Platform: 5-Layer Architecture
  • Drive Hyperion 10 (1): Hardware Configuration
  • Drive Hyperion 10 (2): Software Architecture
  • Building Autonomous Driving Safety and AI Ecosystem Based on Halos OS
  • DRIVE AV Intelligent Driving Large Model Solution: VLA + Classic Rule-Based Algorithms
  • E2E+VLM->Drive VLA (1)
  • E2E+VLM->Drive VLA (2)
  • VLA On-Vehicle Deployment Solution (1)
  • VLA On-Vehicle Deployment Solution (2)
  • Launched Alpamayo 1.5
  • Drive VLA Technical Route: 10B Large Model Alpamayo 1.5
  • New-Generation In-Vehicle Computing Platform - Drive Thor
  • World Foundation Model Development Platform - Cosmos
  • Cosmos Training Paradigm
  • NVIDIA DriveOS: Foundation Platform Built for Autonomous Driving
  • Core Design Concept of NVIDIA Multicast
  • End-to-End Intelligent Driving Framework - Hydra-MDP
  • Self-Developed Model Architecture - Model Room
  • 3.5 Momenta - Intelligent Driving Large Model
  • Profile
  • R7 Reinforcement Learning World Model
  • Mass-Produced Vehicles Equipped with R7
  • R6 Flywheel Large Model
  • Disassembly of One-Model End-to-End
  • Algorithm Development Path
  • Evolution Roadmap of Intelligent Driving Large Models
  • Intelligent Driving Technology Evolution and Industrial Paradigm Changes
  • End-to-End Planning Architecture
  • End-to-End Large Model Mass Production Solutions
  • 3.6 DeepRoute.ai - Intelligent Driving Large Model
  • Product Layout and Strategic Deployment
  • Launched Unified Foundation Model in 2026
  • Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (1)
  • Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (2)
  • Principle, Architecture and Technical Highlights of 40B VLA Foundation Model (3)
  • Value Brought by Foundation Models
  • End-to-End Intelligent Driving Large Model Evolution, 2023-2026
  • DeepRoute IO 2.0: VLA 2.0 (1)
  • DeepRoute IO 2.0: VLA 2.0 (2)
  • VLA2.0 Designated Mass Production Projects
  • Adopted End-to-End Intelligent Driving Solutions in 2023
  • In-Depth Cooperation with Volcano Engine in 2025
  • Implementation Platform of RoadAGI - AI Spark
  • End-to-End VLA Model: VLA1.0
  • End-to-End VLA Model: Architecture of VLA1.0
  • End-to-End 1.0 Designated Mass Production Projects
  • Introduction of Hierarchical Hint Tokens
  • End-to-End Training Solution - DINOv2
  • Application Value of DINOv2 in Computer Vision
  • VQA Evaluation Dataset for Intelligent Driving
  • BLEU Evaluation Metrics and CIDEr Automatic Evaluation Metric for Image Caption Generation Tasks
  • Score Comparison between DeepRoute HoP and Huawei Solution
  • 3.7 Huawei - End-to-End Intelligent Driving Large Model
  • Evolution Roadmap of Qiankun Intelligent Driving Large Model (ADS2.0 to ADS5)
  • ADS 5 (1): WEWA 2.0 Architecture
  • Comparison between WEWA 2.0 and WEWA 1.0
  • ADS 5 (2): Computing Power
  • ADS 5 (3): Benchmarking of Four Versions and Production Vehicle Models
  • Hierarchical Architecture of Pangu Large Model
  • Pangu Model Product System (1)
  • Pangu Model Product System (2)
  • ADS 4: WEWA 1.0
  • In-Depth Integration of ADS 4 and XMC, and Cloud Simulation Verification
  • ADS 4: Commercial L3 Highway Solution
  • Mass Production of ADS 4 End-to-End
  • ADS 2.0 (1): End-to-End Concept and Perception Algorithm
  • ADS 2.0 (2): End-to-End Concept and Perception Algorithm
  • Summary of ADS 2.0
  • ADS 3.0 (1): End-to-End
  • ADS 3.0 (2): End-to-End
  • ADS 3.0 (3): ADS 3.0 vs. ADS 2.0
  • ADS 3.0 End-to-End Application Case (1): STELATO S9
  • ADS 3.0 End-to-End Application Case (2): LUXEED R7
  • ADS 3.0 End-to-End Application Case (3): AITO Series
  • Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (1)
  • Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (2)
  • Architecture and Principles of Perception-Enhanced World-Awareness-Action Model (Percept-WAM) (3)
  • Multimodal LLM End-to-End Autonomous Driving Solution
  • End-to-End Test - VQA Tasks
  • Architecture of DriveGPT4
  • End-to-End Training Solution Case
  • Two Training Stages of DriveGPT4
  • Comparison between DriveGPT4 and GPT4V
  • 3.8 QCraft - Intelligent Driving Large Model
  • Product Matrix in Intelligent Driving: Three-Tier Product Matrix of Intelligent Driving System QPilot 2.0
  • Mass-Produced Urban NOA End-to-End Solution Based on Single Journey 6M Chip
  • Core Technologies Implementing Urban NOA with Single J6M Chip: Interpretable One-Model End-to-End
  • Core Technologies Enabling Ultimate Urban NOA Experience: VLA and World Model Architecture
  • Evolution of Intelligent Driving Large Models
  • Intelligent Driving Solution Evolution Roadmap
  • Data and Model Training Closed Loop
  • Ecosystem Partners Panorama
  • 3.9 Bosch - Intelligent Driving Large Model
  • Zongheng Driving Assistance Solution
  • Urban Driving Assistance Solution Based on End-to-End Model
  • China Strategic Layout of Bosch Mobility
  • Bosch Mobility Launched New Organizational Restructuring and Strategic Cooperation Based on End-to-End Development Trends
  • Adopt One-Model End-to-End for Mass Production Solutions
  • End-to-End Technical Route of Premium Zongheng Driving Assistance Solution
  • Disassembly of One-Model End-to-End Technical Paradigm
  • Comparison between End-to-End Mass Production Solutions
  • Overall Design Idea of CriticVLA
  • Architecture of CriticVLA (1)
  • Architecture of CriticVLA (2)
  • Classification System of Foundation Models for Autonomous Driving Trajectory Planning
  • Customized Foundation Models for Trajectory Planning: Fine-Tuning
  • Foundation Model for Autonomous Driving Trajectory Planning: Customized Foundation Models for Trajectory Planning
  • Foundation Model for Autonomous Driving Trajectory Planning: Models Focused Solely on Trajectory Planning
  • Models and Core Features of Trajectory Planning Methods with Language Interaction Capability
  • Core Features of Models with Action Interaction Capability: Training Datasets, Training Methods and Evaluation Metrics
  • 3.10 WeRide - End-to-End Large Model
  • Profile
  • Business Model
  • Financial Overview, 2023-2025
  • Five Major Product Matrices
  • Exploration of Business Model for L4 Autonomous Driving Multi-Scenario Application
  • Traditional Autonomous Driving Architecture: Two Major Problems of Perception-Prediction-Planning-Control Modular Pipeline
  • Unsolved Problems of One-Model End-to-End
  • E2E + Traditional Pipeline Dual Architecture
  • E2E Model Architecture
  • Evolution Route of End-to-End Autonomous Driving Large Models
  • Hardware Architecture of Gen8 L4 Autonomous Driving System
  • HPC 3.0
  • Self-Developed General Simulation Model: WeRide GENESIS
  • 3.11 Pony.ai - End-to-End Intelligent Driving Large Model
  • Profile
  • Three Major Business Lines and Business Model
  • Robotaxi Business Layout
  • Business Model of Robotaxi
  • Revenue Overview, 2024-2025
  • Comparative Analysis between Pony.ai and WeRide: Market Value, Revenue, Business, Robotaxi Business and Intelligent Driving Models
  • PonyWorld World Model 2.0 (1)
  • PonyWorld World Model 2.0 (2)
  • PonyWorld World Model 2.0 (3)
  • PonyWorld World Model 2.0 (4)
  • E2E End-to-End Intelligent Driving Model
  • Evolution Route of 1st to 7th Generation Robotaxi Products
  • Released New-Generation Autonomous Driving Domain Controller
  • Ecosystem Partners
  • 3.12 Baidu - End-to-End
  • DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
  • Overview of Baidu Apollo
  • Robotaxi Business Layout
  • Commercial Implementation Progress of Robotaxi (1): Overseas Markets
  • Commercial Implementation Progress of Robotaxi (2): Domestic Market
  • Key Nodes of Robotaxi Deployment in 8 Cities in China, 2021-2026
  • Two-Model End-to-End: Adopt the Strategy of Segmenting First and Then Joint Training
  • Production Vehicle Equipped with Two-Model End-to-End Architecture: Jiyue 07
  • Baidu Automotive Cloud 3.0 Enables End-to-End Systems in Three Aspects (1)
  • Baidu Automotive Cloud 3.0 Enables End-to-End Systems in Three Aspects (2)
  • 3.13 SenseAuto - End-to-End
  • Profile
  • Technical Route Analysis 1: End-to-End Autonomous Driving Evolution Roadmap
  • Technical Route Analysis 2: Analysis of Generative Intelligent Driving R-UniAD (1)
  • Technical Route Analysis 3: Analysis of Generative Intelligent Driving R-UniAD (2)
  • Architecture of R-UniAD
  • Practical Demonstration of R-UniAD: Complex Scene Mining, 4D Simulation Reproduction, Reinforcement Learning and Generalization Verification
  • Kaiwu World Model 2.0
  • Mass Production
  • Released UniAD End-to-End Solution
  • DriveAGI: New-Generation Intelligent Driving Large Model and Its Advantages
  • DiFSD: End-to-end Intelligent Driving System That Simulates Human Driving Behaviors
  • DiFSD: Technical Interpretation
  • 3.14 Wayve - Intelligent Driving Large Model
  • Profile
  • Advantages of AV 2.0
  • Latest Progress: Architecture of GAIA-1 World Model
  • GAIA-1 World Model - Token
  • GAIA-1 World Model - Generation Effects
  • LINGO-2 Model
  • 3.15 Waymo - Intelligent Driving Large Model
  • Foundation Model
  • Building the Driver Algorithm
  • Validating the Driver Algorithm
  • Released Multimodal End-to-End Model EMMA
  • EMMA: Multimodal Input
  • EMMA: Defining Driving Tasks as Visual Q&A
  • EMMA: Introducing Chain-of-Thought Reasoning to Enhance Interpretability
  • Limitations of EMMA Model
  • Implementation and Operation
  • 3.16 GigaAI - End-to-End
  • Profile
  • Evolution Route of World Models
  • Hierarchical Construction Method for 4D Generative World Models
  • Application of World Models (1)
  • Application of World Models (2)
  • ReconDreamer
  • World Model: DriveDreamer
  • World Model: DriveDreamer 2
  • Overall Framework of DriveDreamer4D
  • 3.17 Nullmax - Intelligent Driving Large Model
  • Profile
  • MaxDrive Driving Assistance Solution
  • New-Generation Intelligent Driving Technology - Nullmax Intelligence
  • End-to-End Technical Architecture
  • End-to-End Data Platform
  • HiP-AD: End-to-End Intelligent Driving Framework Based on Multi-Granularity Planning and Deformable Attention
  • Mass Production

4 End-to-End Autonomous Driving Layout of OEMs

  • 4.1 Xiaomi
  • Profile
  • 2026 Strategic Planning
  • Comprehensive Analysis of New Vehicle Planning in 2026
  • Product Positioning and Parameter Benchmarking of 2026 New Vehicles (1)
  • Product Positioning and Parameter Benchmarking of 2026 New Vehicles (2)
  • Organizational Structure Changes of Intelligent Driving Division
  • Intelligent Driving Technical Route: Full-Route Pre-Research without Betting on Single Technology
  • Comparison between VLA and End-to-End Routes
  • Intelligent Driving Algorithm Evolution Trend: from Modular End-to-End to End-to-End Architecture Introducing World Model + Reinforcement Learning
  • Launched XLA Cognitive Large Model in 2026
  • Evolution Roadmap of Intelligent Driving System and Large Models
  • Enhanced Version of HAD (1)
  • Enhanced Version of HAD (2)
  • End-to-End VLA Intelligent Driving Solution ORION
  • ORION Framework
  • Physical World Modeling Architecture
  • Multi-Model End-to-End with Three-Layer Separated Modeling
  • Long Video Generation Framework - MiLA
  • 4.2 XPeng
  • Evolution Roadmap of End-to-End Intelligent Driving Large Models
  • Autonomous Driving Product Planning, 2025-2026
  • L4 Autonomous Driving Layout in 2026: Robotaxi
  • Second-Generation VLA: Native Multimodal Physical World Large Model
  • L4 Capability = Model × Computing Power × Data × Vehicle Hardware
  • Second-Generation VLA (1)
  • Second-Generation VLA (2)
  • World Foundation Model (1)
  • World Foundation Model (2)
  • Core Technical Path of World Foundation Model
  • Three Phased Achievements in R&D of World Foundation Model
  • Cloud Model Factory (1)
  • Cloud Model Factory (2)
  • End-to-End System: Architecture
  • 4.3 Li Auto
  • Evolution Roadmap of End-to-End Intelligent Driving Large Models (1)
  • Evolution Roadmap of End-to-End Intelligent Driving Large Models (2)
  • Launched New-Generation Unified Architecture MindVLA-o1 in 2026 (1)
  • Launched New-Generation Unified Architecture MindVLA-o1 in 2026 (2)
  • Next-Generation Unified Architecture MindVLA-o1 (1)
  • Next-Generation Unified Architecture MindVLA-o1 (2)
  • Next-Generation Unified Architecture MindVLA-o1 (3)
  • Evolution from E2E+VLM Dual System to MindVLA
  • Architecture of MindVLA Model
  • Core Technology 1 of MindVLA: Great 3D Physical Spatial Perception Capability
  • Core Technology 2 of MindVLA: Integration with Large Language Model (LLM)
  • Core Technology 3 of MindVLA: Combination of Diffusion and RLHF
  • Core Technology 4 of MindVLA: World Model and NVAIE Accelerated Reinforcement Learning
  • End-to-End Solution (1): Iterative Evolution of System 1
  • End-to-End Solution (2): System 1 (End-to-End Model) + System 2 (VLM)
  • End-to-End Solution (3): Intelligent Driving Technical Architecture
  • End-to-End Solution (4): DriveVLM Large Model - Architecture
  • End-to-End Solution (5): DriveVLM Large Model - Rendering Effects
  • End-to-End Solution (6): DriveVLM Large Model - BEV and Text Feature Processing
  • 4.4 Tesla
  • Interpretation of 2024 AI Conference
  • Development History of AD Algorithms
  • Summary of End-to-End Progress, 2023-2024
  • FSD v13 (1)
  • FSD v13 (2)
  • FSD v13 (3): Subsequent Updates
  • Development History of AD Algorithms: Entering the Perception-heavy Map-light Era
  • Development History of AD Algorithms: Shadow Mode
  • Development History of AD Algorithms: Background of Occupancy Network Adoption
  • Development History of AD Algorithms: Occupancy Network (1)
  • Development History of AD Algorithms: Occupancy Network (2)
  • Development History of AD Algorithms: Occupancy Network (3)
  • Development History of AD Algorithms: Multi-Camera Fusion Algorithm HydraNet
  • Development History of AD Algorithms: FSD V12
  • Core Elements of Perception-Decision Full-Stack Integrated Model
  • End-to-End Algorithms
  • World Model (1)
  • World Model (2)
  • Data Engine
  • Dojo Supercomputer Center: Overview
  • Dojo Supercomputer Center: Training Tile Based on D1 Chip Integration
  • Dojo Supercomputer Center: Computing Power Development Plan
  • 4.5 NIO
  • Organizational Structure Adjustment of Intelligent Driving Division, 2024-2025
  • From Model-Based to End-to-End, World Model Becomes Dominant Technical Paradigm
  • Evolution Route of End-to-End Large Models
  • Detailed Explanation of Intelligent Driving System
  • NIO World Model (NWM) (1)
  • NIO World Model (NWM) (2)
  • Imagination Reconstruction Capability and Swarm Intelligence of World Model
  • NSim Simulator (NIO Simulation)
  • World Model 2.0
  • Comparison between End-to-End Model and World Model
  • Comparison between VLA and World Model
  • 4.6 Changan
  • Dubhe Plan 2.0 - Tianshu Intelligent Driving
  • Software Architecture of TOPS AD
  • Brand Layout
  • ADAS Strategy: "Dubhe Plan" Strategy
  • End-to-End System: BEV+LLM+GoT (1)
  • End-to-End System: BEV+LLM+GoT (2)
  • Production Vehicle Equipped with End-to-End System: NEVO E07
  • 4.7 Chery
  • Product Matrix and Vehicle Models
  • Evolution History of Intelligent Driving System
  • Launched Four Versions of Falcon Pilot in 2025
  • Progress of End-to-End Intelligent Driving Large Models (1)
  • Progress of End-to-End Intelligent Driving Large Models (2)
  • 4.8 GAC Group
  • Intelligent Driving Large Model Strategy
  • Evolution Roadmap of ADiGO Intelligent Driving System (ADiGO1.0 to ADiGO6.0)
  • Launched Five Major Intelligent Driving Platforms in 2025
  • L2.9 Vehicles and Urban NOA Algorithm/Intelligent Driving System Suppliers
  • Achieves "High-End Orientation + Mass Popularization" of Urban NOA through "Dual-Gradient Intelligent Driving Suppliers + Scenario-Price Precision Matching" Strategy
  • Established Huawang Adopting the "GAC Smart Manufacturing + Huawei Intelligence" Model to Expand High-End Market and Improve Brand Matrix
  • First Model Huawang Aistaland F03 Expected to Be Launched in Q2 2026
  • Momenta 5.0 One-Model End-to-End Algorithm Is Deployed on RMB150,000-Level Vehicles, and Urban NOA Function Is Also Available
  • Trumpchi Xiangwang S7 to Be Equipped with Momenta R6 Reinforcement Large Model
  • Architecture of ADiGO End-to-End Embodied Reasoning Model
  • Core Technologies of ADiGO
  • 4.9 Leapmotor
  • Released World Model in 2026
  • D19 Adopts VLA Large Model to Realize Full-Scenario Door-to-Door NOA
  • Adopts Intelligent Driving System Self-Development Model
  • Evolution Roadmap of Leapmotor Pilot (1)
  • Evolution Roadmap of Leapmotor Pilot (2)
  • End-to-End High-Level Intelligent Driving
  • Application Scenarios of End-to-End High-Level Intelligent Driving
  • 4.10 IM Motors
  • Iteration History of Intelligent Driving System
  • Cooperation with Momenta on Intelligent Driving
  • IM AD End-to-End 2.0 Intelligent Driving Large Models
  • Core Technologies of IM AD End-to-End 2.0 Intelligent Driving Large Models
  • Comparison of Application Scenarios for IM AD End-to-End 2.0 Intelligent Driving Large Models
  • 4.11 FAW Hongqi
  • Technical Architecture of Sinan Intelligent Driving
  • Core Technologies of End-to-End Large Models
  • Sinan Intelligent Driving Solution
  • Vehicle Deployment Schedule and Future Planning of Sinan Intelligent Driving Solution
  • Sinan Intelligent Driving System: Co-Developed with DJI Zhuoyu Technology (1)
  • Sinan Intelligent Driving System: Co-Developed with DJI Zhuoyu Technology (2)
  • Deployed Vehicles and Key Configurations of Sinan Intelligent Driving System
  • Zhuoyu End-to-End 4.0 System Debuted with Sinan Intelligent Driving in 2026
  • FAW Hongqi 9 Series Models to Adopt Huawei Hi Mode in 2026
  • 4.12 Dongfeng
  • Intelligent Driving Strategic Plan 2026-2030
  • Launched Four-Tier Tianyuan Intelligent Driving Product Matrix in 2025: Full Coverage from L2 to L4/L5
  • Comparison of Intelligent Driving Configurations between Production Vehicles First Equipped with Tianyuan T100/T200/T500
  • Tianyuan Intelligent Driving Technical Architecture R-AiD
  • Intelligent Driving Strategy: Self-Development and External Procurement in Parallel in the Short Term, Gradual Replacement with Self-Development in the Long Term
  • 4.13 BYD
  • Overview of 2026 Intelligent Driving Planning
  • Layout in Intelligent Driving Field: Pre-Research on World Models
  • Organizational Structure Adjustment of Intelligent Driving Team (1): Integration of Dual Intelligent Driving Departments to Pool Resources and Accelerate Universal Intelligent Driving
  • Organizational Structure Adjustment of Intelligent Driving Team (2): Establishment of Advanced Technology R&D Center to Increase Investment in