Market Research Report
Product Code: 1777128

VLA Large Model Applications in Automotive and Robotics Research Report, 2025

Publication date: | Publisher: ResearchInChina | English, 300 pages | Delivery time: within 1-2 business days

Product Code: ZXF013

ResearchInChina releases "VLA Large Model Applications in Automotive and Robotics Research Report, 2025"

The report summarizes and analyzes the technical origin, development stages, application cases and core characteristics of VLA large models.

It sorts out 8 typical VLA implementation solutions, as well as typical VLA large models in the fields of intelligent driving and robotics, and summarizes 4 major trends in VLA development.

It analyzes the intelligent driving VLA application solutions of companies such as Li Auto, XPeng Motors, Chery Automobile, Geely Automobile, Xiaomi Auto, DeepRoute.ai, Baidu, Horizon Robotics, SenseTime, NVIDIA, and iMotion.

It sorts out more than 40 large model frameworks or solutions such as robot general basic models, multimodal large models, data generalization models, VLM models, VLN models, VLA models and robot world models.

It analyzes the large models and VLA large model application solutions of companies such as AgiBot, Galbot, Robot Era, Estun, Unitree, UBTECH, Tesla Optimus, Figure AI, Apptronik, Agility Robotics, XPeng IRON, Xiaomi CyberOne, GAC GoMate, Chery Mornine, Leju Robotics, LimX Dynamics, AI2 Robotics, and X Square Robot.

  • Vision-Language-Action (VLA) model is an end-to-end artificial intelligence model that integrates three modalities: Vision, Language, and Action. Through a unified multimodal learning framework, it combines perception, reasoning, and control, directly generating executable physical-world actions (such as robot joint movements or vehicle steering commands) from visual inputs (such as images and videos) and language instructions (such as task descriptions). A minimal interface sketch follows.
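
To make that definition concrete, below is a minimal, runnable PyTorch sketch of the VLA input/output contract. Every module here is a hypothetical placeholder standing in for a production-scale component (a ViT-class vision encoder, an LLM backbone, a learned action decoder); it is not any vendor's actual implementation.

```python
# Minimal, illustrative VLA interface sketch (PyTorch).
# All modules are hypothetical placeholders, not a real vendor architecture.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, action_dim=7):
        super().__init__()
        # Vision: patchify an image into tokens (stand-in for a ViT encoder).
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # Language: embed instruction token ids (stand-in for an LLM backbone).
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Fusion: a shared Transformer over the concatenated token sequence.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: regress a continuous control vector
        # (e.g., joint deltas or steering/throttle commands).
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, image, instruction_ids):
        v = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, Nv, D)
        t = self.text_embed(instruction_ids)                    # (B, Nt, D)
        h = self.fusion(torch.cat([v, t], dim=1))               # joint tokens
        return self.action_head(h.mean(dim=1))                  # (B, action_dim)

model = TinyVLA()
action = model(torch.randn(1, 3, 224, 224), torch.randint(0, 32000, (1, 12)))
print(action.shape)  # torch.Size([1, 7])
```

Production models swap each placeholder for a pretrained, billion-parameter component, but the data flow stays the same: vision tokens and language tokens are fused, and an action vector comes out.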

In July 2023, Google DeepMind launched RT-2, a model built on the VLA architecture. By combining large language models with multimodal data training, it endows robots with the ability to perform complex tasks. Its task accuracy nearly doubled compared with the first-generation model (from 32% to 62%), and it achieved breakthrough zero-shot learning in scenarios such as trash sorting.

The concept of VLA was quickly noticed by automobile companies and rapidly applied to the field of automotive intelligent driving. If "end-to-end" was the hottest term in the intelligent driving field in 2024, then "VLA" will be the one in 2025. Companies such as XPeng Motors, Li Auto, and DeepRoute.ai have released their respective VLA solutions.

When XPeng Motors released the G7 model in July, it took the lead in announcing the mass production of VLA in vehicles. Li Auto plans to equip the i8 model with VLA, which is expected to be revealed at the press conference on July 29. Enterprises such as Geely Automobile, DeepRoute.ai and iMotion are also developing VLA.

Li Auto and XPeng Motors have given different answers to whether a VLA model applied in vehicles should be distilled first or undergo reinforcement learning first.

At the pre-sale conference for XPeng Motors' G7, He Xiaopeng used the brain and cerebellum as metaphors to explain the functions of traditional end-to-end and VLA. He said that a traditional end-to-end solution plays the role of the cerebellum, "making the car able to drive", while VLA introduces a large language model and plays the role of the brain, "making the car drive well".

XPeng Motors and Li Auto have taken slightly different routes in VLA application: Li Auto first distills the cloud-based base large model, and then performs reinforcement learning on the distilled end-side model; XPeng Motors first performs reinforcement learning on the cloud-based base large model, and then distills it to the vehicle end.
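
The two routes differ only in the order in which the same two stages are applied. The schematic below illustrates that ordering; distill and reinforcement_learn are hypothetical stubs standing in for full training stages, not either company's actual pipeline.

```python
# Schematic contrast of the two training routes. The helpers are
# hypothetical stubs; the real pipelines are far more involved.

def distill(teacher: str) -> str:
    return f"distilled({teacher})"

def reinforcement_learn(model: str, data: str) -> str:
    return f"rl({model}, {data})"

def li_auto_route(cloud_base: str, driving_data: str) -> str:
    # Distill the cloud base model first, then run RL on the
    # distilled vehicle-end model.
    return reinforcement_learn(distill(cloud_base), driving_data)

def xpeng_route(cloud_base: str, driving_data: str) -> str:
    # Run RL on the cloud base model first, then distill the
    # reinforced model down to the vehicle end.
    return distill(reinforcement_learn(cloud_base, driving_data))

print(li_auto_route("cloud-base", "driving-logs"))
print(xpeng_route("cloud-base", "driving-logs"))
```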

In May 2025, Li Xiang mentioned in AI Talk that Li Auto's cloud-based base model has 32 billion parameters; a 3.2-billion-parameter model is distilled from it to the vehicle end and then post-trained with reinforcement learning on driving scenario data; in the fourth stage, the final driver Agent will be deployed on both the vehicle end and the cloud.
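
The distillation step here is knowledge distillation in the standard sense: a small student is trained to match a large teacher's output distribution. A generic distillation step might look like the sketch below (an illustrative assumption, not Li Auto's actual training code).

```python
# Generic knowledge-distillation step: a small "vehicle-end" student learns
# to match a large "cloud" teacher. Illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then minimize the KL divergence
    # between the teacher's distribution and the student's.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

teacher_logits = torch.randn(4, 128)                      # from the large teacher
student_logits = torch.randn(4, 128, requires_grad=True)  # from the small student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                                           # gradients flow to the student
print(f"distillation loss: {loss.item():.4f}")
```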

XPeng Motors has also divided its factory for training and deploying VLA models into four workshops: the first workshop is responsible for pre-training and post-training of the base model; the second for model distillation; the third continues pre-training the distilled model; and the fourth deploys XVLA to the vehicle end. Dr. Liu Xianming, head of XPeng's World Base Model, said that XPeng Motors has trained "XPeng World Base Models" at multiple parameter scales in the cloud, including 1 billion, 3 billion, 7 billion, and 72 billion parameters.

Which solution is more suitable for the intelligent driving environment remains to be seen based on the specific performance of different manufacturers' VLA solutions after being applied in vehicles.

Recently, research teams from McGill University, Tsinghua University, Xiaomi Corporation, and the University of Wisconsin-Madison jointly released a comprehensive review article on VLA models in the field of autonomous driving, "A Survey on Vision-Language-Action Models for Autonomous Driving". The article divides the development of VLA into four stages: Pre-VLA (VLM as explainer), Modular VLA, End-to-end VLA and Augmented VLA, clearly showing the characteristics of VLA in different stages and the gradual development process of VLA.

There are over 100 robot VLA models, with teams constantly exploring different paths

Compared with automotive VLA large models, which run tens of billions of parameters on nearly 1,000 TOPS of computing power, AI computing chips in the robotics field are still optional, and robot training datasets mostly contain between 1 million and 3 million samples. There is also debate over technical routes, such as whether to mix real data with simulated synthetic data. One reason is that hundreds of millions of cars are on the road, while the number of actually deployed robots is very small. Another important reason is that robot VLA models focus on exploring the microcosmic world: compared with the grand automotive world model, robot application scenarios involve richer multimodal perception, more complex actions to execute, and more fine-grained sensor data.

There are more than 100 VLA models and related data sets in the robotics field, and new papers are constantly emerging, with various teams exploring in different paths.

Exploration 1: VTLA framework integrating tactile perception

In May 2025, research teams from the Institute of Automation of the Chinese Academy of Sciences, Samsung Beijing Research Institute, Beijing Academy of Artificial Intelligence (BAAI), and the University of Wisconsin-Madison jointly released a paper on VTLA for insertion manipulation tasks. The research shows that integrating visual and tactile perception is crucial when robots perform contact-intensive manipulation tasks with high precision requirements. By fusing visual, tactile, and language inputs, combined with a temporal enhancement module and a preference learning strategy, VTLA outperforms traditional imitation learning methods and single-modality models on contact-intensive insertion tasks.
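
As a rough illustration of the general idea (the concrete architecture below is an assumption for exposition, not the paper's actual design), tactile readings can be encoded as an extra token stream, passed through a temporal module so the model retains contact history, and fused with vision and language tokens before action decoding.

```python
# Sketch of vision + tactile + language fusion with a simple temporal
# module, in the spirit of VTLA. Illustrative assumption, not the paper's design.
import torch
import torch.nn as nn

class VTLASketch(nn.Module):
    def __init__(self, d=128, vocab=1000, tactile_dim=32, action_dim=6):
        super().__init__()
        self.vision = nn.Conv2d(3, d, kernel_size=16, stride=16)
        self.tactile = nn.Linear(tactile_dim, d)  # per-timestep taxel readings
        self.text = nn.Embedding(vocab, d)
        # Temporal module: a GRU over the tactile sequence keeps contact history.
        self.temporal = nn.GRU(d, d, batch_first=True)
        enc = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(enc, num_layers=2)
        self.action = nn.Linear(d, action_dim)

    def forward(self, image, tactile_seq, instr_ids):
        v = self.vision(image).flatten(2).transpose(1, 2)  # vision tokens
        tac, _ = self.temporal(self.tactile(tactile_seq))  # tactile tokens w/ history
        txt = self.text(instr_ids)                         # language tokens
        h = self.fusion(torch.cat([v, tac, txt], dim=1))   # joint sequence
        return self.action(h.mean(dim=1))                  # control vector

m = VTLASketch()
out = m(torch.randn(2, 3, 64, 64),        # camera image
        torch.randn(2, 10, 32),           # 10 timesteps of tactile data
        torch.randint(0, 1000, (2, 8)))   # tokenized instruction
print(out.shape)  # torch.Size([2, 6])
```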

Exploration 2: VLA model supporting multi-robot collaborative operation

In February 2025, Figure AI released Helix, a general Embodied AI model. Helix can run collaboratively on humanoid robots, enabling two robots to cooperate on a shared, long-horizon manipulation task. In the video shown at the launch, Figure AI's robots demonstrated a smooth collaborative mode while putting away fruit: the robot on the left pulled the fruit bowl over, the robot on the right placed the fruit in it, and the robot on the left then returned the bowl to its original position.

Figure AI emphasized that this only scratches "the surface of possibilities", and the company is eager to see what happens when Helix is scaled up 1,000 times. Figure AI added that Helix runs entirely on embedded low-power GPUs and is ready for immediate commercial deployment.

Exploration 3: Offline on-device VLA models in the robotics field

In June 2025, Google released Gemini Robotics On-Device, a multimodal VLA large model that runs locally and offline on embodied robots. The model can simultaneously process visual input, natural language instructions, and action output, and it maintains stable operation even without a network connection.

Particularly worth noting is the model's adaptability and versatility. Google pointed out that Gemini Robotics On-Device is the first robot VLA model to open fine-tuning to developers, enabling them to train the model for their specific needs and application scenarios.
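
Google's actual fine-tuning SDK and workflow are not reproduced here; the generic PyTorch sketch below only illustrates what fine-tuning a pretrained VLA on one's own demonstrations typically means in principle: freeze the pretrained trunk and train a small task-specific head on (observation, action) pairs.

```python
# Generic illustration of developer fine-tuning of a pretrained VLA.
# NOT Google's SDK or workflow; all components below are hypothetical.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # stand-in for a
for p in backbone.parameters():                           # pretrained VLA trunk
    p.requires_grad = False                               # keep it frozen

head = nn.Linear(256, 7)  # new task-specific action head
opt = torch.optim.AdamW(head.parameters(), lr=1e-4)

# Hypothetical demonstration batch: pre-extracted observation features
# paired with the demonstrated target actions.
obs_feats = torch.randn(16, 512)
target_actions = torch.randn(16, 7)

for step in range(100):  # simple behavior-cloning loop
    pred = head(backbone(obs_feats))
    loss = nn.functional.mse_loss(pred, target_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final imitation loss: {loss.item():.4f}")
```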

VLA robots have been applied in a large number of automobile factories

When the macro world model of automobiles is integrated with the micro world model of robots, the real era of Embodied AI will come.

When Embodied AI enters the VLA stage of development, automobile enterprises have a natural first-mover advantage. The Tesla Optimus, XPeng Iron, and Xiaomi CyberOne robots draw heavily on their makers' accumulated expertise in intelligent driving, sensor technology, machine vision, and related fields. The XPeng Iron robot, for example, is equipped with XPeng Motors' AI Hawkeye vision system, end-to-end large model, Tianji AIOS, and Turing AI chip.

At the same time, automobile factories are currently the main application scenario for robots. Tesla Optimus robots are mainly used in Tesla's battery workshops. Apptronik cooperates with Mercedes-Benz: its Apollo robots have entered Mercedes-Benz factories to participate in car manufacturing, with tasks including handling, assembly, and other physical work. At the model level, Apptronik has established a strategic cooperation with Google DeepMind, and Apollo has integrated Google's Gemini Robotics VLA large model.

On July 18, UBTECH released the hot-swappable autonomous battery replacement system for the humanoid robot Walker S2, which enables Walker S2 to achieve 3-minute autonomous battery replacement without manual intervention.

According to public reports, many car companies, including Tesla, BMW, Mercedes-Benz, BYD, Geely Zeekr, Dongfeng Liuzhou Motor, Audi FAW, FAW Hongqi, SAIC-GM, NIO, XPeng, Xiaomi, and BAIC Off-Road Vehicle, have deployed humanoid robots in their automobile factories. Humanoid robots from Figure AI, Apptronik, UBTECH, AI2 Robotics, Leju and others are widely used in links such as automobile and parts production and assembly, logistics and transportation, equipment inspection, and factory operation and maintenance. In the near future, AI robots will be the main "labor force" in "unmanned factories".

Table of Contents

Related Definitions

Chapter 1 Overview of VLA Large Models

  • 1.1 Basic Definition of VLA (Vision-Language-Action Model)
  • 1.2 Origin and Evolution of VLA Technology
  • 1.3 Classification of VLA Large Model Methods
  • 1.4 Four Stages of VLA Model Development in Autonomous Driving
  • 1.5 VLA Solution Application (1)
  • 1.5 VLA Solution Application (2)
  • 1.5 VLA Solution Application (3)
  • 1.5 VLA Solution Application (4)
  • 1.6 Case 1: Enhancement of VLA Generalization
  • 1.6 Case 2: VLA Computational Overhead
  • 1.7 Core Characteristics of VLA
  • 1.8 Challenges in VLA Technology Development

Chapter 2 VLA Technical Architecture, Solutions and Trends

  • 2.1 Analysis of VLA Core Technical Architecture (1)
  • 2.1 Analysis of VLA Core Technical Architecture (2)
  • 2.1 Analysis of VLA Core Technical Architecture (3)
  • 2.1 Analysis of VLA Core Technical Architecture (4)
  • 2.1 Analysis of VLA Core Technical Architecture (5)
  • 2.1 Analysis of VLA Core Technical Architecture (6)
  • 2.1 Analysis of VLA Core Technical Architecture (7)
  • 2.2 VLA Decision Core - Chain-of-Thought (CoT) Technology
  • 2.3 Overview of VLA Large Model Implementation Solutions
  • 2.4 VLA Implementation Solution (1): Solution Based on Classic Transformer Structure
  • 2.4 VLA Implementation Solution (2): Solution Based on Pre-trained LLM/VLM
  • 2.4 VLA Implementation Solution (3): Solution Based on Diffusion Model
  • 2.4 VLA Implementation Solution (4): LLM + Diffusion Model Solution
  • 2.4 VLA Implementation Solution (5): Video Generation + Inverse Kinematics Solution
  • 2.4 VLA Implementation Solution (6): Explicit End-to-End VLA Solution
  • 2.4 VLA Implementation Solution (7): Implicit End-to-End VLA Solution
  • 2.4 VLA Implementation Solution (8): Hierarchical End-to-End VLA Solution
  • 2.5 Summary of Intelligent Driving VLA Models
  • 2.6 Summary of Embodied AI VLA Models
  • 2.7 Case 1
  • 2.7 Case 2
  • 2.7 Case 3
  • 2.7 Case 4
  • 2.8 VLA Development Trend (1)
  • 2.8 VLA Development Trend (2)
  • 2.8 VLA Development Trend (3)
  • 2.8 VLA Development Trend (4)

Chapter 3 VLA Large Model Application in the Automotive Field

  • 3.1 Li Auto
  • AI-based Autonomous Driving Development Plan
  • AI Application of Data Closed Loop: Cloud Training of Data
  • Overall Technical Architecture of End-to-End Solution
  • Technical Architecture of End-to-End Solution: System 1 - E2E (End-to-End)
  • Technical Architecture of End-to-End Solution: System 2 - VLM (Vision-Language Model)
  • Technical Architecture of End-to-End Solution: Cloud "World Model"
  • Self-developed MindVLA Based on End-to-End + VLM Dual-System Architecture
  • MindVLA Technical Architecture: Multimodal Perception Layer
  • MindVLA Technical Architecture: Semantic Understanding Layer
  • MindVLA Technical Architecture: Decision and Execution Layer
  • MindVLA: Cloud "World Model"
  • MindVLA: Four Stages of Training and Reasoning Process of VLA Driver Large Model
  • NVIDIA's End-to-End Technology Supports the Implementation of MindVLA
  • Application Scenarios and Functions of MindVLA
  • 3.2 XPeng Motors
  • XPeng G7 Ultra Released, VLA Large Model Applied in Vehicles
  • VLA Large Model: Target to Achieve 10 Times End-to-End Intelligent Driving Capability
  • VLA OL Large Model: Brain + Cerebellum
  • Cloud Model Factory
  • World Base Model (1)
  • World Base Model (2)
  • World Base Model (3)
  • World Base Model (4)
  • World Base Model (5)
  • World Base Model (6)
  • 3.3 Chery Automobile
  • AI Strategy (1)
  • AI Strategy (2)
  • ZDrive.AI Intelligent Driving Technology Evolution Route and Product Plan
  • ZDrive.AI to Realize L3/4 Product Application with VLA Large Model in 2027
  • VLA Large Model Based on One Model End-to-End
  • Embodied AI Platform - VLA Model
  • New Generation Intelligent Driving System Falcon 900, Built with VLA + World Model
  • Falcon Intelligent Driving Large Model Architecture
  • 3.4 Geely
  • AI Strategy
  • High-Level Intelligent Driving System
  • Application of Qianli Haohan H9 Solution: VLA Vehicle-End AI Large Model
  • Integration of VLA Model, World Model and AI Drive Large Model to Build a Pan-World Model System
  • 3.5 Xiaomi Auto
  • Orion Solution Framework
  • Orion's QT-Former
  • Physical World Modeling Framework
  • Dual-Track Layout of Physical Modeling and VLA
  • 3.6 DeepRoute.ai
  • High-Level Intelligent Driving Platform: DeepRoute IO
  • End-to-End Model Intelligent Driving Platform: DeepRoute IO 1.0
  • VLA Model Intelligent Driving Platform: DeepRoute IO 2.0
  • Comparison of VLM & VLA Intelligent Driving Solutions
  • VLA Model Architecture
  • Advantages and Challenges of VLA Model
  • VLA Model Cooperation Dynamics
  • 3.7 Baidu Apollo
  • Open Source End-to-End Autonomous Driving System AIR ApolloFM
  • Core Modules of AIR ApolloFM (1)
  • Core Modules of AIR ApolloFM (2)
  • Reference Engineering Design of AIR ApolloFM
  • Real Vehicle Operation Results of AIR ApolloFM
  • 3.8 Horizon Robotics
  • End-to-End VLA Intelligent Driving System (1)
  • End-to-End VLA Intelligent Driving System (2)
  • Journey 6P Supports VLM/VLA and Other Technologies
  • Prediction of Fully Autonomous Driving by 2035
  • 3.9 SenseTime
  • Launch of End-to-End VLA Modeling Framework SOLAMI
  • Overall Framework of SOLAMI
  • Training Process of SOLAMI
  • Multimodal Interaction Data Flow and Examples of SOLAMI
  • SOLAMI VR Interaction System Architecture
  • 3.10 NVIDIA
  • Robot General VLA Large Model GR00T-N1 (1)
  • Robot General VLA Large Model GR00T-N1 (2)
  • Robot General VLA Large Model GR00T-N1 (3)
  • CoT-VLA Model Achieves Precise Control of Complex Tasks with "Visual Chain of Thought"
  • 3.11 iMotion
  • VLA Intelligent Driving Solution

Chapter 4 Progress of Large Models in the Robotics Field

  • 4.1 General Basic Models for Robots
  • Architecture of Robot Basic Large Models
  • General Basic Large Models
  • Robot General Large Model (1): Pi Zero
  • Robot General Large Model (2): Large Language Model Based on LLaMA
  • Robot General Large Model (3): Large Model Based on Vision Transformer
  • Key Technologies of Robot Driven by Large Models
  • Robot Perception Module
  • Robot Planning Module
  • Robot Decision Module
  • Robot Action Module
  • Robot Motion Control Module
  • Robot Feedback Module
  • Beijing Academy of Artificial Intelligence (BAAI) Open Source RoboBrain 2.0
  • MindLoongGPT
  • 4.2 Robot Multimodal Large Models
  • Robot Multimodal Large Models
  • Visual Generation Large Models
  • SenseTime SenseNova V6 Large Model
  • Manycore Tech SpatialLM
  • 4.3 Robot Data Generalization Models
  • Data-Driven Robot Imitation Learning
  • Real2Sim in RSR: A Pure-Vision, Low-Cost, Zero-Manual-Annotation Ground Truth Production Process
  • UnrealZoo: Enriching Realistic Virtual Worlds for Embodied Intelligence Based on Unreal Engine
  • RoboTwin: Dual-Arm Robot Benchmark for Generative Digital Twins
  • RoboGSim: Data Synthesizer and Closed-Loop Simulator for Real2Sim2Real Paradigm
  • Any-point Trajectory Model (ATM) Framework
  • Peking University and Renmin University Teams Release Million-Scale Dataset to Build Humanoid Robot General Large Model
  • MotionLib Large-Scale Action Generation: From Language to Action
  • 4.4 Robot Large Model Datasets
  • AgiBot World
  • Unitree G1 Dataset
  • Shanghai Jiao Tong University RH20T
  • Beijing Innovation Center of Humanoid Robotics RoboMIND
  • 4.5 Robot VLM Models
  • Vision-Language Model VLM
  • General Robot Model Pi Zero
  • PaLM-E: Embodied Multimodal Language Model
  • Figure AI Cooperates with OpenAI to Launch Three-Level Hierarchical Decision-Making Scheme
  • Noematrix Brain
  • Galbot Three-Level Large Model System
  • 4.6 Robot VLN Models
  • Basic Concept of VLN
  • Main Implementation Methods of VLN
  • Comparison of VLA and VLN Models
  • LH-VLN: Vision-Language Navigation with a Long-Term Perspective: Platform, Benchmark and Methods
  • Safe-VLN: Anti-Collision for Autonomous Robot Vision and Language Navigation in Continuous Environments
  • MC-GPT: Enhancing Vision and Language Navigation Through Memory Graphs and Reasoning Chains
  • 4.7 Robot VLA Models
  • Composition of Typical Robot VLA Models
  • NaVILA: Vision-Language-Action Model for Legged Robot Navigation
  • OpenVLA: Open Source Vision-Language-Action Model
  • OpenVLA: End-to-End Training - Vision-Language Model VLM
  • Vision-Language-Action (VLA) Model - Robotic Transformer2 (RT-2)
  • Uni-NaVid Proposes Unified Video-Language-Action (VLA) Model for Multiple Embodied Navigation Tasks
  • QUAR-VLA: Vision-Language-Action (VLA) Model for Quadruped Robots
  • RoboMamba: End-to-End VLA Model with 3 Times Faster Reasoning Speed, Only Needing to Adjust 0.1% of Parameters
  • LeVERB: VLA Framework for Zero-Shot Deployment Trained on Simulated Data
  • Google Gemini Robotics On-Device: Launching Locally Deployed Robot VLA Models
  • 4.8 Robot World Models
  • Basic Architecture of World Models
  • Key Definitions and Application Development of World Models
  • AgiBot Jointly with Shanghai AI Lab Proposes Embodied 4D World Model EnerVerse
  • 3D-VLA: A Three-Dimensional Vision-Language-Action Generation World Model
  • RoboDreamer: Compositional World Model for Learning Robot Imagination
  • IRASim - World Model in Robots
  • Robotic World Model: Neural Network Simulator for Robot Robust Strategy Optimization
  • DAMO Academy Releases "World VLA" Large Model WorldVLA

Chapter 5 VLA Application Cases in the Robotics Field

  • 5.1 AgiBot
  • Genie Operator-1 (GO-1) Large Model
  • Vision-Language-Latent-Action (ViLLA) Architecture
  • 5.2 Galbot
  • Galbot G1
  • Open VLA Technical Architecture
  • Simulation Data Pre-trained Model GraspVLA
  • 5.3 Robot Era
  • General Humanoid Robot STAR1
  • ERA-42
  • Open Source AIGC Robot Large Model VPP
  • 5.4 Estun
  • CODROID 02
  • Fast and Slow Systems CODROID 02
  • CoDroid EIP
  • 5.5 Unitree
  • Product Matrix
  • Commercial Layout
  • UnifoLM
  • 5.6 UBTECH
  • Release of Robot Multimodal Reasoning Large Model Based on DeepSeek-R1
  • Robot BrainNet Software Architecture
  • Robot Super Brain + Intelligent Cerebellum Model
  • 5.7 Tesla Optimus
  • Development History of Optimus and Progress of Robot Large Models
  • Technologies of Optimus Robot Learned from the Automotive Field
  • Planning and Adjustment of Optimus
  • Optimus to Integrate xAI's Grok Model
  • Grok 4 Heavy Shows Strong Reasoning and Understanding Abilities
  • 5.8 Figure AI
  • Figure 01 Cooperates with OpenAI
  • Figure 02's Self-Developed VLA Model - Helix
  • Dual-System Mode of VLA Model Helix
  • Collaborative Operation Mode of VLA Model Helix
  • 5.9 Apptronik
  • Apollo
  • Apollo Robot (1): Google Gemini Robotics Large Model
  • Apollo Robot (2): Open Docking with External AI Systems
  • Apollo Robot (3): Assisting in AI Car Manufacturing
  • 5.10 Agility Robotics
  • Digit
  • Digit Robot Test Access to Open Source Large Language Models and AI Models
  • 5.11 XPeng IRON
  • Robot Development History
  • Robot Large Model
  • 5.12 Xiaomi CyberOne
  • Main Characteristics of CyberOne Robot
  • MiAI Engine of CyberOne Robot
  • 5.13 GAC GoMate
  • Main Characteristics of GoMate Robot
  • Algorithm of GoMate Robot
  • 5.14 Mornine
  • Main Characteristics of Mornine Robot
  • Dual-Core Intelligent Brain of Mornine Robot
  • 5.15 Leju Robotics
  • Development History
  • KUAVO Equipped with 5G-A Technology
  • Pangu Large Model of KUAVO
  • 5.16 LimX Dynamics
  • TRON 1
  • LimX: Embodied Operation Algorithm Based on Video Generation Large Model
  • Cooperates with Quectel: Robrain AI Robot
  • 5.17 AI2 Robotics
  • Alpha Bot 2
  • VLA Large Model
  • 5.18 X Square Robot
  • Core Team
  • Focusing on Embodied Large Models, Supported by Meituan's Strategy