Product Code: 1441458
Automotive AI Foundation Model Technology and Application Trends Report, 2023-2024
This report investigates and analyzes automotive AI foundation models, providing an overview of algorithms and foundation models, foundation model application trends, company profiles, and more.
Since 2023, a growing number of vehicle models have begun to incorporate foundation models, and an increasing number of Tier 1 suppliers have launched automotive foundation model solutions. In particular, Tesla's major progress with FSD V12 and the launch of SORA have accelerated the implementation of AI foundation models in cockpits and intelligent driving.
End-to-end autonomous driving foundation models are booming.
In February 2024, Tesla FSD v12.2.1, which adopts an end-to-end autonomous driving model, began rolling out to customers in the United States, not just to employees and testers. Feedback from the first customers indicates that FSD V12 is quite powerful, giving ordinary people who previously neither believed in nor used autonomous driving the confidence to try FSD. For example, Tesla FSD V12 can steer around puddles on the road. A Tesla engineer commented that this kind of driving behavior is difficult to implement with explicit code, but Tesla's end-to-end approach achieves it almost effortlessly.
The development of AI foundation models for autonomous driving can be divided into four phases.
Phase 1.0 uses a foundation model (Transformer) at the perception level.
Phase 2.0 is modularization, with foundation models used in perception, planning & control, and decision-making.
Phase 3.0 is end-to-end foundation models (one "end" is raw data from sensors, and the other "end" directly outputs driving actions).
Phase 4.0 is the shift from vertical AI to artificial general intelligence (the world model of AGI).
Most companies are now in Phase 2.0, while Tesla FSD V12 is already in Phase 3.0. Other OEMs and Tier 1s are following up on the end-to-end foundation model approach of FSD V12. On January 30, 2024, Xpeng Motors announced that, as its next step, its end-to-end model will be made fully available on vehicles. NIO and Li Auto are also reported to be launching "end-to-end based" autonomous driving models in 2024.
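To make the contrast between Phase 2.0 and Phase 3.0 concrete, the following is a minimal, illustrative Python sketch, not any vendor's actual implementation; the class and function names are hypothetical. The modular stack hands structured outputs from one stage to the next, while the end-to-end model maps raw sensor data directly to a driving action.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CameraFrame:
    pixels: bytes          # raw image bytes from one camera (placeholder)

@dataclass
class DrivingAction:
    steering: float        # steering angle, left negative / right positive
    acceleration: float    # m/s^2, braking is negative

# --- Phase 2.0: modular stack, foundation models inside each module ---
def perceive(frames: List[CameraFrame]) -> dict:
    """Perception module: detect objects, lanes, free space (stub)."""
    return {"objects": [], "lanes": [], "free_space": None}

def plan(world: dict) -> List[tuple]:
    """Planning module: produce a short trajectory from the perceived world (stub)."""
    return [(0.0, 0.0), (0.0, 5.0)]          # (x, y) waypoints

def control(trajectory: List[tuple]) -> DrivingAction:
    """Control module: turn the trajectory into actuator commands (stub)."""
    return DrivingAction(steering=0.0, acceleration=0.5)

def modular_drive(frames: List[CameraFrame]) -> DrivingAction:
    # Explicit hand-offs between modules, each of which can be inspected.
    return control(plan(perceive(frames)))

# --- Phase 3.0: one end-to-end model, sensors in, action out ---
def end_to_end_drive(frames: List[CameraFrame]) -> DrivingAction:
    """A single learned model replaces the explicit perceive/plan/control chain.
    In practice this would be a neural network forward pass; here it is stubbed."""
    return DrivingAction(steering=0.0, acceleration=0.5)
```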
FSD V12's driving decisions are generated by an AI algorithm: end-to-end neural networks trained on massive amounts of video data replace more than 300,000 lines of C++ code. FSD V12 opens a new path that still needs to be verified; if it proves feasible, it will have a disruptive impact on the industry.
On February 16, 2024, OpenAI introduced its text-to-video model SORA, heralding wide adoption of AI video applications. SORA not only supports generating videos of up to 60 seconds from text or images, but also significantly outperforms previous technologies in video generation, complex scene and character creation, and physical world simulation.
Through vision, both SORA and FSD V12 enable AI to understand and even simulate the real physical world. Elon Musk believes that FSD V12 and Sora are just two fruits of AI's ability to perceive and understand the world through vision: FSD is ultimately applied to driving behavior, while Sora is used to generate video.
The popularity of SORA lends further support to the rationale behind FSD V12. Musk responded by pointing to "Tesla generative video from last year."
AI foundation models are evolving rapidly, bringing new opportunities.
Over the past three years, foundation models for autonomous driving have gone through several evolutions, and leading automakers have had to rewrite their autonomous driving systems almost every year, which also gives late entrants opportunities to enter the market.
At CVPR 2023, UniAD, an end-to-end autonomous driving algorithm jointly proposed by SenseTime, OpenDriveLab and Horizon Robotics, won the CVPR 2023 Best Paper Award.
In early 2024, Waytous' technical team and the Institute of Automation, Chinese Academy of Sciences jointly proposed GenAD, the industry's first generative end-to-end autonomous driving model, which combines generative AI with end-to-end autonomous driving technology. This technology breaks away from UniAD's progressive, procedure-based end-to-end solution and explores a new end-to-end autonomous driving paradigm. The key is to use generative AI to predict the temporal evolution of the ego vehicle and its surroundings from past scenes.
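As a rough illustration of the generative idea described above (not GenAD's actual architecture; all names and the stub transition model are hypothetical), the sketch below autoregressively rolls a learned one-step model forward from past scene states to sample several plausible futures for the ego vehicle and its surroundings.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(state: np.ndarray) -> np.ndarray:
    """Hypothetical learned generative one-step model: given the current scene
    state vector, sample the next one. Here: a stub linear drift plus noise."""
    drift = 0.95 * state + 0.1
    return drift + rng.normal(scale=0.05, size=state.shape)

def generate_futures(past_states, horizon: int, samples: int) -> np.ndarray:
    """Roll the generative model forward to produce several plausible futures."""
    futures = []
    for _ in range(samples):
        state = past_states[-1].copy()
        rollout = []
        for _ in range(horizon):
            state = transition(state)
            rollout.append(state)
        futures.append(rollout)
    return np.array(futures)   # shape: (samples, horizon, state_dim)

# Past scene states (ego + surroundings encoded as a small vector, placeholder).
past = [np.zeros(4), np.ones(4) * 0.1]
print(generate_futures(past, horizon=6, samples=3).shape)   # (3, 6, 4)
```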
In February 2024, Horizon Robotics and Huazhong University of Science and Technology proposed VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms the sensor data into environmental token embeddings, outputs a probabilistic distribution over actions, and samples one action to control the vehicle. Using only camera sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, far better than all existing approaches, and it runs stably in a fully end-to-end manner even without a rule-based wrapper.
On the Town05 Long benchmark, VADv2 achieved a Drive Score of 85.1, a Route Completion of 98.4, and an Infraction Score of 0.87, as shown in Tab. 1. Compared with the previous state-of-the-art method, VADv2 achieves a higher Route Completion while improving the Drive Score significantly, by 9.0. Notably, VADv2 uses only cameras as perception input, whereas DriveMLM uses both cameras and LiDAR. Moreover, compared with the previous best camera-only method, VADv2's advantage is even larger, with a remarkable Drive Score increase of up to 16.8.
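The probabilistic-planning interface described above can be illustrated with a minimal sketch, assuming a fixed discretized action vocabulary and stub encoder and scoring networks; none of this is VADv2's actual code, and all names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical discretized action vocabulary: (steering, acceleration) pairs.
ACTION_VOCAB = [(s, a) for s in (-0.2, 0.0, 0.2) for a in (-1.0, 0.0, 1.0)]

def encode_scene(multi_view_images: list) -> np.ndarray:
    """Stub for the encoder that turns streaming multi-view images into
    environmental token embeddings (here: a random feature vector)."""
    return rng.normal(size=16)

def action_distribution(scene_tokens: np.ndarray) -> np.ndarray:
    """Stub planning head: score every action in the vocabulary and
    normalize the scores into a probability distribution (softmax)."""
    logits = rng.normal(size=len(ACTION_VOCAB)) + scene_tokens.mean()
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def plan_step(multi_view_images: list) -> tuple:
    """One control step: encode the scene, compute the action distribution,
    then sample a single action to execute."""
    tokens = encode_scene(multi_view_images)
    probs = action_distribution(tokens)
    idx = rng.choice(len(ACTION_VOCAB), p=probs)
    return ACTION_VOCAB[idx]

print(plan_step(multi_view_images=[]))   # e.g. (0.0, 1.0)
```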
Also in February 2024, the Institute for Interdisciplinary Information Sciences at Tsinghua University and Li Auto introduced DriveVLM (its overall pipeline is shown in the figure below). A sequence of images is processed by a large vision-language model (VLM), which performs a specific chain-of-thought (CoT) reasoning process to produce driving planning results. The large VLM consists of a visual encoder and a large language model (LLM).
Because VLMs are limited in spatial reasoning and have high computing requirements, the DriveVLM team proposed DriveVLM-Dual, a hybrid system that combines the advantages of DriveVLM and conventional autonomous driving pipelines. DriveVLM-Dual optionally pairs DriveVLM with conventional 3D perception and planning modules, such as a 3D object detector, an occupancy network, and a motion planner, allowing the system to achieve 3D grounding and high-frequency planning. This dual-system design, analogous to the slow and fast thinking processes of the human brain, can effectively adapt to the varying complexity of driving scenarios.
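A minimal sketch of that slow/fast split is given below, assuming a hypothetical low-frequency VLM planner that periodically refreshes a coarse plan and a high-frequency conventional planner that refines it on every control tick; the function names, rates, and control law are illustrative, not DriveVLM-Dual's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class CoarsePlan:
    waypoints: list          # sparse waypoints proposed by the slow VLM planner

def slow_vlm_planner(scene_description: str) -> CoarsePlan:
    """Hypothetical slow path: VLM chain-of-thought reasoning over the scene.
    Runs at low frequency and returns a coarse plan (stubbed here)."""
    return CoarsePlan(waypoints=[(0.0, 0.0), (0.0, 10.0), (2.0, 20.0)])

def fast_conventional_planner(coarse: CoarsePlan, ego_state: tuple) -> tuple:
    """Hypothetical fast path: conventional 3D perception + motion planning that
    refines the coarse plan into an immediate control command at high frequency."""
    target = coarse.waypoints[1]
    steering = 0.1 * (target[0] - ego_state[0])     # crude proportional steering
    acceleration = 0.5
    return steering, acceleration

def drive_loop(ticks: int = 10, slow_every: int = 5) -> None:
    ego_state = (0.0, 0.0)
    coarse = slow_vlm_planner("empty road, clear weather")
    for t in range(ticks):
        if t % slow_every == 0:                      # slow path refreshes occasionally
            coarse = slow_vlm_planner("empty road, clear weather")
        command = fast_conventional_planner(coarse, ego_state)  # fast path every tick
        print(f"tick {t}: steering={command[0]:.2f}, accel={command[1]:.2f}")

drive_loop()
```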
AI and cloud companies attract attention as foundation models emerge.
As AI foundation models emerge, computing power, algorithms and data are all indispensable. AI companies (iFLYTEK, SenseTime, Megvii, etc.) that excel at algorithms and hold large reserves of computing power, and cloud computing companies (Inspur, Volcengine, Tencent Cloud, etc.) with powerful intelligent computing centers, have come under the spotlight of OEMs.
In the field of AI foundation models, SenseTime has deployed the cockpit multimodal foundation model SenseChat-Vision, the Artificial Intelligence Data Center (AIDC, with computing power of 6,000 PFLOPS), and the autonomous driving foundation model DriveMLM. In early 2024, SenseTime launched DriveMLM and achieved good results on CARLA, the most authoritative closed-loop test benchmark. DriveMLM is an interpretable, intermediate solution between modular and end-to-end approaches.
For the collection of autonomous driving corner cases, Volcengine and Haomo.ai are working together to use foundation models to generate scenarios and improve annotation efficiency. The cloud services provided by Volcengine help Haomo.ai improve the overall pre-annotation efficiency of DriveGPT by 10 times.
In 2023, Tencent released upgraded products and solutions in Intelligent Vehicle Cloud, Intelligent Driving Cloud Map, Intelligent Cockpit and other fields. In terms of computing power, Tencent Intelligent Vehicle Cloud enables 3.2 Tbps of bandwidth, 3x higher computing performance, 10x higher communication performance, and an over 60% increase in computing-cluster GPU utilization, providing high-bandwidth, low-latency intelligent computing support for training intelligent driving foundation models. For training acceleration, Tencent Intelligent Vehicle Cloud incorporates the Angel training acceleration framework, with training speed 2x and inference speed 1.3x faster than mainstream industry frameworks. Bosch, NIO, NVIDIA, Mercedes-Benz, and WeRide, among others, currently use Tencent Intelligent Vehicle Cloud. In 2024, Tencent will further strengthen its development of AI foundation models.