Waymo, a subsidiary of Alphabet, has introduced a new approach to training its self-driving vehicles using Gemini, Google's multimodal large language model (MLLM). The model is intended to improve the navigation of autonomous cars and help them cope better with complex road situations.
In a new research paper, Waymo describes its development as an “end-to-end multimodal model for autonomous driving” (EMMA), which processes sensor data and helps robotaxis decide where to drive while avoiding obstacles. According to The Verge, Waymo has long emphasized the strategic advantage it gains from access to artificial intelligence (AI) research at Google DeepMind, formerly the British company DeepMind Technologies.
The new EMMA system represents a fundamentally different approach to training autonomous vehicles. Instead of traditional modular systems that separate functions into perception, route planning and other tasks, EMMA offers a unified approach that is meant to process data holistically, avoid the errors that arise when data is passed between modules, and adapt better to new, unfamiliar road conditions in real time.
One of the key benefits of using MLLMs, Gemini in particular, is their ability to generalize knowledge gleaned from vast amounts of Internet data. This allows the models to adapt better to unusual situations on the road, such as the unexpected appearance of animals or road repair work. Additionally, models built on Gemini are capable of “chain-of-thought” reasoning, a technique that breaks complex problems down into sequential, logical steps and so improves decision making.
Despite its successes, Waymo acknowledges that EMMA has limitations. For example, the model does not yet support processing 3D data from sensors such as lidar or radar because of the high computational cost. Additionally, EMMA can only process a limited number of image frames at a time. The company emphasizes that further research will be required to overcome these limitations before the model can be fully deployed in real-world conditions.
Waymo also recognizes the risks of using MLLMs to drive autonomous vehicles. Models like Gemini can make mistakes or “hallucinate” even in simple tasks, which is unacceptable on the road. The hope is that further research and improvements in the architecture of AI models for autonomous driving will overcome these problems.
Source: 3dnews.ru