Alibaba and Beijing Jiaotong University have released Mobile-Agent-v2, a mobile device operation assistant that achieves effective navigation through multi-agent collaboration. It combines automated operation with visual perception of the screen, allowing AI to simulate taps, swipes, text input, and other operations so that it can control your phone like a real person and carry out various tasks.
Mobile device operation tasks are becoming a popular multimodal artificial intelligence application scenario. Current multimodal large language models (MLLMs) are limited by their training data and cannot act effectively as operation assistants.
In contrast, MLLM-based agents, enhanced through tool calls, are gradually being applied to this scenario.
However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, become very difficult under the single-agent architecture of existing work. The cause is overly long token sequences and the interleaved text-image data format, both of which limit performance.
To effectively solve these navigation challenges, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance.
The architecture includes three agents: a planning agent, a decision-making agent, and a reflection agent.
The planning agent generates the task progress, making the navigation of historical operations more efficient. To retain focus content, we designed a memory unit that updates as the task progresses.
In addition, to correct erroneous operations, the reflection agent observes the results of each operation and handles any errors accordingly.
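The collaboration between the three agents and the memory unit can be illustrated with a small Python sketch. This is only a toy illustration under stated assumptions, not the code from the Mobile-Agent-v2 repository: the names `State`, `run_task`, `plan`, `decide`, `reflect`, and `apply_op` are hypothetical, screens are plain strings instead of screenshots, and the three "agents" are hard-coded functions standing in for MLLM calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class State:
    instruction: str
    progress: str = ""                                # summary written by the planning agent
    memory: List[str] = field(default_factory=list)   # focus content kept by the memory unit
    history: List[str] = field(default_factory=list)  # operations accepted so far
    failed: List[str] = field(default_factory=list)   # operations rejected by reflection


def plan(state: State) -> str:
    # Planning agent (stub): compress the operation history into a short progress text.
    return f"{len(state.history)} step(s) done: " + ", ".join(state.history)


def decide(state: State, screen: str) -> Tuple[str, Optional[str]]:
    # Decision agent (stub): choose the next operation and, optionally, focus content to remember.
    if screen == "home":
        # The first attempt is deliberately wrong so that the reflection step has work to do.
        op = "tap:Notes" if "tap:Mail" in state.failed else "tap:Mail"
        return op, None
    if screen == "notes":
        return "type:hello", "Notes app is open"
    return "stop", None


def apply_op(screen: str, op: str) -> str:
    # Toy device: a lookup table of screen transitions (an assumption, not a real phone).
    transitions = {
        ("home", "tap:Mail"): "mail",
        ("home", "tap:Notes"): "notes",
        ("notes", "type:hello"): "notes:hello",
    }
    return transitions.get((screen, op), screen)


def reflect(state: State, before: str, op: str, after: str) -> str:
    # Reflection agent (stub): accept the operation only if it led to an expected screen.
    return "ok" if after in {"notes", "notes:hello"} else "error"


def run_task(state: State, screen: str, max_steps: int = 10) -> Tuple[State, str]:
    for _ in range(max_steps):
        state.progress = plan(state)
        op, focus = decide(state, screen)
        if op == "stop":
            break
        new_screen = apply_op(screen, op)
        if reflect(state, screen, op, new_screen) == "ok":
            state.history.append(op)
            if focus is not None:
                state.memory.append(focus)
            screen = new_screen
        else:
            # Erroneous operation: record it and stay on the previous screen.
            state.failed.append(op)
    return state, screen


def demo() -> Tuple[State, str]:
    return run_task(State(instruction="Open Notes and type hello"), "home")


if __name__ == "__main__":
    final_state, final_screen = demo()
    print(final_screen, final_state.history, final_state.failed, final_state.memory)
```

In this toy run the first tap opens the wrong app, the reflection step rejects it, and the next decision avoids the failed operation. Keeping the progress summary and the memory as short text fields is meant to mirror the idea of not feeding the whole interleaved history back into every agent call.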
Experimental results show that, compared with the single-agent architecture of Mobile-Agent, Mobile-Agent-v2 improves the task completion rate by more than 30%. The code is open source on GitHub (link below).
If you want to learn more, you can click on the link below the video.
Thank you for watching. If you enjoyed this video, please like and subscribe. Thank you!
Paper: https://arxiv.org/abs/2406.01014
GitHub: https://github.com/X-PLUG/MobileAgent
YouTube: