The Rabbit R1, the AI gadget that went viral a while ago, is in principle built on the GPT-4V vision model, and the same idea can be implemented directly on a phone. This project is an attempt to use a large vision-language model to operate your phone, including the apps installed on it.
On the technical side, it relies on the mobile automated-testing tool Appium to let large language models interact with the phone.
But the project's problem is also obvious: setting up the whole environment is too complex, takes professional mobile-development experience to get running, and also requires a development certificate.
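To make that concrete, here is a minimal sketch (my own, not the project's actual code) of what an Appium-driven loop looks like in TypeScript: capture a screenshot and the UI tree, hand them to the model, and execute the action it returns. It assumes the WebdriverIO client and a local Appium server; `askModelForNextAction` is a hypothetical placeholder for the GPT-4V call.

```typescript
import { remote } from 'webdriverio';

// Hypothetical placeholder: send the screenshot and UI tree to a multimodal
// model and parse its structured reply. Not part of Appium or NavAIGuide.
async function askModelForNextAction(
  screenshotBase64: string,
  pageSource: string,
): Promise<{ type: 'tap'; x: number; y: number }> {
  throw new Error('model integration goes here');
}

async function runOneStep() {
  // Connect to a locally running Appium server. Signing WebDriverAgent for
  // a real iOS device is where the development certificate comes in.
  const driver = await remote({
    hostname: 'localhost',
    port: 4723,
    capabilities: {
      platformName: 'iOS',
      'appium:automationName': 'XCUITest',
      'appium:deviceName': 'iPhone 15',
    },
  });

  try {
    // 1. Capture what the model will "see": pixels plus the XML UI tree.
    const screenshotBase64 = await driver.takeScreenshot();
    const pageSource = await driver.getPageSource();

    // 2. Ask the model for the next action.
    const action = await askModelForNextAction(screenshotBase64, pageSource);

    // 3. Execute it through the XCUITest driver's native tap command.
    if (action.type === 'tap') {
      await driver.execute('mobile: tap', { x: action.x, y: action.y });
    }
  } finally {
    await driver.deleteSession();
  }
}
```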
Still a good try👍🏻
Project address: https://github.com/francedot/NavAIGuide-TS
Detailed description: https://medium.com/@francedot/ios-ui-focused-agents-in-the-era-of-multi-modal-generative-ai-1f2097fa8ba6
Imagine if language models could enter the iPhone's application ecosystem. If we simply let a model orchestrate our existing (and battle-tested over the years) user interfaces, would the need for plugins and assistants become obsolete?
This shows how good GPT-4V is as a general-purpose mobile AI agent: without any fine-tuning or grounding, just through integration with a JSON-schema-enabled text model.
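As a rough illustration of that pairing (again my sketch, not the project's code): the vision model describes the screen in free text, and a JSON-mode text model converts the description into a machine-executable action. The model names and the action schema below are assumptions.

```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Two-stage pattern: GPT-4V reasons over the screenshot in plain text, then
// a JSON-mode text model constrains the reply to a fixed action schema.
async function describeThenStructure(screenshotBase64: string) {
  // Stage 1: free-text description from the vision model.
  const vision = await openai.chat.completions.create({
    model: 'gpt-4-vision-preview', // illustrative model choice
    max_tokens: 300,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Describe the next UI action to open the Settings app.' },
          {
            type: 'image_url',
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  });

  // Stage 2: a JSON-schema-enabled text model emits the structured action.
  const structured = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview', // illustrative model choice
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: 'Reply with JSON only, matching {"type":"tap","x":number,"y":number}.',
      },
      { role: 'user', content: vision.choices[0].message.content ?? '' },
    ],
  });

  return JSON.parse(structured.choices[0].message.content ?? '{}');
}
```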
It's worth watching the demo to get a sense of the (arguably) wow factor and the results of running NavAIGuide on iOS 17.
NavAIGuide is an LLM-based mobile and web navigation agent framework: https://github.com/francedot/NavAIGuide-TS