Google Unveils Gemini 2.5 AI Model for Human-Like Web Browsing Tasks
Google has introduced its latest AI model, Gemini 2.5 Computer Use, which is designed to navigate and interact with web browsers much as a human user would. The model can click, scroll, and type within a browser window, which lets it reach information that is not available through APIs.
The model uses visual understanding and reasoning to interpret a user's request and carry it out in the browser, for example by filling out forms. It is particularly useful for user interface testing and for navigating sites that lack direct API connections.
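To make the observe-then-act pattern concrete, here is a minimal Python sketch of the kind of loop such an agent runs while filling out a form. It does not call Google's API: `FakeBrowser`, `scripted_model`, `run_agent`, and the `type_text`/`click` action names are hypothetical stand-ins, and the "screenshot" is simply a dictionary of visible form state rather than an image. A real integration would send an actual screenshot to the model and execute whatever action it returns.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Action:
    name: str          # hypothetical action name, e.g. "type_text" or "click"
    target: str = ""   # element the "model" chose to act on
    text: str = ""     # text to type, if any

@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser: one form plus a submit button."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would capture pixels; here the "screenshot" is the visible state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.name == "type_text":
            self.fields[action.target] = action.text
        elif action.name == "click" and action.target == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action.name}")

def scripted_model(goal: dict) -> Callable[[dict], Optional[Action]]:
    """Hypothetical stand-in for the model: picks the next action from the current screen."""
    def next_action(screen: dict) -> Optional[Action]:
        if screen["submitted"]:
            return None  # task finished, stop the loop
        for field_name, value in goal.items():
            if screen["fields"].get(field_name) != value:
                return Action("type_text", field_name, value)
        return Action("click", "submit")  # every field filled, so submit
    return next_action

def run_agent(browser: FakeBrowser, model, max_steps: int = 10) -> list:
    """Observe -> decide -> act, capped at max_steps so a confused model cannot loop forever."""
    history = []
    for _ in range(max_steps):
        action = model(browser.screenshot())
        if action is None:
            break
        browser.execute(action)
        history.append(action.name)
    return history

browser = FakeBrowser()
steps = run_agent(browser, scripted_model({"name": "Ada", "email": "ada@example.com"}))
print(steps)  # ['type_text', 'type_text', 'click']
```

The step cap is a design choice worth keeping in any real harness, since an agent that misreads the screen can otherwise repeat the same action indefinitely.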
This announcement follows OpenAI’s recent unveiling of new applications for ChatGPT, underscoring how competitive AI development has become. Anthropic also released its own AI model with similar capabilities last year.
Google has shared demonstration videos of the computer use tool, although the videos are played at three times normal speed. The company claims that its model outperforms leading alternatives on several web and mobile benchmarks.
Unlike some competing models, Gemini 2.5 Computer Use is limited to the browser and does not control an entire computer environment. It currently supports 13 actions, including opening a browser and dragging and dropping elements, but it is not yet optimized for desktop-level control.
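Because the model works from a fixed, browser-only set of actions, a harness that executes its suggestions can check each proposed action against an allow-list before touching the browser. The Python sketch below assumes a hypothetical five-action vocabulary with made-up argument names; it is not the model's real list of 13 actions, of which the announcement names only opening a browser and drag and drop.

```python
# Hypothetical, illustrative subset of browser-level actions and their required
# arguments. NOT the model's real 13-action list; names and arguments are assumptions.
BROWSER_ACTIONS = {
    "open_browser": set(),
    "click": {"x", "y"},
    "type_text": {"x", "y", "text"},
    "scroll": {"direction"},
    "drag_and_drop": {"x", "y", "dest_x", "dest_y"},
}

def validate_action(name: str, args: dict) -> None:
    """Raise ValueError for actions outside the browser-only vocabulary or with wrong arguments."""
    if name not in BROWSER_ACTIONS:
        raise ValueError(f"{name!r} is not a supported browser action")
    expected = BROWSER_ACTIONS[name]
    if set(args) != expected:
        raise ValueError(f"{name!r} expects arguments {sorted(expected)}, got {sorted(args)}")

# A browser action with the expected arguments passes silently.
validate_action("drag_and_drop", {"x": 10, "y": 20, "dest_x": 300, "dest_y": 20})

# A desktop-level request is rejected before it reaches the browser.
try:
    validate_action("launch_desktop_app", {"app": "terminal"})
except ValueError as err:
    print(err)  # 'launch_desktop_app' is not a supported browser action
```

Rejecting unknown actions up front keeps a desktop-level request from reaching the browser layer at all; in a real integration, the action names and arguments should come from Google's documentation rather than from this sketch.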
Developers can access Gemini 2.5 Computer Use through Google AI Studio and Vertex AI. A demo is also available on Browserbase, where users can watch the AI complete tasks such as playing a game of 2048 or browsing Hacker News for trending topics.