Which of your browser workflows would you delegate today if an agent could plan and execute predefined UI actions? Google AI has introduced Gemini 2.5 Computer Use, a specialized variant of Gemini 2.5 that plans and executes real UI actions in a live browser through a constrained action API. It is available in public preview through Google AI Studio and Vertex AI. The model targets web automation and UI testing. It comes with documented, human-judged gains on standard web and mobile control benchmarks, plus a safety layer that can require human confirmation for risky steps.
What does the model actually ship?
Developers call a new computer_use tool that returns function calls such as click_at, type_text_at, or drag_and_drop. Client code executes the action (e.g., with Playwright or Browserbase), captures a fresh screenshot and URL, and loops until the task ends or a safety rule blocks it. The supported action space is 13 predefined UI actions (open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, drag_and_drop). It can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.
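A minimal sketch of that client-side loop, assuming a stubbed model. Only the 13 action names come from the article. FunctionCall, Observation, FakeExecutor and agent_loop are hypothetical names for illustration, not the Gemini API or SDK surface. A real executor would drive Playwright or Browserbase and send the screenshot and URL back to the model on each turn.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# The 13 predefined UI actions listed in the article.
PREDEFINED_ACTIONS = frozenset({
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward",
    "search", "navigate", "click_at", "hover_at", "type_text_at",
    "key_combination", "scroll_document", "scroll_at", "drag_and_drop",
})


@dataclass
class FunctionCall:
    """Hypothetical stand-in for one function call proposed by the model."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class Observation:
    """What the client reports back after each step: screenshot and URL."""
    screenshot: bytes
    url: str


class FakeExecutor:
    """Hypothetical executor; a real one would drive Playwright or Browserbase."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log: list[str] = []

    def run(self, call: FunctionCall) -> Observation:
        self.log.append(call.name)
        if call.name == "navigate":
            self.url = call.args["url"]
        # A real executor would capture a fresh screenshot here.
        return Observation(screenshot=b"", url=self.url)


def agent_loop(
    model: Callable[[Observation], list[FunctionCall]],
    executor: FakeExecutor,
    custom: frozenset[str] = frozenset(),
    max_steps: int = 20,
) -> Observation:
    """Ask the model for actions, execute them, and feed back the new state."""
    allowed = PREDEFINED_ACTIONS | custom  # predefined actions plus extensions
    obs = Observation(screenshot=b"", url=executor.url)
    for _ in range(max_steps):
        calls = model(obs)
        if not calls:  # no more actions proposed: the task has ended
            break
        for call in calls:
            if call.name not in allowed:
                raise ValueError(f"unsupported action: {call.name}")
            obs = executor.run(call)
    return obs


if __name__ == "__main__":
    # Scripted stand-in for the model: navigate, click, then stop.
    script = [
        [FunctionCall("navigate", {"url": "https://example.com"})],
        [FunctionCall("click_at", {"x": 10, "y": 20})],
        [],
    ]
    ex = FakeExecutor()
    final = agent_loop(lambda obs: script.pop(0), ex)
    print(final.url, ex.log)  # https://example.com ['navigate', 'click_at']
```

Passing custom=frozenset({"open_app", "long_press_at", "go_home"}) would admit the article's example mobile functions, though the executor would still have to implement them. In a real client, a safety check would sit just before executor.run.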
What are the scope and constraints?
The model is optimized for web browsers. Google states it is not yet optimized for desktop OS-level control; mobile scenarios work by swapping in custom actions under the same loop. A built-in safety monitor can block prohibited actions or require user confirmation before "high-stakes" operations (payments, sending messages, accessing sensitive information).
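The block-or-confirm behavior can be modeled as a gate the client checks before executing each proposed action. This is a sketch under assumptions: SafetyDecision, ActionBlocked and gate_action are hypothetical names, and the way the real API signals a block or a confirmation request is not covered here.

```python
from enum import Enum
from typing import Callable


class SafetyDecision(Enum):
    """Hypothetical labels for the three outcomes the article describes."""
    ALLOWED = "allowed"
    REQUIRE_CONFIRMATION = "require_confirmation"
    BLOCKED = "blocked"


class ActionBlocked(Exception):
    """Raised when the safety layer prohibits an action outright."""


def gate_action(
    action_name: str,
    decision: SafetyDecision,
    confirm: Callable[[str], bool],
) -> bool:
    """Return True if the client may execute the action.

    BLOCKED ends the loop with an exception; REQUIRE_CONFIRMATION defers
    to a human through the confirm callback; ALLOWED proceeds directly.
    """
    if decision is SafetyDecision.BLOCKED:
        raise ActionBlocked(action_name)
    if decision is SafetyDecision.REQUIRE_CONFIRMATION:
        return confirm(action_name)
    return True
```

In an interactive tool, confirm could prompt on the console, e.g. lambda name: input(f"Allow {name}? [y/N] ").strip().lower() == "y". Refusing unless a human explicitly answers yes keeps high-stakes steps opt-in.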
Measured performance
- Online-Mind2Web (official): 69.0% pass@1 (majority-vote human judgments), validated by the benchmark organizers.
- Browserbase matched harness: leads competing computer-use APIs on both accuracy and latency across Online-Mind2Web and WebVoyager under identical time, step, and environment constraints. Google's model card lists 65.7% (OM2W) and 79.9% (WebVoyager) in Browserbase runs.
- Latency/quality trade-off (Google figure): ~70%+ accuracy at ~225 s median latency on the Browserbase OM2W harness. Treat this as Google-reported, based on human evaluation.
- AndroidWorld (mobile generalization): 69.7%, measured by Google; achieved through the same API loop with custom mobile actions and browser actions excluded.
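To make the headline metric concrete: pass@1 here means one attempt per task, with success decided by a majority of human judges. The sketch below computes it, plus median latency, from per-task records. TaskRun and the rule that a tie counts as failure are assumptions for illustration, and the toy numbers are not benchmark data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskRun:
    """One attempt per task (pass@1); several human judges rate the outcome."""
    judgments: list[bool]
    latency_s: float


def majority_success(judgments: list[bool]) -> bool:
    # Assumption: a strict majority is required, so a tie counts as failure.
    return sum(judgments) * 2 > len(judgments)


def pass_at_1(runs: list[TaskRun]) -> float:
    # Fraction of tasks whose single attempt was judged successful.
    return sum(majority_success(r.judgments) for r in runs) / len(runs)


def median_latency(runs: list[TaskRun]) -> float:
    return median(r.latency_s for r in runs)


if __name__ == "__main__":
    # Toy records for illustration only.
    runs = [
        TaskRun([True, True, False], 200.0),
        TaskRun([False, False, True], 250.0),
        TaskRun([True, True, True], 225.0),
        TaskRun([True, False], 180.0),
    ]
    print(pass_at_1(runs), median_latency(runs))  # 0.5 212.5
```

Reporting the median rather than the mean keeps a few very slow tasks from dominating the latency figure.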
Early production signals
- Automated UI test repair: Google's payments platform team reports that the model rehabilitates >60% of previously failing automated UI test executions. This figure is attributed to public reporting rather than the core blog post and should be cited as such.
- Operational speed: Poke.com (an early external tester) reports that workflows are often ~50% faster than with its next-best alternative.
Gemini 2.5 Computer Use is in public preview via Google AI Studio and Vertex AI; it exposes a constrained API with 13 documented UI actions and requires a client-side executor. Google's materials and the model card report state-of-the-art results on web and mobile control benchmarks, and Browserbase's matched harness shows ~65.7% pass@1 on Online-Mind2Web with leading latency under identical constraints. The scope is browser-first, with per-step safety and confirmation. These data points justify measured evaluation in UI testing and web operations.
Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.