Agent Browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button",
Overview
Agent Browser is a command-line interface for browser automation by AI agents. Published in the vercel-labs/agent-browser repository, it lets agents such as Claude and Cursor carry out multi-step actions on the web: navigating pages, filling in forms, interacting with page elements, and verifying results visually through screenshots. Because every action goes through the same structured command interface, it is well suited to automated testing and data extraction. The source repository has accumulated over 37,000 stars on GitHub. The skill connects an LLM's decision-making to live web pages, so an agent can run browser-dependent workflows end to end.
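Each of the operations listed above is a separate CLI invocation. The following minimal sketch strings them into one form-filling pass, assuming `agent-browser` is installed and on `PATH`; the subcommands used (`open`, `snapshot`, `fill`, `click`, `screenshot`, `close`) follow the project's README and should be checked against the installed version. The URL, the element refs `@e1` and `@e2`, the `ab` wrapper, `fill_and_submit`, and the `DRY_RUN` switch are all hypothetical; in practice the refs must be read from the `snapshot` output of the actual page.

```shell
#!/usr/bin/env bash
# Sketch of one snapshot-and-ref pass with the agent-browser CLI.
# ASSUMPTIONS: agent-browser is installed; the URL and the refs @e1/@e2
# are placeholders (real refs come from the snapshot output).
set -euo pipefail

# Hypothetical wrapper: with DRY_RUN=1 it prints each command instead of
# running it, so the sequence can be inspected without a browser.
ab() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'agent-browser %s\n' "$*"
  else
    agent-browser "$@"
  fi
}

# Hypothetical helper: navigate, inspect, fill one field, submit, capture.
fill_and_submit() {
  local url="$1" email="$2"
  ab open "$url"              # navigate to the page
  ab snapshot                 # accessibility tree with element refs
  ab fill @e1 "$email"        # placeholder ref for an input field
  ab click @e2                # placeholder ref for the submit button
  ab screenshot result.png    # visual check of the outcome
  ab close                    # end the browser session
}

# Example (dry run, no browser needed):
#   DRY_RUN=1 fill_and_submit https://example.com/signup user@example.com
```

Running it with `DRY_RUN=1` prints the six commands without launching a browser; a real agent would add a step between `snapshot` and `fill` that picks the refs out of the snapshot.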
Install Notes
# Review source first
open https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md

Copy or clone the skill folder into your agent skills directory after reviewing its instructions and scripts.
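The install note leaves the destination directory open. A minimal sketch of the copy step follows, assuming Claude Code's personal skills directory (`~/.claude/skills`) as the default destination; other agents read skills from other locations, and `install_skill` is a hypothetical helper name. It refuses folders without a `SKILL.md`, refuses to overwrite an existing install, and lists every file it copies so the contents can be reviewed one last time.

```shell
#!/usr/bin/env bash
# Sketch: copy an already-reviewed skill folder into an agent skills directory.
# ASSUMPTIONS: the default destination is Claude Code's personal skills
# directory (~/.claude/skills); install_skill is a hypothetical helper name.
set -euo pipefail

install_skill() {
  local src="${1%/}"                             # skill folder, trailing slash removed
  local dest_root="${2:-$HOME/.claude/skills}"   # assumed default destination
  local name
  name="$(basename "$src")"

  # A skill folder is expected to contain a SKILL.md file.
  if [ ! -f "$src/SKILL.md" ]; then
    echo "error: $src has no SKILL.md" >&2
    return 1
  fi
  # Never overwrite an existing install silently.
  if [ -e "$dest_root/$name" ]; then
    echo "error: $dest_root/$name already exists" >&2
    return 1
  fi

  # List every file that will be installed, for a final review.
  find "$src" -type f | sort

  mkdir -p "$dest_root"
  cp -R "$src" "$dest_root/$name"
  echo "installed $name -> $dest_root/$name"
}

# Example:
#   git clone https://github.com/vercel-labs/agent-browser.git
#   install_skill agent-browser/skills/agent-browser
```

Passing a second argument overrides the assumed destination, e.g. a project-local skills folder.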
Security Notes
Users should exercise caution when granting the tool access to sensitive web environments or authenticated sessions, because the agent performs real actions (clicks, form submissions, navigation) inside them. Scope the agent's instructions carefully to prevent unintended data submission or navigation to unauthorized URLs.
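One way to enforce the URL scoping is to check every address against an allowlist before the browser opens it. The sketch below assumes an https-only policy and a hard-coded list of exact hosts (no subdomain wildcards); `ALLOWED_HOSTS`, `is_allowed_url`, and `guarded_open` are hypothetical names, not part of agent-browser, and the host parsing is deliberately simple (it does not handle IPv6 literals).

```shell
#!/usr/bin/env bash
# Sketch: refuse to navigate anywhere outside an explicit allowlist.
# ASSUMPTIONS: https-only policy; ALLOWED_HOSTS, is_allowed_url and
# guarded_open are hypothetical names, not part of agent-browser.
set -euo pipefail

ALLOWED_HOSTS=("example.com" "staging.example.com")

# Return 0 if the URL is https and its host exactly matches an allowlist entry.
is_allowed_url() {
  local url="$1" host allowed
  case "$url" in
    https://*) ;;            # accept only https URLs
    *) return 1 ;;
  esac
  host="${url#https://}"     # drop the scheme
  host="${host%%[/?#]*}"     # cut at the first path, query or fragment delimiter
  host="${host##*@}"         # drop any userinfo ("user@")
  host="${host%%:*}"         # drop any port
  host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"
  for allowed in "${ALLOWED_HOSTS[@]}"; do
    if [ "$host" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}

# Only hand approved URLs to the browser.
guarded_open() {
  if is_allowed_url "$1"; then
    agent-browser open "$1"
  else
    echo "blocked: $1" >&2
    return 1
  fi
}
```

Stripping userinfo before comparing matters: in `https://example.com@evil.com/` the real host is `evil.com`, and a naive prefix check on the URL string would let it through.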
Related Skills
Core
vercel-labs/agent-browser
Core agent-browser usage guide. Read this before running any agent-browser commands. Covers the snapshot-and-ref workflow, navigating pages, interacting with elements (click, fill, type, select), extracting text and data, taking screenshots, managing tabs, handling forms and auth, waiting for content, running multiple
Agentcore
vercel-labs/agent-browser
Run agent-browser on AWS Bedrock AgentCore cloud browsers. Use when the user wants to use AgentCore, run browser automation on AWS, use a cloud browser with AWS credentials, or needs a managed browser session backed by AWS infrastructure. Triggers include "use agentcore", "run on AWS", "cloud browser with AWS", "bedroc
LangChain Middleware
langchain-ai/langchain-skills
INVOKE THIS SKILL when you need human-in-the-loop approval, custom middleware, or structured output. Covers HumanInTheLoopMiddleware for human approval of dangerous tool calls, creating custom middleware with hooks, Command resume patterns, and structured output with Pydantic/Zod.
Deep Agents Core
langchain-ai/langchain-skills
INVOKE THIS SKILL when building ANY Deep Agents application. Covers create_deep_agent(), harness architecture, SKILL.md format, and configuration options.
LangGraph Persistence
langchain-ai/langchain-skills
INVOKE THIS SKILL when your LangGraph needs to persist state, remember conversations, travel through history, or configure subgraph checkpointer scoping. Covers checkpointers, thread_id, time travel, Store, and subgraph persistence modes.
Windows Builder
hashicorp/agent-skills
Build Windows images with Packer using WinRM communicator and PowerShell provisioners. Use when creating Windows AMIs, Azure images, or VMware templates.