Launch a cloud browser, interact with pages using AI and Playwright, chain agent tasks, and build complete browser automations in Python or TypeScript.
Browser automations are multi-step automations written in code. You launch a cloud browser, get a Playwright page with AI methods layered on top, and write your automation in Python or TypeScript. The code lives in your repo, runs in your CI, and deploys like any other software.
This is the code-first path. If you prefer building automations visually with drag-and-drop blocks, see Workflows in the Cloud UI docs.
Workflows vs. Browser Automation: Workflows are built and run in the Cloud UI, no code required. Browser automations are built and run in code using Page, Agent, and Browser. Both can do multi-step work across pages. Choose based on whether your team prefers a visual editor or a code editor.
This guide walks through a real example: logging into a vendor portal, extracting invoice data, and downloading a PDF. The steps:
Launch a browser and get a page
Navigate and interact with the page using AI actions, selectors, or both
Use the agent for complex goals like login, multi-step tasks, and file downloads
Every automation starts by launching a cloud browser and getting a page. The browser is a Chromium instance hosted by Skyvern. The page is a Playwright page with AI methods added on top.
Install the SDK first if you haven’t:
pip install skyvern
All code snippets below run inside an async function. See the complete example for the full runnable script.
The browser stays alive until you call browser.close() or the session times out (default: 60 minutes). All pages inside it share cookies, localStorage, and auth state.
For browser launch options (timeouts, proxies, connecting to local browsers), see Managing Browsers.
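Because the browser keeps running until browser.close() is called, it can help to guarantee the close even when a step raises. The following is a minimal sketch, assuming only that the browser object exposes an awaitable close() as described above. The names closing_browser and FakeBrowser are hypothetical helpers for illustration and are not part of the Skyvern SDK.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_browser(browser):
    """Yield the browser and always await browser.close() on exit."""
    try:
        yield browser
    finally:
        await browser.close()


class FakeBrowser:
    """Hypothetical stand-in for a real browser so the sketch runs offline."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def demo():
    fake = FakeBrowser()
    try:
        async with closing_browser(fake):
            # Simulate an automation step that fails midway.
            raise RuntimeError("step failed")
    except RuntimeError:
        pass
    return fake.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real script, the object passed to closing_browser would be whatever the launch call returned, so a failed step does not leave a session running until the timeout.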
Once you have a page, you can interact with it using standard Playwright, AI actions, or both. For the full list of available methods and parameters, see the Actions Reference.
Four methods let you interact with the page using natural language. Skyvern screenshots the page and determines which elements to target.
# Perform any action. Returns None.
await page.act("Click the login button")

# Extract structured data. Returns dict (or list if schema root is array).
data = await page.extract(
    "Extract all product names and prices",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
        },
    },
)

# Check page state. Returns bool.
is_logged_in = await page.validate("The user is logged in")

# Ask the AI a question. Returns {"llm_response": str}.
result = await page.prompt("What is the total at the bottom of the table?")
answer = result["llm_response"]
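Extracted data comes from a model, so it may be worth checking its shape before using it downstream. The sketch below assumes the result is a list of objects with a string name and a numeric price, matching the schema in the extract call. The function check_products is a hypothetical helper written in plain Python, not an SDK feature.

```python
from numbers import Real


def check_products(rows):
    """Return only rows that have a string name and a numeric price."""
    valid = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        name, price = row.get("name"), row.get("price")
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(price, Real) and not isinstance(price, bool)
        if isinstance(name, str) and is_number:
            valid.append(row)
    return valid


# Made-up sample data standing in for an extract() result.
sample = [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": "12.50"},  # price came back as text
    {"name": None, "price": 3},  # name is missing
]
clean = check_products(sample)
```

Rows that fail the check are dropped here; a production script might instead log them or retry the extraction.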
click, fill, and select_option accept both a CSS selector and an AI prompt. The selector runs first. If it fails, the AI takes over.
# Selector only (fast, deterministic)
await page.click("#submit-button")
await page.fill("#email", value="user@example.com")

# AI only (no selector needed, resilient to layout changes)
await page.click(prompt="Click the 'Submit' button")
await page.fill(prompt="Fill 'user@example.com' in the email field")

# Both - selector first, AI fallback (best for production)
await page.click("#submit-button", prompt="Click the 'Submit' button")
await page.fill("#email", value="user@example.com", prompt="Fill the email field")
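The selector-first, AI-fallback order can be pictured with a small stand-alone sketch. This illustrates the described behavior only and is not Skyvern's implementation: click_with_fallback, StubPage, SelectorNotFound, click_selector, and click_ai are all hypothetical names, and the stub merely records which path handled each click.

```python
import asyncio


class SelectorNotFound(Exception):
    """Raised by the stub when a CSS selector matches nothing."""


class StubPage:
    """Hypothetical offline page; records which path handled the click."""

    def __init__(self, known_selectors):
        self.known_selectors = set(known_selectors)
        self.calls = []

    async def click_selector(self, selector):
        if selector not in self.known_selectors:
            raise SelectorNotFound(selector)
        self.calls.append(("selector", selector))

    async def click_ai(self, prompt):
        self.calls.append(("ai", prompt))


async def click_with_fallback(page, selector, prompt):
    """Try the selector first; fall back to the prompt if it fails."""
    try:
        await page.click_selector(selector)
    except SelectorNotFound:
        await page.click_ai(prompt)


async def demo():
    page = StubPage(known_selectors={"#submit-button"})
    # The selector exists, so the fast path handles it.
    await click_with_fallback(page, "#submit-button", "Click the 'Submit' button")
    # The selector no longer matches, so the prompt takes over.
    await click_with_fallback(page, "#renamed-button", "Click the 'Submit' button")
    return page.calls


calls = asyncio.run(demo())
```

The second call shows why passing both arguments suits production use: a renamed element degrades to the slower AI path instead of failing the run.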
For multi-step goals (“log in with 2FA”, “navigate to billing and download the invoice”), hand off to the agent. The agent runs a full AI task loop inside your page, preserving all browser state.
The agent handles multi-page login flows, CAPTCHAs, and 2FA automatically. Four credential providers are supported: skyvern (built-in vault), bitwarden, onepassword, and azure_vault. Store credentials via the Credentials API.
result = await page.agent.run_task(
    "Go to the billing page and extract all invoice details",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "amount": {"type": "string"},
        },
    },
)
print(result.output)
# {"invoice_number": "INV-2025-042", "amount": "$1,250.00"}
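Since the schema declares amount as a string, the value arrives formatted for display, for example "$1,250.00". Below is a minimal sketch of turning that into a number for arithmetic, assuming US-dollar formatting with a leading "$" and comma thousands separators. The helper parse_amount is hypothetical and not part of the SDK.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Convert a string like "$1,250.00" to Decimal, or None if unparseable."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


# Sample output copied from the example above.
output = {"invoice_number": "INV-2025-042", "amount": "$1,250.00"}
amount = parse_amount(output["amount"])
```

Decimal is used rather than float so that currency values are not subject to binary rounding.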
After the agent finishes, you take back control. The page retains all state:
await page.agent.run_task("Navigate to the settings page")

# Agent is done - use direct page actions
settings = await page.extract("Extract all notification preferences")
await page.click("#save-button")
result = await page.agent.download_files(
    "Download the Q4 2025 financial report",
    download_suffix=".pdf",
    download_timeout=30,
)
for file in result.downloaded_files or []:
    print(f"Downloaded: {file.url}")
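Each downloaded file is exposed through a url, which may carry a signed query string. The sketch below shows one way to derive a local filename from such a URL using only the standard library. The helper filename_from_url and the example URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlparse(url).path
    return unquote(PurePosixPath(path).name)


# Hypothetical URL for illustration only.
name = filename_from_url(
    "https://example.com/downloads/Q4%20report.pdf?signature=abc123"
)
```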
agent.run_workflow vs skyvern.run_workflow: The top-level run_workflow opens its own browser. Use agent.run_workflow when you need to log in first or do setup before the workflow runs.
Here’s what the Complete Example prints, grouped by the call that produced each block:
# skyvern.launch_cloud_browser()
[Skyvern] Launched new cloud browser session url=https://app.skyvern.com/browser-session/pbs_519895987620976102

# page.agent.login(...)
[Skyvern] AI login workflow finished run_id=wr_519896041203554782 status=completed

# page.extract(...)
[Skyvern] AI extract prompt=Extract all invoice numbers, dates, and amounts

# page.agent.download_files(...)
[Skyvern] Starting AI file download workflow navigation_goal=Download the latest monthly statement
[Skyvern] AI file download workflow is running, this may take a while run_id=wr_519896107880060430
[Skyvern] AI file download workflow finished run_id=wr_519896107880060430 status=completed

# print(f"Found {len(invoices)} invoices") + for inv in invoices: print(...)
Found 3 invoices
  INV-2025-042: $2,340.00
  INV-2025-043: $1,850.00
  INV-2025-044: $3,100.00

# for file in result.downloaded_files: print(f"Downloaded: {file.url}")
Downloaded: https://skyvern-uploads.s3.amazonaws.com/downloads/production/o_510.../wr_519.../statement.pdf?AWS...
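If you capture this output in CI, the run_id and status fields can be pulled out of a log line for later lookup. This is a rough sketch that assumes the key=value layout seen in the sample output; that layout is not a documented contract and may change. The helper run_fields is hypothetical.

```python
import re

# Match only the two fields whose values contain no spaces.
RUN_FIELDS = re.compile(r"\b(run_id|status)=(\S+)")


def run_fields(line):
    """Return the run_id and status fields found in one log line, if any."""
    return dict(RUN_FIELDS.findall(line))


# Log line copied from the sample output.
line = "[Skyvern] AI login workflow finished run_id=wr_519896041203554782 status=completed"
fields = run_fields(line)
```

Free-text fields such as prompt or navigation_goal contain spaces, so this pattern deliberately ignores them.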
The [Skyvern] ... lines come from the SDK’s built-in logger and stream in real time as each call runs. Your own print output appears after the awaited call returns.
For the full response shape, see TaskRunResponse and WorkflowRunResponse.
Every run captures recordings, screenshots, AI reasoning logs, network HAR files, and downloaded files. Access them via the Cloud UI or programmatically:
artifacts = await skyvern.get_run_artifacts(result.run_id)
for artifact in artifacts:
    print(f"{artifact.artifact_type}: {artifact.uri}")
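When a run produces many artifacts, grouping them by type can make them easier to inspect. The sketch below relies only on the artifact_type and uri attributes used in the loop above. The helper group_by_type is hypothetical, and the stand-in objects use made-up type names and URIs.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_type(artifacts):
    """Map each artifact_type to the list of URIs of that type."""
    grouped = defaultdict(list)
    for artifact in artifacts:
        grouped[artifact.artifact_type].append(artifact.uri)
    return dict(grouped)


# Hypothetical stand-in objects; type names and URIs are made up.
fake_artifacts = [
    SimpleNamespace(artifact_type="screenshot", uri="s3://example/a.png"),
    SimpleNamespace(artifact_type="recording", uri="s3://example/b.webm"),
    SimpleNamespace(artifact_type="screenshot", uri="s3://example/c.png"),
]
grouped = group_by_type(fake_artifacts)
```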
For the full artifact type reference and debugging workflows, see Using Artifacts.