Browser automations are built in code using three layers: Page (AI-enhanced Playwright), Agent (multi-step AI goals), and Browser (cloud Chromium instance). This page lists every operation available on a Page or through an Agent. For the full SDK documentation with all parameter options, see the SDK Reference.
If you’re building automations visually instead, see Block Types and Configuration for the equivalent operations in the Cloud UI workflow editor.
Quick reference
Page actions
| Action | Purpose | Returns |
|---|---|---|
| act | Perform any action from a natural-language prompt | None |
| extract | Pull structured data from the page | dict, list, str, or None |
| validate | Assert a condition about the page | bool |
| prompt | Ask the LLM a question about the page | dict, list, str, or None |
| click | Click an element (selector, AI, or both) | str \| None |
| fill | Fill an input field (selector, AI, or both) | str |
| select_option | Select a dropdown option (selector, AI, or both) | str \| None |
| type | Type text character-by-character (Python only) | str |
| hover | Move mouse over an element (Python only) | str |
| scroll | Scroll the page by pixel offset (Python only) | None |
| upload_file | Upload files to a file input (Python only) | str |
| locator | Locate an element with AI, returns a chainable Locator (Python only) | Locator |
Agent methods
| Method | Purpose | Returns |
|---|---|---|
| agent.login | Authenticate with stored credentials | WorkflowRunResponse |
| agent.run_task | Run a multi-step AI task on the current page | TaskRunResponse |
| agent.download_files | Navigate and download files | WorkflowRunResponse |
| agent.run_workflow | Run a Cloud UI workflow on the current page | WorkflowRunResponse |
Form methods
| Method | Purpose | Returns |
|---|---|---|
| fill_form | AI-powered single-page form fill | None |
| fill_multipage_form | Form fill across multiple pages | int (pages filled) |
| fill_from_mapping | Fill fields by index-to-value mapping | None |
| extract_form_fields | Extract all form field metadata | list[dict] |
| validate_mapping | Check if a field mapping is valid | bool |
| fill_autocomplete | Fill input with typeahead handling | str |
iframe management (Python only)
| Method | Purpose | Returns |
|---|---|---|
| frame_switch | Switch context to an iframe | dict |
| frame_main | Switch back to the main frame | dict |
| frame_list | List all frames on the page | list[dict] |
Page actions
act
Perform any action described in natural language.
await page.act("Click the login button")
await page.act("Scroll down to the pricing section")
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | What action to perform |
| skip_refresh | bool | No | Skip page refresh before acting |
| use_economy_tree | bool | No | Use a smaller DOM tree for faster processing |
Returns: None
SDK reference: act
extract
Pull structured data from the visible page.
data = await page.extract(
    "Extract all product names and prices",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
        },
    },
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | What to extract |
| schema | dict \| list \| str | No | JSON Schema for typed output |
| error_code_mapping | dict | No | Map custom error codes |
Returns: dict, list, str, or None
SDK reference: extract
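Because extract can return a dict, list, str, or None, it can help to check the shape of the result before using it. The sketch below restates the schema from the example and adds a hypothetical helper, looks_like_products, that is not part of the SDK. It assumes every item should carry both fields, which the schema itself does not require.

```python
from typing import Any

# JSON Schema from the extract example: a list of {name, price} objects.
PRODUCT_LIST_SCHEMA: dict[str, Any] = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
        },
    },
}


def looks_like_products(data: Any) -> bool:
    """Return True if data is a list of dicts with a str name and a numeric price."""
    if not isinstance(data, list):
        return False
    for item in data:
        if not isinstance(item, dict):
            return False
        if not isinstance(item.get("name"), str):
            return False
        price = item.get("price")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            return False
    return True
```

A caller might pass PRODUCT_LIST_SCHEMA as the schema argument and run looks_like_products on the returned value before iterating over it.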
validate
Assert a condition about the current page state.
is_logged_in = await page.validate("The user is logged in")
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | Condition to check |
| model | dict | No | Override LLM model config |
Returns: bool
SDK reference: validate
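Since validate returns a bool, one common pattern is to poll it until a condition holds, for example while a page finishes loading. The helper below is a generic sketch, not an SDK function. The name wait_until and its parameters are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until(
    check: Callable[[], Awaitable[bool]],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if await check():
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            await asyncio.sleep(delay_seconds)
    return False
```

With a Page object in scope, this could be used as `await wait_until(lambda: page.validate("The user is logged in"))`. Each attempt is a separate validate call, so the attempt count also bounds the number of LLM requests.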
prompt
Ask the LLM a question about the current page.
result = await page.prompt(
    "How many items are in the navigation menu?",
    schema={"count": {"type": "integer"}},
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | Question to ask |
| schema | dict | No | JSON Schema for structured response |
| model | dict | No | Override LLM model config |
Returns: dict, list, str, or None. Without a schema, returns a dict of the form {"llm_response": "..."} (TypeScript: { llmResponse: "..." }). With a schema, returns data shaped to your schema.
SDK reference: prompt
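The two return shapes described above (a {"llm_response": "..."} wrapper without a schema, schema-shaped data with one) can be normalized with a small helper. This is a sketch with a hypothetical name, unwrap_prompt_result. It assumes the wrapper dict has exactly one key.

```python
from typing import Any


def unwrap_prompt_result(result: Any) -> Any:
    """Return the bare answer from a schemaless prompt result, else pass through."""
    # Without a schema the answer arrives wrapped as {"llm_response": "..."}.
    if isinstance(result, dict) and set(result) == {"llm_response"}:
        return result["llm_response"]
    # With a schema (or a None result) there is nothing to unwrap.
    return result
```

Note that a schema whose only field is named llm_response would also be unwrapped, so this helper suits code paths where the schema is known.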
click
Click an element using a selector, AI prompt, or both.
await page.click("#submit-button") # selector
await page.click(prompt="Click the 'Submit' button") # AI
await page.click("#submit-button", prompt="Click the 'Submit' button") # both
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS selector |
| prompt | str | No | AI prompt (fallback or primary) |
| ai | str | No | AI mode: "fallback" (default) or None |
Returns: str \| None (resolved selector)
SDK reference: click
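When both a selector and a prompt are passed, the "fallback" mode tries the selector first and uses the AI prompt only if the selector fails. The snippet below illustrates that control flow in isolation with plain async callables. It is a conceptual sketch, not the SDK's implementation, and try_with_fallback is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def try_with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
) -> T:
    """Run primary(); if it raises, run fallback() instead."""
    try:
        return await primary()
    except Exception:
        # The primary strategy (e.g. a stale selector) failed, so use the backup.
        return await fallback()
```

The practical consequence is that a working selector costs no AI call, while a broken one degrades to the slower AI path instead of failing outright.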
fill
Fill an input field using a selector, AI prompt, or both.
await page.fill("#email", value="user@example.com") # selector
await page.fill(prompt="Fill 'user@example.com' in the email field") # AI
await page.fill("#email", value="user@example.com", prompt="Fill email") # both
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS selector |
| value | str | No | Value to fill |
| prompt | str | No | AI prompt (fallback or primary) |
| ai | str | No | AI mode: "fallback" (default) or None |
| totp_identifier | str | No | TOTP identifier for 2FA fields |
| totp_url | str | No | TOTP URL |
Returns: str (resolved selector)
SDK reference: fill
select_option
Select a dropdown option using a selector, AI prompt, or both.
await page.select_option("#country", value="us") # selector
await page.select_option(prompt="Select 'United States' from the country dropdown") # AI
await page.select_option("#country", value="us", prompt="Select United States") # both
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS selector |
| value | str \| list[str] | No | Option value(s) to select |
| prompt | str | No | AI prompt (fallback or primary) |
| ai | str | No | AI mode: "fallback" (default) or None |
Returns: str | None (resolved selector)
SDK reference: select_option
type
Type text character-by-character. Unlike fill, this triggers keystroke events for each character, so use it for fields that react to individual key presses (search autocomplete, OTP inputs). Python only.
# Character-by-character input via selector
await page.type("#search", value="wireless headphones")
# AI-powered type
await page.type(prompt="Type 'hello' into the search box")
# Selector with AI fallback
await page.type("#search", value="query text", prompt="Type into the search field")
# TOTP input from a stored secret
await page.type("#otp", totp_identifier="my-app")
# TOTP generated on the fly from an otpauth URI
await page.type("#otp", totp_url="otpauth://totp/Example:alice?secret=JBSWY3DPEHPK3PXP")
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS or XPath selector for the input field |
| value | str | No | Text to type character-by-character |
| prompt | str | No | Natural-language description of the target field |
| ai | str | No | AI mode: "fallback" (default) tries the selector first, then AI |
| totp_identifier | str | No | Identifier for a stored TOTP secret |
| totp_url | str | No | otpauth:// URI to generate a one-time password on the fly |
Returns: str (resolved selector)
SDK reference: type
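For context on what an otpauth:// URI encodes, the sketch below derives a one-time code from such a URI using the standard RFC 6238 algorithm (HMAC-SHA1, 30-second period) and only the Python standard library. It shows the general TOTP technique and is not a description of how the SDK generates codes internally. The function name totp_from_uri is an assumption.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def totp_from_uri(
    uri: str, at: Optional[float] = None, digits: int = 6, period: int = 30
) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from an otpauth:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "otpauth" or parsed.netloc != "totp":
        raise ValueError("expected an otpauth://totp/... URI")
    secret = parse_qs(parsed.query)["secret"][0].upper()
    # Base32 input must be padded to a multiple of 8 characters.
    secret += "=" * (-len(secret) % 8)
    key = base64.b32decode(secret)
    # The moving factor is the number of whole periods since the Unix epoch.
    counter = int((time.time() if at is None else at) // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)
```

Calling totp_from_uri with the URI from the example above yields the six-digit code for the current 30-second window.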
hover
Move the mouse over an element. Python only.
# Simple hover
await page.hover("#menu-item")
# Hover with a hold duration (useful for tooltip reveals)
await page.hover("#tooltip-trigger", hold_seconds=1.5)
# Hover with intention logging for debugging
await page.hover("#menu-item", intention="Reveal the main menu dropdown")
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | Yes | CSS or XPath selector for the target element |
| timeout | float | No | Max wait time in milliseconds for the element. Defaults to BROWSER_ACTION_TIMEOUT_MS |
| hold_seconds | float | No | How long to hold the hover, in seconds. Default 0.0 |
| intention | str | No | Description of the hover intent, used for logging |
Returns: str (resolved selector)
SDK reference: hover
scroll
Scroll the page by a pixel offset along the x and y axes. Python only.
await page.scroll(0, 500) # Scroll down 500px
await page.scroll(0, -300) # Scroll up 300px
await page.scroll(200, 0) # Scroll right 200px
| Parameter | Type | Required | Description |
|---|---|---|---|
| scroll_x | int | Yes | Horizontal scroll offset in pixels. Positive values scroll right |
| scroll_y | int | Yes | Vertical scroll offset in pixels. Positive values scroll down |
Returns: None
SDK reference: scroll
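On pages that load content lazily, scrolling a long distance in several smaller steps can give each batch time to render. The helper below only computes the offsets. scroll_steps and its 500px default are illustrative assumptions, not SDK features.

```python
def scroll_steps(total_y: int, step: int = 500) -> list[tuple[int, int]]:
    """Split a vertical distance into (scroll_x, scroll_y) offsets of at most step px."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Positive distances scroll down, negative distances scroll up.
    direction = 1 if total_y >= 0 else -1
    remaining = abs(total_y)
    offsets: list[tuple[int, int]] = []
    while remaining > 0:
        chunk = min(step, remaining)
        offsets.append((0, direction * chunk))
        remaining -= chunk
    return offsets
```

Each tuple can then be passed to the scroll action in turn, for example `for x, y in scroll_steps(1200): await page.scroll(x, y)`.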
upload_file
Upload one or more files to a file input. Pass a selector for direct Playwright behavior, a prompt for AI-powered file input detection, or both. Python only.
# Direct selector
await page.upload_file("#file-input", files="/path/to/file.pdf")
# Multiple files
await page.upload_file("#file-input", files=["/path/to/file1.pdf", "/path/to/file2.pdf"])
# AI-powered file input detection (no selector needed)
await page.upload_file(prompt="Upload the resume to the file input")
# Selector with AI fallback: try the selector first, use AI if it fails
await page.upload_file(
    "#file-input",
    files="/path/to/file.pdf",
    prompt="Upload the resume to the file input",
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS or XPath selector for the file input |
| files | str \| list[str] | No | File path or list of file paths to upload |
| prompt | str | No | Natural-language description of the file input to target |
| ai | str | No | AI mode: "fallback" (default) tries the selector first, then AI |
Returns: str (resolved selector)
SDK reference: upload_file
locator
Locate an element using a CSS/XPath selector, an AI prompt, or both. When called with a prompt, returns an AILocator, a lazy Playwright Locator that resolves the element via AI on first use. Python only.
# AI-powered: pass a natural-language prompt
locator = page.locator(prompt="the submit button")
await locator.click()
# Full Playwright chaining works
text = await page.locator(prompt="the error message").text_content()
# Standard Playwright selector (no AI, identical to vanilla Playwright)
locator = page.locator("#submit-btn")
await locator.click()
# Selector with AI fallback: try the selector first, use AI if it fails
locator = page.locator("#submit-btn", prompt="the submit button")
await locator.click()
When called with only a selector (no prompt), page.locator(selector) behaves exactly like the standard Playwright page.locator(selector). No AI is involved.
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS or XPath selector passed to Playwright’s built-in locator() |
| prompt | str | No | Natural-language description of the element. When provided, returns an AILocator that resolves via AI |
| ai | str | No | AI mode: "fallback" (default) tries the selector first, then AI |
| **kwargs | | No | Additional keyword arguments forwarded to Playwright’s locator() |
Returns: Locator (standard Playwright Locator when only a selector is given, or AILocator when a prompt is provided)
AILocator methods
When prompt is provided, the returned AILocator supports all standard Playwright Locator methods:
- Actions: click(), fill(), type(), select_option(), check(), uncheck(), clear(), hover(), focus(), press()
- Queries: text_content(), inner_text(), inner_html(), get_attribute(), input_value(), count()
- State: is_visible(), is_hidden(), is_enabled(), is_disabled(), is_editable(), is_checked()
- Chaining: first(), last(), nth(), filter(), locator(), get_by_text(), get_by_role(), get_by_label(), get_by_placeholder()
- Utilities: wait_for(), screenshot(), playwright_locator (access raw Locator)
SDK reference: locator
Agent methods
All agent methods return either a TaskRunResponse (for agent.run_task) or a WorkflowRunResponse (for agent.login, agent.download_files, agent.run_workflow). Follow the links for the full field list.
agent.login
Authenticate with stored credentials. Handles multi-page login flows, CAPTCHAs, and 2FA.
await page.agent.login(
    credential_type="skyvern",
    credential_id="cred_123",
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| credential_type | CredentialType | Yes | skyvern, bitwarden, onepassword, or azure_vault |
| credential_id | str | No | Credential ID (required for skyvern type) |
| url | str | No | Login page URL |
| prompt | str | No | Additional login instructions |
| timeout | float | No | Max wait time in seconds (default: 1800) |
Returns: WorkflowRunResponse
SDK reference: agent.login
agent.run_task
Run a multi-step AI task on the current page.
result = await page.agent.run_task(
    "Go to the billing page and download the latest invoice",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "amount": {"type": "string"},
        },
    },
)
print(result.output)
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | Multi-step goal description |
| data_extraction_schema | dict \| str | No | JSON Schema for structured output. See Extract Structured Data. |
| max_steps | int | No | Cap AI steps. The run terminates with timed_out if hit. Controls cost; each step is one AI decision + action cycle. |
| engine | RunEngine | No | AI engine. Defaults to skyvern-1.0 in the SDK for backward compatibility. Pass skyvern-2.0 for the latest model (also the Cloud UI default). Other options: openai-cua, anthropic-cua, ui-tars. |
| model | dict / Record<string, unknown> | No | Override LLM model configuration |
| url | str / string | No | URL to navigate to (defaults to current page URL) |
| webhook_url | str / string | No | Callback URL. Skyvern POSTs the full run result on completion or failure. See Webhooks. |
| totp_identifier | str / string | No | Identifier for push-based TOTP. See Handle 2FA. |
| totp_url | str / string | No | Endpoint Skyvern calls to pull TOTP codes. See Handle 2FA. |
| title | str / string | No | Display name for this run |
| error_code_mapping | dict / Record<string, string> | No | Map custom error codes to conditions. Keys are your error codes, values describe when to trigger them. If matched, output contains {"error": "your_code"}. |
| user_agent | str / string | No | Custom User-Agent header for the browser |
| timeout | float | No | Max wait in seconds (default: 1800) |
Returns: TaskRunResponse
SDK reference: agent.run_task
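The error_code_mapping parameter means a run's output may contain {"error": "your_code"} instead of the extracted data. A caller can branch on that before reading other fields. The helper below is a minimal sketch. matched_error_code is a hypothetical name, and the mapping in the usage note is made up for illustration.

```python
from typing import Any, Optional


def matched_error_code(output: Any, error_code_mapping: dict[str, str]) -> Optional[str]:
    """Return the custom error code found in a run output, or None."""
    if not isinstance(output, dict):
        return None
    code = output.get("error")
    # Only report codes that the caller actually registered.
    if isinstance(code, str) and code in error_code_mapping:
        return code
    return None
```

For example, with a mapping such as {"no_invoice": "No invoice is available for download"}, calling matched_error_code(result.output, mapping) returns "no_invoice" when the condition was triggered and None otherwise.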
agent.download_files
Navigate and download files from the current page.
result = await page.agent.download_files(
    "Download the latest invoice PDF",
    download_suffix="invoice",
    download_timeout=30,
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | What to download |
| download_suffix | str | No | Filename hint prepended to the saved file (e.g., "invoice" → invoice.pdf). Not a validator; mismatched extensions don’t fail the run. |
| download_timeout | float | No | Soft hint (seconds) for how long to wait for a download. The overall timeout is what actually fails the run. |
| max_steps_per_run | int | No | Cap AI steps |
| timeout | float | No | Max wait in seconds (default: 1800) |
Returns: WorkflowRunResponse
SDK reference: agent.download_files
agent.run_workflow
Run a Cloud UI workflow on the current page.
result = await page.agent.run_workflow(
    "wpid_monthly_report",
    parameters={"month": "2025-03"},
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| workflow_id | str | Yes | Workflow permanent ID |
| parameters | dict | No | Workflow input parameters |
| template | bool | No | Run a template workflow |
| title | str / string | No | Display name for this run |
| webhook_url | str / string | No | Callback URL for run completion. See Webhooks. |
| totp_url | str / string | No | Endpoint for pulling TOTP codes |
| totp_identifier | str / string | No | Identifier for push-based TOTP |
| timeout | float | No | Max wait in seconds (default: 1800) |
Returns: WorkflowRunResponse
SDK reference: agent.run_workflow
Form methods
fill_form
Fill a single-page form using AI. Pass a data dict describing the values to fill.
# Simple form fill
await page.fill_form(
    data={"name": "John Doe", "email": "john@example.com", "role": "Engineer"},
)
# With a custom prompt to guide the AI
await page.fill_form(
    data={"name": "John Doe", "email": "john@example.com"},
    prompt="Fill out the registration form with the provided user details",
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| data | dict[str, Any] | Yes | Key-value pairs of form data to fill |
| prompt | str | No | Instruction for the AI. Defaults to "Fill out the form" |
Returns: None
SDK reference: fill_form
fill_multipage_form
Fill a form that spans multiple pages, handling page transitions automatically.
pages_filled = await page.fill_multipage_form(
    data={"name": "John Doe", "email": "john@example.com", "address": "123 Main St"},
    max_pages=5,
)
print(f"Filled {pages_filled} pages")
| Parameter | Type | Required | Description |
|---|---|---|---|
| data | dict[str, Any] | Yes | Key-value pairs of form data to fill across all pages |
| prompt | str | No | Instruction for the AI. Defaults to "Fill out the form" |
| next_button | str | No | Selector or description of the button to advance to the next page |
| max_pages | int | No | Maximum number of pages to fill. Defaults to 10 |
| timeout_seconds | float | No | Timeout in seconds for the entire operation. Defaults to 300 |
Returns: int (the number of pages filled)
SDK reference: fill_multipage_form
fill_from_mapping
Fill form fields using an explicit index-based mapping produced by extract_form_fields. Use this when you need precise control over which field gets which value.
fields = await page.extract_form_fields()
await page.fill_from_mapping(
    form_fields=fields,
    mapping={0: "John", 1: "Doe", 2: "john@example.com"},  # keys are field indices
    data={"name": "John Doe"},  # optional context
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| form_fields | list[dict[str, Any]] | Yes | Field metadata returned by extract_form_fields |
| mapping | dict[int, str \| list \| bool \| None] | Yes | Map of field index to the value to fill |
| data | dict[str, Any] \| None | No | Optional source data for context. Defaults to None |
Returns: None
SDK reference: fill_from_mapping
extract_form_fields
Extract all form fields with metadata from the current page. Use the output to drive fill_from_mapping or validate_mapping.
fields = await page.extract_form_fields()
# Returns a list of dicts with field name, type, options, and other metadata:
# [{"name": "First Name", "type": "text", "index": 0, ...}, ...]
Returns: list[dict[str, Any]] where each dict contains field name, type, options, and other metadata.
SDK reference: extract_form_fields
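The index-keyed mapping that fill_from_mapping and validate_mapping expect can be derived from the extracted field list rather than written by hand. The sketch below matches field names to data keys case-insensitively. build_mapping is a hypothetical helper, and it assumes each field dict carries the "name" and "index" keys shown in the sample output above.

```python
from typing import Any


def build_mapping(fields: list[dict[str, Any]], data: dict[str, Any]) -> dict[int, Any]:
    """Map each field's index to the value in data whose key matches the field name."""
    # Compare names case-insensitively and ignore surrounding whitespace.
    lookup = {key.strip().lower(): value for key, value in data.items()}
    mapping: dict[int, Any] = {}
    for field in fields:
        name = str(field.get("name", "")).strip().lower()
        if name in lookup and "index" in field:
            mapping[field["index"]] = lookup[name]
    return mapping
```

Fields with no matching key are simply left out of the mapping, so comparing its length against the field list is a quick way to spot unmatched fields.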
validate_mapping
Check if a field mapping is correct for the current form.
fields = await page.extract_form_fields()
is_valid = await page.validate_mapping(
    form_fields=fields,
    mapping={0: "John", 1: "Doe"},
    prompt="Validate the name fields are filled correctly",
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| form_fields | list[dict[str, Any]] | Yes | Field metadata returned by extract_form_fields |
| mapping | dict[int, str \| list \| bool \| None] | Yes | Map of field index to the value to validate |
| prompt | str | Yes | Instruction describing what to validate |
Returns: bool (True if the mapping is valid, False otherwise)
SDK reference: validate_mapping
fill_autocomplete
Fill an input that has autocomplete/typeahead behavior. Types the value, waits for suggestions, then clicks the matching option.
# Direct selector
await page.fill_autocomplete(
    selector="#city",
    value="San Francisco",
    option_selector=".autocomplete-option",
    wait_seconds=1.5,
)
# AI-powered
await page.fill_autocomplete(prompt="Fill 'San Francisco' in the city autocomplete")
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS selector for the input field |
| value | str | No | The text value to type into the field |
| prompt | str | No | Natural-language description of the field and value |
| ai | str | No | AI mode: "fallback" (default) tries the selector first, then AI |
| option_selector | str | No | CSS selector for the autocomplete dropdown options |
| wait_seconds | float | No | Seconds to wait for the dropdown to appear. Default 1.5 |
| **kwargs | | No | Standard Playwright fill options (e.g., timeout, force) |
Returns: str (resolved selector)
SDK reference: fill_autocomplete
iframe management (Python only)
frame_switch
Switch the working context to an iframe. Exactly one of selector, name, or index must be provided.
# By CSS selector
await page.frame_switch(selector="#payment-iframe")
# By frame name
await page.frame_switch(name="checkout")
# By zero-based index
await page.frame_switch(index=0)
| Parameter | Type | Required | Description |
|---|---|---|---|
| selector | str | No | CSS selector for the iframe element |
| name | str | No | The name attribute of the iframe |
| index | int | No | Zero-based index of the iframe on the page |
Returns: dict[str, Any] (frame metadata for the switched-to iframe)
SDK reference: frame_switch
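Since frame_switch needs exactly one of selector, name, or index, a caller that builds these arguments dynamically may want to check them up front. The helper below is a sketch of that check. frame_switch_kwargs is a hypothetical name, and the error the SDK itself raises on invalid input is not specified here.

```python
from typing import Any, Optional


def frame_switch_kwargs(
    selector: Optional[str] = None,
    name: Optional[str] = None,
    index: Optional[int] = None,
) -> dict[str, Any]:
    """Return the single provided argument as kwargs, or raise ValueError."""
    candidates = {"selector": selector, "name": name, "index": index}
    # Compare against None so that index=0 still counts as provided.
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of selector, name, or index")
    return provided
```

The result can be unpacked into the call, for example `await page.frame_switch(**frame_switch_kwargs(index=0))`.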
frame_main
Switch back to the main page frame after working inside an iframe.
Returns: dict[str, str]
SDK reference: frame_main
frame_list
List all frames on the current page with metadata.
frames = await page.frame_list()
# [{"name": "checkout", "url": "...", "index": 0}, ...]
Returns: list[dict[str, Any]] (metadata for each frame on the page)
SDK reference: frame_list
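The frame list can be used to pick the right iframe before switching to it. The sketch below searches by URL substring. find_frame_index is a hypothetical helper, and it assumes each entry has the "url" and "index" keys shown in the sample output above.

```python
from typing import Any, Optional


def find_frame_index(frames: list[dict[str, Any]], url_contains: str) -> Optional[int]:
    """Return the index of the first frame whose URL contains url_contains."""
    for frame in frames:
        if url_contains in str(frame.get("url", "")):
            return frame.get("index")
    # No frame matched the substring.
    return None
```

The returned index can then be passed to frame_switch(index=...). A None result means no frame matched, for instance because the iframe had not loaded yet.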