Browser automations are built in code using three layers: Page (AI-enhanced Playwright), Agent (multi-step AI goals), and Browser (cloud Chromium instance). This page lists every operation available on a Page or through an Agent. For the full SDK documentation with all parameter options, see the SDK Reference. If you’re building automations visually instead, see Block Types and Configuration for the equivalent operations in the Cloud UI workflow editor.

Quick reference

Page actions

Action         Purpose                                                               Returns
act            Perform any action from a natural-language prompt                     None
extract        Pull structured data from the page                                    dict, list, str, or None
validate       Assert a condition about the page                                     bool
prompt         Ask the LLM a question about the page                                 dict, list, str, or None
click          Click an element (selector, AI, or both)                              str | None
fill           Fill an input field (selector, AI, or both)                           str
select_option  Select a dropdown option (selector, AI, or both)                      str | None
type           Type text character-by-character (Python only)                        str
hover          Move mouse over an element (Python only)                              str
scroll         Scroll the page by pixel offset (Python only)                         None
upload_file    Upload files to a file input (Python only)                            str
locator        Locate an element with AI, returns a chainable Locator (Python only)  Locator

Agent methods

Method                Purpose                                       Returns
agent.login           Authenticate with stored credentials          WorkflowRunResponse
agent.run_task        Run a multi-step AI task on the current page  TaskRunResponse
agent.download_files  Navigate and download files                   WorkflowRunResponse
agent.run_workflow    Run a Cloud UI workflow on the current page   WorkflowRunResponse

Form automation (Python only)

Method               Purpose                                Returns
fill_form            AI-powered single-page form fill       None
fill_multipage_form  Form fill across multiple pages        int (pages filled)
fill_from_mapping    Fill fields by index-to-value mapping  None
extract_form_fields  Extract all form field metadata        list[dict]
validate_mapping     Check if a field mapping is valid      bool
fill_autocomplete    Fill input with typeahead handling     str

iframe management (Python only)

Method        Purpose                        Returns
frame_switch  Switch context to an iframe    dict
frame_main    Switch back to the main frame  dict
frame_list    List all frames on the page    list[dict]

Page actions

act

Perform any action described in natural language.
await page.act("Click the login button")
await page.act("Scroll down to the pricing section")
Parameter         Type  Required  Description
prompt            str   Yes       What action to perform
skip_refresh      bool  No        Skip page refresh before acting
use_economy_tree  bool  No        Use a smaller DOM tree for faster processing
Returns: None
SDK reference: act

extract

Pull structured data from the visible page.
data = await page.extract(
    "Extract all product names and prices",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
        },
    },
)
Parameter           Type               Required  Description
prompt              str                Yes       What to extract
schema              dict | list | str  No        JSON Schema for typed output
error_code_mapping  dict               No        Map custom error codes
Returns: dict, list, str, or None
SDK reference: extract
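Because extract can return a dict, a list, a str, or None, it can help to normalize the result before iterating over it. The following is a minimal sketch, assuming the array-of-objects schema from the example above. The helper name as_product_list is hypothetical and not part of the SDK.

```python
from typing import Any


def as_product_list(result: Any) -> list:
    """Coerce an extract() result into a list of product dicts.

    Hypothetical helper, not part of the SDK. It assumes the schema shown
    above and drops items that lack a string "name" or a numeric "price".
    """
    if result is None:
        return []
    if isinstance(result, dict):
        result = [result]  # a single object instead of an array
    if not isinstance(result, list):
        raise TypeError(f"expected list or dict, got {type(result).__name__}")

    products = []
    for item in result:
        if not isinstance(item, dict):
            continue
        name, price = item.get("name"), item.get("price")
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
        if isinstance(name, str) and is_number:
            products.append({"name": name, "price": float(price)})
    return products
```

A caller would pass the value returned by page.extract(...) straight into as_product_list and then loop over the result.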

validate

Assert a condition about the current page state.
is_logged_in = await page.validate("The user is logged in")
Parameter  Type  Required  Description
prompt     str   Yes       Condition to check
model      dict  No        Override LLM model config
Returns: bool
SDK reference: validate

prompt

Ask the LLM a question about the current page.
result = await page.prompt(
    "How many items are in the navigation menu?",
    schema={"count": {"type": "integer"}}
)
Parameter  Type  Required  Description
prompt     str   Yes       Question to ask
schema     dict  No        JSON Schema for structured response
model      dict  No        Override LLM model config
Returns: dict, list, str, or None. Without a schema, the result is a dict of the form {"llm_response": "..."} (TypeScript: { llmResponse: "..." }). With a schema, the result is shaped to your schema.
SDK reference: prompt
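Since the schema-less form wraps the answer in a one-key dict, a small unwrapping step keeps calling code uniform. This is a sketch only. unwrap_prompt_result is a hypothetical helper name, and it relies solely on the {"llm_response": "..."} shape described above.

```python
from typing import Any


def unwrap_prompt_result(result: Any) -> Any:
    """Return the bare answer from a schema-less prompt() call.

    Hypothetical helper. A result of the form {"llm_response": ...} is
    unwrapped; schema-shaped results and None pass through unchanged.
    """
    if isinstance(result, dict) and set(result) == {"llm_response"}:
        return result["llm_response"]
    return result
```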

click

Click an element using a selector, AI prompt, or both.
await page.click("#submit-button")                                       # selector
await page.click(prompt="Click the 'Submit' button")                     # AI
await page.click("#submit-button", prompt="Click the 'Submit' button")   # both
Parameter  Type  Required  Description
selector   str   No        CSS selector
prompt     str   No        AI prompt (fallback or primary)
ai         str   No        AI mode: "fallback" (default) or None
Returns: str | None (resolved selector)
SDK reference: click

fill

Fill an input field using a selector, AI prompt, or both.
await page.fill("#email", value="user@example.com")                         # selector
await page.fill(prompt="Fill 'user@example.com' in the email field")         # AI
await page.fill("#email", value="user@example.com", prompt="Fill email")     # both
Parameter        Type  Required  Description
selector         str   No        CSS selector
value            str   No        Value to fill
prompt           str   No        AI prompt (fallback or primary)
ai               str   No        AI mode: "fallback" (default) or None
totp_identifier  str   No        TOTP identifier for 2FA fields
totp_url         str   No        TOTP URL
Returns: str (resolved selector)
SDK reference: fill

select_option

Select a dropdown option using a selector, AI prompt, or both.
await page.select_option("#country", value="us")                                       # selector
await page.select_option(prompt="Select 'United States' from the country dropdown")    # AI
await page.select_option("#country", value="us", prompt="Select United States")         # both
Parameter  Type             Required  Description
selector   str              No        CSS selector
value      str | list[str]  No        Option value(s) to select
prompt     str              No        AI prompt (fallback or primary)
ai         str              No        AI mode: "fallback" (default) or None
Returns: str | None (resolved selector)
SDK reference: select_option

type

Type text character-by-character. Unlike fill, this triggers keystroke events for each character, so use it for fields that react to individual key presses (search autocomplete, OTP inputs). Python only.
# Character-by-character input via selector
await page.type("#search", value="wireless headphones")

# AI-powered type
await page.type(prompt="Type 'hello' into the search box")

# Selector with AI fallback
await page.type("#search", value="query text", prompt="Type into the search field")

# TOTP input from a stored secret
await page.type("#otp", totp_identifier="my-app")

# TOTP generated on the fly from an otpauth URI
await page.type("#otp", totp_url="otpauth://totp/Example:alice?secret=JBSWY3DPEHPK3PXP")
Parameter        Type  Required  Description
selector         str   No        CSS or XPath selector for the input field
value            str   No        Text to type character-by-character
prompt           str   No        Natural-language description of the target field
ai               str   No        AI mode: "fallback" (default) tries the selector first, then AI
totp_identifier  str   No        Identifier for a stored TOTP secret
totp_url         str   No        otpauth:// URI to generate a one-time password on the fly
Returns: str (resolved selector)
SDK reference: type
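To make the otpauth:// URI concrete, the sketch below derives a one-time code from such a URI using only the standard library. It assumes the common RFC 6238 defaults (HMAC-SHA1, 30-second step, 6 digits) and ignores any algorithm, digits, or period parameters in the URI. It illustrates what the URI encodes and is not a description of how the SDK computes codes. totp_from_uri is a hypothetical name.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def totp_from_uri(uri: str, at: Optional[float] = None,
                  step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code from the `secret` parameter of an otpauth URI.

    Assumes RFC 6238 defaults (SHA-1, 30 s step, 6 digits).
    """
    secret = parse_qs(urlparse(uri).query)["secret"][0].upper()
    secret += "=" * (-len(secret) % 8)  # restore stripped base32 padding
    key = base64.b32decode(secret)

    now = time.time() if at is None else at
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226): take 31 bits starting at the offset
    # given by the low nibble of the last digest byte.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)
```

Calling totp_from_uri with the URI from the example above yields the code that is valid for the current 30-second window.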

hover

Move the mouse over an element. Python only.
# Simple hover
await page.hover("#menu-item")

# Hover with a hold duration (useful for tooltip reveals)
await page.hover("#tooltip-trigger", hold_seconds=1.5)

# Hover with intention logging for debugging
await page.hover("#menu-item", intention="Reveal the main menu dropdown")
Parameter     Type   Required  Description
selector      str    Yes       CSS or XPath selector for the target element
timeout       float  No        Max wait time in milliseconds for the element. Defaults to BROWSER_ACTION_TIMEOUT_MS
hold_seconds  float  No        How long to hold the hover, in seconds. Default 0.0
intention     str    No        Description of the hover intent, used for logging
Returns: str (resolved selector)
SDK reference: hover

scroll

Scroll the page by a pixel offset along the x and y axes. Python only.
await page.scroll(0, 500)   # Scroll down 500px
await page.scroll(0, -300)  # Scroll up 300px
await page.scroll(200, 0)   # Scroll right 200px
Parameter  Type  Required  Description
scroll_x   int   Yes       Horizontal scroll offset in pixels. Positive values scroll right
scroll_y   int   Yes       Vertical scroll offset in pixels. Positive values scroll down
Returns: None
SDK reference: scroll
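Long pages are often scrolled in several smaller steps so that lazily loaded content has a chance to appear. The sketch below only computes the per-call offsets. The helper name scroll_offsets and the 500-pixel default step are illustrative assumptions.

```python
def scroll_offsets(total: int, step: int = 500) -> list:
    """Split a pixel distance into offsets no larger than `step`.

    Hypothetical helper. Positive totals produce positive offsets and
    negative totals produce negative offsets, matching the sign convention
    of scroll_x and scroll_y.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    sign = 1 if total >= 0 else -1
    remaining = abs(total)
    offsets = []
    while remaining > 0:
        chunk = min(step, remaining)
        offsets.append(sign * chunk)
        remaining -= chunk
    return offsets
```

Each offset would then be passed as the second argument to page.scroll, for example by looping over scroll_offsets(1200) and awaiting page.scroll(0, dy) for each value.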

upload_file

Upload one or more files to a file input. Pass a selector for direct Playwright behavior, a prompt for AI-powered file input detection, or both. Python only.
# Direct selector
await page.upload_file("#file-input", files="/path/to/file.pdf")

# Multiple files
await page.upload_file("#file-input", files=["/path/to/file1.pdf", "/path/to/file2.pdf"])

# AI-powered file input detection (no selector needed)
await page.upload_file(prompt="Upload the resume to the file input")

# Selector with AI fallback: try the selector first, use AI if it fails
await page.upload_file(
    "#file-input",
    files="/path/to/file.pdf",
    prompt="Upload the resume to the file input",
)
Parameter  Type             Required  Description
selector   str              No        CSS or XPath selector for the file input
files      str | list[str]  No        File path or list of file paths to upload
prompt     str              No        Natural-language description of the file input to target
ai         str              No        AI mode: "fallback" (default) tries the selector first, then AI
Returns: str (resolved selector)
SDK reference: upload_file
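The "fallback" mode is described as trying the selector first and using AI only if that fails. The control flow can be pictured with the sketch below, which uses stub coroutines in place of real page calls. It illustrates the pattern only and makes no claim about the SDK's internals. All names in it are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def with_fallback(
    primary: Callable[[], Awaitable[str]],
    fallback: Callable[[], Awaitable[str]],
) -> str:
    """Await primary(); if it raises, await fallback() instead."""
    try:
        return await primary()
    except Exception:
        return await fallback()


async def _demo() -> str:
    async def by_selector() -> str:
        # Stub: simulate a selector that no longer matches anything.
        raise LookupError("selector did not match")

    async def by_ai() -> str:
        # Stub: simulate an AI-resolved selector.
        return "input[type=file]"

    return await with_fallback(by_selector, by_ai)


resolved = asyncio.run(_demo())
```

In this demo the selector path raises, so resolved holds the value produced by the AI stub.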

locator

Locate an element using a CSS/XPath selector, an AI prompt, or both. When called with a prompt, returns an AILocator, a lazy Playwright Locator that resolves the element via AI on first use. Python only.
# AI-powered: pass a natural-language prompt
locator = page.locator(prompt="the submit button")
await locator.click()

# Full Playwright chaining works
text = await page.locator(prompt="the error message").text_content()

# Standard Playwright selector (no AI, identical to vanilla Playwright)
locator = page.locator("#submit-btn")
await locator.click()

# Selector with AI fallback: try the selector first, use AI if it fails
locator = page.locator("#submit-btn", prompt="the submit button")
await locator.click()
When called with only a selector (no prompt), page.locator(selector) behaves exactly like the standard Playwright page.locator(selector). No AI is involved.
Parameter  Type  Required  Description
selector   str   No        CSS or XPath selector passed to Playwright’s built-in locator()
prompt     str   No        Natural-language description of the element. When provided, returns an AILocator that resolves via AI
ai         str   No        AI mode: "fallback" (default) tries the selector first, then AI
**kwargs         No        Additional keyword arguments forwarded to Playwright’s locator()
Returns: Locator (standard Playwright Locator when only a selector is given, or AILocator when a prompt is provided)

AILocator methods

When prompt is provided, the returned AILocator supports all standard Playwright Locator methods:
  • Actions: click(), fill(), type(), select_option(), check(), uncheck(), clear(), hover(), focus(), press()
  • Queries: text_content(), inner_text(), inner_html(), get_attribute(), input_value(), count()
  • State: is_visible(), is_hidden(), is_enabled(), is_disabled(), is_editable(), is_checked()
  • Chaining: first(), last(), nth(), filter(), locator(), get_by_text(), get_by_role(), get_by_label(), get_by_placeholder()
  • Utilities: wait_for(), screenshot(), playwright_locator (access raw Locator)
SDK reference: locator

Agent methods

agent.run_task returns a TaskRunResponse. agent.login, agent.download_files, and agent.run_workflow each return a WorkflowRunResponse. Follow the links for the full field list.

agent.login

Authenticate with stored credentials. Handles multi-page login flows, CAPTCHAs, and 2FA.
await page.agent.login(
    credential_type="skyvern",
    credential_id="cred_123"
)
Parameter        Type            Required  Description
credential_type  CredentialType  Yes       skyvern, bitwarden, onepassword, or azure_vault
credential_id    str             No        Credential ID (required for skyvern type)
url              str             No        Login page URL
prompt           str             No        Additional login instructions
timeout          float           No        Max wait time in seconds (default: 1800)
Returns: WorkflowRunResponse
SDK reference: agent.login

agent.run_task

Run a multi-step AI task on the current page.
result = await page.agent.run_task(
    "Go to the billing page and download the latest invoice",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "amount": {"type": "string"},
        },
    },
)
print(result.output)
Parameter               Type                            Required  Description
prompt                  str                             Yes       Multi-step goal description
data_extraction_schema  dict | str                      No        JSON Schema for structured output. See Extract Structured Data.
max_steps               int                             No        Cap AI steps. The run terminates with timed_out if hit. Controls cost; each step is one AI decision + action cycle.
engine                  RunEngine                       No        AI engine. Defaults to skyvern-1.0 in the SDK for backward compatibility. Pass skyvern-2.0 for the latest model (also the Cloud UI default). Other options: openai-cua, anthropic-cua, ui-tars.
model                   dict / Record<string, unknown>  No        Override LLM model configuration
url                     str / string                    No        URL to navigate to (defaults to current page URL)
webhook_url             str / string                    No        Callback URL. Skyvern POSTs the full run result on completion or failure. See Webhooks.
totp_identifier         str / string                    No        Identifier for push-based TOTP. See Handle 2FA.
totp_url                str / string                    No        Endpoint Skyvern calls to pull TOTP codes. See Handle 2FA.
title                   str / string                    No        Display name for this run
error_code_mapping      dict / Record<string, string>   No        Map custom error codes to conditions. Keys are your error codes, values describe when to trigger them. If matched, output contains {"error": "your_code"}.
user_agent              str / string                    No        Custom User-Agent header for the browser
timeout                 float                           No        Max wait in seconds (default: 1800)
Returns: TaskRunResponse
SDK reference: agent.run_task
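The error_code_mapping parameter pairs your own error codes with plain-language trigger conditions, and a match appears in the output as {"error": "your_code"}. The sketch below shows one way to check a run's output for a mapped code. The codes and conditions in ERROR_CODES and the helper matched_error_code are made-up examples, not values defined by the SDK.

```python
from typing import Any, Optional

# Hypothetical mapping: keys are custom codes, values describe the trigger.
ERROR_CODES = {
    "invoice_missing": "No invoice is listed on the billing page",
    "login_required": "The page redirects to a login form",
}


def matched_error_code(output: Any, mapping: dict) -> Optional[str]:
    """Return the custom error code found in a run output, if any.

    Only codes that appear as keys in `mapping` count as a match.
    """
    if isinstance(output, dict):
        code = output.get("error")
        if isinstance(code, str) and code in mapping:
            return code
    return None
```

Under these assumptions, ERROR_CODES would be passed as error_code_mapping to agent.run_task, and result.output would be passed to matched_error_code once the run finishes.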

agent.download_files

Navigate and download files from the current page.
result = await page.agent.download_files(
    "Download the latest invoice PDF",
    download_suffix=".pdf",
    download_timeout=30,
)
Parameter          Type   Required  Description
prompt             str    Yes       What to download
download_suffix    str    No        Filename hint prepended to the saved file (e.g., "invoice" → invoice.pdf). Not a validator; mismatched extensions don’t fail the run.
download_timeout   float  No        Soft hint (seconds) for how long to wait for a download. The overall timeout is what actually fails the run.
max_steps_per_run  int    No        Cap AI steps
timeout            float  No        Max wait in seconds (default: 1800)
Returns: WorkflowRunResponse
SDK reference: agent.download_files

agent.run_workflow

Run a Cloud UI workflow on the current page.
result = await page.agent.run_workflow(
    "wpid_monthly_report",
    parameters={"month": "2025-03"}
)
Parameter        Type          Required  Description
workflow_id      str           Yes       Workflow permanent ID
parameters       dict          No        Workflow input parameters
template         bool          No        Run a template workflow
title            str / string  No        Display name for this run
webhook_url      str / string  No        Callback URL for run completion. See Webhooks.
totp_url         str / string  No        Endpoint for pulling TOTP codes
totp_identifier  str / string  No        Identifier for push-based TOTP
timeout          float         No        Max wait in seconds (default: 1800)
Returns: WorkflowRunResponse
SDK reference: agent.run_workflow

Form automation (Python only)

fill_form

Fill a single-page form using AI. Pass a data dict describing the values to fill.
# Simple form fill
await page.fill_form(
    data={"name": "John Doe", "email": "john@example.com", "role": "Engineer"},
)

# With a custom prompt to guide the AI
await page.fill_form(
    data={"name": "John Doe", "email": "john@example.com"},
    prompt="Fill out the registration form with the provided user details",
)
Parameter  Type            Required  Description
data       dict[str, Any]  Yes       Key-value pairs of form data to fill
prompt     str             No        Instruction for the AI. Defaults to "Fill out the form"
Returns: None
SDK reference: fill_form

fill_multipage_form

Fill a form that spans multiple pages, handling page transitions automatically.
pages_filled = await page.fill_multipage_form(
    data={"name": "John Doe", "email": "john@example.com", "address": "123 Main St"},
    max_pages=5,
)
print(f"Filled {pages_filled} pages")
Parameter        Type            Required  Description
data             dict[str, Any]  Yes       Key-value pairs of form data to fill across all pages
prompt           str             No        Instruction for the AI. Defaults to "Fill out the form"
next_button      str             No        Selector or description of the button to advance to the next page
max_pages        int             No        Maximum number of pages to fill. Defaults to 10
timeout_seconds  float           No        Timeout in seconds for the entire operation. Defaults to 300
Returns: int (the number of pages filled)
SDK reference: fill_multipage_form

fill_from_mapping

Fill form fields using an explicit index-based mapping produced by extract_form_fields. Use this when you need precise control over which field gets which value.
fields = await page.extract_form_fields()
await page.fill_from_mapping(
    form_fields=fields,
    mapping={0: "John", 1: "Doe", 2: "john@example.com"},  # keys are field indices
    data={"name": "John Doe"},  # optional context
)
Parameter    Type                                 Required  Description
form_fields  list[dict[str, Any]]                 Yes       Field metadata returned by extract_form_fields
mapping      dict[int, str | list | bool | None]  Yes       Map of field index to the value to fill
data         dict[str, Any] | None                No        Optional source data for context. Defaults to None
Returns: None
SDK reference: fill_from_mapping
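Rather than hard-coding indices, the mapping can be derived from the extracted field metadata. The sketch below assumes each field dict carries "name" and "index" keys, as in the extract_form_fields example output on this page. build_mapping is a hypothetical helper, and its case-insensitive name matching is a simplification.

```python
from typing import Any


def build_mapping(fields: list, data: dict) -> dict:
    """Map field index -> value by matching field names case-insensitively.

    Hypothetical helper. Fields without a match in `data`, or without an
    "index" key, are left out of the mapping.
    """
    lookup = {key.strip().lower(): value for key, value in data.items()}
    mapping: dict = {}
    for field in fields:
        name = str(field.get("name", "")).strip().lower()
        if name in lookup and "index" in field:
            mapping[field["index"]] = lookup[name]
    return mapping
```

The resulting dict has the index-to-value shape that fill_from_mapping and validate_mapping take as their mapping argument.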

extract_form_fields

Extract all form fields with metadata from the current page. Use the output to drive fill_from_mapping or validate_mapping.
fields = await page.extract_form_fields()
# Returns a list of dicts with field name, type, options, and other metadata:
# [{"name": "First Name", "type": "text", "index": 0, ...}, ...]
Returns: list[dict[str, Any]] where each dict contains field name, type, options, and other metadata.
SDK reference: extract_form_fields

validate_mapping

Check if a field mapping is correct for the current form.
fields = await page.extract_form_fields()
is_valid = await page.validate_mapping(
    form_fields=fields,
    mapping={0: "John", 1: "Doe"},
    prompt="Validate the name fields are filled correctly",
)
Parameter    Type                                 Required  Description
form_fields  list[dict[str, Any]]                 Yes       Field metadata returned by extract_form_fields
mapping      dict[int, str | list | bool | None]  Yes       Map of field index to the value to validate
prompt       str                                  Yes       Instruction describing what to validate
Returns: bool (True if the mapping is valid, False otherwise)
SDK reference: validate_mapping

fill_autocomplete

Fill an input that has autocomplete/typeahead behavior. Types the value, waits for suggestions, then clicks the matching option.
# Direct selector
await page.fill_autocomplete(
    selector="#city",
    value="San Francisco",
    option_selector=".autocomplete-option",
    wait_seconds=1.5,
)

# AI-powered
await page.fill_autocomplete(prompt="Fill 'San Francisco' in the city autocomplete")
Parameter        Type   Required  Description
selector         str    No        CSS selector for the input field
value            str    No        The text value to type into the field
prompt           str    No        Natural-language description of the field and value
ai               str    No        AI mode: "fallback" (default) tries the selector first, then AI
option_selector  str    No        CSS selector for the autocomplete dropdown options
wait_seconds     float  No        Seconds to wait for the dropdown to appear. Default 1.5
**kwargs                No        Standard Playwright fill options (e.g., timeout, force)
Returns: str (resolved selector)
SDK reference: fill_autocomplete

iframe management (Python only)

frame_switch

Switch the working context to an iframe. Exactly one of selector, name, or index must be provided.
# By CSS selector
await page.frame_switch(selector="#payment-iframe")

# By frame name
await page.frame_switch(name="checkout")

# By zero-based index
await page.frame_switch(index=0)
Parameter  Type  Required  Description
selector   str   No        CSS selector for the iframe element
name       str   No        The name attribute of the iframe
index      int   No        Zero-based index of the iframe on the page
Returns: dict[str, Any] (frame metadata for the switched-to iframe)
SDK reference: frame_switch
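The exactly-one-of rule for selector, name, and index can be checked before the call is made. The helper below is a hypothetical sketch, not SDK code. It compares against None rather than testing truthiness so that index=0 still counts as provided.

```python
from typing import Optional


def frame_target(selector: Optional[str] = None,
                 name: Optional[str] = None,
                 index: Optional[int] = None) -> dict:
    """Return the single keyword argument to pass to frame_switch().

    Raises ValueError unless exactly one of the three is provided.
    """
    candidates = {"selector": selector, "name": name, "index": index}
    # `is not None` keeps index=0 (the first iframe) valid.
    given = {key: value for key, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError("provide exactly one of selector, name, or index")
    return given
```

The returned dict can be unpacked into the call, as in page.frame_switch(**frame_target(index=0)).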

frame_main

Switch back to the main page frame after working inside an iframe.
page.frame_main()
Returns: dict[str, str]
SDK reference: frame_main

frame_list

List all frames on the current page with metadata.
frames = await page.frame_list()
# [{"name": "checkout", "url": "...", "index": 0}, ...]
Returns: list[dict[str, Any]] (metadata for each frame on the page)
SDK reference: frame_list
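The frame_list output can be searched to find which frame to switch to. This sketch assumes each entry has "name" and "index" keys, as in the example comment above. find_frame_index is a hypothetical helper.

```python
from typing import Any, Optional


def find_frame_index(frames: list, name: str) -> Optional[int]:
    """Return the index of the first frame whose name matches, else None."""
    for frame in frames:
        if frame.get("name") == name:
            return frame.get("index")
    return None
```

A non-None result can then be passed to page.frame_switch as the index argument.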