By default, Skyvern returns extracted data in whatever format makes sense for the task. Pass a data_extraction_schema to enforce a specific structure using JSON Schema.

Define a schema

Add the data_extraction_schema parameter to your task, passing a JSON Schema object:
result = await client.run_task(
    prompt="Get the title of the top post",
    url="https://news.ycombinator.com",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "The title of the top post"
            }
        }
    }
)
The description field in each property tells Skyvern what data to extract, and it drives extraction quality. Vague descriptions like “the data” produce vague results. Be specific, for example: “The product price in USD, without currency symbol.”
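Because descriptions matter this much, it can help to check a schema for properties that lack one before running a task. The sketch below uses a hypothetical helper, `missing_descriptions` (not part of the Skyvern SDK), that walks a JSON Schema dict, including nested objects and array item schemas, and reports the path of every property without a non-empty description.

```python
def missing_descriptions(schema: dict, path: str = "$") -> list:
    """Hypothetical helper (not part of the Skyvern SDK).

    Returns the paths of properties in a JSON Schema dict that have no
    non-empty "description".
    """
    missing = []
    # Check each named property of an object schema
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}.{name}"
        if not str(prop.get("description", "")).strip():
            missing.append(prop_path)
        # Recurse into nested objects (and arrays, via their "items")
        missing.extend(missing_descriptions(prop, prop_path))
    # Array schemas describe their elements under "items"
    items = schema.get("items")
    if isinstance(items, dict):
        missing.extend(missing_descriptions(items, f"{path}[]"))
    return missing


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "The title of the top post"},
        "points": {"type": "integer"},  # no description: will be reported
    },
}
print(missing_descriptions(schema))  # ['$.points']
```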

Schema format

Skyvern uses standard JSON Schema. Common types:
| Type | JSON Schema | Example value |
|---|---|---|
| String | {"type": "string"} | "Hello world" |
| Number | {"type": "number"} | 19.99 |
| Integer | {"type": "integer"} | 42 |
| Boolean | {"type": "boolean"} | true |
| Array | {"type": "array", "items": {...}} | [1, 2, 3] |
| Object | {"type": "object", "properties": {...}} | {"key": "value"} |
A schema doesn’t guarantee all fields are populated. If the data isn’t on the page, fields return null. Design your code to handle missing values.
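The type table and the possibility of null values can be checked together once the output is decoded into Python. The following is a minimal sketch, not Skyvern's own validation: the hypothetical `type_errors` function walks a schema alongside the output and reports values whose Python type does not match, treating `None` (JSON null) as acceptable everywhere. It assumes the output has already been decoded into plain dicts and lists.

```python
# JSON Schema type -> Python types produced when JSON is decoded
PY_TYPES = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def type_errors(schema: dict, value, path: str = "$") -> list:
    """Hypothetical checker: list type mismatches; None (null) is allowed."""
    if value is None:
        return []  # missing data comes back as null, which is not an error here
    expected = schema.get("type")
    if expected in PY_TYPES:
        ok = isinstance(value, PY_TYPES[expected])
        # bool is a subclass of int in Python, so reject it for number/integer
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if expected == "object":
        # Missing keys are read as None and therefore tolerated
        for name, sub in schema.get("properties", {}).items():
            errors.extend(type_errors(sub, value.get(name), f"{path}.{name}"))
    elif expected == "array" and isinstance(schema.get("items"), dict):
        for i, item in enumerate(value):
            errors.extend(type_errors(schema["items"], item, f"{path}[{i}]"))
    return errors


schema = {
    "type": "object",
    "properties": {"price": {"type": "number"}, "in_stock": {"type": "boolean"}},
}
print(type_errors(schema, {"price": 104521.37, "in_stock": None}))  # []
print(type_errors(schema, {"price": "104,521"}))  # ['$.price: expected number, got str']
```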

Build your schema

Use the interactive builder to generate a schema, then copy it into your code.
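If you prefer to build schemas in code, a couple of small helpers can cut down on the nesting. This is a sketch with hypothetical names (`field` and `object_schema` are illustrative, not Skyvern SDK functions); they only produce the same plain JSON Schema dicts used in the examples on this page.

```python
def field(type_: str, description: str, **extra) -> dict:
    """Hypothetical helper: one property with a type, a description, and any
    extra JSON Schema keys (for example items=... for arrays)."""
    return {"type": type_, "description": description, **extra}


def object_schema(**properties: dict) -> dict:
    """Hypothetical helper: wrap named property schemas in an object schema."""
    return {"type": "object", "properties": properties}


# Same structure as the "top post" schema at the top of this page
top_post_schema = object_schema(
    title=field("string", "The title of the top post"),
)

# Same structure as the "List of items" example below
posts_schema = object_schema(
    posts=field(
        "array",
        "Top 5 posts from the front page",
        items=object_schema(
            title=field("string", "Post title"),
            points=field("integer", "Number of points"),
            url=field("string", "Link to the post"),
        ),
    ),
)
```

The resulting dicts can be passed directly as data_extraction_schema.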

Examples

Single value

Extract one piece of information, such as the current price of Bitcoin:
result = await client.run_task(
    prompt="Get the current Bitcoin price in USD",
    url="https://coinmarketcap.com/currencies/bitcoin/",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "price": {
                "type": "number",
                "description": "Current Bitcoin price in USD"
            }
        }
    }
)
Output (when completed):
{
  "price": 104521.37
}

List of items

Extract multiple items with the same structure, such as the top posts from a news site:
result = await client.run_task(
    prompt="Get the top 5 posts",
    url="https://news.ycombinator.com",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "posts": {
                "type": "array",
                "description": "Top 5 posts from the front page",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {
                            "type": "string",
                            "description": "Post title"
                        },
                        "points": {
                            "type": "integer",
                            "description": "Number of points"
                        },
                        "url": {
                            "type": "string",
                            "description": "Link to the post"
                        }
                    }
                }
            }
        }
    }
)
Output (when completed):
{
  "posts": [
    {
      "title": "Running Claude Code dangerously (safely)",
      "points": 342,
      "url": "https://blog.emilburzo.com/2026/01/running-claude-code-dangerously-safely/"
    },
    {
      "title": "Linux kernel framework for PCIe device emulation",
      "points": 287,
      "url": "https://github.com/cakehonolulu/pciem"
    },
    {
      "title": "I'm addicted to being useful",
      "points": 256,
      "url": "https://www.seangoedecke.com/addicted-to-being-useful/"
    },
    {
      "title": "Level S4 solar radiation event",
      "points": 198,
      "url": "https://www.swpc.noaa.gov/news/g4-severe-geomagnetic-storm"
    },
    {
      "title": "WebAssembly Text Format parser performance",
      "points": 176,
      "url": "https://blog.gplane.win/posts/improve-wat-parser-perf.html"
    }
  ]
}
Without a limit, an array extracts every matching item visible on the page. To control output size, state a limit in your prompt (e.g., “top 5 posts”) or in the array's description.
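If downstream code depends on an exact count, you may also want to enforce the limit on your side as a safeguard. A minimal sketch, assuming the output shape of the posts example above; `capped_posts` is a hypothetical helper that keeps page order, drops entries with no title, and tolerates a null or missing array.

```python
from typing import Optional


def capped_posts(output: Optional[dict], limit: int = 5) -> list:
    """Hypothetical helper: keep page order, drop untitled posts, cap at `limit`."""
    posts = (output or {}).get("posts") or []  # the array itself may be null
    valid = [p for p in posts if isinstance(p, dict) and p.get("title")]
    return valid[:limit]


# Illustrative output: one untitled entry followed by eight posts
output = {
    "posts": [{"title": None, "points": 1}]
    + [{"title": f"Post {i}", "points": 100 - i} for i in range(8)]
}
print([p["title"] for p in capped_posts(output)])
# ['Post 0', 'Post 1', 'Post 2', 'Post 3', 'Post 4']
print(capped_posts(None))  # []
```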

Nested objects

Extract hierarchical data, such as a product with its pricing and availability:
result = await client.run_task(
    prompt="Get product details including pricing and availability",
    url="https://www.amazon.com/dp/B0EXAMPLE",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "product": {
                "type": "object",
                "description": "Product information",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Product name"
                    },
                    "pricing": {
                        "type": "object",
                        "description": "Pricing details",
                        "properties": {
                            "current_price": {
                                "type": "number",
                                "description": "Current price in USD"
                            },
                            "original_price": {
                                "type": "number",
                                "description": "Original price before discount"
                            },
                            "discount_percent": {
                                "type": "integer",
                                "description": "Discount percentage"
                            }
                        }
                    },
                    "availability": {
                        "type": "object",
                        "description": "Stock information",
                        "properties": {
                            "in_stock": {
                                "type": "boolean",
                                "description": "Whether the item is in stock"
                            },
                            "delivery_estimate": {
                                "type": "string",
                                "description": "Estimated delivery date"
                            }
                        }
                    }
                }
            }
        }
    }
)
Output (when completed):
{
  "product": {
    "name": "Wireless Bluetooth Headphones",
    "pricing": {
      "current_price": 79.99,
      "original_price": 129.99,
      "discount_percent": 38
    },
    "availability": {
      "in_stock": true,
      "delivery_estimate": "Tomorrow, Jan 21"
    }
  }
}
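Nested output adds another failure mode: an intermediate object such as pricing or availability can itself be null, so chained indexing like output["product"]["availability"]["in_stock"] would raise a TypeError. Below is a minimal sketch of a null-safe lookup; `dig` is a hypothetical helper, and the input is an illustrative variant of the output above with availability set to null. The last step recomputes the discount from the two extracted prices as a sanity check.

```python
from typing import Any


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Hypothetical helper: follow keys through nested dicts, returning
    `default` as soon as a level is missing, null, or not a dict."""
    for key in keys:
        if not isinstance(data, dict) or data.get(key) is None:
            return default
        data = data[key]
    return data


# Illustrative variant of the output above, with stock info missing
output = {
    "product": {
        "name": "Wireless Bluetooth Headphones",
        "pricing": {"current_price": 79.99, "original_price": 129.99, "discount_percent": 38},
        "availability": None,
    }
}

current = dig(output, "product", "pricing", "current_price")
original = dig(output, "product", "pricing", "original_price")
in_stock = dig(output, "product", "availability", "in_stock", default=False)

# Cross-check the extracted discount against the two prices
if current is not None and original:
    print(round((original - current) / original * 100))  # 38
print(in_stock)  # False
```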

Accessing extracted data

The extracted data appears in the output field of the completed run. Poll until the task reaches a terminal state, then access the output.
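import asyncio

# Statuses after which the run will not change any further
TERMINAL_STATUSES = {"completed", "failed", "terminated", "timed_out", "canceled"}

result = await client.run_task(
    prompt="Get the top post",
    url="https://news.ycombinator.com",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Post title"},
            "points": {"type": "integer", "description": "Number of points"}
        }
    }
)

run_id = result.run_id

# Poll every 5 seconds until the run reaches a terminal state
while True:
    run = await client.get_run(run_id)
    if run.status in TERMINAL_STATUSES:
        break
    await asyncio.sleep(5)

# Access the extracted data (only meaningful for a completed run)
if run.status == "completed":
    print(f"Output: {run.output}")
else:
    print(f"Run ended with status: {run.status}")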
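The loop above waits indefinitely. If you want an upper bound, a deadline can be added. The sketch below is a hypothetical, client-agnostic helper (`wait_for_run` is not a Skyvern SDK function): it accepts any async `get_run` callable, assumes, as the loop above does, that the returned object has a status attribute using the string values listed there, and raises TimeoutError if no terminal state is reached in time. The usage example substitutes a fake client whose statuses are made up, so it runs without network access.

```python
import asyncio
import time
from types import SimpleNamespace

TERMINAL_STATUSES = {"completed", "failed", "terminated", "timed_out", "canceled"}


async def wait_for_run(get_run, run_id: str, interval: float = 5.0, timeout: float = 600.0):
    """Poll get_run(run_id) until a terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        run = await get_run(run_id)
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.status!r} after {timeout}s")
        await asyncio.sleep(interval)


async def main():
    # Fake get_run that reports completion on the third poll (no network needed)
    statuses = iter(["queued", "running", "completed"])

    async def fake_get_run(run_id):
        return SimpleNamespace(status=next(statuses), output={"title": "Example"})

    run = await wait_for_run(fake_get_run, "run_123", interval=0.01)
    print(run.status, run.output)  # completed {'title': 'Example'}


asyncio.run(main())
```

With a real client, the call would look like `run = await wait_for_run(client.get_run, run_id)`.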
If you use webhooks, the same output field appears in the webhook payload.

Next steps

Task Parameters

All available parameters for run_task

Run a Task

Execute tasks and retrieve results