Define JSON Schema extraction schemas for Skyvern tasks to get consistent, typed output. Build schemas for single objects or arrays of items with the interactive schema builder.
When building browser automations in code, you can extract structured data from any page using page.extract with a JSON schema, or by passing a data_extraction_schema to page.agent.run_task. By default, Skyvern returns extracted data in whatever format makes sense for the action. Pass a schema to enforce a specific shape using JSON Schema.If you’re using the Cloud UI workflow editor instead, extraction works through the Extract block. See Block Types and Configuration for setup.
Pass a JSON Schema object to page.extract via the schema parameter, or to run_task via data_extraction_schema:
page.extract (browser automation)
run_task
data = await page.extract( "Get the title of the top post", schema={ "type": "object", "properties": { "title": { "type": "string", "description": "The title of the top post" } } },)
result = await client.run_task( prompt="Get the title of the top post", url="https://news.ycombinator.com", data_extraction_schema={ "type": "object", "properties": { "title": { "type": "string", "description": "The title of the top post" } } })
The description field in each property helps Skyvern understand what data to extract. Be specific.
description fields drive extraction quality. Vague descriptions like “the data” produce vague results. Be specific: “The product price in USD, without currency symbol.”
Extract multiple items with the same structure, such as the top posts from a news site:
result = await client.run_task( prompt="Get the top 5 posts", url="https://news.ycombinator.com", data_extraction_schema={ "type": "object", "properties": { "posts": { "type": "array", "description": "Top 5 posts from the front page", "items": { "type": "object", "properties": { "title": { "type": "string", "description": "Post title" }, "points": { "type": "integer", "description": "Number of points" }, "url": { "type": "string", "description": "Link to the post" } } } } } })
Output (when completed):
{ "posts": [ { "title": "Running Claude Code dangerously (safely)", "points": 342, "url": "https://blog.emilburzo.com/2026/01/running-claude-code-dangerously-safely/" }, { "title": "Linux kernel framework for PCIe device emulation", "points": 287, "url": "https://github.com/cakehonolulu/pciem" }, { "title": "I'm addicted to being useful", "points": 256, "url": "https://www.seangoedecke.com/addicted-to-being-useful/" }, { "title": "Level S4 solar radiation event", "points": 198, "url": "https://www.swpc.noaa.gov/news/g4-severe-geomagnetic-storm" }, { "title": "WebAssembly Text Format parser performance", "points": 176, "url": "https://blog.gplane.win/posts/improve-wat-parser-perf.html" } ]}
Arrays without limits extract everything visible on the page. Specify limits in your prompt (e.g., “top 5 posts”) or the array description to control output size.
page.extract returns the extracted data directly as the return value:
data = await page.extract( "Get the top post", schema={ "type": "object", "properties": { "title": {"type": "string", "description": "Post title"}, "points": {"type": "integer", "description": "Points"} } },)# data is the extracted result directlyprint(f"Title: {data['title']}")print(f"Points: {data['points']}")