Overview

@scrapegraph-ai/ai-sdk exposes ScrapeGraphAI endpoints as Vercel AI SDK tools. Add the tools to generateText or streamText, set stopWhen, and the model can scrape, extract, search, crawl, and monitor web data during the run.

Vercel AI SDK docs: official Vercel AI SDK documentation

Tool calling: how AI SDK Core tools are executed

Installation

Install the ScrapeGraphAI tool package, the AI SDK, and the model provider you use, with whichever package manager your project uses (npm, pnpm, yarn, or bun):
npm i @scrapegraph-ai/ai-sdk ai @ai-sdk/openai
pnpm add @scrapegraph-ai/ai-sdk ai @ai-sdk/openai
yarn add @scrapegraph-ai/ai-sdk ai @ai-sdk/openai
bun add @scrapegraph-ai/ai-sdk ai @ai-sdk/openai
Set your keys:
export SGAI_API_KEY="your-scrapegraph-key"
export OPENAI_API_KEY="your-openai-key"
The tools read SGAI_API_KEY from the environment by default. You can also pass { apiKey: process.env.SGAI_API_KEY } to any tool factory.
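To make a missing key fail early rather than at the first tool call, the lookup described above can be wrapped in a small helper. This is a minimal sketch: `resolveApiKey` and the `Env` type are hypothetical names, not part of @scrapegraph-ai/ai-sdk, and the only documented behaviour it relies on is that a tool factory accepts `{ apiKey }`.

```typescript
// Hypothetical helper (not part of @scrapegraph-ai/ai-sdk): choose the key to
// hand to a tool factory and fail with a clear message if none is available.
type Env = Record<string, string | undefined>;

function resolveApiKey(env: Env, explicit?: string): string {
  // An explicitly passed key wins; otherwise fall back to SGAI_API_KEY.
  const key = (explicit ?? env["SGAI_API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error("SGAI_API_KEY is not set and no apiKey was passed");
  }
  return key;
}

// Usage, assuming a Node-style process.env:
//   const tools = { scrape: scrapeTool({ apiKey: resolveApiKey(process.env) }) };
```

Taking the environment as a parameter keeps the helper easy to test and usable in runtimes where configuration is not exposed through `process.env`.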

Quickstart

Give the model a scrape tool and allow multiple steps so it can call the tool, receive the result, then write the final answer.
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs } from "ai";
import { scrapeTool } from "@scrapegraph-ai/ai-sdk";

const { text } = await generateText({
  model: openai("gpt-5-nano"),
  prompt:
    "Scrape Hacker News and write a short, concise summary of what people are talking about today.",
  tools: {
    scrape: scrapeTool(),
  },
  stopWhen: stepCountIs(3),
});

console.log(text);
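Why the step limit matters can be seen in a toy version of the loop. The sketch below is not the AI SDK's implementation: `fakeModel` and `runSteps` are hypothetical stand-ins that only mimic the pattern described above, where one step produces a tool call and a later step produces the final text.

```typescript
// Toy simulation of a multi-step tool loop. All names here are hypothetical.
type ModelTurn =
  | { kind: "tool-call"; toolName: string; input: string }
  | { kind: "text"; text: string };

// Stand-in for a model: asks for a scrape once, then answers from the result.
function fakeModel(history: string[]): ModelTurn {
  const hasToolResult = history.some((m) => m.startsWith("tool-result:"));
  return hasToolResult
    ? { kind: "text", text: "Summary based on scraped content." }
    : { kind: "tool-call", toolName: "scrape", input: "https://news.ycombinator.com" };
}

function runSteps(maxSteps: number): { text: string; stepsUsed: number } {
  const history: string[] = ["user: summarize Hacker News"];
  let text = "";
  let stepsUsed = 0;
  while (stepsUsed < maxSteps) {
    stepsUsed++;
    const turn = fakeModel(history);
    if (turn.kind === "text") {
      text = turn.text; // final answer: the loop ends here
      break;
    }
    // Tool call: record a placeholder result so the next step can use it.
    history.push(`tool-result: <markdown for ${turn.input}>`);
  }
  return { text, stepsUsed };
}

console.log(runSteps(1)); // the single step is spent on the tool call, so text stays empty
console.log(runSteps(3)); // the second step writes the answer; the third is not needed
```

In this simplified model, a limit of one step never reaches the final answer, which is why the quickstart allows a few steps of headroom.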

Available tools

| Factory | What it gives the model |
| --- | --- |
| scrapeTool() | Scrape a page as markdown, HTML, JSON, links, images, summary, branding, or screenshot |
| extractTool() | Extract structured JSON from a URL, HTML, or markdown with a prompt |
| searchTool() | Search the web and optionally extract structured data from results |
| crawlTools() | Start, poll, page through, stop, resume, and delete crawl jobs |
| monitorTools() | Create, list, update, pause, resume, delete, and inspect monitor activity |
Use a narrow tool set when the task is specific. Use all tools when the agent needs to decide the workflow:
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs } from "ai";
import {
  crawlTools,
  extractTool,
  monitorTools,
  scrapeTool,
  searchTool,
} from "@scrapegraph-ai/ai-sdk";

const { text } = await generateText({
  model: openai("gpt-5-nano"),
  prompt: "Search for ScrapeGraphAI docs, scrape the best page, and summarize it.",
  tools: {
    scrape: scrapeTool(),
    extract: extractTool(),
    search: searchTool(),
    ...crawlTools(),
    ...monitorTools(),
  },
  stopWhen: stepCountIs(10),
});

console.log(text);
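When only part of a larger tool map should be exposed for a given task, one option is to filter the map before passing it to `tools`. The sketch below uses a hypothetical `pickTools` helper and plain placeholder objects in place of the real tool instances, so it runs without any packages installed.

```typescript
// Hypothetical helper: keep only the named entries of a tool map.
function pickTools<T extends Record<string, unknown>, K extends keyof T>(
  all: T,
  names: readonly K[],
): Pick<T, K> {
  const picked = {} as Pick<T, K>;
  for (const name of names) {
    picked[name] = all[name];
  }
  return picked;
}

// Placeholder objects; in a real app these entries would come from
// scrapeTool(), searchTool(), and ...crawlTools().
const allTools = {
  scrape: { description: "scrape a page" },
  search: { description: "search the web" },
  startCrawl: { description: "start a crawl job" },
  getCrawl: { description: "poll a crawl job" },
};

// Narrow set for a task that only needs to search and read pages.
const narrowTools = pickTools(allTools, ["scrape", "search"]);
console.log(Object.keys(narrowTools));
```

A narrower map gives the model fewer ways to wander off task, at the cost of flexibility when the workflow is not known in advance.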

Scrape example

This is the smallest useful agent: one scrape tool, a concrete target, and enough steps for the model to call the tool before answering.
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs } from "ai";
import { scrapeTool } from "@scrapegraph-ai/ai-sdk";

const result = await generateText({
  model: openai("gpt-5-nano"),
  prompt: "Find the main headline on https://example.com",
  tools: {
    scrape: scrapeTool(),
  },
  stopWhen: stepCountIs(5),
});

console.log(result.text);
Pass the API key explicitly when the tools cannot read SGAI_API_KEY from the environment on their own:
const tools = {
  scrape: scrapeTool({ apiKey: process.env.SGAI_API_KEY }),
};

Crawl example

crawlTools() gives the model the full async crawl loop: start the job with startCrawl, poll its status with getCrawl, then retrieve paginated pages with getCrawlPages.
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs } from "ai";
import { crawlTools } from "@scrapegraph-ai/ai-sdk";

const { text, steps } = await generateText({
  model: openai("gpt-5-nano"),
  prompt:
    "Find 10 https://scrapegraphai.com/ blog posts. Start a crawl, poll its status, fetch crawled pages with getCrawlPages, then summarize what you found.",
  tools: {
    ...crawlTools(),
  },
  stopWhen: stepCountIs(20),
});

for (const step of steps) {
  for (const toolCall of step.toolCalls) {
    console.log(`[tool] ${toolCall.toolName}`);
    console.log(JSON.stringify(toolCall.input, null, 2));
  }
}

console.log(text);
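For runs with many steps, a per-tool tally is often easier to read than the raw log. The sketch below assumes only the two fields used in the loop above (`step.toolCalls` and `toolCall.toolName`); `StepLike`, `countToolCalls`, and the sample data are hypothetical and hand-written for illustration.

```typescript
// Minimal structural type covering only the fields this sketch reads.
type StepLike = { toolCalls: { toolName: string }[] };

// Count how many times each tool was called across all steps.
function countToolCalls(steps: StepLike[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const step of steps) {
    for (const call of step.toolCalls) {
      counts[call.toolName] = (counts[call.toolName] ?? 0) + 1;
    }
  }
  return counts;
}

// Hand-written sample shaped like a crawl run: start, poll twice, fetch pages,
// then a final step with no tool calls.
const sampleSteps: StepLike[] = [
  { toolCalls: [{ toolName: "startCrawl" }] },
  { toolCalls: [{ toolName: "getCrawl" }] },
  { toolCalls: [{ toolName: "getCrawl" }] },
  { toolCalls: [{ toolName: "getCrawlPages" }] },
  { toolCalls: [] },
];

console.log(countToolCalls(sampleSteps));
```

A high getCrawl count relative to the step limit is a hint that polling is consuming most of the budget.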
For longer crawls, keep the same tools but add your app’s own timeout, cancellation, and persistence around the AI SDK call.
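One way to add a timeout is to wrap the call in a helper that supplies an AbortSignal. This is a sketch under assumptions: `runWithTimeout` is a hypothetical helper, and the usage comment relies on generateText accepting an `abortSignal` option. Persistence and user-driven cancellation are left out.

```typescript
// Hypothetical wrapper: runs `task` with an AbortSignal that fires after `ms`.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // avoid a stray timer once the task has settled
  }
}

// Usage sketch (assumes generateText's abortSignal option):
//   const { text } = await runWithTimeout(
//     (abortSignal) =>
//       generateText({ model, prompt, tools: { ...crawlTools() }, stopWhen, abortSignal }),
//     120_000,
//   );
```

Aborting the call stops the local run only. Whether a crawl job already started on the service side keeps running is a separate question, so an app may still want to stop or delete the job explicitly.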

Tool reference

Scrape

import { scrapeTool } from "@scrapegraph-ai/ai-sdk";

const tools = {
  scrape: scrapeTool(),
};

Extract

import { extractTool } from "@scrapegraph-ai/ai-sdk";

const tools = {
  extract: extractTool(),
};

Search

import { searchTool } from "@scrapegraph-ai/ai-sdk";

const tools = {
  search: searchTool(),
};

Crawl

import { crawlTools } from "@scrapegraph-ai/ai-sdk";

const tools = {
  ...crawlTools(),
};
crawlTools() registers startCrawl, getCrawl, getCrawlPages, stopCrawl, resumeCrawl, and deleteCrawl.

Monitor

import { monitorTools } from "@scrapegraph-ai/ai-sdk";

const tools = {
  ...monitorTools(),
};
monitorTools() registers createMonitor, listMonitors, getMonitor, updateMonitor, deleteMonitor, pauseMonitor, resumeMonitor, and getMonitorActivity.

Support

GitHub Issues: report bugs and request features

Discord Community: get help from our community