Diffbot

Category: artificial intelligence | Auth: API_KEY | Actions: 35



About

What is Diffbot?

Diffbot provides AI-powered tools to extract and structure data from web pages, transforming unstructured web content into structured, linked data. On Nagent, Diffbot is exposed as a fully configurable artificial intelligence integration that any agent can call, with 35 actions and API key authentication. No code is required to wire Diffbot into your workflow: connect it once via the External Integrations panel and reuse it across every agent you build.

What can you build with Diffbot?

Agent builders use Diffbot to automate tasks that artificial intelligence teams previously handled manually. Concrete examples, each of which is a single agent step in Nagent, include:

Combine Entity Profiles — Combine multiple entity profiles into a unified view using the Diffbot Knowledge Graph.
Create Bulk Extract Job — Tool to submit a bulk extract job to process multiple URLs with Extract APIs.
Create or Update Custom API — Tool to create or update the parameters and ruleset of a Custom API.
Create Bulk Enhance Job — Tool to submit a bulk enhance job to enrich multiple entities asynchronously.
Delete Custom API — Tool to delete custom API definitions for a given URL pattern.
Delete KG Enhance Bulkjob — Tool to delete an Enhance Bulkjob.

Every action is paired with a structured input/output schema (shown in the sections below), so when you wire Diffbot into Helix, our agent builder, the editor knows exactly what each step expects and produces. Configure it once and deploy it across all of your Nagent agents.

What You Can Do

Actions (35)

Every operation an agent can call against Diffbot, with input parameters and output schema. Drop these into any step of an agent built in Helix.

Combine Entity Profiles (DIFFBOT_COMBINE_ENTITY_PROFILES)

Combine multiple entity profiles into a unified view using the Diffbot Knowledge Graph. Returns enhanced person or organization data by matching on identifying attributes like name, email, employer, or URL. Use this to enrich partial entity data, merge duplicate profiles, or verify entity identity.

Input parameters

Prop: Type

ip?string

Optional

IP address of the entity to enhance. Can be used with types Person and Organization.

url?array

Optional

Origin or homepage URI(s) of entity to enhance. Can be used with types Person and Organization.

name?array

Optional

Name(s) of the entity to enhance. Can be used with types Person and Organization. Multiple names can be provided for better matching.

type?string ("Person")

Optional

Valid entity types for the combine API.

email?string

Optional

Email address of the entity to enhance. Can be used only with type Person.

phone?string

Optional

Phone number of the entity to enhance. Can be used with types Person and Organization.

title?string

Optional

Job title of the entity to enhance. Can be used only with type Person.

filter?string

Optional

Semi-colon separated path filter to include specific fields in response JSON. Use dot notation (e.g., 'skills.name') or JsonPath expression (e.g., '$.name;$.locations.country.name').

school?string

Optional

School or educational institution of the entity to enhance. Can be used only with type Person.

search?boolean

Optional

When true, Diffbot will attempt to search the web for origins and merge relevant results with what's found in the Knowledge Graph.

refresh?boolean

Optional

When true, Diffbot will attempt to recrawl all origins of the identified entity and reconstruct the entity from refreshed data.

customId?string

Optional

User-defined ID for correlation and tracking purposes.

employer?string

Optional

Employer of the entity to enhance. Can be used only with type Person.

jsonmode?string (" " | "extended")

Optional

JSON mode options for response formatting.

location?string

Optional

Location of the entity to enhance. Can be used with types Person and Organization.

threshold?number

Optional

Enhance similarity threshold score (0.0 to 1.0). Higher values require stronger match confidence.

filterExclude?string

Optional

Semi-colon separated path filter to exclude specific fields from response JSON. Use dot notation or JsonPath expression.

nonCanonicalFacts?boolean

Optional

When true, returns non-canonical facts in addition to canonical ones.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
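To make the input schema above concrete, here is a minimal sketch of preparing input for this action outside the Helix editor. It assumes the step accepts a plain dict keyed by the documented parameter names; `build_combine_input` is a hypothetical local helper (not part of Nagent or Diffbot), and the names and employer are placeholder values. It pins `type` to "Person" (the only listed value), joins include paths into the semicolon-separated `filter` string, checks that `threshold` lies between 0.0 and 1.0, and drops optional parameters left as `None`.

```python
from typing import Any, Optional


def build_combine_input(
    name: list[str],
    url: Optional[list[str]] = None,
    threshold: Optional[float] = None,
    include_paths: Optional[list[str]] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict for DIFFBOT_COMBINE_ENTITY_PROFILES."""
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # "Person" is the only entity type listed for the combine API.
    payload: dict[str, Any] = {"type": "Person", "name": name}
    if url:
        payload["url"] = url
    if threshold is not None:
        payload["threshold"] = threshold
    if include_paths:
        # 'filter' expects semicolon-separated paths, e.g. "name;skills.name".
        payload["filter"] = ";".join(include_paths)
    # Forward other documented parameters (email, employer, ...) but skip unset ones.
    payload.update({key: value for key, value in extra.items() if value is not None})
    return payload


example = build_combine_input(
    name=["Jane Doe", "J. Doe"],
    threshold=0.8,
    include_paths=["name", "skills.name"],
    employer="Example Corp",
    school=None,
)
print(example)
```

Passing several name variants in `name` follows the schema's note that multiple names can improve matching.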

Create Bulk Extract Job (DIFFBOT_CREATE_BULK)

Tool to submit a bulk extract job to process multiple URLs with Extract APIs. Use when you need to process many URLs asynchronously using any Extract API. The job will process URLs in the background and provide downloadable results.

Input parameters

Prop: Type

name: string

Required

Job name for identification. Must be unique.

urls: string

Required

URLs to process: a single URL or a comma-separated list of URLs.

apiUrl: string

Required

Extract API endpoint URL to use for processing (e.g., 'https://api.diffbot.com/v3/article'). The token will be added automatically.

notifyEmail?string

Optional

Email address to notify when the job completes.

notifyWebhook?string

Optional

Webhook URL to POST to when the job completes.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
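Because `urls` is a single comma-separated string, it can help to build it from a list and reject URLs that would be split incorrectly. This sketch assumes a dict-shaped input; `build_bulk_extract_input` and the job name are hypothetical, and it does not check that the name is unique, since that would require querying account state. It also rejects an `apiUrl` that already carries a `token` query parameter, because the token is added automatically.

```python
from urllib.parse import parse_qs, urlparse


def build_bulk_extract_input(name: str, urls: list[str], api_url: str) -> dict[str, str]:
    """Hypothetical helper: build an input dict for DIFFBOT_CREATE_BULK."""
    if not urls:
        raise ValueError("at least one URL is required")
    for url in urls:
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {url}")
        if "," in url:
            # 'urls' is a comma-separated string, so an embedded comma would split this URL.
            raise ValueError(f"URL contains a comma: {url}")
    if "token" in parse_qs(urlparse(api_url).query):
        raise ValueError("apiUrl should not carry a token; it is added automatically")
    return {"name": name, "urls": ",".join(urls), "apiUrl": api_url}


job = build_bulk_extract_input(
    name="news-batch-001",
    urls=["https://example.com/a", "https://example.com/b"],
    api_url="https://api.diffbot.com/v3/article",
)
```

Rejecting comma-bearing URLs is a conservative choice; percent-encoding them as `%2C` before joining would be an alternative.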

Create or Update Custom API (DIFFBOT_CREATE_CUSTOM_API)

Tool to create or update the parameters and ruleset of a Custom API. Use this when you need to define custom extraction rules for specific websites that require tailored parsing logic beyond standard Diffbot APIs. Allows defining URL patterns, CSS selectors, extraction rules, and preprocessing filters to extract structured data from websites with unique layouts.

Input parameters

Prop: Type

api: string

Required

The specific API being targeted. Always precede the API name with "/api/" as in "/api/article" (except for "all")

notes?array

Optional

An array of strings that can be added manually. The API automatically adds a note specifying when the API was last updated

rules?array

Optional

An array of objects that defines a set of rules for the specific urlPattern-api combination

testUrl?string

Optional

A URL that can be used to check that the rule still works as intended. This is the page that will load automatically when editing the ruleset in the Dashboard UI

useProxy?string

Optional

Set to "none" to disable proxies when they have been enabled globally.

prefilters?array

Optional

An array of CSS selector strings that should be omitted from the DOM before extraction occurs

urlPattern: string

Required

A regex pattern that defines the URLs for which the ruleset will be applied

renderOptions?string

Optional

Rendering options for the page (e.g., 'mobile', 'desktop')

xForwardHeaders?object

Optional

X-Forward headers configuration.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
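Two rules above are easy to get wrong by hand: the `/api/` prefix on `api` and a `urlPattern` that actually matches the `testUrl`. The following is a hedged sketch; `build_custom_api_input`, the pattern, and the selectors are hypothetical, and Python's `re` module only approximates whatever regex dialect Diffbot applies server-side, so a local match is a sanity check rather than a guarantee. It treats a match anywhere in the URL (`re.search`) as success, which is an assumption.

```python
import re
from typing import Any, Optional


def build_custom_api_input(
    api: str,
    url_pattern: str,
    test_url: Optional[str] = None,
    prefilters: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict for DIFFBOT_CREATE_CUSTOM_API."""
    # Every API name except "all" must be prefixed with "/api/".
    if api != "all" and not api.startswith("/api/"):
        api = "/api/" + api.lstrip("/")
    # Compiling catches syntax errors early; the server's dialect may still differ.
    pattern = re.compile(url_pattern)
    if test_url is not None and pattern.search(test_url) is None:
        raise ValueError("testUrl does not match urlPattern")
    payload: dict[str, Any] = {"api": api, "urlPattern": url_pattern}
    if test_url is not None:
        payload["testUrl"] = test_url
    if prefilters:
        # CSS selectors to strip from the DOM before extraction.
        payload["prefilters"] = prefilters
    return payload


rule = build_custom_api_input(
    api="article",
    url_pattern=r"^https?://(www\.)?example\.com/news/",
    test_url="https://www.example.com/news/42",
    prefilters=[".cookie-banner", "#newsletter-signup"],
)
```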

Create Bulk Enhance Job (DIFFBOT_CREATE_KG_BULK_ENHANCE)

Tool to submit a bulk enhance job to enrich multiple entities asynchronously. Use when you need to process many Person or Organization records in batch. The API accepts entity descriptions and returns enriched data from the Diffbot Knowledge Graph.

Input parameters

Prop: Type

name?string

Optional

Human-readable name for the bulk job to help identify it later.

size?integer

Optional

Maximum number of results to return per entity. Default is 1.

search?boolean

Optional

If true, Diffbot will search the web for additional origins and merge results. Default is false.

refresh?boolean

Optional

If true, Diffbot will recrawl all origins and reconstruct entities from fresh data. Default is false.

entities: array

Required

List of entities to enhance. Each entity can be a Person, Organization, or a reference by ID. Minimum 1 entity required.

jsonmode?string (" " | "extended")

Optional

JSON mode options for enhance results.

threshold?number

Optional

Similarity threshold for matching entities (0.0 to 1.0). Higher values require closer matches.

webhookurl?string

Optional

Webhook URL to receive notifications when the job completes.

nonCanonicalFacts?boolean

Optional

If true, returns non-canonical facts in results. Default is false.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
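The exact shape of each item in `entities` is not spelled out above, so this sketch assumes each item is a dict with either an `id` (a DiffbotId reference) or a `type` of "Person" or "Organization" plus identifiers named like the single-entity Enhance parameters. `build_bulk_enhance_input` and the sample records are hypothetical. The helper enforces the documented minimum of one entity and the 0.0 to 1.0 `threshold` range, and it maps a Python-style argument onto the all-lowercase `webhookurl` parameter.

```python
from typing import Any, Optional

ENTITY_TYPES = {"Person", "Organization"}


def build_bulk_enhance_input(
    entities: list[dict[str, Any]],
    name: Optional[str] = None,
    threshold: Optional[float] = None,
    webhook_url: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict for DIFFBOT_CREATE_KG_BULK_ENHANCE."""
    if not entities:
        raise ValueError("at least one entity is required")
    for index, entity in enumerate(entities):
        # Assumed entity shape: {"type": ..., <identifiers>} or {"id": <DiffbotId>}.
        if "id" not in entity and entity.get("type") not in ENTITY_TYPES:
            raise ValueError(f"entity {index} needs an 'id' or a Person/Organization 'type'")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    payload: dict[str, Any] = {"entities": entities}
    if name is not None:
        payload["name"] = name
    if threshold is not None:
        payload["threshold"] = threshold
    if webhook_url is not None:
        payload["webhookurl"] = webhook_url  # note the all-lowercase parameter name
    return payload


batch = build_bulk_enhance_input(
    entities=[
        {"type": "Organization", "name": ["Example Corp"], "url": ["https://example.com"]},
        {"type": "Person", "name": ["Jane Doe"], "employer": "Example Corp"},
    ],
    name="crm-enrichment",
    threshold=0.6,
)
```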

Delete Custom API (DIFFBOT_DELETE_CUSTOM_API)

Tool to delete custom API definitions for a given URL pattern. Removes custom extraction rules from your account. Use when you need to remove previously configured custom APIs.

Input parameters

Prop: Type

api: string

Required

Base API of the custom API to delete. Always precede the API name with '/api/' (e.g., '/api/article')

urlPattern: string

Required

URL pattern (regex) of the custom API to delete. This defines which URLs the custom API applies to.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Delete KG Enhance Bulkjob (DIFFBOT_DELETE_KG_ENHANCE_BULKJOB)

Tool to delete an Enhance Bulkjob. Removes the bulk job and its results from the system. Use when cleaning up completed or failed jobs.

Input parameters

Prop: Type

bulkjobId: string

Required

Enhance Bulkjob ID to delete (e.g., 'B-a6a72339-3af7')

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Download Bulk Job Results (DIFFBOT_DOWNLOAD_BULK_RESULTS)

Tool to download the results of a bulk enhance job, with filtering options, via a POST request. Use this to retrieve processed results from a completed or running bulk job. Supports multiple export formats (json, jsonl, csv, xls, xlsx) and various filtering options to customize the output. HTTP 200 indicates that results are ready; HTTP 201 means the job is still executing.

Input parameters

Prop: Type

from?integer

Optional

Starting index for pagination (offset). Should be used together with 'size' parameter.

head?integer

Optional

Return first n results. Use this to limit the number of results returned.

size?integer

Optional

Maximum number of results to return. Should be specified with 'from' parameter for pagination.

wait?integer

Optional

Seconds to wait for bulkjob results to export. Results will continue to export in the background. Use 0 to only trigger an export without waiting.

filter?string

Optional

Semi-colon separated path filter to filter response json. You can use simple dot notation like 'skills.name' or JsonPath expressions like '$.name;$.locations.country.name'.

format?string ("json" | "jsonl" | "csv" | "xls" | "xlsx")

Optional

Export format options for bulk results.

bulkjobId: string

Required

Enhance Bulkjob ID (e.g., 'B-89cfc3b2-e744'). This is the unique identifier of the bulk job whose results you want to download.

exportfile?string

Optional

File name of the export file. Specify a custom filename for the exported results.

exportspec?string

Optional

The spec defines the columns to export. This is applicable for csv, xls and xlsx formats. Simple spec looks like 'name;summary'. For complex specs including list handling, see Diffbot documentation.

exportquery?boolean

Optional

Prefixes the enhance query parameters to the CSV export result. Only applicable for CSV exports.

onlyMatches?boolean

Optional

Return only records that have a match. Use this to filter out records without matches.

filterExclude?string

Optional

Semi-colon separated path filter to exclude data from response json. You can use simple dot notation or JsonPath expressions.

exportseparator?string

Optional

Separator for multi-value fields when exporting columnar results (csv, xls, xlsx formats).

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
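The status codes and pagination parameters above suggest two small helpers. This is a sketch under assumptions: whether a Nagent step exposes the raw HTTP status is not documented here (the output schema only lists `data`, `error`, and `successful`), and `export_state` and `page_windows` are hypothetical names. `page_windows` yields `from`/`size` pairs that together cover a known number of results, using the two parameters together as the schema recommends.

```python
from typing import Iterator


def export_state(http_status: int) -> str:
    """Map the documented HTTP status codes to a readable export state."""
    if http_status == 200:
        return "ready"  # results are ready
    if http_status == 201:
        return "running"  # job is still executing
    raise ValueError(f"unexpected HTTP status: {http_status}")


def page_windows(total: int, page_size: int) -> Iterator[dict[str, int]]:
    """Yield 'from'/'size' parameter pairs that together cover `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        yield {"from": start, "size": min(page_size, total - start)}


windows = list(page_windows(total=5, page_size=2))
```

Each window can be merged into an otherwise identical input dict (same `bulkjobId`, `format`, and filters) to fetch one page at a time.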

Enhance Entity with Knowledge Graph (DIFFBOT_ENHANCE_ENTITY)

Enrich a person or organization with comprehensive data from the Diffbot Knowledge Graph. Provide identifiers like name, email, employer, or URL and receive detailed entity information including employment history, education, location, skills, and more. Use when you need to gather all publicly available knowledge about a specific person or organization from billions of web pages.

Input parameters

Prop: Type

id?string

Optional

DiffbotId of entity to enhance. Can be used with types Person and Organization. If you know the exact entity ID, this is the most precise identifier

ip?string

Optional

IP address of the entity to enhance. Can be used with types Person and Organization

url?array

Optional

Origin or homepage URI of entity to enhance. Can be used with types Person and Organization. Provide multiple URLs associated with the entity

name?array

Optional

Name of the entity to enhance. Can be used with types Person and Organization. Provide multiple name variations to increase match accuracy

size?integer

Optional

Maximum number of results to return (default=1). Set higher to get multiple potential matches

type?string ("Person" | "Organization")

Optional

Type of entity to enhance.

email?array

Optional

Email address(es) of the entity to enhance. Can be used only with type Person. Provide multiple emails if available

phone?string

Optional

Phone number of the entity to enhance. Can be used with types Person and Organization

title?string

Optional

Job title of the entity to enhance. Can be used only with type Person

filter?string

Optional

Semi-colon separated path filter to include only specific fields in response JSON. Use dot notation (e.g., 'skills.name') or JsonPath expressions (e.g., '$.name;$.locations.country.name'). Reduces response size

school?string

Optional

School or educational institution of the entity to enhance. Can be used only with type Person

search?boolean

Optional

If true, Diffbot will attempt to search the web for origins for the search query and merge relevant results with what's found in the KG (default=false). Useful when entity might not be in the KG yet

refresh?boolean

Optional

If true, Diffbot will attempt to recrawl all origins of the identified entity and reconstruct the entity from refreshed data (default=false). This provides the most up-to-date information but takes longer

customId?string

Optional

User-defined ID for correlation and tracking purposes. Will be returned in the response for request matching

employer?string

Optional

Employer of the entity to enhance. Can be used only with type Person. Helps identify the correct person when names are common

jsonmode?string (" " | "extended")

Optional

JSON mode for response formatting.

location?string

Optional

Location of the entity to enhance. Can be used with types Person and Organization. Can be city, state, country, or full address

threshold?number

Optional

Enhance similarity threshold (0.0 to 1.0). Only return matches with similarity score above this threshold. Higher values return fewer but more confident matches

description?string

Optional

Description of the entity to enhance. Can be used with types Person and Organization. Helps identify the correct entity

filterExclude?string

Optional

Semi-colon separated path filter to exclude specific fields from response JSON. Use dot notation or JsonPath expressions. Useful to remove verbose fields

nonCanonicalFacts?boolean

Optional

If true, returns non-canonical facts in addition to canonical facts (default=false). Non-canonical facts are alternative values found across multiple sources

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
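Several parameters above are marked "Can be used only with type Person", which is easy to violate when the same record template is reused for organizations. A minimal pre-flight check might look like the following; `check_enhance_input` is a hypothetical helper that returns a list of problems instead of raising, and it only encodes constraints stated in the schema (Person-only fields, `threshold` range, `size` of at least 1).

```python
from typing import Any

# Parameters the schema marks "Can be used only with type Person".
PERSON_ONLY = frozenset({"email", "title", "school", "employer"})
ENTITY_TYPES = frozenset({"Person", "Organization"})


def check_enhance_input(params: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check for a DIFFBOT_ENHANCE_ENTITY input dict."""
    problems: list[str] = []
    entity_type = params.get("type")
    if entity_type is not None and entity_type not in ENTITY_TYPES:
        problems.append(f"unknown type: {entity_type!r}")
    if entity_type == "Organization":
        # dict keys support set intersection, giving the Person-only keys present.
        for key in sorted(params.keys() & PERSON_ONLY):
            problems.append(f"'{key}' can only be used with type Person")
    threshold = params.get("threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        problems.append("threshold must be between 0.0 and 1.0")
    size = params.get("size")
    if size is not None and size < 1:
        problems.append("size must be at least 1")
    return problems


issues = check_enhance_input({"type": "Organization", "name": ["Acme"], "email": ["x@acme.test"]})
```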

Diffbot Extract Job (DIFFBOT_EXTRACT_JOB)

Tool to extract structured job posting data from job listing pages. Returns job title, company, location, salary, requirements, skills, and other job-related information. Use when you need to parse and structure data from job postings.

Input parameters

Prop: Type

url: string

Required

Target URL of the job listing page to extract structured data from

proxy?string

Optional

IP address of a custom proxy that will be used to fetch the target page. Leave empty to use default proxy

fields?string

Optional

Comma-separated list of optional fields to be returned from any fully-extracted pages (e.g. 'querystring,links'). Valid values: links, extlinks, meta, querystring, breadcrumb

timeout?integer

Optional

Maximum time in milliseconds to wait for the retrieval/fetch of content from the requested URL. Default is 30000 (30 seconds)

useProxy?string

Optional

Set to 'default' to use Diffbot's datacenter proxy for this request. Set to 'none' to instruct Extract to not use proxies

proxyAuth?string

Optional

Authentication parameters for the custom proxy specified in the proxy parameter (format: username:password)

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
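The `fields` parameter accepts only a fixed set of optional values, and `timeout` is in milliseconds. A hedged sketch of assembling the input from friendlier arguments follows; `build_extract_job_input` and the URL are hypothetical, and converting seconds to milliseconds is purely a convenience choice. Omitting `timeout` leaves the documented 30000 ms default in place.

```python
from typing import Any, Iterable, Optional

# Optional fields the Extract schema lists as valid.
OPTIONAL_FIELDS = ("links", "extlinks", "meta", "querystring", "breadcrumb")


def build_extract_job_input(
    url: str, fields: Iterable[str] = (), timeout_seconds: Optional[float] = None
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict for DIFFBOT_EXTRACT_JOB."""
    requested = list(dict.fromkeys(fields))  # de-duplicate while keeping order
    unknown = [f for f in requested if f not in OPTIONAL_FIELDS]
    if unknown:
        raise ValueError(f"unsupported optional fields: {unknown}")
    payload: dict[str, Any] = {"url": url}
    if requested:
        payload["fields"] = ",".join(requested)
    if timeout_seconds is not None:
        # The API takes milliseconds.
        payload["timeout"] = int(timeout_seconds * 1000)
    return payload


posting = build_extract_job_input(
    "https://jobs.example.com/listing/123", fields=["meta", "links", "meta"], timeout_seconds=45
)
```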

Diffbot Extract List (DIFFBOT_EXTRACT_LIST)

Tool to extract structured data from list-style pages like news indexes, product listings, and directory pages. Returns an array of items with their titles, links, and descriptions. Use when you need to extract multiple items from a page organized as a list or index.

Input parameters

Prop: Type

url: string

Required

Target URL to extract list data from. Must be a valid URL starting with http or https.

proxy?string

Optional

Specify an IP address of a custom proxy that will be used to fetch the target page.

fields?string

Optional

Comma-separated list of optional fields to be returned from any fully-extracted pages (e.g., 'links,meta,querystring'). Valid values: links, extlinks, meta, querystring, breadcrumb.

timeout?integer

Optional

Sets a value in milliseconds to wait for the retrieval/fetch of content from the requested URL. The default timeout for the third-party response is 30 seconds (30000).

useProxy?string

Optional

Set to 'default' to use Diffbot's datacenter proxy for this request. 'none' will instruct Extract to not use proxies, even if proxies have been enabled for this particular URL globally.

proxyAuth?string

Optional

Authentication parameters for the custom proxy specified in the proxy parameter.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
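The three proxy-related parameters interact: `useProxy` selects Diffbot's datacenter proxy or none, while `proxy` and `proxyAuth` describe a custom proxy. This sketch treats them as three mutually exclusive modes, which is a design assumption rather than documented behavior, and it assumes the `username:password` format documented for the Extract Job action's `proxyAuth` also applies here. `proxy_params` is hypothetical, it does not validate the proxy address format, and 203.0.113.7 is a documentation-range placeholder.

```python
from typing import Optional


def proxy_params(
    mode: str,
    proxy_ip: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper: choose exactly one proxy strategy for DIFFBOT_EXTRACT_LIST."""
    if mode in ("default", "none"):
        # 'default' = Diffbot's datacenter proxy; 'none' = no proxy even if enabled globally.
        return {"useProxy": mode}
    if mode != "custom":
        raise ValueError(f"unknown proxy mode: {mode!r}")
    if not proxy_ip:
        raise ValueError("custom mode requires proxy_ip")
    params = {"proxy": proxy_ip}
    if username is not None or password is not None:
        # A ':' in the username would make 'username:password' ambiguous.
        if username is None or password is None or ":" in username:
            raise ValueError("proxyAuth needs a username without ':' and a password")
        params["proxyAuth"] = f"{username}:{password}"
    return params


custom = proxy_params("custom", "203.0.113.7", "scraper", "s3cret")
```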

Get Diffbot Account Details (DIFFBOT_GET_ACCOUNT)

Retrieves comprehensive Diffbot account information including subscription plan details, credit balance, usage history, and account status. Returns account holder name, email, current plan, available credits, and daily usage statistics for the past 31 days. Use this to check your account's credit balance, monitor API usage patterns, verify account status, or retrieve account metadata.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Diffbot Analyze (DIFFBOT_GET_ANALYZE)

Automatically analyzes a web page to determine its type and extract structured data. The Analyze API intelligently classifies pages into types (article, product, discussion, image, video, organization, etc.) and extracts relevant structured data. Use this when you need to process URLs of unknown type or want automatic extraction without specifying the page type in advance.

Input parameters

Prop: Type

url: string

Required

The full URL of the page to analyze, including http:// or https://

mode?string

Optional

Restrict extraction to a specific page type. Options: article, product, discussion, image, video, list, event. If not specified, all types are considered.

fields?string

Optional

Comma-separated list of additional fields to include or limit output fields. Options include: links, extlinks, meta, querystring, breadcrumb, or specific field names to limit output.

timeout?integer

Optional

Maximum time (in milliseconds) to wait for API response. Default is 30000ms (30 seconds). Maximum is 300000ms (5 minutes).

fallback?string

Optional

API to use if page type cannot be determined. Options: article, product, discussion, image, video.

discussion?boolean

Optional

Set to false to disable automatic extraction of comments/discussions from the page. Default is true.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
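The Analyze parameters carry several documented limits: a fixed list of `mode` values, a shorter list of `fallback` APIs, and a 300000 ms timeout ceiling. A minimal sketch that encodes them follows; `build_analyze_input` and the URL are hypothetical, and clamping an oversized timeout instead of rejecting it is a design choice, not something the API is known to do itself.

```python
from typing import Any, Optional

ANALYZE_MODES = {"article", "product", "discussion", "image", "video", "list", "event"}
FALLBACK_APIS = {"article", "product", "discussion", "image", "video"}
MAX_TIMEOUT_MS = 300_000  # documented maximum: 5 minutes


def build_analyze_input(
    url: str,
    mode: Optional[str] = None,
    fallback: Optional[str] = None,
    timeout_ms: Optional[int] = None,
    discussion: Optional[bool] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict for DIFFBOT_GET_ANALYZE."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include http:// or https://")
    if mode is not None and mode not in ANALYZE_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if fallback is not None and fallback not in FALLBACK_APIS:
        raise ValueError(f"unsupported fallback: {fallback!r}")
    payload: dict[str, Any] = {"url": url}
    if mode is not None:
        payload["mode"] = mode
    if fallback is not None:
        payload["fallback"] = fallback
    if timeout_ms is not None:
        # Clamp rather than reject values above the documented maximum.
        payload["timeout"] = min(timeout_ms, MAX_TIMEOUT_MS)
    if discussion is not None:
        payload["discussion"] = discussion
    return payload


request = build_analyze_input("https://example.com/some-page", fallback="article", timeout_ms=600_000)
```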

Get Article Data (DIFFBOT_GET_ARTICLE)

Tool to extract information from articles, including authors, publication dates, and images. Use when you need structured metadata from a web article URL.

Input parameters

Prop: Type

url: string

Required

Full URL of the web page to analyze, must start with http or https

mode?string

Optional

Extraction mode override (defaults to 'article')

stats?boolean

Optional

Whether to include statistics like word count

fields?string

Optional

List of specific fields to include in the response. If provided, only these fields are returned.

paging?string

Optional

Paging token for multi-page articles (returned in previous response)

timeout?integer

Optional

Maximum time in milliseconds to wait for page rendering

discussion?boolean

Optional

Whether to include discussion/comment data in the response

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Get Bulk Job Data (DIFFBOT_GET_BULK_DATA)

Tool to download extracted results from a completed bulk job. Use after a bulk job has finished processing to retrieve the data. Supports JSON and CSV formats.

Input parameters

Prop: Type

num?integer

Optional

Limit results to the N most recently processed URLs. Useful for testing or sampling large result sets.

name: string

Required

Name of the bulk job whose results you want to download. Must match the job name specified when the job was created.

type?string ("data" | "urls")

Optional

Type of data to retrieve.

format?string ("json" | "csv")

Optional

Output format for bulk job data.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Get Bulk Job Status (DIFFBOT_GET_BULK_JOB_STATUS)

Tool to poll the status of a specific Diffbot Knowledge Graph Enhance bulk job. Use when you need to check the progress, completion status, or details of a bulk enhancement job.

Input parameters

Prop: Type

bulkjobId: string

Required

Enhance Bulkjob ID to poll status for

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
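Polling a bulk job is usually done with a growing delay between checks so a long job does not trigger hundreds of status calls. The sketch below shows exponential backoff with a cap. It is an assumption-laden illustration: `poll_until_done` is hypothetical, `is_done` stands in for a call to this action plus your own interpretation of its `data` (the completion signal is not described here), and Helix may offer its own wait or loop step instead. The `sleep` parameter is injectable so the example runs instantly.

```python
import time
from typing import Callable


def poll_until_done(
    is_done: Callable[[], bool],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `is_done` with exponential backoff; return how many checks were made."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if is_done():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to max_delay
    raise TimeoutError(f"job not finished after {max_attempts} checks")


# Simulated status checks: running, running, then finished. No real waiting happens here.
responses = iter([False, False, True])
recorded_waits: list[float] = []
attempts = poll_until_done(lambda: next(responses), sleep=recorded_waits.append)
```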

Get Bulk Job Results (DIFFBOT_GET_BULK_RESULTS)

Tool to download the results of a completed Enhance Bulkjob. Returns enriched records from the bulk job. Use after a bulk enhance job has completed processing.

Input parameters

Prop: Type

from?integer

Optional

Starting index for pagination (use with size parameter)

head?integer

Optional

Return first n results

size?integer

Optional

Maximum number of results to return. Should be specified together with the from parameter. Use -1 for all results.

wait?integer

Optional

Seconds to wait for bulkjob results to export. Results will continue to export in the background. Use 0 to only trigger an export without waiting.

filter?string

Optional

Semi-colon separated path filter to filter response json. Use dot notation like 'skills.name' or JsonPath expressions like '$.name;$.locations.country.name'.

format?string ("json" | "jsonl" | "csv" | "xls" | "xlsx")

Optional

Export format options.

bulkjobId: string

Required

Enhance Bulkjob ID to retrieve results for

exportfile?string

Optional

File name of the export file

exportspec?string

Optional

Defines the columns to export for csv, xls and xlsx formats. Use semi-colon separated field paths like 'name;summary' or JsonPath expressions. See Diffbot documentation for advanced syntax.

exportquery?string ("true" | "false")

Optional

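Prefixes the enhance query parameters to the CSV export result; only applicable for CSV exports. Passed as the string "true" or "false".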

onlyMatches?string ("true" | "false")

Optional

Return only records that have a match. Passed as the string "true" or "false".

filterExclude?string

Optional

Semi-colon separated path filter to exclude data from response json. Use dot notation or JsonPath expressions.

exportseparator?string

Optional

Separator for multi-value fields when exporting columnar results

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
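Unlike the Download Bulk Job Results action, which types `exportquery` and `onlyMatches` as booleans, this action lists them as the strings "true" and "false", so passing a native boolean may fail validation. The sketch below converts booleans explicitly and encodes two other documented rules: `exportspec` only applies to csv, xls, and xlsx, and `size` of -1 requests all results. `as_flag` and `build_bulk_results_input` are hypothetical helpers; the job ID is the example value from the schema.

```python
from typing import Any, Optional

EXPORT_FORMATS = {"json", "jsonl", "csv", "xls", "xlsx"}
COLUMNAR_FORMATS = {"csv", "xls", "xlsx"}


def as_flag(value: bool) -> str:
    """Render a bool as the "true"/"false" string enum this action expects."""
    return "true" if value else "false"


def build_bulk_results_input(
    bulkjob_id: str,
    fmt: str = "json",
    only_matches: Optional[bool] = None,
    export_spec: Optional[str] = None,
    fetch_all: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict for DIFFBOT_GET_BULK_RESULTS."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if export_spec is not None and fmt not in COLUMNAR_FORMATS:
        raise ValueError("exportspec applies only to csv, xls and xlsx")
    payload: dict[str, Any] = {"bulkjobId": bulkjob_id, "format": fmt}
    if only_matches is not None:
        payload["onlyMatches"] = as_flag(only_matches)
    if export_spec is not None:
        payload["exportspec"] = export_spec
    if fetch_all:
        payload["size"] = -1  # documented value for "all results"
    return payload


csv_request = build_bulk_results_input(
    "B-89cfc3b2-e744", fmt="csv", only_matches=True, export_spec="name;summary", fetch_all=True
)
```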

Get Bulk Single Result (DIFFBOT_GET_BULK_SINGLE_RESULT)

Tool to download the result of a single job within a Diffbot bulk enhance job. Returns enriched entity data for a specific input record by its index. Use after a bulk enhance job has completed to retrieve individual results without downloading the entire dataset.

Input parameters

Prop: Type

job_idx: integer

Required

Job index within the bulkjob (0-based index of the input record)

bulkjob_id: string

Required

Enhance Bulkjob ID (e.g., 'B-1ff60452-8421')

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Get Crawl Data (DIFFBOT_GET_CRAWL_DATA)

Download extracted results from a completed crawl job. Returns all structured data extracted during crawl processing (articles, products, etc.). Use after a crawl job has completed to retrieve the collected data.

Input parameters

Prop: Type

num?integer

Optional

Maximum number of results to return. Omit to return all available results.

name: string

Required

Name of the crawl job to retrieve data from. Must be an existing crawl job created via Start Crawl.

format?string ("json" | "csv")

Optional

Output format for crawl data.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Get Discussion Thread (DIFFBOT_GET_DISCUSSION)

Extract structured discussion threads from web pages, including forums, comment sections, product reviews, Reddit discussions, and blog comments. Returns posts with author info, timestamps, content, and hierarchical relationships. Useful for analyzing conversations, gathering feedback, or monitoring discussions. Supported platforms: native comment systems, Disqus, Facebook Comments, Reddit, forum software, and more. Use this when you need to:

- Extract all comments/posts from a discussion thread
- Analyze user feedback or reviews
- Monitor forum discussions or social media threads
- Gather structured conversation data with metadata

Input parameters

Prop: Type

url: string

Required

The URL of the discussion page to process (e.g., forum thread, Reddit discussion, product review page, or comment section).

fields?string

Optional

Comma-separated list of additional fields to include in the response (e.g., 'sentiment', 'links', 'meta', 'breadcrumb').

timeout?integer

Optional

Maximum time in milliseconds to wait for the response. Default is 30000 (30 seconds).

maxPages?string

Optional

Maximum number of pages to concatenate. Default is 1 (no concatenation). Set to 'all' to retrieve all pages in the thread. Each page counts as a separate API call.

norender?boolean

Optional

Whether to disable full page rendering. Set to true for faster responses with potentially lower extraction quality. Default is false.

discussion?boolean

Optional

Whether to extract comments/reviews. Set to false to disable comment extraction for faster response times. Default is true.

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
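Since `maxPages` is a string holding either a page count or "all", and each concatenated page counts as a separate API call, a caller may want to validate the value and estimate the cost before running the step. `max_pages_param` and `estimated_calls` are hypothetical helpers, and `estimated_calls` assumes you already know how many pages the thread has, which the action itself does not report up front.

```python
from typing import Union


def max_pages_param(pages: Union[int, str]) -> str:
    """Hypothetical helper: validate a value for DIFFBOT_GET_DISCUSSION's maxPages."""
    if pages == "all":
        return "all"
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(pages, int) and not isinstance(pages, bool) and pages >= 1:
        return str(pages)
    raise ValueError("maxPages must be a positive integer or 'all'")


def estimated_calls(max_pages: str, pages_in_thread: int) -> int:
    """Each concatenated page counts as a separate API call."""
    if max_pages == "all":
        return pages_in_thread
    return min(int(max_pages), pages_in_thread)


setting = max_pages_param(3)
```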

Diffbot Get Event (DIFFBOT_GET_EVENT)

Tool to extract event details from web pages. Use when you need structured event data such as venue, date, and description.

Input parameters

Prop: Type

url: string

Required

URL of the event page to analyze

fields?string

Optional

Comma-separated list of fields to return, e.g., title,date,location

paging?boolean

Optional

Enable automatic paging of results

timeout?integer

Optional

Maximum timeout in milliseconds for the API call

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful

Diffbot Get Image (DIFFBOT_GET_IMAGE)

Tool to extract detailed information about images, including dimensions and recognition data. Use after confirming the image URL is publicly accessible.

Input parameters

Prop: Type

url: string

Required

Publicly-accessible URL of the image to analyze

fields?string

Optional

Comma-separated list or array of specific fields to include in response, e.g., 'naturalWidth','captions'

paging?boolean

Optional

Whether to include paging information for multi-image responses

timeout?integer

Optional

Maximum time to wait for API response, in milliseconds

Output

Prop: Type

data: string

Required

Data from the action execution

error?string

Optional

Error message, if an error occurred during execution of the action

successful: boolean

Required

Whether the action execution was successful
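The `fields` parameter is described as "a comma-separated list or array", and its example is written as quoted items ('naturalWidth','captions'). Because the parameter is typed as a string, this sketch assumes a plain comma-separated string without quotes is the safest form to send. `normalize_fields` is a hypothetical helper that accepts either a list or a loosely formatted string and emits that form.

```python
from typing import Iterable, Union


def normalize_fields(fields: Union[str, Iterable[str]]) -> str:
    """Hypothetical helper: turn a list or loosely formatted string into 'a,b,c'."""
    parts = fields.split(",") if isinstance(fields, str) else list(fields)
    # Trim whitespace and stray quotes, e.g. from "'naturalWidth','captions'".
    cleaned = (part.strip().strip("'\"") for part in parts)
    return ",".join(part for part in cleaned if part)


image_fields = normalize_fields("'naturalWidth','captions'")
```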

Get KG Coverage Report by ID (DIFFBOT_GET_KG_COVERAGE_REPORT_BY_ID)

Download Knowledge Graph coverage report by report ID. Returns detailed CSV coverage statistics showing field presence across query results. Use this after generating a coverage report from a DQL query to retrieve the statistical breakdown of field coverage.

Input parameters

id (string, required): Report ID to retrieve, in the format C-<hash> (e.g., 'C-a0b39dad-68bf'). Reports may expire quickly, so retrieve them immediately after generation.

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
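Because the coverage report arrives as CSV text in `data` and report IDs follow the `C-<hash>` pattern, a client can sanity-check the ID before calling the action and parse the result with Python's standard `csv` module. The regex below is an assumption inferred from the single example ID above, and the CSV column names depend on the report, so the parser reads them from the header row instead of hard-coding them. Both helper names are hypothetical.

```python
import csv
import io
import re

# Assumed ID shape: "C-" followed by hex groups separated by hyphens,
# inferred from the example 'C-a0b39dad-68bf'. Loosen it if real IDs differ.
REPORT_ID_PATTERN = re.compile(r"C-[0-9a-fA-F]+(?:-[0-9a-fA-F]+)*")


def is_report_id(report_id: str) -> bool:
    """Cheap client-side check before calling the action."""
    return REPORT_ID_PATTERN.fullmatch(report_id) is not None


def parse_coverage_csv(data: str) -> list[dict[str, str]]:
    """Parse the CSV text returned in `data` into one dict per row.

    The header row supplies the keys, so no column names are assumed.
    """
    reader = csv.DictReader(io.StringIO(data))
    return [dict(row) for row in reader]
```

Since reports may expire quickly, a workflow would typically run the ID check and the download as consecutive steps right after report generation.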

Diffbot Get Product (DIFFBOT_GET_PRODUCT)

Tool to extract product information such as specifications, prices, availability, and reviews. Use when you need structured data from a product page.

Input parameters

url (string, required): URL of the product page to analyze

mode (string, optional): Extraction mode override (defaults to 'product')

fields (array, optional): List of fields to return, e.g., ["title", "offerPrice", "images"]

paging (boolean, optional): Enable automatic paging of results

timeout (integer, optional): Maximum timeout for the API call, in milliseconds

discussion (boolean, optional): Include discussions/comments in the response

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
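Diffbot's v3 Extract APIs wrap their results in an `objects` array. Assuming the `data` string of this action carries that response body unchanged (an assumption; the output schema only says `data` is a string), pulling the requested product fields out of it might look like the sketch below. `first_product` is a hypothetical helper.

```python
import json
from typing import Any


def first_product(
    data: str, wanted: tuple[str, ...] = ("title", "offerPrice", "images")
) -> dict[str, Any]:
    """Return the wanted keys of the first product object in `data`.

    Assumes `data` is JSON shaped like {"objects": [{"type": "product", ...}]}.
    Keys missing from the product map to None.
    """
    payload = json.loads(data)
    for obj in payload.get("objects", []):
        if obj.get("type") == "product":
            return {key: obj.get(key) for key in wanted}
    raise LookupError("no product object found in response")
```

Requesting the same names through `fields` (for example `["title", "offerPrice", "images"]`) keeps the response small and makes missing values easy to spot as `None`.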

Get Video Data (DIFFBOT_GET_VIDEO)

Tool to extract information from videos, including titles, descriptions, and embedded HTML. Use when you need structured video metadata from any web page.

Input parameters

url (string, required): Full URL of the web page to analyze for embedded videos; must start with http or https

mode (string, optional): Extraction mode override (e.g., 'auto')

fields (string, optional): Comma-separated list (or array) of optional fields to include in the response (e.g., 'links', 'meta', 'querystring', 'breadcrumb'). Standard fields are always returned.

paging (boolean, optional): Whether to return all detected results in one call (may increase runtime)

timeout (integer, optional): Maximum time to wait for extraction, in milliseconds

callback (string, optional): Name of the JSONP callback function (if using JSONP)

fallback (boolean, optional): Whether to try an alternate extraction method if the primary one fails

discussion (boolean, optional): Include user discussion data (comments) if available

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

List Bulk Jobs (DIFFBOT_LIST_BULK_JOBS)

Tool to list all Bulk jobs associated with a specific token. Use after authenticating to retrieve statuses of all jobs for the account.

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

List Bulk Jobs Status For Token (DIFFBOT_LIST_BULK_JOBS_STATUS_FOR_TOKEN)

Tool to get the status of all bulk enhance jobs for a token. Returns a list of all bulk jobs associated with your API token. Use when you need to monitor or retrieve the status of multiple bulk jobs at once.

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

List Custom APIs (DIFFBOT_LIST_CUSTOM_APIS)

Tool to retrieve all Custom APIs and their extraction rules currently defined on your Diffbot token. Use when you need to list, review, or audit custom API configurations for your account.

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

Manage Crawl Job (DIFFBOT_MANAGE_CRAWL)

Manages Diffbot crawl jobs: pause, restart, delete, or view status. When called without parameters, returns a list of all active crawl jobs. To control a specific job, pass 'name' together with the corresponding action flag (pause=1, restart=1, or delete=1).

Input parameters

name (string, optional): Unique identifier of the crawl job to manage. Omit to list all active crawl jobs.

pause (integer, optional): Set to 1 to pause the specified crawl job. Requires the 'name' parameter.

delete (integer, optional): Set to 1 to delete the specified crawl job. Requires the 'name' parameter.

restart (integer, optional): Set to 1 to restart the specified crawl job. Requires the 'name' parameter.

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
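The flag rules above (each of pause, restart, and delete is set to 1 and needs `name`; sending nothing lists active jobs) are easy to get wrong in a generated step. Below is a minimal validator sketch. It assumes only one control flag should be sent per call, which is a conservative reading since the description does not say whether flags may be combined, and it assumes a `name` with no flag returns that job's status. `build_manage_crawl_input` is a hypothetical helper.

```python
from typing import Optional


def build_manage_crawl_input(
    name: Optional[str] = None, action: Optional[str] = None
) -> dict[str, object]:
    """Build DIFFBOT_MANAGE_CRAWL input parameters.

    action: None to view status or list jobs, or one of "pause", "restart", "delete".
    """
    if action is None:
        # No flag: with a name, presumably view that job's status (assumption);
        # with no parameters at all, list all active crawl jobs.
        return {"name": name} if name else {}
    if action not in {"pause", "restart", "delete"}:
        raise ValueError(f"unsupported action: {action!r}")
    if not name:
        raise ValueError(f"'{action}' requires the 'name' parameter")
    # Conservative choice: send exactly one flag, set to 1 as documented.
    return {"name": name, action: 1}
```

Failing fast on a missing `name` avoids sending a delete or pause request that the API cannot attach to a job.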

Resolve Lost ID (DIFFBOT_RESOLVE_LOST_ID)

Tool to resolve lost IDs in the Knowledge Graph. Use when you need to map a lost identifier to its canonical counterpart for data consistency.

Input parameters

type (string, optional): The type of object (e.g., 'article', 'product'). If omitted, Diffbot will attempt to infer it.

lostId (string, required): The lost ID to resolve to a canonical ID

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

Diffbot Knowledge Graph Search (DIFFBOT_SEARCH)

Search the Diffbot Knowledge Graph using DQL (Diffbot Query Language). Query billions of entities including organizations, people, articles, products, and more. Use structured queries to filter by type, fields, and relationships.

Input parameters

col (string, optional): Comma-separated list of custom crawl collections to search (default "all"). Used only when query_type="crawl".

size (integer, optional): Maximum number of results to return (default 50). Use -1 to return all results. Constraint: from + size ≤ 10,000 for facet queries.

query (string, required): DQL (Diffbot Query Language) query to search the Knowledge Graph. Use structured syntax to filter entities and documents.

filter (string, optional): Path filter to include a specific field in the response JSON, using dot notation or JsonPath. The Diffbot API accepts only a single path.

format (string, optional): Output format. This action supports only "json"; non-JSON formats (jsonl, csv, xls, xlsx) cannot be processed and are ignored.

offset (integer, optional): Starting index for pagination (API parameter 'from'; default 0). Constraint: from + size ≤ 10,000 for facet queries.

jsonmode (string, optional): JSON mode: "extended" (includes origin info) or "id" (returns diffbotIds only)

query_type (string, optional): Type of query: "query" (default, structured DQL), "text" (free-text search), "queryTextFallback" (tries DQL first, then falls back to text), or "crawl" (searches custom crawl collections)

filter_exclude (string, optional): Path filter to exclude a specific field from the response JSON. The Diffbot API accepts only a single path.

non_canonical_facts (boolean, optional): Include non-canonical facts in results (default false)

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
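The `offset` and `size` inputs interact through the documented facet-query limit of from + size ≤ 10,000. A small sketch that plans (offset, size) pairs for paging through results while honoring that cap follows; `page_plan` is a hypothetical helper, and it only plans the calls rather than making them.

```python
from typing import Iterator

FACET_WINDOW = 10_000  # from + size must stay at or below this for facet queries


def page_plan(total: int, page_size: int = 50, facet: bool = False) -> Iterator[tuple[int, int]]:
    """Yield (offset, size) pairs that cover `total` results.

    For facet queries the window is capped so offset + size <= 10,000,
    matching the documented constraint; results beyond it are not requested.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    limit = min(total, FACET_WINDOW) if facet else total
    offset = 0
    while offset < limit:
        size = min(page_size, limit - offset)
        yield offset, size
        offset += size
```

Each pair would then become one search step, for example `{"query": "type:Organization", "offset": offset, "size": size}`.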

Search Crawl Job Data (DIFFBOT_SEARCH_CRAWL_DATA)

Tool to query crawl job collections using DQL (Diffbot Query Language). Use when you need to search extracted data from completed crawl or bulk jobs by collection name.

Input parameters

col (string, required): Name of the collection (Crawl or Bulk job name) to search

num (integer, optional): Number of results to return per page (default varies by API)

query (string, required): Search query in Diffbot Query Language (DQL). Supports operators such as type:Article and sortby:date.

start (integer, optional): Pagination offset, i.e., the number of results to skip (default 0)

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
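Since the default `num` varies, it is safer to set it explicitly and walk the collection with `start` until a short page comes back. A sketch of that loop, where `fetch_page(start, num)` is a hypothetical callable that runs this action with your `col` and `query` and returns the decoded list of results:

```python
from typing import Any, Callable, Iterator


def iter_crawl_results(
    fetch_page: Callable[[int, int], list[Any]],
    num: int = 50,
    max_results: int = 1_000,
) -> Iterator[Any]:
    """Yield results page by page using start/num offsets.

    Stops at max_results or when a page is shorter than `num`,
    which is taken to mean the collection is exhausted.
    """
    start = 0
    while start < max_results:
        page = fetch_page(start, num)
        # Never yield more than max_results items in total.
        yield from page[: max_results - start]
        if len(page) < num:
            break
        start += num
```

Treating a short page as the end saves one extra empty request compared with looping until an empty page appears.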

Start Bulk Job (DIFFBOT_START_BULK)

Tool to start a Bulk Extract job. Use when processing large numbers of URLs asynchronously. The Diffbot Bulk API uses GET requests with query parameters to create jobs.

Input parameters

name (string, required): Unique job name for identification. Required by the Diffbot API.

urls (array, required): List of page URLs to process. The Diffbot API expects these as a whitespace-separated list in the request.

apiUrl (string, required): Diffbot Extract API endpoint to use (e.g., 'https://api.diffbot.com/v3/article'). Do not include the token; it is added automatically.

maxToCrawl (integer, optional): Maximum number of URLs to crawl

notifyEmail (string, optional): Email address to notify when the job completes

maxToProcess (integer, optional): Maximum number of URLs to process

notifyWebhook (string, optional): Webhook URL to POST to on job completion

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
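Two rules from the Start Bulk Job inputs are worth checking before submission: URLs travel whitespace-separated, so a URL containing whitespace would split into bogus entries, and `apiUrl` must not already carry a token. A minimal pre-flight sketch using only the standard library; `build_bulk_input` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def build_bulk_input(name: str, urls: list[str], api_url: str) -> dict[str, object]:
    """Validate DIFFBOT_START_BULK inputs and return the parameter dict."""
    if not name.strip():
        raise ValueError("name is required")
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("urls must contain at least one URL")
    # Whitespace separates URLs in the API call, so interior whitespace would break one URL into two.
    bad = [u for u in cleaned if any(ch.isspace() for ch in u)]
    if bad:
        raise ValueError(f"URLs must not contain whitespace: {bad}")
    # The token is added automatically, so reject an apiUrl that already has one (even if blank).
    if "token" in parse_qs(urlparse(api_url).query, keep_blank_values=True):
        raise ValueError("remove the token from apiUrl; it is added automatically")
    return {"name": name, "urls": cleaned, "apiUrl": api_url}
```

URLs that genuinely contain spaces should be percent-encoded (`%20`) before they reach this step.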

Start Crawl Job (DIFFBOT_START_CRAWL)

Initiates a Diffbot crawl job that spiders a website starting from seed URLs and processes discovered pages with a specified Extract API. The crawler follows links within the domain, collects structured data (articles, products, etc.), and stores results for download. Use this to systematically extract data from entire websites or sections of them. Requires a Diffbot Plus plan or higher.

Input parameters

name (string, required): Unique identifier for the crawl job, used to manage and retrieve the crawl

seeds (array, required): List of seed URLs where crawling begins. URLs are URL-encoded automatically.

apiUrl (string, required): Full Diffbot Extract API endpoint URL used to process crawled pages. Examples: 'https://api.diffbot.com/v3/article' for articles, 'https://api.diffbot.com/v3/product' for products, 'https://api.diffbot.com/v3/analyze' for automatic type detection.

repeat (number, optional): Number of days between automatic crawl repeats. Use 7.0 for weekly or 1.0 for daily. Omit for a one-time crawl.

crawlDelay (number, optional): Delay in seconds between requests to the same IP address. Default is 0.25 seconds.

maxToCrawl (integer, optional): Maximum number of pages to crawl/spider. Default is 100,000. Use -1 for unlimited.

obeyRobots (integer, optional): Whether to respect robots.txt directives: 1 = obey (default), 0 = ignore.

notifyEmail (string, optional): Email address to notify when the crawl completes

maxToProcess (integer, optional): Maximum number of pages to process with the Extract API. Default is 100,000. Use -1 for unlimited.

customHeaders (object, optional): Custom HTTP headers to include in crawl requests (e.g., {'User-Agent': 'MyBot/1.0'})

notifyWebhook (string, optional): Webhook URL to POST to when the crawl completes

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful
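Because `crawlDelay` applies per IP address, it puts a floor under how long a crawl of a site served from one IP can take: roughly pages × crawlDelay seconds, before any download or extraction time. The worked calculation below assumes a single-IP site; `min_crawl_hours` is a hypothetical helper.

```python
def min_crawl_hours(pages: int, crawl_delay: float = 0.25) -> float:
    """Approximate lower bound on crawl time for a site served from one IP address.

    Uses the documented default crawlDelay of 0.25 s and ignores fetch and
    extraction time, so real crawls will take longer.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative; -1 (unlimited) has no finite bound")
    return pages * crawl_delay / 3600


# Default maxToCrawl (100,000 pages) at the default delay: 25,000 s.
print(round(min_crawl_hours(100_000), 2))  # 6.94
```

So a default-sized crawl of a single-host site needs close to seven hours at minimum, which is worth knowing before choosing a daily `repeat` of 1.0.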

Stop Bulk Job (DIFFBOT_STOP_BULK_JOB)

Tool to pause (stop) a running Bulk job. Pausing halts further processing of URLs while preserving existing progress. To resume, use the appropriate resume action. Specify the exact job name (case-sensitive) as provided when the job was created.

Input parameters

name (string, required): Name of the Bulk job to pause/stop (as defined when creating the job)

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

Stop KG Bulk Job By ID (DIFFBOT_STOP_KG_BULK_JOB_BY_ID)

Tool to stop an active Knowledge Graph Enhance bulk job by its ID. Halts processing of a running KG bulk job immediately. Use when you need to stop a specific KG bulk job using its bulkjobId.

Input parameters

bulkjobId (string, required): The unique identifier of the Knowledge Graph Enhance bulk job to stop (e.g., 'B-1ff60452-8421')

Output

data (string, required): Data from the action execution

error (string, optional): Error, if any, that occurred during execution of the action

successful (boolean, required): Whether the action execution was successful

On the Nagent Platform

Agents using Diffbot

No public marketplace agent uses this tool yet, but 38 privately built agents on Nagent already use Diffbot.

Build on Nagent

Build an agent that uses Diffbot

Connect Diffbot to any Nagent agent in minutes. Nagent stores your API key and handles the boilerplate, so you just configure and deploy.


Frequently Asked

Building with Diffbot on Nagent

The five questions agent builders ask before adopting a new integration.

How do I connect Diffbot to my Nagent agent?

Open the External Integrations panel in Nagent (app.nagent.ai/externalIntegration), find Diffbot, and click "Connect Now." You'll authenticate with an API key, and Nagent handles credential storage. Once connected, Diffbot is available to any agent in your workspace.

Do I need to write code to use Diffbot?

No. Nagent provides no-code integration for every tool. Once Diffbot is connected, you configure its 35 actions directly in the agent builder UI — no API calls, no boilerplate, no schema management.

How do I configure Diffbot actions and triggers in Helix?

Helix, Nagent's agentic agent builder, lets you drop Diffbot steps into any workflow visually. Pick one of the actions listed above, fill in its inputs (Helix knows which parameters are required and which are optional), and connect it to upstream and downstream steps. Triggers run as the entry point of an agent, so when a Diffbot event fires, the agent kicks off automatically.

What input and output schemas does Diffbot support?

Every Diffbot action and trigger ships with a fully-typed schema — input parameters with name, type, required flag, and description, plus the output payload shape. The schemas are documented in the sections above. Helix uses these schemas to validate your configuration at build time and to type-check the data flowing between steps.

Can I extend Diffbot with custom logic?

Yes. While Diffbot ships with 35 pre-built artificial intelligence actions, you can layer custom logic around them inside Helix — pre/post-processing steps, conditional branches, retries, or stitching Diffbot together with other connected tools. For deeper customization, talk to our team about Nagent's Agentic AI Lab — forward-deployed engineers who build Diffbot-based workflows tailored to your business.
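As an illustration of the retry logic mentioned above, here is a generic wrapper with exponential backoff. `run_action` stands in for whatever invokes a Diffbot action and returns the `data`/`error`/`successful` envelope; it is a hypothetical callable, not a Nagent API, and the backoff defaults are arbitrary.

```python
import time
from typing import Any, Callable


def call_with_retries(
    run_action: Callable[[], dict[str, Any]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call `run_action` until it reports success or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    returns the last envelope, whether or not it succeeded.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result: dict[str, Any] = {}
    for attempt in range(attempts):
        result = run_action()
        if result.get("successful"):
            break
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return result
```

Injecting `sleep` keeps the wrapper testable without real delays; in a workflow you would also limit retries to errors that are plausibly transient, such as timeouts.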
