State-of-the-art AI-enabled data extraction
ScrapingAnt provides AI-enabled data extraction. For every scraping request, ScrapingAnt can extract the data from the page into a structured JSON object using our AI technology.
All you need to do is specify the data you want to extract from the page. The extraction input is free-form text that describes this data.
While processing, ScrapingAnt's AI model will convert the free-form input parameters into camelCase JSON property names and extract the data from the page into a JSON object with the same structure.
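To make the naming step concrete, here is a minimal sketch of how a flat, comma-separated description might map to camelCase keys. The real conversion is performed by the AI model and may differ; `to_camel_case` and `expected_keys` are hypothetical helpers written only for illustration, not part of the ScrapingAnt API.

```python
import re


def to_camel_case(description: str) -> str:
    """Approximate the camelCase key for one free-form property description."""
    # Split on anything that is not a letter or digit, dropping empty pieces.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", description) if w]
    if not words:
        return ""
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w[:1].upper() + w[1:].lower() for w in rest)


def expected_keys(extract_properties: str) -> list[str]:
    """Split a flat comma-separated description into approximate JSON keys."""
    return [
        to_camel_case(part)
        for part in extract_properties.split(",")
        if part.strip()
    ]


# Assumption: this only approximates the model's naming for simple inputs.
print(expected_keys("product title, price, full description"))
# prints ['productTitle', 'price', 'fullDescription']
```

A sketch like this can help you predict which keys to read from the response, but the response itself remains the source of truth.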
Because the extraction is driven by a description of the data rather than by fixed selectors, it keeps working when the page structure changes. It also handles pages with dynamic content, such as Single-Page Applications (SPA), by rendering them in ScrapingAnt's cloud browser.
AI extractor parameters
AI extractor uses a separate AI-enabled endpoint:
It uses the same request structure as the general endpoint, with an additional extract_properties parameter: free-form text that describes the data to extract.
The basic request to the AI extractor requires 3 parameters:
- url: URL of the page to extract data from
- x-api-key: ScrapingAnt API key
- extract_properties: free-form text that describes the data you want to extract
In the common case, we expect extract_properties to be a comma-separated list of the data you want to extract. For example:
product title, price, full description
You can also add further details to the request: the input is processed by the AI model too, so it can handle more sophisticated expressions. For example:
product title, price(number), full description, reviews(list: review title, review content)
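If you generate such descriptions programmatically, a small helper can assemble them from a nested specification. The sketch below simply reproduces the formatting used in the example above. Since the input is free-form, this is not a formal grammar, and `describe` is a hypothetical helper rather than part of the API.

```python
def describe(fields) -> str:
    """Render a nested field specification as free-form extraction text.

    Each item is either a plain name, a (name, type_hint) pair, or a
    (name, [sub_fields]) pair for a list of nested objects.
    """
    parts = []
    for field in fields:
        if isinstance(field, str):
            parts.append(field)
            continue
        name, hint = field
        if isinstance(hint, list):
            # Nested list of objects, e.g. reviews(list: review title, ...)
            parts.append(f"{name}(list: {describe(hint)})")
        else:
            # Simple type hint, e.g. price(number)
            parts.append(f"{name}({hint})")
    return ", ".join(parts)


spec = [
    "product title",
    ("price", "number"),
    "full description",
    ("reviews", ["review title", "review content"]),
]
print(describe(spec))
# prints product title, price(number), full description, reviews(list: review title, review content)
```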
Like all other API parameters, extract_properties should be URL-encoded and sent to the API as a query parameter.
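As an illustration of the encoding step, the following sketch builds a request URL with Python's standard library. `EXTRACT_ENDPOINT` is a placeholder, not the real endpoint address, and `build_extract_url` is a hypothetical helper. Passing all three parameters in the query string is an assumption based on the parameter list above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Placeholder only: substitute the actual AI extractor endpoint here.
EXTRACT_ENDPOINT = "https://api.example.invalid/extract"


def build_extract_url(page_url: str, api_key: str, extract_properties: str) -> str:
    """Return the endpoint URL with all parameters URL-encoded in the query."""
    params = {
        "url": page_url,
        "x-api-key": api_key,
        "extract_properties": extract_properties,
    }
    # quote_via=quote percent-encodes spaces as %20 instead of "+".
    return f"{EXTRACT_ENDPOINT}?{urlencode(params, quote_via=quote)}"


request_url = build_extract_url(
    "https://example.com",
    "YOUR_API_KEY",  # placeholder key
    "product title, price(number)",
)
print(request_url)

# Round trip: decoding the query string recovers the original free-form text.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["extract_properties"][0])
```

Commas, spaces, parentheses, and colons in the description are all percent-encoded, so the more detailed expressions survive transport unchanged.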
AI extractor request example
The simplest request that extracts the title and content of the web page:
curl --request GET \
This request uses the following extract_properties parameter value:
The output of this request is the following JSON object:
{
"title": "Example Domain",
"content": "This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission."
}
The AI extractor derives the JSON property names from your free-form descriptions and returns the extracted data in a JSON object with that structure.
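On the client side, the returned object can be read like any other JSON document. The sketch below parses the sample output shown above. Using `.get` for lookups is a defensive choice that assumes a requested property could be absent from the response, which the documentation does not state either way.

```python
import json

# Sample response body, copied from the example output above.
response_body = """
{
  "title": "Example Domain",
  "content": "This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission."
}
"""

data = json.loads(response_body)

# Keys follow the names derived from the extract_properties description.
title = data.get("title")
content = data.get("content")

# Assumption: a property the extractor could not fill may be missing,
# so .get returns None here instead of raising KeyError.
price = data.get("price")

print(title)
print(len(content))
print(price)
```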
AI extractor cost
The AI extractor cost is calculated based on the number of characters in the original web page text and the number of output characters.
Learn more here: AI extractor cost
AI extractor temporary limitations
- The AI extractor works only with text extracted from the page. It doesn't work with images, videos, or other non-text content.
- The AI extractor is multi-language, but it works best when the input parameters are described in English, which yields proper JSON structure names.
- Nested JSON output structures are supported, but require more sophisticated input parameters.
Check out the AI extractor best practices for more information.