AI extractor best practices
Tips and tricks for AI extractor usage
This page contains useful information about AI extractor usage.
Specify web scraping parameters
AI extractor is a powerful tool that can extract data from any web page, but you need to specify the web scraping parameters to get the best results.
Because AI extractor is built on ScrapingAnt's web scraper technology, it accepts the same parameters as the general endpoint. With them you can enable or disable browser rendering and use any of the available proxy types.
Familiarity with these parameters is also crucial for avoiding anti-scraping detection and geo-blocking.
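As a rough illustration, the sketch below builds an AI extractor request URL with the scraping parameters set explicitly. The endpoint path, the x-api-key, browser, proxy_type and proxy_country parameter names, and the example values are assumptions made for this sketch. Only extract_properties is named on this page, so check the API reference before relying on the rest.

```python
from urllib.parse import urlencode

# Assumed endpoint path; confirm it against the API reference.
EXTRACT_ENDPOINT = "https://api.scrapingant.com/v2/extract"


def build_extract_url(api_key, target_url, extract_properties,
                      browser=True, proxy_type="datacenter", proxy_country=None):
    """Build a GET URL for the AI extractor with explicit scraping parameters.

    All parameter names except extract_properties are assumptions.
    """
    params = {
        "x-api-key": api_key,
        "url": target_url,
        "extract_properties": extract_properties,
        # Browser rendering can be switched off for static pages.
        "browser": "true" if browser else "false",
        "proxy_type": proxy_type,
    }
    # Only send a proxy country when geo-targeting is actually needed.
    if proxy_country:
        params["proxy_country"] = proxy_country
    return f"{EXTRACT_ENDPOINT}?{urlencode(params)}"


url = build_extract_url(
    "YOUR_API_KEY",
    "https://example.com/item",
    "product title, price(number)",
    browser=False,
    proxy_type="residential",
    proxy_country="US",
)
print(url)
```

The resulting URL can be passed to any HTTP client. Keeping the parameters in one helper makes it easier to send the same settings to the general or Markdown endpoint when comparing results.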
Specify the data you want to extract
AI extractor uses a state-of-the-art AI model to extract data from the page, so it's important to be as specific as possible.
For example, on e-commerce websites the price could be requested as free-form text, but it's better to specify the price format, currency, and other details. Since the extract_properties parameter is free-form text, you can include any details about the data you want to extract.
By appending the data type to the property name, you can specify the data format. For example:
product title, price(number), full description
The JSON output will then contain the price property with a number value type.
There are no limitations on what you append, so you can specify any details about the data you want to extract. For example:
product title, price(number), full description, reviews(list: review title, review content)
This way the JSON output will contain the reviews array, and each item of the array will contain the reviewTitle and reviewContent properties.
You can use virtually any structure to describe the data you want to extract, as the AI model places no limits on nested structures.
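If the property list is assembled in code, a small helper can keep the type hints and nested lists consistent. The sketch below is one possible way to do that. The helper names and the tuple-based input format are hypothetical conventions invented for this example, not part of the API, which only receives the final free-form string.

```python
def render_property(field):
    """Render one field as free-form text.

    A field is either a plain name, a (name, type_hint) tuple,
    or a (name, [sub_fields]) tuple for a nested list.
    """
    if isinstance(field, str):
        return field
    name, spec = field
    if isinstance(spec, str):
        # Simple type hint, e.g. price(number)
        return f"{name}({spec})"
    # Nested list, e.g. reviews(list: review title, review content)
    inner = ", ".join(render_property(item) for item in spec)
    return f"{name}(list: {inner})"


def build_extract_properties(fields):
    """Join all rendered fields into one extract_properties string."""
    return ", ".join(render_property(field) for field in fields)


fields = [
    "product title",
    ("price", "number"),
    "full description",
    ("reviews", ["review title", "review content"]),
]
properties = build_extract_properties(fields)
print(properties)
# prints: product title, price(number), full description, reviews(list: review title, review content)
```

Because nested entries are rendered recursively, a sub-field can itself carry a type hint or another list.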
Verify the data input
The AI model may not find the needed data inside the web page, in which case it fills the corresponding output properties with null.
To verify the data being processed, you can use the Markdown endpoint response for the same URL. AI extractor uses the same web scraper technology as the general endpoint plus Markdown transformation, so that response contains the input for the AI model.
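A quick way to spot missing data is to walk the parsed JSON output and list every property that came back as null. The sketch below assumes the response body has already been parsed. The sample payload and its property names (such as productTitle) are made up for illustration.

```python
import json


def find_null_paths(value, path=""):
    """Return the paths of every null (None) value in parsed JSON."""
    if value is None:
        return [path or "<root>"]
    paths = []
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else key
            paths.extend(find_null_paths(item, child))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            paths.extend(find_null_paths(item, f"{path}[{index}]"))
    return paths


# Hypothetical extractor output with two fields the model could not fill.
sample = json.loads(
    '{"productTitle": "Desk lamp", "price": null,'
    ' "reviews": [{"reviewTitle": "Great", "reviewContent": null}]}'
)
missing = find_null_paths(sample)
print(missing)  # ['price', 'reviews[0].reviewContent']
```

Any path reported here is a candidate for checking against the Markdown endpoint response for the same URL, to see whether the data was present in the model's input at all.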
HTML-related data extraction
AI extractor cannot process HTML tags that are part of the layout rather than part of the content.
AI extractor cost
AI extractor cost is calculated based on the amount of Markdown text in the original web page and the amount of output text.
The pricing can be found on the API credits pricing page.
Each successful response contains the Ant-credits-cost header, which shows the number of credits spent on the request. Requests that return an error response don't cost any credits.
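To track spending per request, the header can be read from the response headers of whichever HTTP client is in use. The sketch below works on a plain dictionary of headers and assumes the header value is numeric, which this page does not state explicitly. The helper name is hypothetical.

```python
def credits_spent(headers):
    """Return the Ant-credits-cost value as a float, or None if unavailable.

    Header names are compared case-insensitively, since HTTP clients
    may normalise their casing differently.
    """
    for name, value in headers.items():
        if name.lower() == "ant-credits-cost":
            try:
                return float(value)  # assumption: the value is numeric
            except ValueError:
                return None
    return None


# Hypothetical response headers from a successful request.
example_headers = {"Content-Type": "application/json", "ant-credits-cost": "12"}
print(credits_spent(example_headers))  # 12.0
```

Returning None when the header is missing avoids miscounting error responses, which do not cost any credits.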