Tips and tricks for AI extractor usage
This page contains useful information about AI extractor usage.
AI extractor best practices
Specify web scraping parameters
AI extractor can extract data from any web page, but the quality of the results depends on the web scraping parameters you specify.
Since AI extractor is based on ScrapingAnt's web scraper technology, specify the same parameters as you would for the general endpoint. They let you enable or disable browser rendering and use any of the available proxy types.
It also helps to be familiar with the available web scraping parameters in order to avoid anti-scraping detection and geo-blocking.
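As a rough sketch of how these parameters might be passed, the snippet below builds an AI extractor request URL with the Python standard library only. The endpoint path and the parameter names x-api-key, browser, proxy_type and proxy_country are assumptions modeled on the general endpoint, and build_extract_url is a hypothetical helper; check all of them against the API reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint path; verify it against the API reference.
EXTRACT_ENDPOINT = "https://api.scrapingant.com/v2/extract"


def build_extract_url(api_key, target_url, extract_properties,
                      browser=True, proxy_type=None, proxy_country=None):
    """Build a request URL for AI extractor (parameter names are assumptions)."""
    params = {
        "url": target_url,
        "x-api-key": api_key,
        "extract_properties": extract_properties,
        # Browser rendering can be switched off for static pages.
        "browser": "true" if browser else "false",
    }
    # Optional proxy settings are only sent when explicitly requested.
    if proxy_type is not None:
        params["proxy_type"] = proxy_type
    if proxy_country is not None:
        params["proxy_country"] = proxy_country
    return f"{EXTRACT_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_extract_url(
        "YOUR_API_KEY",
        "https://example.com/product/1",
        "product title, price(number)",
        browser=False,
        proxy_country="US",
    ))
```

Keeping the URL construction in one function makes it easier to reuse the same scraping parameters for both the general endpoint and AI extractor.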
Specify the data you want to extract
AI extractor uses an AI model to extract data from the page, so it's important to describe the data you want as specifically as possible.
For example, on e-commerce websites the price could be requested as free-form text, but it's better to specify the price format, currency, and other details. Since the extract_properties parameter is free-form text, you can include any details about the data you want to extract.
By appending the data type to the property name, you can specify the data format. For example:
product title, price(number), full description
The JSON output will then contain the price property with the number value type.
There are no limitations on what you append to a property name, so you can specify any details about the data you want to extract. For example:
product title, price(number), full description, reviews(list: review title, review content)
This way the JSON output will contain the reviews array, and each item of the array will contain the review title and review content.
You can use virtually any structure to describe the data you want to extract, as the AI model places no limits on nested structures.
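The property descriptions above are plain text, so they can be assembled programmatically. The following is a minimal sketch; describe_property, describe_list and build_extract_properties are hypothetical helpers, not part of any ScrapingAnt client.

```python
def describe_property(name, detail=None):
    """Return 'name' or 'name(detail)' for a single property."""
    return f"{name}({detail})" if detail else name


def describe_list(name, item_fields):
    """Return 'name(list: field1, field2)' for a list of nested items."""
    return f"{name}(list: {', '.join(item_fields)})"


def build_extract_properties(parts):
    """Join property descriptions into one free-form text value."""
    return ", ".join(parts)


extract_properties = build_extract_properties([
    describe_property("product title"),
    describe_property("price", "number"),
    describe_property("full description"),
    describe_list("reviews", ["review title", "review content"]),
])
print(extract_properties)
# product title, price(number), full description, reviews(list: review title, review content)
```

Because the parameter is free-form, these helpers only enforce a consistent style; any other wording that describes the data clearly should work just as well.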
Verify the data input
It's possible that the AI model can't find the needed data inside the web page, so it would fill the output properties with
To verify the data being processed, you can use the text field from the extended endpoint response for the same URL. AI extractor uses the same web scraper technology as the general endpoint, so the text field contains the input text for the AI model.
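One way to apply this check is to compare each extracted value with the page text. The sketch below assumes the extracted output is a flat dictionary and that page_text holds the contents of the text field; find_unverified is a hypothetical helper. It uses a simple case-insensitive substring match, so a miss is only a hint to inspect the page, since numbers or dates may be formatted differently in the source.

```python
def find_unverified(extracted, page_text):
    """Return names of properties whose values do not appear in page_text."""
    haystack = page_text.casefold()
    missing = []
    for name, value in extracted.items():
        # Empty or absent values cannot be confirmed against the input text.
        if value is None or str(value).strip() == "":
            missing.append(name)
        elif str(value).casefold() not in haystack:
            missing.append(name)
    return missing


# Sample data for illustration only.
page_text = "Acme Kettle\nPrice: 19.99 USD\nBoils water quickly."
extracted = {
    "product title": "Acme Kettle",
    "price": 19.99,
    "full description": None,
}
print(find_unverified(extracted, page_text))  # ['full description']
```

Nested values such as lists of reviews would need to be walked item by item before applying the same comparison.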
URLs and HTML-related data extraction
AI extractor cannot process URLs and HTML tags that are part of the page layout rather than its text. If a URL appears in the web page as text, it can be extracted. AI extractor is under active development, so some of these limitations may be removed in the future.
AI extractor cost
AI extractor cost is calculated from the amount of text in the original web page and the amount of text in the output.
The pricing can be found on the API credits pricing page.
Each successful response contains the Ant-credits-cost header, which shows the number of credits spent on the request. Requests that return an error response don't cost any credits.
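To keep track of spending, the header can be read from each response and summed. This is a minimal sketch that works on plain dictionaries of response headers; credits_spent and total_credits are hypothetical helpers, and the header value is assumed to be a numeric string.

```python
def credits_spent(headers):
    """Return the value of the Ant-credits-cost header, or None if absent."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "ant-credits-cost":
            return float(value)  # assumption: the value is numeric
    return None


def total_credits(all_headers):
    """Sum the credits over many responses, skipping those without the header."""
    costs = (credits_spent(headers) for headers in all_headers)
    return sum(cost for cost in costs if cost is not None)


if __name__ == "__main__":
    # Sample header sets for illustration only.
    sample = [
        {"Ant-credits-cost": "10"},
        {"Content-Type": "application/json"},  # e.g. an error response
    ]
    print(total_credits(sample))
```

Responses without the header are skipped rather than counted as zero-cost, which keeps error responses from hiding a missing header on a successful one during debugging.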