Version: v2

AI extractor best practices

Tips and tricks for AI extractor usage

This page contains useful information about AI extractor usage.

Specify web scraping parameters

AI extractor can extract data from any web page, but the quality of the results depends on the web scraping parameters you pass with the request.

Since AI extractor is built on ScrapingAnt's web scraper technology, specify the same parameters as you would for the general endpoint. They let you enable or disable browser rendering and choose among the available proxy types.

It's also worth getting familiar with the available web scraping parameters, since they help to avoid anti-scraping detection and geo-blocking.
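
As a rough illustration of passing scraping parameters alongside the extraction request, the sketch below only builds the request URL and sends nothing. The endpoint path and the query parameter names other than extract_properties (url, x-api-key, browser, proxy_type, proxy_country) are assumptions here and should be checked against the API reference.

```python
from urllib.parse import urlencode

# Assumed endpoint path; verify it in the API reference before use.
EXTRACT_ENDPOINT = "https://api.scrapingant.com/v2/extract"


def build_extract_url(api_key, target_url, extract_properties,
                      browser=True, proxy_type=None, proxy_country=None):
    """Build a GET URL for the AI extractor with explicit scraping parameters."""
    params = {
        "url": target_url,                  # assumed parameter name
        "x-api-key": api_key,               # assumed parameter name
        "extract_properties": extract_properties,
        "browser": "true" if browser else "false",  # assumed parameter name
    }
    # Optional proxy settings are added only when the caller provides them.
    if proxy_type is not None:
        params["proxy_type"] = proxy_type        # assumed parameter name
    if proxy_country is not None:
        params["proxy_country"] = proxy_country  # assumed parameter name
    return f"{EXTRACT_ENDPOINT}?{urlencode(params)}"


example_url = build_extract_url(
    api_key="<YOUR_API_KEY>",
    target_url="https://example.com/product/1",
    extract_properties="product title, price(number)",
    browser=False,
    proxy_type="residential",
    proxy_country="US",
)
print(example_url)
```

Keeping the parameters in one helper makes it easy to send the same settings to another endpoint later, for example when comparing the extraction result with the page content.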

Specify the data you want to extract

AI extractor uses an AI model to extract data from the page, so describe the data you want as specifically as possible.

For example, on e-commerce websites the price could be requested as free-form text, but it's better to specify the price format, currency, and other details. Since the extract_properties parameter is free-form text, you can include any details about the data you want to extract.

By appending the data type to the property name, you can specify the data format. For example:

product title, price(number), full description

The JSON output will then contain the price property with a number value.

There are no limitations on what you can append, so you can specify any details about the data you want to extract. For example:

product title, price(number), full description, reviews(list: review title, review content)

This way the JSON output will contain the reviews array, and each item of the array will contain the reviewTitle and reviewContent properties.

You can use virtually any structure to describe the data you want to extract, as the AI model does not limit nesting.
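
When the property list grows, it can help to assemble the extract_properties string from data instead of writing it by hand. The helper below is a minimal sketch; build_extract_properties is a hypothetical name, not part of any ScrapingAnt client.

```python
def build_extract_properties(spec):
    """Join (name, hint) pairs into one extract_properties string.

    A hint of None leaves the property name as-is; otherwise the hint is
    appended in parentheses, e.g. ("price", "number") -> "price(number)".
    """
    parts = []
    for name, hint in spec:
        parts.append(name if hint is None else f"{name}({hint})")
    return ", ".join(parts)


properties = build_extract_properties([
    ("product title", None),
    ("price", "number"),
    ("full description", None),
    ("reviews", "list: review title, review content"),
])
print(properties)
```

This reproduces the second example string above, so the same list of pairs can be reused or extended across requests.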

Verify the data input

The AI model may not find the requested data on the web page. In that case it fills the corresponding output properties with null.
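
One way to spot such gaps is to walk the parsed JSON result and collect every property that came back as null. The sketch below assumes the result has already been parsed into Python objects; the sample body and its key names are made up for illustration.

```python
import json


def find_null_properties(value, path=""):
    """Return the paths of every null (None) value in a parsed JSON result."""
    if value is None:
        return [path or "<root>"]
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else key
            found.extend(find_null_properties(item, child))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_null_properties(item, f"{path}[{index}]"))
    return found


# Hypothetical response body, not real API output.
sample = json.loads(
    '{"productTitle": "Desk lamp", "price": null, '
    '"reviews": [{"reviewTitle": "Great", "reviewContent": null}]}'
)
missing = find_null_properties(sample)
print(missing)
```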

To verify the data being processed, request the Markdown endpoint response for the same URL. AI extractor uses the same web scraper technology as the general endpoint plus the Markdown transformation, so that response contains the input for the AI model.
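
Once the Markdown for the same URL is at hand, a quick check is to confirm that the values you expect are present in it at all. The helper below is a minimal sketch that works on a plain string; missing_from_markdown is a hypothetical name and the sample Markdown is invented.

```python
def missing_from_markdown(markdown, expected_snippets):
    """Return the snippets that do not occur in the Markdown (case-insensitive)."""
    haystack = markdown.casefold()
    return [s for s in expected_snippets if s.casefold() not in haystack]


# Hypothetical Markdown, standing in for a Markdown endpoint response.
sample_markdown = "# Desk lamp\n\nA compact LED lamp.\n"
absent = missing_from_markdown(sample_markdown, ["desk lamp", "$19.99"])
print(absent)
```

If a snippet is absent from the Markdown, the model never saw it, so adjusting the scraping parameters is more likely to help than rewording the property description.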

HTML-related data extraction

AI extractor cannot process HTML tags that are part of the page layout rather than part of the content.

AI extractor cost

AI extractor cost is calculated from the amount of Markdown text in the original web page and the amount of output text.

The pricing can be found on the API credits pricing page.

Each successful response contains the Ant-credits-cost header, which shows the number of credits spent on the request. Requests that return an error response don't cost any credits.
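
To track spending per request, the header can be read from the response headers of whatever HTTP client you use. The sketch below takes a plain mapping of header names to values and assumes the header value is an integer string, which the text above does not confirm.

```python
def credits_spent(headers):
    """Read the Ant-credits-cost header, ignoring header-name case.

    Returns None when the header is absent or not an integer
    (the integer format is an assumption).
    """
    for name, value in headers.items():
        if name.lower() == "ant-credits-cost":
            try:
                return int(value)
            except ValueError:
                return None
    return None


# Hypothetical headers, not taken from a real response.
cost = credits_spent({"Content-Type": "application/json", "Ant-credits-cost": "10"})
print(cost)
```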
