Headless browser (JavaScript Rendering)
How to use a headless browser for web scraping
Headless browser (JS rendering) scraping
ScrapingAnt provides users with the ability to perform scraping using a browser.
This means that for every scraping request, a real browser opens the web page. After the page is fully loaded in the browser, ScrapingAnt extracts its HTML content and cookies and returns them to the user.
This technology allows you to extract data from SPAs (single-page applications), sites that use Ajax, and other dynamic websites. Using a web browser can also help you bypass anti-scraping protections such as Cloudflare.
To enable JS rendering, set the browser boolean parameter to true (the default value).
Example:
https://api.scrapingant.com/v2/general?url=https%3A%2F%2Fexample.com
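The target URL has to be percent-encoded before it is passed as the url query value. The sketch below shows one way to build the request URL from the example above with Python's standard library. The helper name build_request_url is hypothetical, and authentication is left out because this section does not describe it.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above.
API_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_request_url(target_url: str) -> str:
    """Return the request URL for a target page (hypothetical helper).

    The browser parameter is omitted, so its default value (true) applies.
    """
    # urlencode percent-encodes the target URL so it is safe as a query value.
    return f"{API_ENDPOINT}?{urlencode({'url': target_url})}"


print(build_request_url("https://example.com"))
# prints https://api.scrapingant.com/v2/general?url=https%3A%2F%2Fexample.com
```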
Using browser without JS rendering
ScrapingAnt also provides the ability to perform scraping with browser behavior, but without JS rendering. This mode offers better performance and lower detection rates.
To enable this mode, set both the browser boolean parameter and the return_page_source parameter to true.
When this mode is enabled, ScrapingAnt uses a real browser to fetch the page, but does not perform JS rendering or any other browser-specific actions. It returns only the raw HTTP response data (usually HTML content) of the page.
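As a rough illustration, a request URL for this mode could be assembled as follows. The endpoint and parameter names come from this page. The helper name build_page_source_url is an assumption made for the example, and authentication is omitted.

```python
from urllib.parse import urlencode

# Endpoint taken from the examples on this page.
API_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_page_source_url(target_url: str) -> str:
    """Build a URL for browser mode without JS rendering (hypothetical helper)."""
    params = {
        "url": target_url,
        "browser": "true",             # keep browser behavior
        "return_page_source": "true",  # return the raw response, skip JS rendering
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_page_source_url("https://example.com"))
```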
Browser-specific parameters
The following parameters apply only to browser-enabled scraping:
js_snippet
wait_for_selector
return_page_source
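Because these parameters apply only when the browser is enabled, a client may want to catch them in a request that sets browser to false. The sketch below is a hypothetical client-side check, not part of the ScrapingAnt API. This page does not say how the service itself handles such requests, and the selector value in the usage line is made up for illustration.

```python
# Parameter names taken from the list above.
BROWSER_ONLY_PARAMS = frozenset(
    {"js_snippet", "wait_for_selector", "return_page_source"}
)


def misplaced_browser_params(params: dict) -> list:
    """Return browser-only parameters found in a request that disables the browser."""
    # browser defaults to true, so only an explicit "false" disables it.
    browser_enabled = str(params.get("browser", "true")).lower() != "false"
    if browser_enabled:
        return []
    return sorted(name for name in params if name in BROWSER_ONLY_PARAMS)


request = {"url": "https://example.com", "browser": "false", "wait_for_selector": ".item"}
print(misplaced_browser_params(request))  # prints ['wait_for_selector']
```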
Scraping without JS rendering
ScrapingAnt also provides the ability to perform scraping without using a browser.
This approach allows you to extract data from static websites while bypassing rate limiting.
To disable JS rendering, set the browser boolean parameter to false.
Example:
https://api.scrapingant.com/v2/general?url=https%3A%2F%2Fexample.com&browser=false
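When building this URL from Python, note that str(False) yields "False" with a capital F, while the example above uses lowercase false. A minimal sketch that converts the flag explicitly is shown below. The helper name build_url_with_browser_flag is hypothetical, and whether the API also accepts other spellings is not stated here.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above.
API_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_url_with_browser_flag(target_url: str, browser: bool = True) -> str:
    """Build a request URL with an explicit browser flag (hypothetical helper)."""
    params = {
        "url": target_url,
        # Emit lowercase "true"/"false" to match the documented examples.
        "browser": "true" if browser else "false",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_url_with_browser_flag("https://example.com", browser=False))
# prints https://api.scrapingant.com/v2/general?url=https%3A%2F%2Fexample.com&browser=false
```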