Web Scraping APIs Guide: Features, Performance and Pricing

In recent years, the surge in web scraping activities has prompted the emergence of diverse APIs provided by proxy services and data collection firms.

This report compares seven prominent vendors in the web scraping API landscape on features, scraping performance, parsing quality, and cost-effectiveness.

The tests focus on three key website categories: search engines, e-commerce platforms, and social media.

Evolution of Web Scraping APIs

Web scraping APIs act as remote web scrapers, accepting API requests with target URLs and optional parameters.

Behind the scenes, these APIs utilize proxies, headers, and even headless browsers to retrieve HTML content. Some advanced APIs employ AI vision and pattern recognition for sophisticated tasks.

Pricing models are often based on successful requests, ensuring predictability. However, some providers exhibit opaque pricing structures.

Key Insights

➡️ Data Output and Parsing:

Six out of seven APIs return raw HTML, with advanced parsers available for specific websites.
Google and Amazon are the most targeted websites, with Oxylabs offering a machine-learning model for parsing most e-commerce stores.

➡️ Data Transfer and Customization:

APIs transfer data over open connections, often acting as proxies for seamless integration.
Customization options include location selection, device specification, and custom headers.
Three APIs accept CSS selectors and three support browser interactions for dynamic scraping scenarios.

➡️ Performance and Reliability:

Performance tests reveal varying speeds, with some APIs excelling in Google and Amazon scraping.
Social media, especially GraphQL, proves challenging for many APIs.
Oxylabs, Smartproxy, and Bright Data emerge as the most reliable performers, boasting robust parsers.

➡️ Pricing Models:

Bright Data charges a uniform price for all features, while Oxylabs and Smartproxy differentiate prices by target group.
ScraperAPI and Zyte employ tiered pricing, with rates differing significantly based on the target website.

Participant Overview

We engaged with seven prominent companies offering web scraping APIs, including established names and proxy providers transitioning into this domain.

The participants willingly provided access to their APIs for scraping Google, Amazon, and a social media network.

Participant Snapshot

Provider	APIs Tested	Starting Price
Oxylabs	Web Scraper API, SERP Scraper API, E-Commerce Scraper API	$99
Bright Data	Web Unlocker, SERP API	$3 (pay as you go), $500 (plan)
Smartproxy	Web Scraping API, SERP Scraping API, E-Commerce Scraping API	$50
Zyte	Zyte API	$0 (pay as you go), $25 (plan)
Rayobyte	Scraping Robot	$0.0018/req
ScraperAPI	ScraperAPI	$49
Shifter	Web Scraping API, SERP API	$44.95

Feature Overview

Integration Methods

In theory, all web scraping APIs use the same basic structure: there's an endpoint where you pass URLs you want to scrape with one or more parameters.
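A minimal sketch of that basic structure, assuming a hypothetical endpoint (`https://api.scraper.example/v1/scrape`) and hypothetical parameter names (`api_key`, `url`, `country`, `device`). No real provider's API is implied, and actual parameter names and encoding differ between vendors:

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
API_ENDPOINT = "https://api.scraper.example/v1/scrape"


def build_request_url(api_key: str, target_url: str,
                      **options: Optional[str]) -> str:
    """Return the GET URL that asks the API to scrape `target_url`.

    `options` holds optional parameters such as country or device.
    The names are illustrative; options set to None are skipped.
    """
    params = {"api_key": api_key, "url": target_url}
    params.update({k: v for k, v in options.items() if v is not None})
    # urlencode percent-encodes the target URL so it survives as one value.
    return f"{API_ENDPOINT}?{urlencode(params)}"


request_url = build_request_url(
    "YOUR_API_KEY",
    "https://www.example.com/search?q=shoes",
    country="gb",
    device="mobile",
)
print(request_url)
```

The important detail is that the target URL travels as an encoded parameter value, so its own query string does not collide with the API's parameters.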

In practice, the implementation can differ significantly. Here are the four main methods we've encountered:

Provider	API (open connection)	API (asynchronous)	Proxy	Library/SDK
Oxylabs	✅ Open connection where you send requests and wait for a response.	✅ Allows asynchronous delivery for bulk scraping.	✅ Can integrate as a proxy.	❌ No dedicated library or SDK.
Bright Data	❌ No open connection method.	✅ Supports asynchronous delivery.	✅ Can integrate as a proxy.	❌ No dedicated library or SDK.
Smartproxy	✅ Open connection method available.	❌ Does not support asynchronous delivery.	✅ Can integrate as a proxy.	❌ No dedicated library or SDK.
Zyte	✅ Open connection for requests.	❌ Does not support asynchronous delivery.	❌ Cannot be used as a proxy.	✅ Provides a Library/SDK.
Rayobyte	✅ Open connection for requests.	❌ Does not support asynchronous delivery.	❌ Cannot be used as a proxy.	❌ No dedicated library or SDK.
ScraperAPI	✅ Open connection method.	✅ Supports asynchronous delivery.	✅ Can integrate as a proxy.	✅ Provides a Library/SDK.
Shifter	✅ Open connection for requests.	❌ Does not support asynchronous delivery.	❌ Cannot be used as a proxy.	✅ Provides a Library/SDK.

API (open connection): Open connection means sending requests to an API endpoint and waiting for the response. GET and POST methods are used, with variations in implementation.
API (asynchronous): Asynchronous delivery allows sending API calls with an ID and fetching results over a webhook, which is useful for scraping in bulk.
Proxy: Most APIs can integrate as proxies, easing the transition from regular proxy servers.
Library/SDK: Some providers offer SDKs for additional convenience.
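The asynchronous method differs most from an ordinary HTTP call, so it may help to see the flow. The sketch below is an in-memory simulation, not any provider's real API: `FakeAsyncScraperAPI`, `submit`, and `process_queue` are hypothetical names, and the "webhook" is a plain Python callback standing in for an HTTP endpoint you would host.

```python
import uuid
from typing import Callable, Dict


class FakeAsyncScraperAPI:
    """In-memory stand-in for a provider's asynchronous API (hypothetical)."""

    def __init__(self, webhook: Callable[[str, str], None]) -> None:
        self._webhook = webhook           # called as webhook(job_id, html)
        self._queue: Dict[str, str] = {}  # job_id -> target URL

    def submit(self, target_url: str) -> str:
        """Queue a job and return its ID immediately, without waiting."""
        job_id = uuid.uuid4().hex
        self._queue[job_id] = target_url
        return job_id

    def process_queue(self) -> None:
        """Simulate the provider finishing jobs and calling the webhook."""
        while self._queue:
            job_id, target_url = self._queue.popitem()
            self._webhook(job_id, f"<html><!-- scraped {target_url} --></html>")


results: Dict[str, str] = {}


def on_result(job_id: str, html: str) -> None:
    """Webhook handler: store each delivered page under its job ID."""
    results[job_id] = html


api = FakeAsyncScraperAPI(on_result)
job_ids = [api.submit(u) for u in ("https://example.com/a", "https://example.com/b")]
api.process_queue()
print(f"received {len(results)} of {len(job_ids)} jobs")
```

The caller never blocks on an individual page: it keeps the job IDs and matches results to them whenever the webhook fires, which is what makes this method suitable for bulk jobs.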

HTML Scraping

General-purpose APIs have one endpoint that attempts to scrape any website, returning pages in raw HTML.

All participants offer an API for general-purpose scraping:

Provider	Relevant Tool
Oxylabs	Web Scraper API
Bright Data	Web Unlocker
Smartproxy	Web Scraping API
Zyte	Zyte API
Rayobyte	Scraping Robot
ScraperAPI	ScraperAPI
Shifter	Web Scraping API

Parameters like geolocation, residential proxies, device type, sessions, cookies, and data input are common among APIs.

Headless Scraping

Headless scraping is crucial for overcoming website protection systems.

Most providers manage headless browsers for you:

Provider	JavaScript Rendering	Screenshots	Browser Actions
Oxylabs	✅ JavaScript rendering is universally available.	✅ Supports taking screenshots.	❌ Does not support direct browser interactions.
Bright Data	✅ JavaScript is handled automatically.	❌ Does not support screenshots.	❌ Does not support direct browser interactions.
Smartproxy	✅ JavaScript rendering is universally available.	✅ Supports taking screenshots.	❌ Does not support direct browser interactions.
Zyte	✅ JavaScript rendering is universally available.	✅ Supports taking screenshots.	✅ Allows direct browser interactions.
Rayobyte	✅ JavaScript rendering is universally available.	✅ Supports taking screenshots.	✅ Allows direct browser interactions.
ScraperAPI	✅ JavaScript rendering is universally available.	❌ Does not support screenshots.	❌ Does not support direct browser interactions.
Shifter	✅ JavaScript rendering is universally available.	✅ Supports taking screenshots.	✅ Allows advanced browser interactions.

JavaScript rendering is universally available, and some providers allow interactions with the browser, such as clicking and scrolling.

Specialized APIs

Specialized APIs target specific website groups, ensuring compatibility and structured scraping:

Provider	Search Engine APIs	E-commerce APIs	Social Media APIs
Oxylabs	Google, Baidu, Bing, Yandex	Amazon, Walmart, eBay, Wayfair + 7 more	❌
Bright Data	Google, Bing, DuckDuckGo, Yandex	❌	❌
Smartproxy	Google, Baidu, Bing, Yandex	Amazon, Idealo, Wayfair	❌
Zyte	❌ No specialized search engine API.	❌ No specialized e-commerce API.	❌
Rayobyte	Google	Amazon	❌
ScraperAPI	❌ No specialized search engine API.	❌ No specialized e-commerce API.	❌
Shifter	Google, Bing, Yandex	❌	❌

Search engines and e-commerce sites are common targets, with Google and Amazon receiving the most attention.

Google Features

Google Features	Oxylabs	Bright Data	Smartproxy	Rayobyte	Shifter
APIs	Search, ads, hotels, images, autocomplete, search volume, trends	Search, maps, trends, reviews, hotels, reverse image	Search, ads, hotels, images, autocomplete, trends	Search	Search, maps, autocomplete, scholar, product, reverse image, jobs, events, Google Play, trends
Search Type (tbm)	✅ Supports specifying search types.	✅ Supports specifying search types.	✅ Supports specifying search types.	❌ Does not support specifying search types.	✅ Supports specifying search types.
Device Type	✅ Supports specifying device types.	✅ Supports specifying device types.	✅ Supports specifying device types.	❌ Does not support specifying device types.	✅ Supports specifying device types.
Location Selection	City-level	City-level	City-level	Country-level	City-level
Localization	Domain, language	Domain, language	Domain, language	Domain, language	Domain, language
Pagination	Start, number of pages	Start, number of pages	Start, number of pages	Number of pages	Start, number of pages

Amazon Features

Amazon Features	Oxylabs	Smartproxy	Rayobyte
APIs	Bestsellers, pricing, product, QA, reviews, search, sellers	Product, pricing, reviews, QA, search, sellers	Product
Device Type	✅	✅	❌
Domain	✅	✅	❌
Delivery Location	✅	✅	❌
Pagination	Start, number of pages	Start, number of pages	❌

Data Parsing

Parsing capabilities vary among providers. Some offer specialized APIs with built-in parsers, while others expose selectors for manual parsing. Overall parsing capabilities are as follows:

Provider	Manual Parsing	Search Engine Parsers	E-commerce Parsers
Oxylabs	❌ Does not support manual parsing.	Google	Amazon, Walmart, eBay, Wayfair, Target, Etsy, AI parsing
Bright Data	❌ Does not support manual parsing.	Google, Bing, Yandex, DuckDuckGo	❌ Does not support specialized e-commerce parsing.
Smartproxy	❌ Does not support manual parsing.	Google	Amazon
Zyte	CSS selectors	❌ Does not support specialized search engine parsing.	❌ Does not support specialized e-commerce parsing.
Rayobyte	CSS, XPath selectors	Google	❌ Does not support specialized e-commerce parsing.
ScraperAPI	❌ Does not support manual parsing.	Google	Amazon
Shifter	CSS selectors	Google, Bing, Yandex	❌ Does not support specialized parsing.

Pre-built parsers for Google are common, and manual parsing is offered by a few providers. Specialized parsers for Amazon are available, with Oxylabs supporting targets beyond Amazon.
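When an API only returns raw HTML, parsing is left to the client. As a rough illustration of manual parsing, the sketch below extracts the text of every element carrying a given class, using only Python's standard library. The sample HTML and class names are made up, and production code would more likely use a dedicated library with full CSS or XPath selector support.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never have a closing tag and so must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.results: List[str] = []
        self._depth = 0               # > 0 while inside a matching element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1          # nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:          # the matching element just closed
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_by_class(html: str, css_class: str) -> List[str]:
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.results


SAMPLE_HTML = """
<ul>
  <li class="result"><h3 class="title">First result</h3><span class="price">$10</span></li>
  <li class="result"><h3 class="title">Second <b>result</b></h3><span class="price">$12</span></li>
</ul>
"""

titles = extract_by_class(SAMPLE_HTML, "title")
prices = extract_by_class(SAMPLE_HTML, "price")
print(list(zip(titles, prices)))
```

A built-in parser spares the client this kind of code, and also the maintenance when the target site changes its markup.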

Google Parsing

Google Parsing	Oxylabs	Bright Data	Smartproxy	Rayobyte	ScraperAPI	Shifter
Data Formats	JSON, CSV	JSON	JSON	JSON	JSON	JSON
Parsable Elements	SERP	✅ Supports parsing Search Engine Results Page (SERP).	✅ Supports parsing SERP.	✅ Supports parsing SERP.	✅ Supports parsing SERP.	✅ Supports parsing SERP.
Search Types (tbms)	Images, news, shopping	Images, news, shopping, videos, maps, hotels	Shopping	❌ Does not support specifying search types.	Shopping	Images, news, shopping, videos, maps
Other	Ads, autocomplete, reverse image, monthly search volume, trends	Reverse image, trends, reviews	Ads, autocomplete, trends	❌ Does not support specialized parsing.	❌ Does not support specialized parsing.	Autocomplete, reverse image, scholar, Play, trends

Amazon Parsing

Amazon Parsing	Oxylabs	Smartproxy	Rayobyte	ScraperAPI
Data Formats	JSON	JSON	JSON	JSON
Parsable Elements	Search	✅ Supports parsing search results.	✅ Supports parsing search results.	✅ Supports parsing offer listings.
	Product	✅ Supports parsing product information.	✅ Supports parsing product information.	✅ Supports parsing product information.
	Reviews	✅ Supports parsing reviews.	❌ Does not support parsing reviews.	✅ Supports parsing reviews.
Others	Bestsellers, ASIN prices, QA, seller info	ASIN prices, QA	❌ Does not support specialized parsing.	❌ Does not support specialized parsing.

Performance Benchmarks of Web Scraping APIs

To evaluate the web scraping APIs, a custom Python script using the asyncio and aiohttp libraries sent asynchronous requests with a timeout of 150 seconds.

The focus was on assessing Google, Amazon, and a photo-centric social media platform across various scenarios.

import asyncio

import aiohttp

# Simplified sketch of the benchmark harness. The URLs are placeholders:
# a real run would call each provider's API endpoint, not the target site.
REQUEST_TIMEOUT_S = 150


async def fetch_data(session: aiohttp.ClientSession, url: str,
                     timeout: int = REQUEST_TIMEOUT_S) -> str:
    """Fetch a URL and return its body as text, or "" on failure."""
    try:
        # ClientTimeout is aiohttp's documented way to set a total timeout.
        client_timeout = aiohttp.ClientTimeout(total=timeout)
        async with session.get(url, timeout=client_timeout) as response:
            response.raise_for_status()
            # These targets return HTML, so read text instead of parsing JSON.
            return await response.text()
    except (aiohttp.ClientError, asyncio.TimeoutError) as e:
        print(f"Error fetching data from {url}: {e}")
        return ""


async def scrape(session: aiohttp.ClientSession, name: str, url: str) -> None:
    body = await fetch_data(session, url)
    print(f"{name}: received {len(body)} characters")


async def main() -> None:
    # Share one session so connections are pooled across requests.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            scrape(session, "Google", "https://www.google.com"),
            scrape(session, "Amazon", "https://www.amazon.com"),
        )


if __name__ == "__main__":
    asyncio.run(main())

Google

Unparsed Results

Provider	Success Rate	Avg. Response Time (s)
Oxylabs	100%	6.04
Bright Data	98.42%	4.62
Smartproxy	100%	6.09
Zyte	99.47%	4.72
Rayobyte	100%	6.53
ScraperAPI	94.10%	12.58
Shifter	81.76%	1.67

Most APIs performed well, with notable exceptions. Shifter's general-purpose scraper faced challenges with Google, resulting in a 429 detection error every fifth request. The specialized API improved performance but experienced a decrease in speed.
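Both columns in these tables can be derived from a log of individual requests. The sketch below shows one way to do it, under two assumptions the report does not confirm: a request counts as successful when it returns HTTP 200, and response time is averaged over successful requests only. The sample data is invented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class RequestResult:
    status: int        # HTTP status code returned for the request
    elapsed_s: float   # wall-clock response time in seconds


def summarize(results: List[RequestResult]) -> Tuple[float, Optional[float]]:
    """Return (success rate in percent, average response time in seconds)."""
    successes = [r for r in results if r.status == 200]
    success_rate = round(100 * len(successes) / len(results), 2)
    # Assumption: only successful requests count towards the average.
    avg_time = mean(r.elapsed_s for r in successes) if successes else None
    return success_rate, avg_time


# Invented sample: every fifth request is rejected with a 429.
sample = [
    RequestResult(200, 1.5),
    RequestResult(200, 1.5),
    RequestResult(200, 2.0),
    RequestResult(200, 1.0),
    RequestResult(429, 0.5),
]
success_rate, avg_time = summarize(sample)
print(f"success rate: {success_rate}%, avg response time: {avg_time}s")
```

With this definition, fast failures do not flatter the average, which is one reason a provider can look quick while still having a low success rate.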

Parsed Results

Provider	Success Rate	Avg. Response Time (s)
Oxylabs	99.90%	6.15
Bright Data	99.71%	6.03
Smartproxy	99.85%	6.04
Zyte	–	10.03
Rayobyte	99.93%	13.24
ScraperAPI	96.88%	10.08
Shifter	96.65%	–

The use of a data parser had minimal impact on response time, except for Rayobyte, which exhibited a three-second delay in JSON results for unexplained reasons.

Amazon

Provider	Success Rate	Avg. Response Time (s)
Oxylabs	100%	4.69
Bright Data	98.42%	4.31
Smartproxy	100%	4.66
Zyte	85.50%	4.51
Rayobyte	95.60%	20.70
ScraperAPI	95.80%	9.69
Shifter	98.80%	5.35

Bright Data, Oxylabs, and Smartproxy consistently delivered excellent results. Rayobyte's slow response was attributed to defaulting to datacenter IPs for Amazon, necessitating multiple request retries. Zyte encountered 520 errors, and ScraperAPI mirrored its Google performance. Shifter performed well, but its scraper faced challenges.

GraphQL Endpoint

Provider	Success Rate	Avg. Response Time (s)
Oxylabs	100%	17.89
Bright Data	73.40%	3.71
Smartproxy	100%	8.95
Zyte	98.40%	2.59
Rayobyte	80%	4.52
ScraperAPI*	24.80%	8.08
Shifter	54.80%	1.77

The GraphQL endpoint posed a serious challenge, with Shifter struggling even with rendering enabled. ScraperAPI faced difficulties, while Zyte stood out with commendable performance.

Headless Rendering

Provider	Success Rate	Avg. Response Time (s)
Oxylabs	100%	28.88
Bright Data	100%	4.10
Smartproxy	100%	29.09
Zyte	94.00%	28.14
Rayobyte	98.60%	23.05
ScraperAPI*	98.20%	16.05
Shifter	62.40%	4.42

The headless test was more forgiving, with Bright Data demonstrating superior results. Shifter was fast but faced errors. ScraperAPI had improved performance, and Oxylabs and Smartproxy maintained success rates at the expense of some speed.

Concurrency

Provider	Concurrency
Oxylabs	5 req/s to unlimited
Bright Data	Unlimited
Smartproxy	Unspecified
Zyte	2 req/s
Rayobyte	100 req/min
ScraperAPI	200-400 threads
Shifter	Unspecified

Concurrency varied: Bright Data, Smartproxy, and Oxylabs allow a high number of parallel requests, while Rayobyte and Zyte have more restrictive default limits, a constraint that matters mainly for enterprise-scale workloads.
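On the client side, a concurrency cap is usually enforced before requests leave the script. A minimal sketch with `asyncio.Semaphore` follows; `run_limited` and `fake_request` are hypothetical names, and the fake request only sleeps instead of calling an API. Note that a semaphore caps requests in flight, which is not the same as a requests-per-second limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


async def run_limited(factories: Sequence[Callable[[], Awaitable[int]]],
                      max_in_flight: int) -> List[int]:
    """Run all request factories, never more than `max_in_flight` at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(factory: Callable[[], Awaitable[int]]) -> int:
        async with semaphore:          # wait here if the cap is reached
            return await factory()

    # gather returns results in the order the factories were given.
    return await asyncio.gather(*(guarded(f) for f in factories))


async def demo() -> Tuple[List[int], int]:
    """Fire 10 fake requests with a cap of 3 and record the observed peak."""
    in_flight = 0
    peak = 0

    async def fake_request(i: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)      # stands in for network latency
        in_flight -= 1
        return i

    factories = [lambda i=i: fake_request(i) for i in range(10)]
    results = await run_limited(factories, max_in_flight=3)
    return results, peak
```

Running `asyncio.run(demo())` returns the ten results together with the highest number of simultaneous requests observed, which stays within the cap.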

Evaluation of Parsing Capabilities in Web Scraping APIs

In a nuanced examination of web scraping APIs, a qualitative test was conducted to assess their parsing abilities on four distinct types of pages: localized Google search desktop query, localized Google search mobile query, Google Shopping query, and Amazon product pages.

Google SERP, Localized Desktop Query

For the localized desktop query "best hairdresser near me" in London, the APIs were evaluated based on various elements:

Provider	Localized	Organic	Snack Pack	Map	Related Searches	People Also Ask
Oxylabs	✅	✅	✅	❌	✅	✅
Bright Data	✅	✅	✅	✅	✅	✅
Smartproxy	✅	✅	✅	❌	✅	✅
Rayobyte	✅	✅	✅	❌	✅	✅
ScraperAPI	✅	✅	❌	❌	✅	✅
Shifter	❌	✅	✅	❌	✅	✅

While ScraperAPI and Rayobyte focused on essential information, others aimed to parse the entire SERP.

Notably, Bright Data even provided a screenshot of the map. Shifter faced issues with the location parameter, hindering local result retrieval.

Google SERP, Localized Mobile Query

The mobile query with the same parameters as the desktop query yielded the following results:

Provider	Localized	Organic	Snack Pack	Map	Related Searches	People Also Ask
Oxylabs	✅	✅	✅	❌	✅	✅
Bright Data	✅	✅	✅	✅	✅	✅
Smartproxy	✅	✅	✅	❌	✅	✅
Rayobyte	✅	✅	✅	❌	✅	✅
ScraperAPI	✅	✅	❌	❌	✅	✅
Shifter	❌	✅	✅	❌	✅	✅

Bright Data, Oxylabs, and Smartproxy successfully returned complete and accurate results. However, ScraperAPI failed to scrape anything, and Shifter's mobile parser regressed to main page elements, missing local data.

Google Shopping

The Google Shopping query for "Nike Air Max" in London was analyzed for various aspects:

Provider	Localized	Search Filters	Ads	Item	Pricing	Merchant	Delivery	Evaluation	Other
Oxylabs	✅	✅	❌	✅	✅	✅	❌	✅
Bright Data	❌	❌	❌	✅	✅	✅	✅	✅	Price Comparison
Smartproxy	✅	✅	❌	✅	✅	❌	✅	✅
ScraperAPI	✅	❌	✅	✅	✅	✅	✅	✅	Filter by Material, Related Searches, Price Comparison
Shifter	❌	❌	✅	✅	✅	✅	❌	✅

ScraperAPI provided the most comprehensive results, including related searches and the "you might like" block. It successfully retrieved ad results, a feature absent in other providers. Bright Data and Shifter failed to localize the page for this specific query.

Amazon Product Pages

Various product pages from art supplies, kitchenware, and electronics were targeted for parsing. The evaluation included elements such as breadcrumbs, item details, images, pricing, merchant information, availability, bestsellers rank, delivery, evaluation, and warranty.

Provider	Breadcrumbs	Item	Images	Item Variations	Pricing	Merchant	Availability	Bestsellers Rank	Delivery	Evaluation	Warranty
Oxylabs	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Smartproxy	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Rayobyte	✅	✅	✅	❌	✅	✅	❌	✅	❌	✅	❌
ScraperAPI	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	❌

All four APIs demonstrated the ability to parse most page elements. Oxylabs and Smartproxy provided the most comprehensive results, including discounts, delivery, and warranty information. Rayobyte's parser was less informative, excluding item variations, delivery, and warranty information. ScraperAPI chose to exclude buy box data and experienced a few formatting errors.

In summary, this qualitative test unveiled the varying parsing capabilities of web scraping APIs, shedding light on their strengths and limitations across different types of web pages.

Pricing Models

Web scraping APIs predominantly charge per successful request, which makes costs easy to estimate. The standard metric for comparison is the CPM (cost per 1,000 requests).
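As a worked example of the CPM metric, the sketch below converts a per-request price and a subscription plan into cost per 1,000 requests. The $0.0018 per-request price is Rayobyte's listed rate from this report; the plan price and request quota are hypothetical numbers chosen only to show the arithmetic.

```python
from decimal import Decimal


def cpm_from_request_price(price_per_request: Decimal) -> Decimal:
    """CPM for pay-as-you-go pricing: price of one request times 1,000."""
    return price_per_request * 1000


def cpm_from_plan(plan_price: Decimal, included_requests: int) -> Decimal:
    """CPM for a subscription: plan price spread over its request quota."""
    return plan_price / included_requests * 1000


# Rayobyte's listed pay-as-you-go rate.
payg_cpm = cpm_from_request_price(Decimal("0.0018"))

# Hypothetical plan: $100 per month for 40,000 successful requests.
plan_cpm = cpm_from_plan(Decimal("100"), 40_000)

print(f"pay-as-you-go CPM: ${payg_cpm:.2f}")
print(f"plan CPM:          ${plan_cpm:.2f}")
```

`Decimal` is used instead of `float` so that small per-request prices do not pick up binary rounding error.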

Provider	Pricing Model	Structure	Starting Price	Trial
Oxylabs	Subscription	Successful requests	$99	5,000 req for a week
Bright Data	Pay as you go, Subscription	Successful requests	$3 (pay as you go), $500 (plan)	7 days for companies
Smartproxy	Subscription	Successful requests	$50	3,000 req for 3 days
Zyte	Pay as you go, Subscription	Successful requests	$0 (pay as you go), $25 (plan)	$5 free credit
Rayobyte	Pay as you go	Successful requests	$0.0018/request	5,000 free per month (renewed)
ScraperAPI	Subscription	Successful requests	$49	5,000 credits for a week
Shifter	Subscription	Successful requests	$44.95	Money-back guarantee

The dominant pricing model remains the monthly subscription, but variations exist. Zyte introduces an intriguing approach where users set a monthly limit and pay half in advance each month. Notably, trials are available with most providers, with a standard offering of 5,000 requests.

Calculating Request Price

While the pricing model appears straightforward, some web scraping APIs introduce complexities in calculating a request's price.

Factors such as the target website, JavaScript rendering, residential proxies, and more contribute to price modifiers, leading to significant cost variations.

Provider	Price Modifiers	Max Price Difference
Oxylabs	Search engines, e-commerce websites	x2-3
Bright Data	–	x1
Smartproxy	Search engines, e-commerce websites	x1.5-3
Zyte	Target, JS rendering, premium proxies, screenshots, browser actions	Custom
Rayobyte	–	x1
ScraperAPI	Premium, super premium proxies, premium targets, JS rendering	x75
Shifter	Premium proxies, JS rendering, search engines	x25

ScraperAPI stands out with a complex structure involving three tiers of proxy networks and JavaScript rendering.

The pricing varies based on factors like the use of residential proxies, headless scraping, and rates for specific websites such as Google, Amazon, and social media.

Oxylabs and Smartproxy differentiate by target: they charge more for search engine scrapers and roughly double for e-commerce scrapers.

Shifter follows a similar strategy for search engines, while its regular scraper aligns with ScraperAPI's structure.

Bright Data and Rayobyte charge the same price whether or not a request uses a specialized scraper or JavaScript rendering, which makes the cost of scraping challenging targets easier to predict.

Zyte, on the other hand, dynamically calculates the price per request for each website, considering its difficulty, JavaScript rendering, screenshots, and browser actions. This dynamic approach makes it challenging to estimate expenses in advance.
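To see how price modifiers change a bill, the sketch below estimates the cost of a mixed workload in which each group of requests carries its own multiplier. The base CPM, request counts, and the x5 multiplier are hypothetical; only the x75 ceiling comes from the table above.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def estimate_cost(base_cpm: Decimal,
                  workload: Iterable[Tuple[int, int]]) -> Decimal:
    """Estimate total cost for (request_count, price_multiplier) pairs."""
    billable_units = sum(count * multiplier for count, multiplier in workload)
    return base_cpm * billable_units / 1000


BASE_CPM = Decimal("1.00")  # hypothetical base price per 1,000 requests

workload = [
    (10_000, 1),   # plain requests at the base rate
    (2_000, 5),    # hypothetical x5 modifier, e.g. JavaScript rendering
    (500, 75),     # the maximum modifier listed in the table
]

with_modifiers = estimate_cost(BASE_CPM, workload)
flat_rate = estimate_cost(BASE_CPM, [(count, 1) for count, _ in workload])

print(f"with modifiers: ${with_modifiers:.2f}")
print(f"flat pricing:   ${flat_rate:.2f}")
```

In this made-up mix, the 500 most expensive requests account for most of the bill, which is why the advertised starting price alone says little about real costs.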

Conclusion

The web scraping API landscape is dynamic, offering diverse features and pricing structures.

Key insights include the evolution of advanced features, the targeting of major websites like Google and Amazon, and the importance of parsing capabilities.

Performance and reliability vary, with Oxylabs, Smartproxy, and Bright Data emerging as reliable performers.

Pricing models are generally based on successful requests, but some providers introduce complexity with differentiated pricing.

Organizations should carefully assess their needs and budget constraints when choosing a web scraping API, considering factors like data output, customization, and parsing capabilities. Ongoing monitoring is essential in this competitive and evolving ecosystem.


Frequently Asked Questions

How do web scraping APIs handle pricing?

Web scraping APIs typically follow a pricing model based on successful requests. Users are charged for the number of requests that are completed successfully. Some providers introduce additional complexities, such as differentiated pricing for specific websites or features.

What are the key features to consider when evaluating a web scraping API?

Important features include data output format, customization options (e.g., location selection, device specification), parsing capabilities, and performance/reliability. Consideration of the target websites and the ability to handle dynamic content and JavaScript is also crucial.

What are some challenges associated with web scraping, and how can they be addressed?

Challenges include handling dynamic content, CAPTCHAs, and changes in website structure. To address these challenges, choose a web scraping API with robust parsing capabilities and support for JavaScript rendering, and consider implementing techniques like rotating proxies and user agents to avoid detection. Regularly monitor and adapt your scraping strategy as websites evolve.
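For readers who build their own scraper rather than rely on a managed API, rotation can be as simple as cycling through lists. The sketch below is illustrative only: the proxy addresses and user agent strings are placeholders, and `rotating_identities` is a hypothetical helper name.

```python
from itertools import cycle, islice
from typing import Iterator, Sequence, Tuple

# Placeholder values; substitute real proxies and browser user agents.
PROXIES = [
    "http://proxy-1.example.test:8000",
    "http://proxy-2.example.test:8000",
    "http://proxy-3.example.test:8000",
]
USER_AGENTS = [
    "ExampleBrowser/1.0 (desktop)",
    "ExampleBrowser/1.0 (mobile)",
]


def rotating_identities(proxies: Sequence[str],
                        user_agents: Sequence[str]) -> Iterator[Tuple[str, str]]:
    """Yield (proxy, user_agent) pairs forever, cycling through both lists."""
    return zip(cycle(proxies), cycle(user_agents))


# Take the identities for the first four requests.
first_four = list(islice(rotating_identities(PROXIES, USER_AGENTS), 4))
for proxy, user_agent in first_four:
    print(proxy, "|", user_agent)
```

Because the two lists have different lengths, the pairings shift on each pass, so a given proxy is not always tied to the same user agent.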
