How to Set up a Rotating Proxy in Puppeteer for Web Scraping
Puppeteer is a robust tool for web scraping. However, some websites enforce IP-based rate limits or blocks to keep bots out, which can hinder Puppeteer's ability to scrape them. One solution is to set up a rotating proxy, which lets you access the website through multiple IP addresses. This helps you avoid being detected as a bot and ensures a smoother data extraction process.
In this article, we will learn about different types of proxies, how to set up a rotating proxy in Puppeteer for more efficient web scraping, and why you should use one when using Puppeteer.
What is a Rotating Proxy
A rotating proxy is a type of proxy server that automatically rotates or assigns a new IP address for each connection request. This allows you to make multiple requests to a website from different IP addresses, helping you avoid being blocked or rate-limited when using Puppeteer to scrape data from a website.
Here's how a rotating proxy typically works: the proxy server has a pool of IP addresses, and each time a request is made through the proxy, it uses a different IP address from the pool. The rotation frequency can vary depending on the proxy service. Some rotate IPs for every request, while others do it at set intervals (e.g., every 5 minutes).
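To make the two rotation styles concrete, here is a minimal sketch of the logic a rotating proxy might apply internally. It is only an illustration: the `RotatingPool` class, its method names, and the IP addresses (taken from a documentation-only range) are hypothetical and not part of any provider's API.

```javascript
// Illustrative model of a rotating proxy's IP selection.
// The class, method names, and IP addresses are hypothetical.
class RotatingPool {
  constructor(ips, rotateEveryMs = 0) {
    if (ips.length === 0) throw new Error('pool must not be empty');
    this.ips = ips;
    this.rotateEveryMs = rotateEveryMs; // 0 means "rotate on every request"
    this.index = -1;                    // no IP has been handed out yet
    this.lastRotation = -Infinity;      // forces a rotation on the first request
  }

  // Returns the exit IP to use for a request arriving at time `now` (in ms).
  exitIpFor(now = Date.now()) {
    const due =
      this.rotateEveryMs === 0 || now - this.lastRotation >= this.rotateEveryMs;
    if (due) {
      this.index = (this.index + 1) % this.ips.length;
      this.lastRotation = now;
    }
    return this.ips[this.index];
  }
}

// Per-request rotation: every call moves to the next IP in the pool.
const perRequest = new RotatingPool(['203.0.113.1', '203.0.113.2', '203.0.113.3']);

// Interval rotation: the same IP is reused until 5 minutes have passed.
const everyFiveMinutes = new RotatingPool(['203.0.113.1', '203.0.113.2'], 5 * 60 * 1000);
```

Real services usually pick IPs with more sophisticated strategies (health checks, geography, session stickiness), so treat the round-robin order here as a simplification.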
🐻 Bear Tips: While a rotating proxy can be useful, it may also face challenges such as IP address blacklisting, latency issues due to frequent IP changes, and potential security risks if not used properly.
Types of Rotating Proxies
Any type of proxy can be configured to be a rotating proxy, with residential proxies being the most common one. The rotation mechanism is typically implemented at the proxy server level, where the server periodically changes the IP address it uses to forward traffic.
Here are some types of proxies that are commonly available as rotating proxies:
- Residential Proxies: These proxies use IP addresses assigned by Internet Service Providers (ISPs) to residential users. They are more reliable and less likely to be blocked but can be slower and more expensive.
- Datacenter Proxies: Datacenter proxies use IP addresses from data centers. They are faster and cheaper than residential proxies but are more likely to be detected and blocked by websites.
- Mobile Proxies: These proxies use IP addresses assigned to mobile devices by mobile network providers. They are similar to residential proxies but offer even better reliability and lower detection rates.
Each type of rotating proxy has its own advantages and disadvantages, depending on your specific needs and requirements. Next, let's see how to set up a rotating proxy in Puppeteer.
Setting Up a Rotating Proxy in Puppeteer
Step 1. Choose a Proxy Service Provider
Before setting up a rotating proxy in Puppeteer, you need to choose a reliable proxy service provider. It is advisable to look for a provider that offers a large pool of rotating proxies, excellent uptime, and responsive customer support.
Here are some popular proxy service providers that offer rotating proxies:
- Froxy - Residential, mobile, and fast proxies
- Bright Data - Residential, mobile, ISP, and datacenter proxies
- Oxylabs - Residential, mobile, shared datacenter, dedicated datacenter, and ISP proxies
- Smartproxy - Residential and ISP proxies
- Storm Proxies - Residential and dedicated proxies
Step 2. Obtain Proxy Credentials
Once you have selected a proxy service provider, sign up for an account and obtain the necessary credentials to authenticate your requests. This typically includes the proxy host/IP, port number, and sometimes a username/password combination that you will use to connect to the proxy server.
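One way to keep these credentials out of your source code is to read them from environment variables. The sketch below assumes variable names such as `PROXY_HOST` and `PROXY_USERNAME`; these names and the `loadProxyConfig` helper are hypothetical, so adapt them to your own setup.

```javascript
// Reads proxy settings from environment variables.
// The variable names (PROXY_HOST, PROXY_PORT, ...) are hypothetical.
function loadProxyConfig(env = process.env) {
  const { PROXY_HOST, PROXY_PORT, PROXY_USERNAME, PROXY_PASSWORD } = env;
  if (!PROXY_HOST || !PROXY_PORT) {
    throw new Error('PROXY_HOST and PROXY_PORT must be set');
  }
  return {
    // Address in the form expected by the --proxy-server launch argument.
    server: `http://${PROXY_HOST}:${PROXY_PORT}`,
    // Credentials are optional, since not every proxy requires them.
    credentials: PROXY_USERNAME
      ? { username: PROXY_USERNAME, password: PROXY_PASSWORD ?? '' }
      : null,
  };
}
```

The returned `server` string can be passed to the `--proxy-server` argument, and `credentials` (when present) to `page.authenticate()`.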
Step 3. Configure Puppeteer to Use the Rotating Proxy
Chromium accepts a --proxy-server launch argument, which you can pass through the args option of puppeteer.launch() to route the browser's traffic through a proxy. If your proxy server requires a username and password, supply them with page.authenticate():
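import puppeteer from 'puppeteer';

// Placeholder values: replace them with the details from your proxy provider.
const host = 'proxy.example.com';
const port = 8000;
const username = 'your_username';
const password = 'your_password';

(async () => {
  // Route all browser traffic through the proxy.
  const browser = await puppeteer.launch({
    args: [`--proxy-server=http://${host}:${port}`]
  });
  const page = await browser.newPage();
  // Answer the proxy's authentication challenge.
  await page.authenticate({ username, password });
  // ...
  await browser.close();
})();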
Depending on your provider's configuration, the proxy server rotates the IP address for you, which helps you avoid being blocked by websites that rate-limit requests.
🐻 Bear Tips: Test the rotating proxy by navigating to a website that displays your IP address and confirm that it changes with each request.
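A rough way to run this check from code is sketched below. It assumes that https://api.ipify.org responds with the caller's IP address as plain text; any similar IP echo service should work. The `collectExitIps` and `hasRotated` helpers are hypothetical names, and `puppeteer` is passed in as a parameter so the helper does not depend on how you import it.

```javascript
// Hypothetical helper: `puppeteer` is the imported puppeteer module and
// `proxyUrl` is your provider's gateway address, e.g. 'http://host:port'.
async function collectExitIps(puppeteer, proxyUrl, credentials, runs = 3) {
  const seen = [];
  for (let i = 0; i < runs; i++) {
    // A fresh browser per run forces a new connection to the proxy.
    const browser = await puppeteer.launch({ args: [`--proxy-server=${proxyUrl}`] });
    try {
      const page = await browser.newPage();
      if (credentials) await page.authenticate(credentials);
      // Assumed IP echo endpoint that returns the caller's IP as plain text.
      await page.goto('https://api.ipify.org', { waitUntil: 'domcontentloaded' });
      seen.push(await page.evaluate(() => document.body.innerText.trim()));
    } finally {
      await browser.close();
    }
  }
  return seen;
}

// True when at least two different exit IPs were observed.
function hasRotated(ips) {
  return new Set(ips).size > 1;
}
```

For example, `hasRotated(await collectExitIps(puppeteer, 'http://host:port', { username, password }))` should return true for a proxy that rotates on every connection. A proxy that rotates on a timer may legitimately return the same IP for several consecutive runs.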
Rotating Proxies from a List
If your proxy service provider doesn’t offer auto-rotating proxies, you can also implement your own proxy rotation within Puppeteer. First, create a list of proxies that you want to rotate through in your code:
const proxies = [
'http://proxy1.example.com:port1',
'http://proxy2.example.com:port2',
'http://proxy3.example.com:port3',
];
Then, select a random proxy from the list using the Math.random() method:
const randomProxy = proxies[Math.floor(Math.random() * proxies.length)];
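Each time you launch the browser, it uses the randomly selected proxy. The proxy stays fixed for the lifetime of that browser instance, so you need to launch a new browser to switch to another one: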
const browser = await puppeteer.launch({
args: [`--proxy-server=${randomProxy}`]
});
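Putting these pieces together, the sketch below launches a new browser with a freshly picked proxy for each URL. The `pickProxy` and `scrapeTitles` helpers are hypothetical, and `puppeteer` is passed in as a parameter. Launching a browser per URL is simple but relatively slow, so take it as a starting point rather than a tuned implementation.

```javascript
// Picks a random entry from the list. `rng` is injectable so that the
// choice can be made deterministic, e.g. in tests.
function pickProxy(proxies, rng = Math.random) {
  return proxies[Math.floor(rng() * proxies.length)];
}

// Hypothetical helper: visits each URL through a freshly chosen proxy
// and returns an object that maps each URL to its page title.
async function scrapeTitles(puppeteer, urls, proxies) {
  const titles = {};
  for (const url of urls) {
    const proxy = pickProxy(proxies);
    const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy}`] });
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      titles[url] = await page.title();
    } finally {
      // Always close the browser, even if navigation fails.
      await browser.close();
    }
  }
  return titles;
}
```

Because the choice is random, the same proxy can be picked twice in a row. If you need a strict rotation, cycle through the list with an index instead.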
Why Use a Rotating Proxy in Puppeteer
Besides helping you avoid being blocked when scraping websites, using a rotating proxy with Puppeteer can be beneficial for other reasons, like:
- Increasing scalability: Using a rotating proxy allows you to scale your web scraping or other web automation tasks by distributing requests across multiple IP addresses. This improves the efficiency and scalability of the Puppeteer task on top of reducing the likelihood of getting banned or blocked from a website.
- Accessing restricted content: Some websites restrict access based on geographic location. Rotating proxies allow you to mask your IP address and access content that may be otherwise unavailable in your region. As it can provide IP addresses from various regions, you are also able to simulate browsing from different locations.
- Increasing efficiency: By rotating proxies, you can manage your requests more efficiently, ensuring that you don't overload any single IP address. This helps to avoid unnecessary delays or errors.
- Ensuring anonymity/privacy: When you access a website through a proxy, the website sees the IP address of the proxy server instead of your own IP address. This can prevent your online activities from being tracked and protect your identity and privacy.
However, while a rotating proxy can reduce the risk of IP blocking, some websites may still detect and block the proxy IPs. This can happen for websites with advanced anti-scraping measures that block rapidly changing IP addresses associated with rotating proxies. If a rotating proxy is being blocked, it might be worth trying a static proxy, and vice versa.
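If you manage your own list of proxies, one way to handle such blocks is to detect them and retry through another proxy. The sketch below treats HTTP 403 and 429 responses as signs of blocking, which is an assumption: some websites respond with a 200 status and a CAPTCHA page instead. The `isBlocked` and `fetchHtmlWithFallback` helpers are hypothetical, and `puppeteer` is passed in as a parameter.

```javascript
// Status codes commonly used to signal blocking or rate limiting.
// This is an assumption: some sites return 200 with a CAPTCHA page instead.
const BLOCK_STATUSES = new Set([403, 429]);

function isBlocked(status) {
  return BLOCK_STATUSES.has(status);
}

// Hypothetical helper: tries each proxy in order until one is not blocked,
// then returns the page's HTML.
async function fetchHtmlWithFallback(puppeteer, url, proxies) {
  for (const proxy of proxies) {
    const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy}`] });
    try {
      const page = await browser.newPage();
      const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
      if (response && !isBlocked(response.status())) {
        return await page.content();
      }
    } catch (err) {
      // Connection errors (dead proxy, timeout) also move on to the next proxy.
      console.warn(`Proxy ${proxy} failed: ${err.message}`);
    } finally {
      await browser.close();
    }
  }
  throw new Error(`All ${proxies.length} proxies were blocked or unreachable for ${url}`);
}
```

Placing a static proxy at the end of the `proxies` list is one way to apply the fallback described above.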
Using Browserbear for Web Scraping
When you’re scraping data from a website using Puppeteer, you must register with a proxy service provider and get the proxy server’s host and port number yourself to scrape the website behind a proxy. If you’re looking for a web scraping solution that can save you the hassle of setting up a proxy, Browserbear has your back!
Browserbear is a scalable, cloud-based browser automation tool that helps you automate browser tasks, including web scraping. Browserbear offers built-in proxies with IP addresses from various countries, including France, Germany, Singapore, and Japan. You can select one of the countries from the list or use a custom proxy if you want to (all proxy values are securely encrypted).
Unlike other web scraping or automation tools that require coding, Browserbear is incredibly simple to set up and use. You can easily locate your target HTML element by hovering your mouse over it with the Browserbear Helper extension and adding the necessary steps to your automation task from the Browserbear dashboard.
Then, you can run the task either directly from the account dashboard or through the API (if you want to integrate it into coding projects).
Besides the built-in proxy and user-friendly interface, Browserbear also integrates AI to help you scrape information from simple websites quickly and easily! If this sounds like what you need, you can sign up for free!