Automating your workflow with scripts gets results more efficiently than doing the work painstakingly by hand. Web scraping is about extracting data in a clean, readable format; developers, data analysts, and scientists use it to read and download a web page's data ethically.

In this article, you will explore the benefits of using the Bright Data infrastructure, which connects to large datasets through its proxy networks, using the Scraping Browser.

Let's get started.

What is Bright Data?

Bright Data is a web data platform that helps organizations, small businesses, and academic institutions retrieve crucial public web data efficiently, reliably, and flexibly. Bright Data also includes ready-to-use datasets that are GDPR and CCPA-compliant.

What is Playwright?

Playwright is a browser automation library. Much like Puppeteer, it navigates target websites and interacts with the site's HTML to extract the data you need.

Installation

Before writing a single script, check whether Python is installed on your system by running this command in the command line interface (CLI) or terminal:

python --version

If the command does not print a version, go to the official Python website to download and install it on your local machine.

Connecting to Scraping Browser

Create a new account on Bright Data to gain access to the admin dashboard of the Scraping Browser for the proxy integration with your application.

On the left pane of the dashboard, click the Proxies and Scraping Infra icon.

Scroll down the page and select the Scraping Browser. After that, click the Get started button. If you don't find it, click the Add dropdown button and select Scraping Browser.

The next screen lets you rename the proxy. Click the Add proxy button, which opens a confirmation prompt. Accept the default change by clicking the Yes button.

Next, click the </> Check out code and integration examples button to configure the code in Python.

Creating environment variables in Python

Environment variables store secret keys and credentials as key-value pairs. Keeping them out of the source code lets the app run during development without exposing them to unauthorized access. As in a Node.js app, create a new file called .env in the root directory. But first, install the Python package python-dotenv:

pip3 install python-dotenv

The package reads the key-value pairs from the .env file and sets them as environment variables. To confirm that python-dotenv is installed, run this command, which lists all installed packages:

pip3 list

Next, copy and paste this code into the .env file:

USERNAME="<user-name>"
HOST="<host>"

Replace the values in quotation marks with the values from Bright Data.

Creating the web scraper with Playwright

In the project directory, create a new file called app.py to handle scraping the web.

Installing packages

The script uses two libraries, asyncio and playwright. asyncio is part of the Python standard library, so only playwright needs to be installed:

pip3 install playwright

- asyncio: a library for writing concurrent code using the async/await syntax
- playwright: provides the methods used to launch or connect to a browser instance

Now, copy and paste this code into app.py:

import asyncio
import os
from playwright.async_api import async_playwright
from dotenv import load_dotenv

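# override=True lets the .env values replace variables the OS already defines
# (USERNAME, for example, is predefined on Windows).
load_dotenv(override=True)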

auth = os.getenv("USERNAME")
host = os.getenv("HOST")

browser_url = f'wss://{auth}@{host}'

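async def main():
    async with async_playwright() as pw:
        print('connecting')
        # Attach to the remote Scraping Browser over the Chrome DevTools Protocol
        browser = await pw.chromium.connect_over_cdp(browser_url)
        print('connected')
        # Open a new page (tab) in the remote browser
        page = await browser.new_page()
        print('goto')
        # Navigate to the target URL; the timeout is in milliseconds (2 minutes)
        await page.goto('http://lumtest.com/myip.json', timeout=120000)
        print('done, evaluating')
        # Run JavaScript in the page and print the document's full HTML
        print(await page.evaluate('()=>document.documentElement.outerHTML'))
        await browser.close()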

asyncio.run(main())

The code above does the following:

- Imports the necessary modules: asyncio, async_playwright, load_dotenv, and os
- load_dotenv() reads the variables from the .env file
- os.getenv() returns the value of the given environment variable key
- The main() function is asynchronous; within it, the Playwright module connects to the data zone
- new_page() opens a new page in the browser, and goto() navigates it to the destination site with a timeout of 2 minutes
- page.evaluate() runs a JavaScript expression in the page once it has loaded; here it returns the page's HTML, which is then printed
- browser.close() closes the browser when the work is done, which you should always do

To test this application, run it with the command:

python app.py

Conclusion

Evaluating and extracting meaningful data is at the heart of what Bright Data offers. This tutorial showed you how to use the Scraping Browser in Python with the Playwright package to read data from a website.

Try Bright Data today!
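
As an optional follow-up to the app.py script, the connection URL can be built by a small helper that fails early when a value is missing from the .env file, rather than trying to connect to an address such as wss://None@None. This is only a sketch under assumptions: build_browser_url is a hypothetical helper name, not part of Playwright or Bright Data, and it expects the same USERNAME and HOST keys used earlier in this article.

```python
import os


def build_browser_url(env=None):
    """Return the wss:// endpoint built from USERNAME and HOST.

    Hypothetical helper for illustration only. `env` can be any
    mapping of names to values; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Collect the names of required keys that are absent or empty.
    missing = [key for key in ("USERNAME", "HOST") if not env.get(key)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): " + ", ".join(missing)
        )
    return f"wss://{env['USERNAME']}@{env['HOST']}"
```

In app.py, calling browser_url = build_browser_url() after loading the .env file could stand in for the f-string assignment. Passing a plain dictionary instead of relying on os.environ makes the helper easy to check without touching real credentials.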

How to Scrape Large Datasets at Scale

Web scraping using a headless browser in NodeJS


Scraping the unscrapable in Python using Playwright
