How to Make the Most of Playwright After the Latest Updates

Written by brightdata | Published 2023/12/17


Guess what? Microsoft just dropped a new version of Playwright like clockwork! Keeping up with bug fixes and API changes can feel like a full-time gig, and missing out on cool features is a snap. But fret not; we've got your back!

Join us on a journey through the latest Playwright updates. We're here to help you stay in the loop, pick up some new tricks, and wow your colleagues with your stellar skills! Ready to dive in? Let's go!

Playwright Latest Updates (v1.40)

If you're eager to get a firsthand look at the latest Playwright updates through some awesome examples, head over to the Playwright YouTube Channel and watch the monthly “What's New in Playwright” video:

https://www.youtube.com/watch?v=mn892dV81_8&embedable=true

Time to explore the latest features introduced in Playwright and see how to get the most out of them 🔍

New APIs

  • firefoxUserPrefs field added to the options object argument of browserType.launchPersistentContext(userDataDir, options). That method launches a browser using persistent storage located at userDataDir and returns the browser context instance. firefoxUserPrefs is an object containing the Firefox user preferences as specified at about:config.
  • reason field added to the options object argument of the page.close(options), browserContext.close(options), and browser.close(options) methods. reason is a string containing the error message reported by all operations interrupted as a result of a close() call.
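
As a minimal sketch of the first option, the helper below builds an options object that could be passed to firefox.launchPersistentContext(). The helper name, the interface names, and the choice of preferences are assumptions made for illustration; the preference keys themselves are standard about:config entries.

```typescript
// Firefox preferences are plain key/value pairs, as listed in about:config.
type FirefoxUserPrefs = Record<string, string | number | boolean>;

// Only the two option fields used in this sketch.
interface PersistentContextOptions {
  headless: boolean;
  firefoxUserPrefs: FirefoxUserPrefs;
}

// Hypothetical helper (not part of Playwright): builds launch options with
// a few preferences that are often tweaked for scraping sessions.
function buildFirefoxOptions(acceptLanguages: string): PersistentContextOptions {
  return {
    headless: true,
    firefoxUserPrefs: {
      'intl.accept_languages': acceptLanguages, // preferred content languages
      'permissions.default.image': 2, // 2 = do not load images
      'dom.webnotifications.enabled': false, // no notification prompts
    },
  };
}

const options = buildFirefoxOptions('en-US,en');

// With Playwright installed, the object goes in as the second argument:
//   import { firefox } from 'playwright';
//   const context = await firefox.launchPersistentContext('./user-data', options);
```

Keeping the preferences in one small function makes it easy to review which about:config values a scraping profile depends on.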

While firefoxUserPrefs caters specifically to Firefox users in need of custom configs, the reason field is much more general purpose. Use it as in the snippet below:

await browser.close({
    reason: "Scraping process completed!"
})

All pending operations interrupted by browser.close() will now throw a JavaScript error with the message “Scraping process completed!”

When does this come in handy? Imagine your target page is in the midst of a critical task, say, waiting for some data to be retrieved. Suddenly, an unexpected error pops up, and you need to close the browser gracefully.

Without reason, you'd be left clueless about whether the resources for the ongoing task have been released and why the operation was interrupted. Not knowing what's going on is bad, especially when you're tasked with inspecting the reason for an error in the logs of an automated web scraping script. That's where the `reason` field comes to your rescue, saving you days of painstaking investigation.
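
One possible way to wire this into a script is sketched below. The Closable interface is a minimal structural stand-in for Playwright's Page, BrowserContext, and Browser objects, and describeFailure() and runOrClose() are hypothetical helpers, not Playwright APIs.

```typescript
// Structural stand-in: Page, BrowserContext, and Browser all expose
// close(options) with an optional `reason` string.
interface Closable {
  close(options?: { reason?: string }): Promise<void>;
}

// Hypothetical helper: turns whatever was thrown into a short,
// log-friendly reason string.
function describeFailure(step: string, error: unknown): string {
  const message = error instanceof Error ? error.message : String(error);
  return `Closed during "${step}": ${message}`;
}

// Hypothetical helper: runs a task and, if it throws, closes the target
// with a reason that names the failing step before rethrowing.
async function runOrClose<T>(
  target: Closable,
  step: string,
  task: () => Promise<T>,
): Promise<T> {
  try {
    return await task();
  } catch (error) {
    await target.close({ reason: describeFailure(step, error) });
    throw error;
  }
}

// With Playwright installed (the selector is a placeholder):
//   await runOrClose(browser, 'wait for prices', () =>
//     page.waitForSelector('.price'));
```

With this pattern, any other operation that was still pending when the browser closed should fail with a message that names the step that went wrong, which is what you want to find in the logs.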

Awesome, this API introduction makes debugging a lot easier! 🚀

New Functionality for the Test Generator Tool

The Playwright Test Generator tool, designed to create tests automatically as you perform actions in a browser, now provides the following buttons:

  • Assert visibility: Verify that the selected element is visible by generating an expect(locator).toBeVisible() instruction.
  • Assert text: Make sure that the selected HTML element contains specific text via an expect(locator).toContainText() instruction.
  • Assert value: Check that the selected element has a particular value by adding an expect(locator).toHaveValue() instruction to your test.

Picture this: you’re working on a complex task, such as building a web scraper for a dynamic web page. Assume you need to make sure that some elements on the page are visible and contain specific text or values. That would involve some complex logic 👎. But hold up—thanks to this fresh update, it all boils down to a few clicks in the Test Generator tool!

See this new feature in action in the following GIF:

That Test Generator interaction will produce the following TypeScript test for you:

import { test, expect } from '@playwright/test';

test('test', async ({ page }) => {
  await page.goto('https://playwright.dev/');
  await expect(page.getByRole('banner')).toContainText('Get started');
});

✨ Pretty magical, isn’t it? ✨

Updated Browser Versions

In the grand tradition of Playwright's major updates, the lineup of supported browsers has been updated with newer versions:

  • Chromium 120.0.6099.28
  • Mozilla Firefox 119.0
  • WebKit 17.4

But that's not all! The current version of Playwright has also been tested against the following stable channels:

  • Google Chrome 119

  • Microsoft Edge 119
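
If you run your scripts through Playwright Test, targeting those branded browsers is mostly a matter of configuration. The fragment below is a sketch of a playwright.config.ts that assumes Google Chrome and Microsoft Edge are already installed on the machine; the project names are arbitrary.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'Google Chrome',
      // `channel` selects the branded browser installed on the machine
      // instead of the Chromium build bundled with Playwright.
      use: { ...devices['Desktop Chrome'], channel: 'chrome' },
    },
    {
      name: 'Microsoft Edge',
      use: { ...devices['Desktop Edge'], channel: 'msedge' },
    },
  ],
});
```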

Other Minor Changes

  • The download.path() and download.createReadStream() methods now throw an error when the download operation fails or gets canceled.

  • The Playwright Docker image now comes with Node.js v20.
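
Since download.path() now throws for failed or canceled downloads, scripts that rely on it should handle that error. Here is a minimal sketch: DownloadLike is a structural stand-in for the two Download methods used, and resolveDownload() and DownloadResult are hypothetical names.

```typescript
// Structural stand-in covering the two Download methods used here.
interface DownloadLike {
  suggestedFilename(): string;
  path(): Promise<string>;
}

interface DownloadResult {
  file: string;
  savedTo: string | null;
  error: string | null;
}

// Hypothetical helper: resolves the local path of a finished download and
// converts the error thrown for failed or canceled downloads into a
// plain result object that is easy to log.
async function resolveDownload(download: DownloadLike): Promise<DownloadResult> {
  const file = download.suggestedFilename();
  try {
    return { file, savedTo: await download.path(), error: null };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { file, savedTo: null, error };
  }
}

// With Playwright installed:
//   const download = await page.waitForEvent('download');
//   const result = await resolveDownload(download);
```

Returning a result object instead of letting the error bubble up is just one design choice; in a test suite you may prefer to let the error fail the test.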

Don't want to miss out on any new updates? Keep an eye on the Playwright Release Notes page!

How to Update Playwright

Now, you must be thinking, “How can I get my hands on those fantastic new additions?” Well, by updating Playwright to the latest version, my friend!

Just fire up the command below:

npm install @playwright/test@latest

And don't forget to upgrade the browser binaries with:

npx playwright install

Voilà! You're all set to get your hands on the latest Playwright updates!

New Playwright, Same Old Problems…

No matter how up-to-date your version of Playwright is, most sites will still be able to detect and block your automated scripts. But how is that even possible? Well, headless browsers controlled by libraries like Playwright involve special configurations and settings that are seen as red flags by anti-bot solutions. The consequence? Immediate blocks or the unwelcome appearance of CAPTCHA and other pesky obstacles.

Now, you might be thinking, “Can't I just tweak my browser settings to avoid this?”

Not so fast, kiddo! That’s not a great idea for at least three compelling reasons:

  1. It's a never-ending cat-and-mouse game—anti-bot measures evolve, making today's workaround old news by tomorrow.
  2. Even with the slickest browser configurations, excessive requests from the same IP might still earn you suspicious glances from the target site.
  3. User interactions like form submissions may require CAPTCHA solving, which isn’t a walk in the park to automate!
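
To make the second point concrete, a do-it-yourself workaround usually means rotating proxies by hand, roughly as sketched below. ProxySettings mirrors the shape of the proxy launch option in Playwright, while createProxyRotator() is a hypothetical helper and the proxy URLs are placeholders. Note that this only touches the IP problem and leaves the other two points open.

```typescript
// Same shape as the `proxy` option accepted by browserType.launch().
interface ProxySettings {
  server: string;
  username?: string;
  password?: string;
}

// Hypothetical helper: hands out the given proxies in round-robin order.
function createProxyRotator(proxies: ProxySettings[]): () => ProxySettings {
  if (proxies.length === 0) {
    throw new Error('At least one proxy is required');
  }
  let next = 0;
  return () => {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length;
    return proxy;
  };
}

// Placeholder endpoints, for illustration only.
const nextProxy = createProxyRotator([
  { server: 'http://proxy-1.example.com:8080' },
  { server: 'http://proxy-2.example.com:8080' },
]);

// With Playwright installed, each new browser gets the next proxy:
//   const browser = await chromium.launch({ proxy: nextProxy() });
```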

The issue isn’t with the browser automation library itself (Playwright rocks! 🤘), but rather with the browser under control. The solution would be a highly customizable browser that:

  • Runs in headed mode like a normal browser to avoid bot detection.
  • Can easily scale to the cloud to save you time and money in infrastructure management.
  • Provides rotating IPs backed by one of the widest and most reliable proxy networks on the market.
  • Can automatically manage CAPTCHA solving, browser fingerprinting, cookie and header customization, and automatic retries for you for maximum effectiveness.
  • Integrates with the most popular browser automation libraries, such as Playwright, Selenium, and Puppeteer.

Believe it or not, that isn't some far-off mirage. That's real and is exactly what Bright Data’s Scraping Browser solution is all about!

Final Thoughts

Playwright is the rock star of browser automation libraries, and just like Santa Claus delivers presents on Christmas Eve, Microsoft ships a new release roughly every month. Here, you've seen how to get the most out of the latest Playwright updates, but let's face it, they won't magically make you invisible to sites with advanced bot detection technologies.

Dodge that bullet with the Scraping Browser solution from Bright Data and join our mission to make the Internet a public place for everyone, everywhere, even through automated scripts!

Until next time, keep exploring the Web with freedom!


Written by brightdata | From data collection to ready-made datasets, Bright Data allows you to retrieve the data that matters.
Published by HackerNoon on 2023/12/17