How to Find Elements by CSS Selector in Puppeteer?

CSS selectors are one of the most concise and widely used ways to locate elements in HTML when web scraping. In Node.js with Puppeteer, you can find elements by CSS selector with the page.$ and page.$$ methods, and extract data from them with page.$eval and page.$$eval. These methods accept standard CSS selector syntax, the same syntax you would pass to document.querySelector in the browser.
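page.$ resolves to null when nothing matches, and page.$$ resolves to an empty array, so you can check for missing elements without a try/catch. The sketch below assumes page is an already-loaded Puppeteer Page; the helper name describeMatches is hypothetical and not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): report how a CSS selector matches.
// `page` is assumed to be a Puppeteer Page; anything with compatible $ and $$ works.
async function describeMatches(page, selector) {
    // page.$ resolves to the first matching ElementHandle, or null if none match
    const first = await page.$(selector);
    // page.$$ resolves to an array of ElementHandles, empty if none match
    const all = await page.$$(selector);
    return {
        found: first !== null,
        count: all.length,
        // Read the first match's text in the browser only if it exists
        firstText: first ? await first.evaluate(el => el.textContent.trim()) : null,
    };
}
```

For example, calling describeMatches(page, "div > p") inside an async function would report whether any p element sits directly inside a div, how many there are, and the trimmed text of the first one.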

The example below shows how to use these methods to find elements on a page, with comments explaining each step.

```javascript
const puppeteer = require('puppeteer');

async function run() {
    // Launch a new browser instance (headless by default)
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();

        // Navigate to the target webpage and wait for the DOMContentLoaded event
        await page.goto("https://httpbin.dev/html", { waitUntil: 'domcontentloaded' });

        // Get the first matching element: an ElementHandle, or null if nothing matches
        const firstParagraph = await page.$("p");
        console.log("First paragraph element found:", firstParagraph !== null);

        // Get all matching elements: an array of ElementHandles (empty if nothing matches)
        const allParagraphs = await page.$$("p");
        console.log("Total paragraphs found:", allParagraphs.length);

        // Extract and log the text content of the first paragraph
        // (page.$eval rejects if no element matches the selector)
        const firstParagraphText = await page.$eval("p", element => element.innerText);
        console.log("Text of the first paragraph:", firstParagraphText);

        // Extract and log the href of the first anchor tag;
        // fall back to null instead of throwing if the page has no <a> element
        const firstAnchorHref = await page
            .$eval("a", element => element.href)
            .catch(() => null);
        console.log("Href of the first anchor tag:", firstAnchorHref);

        // Count the total number of paragraph elements
        const paragraphCount = await page.$$eval("p", elements => elements.length);
        console.log("Total number of paragraph elements:", paragraphCount);

        // Modify the inner text of the first paragraph (only in this browser's copy of the page)
        await page.$eval("p", element => {
            element.innerText = "New text for the first paragraph";
        });
        console.log("Modified the text of the first paragraph.");
    } finally {
        // Close the browser even if one of the steps above throws
        await browser.close();
    }
}

// Run the function and report any error
run().catch(error => {
    console.error(error);
    process.exitCode = 1;
});
```

In this example, we perform the following actions:

  1. Launch a new browser instance: This starts a new Puppeteer-controlled browser.
  2. Navigate to the target webpage: page.goto loads the specified URL and, with waitUntil: 'domcontentloaded', resolves once the HTML document has been parsed, without waiting for images and other subresources to finish loading.
  3. Get the first matching element: page.$ returns an ElementHandle for the first element matching the CSS selector p, or null if there is none.
  4. Get all matching elements: page.$$ returns an array of ElementHandles for every element matching p (an empty array if there are none).
  5. Extract and log the text content of the first paragraph: page.$eval runs a function in the browser against the first element matching the selector and returns the result, here the element's innerText. The call fails if no element matches.
  6. Extract and log the href of the first anchor tag: page.$eval reads the href property of the first a element, which the browser resolves to an absolute URL.
  7. Count the total number of paragraph elements: The page.$$eval method evaluates a function in the context of all matching elements and returns the total count.
  8. Modify the inner text of the first paragraph: page.$eval sets the innerText of the first matching p element. The change affects only the page loaded in this browser session, not the website itself.
  9. Close the browser: This ensures that the browser instance is properly closed after the script finishes.
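Besides counting matches, you can use page.$$eval to map every match to plain data. The sketch below compares it with page.$$ plus ElementHandle.evaluate for collecting link URLs. It assumes page is an already-loaded Puppeteer Page; the helper names and the "a[href]" selector (a elements that have an href attribute) are illustrative choices, not part of the example above.

```javascript
// Hypothetical helpers: two ways to collect the absolute URL of every link.
// `page` is assumed to be a Puppeteer Page.

// 1) page.$$eval: one call. The callback runs in the browser on the whole
//    array of matches and returns plain, serializable data to Node.js.
async function getLinkUrlsWithEval(page, selector = "a[href]") {
    return page.$$eval(selector, elements => elements.map(el => el.href));
}

// 2) page.$$ + ElementHandle.evaluate: fetch the handles first, then make one
//    extra browser round trip per element. Useful when you also need the
//    handles afterwards, for example to click them.
async function getLinkUrlsWithHandles(page, selector = "a[href]") {
    const handles = await page.$$(selector);
    const urls = [];
    for (const handle of handles) {
        urls.push(await handle.evaluate(el => el.href));
    }
    return urls;
}
```

For pure data extraction, page.$$eval is usually the simpler default because it needs a single call regardless of how many elements match.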

Note:

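An element must exist in the DOM before you can select it. On pages that render content with JavaScript, the initial load event can fire before those elements are added, so wait for the element you need (for example with page.waitForSelector) before querying it. For more information, see How to wait for a page to load in Puppeteer?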
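On pages that add elements with JavaScript after the initial load, a safer pattern is to wait for the specific element before reading it. Puppeteer's page.waitForSelector resolves with an ElementHandle once a matching element is in the DOM, and rejects with a timeout error if none appears in time. A minimal sketch; the helper name getTextWhenReady and the 10-second default timeout are illustrative assumptions.

```javascript
// Hypothetical helper: wait for `selector` to appear, then read its text.
// `page` is assumed to be a Puppeteer Page.
async function getTextWhenReady(page, selector, timeoutMs = 10000) {
    // Resolves with an ElementHandle once a match is in the DOM;
    // rejects with a TimeoutError if nothing matches within timeoutMs.
    const handle = await page.waitForSelector(selector, { timeout: timeoutMs });
    // Run a function against that specific element inside the page
    return handle.evaluate(element => element.textContent.trim());
}
```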

For another way to locate elements, see How to find elements by XPath in Puppeteer?

Conclusion

Using CSS selectors with Puppeteer keeps element lookups short and readable. page.$ and page.$$ return element handles, while page.$eval and page.$$eval run a function against the matched elements in the browser and return the result. Together, these four methods let you locate elements, extract their text and attributes, and modify them.
