How to Find Elements by CSS Selector in Puppeteer?
CSS selectors are one of the most efficient ways to parse HTML pages when web scraping. In Node.js and Puppeteer, you can use CSS selectors with the page.$ and page.$$ methods. These methods allow you to interact with elements on the page using familiar CSS syntax.
The example below shows how to use these methods to find elements on a page, with comments explaining each step.
const puppeteer = require('puppeteer');

async function run() {
  // Launch a new browser instance
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Navigate to the target webpage
    await page.goto("https://httpbin.dev/html", { waitUntil: 'domcontentloaded' });

    // Get the first matching element (null if nothing matches)
    const firstParagraph = await page.$("p");
    console.log("First paragraph element found:", firstParagraph);

    // Get all matching elements (empty array if nothing matches)
    const allParagraphs = await page.$$("p");
    console.log("Total paragraphs found:", allParagraphs.length);

    // Extract and log the text content of the first paragraph
    const firstParagraphText = await page.$eval("p", element => element.innerText);
    console.log("Text of the first paragraph:", firstParagraphText);

    // Extract and log the href attribute of the first anchor tag.
    // page.$eval throws when nothing matches, so check that an anchor exists first.
    if (await page.$("a")) {
      const firstAnchorHref = await page.$eval("a", element => element.href);
      console.log("Href of the first anchor tag:", firstAnchorHref);
    }

    // Count the total number of paragraph elements
    const paragraphCount = await page.$$eval("p", elements => elements.length);
    console.log("Total number of paragraph elements:", paragraphCount);

    // Modify the inner text of the first paragraph
    await page.$eval("p", element => { element.innerText = "New text for the first paragraph"; });
    console.log("Modified the text of the first paragraph.");
  } finally {
    // Close the browser, even if one of the steps above failed
    await browser.close();
  }
}

// Run the function and report any failure
run().catch(console.error);
In this example, we perform the following actions:
- Launch a new browser instance: This starts a new Puppeteer-controlled browser.
- Navigate to the target webpage: The goto method navigates to the specified URL and waits until the page’s DOM content is fully loaded.
- Get the first matching element: The page.$ method retrieves the first element that matches the CSS selector p.
- Get all matching elements: The page.$$ method retrieves all elements that match the CSS selector p.
- Extract and log the text content of the first paragraph: The page.$eval method evaluates a function in the context of the first matching element and returns its innerText.
- Extract and log the href attribute of the first anchor tag: The page.$eval method retrieves the href attribute of the first a tag.
- Count the total number of paragraph elements: The page.$$eval method evaluates a function in the context of all matching elements and returns the total count.
- Modify the inner text of the first paragraph: The page.$eval method changes the innerText of the first matching p element.
- Close the browser: This ensures that the browser instance is properly closed after the script finishes.
Note: It’s essential to wait for the page to fully load before attempting to find elements, especially on pages with dynamic content. For more information, see How to wait for a page to load in Puppeteer?
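For dynamic content, one possible approach is to wait for the specific element rather than for the whole page, using page.waitForSelector. The following is a minimal sketch: the helper name waitForText and the 5000 ms default are illustrative assumptions, and it assumes the error Puppeteer raises on a timeout can be recognized by its name property being 'TimeoutError'.

```javascript
// Hypothetical helper: wait until `selector` appears, then return its
// trimmed text. Resolves to null if the element does not appear in time.
async function waitForText(page, selector, timeoutMs = 5000) {
  let handle;
  try {
    // Resolves once an element matching the selector is in the DOM
    handle = await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    // Assumption: timeouts are identifiable by err.name === 'TimeoutError'
    if (err && err.name === 'TimeoutError') {
      return null;
    }
    throw err; // anything else is unexpected, so let it propagate
  }
  if (handle === null) {
    return null;
  }
  try {
    return await handle.evaluate(el => el.textContent.trim());
  } finally {
    // Release the element handle once we are done with it
    await handle.dispose();
  }
}
```

With a Puppeteer page in hand, a call might look like `const text = await waitForText(page, "p", 3000);`, followed by a null check before using the result.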
For additional ways to locate elements, you can also refer to How to find elements by XPath in Puppeteer?.
Conclusion
CSS selectors cover most element lookup and data extraction tasks in Puppeteer. The page.$ and page.$$ methods return handles to the matching elements, while page.$eval and page.$$eval run a function against the matched elements inside the page and return its result.