How to Scrape eBay in Python

This tutorial will cover:

Why scrape e-commerce data from the web?
eBay scraping libraries and tools
Scraping eBay product data with Beautiful Soup

Why scrape e-commerce data from the web?

Scraping e-commerce data gives you useful information for a range of scenarios and activities. These include:

Price monitoring: By tracking e-commerce websites, businesses can monitor the prices of products in real-time. This helps you identify price fluctuations, spot trends, and adjust your pricing strategy accordingly. If you are a consumer, this will help you find the best deals and save money.
Competitor analysis: By gathering info about your competitors’ product offerings, prices, discounts, and promotions, you can make data-driven decisions about your own pricing strategies, product assortment, and marketing campaigns.
Market research: E-commerce data provides valuable insights into market trends, consumer preferences, and demand patterns. You can use that info as the source of a data analysis process to study emerging trends and understand customer behavior.
Sentiment analysis: By scraping customer reviews from e-commerce sites, you can gain insights into customer satisfaction, product feedback, and areas for improvement.

When it comes to e-commerce scraping, eBay is one of the most popular targets for at least three good reasons:

It has a wide range of products.
It is based on an auction and bidding system, which exposes more data than Amazon and similar platforms.
The same product can have several prices (Auction + Buy It Now!).

By extracting data from eBay, you get access to a wealth of information to support your price monitoring, comparison, or analysis strategy.

eBay scraping libraries and tools

Python is considered one of the best languages for scraping thanks to its ease of use, simple syntax, and vast ecosystem of libraries. So, it will be the programming language chosen to scrape eBay. Explore our in-depth guide on how to do web scraping with Python.

You need to choose the right scraping library from the many available. To make an informed decision, explore eBay in your browser. By inspecting the AJAX calls made by the page, you will notice that most of the data on the site is embedded in the HTML document returned by the server.

This means that a simple HTTP client to replicate the request to the server, plus an HTML parser, will be enough. For this reason, we recommend:

Requests: The most popular HTTP client library for Python. It simplifies the process of sending HTTP requests and handling their responses, making it easier to retrieve page content from web servers.
Beautiful Soup: A full-featured HTML and XML parsing Python library. It is mostly used for web scraping as it provides powerful methods for exploring the DOM and extracting data from its elements.

With Requests and Beautiful Soup, you can scrape the target site in Python. Let's see how!

Scraping eBay product data with Beautiful Soup

Follow this step-by-step tutorial to learn how to build a Python script for scraping eBay.

Step 1: Getting started

To implement price scraping, make sure you meet the following prerequisites:

Python 3+ installed on your computer: Download the installer, launch it, and follow the installation wizard.
A Python IDE of your choice: Visual Studio Code with the Python extension or PyCharm Community Edition are two great choices.

Next, initialize a Python project with a virtual environment called ebay-scraper by running the commands below:

mkdir ebay-scraper

cd ebay-scraper

python -m venv env

Enter the project folder and add a scraper.py file containing the following snippet:

print('Hello, World!')

This is a sample script that only prints "Hello, World!". Soon, it will contain the logic for scraping eBay.

Verify that it works by running it with:

python scraper.py

In the terminal, you should see:

Hello, World!

Great, you now have a Python project!

Step 2: Install the scraping libraries

It is time to add libraries required to perform web scraping to your project’s dependencies. Launch the command below in the project folder to install the Beautiful Soup and Requests packages:

pip install beautifulsoup4 requests

Import the libraries in scraper.py and get ready to use them to extract data from eBay:

import requests

from bs4 import BeautifulSoup

# scraping logic...

Make sure your Python IDE reports no errors, and you are ready to implement price monitoring with scraping!

Step 3: Download the target web page

If you are an eBay user, you may have noticed that the URL of a product page follows this format:

https://www.ebay.com/itm/<ITM_ID>

As you can see, this is a dynamic URL that changes based on the item ID.

For example, this is the URL of an eBay product:

https://www.ebay.com/itm/225605642071?epid=26057553242&hash=item348724e757:g:~ykAAOSw201kD1un&amdata=enc%3AAQAIAAAA4OMICjL%2BH6HBrWqJLiCPpCurGf8qKkO7CuQwOkJClqK%2BT2B5ioN3Z9pwm4r7tGSGG%2FI31uN6k0IJr0SEMEkSYRrz1de9XKIfQhatgKQJzIU6B9GnR6ZYbzcU8AGyKT6iUTEkJWkOicfCYI5N0qWL8gYV2RGT4zr6cCkJQnmuYIjhzFonqwFVdYKYukhWNWVrlcv5g%2BI9kitSz8k%2F8eqAz7IzcdGE44xsEaSU2yz%2BJxneYq0PHoJoVt%2FBujuSnmnO1AXqjGamS3tgNcK5Tqu36QhHRB0tiwUfAMrzLCOe9zTa%7Ctkp%3ABFBMmNDJgZJi

Here, 225605642071 is the unique identifier of the item. Note that the query parameters are not required to visit the page. You can remove them, and eBay will still load the product page correctly.
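As a rough illustration of dropping the query string, the sketch below reduces a full product URL to its short form using only the standard library. The function name canonical_item_url is hypothetical, and the code assumes that the item ID is the last path segment and consists of digits only. That holds for the example URL above but is an assumption, not something eBay guarantees.

```python
from urllib.parse import urlsplit


def canonical_item_url(raw_url):
    """Return the product URL without query parameters (hypothetical helper)."""
    parts = urlsplit(raw_url)
    # the path looks like /itm/<ITM_ID>, so take the last path segment
    item_id = parts.path.rstrip('/').split('/')[-1]
    # assumption: item IDs are purely numeric
    if not item_id.isdigit():
        raise ValueError(f'Unexpected item ID in URL: {raw_url}')
    return f'https://www.ebay.com/itm/{item_id}'


short_url = canonical_item_url(
    'https://www.ebay.com/itm/225605642071?epid=26057553242&hash=item348724e757'
)
print(short_url)
```

Working with the short URL keeps logs and stored links readable and avoids carrying tracking parameters around.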

Instead of hard-coding the target page in your script, you can make it read the item ID from a command-line argument. This way, you can scrape data from any product page.

To do so, update scraper.py as follows:

import requests

from bs4 import BeautifulSoup

import sys

# if there are no CLI parameters

if len(sys.argv) <= 1:

    print('Item ID argument missing!')

    sys.exit(2)

# read the item ID from a CLI argument

item_id = sys.argv[1]

# build the URL of the target product page

url = f'https://www.ebay.com/itm/{item_id}'

# scraping logic...

Assume you want to scrape the product 225605642071. You can launch your scraper with:

python scraper.py 225605642071

Thanks to sys, you can access the command-line arguments. The first element of sys.argv is the name of your script, scraper.py. To get the item ID, you then need to read the element at index 1.

If you forget the item ID in the CLI, the program will fail with the error below:

Item ID argument missing!

Otherwise, it will read the CLI parameter and use it in an f-string to generate the target URL of the product to scrape. In this case, url will contain:


https://www.ebay.com/itm/225605642071
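If you would rather not handle sys.argv by hand, the same step can be sketched with argparse from the standard library. This is an alternative to the tutorial's approach, not a replacement for it. The function names parse_item_id and build_item_url are hypothetical, and the digits-only check is an assumption about what item IDs look like.

```python
import argparse


def parse_item_id(argv):
    """Read the item ID from a list of CLI arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(description='Scrape an eBay product page.')
    parser.add_argument('item_id', help='eBay item ID, e.g. 225605642071')
    args = parser.parse_args(argv)
    # assumption: item IDs contain digits only
    if not args.item_id.isdigit():
        # parser.error() prints a usage message and exits with status 2
        parser.error('the item ID must contain digits only')
    return args.item_id


def build_item_url(item_id):
    """Build the URL of the target product page."""
    return f'https://www.ebay.com/itm/{item_id}'


# in scraper.py you would pass sys.argv[1:] instead of a literal list
url = build_item_url(parse_item_id(['225605642071']))
print(url)
```

argparse generates the usage message and a --help flag for you, which becomes more useful as the script gains options.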

Now you can use Requests to download that web page with the following line of code:

page = requests.get(url)

Behind the scenes, requests.get() performs an HTTP GET request to the URL passed as a parameter. page will store the response produced by the eBay server, including the HTML content of the target page.
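A bare requests.get() call waits indefinitely if the server never answers and does not complain about error status codes. The sketch below wraps the download in a hypothetical download_page() helper that sets a timeout and raises on 4xx/5xx responses. The User-Agent value is a placeholder, and whether eBay treats the default Requests User-Agent differently is an assumption rather than something this tutorial verifies.

```python
import requests


def download_page(url, session=None, timeout=10):
    """Return the HTML of the page at url (hypothetical helper)."""
    # reuse a caller-supplied session if given, otherwise create a new one
    session = session or requests.Session()
    response = session.get(
        url,
        # placeholder header value, chosen for illustration only
        headers={'User-Agent': 'Mozilla/5.0 (compatible; ebay-scraper-tutorial)'},
        # give up if the server does not respond within `timeout` seconds
        timeout=timeout,
    )
    # raise requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.text
```

In scraper.py you could then write html = download_page(url) and pass html to Beautiful Soup instead of page.text.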

Fantastic! Now let's learn how to extract data from it.

Step 4: Parse the HTML document

page.text contains the HTML document returned by the server. Pass it to the BeautifulSoup() constructor to parse it:

soup = BeautifulSoup(page.text, 'html.parser')

The second parameter specifies the parser used by Beautiful Soup. If you are not familiar with it, html.parser is the name of the Python built-in HTML parser.

The soup variable now stores a tree structure that exposes some useful methods for selecting elements from the DOM. The most popular ones are:

find(): Returns the first HTML element that matches the selector condition passed as a parameter.
find_all(): Returns a list of HTML elements matching the input selector strategy.
select_one(): Returns the first HTML element matching the input CSS selector.
select(): Returns a list of HTML elements matching the CSS selector passed as a parameter.
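To make the difference between these four methods concrete, here is a minimal sketch run against a small hand-written HTML snippet. The markup is invented for illustration and has nothing to do with eBay's real pages.

```python
from bs4 import BeautifulSoup

# made-up HTML used only to demonstrate the selection methods
html = '''
<div id="main">
  <p class="note">first</p>
  <p class="note">second</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# find(): the first element matching the tag name
first_p = soup.find('p')

# find_all(): every element matching the tag name and CSS class
all_p = soup.find_all('p', class_='note')

# select_one(): the first element matching the CSS selector
one = soup.select_one('#main p.note')

# select(): every element matching the CSS selector
many = soup.select('#main p.note')

# select_one() and find() return None when nothing matches
missing = soup.select_one('.does-not-exist')

print(first_p.text, len(all_p), one.text, [e.text for e in many], missing)
```

Because find() and select_one() return None when nothing matches, it is worth checking their result before reading attributes or text from it.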

Use them to select HTML elements by tag, ID, CSS classes, and more. Then, you can extract data from their attributes and text content. Let's see how!

Step 5: Inspect the product page

To devise an effective data scraping strategy, you must first get familiar with the structure of the target web pages. Open your browser and visit a few eBay product pages.

First, you will notice that the page contains different information depending on the product category. For electronics products, you have access to technical specifications.

When you open a product in a clothing category, you will see the available sizes and colors.

These inconsistencies in page structure make scraping harder. However, some information fields appear on every page, such as the product price and the shipping cost.

Get familiar with your browser's DevTools as well. Right-click an HTML element containing data of interest and select "Inspect." This will open the DevTools window.

Here you can explore the DOM structure of the page and work out effective selection strategies.

Spend some time inspecting product pages with DevTools.

Step 6: Extract the price data

First, you need a data structure in which to store the scraped data. Initialize a Python dictionary with:

item = {}

As you may have noticed in Step 5, the price data is in this section:

Inspect the price HTML element:

You can get the product price with the CSS selector below:

.x-price-primary span[itemprop="price"]

And the currency with:

.x-price-primary span[itemprop="priceCurrency"]

Apply those selectors in Beautiful Soup and retrieve the desired data with:

price_html_element = soup.select_one('.x-price-primary span[itemprop="price"]')

price = price_html_element['content']

currency_html_element = soup.select_one('.x-price-primary span[itemprop="priceCurrency"]')

currency = currency_html_element['content']

This snippet selects the price and currency HTML elements and then collects the string contained in their content attribute.
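Keep in mind that select_one() returns None when nothing matches, so price_html_element['content'] raises a TypeError on a page that lacks the element. A defensive variant might look like the sketch below. The select_attr() helper is hypothetical, and the sample HTML is a hand-written stand-in that only mimics the selectors used above, not eBay's actual markup.

```python
from bs4 import BeautifulSoup


def select_attr(soup, css_selector, attr):
    """Return the attribute of the first matching element, or None (hypothetical helper)."""
    element = soup.select_one(css_selector)
    if element is None:
        return None
    # Tag.get() returns None instead of raising KeyError if the attribute is absent
    return element.get(attr)


# simplified stand-in for the price section of a product page
sample_html = '''
<div class="x-price-primary">
  <span itemprop="price" content="499.99">US $499.99</span>
  <span itemprop="priceCurrency" content="USD"></span>
</div>
'''

sample_soup = BeautifulSoup(sample_html, 'html.parser')

price = select_attr(sample_soup, '.x-price-primary span[itemprop="price"]', 'content')
currency = select_attr(sample_soup, '.x-price-primary span[itemprop="priceCurrency"]', 'content')
# a selector that matches nothing yields None rather than an exception
missing = select_attr(sample_soup, '.no-such-class span', 'content')

print(price, currency, missing)
```

With this approach a missing price shows up as None in the scraped item, which is easier to detect and log than a crash halfway through the script.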

Keep in mind that the price above is only part of the full cost you will have to pay for the item. The total also includes shipping costs.

Inspect the shipping element:

This time, extracting the desired data is a bit trickier as there is not an easy CSS selector to get the element. What you can do is iterate over each .ux-labels-values__labels div. When the current element contains the “Shipping:” string, you can access the next sibling in the DOM and extract the price from .ux-textspans--BOLD:

# default value in case the page shows no shipping price

shipping_price = None

label_html_elements = soup.select('.ux-labels-values__labels')

for label_html_element in label_html_elements:

    if 'Shipping:' in label_html_element.text:

        shipping_price_html_element = label_html_element.next_sibling.select_one('.ux-textspans--BOLD')

        # if there is a shipping price HTML element

        if shipping_price_html_element is not None:

            # extract the float number of the price from

            # the text content

            shipping_price = re.findall(r"\d+[.,]\d+", shipping_price_html_element.text)[0]

        break
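One fragile spot in this loop is next_sibling: in Beautiful Soup it can be a whitespace text node instead of a tag, and text nodes have no select_one() method. The sketch below shows one possible way around that with find_next_sibling(), which only returns tags. The extract_shipping_price() function is hypothetical, and the sample HTML is a simplified stand-in built from the class names mentioned above.

```python
import re

from bs4 import BeautifulSoup


def extract_shipping_price(soup):
    """Return the shipping price as a string, or None (hypothetical helper)."""
    for label in soup.select('.ux-labels-values__labels'):
        if 'Shipping:' not in label.text:
            continue
        # find_next_sibling() skips text nodes and returns the next tag, if any
        values = label.find_next_sibling()
        if values is None:
            return None
        bold = values.select_one('.ux-textspans--BOLD')
        if bold is None:
            return None
        # re.search() returns None instead of raising when there is no match
        match = re.search(r'\d+[.,]\d+', bold.text)
        return match.group() if match else None
    return None


# simplified stand-in for the shipping row of a product page
sample_html = '''
<div class="ux-labels-values">
  <div class="ux-labels-values__labels">Shipping:</div>
  <div class="ux-labels-values__values">
    <span class="ux-textspans ux-textspans--BOLD">US $72.58</span>
  </div>
</div>
'''

shipping_price = extract_shipping_price(BeautifulSoup(sample_html, 'html.parser'))
print(shipping_price)
```

Returning None for every "not found" case means the caller has a single condition to check before using the value.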

The shipping price element contains the desired data in the following format:

US $105.44

To extract the price, you can use a regex with the re.findall() method. Do not forget to add the following line in the import section of your script:

import re 
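Note that the pattern \d+[.,]\d+ stops at the first separator it finds, so it would capture 1,105 from a string such as US $1,105.44. If you expect amounts with thousands separators, a slightly broader pattern can help. The sketch below uses a hypothetical parse_price() helper and assumes US-style formatting (comma for thousands, dot for decimals), which matches the example above but will not suit every eBay locale.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract a price like 'US $1,105.44' as a Decimal, or None (hypothetical helper)."""
    # a digit, then any digits or commas, then an optional decimal part
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        return None
    # assumption: commas are thousands separators, so they can be dropped
    return Decimal(match.group().replace(',', ''))


print(parse_price('US $105.44'))
print(parse_price('US $1,105.44'))
print(parse_price('Free'))
```

Decimal avoids the rounding surprises of float, which matters once you start adding or comparing prices.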

Add the collected data to the item dictionary:

item['price'] = price

item['shipping_price'] = shipping_price

item['currency'] = currency

Print it with:

print(item)

And you will get:

{'price': '499.99', 'shipping_price': '72.58', 'currency': 'USD'}
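Since the item price and the shipping price are stored as strings, computing the full cost takes a small conversion step. Here is a minimal sketch, assuming dot-decimal values like the ones in the output above. The total_cost() helper is hypothetical, and treating a missing shipping price as zero is a simplification you may not want in production.

```python
from decimal import Decimal


def total_cost(item):
    """Return price plus shipping as a Decimal (hypothetical helper)."""
    price = Decimal(item['price'])
    # assumption: a missing or empty shipping price counts as zero
    shipping_raw = item.get('shipping_price')
    shipping = Decimal(shipping_raw) if shipping_raw else Decimal('0')
    return price + shipping


sample_item = {'price': '499.99', 'shipping_price': '72.58', 'currency': 'USD'}
print(total_cost(sample_item), sample_item['currency'])
```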

This is enough to implement a price tracking process in Python. Still, an eBay product page contains a lot of other useful information, so it is worth learning how to extract it!

Step 7: Retrieve the item details

If you look at the "About this item" tab, you will notice that it contains a lot of interesting data:

The sections and the fields within them change from product to product, so you need a way to scrape them all with a smart approach.

The most important sections are "Item specifics" and "About this product." They appear on most product pages. Inspect one of them and note that you can select them with:

.section-title

Given a section, explore its DOM structure:

Note that it consists of several rows, each containing two .ux-layout-section-evo__col elements. Each of these, in turn, contains two elements:

.ux-labels-values__labels: The attribute name.
.ux-labels-values__values: The attribute value.

Now you are ready to programmatically scrape all the item details with:

section_title_elements = soup.select('.section-title')

for section_title_element in section_title_elements:

    if 'Item specifics' in section_title_element.text or 'About this product' in section_title_element.text:

        # get the parent element containing the entire section

        section_element = section_title_element.parent

        for section_col in section_element.select('.ux-layout-section-evo__col'):

            print(section_col.text)

            col_label = section_col.select_one('.ux-labels-values__labels')

            col_value = section_col.select_one('.ux-labels-values__values')

            # if both elements are present

            if col_label is not None and col_value is not None:

                item[col_label.text] = col_value.text

This code iterates over each detail field HTML element and adds the key-value pair associated with each product attribute to the item dictionary.

At the end of the for loop, item will contain:

{'price': '499.99', 'shipping_price': '72.58', 'currency': 'USD', 'Condition': "New: A brand-new, unused, unopened, undamaged item in its original packaging (where packaging is applicable). Packaging should be the same as what is found in a retail store, unless the item is handmade or was packaged by the manufacturer in non-retail packaging, such as an unprinted box or plastic bag. See the seller's listing for full details. See all condition definitionsopens in a new window or tab ", 'Manufacturer Warranty': '1 Year', 'Item Height': '16.89"', 'Item Length': '18.5"', 'Item Depth': '6.94"', 'Item Weight': '15.17 lbs', 'UPC': '0711719558255', 'Brand': 'Sony', 'Type': 'Home Console', 'Region Code': 'Region Free', 'Platform': 'Sony PlayStation 5', 'Color': 'White', 'Model': 'Sony PlayStation 5 Blu-Ray Edition', 'Connectivity': 'HDMI', 'MPN': '1000032624', 'Features': '3D Audio Technology, Blu-Ray Compatible, Wi-Fi Capability, Internet Browsing', 'Storage Capacity': '825 GB', 'Resolution': '4K (UHD)', 'eBay Product ID (ePID)': '26057553242', 'Manufacturer Color': 'White', 'Edition': 'God of War Ragnarök Bundle', 'Release Year': '2022'}
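Values collected this way can carry leftover UI text, as the Condition entry above shows with its trailing "See all condition definitions" link text. One possible cleanup step is sketched below. The clean_value() helper is hypothetical, and the suffix it removes is taken from the output above, so treat it as an assumption that may not hold for other listings.

```python
import re


def clean_value(text):
    """Trim scraped text and drop the trailing Condition link text (hypothetical helper)."""
    # remove the helper link text appended to the Condition field, if present
    text = re.sub(r'\.?\s*See all condition definitions.*$', '', text, flags=re.DOTALL)
    # collapse runs of whitespace and trim both ends
    return re.sub(r'\s+', ' ', text).strip()


raw = ("See the seller's listing for full details. "
       "See all condition definitionsopens in a new window or tab ")
print(clean_value(raw))
```

You could apply it when filling the dictionary, for example item[clean_value(col_label.text)] = clean_value(col_value.text).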

Wonderful! You just achieved your data extraction goal!

Step 8: Export scraped data to JSON

At this point, the scraped data is stored in a Python dictionary. To make it easier to share and read, you can export it to JSON:

import json

# scraping logic...

with open('product_info.json', 'w') as file:

    json.dump(item, file)

First, you need to initialize a product_info.json file with open(). Then, you can write the JSON representation of the item dictionary to the output file with json.dump(). Check out our article to learn more about how to parse and serialize data to JSON in Python.

The json package comes from the Python standard library, so you do not even need to install an extra dependency to achieve the goal.
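By default, json.dump() escapes non-ASCII characters (for example, ö is written as \u00f6), and open() falls back to the platform's default encoding. If you want the file to stay human-readable for values such as "Ragnarök", a variant like the sketch below may help. The save_item() and load_item() helpers are hypothetical names introduced for illustration.

```python
import json


def save_item(item, path):
    """Write the scraped item to an indented UTF-8 JSON file (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as file:
        # ensure_ascii=False writes characters such as 'ö' as-is instead of \u00f6
        json.dump(item, file, indent=4, ensure_ascii=False)


def load_item(path):
    """Read a previously saved item back into a dictionary (hypothetical helper)."""
    with open(path, encoding='utf-8') as file:
        return json.load(file)
```

Calling save_item(item, 'product_info.json') at the end of scraper.py would replace the with block shown above.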

Great! You started with raw data contained in a web page and now have semi-structured JSON data. It is time to look at the entire eBay scraper.

Step 9: Put it all together

This is the complete scraper.py script:

import requests

from bs4 import BeautifulSoup

import sys

import re

import json

# if there are no CLI parameters

if len(sys.argv) <= 1:

    print('Item ID argument missing!')

    sys.exit(2)

# read the item ID from a CLI argument

item_id = sys.argv[1]

# build the URL of the target product page

url = f'https://www.ebay.com/itm/{item_id}'

# download the target page

page = requests.get(url)

# parse the HTML document returned by the server

soup = BeautifulSoup(page.text, 'html.parser')

# initialize the object that will contain

# the scraped data

item = {}

# price scraping logic

price_html_element = soup.select_one('.x-price-primary span[itemprop="price"]')

price = price_html_element['content']

currency_html_element = soup.select_one('.x-price-primary span[itemprop="priceCurrency"]')

currency = currency_html_element['content']

shipping_price = None

label_html_elements = soup.select('.ux-labels-values__labels')

for label_html_element in label_html_elements:

    if 'Shipping:' in label_html_element.text:

        shipping_price_html_element = label_html_element.next_sibling.select_one('.ux-textspans--BOLD')

        # if there is a shipping price HTML element

        if shipping_price_html_element is not None:

            # extract the float number of the price from

            # the text content

            shipping_price = re.findall(r"\d+[.,]\d+", shipping_price_html_element.text)[0]

        break

item['price'] = price

item['shipping_price'] = shipping_price

item['currency'] = currency

# product detail scraping logic

section_title_elements = soup.select('.section-title')

for section_title_element in section_title_elements:

    if 'Item specifics' in section_title_element.text or 'About this product' in section_title_element.text:

        # get the parent element containing the entire section

        section_element = section_title_element.parent

        for section_col in section_element.select('.ux-layout-section-evo__col'):

            print(section_col.text)

            col_label = section_col.select_one('.ux-labels-values__labels')

            col_value = section_col.select_one('.ux-labels-values__values')

            # if both elements are present

            if col_label is not None and col_value is not None:

                item[col_label.text] = col_value.text

# export the scraped data to a JSON file

with open('product_info.json', 'w') as file:

    json.dump(item, file, indent=4)

In fewer than 70 lines of code, you can build a scraper to monitor eBay product data.

As an example, launch it against the item identified by the ID 225605642071 with:

python scraper.py 225605642071

At the end of the scraping process, the product_info.json file below will appear in the root folder of your project:

{

    "price": "499.99",

    "shipping_price": "72.58",

    "currency": "USD",

    "Condition": "New: A brand-new, unused, unopened, undamaged item in its original packaging (where packaging is applicable). Packaging should be the same as what is found in a retail store, unless the item is handmade or was packaged by the manufacturer in non-retail packaging, such as an unprinted box or plastic bag. See the seller's listing for full details",

    "Manufacturer Warranty": "1 Year",

    "Item Height": "16.89\"",

    "Item Length": "18.5\"",

    "Item Depth": "6.94\"",

    "Item Weight": "15.17 lbs",

    "UPC": "0711719558255",

    "Brand": "Sony",

    "Type": "Home Console",

    "Region Code": "Region Free",

    "Platform": "Sony PlayStation 5",

    "Color": "White",

    "Model": "Sony PlayStation 5 Blu-Ray Edition",

    "Connectivity": "HDMI",

    "MPN": "1000032624",

    "Features": "3D Audio Technology, Blu-Ray Compatible, Wi-Fi Capability, Internet Browsing",

    "Storage Capacity": "825 GB",

    "Resolution": "4K (UHD)",

    "eBay Product ID (ePID)": "26057553242",

    "Manufacturer Color": "White",

    "Edition": "God of War Ragnarok Bundle",

    "Release Year": "2022"

}

Congrats! You just learned how to scrape eBay in Python!

Conclusion

In this tutorial, you learned why eBay is one of the best targets for product price scraping and how to go about it. We showed in detail how to build a Python scraper that can extract product data. As you saw, it is not complex and takes only a few lines of code.

At the same time, you saw how inconsistent the structure of eBay’s pages is. The scraper built here might therefore work for one product but not for another. Also, eBay’s UI changes often, which forces you to continually maintain the script. Fortunately, you can avoid this with our eBay scraper!

If you want to extend your scraping process and extract prices from other e-commerce platforms, keep in mind that many of them heavily rely on JavaScript. When dealing with such sites, a traditional approach based on an HTML parser will not work. Instead, you need a tool that can render JavaScript and is automatically able to handle fingerprinting, CAPTCHAs, and automated retries for you. This is exactly what our new Scraping Browser solution is all about!

Learn more about Scraping Browser

Don’t want to deal with eBay web scraping at all but are interested in item data? Purchase an eBay dataset.
