Webmaster Console

Gain better control over data collection from your website 

What is the Bright Data Webmaster Console?

Webmasters can configure a collectors.txt file to inform Bright Data about information important to data collectors, such as the presence of personal information and the interactive endpoints present on their websites.

The Webmaster Console offers a practical and informative solution for managing Bright Data traffic on your website.

  • User-friendly control panel
  • Round-trip time (RTT) statistics for website health tracking

What is a collectors.txt?

BrightBot enforces the robots.txt guidelines; however, it’s important to note that robots.txt was initially designed to guide search engine crawlers, not public web data collectors. There is a wealth of additional information that responsible data collectors should be aware of to ensure proper and respectful data interaction with your website.

Key considerations include the presence of personal information, which should be handled in compliance with applicable privacy laws. Furthermore, many public endpoints on your website may have limited resources; by communicating these limits, you can help prevent unintentional overloading of those resources.
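As a sketch of how these considerations translate into directives, the syntax shown in the Examples & Format section below can flag personal-information endpoints and report organic load limits. The path, domain, and numbers here are illustrative placeholders, not real endpoints:

// hypothetical endpoint exposing personal information
pii: /account/profile
// assumed load report: this domain can absorb roughly 100 requests per minute
load: example.com:100:min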

Bright Data will review collectors.txt information prior to implementation, with authentication tokens from partner cybersecurity companies as an exception. Whether to accept particular webpages and their collectors.txt requests is at Bright Data’s discretion; Bright Data is not obligated to accept any requests, nor will it be liable for any consequences arising from unapproved requests.

  • Enhance transparency by monitoring how Bright Data interacts with your website.
  • Utilize a collectors.txt file to fine-tune access to specific sections of your website.

Webmasters can help BrightBot, operated by Bright Data, access their website more efficiently by providing access guidelines in a collectors.txt file via the Webmaster Console. This file may contain the following information:

  • Personal Information: endpoints containing information related to an identified or identifiable natural person. Format: URL / Document Object.
  • Disallow: interactive endpoint patterns such as ad links, likes, reviews, and posts. This instruction enables BrightBot to block these endpoints, aligning with our guidelines that prohibit data collection from these areas. Format: URL / Document Object.
  • Load: your organic traffic load on specific domains or subdomains over specific time frames. BrightBot will use this information instead of public load statistics when deciding how to rate-limit itself. Format: URL / Document Object, Rate, Time-frame.
  • Traffic peak time: time slots of peak organic traffic, during which data collection is reduced. Format: URL / Document Object, Date | Weekday | Any, Start time / End time.
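As a sketch, a collectors.txt combining the first three inputs could look like the lines below, following the directive syntax shown in the Examples & Format section; all paths, domains, and values are hypothetical placeholders. The traffic-peak-time input is omitted here because its exact directive keyword is not shown in the published example.

// hypothetical personal-information and interactive endpoints
pii: /members/profile
disallow: /reviews/submit
disallow: /ads/click
// hypothetical organic load: about 2000 requests per day on this domain
load: example.com:2000:day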

How it works

1. Create a Webmaster Console
2. Authenticate your websites
3. Build a collectors.txt for each site

What is BrightBot?

BrightBot is the name of Bright Data’s crawler layer, which monitors the health of every domain it targets and enforces ethical usage. This crawler is the technology that prevents access to non-public information and blocks interactive endpoints that could be abused, such as ad clicks, reviews, likes, and account management. After you join the Bright Data Webmaster Console and submit requests via a collectors.txt file, it is BrightBot that enforces ethical data collection from your website as approved by Bright Data.
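For instance, such interactive endpoints can be listed with disallow directives so that BrightBot blocks them; the paths below are hypothetical and only illustrate the documented syntax:

// hypothetical interactive endpoints to block
disallow: /like
disallow: /reviews/new
disallow: /account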

Examples & Format

collectors.txt

                  
ignore: robots.txt
pii: /personal_info_1
pii: /personal_info_2

// endpoints containing information related to an identified or identifiable natural person.

disallow: /disallow_1
disallow: /disallow_2

// list interactive endpoint patterns such as ad links, likes, reviews, and posts.

load: example.com:100:min
load: /endpoint_1:4500:day
load: /endpoint_2:20000:month

// organic traffic per domain or sub-domain per timeframe, as reported by the webmaster, to be considered by BrightBot