🚀 Features
The core logic, defined in `src/agent/graph.py`, orchestrates a sophisticated search workflow that:
- 🧠 Intent Classification: Uses Gemini 2.0 Flash to classify queries into four categories:
  - `general_search`: News, facts, definitions, explanations
  - `product_search`: Shopping, prices, reviews, recommendations
  - `web_scraping`: Data extraction from specific websites
  - `comparison`: Comparing multiple items or services
- 🔍 Multi-Modal Search:
- Google Search: Via Bright Data’s MCP search engine for general queries
- Web Scraping: Using Bright Data’s Web Unlocker for targeted data extraction
- Smart Routing: Automatically chooses the best search strategy based on intent
- 📊 Result Processing:
- Sanitizes and deduplicates results
- Scores results on relevance and quality
- Returns configurable top N results with confidence scores
- Provides query summaries
- 🛡️ Error Handling: Graceful fallbacks and comprehensive error management
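A minimal sketch of what the classifier's decision might look like as a data structure. The type and field names below are illustrative, not the project's actual schema (which lives in `src/agent/nodes.py`):

```python
from dataclasses import dataclass
from typing import Literal

# The four intent categories described above.
Intent = Literal["general_search", "product_search", "web_scraping", "comparison"]

@dataclass
class IntentResult:
    """Hypothetical container for the classifier's decision."""
    intent: Intent
    confidence: float  # 0.0-1.0, as surfaced in the final result payload

# Example: a shopping query classified as a product search.
result = IntentResult(intent="product_search", confidence=0.93)
print(result.intent)  # product_search
```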
🏗️ Architecture
The agent follows a graph-based workflow:
```
START → Intent Classifier → [Google Search | Web Unlocker] → Final Processing → END
```
<div align="center">
  <img src="https://github.com/user-attachments/assets/1fba5659-1ba9-4970-bcda-949465c96872" alt="2025-06-26_15h18_46">
</div>
Routing Logic:
- URLs in query → Direct to Web Unlocker
- `general_search` → Google Search only
- `product_search` → Google Search then Web Scraping
- `web_scraping` → Web Unlocker only
- `comparison` → Both search methods in parallel
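The routing rules above can be sketched as a pure function. This is an illustrative re-implementation, not the project's actual code (the real logic lives in `src/agent/graph.py`), and the node names are hypothetical:

```python
import re

def route_query(intent: str, query: str) -> list[str]:
    """Illustrative sketch of the routing rules; node names are hypothetical."""
    # URLs in the query always go straight to the Web Unlocker.
    if re.search(r"https?://\S+", query):
        return ["web_unlocker"]
    routes = {
        "general_search": ["google_search"],
        "product_search": ["google_search", "web_unlocker"],  # search, then scrape
        "web_scraping": ["web_unlocker"],
        "comparison": ["google_search", "web_unlocker"],      # run in parallel
    }
    return routes.get(intent, ["google_search"])  # safe fallback

print(route_query("general_search", "latest AI news"))  # ['google_search']
print(route_query("general_search", "get prices from https://example.com"))  # ['web_unlocker']
```

In LangGraph itself, a function like this would typically back a conditional edge from the intent-classifier node.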
Tech Stack
- LangGraph
- Gemini 2.0 Flash
- Bright Data MCP
- Pydantic
- LangGraph Studio
Getting Started
<!--
Setup instructions auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
-->
<!--
End setup instructions
-->
- Install dependencies along with the LangGraph CLI:

  ```bash
  cd unified-search-agent
  pip install -e . "langgraph-cli[inmem]"
  ```
- Set up environment variables. Create a `.env` file with your API keys:

  ```bash
  cp .env.example .env
  ```

  Add your API keys to the `.env` file:

  ```bash
  # Required
  GOOGLE_API_KEY=your_gemini_api_key_here
  BRIGHT_DATA_API_TOKEN=your_bright_data_token_here

  # Optional zones (defaults provided)
  WEB_UNLOCKER_ZONE=unblocker
  BROWSER_ZONE=scraping_browser

  # Optional - for LangSmith tracing
  LANGSMITH_API_KEY=lsv2...
  ```
- Start the LangGraph Server:

  ```bash
  langgraph dev
  ```
- Open LangGraph Studio at the URL provided (typically https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024)
For more information on getting started with LangGraph Server, see the LangGraph Server documentation.
📝 Usage Examples
Basic Search
```json
{
  "query": "Who is Or Lenchner",
  "max_results": 3
}
```
Product Search
```json
{
  "query": "best laptops under $1000",
  "max_results": 5
}
```
Web Scraping
```json
{
  "query": "extract contact info from https://example.com",
  "max_results": 10
}
```
Comparison Query
```json
{
  "query": "iPhone 15 vs Samsung Galaxy S24 comparison",
  "max_results": 5
}
```
🎛️ Configuration
The agent supports several configurable parameters:
- `max_results`: Number of final results to return (default: 5)
- Query-specific routing: URLs in queries automatically trigger web scraping
- Search strategies: Automatically determined by intent classification
How to Customize
- Modify Intent Classification: Update the categories and examples in `intent_classifier_node()` in `src/agent/nodes.py`
- Adjust Search Strategies: Modify the routing logic in `src/agent/graph.py` to change how different intents are handled
- Customize Result Scoring: Update the scoring criteria in `final_processing_node()` to change how results are ranked
- Add New Search Sources: Extend the graph with additional search nodes for other data sources
- Configure Parameters: Modify the `Configuration` class in `graph.py` to expose additional runtime parameters
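As an illustration of exposing runtime parameters, a configuration schema for this kind of agent might look like the sketch below. Only `max_results` mirrors a documented parameter; `min_confidence` is hypothetical, and the actual `Configuration` class in `graph.py` may be structured differently:

```python
from dataclasses import dataclass

@dataclass
class Configuration:
    """Sketch of a runtime-configuration schema."""
    max_results: int = 5          # documented default
    min_confidence: float = 0.5   # hypothetical: drop low-confidence results

# Override a default at runtime.
cfg = Configuration(max_results=10)
print(cfg.max_results)  # 10
```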
🛠️ Development
While iterating on your graph in LangGraph Studio, you can:
- Edit past state and rerun from previous states to debug specific nodes
- Hot reload – local changes are automatically applied
- Create new threads using the `+` button to clear previous history
- Visual debugging – see the exact flow and state at each step
The graph structure allows for easy debugging of:
- Intent classification accuracy
- Search result quality
- Routing decisions
- Final result scoring
📊 Result Format
The agent returns structured results with comprehensive scoring:
```json
{
  "final_results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "snippet": "Relevant description...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.92,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp",
        "query": "original query"
      }
    }
  ],
  "query_summary": "Found information about...",
  "total_processed": 8,
  "intent": "general_search",
  "intent_confidence": 0.95
}
```
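One plausible way to combine the two component scores into `final_score` is a weighted blend. This is purely illustrative; the actual weighting used by `final_processing_node()` may differ:

```python
def final_score(relevance: float, quality: float, w_relevance: float = 0.6) -> float:
    """Illustrative weighted blend of relevance and quality scores."""
    return round(w_relevance * relevance + (1 - w_relevance) * quality, 2)

# With a 60/40 weighting, relevance 0.95 and quality 0.88 blend to 0.92.
print(final_score(0.95, 0.88))  # 0.92
```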
🔧 Advanced Features
- Parallel Processing: Comparison queries execute both search methods simultaneously
- Intelligent Fallbacks: Graceful error handling with default responses
- Duplicate Detection: Automatic deduplication of results across sources
- URL Validation: Filters out invalid or empty URLs
- Content Sanitization: Cleans and validates all text content
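The deduplication and URL-validation steps can be sketched as a small pure function. This is a minimal sketch assuming results are dicts with a `url` key; the real pipeline's field names and rules may differ:

```python
from urllib.parse import urlparse

def dedupe_and_validate(results: list[dict]) -> list[dict]:
    """Sketch: drop invalid/empty URLs, then deduplicate across sources."""
    seen: set[str] = set()
    cleaned = []
    for r in results:
        url = (r.get("url") or "").strip()
        parsed = urlparse(url)
        # Keep only well-formed http(s) URLs not seen before.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in seen:
            seen.add(url)
            cleaned.append(r)
    return cleaned

hits = [
    {"url": "https://example.com", "title": "A"},
    {"url": "https://example.com", "title": "duplicate"},
    {"url": "", "title": "empty URL"},
]
print(len(dedupe_and_validate(hits)))  # 1
```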
For more advanced features and examples, refer to the LangGraph documentation.
LangGraph Studio integrates with LangSmith for in-depth tracing and team collaboration, allowing you to analyze and optimize your search agent’s performance.
📋 Dependencies
- `langgraph>=0.2.6`: Core orchestration framework
- `langchain-google-genai`: Gemini integration for LLM operations
- `pydantic>=2.0.0`: Data validation and parsing
- `mcp-use`: MCP client for Bright Data integration
- `langchain-core`: Core LangChain utilities
- `python-dotenv>=1.0.1`: Environment variable management
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test with LangGraph Studio
- Submit a pull request
📄 License
This project is licensed under the MIT License – see the LICENSE file for details.
<!--
Configuration auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
{
  "config_schemas": {
    "agent": {
      "type": "object",
      "properties": {
        "max_results": {
          "type": "integer",
          "description": "Maximum number of final results to return",
          "default": 5
        }
      }
    }
  }
}
-->