A comprehensive intelligent system for Samsung smartphone information, featuring web scraping, a conversational AI chatbot, a multi-agent system, and REST API integration.
- Web Scraping: Automated data collection from GSMArena for Samsung phone specifications
- RAG Chatbot: Conversational AI using free open-source models with retrieval-augmented generation
- Multi-Agent System: Specialized agents for phone specifications, comparisons, and review generation
- REST API: FastAPI-based endpoints for system interaction
- PostgreSQL Database: Structured storage for phone data and generated reviews
- Vector Database: ChromaDB integration for semantic search and RAG capabilities
- Web Scraping: BeautifulSoup4, aiohttp, Selenium
- Database: PostgreSQL with SQLAlchemy ORM
- Chatbot: Hugging Face Transformers, Sentence Transformers
- Multi-Agent: CrewAI framework
- API: FastAPI with Pydantic models
- Vector Store: ChromaDB
- Models: Free open-source models (no OpenAI API required)
- Clone the repository:
  git clone https://github.com/zenjahid/samsung-chatbot-api.git
  cd samsung-chatbot-api
- Install dependencies:
  pip install -r requirements.txt
- Set up PostgreSQL database:
  # Install PostgreSQL (Ubuntu/Debian)
  sudo apt-get install postgresql postgresql-contrib
  # Create database
  sudo -u postgres createdb samsung_phones
  sudo -u postgres createuser --superuser $USER
- Configure environment:
  cp .env.example .env
  # Edit .env with your database credentials
- Initialize the system:
  python -c "from src.database.connection import create_tables; create_tables()"
Start the API server:
  python main.py
The API will be available at http://localhost:8000
Visit http://localhost:8000/docs for interactive API documentation.
# Via API
curl -X POST "http://localhost:8000/scraper/run"
# Or directly
python src/scraper/gsmarena_scraper.py

# Chat with the bot
curl -X POST "http://localhost:8000/chat" \
-H "Content-Type: application/json" \
-d '{"message": "What are the camera specs of the Samsung Galaxy S23?"}'

# Get specifications
curl -X POST "http://localhost:8000/agents/specifications?phone_name=Galaxy%20S23"
# Compare phones
curl -X POST "http://localhost:8000/agents/compare" \
-H "Content-Type: application/json" \
-d '{"phone_names": ["Galaxy S23", "Galaxy S22"]}'
# Generate review
curl -X POST "http://localhost:8000/agents/review" \
-H "Content-Type: application/json" \
-d '{"phone_name": "Galaxy S23 Ultra"}'

- POST /chat - Chat with RAG-enabled bot
- GET /examples - Get example queries
- GET /phones - List all phones
- GET /phones/{phone_id} - Get phone by ID
- GET /phones/search/{phone_name} - Search phones
- POST /agents/specifications - Get detailed specs
- POST /agents/compare - Compare multiple phones
- POST /agents/review - Generate comprehensive review
- POST /scraper/run - Run web scraper
- GET /stats - System statistics
- GET /health - Health check
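
The curl calls above can also be issued from Python. The following is a minimal client sketch using only the standard library. The helper names (`build_chat_request`, `build_specifications_request`, `build_compare_request`, `send`) are hypothetical and not part of this project, and the response is assumed to be a JSON object. The two-phone check mirrors the "2 or more" wording in this README; whether the server enforces it is not confirmed here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # default address given in this README

JSON_HEADERS = {"Content-Type": "application/json"}


def build_chat_request(message: str) -> Request:
    """Build a POST /chat request carrying the message as a JSON body."""
    body = json.dumps({"message": message}).encode("utf-8")
    return Request(f"{BASE_URL}/chat", data=body, headers=JSON_HEADERS, method="POST")


def build_specifications_request(phone_name: str) -> Request:
    """Build a POST /agents/specifications request.

    The phone name travels in the query string, so it is percent-encoded
    (a space becomes %20).
    """
    url = f"{BASE_URL}/agents/specifications?phone_name={quote(phone_name)}"
    return Request(url, method="POST")


def build_compare_request(phone_names: list[str]) -> Request:
    """Build a POST /agents/compare request for two or more phones."""
    if len(phone_names) < 2:
        # Client-side guard only; an assumption based on the README wording.
        raise ValueError("compare needs at least two phone names")
    body = json.dumps({"phone_names": phone_names}).encode("utf-8")
    return Request(f"{BASE_URL}/agents/compare", data=body, headers=JSON_HEADERS, method="POST")


def send(request: Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the response body as JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("What are the camera specs of the Samsung Galaxy S23?"))` would perform the same call as the chat example above. Building the request separately from sending it keeps the helpers easy to check without a live server.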
src/
├── config.py                     # Configuration settings
├── api/
│   └── main.py                   # FastAPI application
├── agents/
│   └── multi_agent_system.py     # CrewAI multi-agent system
├── chatbot/
│   └── rag_chatbot.py            # RAG chatbot implementation
├── database/
│   ├── connection.py             # Database connection
│   └── models.py                 # SQLAlchemy models
└── scraper/
    └── gsmarena_scraper.py       # Web scraper for GSMArena
data/                             # Data storage
tests/                            # Test files
The system uses free, open-source models:
- Language Model: microsoft/DialoGPT-medium (conversation)
- Embedding Model: all-MiniLM-L6-v2 (text embeddings)
- Agent Model: microsoft/DialoGPT-small (agent coordination)
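
To illustrate how an embedding model and a language model fit together in retrieval-augmented generation, here is a rough sketch of the retrieval step. It is not this project's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model such as all-MiniLM-L6-v2, the in-memory list stands in for ChromaDB, and all function names and the prompt layout are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"
```

In a real pipeline the prompt returned by `build_prompt` would be passed to the language model, so that its answer is grounded in the retrieved phone data instead of relying only on what the model memorized.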
- "What are the camera specs of the Samsung Galaxy S23?"
- "Which Samsung phone has the best battery life?"
- "How does the Galaxy S23 compare to the S22 in terms of performance?"
- "What is the price of the Galaxy Z Fold 4?"
- Specifications: Get detailed specs for any Samsung phone
- Comparisons: Compare 2 or more Samsung phones
- Reviews: Generate comprehensive AI reviews
pytest tests/
black src/
isort src/
flake8 src/
- Follow the existing project structure
- Use type hints and docstrings
- Add appropriate error handling
- Update API documentation
Key configuration options in .env:
- DATABASE_URL: PostgreSQL connection string
- MODEL_NAME: Language model for chatbot
- EMBEDDING_MODEL: Model for text embeddings
- SCRAPING_DELAY: Delay between web scraping requests
- CHROMA_DB_PATH: ChromaDB storage path
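
As a sketch of how these options might be read at startup (the project's own src/config.py may differ), the following loads them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical. The model defaults are the names listed earlier in this README; the default delay of 1.0, its unit (seconds), and the default ChromaDB path are assumptions for illustration only. The sketch reads variables already present in the process environment; loading the .env file itself would need a separate step.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    model_name: str
    embedding_model: str
    scraping_delay: float
    chroma_db_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables.

    DATABASE_URL is treated as required (KeyError if missing); the other
    options fall back to defaults.
    """
    env = os.environ if env is None else env
    return Settings(
        database_url=env["DATABASE_URL"],
        model_name=env.get("MODEL_NAME", "microsoft/DialoGPT-medium"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        # Assumed default and unit (seconds); not taken from the project.
        scraping_delay=float(env.get("SCRAPING_DELAY", "1.0")),
        # Assumed default path; not taken from the project.
        chroma_db_path=env.get("CHROMA_DB_PATH", "./data/chroma"),
    )
```

Accepting an optional mapping instead of always reading `os.environ` makes the loader easy to exercise in tests with a plain dictionary.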
- Database Connection Error:
  - Ensure PostgreSQL is running
  - Check database credentials in .env
- Model Loading Issues:
  - Ensure sufficient disk space for model cache
  - Check internet connection for model downloads
- Web Scraping Failures:
  - Check GSMArena website accessibility
  - Adjust scraping delays if needed
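
One common way to handle intermittent scraping failures is to retry with a growing delay between attempts. The sketch below shows that pattern in isolation; it is not taken from this project's scraper, and `fetch_with_backoff` and its parameters are hypothetical names. The `fetch` argument is any zero-argument callable that performs one request and raises on failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    base_delay: float = 1.0,
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on failure with a delay that doubles each time.

    The last failure is re-raised once max_attempts is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)  # wait before the next attempt
            delay *= 2  # back off: 1x, 2x, 4x, ... of base_delay
    raise ValueError("max_attempts must be at least 1")
```

Starting `base_delay` from the configured scraping delay would keep retries at least as polite as normal requests. The `sleep` parameter exists so the waiting behaviour can be replaced in tests.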
Check logs for detailed error information:
tail -f logs/app.log
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
This project is for educational and research purposes. Please respect GSMArena's terms of service when scraping data.