A Streamlit-based web application that demonstrates autonomous browser automation using LLMs (Large Language Models). The project leverages either Gemini Flash 2.0 or GPT-4 Turbo to perform browser-based tasks through natural language instructions.
- 🤖 Autonomous browser automation using natural language commands
- 🔄 Real-time visualization of browser actions through GIF generation
- 🎯 Step-by-step progress tracking with detailed goal evaluation
- 🔀 Support for both Google's Gemini Flash 2.0 and OpenAI's GPT-4 Turbo
- 📊 Debug view for detailed execution information
- 🎨 Clean and intuitive Streamlit interface
- Python 3.12 or higher
- Google Cloud credentials (for Gemini) or OpenAI API key (for GPT-4)
- Environment variables properly configured
- uv package manager (install with: curl -LsSf https://astral.sh/uv/install.sh | sh)
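The first and last prerequisites above can be checked with a short script before installing anything. This is a minimal sketch and not part of the project: the helper name `check_prerequisites` and its messages are hypothetical, and it only verifies the Python version and that a `uv` executable is on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems (empty if none are found).

    Hypothetical helper for illustration; not part of this project.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # Compare only (major, minor) against the required minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.12 or higher is required, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment. Credentials are not covered by this check.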
- Clone the repository:
git clone [repository-url]
cd browser-use-project
- Create and activate a virtual environment with uv:
uv venv
source .venv/bin/activate
- Install the dependencies declared in pyproject.toml using uv:
uv sync
- Set up environment variables:
  - Copy .env.example to .env
  - Fill in your API keys and credentials
cp .env.example .env
Create a .env file with the following variables:
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json # For Gemini
OPENAI_API_KEY=your-openai-key # For GPT-4
LLM_TYPE=gemini # or 'openai'
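The variables above suggest that only the credential matching LLM_TYPE is needed at runtime. A minimal sketch of that check follows. The helper name `required_credential` and its error messages are assumptions for illustration, and the project's own startup code may handle this differently.

```python
import os

# Credential variable expected for each LLM_TYPE value listed above.
CREDENTIAL_FOR = {
    "gemini": "GOOGLE_APPLICATION_CREDENTIALS",
    "openai": "OPENAI_API_KEY",
}


def required_credential(env=None):
    """Return (llm_type, variable_name), or raise ValueError if misconfigured.

    Hypothetical helper for illustration; not part of this project.
    """
    if env is None:
        env = os.environ
    llm_type = env.get("LLM_TYPE", "").strip().lower()
    if llm_type not in CREDENTIAL_FOR:
        raise ValueError(f"LLM_TYPE must be 'gemini' or 'openai', got {llm_type!r}")
    variable = CREDENTIAL_FOR[llm_type]
    # An empty string counts as unset.
    if not env.get(variable):
        raise ValueError(f"{variable} must be set when LLM_TYPE={llm_type}")
    return llm_type, variable
```

Calling `required_credential()` with no argument reads from `os.environ`. Passing a plain dict makes the check easy to try out without touching the real environment.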
- Start the Streamlit application:
uv run streamlit run app.py
- Select your preferred model (Gemini Flash 2.0 or GPT-4 Turbo)
- Enter your task in natural language (e.g., "Go to Reddit, search for 'python', and get the first post title")
- Click "Run Task" and watch the agent perform the requested actions
Here's a GIF demonstrating the task execution process:
- app.py: Main Streamlit application
- example.py: Example usage and LLM configuration
- pyproject.toml: Project dependencies and configuration
- .env: Environment variables (not tracked in git)
- .env.example: Template for environment variables
- browser-use: Core browser automation library
- streamlit: Web interface
- playwright: Browser automation
- google-generativeai: Gemini API integration
- langchain: LLM framework integration
- Pillow: Image processing for GIF generation
