The LocalLab Chat Interface is a powerful terminal-based tool that gives you a ChatGPT-like experience right in your command line. It's the easiest and most intuitive way to interact with your AI models.
- 🚀 Instant Access - No coding required, just type and chat
- 💬 Natural Conversations - ChatGPT-like experience in your terminal
- 🎨 Rich Formatting - Markdown rendering with syntax highlighting
- ⚡ Real-time Responses - See AI responses as they're generated
- 🔄 Smart Features - History, saving, batch processing, and more
- 🌐 Works Everywhere - Local, remote, or Google Colab
```
# 1. Start your server (if not already running)
locallab start
# 2. Open chat interface
locallab chat
# 3. Start chatting!
You: Hello! Can you help me with Python?
AI: Hello! I'd be happy to help you with Python programming...
```

```bash
# Connect to remote server
locallab chat --url https://your-ngrok-url.app
# Use specific generation mode
locallab chat --generate chat
# Customize generation parameters
locallab chat --max-tokens 200 --temperature 0.8 --top-p 0.9
```

| Mode | Description | Best For | Usage |
|---|---|---|---|
| Stream | Real-time response streaming | Interactive conversations | `--generate stream` |
| Chat | Conversational with context | Multi-turn discussions | `--generate chat` |
| Simple | Single-shot generation | Quick queries | `--generate simple` |
| Batch | Multiple prompt processing | Bulk operations | `--generate batch` |
Override the default mode for any message:
```
You: Write a story --stream # Use streaming mode
You: Remember my name is Alice --chat # Use chat mode with context
You: What's 2+2? --simple # Use simple mode
You: Process these --batch # Use batch mode
```

- 🎨 Rich Markdown Rendering - Full markdown support with syntax highlighting
- 💻 Code Highlighting - 40+ programming languages supported
- 📊 Progress Indicators - Visual progress bars for batch operations
- 🎮 Interactive Commands - Built-in commands for session management
- 📱 Responsive Design - Adapts to terminal size and capabilities
- 🔄 Auto-reconnection - Automatic server reconnection with retries
- 💾 Graceful Shutdown - Clean exit with conversation save prompts
- 📡 Connection Monitoring - Real-time health checks and status monitoring
- ⚡ Resource Management - Efficient memory and connection management
| Option | Short | Description | Default |
|---|---|---|---|
| `--url` | `-u` | LocalLab server URL | `http://localhost:8000` |
| `--generate` | `-g` | Generation mode | `stream` |
| `--verbose` | `-v` | Enable verbose output | `False` |
| Option | Short | Description | Default |
|---|---|---|---|
| `--max-tokens` | `-m` | Maximum tokens to generate | `8192` |
| `--temperature` | `-t` | Temperature for generation | `0.7` |
| `--top-p` | `-p` | Top-p for nucleus sampling | `0.9` |
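As a rough guide, lower temperature with moderate top-p favors focused, repeatable answers, while higher values produce more varied output. The values below are illustrative, not recommendations:

```bash
# More deterministic settings for short factual queries (illustrative values)
locallab chat --generate simple --max-tokens 512 --temperature 0.2 --top-p 0.9
```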
Real-time streaming responses with live text generation.
```bash
locallab chat --generate stream
```

Features:
- Live text streaming
- Real-time response display
- Immediate feedback
- Optimal for interactive conversations
Single-shot text generation without streaming.
```bash
locallab chat --generate simple
```

Features:
- Complete response at once
- Lower resource usage
- Suitable for quick queries
- Faster for short responses
Conversational mode with context retention and history.
```bash
locallab chat --generate chat
```

Features:
- Multi-turn conversations
- Context preservation
- Conversation history
- Memory of previous exchanges
Process multiple prompts efficiently in batches.
```bash
locallab chat --generate batch
```

Features:
- Multiple prompt processing
- Progress tracking
- Efficient resource usage
- Bulk text generation
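Batch mode pairs naturally with input redirection, as shown in the scripting examples later in this document. A quick sketch, where `prompts.txt` is an illustrative file with one prompt per line:

```bash
printf 'Explain quantum computing\nWhat is machine learning?\n' > prompts.txt
locallab chat --generate batch < prompts.txt
```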
- `/exit`, `/quit`, `/bye`, `/goodbye` - Exit the chat gracefully
- `/clear` - Clear the terminal screen
- `/help` - Show available commands
- `/history` - Display conversation history
- `/reset` - Reset conversation history
- `/stats` - Show conversation statistics
- `/save` - Save conversation to file
- `/load` - Load conversation from file
- `/batch` - Enter interactive batch mode
You can override the default generation mode for individual messages using inline mode switches. Simply append the desired mode to your message:
```
Your message --[mode]
```
- `--stream` - Stream response in real-time
- `--chat` - Use conversational mode with context
- `--batch` - Process as single batch item
- `--simple` - Simple text generation
```
# Override to stream mode for one message
You: Explain quantum physics --stream
🔄 Using stream mode for this message
AI: [Streaming response...]
# Override to chat mode
You: Remember my name is Alice --chat
🔄 Using chat mode for this message
AI: I'll remember that your name is Alice.
# Override to simple mode
You: What's 2+2? --simple
🔄 Using simple mode for this message
AI: 4
# Invalid mode shows error
You: Hello --invalid
❌ Invalid mode: --invalid. Valid modes: --stream, --chat, --batch, --simple
```

- Per-message overrides: Change mode for specific messages without affecting the default
- Case insensitive: `--STREAM`, `--Stream`, and `--stream` all work
- Backward compatible: Existing CLI options continue to work as default mode setters
- Error handling: Clear error messages for invalid mode specifications
- Visual feedback: Mode override notifications show which mode is being used
```
$ locallab chat
🚀 LocalLab Chat Interface
Connected to: http://localhost:8000
Server: LocalLab v0.9.0 | Model: qwen-0.5b
You: Hello! How are you today?
AI: Hello! I'm doing well, thank you for asking. I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?
You: /exit
👋 Goodbye!
```

```
$ locallab chat --url https://abc123.ngrok.io
🚀 LocalLab Chat Interface
🔗 Connecting to remote server...
✅ Connected to: https://abc123.ngrok.io
Server: LocalLab v0.9.0 | Model: qwen-7b
You: What's the weather like?
AI: I don't have access to real-time weather data...
```

```
$ locallab chat --generate chat
🚀 LocalLab Chat Interface - Chat Mode
💬 Context retention enabled
You: My name is Alice
AI: Nice to meet you, Alice! How can I help you today?
You: What's my name?
AI: Your name is Alice, as you just told me.
You: /stats
📊 Conversation Statistics:
- Total messages: 4
- User messages: 2
- Assistant messages: 2
- Estimated tokens: ~150
```

```
$ locallab chat --generate batch
🚀 LocalLab Chat Interface - Batch Mode
You: /batch
📝 Enter prompts (one per line, empty line to finish):
> Explain quantum computing
> What is machine learning?
> Define artificial intelligence
>
🔄 Processing 3 prompts...
[████████████████████████████████] 100% Complete
Results:
1. Quantum computing is a revolutionary computing paradigm...
2. Machine learning is a subset of artificial intelligence...
3. Artificial intelligence (AI) refers to the simulation...
```

```
$ locallab chat --max-tokens 200 --temperature 0.9 --top-p 0.95
🚀 LocalLab Chat Interface
⚙️ Generation Settings:
- Max tokens: 200
- Temperature: 0.9
- Top-p: 0.95
You: Write a creative story
AI: [More creative and varied response due to higher temperature]
```

The chat interface automatically handles various connection scenarios:
```
❌ Connection failed: Server not responding
🔄 Attempting to reconnect... (1/3)
✅ Reconnected successfully!
```

```
⚠️ Connection lost - attempting reconnection...
🔄 Reconnecting... (2/3)
✅ Connection restored!
```

```
You: /exit
🛑 Initiating graceful shutdown...
💾 Save conversation before exiting? [y/N]: y
📁 Conversation saved to: chat_2024-07-06_14-30-15.json
👋 Goodbye!
```
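When scripting against the server, the reconnection behavior shown above can be approximated by polling the documented `/health` endpoint. A minimal sketch, assuming the default local URL and the retry defaults listed in the environment variables below:

```bash
# Illustrative sketch: poll /health up to 3 times,
# waiting 2 seconds between attempts
for attempt in 1 2 3; do
  if curl -sf http://localhost:8000/health > /dev/null; then
    echo "✅ Server is healthy"
    break
  fi
  echo "🔄 Attempt $attempt failed, retrying in 2s..."
  sleep 2
done
```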
```bash
# Default server URL
export LOCALLAB_URL="http://localhost:8000"
# Default generation parameters
export LOCALLAB_MAX_TOKENS=4096
export LOCALLAB_TEMPERATURE=0.7
export LOCALLAB_TOP_P=0.9
```

```bash
# Connection timeout (seconds)
export LOCALLAB_TIMEOUT=30
# Reconnection attempts
export LOCALLAB_MAX_RETRIES=3
# Retry delay (seconds)
export LOCALLAB_RETRY_DELAY=2
```
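Assuming the chat client reads these variables at startup, they can replace repeated command-line flags; for example:

```bash
# Set once per shell session, then launch without repeating flags
export LOCALLAB_URL="https://abc123.ngrok.io"
export LOCALLAB_TIMEOUT=60
locallab chat
```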
**Server Not Found**

```
❌ Error: Could not connect to LocalLab server
💡 Make sure the LocalLab server is running and accessible.
```

Solution: Start the LocalLab server first:
```bash
locallab start
```
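If the server claims to be running, you can also probe the documented `/health` endpoint directly to rule out network issues:

```bash
curl -s http://localhost:8000/health
```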
**Connection Timeout**

```
❌ Timeout Error: Connection or operation timed out
💡 Try increasing the timeout or check your network connection.
```

Solution: Use a longer timeout:
```bash
locallab chat --timeout 60
```

**Model Not Loaded**
```
⚠️ Warning: No model currently loaded
💡 Load a model first using the LocalLab interface
```

Solution: Load a model through the web interface or API.
Enable verbose output for debugging:
```bash
locallab chat --verbose
```

This provides detailed logging of:
- Connection attempts
- API requests/responses
- Error details
- Performance metrics
The chat interface can be used in scripts with input redirection:
```bash
# Pipe a single prompt
echo "Hello world" | locallab chat --generate simple
# Batch process from file
locallab chat --generate batch < prompts.txt
```

```bash
# Pipe output to other commands
echo "Summarize this text" | locallab chat --generate simple | tee summary.txt
# Use with curl for remote processing
curl -s https://api.example.com/data | locallab chat --generate simple
```
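For per-prompt control beyond batch redirection, a small shell loop works too. A sketch, where `prompts.txt` and `responses.txt` are illustrative file names:

```bash
# Send each line of prompts.txt as its own simple-mode request
while IFS= read -r prompt; do
  printf '%s\n' "$prompt" | locallab chat --generate simple
done < prompts.txt > responses.txt
```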
The chat interface is compatible with LocalLab server endpoints:

- `/generate` - Text generation
- `/chat` - Chat completions
- `/generate/batch` - Batch processing
- `/health` - Health checks
- `/system/info` - Server information
- `/models/current` - Model information
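These endpoints can also be called directly from scripts. A minimal sketch against `/generate` follows; the JSON field names here are assumptions, so check your server's API documentation for the exact schema:

```bash
curl -s http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!", "max_tokens": 100}'
```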
1. Use appropriate generation modes:
   - Stream for interactive chat
   - Simple for quick queries
   - Batch for multiple prompts
2. Optimize parameters:
   - Lower `max_tokens` for faster responses
   - Adjust temperature based on use case
   - Use appropriate top-p values
3. Connection management:
   - Keep connections alive for multiple requests
   - Use local servers when possible
   - Monitor connection health
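Putting these together, a low-latency interactive setup might look like this (values are illustrative):

```bash
locallab chat --generate stream --max-tokens 256 --temperature 0.7
```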
- Always use HTTPS for remote connections
- Validate server certificates
- Avoid sending sensitive data over unencrypted connections
- Use authentication when available
- Monitor for unusual connection patterns