Complete overview of the LLM Playground codebase.
```
llm_playground/
│
├── README.md                 # Main entry point, setup instructions
├── CONCEPTS.md               # Deep dive into LLM theory
├── LEARNING_OUTCOMES.md      # What you'll learn + follow-ups
├── TUTORIAL.md               # Step-by-step guided experiments
│
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
├── .gitignore                # Git ignore patterns
├── config.py                 # Central configuration
├── logger.py                 # Structured logging system
│
├── setup.sh                  # Automated setup script
├── example.py                # Quick verification script
├── app.py                    # Streamlit web interface ⭐
├── cli.py                    # Command-line interface
│
├── models/                   # Model abstraction layer
│   ├── __init__.py           # Factory and exports
│   ├── base.py               # BaseModel interface
│   ├── ollama_model.py       # Ollama implementation ⭐
│   └── openai_model.py       # OpenAI implementation (optional)
│
├── experiments/              # Experiment implementations
│   ├── __init__.py
│   ├── zero_shot.py          # Zero-shot prompting
│   ├── few_shot.py           # Few-shot learning
│   ├── sampling_params.py    # Temperature/top-p experiments
│   ├── context_window.py     # Context length testing
│   └── prompt_sensitivity.py # Prompt variation analysis
│
└── logs/                     # Generated logs (auto-created)
    ├── interactions_20251218_120000.jsonl
    └── interactions_20251218_130000.jsonl
```
Design principles:
- Modularity: Each component has a single responsibility
- Extensibility: Easy to add new models or experiments
- Simplicity: Beginner-friendly code, no over-engineering
- Observability: Everything is logged for analysis
The models/ package provides a unified interface to different LLM providers.
```python
# All models implement this interface
class BaseModel:
    def generate(self, prompt, temperature, max_tokens, top_p) -> ModelResponse: ...
    def count_tokens(self, text) -> int: ...
```

Benefits:
- Swap models without changing experiment code
- Add new providers easily
- Consistent response format
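The fields of `ModelResponse` are not spelled out in the interface above; judging from the logged-data example later in this document, a minimal sketch might look like the following (field names beyond `text` are assumptions):

```python
# models/base.py (sketch) -- field names are assumptions inferred from
# the logged-data example below; the real class may differ
from dataclasses import dataclass

@dataclass
class ModelResponse:
    text: str               # generated completion
    prompt_tokens: int      # tokens consumed by the prompt
    completion_tokens: int  # tokens in the completion
    latency_ms: float       # wall-clock generation time
    cost_usd: float = 0.0   # stays 0.0 for local models

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```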
Example:
```python
# Same code works for any provider
model = get_model("ollama", "llama2")
# or
model = get_model("openai", "gpt-4")

# Both work identically
response = model.generate("Hello")
```

The logger.py module captures all interactions for analysis.
Features:
- Structured logging (JSON or CSV)
- Automatic metrics collection
- Cost tracking
- Experiment categorization
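Putting these features together, here is a minimal sketch of how logger.py could write JSONL records of the shape shown below. This is illustrative only; the real module may differ:

```python
# logger.py (sketch) -- illustrative only, assumes JSONL output
import json
from datetime import datetime, timezone
from pathlib import Path

class Logger:
    def __init__(self, log_dir: str = "logs"):
        Path(log_dir).mkdir(exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.path = Path(log_dir) / f"interactions_{stamp}.jsonl"

    def log_interaction(self, prompt, response, parameters, experiment_type, notes=""):
        # Build one structured record per call
        # (the real logger also records the model name)
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            "prompt": prompt,
            "response": response.text,
            "parameters": parameters,
            "metrics": {
                "prompt_tokens": response.prompt_tokens,
                "completion_tokens": response.completion_tokens,
                "latency_ms": response.latency_ms,
                "cost_usd": response.cost_usd,
            },
            "experiment_type": experiment_type,
            "notes": notes,
        }
        # Append as one JSON object per line
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")
```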
Logged Data:
```
{
  "timestamp": "2025-12-18T10:30:45.123Z",
  "model": "ollama:llama2",
  "prompt": "What is AI?",
  "response": "AI is...",
  "parameters": {"temperature": 0.7, ...},
  "metrics": {
    "prompt_tokens": 5,
    "completion_tokens": 87,
    "latency_ms": 1234,
    "cost_usd": 0.0
  },
  "experiment_type": "zero_shot",
  "notes": "Additional context"
}
```

The experiments/ package provides reusable experiment implementations.
Design Pattern:
```python
def run_experiment(model, params, logger):
    # 1. Run generation
    response = model.generate(...)
    # 2. Log interaction
    logger.log_interaction(...)
    # 3. Return results
    return response
```

Available Experiments:
- Zero-shot: No examples
- Few-shot: With examples
- Temperature: Parameter tuning
- Context window: Length testing
- Prompt sensitivity: Variation analysis
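As a concrete instance of the pattern above, here is a hedged sketch of what a temperature sweep in sampling_params.py might look like (illustrative; the actual module may differ):

```python
# experiments/sampling_params.py (sketch) -- illustrative, not the actual code
def run_temperature_sweep(model, logger, prompt, temperatures=(0.0, 0.7, 1.2)):
    """Run the same prompt at several temperatures and log each call."""
    results = []
    for temp in temperatures:
        # 1. Run generation
        response = model.generate(prompt, temperature=temp, max_tokens=256, top_p=1.0)
        # 2. Log interaction
        logger.log_interaction(
            prompt=prompt,
            response=response,
            parameters={"temperature": temp, "max_tokens": 256, "top_p": 1.0},
            experiment_type="sampling_params",
        )
        # 3. Collect results for side-by-side comparison
        results.append((temp, response.text))
    return results
```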
app.py provides:
- Interactive web UI
- Visual parameter controls
- Real-time results
- Built-in tutorials
cli.py provides:
- Fast terminal access
- Scripting support
- Batch processing
- Automation-friendly
Request flow:

```
User Input
    ↓
app.py or cli.py
    ↓
get_model(provider, name)
    ↓
model.generate(prompt, params)
    ↓
[API Call to Ollama/OpenAI]
    ↓
ModelResponse(text, tokens, latency, ...)
    ↓
logger.log_interaction(...)
    ↓
[Save to logs/]
    ↓
Display to User
```
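In code, one pass through this pipeline looks roughly like the following sketch (parameter values are assumptions):

```python
# One pass through the request flow, end to end (sketch)
from models import get_model
from logger import Logger

logger = Logger()
model = get_model("ollama", "llama2")              # factory selects the provider
response = model.generate(                         # API call to Ollama
    "What is AI?", temperature=0.7, max_tokens=256, top_p=1.0
)
logger.log_interaction(                            # appended to logs/*.jsonl
    prompt="What is AI?",
    response=response,
    parameters={"temperature": 0.7, "max_tokens": 256, "top_p": 1.0},
    experiment_type="zero_shot",
)
print(response.text)                               # display to the user
```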
Experiment flow:

```
User selects experiment
    ↓
experiment.run_xxx(model, params, logger)
    ↓
Multiple model.generate() calls
    ↓
Each call logged separately
    ↓
Aggregate results
    ↓
Analysis & display
```
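Because every call is logged as JSONL, the aggregation step can be as simple as loading the log with pandas (already in requirements.txt). A sketch:

```python
# Analyze a log file with pandas (sketch; filename taken from the tree above)
import pandas as pd

df = pd.read_json("logs/interactions_20251218_120000.jsonl", lines=True)
metrics = pd.DataFrame(df["metrics"].tolist())   # flatten the nested metrics dict

print(df.groupby("experiment_type").size())      # runs per experiment type
print(metrics["latency_ms"].describe())          # latency distribution
```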
Why Ollama as the default provider? Reasons:
- ✅ Free - No API costs
- ✅ Private - Data stays local
- ✅ Fast - No network latency
- ✅ Educational - See models up close
Why Streamlit for the UI? Reasons:
- ✅ Rapid development - Build UI in pure Python
- ✅ Interactive - Great for experimentation
- ✅ Familiar - Popular in data science
- ✅ Easy to extend - Add features quickly
Why JSONL for logs? Reasons:
- ✅ Structured - Easy to parse and analyze
- ✅ Flexible - Can add fields without breaking
- ✅ Standard - Works with many tools
- ✅ Human-readable - Can inspect manually
Adding a new model provider. Steps:
1. Create `models/your_provider.py`
2. Inherit from `BaseModel`
3. Implement `generate()` and `count_tokens()`
4. Add to `models/__init__.py`
Example:
```python
# models/huggingface_model.py
from models.base import BaseModel, ModelResponse

class HuggingFaceModel(BaseModel):
    def generate(self, prompt, temperature, max_tokens, top_p):
        # Your implementation
        return ModelResponse(...)

    def count_tokens(self, text):
        # Your implementation
        return count
```

```python
# models/__init__.py
from models.huggingface_model import HuggingFaceModel

def get_model(provider, model_name, **kwargs):
    if provider == "huggingface":
        return HuggingFaceModel(model_name, **kwargs)
    # ...
```

Adding a new experiment. Steps:
1. Create `experiments/your_experiment.py`
2. Implement `run_your_experiment(model, logger, ...)`
3. Export from `experiments/__init__.py`
4. Add UI in `app.py`
Template:
```python
# experiments/your_experiment.py
from models.base import BaseModel
from logger import Logger

def run_your_experiment(
    model: BaseModel,
    logger: Logger,
    # Your parameters
) -> YourResultType:
    """
    Your experiment description.
    """
    # 1. Setup
    # 2. Run generation(s)
    response = model.generate(...)
    # 3. Log
    logger.log_interaction(
        prompt=...,
        response=response,
        parameters=...,
        experiment_type="your_experiment",
    )
    # 4. Return
    return results
```

To add the experiment's tab in app.py:
```python
def your_experiment_tab(params):
    """Your tab implementation."""
    st.header("Your Experiment")
    st.markdown("Description...")

    # Add your controls
    user_input = st.text_input("Input")

    if st.button("Run"):
        # Call your experiment
        result = run_your_experiment(...)
        # Display results
        st.write(result)

# Add to main tabs
tabs = st.tabs([..., "Your Experiment"])
with tabs[-1]:
    your_experiment_tab(params)
```

requirements.txt:

```
streamlit>=1.28.0       # Web UI framework
requests>=2.31.0        # HTTP client for Ollama
python-dotenv>=1.0.0    # Environment variables
openai>=1.0.0           # For OpenAI support
tiktoken>=0.5.0         # For exact token counting
pandas>=2.0.0           # For log analysis
matplotlib>=3.7.0       # For visualization
```
Basic Functionality:
- Connect to Ollama
- Generate simple response
- Change temperature and observe effect
- Run each experiment type
- Check logs are created
Error Handling:
- Ollama not running → Clear error message
- Model not installed → Helpful suggestion
- Invalid parameters → Validation error
- Network timeout → Graceful degradation
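As an illustration of the first two points, the Ollama client could map connection failures to actionable messages like this (a sketch, assuming Ollama's default local endpoint; not the actual ollama_model.py code):

```python
# Inside ollama_model.py (sketch) -- assumes Ollama's default local endpoint
import requests

payload = {"model": "llama2", "prompt": "Hello", "stream": False}
try:
    resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
except requests.exceptions.ConnectionError:
    # Turn a refused connection into an actionable message
    raise RuntimeError(
        "Could not reach Ollama at localhost:11434. Is it running? Try: ollama serve"
    )
except requests.exceptions.Timeout:
    raise RuntimeError("Ollama timed out; try a shorter prompt or lower max_tokens.")
```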
Performance:
- Response within reasonable time
- No memory leaks in long sessions
- Logs don't grow unbounded
```python
# tests/test_models.py
def test_ollama_generation():
    model = OllamaModel("llama2")
    response = model.generate("Hello", temperature=0.7)
    assert response.text is not None
    assert response.total_tokens > 0
```
```python
# tests/test_experiments.py
from unittest.mock import Mock

def test_zero_shot():
    # Mock out the model and logger so no backend is required
    mock_model, mock_logger = Mock(), Mock()
    result = run_zero_shot_experiment(mock_model, "test", mock_logger)
    assert result is not None
```

API keys and secrets:
- ✅ Use environment variables
- ✅ Never commit `.env` files
- ✅ Provide a `.env.example` template
Input validation:
- ✅ Sanitize prompts (no code injection)
- ✅ Validate parameters
- ✅ Limit request sizes
Privacy (local models):
- ✅ No external data sharing
- ✅ Full privacy
- ✅ No API key needed
Ollama (local):
- Latency: 50-200ms overhead + generation time
- Throughput: 5-20 tokens/second (depends on hardware)
- Memory: 4-8GB RAM per model
- Cost: Free
OpenAI (cloud):
- Latency: 100-500ms overhead + generation time
- Throughput: 50-100 tokens/second
- Memory: Minimal (cloud-based)
- Cost: ~$0.0005-0.03 per 1K tokens
Code style:
- One concept per file
- Clear function names
- Type hints where helpful
- Docstrings for public APIs
Configuration (see the sketch below):
- Centralize in `config.py`
- Use environment variables for secrets
- Provide sensible defaults
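A sketch of what such a centralized `config.py` might look like, using python-dotenv from requirements.txt (the variable names are illustrative assumptions, not the actual module):

```python
# config.py (sketch) -- variable names are illustrative assumptions
import os
from dotenv import load_dotenv

load_dotenv()  # read .env if present; real environment variables take precedence

DEFAULT_PROVIDER = os.getenv("DEFAULT_PROVIDER", "ollama")
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "llama2")
DEFAULT_TEMPERATURE = float(os.getenv("DEFAULT_TEMPERATURE", "0.7"))
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # optional; only needed for OpenAI
```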
Logging:
- Log every interaction
- Include enough context
- Use structured formats
- Rotate logs when large
Error handling:
- Catch specific exceptions
- Provide actionable error messages
- Don't swallow errors silently
- Log errors for debugging
Further reading:
- Streamlit docs: https://docs.streamlit.io
- Ollama API: https://github.com/ollama/ollama/blob/main/docs/api.md
- OpenAI API: https://platform.openai.com/docs
Related tools:
- LangChain: Full LLM framework
- LiteLLM: Unified API for many providers
- Haystack: NLP framework with LLM support
To extend this project:
1. Fork and clone
2. Create a branch: `git checkout -b feature/your-feature`
3. Make changes following the patterns above
4. Test thoroughly
5. Document in README and inline comments
6. Submit a PR with a clear description
This architecture prioritizes learning and experimentation over production robustness. It's designed to be understood, modified, and extended by beginners!