Complete overview of the LLM Playground codebase.
```
llm_playground/
│
├── README.md                 # Main entry point, setup instructions
├── CONCEPTS.md               # Deep dive into LLM theory
├── LEARNING_OUTCOMES.md      # What you'll learn + follow-ups
├── TUTORIAL.md               # Step-by-step guided experiments
│
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
├── .gitignore                # Git ignore patterns
├── config.py                 # Central configuration
├── logger.py                 # Structured logging system
│
├── setup.sh                  # Automated setup script
├── example.py                # Quick verification script
├── app.py                    # Streamlit web interface ⭐
├── cli.py                    # Command-line interface
│
├── models/                   # Model abstraction layer
│   ├── __init__.py           # Factory and exports
│   ├── base.py               # BaseModel interface
│   ├── ollama_model.py       # Ollama implementation ⭐
│   └── openai_model.py       # OpenAI implementation (optional)
│
├── experiments/              # Experiment implementations
│   ├── __init__.py
│   ├── zero_shot.py          # Zero-shot prompting
│   ├── few_shot.py           # Few-shot learning
│   ├── sampling_params.py    # Temperature/top-p experiments
│   ├── context_window.py     # Context length testing
│   └── prompt_sensitivity.py # Prompt variation analysis
│
└── logs/                     # Generated logs (auto-created)
    ├── interactions_20251218_120000.jsonl
    └── interactions_20251218_130000.jsonl
```
Design principles:
- Modularity: Each component has a single responsibility
- Extensibility: Easy to add new models or experiments
- Simplicity: Beginner-friendly code, no over-engineering
- Observability: Everything is logged for analysis
The models/ package provides a unified interface to different LLM providers.
```python
# All models implement this interface
class BaseModel:
    def generate(self, prompt, temperature, max_tokens, top_p) -> ModelResponse: ...
    def count_tokens(self, text) -> int: ...
```

Benefits:
- Swap models without changing experiment code
- Add new providers easily
- Consistent response format
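The fields of `ModelResponse` are not spelled out in the interface above; judging from the logged-data example later in this document, a minimal sketch might look like the following (field names beyond `text` are assumptions):

```python
# models/base.py (sketch) -- field names are assumptions inferred from
# the logged-data example below; the real class may differ
from dataclasses import dataclass

@dataclass
class ModelResponse:
    text: str               # generated completion
    prompt_tokens: int      # tokens consumed by the prompt
    completion_tokens: int  # tokens in the completion
    latency_ms: float       # wall-clock generation time
    cost_usd: float = 0.0   # stays 0.0 for local models

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```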
Example:
```python
# Same code works for any provider
model = get_model("ollama", "llama2")
# or
model = get_model("openai", "gpt-4")

# Both work identically
response = model.generate("Hello")
```

The logger.py module captures all interactions for analysis.
Features:
- Structured logging (JSON or CSV)
- Automatic metrics collection
- Cost tracking
- Experiment categorization
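Putting these features together, here is a minimal sketch of how logger.py could write JSONL records of the shape shown below. This is illustrative only; the real module may differ:

```python
# logger.py (sketch) -- illustrative only, assumes JSONL output
import json
from datetime import datetime, timezone
from pathlib import Path

class Logger:
    def __init__(self, log_dir: str = "logs"):
        Path(log_dir).mkdir(exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.path = Path(log_dir) / f"interactions_{stamp}.jsonl"

    def log_interaction(self, prompt, response, parameters, experiment_type, notes=""):
        # Build one structured record per call
        # (the real logger also records the model name)
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            "prompt": prompt,
            "response": response.text,
            "parameters": parameters,
            "metrics": {
                "prompt_tokens": response.prompt_tokens,
                "completion_tokens": response.completion_tokens,
                "latency_ms": response.latency_ms,
                "cost_usd": response.cost_usd,
            },
            "experiment_type": experiment_type,
            "notes": notes,
        }
        # Append as one JSON object per line
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")
```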
Logged Data:
```
{
  "timestamp": "2025-12-18T10:30:45.123Z",
  "model": "ollama:llama2",
  "prompt": "What is AI?",
  "response": "AI is...",
  "parameters": {"temperature": 0.7, ...},
  "metrics": {
    "prompt_tokens": 5,
    "completion_tokens": 87,
    "latency_ms": 1234,
    "cost_usd": 0.0
  },
  "experiment_type": "zero_shot",
  "notes": "Additional context"
}
```

The experiments/ package provides reusable experiment implementations.
Design Pattern:
```python
def run_experiment(model, params, logger):
    # 1. Run generation
    response = model.generate(...)
    # 2. Log interaction
    logger.log_interaction(...)
    # 3. Return results
    return response
```

Available Experiments:
- Zero-shot: No examples
- Few-shot: With examples
- Temperature: Parameter tuning
- Context window: Length testing
- Prompt sensitivity: Variation analysis
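As a concrete instance of the pattern above, here is a hedged sketch of what a temperature sweep in sampling_params.py might look like (illustrative; the actual module may differ):

```python
# experiments/sampling_params.py (sketch) -- illustrative, not the actual code
def run_temperature_sweep(model, logger, prompt, temperatures=(0.0, 0.7, 1.2)):
    """Run the same prompt at several temperatures and log each call."""
    results = []
    for temp in temperatures:
        # 1. Run generation
        response = model.generate(prompt, temperature=temp, max_tokens=256, top_p=1.0)
        # 2. Log interaction
        logger.log_interaction(
            prompt=prompt,
            response=response,
            parameters={"temperature": temp, "max_tokens": 256, "top_p": 1.0},
            experiment_type="sampling_params",
        )
        # 3. Collect results for side-by-side comparison
        results.append((temp, response.text))
    return results
```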
app.py provides:
- Interactive web UI
- Visual parameter controls
- Real-time results
- Built-in tutorials
cli.py provides:
- Fast terminal access
- Scripting support
- Batch processing
- Automation-friendly
Request flow:

```
User Input
    ↓
app.py or cli.py
    ↓
get_model(provider, name)
    ↓
model.generate(prompt, params)
    ↓
[API Call to Ollama/OpenAI]
    ↓
ModelResponse(text, tokens, latency, ...)
    ↓
logger.log_interaction(...)
    ↓
[Save to logs/]
    ↓
Display to User
```
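In code, one pass through this pipeline looks roughly like the following sketch (parameter values are assumptions):

```python
# One pass through the request flow, end to end (sketch)
from models import get_model
from logger import Logger

logger = Logger()
model = get_model("ollama", "llama2")              # factory selects the provider
response = model.generate(                         # API call to Ollama
    "What is AI?", temperature=0.7, max_tokens=256, top_p=1.0
)
logger.log_interaction(                            # appended to logs/*.jsonl
    prompt="What is AI?",
    response=response,
    parameters={"temperature": 0.7, "max_tokens": 256, "top_p": 1.0},
    experiment_type="zero_shot",
)
print(response.text)                               # display to the user
```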
Experiment flow:

```
User selects experiment
    ↓
experiment.run_xxx(model, params, logger)
    ↓
Multiple model.generate() calls
    ↓
Each call logged separately
    ↓
Aggregate results
    ↓
Analysis & display
```
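Because every call is logged as JSONL, the aggregation step can be as simple as loading the log with pandas (already in requirements.txt). A sketch:

```python
# Analyze a log file with pandas (sketch; filename taken from the tree above)
import pandas as pd

df = pd.read_json("logs/interactions_20251218_120000.jsonl", lines=True)
metrics = pd.DataFrame(df["metrics"].tolist())   # flatten the nested metrics dict

print(df.groupby("experiment_type").size())      # runs per experiment type
print(metrics["latency_ms"].describe())          # latency distribution
```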
Why Ollama as the default provider? Reasons:
- ✅ Free - No API costs
- ✅ Private - Data stays local
- ✅ Fast - No network latency
- ✅ Educational - See models up close
Why Streamlit for the UI? Reasons:
- ✅ Rapid development - Build UI in pure Python
- ✅ Interactive - Great for experimentation
- ✅ Familiar - Popular in data science
- ✅ Easy to extend - Add features quickly
Why JSONL for logs? Reasons:
- ✅ Structured - Easy to parse and analyze
- ✅ Flexible - Can add fields without breaking
- ✅ Standard - Works with many tools
- ✅ Human-readable - Can inspect manually
Adding a new model provider. Steps:
1. Create `models/your_provider.py`
2. Inherit from `BaseModel`
3. Implement `generate()` and `count_tokens()`
4. Add to `models/__init__.py`
Example:
```python
# models/huggingface_model.py
from models.base import BaseModel, ModelResponse

class HuggingFaceModel(BaseModel):
    def generate(self, prompt, temperature, max_tokens, top_p):
        # Your implementation
        return ModelResponse(...)

    def count_tokens(self, text):
        # Your implementation
        return count
```

```python
# models/__init__.py
from models.huggingface_model import HuggingFaceModel

def get_model(provider, model_name, **kwargs):
    if provider == "huggingface":
        return HuggingFaceModel(model_name, **kwargs)
    # ...
```

Adding a new experiment. Steps:
1. Create `experiments/your_experiment.py`
2. Implement `run_your_experiment(model, logger, ...)`
3. Export from `experiments/__init__.py`
4. Add UI in `app.py`
Template:
```python
# experiments/your_experiment.py
from models.base import BaseModel
from logger import Logger

def run_your_experiment(
    model: BaseModel,
    logger: Logger,
    # Your parameters
) -> YourResultType:
    """
    Your experiment description.
    """
    # 1. Setup
    # 2. Run generation(s)
    response = model.generate(...)
    # 3. Log
    logger.log_interaction(
        prompt=...,
        response=response,
        parameters=...,
        experiment_type="your_experiment",
    )
    # 4. Return
    return results
```

To add the experiment's tab in app.py:
```python
def your_experiment_tab(params):
    """Your tab implementation."""
    st.header("Your Experiment")
    st.markdown("Description...")

    # Add your controls
    user_input = st.text_input("Input")

    if st.button("Run"):
        # Call your experiment
        result = run_your_experiment(...)
        # Display results
        st.write(result)

# Add to main tabs
tabs = st.tabs([..., "Your Experiment"])
with tabs[-1]:
    your_experiment_tab(params)
```

requirements.txt:

```
streamlit>=1.28.0       # Web UI framework
requests>=2.31.0        # HTTP client for Ollama
python-dotenv>=1.0.0    # Environment variables
openai>=1.0.0           # For OpenAI support
tiktoken>=0.5.0         # For exact token counting
pandas>=2.0.0           # For log analysis
matplotlib>=3.7.0       # For visualization
```
Basic Functionality:
- Connect to Ollama
- Generate simple response
- Change temperature and observe effect
- Run each experiment type
- Check logs are created
Error Handling:
- Ollama not running → Clear error message
- Model not installed → Helpful suggestion
- Invalid parameters → Validation error
- Network timeout → Graceful degradation
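As an illustration of the first two points, the Ollama client could map connection failures to actionable messages like this (a sketch, assuming Ollama's default local endpoint; not the actual ollama_model.py code):

```python
# Inside ollama_model.py (sketch) -- assumes Ollama's default local endpoint
import requests

payload = {"model": "llama2", "prompt": "Hello", "stream": False}
try:
    resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
except requests.exceptions.ConnectionError:
    # Turn a refused connection into an actionable message
    raise RuntimeError(
        "Could not reach Ollama at localhost:11434. Is it running? Try: ollama serve"
    )
except requests.exceptions.Timeout:
    raise RuntimeError("Ollama timed out; try a shorter prompt or lower max_tokens.")
```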
Performance:
- Response within reasonable time
- No memory leaks in long sessions
- Logs don't grow unbounded
```python
# tests/test_models.py
def test_ollama_generation():
    model = OllamaModel("llama2")
    response = model.generate("Hello", temperature=0.7)
    assert response.text is not None
    assert response.total_tokens > 0
```
```python
# tests/test_experiments.py
from unittest.mock import Mock

def test_zero_shot():
    # Mock out the model and logger so no backend is required
    mock_model, mock_logger = Mock(), Mock()
    result = run_zero_shot_experiment(mock_model, "test", mock_logger)
    assert result is not None
```

API keys and secrets:
- ✅ Use environment variables
- ✅ Never commit `.env` files
- ✅ Provide a `.env.example` template
Input validation:
- ✅ Sanitize prompts (no code injection)
- ✅ Validate parameters
- ✅ Limit request sizes
Privacy (local models):
- ✅ No external data sharing
- ✅ Full privacy
- ✅ No API key needed
Ollama (local):
- Latency: 50-200ms overhead + generation time
- Throughput: 5-20 tokens/second (depends on hardware)
- Memory: 4-8GB RAM per model
- Cost: Free
OpenAI (cloud):
- Latency: 100-500ms overhead + generation time
- Throughput: 50-100 tokens/second
- Memory: Minimal (cloud-based)
- Cost: ~$0.0005-0.03 per 1K tokens
Code style:
- One concept per file
- Clear function names
- Type hints where helpful
- Docstrings for public APIs
Configuration (see the sketch below):
- Centralize in `config.py`
- Use environment variables for secrets
- Provide sensible defaults
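A sketch of what such a centralized `config.py` might look like, using python-dotenv from requirements.txt (the variable names are illustrative assumptions, not the actual module):

```python
# config.py (sketch) -- variable names are illustrative assumptions
import os
from dotenv import load_dotenv

load_dotenv()  # read .env if present; real environment variables take precedence

DEFAULT_PROVIDER = os.getenv("DEFAULT_PROVIDER", "ollama")
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "llama2")
DEFAULT_TEMPERATURE = float(os.getenv("DEFAULT_TEMPERATURE", "0.7"))
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # optional; only needed for OpenAI
```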
Logging:
- Log every interaction
- Include enough context
- Use structured formats
- Rotate logs when large
Error handling:
- Catch specific exceptions
- Provide actionable error messages
- Don't swallow errors silently
- Log errors for debugging
Further reading:
- Streamlit docs: https://docs.streamlit.io
- Ollama API: https://github.com/ollama/ollama/blob/main/docs/api.md
- OpenAI API: https://platform.openai.com/docs
Related tools:
- LangChain: Full LLM framework
- LiteLLM: Unified API for many providers
- Haystack: NLP framework with LLM support
To extend this project:
1. Fork and clone
2. Create a branch: `git checkout -b feature/your-feature`
3. Make changes following the patterns above
4. Test thoroughly
5. Document in README and inline comments
6. Submit a PR with a clear description
This architecture prioritizes learning and experimentation over production robustness. It's designed to be understood, modified, and extended by beginners!