πŸ“ Project Structure & Architecture

Complete overview of the LLM Playground codebase.


📊 Directory Structure

llm_playground/
│
├── README.md                     # Main entry point, setup instructions
├── CONCEPTS.md                   # Deep dive into LLM theory
├── LEARNING_OUTCOMES.md          # What you'll learn + follow-ups
├── TUTORIAL.md                   # Step-by-step guided experiments
│
├── requirements.txt              # Python dependencies
├── .env.example                  # Environment variables template
├── .gitignore                    # Git ignore patterns
├── config.py                     # Central configuration
├── logger.py                     # Structured logging system
│
├── setup.sh                      # Automated setup script
├── example.py                    # Quick verification script
├── app.py                        # Streamlit web interface ⭐
├── cli.py                        # Command-line interface
│
├── models/                       # Model abstraction layer
│   ├── __init__.py               # Factory and exports
│   ├── base.py                   # BaseModel interface
│   ├── ollama_model.py           # Ollama implementation ⭐
│   └── openai_model.py           # OpenAI implementation (optional)
│
├── experiments/                  # Experiment implementations
│   ├── __init__.py
│   ├── zero_shot.py              # Zero-shot prompting
│   ├── few_shot.py               # Few-shot learning
│   ├── sampling_params.py        # Temperature/top-p experiments
│   ├── context_window.py         # Context length testing
│   └── prompt_sensitivity.py     # Prompt variation analysis
│
└── logs/                         # Generated logs (auto-created)
    ├── interactions_20251218_120000.jsonl
    └── interactions_20251218_130000.jsonl

πŸ—οΈ Architecture Overview

Design Principles

  1. Modularity: Each component has a single responsibility
  2. Extensibility: Easy to add new models or experiments
  3. Simplicity: Beginner-friendly code, no over-engineering
  4. Observability: Everything is logged for analysis

Key Components

1. Model Abstraction (models/)

Purpose: Provide a unified interface to different LLM providers.

# All models implement this interface
class BaseModel:
    def generate(self, prompt, temperature, max_tokens, top_p) -> ModelResponse: ...
    def count_tokens(self, text) -> int: ...

Benefits:

  • Swap models without changing experiment code
  • Add new providers easily
  • Consistent response format

Example:

# Same code works for any provider
model = get_model("ollama", "llama2")
# or
model = get_model("openai", "gpt-4")

# Both work identically
response = model.generate("Hello")

2. Logging System (logger.py)

Purpose: Capture all interactions for analysis.

Features:

  • Structured logging (JSON or CSV)
  • Automatic metrics collection
  • Cost tracking
  • Experiment categorization

Logged Data:

{
  "timestamp": "2025-12-18T10:30:45.123Z",
  "model": "ollama:llama2",
  "prompt": "What is AI?",
  "response": "AI is...",
  "parameters": {"temperature": 0.7, ...},
  "metrics": {
    "prompt_tokens": 5,
    "completion_tokens": 87,
    "latency_ms": 1234,
    "cost_usd": 0.0
  },
  "experiment_type": "zero_shot",
  "notes": "Additional context"
}
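
Each interaction is appended as one JSON object per line (JSONL), so logs are easy to inspect programmatically. A minimal reading sketch, assuming only the file layout shown in the tree above:

# Minimal sketch: scan all interaction logs and print per-call latency.
# Assumes only the logs/ layout shown above; the real logger API may differ.
import json
from pathlib import Path

for log_file in sorted(Path("logs").glob("interactions_*.jsonl")):
    with log_file.open() as f:
        for line in f:
            entry = json.loads(line)  # one interaction per line
            print(entry["model"], entry["metrics"]["latency_ms"], "ms")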

3. Experiment Framework (experiments/)

Purpose: Provide reusable experiment implementations.

Design Pattern:

def run_experiment(model, params, logger):
    # 1. Run generation
    response = model.generate(...)
    
    # 2. Log interaction
    logger.log_interaction(...)
    
    # 3. Return results
    return response

Available Experiments:

  • Zero-shot: No examples
  • Few-shot: With examples
  • Temperature: Parameter tuning
  • Context window: Length testing
  • Prompt sensitivity: Variation analysis

4. User Interfaces

Streamlit App (app.py)
  • Interactive web UI
  • Visual parameter controls
  • Real-time results
  • Built-in tutorials

CLI (cli.py)
  • Fast terminal access
  • Scripting support
  • Batch processing
  • Automation-friendly

🔄 Data Flow

Simple Generation Flow

User Input
    ↓
app.py or cli.py
    ↓
get_model(provider, name)
    ↓
model.generate(prompt, params)
    ↓
[API Call to Ollama/OpenAI]
    ↓
ModelResponse(text, tokens, latency, ...)
    ↓
logger.log_interaction(...)
    ↓
[Save to logs/]
    ↓
Display to User
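
Traced in code, the flow looks roughly like this (a sketch; log_interaction's keyword arguments follow the experiment template later in this doc, and response.text is the field named in the diagram above):

# Sketch of the generation flow above; names follow this doc's conventions.
from models import get_model
from logger import Logger

model = get_model("ollama", "llama2")                      # factory picks provider
response = model.generate("What is AI?", temperature=0.7)  # API call to Ollama
Logger().log_interaction(
    prompt="What is AI?",
    response=response,
    parameters={"temperature": 0.7},
    experiment_type="zero_shot",
)                                                          # saved to logs/
print(response.text)                                       # displayed to user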

Experiment Flow

User selects experiment
    ↓
experiment.run_xxx(model, params, logger)
    ↓
Multiple model.generate() calls
    ↓
Each call logged separately
    ↓
Aggregate results
    ↓
Analysis & display
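
The same flow as a sketch (run_zero_shot_experiment is the name used in the testing section below; exact signatures may differ):

# Sketch of the experiment flow above; exact signatures may differ.
from models import get_model
from logger import Logger
from experiments.zero_shot import run_zero_shot_experiment

model = get_model("ollama", "llama2")
logger = Logger()
# The experiment makes and logs one or more generate() calls internally,
# then returns aggregated results for analysis and display.
results = run_zero_shot_experiment(model, "What is AI?", logger)
print(results)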

🎨 Design Decisions

Why Ollama-First?

Reasons:

  1. ✅ Free - No API costs
  2. ✅ Private - Data stays local
  3. ✅ Fast - No network latency
  4. ✅ Educational - See models up close

Why Streamlit for UI?

Reasons:

  1. ✅ Rapid development - Build UI in pure Python
  2. ✅ Interactive - Great for experimentation
  3. ✅ Familiar - Popular in data science
  4. ✅ Easy to extend - Add features quickly

Why JSON Logs?

Reasons:

  1. ✅ Structured - Easy to parse and analyze (see the pandas sketch below)
  2. ✅ Flexible - Can add fields without breaking
  3. ✅ Standard - Works with many tools
  4. ✅ Human-readable - Can inspect manually
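
For instance, JSONL logs load directly into pandas (an optional dependency) for aggregate analysis:

# Sketch: aggregate analysis of a JSONL log with pandas (optional dependency).
import pandas as pd

df = pd.read_json("logs/interactions_20251218_120000.jsonl", lines=True)
metrics = pd.json_normalize(df["metrics"])  # flatten the nested metrics object
print(metrics[["prompt_tokens", "completion_tokens", "latency_ms"]].describe())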

🔌 Extension Points

Adding a New Model Provider

Steps:

  1. Create models/your_provider.py
  2. Inherit from BaseModel
  3. Implement generate() and count_tokens()
  4. Add to models/__init__.py

Example:

# models/huggingface_model.py
from models.base import BaseModel, ModelResponse

class HuggingFaceModel(BaseModel):
    def generate(self, prompt, temperature, max_tokens, top_p):
        # Your implementation
        return ModelResponse(...)
    
    def count_tokens(self, text):
        # Your implementation
        return count

# models/__init__.py
from models.huggingface_model import HuggingFaceModel

def get_model(provider, model_name, **kwargs):
    if provider == "huggingface":
        return HuggingFaceModel(model_name, **kwargs)
    # ...

Adding a New Experiment

Steps:

  1. Create experiments/your_experiment.py
  2. Implement run_your_experiment(model, logger, ...)
  3. Export from experiments/__init__.py
  4. Add UI in app.py

Template:

# experiments/your_experiment.py
from models.base import BaseModel
from logger import Logger

def run_your_experiment(
    model: BaseModel,
    logger: Logger,
    # Your parameters
) -> YourResultType:
    """
    Your experiment description.
    """
    # 1. Setup
    # 2. Run generation(s)
    response = model.generate(...)
    
    # 3. Log
    logger.log_interaction(
        prompt=...,
        response=response,
        parameters=...,
        experiment_type="your_experiment",
    )
    
    # 4. Return
    return results

Adding a New UI Tab

Steps in app.py:

# At the top of app.py
import streamlit as st

from experiments.your_experiment import run_your_experiment

def your_experiment_tab(params):
    """Your tab implementation."""
    st.header("🆕 Your Experiment")
    st.markdown("Description...")
    
    # Add your controls
    user_input = st.text_input("Input")
    
    if st.button("Run"):
        # Call your experiment
        result = run_your_experiment(...)
        
        # Display results
        st.write(result)

# Add to main tabs
tabs = st.tabs([..., "🆕 Your Experiment"])
with tabs[-1]:
    your_experiment_tab(params)

📦 Dependencies

Core Dependencies

streamlit>=1.28.0      # Web UI framework
requests>=2.31.0       # HTTP client for Ollama
python-dotenv>=1.0.0   # Environment variables

Optional Dependencies

openai>=1.0.0          # For OpenAI support
tiktoken>=0.5.0        # For exact token counting
pandas>=2.0.0          # For log analysis
matplotlib>=3.7.0      # For visualization
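
As a taste of why tiktoken is listed: it provides exact token counts for OpenAI tokenizers (local counting is otherwise approximate). A minimal sketch:

# Sketch: exact token counting with tiktoken (optional; OpenAI tokenizers only).
import tiktoken

def count_openai_tokens(text: str, model_name: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model_name)
    return len(enc.encode(text))

print(count_openai_tokens("Hello, world!"))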

🧪 Testing Strategy

Manual Testing Checklist

Basic Functionality:

  • Connect to Ollama
  • Generate simple response
  • Change temperature and observe effect
  • Run each experiment type
  • Check logs are created

Error Handling:

  • Ollama not running → Clear error message
  • Model not installed → Helpful suggestion
  • Invalid parameters → Validation error
  • Network timeout → Graceful degradation

Performance:

  • Response within reasonable time
  • No memory leaks in long sessions
  • Logs don't grow unbounded

Automated Testing (Future)

# tests/test_models.py
from models.ollama_model import OllamaModel

def test_ollama_generation():
    model = OllamaModel("llama2")
    response = model.generate("Hello", temperature=0.7)
    assert response.text is not None
    assert response.total_tokens > 0

# tests/test_experiments.py
from experiments.zero_shot import run_zero_shot_experiment

def test_zero_shot(mock_model, mock_logger):  # fixtures supplying a stubbed model/logger
    result = run_zero_shot_experiment(mock_model, "test", mock_logger)
    assert result is not None

🔒 Security Considerations

API Keys

  • ✅ Use environment variables
  • ✅ Never commit .env files
  • ✅ Provide .env.example template

User Input

  • ✅ Sanitize prompts (no code injection)
  • ✅ Validate parameters (see the sketch below)
  • ✅ Limit request sizes
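
A sketch of what the parameter validation could look like (the bounds here are illustrative assumptions, not the project's actual limits):

# Illustrative parameter validation; bounds are assumptions, not actual limits.
def validate_params(temperature: float, max_tokens: int, top_p: float) -> None:
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be in [0, 2], got {temperature}")
    if max_tokens <= 0:
        raise ValueError(f"max_tokens must be positive, got {max_tokens}")
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")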

Local Models

  • ✅ No external data sharing
  • ✅ Full privacy
  • ✅ No API key needed

📈 Performance Characteristics

Ollama (Local)

  • Latency: 50-200ms overhead + generation time
  • Throughput: 5-20 tokens/second (depends on hardware)
  • Memory: 4-8GB RAM per model
  • Cost: Free

OpenAI (API)

  • Latency: 100-500ms overhead + generation time
  • Throughput: 50-100 tokens/second
  • Memory: Minimal (cloud-based)
  • Cost: ~$0.0005-0.03 per 1K tokens

🎯 Best Practices

Code Organization

  1. One concept per file
  2. Clear function names
  3. Type hints where helpful
  4. Docstrings for public APIs

Configuration

  1. Centralize in config.py (see the sketch below)
  2. Use environment variables for secrets
  3. Provide sensible defaults
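
A minimal config.py along these lines (the setting names are illustrative; 11434 is Ollama's default port):

# config.py sketch: env-driven settings with sensible defaults (names illustrative).
import os
from dotenv import load_dotenv

load_dotenv()  # pick up .env if present

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # secret: env only, never committed
DEFAULT_TEMPERATURE = float(os.getenv("DEFAULT_TEMPERATURE", "0.7"))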

Logging

  1. Log every interaction
  2. Include enough context
  3. Use structured formats
  4. Rotate logs when large

Error Handling

  1. Catch specific exceptions
  2. Provide actionable error messages (see the sketch below)
  3. Don't swallow errors silently
  4. Log errors for debugging
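
For example (this assumes the Ollama client lets requests' ConnectionError propagate; adapt to the exceptions your provider actually raises):

# Illustrative error handling; assumes the Ollama client lets requests'
# ConnectionError propagate rather than swallowing it.
import requests

from models import get_model

model = get_model("ollama", "llama2")
try:
    response = model.generate("Hello")
except requests.exceptions.ConnectionError:
    print("Could not reach Ollama. Is `ollama serve` running?")  # actionable
except ValueError as err:
    print(f"Invalid parameters: {err}")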

📚 Further Reading

Similar Projects

  • LangChain: Full LLM framework
  • LiteLLM: Unified API for many providers
  • Haystack: NLP framework with LLM support

🤝 Contributing

To extend this project:

  1. Fork and clone
  2. Create a branch: git checkout -b feature/your-feature
  3. Make changes following the patterns above
  4. Test thoroughly
  5. Document in README and inline comments
  6. Submit PR with clear description

This architecture prioritizes learning and experimentation over production robustness. It's designed to be understood, modified, and extended by beginners! 🚀