
Tiny Conversational AI - Usage Guide

A minimal transformer-based chatbot built from scratch using only NumPy. This guide covers installation, training, and usage.

📋 Prerequisites

  • Python 3.6+
  • NumPy library

🚀 Quick Start

Installation

# Install dependencies from requirements.txt
pip install -r requirements.txt

# Or install manually
pip install numpy

# Make scripts executable (optional)
chmod +x train.py launch.py

Training the Model

# Train with default output path (chatbot_model.pkl)
python train.py

# Or specify a custom output path
python train.py my_model.pkl

Launching the Chatbot

# Launch with default model (chatbot_model.pkl in current directory)
python launch.py

# Or specify a custom model path
python launch.py my_model.pkl

💬 Interactive Chat

Once you launch with python launch.py, you'll see:

==================================================
Chatbot Ready! (type 'quit' to exit)
==================================================

You: 

Type your messages and press Enter. The bot will generate responses based on the trained model.

Available commands:

  • quit, exit, or q - Exit the chatbot

🔧 Advanced Usage

Loading a Pre-trained Model

from tiny_chatbot import TinyChatbot

# Load saved model
model = TinyChatbot.load('chatbot_model.pkl')

# Generate response
response = model.generate("hello ", max_new_tokens=40, temperature=0.8)
print(response)

Training with Custom Data

from tiny_chatbot import TinyChatbot, train

# Create model
model = TinyChatbot(
    vocab_size=128,
    embed_dim=64,
    num_heads=4,
    ff_dim=128,
    num_layers=2,
    max_len=64
)

# Prepare your data (list of token sequences)
data = [
    [ord(c) for c in "hello Hi there!"],
    [ord(c) for c in "how are you I'm great!"],
    # Add more conversations...
]

# Train
train(model, data, epochs=100, batch_size=4)

# Save
model.save('my_model.pkl')

Adjusting Generation Parameters

# Temperature controls randomness (0.1 = conservative, 1.5 = creative)
response = model.generate(
    prompt="hello",
    max_new_tokens=50,    # Maximum tokens to generate
    temperature=0.7       # Sampling temperature
)
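
Internally, temperature sampling typically divides the logits by the temperature before applying softmax. A minimal NumPy sketch of the idea (illustrative only; the function name is hypothetical, not this repo's internals):

import numpy as np

def sample_with_temperature(logits, temperature=0.7):
    # Lower temperature sharpens the distribution (more deterministic);
    # higher temperature flattens it (more random)
    scaled = logits / temperature
    # Numerically stable softmax
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Draw one token id from the resulting distribution
    return np.random.choice(len(probs), p=probs)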

⚙️ Configuration

Edit hyperparameters at the top of tiny_chatbot.py:

VOCAB_SIZE = 128      # ASCII character set
EMBED_DIM = 64        # Embedding dimension
NUM_HEADS = 4         # Number of attention heads
FF_DIM = 128          # Feed-forward dimension
NUM_LAYERS = 2        # Number of transformer layers
MAX_LEN = 64          # Maximum sequence length
LEARNING_RATE = 0.001 # Learning rate (not used in current version)

Recommended configurations:

Use Case      EMBED_DIM   NUM_HEADS   NUM_LAYERS   Notes
Tiny (demo)   32          2           1            Very fast, limited capability
Small         64          4           2            Default, good for testing
Medium        128         8           4            Better quality, slower
Large         256         8           6            Best quality, much slower

📊 Model Architecture

The model implements a simplified GPT-style transformer:

Input Text
    ↓
Token Embedding + Positional Embedding
    ↓
Transformer Block 1
  ├─ Multi-Head Attention
  ├─ Layer Normalization
  ├─ Feed-Forward Network
  └─ Layer Normalization
    ↓
Transformer Block 2
  └─ (same structure)
    ↓
Output Linear Layer
    ↓
Generated Text

Components:

  • Multi-Head Attention: Lets the model attend to different parts of the input
  • Feed-Forward Networks: Process the attention outputs
  • Layer Normalization: Stabilizes training
  • Positional Embeddings: Encode token positions
  • Causal Masking: Ensures autoregressive generation (see the sketch below)
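
The causal mask is what keeps generation autoregressive. A minimal single-head NumPy sketch of masked self-attention (illustrative only; names are hypothetical, not this repo's internals):

import numpy as np

def causal_self_attention(Q, K, V):
    # Q, K, V: (seq_len, head_dim) arrays for a single attention head
    seq_len, head_dim = Q.shape
    scores = Q @ K.T / np.sqrt(head_dim)   # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -1e9                  # block attention to the future
    # Row-wise softmax over the allowed positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # (seq_len, head_dim)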

🎯 Training Data Format

The default training includes 10 conversation pairs:

conversations = [
    ("hello", "Hi! How can I help you today?"),
    ("hi", "Hello! What's on your mind?"),
    ("how are you", "I'm doing well, thanks for asking!"),
    # ...
]

To add your own:

  1. Edit the prepare_data() function in tiny_chatbot.py
  2. Add conversation tuples: (user_input, bot_response)
  3. Re-run the script to train with the new data (see the encoding sketch below)
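
For example, the encoding could look like this, assuming the same character-level ord() scheme used in the custom-training example above (a sketch, not the repo's exact prepare_data()):

# Flatten (user, bot) pairs into one token sequence per conversation,
# encoded as ASCII code points to match VOCAB_SIZE = 128
conversations = [
    ("hello", "Hi! How can I help you today?"),
    ("hi", "Hello! What's on your mind?"),
]
data = [[ord(c) for c in f"{user} {bot}"] for user, bot in conversations]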

📈 Training Process

Training output shows progress:

Training for 50 epochs...
Epoch 10/50, Loss: -0.11
Epoch 20/50, Loss: -0.11
Epoch 30/50, Loss: -0.11
Epoch 40/50, Loss: -0.11
Epoch 50/50, Loss: -0.11
Training complete!

Note: The loss is a simplified approximation. Real implementations use proper cross-entropy loss with backpropagation.
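
For comparison, a proper next-token cross-entropy loss in NumPy could look like this (a sketch, not part of this repo):

import numpy as np

def cross_entropy_loss(logits, targets):
    # logits: (seq_len, vocab_size); targets: (seq_len,) integer token ids
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Mean negative log-likelihood of the correct next tokens
    return -log_probs[np.arange(len(targets)), targets].mean()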

🐛 Limitations

This is an educational implementation demonstrating transformer concepts:

  • No backpropagation: Weights aren't actually updated during training
  • Small vocabulary: Only supports ASCII characters (128 tokens)
  • Limited data: Only 10 training examples by default
  • Simple generation: May produce random outputs without proper training
  • No GPU support: Uses only NumPy (CPU-based)

🔍 Troubleshooting

Model generates random characters

  • This is expected with the minimal training data
  • Add more conversation pairs to prepare_data()
  • Increase training epochs
  • Adjust temperature (lower = more deterministic)

Out of memory errors

  • Reduce EMBED_DIM, NUM_LAYERS, or MAX_LEN
  • Process smaller batches
  • Limit max_new_tokens during generation

Import errors

# Make sure NumPy is installed
pip install numpy

📚 Further Learning

To build a production-ready chatbot:

  1. Use PyTorch or TensorFlow for automatic differentiation
  2. Implement proper backpropagation with optimizers (Adam, SGD)
  3. Use larger datasets (thousands or millions of examples)
  4. Add tokenization (BPE, WordPiece) for better vocabulary
  5. Implement beam search for better generation
  6. Add temperature scaling and top-k/top-p sampling (see the sketch below)
  7. Use pre-trained models (GPT-2, BERT) and fine-tune
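
As an example of item 6, top-k sampling keeps only the k most likely tokens before sampling. A minimal NumPy sketch (the function name is hypothetical):

import numpy as np

def top_k_sample(logits, k=10, temperature=1.0):
    scaled = logits / temperature
    top = np.argsort(scaled)[-k:]          # indices of the k best tokens
    # Softmax restricted to the top-k candidates
    probs = np.exp(scaled[top] - scaled[top].max())
    probs /= probs.sum()
    return top[np.random.choice(k, p=probs)]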

📄 License

Educational implementation - use freely for learning purposes!

🤝 Contributing

Feel free to extend this implementation:

  • Add proper backpropagation
  • Implement different attention mechanisms
  • Add more training data
  • Optimize performance
  • Create a web interface

Built with ❤️ using only NumPy and Python