A minimal transformer-based chatbot built from scratch using only NumPy. This guide covers installation, training, and usage.
- Python 3.6+
- NumPy library
# Install dependencies from requirements.txt
pip install -r requirements.txt
# Or install manually
pip install numpy
# Make scripts executable (optional)
chmod +x train.py launch.py

# Train with default output path (chatbot_model.pkl)
python train.py
# Or specify a custom output path
python train.py my_model.pkl

# Launch with default model (chatbot_model.pkl in current directory)
python launch.py
# Or specify a custom model path
python launch.py my_model.pkl

Once you launch with python launch.py, you'll see:
==================================================
Chatbot Ready! (type 'quit' to exit)
==================================================
You:
Type your messages and press Enter. The bot will generate responses based on the trained model.
Available commands:
- quit - Exit the chatbot
- exit - Exit the chatbot
- q - Exit the chatbot
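Under the hood, the chat loop in launch.py behaves roughly like the following sketch (it assumes the TinyChatbot.load and generate API shown in the Python usage section below; the actual script may differ in details):

```python
from tiny_chatbot import TinyChatbot

# Load the trained model (the path may also be given on the command line)
model = TinyChatbot.load('chatbot_model.pkl')

print("Chatbot Ready! (type 'quit' to exit)")
while True:
    user_input = input("You: ")
    # Any of the exit commands ends the session
    if user_input.strip().lower() in ('quit', 'exit', 'q'):
        break
    # Generate a reply conditioned on the user's message
    reply = model.generate(user_input + " ", max_new_tokens=40, temperature=0.8)
    print("Bot:", reply)
```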
from tiny_chatbot import TinyChatbot
# Load saved model
model = TinyChatbot.load('chatbot_model.pkl')
# Generate response
response = model.generate("hello ", max_new_tokens=40, temperature=0.8)
print(response)

from tiny_chatbot import TinyChatbot, train
# Create model
model = TinyChatbot(
vocab_size=128,
embed_dim=64,
num_heads=4,
ff_dim=128,
num_layers=2,
max_len=64
)
# Prepare your data (list of token sequences)
data = [
[ord(c) for c in "hello Hi there!"],
[ord(c) for c in "how are you I'm great!"],
# Add more conversations...
]
# Train
train(model, data, epochs=100, batch_size=4)
# Save
model.save('my_model.pkl')

# Temperature controls randomness (0.1 = conservative, 1.5 = creative)
response = model.generate(
prompt="hello",
max_new_tokens=50, # Maximum tokens to generate
temperature=0.7 # Sampling temperature
)

Edit hyperparameters at the top of tiny_chatbot.py:
VOCAB_SIZE = 128 # ASCII character set
EMBED_DIM = 64 # Embedding dimension
NUM_HEADS = 4 # Number of attention heads
FF_DIM = 128 # Feed-forward dimension
NUM_LAYERS = 2 # Number of transformer layers
MAX_LEN = 64 # Maximum sequence length
LEARNING_RATE = 0.001 # Learning rate (not used in current version)

Recommended configurations:
| Use Case | EMBED_DIM | NUM_HEADS | NUM_LAYERS | Notes |
|---|---|---|---|---|
| Tiny (demo) | 32 | 2 | 1 | Very fast, limited capability |
| Small | 64 | 4 | 2 | Default, good for testing |
| Medium | 128 | 8 | 4 | Better quality, slower |
| Large | 256 | 8 | 6 | Best quality, much slower |
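For example, switching to the Medium row of the table means editing the constants at the top of tiny_chatbot.py roughly as follows. FF_DIM is not specified by the table, so scaling it with EMBED_DIM is an assumption here:

```python
# Medium configuration from the table above (better quality, slower)
EMBED_DIM = 128    # Embedding dimension
NUM_HEADS = 8      # Number of attention heads
NUM_LAYERS = 4     # Number of transformer layers
FF_DIM = 256       # Assumption: scaled with EMBED_DIM; not given in the table
# VOCAB_SIZE and MAX_LEN keep their defaults (128 and 64)
```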
The model implements a simplified GPT-style transformer:
Input Text
↓
Token Embedding + Positional Embedding
↓
Transformer Block 1
├─ Multi-Head Attention
├─ Layer Normalization
├─ Feed-Forward Network
└─ Layer Normalization
↓
Transformer Block 2
└─ (same structure)
↓
Output Linear Layer
↓
Generated Text
- Multi-Head Attention: Allows the model to focus on different parts of the input
- Feed-Forward Networks: Process attention outputs
- Layer Normalization: Stabilizes training
- Positional Embeddings: Encodes token positions
- Causal Masking: Ensures autoregressive generation by letting each position attend only to earlier tokens (see the sketch below)
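To make the attention and causal-masking ideas concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention with a causal mask. This is a conceptual illustration only; the function and variable names are not taken from tiny_chatbot.py:

```python
import numpy as np

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product attention with a causal mask.

    x: (seq_len, embed_dim) token embeddings
    W_q, W_k, W_v: (embed_dim, head_dim) projection matrices
    """
    q, k, v = x @ W_q, x @ W_k, x @ W_v            # project to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (seq_len, seq_len) similarity scores
    # Causal mask: position i may only attend to positions <= i
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    # Softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                              # weighted sum of values

# Tiny usage example with random weights
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 64))                        # 5 tokens, embed_dim 64
W = [rng.normal(size=(64, 16)) for _ in range(3)]   # head_dim 16
print(causal_self_attention(x, *W).shape)           # (5, 16)
```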
The default training includes 10 conversation pairs:
conversations = [
("hello", "Hi! How can I help you today?"),
("hi", "Hello! What's on your mind?"),
("how are you", "I'm doing well, thanks for asking!"),
# ...
]

To add your own:
- Edit the prepare_data() function in tiny_chatbot.py
- Add conversation tuples: (user_input, bot_response) (encoding sketched below)
- Re-run the script to train with new data
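Character-level encoding of the pairs follows the pattern from the training example above, roughly like this (a sketch; the exact formatting inside prepare_data() may differ, and the second pair is just an example of one you might add):

```python
conversations = [
    ("hello", "Hi! How can I help you today?"),
    ("what's your name", "I'm a tiny NumPy chatbot!"),  # hypothetical added pair
]

# Concatenate each (user, bot) pair into one string and encode it
# character by character as ASCII codes, as in the earlier data example
data = [[ord(c) for c in f"{user} {bot}"] for user, bot in conversations]
```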
Training output shows progress:
Training for 50 epochs...
Epoch 10/50, Loss: -0.11
Epoch 20/50, Loss: -0.11
Epoch 30/50, Loss: -0.11
Epoch 40/50, Loss: -0.11
Epoch 50/50, Loss: -0.11
Training complete!
Note: The loss is a simplified approximation. Real implementations use proper cross-entropy loss with backpropagation.
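For reference, the cross-entropy loss a real implementation would minimize looks roughly like this in NumPy (a conceptual sketch, not code from this repository):

```python
import numpy as np

def cross_entropy_loss(logits, targets):
    """Mean next-token cross-entropy.

    logits:  (seq_len, vocab_size) unnormalized scores for each position
    targets: (seq_len,) integer token IDs the model should predict
    """
    # Numerically stable log-softmax over the vocabulary
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability of each correct token, averaged over positions
    return -log_probs[np.arange(len(targets)), targets].mean()
```

True cross-entropy is always non-negative, which is one way to see that the negative values printed above are only a rough stand-in.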
This is an educational implementation demonstrating transformer concepts:
- No backpropagation: Weights aren't actually updated during training
- Small vocabulary: Only supports ASCII characters (128 tokens)
- Limited data: Only 10 training examples by default
- Simple generation: May produce random outputs without proper training
- No GPU support: Uses only NumPy (CPU-based)
If the bot's responses look random or repetitive:
- This is expected with the minimal training data
- Add more conversation pairs to prepare_data()
- Increase training epochs
- Adjust temperature (lower = more deterministic)

If training or generation is slow or runs out of memory:
- Reduce EMBED_DIM, NUM_LAYERS, or MAX_LEN
- Process smaller batches
- Limit max_new_tokens during generation
If you see an import error for NumPy:

# Make sure NumPy is installed
pip install numpy

To build a production-ready chatbot:
- Use PyTorch or TensorFlow for automatic differentiation
- Implement proper backpropagation with optimizers (Adam, SGD)
- Use larger datasets (thousands or millions of examples)
- Add tokenization (BPE, WordPiece) for better vocabulary
- Implement beam search for better generation
- Add temperature scaling and top-k/top-p sampling (see the sketch after this list)
- Use pre-trained models (GPT-2, BERT) and fine-tune
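As an illustration of the temperature scaling and top-k sampling mentioned above, a minimal NumPy version looks roughly like this (a sketch; top-p filtering and beam search follow the same pattern of reshaping the next-token distribution):

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=10, rng=None):
    """Sample one token ID from next-token logits with temperature and top-k filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature   # temperature scaling
    # Keep only the top_k highest-scoring tokens
    cutoff = np.sort(logits)[-top_k]
    logits = np.where(logits < cutoff, -np.inf, logits)
    # Softmax over the remaining candidates
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```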
Educational implementation - use freely for learning purposes!
Feel free to extend this implementation:
- Add proper backpropagation
- Implement different attention mechanisms
- Add more training data
- Optimize performance
- Create a web interface
Built with ❤️ using only NumPy and Python