NEBYTE

RoLLM (Early Alpha Release v0.1.0)

RoLLM is a lightweight, educational transformer-based language model framework built entirely for the ROBLOX ecosystem. It covers the full stack, from character- and BPE-based tokenizers to multi-head attention, training loops, and inference with temperature control.

🚧 This project is in early alpha and not intended for production use or deployment in live ROBLOX experiences.


Table of Contents

  • Features
  • Architecture Overview
  • Installation
  • Usage
  • Modules
  • Tokenizer Modes
  • License

Features

  • Char/BPE Tokenizers: Two tokenizer strategies with external vocab support for BPE.
  • Custom Embedding Layer: Position-aware token embedding matrices.
  • Multi-Head Attention: Scaled dot-product attention with causal masking (see the sketch after this list).
  • Transformer Blocks: Includes layer norm, residual connections, and feedforward networks.
  • Training Loop: Cross-entropy loss and basic SGD optimizer.
  • Generation: Sampling and greedy decoding with temperature control.
  • Modular Design: Swap out components easily (e.g., tokenizer, attention).
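
As referenced in the attention bullet above, here is a minimal, self-contained Luau sketch of scaled dot-product attention with causal masking over plain Lua tables. It is illustrative only; the names and layout are not RoLLM's MultiHeadAttention internals.

```lua
-- Multiply an (n × k) matrix by a (k × m) matrix.
local function matMul(a, b)
    local out = {}
    for i = 1, #a do
        out[i] = {}
        for j = 1, #b[1] do
            local sum = 0
            for t = 1, #b do
                sum += a[i][t] * b[t][j]
            end
            out[i][j] = sum
        end
    end
    return out
end

-- Numerically stable row-wise softmax.
local function softmaxRow(row)
    local maxV = -math.huge
    for _, v in ipairs(row) do
        maxV = math.max(maxV, v)
    end
    local sum, out = 0, {}
    for i, v in ipairs(row) do
        out[i] = math.exp(v - maxV)
        sum += out[i]
    end
    for i = 1, #out do
        out[i] /= sum
    end
    return out
end

-- attention(Q, K, V) = softmax(QKᵀ / sqrt(dK) + causal mask) V
local function causalAttention(Q, K, V)
    local dK = #K[1]
    local scores = {}
    for i = 1, #Q do
        scores[i] = {}
        for j = 1, #K do
            local dot = 0
            for t = 1, dK do
                dot += Q[i][t] * K[j][t]
            end
            -- Causal mask: position i may not attend to positions j > i.
            scores[i][j] = j <= i and dot / math.sqrt(dK) or -math.huge
        end
        scores[i] = softmaxRow(scores[i])
    end
    return matMul(scores, V)
end
```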

Architecture Overview

```
Input Text → Tokenizer → Embedding → N × Transformer Blocks
             ↓                               ↓
        Vocabulary         ←         Final Projection (d_model × vocab_size)
```
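
To make the diagram concrete, here is a tiny, self-contained Luau sketch of the same shape: embed token ids, add a position embedding, and project back to vocabulary logits. The dimensions and weights are toy values, and the code mirrors the diagram rather than RoLLM's actual modules.

```lua
local vocabSize, dModel, seqLen = 5, 4, 3

local function randMatrix(rows, cols)
    local m = {}
    for i = 1, rows do
        m[i] = {}
        for j = 1, cols do
            m[i][j] = (math.random() - 0.5) * 0.1
        end
    end
    return m
end

local tokEmb = randMatrix(vocabSize, dModel) -- one row per token id
local posEmb = randMatrix(seqLen, dModel)    -- one row per position
local proj = randMatrix(dModel, vocabSize)   -- final projection (d_model × vocab_size)

local ids = { 1, 3, 2 } -- toy token ids from the tokenizer
local logits = {}
for pos, id in ipairs(ids) do
    -- Embedding: token row plus position row.
    local x = {}
    for j = 1, dModel do
        x[j] = tokEmb[id][j] + posEmb[pos][j]
    end
    -- (The N transformer blocks would transform x here.)
    -- Final projection back to vocabulary logits.
    logits[pos] = {}
    for v = 1, vocabSize do
        local sum = 0
        for j = 1, dModel do
            sum += x[j] * proj[j][v]
        end
        logits[pos][v] = sum
    end
end
```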

Installation

  1. Clone the repo and place the contents into ReplicatedStorage in your Roblox project:

```
RoLLM/
├── components/
│   ├── Tokenizer.luau
│   ├── CharTokenizer.luau
│   ├── BPETokenizer.luau
│   ├── Embedding.luau
│   ├── MultiHeadAttention.luau
│   ├── TransformerBlock.luau
│   ├── TransformerModel.luau
├── lib/
│   ├── LinearAlgebra.luau
│   ├── CrossEntropyLoss.luau
│   ├── Optimizer.luau
│   ├── types.luau
├── RoLLM.luau
```

  2. Require it from any script:

```lua
local RoLLM = require(ReplicatedStorage.RoLLM)
```

Usage

Quick Start

```lua
local RoLLM = require(script.Parent.RoLLM)

local model = RoLLM.new("hello world", {
    dModel = 64,      -- embedding dimension
    numHeads = 4,     -- attention heads
    dFF = 128,        -- feedforward hidden size
    numLayers = 2,    -- transformer blocks
    maxSeqLen = 64,   -- maximum context length
    tokenizerMode = "char"
})

-- Generate from the prompt "hel"; the second argument is presumably
-- the number of tokens to produce.
print(model:generate("hel", 10))
```

Training

```lua
-- Train on a small corpus; the trailing arguments are presumably the
-- epoch count (5) and the SGD learning rate (0.01). That reading is an
-- assumption, as the signature is not documented here.
model:trainModel({
    "hello world",
    "how are you",
    "i am a bot"
}, 5, 0.01)
```
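
For reference, the loss being minimized is standard cross-entropy: the negative log-probability the softmax assigns to the target token. A minimal, self-contained sketch in Luau (illustrative only, not RoLLM's CrossEntropyLoss module):

```lua
local function crossEntropy(logits, target)
    local maxV = -math.huge
    for _, v in ipairs(logits) do
        maxV = math.max(maxV, v)
    end
    local sumExp = 0
    for _, v in ipairs(logits) do
        sumExp += math.exp(v - maxV)
    end
    -- -log softmax(logits)[target], computed stably.
    return -(logits[target] - maxV - math.log(sumExp))
end

print(crossEntropy({ 2.0, 0.5, -1.0 }, 1)) -- low loss: target already likely
```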

Generation

```lua
-- Prompt, number of tokens to generate, and sampling temperature
-- (argument order assumed from the call).
local output = model:generateTemperature("hello", 20, 0.8)
print(output)
```
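
Temperature rescales the logits before the softmax: values below 1 sharpen the distribution toward greedy decoding, while values above 1 flatten it. A minimal, self-contained sketch of the idea in Luau (illustrative only, not RoLLM's actual implementation):

```lua
local function sampleWithTemperature(logits, temperature)
    local maxV = -math.huge
    for _, v in ipairs(logits) do
        maxV = math.max(maxV, v)
    end
    -- softmax(logits / temperature), computed stably.
    local probs, sum = {}, 0
    for i, v in ipairs(logits) do
        probs[i] = math.exp((v - maxV) / temperature)
        sum += probs[i]
    end
    -- Draw a token index from the resulting distribution.
    local r = math.random() * sum
    for i, p in ipairs(probs) do
        r -= p
        if r <= 0 then
            return i
        end
    end
    return #probs -- numerical fallback
end

print(sampleWithTemperature({ 2.0, 0.5, -1.0 }, 0.8))
```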

Modules

  • RoLLM.luau: Entry point; builds the tokenizer and transformer.
  • Tokenizer.luau: Factory for "char" or "bpe" modes.
  • CharTokenizer.luau: Character-level tokenizer.
  • BPETokenizer.luau: Byte-Pair Encoding tokenizer with external vocab support.
  • TransformerModel.luau: Core forward/backward implementation.
  • Embedding.luau: Position-encoded embeddings.
  • MultiHeadAttention.luau: Attention module.
  • TransformerBlock.luau: A single transformer block.
  • LinearAlgebra.luau: Basic matrix math.
  • CrossEntropyLoss.luau: Computes loss and gradients.
  • Optimizer.luau: Naive SGD optimizer.
  • types.luau: Shared matrix and config types.

Tokenizer Modes

  • "char": Default. Simple and fast but limited generalization.
  • "bpe": External vocab must be loaded with loadExternalVocab(url).

License

Distributed under the MIT License. © 2024 rustyspotted — All rights reserved.
