This project focuses on fine-tuning small open-source language models (e.g., Gemma3:4B) to act as ReAct agents for domain-specific tasks—specifically, interacting with UAE labor laws and employment regulations. The goal is to reduce production costs by avoiding reliance on large-scale or paid models like ChatGPT or Gemini, while still delivering robust tool-augmented reasoning.
Deploying large proprietary models in production can be prohibitively expensive and often unnecessary for specialized applications. This project explores whether a small, instruction-tuned model (4B or smaller) can be trained to:
- Reliably select tools in a ReAct agent setting
- Generate accurate, grounded answers
- Mimic the performance of a much larger teacher model (Mistral 7B) through knowledge distillation
Using LlamaIndex:
- Initial parsing with `MarkdownNodeParser`
- If a chunk exceeds 512 tokens, further split using `SentenceSplitter` with overlap
- Embed all nodes and perform K-Means clustering
- For each cluster, apply a sliding-window technique to generate questions
- Inject persona-based variations (HR manager, Employer, Employee, Domestic Worker)
- Use `Mistral-7B-AWQ` via `vLLM` to generate QA pairs
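The two-stage chunking rule above can be illustrated with a dependency-free sketch (in the project itself this is handled by LlamaIndex's `MarkdownNodeParser` and `SentenceSplitter`; here tokens are approximated by whitespace-separated words, and the 64-token overlap is an assumed value):

```python
def split_with_overlap(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of chunk_size, each overlapping the previous."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

def chunk_node(text, max_tokens=512, overlap=64):
    """Keep a parsed node whole if it fits; otherwise fall back to overlapping splits."""
    tokens = text.split()  # crude whitespace "tokenizer", for illustration only
    if len(tokens) <= max_tokens:
        return [text]
    return [" ".join(w) for w in split_with_overlap(tokens, max_tokens, overlap)]
```

The overlap preserves context across split boundaries, which matters for legal text where an article's conditions can span several sentences.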
- Use LlamaIndex's `Workflow` API to generate ReAct-style agent traces
- Each trace contains:
  - User query
  - Model thought
  - Tool selection
  - Observation
  - Final answer
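A single trace might look like the following (an illustrative record only: the field names and the `search_labor_law` tool are assumptions, not the project's exact schema):

```python
# One illustrative ReAct trace record (field names are assumptions).
trace = {
    "user_query": "How many days of annual leave is a full-time employee entitled to?",
    "steps": [
        {
            "thought": "This concerns leave entitlements; I should consult the labor-law index.",
            "tool": "search_labor_law",  # hypothetical tool name
            "tool_input": {"query": "annual leave entitlement full-time employee"},
            "observation": "Employees are entitled to 30 days of annual leave after one year of service.",
        }
    ],
    "final_answer": "A full-time employee is entitled to 30 days of annual leave after completing one year of service.",
}
```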
- Hosted Mistral 7B via `vLLM` for high-throughput generation
- Fine-tuned with Unsloth + `PEFT`
- Chat template modified to support the ReAct agent flow: `user → assistant → observation → assistant → ...`
- Enabled `train_on_responses_only` to fine-tune only the `assistant` responses
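The effect of `train_on_responses_only` can be illustrated as label masking: tokens outside the assistant turns receive the ignore index (-100), so only assistant tokens contribute to the cross-entropy loss. This is a dependency-free sketch of the idea, not Unsloth's actual implementation:

```python
IGNORE_INDEX = -100  # the ignore index used by cross-entropy loss in HF Transformers

def mask_non_assistant(token_ids, roles):
    """Return training labels: keep assistant tokens, ignore all other turns.

    token_ids: list of input token ids.
    roles: parallel list giving each token's role ("user", "assistant", "observation").
    """
    return [tid if role == "assistant" else IGNORE_INDEX
            for tid, role in zip(token_ids, roles)]
```

This keeps the model from wasting capacity learning to reproduce user queries or tool observations, which it never needs to generate.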
- Untuned `Gemma3:4b-it` struggled with:
  - Hallucinations
  - Infinite reasoning loops
  - Tool selection failures
After fine-tuning, the model:
- Knows when to stop reasoning
- Selects the correct tool
- Avoids infinite loops and hallucinations
To explore model compression, knowledge distillation was attempted from the 4B model to a 1B model.
- The KL-divergence-based loss in Hugging Face's `GKDTrainer` requires an identical tokenizer vocabulary, but `Gemma3:4b-it` and `Gemma3:1b-it` have different tokenizers
Implemented based on the paper: “Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs”
- Extended `SFTTrainer` instead of `GKDTrainer`
- Performed loss computation per batch (to reduce CUDA memory)
- Used top-K token filtering to reduce unnecessary computation and accelerate training
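The core idea of the Universal Logit Distillation loss is to compare the *sorted* probability distributions of teacher and student, which makes the loss tokenizer-agnostic: only the shape of each distribution is matched, never token ids. Below is a per-position, dependency-free sketch (the project's implementation operates on batched PyTorch tensors; `top_k=50` is an assumed value):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uld_loss(student_logits, teacher_logits, top_k=50):
    """Universal Logit Distillation loss for one sequence position (sketch).

    Sorting probabilities in descending order removes any dependence on
    token ids, so teacher and student may use different tokenizers.
    Top-K filtering drops the long tail of near-zero probabilities.
    """
    s = sorted(softmax(student_logits), reverse=True)[:top_k]
    t = sorted(softmax(teacher_logits), reverse=True)[:top_k]
    # Pad the shorter list with zeros (the two vocabularies differ in size).
    n = max(len(s), len(t))
    s += [0.0] * (n - len(s))
    t += [0.0] * (n - len(t))
    return sum(abs(a - b) for a, b in zip(s, t))
```

Two distributions with the same shape give zero loss regardless of which tokens carry the mass, which is exactly what allows distilling across mismatched tokenizers.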
| Model | Role | Behavior |
|---|---|---|
| `Gemma3:4b-it` | Untuned | Hallucinates, fails tool usage |
| `work-right-uae` | Fine-tuned (ReAct) | Predicts tool + halts correctly |
| `Gemma3:1b-it` | Distilled via ULD | Still under testing |
- Merge summarizer and ReAct agent into one compact model
- Enhance data quality and complexity to enable the model to handle more nuanced queries, multi-turn conversations, and complex reasoning across chat history
- Deploy with `llama.cpp` for CPU-only environments (reduce production cost)
- Further evaluate distilled `1B` model performance
- Models: `Gemma3:4b-it`, `Mistral-7B-AWQ`, `Gemma3:1b-it`
- Libraries: `LlamaIndex`, `Unsloth`, `vLLM`, `PEFT`, `Transformers`, `Ollama`, `ChromaDB`
- Distillation: Custom implementation of Universal Logit Distillation
