NanoAgent is a 135M-parameter, open-source language model with an 8k context length, designed for agentic tasks such as tool calling, instruction following, and lightweight reasoning.
It's small enough (~135 MB in 8-bit) to run on edge devices like personal laptops, low-memory CPUs, and even wearables, yet smart enough to make tool calls, parse web information, and give structured answers.
Quick inference resource: here
Huggingface Model: NanoAgent-135M
Run in Ollama: ollama run quwsarohi/NanoAgent
- Runs on edge devices: laptops, smartwatches, browsers, or CPU-only environments.
- Parses and answers from the web: supports tool calling to fetch real-time information.
- Answers recent questions with live web search tools.
- Continues conversations, making it ideal for assistant or agent frameworks.
- Tool calling support enables chaining multiple tools and parsing their results to produce final answers.
| Capability | Description |
|---|---|
| Basic conversation | Casual small talk |
| Information retrieval | e.g., "How to bake a cake?", "Weather in Toronto" via web search; extracts answers from information returned by tools (scraping/search) |
| Tool calling | Single and multi-tool calls with structured explanation |
| Question decomposition | Breaks complex questions into steps |
| Question classification | Identifies the type of user query (e.g., fact, reasoning, instruction) |
| Following system prompts | Responds properly to system-level instructions |
| Writing emails and tasks | Writes emails and structured messages |
- Base model: SmolLM2-135M-Instruct (instruction-tuned)
- Fine-tuning method: Dynamic Fine-Tuning (DFT) and Supervised Fine-Tuning
- Platform: Apple Mac M1 (16 GB) using the MLX framework
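Dynamic Fine-Tuning is commonly described as standard SFT with each token's cross-entropy loss rescaled by the model's own (detached) probability for that token, so tokens the model already rates as unlikely contribute less to the update. A toy, framework-free sketch of that per-token loss, assuming that formulation; this is not taken from this repo's training code:

```python
import math

def dft_token_loss(logits, target_idx):
    """Per-token DFT loss on a toy vocabulary (illustration only, no autograd)."""
    m = max(logits)                              # stabilized softmax
    exps = [math.exp(x - m) for x in logits]
    p_target = exps[target_idx] / sum(exps)      # model's probability of the target token
    nll = -math.log(p_target)                    # standard SFT cross-entropy term
    # DFT reweights the cross-entropy by the detached target probability,
    # down-weighting tokens the model considers unlikely.
    return p_target * nll

loss = dft_token_loss([2.0, 0.5, -1.0], target_idx=0)  # ~0.19 vs. plain SFT loss of ~0.24
```

In a real training loop the rescaling factor would be a stop-gradient copy of the probability, leaving the gradient path through the log-probability term only.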
This model was trained using a combination of datasets under different open licenses.
Each dataset retains its original license, and use of those datasets is subject to their respective terms.
| Dataset | Purpose | License |
|---|---|---|
| microsoft/orca-math-word-problems-200k | Math reasoning, word-level reasoning | MIT |
| allenai/tulu-3-sft-personas-instruction-following | Instruction following with personas | Open Data Commons License Attribution |
| mlabonne/orca-agentinstruct-1M-v1-cleaned | RAG, MCQ, JSON parsing, text classification | Community Data License Agreement - Permissive, Version 2.0 |
| HuggingFaceTB/smoltalk (systemchats-30k) | General conversation, system prompts | Apache-2.0 |
| HuggingFaceTB/smoltalk (everyday-conversations) | Everyday conversation | Apache-2.0 |
| nvidia/Nemotron-Instruction-Following-Chat-v1 | Instruction following, structured outputs | NVIDIA Open Model License |
| Dataset | Purpose | License |
|---|---|---|
| Locutusque/function-calling-chatml | Tool call response formatting | Apache-2.0 |
| Salesforce/xlam-function-calling-60k | Function calling coverage | Creative Commons Attribution 4.0 |
| nemotron/interactive_agent (local) | Tool calling, agentic behavior | NVIDIA Open Model License |
- Dataset deduplication significantly improved performance by removing noisy or duplicate Q/A pairs.
- Shortening casual responses and using shorter Python code during training improved performance and reduced repeated token generation.
- Word-level reasoning from orca-math enhanced the model's ability to handle stepwise logic.
- Designing tool calling prompts from six open-source tool calling datasets resulted in stronger structured output generation.
- Tool calling integration enabled the model to extract answers from parsed web data, supporting up-to-date queries.
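The deduplication step mentioned above can be approximated with a simple normalize-and-hash pass over Q/A pairs; a minimal sketch, where the normalization rules are illustrative rather than the project's actual pipeline:

```python
import hashlib
import re

def normalize(text):
    """Lowercase, collapse whitespace, and strip punctuation-level noise."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    return re.sub(r"[^\w\s?]", "", text)

def dedupe(pairs):
    """Keep the first occurrence of each normalized (question, answer) pair."""
    seen, kept = set(), []
    for q, a in pairs:
        key = hashlib.sha256((normalize(q) + "\x00" + normalize(a)).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append((q, a))
    return kept

pairs = [
    ("How to bake a cake?", "Mix, then bake at 180C."),
    ("how  to bake a CAKE?", "Mix, then bake at 180C!"),  # near-duplicate
    ("Weather in Toronto?", "Use a weather tool."),
]
print(len(dedupe(pairs)))  # the near-duplicate collapses after normalization -> 2
```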
| Benchmark | SmolLM2-135M-Instruct | NanoAgent |
|---|---|---|
| Commonsense QA (acc) | 20.88% | 20.23% |
| IFEval (prompt strict) | 21.63% | 29.94% |
| IFEval (inst strict) | 35.01% | 42.33% |
| IFEval (prompt loose) | 23.84% | 32.16% |
| IFEval (inst loose) | 37.65% | 45.32% |
| tinyArc (acc_norm) | 33.76% | 36.47% |
| tinyGSM8k (exact_match) | 0.55% | 2.31% |
| tinyHellaswag (acc_norm) | 42.20% | 43.45% |
| tinyMMLU (acc_norm) | 26.79% | 27.62% |
| tinyTruthfulQA (acc) | 38.65% | 40.45% |
| tinyWinogrande (acc_norm) | 46.48% | 42.86% |
| Category | Accuracy | Correct/Total |
|---|---|---|
| Overall | 28.99% | 725/2501 |
| parallel | 56.50% | 113/200 |
| parallel_multiple | 54.50% | 109/200 |
| simple_python | 41.50% | 166/400 |
| simple_javascript | 40.00% | 20/50 |
| multiple | 31.50% | 63/200 |
| live_simple | 28.29% | 73/258 |
| simple_java | 27.00% | 27/100 |
| live_parallel | 37.50% | 6/16 |
| live_parallel_multiple | 25.00% | 6/24 |
| live_multiple | 13.49% | 142/1053 |
*All evaluations were conducted using greedy decoding (`do_sample=False` during HuggingFace inference).
- NanoAgent significantly outperforms the base SmolLM2-135M-Instruct on instruction following (IFEval), with +8-10% improvements across all metrics.
- NanoAgent improves on tinyMMLU, tinyTruthfulQA, and tinyHellaswag over the base model.
- Tool calling: only NanoAgent supports tool calling; SmolLM2-135M-Instruct does not.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
def inference(messages, max_new_tokens=256, temperature=0.3, **kwargs):
input_text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
inputs,
max_new_tokens=max_new_tokens,
do_sample=True,
temperature=temperature,
**kwargs
)
return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))

NanoAgent uses a JSON-based tool calling format:
import json
tools = [
{
"type": "function",
"function": {
"name": "web_search",
"description": "Performs a web search and returns formatted results.",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "The search query."}
},
"required": ["query"],
},
}
}
]
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible tools that you can execute to retrieve information or to perform specific actions. You can execute zero or more tools to answer user question.
Here are the list of tools that you have access to:
```json
{tools}
```
Only execute tools from above. Follow the below JSON signature to execute tools:
```json
[{{"name": "tool_name", "arguments": {{"arg1": "val1", ...}}}}, ...]
```
"""
messages = [
{"role": "system", "content": TOOL_TEMPLATE.format(tools=json.dumps(tools, indent=2))},
{"role": "user", "content": "What's the latest AI news?"},
]
response = inference(messages, max_new_tokens=512)
print(response)
# Output: ```json
# [{"name": "web_search", "arguments": {"query": "latest AI news 2026"}}]
# ```

- Benchmark more agentic tasks
- Explore GRPO for tool calling improvement
- Experiment with weight merging
- Evaluate multi-turn tool chaining
- Further refine datasets for stability
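As a usage note on the tool-calling flow shown earlier: the fenced JSON the model emits can be extracted and dispatched with the standard library alone. A minimal sketch, where the `web_search` stub and the hard-coded `response` string stand in for a real tool and real model output:

```python
import json
import re

def web_search(query):
    """Illustrative stub; a real implementation would call a search API."""
    return f"Top results for: {query}"

TOOLS = {"web_search": web_search}

def run_tool_calls(response):
    """Extract the model's JSON tool-call block and dispatch each call."""
    match = re.search(r"```json\s*(.*?)```", response, re.DOTALL)
    if not match:
        return []  # the model answered directly, no tool call
    results = []
    for call in json.loads(match.group(1)):
        fn = TOOLS.get(call["name"])
        if fn is not None:  # ignore calls to tools that were never offered
            results.append(fn(**call["arguments"]))
    return results

# Stand-in for an actual model response in NanoAgent's format
response = '```json\n[{"name": "web_search", "arguments": {"query": "latest AI news"}}]\n```'
for result in run_tool_calls(response):
    print(result)
```

Appending each result to the conversation as a follow-up message and calling the model again would close the loop into the multi-turn tool chaining described in the feature list.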
NanoAgent/
├── benchmarks/          # Benchmark results and evaluations
│   ├── results/
│   └── bfcl/
├── config/              # Configuration files
│   ├── lm_eval/
│   └── mergekit/
├── data/                # Dataset preparation and processing
│   ├── dataprep.py
│   ├── grpo/            # GRPO-specific tools and data
│   └── utils.py
├── grpo/                # GRPO training scripts
│   └── grpo-mlx.py
├── notebooks/           # Jupyter notebooks
│   └── inference.ipynb
├── sft/                 # Supervised Fine-Tuning
│   └── train-mlx.py
├── utils/               # Utility scripts
│   ├── gguf_conv.py
│   ├── tokenizer.py
│   └── webtool.py
├── weights/             # Model weights
├── LICENSE              # Apache 2.0 license
├── NOTICE               # Notices and attributions
├── README.md            # Project overview
└── requirements.txt     # Python dependencies
This project (code, model weights, and training recipes) is licensed under the Apache License 2.0.
- Model & code are © quwsarohi, licensed under Apache 2.0.
- Portions of the training data were sourced from third-party datasets under CDLA-P 2.0, MIT, CC-BY 4.0, ODC-BY, and Apache 2.0.
- The licensors of these datasets do not endorse this project or its outputs.
- If you redistribute or fine-tune this model, ensure your use complies with all applicable dataset licenses.