NanoAgent: A 135M-Parameter Agentic SLM

NanoAgent is a 135M-parameter, 8k-context-length, open-source language model designed for agentic tasks such as tool calling, instruction following, and lightweight reasoning.
It is small enough (~135 MB in 8-bit) to run on edge devices such as personal laptops, low-memory CPUs, and even wearables, yet capable enough to make tool calls, parse web information, and give structured answers.
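The "~135 MB in 8-bit" figure is just parameter count times bytes per parameter. A quick sketch of that arithmetic (weights only; the KV cache and runtime buffers add overhead on top):

```python
def approx_weight_size_mb(n_params: int, bits_per_param: int) -> float:
    """Rough weight footprint in MB: parameters * bits per parameter, in bytes."""
    return n_params * bits_per_param / 8 / 1e6

# 135M parameters at common precisions
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: ~{approx_weight_size_mb(135_000_000, bits):.1f} MB")
```

At 8 bits this gives ~135 MB, matching the figure above; 4-bit quantization would roughly halve that again.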

Quick inference resource: here

Hugging Face model: quwsarohi/NanoAgent-135M

Run in Ollama: ollama run quwsarohi/NanoAgent

๐ŸŒ Real-World Use Cases

  • ๐Ÿ•น๏ธ Runs on edge devices โ€” laptops, smartwatches, browsers, or CPU-only environments.
  • ๐ŸŒ Parses and answers from the web โ€” supports tool calling to fetch real-time information.
  • ๐Ÿ”Ž Answers recent questions with live web search tools.
  • ๐Ÿ’ฌ Continues conversations โ€” ideal for assistant or agent frameworks.
  • โš™๏ธ Tool calling support enables chaining multiple tools and parsing results to produce final answers.

What NanoAgent Supports

| Capability | Description |
| --- | --- |
| Basic conversation | Casual small talk |
| Information retrieval | e.g., "How to bake a cake?", "Weather in Toronto" via web search; extracts answers from information returned by tools (scraping/search) |
| Tool calling | Single and multi-tool calls with structured explanation |
| Question decomposition | Breaks complex questions into steps |
| Question classification | Identifies the type of user query (e.g., fact, reasoning, instruction) |
| Following system prompts | Responds properly to system-level instructions |
| Writing emails and tasks | Writes emails and structured messages |

Training Overview

Datasets Used

This model was trained using a combination of datasets under different open licenses.
Each dataset retains its original license, and use of those datasets is subject to their respective terms.

General Training (SFT)

| Dataset | Purpose | License |
| --- | --- | --- |
| microsoft/orca-math-word-problems-200k | Math reasoning, word-level reasoning | MIT |
| allenai/tulu-3-sft-personas-instruction-following | Instruction following with personas | Open Data Commons License Attribution |
| mlabonne/orca-agentinstruct-1M-v1-cleaned | RAG, MCQ, JSON parsing, text classification | Community Data License Agreement - Permissive, Version 2.0 |
| HuggingFaceTB/smoltalk (systemchats-30k) | General conversation, system prompts | Apache-2.0 |
| HuggingFaceTB/smoltalk (everyday-conversations) | Everyday conversation | Apache-2.0 |
| nvidia/Nemotron-Instruction-Following-Chat-v1 | Instruction following, structured outputs | NVIDIA Open Model License |

Function Calling Training

| Dataset | Purpose | License |
| --- | --- | --- |
| Locutusque/function-calling-chatml | Tool call response formatting | Apache-2.0 |
| Salesforce/xlam-function-calling-60k | Function calling coverage | Creative Commons Attribution 4.0 |
| nemotron/interactive_agent (local) | Tool calling, agentic behavior | NVIDIA Open Model License |

Key Explorations & Findings

  • Dataset deduplication significantly improved performance by removing noisy or duplicate Q/A pairs.
  • Shortening responses (for casual conversation) and using shorter Python code in training improved performance and reduced repeated token generation.
  • Word-level reasoning from orca-math enhanced the model's ability to handle stepwise logic.
  • Designing tool-calling prompts from six open-source tool-calling datasets resulted in stronger structured output generation.
  • Tool-calling integration enabled the model to extract answers from parsed web data, supporting up-to-date queries.

Benchmark

Model Comparison

| Benchmark | SmolLM2-135M-Instruct | NanoAgent |
| --- | --- | --- |
| Commonsense QA (acc) | 20.88% | 20.23% |
| IFEval (prompt strict) | 21.63% | 29.94% |
| IFEval (inst strict) | 35.01% | 42.33% |
| IFEval (prompt loose) | 23.84% | 32.16% |
| IFEval (inst loose) | 37.65% | 45.32% |
| tinyArc (acc_norm) | 33.76% | 36.47% |
| tinyGSM8k (exact_match) | 0.55% | 2.31% |
| tinyHellaswag (acc_norm) | 42.20% | 43.45% |
| tinyMMLU (acc_norm) | 26.79% | 27.62% |
| tinyTruthfulQA (acc) | 38.65% | 40.45% |
| tinyWinogrande (acc_norm) | 46.48% | 42.86% |

BFCL Benchmark (Tool Calling)

| Category | Accuracy | Correct/Total |
| --- | --- | --- |
| Overall | 28.99% | 725/2501 |
| parallel | 56.50% | 113/200 |
| parallel_multiple | 54.50% | 109/200 |
| simple_python | 41.50% | 166/400 |
| simple_javascript | 40.00% | 20/50 |
| multiple | 31.50% | 63/200 |
| live_simple | 28.29% | 73/258 |
| simple_java | 27.00% | 27/100 |
| live_parallel | 37.50% | 6/16 |
| live_parallel_multiple | 25.00% | 6/24 |
| live_multiple | 13.49% | 142/1053 |

*All evaluations were conducted using greedy decoding (sampling parameter was set to false during HuggingFace inference).

Key Findings

  • NanoAgent significantly outperforms the base SmolLM2-135M-Instruct on instruction following (IFEval), with gains of roughly 8-10 percentage points across all metrics.
  • NanoAgent also improves on tinyMMLU, tinyTruthfulQA, and tinyHellaswag over the base model.
  • Tool calling: NanoAgent supports tool calling; SmolLM2-135M-Instruct does not.

Example Usage

Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

def inference(messages, max_new_tokens=256, temperature=0.3, **kwargs):
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
        **kwargs
    )
    return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))
```
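Multi-turn chat just means carrying the growing messages list across calls. A minimal helper (hypothetical, not part of the repo; `respond` stands in for the `inference` function above):

```python
def chat_turn(messages, user_text, respond):
    """Append a user turn, generate a reply, and record it so context carries over."""
    messages.append({"role": "user", "content": user_text})
    reply = respond(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

# With the real model: chat_turn(history, "Hi!", inference)
# Demonstrated here with a stub responder:
history = []
chat_turn(history, "Hi! Do you have a name?", lambda msgs: "I'm NanoAgent.")
chat_turn(history, "What can you do?", lambda msgs: "Tool calling and chat.")
print(len(history))  # 4: two user turns, two assistant turns
```

Keep an eye on the 8k context limit: long sessions eventually need truncation or summarization of older turns.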

Tool Calling

NanoAgent uses a JSON-based tool calling format:

````python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Performs a web search and returns formatted results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."}
                },
                "required": ["query"],
            },
        }
    }
]

TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible tools that you can execute to retrieve information or to perform specific actions. You can execute zero or more tools to answer the user's question.

Here is the list of tools that you have access to:
```json
{tools}
```

Only execute tools from above. Follow the JSON signature below to execute tools:
```json
[{{"name": "tool_name", "arguments": {{"arg1": "val1", ...}}}}, ...]
```
"""

messages = [
    {"role": "system", "content": TOOL_TEMPLATE.format(tools=json.dumps(tools, indent=2))},
    {"role": "user", "content": "What's the latest AI news?"},
]
response = inference(messages, max_new_tokens=512)
print(response)

# Output: ```json
# [{"name": "web_search", "arguments": {"query": "latest AI news 2026"}}]
# ```
````
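To act on the model's reply, the JSON block has to be extracted and dispatched to real functions. A hypothetical parsing loop (the extraction regex and the `TOOLS` registry are illustrative, not part of the repo):

```python
import json
import re

# Illustrative stub; a real registry would map names to actual implementations.
TOOLS = {"web_search": lambda query: f"Results for: {query}"}

def parse_tool_calls(response: str):
    """Pull the JSON array out of a fenced json block, or treat the reply as bare JSON."""
    match = re.search(r"```json\s*(.*?)\s*```", response, re.DOTALL)
    payload = match.group(1) if match else response
    return json.loads(payload)

def run_tool_calls(response: str):
    """Execute each requested tool and collect its result for the next turn."""
    results = []
    for call in parse_tool_calls(response):
        fn = TOOLS[call["name"]]
        results.append({"name": call["name"], "result": fn(**call["arguments"])})
    return results

reply = '```json\n[{"name": "web_search", "arguments": {"query": "latest AI news"}}]\n```'
print(run_tool_calls(reply))
```

For multi-step agent loops, the collected results would be appended to the conversation (e.g., as a tool-result message) and the model called again to produce the final answer.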

Roadmap

  • Benchmark more agentic tasks
  • Explore GRPO for tool-calling improvement
  • Experiment with weight merging
  • Evaluate multi-turn tool chaining
  • Further refine datasets for stability

Directory Tree

```
NanoAgent/
├── benchmarks/             # Benchmark results and evaluations
│   └── results/
│       └── bfcl/
├── config/                 # Configuration files
│   ├── lm_eval/
│   └── mergekit/
├── data/                   # Dataset preparation and processing
│   ├── dataprep.py
│   ├── grpo/               # GRPO-specific tools and data
│   └── utils.py
├── grpo/                   # GRPO training scripts
│   └── grpo-mlx.py
├── notebooks/              # Jupyter notebooks
│   └── inference.ipynb
├── sft/                    # Supervised fine-tuning
│   └── train-mlx.py
├── utils/                  # Utility scripts
│   ├── gguf_conv.py
│   ├── tokenizer.py
│   └── webtool.py
├── weights/                # Model weights
├── LICENSE                 # Apache 2.0 license
├── NOTICE                  # Notices and attributions
├── README.md               # Project overview
└── requirements.txt        # Python dependencies
```

License

This project (code, model weights, and training recipes) is licensed under the Apache License 2.0.

Notice

  • Model & code are © quwsarohi, licensed under Apache 2.0.
  • Portions of the training data were sourced from third-party datasets under CDLA-P 2.0, MIT, CC-BY 4.0, ODC-BY, and Apache 2.0.
  • The licensors of these datasets do not endorse this project or its outputs.
  • If you redistribute or fine-tune this model, ensure your use complies with all applicable dataset licenses.
