OpenNanoScaleLLM is a clean-room, open-source nano-scale Large Language Model (LLM) inspired by the ideas behind Google's internal small-model research, but built entirely in the open using Hugging Face Transformers.
It is designed to be small, fast, infra-aware, tool-aware, and explainable, not just fluent.
This is not a toy fine-tune. It is a full, end-to-end LLM system with training, RAG, tools, evaluation, and live demos.
There is no open-source Google NanoBanana:
- No public repo
- No weights
- No training code
That creates a gap.
OpenNanoScaleLLM fills that gap with:
- A real nano-LLM (~1.5B params)
- Infrastructure & DevOps specialization
- Retrieval-Augmented Generation (RAG)
- Tool-aware reasoning to prevent hallucinations
- Transparent evaluation metrics
All built in a clean-room, reproducible way.
- **Nano-scale**: runs on modest GPUs / CPU when quantized
- **Fast inference**: LoRA + efficient base model
- **Domain-specialized**: cloud, DevOps, Linux, APIs
- **Grounded answers**: RAG + context checks
- **Tool-aware**: asks for logs, regions, APIs when needed
- **Fully open**: Apache-2.0 license
| Attribute | Value |
|---|---|
| Base model | Qwen2.5-1.5B |
| Parameters | ~1.5B |
| Fine-tuning | LoRA (SFT) |
| Context length | 4k tokens |
| License | Apache-2.0 |
| Library | Hugging Face Transformers |
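For reference, a LoRA setup along these lines could live in `configs/lora.yaml`. The hyperparameter values below are illustrative assumptions, not the repo's actual settings:

```yaml
# Illustrative LoRA fine-tuning config (values are assumptions,
# not the project's actual settings).
base_model: Qwen/Qwen2.5-1.5B
max_seq_length: 4096        # matches the 4k context length above
lora:
  r: 16                     # adapter rank
  alpha: 32
  dropout: 0.05
  target_modules: [q_proj, k_proj, v_proj, o_proj]
training:
  learning_rate: 2.0e-4
  epochs: 3
```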
The rag-light-system subproject frames Retrieval-Augmented Generation as a multi-stage observation pipeline, with each stage named after a behavior of light.
Stages:
- Ingestion (Observation)
- Chunking (Diffraction)
- Embedding (Refraction)
- Retrieval (Reflection)
- Reasoning (Interpretation)
- Evaluation (Noise & Hallucination Measurement)
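As a sketch of the chunking (diffraction) stage, the splitter below produces overlapping character windows; the sizes are illustrative defaults, not the pipeline's actual parameters:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Overlap preserves context across chunk boundaries, so a retrieved
    chunk is less likely to cut a sentence in half.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Real ingestion code usually splits on sentence or markdown boundaries rather than raw characters, but the overlap idea is the same.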
| Layer | Choice |
|---|---|
| Backend | FastAPI |
| Vector DB | Chroma (local) / Pinecone (prod) |
| LLM | OpenAI / compatible |
| Embeddings | text-embedding-3-large |
| UI | React + Tailwind |
| PDF Parsing | PyMuPDF |
| Eval | RAGAS-style metrics |
| Deploy | Docker + AWS EC2 |
```
Open-NanoScale-LLM/
├── README.md
├── LICENSE
├── requirements.txt
├── configs/
│   ├── model.yaml
│   ├── training.yaml
│   └── lora.yaml
├── data/
│   ├── raw/
│   │   ├── devops_notes.md
│   │   ├── docker_errors.md
│   │   └── k8s_troubleshooting.md
│   ├── processed/
│   │   ├── instructions.jsonl
│   │   ├── rag_chunks.jsonl
│   │   └── README.md
│   └── samples.jsonl
├── scripts/
│   ├── prepare_data.py
│   ├── train_lora.py
│   ├── merge_lora.py
│   ├── inference.py
│   └── evaluate.py
├── rag/
│   ├── ingest.py
│   ├── retriever.py
│   ├── prompt.py
│   └── qa.py
├── tools/
│   ├── aws.py
│   ├── logs.py
│   └── api.py
├── app/
│   ├── main.py
│   ├── rag_engine.py
│   └── schemas.py
├── ui/
│   └── gradio_app.py
├── evals/
│   ├── test_cases.json
│   ├── metrics.py
│   └── run_eval.py
├── dashboard/
│   └── gradio_eval.py
└── rag-light-system/
    ├── backend/
    │   ├── app.py
    │   ├── ingest.py
    │   ├── retriever.py
    │   ├── llm.py
    │   ├── evaluate.py
    │   ├── config.py
    │   └── requirements.txt
    ├── frontend/
    │   ├── src/App.jsx
    │   ├── src/components/StageView.jsx
    │   └── src/index.css
    ├── docker-compose.yml
    └── README.md
```
```
           ┌───────────────────────────┐
           │       User / Client       │
           │  (CLI, Gradio, FastAPI)   │
           └─────────────┬─────────────┘
                         │
                         ▼
           ┌───────────────────────────┐
           │  OpenNanoScaleLLM Engine  │
           │ (Inference Orchestrator)  │
           └─────────────┬─────────────┘
                         │
    ┌────────────────────┼──────────────────────────┐
    │                    │                          │
    ▼                    ▼                          ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│  Tool Prechecks  │  │  RAG Retriever   │  │  Prompt Builder  │
│  (AWS / Logs /   │  │ (FAISS / Chroma) │  │  System + Rules  │
│   API Context)   │  └────────┬─────────┘  └────────┬─────────┘
└────────┬─────────┘           │                     │
         │                     ▼                     │
         │         ┌──────────────────────┐          │
         │         │  Vector Embeddings   │          │
         │         │   (MiniLM / SBERT)   │          │
         │         └──────────┬───────────┘          │
         │                    │                      │
         └────────────────────┼──────────────────────┘
                              ▼
                ┌───────────────────────────┐
                │     OpenNanoScaleLLM      │
                │  (Qwen2.5-1.5B + LoRA)    │
                └─────────────┬─────────────┘
                              │
                              ▼
                ┌───────────────────────────┐
                │      Final Response       │
                │  (Grounded + Tool-aware)  │
                └───────────────────────────┘
```
Instruction-style JSONL focused on infra reasoning:
- AWS IAM, EC2, S3, ECR
- Docker & Kubernetes
- CI/CD failures
- API debugging
Example:

```json
{
  "instruction": "Why does an EC2 instance fail to access S3?",
  "input": "AccessDenied error",
  "output": "The EC2 instance likely lacks an IAM role or the attached policy does not allow s3:GetObject..."
}
```

```bash
python scripts/prepare_data.py
```

Formats the data into a model-friendly instruction template.
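The exact template `scripts/prepare_data.py` uses is not shown here; a minimal Alpaca-style formatter for records like the one above might look like this (the section headers are an assumption):

```python
def format_instruction(record: dict) -> str:
    """Render one instructions.jsonl record as a single training string.

    Uses an Alpaca-style layout; the actual template in
    scripts/prepare_data.py may differ.
    """
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("input"):  # the input field is optional
        parts.append(f"### Input:\n{record['input']}")
    parts.append(f"### Response:\n{record['output']}")
    return "\n\n".join(parts)
```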
```bash
python scripts/train_lora.py
```

- Efficient
- Low VRAM
- Domain-focused
```bash
python scripts/merge_lora.py
```

Produces a standalone model for inference & upload.
OpenNanoScaleLLM is RAG-first, not RAG-bolted-on.
```
User Question
      ↓
Pre-check (tools)
      ↓
Vector Retrieval (FAISS)
      ↓
Context Assembly
      ↓
Prompt Injection
      ↓
LLM Answer
```
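To make the retrieval step concrete, here is a dependency-free sketch that ranks chunks by cosine similarity over bag-of-words counts; the real pipeline uses dense MiniLM/SBERT embeddings in a FAISS index, so treat this as illustration only:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real pipeline uses dense vectors.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then assembled into the prompt context before the LLM answers.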
- Markdown / PDF docs
- Cloud & DevOps references
- User-supplied documents
Ingest once:

```bash
python rag/ingest.py
```

Run interactive QA:

```bash
python rag/qa.py
```

Instead of guessing, the model asks for missing info.
- AWS: asks for region, account, service
- Logs: requests error logs
- API: asks for endpoint, auth, method
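A precheck of this kind can be as simple as scanning the question for a topic and the context fields it requires. The topics, fields, and matching rules below are hypothetical, not the actual contents of `tools/aws.py`:

```python
# Hypothetical precheck: map topics to the context they require.
REQUIRED_CONTEXT = {
    "aws": ["region"],
    "api": ["endpoint", "method"],
}


def missing_context(question: str) -> list[str]:
    """Return context fields the user still needs to supply."""
    q = question.lower()
    missing = []
    for topic, fields in REQUIRED_CONTEXT.items():
        if topic in q:
            missing.extend(f for f in fields if f not in q)
    return missing
```

When the returned list is non-empty, the model asks for those fields instead of answering.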
Example:

```
User: EC2 cannot pull an image from ECR.
Model: Please confirm the AWS region and ensure the EC2 IAM role has
ecr:GetAuthorizationToken permission.
```
This behavior is intentional.
Most projects skip this. OpenNanoScaleLLM doesn't.
- Keyword Coverage: expected technical concepts
- Groundedness: answer vs. retrieved context
- Hallucination Score: unsupported content
- Refusal Correctness: asks for info instead of guessing
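As one concrete example, keyword coverage reduces to a substring check against the expected concepts for a test case; this is a simplification of whatever `evals/metrics.py` actually implements:

```python
def keyword_coverage(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected technical concepts mentioned in the answer."""
    text = answer.lower()
    if not expected_keywords:
        return 1.0
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)
```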
Run batch evaluation:

```bash
python evals/run_eval.py
```

Launch the evaluation dashboard:

```bash
python dashboard/gradio_eval.py
```

Shows:
- Perβquestion scores
- Hallucination trends
- Grounding quality
Run the FastAPI server:

```bash
uvicorn app.main:app --reload
```

Or launch the Gradio UI:

```bash
python ui/gradio_app.py
```

A real, production-style LLM demo, not a notebook.
- Model: `hf.co/<Naveenub>/Open-NanoScale-LLM`
- Live Space: `hf.co/spaces/<Naveenub>/Open-NanoScale-LLM-demo`
Includes:
- Model card
- License
- Demo UI
Apache License 2.0
You are free to:
- Use commercially
- Modify
- Redistribute
This project is a clean-room, independent open-source implementation.
- Not affiliated with Google
- Not derived from any proprietary NanoBanana system
- No private or restricted data used
- LLM / AI Engineers
- Infra & DevOps Engineers exploring AI
- Researchers interested in small-model systems
- Anyone tired of hype-only LLM repos
- Multi-knowledge-base RAG
- GGUF / Ollama packaging
- Tool execution (not just awareness)
- Deterministic infra mode
OpenNanoScaleLLM is meant to be:
- Readable
- Reproducible
- Honest
- Useful
If you build on it, ship it.