OpenNanoScaleLLM is a clean-room, open-source nano-scale Large Language Model (LLM) inspired by the ideas behind Google's internal small-model research, but built entirely in the open using Hugging Face Transformers.
It is designed to be small, fast, infra-aware, tool-aware, and explainable, not just fluent.
This is not a toy fine-tune. It is a full, end-to-end LLM system with training, RAG, tools, evaluation, and live demos.
There is no open-source Google NanoBanana:
- No public repo
- No weights
- No training code
That creates a gap.
OpenNanoScaleLLM fills that gap with:
- A real nano-LLM (~1.5B params)
- Infrastructure & DevOps specialization
- Retrieval-Augmented Generation (RAG)
- Tool-aware reasoning to prevent hallucinations
- Transparent evaluation metrics
All built in a clean-room, reproducible way.
- **Nano-scale**: runs on modest GPUs / CPU when quantized
- **Fast inference**: LoRA + efficient base model
- **Domain-specialized**: cloud, DevOps, Linux, APIs
- **Grounded answers**: RAG + context checks
- **Tool-aware**: asks for logs, regions, APIs when needed
- **Fully open**: Apache-2.0 license
| Attribute | Value |
|---|---|
| Base model | Qwen2.5-1.5B |
| Parameters | ~1.5B |
| Fine-tuning | LoRA (SFT) |
| Context length | 4k tokens |
| License | Apache-2.0 |
| Library | Hugging Face Transformers |
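For reference, a LoRA setup along these lines could live in `configs/lora.yaml`. The hyperparameter values below are illustrative assumptions, not the repo's actual settings:

```yaml
# Illustrative LoRA fine-tuning config (values are assumptions,
# not the project's actual settings).
base_model: Qwen/Qwen2.5-1.5B
max_seq_length: 4096        # matches the 4k context length above
lora:
  r: 16                     # adapter rank
  alpha: 32
  dropout: 0.05
  target_modules: [q_proj, k_proj, v_proj, o_proj]
training:
  learning_rate: 2.0e-4
  epochs: 3
```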
The rag-light-system subproject frames Retrieval-Augmented Generation as a multi-stage observation pipeline, with each stage named after a behavior of light.
Stages:
- Ingestion (Observation)
- Chunking (Diffraction)
- Embedding (Refraction)
- Retrieval (Reflection)
- Reasoning (Interpretation)
- Evaluation (Noise & Hallucination Measurement)
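As a sketch of the chunking (diffraction) stage, the splitter below produces overlapping character windows; the sizes are illustrative defaults, not the pipeline's actual parameters:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Overlap preserves context across chunk boundaries, so a retrieved
    chunk is less likely to cut a sentence in half.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Real ingestion code usually splits on sentence or markdown boundaries rather than raw characters, but the overlap idea is the same.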
| Layer | Choice |
|---|---|
| Backend | FastAPI |
| Vector DB | Chroma (local) / Pinecone (prod) |
| LLM | OpenAI / compatible |
| Embeddings | text-embedding-3-large |
| UI | React + Tailwind |
| PDF Parsing | PyMuPDF |
| Eval | RAGAS-style metrics |
| Deploy | Docker + AWS EC2 |
```
Open-NanoScale-LLM/
├── README.md
├── LICENSE
├── requirements.txt
├── configs/
│   ├── model.yaml
│   ├── training.yaml
│   └── lora.yaml
├── data/
│   ├── raw/
│   │   ├── devops_notes.md
│   │   ├── docker_errors.md
│   │   └── k8s_troubleshooting.md
│   ├── processed/
│   │   ├── instructions.jsonl
│   │   ├── rag_chunks.jsonl
│   │   └── README.md
│   └── samples.jsonl
├── scripts/
│   ├── prepare_data.py
│   ├── train_lora.py
│   ├── merge_lora.py
│   ├── inference.py
│   └── evaluate.py
├── rag/
│   ├── ingest.py
│   ├── retriever.py
│   ├── prompt.py
│   └── qa.py
├── tools/
│   ├── aws.py
│   ├── logs.py
│   └── api.py
├── app/
│   ├── main.py
│   ├── rag_engine.py
│   └── schemas.py
├── ui/
│   └── gradio_app.py
├── evals/
│   ├── test_cases.json
│   ├── metrics.py
│   └── run_eval.py
├── dashboard/
│   └── gradio_eval.py
└── rag-light-system/
    ├── backend/
    │   ├── app.py
    │   ├── ingest.py
    │   ├── retriever.py
    │   ├── llm.py
    │   ├── evaluate.py
    │   ├── config.py
    │   └── requirements.txt
    ├── frontend/
    │   ├── src/App.jsx
    │   ├── src/components/StageView.jsx
    │   └── src/index.css
    ├── docker-compose.yml
    └── README.md
```
```
           ┌───────────────────────────┐
           │       User / Client       │
           │  (CLI, Gradio, FastAPI)   │
           └─────────────┬─────────────┘
                         │
                         ▼
           ┌───────────────────────────┐
           │  OpenNanoScaleLLM Engine  │
           │ (Inference Orchestrator)  │
           └─────────────┬─────────────┘
                         │
    ┌────────────────────┼──────────────────────────┐
    │                    │                          │
    ▼                    ▼                          ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│  Tool Prechecks  │  │  RAG Retriever   │  │  Prompt Builder  │
│  (AWS / Logs /   │  │ (FAISS / Chroma) │  │  System + Rules  │
│   API Context)   │  └────────┬─────────┘  └────────┬─────────┘
└────────┬─────────┘           │                     │
         │                     ▼                     │
         │         ┌──────────────────────┐          │
         │         │  Vector Embeddings   │          │
         │         │   (MiniLM / SBERT)   │          │
         │         └──────────┬───────────┘          │
         │                    │                      │
         └────────────────────┼──────────────────────┘
                              ▼
                ┌───────────────────────────┐
                │     OpenNanoScaleLLM      │
                │  (Qwen2.5-1.5B + LoRA)    │
                └─────────────┬─────────────┘
                              │
                              ▼
                ┌───────────────────────────┐
                │      Final Response       │
                │  (Grounded + Tool-aware)  │
                └───────────────────────────┘
```
Instruction-style JSONL focused on infra reasoning:
- AWS IAM, EC2, S3, ECR
- Docker & Kubernetes
- CI/CD failures
- API debugging
Example:

```json
{
  "instruction": "Why does an EC2 instance fail to access S3?",
  "input": "AccessDenied error",
  "output": "The EC2 instance likely lacks an IAM role or the attached policy does not allow s3:GetObject..."
}
```

```bash
python scripts/prepare_data.py
```

Formats the data into a model-friendly instruction template.
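The exact template `scripts/prepare_data.py` uses is not shown here; a minimal Alpaca-style formatter for records like the one above might look like this (the section headers are an assumption):

```python
def format_instruction(record: dict) -> str:
    """Render one instructions.jsonl record as a single training string.

    Uses an Alpaca-style layout; the actual template in
    scripts/prepare_data.py may differ.
    """
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("input"):  # the input field is optional
        parts.append(f"### Input:\n{record['input']}")
    parts.append(f"### Response:\n{record['output']}")
    return "\n\n".join(parts)
```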
```bash
python scripts/train_lora.py
```

- Efficient
- Low VRAM
- Domain-focused
```bash
python scripts/merge_lora.py
```

Produces a standalone model for inference & upload.
OpenNanoScaleLLM is RAG-first, not RAG-bolted-on.
```
User Question
      ↓
Pre-check (tools)
      ↓
Vector Retrieval (FAISS)
      ↓
Context Assembly
      ↓
Prompt Injection
      ↓
LLM Answer
```
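To make the retrieval step concrete, here is a dependency-free sketch that ranks chunks by cosine similarity over bag-of-words counts; the real pipeline uses dense MiniLM/SBERT embeddings in a FAISS index, so treat this as illustration only:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real pipeline uses dense vectors.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then assembled into the prompt context before the LLM answers.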
- Markdown / PDF docs
- Cloud & DevOps references
- User-supplied documents
Ingest once:

```bash
python rag/ingest.py
```

Run interactive QA:

```bash
python rag/qa.py
```

Instead of guessing, the model asks for missing info.
- AWS: asks for region, account, service
- Logs: requests error logs
- API: asks for endpoint, auth, method
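A precheck of this kind can be as simple as scanning the question for a topic and the context fields it requires. The topics, fields, and matching rules below are hypothetical, not the actual contents of `tools/aws.py`:

```python
# Hypothetical precheck: map topics to the context they require.
REQUIRED_CONTEXT = {
    "aws": ["region"],
    "api": ["endpoint", "method"],
}


def missing_context(question: str) -> list[str]:
    """Return context fields the user still needs to supply."""
    q = question.lower()
    missing = []
    for topic, fields in REQUIRED_CONTEXT.items():
        if topic in q:
            missing.extend(f for f in fields if f not in q)
    return missing
```

When the returned list is non-empty, the model asks for those fields instead of answering.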
Example:

```
User: EC2 cannot pull an image from ECR.
Model: Please confirm the AWS region and ensure the EC2 IAM role has
ecr:GetAuthorizationToken permission.
```
This behavior is intentional.
Most projects skip this. OpenNanoScaleLLM doesn't.
- Keyword Coverage: expected technical concepts
- Groundedness: answer vs. retrieved context
- Hallucination Score: unsupported content
- Refusal Correctness: asks for info instead of guessing
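As one concrete example, keyword coverage reduces to a substring check against the expected concepts for a test case; this is a simplification of whatever `evals/metrics.py` actually implements:

```python
def keyword_coverage(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected technical concepts mentioned in the answer."""
    text = answer.lower()
    if not expected_keywords:
        return 1.0
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)
```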
Run batch evaluation:

```bash
python evals/run_eval.py
```

Launch the evaluation dashboard:

```bash
python dashboard/gradio_eval.py
```

Shows:
- Perβquestion scores
- Hallucination trends
- Grounding quality
Run the FastAPI server:

```bash
uvicorn app.main:app --reload
```

Or launch the Gradio UI:

```bash
python ui/gradio_app.py
```

A real, production-style LLM demo, not a notebook.
- Model: `hf.co/<Naveenub>/Open-NanoScale-LLM`
- Live Space: `hf.co/spaces/<Naveenub>/Open-NanoScale-LLM-demo`
Includes:
- Model card
- License
- Demo UI
Apache License 2.0
You are free to:
- Use commercially
- Modify
- Redistribute
This project is a clean-room, independent open-source implementation.
- Not affiliated with Google
- Not derived from any proprietary NanoBanana system
- No private or restricted data used
- LLM / AI Engineers
- Infra & DevOps Engineers exploring AI
- Researchers interested in small-model systems
- Anyone tired of hype-only LLM repos
- Multi-knowledge-base RAG
- GGUF / Ollama packaging
- Tool execution (not just awareness)
- Deterministic infra mode
OpenNanoScaleLLM is meant to be:
- Readable
- Reproducible
- Honest
- Useful
If you build on it, ship it.