Naveenub/Open-NanoScale-LLM
🍌 OpenNanoScaleLLM

OpenNanoScaleLLM is a clean‑room, open‑source nano‑scale Large Language Model (LLM) inspired by the ideas behind Google’s internal small‑model research, but built entirely in the open using Hugging Face Transformers.

It is designed to be small, fast, infra‑aware, tool‑aware, and explainable, not just fluent.

This is not a toy fine‑tune. It is a full, end‑to‑end LLM system with training, RAG, tools, evaluation, and live demos.


🚀 Why OpenNanoScaleLLM Exists

There is no open‑source Google NanoBanana:

  • No public repo
  • No weights
  • No training code

That creates a gap.

OpenNanoScaleLLM fills that gap with:

  • A real nano‑LLM (≈1.5B params)
  • Infrastructure & DevOps specialization
  • Retrieval‑Augmented Generation (RAG)
  • Tool‑aware reasoning to prevent hallucinations
  • Transparent evaluation metrics

All built in a clean‑room, reproducible way.


🧠 Core Design Goals

  • 🧩 Nano‑scale – runs on modest GPUs / CPU when quantized
  • ⚡ Fast inference – LoRA + efficient base model
  • 🎯 Domain‑specialized – cloud, DevOps, Linux, APIs
  • 🔍 Grounded answers – RAG + context checks
  • 🛠️ Tool‑aware – asks for logs, regions, APIs when needed
  • 🔓 Fully open – Apache‑2.0 license

πŸ“ Model Overview

Attribute Value
Base model Qwen2.5-1.5B
Parameters ~1.5B
Fine‑tuning LoRA (SFT)
Context length 4k tokens
License Apache‑2.0
Library Hugging Face Transformers

System Overview

The RAG pipeline is modeled as a multi‑stage observation pipeline, with each stage named after a behavior of light.

Stages:

  1. Ingestion (Observation)
  2. Chunking (Diffraction)
  3. Embedding (Refraction)
  4. Retrieval (Reflection)
  5. Reasoning (Interpretation)
  6. Evaluation (Noise & Hallucination Measurement)
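As a rough illustration of the chunking stage, a fixed‑size chunker with overlap is the simplest approach. This is a minimal sketch, not the actual `rag-light-system/backend/ingest.py` implementation, which may use different sizes or token‑based splitting:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    so content cut at a boundary still appears with context in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Overlap trades a little index size for better retrieval recall at chunk boundaries.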

Tech Stack

| Layer       | Choice                           |
|-------------|----------------------------------|
| Backend     | FastAPI                          |
| Vector DB   | Chroma (local) / Pinecone (prod) |
| LLM         | OpenAI / compatible              |
| Embeddings  | text-embedding-3-large           |
| UI          | React + Tailwind                 |
| PDF Parsing | PyMuPDF                          |
| Eval        | RAGAS‑style metrics              |
| Deploy      | Docker + AWS EC2                 |

πŸ—οΈ Repository Structure

Open-NanoScale-LLM/
β”œβ”€β”€ README.md
β”œβ”€β”€ LICENSE
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ configs/
β”‚   β”œβ”€β”€ model.yaml
β”‚   β”œβ”€β”€ training.yaml
β”‚   └── lora.yaml
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ raw/
β”‚   β”‚   β”œβ”€β”€ devops_notes.md
β”‚   β”‚   β”œβ”€β”€ docker_errors.md
β”‚   β”‚   └── k8s_troubleshooting.md
β”‚   β”œβ”€β”€ processed/
β”‚   β”‚   β”œβ”€β”€ instructions.jsonl
β”‚   β”‚   β”œβ”€β”€ rag_chunks.jsonl
β”‚   β”‚   └── README.md
β”‚   └── samples.jsonl
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ prepare_data.py
β”‚   β”œβ”€β”€ train_lora.py
β”‚   β”œβ”€β”€ merge_lora.py
β”‚   β”œβ”€β”€ inference.py
β”‚   └── evaluate.py
β”œβ”€β”€ rag/
β”‚   β”œβ”€β”€ ingest.py
β”‚   β”œβ”€β”€ retriever.py
β”‚   β”œβ”€β”€ prompt.py
β”‚   └── qa.py
β”œβ”€β”€ tools/
β”‚   β”œβ”€β”€ aws.py
β”‚   β”œβ”€β”€ logs.py
β”‚   └── api.py
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ main.py
β”‚   β”œβ”€β”€ rag_engine.py
β”‚   └── schemas.py
β”œβ”€β”€ ui/
β”‚   └── gradio_app.py
β”œβ”€β”€ evals/
β”‚   β”œβ”€β”€ test_cases.json
β”‚   β”œβ”€β”€ metrics.py
β”‚   └── run_eval.py
β”œβ”€β”€ dashboard/
β”‚   └── gradio_eval.py
└── rag-light-system/
    β”œβ”€β”€ backend/
    β”‚   β”œβ”€β”€ app.py
    β”‚   β”œβ”€β”€ ingest.py
    β”‚   β”œβ”€β”€ retriever.py
    β”‚   β”œβ”€β”€ llm.py
    β”‚   β”œβ”€β”€ evaluate.py
    β”‚   β”œβ”€β”€ config.py
    β”‚   └── requirements.txt
    β”œβ”€β”€ frontend/
    β”‚   β”œβ”€β”€ src/App.jsx
    β”‚   β”œβ”€β”€ src/components/StageView.jsx
    β”‚   └── src/index.css
    β”œβ”€β”€ docker-compose.yml
    └── README.md

🧱 Architecture Diagram

```
                     ┌───────────────────────────┐
                     │       User / Client       │
                     │  (CLI, Gradio, FastAPI)   │
                     └─────────────┬─────────────┘
                                   │
                                   ▼
                     ┌───────────────────────────┐
                     │  OpenNanoScaleLLM Engine  │
                     │ (Inference Orchestrator)  │
                     └─────────────┬─────────────┘
                                   │
           ┌───────────────────────┼───────────────────────┐
           │                       │                       │
           ▼                       ▼                       ▼
 ┌─────────────────┐   ┌─────────────────────┐   ┌─────────────────┐
 │ Tool Prechecks  │   │   RAG Retriever     │   │ Prompt Builder  │
 │ (AWS / Logs /   │   │ (FAISS / Chroma)    │   │ System + Rules  │
 │  API Context)   │   └──────────┬──────────┘   └────────┬────────┘
 └────────┬────────┘              │                       │
          │                       ▼                       │
          │          ┌────────────────────────┐           │
          │          │   Vector Embeddings    │           │
          │          │   (MiniLM / SBERT)     │           │
          │          └────────────┬───────────┘           │
          │                       │                       │
          └───────────────────────┼───────────────────────┘
                                  ▼
                     ┌───────────────────────────┐
                     │    OpenNanoScaleLLM       │
                     │   (Qwen2.5-1.5B + LoRA)   │
                     └─────────────┬─────────────┘
                                   │
                                   ▼
                     ┌───────────────────────────┐
                     │      Final Response       │
                     │  (Grounded + Tool-aware)  │
                     └───────────────────────────┘
```

🧪 Training Pipeline

1️⃣ Dataset

Instruction‑style JSONL focused on infra reasoning:

  • AWS IAM, EC2, S3, ECR
  • Docker & Kubernetes
  • CI/CD failures
  • API debugging

Example:

```json
{
  "instruction": "Why does an EC2 instance fail to access S3?",
  "input": "AccessDenied error",
  "output": "The EC2 instance likely lacks an IAM role or the attached policy does not allow s3:GetObject..."
}
```

2️⃣ Data Preparation

```
python scripts/prepare_data.py
```

Formats data into a model‑friendly instruction template.
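For illustration, turning each JSONL record into a single training string might look like the sketch below. The exact template names and markers are assumptions; the real one lives in `scripts/prepare_data.py`:

```python
import json

# Hypothetical Alpaca-style template; the script's actual template may differ.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def format_example(record: dict) -> str:
    """Render one instruction-tuning record into the prompt template."""
    return TEMPLATE.format(
        instruction=record["instruction"],
        input=record.get("input", ""),
        output=record["output"],
    )

line = '{"instruction": "Why does an EC2 instance fail to access S3?", "input": "AccessDenied error", "output": "Check the IAM role..."}'
print(format_example(json.loads(line)))
```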


3️⃣ LoRA Fine‑Tuning

```
python scripts/train_lora.py
```

  • Efficient
  • Low VRAM
  • Domain‑focused

4️⃣ Merge LoRA

```
python scripts/merge_lora.py
```

Produces a standalone model for inference & upload.
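Conceptually, merging folds the low‑rank update back into the base weights: W_merged = W + (alpha/r)·B·A, after which the adapter is no longer needed at inference time. A numpy sketch of that identity (PEFT‑based scripts typically do this via `merge_and_unload`; the dimensions here are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA down-projection
B = rng.normal(size=(d_out, r))      # LoRA up-projection
scale = alpha / r

x = rng.normal(size=(d_in,))

# Adapter-style forward pass: base path plus scaled low-rank path
y_adapter = W @ x + scale * (B @ (A @ x))

# Merged forward pass: fold the update into a single matrix
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapter, y_merged)
```

The merged matrix has the same shape as the original, so the standalone model runs with zero extra inference cost.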


πŸ” Retrieval‑Augmented Generation (RAG)

OpenNanoBanana is RAG‑first, not RAG‑bolted‑on.

RAG Flow

```
User Question
   ↓
Pre‑check (tools)
   ↓
Vector Retrieval (FAISS)
   ↓
Context Assembly
   ↓
Prompt Injection
   ↓
LLM Answer
```
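At its core, the Vector Retrieval step is nearest‑neighbour search over embeddings. A dependency‑free sketch of top‑k cosine retrieval (the repo uses FAISS/Chroma with real embedding models; the 3‑dimensional vectors here are purely illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

index = [
    ("IAM roles grant EC2 access to S3", [0.9, 0.1, 0.0]),
    ("Dockerfiles define container images", [0.0, 0.2, 0.9]),
    ("S3 buckets store objects", [0.8, 0.3, 0.1]),
]
print(retrieve([1.0, 0.0, 0.0], index))
# → ['IAM roles grant EC2 access to S3', 'S3 buckets store objects']
```

FAISS replaces the linear scan above with an approximate index, but the ranking idea is the same.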

Knowledge Sources

  • Markdown / PDF docs
  • Cloud & DevOps references
  • User‑supplied documents

Ingest once:

```
python rag/ingest.py
```

Run interactive QA:

```
python rag/qa.py
```

πŸ› οΈ Tool‑Aware Reasoning (Anti‑Hallucination)

Instead of guessing, the model asks for missing info.

Built‑in Tool Signals

  • AWS β†’ asks for region, account, service
  • Logs β†’ requests error logs
  • API β†’ asks for endpoint, auth, method

Example:

```
User:  EC2 cannot pull image from ECR
Model: Please confirm the AWS region and ensure the EC2 IAM role has
       ecr:GetAuthorizationToken permission.
```

This is intentional and by design.
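A minimal version of such a precheck is pattern matching over the question before generation. The rules below are hypothetical examples, not the actual signals in `tools/`:

```python
import re

# Hypothetical rules: (topic pattern, keyword that satisfies it, follow-up to ask)
PRECHECKS = [
    (re.compile(r"\b(ec2|ecr|s3|iam)\b", re.I), "region",
     "Which AWS region and account is this in?"),
    (re.compile(r"\b(error|fail|crash)\b", re.I), "logs",
     "Please share the relevant error logs."),
    (re.compile(r"\bapi\b", re.I), "endpoint",
     "What is the endpoint, HTTP method, and auth scheme?"),
]

def missing_info(question: str) -> list[str]:
    """Return follow-up questions for signals the question triggers
    but does not already answer itself."""
    asks = []
    for pattern, keyword, follow_up in PRECHECKS:
        if pattern.search(question) and keyword not in question.lower():
            asks.append(follow_up)
    return asks

print(missing_info("EC2 cannot pull image from ECR"))
# → ['Which AWS region and account is this in?']
```

If the list is non‑empty, the engine asks the user instead of letting the model guess.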


📊 Evaluation & Hallucination Metrics

Most projects skip this step. OpenNanoScaleLLM doesn’t.

Metrics Implemented

  • Keyword Coverage – expected technical concepts
  • Groundedness – answer vs retrieved context
  • Hallucination Score – unsupported content
  • Refusal Correctness – asks for info instead of guessing
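The first two metrics can be approximated with simple token overlap. This sketch illustrates the idea only; `evals/metrics.py` defines the actual scoring:

```python
def keyword_coverage(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected technical concepts mentioned in the answer."""
    answer_lower = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer_lower)
    return hits / len(expected_keywords) if expected_keywords else 0.0

def groundedness(answer: str, context: str) -> float:
    """Fraction of answer tokens also present in the retrieved context;
    low values flag potentially unsupported (hallucinated) content."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

ans = "Attach an IAM role allowing s3:GetObject"
print(keyword_coverage(ans, ["IAM", "s3:GetObject"]))  # → 1.0
```

A hallucination score can then be derived as 1 minus groundedness, optionally with stopwords filtered out first.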

Run batch evaluation:

```
python evals/run_eval.py
```

Visual Dashboard

```
python dashboard/gradio_eval.py
```

Shows:

  • Per‑question scores
  • Hallucination trends
  • Grounding quality

🌐 Live Demo

Backend (FastAPI)

```
uvicorn app.main:app --reload
```

Frontend (Gradio)

```
python ui/gradio_app.py
```

A real, production‑style LLM demo, not a notebook.


🤗 Hugging Face Release

  • Model: hf.co/<Naveenub>/Open-NanoScale-LLM
  • Live Space: hf.co/spaces/<Naveenub>/Open-NanoScale-LLM-demo

Includes:

  • Model card
  • License
  • Demo UI

βš–οΈ License

Apache License 2.0

You are free to:

  • Use commercially
  • Modify
  • Redistribute

⚠️ Disclaimer

This project is a clean‑room, independent open‑source implementation.

  • Not affiliated with Google
  • Not derived from any proprietary NanoBanana system
  • No private or restricted data used

🎯 Who This Is For

  • LLM / AI Engineers
  • Infra & DevOps Engineers exploring AI
  • Researchers interested in small‑model systems
  • Anyone tired of hype‑only LLM repos

πŸ›£οΈ Roadmap

  • Multi‑knowledge‑base RAG
  • GGUF / Ollama packaging
  • Tool execution (not just awareness)
  • Deterministic infra mode

⭐ Final Note

OpenNanoScaleLLM is meant to be:

  • Readable
  • Reproducible
  • Honest
  • Useful

If you build on it, ship it. 🚀

About

🧠 Open-NanoScale-LLM is a production-minded, nano-scale LLM system demonstrating how small models can achieve reliable, grounded generation through retrieval and deterministic pipelines.
