Latin: "I perceive."
Two ML engines. Zero cloud dependencies. One Next.js app.
Train intent classifiers in your browser. Triage CVE vulnerabilities with severity, OWASP mapping, and remediation guidance.
Both run on pure TypeScript math — no Python runtime, no API keys, no GPU required.
What It Does • Quick Start • NLU Engine • Vuln Triage • Deploy • Train • Scalability • API
| Capability | NLU Bot Trainer | Vulnerability Triage |
|---|---|---|
| What | Train intent classifiers for chatbots | 4-mode vulnerability triage: CVE text, CVE ID lookup, code scanning, dependency audit |
| How | 5-classifier stacking ensemble (171K params) | ResNet-MLP (9.7M params) + NVD API + regex engine + OSV API |
| Output | Intent + confidence + per-model scores | CWE + severity + OWASP + remediation + CVSS + affected products + fix versions |
| Runs where | 100% in-browser, zero server | ML: API route · Code scanner: browser · NVD/OSV: proxy routes |
| Training | In-browser (30s) or Python pipeline | PyTorch on GCP VM (~2 hours) |
| Data | 420 pre-loaded examples, 12 intents | 224K+ CVEs + NVD live data + OSV vulnerability database |
```bash
git clone https://github.com/divyamohan1993/nlu-bot-trainer.git
cd nlu-bot-trainer
npm install
npm run dev
```

Open http://localhost:3000. Both engines are ready.
Or with Docker:
```bash
docker compose up --build
```

- Open the app → 420 pre-loaded e-commerce support examples across 12 intents
- Click Train → Working model in under 3 seconds
- Go to /test → Type "where is my package?" → See `order_status` at 95%+ confidence
- Go to /vulnerability → 4 tabs: Description, CVE Lookup, Code Scanner, Dependency Scan
- CVE Lookup: Type `CVE-2021-44228` → CVSS 10.0 CRITICAL, 342 affected products, ML classification
- Code Scanner: Click "Python Vulns" sample → 5 findings in <5ms, zero network calls
- Dependency Scan: Click "npm package.json" sample → OSV vulnerability report with fix versions
A research-grade intent classification engine with autonomous self-learning. Five classifiers vote through learned meta-weights to produce predictions that no single model can match alone.
```
User Input → Tokenizer V2 (6 strategies) → MurmurHash3 (1024-dim)
                                           │
┌──────────┬──────────┬──────────┬─────────┼─────────┐
▼          ▼          ▼          ▼         ▼         │
Logistic   Complement Linear     MLP       Gradient  │
Regress.   NB v2      SVM        128h      Boost     │
12K par    7K par     12K par    133K      7K par    │
└──────────┴────┬─────┴──────────┴─────────┘         │
                ▼                                    │
  Cross-Validated Meta-Weights                Drift ─┘
                ▼                           Monitoring
        Prediction Result
```
Why five classifiers? Each fails differently. Linear models miss overlapping features. Naive Bayes struggles with correlated features. SVMs overfit tight margins. Neural nets need lots of data. Boosted stumps miss smooth boundaries. Because their errors are largely decorrelated, the ensemble's error rate comes out lower than any individual model's.
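The meta-weighted vote behind this claim can be sketched in a few lines. This is an illustrative sketch, not the repo's actual ensemble module — the `ensembleVote` function name, score shape, and hard-coded weights are all assumptions:

```typescript
// Illustrative meta-weighted soft vote; names and shapes are assumed,
// not the repo's actual API.
type Scores = Record<string, number>; // intent → probability from one classifier

function ensembleVote(perModel: Scores[], metaWeights: number[]): Scores {
  const combined: Scores = {};
  perModel.forEach((scores, i) => {
    for (const [intent, p] of Object.entries(scores)) {
      combined[intent] = (combined[intent] ?? 0) + metaWeights[i] * p;
    }
  });
  // Normalize so the combined confidences sum to 1.
  const total = Object.values(combined).reduce((a, b) => a + b, 0);
  for (const k of Object.keys(combined)) combined[k] /= total;
  return combined;
}

// Two toy classifiers disagree; the higher-weighted one dominates.
const result = ensembleVote(
  [{ greet: 0.9, order_status: 0.1 }, { greet: 0.3, order_status: 0.7 }],
  [0.7, 0.3],
);
// result.greet ≈ 0.72, result.order_status ≈ 0.28
```

In the real engine the weights are learned via cross-validation rather than hard-coded.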
- Self-learning loop — Evaluates → diagnoses weak intents → augments data → pseudo-labels high-confidence predictions → curriculum-orders → retrains → validates. Accepts only if accuracy doesn't regress. Fully autonomous.
- Drift detection — Page-Hinkley (concept drift), DDM (error rate drift), vocabulary distribution monitoring. Real-time dashboard.
- Model registry — Semantic versioning, champion/challenger lifecycle, A/B testing with configurable traffic splits.
- 7-platform export — Rasa, Dialogflow, Lex, LUIS, Wit.ai, CSV, JSON.
- Zero dependencies — Every algorithm (MurmurHash3, Pegasos SVM, CNB, backprop MLP, gradient boosted stumps) implemented from scratch in TypeScript.
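As a concrete illustration of the Page-Hinkley test used for drift detection above: it tracks the cumulative deviation of a monitored signal (e.g. per-prediction error) from its running mean and alarms when that sum climbs too far above its historical minimum. The class and thresholds below are a minimal sketch, not the repo's implementation:

```typescript
// Minimal Page-Hinkley detector (sketch; delta/lambda values are illustrative).
class PageHinkley {
  private mean = 0;
  private cumSum = 0;
  private minCumSum = 0;
  private n = 0;

  constructor(private delta = 0.005, private lambda = 3) {}

  // Feed one observation; returns true when drift is detected.
  update(x: number): boolean {
    this.n++;
    this.mean += (x - this.mean) / this.n;     // running mean
    this.cumSum += x - this.mean - this.delta; // cumulative deviation
    this.minCumSum = Math.min(this.minCumSum, this.cumSum);
    return this.cumSum - this.minCumSum > this.lambda;
  }
}

const ph = new PageHinkley();
let drifted = false;
for (let i = 0; i < 50; i++) drifted = ph.update(0.1);             // stable error rate
for (let i = 0; i < 50 && !drifted; i++) drifted = ph.update(0.9); // error rate jumps
// drifted → true shortly after the jump
```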
| Metric | Value |
|---|---|
| Inference | 1–6 ms on modern hardware (50–200 μs on the optimized path) |
| Training | 30–60 seconds (full ensemble + meta-weights) |
| Model size | ~2 MB (localStorage) |
| Parameters | 171,772 |
For deep algorithm references and the math behind each classifier, see docs/ARCHITECTURE.md.
Four ways to find vulnerabilities — pick the one that matches what you have.
| Tab | You have... | You get... |
|---|---|---|
| CVE Description | Prose text describing a vulnerability | CWE classification + severity + OWASP + remediation |
| CVE ID Lookup | A CVE ID (e.g., CVE-2021-44228) | NVD metadata (CVSS, products, dates) + ML classification + CWE agreement check |
| Code Scanner | Source code (any language) | Line-by-line vulnerability findings with CWE mapping and fix guidance |
| Dependency Scan | package.json / requirements.txt / pom.xml | Known vulnerable dependencies with severity, fix versions, and CVE links |
Type a CVE ID, get everything: CVSS score and vector, severity rating, affected products, NVD references, plus ML-powered CWE classification cross-referenced against NVD's ground truth.
Paste code, get instant results. ~30 regex patterns detect SQL injection, XSS, command injection, path traversal, hardcoded secrets, insecure crypto, deserialization, SSRF, and buffer overflows. Runs entirely client-side in <5ms — zero network calls, zero data leaves your browser.
Pattern-based scanning catches common vulnerability patterns but cannot analyze data flow. Use alongside SAST tools for comprehensive coverage.
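At its core, a scanner like this is a table of (pattern, CWE) rules applied line by line. The two rules below are simplified illustrations, not the repo's actual ~30 patterns:

```typescript
// Toy pattern-based scanner; the rules are illustrative, not the real rule set.
interface Finding { line: number; cwe: string; message: string }

const rules = [
  { cwe: "CWE-89", message: "Possible SQL injection via string concatenation",
    pattern: /execute\(.*\+.*\)/ },
  { cwe: "CWE-798", message: "Possible hardcoded secret",
    pattern: /(password|api_key)\s*=\s*["'][^"']+["']/i },
];

function scan(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const rule of rules) {
      if (rule.pattern.test(text)) {
        findings.push({ line: i + 1, cwe: rule.cwe, message: rule.message });
      }
    }
  });
  return findings;
}

const findings = scan(
  'cursor.execute("SELECT * FROM users WHERE id=" + userId)\napi_key = "sk-123"'
);
// Two findings: CWE-89 on line 1, CWE-798 on line 2
```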
Paste a manifest file. The scanner parses your dependencies, queries the OSV vulnerability database, and returns known CVEs with severity, fix versions, and CWE enrichment. Supports npm (package.json), PyPI (requirements.txt), and Maven (pom.xml).
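The first half of that pipeline — parsing a manifest into (name, version, ecosystem) tuples — can be sketched for npm as below. The `Dep` shape mirrors the `/api/osv-scan` request body documented in the API section; stripping `^`/`~` prefixes is a simplification (real semver ranges need proper resolution):

```typescript
// Sketch: extract an OSV-style dependency list from a package.json string.
interface Dep { name: string; version: string; ecosystem: string }

function parseNpmManifest(manifest: string): Dep[] {
  const pkg = JSON.parse(manifest);
  const all: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.entries(all).map(([name, range]) => ({
    name,
    version: range.replace(/^[\^~]/, ""), // naive: strip the semver range prefix
    ecosystem: "npm",
  }));
}

const deps = parseNpmManifest('{"dependencies":{"lodash":"^4.17.20"}}');
// → [{ name: "lodash", version: "4.17.20", ecosystem: "npm" }]
```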
For each CVE description (direct input or fetched via CVE ID), the classifier returns:
- CWE ID + human-readable name — Not just "CWE-89", but "SQL Injection"
- Severity — Critical / High / Medium / Low, mapped from CWE category and exploit impact
- OWASP Top 10 2021 category — Where this weakness fits in the security landscape
- Remediation guidance — 3–4 actionable steps specific to the weakness category
- Top 5 predictions — Ranked alternatives with confidence scores
- Architecture: ResNet-MLP (TF-IDF → 192 → 192+skip → 96 → 349 classes)
- Parameters: 9.7M
- Training data: 224K+ CVEs (NVD, 1999–2025)
- Accuracy: 71% top-1, 85% top-5 across 349 CWE categories
- Inference: 1–30ms server-side
71% across 349 categories is a triage starting point, not a final determination. The model helps security teams prioritize — it doesn't replace expert analysis.
All 349 supported CWE classes include enrichment data. The top ~50 most critical CWEs (SQL injection, XSS, buffer overflow, RCE, etc.) have hand-curated descriptions and remediation. The remaining ~300 use category-based enrichment with MITRE CWE names.
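That two-tier strategy amounts to a curated lookup with a category fallback. A minimal sketch with made-up entries, not the repo's actual enrichment data:

```typescript
// Two-tier enrichment: hand-curated entry first, category fallback second.
// All entries here are illustrative.
interface Enrichment { name: string; severity: string; owasp: string }

const curated: Record<string, Enrichment> = {
  "CWE-89": { name: "SQL Injection", severity: "High", owasp: "A03:2021 Injection" },
};

const byCategory: Record<string, Enrichment> = {
  Injection: { name: "Injection Weakness", severity: "High", owasp: "A03:2021 Injection" },
};

function enrich(cwe: string, category: string): Enrichment {
  return curated[cwe] ?? byCategory[category] ??
    { name: cwe, severity: "Medium", owasp: "Unmapped" };
}

const hit = enrich("CWE-89", "Injection");       // curated entry
const fallback = enrich("CWE-917", "Injection"); // category-based fallback
```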
```bash
docker compose up --build
# App available at http://localhost:3000
```

Or without Docker:

```bash
git clone https://github.com/divyamohan1993/nlu-bot-trainer.git
cd nlu-bot-trainer
npm ci && npm run build
node .next/standalone/server.js
# Reverse proxy port 3000 with Nginx/Caddy
```

For detailed deployment guides (Nginx config, systemd service, model weights management), see docs/DEPLOYMENT.md.
Just use the app. Add intents, add examples, click Train. The 5-classifier ensemble trains in 30–60 seconds entirely in your browser. No server, no setup.
For large datasets (50K+), use the Python pipeline:
```bash
cd training
pip install -r requirements.txt
python train_nlu.py --input data.json --output results/ --optimize
```

The vulnerability model was trained on a GCP VM with PyTorch:
```bash
# On a VM with 16GB+ RAM
cd training/vuln-classifier
pip install -r requirements.txt
python train.py --epochs 50 --batch-size 256
python export_weights.py  # → weights.json + tfidf_vocab.json + labels.json
```

Exported weights drop directly into public/models/vuln-classifier/ for the Next.js API route.
For full training guides (data preparation, hyperparameters, checkpoints, custom datasets), see docs/TRAINING.md.
"Can I run this with Ollama?" — No. Sentio's models are classifiers, not LLMs. Ollama serves large language models. These are different things.
What actually runs where:
| Engine | Where It Runs | Infrastructure Needed |
|---|---|---|
| NLU Bot Trainer | In the browser | None. Zero. The user's browser IS the compute. |
| Vulnerability Classifier | Next.js API route | A Node.js server (Docker container, Vercel, any VM) |
The NLU engine needs no hosting at all — it trains and infers in the browser tab. The vulnerability classifier is a stateless API route that loads model weights on startup.
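A common way to keep such a route stateless while loading weights only once per process is a module-level cached promise. A sketch of the pattern — the loader and weight shape here are stand-ins, not the repo's actual code:

```typescript
// Load-once cache for model weights shared across requests in one process.
type Weights = { parameters: string };

function makeWeightsCache(load: () => Promise<Weights>) {
  let cached: Promise<Weights> | null = null;
  return () => (cached ??= load()); // all callers share one load
}

// Stand-in loader; a real route would read public/models/vuln-classifier/ from disk.
let loads = 0;
const getWeights = makeWeightsCache(async () => {
  loads++;
  return { parameters: "9.7M" };
});

const p1 = getWeights();
const p2 = getWeights(); // reuses the in-flight/completed load; loads stays 1
```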
For ONNX-based serving (Triton, ONNX Runtime), the training pipeline exports ONNX format. See docs/SELF-HOSTING.md.
| Tier | Users | NLU | Vuln Classifier | Infra |
|---|---|---|---|---|
| 0 | 0–10K | Client-side (free) | Single container | 1 VM / Vercel |
| 1 | 10K–100K | Client-side (free) | 2–4 containers behind LB | Horizontal scale |
| 2 | 100K–1M | Client-side (free) | Auto-scaling group | Cloud Run / ECS |
| 3 | 1M+ | Client-side (free) | Multi-region deployment | CDN + edge |
NLU scales infinitely for free. Every user brings their own compute — the browser. 10 users or 10 million users, the server load is identical (serving static files).
Vulnerability classifier scales horizontally. It's a stateless API route. No sessions, no database, no shared state. Add containers behind a load balancer.
```typescript
import { trainEnsemble, predictEnsemble } from "@/lib/engine/ensemble";

const model = trainEnsemble([
  { text: "hello", intent: "greet" },
  { text: "track my order", intent: "order_status" },
]);

const result = predictEnsemble("hey there", model);
// result.intent → "greet"
// result.confidence → 0.94
// result.ranking → [{ name: "greet", confidence: 0.94 }, ...]
```

```typescript
import { runSelfLearningLoop } from "@/lib/self-learn/autonomous-loop";

const result = runSelfLearningLoop(trainingData, {
  maxIterations: 10,
  pseudoLabelThreshold: 0.92,
  enableAugmentation: true,
});
// result.finalAccuracy, result.totalNewExamples
```

```http
POST /api/classify-vuln
Content-Type: application/json

{ "text": "SQL injection in login endpoint...", "topK": 5 }
```
Response:
```json
{
  "predictions": [{
    "cwe": "CWE-89",
    "score": 0.71,
    "name": "SQL Injection",
    "severity": "High",
    "owasp": "A03:2021 Injection",
    "remediation": ["Use parameterized queries...", "..."],
    "category": "Injection"
  }],
  "inferenceMs": 1.2,
  "modelInfo": { "parameters": "9.7M", "classes": 349, "architecture": "ResNet-MLP" }
}
```

```http
GET /api/nvd-lookup?cveId=CVE-2021-44228
```
Returns NVD metadata: description, CVSS v3.1 score/vector/severity, ground-truth CWEs, affected products (CPE), references, and timestamps. Optional NVD_API_KEY env var for higher rate limits.
```http
POST /api/osv-scan
Content-Type: application/json

{ "dependencies": [{ "name": "lodash", "version": "4.17.20", "ecosystem": "npm" }] }
```
Returns known vulnerabilities per dependency from OSV.dev: vuln ID, summary, severity, fix version, published date, and CWE IDs. Max 100 dependencies per request.
```typescript
import { exportTrainingData } from "@/lib/enterprise/export-formats";

const { content, filename } = exportTrainingData(data, "rasa");
// Rasa YAML v3.1, Dialogflow ES, Lex V2, LUIS, Wit.ai, CSV, JSON
```

| Guide | What's Covered |
|---|---|
| ARCHITECTURE.md | Algorithm deep dive, ensemble math, classifier internals, design system |
| DEPLOYMENT.md | Docker, Vercel, GCP, Nginx config, model weights |
| TRAINING.md | Browser training, Python pipeline, vulnerability model training, custom data |
| SELF-HOSTING.md | What runs where, ONNX export, the honest Ollama answer |
| CONTRIBUTING.md | Dev setup, PR process, code standards |
| SECURITY.md | Supported versions, reporting vulnerabilities |
| CHANGELOG.md | Version history |
WCAG 2.2 AA compliant. Full keyboard navigation (Alt+1–8 page switching). ARIA labels, screen reader support, reduced motion, skip navigation.
See CONTRIBUTING.md.
See SECURITY.md.
GNU Affero General Public License v3.0
Sentio — I perceive.
Built for dmj.one • Aatmanirbhar Bharat