diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 0000000..b93e50b
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,39 @@
+## Summary
+
+
+
+-
+
+## Type of Change
+
+
+
+- [ ] Bug fix
+- [ ] New feature / enhancement
+- [ ] Documentation update
+- [ ] Refactor (no behavior change)
+- [ ] Chore (dependencies, CI, tooling)
+
+## Changes Made
+
+
+
+Resolves #
+
+## How to Test
+
+
+
+1.
+
+## Checklist
+
+- [ ] I have read the [Contributing Guide](../CONTRIBUTING.md)
+- [ ] My branch is up to date with `main`
+- [ ] New environment variables (if any) are documented in `.env.example` and the README
+- [ ] No secrets, API keys, or credentials are included in this PR
+- [ ] I have tested my changes locally
+
+## Screenshots (if applicable)
+
+
diff --git a/.github/workflows/code-scans.yaml b/.github/workflows/code-scans.yaml
index 2029a2f..940d9b7 100644
--- a/.github/workflows/code-scans.yaml
+++ b/.github/workflows/code-scans.yaml
@@ -37,7 +37,7 @@ jobs:
run: mkdir -p trivy-reports
- name: Run Trivy FS Scan
- uses: aquasecurity/trivy-action@0.24.0
+ uses: aquasecurity/trivy-action@0.35.0
with:
scan-type: 'fs'
scan-ref: '.'
diff --git a/README.md b/README.md
index 3902d38..b551fb0 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,10 @@
-
+
# DocuBot - AI Documentation Generator
-AI-powered documentation generation using multi-provider LLMs and specialized micro-agent architecture for comprehensive README creation.
-
+An AI-powered full-stack application that automatically generates high-quality project documentation from source code repositories. Connect a GitHub repo, let specialized micro-agents analyze the codebase, architecture, dependencies, and APIs, and get a structured README in minutes. Generation is powered by multi-provider LLMs via OpenAI-compatible endpoints or by locally hosted models served through Ollama.
---
## Table of Contents
@@ -18,7 +17,8 @@ AI-powered documentation generation using multi-provider LLMs and specialized mi
- [Project Structure](#project-structure)
- [Usage Guide](#usage-guide)
- [LLM Provider Configuration](#llm-provider-configuration)
-- [Performance Benchmarks](#performance-benchmarks)
+- [Inference Benchmarks](#inference-benchmarks)
+- [Model Capabilities](#model-capabilities)
- [Environment Variables](#environment-variables)
- [Technology Stack](#technology-stack)
- [Troubleshooting](#troubleshooting)
@@ -28,7 +28,15 @@ AI-powered documentation generation using multi-provider LLMs and specialized mi
## Project Overview
-**DocuBot** is an intelligent documentation generation platform that analyzes GitHub repositories using specialized micro-agents to automatically create comprehensive, well-structured README documentation with minimal human intervention.
+**DocuBot** shows how agentic AI can be applied to one of the most time-consuming software tasks: documentation. The application analyzes real project evidence from a repository and uses specialized micro-agents to generate structured, context-aware README documentation that is more accurate and maintainable than traditional single-prompt generation.
+
+The application supports a flexible inference layer, allowing it to work with OpenAI, Groq, OpenRouter, custom OpenAI-compatible APIs, and local Ollama deployments. This makes it practical for cloud-based teams, enterprise environments, and privacy-sensitive local setups alike.
+
+Typical deployment scenarios:
+
+- **Enterprise teams**: integrate with internal gateways, hosted APIs, or private inference infrastructure
+- **Local experimentation**: run documentation generation with self-hosted models through Ollama
+- **Hardware benchmarking**: measure SLM throughput on Apple Silicon, CUDA, or Intel Gaudi hardware
### How It Works
@@ -442,20 +450,21 @@ DocuBot/
### Performance Tips
-- **Model Selection**: For faster processing, use `gpt-4o-mini` or Groq's `llama-3.2-90b-text-preview`
-- **Local Development**: Use Ollama with `qwen2.5:7b` for private, offline documentation generation
-- **Monorepo**: Select specific subprojects for focused documentation
-- **PR Creation**: Requires `GITHUB_TOKEN` with `repo` scope in `api/.env`
+- **Use the largest model your hardware can sustain.** `qwen3:14b` produces the best documentation quality; `qwen3:4b` is faster and good for benchmarking.
+- **Lower `LLM_TEMPERATURE`** (e.g., `0.1`) for more factual, evidence-grounded documentation. Raise it slightly (e.g., `0.3`–`0.5`) for more descriptive, narrative-style README prose.
+- **Keep repositories focused.** The agents analyze up to `MAX_FILES_TO_SCAN` files (default: 500). For large monorepos, use the built-in project selector to target a specific subproject rather than letting agents scan the entire repo.
+- **On Apple Silicon**, always run Ollama natively, never inside Docker. The Metal GPU backend delivers significantly higher throughput for sequential multi-agent workloads than CPU-only inference.
+- **On Linux with an NVIDIA GPU**, set `CUDA_VISIBLE_DEVICES` before starting Ollama to target a specific GPU.
+- **For enterprise remote APIs**, choose a model with a large context window (≥16k tokens) to avoid truncation on longer inputs.
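The tips above map onto a handful of settings in `api/.env`. A minimal sketch, with illustrative values rather than project defaults:

```shell
# api/.env — illustrative tuning values (adjust to your hardware and repo size)
LLM_TEMPERATURE=0.1        # low temperature for factual, evidence-grounded output
MAX_FILES_TO_SCAN=500      # cap on files the agents analyze per run (default: 500)

# On Linux with multiple NVIDIA GPUs, pin Ollama to one device before starting it:
# CUDA_VISIBLE_DEVICES=0 ollama serve
```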
---
## LLM Provider Configuration
-DocuBot supports multiple LLM providers. Choose the one that best fits your needs:
+DocuBot supports multiple LLM providers. All providers are configured via the `.env` file; for example, set `LLM_PROVIDER=ollama` for local inference.
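As a quick sanity check before starting the stack, the provider value can be validated with a short shell snippet. This is a hypothetical helper, not part of DocuBot; a temp file stands in for `api/.env` so the snippet is self-contained:

```shell
# Validate that LLM_PROVIDER in a .env file names a supported provider.
# In real use, point ENV_FILE at api/.env instead of the sample temp file.
ENV_FILE=$(mktemp)
printf 'LLM_PROVIDER=ollama\nLLM_MODEL=qwen3:14b\n' > "$ENV_FILE"

provider=$(grep -E '^LLM_PROVIDER=' "$ENV_FILE" | cut -d= -f2)
case "$provider" in
  openai|groq|openrouter|ollama) echo "LLM_PROVIDER ok: $provider" ;;
  *) echo "unsupported LLM_PROVIDER: '$provider'" >&2; exit 1 ;;
esac
rm -f "$ENV_FILE"
```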
-### OpenAI (Recommended for Production)
-**Best for**: Highest quality outputs, production deployments
+### OpenAI
- **Get API Key**: https://platform.openai.com/account/api-keys
- **Models**: `gpt-4o`, `gpt-4-turbo`, `gpt-4o-mini`
@@ -468,9 +477,9 @@ DocuBot supports multiple LLM providers. Choose the one that best fits your need
LLM_MODEL=gpt-4o
```
-### Groq (Fast & Free Tier)
+### Groq
-**Best for**: Fast inference, development, free tier testing
+Groq provides OpenAI-compatible endpoints with extremely fast inference (LPU hardware).
- **Get API Key**: https://console.groq.com/keys
- **Models**: `llama-3.2-90b-text-preview`, `llama-3.1-70b-versatile`
@@ -484,14 +493,13 @@ DocuBot supports multiple LLM providers. Choose the one that best fits your need
LLM_MODEL=llama-3.2-90b-text-preview
```
-### Ollama (Local & Private)
+### Ollama
-**Best for**: Local deployment, privacy, no API costs, offline operation
+Runs inference locally on the host machine with full GPU acceleration.
-- **Install**: https://ollama.com/download
-- **Pull Model**: `ollama pull qwen2.5:7b`
-- **Models**: `qwen2.5:7b`, `llama3.1:8b`, `llama3.2:3b`
-- **Pricing**: Free (local hardware costs only)
+- **Install Ollama**: https://ollama.com/download
+- **Pull Model**: `ollama pull qwen3:14b`
+- **Models**: `qwen3:14b`, `qwen3:4b`, `llama3.1:8b`, `llama3.2:3b`
- **Configuration**:
```bash
LLM_PROVIDER=ollama
@@ -505,15 +513,15 @@ DocuBot supports multiple LLM providers. Choose the one that best fits your need
curl -fsSL https://ollama.com/install.sh | sh
# Pull model
- ollama pull qwen2.5:7b
+ ollama pull qwen3:14b
- # Verify it's running
+ # Verify Ollama is running:
curl http://localhost:11434/api/tags
```
-### OpenRouter (Multi-Model Access)
+### OpenRouter
-**Best for**: Access to multiple models through one API, model flexibility
+OpenRouter provides a unified API across hundreds of models from different providers.
- **Get API Key**: https://openrouter.ai/keys
- **Models**: Claude, Gemini, GPT-4, Llama, and 100+ others
@@ -539,6 +547,13 @@ LLM_BASE_URL=https://your-custom-endpoint.com/v1
LLM_MODEL=your-model-name
```
+If the endpoint uses a private domain mapped in `/etc/hosts`, also set:
+
+```bash
+LOCAL_URL_ENDPOINT=your-private-domain.internal
+```
+
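For reference, the corresponding `/etc/hosts` entry would look like the following; the IP shown is a placeholder for your gateway's internal address:

```bash
# /etc/hosts — example mapping (10.0.0.5 is a placeholder IP)
10.0.0.5    your-private-domain.internal
```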
+
### Switching Providers
To switch providers, simply update `api/.env` and restart:
@@ -557,27 +572,92 @@ docker compose up -d
---
-## Performance Benchmarks
-
-The following benchmarks were collected by running DocuBot's full 9-agent documentation pipeline across three inference environments. Use these results to choose the right deployment profile for your needs.
+## Inference Benchmarks
-> **Note:** Intel Enterprise Inference was tested on Intel Xeon hardware to demonstrate on-premises SLM deployment for enterprise codebases.
+The table below compares inference performance across providers, deployment modes, and hardware profiles, using DocuBot's full 9-agent documentation pipeline as a standardized workload.
-### Results
-
-| Model Type / Inference Provider | Model Name | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Total Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/sec) | Hardware Profile |
+| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Total Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/sec) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|
-| vLLM | Qwen3-4B-Instruct-2507 | Local | 262.1K | 3,040 | 307.7 | 5809 | 15,864 | 40,809 | 0.0580 | Apple Silicon (Metal) |
-| Enterprise Inference / SLM ยท [Intel OPEA EI](https://opea.dev) | Qwen3-4B-Instruct-2507 | CPU (Xeon) | 8.1K | 4,211.9 | 270 | 4481 | 10,540 | 32,205 | 0.076 | CPU-only |
+| vLLM | Qwen3-4B-Instruct-2507 | Local | 262.1K | 3,040 | 307.7 | 5,809 | 15,864 | 40,809 | 0.0580 | Apple Silicon, Metal (MacBook Pro M4) |
+| [Intel OPEA EI](https://github.com/opea-project/Enterprise-Inference) | Qwen3-4B-Instruct-2507 | CPU (Xeon) | 8.1K | 4,211.9 | 270 | 4,481 | 10,540 | 32,205 | 0.076 | CPU-only |
| OpenAI (Cloud) | gpt-4o-mini | API (Cloud) | 128K | 3,820.11 | 316.41 | 4,136.52 | 7,760 | 23,535 | 0.108 | N/A |
+> **Notes:**
+>
+> - All benchmarks use the same documentation generation workflow. Token counts may vary slightly per run due to non-deterministic model output.
+> - vLLM on Apple Silicon uses Metal (MPS) GPU acceleration.
+> - [Intel OPEA Enterprise Inference](https://github.com/opea-project/Enterprise-Inference) runs on Intel Xeon CPUs without GPU acceleration.
+
+---
+
+## Model Capabilities
+
+### Qwen3-4B-Instruct-2507
-### Model Capabilities
+A 4-billion-parameter open-weight code model from Alibaba's Qwen team (July 2025 release), designed for on-prem and edge deployment.
-| Model | Highlights |
-|---|---|
-| **Qwen3-4B-Instruct-2507** | 4B-parameter code-specialized model with 262.1K native context (deployment-limited to 8.1K on Xeon CPU). Supports multi-agent documentation generation, code analysis, and structured JSON output. Enables full on-premises deployment with data sovereignty for enterprise codebases. |
-| **gpt-4o-mini** | Cloud-native multimodal model with 128K context, optimized for code understanding and technical documentation. Delivers 42% higher throughput and 26% lower latency versus CPU-based alternatives while supporting concurrent multi-agent orchestration at cloud scale. |
+
+| Attribute | Details |
+| --------------------------- | ------------------------------------------------------------------------------------------------------------------- |
+| **Parameters** | 4.0B total (3.6B non-embedding) |
+| **Architecture** | Transformer with Grouped Query Attention (GQA); 36 layers, 32 Q-heads / 8 KV-heads |
+| **Context Window** | 262,144 tokens (256K) native |
+| **Reasoning Mode** | Non-thinking only (Instruct-2507 variant). Separate Thinking-2507 variant available with always-on chain-of-thought |
+| **Tool / Function Calling** | Supported; MCP (Model Context Protocol) compatible |
+| **Structured Output** | JSON-structured responses supported |
+| **Multilingual** | 100+ languages and dialects |
+| **Code Benchmarks** | MultiPL-E: 76.8%, LiveCodeBench v6: 35.1%, BFCL-v3 (tool use): 61.9 |
+| **Quantization Formats** | GGUF (Q4_K_M ~2.5 GB, Q8_0 ~4.3 GB), AWQ (int4), GPTQ (int4), MLX (4-bit ~2.3 GB) |
+| **Inference Runtimes** | Ollama, vLLM, llama.cpp, LMStudio, SGLang, KTransformers |
+| **Fine-Tuning** | Full fine-tuning and adapter-based (LoRA); 5,000+ community adapters on HuggingFace |
+| **License** | Apache 2.0 |
+| **Deployment** | Local, on-prem, air-gapped, or cloud; full data sovereignty |
+
+
+### GPT-4o-mini
+
+OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
+
+
+| Attribute | Details |
+| --------------------------- | --------------------------------------------------------------------------------- |
+| **Parameters** | Not publicly disclosed |
+| **Architecture** | Multimodal Transformer (text + image input, text output) |
+| **Context Window** | 128,000 tokens input / 16,384 tokens max output |
+| **Reasoning Mode** | Standard inference (no explicit chain-of-thought toggle) |
+| **Tool / Function Calling** | Supported; parallel function calling |
+| **Structured Output** | JSON mode and strict JSON schema adherence supported |
+| **Multilingual** | Broad multilingual support |
+| **Code Benchmarks** | HumanEval: 87.2%, strong MBPP performance |
+| **Pricing** | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
+| **Fine-Tuning** | Supervised fine-tuning via OpenAI API |
+| **License** | Proprietary (OpenAI Terms of Use) |
+| **Deployment** | Cloud-only (OpenAI API or Azure OpenAI Service); no self-hosted or on-prem option |
+| **Knowledge Cutoff** | October 2023 |
+
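To put the pricing row in context, a back-of-envelope per-request cost can be computed from the average token counts in the Inference Benchmarks table above (3,820.11 input / 316.41 output tokens per request); the numbers below are that sketch, not an official cost figure:

```shell
# Cost per pipeline request for gpt-4o-mini at $0.15 / 1M input tokens
# and $0.60 / 1M output tokens, using the benchmark's average token counts.
awk 'BEGIN {
  in_tok = 3820.11; out_tok = 316.41                 # avg tokens per request
  cost = in_tok * 0.15 / 1e6 + out_tok * 0.60 / 1e6  # USD per request
  printf "%.6f USD per request\n", cost
}'
```

Assuming one request per agent, a full 9-agent documentation run works out to well under a cent at these rates.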
+
+### Comparison Summary
+
+
+| Capability | Qwen3-4B-Instruct-2507 | GPT-4o-mini |
+| ------------------------------- | -------------------------------- | --------------------------------- |
+| Code analysis & documentation generation | Yes | Yes |
+| Multi-agent / agentic task execution | Yes | Yes |
+| Mermaid / architecture diagram generation | Yes | Yes |
+| Function / tool calling | Yes | Yes |
+| JSON structured output | Yes | Yes |
+| On-prem / air-gapped deployment | Yes | No |
+| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
+| Open weights | Yes (Apache 2.0) | No (proprietary) |
+| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
+| Quantization for edge devices | GGUF / AWQ / GPTQ / MLX | N/A |
+| Multimodal (image input) | No | Yes |
+| Native context window | 256K | 128K |
+
+
+> Both models support code analysis and documentation generation, multi-agent task execution, Mermaid diagram generation, function calling, and JSON-structured output. However, only Qwen3-4B offers open weights, data sovereignty, and local deployment flexibility, making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.
+
+---
## Environment Variables