This document allows another LLM session to continue development without breaking the system. Read this before making any changes.
- Name: Clawrity
- Type: AI Business Intelligence Platform
- Language: Python 3.11+
- Framework: FastAPI + LangChain + Groq/DeepSeek
- LLM Strategy (IMPORTANT, cost optimization):
  - Groq (FREE): NL-to-SQL, QA scoring, draft generation, all non-critical tasks
  - DeepSeek (PAID): final polished chat responses ONLY (use sparingly)
  - Rationale: students with a limited budget, so maximize free API usage
- Channel Priority: Slack first, then Teams, then WhatsApp
- Data Source: BigQuery (real) with Mock Data Service fallback for demo
- Deployment: Azure VM with Docker containers (Terraform in infra/)
- Client Config: YAML files in the clients/ directory, one file per client
- No Hardcoded Secrets: all credentials via environment variables (.env file)
LLM Usage Strategy

| | GROQ (FREE) | DeepSeek (PAID) |
|---|---|---|
| Tasks | NL-to-SQL generation, QA scoring/hallucination, draft summaries, headline extraction, all retries | Final chat response polish (only when explicitly needed) |
| Model | llama-3.3-70b-versatile | deepseek-chat |
| Cost | $0 | Pay per token |
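The routing rule above boils down to a one-line decision: everything runs on Groq's free tier except the optional final polish. A minimal sketch (the task labels are illustrative assumptions, not identifiers from the real codebase):

```python
# Route each task to a model per the cost strategy. Only the final
# polish step may use the paid DeepSeek model; every other task label
# falls through to the free Groq model.
GROQ_MODEL = "llama-3.3-70b-versatile"
DEEPSEEK_MODEL = "deepseek-chat"

def pick_model(task: str) -> str:
    # "final_polish" is a hypothetical task name used for illustration
    return DEEPSEEK_MODEL if task == "final_polish" else GROQ_MODEL
```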
User Query → NL-to-SQL (Groq) → Data Fetch → Gen Agent (Groq) → QA Agent (Groq) → Response

If the QA score is below the threshold: retry (max 3x, all on Groq).
Otherwise: optional DeepSeek polish, then the final response.
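The Gen→QA retry loop above can be sketched as follows. The callable signatures and the 0.8 threshold are assumptions for illustration, not the real orchestrator API:

```python
MAX_RETRIES = 3
QA_THRESHOLD = 0.8  # assumed score threshold; the real value lives in config

def run_pipeline(query: str, generate, score, polish=None) -> str:
    """generate/score/polish stand in for the Groq/DeepSeek agents.

    Drafts and retries stay on Groq (free); the paid polish step runs
    at most once, only after a draft passes QA or retries are exhausted.
    """
    draft = generate(query)
    for _ in range(MAX_RETRIES):
        if score(draft) >= QA_THRESHOLD:
            break
        draft = generate(query)  # retry on Groq (free)
    return polish(draft) if polish else draft
```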
- Each client has isolated config (YAML)
- Each client has isolated data (BigQuery dataset or mock)
- No shared state between clients
- Framework is client-agnostic
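A per-client YAML file as described above might look like this. The file name and every field except `use_mock` (which is mentioned elsewhere in this document) are hypothetical:

```yaml
# clients/acme.yaml (hypothetical example; field names are assumptions
# based only on the options mentioned in this document)
client_id: acme
name: Acme Corp
use_mock: true        # use MockDataService instead of BigQuery
channels:
  - slack             # Slack first, then Teams, then WhatsApp
```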
- MockDataService generates realistic business data
- Same pipeline as real data (Gen Agent → QA Agent)
- Configurable via use_mock: true in the client YAML
- Seed-based for reproducible demos
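Seed-based generation means the same seed always produces the same demo rows. A stdlib sketch of the pattern (the schema shown is illustrative, not the real MockDataService output):

```python
import random
from datetime import date, timedelta

def generate_mock_sales(seed: int = 42, days: int = 7) -> list[dict]:
    """Return reproducible mock rows: a fixed seed yields identical demos."""
    rng = random.Random(seed)  # isolated RNG, so global state doesn't leak in
    start = date(2024, 1, 1)
    return [
        {
            "day": (start + timedelta(days=i)).isoformat(),
            "revenue": round(rng.uniform(1_000, 5_000), 2),
            "orders": rng.randint(10, 100),
        }
        for i in range(days)
    ]
```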
| File | Purpose |
|---|---|
| src/clawrity/models/client.py | Client configuration models |
| src/clawrity/models/chat.py | Chat request/response models |
| src/clawrity/models/digest.py | Digest report models |
| src/clawrity/models/qa.py | QA scoring models |
| src/clawrity/models/rag.py | RAG recommendation models (Phase 2) |
| src/clawrity/models/forecast.py | Forecast models (Phase 3) |
| src/clawrity/config/settings.py | Environment-based settings |
| src/clawrity/config/client_loader.py | YAML client config loader |
| src/clawrity/services/mock_data.py | Mock data for demo |
| src/clawrity/services/data_service.py | Unified data access (mock + BigQuery) |
| src/clawrity/services/nl_to_sql.py | NL-to-SQL query generation |
| src/clawrity/agents/gen_agent.py | Summary generation (Groq + DeepSeek) |
| src/clawrity/agents/qa_agent.py | Hallucination scoring (Groq) |
| src/clawrity/agents/orchestrator.py | Gen→QA pipeline with retry |
| src/clawrity/utils/exceptions.py | Custom exception hierarchy |
| src/clawrity/utils/logging.py | Structured logging setup |
| src/clawrity/utils/formatters.py | Markdown output formatters |
| infra/ | Terraform scripts for Azure deployment |
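src/clawrity/config/settings.py loads environment-based settings; the CLAWRITY_-prefixed variables below feed it. A minimal stdlib sketch of that pattern (the real module may use pydantic-settings, and only a few fields are shown):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Illustrative env-based settings; values come from CLAWRITY_* variables."""
    app_name: str
    debug: bool
    groq_api_key: str

    @classmethod
    def from_env(cls, prefix: str = "CLAWRITY_") -> "Settings":
        def get(name: str, default: str = "") -> str:
            return os.environ.get(prefix + name, default)
        return cls(
            app_name=get("APP_NAME", "clawrity"),
            debug=get("DEBUG", "false").lower() == "true",
            groq_api_key=get("GROQ_API_KEY"),  # never hardcoded, per the rules above
        )
```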
CLAWRITY_APP_NAME=clawrity
CLAWRITY_DEBUG=false
CLAWRITY_LOG_LEVEL=INFO
# Groq (FREE — use for all non-critical tasks)
CLAWRITY_GROQ_API_KEY=gsk-...
CLAWRITY_GROQ_MODEL=llama-3.3-70b-versatile
# DeepSeek (PAID — use only for final responses)
CLAWRITY_DEEPSEEK_API_KEY=sk-...
CLAWRITY_DEEPSEEK_BASE_URL=https://api.deepseek.com
CLAWRITY_DEEPSEEK_MODEL=deepseek-chat
# BigQuery (empty = use mock)
CLAWRITY_GOOGLE_APPLICATION_CREDENTIALS=
CLAWRITY_BIGQUERY_PROJECT_ID=
# Slack
CLAWRITY_SLACK_BOT_TOKEN=xoxb-...
CLAWRITY_SLACK_APP_TOKEN=xapp-...
# Server
CLAWRITY_HOST=0.0.0.0
CLAWRITY_PORT=8000
CLAWRITY_CLIENTS_DIR=clients
CLAWRITY_DATA_DIR=data

- Unit tests: test each model, service, and agent in isolation
- Integration tests: Test full pipeline with mock data
- Coverage target: 80%+ for business logic
- Run tests: pytest tests/unit/ from the project root
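Testing agents "in isolation" typically means stubbing the LLM call. A self-contained illustration of that style (the interface shown is an assumption, not the real gen-agent API):

```python
# Illustrative isolation pattern: the LLM is replaced with a fake so the
# unit test runs without network access or API keys.
class FakeLLM:
    def invoke(self, prompt: str) -> str:
        return "stubbed summary"

def summarize(llm, data: dict) -> str:
    # stand-in for a gen-agent method under test
    return llm.invoke(f"Summarize: {data}")

def test_summarize_uses_llm():
    assert summarize(FakeLLM(), {"revenue": 100}) == "stubbed summary"
```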
Before committing:
- ruff check . (no linting errors; TC001 suggestions are OK)
- ruff format . (consistent formatting)
- pytest tests/unit/ (all tests pass)
- Conventional commits: feat:, fix:, refactor:, test:, docs:, chore:
- Atomic commits: one logical change per commit
- Never commit secrets or .env files
- Don't modify Pydantic model structure without updating all dependent code
- Don't change exception hierarchy without updating error handlers
- Don't add new dependencies without checking pyproject.toml compatibility
- Don't hardcode client-specific logic — use YAML config instead
- Don't use DeepSeek for non-critical tasks — use Groq (free) instead
- Phase 0: COMPLETE
- Phase 1: COMPLETE (core pipeline)
- Phase 2: NOT STARTED (RAG recommendations)
- Phase 3: NOT STARTED (ML forecasting)
- Get a Groq API key (free at https://console.groq.com)
- Add keys to the .env file
- Test end-to-end with uvicorn clawrity.api.app:app --reload
- Deploy to Azure with cd infra && terraform apply
- Begin Phase 2 (RAG) or Phase 3 (Forecasting)