
Clawrity — Project Initialization Document

What is Clawrity?

Clawrity is a multi-channel AI business intelligence platform that replaces traditional dashboards with conversational interfaces. Enterprise users interact with their business data through Slack, Teams, or WhatsApp and receive data-grounded insights, reports, and predictions.

Tech Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Language | Python 3.11+ | Core application |
| Package Manager | uv (preferred) or pip | Dependency management |
| API Framework | FastAPI | REST API + WebSocket support |
| LLM | DeepSeek (via OpenAI-compatible API) | NL-to-SQL, summary generation, QA |
| LLM Framework | LangChain | Agent orchestration, prompt management |
| Data Models | Pydantic v2 | Validation, serialization, settings |
| Data Source | BigQuery (real) / Mock Service (demo) | Business data queries |
| Channels | Slack SDK (first), Teams, WhatsApp | User interaction |
| Containerization | Docker + docker-compose | Deployment |
| CI/CD | GitHub Actions | Lint, typecheck, test, build |
| Logging | structlog | Structured JSON logging |
| Testing | pytest + pytest-cov + pytest-asyncio | Unit + integration tests |
| Linting | ruff | Code formatting + style |
| Type Checking | mypy (strict) | Static type analysis |
| Security | bandit | Security linting |

Architecture Overview

User (Slack/Teams/WhatsApp)
    │
    ▼
Channel Adapter (Slack Bolt / Teams Bot / WhatsApp)
    │
    ▼
FastAPI Application
    ├── POST /api/v1/chat          → Chat endpoint
    ├── POST /api/v1/digest/generate → Daily digest
    ├── GET  /api/v1/digest/{id}   → Retrieve digest
    ├── CRUD /api/v1/clients       → Client management
    └── GET  /health               → Health check
    │
    ▼
Agent Orchestrator
    ├── Gen Agent (DeepSeek) → Generate newsletter-style summary
    ├── QA Agent (DeepSeek)  → Score for hallucinations
    └── Retry Logic          → Max 3 retries, decreasing temperature
    │
    ▼
Services Layer
    ├── BigQuery Service (or Mock Data Service)
    ├── NL-to-SQL Engine
    ├── Vector Store (Phase 2)
    └── ML Forecast (Phase 3)
    │
    ▼
Client Config (YAML files in clients/ directory)
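The orchestrator's retry loop ("max 3 retries, decreasing temperature") can be sketched as below. The names `run_with_retries` and `QAResult`, and the injected `generate`/`qa_check` callables, are illustrative only, not the project's actual API:

```python
from dataclasses import dataclass

# Hypothetical result type returned by the QA agent's hallucination check.
@dataclass
class QAResult:
    score: float  # 1.0 = fully grounded, 0.0 = hallucinated

def run_with_retries(generate, qa_check, *, max_retries=3,
                     start_temperature=0.7, temperature_step=0.1,
                     threshold=0.85):
    """Run the Gen -> QA loop, lowering temperature on each retry."""
    temperature = start_temperature
    best_summary, best_score = None, -1.0
    for _attempt in range(max_retries):
        summary = generate(temperature=temperature)
        result = qa_check(summary)
        if result.score >= threshold:
            return summary  # grounded enough; accept immediately
        if result.score > best_score:
            best_summary, best_score = summary, result.score
        temperature = max(0.0, temperature - temperature_step)
    return best_summary  # all retries exhausted; return best attempt
```

The defaults mirror the `rules` block in the client YAML (`max_retries: 3`, `temperature_step: 0.1`, `hallucination_threshold: 0.85`); in practice they would be read from that config.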

Three-Phase Build

Phase 1 — Data-Grounded Digest & Chat (Current Focus)

  • Daily automated business digests
  • Natural language queries → SQL → BigQuery → grounded responses
  • Gen Agent → QA Agent pipeline with hallucination detection
  • Newsletter-style summaries
  • Bottom 3 performing branches analysis
  • Budget allocation insights
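The "natural language queries → SQL → BigQuery → grounded responses" flow might look roughly like this. The function names and the injected `llm`/`run_query` callables are hypothetical; `llm` stands in for a DeepSeek chat completion via the OpenAI-compatible API:

```python
def nl_to_sql(question: str, schema: str, llm) -> str:
    """Ask the LLM to translate a question into SQL against a known schema."""
    prompt = (
        "Translate the question into a single BigQuery SQL statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\nSQL:"
    )
    sql = llm(prompt).strip()
    # Guardrail: only read-only queries may reach BigQuery.
    if not sql.lower().startswith("select"):
        raise ValueError("generated statement is not a SELECT")
    return sql

def answer_question(question, schema, llm, run_query):
    """NL -> SQL -> data -> grounded answer (answer may only cite the rows)."""
    sql = nl_to_sql(question, schema, llm)
    rows = run_query(sql)
    answer = llm(
        f"Question: {question}\nRows: {rows}\nAnswer using only these rows:"
    )
    return {"sql": sql, "rows": rows, "answer": answer}
```

Injecting the LLM and query runner as callables keeps the flow unit-testable with fakes, which matters given the mock-data demo mode.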

Phase 2 — RAG-Based Recommendations

  • Vector store for client historical data + industry benchmarks
  • RAG pipeline for grounded recommendations
  • Client-specific risk tolerance via YAML config
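A minimal sketch of the Phase 2 retrieval step, assuming embeddings have already been computed (the real vector store and embedding model would handle that part):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=3):
    """store: list of (text, embedding) pairs; return the k most similar texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved texts (client history, industry benchmarks) would then be placed in the Gen Agent's prompt so recommendations stay grounded.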

Phase 3 — Forecasting & ROI Predictions

  • Prophet/ML models for forecasting
  • Scenario-based ROI calculations
  • Cached prediction embeddings
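Scenario-based ROI might be computed along these lines; the diminishing-returns revenue model (square root of the spend ratio) is purely illustrative, not the project's actual forecaster:

```python
import math

def scenario_roi(base_spend, base_revenue, scenarios):
    """Project ROI under percentage spend shifts.

    Assumes revenue scales with diminishing returns (sqrt of the spend
    ratio) -- an illustrative stand-in for the Prophet/ML forecast.
    scenarios: {name: fractional spend change}, e.g. {"up20": 0.20}.
    """
    results = {}
    for name, spend_change in scenarios.items():
        spend = base_spend * (1 + spend_change)
        revenue = base_revenue * math.sqrt(spend / base_spend)
        results[name] = round((revenue - spend) / spend, 4)
    return results
```

Note how a 20% spend increase yields less than a proportional ROI gain under this model, which is the kind of insight the budget-allocation analysis targets.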

Client Onboarding

Each client is defined by one YAML file in the clients/ directory.

# clients/acme_corp.yaml
client:
  id: acme_corp
  name: "Acme Corporation"
  risk_tolerance: medium
  budget_reallocation_max: 0.20

data:
  project_id: "acme-bigquery-prod"
  dataset: "acme_analytics"
  tables:
    spend: "daily_spend"
    branches: "branch_master"
    performance: "kpi_daily"
  use_mock: true  # Set to false when BigQuery credentials available

countries:
  - code: US
    name: "United States"
    branches:
      - code: NYC
        name: "New York City"
      - code: LA
        name: "Los Angeles"
  - code: CA
    name: "Canada"
    branches:
      - code: TOR
        name: "Toronto"
      - code: VAN
        name: "Vancouver"
  - code: MA
    name: "Morocco"
    branches:
      - code: CAS
        name: "Casablanca"
      - code: RAB
        name: "Rabat"

digest:
  schedule: "0 8 * * *"
  bottom_n: 3
  metrics: [spend, leads, conversions, roi]

rules:
  min_data_points: 30
  hallucination_threshold: 0.85
  max_retries: 3
  temperature_step: 0.1
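A loader for such a config might validate along these lines. Plain dataclasses are used here for brevity (the project itself standardizes on Pydantic v2 models), and the YAML is assumed to be already parsed into a dict, e.g. via `yaml.safe_load`:

```python
from dataclasses import dataclass

# Sketch only: the real project would model this with Pydantic v2.
@dataclass
class ClientConfig:
    id: str
    name: str
    risk_tolerance: str
    budget_reallocation_max: float
    use_mock: bool

ALLOWED_RISK = {"low", "medium", "high"}

def load_client(raw: dict) -> ClientConfig:
    """Validate a parsed client YAML dict and return a typed config."""
    client, data = raw["client"], raw["data"]
    if client["risk_tolerance"] not in ALLOWED_RISK:
        raise ValueError(f"unknown risk_tolerance: {client['risk_tolerance']}")
    if not 0.0 <= client["budget_reallocation_max"] <= 1.0:
        raise ValueError("budget_reallocation_max must be in [0, 1]")
    return ClientConfig(
        id=client["id"],
        name=client["name"],
        risk_tolerance=client["risk_tolerance"],
        budget_reallocation_max=client["budget_reallocation_max"],
        use_mock=data.get("use_mock", False),
    )
```

With Pydantic v2, the bounds check would become a `Field(ge=0, le=1)` constraint and `risk_tolerance` a `Literal`, giving the same guarantees declaratively.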

Demo Strategy (No BigQuery Credentials)

Since we don't have BigQuery credentials for the demo:

  1. Mock Data Service — Generates realistic business data with configurable seed
  2. YAML Config use_mock: true — Each client config can toggle mock mode
  3. Same Pipeline — Mock data flows through the same Gen Agent → QA Agent pipeline
  4. Realistic Patterns — Data includes trends, outliers, and seasonal patterns
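A seeded generator covering points 1 and 4 could look like this; the function name and parameters are illustrative, not the Mock Data Service's real interface:

```python
import math
import random

def mock_daily_spend(branch: str, days: int = 90, seed: int = 42):
    """Seeded mock data: linear trend + weekly seasonality + noise + rare outliers."""
    rng = random.Random(f"{seed}:{branch}")  # per-branch, fully reproducible
    base = rng.uniform(800, 1200)
    series = []
    for day in range(days):
        trend = base * (1 + 0.002 * day)                       # slow upward drift
        seasonal = 1 + 0.15 * math.sin(2 * math.pi * day / 7)  # weekly cycle
        noise = rng.gauss(1.0, 0.05)                           # day-to-day jitter
        value = trend * seasonal * noise
        if rng.random() < 0.03:                                # occasional outlier
            value *= rng.choice([0.4, 1.8])
        series.append(round(value, 2))
    return series
```

Seeding with the branch name keeps runs reproducible for demos and tests while still giving each branch a distinct series, so the Gen Agent → QA Agent pipeline sees realistic-looking input.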

Deployment

  • Development: docker-compose up locally
  • Production: Azure VM with Docker containers
  • Future: Kubernetes for multi-client scaling

Directory Structure

clawrity/
├── src/clawrity/           # Main application code
│   ├── models/             # Pydantic v2 data models
│   ├── agents/             # AI agents (Gen, QA, Orchestrator)
│   ├── services/           # BigQuery, NL-to-SQL, Mock Data
│   ├── api/                # FastAPI app + routers
│   ├── channels/           # Slack, Teams, WhatsApp adapters
│   ├── config/             # Settings + client config loader
│   └── utils/              # Exceptions, logging, formatters
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   ├── integration/        # Integration tests
│   └── fixtures/           # Test fixtures
├── clients/                # Client YAML configurations
├── data/                   # Mock data + caches
├── docs/                   # Documentation
├── .github/workflows/      # CI/CD pipeline
├── Dockerfile              # Container definition
├── docker-compose.yml      # Local development
├── pyproject.toml          # Project configuration
└── AGENTS.md               # Project-specific rules