SarimMalik01/ODESSA

ODESSA: Static Code Analysis Platform (Beta v1.1)

ODESSA Demo

ODESSA is a static code analysis platform designed to surface deep, structural engineering risks early — before they turn into expensive production problems.

Instead of focusing on syntax or isolated linting, ODESSA analyzes architecture, performance, security, and browser compatibility together, across entire repositories, using a deterministic and explainable analysis pipeline.

In addition, ODESSA integrates a Retrieval-Augmented Generation (RAG) system, enabling users to interactively explore codebases through natural language queries, providing context-aware, repository-specific insights grounded in actual code and analysis results.

This repository contains the Beta (v1.1) implementation of ODESSA.


🚩 The Problem

In real-world systems, the most damaging issues are rarely syntax errors.

They are structural:

  • Architectural drift that slowly erodes maintainability
  • Performance bottlenecks hidden in hot execution paths
  • Security risks that go unnoticed until exploitation
  • Browser APIs that silently fail across environments
  • Fragmented tooling and collaboration around these risks

Most tools solve these problems in isolation.
ODESSA brings them together in one system.


📐 System Architecture (High-Level Design)

ODESSA High-Level Architecture Diagram

This high-level design (HLD) illustrates ODESSA's end-to-end architecture: GitHub OAuth-based repository access, queue-driven scan execution, and isolated, ephemeral scan containers, along with additional endpoints for sharing, commenting, and analytics.

The design emphasizes:

  • Strong isolation via per-scan containers
  • Deterministic, reproducible analysis
  • Clear separation between API, workers, and scan engine
  • Secure handling of OAuth tokens

🔍 Design Philosophy

ODESSA is built around a few core beliefs:

  • Architecture determines long-term system health
  • Many critical risks only emerge across files, not within one file
  • Static analysis must be explainable, deterministic, and reproducible
  • Collaboration is as important as detection

The system is intentionally designed to prioritize signal over noise.


🧠 What ODESSA Analyzes

ODESSA surfaces issues across four domains:

🧱 Architecture

  • Circular dependencies
  • Layer violations
  • God modules
  • Dependency instability
  • Domain leakage into UI paths

⚡ Performance

  • Nested loops
  • Function calls inside loops
  • Expensive operations in hot paths
  • Cross-file hot functions
  • Multi-caller hot paths

🔐 Security

  • eval usage
  • new Function usage
  • Hardcoded secrets
  • Unsafe DOM manipulation

🌐 Browser Compatibility

  • Clipboard API usage
  • Fetch API assumptions
  • ResizeObserver usage
  • IntersectionObserver usage
  • Web Share API usage
  • CSS Grid & Flex Gap support
  • Optional chaining & nullish coalescing
  • Promise.allSettled availability

Beta (v1.1) currently includes ~30 custom rules, spanning:

  • Single-file analysis
  • Multi-file / cross-file analysis

🔐 Unified Scanning Model

ODESSA provides a single scanning mode:

Repository-based scanning via GitHub OAuth, supporting both public and private repositories.

There is no ZIP upload or local scanning mode.

All scans are executed against real repositories, using authenticated access.


🔑 Repository Access (GitHub OAuth)

  • Users authenticate via GitHub OAuth
  • Scoped access is granted to:
    • Public repositories
    • Private repositories (with explicit user consent)
  • No GitHub credentials are stored permanently
  • Repositories are cloned into temporary, isolated workspaces

This enables analysis of production-grade codebases without manual uploads.


🧠 End-to-End Scanning Flow (Step by Step)

This section describes exactly how a scan works internally.

1️⃣ Scan Initiation

  • User selects a repository after GitHub OAuth
  • Backend creates a scan job
  • A unique SCAN_ID is generated
  • Job metadata is stored (repo URL, commit ref, user)

2️⃣ Job Queuing (BullMQ + Redis)

  • Scan job is pushed to Redis-backed BullMQ
  • Backend remains non-blocking
  • Scan progress is exposed via polling

Heavy analysis is fully decoupled from user interaction.
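
Steps 1 and 2 can be pictured with a small in-memory stand-in for the queue. This is only a sketch: the real system uses BullMQ on Redis, and `ScanJobRecord`, `InMemoryScanQueue`, the status names, and the `scan-<n>` ID format are hypothetical, not ODESSA's actual types.

```typescript
// Hypothetical shapes: field and status names are illustrative assumptions.
type ScanJobStatus = "queued" | "running" | "completed" | "failed";

interface ScanJobRecord {
  scanId: string; // unique SCAN_ID generated at initiation
  repoUrl: string;
  commitRef: string;
  userId: string;
  status: ScanJobStatus;
}

// In-memory stand-in for a Redis-backed queue. This is NOT the BullMQ API.
class InMemoryScanQueue {
  private pending: string[] = [];
  private records = new Map<string, ScanJobRecord>();
  private counter = 0;

  // Called by the API handler: store metadata, enqueue, return immediately.
  enqueue(repoUrl: string, commitRef: string, userId: string): string {
    this.counter += 1;
    const scanId = `scan-${this.counter}`; // a real system would likely use a UUID
    this.records.set(scanId, { scanId, repoUrl, commitRef, userId, status: "queued" });
    this.pending.push(scanId);
    return scanId;
  }

  // Called by a worker: take the oldest queued job, if any.
  takeNext(): ScanJobRecord | undefined {
    const scanId = this.pending.shift();
    if (scanId === undefined) return undefined;
    const record = this.records.get(scanId);
    if (record !== undefined) record.status = "running";
    return record;
  }

  // Called by the worker when the scan ends.
  finish(scanId: string, ok: boolean): void {
    const record = this.records.get(scanId);
    if (record !== undefined) record.status = ok ? "completed" : "failed";
  }

  // Called by the polling endpoint.
  statusOf(scanId: string): ScanJobStatus | undefined {
    return this.records.get(scanId)?.status;
  }
}
```

The API handler only calls `enqueue` and returns the scan ID, while the frontend polls `statusOf`. That split is what keeps the backend non-blocking.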

3️⃣ Scan Engine Container Spawn

  • A BullMQ worker starts a fresh scan-engine Docker container
  • Container receives:
    • Repository URL
    • Commit or branch reference
    • Scan ID
  • Repository is cloned inside the container
  • Workspace is fully isolated

User code never runs inside backend containers.
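
As a rough illustration of how a worker might pass these inputs to a fresh container, the sketch below builds an argument list for `docker run`. The environment variable names (`REPO_URL`, `REPO_REF`, `SCAN_ID`) and the container naming scheme are assumptions; only the image name comes from the build command in this README.

```typescript
// Hypothetical input shape; the env var names below are assumptions.
interface ScanContainerInput {
  repoUrl: string;
  ref: string; // commit SHA or branch name
  scanId: string;
}

// Build the argument list for `docker run`. Passing an argv array (rather
// than one shell string) avoids shell injection through repository URLs.
function buildScanRunArgs(
  input: ScanContainerInput,
  image: string = "odessa-scan-engine",
): string[] {
  return [
    "run",
    "--rm", // remove the container when it exits
    "--name", `scan-${input.scanId}`,
    "-e", `REPO_URL=${input.repoUrl}`,
    "-e", `REPO_REF=${input.ref}`,
    "-e", `SCAN_ID=${input.scanId}`,
    image,
  ];
}
```

A worker could hand this array to Node's `child_process.spawn("docker", args)`. The `--rm` flag matches the "destroyed after completion" behavior described for scan containers.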

4️⃣ File System Normalization

Inside the scan engine:

  • Repository is normalized into a deterministic workspace
  • Supported source files are discovered
  • A hierarchical file tree is constructed for UI rendering

This ensures reproducible scans.
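
One way to get a deterministic tree is to sort the discovered paths before building it, so the result does not depend on file system listing order. The sketch below assumes repository-relative, `/`-separated paths; `TreeEntry` and `buildFileTree` are hypothetical names.

```typescript
// Hypothetical tree node used for UI rendering.
interface TreeEntry {
  name: string;
  children: TreeEntry[]; // empty for files
}

// Build a hierarchical tree from repository-relative paths. Sorting the
// input first makes the output independent of the order paths arrive in.
function buildFileTree(paths: string[]): TreeEntry {
  const root: TreeEntry = { name: "", children: [] };
  const sorted = [...paths].sort();
  for (const path of sorted) {
    let current = root;
    for (const part of path.split("/")) {
      if (part === "") continue; // ignore leading or doubled slashes
      let next = current.children.find((child) => child.name === part);
      if (next === undefined) {
        next = { name: part, children: [] };
        current.children.push(next);
      }
      current = next;
    }
  }
  return root;
}
```

Two scans of the same commit then produce the same tree even if the clone lists files in a different order.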

5️⃣ AST Parsing & Traversal

For each discovered file:

  • File is parsed into an AST
  • AST is traversed node-by-node
  • Multiple signals are collected in a single traversal

Collected signals include:

  • Function declarations
  • Imports and dependencies
  • Loop depth
  • Function calls
  • Performance heuristics
  • Security-sensitive constructs
  • Browser API usage
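
The single-traversal idea can be sketched with a simplified node shape. The node type strings follow ESTree naming, but `AstNode`, `FileSignals`, and the flat `name` field are simplifying assumptions; a real parser produces richer nodes.

```typescript
// Minimal ESTree-like node; real parsers produce richer shapes.
interface AstNode {
  type: string;
  line: number;
  name?: string; // function, callee, or import source where applicable
  children?: AstNode[];
}

interface FileSignals {
  functions: string[];
  calls: string[];
  imports: string[];
  maxLoopDepth: number;
}

const LOOP_TYPES = new Set([
  "ForStatement",
  "ForInStatement",
  "ForOfStatement",
  "WhileStatement",
  "DoWhileStatement",
]);

// One depth-first pass collects every signal at once.
function collectSignals(root: AstNode): FileSignals {
  const signals: FileSignals = { functions: [], calls: [], imports: [], maxLoopDepth: 0 };

  const visit = (node: AstNode, loopDepth: number): void => {
    // Entering a loop increases the depth seen by everything beneath it.
    const depth = LOOP_TYPES.has(node.type) ? loopDepth + 1 : loopDepth;
    if (depth > signals.maxLoopDepth) signals.maxLoopDepth = depth;

    if (node.name !== undefined) {
      if (node.type === "FunctionDeclaration") signals.functions.push(node.name);
      if (node.type === "CallExpression") signals.calls.push(node.name);
      if (node.type === "ImportDeclaration") signals.imports.push(node.name);
    }

    for (const child of node.children ?? []) visit(child, depth);
  };

  visit(root, 0);
  return signals;
}
```

Because loop depth is threaded through the recursion, the nested-loop signal costs no extra pass over the tree.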

6️⃣ Rule Execution During Traversal

Rules are executed inline during AST traversal:

  • Rules are indexed by AST node type
  • Only relevant rules run per node
  • Hits are deduplicated per file and line

Rule domains:

  • Architecture
  • Performance
  • Security
  • Browser compatibility

This avoids redundant passes and keeps scans efficient.
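
A minimal sketch of rule dispatch by node type is shown below. The rule ID `security/no-eval`, the `NodeRule` interface, and the choice to include the rule ID in the deduplication key are assumptions made for illustration, not ODESSA's actual rule format.

```typescript
interface RuleNode {
  type: string;
  line: number;
  name?: string;
  children?: RuleNode[];
}

interface RuleHit {
  ruleId: string;
  file: string;
  line: number;
}

interface NodeRule {
  id: string; // hypothetical rule ID
  nodeTypes: string[]; // node types this rule cares about
  matches(node: RuleNode): boolean;
}

// Index rules by node type once, so each node only consults relevant rules.
function indexRules(rules: NodeRule[]): Map<string, NodeRule[]> {
  const index = new Map<string, NodeRule[]>();
  for (const rule of rules) {
    for (const nodeType of rule.nodeTypes) {
      const bucket = index.get(nodeType) ?? [];
      bucket.push(rule);
      index.set(nodeType, bucket);
    }
  }
  return index;
}

function runRules(root: RuleNode, file: string, rules: NodeRule[]): RuleHit[] {
  const index = indexRules(rules);
  const seen = new Set<string>(); // dedup key: rule + file + line (assumed)
  const hits: RuleHit[] = [];

  const visit = (node: RuleNode): void => {
    for (const rule of index.get(node.type) ?? []) {
      const key = `${rule.id}:${file}:${node.line}`;
      if (rule.matches(node) && !seen.has(key)) {
        seen.add(key);
        hits.push({ ruleId: rule.id, file, line: node.line });
      }
    }
    for (const child of node.children ?? []) visit(child);
  };

  visit(root);
  return hits;
}

// Example rule for the `eval` usage check from the Security list.
const noEvalRule: NodeRule = {
  id: "security/no-eval",
  nodeTypes: ["CallExpression"],
  matches: (node) => node.name === "eval",
};
```

With this layout, a node of a type that no rule registered for costs one map lookup and nothing more.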

7️⃣ Global & Cross-File Analysis

After all files are scanned:

  • Architecture rules run using a global dependency graph
  • Cross-file performance rules detect:
    • Hot functions across modules
    • Multi-caller performance risks

These issues cannot be detected from single files alone.
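
Circular dependency detection is a good example of a check that needs the global graph. The sketch below uses a standard depth-first search; `DependencyGraph` and `findImportCycle` are hypothetical names, and this is not necessarily the algorithm ODESSA uses.

```typescript
// Module dependency graph: module path -> modules it imports.
type DependencyGraph = Map<string, string[]>;

// Return one import cycle as a path (first module repeated at the end),
// or null when the graph is acyclic. Uses depth-first search with a
// "visiting" mark for modules on the current path.
function findImportCycle(graph: DependencyGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (moduleId: string): string[] | null => {
    state.set(moduleId, "visiting");
    stack.push(moduleId);
    for (const dep of graph.get(moduleId) ?? []) {
      const depState = state.get(dep);
      if (depState === "visiting") {
        // dep is already on the current path, so the path closes a cycle.
        return [...stack.slice(stack.indexOf(dep)), dep];
      }
      if (depState === undefined) {
        const cycle = visit(dep);
        if (cycle !== null) return cycle;
      }
    }
    stack.pop();
    state.set(moduleId, "done");
    return null;
  };

  for (const moduleId of Array.from(graph.keys())) {
    if (state.get(moduleId) === undefined) {
      const cycle = visit(moduleId);
      if (cycle !== null) return cycle;
    }
  }
  return null;
}
```

The graph itself would come from the imports collected during per-file traversal, which is why this step can only run after all files are scanned.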

8️⃣ Issue Normalization

All findings are merged into a unified issue model:

  • Architecture issues
  • Performance issues
  • Security issues
  • Browser compatibility issues

Each issue includes:

  • Rule ID
  • Severity
  • Explanation
  • File and line context
  • Additional metadata
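
A unified issue model along these lines might look like the sketch below. The field names, the severity levels, the default severity, and the sort order are all assumptions for illustration; the README only specifies which kinds of information each issue carries.

```typescript
type IssueDomain = "architecture" | "performance" | "security" | "browser-compat";
type IssueSeverity = "low" | "medium" | "high"; // assumed levels

interface NormalizedIssue {
  ruleId: string;
  domain: IssueDomain;
  severity: IssueSeverity;
  explanation: string;
  file: string;
  line: number;
  metadata: Record<string, unknown>;
}

// Raw finding as a rule might emit it (hypothetical shape).
interface RawFinding {
  ruleId: string;
  domain: IssueDomain;
  file: string;
  line: number;
  message: string;
  severity?: IssueSeverity;
  metadata?: Record<string, unknown>;
}

// Locale-independent ordering: file, then line, then rule ID.
function compareIssues(a: NormalizedIssue, b: NormalizedIssue): number {
  if (a.file !== b.file) return a.file < b.file ? -1 : 1;
  if (a.line !== b.line) return a.line - b.line;
  if (a.ruleId !== b.ruleId) return a.ruleId < b.ruleId ? -1 : 1;
  return 0;
}

// Merge findings from all domains into one list with a stable order.
function normalizeFindings(findings: RawFinding[]): NormalizedIssue[] {
  return findings
    .map((raw): NormalizedIssue => ({
      ruleId: raw.ruleId,
      domain: raw.domain,
      severity: raw.severity ?? "medium", // assumed default
      explanation: raw.message,
      file: raw.file,
      line: raw.line,
      metadata: raw.metadata ?? {},
    }))
    .sort(compareIssues);
}
```

Sorting on fixed keys keeps the output identical between runs, which fits the goal of reproducible scans.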

9️⃣ Asynchronous LLM Enrichment

  • Normalized issues are optionally enriched via an LLM
  • This step is:
    • Fire-and-forget
    • Non-blocking
    • Failure-tolerant
  • The same LLM layer also powers the RAG-based query system for interactive repository exploration

The core scan remains deterministic, even if enrichment fails.
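
The fire-and-forget pattern can be sketched as follows. `IssueEnricher` and `finishScan` are hypothetical names, and the enricher here stands in for whatever LLM call the real system makes.

```typescript
interface EnrichableIssue {
  ruleId: string;
  explanation: string;
}

// Hypothetical enricher signature; a real one would call an LLM API.
type IssueEnricher = (issues: EnrichableIssue[]) => Promise<void>;

// Return the deterministic results immediately. Enrichment runs in the
// background; any failure goes to onError and never reaches the caller.
function finishScan(
  issues: EnrichableIssue[],
  enrich: IssueEnricher,
  onError: (error: unknown) => void = () => {},
): EnrichableIssue[] {
  // Starting from Promise.resolve() means a synchronous throw inside
  // enrich() is also routed to the catch handler.
  void Promise.resolve()
    .then(() => enrich(issues))
    .catch(onError);
  return issues;
}
```

The caller never awaits the enrichment promise, so a slow or failing LLM cannot delay or change the scan result.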


🐳 Container & Deployment Architecture

ODESSA intentionally separates responsibilities across containers.

1️⃣ Scan Engine Dockerfile

Purpose:
Executes static analysis in a fully isolated environment.

Responsibilities:

  • Clone repository
  • Parse files and ASTs
  • Execute rules
  • Produce normalized results

Build:

docker build -t odessa-scan-engine ./scan-engine

Each scan runs in a fresh container, which is destroyed after completion.

2️⃣ Backend & Redis Dockerfiles

Backend Container

  • API layer
  • GitHub OAuth handling
  • Project and user management
  • Job orchestration
  • Result persistence

Redis Container

  • BullMQ-backed job queue
  • Scan scheduling
  • Worker coordination

Each service runs in its own container for isolation and scalability.


3️⃣ docker-compose.yml

The docker-compose.yml orchestrates:

  • Backend service
  • Redis service
  • Shared volumes for job metadata

Volumes allow BullMQ workers to:

  • Trigger scan-engine containers
  • Share temporary scan context
  • Avoid persisting user code beyond scan lifecycle

🔐 Security & Isolation Guarantees

  • Every scan runs in a fresh container
  • Workspaces are destroyed after completion
  • User code is never persisted
  • No cross-scan contamination is possible
  • OAuth scopes are minimal and revocable


💬 RAG-Powered Code Understanding

Hybrid Importance Score Search Diagram

ODESSA integrates a Retrieval-Augmented Generation (RAG) system to enable interactive, context-aware exploration of scanned repositories.

Instead of manually navigating large codebases, users can ask natural language questions and receive grounded, repository-specific answers.

🔍 What RAG Enables

  • High-level project understanding
  • Architecture and data-flow explanations
  • Deep code-level insights
  • Identification of risks, bottlenecks, and vulnerabilities
  • Context-aware debugging and improvement suggestions

⚙️ How It Works

  • Relevant code chunks and scan results are indexed
  • Queries are matched using semantic retrieval
  • Retrieved context is passed to an LLM
  • Responses are generated grounded in actual repository data

This ensures answers are:

  • Contextual
  • Explainable
  • Repository-specific (not generic LLM guesses)
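
The retrieval step can be sketched with plain cosine similarity over precomputed embeddings. This is a simplified stand-in: it does not show how embeddings are produced, it does not implement any hybrid or importance-based scoring, and `IndexedChunk`, `retrieveTopK`, and `buildGroundedPrompt` are hypothetical names.

```typescript
interface IndexedChunk {
  source: string; // e.g. a file path or an issue ID
  text: string;
  embedding: number[]; // produced by an embedding model (not shown)
}

// Cosine similarity of two vectors, assumed to have the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to the query and keep the best k.
function retrieveTopK(queryEmbedding: number[], chunks: IndexedChunk[], k: number): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Assemble the prompt so the LLM answers from retrieved context only.
function buildGroundedPrompt(question: string, context: IndexedChunk[]): string {
  const contextText = context.map((chunk) => `[${chunk.source}]\n${chunk.text}`).join("\n\n");
  return `Answer using only the repository context below.\n\n${contextText}\n\nQuestion: ${question}`;
}
```

Tagging each chunk with its source in the prompt is one way to keep answers traceable to specific files or scan results.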

🎥 Demo

Full Demo (earlier beta with ZIP-based scanning, which is now deprecated and has been replaced by repository-based scanning):
https://drive.google.com/file/d/1q3Tor2KKY0fF3mLtv3p_2pMhNDroWp1A/view?usp=sharing

Full Demo (repository-based scanning via GitHub OAuth, public + private repositories):
https://drive.google.com/file/d/1wJQMXXzZJYy6AI9lgzVAWHhccGg3KqSt/view?usp=sharing

RAG Feature Demo:
https://drive.google.com/file/d/1vMkvpto-7prn9wytdMaKToBgJIj7wV8a/view?usp=sharing


🧪 Current Status

  • Stage: Beta (v1.1)
  • Scanning Mode: Repository-based only
  • Access: GitHub OAuth (public + private repos)
  • Focus: Signal quality and real-world applicability

🚀 What’s Next

Beta (v1.1) is intentionally a stepping stone.

Planned improvements include:

  • Deeper architectural heuristics
  • Better dependency graph visualization
  • Smarter rule prioritization
  • Improved collaboration workflows
  • Performance and scalability enhancements

🔄 Version Evolution

ODESSA has evolved through multiple iterations, each improving how repositories are analyzed and explored:

  • Beta → ZIP-based file upload scanning
  • Beta (v1.0) → Repository URL-based scanning with GitHub OAuth (public + private repos)
  • Beta (v1.1) → Introduction of RAG-powered code understanding: natural language queries for interactive, context-aware repository exploration

🚀 Launching the Project Locally

Backend & Redis

From the backend directory:

docker compose up --build

This starts:

  • Backend service
  • Redis (BullMQ queue)
  • Scan orchestration support

Frontend

From the frontend directory:

npm install
npm run dev -- --host 127.0.0.1

Notes

  • This project is under active development
  • APIs, rules, and structure may evolve
  • Feedback is highly appreciated

About

ODESSA is a deterministic static analysis engine that identifies architectural drift, performance bottlenecks, security smells, and browser compatibility risks before code reaches production, with built-in collaboration and a focus on signal over noise.
