Skip to content

Latest commit

 

History

History
282 lines (191 loc) · 6.74 KB

File metadata and controls

282 lines (191 loc) · 6.74 KB

🧠 SamvidAI – High Level Design (HLD)

Document Version: 1.0
Status: Draft
Owner: Vishal Gorule
Last Updated: 30/01/2026


Document Control

Field Value
Project Name SamvidAI
Document Type High Level Design (HLD)
Version 1.0
Document Status Draft
Confidentiality Public

Revision History

Version Date Description Author
1.0 30/01/2026 Initial High Level Design draft Vishal Gorule

Table of Contents

  1. Introduction
  2. Background & Problem Statement
  3. Goals & Non-Goals
  4. Design Principles
  5. Assumptions & Constraints
  6. System Overview
  7. High-Level Architecture
  8. OpticalRAG Design Rationale
  9. Component Breakdown
  10. Model & LLM Strategy
  11. Data Flow & Control Flow
  12. Non-Functional Requirements
  13. Security Considerations
  14. Ethics & Human-in-the-Loop
  15. Failure Scenarios & Mitigations
  16. Deployment Strategy
  17. Scalability & Future Extensions
  18. Open Questions & Risks
  19. Conclusion

1. Introduction

This document describes the High Level Design (HLD) of SamvidAI, an AI-assisted legal contract analysis system. It defines the system’s architectural vision, design principles, major components, and non-functional requirements.

This HLD serves as the authoritative architectural reference for implementation, review, and future evolution.


2. Background & Problem Statement

Legal contracts are long, structured, and risk-sensitive documents. Manual review is slow and expensive, while OCR-based and text-only AI systems fail to preserve layout, hierarchy, and clause structure.

Traditional RAG systems suffer from:

  • Loss of visual semantics
  • Clause fragmentation
  • High token cost
  • Increased hallucinations

A new architecture is required to treat contracts as visual documents, not plain text.


3. Goals & Non-Goals

Goals

  • Accurate clause-level understanding
  • Layout-aware document processing
  • Controlled and explainable LLM usage
  • Reduced hallucinations
  • Mandatory human validation
  • Modular and extensible architecture

Non-Goals

  • Replacing legal professionals
  • Providing legal advice
  • Autonomous legal decision-making
  • Guaranteeing legal correctness

SamvidAI is a decision-support system, not a legal authority.


4. Design Principles

  • Vision-first document understanding
  • Retrieval before reasoning
  • Region-level processing
  • Controlled LLM usage
  • Human-in-the-loop by design
  • Explainability and traceability
  • Cost-aware architecture
  • Safety over automation

5. Assumptions & Constraints

Assumptions

  • Documents are primarily PDFs
  • Users are trained legal professionals
  • Human oversight is always available
  • Modern AI models are accessible
  • Consumer-grade hardware may be used

Constraints

  • Interactive latency requirements
  • Minimized LLM token usage
  • Strict data privacy expectations
  • Conservative and explainable outputs

6. System Overview

SamvidAI consists of ingestion, layout analysis, retrieval, reasoning, and review subsystems. Documents are processed visually, indexed at the region level, retrieved by relevance, reasoned over using constrained LLMs, and validated by humans.


7. High-Level Architecture

The architecture is modular and layered:

  • Ingestion Layer
  • Understanding Layer
  • Retrieval Layer
  • Reasoning Layer
  • Review Layer

Each component has clearly defined responsibilities and interfaces.


8. OpticalRAG Design Rationale

OpticalRAG is a vision-first retrieval-augmented architecture that:

  • Preserves layout and spatial structure
  • Retrieves relevant regions before reasoning
  • Uses LLMs only on grounded inputs
  • Maintains traceability to source clauses

It overcomes the limitations of OCR-only and text-only RAG systems.


9. Component Breakdown

Major components include:

  • Document Ingestion
  • Layout Analysis
  • Text Extraction
  • Embedding Generation
  • Vector Storage
  • Retrieval Engine
  • LLM Reasoning
  • Review & Feedback

Each component is modular and independently evolvable.


10. Model & LLM Strategy

SamvidAI separates model roles:

  • Vision models for layout understanding
  • Embedding models for retrieval
  • LLMs (e.g., Gemini 2.5 Pro) for constrained reasoning

LLMs never ingest full documents and always operate on retrieved regions only.


11. Data Flow & Control Flow

  1. Document ingestion
  2. Layout analysis
  3. Region representation
  4. Embedding and indexing
  5. Query-time retrieval
  6. LLM reasoning
  7. Human review

Control flow is stage-gated and failure-aware.


12. Non-Functional Requirements

  • Predictable performance
  • Scalability across document size and volume
  • Reliability and graceful failure
  • Cost efficiency
  • Explainability and traceability
  • Maintainability and extensibility

13. Security Considerations

Security is enforced through:

  • Data minimization
  • Controlled LLM interaction
  • Local, cloud, or hybrid deployment
  • Access control and auditing
  • Secure failure handling

14. Ethics & Human-in-the-Loop

SamvidAI enforces responsible AI usage through:

  • Mandatory human validation
  • Avoidance of automation bias
  • Explicit uncertainty handling
  • Clear system disclaimers

Human judgment remains authoritative.


15. Failure Scenarios & Mitigations

The system anticipates failures in ingestion, layout analysis, retrieval, reasoning, external services, and human review. Mitigations prioritize transparency, isolation, and human intervention.


16. Deployment Strategy

Supported deployment models:

  • Local (on-premise)
  • Cloud
  • Hybrid

Each balances privacy, scalability, and operational constraints.


17. Scalability & Future Extensions

The architecture supports:

  • Larger and more complex contracts
  • Higher document volumes
  • Multi-language and jurisdiction expansion
  • Advanced analytics and compliance use cases
  • Integration with external systems

18. Open Questions & Risks

Key risks include model variability, layout generalization, cost growth, automation bias, and regulatory evolution. These are mitigated through modular design and human oversight.


19. Conclusion

SamvidAI applies AI to legal document analysis in a careful, responsible manner. By combining vision-first processing, retrieval-driven reasoning, and human validation, the system balances capability with trust, safety, and professional accountability.

This HLD serves as the foundation for implementation and long-term evolution.

📌 Notes

For full explanations and detailed rationale, see HLD.docx