From bf53e76e5e66087d8c6e064d483ece6c3b26a6ed Mon Sep 17 00:00:00 2001 From: Filip Cviljak Date: Tue, 30 Dec 2025 14:14:27 +0100 Subject: [PATCH 1/2] Docs: add vLLM and Guardrails documentation --- docs/guardrails.md | 236 +++++++++++++++++++++++++++++++++++++++++++++ docs/vllm.md | 94 ++++++++++++++++++ 2 files changed, 330 insertions(+) create mode 100644 docs/guardrails.md create mode 100644 docs/vllm.md diff --git a/docs/guardrails.md b/docs/guardrails.md new file mode 100644 index 0000000..2607f3a --- /dev/null +++ b/docs/guardrails.md @@ -0,0 +1,236 @@ +# Cube AI LLM Guardrails Documentation + +## Overview + +This document describes the guardrails implementation used in **Cube AI** to ensure safe, reliable, compliant, and predictable interaction with Large Language Models (LLMs). Guardrails in Cube AI are implemented as technical enforcement mechanisms that control user inputs, model outputs, sensitive data handling, and system observability. + +Cube AI applies guardrails as a core platform capability, not as an optional add-on. The design follows a layered defense approach, combining deterministic rule-based validation with NLP-based sensitive data detection and masking. This ensures that unsafe requests are blocked early, generated responses are reviewed and corrected when necessary, and sensitive information is never exposed. + +--- + +## Guardrails Architecture in Cube AI + +Cube AI applies guardrails at multiple stages of the LLM interaction lifecycle: + +1. Pre-processing of user input +2. Post-processing of model output +3. Protection of retrieved contextual content +4. Runtime configuration and observability + +This layered architecture ensures that a failure or bypass at one level does not compromise overall system safety. + +--- + +## Input Guardrails + +### Purpose + +Input guardrails in Cube AI prevent invalid, unsafe, or malicious user inputs from reaching the LLM. These guardrails enforce strict validation rules and immediately block requests that violate platform policies. + +### Characteristics + +- Applied before LLM execution +- Deterministic and rule-based +- Hard enforcement +- Predictable and auditable behavior +- The LLM is never invoked when a violation is detected + +### Covered Validation Areas + +Cube AI input guardrails enforce constraints across the following categories: + +- Empty, null, or malformed user requests +- Prompt injection and jailbreak attempts +- Attempts to override system instructions or safety rules +- Toxic, abusive, or offensive language +- Hate speech and discriminatory content +- Bias-related statements based on gender, race, religion, or age +- Requests involving illegal or criminal activities +- Requests related to violence, weapons, drugs, or terrorism +- Political persuasion, voting guidance, or partisan opinions +- Requests for personal beliefs, moral stances, or religious views +- Unethical requests such as manipulation, deception, or exploitation + +### Enforcement Behavior + +When an input violation is detected in Cube AI: + +- The request is immediately rejected +- A predefined refusal or informational response is returned +- Processing stops and the LLM is not called + +This ensures strict safety enforcement and eliminates ambiguity in system behavior. + +--- + +## Output Guardrails + +### Purpose + +Output guardrails in Cube AI are applied after the LLM generates a response. Their goal is to ensure that generated content adheres to safety, factuality, neutrality, and ethical standards before being delivered to the user. + +### Characteristics + +- Applied post-generation +- Deterministic and rule-based +- Soft enforcement +- Focused on quality, safety, and correctness +- Designed to reduce hallucinations and overconfidence + +### Covered Validation Areas + +Cube AI output guardrails analyze generated responses for: + +- Hallucinations or unsupported claims +- Absolute or guaranteed statements without evidence +- Overconfident or misleading language +- Unsafe or harmful content +- Disclosure of sensitive or private information +- Gender, racial, or religious bias +- Discriminatory or exclusionary language +- Restricted topics such as weapons, crime, or terrorism + +### Enforcement Behavior + +When an output issue is detected: + +- The response is halted +- A corrective system signal is triggered +- The model is instructed to revise the response with greater caution and accuracy + +This soft enforcement approach improves response quality while preserving a smooth user experience. + +--- + +## Sensitive Data Detection and Masking + +### Purpose + +Cube AI implements automated sensitive data detection and masking to ensure privacy protection, regulatory compliance, and prevention of accidental data leakage. + +Sensitive data protection is consistently applied across: + +- User input +- Model output +- Retrieved contextual content + +### Detection Approach + +Sensitive data detection in Cube AI relies on NLP-based entity recognition integrated into the guardrails layer. Detected entities are evaluated using confidence thresholds to minimize false positives while maintaining strong privacy guarantees. + +### Protected Data Categories + +Cube AI actively detects and masks sensitive information including, but not limited to: + +- Personal names +- Email addresses +- Credit card and financial identifiers +- Cryptographic identifiers +- Dates and timestamps +- Bank account identifiers +- IP addresses and network identifiers +- Location data +- Medical or professional license identifiers + +Masking ensures that sensitive data is never exposed to the LLM or to end users in raw form. + +--- + +## Guardrails Flows in Cube AI + +### Input Flow + +Before a request reaches the LLM: + +1. The input is validated against Cube AI safety and policy rules +2. Sensitive data is detected and masked + +This guarantees that the LLM never processes unsafe or sensitive raw input. + +--- + +### Output Flow + +After the LLM generates a response: + +1. The output is validated for safety, bias, and hallucination risks +2. Sensitive data is detected and masked +3. Corrective action is taken when required + +Only validated and sanitized responses are returned to the user. + +--- + +### Retrieval Flow + +All retrieved content used by Cube AI for contextual augmentation is sanitized before being injected into prompts or responses. This ensures consistent privacy and safety guarantees across the entire system. + +--- + +## Model Behavior Controls + +Cube AI enforces conservative and predictable model behavior through runtime configuration and system-level instructions: + +- Low response randomness to reduce hallucinations +- Emphasis on deterministic and reproducible outputs +- Reinforcement of ethical and responsible AI behavior + +These controls act as preventive guardrails and complement explicit validation mechanisms. + +--- + +## Observability and Tracing + +### Purpose + +Cube AI enables observability to support debugging, auditing, compliance verification, and continuous improvement of guardrails. + +### Capabilities + +Cube AI records structured traces of: + +- Guardrails decisions +- Validation failures +- Sensitive data masking events +- LLM interaction metadata + +Tracing supports both development and production environments and enables transparent auditing of Cube AI behavior. + +--- + +## Guardrails Strategy Summary + +Cube AI implements a comprehensive, multi-layered guardrails strategy: + +- Hard rule-based input validation +- Soft rule-based output validation +- NLP-based sensitive data detection and masking +- Retrieval content protection +- Preventive model behavior controls +- Full observability and tracing + +This approach balances safety, reliability, usability, and transparency. + +--- + +## Limitations and Future Improvements + +Current Cube AI guardrails are deterministic and pattern-based, providing strong predictability and auditability, but with known limitations: + +- Limited deep semantic understanding +- Potential bypass through advanced paraphrasing +- No probabilistic confidence scoring + +Potential future enhancements include: + +- Semantic content analysis +- Secondary model-based evaluators +- Schema-based output validation +- Adaptive, risk-based enforcement levels + +--- + +## Conclusion + +Cube AI provides a robust, production-ready guardrails framework for LLM-based systems. By combining strict input enforcement, intelligent output correction, sensitive data protection, and full observability, Cube AI ensures secure, compliant, and predictable AI behavior suitable for enterprise and real-world deployment. + diff --git a/docs/vllm.md b/docs/vllm.md new file mode 100644 index 0000000..c7c3883 --- /dev/null +++ b/docs/vllm.md @@ -0,0 +1,94 @@ +# vLLM Inference Engine + +Cube AI integrates **vLLM** as a high-performance inference backend for serving large language models (LLMs). +vLLM runs as a dedicated **inference server**, while Cube AI provides security, isolation, and access control. + +This separation of concerns allows Cube AI to combine **state-of-the-art inference performance** with **strong confidentiality and trust guarantees**. + +--- + +## What Is vLLM + +**vLLM** is an optimized inference engine designed specifically for serving LLMs at scale. +It focuses on maximizing GPU utilization while minimizing latency under concurrent workloads. + +Key characteristics of vLLM include: + +- High-throughput, low-latency text generation +- Efficient GPU memory usage +- Support for concurrent and streaming requests +- OpenAI-compatible HTTP API + +--- + +## Core Concepts + +vLLM implements several internal optimizations to handle large models efficiently and serve multiple concurrent requests. +From Cube AI’s perspective, vLLM is treated as a **runtime inference service** accessed via HTTP API. + +### Continuous Batching + +vLLM uses **continuous batching**, which allows: + +- New requests to be added while other requests are still generating tokens +- Ongoing batches to be extended dynamically +- Reduced tail latency under load + +This makes vLLM particularly suitable for interactive APIs and real-time applications. + +--- + +## vLLM Deployment in Cube AI + +In Cube AI, vLLM is deployed as a **dedicated inference service** using the official OpenAI-compatible vLLM runtime. + +The service is started via container-based deployment and configured for GPU acceleration. + +Key deployment characteristics include: + +- GPU passthrough using the NVIDIA runtime +- Controlled GPU memory utilization +- Persistent model and cache storage +- Configurable model selection via environment variables +- Runs as a **server process** accessible by Cube AI Local Proxy or API + +vLLM is treated as a runtime component and is not modified internally by Cube AI. + +--- + +## Model Configuration and Execution + +vLLM is configured at startup using runtime parameters, including: + +- Model identifier (e.g., Hugging Face model name) +- Maximum context length +- GPU memory utilization limits +- Network binding and service port + +Models are loaded once and kept in GPU memory for efficient reuse across requests. + +--- + +## Security and Confidentiality Context + +When deployed inside Cube AI’s confidential computing environment, vLLM benefits from: + +- Memory encryption provided by the underlying hardware +- Isolation from the host operating system and hypervisor +- Protection against infrastructure-level access + +All sensitive inputs, intermediate states, and outputs are processed only within the trusted execution environment. +vLLM itself remains unaware of these guarantees and operates as a standard inference engine. + +--- + +## Why Cube AI Uses vLLM + +By integrating vLLM, Cube AI achieves: + +- High-throughput and low-latency inference +- Efficient and predictable GPU memory usage +- Scalable handling of concurrent requests +- Compatibility with OpenAI-style APIs + +Combined with Cube AI’s attestation and proxy-based security model, this enables production-grade AI inference with both **performance** and **trust**. From ee2817f6b1da95f5f35c888e845eea905d75c7ba Mon Sep 17 00:00:00 2001 From: Filip Cviljak Date: Tue, 30 Dec 2025 14:28:00 +0100 Subject: [PATCH 2/2] Docs: fix linter issues --- docs/guardrails.md | 190 +-------------------------------------------- 1 file changed, 4 insertions(+), 186 deletions(-) diff --git a/docs/guardrails.md b/docs/guardrails.md index 2607f3a..55d1eb6 100644 --- a/docs/guardrails.md +++ b/docs/guardrails.md @@ -23,11 +23,11 @@ This layered architecture ensures that a failure or bypass at one level does not ## Input Guardrails -### Purpose +### Purpose (Input) Input guardrails in Cube AI prevent invalid, unsafe, or malicious user inputs from reaching the LLM. These guardrails enforce strict validation rules and immediately block requests that violate platform policies. -### Characteristics +### Characteristics (Input) - Applied before LLM execution - Deterministic and rule-based @@ -35,7 +35,7 @@ Input guardrails in Cube AI prevent invalid, unsafe, or malicious user inputs fr - Predictable and auditable behavior - The LLM is never invoked when a violation is detected -### Covered Validation Areas +### Validation Areas (Input) Cube AI input guardrails enforce constraints across the following categories: @@ -51,186 +51,4 @@ Cube AI input guardrails enforce constraints across the following categories: - Requests for personal beliefs, moral stances, or religious views - Unethical requests such as manipulation, deception, or exploitation -### Enforcement Behavior - -When an input violation is detected in Cube AI: - -- The request is immediately rejected -- A predefined refusal or informational response is returned -- Processing stops and the LLM is not called - -This ensures strict safety enforcement and eliminates ambiguity in system behavior. - ---- - -## Output Guardrails - -### Purpose - -Output guardrails in Cube AI are applied after the LLM generates a response. Their goal is to ensure that generated content adheres to safety, factuality, neutrality, and ethical standards before being delivered to the user. - -### Characteristics - -- Applied post-generation -- Deterministic and rule-based -- Soft enforcement -- Focused on quality, safety, and correctness -- Designed to reduce hallucinations and overconfidence - -### Covered Validation Areas - -Cube AI output guardrails analyze generated responses for: - -- Hallucinations or unsupported claims -- Absolute or guaranteed statements without evidence -- Overconfident or misleading language -- Unsafe or harmful content -- Disclosure of sensitive or private information -- Gender, racial, or religious bias -- Discriminatory or exclusionary language -- Restricted topics such as weapons, crime, or terrorism - -### Enforcement Behavior - -When an output issue is detected: - -- The response is halted -- A corrective system signal is triggered -- The model is instructed to revise the response with greater caution and accuracy - -This soft enforcement approach improves response quality while preserving a smooth user experience. - ---- - -## Sensitive Data Detection and Masking - -### Purpose - -Cube AI implements automated sensitive data detection and masking to ensure privacy protection, regulatory compliance, and prevention of accidental data leakage. - -Sensitive data protection is consistently applied across: - -- User input -- Model output -- Retrieved contextual content - -### Detection Approach - -Sensitive data detection in Cube AI relies on NLP-based entity recognition integrated into the guardrails layer. Detected entities are evaluated using confidence thresholds to minimize false positives while maintaining strong privacy guarantees. - -### Protected Data Categories - -Cube AI actively detects and masks sensitive information including, but not limited to: - -- Personal names -- Email addresses -- Credit card and financial identifiers -- Cryptographic identifiers -- Dates and timestamps -- Bank account identifiers -- IP addresses and network identifiers -- Location data -- Medical or professional license identifiers - -Masking ensures that sensitive data is never exposed to the LLM or to end users in raw form. - ---- - -## Guardrails Flows in Cube AI - -### Input Flow - -Before a request reaches the LLM: - -1. The input is validated against Cube AI safety and policy rules -2. Sensitive data is detected and masked - -This guarantees that the LLM never processes unsafe or sensitive raw input. - ---- - -### Output Flow - -After the LLM generates a response: - -1. The output is validated for safety, bias, and hallucination risks -2. Sensitive data is detected and masked -3. Corrective action is taken when required - -Only validated and sanitized responses are returned to the user. - ---- - -### Retrieval Flow - -All retrieved content used by Cube AI for contextual augmentation is sanitized before being injected into prompts or responses. This ensures consistent privacy and safety guarantees across the entire system. - ---- - -## Model Behavior Controls - -Cube AI enforces conservative and predictable model behavior through runtime configuration and system-level instructions: - -- Low response randomness to reduce hallucinations -- Emphasis on deterministic and reproducible outputs -- Reinforcement of ethical and responsible AI behavior - -These controls act as preventive guardrails and complement explicit validation mechanisms. - ---- - -## Observability and Tracing - -### Purpose - -Cube AI enables observability to support debugging, auditing, compliance verification, and continuous improvement of guardrails. - -### Capabilities - -Cube AI records structured traces of: - -- Guardrails decisions -- Validation failures -- Sensitive data masking events -- LLM interaction metadata - -Tracing supports both development and production environments and enables transparent auditing of Cube AI behavior. - ---- - -## Guardrails Strategy Summary - -Cube AI implements a comprehensive, multi-layered guardrails strategy: - -- Hard rule-based input validation -- Soft rule-based output validation -- NLP-based sensitive data detection and masking -- Retrieval content protection -- Preventive model behavior controls -- Full observability and tracing - -This approach balances safety, reliability, usability, and transparency. - ---- - -## Limitations and Future Improvements - -Current Cube AI guardrails are deterministic and pattern-based, providing strong predictability and auditability, but with known limitations: - -- Limited deep semantic understanding -- Potential bypass through advanced paraphrasing -- No probabilistic confidence scoring - -Potential future enhancements include: - -- Semantic content analysis -- Secondary model-based evaluators -- Schema-based output validation -- Adaptive, risk-based enforcement levels - ---- - -## Conclusion - -Cube AI provides a robust, production-ready guardrails framework for LLM-based systems. By combining strict input enforcement, intelligent output correction, sensitive data protection, and full observability, Cube AI ensures secure, compliant, and predictable AI behavior suitable for enterprise and real-world deployment. - +### Enforce