Deploy AI agents to dedicated VMs in 90 seconds. Interactive TUI. Automatic DNS and TLS. You own the infrastructure.
Updated Apr 5, 2026 · Go
Deploy intelligence. Open-source infrastructure for AI agents in production.
Mechanistic analysis of a GPT-2–like model exploring the compositionality gap in transformers. Using Logit Lens and Causal Tracing, the study identifies a deep-layer bottleneck and overcomes it through dataset enhancement, addressing the Compositionality Gap (NeurIPS 2024).
A mock Azure OpenAI API for seamless testing and development, supporting both streaming and non-streaming responses. Easily emulate OpenAI completions with token-based streaming in a local or Dockerized environment.
Comprehensive guide to FastAPI, Pydantic, and SQLAlchemy for AI engineers. Learn API design, validation, and ORM workflows with practical examples and setup 🐙
My Digital Twin Companion
DeepSeek-V3 Windows Deployment Fixer: Resolves CUDA_ERROR_OUT_OF_MEMORY, missing cublas64_12.dll, and Triton compiler errors. Optimized for RTX 30/40 series GPUs.
Complete guide to deploying private, on-premise AI and LLMs: hardware selection, model comparison (ollama vs vLLM vs llama.cpp), security hardening, and AI governance policy templates. By Petronella Technology Group.
A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.
Production-oriented GPU inference stack with FastAPI, Docker, Redis, Prometheus, and Grafana for AI workload serving
A Streamlit-based spam classifier that predicts whether a message is spam or not spam using machine learning.
Multi-agent research assistant built on retrieval-augmented generation (RAG).
Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI on Linux, including custom model creation, API integration, and system-level troubleshooting.
Skeleton Slack bot that lets developers trigger deployments, rollbacks, and status checks via simple chat commands.
Full-stack web application (React + Flask) for Multimodal Video Captioning. Deploys the MixCap model (BLIP-2 + Wav2Vec2) to generate video descriptions for end-users.
End-to-end pipeline for deploying deep learning models on edge devices: model conversion, quantization, hardware acceleration, and Android integration.
🧠 Deployment infrastructure for Zeroclaw, dockerized and compatible with Portainer
🛠 Fix DeepSeek-V3 Windows setup issues by resolving CUDA memory errors, missing DLLs, and PyTorch-CUDA conflicts for smooth local deployment.
Compare PyTorch vs Triton inference latency with CLI tools, benchmarks, and performance plots.