
piqc — GPU Waste Scanner for Kubernetes

Most AI clusters waste 40–60% of GPU spend. piqc finds it in one command.

Read-only · No agents · No sidecars · Nothing installed permanently · Runs as a Job, prints results, exits.

Quick Start · Features · Commands · Output Formats · Installation


What you'll see

Run piqc scan against your cluster and get an instant cost report:

                                                    Discovered Inference Deployments
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Deployment                  ┃ Engine  ┃ GPU              ┃ Replicas ┃ GPU Util ┃  MFU ┃ $/1K tokens ┃   $/hr ┃   Idle $/day ┃   Tier Fit   ┃ Namespace       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ meta-llama/Llama-3-70B-Inst │ vllm    │ 8xH100-SXM4-80GB │        2 │       4% │ 3.1% │     $0.0842 │ $68.00 │    $1,566.72 │ ⚠ >A100-80GB │ production      │
│ mistral-7b-instruct         │ vllm    │ 1xA100-SXM4-40GB │        1 │      11% │ 8.4% │     $0.0073 │  $2.50 │       $53.40 │    ⚠ >T4     │ production      │
│ codellama-34b-staging       │ vllm    │ 4xH100-SXM4-80GB │        1 │       0% │  N/A │         N/A │ $17.00 │      $408.00 │ ⚠ >A100-40GB │ staging         │
│ embedding-bge-large         │ vllm    │ 1xT4             │        3 │      82% │  N/A │     $0.0002 │  $1.35 │        $5.83 │      ✓       │ shared-services │
│ unknown-runtime-7f3a2       │ unknown │ 2xA100-SXM4-80GB │        1 │      N/A │  N/A │         N/A │  $7.00 │ util unknown │      ?       │ ml-platform     │
└─────────────────────────────┴─────────┴──────────────────┴──────────┴──────────┴──────┴─────────────┴────────┴──────────────┴──────────────┴─────────────────┘

╭──────────────────────────────────── Cost Summary ──────────────────────────────────────╮
│   Total GPU spend rate      : $95.85/hr                                                │
│                                                                                        │
│   Leased & idle (util <60%) : $2,033.95/day  (pods running, GPUs underused)            │
│   Unallocated nodes         : $1,152.00/day  (12 GPU(s) with no pods scheduled)        │
│   Tier misplacement         :   $721.20/day  (3 model(s) on oversized GPU tier)        │
│                                                                                        │
│   Total estimated leak      : $3,907.15/day  ($1,426,110/yr)                           │
│                                                                                        │
│   Avg MFU (active deployments) : 15.7%  (healthy range: 30–60%)                        │
╰────────────────────────────────────────────────────────────────────────────────────────╯

piqc surfaces three types of waste:

  • Idle GPUs — pods running, GPUs sitting near-empty
  • Tier misplacement — a 7B model on an H100 that only needs a T4
  • Unallocated nodes — GPU nodes with no pods scheduled at all
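The Idle $/day figures in the table above can be reproduced with simple arithmetic. A minimal sketch, assuming the idle fraction is simply 1 − GPU utilization (piqc's actual formula may differ):

```python
def idle_dollars_per_day(rate_per_hr: float, gpu_util: float) -> float:
    """Estimate the daily spend on the unutilized fraction of a deployment."""
    return round(rate_per_hr * 24 * (1 - gpu_util), 2)

# Llama-3-70B row above: $68.00/hr at 4% utilization
print(idle_dollars_per_day(68.00, 0.04))  # prints 1566.72, matching the table
```

The same formula reproduces the mistral-7b ($2.50/hr at 11% → $53.40) and embedding-bge-large ($1.35/hr at 82% → $5.83) rows.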

🚀 Quick Start

Option 1: Run as a Kubernetes Job (recommended)

Runs inside your cluster — no Docker auth or kubeconfig wrangling:

# Step 1 — Apply RBAC permissions (one-time setup)
kubectl apply -f https://raw.githubusercontent.com/paralleliq/piqc/main/deploy/rbac.yaml

# Step 2 — Run the scan
kubectl apply -f https://raw.githubusercontent.com/paralleliq/piqc/main/deploy/scan-job.yaml

# Step 3 — View the output
kubectl logs -f job/piqc-scan -n kube-system

# Clean up when done
kubectl delete job piqc-scan -n kube-system

The job auto-deletes itself after 10 minutes (ttlSecondsAfterFinished: 600).
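For reference, the TTL lives in the Job spec. A sketch of the relevant portion of deploy/scan-job.yaml (only the TTL value and the kube-system namespace are confirmed by this README; the other field values are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: piqc-scan
  namespace: kube-system
spec:
  ttlSecondsAfterFinished: 600   # auto-delete 10 minutes after completion
  template:
    spec:
      serviceAccountName: piqc   # assumed name; the real one comes from rbac.yaml
      restartPolicy: Never
      containers:
        - name: piqc
          image: ghcr.io/paralleliq/piqc:latest
          args: ["scan", "--format", "table"]
```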


Option 2: Run with Docker from your laptop

# Export a static kubeconfig with embedded credentials
kubectl config view --raw --flatten > /tmp/piqc-kubeconfig.yaml

# Run the scan
docker run --rm \
  -v /tmp/piqc-kubeconfig.yaml:/root/.kube/config \
  ghcr.io/paralleliq/piqc:latest \
  scan --format table

Supports both linux/amd64 and linux/arm64.


Option 3: Install from source

git clone https://github.com/paralleliq/piqc.git
cd piqc
poetry install
poetry run piqc scan --format table

✨ Features

🔍 Intelligent Discovery

  • Auto-Detection: Automatically discovers vLLM inference deployments across all namespaces
  • Weighted Confidence Scoring: Uses multiple signals (images, env vars, CLI args, labels) with weighted scoring
  • Framework Detection: Identifies vLLM with high accuracy using pattern matching and heuristics
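As an illustration, weighted confidence scoring can be sketched as below; the signal names and weights here are hypothetical stand-ins, not piqc's actual values:

```python
# Hypothetical weights per detection signal; piqc's real weights are internal.
SIGNAL_WEIGHTS = {"image": 0.4, "env_vars": 0.25, "cli_args": 0.25, "labels": 0.1}

def engine_confidence(matched: set) -> float:
    """Sum the weights of the signals that matched; 1.0 means every signal agreed."""
    return round(sum(w for sig, w in SIGNAL_WEIGHTS.items() if sig in matched), 2)

print(engine_confidence({"image", "cli_args"}))  # prints 0.65
```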

📊 Comprehensive Metrics Collection

  • GPU Metrics: Real-time GPU utilization, memory, temperature, and power via nvidia-smi
  • Runtime Metrics: Collects vLLM API metrics including:
    • Request latency (P50, P95, P99)
    • Token throughput (prefill & decode)
    • KV cache utilization
    • Queue depth and active requests
    • Health status
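vLLM exposes these metrics in Prometheus text format on its /metrics endpoint. A minimal sketch of parsing that format (the two sample metric names below are illustrative of vLLM's output, not an exhaustive list):

```python
def parse_prometheus_text(text: str) -> dict:
    """Parse simple Prometheus exposition lines into {metric-with-labels: value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and HELP/TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
vllm:num_requests_running{model_name="llama-7b"} 3.0
vllm:gpu_cache_usage_perc{model_name="llama-7b"} 0.452
"""
print(parse_prometheus_text(sample))
```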

💰 Waste Detection

  • GPU underutilization — Deployments below 60% utilization threshold, with dollar waste per day and annualized
  • Dark capacity — GPU nodes with no pods scheduled (paying for nodes sitting empty)
  • Tier misplacement — Models running on an oversized GPU tier, with estimated cost delta per day
  • Fragmentation — Nodes with free GPU slots too small to fit any running model
  • Pending GPU pods — Workloads blocked from scheduling, shown with wait time
  • Cost Summary panel — Total spend rate, all waste categories, total estimated leak per day and per year
  • MFU (Model FLOPS Utilization) — Observed compute vs. theoretical GPU peak per deployment
  • Cost per 1K tokens — GPU spend translated into a business metric comparable to API pricing
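Both derived metrics in the list above are simple ratios. A sketch of plausible formulas (the 2-FLOPs-per-parameter-per-token factor and the peak-FLOPs figure are assumptions, not piqc's confirmed accounting):

```python
def mfu(tokens_per_sec: float, params: float, num_gpus: int, peak_flops: float) -> float:
    """Model FLOPS Utilization: ~2 FLOPs per parameter per generated token,
    divided by the theoretical peak of the allocated GPUs."""
    achieved = tokens_per_sec * 2 * params
    return achieved / (num_gpus * peak_flops)

def cost_per_1k_tokens(rate_per_hr: float, tokens_per_sec: float) -> float:
    """Translate GPU $/hr into $ per 1,000 generated tokens."""
    return rate_per_hr / 3600 / tokens_per_sec * 1000

# Illustrative numbers only: a 7B model on one A100 (~312 TFLOPs FP16 dense)
print(round(mfu(1500, 7e9, 1, 312e12), 3))
print(round(cost_per_1k_tokens(2.50, 1500), 6))
```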

📄 Multiple Output Formats

Format       Description
Table        Cost report with MFU, $/1K tokens, idle waste (default)
YAML         Kubernetes-style ModelSpec files
JSON         Machine-readable JSON output
PIQC Facts   Standardized facts bundle for control plane integration

🚀 Production-Ready

  • Parallel Processing: Multi-threaded scanning with configurable workers
  • RBAC Support: Pre-configured ClusterRole and ServiceAccount manifests
  • Flexible Modes: Auto-detect, remote (kubeconfig), or in-cluster execution
  • Timeout Controls: Configurable operation timeouts
  • Docker Image: Pre-built multi-platform image (linux/amd64 + linux/arm64) on GitHub Container Registry
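The --workers flag maps naturally onto a thread pool. A sketch of the pattern (scan_namespace is an illustrative stand-in, not piqc's actual function):

```python
from concurrent.futures import ThreadPoolExecutor

def scan_namespace(ns: str) -> str:
    # Stand-in for per-namespace discovery work. Kubernetes API calls are
    # I/O-bound, so threads parallelize them well despite the GIL.
    return f"scanned {ns}"

namespaces = ["production", "staging", "ml-platform"]
with ThreadPoolExecutor(max_workers=10) as pool:  # --workers 10 is the default
    results = list(pool.map(scan_namespace, namespaces))
print(results)
```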

🔮 Coming Soon

🔴 AMD GPU Support

Support for AMD Instinct and Radeon GPUs via rocm-smi:

  • AMD Instinct MI250X/MI300X detection
  • GPU utilization, memory & temperature metrics
  • ROCm ecosystem integration
  • Seamless multi-vendor GPU environments

🌐 LLM-D (LLM-Distributed)

Discovery and documentation for distributed LLM inference:

  • Distributed inference topology mapping
  • Multi-node GPU coordination metrics
  • Cross-node performance aggregation
  • Distributed KV cache analysis

📋 Commands

piqc scan

Scan your Kubernetes cluster for inference workloads and surface GPU waste.

piqc scan [OPTIONS]

Scan Options

Option                       Default          Description
--kubeconfig PATH            ~/.kube/config   Path to kubeconfig file
--context TEXT               current          Kubernetes context to use
-n, --namespace TEXT         all              Specific namespace to scan
--format [yaml|json|table]   yaml             Output format
-o, --output PATH            ./output         Output directory for generated files

Collection Options

Option                       Default     Description
--collect-runtime            false       Collect runtime metrics via vLLM API
--no-exec                    false       Disable pod exec (skip GPU metrics)
--no-logs                    false       Disable log reading
--aggregate/--no-aggregate   aggregate   Aggregate metrics across pod replicas
--contribute-benchmarks      false       Contribute anonymized GPU/model performance data to the Paralleliq benchmark dataset

Output Options

Option          Default   Description
--combined      false     Generate single combined output file
--output-piqc   false     Generate piqc-facts.json (PIQC v0.1 schema)

Execution Options

Option                                   Default   Description
--timeout INT                            30        Operation timeout in seconds
--workers INT                            10        Number of parallel workers
--mode [auto|remote|incluster|dry-run]   auto      Execution mode
-v, --verbose                            false     Enable verbose output
--debug                                  false     Enable debug mode with detailed trace

Examples

# Basic scan — discover all vLLM deployments and surface waste
piqc scan

# Scan specific namespace with JSON output
piqc scan -n production --format json

# Quick scan without GPU metrics (faster)
piqc scan --no-exec

# Collect runtime metrics from vLLM API
piqc scan --collect-runtime

# Generate PIQC facts bundle for control plane integration
piqc scan --output-piqc -o ./facts

# Table output to console (human-readable)
piqc scan --format table

# Custom kubeconfig and context
piqc scan --kubeconfig /path/to/config --context my-cluster

# Contribute anonymized GPU/model benchmarks to Paralleliq dataset
piqc scan --contribute-benchmarks

piqc test-connection

Test connection to Kubernetes cluster and verify required permissions.

piqc test-connection [OPTIONS]

Option              Default          Description
--kubeconfig PATH   ~/.kube/config   Path to kubeconfig file
--context TEXT      current          Kubernetes context to use

piqc version

Print the piqc version.

piqc version

📁 Output Formats

Table Format (default)

Run piqc scan --format table to print the cost report to your console. See the output example above.

Tier Fit column:

Symbol    Meaning
✓         Model is on an appropriate GPU tier for its size
⚠ >T4     Model is over-provisioned — minimum sufficient tier shown
?         Parameter count not parseable from model name
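The tier-fit check can be sketched as below. Everything here is a hypothetical heuristic (fp16 weights at 2 bytes per parameter, no KV-cache headroom, an invented tier table); piqc's actual sizing logic is internal and may differ:

```python
import re

# Hypothetical tier table: (name, GPU memory in GB). Not piqc's real table.
TIERS = [("T4", 16), ("A100-40GB", 40), ("A100-80GB", 80)]

def min_sufficient_tier(model_name: str):
    """Parse a parameter count like '7b' or '70B' from the model name, then
    return the smallest tier whose memory holds the fp16 weights. Returns
    None when no count is parseable (the '?' case) or nothing fits."""
    m = re.search(r"(\d+)[bB]\b", model_name)
    if m is None:
        return None
    weights_gb = int(m.group(1)) * 2  # 2 bytes per parameter in fp16
    for name, mem_gb in TIERS:
        if mem_gb >= weights_gb:
            return name
    return None

print(min_sufficient_tier("mistral-7b-instruct"))  # prints T4, cf. the '>T4' row
```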

YAML Format

Generates individual Kubernetes-style YAML files for each deployment:

apiVersion: modelspec/v1
kind: ModelSpec
metadata:
  name: vllm-llama-7b
  namespace: inference
  collectionTimestamp: "2024-01-07T12:00:00Z"
  collectorVersion: "1.0.0"
model:
  name: meta-llama/Llama-2-7b-hf
  architecture: llama
  parameters: "7B"
  identificationConfidence: 0.95
engine:
  name: vllm
  version: "0.4.0"
  detectionConfidence: 0.95
inference:
  precision: float16
  tensorParallelSize: 4
  maxModelLen: 4096
  gpuMemoryUtilization: 0.90
resources:
  replicas: 2
  gpuCount: 4
  gpus:
    - type: A100-SXM4-80GB
      memoryTotal: "80GB"
      utilization: 87
      memoryUsed: 72000
runtimeState:
  vllm:
    healthStatus: healthy
    kvCacheUsagePercent: 45.2
    avgPromptThroughput: 1250.5
    avgGenerationThroughput: 85.3

PIQC Facts Bundle

With --output-piqc, generates a standardized facts bundle for integration with the Paralleliq control plane:

{
  "schemaVersion": "piqc-scan.v0.1",
  "generatedAt": "2024-01-07T12:00:00Z",
  "tool": {
    "name": "piqc",
    "version": "1.0.0"
  },
  "cluster": {
    "context": "my-context",
    "name": "my-cluster"
  },
  "objects": [
    {
      "workloadId": "ns/inference/deployment/vllm-llama-7b",
      "facts": {
        "runtime.engineType": {"value": "vllm", "dataConfidence": "high"},
        "hardware.gpuType": {"value": "A100-SXM4-80GB", "dataConfidence": "high"},
        "hardware.gpuCount": {"value": 4, "dataConfidence": "high"},
        "observed.gpuUtilization": {"value": 87, "unit": "%", "dataConfidence": "high"},
        "observed.kvCacheUsage": {"value": 45.2, "unit": "%", "dataConfidence": "high"}
      }
    }
  ]
}
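Downstream tooling can consume the bundle with plain JSON parsing. A minimal sketch that keeps only high-confidence facts; the trimmed bundle below mirrors the shape above, with a "low"-confidence fact added purely for illustration:

```python
import json

# A trimmed facts bundle mirroring the piqc-scan.v0.1 example above.
raw = '''{"schemaVersion": "piqc-scan.v0.1",
  "objects": [{"workloadId": "ns/inference/deployment/vllm-llama-7b",
    "facts": {"observed.gpuUtilization": {"value": 87, "unit": "%", "dataConfidence": "high"},
              "hardware.gpuCount": {"value": 4, "dataConfidence": "high"},
              "model.parameters": {"value": "7B", "dataConfidence": "low"}}}]}'''

bundle = json.loads(raw)
for obj in bundle["objects"]:
    high = {key: fact["value"] for key, fact in obj["facts"].items()
            if fact.get("dataConfidence") == "high"}
    print(obj["workloadId"], high)
```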

📥 Installation

Prerequisites

  • Python: 3.11 or higher
  • Kubernetes Access: Valid kubeconfig with cluster access
  • Poetry: For development installation

Install from Source

git clone https://github.com/paralleliq/piqc.git
cd piqc
poetry install
poetry run piqc --version

Install for Development

git clone https://github.com/paralleliq/piqc.git
cd piqc
poetry install --with dev
poetry run pytest tests/unit -v

🔐 Kubernetes RBAC Requirements

piqc is read-only. It never creates, modifies, or deletes any resource in your cluster. The only write permission is pods/exec (to run nvidia-smi inside pods for GPU metrics) — and that can be disabled with --no-exec.

kubectl apply -f https://raw.githubusercontent.com/paralleliq/piqc/main/deploy/rbac.yaml
Resource       Verbs       Purpose
pods           get, list   Discover inference workloads
pods/exec      create      Run nvidia-smi for GPU metrics
pods/log       get         Enhanced framework detection
namespaces     get, list   Scan multiple namespaces
deployments    get, list   Identify deployment metadata
statefulsets   get, list   Identify StatefulSet workloads
services       get, list   Endpoint detection

🔧 Execution Modes

Mode        Description
auto        Automatically detect if running in-cluster or remotely
remote      Force remote mode (uses kubeconfig)
incluster   Force in-cluster mode (uses ServiceAccount)
dry-run     Simulate scan without cluster access

🐛 Troubleshooting

Docker Auth Plugin Errors (GKE / EKS / AKS)

Use the in-cluster Job approach (Option 1 in Quick Start) — it runs inside the cluster and needs no auth plugins. Or export a static kubeconfig:

kubectl config view --raw --flatten > /tmp/piqc-kubeconfig.yaml
docker run --rm -v /tmp/piqc-kubeconfig.yaml:/root/.kube/config ghcr.io/paralleliq/piqc:latest scan

RBAC Permission Errors

kubectl auth can-i list pods --all-namespaces
kubectl auth can-i create pods/exec -n <namespace>
kubectl apply -f https://raw.githubusercontent.com/paralleliq/piqc/main/deploy/rbac.yaml

GPU Metrics Unavailable

piqc scan --no-exec

📚 Project Structure

piqc/
├── src/piqc/
│   ├── cli/                  # CLI commands (scan, test-connection, version)
│   ├── collectors/           # Data collectors (vLLM config, GPU metrics)
│   ├── core/                 # Core logic (orchestrator, discovery, k8s client)
│   ├── generators/           # Output generators (YAML, JSON, Table, PIQC)
│   ├── models/               # Pydantic data models (ModelSpec, PIQC schema)
│   ├── parsers/              # Configuration parsers (vLLM)
│   └── utils/                # Utilities (logging, exceptions)
├── tests/
│   ├── unit/                 # Unit tests
│   └── integration/          # Integration tests
├── rbac/                     # Kubernetes RBAC manifests
├── docs/                     # Documentation
└── examples/                 # Example ModelSpec files

What to do with the results

piqc tells you what's wrong. The Paralleliq control plane closes the loop — it ingests the piqc facts bundle and automatically remediates misplacement, underutilization, and OOM risk through human-approved Temporal workflows.

paralleliq.ai · info@paralleliq.ai


📄 License

Apache License 2.0 — see LICENSE for details.
