178 changes: 57 additions & 121 deletions box/overall/how-it-works.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -2,113 +2,69 @@
title: "Box Basics"
---

Every Upstash Box is a **durable execution environment** for AI workloads. Each box is an isolated container with its own filesystem, shell, network stack, and optional coding agent. You send prompts or commands, the box executes them, and you get back structured results without managing infrastructure.

Boxes are billed only for active CPU time, not idle time. State persists across runs, and you can choose from Node, Python, Go, or other runtimes. Unlike other sandboxing services, boxes have no maximum session duration: you can resume them days or even weeks later.

---

## Use Cases
Looking for inspiration? Check out the [Use Cases](/box/overall/use-cases) page.

### 1. Agent Servers

A powerful pattern is the **Agent Server**: a long-running, per-tenant agent that persists its state across sessions. Unlike ephemeral sandboxes that lose everything on shutdown, an Agent Server keeps its history, context, and learned preferences intact indefinitely.
---

<Frame>
<img src="/img/box/agent-server.png" />
</Frame>
## Architecture

Each user gets a dedicated Box running its own agent. The agent observes every request and response in a non-blocking way. It builds up a personalized understanding of what that user needs. Over time, it contributes back to a shared **Knowledge Base**, so insights from one tenant can improve results for everyone.
Every box is a self-contained environment with five capabilities:

Because boxes are serverless, idle tenants incur only a low storage charge. When a user returns, their box wakes instantly with all prior state intact (installed packages, file history, learned preferences) and picks up exactly where it left off.
| Module | Description |
| -------------- | ------------------------------------------------------------ |
| **Agent** | Run a coding agent (Claude Code or Codex) |
| **Git** | Clone repos, inspect diffs, and open pull requests |
| **Shell** | Execute OS-level commands directly |
| **Filesystem** | Upload, write, read, list, and download files inside the box |
| **Snapshots** | Capture box state and restore new boxes from it |

### 2. Multi-Agent Orchestration
The agent has full access to the shell, filesystem, and git inside its box. It can install packages, write files, run tests, and interact with the network.

Box's async SDK lets you spin up multiple boxes in parallel, each running a specialized agent with a distinct role. Once every agent finishes, a final box can synthesize their outputs into a single result.
---

<Frame>
<img src="/img/box/review-infrastructure.png" />
</Frame>
## Runtimes

A practical example is an automated **PR review pipeline**. When a pull request is opened, you fan out to three boxes: one for security review, one for code quality, and one for architecture. You then collect their findings in a fourth box that summarizes everything and posts a comment on GitHub.
Each box runs in an isolated container with a pre-installed language runtime. By default, all runtimes use **Debian** (glibc), which offers the widest binary compatibility. Append `-alpine` for smaller images based on musl.

```tsx
const pr = "https://github.com/acme/app/pull/42"
| Runtime | Default (Debian) | Alpine variant |
|---------|------------------|-------------------|
| Node.js | `node` | `node-alpine` |
| Python | `python` | `python-alpine` |
| Go | `golang` | `golang-alpine` |
| Ruby | `ruby` | `ruby-alpine` |
| Rust | `rust` | `rust-alpine` |

const [security, quality, architecture] = await Promise.all([
Box.create({ runtime: "node", agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 } }),
Box.create({ runtime: "node", agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 } }),
Box.create({ runtime: "node", agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 } }),
])
```ts
// Debian (default) — best for native modules and prebuilt binaries
const box = await Box.create({ runtime: "node" })

const reviews = await Promise.all([
security.agent.run({ prompt: `Security review for ${pr}` }),
quality.agent.run({ prompt: `Code quality review for ${pr}` }),
architecture.agent.run({ prompt: `Architecture review for ${pr}` }),
])
// Alpine — smaller image, musl-based
const alpineBox = await Box.create({ runtime: "node-alpine" })
```
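If the runtime is chosen dynamically, the naming rule from the table can be captured in a small helper. This is a minimal sketch: `runtimeId`, `Language`, and `Variant` are hypothetical names for illustration and are not part of the Box SDK. Only the resulting strings come from the table above.

```ts
// Runtime identifiers as listed in the table above (note: Go uses "golang").
type Language = "node" | "python" | "golang" | "ruby" | "rust"
type Variant = "debian" | "alpine"

// Hypothetical helper, not part of the Box SDK: builds the `runtime` string
// from a language and an image variant.
function runtimeId(language: Language, variant: Variant = "debian"): string {
  // Debian is the default and has no suffix; Alpine appends "-alpine".
  return variant === "alpine" ? `${language}-alpine` : language
}

console.log(runtimeId("python")) // python
console.log(runtimeId("golang", "alpine")) // golang-alpine
```

The returned string is what you would pass as `runtime` to `Box.create`, as in the example above.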

const jury = await Box.create({
runtime: "node",
agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 },
git: { token: process.env.GIT_TOKEN },
})
---

await jury.agent.run({
prompt: `Summarize these reviews and post a comment on ${pr}:\n${reviews.map((r) => r.result).join("\n\n")}`,
})
```
## Agent

Because each box is isolated, the agents cannot interfere with each other. You get true parallelism with independent filesystems, and the orchestration logic stays in your own code.
Every Upstash Box comes with built-in coding agent harnesses. You don't need to bring your own agent framework or wire up tool calls. The box already knows how to give an agent access to its shell, filesystem, and git, and how to stream output back to you.

### 3. Parallel Testing & Comparison
We currently support Claude Code and Codex as native agents inside a box. You choose a model when creating a box.

Box makes it easy to run parallel test scenarios at scale. Spin up N boxes, each running a different model against the same inputs, and compare the results side by side.
For more details, see the [Agent](/box/overall/agent) page.

<Frame>
<img src="/img/box/model-comparison.png" />
<img src="/img/box/agent.png" />
</Frame>

For example, at Context7 we use Box to benchmark LLMs for context extraction over documentation. We spin up boxes in parallel, each running a different model against the same documentation files and prompts. We then evaluate hallucination percentage, accuracy score, and context quality to find the best model:

```tsx
const models = ["claude/opus_4_6", "openai/gpt-5.4-codex", "google/gemini-3-pro"]

const docs = await fs.readFile("./documentation.md", "utf-8")
const prompt = `Extract all API endpoints from this documentation:\n${docs}`

const boxes = await Promise.all(
models.map((model) => Box.create({ runtime: "node", agent: { provider: inferDefaultProvider(model), model } })),
)

const results = await Promise.all(boxes.map((box) => box.agent.run({ prompt })))

const evaluation = results.map((r, i) => ({
model: models[i],
result: r.result,
hallucinationPct: evaluateHallucination(r.result, docs),
accuracyScore: evaluateAccuracy(r.result, docs),
}))

const bestModel = evaluation.sort((a, b) => a.hallucinationPct - b.hallucinationPct)[0]
```

Each box is fully isolated, so one model's behavior never leaks into another's results.

---

## Architecture

Every box is a self-contained environment with five capabilities:
Each iteration builds on the last. If a test fails, the agent sees the error output and corrects. If a file is missing, it discovers that during the read phase and adapts. The loop continues until the task is complete or the agent determines it cannot make further progress.

| Module | Description |
| -------------- | ------------------------------------------------------------ |
| **Agent** | Run a coding agent (Claude Code or Codex) |
| **Git** | Clone repos, inspect diffs, and open pull requests |
| **Shell** | Execute OS-level commands directly |
| **Filesystem** | Upload, write, read, list, and download files inside the box |
| **Snapshots** | Capture box state and restore new boxes from it |
You control what goes in (the prompt) and what comes out (raw text or a structured response). The agent handles reasoning and tool selection within its box, using the same [shell](/box/overall/shell), [filesystem](/box/overall/files), and [git](/box/overall/git) available to you through the SDK.

The agent has full access to the shell, filesystem, and git inside its box. It can install packages, write files, run tests, and interact with the network.
A box retains its full state between runs (files, installed packages, git history, etc.). You can send multiple prompts to the same box and the agent picks up exactly where it left off.
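The iterate-until-done behavior described above can be pictured as a plain loop. The sketch below is a conceptual illustration only, not how the Box agent harness is implemented: `StepResult`, `runUntilSettled`, and the `maxIterations` cap are all assumptions introduced for the example.

```ts
// Outcome of one iteration. These names are illustrative only.
type StepResult =
  | { status: "done"; output: string }
  | { status: "continue"; observation: string }
  | { status: "stuck"; reason: string }

// Runs `step` repeatedly, feeding every earlier observation into the next call.
// Stops when the step reports "done" or "stuck", or when the cap is reached.
function runUntilSettled(
  step: (history: readonly string[]) => StepResult,
  maxIterations = 10, // assumed safety cap, not a documented Box setting
): { outcome: "done" | "stuck" | "limit"; history: string[] } {
  const history: string[] = []
  for (let i = 0; i < maxIterations; i++) {
    const result = step(history)
    if (result.status === "done") {
      history.push(result.output)
      return { outcome: "done", history }
    }
    if (result.status === "stuck") {
      history.push(result.reason)
      return { outcome: "stuck", history }
    }
    history.push(result.observation)
  }
  return { outcome: "limit", history }
}

// Toy step: the "tests" fail twice, then pass on the third attempt.
const run = runUntilSettled((history) =>
  history.length < 2
    ? { status: "continue", observation: `test failed (attempt ${history.length + 1})` }
    : { status: "done", output: "tests passed" },
)
console.log(run.outcome, run.history.length) // done 3
```

The point of the sketch is the data flow: each iteration sees everything observed so far, which is why a failed test or missing file changes what the next iteration does.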

---

Expand All @@ -120,7 +76,7 @@

### 1. Created

When you create a box, Upstash provisions a new isolated container with its own filesystem, shell, and network stack. You can start from a fresh box or restore from a snapshot. Once provisioning finishes, the box is ready to receive commands.

### 2. Running

Expand Down Expand Up @@ -223,31 +179,11 @@

---

## Security & Isolation

Every box runs as its own Docker container with an independent filesystem, process tree, and network stack. Boxes cannot communicate with or observe each other. There is no shared state between them.

<Frame>
<img src="/img/box/routing.png" />
</Frame>

Your app makes SDK calls to the Upstash API gateway, which authenticates the request and routes it to the correct box. Each box has a unique ID, and all communication between your app and the box is encrypted in transit.

Inside a box, the agent, shell, filesystem, and git all share the same isolated environment. The agent can install packages, write files, spawn processes, and make outbound HTTP requests, but only within its own container boundary. It cannot access the host, other boxes, or any Upstash-internal infrastructure.

| Boundary | Guarantee |
| -------------- | --------------------------------------------------------------------------------------- |
| **Filesystem** | Each box has its own filesystem. No shared volumes between boxes. |
| **Processes** | Process trees are fully isolated. One box cannot signal or inspect another's processes. |
| **Network** | Boxes can make outbound requests (HTTP, DNS) but cannot reach other boxes. |

---

## Networking

Every box has full outbound network access by default. HTTP, HTTPS, DNS, WebSockets, and raw TCP are all available. Agents can call external APIs, download packages, pull container images, and interact with any public endpoint.

Boxes run on AWS infrastructure with **22.5 Gbps** network bandwidth per host. This means large file transfers, dataset downloads, and parallel API calls are fast by default.
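To put the bandwidth figure in perspective, here is a back-of-the-envelope calculation. It assumes the full per-host bandwidth is available to a single transfer, which is a best case: the figure is quoted per host, so real throughput also depends on the remote endpoint and on other traffic. `minTransferSeconds` is a hypothetical helper name.

```ts
// Per-host bandwidth quoted above, in gigabits per second.
const HOST_BANDWIDTH_GBPS = 22.5

// Lower bound on transfer time in seconds: gigabytes are converted to
// gigabits (x8) and divided by the link speed.
function minTransferSeconds(sizeGigabytes: number, bandwidthGbps = HOST_BANDWIDTH_GBPS): number {
  if (sizeGigabytes < 0 || bandwidthGbps <= 0) {
    throw new RangeError("size must be non-negative and bandwidth must be positive")
  }
  return (sizeGigabytes * 8) / bandwidthGbps
}

console.log(minTransferSeconds(10).toFixed(2)) // 3.56
```

In other words, a 10 GB dataset needs at least about 3.6 seconds on the wire under these assumptions.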

Because boxes run on fast AWS infrastructure, they have single-digit millisecond latency to major cloud services (S3, GitHub, etc.).

Expand All @@ -271,7 +207,7 @@
| --- | --- |
| `allow-all` | Default. No restrictions on outbound traffic. |
| `deny-all` | Block all outbound network access. |
| `custom` | Allow or deny specific domains and CIDRs. |
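The `custom` policy matches traffic against CIDR blocks. To make that notation concrete, the sketch below checks whether an IPv4 address falls inside a block. It illustrates CIDR matching in general and is not Box's policy engine; `ipv4ToNumber` and `cidrContains` are hypothetical helpers, and IPv6 is out of scope.

```ts
// Parse a dotted-quad IPv4 address into an unsigned 32-bit number.
function ipv4ToNumber(ip: string): number {
  const parts = ip.split(".")
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`)
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0)
}

// True when `ip` falls inside the IPv4 block written as "base/prefixLength".
function cidrContains(cidr: string, ip: string): boolean {
  const [base = "", prefixText = ""] = cidr.split("/")
  const prefix = Number(prefixText)
  if (!/^\d{1,2}$/.test(prefixText) || prefix > 32) {
    throw new Error(`Invalid CIDR block: ${cidr}`)
  }
  // A shift by 32 is a no-op in JavaScript, so /0 needs an explicit empty mask.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0
  // Compare only the network bits of both addresses.
  return ((ipv4ToNumber(base) & mask) >>> 0) === ((ipv4ToNumber(ip) & mask) >>> 0)
}

console.log(cidrContains("10.0.0.0/8", "10.42.7.1")) // true
console.log(cidrContains("192.168.1.0/24", "192.168.2.1")) // false
```

A `/8` block fixes only the first octet, so every `10.x.x.x` address matches, while a `/24` block fixes the first three octets.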

In `custom` mode you can combine `allowedDomains`, `allowedCidrs`, and `deniedCidrs`:

Expand Down Expand Up @@ -300,6 +236,26 @@

---

## Security & Isolation

Every box runs as its own Docker container with an independent filesystem, process tree, and network stack. Boxes cannot communicate with or observe each other. There is no shared state between them.

<Frame>
<img src="/img/box/routing.png" />
</Frame>

Your app makes SDK calls to the Upstash API gateway, which authenticates the request and routes it to the correct box. Each box has a unique ID, and all communication between your app and the box is encrypted in transit.

Inside a box, the agent, shell, filesystem, and git all share the same isolated environment. The agent can install packages, write files, spawn processes, and make outbound HTTP requests, but only within its own container boundary. It cannot access the host, other boxes, or any Upstash-internal infrastructure.

| Boundary | Guarantee |
| -------------- | --------------------------------------------------------------------------------------- |
| **Filesystem** | Each box has its own filesystem. No shared volumes between boxes. |
| **Processes** | Process trees are fully isolated. One box cannot signal or inspect another's processes. |
| **Network** | Boxes can make outbound requests (HTTP, DNS) but cannot reach other boxes. |

---

## Compute & Billing

Compute is billed separately from persisted storage. CPU is only metered while a box is actively executing commands or agent steps, so paused or idle boxes do not accrue compute charges.
Expand All @@ -311,23 +267,3 @@
| **Storage** | $0.10 per GB-month for **all** persisted storage, including disks and snapshots |

Boxes are currently available in one size: 2 vCPUs, 2 GB of memory, and 10 GB of disk space.
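Because paused or idle boxes accrue no compute charges, the cost of an idle box comes down to storage. The sketch below estimates that from the $0.10 per GB-month rate in the table. It assumes the charge scales linearly with the average amount persisted over the month, which this page does not spell out, and `idleMonthlyStorageUsd` is a hypothetical helper.

```ts
// Storage price quoted in the table above: $0.10 per GB-month.
// Kept in cents so the multiplication stays in whole numbers where possible.
const STORAGE_CENTS_PER_GB_MONTH = 10

// Estimated monthly storage charge in USD for a box that stays idle.
// Assumption: linear in the average GB persisted (disk plus snapshots).
function idleMonthlyStorageUsd(persistedGb: number): number {
  if (persistedGb < 0) throw new RangeError("persistedGb must be non-negative")
  return (persistedGb * STORAGE_CENTS_PER_GB_MONTH) / 100
}

console.log(idleMonthlyStorageUsd(10)) // 1
console.log(idleMonthlyStorageUsd(2.5)) // 0.25
```

Under these assumptions, a box with a full 10 GB disk costs about $1 per month while idle, and 1,000 idle tenants at 2.5 GB each cost about $250 per month.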

---

## Agent

Every Upstash Box comes with built-in coding agent harnesses. You don't need to bring your own agent framework or wire up tool calls. The box already knows how to give an agent access to its shell, filesystem, and git, and how to stream output back to you.

We currently support Claude Code and Codex as native agents inside a box. You choose a model when creating a box.

For more details, see the [Agent](/box/overall/agent) page.

<Frame>
<img src="/img/box/agent.png" />
</Frame>

Each iteration builds on the last. If a test fails, the agent sees the error output and corrects. If a file is missing, it discovers that during the read phase and adapts. The loop continues until the task is complete or the agent determines it cannot make further progress.

You control what goes in (the prompt) and what comes out (raw text or a structured response). The agent handles reasoning and tool selection within its box, using the same [shell](/box/overall/shell), [filesystem](/box/overall/files), and [git](/box/overall/git) available to you through the SDK.

A box retains its full state between runs (files, installed packages, git history, etc.). You can send multiple prompts to the same box and the agent picks up exactly where it left off.
4 changes: 4 additions & 0 deletions box/overall/quickstart.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@
title: "Quickstart"
---

**Upstash Box lets you give your AI agents a computer.**

Every Upstash Box is a **secure, isolated cloud container with an AI Agent built-in**. Spin up as many as you want in parallel. Each one includes a full environment with a filesystem, shell, git, and a runtime. Your agent can read files, write code, and execute tasks inside it.

<Note>Upstash Box is in developer preview — APIs and pricing may change.</Note>

Expand Down Expand Up @@ -53,6 +53,10 @@
})
```

<Tip>
By default, runtimes use **Debian** (glibc). For smaller Alpine-based images, use `"node-alpine"`, `"python-alpine"`, etc.
</Tip>

Your box is ready to use! You can already use it as a standalone, secure, isolated sandbox with full shell access, git, and filesystem operations.

---
Expand Down Expand Up @@ -122,7 +126,7 @@

## Use Cases

The idea behind Upstash Box is simple: **give AI its own computer**. Your agent gets a full, isolated cloud environment it can control. Run commands, write files, or execute code independent of any user device. Freeze a box anytime, and continue days or even weeks later with perfect resumability.

Great example use cases:

Expand All @@ -146,7 +150,7 @@

<CardGroup cols={2}>
<Card title="How Boxes work" href="/box/overall/how-it-works">
Learn the basics about using Upstash Box.
</Card>

<Card title="Agent" href="/box/overall/agent">
Expand Down
91 changes: 91 additions & 0 deletions box/overall/use-cases.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
---
title: "Use Cases"
---

The idea behind Upstash Box is simple: **give AI its own computer**. Your agent gets a full, isolated cloud environment it can control. Run commands, write files, or execute code independent of any user device. Freeze a box anytime, and continue days or even weeks later with perfect resumability.

---

## 1. Agent Servers

A powerful pattern is the **Agent Server**: a long-running, per-tenant agent that persists its state across sessions. Unlike ephemeral sandboxes that lose everything on shutdown, an Agent Server keeps its history, context, and learned preferences intact indefinitely.

<Frame>
<img src="/img/box/agent-server.png" />
</Frame>

Each user gets a dedicated Box running its own agent. The agent observes every request and response in a non-blocking way. It builds up a personalized understanding of what that user needs. Over time, it contributes back to a shared **Knowledge Base**, so insights from one tenant can improve results for everyone.

Because boxes are serverless, idle tenants incur only a low storage charge. When a user returns, their box wakes instantly with all prior state intact (installed packages, file history, learned preferences) and picks up exactly where it left off.

## 2. Multi-Agent Orchestration

Box's async SDK lets you spin up multiple boxes in parallel, each running a specialized agent with a distinct role. Once every agent finishes, a final box can synthesize their outputs into a single result.

<Frame>
<img src="/img/box/review-infrastructure.png" />
</Frame>

A practical example is an automated **PR review pipeline**. When a pull request is opened, you fan out to three boxes: one for security review, one for code quality, and one for architecture. You then collect their findings in a fourth box that summarizes everything and posts a comment on GitHub.

```tsx
const pr = "https://github.com/acme/app/pull/42"

const [security, quality, architecture] = await Promise.all([
Box.create({ runtime: "node", agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 } }),
Box.create({ runtime: "node", agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 } }),
Box.create({ runtime: "node", agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 } }),
])

const reviews = await Promise.all([
security.agent.run({ prompt: `Security review for ${pr}` }),
quality.agent.run({ prompt: `Code quality review for ${pr}` }),
architecture.agent.run({ prompt: `Architecture review for ${pr}` }),
])

const jury = await Box.create({
runtime: "node",
agent: { provider: Agent.ClaudeCode, model: ClaudeCode.Sonnet_4_6 },
git: { token: process.env.GIT_TOKEN },
})

await jury.agent.run({
prompt: `Summarize these reviews and post a comment on ${pr}:\n${reviews.map((r) => r.result).join("\n\n")}`,
})
```

Because each box is isolated, the agents cannot interfere with each other. You get true parallelism with independent filesystems, and the orchestration logic stays in your own code.
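One thing to keep in mind with the pipeline above: `Promise.all` rejects as soon as any single review fails. If a partial summary is preferable, the standard `Promise.allSettled` can be used instead. The sketch below shows only the prompt-building step; `buildJuryPrompt`, `Settled`, and `ReviewResult` are hypothetical names, and the only SDK detail assumed is the `result` field used in the example above.

```ts
// Shape of an agent run result as used in the example above (`r.result`).
type ReviewResult = { result: string }

// Structurally matches the entries returned by Promise.allSettled.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown }

// Hypothetical helper: builds the summary prompt from settled reviews,
// keeping successful findings and noting which reviewers failed.
function buildJuryPrompt(
  pr: string,
  roles: readonly string[],
  settled: readonly Settled<ReviewResult>[],
): string {
  const sections = settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? `## ${roles[i]}\n${outcome.value.result}`
      : `## ${roles[i]}\n(review failed: ${String(outcome.reason)})`,
  )
  return `Summarize these reviews and post a comment on ${pr}:\n${sections.join("\n\n")}`
}

// Intended usage with the boxes from the example above (not executed here):
// const settled = await Promise.allSettled([
//   security.agent.run({ prompt: `Security review for ${pr}` }),
//   quality.agent.run({ prompt: `Code quality review for ${pr}` }),
//   architecture.agent.run({ prompt: `Architecture review for ${pr}` }),
// ])
// await jury.agent.run({
//   prompt: buildJuryPrompt(pr, ["Security", "Code quality", "Architecture"], settled),
// })
```

Keeping the prompt construction in a pure function also makes the fan-in step easy to unit test without creating any boxes.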

## 3. Parallel Testing & Comparison

Box makes it easy to run parallel test scenarios at scale. Spin up N boxes, each running a different model against the same inputs, and compare the results side by side.

<Frame>
<img src="/img/box/model-comparison.png" />
</Frame>

For example, at Context7 we use Box to benchmark LLMs for context extraction over documentation. We spin up boxes in parallel, each running a different model against the same documentation files and prompts. We then evaluate hallucination percentage, accuracy score, and context quality to find the best model:

```tsx
import fs from "node:fs/promises"

const models = ["claude/opus_4_6", "openai/gpt-5.4-codex", "google/gemini-3-pro"]

const docs = await fs.readFile("./documentation.md", "utf-8")
const prompt = `Extract all API endpoints from this documentation:\n${docs}`

const boxes = await Promise.all(
models.map((model) => Box.create({ runtime: "node", agent: { provider: inferDefaultProvider(model), model } })),
)

const results = await Promise.all(boxes.map((box) => box.agent.run({ prompt })))

const evaluation = results.map((r, i) => ({
model: models[i],
result: r.result,
hallucinationPct: evaluateHallucination(r.result, docs),
accuracyScore: evaluateAccuracy(r.result, docs),
}))

const bestModel = evaluation.sort((a, b) => a.hallucinationPct - b.hallucinationPct)[0]
```

Each box is fully isolated, so one model's behavior never leaks into another's results.
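The ranking step above sorts `evaluation` in place and looks only at the hallucination rate. A possible variant, sketched below, leaves the original array untouched and breaks ties on accuracy. `pickBestModel` and the `Evaluation` type are hypothetical, the tie-breaking rule is an assumption rather than Context7's actual criterion, and the sample values are made-up placeholders, not benchmark results.

```ts
type Evaluation = { model: string; hallucinationPct: number; accuracyScore: number }

// Lowest hallucination rate wins; ties go to the higher accuracy score.
// Copies the array first, so the caller's ordering is left untouched.
function pickBestModel(evaluations: readonly Evaluation[]): Evaluation | undefined {
  return [...evaluations].sort(
    (a, b) => a.hallucinationPct - b.hallucinationPct || b.accuracyScore - a.accuracyScore,
  )[0]
}

// Made-up placeholder values for illustration only.
const sample: Evaluation[] = [
  { model: "model-a", hallucinationPct: 4, accuracyScore: 0.81 },
  { model: "model-b", hallucinationPct: 2, accuracyScore: 0.77 },
  { model: "model-c", hallucinationPct: 2, accuracyScore: 0.9 },
]

console.log(pickBestModel(sample)?.model) // model-c
console.log(sample.map((e) => e.model).join(",")) // model-a,model-b,model-c
```

Returning `undefined` for an empty list makes the "no results" case explicit instead of failing later on a missing property.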

1 change: 1 addition & 0 deletions docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -1583,6 +1583,7 @@
"box/overall/quickstart",
"box/overall/pricing",
"box/overall/how-it-works",
"box/overall/use-cases",
"box/overall/agent",
"box/overall/git",
"box/overall/shell",
Expand Down