Clone

A lightweight Linux VMM built for multi-tenant shell hosting and high-density VM workloads. 25K lines of Rust, single binary, KVM-based.

Clone boots a template VM once, then forks isolated copies via Shadow Clone page mapping. Idle VMs get reclaimed automatically. A host running 100 shells uses memory like it's running 10.

Watch the demo: unixshells.com uses Clone as the VMM behind its shell hosting.

Template VM (4GB Ubuntu, all tools warm)
  ├── Fork → User shell 1  ─── ~160ms, full networking, unique IP
  ├── Fork → User shell 2  ─── ~160ms, Shadow Clone diverges on write
  ├── Fork → User shell 3  ─── ~160ms, balloon reclaims when idle
  └── Fork → User shell N  ─── ~160ms, KVM hardware isolation

Lightweight template (Alpine/busybox, 128-512MB)
  ├── Fork → Lambda 1  ─── <20ms, minimal overhead
  ├── Fork → Lambda 2  ─── <20ms, ~4MB per fork
  └── Fork → Lambda N  ─── <20ms, destroy after use

Why Clone

The problem: Traditional shared shell hosting gives every user a login on the same kernel. Resource usage is minimal — idle users cost almost nothing. But in 2026, multi-tenant on a shared kernel is indefensible. Container escapes are routine. Kernel exploits ship monthly. There's no safe way to run untrusted users on a shared kernel.

VMs solve the security problem — KVM gives you a hardware-enforced boundary. But VMs are slow to start and each one consumes its own memory. Running 100 VMs like you'd run 100 shell users is prohibitively expensive. We looked at every existing VMM — QEMU, Firecracker, Cloud Hypervisor — and none of them could give us shared-shell-level resource efficiency with VM-level isolation.

So we built Clone.

Clone's answer: Shadow Clone fork from warm templates. Boot a VM once with everything loaded, snapshot it, then fork copies in <20ms. All forks share the same physical memory pages until they write — only dirty pages cost memory. 100 forked VMs use memory like 10. You get the resource profile of shared shell hosting with the full security of KVM hardware isolation.

Three-layer memory management (overcommit + KSM + balloon), virtio-fs for host directory sharing, pre-copy live migration, and VFIO device passthrough — all in a single binary smaller than most config files.

| | Clone (measured) | Firecracker (official) | Cloud Hypervisor (official) | QEMU |
|---|---|---|---|---|
| Code size | 25K Rust | ~50K Rust | ~70K Rust | ~2M+ C |
| Fork (Alpine) | <20ms (Shadow Clone) | ~5-10ms (snapshot) | stop+resume | stop+resume |
| Fork (4GB Ubuntu + net) | ~160ms (Shadow Clone) | N/A | N/A | N/A |
| Cold boot (distro kernel) | 2,217ms | ~2-3s | ~2s | 5-20s |
| Cold boot (minimal kernel) | <=125ms ^1 | <100ms ^1 | | 500ms-2s |
| Live migration downtime | 1ms | none | yes (unpublished) | 50-300ms |
| 2x 4GB forked VMs host RAM | ~1GB | N/A | N/A | N/A |
| 3 forked VMs RSS (Alpine) | 13MB | N/A | N/A | N/A |
| 10x 512MB idle VMs | ~200MB | ~5GB | variable | variable |
| Incremental snapshot | 192KB (682x smaller) | full only | full only | full + incremental |
| GPU passthrough | yes (VFIO) | no | yes | yes |
| Host dir sharing | yes (no daemon) | no | virtiofsd | virtiofsd |
| Fork networking | yes (userspace) | no | no | no |
| Fork vsock | yes (userspace) | yes (userspace) | no | no |

^1 With custom minimal kernels. Distro kernels: all VMMs converge to ~2-3s.


Use Cases

Unix Shell Hosting (Primary)

The original shared hosting model — many users, each with their own Linux shell — but with VM-level hardware isolation instead of chroot.

# Build the template image (Ubuntu 24.04 + 200+ dev tools)
sudo ./shell-template/build.sh

# Boot, warm binaries, snapshot
sudo ./shell-template/warm.sh

# Fork shells for users in ~160ms each
clone fork --template /templates/shell-base --net
clone fork --template /templates/shell-base --net
clone fork --template /templates/shell-base --net
# All share the same base memory pages. KVM isolates each user.

# Execute commands inside a forked VM
clone exec --vm-id $VM_ID -- python3 -c "print('hello')"
clone exec --vm-id $VM_ID -- curl https://example.com
  • ~160ms to spin up a new user shell (4GB Ubuntu, all tools warm)
  • ~500MB additional RAM per forked 4GB VM (Shadow Clone sharing, measured: 2 VMs = 1GB host RAM)
  • Full networking — each fork gets TAP + bridge + NAT, ping/curl/HTTPS work
  • Userspace vsock — guest agent for exec, heartbeat, shutdown
  • KVM hardware boundary — not a container, real isolation
  • Balloon reclaim — idle shells automatically give back memory
  • D-Bus/service recovery — agent restarts stale services after fork

Function-as-a-Service

Shadow Clone fork from warm templates with the runtime already loaded. Sub-20ms cold start without custom kernel tuning.

# Warm template with Python + ML libs loaded
clone fork --template /templates/python-ml
# Execute function, destroy. Runtime was already warm.
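
A fuller per-request loop might look like the sketch below. It assumes `clone fork` prints the new VM's id on stdout and that a throwaway fork can be torn down with a plain `poweroff` over exec; both are illustrative assumptions, and /srv/handler.py is a placeholder for your function.

# Hypothetical request handler: fork, run the function, discard the fork
VM_ID=$(sudo clone fork --template /templates/python-ml)
clone exec --vm-id "$VM_ID" -- python3 /srv/handler.py   # runtime already warm
clone exec --vm-id "$VM_ID" -- poweroff                  # tear down the throwaway VM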

Dev Environments

Isolated Linux environment in <20ms with your code mounted in.

clone fork --template /templates/node20-warm \
  --shared-dir ~/projects/myapp:code
# Node.js already warm, your files at /mnt/code inside the VM

CI/CD Runners

Shared base image, per-build writable overlay. Strong isolation, fast teardown.

sudo clone run --kernel vmlinuz --rootfs ubuntu-ci.img --overlay --net
# Fresh writable layer on shared read-only base. Discard on exit.

Quick Start

# Build
cargo build --release

# Create a rootfs (defaults: ubuntu=noble, debian=bookworm, alpine=3.21)
sudo clone rootfs create --distro ubuntu --size 2G -o ubuntu.img
sudo clone rootfs create --distro ubuntu --release jammy --size 2G -o ubuntu-22.img
sudo clone rootfs create --distro alpine --size 1G -o alpine.img

# Boot a VM
sudo clone run --kernel /boot/vmlinuz-$(uname -r) --rootfs alpine.img

# With networking
sudo clone run --kernel vmlinuz --rootfs alpine.img --net --mem-mb 512

# With host directory sharing
sudo clone run --kernel vmlinuz --rootfs alpine.img \
  --shared-dir /tmp/shared:myfs
# Inside guest: mount -t virtiofs myfs /mnt

# With overlay (shared read-only base, per-VM writable layer)
sudo clone run --kernel vmlinuz --rootfs base.img --overlay

# GPU passthrough
sudo clone run --kernel vmlinuz --rootfs ml.img \
  --passthrough 0000:01:00.0 --mem-mb 8192

# Attach to a running VM's serial console (Ctrl-Q to detach)
clone attach

# Execute a command inside a running VM
clone exec -- ls /

# List all running VMs (no daemon required)
clone list --no-daemon

# Fork with full device support
sudo clone fork --template /tmp/my-template \
  --net --shared-dir /tmp/shared:myfs --block extra-disk.img

Prerequisites

  • Linux host with KVM (/dev/kvm)
  • Kernel 6.5+ recommended
  • For networking: /dev/net/tun (vhost-net optional, used on boot path only)
  • For GPU passthrough: device bound to vfio-pci driver
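
A quick host-side sanity check for the items above (plain read-only commands, nothing Clone-specific):

ls -l /dev/kvm                 # KVM device present and accessible
uname -r                       # kernel 6.5+ recommended
ls -l /dev/net/tun             # required for --net
lspci -nnk -s 0000:01:00.0     # for passthrough: confirm the device is bound to vfio-pci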

Features

VM Lifecycle

| Command | What it does |
|---|---|
| clone run | Boot a new VM from kernel + rootfs/initrd |
| clone fork | Fork from a template snapshot (<20ms) |
| clone snapshot | Snapshot a running VM for later fork |
| clone attach | Attach to a running VM's serial console |
| clone exec | Execute a command inside a running VM |
| clone list | List running VMs (works with or without daemon) |
| clone migrate --live | Pre-copy live migration to another host |
| clone migrate-recv | Receive a live migration |
| clone rootfs create | Create a bootable rootfs (Alpine, Ubuntu, Debian, Docker import) |
| clone daemon | Multi-VM orchestration daemon (create, fork, snapshot, destroy) |

Devices

  • virtio-block — raw and qcow2 disk images, thin provisioning
  • virtio-net — TAP + vhost-net (boot) or userspace (fork), auto bridge/NAT setup
  • virtio-balloon — cooperative memory reclaim with hysteresis policy
  • virtio-vsock — userspace backend for host-guest communication (fork-compatible)
  • virtio-fs — host directory sharing via inline FUSE (no external daemon)
  • PCI bus — ECAM config space for VFIO device passthrough
  • Serial console — 16550A UART, bidirectional terminal I/O

Memory Management

Three layers stacked to minimize host RAM across VMs:

  1. Overcommit — MAP_NORESERVE, pages allocated on first write only
  2. KSM — MADV_MERGEABLE deduplicates identical pages across all VMs
  3. Balloon — graduated reclaim with hysteresis (idle 30s → 25%, idle 2min → 50%, idle 5min → floor)

Result: 10 idle 512MB VMs use ~200MB of host RAM, not 5GB.
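
To see the layers at work, the stock Linux counters on the host are enough; a minimal check might look like this (the `ps -C clone` filter assumes the VMM process is named `clone`):

cat /sys/kernel/mm/ksm/run             # 1 while KSM is scanning
cat /sys/kernel/mm/ksm/pages_sharing   # pages currently deduplicated across VMs
ps -o pid,rss,comm -C clone            # actual host RSS per VMM process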

VMs with >3GB RAM automatically get split memory regions around the x86 PCI MMIO hole (3-4GB). The guest sees all requested memory (e.g., 4GB VM → 3.8Gi usable, 8GB → 7.8Gi). No configuration needed — Clone handles the split transparently.

Shadow Clone Fork

Boot template → warm binaries → snapshot memory + registers
                                        ↓
              New VM = mmap(snapshot, MAP_PRIVATE)  ← ~160ms
                                        ↓
              Inject identity (hostname, CID, IP, MAC)
                                        ↓
              Transport reset → agent reconnects → exec ready

All forks share the same physical pages via Shadow Clone mapping until they write. No kernel boot on fork. The userspace vsock and net backends handle fork state without kernel involvement. Each fork gets unique IP, hostname, and vsock CID.

Measured: 2 forked 4GB Ubuntu VMs use ~1GB host RAM (Shadow Clone sharing). 3 forked Alpine VMs use 13MB total RSS vs 127MB template.
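
A quick way to see the divergence is to fork twice and compare identities; the sketch assumes `clone fork` prints the new VM id, which is an illustrative assumption:

A=$(sudo clone fork --template /templates/shell-base --net)
B=$(sudo clone fork --template /templates/shell-base --net)
clone exec --vm-id "$A" -- sh -c 'hostname; ip -4 addr show eth0'
clone exec --vm-id "$B" -- sh -c 'hostname; ip -4 addr show eth0'
# Different hostnames and IPs; unwritten base pages stay shared on the host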

Live Migration

Pre-copy over TCP. VM keeps running while memory transfers in the background.

Source                              Destination
  │ send full memory (skip zeros) ──→ │
  │ send dirty pages (round 1)   ──→ │
  │ send dirty pages (round 2)   ──→ │
  │ ...converge...                    │
  │ PAUSE → send final dirty + CPU ─→│
  │         ~19ms downtime            │ RESUME
  │ shutdown                          │ running
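
In CLI terms this is `clone migrate-recv` on the destination and `clone migrate --live` on the source (see the lifecycle table); the address arguments below are hypothetical placeholders, not documented flags:

# Destination host: wait for the incoming stream (listen address is a placeholder)
sudo clone migrate-recv --listen 0.0.0.0:7001

# Source host: pre-copy in the background, then hand off
sudo clone migrate --live --dest 192.0.2.10:7001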

Security

  • KVM hardware isolation — each VM is a separate address space
  • Seccomp jailer — BPF syscall filter on VMM process (--seccomp)
  • Measured boot — SHA-256 kernel hash verification before loading
  • Namespace jail — optional full jail with chroot + capabilities (--jail)
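
The hardening flags compose with an ordinary boot; for example, combining the documented --seccomp and --jail options:

sudo clone run --kernel vmlinuz --rootfs alpine.img --net --seccomp --jail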

Rootfs Modes

# Mode 1: Custom initrd (everything in RAM)
clone run --kernel vmlinuz --initrd my-initrd.img

# Mode 2: Disk rootfs (persistent, read-write)
clone run --kernel vmlinuz --rootfs disk.img

# Mode 3: Shared base + overlay (multi-VM, ephemeral or persistent)
clone run --kernel vmlinuz --rootfs base.img --overlay
clone run --kernel vmlinuz --rootfs base.img --overlay /data/vm1.qcow2

Architecture

src/
├── main.rs              CLI entry point
├── vmm/                 VM lifecycle, vCPU threads, MMIO bus
├── boot/                Kernel loading (bzImage/ELF), ACPI tables, page tables
├── memory/              Guest memory, overcommit, KSM, page tables, GDT
├── virtio/              Virtio devices (block, net, balloon, vsock, fs)
├── pci/                 PCI bus (ECAM), VFIO passthrough
├── migration/           Pre-copy live migration (sender, receiver, wire protocol)
├── control/             Control plane (per-VM socket + daemon for multi-VM orchestration)
├── net/                 TAP/bridge/NAT auto-setup
├── storage/             Raw + QCOW2 block backends
├── rootfs.rs            Auto-generated initrd for --rootfs mode (embeds kernel modules, agent)
└── rootfs_create.rs     `clone rootfs create` (Alpine, Ubuntu, Debian, Docker)

crates/
├── guest-agent/         In-guest vsock agent (exec, networking, D-Bus recovery, heartbeat)
└── clone-init/          Minimal init for auto-generated initrd (module loading, rootfs mount, agent launch)

shell-template/
├── build.sh             Build Ubuntu 24.04 rootfs with 200+ dev tools
├── warm.sh              Boot template, warm binaries, snapshot for fork
├── setup-host.sh        Configure host (KVM, bridge, firewall, RAID)
└── setup-backups.sh     Nightly R2 backups for user home dirs

Dependencies: kvm-ioctls, kvm-bindings, vm-memory, libc, clap, anyhow, tracing, sha2. No libvirt, no QEMU, no forked codebases.


Benchmarks

All numbers measured on bare-metal (OVH dedicated server, Intel Xeon E-2386G, Ubuntu 24.04, kernel 6.8.0-106-generic).

| Metric | Value |
|---|---|
| Shadow Clone fork (Alpine, minimal) | <20ms |
| Shadow Clone fork (4GB Ubuntu, to exec) | ~160ms |
| VMM overhead (cold boot) | 35ms (memory, irqchip, devices, kernel load) |
| Cold boot to shell (distro kernel) | 2,217ms (best), 2,338ms (avg of 5 runs) |
| Live migration downtime (256MB) | 1ms |
| Incremental snapshot size | 192KB for 512MB VM (682x smaller than full) |
| Shadow Clone sharing (2x 4GB Ubuntu) | ~1GB host RAM for 8GB committed |
| Shadow Clone sharing (3x Alpine) | 13MB vs 127MB template |
| Fork networking | Full (ping, curl, HTTPS) via userspace virtio-net |
| Fork exec | echo/python/git/curl all work within 50ms |
| Binary size | ~3MB |
| VMM memory overhead | ~5-10MB |

See docs/SPEC.md for detailed comparisons with Firecracker, Cloud Hypervisor, and QEMU.


Test Results

63 tests, 62 passed, 1 skipped. Full suite in ~315 seconds.

Run on bare-metal Ubuntu 22.04, kernel 6.5.0-35-generic, 2026-03-17.

Boot & ACPI

| Test | Result | Details |
|---|---|---|
| test_boot_serial | PASS | VM booted, printed serial marker, completed init |
| test_boot_speed | PASS | Cold boot in 2,218ms (< 3,000ms target) |
| test_boot_speed_avg | PASS | Average 2,225ms over 5 runs (< 3,000ms target) |
| test_acpi_no_errors | PASS | Zero ACPI errors in boot log |
| test_multi_vcpu | PASS | 4-vCPU VM booted, guest sees 4 CPUs |

Control Plane

| Test | Result | Details |
|---|---|---|
| test_control_socket | PASS | Socket appears, status/pause/resume/shutdown all work |
| test_pause_resume | PASS | VM survives 6 pause/resume cycles |

Storage

| Test | Result | Details |
|---|---|---|
| test_virtio_block_rw | PASS | VM boots with virtio-block attached |
| test_qcow2_block | PASS | QCOW2 disk image as block backend |
| test_qcow2_backing_file | PASS | QCOW2 overlay + raw backing (overlay=196KB, base=16MB untouched) |

Snapshots & Fork

| Test | Result | Details |
|---|---|---|
| test_snapshot_fork | PASS | Snapshot created, integrity verified, forked VM running |
| test_incremental_snapshot | PASS | Incremental snapshot 668x smaller (full=128MB, dirty=192KB) |
| test_shadow_clone_sharing | PASS | 3 forked VMs RSS=19MB < 2x single VM=125MB |
| test_template_integrity | PASS | Corrupted template correctly rejected |

Security

| Test | Result | Details |
|---|---|---|
| test_seccomp_filter | PASS | VM boots and runs cleanly under seccomp BPF |

Devices & Sharing

| Test | Result | Details |
|---|---|---|
| test_virtiofs | PASS | virtio-fs device registered and active |
| test_pci_bus | PASS | VM boots with PCI enumeration active (no pci=off) |
| test_vfio_passthrough | SKIP | No PCI device bound to vfio-pci on test server |

Migration

| Test | Result | Details |
|---|---|---|
| test_live_migration | PASS | Pre-copy migration, 1ms downtime, source stopped, receiver running |

Rootfs Boot (Real Distros)

| Test | Result | Details |
|---|---|---|
| test_rootfs_alpine | PASS | Alpine 3.21 boots to OpenRC login prompt |
| test_rootfs_ubuntu | PASS | Ubuntu 24.04 (noble) boots, clone-init hands off to init, control socket active |

Guest Agent & Networking

| Test | Result | Details |
|---|---|---|
| test_unique_cid | PASS | 2 VMs with unique CIDs boot simultaneously |
| test_guest_networking | PASS | eth0 configured, gateway ICMP, DNS (UDP), TCP all working |
| test_exec_latency | PASS | Exec round-trip in 796ms (< 1,000ms target) |

Multi-VM & Memory

| Test | Result | Details |
|---|---|---|
| test_concurrent_vms | PASS | 3 VMs running simultaneously |
| test_memory_accounting | PASS | 512MB VM uses 126MB RSS (overcommit working) |
| test_balloon | PASS | Guest kernel detects virtio-balloon, RSS=126MB for 512MB VM |

Shadow Clone with Real Distros

| Test | Result | Details |
|---|---|---|
| test_shadow_clone_rootfs | PASS | Alpine template (126MB) → 3 forks total RSS=18MB (~6MB/fork) |

# Run all tests (requires root + KVM + kernel 6.5+)
sudo KERNEL=/path/to/vmlinuz-6.5 ./tests/e2e/run_all.sh

# Run a specific test
sudo KERNEL=/path/to/vmlinuz-6.5 ./tests/e2e/run_all.sh test_live_migration

Building on Clone

Clone is the VM engine. Your product is what you build on top.

What Clone handles (the hard part):

  • VM lifecycle — boot, fork, snapshot, migrate, shutdown
  • Memory efficiency — CoW sharing, overcommit, KSM, balloon reclaim
  • Device I/O — block, network, filesystem sharing, GPU passthrough
  • Isolation — KVM hardware boundary, seccomp, measured boot
  • Control plane — per-VM Unix socket API + daemon for multi-VM orchestration
  • Guest networking — auto bridge/TAP/NAT/DNS, per-VM IP allocation

What you build for your use case (the product):

| Use Case | You Build |
|---|---|
| Shell hosting | User auth, SSH key injection, template management, quota/billing, web terminal (websocket → serial bridge) |
| FaaS platform | HTTP router → fork → execute → respond → destroy, request queuing, template pool per runtime |
| CI/CD runners | Job scheduler, build script injection, artifact extraction, GitHub/GitLab webhook integration |
| Dev environments | Workspace config (which template, which dirs to mount), IDE integration, persistent overlay management |

The pattern is always the same:

1. Create templates for your workload (boot once, snapshot)
2. Fork VMs from templates on demand (<20ms)
3. Inject per-user/per-request state (dirs, env, identity)
4. Run workload
5. Destroy or migrate when done

Clone exposes this via CLI (clone fork, clone run) and Unix socket API. Your orchestration layer calls these and adds the business logic.
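
As a sketch, steps 2-5 above reduce to a few CLI calls. It assumes `clone fork` prints the new VM id and uses `poweroff` over exec as a simple teardown; /srv/job-123 and run.sh are placeholders:

# Template already built (boot once + snapshot, e.g. shell-template/warm.sh)
VM_ID=$(sudo clone fork --template /templates/shell-base --net \
          --shared-dir /srv/job-123:code)                         # fork + inject per-request state
clone exec --vm-id "$VM_ID" -- sh -c 'cd /mnt/code && ./run.sh'   # run the workload
clone exec --vm-id "$VM_ID" -- poweroff                           # destroy when done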


Status

25K lines of Rust. 63 e2e tests (62 pass, 1 skip). Single binary.

Working: full VM boot (up to 64GB RAM), 5 virtio devices, PCI/VFIO passthrough, Shadow Clone fork (~160ms with networking and exec), live migration (1ms downtime), snapshots (full + incremental), memory overcommit + KSM + balloon, split memory regions (MMIO hole handling for >3GB VMs), virtio-fs, overlay mode (tmpfs + block), rootfs creation (Alpine, Ubuntu, Debian, Docker import), compressed kernel module support (.ko.zst, .ko.xz), seccomp, measured boot, multi-vCPU SMP, guest agent with remote exec (50ms response), guest networking (auto bridge/TAP/NAT/DNS), userspace vsock (fork-compatible), userspace virtio-net on fork path (ioeventfd + TAP poll), per-VM identity injection (hostname, CID, IP, MAC), D-Bus/service recovery after fork, daemon orchestration (create/fork/snapshot/destroy), console attach, daemonless VM listing.

Needs work: MSI-X interrupt routing (stubbed), SR-IOV, vGPU/mdev, confidential VMs (TDX/SEV), persistent disk overlay (qcow2 per-fork).


License

MIT. Copyright (c) 2026 Unix Shells Limited Company.
