Tugboat is a VM orchestration tool that provides Kubernetes‑style functionality while avoiding the heaviness of KubeVirt and the complexity of OpenStack.
It aims to manage VMs declaratively—similar to Kubernetes—using a minimal set of components: etcd, tugboat‑apiserver, tugboat‑agent, tugboat‑runtime, tugboat‑scheduler, and tugboat‑controller‑manager.
Existing VM orchestration systems come with significant challenges:
- Heavy due to running VMs on top of Kubernetes (double layering)
- Complex CRDs and libvirt integration
- Too many components for individuals or small teams
- Very high learning cost
- Operationally difficult

Tugboat solves these problems by:
- Inheriting Kubernetes design principles
- Not depending on Kubernetes itself
- Providing a lightweight architecture optimized for VMs
- Remaining simple enough for individuals to run
- Scaling to large deployments through a clean, minimal design
- Kubernetes-compatible API manifests
  - Uses familiar structures such as `TypeMeta` and `ObjectMeta`
  - Can use `kubectl`
- Declarative cluster powered by etcd
  - The apiserver is stateless; etcd is the single source of truth
- Lightweight control plane
  - Only the apiserver and scheduler are required
- Direct QEMU execution
  - No libvirt; QEMU is invoked directly
- Clear separation of agent and runtime responsibilities
  - Similar to Kubernetes’ kubelet/runtime model
- VM images as OCI artifacts
  - Imagefile → build → push to registry → referenced by Ship
- CNI support
  - `NetworkClass` / `ClusterNetworkClass` based network configuration
  - agent publishes plugin readiness to `Node.status.cniPlugins`
  - scheduler filters nodes with `NetworkFit`
- RBAC / ServiceAccount
  - `Role`/`ClusterRole`, `RoleBinding`/`ClusterRoleBinding`, `ServiceAccount` API (`authorization/v1`)
  - Fine-grained verb- and resource-level access control enforced in the apiserver
  - Built-in roles: `cluster-admin`, `admin`, `edit`, `view`
  - Opaque bearer tokens generated per ServiceAccount, stored as Secrets
- Planned CRD support
- High availability design
  - Apiserver can scale horizontally
  - Scheduler uses Lease-based leader election
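The Lease-based election can be pictured with a sketch of the object the active scheduler replica might hold. The field names and the `tugboat-system` namespace here are assumptions modeled on Kubernetes' `coordination/v1` Lease, not confirmed Tugboat schema:

```yaml
# Hypothetical Lease held by the active scheduler replica.
apiVersion: coordination/v1
kind: Lease
metadata:
  namespace: tugboat-system
  name: tugboat-scheduler
spec:
  holderIdentity: node-a            # replica currently holding leadership
  leaseDurationSeconds: 15          # validity window without a renewal
  renewTime: "2025-01-01T00:00:00Z" # last heartbeat from the holder
```

Standby replicas watch the Lease and take over only after the holder fails to renew within the lease duration.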
| Kubernetes | Tugboat |
|---|---|
| Pod | Ship |
| ReplicaSet | ReplicaSet |
| Deployment | Deployment |
| Node | Node |
| Container image | VM image (OCI) |
| Dockerfile | Imagefile |
| kubelet | agent |
| RuntimeClass | RuntimeClass |
Note: `Fleet` is a Tugboat-specific resource for grouping multiple Ship types that share a private network; it has no direct Kubernetes equivalent.
Defines a VM machine type. This is a cluster‑scoped resource.
```yaml
apiVersion: v1
kind: ShipClass
metadata:
  name: lightweight
spec:
  cpu:
    architecture: x64
    cores: 2
  memory:
    size: 4Gi
```

Defines a VM instance. This is a namespaced resource.
```yaml
apiVersion: v1
kind: Ship
metadata:
  namespace: default
  name: ship
spec:
  image: ghcr.io/sileader/tugboat-vm-images/ubuntu:24.04
  shipClass: lightweight
  volumes:
    - name: data-disk
      persistentVolumeClaim:
        claimName: data-disk
    - name: app-config
      configMap:
        name: app-config
    - name: app-secret
      secret:
        secretName: app-secret
```

Groups multiple Ship types that share a private network.
This is a namespaced resource (apps/v1).
```yaml
apiVersion: apps/v1
kind: Fleet
metadata:
  namespace: default
  name: my-fleet
spec:
  networkClassName: my-network-class
  components:
    - name: frontend
      replicas: 2
      shipTemplate:
        metadata:
          labels:
            role: frontend
        spec:
          image: ghcr.io/sileader/tugboat-vm-images/ubuntu:24.04
          shipClass: lightweight
    - name: backend
      replicas: 3
      shipTemplate:
        metadata:
          labels:
            role: backend
        spec:
          image: ghcr.io/sileader/tugboat-vm-images/ubuntu:24.04
          shipClass: lightweight
```

Manages a set of identical Ships with rolling-update support.
This is a namespaced resource (apps/v1).
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: my-deployment
spec:
  replicas: 3
  selector:
    app: my-app
  shipTemplate:
    metadata:
      labels:
        app: my-app
    spec:
      image: ghcr.io/sileader/tugboat-vm-images/ubuntu:24.04
      shipClass: lightweight
```

Maintains a stable set of replica Ships.
This is a namespaced resource (apps/v1).
```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  namespace: default
  name: my-replicaset
spec:
  replicas: 2
  selector:
    app: my-app
  shipTemplate:
    metadata:
      labels:
        app: my-app
    spec:
      image: ghcr.io/sileader/tugboat-vm-images/ubuntu:24.04
      shipClass: lightweight
```

Declares the capabilities of the VM runtime on a node (live migration support, hotplug support). This is a cluster-scoped resource.
```yaml
apiVersion: v1
kind: RuntimeClass
metadata:
  name: standard
spec:
  liveMigration: true
  hotplug:
    cpu:
      add: true
      remove: false
    memory:
      add: true
      remove: false
    nic:
      add: true
      remove: true
    storage:
      add: true
      remove: true
```

A Ship can reference a RuntimeClass by name via `spec.runtimeClass`. The scheduler uses the `RuntimeClassFit` plugin to ensure Ships are only placed on nodes whose associated RuntimeClass satisfies the Ship's requirements (e.g., live migration capability).
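For instance, a Ship that must remain live-migratable could pin a RuntimeClass named `standard` (a sketch; `spec.runtimeClass` is the only field added to the earlier Ship example):

```yaml
apiVersion: v1
kind: Ship
metadata:
  namespace: default
  name: migratable-ship
spec:
  image: ghcr.io/sileader/tugboat-vm-images/ubuntu:24.04
  shipClass: lightweight
  runtimeClass: standard  # RuntimeClassFit only admits nodes whose runtime has liveMigration: true
```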
CSI-backed volumes can be referenced through `spec.volumes[].persistentVolumeClaim`. The legacy `volumeClaimRef` field is still accepted for backward compatibility.
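As a sketch, the `data-disk` claim used in the Ship example could be declared like this (assuming the PersistentVolumeClaim schema mirrors Kubernetes'; the `csi-standard` StorageClass name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: default
  name: data-disk
spec:
  storageClassName: csi-standard   # hypothetical CSI-backed StorageClass
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 10Gi
```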
Current node-side support matrix:

- `Block` volumeMode
- `Filesystem` volumeMode
- drivers that require `NodeStageVolume`/`NodeUnstageVolume`
- `nodePublishSecretRef`/`nodeStageSecretRef`
- agent restart recovery from persisted publish state
- explicit `fs_type` and `volume_attributes`
- `NodeExpandVolume` / volume expansion
- drivers that require controller publish context
- `NodeGetVolumeStats` / CSI volume health + usage surfacing on PV/PVC conditions
Filesystem volumes are exposed to the guest as a 9p share. The mount tag is the Ship
volume name, so both CSI Filesystem claims and projected ConfigMap / Secret
volumes are passed to the guest through the same mechanism.
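Inside the guest, such a share is mounted by its tag. If the guest image runs cloud-init (an assumption about the image; a plain fstab entry works the same way), the `app-config` volume from the Ship example could be mounted as:

```yaml
#cloud-config
# Mount the 9p share whose tag matches the Ship volume name "app-config".
mounts:
  - [ "app-config", "/etc/app", "9p", "trans=virtio,version=9p2000.L", "0", "0" ]
```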
Control-plane storage support includes PersistentVolume, PersistentVolumeClaim, and StorageClass APIs plus dynamic
CSI provisioning, managed PV cleanup, capacity-aware provisioning/expansion, filesystem claims, and CSI secret /
fsType propagation in tugboat-controller-manager. Node-side support also includes controller-publish-context
handling, live NodeExpandVolume (without Ship recreate) when the driver advertises it, and NodeGetVolumeStats-backed
PV/PVC condition updates for CSI health and usage. The main remaining gaps are scheduler awareness of storage
constraints, richer recovery beyond persisted publish state, and snapshot/clone style workflows.
Tugboat now exposes observed CNI readiness through `Node.status.cniPlugins` and publishes `NetworkClass.status.readyNodes` / `ClusterNetworkClass.status.readyNodes` from tugboat-controller-manager. The scheduler's `NetworkFit` filter uses that status to reject nodes that do not advertise the plugins required by a Ship's requested NetworkClass / ClusterNetworkClass.
The current rollout assumes Flannel itself is installed and managed externally. For a manual multi-node validation flow:

- Install the required CNI binaries (`bridge`, `loopback`, `flannel`, and `portmap` when port mappings are enabled) on each node under the configured CNI bin directory.
- Bring up Flannel externally so each node has the expected runtime state (by default `/run/flannel/subnet.env` and `/var/lib/cni/flannel`).
- Start `tugboat-agent` on each node and confirm `kubectl get node -o yaml` shows `status.cniPlugins` with the expected readiness.
- Apply a `ClusterNetworkClass` or `NetworkClass` using `cniPlugin: flannel` and confirm its `status.readyNodes` contains the nodes that passed the probe.
- Create Ships that reference that network class and verify they schedule only to ready nodes before performing cross-node connectivity checks.
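A minimal class for that flow might look like the following sketch (only `cniPlugin: flannel` is taken from the steps above; the resource name is arbitrary):

```yaml
apiVersion: v1
kind: ClusterNetworkClass
metadata:
  name: flannel-overlay
spec:
  cniPlugin: flannel
```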
Tugboat provides an RBAC system. Access control is enforced in the apiserver for every request.
| Resource | API Group | Scope | Description |
|---|---|---|---|
| `ServiceAccount` | `core/v1` | Namespaced | Identity for automated processes and controllers |
| `Role` | `authorization/v1` | Namespaced | Permission rules scoped to a single namespace |
| `ClusterRole` | `authorization/v1` | Cluster | Permission rules that apply cluster-wide |
| `RoleBinding` | `authorization/v1` | Namespaced | Bind a Role or ClusterRole to subjects within a namespace |
| `ClusterRoleBinding` | `authorization/v1` | Cluster | Bind a ClusterRole to subjects cluster-wide |
Supported verbs: `get`, `list`, `watch`, `create`, `update`, `patch`, `delete`, `deletecollection`

Subject types: `User`, `Group`, `ServiceAccount`

Permissions can be further narrowed to specific resource instances via `resourceNames`.
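For example, a rule narrowed with `resourceNames` grants access to a single Ship only (following the Role manifest format used in the examples in this document; the names are illustrative):

```yaml
apiVersion: authorization/v1
kind: Role
metadata:
  namespace: default
  name: single-ship-reader
rules:
  - apiGroups: [ "" ]
    resources: [ "ships" ]
    resourceNames: [ "ship" ]   # only the Ship named "ship" may be read
    verbs: [ "get", "watch" ]
```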
| ClusterRole | Description |
|---|---|
| `cluster-admin` | Full access to all resources |
| `admin` | Full access within a namespace; cannot modify RBAC or namespace itself |
| `edit` | Read/write access to most namespaced resources; cannot read Secrets or modify RBAC |
| `view` | Read-only access to most namespaced resources |
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: default
  name: my-service-account
```

Grants read access to Ships within the default namespace.
```yaml
apiVersion: authorization/v1
kind: Role
metadata:
  namespace: default
  name: ship-reader
rules:
  - apiGroups: [ "" ]
    resources: [ "ships" ]
    verbs: [ "get", "list", "watch" ]
```

Grants read access to Nodes cluster-wide.
```yaml
apiVersion: authorization/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
  - apiGroups: [ "" ]
    resources: [ "nodes" ]
    verbs: [ "get", "list", "watch" ]
```

Binds `ship-reader` to a user and a ServiceAccount within the default namespace.
```yaml
apiVersion: authorization/v1
kind: RoleBinding
metadata:
  namespace: default
  name: ship-reader-binding
roleRef:
  apiGroup: authorization
  kind: Role
  name: ship-reader
subjects:
  - kind: User
    name: alice
  - kind: ServiceAccount
    namespace: default
    name: my-service-account
```

Grants the `node-reader` ClusterRole to an entire group cluster-wide.
```yaml
apiVersion: authorization/v1
kind: ClusterRoleBinding
metadata:
  name: node-reader-binding
roleRef:
  apiGroup: authorization
  kind: ClusterRole
  name: node-reader
subjects:
  - kind: Group
    name: ops-team
```

- Signed JWT tokens — current tokens are opaque random strings stored in Secrets; planned upgrade to signed JWTs with audience and expiry
- ServiceAccount token projection — automatic mounting of scoped tokens into Ships (similar to Kubernetes projected service account tokens)
- Aggregated ClusterRoles — compose ClusterRoles by label selector so extensions can inject rules automatically
- OIDC integration — validate tokens issued by external identity providers (e.g. Dex, Keycloak, cloud IAM) via standard OIDC discovery
- Audit logging — structured audit records for every API request (who, what, when, response code) with configurable per-resource verbosity
- tugboat-runtime
- tugboat-resources (Resource definitions)
- tugboat-resource-store (etcd wrapper for apiserver)
- tugboat-apiserver
- tugboat-client
- tugboat-cli build (Build a VM Image from an Imagefile)
- fieldSelector and labelSelector
- tugboat-scheduler
- tugboat-agent
  - Node auto-registration
  - Reconcile on Ship Added events
  - Networking (CNI, NetworkClass / ClusterNetworkClass)
  - Reconcile on Ship Modified events
  - Reconcile on Ship Deleted events
  - Storage (CSI publish/stage, controller publish context, and live expansion)
  - Topology-aware scheduling and snapshot-style workflows
- Secret
- Namespace resource definition and API (`core/v1`)
- tugboat-controller-manager
  - Dynamic CSI volume provisioning
  - CSI-backed managed PV cleanup
  - ReplicaSet controller (maintaining the prescribed number of Ships)
  - Deployment controller (rolling-update management of ReplicaSets)
  - Fleet controller
- Fleet resource definition and API (`apps/v1`)
- ReplicaSet resource definition and API (`apps/v1`)
- Deployment resource definition and API (`apps/v1`)
- ConfigMap
- NetworkClass / ClusterNetworkClass resource definition and API (`core/v1`)
- PersistentVolume / PersistentVolumeClaim / StorageClass resource definition and API (`core/v1`)
- Lease resource definition and API (`coordination/v1`)
- Live migration
  - Core migration triggered by `target_node_name`
  - Migration state machine (Pending, Ready, Migrating, Completed, Failed)
  - Migration status and conditions reflected on Ship
  - Preflight compatibility checks (CPU, shared-storage eligibility, target network capability)
  - Reliable recovery and explicit error reporting on migration failure
  - Guest/network continuity via deterministic bridge/interface/MAC identity across nodes
  - Timeout detection with automatic QEMU cancel (Pending: 2 min, Migrating: 30 min)
  - Per-ShipClass configurable QEMU migration parameters (bandwidth, downtime, xbzrle cache, postcopy)
  - Scheduler StorageFit plugin rejects nodes for Ships with non-RWX volumes
- RuntimeClass
  - Resource definition and API (`core/v1`)
  - `spec.runtimeClass` field on Ship
  - Scheduler `RuntimeClassFit` plugin (live migration capability check)
  - Hotplug operations gated by RuntimeClass flags
- RBAC / ServiceAccount
  - `ServiceAccount` resource definition and API (`core/v1`)
  - `Role`/`ClusterRole` resource definition and API (`authorization/v1`)
  - `RoleBinding`/`ClusterRoleBinding` resource definition and API (`authorization/v1`)
  - RBAC authorization enforcement in apiserver
  - Built-in roles (`cluster-admin`, `admin`, `edit`, `view`)
  - ServiceAccount token generation and validation
    - Opaque bearer token issued via Secret of type `service-account-token`
    - Default ServiceAccount auto-created per namespace
  - Signed JWT tokens with audience/expiry
  - ServiceAccount token auto-projection into Ships
  - OIDC integration for external identity providers
  - Aggregated ClusterRoles
  - Audit logging
- CRD
Tugboat is still in an early stage, and contributions of any kind are welcome.
Apache License 2.0. See LICENSE for details.