From 2ac8c5cea5cce5a8f9abe423e8e1aeab898a7c17 Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Tue, 28 Apr 2026 14:32:02 +1000
Subject: [PATCH] Fix link structure

Fix the link structure in preparation for changing the topology of the
sidebar in vitepress.

Signed-off-by: Zac Dover
---
 docs/architecture/cloud-storage/ceph/ceph.md  | 461 ++++++++++++++++++
 .../cloud-storage/ceph/chorus/chorus.md       |  22 +
 .../cloud-storage/ceph/prysm/prysm.md         |  37 ++
 .../cloud-storage/ceph/rook/rook.md           |  37 ++
 docs/architecture/cluster/cluster.md          |  69 +++
 5 files changed, 626 insertions(+)
 create mode 100644 docs/architecture/cloud-storage/ceph/ceph.md
 create mode 100644 docs/architecture/cloud-storage/ceph/chorus/chorus.md
 create mode 100644 docs/architecture/cloud-storage/ceph/prysm/prysm.md
 create mode 100644 docs/architecture/cloud-storage/ceph/rook/rook.md
 create mode 100644 docs/architecture/cluster/cluster.md

diff --git a/docs/architecture/cloud-storage/ceph/ceph.md b/docs/architecture/cloud-storage/ceph/ceph.md
new file mode 100644
index 0000000..02ce1ce
--- /dev/null
+++ b/docs/architecture/cloud-storage/ceph/ceph.md
@@ -0,0 +1,461 @@
+---
+title: Ceph
+---
+
+# Ceph
+Ceph is a clustered and distributed storage manager.
+
+Ceph uniquely delivers object, block, and file storage in one unified system.
+Ceph is highly reliable, easy to manage, and free. Ceph delivers extraordinary
+scalability: thousands of clients accessing petabytes to exabytes of data. A
+Ceph Node leverages commodity hardware and intelligent daemons, and a Ceph
+Storage Cluster accommodates large numbers of nodes, which communicate with
+each other to replicate and redistribute data dynamically.
+
+## Architecture
+
+### Ceph Block Device Summary (RBD)
+
+#### Overview of RBD
+
+A block is a sequence of bytes, often 512 bytes in size. Block-based storage
+interfaces represent a mature and common method for storing data on various
+media types, including hard disk drives (HDDs), solid-state drives (SSDs),
+compact discs (CDs), floppy disks, and magnetic tape. The widespread adoption
+of block device interfaces makes them an ideal fit for mass data storage
+applications, including their integration with Ceph storage systems.
+
+#### Core Features
+
+Ceph block devices are designed with three fundamental characteristics:
+thin-provisioning, resizability, and data striping across multiple Object
+Storage Daemons (OSDs). These devices leverage the full capabilities of RADOS
+(Reliable Autonomic Distributed Object Store), including snapshotting,
+replication, and strong consistency guarantees. Ceph block storage clients
+communicate with Ceph clusters through two primary methods: kernel modules or
+the librbd library.
+
+An important distinction exists between these two communication methods
+regarding caching behavior. Kernel modules can use the Linux page cache for
+performance optimization. For applications that rely on the librbd library,
+Ceph provides its own RBD (RADOS Block Device) caching mechanism to enhance
+performance.
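+
+The thin provisioning and resizing described above can be exercised directly
+through the librbd Python binding. The following is a minimal sketch, not a
+complete recipe: it assumes a reachable cluster, a local
+`/etc/ceph/ceph.conf`, an authorized keyring, and an existing pool named
+`rbd`; the image name and sizes are illustrative only.
+
+```python
+import rados
+import rbd
+
+# Connect to the cluster using the local configuration file.
+cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
+cluster.connect()
+try:
+    ioctx = cluster.open_ioctx('rbd')  # pool name (assumed to exist)
+    try:
+        # Create a 4 GiB image. Space is allocated lazily as data is
+        # written (thin provisioning), not reserved up front.
+        rbd.RBD().create(ioctx, 'example-image', 4 * 1024**3)
+
+        # Grow the image to 8 GiB without taking it offline.
+        image = rbd.Image(ioctx, 'example-image')
+        try:
+            image.resize(8 * 1024**3)
+        finally:
+            image.close()
+    finally:
+        ioctx.close()
+finally:
+    cluster.shutdown()
+```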
+
+#### Performance and Scalability
+
+Ceph's block devices are engineered to deliver high performance combined with
+vast scalability. This performance extends to various deployment scenarios,
+including direct integration with kernel modules and virtualization
+environments. The architecture supports kernel virtual machines (KVMs) such
+as QEMU, enabling efficient virtualized storage operations.
+
+Cloud-based computing platforms have embraced Ceph block devices as a storage
+backend solution. Major cloud computing systems including OpenStack,
+OpenNebula, and CloudStack integrate with Ceph block devices through their
+reliance on libvirt and QEMU technologies. This integration allows these
+cloud platforms to leverage Ceph's distributed storage capabilities for their
+virtual machine storage requirements.
+
+#### Unified Storage Cluster
+
+One of Ceph's significant architectural advantages is its ability to support
+multiple storage interfaces simultaneously within a single cluster. The same
+Ceph cluster can concurrently operate the Ceph RADOS Gateway for object
+storage, the Ceph File System (CephFS) for file-based storage, and Ceph block
+devices for block-based storage. This unified approach eliminates the need
+for separate storage infrastructure for different storage paradigms,
+simplifying management and reducing operational overhead.
+
+This multi-interface capability allows organizations to deploy a single
+storage solution that addresses diverse storage requirements, from
+traditional block storage for databases and virtual machines to object
+storage for unstructured data and file storage for shared filesystems. The
+convergence of these storage types within one cluster provides operational
+efficiency and cost-effectiveness while maintaining the performance and
+reliability characteristics required for enterprise deployments.
+
+#### Technical Implementation
+
+The thin-provisioning feature of Ceph block devices means that storage space
+is allocated only as data is written, rather than pre-allocating the entire
+volume capacity upfront. This approach avoids wasting space on unused
+pre-allocated capacity and allows oversubscription strategies in which the
+sum of provisioned capacity exceeds physical capacity, based on actual usage
+patterns.
+
+The resizable nature of Ceph block devices provides operational flexibility,
+allowing administrators to expand or contract volume sizes as application
+requirements change, without disrupting service availability. This dynamic
+sizing capability supports evolving storage needs without requiring complex
+migration procedures or extended downtime windows.
+
+Data striping across multiple OSDs distributes data blocks across the
+cluster's storage nodes. This distribution achieves two critical objectives:
+it increases aggregate throughput by allowing parallel I/O operations across
+multiple devices, and it ensures data availability through the replication
+mechanisms built into RADOS. The striping process breaks data into smaller
+chunks that are distributed according to the cluster's CRUSH (Controlled
+Replication Under Scalable Hashing) algorithm, which determines optimal
+placement based on cluster topology and configured policies.
+
+#### RADOS Integration
+
+The integration with RADOS provides Ceph block devices with enterprise-grade
+features. Snapshotting capability enables point-in-time copies of block
+devices, supporting backup operations, testing scenarios, and recovery
+procedures. Snapshots are space-efficient, storing only changed data rather
+than full copies, and can be created instantaneously without impacting
+ongoing operations.
+
+Replication ensures data durability by maintaining multiple copies of data
+across different cluster nodes. The replication factor is configurable,
+allowing organizations to balance storage efficiency against data protection
+requirements. Strong consistency guarantees ensure that all replicas reflect
+the same data state, preventing split-brain scenarios and ensuring data
+integrity even during failure conditions.
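+
+To make the snapshot workflow concrete, here is a minimal sketch using the
+same librbd Python binding as in the earlier example. It assumes the
+`example-image` created above; the snapshot name is illustrative.
+
+```python
+import rados
+import rbd
+
+cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
+cluster.connect()
+try:
+    ioctx = cluster.open_ioctx('rbd')
+    image = rbd.Image(ioctx, 'example-image')
+    try:
+        # Point-in-time snapshot; only data changed afterwards consumes
+        # additional space.
+        image.create_snap('before-upgrade')
+        print([snap['name'] for snap in image.list_snaps()])
+    finally:
+        image.close()
+        ioctx.close()
+finally:
+    cluster.shutdown()
+```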
+
+The communication architecture between block storage clients and Ceph
+clusters through kernel modules or librbd provides flexibility in deployment
+scenarios. Kernel module integration enables direct access from operating
+systems, while librbd allows applications to interact with Ceph block
+devices programmatically, supporting a wide range of use cases from
+bare-metal servers to containerized applications.
+
+#### Conclusion
+
+Ceph block devices represent a sophisticated implementation of block storage
+that combines the traditional simplicity of block-based interfaces with
+modern distributed storage capabilities. The thin-provisioned, resizable
+architecture with data striping across multiple OSDs provides a foundation
+for scalable, high-performance storage. Integration with RADOS brings
+enterprise features including snapshotting, replication, and strong
+consistency, while support for both kernel modules and librbd ensures broad
+compatibility across deployment scenarios. The ability to run block devices
+alongside object and file storage within a unified cluster positions Ceph as
+a comprehensive storage solution capable of addressing diverse organizational
+storage requirements through a single infrastructure platform. This
+convergence of capabilities, combined with proven integration with major
+virtualization and cloud platforms, establishes Ceph block devices as a
+viable solution for modern data center storage needs.
+
+### RADOS Gateway (RGW) in Summary
+
+#### Introduction
+
+RADOS Gateway, commonly referred to as RGW or radosgw, is Ceph's object
+storage interface that provides applications with a RESTful gateway to store
+objects and metadata in a Ceph cluster. As one of Ceph's three primary
+storage interfaces alongside CephFS (file storage) and RBD (block storage),
+RGW transforms Ceph's underlying RADOS object store into a scalable, S3- and
+Swift-compatible object storage service. This enables organizations to build
+cloud storage solutions that are compatible with industry-standard APIs while
+leveraging Ceph's distributed architecture for reliability, scalability, and
+performance.
+
+#### Architecture and Design
+
+RGW operates as a standalone HTTP service (historically also deployable via
+FastCGI) that sits atop the Ceph Storage Cluster. Unlike direct RADOS access,
+RGW provides a higher-level abstraction specifically designed for object
+storage workloads. The gateway maintains its own data formats, user database,
+authentication mechanisms, and access control systems independent of the
+underlying Ceph cluster's authentication.
+
+When a client stores data through RGW, the gateway receives HTTP requests,
+authenticates the user, authorizes the operation, and then translates the
+request into RADOS operations. Objects stored via RGW are ultimately
+persisted as RADOS objects in the Ceph cluster, but RGW manages the mapping
+between S3/Swift objects and the underlying RADOS objects. This abstraction
+layer allows a single S3 or Swift object to map to multiple RADOS objects,
+particularly for large files that are striped across the cluster.
+
+#### API Compatibility
+
+One of RGW's most significant features is its dual API compatibility. RGW
+provides RESTful interfaces compatible with both Amazon S3 and OpenStack
+Swift, enabling applications designed for these platforms to work with Ceph
+without modification. This compatibility extends beyond basic object
+operations to include advanced features like multipart uploads, versioning,
+lifecycle management, and bucket policies.
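+
+Because the S3 API is wire-compatible, standard S3 tooling works unchanged.
+The sketch below uses the boto3 library against an RGW endpoint; the endpoint
+URL and credentials are placeholders for values issued by your deployment
+(for example, via `radosgw-admin user create`).
+
+```python
+import boto3
+
+s3 = boto3.client(
+    's3',
+    endpoint_url='http://rgw.example.com:8080',  # placeholder RGW endpoint
+    aws_access_key_id='ACCESS_KEY',              # placeholder credentials
+    aws_secret_access_key='SECRET_KEY',
+)
+
+# Ordinary S3 calls are translated by RGW into RADOS operations.
+s3.create_bucket(Bucket='demo-bucket')
+s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'hello rgw')
+print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())
+```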
+
+The S3-compatible API supports a comprehensive set of operations including
+bucket creation and deletion, object PUT/GET/DELETE operations, ACL
+management, and metadata handling. The Swift-compatible API provides similar
+functionality using Swift's terminology and conventions, with containers
+instead of buckets and an account/container/object hierarchy. Importantly,
+RGW implements a unified namespace, meaning data written through the S3 API
+can be read through the Swift API and vice versa, providing exceptional
+flexibility for multi-application environments.
+
+#### Multi-Tenancy and User Management
+
+RGW implements sophisticated multi-tenancy capabilities that allow multiple
+independent users and organizations to share the same Ceph cluster while
+maintaining complete isolation. The system supports multiple authentication
+mechanisms including built-in user management, LDAP integration, and
+integration with external authentication systems like Keystone for OpenStack
+environments.
+
+Users in RGW are organized into a hierarchical structure. Each user belongs
+to a tenant (which can be implicit or explicit), and users can have multiple
+access keys for different applications or purposes. RGW manages user
+credentials, quotas, and usage statistics independently, enabling service
+providers to offer object storage as a multi-tenant service with per-user
+billing and resource limits.
+
+#### Data Organization
+
+RGW organizes data using a bucket-based model for S3 compatibility
+(containers in Swift terminology). Buckets are logical containers that hold
+objects, with each bucket having its own policies, ACLs, and configuration.
+Objects within buckets are identified by unique keys and can include
+arbitrary metadata alongside the actual data payload.
+
+Internally, RGW uses multiple RADOS pools to organize different types of
+data. Separate pools typically store bucket indexes, data objects, and
+metadata, allowing administrators to apply different replication or erasure
+coding strategies to different data types. For example, bucket indexes might
+use replication for fast access while large data objects use erasure coding
+for storage efficiency.
+
+#### Advanced Features
+
+RGW supports numerous advanced object storage features that make it suitable
+for production deployments. Object versioning allows multiple versions of the
+same object to coexist, enabling recovery from accidental overwrites or
+deletions. Lifecycle management policies automate the transition of objects
+between storage classes or deletion after specified periods, reducing storage
+costs and administrative overhead.
+
+Server-side encryption provides data protection at rest, with support for
+multiple encryption modes including customer-provided keys. Cross-origin
+resource sharing (CORS) configuration enables web applications to access RGW
+directly from browsers. Bucket notifications allow applications to receive
+real-time events when objects are created, deleted, or modified, enabling
+event-driven architectures.
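+
+The sketch below exercises two of the features named above through boto3,
+reusing the client from the previous example. Feature coverage varies by RGW
+release, and the bucket and rule names are illustrative.
+
+```python
+# Keep every version of overwritten or deleted objects.
+s3.put_bucket_versioning(
+    Bucket='demo-bucket',
+    VersioningConfiguration={'Status': 'Enabled'},
+)
+
+# Expire objects under the logs/ prefix after 30 days.
+s3.put_bucket_lifecycle_configuration(
+    Bucket='demo-bucket',
+    LifecycleConfiguration={
+        'Rules': [{
+            'ID': 'expire-logs',
+            'Filter': {'Prefix': 'logs/'},
+            'Status': 'Enabled',
+            'Expiration': {'Days': 30},
+        }]
+    },
+)
+```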
+
+#### Scalability and Performance
+
+RGW's architecture enables horizontal scaling to meet growing storage and
+throughput demands. Multiple RGW instances can be deployed behind load
+balancers to distribute client requests across many gateways. Each RGW
+instance operates independently, communicating directly with the underlying
+Ceph cluster, avoiding any single point of contention.
+
+For improved performance, RGW implements various optimization strategies. It
+can cache frequently accessed objects and metadata to reduce latency for
+popular content. Asynchronous operations handle time-consuming tasks like
+garbage collection and data synchronization without blocking client requests.
+The gateway also supports byte-range requests, enabling efficient partial
+object retrieval for large files and supporting features like HTTP video
+streaming.
+
+#### Multi-Site Capabilities
+
+RGW includes robust multi-site replication capabilities for disaster
+recovery, geographic distribution, and compliance requirements. The
+multi-site architecture supports active-active configurations where multiple
+RGW clusters can accept writes simultaneously, with changes automatically
+synchronized across sites. This enables organizations to build globally
+distributed object storage systems with local read/write access and automatic
+data replication.
+
+Metadata and data can be replicated independently with different strategies,
+allowing for flexible topology designs. Zone groups organize multiple zones
+(independent RGW deployments) into replication domains, while periods define
+consistent configuration states across all zones. This sophisticated
+replication framework supports complex scenarios like hub-and-spoke
+topologies, full-mesh replication, and tiered storage architectures.
+
+#### Monitoring and Operations
+
+RGW provides comprehensive monitoring capabilities through usage statistics,
+performance metrics, and administrative APIs. Administrators can track
+bandwidth consumption, request rates, and storage utilization on a per-user
+or per-bucket basis. Integration with standard monitoring tools allows RGW
+metrics to be collected and visualized alongside other infrastructure
+components.
+
+The admin API enables programmatic management of users, buckets, and quotas,
+facilitating automation and integration with billing systems or custom
+management tools. Command-line tools provide capabilities for
+troubleshooting, data inspection, and emergency operations.
+
+#### Conclusion
+
+RADOS Gateway represents a mature, feature-rich object storage solution that
+brings cloud-compatible APIs to Ceph's distributed storage platform. By
+providing S3 and Swift compatibility, RGW enables organizations to build
+private cloud storage solutions or offer object storage as a service while
+maintaining control over their infrastructure. Its scalability, multi-tenancy
+support, and advanced features make it suitable for use cases ranging from
+backup and archive to content distribution and application data storage. As
+part of the unified Ceph storage platform, RGW benefits from the same
+reliability, performance, and operational characteristics that make Ceph a
+leading choice for software-defined storage solutions.
+
+### CephFS in Summary
+
+#### Introduction
+
+CephFS (Ceph File System) is Ceph's distributed file system interface that
+provides POSIX-compliant file storage built on top of the RADOS object store.
+As one of Ceph's three primary storage interfaces alongside RBD (block
+storage) and RGW (object storage), CephFS enables users to mount a shared
+filesystem that appears as a traditional hierarchical directory structure
+while leveraging Ceph's distributed storage capabilities for scalability,
+reliability, and performance. This combination of familiar filesystem
+semantics with enterprise storage features makes CephFS suitable for
+workloads ranging from home directories and shared application data to
+high-performance computing and big data analytics.
+
+#### Architecture and Components
+
+CephFS operates through a carefully designed architecture that separates data
+and metadata management. At its core, CephFS relies on two essential
+components: the Metadata Server (MDS) and the underlying RADOS storage
+cluster that stores both file data and metadata.
+
+The Metadata Server daemon (ceph-mds) manages all filesystem metadata
+including directory structures, file ownership, permissions, access
+timestamps, and extended attributes. Unlike traditional filesystems where
+metadata resides on the same storage devices as data, CephFS stores metadata
+in dedicated RADOS pools, allowing it to be replicated and distributed
+independently. This separation enables CephFS to scale metadata operations
+independently of data operations, a critical capability for large-scale
+deployments.
+
+File data in CephFS is stored as RADOS objects distributed across the
+cluster's Object Storage Daemons (OSDs). When a client writes a file, CephFS
+stripes the data across multiple objects according to configurable striping
+parameters, enabling parallel I/O and leveraging the aggregate bandwidth of
+multiple storage devices. This architecture allows CephFS to scale from
+gigabytes to petabytes while maintaining consistent performance
+characteristics.
+
+#### POSIX Compliance and Compatibility
+
+CephFS provides strong POSIX compliance, supporting the vast majority of
+standard filesystem operations expected by applications and users. This
+includes hierarchical directory structures, standard file permissions and
+ownership, symbolic and hard links, extended attributes, and file locking
+mechanisms. The POSIX compliance ensures that existing applications can use
+CephFS without modification, making it a drop-in replacement for traditional
+network filesystems like NFS or SMB.
+
+Clients can access CephFS through multiple methods. The kernel client
+integrates directly with the Linux kernel, providing native filesystem
+performance and supporting standard mount operations. FUSE (Filesystem in
+Userspace) clients enable CephFS mounting on systems without kernel module
+support or in situations requiring non-root access. Additionally, libcephfs
+provides a library interface for applications to interact with CephFS
+programmatically, enabling custom integration scenarios.
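+
+The library route can be shown concretely. The sketch below uses the
+libcephfs Python binding (the `cephfs` module shipped with Ceph); it assumes
+a local `/etc/ceph/ceph.conf`, a client keyring with access to the default
+filesystem, and illustrative path names.
+
+```python
+import os
+import cephfs
+
+fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
+fs.mount()  # attach to the default filesystem
+try:
+    fs.mkdir('/projects', 0o755)
+    fd = fs.open('/projects/readme.txt', os.O_CREAT | os.O_WRONLY, 0o644)
+    fs.write(fd, b'shared via CephFS', 0)  # data lands in RADOS objects
+    fs.close(fd)
+finally:
+    fs.shutdown()
+```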
+
+#### Metadata Server Design
+
+The MDS represents a sophisticated component designed specifically for
+distributed metadata management. In CephFS, metadata operations like listing
+directories, creating files, or checking permissions can dominate workload
+patterns, particularly with applications handling many small files. By
+maintaining metadata in memory and leveraging high-performance RADOS
+operations for persistence, the MDS achieves low-latency metadata operations
+essential for good filesystem performance.
+
+CephFS supports multiple MDS daemons operating simultaneously, enabling both
+high availability and horizontal scalability. In active/standby
+configurations, standby MDS daemons monitor active instances and can take
+over immediately if an active MDS fails, with the transition handled
+automatically by Ceph monitors. The journal stored in RADOS ensures that no
+metadata operations are lost during failover.
+
+For scalability, CephFS implements dynamic subtree partitioning, allowing
+multiple active MDS daemons to divide the filesystem namespace among
+themselves. The system automatically balances load by migrating directory
+subtrees between MDS instances based on access patterns. A heavily accessed
+directory can even be sharded across multiple MDS daemons, with each daemon
+handling different entries within the same directory. This dynamic load
+balancing ensures that metadata operations scale with the number of active
+MDS instances.
+
+#### Performance Characteristics
+
+CephFS delivers strong performance across diverse workloads through several
+architectural optimizations. Client-side caching reduces latency for
+frequently accessed data and metadata, with cache coherency maintained
+through distributed locking mechanisms managed by the MDS. This caching
+enables multiple clients to access the same files efficiently while
+maintaining consistency.
+
+The striping of file data across multiple RADOS objects enables
+high-bandwidth sequential I/O operations, with clients performing parallel
+reads and writes directly to OSDs. For large files, this parallelism allows
+CephFS to saturate available network bandwidth and leverage the aggregate
+throughput of many storage devices simultaneously.
+
+Metadata performance benefits from the MDS's in-memory metadata cache and
+efficient RADOS operations for persistence. For workloads with good locality,
+where applications repeatedly access files within the same directory trees,
+the MDS cache provides excellent performance. The ability to scale metadata
+operations through multiple active MDS daemons addresses the metadata
+bottleneck that plagues many distributed filesystems at scale.
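+
+The striping parameters mentioned above are exposed per file and per
+directory as virtual extended attributes (the `ceph.file.layout.*` and
+`ceph.dir.layout.*` families described in the upstream file-layout
+documentation). A hedged sketch, reusing the mounted `fs` handle and imports
+from the previous example; note that a file's layout can only be changed
+while the file is still empty:
+
+```python
+# Create an empty file whose layout we can still change.
+fd = fs.open('/projects/bigdata.bin', os.O_CREAT | os.O_WRONLY, 0o644)
+fs.close(fd)
+
+# Stripe new data in this file across eight RADOS objects at a time.
+fs.setxattr('/projects/bigdata.bin', 'ceph.file.layout.stripe_count',
+            b'8', 0)
+
+# Read back the effective layout (stripe unit, count, object size, pool).
+print(fs.getxattr('/projects/bigdata.bin', 'ceph.file.layout'))
+```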
+
+#### Snapshots and Quotas
+
+CephFS provides sophisticated snapshot capabilities enabling point-in-time
+copies of directory trees. Snapshots are space-efficient, storing only
+changed data rather than full copies, and can be created instantly on any
+directory within the filesystem. Users can browse snapshot contents through a
+special `.snap` directory and restore files or entire directory trees as
+needed. Administrative snapshots enable backup and recovery strategies while
+user-accessible snapshots provide self-service recovery from accidental
+deletions or modifications.
+
+Directory quotas allow administrators to limit storage consumption at any
+point in the directory hierarchy. Quotas can restrict both the total bytes
+consumed and the number of files, with enforcement occurring at write time.
+This enables multi-tenant deployments where different users or projects share
+a filesystem while preventing any single entity from consuming excessive
+resources.
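+
+Both mechanisms are driven through ordinary filesystem operations: a snapshot
+is a `mkdir` inside the special `.snap` directory, and quotas are virtual
+extended attributes (`ceph.quota.max_bytes`, `ceph.quota.max_files`). A
+sketch, again assuming a mounted libcephfs handle `fs` and illustrative
+paths; snapshots may first need to be enabled for the filesystem:
+
+```python
+# Snapshot /projects as of now; browsable under /projects/.snap/nightly.
+fs.mkdir('/projects/.snap/nightly', 0o755)
+
+# Cap /projects at 100 GiB and one million files.
+fs.setxattr('/projects', 'ceph.quota.max_bytes',
+            str(100 * 1024**3).encode(), 0)
+fs.setxattr('/projects', 'ceph.quota.max_files', b'1000000', 0)
+```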
+
+#### Multiple Filesystems
+
+Recent CephFS versions support multiple independent filesystems within a
+single Ceph cluster, each with its own namespace, MDS cluster, and data
+pools. This capability enables isolation between different use cases or
+tenants while sharing the underlying storage infrastructure. Each filesystem
+can be configured with different parameters, replication strategies, or
+performance characteristics appropriate to its specific workload
+requirements.
+
+#### Security and Access Control
+
+CephFS implements multiple layers of security. Path-based access restrictions
+allow administrators to limit client access to specific directory subtrees,
+enabling multi-tenant scenarios where different clients see only their
+allocated portions of the filesystem. CephX authentication integrates with
+Ceph's native authentication system, ensuring that only authorized clients
+can mount the filesystem.
+
+Standard POSIX permissions and ACLs provide fine-grained access control at
+the file and directory level, allowing familiar Unix-style permission
+management. Extended attributes enable additional metadata storage for
+applications requiring custom attributes or security labels.
+
+#### Use Cases and Applications
+
+CephFS excels in scenarios requiring shared filesystem access across multiple
+clients. Home directories, shared application data, and collaborative
+workspaces benefit from CephFS's strong consistency and POSIX compatibility.
+High-performance computing environments leverage CephFS for shared job data
+and scratch space, taking advantage of the parallel I/O capabilities and
+scalability.
+
+Content creation workflows in media and entertainment utilize CephFS for
+shared storage of large media files, benefiting from high bandwidth and the
+ability to scale capacity and performance independently. Big data analytics
+platforms use CephFS for storing datasets that multiple processing nodes must
+access simultaneously.
+
+#### Conclusion
+
+CephFS represents a mature, scalable distributed filesystem that brings POSIX
+compatibility to Ceph's distributed storage platform. By separating metadata
+and data management, supporting multiple active MDS daemons, and leveraging
+RADOS for reliable distributed storage, CephFS delivers enterprise-grade
+filesystem capabilities suitable for demanding production workloads. Its
+combination of familiar filesystem semantics, strong performance, and
+advanced features like snapshots and dynamic metadata scaling makes CephFS a
+compelling choice for organizations requiring shared filesystem storage at
+scale.
+
+## See Also
+The architecture of the Ceph cluster is explained in [the Architecture
+chapter of the upstream Ceph
+documentation](https://docs.ceph.com/en/latest/architecture/).
diff --git a/docs/architecture/cloud-storage/ceph/chorus/chorus.md b/docs/architecture/cloud-storage/ceph/chorus/chorus.md
new file mode 100644
index 0000000..017cb9f
--- /dev/null
+++ b/docs/architecture/cloud-storage/ceph/chorus/chorus.md
@@ -0,0 +1,22 @@
+---
+title: Chorus
+---
+
+# Chorus
+
+Chorus is data replication software designed for object storage systems,
+supporting the S3 and OpenStack Swift APIs. It enables zero-downtime
+migration between storage systems, maintains synchronized backups for
+disaster recovery, and verifies migration integrity through consistency
+checks.
+
+Chorus operates through two main components: Chorus Proxy, an S3 proxy that
+captures changes, and Chorus Worker, which processes replication tasks and
+webhook events. Users configure storage credentials, designating one endpoint
+as "main" while others become "followers." Requests route through Chorus's S3
+API to the main storage and asynchronously replicate to follower endpoints.
+
+The system supports user-level and bucket-level replication policies,
+allowing users to pause and resume replication via the web admin UI or the
+CLI. Chorus handles initial replication of existing data in the background
+and can accept change events via webhooks when proxy deployment is not
+feasible, supporting S3 bucket notifications and Swift access-log events.
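+
+From an application's point of view, the proxy is just another S3 endpoint:
+pointing an S3 client at Chorus instead of the backing store is all that is
+required for writes to reach the "main" storage and fan out to the followers.
+A hedged sketch using boto3; the proxy address and credentials are
+placeholders for values taken from your Chorus configuration.
+
+```python
+import boto3
+
+s3 = boto3.client(
+    's3',
+    endpoint_url='http://chorus-proxy.example.com:9669',  # placeholder
+    aws_access_key_id='ACCESS_KEY',
+    aws_secret_access_key='SECRET_KEY',
+)
+
+# Written once through the proxy to the main endpoint; Chorus replicates
+# the object to follower endpoints asynchronously.
+s3.put_object(Bucket='demo-bucket', Key='report.csv', Body=b'a,b,c\n')
+```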
diff --git a/docs/architecture/cloud-storage/ceph/prysm/prysm.md b/docs/architecture/cloud-storage/ceph/prysm/prysm.md
new file mode 100644
index 0000000..47b02b6
--- /dev/null
+++ b/docs/architecture/cloud-storage/ceph/prysm/prysm.md
@@ -0,0 +1,37 @@
+---
+title: Prysm
+---
+
+# Prysm
+
+Prysm is a comprehensive observability CLI tool developed by CobaltCore for
+monitoring [Ceph](../ceph.md) storage clusters and RADOS Gateway (RGW)
+deployments. Prysm provides a multi-layered architecture designed to deliver
+real-time monitoring, data collection, and analysis across Ceph environments.
+
+Prysm employs a four-tier architecture consisting of Consumers, NATS
+messaging, Remote Producers, and Nearby Producers. This design enables
+flexible data collection from diverse sources within Ceph infrastructure.
+Remote Producers gather metrics via APIs from outside the monitored
+environment, collecting data such as RGW bucket notifications, quota usage,
+and RadosGW usage statistics. Nearby Producers operate within the same
+network as Ceph clusters, providing direct access to logs, metrics, and
+hardware sensors for lower-latency and higher-fidelity monitoring of disk
+health, kernel metrics, and resource usage.
+
+NATS serves as the messaging backbone, routing data between producers and
+consumers with low latency and reliable delivery. Consumers process this data
+to generate alerts, perform analytics, display real-time dashboards, and
+ensure compliance through log analysis.
+
+Prysm supports multiple output formats including console, NATS, and
+Prometheus, making it adaptable to existing monitoring infrastructure. It can
+function standalone for specific tasks such as providing Prometheus metrics
+endpoints or checking disk health through SMART attributes.
+
+Prysm addresses the operational complexity of managing large-scale Ceph
+deployments by providing unified observability across storage clusters,
+gateway services, and underlying hardware components.
+
+## See Also
+[The Prysm Repository](https://github.com/cobaltcore-dev/prysm)
diff --git a/docs/architecture/cloud-storage/ceph/rook/rook.md b/docs/architecture/cloud-storage/ceph/rook/rook.md
new file mode 100644
index 0000000..b11bb26
--- /dev/null
+++ b/docs/architecture/cloud-storage/ceph/rook/rook.md
@@ -0,0 +1,37 @@
+---
+title: Rook
+---
+
+# Rook
+
+Rook is an open-source cloud-native storage orchestrator that automates the
+deployment, configuration, and management of [Ceph](../ceph.md) storage
+clusters within Kubernetes environments. Built as a Kubernetes operator, Rook
+extends Kubernetes with custom resource definitions (CRDs) that allow
+administrators to define and manage Ceph clusters using native Kubernetes
+APIs and tools.
+
+Rook eliminates much of the operational complexity traditionally associated
+with running Ceph by leveraging Kubernetes primitives for scheduling,
+self-healing, and scaling. When deployed, Rook runs as a set of pods within
+the Kubernetes cluster, managing the lifecycle of Ceph daemons (monitors,
+managers, OSDs, MDS, and RGW) as containerized workloads. It automatically
+handles tasks such as OSD provisioning from available storage devices and the
+management of the monitor quorum.
+
+The system provides declarative configuration through YAML manifests,
+enabling infrastructure-as-code practices for storage management.
+Administrators can define storage classes that map to Ceph pools, allowing
+applications to dynamically provision persistent volumes for block storage
+(RBD), shared file systems (CephFS), or object storage (RGW) through standard
+Kubernetes mechanisms.
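+
+For example, once a Rook-backed storage class exists, an application can
+request a volume through the ordinary PersistentVolumeClaim mechanism. The
+sketch below uses the official Kubernetes Python client; the storage class
+name `rook-ceph-block` follows the Rook examples and is an assumption about
+the local setup.
+
+```python
+from kubernetes import client, config
+
+config.load_kube_config()  # or load_incluster_config() inside a pod
+
+pvc = client.V1PersistentVolumeClaim(
+    metadata=client.V1ObjectMeta(name='demo-data'),
+    spec=client.V1PersistentVolumeClaimSpec(
+        access_modes=['ReadWriteOnce'],
+        storage_class_name='rook-ceph-block',  # assumed Rook storage class
+        resources=client.V1ResourceRequirements(
+            requests={'storage': '10Gi'},
+        ),
+    ),
+)
+
+# Rook's CSI driver provisions a backing RBD image and binds it to the claim.
+client.CoreV1Api().create_namespaced_persistent_volume_claim(
+    namespace='default', body=pvc)
+```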
+
+Rook continuously monitors cluster health and automatically responds to
+failures by restarting failed daemons, replacing unhealthy OSDs, and
+maintaining the desired state as defined in the cluster specifications. It
+integrates with [Kubernetes](../../../cluster/cluster.md) monitoring and
+logging systems, providing visibility into storage operations alongside
+application workloads.
+
+## See Also
+1. [The rook.io page](https://rook.io/)
+1. [The Rook Documentation](https://rook.io/docs/rook/latest-release/Getting-Started/intro/)
+1. [The Rook project repository](https://github.com/rook/rook)
diff --git a/docs/architecture/cluster/cluster.md b/docs/architecture/cluster/cluster.md
new file mode 100644
index 0000000..4f9c34b
--- /dev/null
+++ b/docs/architecture/cluster/cluster.md
@@ -0,0 +1,69 @@
+---
+title: Kubernetes Cluster
+---
+
+# Kubernetes Cluster
+
+The CobaltCore cluster is a Kubernetes-based environment designed to manage hypervisor nodes and their associated workloads.
+It provides a robust framework for deploying, scaling, and maintaining virtual machines across multiple hypervisor nodes.
+
+The cluster is provisioned using [IronCore](https://ironcore.dev/), which automates the discovery, provisioning, and evacuation of hypervisor nodes.
+
+Cluster components that are not required to run on every hypervisor node are deployed as Kubernetes Deployments.
+
+## Hypervisor Operator
+
+::: tip Source Code
+[github.com/cobaltcore-dev/openstack-hypervisor-operator](https://github.com/cobaltcore-dev/openstack-hypervisor-operator)
+:::
+
+The Hypervisor Operator is the Kubernetes operator that manages the lifecycle of hypervisor nodes.
+It ensures that a newly discovered node is properly configured and integrated into the cluster.
+After the initial onboarding, the operator runs a final check to ensure the node is ready for use.
+The operator also handles the evacuation of nodes in case of failures or maintenance.
+
+## HA Service
+
+::: tip Source Code
+[github.com/cobaltcore-dev/kvm-ha-service](https://github.com/cobaltcore-dev/kvm-ha-service)
+:::
+
+The **KVM High Availability Service** is a central component that monitors the health and status of hypervisor nodes and their virtual machines.
+It collects telemetry data from the KVM HA Agent, processes it, and provides insights into the state of the hypervisors and their workloads.
+It is responsible for ensuring that critical workloads remain operational even in the event of failures.
+
+```mermaid
+graph LR;
+    subgraph application [Application]
+        source(Source tasks);
+        monitoring(Monitoring tasks);
+        hypervisors(Hypervisor tasks);
+        config("Configuration (YAML)")
+    end
+
+    monitoring --> |evacuate| nova;
+
+    endpoints("http(s) endpoints") ---|pull metrics| source;
+    senders("http(s) senders") ---|push telemetry| source;
+
+    subgraph database [Database]
+        sqlite
+    end
+
+    source ---> |add telemetry| database;
+    monitoring <--> |check telemetry| database;
+
+    hypervisors ---> database;
+
+    hypervisors ---|refresh hypervisors| nova;
+
+    subgraph hypervisor [Hypervisors]
+        Hypervisor1(Hypervisor 1);
+        HypervisorN(Hypervisor n);
+    end
+
+    subgraph openstack [OpenStack]
+        nova --- Hypervisor1;
+        nova --- HypervisorN;
+    end
+```