# Feat/clickhouse replicas #167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
## Commits (9)
- `66471ad` feat: clickhouse replicas support
- `0fbe6f6` fix: update ClickHouse cluster name in values and documentation (sk-portkey)
- `910f823` feat: add ClickHouse migration support with configuration options
- `7158e91` feat: enhance ClickHouse migration with new configuration options and…
- `4007e2d` feat: add database configuration for ClickHouse migration and enhance…
- `3fb3cf7` refactor: update ClickHouse migration configuration and documentation…
- `fae8438` refactor: update ClickHouse migration configuration and documentation…
- `60686eb` docs: update ClickHouse replication configuration in documentation to…
- `3fe05ee` fix: update condition for retrieving old ClickHouse secrets during mi…
# Migrating ClickHouse from Built-in to External Cluster

This guide walks through migrating data from the single-node ClickHouse instance shipped with this chart to an external ClickHouse cluster (e.g. Altinity).

---
## Prerequisites

- `kubectl` access to the namespace running Portkey
- New external ClickHouse cluster deployed (see [clickhouse-replication.md](clickhouse-replication.md))
- `clickhouse-client` installed locally
- Enough disk on a local machine to hold the exported data
## Tables to Migrate

| Table | Timestamp Column |
|-------|------------------|
| `generations` | `created_at` |
| `feedbacks` | `created_at` |
| `generation_hooks` | `created_at` |
| `audit_logs` | `timestamp` |

---
## Step 1: Enable Migration Mode

Before switching to the external cluster, enable migration mode to keep the old in-cluster ClickHouse alive alongside the new external one. This ensures you can still access the old data for export.

**Without existing secrets** (credentials in values):

```yaml
clickhouse:
  external:
    enabled: true
    host: "<new-clickhouse-host>"
    port: "8123"
    nativePort: "9000"
    user: "default"
    password: "<new-ch-password>"
    database: "default"
    replicationEnabled: true
    shardingEnabled: false
    clusterName: "portkey"
  migration:
    enabled: true
    oldCredentials:
      user: "default"
      password: "<old-ch-password>"
      database: "default"
```
**With existing secrets** (credentials managed externally):

```yaml
clickhouse:
  external:
    enabled: true
    existingSecretName: "clickhouse-secret-new"
  migration:
    enabled: true
    oldCredentials:
      existingSecretName: "clickhouse-secret-old"
```

The old credentials secret must have `clickhouse_user`, `clickhouse_password`, and `clickhouse_db` keys. The new credentials secret must have all standard keys (`clickhouse_host`, `clickhouse_port`, `clickhouse_native_port`, `clickhouse_user`, `clickhouse_password`, `clickhouse_db`).

If old and new ClickHouse share the same credentials, omit `oldCredentials` entirely; it falls back to the external credentials.

Run the upgrade:

```bash
helm upgrade <release> portkey/portkey-app \
  -f values.yaml \
  -n <namespace>
```
This will:

1. Switch the application to the new external ClickHouse immediately.
2. Keep the old in-cluster ClickHouse StatefulSet and Service alive so you can export data.

### Configuration Reference

| Value | Default | Description |
|-------|---------|-------------|
| `clickhouse.migration.enabled` | `false` | Keep old in-cluster ClickHouse alive while using external |
| `clickhouse.migration.oldCredentials.existingSecretName` | `""` | Secret with old CH credentials (`clickhouse_user`, `clickhouse_password`, `clickhouse_db` keys) |
| `clickhouse.migration.oldCredentials.user` | `""` | Old CH username (plain value, used when no secret is set) |
| `clickhouse.migration.oldCredentials.password` | `""` | Old CH password (plain value, used when no secret is set) |
| `clickhouse.migration.oldCredentials.database` | `""` | Old CH database name (falls back to `external.database` if empty) |
## Step 2: Port-Forward Both Clusters

Old (built-in) instance:

```bash
kubectl port-forward svc/<release>-portkey-app-clickhouse 9000:9000 -n <namespace>
```

New (external) cluster:

```bash
kubectl port-forward svc/<new-clickhouse-svc> 9001:9000 -n <namespace>
```

You now have the old instance on `localhost:9000` and the new cluster on `localhost:9001`.
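Before exporting, it is worth confirming both forwarded endpoints actually answer. A minimal sketch, assuming `clickhouse-client` is on your `PATH` and using the same credential placeholders as the steps below:

```shell
# Run a trivial query against each forwarded port; a "1" back means the
# tunnel and the credentials both work.
ping_query="SELECT 1"

check_endpoint() {  # $1 = port, $2 = user, $3 = password
  clickhouse-client --host localhost --port "$1" \
    --user "$2" --password "$3" --query "$ping_query"
}

if command -v clickhouse-client >/dev/null 2>&1; then
  check_endpoint 9000 "<old-ch-user>" "<old-ch-password>"   # old instance
  check_endpoint 9001 "<new-ch-user>" "<new-ch-password>"   # new cluster
fi
```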
## Step 3: Export Data from the Old Instance

```bash
for table in generations feedbacks generation_hooks audit_logs; do
  echo "Exporting ${table}..."
  clickhouse-client --host localhost --port 9000 \
    --user "<old-ch-user>" --password "<old-ch-password>" \
    --query "SELECT * FROM default.${table} FORMAT Native" \
    > "${table}.native"
  echo "Done: $(ls -lh ${table}.native | awk '{print $5}')"
done
```

For very large tables, export in chunks by time range:

```bash
clickhouse-client --host localhost --port 9000 \
  --user "<old-ch-user>" --password "<old-ch-password>" \
  --query "SELECT * FROM default.generations WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01' FORMAT Native" \
  > generations_2025_01.native
```
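To automate that chunking, the month boundaries can be generated in a loop. A sketch assuming GNU `date` and the same credential placeholders as above; the date range is an example:

```shell
# Export default.generations one calendar month at a time (GNU date assumed).
export_month() {  # $1 = inclusive start date, $2 = exclusive end date
  out="generations_$(date -d "$1" +%Y_%m).native"
  clickhouse-client --host localhost --port 9000 \
    --user "<old-ch-user>" --password "<old-ch-password>" \
    --query "SELECT * FROM default.generations WHERE created_at >= '$1' AND created_at < '$2' FORMAT Native" \
    > "$out"
  echo "wrote $out"
}

cur="2025-01-01"; end="2025-04-01"
while [ "$(date -d "$cur" +%s)" -lt "$(date -d "$end" +%s)" ]; do
  next=$(date -d "$cur +1 month" +%F)
  if command -v clickhouse-client >/dev/null 2>&1; then
    export_month "$cur" "$next"
  fi
  cur="$next"
done
```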
## Step 4: Import Data into the New Cluster

Wait for the backend to create tables on the new cluster (it does this automatically on startup), then import:

```bash
for table in generations feedbacks generation_hooks audit_logs; do
  echo "Importing ${table}..."
  clickhouse-client --host localhost --port 9001 \
    --user "<new-ch-user>" --password "<new-ch-password>" \
    --query "INSERT INTO default.${table} FORMAT Native" \
    < "${table}.native"
  echo "Done."
done
```
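The "wait for the backend to create tables" step above can be scripted rather than eyeballed. A sketch that polls `SHOW TABLES` on the new cluster (credential placeholders as above):

```shell
# Poll the new cluster until all four tables exist; only then import.
required="generations feedbacks generation_hooks audit_logs"

tables_ready() {  # $1 = newline-separated output of SHOW TABLES
  for t in $required; do
    echo "$1" | grep -qx "$t" || return 1
  done
}

if command -v clickhouse-client >/dev/null 2>&1; then
  until tables_ready "$(clickhouse-client --host localhost --port 9001 \
        --user "<new-ch-user>" --password "<new-ch-password>" \
        --query "SHOW TABLES FROM default")"; do
    echo "tables not ready yet, retrying in 10s..."
    sleep 10
  done
fi
```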
## Step 5: Verify Data Integrity

```bash
for table in generations feedbacks generation_hooks audit_logs; do
  old=$(clickhouse-client --host localhost --port 9000 \
    --user "<old-ch-user>" --password "<old-ch-password>" \
    --query "SELECT count() FROM default.${table}")
  new=$(clickhouse-client --host localhost --port 9001 \
    --user "<new-ch-user>" --password "<new-ch-password>" \
    --query "SELECT count() FROM default.${table}")
  echo "${table}: old=${old} new=${new} $([ "$old" = "$new" ] && echo 'OK' || echo 'MISMATCH')"
done
```
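Row counts catch missing data but not silent corruption. As an optional deeper check (a sketch, not something the chart provides), a cheap per-table checksum over the known timestamp column can be compared the same way:

```shell
# Build a count-plus-checksum query for one table; extend the hashed column
# list for your schema if you want stronger guarantees.
checksum_query() {  # $1 = table, $2 = timestamp column
  echo "SELECT count(), sum(cityHash64($2)) FROM default.$1"
}

if command -v clickhouse-client >/dev/null 2>&1; then
  # Use the old credentials for port 9000 and the new ones for 9001.
  for port in 9000 9001; do
    clickhouse-client --host localhost --port "$port" \
      --user "<ch-user>" --password "<ch-password>" \
      --query "$(checksum_query generations created_at)"
  done
fi
```

Remember that `audit_logs` uses `timestamp` rather than `created_at` as its timestamp column.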
## Step 6: Decommission the Old Instance

Once verified, disable migration mode to remove the old in-cluster ClickHouse:

```yaml
clickhouse:
  external:
    enabled: true
    # ... same external config as above
  migration:
    enabled: false
```

```bash
helm upgrade <release> portkey/portkey-app \
  -f values.yaml \
  -n <namespace>
```

This deletes the old StatefulSet, Service, ConfigMap, and ServiceAccount. Clean up the PVC if persistence was enabled:

```bash
kubectl delete pvc -l app.kubernetes.io/component=<release>-portkey-app-clickhouse -n <namespace>
```

Remove the exported `.native` files from your local machine.
## Rollback

To revert to the built-in ClickHouse:

```yaml
clickhouse:
  external:
    enabled: false
  migration:
    enabled: false
```

```bash
helm upgrade <release> portkey/portkey-app \
  -f values.yaml \
  -n <namespace>
```

If the PVC was not deleted, data will still be intact on the in-cluster instance. If it was deleted, the backend will create empty tables on a fresh instance; re-import from your exported files.
---
# ClickHouse Replication

The built-in ClickHouse StatefulSet shipped with this chart runs a single-node instance and does not support replication. For production workloads that require high availability and replicated tables, deploy ClickHouse separately using a dedicated chart and point Portkey at it as an external data store.
## Prerequisites

- A running Kubernetes cluster
- `helm` v3 installed
- `kubectl` configured for your cluster
## Deploy Replicated ClickHouse with Altinity Helm Chart

The [Altinity Helm Charts](https://github.com/Altinity/helm-charts) project provides a production-grade ClickHouse chart backed by the Altinity Operator. It supports multi-replica, multi-shard clusters with ClickHouse Keeper for coordination.
### Step 1: Add the Altinity Helm Repository

```bash
helm repo add altinity https://altinity.github.io/helm-charts
helm repo update
```
### Step 2: Create a Values File

Create a file called `clickhouse-replicated-values.yaml`:

```yaml
clickhouse:
  replicasCount: 2
  shardsCount: 1

  defaultUser:
    password: "<your-clickhouse-password>"
    allowExternalAccess: true

  clusterSecret:
    enabled: true
    auto: true

  persistence:
    enabled: true
    size: 50Gi
    storageClass: ""  # set to your preferred StorageClass (e.g. gp3, standard)

  service:
    type: ClusterIP

  settings:
    max_table_size_to_drop: "0"

keeper:
  enabled: true
  replicaCount: 3
  localStorage:
    size: 5Gi

operator:
  enabled: true  # set to false if the Altinity Operator is already installed
```

Adjust `replicasCount`, `shardsCount`, keeper `replicaCount`, and storage sizes to match your requirements.
### Step 3: Install the Chart

```bash
helm install portkey altinity/clickhouse \
  -f clickhouse-replicated-values.yaml \
  -n portkey \
  --create-namespace
```
### Step 4: Get the ClickHouse Service Endpoint

After the pods are running, get the service name:

```bash
kubectl get svc -n portkey -l app.kubernetes.io/name=clickhouse
```

The service name will typically follow the pattern `clickhouse-<release>`. Use this as the host when configuring Portkey.
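The host value Portkey needs is the service's in-cluster FQDN. A sketch of assembling it; the service and namespace names are assumptions matching the install above:

```shell
# Build the cluster-internal hostname from service name and namespace.
svc="clickhouse-portkey"   # pattern: clickhouse-<release>, per the note above
ns="portkey"
host="${svc}.${ns}.svc.cluster.local"
echo "$host"
```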
## Configure Portkey to Use External Replicated ClickHouse

In your Portkey Helm values file, disable the built-in ClickHouse and point to the external cluster:

```yaml
clickhouse:
  external:
    enabled: true
    host: "<clickhouse-service-name>.<namespace>.svc.cluster.local"
    port: "8123"
    nativePort: "9000"
    user: "default"
    password: "<your-clickhouse-password>"
    database: "default"
    tls: false
    replicationEnabled: true
    shardingEnabled: false
    clusterName: "portkey"
```

When `replicationEnabled: true`, the backend runs migrations using `ReplicatedMergeTree` instead of `MergeTree` and creates the database with `ENGINE = Replicated(...)`.

When `shardingEnabled: true`, the backend additionally creates `_local` tables and `Distributed` tables on top, and all DDL is executed with `ON CLUSTER`.

`clusterName` must match the cluster name in your ClickHouse deployment. When using the Altinity Helm chart, the cluster name is the same as the Helm release name (truncated to 15 characters). For example, `helm install portkey altinity/clickhouse ...` creates a cluster named `portkey`.

> **Note:** When using the Altinity Helm chart, the cluster name is derived from the Helm release name. Avoid hyphens in the release name as ClickHouse treats them as the minus operator in SQL. Use only alphanumeric characters and underscores (e.g. `helm install portkey_ch` not `helm install portkey-ch`).
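Since the cluster name is the release name truncated to 15 characters, it is easy to predict locally and then confirm against the server. A sketch (the long release name here is made up):

```shell
# Mirror the 15-character truncation described above.
release="my_long_release_name"
cluster="${release:0:15}"
echo "expected cluster name: ${cluster}"

# Confirm what the server actually reports (requires connectivity):
if command -v clickhouse-client >/dev/null 2>&1; then
  clickhouse-client --host "<clickhouse-host>" \
    --user default --password "<your-clickhouse-password>" \
    --query "SELECT DISTINCT cluster FROM system.clusters"
fi
```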
Then deploy or upgrade Portkey:

```bash
helm upgrade --install portkey portkey/portkey-app \
  -f portkey-values.yaml \
  -n portkey
```
### Using an Existing Secret

If you manage secrets externally, create a Kubernetes secret with these keys:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: portkey-clickhouse-external
  namespace: portkey
type: Opaque
stringData:
  store: "clickhouse"
  clickhouse_user: "default"
  clickhouse_password: "<password>"
  clickhouse_host: "<host>"
  clickhouse_port: "8123"
  clickhouse_native_port: "9000"
  clickhouse_db: "default"
  clickhouse_tls: "false"
```

Then reference it in your values. Note that `replicationEnabled`, `shardingEnabled`, and `clusterName` must be set in values even when using an existing secret:

```yaml
clickhouse:
  external:
    enabled: true
    existingSecretName: "portkey-clickhouse-external"
    replicationEnabled: true
    shardingEnabled: false
    clusterName: "portkey"
```
## Environment Variables

The following environment variables are set on Portkey services from these values:

| Helm Value | Backend / Data Service Env | Gateway Env |
|------------|---------------------------|-------------|
| `replicationEnabled` | `CLICKHOUSE_REPLICATION_ENABLED` | `ANALYTICS_STORE_REPLICATION_ENABLED` |
| `shardingEnabled` | `CLICKHOUSE_SHARDING_ENABLED` | `ANALYTICS_STORE_SHARDING_ENABLED` |
| `clusterName` | `CLICKHOUSE_CLUSTER_NAME` | `ANALYTICS_STORE_CLUSTER_NAME` |
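To confirm these variables actually landed on a running pod, something like the following can be used; the namespace and the "pick any pod" selection are assumptions, so adjust for your deployment:

```shell
# Grep a pod's environment for the three ClickHouse-related backend vars.
pattern='CLICKHOUSE_(REPLICATION_ENABLED|SHARDING_ENABLED|CLUSTER_NAME)'

if command -v kubectl >/dev/null 2>&1; then
  pod=$(kubectl get pods -n portkey -o name | head -n 1)   # pick any app pod
  kubectl exec -n portkey "$pod" -- printenv | grep -E "$pattern"
fi
```

For the gateway, grep for the `ANALYTICS_STORE_` names from the table instead.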
## Migration Behavior

| Mode | `replicationEnabled` | `shardingEnabled` | What Happens |
|------|---------------------|-------------------|--------------|
| Single-node | `false` | `false` | `MergeTree` tables, no cluster DDL |
| Replicated | `true` | `false` | `ReplicatedMergeTree` tables, `ON CLUSTER` DDL, Replicated database engine |
| Sharded | `false` | `true` | `MergeTree` local tables + `Distributed` tables, `ON CLUSTER` DDL |
| Replicated + Sharded | `true` | `true` | `ReplicatedMergeTree` local tables + `Distributed` tables, `ON CLUSTER` DDL, Replicated database engine |
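As an illustration of the replicated row above, the DDL the backend emits has roughly this shape; the table and column here are only examples, and the real statements come from the backend's migrations:

```shell
# Print an example of replicated-mode DDL; illustrative only. With a
# Replicated database engine, ReplicatedMergeTree needs no explicit
# ZooKeeper path arguments.
ddl=$(cat <<'SQL'
CREATE TABLE default.generations ON CLUSTER portkey
(
    created_at DateTime
)
ENGINE = ReplicatedMergeTree
ORDER BY created_at
SQL
)
echo "$ddl"
```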
## Scaling Considerations

| Parameter | Guidance |
|-----------|----------|
| `replicasCount` | 2+ for HA. Each replica holds a full copy of every shard's data. |
| `shardsCount` | Increase to distribute data horizontally when a single replica can't hold the full dataset. |
| `keeper.replicaCount` | Must be an odd number (3 or 5). Do **not** change after initial deployment. |
| `persistence.size` | Size per replica. Plan for data growth + merge overhead (~2x working set). |