# Migrating ClickHouse from Built-in to External Cluster

This guide walks through migrating data from the single-node ClickHouse instance shipped with this chart to an external ClickHouse cluster (e.g. Altinity).

---

## Prerequisites

- `kubectl` access to the namespace running Portkey
- New external ClickHouse cluster deployed (see [clickhouse-replication.md](clickhouse-replication.md))
- `clickhouse-client` installed locally
- Enough disk on a local machine to hold the exported data

## Tables to Migrate

| Table | Timestamp Column |
|-------|-----------------|
| `generations` | `created_at` |
| `feedbacks` | `created_at` |
| `generation_hooks` | `created_at` |
| `audit_logs` | `timestamp` |

---

## Step 1: Enable Migration Mode

Before switching to the external cluster, enable migration mode to keep the old in-cluster ClickHouse alive alongside the new external one. This ensures you can still access the old data for export.

**Without existing secrets** (credentials in values):

```yaml
clickhouse:
external:
enabled: true
host: "<new-clickhouse-host>"
port: "8123"
nativePort: "9000"
user: "default"
password: "<new-ch-password>"
database: "default"
replicationEnabled: true
shardingEnabled: false
clusterName: "portkey"
migration:
enabled: true
oldCredentials:
user: "default"
password: "<old-ch-password>"
database: "default"
```

**With existing secrets** (credentials managed externally):

```yaml
clickhouse:
external:
enabled: true
existingSecretName: "clickhouse-secret-new"
migration:
enabled: true
oldCredentials:
existingSecretName: "clickhouse-secret-old"
```

The old credentials secret must have `clickhouse_user`, `clickhouse_password`, and `clickhouse_db` keys. The new credentials secret must have all standard keys (`clickhouse_host`, `clickhouse_port`, `clickhouse_native_port`, `clickhouse_user`, `clickhouse_password`, `clickhouse_db`).

If old and new ClickHouse share the same credentials, omit `oldCredentials` entirely -- it falls back to the external credentials.
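
For reference, a minimal old-credentials secret with the required keys might look like the following (the name `clickhouse-secret-old` matches the example values above; the credentials are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-secret-old
  namespace: <namespace>
type: Opaque
stringData:
  clickhouse_user: "default"
  clickhouse_password: "<old-ch-password>"
  clickhouse_db: "default"
```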

Run the upgrade:

```bash
helm upgrade <release> portkey/portkey-app \
-f values.yaml \
-n <namespace>
```

This will:

1. Switch the application to the new external ClickHouse immediately.
2. Keep the old in-cluster ClickHouse StatefulSet and Service alive so you can export data.

### Configuration Reference

| Value | Default | Description |
|-------|---------|-------------|
| `clickhouse.migration.enabled` | `false` | Keep old in-cluster ClickHouse alive while using external |
| `clickhouse.migration.oldCredentials.existingSecretName` | `""` | Secret with old CH credentials (`clickhouse_user`, `clickhouse_password`, `clickhouse_db` keys) |
| `clickhouse.migration.oldCredentials.user` | `""` | Old CH username (plain value, used when no secret is set) |
| `clickhouse.migration.oldCredentials.password` | `""` | Old CH password (plain value, used when no secret is set) |
| `clickhouse.migration.oldCredentials.database` | `""` | Old CH database name (falls back to `external.database` if empty) |

## Step 2: Port-Forward Both Clusters

Old (built-in) instance:

```bash
kubectl port-forward svc/<release>-portkey-app-clickhouse 9000:9000 -n <namespace>
```

New (external) cluster:

```bash
kubectl port-forward svc/<new-clickhouse-svc> 9001:9000 -n <namespace>
```

You now have the old instance on `localhost:9000` and the new cluster on `localhost:9001`.

## Step 3: Export Data from the Old Instance

```bash
for table in generations feedbacks generation_hooks audit_logs; do
echo "Exporting ${table}..."
clickhouse-client --host localhost --port 9000 \
--user "<old-ch-user>" --password "<old-ch-password>" \
--query "SELECT * FROM default.${table} FORMAT Native" \
> "${table}.native"
echo "Done: $(ls -lh ${table}.native | awk '{print $5}')"
done
```

For very large tables, export in chunks by time range:

```bash
clickhouse-client --host localhost --port 9000 \
--user "<old-ch-user>" --password "<old-ch-password>" \
--query "SELECT * FROM default.generations WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01' FORMAT Native" \
> generations_2025_01.native
```
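The monthly boundaries for chunked exports can be generated with a small loop rather than typed by hand. The date span below is a placeholder; feed each printed pair into the `WHERE` clause of the chunked export query:

```bash
# Print half-open monthly ranges [start, end) for the chunked export's
# WHERE clause. Adjust cur/stop to cover your data's actual time span.
cur="2025-01-01"
stop="2025-04-01"
while [ "$(date -d "$cur" +%s)" -lt "$(date -d "$stop" +%s)" ]; do
  next=$(date -d "$cur +1 month" +%Y-%m-%d)
  echo "$cur $next"
  cur="$next"
done
```

Requires GNU `date` (standard on Linux; on macOS use `gdate` from coreutils).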

## Step 4: Import Data into the New Cluster

Wait for the backend to create tables on the new cluster (it does this automatically on startup), then import:

```bash
for table in generations feedbacks generation_hooks audit_logs; do
echo "Importing ${table}..."
clickhouse-client --host localhost --port 9001 \
--user "<new-ch-user>" --password "<new-ch-password>" \
--query "INSERT INTO default.${table} FORMAT Native" \
< "${table}.native"
echo "Done."
done
```

## Step 5: Verify Data Integrity

```bash
for table in generations feedbacks generation_hooks audit_logs; do
old=$(clickhouse-client --host localhost --port 9000 \
--user "<old-ch-user>" --password "<old-ch-password>" \
--query "SELECT count() FROM default.${table}")
new=$(clickhouse-client --host localhost --port 9001 \
--user "<new-ch-user>" --password "<new-ch-password>" \
--query "SELECT count() FROM default.${table}")
echo "${table}: old=${old} new=${new} $([ "$old" = "$new" ] && echo 'OK' || echo 'MISMATCH')"
done
```

## Step 6: Decommission the Old Instance

Once verified, disable migration mode to remove the old in-cluster ClickHouse:

```yaml
clickhouse:
external:
enabled: true
# ... same external config as above
migration:
enabled: false
```

```bash
helm upgrade <release> portkey/portkey-app \
-f values.yaml \
-n <namespace>
```

This deletes the old StatefulSet, Service, ConfigMap, and ServiceAccount. Clean up the PVC if persistence was enabled:

```bash
kubectl delete pvc -l app.kubernetes.io/component=<release>-portkey-app-clickhouse -n <namespace>
```

Remove the exported `.native` files from your local machine.

## Rollback

To revert to the built-in ClickHouse:

```yaml
clickhouse:
external:
enabled: false
migration:
enabled: false
```

```bash
helm upgrade <release> portkey/portkey-app \
-f values.yaml \
-n <namespace>
```

If the PVC was not deleted, data will still be intact on the in-cluster instance. If it was deleted, the backend will create empty tables on a fresh instance -- re-import from your exported files.

---

# ClickHouse Replication

The built-in ClickHouse StatefulSet shipped with this chart runs a single-node instance and does not support replication. For production workloads that require high availability and replicated tables, deploy ClickHouse separately using a dedicated chart and point Portkey at it as an external data store.

## Prerequisites

- A running Kubernetes cluster
- `helm` v3 installed
- `kubectl` configured for your cluster

## Deploy Replicated ClickHouse with Altinity Helm Chart

The [Altinity Helm Charts](https://github.com/Altinity/helm-charts) project provides a production-grade ClickHouse chart backed by the Altinity Operator. It supports multi-replica, multi-shard clusters with ClickHouse Keeper for coordination.

### Step 1: Add the Altinity Helm Repository

```bash
helm repo add altinity https://altinity.github.io/helm-charts
helm repo update
```

### Step 2: Create a Values File

Create a file called `clickhouse-replicated-values.yaml`:

```yaml
clickhouse:
replicasCount: 2
shardsCount: 1

defaultUser:
password: "<your-clickhouse-password>"
allowExternalAccess: true

clusterSecret:
enabled: true
auto: true

persistence:
enabled: true
size: 50Gi
storageClass: "" # set to your preferred StorageClass (e.g. gp3, standard)

service:
type: ClusterIP

settings:
max_table_size_to_drop: "0"

keeper:
enabled: true
replicaCount: 3
localStorage:
size: 5Gi

operator:
enabled: true # set to false if the Altinity Operator is already installed
```

Adjust `replicasCount`, `shardsCount`, keeper `replicaCount`, and storage sizes to match your requirements.

### Step 3: Install the Chart

```bash
helm install portkey altinity/clickhouse \
-f clickhouse-replicated-values.yaml \
-n portkey \
--create-namespace
```

### Step 4: Get the ClickHouse Service Endpoint

After the pods are running, get the service name:

```bash
kubectl get svc -n portkey -l app.kubernetes.io/name=clickhouse
```

The service name will typically follow the pattern `clickhouse-<release>`. Use this as the host when configuring Portkey.

## Configure Portkey to Use External Replicated ClickHouse

In your Portkey Helm values file, disable the built-in ClickHouse and point to the external cluster:

```yaml
clickhouse:
external:
enabled: true
host: "<clickhouse-service-name>.<namespace>.svc.cluster.local"
port: "8123"
nativePort: "9000"
user: "default"
password: "<your-clickhouse-password>"
database: "default"
tls: false
replicationEnabled: true
shardingEnabled: false
clusterName: "portkey"
```

When `replicationEnabled: true`, the backend runs migrations using `ReplicatedMergeTree` instead of `MergeTree` and creates the database with `ENGINE = Replicated(...)`.

When `shardingEnabled: true`, the backend additionally creates `_local` tables and `Distributed` tables on top, and all DDL is executed with `ON CLUSTER`.

`clusterName` must match the cluster name in your ClickHouse deployment. When using the Altinity Helm chart, the cluster name is the same as the Helm release name (truncated to 15 characters). For example, `helm install portkey altinity/clickhouse ...` creates a cluster named `portkey`.

> **Note:** When using the Altinity Helm chart, the cluster name is derived from the Helm release name. Avoid hyphens in the release name, since ClickHouse treats them as the minus operator in SQL; and because Helm release names cannot contain underscores, pick a single alphanumeric word (e.g. `helm install portkeych` rather than `helm install portkey-ch`).
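Because the chart truncates the release name to 15 characters, longer names get cut. A sketch of that mapping, assuming a plain prefix truncation (the release name here is hypothetical):

```bash
# How a 17-character release name maps to the 15-character cluster name,
# assuming the truncation is a simple prefix cut.
release="portkeyclickhouse"
cluster_name="${release:0:15}"
echo "$cluster_name"   # portkeyclickhou
```

If your release name is 15 characters or fewer (like `portkey`), the cluster name is identical to it.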

Then deploy or upgrade Portkey:

```bash
helm upgrade --install portkey portkey/portkey-app \
-f portkey-values.yaml \
-n portkey
```

### Using an Existing Secret

If you manage secrets externally, create a Kubernetes secret with these keys:

```yaml
apiVersion: v1
kind: Secret
metadata:
name: portkey-clickhouse-external
namespace: portkey
type: Opaque
stringData:
store: "clickhouse"
clickhouse_user: "default"
clickhouse_password: "<password>"
clickhouse_host: "<host>"
clickhouse_port: "8123"
clickhouse_native_port: "9000"
clickhouse_db: "default"
clickhouse_tls: "false"
```

Then reference it in your values. Note that `replicationEnabled`, `shardingEnabled`, and `clusterName` must be set in values even when using an existing secret:

```yaml
clickhouse:
external:
enabled: true
existingSecretName: "portkey-clickhouse-external"
replicationEnabled: true
shardingEnabled: false
clusterName: "portkey"
```

## Environment Variables

The following environment variables are set on Portkey services from these values:

| Helm Value | Backend / Data Service Env | Gateway Env |
|------------|---------------------------|-------------|
| `replicationEnabled` | `CLICKHOUSE_REPLICATION_ENABLED` | `ANALYTICS_STORE_REPLICATION_ENABLED` |
| `shardingEnabled` | `CLICKHOUSE_SHARDING_ENABLED` | `ANALYTICS_STORE_SHARDING_ENABLED` |
| `clusterName` | `CLICKHOUSE_CLUSTER_NAME` | `ANALYTICS_STORE_CLUSTER_NAME` |

## Migration Behavior

| Mode | `replicationEnabled` | `shardingEnabled` | What Happens |
|------|---------------------|-------------------|--------------|
| Single-node | `false` | `false` | `MergeTree` tables, no cluster DDL |
| Replicated | `true` | `false` | `ReplicatedMergeTree` tables, `ON CLUSTER` DDL, Replicated database engine |
| Sharded | `false` | `true` | `MergeTree` local tables + `Distributed` tables, `ON CLUSTER` DDL |
| Replicated + Sharded | `true` | `true` | `ReplicatedMergeTree` local tables + `Distributed` tables, `ON CLUSTER` DDL, Replicated database engine |
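
To make the replicated mode concrete, the DDL the backend runs has roughly the following shape. This is illustrative only: the table and columns are hypothetical, not Portkey's actual schema, and the exact statements are generated by the backend's migrations:

```sql
-- Illustrative replicated-mode DDL shape (hypothetical table, not the real schema).
CREATE TABLE default.example ON CLUSTER portkey
(
    id String,
    created_at DateTime
)
ENGINE = ReplicatedMergeTree  -- replica and ZooKeeper paths come from default macros
ORDER BY created_at;
```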

## Scaling Considerations

| Parameter | Guidance |
|-----------|----------|
| `replicasCount` | 2+ for HA. Each replica holds a full copy of every shard's data. |
| `shardsCount` | Increase to distribute data horizontally when a single replica can't hold the full dataset. |
| `keeper.replicaCount` | Must be an odd number (3 or 5). Do **not** change after initial deployment. |
| `persistence.size` | Size per replica. Plan for data growth + merge overhead (~2x working set). |
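
As a worked example of the sizing guidance above, using the example deployment from Step 2 (1 shard, 2 replicas, 50Gi volumes):

```bash
# Capacity math for the example deployment: every replica holds a full copy
# of each shard it hosts, so raw cluster storage multiplies out.
shards=1
replicas=2
per_replica_gi=50
raw_gi=$(( shards * replicas * per_replica_gi ))
# With ~2x merge overhead, budget the working set at about half the volume.
working_set_budget_gi=$(( per_replica_gi / 2 ))
echo "raw cluster storage: ${raw_gi}Gi; per-replica working-set budget: ~${working_set_budget_gi}Gi"
```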