feat: Add grpc_health_probe for ECS health checks #13

Open
laurencehook-lr wants to merge 2 commits into lil5:main from laurencehook-lr:add-grpc-health-probe

laurencehook-lr commented Jul 7, 2025

Summary

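Adds the grpc_health_probe binary to the image so that Kubernetes and AWS ECS can health-check the gRPC service by running an exec command inside the container. The probe calls the standard gRPC Health Checking Protocol (grpc.health.v1.Health), so the server must expose that service for the check to report SERVING.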

Usage

Kubernetes

livenessProbe:
  exec:
    command:
    - /bin/grpc_health_probe
    - -addr=localhost:50051
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  exec:
    command:
    - /bin/grpc_health_probe
    - -addr=localhost:50051
  initialDelaySeconds: 10
  periodSeconds: 5

AWS ECS

"healthCheck": {
  "command": [
    "CMD-SHELL",
    "/bin/grpc_health_probe -addr=localhost:50051 || exit 1"
  ],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 60
}

Test

docker exec <container_id> /bin/grpc_health_probe -addr=localhost:50051
# Expected: status: SERVING

🤖 Generated with Claude Code

The grpc_health_probe binary was being installed in the builder stage but not copied to the final Alpine image, making it unavailable at runtime. This fix ensures the health probe is available for ECS health checks.
tks-socius added a commit to Socius-Technologies-SG-Limited/tigerbeetle_api that referenced this pull request Apr 15, 2026
Scope:
* Import migration: github.com/tigerbeetle/tigerbeetle-go/pkg/types was
  flattened into the root tigerbeetle_go package in 0.17. All `types.X`
  references across grpc/, benchmark/ and the e2e suite now point at
  tigerbeetle_go.X.
* Type role-swap: the 0.17 SDK reused the old enum names for new per-
  event struct types.
    - old enum CreateAccountResult  -> CreateAccountStatus (enum)
    - old struct AccountEventResult -> CreateAccountResult (struct)
  Same shift for Transfer. Client method signatures now return
  []CreateAccountResult / []CreateTransferResult, each holding Status +
  Timestamp + Reserved.
* Dense response handling: CreateAccounts and CreateTransfers now
  return one entry per input event (including successes), not just
  failures. `.Index` is gone; the converter positionally maps
  results[i] to inputs[i].
* grpc/convert.go: new AccountResultsToReply/ResultsToReply for dense
  arrays; new mapAccountStatus/mapTransferStatus helpers that normalise
  the SDK's 0xFFFFFFFF success sentinel to proto's idiomatic 0. Updated
  Uint128.BigInt() usage — it now returns *big.Int, not big.Int, so the
  lo.ToPtr wrappers are gone. BigIntToUint128 now takes *big.Int.
* grpc/routes.go flushFunc rewrite:
    - per-payload demux: each caller gets exactly its own slice of the
      dense reply array. Previously every caller got the concatenated
      slice of all other callers' failures, which worked by accident
      under the sparse model but breaks under dense.
    - accumulate replies across multiple transferBatches (pre-existing
      bug: the old code overwrote `replies` inside the loop, so only
      the last batch's replies survived a multi-batch flush).
    - track `flushErr` at closure scope instead of leaking the outer
      NewApp `err` through variable shadowing (pre-existing bug).
    - reset the running total when starting a new batch so the split
      point is actually TB_MAX_BATCH_SIZE and not drift-accumulated.
* Metric counting: TotalCreateTransferTxErr and TotalCreateAccountsTxErr
  now increment only for non-Created statuses, not for every result
  entry. Added TotalCreateAccountsTx as the account-side parallel to
  TotalCreateTransferTx.
* Test updates:
    - Mock interfaces: AccountEventResult -> CreateAccountResult,
      TransferEventResult -> CreateTransferResult. Mock matchers
      ("types.QueryFilter" etc) updated to tigerbeetle_go.QueryFilter
      since that's the runtime-qualified type name.
    - Benchmark and e2e assertions that read `len(results) == 0` as
      "all succeeded" flipped to checking len == N with every
      status == Created/0. The sparse-empty convention was a 0.16
      artefact that no longer holds.

Not yet covered (follow-up commits):
  * TooMuchDataError on AccountFilter/QueryFilter.Limit (task lil5#8).
  * ClientEvicted / ClientRelease* translation (task lil5#9).
  * Bruno/Kreya collection refresh (task lil5#12).
  * IPLM-1587 coordination (task lil5#13).
  * Final e2e + benchmark validation under docker compose (task #14).