[scd] Time-bound subscription locks and add lock diagnostics#1384
[scd] Time-bound subscription locks and add lock diagnostics#1384barroco merged 3 commits intointeruss:masterfrom
Conversation
barroco
left a comment
There was a problem hiding this comment.
Thank you very much @ashishjsharda for this contribution.
Please see comments inline.
|
Thanks for the clarification @ashishjsharda, while looking at it again, I am not sure about the usage of RowsAffected. Can you please confirm the behaviour is correct ? |
Thanks for the follow-up. You’re right to question it. In the latest update (
So the behavior of this PR no longer depends on |
|
@ashishjsharda many thanks for the changes. I will review them tomorrow first thing. Please note that we are still missing your signature of the Contributor License Agreement (CLA) (See this automatic check) before being able to merge this PR. |
There was a problem hiding this comment.
Thank you @ashishjsharda for the changes.
Knowing that:
- in terms of performance, the change is not making the situation worse and best case improves it as far as we have tested
- and that, the
notifyVolumediffering fromvalidParamsis out of scope of this PR
I am good to merge it. We will release it as patch update upon request so it can be tested in a production environment allowing users to easily rollback if there is any critical issue. Ideally, we will wait for this confirmation before releasing a new minor version including it.
Waiting for all automated checks to pass including the CLA to be signed to merge.
Thanks for the review and approval, @barroco. |
Problem
Issue #1311 reports high latency and timeouts for SCD operations under overlapping traffic patterns. Lock acquisition in
LockSubscriptionsOnCellsbecomes expensive when many subscriptions share S2 cells, regardless of temporal relevance.Solution
Narrow lock scope to subscriptions that overlap both the relevant S2 cells and the operation time window, while preserving correctness by always locking explicitly referenced subscription IDs. Add low-noise diagnostics to improve production observability of costly lock queries.
Changes
1. Time-bounded lock scope
Updated
LockSubscriptionsOnCellsto acceptstartTimeandendTime, locking rows matching:cells && $1COALESCE(starts_at <= endTime, true)COALESCE(ends_at >= startTime, true)OR id = ANY(explicitSubscriptionIDs)2. Call-site updates
Updated OIR mutation paths to pass the relevant time window:
old.StartTime/old.EndTimevalidParams.uExtent.StartTime/validParams.uExtent.EndTime3. Lock diagnostics (thresholded, low-noise)
Warnings emitted when:
duration >= 500ms, ormatched_rows >= 1000Fields logged:
global_lockdurationmatched_rowscell_countexplicit_subscription_id_countFailure warnings:
SCD global lock query failedSCD subscription lock query failed4. Documentation
Extended
docs/operations/performances.mdwith diagnostics behavior, fields, and thresholds.Validation
go test ./pkg/scd/store/datastore— passedgo test ./pkg/scd/...— passed locallyRisk / Compatibility
LockSubscriptionsOnCellsinterface signature updated; all in-repo call sites patched.Follow-up
Closes #1311