feat(ntx-builder): deactivate accounts which crash repeatedly#1712
Conversation
| tracing::warn!( | ||
| %account_id, | ||
| crash_count = count, | ||
| "Account blacklisted due to repeated crashes, skipping actor spawn" |
There was a problem hiding this comment.
Q: should we make this more explicit? tracing::error!, since it signifies a bug in our impl or add "BUG" deliberately to the message?
There was a problem hiding this comment.
We should already be logging the crash itself, so this is probably okay at warn since it will repeat often for each account.
| return; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Q: should we first check for running actors and terminate them?
There was a problem hiding this comment.
I dont think so. By the time the crash threshold is reached, the actor has already shut itself down. DbError shutdowns are handled in next(), which removes the actor from the registry before any re-spawn attempt. So when spawn_actor is called for a deactivated account, there's no running actor to terminate.
| assert_eq!(inactive_targets[0], inactive_id); | ||
| } | ||
|
|
||
| // BLACKLIST TESTS |
There was a problem hiding this comment.
nit: we might want to find a different term to "blacklist", i.e. repeated_failure_block_list to avoid any confusion / fud
There was a problem hiding this comment.
I'd prefer something much shorter -- we're already overly verbose and stuttery throughout the codebase imo. Maybe deactivated_accounts
Mirko-von-Leipzig
left a comment
There was a problem hiding this comment.
Just naming nits :) I guess blacklist was the wrong term to use -- let's try deactivated instead 🙃
| tracing::warn!( | ||
| %account_id, | ||
| crash_count = count, | ||
| "Account blacklisted due to repeated crashes, skipping actor spawn" |
There was a problem hiding this comment.
We should already be logging the crash itself, so this is probably okay at warn since it will repeat often for each account.
| assert_eq!(inactive_targets[0], inactive_id); | ||
| } | ||
|
|
||
| // BLACKLIST TESTS |
There was a problem hiding this comment.
I'd prefer something much shorter -- we're already overly verbose and stuttery throughout the codebase imo. Maybe deactivated_accounts
ebf9d91 to
870292b
Compare
b993661 to
e681868
Compare
Co-authored-by: Mirko <48352201+Mirko-von-Leipzig@users.noreply.github.com>
c5f91d8 to
d0c58c2
Compare
Implements account blacklisting for the NTX Builder (4th task from #1694).
Track crash counts per account in the coordinator. When an actor shuts down due to a
DbError, its crash count is incremented. Once it reaches a configurable threshold, the account is blacklisted andspawn_actorskips it.Only DbError shutdowns count as crashes because other shutdown reasons (
Cancelled,IdleTimeout,SemaphoreFailed) are either intentional or system-wide and not indicative of a per-account bug.