feat: add alerting for unreachable errors#7046
feat: add alerting for unreachable errors#7046simone-stacks wants to merge 11 commits intostacks-network:developfrom
Conversation
Coverage Report for CI Build 24078323220Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage decreased (-8.3%) to 77.398%Details
Uncovered Changes
Coverage Regressions19497 previously-covered lines in 183 files lost coverage.
Coverage Stats
💛 - Coveralls |
c33a7c7 to
11f9e09
Compare
clarity/src/vm/analysis/errors.rs
Outdated
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| #[test] | ||
| fn runtime_check_unreachable_returns_true() { | ||
| let err = RuntimeCheckErrorKind::Unreachable("test bug".into()); | ||
| assert!(err.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn runtime_check_cost_overflow_not_unreachable() { | ||
| assert!(!RuntimeCheckErrorKind::CostOverflow.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn runtime_check_type_error_not_unreachable() { | ||
| let err = RuntimeCheckErrorKind::TypeError( | ||
| Box::new(TypeSignature::IntType), | ||
| Box::new(TypeSignature::UIntType), | ||
| ); | ||
| assert!(!err.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn runtime_check_value_too_large_not_unreachable() { | ||
| assert!(!RuntimeCheckErrorKind::ValueTooLarge.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn static_check_unreachable_returns_true() { | ||
| let err = StaticCheckErrorKind::Unreachable("test bug".into()); | ||
| assert!(err.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn static_check_supertype_too_large_not_unreachable() { | ||
| assert!(!StaticCheckErrorKind::SupertypeTooLarge.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn static_check_cost_overflow_not_unreachable() { | ||
| assert!(!StaticCheckErrorKind::CostOverflow.is_unreachable()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn static_check_type_already_annotated_not_unreachable() { | ||
| assert!(!StaticCheckErrorKind::TypeAlreadyAnnotatedFailure.is_unreachable()); | ||
| } | ||
| } |
There was a problem hiding this comment.
I have a few suggestions that might help:
- for consistency in test structure (at least it seems for
vm/analysis), it may be worth moving these intosrc/vm/analysis/tests/errors.rs - tests could be collapsed per error type, e.g.
static_check_is_unreachable()andruntime_check_is_unreachable() - At the moment, the tests are not exhaustive, as they don’t cover all variants. I see a few possible approaches here:
- Make them fully exhaustive by matching all variants per type (though this could become quite heavy and maybe not worthy in this case).
- Keep the pattern matching focused on the “unreachable” variants, and use a catch-all
_arm for the rest. - Alternatively, we may consider to drop them.
There was a problem hiding this comment.
I see that point 1 and point 2 have been addressed, but not point 3. I would like to know your thoughts about it :)
There was a problem hiding this comment.
You're right, I missed the exhaustiveness concern, crc is in 68e514f, I dropped them entirely. is_unreachable() is a trivial matches! one-liner, these were just testing Rust pattern matching. removed from both analysis and ast. What do you think?
There was a problem hiding this comment.
IMHO, the point here isn’t the triviality of the code, but rather that we’re not using is_unreachable() as part of the business logic (as stated in the docstring).
I’m fine with this. Let’s keep this open for now to gather feedback from other reviewers.
federico-stacks
left a comment
There was a problem hiding this comment.
Implementation looks good overall.
I’ve left a specific note in this thread: #7046 (comment)
One additional thought: since there are other error variants that might be considered unreachable (as mentioned in the PR description), it could be worth including them in our monitoring as well. More in general we could collect from the consensus tests documentation all the error variants currently treated as unreachable and track them accordingly. (This could also be done in a follow-up PR)
Any thoughts?
|
Yeah i think it's a great follow-up. The strongest candidates are There's also There's also epoch-gated stuff ( I'd tackle it in a follow-up PR, focusing on the first two categories. does that make sense? anything else I might be missing? |
we could consider all the unreachable variants from the consensus tests (the one under |
federico-stacks
left a comment
There was a problem hiding this comment.
LGTM!
heads-up: a lot of CI checks are failed. Should not be related, but worth to have a clean run before merge this
Description
Adds distinct alerting when "unreachable" errors are triggered. These are errors that should never occur in valid execution and indicate potential bugs.
Today these errors flow through the generic "problematic transaction" path and get logged at info! level with no way to distinguish them from normal rejectable errors.
This PR adds:
is_unreachable()methods onRuntimeCheckErrorKind,StaticCheckErrorKind, andParseErrorerror!level logs with aUNREACHABLE_ERROR_TRIGGEREDmarker and structuredevent_name=unreachable_errorfield at all consensus rejection sites (block validation + miner)stacks_unreachable_errors_totalPrometheus counter (behindmonitoring_promfeature flag)Applicable issues
Checklist
docs/property-testing.md)rpc/openapi.yamlfor RPC endpoints,event-dispatcher.mdfor new events)clarity-benchmarkingrepo