Conversation
@mitch000001 That's how far we got yesterday

I think the approach is not quite right. Here, we track whether the result for a given subject/scope/version triple changes, e.g., from PASS to FAIL. However, too many variables are at play, because this result is compiled from testcase results, and each testcase result can stem from a different run with a different software version, etc. What we ought to track instead are individual testcase results. For instance, given subject X and testcase Y, two consecutive runs of the corresponding check script might yield result r1 at time t1 with software revision v1 and result r2 at time t2 with software revision v2. If r1 differs from r2, then we can tell the partner: hey, there has been a change, and we can even say whether v1 != v2 or not.
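To make this concrete, here is a rough sketch of what comparing two consecutive per-testcase results could look like (all names and fields are made up for illustration; this is not the actual schema):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TestcaseResult:
    subject: str          # e.g. the cloud under test
    testcase: str         # individual testcase, not the compiled verdict
    result: str           # "PASS", "FAIL", or "ABORT"
    checked_at: datetime  # time of the check run
    revision: str         # software revision seen by the check script

def describe_change(prev: TestcaseResult, curr: TestcaseResult) -> Optional[str]:
    """Report a change between two consecutive runs for the same subject/testcase."""
    if prev.result == curr.result:
        return None
    msg = (f"{curr.subject}/{curr.testcase}: {prev.result} -> {curr.result} "
           f"between {prev.checked_at:%Y-%m-%d} and {curr.checked_at:%Y-%m-%d}")
    if prev.revision != curr.revision:
        return msg + f" (revision changed: {prev.revision} -> {curr.revision})"
    return msg + " (same revision)"
```
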
What's also not covered here is the case where the state changes because the latest result expires. We have no outside trigger for that, so we might need to add a dedicated one.
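One way to get such a trigger would be a periodic job that scans for results whose validity has lapsed. A minimal sketch (using sqlite3 as a stand-in; table and column names are invented):

```python
import sqlite3
from datetime import datetime, timezone

def find_expired_results(conn: sqlite3.Connection) -> list:
    """Return (subject, testcase) pairs whose latest result has expired,
    so a state-change notification can go out without a new check run."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "SELECT subject, testcase FROM latest_result WHERE expires_at < ?",
        (now,),
    )
    return cur.fetchall()
```
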
Since #889, the risk that the latest result expires is actually quite minimal, because we always get something (be it ABORT). What could expire, though, is the latest PASSing result. This, however, is probably not what we want! As long as the test suite is run regularly, we can't fault the partner if it produces false positives. We would actually have to extend the lifetime of the passing result. That's not something we currently do (and I'm not even sure the database schema is up for that).
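A sketch of what extending the lifetime could look like: whenever the suite has demonstrably run, push the expiry of the stored PASSing result forward, regardless of whether that particular run produced a (possibly spurious) FAIL. Again, the table/column names and the validity window are hypothetical:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(days=7)  # hypothetical validity window

def extend_passing_result(conn: sqlite3.Connection, subject: str, testcase: str) -> None:
    """Push the expiry of the latest PASSing result forward for subject/testcase."""
    new_expiry = (datetime.now(timezone.utc) + VALIDITY).isoformat()
    conn.execute(
        "UPDATE latest_result SET expires_at = ? "
        "WHERE subject = ? AND testcase = ? AND result = 'PASS'",
        (new_expiry, subject, testcase),
    )
    conn.commit()
```
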
The principle should be: the subject passes the testcase until/unless proven otherwise. The burden of proof is on us. If we get an ABORT, it should probably count as PASS until/unless a human assessor reviews it and judges that it is indeed a fail. We should write the tests in such a way that FAIL results are quite airtight; then a fail could take effect immediately.

However, some tests are flaky, in the sense that they usually pass but sometimes inexplicably fail (because of an overly strict timeout or whatnot). This is particularly true for Sonobuoy, where we don't even control the code. So we should probably at least retry failed tests, or only count multiple failed tests in a row. Again, in the case of Sonobuoy this is not trivial, because Sonobuoy aggregates multiple testcases, each of which can be flaky individually, so three consecutive runs of Sonobuoy may all fail without any two of them failing because of the same testcase.
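Here is a rough sketch of what per-testcase consecutive-failure counting could look like (threshold and result names invented). Because the streak is tracked per individual testcase, a Sonobuoy run that fails overall only turns into an effective FAIL if the same testcase also failed in the preceding runs:

```python
from collections import defaultdict

CONSECUTIVE_FAILS_NEEDED = 3  # hypothetical threshold

class FlakinessFilter:
    """Only report FAIL once a testcase has failed several times in a row."""

    def __init__(self) -> None:
        self._streak = defaultdict(int)  # (subject, testcase) -> consecutive fails

    def effective_result(self, subject: str, testcase: str, raw: str) -> str:
        key = (subject, testcase)
        if raw == "FAIL":
            self._streak[key] += 1
            if self._streak[key] >= CONSECUTIVE_FAILS_NEEDED:
                return "FAIL"
            return "PASS"  # provisional: not enough consecutive failures yet
        # PASS resets the streak; ABORT counts as PASS until an assessor says otherwise
        self._streak[key] = 0
        return "PASS"
```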