docs(log-ingestor): Add user docs about fault tolerance guarantees.#2057
Conversation
WalkthroughUpdated the Fault tolerance section in the log-ingestor guide: replaced a previous "no fault tolerance" warning with detailed behavior describing job-level restoration after crashes/restarts, that files discovered via S3 or received via SQS will be submitted for compression eventually, and explicit caveats about what is not guaranteed (successful compression, exactly-once processing). Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Around line 65-72: Update the “No data loss” bullet to narrow its scope by
adding the qualifier “within the supported ingestion assumptions and persisted
CLP state”; specifically, change the bullet that begins with “No data loss” to
read something like “No data loss (within the supported ingestion assumptions
and persisted CLP state)” so the claim is not overbroad—keep the other bullets
(e.g., “Buffered data recovery” and “Isolation of completed work”) unchanged.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/src/user-docs/guides-using-log-ingestor.md (1)
60-61:⚠️ Potential issue | 🟡 MinorScope this guarantee to the documented ingestion assumptions.
As written, this still reads broader than the actual contract. The later S3 scanner and SQS listener sections both rely on explicit assumptions (for example, lexicographic key ordering and a dedicated queue), so this guarantee should be qualified accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/user-docs/guides-using-log-ingestor.md` around lines 60 - 61, The sentence "Within an ingestion job, any files that have been found on S3 or received as messages from SQS queue will eventually be submitted for compression." is too broad; update it to scope the guarantee to the documented ingestion assumptions by qualifying it (e.g., "Within an ingestion job, and assuming the ingestion preconditions described below—such as lexicographic S3 key ordering for the S3 scanner and a dedicated queue for the SQS listener—any files found on S3 or received from the SQS queue will eventually be submitted for compression."). Reference the S3 scanner and SQS listener sections and explicitly mention the relevant assumptions (lexicographic key ordering, dedicated queue) so the guarantee only applies when those preconditions hold.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Line 59: Update the sentence that currently reads "Any ingestion job
successfully submitted to `log-ingestor` will run continuously" to clarify
fault-tolerance semantics: replace "will run continuously" with wording like
"will be restored and resumed after a crash or restart" (or similar) so it
communicates recovery and resumption rather than uninterrupted runtime; ensure
the updated sentence still references `log-ingestor` and the successful
submission condition.
---
Duplicate comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Around line 60-61: The sentence "Within an ingestion job, any files that have
been found on S3 or received as messages from SQS queue will eventually be
submitted for compression." is too broad; update it to scope the guarantee to
the documented ingestion assumptions by qualifying it (e.g., "Within an
ingestion job, and assuming the ingestion preconditions described below—such as
lexicographic S3 key ordering for the S3 scanner and a dedicated queue for the
SQS listener—any files found on S3 or received from the SQS queue will
eventually be submitted for compression."). Reference the S3 scanner and SQS
listener sections and explicitly mention the relevant assumptions (lexicographic
key ordering, dedicated queue) so the guarantee only applies when those
preconditions hold.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: addd62f0-7b91-425f-a6c7-da06db377022
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-log-ingestor.md
| If `log-ingestor` crashes or is restarted, all in-progress ingestion jobs and their associated state | ||
| will be lost, and must be restored manually. Robust fault tolerance for the ingestion pipeline is | ||
| planned for a future release. | ||
| * Any ingestion job successfully submitted to `log-ingestor` will run continuously. |
There was a problem hiding this comment.
Replace “run continuously” with “be restored and resumed”.
Line 59 promises uninterrupted execution, but the fault-tolerance behaviour described here is recovery after a crash/restart, not literal continuous runtime. Wording like “will be restored and resumed after restart” would set the right expectation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/user-docs/guides-using-log-ingestor.md` at line 59, Update the
sentence that currently reads "Any ingestion job successfully submitted to
`log-ingestor` will run continuously" to clarify fault-tolerance semantics:
replace "will run continuously" with wording like "will be restored and resumed
after a crash or restart" (or similar) so it communicates recovery and
resumption rather than uninterrupted runtime; ensure the updated sentence still
references `log-ingestor` and the successful submission condition.
kirkrodrigues
left a comment
There was a problem hiding this comment.
For the PR title, how about:
docs(log-ingestor): Add user docs about fault tolerance guarantees.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-log-ingestor.md (2)
59-59:⚠️ Potential issue | 🟡 MinorDon’t promise uninterrupted execution.
Line 59 still says ingestion jobs “will run continuously”, but the implementation guarantees recovery after a crash/restart, not literal uninterrupted runtime. Please reword this to “will be restored and resumed” (or equivalent).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/user-docs/guides-using-log-ingestor.md` at line 59, The sentence claiming ingestion jobs "will run continuously" is inaccurate; update the wording in the docs (the bullet that currently reads "Any ingestion job successfully submitted to `log-ingestor` will run continuously.") to state that jobs "will be restored and resumed" (or equivalent phrasing like "will be restored and resumed after a crash/restart") and scan nearby text for similar absolute claims to adjust consistency.
60-61:⚠️ Potential issue | 🟡 MinorScope this guarantee to the documented ingestion assumptions.
This reads as universal, but the same page later says S3 scanner correctness depends on lexicographical ordering and immutability. Without that qualifier, readers may infer that any file seen in S3/SQS is eventually submitted, which is stronger than the documented contract.
Proposed wording
-* Within an ingestion job, any files that have been found on S3 or received as messages from SQS - queue will eventually be submitted for compression. +* Within an ingestion job, any files that are discovered under the documented S3 scanner or SQS + listener assumptions will eventually be submitted for compression.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/user-docs/guides-using-log-ingestor.md` around lines 60 - 61, Update the sentence beginning "Within an ingestion job, any files that have been found on S3 or received as messages from SQS queue will eventually be submitted for compression." to scope the guarantee to the documented ingestion assumptions: explicitly state that this applies only when the S3 scanner assumptions (lexicographical ordering and object immutability) and the documented SQS assumptions hold, or otherwise qualify that eventual submission is subject to those scanner/ingestion correctness conditions referenced elsewhere on this page.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Line 59: The sentence claiming ingestion jobs "will run continuously" is
inaccurate; update the wording in the docs (the bullet that currently reads "Any
ingestion job successfully submitted to `log-ingestor` will run continuously.")
to state that jobs "will be restored and resumed" (or equivalent phrasing like
"will be restored and resumed after a crash/restart") and scan nearby text for
similar absolute claims to adjust consistency.
- Around line 60-61: Update the sentence beginning "Within an ingestion job, any
files that have been found on S3 or received as messages from SQS queue will
eventually be submitted for compression." to scope the guarantee to the
documented ingestion assumptions: explicitly state that this applies only when
the S3 scanner assumptions (lexicographical ordering and object immutability)
and the documented SQS assumptions hold, or otherwise qualify that eventual
submission is subject to those scanner/ingestion correctness conditions
referenced elsewhere on this page.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ea11b415-31b6-46e3-a197-103099f01095
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-log-ingestor.md
Description
As the title suggests. This PR updates
log-ingestor's user-doc to reflect the fault-tolerance features introduced in #1978.Checklist
breaking change.
Validation performed
Summary by CodeRabbit