Skip to content

docs(log-ingestor): Add user docs about fault tolerance guarantees.#2057

Merged
kirkrodrigues merged 5 commits intoy-scope:mainfrom
LinZhihao-723:log-ingestor-doc-update
Mar 6, 2026
Merged

docs(log-ingestor): Add user docs about fault tolerance guarantees.#2057
kirkrodrigues merged 5 commits intoy-scope:mainfrom
LinZhihao-723:log-ingestor-doc-update

Conversation

@LinZhihao-723
Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 commented Mar 2, 2026

Description

As the title suggests. This PR updates log-ingestor's user-doc to reflect the fault-tolerance features introduced in #1978.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Ensure the doc can be properly rendered on the doc site.

Summary by CodeRabbit

  • Documentation
    • Expanded Fault Tolerance guidance for the log ingestor: replaces a blanket warning with a clear explanation that ingestion jobs tolerate crashes/restarts and will resume; files discovered on S3 or via SQS will eventually be submitted for compression. Adds explicit notes on limits — successful compression is not guaranteed and duplicate processing can occur due to external-system behaviors.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner March 2, 2026 22:13
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 2, 2026

Walkthrough

Updated the Fault tolerance section in the log-ingestor guide: replaced a previous "no fault tolerance" warning with detailed behavior describing job-level restoration after crashes/restarts, that files discovered via S3 or received via SQS will be submitted for compression eventually, and explicit caveats about what is not guaranteed (successful compression, exactly-once processing).

Changes

Cohort / File(s) Summary
Documentation Updates
docs/src/user-docs/guides-using-log-ingestor.md
Rewrote the Fault tolerance content: removed the prior warning about no fault tolerance and added a detailed description stating ingestion jobs and per-job state persist across restarts, files found on S3 or from SQS will be submitted for compression eventually, and listed non-guarantees after a crash (compression success not guaranteed; possible duplicate ingestion due to SQS/delete timing).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: updating documentation to add fault tolerance guarantee information for the log-ingestor.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Around line 65-72: Update the “No data loss” bullet to narrow its scope by
adding the qualifier “within the supported ingestion assumptions and persisted
CLP state”; specifically, change the bullet that begins with “No data loss” to
read something like “No data loss (within the supported ingestion assumptions
and persisted CLP state)” so the claim is not overbroad—keep the other bullets
(e.g., “Buffered data recovery” and “Isolation of completed work”) unchanged.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0f0933 and aee4992.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-log-ingestor.md

Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/src/user-docs/guides-using-log-ingestor.md (1)

60-61: ⚠️ Potential issue | 🟡 Minor

Scope this guarantee to the documented ingestion assumptions.

As written, this still reads broader than the actual contract. The later S3 scanner and SQS listener sections both rely on explicit assumptions (for example, lexicographic key ordering and a dedicated queue), so this guarantee should be qualified accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-log-ingestor.md` around lines 60 - 61, The
sentence "Within an ingestion job, any files that have been found on S3 or
received as messages from SQS queue will eventually be submitted for
compression." is too broad; update it to scope the guarantee to the documented
ingestion assumptions by qualifying it (e.g., "Within an ingestion job, and
assuming the ingestion preconditions described below—such as lexicographic S3
key ordering for the S3 scanner and a dedicated queue for the SQS listener—any
files found on S3 or received from the SQS queue will eventually be submitted
for compression."). Reference the S3 scanner and SQS listener sections and
explicitly mention the relevant assumptions (lexicographic key ordering,
dedicated queue) so the guarantee only applies when those preconditions hold.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Line 59: Update the sentence that currently reads "Any ingestion job
successfully submitted to `log-ingestor` will run continuously" to clarify
fault-tolerance semantics: replace "will run continuously" with wording like
"will be restored and resumed after a crash or restart" (or similar) so it
communicates recovery and resumption rather than uninterrupted runtime; ensure
the updated sentence still references `log-ingestor` and the successful
submission condition.

---

Duplicate comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Around line 60-61: The sentence "Within an ingestion job, any files that have
been found on S3 or received as messages from SQS queue will eventually be
submitted for compression." is too broad; update it to scope the guarantee to
the documented ingestion assumptions by qualifying it (e.g., "Within an
ingestion job, and assuming the ingestion preconditions described below—such as
lexicographic S3 key ordering for the S3 scanner and a dedicated queue for the
SQS listener—any files found on S3 or received from the SQS queue will
eventually be submitted for compression."). Reference the S3 scanner and SQS
listener sections and explicitly mention the relevant assumptions (lexicographic
key ordering, dedicated queue) so the guarantee only applies when those
preconditions hold.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: addd62f0-7b91-425f-a6c7-da06db377022

📥 Commits

Reviewing files that changed from the base of the PR and between 39b1010 and 89c8474.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-log-ingestor.md

If `log-ingestor` crashes or is restarted, all in-progress ingestion jobs and their associated state
will be lost, and must be restored manually. Robust fault tolerance for the ingestion pipeline is
planned for a future release.
* Any ingestion job successfully submitted to `log-ingestor` will run continuously.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace “run continuously” with “be restored and resumed”.

Line 59 promises uninterrupted execution, but the fault-tolerance behaviour described here is recovery after a crash/restart, not literal continuous runtime. Wording like “will be restored and resumed after restart” would set the right expectation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-log-ingestor.md` at line 59, Update the
sentence that currently reads "Any ingestion job successfully submitted to
`log-ingestor` will run continuously" to clarify fault-tolerance semantics:
replace "will run continuously" with wording like "will be restored and resumed
after a crash or restart" (or similar) so it communicates recovery and
resumption rather than uninterrupted runtime; ensure the updated sentence still
references `log-ingestor` and the successful submission condition.

Copy link
Copy Markdown
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

docs(log-ingestor): Add user docs about fault tolerance guarantees.

Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@LinZhihao-723 LinZhihao-723 changed the title docs(log-ingestor): Update user documentation for fault tolerance and recovery guarantees. docs(log-ingestor): Add user docs about fault tolerance guarantees. Mar 6, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-log-ingestor.md (2)

59-59: ⚠️ Potential issue | 🟡 Minor

Don’t promise uninterrupted execution.

Line 59 still says ingestion jobs “will run continuously”, but the implementation guarantees recovery after a crash/restart, not literal uninterrupted runtime. Please reword this to “will be restored and resumed” (or equivalent).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-log-ingestor.md` at line 59, The sentence
claiming ingestion jobs "will run continuously" is inaccurate; update the
wording in the docs (the bullet that currently reads "Any ingestion job
successfully submitted to `log-ingestor` will run continuously.") to state that
jobs "will be restored and resumed" (or equivalent phrasing like "will be
restored and resumed after a crash/restart") and scan nearby text for similar
absolute claims to adjust consistency.

60-61: ⚠️ Potential issue | 🟡 Minor

Scope this guarantee to the documented ingestion assumptions.

This reads as universal, but the same page later says S3 scanner correctness depends on lexicographical ordering and immutability. Without that qualifier, readers may infer that any file seen in S3/SQS is eventually submitted, which is stronger than the documented contract.

Proposed wording
-* Within an ingestion job, any files that have been found on S3 or received as messages from SQS
-  queue will eventually be submitted for compression.
+* Within an ingestion job, any files that are discovered under the documented S3 scanner or SQS
+  listener assumptions will eventually be submitted for compression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-log-ingestor.md` around lines 60 - 61, Update
the sentence beginning "Within an ingestion job, any files that have been found
on S3 or received as messages from SQS queue will eventually be submitted for
compression." to scope the guarantee to the documented ingestion assumptions:
explicitly state that this applies only when the S3 scanner assumptions
(lexicographical ordering and object immutability) and the documented SQS
assumptions hold, or otherwise qualify that eventual submission is subject to
those scanner/ingestion correctness conditions referenced elsewhere on this
page.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/src/user-docs/guides-using-log-ingestor.md`:
- Line 59: The sentence claiming ingestion jobs "will run continuously" is
inaccurate; update the wording in the docs (the bullet that currently reads "Any
ingestion job successfully submitted to `log-ingestor` will run continuously.")
to state that jobs "will be restored and resumed" (or equivalent phrasing like
"will be restored and resumed after a crash/restart") and scan nearby text for
similar absolute claims to adjust consistency.
- Around line 60-61: Update the sentence beginning "Within an ingestion job, any
files that have been found on S3 or received as messages from SQS queue will
eventually be submitted for compression." to scope the guarantee to the
documented ingestion assumptions: explicitly state that this applies only when
the S3 scanner assumptions (lexicographical ordering and object immutability)
and the documented SQS assumptions hold, or otherwise qualify that eventual
submission is subject to those scanner/ingestion correctness conditions
referenced elsewhere on this page.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ea11b415-31b6-46e3-a197-103099f01095

📥 Commits

Reviewing files that changed from the base of the PR and between 89c8474 and 6d5da51.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-log-ingestor.md

@kirkrodrigues kirkrodrigues merged commit 255f232 into y-scope:main Mar 6, 2026
9 checks passed
@junhaoliao junhaoliao added this to the February 2026 milestone Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants