Skip to content

restart notify, probe_success_rate, diff sort#44

Merged
wpak-ai merged 3 commits into
developfrom
bugfix/seed-notify-health-diff
May 20, 2026
Merged

restart notify, probe_success_rate, diff sort#44
wpak-ai merged 3 commits into
developfrom
bugfix/seed-notify-health-diff

Conversation

@henry0816191
Copy link
Copy Markdown
Collaborator

@henry0816191 henry0816191 commented May 20, 2026

Summary

  • Restart-aware seed notifications — On a cold first deploy, the first poll after seed() stays silent (no notify_callback). On restart (existing last_poll or discovered URLs), recent probe hits from the seed run_cycle() are filtered, matched via watchlists, and passed to notify_callback on the first poll. Redundant mark_discovered in seed() was removed; ISOProber.run_cycle() already persists hits with last_modified.
  • Health JSON — Renamed probe_hit_rate → probe_success_rate and fixed the ratio to (hit_recent + hit_old + hit_no_lm) / (hits + miss + error), excluding skipped URLs.
  • Diff ordering — updated_papers in diff_snapshots() are now sorted like new_papers (date descending, then id).

Changes

  • monitor.py
    SeedResult; seed() captures had_prior_state before work; first poll_once() branch for restart notify; shared sort key for new/updated papers

  • main.py
    probe_success_rate in /health extra fields

  • tests/test_monitor.py
    Cold start vs restart notify tests; updated_papers sort test; test_seed_marks_discovered updated for run_cycle-only discovery

  • tests/test_health.py
    Fixture key renamed to probe_success_rate

Breaking change (operators): /health JSON field probe_hit_rate is now probe_success_rate with a corrected definition. Update any dashboards or alerts that read the old name.

Test plan

uv run pytest tests/test_monitor.py tests/test_health.py -q (50 passed)
uv run pytest tests/ -q (318 passed, 2 skipped)
Manual: restart with populated DB (last_poll / discovered_urls); confirm first poll notifies only for recent seed probe hits, not cold deploy noise

Related issues

close #30

Summary by CodeRabbit

  • Improvements

    • Health endpoint now reports probe_success_rate (hits/attempted) instead of probe_hit_rate.
    • Initial scheduler run now provides recent probe data when prior state exists, improving visibility after restarts.
    • Updated sorting for detected/updated items to be deterministic by date then ID.
  • Tests

    • Tests updated/added to validate the new health metric, seeding/initial-run behavior, sorting, and notification logic.

Review Change Stack

@henry0816191 henry0816191 self-assigned this May 20, 2026
@henry0816191 henry0816191 requested a review from wpak-ai as a code owner May 20, 2026 17:39
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89807f73-7ceb-4228-a02b-989150acbfc4

📥 Commits

Reviewing files that changed from the base of the PR and between 4457889 and 0caf394.

📒 Files selected for processing (1)
  • src/paperscout/monitor.py

📝 Walkthrough

Walkthrough

Refactors scheduler seeding to return probe hits plus a prior-state flag and conditionally emit seed notifications on restart; renames health metric to probe_success_rate; ensures deterministic ordering for both new and updated paper snapshots.

Changes

Seed restart notifications, metrics, and ordering bugfix bundle

Layer / File(s) Summary
SeedResult data contract and seed() signature
src/paperscout/monitor.py
New SeedResult dataclass carries probe_hits list and had_prior_state flag. Scheduler.seed() signature changes to async def seed() -> SeedResult, with had_prior_state computed from existing poll or discovered URLs before seeding begins.
Scheduler.seed() implementation refactor
src/paperscout/monitor.py
seed() now captures ISO probe hits into a local list, completes seeding, logs seed outcome including had_prior_state, and returns SeedResult(probe_hits=..., had_prior_state=...) instead of only internal state mutation.
Poll-once first-run notification conditionals
src/paperscout/monitor.py
poll_once() on first run now calls seed() and conditionally emits notifications: if had_prior_state=true, recent probe hits are filtered, per-user watchlist matches are computed via blocking DB call, and notify_callback is invoked; if had_prior_state=false, an empty PollResult is returned silently (cold start).
Paper snapshot ordering and health metric naming
src/paperscout/monitor.py, src/paperscout/__main__.py
diff_snapshots applies stable sort by date descending then id to both new_papers and updated_papers. Health endpoint renames probe_hit_rate to probe_success_rate, computing hits from scheduler stats and setting the rate to hits / attempted when nonzero (else None).
Test validation for seeding, notifications, ordering, and metrics
tests/test_monitor.py, tests/test_health.py
Adds tests for cold-start silent behavior, restart notification emissions filtering old hits, seed result composition, updated_papers ordering by date descending, and health field probe_success_rate. Updates _make_scheduler fixture to mock prober.snapshot_stats.

Sequence Diagram(s)

sequenceDiagram
  participant PollOnce
  participant Scheduler.seed
  participant ProbeState
  participant DB
  participant notify_callback
  PollOnce->>Scheduler.seed: call seed()
  Scheduler.seed->>ProbeState: read last_poll & discovered_present?
  Scheduler.seed->>Scheduler.seed: prober.run_cycle() -> probe_hits
  Scheduler.seed-->>PollOnce: return SeedResult(probe_hits, had_prior_state)
  alt had_prior_state == false
    PollOnce-->>PollOnce: return empty PollResult (cold start)
  else had_prior_state == true
    PollOnce->>DB: matches_for_users(recent_hits) (blocking)
    PollOnce->>notify_callback: invoke with PollResult(probe_hits=recent_hits, per_user_matches=...)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • cppalliance/paperscout#29: Both PRs modify the scheduler/health "extra fields" logic derived from scheduler stats; this PR renames the metric to probe_success_rate and updates tests.

Suggested labels

bug

Suggested reviewers

  • wpak-ai

Poem

🐰 A seed now blooms with facts in tow,
Did we have state before? Now we know!
Fresh notifs spring when paths are known,
While papers sort in steady tone.
Metrics gleam with clearer rate—first-run silence: smart and great!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title concisely references the three main changes: restart notification behavior, probe_success_rate metric rename, and diff ordering for updated papers.
Description check ✅ Passed The description covers all required template sections, including a detailed summary of changes, comprehensive test plan with results, and a link to the related issue #30.
Linked Issues check ✅ Passed All acceptance criteria from issue #30 are met: seed() returns SeedResult with had_prior_state; poll_once() implements restart notify logic; probe_success_rate metric is corrected; updated_papers are now sorted; comprehensive tests added.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #30 objectives: seed notification behavior, health JSON metric correction, and diff ordering. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/seed-notify-health-diff

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/paperscout/monitor.py`:
- Around line 170-173: The code updates `_last_successful_poll` and
`_last_probe_stats` immediately after `seed_result = await self.seed()` which
can mark a poll successful even if subsequent steps like
`self.matches_for_users(...)` or `self.notify_callback(...)` fail; move the
bookkeeping (assignments to `_last_successful_poll` and `_last_probe_stats`) to
the very end of the restart-first-poll path—i.e., after `matches_for_users` and
`notify_callback` have completed without error—to ensure only fully completed
polls are recorded; apply the same relocation in the other restart-first-poll
occurrences (the blocks around the other two sites referenced) so all three code
paths set those fields only after successful notification and matching.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76ad56dd-a2d4-4f5e-9aa2-0012f80529c1

📥 Commits

Reviewing files that changed from the base of the PR and between dc319cd and 4457889.

📒 Files selected for processing (4)
  • src/paperscout/__main__.py
  • src/paperscout/monitor.py
  • tests/test_health.py
  • tests/test_monitor.py

Comment thread src/paperscout/monitor.py
@wpak-ai wpak-ai merged commit d4e5053 into develop May 20, 2026
10 checks passed
@wpak-ai wpak-ai deleted the bugfix/seed-notify-health-diff branch May 20, 2026 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical bugfix bundle — monitor seed, health JSON, diff ordering

2 participants