feat(scraper): add support for weeks, months, and 'just posted' date formats by whbbupt · Pull Request #295 · ambicuity/New-Grad-Jobs

whbbupt · 2026-05-24T07:20:23Z

Enhanced normalize_date_string() to handle additional date formats commonly returned by job boards:

"Just posted" / "Recently" → today
"X weeks ago" → X * 7 days ago
"X months ago" → X * 30 days ago
"Active X days ago" → X days ago

These formats appear on Indeed, LinkedIn, and other job listing platforms but were previously falling through to the
raw-string return path, which caused some jobs to be unnecessarily filtered out.

Summary by CodeRabbit

Bug Fixes
- Improved date format recognition for job postings to support additional human-readable date expressions. The system now correctly interprets "Recently", "Just posted", "Just now", and relative time phrases like "Posted X Weeks Ago" and "Posted X Months Ago", ensuring more accurate chronological sorting and display.

…formats

coderabbitai · 2026-05-24T07:20:35Z

Warning

Review limit reached

@whbbupt, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 42 minutes and 22 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cb0e1988-e430-4317-bdf9-b989260e0788

📥 Commits

Reviewing files that changed from the base of the PR and between cf71786 and 6434af1.

📒 Files selected for processing (1)

scripts/update_jobs.py

📝 Walkthrough

Walkthrough

This PR extends the normalize_date_string() function in scripts/update_jobs.py to parse additional relative date formats commonly found in job posting timestamps. The changes add support for immediate-posting phrases ("just posted," "recently"), week and month offset expressions, and "active" day counts, with corresponding docstring examples.

Changes

Relative Date Format Expansion

Layer / File(s)	Summary
Relative date format support `scripts/update_jobs.py`	Added parsing branches for "just posted/now/recently" (mapped to today), weeks-ago expressions (weeks × 7 days), months-ago expressions (months × 30 days), and "active X days ago" (X days). Docstring examples extended to document the newly supported phrases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

ambicuity/New-Grad-Jobs#161: Both PRs expand relative-date parsing in normalize_date_string() to recognize additional timestamp formats ("hours ago"/"minutes ago" in #161, vs. weeks/months/active days/"just posted" in this PR).

Suggested labels

size/S

Suggested reviewers

ambicuity

Poem

🐰 With timestamps now speaking in weeks and in moons,
And "just posted" ringing across all the job postings,
The parsing grows wiser, no longer in cartoon—
Fresh dates in human words, our script now well-knowing!
hops past the merge button ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly summarizes the main change: adding support for additional date formats (weeks, months, and 'just posted') in the date normalization function.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request expands the normalize_date_string function in scripts/update_jobs.py to support a wider range of relative date formats, including 'Just posted', 'Weeks Ago', and 'Months Ago'. A review comment identifies that the logic added for 'Active X Days Ago' is redundant because the existing 'days ago' regex already captures these cases.

gemini-code-assist · 2026-05-24T07:22:18Z

+    # Handle "Active X Days Ago" (some job boards use "Active" instead of "Posted")
+    active_match = re.search(r'active\s+(\d+)\s*days?\s+ago', posted_at_lower)
+    if active_match:
+        days = int(active_match.group(1))
+        return (now - timedelta(days=days)).strftime('%Y-%m-%d')


This block is redundant and will never be executed. The existing days_match logic at line 2240 already uses the regex r'(\d+)\s*days?\s+ago', which captures the "X days ago" pattern regardless of whether it is preceded by "Posted" or "Active" (since re.search matches anywhere in the string). Because the implementation here is identical to the earlier block, this addition is unnecessary.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/update_jobs.py`:
- Around line 2265-2269: The current logic uses timedelta(days=months * 30)
which can drift by several days; replace this with calendar-accurate month
subtraction using dateutil.relativedelta: import relativedelta via "from
dateutil.relativedelta import relativedelta" and change the return to "(now -
relativedelta(months=months)).strftime('%Y-%m-%d')". Update the code path that
handles months (variables months_match, posted_at_lower, now) and ensure
dateutil is added to requirements if not present; alternatively, if you cannot
add the dependency, document the 30-day approximation in the function docstring
mentioning the possible +/- days drift.
- Around line 2271-2275: Remove the unreachable "Active X Days Ago" regex block:
the earlier pattern r'(\d+)\s*days?\s+ago' already matches "active 5 days ago"
and returns, so delete the lines that define active_match, its regex
r'active\s+(\d+)\s*days?\s+ago', and the subsequent return; locate this dead
code in the same function that computes posted_at using posted_at_lower and now
(look for the existing r'(\d+)\s*days?\s+ago' match and the variable names
active_match, posted_at_lower, days) and remove only that redundant block to
keep behavior identical.
- Around line 2259-2263: Extract the hardcoded week-to-day conversion (the
literal 7) into a module-level constant (e.g., DAYS_PER_WEEK) alongside the
other constants in scripts/update_jobs.py, then replace the literal in the weeks
handling block (the code that uses weeks_match and computes timedelta(days=weeks
* 7)) with timedelta(days=weeks * DAYS_PER_WEEK) so the conversion is named and
reusable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9080e92e-a03a-4278-b23c-ac5f273b8de8

📥 Commits

Reviewing files that changed from the base of the PR and between 94eb071 and cf71786.

📒 Files selected for processing (1)

scripts/update_jobs.py

…edelta

ambicuity

Approving after thorough local verification.

Scope verified: Extends normalize_date_string() to handle "Just posted" / "Recently", "X Weeks Ago", "X Months Ago" formats — fixes a real bug where these formats fell through to the raw-string return and got filtered downstream.

AI reviewer feedback (CodeRabbit + Gemini): all addressed in commit 6434af1.

Extracted DAYS_PER_WEEK / DAYS_PER_MONTH module-level constants — done.
Replaced timedelta(days=months * 30) with relativedelta(months=months) for calendar-accurate month arithmetic — done.
Removed the unreachable "Active X Days Ago" block; the earlier (\d+)\s*days?\s+ago regex already handles it (verified with real input — see test results below).

Local verification on PR head (6434af1):

All 22 hand-written real-data assertions pass (regression + new behaviors + edge cases): "Posted 1 Month Ago" → exactly 1 calendar month back via relativedelta, "Active 5 Days Ago" → correctly resolved via the existing days regex, "Recently"/"Just now"/"Just posted" → today, "3 wks ago" / "6 mos ago" abbreviations work.
Full repo test suite: 718 passed locally (Python 3.13.5).
CI test_unicorn_company failed once due to Python hash-randomization picking a unicorn that is also FAANG+ (next(iter(UNICORNS)) is non-deterministic and get_company_tier checks FAANG_PLUS before UNICORNS). This is a pre-existing flaky test unrelated to this PR's diff (which only touches normalize_date_string). Re-run succeeded.

All 6 status checks now green, mergeStateStatus: CLEAN. LGTM.

…NG_PLUS (#298) Closes #297. ## Summary - `test_unicorn_company` flaked on PR #295's CI with `assert 'faang_plus' == 'unicorn'`. Root cause: `get_company_tier()` checks `FAANG_PLUS` before `UNICORNS`, and the test used `next(iter(UNICORNS))` — which depends on `PYTHONHASHSEED` and occasionally lands on a company that is in both sets. - Pick deterministically from `sorted(UNICORNS - FAANG_PLUS)` so the chosen company is guaranteed to resolve to `'unicorn'`. - Apply the same sort-for-determinism hardening to `test_finance_sector_detected`, `test_defense_sector_detected`, and `test_company_can_overlap_tier_and_sector` so a future tier-precedence change cannot reintroduce flakiness. - Use `pytest.skip(...)` instead of a silent `if ...:` branch so that an empty config category is loudly visible. ## Test plan - [x] `pytest tests/test_enrichment.py -v` — 93 passed. - [x] Stress-tested across 10 `PYTHONHASHSEED` values (0, 1, 7, 13, 42, 99, 314, 1234, 9999, 12345): 7/7 of the `TestGetCompanyTier` cases pass every time. - [x] Full repo suite (`pytest`): 718 passed. ## Risk Test-only change. No production code touched.

feat(scraper): add support for weeks, months, and 'just posted' date …

cf71786

…formats

whbbupt requested a review from ambicuity as a code owner May 24, 2026 07:20

gemini-code-assist Bot reviewed May 24, 2026

View reviewed changes

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

Comment thread scripts/update_jobs.py Outdated

Comment thread scripts/update_jobs.py Outdated

Comment thread scripts/update_jobs.py Outdated

fix: address code review feedback - extract constants and use relativ…

6434af1

…edelta

ambicuity approved these changes May 26, 2026

View reviewed changes

ambicuity merged commit abdbdfc into ambicuity:main May 26, 2026
7 of 8 checks passed

This was referenced May 26, 2026

Flaky test: test_unicorn_company can fail when iteration order picks a FAANG+ unicorn #297

Closed

test: fix flaky test_unicorn_company by selecting from UNICORNS - FAANG_PLUS #298

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(scraper): add support for weeks, months, and 'just posted' date formats#295

feat(scraper): add support for weeks, months, and 'just posted' date formats#295
ambicuity merged 2 commits into
ambicuity:mainfrom
whbbupt:feat/add-date-parsing

whbbupt commented May 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 24, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 24, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ambicuity left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

whbbupt commented May 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ambicuity left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

whbbupt commented May 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 24, 2026 •

edited

Loading