test: fix flaky test_unicorn_company by selecting from UNICORNS - FAANG_PLUS by ambicuity · Pull Request #298 · ambicuity/New-Grad-Jobs

ambicuity · 2026-05-26T17:04:28Z

Closes #297.

Summary

test_unicorn_company flaked in CI on PR #295 (feat(scraper): add support for weeks, months, and 'just posted' date formats) with assert 'faang_plus' == 'unicorn'. Root cause: get_company_tier() checks FAANG_PLUS before UNICORNS, and the test used next(iter(UNICORNS)), whose result depends on PYTHONHASHSEED and occasionally lands on a company that is in both sets.
Pick deterministically from sorted(UNICORNS - FAANG_PLUS) so the chosen company is guaranteed to resolve to 'unicorn'.
Apply the same sort-for-determinism hardening to test_finance_sector_detected, test_defense_sector_detected, and test_company_can_overlap_tier_and_sector so a future tier-precedence change cannot reintroduce flakiness.
Use pytest.skip(...) instead of a silent if ...: branch so that an empty config category is loudly visible.
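The root cause and fix described above can be illustrated with a minimal sketch. The company names and the set contents are made up for illustration, the 'other' fallback is an assumption, and this get_company_tier() only mirrors the precedence stated in the summary; it is not the repository's implementation.

```python
# Hypothetical stand-ins for the repo's config sets; the names are invented.
FAANG_PLUS = {"Alpha", "Bravo", "Charlie"}
UNICORNS = {"Charlie", "Delta", "Echo"}  # "Charlie" deliberately sits in both sets


def get_company_tier(company: str) -> str:
    """Assumed precedence from the PR summary: FAANG_PLUS is checked first."""
    if company in FAANG_PLUS:
        return "faang_plus"
    if company in UNICORNS:
        return "unicorn"
    return "other"  # assumed fallback value, not taken from the source


# Old selection: set iteration order depends on PYTHONHASHSEED,
# so this can be "Charlie", which resolves to 'faang_plus'.
flaky_pick = next(iter(UNICORNS))

# New selection: drop overlapping companies, then sort for a stable choice.
unicorn_only = sorted(UNICORNS - FAANG_PLUS)
stable_pick = unicorn_only[0]
```

With these sets, stable_pick is the same on every run and always resolves to 'unicorn', whereas flaky_pick can resolve to either tier depending on the hash seed.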

Test plan

pytest tests/test_enrichment.py -v — 93 passed.
Stress-tested across 10 PYTHONHASHSEED values (0, 1, 7, 13, 42, 99, 314, 1234, 9999, 12345): 7/7 of the TestGetCompanyTier cases pass every time.
Full repo suite (pytest): 718 passed.
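The hash-seed stress test above could be approximated along these lines. This is a sketch, not the command actually used for the PR: it runs a tiny snippet (rather than the test suite) in a fresh interpreter per seed, and the set contents are invented. It relies only on the standard library and the documented PYTHONHASHSEED environment variable.

```python
import os
import subprocess
import sys

# Seed values listed in the test plan.
SEEDS = [0, 1, 7, 13, 42, 99, 314, 1234, 9999, 12345]

# Hypothetical snippet: print the "first" element by iteration and by sorting.
SNIPPET = (
    "names = {'alpha', 'bravo', 'charlie', 'delta', 'echo'}\n"
    "print(next(iter(names)), sorted(names)[0])"
)


def pick_under_seed(seed: int) -> tuple:
    """Run SNIPPET in a child interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    iter_pick, sorted_pick = completed.stdout.split()
    return iter_pick, sorted_pick


results = {seed: pick_under_seed(seed) for seed in SEEDS}
iter_picks = {iter_pick for iter_pick, _ in results.values()}
sorted_picks = {sorted_pick for _, sorted_pick in results.values()}
```

The iteration-based pick may differ between seeds, while the sorted pick collapses to a single value; replacing the snippet with a pytest invocation would give a seed sweep similar to the one described in the test plan.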

Risk

Test-only change. No production code touched.

Fixes #297. `test_unicorn_company` was flaky because `get_company_tier()` checks FAANG_PLUS before UNICORNS, so a company in both sets resolves to 'faang_plus'. The old test took `next(iter(UNICORNS))`, which depends on PYTHONHASHSEED, and occasionally picked a FAANG+ unicorn — causing unrelated PRs (e.g. #295) to fail CI. Fix: select from `sorted(UNICORNS - FAANG_PLUS)` so the company is deterministic and is guaranteed to resolve to the 'unicorn' tier. Apply the same sort-for-determinism hardening to test_finance_sector_detected, test_defense_sector_detected, and test_company_can_overlap_tier_and_sector, so future tier-precedence refactors do not re-introduce flakiness. Verified across 10 PYTHONHASHSEED values (0..12345): 7/7 pass each time. Full suite: 718 passed.

gemini-code-assist

Code Review

This pull request improves the determinism of tests in tests/test_enrichment.py by sorting sets before selecting elements, preventing issues caused by Python's hash seed randomization. It also introduces pytest.skip to explicitly skip tests when required configurations are missing. The review feedback suggests reorganizing the imports at the top of the file to comply with PEP 8 guidelines by separating standard library and third-party imports.

gemini-code-assist · 2026-05-26T17:05:28Z

 import sys
 import os
 import math
+import pytest
 import requests
 from datetime import datetime, timedelta, timezone, date
 from unittest.mock import patch


According to PEP 8, imports should be grouped in the following order:

Standard library imports

Related third-party imports

Local application/library specific imports

Each group should be separated by a blank line. Currently, standard library imports (sys, os, math, datetime, unittest.mock) and third-party imports (pytest, requests) are mixed together.

Suggested change

-import sys
-import os
-import math
-import pytest
-import requests
-from datetime import datetime, timedelta, timezone, date
-from unittest.mock import patch
+import math
+import os
+import sys
+from datetime import datetime, timedelta, timezone, date
+from unittest.mock import patch
+
+import pytest
+import requests

References

Imports should be grouped in the following order: standard library imports, third-party imports, and local imports, with a blank line separating each group.

codecov · 2026-05-26T17:05:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Pre-commit's end-of-file-fixer hook (--all-files) fails on every PR until the file on main has a trailing newline. The file is regenerated by automation that strips it; a follow-up should fix the generator (separate scope from this PR).

Copilot

Pull request overview

This PR hardens get_company_tier()-related tests against Python set iteration nondeterminism (hash-seed dependent ordering), eliminating CI flakiness when a selected company happens to exist in multiple classification sets.

Changes:

Fix test_unicorn_company flakiness by selecting deterministically from sorted(UNICORNS - FAANG_PLUS).
Make sector/overlap tests deterministic by sorting candidate sets before selecting an element.
Replace silent “no-op” branches with explicit pytest.skip(...) so empty categories are visible in test output.
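The sort-then-skip pattern listed above might look roughly like this. pick_deterministic is a hypothetical helper introduced only for illustration (the PR may well inline the logic in each test); pytest.skip() and pytest.skip.Exception are the only pytest APIs used.

```python
import pytest


def pick_deterministic(candidates, reason):
    """Hypothetical helper: return the alphabetically first candidate.

    If the category is empty, skip the calling test loudly instead of
    letting it pass without asserting anything.
    """
    ordered = sorted(candidates)
    if not ordered:
        pytest.skip(reason)  # raises pytest.skip.Exception
    return ordered[0]


def test_finance_sector_detected_sketch():
    # FINANCE here is an invented placeholder, not the repo's config set.
    finance = {"Zeta Capital", "Acme Bank"}
    company = pick_deterministic(finance, "no finance companies configured")
    assert company == "Acme Bank"
```

Compared with a silent `if candidates:` branch, a skipped test shows up in the pytest summary, so an empty config category is noticed rather than hidden behind a green run.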

coderabbitai · 2026-05-26T17:07:47Z

Warning

Review limit reached

@ambicuity, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between 58fc6a1 and 1474a55.

📒 Files selected for processing (2)

docs/predictions.json
tests/test_enrichment.py

@rvac-bucky

## Why

`docs/market-history.json` has been silently empty-on-categories for the entire 90-day retention window. The aggregation step in `save_market_history()` reads the wrong key.

`scripts/update_jobs.py` (pre-fix):

```python
for job in jobs:
    for category in job.get('categories', []):  # plural list, never populated post-enrichment
        category_counts[category] += 1
```

But `enrich_jobs()` writes the singular form and never creates a `categories` list:

```python
category = categorize_job(title, description)
job['category'] = category  # {'id': 'software_engineering', 'name': ..., 'emoji': ...}
```

Reproduced against the live artifact:

```
$ python3 -c "import json; d=json.load(open('docs/market-history.json')); \
[print(s['date'], 'cats=', len(s['categories'])) for s in d['snapshots'][-5:]]"
2026-05-22 cats= 0
2026-05-23 cats= 0
2026-05-24 cats= 0
2026-05-25 cats= 0
2026-05-26 cats= 0
```

All 81 retained snapshots show `categories: {}`. The category breakdown that feeds any "growing/declining categories" analytic has been dead.

## What

New module-level helper `iter_category_ids(job)` (`scripts/update_jobs.py:2464`):

- Reads from `category.id` first (the enriched shape).
- Falls back to the legacy `categories` list **only** when the singular path is absent or invalid; handles `None`, `{}`, `{'id': None}`, `{'id': ''}`, non-list `categories` values, and non-string elements.
- Strips whitespace and skips empty values so malformed upstream data can't poison the Counter.

`save_market_history()` switches to the new helper.

## Tests

`tests/test_save_market_history.py` adds four targeted regressions:

- `test_counts_categories_correctly_from_singular_category_field`: the path that's broken on main today.
- `test_prefers_singular_category_field_over_legacy_categories_list`: guards against double-counting if both shapes coexist.
- `test_falls_back_to_legacy_categories_list_when_category_missing`: keeps old fixtures working.
- `test_falls_back_to_legacy_categories_when_category_payload_is_invalid`: partial-scrape recovery against four invalid singular shapes.

Existing snapshot-schema / retention / determinism assertions stay green.

## Validation

- `pytest tests/test_save_market_history.py`: 25 passed.
- `pytest` (full suite): 722 passed.
- `pre-commit run --all-files`: clean.
- End-to-end smoke (run real `enrich_jobs()` → `save_market_history()` on synthetic raw jobs, inspect resulting `docs/market-history.json`):

```
before fix: categories: {}
after fix:  categories: {'software_engineering': 2, 'data_ml': 2}
```

Counts equal the enriched job count, as expected.

## Provenance

This is the same scoped fix from #274 (closed without merge despite green CI), rebased cleanly onto current `main`. Original work by @rvac-bucky; author attribution preserved in the commit. The `docs/predictions.json` conflict from #274 doesn't re-occur because #298 already landed the trailing-newline normalization.

Closes #228

## Summary by CodeRabbit

* **Refactor**
  * Improved job category data normalization to support both legacy and enriched data formats, enhancing system robustness and backward compatibility.
* **Tests**
  * Updated test suite to validate consistent category data handling and standardized tier naming conventions.

---------

Co-authored-by: rvac-bucky <263012179+rvac-bucky@users.noreply.github.com>
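The behaviour described for `iter_category_ids(job)` in the commit message above could be sketched as follows. This is a reconstruction from the bullet points only, not the code at `scripts/update_jobs.py:2464`; the sample jobs are invented, and details such as the exact validation order are assumptions.

```python
from collections import Counter
from typing import Any, Dict, Iterator


def iter_category_ids(job: Dict[str, Any]) -> Iterator[str]:
    """Yield category ids for a job, preferring the enriched singular shape."""
    category = job.get("category")
    if isinstance(category, dict):
        cat_id = category.get("id")
        if isinstance(cat_id, str) and cat_id.strip():
            # Valid singular payload: use it and ignore any legacy list,
            # which avoids double-counting when both shapes coexist.
            yield cat_id.strip()
            return

    # Fallback: legacy plural list, tolerating malformed values.
    legacy = job.get("categories")
    if not isinstance(legacy, list):
        return
    for item in legacy:
        if isinstance(item, str) and item.strip():
            yield item.strip()


# Invented sample jobs covering the shapes named in the commit message.
jobs = [
    {"category": {"id": "software_engineering"}},
    {"category": {"id": " data_ml "}, "categories": ["legacy_only"]},
    {"category": None, "categories": ["data_ml", 3, "  "]},
    {"category": {"id": ""}, "categories": "not-a-list"},
]

category_counts = Counter(
    category_id for job in jobs for category_id in iter_category_ids(job)
)
```

Returning early after a valid singular id is what keeps the legacy list from being counted a second time; the last sample job contributes nothing because both of its shapes are invalid.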

Copilot AI review requested due to automatic review settings May 26, 2026 17:04

Copilot started reviewing on behalf of ambicuity May 26, 2026 17:04

gemini-code-assist Bot reviewed May 26, 2026

Copilot AI reviewed May 26, 2026

style: regroup imports per PEP 8 (stdlib / third-party)

e1a131d

Address gemini-code-assist review feedback (matches .gemini/styleguide.md). The pre-commit isort hook is scoped to scripts/*.py, so tests/ imports weren't auto-grouped.

github-advanced-security AI found potential problems May 26, 2026

Comment thread tests/test_enrichment.py Fixed

style: remove unused 'date' import (CodeQL: py/unused-import)

1474a55

Address github-advanced-security alert. 'date' was imported but never used in this file (all in-file occurrences are inside docstrings or as part of identifiers like format_posted_date).

ambicuity merged commit 1cd3848 into main May 26, 2026
7 checks passed

ambicuity deleted the fix/flaky-unicorn-test branch May 26, 2026 17:14

This was referenced May 26, 2026

fix: replace hardcoded timeout with DEFAULT_TIMEOUT constant #296

Merged

fix: restore market history category snapshots #274

Closed

fix: restore market history category snapshots #299

Merged

ambicuity merged 4 commits into main from fix/flaky-unicorn-test
