
Testing Guide (Phase 3)

📖 Testing Philosophy (CSC315 - System Analysis & Design)

Testing is essential for catching bugs before they reach production. By writing tests, we identify root causes early, fix them quickly, and ensure the system is production-ready.

Unit Tests vs Integration Tests

Unit Tests test individual components in isolation:

  • Example: Test FetchJobRepository.mark_completed() with a mocked database
  • Mock all dependencies (DB, external services)
  • Fast execution, easy to debug
  • Catch bugs within a single function/service

Integration Tests test multiple layers working together:

  • Example: Test entire run_fetch_cycle() with real repositories, real database, but mocked HTTP
  • Use real components (repos, models) but mock only external services
  • Slower, but catch bugs at component boundaries
  • Catch data flow issues, serialization bugs, validation problems that unit tests miss

Real-world example from your code:

  • Unit test catches: "Does FetchService.fetch_rss() parse RSS correctly?"
  • Integration test catches: "When scheduler fetches from source1 (succeeds) and source2 (fails), are jobs marked correctly AND posts persisted only for source1?"
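The unit-test side of this distinction can be sketched in miniature. This is a hedged simplification: `mark_job_done` is a hypothetical stand-in for a service method that delegates to `FetchJobRepository.mark_completed()`, and the repository is fully mocked, so no database is touched:

```python
import asyncio
from unittest.mock import AsyncMock

async def mark_job_done(repo, job_id):
    # Hypothetical service-layer helper: delegates persistence to the repo.
    await repo.mark_completed(job_id)
    return job_id

def test_mark_job_done_unit():
    # Unit test: the repository is an AsyncMock, so only the call
    # contract is verified, not actual persistence.
    repo = AsyncMock()
    result = asyncio.run(mark_job_done(repo, job_id=7))
    repo.mark_completed.assert_awaited_once_with(7)
    assert result == 7
```

An integration test would replace the `AsyncMock` with a real repository bound to a test database, which is what catches serialization and persistence bugs this unit test cannot see.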

Test Pyramid (CSC315)

         ┌─────────────────┐
         │  E2E Tests      │  (Few - test with real HTTP)
         │  (Slowest)      │
         └─────────────────┘
      ┌──────────────────────┐
      │ Integration Tests    │  (Some - real DB, mocked HTTP)
      │  (Moderate Speed)    │
      └──────────────────────┘
   ┌──────────────────────────────┐
   │  Unit Tests (Many)           │  (Most - isolated, fast)
   │  (Fastest - mocked deps)     │
   └──────────────────────────────┘

We write more unit tests (fast, cheap) and fewer integration tests (slower, more setup) because:

  • Unit tests catch 80% of bugs quickly
  • Integration tests catch the remaining 20%: bugs that only appear when components interact
  • E2E tests are expensive and slow, reserved for critical paths

🚀 How to Run Tests

# Run all tests
pytest tests/

# Run only unit tests
pytest tests/unit/

# Run only integration tests
pytest tests/integration/

# Verbose output (show each test)
pytest tests/ -v

# Show print statements (useful for debugging)
pytest tests/ -s

# Run specific test file
pytest tests/integration/test_scheduler_service_integration.py -v -s

# Run specific test function
pytest tests/integration/test_scheduler_service_integration.py::test_scheduler_integration_multiple_source_sucess -v -s

# Run with coverage report
pytest tests/ --cov=app --cov-report=html

📂 Test Organization

tests/
├── __init__.py
├── conftest.py                    # Shared fixtures for all tests
├── unit/                          # Unit tests (isolated components)
│   ├── test_fetch_job_repo.py
│   ├── test_fetch_service.py
│   ├── test_scheduler_service.py
│   └── test_auth_service.py
└── integration/                   # Integration tests (multiple layers)
    ├── test_scheduler_service_integration.py
    └── test_auth_routers.py

Principle (CSC315):

  • unit/ → Test single function/class in isolation; mock everything
  • integration/ → Test multiple layers together; use real DB, mock only external HTTP
  • conftest.py → Shared fixtures (database, content sources, posts, mocks)

🔧 Test Data & Fixtures (CSC318 - Async Testing)

All fixtures are defined in conftest.py and automatically available to all tests:

1. test_db_session (In-Memory SQLite)

@fixture
async def test_db_session():
    # Creates fresh in-memory database for each test
    # Drops all tables after test completes
    # Ensures zero test interference

Why SQLite in memory?

  • Fast (no disk I/O)
  • Isolated (each test gets clean database)
  • No external dependencies

2. test_content_source

Creates a test RSS feed source in the database:

  • name="Test Feed"
  • url="https://example.com/feed.xml"
  • type_of=SourceType.RSS
  • is_active=True

3. test_fetch_job

Creates a QUEUED job in the database, linked to a content source

4. test_post

Creates a sample post in the database with correct source_id foreign key

5. mock_posts

Provides sample PostCreate objects (not in database, used for mocking)

Async Fixture Pattern (CSC318):

from pytest_asyncio import fixture

@fixture
async def test_db_session():
    # setup: Create in-memory database
    ...
    yield session  # Provide to test
    # teardown: Drop all tables
    ...
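To make the setup/yield/teardown order concrete, here is a runnable sketch of the same pattern with a plain dict standing in for the real SQLAlchemy session (pytest-asyncio normally drives the generator for you; `demo()` below does it by hand):

```python
import asyncio

async def test_db_session():
    # setup: a stand-in for creating the in-memory database
    session = {"tables": ["fetch_jobs", "posts"], "open": True}
    try:
        yield session            # provide the session to the test
    finally:
        session["open"] = False  # teardown: drop tables / close connection

async def demo():
    gen = test_db_session()
    session = await gen.__anext__()  # setup runs up to the yield
    assert session["open"]           # the test body would run here
    try:
        await gen.__anext__()        # resuming past the yield runs teardown
    except StopAsyncIteration:
        pass
    return session["open"]

print(asyncio.run(demo()))  # False: teardown ran after the test body
```

The `try/finally` around the `yield` guarantees teardown runs even if the test body raises, which is what keeps each test's database isolated.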

📝 How to Add New Tests

Adding a Unit Test

Example: Test FetchService error handling

# tests/unit/test_fetch_service.py

@pytest.mark.asyncio
async def test_fetch_rss_timeout(mocker):
    """
    Test: FetchService raises TimeoutError when the request exceeds the timeout
    """
    # Setup: Create a mock HTTP client whose get() call times out
    # (mocker is the pytest-mock fixture, injected via the parameter above)
    mock_aclient = mocker.AsyncMock(spec=httpx.AsyncClient)
    mock_aclient.get.side_effect = asyncio.TimeoutError()

    # Exercise + Verify: pytest.raises fails the test if no TimeoutError is raised
    service = FetchService(http_client=mock_aclient)
    with pytest.raises(asyncio.TimeoutError):
        await service.fetch_rss(...)

Key pattern (CSC315 - Black-box testing):

  1. Mock all dependencies
  2. Test the function in isolation
  3. Verify behavior with assertions

Adding an Integration Test

Example: Test scheduler with 3 sources (2 succeed, 1 fails)

# tests/integration/test_scheduler_service_integration.py

@pytest.mark.asyncio
async def test_scheduler_three_sources_mixed():
    """
    Test: Scheduler processes 3 sources; 2 succeed, 1 fails.
    Verify: Successful jobs marked COMPLETED, failed job marked FAILED,
            only successful sources' posts persisted.
    """
    # Step 1: Create 3 real sources in test DB
    source1 = await content_source_repo.create(...)
    source2 = await content_source_repo.create(...)
    source3 = await content_source_repo.create(...)
    
    # Step 2: Mock FetchService with side_effect for each source
    mock_fetch_service.fetch_rss.side_effect = [
        [posts_for_source1],           # Source 1 succeeds
        [posts_for_source2],           # Source 2 succeeds
        ValueError("Feed failed"),     # Source 3 fails
    ]
    
    # Step 3: Create service with real repos, mocked HTTP
    service = SchedulerService(
        fetch_job_repo=real_repo,
        post_repo=real_repo,
        fetch_service=mock_fetch_service,  # Mocked
    )
    
    # Step 4: Execute
    await service.run_fetch_cycle()
    
    # Step 5: Verify all 3 jobs created, 2 COMPLETED, 1 FAILED
    job1 = await fetch_job_repo.get_latest_for_source(source1.id)
    job2 = await fetch_job_repo.get_latest_for_source(source2.id)
    job3 = await fetch_job_repo.get_latest_for_source(source3.id)
    
    assert job1.status == FetchJobStatus.COMPLETED
    assert job2.status == FetchJobStatus.COMPLETED
    assert job3.status == FetchJobStatus.FAILED
    
    # Step 6: Verify posts persisted only for sources 1 & 2
    posts = await post_repo.get_all()
    assert len(posts) == 4  # 2 posts from source1 + 2 from source2
    assert all(p.source_id in [source1.id, source2.id] for p in posts)

Key pattern (CSC315 - Integration):

  1. Create real objects in test DB
  2. Mock only external services (HTTP)
  3. Execute the full flow
  4. Verify persistence + state transitions
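The `side_effect` list used in Step 2 relies on a documented `unittest.mock` behavior: return values in the iterable are yielded call by call, and any exception instance in the list is raised instead of returned. A self-contained sketch (source names and post strings here are placeholders, not the project's real data):

```python
import asyncio
from unittest.mock import AsyncMock

async def run_demo():
    fetch = AsyncMock()
    # One entry per expected call: lists are returned, the
    # ValueError instance is raised on the third call.
    fetch.fetch_rss.side_effect = [
        ["post-a", "post-b"],        # source 1 succeeds
        ["post-c"],                  # source 2 succeeds
        ValueError("Feed failed"),   # source 3 fails
    ]
    results, failures = [], []
    for source in ("s1", "s2", "s3"):
        try:
            results.extend(await fetch.fetch_rss(source))
        except ValueError as exc:
            failures.append((source, str(exc)))
    return results, failures

results, failures = asyncio.run(run_demo())
print(results)   # ['post-a', 'post-b', 'post-c']
print(failures)  # [('s3', 'Feed failed')]
```

This is why one mocked `FetchService` can simulate a mixed success/failure cycle deterministically, with no real HTTP involved.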

📊 Coverage Goals

Target: 80%+ code coverage

Run coverage report:

pytest tests/ --cov=app --cov-report=html

Then open htmlcov/index.html in a browser to visualize covered/uncovered lines.

Focus coverage on:

  • ✅ State transitions (QUEUED → ONGOING → COMPLETED/FAILED)
  • ✅ Error handling (what happens when fetch fails?)
  • ✅ Data persistence (do repos actually save to DB?)
  • ✅ Edge cases (empty feeds, malformed XML, timeouts)
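The state-transition checks above can be driven from a small transition table. The enum mirrors the project's `FetchJobStatus` (the string values and the table itself are assumptions for illustration, not the project's actual implementation):

```python
from enum import Enum

class FetchJobStatus(Enum):
    # Assumed values; mirrors the project's FetchJobStatus enum
    QUEUED = "queued"
    ONGOING = "ongoing"
    COMPLETED = "completed"
    FAILED = "failed"

# Hypothetical transition table: QUEUED → ONGOING → COMPLETED/FAILED,
# with COMPLETED and FAILED as terminal states.
ALLOWED = {
    FetchJobStatus.QUEUED: {FetchJobStatus.ONGOING},
    FetchJobStatus.ONGOING: {FetchJobStatus.COMPLETED, FetchJobStatus.FAILED},
    FetchJobStatus.COMPLETED: set(),
    FetchJobStatus.FAILED: set(),
}

def can_transition(src, dst):
    return dst in ALLOWED[src]

assert can_transition(FetchJobStatus.QUEUED, FetchJobStatus.ONGOING)
assert not can_transition(FetchJobStatus.COMPLETED, FetchJobStatus.QUEUED)
```

A table like this makes it cheap to parametrize one test over every legal and illegal transition instead of writing a separate test per pair.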

Don't obsess over 100% coverage — sometimes code is untested for good reason (e.g., error paths we can't easily trigger).


⚠️ Known Limitations & Future Work (CSC317 - Simulation)

Current MVP Limitations

  1. Sleep-based timestamp ordering (Phase 4 will improve)

    • We use asyncio.sleep(0.3) delays to ensure distinct created_at timestamps
    • Reason: Prevents ordering ambiguity when multiple jobs created in same millisecond
    • Future: Use UUID-based ordering (more robust, deterministic)
  2. No distributed scheduler testing

    • Currently tests single-process scheduler
    • Future: Test with Celery + Redis for concurrent fetch jobs
  3. No E2E tests with real HTTP

    • All tests mock HTTP (deterministic, fast)
    • Future: Add E2E tests against real RSS feeds (integration environment only)

Performance Considerations (CSC314)

  • Tests use in-memory SQLite (fast but not production-like)
  • Large dataset tests (1000+ posts) not yet included
  • Query performance optimization deferred to Phase 4

📚 Course Integration (Your Learning Path)

Course   | Concept                                            | Application
---------|----------------------------------------------------|------------------------------------------------------------
CSC315   | Black-box testing, test isolation, DTO pattern     | Unit tests mock all deps; integration tests use real repos
CSC318   | Async testing, pytest-asyncio, fixture management  | All tests use @pytest.mark.asyncio and async fixtures
CSC314   | Edge cases, algorithm robustness                   | Tests cover empty feeds, malformed XML, timeouts
CSC317   | State machine testing, simulation                  | Tests verify all FetchJob state transitions

🔗 Related Files

Test Files:

Implementation Files:


Last Updated: January 12, 2026
Phase 3 Status: ✅ 100% Complete
Total Tests: 13+ passing (6 unit FetchJobRepo + 5 unit FetchService + 2 unit SchedulerService + 2 integration)