Skip to content

feat(optimizer): [3/N] Analyzer#533

Open
mkuchenbecker wants to merge 198 commits into
mainfrom
mkuchenb/optimizer-3
Open

feat(optimizer): [3/N] Analyzer#533
mkuchenbecker wants to merge 198 commits into
mainfrom
mkuchenb/optimizer-3

Conversation

@mkuchenbecker
Copy link
Copy Markdown
Collaborator

@mkuchenbecker mkuchenbecker commented Apr 7, 2026

Optimizer Stack

PR Content
#527 Data Model
#530 Database Repos
#531 REST service
#533 (this) Analyzer app
#534 Scheduler app
#599 Spark BatchedOFD app
#tbd Infra, docker-compose, smoke test

Summary

PR 3 of N in the optimizer stack.

Introduces apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table in table_stats against pluggable OperationAnalyzer strategies. The first strategy, OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker.

Key design choices:

  • Bulk-loads operations and history into maps (one query per type), then iterates the stats list — O(types) queries, not O(tables).
  • Uses the existing generic find() repository methods with null params.
  • Pure unit tests with Mockito — no Spring context needed.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Core: AnalyzerRunner — loads table_stats, pre-loads operations and history into maps, evaluates each table against all analyzers, circuit breaker logic.

Strategy interface: OperationAnalyzerisEnabled(table), shouldSchedule(table, currentOp, latestHistory), getCircuitBreakerThreshold().

Cadence policy: CadencePolicy — encapsulates time-based retry logic shared across operation types.

OFD analyzer: OrphanFilesDeletionAnalyzer — enabled via maintenance.optimizer.ofd.enabled table property.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

25 unit tests:

  • AnalyzerRunnerTest (7 tests) — eligible table insertion, cadence skip, disabled table, shouldSchedule=false, null UUID, circuit breaker trip, below-threshold pass
  • OrphanFilesDeletionAnalyzerTest (18 tests) — isEnabled variants, shouldSchedule for no-op/PENDING/SCHEDULING/SCHEDULED with history combinations
./gradlew :apps:optimizer-analyzer:test
# BUILD SUCCESSFUL — 25 tests pass

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

mkuchenbecker and others added 14 commits April 3, 2026 11:00
Introduces the optimizer service module with:
- MySQL/H2 schema for table_operations, table_stats, table_stats_history,
  and table_operations_history
- JPA entities with JSON column support (vladmihalcea hibernate-types)
- All model/DTO/enum types: OperationType, OperationStatus, TableStats,
  CompleteOperationRequest, JobResult, OperationMetrics, etc.
- JPA AttributeConverters for JobResult and OperationMetrics JSON columns
- MapStruct mapper (OptimizerMapper) for entity→DTO conversion
- Spring Boot application shell and build wiring (settings.gradle,
  build.gradle dockerPrereqs)

No repositories, controllers, or service layer yet — those follow in
subsequent PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read
  directly from table_stats instead of duplicating into operations
- Remove orphanFilesDeleted/orphanBytesDeleted from history entity,
  DTO, and schema; operation-specific data belongs in the result JSON
- Add addedSizeBytes to CommitDelta for tracking write volume
- Fix OperationType javadoc to describe current state, not roadmap
- Fix TableOperationsHistoryRow javadoc: written on operation
  complete, not by Spark app directly
- Add field comments to all DTOs and request objects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spring Data JPA repositories for all four optimizer tables with
filtered query support. Includes tests exercising save/find,
filtered queries, upsert semantics, and append-only history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR review comments: rename findFiltered → find across all repos,
remove redundant findByTableUuid/findByTableUuidSince from history repos,
add explicit assertion to context test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Service interface and implementation for all optimizer CRUD operations
including complete-operation lifecycle, stats upsert with history
double-write, and filtered queries. Three REST controllers expose the
endpoints. The apps/optimizer shared module provides lightweight
entity/repo copies for the analyzer and scheduler apps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align OptimizerDataServiceImpl with renamed repository methods from
optimizer-1 review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shared JPA entities and repositories for optimizer apps (analyzer,
scheduler). All repos expose a single find method with optional filters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve repo conflicts by taking optimizer-1's clean find-only versions.
Scheduler-specific methods and streamAll removed per review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These fields never belonged in the data model — remove them at the
source rather than adding then deleting in a later PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H2 integration tests for OptimizerDataServiceImpl covering
completeOperation (write history, not-found) and upsertTableStats
(create, update, history append).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strengthen upsertTableStats test to verify history rows contain the raw
delta stats from each call, not just the row count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ling

Introduces apps/optimizer-analyzer, a Spring Boot CommandLineRunner that
evaluates every table in table_stats against pluggable OperationAnalyzer
strategies. The first strategy, OrphanFilesDeletionAnalyzer, schedules
OFD operations with 24h success / 1h failure retry cadence, a 6h
SCHEDULED timeout, and a 5-strike circuit breaker.

Key design choices:
- Bulk-loads operations and history into maps (one query per type),
  then iterates the stats list — O(types) queries, not O(tables).
- Uses the existing generic find() repository methods with null params.
- Pure unit tests with Mockito — no Spring context needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mkuchenbecker and others added 9 commits May 20, 2026 15:55
Repo public API now:
  - find(...) with Optional<T> filters + required Pageable, on all four repos
  - updateBatch(ids, fromStatus, toStatus, Optional<Instant> scheduledAt,
                Optional<String> jobId) — replaces markSchedulingBatch,
    markScheduledBatch, markPendingBatch
  - cancel(ids) — replaces cancelDuplicatePendingBatch; deletes by-id with
    a defensive PENDING-only gate
  - findLatest(opType, Pageable) — was findLatestPerTable
  - history.find(tableUuid, Pageable) — was findByTableUuidOrderByCompletedAtDesc

Side-effect columns on updateBatch use COALESCE with Optional.empty() →
leave-unchanged. scheduledAt is not cleared on SCHEDULING → PENDING revert;
status is the source of truth and the watermark is overwritten on the next
claim.

@Modifying queries get flushAutomatically + clearAutomatically so the L1
cache reflects the change immediately (caught by the unit tests).

Spring Data @query can't share an "IS NULL OR IN :list" pattern (Hibernate
expands the list inline and the IS NULL check turns ungrammatical). The
find path uses two internal queries dispatched by the default method —
one with the ids predicate, one without.

Callers (service, analyzer, scheduler) update on opt-2..opt-4 in
follow-up commits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OptimizerDataServiceImpl pipes Optional<T> filters straight through to
the repo (no .orElse(null) at the boundary). Adds the
optimizer.repo.default-limit config property and threads it into the
list-shaped calls. Service-impl test updates the one direct
statsHistoryRepository.find(...) call to pass Optional.empty().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AnalyzerRunner now passes Optional<T> filters to the repos and a
PageRequest backed by optimizer.repo.default-limit. findLatestPerTable
becomes findLatest with a Pageable. Test stubs updated to use
ArgumentMatchers (eq/any) for the Optional-typed params.

defaultLimit is initialised inline so pure-Mockito tests (no Spring
context) get a sane value when @value isn't injected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instant.now() on Linux CI carries nanoseconds; MySQL TIMESTAMP(6) and
H2 in MySQL mode store microseconds. The scheduledAt = :scheduledAt
predicate in find(...) compared nano-resolution param against
micro-resolution stored value and missed. Local (macOS, micro-only)
hid the bug.

Truncate to ChronoUnit.MICROS at write time in the one repo test that
exercises the watermark round-trip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker added a commit that referenced this pull request May 21, 2026
## Optimizer Stack

| PR | Content |
|---|---|
| #527 | Data Model |
| #530 **(this)** | Database
Repos |
| #531 | REST service |
| #533 | Analyzer app |
| #534 | Scheduler app |
| #tbd | Spark BatchedOFD app |
| #tbd | Infra, docker-compose, smoke test |

## Summary

PR 1 of N in the optimizer stack. 
[Overall
Project](https://docs.google.com/document/d/1oGQWkmlVw0HG-D4Nx37q0oUEQd53Ni4q5Q7h1voaZIQ/edit?tab=t.0#heading=h.vtl818e4m9f7)
[Service Design
doc](https://docs.google.com/document/d/1xYOD7iuPjZO05UWfT1Dkmf4lYNw4iOjqY10UT2X1FAg/edit?tab=t.0).


Spring Data JPA repositories for all four optimizer tables with filtered
query support, plus tests exercising save/find, filtered queries, upsert
semantics, and append-only history.

## Changes

- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [ ] Bug Fixes
- [x] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [x] Tests

**Repositories**: `TableOperationsRepository`,
`TableOperationsHistoryRepository`, `TableStatsRepository`,
`TableStatsHistoryRepository` — each with JPQL filtered query methods.

**Tests**: Repository tests for all four tables plus
`OptimizerServiceContextTest` verifying the Spring context loads.

## Testing Done

- [ ] Manually Tested on local docker setup. Please include commands
ran, and their output.
- [x] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please
feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in
production. Please explain.

`./gradlew :services:optimizer:test` — all tests pass (H2 in MySQL
mode).

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [x] Large PR broken into smaller PRs, and PR plan linked in the
description.

---------

Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
mkuchenbecker and others added 3 commits May 21, 2026 15:37
All four list-style endpoints now require a caller-supplied limit. No
server-side default, no max — getting the API contract right comes first;
bounds can land separately.

- TableOperationsController.listTableOperations: @RequestParam int limit
- TableStatsController.listTableStats: @RequestParam int limit
- TableStatsController.getStatsHistory: drop defaultValue="100"
- TableOperationsHistoryController.getHistory: drop defaultValue="100"
- OptimizerDataService: listTableOperations / listTableStats gain int limit
- OptimizerDataServiceImpl: drop @value("${optimizer.repo.default-limit}")
  and the defaultLimit field; thread caller-supplied limit straight to
  PageRequest.of(0, limit), which cascades to MySQL LIMIT n.
- application.properties: remove now-unused optimizer.repo.default-limit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker changed the title [BDP-102028] feat(optimizer): [3/N] Analyzer feat(optimizer): [3/N] Analyzer May 22, 2026
mkuchenbecker and others added 10 commits May 22, 2026 10:11
Every optimizer endpoint now returns a uniform ApiError body
({code, message, path}) on non-2xx. Reshape updateOperation so the
operationId lives in the URL (and is repeated in the body for
self-describing payloads).

- api/error/ApiError.java — DTO shape.
- api/error/GlobalExceptionHandler.java — @RestControllerAdvice mapping
  framework exceptions to {VALIDATION_ERROR, INVALID_PARAMETER,
  MISSING_PARAMETER, MALFORMED_REQUEST, INTERNAL_ERROR}. Reason of
  ResponseStatusException is parsed as "CODE: message" for endpoint-
  specific 404s.
- Controllers: orElseThrow(ResponseStatusException) replaces bare 404.
  TableOperationsController moves updateOperation to
  POST /v1/optimizer/operations/{id}/update; rejects with 400
  PATH_BODY_MISMATCH when body.operationId != path.id.
- UpdateOperationRequest: @notblank operationId, @NotNull status.
- UpsertTableStatsRequest: @notblank databaseName, tableName.
- spring-boot-starter-validation dep added.
- New ControllerErrorHandlingTest: 13 MockMvc cases covering 404 / 400
  validation / 400 type-mismatch / 400 missing-param / 400 malformed-
  body / 400 path-body-mismatch + happy-path sanity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address review comments on PR #596 — minimal diff, no bean validation,
no per-framework-exception handlers, scoped to the optimizer controllers.

- @RestControllerAdvice scoped to the three optimizer controllers via
  assignableTypes; no longer global.
- Two handlers only: ResponseStatusException → ApiError with code =
  status.name() + reason as message; Exception → 500 INTERNAL_ERROR.
  Drop MethodArgumentNotValidException, MethodArgumentTypeMismatchException,
  MissingServletRequestParameterException, HttpMessageNotReadableException —
  Spring's defaults handle those.
- Drop the "CODE: message" reason-parsing convention.
- Drop @notblank / @NotNull on UpdateOperationRequest and
  UpsertTableStatsRequest; drop @Valid on controllers; drop
  spring-boot-starter-validation dep. Validate operationId / status server-
  side in TableOperationsController.updateOperation — loose-coupling so
  relaxing required fields later doesn't break wire callers.
- String.format throughout; no message concatenation.
- ControllerErrorHandlingTest trimmed from 13 cases to 7: only what the
  controllers actually own (404s, server-side validation on updateOperation,
  happy-path sanity). Framework-level 4xx left to Spring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ng defaults

The custom advice was producing a body shape (ApiError {code, message,
path}) that duplicated Spring Boot's default error JSON ({timestamp,
status, error, message, path}). The only substantive difference was
that Spring Boot 2.7 omits the `message` field by default.

Replace the custom advice with a one-line config:
  server.error.include-message=always

Now ResponseStatusException reasons (e.g. "no operation with id X")
reach the caller via Spring's default error body, no custom code.

- Delete api/error/GlobalExceptionHandler.java
- Delete api/error/ApiError.java
- application.properties: server.error.include-message=always
- ControllerErrorHandlingTest assertions trimmed to status-code-only
  (MockMvc does not trigger Spring's error-dispatch to
  BasicErrorController, so body assertions cannot be made in tests
  even though the body is populated on real HTTP requests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per review — no change needed on this file. The path/body validation
lives in the controller; the DTO carries the same fields as before with
the existing javadoc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address review on PR #531 — each endpoint lists the HTTP response codes
it actually returns, in the same "Resource ACTION: STATUS" style used
by services/tables.

Codes per endpoint:
- POST /operations/{id}/update — 201, 400, 404
- GET  /operations/{id}        — 200, 404
- GET  /operations             — 200, 400
- POST /operations-history     — 201
- GET  /operations-history/{u} — 200, 400
- PUT  /stats/{u}              — 200
- GET  /stats/{u}              — 200, 404
- GET  /stats                  — 200, 400
- GET  /stats/{u}/history      — 200, 400

Annotations only — no runtime behavior change, no new tests required.
swagger-annotations 2.1.11 is already on the optimizer classpath via
openhouse.springboot-conventions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker added a commit that referenced this pull request May 22, 2026
## Optimizer Stack

| PR | Content |
|---|---|
| #527 | Data Model |
| #530 | Database Repos |
| #531 **(this)** | REST
service |
| #533 | Analyzer app |
| #534 | Scheduler app |
| #tbd | Spark BatchedOFD app |
| #tbd | Infra, docker-compose, smoke test |

## Summary

PR 2 of N in the optimizer stack. 

Service layer and REST controllers for the optimizer service, plus the
`apps/optimizer` shared module providing lightweight entity/repo copies
for the analyzer and scheduler apps.

## Changes

- [ ] Client-facing API Changes
- [x] Internal API Changes
- [ ] Bug Fixes
- [x] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [x] Tests

**Service layer**: `OptimizerDataService` interface and
`OptimizerDataServiceImpl` — CRUD operations, complete-operation
lifecycle, stats upsert with history double-write, filtered queries.

**Controllers**: `TableOperationsController`,
`TableOperationsHistoryController`, `TableStatsController` — REST
endpoints per the design doc API spec.

**Shared module** (`apps/optimizer`): Lightweight entity and repository
copies used by the analyzer and scheduler apps to read optimizer state
directly from MySQL.

## Testing Done

- [ ] Manually Tested on local docker setup. Please include commands
ran, and their output.
- [x] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please
feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in
production. Please explain.

H2 integration tests in `OptimizerDataServiceImplTest` (5 tests):
- `completeOperation_writesHistoryFromOperationRow` — saves SCHEDULED
row, completes it, asserts history DTO fields
- `completeOperation_notFound_returnsEmpty` — completes nonexistent ID,
asserts empty
- `upsertTableStats_createsNewRow` — upserts new table, asserts DTO and
repo row
- `upsertTableStats_updatesExistingRow` — upserts twice, asserts
overwrite with single row
- `upsertTableStats_appendsHistoryOnEveryCall` — upserts twice, asserts
2 history rows

```
./gradlew :services:optimizer:test
# BUILD SUCCESSFUL — all 25 tests pass (repo tests from PR 1 + 5 new service tests)
```

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [x] Large PR broken into smaller PRs, and PR plan linked in the
description.

---------

Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@abhisheknath2011
Copy link
Copy Markdown
Member

Regarding the directory structure, can we make follow two options:

  • Root project can be apps/optimizer/analyzer
  • Root project can be existing one apps/spark and source dir can reflect optimizer.analyzer in the package name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants