feat(optimizer): [3/N] Analyzer#533
Open
mkuchenbecker wants to merge 198 commits into
Open
Conversation
Introduces the optimizer service module with: - MySQL/H2 schema for table_operations, table_stats, table_stats_history, and table_operations_history - JPA entities with JSON column support (vladmihalcea hibernate-types) - All model/DTO/enum types: OperationType, OperationStatus, TableStats, CompleteOperationRequest, JobResult, OperationMetrics, etc. - JPA AttributeConverters for JobResult and OperationMetrics JSON columns - MapStruct mapper (OptimizerMapper) for entity→DTO conversion - Spring Boot application shell and build wiring (settings.gradle, build.gradle dockerPrereqs) No repositories, controllers, or service layer yet — those follow in subsequent PRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read directly from table_stats instead of duplicating into operations - Remove orphanFilesDeleted/orphanBytesDeleted from history entity, DTO, and schema; operation-specific data belongs in the result JSON - Add addedSizeBytes to CommitDelta for tracking write volume - Fix OperationType javadoc to describe current state, not roadmap - Fix TableOperationsHistoryRow javadoc: written on operation complete, not by Spark app directly - Add field comments to all DTOs and request objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spring Data JPA repositories for all four optimizer tables with filtered query support. Includes tests exercising save/find, filtered queries, upsert semantics, and append-only history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR review comments: rename findFiltered → find across all repos, remove redundant findByTableUuid/findByTableUuidSince from history repos, add explicit assertion to context test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Service interface and implementation for all optimizer CRUD operations including complete-operation lifecycle, stats upsert with history double-write, and filtered queries. Three REST controllers expose the endpoints. The apps/optimizer shared module provides lightweight entity/repo copies for the analyzer and scheduler apps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align OptimizerDataServiceImpl with renamed repository methods from optimizer-1 review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shared JPA entities and repositories for optimizer apps (analyzer, scheduler). All repos expose a single find method with optional filters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve repo conflicts by taking optimizer-1's clean find-only versions. Scheduler-specific methods and streamAll removed per review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These fields never belonged in the data model — remove them at the source rather than adding then deleting in a later PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H2 integration tests for OptimizerDataServiceImpl covering completeOperation (write history, not-found) and upsertTableStats (create, update, history append). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strengthen upsertTableStats test to verify history rows contain the raw delta stats from each call, not just the row count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ling Introduces apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table in table_stats against pluggable OperationAnalyzer strategies. The first strategy, OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker. Key design choices: - Bulk-loads operations and history into maps (one query per type), then iterates the stats list — O(types) queries, not O(tables). - Uses the existing generic find() repository methods with null params. - Pure unit tests with Mockito — no Spring context needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
Repo public API now:
- find(...) with Optional<T> filters + required Pageable, on all four repos
- updateBatch(ids, fromStatus, toStatus, Optional<Instant> scheduledAt,
Optional<String> jobId) — replaces markSchedulingBatch,
markScheduledBatch, markPendingBatch
- cancel(ids) — replaces cancelDuplicatePendingBatch; deletes by-id with
a defensive PENDING-only gate
- findLatest(opType, Pageable) — was findLatestPerTable
- history.find(tableUuid, Pageable) — was findByTableUuidOrderByCompletedAtDesc
Side-effect columns on updateBatch use COALESCE with Optional.empty() →
leave-unchanged. scheduledAt is not cleared on SCHEDULING → PENDING revert;
status is the source of truth and the watermark is overwritten on the next
claim.
@Modifying queries get flushAutomatically + clearAutomatically so the L1
cache reflects the change immediately (caught by the unit tests).
Spring Data @query can't share an "IS NULL OR IN :list" pattern (Hibernate
expands the list inline and the IS NULL check turns ungrammatical). The
find path uses two internal queries dispatched by the default method —
one with the ids predicate, one without.
Callers (service, analyzer, scheduler) update on opt-2..opt-4 in
follow-up commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OptimizerDataServiceImpl pipes Optional<T> filters straight through to the repo (no .orElse(null) at the boundary). Adds the optimizer.repo.default-limit config property and threads it into the list-shaped calls. Service-impl test updates the one direct statsHistoryRepository.find(...) call to pass Optional.empty(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AnalyzerRunner now passes Optional<T> filters to the repos and a PageRequest backed by optimizer.repo.default-limit. findLatestPerTable becomes findLatest with a Pageable. Test stubs updated to use ArgumentMatchers (eq/any) for the Optional-typed params. defaultLimit is initialised inline so pure-Mockito tests (no Spring context) get a sane value when @value isn't injected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instant.now() on Linux CI carries nanoseconds; MySQL TIMESTAMP(6) and H2 in MySQL mode store microseconds. The scheduledAt = :scheduledAt predicate in find(...) compared nano-resolution param against micro-resolution stored value and missed. Local (macOS, micro-only) hid the bug. Truncate to ChronoUnit.MICROS at write time in the one repo test that exercises the watermark round-trip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker
added a commit
that referenced
this pull request
May 21, 2026
## Optimizer Stack | PR | Content | |---|---| | #527 | Data Model | | #530 **(this)** | Database Repos | | #531 | REST service | | #533 | Analyzer app | | #534 | Scheduler app | | #tbd | Spark BatchedOFD app | | #tbd | Infra, docker-compose, smoke test | ## Summary PR 1 of N in the optimizer stack. [Overall Project](https://docs.google.com/document/d/1oGQWkmlVw0HG-D4Nx37q0oUEQd53Ni4q5Q7h1voaZIQ/edit?tab=t.0#heading=h.vtl818e4m9f7) [Service Design doc](https://docs.google.com/document/d/1xYOD7iuPjZO05UWfT1Dkmf4lYNw4iOjqY10UT2X1FAg/edit?tab=t.0). Spring Data JPA repositories for all four optimizer tables with filtered query support, plus tests exercising save/find, filtered queries, upsert semantics, and append-only history. ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [x] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [x] Tests **Repositories**: `TableOperationsRepository`, `TableOperationsHistoryRepository`, `TableStatsRepository`, `TableStatsHistoryRepository` — each with JPQL filtered query methods. **Tests**: Repository tests for all four tables plus `OptimizerServiceContextTest` verifying the Spring context loads. ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [x] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. `./gradlew :services:optimizer:test` — all tests pass (H2 in MySQL mode). # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [x] Large PR broken into smaller PRs, and PR plan linked in the description. --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
All four list-style endpoints now require a caller-supplied limit. No server-side default, no max — getting the API contract right comes first; bounds can land separately. - TableOperationsController.listTableOperations: @RequestParam int limit - TableStatsController.listTableStats: @RequestParam int limit - TableStatsController.getStatsHistory: drop defaultValue="100" - TableOperationsHistoryController.getHistory: drop defaultValue="100" - OptimizerDataService: listTableOperations / listTableStats gain int limit - OptimizerDataServiceImpl: drop @value("${optimizer.repo.default-limit}") and the defaultLimit field; thread caller-supplied limit straight to PageRequest.of(0, limit), which cascades to MySQL LIMIT n. - application.properties: remove now-unused optimizer.repo.default-limit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Every optimizer endpoint now returns a uniform ApiError body
({code, message, path}) on non-2xx. Reshape updateOperation so the
operationId lives in the URL (and is repeated in the body for
self-describing payloads).
- api/error/ApiError.java — DTO shape.
- api/error/GlobalExceptionHandler.java — @RestControllerAdvice mapping
framework exceptions to {VALIDATION_ERROR, INVALID_PARAMETER,
MISSING_PARAMETER, MALFORMED_REQUEST, INTERNAL_ERROR}. Reason of
ResponseStatusException is parsed as "CODE: message" for endpoint-
specific 404s.
- Controllers: orElseThrow(ResponseStatusException) replaces bare 404.
TableOperationsController moves updateOperation to
POST /v1/optimizer/operations/{id}/update; rejects with 400
PATH_BODY_MISMATCH when body.operationId != path.id.
- UpdateOperationRequest: @notblank operationId, @NotNull status.
- UpsertTableStatsRequest: @notblank databaseName, tableName.
- spring-boot-starter-validation dep added.
- New ControllerErrorHandlingTest: 13 MockMvc cases covering 404 / 400
validation / 400 type-mismatch / 400 missing-param / 400 malformed-
body / 400 path-body-mismatch + happy-path sanity.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address review comments on PR #596 — minimal diff, no bean validation, no per-framework-exception handlers, scoped to the optimizer controllers. - @RestControllerAdvice scoped to the three optimizer controllers via assignableTypes; no longer global. - Two handlers only: ResponseStatusException → ApiError with code = status.name() + reason as message; Exception → 500 INTERNAL_ERROR. Drop MethodArgumentNotValidException, MethodArgumentTypeMismatchException, MissingServletRequestParameterException, HttpMessageNotReadableException — Spring's defaults handle those. - Drop the "CODE: message" reason-parsing convention. - Drop @notblank / @NotNull on UpdateOperationRequest and UpsertTableStatsRequest; drop @Valid on controllers; drop spring-boot-starter-validation dep. Validate operationId / status server- side in TableOperationsController.updateOperation — loose-coupling so relaxing required fields later doesn't break wire callers. - String.format throughout; no message concatenation. - ControllerErrorHandlingTest trimmed from 13 cases to 7: only what the controllers actually own (404s, server-side validation on updateOperation, happy-path sanity). Framework-level 4xx left to Spring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ng defaults
The custom advice was producing a body shape (ApiError {code, message,
path}) that duplicated Spring Boot's default error JSON ({timestamp,
status, error, message, path}). The only substantive difference was
that Spring Boot 2.7 omits the `message` field by default.
Replace the custom advice with a one-line config:
server.error.include-message=always
Now ResponseStatusException reasons (e.g. "no operation with id X")
reach the caller via Spring's default error body, no custom code.
- Delete api/error/GlobalExceptionHandler.java
- Delete api/error/ApiError.java
- application.properties: server.error.include-message=always
- ControllerErrorHandlingTest assertions trimmed to status-code-only
(MockMvc does not trigger Spring's error-dispatch to
BasicErrorController, so body assertions cannot be made in tests
even though the body is populated on real HTTP requests).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per review — no change needed on this file. The path/body validation lives in the controller; the DTO carries the same fields as before with the existing javadoc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…chenb/optimizer-2
Address review on PR #531 — each endpoint lists the HTTP response codes it actually returns, in the same "Resource ACTION: STATUS" style used by services/tables. Codes per endpoint: - POST /operations/{id}/update — 201, 400, 404 - GET /operations/{id} — 200, 404 - GET /operations — 200, 400 - POST /operations-history — 201 - GET /operations-history/{u} — 200, 400 - PUT /stats/{u} — 200 - GET /stats/{u} — 200, 404 - GET /stats — 200, 400 - GET /stats/{u}/history — 200, 400 Annotations only — no runtime behavior change, no new tests required. swagger-annotations 2.1.11 is already on the optimizer classpath via openhouse.springboot-conventions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker
added a commit
that referenced
this pull request
May 22, 2026
## Optimizer Stack | PR | Content | |---|---| | #527 | Data Model | | #530 | Database Repos | | #531 **(this)** | REST service | | #533 | Analyzer app | | #534 | Scheduler app | | #tbd | Spark BatchedOFD app | | #tbd | Infra, docker-compose, smoke test | ## Summary PR 2 of N in the optimizer stack. Service layer and REST controllers for the optimizer service, plus the `apps/optimizer` shared module providing lightweight entity/repo copies for the analyzer and scheduler apps. ## Changes - [ ] Client-facing API Changes - [x] Internal API Changes - [ ] Bug Fixes - [x] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [x] Tests **Service layer**: `OptimizerDataService` interface and `OptimizerDataServiceImpl` — CRUD operations, complete-operation lifecycle, stats upsert with history double-write, filtered queries. **Controllers**: `TableOperationsController`, `TableOperationsHistoryController`, `TableStatsController` — REST endpoints per the design doc API spec. **Shared module** (`apps/optimizer`): Lightweight entity and repository copies used by the analyzer and scheduler apps to read optimizer state directly from MySQL. ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [x] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. H2 integration tests in `OptimizerDataServiceImplTest` (5 tests): - `completeOperation_writesHistoryFromOperationRow` — saves SCHEDULED row, completes it, asserts history DTO fields - `completeOperation_notFound_returnsEmpty` — completes nonexistent ID, asserts empty - `upsertTableStats_createsNewRow` — upserts new table, asserts DTO and repo row - `upsertTableStats_updatesExistingRow` — upserts twice, asserts overwrite with single row - `upsertTableStats_appendsHistoryOnEveryCall` — upserts twice, asserts 2 history rows ``` ./gradlew :services:optimizer:test # BUILD SUCCESSFUL — all 25 tests pass (repo tests from PR 1 + 5 new service tests) ``` # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [x] Large PR broken into smaller PRs, and PR plan linked in the description. --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
17 tasks
Member
|
Regarding the directory structure, can we make follow two options:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Optimizer Stack
Summary
PR 3 of N in the optimizer stack.
Introduces
apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table intable_statsagainst pluggableOperationAnalyzerstrategies. The first strategy,OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker.Key design choices:
find()repository methods with null params.Changes
Core:
AnalyzerRunner— loads table_stats, pre-loads operations and history into maps, evaluates each table against all analyzers, circuit breaker logic.Strategy interface:
OperationAnalyzer—isEnabled(table),shouldSchedule(table, currentOp, latestHistory),getCircuitBreakerThreshold().Cadence policy:
CadencePolicy— encapsulates time-based retry logic shared across operation types.OFD analyzer:
OrphanFilesDeletionAnalyzer— enabled viamaintenance.optimizer.ofd.enabledtable property.Testing Done
25 unit tests:
AnalyzerRunnerTest(7 tests) — eligible table insertion, cadence skip, disabled table, shouldSchedule=false, null UUID, circuit breaker trip, below-threshold passOrphanFilesDeletionAnalyzerTest(18 tests) — isEnabled variants, shouldSchedule for no-op/PENDING/SCHEDULING/SCHEDULED with history combinationsAdditional Information