feat(optimizer): [2/N] Optimizer REST Service and Controller#531
Conversation
Introduces the optimizer service module with: - MySQL/H2 schema for table_operations, table_stats, table_stats_history, and table_operations_history - JPA entities with JSON column support (vladmihalcea hibernate-types) - All model/DTO/enum types: OperationType, OperationStatus, TableStats, CompleteOperationRequest, JobResult, OperationMetrics, etc. - JPA AttributeConverters for JobResult and OperationMetrics JSON columns - MapStruct mapper (OptimizerMapper) for entity→DTO conversion - Spring Boot application shell and build wiring (settings.gradle, build.gradle dockerPrereqs) No repositories, controllers, or service layer yet — those follow in subsequent PRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read directly from table_stats instead of duplicating into operations - Remove orphanFilesDeleted/orphanBytesDeleted from history entity, DTO, and schema; operation-specific data belongs in the result JSON - Add addedSizeBytes to CommitDelta for tracking write volume - Fix OperationType javadoc to describe current state, not roadmap - Fix TableOperationsHistoryRow javadoc: written on operation complete, not by Spark app directly - Add field comments to all DTOs and request objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spring Data JPA repositories for all four optimizer tables with filtered query support. Includes tests exercising save/find, filtered queries, upsert semantics, and append-only history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR review comments: rename findFiltered → find across all repos, remove redundant findByTableUuid/findByTableUuidSince from history repos, add explicit assertion to context test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Service interface and implementation for all optimizer CRUD operations including complete-operation lifecycle, stats upsert with history double-write, and filtered queries. Three REST controllers expose the endpoints. The apps/optimizer shared module provides lightweight entity/repo copies for the analyzer and scheduler apps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align OptimizerDataServiceImpl with renamed repository methods from optimizer-1 review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e628426 to
ef3260f
Compare
Shared JPA entities and repositories for optimizer apps (analyzer, scheduler). All repos expose a single find method with optional filters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve repo conflicts by taking optimizer-1's clean find-only versions. Scheduler-specific methods and streamAll removed per review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mkuchenbecker
left a comment
There was a problem hiding this comment.
this needs tests
These fields never belonged in the data model — remove them at the source rather than adding then deleting in a later PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H2 integration tests for OptimizerDataServiceImpl covering completeOperation (write history, not-found) and upsertTableStats (create, update, history append). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strengthen upsertTableStats test to verify history rows contain the raw delta stats from each call, not just the row count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repo public API now:
- find(...) with Optional<T> filters + required Pageable, on all four repos
- updateBatch(ids, fromStatus, toStatus, Optional<Instant> scheduledAt,
Optional<String> jobId) — replaces markSchedulingBatch,
markScheduledBatch, markPendingBatch
- cancel(ids) — replaces cancelDuplicatePendingBatch; deletes by-id with
a defensive PENDING-only gate
- findLatest(opType, Pageable) — was findLatestPerTable
- history.find(tableUuid, Pageable) — was findByTableUuidOrderByCompletedAtDesc
Side-effect columns on updateBatch use COALESCE with Optional.empty() →
leave-unchanged. scheduledAt is not cleared on SCHEDULING → PENDING revert;
status is the source of truth and the watermark is overwritten on the next
claim.
@Modifying queries get flushAutomatically + clearAutomatically so the L1
cache reflects the change immediately (caught by the unit tests).
Spring Data @query can't share an "IS NULL OR IN :list" pattern (Hibernate
expands the list inline and the IS NULL check turns ungrammatical). The
find path uses two internal queries dispatched by the default method —
one with the ids predicate, one without.
Callers (service, analyzer, scheduler) update on opt-2..opt-4 in
follow-up commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OptimizerDataServiceImpl pipes Optional<T> filters straight through to the repo (no .orElse(null) at the boundary). Adds the optimizer.repo.default-limit config property and threads it into the list-shaped calls. Service-impl test updates the one direct statsHistoryRepository.find(...) call to pass Optional.empty(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instant.now() on Linux CI carries nanoseconds; MySQL TIMESTAMP(6) and H2 in MySQL mode store microseconds. The scheduledAt = :scheduledAt predicate in find(...) compared nano-resolution param against micro-resolution stored value and missed. Local (macOS, micro-only) hid the bug. Truncate to ChronoUnit.MICROS at write time in the one repo test that exercises the watermark round-trip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
## Optimizer Stack | PR | Content | |---|---| | #527 | Data Model | | #530 **(this)** | Database Repos | | #531 | REST service | | #533 | Analyzer app | | #534 | Scheduler app | | #tbd | Spark BatchedOFD app | | #tbd | Infra, docker-compose, smoke test | ## Summary PR 1 of N in the optimizer stack. [Overall Project](https://docs.google.com/document/d/1oGQWkmlVw0HG-D4Nx37q0oUEQd53Ni4q5Q7h1voaZIQ/edit?tab=t.0#heading=h.vtl818e4m9f7) [Service Design doc](https://docs.google.com/document/d/1xYOD7iuPjZO05UWfT1Dkmf4lYNw4iOjqY10UT2X1FAg/edit?tab=t.0). Spring Data JPA repositories for all four optimizer tables with filtered query support, plus tests exercising save/find, filtered queries, upsert semantics, and append-only history. ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [x] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [x] Tests **Repositories**: `TableOperationsRepository`, `TableOperationsHistoryRepository`, `TableStatsRepository`, `TableStatsHistoryRepository` — each with JPQL filtered query methods. **Tests**: Repository tests for all four tables plus `OptimizerServiceContextTest` verifying the Spring context loads. ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [x] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. `./gradlew :services:optimizer:test` — all tests pass (H2 in MySQL mode). # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [x] Large PR broken into smaller PRs, and PR plan linked in the description. --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
All four list-style endpoints now require a caller-supplied limit. No server-side default, no max — getting the API contract right comes first; bounds can land separately. - TableOperationsController.listTableOperations: @RequestParam int limit - TableStatsController.listTableStats: @RequestParam int limit - TableStatsController.getStatsHistory: drop defaultValue="100" - TableOperationsHistoryController.getHistory: drop defaultValue="100" - OptimizerDataService: listTableOperations / listTableStats gain int limit - OptimizerDataServiceImpl: drop @value("${optimizer.repo.default-limit}") and the defaultLimit field; thread caller-supplied limit straight to PageRequest.of(0, limit), which cascades to MySQL LIMIT n. - application.properties: remove now-unused optimizer.repo.default-limit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PR description looks good with all the design docs including BDP ticket and make the PR comprehensive. But people outside of LinkedIn would not be able to open the links. We are following the convention of not linking internal tickets and doc in the OSS repo. I thought about this in the previous two PRs which are already merged, but did not mention about this. Now I am thinking should we put the internal references, as I am also in the process of raising PRs. Let me know your thoughts on this. |
agreed and removed up the stack. |
Every optimizer endpoint now returns a uniform ApiError body
({code, message, path}) on non-2xx. Reshape updateOperation so the
operationId lives in the URL (and is repeated in the body for
self-describing payloads).
- api/error/ApiError.java — DTO shape.
- api/error/GlobalExceptionHandler.java — @RestControllerAdvice mapping
framework exceptions to {VALIDATION_ERROR, INVALID_PARAMETER,
MISSING_PARAMETER, MALFORMED_REQUEST, INTERNAL_ERROR}. Reason of
ResponseStatusException is parsed as "CODE: message" for endpoint-
specific 404s.
- Controllers: orElseThrow(ResponseStatusException) replaces bare 404.
TableOperationsController moves updateOperation to
POST /v1/optimizer/operations/{id}/update; rejects with 400
PATH_BODY_MISMATCH when body.operationId != path.id.
- UpdateOperationRequest: @notblank operationId, @NotNull status.
- UpsertTableStatsRequest: @notblank databaseName, tableName.
- spring-boot-starter-validation dep added.
- New ControllerErrorHandlingTest: 13 MockMvc cases covering 404 / 400
validation / 400 type-mismatch / 400 missing-param / 400 malformed-
body / 400 path-body-mismatch + happy-path sanity.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address review comments on PR #596 — minimal diff, no bean validation, no per-framework-exception handlers, scoped to the optimizer controllers. - @RestControllerAdvice scoped to the three optimizer controllers via assignableTypes; no longer global. - Two handlers only: ResponseStatusException → ApiError with code = status.name() + reason as message; Exception → 500 INTERNAL_ERROR. Drop MethodArgumentNotValidException, MethodArgumentTypeMismatchException, MissingServletRequestParameterException, HttpMessageNotReadableException — Spring's defaults handle those. - Drop the "CODE: message" reason-parsing convention. - Drop @notblank / @NotNull on UpdateOperationRequest and UpsertTableStatsRequest; drop @Valid on controllers; drop spring-boot-starter-validation dep. Validate operationId / status server- side in TableOperationsController.updateOperation — loose-coupling so relaxing required fields later doesn't break wire callers. - String.format throughout; no message concatenation. - ControllerErrorHandlingTest trimmed from 13 cases to 7: only what the controllers actually own (404s, server-side validation on updateOperation, happy-path sanity). Framework-level 4xx left to Spring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ng defaults
The custom advice was producing a body shape (ApiError {code, message,
path}) that duplicated Spring Boot's default error JSON ({timestamp,
status, error, message, path}). The only substantive difference was
that Spring Boot 2.7 omits the `message` field by default.
Replace the custom advice with a one-line config:
server.error.include-message=always
Now ResponseStatusException reasons (e.g. "no operation with id X")
reach the caller via Spring's default error body, no custom code.
- Delete api/error/GlobalExceptionHandler.java
- Delete api/error/ApiError.java
- application.properties: server.error.include-message=always
- ControllerErrorHandlingTest assertions trimmed to status-code-only
(MockMvc does not trigger Spring's error-dispatch to
BasicErrorController, so body assertions cannot be made in tests
even though the body is populated on real HTTP requests).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per review — no change needed on this file. The path/body validation lives in the controller; the DTO carries the same fields as before with the existing javadoc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…chenb/optimizer-2
Address review on PR #531 — each endpoint lists the HTTP response codes it actually returns, in the same "Resource ACTION: STATUS" style used by services/tables. Codes per endpoint: - POST /operations/{id}/update — 201, 400, 404 - GET /operations/{id} — 200, 404 - GET /operations — 200, 400 - POST /operations-history — 201 - GET /operations-history/{u} — 200, 400 - PUT /stats/{u} — 200 - GET /stats/{u} — 200, 404 - GET /stats — 200, 400 - GET /stats/{u}/history — 200, 400 Annotations only — no runtime behavior change, no new tests required. swagger-annotations 2.1.11 is already on the optimizer classpath via openhouse.springboot-conventions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
abhisheknath2011
left a comment
There was a problem hiding this comment.
Thanks @mkuchenbecker. The initial set of APIs look good. We can iterate and update as needed.
Optimizer Stack
Summary
PR 2 of N in the optimizer stack.
Service layer and REST controllers for the optimizer service, plus the
apps/optimizershared module providing lightweight entity/repo copies for the analyzer and scheduler apps.Changes
Service layer:
OptimizerDataServiceinterface andOptimizerDataServiceImpl— CRUD operations, complete-operation lifecycle, stats upsert with history double-write, filtered queries.Controllers:
TableOperationsController,TableOperationsHistoryController,TableStatsController— REST endpoints per the design doc API spec.Shared module (
apps/optimizer): Lightweight entity and repository copies used by the analyzer and scheduler apps to read optimizer state directly from MySQL.Testing Done
H2 integration tests in
OptimizerDataServiceImplTest(5 tests):completeOperation_writesHistoryFromOperationRow— saves SCHEDULED row, completes it, asserts history DTO fieldscompleteOperation_notFound_returnsEmpty— completes nonexistent ID, asserts emptyupsertTableStats_createsNewRow— upserts new table, asserts DTO and repo rowupsertTableStats_updatesExistingRow— upserts twice, asserts overwrite with single rowupsertTableStats_appendsHistoryOnEveryCall— upserts twice, asserts 2 history rowsAdditional Information