Skip to content

[BDP-102028] feat(optimizer): [0/N] Optimizer API and internal model#527

Merged
mkuchenbecker merged 29 commits into
linkedin:mainfrom
mkuchenbecker:mkuchenb/optimizer-0
May 20, 2026
Merged

[BDP-102028] feat(optimizer): [0/N] Optimizer API and internal model#527
mkuchenbecker merged 29 commits into
linkedin:mainfrom
mkuchenbecker:mkuchenb/optimizer-0

Conversation

@mkuchenbecker
Copy link
Copy Markdown
Collaborator

@mkuchenbecker mkuchenbecker commented Apr 3, 2026

Optimizer Stack

PR Content
#527 (this) API and internal models
#530 Database Repos
#531 REST service
#533 Analyzer app
#534 Scheduler app
#tbd Spark BatchedOFD app
#tbd Infra, docker-compose, smoke test

Summary

PR 0 of N in the optimizer stack.
Overall Project
Service Design doc.

Introduces the optimizer service API and internal model

image

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

This PR contains only the data model (entities, DTOs, converters). Repository tests follow in PR 1. Verified:

  • ./gradlew :services:optimizer:compileJava passes
  • ./gradlew compileJava (full project) passes with no regressions
  • Spotless formatting passes

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

Introduces the optimizer service module with:
- MySQL/H2 schema for table_operations, table_stats, table_stats_history,
  and table_operations_history
- JPA entities with JSON column support (vladmihalcea hibernate-types)
- All model/DTO/enum types: OperationType, OperationStatus, TableStats,
  CompleteOperationRequest, JobResult, OperationMetrics, etc.
- JPA AttributeConverters for JobResult and OperationMetrics JSON columns
- MapStruct mapper (OptimizerMapper) for entity→DTO conversion
- Spring Boot application shell and build wiring (settings.gradle,
  build.gradle dockerPrereqs)

No repositories, controllers, or service layer yet — those follow in
subsequent PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read
  directly from table_stats instead of duplicating into operations
- Remove orphanFilesDeleted/orphanBytesDeleted from history entity,
  DTO, and schema; operation-specific data belongs in the result JSON
- Add addedSizeBytes to CommitDelta for tracking write volume
- Fix OperationType javadoc to describe current state, not roadmap
- Fix TableOperationsHistoryRow javadoc: written on operation
  complete, not by Spark app directly
- Add field comments to all DTOs and request objects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker changed the title feat(optimizer): add data model — schema, entities, DTOs, converters feat(optimizer): [1/N] data model Apr 6, 2026
These fields never belonged in the data model — remove them at the
source rather than adding then deleting in a later PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker changed the title feat(optimizer): [1/N] data model feat(optimizer): [1/N] Optimizer Data Model Apr 6, 2026
@mkuchenbecker mkuchenbecker marked this pull request as ready for review April 6, 2026 19:46
@mkuchenbecker mkuchenbecker changed the title feat(optimizer): [1/N] Optimizer Data Model feat(optimizer): [0/N] Optimizer Data Model Apr 6, 2026
The DB layer (entities + api↔db mapper) belongs to optimizer-1, not
optimizer-0. optimizer-0 owns only the wire-API surface and the
internal model.

Delete from this PR:
- entity/ package (TableOperationsRow, TableOperationsHistoryRow,
  TableStatsRow, TableStatsHistoryRow, package-info).
- api/mapper/OptimizerMapper — was the api↔entity bridge. With the
  entity files moving out of this PR and the new model/mapper/
  taking over conversion duties, this mapper is no longer needed
  here.

optimizer-1 will re-introduce these as db/ (renamed) with db-side
per-layer types and a model/mapper/ModelDbMapper.
The DDL is part of the db/ layer's ownership (optimizer-1). Move
the schema file and its schema-init properties out of optimizer-0
so this PR is purely api/ + model/.

Delete:
- src/main/resources/db/optimizer-schema.sql.
- spring.sql.init.mode, spring.sql.init.schema-locations, and
  spring.jpa.defer-datasource-initialization from
  application.properties (they reference the deleted schema file).

optimizer-1 re-introduces these alongside the db/ entities and
repositories.
DB-layer dependencies belong to optimizer-1. With entities, schema,
and the api/mapper deleted from this PR, the JPA + MySQL stack is
unused — remove the dependency declarations and configuration that
referenced them.

build.gradle:
- Drop spring-boot-starter-data-jpa, mysql-connector-java, the
  vladmihalcea hibernate-types JSON serializer, and the h2 test
  runtime.

application.properties:
- Drop spring.jpa.* and spring.datasource.* lines.

Delete services/optimizer/src/test/resources/application-test.properties
(H2 test datasource config — re-introduced on optimizer-1 alongside
the repositories and repo tests).
No external system creates table operations — operations are
written by the in-process analyzer directly through the model
layer. The request type has no wire consumer and no internal
consumer, so it's dead code.

Delete services/optimizer/.../api/model/UpsertTableOperationsRequest.java.
JobResult is removed from the optimizer API. CompleteOperationRequest
(user-edited) now carries only operationId + status — the failure
detail abstraction has been retired. The internal model and DTOs no
longer carry it either, and the type itself is deleted from both
api/ and model/.

CompleteOperationRequest:
- operationId moved from path to body (user manual edit).
- jobId field removed.
- result field removed.

api/model/TableOperationsHistoryDto:
- Drop jobId and result fields.

model/TableOperationsHistory:
- Drop jobId and result fields.

model/mapper/ApiModelMapper:
- Remove toModelJobResult / toApiJobResult helpers + JobResult import.
- toHistory()/toDto() no longer touch jobId or result.

Delete:
- services/optimizer/.../api/model/JobResult.java
- services/optimizer/.../model/JobResult.java

Downstream propagation: opt-2's service signature changes
(completeOperation now takes only the request body); db/HistoryStatus
remains needed on opt-1 but db/JobResult no longer is. See
memory/tasks/mkuchenb-optimizer-3-fixes.md for the full propagation
list.
Add tableUuid, databaseName, tableName, and operationType to the
complete request body. They're debug-only — the server keys lookup
off operationId — but preserving them on logs and traces helps an
operator diagnose a failing complete call without joining back to
the operation row.
@mkuchenbecker mkuchenbecker changed the title [BDP-102028] feat(optimizer): [0/N] Optimizer API and data model [BDP-102028] feat(optimizer): [0/N] Optimizer API and internal model May 14, 2026
mkuchenbecker and others added 8 commits May 14, 2026 13:53
Every line in application.properties is run-time config (server.port,
spring.application.name, actuator endpoints). optimizer-0 has no
controllers and no endpoint to serve — the file is doing nothing
here. The first PR that actually runs a web service is optimizer-2.

Delete the file from this PR. optimizer-2 will re-introduce it
alongside the REST controllers.

The OptimizerServiceApplication @SpringBootApplication shell stays
on this branch — optimizer-1's repository tests use @SpringBootTest
and need an application class to discover.
Prepare model/ for a service-layer rewrite that returns only model/
types (no api/ DTO leakage into the service interface).

- model/Table: add `Instant updatedAt`. The service stamps it on every
  upsert; controllers read it when assembling the wire DTO.
- model/TableStatsHistory: new internal-model counterpart to
  db.TableStatsHistoryRow. Fields mirror the row in internal types
  (id, tableUuid, databaseName, tableName, stats, recordedAt).
- ApiModelMapper: add the missing api↔model conversions that
  controllers will own once the service drops api/ knowledge —
  Table ↔ TableStatsDto, TableStatsHistory ↔ TableStatsHistoryDto,
  and toTable(tableUuid, UpsertTableStatsRequest).
Several fields under api/model/ and model/ were left undocumented in
the earlier per-layer-types passes. Audit + fill them in:

api/model/TableOperationsHistoryDto: databaseName, tableName,
operationType — add display/role docs.

api/model/HistoryStatus: SUCCESS, FAILED — add enum-value docs.

api/model/TableStats inner classes:
- SnapshotMetrics: clusterId, tableVersion, tableLocation,
  tableSizeBytes — add field docs.
- CommitDelta: numFilesAdded, numFilesDeleted, addedSizeBytes,
  deletedSizeBytes — add field docs.

model/Table: tableUuid, databaseName, tableId, tableProperties,
stats — add field docs.

model/TableStats: same field-doc additions on SnapshotMetrics and
CommitDelta as the api/ counterpart.

model/OperationStatus: PENDING, SCHEDULING, SCHEDULED, CANCELED —
add enum-value docs.

model/OperationType: ORPHAN_FILES_DELETION — add enum-value doc.

model/HistoryStatus: SUCCESS, FAILED — add enum-value docs.

model/TableStatsHistory: id, tableUuid, databaseName, tableName —
add field docs.
clusterId is per-table-immutable in OpenHouse — it never changes
after the table is created — so persisting and transmitting it on
every snapshot is dead weight. Remove from the wire and internal
representations.

- api/model/TableStats.SnapshotMetrics: drop clusterId.
- model/TableStats.SnapshotMetrics: drop clusterId.
- model/mapper/ApiModelMapper: drop the clusterId hop in
  toModelSnapshot and toApiSnapshot.
… ApiModelMapper

Replace the api/model boundary mapper with conversion methods on the
types themselves. The api layer now imports model/ directly via
to/from methods — controllers and other api-edge callers no longer
inject a mapper bean.

The dependency direction is a strict downward chain:

  api  →  model  →  db

api types know about model types (and call model methods); model
types know about db types (next round). db remains import-free. No
central mapper, no risk of a cycle through a hub class.

api/model/* changes (each gets a `toModel()` instance method + a
static `fromModel(...)` factory):
- TableOperationsDto ↔ model.TableOperation.
- TableOperationsHistoryDto ↔ model.TableOperationsHistory.
- TableStatsDto ↔ model.Table.
- TableStatsHistoryDto ↔ model.TableStatsHistory.
- UpsertTableStatsRequest → model.Table (one-way; takes the
  path-var tableUuid; updatedAt is server-stamped).
- TableStats (+ SnapshotMetrics + CommitDelta inner) ↔ model.TableStats.
- OperationType / OperationStatus / HistoryStatus (api enums) ↔
  model enums.

CompleteOperationRequest keeps its fields plain — callers extract
`operationId` and `status.toModel()` directly; no wrapper needed.

Delete services/optimizer/.../model/mapper/ApiModelMapper.java.
… to TableStats

model.TableStats now carries its own identity (tableUuid, databaseName,
tableName) and metadata (tableProperties, updatedAt) alongside the
snapshot + delta payload. Consumers no longer need an outer wrapper to
know which table the stats belong to.

api.TableStatsDto.toModel() and api.UpsertTableStatsRequest.toModel()
now return model.TableStats (was model.Table). The two types only
happened to have the same shape — semantically a DTO for stats is
stats, not a table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ationsHistory

Moved down from opt-2. The service-layer code (opt-2) uses .toBuilder()
on both types; the lombok annotation that enables it belongs on the
PR that owns model/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TableOperation becomes a pure operation record. Consumers (scheduler) look
up TableStats at the point they need it, rather than carrying enrichment
data on the model type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mkuchenbecker and others added 2 commits May 19, 2026 13:35
…edin#527 review)

Per @abhisheknath2011 review comment 3262776356:
> "We could change all the internal model add Dto suffix something like
> TableOperationsDto. This aligns with the existing services codebase."

Renames (suffix added):
- CompleteOperationRequest    -> CompleteOperationRequestDto
- UpsertTableStatsRequest     -> UpsertTableStatsRequestDto
- OperationType (enum)        -> OperationTypeDto
- OperationStatus (enum)      -> OperationStatusDto
- HistoryStatus (enum)        -> HistoryStatusDto
- TableStats (inner payload)  -> TableStatsPayloadDto
- TableStats.SnapshotMetrics  -> TableStatsPayloadDto.SnapshotMetricsDto
- TableStats.CommitDelta      -> TableStatsPayloadDto.CommitDeltaDto

Cross-reference updates inside api/model. Internal model layer
(services/optimizer/.../model/) is intentionally unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n#527 review)

Per @abhisheknath2011 review comment 3262769497:
> "Can we change the client side API to api.spec instead of api.model?
> This also aligns with existing services."

Mechanical package rename. The 10 api wire types move from
services/optimizer/.../api/model/ to services/optimizer/.../api/spec/.
No type or signature changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reversal of an earlier inconsistency surfaced by abhisheknath2011 in
the PR linkedin#527 review thread on api/spec/HistoryStatusDto.java. The api
wire types are the canonical contract; they should carry the canonical
name. The internal-model types are transfer objects between layers and
now carry the Dto suffix.

api/spec/ — Dto stripped from class + filename (10 files):
  CompleteOperationRequestDto -> CompleteOperationRequest
  HistoryStatusDto             -> HistoryStatus
  OperationStatusDto           -> OperationStatus
  OperationTypeDto             -> OperationType
  TableOperationsDto           -> TableOperations
  TableOperationsHistoryDto    -> TableOperationsHistory
  TableStatsDto                -> TableStats
  TableStatsHistoryDto         -> TableStatsHistory
  TableStatsPayloadDto         -> TableStatsPayload
  UpsertTableStatsRequestDto   -> UpsertTableStatsRequest

model/ — Dto added to class + filename (8 files):
  HistoryStatus           -> HistoryStatusDto
  OperationStatus         -> OperationStatusDto
  OperationType           -> OperationTypeDto
  Table                   -> TableDto
  TableOperation          -> TableOperationDto
  TableOperationsHistory  -> TableOperationsHistoryDto
  TableStats              -> TableStatsDto
  TableStatsHistory       -> TableStatsHistoryDto

Both renames land on opt-0 because opt-0 owns api/spec/ and model/.
Cascade up the stack in follow-up commits.

Out of scope here: HistoryStatus enum value additions (CANCELED, QUEUED)
also raised in the same review thread; separate semantic change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker and others added 2 commits May 20, 2026 15:16
model.TableOperationDto grows a jobId field; api.TableOperations
conversions copy it across the api ↔ model boundary. The api DTO
already had the field; the model side was missing it.

Relocated from opt-5 to its proper owner per the model-layer rule.
Model ↔ db plumbing for the same field lands on opt-1 in a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nRequest

Symbol rename only. The HistoryStatus enum (SUCCESS/FAILED) and the
once-terminal semantics are unchanged; the endpoint's behavior is the
same. Future broadening (CANCELED/QUEUED, idempotency, mid-lifecycle
status changes) is a separate concern.

Method names + URL path will follow on opt-2; Spark-app caller +
docs follow on opt-5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@abhisheknath2011 abhisheknath2011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mkuchenbecker for addressing the comments.

@mkuchenbecker mkuchenbecker merged commit f85ad6d into linkedin:main May 20, 2026
1 check passed
mkuchenbecker added a commit that referenced this pull request May 21, 2026
## Optimizer Stack

| PR | Content |
|---|---|
| #527 | Data Model |
| #530 **(this)** | Database
Repos |
| #531 | REST service |
| #533 | Analyzer app |
| #534 | Scheduler app |
| #tbd | Spark BatchedOFD app |
| #tbd | Infra, docker-compose, smoke test |

## Summary

PR 1 of N in the optimizer stack. 
[Overall
Project](https://docs.google.com/document/d/1oGQWkmlVw0HG-D4Nx37q0oUEQd53Ni4q5Q7h1voaZIQ/edit?tab=t.0#heading=h.vtl818e4m9f7)
[Service Design
doc](https://docs.google.com/document/d/1xYOD7iuPjZO05UWfT1Dkmf4lYNw4iOjqY10UT2X1FAg/edit?tab=t.0).


Spring Data JPA repositories for all four optimizer tables with filtered
query support, plus tests exercising save/find, filtered queries, upsert
semantics, and append-only history.

## Changes

- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [ ] Bug Fixes
- [x] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [x] Tests

**Repositories**: `TableOperationsRepository`,
`TableOperationsHistoryRepository`, `TableStatsRepository`,
`TableStatsHistoryRepository` — each with JPQL filtered query methods.

**Tests**: Repository tests for all four tables plus
`OptimizerServiceContextTest` verifying the Spring context loads.

## Testing Done

- [ ] Manually Tested on local docker setup. Please include commands
ran, and their output.
- [x] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please
feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in
production. Please explain.

`./gradlew :services:optimizer:test` — all tests pass (H2 in MySQL
mode).

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [x] Large PR broken into smaller PRs, and PR plan linked in the
description.

---------

Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
mkuchenbecker added a commit that referenced this pull request May 22, 2026
## Optimizer Stack

| PR | Content |
|---|---|
| #527 | Data Model |
| #530 | Database Repos |
| #531 **(this)** | REST
service |
| #533 | Analyzer app |
| #534 | Scheduler app |
| #tbd | Spark BatchedOFD app |
| #tbd | Infra, docker-compose, smoke test |

## Summary

PR 2 of N in the optimizer stack. 

Service layer and REST controllers for the optimizer service, plus the
`apps/optimizer` shared module providing lightweight entity/repo copies
for the analyzer and scheduler apps.

## Changes

- [ ] Client-facing API Changes
- [x] Internal API Changes
- [ ] Bug Fixes
- [x] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [x] Tests

**Service layer**: `OptimizerDataService` interface and
`OptimizerDataServiceImpl` — CRUD operations, complete-operation
lifecycle, stats upsert with history double-write, filtered queries.

**Controllers**: `TableOperationsController`,
`TableOperationsHistoryController`, `TableStatsController` — REST
endpoints per the design doc API spec.

**Shared module** (`apps/optimizer`): Lightweight entity and repository
copies used by the analyzer and scheduler apps to read optimizer state
directly from MySQL.

## Testing Done

- [ ] Manually Tested on local docker setup. Please include commands
ran, and their output.
- [x] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please
feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in
production. Please explain.

H2 integration tests in `OptimizerDataServiceImplTest` (5 tests):
- `completeOperation_writesHistoryFromOperationRow` — saves SCHEDULED
row, completes it, asserts history DTO fields
- `completeOperation_notFound_returnsEmpty` — completes nonexistent ID,
asserts empty
- `upsertTableStats_createsNewRow` — upserts new table, asserts DTO and
repo row
- `upsertTableStats_updatesExistingRow` — upserts twice, asserts
overwrite with single row
- `upsertTableStats_appendsHistoryOnEveryCall` — upserts twice, asserts
2 history rows

```
./gradlew :services:optimizer:test
# BUILD SUCCESSFUL — all 25 tests pass (repo tests from PR 1 + 5 new service tests)
```

# Additional Information

- [ ] Breaking Changes
- [ ] Deprecations
- [x] Large PR broken into smaller PRs, and PR plan linked in the
description.

---------

Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants