
DataCompaction: skip launching when no strategy has gain > 0#592

Closed
kamanavishnu wants to merge 1 commit into linkedin:main from kamanavishnu:vkamana/BDP-102219-compaction-gain-gate

@kamanavishnu (Collaborator)

Summary

  • About 26% of DataCompaction apps on ltx1-holdem (35 of 135 per day) currently complete 0 tasks because there is nothing useful to compact. The strategy generator already records this on the table (write.data-layout.strategies contains only entries with gain <= 0, or the property is absent), but TableDataCompactionTask.shouldRunTask() ignored that signal, so the scheduler still launched the Spark job.
  • This PR reads the persisted strategies on the table and returns false from shouldRunTask() when the list is missing or empty, or when every entry has gain <= 0. The existing isPrimary() && (isTimePartitioned() || isClustered()) guard runs first, so ineligible tables don't incur the extra API call.
  • Expected savings on ltx1-holdem: ~1.5K–3K GB-hr/day (~9% of compaction cost); ~$55K–$110K/yr at $0.10/GB-hr. To be validated after EI rollout.
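As a rough consistency check, the share and the annual range follow directly from the figures above, assuming the daily savings accrue every day of the year at the stated $0.10/GB-hr rate:

$$
\begin{aligned}
\tfrac{35}{135} &\approx 0.259 \approx 26\% \\
1{,}500\ \tfrac{\text{GB}\cdot\text{hr}}{\text{day}} \times 365\ \tfrac{\text{day}}{\text{yr}} \times \tfrac{\$0.10}{\text{GB}\cdot\text{hr}} &= \$54{,}750/\text{yr} \approx \$55\text{K}/\text{yr} \\
3{,}000\ \tfrac{\text{GB}\cdot\text{hr}}{\text{day}} \times 365\ \tfrac{\text{day}}{\text{yr}} \times \tfrac{\$0.10}{\text{GB}\cdot\text{hr}} &= \$109{,}500/\text{yr} \approx \$110\text{K}/\text{yr}
\end{aligned}
$$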

Refs BDP-102219.

Changes

  • apps/spark/.../client/TablesClient.java: adds a public getDataLayoutStrategies(TableMetadata) method that reuses the existing private extractor.
  • apps/spark/.../scheduler/tasks/TableDataCompactionTask.java — gate on persisted strategies in shouldRunTask(); log the skip reason.
  • apps/spark/.../scheduler/tasks/TableDataCompactionTaskTest.java — 6 new unit tests.
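A minimal, self-contained sketch of the gate described above, written under stated assumptions rather than copied from the PR. Strategy and TableFlags are hypothetical stand-ins for a persisted write.data-layout.strategies entry and for TableMetadata, and the Supplier models the TablesClient.getDataLayoutStrategies(TableMetadata) lookup so the sketch can show that the fetch happens only after the cheap eligibility guard passes.

```java
import java.util.List;
import java.util.function.Supplier;

public class CompactionGateSketch {

  // Hypothetical stand-in for one persisted entry of write.data-layout.strategies;
  // only the gain value matters for this gate.
  record Strategy(double gain) {}

  // Hypothetical stand-in for the TableMetadata flags the existing guard checks.
  record TableFlags(boolean primary, boolean timePartitioned, boolean clustered) {}

  /**
   * Returns true only when the table passes the existing eligibility guard AND at least one
   * persisted strategy has gain > 0. The supplier (standing in for the TablesClient lookup)
   * is invoked only after the cheap guard passes, so ineligible tables trigger no fetch.
   */
  static boolean shouldRunTask(TableFlags table, Supplier<List<Strategy>> fetchStrategies) {
    // Existing guard: primary AND (time-partitioned OR clustered); evaluated first.
    if (!(table.primary() && (table.timePartitioned() || table.clustered()))) {
      return false;
    }
    List<Strategy> strategies = fetchStrategies.get();
    // Property missing (modeled here as null) or empty: nothing to compact.
    if (strategies == null || strategies.isEmpty()) {
      return false;
    }
    // Run only if some strategy promises a positive gain; all gains <= 0 means skip.
    return strategies.stream().anyMatch(s -> s.gain() > 0);
  }

  public static void main(String[] args) {
    TableFlags partitioned = new TableFlags(true, true, false);
    TableFlags clustered = new TableFlags(true, false, true);
    TableFlags nonPrimary = new TableFlags(false, true, false);
    TableFlags unpartitioned = new TableFlags(true, false, false);

    int[] fetches = {0}; // counts how many times a strategies lookup actually ran
    Supplier<List<Strategy>> positive = () -> {
      fetches[0]++;
      return List.of(new Strategy(-0.2), new Strategy(1.5));
    };
    Supplier<List<Strategy>> empty = () -> {
      fetches[0]++;
      return List.of();
    };
    Supplier<List<Strategy>> noGain = () -> {
      fetches[0]++;
      return List.of(new Strategy(0.0), new Strategy(-3.0));
    };

    System.out.println(shouldRunTask(partitioned, positive));   // true
    System.out.println(shouldRunTask(partitioned, empty));      // false
    System.out.println(shouldRunTask(partitioned, noGain));     // false
    System.out.println(shouldRunTask(nonPrimary, positive));    // false (no fetch)
    System.out.println(shouldRunTask(unpartitioned, positive)); // false (no fetch)
    System.out.println(shouldRunTask(clustered, positive));     // true
    System.out.println("fetches = " + fetches[0]);              // fetches = 4
  }
}
```

The main method walks the same six scenarios as the new unit tests. Passing the lookup as a Supplier keeps the ordering explicit: the non-primary and non-partitioned cases never invoke it, so the counter ends at 4 rather than 6.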

Test plan

  • :apps:openhouse-spark-apps_2.12:test — full suite passes
  • :apps:openhouse-spark-apps-1.5_2.12:test — full suite passes
  • spotlessApply applied
  • New tests cover: positive-gain → run; empty strategies → skip; all gains ≤ 0 → skip; non-primary → skip with no fetch; non-partitioned & non-clustered → skip with no fetch; clustered + positive gain → run
  • EI rollout to confirm savings estimate

kamanavishnu marked this pull request as draft on May 20, 2026 21:20
26% of DataCompaction apps on holdem (35/135 daily) complete 0 tasks
because there is nothing useful to compact. The strategy generator
already encodes this — it writes write.data-layout.strategies with
gain <= 0 (or no entry at all) for those tables — but the scheduler's
TableDataCompactionTask.shouldRunTask() ignored the signal and launched
the Spark job anyway.

This change reads the persisted strategies via TablesClient and returns
false when the property is missing/empty or every strategy has
gain <= 0, eliminating the no-op job submissions.

Refs BDP-102219.
kamanavishnu force-pushed the vkamana/BDP-102219-compaction-gain-gate branch from bbb6beb to ae5e4a7 on May 20, 2026 22:06

@kamanavishnu (Collaborator, Author)

Closing: the ticket's premise (135 DATA_COMPACTION apps/day on holdem, 35 of them no-ops) no longer holds.

Looking at hive.service.sparkmetricsv2 on holdem for the last 10 days (2026-05-10 … 2026-05-19), the legacy DATA_COMPACTION path is effectively turned off: there was only 1 invocation in that whole window (on 2026-05-16). All real compaction work has moved to DATA_LAYOUT_STRATEGY_EXECUTION, at about 178 invocations/day on holdem (grid1) and about 2,825/day on war-DR (grid2).

So the gating logic in this PR is technically correct, but the predicted savings won't materialize because the DATA_COMPACTION path it gates is no longer being scheduled. Closing as moot. If DATA_COMPACTION is ever re-enabled, this can be revived from the branch.

Follow-up worth opening separately: TableDataLayoutStrategyExecutionTask.shouldRunTask() currently just returns metadata.isPrimary(), so all gating lives upstream in DataLayoutUtil.selectStrategies. That's the real place to look for further savings.
