DataCompaction: skip launching when no strategy has gain > 0 #592
kamanavishnu wants to merge 1 commit into
26% of DataCompaction apps on holdem (35/135 daily) complete 0 tasks because there is nothing useful to compact. The strategy generator already encodes this — it writes write.data-layout.strategies with gain <= 0 (or no entry at all) for those tables — but the scheduler's TableDataCompactionTask.shouldRunTask() ignored the signal and launched the Spark job anyway. This change reads the persisted strategies via TablesClient and returns false when the property is missing/empty or every strategy has gain <= 0, eliminating the no-op job submissions. Refs BDP-102219.
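The gating rule described above can be sketched as a small predicate. This is a minimal illustration, not the code in the PR: the `Strategy` class, its `gain` field type, and the `hasPositiveGain` helper are hypothetical stand-ins, and the real `TablesClient` and `TableDataCompactionTask` signatures may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CompactionGateSketch {

    // Hypothetical stand-in for one persisted strategy entry.
    // Only the gain matters for the gate; its real type is an assumption.
    public static final class Strategy {
        final double gain;

        public Strategy(double gain) {
            this.gain = gain;
        }
    }

    // Returns true only when at least one strategy has gain > 0.
    // A null or empty list models a missing or empty property.
    public static boolean hasPositiveGain(List<Strategy> strategies) {
        if (strategies == null || strategies.isEmpty()) {
            return false;
        }
        return strategies.stream().anyMatch(s -> s.gain > 0);
    }

    public static void main(String[] args) {
        // Property missing or empty: skip the job.
        System.out.println(hasPositiveGain(Collections.<Strategy>emptyList()));
        // Every strategy has gain <= 0: skip the job.
        System.out.println(hasPositiveGain(Arrays.asList(new Strategy(0), new Strategy(-2))));
        // At least one strategy has gain > 0: launch the job.
        System.out.println(hasPositiveGain(Arrays.asList(new Strategy(0), new Strategy(3))));
    }
}
```

In this sketch, `shouldRunTask()` would return the predicate's result only after its existing eligibility checks pass.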
bbb6beb to ae5e4a7
Closing: the ticket's premise (135 DATA_COMPACTION apps/day on holdem, 35 no-ops) no longer holds. Looking at [...]

So the gating logic in this PR is technically correct, but the savings it predicted won't materialize because the scheduler path it gates isn't being scheduled. Closing as moot. If [...]

Follow-up worth opening separately: [...]
Summary
- 26% of `DataCompaction` apps on `ltx1-holdem` (35 of 135 daily) currently complete 0 tasks because there is nothing useful to compact. The strategy generator already records this on the table (via `write.data-layout.strategies` with `gain <= 0`, or no property at all), but `TableDataCompactionTask.shouldRunTask()` ignored that signal and the scheduler still launched the Spark job.
- This change reads the persisted strategies via `TablesClient` and returns `false` from `shouldRunTask()` when the list is empty or every entry has `gain <= 0`. The existing `isPrimary() && (isTimePartitioned() || isClustered())` guard runs first, so non-eligible tables don't incur the extra API call.
- Estimated savings on `ltx1-holdem`: ~1.5K–3K GB-hr/day (~9% of compaction cost); ~$55K–$110K/yr at $0.10/GB-hr. To be validated after EI rollout.

Refs BDP-102219.
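The figures quoted above can be sanity-checked with a few lines of arithmetic. This sketch assumes a 365-day year and works in whole cents to avoid floating-point rounding; the class and method names are hypothetical and exist only for this check.

```java
public class SavingsEstimateSketch {

    // Annual cost in whole dollars for a daily GB-hr volume at a price in cents per GB-hr.
    // Assumes 365 days per year.
    public static long annualCostUsd(long gbHoursPerDay, long centsPerGbHour) {
        return gbHoursPerDay * 365 * centsPerGbHour / 100;
    }

    // Share of no-op apps as a percentage, rounded to the nearest integer.
    public static long noOpSharePercent(long noOpApps, long totalApps) {
        return Math.round(100.0 * noOpApps / totalApps);
    }

    public static void main(String[] args) {
        // Low and high ends of the quoted range: 1.5K and 3K GB-hr/day at $0.10/GB-hr.
        System.out.println(annualCostUsd(1500, 10)); // 54750, i.e. ~$55K/yr
        System.out.println(annualCostUsd(3000, 10)); // 109500, i.e. ~$110K/yr
        // 35 of 135 daily apps complete 0 tasks.
        System.out.println(noOpSharePercent(35, 135)); // 26
    }
}
```

The results match the rounded values in the summary: roughly $55K to $110K per year, and a 26% no-op share.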
Changes
- `apps/spark/.../client/TablesClient.java`: public `getDataLayoutStrategies(TableMetadata)` that reuses the existing private extractor.
- `apps/spark/.../scheduler/tasks/TableDataCompactionTask.java`: gate on persisted strategies in `shouldRunTask()`; log the skip reason.
- `apps/spark/.../scheduler/tasks/TableDataCompactionTaskTest.java`: 6 new unit tests.

Test plan
- `:apps:openhouse-spark-apps_2.12:test`: full suite passes
- `:apps:openhouse-spark-apps-1.5_2.12:test`: full suite passes
- `spotlessApply` applied