[#10457] improvement: Shrink the package size of Hive Metastore 2 and 3 catalogs #10465

Merged: yuqi1129 merged 8 commits into apache:main from diqiu50:hive-metastore-shrink-package on Mar 25, 2026

diqiu50 commented Mar 17, 2026

What changes were proposed in this pull request?

Reduce the distribution package size of Hive Metastore 2 and 3 catalog libs
by adding dependency exclusions to filter out unnecessary transitive JARs.

| Package | Before | After |
|---------|--------|-------|
| hive-metastore2-libs | 102MB (176 JARs) | 78MB (138 JARs) |
| hive-metastore3-libs | 127MB (205 JARs) | 104MB (201 JARs) |

Excluded categories: logging stack (slf4j/log4j/logback), test artifacts
(junit/hamcrest), HBase, DataNucleus, Ant, Avro, Parquet, YARN server
components, and compile-time annotation JARs.

Why are the changes needed?

Fixes #10457.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests and integration tests pass.

…kage size

Reduce the distribution package size of the Hive Metastore 2 and 3
catalog libs by adding dependency exclusions and file-level exclusions
to filter out unnecessary transitive dependencies.

Changes:
- Add dependency-level excludes on `hadoop2.common` and `hive2/3.metastore`
  for groups like slf4j, log4j, findbugs, jetty that are either provided
  by the server runtime or not needed at runtime
- Add file-level excludes in the `copyLibs` task to filter JARs that
  sneak in via other transitive paths:
  - Logging: log4j-*.jar, slf4j-*.jar, logback-*.jar
  - Test artifacts: junit-*.jar, hamcrest-*.jar, *-tests.jar
  - HBase subsystem: hbase-*.jar (not needed for HMS client)
  - DataNucleus: datanucleus-*.jar (HMS server-side persistence only)
  - Ant build tools: ant-*.jar
  - Serialization: avro-*.jar, parquet-hadoop-bundle-*.jar
  - Tephra/Twill: tephra-*.jar, twill-*.jar (optional HMS extensions)
  - YARN server: hadoop-yarn-server-*.jar
  - Annotations: jsr305-*.jar, spotbugs-annotations-*.jar,
    findbugs-annotations-*.jar, error_prone_annotations-*.jar, jol-core-*.jar

Result:
- hive-metastore2-libs: 102MB → 78MB (−24MB, −38 JARs)
- hive-metastore3-libs: 127MB → 104MB (−23MB, −4 JARs)
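
The file-level filtering described in this commit message could look roughly like the sketch below. It is an illustration only, not the PR's actual diff: the task wiring, the `runtimeClasspath` source, and the `copied-libs` output directory are assumptions, and only a few of the listed glob patterns are shown.

```kotlin
// build.gradle.kts (sketch under stated assumptions, not the PR's real build script)
plugins {
    `java-library`
}

// Assumption: copyLibs is a plain Copy task fed by the runtime classpath.
tasks.register<Copy>("copyLibs") {
    from(configurations.runtimeClasspath)
    // Hypothetical output directory, chosen for illustration.
    into(layout.buildDirectory.dir("copied-libs"))

    // File-name globs taken from the commit message; they match on the
    // JAR file name only, regardless of which dependency pulled it in.
    exclude("log4j-*.jar", "slf4j-*.jar", "logback-*.jar")   // logging
    exclude("junit-*.jar", "hamcrest-*.jar", "*-tests.jar")  // test artifacts
    exclude("hbase-*.jar")                                    // HBase subsystem
    exclude("datanucleus-*.jar")                              // HMS server-side persistence
}
```

Because these patterns match file names rather than Maven coordinates, they catch JARs that arrive through any transitive path, but a renamed artifact would slip through silently.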
Copilot AI review requested due to automatic review settings March 17, 2026 14:47
@diqiu50 diqiu50 self-assigned this Mar 17, 2026

Copilot AI left a comment

Pull request overview

This PR reduces the distribution package size for Hive Metastore 2 and 3 catalog libraries by excluding unnecessary transitive dependencies and filtering out unneeded JARs during packaging, addressing #10457.

Changes:

  • Add dependency-level exclusions to hadoop2.common and hive2/3.metastore to avoid pulling in unwanted transitive artifacts.
  • Add copyLibs file-pattern excludes to prevent specific JARs from being shipped in the distribution packages.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| catalogs/hive-metastore3-libs/build.gradle.kts | Adds dependency exclusions and filters copied JARs to shrink the Hive 3 metastore libs package. |
| catalogs/hive-metastore2-libs/build.gradle.kts | Mirrors the same exclusions/filtering approach for the Hive 2 metastore libs package. |

Comment thread catalogs/hive-metastore3-libs/build.gradle.kts Outdated
Comment thread catalogs/hive-metastore2-libs/build.gradle.kts
Comment thread catalogs/hive-metastore3-libs/build.gradle.kts Outdated
github-actions Bot commented Mar 17, 2026

Code Coverage Report

| Scope | Coverage | Status |
|-------|----------|--------|
| Overall Project | 64.85% | 🟢 |
| Files changed | No Java source files changed | - |

| Module | Coverage | Status |
|--------|----------|--------|
| aliyun | 1.73% | 🔴 |
| api | 47.14% | 🟢 |
| authorization-common | 85.96% | 🟢 |
| aws | 1.1% | 🔴 |
| azure | 2.6% | 🔴 |
| catalog-common | 10.0% | 🔴 |
| catalog-fileset | 80.02% | 🟢 |
| catalog-hive | 80.98% | 🟢 |
| catalog-jdbc-clickhouse | 79.06% | 🟢 |
| catalog-jdbc-common | 42.89% | 🟢 |
| catalog-jdbc-doris | 80.28% | 🟢 |
| catalog-jdbc-hologres | 54.03% | 🟢 |
| catalog-jdbc-mysql | 79.23% | 🟢 |
| catalog-jdbc-oceanbase | 78.38% | 🟢 |
| catalog-jdbc-postgresql | 82.05% | 🟢 |
| catalog-jdbc-starrocks | 78.27% | 🟢 |
| catalog-kafka | 77.01% | 🟢 |
| catalog-lakehouse-generic | 45.07% | 🟢 |
| catalog-lakehouse-hudi | 79.1% | 🟢 |
| catalog-lakehouse-iceberg | 87.15% | 🟢 |
| catalog-lakehouse-paimon | 77.71% | 🟢 |
| catalog-model | 77.72% | 🟢 |
| cli | 44.51% | 🟢 |
| client-java | 77.83% | 🟢 |
| common | 49.42% | 🟢 |
| core | 80.96% | 🟢 |
| filesystem-hadoop3 | 76.97% | 🟢 |
| flink | 38.86% | 🔴 |
| flink-runtime | 0.0% | 🔴 |
| gcp | 14.2% | 🔴 |
| hadoop-common | 10.39% | 🔴 |
| hive-metastore-common | 45.82% | 🟢 |
| iceberg-common | 50.21% | 🟢 |
| iceberg-rest-server | 66.24% | 🟢 |
| integration-test-common | 0.0% | 🔴 |
| jobs | 66.17% | 🟢 |
| lance-common | 23.78% | 🔴 |
| lance-rest-server | 57.84% | 🟢 |
| lineage | 53.02% | 🟢 |
| optimizer | 82.95% | 🟢 |
| optimizer-api | 21.95% | 🔴 |
| server | 85.6% | 🟢 |
| server-common | 69.43% | 🟢 |
| spark | 32.79% | 🔴 |
| spark-common | 39.09% | 🔴 |
| trino-connector | 31.62% | 🔴 |

@diqiu50 diqiu50 marked this pull request as draft March 18, 2026 09:23
diqiu50 and others added 3 commits March 19, 2026 16:47
…ions and sync comments

- Add exclude(group = "com.google.guava") and exclude(group = "ch.qos.logback")
  at the dependency level for hadoop2.common and hive2/3.metastore in both
  hive-metastore2-libs and hive-metastore3-libs, consistent with all other
  excluded groups
- Add cross-reference comments noting the exclusion lists are kept in sync
  between the two modules, and documenting that Guava/Logback are provided
  by the Gravitino runtime classpath
…ency-level excludes

Replace all 19 file-name glob patterns in copyLibs tasks with
dependency-level exclude(group = ...) declarations, making the
filtering auditable via ./gradlew dependencies and immune to
JAR filename changes.
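
A dependency-level exclusion of the kind this commit describes might be declared as in the following sketch. The `libs.hive2.metastore` and `libs.hadoop2.common` accessors are assumed to be version-catalog aliases (the commit messages only name them as `hadoop2.common` and `hive2/3.metastore`), the `implementation` scope is a guess, and the group list is a small illustrative subset rather than the full set used in the PR.

```kotlin
// build.gradle.kts (sketch; alias names, scope, and group list are assumptions)
dependencies {
    implementation(libs.hive2.metastore) {
        // Each exclude drops every transitive module of that group
        // reachable through this one dependency.
        exclude(group = "org.slf4j")
        exclude(group = "ch.qos.logback")
        exclude(group = "com.google.guava")
        exclude(group = "org.apache.hbase")
        exclude(group = "org.datanucleus")
    }
    implementation(libs.hadoop2.common) {
        exclude(group = "org.slf4j")
        exclude(group = "ch.qos.logback")
        exclude(group = "com.google.guava")
    }
}
```

Since the exclusions act on the dependency graph rather than on copied files, the effect shows up in the output of `./gradlew dependencies`, which is the auditability benefit the commit message mentions. An excluded group can still arrive through another dependency that does not declare the same exclusion, so each declaration that pulls it in would need its own `exclude`.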
@diqiu50 diqiu50 marked this pull request as ready for review March 20, 2026 01:53
@yuqi1129

[image] What's this used for? Can it be excluded?

diqiu50 commented Mar 20, 2026

> [image] What's this used for? Can it be excluded?

Remove it

@diqiu50 diqiu50 requested a review from yuqi1129 March 20, 2026 07:31
Comment on lines 39 to 41

These three can be `compileOnly`.


fixed

…eOnly in hive-metastore-common

These dependencies are provided at runtime by the catalog plugin classloader,
so they should be compileOnly to avoid bundling them into the package.
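
As a rough illustration of the scope change, a `compileOnly` declaration in a Gradle Kotlin DSL script looks like the sketch below. The aliases are hypothetical: the thread does not say which three dependencies on lines 39 to 41 were changed, so the names here are placeholders.

```kotlin
// build.gradle.kts for hive-metastore-common (sketch; the aliases are placeholders,
// the PR thread does not name the three dependencies that were changed)
dependencies {
    // compileOnly puts the dependency on the compile classpath but keeps it
    // out of runtimeClasspath, so it is not bundled into the package.
    compileOnly(libs.hive2.metastore)
    compileOnly(libs.hadoop2.common)

    // compileOnly is not inherited by the test configurations, so tests that
    // use these classes would need their own declaration.
    testImplementation(libs.hive2.metastore)
    testImplementation(libs.hadoop2.common)
}
```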
@yuqi1129

@diqiu50 CI failed.

diqiu50 commented Mar 23, 2026

> @diqiu50 CI failed.

fixed

@diqiu50 diqiu50 requested a review from yuqi1129 March 23, 2026 06:59
@yuqi1129

[image]

Why does `hadoop-common-2.10.2` exist in `hive3`? Does Hive 3 still use Hadoop 2.10.2?

diqiu50 commented Mar 23, 2026

> [image] Why does `hadoop-common-2.10.2` exist in `hive3`? Hive3 still uses `Hadoop` 2.10.2.

Yes, Hive 3 also needs hadoop-common; Hadoop 2.10.2 is OK.

@yuqi1129

I have no more comments. @jerryshao, would you like to take a look?

@jerryshao jerryshao requested a review from mchades March 24, 2026 09:30

@mchades mchades left a comment

LGTM.

@diqiu50 diqiu50 added the branch-1.2 Automatically cherry-pick commit to branch-1.2 label Mar 25, 2026
@yuqi1129 yuqi1129 merged commit 74e9881 into apache:main Mar 25, 2026
24 checks passed
github-actions Bot pushed a commit that referenced this pull request Mar 25, 2026
… 3 catalogs (#10465)

diqiu50 added a commit to diqiu50/gravitino that referenced this pull request Mar 27, 2026
… 2 and 3 catalogs (apache#10465)

jerryshao pushed a commit that referenced this pull request Mar 30, 2026
…size of Hive Metastore 2 and 3 catalogs (#10465) (#10544)

**Cherry-pick Information:**
- Original commit: 74e9881
- Target branch: `branch-1.2`
- Status: ✅ Clean cherry-pick (no conflicts)

Co-authored-by: Yuhui <hui@datastrato.com>
danhuawang pushed a commit to danhuawang/gravitino that referenced this pull request Mar 30, 2026
… 2 and 3 catalogs (apache#10465)

Labels: branch-1.2 (Automatically cherry-pick commit to branch-1.2)

Linked issue: [Improvement] Shrink the package size of Hive Metastore 2 and Hive Metastore 3 catalogs

4 participants