[SPARK-51654][BUILD] Add a dev script to compare SBT and Maven builds #54371
fangchenli wants to merge 21 commits into apache:master
That's a great finding!
What is this for? Like, what's the end goal? The JIRA just says we need to investigate, but generally we've just had Maven be the build of record and sbt be the dev build. Is there a problem we're trying to solve?
+1, I am also wondering how to use it in the release process or in daily development, and, if there are differences, what the follow-up actions are.
+1. This is actually a long-standing 'issue'. From a personal perspective, I would much prefer to convert the project into an sbt-only project to completely eliminate the possibility of such inconsistencies. However, I've somewhat forgotten the reasons for not doing so.
Knowing the differences is good, but fixing them is more important. It's more valuable to merge changes that make the SBT jar the same as the Maven jar.
I started looking into this because I want to enable Scala 3 support for Spark. Cross-compiling Scala 3 and 2.13 gets a lot easier if we separate the sbt build from Maven, so I tried that out (haven’t opened a PR yet), and it worked locally. Right now, I’m just relying on “all tests passing” as the only sign that things work. The goal of this PR is to add more ways to observe build outputs, so we can be more confident that the sbt build refactor didn’t introduce any subtle changes. Alternatively, I can keep this as a personal dev tool for now and focus on advancing the native sbt build. We could get that merged quickly and use it experimentally to monitor for potential issues. Once we’re confident in its stability, we could migrate the release process to sbt and make Spark an sbt-only project. xref: SPARK-44173
What changes were proposed in this pull request?
Add a dev script to compare SBT and Maven builds. It is pure Python with no external dependencies.
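To make the idea concrete, here is a minimal sketch of one way two jars could be compared using only the Python standard library. It is not the script from this PR; the function names `jar_entries` and `compare_jars` are hypothetical, and the comparison (entry names plus CRC-32 and uncompressed size) is an assumption about what a useful first-pass diff might look at.

```python
import zipfile


def jar_entries(jar_path):
    """Map each file entry in a jar to its (CRC-32, uncompressed size)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            info.filename: (info.CRC, info.file_size)
            for info in jar.infolist()
            if not info.is_dir()  # directory entries carry no content
        }


def compare_jars(maven_jar, sbt_jar):
    """Hypothetical helper: list entries only in the Maven jar, entries only
    in the SBT jar, and entries present in both whose content differs."""
    maven = jar_entries(maven_jar)
    sbt = jar_entries(sbt_jar)
    only_maven = sorted(maven.keys() - sbt.keys())
    only_sbt = sorted(sbt.keys() - maven.keys())
    changed = sorted(
        name for name in maven.keys() & sbt.keys() if maven[name] != sbt[name]
    )
    return only_maven, only_sbt, changed
```

Calling `compare_jars` on the Maven-built and SBT-built jar of the same module returns three sorted lists. Because only the CRC-32 and size are compared, entry timestamps and compression settings are ignored, and a matching CRC-32 is strong evidence of identical content rather than proof.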
Why are the changes needed?
Currently, the JARs produced by Maven and SBT differ, and we need to be able to inspect those differences. This is also a precursor to a native SBT build. We can answer the question in the original Jira issue:
Output:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The script includes a partial self-test. To test it further, we need more user feedback and an investigation of the differences it finds.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.6