Skip to content

Support large Zip files#23594

Open
lzhou2025 wants to merge 1 commit intoeclipse-openj9:masterfrom
lzhou2025:zipSizeLimit
Open

Support large Zip files#23594
lzhou2025 wants to merge 1 commit intoeclipse-openj9:masterfrom
lzhou2025:zipSizeLimit

Conversation

@lzhou2025
Copy link
Copy Markdown
Member

@lzhou2025 lzhou2025 commented Mar 27, 2026

The changes are to remove the 4GB size limit of Zip files and support LARGE FILE for JVM in 64-bit and 32-bit environments.

Fixes: #23441, #23458

@lzhou2025 lzhou2025 force-pushed the zipSizeLimit branch 12 times, most recently from 5fc8e1c to 075e85f Compare April 2, 2026 19:30
@lzhou2025 lzhou2025 marked this pull request as ready for review April 6, 2026 12:11
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c
Comment thread runtime/include/vmizip.h
Comment thread runtime/include/zipsup.h
Comment thread runtime/zip/zipsup.c
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
@lzhou2025
Copy link
Copy Markdown
Member Author

I believe all comments are resolved, please reopen if missed.

Comment thread runtime/zip/zipsup.c
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
@keithc-ca
Copy link
Copy Markdown
Contributor

The commit message would be improved by explicitly stating that this support is not yet complete for 32-bit platforms.

@lzhou2025 lzhou2025 force-pushed the zipSizeLimit branch 2 times, most recently from dde7099 to 39bab11 Compare April 9, 2026 20:57
@lzhou2025
Copy link
Copy Markdown
Member Author

lzhou2025 commented Apr 10, 2026

I believe all comments are resolved, please reopen if missed.
For the legacy codes written decades ago, signed 32-bit integer (IDATA) is widely used in 32-bit platforms, will be replaced by I_64 in the next PR to support #23458 and LARGE FILE in 32-bit platforms.

Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
Comment thread runtime/zip/zipsup.c Outdated
@lzhou2025
Copy link
Copy Markdown
Member Author

lzhou2025 commented Apr 13, 2026

That IDATA is widely used in the zip codes appears because of OMR,

omrfile_read(struct OMRPortLibrary *portLibrary, intptr_t fd, void *buf, intptr_t nbytes)

intptr_t should be uintptr_t or size_t, because the read() calls of Unix like systems or POSIX are size_t, unsigned.

@keithc-ca
Copy link
Copy Markdown
Contributor

omrfile_read(struct OMRPortLibrary *portLibrary, intptr_t fd, void *buf, intptr_t nbytes)

intptr_t should be uintptr_t or size_t, because the read() calls of Unix like systems or POSIX are size_t, unsigned.

Yes, perhaps the OMR API could be improved, but that is unrelated to the concepts involved here. The file format allows for large entries and files, perhaps larger than could exist in memory at any given time. Things could still be made to work if OMR only allowed reading 32kB in a single call (in support of even 32-bit zip files).

@lzhou2025
Copy link
Copy Markdown
Member Author

omrfile_read(struct OMRPortLibrary *portLibrary, intptr_t fd, void *buf, intptr_t nbytes)

intptr_t should be uintptr_t or size_t, because the read() calls of Unix like systems or POSIX are size_t, unsigned.

Yes, perhaps the OMR API could be improved, but that is unrelated to the concepts involved here. The file format allows for large entries and files, perhaps larger than could exist in memory at any given time. Things could still be made to work if OMR only allowed reading 32kB in a single call (in support of even 32-bit zip files).

omrfile_read(struct OMRPortLibrary *portLibrary, intptr_t fd, void *buf, intptr_t nbytes)

intptr_t should be uintptr_t or size_t, because the read() calls of Unix like systems or POSIX are size_t, unsigned.

Yes, perhaps the OMR API could be improved, but that is unrelated to the concepts involved here. The file format allows for large entries and files, perhaps larger than could exist in memory at any given time. Things could still be made to work if OMR only allowed reading 32kB in a single call (in support of even 32-bit zip files).

The ZIP format supports up to 4Gb size of each individual uncompressed file. But depending how to decompress it, if trying to allocate that size of memory buffer in 32-bit system, I agree, that will exceed the maximum memory space.

@lzhou2025
Copy link
Copy Markdown
Member Author

OMR layer doesn't know what size of data is required in applications. Before the zip functions call OMR interfaces, should allocate memory buffers, which is fit to uncompress logic.

@lzhou2025
Copy link
Copy Markdown
Member Author

lzhou2025 commented Apr 15, 2026

ZIP64 & LARGE FILE support in two PRs:

@lzhou2025
Copy link
Copy Markdown
Member Author

I believe all comments related to this PR are resolved.

Comment thread runtime/include/zipsup.h
Comment on lines 88 to 99
#if defined(J9VM_ENV_DATA64)
/* entries in the central dir <= INT32_MAX */
I_32 totalEntries;
/* size of the central dir <= LLONG_MAX */
I_64 dirSize;
/* offset of the central dir <= LLONG_MAX */
I_64 dirOffset;
#else /* defined(J9VM_ENV_DATA64) */
U_16 totalEntries;
U_32 dirSize;
U_32 dirOffset;
#endif /* defined(J9VM_ENV_DATA64) */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is incompatible with DDR. Field types may not change over time, nor vary by platform.
For example, on 32-bit platforms this would yield

    // U16 totalEntries
    @com.ibm.j9ddr.GeneratedFieldAccessor(offsetFieldName="_totalEntriesOffset_", declaredType="U16")
    public U16 totalEntries() throws CorruptDataException {
        return new U16(getShortAtOffset(J9ZipCentralEnd32._totalEntriesOffset_));
    }

    // U16 totalEntries
    public U16Pointer totalEntriesEA() throws CorruptDataException {
        return U16Pointer.cast(nonNullFieldEA(J9ZipCentralEnd32._totalEntriesOffset_));
    }

and on 64-bit platforms

    // I32 totalEntries
    @com.ibm.j9ddr.GeneratedFieldAccessor(offsetFieldName="_totalEntriesOffset_", declaredType="I32")
    public IDATA totalEntries() throws CorruptDataException {
        return new I32(getIntAtOffset(J9ZipCentralEnd._totalEntriesOffset_));
    }

    // I32 totalEntries
    public IDATAPointer totalEntriesEA() throws CorruptDataException {
        return IDATAPointer.cast(nonNullFieldEA(J9ZipCentralEnd._totalEntriesOffset_));
    }

The return types of those methods differ so jdmpview on a 32-bit platform would encounter NoSuchMethodError accessing a system dump from a 64-bit platform and vice versa (even if we ignore the difficulties writing code that's compatible with both sets of signatures).

The 64-bit declarations must be used exclusively.

@lzhou2025 lzhou2025 changed the title Support large Zip files for JVM/64-bit Support large Zip files Apr 22, 2026
The changes are to remove the 4GB size limit of Zip files and
support LARGE FILE for JVM in 64-bit and 32-bit environments.

Fixes: eclipse-openj9#23441, eclipse-openj9#23458
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ZIP64: Remove the 4Gb size limit of archive file for JVM/64-bit

3 participants