Description
When running the changesets CLI command on a large .osm.bz2 changeset dump extracted with osmium, the process crashes with:
java.lang.OutOfMemoryError: Required array size too large
Importing the full changesets planet file does not trigger this issue.
Steps to reproduce
- Download the changesets planet file
- Prepare a large changeset extract in .osm.bz2 format (in my case, Italy). I ran the following command:
osmium changeset-filter \
--bbox=6.240134062,35.5420306591,19.0018525438,47.2604346174 \
-o changesets-260124-italy.osm.bz2 \
changesets-260124.osm.bz2
This produces changesets-260124-italy.osm.bz2 (390 MB), the file with which I can reproduce the error.
- Launch a PostGIS container (as per the instructions in the README.md) with:
docker run \
--name "ohsome_planet_changeset_db" \
-e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
-e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
-e POSTGRES_DB=$OHSOME_PLANET_DB \
-p 5432:5432 \
-v postgis_data:/var/lib/postgresql/data \
postgis/postgis:latest
- Try importing into PostGIS with:
java -Xmx52G -jar ohsome-planet-cli/target/ohsome-planet.jar changesets \
--bz2 changesets-260124-italy.osm.bz2 \
--changeset-db "jdbc:postgresql://localhost:5432/$OHSOME_PLANET_DB?user=$OHSOME_PLANET_DB_USER&password=$OHSOME_PLANET_DB_PASSWORD" \
--create-tables \
--overwrite
Actual behavior
The process throws:
java.lang.OutOfMemoryError: Required array size too large
Stack trace (excerpt):
java.lang.OutOfMemoryError: Required array size too large
at java.base/java.io.InputStream.readNBytes(InputStream.java:420)
at java.base/java.io.InputStream.readAllBytes(InputStream.java:349)
at org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress(PBZ2Reader.java:46)
...
After the error, the process hangs and never terminates.
Expected behavior
The changesets import should succeed or at least fail gracefully with a clear message.
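The stack trace does not show how ohsome-planet distributes work across threads, so the following is only a hypothesis about the hang: if the OutOfMemoryError is thrown inside a worker thread, that thread dies while the main thread may keep waiting for results that never arrive. A minimal sketch using plain java.util.concurrent (the class FailFast and method runWorker are hypothetical, not part of ohsome-planet) of how a worker's error can be surfaced through Future.get() and turned into a clear message:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: propagate an error thrown in a worker thread to the caller
// instead of letting the caller wait forever.
final class FailFast {

    static String runWorker(Runnable task) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<?> future = pool.submit(task);
            // FutureTask catches any Throwable (including Errors) thrown by the task;
            // get() rethrows it wrapped in an ExecutionException.
            future.get();
            return "ok";
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            return "import failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String result = runWorker(() -> {
            // Simulates the error from the stack trace.
            throw new OutOfMemoryError("Required array size too large");
        });
        // A CLI would typically also exit with a non-zero status at this point.
        System.err.println(result);
    }
}
```

Reporting this particular OutOfMemoryError is comparatively safe: the JDK throws it before attempting the oversized allocation, so the heap itself is not exhausted.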
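Environment
ohsome-planet: commit 22721f27fdd4b7120c502549a1fb6f2908d8b02b, tag 1.2.0 (I cloned the repository and checked out the tag; see issue #29, "Build of ohsome-planet-cli fails for release 1.2.0")
$ osmium --version
osmium version 1.16.0
libosmium version 2.20.0
Supported PBF compression types: none zlib lz4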
$ java -version
openjdk version "21.0.9" 2025-10-21
OpenJDK Runtime Environment (build 21.0.9+10-Ubuntu-124.04)
OpenJDK 64-Bit Server VM (build 21.0.9+10-Ubuntu-124.04, mixed mode, sharing)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
$ uname -a
Linux [host] 6.8.0-94-generic #96-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 20:36:55 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Postgres (w/ Docker):
# SELECT version();
version
-----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.5 (Debian 17.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
PostGIS:
# SELECT extversion
FROM pg_catalog.pg_extension
WHERE extname='postgis';
extversion
------------
3.5.2
(1 row)
Workaround
With help from ChatGPT, I found a workaround: repacking the .bz2 file with pbzip2:
bunzip2 -c changesets-260124-italy.osm.bz2 \
| pbzip2 -b50 -p8 > changesets-260124-italy.repacked.osm.bz2
In this way, I am able to import the changeset extract without issue.
The stack trace shows the exception originates from java.io.InputStream.readAllBytes() inside org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress, suggesting the implementation tries to read the decompressed content into a single byte array.
This failure happens even with a large heap (-Xmx52G), so the heap size is not the limiting factor. Java arrays are indexed by int, so a single byte[] can hold at most about 2^31 - 1 bytes (~2 GB); readAllBytes() throws this error as soon as the decompressed content would exceed that limit.
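To make the limit concrete: Integer.MAX_VALUE is 2,147,483,647, and in the OpenJDK sources InputStream.readNBytes() refuses to grow its buffer beyond Integer.MAX_VALUE - 8 = 2,147,483,639 bytes (about 2 GiB), independently of -Xmx. The sketch below (a hypothetical helper, BoundedRead.readAllBytesBounded, not part of ohsome-planet or the JDK) keeps whole-chunk buffering but fails with a descriptive IOException naming the input, which would meet the "fail gracefully" expectation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: read a whole stream into memory, but report an oversized
// input with a clear IOException instead of an OutOfMemoryError.
final class BoundedRead {

    // Mirrors the cap used by the JDK's InputStream.readNBytes (assumption based on OpenJDK sources).
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static byte[] readAllBytesBounded(InputStream in, long limit, String sourceName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            // Check before buffering, so the oversized content is never accumulated.
            if (total > limit) {
                throw new IOException(sourceName + ": decompressed data exceeds " + limit
                        + " bytes and cannot be buffered in a single array;"
                        + " consider repacking the input with pbzip2");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        try {
            // A tiny limit stands in for MAX_ARRAY_SIZE in this demo.
            readAllBytesBounded(new ByteArrayInputStream(data), 512, "demo.osm.bz2");
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

In a real reader the limit would be MAX_ARRAY_SIZE; a smaller limit is used here only so the demo runs instantly.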
If using Apache Commons Compress for bzip2, it may also help to ensure concatenated bzip2 streams are supported (common in “multi-stream” .bz2 files), but the core issue here appears to be readAllBytes() forcing a single huge allocation.
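A streaming approach avoids the array limit entirely: instead of materializing the decompressed XML, parse it incrementally while it is being decompressed. The sketch below uses StAX (javax.xml.stream, part of the JDK) and, because the JDK ships no bzip2 decoder, GZIPInputStream as a stand-in. With Apache Commons Compress one would wrap the input in new BZip2CompressorInputStream(in, true) instead, where the second argument enables decompression of concatenated streams. The class StreamingChangesets and its counting logic are illustrative only; the actual parsing in ohsome-planet is presumably more involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch: count <changeset> elements while decompressing, without
// ever holding the whole decompressed document in memory.
final class StreamingChangesets {

    static long countChangesets(InputStream compressed) throws IOException, XMLStreamException {
        // GZIPInputStream stands in for a bzip2 decoder (the JDK has none).
        try (InputStream in = new GZIPInputStream(compressed)) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            long count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "changeset".equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        }
    }

    // Demo helper: gzip-compress a string.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<osm><changeset id=\"1\"/><changeset id=\"2\"><tag k=\"a\" v=\"b\"/></changeset></osm>";
        System.out.println(countChangesets(new ByteArrayInputStream(gzip(xml))));
    }
}
```

Memory use here is bounded by the parser's internal buffers rather than by the decompressed size. The trade-off is that a single-stream file is decoded sequentially rather than in parallel.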
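The workaround works because pbzip2 splits the input into independent, concatenated bzip2 streams (-b50 sets the block size to 50 × 100 kB, i.e. about 5 MB of uncompressed data per stream), so the decompressed size of each stream stays far below the ~2 GB array limit. The file written by osmium is presumably a single bzip2 stream whose decompressed content exceeds that limit. This would also explain why the full planet file imports fine, if it is likewise distributed as a multi-stream .bz2.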
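To check whether a given .bz2 file is single- or multi-stream, one can look for stream headers: every bzip2 stream starts with the bytes BZh followed by a block-size digit 1-9, and (unless empty) its first block begins with the magic number 0x314159265359. Since each stream is padded to a byte boundary and pbzip2 writes complete streams back to back, each header starts on a byte boundary. The heuristic sketch below (hypothetical class Bzip2StreamCounter) counts such headers; a false match inside compressed data is possible in principle but unlikely for a 10-byte pattern.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Heuristic sketch: count byte-aligned bzip2 stream headers in a file.
final class Bzip2StreamCounter {

    // "BZh" + digit (4 bytes) followed by the 6-byte block magic 0x314159265359.
    private static final int HEADER_LENGTH = 10;
    private static final int[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    static long countStreams(InputStream input) throws IOException {
        InputStream in = new BufferedInputStream(input);
        int[] window = new int[HEADER_LENGTH]; // ring buffer holding the last 10 bytes read
        long position = 0;
        long count = 0;
        int b;
        while ((b = in.read()) != -1) {
            window[(int) (position % HEADER_LENGTH)] = b;
            position++;
            if (position >= HEADER_LENGTH && matchesHeader(window, (int) (position % HEADER_LENGTH))) {
                count++;
            }
        }
        return count;
    }

    // `start` is the ring index of the oldest byte in the window.
    private static boolean matchesHeader(int[] window, int start) {
        if (at(window, start, 0) != 'B' || at(window, start, 1) != 'Z' || at(window, start, 2) != 'h') {
            return false;
        }
        int digit = at(window, start, 3);
        if (digit < '1' || digit > '9') {
            return false;
        }
        for (int i = 0; i < BLOCK_MAGIC.length; i++) {
            if (at(window, start, 4 + i) != BLOCK_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    private static int at(int[] window, int start, int offset) {
        return window[(start + offset) % HEADER_LENGTH];
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(countStreams(in) + " bzip2 stream(s)");
        }
    }
}
```

Usage: java Bzip2StreamCounter.java changesets-260124-italy.osm.bz2. One would presumably see a count of 1 for the osmium output and a large count for the repacked file.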