Changeset import to PostGIS changesets fails with OutOfMemoryError: Required array size too large #33

@CristianCantoro

Description

When running the changesets CLI command on a large .osm.bz2 changeset dump extracted with osmium, the process crashes with:

java.lang.OutOfMemoryError: Required array size too large

Note that importing the full planet file does not cause this issue.

Steps to reproduce

  1. Download the changesets planet file
  2. Prepare a large changeset extract in .osm.bz2 format (in my case, Italy). I ran the following command:
osmium changeset-filter \
  --bbox=6.240134062,35.5420306591,19.0018525438,47.2604346174 \
  -o changesets-260124-italy.osm.bz2 \
    changesets-260124.osm.bz2

The resulting file, with which I can reproduce the error, is changesets-260124-italy.osm.bz2 (390MB).

  3. Launch a PostGIS container (as per the instructions in the README.md) with:
docker run \
    --name "ohsome_planet_changeset_db" \
    -e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
    -e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
    -e POSTGRES_DB=$OHSOME_PLANET_DB \
    -p 5432:5432 \
    -v postgis_data:/var/lib/postgresql/data \
    postgis/postgis:latest
  4. Try importing into PostGIS with:
java -Xmx52G -jar ohsome-planet-cli/target/ohsome-planet.jar changesets \
  --bz2 changesets-260124-italy.osm.bz2 \
  --changeset-db "jdbc:postgresql://localhost:5432/$OHSOME_PLANET_DB?user=$OHSOME_PLANET_DB_USER&password=$OHSOME_PLANET_DB_PASSWORD" \
  --create-tables \
  --overwrite

Actual behavior

The process throws:
java.lang.OutOfMemoryError: Required array size too large

Stack trace (excerpt):

java.lang.OutOfMemoryError: Required array size too large
  at java.base/java.io.InputStream.readNBytes(InputStream.java:420)
  at java.base/java.io.InputStream.readAllBytes(InputStream.java:349)
  at org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress(PBZ2Reader.java:46)
  ...

(full log)

The process then hangs and never terminates.

Expected behavior

The changesets import should succeed or at least fail gracefully with a clear message.
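
As an illustration of what "fail gracefully" could look like, here is a minimal sketch that reads at most a fixed number of bytes and raises an exception with a descriptive message when the input is larger. The class `BoundedRead`, the method `readAtMost`, and the wording of the message are hypothetical and not taken from ohsome-planet; only `InputStream.readNBytes(int)` (Java 11+) is a real JDK call.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of ohsome-planet. */
final class BoundedRead {

    /**
     * Reads the whole stream if it holds at most maxBytes bytes.
     * Throws an IOException with a clear message instead of letting
     * the JVM fail with an OutOfMemoryError on oversized input.
     */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        // readNBytes returns fewer than maxBytes only if the stream ended.
        byte[] data = in.readNBytes(maxBytes);
        // If the buffer is full, probe one more byte to detect leftover input.
        if (data.length == maxBytes && in.read() != -1) {
            throw new IOException(
                "Decompressed block is larger than " + maxBytes
                + " bytes; consider repacking the input into smaller blocks.");
        }
        return data;
    }
}
```

A caller could catch the `IOException` and print its message before exiting with a non-zero status, instead of leaving the process hanging.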

Environment

  • osmium:
osmium version 1.16.0
libosmium version 2.20.0
Supported PBF compression types: none zlib lz4
  • Java:
$ java -version
openjdk version "21.0.9" 2025-10-21
OpenJDK Runtime Environment (build 21.0.9+10-Ubuntu-124.04)
OpenJDK 64-Bit Server VM (build 21.0.9+10-Ubuntu-124.04, mixed mode, sharing)
  • OS: Ubuntu 24.04
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:  Ubuntu 24.04.3 LTS
Release:  24.04
Codename: noble
$ uname -a
Linux [host] 6.8.0-94-generic #96-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan  9 20:36:55 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
  • Database:

Postgres (w/ Docker):

# SELECT version();
                                                           version                                                           
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.5 (Debian 17.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

PostGIS:

# SELECT extversion
FROM pg_catalog.pg_extension
WHERE extname='postgis';
 extversion 
------------
 3.5.2
(1 row)

Workaround

With help from ChatGPT, I found a workaround for this problem: repacking the .bz2 file with pbzip2:

bunzip2 -c changesets-260124-italy.osm.bz2 \
  | pbzip2 -b50 -p8 > changesets-260124-italy.repacked.osm.bz2

With the repacked file, I am able to import the changeset extract without issue.

The stack trace shows the exception originates from java.io.InputStream.readAllBytes() inside org.heigit.ohsome.osm.changesets.PBZ2Reader.decompress, suggesting the implementation tries to read the decompressed content into a single byte array.

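This failure happens even with a large heap (-Xmx52G), which indicates that the problem is likely not heap exhaustion but the JDK array size limit: Java arrays are indexed by int, so a single byte array cannot hold more than about 2 GB (2^31 - 1 bytes), and the error is triggered by attempting to allocate a larger one.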
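
One way to stay clear of the array size limit is to never hold the whole decompressed content in one array, and instead pass it along through a small fixed-size buffer. The sketch below shows that pattern with plain JDK streams; the class `StreamingCopy` and its `copy` method are hypothetical names, and this is not a patch for `PBZ2Reader`, whose surrounding code is not shown in this report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not part of ohsome-planet. */
final class StreamingCopy {

    /**
     * Copies everything from in to out through a buffer of bufferSize bytes.
     * Memory use stays at bufferSize regardless of the stream length, and the
     * byte count is a long, so streams larger than 2 GB are counted correctly.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream, otherwise the number of bytes read.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Whether this fits depends on what the consumer of the decompressed data expects: a parser that accepts an `InputStream` can be fed incrementally, while code that needs random access to the full content would need a different design.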

If using Apache Commons Compress for bzip2, it may also help to ensure concatenated bzip2 streams are supported (common in “multi-stream” .bz2 files), but the core issue here appears to be readAllBytes() forcing a single huge allocation.

The workaround presumably works because pbzip2 compresses the input as independent blocks (-b50 sets the block size to 50 × 100 kB), so each chunk the reader decompresses is far smaller than 2GB.