DeepCompress: Efficient Point Cloud Geometry Compression

Authors

Affiliation: Ericsson Research
Paper: arXiv:2106.01504

What is Point Cloud Compression?

The Problem

A point cloud is a collection of 3D points that represent the shape of an object or environment. Think of it like a 3D scan of the world—each point has an X, Y, and Z coordinate, and together they form a detailed 3D model.

Point clouds are used in:

Self-driving cars: LIDAR sensors generate millions of 3D points to understand the environment
Virtual/Augmented Reality: Creating realistic 3D environments
3D mapping: Surveying buildings, cities, and landscapes
Medical imaging: 3D body scans and organ models

The challenge: Point clouds are huge. A single LIDAR scan can contain millions of points, and streaming or storing this data requires enormous bandwidth and storage. For example:

A 10-second LIDAR capture might be 500MB uncompressed
Streaming this in real-time would require 400 Mbps bandwidth
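The bandwidth figure follows directly from the file size and the capture duration. A minimal sketch of the arithmetic (the function name is made up for illustration, and 1 MB is taken as 10^6 bytes):

```python
def required_mbps(size_megabytes: float, duration_seconds: float) -> float:
    """Bandwidth in megabits per second needed to stream a capture in real time."""
    megabits = size_megabytes * 8  # 1 byte = 8 bits
    return megabits / duration_seconds

# 500 MB captured over 10 seconds
print(required_mbps(500, 10))  # 400.0
```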

The Solution

DeepCompress uses deep learning to compress point clouds efficiently—similar to how JPEG compresses images or MP3 compresses audio. The key insight is that point clouds have patterns and structure that a neural network can learn to represent more efficiently.

How it works (simplified):

Encode: The neural network analyzes the point cloud and creates a compact "summary" (called a latent representation)
Compress: This summary is converted to a small file using entropy coding (like ZIP, but smarter)
Decompress: The summary is expanded back
Decode: The neural network reconstructs a close approximation of the original point cloud

The result: 10-100x smaller files with minimal quality loss.

What's New in V2

DeepCompress V2 introduces two major improvements:

1. Smarter Compression (Advanced Entropy Models)

What is entropy modeling?

When compressing data, we need to predict "how surprising" each value is. Common values can be stored with fewer bits; rare values need more bits. This is called entropy coding.

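Analogy: In English text, the letter 'E' is very common, so we could represent it with a short code (like '0'). The letter 'Z' is rare, so it gets a longer code (like '10110'). No short code is the start of a longer one, so the decoder always knows where each letter ends. Morse code follows the same idea (E is a single dot, Z takes four symbols), and it is the foundation of entropy coding.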
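To make the idea concrete, here is a small sketch that builds Huffman code lengths for a toy alphabet and compares them with the ideal length of -log2(p) bits per symbol. The four-symbol alphabet and its probabilities are made up for illustration. DeepCompress itself uses arithmetic coding rather than Huffman codes, so this only demonstrates the principle.

```python
import heapq
import math

def huffman_code_lengths(probabilities: dict[str, float]) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each symbol."""
    # Heap entries: (probability, tie_breaker, symbols_in_subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probabilities}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

# Toy alphabet: common symbols get short codes, rare ones get long codes.
probs = {"E": 0.5, "T": 0.25, "Q": 0.125, "Z": 0.125}
lengths = huffman_code_lengths(probs)
ideal = {s: -math.log2(p) for s, p in probs.items()}
average_bits = sum(probs[s] * lengths[s] for s in probs)

print(lengths)       # {'E': 1, 'T': 2, 'Q': 3, 'Z': 3}
print(average_bits)  # 1.75
```

With these probabilities the Huffman lengths match the ideal lengths exactly, and the average of 1.75 bits per symbol beats the 2 bits a fixed-length code would need.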

V2 offers multiple ways to predict these probabilities:

Entropy Model	How It Works	Best For
`gaussian`	Assumes all values follow a simple bell curve	Fast, basic compression
`hyperprior`	Learns a custom probability for each location	Good balance of speed and compression
`channel`	Uses already-decoded parts to predict the rest	Better compression, still fast
`context`	Looks at neighboring values for prediction	Best compression, slower
`attention`	Considers long-range patterns across the entire cloud	Complex shapes with repeating patterns
`hybrid`	Combines multiple approaches	Maximum compression quality

Expected improvements over baseline:

Hyperprior: 15-25% smaller files
Channel context: 25-35% smaller files
Full context: 30-40% smaller files

2. Faster Processing (Performance Optimizations)

V2 includes engineering optimizations that make the code run faster and use less memory:

What We Optimized	What It Does	Improvement
Binary search for scale lookup	Finding the right compression parameter is now O(log n) instead of O(n)	5x faster, 64x less memory
Vectorized mask creation	Creating neural network masks uses efficient array operations	10-100x faster
Windowed attention	Instead of comparing every point to every other point, we only compare nearby points	10-50x faster, 400x less memory
Pre-computed constants	Mathematical constants like log(2) are calculated once, not every time	~5% faster
Smarter memory allocation	Avoid creating unnecessary temporary data	25% less memory
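As an illustration of the first row, the sketch below contrasts a linear scan with a binary search when looking up the nearest entry in a sorted scale table. The function names and the 64-entry log-spaced table are hypothetical and are not taken from the repository; the point is only the O(n) versus O(log n) lookup.

```python
import bisect

def nearest_scale_linear(scale_table, value):
    """O(n): compare the value against every table entry."""
    return min(range(len(scale_table)), key=lambda i: abs(scale_table[i] - value))

def nearest_scale_bisect(scale_table, value):
    """O(log n): binary search in a sorted table, then check the two neighbours."""
    pos = bisect.bisect_left(scale_table, value)
    if pos == 0:
        return 0
    if pos == len(scale_table):
        return len(scale_table) - 1
    before, after = scale_table[pos - 1], scale_table[pos]
    # On a tie, prefer the lower index, matching the linear scan.
    return pos if (after - value) < (value - before) else pos - 1

# Hypothetical table: 64 log-spaced scales between 0.11 and 256.
scale_table = [0.11 * (256 / 0.11) ** (i / 63) for i in range(64)]

for value in [0.05, 0.5, 1.0, 17.3, 1000.0]:
    assert nearest_scale_linear(scale_table, value) == nearest_scale_bisect(scale_table, value)
```

Both functions return the same index; the binary search simply touches about log2(64) = 6 entries instead of all 64.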

Why does this matter?

Real-time compression becomes possible
Can run on less powerful hardware
Larger point clouds can be processed without running out of memory

Quick Start

Step 1: Installation

First, set up your Python environment:

# Download the code
git clone https://github.com/pmclsf/deepcompress.git
cd deepcompress

# Create an isolated Python environment (keeps dependencies separate)
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

# Install required packages
pip install -r requirements.txt

What this does: Downloads DeepCompress and installs the necessary Python libraries (TensorFlow for neural networks, NumPy for math, etc.).

Step 2: Quick Test (No Dataset Needed)

Want to see it work without downloading any data? Run our synthetic benchmark:

python -m src.quick_benchmark --compare

What this does: Creates artificial 3D shapes (spheres, random points) and tests how well different model configurations compress them. You'll see output like:

======================================================================
Summary Comparison
======================================================================
Model                PSNR (dB)    BPV        Time (ms)    Ratio
----------------------------------------------------------------------
v1                   7.20         N/A        92.8         N/A
v2-hyperprior        7.20         0.205      74.6         156.3x
v2-channel           7.20         0.349      138.4        91.8x
======================================================================

Reading the results:

PSNR (dB): Quality metric—higher is better. Low values here are expected because the model isn't trained yet.
BPV (Bits Per Voxel): How many bits are needed per voxel of the input grid; lower means better compression.
Time (ms): Processing speed in milliseconds—lower is faster.
Ratio: Compression ratio—higher means smaller files.

Using V2 Models in Your Code

Basic Example

import tensorflow as tf
from model_transforms import DeepCompressModelV2, TransformConfig

# Step 1: Configure the model architecture
config = TransformConfig(
    filters=64,              # Number of neural network channels (more = better quality, slower)
    kernel_size=(3, 3, 3),   # Size of 3D convolution filters
    strides=(2, 2, 2),       # How much to downsample at each layer
    activation='cenic_gdn',  # Special activation function for compression
    conv_type='separable'    # Efficient convolution type
)

# Step 2: Create the model with your chosen entropy model
model = DeepCompressModelV2(
    config,
    entropy_model='hyperprior'  # Options: 'gaussian', 'hyperprior', 'channel', 'context', 'attention', 'hybrid'
)

# Step 3: Compress a point cloud
# input_tensor should be a 5D tensor: (batch, depth, height, width, channels)
# Placeholder input: one empty 64×64×64 occupancy grid; replace with your own voxelized data
input_tensor = tf.zeros((1, 64, 64, 64, 1), dtype=tf.float32)
x_hat, y, y_hat, z, rate_info = model(input_tensor, training=False)

# x_hat: The reconstructed point cloud
# rate_info['total_bits']: How many bits the compressed version would take

Enabling Faster Training with Mixed Precision

Modern GPUs can compute faster using 16-bit numbers instead of 32-bit. This is called mixed precision:

import tensorflow as tf
from precision_config import PrecisionManager

# Enable mixed precision (uses float16 for speed, float32 for accuracy where needed)
PrecisionManager.configure('mixed_float16')

# Wrap your optimizer to handle the precision scaling
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
optimizer = PrecisionManager.wrap_optimizer(optimizer)

# Now train as usual—it will automatically be faster on compatible GPUs
model.compile(optimizer=optimizer, loss=your_loss_function)
model.fit(training_data, epochs=100)

When to use this: If you have an NVIDIA GPU with Tensor Cores (RTX series, V100, A100, etc.), mixed precision can give you 1.5-2x speedup with minimal quality impact.

Full Training Pipeline

If you want to train your own model from scratch on real data, follow these steps:

Step 1: Environment Setup

# Clone and enter the repository
git clone https://github.com/pmclsf/deepcompress.git
cd deepcompress

# Create isolated Python environment
python -m venv env
source env/bin/activate

# Install dependencies
pip install -r requirements.txt

# Create folders for data and results
mkdir -p data/modelnet40    # Training data will go here
mkdir -p data/8ivfb         # Evaluation data will go here
mkdir -p results/models     # Trained models saved here
mkdir -p results/metrics    # Evaluation results saved here

Step 2: Dataset Preparation

We use two datasets:

ModelNet40: 3D CAD models for training (chairs, tables, airplanes, etc.)
8iVFB: High-quality point cloud sequences for evaluation

# Download ModelNet40 (3D object dataset from Princeton)
wget http://modelnet.cs.princeton.edu/ModelNet40.zip
unzip ModelNet40.zip -d data/modelnet40/

What is ModelNet40? A collection of 12,311 3D CAD models across 40 categories (airplane, bathtub, bed, bench, etc.). We use these to teach the neural network what 3D shapes look like.

Now we need to convert these 3D models into the format DeepCompress uses:

# Step 2a: Select the 200 largest models from each category
# (Larger models have more detail and are better for training)
python ds_select_largest.py \
    data/modelnet40/ModelNet40 \
    data/modelnet40/ModelNet40_200 \
    200

What this does: Goes through each category and picks the 200 models with the most vertices. Small models don't have enough detail to train on effectively.

# Step 2b: Convert 3D meshes to point clouds
# A mesh is triangles; a point cloud is just points
python ds_mesh_to_pc.py \
    data/modelnet40/ModelNet40_200 \
    data/modelnet40/ModelNet40_200_pc512 \
    --vg_size 512

What this does: Samples points from the surface of each 3D model and places them in a 512×512×512 voxel grid. Think of it like converting a smooth surface into LEGO blocks.

# Step 2c: Split into octree blocks
# Large point clouds are divided into smaller chunks for processing
python ds_pc_octree_blocks.py \
    data/modelnet40/ModelNet40_200_pc512 \
    data/modelnet40/ModelNet40_200_pc512_oct3 \
    --vg_size 512 \
    --level 3

What this does: Divides each point cloud into 8³ = 512 smaller blocks using an octree (a tree where each node has 8 children). This makes training more efficient because:

Each block fits in GPU memory
The network sees more variety (different parts of different objects)
Blocks can be processed in parallel
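The block arithmetic can be checked with a few lines of Python. This is a sketch with made-up function names, not the logic of ds_pc_octree_blocks.py; it only shows how an octree level maps to a block count and how a voxel coordinate maps to a block.

```python
def block_layout(vg_size: int, level: int):
    """Blocks per axis, total blocks, and voxels per block side for an octree split."""
    per_axis = 2 ** level            # each octree level halves every axis
    total_blocks = per_axis ** 3     # equivalently 8 ** level
    block_side = vg_size // per_axis
    return per_axis, total_blocks, block_side

def block_index(x: int, y: int, z: int, block_side: int):
    """Which block a voxel falls in, plus its coordinates inside that block."""
    block = (x // block_side, y // block_side, z // block_side)
    local = (x % block_side, y % block_side, z % block_side)
    return block, local

print(block_layout(512, 3))            # (8, 512, 64)
print(block_index(100, 511, 64, 64))   # ((1, 7, 1), (36, 63, 0))
```

With a 512-voxel grid and level 3, each block is 64 voxels on a side.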

# Step 2d: Select the 4000 most detailed blocks
python ds_select_largest.py \
    data/modelnet40/ModelNet40_200_pc512_oct3 \
    data/modelnet40/ModelNet40_200_pc512_oct3_4k \
    4000

What this does: Not all blocks are useful—some are empty or nearly empty. We keep only the 4000 blocks with the most points, ensuring we train on meaningful data.

Step 3: Training

Create a configuration file that defines all training parameters:

mkdir -p config
cat > config/train_config.yml << EOL
# Data settings
data:
  modelnet40_path: "data/modelnet40/ModelNet40_200_pc512_oct3_4k"
  ivfb_path: "data/8ivfb"
  resolution: 64          # Size of input blocks (64×64×64 voxels)
  block_size: 1.0         # Physical size of each block
  min_points: 100         # Ignore blocks with fewer points
  augment: true           # Apply random rotations/flips for variety

# Model architecture
model:
  filters: 64             # Neural network width (more = more capacity)
  activation: "cenic_gdn" # Activation function optimized for compression
  conv_type: "separable"  # Efficient 1+2D convolutions instead of full 3D
  entropy_model: "hyperprior"  # Which entropy model to use

# Training settings
training:
  batch_size: 32          # How many blocks to process at once
  epochs: 100             # How many times to go through all data
  learning_rates:
    reconstruction: 1.0e-4  # Learning rate for quality
    entropy: 1.0e-3         # Learning rate for compression
  focal_loss:
    alpha: 0.75           # Class-balancing weight
    gamma: 2.0            # Focusing parameter: larger values down-weight easy examples
  checkpoint_dir: "results/models"
  mixed_precision: false  # Set to true for faster GPU training
EOL

Understanding the parameters:

batch_size: Larger batches give more stable gradients but need more GPU memory
epochs: More epochs mean more training, but past a certain point the model starts to overfit
learning_rates: Step size for each update, set separately for the reconstruction and entropy parts. Too high is unstable, too low is slow
focal_loss: Helps the network focus on the hard parts of the point cloud (edges, fine details)
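For reference, the standard binary focal loss with the alpha and gamma values from the config can be sketched as follows. This assumes the loss is applied per voxel to a predicted occupancy probability; the function is illustrative and is not the repository's implementation.

```python
import math

def focal_loss(p: float, occupied: bool, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Binary focal loss for one voxel, given predicted occupancy probability p."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)              # keep log() finite
    p_t = p if occupied else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if occupied else 1.0 - alpha # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of confidently correct predictions
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, occupied=True)   # confident and correct: tiny loss
hard = focal_loss(0.1, occupied=True)   # confident and wrong: large loss
print(easy < hard)  # True
```

Setting gamma to 0 removes the focusing term and leaves an alpha-weighted cross-entropy.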

Now start training:

python training_pipeline.py config/train_config.yml

What happens during training:

The model loads batches of point cloud blocks
It tries to compress and reconstruct each block
It measures two things: reconstruction quality and compressed size
It adjusts its weights to improve both metrics
Every epoch, it saves a checkpoint so you can resume if interrupted
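Improving both metrics at once is commonly expressed as minimizing a single weighted sum of distortion and rate. The sketch below shows that trade-off in plain Python. The function name and the weight `lmbda` are hypothetical and do not appear in the config above, and the repository's actual loss may be structured differently.

```python
def rate_distortion_loss(distortion: float, total_bits: float,
                         num_voxels: int, lmbda: float = 0.01) -> float:
    """Weighted sum of reconstruction error and bits per voxel."""
    bits_per_voxel = total_bits / num_voxels
    return distortion + lmbda * bits_per_voxel

# Same reconstruction quality, but the second encoding needs half the bits.
loss_a = rate_distortion_loss(distortion=0.2, total_bits=2000.0, num_voxels=1000)
loss_b = rate_distortion_loss(distortion=0.2, total_bits=1000.0, num_voxels=1000)
print(loss_b < loss_a)  # True
```

A larger `lmbda` pushes the model toward smaller files; a smaller one favors reconstruction quality.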

Training typically takes:

CPU only: Several days
Single GPU: 12-24 hours
Multiple GPUs: A few hours

Step 4: Evaluation

After training, test how well your model performs on new data:

# Run evaluation on the 8iVFB dataset
python evaluation_pipeline.py \
    config/train_config.yml \
    --checkpoint results/models/best_model

What this measures:

PSNR (Peak Signal-to-Noise Ratio): How similar the reconstruction is to the original (higher = better)
Chamfer Distance: Average distance between original and reconstructed points (lower = better)
Bits per point: How many bits needed per 3D point (lower = better compression)
Compression/decompression time: How fast is it?
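The first two metrics can be sketched in a few lines of brute-force Python. Definitions vary between tools: this version uses squared nearest-neighbour distances for Chamfer and a point-to-point PSNR with a caller-supplied peak value. It is an illustration under those assumptions and does not reproduce ev_compare.py or the MPEG metrics tool.

```python
import math

def _nearest_sq_dist(point, cloud):
    """Squared Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(sum((a - b) ** 2 for a, b in zip(point, q)) for q in cloud)

def _directional_mse(src, dst):
    """Mean squared nearest-neighbour distance from every point in src to dst."""
    return sum(_nearest_sq_dist(p, dst) for p in src) / len(src)

def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: sum of the two directional means."""
    return _directional_mse(cloud_a, cloud_b) + _directional_mse(cloud_b, cloud_a)

def geometry_psnr(cloud_a, cloud_b, peak):
    """Point-to-point PSNR using the worse of the two directional errors."""
    mse = max(_directional_mse(cloud_a, cloud_b), _directional_mse(cloud_b, cloud_a))
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

original = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
reconstructed = [(0, 0, 0), (1, 0, 0), (0, 1, 1)]  # last point displaced by 1 along z
cd = chamfer_distance(original, reconstructed)
psnr = geometry_psnr(original, reconstructed, peak=1.0)
```

The brute-force search is O(n·m); real evaluations on large clouds would use a spatial index such as a k-d tree.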

# Generate comparison metrics against other methods
python ev_compare.py \
    --original data/8ivfb \
    --compressed results/compressed \
    --output results/metrics

# Create visualizations of the results
python ev_run_render.py config/train_config.yml

What this creates: Side-by-side images showing original vs. reconstructed point clouds, color-coded by error.

Step 5: Compare with Industry Standard (G-PCC)

G-PCC is the industry-standard point cloud codec from MPEG. Compare your results:

# Generate a final comparison report
python mp_report.py \
    results/metrics/evaluation_report.json \
    results/metrics/final_report.json

What you'll see: A table comparing DeepCompress vs. G-PCC on metrics like:

BD-Rate: Percentage bitrate savings at the same quality
BD-PSNR: Quality improvement at the same bitrate
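BD-Rate is conventionally computed by fitting a cubic polynomial to log-bitrate as a function of PSNR for each codec and averaging the gap over the shared PSNR range. The sketch below substitutes piecewise-linear interpolation for the cubic fit so that it runs with the standard library only. It is a simplified variant, not what mp_report.py does, and it assumes each curve has distinct PSNR values and that the two curves overlap.

```python
import math

def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be increasing and contain x."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x outside interpolation range")

def _mean_log_rate(psnrs, log_rates, lo, hi):
    """Average of the piecewise-linear log-rate curve over [lo, hi]."""
    knots = sorted({lo, hi, *[q for q in psnrs if lo < q < hi]})
    values = [_interp(q, psnrs, log_rates) for q in knots]
    area = sum((q1 - q0) * (v0 + v1) / 2
               for q0, q1, v0, v1 in zip(knots, knots[1:], values, values[1:]))
    return area / (hi - lo)

def bd_rate_linear(anchor, test):
    """Percent bitrate change of `test` vs `anchor`; each is a list of (bitrate, psnr)."""
    a = sorted(anchor, key=lambda rp: rp[1])
    t = sorted(test, key=lambda rp: rp[1])
    a_q, a_lr = [q for _, q in a], [math.log(r) for r, _ in a]
    t_q, t_lr = [q for _, q in t], [math.log(r) for r, _ in t]
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])  # shared PSNR range
    diff = _mean_log_rate(t_q, t_lr, lo, hi) - _mean_log_rate(a_q, a_lr, lo, hi)
    return (math.exp(diff) - 1) * 100

# Made-up curves: the test codec needs half the bitrate at every quality level.
anchor_curve = [(1.0, 30.0), (2.0, 35.0), (4.0, 40.0)]
test_curve = [(0.5, 30.0), (1.0, 35.0), (2.0, 40.0)]
result = bd_rate_linear(anchor_curve, test_curve)
```

A negative result means the test codec needs fewer bits than the anchor for the same quality, which is why the savings in this README are quoted as negative percentages.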

Expected Results

After completing the full pipeline:

Metric	DeepCompress V1	DeepCompress V2 (Hyperprior)
BD-Rate vs G-PCC	-8%	-20% to -30%
Model Parameters	1.0M	1.2M
Inference Speed	Baseline	2-3x faster
Memory Usage	Baseline	50% lower

Understanding the Architecture

How Neural Compression Works

Traditional compression (like ZIP) looks for repeated patterns in data. Neural compression goes further—it learns what patterns exist in a specific type of data.

                    ENCODER                              DECODER

Original         ┌─────────────┐                     ┌─────────────┐
Point Cloud  ──► │  Analysis   │ ──► Latent y ──►   │  Synthesis  │ ──► Reconstructed
(Large)          │  Transform  │     (Small)        │  Transform  │     Point Cloud
                 └─────────────┘                     └─────────────┘
                       │                                   ▲
                       ▼                                   │
                 ┌─────────────┐                     ┌─────────────┐
                 │   Hyper     │ ──► z (Tiny) ──►   │   Hyper     │
                 │  Encoder    │                    │  Decoder    │
                 └─────────────┘                     └─────────────┘
                                                           │
                                                           ▼
                                                    ┌─────────────┐
                                                    │  Entropy    │
                                                    │   Model     │
                                                    └─────────────┘
                                                           │
                                                           ▼
                                                      Bitstream
                                                    (Compressed File)

The key insight: The "latent" representation (y) is much smaller than the original, but contains enough information to reconstruct it. The "hyper" path (z) helps the entropy model know what probabilities to use.

Why Different Entropy Models Matter

The entropy model is crucial because it determines how efficiently we can convert the latent representation into bits.

Gaussian (baseline): Assumes every value follows the same bell curve. Simple but not accurate.

Hyperprior: Learns a custom mean and variance for each position. Like having a different bell curve for each value.

Channel Context: Processes channels in order, using earlier channels to predict later ones. Like reading a book—earlier words help predict later words.

Spatial Context: Uses neighboring positions to predict each value. Like filling in a crossword puzzle—the letters around you give hints.

Attention: Uses self-attention, restricted to local windows for efficiency, to find relevant patterns beyond immediate neighbors. Like having a photographic memory of similar shapes you've seen before.
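The difference between a single shared bell curve and per-position parameters can be seen by counting bits directly. In the sketch below, the cost of an integer-quantized value is -log2 of the Gaussian probability mass on the interval [value - 0.5, value + 0.5]. The latent values and the per-position (mean, scale) pairs are made up; in a real hyperprior model they would be predicted from z.

```python
import math

def gaussian_cdf(x: float, mean: float, scale: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def bits_for_symbol(value: int, mean: float, scale: float) -> float:
    """Bits needed to code an integer-quantized value under N(mean, scale^2)."""
    prob = gaussian_cdf(value + 0.5, mean, scale) - gaussian_cdf(value - 0.5, mean, scale)
    return -math.log2(max(prob, 1e-9))  # floor avoids log(0)

latents = [0, 0, 1, 12, -1, 11]  # made-up quantized latent values

# Gaussian baseline: one mean and scale shared by every position.
shared_bits = sum(bits_for_symbol(v, mean=0.0, scale=5.0) for v in latents)

# Hyperprior-style: a separate (mean, scale) per position (hypothetical values).
params = [(0.0, 1.0), (0.0, 1.0), (0.5, 1.0), (11.5, 1.0), (-0.5, 1.0), (11.5, 1.0)]
per_position_bits = sum(bits_for_symbol(v, m, s) for v, (m, s) in zip(latents, params))

print(per_position_bits < shared_bits)  # True
```

The shared curve has to be wide enough to cover the outliers, which makes every value more expensive; per-position parameters let each value sit near the peak of its own narrow curve.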

V2 Architecture Diagram

Input Voxel Grid (e.g., 64×64×64×1)
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│ ANALYSIS TRANSFORM                                          │
│ ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│ │ Conv3D  │───►│ Conv3D  │───►│ Conv3D  │───► Latent y     │
│ │ + GDN   │    │ + GDN   │    │ + GDN   │    (8×8×8×192)   │
│ └─────────┘    └─────────┘    └─────────┘                  │
│   64→128         128→192        192→192                     │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│ HYPER ENCODER                                               │
│ Latent y ──► Conv3D ──► Conv3D ──► Hyper-latent z          │
│                                    (4×4×4×128)              │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│ ENTROPY MODEL (V2 - Configurable)                           │
│                                                             │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Hyperprior: z ──► mean, scale for each position         ││
│ │                                                         ││
│ │ Channel: Process channels 1,2,3... using previous ones  ││
│ │          as context                                     ││
│ │                                                         ││
│ │ Attention: Use windowed self-attention to find          ││
│ │            long-range dependencies                      ││
│ └─────────────────────────────────────────────────────────┘│
│                          │                                  │
│                          ▼                                  │
│                    Probability                              │
│                    Distribution                             │
│                          │                                  │
│                          ▼                                  │
│               Arithmetic Coding ──► Bitstream               │
└─────────────────────────────────────────────────────────────┘
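The tensor shapes in the diagram can be reproduced with simple arithmetic. The sketch assumes three stride-2 layers with 'same' padding, where each layer outputs ceil(size / stride) positions per axis; the padding mode is an assumption, and the channel counts are read off the diagram.

```python
import math

def strided_output_size(size: int, stride: int = 2) -> int:
    """Output length of a 'same'-padded strided convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 64
sizes = [size]
for _ in range(3):  # three stride-2 Conv3D layers in the analysis transform
    size = strided_output_size(size)
    sizes.append(size)
print(sizes)  # [64, 32, 16, 8]

input_values = 64 ** 3 * 1     # 64×64×64×1 voxel grid
latent_values = 8 ** 3 * 192   # 8×8×8×192 latent y
hyper_values = 4 ** 3 * 128    # 4×4×4×128 hyper-latent z
print(input_values, latent_values, hyper_values)  # 262144 98304 8192
```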

Performance Benchmarking

Running Benchmarks

Test the performance optimizations:

# Run all benchmarks
python -m src.benchmarks

This will output timing comparisons like:

============================================================
Benchmark Results
============================================================
  broadcast_quantize          :    45.23 ms (baseline)
  binary_search_quantize      :     9.05 ms (5.00x)
============================================================

What Each Benchmark Tests

Benchmark	What It Measures	Why It Matters
`benchmark_scale_quantization`	Speed of finding optimal quantization levels	Called millions of times during compression
`benchmark_masked_conv`	Speed of creating causal masks	Done once per layer, but slow if not optimized
`benchmark_attention`	Memory and speed of attention mechanism	Attention is O(n²) by default; windowing makes it O(n) for a fixed window size
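The scaling behind the attention row can be illustrated by counting query-key pairs. The sketch below assumes non-overlapping windows in which each token attends only to tokens in its own window; the window size is hypothetical, and the numbers are not meant to reproduce the speedups measured by the benchmark.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Every token attends to every token: n * n pairs."""
    return num_tokens * num_tokens

def windowed_attention_pairs(num_tokens: int, window: int) -> int:
    """Non-overlapping windows: each token attends only within its own window."""
    full_windows, remainder = divmod(num_tokens, window)
    return full_windows * window * window + remainder * remainder

n = 8 * 8 * 8  # 512 latent positions in an 8×8×8 grid
w = 4 * 4 * 4  # hypothetical 4×4×4 window = 64 tokens

print(full_attention_pairs(n))         # 262144
print(windowed_attention_pairs(n, w))  # 32768
```

Doubling the number of tokens quadruples the full-attention count but only doubles the windowed count, which is what makes larger point clouds tractable.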

Memory Profiling

Check how much GPU memory your model uses:

from src.benchmarks import MemoryProfiler

with MemoryProfiler() as mem:
    output = model(large_input)

print(f"Peak memory: {mem.peak_mb:.1f} MB")

Prerequisites

Required Software

Software	Version	Purpose
Python	3.10+	Programming language
TensorFlow	~=2.15	Neural network framework
TensorFlow Probability	~=0.23	Probability distributions for entropy modeling
MPEG G-PCC	Latest	Industry-standard codec for comparison
MPEG PCC Metrics	v0.12.3	Standard evaluation metrics

Python Dependencies

Install these with pip install -r requirements.txt:

Package	Version	Purpose
tensorflow	~=2.15	Neural network operations
tensorflow-probability	~=0.23	Probability distributions for entropy modeling
numpy	~=1.26	Numerical computations
matplotlib	~=3.8	Visualization
pandas	~=2.1	Data analysis
pyyaml	~=6.0	Configuration file parsing
scipy	~=1.11	Scientific computing
tqdm	~=4.66	Progress bars
numba	~=0.58	JIT compilation for speed
keras-tuner	~=1.4	Hyperparameter tuning (for cli_train.py)
pytest	~=8.0	Test framework
ruff	>=0.4	Linter (configured in pyproject.toml)

Project Structure

deepcompress/
├── src/                            # Source code
│   ├── Model Components
│   │   ├── model_transforms.py     # Main encoder/decoder (V1 + V2) architecture
│   │   ├── entropy_model.py        # Gaussian conditional, hyperprior entropy models
│   │   ├── entropy_parameters.py   # Hyperprior mean/scale prediction network
│   │   ├── context_model.py        # MaskedConv3D, autoregressive spatial context
│   │   ├── channel_context.py      # Channel-wise context model
│   │   └── attention_context.py    # Windowed attention context model
│   │
│   ├── Performance
│   │   ├── constants.py            # Pre-computed math constants (LOG_2, EPSILON)
│   │   ├── precision_config.py     # Mixed precision (float16) settings
│   │   ├── benchmarks.py           # Performance measurement
│   │   └── quick_benchmark.py      # Quick synthetic smoke test
│   │
│   ├── Data Processing
│   │   ├── data_loader.py          # Unified data loader (ModelNet40 / 8iVFB)
│   │   ├── ds_mesh_to_pc.py        # Convert .off meshes to point clouds
│   │   ├── ds_pc_octree_blocks.py  # Split point clouds into octree blocks
│   │   ├── ds_select_largest.py    # Select N largest blocks by point count
│   │   ├── octree_coding.py        # Octree encode/decode for voxel grids
│   │   ├── compress_octree.py      # Compression entry point
│   │   └── map_color.py            # Transfer colors between point clouds
│   │
│   ├── Training & Evaluation
│   │   ├── training_pipeline.py    # End-to-end training loop
│   │   ├── evaluation_pipeline.py  # Model evaluation pipeline
│   │   ├── cli_train.py            # Training CLI with hyperparameter tuning
│   │   └── experiment.py           # Experiment runner
│   │
│   └── Evaluation & Comparison
│       ├── ev_compare.py           # Point cloud quality metrics (PSNR, Chamfer)
│       ├── ev_run_render.py        # Visualization / rendering
│       ├── point_cloud_metrics.py  # D1/D2 point-to-point metrics
│       ├── mp_report.py            # MPEG G-PCC comparison reports
│       ├── colorbar.py             # Colorbar visualization utility
│       └── parallel_process.py     # Parallel processing utility
│
├── tests/                          # Automated tests (pytest + tf.test.TestCase)
│   ├── conftest.py                 # Session-scoped fixtures (tf_config, file factories)
│   ├── test_utils.py               # Shared test utilities (mock grids, configs)
│   ├── test_model_transforms.py    # V1 + V2 model tests
│   ├── test_entropy_model.py       # Entropy model tests
│   ├── test_context_model.py       # Context model tests
│   ├── test_channel_context.py     # Channel context tests
│   ├── test_attention_context.py   # Attention context tests
│   ├── test_performance.py         # Performance regression + optimization tests
│   ├── test_training_pipeline.py   # Training loop tests
│   ├── test_evaluation_pipeline.py # Evaluation pipeline tests
│   ├── test_data_loader.py         # Data loading tests
│   ├── test_compress_octree.py     # Compression pipeline tests
│   ├── test_octree_coding.py       # Octree codec tests
│   └── ...                         # + 10 more module-level test files
│
├── data/                           # Datasets (not in git)
├── results/                        # Output files (not in git)
├── CLAUDE.md                       # AI agent coding standards
├── pyproject.toml                  # Ruff linter configuration
├── pytest.ini                      # Pytest configuration and markers
├── setup.py                        # Package setup
├── requirements.txt                # Python dependencies
└── README.md                       # This file

Troubleshooting

Common Issues

"Out of memory" errors

Reduce batch_size in config
Use resolution: 32 instead of 64
Enable mixed precision training
Use entropy_model: 'hyperprior' (most memory-efficient)

Training is slow

Enable mixed precision: mixed_precision: true
Use a GPU (CPU training is 10-50x slower)
Reduce model size: filters: 32

Poor reconstruction quality

Train for more epochs
Increase model size: filters: 128
Try a better entropy model: entropy_model: 'channel'

Compression ratio is worse than expected

Ensure the model is fully trained
Use an advanced entropy model
Check that input data is similar to training data

Citation

If you use this code in your research, please cite:

@article{killea2021deepcompress,
  title={DeepCompress: Efficient Point Cloud Geometry Compression},
  author={Killea, Ryan and Li, Yun and Bastani, Saeed and McLachlan, Paul},
  journal={arXiv preprint arXiv:2106.01504},
  year={2021}
}

License

This project is licensed under the terms specified in the LICENSE file.

Getting Help

Issues: GitHub Issues
Paper: arXiv:2106.01504
Questions: Open a GitHub issue with the "question" label
