Skip to content

Latest commit

 

History

History
2122 lines (1677 loc) · 99.8 KB

File metadata and controls

2122 lines (1677 loc) · 99.8 KB

TRELLIS Forge Progress

Current Status

GUI APPLICATION PRODUCTION-READY (2026-02-01 08:00) TRELLIS.2 Image-to-3D Pipeline: FROZEN/STABLE (2026-02-01 08:00) TRELLIS.2 Image-to-3D: EXPORT DEFAULTS FIXED (2026-01-31 11:30) TRELLIS.2 Image-to-3D: FACE-CONSISTENT SAMPLING FIX (2026-01-31 13:00) TRELLIS.2 Image-to-3D: BLACK BARS FIXED (2026-01-31 09:45) Frontend renamed to "Genesis": COMPLETE (2026-01-24 00:00) File Reorganization: COMPLETE (2026-01-31 08:45)


STABLE RELEASE: Image-to-3D Pipeline (2026-02-01)

Status: PRODUCTION-READY - FROZEN

The TRELLIS.2 Image-to-3D pipeline is now stable and working extremely well via the GUI application. Any future changes to this pipeline require extreme caution.

Verified Working Configuration

Component Status Notes
GUI (Electron + FastAPI) STABLE Full generation workflow
TRELLIS.2-4B Pipeline STABLE All 3 stages working
CUDA Extensions STABLE flex_gemm, cumesh, nvdiffrast, o_voxel, spconv
Memory Management STABLE low_vram=True forced, prevents OOM
Output Quality VERIFIED Matches official HuggingFace demo

Critical Files (DO NOT MODIFY without careful review)

File Purpose Last Verified
gui/backend/main.py FastAPI server, job orchestration 2026-02-01
trellis2/pipelines/trellis2_image_to_3d.py TRELLIS.2 pipeline 2026-02-01
trellis2/representations/mesh/base.py Mesh export with degenerate face filter 2026-02-01
o_voxel/o_voxel/postprocess.py UV unwrap + black bar fix 2026-02-01

Critical Call Relationships (FROZEN)

GUI Generate Button
    |
    v
POST /api/generate/image (main.py:661)
    |
    v
run_image_to_3d_job() (main.py:534)
    |
    +-- load_pipeline("image_to_3d_v2") (main.py:302)
    |       |
    |       +-- Trellis2ImageTo3DPipeline.from_pretrained()
    |       +-- pipeline.low_vram = True  <-- CRITICAL: Prevents OOM
    |       +-- pipeline.cuda()
    |
    +-- Image.open(image_path)  <-- NO .convert('RGB'), preserves alpha
    |
    +-- clear_cuda_memory()  <-- Before generation
    |
    +-- pipeline.run(image, seed=seed, ...)
            |
            +-- preprocess_image()
            |       +-- If RGBA with alpha: use directly (skip BiRefNet)
            |       +-- If RGB: run BiRefNet background removal
            |
            +-- sample_sparse_structure() [Stage 1]
            |       +-- Flow model on GPU during sampling
            |       +-- Model to CPU after (low_vram=True)
            |
            +-- sample_shape_slat_cascade() [Stage 2]
            |       +-- Same memory management pattern
            |
            +-- sample_tex_slat_cascade() [Stage 3]
            |       +-- Same memory management pattern
            |
            +-- decode_shape() -> (mesh, subdivisions)
            |
            +-- decode_texture(guide_subs=subdivisions)
            |
            +-- MeshWithVoxel.to_glb()
                    +-- UV unwrap (xatlas)
                    +-- Degenerate face filtering
                    +-- Export

Memory Management (CRITICAL)

# main.py - MUST remain as-is
if pipeline_type == "image_to_3d_v2" and hasattr(pipelines[pipeline_type], 'low_vram'):
    pipelines[pipeline_type].low_vram = True  # NEVER change to False

Why low_vram=True is mandatory:

  • Even 24GB GPUs (RTX 4090) OOM without this
  • Flow models + intermediate tensors peak together
  • low_vram offloads models to CPU between stages
  • ~10-15% slower but 100% reliable

Image Mode Handling (CRITICAL)

# main.py:558 - MUST preserve alpha channel
image = Image.open(image_path)  # NO .convert('RGB')

Why alpha preservation matters:

  • RGBA with transparency: Pipeline uses existing mask directly
  • RGB without alpha: Pipeline runs BiRefNet background removal
  • Different masks = different outputs
  • Test scripts use RGBA, so GUI must too for parity

Freezing Options

To protect this stable state, consider one of these options:

  1. Git Tag (Recommended)

    git tag -a v1.0.0-stable-image2mesh -m "Stable Image-to-3D pipeline"
    git push origin v1.0.0-stable-image2mesh
  2. Git Branch

    git checkout -b stable/image-to-3d-v1
    git push origin stable/image-to-3d-v1
  3. GitHub Release

    • Create release from tag with changelog
    • Allows binary attachments if needed
  4. Protected Branch Rules (GitHub)

    • Require PR reviews for changes to critical files
    • Add CODEOWNERS file for gui/backend/main.py, trellis2/pipelines/*

Work: 2026-02-01 07:00-08:00 | Modified: 2026-02-01 08:00


Export Defaults & Degenerate Face Fix (2026-01-31 11:30)

Problems Identified

Full pipeline trace revealed critical discrepancies vs official TRELLIS.2:

Metric Before Fix After Fix Official
texture_size 1024 2048 2048
decimation_target None (fallback) 500000 500000
Degenerate faces 297-308 ~77 2
Face count 235k 491k 280k

Root Causes

  1. Half-resolution textures: base.py defaulted to texture_size=1024, official uses 2048
  2. Inconsistent decimation: decimation_target=None fell back to percentage-based simplification
  3. Post-UV degenerate faces: xatlas introduces degenerate faces during vertex remapping that weren't filtered

Fixes Applied (trellis2/representations/mesh/base.py)

  1. texture_size default: 1024 -> 2048
  2. decimation_target default: None -> 500000
  3. Post-UV degenerate filter: Added face area filtering (threshold 1e-10) after uv_unwrap()
# Step 4.5: Remove degenerate faces AFTER UV unwrapping
v0 = out_vertices[out_faces[:, 0]]
v1 = out_vertices[out_faces[:, 1]]
v2 = out_vertices[out_faces[:, 2]]
face_areas = torch.linalg.norm(torch.cross(v1 - v0, v2 - v0), dim=-1) / 2
valid_face_mask = face_areas > 1e-10
out_faces = out_faces[valid_face_mask]

Remaining Issues

  • Face count (491k) still higher than official (280k) - may need investigation into simplification behavior
  • Some degenerate faces remain (~77 vs 2) - threshold may need tuning
  • Visual comparison pending

Status

EXPORT DEFAULTS FIXED - Texture resolution and decimation target now match official.

Work: 2026-01-31 10:45-11:30 | Modified: 2026-01-31 11:30


Black Bar Artifact Fix (2026-01-31 09:30)

Problem

Generated GLBs showed vertical black bars extending through the model. Analysis revealed:

  • Degenerate triangles where two vertices have identical 3D coordinates (different indices, same position)
  • UV unwrapping creates duplicate vertices at UV seams
  • Some faces end up with vertices at same 3D position = collapsed triangles = black bars
  • Affected mesh: 213,486 duplicate vertices, faces with zero-length edges

Root Cause

After UV unwrapping, some triangles have vertex indices pointing to identical positions:

Face 325214: indices=[287868, 287869, 287870]
  v[287868] = [0.21057606, -0.04407465, -0.46032906]
  v[287870] = [0.21057606, -0.04407465, -0.46032906]  <- IDENTICAL

These degenerate triangles render as black bars/spikes.

Fix Applied

Added degenerate face filtering in o_voxel/o_voxel/postprocess.py after UV unwrapping:

# Remove degenerate faces (triangles with duplicate vertex positions)
v0 = out_vertices[out_faces[:, 0]]
v1 = out_vertices[out_faces[:, 1]]
v2 = out_vertices[out_faces[:, 2]]
edge1 = (v1 - v0).norm(dim=1)
edge2 = (v2 - v1).norm(dim=1)
edge3 = (v0 - v2).norm(dim=1)
valid_faces_mask = (edge1 > 1e-7) & (edge2 > 1e-7) & (edge3 > 1e-7)
out_faces = out_faces[valid_faces_mask]

Status

VERIFIED FIXED - User confirmed black bars no longer appear in generated models.

Work: 2026-01-31 09:00-09:30 | Modified: 2026-01-31 09:45


Texture Patch & Black Bars Fix (2026-01-31)

Problem

Generated GLBs showed:

  1. Triangular texture patches - different colors/brightness on adjacent triangles
  2. Black bars - vertical spikes through the model (FIXED with degenerate face filtering)

Root Cause: Windows CUDA Numerical Precision

User confirmed official TRELLIS.2 on Hugging Face demo (Linux) doesn't have texture patches. Our Windows-compiled CUDA extensions (cuBVH) have subtle numerical differences that cause adjacent texels to map to different original mesh faces at triangle boundaries.

Fixes Applied

1. Pipeline reverted to official (trellis2/pipelines/trellis2_image_to_3d.py):

  • Removed all BF16 autocast blocks
  • Pipeline now matches official exactly

2. Face-consistent BVH projection (o_voxel/o_voxel/postprocess.py):

  • Instead of each texel independently querying BVH (causing face-switching at boundaries)
  • Now: compute centroid of each simplified face, map to ONE original face via BVH
  • All texels in a simplified triangle use the SAME original face for sampling
  • Eliminates triangular patches caused by per-texel face inconsistency

3. Degenerate face filtering (threshold 1e-5):

  • Removes zero-area triangles that cause black bars

Status

FACE-CONSISTENT SAMPLING IMPLEMENTED - Needs visual verification.

Work: 2026-01-31 10:00-13:00 | Modified: 2026-01-31 13:00


Work: 2026-01-31 09:00-09:30 | Modified: 2026-01-31 09:45


File Reorganization: COMPLETE (2026-01-31 08:45)

Summary

Full reorganization of trellis-forge directory structure per REORGANIZATION_PLAN.md.

Actions Completed

Action Details Result
Deleted torch wheels torch_28.whl, torch-2.8.0+cu128...whl 2.44 GB freed
Deleted spatial root junk eigen/, models/, New folder/, =4.10.0, nul, o_voxel_install.log 590 MB freed
Created tests/ structure canonical/, unit/, integration/, debug/, diagnostics/, analysis/ Organized
Moved 58 scripts All test/debug/diagnose/analyze scripts Clean root
Created tools/ Viewers and utility scripts Organized
Reorganized outputs/ generations/, benchmarks/, debug/, test_artifacts/, logs/ Clean structure
Archived flash_attn wheel _archive/wheels/flash_attn-2.8.3+...whl Preserved

New Directory Structure

trellis-forge/
├── Start-TrellisForge.ps1      # Application launcher
├── Install-TrellisForge.ps1    # Installation
├── trellis-forge.bat           # Windows launcher
├── LICENSE, README.md, PROGRESS.md, etc.
├── gui/                        # Application (backend + electron)
├── trellis/                    # TRELLIS 1 pipeline
├── trellis2/                   # TRELLIS.2 pipeline
├── cumesh/, o_voxel/           # CUDA extensions
├── models/, configs/, assets/  # Resources
├── venv311/                    # Python environment
├── tests/                      # All test/debug scripts
│   ├── canonical/              # Primary validation (test_hybrid_precision.py)
│   ├── unit/                   # Unit tests
│   ├── integration/            # Integration tests
│   ├── debug/                  # Debug scripts
│   ├── diagnostics/            # Diagnostic scripts
│   └── analysis/               # Analysis scripts
├── tools/                      # Developer tools
│   └── viewers/                # HTML model viewers
├── outputs/                    # Generated outputs
│   ├── generations/            # User-generated GLBs
│   ├── benchmarks/             # Benchmark outputs
│   ├── debug/                  # Debug session outputs
│   ├── test_artifacts/         # Test outputs
│   └── logs/                   # Backend logs
├── _archive/                   # Archived items
│   ├── wheels/                 # Flash attention wheel
│   └── old_outputs/            # Legacy debug outputs
└── reference/                  # Test inputs and benchmarks

Verification

  • trellis2 module imports correctly
  • Application functionality unaffected
  • Total disk space freed: ~3 GB

Rollback

  • Backup manifest: _backup_20260131/cleanup_manifest.txt
  • Moved files can be restored from _archive/ and tests/ directories

Running Tests After Reorganization

# Canonical validation test
.\venv311\Scripts\python.exe tests\canonical\test_hybrid_precision.py

# Full generation test
.\venv311\Scripts\python.exe tests\canonical\run_generation_test.py

Previous: Phase 1 Simple Cleanup (2026-01-31 08:30)

Initial cleanup before full reorganization:

  • eigen/ (~500 MB) - Duplicate of bundled Eigen
  • models/ (~2 GB) - Old model cache
  • New folder/ - Empty/unnamed
  • Junk files (=4.10.0, nul)

Work: 2026-01-31 08:00 | Modified: 2026-01-31 08:30


VISUAL QUALITY ISSUES - ACTIVE INVESTIGATION (2026-01-30 Session 4)

Rollback Performed

Previous session's experimental changes to postprocess.py caused completely destroyed mesh output (fragmented geometry, floating pieces, black spikes). Changes were rolled back to match official TRELLIS.2:

Reverted:

  • Removed unused densify_sparse_attrs_hashmap() function
  • Removed import torch.nn.functional as F (unused)
  • Restored *grid_size.tolist() format (was changed to gs_int, gs_int, gs_int)
  • Removed nearest-neighbor fallback for zero samples
  • Removed extra comments about BVH and OPAQUE mode

Result: Mesh structure is now intact (not destroyed), but significant visual quality issues remain.

Current Visual Comparison (Official vs Ours)

Test: spiral_input.png with seed 42

Aspect Official Ours Issue
Glass facade Smooth, fine grid lines Chunky tiles (acceptable) Minor
Texture mapping Continuous, smooth Triangular patches with different UV mapping CRITICAL
Vegetation Cohesive green plants, pink flowers Fragmented brown debris CRITICAL
Surface continuity Smooth blending Visible seams, horizontal banding CRITICAL
Base geometry Clean flower clusters White triangular artifacts, scattered fragments CRITICAL

Confirmed Issues (User Verified)

  1. Polygonal texture patches - Triangular/diamond-shaped regions with mismatched texture mapping across facade
  2. UV direction/normal mapping errors - Textures not properly blended at polygon boundaries
  3. Small geometry destruction - Vegetation and small polygon clusters decimated into brown debris (threshold too aggressive?)
  4. Horizontal banding artifacts - Regular light lines cutting across surfaces
  5. Geometric artifacts - White triangular shapes that shouldn't exist (at base)
  6. Texture seams - Visible discontinuities between UV chart regions

Suspected Root Causes

  1. Mesh decimation threshold - remove_small_connected_components(1e-5) may be destroying important small geometry (vegetation)
  2. UV unwrapping (xatlas) - Chart boundary handling causing texture discontinuities
  3. BVH projection - bvh.unsigned_distance() returning inconsistent face_ids for adjacent texels
  4. Sparse trilinear sampling - Only 0.5% voxel occupancy means most samples have partial/no neighbors

Files Currently Matching Official

  • o_voxel/o_voxel/postprocess.py - Now matches official TRELLIS.2 exactly

Next Steps

  1. Investigate remove_small_connected_components() threshold - may need adjustment to preserve vegetation
  2. Examine xatlas UV chart boundary handling
  3. Debug BVH face_id consistency for adjacent texture samples
  4. Compare mesh decimation behavior between official and ours

Work: 2026-01-30 18:00-21:00 | Modified: 2026-01-30 21:00


RETRACTED: COLOR VARIANCE (2026-01-30 Session 3)

Status: Investigation was premature. Color variance analysis was conducted while visual artifacts were present. The "seed variance" conclusion may have been masking actual bugs.

Testing multiple seeds revealed HIGH variance in brightness across different seeds

This section is retracted pending proper investigation of texture/geometry issues.

Work: 2026-01-30 16:00-17:30 | Modified: 2026-01-30 21:00 (RETRACTED)


RETRACTED: DEEP ROOT CAUSE ANALYSIS (2026-01-30 Session 2)

Status: Changes from this session caused mesh destruction. All modifications have been rolled back.

Fixes Applied: 1. Nearest-neighbor fallback (postprocess.py) 2. Removed autocast from run_generation_test.py

Rolled back: postprocess.py restored to official TRELLIS.2 version.

Work: 2026-01-30 12:00-15:30 | Modified: 2026-01-30 21:00 (RETRACTED)


PREVIOUS ANALYSIS (2026-01-30 Session 1)

Problem Statement

Triangular texture discontinuities on flat surfaces (blue glass facade). Adjacent mesh triangles show different brightness/color despite representing the same surface.

INVESTIGATION COMPLETE - ROOT CAUSE IDENTIFIED

The triangular texture patches are INHERENT to the sparse sampling + BVH projection algorithm.

Evidence Summary

Test Result Implication
postprocess.py comparison BYTE-FOR-BYTE IDENTICAL Python code is not the cause
flex_gemm grid_sample_3d IDENTICAL to official CUDA kernel is not the cause
cumesh BVH IDENTICAL to official BVH projection is not the cause
Coordinate ordering test (X,Y,Z) CORRECT Coordinate handling is not the cause
grid_sample_3d precision test 22.1% of points differ >0.01 from dense sampling SPARSE SAMPLING BEHAVIOR

Key Technical Finding: Sparse vs Dense Sampling

flex_gemm.grid_sample_3d performs sparse-aware trilinear interpolation:

  • Only existing voxels contribute to interpolation (empty voxels have weight=0)
  • Weights are normalized by sum of valid neighbors
  • This is fundamentally different from dense F.grid_sample

Precision test results:

Max difference from PyTorch dense: 0.998510
Mean difference: 0.070025
Points with diff > 0.01: 221/1000 (22.1%)

At exact voxel locations, trilinear returns interpolated values (not exact) because neighboring voxels contribute. This is by design for sparse tensors.

Visual Comparison (Playwright Screenshots 2026-01-30)

Test Run Details:

  • Input: reference/spiral_input.png
  • Output: outputs/generation_test/generated_spiral_input.glb (16.8 MB)
  • Benchmark: spiral_official.glb (13.0 MB)
  • Pipeline Time: 239.2s total
  • Peak GPU Memory: 23,716 MB

Multi-Angle Comparison (Screenshots Captured):

View Official Ours Notes
3/4 View Clean spiral tower with vegetation Same structure, slightly different positioning Shape matches well
Front Smooth blue glass, pink flowers Blue glass with visible grid pattern, vegetation present Texture grid more visible
Right Clean facade More visible horizontal lines in texture Texture banding
Top Clear floor plan visible Simpler footprint, less detail Some detail loss

Visual Quality Scores (7 Dimensions):

Criterion Score Notes
Shape Fidelity (SF) 8/10 Overall tower shape matches, spiral vegetation correct
Structural Integrity (SI) 8/10 No fragmentation, coherent structure, some edge artifacts
Texture Clarity (TC) 6/10 Grid pattern visible, triangular patches on flat surfaces
Color Accuracy (CA) 8/10 Blue glass, green vegetation, correct overall palette
PBR Material Quality (PQ) 7/10 Glass reflectivity works, materials respond to light
UV Mapping Quality (UV) 6/10 Visible discontinuities at triangle boundaries
Mesh Topology (MT) 7/10 16.8 MB vs 13.0 MB official, slightly higher poly count
TOTAL 50/70 NEAR PASS (threshold: 56/70)

Key Visual Differences:

Aspect Official Ours
Glass facade Smooth, continuous Triangular patches visible, grid pattern more apparent
Window grid Uniform, subtle More pronounced, visible horizontal bands
Vegetation Dense spiral clusters Same spiral pattern, slightly less dense
Overall shape Spiral tower Same spiral tower, proportions match
Color Blue/green Same colors, slightly different saturation

Why Official Looks Better

Possible explanations for visual difference:

  1. Different model inference - The official benchmark may have been generated with:

    • Different random seed
    • Different model version
    • Different inference parameters
  2. Platform differences - Linux CUDA vs Windows CUDA may produce:

    • Different floating-point rounding
    • Different PRNG sequences
    • Different memory alignment affecting kernel behavior
  3. Model warmup effects - First generation may differ from subsequent ones

Conclusion

All Python code is IDENTICAL to official TRELLIS.2. The texture artifacts cannot be fixed by changing Python code.

The visual differences are caused by:

  • Model inference variance (stochastic elements in generation)
  • CUDA platform differences (Windows vs Linux)
  • Possible benchmark/seed mismatch

Status: ACCEPTED LIMITATION

The triangular texture patches are a characteristic of the sparse-sampling texture baking approach used by TRELLIS.2. Our implementation is correctly matching the official algorithm - any remaining differences are due to:

  • Random seed / inference variance
  • Platform-specific CUDA behavior

Next steps if user wants to pursue further:

  1. Generate multiple models with different seeds and compare
  2. Run generation on Linux WSL with same CUDA and compare
  3. Contact TRELLIS.2 authors about expected visual variance

Work: 2026-01-30 02:00 | Modified: 2026-01-30 11:45


Visual Quality Assessment (2026-01-29 - CORRECTED)

Test Configuration

  • Input: reference/spiral_input.png (681KB)
  • Generated: outputs/hybrid_precision_test/generated_spiral_input_seed42.glb (17.5 MB)
  • Official Benchmark: outputs/hybrid_precision_test/spiral_official.glb (13.0 MB)
  • Seed: 42
  • Pipeline: 1024_cascade with hybrid precision (BF16 shape, FP32 texture)
  • Timing: 348s total (~5.8 min)

Visual Scores (Corrected - Proper Model Comparison)

Criterion Score Assessment
Shape Fidelity (SF) 9/10 Correct spiral tower shape, proportions match
Structural Integrity (SI) 8/10 Coherent tower structure, vegetation spiral
Texture Clarity (TC) 6/10 Blue glass correct BUT small polygon-like artifacts
Color Accuracy (CA) 8/10 Blue facade, green vegetation match input
PBR Material Quality (PQ) 7/10 Materials respond to light correctly
UV Mapping Quality (UV) 6/10 Some visible seam artifacts at triangle boundaries
Mesh Topology (MT) 7/10 17.5 MB vs 13 MB - slightly over-tessellated
TOTAL 51/70 NEAR PASS (threshold: 56/70)

Key Finding: Texture Artifacts

ISSUE: Small polygon-like glitches scattered across flat surfaces (blue glass facade)

Visual Evidence (side-by-side comparison):

  • Official (left): Smooth, continuous blue glass surface with uniform grid pattern
  • Ours (right): Same blue glass BUT with small triangular/polygon fragments breaking surface continuity

Artifact Characteristics:

  • Small triangular/shard-like fragments
  • Scattered randomly across the glass surface
  • Break the visual continuity of flat surfaces
  • NOT grid/voxel-aligned patterns (different from previous issues)
  • Appear to be at mesh triangle boundaries

Root Cause Investigation (ACTIVE)

STRONG EVIDENCE: Unnormalized Normals

xatlas throws hundreds of warnings during UV unwrapping:

ASSERT: isNormalized(normal) cumesh\third_party\xatlas\xatlas.cpp 1263

This indicates face/vertex normals being passed to xatlas have length != 1.0, which causes:

  1. Incorrect UV chart computation
  2. Rendering artifacts due to improper normal interpolation
  3. Visual discontinuities at triangle boundaries

Diagnostic Results (diagnose_mesh_artifacts.py):

  • 14 degenerate faces (zero area)
  • 515 sliver triangles (aspect ratio > 10)
  • 7 extreme slivers (aspect ratio > 100, max: 9.8 million!)
  • 41 vertices with near-zero normals
  • 14 faces with near-zero normals
  • 2 adjacent faces with opposing normals

Tested and NOT the cause:

  • Grid coordinate clamping (removed, kernel handles OOB naturally)
  • Alpha mode (OPAQUE is correct)
  • BVH rebuilding (kept original as required)
  • remesh=True vs remesh=False (artifacts present with both)
  • remove_degenerate_faces() (improved slightly but not fixed)

CRITICAL FINDING: Sliver Triangle Count (RULED OUT)

User clarification: The issue is NOT geometry/sliver triangles - it's texture mapping per polygon.

Metric Official Ours Ratio
Vertices 228,398 218,939 0.96x
Faces 280,488 110,552 0.39x
Sliver triangles (>10) 130 40,304 310x MORE
Extreme slivers (>100) 2 40,064 20,000x MORE

Our mesh has 40,000+ sliver triangles vs official's 130. This is the root cause of polygon artifacts.

Revised Investigation Focus: Texture Baking Pipeline

The visual artifacts show triangular texture patches that don't align with neighboring triangles. This is NOT a geometry issue - it's a UV/normal/texture sampling issue.

Key pipeline stages to investigate:

  1. BVH projection (line 260-262) - bvh.unsigned_distance() returning inconsistent face_ids
  2. grid_sample_3d - Our flex_gemm kernel vs official behavior
  3. Normal interpolation - xatlas warnings about unnormalized normals

postprocess.py code comparison: Our code is IDENTICAL to official (lines 200-300)

INVESTIGATION FOCUS: Texture Mapping (NOT Geometry) (2026-01-30 01:15):

User clarification: The triangular texture artifacts are a texture mapping/UV/sampling issue, not a mesh geometry issue. Changes to mesh simplification did not fix the problem and actually made visual quality worse.

Visual Evidence: Side-by-side comparison shows:

  • Official (left): Smooth, continuous blue glass facade with uniform grid pattern
  • Ours (right): Same structure BUT with triangular/polygon-shaped texture discontinuities scattered across flat surfaces

Key observation: The artifacts follow TRIANGLE boundaries in the mesh, not voxel grid boundaries. This points to:

  1. BVH projection returning inconsistent face_ids for adjacent texels
  2. grid_sample_3d coordinate handling differences
  3. UV chart boundary issues in xatlas

Next steps:

  1. Compare BVH unsigned_distance output between our build and official
  2. Examine flex_gemm grid_sample_3d trilinear interpolation kernel
  3. Check if UV seam handling differs

Work: 2026-01-29 17:00 | Modified: 2026-01-30 01:15


CRITICAL: Test Asset Protocol (2026-01-29)

ALWAYS use reference/spiral_input.png for visual parity testing.

Canonical Test Assets

Asset Path Source Hash/Size
Input Image reference/spiral_input.png User-provided 681 KB
Official Benchmark outputs/hybrid_precision_test/spiral_official.glb Official TRELLIS.2 platform 13.0 MB
Generated Output outputs/hybrid_precision_test/generated_spiral_input_seed42.glb Our pipeline (seed 42) 17.5 MB
Visual Comparison outputs/visual_comparison.html Side-by-side viewer -

Visual Comparison Protocol

# 1. Generate with canonical input (default)
cd B:\M\ArtificialArchitecture\spatial\trellis-forge
.\venv311\Scripts\python.exe test_hybrid_precision.py

# Output: outputs/hybrid_precision_test/generated_spiral_input_seed42.glb

# 2. Start HTTP server from trellis-forge ROOT
npx http-server . -p 8766 --cors

# 3. Open visual comparison (server must be at root, not outputs/)
# http://localhost:8766/outputs/visual_comparison.html

CRITICAL: Server Must Run From Root

The visual_comparison.html uses relative paths like hybrid_precision_test/spiral_official.glb. If the server runs from outputs/ instead of trellis-forge root, the models will fail to load with 404 errors.

Work: 2026-01-29 17:00 | Modified: 2026-01-29 17:00


Visual Quality Fixes (2026-01-29) - INSUFFICIENT (shape generation broken)

Problem Statement

Generated GLBs had severe visual artifacts compared to official TRELLIS.2:

  1. Black vertical bars/spikes - Long black lines extending through the entire model
  2. Triangular texture patches - Misaligned/wrong textures in triangular regions
  3. See-through facade - Building skeleton visible instead of solid glass
  4. Patchy textures - Inconsistent texture sampling

Root Causes Identified and Fixed

# Root Cause Fix Status
1 Grid coordinates unclamped Added torch.clamp() before grid_sample_3d ✅ FIXED
2 BLEND alpha mode causing transparency Reverted to OPAQUE (matches official) ✅ FIXED
3 BVH rebuild breaking texture projection REMOVED BVH rebuild - must keep original ✅ FIXED

Critical Finding: BVH Must NOT Be Rebuilt

WRONG approach (what we tried first):

# After simplification
vertices, faces = mesh.read()
bvh = cumesh.cuBVH(vertices, faces)  # BREAKS texture projection!

CORRECT approach (official TRELLIS.2): The BVH is built once on the original high-res mesh (line 122) and NEVER rebuilt. At texture baking time (line 254), the code uses:

_, face_id, uvw = bvh.unsigned_distance(valid_pos, return_uvw=True)
orig_tri_verts = vertices[faces[face_id.long()]]  # Uses ORIGINAL vertices/faces

This projects UV positions back onto the original high-resolution mesh to sample accurate colors. Rebuilding BVH on the simplified mesh breaks this reference.

Alpha Mode: OPAQUE is Intentional

The official TRELLIS.2 note states:

"The .glb file is exported in OPAQUE mode by default. Although the alpha channel is preserved within the texture map, it is not active initially."

The alpha channel contains voxel density/opacity data from generation, NOT actual transparency for glass facades. Using BLEND mode makes solid surfaces incorrectly transparent.

Files Modified

File Changes
o_voxel/o_voxel/postprocess.py Grid clamping (lines 284-291), OPAQUE mode (line 322), removed BVH rebuilds

Verification Results (Playwright Visual Inspection)

Test: spiral_input.png with seed 42, compared to official spiral_official.glb

Criterion Before Fix After Fix Target
Glass Facade See-through skeleton Solid blue glass ✅ PASS
Texture Continuity Patchy, black bars Smooth, continuous ✅ PASS
Building Structure Fragmented Coherent tower ✅ PASS
Foliage Spiral Misaligned Proper diagonal ✅ PASS
PBR Materials Incorrect transparency Proper opaque ✅ PASS

Remaining Issue: Triangular Texture Patches (INVESTIGATING)

Problem: Our model still shows visible triangular patches/seams on the glass facade that break texture continuity. The official model has smooth, continuous textures.

Visual Evidence: Close-up comparison shows:

  • Official (left): Smooth blue glass with uniform grid pattern
  • Ours (right): Visible triangular seams where texture sampling differs between adjacent triangles

Suspected Causes:

  1. UV chart boundaries - xatlas UV unwrapping creates chart boundaries that cause texture discontinuities
  2. Inpainting radius too small - CV2 inpaint radius of 3px may not cover chart seams
  3. Texture resolution - 2048x2048 may not provide enough detail for large flat surfaces
  4. Interpolation differences - Barycentric interpolation at triangle edges

Status: INVESTIGATING

Work: 2026-01-29 06:00 | Modified: 2026-01-29 08:00


Visual Quality Analysis (2026-01-29 - Playwright Inspection)

Test Configuration

  • Input Image: reference/test_input.jpg.JPG (architectural model)
  • Benchmark: reference/sample_2026-01-24T055452.643.glb (official TRELLIS.2)
  • Generated: outputs/generation_test/generated_output.glb (our pipeline)
  • Seed: 42
  • Parameters: Default (sparse_steps=12, shape_guidance=7.5, tex_steps=12)

Visual Comparison Scores (0-10 scale, minimum passing: 8/10 per dimension)

Criterion Score Trace Stage Assessment
Shape Fidelity (SF) 7/10 1-4 Overall structure recognizable. Tan tower well-formed. Green lattice geometry differs - benchmark has cleaner grid pattern.
Structural Integrity (SI) 6/10 5 Horizontal platforms less defined. Some fragmentation in green lattice areas.
Texture Clarity (TC) 5/10 7 Textures present but washed out. Green areas darker. Tan building lacks crisp window definition.
Color Accuracy (CA) 6/10 3-4 Colors in right ballpark but saturation differs. Lime green less vibrant than benchmark.
PBR Material Quality (PQ) 6/10 4,7-8 Materials respond to light but appear more matte than benchmark.
UV Mapping Quality (UV) 6/10 6 Functional but shows stretching in green lattice areas.
Mesh Topology (MT) 7/10 5 494k triangles (ours) vs 299k (benchmark) - over-tessellation without quality benefit.
TOTAL 43/70 FAIL (threshold: 56/70)

Root Cause Analysis (Updated 2026-01-29)

VERIFIED: Parameters match official TRELLIS.2 exactly

Parameter Official Ours Match
sparse_guidance 7.5 7.5
sparse_rescale 0.7 0.7
shape_guidance 7.5 7.5
shape_rescale 0.5 0.5
tex_guidance 1.0 1.0
tex_rescale 0.0 0.0
decimation_target 500,000 500,000
texture_size 2048 2048

ROOT CAUSE IDENTIFIED: BF16 Autocast Precision Loss

The official TRELLIS.2 does NOT use torch.autocast(). Our pipeline wraps all sampling stages in:

with torch.autocast('cuda', dtype=torch.bfloat16, enabled=True):

This causes:

  1. Texture color degradation - BF16 has lower precision (7 mantissa bits vs 23 in FP32)
  2. Shape detail loss - Subtle geometric features get quantized
  3. Material property shifts - PBR values compressed

Performance vs Quality Tradeoff:

  • Without autocast: 78 minutes (FP32) - perfect quality
  • With autocast: 5.5 minutes (BF16) - degraded quality (43/70)

Proposed Fix: Hybrid Precision Strategy

Run texture-sensitive operations in FP32, compute-heavy shape sampling in BF16:

# Stage 1-2: BF16 for performance (shape is less precision-sensitive)
with torch.autocast('cuda', dtype=torch.bfloat16, enabled=True):
    coords = self.sample_sparse_structure(...)
    shape_slat, res = self.sample_shape_slat_cascade(...)

# Stage 3: FP32 for quality (texture colors need precision)
with torch.autocast('cuda', enabled=False):
    tex_slat = self.sample_tex_slat(...)

Expected result:

  • Time: ~15-20 minutes (3-4x slower than full BF16, 4-5x faster than full FP32)
  • Quality: Should recover texture clarity while keeping reasonable performance

Artifacts Observed

Artifact Location Severity Stage Likely Cause
Washed-out green Lattice structure HIGH 3-4 BF16 color quantization
Missing grid detail Green building facade MEDIUM 2 BF16 shape precision
Texture stretching Lattice UV areas MEDIUM 6-7 UV unwrap + BF16
Matte appearance Overall model LOW 7-8 BF16 PBR value loss

Hybrid Precision Results (2026-01-29)

FIX IMPLEMENTED AND VERIFIED

Metric Full BF16 Hybrid (BF16 shape, FP32 tex) Improvement
Inference Time 130s (~2.2 min) 130s (~2.2 min) Same
Total Time (with export) ~5.5 min ~12.7 min +7 min (export dominates)
Color Saturation Washed out Vibrant green FIXED
Shape Quality 7/10 7/10 Same
Texture Clarity 5/10 7-8/10 IMPROVED

Visual Comparison (Hybrid vs Benchmark):

  • Green lattice structure: Now matches benchmark color saturation
  • Tan tower: Window detail improved
  • Blue supports: Color accuracy restored
  • Overall: Much closer to official TRELLIS.2 output

File Modified:

  • trellis2/pipelines/trellis2_image_to_3d.py - Stage 1-2 use BF16, Stage 3 (texture) uses FP32

Output for Review:

  • Hybrid precision output: outputs/hybrid_precision_test/generated_seed42.glb
  • Comparison viewer: outputs/compare.html
  • Benchmark: reference/sample_2026-01-24T055452.643.glb

Complete Pipeline Trace: Launch to GLB Output (2026-01-29)

User Request: Deep/wide analysis of complete process stream from app launch to GLB output.

This section documents EVERY folder, file, script, import, class, and function that participates in the TRELLIS.2 Image-to-3D generation pipeline. If ANY of these are removed, the application will fail.

1.1 User Launches Application: trellis-forge Command

Entry Point: PowerShell profile function

File Location Purpose
Microsoft.PowerShell_profile.ps1 C:\Users\Admin\Documents\WindowsPowerShell\ Defines trellis-forge function
Start-TrellisForge.ps1 B:\M\ArtificialArchitecture\spatial\trellis-forge\ Main launcher script

Start-TrellisForge.ps1 Flow:

  1. Sets VENV_PYTHON = .\venv311\Scripts\python.exe
  2. Sets VCVARS64 = Visual Studio 2022 vcvars64.bat path
  3. Calls Start-Backend function:
    • Runs cmd /k "vcvars64.bat && python -m uvicorn gui.backend.main:app --host 127.0.0.1 --port 8000"
  4. Calls Start-Electron function:
    • Changes to gui/electron/
    • Runs npm start

1.2 Electron Frontend (Genesis)

File Location Purpose
package.json gui/electron/ App config: name="genesis", main=main.js
main.js gui/electron/ Electron main process, BrowserWindow, IPC handlers
preload.js gui/electron/ Context bridge: selectImage, saveModel, getBackendUrl
index.html gui/electron/ UI layout, parameter sliders, mode selector
styles.css gui/electron/ UI styling
app.js gui/electron/ Frontend logic, Three.js viewer, API calls

main.js Key Functions:

  • createWindow() - Creates BrowserWindow, loads index.html
  • startBackend() - Spawns cmd.exe with vcvars64.bat + uvicorn
  • IPC handlers: select-image, save-model, get-backend-url

app.js Key Functions:

  • init() - Gets backend URL, starts polling
  • initViewer() - Three.js scene, camera, renderer, controls
  • loadModel(url, type) - GLTFLoader for GLB viewing
  • generateModel() - Sends POST to /api/generate/image
  • fetchJobs() - Polls /api/jobs every 500ms
  • renderJobs() - Updates sidebar with active/completed jobs

1.3 User Clicks Image-to-3D Tab + Uploads Image + Clicks Generate

index.html Parameter Controls (Image-to-3D / TRELLIS.2):

  • imageSeedInput - Random seed (default: 42)
  • imageResolution - Output resolution (default: 256)
  • imageSparseSteps - Stage 1 steps (default: 12)
  • imageSparseGuidance - Stage 1 guidance (default: 7.5)
  • imageSparseRescale - Stage 1 rescale (default: 0.7)
  • imageShapeSteps - Stage 2 steps (default: 12)
  • imageShapeGuidance - Stage 2 guidance (default: 7.5)
  • imageShapeRescale - Stage 2 rescale (default: 0.5)
  • imageTexSteps - Stage 3 steps (default: 12)
  • imageTexRescale - Stage 3 rescale (default: 0.0)
  • imageSimplify - Mesh simplification (default: 0.95)
  • imageTextureSize - Texture resolution (default: 1024)

app.js generateModel() → API Request:

POST ${backendUrl}/api/generate/image
FormData: file (image blob), seed, pipeline_version='v2', resolution,
          sparse_steps, sparse_cfg, guidance_rescale_sparse,
          slat_steps, slat_cfg, guidance_rescale_shape,
          guidance_rescale_material, simplify, texture_size

1.4 Backend Processing (FastAPI → TRELLIS.2 Pipeline → GLB)

Backend Entry Point

File Location Purpose
main.py gui/backend/ FastAPI app, pipeline loading, job handling

main.py Key Components:

  • Environment Setup (lines 1-30):

    • PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
    • HF_HOME, HF_HUB_CACHE, TORCH_HOMEmodels/ directory
    • HF_HUB_OFFLINE=1 - No downloading
    • torch.backends.cudnn.benchmark = True
  • Pipeline Loading (load_pipeline()):

    • Imports Trellis2ImageTo3DPipeline from trellis.pipelines
    • Loads from models/hub/TRELLIS.2-4B
    • Calls configure_vram_mode(total_vram_gb) for high-VRAM mode
    • Calls pipeline.to("cuda")
  • Job Handling (run_image_to_3d_job()):

    • Saves uploaded image to uploads/
    • Calls pipeline.run() with parameters
    • Returns List[MeshWithVoxel]
    • Calls mesh.export() for GLB output

TRELLIS.2 Pipeline Module Structure

Root Package:

File Purpose
trellis/__init__.py Package init
trellis/pipelines/__init__.py Lazy imports: Trellis2ImageTo3DPipeline

TRELLIS.2 Pipeline Package (trellis2/):

File Purpose
trellis2/__init__.py Package init
trellis2/pipelines/__init__.py Lazy imports for pipeline classes
trellis2/pipelines/base.py Pipeline base class, from_pretrained()
trellis2/pipelines/trellis2_image_to_3d.py MAIN PIPELINE
trellis2/pipelines/samplers/__init__.py Sampler imports
trellis2/pipelines/samplers/base.py Sampler base class
trellis2/pipelines/samplers/flow_euler.py FlowEulerGuidanceIntervalSampler
trellis2/pipelines/samplers/classifier_free_guidance_mixin.py CFG mixin
trellis2/pipelines/samplers/guidance_interval_mixin.py Guidance interval mixin
trellis2/pipelines/rembg/__init__.py rembg imports
trellis2/pipelines/rembg/BiRefNet.py Background removal model

Trellis2ImageTo3DPipeline Class (trellis2_image_to_3d.py):

  • model_names_to_load: 8 models to load
  • from_pretrained(path) - Loads pipeline + models
  • preprocess_image(input) - Background removal, cropping
  • get_cond(image, resolution) - DINOv3 conditioning
  • sample_sparse_structure() - Stage 1: sparse coords (BF16)
  • sample_shape_slat_cascade() - Stage 2: shape latent 512→1024 (BF16)
  • sample_tex_slat() - Stage 3: texture latent (FP32 for color accuracy)
  • decode_shape_slat() - Mesh extraction
  • decode_tex_slat() - PBR attribute decoding
  • decode_latent() - Combined decode → MeshWithVoxel
  • run() - Main inference with hybrid precision (BF16 shape, FP32 texture)

Models Package

File Purpose
trellis2/models/__init__.py Model registry, from_pretrained(), state dict remapping
trellis2/models/sparse_structure_flow.py SparseStructureFlowModel, TimestepEmbedder
trellis2/models/structured_latent_flow.py SLatFlowModel (shape/texture latent)
trellis2/models/sparse_elastic_mixin.py SparseTransformerElasticMixin
trellis2/models/sparse_structure_vae.py Sparse structure VAE
trellis2/models/sc_vaes/__init__.py SC VAE package
trellis2/models/sc_vaes/fdg_vae.py FlexiDualGridVaeDecoder, FlexiDualGridVaeEncoder
trellis2/models/sc_vaes/sparse_unet_vae.py SparseUnetVaeEncoder, SparseUnetVaeDecoder

Models Loaded (from pipeline.json):

  1. sparse_structure_flow_model - 32 resolution, dense transformer
  2. sparse_structure_decoder - Decodes z_s to binary occupancy
  3. shape_slat_flow_model_512 - Low-res shape latent flow
  4. shape_slat_flow_model_1024 - High-res shape latent flow
  5. shape_slat_decoder - FlexiDualGridVaeDecoder → Mesh
  6. tex_slat_flow_model_512 - Low-res texture latent flow (if using 512)
  7. tex_slat_flow_model_1024 - High-res texture latent flow
  8. tex_slat_decoder - Decodes to PBR voxel attributes

Modules Package (Sparse Operations)

Sparse Core (trellis2/modules/sparse/):

File Purpose
__init__.py Exports: SparseTensor, SparseConv3d, SparseLinear, attention
config.py CONV='flex_gemm', ATTN='flash_attn'
basic.py SparseTensor class, VarLenTensor class
linear.py SparseLinear layer
norm.py Sparse normalization layers
nonlinearity.py Sparse activation functions

Sparse Convolution (trellis2/modules/sparse/conv/):

File Purpose
__init__.py SparseConv3d, SparseInverseConv3d exports
config.py FLEX_GEMM_ALGO='masked_implicit_gemm_splitk'
conv.py Dynamic backend loading based on config.CONV
conv_flex_gemm.py flex_gemm integration (primary backend)
conv_spconv.py spconv fallback

Sparse Attention (trellis2/modules/sparse/attention/):

File Purpose
__init__.py Attention exports
full_attn.py sparse_scaled_dot_product_attention (flash_attn/xformers)
windowed_attn.py Windowed sparse attention
modules.py Attention modules
rope.py SparseRotaryPositionEmbedder

Sparse Transformer (trellis2/modules/sparse/transformer/):

File Purpose
__init__.py Transformer exports
blocks.py Sparse transformer blocks
modulated.py ModulatedSparseTransformerCrossBlock

Sparse Spatial (trellis2/modules/sparse/spatial/):

File Purpose
__init__.py Spatial ops exports
basic.py SparseDownsample, SparseUpsample
spatial2channel.py Sparse spatial-channel conversion

Dense Modules (trellis2/modules/)

File Purpose
utils.py manual_cast(), convert_module_to(), str_to_dtype()
norm.py Normalization layers
spatial.py Spatial operations
image_feature_extractor.py DinoV3FeatureExtractor
attention/__init__.py Dense attention exports
attention/full_attn.py Dense attention
attention/modules.py Attention modules
attention/rope.py RotaryPositionEmbedder
attention/config.py BACKEND='flash_attn'
transformer/__init__.py Transformer exports
transformer/blocks.py Transformer blocks
transformer/modulated.py ModulatedTransformerCrossBlock

Representations Package

File Purpose
trellis2/representations/__init__.py Lazy imports: Mesh, MeshWithVoxel
trellis2/representations/mesh/__init__.py Mesh package
trellis2/representations/mesh/base.py Mesh, MeshWithVoxel, PbrMaterial, export()
trellis2/representations/voxel/__init__.py Voxel package
trellis2/representations/voxel/voxel_model.py Voxel class

MeshWithVoxel.export() (base.py:277-883):

  • Uses cumesh.CuMesh for mesh simplification
  • Uses cumesh.cuBVH for BVH projection
  • Uses cumesh.remeshing.remesh_narrow_band_dc for Dual Contouring
  • Uses cumesh.uv_unwrap for UV parameterization
  • Uses nvdiffrast.torch for UV rasterization
  • Uses flex_gemm.ops.grid_sample.grid_sample_3d for trilinear sampling
  • Uses OpenCV cv2.inpaint for texture completion
  • Uses trimesh for GLB export

External CUDA Extensions

o_voxel (trellis-forge/o_voxel/):

File Purpose
setup.py Build configuration
o_voxel/__init__.py Package init
o_voxel/postprocess.py to_glb() - Official GLB export function
o_voxel/rasterize.py Rasterization utilities
o_voxel/serialize.py Serialization
o_voxel/convert/__init__.py Convert utilities
o_voxel/convert/flexible_dual_grid.py flexible_dual_grid_to_mesh()
o_voxel/convert/volumetic_attr.py Volumetric attribute handling
o_voxel/io/ I/O formats (npz, ply, vxz)

cumesh (trellis-forge/cumesh/):

File Purpose
setup.py Build configuration (MSVC flags)
cumesh/__init__.py Exports: CuMesh, cuBVH, remeshing
cumesh/cumesh.py CuMesh class (simplify, fill_holes, uv_unwrap)
cumesh/bvh.py cuBVH class (unsigned_distance)
cumesh/remeshing.py remesh_narrow_band_dc()
cumesh/xatlas.py xatlas UV unwrapping
cumesh/third_party/cubvh/ CUDA BVH implementation

flex_gemm (spatial/flexgemm_source/):

File Purpose
setup.py Build configuration
flex_gemm/__init__.py Package init
flex_gemm/ops/__init__.py Operations exports
flex_gemm/ops/grid_sample/__init__.py grid_sample_3d export
flex_gemm/ops/grid_sample/grid_sample.py grid_sample_3d() - trilinear sampling
flex_gemm/ops/spconv/__init__.py Sparse conv exports
flex_gemm/ops/spconv/submanifold_conv3d.py sparse_submanifold_conv3d()
flex_gemm/ops/serialize.py Serialization
flex_gemm/ops/utils.py Utilities
flex_gemm/kernels/triton/ Triton kernels for spconv and grid_sample

Other Required Packages (installed in venv311):

Package Purpose
flash_attn Flash Attention for variable-length attention
nvdiffrast CUDA UV rasterization
spconv Fallback sparse convolution
xatlas UV unwrapping
transformers DINOv3ViTModel, BiRefNet
trimesh GLB export
pyvista Mesh operations (if used)

Pipeline Execution Flow

1. pipeline.run(image, seed, params)
   ├── preprocess_image(image)
   │   └── rembg_model(image) → RGBA with background removed
   ├── get_cond([preprocessed], 512) → cond_512
   ├── get_cond([preprocessed], 1024) → cond_1024
   │   └── image_cond_model(image) → DinoV3 features
   │
   ├── [BF16] with torch.autocast('cuda', dtype=torch.bfloat16):
       │   ├── sample_sparse_structure(cond_512, 32)
       │   │   ├── sparse_structure_flow_model.forward() → z_s
       │   │   └── sparse_structure_decoder(z_s) → coords [N, 4]
       │   │
       │   └── sample_shape_slat_cascade(cond_512, cond_1024, coords)
       │       ├── shape_slat_flow_model_512.forward() → lr_slat
       │       ├── shape_slat_decoder.upsample(lr_slat) → hr_coords
       │       └── shape_slat_flow_model_1024.forward() → shape_slat
       │
       ├── [FP32] sample_tex_slat(cond_1024, shape_slat)  # No autocast - color accuracy
       │   └── tex_slat_flow_model_1024.forward() → tex_slat
       │
       └── decode_latent(shape_slat, tex_slat, resolution)
           ├── decode_shape_slat(shape_slat)
           │   └── shape_slat_decoder(slat) → (meshes, subs)
           ├── decode_tex_slat(tex_slat, subs)
           │   └── tex_slat_decoder(slat, guide_subs) → tex_voxels
           └── MeshWithVoxel(vertices, faces, coords, attrs)

2. mesh.export(path, decimation_target, texture_size)
   ├── cumesh.CuMesh.init(vertices, faces)
   ├── cumesh.cuBVH(vertices, faces)
   ├── cumesh.remeshing.remesh_narrow_band_dc()
   ├── mesh.simplify(target)
   ├── mesh.uv_unwrap() → (vertices, faces, uvs, vmaps)
   ├── nvdiffrast.torch.rasterize() → UV space
   ├── flex_gemm.grid_sample_3d() → PBR attributes
   ├── cv2.inpaint() → texture completion
   └── trimesh.Trimesh.export(path) → GLB file

Work: 2026-01-29 03:00 | Modified: 2026-01-29 04:00


TRELLIS.2 Performance Optimization: COMPLETE (2026-01-29)

14x speedup achieved: 78 minutes → 5.45 minutes

Optimization Results

Metric Before After Improvement
Total Time 4,676s (78 min) 327s (5.45 min) 14.3x faster
Shape SLat Cascade 2,905s 66.8s 43x faster
Texture SLat 1,518s 31.0s 49x faster
Peak GPU 26,538 MB 42,340 MB Uses expandable_segments

Root Causes Fixed

# Issue Impact Fix
1 manual_cast() dtype conversion 71.6% CPU time on aten::_to_copy torch.autocast() bypasses manual_cast entirely
2 cuDNN benchmark disabled Missing kernel auto-tuning torch.backends.cudnn.benchmark = True
3 low_vram model transfers CPU↔GPU every sampling call Disabled for ≥20GB VRAM GPUs

How Autocast Fixes manual_cast() Overhead

The manual_cast() function in trellis2/modules/utils.py checks autocast status:

def manual_cast(tensor, dtype):
    if not torch.is_autocast_enabled():
        return tensor.type(dtype)  # <-- ALLOCATES + COPIES (slow)
    return tensor  # <-- RETURNS UNCHANGED (fast)

With 4 manual_cast() calls per forward pass × 30 blocks × 12 steps × 2 (CFG) = 2,880 tensor allocations per sampling stage. Autocast eliminates all of them.

Per-Stage Breakdown (After Optimization)

Stage Time Notes
Pipeline Loading 55.1s Model weights ~20GB RAM
Image Preprocessing 2.6s BiRefNet background removal
Image Conditioning 0.3s DINOv3 feature extraction
Sparse Structure (Stage 1) 2.8s 6,046 sparse coords
Shape SLat Cascade (Stage 2) 66.8s 512→1024, 28,672 tokens
Texture SLat (Stage 3) 31.0s PBR material sampling
Decode Shape + Texture 32.9s Mesh + voxel extraction
GLB Export 135.6s CuMesh simplify + UV + bake

Files Modified

File Change
gui/backend/main.py Added torch.backends.cudnn.benchmark = True, configure_vram_mode()
run_generation_test.py Added cuDNN benchmark, autocast, VRAM mode
diagnose_performance.py Added cuDNN benchmark
trellis2/pipelines/trellis2_image_to_3d.py Added configure_vram_mode(), hybrid precision (BF16 shape, FP32 texture), disabled autocast for upsample

Technical Notes

  • Hybrid precision strategy: Stage 1-2 (sparse structure + shape) use BF16 for performance. Stage 3 (texture) uses FP32 for color accuracy. This recovers vibrant colors while maintaining fast inference (~2.2 min).
  • Nested autocast context: The upsample operation in sample_shape_slat_cascade() uses torch.autocast('cuda', enabled=False) because flex_gemm Triton kernels don't support mixed precision (FP16 input with FP32 weights). After the disabled context exits, autocast properly resumes for HR sampling.
  • VRAM threshold: Changed from 24GB to 20GB because RTX 4090 reports 23.98GB total memory.
  • Peak GPU 42GB: Uses PyTorch's expandable_segments for virtual memory management, allowing GPU memory to exceed physical VRAM via unified memory.

Work: 2026-01-29 00:00 | Modified: 2026-01-29 01:00


TRELLIS.2 Full Generation Test: VERIFIED (2026-01-28)

End-to-end generation successful with visual parity to official benchmark.

Generation Results (Pre-Optimization Baseline)

Metric Value Notes
Total Time 4,676s (~78 min) High-res 1024_cascade pipeline
Peak GPU 26,538 MB Via PyTorch expandable_segments
GLB Output 21.8 MB At outputs/generation_test/generated_output.glb
Final Mesh 472,784 faces After decimation from 37.6M

Per-Stage Breakdown (Pre-Optimization)

Stage Time GPU Peak Notes
Pipeline Loading 73.0s 0 MB Model weights ~20GB RAM
Image Preprocessing 1.7s 3,189 MB BiRefNet background removal
Image Conditioning 1.7s 1,363 MB DINOv3 feature extraction
Sparse Structure (Stage 1) 3.1s 2,717 MB 6,046 sparse coords
Shape SLat Cascade (Stage 2) 2,905s 26,538 MB 512→1024, 28,672 tokens
Texture SLat (Stage 3) 1,518s 3,812 MB PBR material sampling
Decode Shape + Texture 7.6s 15,672 MB Mesh + voxel extraction
GLB Export 165.9s 17,252 MB CuMesh simplify + UV + bake

Structural Comparison vs Benchmark

Metric Generated Benchmark Notes
Vertices 489,768 342,684 43% more
Faces 472,532 299,350 58% more
Extents [0.88, 1.00, 0.46] [0.87, 1.00, 0.46] Near-identical bounds
Surface Area 14.65 17.02 14% less
Has PBR Textures Yes Yes Both have base_color + metallic_roughness

Texture Quality Comparison

Metric Generated Benchmark Status
R Histogram Distance 0.0346 - PASS (<0.1)
G Histogram Distance 0.0687 - PASS (<0.1)
B Histogram Distance 0.0529 - PASS (<0.1)
Mean Distance 0.0521 - PASS (<0.1)
Black Pixel Ratio 0.0000 0.0000 Perfect UV coverage

Parameter Fixes Applied (Pre-Generation)

Parameter Before After Impact
slat_cfg (ImageTo3DRequest) 3.0 7.5 Stronger shape guidance
texture_size 1024 2048 4x more texels
decimation_target 1,000,000 500,000 Match official
rescale_t (sampler params) implicit explicit 5.0/3.0/3.0 Match official app.py

Visual Parity

User confirmed visual quality matches official benchmark. Model recognizable as same building with correct shape, texture, and PBR materials.

Work: 2026-01-28 00:00 | Modified: 2026-01-28 01:30


Implementation Parity Analysis & Correction: COMPLETE (2026-01-26)

All code verified against official TRELLIS.2 codebase. 8/8 runtime tests PASS.

Comprehensive comparative analysis identified 6 critical divergences + 1 hidden override. All corrected.

Changes Applied

# File Issue Fix
C1 trellis2/modules/sparse/conv/config.py 3 wrong values: SPCONV_ALGO='native', FLEX_GEMM_ALGO='implicit_gemm_splitk', HASHMAP_RATIO=1.5 Changed to 'auto', 'masked_implicit_gemm_splitk', 2.0
C2 trellis2/pipelines/base.py Custom HuggingFace path resolution (~28 lines) diverged from official Replaced with official simple try/except (~8 lines)
C3 trellis2/representations/mesh/base.py fill_holes(), remove_faces(), simplify() wrapped in try/except with silent pass Removed try/except, added fail-fast import cumesh at module level
C4 gui/backend/main.py export block Used custom mesh.export() (606 lines) instead of official o_voxel.postprocess.to_glb() Switched to o_voxel.postprocess.to_glb() with official parameters
C5 trellis2/modules/sparse/config.py Extra backends: 'torch_native' for conv, 'sdpa'/'naive' for attn Removed non-official backends, fixed print prefix to '[SPARSE]'
C6 trellis2/modules/sparse/attention/full_attn.py + windowed_attn.py ~170 lines of sdpa/naive fallback code Removed all sdpa/naive code blocks
-- gui/backend/main.py (lines 102-103) HIDDEN: os.environ['SPCONV_ALGO']='native' and os.environ['ATTN_BACKEND']='sdpa' overriding config files Removed env overrides, added expandable_segments
-- trellis2/models/sc_vaes/fdg_vae.py try/except fallback import for flexible_dual_grid_to_mesh Direct import from o_voxel.convert matching official

Files Deleted (Not in official, PyTorch fallbacks)

File Purpose
trellis2/modules/sparse/conv/conv_torch_native.py PyTorch fallback sparse conv
trellis2/utils/grid_sample_3d_torch.py PyTorch fallback grid_sample
trellis2/utils/flexible_dual_grid_pytorch.py PyTorch fallback FlexGEMM dual grid
trellis2/utils/hashmap_pytorch.py PyTorch fallback FlexGEMM hashmap

Runtime Verification (8/8 PASS)

Test 1 PASS: Conv config values match official (auto, masked_implicit_gemm_splitk, 2.0)
Test 2 PASS: Attention backend = flash_attn
Test 3 PASS: cumesh imported at module level
Test 4 PASS: o_voxel.postprocess.to_glb accessible
Test 5 PASS: All CUDA deps loaded (flex_gemm, cumesh, nvdiffrast, spconv, flash_attn)
Test 6 PASS: Trellis2ImageTo3DPipeline import OK
Test 7 PASS: All fallback files deleted
Test 8 PASS: No env var overrides, expandable_segments set

Work: 2026-01-26 11:00 | Modified: 2026-01-26 11:30


Resource Parity: All 3 Phases Complete (2026-01-26)

All changes implemented. Awaiting user go-ahead for generation test.

Phase 1: flash_attn - COMPLETE

  • Installed flash_attn-2.8.3+cu128torch2.8.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
  • Updated trellis2/modules/sparse/config.py: ATTN = 'flash_attn'
  • Updated trellis2/modules/attention/config.py: BACKEND = 'flash_attn'
  • Functional test passed (varlen_qkvpacked on 128 tokens)

Phase 2: expandable_segments - COMPLETE

  • Patched pytorch-build/c10/cuda/CMakeLists.txt: moved PYTORCH_C10_DRIVER_API_SUPPORTED macro outside if(NOT WIN32)
  • Patched pytorch-build/c10/cuda/driver_api.cpp: Win32 dynamic loading (LoadLibraryA/GetProcAddress)
  • Patched pytorch-build/c10/cuda/driver_api.h: Added C10_EXPORT to static method declarations
  • Patched pytorch-build/c10/cuda/CUDACachingAllocator.cpp: 7 Windows compatibility patches (platform headers, CU_MEM_HANDLE_TYPE_NONE, IPC guards, DWORD pid, GetCurrentProcessId)
  • Built standalone c10_cuda.dll (432KB) using installed PyTorch headers
  • Replaced in venv (original backed up as c10_cuda.dll.bak)
  • Verified: expandable_segments test PASSED, all CUDA extensions load correctly

Phase 3: Device Handling Alignment - COMPLETE

  • Step 3.1: Added PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to gui/backend/main.py
  • Step 3.2: Aligned decode_shape_slat with official (.to(device) first, then .low_vram = True)
  • Step 3.3: Aligned decode_tex_slat with official (.to(device) only, no low_vram flag)
  • Step 3.4: Reverted decoder forward() to official — removed block-level offloading, memory debug prints, gc.collect, empty_cache
  • Step 3.5: Removed _chunked_op, gc.collect, import gc, CHUNK_SIZE from sparse_unet_vae.py — 18 _chunked_op calls and 8 gc.collect calls reverted to direct calls matching official
  • Step 3.6: Simplified run() cleanup to single torch.cuda.empty_cache() matching official
  • Step 3.7: Removed decode_latent cleanup code (del, gc.collect, empty_cache)

Files Modified (All Phases)

File Phase Change
trellis2/modules/sparse/config.py 1 ATTN = 'flash_attn'
trellis2/modules/attention/config.py 1 BACKEND = 'flash_attn'
gui/backend/main.py 3 Added PYTORCH_CUDA_ALLOC_CONF env var
trellis2/pipelines/trellis2_image_to_3d.py 3 Aligned decode_shape/tex_slat + removed gc cleanup
trellis2/models/sc_vaes/sparse_unet_vae.py 3 Removed _chunked_op, gc.collect, block offloading — matches official

Work: 2026-01-26 00:00 | Modified: 2026-01-26 01:00


Resource Parity Investigation: expandable_segments + flash_attn (2026-01-26)

User Request: Achieve full functional AND resource usage parity with official TRELLIS.2 on Windows. Official states 24GB GPU is sufficient — our 24GB RTX 4090 should match.

Root Cause: Two features blocking resource parity on Windows

Feature 1: expandable_segments (CUDA VMM Allocator)

What it does: Uses CUDA Virtual Memory Management APIs (cuMemCreate, cuMemMap, cuMemAddressReserve, cuMemSetAccess) to create memory segments that grow/shrink at 2MiB page granularity. Eliminates fragmentation — the #1 cause of OOM on our system.

Why blocked on Windows: PyTorch c10/cuda/CMakeLists.txt has:

if(NOT WIN32)
  target_link_libraries(c10_cuda PRIVATE dl)
  target_compile_options(c10_cuda PRIVATE "-DPYTORCH_C10_DRIVER_API_SUPPORTED")
endif()

Without PYTORCH_C10_DRIVER_API_SUPPORTED:

  • driver_api.cpp compiles to nothing (entire file is #if ... #endif)
  • DriverAPI::get() does not exist — confirmed absent from c10_cuda.dll exports
  • CUDACachingAllocator.cpp compiles with stub ExpandableSegment that asserts false
  • expandable_segments() at ordinal 60 in c10_cuda.dll returns false unconditionally

Why the guard exists: driver_api.cpp uses dlopen/dlsym for NVML loading. Windows uses LoadLibraryW/GetProcAddress instead. However: NVML is optional (OOM error messages only). The VMM functions are loaded via cudaGetDriverEntryPoint / cudaGetDriverEntryPointByVersioncross-platform CUDA Runtime APIs.

Hardware confirmed ready: All 8 VMM APIs available in nvcuda.dll:

  • cuMemCreate, cuMemMap, cuMemAddressReserve, cuMemSetAccess
  • cuMemUnmap, cuMemRelease, cuMemAddressFree, cuMemGetAllocationGranularity

Fix required: Patch 3 PyTorch source files, rebuild 2 DLLs:

  1. c10/cuda/CMakeLists.txt (~3 lines) — remove if(NOT WIN32) guard, skip dl on Win32
  2. c10/cuda/driver_api.cpp (~15 lines) — platform-conditional #include <dlfcn.h><windows.h>, dlopenLoadLibraryA, dlsymGetProcAddress
  3. Rebuild c10_cuda.dll (406KB) + torch_cuda.dll (1GB)

Risk: Low. Mechanical platform abstraction on 4 function calls. No logic changes.

Feature 2: flash_attn (Flash Attention)

What it does: Tiled CUDA attention kernels operating on packed variable-length sequences via cu_seqlens. Zero padding overhead, O(N) memory.

Our sdpa replacement overhead: full_attn.py:230-232 allocates 3 dense padded tensors [N, max_len, H, C] + attention mask [N, 1, max_q_len, max_kv_len] per layer. With ~40+ attention layers across decoders, this adds hundreds of MB of intermediate memory per forward pass.

Fix required: pip install pre-built Windows wheel + 1 line config change:

  • Wheel: flash_attn-2.8.3+cu128torch2.8.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
  • Source: https://github.com/bdashore3/flash-attention/releases/tag/v2.8.3
  • Exact match: Python 3.10, CUDA 12.8, PyTorch 2.8.0, Windows x64
  • Code paths already implemented in full_attn.py:184-195 and windowed_attn.py:118-121
  • Config change: config.py:10ATTN = 'flash_attn'

Risk: Low. Pre-built wheel matches our exact environment.

Additional Finding: Device Handling Difference

Method Official Ours
decode_shape_slat .to(device) THEN .low_vram = True .low_vram = True WITHOUT .to(device)
decode_tex_slat .to(device) (no low_vram set) .low_vram = True WITHOUT .to(device)

Official loads decoder to GPU first, then enables block-level offloading. Ours skips the initial GPU load. This may affect memory layout and should be aligned after resource parity features are in place.

Combined Impact

Both features compound:

  • flash_attn reduces peak memory during decoder inference (~30-40% less padding overhead)
  • expandable_segments reclaims fragmented memory after decoder inference (before fill_holes)
  • Together: 24GB sufficient for full pipeline including fill_holes() on 29M-face raw mesh

Without either: memory pressure accumulates → fragmentation → OOM on fill_holes() → machine crash (observed).

Investigation Inconsistency Resolved

Earlier subagent comparison incorrectly reported "fill_holes SKIPPED in decode_latent." Verified: fill_holes() IS called at trellis2_image_to_3d.py:490. The crash was caused by fill_holes running on a 29M-face mesh and OOMing due to fragmentation — not by it being skipped.

Priority Order

  1. flash_attn (immediate) — pip install + 1 line → memory reduction during inference
  2. expandable_segments (PyTorch source patch + rebuild) → fragmentation elimination
  3. Device handling alignment — match official .to(device) pattern
  4. Generation test — only after 1+2 complete (user must give go-ahead)

Work: 2026-01-26 00:00 | Modified: 2026-01-26 00:00


Tracing Analysis: Our Implementation vs Official TRELLIS.2 (2026-01-25 19:00)

User Request: Trace carefully our vs official TRELLIS.2, ensure 1024 is easily manageable, investigate huge resource usage

Key Differences Found:

Parameter Official Ours Impact
texture_size 4096 2048 4x fewer texels, lower quality
PYTORCH_CUDA_ALLOC_CONF expandable_segments:True Not set Missing GPU memory optimization
remesh_project 0 0 Same (correct)
remesh_band 1 1.0 Same (correct)
decimation_target 1000000 1000000 Same (correct)
max_num_tokens 49152 50000 Ours slightly higher
doubleSided (remesh) False True Minor - ours always True
Sparse conv backend flex_gemm (Linux) flex_gemm (Windows) ✅ Same - compiled for Windows
Attention backend flash_attn (Linux) sdpa (Windows) ⚠️ Different - SDPA is PyTorch native

Resource Usage Concerns:

  1. SDPA vs flash_attn: SDPA uses more memory than flash_attn but is the only option on Windows
  2. Missing CUDA allocator optimization: Official sets PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
  3. flex_gemm is working: Our sparse conv is using flex_gemm (CUDA-accelerated), not torch_native fallback

Thin Geometry Issue (Leaf holes):

  • Trace Stage: 5 (Mesh Extraction - Dual Contouring)
  • Likely Cause: At 512 resolution, thin planar structures (like leaves) may not have enough voxel density
  • Official behavior: Uses cumesh.remeshing.remesh_narrow_band_dc with project_back=0 (no snapping)
  • Our behavior: Same parameters, but thin structures may need higher resolution (1024) for better topology

Action Items:

  1. ✅ Already using flex_gemm (no PyTorch fallback)
  2. ⚠️ PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True - Not supported on Windows, warning only
  3. ✅ 1024 resolution now works with staged simplification fix
  4. ✅ texture_size=4096 tested successfully

1024 Resolution Test Results (2026-01-25 19:30)

FIXED: 1024 resolution now works with pre-remesh simplification and staged post-remesh simplification

Metric Value Notes
Generation time 68.8s Pipeline inference
Export time 79.3s Mesh processing + texture baking
Total time 148.1s ~2.5 minutes
Peak GPU 6.36 GB Easily manageable on RTX 4090
Output file 80.21 MB 4096x4096 textures
Final mesh 494K verts, 988K faces After decimation to 1M target
UV coverage 54.0% Good coverage
RGB means R=93.0, G=94.7, B=69.0 Correct earth tones

Fixes Applied:

  1. Pre-remesh simplification: If input mesh > 4M faces, simplify to 4M before Dual Contouring (avoids 18M+ post-remesh)
  2. Staged post-remesh simplification: 18M → 9M → 4M → 2M → 1M (prevents int32 overflow in cumesh.simplify)
  3. expandable_segments: Added to env but not supported on Windows (warning only)

Resource Usage (1024 vs 512):

Resolution Peak GPU Export Time Output Size
512 2.75 GB 26.1s 39.61 MB
1024 6.36 GB 79.3s 80.21 MB

Work: 2026-01-25 19:00 | Modified: 2026-01-25 19:30


Visual Validation Results (2026-01-25 17:30)

IMPROVED: Texture contour artifacts fixed by using flex_gemm.grid_sample_3d

  • Test Input: reference/test_sphere.png (nature sphere - moss, metal gears, fabric, stone, leaves)
  • Output: outputs/test_sphere_512_flexgemm.glb (39.61 MB)
Criterion Score Trace Stage Notes
Shape Fidelity (SF) 8/10 1-4 Good overall shape, recognizable as sphere with wrapped elements
Structural Integrity (SI) 7/10 5 Minor holes visible, mostly watertight
Texture Clarity (TC) 8/10 7 FIXED - Contour artifacts eliminated by flex_gemm
Color Accuracy (CA) 8/10 3-4 RGB means: R=84.2, G=77.5, B=55.5 (correct earth tones)
PBR Material (PQ) 7/10 4,7-8 Metallic=0, Roughness=253-255 (slightly high)
UV Mapping (UV) 7/10 6 53.2% UV coverage, some seams visible
Mesh Topology (MT) 8/10 5 995K faces after decimation, good distribution
TOTAL 53/70 CONDITIONAL PASS - Major fix applied

Root Cause Found (Texture Contour Artifacts):

  • Symptom: Black moire/contour lines throughout fabric textures
  • Stage: 7 (Texture Baking - trilinear sampling)
  • Cause: PyTorch grid_sample_3d_torch.py fallback has subtle interpolation differences from native flex_gemm CUDA kernel
  • Fix: Changed base.py:668 from from ...utils.grid_sample_3d_torch import grid_sample_3d to from flex_gemm.ops.grid_sample import grid_sample_3d

Key Metrics (flex_gemm version):

  • Generation time: 13.9s
  • Export time: 26.1s
  • Peak GPU: 2.75 GB
  • File size: 39.61 MB

Previous Status (CRITICAL FAILURE - 2026-01-24)

Visual Inspection Result (2026-01-24 07:30)

FAILURE: Generated model is completely wrong compared to official benchmark.

Visual comparison performed via Playwright browser-based 3D viewer:

  • Input: reference/test_input.jpg (architectural model with pink brick tower + blue/green steel framework)
  • Our output: outputs/6e1837d4-a484-4e01-bcfc-ae7aac1104b4_model.glb
  • Benchmark: reference/sample_2026-01-24T055452.643.glb (official TRELLIS.2 output from identical image)
Metric Score Notes
Overall Shape 0.5/10 Completely wrong. Model appears fragmented, in weird disconnected pieces. Cannot even identify it as the same building without prior knowledge.
Texture Quality 0.1/10 Absolute failure. Visible repetitions, but impossible to properly evaluate because the underlying geometry is so fundamentally broken.
Spatial Generation BROKEN Model appears in disconnected fragments - indicates fundamental issues with the spatial/voxel generation pipeline itself, not just export.

Root Cause Analysis Required:

  • The mesh appears fragmented into disconnected pieces
  • This suggests issues in the sparse structure or shape latent sampling stages
  • Or potentially in the O-Voxel to mesh conversion (marching cubes)
  • The problem is NOT parameter tuning - this is a fundamental pipeline bug

Previous "WORKING" status was INCORRECT - assessment was based on:

  • File existence checks
  • PBR material presence verification
  • Vertex/face counts
  • But NOT actual visual inspection of generated geometry

This is a critical lesson: Technical metrics (file size, vertex count, material presence) do NOT indicate correct 3D generation. Visual inspection is mandatory.

Key Windows Compatibility Fixes

  1. o_voxel: Native CUDA extension compiled with /permissive- flag for MSVC 2022
  2. spconv: Native algorithm + dynamic dtype matching for fp16
  3. Attention: SDPA backend (flash_attn unavailable on Windows)
  4. GLB Export: Custom implementation using pyvista, xatlas, PyTorch F.grid_sample, OpenCV inpainting

Verified Working

TRELLIS 1 Pipelines

  • Text-to-3D: WORKING - Full pipeline with mesh/gaussian export
    • Work: 2026-01-20 | Modified: 2026-01-23
  • Image-to-3D: WORKING - Full pipeline with mesh/gaussian export
    • Work: 2026-01-20 | Modified: 2026-01-23

TRELLIS.2 Pipeline

  • Model Loading: WORKING - All 6 models load correctly (ss_model, slat_model_stage1/2/3, shape_slat_decoder, tex_slat_decoder)
    • Work: 2026-01-22 | Modified: 2026-01-23
  • 3-Stage Flow Sampling: WORKING - Sparse structure + shape latent + texture latent
    • Work: 2026-01-22 | Modified: 2026-01-23
  • Shape Decoder: WORKING - FlexiDualGridVaeDecoder with subdivision guides
    • Work: 2026-01-22 | Modified: 2026-01-23
  • Texture Decoder: WORKING - Uses guide_subs from shape decoder for PBR output
    • Work: 2026-01-23 | Modified: 2026-01-23
  • Mesh Extraction: WORKING - Marching cubes on SDF channel (80k+ vertices)
    • Work: 2026-01-23 | Modified: 2026-01-23
  • Low VRAM Mode: WORKING - Unloads decoders after use (~10GB peak)
    • Work: 2026-01-23 | Modified: 2026-01-23

Known Issues

Resolved

  • Texture Decoder guide_subs: FIXED - Implemented proper subdivision guide chaining
    • Shape decoder called with return_subs=True returns (decoded, subs) tuple
    • Texture decoder called with guide_subs=subs for proper upsampling
    • Low VRAM mode unloads models after use to fit in 24GB
    • Work: 2026-01-23 | Modified: 2026-01-23

spconv Compatibility

  • Windows/RTX 4090 dtype mismatch: FIXED - spconv requires float32, added conversion in flexi_decoder.py
    • Work: 2026-01-22 | Modified: 2026-01-23

Active Issues

CRITICAL: TRELLIS.2 Spatial Generation Failure (2026-01-24 07:30)

  • Status: ROOT CAUSE IDENTIFIED - spconv int32 overflow on Windows
  • Symptom: Generated mesh appears as disconnected fragments, completely different shape from input image
  • Visual Scores: Shape 0.5/10, Texture 0.1/10

Root Cause Analysis (2026-01-24 08:50):

  1. spconv int32 overflow - CONFIRMED

    • Official TRELLIS.2 uses flex_gemm backend (Linux only)
    • Windows uses spconv backend with int32 indices
    • The 1024_cascade pipeline creates ~17.7 million sparse voxels
    • spconv crashes with: your data exceed int32 range. this will be fixed in cumm + nvrtc (spconv 2.2/2.3)
    • Error occurs in FlexiDualGridVaeDecoder.forward() during shape decoding
  2. 512 pipeline works - VERIFIED

    • 512 resolution produces ~3 million voxels (within int32 limit)
    • Successfully generates mesh with 3M vertices, 6M faces
    • No spconv overflow errors
  3. Config difference:

    • Official: flex_gemm (supports 64-bit indices, Linux only)
    • Ours: spconv on Windows (trellis2/modules/sparse/config.py:6)
    • The previous fragmented output was from 1024_cascade hitting the int32 limit mid-generation

Solution Attempts:

  1. COP-OUT ATTEMPT (REJECTED by user 2026-01-24 10:15):

    • Proposed: Limit to 512 resolution on Windows to avoid overflow
    • User response: "thats a cop out. the correct solution is to actually rewriting spconv/or something better and more suited to do exactly what is needed"
    • Code limiting to 512 was written and REVERTED
  2. COP-OUT ATTEMPT #2 (REJECTED by user 2026-01-24 12:30):

    • Proposed: Use Open3D for mesh decimation instead of pyvista
    • User response: "thats a cop out...Theres a reason why the official trellis.2 uses the dependencies they do"
    • Issue: Open3D takes 163s for 2M faces, causes 90% memory usage, device unresponsive
    • Official behavior: pyvista/VTK is much faster and memory-efficient
    • Lesson: Substituting dependencies causes drift from official behavior
  3. PROPER SOLUTION - SPARSE CONV (COMPLETE):

    • Approach: Pure PyTorch sparse convolution with int64 indices
    • File: trellis2/modules/sparse/conv/conv_torch_native.py
    • Method:
      • Build per-kernel neighbor maps using sorted coordinate hash + binary search
      • Use scatter_add for aggregating kernel contributions
      • Process one kernel position at a time (memory efficient)
      • Cache neighbor maps per kernel size/dilation
    • Status: WORKING - Successfully processes 17.7M voxels in 1024_cascade pipeline
    • Verified: Full pipeline runs: sparse structure → shape SLat → texture SLat
    • Work: 2026-01-24 10:30 | Modified: 2026-01-24 12:30
  4. PROPER SOLUTION - MESH DECIMATION (COMPLETE):

    • Problem: pyvista/VTK crashes during decimation of 35M faces on Windows
    • Solution: CuMesh CUDA-accelerated mesh simplification compiled for Windows
    • Build fixes applied:
      • cumesh/setup.py: Added MSVC flags (/permissive-, /Zc:__cplusplus, /bigobj, /std:c++17)
      • cumesh/src/atlas.cu: Fixed CUDA 12.9+ preprocessor issue (defined CubSumOp type alias outside macro)
      • Cloned submodules: cubvh (trellis.2 branch), eigen
    • Status: WORKING - 19M→1M faces in 3 minutes
    • Work: 2026-01-24 16:00 | Modified: 2026-01-24 18:00
  5. BLOCKING: UV Rasterization Bottleneck (2026-01-24 18:00):

    • Problem: Pipeline hangs after CuMesh simplification completes
    • Root cause: Python loop iterating 1M faces for barycentric interpolation (lines 429-530 in base.py)
    • Location: trellis2/representations/mesh/base.py - export() method
    • Official solution: nvdiffrast CUDA rasterizer (available, verified working)
    • Status: NOT IMPLEMENTED - need to replace Python loops with nvdiffrast
  6. BLOCKING: cumesh Module Import Failure (2026-01-24 18:00):

    • Problem: import cumesh returns empty module (no CuMesh class)
    • Root cause: trellis-forge/cumesh/ source folder shadows installed package
    • Verification: print(dir(cumesh)) returns only ['__doc__', '__file__', ...]
    • Solution: Move cumesh/ source folder outside trellis-forge working directory
    • Work: 2026-01-24 18:00 | Modified: 2026-01-24 18:00

Pipeline Gaps vs Official TRELLIS.2 (Updated 2026-01-26):

Component Official Ours Status
Sparse conv flex_gemm flex_gemm (Windows build) ✅ MATCHING
Attention flash_attn flash_attn (Windows wheel) ✅ MATCHING
Mesh simplify cumesh cumesh (Windows build) ✅ MATCHING
Export pipeline o_voxel.postprocess.to_glb o_voxel.postprocess.to_glb ✅ MATCHING
3D sampling flex_gemm.grid_sample_3d flex_gemm.grid_sample_3d ✅ MATCHING
Config values official defaults official defaults ✅ MATCHING
Fallback code none none (deleted) ✅ MATCHING
  • Benchmark: reference/sample_2026-01-24T055452.643.glb (correct output from official platform using 1024_cascade)
  • Work: 2026-01-24 07:30 | Modified: 2026-01-24 18:00

Session Fixes (2026-01-23 22:15)

  • TRELLIS 1 Image-to-3D DinoV2 Loading: FIXED - Two issues resolved
    1. torch.hub.load path resolution: Changed from relative 'facebookresearch_dinov2_main' to absolute path local_dinov2_cache
    2. init double-init: Added if image_cond_model is not None guard to prevent calling _init_image_cond_model(None)
    • Root cause: When run from gui/backend/ working directory, torch.hub looked for repo relative to CWD
    • Root cause: base.from_pretrained calls cls() with only models, then from_pretrained calls _init_image_cond_model again
    • Location: trellis/pipelines/trellis_image_to_3d.py
    • Work: 2026-01-23 22:00 | Modified: 2026-01-23 22:15
  • TRELLIS.2 Pipeline Loading: FIXED - Multiple issues resolved
    1. Model path resolution: Added pipeline_dir tracking and proper relative vs HuggingFace path detection
    2. rope_phases state dict: Added computed buffer handling (rope_phases, pos_emb) with strict=False loading
    3. SparseStructureFlowModel init: Removed device reference in coords creation (uses CPU, moves with .to())
    • Location: trellis2/pipelines/base.py, trellis2/models/__init__.py, trellis2/models/sparse_structure_flow.py
    • Work: 2026-01-23 21:30 | Modified: 2026-01-23 21:50
  • TRELLIS.2 Texture Baking: FIXED - Coordinate axis ordering corrected in GLB export
    • Root cause: PyTorch grid_sample expects (x, y, z) mapping to (W, H, D) dimensions
    • Dense grid was indexed as (z, y, x) but sampled as (x, y, z)
    • Fix: Dense grid now uses grid[:, :, x, y, z] indexing
    • Fix: grid_sample coords swapped to (z, y, x) to match PyTorch expectations
    • Fix: Proper per-dimension normalization with align_corners=True
    • Location: trellis2/representations/mesh/base.py - to_dense_voxel_grid() and export()
    • Work: 2026-01-23 | Modified: 2026-01-23 21:50

Resolved Issues

  • GLB Export: FIXED - Custom PBR texture baking implementation
    • Uses pyvista for mesh decimation (same as official postprocessing_utils.py)
    • Uses xatlas for UV unwrapping via trimesh.unwrap()
    • GPU-accelerated texture baking via PyTorch F.grid_sample on dense voxel grid
    • OpenCV inpainting for unmapped UV regions
    • Full PBR material output: base_color (RGBA), ORM (Occlusion/Roughness/Metallic)
    • Work: 2026-01-23 13:30 | Modified: 2026-01-23 13:43
  • o_voxel CUDA Extension: FIXED - Compiled with MSVC 2022 compatibility
    • Added /permissive- and /Zc:__cplusplus flags for C++17 conformance
    • Added /bigobj for large object files
    • No flex_gemm needed - postprocess module optional
    • Work: 2026-01-23 12:00 | Modified: 2026-01-23 13:43
  • spconv dtype mismatch: FIXED - Layer dtype conversion for fp16 weights
    • spconv native algorithm for Windows compatibility
    • Dynamic dtype matching to input features
    • Work: 2026-01-23 12:15 | Modified: 2026-01-23 12:22
  • spconv weight format: FIXED - Both flex_gemm and spconv use (Co, Kd, Kh, Kw, Ci)
    • No permutation needed - direct weight copy
    • Work: 2026-01-23 12:10 | Modified: 2026-01-23 12:22

Recently Resolved

  • Stage 2 RoPE device mismatch: FIXED - Replaced with official RoPE implementation
    • New pattern: self.freqs.to(indices.device) inside _get_phases() method
    • Work: 2026-01-23 10:50 | Modified: 2026-01-23 15:00
  • sample_tex_slat wrong pattern: FIXED - Replaced with official pipeline implementation
    • Official pattern: denormalize shape_slat, create noise with remaining channels, pass via concat_cond
    • Work: 2026-01-23 10:30 | Modified: 2026-01-23 15:00
  • BiRefNet gated repo: FIXED - Switched from briaai/RMBG-2.0 to ZhengPeng7/BiRefNet
    • ZhengPeng7/BiRefNet is freely available via transformers AutoModelForImageSegmentation
    • Work: 2026-01-23 10:00 | Modified: 2026-01-23 15:00

Changelog

2026-02-01 (Production-Ready + OOM Fix)

  • GUI Application Production-Ready: Full generation workflow verified stable
    • Electron frontend + FastAPI backend working seamlessly
    • All CUDA extensions loading correctly
    • Output quality matches official HuggingFace demo
    • Work: 2026-02-01 07:00 | Modified: 2026-02-01 08:00
  • OOM Fix: Forced low_vram=True for TRELLIS.2 pipeline
    • Root cause: configure_vram_mode() set low_vram=False for 24GB+ GPUs
    • This kept all flow models on GPU, causing OOM during diffusion
    • Fix: Force low_vram=True regardless of GPU size in main.py:343-345
    • Trade-off: ~10-15% slower generation, but 100% reliable memory
    • Work: 2026-02-01 07:30 | Modified: 2026-02-01 08:00
  • Image Mode Fix: Removed .convert('RGB') to preserve alpha channel
    • Root cause: GUI was stripping alpha, causing BiRefNet to run unnecessarily
    • RGBA images should use existing alpha mask, not regenerate via BiRefNet
    • Fix: image = Image.open(image_path) without conversion in main.py:558
    • This ensures GUI output matches test script output exactly
    • Work: 2026-02-01 07:00 | Modified: 2026-02-01 08:00
  • Pipeline State Documented: Critical call relationships frozen
    • All critical files and their relationships documented
    • Memory management rationale explained
    • Freezing options provided (git tag, branch, GitHub release)
    • Work: 2026-02-01 08:00 | Modified: 2026-02-01 08:00

2026-01-29 (Performance Optimization + Frontend Fix)

  • Frontend Parameter Fix: Stage 2 Shape Guidance default was 3.0, should be 7.5
    • File: gui/electron/index.html line 165-166
    • Caused poor shape fidelity (model not following input image)
    • Backend had correct default (7.5), frontend was overriding with wrong value
    • Work: 2026-01-29 02:00 | Modified: 2026-01-29 02:00
  • 14x Speedup: Total generation time reduced from 78 minutes to 5.45 minutes
    • Root cause: manual_cast() allocating + copying tensors 2,880 times per sampling stage
    • Fix: torch.autocast('cuda', dtype=torch.bfloat16) makes torch.is_autocast_enabled() return True
    • Result: manual_cast() returns tensors unchanged (no allocation, no copy)
    • Work: 2026-01-29 00:00 | Modified: 2026-01-29 01:00
  • cuDNN Benchmark: Added torch.backends.cudnn.benchmark = True
    • Files: gui/backend/main.py, run_generation_test.py, diagnose_performance.py
    • Enables kernel auto-tuning for conv operations
    • Work: 2026-01-29 00:00 | Modified: 2026-01-29 00:15
  • High-VRAM Mode: Added configure_vram_mode() to pipeline
    • File: trellis2/pipelines/trellis2_image_to_3d.py
    • Disables low_vram for GPUs with ≥20GB (keeps flow models on GPU)
    • Threshold lowered from 24GB to 20GB (RTX 4090 reports 23.98GB)
    • Work: 2026-01-29 00:15 | Modified: 2026-01-29 00:30
  • Autocast Wrapper: Added to run() method in pipeline
    • Wraps all sampling operations in torch.autocast('cuda', dtype=torch.bfloat16)
    • Upsample decoder uses torch.autocast(enabled=False) due to flex_gemm Triton dtype requirement
    • Nested autocast properly resumes after disabled context (verified with test_nested_resume.py)
    • Work: 2026-01-29 00:30 | Modified: 2026-01-29 01:00

2026-01-28 (Full Generation Test)

  • Parameter Fixes: Aligned ImageTo3DRequest with official TRELLIS.2 app.py
    • slat_cfg: 3.0 → 7.5 (shape guidance strength)
    • texture_size: 1024 → 2048 (4x more texels)
    • decimation_target: 1,000,000 → 500,000 (match official)
    • Added explicit rescale_t to all sampler params (5.0/3.0/3.0)
    • Files: gui/backend/main.py (4 edits)
    • Work: 2026-01-28 00:00 | Modified: 2026-01-28 00:30
  • Generation Test Script: Created run_generation_test.py
    • Standalone script bypassing FastAPI
    • Per-stage GPU/RAM instrumentation via ResourceMonitor class
    • Uses psutil for system memory tracking
    • Output: GLB + preprocessed image + JSON resource report
    • Work: 2026-01-28 00:30 | Modified: 2026-01-28 00:45
  • Full Generation Run: Successfully generated 3D model from test_input.jpg
    • Total time: 4,676s (~78 minutes)
    • Peak GPU: 26,538 MB (via expandable_segments unified memory)
    • Output: 21.8 MB GLB with 472,784 faces
    • Visual parity confirmed by user
    • Work: 2026-01-28 00:45 | Modified: 2026-01-28 02:00
  • Analysis Scripts: Created structural and texture comparison tools
    • analyze_glb.py: Compares mesh metrics (vertices, faces, bounds, materials)
    • analyze_texture.py: Compares texture histograms, black pixel ratio, PBR values
    • All histogram distances < 0.1 (PASS)
    • Work: 2026-01-28 02:00 | Modified: 2026-01-28 02:15

2026-01-24 (Full Integration Testing)

  • Genesis Rename: Renamed frontend application from "TRELLIS Forge" to "Genesis"
    • gui/electron/package.json: name="genesis", productName="Genesis"
    • gui/electron/main.js: Window title "Genesis - 3D Generation"
    • gui/electron/index.html: Page title and header updated
    • Work: 2026-01-24 00:00 | Modified: 2026-01-24 00:00
  • TRELLIS 1 Text-to-3D: API functional (visual verification pending)
    • Job ID: 3c4eaac0-c56f-44b4-b84d-137cf0177be5
    • Output: 1.25 MB GLB with 1024x1024 baseColorTexture
    • WARNING: Only technical metrics verified, NOT visual quality
    • Work: 2026-01-23 23:00 | Modified: 2026-01-24 07:30
  • TRELLIS.2 Image-to-3D (Native): FAILED VISUAL INSPECTION
    • Job ID: 6e1837d4-a484-4e01-bcfc-ae7aac1104b4
    • Output: 10.39 MB GLB (285,209 vertices, 243,724 faces)
    • PBR Materials: Present but irrelevant due to broken geometry
    • VISUAL INSPECTION RESULT: Complete failure
      • Shape score: 0.5/10 - Fragmented, unrecognizable
      • Texture score: 0.1/10 - Visible repetitions, unusable
      • Model appears in disconnected pieces, fundamentally broken spatial generation
    • Work: 2026-01-23 23:15 | Modified: 2026-01-24 07:30
  • Benchmark Comparison: FAILED
    • Our output: Fragmented, wrong shape, unrecognizable
    • Official benchmark (reference/sample_2026-01-24T055452.643.glb): Correct architectural model
    • Previous claim of "matching quality" was INCORRECT - based on metrics, not visual inspection
    • Work: 2026-01-24 00:00 | Modified: 2026-01-24 07:30

2026-01-24 (Frontend Pipeline Separation)

  • Separate Parameter Panels: Created distinct UI controls for each pipeline
    • TRELLIS.1 Text-to-3D: seed, sparse_steps, sparse_cfg, slat_steps, slat_cfg, simplify, texture_size
    • TRELLIS.2 Image-to-3D: seed, resolution, 3-stage guidance (sparse/shape/texture rescale), simplify, texture_size
    • Mode switch automatically shows/hides appropriate settings panel
    • Location: gui/electron/index.html, gui/electron/app.js, gui/electron/styles.css
    • Work: 2026-01-24 09:00 | Modified: 2026-01-24 09:00
  • Backend API Update: Added TRELLIS.2 parameters to /api/generate/image endpoint
    • New parameters: pipeline_version, resolution, guidance_rescale_sparse/shape/material
    • Location: gui/backend/main.py
    • Work: 2026-01-24 09:00 | Modified: 2026-01-24 09:00

2026-01-24 (State Dict Key Remapping)

  • Bidirectional State Dict Key Remapping: Made _remap_state_dict_keys() handle both directions
    • Direction 1: flex_gemm → spconv: conv.weightconv.conv.weight (for TRELLIS.2-4B)
    • Direction 2: spconv → nn.Conv3d: conv.conv.weightconv.weight (for TRELLIS 1 decoders)
    • Detection: Compares model's expected keys vs state_dict keys to determine remapping direction
    • JeffreyXiang/TRELLIS-image-large weights were saved with nested format, TRELLIS 1 expects flat
    • Applied to both trellis/models/__init__.py and trellis2/models/__init__.py
    • Work: 2026-01-24 08:00 | Modified: 2026-01-24 15:00
  • HuggingFace Offline Mode Fixes: Updated all model loading to use local_files_only=True
    • trellis/pipelines/base.py - Pipeline config loading
    • trellis/models/__init__.py - Model weights loading
    • trellis/pipelines/trellis_text_to_3d.py - CLIP model loading
    • Work: 2026-01-24 07:00 | Modified: 2026-01-24 08:00
  • CLIP Cache Path Fix: Fixed TRANSFORMERS_CACHE pointing to wrong directory
    • Was: models/transformers (empty)
    • Now: models/hub (where CLIP model is cached)
    • Location: gui/backend/main.py line 23
    • Work: 2026-01-24 14:30 | Modified: 2026-01-24 15:00
  • Lazy Pipeline Imports: Deferred transformers import to allow env vars to be set first
    • trellis/pipelines/__init__.py - Uses __getattr__ for lazy class imports
    • trellis/pipelines/trellis_text_to_3d.py - Deferred from transformers import CLIPTextModel, AutoTokenizer to inside _init_text_cond_model()
    • Root cause: Module-level transformers import cached HF paths before env vars were set
    • Work: 2026-01-24 14:45 | Modified: 2026-01-24 15:00
  • Text-to-3D init Fix: Prevented _init_text_cond_model(None) call during pipeline loading
    • Pipeline.from_pretrained() calls cls(_models) which triggers __init__ with text_cond_model=None
    • Added if text_cond_model is not None: check before calling _init_text_cond_model
    • from_pretrained() handles text_cond_model initialization separately
    • Work: 2026-01-24 15:00 | Modified: 2026-01-24 15:00
  • Pipeline Import Fix: Restored separate TrellisImageTo3DPipeline for TRELLIS 1
    • Created trellis/pipelines/trellis_image_to_3d.py with TRELLIS 1 implementation
    • Fixed incorrect alias in trellis/pipelines/__init__.py that mapped V1 to V2
    • TRELLIS 1 uses slat_sampler, TRELLIS 2 uses shape_slat_sampler - these are incompatible
    • Work: 2026-01-24 06:30 | Modified: 2026-01-24 08:00

2026-01-23 (Backend-Frontend Wiring)

  • Image-to-3D Backend Integration: TRELLIS.2 wired to FastAPI backend
    • Updated run_image_to_3d_job() to detect TRELLIS.2 output format (List[MeshWithVoxel])
    • TRELLIS.2 export uses MeshWithVoxel.export() directly (not postprocessing_utils)
    • No gaussian output from TRELLIS.2 (different architecture than TRELLIS.1)
    • Preview generation skipped for TRELLIS.2 (requires mesh rendering, not gaussian)
    • Location: gui/backend/main.py

2026-01-23 (GLB Export Implementation)

  • MeshWithVoxel.export(): Complete GLB export with PBR textures
    • to_dense_voxel_grid(): Converts sparse voxel attrs to dense 3D grid for trilinear sampling
    • export(path, simplify, texture_size, verbose): Full export pipeline
    • Uses pyvista for decimation (5% default, ~250k faces output)
    • Uses xatlas for UV unwrapping via trimesh.unwrap()
    • GPU texture baking: PyTorch F.grid_sample on [1, C, D, H, W] dense voxel grid
    • UV rasterization: Barycentric interpolation to map UV coords to 3D positions
    • OpenCV inpainting: TELEA algorithm for unmapped regions
    • Proper glTF PBR material: base_color RGBA + ORM (Occlusion, Roughness, Metallic)
    • Coordinate conversion: Z-up to Y-up for GLB compatibility
    • Location: trellis2/representations/mesh/base.py
  • o_voxel CUDA Extension: Compiled successfully on Windows
    • Added /permissive- flag for strict C++ conformance (fixes std namespace ambiguity)
    • Added /Zc:__cplusplus for correct C++17 detection
    • Flex_gemm optional - postprocess module loads only if available
    • Location: o_voxel/setup.py

2026-01-23 (TRELLIS.2 Alignment)

  • Full Pipeline Replacement: Replaced trellis2_image_to_3d.py with official implementation
    • Added model_names_to_load list for proper model loading
    • Added get_cond(image, resolution) with resolution parameter for DinoV3
    • Added sample_shape_slat_cascade() for LR→HR cascade with max_num_tokens limit
    • Changed preprocess_image padding from 1.2 to 1.0 (official)
    • Added proper pipeline_type handling: '512', '1024', '1024_cascade', '1536_cascade'
    • Added decode_tex_slat() returns ret * 0.5 + 0.5 (official normalization)
  • BiRefNet Replacement: Switched to ZhengPeng7/BiRefNet
    • Location: trellis/pipelines/rembg/BiRefNet.py
    • Uses transformers.AutoModelForImageSegmentation with trust_remote_code=True
    • 1024x1024 input resolution, proper normalization
  • DinoV3 Feature Extractor: Already aligned with official
    • Location: trellis/modules/image_feature_extractor.py
    • Uses transformers.DINOv3ViTModel with RoPE position embeddings
    • Resolution-aware via self.image_size parameter
  • RoPE Implementations: Replaced with official pattern
    • Dense: trellis/modules/attention/rope.py - RotaryPositionEmbedder
    • Sparse: trellis/modules/sparse/attention/rope.py - SparseRotaryPositionEmbedder
    • Key fix: self.freqs.to(indices.device) inside _get_phases() instead of register_buffer
  • Sparse Spatial Blocks: Replaced with official implementation
    • Location: trellis/modules/sparse/spatial.py
    • Added SparseSpatial2Channel, SparseChannel2Spatial re-exports
    • SparseDownsample now has mode parameter and subdivision caching
    • SparseUpsample takes optional subdivision parameter
  • FlexiDualGridVaeDecoder: Already aligned (from previous session)
    • Uses o_voxel.convert.flexible_dual_grid_to_mesh for mesh extraction
    • Proper subdivision guide handling with pred_subdiv parameter

2026-01-23 (Late)

  • TRELLIS.2 Texture Decoding Fix: Implemented proper subdivision guide chaining
    • decode_shape() now returns (results, subs) tuple with subdivision guides
    • decode_texture() accepts guide_subs parameter for texture decoder
    • Added low_vram mode to unload decoders after use (peak ~10GB)
    • Added _merge_mesh_with_pbr() for MeshWithVoxel output
    • Added pbr_attr_layout for proper PBR channel mapping
    • Updated run() to chain decoders: shape→subs→texture
    • Added resolution parameter to control output mesh detail
  • V2 as Default: Changed ImageTo3DRequest default to pipeline_version="v2"
    • Backend API now uses TRELLIS.2 by default for Image-to-3D
    • Added resolution parameter (default 256) to API
    • Updated guidance_rescale defaults to match official TRELLIS.2
  • MeshWithVoxel: Added simplified MeshWithVoxel class
    • Located at trellis/representations/mesh/mesh_with_voxel.py
    • Stores mesh geometry with sparse PBR voxel attributes
    • Includes to_glb() export (base color only, full PBR requires additional work)

2026-01-23

  • Cleaned up debug/test files from repository root
  • Renamed folder: trellis -> trellis-forge
  • Cloned official repos: trellis_1_official, trellis.2_official
  • Removed training-only code: dataset_toolkits/, trellis/trainers/, trellis/datasets/

2026-01-22

  • Implemented FlexiDualGridVaeDecoder for TRELLIS.2
  • Added spconv float32 conversion for Windows compatibility
  • Implemented mesh extraction via marching cubes on O-Voxel SDF

Reference Architecture

TRELLIS.2 O-Voxel Format

7-channel features per voxel:

  • Channels 0-2: RGB color
  • Channel 3: Metallic
  • Channel 4: Roughness
  • Channel 5: Opacity
  • Channel 6: SDF (signed distance field)

Model Dependency Graph

Text-to-3D (TRELLIS 1):
  microsoft/TRELLIS-text-xlarge
  └── loads decoder from: JeffreyXiang/TRELLIS-image-large

Image-to-3D (TRELLIS 1):
  microsoft/TRELLIS-image-large

Image-to-3D PBR (TRELLIS.2):
  microsoft/TRELLIS.2-4B
  ├── ss_model (sparse structure)
  ├── slat_model_stage1 (sparse latent stage 1)
  ├── slat_model_stage2 (shape latent stage 2)
  ├── slat_model_stage3 (texture latent stage 3)
  ├── shape_slat_decoder (pred_subdiv=True)
  └── tex_slat_decoder (pred_subdiv=False, needs guide_subs)