Refactor main functions to enable multi-node distribution#118

Merged
francois-drielsma merged 5 commits into main from
feature/multi-node
Mar 8, 2026

francois-drielsma (Member) commented Mar 8, 2026

Description

This PR enables multi-node distributed training by properly handling externally set RANK/WORLD_SIZE environment variables.
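
For readers unfamiliar with the pattern, here is a minimal sketch of what honoring externally set RANK/WORLD_SIZE can look like. It is not the code in this PR: `DistributedEnv` and `read_distributed_env` are hypothetical names, and the use of `LOCAL_RANK` assumes a launcher such as `torchrun`, which exports it alongside `RANK` and `WORLD_SIZE`.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistributedEnv(NamedTuple):
    """Process placement information (hypothetical helper type)."""
    rank: int        # global rank across all nodes
    world_size: int  # total number of processes
    local_rank: int  # rank within the current node, used to pick a GPU
    external: bool   # True if the values came from the environment


def read_distributed_env(
    env: Optional[Mapping[str, str]] = None,
) -> DistributedEnv:
    """Prefer RANK/WORLD_SIZE set by an external launcher.

    Falls back to a single-process layout when they are absent.
    """
    env = os.environ if env is None else env

    if "RANK" in env and "WORLD_SIZE" in env:
        rank = int(env["RANK"])
        world_size = int(env["WORLD_SIZE"])
        # Assumption: the launcher also exports LOCAL_RANK; default to 0.
        local_rank = int(env.get("LOCAL_RANK", "0"))
        if not 0 <= rank < world_size:
            raise ValueError(
                f"RANK={rank} is out of range for WORLD_SIZE={world_size}"
            )
        return DistributedEnv(rank, world_size, local_rank, external=True)

    # Nothing was set externally: behave as a single process.
    return DistributedEnv(0, 1, 0, external=False)
```

On a multi-node job, the global rank identifies the process across all nodes, while the local rank selects the GPU on the current node. Conflating the two is a common reason a single-node setup fails to scale out.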

Type of Change

This is a new, non-breaking feature:

  • New feature (non-breaking change which adds functionality)
  • Code quality improvement (refactoring, type hints, etc.)
  • Performance improvement

How Has This Been Tested?

I have trained UResNet-PPN in single-node/single-GPU, single-node/multi-GPU and multi-node/multi-GPU settings; all function as expected. See /sdf/data/neutrino/ndlar/spine/train/multinode.
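
As a rough illustration of how the three settings differ, the sketch below computes the world size and global ranks for each layout. The node and GPU counts are illustrative assumptions, not the configurations used in these tests. The helper names are hypothetical, and the contiguous rank layout (`node_rank * gpus_per_node + local_rank`) is a common convention rather than something this PR guarantees.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of processes, assuming one process per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Global rank under a contiguous, node-major layout (assumed)."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


# Illustrative layouts only: (number of nodes, GPUs per node).
layouts = {
    "single-node/single-GPU": (1, 1),
    "single-node/multi-GPU": (1, 4),
    "multi-node/multi-GPU": (2, 4),
}

for name, (nodes, gpus) in layouts.items():
    ranks = [
        global_rank(node, local, gpus)
        for node in range(nodes)
        for local in range(gpus)
    ]
    print(f"{name}: WORLD_SIZE={world_size(nodes, gpus)}, ranks={ranks}")
```

In the assumed two-node layout, the process on node 1 with local rank 2 has global rank 6. A launcher would export this value as `RANK` on that node.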

Test Configuration:

  • SPINE version: v0.10.5

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 22.00000% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 15.01%. Comparing base (db8e20a) to head (29c1b02).
⚠️ Report is 6 commits behind head on main.

Files with missing lines            Patch %   Lines
src/spine/main.py                   29.16%    17 Missing ⚠️
src/spine/model/manager.py          15.38%    11 Missing ⚠️
src/spine/utils/torch/devices.py    22.22%    7 Missing ⚠️
src/spine/bin/cli.py                0.00%     3 Missing ⚠️
src/spine/driver.py                 0.00%     1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #118      +/-   ##
==========================================
- Coverage   15.17%   15.01%   -0.17%     
==========================================
  Files         273      273              
  Lines       20572    20591      +19     
==========================================
- Hits         3122     3091      -31     
- Misses      17450    17500      +50     

Flag         Coverage Δ
unittests    15.01% <22.00%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines            Coverage Δ
src/spine/driver.py                 11.14% <0.00%> (ø)
src/spine/bin/cli.py                22.58% <0.00%> (-0.56%) ⬇️
src/spine/utils/torch/devices.py    22.72% <22.22%> (+2.72%) ⬆️
src/spine/model/manager.py          10.38% <15.38%> (-8.73%) ⬇️
src/spine/main.py                   18.05% <29.16%> (-0.70%) ⬇️

... and 3 files with indirect coverage changes

@francois-drielsma francois-drielsma merged commit 51f5ebf into main Mar 8, 2026
14 checks passed