Skip to content

implement lc0 neural network inference#41

Merged
NripeshN merged 50 commits intomainfrom
copilot/implement-lc0-neural-network-inference-again
Feb 9, 2026
Merged

implement lc0 neural network inference#41
NripeshN merged 50 commits intomainfrom
copilot/implement-lc0-neural-network-inference-again

Conversation

@NripeshN
Copy link
Copy Markdown
Owner

@NripeshN NripeshN commented Feb 7, 2026

Note

Medium Risk
Build and dependency surface changes (protobuf generation, new abseil/zlib links, workflow tweaks) plus significant CMake refactoring could break cross-platform builds or packaging if any dependency resolution differs across runners.

Overview
Adds a new transformer-network inference pipeline (protobuf net.proto generation + src/nn/* sources) and wires it into MCTS/Hybrid via the NNWeights option, with Metal/MPSGraph implementations included for macOS.

Refactors build and runtime GPU integration: removes CUDA backend/build branches and the standalone GPU benchmark, renames/reshapes GPU NNUE integration (gpu_integration.*), and makes AB evaluation always use CPU NNUE (GPU reserved for batched MCTS evaluation). Also updates NNUE weight handling to use a top-level networks/ directory and adjusts download/copy paths.

Updates CI workflows to install protobuf/zlib/abseil across OSes (Windows via vcpkg) and modifies the Elo tournament workflow to be manual-only with optional PR comment posting by input pr_number.

Written by Cursor Bugbot for commit e61c002. This will update automatically on new commits. Configure here.

Copilot AI and others added 21 commits January 24, 2026 15:32
- [x] Created protobuf format for network weights (adapted from compatible format)
- [x] Implemented weight loader with gzip support (.pb/.pb.gz files)
- [x] Implemented 112-plane position encoder
- [x] Created policy mapping infrastructure (1858 move space)
- [x] Implemented network interface with stub backend
- [x] Created MCTS evaluator integration
- [x] Updated build system (protobuf, zlib dependencies)
- [x] Created comprehensive test suite
- [x] All copyright headers updated to MetalFish
- [x] All lc0/lczero references removed
- [ ] TODO: Metal backend for transformer inference (requires ~2000 LOC)
- [ ] TODO: Full policy mapping tables
- [ ] TODO: Canonicalization transforms
- [ ] TODO: Integration with ThreadSafeMCTS
- [ ] TODO: Verification tests with actual network

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
- Fix typo: 'Directcopy' -> 'Direct copy'
- Add error checking for dup() before gzdopen()
- Fix castling rights encoding (use WHITE_OO/BLACK_OO instead of bitwise AND)
- Fix policy mapping to stay within bounds (simplified but correct)
- Specify minimum protobuf version (3.0) in CMakeLists
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
…ization

- Implemented full 8-position history encoding (was single position only)
- Added canonicalization transforms (flip/mirror/transpose)
- Implemented ChooseTransform() for optimal board orientation
- Added bit manipulation helpers (ReverseBitsInBytes, ReverseBytesInBytes, TransposeBitsInBytes)
- Proper auxiliary plane encoding (castling, en passant, rule50, etc.)
- Support for all input formats (classical, canonical v1/v2, hectoplies, armageddon)
- Fixed castling rights encoding with proper perspective
- Added early stopping for canonical formats
- Transform application to all piece and auxiliary planes

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
…ence

- Create src/nn/metal/metal_network.h and metal_network.mm (~730 LOC)
- Implement complete transformer architecture with MPSGraph
- Support embedding, multi-head attention, FFN, layer norm
- Handle WDL output, batch processing, multiple activations
- Integrate with Network interface and CMake build system
- Link MetalPerformanceShadersGraph framework

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
- Add comprehensive README.md for Metal backend
- Fix manual memory management (disable ARC, explicit release)
- Add null checks before releasing Metal objects
- Document memory management approach in file header

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
- Implement NNMCTSEvaluator with policy and value extraction
- Integrate NN evaluator into ThreadSafeMCTS for policy priors and evaluation
- Update test_nn_comparison with comprehensive NN testing
- Add fallback to GPU NNUE when NN weights unavailable
- Use vector<pair<Move,float>> to avoid std::hash<Move> issues

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
- Replace hardcoded 0.7f, 0.3f, 10000.0f with named constants
- Improve code maintainability and make blending strategy clear
- Add comments explaining the configuration

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
…e documentation

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
Resolved conflicts:
- .gitignore: Kept both entries (_codeql files and network file)
- CMakeLists.txt: Merged MCTS sources, NN sources, and framework linking
  - Combined all MCTS source files from both branches
  - Added NN_SOURCES section from HEAD
  - Merged Metal/CUDA framework linking for all targets
  - Merged GPU benchmark targets with all frameworks
- src/mcts/thread_safe_mcts.cpp: Merged NN policy evaluation with softmax
  - Kept NN policy evaluation code from HEAD
  - Kept temperature-based softmax normalization from main
…s to queen

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
…l PR number input

- Changed workflow trigger to only allow manual execution.
- Added input for specifying a PR number to run the tournament on.
- Updated concurrency group to use the provided PR number or run ID.
- Modified comment handling to check for the PR number input instead of the event type.
…S, Ubuntu, and Windows

- Added installation of protobuf and zlib dependencies for macOS and Ubuntu environments.
- Introduced a new step for Windows to install dependencies using vcpkg.
- Updated CMake configuration steps to accommodate the new dependency installations across different OS.
- Removed stockfish_adapter.cpp, hybrid_search.cpp, enhanced_hybrid_search.cpp, mcts_batch_evaluator.cpp, mcts_tt.cpp, and parallel_search.cpp from the MCTS_SOURCES list.
- Streamlined the build configuration for MCTS components.
…tegration

- Added custom commands in CMakeLists.txt to generate protobuf files from net.proto, ensuring compatibility with the installed protobuf version.
- Updated NN_SOURCES to include generated protobuf files and adjusted include directories accordingly.
- Modified CI workflows to install the abseil library alongside protobuf and zlib for macOS and Ubuntu environments.
- Ensured that the build configuration links against the absl library where necessary.
- Enhanced CMakeLists.txt to better manage absl library detection and linking, including support for pkg-config on Linux.
- Modified CI workflows to remove absl library installation, streamlining dependency management for Ubuntu environments.
…lative castling rights

Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 7, 2026 15:32
@NripeshN NripeshN changed the title Copilot/implement lc0 neural network inference again implement lc0 neural network inference again Feb 7, 2026
@NripeshN NripeshN changed the title implement lc0 neural network inference again implement lc0 neural network inference Feb 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reintroduces an Lc0-compatible (transformer/BT4-style) neural-network inference pipeline into MetalFish, wiring together weight loading, input encoding, policy mapping, a Metal/MPSGraph backend, and MCTS integration for NN-guided search.

Changes:

  • Added NN infrastructure: protobuf weight schema, loader/decoder, 112-plane encoder, and 1858-move policy mapping.
  • Implemented/connected a Metal (MPSGraph) NN backend behind a Network interface with a stub fallback.
  • Integrated NN evaluation and policy priors into ThreadSafeMCTS, plus added a standalone NN comparison test and updated build/CI dependencies.

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/test_nn_comparison.cpp Adds a benchmark-style executable to exercise encoder, loader, network eval, and MCTS best-move checks.
src/nn/proto/net.proto Introduces protobuf schema for NN weight files.
src/nn/loader.{h,cpp} Adds weight loading, gzip decompression, and layer decoding.
src/nn/encoder.{h,cpp} Adds 112-plane position encoding and canonicalization-related helpers.
src/nn/policy_map.{h,cpp} Adds 1858-policy move mapping tables and lookup helpers.
src/nn/network.{h,cpp} Defines abstract NN API + factory; uses Metal backend when enabled, else stub.
src/nn/network_legacy.{h,cpp} Adds parsed/decoded weight structures for (multi)head formats.
src/nn/metal/** Adds Metal/MPSGraph implementation scaffolding, builder, and tables.
src/mcts/nn_mcts_evaluator.{h,cpp} Adds NN→MCTS bridge: encode, evaluate, map policy to legal moves.
src/mcts/thread_safe_mcts.{h,cpp} Wires NN evaluator into MCTS expansion/eval and tracks NN eval stats.
CMakeLists.txt Adds protobuf codegen, zlib/protobuf/abseil linking, Metal NN sources, and NN test target.
.github/workflows/{ci.yml,elo-tournament.yml} Installs protobuf/zlib deps in CI; adjusts Elo workflow to manual trigger with PR number input.
.gitignore Adds CodeQL build artifacts to ignore list.
IMPLEMENTATION_SUMMARY.md Adds a large implementation summary document for the NN work.

- Fix encoder missing vertical board flip for black to move: Apply
  ReverseBytesInBytes to all bitboards when black is the side to move,
  ensuring the board is always oriented from the side-to-move's perspective

- Fix policy map incorrectly mapping knight promotions: Add missing knight
  promotion entries (a7a8n, b7a8n, etc.) to kMoveStrings table and change
  knight promotion mapping from 'q' to 'n'. Update kPolicyOutputs from 1858
  to 1880 to account for the additional entries

- Fix history perspective flip conditional: Remove the incorrect condition
  'if (history_idx > 0)' that prevented perspective alternation for the
  first history position. Perspective should alternate unconditionally

- Remove stale contradictory content from IMPLEMENTATION_SUMMARY.md: The
  file had duplicate content starting at line 240 that contradicted the
  completed implementation status shown earlier in the file
@cursor
Copy link
Copy Markdown

cursor bot commented Feb 7, 2026

Bugbot Autofix resolved 4 of the 5 bugs found in the latest run.

  • ✅ Fixed: Encoder missing vertical board flip for black to move
    • Applied ReverseBytesInBytes to all bitboards when black is to move in both the simple and full encoder overloads, ensuring the board is oriented from the side-to-move's perspective.
  • ✅ Fixed: Policy map incorrectly maps knight promotions to queen
    • Added missing knight promotion entries (e.g., a7a8n, b7a8n) to kMoveStrings table, changed the knight promotion mapping from 'q' to 'n', and updated kPolicyOutputs from 1858 to 1880.
  • ✅ Fixed: History perspective flip conditional on wrong check
    • Removed the incorrect 'if (history_idx > 0)' condition so that the flip toggle alternates unconditionally for each history step as the NN training expects.
  • ✅ Fixed: Contradictory implementation summary with stale appended content
    • Removed the duplicate stale content starting at line 240 that contradicted the completed implementation status described in the first half of the document.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
NripeshN and others added 2 commits February 7, 2026 17:02
Bug 1: Policy table size mismatch (high severity)
- Changed kPolicyOutputs from 1880 to 1858 (standard Lc0 encoding)
- Removed queen promotions from kMoveStrings (they are encoded as regular moves)
- Updated MoveToNNIndex to treat queen promotions as regular queen-direction moves
- Only underpromotions (r/b/n) use explicit promotion indices 1792-1857

Bug 2: Encoder wrong color perspective (medium severity)
- Changed perspective_us from pos.side_to_move() to 'us' (current position's STM)
- All history positions now correctly encoded from current STM's perspective
- Board flip now consistently uses current STM for all history entries

Bug 3: ApplyNNPolicy incorrect logit clamping (medium severity)
- Removed incorrect clamping of negative policy logits to 0.0f
- Neural network logits can be negative (indicating low-probability moves)
- Clamping destroyed the relative ordering among low-probability moves

Bug 4: Float16 denormalized conversion (low severity)
- Fixed denormalized fp16 to fp32 conversion with proper renormalization
- Now correctly finds leading 1 bit and adjusts exponent accordingly
- Value = 2^(-14) × (mantissa/1024) is now properly represented in fp32
@cursor
Copy link
Copy Markdown

cursor bot commented Feb 7, 2026

Bugbot Autofix prepared fixes for 4 of the 4 bugs found in the latest run.

  • ✅ Fixed: Policy table size mismatch breaks all promotion moves
    • Changed kPolicyOutputs from 1880 to 1858, removed queen promotions from kMoveStrings table, and updated MoveToNNIndex to treat queen promotions as regular queen-direction moves.
  • ✅ Fixed: Encoder uses wrong color perspective for history positions
    • Changed perspective_us from pos.side_to_move() to the current position's 'us' variable, ensuring all history positions are encoded from the consistent perspective of the current STM.
  • ✅ Fixed: ApplyNNPolicy clamps negative logits distorting policy distribution
    • Removed the incorrect clamping of negative logits to 0.0f, allowing negative logits to properly flow through softmax to preserve the relative ordering of low-probability moves.
  • ✅ Fixed: Incorrect float16 denormalized number conversion
    • Implemented proper denormalized fp16 to fp32 conversion by finding the leading 1 bit in the mantissa and correctly computing the fp32 exponent as (103 + leading_bit) with proper fraction alignment.

Bug fixes:
1. MoveToNNIndex: Apply transform before attention policy map lookup
   - Previously, the attention policy map was queried with raw (untransformed)
     squares and returned early, bypassing the transform logic entirely
   - Now transforms are applied first, ensuring canonical move mapping works
     correctly for non-promotion moves

2. TransformSquare: Fix incorrect transpose implementation
   - Previously used OR-based conditions that treated kTransposeTransform as
     both flip and mirror (180° rotation)
   - Now correctly applies transforms in sequence: flip (file→7-file),
     mirror (rank→7-rank), transpose (swap file↔rank), matching the encoder's
     ApplyTransform behavior

3. CMakeLists.txt: Add missing Accelerate framework to test_nn_comparison
   - The test_nn_comparison target includes MCTS sources that use Accelerate
     framework functions on macOS, but was missing the framework link
@cursor
Copy link
Copy Markdown

cursor bot commented Feb 8, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: MoveToNNIndex bypasses transform for attention policy map
    • Fixed by applying the transform to squares before the attention policy map lookup, ensuring transformed squares are used for canonicalized positions.
  • ✅ Fixed: TransformSquare incorrect for transpose transform flag
    • Fixed by implementing transforms in the correct order (flip, mirror, transpose with swap) to match the encoder's ApplyTransform behavior.
  • ✅ Fixed: Missing Accelerate framework link for test_nn_comparison target
    • Added ${ACCELERATE_FRAMEWORK} to the target_link_libraries for test_nn_comparison on Apple platforms.

- Add policy softmax temperature (1.359) to match Lc0 default
- Simplify TransformSquare logic in policy_map.cpp
- Add missing <algorithm> include for std::exp
- Add comment clarifying value is from side-to-move perspective
Bug fixes:

1. Remove incorrect nn_policy > 0.0f check in expand_node
   - Neural network policy outputs are raw logits, not probabilities
   - Logits can be negative for less preferred moves
   - The check was incorrectly filtering out valid moves with negative logits
   - Now all moves are blended with NN policy regardless of logit sign

2. Cache NN evaluation result from expand_node to avoid redundant evaluation
   - expand_node now accepts an optional EvaluationResult* output parameter
   - When expand_node performs NN evaluation for policy priors, it caches the result
   - The cached result is reused for value evaluation in run_iteration
   - This eliminates redundant NN evaluations that were performing GPU sync
@cursor
Copy link
Copy Markdown

cursor bot commented Feb 8, 2026

Bugbot Autofix prepared fixes for 2 of the 2 bugs found in the latest run.

  • ✅ Fixed: NN policy logit check incorrectly filters negative values
    • Removed the incorrect nn_policy > 0.0f check since neural network policy outputs are raw logits that can be negative, and all moves should be blended regardless of logit sign.
  • ✅ Fixed: Redundant NN evaluation in expand_node fallback path
    • Modified expand_node to accept an optional EvaluationResult output parameter that caches the NN result, which is then reused in run_iteration for value evaluation, eliminating redundant NN calls.

- Updated CMakeLists.txt to reflect new directory structure for source files:
  - Changed main.cpp path to src/app/main.cpp
  - Updated benchmark source paths to src/bench/benchmark_gpu.cpp and src/bench/paper_benchmark.cpp
- Introduced new source files for GPU benchmarking and paper benchmark suite, enhancing performance measurement capabilities for the MetalFish engine.
- Fix Metal network to use actual batch size instead of max_batch_size_
  for GPU inference, eliminating ~256x performance waste in MCTS
- Update CI workflow to download NNUE files to networks/ directory
  to match CMakeLists.txt paths
- Remove incorrect mirror_next/pos.flip() logic in encoder that
  corrupted piece color extraction for historical positions
@cursor
Copy link
Copy Markdown

cursor bot commented Feb 9, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: Metal network always evaluates max batch size
    • Changed eval_batch from max_batch_size_ to batch so GPU inference uses the actual batch size instead of always running 256 positions.
  • ✅ Fixed: NNUE file paths mismatch between CI and CMakeLists
    • Updated all CI workflow NNUE download and copy paths from src/ to networks/ to match the CMakeLists.txt configuration.
  • ✅ Fixed: History encoding flip corrupts piece color extraction
    • Removed the incorrect mirror_next/pos.flip() logic since Stockfish uses absolute coordinates and the vertical flip for BLACK STM is already handled by ReverseBytesInBytes.

Rewrite README with accurate directory layout, current features,
Apple Silicon optimizations, and all three engine modes documented.
Remove outdated references and stale performance numbers.
- Updated CMakeLists.txt to enable ARC for Metal ObjC++ files, enhancing compatibility with MPSGraph.
- Revised README to clarify engine modes, UCI options, and Apple Silicon optimizations.
- Enhanced MCTS and hybrid search implementations, including better GPU resource management and error handling.
- Streamlined evaluator batch processing to reduce overhead and improve efficiency.
- Adjusted Metal network initialization to utilize dynamic batch sizes, eliminating unnecessary zero-padding.
- Cleaned up UCI engine options related to GPU usage and transformer network weights, ensuring clarity in configuration.
- Introduced GatherBatchEvaluator to enable queue-based cooperative batching of evaluation requests, improving efficiency in handling multiple worker submissions.
- Updated ThreadSafeMCTS to utilize GatherBatchEvaluator, allowing workers to submit positions and wait for batch evaluation results.
- Enhanced worker context management by assigning unique IDs for gather slot assignment.
- Improved cancellation handling for waiting workers during evaluation processes.
- Refactored evaluation logic to support batch processing, reducing overhead and improving performance.
@NripeshN NripeshN merged commit 45ae05a into main Feb 9, 2026
5 of 8 checks passed
@NripeshN NripeshN deleted the copilot/implement-lc0-neural-network-inference-again branch February 9, 2026 21:01
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before Autofix could start.

std::string msg4 =
"The default net can be downloaded from: "
"https://github.com/NripeshN/MetalFish/releases/download/nnue/" +
std::string(evalFile.defaultName);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NNUE download URL in error message likely invalid

Medium Severity

The NNUE download URL in the fatal error message was changed from the working Stockfish server (tests.stockfishchess.org/api/nn/) to github.com/NripeshN/MetalFish/releases/download/nnue/, but the CI workflows in this same commit still download NNUE files from the old Stockfish URL. This strongly suggests the GitHub releases URL doesn't exist yet. When users hit this exit(EXIT_FAILURE) path because NNUE files are missing, the error message will direct them to a non-working URL, leaving them unable to resolve the issue.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants