Modernize CUDA build workflow: remove PowerShell, inline matrix, drop conda by Copilot · Pull Request #43 · chamalgomes/artifacts

Copilot · 2026-02-25T09:43:03Z

Original prompt

Objective

Modernize the .github/workflows/llama-build-cuda.yaml workflow file by incorporating cleaner approaches from build-wheels-cuda.yaml while retaining the preferred features like uv build and custom release naming convention.

Requirements

Replace the current workflow file with the following improved version:

name: Build Release (CUDA)

on: workflow_dispatch

permissions:
  contents: write

jobs:
  build_wheels:
    name: Build Wheel CUDA ${{ matrix.cuda_config.ver }} Py ${{ matrix.pyver }}
    runs-on: ubuntu-latest
    container:
      image: nvidia/cuda:${{ matrix.cuda_config.ver }}-devel-ubuntu24.04
    permissions:
      id-token: write
      contents: write
      attestations: write
    strategy:
      fail-fast: false
      matrix:
        cuda_config:
          - ver: 13.1.1
            short: cu131
            arch: 75;80;
        pyver: ["3.13", "3.14"]

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
        with:
          submodules: recursive

      - name: Setup uv and Python
        uses: astral-sh/setup-uv@5a7eac68fb9809dea845d802897dc5c723910fa3 # v7
        with:
          version: "latest"
          enable-cache: true
          python-version: ${{ matrix.pyver }}

      - name: Verify GLIBC version
        run: ldd --version

      - name: Verify installation
        run: |
          python --version
          uv --version
          cmake --version
          nvcc --version
          nvidia-smi || true

      - name: Build Wheel
        working-directory: vendor/llama-cpp-python
        env:
          VERBOSE: 1
          CMAKE_ARGS: >-
            -DGGML_CUDA=on
            -DLLAVA_BUILD=off
            -DCMAKE_CUDA_ARCHITECTURES=${{ matrix.cuda_config.arch }}
            -DGGML_CUDA_FORCE_MMQ=OFF
            -DGGML_AVX2=off
            -DGGML_FMA=off
            -DGGML_F16C=off
            -DLLAMA_BUILD_EXAMPLES=OFF
            -DLLAMA_BUILD_TESTS=OFF
            -DLLAMA_BUILD_SERVER=OFF
            -DCMAKE_EXE_LINKER_FLAGS="-L/usr/local/cuda/lib64/stubs -lcuda"
        run: |
          # Rename the release version with +cu{cuda_ver} build tag
          sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+${{ matrix.cuda_config.short }}"/' llama_cpp/__init__.py
          
          # Build wheel using uv
          uv -v build --no-create-gitignore --python cpython@${{ matrix.pyver }} --wheel

      - name: List built wheels
        run: ls -lh vendor/llama-cpp-python/dist/*.whl

      - name: Upload Release Assets
        uses: softprops/action-gh-release@5be0e66d93ac7ed76da52eca8bb058f665c3a5fe # v2
        with:
          files: vendor/llama-cpp-python/dist/*.whl
          tag_name: ${{ github.ref_name }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Attest Build Provenance
        uses: actions/attest-build-provenance@977bb373ede98d70efdf65b84cb5f73e068dcc2a # v3
        with:
          subject-path: 'vendor/llama-cpp-python/dist/*.whl'

Key Improvements

Removed PowerShell - All scripts now use clean bash syntax
Uses setup-uv action - Handles Python installation automatically, no manual symlinking needed
Inline matrix definition - No separate define_matrix job required
CUDA container - Direct nvidia/cuda devel image, eliminating mamba/conda dependency
Clean CMAKE_ARGS - Multi-line YAML syntax with >- for better readability
Removed unused releasetag - Simplified matrix to only include necessary variables

Preserved Features

✅ uv build command - uv -v build --no-create-gitignore --python cpython@${{ matrix.pyver }} --wheel
✅ Custom naming convention - +cu131 suffix via sed replacement
✅ Release upload - Using softprops/action-gh-release
✅ Provenance attestation - Supply chain security with actions/attest-build-provenance
✅ Same trigger - workflow_dispatch
✅ Same permissions - contents: write, id-token: write, attestations: write

Testing

After merging this PR, please test the workflow by:

Creating a new tag or using an existing one
Manually triggering the workflow via workflow_dispatch
Verifying that wheels are built with the correct naming convention (e.g., +cu131 suffix)
Confirming that the wheels are uploaded to the release
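
The naming check in the list above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: `has_cuda_tag` is a hypothetical helper, and the example filename is illustrative. It assumes the local version tag appears in the wheel filename as `+cu131` immediately before the Python tag.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the wheel filename carries "+<tag>-".
has_cuda_tag() {
  local wheel tag="$2"
  wheel="$(basename "$1")"
  case "$wheel" in
    *"+${tag}-"*.whl) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative filename; the real name depends on the package version.
if has_cuda_tag "dist/example_pkg-0.0.0+cu131-cp313-cp313-linux_x86_64.whl" "cu131"; then
  echo "tag present"
fi
```

In a workflow step, the same function could be looped over `vendor/llama-cpp-python/dist/*.whl` and made to fail the job when a wheel lacks the tag.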

Code Guidelines

Follows Python conventions
Optimized for code simplicity
Uses bash throughout (no PowerShell)
Structured for MLOps workflows

This pull request was created from Copilot chat.


Co-authored-by: chamalgomes <106123134+chamalgomes@users.noreply.github.com>

Copilot

Pull request overview

Modernizes the CUDA wheel build GitHub Actions workflow by consolidating the matrix into a single job, switching from PowerShell to bash, and building inside an NVIDIA CUDA container to remove the conda/mamba toolchain setup.

Changes:

Removes the separate define_matrix job and inlines the build matrix into build_wheels.
Replaces PowerShell scripting with bash (including version tag rewriting via sed).
Uses an nvidia/cuda devel container image and adds a CUDA stub linker flag for builds.
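
To make the sed-based version rewriting concrete, here is a small sketch that can be run outside the workflow. It applies the same substitution to a throwaway file. The function name `tag_version` and the placeholder version `0.0.0` are assumptions for illustration; the workflow itself edits `llama_cpp/__init__.py` in place with `sed -i`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: append "+<tag>" to the __version__ string in a file.
# Writes through a temporary file instead of relying on sed -i.
tag_version() {
  local file="$1" tag="$2"
  sed "s/__version__ = \"\([^\"]*\)\"/__version__ = \"\1+${tag}\"/" "$file" > "$file.tmp"
  mv "$file.tmp" "$file"
}

# Throwaway stand-in for llama_cpp/__init__.py with a placeholder version.
workdir="$(mktemp -d)"
init_py="$workdir/__init__.py"
printf '__version__ = "0.0.0"\n' > "$init_py"

tag_version "$init_py" "cu131"
cat "$init_py"
```

Note that the substitution is not idempotent: running it twice on the same file would append the tag twice, so it should run once per fresh checkout.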


Copilot · 2026-02-25T09:53:21Z

.github/workflows/llama-build-cuda.yaml

+        cuda_config:
+          - ver: 13.1.1
+            short: cu131
+            arch: 75;80;


matrix.cuda_config.arch is set to 75;80; with a trailing semicolon. In CMake, semicolon-separated lists with a trailing ; typically introduce an empty list element, which can make CMAKE_CUDA_ARCHITECTURES include an empty architecture and fail configuration. Drop the trailing semicolon (e.g., 75;80) or represent architectures as a YAML list and join them explicitly.

Suggested change

arch: 75;80;

arch: 75;80
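
As an alternative to hand-editing the matrix value, the list could be normalized in bash before it reaches CMake. The following is a sketch under that assumption; `normalize_arch_list` is a hypothetical helper and is not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove empty elements from a semicolon-separated list.
normalize_arch_list() {
  # Squeeze repeated ';' and then trim one leading or trailing ';'.
  printf '%s\n' "$1" | tr -s ';' | sed -e 's/^;//' -e 's/;$//'
}

# Example with the value from the original matrix entry.
arch="$(normalize_arch_list '75;80;')"
echo "-DCMAKE_CUDA_ARCHITECTURES=${arch}"
```

This keeps the build step tolerant of a stray separator, at the cost of one extra line in the `run:` script.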

Fixed in commit 3bff142: changed arch: 75;80; to arch: 75;80 to remove the trailing semicolon.

Copilot · 2026-02-25T09:53:21Z

.github/workflows/llama-build-cuda.yaml

+    name: Build Wheel CUDA ${{ matrix.cuda_config.ver }} Py ${{ matrix.pyver }}
+    runs-on: ubuntu-latest
+    container:
+      image: nvidia/cuda:${{ matrix.cuda_config.ver }}-devel-ubuntu24.04


Running the job in an nvidia/cuda:* container means all steps execute inside that image. This workflow relies on several JavaScript/composite actions (actions/checkout, setup-uv, action-gh-release, attest-build-provenance) which require tooling like node (and git for checkout) to be present in the container. The nvidia/cuda images don’t guarantee those dependencies, so the workflow can fail early. Consider switching to a custom CUDA build image that includes node + git (and any build deps like cmake), or avoid a job-level container and instead run the CUDA build via docker run in a run: step.

Fixed in commit 3bff142: added an "Install system dependencies" step as the very first step (before checkout) that runs apt-get install -y software-properties-common git curl build-essential cmake libssl-dev. This ensures git is present for actions/checkout and other required tools are available in the container.
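
A quick way to catch the missing-tool problem raised in this thread is a preflight check that runs early in the job. The sketch below is illustrative only; `missing_tools` is a hypothetical helper, and the tool list is an assumption drawn from the packages named in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: print each named tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools mentioned in this thread; adjust to what the workflow actually needs.
missing="$(missing_tools git curl cmake)"
if [ -n "$missing" ]; then
  # A CI step would typically exit 1 here instead of only reporting.
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```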

chamalgomes · 2026-02-25T22:40:52Z

@copilot I think we need to install the essential system runtime dependencies first; otherwise even the checkout action fails, stating that git is required. Maybe something like this: software-properties-common git curl build-essential cmake libssl-dev? I'm also not sure whether the uv setup action will work. If it does not, you could use the following shell command to install uv inline: "curl -LsSf https://astral.sh/uv/install.sh | sh"


Copilot · 2026-02-25T22:43:47Z

@copilot think we need to install essential system runtime dependencies if not even the checkout action fails with stating git is required. So maybe something like this maybe ? softw...

Done in commit 3bff142:

Added apt-get install -y software-properties-common git curl build-essential cmake libssl-dev as the first step so git is available for checkout
Replaced the setup-uv action with inline curl install (curl -LsSf https://astral.sh/uv/install.sh | sh) followed by echo "$HOME/.local/bin" >> $GITHUB_PATH to register uv on the PATH for all subsequent steps
Python is then installed via uv python install ${{ matrix.pyver }}
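
The three changes listed above can be sketched as one bash script. This is an approximation of what commit 3bff142 describes, not a copy of it: the `run` wrapper, the `DRY_RUN` switch, and the `apt-get update` call are assumptions added so the sketch can be read and executed safely outside CI.

```shell
#!/usr/bin/env bash
set -eu

# DRY_RUN defaults to 1 so this sketch only prints commands; set DRY_RUN=0 in CI.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

setup_container() {
  local pyver="$1"   # stands in for ${{ matrix.pyver }}
  run apt-get update   # assumption: refresh package lists in a fresh image
  run apt-get install -y software-properties-common git curl build-essential cmake libssl-dev
  # The pipe needs a shell, so the installer line is wrapped in sh -c.
  run sh -c 'curl -LsSf https://astral.sh/uv/install.sh | sh'
  if [ "$DRY_RUN" != "1" ]; then
    # GITHUB_PATH is provided by the Actions runner and affects later steps.
    echo "$HOME/.local/bin" >> "$GITHUB_PATH"
    # Make uv visible to the rest of this script as well.
    export PATH="$HOME/.local/bin:$PATH"
  fi
  run uv python install "$pyver"
}

setup_container "3.13"
```

Appending to `$GITHUB_PATH` only takes effect in subsequent steps, which is why the sketch also exports `PATH` for the current shell.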

Initial plan

8166cda

Copilot AI assigned Copilot and chamalgomes Feb 25, 2026

Copilot started work on behalf of chamalgomes February 25, 2026 09:43

Modernize llama-build-cuda.yaml workflow with clean bash syntax

4cb71b4

Co-authored-by: chamalgomes <106123134+chamalgomes@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Modernize llama-build-cuda workflow using cleaner approaches" to "Modernize CUDA build workflow: remove PowerShell, inline matrix, drop conda" Feb 25, 2026

Copilot AI requested a review from chamalgomes February 25, 2026 09:45

Copilot finished work on behalf of chamalgomes February 25, 2026 09:45

chamalgomes marked this pull request as ready for review February 25, 2026 09:48

Copilot AI review requested due to automatic review settings February 25, 2026 09:48

chamalgomes approved these changes Feb 25, 2026


Copilot started reviewing on behalf of chamalgomes February 25, 2026 09:48

Copilot AI reviewed Feb 25, 2026


Copilot started work on behalf of chamalgomes February 25, 2026 22:41

Fix container dependencies, trailing semicolon in arch, and inline uv install

3bff142

Co-authored-by: chamalgomes <106123134+chamalgomes@users.noreply.github.com>

Copilot AI requested a review from chamalgomes February 25, 2026 22:44

Copilot finished work on behalf of chamalgomes February 25, 2026 22:44

chamalgomes closed this Feb 26, 2026

Copilot wants to merge 3 commits into main from copilot/modernize-llama-build-workflow
