Optimize CI for wolfProvider #400

Open
aidangarske wants to merge 34 commits into wolfSSL:master from aidangarske:ci-draft-pause

@aidangarske aidangarske commented May 23, 2026

Description

  • Trigger OSP projects to run nightly and send a Slack message on failure
  • Dynamically resolve the latest wolfSSL and OpenSSL versions
  • (All OSPs were being tested against OpenSSL 3.0.20 from debian:bookworm, not 3.5.4)
  • Add UBSan and ASan for WP specifically
  • Add smoke tests for draft PRs
  • Run the full tests only on "open" PRs; run only smoke tests on drafts
  • No apt-get at job time; use a ghcr container instead

Copilot AI review requested due to automatic review settings May 23, 2026 06:27
@aidangarske aidangarske marked this pull request as draft May 23, 2026 06:30
@aidangarske aidangarske changed the title from "ci: pause non-smoke workflows on draft PRs, add smoke preflight" to "Optimize CI for wolfProvider" May 23, 2026
@aidangarske aidangarske reopened this May 23, 2026
@aidangarske aidangarske self-assigned this May 23, 2026
@aidangarske aidangarske requested review from dgarske and removed request for Copilot May 23, 2026 06:43

Reduce per-PR job count by dropping coverage axes that don't change
behavior on PR. The full sweeps that left the PR matrix move to a
nightly schedule in a follow-up commit.

- simple.yml: 48 -> 4 jobs. Drop --debug axis (debug only on nightly).
  Trim openssl_ref to newest 3.5.x + oldest 3.0.x (was 6 versions).
  Keep one wolfssl_ref (v5.8.4-stable).
- git-ssh-dr.yml: 16 -> 8 jobs. key_type [rsa, ed25519] on PR (was 4
  values), iterations 3 (was 10).
- curl.yml: drop second curl_ref on PR.
- openssh.yml: drop second openssh_ref on PR.

paths-ignore on every pull_request trigger so docs/README-only PRs
skip the full CI sweep. List covers **.md, docs/, LICENSE*, README*,
CHANGELOG*, .github/ISSUE_TEMPLATE/**, dependabot config, .gitignore,
AUTHORS, COPYING.

xmlsec.yml is skipped (its pull_request: block is commented out).

The existing draft-PR skip already prevents heavy CI from running on
WIP PRs, but every workflow still spins up its full matrix the moment
a PR is marked ready-for-review. Add a leading wait_for_smoke job to
every workflow that calls build-wolfprovider.yml so the smoke build
acts as the single cross-workflow gate.

How it works:
- wait_for_smoke runs the .github/actions/wait-for-smoke composite
  action. The action is a no-op on non-PR events (push to master
  exits immediately).
- On a pull_request, it polls the Smoke Test workflow run for the
  head SHA and exits success when smoke passes, failure when it
  fails, success when smoke was skipped (e.g. paths-ignore).
- Downstream build_wolfprovider needs wait_for_smoke, so a failed
  smoke skips the entire heavy build/test matrix for that workflow.
- Draft PRs still skip via the existing `if:` (wait_for_smoke is
  skipped, dependents skip transitively).
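
The gate rules above can be sketched as a small shell helper. This is an illustration only, not the contents of the wait-for-smoke action: the name `smoke_gate` and passing the event name and smoke conclusion as arguments are assumptions made for the example, and the polling of the Smoke Test run is left out. Treating any conclusion other than success or skipped as a failure is also an assumption.

```shell
# Hypothetical helper: map (event name, smoke conclusion) to a gate exit code.
# Follows the rules listed above; the real action polls the Smoke Test run.
smoke_gate() {
    event="$1"        # e.g. pull_request, push
    conclusion="$2"   # e.g. success, failure, skipped

    # Non-PR events (push to master) are a no-op: always pass.
    if [ "$event" != "pull_request" ]; then
        return 0
    fi

    case "$conclusion" in
        success|skipped) return 0 ;;   # smoke passed, or was skipped (paths-ignore)
        *)               return 1 ;;   # failure (assumed: anything else also blocks)
    esac
}
```

A job that `needs:` the gate is skipped when the gate exits non-zero, which is how one failed smoke run keeps the heavy matrix from starting.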

Applies to 38 workflows. xmlsec.yml is skipped because its
pull_request trigger is commented out (push-to-master only).

Estimated savings on a broken-build PR: ~280 jobs that previously
ran to apt-get or build failure now never start.

Stop hardcoding wolfSSL versions in each workflow. Borrow wolfTPM's
pattern from .github/workflows/wolfssl-versions-pqc.yml: resolve the
highest v*-stable tag at run time via git ls-remote and pass it down
as a workflow output.

New: .github/workflows/_discover-wolfssl.yml -- reusable workflow that
emits `latest_stable` output (e.g. v5.9.2-stable).
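
A minimal sketch of the tag resolution, assuming the input is `git ls-remote --tags --refs` output and that stable tags look like `vX.Y.Z-stable`. The function name `latest_stable` and the repository URL in the usage comment are assumptions for the example; the reusable workflow may do this differently.

```shell
# Pick the highest v*-stable tag from `git ls-remote --tags --refs` style
# output read on stdin ("<sha><TAB>refs/tags/<tag>" per line).
latest_stable() {
    sed 's|.*refs/tags/||' |                          # keep only the tag name
        grep -E '^v[0-9]+\.[0-9]+\.[0-9]+-stable$' |  # stable releases only
        sort -V |                                     # version-aware ordering
        tail -n 1                                     # highest version wins
}

# Live usage (needs network access; URL is an assumption):
#   git ls-remote --tags --refs https://github.com/wolfSSL/wolfssl.git | latest_stable
```

Version sort matters here: a plain lexical sort would rank a `.9` release above a `.10` release.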

Applied to workflows that build wolfSSL from source:
- simple.yml: matrix.wolfssl_ref dropped; build step pulls from
  needs.discover_versions.outputs.latest_stable.
- smoke-test.yml: stable row uses discovered tag; master row stays.
- libtss2.yml: same swap as simple.yml.

Not applied to workflows that consume pre-built .deb packages from
ghcr.io (the 36 Debian app workflows). Those .debs are built by a
Jenkins debian-export job that currently bakes v5.8.4-stable into the
package contents. Switching wolfssl_ref to latest_stable would just
mislabel the artifact name -- the library shipped in the deb would
still be v5.8.4. Follow-up needed on the Jenkins side to track
latest-stable too; that change is out of scope for this PR.

multi-compiler.yml is left alone: it intentionally pins v5.8.0-stable
on one matrix row as a backward-compat check.

Reliability (Phase D):
- build-wolfprovider.yml: retry the ORAS install download (GitHub
  releases occasionally flake) and the ORAS pull of pre-built .deb
  packages from ghcr.io (the biggest single flake source per the
  triage). Three attempts with linear backoff (10s/15s).
- Same retry wrapper around the Yocto WIC pull and the apt-get
  install of xz-utils.
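
The retry wrapper could look roughly like the sketch below: three attempts, with a 10s wait after the first failure and 15s after the second. The function name `retry` and the `RETRY_FIRST_DELAY` / `RETRY_STEP` overrides are hypothetical, added so the delays can be shortened when experimenting; the wrapper in build-wolfprovider.yml may be written differently.

```shell
# Hypothetical retry wrapper: run a command up to 3 times. Sleeps
# RETRY_FIRST_DELAY seconds (default 10) after the first failure and
# RETRY_STEP seconds more (default 5) after each later one.
retry() {
    retry_attempt=1
    retry_delay="${RETRY_FIRST_DELAY:-10}"
    while :; do
        "$@" && return 0
        if [ "$retry_attempt" -ge 3 ]; then
            echo "retry: giving up after $retry_attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $retry_attempt failed, sleeping ${retry_delay}s" >&2
        sleep "$retry_delay"
        retry_attempt=$((retry_attempt + 1))
        retry_delay=$((retry_delay + ${RETRY_STEP:-5}))
    done
}
```

Usage would be along the lines of `retry oras pull "$DEB_REF"`, where `DEB_REF` stands in for whichever ghcr.io reference the step pulls.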

fail-fast: false on every multi-shard matrix (44 files, 73 matrix
blocks). Reason: one apt-mirror or container-pull flake on shard 3
of 8 used to kill all 8 shards and force a full re-run. Now the
healthy shards still report.

Nightly full sweep (Phase E2):
- New .github/workflows/nightly-full-sweep.yml on cron '0 6 * * *'
  plus workflow_dispatch.
- Restores the simple.yml axes that PRs dropped: 2 wolfssl x 6
  openssl x 2 debug x 2 replace_default = 48 jobs.
- Curl/openssh/git-ssh-dr nightly expansions are a follow-up; those
  workflows need a workflow_call refactor first.

Adding a 48-job cron sweep to a repo that is already throttled on
shared runners works against the goal of this PR. The coverage that
left the PR matrix (older OpenSSL versions, --debug variant, second
external project refs) is not lost: it remains in simple.yml's full
matrix on master pushes, and a follow-up can wire a workflow_dispatch
trigger so a release engineer can fan it out on demand.

Every app workflow doubled its test matrix on
force_fail: ['WOLFPROV_FORCE_FAIL=1', ''] -- two jobs that shared the
same build artifact, the same apt-installed dependencies, and ~95% of
the same wall time, just to flip one env var.

Drop the matrix axis and run both modes back-to-back inside one job.
Each test step now contains:

    # --- normal mode ---
    <test cmd>  | tee X-test-normal.log
    check-workflow-result.sh $? "" X
    # --- force-fail mode ---
    export WOLFPROV_FORCE_FAIL=1
    <test cmd>  | tee X-test-ff.log
    check-workflow-result.sh $? "WOLFPROV_FORCE_FAIL=1" X

42 workflows, ~72 jobs/PR removed without dropping coverage.
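
The back-to-back pattern above can be sketched as a runnable script. Everything named here is a stand-in: `run_suite` plays the role of `<test cmd>`, `check_result` plays the role of check-workflow-result.sh, and the pass/fail rule inside `check_result` (normal must exit 0, force-fail must exit non-zero) is an assumption about what that script checks.

```shell
# Hypothetical stand-in for <test cmd>: fails only in force-fail mode.
run_suite() {
    if [ -n "${WOLFPROV_FORCE_FAIL:-}" ]; then
        echo "suite: provider forced to fail"; return 1
    fi
    echo "suite: all tests passed"; return 0
}

# Hypothetical stand-in for check-workflow-result.sh.
check_result() {   # args: <exit code> <force-fail setting> <name>
    if [ -n "$2" ]; then
        # Force-fail round: a non-zero exit is the expected outcome (assumed).
        if [ "$1" -ne 0 ]; then echo "$3: force-fail OK"; return 0; fi
        echo "$3: force-fail round unexpectedly passed"; return 1
    fi
    if [ "$1" -eq 0 ]; then echo "$3: normal OK"; return 0; fi
    echo "$3: normal round failed"; return 1
}

run_both_modes() {   # arg: <name>
    # --- normal mode ---
    unset WOLFPROV_FORCE_FAIL
    run_suite > "$1-test-normal.log" 2>&1; rc=$?
    cat "$1-test-normal.log"
    check_result "$rc" "" "$1" || return 1

    # --- force-fail mode ---
    export WOLFPROV_FORCE_FAIL=1
    run_suite > "$1-test-ff.log" 2>&1; rc=$?
    cat "$1-test-ff.log"
    unset WOLFPROV_FORCE_FAIL
    check_result "$rc" "WOLFPROV_FORCE_FAIL=1" "$1"
}
```

The sketch redirects to the log and reads `$?` right away instead of piping through `tee`, because `$?` after a pipeline reports the last command in the pipe unless `pipefail` is set.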

Special cases handled in-place:
- bind9.yml: split the combined build-and-test step so autoreconf +
  configure + make run once, tests run twice.
- hostap.yml: hwsim VM runs require rewriting vm/inside.sh per mode.
  Two setup steps and two test rounds (smoke + EAP) per mode.
- curl.yml: cert-gen branch on curl=master is now unconditional for
  that ref (no longer gated on force_fail).
- openldap.yml: keep the 15-minute timeout and exit-code 124 handling
  only on the force-fail round (WPFF hangs on test 067).
- openvpn.yml: set +e/-e wrapped per round (force-fail run is allowed
  to return non-zero; check-workflow-result.sh interprets).
- grpc.yml, x11vnc.yml: extracted a per-round shell function so the
  matrix.tests loop / test_x11vnc.sh wrapper isn't duplicated by hand.
- iperf.yml: kill+wait the iperf3 server between rounds so the second
  round can bind the same port.
- sssd.yml: set +e/-e only around the force-fail make check (the
  pre-existing "if WOLFPROV_FORCE_FAIL set +e" conditional collapsed).
- xmlsec.yml: two test-keys + check-enc log pairs.
- debian-package.yml: use env to inject REPLACE_DEFAULT, ISFIPS, and
  optionally WOLFPROV_FORCE_FAIL into do-cmd-tests.sh.

Eliminate per-job apt-get update / apt-get install across every
Debian-container PR workflow. The Debian mirror flake class is gone:
zero apt-get update calls during a normal PR run.

New: docker/wolfprovider-test-deps/Dockerfile
  Single Debian-bookworm image with every build dep, test fixture
  binary, perl/python module, and X11/netlink/TPM/PCSC dev header
  any workflow installs at job time today. Built once, cached
  across jobs.

New: .github/workflows/publish-test-deps-image.yml
  Builds and pushes the image to ghcr.io on push to master when the
  Dockerfile changes, or via workflow_dispatch. Forks just build to
  verify (no push). Tagged :bookworm and :bookworm-<sha>.

Migrated 39 workflows to image: ghcr.io/wolfssl/wolfprovider-test-deps:bookworm
  bind9, cjose, curl, debian-package, git-ssh-dr, grpc, hostap,
  iperf, krb5, libcryptsetup, libeac3, libfido2, libhashkit2,
  libnice, liboauth2, librelp, libssh2, libtss2, libwebsockets,
  net-snmp, nginx, openldap, opensc, openssh, openvpn, pam-pkcs11,
  ppp, python3-ntp, qt5network5, rsync, socat, sscep, stunnel,
  systemd, tcpdump, tnftp, tpm2-tools, x11vnc, xmlsec.

  libtss2.yml moved from bare ubuntu-22.04 into the container, with
  its libssl-dev removal preserved (configure needs to pick up only
  the in-tree openssl headers built by scripts/build-wolfprovider.sh).

Removed redundant steps in those workflows -- the dep installs they
performed are now baked into the image. Kept:
  - apt install of the wolfssl/openssl/wolfprov .debs that arrive
    from build-wolfprovider artifacts;
  - apt-mark hold of the wolfprov-patched libssl3 chain;
  - debian-package.yml's apt-get remove --purge libwolfprov lifecycle
    check;
  - hostap.yml's apt-get remove python3-cryptography + pip install
    cryptography (test scripts require the pip wheel, not the apt one).

Not migrated:
  - build-wolfprovider.yml: keeps the existing wolfssl-built
    ghcr.io/wolfssl/build-wolfprovider-debian:bookworm. It is the
    .deb-producer container, separate concern.
  - sssd.yml: keeps quay.io/sssd/ci-client-devel; the SSSD CI image
    bundles upstream test fixtures we cannot easily replicate here.
  - multi-compiler.yml: bare runner intentionally; the matrix
    installs gcc-9..13 and clang-13..17 to exercise compiler
    compat, and baking those into the image would balloon it.
  - static-analysis.yml: schedule-only, leaving as bare runner.

Copilot AI review requested due to automatic review settings May 25, 2026 19:18
@aidangarske aidangarske marked this pull request as ready for review May 25, 2026 19:25

…ew fix)

Was: every workflow pulled ghcr.io/wolfssl/wolfprovider-test-deps:bookworm,
which doesn't exist until upstream master runs the publish workflow.
Bootstrap chicken-and-egg.

Now: publish-test-deps-image.yml fires on any branch push (and PRs)
and pushes to ghcr.io/<repo-owner>/wolfprovider-test-deps:bookworm.
Consumer workflows read from the PR head's owner when on a PR, else
the running repo's owner. Result: a fork PR publishes to the fork's
ghcr namespace and pulls from it; master pushes publish to the org's
ghcr namespace and pull from it.

Also fixes copilot review feedback from
wolfSSL#400 (review)

- Phase B log filename renames broke check-workflow-result.sh's
  hardcoded log paths (curl-test.log, openvpn-test.log, sssd-test.log,
  net-snmp-test.log, nginx-test.log, openssh-test.log, tcpdump-test.log,
  liboauth2-test.log, stunnel-test.log) plus in-step greps in cjose,
  libcryptsetup, libfido2, libhashkit2, libtss2, opensc, python3-ntp,
  qt5network5, tnftp, tpm2-tools. Reverted log names back to
  <app>-test.log; second mode overwrites first.
- libtss2.yml: fix `if $(grep -q ...)` (fragile shell -- the command
  substitution expands to an empty command, so the `if` only sees
  grep's status as a side effect). Use `if grep -q ...; then`.
- opensc.yml: fix `TEST_RESULT=$(((grep ...) && echo 0 || echo 1))`
  (arithmetic expansion `(( ))` can't contain shell commands). Hoist
  to a check_opensc_log() function called from both modes.
- stunnel.yml: `grep -c "failed: 0"` prints 1 (the match count) on
  success, but check-workflow-result.sh expects TEST_RESULT==0 for pass.
  Use `if grep -q ...; then TEST_RESULT=0; else TEST_RESULT=1; fi`.
  Also mirror tests/logs/results.log to stunnel-test.log so the
  force-fail check finds the expected file.
- hostap.yml: drop continue-on-error from the normal-mode test step.
  Without it the step's exit code was swallowed and normal-mode test
  failures didn't fail the job.
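
The grep-related fixes above come down to one idiom: branch on grep's exit status, not on what it prints. A minimal sketch follows; the helper name `check_log_for` is hypothetical and is not the `check_opensc_log()` function mentioned above.

```shell
# Hypothetical helper: set TEST_RESULT from a log file using grep's exit
# status (0 = pattern found) rather than its printed output.
check_log_for() {   # args: <pattern> <log file>
    if grep -q -- "$1" "$2"; then
        TEST_RESULT=0   # pattern present: treat as pass
    else
        TEST_RESULT=1   # pattern absent (or file unreadable): treat as fail
    fi
}
```

For contrast, `TEST_RESULT=$(grep -c "failed: 0" log)` stores the match count, so a passing log with one matching line yields 1, which a checker expecting 0 for pass reads as a failure.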

One-time setup: after this lands, the owner of each fork that opens a
PR has to make their ghcr.io/<owner>/wolfprovider-test-deps package
public (GitHub UI: Packages -> Package settings -> Change visibility).
GitHub's Actions runners can only pull public packages from another
namespace.

build fail:
  libunwind-14-dev : Conflicts: libunwind-dev
  E: Unable to correct problems, you have held broken packages.

libunwind-dev is a virtual that resolves to libunwind-14-dev on
bookworm; explicitly requesting it conflicts when another package
already pulled in the versioned form. Nothing in our workflows directly
asks for libunwind-dev, so just drop it -- it'll come in transitively.

The matrix on every Debian-container workflow was claiming
openssl_ref: 'openssl-3.5.4', but the wolfprov .deb on ghcr.io is built
by patching Debian Bookworm's stock libssl3 source -- which is currently
3.0.20. So the matrix label has been lying about what's actually
installed and tested. The wolfssl_ref was likewise pinned and could
drift.

Replaces .github/workflows/_discover-wolfssl.yml with
.github/workflows/_discover-versions.yml that resolves both at run time:

  - wolfSSL latest -stable tag via git ls-remote (same as before).
  - Debian Bookworm's currently-resolvable OpenSSL via
    `docker run --rm debian:bookworm apt-cache madison openssl`,
    stripping the Debian revision suffix.

Outputs both plain (`wolfssl_ref`) and JSON-array (`wolfssl_ref_array`)
forms; matrix consumers use the array form via fromJson.
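
A rough sketch of the Bookworm side of the resolver, assuming `apt-cache madison` prints `package | version | source` rows and that the first row is the candidate to use. The function names `debian_openssl_ref` and `as_json_array` are hypothetical, and the sample row in the comment is illustrative, not captured output.

```shell
# Turn `apt-cache madison openssl` output (stdin) into an openssl_ref.
# Takes the first listed row and strips the Debian revision suffix.
# Illustrative input row:
#    openssl | 3.0.20-1~deb12u1 | http://deb.debian.org/debian bookworm/main amd64 Packages
debian_openssl_ref() {
    awk -F'|' 'NR == 1 {
        v = $2
        gsub(/[ \t]/, "", v)    # trim the padding around the version column
        sub(/-[^-]*$/, "", v)   # drop "-<debian revision>"
        print "openssl-" v
    }'
}

# Wrap a single ref in the JSON-array form that matrix consumers read
# through fromJson.
as_json_array() {
    printf '["%s"]\n' "$1"
}
```

In a workflow step, the two values would typically be appended to `$GITHUB_OUTPUT` as `openssl_ref=...` and `openssl_ref_array=...`.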

Wired into every workflow that calls build-wolfprovider.yml (38 heavy
workflows + openssl-version.yml's wolfssl axis + the three workflows
that previously used the wolfssl-only resolver). Each gets a
`discover_versions` job that the build_wolfprovider and test_X jobs
depend on.

Today's resolution: wolfssl=v5.8.4-stable, openssl=openssl-3.0.20.
When Bookworm bumps to 3.0.21 (or whenever) the label tracks
automatically -- no CI edit needed.

PR CI was burning runner time on 40 OSP integration workflows that
each spin up multiple matrix jobs, install a Debian container, install
.debs, and run upstream test suites -- on every push. That's the
runner-throttling we've been hitting. Move all of that to nightly.

OSP workflows -> reusable + dispatch-only
=========================================
40 workflows converted from `on: pull_request + push` to
`on: workflow_call + workflow_dispatch`. PRs no longer trigger them.
The `wait_for_smoke` job inside each is removed -- nightly doesn't
have a smoke gate (smoke gates the open-PR fast feedback loop, not
scheduled runs). Upstream matrices restored where Phase A had trimmed
them:
  - curl: curl_ref back to [curl-8_4_0, curl-7_88_1]
  - openssh: openssh_ref back to [V_10_0_P2, V_9_9_P1]
  - git-ssh-dr: key_type back to all four, iterations back to 10

The 40 OSPs: bind9, cjose, curl, debian-package, git-ssh-dr, grpc,
hostap, iperf, krb5, libcryptsetup, libeac3, libfido2, libhashkit2,
libnice, liboauth2, librelp, libssh2, libtss2, libwebsockets, net-snmp,
nginx, openldap, opensc, openssh, openvpn, pam-pkcs11, ppp,
python3-ntp, qt5network5, rsync, socat, sscep, sssd, stunnel, systemd,
tcpdump, tnftp, tpm2-tools, x11vnc, xmlsec.

New nightly orchestrator (.github/workflows/nightly-osp.yml)
============================================================
`schedule: 0 6 * * *` + workflow_dispatch. Fans out all 40 OSP
workflows in parallel via `uses:` and aggregates results in a
`notify` job that:
  - Always runs (`if: always()`) so failures don't suppress the report.
  - Parses `toJSON(needs)` to build pass/fail lists with jq:
      to_entries[] | select(.value.result != "success") | "\(.key) (\(.value.result))"
    (the `[]` stream is load-bearing -- `map(...)` then `.[].key`
    inside a string template is malformed jq.)
  - Posts a green/red Slack attachment to SLACK_WEBHOOK_URL, with
    `curl -fsS` so HTTP errors actually fail the workflow.
  - Writes the same summary to $GITHUB_STEP_SUMMARY so the run page
    is readable even when SLACK_WEBHOOK_URL isn't set.
  - SLACK_WEBHOOK_URL is read at JOB-level env so the step `if:` can
    see it. Step-level env is not in scope for that step's own `if:`.

ASan + UBSan workflow (.github/workflows/sanitizers.yml)
========================================================
Builds OpenSSL, wolfSSL, and wolfProvider from source under
-fsanitize=address,undefined -fno-omit-frame-pointer
-fno-sanitize-recover=all -static-libasan, then runs do-cmd-tests.sh
against the instrumented binaries. ASAN_OPTIONS and UBSAN_OPTIONS set
to halt on first hit so we don't drown in cascades. Versions come
from _discover-versions.yml. Gated on smoke. Runs on PR.

wait_for_smoke kept where it matters
====================================
After the OSP move, the PR-triggered workflows that build wolfProvider
all gate on smoke: simple, cmdline, fips-ready, openssl-version,
seed-src, multi-compiler, sanitizers. codespell stays ungated (it
doesn't build wolfprov).

Requires repo secret SLACK_WEBHOOK_URL for the Slack push to fire;
absent it the workflow still runs and writes the summary to the job
output.

The wait_for_smoke job (composite action that polls Smoke Test on the
PR head SHA) was forcing every other PR workflow to wait for smoke
before kicking off. End result: smoke + multi-compiler + sanitizers
were serialized when they could be parallel.

What we wanted from this gate -- "don't burn CI on a broken build" --
isn't actually saving much. Smoke takes the same wall time as the
shortest other workflow, and PR-mode draft-skip already prevents the
sweep on WIP PRs. The gate was holding back the open-PR signal more
than it was saving runner-minutes.

Strip it everywhere:
  - simple, cmdline, fips-ready, openssl-version, seed-src,
    multi-compiler, sanitizers
  - the OSP workflows already had it removed in the nightly move.

Draft-skip stays. Smoke itself still runs on PR -- it's just no longer
a barrier in front of everything else.

The .github/actions/wait-for-smoke composite action stays in the tree;
nothing references it now, but it's small and harmless to keep around
in case someone wants to opt a specific workflow back into it.

_discover-versions.yml now emits wolfssl_ref_array=["master","<latest_stable>"]
instead of a single-element array. Every matrix consumer of that
output -- the 40 OSP workflows already use fromJson on it -- now
iterates both refs.

Wired wolfssl_ref into the matrix of the workflows that were still
using the singular `outputs.wolfssl_ref`:
  - simple.yml: matrix gains wolfssl_ref (now 8 jobs / PR, was 4)
  - cmdline.yml: matrix.wolfssl_ref switches from ['v5.8.4-stable']
    to the array form, also adds a discover_versions job
  - libtss2.yml: matrix.wolfssl_ref added; build step reads it
  - sanitizers.yml: matrix added so ASan+UBSan exercises both refs

The singular `wolfssl_ref` output still resolves to latest-stable for
the few non-matrix consumers (none currently, but the API stays
backwards-compatible).

Nightly OSP now runs every OSP workflow with master AND latest-stable
in parallel, which is the load nightly was built to absorb.

…orkflows

_discover-versions.yml now resolves a third version too:

  openssl_latest_ref       e.g. openssl-3.5.4
  openssl_latest_ref_array e.g. ["openssl-3.5.4"]

Sourced from upstream openssl/openssl via git ls-remote, filtered to
release-shaped tags (^openssl-3\.[0-9]+\.[0-9]+$ -- no -alpha/-beta/-pre).

Wired into the workflows that build OpenSSL from source and had been
pinning a "latest" version by hand:

  - simple.yml      newest slot was openssl-3.5.4
  - libtss2.yml     was openssl-3.5.4
  - sanitizers.yml  was using openssl_ref (Bookworm 3.0.20!)
  - smoke-test.yml  was openssl-3.5.4
  - cmdline.yml     was openssl-3.5.0
  - fips-ready.yml  was openssl-3.5.0
  - seed-src.yml    newest slot was openssl-3.5.4 (also added discover_versions)

Untouched on purpose:
  - the OSP workflows that consume the wolfprov .deb (openssl_ref
    stays as Bookworm 3.0.20 -- that's what the .deb actually carries)
  - openssl-version.yml's matrix (its whole job is to sweep specific
    versions, not always-latest)
  - the "oldest LTS" slots (openssl-3.0.17) in simple.yml and
    seed-src.yml -- those still exercise the older series intentionally.

The user-facing diff is small for now (3.5.4 was already the upstream
latest), but next release the matrix labels track upstream
automatically.

Replace the 46-line static list of openssl-3.x.y matrix entries with
an output from _discover-versions.yml that resolves the complete set
of upstream release-shaped tags at run time.

New resolver output: openssl_all_releases_array.

Implementation:
  git ls-remote --tags --refs https://github.com/openssl/openssl.git
    | grep -E '^openssl-3\.[0-9]+\.[0-9]+$'    # strip alphas/betas
    | sort -V
    | awk '/^openssl-3\.0\.3$/ {p=1} p'        # floor at historical
                                                # oldest in the static list

The floor is important: upstream actually tags openssl-3.0.0,
openssl-3.0.1, openssl-3.0.2, but the previous static matrix
intentionally excluded those. Preserve that exclusion so we don't
silently regress into running those versions.
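
The filter, sort, and floor steps can be tried offline with a sketch like this one. It assumes the input is already a list of bare tag names, one per line (the `refs/tags/` prefix that `git ls-remote` prints would need stripping first); the function name `openssl_release_tags` is hypothetical.

```shell
# Filter tag names (one per line on stdin) down to release-shaped
# openssl-3.X.Y tags, version-sort them, and drop everything below the
# openssl-3.0.3 floor.
openssl_release_tags() {
    grep -E '^openssl-3\.[0-9]+\.[0-9]+$' |    # strip alphas/betas
        sort -V |                              # 3.0.10 sorts after 3.0.4
        awk '/^openssl-3\.0\.3$/ {p=1} p'      # print from the floor onward
}
```

One property of this floor worth knowing: the awk step only starts printing once it sees openssl-3.0.3 itself, so if that exact tag were ever missing from the input, the output would be empty.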

Net effect today vs. the static matrix:
  - same 46 entries the static list had, plus everything upstream
    has shipped since (openssl-3.5.5, openssl-3.6.1, openssl-3.6.2,
    etc.). Confirmed locally: 58 tags currently.
  - the highest entry (`openssl_latest_ref`, used by source-build
    workflows in the previous commit) now resolves to openssl-3.6.2
    rather than openssl-3.5.4, where the previous resolver topped out
    by mistake (3.5.x is the latest 3.5 patch, not the latest release).

continue-on-error stays true on the openssl_version_test job, so a
broken openssl release tag doesn't fail the workflow.

… PR)

The dynamic resolver turned openssl-version.yml's matrix into ~58
upstream openssl-3.X.Y releases x 2 wolfssl refs = ~116 jobs. That's
the load nightly was built to absorb, not something to fire on every
push.

  - openssl-version.yml: on: { workflow_call, workflow_dispatch }
    (was push + pull_request). The `if: github.event.pull_request.draft`
    guards are removed -- workflow_call inherits caller context.
  - nightly-osp.yml: added openssl-version to both the dispatch list
    and the notify job's `needs:` so it shows up in the Slack summary
    alongside the OSP integration results.

PR-side OpenSSL coverage stays adequate via simple.yml:
  wolfssl_ref:  master + latest-stable        (resolved dynamically)
  openssl_ref:  openssl_latest_ref + openssl-3.0.17

2 x 2 = 4 combos x 2 replace_default = 8 jobs. Exercises latest
upstream and the oldest still-maintained 3.0.x LTS, against both
wolfssl master and the latest -stable tag, on every PR. The full
58-version sweep runs once a night.

After Phase B collapsed force_fail off the matrix axis, sssd.yml was
left with:

  matrix:
    sssd_ref: [ '2.9.1' ]
    wolfssl_ref: [ 'master', 'v5.8.0-stable' ]
    ...
    exclude:
      - sssd_ref: 'master'
        force_fail: 'WOLFPROV_FORCE_FAIL=1'

force_fail isn't a matrix key anymore, so the parser rejected the
exclude with:

  Matrix exclude key 'force_fail' does not match any key within
  the matrix

(which is what HTTP 422 from `gh workflow run nightly-osp.yml` was
surfacing -- the orchestrator couldn't load this reusable workflow).

The exclude was also dead code on master: there's no sssd_ref=master
in the matrix, only '2.9.1'. The intended skip was wolfssl_ref=master
+ force_fail (sssd is known-broken under WPFF when built against
wolfssl master). Express that intent inline in the test step: skip
the force-fail round when wolfssl_ref=master.

Symptom: gh workflow run nightly-osp.yml fired all 41 reusable
workflows (40 OSPs + openssl-version), but most were "cancelled" within
seconds of starting. Only the last few to start (libtss2,
openssl-version) actually ran.

Root cause: each OSP workflow has

  concurrency:
    group: ${{ github.workflow }}-${{ github.ref }}
    cancel-in-progress: true

For workflow_call'd reusable workflows, github.workflow evaluates to
the CALLER's workflow name -- "Nightly OSP Suite" for everything the
orchestrator fans out. Result: all 41 workflows share the same
concurrency group "Nightly OSP Suite-refs/heads/ci-draft-pause"
with cancel-in-progress:true, so each new OSP that started cancelled
all the OSPs that had already started.

The OSP workflows no longer have push/pull_request triggers
(workflow_call + workflow_dispatch only), so their own concurrency
control isn't needed -- the nightly-osp.yml orchestrator's own
`concurrency: nightly-osp` handles repeat full-suite runs.

Stripped the top-level concurrency block from all 41 reusable
workflows (40 OSPs + openssl-version).

build-wolfprovider.yml gated its `Login to ghcr.io` and
`Download pre-built packages from ghcr.io` steps on
`github.repository == 'wolfSSL/wolfProvider'`. On a fork run
(aidangarske/wolfProvider firing nightly-osp.yml), that condition
is false, so the .deb pull was silently skipped, the package
directories stayed empty, `dpkg -i .../*.deb` was a no-op, and
wolfprov's configure failed with "could not locate wolfSSL".

The published .debs (ghcr.io/wolfssl/wolfprovider/debs:*) are
public, so anonymous pulls work regardless of which repo's CI
is running. Drop the fork guard. Login is best-effort
(continue-on-error: true) -- it helps rate limits when a token is
available, but anonymous pulls keep working for forks without
write-scope tokens against wolfssl's namespace.

Also use github.actor for the login username instead of
github.repository_owner so the token's actual user is used
(matters on fork runs where repository_owner is the fork owner,
not the actor).

Default ghcr visibility for a newly-created container package is
private. That breaks fork-CI runs (the consumer workflows on
aidangarske/wolfProvider can't pull the image from
ghcr.io/aidangarske/wolfprovider-test-deps because the fork's
GITHUB_TOKEN doesn't authorize cross-namespace reads).

Add a Mark-package-public step after the build/push that PATCHes
visibility=public:

  - Detects whether github.repository_owner is an Organization or
    User and hits the right endpoint:
      orgs/wolfSSL/packages/container/wolfprovider-test-deps
      user/packages/container/wolfprovider-test-deps
  - Uses GH_PACKAGES_ADMIN_TOKEN if the repo has it (a PAT with
    admin:packages scope), else falls back to GITHUB_TOKEN. The
    fallback may not have enough scope on first creation; if so the
    step is `continue-on-error: true` so the publish itself still
    succeeds and the visibility just needs to be flipped manually
    once via the GitHub UI. After that, the package is public and
    future runs are no-ops.

Skipped on fork PRs (same as the push step) -- no perms to flip
visibility on a remote repo's namespace.

…vate)

Earlier commits tried to make fork CI work by:
  - having publish-test-deps-image.yml push to a per-owner ghcr namespace
    (ghcr.io/<owner>/wolfprovider-test-deps)
  - having consumer workflows pull from the PR head's owner
  - auto-PATCHing the test-deps package to visibility=public
  - dropping the `github.repository == 'wolfSSL/wolfProvider'` guard on
    the wolfprov-debs ORAS pull in build-wolfprovider.yml

That path only works if the packages can be public, which they can't
(some of the .debs contain commercially-licensed bits). Revert to the
canonical-only behavior:

publish-test-deps-image.yml
  - fires only on push to master/main (was '**')
  - guards the publish on github.repository == 'wolfSSL/wolfProvider'
  - drops the per-owner namespace; always pushes to
    ghcr.io/wolfssl/wolfprovider-test-deps
  - removes the Mark-package-public step

build-wolfprovider.yml
  - restores the github.repository == 'wolfSSL/wolfProvider' guard on
    the Login, Download .debs, and Download WIC steps

39 consumer workflows
  - container.image reverted from the per-owner expression back to the
    literal ghcr.io/wolfssl/wolfprovider-test-deps:bookworm

Practical effect: PR CI and nightly only run on the canonical repo
(or once PR wolfSSL#400 merges, on wolfSSL/wolfProvider's runners). Fork
pushes will skip the wolfprov-deb pull and any container-using job
will fail loud at the image pull -- which is the right signal: those
runs need to happen on the canonical repo.

…idation)

Add pull_request trigger to nightly-osp.yml so PR wolfSSL#400's reviewers
can see the dispatcher actually fan all 41 reusable workflows out
and the notify job hit Slack.

Marked temporary in the file header -- revert this trigger before
merging if you don't want the full nightly job set firing on every
PR. (For everyday CI, scheduled + workflow_dispatch is the intended
shape.)

Note: PR runs from forks will still hit the private-package issue
for the wolfprov-debs pull (the wolfSSL/wolfProvider repo guard
short-circuits the ORAS step on non-canonical repos). The plumbing
itself -- dispatch, discover-versions, notify, Slack -- runs
regardless and is what this PR-trigger lets you verify end-to-end.

Adds aidangarske/wolfProvider to the publish workflow's repository
allowlist so PR wolfSSL#400's working branch can bootstrap a test-deps
image on the fork's ghcr namespace. Pushed image lands at
ghcr.io/aidangarske/wolfprovider-test-deps:bookworm.

Also adds 'ci-draft-pause' to the branches list (alongside master/
main) so a push to that branch triggers the workflow without needing
a separate workflow_dispatch.

Consumer workflows continue to pull from ghcr.io/wolfssl/... so this
fork-side push is purely for the fork owner to verify the
build/push pipeline works end to end before PR merges. After merge,
the canonical wolfSSL/wolfProvider master push will publish the
authoritative image and consumers will find it.

Note: the 'ci-draft-pause' branch entry is TEMPORARY for PR wolfSSL#400.
Drop it (and remove aidangarske from the allowlist if desired)
once the PR merges.

Drop the fork-allowance/ci-draft-pause-branch additions so this
file matches the version going in via PR wolfSSL#402. After wolfSSL#402 merges
to master, this PR's branch will have the identical content -- no
merge conflict, no duplicate-but-different file diff to resolve.

Reverts the temporary changes from previous commits:
  - branches: ['master','main','ci-draft-pause'] -> ['master','main']
  - aidangarske/wolfProvider repo allowance dropped
  - per-owner ghcr namespace logic dropped (canonical wolfssl/ only)
  - concurrency group simplified (no ${{ github.repository }} suffix)

If you still want fork-side iteration after wolfSSL#402 merges, do it on
the bootstrap branch with workflow_dispatch on the canonical repo;
the canonical publish flow is what consumers actually pull from.

dgarske pushed a commit that referenced this pull request May 26, 2026
)

Bootstrap PR: introduces the test-deps container image that PR #400's
nightly OSP workflows consume. This is a minimal subset of PR #400
intended to merge first, so the publish workflow fires once on master
and the test-deps image lands at
ghcr.io/wolfssl/wolfprovider-test-deps:bookworm before the rest of
PR #400 merges. Without this, PR #400's
OSP container jobs all fail with "manifest unknown" because the image
they pull doesn't exist anywhere yet.

Two files only:
  docker/wolfprovider-test-deps/Dockerfile
    Single Debian-bookworm image with every apt dep that the OSP
    integration tests used to install at job time. One apt-get update
    at build time, zero at job time -- eliminates Debian mirror flake.

  .github/workflows/publish-test-deps-image.yml
    Builds the Dockerfile and pushes to
    ghcr.io/wolfssl/wolfprovider-test-deps:bookworm on push to
    master/main (path-filtered to docker/wolfprovider-test-deps/**)
    or workflow_dispatch. Guarded with
    github.repository == 'wolfSSL/wolfProvider' so forks don't try
    to push to wolfSSL's namespace.

The OSP workflows themselves, the discover-versions resolver, the
ASan/UBSan workflow, and all the matrix/force-fail consolidation
land via PR #400 once this is in place.
dgarske added a commit that referenced this pull request May 26, 2026
ci: bootstrap test-deps Docker image (prep for PR #400)
aidangarske and others added 8 commits May 26, 2026 08:28
PR wolfSSL#402 published ghcr.io/wolfssl/wolfprovider-test-deps:bookworm.
This empty commit bumps the head SHA so PR wolfSSL#400's checks rerun
against the now-existing image.
…CFLAGS

Two real bugs the latest sanitizer + osp-version failures surfaced.

Sanitizer build failure
=======================
sanitizers.yml's overridden WOLFSSL_CONFIG_CFLAGS dropped all the
defaults that scripts/utils-wolfssl.sh would have provided when the
env var is unset. wolfprov then built without -DWC_RSA_NO_PADDING and
the compiler treated wc_RsaDirect as an implicit declaration:

  src/wp_rsa_sig.c:817: error: implicit declaration of function
    'wc_RsaDirect'; did you mean 'wc_ReadDirNext'?

Fix: spell out the defaults explicitly in the workflow and append the
sanitizer flags. (Keep this in sync with the default in
scripts/utils-wolfssl.sh -- if that default changes, the workflow
needs to track it.)
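The failure mode follows from how shell defaults behave: a `${VAR:-default}` expansion contributes nothing once the variable is set, so an override has to repeat the defaults itself. The sketch below assumes the script uses that pattern. `DEFAULT_CFLAGS`, `effective_cflags`, and `compose_cflags` are hypothetical names, and only `-DWC_RSA_NO_PADDING` is taken from the build error above; the real default list lives in scripts/utils-wolfssl.sh.

```shell
# Hypothetical stand-in for the default in scripts/utils-wolfssl.sh.
# Only -DWC_RSA_NO_PADDING is known from the build error above.
DEFAULT_CFLAGS="-DWC_RSA_NO_PADDING"

# What a defaulting script effectively sees: the default applies only
# when the variable is unset or empty.
effective_cflags() {
  printf '%s\n' "${WOLFSSL_CONFIG_CFLAGS:-$DEFAULT_CFLAGS}"
}

# Workflow-side fix: repeat the defaults, then append the extra flags.
compose_cflags() {
  printf '%s %s\n' "$DEFAULT_CFLAGS" "$1"
}
```

Setting `WOLFSSL_CONFIG_CFLAGS="$(compose_cflags "-fsanitize=address")"` keeps the default define in place, whereas assigning the sanitizer flag alone would silently drop it.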

wolfssl_ref now reflects the actual .deb on ghcr
================================================
The old _discover-versions.yml computed wolfssl_ref via
`git ls-remote upstream wolfssl/wolfssl 'v*-stable'`. That gives
"what's the latest -stable tag" (v5.9.1-stable today), but the OSP
workflows install the wolfprov .deb on ghcr.io which Jenkins built
against a different tag (v5.8.4-stable today). The matrix label lied.

_discover-versions.yml now probes the actual non-FIPS .deb:

  oras pull ghcr.io/wolfssl/wolfprovider/debs:nonfips
  -> parse libwolfssl_<VER>+...amd64.deb filename for VER
  -> wolfssl_ref = "v<VER>-stable"

Two outputs now:

  wolfssl_ref / wolfssl_ref_array
    Actual version installed by the wolfprov .deb on ghcr.
    Used by the 40 OSP workflows (they use the .deb).

  wolfssl_latest_ref / wolfssl_latest_ref_array
    Latest upstream v*-stable tag. Used by source-built workflows
    (smoke, simple, sanitizers, libtss2, cmdline, seed-src,
    openssl-version) that clone wolfssl from git.

If the .deb probe fails (network blip, packages-read scope missing
on a fork PR token, future filename change), the resolver falls back
to upstream-latest with a ::warning:: so it's visible in the run log.
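The filename parse and the fallback can be sketched roughly as follows. This is an illustration only, not the resolver's actual code: the function names are hypothetical, the example filename suffix is made up, and the only assumption about the real artifact is the `libwolfssl_<VER>+...amd64.deb` shape described above.

```shell
# Extract VER from a filename shaped like libwolfssl_<VER>+...amd64.deb
# and print "v<VER>-stable". Returns non-zero if the name doesn't match.
deb_to_wolfssl_ref() {
  name="${1##*/}"                 # strip any directory part
  case "$name" in
    libwolfssl_*+*amd64.deb) ;;
    *) return 1 ;;
  esac
  ver="${name#libwolfssl_}"       # drop the package prefix
  ver="${ver%%+*}"                # keep everything before the first '+'
  [ -n "$ver" ] || return 1
  printf 'v%s-stable\n' "$ver"
}

# Use the .deb-derived ref when available; otherwise warn and fall back
# to the upstream-latest ref passed in by the caller.
resolve_wolfssl_ref() {
  deb_file="$1"
  latest_ref="$2"
  if ref="$(deb_to_wolfssl_ref "$deb_file")"; then
    printf '%s\n' "$ref"
  else
    # Written to stderr so the captured stdout value stays clean.
    echo "::warning::could not derive wolfssl_ref from .deb; using $latest_ref" >&2
    printf '%s\n' "$latest_ref"
  fi
}
```

For example, `resolve_wolfssl_ref libwolfssl_5.8.4+example_amd64.deb v5.9.1-stable` yields the .deb-derived ref, while an empty or unexpected filename yields the second argument.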

Updates to consumer workflows:
  simple, smoke-test, libtss2, sanitizers, cmdline, seed-src,
  openssl-version  ->  switch from wolfssl_ref to wolfssl_latest_ref

- stunnel: replace log-scraping with direct exit-code asserts. The
  prior `grep -c "failed: 0" || echo 1` produced a multi-line value
  that bash word-split into the check-workflow-result.sh arg list,
  silently routing every call past the stunnel-specific branch and
  returning "Tests passed successfully" regardless of test outcome.
  Switch to: normal mode = `timeout 600 make check` must exit 0;
  force-fail mode = `timeout 30 make check` must exit non-zero.
- openssl-version: raise OSSL_FLOOR from 3.0.3 to 3.0.6. OpenSSL
  3.0.3-3.0.5 ship with a known ECX EVP_PKEY_cmp regression that
  breaks test_ecx_sign_verify_raw_pub; those releases were
  superseded within months and no supported user runs them today.
  Also drop stray sanitizer CFLAGS (live in sanitizers.yml) and the
  now-unneeded continue-on-error.
- libtss2: pin shell: bash on the two `source $GITHUB_WORKSPACE/...`
  steps. The wolfprovider-test-deps:bookworm container defaults to
  dash, which errors with "source: not found" before any build runs.
- sanitizers: drop -static-libasan and use LD_PRELOAD'd libasan so
  the libwolfprov.so the openssl binary dlopens shares a single ASan
  runtime instead of doubling up; relax ASAN_OPTIONS so OpenSSL's
  intentional process-lifetime allocations don't kill the test
  before it starts.
- curl.yml: bump timeout-minutes 20->40. Phase B collapse doubled
  the wall time and FIPS+curl-7_88_1 was hitting the cap.
- sanitizers.yml: OpenSSL builds with -fsanitize-recover=undefined
  so benign upstream UBSan trips don't abort `openssl list -providers`
  in env-setup. wolfprov flags stay strict.
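The stunnel word-splitting pitfall described in the list above is easy to reproduce in isolation. The sketch below uses a made-up log file and a hypothetical `count_args` helper; it is not the original workflow step, only the same `grep -c ... || echo 1` shape.

```shell
log="$(mktemp)"
printf 'tests run: 3\nfailed: 2\n' > "$log"   # no "failed: 0" line

# grep -c prints "0" AND exits 1 when nothing matches, so the fallback
# echo also runs and the captured value is two lines: "0" and "1".
result="$(grep -c "failed: 0" "$log" || echo 1)"

# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

unquoted="$(count_args $result)"    # word-split into two arguments
quoted="$(count_args "$result")"    # stays a single argument
echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=2 quoted=1

rm -f "$log"
```

An unquoted expansion of such a value shifts every later positional argument by one, which is consistent with the misrouted check-workflow-result.sh call described above; asserting on the exit code avoids the captured value entirely.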
The collapse trimmed PR job count by inlining force_fail as sequential
steps, but these workflows are now nightly-only via nightly-osp.yml,
so job count no longer matters. Restoring matrix-axis force_fail
matches master's shape -- one mode per job, cleaner check-workflow-result
invocation, no shared-state risk between modes.

Preserved everywhere:
- workflow_call/workflow_dispatch triggers (no PR/push)
- _discover-versions.yml for wolfssl_ref/openssl_ref
- ubuntu-22.04 + wolfprovider-test-deps:bookworm container
- apt-mark hold / verify-install.sh

Per-workflow specifics preserved:
- hostap: FIPS patch split (fips.patch vs wolfprov.patch)
- stunnel: FIPS patch split + inline exit-code assertion (the master
  check-workflow-result stunnel branch was a no-op due to bash
  word-splitting; inline check is correct)
- libtss2: shell:bash on `source` steps
- openssh: full force_fail axis as on master
- timeout-minutes bumped where the inline collapse needed headroom
  (curl 30, stunnel 20, nginx 30)

Real cause of "Failed to source env-setup": build-wolfprovider.sh runs
its own internal `make test` after install. That triggers `openssl list
-providers` which dlopens libwolfprov.so. With both the openssl binary
and libwolfprov.so built with ASan, two ASan runtimes load and the
dlopen fails silently (stderr swallowed by the build script's
>/dev/null 2>&1).

Fix:
- Drop sanitizer flags from OpenSSL entirely. OpenSSL is third-party;
  we don't need to chase its upstream UBSan patterns. wolfprov is what
  this job is meant to instrument.
- Export LD_PRELOAD=libasan before invoking build-wolfprovider.sh so
  the runtime is in the process when openssl dlopens the ASan-built
  libwolfprov.so during the build script's internal env-setup phase.
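One way to locate the runtime before exporting LD_PRELOAD is to ask the compiler for it. The sketch below assumes a gcc-style `-print-file-name` option and that `libasan.so` resolves to a loadable shared object on the build host; both function names are hypothetical and this is not the workflow's actual step.

```shell
# Ask the compiler where its ASan runtime lives. gcc echoes the bare name
# back when it cannot find the library, so only absolute paths are accepted.
asan_runtime_path() {
  cc="${1:-gcc}"
  path="$("$cc" -print-file-name=libasan.so 2>/dev/null)" || return 1
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  return 1 ;;
  esac
}

# Run a command with the ASan runtime preloaded, if one was found.
run_with_asan_preload() {
  if lib="$(asan_runtime_path "${CC:-gcc}")"; then
    LD_PRELOAD="$lib" "$@"
  else
    echo "libasan not found; running without LD_PRELOAD" >&2
    "$@"
  fi
}
```

Wrapping the build-wolfprovider.sh invocation in `run_with_asan_preload` would put the runtime in the process before openssl dlopens the ASan-built libwolfprov.so, without leaving LD_PRELOAD exported for later, unrelated steps.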
check-workflow-result.sh CURL/WPFF=1 block: drop the pinned
per-version expected-failure lists. Under WOLFPROV_FORCE_FAIL=1 the
suite is expected to fail somewhere; a non-zero make test-ci exit
code is enough. The pinned lists drift across curl releases and
masked real regressions while flagging unrelated drift.
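
An exit-code-only check along these lines might look like the sketch below. `check_exit_code` is a hypothetical helper, not the contents of check-workflow-result.sh; it only illustrates that under force-fail any non-zero exit counts as the expected outcome.

```shell
# Decide pass/fail from the exit code alone.
#   force_fail=1 : the test command is expected to fail (any non-zero exit)
#   otherwise    : the test command must exit 0
check_exit_code() {
  force_fail="$1"
  rc="$2"
  if [ "$force_fail" = "1" ]; then
    if [ "$rc" -ne 0 ]; then
      echo "PASS: failed as expected under force-fail (exit $rc)"
    else
      echo "FAIL: succeeded although force-fail was set"
      return 1
    fi
  else
    if [ "$rc" -eq 0 ]; then
      echo "PASS: tests succeeded"
    else
      echo "FAIL: tests exited with $rc"
      return 1
    fi
  fi
}
```

A caller would capture the status first, for example `rc=0; make test-ci || rc=$?`, and then run `check_exit_code "${WOLFPROV_FORCE_FAIL:-0}" "$rc"`.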

curl.yml: add Checkout OSP + Apply OSP curl patch steps before the
Build curl step so the wolfssl/osp/wolfProvider/curl/<ref>-wolfprov.patch
gets applied (skip cleanly if the patch file doesn't exist for the
matrix curl_ref).
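
The "apply if present, otherwise skip cleanly" behavior can be sketched as below. `apply_patch_if_present` is a hypothetical helper, and the `-p1` strip level is an assumption about how the OSP patches are rooted.

```shell
# Apply a patch inside src_dir when the patch file exists; otherwise
# report the skip and succeed so the job continues unpatched.
apply_patch_if_present() {
  src_dir="$1"
  patch_file="$2"
  if [ ! -f "$patch_file" ]; then
    echo "No patch at $patch_file; skipping"
    return 0
  fi
  # -d changes into the source tree first; -p1 strips the leading a/ b/.
  patch -d "$src_dir" -p1 < "$patch_file"
}
```

Returning 0 on a missing file keeps matrix entries without a matching `<ref>-wolfprov.patch` green, while a patch that exists but fails to apply still fails the step.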
While the corresponding wolfssl/osp PR is in flight, consume the candidate
OSP patches from a local osp-patches-staging/ tree so nightly OSP CI can
exercise them end-to-end. Each per-app workflow now pulls its FIPS
variant (or both variants) from osp-patches-staging/wolfProvider/<app>/
instead of cloning wolfssl/osp.

  - krb5     : FIPS variant from staging; non-FIPS unchanged
  - hostap   : FIPS variant from staging (new -fips patch); non-FIPS unchanged
  - stunnel  : both variants from staging (same content; covers session
               resumption + FIPS-mode test self-checks)
  - libssh2  : debian variant from staging (adds tests/mansyntax.sh
               LANG=C.UTF-8 fix so locale-less containers pass)
  - curl     : 7_88_1 variant from staging (adds test 1560 to DISABLED).
               Also fixes a pre-existing ordering bug where the Apply step
               ran before the Build step's checkout, leaving the patch
               effectively unused.

Once the wolfssl/osp PR merges these patches, revert the workflow path
edits back to $GITHUB_WORKSPACE/osp/wolfProvider/... and delete this
staging tree.