Add GitHub Actions workflow for benchmark load tests with baseline comparison by Copilot · Pull Request #801 · nextcloud/whiteboard

Copilot · 2025-11-21T07:55:57Z

Automates load testing on PRs touching websocket server or benchmark code. Posts formatted results with baseline comparison as PR comments.

Implementation

- Workflow trigger: PR changes to websocket_server/, tools/benchmarks/, or the workflow file; manual runs via workflow_dispatch
- CI-optimized execution: concurrency levels of 50 and 100 users, 45 s per level (production benchmarks use 50, 100, 300, and 500)
- Baseline comparison: reads the baseline from tools/benchmarks/results.json and shows deltas with color indicators (🟢 improvement, 🔴 regression, ⚪ no change)
- Result formatting: Markdown table with CPU, memory, network, and message metrics, plus collapsible JSON details
- Error handling: JSON validation via Node.js, graceful failure messaging, and artifact upload on all outcomes
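As a rough illustration of the result-formatting step, the sketch below renders one benchmark result as a Markdown table row. The field names (`cpu.average`, `memory.averageRssMb`, `loadSummary.droppedConnections`, and so on) follow the JSON shown in the bot comments; the helper names `renderRow` and `renderTable` are hypothetical and not taken from the workflow itself.

```javascript
// Sketch only: helper names are hypothetical; field names follow the
// JSON result objects shown in the benchmark comments.
function renderRow(result) {
  const { concurrency, cpu, memory, loadSummary } = result;
  const cells = [
    String(concurrency),
    `${cpu.average.toFixed(2)}%`,
    `${cpu.peak.toFixed(2)}%`,
    memory.averageRssMb.toFixed(2),
    memory.peakRssMb.toFixed(2),
    String(loadSummary.droppedConnections),
  ];
  return `| ${cells.join(' | ')} |`;
}

function renderTable(results) {
  const header =
    '| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Dropped Connections |';
  const divider = '|---|---|---|---|---|---|';
  return [header, divider, ...results.map(renderRow)].join('\n');
}

// Values copied from the first 50-user run posted on this PR.
const sample = {
  concurrency: 50,
  cpu: { average: 31.091304347826078, peak: 132 },
  memory: { averageRssMb: 216.54976222826087, peakRssMb: 223.71484375 },
  loadSummary: { droppedConnections: 0 },
};

console.log(renderTable([sample]));
```

This covers only the table; the collapsible JSON details would be emitted separately.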

Example Output

The workflow produces PR comments with baseline comparison including network and message metrics:

## 🚀 Benchmark Load Test Results

Comparison with baseline from `tools/benchmarks/results.json`.
Legend: 🟢 improvement, 🔴 regression, ⚪ no change

| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Bytes Sent | Bytes Received | Messages Sent | Messages Received | Dropped Connections |
|------------------|---------|----------|-----------------|------------------|------------|----------------|---------------|-------------------|--------------------|
| 50 | 11.50% (🔴+1.03%) | 50.00% (🔴+2.30%) | 250.00 (🔴+7.23) | 270.00 (🔴+6.61) | 908,445 (🔴+759,123) | 70,074,578 (🔴+46,579,353) | 5,323 | 109,385 | 0 (⚪0) |
| 100 | 18.50% (🟢-1.37%) | 52.00% (🟢-3.60%) | 350.00 (🟢-13.73) | 365.00 (🟢-5.77) | 1,800,000 (🔴+1,500,981) | 140,000,000 (🔴+43,389,016) | 10,500 | 218,000 | 0 (⚪0) |

<details>
<summary>📊 50 concurrent users</summary>
...detailed JSON...
</details>

Network byte metrics show diffs compared to baseline, while message counts are new metrics not in the baseline. Large numbers are formatted with thousand separators for readability.
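The thousand-separator formatting can be sketched with the built-in `Number.prototype.toLocaleString`. Whether the workflow uses this call is an assumption, and `formatCount` is a hypothetical helper name.

```javascript
// Sketch only: formatCount is a hypothetical helper, not taken from the workflow.
// The 'en-US' locale yields comma-separated thousands.
function formatCount(value) {
  return value.toLocaleString('en-US');
}

console.log(formatCount(908445));   // bytesSent from the example row
console.log(formatCount(70074578)); // bytesReceived from the example row
```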

Uses existing tools/benchmarks/runBenchmarks.mjs script with extraction via sed and validation via Node.js JSON.parse.
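A minimal sketch of the validation step, assuming the extracted text is passed in as a string. The function name `parseBenchmarkOutput` and the shape of its return value are hypothetical; only the use of `JSON.parse` comes from the description above.

```javascript
// Sketch only: parseBenchmarkOutput and its return shape are hypothetical.
// It never throws, so the caller can post a failure message instead of crashing.
function parseBenchmarkOutput(text) {
  if (typeof text !== 'string' || text.trim() === '') {
    return { ok: false, error: 'benchmark output is empty' };
  }
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
}

const good = parseBenchmarkOutput('{"concurrency": 50}');
const bad = parseBenchmarkOutput('not json');
console.log(good.ok, bad.ok);
```

Returning a result object rather than throwing keeps the graceful-failure path explicit: the caller checks `ok` and decides what to post.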

Original prompt

On benchmarks, I want to have a GitHub Action that automatically runs the load test as described in the README and prints the result. If it runs on a pull request, I want it to post a comment with the result.

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

github-actions · 2025-11-21T08:09:12Z

🚀 Benchmark Load Test Results

These results show the performance characteristics of the websocket server under load.

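| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Dropped Connections |
|------------------|---------|----------|-----------------|------------------|--------------------|
| 50 | 31.09% | 132.00% | 216.55 | 223.71 | 0 |
| 100 | 63.76% | 145.00% | 418.70 | 498.18 | 0 |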

Details

📊 50 concurrent users

{
  "concurrency": 50,
  "cpu": {
    "average": 31.091304347826078,
    "peak": 132
  },
  "memory": {
    "averageRssMb": 216.54976222826087,
    "peakRssMb": 223.71484375
  },
  "network": {
    "bytesIn": 0,
    "bytesOut": 0
  },
  "loadSummary": {
    "serverUrl": "http://127.0.0.1:3002",
    "roomId": "benchmark-room",
    "concurrency": 50,
    "activeSenders": 15,
    "activeRatio": 0.3,
    "durationSeconds": 45,
    "updateRate": 3,
    "bytesSent": 457958,
    "bytesReceived": 28560682,
    "messagesSent": 2697,
    "messagesReceived": 72316,
    "averageJoinDelayMs": 109.18,
    "droppedConnections": 0
  }
}

📊 100 concurrent users

{
  "concurrency": 100,
  "cpu": {
    "average": 63.76382978723404,
    "peak": 145
  },
  "memory": {
    "averageRssMb": 418.6961436170213,
    "peakRssMb": 498.17578125
  },
  "network": {
    "bytesIn": 0,
    "bytesOut": 0
  },
  "loadSummary": {
    "serverUrl": "http://127.0.0.1:3002",
    "roomId": "benchmark-room",
    "concurrency": 100,
    "activeSenders": 30,
    "activeRatio": 0.3,
    "durationSeconds": 45,
    "updateRate": 3,
    "bytesSent": 917154,
    "bytesReceived": 103672940,
    "messagesSent": 5374,
    "messagesReceived": 182705,
    "averageJoinDelayMs": 239.39,
    "droppedConnections": 0
  }
}

Note: These benchmarks run with reduced load (50, 100 users) for CI efficiency. For full benchmarks, see the README.

juliusknorr · 2025-11-21T08:24:09Z

@copilot I'd like to store a baseline of the output in the repo and when posting the pull request comment show the diffed values.

Copilot · 2025-11-21T08:27:36Z

> @copilot I'd like to store a baseline of the output in the repo and when posting the pull request comment show the diffed values.

Implemented in commit a042309. The workflow now:

- Reads the baseline from tools/benchmarks/results.json
- Compares current results with the baseline
- Shows deltas with color indicators:
  - 🟢 improvements (lower CPU/memory, fewer dropped connections)
  - 🔴 regressions (higher CPU/memory, more dropped connections)
  - ⚪ no change

Example output:

| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Dropped Connections |
|------------------|---------|----------|-----------------|------------------|--------------------|
| 50 | 11.50% (🔴+1.03%) | 50.00% (🔴+2.30%) | 250.00 (🔴+7.23) | 270.00 (🔴+6.61) | 0 (⚪0) |
| 100 | 18.50% (🟢-1.37%) | 52.00% (🟢-3.60%) | 350.00 (🟢-13.73) | 365.00 (🟢-5.77) | 0 (⚪0) |

Falls back to regular display if baseline is not found.
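The delta display and the fallback can be sketched as follows. The helper name `formatWithDelta` and its options are hypothetical. The sketch assumes "lower is better" for every metric, as in the legend above, and treats a missing baseline value as the signal to fall back to the plain display.

```javascript
// Sketch only: formatWithDelta is a hypothetical helper, not taken from the workflow.
// Assumes lower values are better (CPU, memory, dropped connections).
function formatWithDelta(current, baseline, { decimals = 2, suffix = '' } = {}) {
  const value = `${current.toFixed(decimals)}${suffix}`;

  // Fallback: no usable baseline, so show the plain value.
  if (typeof baseline !== 'number' || Number.isNaN(baseline)) {
    return value;
  }

  // Round first so floating-point noise does not show up as a change.
  const rounded = Number((current - baseline).toFixed(decimals));
  if (rounded === 0) {
    return `${value} (⚪${(0).toFixed(decimals)}${suffix})`;
  }

  const icon = rounded < 0 ? '🟢' : '🔴';
  const sign = rounded > 0 ? '+' : '-';
  return `${value} (${icon}${sign}${Math.abs(rounded).toFixed(decimals)}${suffix})`;
}

// Baselines here are back-computed from the example rows above.
console.log(formatWithDelta(11.5, 10.47, { suffix: '%' }));
console.log(formatWithDelta(350, 363.73));
console.log(formatWithDelta(0, 0, { decimals: 0 }));
console.log(formatWithDelta(216.55, undefined));
```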

github-actions · 2025-11-21T08:48:33Z

🚀 Benchmark Load Test Results

Comparison with baseline from tools/benchmarks/results.json.
Legend: 🟢 improvement, 🔴 regression, ⚪ no change

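| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Dropped Connections |
|------------------|---------|----------|-----------------|------------------|--------------------|
| 50 | 0.00% (🟢-10.47%) | 0.00% (🟢-47.70%) | 224.00 (🟢-18.77) | 245.34 (🟢-18.05) | 0 (⚪0) |
| 100 | 0.00% (🟢-19.87%) | 0.00% (🟢-55.60%) | 327.62 (🟢-36.11) | 369.86 (🟢-0.91) | 0 (⚪0) |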

Details

📊 50 concurrent users

{
  "concurrency": 50,
  "cpu": {
    "average": 0,
    "peak": 0
  },
  "memory": {
    "averageRssMb": 223.9958444148936,
    "peakRssMb": 245.3359375
  },
  "network": {
    "bytesIn": 0,
    "bytesOut": 0
  },
  "loadSummary": {
    "serverUrl": "http://127.0.0.1:3002",
    "roomId": "benchmark-room",
    "concurrency": 50,
    "activeSenders": 15,
    "activeRatio": 0.3,
    "durationSeconds": 45,
    "updateRate": 3,
    "bytesSent": 457512,
    "bytesReceived": 34522513,
    "messagesSent": 2694,
    "messagesReceived": 104054,
    "averageJoinDelayMs": 142.28,
    "droppedConnections": 0
  }
}

📊 100 concurrent users

{
  "concurrency": 100,
  "cpu": {
    "average": 0,
    "peak": 0
  },
  "memory": {
    "averageRssMb": 327.6195977393617,
    "peakRssMb": 369.85546875
  },
  "network": {
    "bytesIn": 0,
    "bytesOut": 0
  },
  "loadSummary": {
    "serverUrl": "http://127.0.0.1:3002",
    "roomId": "benchmark-room",
    "concurrency": 100,
    "activeSenders": 30,
    "activeRatio": 0.3,
    "durationSeconds": 45,
    "updateRate": 3,
    "bytesSent": 915782,
    "bytesReceived": 103712476,
    "messagesSent": 5366,
    "messagesReceived": 250862,
    "averageJoinDelayMs": 317.19,
    "droppedConnections": 0
  }
}

Note: These benchmarks run with reduced load (50, 100 users) for CI efficiency. For full benchmarks, see the README.

github-actions · 2025-11-21T08:49:43Z

🚀 Benchmark Load Test Results

Comparison with baseline from tools/benchmarks/results.json.
Legend: 🟢 improvement, 🔴 regression, ⚪ no change

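| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Dropped Connections |
|------------------|---------|----------|-----------------|------------------|--------------------|
| 50 | 0.00% (🟢-10.47%) | 0.00% (🟢-47.70%) | 167.21 (🟢-75.56) | 179.20 (🟢-84.19) | 0 (⚪0) |
| 100 | 0.00% (🟢-19.87%) | 0.00% (🟢-55.60%) | 281.16 (🟢-82.57) | 307.07 (🟢-63.70) | 0 (⚪0) |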

Details

📊 50 concurrent users

{
  "concurrency": 50,
  "cpu": {
    "average": 0,
    "peak": 0
  },
  "memory": {
    "averageRssMb": 167.21017323369566,
    "peakRssMb": 179.203125
  },
  "network": {
    "bytesIn": 0,
    "bytesOut": 0
  },
  "loadSummary": {
    "serverUrl": "http://127.0.0.1:3002",
    "roomId": "benchmark-room",
    "concurrency": 50,
    "activeSenders": 15,
    "activeRatio": 0.3,
    "durationSeconds": 45,
    "updateRate": 3,
    "bytesSent": 456453,
    "bytesReceived": 25743126,
    "messagesSent": 2688,
    "messagesReceived": 68888,
    "averageJoinDelayMs": 259.12,
    "droppedConnections": 0
  }
}

📊 100 concurrent users

{
  "concurrency": 100,
  "cpu": {
    "average": 0,
    "peak": 0
  },
  "memory": {
    "averageRssMb": 281.1586277173913,
    "peakRssMb": 307.07421875
  },
  "network": {
    "bytesIn": 0,
    "bytesOut": 0
  },
  "loadSummary": {
    "serverUrl": "http://127.0.0.1:3002",
    "roomId": "benchmark-room",
    "concurrency": 100,
    "activeSenders": 30,
    "activeRatio": 0.3,
    "durationSeconds": 45,
    "updateRate": 3,
    "bytesSent": 908445,
    "bytesReceived": 70074578,
    "messagesSent": 5323,
    "messagesReceived": 109385,
    "averageJoinDelayMs": 607.58,
    "droppedConnections": 0
  }
}

Note: These benchmarks run with reduced load (50, 100 users) for CI efficiency. For full benchmarks, see the README.

juliusknorr · 2025-11-21T08:58:02Z

@copilot Those values should also be included in the table and diff

"bytesSent": 908445,
"bytesReceived": 70074578,
"messagesSent": 5323,
"messagesReceived": 109385,

juliusknorr · 2025-11-21T09:37:18Z

@copilot Those values should also be included in the table and diff

juliusknorr · 2025-11-21T09:37:43Z

@copilot Those values should also be included in the table and diff

"bytesSent": 908445,
"bytesReceived": 70074578,
"messagesSent": 5323,
"messagesReceived": 109385,

Copilot · 2025-11-21T09:41:57Z

> @copilot Those values should also be included in the table and diff
>
> "bytesSent": 908445,...

Implemented in commit de03398. Added four new columns to the results table:

- Bytes Sent: with diff vs. baseline ingressBytes
- Bytes Received: with diff vs. baseline egressBytes
- Messages Sent: formatted with thousand separators
- Messages Received: formatted with thousand separators

Example output:

| Concurrent Users | Avg CPU | Peak CPU | Avg Memory (MB) | Peak Memory (MB) | Bytes Sent | Bytes Received | Messages Sent | Messages Received | Dropped Connections |
|------------------|---------|----------|-----------------|------------------|------------|----------------|---------------|-------------------|--------------------|
| 50 | 11.50% (🔴+1.03%) | 50.00% (🔴+2.30%) | 250.00 (🔴+7.23) | 270.00 (🔴+6.61) | 908,445 (🔴+759,123) | 70,074,578 (🔴+46,579,353) | 5,323 | 109,385 | 0 (⚪0) |

Network bytes show diffs compared to baseline, while message counts are new metrics not in the baseline.
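The mapping between the load summary and the baseline can be sketched as below. Only the names `ingressBytes` and `egressBytes` come from the comment above; the shape of the baseline entry and the helper names `withByteDelta` and `networkCells` are assumptions. The baseline numbers in the usage example are back-computed from the example row, not read from the real tools/benchmarks/results.json.

```javascript
// Sketch only: helper names and the baseline entry shape are hypothetical.
function withByteDelta(current, baseline) {
  const value = current.toLocaleString('en-US');
  if (typeof baseline !== 'number') return value; // no baseline: plain value
  const delta = current - baseline;
  if (delta === 0) return `${value} (⚪0)`;
  const icon = delta < 0 ? '🟢' : '🔴';
  const sign = delta > 0 ? '+' : '-';
  return `${value} (${icon}${sign}${Math.abs(delta).toLocaleString('en-US')})`;
}

function networkCells(loadSummary, baselineEntry = {}) {
  return [
    // Bytes sent by the load clients are compared with the baseline ingressBytes.
    withByteDelta(loadSummary.bytesSent, baselineEntry.ingressBytes),
    // Bytes received by the load clients are compared with the baseline egressBytes.
    withByteDelta(loadSummary.bytesReceived, baselineEntry.egressBytes),
    // Message counts have no baseline counterpart, so they carry no diff.
    loadSummary.messagesSent.toLocaleString('en-US'),
    loadSummary.messagesReceived.toLocaleString('en-US'),
  ];
}

const cells = networkCells(
  { bytesSent: 908445, bytesReceived: 70074578, messagesSent: 5323, messagesReceived: 109385 },
  { ingressBytes: 149322, egressBytes: 23495225 }, // back-computed, illustrative
);
console.log(cells.join(' | '));
```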

juliusknorr · 2025-11-25T20:58:34Z

Closing, as I was just experimenting. I don't think the baseline comparison is useful yet; we could think about other metrics to compare, though.

Initial plan

a2abedb

Copilot AI assigned Copilot and juliusknorr Nov 21, 2025

Copilot started work on behalf of juliusknorr November 21, 2025 07:56

Copilot AI and others added 5 commits November 21, 2025 08:01

Add GitHub Actions workflow for benchmark load tests

b1d6af5

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Add error handling and validation to benchmark workflow

13dacb2

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Replace jq with Node.js for JSON validation and formatting

86bb6cb

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Fix JSON validation and add await to async API call

edd2844

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Clean up unused variable and fix stderr redirect

fbc3b8d

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add GitHub action for automated load testing~~ Add GitHub Actions workflow for benchmark load tests Nov 21, 2025

Copilot AI requested a review from juliusknorr November 21, 2025 08:10

Copilot finished work on behalf of juliusknorr November 21, 2025 08:10

Copilot started work on behalf of juliusknorr November 21, 2025 08:24

Add baseline comparison to benchmark results

a042309

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Fix baseline property access for droppedConnections

b0bcb82

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Copilot AI changed the title ~~Add GitHub Actions workflow for benchmark load tests~~ Add GitHub Actions workflow for benchmark load tests with baseline comparison Nov 21, 2025

Copilot finished work on behalf of juliusknorr November 21, 2025 08:31

Copilot stopped work on behalf of juliusknorr due to an error November 21, 2025 09:37
Copilot has encountered an error. See logs for additional details.

Copilot started work on behalf of juliusknorr November 21, 2025 09:38

Add network and message metrics to benchmark results table

de03398

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Add clarifying comments for network byte mapping

6bd5370

Co-authored-by: juliusknorr <3404133+juliusknorr@users.noreply.github.com>

Copilot finished work on behalf of juliusknorr November 21, 2025 09:45

juliusknorr closed this Nov 25, 2025
