
feat(otel_thread_ctx): thread-level ctx publication #1791

Open
yannham wants to merge 13 commits into main from yannham/thread-ctx-tls-shim


@yannham
Contributor

@yannham yannham commented Mar 25, 2026

What does this PR do?

This PR implements a first version of the publishing side of the OTel proposal for thread-level context sharing.

The changes are twofold:

  1. Add a C shim for accessing the thread-local variable that holds the thread context. The spec mandates the TLSDESC dialect, but Rust uses the traditional TLS dialect (gnu rather than gnu2) by default, and this cannot be configured on stable Rust. See the additional notes below for more details.
  2. Add a new otel_thread_ctx module, similar to the otel_process_ctx one, which provides an abstraction over the thread-level context and handles attaching, detaching, and modifying it.
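To make the attach/detach part of item 2 more concrete, here is a minimal Rust sketch of a publishing side, assuming a per-thread slot that holds a pointer to the current context record. All names (`ThreadCtxRecord`, `attach`, `detach`, `current_span_id`) and the record fields are hypothetical; they do not reflect the spec's actual layout or this PR's API. To keep the sketch self-contained, the slot is a plain `thread_local!`, whereas in the PR its address would come from the C shim built with TLSDESC.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical per-thread context record (illustrative fields, not the spec layout).
#[repr(C)]
pub struct ThreadCtxRecord {
    pub trace_id: [u8; 16],
    pub span_id: [u8; 8],
}

thread_local! {
    // Stand-in for the thread-local variable that the C shim exposes.
    static SLOT: AtomicPtr<ThreadCtxRecord> = const { AtomicPtr::new(ptr::null_mut()) };
}

/// Publishes `record` for the current thread and hands back the previously
/// attached record, if any, so the caller decides when to free it.
pub fn attach(record: Box<ThreadCtxRecord>) -> Option<Box<ThreadCtxRecord>> {
    let new = Box::into_raw(record);
    // Release: a reader that loads `new` with Acquire also sees its initialized fields.
    let old = SLOT.with(|slot| slot.swap(new, Ordering::Release));
    // SAFETY: every non-null pointer stored in SLOT comes from Box::into_raw above.
    (!old.is_null()).then(|| unsafe { Box::from_raw(old) })
}

/// Unpublishes the current thread's context and returns it to the caller.
pub fn detach() -> Option<Box<ThreadCtxRecord>> {
    let old = SLOT.with(|slot| slot.swap(ptr::null_mut(), Ordering::AcqRel));
    // SAFETY: same invariant as in `attach`.
    (!old.is_null()).then(|| unsafe { Box::from_raw(old) })
}

/// Reader-side view of the current thread's slot, e.g. what an in-process sampler sees.
pub fn current_span_id() -> Option<[u8; 8]> {
    SLOT.with(|slot| {
        let p = slot.load(Ordering::Acquire);
        // SAFETY: p is null or points to a record owned by this thread's slot.
        unsafe { p.as_ref() }.map(|r| r.span_id)
    })
}
```

Each thread only ever sees its own slot, so a record attached on one thread is invisible from another. Modification is left out, and a record that is still attached when its thread exits is leaked in this sketch.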

The parts of the spec covering the interned string table, and hooking it up to the process-level context and tracer metadata, are left for future work.

Motivation

See the corresponding OTEP linked above for more details on the motivation.

Additional Notes

TLSDESC is chosen in the spec for performance reasons. I initially found it a bit sad that we have to call a C function from libdatadog to retrieve the TLS address, since that incurs an additional cost. I researched potential alternatives:

  • Force Rust to use the TLSDESC dialect. This is simply not supported.
  • Cache the address (which is stable per thread). But the cache must live in... drumroll... another thread-local variable, since the address is thread-specific, so we're back to square one. One possibility would be a cached_addr thread-local whose accesses are forced to use the initial-exec model, which is very fast (the price to pay is that libdatadog couldn't be loaded with dlopen at runtime, but I'm not sure there's any use case for that). We would trade a function call for an offset and a couple of loads, which I expect to be faster. Unfortunately, this requires nightly Rust and would apply to the entirety of libdatadog.
  • Inline assembly: the access sequence for TLSDESC is really simple (a few LLVM IR instructions that call a function obtained from the global offset table). I looked into inline LLVM IR for Rust, but it's not supported and is a non-goal (LLVM is considered almost an implementation detail; rust-gcc could one day become a viable alternative, for example). Native assembly is just too much hassle to be worth it.
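The caching idea from the second bullet can be sketched in Rust as follows; all names are hypothetical. `shim_tls_addr` stands in for the FFI call into the C shim and is implemented with `thread_local!` only so the example runs on its own. The catch the bullet describes is visible in `cached_tls_addr`: `CACHED_ADDR` is itself a thread-local, so with Rust's default TLS model in a shared library, reading the cache likely costs about as much as what we wanted to avoid, unless it uses the initial-exec model (nightly `-Z tls-model=initial-exec`, which the bullet notes would apply to all of libdatadog).

```rust
use std::cell::Cell;
use std::ffi::c_void;
use std::ptr;

thread_local! {
    // Hypothetical stand-in for the variable owned by the C shim.
    static SHIM_VAR: Cell<usize> = const { Cell::new(0) };
    // Counts how often the stand-in shim has been called on this thread.
    static SHIM_CALLS: Cell<u32> = const { Cell::new(0) };
    // The per-thread cache: note that reading it is itself a TLS access.
    static CACHED_ADDR: Cell<*mut c_void> = const { Cell::new(ptr::null_mut()) };
}

/// Stand-in for the FFI call into the C shim (hypothetical name).
fn shim_tls_addr() -> *mut c_void {
    SHIM_CALLS.with(|n| n.set(n.get() + 1));
    SHIM_VAR.with(|v| v as *const Cell<usize> as *mut c_void)
}

/// Returns this thread's context address, calling the shim only on first use.
fn cached_tls_addr() -> *mut c_void {
    CACHED_ADDR.with(|cache| {
        let mut addr = cache.get();
        if addr.is_null() {
            addr = shim_tls_addr();
            cache.set(addr);
        }
        addr
    })
}
```

The shim is called once per thread, and every thread gets its own address, which is why the cache cannot be a plain global.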

All in all, I think there's no simple, reasonable, portable and maintainable alternative to the C shim for now. It is worth noting that calling the C shim is likely still faster than the default TLS model Rust uses in a dynamic library (which requires a call to __tls_get_addr anyway, and that function does more work). I wonder whether aggressive cross-language LTO could inline the C shim.

How to test the change?

There are a bunch of tests in this PR. Ideally, once this work is hooked up to the FFI, we will also test it against other implementations of thread-level context readers; this is left for a follow-up.

@yannham yannham requested review from ivoanjo and scottgerring March 25, 2026 13:25
@yannham yannham self-assigned this Mar 25, 2026
@yannham yannham requested review from a team as code owners March 25, 2026 13:25
@yannham yannham requested review from mtoffl01 and removed request for a team March 25, 2026 13:25
@github-actions

github-actions bot commented Mar 25, 2026

📚 Documentation Check Results

⚠️ 798 documentation warning(s) found

📦 libdd-library-config - 150 warning(s)

📦 libdd-profiling - 648 warning(s)


Updated: 2026-03-27 16:19:14 UTC | Commit: ea242af | missing-docs job results

@github-actions

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/yannham/thread-ctx-tls-shim

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

Crate                       Base Branch  PR Branch  Change
clippy-annotation-reporter            5          5  No change (0%)
datadog-ffe-ffi                       1          1  No change (0%)
datadog-ipc                          20         20  No change (0%)
datadog-live-debugger                 6          6  No change (0%)
datadog-live-debugger-ffi            10         10  No change (0%)
datadog-profiling-replayer            4          4  No change (0%)
datadog-remote-config                 3          3  No change (0%)
datadog-sidecar                      55         55  No change (0%)
libdd-common                         10         10  No change (0%)
libdd-common-ffi                     12         12  No change (0%)
libdd-data-pipeline                   5          5  No change (0%)
libdd-ddsketch                        2          2  No change (0%)
libdd-dogstatsd-client                1          1  No change (0%)
libdd-profiling                      13         13  No change (0%)
libdd-telemetry                      19         19  No change (0%)
libdd-tinybytes                       4          4  No change (0%)
libdd-trace-normalization             2          2  No change (0%)
libdd-trace-obfuscation               8          8  No change (0%)
libdd-trace-utils                    15         15  No change (0%)
Total                               195        195  No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions

github-actions bot commented Mar 25, 2026

🔒 Cargo Deny Results

No issues found!

📦 libdd-library-config - ✅ No issues

📦 libdd-profiling - ✅ No issues


Updated: 2026-03-27 16:22:39 UTC | Commit: ea242af | dependency-check job results

@datadog-datadog-prod-us1
Contributor

datadog-datadog-prod-us1 bot commented Mar 25, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 98.80%
Overall Coverage: 71.28% (+0.11%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ce8e143

@codecov-commenter

codecov-commenter commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 98.80478% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.28%. Comparing base (a29b90b) to head (ce8e143).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1791      +/-   ##
==========================================
+ Coverage   71.16%   71.28%   +0.11%     
==========================================
  Files         414      415       +1     
  Lines       65739    65990     +251     
==========================================
+ Hits        46786    47038     +252     
+ Misses      18953    18952       -1     
Components                 Coverage Δ
libdd-crashtracker         65.33% <ø> (+0.06%) ⬆️
libdd-crashtracker-ffi     35.24% <ø> (ø)
libdd-alloc                98.77% <ø> (ø)
libdd-data-pipeline        87.55% <ø> (ø)
libdd-data-pipeline-ffi    75.77% <ø> (ø)
libdd-common               79.79% <ø> (ø)
libdd-common-ffi           73.87% <ø> (ø)
libdd-telemetry            62.48% <ø> (ø)
libdd-telemetry-ffi        16.75% <ø> (ø)
libdd-dogstatsd-client     82.64% <ø> (ø)
datadog-ipc                72.56% <ø> (ø)
libdd-profiling            82.06% <98.80%> (+0.43%) ⬆️
libdd-profiling-ffi        64.94% <ø> (ø)
datadog-sidecar            30.79% <ø> (ø)
datdog-sidecar-ffi          9.37% <ø> (ø)
spawn-worker               54.69% <ø> (ø)
libdd-tinybytes            93.16% <ø> (ø)
libdd-trace-normalization  81.71% <ø> (ø)
libdd-trace-obfuscation    87.37% <ø> (ø)
libdd-trace-protobuf       68.25% <ø> (ø)
libdd-trace-utils          88.64% <ø> (ø)
datadog-tracer-flare       86.88% <ø> (ø)
libdd-log                  74.69% <ø> (ø)

@pr-commenter

pr-commenter bot commented Mar 25, 2026

Benchmarks

Comparison

Benchmark execution time: 2026-03-27 16:36:02

Comparing candidate commit ce8e143 in PR branch yannham/thread-ctx-tls-shim with baseline commit a29b90b in branch main.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 60 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
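The significance rule described above can be expressed as a small Rust function. This is a sketch, not the benchmarking platform's code: `classify` and `Verdict` are hypothetical names, the CI bounds are relative differences of means in percent, and `threshold_pct` stands in for the configured SIGNIFICANT_IMPACT_THRESHOLD, whose actual value this report does not state (the diagram above uses 1% for illustration).

```rust
/// Verdict for an execution-time metric, where a positive difference means slower.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement,
    Regression,
    NotSignificant,
}

/// Classifies a CI [ci_low_pct, ci_high_pct] over the relative difference of means.
/// The change is significant only if the whole CI lies outside [-threshold, +threshold].
fn classify(ci_low_pct: f64, ci_high_pct: f64, threshold_pct: f64) -> Verdict {
    assert!(ci_low_pct <= ci_high_pct && threshold_pct >= 0.0);
    if ci_low_pct > threshold_pct {
        // Entire CI above +threshold: the candidate is significantly slower.
        Verdict::Regression
    } else if ci_high_pct < -threshold_pct {
        // Entire CI below -threshold: the candidate is significantly faster.
        Verdict::Improvement
    } else {
        // The CI overlaps the threshold band.
        Verdict::NotSignificant
    }
}
```

With a 1% threshold, the first diagram's CI [-0.6%, +1.2%] overlaps the band and is not significant, while the second diagram's [+1.3%, +3.1%] and receiver_entry_point/report/2598's [+7.523%, +7.854%] both classify as regressions.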

scenario:receiver_entry_point/report/2598

  • 🟥 execution_time [+260.175µs; +271.632µs] or [+7.523%; +7.854%]

scenario:tags/replace_trace_tags

  • 🟥 execution_time [+96.550ns; +103.368ns] or [+4.040%; +4.325%]

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 2.457µs 2.490µs ± 0.018µs 2.486µs ± 0.010µs 2.499µs 2.532µs 2.535µs 2.536µs 2.02% 0.900 0.202 0.73% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [2.487µs; 2.492µs] or [-0.102%; +0.102%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 13.892µs 14.324µs ± 0.389µs 14.078µs ± 0.090µs 14.651µs 15.035µs 15.539µs 15.662µs 11.25% 1.064 0.423 2.71% 0.028µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [14.270µs; 14.378µs] or [-0.376%; +0.376%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
profile_add_sample_frames_x1000 execution_time 4.170ms 4.174ms ± 0.003ms 4.173ms ± 0.001ms 4.174ms 4.178ms 4.181ms 4.198ms 0.60% 3.963 28.178 0.07% 0.000ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
profile_add_sample_frames_x1000 execution_time [4.173ms; 4.174ms] or [-0.009%; +0.009%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sdk_test_data/rules-based execution_time 144.724µs 146.500µs ± 1.855µs 146.223µs ± 0.479µs 146.693µs 148.158µs 153.849µs 165.115µs 12.92% 6.468 55.230 1.26% 0.131µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sdk_test_data/rules-based execution_time [146.243µs; 146.757µs] or [-0.175%; +0.175%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 185.634µs 186.180µs ± 0.332µs 186.138µs ± 0.178µs 186.332µs 186.612µs 187.777µs 187.959µs 0.98% 2.065 8.146 0.18% 0.023µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 5320310.632op/s 5371175.791op/s ± 9544.837op/s 5372347.320op/s ± 5149.375op/s 5377178.672op/s 5382725.256op/s 5385552.208op/s 5386950.006op/s 0.27% -2.034 7.955 0.18% 674.922op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 17.826µs 17.905µs ± 0.028µs 17.904µs ± 0.018µs 17.923µs 17.955µs 17.967µs 17.983µs 0.44% 0.135 0.073 0.16% 0.002µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 55608230.205op/s 55850150.642op/s ± 88674.839op/s 55854056.655op/s ± 54803.845op/s 55905696.956op/s 55990145.735op/s 56070701.630op/s 56097517.284op/s 0.44% -0.126 0.073 0.16% 6270.258op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 10.541µs 10.587µs ± 0.026µs 10.582µs ± 0.015µs 10.603µs 10.626µs 10.643µs 10.784µs 1.91% 2.408 14.597 0.25% 0.002µs 1 200
normalization/normalize_name/normalize_name/good throughput 92732771.551op/s 94453964.274op/s ± 231031.474op/s 94499790.067op/s ± 134660.123op/s 94617212.501op/s 94722360.694op/s 94759155.430op/s 94867591.434op/s 0.39% -2.330 13.834 0.24% 16336.392op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [186.134µs; 186.226µs] or [-0.025%; +0.025%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [5369852.969op/s; 5372498.614op/s] or [-0.025%; +0.025%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [17.901µs; 17.909µs] or [-0.022%; +0.022%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [55837861.162op/s; 55862440.122op/s] or [-0.022%; +0.022%] None None None
normalization/normalize_name/normalize_name/good execution_time [10.584µs; 10.591µs] or [-0.034%; +0.034%] None None None
normalization/normalize_name/normalize_name/good throughput [94421945.534op/s; 94485983.015op/s] or [-0.034%; +0.034%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 288.948µs 289.514µs ± 0.479µs 289.458µs ± 0.157µs 289.594µs 289.829µs 291.467µs 293.769µs 1.49% 5.857 44.277 0.17% 0.034µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [289.448µs; 289.581µs] or [-0.023%; +0.023%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 49.805ms 50.142ms ± 0.901ms 50.002ms ± 0.076ms 50.109ms 50.233ms 55.375ms 59.155ms 18.30% 7.901 66.243 1.79% 0.064ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [50.017ms; 50.267ms] or [-0.249%; +0.249%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
profile_serialize_compressed_pprof_timestamped_x1000 execution_time 911.572µs 914.303µs ± 1.310µs 914.276µs ± 0.917µs 915.154µs 916.601µs 917.439µs 918.241µs 0.43% 0.365 -0.088 0.14% 0.093µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
profile_serialize_compressed_pprof_timestamped_x1000 execution_time [914.121µs; 914.484µs] or [-0.020%; +0.020%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
profile_add_sample2_frames_x1000 execution_time 742.216µs 743.110µs ± 0.463µs 743.099µs ± 0.283µs 743.343µs 744.053µs 744.266µs 745.268µs 0.29% 0.923 2.165 0.06% 0.033µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
profile_add_sample2_frames_x1000 execution_time [743.046µs; 743.174µs] or [-0.009%; +0.009%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
ip_address/quantize_peer_ip_address_benchmark execution_time 5.049µs 5.105µs ± 0.047µs 5.082µs ± 0.017µs 5.149µs 5.192µs 5.194µs 5.198µs 2.28% 0.738 -1.069 0.92% 0.003µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark execution_time [5.099µs; 5.112µs] or [-0.128%; +0.128%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 242.329ns 254.525ns ± 14.115ns 248.872ns ± 4.346ns 257.673ns 286.725ns 295.341ns 296.056ns 18.96% 1.644 1.499 5.53% 0.998ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [252.569ns; 256.481ns] or [-0.769%; +0.769%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
single_flag_killswitch/rules-based execution_time 190.341ns 192.504ns ± 1.677ns 192.448ns ± 1.325ns 193.413ns 195.366ns 197.761ns 198.081ns 2.93% 0.805 0.578 0.87% 0.119ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
single_flag_killswitch/rules-based execution_time [192.271ns; 192.736ns] or [-0.121%; +0.121%] None None None

Group 13

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 33.657µs 34.338µs ± 1.242µs 33.776µs ± 0.054µs 33.841µs 37.022µs 37.055µs 37.496µs 11.01% 1.698 0.913 3.61% 0.088µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [34.166µs; 34.511µs] or [-0.501%; +0.501%] None None None

Group 14

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 5.397µs 5.461µs ± 0.028µs 5.467µs ± 0.020µs 5.482µs 5.498µs 5.511µs 5.520µs 0.98% -0.362 -0.777 0.50% 0.002µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [5.457µs; 5.464µs] or [-0.070%; +0.070%] None None None

Group 15

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching serializing traces from their internal representation to msgpack execution_time 13.767ms 13.814ms ± 0.029ms 13.809ms ± 0.015ms 13.827ms 13.866ms 13.914ms 13.956ms 1.06% 1.640 4.474 0.21% 0.002ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching serializing traces from their internal representation to msgpack execution_time [13.810ms; 13.818ms] or [-0.029%; +0.029%] None None None

Group 16

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
profile_add_sample_timestamped_x1000 execution_time 4.183ms 4.187ms ± 0.008ms 4.186ms ± 0.001ms 4.187ms 4.190ms 4.196ms 4.296ms 2.62% 11.959 155.352 0.19% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
profile_add_sample_timestamped_x1000 execution_time [4.186ms; 4.188ms] or [-0.027%; +0.027%] None None None

Group 17

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 160.478µs 161.521µs ± 0.241µs 161.485µs ± 0.115µs 161.608µs 161.950µs 162.293µs 162.396µs 0.56% 0.544 2.734 0.15% 0.017µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [161.488µs; 161.555µs] or [-0.021%; +0.021%] None None None

Group 18

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 495.452µs 496.384µs ± 0.645µs 496.262µs ± 0.349µs 496.638µs 497.446µs 498.879µs 499.641µs 0.68% 1.957 5.513 0.13% 0.046µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 2001435.697op/s 2014570.884op/s ± 2609.172op/s 2015064.962op/s ± 1417.204op/s 2016374.134op/s 2017442.822op/s 2018019.714op/s 2018358.541op/s 0.16% -1.943 5.437 0.13% 184.496op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 370.786µs 371.580µs ± 0.378µs 371.539µs ± 0.250µs 371.818µs 372.219µs 372.557µs 373.478µs 0.52% 0.837 2.232 0.10% 0.027µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2677533.884op/s 2691212.747op/s ± 2737.222op/s 2691507.644op/s ± 1808.505op/s 2693193.057op/s 2695157.734op/s 2695790.913op/s 2696973.456op/s 0.20% -0.827 2.182 0.10% 193.551op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 167.859µs 168.250µs ± 0.283µs 168.178µs ± 0.094µs 168.299µs 168.978µs 169.387µs 169.478µs 0.77% 2.484 6.506 0.17% 0.020µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5900470.048op/s 5943546.994op/s ± 9967.730op/s 5946076.827op/s ± 3324.289op/s 5948829.689op/s 5952750.364op/s 5954290.203op/s 5957392.452op/s 0.19% -2.472 6.450 0.17% 704.825op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 36.821µs 37.024µs ± 0.108µs 37.025µs ± 0.090µs 37.110µs 37.196µs 37.238µs 37.241µs 0.59% -0.016 -0.984 0.29% 0.008µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 26851831.753op/s 27009674.026op/s ± 78671.729op/s 27009107.923op/s ± 65760.423op/s 27076324.439op/s 27139319.893op/s 27154629.707op/s 27158777.205op/s 0.55% 0.025 -0.984 0.29% 5562.931op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 46.268µs 46.374µs ± 0.048µs 46.368µs ± 0.027µs 46.399µs 46.467µs 46.498µs 46.557µs 0.41% 0.650 0.777 0.10% 0.003µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 21479186.091op/s 21563989.791op/s ± 22468.519op/s 21566690.178op/s ± 12773.367op/s 21577776.283op/s 21595327.414op/s 21608055.460op/s 21613295.384op/s 0.22% -0.643 0.763 0.10% 1588.764op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [496.295µs; 496.474µs] or [-0.018%; +0.018%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [2014209.278op/s; 2014932.490op/s] or [-0.018%; +0.018%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [371.528µs; 371.632µs] or [-0.014%; +0.014%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2690833.394op/s; 2691592.099op/s] or [-0.014%; +0.014%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [168.211µs; 168.289µs] or [-0.023%; +0.023%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5942165.563op/s; 5944928.426op/s] or [-0.023%; +0.023%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [37.009µs; 37.039µs] or [-0.040%; +0.040%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [26998770.881op/s; 27020577.171op/s] or [-0.040%; +0.040%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [46.367µs; 46.380µs] or [-0.014%; +0.014%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [21560875.870op/s; 21567103.711op/s] or [-0.014%; +0.014%] None None None

Group 19

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 14.793ms 14.824ms ± 0.016ms 14.821ms ± 0.009ms 14.832ms 14.852ms 14.870ms 14.906ms 0.58% 1.577 4.743 0.11% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [14.822ms; 14.827ms] or [-0.015%; +0.015%] None None None

Group 20

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
receiver_entry_point/report/2598 execution_time 3.682ms 3.724ms ± 0.033ms 3.715ms ± 0.011ms 3.727ms 3.799ms 3.855ms 3.865ms 4.05% 2.234 5.228 0.89% 0.002ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
receiver_entry_point/report/2598 execution_time [3.720ms; 3.729ms] or [-0.124%; +0.124%] None None None

Group 21

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz ce8e143 1774628300 yannham/thread-ctx-tls-shim
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 3.894µs 3.912µs ± 0.003µs 3.913µs ± 0.001µs 3.914µs 3.917µs 3.918µs 3.921µs 0.20% -1.563 12.781 0.07% 0.000µs 1 200
credit_card/is_card_number/ throughput 255068094.303op/s 255594876.365op/s ± 169814.663op/s 255589546.979op/s ± 95600.394op/s 255691721.981op/s 255815248.608op/s 255862585.104op/s 256831213.078op/s 0.49% 1.587 12.976 0.07% 12007.710op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 78.904µs 79.609µs ± 0.356µs 79.585µs ± 0.223µs 79.789µs 80.257µs 80.590µs 80.883µs 1.63% 0.771 0.860 0.45% 0.025µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 12363602.966op/s 12561670.694op/s ± 56020.255op/s 12565217.608op/s ± 35369.580op/s 12603223.907op/s 12641114.994op/s 12661156.208op/s 12673650.384op/s 0.86% -0.741 0.784 0.44% 3961.230op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 72.346µs 73.014µs ± 0.354µs 72.988µs ± 0.236µs 73.217µs 73.700µs 73.833µs 73.969µs 1.35% 0.428 -0.327 0.48% 0.025µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 13519102.245op/s 13696355.755op/s ± 66271.355op/s 13700952.911op/s ± 44504.692op/s 13745422.436op/s 13792730.124op/s 13813393.211op/s 13822493.556op/s 0.89% -0.406 -0.352 0.48% 4686.092op/s 1 200
credit_card/is_card_number/37828224631 execution_time 3.894µs 3.912µs ± 0.003µs 3.912µs ± 0.001µs 3.914µs 3.917µs 3.918µs 3.921µs 0.23% -1.104 11.247 0.07% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 255023341.221op/s 255596902.368op/s ± 174851.908op/s 255616576.739op/s ± 96680.444op/s 255696186.526op/s 255794768.532op/s 255857332.352op/s 256825545.822op/s 0.47% 1.129 11.424 0.07% 12363.897op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 69.122µs 69.610µs ± 0.259µs 69.543µs ± 0.154µs 69.757µs 70.100µs 70.244µs 70.770µs 1.76% 1.107 1.612 0.37% 0.018µs 1 200
credit_card/is_card_number/378282246310005 throughput 14130261.781op/s 14365949.547op/s ± 53160.545op/s 14379543.661op/s ± 31999.280op/s 14403283.949op/s 14431382.849op/s 14440103.659op/s 14467204.505op/s 0.61% -1.081 1.505 0.37% 3759.018op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 52.234µs 52.312µs ± 0.033µs 52.310µs ± 0.023µs 52.331µs 52.377µs 52.402µs 52.457µs 0.28% 0.781 1.457 0.06% 0.002µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 19063194.888op/s 19116163.316op/s ± 12070.403op/s 19116722.778op/s ± 8306.153op/s 19125247.059op/s 19133181.859op/s 19137483.533op/s 19144488.641op/s 0.15% -0.776 1.440 0.06% 853.506op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 5.709µs 5.858µs ± 0.064µs 5.864µs ± 0.042µs 5.902µs 5.963µs 6.003µs 6.038µs 2.98% 0.080 -0.522 1.10% 0.005µs 1 200
credit_card/is_card_number/x371413321323331 throughput 165611525.634op/s 170727061.167op/s ± 1875869.043op/s 170542747.567op/s ± 1217098.997op/s 172283412.143op/s 173689764.678op/s 174314252.645op/s 175148574.661op/s 2.70% -0.032 -0.547 1.10% 132643.972op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 3.892µs 3.913µs ± 0.003µs 3.913µs ± 0.002µs 3.915µs 3.918µs 3.919µs 3.922µs 0.22% -1.638 13.576 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 254974597.271op/s 255548670.867op/s ± 189114.832op/s 255537005.722op/s ± 123315.796op/s 255681584.535op/s 255774008.211op/s 255827966.940op/s 256949029.231op/s 0.55% 1.666 13.812 0.07% 13372.438op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 63.849µs 64.355µs ± 0.181µs 64.351µs ± 0.119µs 64.476µs 64.640µs 64.772µs 64.952µs 0.93% 0.044 0.118 0.28% 0.013µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 15395899.638op/s 15538935.316op/s ± 43765.667op/s 15539820.494op/s ± 28615.164op/s 15566661.272op/s 15615568.566op/s 15629192.722op/s 15662060.916op/s 0.79% -0.026 0.107 0.28% 3094.700op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 57.821µs 58.087µs ± 0.186µs 58.031µs ± 0.109µs 58.190µs 58.433µs 58.679µs 58.766µs 1.27% 1.169 1.178 0.32% 0.013µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 17016769.485op/s 17215819.719op/s ± 55000.148op/s 17232167.298op/s ± 32291.941op/s 17255082.002op/s 17277262.879op/s 17293564.719op/s 17294847.865op/s 0.36% -1.152 1.121 0.32% 3889.098op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 3.892µs 3.912µs ± 0.003µs 3.911µs ± 0.001µs 3.913µs 3.917µs 3.920µs 3.924µs 0.33% -0.805 14.288 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 254832296.032op/s 255636331.748op/s ± 184496.028op/s 255668309.913op/s ± 82226.393op/s 255741647.530op/s 255817436.841op/s 255858509.976op/s 256969480.628op/s 0.51% 0.839 14.494 0.07% 13045.839op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 54.594µs 55.031µs ± 0.314µs 54.970µs ± 0.211µs 55.217µs 55.632µs 55.983µs 56.180µs 2.20% 1.021 0.928 0.57% 0.022µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 17799966.890op/s 18172200.658op/s ± 103074.143op/s 18191696.664op/s ± 70019.476op/s 18252408.671op/s 18295727.967op/s 18310347.803op/s 18317056.843op/s 0.69% -0.989 0.829 0.57% 7288.443op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 52.216µs 52.320µs ± 0.037µs 52.322µs ± 0.025µs 52.346µs 52.372µs 52.393µs 52.441µs 0.23% -0.102 -0.134 0.07% 0.003µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 19069164.555op/s 19113011.804op/s ± 13371.691op/s 19112293.684op/s ± 9116.912op/s 19121945.652op/s 19135496.349op/s 19141896.764op/s 19151393.763op/s 0.20% 0.106 -0.136 0.07% 945.521op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 5.704µs 5.848µs ± 0.063µs 5.852µs ± 0.048µs 5.896µs 5.940µs 5.981µs 6.008µs 2.67% -0.003 -0.641 1.08% 0.004µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 166431075.746op/s 171016170.666op/s ± 1850209.584op/s 170868191.812op/s ± 1390944.919op/s 172380978.288op/s 174022598.218op/s 174422478.667op/s 175302186.236op/s 2.59% 0.046 -0.654 1.08% 130829.574op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [3.912µs; 3.913µs] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/ throughput [255571341.686op/s; 255618411.044op/s] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [79.559µs; 79.658µs] or [-0.062%; +0.062%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [12553906.826op/s; 12569434.563op/s] or [-0.062%; +0.062%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [72.965µs; 73.063µs] or [-0.067%; +0.067%] None None None
credit_card/is_card_number/ 378282246310005 throughput [13687171.182op/s; 13705540.327op/s] or [-0.067%; +0.067%] None None None
credit_card/is_card_number/37828224631 execution_time [3.912µs; 3.913µs] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/37828224631 throughput [255572669.575op/s; 255621135.161op/s] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/378282246310005 execution_time [69.574µs; 69.646µs] or [-0.051%; +0.051%] None None None
credit_card/is_card_number/378282246310005 throughput [14358582.006op/s; 14373317.087op/s] or [-0.051%; +0.051%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [52.307µs; 52.316µs] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [19114490.474op/s; 19117836.158op/s] or [-0.009%; +0.009%] None None None
credit_card/is_card_number/x371413321323331 execution_time [5.849µs; 5.867µs] or [-0.152%; +0.152%] None None None
credit_card/is_card_number/x371413321323331 throughput [170467083.759op/s; 170987038.575op/s] or [-0.152%; +0.152%] None None None
credit_card/is_card_number_no_luhn/ execution_time [3.913µs; 3.914µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/ throughput [255522461.370op/s; 255574880.364op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [64.330µs; 64.380µs] or [-0.039%; +0.039%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [15532869.815op/s; 15545000.816op/s] or [-0.039%; +0.039%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [58.061µs; 58.113µs] or [-0.044%; +0.044%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [17208197.227op/s; 17223442.210op/s] or [-0.044%; +0.044%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [3.911µs; 3.912µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [255610762.373op/s; 255661901.123op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [54.987µs; 55.074µs] or [-0.079%; +0.079%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [18157915.573op/s; 18186485.743op/s] or [-0.079%; +0.079%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [52.315µs; 52.325µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [19111158.616op/s; 19114864.991op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [5.839µs; 5.857µs] or [-0.150%; +0.150%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [170759749.412op/s; 171272591.920op/s] or [-0.150%; +0.150%] None None None

Baseline

Omitted due to size.

@dd-octo-sts
Contributor

dd-octo-sts bot commented Mar 25, 2026

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact Baseline Commit Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so 8.76 MB 8.76 MB 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a 101.90 MB 101.90 MB +0% (+286 B) 👌
aarch64-unknown-linux-gnu
Artifact Baseline Commit Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a 118.84 MB 118.85 MB +0% (+4.72 KB) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so 11.36 MB 11.36 MB 0% (0 B) 👌
libdatadog-x64-windows
Artifact Baseline Commit Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll 27.39 MB 27.39 MB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib 80.69 KB 80.69 KB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb 187.21 MB 187.19 MB -0% (-16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib 924.91 MB 925.25 MB +.03% (+349.11 KB) 🔍
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll 9.06 MB 9.06 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib 80.69 KB 80.69 KB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb 26.98 MB 26.98 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib 61.28 MB 61.28 MB +0% (+174 B) 👌
libdatadog-x86-windows
Artifact Baseline Commit Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll 23.20 MB 23.20 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib 81.94 KB 81.94 KB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb 191.48 MB 191.48 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib 908.20 MB 907.97 MB -.02% (-236.40 KB) 💪
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll 6.90 MB 6.90 MB -0% (-512 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib 81.94 KB 81.94 KB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb 29.11 MB 29.11 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib 57.68 MB 57.68 MB -0% (-486 B) 👌
x86_64-alpine-linux-musl
Artifact Baseline Commit Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a 88.72 MB 88.72 MB +0% (+2.41 KB) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so 10.32 MB 10.32 MB 0% (0 B) 👌
x86_64-unknown-linux-gnu
Artifact Baseline Commit Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a 111.54 MB 111.54 MB +0% (+5.18 KB) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so 12.07 MB 12.07 MB 0% (0 B) 👌

@nsavoire
Contributor

nsavoire commented Mar 25, 2026

Concerning cross-language LTO, I did an experiment that inlines a small C function into Rust:
https://github.com/nsavoire/cross_language_lto_demo

Member

@ivoanjo ivoanjo left a comment

👍 Left some notes but really excited to see this starting to land!

Note: My rust-fu is kinda sucky so may be worth getting a second opinion on the rusty bits, I may have missed something obvious there ;)

@yannham yannham requested a review from a team as a code owner March 25, 2026 16:09
@yannham yannham requested review from vpellan and removed request for a team March 25, 2026 16:09
@ivoanjo
Member

ivoanjo commented Mar 25, 2026

Concerning cross-language LTO, I did an experiment that inlines a small C function into Rust:

That's very cool! Although I soft-wonder if it'll still respect the TLSDESC... 👀

@yannham
Contributor Author

yannham commented Mar 25, 2026

That's very cool! Although I soft-wonder if it'll still respect the TLSDESC... 👀

I'm rather confident on that front. What I want, and what @nsavoire managed to do, is simply to inline the C function that has the right instructions for TLSDESC access, as LLVM bitcode or as native code. The TLSDESC access pattern should be a short sequence (read the descriptor from the GOT and call through it), which is an inlineable snippet. In fact, we're really using C as a portable ASM macro here (plus the fact that it emits the proper relocations). In any case, it seems simple enough that I'll definitely try it (it's also interesting that high-performance Rust code could use this technique to get TLSDESC performance for selected thread-locals).
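To make the "C as a portable ASM macro" idea concrete, here is a minimal sketch of such a shim. The accessor `otel_thread_ctx_tls_addr` and the `check_per_thread_isolation` demo are hypothetical names, not the PR's actual API. The variable is defined in C and the accessor returns its address for the calling thread, so Rust would only reach it through an `extern "C"` call. When built with `-fPIC` for a shared library using GCC's `-mtls-dialect=gnu2` on x86-64 (AArch64 uses TLS descriptors by default), the access inside the accessor should go through a TLS descriptor; the per-thread behaviour checked here is the same whichever dialect is used.

```c
#include <pthread.h>
#include <stddef.h>

/* The thread-local slot, exported so out-of-process readers can find it. */
__attribute__((visibility("default")))
__thread void *custom_labels_current_set_v2 = NULL;

/* Hypothetical accessor: returns the address of the calling thread's slot.
 * The Rust side would declare it in an `extern "C"` block, so the TLS access
 * itself is the one emitted by the C compiler. */
void **otel_thread_ctx_tls_addr(void) {
    return &custom_labels_current_set_v2;
}

struct probe {
    void **main_slot;  /* slot address seen by the spawning thread */
    int distinct_addr; /* 1 if the worker sees a different address */
    int starts_null;   /* 1 if the worker's slot is still NULL */
};

static void *probe_thread(void *arg) {
    struct probe *p = arg;
    void **mine = otel_thread_ctx_tls_addr();
    p->distinct_addr = (mine != p->main_slot);
    p->starts_null = (*mine == NULL);
    return NULL;
}

/* Hypothetical demo: publish a value on the calling thread, then check that
 * a fresh thread has its own, still-empty slot. Returns 1 on success. */
int check_per_thread_isolation(void) {
    static int marker;
    void **slot = otel_thread_ctx_tls_addr();
    *slot = &marker; /* "attach" something on this thread */

    struct probe p = { slot, 0, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, probe_thread, &p) != 0) {
        return 0;
    }
    pthread_join(t, NULL);

    int ok = p.distinct_addr && p.starts_null && *slot == &marker;
    *slot = NULL; /* "detach" again */
    return ok;
}
```

Calling `check_per_thread_isolation()` from any thread should return 1: a value published on that thread is invisible to a freshly spawned one, whose slot starts out as NULL. The price of this design is the extra function call per access, which is exactly what cross-language inlining would try to remove.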

@ivoanjo
Member

ivoanjo commented Mar 25, 2026

I'm rather confident on that front.

Coolio! Let's use it then! This is all experimental anyway so we can always go back if we find any problems with the approach.

Member

@scottgerring scottgerring left a comment

Nice!

// as of today.
#include <stddef.h>

__attribute__((visibility("default")))
Member

Do we ever strip, and if we do, is this sufficient to ensure the TLS symbol remains? In the demo model I used a linker script, but tbh I don't know if that was necessary.

Contributor Author

Good question. Let me research whether the TLS symbols could be stripped; the script looks simple enough that it's fine to include it if needed.

Contributor Author

@yannham yannham Mar 27, 2026

Thanks for pointing this out. I experimented by adding a dummy wrapper in the -ffi crate and, indeed, by default the symbol is stripped (or, to be precise, made local). It turns out that a working solution is to use a linker version script like:

{
    global: custom_labels_current_set_v2;
};

on the FFI crate. The reason is that rustc itself generates a version script that looks like `local: *;`, hiding by default all symbols not explicitly exported, and this happens after all the C business. Thankfully, LLD supports multiple version scripts (it more or less merges them), so we can make sure the TLS variable makes it into the final shared library by providing our own additional one.

Although all of this is required, I believe it doesn't really affect this PR: it will be needed in the -ffi crate that produces the shared library, not in the pure Rust part.

#include <stddef.h>

__attribute__((visibility("default")))
__thread void *custom_labels_current_set_v2 = NULL;
Member

@ivoanjo, are we going to launch with 'v2' for this?

Member

Good point -- let's discuss in the channel.

@nsavoire
Contributor

That's very cool! Although I soft-wonder if it'll still respect the TLSDESC... 👀

I'm rather confident on that front. What I want, and what @nsavoire managed to do, is simply to inline the C function that has the right instructions for TLSDESC access, as LLVM bitcode or as native code. The TLSDESC access pattern should be a short sequence (read the descriptor from the GOT and call through it), which is an inlineable snippet. In fact, we're really using C as a portable ASM macro here (plus the fact that it emits the proper relocations). In any case, it seems simple enough that I'll definitely try it (it's also interesting that high-performance Rust code could use this technique to get TLSDESC performance for selected thread-locals).

I ran some experiments, and it turns out @ivoanjo is right: TLSDESC is lost in the C -> LLVM IR translation.
Adding RUSTFLAGS="-C link-arg=-Wl,-plugin-opt=--enable-tlsdesc" (in addition to -C linker-plugin-lto, which is required for cross-language LTO) forces the use of TLSDESC.
But then we might as well use RUSTFLAGS="-C llvm-args=--enable-tlsdesc" with a pure Rust TLS variable.

@yannham
Contributor Author

yannham commented Mar 26, 2026

I ran some experiments, and it turns out @ivoanjo is right: TLSDESC is lost in the C -> LLVM IR translation.
Adding RUSTFLAGS="-C link-arg=-Wl,-plugin-opt=--enable-tlsdesc" (in addition to -C linker-plugin-lto, which is required for cross-language LTO) forces the use of TLSDESC.

A good demonstration that being confident doesn't equal being right 😅 That said, it's beyond me how LTO could change the TLS model in use: the emitted instructions differ between models, so the TLS access pattern should somehow be "already compiled and decided" by the time the linker does its magic (this isn't entirely true for the relocations, but it is at least for the emitted instructions).

But then we might as well use RUSTFLAGS="-C llvm-args=--enable-tlsdesc" with a pure Rust TLS variable.

Won't that permeate through at least the whole crate, if not all of libdatadog? Do you know how narrowly we could scope this option?

@yannham yannham force-pushed the yannham/thread-ctx-tls-shim branch from 87d3f08 to 1e03fc8 Compare March 26, 2026 14:13
@yannham yannham requested a review from a team as a code owner March 27, 2026 16:10
@yannham
Contributor Author

yannham commented Mar 27, 2026

So linkers can indeed change the TLS access model at link time: link-time TLS relaxation. I can see how that could happen in an executable, though I don't see how it could affect a dynamic library.

In any case, I think cross-language inlining is premature for this PR, so I'll ignore this for now and experiment later when we have a proper FFI in "real" conditions, if that sounds good to everyone.

Declares `custom_labels_current_set_v2` in a C file so the linker emits
a proper TLSDESC dynamic symbol entry readable by out-of-process tools
(e.g. eBPF profiler). Rust's `#[thread_local]` does not use the TLSDESC
dialect, so a C shim is required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yannham and others added 12 commits March 27, 2026 17:18
Apply suggestion from @ivoanjo

Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
It turns out the previous implementation used volatile to synchronize
with the async-signal-handler reader, but the proper tool is actually
atomics plus compiler-only fences. This commit updates the model
accordingly and gets rid of all the `xxx_volatile` details.
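A minimal C11 sketch of the atomics-plus-compiler-fences model, under stated assumptions: the `thread_ctx` record, the `current_ctx` slot and the `ctx_attach`/`ctx_detach`/`demo_sample` helpers are hypothetical, and the real record layout comes from the OTEP. The writer fills the record, issues `atomic_signal_fence(memory_order_release)` (a compiler-only fence that emits no CPU barrier) and then publishes the pointer with a relaxed atomic store; the signal handler loads the pointer and pairs it with an acquire signal fence before reading the fields. Compiler-only fences should be enough here because the handler interrupts the very thread that wrote the data, so only compiler reordering, not hardware reordering, is observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context record; the real layout is defined by the OTEP. */
struct thread_ctx {
    uint64_t trace_id_hi;
    uint64_t trace_id_lo;
    uint64_t span_id;
};

/* Hypothetical per-thread slot; NULL means "no context attached".
 * Assumes pointer atomics are lock-free (true on common Linux targets),
 * which is what makes reading it from a signal handler safe. */
static _Thread_local _Atomic(struct thread_ctx *) current_ctx = NULL;

/* Written by the handler so the demo can inspect what it observed. */
static _Atomic uint64_t last_seen_span_id = 0;

/* Writer: the record must be fully initialized before calling this. */
void ctx_attach(struct thread_ctx *ctx) {
    /* Compiler-only fence: the field writes cannot sink below the store. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&current_ctx, ctx, memory_order_relaxed);
}

/* Writer: unpublish before modifying or freeing the record. */
void ctx_detach(void) {
    atomic_store_explicit(&current_ctx, NULL, memory_order_relaxed);
    /* Compiler-only fence (a compiler barrier in GCC/Clang): later writes
     * to the old record are not hoisted above the unpublish. */
    atomic_signal_fence(memory_order_seq_cst);
}

/* Reader: runs in a signal handler on the same thread. */
static void sample_handler(int sig) {
    (void)sig;
    struct thread_ctx *ctx =
        atomic_load_explicit(&current_ctx, memory_order_relaxed);
    /* Pairs with the release fence in ctx_attach. */
    atomic_signal_fence(memory_order_acquire);
    atomic_store_explicit(&last_seen_span_id, ctx ? ctx->span_id : 0,
                          memory_order_relaxed);
}

/* Demo: optionally attach a context, sample it via raise(), then detach. */
uint64_t demo_sample(uint64_t span_id, int attach) {
    struct sigaction sa = {0};
    sa.sa_handler = sample_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    struct thread_ctx ctx = {0x1, 0x2, span_id};
    if (attach) {
        ctx_attach(&ctx);
    }
    raise(SIGUSR1); /* returns only after the handler has run */
    ctx_detach();
    return atomic_load_explicit(&last_seen_span_id, memory_order_relaxed);
}
```

With this sketch, `demo_sample(42, 1)` returns 42 because the handler sees the attached record, while `demo_sample(7, 0)` returns 0 because nothing is attached. The demo triggers the handler with `raise()` for determinism; a sampling profiler would rely on a timer signal instead.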
The module and its C shim (tls_shim.c) logically belong in the profiling
crate rather than library-config. Move otel_thread_ctx.rs and tls_shim.c
to libdd-profiling/src/, wire up the cc build step in its build.rs, and
restore libdd-library-config to its pre-PR state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yannham yannham force-pushed the yannham/thread-ctx-tls-shim branch from d2f6e3a to ce8e143 Compare March 27, 2026 16:18

5 participants