Skip to content

Add: Server-side TLS handshake milestones and timing fixes#12860

Merged
bryancall merged 7 commits intoapache:masterfrom
bryancall:server-tls-milestones
Mar 11, 2026
Merged

Add: Server-side TLS handshake milestones and timing fixes#12860
bryancall merged 7 commits intoapache:masterfrom
bryancall:server-tls-milestones

Conversation

@bryancall
Copy link
Copy Markdown
Contributor

@bryancall bryancall commented Feb 5, 2026

Problem

Only client-side TLS handshake timing was available via TS_MILESTONE_TLS_HANDSHAKE_START/END. There was no way to measure origin-side TLS handshake overhead in the slow log or via the milestone API. Additionally, SSLNetVConnection::sslClientHandShakeEvent() was recording the TLS handshake start time for outbound connections but never calling _record_tls_handshake_end_time(), so the outbound TLS end time was always 0. The slow log's difference_msec() could also produce garbage values when only ms_start was unset (0) but ms_end was set.

Changes

  • Add TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END milestones -- new enum values in apidefs.h.in (ABI compatible, appended before TS_MILESTONE_LAST_ENTRY). Capture server TLS timing in HttpSM::state_http_server_open() for both the H2 multiplexed (CONNECT_EVENT_TXN) and H1 direct (VC_EVENT_WRITE_COMPLETE) connection paths. Update Lua plugin constants, LogField.cc milestone mappings, and API documentation.
  • Fix missing outbound TLS handshake end time -- sslClientHandShakeEvent() (which handles the outbound/origin side despite the name) was recording start time but never calling _record_tls_handshake_end_time(). Add the missing call so get_tls_handshake_end_time() returns a valid value for HttpSM milestone capture.
  • Fix difference_msec() to check both start and end -- previously only checked ms_end == 0. If ms_start was unset but ms_end was set, the subtraction produced a garbage value. Now checks both for 0 before computing the difference.
  • Rename slow log field tls_handshake to ua_tls_handshake -- clarifies this is the client-side TLS handshake. Add server_tls_handshake field as an offset from SM_START (consistent with other server-side slow log fields). slow_log_report.pl accepts both old and new field names for backward compatibility.
  • Use pre-remap URL in slow log -- changed from client_request (post-remap/origin-facing) to unmapped_url (pre-remap) so the log shows the URL the client actually requested. Falls back to client_request if unmapped_url is not available.

Testing

  • Build passes (including ASAN build)
  • Manual testing with client-side and server-side TLS verified both milestone pairs populate correctly
  • slow_log_report.pl correctly parses both ua_tls_handshake and server_tls_handshake
  • Verified milestones remain 0 for non-TLS and connection-reuse scenarios

Add TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END to capture origin-side
TLS handshake timing, complementing the existing client-side
TS_MILESTONE_TLS_HANDSHAKE_START/END milestones.

Changes:
- Add new milestone enum values in apidefs.h.in (ABI compatible -
  appended before TS_MILESTONE_LAST_ENTRY)
- Capture server TLS timing in HttpSM::state_http_server_open() for
  both the direct (VC_EVENT_WRITE_COMPLETE) and multiplexed
  (CONNECT_EVENT_TXN) connection paths
- Fix missing _record_tls_handshake_end_time() call in
  SSLNetVConnection::sslClientHandShakeEvent() - outbound TLS
  handshakes were recording start time but never end time
- Rename slow log field "tls_handshake" to "ua_tls_handshake" for
  clarity and add "server_tls_handshake" field
- Update LogField.cc milestone mappings for custom log fields
- Add Lua plugin milestone constants
- Update API documentation and slow_log_report.pl
Previously, difference_msec() only checked if ms_end was 0 (unset)
before returning the "missing" sentinel value. If ms_start was unset
(0) but ms_end was set, the subtraction would produce a garbage value
equal to the raw nanosecond timestamp of ms_end.

This affects any milestone difference calculation where the start
milestone may not be set, such as server-side milestones on cache
hits where no origin connection was made.

The fix adds a check for ms_start == 0 alongside the existing ms_end
check. This is safe because ink_hrtime values are nanoseconds from a
reference point and are never legitimately 0. The difference_sec()
function inherits this fix since it delegates to difference_msec().
@bryancall bryancall added this to the 10.2.0 milestone Feb 5, 2026
@bryancall bryancall self-assigned this Feb 5, 2026
Change server_tls_handshake from a duration (START to END) to an
offset from SM_START (SM_START to END), consistent with all other
server-side fields in the slow log (server_connect, server_connect_end,
server_first_read, etc.). The TLS duration can be inferred from
server_tls_handshake - server_connect_end.
Clarify that CONNECT_EVENT_TXN is the multiplexed path (e.g. HTTP/2)
where ConnectingEntry creates the session, and VC_EVENT_WRITE_COMPLETE
is the direct path (e.g. HTTP/1) where HttpSM owns the netvc. Both
paths need to capture server TLS handshake timing.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds server-side TLS handshake milestones to Apache Traffic Server, enabling measurement of origin TLS handshake overhead. The PR addresses two main issues: (1) missing visibility into server-side TLS handshake timing, and (2) bugs in milestone timing capture and calculation.

Changes:

  • Adds TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END enum values and captures server TLS timing in both direct and multiplexed connection paths
  • Fixes missing TLS handshake end time recording for outbound connections in SSLNetVConnection::sslClientHandShakeEvent()
  • Fixes difference_msec() to check both start and end milestones for 0 before computing difference
  • Renames slow log field tls_handshake to ua_tls_handshake for clarity and adds server_tls_handshake field

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
include/ts/apidefs.h.in Adds TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END enum values before TS_MILESTONE_LAST_ENTRY (ABI-compatible)
include/proxy/Milestones.h Fixes difference_msec() to check both start and end for zero before subtraction
src/iocore/net/SSLNetVConnection.cc Adds missing _record_tls_handshake_end_time() call for outbound TLS connections
src/proxy/http/HttpSM.cc Captures server TLS milestones in both connection paths (direct and multiplexed); renames slow log field to ua_tls_handshake and adds server_tls_handshake
src/proxy/logging/LogField.cc Adds milestone string mappings for new server TLS milestones with TODO comments for ATS 11 rename
plugins/lua/ts_lua_http_milestone.cc Adds Lua plugin constants for server TLS handshake milestones
doc/developer-guide/api/functions/TSHttpTxnMilestoneGet.en.rst Updates API documentation to clarify client vs. server TLS handshake milestones
doc/admin-guide/plugins/lua.en.rst Documents new Lua milestone constants
tools/slow_log_report.pl Updates regex to parse renamed ua_tls_handshake field and new server_tls_handshake field

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/proxy/http/HttpSM.cc
The tls_handshake field was renamed to ua_tls_handshake but the
report script needs to handle both formats so it can parse logs
from before and after the rename.
The slow log was printing the post-remap (origin-facing) URL which
is less useful for debugging since it does not show what the client
actually requested. Use t_state.unmapped_url (the pre-remap URL)
instead, with a fallback to client_request URL if unmapped_url is
not available.
Copy link
Copy Markdown
Contributor

@cmcfarlen cmcfarlen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the milestones are set in two places, but it's not clear why or which is correct. Can you clarify this for me?

// Record TLS handshake end time for outbound connections
if (this->get_tls_handshake_begin_time()) {
this->_record_tls_handshake_end_time();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this added? This function seems to deal with the client side, but the comment mentions outbound connections. And, if this were the outbound side, are we expecting the metrics this function updates to include both sides TLS?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming in this file is from the TLS role perspective, not the ATS role. sslClientHandShakeEvent() handles the outbound side where ATS acts as the TLS client connecting to the origin. sslServerHandShakeEvent() handles the inbound side where ATS acts as the TLS server accepting client connections.

The inbound side already records the end time (line 1383), but the outbound side was missing it — this is a bug fix. Without this, get_tls_handshake_end_time() returns zero when HttpSM tries to copy the timestamps into milestones.

I've updated the comment to clarify the naming convention and the two-stage pattern.

Comment thread src/proxy/http/HttpSM.cc
if (auto *netvc = session->get_netvc()) {
if (auto tbs = netvc->get_service<TLSBasicSupport>()) {
milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_START] = tbs->get_tls_handshake_begin_time();
milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_END] = tbs->get_tls_handshake_end_time();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets both milestones after the end time is known. Should we set the start milestone when it starts vs here? As it is, if this code isn't reached, the start milestone won't be set.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HttpSM doesn't have access to the netvc during the handshake in the multiplexed path — ConnectingEntry owns the connection and dispatches CONNECT_EVENT_TXN after the handshake completes. So this is the earliest point the SM can grab the timestamps.

The actual start/end times are captured in real-time at the netvc layer (sslClientHandShakeEvent() records the end time, and the begin time is set when the handshake starts). We're just copying them into milestones here. If this code isn't reached (e.g. connection failure), the milestones stay at zero, which is correct — difference_msec() checks for zero before computing the difference.

Updated the comments to make this clearer.

Comment thread src/proxy/http/HttpSM.cc
if (auto tbs = _netvc->get_service<TLSBasicSupport>()) {
milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_START] = tbs->get_tls_handshake_begin_time();
milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_END] = tbs->get_tls_handshake_end_time();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these set again?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't duplicates — they're two mutually exclusive connection paths. CONNECT_EVENT_TXN fires for multiplexed connections (H2/H3) where ConnectingEntry manages the connection. VC_EVENT_WRITE_COMPLETE fires for direct connections (H1) where the SM owns the netvc. A given connection goes through one path or the other, never both.

Updated both comments to cross-reference each other and explicitly state they're mutually exclusive.

Clarify the confusing sslClientHandShakeEvent/sslServerHandShakeEvent
naming convention (TLS role vs ATS role), explain why milestones are
set in two mutually exclusive connection paths (H2 multiplexed vs H1
direct), and document the two-stage timestamp pattern where the netvc
records times in real-time and HttpSM copies them into milestones.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@bryancall
Copy link
Copy Markdown
Contributor Author

[approve ci autest 1]

@bryancall bryancall merged commit 33566ff into apache:master Mar 11, 2026
19 checks passed
@github-project-automation github-project-automation Bot moved this to For v10.2.0 in ATS v10.2.x Mar 11, 2026
@cmcfarlen cmcfarlen moved this from For v10.2.0 to Picked v10.2.0 in ATS v10.2.x Mar 18, 2026
@cmcfarlen
Copy link
Copy Markdown
Contributor

Cherry-picked to 10.2.x

cmcfarlen pushed a commit that referenced this pull request Mar 18, 2026
Add TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END to capture origin-side
TLS handshake timing, complementing the existing client-side milestones.

Capture server TLS timing in HttpSM::state_http_server_open() for both
H2 multiplexed and H1 direct connection paths. Fix the missing
_record_tls_handshake_end_time() call for outbound connections and
fix difference_msec() to check both start and end for 0. Rename slow
log field tls_handshake to ua_tls_handshake for clarity and add
server_tls_handshake. Use pre-remap URL in slow log output.

(cherry picked from commit 33566ff)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Picked v10.2.0

Development

Successfully merging this pull request may close these issues.

3 participants