Skip to content

ftime-trace: break up the template instance span into sub-phases#22825

Open
suyashkumar102 wants to merge 6 commits intodlang:masterfrom
suyashkumar102:fix/ftime-trace-sema-subspans
Open

ftime-trace: break up the template instance span into sub-phases#22825
suyashkumar102 wants to merge 6 commits intodlang:masterfrom
suyashkumar102:fix/ftime-trace-sema-subspans

Conversation

@suyashkumar102
Copy link
Copy Markdown

-ftime-trace output in perfetto and the sema1 template instance block was just one big blob with nothing inside it. makes it really hard to tell if the time is going into arg evaluation, overload resolution, expanding members, sema2 or sema3.

added spans for:

  • semanticTiargs (arg semantic)
  • findBestMatch (overload resolution)
  • expandMembers (sema1 on members)
  • semantic2 on the instance
  • trySemantic3 on the instance

fixes #22711

@dlang-bot
Copy link
Copy Markdown
Contributor

Thanks for your pull request and interest in making D better, @suyashkumar102! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

⚠️⚠️⚠️ Warnings ⚠️⚠️⚠️

  • In preparation for migrating from Bugzilla to GitHub Issues, the issue reference syntax has changed. Please add the word "Bugzilla" to issue references. For example, Fix Bugzilla Issue 12345 or Fix Bugzilla 12345.(Reminder: the edit needs to be done in the Git commit message, not the GitHub pull request.)

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#22825"

@suyashkumar102 suyashkumar102 force-pushed the fix/ftime-trace-sema-subspans branch 2 times, most recently from ae08b16 to 3802242 Compare March 27, 2026 08:07
@suyashkumar102
Copy link
Copy Markdown
Author

Tested locally on Windows — here's what the trace looks like in perfetto after this PR. The Sema1: Template Instance blocks now have visible sub-spans nested inside them. Before this, the entire semantic analysis block was flat with no sub-breakdown, making it impossible to attribute compile-time regressions to specific phases. The selected slice at the bottom confirms Sema1: Template Arg Semantic: Repeat!(float, 4) is showing correctly.
Screenshot 2026-03-27 141359

@dkorpel
Copy link
Copy Markdown
Contributor

dkorpel commented Mar 27, 2026

How were these spans chosen? It's important to weigh the added granularity vs. the added overhead. If a span is always a constant time, or always a fixed percentage of the parent span, then it would only make the flame chart larger without adding new information. If you could show how the spans vary in different scenarios, that would be great!

@suyashkumar102
Copy link
Copy Markdown
Author

suyashkumar102 commented Mar 27, 2026

@dkorpel
The spans were chosen to line up with the main phases involved in template instantiation in the compiler — argument semantic, overload resolution, member expansion, and the semantic2/3 passes. The idea was to break the template instance into a small number of meaningful steps rather than adding very fine-grained instrumentation.

To check whether they actually add useful information, I tried two contrasting cases:

Scenario A — Overload-heavy (select!bool with multiple constrained candidates):

Template instance ≈ 643 µs

Overload resolution ≈ 609 µs (~95%)

Members negligible

bool

Scenario B — Recursive template (Fib!10):

Template instance ≈ 3.3 ms

Members ≈ 3.25 ms (~98%)

Overload resolution negligible

fib10

So the dominant cost shifts depending on the pattern — overload resolution in one case, member expansion in the other. Without the sub-spans both would just appear as “template instantiation”, even though the underlying work is very different.

For granularity vs overhead, I tried to keep it at a level where each span corresponds to a distinct kind of work, rather than splitting things further. That keeps the trace readable while still making it possible to see where time is going. Also, this is only active with -ftime-trace, so it doesn’t affect normal compilation.

was looking at a -ftime-trace output in perfetto and the sema1 template
instance block was just one big blob with nothing inside it. makes it
really hard to tell if the time is going into arg evaluation, overload
resolution, expanding members, sema2 or sema3.

added spans for:
- semanticTiargs (arg semantic)
- findBestMatch (overload resolution)
- expandMembers (sema1 on members)
- semantic2 on the instance
- trySemantic3 on the instance

updated compilable/ftimetrace.d to expect the new span names.

fixes Bugzilla 22711
@suyashkumar102 suyashkumar102 force-pushed the fix/ftime-trace-sema-subspans branch from 3802242 to 511f335 Compare March 27, 2026 13:59
Comment on lines 970 to 972
if (false)
{
Lerror:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make Lerror nested function and then return Lerror();

The previous commit converted goto Lerror to return Lerror() across
the file, but Lerror() as a nested function only exists inside
templateInstanceSemantic. The four call sites inside functionResolve
(two in applyTemplate, two in the outer body) still have their own
Lerror: labels in scope, so they should stay as goto Lerror.
@suyashkumar102
Copy link
Copy Markdown
Author

@dkorpel just following up on PR #22825 — answered your question about the span selection with some benchmark data. Let me know if you'd like any changes.

Comment on lines +1 to +9
$(CHANGELOG_NAV_INJECT)

$(VERSION $(CHANGELOG_VERSION_NIGHTLY), =================================================,

$(BUGSTITLE_TEXT_HEADER Compiler changes,

$(LI $(RELATIVE_LINK2 ftime-trace-sema-subspans,`-ftime-trace`: add sub-spans for template argument semantic and overload resolution))

)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the documentation on how to structure changelog entries, and proof-read the text

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, used the correct format from the README and matched the span names to the actual trace output

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

-ftime-trace: semantic_analysis block is too coarse for useful profiling

4 participants