ftime-trace: break up the template instance span into sub-phases#22825
ftime-trace: break up the template instance span into sub-phases#22825suyashkumar102 wants to merge 6 commits intodlang:masterfrom
Conversation
|
Thanks for your pull request and interest in making D better, @suyashkumar102! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
|
ae08b16 to
3802242
Compare
|
How were these spans chosen? It's important to weigh the added granularity vs. the added overhead. If a span is always a constant time, or always a fixed percentage of the parent span, then it would only make the flame chart larger without adding new information. If you could show how the spans vary in different scenarios, that would be great! |
|
@dkorpel To check whether they actually add useful information, I tried two contrasting cases: Scenario A — Overload-heavy (select!bool with multiple constrained candidates): Template instance ≈ 643 µs Overload resolution ≈ 609 µs (~95%) Members negligible
Scenario B — Recursive template (Fib!10): Template instance ≈ 3.3 ms Members ≈ 3.25 ms (~98%) Overload resolution negligible
So the dominant cost shifts depending on the pattern — overload resolution in one case, member expansion in the other. Without the sub-spans both would just appear as “template instantiation”, even though the underlying work is very different. For granularity vs overhead, I tried to keep it at a level where each span corresponds to a distinct kind of work, rather than splitting things further. That keeps the trace readable while still making it possible to see where time is going. Also, this is only active with -ftime-trace, so it doesn’t affect normal compilation. |
was looking at a -ftime-trace output in perfetto and the sema1 template instance block was just one big blob with nothing inside it. makes it really hard to tell if the time is going into arg evaluation, overload resolution, expanding members, sema2 or sema3. added spans for: - semanticTiargs (arg semantic) - findBestMatch (overload resolution) - expandMembers (sema1 on members) - semantic2 on the instance - trySemantic3 on the instance updated compilable/ftimetrace.d to expect the new span names. fixes Bugzilla 22711
3802242 to
511f335
Compare
compiler/src/dmd/templatesem.d
Outdated
| if (false) | ||
| { | ||
| Lerror: |
There was a problem hiding this comment.
please make Lerror nested function and then return Lerror();
The previous commit converted goto Lerror to return Lerror() across the file, but Lerror() as a nested function only exists inside templateInstanceSemantic. The four call sites inside functionResolve (two in applyTemplate, two in the outer body) still have their own Lerror: labels in scope, so they should stay as goto Lerror.
| $(CHANGELOG_NAV_INJECT) | ||
|
|
||
| $(VERSION $(CHANGELOG_VERSION_NIGHTLY), =================================================, | ||
|
|
||
| $(BUGSTITLE_TEXT_HEADER Compiler changes, | ||
|
|
||
| $(LI $(RELATIVE_LINK2 ftime-trace-sema-subspans,`-ftime-trace`: add sub-spans for template argument semantic and overload resolution)) | ||
|
|
||
| ) |
There was a problem hiding this comment.
Please check the documentation on how to structure changelog entries, and proof-read the text
There was a problem hiding this comment.
Fixed, used the correct format from the README and matched the span names to the actual trace output



-ftime-trace output in perfetto and the sema1 template instance block was just one big blob with nothing inside it. makes it really hard to tell if the time is going into arg evaluation, overload resolution, expanding members, sema2 or sema3.
added spans for:
fixes #22711