Skip to content

feature - add more profile scope#848

Merged
LLLLKKKK merged 1 commit intomainfrom
feature/profile
Apr 10, 2026
Merged

feature - add more profile scope#848
LLLLKKKK merged 1 commit intomainfrom
feature/profile

Conversation

@jianglan89
Copy link
Copy Markdown
Collaborator

No description provided.

@jianglan89 jianglan89 requested a review from LLLLKKKK as a code owner March 31, 2026 09:45
@jianglan89 jianglan89 force-pushed the feature/profile branch 2 times, most recently from 2752c7e to 75c2b70 Compare April 1, 2026 06:46
@LLLLKKKK
Copy link
Copy Markdown
Collaborator

LLLLKKKK commented Apr 1, 2026

Code Review: PR #848 — feature - add more profile scope

Author: jianglan89
Review version: v1
Head SHA: 75c2b70
Verdict: LGTM

Summary

为 RTP-LLM 添加更细粒度的 profiling 支持,主要变更:

  1. 大量 RTP_LLM_PROFILE_FUNCTION() 添加: 在 GenerateStream、StreamCacheResource、FIFOScheduler 的关键函数中添加 profiling scope,便于性能分析
  2. applyTimelineGate 提取为共享方法: 从 LocalRpcServer 中提取,DecodeRpcServer、PrefillRpcServer、*New 版本统一调用
  3. all_tp 广播 profiling: StartProfile 新增 all_tp 参数,tp_rank=0 通过 BroadcastManager 将 profiling 请求广播到所有 TP rank
  4. step_profiler_.tick() 移到 process() 之后: 确保所有 TP rank 在 NCCL 同步后再启停 profiler,时间窗口对齐
  5. Proto 新增 StartProfileInternalRequestPB: 用于 TP 内部广播

评价

改动方向正确,profiling scope 覆盖了调度器、stream 生命周期、缓存管理等关键路径。all_tp 广播设计合理,step_profiler_.tick() 位置调整有明确的技术理由。

P2 Suggestions

1. DecodeRpcServerNew.ccconst_cast 修改 protobuf request

const_cast<GenerateInputPB*>(request)->mutable_generate_config()->set_gen_timeline(true);

const_cast 修改 gRPC 传入的 const request 是不安全的做法。建议改为在 transQuery 之后修改 input->generate_config->gen_timeline,与 DecodeRpcServer.ccPrefillRpcServer.cc 的做法保持一致。

2. request_counter.py 似乎未被使用

新增了 rtp_llm/utils/request_counter.py,但在本 PR 中没有任何文件 import 它。如果是为后续 PR 准备的,建议在后续 PR 中一起提交,避免死代码。

P3 Nits

3. Trace name 格式变更

- prefix = "profiler_wr" + std::to_string(world_rank_) + "_ts" + ...
+ prefix = "profiler_ts" + ... + "wr" + std::to_string(world_rank_) + "_";

world_rank 从前缀移到后缀,现有的 trace 解析脚本如果依赖文件名格式可能需要更新。

@jianglan89 jianglan89 force-pushed the feature/profile branch 11 times, most recently from 183c0f6 to b41b927 Compare April 8, 2026 11:06
- Remove unsafe const_cast on protobuf request in DecodeRpcServerNew.cc,
  set gen_timeline on generate_input after transQuery instead
- Rename all_tp to enable_all_rank in StartProfileRequestPB
- Add profile scopes to cache connector async read/write path
- Fix missing ProfilingScope.h include in RemoteStoreTaskImpl.cpp
- Fix RTP_LLM_PROFILE_SCOPE() empty param calls to RTP_LLM_PROFILE_FUNCTION()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@LLLLKKKK LLLLKKKK merged commit 4183635 into main Apr 10, 2026
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants