Commit 7251ca7

perf: build optimization (P0/P1/P2) (#26)
* perf(P0): frontend dirty check: skip prepare_build when inputs unchanged

  On a successful build, writes target/.build_cache with the output dir and ninja binary path. On the next invocation, if build.ninja in that dir is newer than all source files and mcpp.toml, invokes ninja directly without re-running toolchain resolve, scanner, make_plan, or emit.

  Reduces no-change builds from ~10s to <0.5s.

* perf(P1): per-file dyndep: only rebuild changed modules

  Replace the global build.ninja.dd (one dyndep file for all modules) with per-file .dd files. Each .cppm gets its own .ddi → .dd conversion via the new cxx_dyndep rule, and each compile edge references only its own .dd file.

  Before: touching one file → all 39 modules dirty → full rebuild (~21s)
  After: touching one file → only that file's .dd dirty → incremental

  New `mcpp dyndep --single --output <file.dd> <file.ddi>` mode added for the per-file conversion. Legacy multi-file mode still available.

* perf(P2): BMI copy_if_different: prevent cascade rebuilds

  GCC always updates the .gcm (BMI) file's timestamp even when the module interface hasn't changed. This causes all downstream modules to be recompiled unnecessarily.

  Fix: the cxx_module rule now backs up the BMI before compilation, and if the new BMI is byte-identical to the backup, restores the old file (preserving its timestamp). Combined with restat = 1 in the per-file dyndep entries, ninja skips downstream modules when only the implementation changed.

  This means modifying a function body without changing the module interface no longer triggers a cascade rebuild of all importers.

* docs: add build optimization analysis report
1 parent 0b8b81b commit 7251ca7

5 files changed

Lines changed: 439 additions & 20 deletions

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
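# mcpp Build Optimization: In-Depth Analysis Report

> 2026-05-12: modular compilation optimization, caching mechanisms, and incremental compilation analysis
> Based on an analysis of the mcpp 0.0.10 codebase
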
## 1. Current Build Flow and Time Breakdown

### 1.1 Build data for the mcpp project itself

| Scenario | Time | Assessment |
|---|---|---|
| Full build (cold) | ~21s | Reasonable |
| Rebuild with no changes | **~10s** | ❌ Frontend overhead |
| `touch` one file | **~21s** | ❌ Full recompilation |
| Direct ninja no-op | 0.023s | Reference baseline |

### 1.2 Time breakdown

```
mcpp build (after touching one file):
├── mcpp frontend (~10s)
│   ├── toolchain resolve (xlings interface subprocess)
│   ├── manifest parse + dep fetch
│   ├── regex scanner (scans all .cppm files)
│   ├── modgraph validate
│   ├── fingerprint compute
│   ├── ensure_built std module
│   ├── make_plan (builds the BuildPlan)
│   ├── BMI cache check/stage
│   └── build.ninja + compile_commands.json generation
│
└── ninja execution (~11s)
    ├── Phase 1: SCAN (1 .ddi changed) ~1s
    ├── Phase 2: COLLECT (build.ninja.dd regenerated) ~0.1s
    ├── Phase 3: ALL 39 modules recompiled ~10s ← core problem
    └── LINK ~0.5s
```

## 2. Two Core Problems

### 2.1 Problem 1: a global dyndep forces full recompilation

**Root cause**: every compile edge depends on the same `build.ninja.dd` file.

```
cli.cppm is touched
    ↓
cli.cppm.ddi (scan) is regenerated
    ↓
build.ninja.dd depends on ALL 39 .ddi files → build.ninja.dd is regenerated
    ↓
all 39 compile edges have `| build.ninja.dd` as an implicit dep
    ↓
all 39 modules are marked dirty → full recompilation
```

**Ideal behavior**: only `cli.cppm` and the files that directly depend on the `mcpp.cli` module need to be recompiled.

**ninja's dyndep mechanism itself supports per-file dyndep**; the current implementation simply chose the simplest option, a single global file.

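The difference between the two schemes can be illustrated with a toy model. This is only a sketch: the `ImporterGraph` type and both function names are hypothetical and not part of mcpp. It assumes a map from each module to the modules that import it, and it treats every transitive importer as dirty, which is the worst case when no restat pruning applies.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy graph: importers[m] lists the modules that import m.
using ImporterGraph = std::map<std::string, std::vector<std::string>>;

// Global dyndep: every compile edge lists build.ninja.dd as an implicit
// input, so regenerating that one file dirties every module.
std::size_t dirty_with_global_dyndep(std::size_t module_count) {
    return module_count;
}

// Per-file dyndep: only the changed module and the modules that import it
// (directly or transitively) can become dirty.
std::set<std::string> dirty_with_per_file_dyndep(const ImporterGraph& importers,
                                                 const std::string& changed) {
    std::set<std::string> dirty{changed};
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const auto it = importers.find(current);
        if (it == importers.end()) continue;  // nothing imports this module
        for (const auto& importer : it->second) {
            if (dirty.insert(importer).second) pending.push(importer);
        }
    }
    return dirty;
}
```

With 39 modules the global scheme always reports 39 dirty edges, while the per-file scheme reports only the touched module and its importers.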
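### 2.2 Problem 2: the mcpp frontend recomputes everything on every run

Even when no file has changed, mcpp still spends ~10s on:
- launching the xlings subprocess to resolve the toolchain
- scanning every source file for module declarations
- generating the BuildPlan
- rewriting build.ninja

In most incremental-build scenarios the results of these steps do not change.
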
## 3. Optimization Strategies

### 3.1 Strategy 1: per-file dyndep (largest impact)

**Goal**: changing one file recompiles only that file and its downstream dependents.

**Approach**: split the global `build.ninja.dd` into per-file dyndep files.

```ninja
# Current (global dyndep):
build build.ninja.dd : cxx_collect obj/cli.cppm.ddi obj/ui.cppm.ddi ...
build obj/cli.m.o | gcm.cache/mcpp.cli.gcm : cxx_module src/cli.cppm | build.ninja.dd
  dyndep = build.ninja.dd

# Optimized (per-file dyndep):
build obj/cli.cppm.dd : cxx_dyndep obj/cli.cppm.ddi
  restat = 1
build obj/cli.m.o | gcm.cache/mcpp.cli.gcm : cxx_module src/cli.cppm | obj/cli.cppm.dd
  dyndep = obj/cli.cppm.dd
```

**Effect**: when `ui.cppm` is touched, only `ui.cppm.ddi` and `ui.cppm.dd` change, so only `ui.m.o` and the downstream files that depend on `mcpp.ui` need to be recompiled.

**Implementation complexity**: medium. It requires changing the emit logic in `ninja_backend.cppm` and the way `dyndep.cppm` generates its output.

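As a rough illustration of the emit side, the sketch below generates the per-file statements shown in the ninja snippet above. The `ModuleUnit` struct and `emit_per_file_dyndep` are hypothetical names, and the `obj/<stem>.cppm.dd` naming follows this report's example rather than any confirmed mcpp convention.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one module interface unit.
struct ModuleUnit {
    std::string source;  // e.g. "src/cli.cppm"
    std::string stem;    // e.g. "cli"
    std::string module;  // e.g. "mcpp.cli"
};

// Emits one dyndep edge and one compile edge per unit. No edge mentions a
// shared build.ninja.dd, so touching one source leaves the others clean.
std::string emit_per_file_dyndep(const std::vector<ModuleUnit>& units) {
    std::string out;
    for (const auto& u : units) {
        const std::string ddi = "obj/" + u.stem + ".cppm.ddi";
        const std::string dd = "obj/" + u.stem + ".cppm.dd";
        out += "build " + dd + " : cxx_dyndep " + ddi + "\n";
        out += "  restat = 1\n";
        out += "build obj/" + u.stem + ".m.o | gcm.cache/" + u.module +
               ".gcm : cxx_module " + u.source + " | " + dd + "\n";
        out += "  dyndep = " + dd + "\n\n";
    }
    return out;
}
```

Each compile edge names its own `.dd` file both as an implicit input and as its `dyndep` binding, which is what confines a change to that one file.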
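### 3.2 Strategy 2: BMI restat + copy_if_different (fewer cascading rebuilds)

**Goal**: when a module's interface is unchanged (only the implementation changed), stop the cascade of recompilation.

**Approach** (standard industry practice, used by CMake):
1. The compiler writes the BMI to a temporary file
2. Compare the temporary file with the current BMI
3. Overwrite the current BMI only if the contents differ; otherwise keep the old file and its timestamp
4. With `restat = 1`, ninja sees that the BMI did not change and skips the downstream edges

```ninja
rule cxx_module
  command = $cxx $cxxflags -c $in -o $out.tmp && $
      (cmp -s $bmi_out.tmp $bmi_out && rm $bmi_out.tmp || mv $bmi_out.tmp $bmi_out) && $
      mv $out.tmp $out
  restat = 1
```

**Effect**: editing a function body in `ui.cppm` without changing the interface → the BMI is unchanged → downstream importers of `mcpp.ui` are not recompiled.

**GCC note**: GCC regenerates the BMI file on every compile (the timestamp changes even when the content is identical), so copy_if_different has to be implemented at the build-system level.
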
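The compare-then-install step can be sketched in C++ with `std::filesystem`. The function names here are hypothetical, and the sketch assumes the freshly compiled BMI was written to a separate temporary path, as in the four steps above.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <system_error>

namespace fs = std::filesystem;

// Returns true when both files exist and hold identical bytes.
bool same_contents(const fs::path& a, const fs::path& b) {
    std::error_code ec;
    if (!fs::exists(a, ec) || !fs::exists(b, ec)) return false;
    if (fs::file_size(a, ec) != fs::file_size(b, ec)) return false;
    std::ifstream fa(a, std::ios::binary);
    std::ifstream fb(b, std::ios::binary);
    return std::equal(std::istreambuf_iterator<char>(fa), std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(fb), std::istreambuf_iterator<char>());
}

// Installs `fresh` over `current` only when the bytes differ.
// Returns true if `current` was replaced.
bool install_if_different(const fs::path& fresh, const fs::path& current) {
    if (same_contents(fresh, current)) {
        fs::remove(fresh);       // keep the old BMI and its old timestamp
        return false;
    }
    fs::rename(fresh, current);  // interface changed: publish the new BMI
    return true;
}
```

When the function returns `false`, the BMI's modification time is untouched, which is the condition `restat = 1` relies on to skip downstream edges.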
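### 3.3 Strategy 3: mcpp frontend caching (cut the ~10s frontend overhead)

**Goal**: with no changes, mcpp should finish in <1s.

**Approach**:

1. **Fast dirty check**: before invoking the scanner or make_plan, check whether `build.ninja` is newer than every source file. If so, skip straight to running ninja.

2. **Incremental scanner**: cache the previous scan result (the module graph) and rescan only the files that were modified.

3. **Toolchain cache**: cache the toolchain resolve result in `.mcpp/cache/toolchain.json` so the xlings subprocess is not launched on every run.

**Effect**: no changes → ~0.1s (straight to a ninja no-op); one file changed → ~1s (incremental scan + ninja).
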
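The fast dirty check in item 1 amounts to a timestamp comparison. A minimal sketch follows, assuming the caller already has the list of inputs (sources plus the manifest). `can_skip_frontend` is a hypothetical name, not the function used in mcpp.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True when build.ninja exists and is strictly newer than every input.
// Any missing file or any input at least as new as build.ninja forces
// the full frontend to run.
bool can_skip_frontend(const fs::path& build_ninja, const std::vector<fs::path>& inputs) {
    std::error_code ec;
    const auto ninja_time = fs::last_write_time(build_ninja, ec);
    if (ec) return false;  // no build.ninja yet
    for (const auto& input : inputs) {
        const auto input_time = fs::last_write_time(input, ec);
        if (ec || input_time >= ninja_time) return false;
    }
    return true;
}
```

Treating equal timestamps as dirty is a deliberately conservative choice here: on filesystems with coarse timestamp resolution it costs an occasional unnecessary frontend run but avoids missing an edit.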
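### 3.4 Strategy 4: Clang two-phase compilation (future multi-toolchain support)

**Today**: GCC produces the BMI and the .o in a single step, so the dependency is serialized.

**Clang supports two phases**:
```
Phase 1: clang --precompile A.cppm -o A.pcm (produces the BMI)
Phase 2: clang -c A.pcm -o A.o (BMI → .o)
```

**Benefit**: once A's BMI is ready, B can start building its own BMI while A continues on to its .o, which increases parallelism.

**Clang also supports Reduced BMI** (`-fmodules-reduced-bmi`): the BMI contains only interface information and no implementation details, so it is smaller and triggers fewer cascades.
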
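The parallelism gain can be estimated with a deliberately simplified model. The sketch assumes a linear chain of `n` modules in which each one imports the previous one, uniform per-module costs, and enough parallel jobs that object generation never waits for a free slot. Both function names are hypothetical, and the figures are not measurements.

```cpp
#include <cassert>
#include <cstddef>

// Single-phase (GCC-style edge): the BMI and the object file come out of the
// same build edge, so an importer waits for the whole edge to finish.
double single_phase_makespan(std::size_t n, double bmi_cost, double obj_cost) {
    return static_cast<double>(n) * (bmi_cost + obj_cost);
}

// Two-phase (Clang-style): an importer starts as soon as the imported BMI
// exists; object generation for earlier modules overlaps with later BMIs.
// The last module still needs its own object after its BMI.
double two_phase_makespan(std::size_t n, double bmi_cost, double obj_cost) {
    if (n == 0) return 0.0;
    return static_cast<double>(n) * bmi_cost + obj_cost;
}
```

For a chain of 3 modules with a BMI cost of 1 and an object cost of 2, the model gives 9 time units for the single-phase edge and 5 for the two-phase split.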
## 4. Architecture Recommendations

### 4.1 Build backend abstraction layer

The current `Backend` interface is already abstract, but NinjaBackend is the only implementation. Suggested extensions:

```
Backend (abstract)
├── NinjaBackend (current, GCC + Ninja)
├── NinjaClangBackend (future, Clang + Ninja, two-phase compilation)
├── MSBuildBackend (future, MSVC)
└── DirectBackend (future, no ninja; mcpp schedules compilation directly)
```

### 4.2 Scanner abstraction layer

```
ModuleScanner (abstract)
├── RegexScanner (current, fast but imprecise)
├── P1689Scanner (current, GCC -fdeps-format=p1689r5)
├── ClangScanDepsScanner (future, clang-scan-deps)
└── CachedScanner (decorator; caches the previous result and updates incrementally)
```

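The `CachedScanner` decorator could look roughly like the sketch below. The `ScanResult` fields, the `scan` signature, and the `StampFn` callback (which would typically return a file's modification time or content hash) are all assumptions made for illustration; the real mcpp interfaces may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scan result: the module a file provides and what it imports.
struct ScanResult {
    std::string provides;
    std::vector<std::string> imports;
};

class ModuleScanner {
public:
    virtual ~ModuleScanner() = default;
    virtual ScanResult scan(const std::string& path) = 0;
};

// Returns a change stamp for a file (assumed: mtime or content hash).
using StampFn = std::function<std::uint64_t(const std::string&)>;

// Decorator: forwards to the wrapped scanner only when the stamp changed.
class CachedScanner final : public ModuleScanner {
public:
    CachedScanner(ModuleScanner& inner, StampFn stamp)
        : inner_(inner), stamp_(std::move(stamp)) {}

    ScanResult scan(const std::string& path) override {
        const std::uint64_t current = stamp_(path);
        const auto it = cache_.find(path);
        if (it != cache_.end() && it->second.first == current)
            return it->second.second;  // file unchanged: reuse cached result
        ScanResult fresh = inner_.scan(path);
        cache_[path] = std::make_pair(current, fresh);
        return fresh;
    }

private:
    ModuleScanner& inner_;
    StampFn stamp_;
    std::map<std::string, std::pair<std::uint64_t, ScanResult>> cache_;
};
```

Because the cache sits behind the same `ModuleScanner` interface, it can wrap a regex-based or P1689-based scanner without either knowing about it.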
### 4.3 BMI management layer

```
BmiManager
├── ProjectBmiCache (per-project target/ directory)
├── GlobalBmiCache (current $MCPP_HOME/bmi/, shared across projects)
├── BmiRestatHelper (copy_if_different + restat mechanism)
└── BmiContentHash (future, based on a hash of the BMI content rather than timestamps)
```

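For the `BmiContentHash` idea, one possible shape is to store a hash of the last published BMI and compare it with the hash of the newly produced bytes. The sketch below uses 64-bit FNV-1a purely as an example; the report does not say which hash mcpp would use, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over a byte string (example hash, chosen for brevity).
std::uint64_t fnv1a64(const std::string& bytes) {
    std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char byte : bytes) {
        hash ^= byte;
        hash *= 1099511628211ull;                  // FNV prime
    }
    return hash;
}

// Importers need a rebuild only when the BMI bytes hash differently from
// the previously recorded value.
bool interface_changed(std::uint64_t previous_hash, const std::string& new_bmi_bytes) {
    return fnv1a64(new_bmi_bytes) != previous_hash;
}
```

A content hash removes the dependence on file timestamps, at the cost of reading the whole BMI once per compile. A production version would likely prefer a stronger hash to make collisions negligible.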
### 4.4 Toolchain abstraction layer

```
Toolchain
├── GccToolchain (current, GCC 16.1)
├── ClangToolchain (future)
├── MsvcToolchain (future)
└── ToolchainCache (caches the resolve result)
```

## 5. Recommended Priorities

| Priority | Strategy | Expected benefit | Implementation complexity |
|---|---|---|---|
| P0 | Fast frontend dirty check | No changes: 10s → <0.5s | |
| P1 | per-file dyndep | One file changed: 21s → ~3s | |
| P2 | BMI restat + copy_if_different | Implementation-only change → 0 cascading rebuilds | |
| P3 | Incremental scanner | Scanner time reduced by 80%+ | |
| P4 | Toolchain resolve cache | Saves 1-2s of startup overhead | |
| P5 | Clang two-phase compilation support | Higher parallelism, fewer cascades | |

## 6. Industry References

| Build system | Module compilation strategy | Incremental approach |
|---|---|---|
| CMake 3.28+ | per-file scan + per-target collation dyndep | restat + copy_if_different |
| build2 | GCC module mapper protocol (dependencies discovered dynamically at compile time) | No scan phase needed |
| xmake | Compiler-native scan + jobgraph parallelism | Incremental scan |

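## 7. Summary

mcpp's module build foundation is sound (three-phase dyndep pipeline + BMI cache + fingerprint isolation), but incremental build efficiency has significant room for improvement. The two biggest wins are:

1. **Frontend dirty check** (P0): immediately cuts the no-change case from 10s to <0.5s
2. **per-file dyndep** (P1): cuts the single-file-change case from 21s to ~3s

Neither optimization affects correctness or requires architectural changes, and both can be rolled out incrementally.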

src/build/backend.cppm

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ struct BuildResult {
     std::chrono::milliseconds elapsed { 0 };
     std::size_t cacheHits = 0;
     std::size_t cacheMisses = 0;
+    std::string ninjaProgram; // P0: cached for fast-path rebuilds
 };
 
 struct BuildError {

src/build/ninja_backend.cppm

Lines changed: 57 additions & 15 deletions
@@ -187,8 +187,33 @@ std::string emit_ninja_string(const BuildPlan& plan) {
     append(" command = mkdir -p $$(dirname $out) && cp -f $in $out\n");
     append(" description = STAGE $out\n\n");
 
+    // P1: per-file dyndep rule. Converts one .ddi → .dd independently.
+    append("rule cxx_dyndep\n");
+    append(" command = $mcpp dyndep --single --output $out $in\n");
+    append(" description = DYNDEP $out\n");
+    append(" restat = 1\n\n");
+
+    // P2: cxx_module preserves BMI timestamps when interface is unchanged.
+    // GCC always updates the .gcm timestamp even if content is identical.
+    // We backup the BMI before compilation, compile, then restore the old
+    // file if content is byte-identical. Combined with restat = 1 in the
+    // dyndep file, this prevents cascading rebuilds when only the module
+    // implementation changed (not the interface).
+    //
+    // $bmi_out is set per build edge to the BMI path (gcm.cache/<module>.gcm).
+    // If $bmi_out is empty (no module provided), we just compile normally.
     append("rule cxx_module\n");
-    append(" command = $cxx $cxxflags -c $in -o $out\n");
+    append(" command = "
+           "if [ -n \"$bmi_out\" ] && [ -f \"$bmi_out\" ]; then "
+           "cp -p \"$bmi_out\" \"$bmi_out.bak\"; "
+           "fi && "
+           "$cxx $cxxflags -c $in -o $out && "
+           "if [ -n \"$bmi_out\" ] && [ -f \"$bmi_out.bak\" ] && "
+           "cmp -s \"$bmi_out\" \"$bmi_out.bak\"; then "
+           "mv \"$bmi_out.bak\" \"$bmi_out\"; "
+           "else "
+           "rm -f \"$bmi_out.bak\"; "
+           "fi\n");
     append(" description = MOD $out\n");
     if (dyndep)
         append(" restat = 1\n");
@@ -287,16 +312,21 @@ std::string emit_ninja_string(const BuildPlan& plan) {
     }
     append("\n");
 
-    // ── Phase 2: collect into dyndep file. ──────────────────────────
-    std::string ddi_inputs;
-    for (auto& d : ddi_paths)
-        ddi_inputs += " " + d;
-    append("build build.ninja.dd : cxx_collect" + ddi_inputs + "\n\n");
+    // ── Phase 2: per-file dyndep (P1 optimization). ────────────────
+    // Each .ddi → .dd independently, so modifying one source file only
+    // invalidates that file's .dd and its compile edge, not all edges.
+    // Map ddi path → dd path for Phase 3 reference.
+    std::map<std::string, std::string> ddi_to_dd;
+    for (auto& ddi : ddi_paths) {
+        auto dd = ddi + ".dd"; // e.g. obj/cli.cppm.ddi.dd
+        ddi_to_dd[ddi] = dd;
+        append(std::format("build {} : cxx_dyndep {}\n", dd, ddi));
+    }
+    append("\n");
 
-    // ── Phase 3: compile edges with dyndep. ─────────────────────────
-    // BMI implicit outputs are still declared statically (we know
-    // them from the plan); the dyndep file adds implicit BMI INPUTS
-    // (the requires) so ninja schedules in the right order.
+    // ── Phase 3: compile edges with per-file dyndep. ────────────────
+    // Each compile edge references its OWN .dd file instead of a global one.
+    // P2: module compile edges get a $bmi_out variable for BMI preservation.
     for (auto& cu : plan.compileUnits) {
         std::string rule = pick_rule(cu.source);
 
@@ -306,10 +336,19 @@ std::string emit_ninja_string(const BuildPlan& plan) {
         }
         out_line += std::format(" : {} {}", rule, escape_ninja_path(cu.source));
         if (rule != "c_object") {
-            // build.ninja.dd is the dyndep file; ninja requires it as an
-            // implicit input (so it's built before the compile runs).
-            out_line += " | build.ninja.dd";
-            out_line += "\n dyndep = build.ninja.dd\n";
+            auto ddi = (cu.object.parent_path() / cu.source.filename()).string() + ".ddi";
+            auto it = ddi_to_dd.find(ddi);
+            if (it != ddi_to_dd.end()) {
+                out_line += " | " + it->second;
+                out_line += "\n dyndep = " + it->second;
+                // P2: set bmi_out for the copy_if_different logic in cxx_module.
+                if (cu.providesModule) {
+                    out_line += "\n bmi_out = " + bmi_path(*cu.providesModule);
+                }
+                out_line += "\n";
+            } else {
+                out_line += "\n";
+            }
         } else {
             out_line += "\n";
         }
@@ -446,6 +485,10 @@ std::expected<BuildResult, BuildError> NinjaBackend::build(const BuildPlan& plan
     std::string ninjaProgram =
         !ninjaBin.empty() ? std::format("'{}'", ninjaBin.string()) : std::string{"ninja"};
 
+    // Record ninja binary for P0 fast-path cache.
+    BuildResult r;
+    r.ninjaProgram = ninjaProgram;
+
     std::string cmd = std::format("{} -C '{}'", ninjaProgram, plan.outputDir.string());
     if (opts.verbose)
         cmd += " -v";
@@ -459,7 +502,6 @@ std::expected<BuildResult, BuildError> NinjaBackend::build(const BuildPlan& plan
         std::fputs(out.c_str(), stdout);
     }
 
-    BuildResult r;
     r.exitCode = ok ? 0 : 1;
     r.elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
         std::chrono::steady_clock::now() - t0);
