fix: stabilize D2F LoRA decoding under preempt-rebuild scenarios #40
Osc-7 wants to merge 13 commits into
Conversation
…uest flow
- add preempt-rebuild request state handling in multi_block request template
- prevent decode livelock by recycling cached head block when frontier is lost
- align no_shift prefill sampling with resume_prefill_until window to avoid out-of-window block sampling

This keeps the fix scoped to request/no_shift without experimental guards or benchmark config changes.
📝 Walkthrough

Adds gitignore patterns, resume-aware prefill and safer logits indexing in samplers, threads resume/cache/terminal-context state through multi-block request control flow, extends LoRA loader mapping, exposes mask-token override via engine/server, updates bench configs and cookbook docs, and updates an example stop token.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Engine as Engine
    participant Req as MultiBlockReq
    participant Cache as Cache
    participant Sampler as Sampler
    Client->>Engine: submit request / resume
    Engine->>Req: lazy_activate / set _resume_prefill_until
    Req->>Cache: compute visible cached prefix (to_cache_last_token_id)
    Cache-->>Req: visible cached prefix id
    Req->>Sampler: schedule prefill (skip blocks beyond resume or with empty local_ids)
    Sampler->>Cache: fetch last logits (safe index / fallback)
    Sampler-->>Engine: token outputs / prefill progress
```
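As a concrete illustration of the "safe index / fallback" step in the diagram above, here is a minimal sketch of bounds-checked last-logit selection; the function name, signature, and tensor layout are assumptions for illustration, not code from this PR.

```python
from typing import Optional

import torch

def last_prefill_logits(logits: torch.Tensor, last_pos: Optional[int]) -> torch.Tensor:
    """Pick the logits row for the last prefill token, with a bounds-checked fallback."""
    seq_len = logits.size(0)  # logits assumed shaped [seq_len, vocab_size] for one request
    if last_pos is None or not (0 <= last_pos < seq_len):
        # Fallback: use the final row instead of indexing out of range,
        # e.g. when a preempt-rebuild invalidated the recorded position.
        last_pos = seq_len - 1
    return logits[last_pos]
```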
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
diffulex/strategy_template/multi_block/engine/request.py (1)
11-25: ⚠️ Potential issue | 🟡 Minor

Initialize `_resume_prefill_until` in `_restore_req_runtime_state` to prevent `AttributeError` on unpickling.

`_resume_prefill_until` is initialized only in `init_multi_block` (line 31) but not restored in `_restore_req_runtime_state`. When a multi-block request is unpickled via `__setstate__`, if the attribute is missing from the pickled state (edge case: old pickled requests, failed migrations, etc.), subsequent accesses at lines 232, 415, 507, 550 will raise `AttributeError`. The sampler already uses a defensive `getattr(req, "_resume_prefill_until", 0)` at line 99 of `no_shift.py`, establishing the pattern.

Initialize it in `_restore_req_runtime_state` to match the mixin pattern shown in the test suite:

Suggested fix
```diff
 def _restore_req_runtime_state(self):
     super()._restore_req_runtime_state()
     if not self.is_multi_block:
         return
+    self._resume_prefill_until = getattr(self, "_resume_prefill_until", 0)
+
     dllm_blocks = self.dllm_blocks
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/strategy_template/multi_block/engine/request.py` around lines 11 - 25, The _restore_req_runtime_state method must ensure the _resume_prefill_until attribute exists after unpickling to avoid AttributeError; update the _restore_req_runtime_state implementation to initialize self._resume_prefill_until = getattr(self, "_resume_prefill_until", 0) (or assign 0 if missing) before any sampler or block logic runs—do this alongside the existing multi-block handling in _restore_req_runtime_state (the same place that accesses dllm_blocks and dllm_block_buffer) so it matches initialization in init_multi_block and the defensive pattern used in no_shift.py.
🧹 Nitpick comments (2)
diffulex_bench/configs/llada_instruct_gsm8k.yml (1)
19-21: Prefill concurrency is effectively capped at 1 with these values.

With `max_num_batched_tokens == max_model_len` (Line 20 and Line 19), warmup prefill concurrency becomes 1 even though `max_num_reqs` is 24 (Line 21). If this profile is meant for stability only, that's fine, but if benchmark throughput matters, consider setting `max_num_batched_tokens` above `max_model_len` (or add a comment explaining the intentional tradeoff).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex_bench/configs/llada_instruct_gsm8k.yml` around lines 19 - 21, The config sets max_num_batched_tokens equal to max_model_len which forces warmup prefill concurrency to 1 despite max_num_reqs being 24; either raise max_num_batched_tokens above max_model_len (e.g., to a multiple of max_model_len) to allow higher prefill concurrency or add a comment nearby documenting that this equality is intentional to cap concurrency for stability; update the values or add the explanatory comment referencing the keys max_model_len, max_num_batched_tokens, and max_num_reqs.
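To make the tradeoff concrete, a sketch of the two options in YAML; the numbers below are illustrative placeholders, not the values from llada_instruct_gsm8k.yml.

```yaml
# Option A: allow more than one warmup prefill per step.
max_model_len: 2048            # illustrative value
max_num_batched_tokens: 4096   # set above max_model_len, e.g. 2x
max_num_reqs: 24

# Option B: keep the two values equal, but document why.
# max_num_batched_tokens intentionally equals max_model_len to cap warmup
# prefill concurrency at 1 for stability.
```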
diffulex/strategy_template/multi_block/engine/request.py (1)

397-401: Redundant `continue` in the preempt demotion loop.

If `block.is_to_cache` is True, `block.is_in_cache` is False (the statuses are mutually exclusive), so the `if block.is_in_cache:` branch is already a no-op for those blocks. The `continue` at line 399 therefore skips nothing.

Either drop the condition, or if the intent is to document that TO_CACHE blocks within `rebuild_until` are deliberately preserved, convert it to a clarifying comment. As written, future readers may infer that TO_CACHE would otherwise be demoted here, which is not the case.

♻️ Suggested simplification
```diff
 if self.is_multi_block:
     rebuild_until = 0
     if self.is_decoding:
         rebuild_until = int(self.running_seq_start)
     self._resume_prefill_until = rebuild_until
-    # Scheduler preemption frees this req's page_table immediately after moving it
-    # back to WAITING. Any block still marked IN_CACHE would then point to KV pages
-    # that no longer belong to the req. Demote those blocks back to TO_CACHE so a
-    # resumed req rebuilds its cached prefix instead of reusing stale block state.
-    for block in self.dllm_blocks:
-        if rebuild_until > 0 and block.end <= rebuild_until and block.is_to_cache:
-            continue
-        if block.is_in_cache:
-            block.status = DllmBlockStatus.TO_CACHE
+    # Scheduler preemption frees this req's page_table immediately after moving it
+    # back to WAITING. Any block still marked IN_CACHE would then point to KV pages
+    # that no longer belong to the req. Demote IN_CACHE back to TO_CACHE so a
+    # resumed req rebuilds its cached prefix instead of reusing stale block state.
+    # TO_CACHE blocks are left untouched.
+    for block in self.dllm_blocks:
+        if block.is_in_cache:
+            block.status = DllmBlockStatus.TO_CACHE
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/strategy_template/multi_block/engine/request.py` around lines 397 - 401, The preempt demotion loop contains a redundant continue: in the loop over self.dllm_blocks, the check "if rebuild_until > 0 and block.end <= rebuild_until and block.is_to_cache: continue" is unnecessary because block.is_to_cache and block.is_in_cache are mutually exclusive; remove the continue (or replace the condition with a short clarifying comment) so the code simply checks "if block.is_in_cache: block.status = DllmBlockStatus.TO_CACHE" and, if you keep a note, add a comment near the loop that TO_CACHE blocks within rebuild_until are intentionally preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@diffulex/sampler/base/no_shift.py`:
- Around line 18-31: The method _maybe_log_prefill_alignment currently only
tracks req_ids in self._debug_prefill_alignment_logged and never emits logs,
wasting CPU because prefill_block_summaries is always built; fix by either
removing this dead scaffolding or implementing real gated logging: in the hot
path where prefill_block_summaries is constructed (the code that builds
per-block summaries and calls _maybe_log_prefill_alignment), wrap that
construction and the call with a logger.isEnabledFor(logging.DEBUG) check, and
update _maybe_log_prefill_alignment to call logger.debug(...) with a concise
summary (using req_id and the passed block_summaries) while still using
self._debug_prefill_alignment_logged to de-duplicate; reference
_maybe_log_prefill_alignment, prefill_block_summaries (the summary construction
site), and self._debug_prefill_alignment_logged when making the changes.
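For reference, a minimal sketch of the gated, de-duplicated logging pattern the prompt above describes; the class shape and the block/request attributes are assumptions based on names mentioned in this review, not the actual sampler code.

```python
import logging

logger = logging.getLogger(__name__)

class NoShiftSamplerSketch:
    """Illustrative only: DEBUG-gated, per-request de-duplicated alignment logging."""

    def __init__(self) -> None:
        self._debug_prefill_alignment_logged: set = set()

    def _maybe_log_prefill_alignment(self, req_id, block_summaries) -> None:
        if req_id in self._debug_prefill_alignment_logged:
            return  # already logged for this request
        self._debug_prefill_alignment_logged.add(req_id)
        logger.debug("prefill alignment for %s: %s", req_id, block_summaries)

    def _on_prefill(self, req) -> None:
        # Hot path: only build the per-block summaries when DEBUG logging is enabled.
        if logger.isEnabledFor(logging.DEBUG):
            summaries = [
                {"start": block.start, "end": block.end, "n_local_ids": len(block.local_ids)}
                for block in req.dllm_blocks
            ]
            self._maybe_log_prefill_alignment(req.req_id, summaries)
```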
In `@diffulex/utils/loader.py`:
- Around line 54-57: Normalize target_modules to a sequence before the
comparison: if target_modules is a str (or other non-iterable intended as a
single module), wrap it into a list so the any(...) check in the loader logic
works on module names rather than characters. Update the code surrounding
target_modules (used with name, leaf, rev_mapping and should_apply) to coerce
strings into [target_modules] (or otherwise ensure it's a list/tuple) before
computing effective and should_apply.
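A minimal sketch of the coercion described above; the helper names and the membership check are illustrative, only the string-to-list normalization is the point.

```python
def normalize_target_modules(target_modules):
    """Coerce a bare string (or None) into a list of module names."""
    if target_modules is None:
        return []
    if isinstance(target_modules, str):
        # Without this, `any(... for t in target_modules)` iterates characters.
        return [target_modules]
    return list(target_modules)

def should_apply(name: str, target_modules) -> bool:
    effective = normalize_target_modules(target_modules)
    leaf = name.split(".")[-1]
    return any(target in (leaf, name) for target in effective)
```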
---
Outside diff comments:
In `@diffulex/strategy_template/multi_block/engine/request.py`:
- Around line 11-25: The _restore_req_runtime_state method must ensure the
_resume_prefill_until attribute exists after unpickling to avoid AttributeError;
update the _restore_req_runtime_state implementation to initialize
self._resume_prefill_until = getattr(self, "_resume_prefill_until", 0) (or
assign 0 if missing) before any sampler or block logic runs—do this alongside
the existing multi-block handling in _restore_req_runtime_state (the same place
that accesses dllm_blocks and dllm_block_buffer) so it matches initialization in
init_multi_block and the defensive pattern used in no_shift.py.
---
Nitpick comments:
In `@diffulex_bench/configs/llada_instruct_gsm8k.yml`:
- Around line 19-21: The config sets max_num_batched_tokens equal to
max_model_len which forces warmup prefill concurrency to 1 despite max_num_reqs
being 24; either raise max_num_batched_tokens above max_model_len (e.g., to a
multiple of max_model_len) to allow higher prefill concurrency or add a comment
nearby documenting that this equality is intentional to cap concurrency for
stability; update the values or add the explanatory comment referencing the keys
max_model_len, max_num_batched_tokens, and max_num_reqs.
In `@diffulex/strategy_template/multi_block/engine/request.py`:
- Around line 397-401: The preempt demotion loop contains a redundant continue:
in the loop over self.dllm_blocks, the check "if rebuild_until > 0 and block.end
<= rebuild_until and block.is_to_cache: continue" is unnecessary because
block.is_to_cache and block.is_in_cache are mutually exclusive; remove the
continue (or replace the condition with a short clarifying comment) so the code
simply checks "if block.is_in_cache: block.status = DllmBlockStatus.TO_CACHE"
and, if you keep a note, add a comment near the loop that TO_CACHE blocks within
rebuild_until are intentionally preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7bbaa9ce-b02f-4e4c-a2d5-c1076ad99b9d
📒 Files selected for processing (9)
- .gitignore
- diffulex/sampler/base/no_shift.py
- diffulex/sampler/base/shift.py
- diffulex/strategy_template/multi_block/engine/request.py
- diffulex/utils/loader.py
- diffulex_bench/configs/dream_base_gsm8k.yml
- diffulex_bench/configs/fast_dllm_v2_gsm8k.yml
- diffulex_bench/configs/llada_instruct_gsm8k.yml
- diffulex_bench/configs/sdar_chat_gsm8k.yml
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
diffulex/strategy_template/multi_block/engine/request.py (1)
586-598: ⚠️ Potential issue | 🔴 Critical

Set a terminal context for every truncation path.

Line 586 only sets `terminal_context_block` for EOS/max-token/model-len stops, but Line 592 also gates deactivation on `max_nfe_reached` and `max_repetition_run_reached`. For those two stop reasons, `last_block_finished` stays false because `terminal_context_block` is still `None`, so the request can livelock instead of completing.

🐛 Proposed fix
```diff
-if self.eos_token_generated or self.max_new_tokens_reached or self.max_model_len_reached:
+if self.is_truncated:
     self.set_terminal_context_block(self.dllm_block_buffer.last_valid_block)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/strategy_template/multi_block/engine/request.py` around lines 586 - 598, The code only calls set_terminal_context_block(self.dllm_block_buffer.last_valid_block) when eos_token_generated/max_new_tokens_reached/max_model_len_reached are true, but later the completion check also includes max_nfe_reached and max_repetition_run_reached; ensure terminal_context_block is set for those two stop reasons too. Update the logic around set_terminal_context_block so that if any of eos_token_generated, max_new_tokens_reached, max_model_len_reached, max_nfe_reached, or max_repetition_run_reached is true you call set_terminal_context_block(self.dllm_block_buffer.last_valid_block) before evaluating the final conditional that checks last_block_finished; reference the flags named above and the methods/attributes set_terminal_context_block, terminal_context_block, last_block_finished, and dllm_block_buffer.last_valid_block when making the change.

diffulex/engine/engine.py (1)
75-84: ⚠️ Potential issue | 🟠 Major

Move tokenizer loading and mask-token normalization before spawning worker processes.

With the `spawn` context, `config` is serialized when `process.start()` executes. Worker ranks are spawned on lines 29–31 before the tokenizer is loaded and `config.mask_token_id` is normalized on lines 32–35, causing workers to retain the stale/default mask token ID while rank 0 receives the corrected value.

Proposed fix
```diff
 self.ps = []
 self.events = []
+self.tokenizer = AutoTokenizer.from_pretrained(config.model, use_fast=True, trust_remote_code=True)
+config.tokenizer_vocab_size = len(self.tokenizer)
+config.eos = self.tokenizer.eos_token_id
+maybe_override_mask_token_id(config, self.tokenizer)
+
 ctx = mp.get_context("spawn")
 for i in range(1, self.model_parallel_world_size):
     event = ctx.Event()
     process = ctx.Process(target=AutoModelRunner.from_config, args=(config, i, event))
     process.start()
     self.ps.append(process)
     self.events.append(event)
-self.tokenizer = AutoTokenizer.from_pretrained(config.model, use_fast=True, trust_remote_code=True)
-config.tokenizer_vocab_size = len(self.tokenizer)
-config.eos = self.tokenizer.eos_token_id
-maybe_override_mask_token_id(config, self.tokenizer)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/engine/engine.py` around lines 75 - 84, Move the tokenizer initialization and mask-token normalization so they run before spawning worker processes: load AutoTokenizer.from_pretrained and set config.tokenizer_vocab_size, config.eos and call maybe_override_mask_token_id(config, tokenizer) prior to the loop that creates ctx.Event and ctx.Process and starts AutoModelRunner.from_config; this ensures the serialized config passed to each child (via the ctx.Process start loop that appends to self.ps/self.events and targets AutoModelRunner.from_config) contains the corrected mask_token_id and tokenizer-derived fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@diffulex/engine/engine.py`:
- Around line 32-40: maybe_override_mask_token_id is unconditionally overwriting
user-provided mask_token_id; detect whether the user explicitly passed
--mask-token-id by checking config_kwargs in DiffulexEngine.__init__ (set a
boolean mask_token_id_explicit) and pass that flag into
maybe_override_mask_token_id; inside maybe_override_mask_token_id, skip the
tokenizer-based override when mask_token_id_explicit is True so CLI/config
values are preserved, and ensure you set/pass the explicit flag before worker
processes are spawned so non-rank-0 runners receive the correct config.
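A sketch of the explicit-flag approach suggested above; the signature and call site are assumptions that only illustrate the intent, and the real code in diffulex/engine/engine.py may differ.

```python
def maybe_override_mask_token_id(config, tokenizer, *, mask_token_id_explicit: bool = False) -> None:
    # If the user passed --mask-token-id explicitly, never override it.
    if mask_token_id_explicit:
        return
    tokenizer_mask_token_id = getattr(tokenizer, "mask_token_id", None)
    if tokenizer_mask_token_id is not None:
        config.mask_token_id = tokenizer_mask_token_id

# Call-site sketch: derive explicitness from the kwargs the engine was built with,
# and do this before worker processes are spawned so every rank sees the final value.
# maybe_override_mask_token_id(
#     config, tokenizer,
#     mask_token_id_explicit="mask_token_id" in config_kwargs,
# )
```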
---
Outside diff comments:
In `@diffulex/engine/engine.py`:
- Around line 75-84: Move the tokenizer initialization and mask-token
normalization so they run before spawning worker processes: load
AutoTokenizer.from_pretrained and set config.tokenizer_vocab_size, config.eos
and call maybe_override_mask_token_id(config, tokenizer) prior to the loop that
creates ctx.Event and ctx.Process and starts AutoModelRunner.from_config; this
ensures the serialized config passed to each child (via the ctx.Process start
loop that appends to self.ps/self.events and targets
AutoModelRunner.from_config) contains the corrected mask_token_id and
tokenizer-derived fields.
In `@diffulex/strategy_template/multi_block/engine/request.py`:
- Around line 586-598: The code only calls
set_terminal_context_block(self.dllm_block_buffer.last_valid_block) when
eos_token_generated/max_new_tokens_reached/max_model_len_reached are true, but
later the completion check also includes max_nfe_reached and
max_repetition_run_reached; ensure terminal_context_block is set for those two
stop reasons too. Update the logic around set_terminal_context_block so that if
any of eos_token_generated, max_new_tokens_reached, max_model_len_reached,
max_nfe_reached, or max_repetition_run_reached is true you call
set_terminal_context_block(self.dllm_block_buffer.last_valid_block) before
evaluating the final conditional that checks last_block_finished; reference the
flags named above and the methods/attributes set_terminal_context_block,
terminal_context_block, last_block_finished, and
dllm_block_buffer.last_valid_block when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 17fad98f-c049-44e2-a280-cc0a73608e66
📒 Files selected for processing (6)
- diffulex/engine/engine.py
- diffulex/sampler/base/no_shift.py
- diffulex/server/args.py
- diffulex/strategy_template/multi_block/engine/request.py
- diffulex/utils/loader.py
- examples/streamlit_block_append_chat.py
✅ Files skipped from review due to trivial changes (1)
- examples/streamlit_block_append_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
- diffulex/utils/loader.py
♻️ Duplicate comments (1)
diffulex/engine/engine.py (1)
24-55: ⚠️ Potential issue | 🟠 Major

Preserve explicit `mask_token_id` overrides from CLI/config.

`maybe_override_mask_token_id` still unconditionally replaces `config.mask_token_id` when the tokenizer/hf_config has a different value, so an explicit `--mask-token-id` can still be ignored (see `diffulex/server/args.py` Lines 92-93 passing it through). This was previously flagged and remains unresolved.

🐛 Proposed fix
```diff
-def maybe_override_mask_token_id(config: Config, tokenizer) -> None:
+def maybe_override_mask_token_id(
+    config: Config,
+    tokenizer,
+    *,
+    mask_token_id_explicit: bool = False,
+) -> None:
@@
+    if mask_token_id_explicit:
+        return
+
     tokenizer_mask_token_id = getattr(tokenizer, "mask_token_id", None)
@@
-        maybe_override_mask_token_id(config, self.tokenizer)
+        maybe_override_mask_token_id(
+            config,
+            self.tokenizer,
+            mask_token_id_explicit="mask_token_id" in config_kwargs,
+        )
```

Also applies to: 77-77
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/engine/engine.py` around lines 24 - 55, The function maybe_override_mask_token_id currently unconditionally replaces config.mask_token_id from the tokenizer branch, which lets explicit CLI/config overrides be lost; change both override sites in maybe_override_mask_token_id (the tokenizer branch that sets config.mask_token_id from tokenizer.mask_token_id and the hf_config branch) to only assign when the current config.mask_token_id equals the class default sentinel (Config.mask_token_id). In other words, obtain default_mask_token_id = Config.mask_token_id and wrap the tokenizer override in a guard like "and config.mask_token_id == default_mask_token_id" (the hf_config branch already has this check—ensure the same logic is applied to the tokenizer branch and any duplicate occurrence around the later block).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@diffulex/engine/engine.py`:
- Around line 24-55: The function maybe_override_mask_token_id currently
unconditionally replaces config.mask_token_id from the tokenizer branch, which
lets explicit CLI/config overrides be lost; change both override sites in
maybe_override_mask_token_id (the tokenizer branch that sets
config.mask_token_id from tokenizer.mask_token_id and the hf_config branch) to
only assign when the current config.mask_token_id equals the class default
sentinel (Config.mask_token_id). In other words, obtain default_mask_token_id =
Config.mask_token_id and wrap the tokenizer override in a guard like "and
config.mask_token_id == default_mask_token_id" (the hf_config branch already has
this check—ensure the same logic is applied to the tokenizer branch and any
duplicate occurrence around the later block).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ea6e083-500e-40c3-9eb8-32544fe5a09c
📒 Files selected for processing (11)
- diffulex/engine/engine.py
- diffulex/strategy_template/multi_block/engine/request.py
- diffulex_bench/configs/dream_base_gsm8k.yml
- diffulex_bench/configs/llada_instruct_gsm8k.yml
- docs/cookbook/benchmark.md
- docs/cookbook/index.md
- docs/cookbook/models/dream_base.md
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/sdar_chat.md
- docs/cookbook/server.md
✅ Files skipped from review due to trivial changes (7)
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/server.md
- docs/cookbook/benchmark.md
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/dream_base.md
- docs/cookbook/index.md
- docs/cookbook/models/sdar_chat.md
🚧 Files skipped from review as they are similar to previous changes (2)
- diffulex_bench/configs/llada_instruct_gsm8k.yml
- diffulex_bench/configs/dream_base_gsm8k.yml
♻️ Duplicate comments (1)
diffulex/engine/engine.py (1)
24-33: ⚠️ Potential issue | 🟡 Minor

Explicit `mask_token_id` is still ambiguous when the user passes the default value.

The helper infers "explicitness" by value (`config.mask_token_id == Config.mask_token_id`), so an explicitly provided default can still be overridden. Please pass an explicit boolean from `config_kwargs` to preserve user intent in all cases.

🐛 Minimal fix
```diff
-def maybe_override_mask_token_id(config: Config, tokenizer) -> None:
+def maybe_override_mask_token_id(
+    config: Config,
+    tokenizer,
+    *,
+    mask_token_id_explicit: bool = False,
+) -> None:
@@
+    if mask_token_id_explicit:
+        return
+
     default_mask_token_id = Config.mask_token_id
@@
-        maybe_override_mask_token_id(config, self.tokenizer)
+        maybe_override_mask_token_id(
+            config,
+            self.tokenizer,
+            mask_token_id_explicit="mask_token_id" in config_kwargs,
+        )
```

Also applies to: 35-42, 85-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/engine/engine.py` around lines 24 - 33, The current maybe_override_mask_token_id logic infers whether the user explicitly set mask_token_id by comparing values, which misclassifies an explicitly-provided default; change the function to accept an explicit boolean flag (e.g., mask_token_id_explicit) passed from config_kwargs and use that flag instead of value equality to decide whether to skip tokenizer/HF overrides; update the call sites that construct Config (where config_kwargs are built) to set this boolean when the user provided mask_token_id, and apply the same change to the other similar checks referenced (the duplicate logic around lines 35-42 and the check at 85) to use the explicit boolean parameter so user intent is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@diffulex/engine/engine.py`:
- Around line 24-33: The current maybe_override_mask_token_id logic infers
whether the user explicitly set mask_token_id by comparing values, which
misclassifies an explicitly-provided default; change the function to accept an
explicit boolean flag (e.g., mask_token_id_explicit) passed from config_kwargs
and use that flag instead of value equality to decide whether to skip
tokenizer/HF overrides; update the call sites that construct Config (where
config_kwargs are built) to set this boolean when the user provided
mask_token_id, and apply the same change to the other similar checks referenced
(the duplicate logic around lines 35-42 and the check at 85) to use the explicit
boolean parameter so user intent is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1f832dc-d836-4e03-a73e-4c0f992d3fd4
📒 Files selected for processing (6)
- diffulex/engine/engine.py
- docs/cookbook/models/dream_base.md
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/sdar_chat.md
- docs/cookbook/server.md
✅ Files skipped from review due to trivial changes (3)
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/models/dream_base.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/cookbook/models/sdar_chat.md
- docs/cookbook/server.md
Summary
This PR fixes multiple instability issues in D2F LoRA decoding, especially under preempt + rebuild scenarios.
The main problems were caused by inconsistent indexing and request state handling during prefill and decode phases, which could lead to incorrect logits alignment and unstable generation.
Key Changes
LoRA
packed_modules_mapping
D2F / Decode flow
Sampler
Request / KV cache
Results
Notes
Checklist
Summary by CodeRabbit
Bug Fixes
Configuration
New Features
Documentation
Chores