fix: stabilize D2F LoRA decoding under preempt-rebuild scenarios #40
Osc-7 wants to merge 13 commits into
Conversation
…uest flow
- add preempt-rebuild request state handling in multi_block request template
- prevent decode livelock by recycling cached head block when frontier is lost
- align no_shift prefill sampling with resume_prefill_until window to avoid out-of-window block sampling

This keeps the fix scoped to request/no_shift without experimental guards or benchmark config changes.
📝 Walkthrough

Adds gitignore patterns, resume-aware prefill and safer logits indexing in samplers, threads resume/cache/terminal-context state through multi-block request control flow, extends LoRA loader mapping, exposes mask-token override via engine/server, updates bench configs and cookbook docs, and updates an example stop token.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Engine as Engine
    participant Req as MultiBlockReq
    participant Cache as Cache
    participant Sampler as Sampler
    Client->>Engine: submit request / resume
    Engine->>Req: lazy_activate / set _resume_prefill_until
    Req->>Cache: compute visible cached prefix (to_cache_last_token_id)
    Cache-->>Req: visible cached prefix id
    Req->>Sampler: schedule prefill (skip blocks beyond resume or with empty local_ids)
    Sampler->>Cache: fetch last logits (safe index / fallback)
    Sampler-->>Engine: token outputs / prefill progress
```
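As a concrete illustration of the "safe index / fallback" step in the diagram above, here is a minimal sketch of bounds-checked last-logit selection; the function name, signature, and tensor layout are assumptions for illustration, not code from this PR.

```python
from typing import Optional

import torch

def last_prefill_logits(logits: torch.Tensor, last_pos: Optional[int]) -> torch.Tensor:
    """Pick the logits row for the last prefill token, with a bounds-checked fallback."""
    seq_len = logits.size(0)  # logits assumed shaped [seq_len, vocab_size] for one request
    if last_pos is None or not (0 <= last_pos < seq_len):
        # Fallback: use the final row instead of indexing out of range,
        # e.g. when a preempt-rebuild invalidated the recorded position.
        last_pos = seq_len - 1
    return logits[last_pos]
```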
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
diffulex/strategy_template/multi_block/engine/request.py (1)
11-25: ⚠️ Potential issue | 🟡 Minor

Initialize `_resume_prefill_until` in `_restore_req_runtime_state` to prevent `AttributeError` on unpickling.

`_resume_prefill_until` is initialized only in `init_multi_block` (line 31) but not restored in `_restore_req_runtime_state`. When a multi-block request is unpickled via `__setstate__`, if the attribute is missing from the pickled state (edge case: old pickled requests, failed migrations, etc.), subsequent accesses at lines 232, 415, 507, 550 will raise `AttributeError`. The sampler already uses a defensive `getattr(req, "_resume_prefill_until", 0)` at line 99 of `no_shift.py`, establishing the pattern.

Initialize it in `_restore_req_runtime_state` to match the mixin pattern shown in the test suite:

Suggested fix
```diff
 def _restore_req_runtime_state(self):
     super()._restore_req_runtime_state()
     if not self.is_multi_block:
         return
+    self._resume_prefill_until = getattr(self, "_resume_prefill_until", 0)
+
     dllm_blocks = self.dllm_blocks
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/strategy_template/multi_block/engine/request.py` around lines 11 - 25, The _restore_req_runtime_state method must ensure the _resume_prefill_until attribute exists after unpickling to avoid AttributeError; update the _restore_req_runtime_state implementation to initialize self._resume_prefill_until = getattr(self, "_resume_prefill_until", 0) (or assign 0 if missing) before any sampler or block logic runs—do this alongside the existing multi-block handling in _restore_req_runtime_state (the same place that accesses dllm_blocks and dllm_block_buffer) so it matches initialization in init_multi_block and the defensive pattern used in no_shift.py.
🧹 Nitpick comments (2)
diffulex_bench/configs/llada_instruct_gsm8k.yml (1)
19-21: Prefill concurrency is effectively capped at 1 with these values.

With `max_num_batched_tokens == max_model_len` (Line 20 and Line 19), warmup prefill concurrency becomes 1 even though `max_num_reqs` is 24 (Line 21). If this profile is meant for stability only, that's fine, but if benchmark throughput matters, consider setting `max_num_batched_tokens` above `max_model_len` (or add a comment explaining the intentional tradeoff).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex_bench/configs/llada_instruct_gsm8k.yml` around lines 19 - 21, The config sets max_num_batched_tokens equal to max_model_len which forces warmup prefill concurrency to 1 despite max_num_reqs being 24; either raise max_num_batched_tokens above max_model_len (e.g., to a multiple of max_model_len) to allow higher prefill concurrency or add a comment nearby documenting that this equality is intentional to cap concurrency for stability; update the values or add the explanatory comment referencing the keys max_model_len, max_num_batched_tokens, and max_num_reqs.
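To make the tradeoff concrete, a sketch of the two options in YAML; the numbers below are illustrative placeholders, not the values from llada_instruct_gsm8k.yml.

```yaml
# Option A: allow more than one warmup prefill per step.
max_model_len: 2048            # illustrative value
max_num_batched_tokens: 4096   # set above max_model_len, e.g. 2x
max_num_reqs: 24

# Option B: keep the two values equal, but document why.
# max_num_batched_tokens intentionally equals max_model_len to cap warmup
# prefill concurrency at 1 for stability.
```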
diffulex/strategy_template/multi_block/engine/request.py (1)

397-401: Redundant `continue` in the preempt demotion loop.

If `block.is_to_cache` is True, `block.is_in_cache` is False (the statuses are mutually exclusive), so the `if block.is_in_cache:` branch is already a no-op for those blocks. The `continue` at line 399 therefore skips nothing.

Either drop the condition, or if the intent is to document that TO_CACHE blocks within `rebuild_until` are deliberately preserved, convert it to a clarifying comment. As written, future readers may infer that TO_CACHE would otherwise be demoted here, which is not the case.

♻️ Suggested simplification
```diff
 if self.is_multi_block:
     rebuild_until = 0
     if self.is_decoding:
         rebuild_until = int(self.running_seq_start)
     self._resume_prefill_until = rebuild_until
-    # Scheduler preemption frees this req's page_table immediately after moving it
-    # back to WAITING. Any block still marked IN_CACHE would then point to KV pages
-    # that no longer belong to the req. Demote those blocks back to TO_CACHE so a
-    # resumed req rebuilds its cached prefix instead of reusing stale block state.
-    for block in self.dllm_blocks:
-        if rebuild_until > 0 and block.end <= rebuild_until and block.is_to_cache:
-            continue
-        if block.is_in_cache:
-            block.status = DllmBlockStatus.TO_CACHE
+    # Scheduler preemption frees this req's page_table immediately after moving it
+    # back to WAITING. Any block still marked IN_CACHE would then point to KV pages
+    # that no longer belong to the req. Demote IN_CACHE back to TO_CACHE so a
+    # resumed req rebuilds its cached prefix instead of reusing stale block state.
+    # TO_CACHE blocks are left untouched.
+    for block in self.dllm_blocks:
+        if block.is_in_cache:
+            block.status = DllmBlockStatus.TO_CACHE
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/strategy_template/multi_block/engine/request.py` around lines 397 - 401, The preempt demotion loop contains a redundant continue: in the loop over self.dllm_blocks, the check "if rebuild_until > 0 and block.end <= rebuild_until and block.is_to_cache: continue" is unnecessary because block.is_to_cache and block.is_in_cache are mutually exclusive; remove the continue (or replace the condition with a short clarifying comment) so the code simply checks "if block.is_in_cache: block.status = DllmBlockStatus.TO_CACHE" and, if you keep a note, add a comment near the loop that TO_CACHE blocks within rebuild_until are intentionally preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@diffulex/sampler/base/no_shift.py`:
- Around line 18-31: The method _maybe_log_prefill_alignment currently only
tracks req_ids in self._debug_prefill_alignment_logged and never emits logs,
wasting CPU because prefill_block_summaries is always built; fix by either
removing this dead scaffolding or implementing real gated logging: in the hot
path where prefill_block_summaries is constructed (the code that builds
per-block summaries and calls _maybe_log_prefill_alignment), wrap that
construction and the call with a logger.isEnabledFor(logging.DEBUG) check, and
update _maybe_log_prefill_alignment to call logger.debug(...) with a concise
summary (using req_id and the passed block_summaries) while still using
self._debug_prefill_alignment_logged to de-duplicate; reference
_maybe_log_prefill_alignment, prefill_block_summaries (the summary construction
site), and self._debug_prefill_alignment_logged when making the changes.
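For reference, a minimal sketch of the gated, de-duplicated logging pattern the prompt above describes; the class shape and the block/request attributes are assumptions based on names mentioned in this review, not the actual sampler code.

```python
import logging

logger = logging.getLogger(__name__)

class NoShiftSamplerSketch:
    """Illustrative only: DEBUG-gated, per-request de-duplicated alignment logging."""

    def __init__(self) -> None:
        self._debug_prefill_alignment_logged: set = set()

    def _maybe_log_prefill_alignment(self, req_id, block_summaries) -> None:
        if req_id in self._debug_prefill_alignment_logged:
            return  # already logged for this request
        self._debug_prefill_alignment_logged.add(req_id)
        logger.debug("prefill alignment for %s: %s", req_id, block_summaries)

    def _on_prefill(self, req) -> None:
        # Hot path: only build the per-block summaries when DEBUG logging is enabled.
        if logger.isEnabledFor(logging.DEBUG):
            summaries = [
                {"start": block.start, "end": block.end, "n_local_ids": len(block.local_ids)}
                for block in req.dllm_blocks
            ]
            self._maybe_log_prefill_alignment(req.req_id, summaries)
```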
In `@diffulex/utils/loader.py`:
- Around line 54-57: Normalize target_modules to a sequence before the
comparison: if target_modules is a str (or other non-iterable intended as a
single module), wrap it into a list so the any(...) check in the loader logic
works on module names rather than characters. Update the code surrounding
target_modules (used with name, leaf, rev_mapping and should_apply) to coerce
strings into [target_modules] (or otherwise ensure it's a list/tuple) before
computing effective and should_apply.
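A minimal sketch of the coercion described above; the helper names and the membership check are illustrative, only the string-to-list normalization is the point.

```python
def normalize_target_modules(target_modules):
    """Coerce a bare string (or None) into a list of module names."""
    if target_modules is None:
        return []
    if isinstance(target_modules, str):
        # Without this, `any(... for t in target_modules)` iterates characters.
        return [target_modules]
    return list(target_modules)

def should_apply(name: str, target_modules) -> bool:
    effective = normalize_target_modules(target_modules)
    leaf = name.split(".")[-1]
    return any(target in (leaf, name) for target in effective)
```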
---
Outside diff comments:
In `@diffulex/strategy_template/multi_block/engine/request.py`:
- Around line 11-25: The _restore_req_runtime_state method must ensure the
_resume_prefill_until attribute exists after unpickling to avoid AttributeError;
update the _restore_req_runtime_state implementation to initialize
self._resume_prefill_until = getattr(self, "_resume_prefill_until", 0) (or
assign 0 if missing) before any sampler or block logic runs—do this alongside
the existing multi-block handling in _restore_req_runtime_state (the same place
that accesses dllm_blocks and dllm_block_buffer) so it matches initialization in
init_multi_block and the defensive pattern used in no_shift.py.
---
Nitpick comments:
In `@diffulex_bench/configs/llada_instruct_gsm8k.yml`:
- Around line 19-21: The config sets max_num_batched_tokens equal to
max_model_len which forces warmup prefill concurrency to 1 despite max_num_reqs
being 24; either raise max_num_batched_tokens above max_model_len (e.g., to a
multiple of max_model_len) to allow higher prefill concurrency or add a comment
nearby documenting that this equality is intentional to cap concurrency for
stability; update the values or add the explanatory comment referencing the keys
max_model_len, max_num_batched_tokens, and max_num_reqs.
In `@diffulex/strategy_template/multi_block/engine/request.py`:
- Around line 397-401: The preempt demotion loop contains a redundant continue:
in the loop over self.dllm_blocks, the check "if rebuild_until > 0 and block.end
<= rebuild_until and block.is_to_cache: continue" is unnecessary because
block.is_to_cache and block.is_in_cache are mutually exclusive; remove the
continue (or replace the condition with a short clarifying comment) so the code
simply checks "if block.is_in_cache: block.status = DllmBlockStatus.TO_CACHE"
and, if you keep a note, add a comment near the loop that TO_CACHE blocks within
rebuild_until are intentionally preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7bbaa9ce-b02f-4e4c-a2d5-c1076ad99b9d
📒 Files selected for processing (9)
- .gitignore
- diffulex/sampler/base/no_shift.py
- diffulex/sampler/base/shift.py
- diffulex/strategy_template/multi_block/engine/request.py
- diffulex/utils/loader.py
- diffulex_bench/configs/dream_base_gsm8k.yml
- diffulex_bench/configs/fast_dllm_v2_gsm8k.yml
- diffulex_bench/configs/llada_instruct_gsm8k.yml
- diffulex_bench/configs/sdar_chat_gsm8k.yml
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
diffulex/strategy_template/multi_block/engine/request.py (1)
586-598: ⚠️ Potential issue | 🔴 Critical

Set a terminal context for every truncation path.

Line 586 only sets `terminal_context_block` for EOS/max-token/model-len stops, but Line 592 also gates deactivation on `max_nfe_reached` and `max_repetition_run_reached`. For those two stop reasons, `last_block_finished` stays false because `terminal_context_block` is still `None`, so the request can livelock instead of completing.

🐛 Proposed fix
```diff
-if self.eos_token_generated or self.max_new_tokens_reached or self.max_model_len_reached:
+if self.is_truncated:
     self.set_terminal_context_block(self.dllm_block_buffer.last_valid_block)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/strategy_template/multi_block/engine/request.py` around lines 586 - 598, The code only calls set_terminal_context_block(self.dllm_block_buffer.last_valid_block) when eos_token_generated/max_new_tokens_reached/max_model_len_reached are true, but later the completion check also includes max_nfe_reached and max_repetition_run_reached; ensure terminal_context_block is set for those two stop reasons too. Update the logic around set_terminal_context_block so that if any of eos_token_generated, max_new_tokens_reached, max_model_len_reached, max_nfe_reached, or max_repetition_run_reached is true you call set_terminal_context_block(self.dllm_block_buffer.last_valid_block) before evaluating the final conditional that checks last_block_finished; reference the flags named above and the methods/attributes set_terminal_context_block, terminal_context_block, last_block_finished, and dllm_block_buffer.last_valid_block when making the change.

diffulex/engine/engine.py (1)
75-84: ⚠️ Potential issue | 🟠 Major

Move tokenizer loading and mask-token normalization before spawning worker processes.

With the `spawn` context, `config` is serialized when `process.start()` executes. Worker ranks are spawned on lines 29–31 before the tokenizer is loaded and `config.mask_token_id` is normalized on lines 32–35, causing workers to retain the stale/default mask token ID while rank 0 receives the corrected value.

Proposed fix
```diff
 self.ps = []
 self.events = []
+self.tokenizer = AutoTokenizer.from_pretrained(config.model, use_fast=True, trust_remote_code=True)
+config.tokenizer_vocab_size = len(self.tokenizer)
+config.eos = self.tokenizer.eos_token_id
+maybe_override_mask_token_id(config, self.tokenizer)
+
 ctx = mp.get_context("spawn")
 for i in range(1, self.model_parallel_world_size):
     event = ctx.Event()
     process = ctx.Process(target=AutoModelRunner.from_config, args=(config, i, event))
     process.start()
     self.ps.append(process)
     self.events.append(event)
-self.tokenizer = AutoTokenizer.from_pretrained(config.model, use_fast=True, trust_remote_code=True)
-config.tokenizer_vocab_size = len(self.tokenizer)
-config.eos = self.tokenizer.eos_token_id
-maybe_override_mask_token_id(config, self.tokenizer)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/engine/engine.py` around lines 75 - 84, Move the tokenizer initialization and mask-token normalization so they run before spawning worker processes: load AutoTokenizer.from_pretrained and set config.tokenizer_vocab_size, config.eos and call maybe_override_mask_token_id(config, tokenizer) prior to the loop that creates ctx.Event and ctx.Process and starts AutoModelRunner.from_config; this ensures the serialized config passed to each child (via the ctx.Process start loop that appends to self.ps/self.events and targets AutoModelRunner.from_config) contains the corrected mask_token_id and tokenizer-derived fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@diffulex/engine/engine.py`:
- Around line 32-40: maybe_override_mask_token_id is unconditionally overwriting
user-provided mask_token_id; detect whether the user explicitly passed
--mask-token-id by checking config_kwargs in DiffulexEngine.__init__ (set a
boolean mask_token_id_explicit) and pass that flag into
maybe_override_mask_token_id; inside maybe_override_mask_token_id, skip the
tokenizer-based override when mask_token_id_explicit is True so CLI/config
values are preserved, and ensure you set/pass the explicit flag before worker
processes are spawned so non-rank-0 runners receive the correct config.
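A sketch of the explicit-flag approach suggested above; the signature and call site are assumptions that only illustrate the intent, and the real code in diffulex/engine/engine.py may differ.

```python
def maybe_override_mask_token_id(config, tokenizer, *, mask_token_id_explicit: bool = False) -> None:
    # If the user passed --mask-token-id explicitly, never override it.
    if mask_token_id_explicit:
        return
    tokenizer_mask_token_id = getattr(tokenizer, "mask_token_id", None)
    if tokenizer_mask_token_id is not None:
        config.mask_token_id = tokenizer_mask_token_id

# Call-site sketch: derive explicitness from the kwargs the engine was built with,
# and do this before worker processes are spawned so every rank sees the final value.
# maybe_override_mask_token_id(
#     config, tokenizer,
#     mask_token_id_explicit="mask_token_id" in config_kwargs,
# )
```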
---
Outside diff comments:
In `@diffulex/engine/engine.py`:
- Around line 75-84: Move the tokenizer initialization and mask-token
normalization so they run before spawning worker processes: load
AutoTokenizer.from_pretrained and set config.tokenizer_vocab_size, config.eos
and call maybe_override_mask_token_id(config, tokenizer) prior to the loop that
creates ctx.Event and ctx.Process and starts AutoModelRunner.from_config; this
ensures the serialized config passed to each child (via the ctx.Process start
loop that appends to self.ps/self.events and targets
AutoModelRunner.from_config) contains the corrected mask_token_id and
tokenizer-derived fields.
In `@diffulex/strategy_template/multi_block/engine/request.py`:
- Around line 586-598: The code only calls
set_terminal_context_block(self.dllm_block_buffer.last_valid_block) when
eos_token_generated/max_new_tokens_reached/max_model_len_reached are true, but
later the completion check also includes max_nfe_reached and
max_repetition_run_reached; ensure terminal_context_block is set for those two
stop reasons too. Update the logic around set_terminal_context_block so that if
any of eos_token_generated, max_new_tokens_reached, max_model_len_reached,
max_nfe_reached, or max_repetition_run_reached is true you call
set_terminal_context_block(self.dllm_block_buffer.last_valid_block) before
evaluating the final conditional that checks last_block_finished; reference the
flags named above and the methods/attributes set_terminal_context_block,
terminal_context_block, last_block_finished, and
dllm_block_buffer.last_valid_block when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 17fad98f-c049-44e2-a280-cc0a73608e66
📒 Files selected for processing (6)
- diffulex/engine/engine.py
- diffulex/sampler/base/no_shift.py
- diffulex/server/args.py
- diffulex/strategy_template/multi_block/engine/request.py
- diffulex/utils/loader.py
- examples/streamlit_block_append_chat.py
✅ Files skipped from review due to trivial changes (1)
- examples/streamlit_block_append_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
- diffulex/utils/loader.py
♻️ Duplicate comments (1)
diffulex/engine/engine.py (1)
24-55: ⚠️ Potential issue | 🟠 Major

Preserve explicit `mask_token_id` overrides from CLI/config.

`maybe_override_mask_token_id` still unconditionally replaces `config.mask_token_id` when the tokenizer/hf_config has a different value, so an explicit `--mask-token-id` can still be ignored (see `diffulex/server/args.py` Lines 92-93 passing it through). This was previously flagged and remains unresolved.

🐛 Proposed fix
```diff
-def maybe_override_mask_token_id(config: Config, tokenizer) -> None:
+def maybe_override_mask_token_id(
+    config: Config,
+    tokenizer,
+    *,
+    mask_token_id_explicit: bool = False,
+) -> None:
@@
+    if mask_token_id_explicit:
+        return
+
     tokenizer_mask_token_id = getattr(tokenizer, "mask_token_id", None)
@@
-        maybe_override_mask_token_id(config, self.tokenizer)
+        maybe_override_mask_token_id(
+            config,
+            self.tokenizer,
+            mask_token_id_explicit="mask_token_id" in config_kwargs,
+        )
```

Also applies to: 77-77
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/engine/engine.py` around lines 24 - 55, The function maybe_override_mask_token_id currently unconditionally replaces config.mask_token_id from the tokenizer branch, which lets explicit CLI/config overrides be lost; change both override sites in maybe_override_mask_token_id (the tokenizer branch that sets config.mask_token_id from tokenizer.mask_token_id and the hf_config branch) to only assign when the current config.mask_token_id equals the class default sentinel (Config.mask_token_id). In other words, obtain default_mask_token_id = Config.mask_token_id and wrap the tokenizer override in a guard like "and config.mask_token_id == default_mask_token_id" (the hf_config branch already has this check—ensure the same logic is applied to the tokenizer branch and any duplicate occurrence around the later block).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@diffulex/engine/engine.py`:
- Around line 24-55: The function maybe_override_mask_token_id currently
unconditionally replaces config.mask_token_id from the tokenizer branch, which
lets explicit CLI/config overrides be lost; change both override sites in
maybe_override_mask_token_id (the tokenizer branch that sets
config.mask_token_id from tokenizer.mask_token_id and the hf_config branch) to
only assign when the current config.mask_token_id equals the class default
sentinel (Config.mask_token_id). In other words, obtain default_mask_token_id =
Config.mask_token_id and wrap the tokenizer override in a guard like "and
config.mask_token_id == default_mask_token_id" (the hf_config branch already has
this check—ensure the same logic is applied to the tokenizer branch and any
duplicate occurrence around the later block).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ea6e083-500e-40c3-9eb8-32544fe5a09c
📒 Files selected for processing (11)
- diffulex/engine/engine.py
- diffulex/strategy_template/multi_block/engine/request.py
- diffulex_bench/configs/dream_base_gsm8k.yml
- diffulex_bench/configs/llada_instruct_gsm8k.yml
- docs/cookbook/benchmark.md
- docs/cookbook/index.md
- docs/cookbook/models/dream_base.md
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/sdar_chat.md
- docs/cookbook/server.md
✅ Files skipped from review due to trivial changes (7)
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/server.md
- docs/cookbook/benchmark.md
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/dream_base.md
- docs/cookbook/index.md
- docs/cookbook/models/sdar_chat.md
🚧 Files skipped from review as they are similar to previous changes (2)
- diffulex_bench/configs/llada_instruct_gsm8k.yml
- diffulex_bench/configs/dream_base_gsm8k.yml
♻️ Duplicate comments (1)
diffulex/engine/engine.py (1)
24-33: ⚠️ Potential issue | 🟡 Minor

Explicit `mask_token_id` is still ambiguous when the user passes the default value.

The helper infers "explicitness" by value (`config.mask_token_id == Config.mask_token_id`), so an explicitly provided default can still be overridden. Please pass an explicit boolean from `config_kwargs` to preserve user intent in all cases.

🐛 Minimal fix
```diff
-def maybe_override_mask_token_id(config: Config, tokenizer) -> None:
+def maybe_override_mask_token_id(
+    config: Config,
+    tokenizer,
+    *,
+    mask_token_id_explicit: bool = False,
+) -> None:
@@
+    if mask_token_id_explicit:
+        return
+
     default_mask_token_id = Config.mask_token_id
@@
-        maybe_override_mask_token_id(config, self.tokenizer)
+        maybe_override_mask_token_id(
+            config,
+            self.tokenizer,
+            mask_token_id_explicit="mask_token_id" in config_kwargs,
+        )
```

Also applies to: 35-42, 85-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@diffulex/engine/engine.py` around lines 24 - 33, The current maybe_override_mask_token_id logic infers whether the user explicitly set mask_token_id by comparing values, which misclassifies an explicitly-provided default; change the function to accept an explicit boolean flag (e.g., mask_token_id_explicit) passed from config_kwargs and use that flag instead of value equality to decide whether to skip tokenizer/HF overrides; update the call sites that construct Config (where config_kwargs are built) to set this boolean when the user provided mask_token_id, and apply the same change to the other similar checks referenced (the duplicate logic around lines 35-42 and the check at 85) to use the explicit boolean parameter so user intent is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@diffulex/engine/engine.py`:
- Around line 24-33: The current maybe_override_mask_token_id logic infers
whether the user explicitly set mask_token_id by comparing values, which
misclassifies an explicitly-provided default; change the function to accept an
explicit boolean flag (e.g., mask_token_id_explicit) passed from config_kwargs
and use that flag instead of value equality to decide whether to skip
tokenizer/HF overrides; update the call sites that construct Config (where
config_kwargs are built) to set this boolean when the user provided
mask_token_id, and apply the same change to the other similar checks referenced
(the duplicate logic around lines 35-42 and the check at 85) to use the explicit
boolean parameter so user intent is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1f832dc-d836-4e03-a73e-4c0f992d3fd4
📒 Files selected for processing (6)
- diffulex/engine/engine.py
- docs/cookbook/models/dream_base.md
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/sdar_chat.md
- docs/cookbook/server.md
✅ Files skipped from review due to trivial changes (3)
- docs/cookbook/models/llada_instruct.md
- docs/cookbook/models/fast_dllm_v2.md
- docs/cookbook/models/dream_base.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/cookbook/models/sdar_chat.md
- docs/cookbook/server.md
Summary
This PR fixes multiple instability issues in D2F LoRA decoding, especially under preempt + rebuild scenarios.
The main problems were caused by inconsistent indexing and request state handling during prefill and decode phases, which could lead to incorrect logits alignment and unstable generation.
Key Changes
LoRA
packed_modules_mapping
D2F / Decode flow
Sampler
Request / KV cache
Results
Notes
Checklist
Summary by CodeRabbit
Bug Fixes
Configuration
New Features
Documentation
Chores