
GPT-1 Implementation#37

Open
ChanLumerico wants to merge 3 commits into main from gpt

Conversation

@ChanLumerico
Owner

No description provided.

@ChanLumerico ChanLumerico self-assigned this Mar 27, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63b45629e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread: lucid/_utils/__init__.py

@@ -130,8 +130,8 @@ def roll(
     return func.roll(shifts, axis)(a)


-def gather(a: Tensor, /, dim: int, index: Tensor) -> Tensor:
-    return func.gather(dim)(a, index)
+def gather(a: Tensor, /, axis: int, index: Tensor) -> Tensor:

P1 Badge Preserve dim keyword compatibility in lucid.gather

Renaming the public argument from dim to axis without a compatibility path breaks existing call sites that still pass dim= (for example, lucid/models/text/bert/_model.py uses lucid.gather(..., dim=1, ...) in the default embedding path). In practice, BERT/RoFormer calls that omit token_type_ids will now raise TypeError: gather() got an unexpected keyword argument 'dim', which is a regression outside the new GPT code.

Useful? React with 👍 / 👎.
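One hedged way to address this is a deprecation alias, so `dim=` keeps working while `axis=` becomes the documented name. The sketch below is self-contained and does not use lucid's actual internals; the list-of-rows gather body is a plain-Python stand-in for the tensor op, shown for `axis=1` only:

```python
import warnings

def gather(a, /, axis=None, index=None, *, dim=None):
    # Hypothetical compatibility shim (not lucid's actual code): keep the
    # legacy `dim` keyword as a deprecated alias of `axis`, so existing
    # call sites like gather(x, dim=1, index=i) no longer raise TypeError.
    if dim is not None:
        if axis is not None:
            raise TypeError("gather() got both 'axis' and 'dim'")
        warnings.warn("'dim' is deprecated; use 'axis' instead",
                      DeprecationWarning, stacklevel=2)
        axis = dim
    if axis is None:
        raise TypeError("gather() missing required argument: 'axis'")
    # Plain-Python stand-in for the tensor op, illustrating axis=1 only:
    # pick index[i][j]-th element from each row of `a`.
    assert axis == 1, "illustrative stand-in supports axis=1 only"
    return [[row[j] for j in idx] for row, idx in zip(a, index)]
```

With this shape, BERT's `lucid.gather(..., dim=1, ...)` call sites keep working through a deprecation cycle instead of breaking immediately.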

Comment thread: lucid/nn/utils/_base.py

attr = getattr(_activation, attr_name, None)
if issubclass(attr, Module):
    if attr.__name__.lower() == act_name:
        return attr

P1 Badge Return an activation instance, not the activation class

get_activation_module_from_name currently returns the module class itself (e.g., GELU) rather than an instantiated module. _GPTMLP stores this in self.act and then executes self.act(self.c_fc(x)), which invokes class construction with a tensor argument and fails for the default hidden_act='gelu'; this makes GPT forward unusable.

Useful? React with 👍 / 👎.
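A minimal sketch of the fix, using stand-in `Module`/`GELU` classes rather than lucid's real ones: the lookup should return `cls()` (an instance) so `self.act(...)` applies the activation instead of invoking the class constructor on a tensor.

```python
class Module:  # stand-in for lucid's nn.Module base class
    pass

class GELU(Module):  # stand-in activation module
    def __call__(self, x):
        return ("gelu", x)

_activation_classes = [GELU]  # stand-in for scanning the _activation module

def get_activation_module_from_name(act_name):
    for cls in _activation_classes:
        if issubclass(cls, Module) and cls.__name__.lower() == act_name:
            # Return an *instance*, not the class, so callers can do
            # self.act(self.c_fc(x)) directly.
            return cls()
    raise ValueError(f"unknown activation: {act_name!r}")
```

This keeps the lookup-by-name behavior while making the default `hidden_act='gelu'` path callable in `_GPTMLP`.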

max_len = config.max_position_embeddings
self.register_buffer(
    "causal_mask",
    lucid.tril(lucid.ones(max_len, max_len)).reshape(1, 1, max_len, max_len),
)

P1 Badge Build GPT causal mask as additive -inf mask

The registered causal_mask is lower-triangular 0/1, but the attention kernel treats attn_mask as an additive bias to scores (masked positions must be large negative values). Adding this 0/1 mask does not block future tokens, so training/inference are no longer causal and tokens can attend to the future.

Useful? React with 👍 / 👎.
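Since the kernel adds `attn_mask` to the scores, the buffer should hold 0.0 on allowed positions and -inf on future positions, not 0/1. A framework-free sketch of the additive mask, using plain lists in place of tensors:

```python
import math

def build_causal_bias(max_len):
    # Additive causal mask: 0.0 where position i may attend to j (j <= i),
    # -inf where j is in the future (j > i). Added to raw scores before
    # softmax, the -inf entries drive those attention weights to zero.
    return [
        [0.0 if j <= i else -math.inf for j in range(max_len)]
        for i in range(max_len)
    ]
```

Registering `build_causal_bias(max_len)` (reshaped to `(1, 1, max_len, max_len)` in the real tensor version) instead of the 0/1 `tril` buffer restores causality, since adding 1.0 to a score never blocks attention while adding -inf does.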

attn_out = attn_out.swapaxes(1, 2).reshape(B, T, C)
attn_out = self.resid_drop(self.c_proj(attn_out))

return attn_out, past_key_value if use_cache else None

P2 Badge Return valid cache objects when use_cache is enabled

When use_cache=True and no past_key_values are provided (first decoding step), _GPTAttention returns None for each layer’s present cache, so the model returns a list of None. Passing that list back into the next forward causes past_key_values[0].get_seq_length() to crash, which breaks iterative generation with returned caches.

Useful? React with 👍 / 👎.
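The fix is to create a fresh cache object on the first step rather than returning None. The sketch below uses a hypothetical `KVCache` class with plain lists standing in for key/value tensors; only the control flow (never hand back None when `use_cache=True`) mirrors what the attention layer needs:

```python
class KVCache:
    # Hypothetical per-layer cache: accumulates key/value entries so the
    # next decoding step can call get_seq_length() without crashing.
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k_new, v_new):
        self.keys.extend(k_new)
        self.values.extend(v_new)

    def get_seq_length(self):
        return len(self.keys)

def attention_step(k_new, v_new, use_cache, past_key_value=None):
    # First decoding step: past_key_value is None, so build a cache
    # instead of returning None. Later steps append to the same cache.
    if not use_cache:
        return None
    cache = past_key_value if past_key_value is not None else KVCache()
    cache.update(k_new, v_new)
    return cache
```

With this shape, the model returns a list of real cache objects on step one, and feeding them back into the next forward works for iterative generation.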
