
Allow SDXL CLIP token expansion#22

Draft
Keswiik wants to merge 1 commit into Nerogar:master from Keswiik:token-expansion

Conversation

@Keswiik Keswiik commented Apr 10, 2025

Not sure how correct this implementation is, but I dropped it into my local venv and am able to run SDXL training with no issues in OneTrainer.

I've also updated the OT training tab, train config / config migration, and SDXL model loader locally; if this is approved, I'll have a follow-up PR to get those added too.


def get_item(self, variation: int, index: int, requested_name: str = None) -> dict:
    if not self.add_layer_norm and self.expand_token_limit and self.expanded_chunk_size != 0:
        return self.encode_text_long(variation, index, requested_name)

I made this with SDXL in mind, and since I didn't see it using layer norms, I added this as its own method to avoid possibly breaking other models.
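For context, the general idea behind this kind of CLIP token expansion can be sketched as follows. This is only an illustrative sketch of the technique, not the PR's actual `encode_text_long` implementation; the function name, chunk size, and BOS/EOS token ids below are assumptions (the ids shown are the ones used by the standard CLIP tokenizer):

```python
import torch

def split_into_chunks(token_ids, chunk_size=75, bos=49406, eos=49407):
    """Hypothetical helper: split a long prompt's token ids into fixed-size
    chunks, wrapping each chunk in BOS/EOS so every chunk fits the text
    encoder's 77-token context window. Each chunk is then encoded separately
    and the per-chunk hidden states are concatenated along the sequence dim."""
    chunks = []
    for start in range(0, len(token_ids), chunk_size):
        body = list(token_ids[start:start + chunk_size])
        # pad the final short chunk with EOS so all chunks share one length
        body = body + [eos] * (chunk_size - len(body))
        chunks.append([bos] + body + [eos])
    # shape: (num_chunks, chunk_size + 2), e.g. (N, 77)
    return torch.tensor(chunks)
```

Each row of the returned tensor can be passed through the text encoder independently, with the resulting hidden states concatenated to form the expanded prompt embedding.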

if self.pooled_out_name:
    if hasattr(text_encoder_output, "text_embeds"):
        pooled_state = text_encoder_output.text_embeds
        pooled_state = pooled_state.mean(dim=0).reshape((1, pooled_state.shape[-1]))
</pooled_state>

I have no idea if this is the right way to handle text embeddings, but results seem ok after training?
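For what it's worth, the shape arithmetic in the snippet above checks out: with N chunks the encoder returns N pooled vectors, and averaging them collapses that back to the single `(1, dim)` tensor downstream SDXL code expects. A minimal sketch (the `text_embeds` tensor here is a stand-in for the encoder output, and 1280 is assumed as the pooled dimension of SDXL's second text encoder):

```python
import torch

# stand-in for text_encoder_output.text_embeds with 3 prompt chunks
text_embeds = torch.randn(3, 1280)

# same operation as in the diff: average across chunks, keep a leading
# batch dim so the result is (1, 1280) rather than (1280,)
pooled_state = text_embeds.mean(dim=0).reshape((1, text_embeds.shape[-1]))
assert pooled_state.shape == (1, 1280)
```

Whether a plain mean is the best way to combine the chunks' pooled outputs is a separate question, but it at least produces a correctly shaped tensor.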

@Keswiik Keswiik marked this pull request as draft April 11, 2025 00:03
