Implementation Details #32

@nasosger

Description

Hi,
First of all thanks for sharing this incredible project!
Regarding the implementation of the generative pre-training, I would like to ask for two clarifications. Your paper is very detailed and reports the hyperparameters thoroughly, but I believe the following two details are missing.

  1. How is the prefix length sampled? Is it constant across a batch, or is it sampled separately for each sample in the batch? Are there minimum or maximum values for it?
  2. In the paper, you mention that the MLP head (which predicts the normalized pixel values) consists of 12 MLP blocks. Does this mean that each MLP block has the same structure as the MLP block in an attention layer? Does the head include residual connections as well?
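To make the two questions concrete, here is a minimal NumPy sketch of what I currently imagine. Everything here is my assumption, not something stated in the paper: the per-sample uniform sampling of the prefix length, the bidirectional attention inside the prefix, and the pre-norm residual structure of each head block (with ReLU standing in for the actual activation). I would be happy to learn where this guess diverges from your implementation.

```python
import numpy as np

def sample_prefix_masks(batch_size, seq_len, min_prefix=1, max_prefix=None, rng=None):
    """Guess for question 1: one prefix length per sample, drawn uniformly
    from [min_prefix, max_prefix). Positions inside the prefix attend
    bidirectionally; the remaining positions stay strictly causal."""
    rng = np.random.default_rng() if rng is None else rng
    max_prefix = seq_len if max_prefix is None else max_prefix
    prefix_lens = rng.integers(min_prefix, max_prefix, size=batch_size)
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    masks = np.repeat(causal[None], batch_size, axis=0)
    for b, k in enumerate(prefix_lens):
        masks[b, :k, :k] = True  # full attention within the prefix
    return prefix_lens, masks

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp_head(x, blocks):
    """Guess for question 2: each of the 12 head blocks mirrors the
    transformer feed-forward sub-layer, i.e. pre-norm, a two-layer MLP,
    and a residual connection around the block."""
    for w1, b1, w2, b2 in blocks:
        h = layer_norm(x)
        h = np.maximum(h @ w1 + b1, 0.0)  # ReLU stand-in for the real activation
        x = x + h @ w2 + b2  # residual connection (my assumption)
    return x
```

In this sketch the loss would then be computed only on positions at or beyond each sample's prefix length, since the prefix tokens are never predicted causally.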

Thanks again for sharing! If you could clarify these things, it would be very helpful for other researchers who seek to build upon this nice work.
