Thanks for your excellent work! I have a question about your choice of attention injection. You mentioned that, inspired by PnP image editing, you decided to inject temporal/spatial self-attention and features. But PnP is based on T2I while your work is based on I2V, so the cross-attention operates on a different modality. Have you tested injecting cross-attention? Thanks again for your work!