In the PVT-v2 code, have you tried not using a linear projection after the pooling layer in the spatial reduction part of the attention?

I noticed that the PVT-v2 code applies a linear projection after the pooling layer in the spatial reduction part of the attention.

Have you tried training the model without the linear projection after the pooling layer in the spatial reduction part of the attention? If so, does it still work?
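To make the question concrete, here is a minimal sketch of the two variants I mean. It is my simplified reading of the pooled key/value path, not the repository's exact code: the class name `PooledKV`, the `use_proj` flag, the pool size of 7, and the tensor sizes are all assumptions for illustration, and I left out the rest of the attention block.

```python
import torch
import torch.nn as nn


class PooledKV(nn.Module):
    """Hypothetical, simplified stand-in for the pooled key/value path."""

    def __init__(self, dim, pool_size=7, use_proj=True):
        super().__init__()
        # Pool the feature map down to a fixed pool_size x pool_size grid.
        self.pool = nn.AdaptiveAvgPool2d(pool_size)
        # The projection in question, written here as a 1x1 convolution.
        # With use_proj=False the pooled tokens pass through unchanged.
        self.proj = nn.Conv2d(dim, dim, kernel_size=1) if use_proj else nn.Identity()
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        # (B, N, C) tokens -> (B, C, H, W) feature map
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.proj(self.pool(x_))
        # (B, C, p, p) -> (B, p*p, C) reduced tokens for keys/values
        x_ = x_.flatten(2).transpose(1, 2)
        return self.norm(x_)


x = torch.randn(2, 14 * 14, 64)  # assumed sizes: batch 2, 14x14 tokens, dim 64
with_proj = PooledKV(64, use_proj=True)
without_proj = PooledKV(64, use_proj=False)
out_a = with_proj(x, 14, 14)
out_b = without_proj(x, 14, 14)
```

Both variants produce reduced tokens of the same shape, so dropping the projection only removes the extra `dim * dim + dim` parameters per attention layer. My question is whether that changes the results in practice.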