Hi, thanks for this innovative work!
May I ask whether you have conducted additional ablation experiments on the data (e.g., simply training for more steps in the imitation learning phase)? I think such a comparison is necessary to justify the efficiency of the RL phase.
Thanks for your precious time!