Hello, thank you for the great work on RLVMR. I'm particularly interested in the ReAct results reported in the experiments (Qwen-1.5B/7B ReAct).
I noticed that your experimental setup and evaluation environment likely differ from the official ReAct implementation. Could you please clarify whether the code for the prompting-only ReAct baseline (without any fine-tuning) and its evaluation pipeline is included in this repository?