Comparative Study of Reasoning, Planning, and Execution with Monte Carlo Tree Search in LLM-Based Web Agents
CS294 Large Language Model Agents Hackathon (Fundamental Track) Work Repository
Team Positronic Web Pilot
In this project, we propose to integrate Monte Carlo Tree Search (MCTS) techniques to enhance reasoning, planning, and execution in LLM web agents.
We design two prompts: one enforces stricter decision-making instructions, while the other encourages more exploration and harnesses the exploitation-exploration trade-off of MCTS. Experiments on the WebShop benchmark show that combining the flexible prompt with MCTS significantly improves agent performance relative to the other configurations tested.
We use GPT-4o as the LLM and integrate MCTS into the SeeAct framework. All experiments run on the WebShop dataset.
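The exploitation-exploration trade-off mentioned above is commonly expressed with the UCT selection rule. The following is a minimal sketch of that rule, not the code in this repository: the `Node` class, the function names, and the exploration constant are illustrative assumptions, and rewards are assumed to be task scores in the range 0 to 1.

```python
import math


class Node:
    """A search-tree node for one candidate web action (hypothetical structure)."""

    def __init__(self, action, parent=None):
        self.action = action
        self.parent = parent
        self.children = []
        self.visits = 0          # number of rollouts that passed through this node
        self.total_reward = 0.0  # sum of rollout rewards seen at this node

    def add_child(self, action):
        child = Node(action, parent=self)
        self.children.append(child)
        return child


def uct_score(child, exploration=math.sqrt(2)):
    """Mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited actions are always tried first
    mean_reward = child.total_reward / child.visits
    bonus = exploration * math.sqrt(math.log(child.parent.visits) / child.visits)
    return mean_reward + bonus


def select_child(node, exploration=math.sqrt(2)):
    """Pick the child action with the highest UCT score."""
    return max(node.children, key=lambda c: uct_score(c, exploration))


def backpropagate(node, reward):
    """Add one rollout's reward to the node and all of its ancestors."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

With `exploration=0.0` the rule always picks the action with the best average reward so far; larger values push the search toward rarely tried actions, which is the behaviour a more exploratory prompt is meant to complement.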
# create conda environment
conda create -n seeactmcts python=3.11
conda activate seeactmcts
# install dependencies
pip install seeact
pip uninstall openai # we use older commits, so a compatible openai version is needed
pip install openai==0.28.0
# set up Playwright and install the browser kernels
playwright install chromium
Follow steps 1 to 6 in the official WebShop README, or README-MAC if you are using an Apple Mac.
Note 1: You need to create a separate conda environment for WebShop in order to start it.
conda create -n webshop python=3.8.13
conda activate webshop
Note 2: In step 5, run the command below to load the full version of the data, then follow step 6 to run the experiments in our demo.
./setup.sh all
cd ./experiments/WebShop-master
./run_dev.sh
First, navigate to ./experiments/SeeAct-main/src and create a .env file.
cd ./experiments/SeeAct-main/src
vim .env # then add your OpenAI API key: OPENAI_API_KEY="YOUR_API_KEY"
Then you can run the experiment scripts for the web agents.
# single rollout using strict prompt
for n in {0..20}; do python seeact.py -c config/webshop_mode.toml -n $n; done;
# multiple rollouts with MCTS using strict prompt
for n in {0..20}; do python seeact_3.py -c config/webshop_mode.toml -n $n; done;
# single rollout using flexible prompt
for n in {0..20}; do python seeact.py -c config/webshop_mode2.toml -n $n; done;
# multiple rollouts with MCTS using flexible prompt
for n in {0..20}; do python seeact_3.py -c config/webshop_mode2.toml -n $n; done;
# You should modify the folder names in agg_score.py before running
python agg_score.py
👉 Primary contact of this project: Shiying He (sy.he0303@gmail.com)
👉 MOOC course site: http://llmagents-learning.org/f24.
👉 Hackathon website: https://rdi.berkeley.edu/llm-agents-hackathon/.
