ASTRAL-Group/WebAgent_Visual_Attribution

How do Visual Attributes Influence Web Agents?

A Comprehensive Evaluation of User Interface Design Factors

Kuai Yu² · Naicheng Yu³ · Han Wang¹ · Rui Yang¹ · Huan Zhang¹
¹ University of Illinois Urbana-Champaign · ² Columbia University · ³ University of California San Diego

Paper · Project Page · Code

Contributions

🔬 First Systematic Study
We present the first controlled evaluation of how visual UI attributes shape web-agent decision-making, filling a critical gap beyond adversarial robustness research.
⚙️ VAF Pipeline
A three-stage framework — Variant Generation → Browsing Simulation → Dual Evaluation — enabling reproducible, scalable measurement of any visual attribute’s influence.
📊 Actionable Findings
Across 48 variants, 5 websites, 4 agents: background color contrast, item size, position, and card clarity dominate agent behavior; font and text color matter far less.

Method Overview

Starting from a real-world webpage, we simulate viewport-based browsing with a web agent on the original page, generate semantics-preserving visual variants, and compare how the agent's prompts and actions (e.g., scroll vs. click) shift across variants, ending with click verification on the target item.

Figure: VAF method overview — original-site browsing with user prompt and agent observation; variant generation from the full webpage; variant-site browsing where visual changes (e.g., background color) alter the agent's decision; click verification on the target item.


About this repository

This repository contains pipelines for web page variant generation and visual attribution evaluation: generating varied web pages (HTML, screenshots, target coordinates) and evaluating how those variants influence model click behavior.

For the paper’s method overview, interactive variant browser, and quantitative results, see the project page.


Repository structure

  • comprehensive_pipeline/ — End-to-end evaluation pipeline: discovers variants, runs model inference (e.g., GLM-4V, Qwen, UI-TARS), compares predicted clicks to target coordinates, and produces reports. See comprehensive_pipeline/README.md for setup, scenarios, and usage.
  • web_variants_generation/ — Variant generation pipeline: produces HTML variants from source snapshots, takes screenshots, extracts target-element coordinates, and draws verification overlays. Outputs go under web_variants_generation/data/. See web_variants_generation/README.md for setup and per-scenario instructions.

Each part has its own README with detailed setup, options, and troubleshooting.
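To illustrate the click-verification idea at the heart of the evaluation, here is a minimal Python sketch. The record layout (a "target" object with x, y, width, height) is an assumed example for illustration, not the pipeline's actual coordinates.json schema:

```python
# Hypothetical sketch: check whether a predicted click lands inside the
# target element's bounding box. The JSON layout below is an assumed
# example, not the actual coordinates.json schema used by this repo.
import json

def click_hits_target(click_xy, bbox):
    """bbox = (x, y, width, height); click_xy = (x, y), all in page pixels."""
    x, y = click_xy
    bx, by, bw, bh = bbox
    return bx <= x <= bx + bw and by <= y <= by + bh

record = json.loads('{"target": {"x": 120, "y": 340, "width": 200, "height": 48}}')
t = record["target"]
bbox = (t["x"], t["y"], t["width"], t["height"])

print(click_hits_target((210, 360), bbox))  # click inside the box -> True
print(click_hits_target((50, 360), bbox))   # click left of the box -> False
```

The real pipeline additionally draws verification overlays so that hits and misses can be inspected visually; consult the per-directory READMEs for the actual formats.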


Data directory (web variant outputs)

Generated outputs (HTML, screenshots, coordinates.json, verification images) go under web_variants_generation/data/.

  • You do not need to create this folder. The repo includes an empty web_variants_generation/data/ directory (with a .gitkeep placeholder). When you run a scenario, the pipeline creates the needed subfolders (e.g. data/amazon_first/html, data/amazon_first/screenshots) automatically.

If you prefer to create the output root yourself before the first run, you can:

mkdir -p web_variants_generation/data

This is optional; the pipeline will create it if missing.


Quick start

1. Generate variants (HTML + screenshots + coordinates)

From the repository root:

bash web_variants_generation/pipeline/run.sh <scenario_name>

Examples: amazon_first, amazon_second, booking, npr, expedia, ebay.

Run all scenarios in one command:

bash web_variants_generation/pipeline/run_all.sh

Useful options:

# Keep running remaining scenarios even if one fails
bash web_variants_generation/pipeline/run_all.sh --continue-on-error

# Run only selected scenarios
bash web_variants_generation/pipeline/run_all.sh --scenarios "amazon_first booking npr"

Requirements: Python 3.8+, Node.js, Playwright. One-time setup:

# If node/npm is missing, install Node.js first (example with nvm):
curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
export NVM_DIR="$HOME/.nvm" && [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
nvm install --lts && nvm use --lts

cd web_variants_generation
pip install -r requirements.txt
npm install
playwright install chromium

Optional (recommended) Python setup with uv:

# Install uv (if missing)
curl -LsSf https://astral.sh/uv/install.sh | sh
source "$HOME/.local/bin/env"

cd web_variants_generation
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt
npm install
uv run playwright install chromium

Results appear under web_variants_generation/data/<scenario_name>/ (html, screenshots, coordinates, verifications). See web_variants_generation/README.md for scenario list and step-by-step flow.

2. Run comprehensive evaluation

For model inference, coordinate comparison, and reports, use the comprehensive pipeline. Setup and usage (including scenario names, model types, and output paths) are in comprehensive_pipeline/README.md.
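As a rough sketch of the reporting step, per-variant click accuracy can be aggregated from evaluation records as below. The field names ("variant", "hit") are hypothetical placeholders, not the comprehensive pipeline's actual output schema:

```python
# Hypothetical sketch: aggregate per-variant click accuracy from a list of
# evaluation records. Field names are assumed for illustration only.
from collections import defaultdict

def accuracy_by_variant(records):
    totals = defaultdict(lambda: [0, 0])  # variant -> [hits, attempts]
    for r in records:
        totals[r["variant"]][0] += int(r["hit"])
        totals[r["variant"]][1] += 1
    return {v: hits / n for v, (hits, n) in totals.items()}

records = [
    {"variant": "background_color", "hit": True},
    {"variant": "background_color", "hit": False},
    {"variant": "font", "hit": True},
    {"variant": "font", "hit": True},
]
print(accuracy_by_variant(records))  # {'background_color': 0.5, 'font': 1.0}
```

See comprehensive_pipeline/README.md for the reports the pipeline actually produces.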


Summary

  • Generate page variants and coordinates — web_variants_generation/README.md
  • Evaluate model click behavior on variants — comprehensive_pipeline/README.md
  • Data/output location — web_variants_generation/data/ (auto-created by the pipeline; contents gitignored)

Citation

If you find this work useful, please cite:

@misc{yu2026visualattributesinfluenceweb,
  title={How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors},
  author={Kuai Yu and Naicheng Yu and Han Wang and Rui Yang and Huan Zhang},
  year={2026},
  eprint={2601.21961},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.21961}
}
