Merged
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -168,4 +168,5 @@ OAI_CONFIG_LIST
*.gv.pdf

# jupyter book API output
docs/api/*
docs/api/*
examples/notebooks/bbeh/
t6_m2_bbeh_2.ipynb
32 changes: 31 additions & 1 deletion README.md
@@ -32,7 +32,7 @@ Or for development, clone the repo and run the following.

pip install -e .

The library requires Python >= 3.9. By default (starting with v0.1.3.5), we use [LiteLLM](https://github.com/BerriAI/litellm) as the backend of LLMs. For backward compatibility, we provide backend-support with [AutoGen](https://github.com/microsoft/autogen); when installing, users can add `[autogen]` tag to install a compatible AutoGen version (e.g., `pip install trace-opt[autogen]`). You may require [Git Large File Storage](https://git-lfs.com/) if
The library requires Python >= 3.10. By default (starting with v0.1.3.5), we use [LiteLLM](https://github.com/BerriAI/litellm) as the LLM backend. For backward compatibility, we provide backend support with [AutoGen](https://github.com/microsoft/autogen); when installing, users can add the `[autogen]` tag to install a compatible AutoGen version (e.g., `pip install trace-opt[autogen]`). You may require [Git Large File Storage](https://git-lfs.com/) if
git is unable to clone the repository.

**For questions or reporting bugs, please use GitHub Issues or post on our [Discord channel](https://discord.gg/4VeAvwFcWy). We actively check these channels.**
@@ -241,6 +241,36 @@ Defining and training an agent through Trace will give you more flexibility and
| Advanced | [Robotic Arm Control](https://agentopt.github.io/Trace/examples/robotics/metaworld.html) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AgentOpt/Trace/blob/website/docs/examples/robotics/metaworld.ipynb) | Trace can optimize code to control a robotic arm after observing a full trajectory of interactions. |


## Multi-Objective Optimization

Trace supports **multi-objective optimization** where candidates are evaluated on
multiple metrics simultaneously (e.g. accuracy + token cost, or base loss +
regularization loss).
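
For instance, a guide can score each candidate on both correctness and cost at once. The sketch below is illustrative only — `TwoMetricGuide` is a hypothetical class, and the real `Guide.get_score_dict()` signature in Trace may differ:

```python
from typing import Dict

# Hypothetical guide that returns a vector score (score-as-dict).
# Illustrative only -- the actual Guide API in Trace may differ.
class TwoMetricGuide:
    def get_score_dict(self, target: str, answer: str, n_tokens: int) -> Dict[str, float]:
        return {
            # exact-match correctness
            "accuracy": 1.0 if answer.strip() == target.strip() else 0.0,
            # negate token count so that higher is better for every metric
            "neg_tokens": -float(n_tokens),
        }

guide = TwoMetricGuide()
score = guide.get_score_dict("42", "42", n_tokens=17)
```

Negating cost-like metrics keeps a single "higher is better" convention across all entries of the dict, which simplifies ranking downstream.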

See the full guide: **[docs/multi_objective_scores.md](docs/multi_objective_scores.md)**

Key features:
- **Vector scores** — `Guide.get_score_dict()` returns `Dict[str, float]` with named metrics
- **Weighted scalarization** and **Pareto dominance** ranking via `ObjectiveConfig`
- Supported in `BasicSearchAlgorithm`, `BeamsearchAlgorithm`, and `BeamsearchHistoryAlgorithm`
- Token-minimization pattern using `UsageTrackingLLM` + `TokenUsageAugmentingGuide`
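
The two selection modes in the list above can be illustrated with a minimal, self-contained sketch. This is not the `ObjectiveConfig` API — `scalarize`, `dominates`, and `pareto_front` are hypothetical helpers, and all metrics are assumed higher-is-better:

```python
from typing import Dict, List

Score = Dict[str, float]

def scalarize(score: Score, weights: Dict[str, float]) -> float:
    """Weighted scalarization: collapse a vector score to one float."""
    return sum(weights[k] * v for k, v in score.items())

def dominates(a: Score, b: Score) -> bool:
    """a Pareto-dominates b: >= on every metric, > on at least one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

def pareto_front(candidates: List[Score]) -> List[Score]:
    """Keep candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

scores = [
    {"accuracy": 0.90, "neg_tokens": -120.0},
    {"accuracy": 0.85, "neg_tokens": -60.0},
    {"accuracy": 0.80, "neg_tokens": -200.0},  # dominated by the first
]
front = pareto_front(scores)
weights = {"accuracy": 1.0, "neg_tokens": 0.001}
best = max(scores, key=lambda s: scalarize(s, weights))
```

Note the two modes can disagree: the Pareto front keeps every non-dominated trade-off, while scalarization commits to one point on the front determined by the weights.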

Canonical notebooks:

| Notebook | Description |
|---|---|
| [multiobjective_quickstart](examples/notebooks/multiobjective_quickstart.ipynb) | Core vector-score infrastructure and BasicSearch integration |
| [multiobjective_trainers](examples/notebooks/multiobjective_trainers.ipynb) | Beamsearch and PrioritySearch multi-objective support |
| [multiobjective_bbeh_langgraph](examples/notebooks/multiobjective_bbeh_langgraph.ipynb) | Real LLM task: BBEH boolean expressions with accuracy + execution time |

Trace-Bench multi-objective benchmarks (in [AgentOpt/Trace-Bench](https://github.com/AgentOpt/Trace-Bench)):

| Notebook | Task | Metrics |
|---|---|---|
| `multiobjective_convex` | SixHumpCamel | base_loss, reg_loss |
| `multiobjective_bbeh` | BBEH boolean_expressions | accuracy, execution_time_s |
| `multiobjective_gsm8k` | GSM8K + token usage | error, tokens_in, tokens_out |
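
For cost-style metrics like those in the GSM8K row, a weighted reduction might look like the following sketch — the weights are made up for illustration, and this is not how the benchmark itself ranks candidates:

```python
# Hypothetical weighted reduction over GSM8K-style metrics, where every
# metric is a cost (lower is better). Weights here are illustrative only.
weights = {"error": 1.0, "tokens_in": 1e-4, "tokens_out": 1e-4}

def total_cost(score: dict) -> float:
    """Collapse a vector of costs into one scalar for ranking."""
    return sum(weights[k] * score[k] for k in weights)

a = {"error": 0.10, "tokens_in": 900.0, "tokens_out": 400.0}
b = {"error": 0.12, "tokens_in": 300.0, "tokens_out": 150.0}
best = min((a, b), key=total_cost)  # b wins: its token savings outweigh its extra error
```

Small token weights like `1e-4` encode how much error increase you are willing to trade for saving one token; choosing them is the main tuning knob of scalarization.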

## Supported Optimizers

Currently, we support three optimizers:
843 changes: 0 additions & 843 deletions docs/T6_technical_plan.md

This file was deleted.

@@ -35,7 +35,7 @@
"cell_type": "markdown",
"id": "b1a58d26",
"metadata": {},
"source": "# T6 Multi-Objective Vector Scores — M0 Analysis\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AgentOpt/OpenTrace/blob/pull/61/head/examples/notebooks/t6_m0_analysis.ipynb)\n\n**Milestone 0 Deliverable** — Analysis + Technical Plan + Interface Spec\n\nThis notebook demonstrates:\n1. **Current baseline**: How Guide returns scalar scores, how evaluators aggregate, where selection happens\n2. **Exact touchpoints**: The specific lines of code in BasicSearch and Beamsearch that perform scalar selection\n3. **Planned behavior**: A deterministic prototype showing weighted vs Pareto selection on toy candidates\n\n**Motivation (why score-as-dict):** adding extra metrics into the *feedback dict/text* can help optimizers (OptoPrime/OPRO), but trainers typically only use the scalar score for ranking/UCB and ignore additional feedback structure. To enable Pareto/weighted multi-objective selection at the trainer level, we need vector score (score-as-dict) with backward-compatible scalar reduction.\n\n**No API keys required for M0.** All examples use deterministic dummy data. (From M1 onward, milestone notebooks must validate both StubLLM and real LLM modes.)\n\n---"
"source": "# T6 Multi-Objective Vector Scores — M0 Analysis\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AgentOpt/OpenTrace/blob/experimental/docs/dev/multi_objective_design_exploration.ipynb)\n\n**Milestone 0 Deliverable** — Analysis + Technical Plan + Interface Spec\n\nThis notebook demonstrates:\n1. **Current baseline**: How Guide returns scalar scores, how evaluators aggregate, where selection happens\n2. **Exact touchpoints**: The specific lines of code in BasicSearch and Beamsearch that perform scalar selection\n3. **Planned behavior**: A deterministic prototype showing weighted vs Pareto selection on toy candidates\n\n**Motivation (why score-as-dict):** adding extra metrics into the *feedback dict/text* can help optimizers (OptoPrime/OPRO), but trainers typically only use the scalar score for ranking/UCB and ignore additional feedback structure. To enable Pareto/weighted multi-objective selection at the trainer level, we need vector score (score-as-dict) with backward-compatible scalar reduction.\n\n**No API keys required for M0.** All examples use deterministic dummy data. (From M1 onward, milestone notebooks must validate both StubLLM and real LLM modes.)\n\n---"
},
{
"cell_type": "markdown",
@@ -932,4 +932,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}