Guidance on num_simulations, max_depth, and large-branching setups for MAPF in MCTX

Hi, and thanks for the fantastic library!

I’m using MCTX (Gumbel MuZero search) for multi-agent path finding on grids. Each agent has 5 actions (UP/DOWN/LEFT/RIGHT/STAY), so the joint action space grows as $5^N$:

* 2 agents → 25 actions
* 3 agents → 125 actions
* 4 agents → 625 actions
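For concreteness, here is a minimal sketch of one way to map a flat joint-action index to per-agent moves. The base-5 (mixed-radix) layout, the action ordering, and the helper names `joint_action_count`, `decode_joint_action`, and `encode_joint_action` are illustrative assumptions, not part of MCTX; any bijection between indices and per-agent tuples would work equally well.

```python
from typing import Sequence, Tuple

# Assumed per-agent action ordering; only the count (5) matters for the math.
ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "STAY")


def joint_action_count(num_agents: int) -> int:
    """Size of the joint action space, 5 ** num_agents."""
    return len(ACTIONS) ** num_agents


def decode_joint_action(index: int, num_agents: int) -> Tuple[str, ...]:
    """Split a flat index into per-agent actions (agent 0 is the lowest digit)."""
    if not 0 <= index < joint_action_count(num_agents):
        raise ValueError("joint action index out of range")
    per_agent = []
    for _ in range(num_agents):
        index, digit = divmod(index, len(ACTIONS))
        per_agent.append(ACTIONS[digit])
    return tuple(per_agent)


def encode_joint_action(per_agent: Sequence[str]) -> int:
    """Inverse of decode_joint_action."""
    index = 0
    for name in reversed(per_agent):
        index = index * len(ACTIONS) + ACTIONS.index(name)
    return index


for n in (2, 3, 4):
    print(n, "agents ->", joint_action_count(n), "joint actions")
```

With this layout the planner only ever sees a single categorical action, which is why the branching factor at every tree node equals the full joint count.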

I don’t have a policy-value network yet; I’m using Gumbel MuZero (GMZ) purely as a planner, with uniform priors and either `value=0` or a light heuristic at the leaves. Horizons can be long on large maps.

**Current settings**

* `num_simulations`: 10k–20k
* `max_depth`: 15–30
* `max_num_considered_actions`: 125

**Observation**
Despite the large simulation budget, plans are often suboptimal compared to a human baseline.
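A rough back-of-the-envelope check of how far this budget can reach, assuming each simulation adds at most one new node to the search tree (my understanding of how the search expands, not something I have verified in the source). The helper names `full_tree_nodes` and `deepest_full_depth` are hypothetical, and the calculation ignores any pruning from priors or sequential halving, so it is a pessimistic bound on uniform coverage rather than a description of what the search actually does.

```python
def full_tree_nodes(branching: int, depth: int) -> int:
    """Non-root nodes in a complete tree with the given branching and depth."""
    return sum(branching ** d for d in range(1, depth + 1))


def deepest_full_depth(branching: int, num_simulations: int) -> int:
    """Largest depth whose complete tree fits in the simulation budget."""
    depth = 0
    while full_tree_nodes(branching, depth + 1) <= num_simulations:
        depth += 1
    return depth


for num_agents in (2, 3, 4):
    branching = 5 ** num_agents
    for budget in (10_000, 20_000):
        depth = deepest_full_depth(branching, budget)
        print(f"{num_agents} agents, {budget} simulations: "
              f"full coverage to depth {depth}")
```

Under these assumptions, 20k simulations cover every joint-action sequence only to depth 3 for 2 agents, depth 2 for 3 agents, and depth 1 for 4 agents, which is far short of a `max_depth` of 15–30 unless the priors or values concentrate the search.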

**Questions**

1. Any recommended rules of thumb for choosing `num_simulations` vs. `max_depth` as the branching factor explodes?
2. For joint action spaces, is there guidance on setting `max_num_considered_actions` (consider all actions vs. subsample)?
3. Are there suggested `qtransform` settings (e.g., `value_scale`, `maxvisit_init`, `use_mixed_value`, `rescale_values`) when values are zero or heuristic rather than learned?
4. With uniform priors, should I keep a nonzero `gumbel_scale` to break ties, or is a deterministic setting preferable here?

