Three server-side bugs in gym Docker images affecting 19 benchmark tasks

We found three server-side bugs in the MCP gym Docker images that make 19 benchmark tasks impossible to complete regardless of model capability. These were discovered while running the full 649-task benchmark (`enterprise_ops_gym_oracle.parquet`) against the official Docker images.

---

## Bug 1: `create_virtual_event_townhall` crashes with Python TypeError (8 tasks)

**Domain:** Teams, Hybrid
**Image:** `enterpriseops-gym-mcp-teams`

**Error:**
```
Error calling create_virtual_event_townhall: Failed to create townhall:
schemas.virtual_event_townhall.VirtualEventTownhallResponse() argument after ** must be a mapping, not NoneType
```

**Root cause:** The townhall creation handler returns `None` internally. The response is then built as `VirtualEventTownhallResponse(**None)`, which raises a `TypeError` because `**` unpacking requires a mapping.
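The failure mode can be reproduced in isolation. The following is a minimal sketch, not the gym's actual code: `TownhallResponse` and `build_response` are hypothetical stand-ins for the real schema class and handler, which we have not seen. It shows the same `TypeError` and one way a guard could surface the underlying problem instead.

```python
from dataclasses import dataclass


@dataclass
class TownhallResponse:
    """Hypothetical stand-in for VirtualEventTownhallResponse."""
    id: str
    title: str


def build_response(record):
    """Build a response, failing clearly if the creation step returned nothing."""
    if record is None:
        raise ValueError("townhall creation returned no record")
    return TownhallResponse(**record)


# Reproduce the reported failure: ** unpacking of None raises TypeError.
try:
    TownhallResponse(**None)
except TypeError as exc:
    reproduced = str(exc)

# The guarded path reports the real problem instead of the unpacking error.
try:
    build_response(None)
except ValueError as exc:
    guarded = str(exc)

# A normal record still builds as expected.
ok = build_response({"id": "th-1", "title": "Q1 townhall"})
```

A guard like this would not fix whatever makes the creation step return `None`, but it would make the error message point at that step.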

**Affected tasks:**
- `task_20251125_113335_636_7ebc1127_a4ebffe8` (teams)
- `task_20251205_150335_071_464ee3e0_19d263cd` (teams)
- `task_20251205_192805_864_464ee3e0_91f2c02f` (teams)
- `task_20260106_104903_639_0154326e_81f6a68a` (teams)
- `task_20260108_172650_003_8e9e30d7_a1ff8588` (teams)
- `task_20260108_194921_186_8e9e30d7_a087d6f8` (teams)
- `task_20260109_004703_994_7ebc1127_6f97f3e9` (teams)
- `task_20260114_164939_471_4d9df647_2aaff95f` (hybrid)

The model calls the tool with valid arguments, but the handler fails before it can produce a valid response, so the only output the model sees is the error above.

---

## Bug 2: `create_send_as_alias` returns HTTP 500 (5 tasks)

**Domain:** Email, Hybrid
**Image:** `enterpriseops-gym-mcp-email`

**Error:**
```
Error calling create_send_as_alias: ❌ ❌ HTTP 500: Internal Server Error
```

The response carries no error details, which suggests the server is hitting an unhandled exception.

**Affected tasks:**
- `task_20251218_102205_211_1628b966_06687c79` (hybrid)
- `task_20260107_131200_705_1628b966_84e87e0f` (email)
- `task_20260107_141130_029_1628b966_8e264839` (email)
- `task_20260109_160851_122_911d75d7_3ece8e5e` (email)
- `task_20260116_064331_915_d8f93f2d_854d09b8` (hybrid)

---

## Bug 3: `create_draft` / `send_message` FOREIGN KEY constraint failures (6 tasks)

**Domain:** Email, Hybrid
**Image:** `enterpriseops-gym-mcp-email`

**Error (two variants):**

**Variant A — threads table FK (4 tasks, all hybrid):**
```
Error creating draft: (sqlite3.IntegrityError) FOREIGN KEY constraint failed
[SQL: INSERT INTO threads (id, user_id, snippet, ...) VALUES (?, ?, ?, ?, ?, ?)]
```
Occurs when `userId="me"` in hybrid tasks. The email gym resolves `"me"` to a user_id that does not exist in the seeded database's `users` table.
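Variant A can be illustrated with an in-memory SQLite database. This is a sketch under assumptions: the two-table schema, the `create_thread` helper, and the ids `user_001` / `user_999` are hypothetical and only mirror the shape of the reported `INSERT INTO threads` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE threads ("
    " id TEXT PRIMARY KEY,"
    " user_id TEXT NOT NULL REFERENCES users(id),"
    " snippet TEXT)"
)
conn.execute("INSERT INTO users (id) VALUES ('user_001')")  # the only seeded user
conn.commit()


def create_thread(thread_id, user_id):
    """Insert a thread row; return 'ok' or the IntegrityError message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO threads (id, user_id, snippet) VALUES (?, ?, ?)",
                (thread_id, user_id, ""),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


# "me" resolved to an id that was never seeded: the insert is rejected.
missing = create_thread("t1", "user_999")
# The same insert succeeds when the id exists in users.
present = create_thread("t2", "user_001")
```

If this matches the real cause, the fix would be either seeding the resolved user in the hybrid database or resolving `"me"` to an id that is already seeded.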

**Variant B — message_labels table FK (3 tasks):**
```
Error sending message: (sqlite3.IntegrityError) FOREIGN KEY constraint failed
[SQL: INSERT INTO message_labels (message_id, label_id) VALUES (?, ?)]
```
The label exists: `modify_message` succeeds with the same `label_id` immediately afterwards. The `send_message` endpoint fails only when `labelIds` is provided in the request body. This is likely an insert-ordering issue in which the `message_labels` row is written before the parent message (and its thread) row exists.
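The ordering hypothesis for Variant B can be sketched the same way. The schema and the `send` helper below are hypothetical and simplified; they only show that, with immediate foreign keys, linking a label before the message row exists fails even though the label itself is valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement for this connection
conn.executescript("""
CREATE TABLE messages (id TEXT PRIMARY KEY);
CREATE TABLE labels (id TEXT PRIMARY KEY);
CREATE TABLE message_labels (
    message_id TEXT NOT NULL REFERENCES messages(id),
    label_id   TEXT NOT NULL REFERENCES labels(id)
);
INSERT INTO labels (id) VALUES ('Label_1');
""")


def send(message_id, label_id, link_first):
    """Insert a message and its label link in one transaction, in either order."""
    steps = [
        ("INSERT INTO messages (id) VALUES (?)", (message_id,)),
        ("INSERT INTO message_labels (message_id, label_id) VALUES (?, ?)",
         (message_id, label_id)),
    ]
    if link_first:
        steps.reverse()  # simulate writing the link before its parent row
    try:
        with conn:
            for sql, params in steps:
                conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


wrong_order = send("m1", "Label_1", link_first=True)
right_order = send("m2", "Label_1", link_first=False)
```

Here `wrong_order` reproduces the reported message while `right_order` succeeds, which is consistent with `modify_message` working once the message row is already in place.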

**Affected tasks:**
- `task_20251211_100706_985_701c5774_8d658f71` (email)
- `task_20251218_103430_472_701c5774_42c99459` (hybrid)
- `task_20251218_120340_971_701c5774_de18aa54` (hybrid)
- `task_20251219_115208_288_701c5774_66a236f8` (hybrid)
- `task_20251224_104537_197_701c5774_5c104832` (hybrid)
- `task_20251225_064512_901_701c5774_964bf3bf` (hybrid)

---

## Impact

These 19 tasks (2.9% of the 649-task benchmark) are **guaranteed failures** regardless of the model: the server crashes or rejects valid requests before the model can complete the task. This deflates reported scores for any model evaluated on the full benchmark.
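The arithmetic behind these figures, and one possible way to report a score over the completable tasks only, can be sketched as follows. The per-bug counts and the 649 total come from this report; `adjusted_pass_rate` and the `example_passed` value are hypothetical and for illustration only.

```python
TOTAL_TASKS = 649
BROKEN_BY_BUG = {
    "create_virtual_event_townhall": 8,
    "create_send_as_alias": 5,
    "create_draft / send_message": 6,
}

broken = sum(BROKEN_BY_BUG.values())           # 19 tasks
broken_share = broken / TOTAL_TASKS            # about 2.9% of the benchmark
ceiling = (TOTAL_TASKS - broken) / TOTAL_TASKS  # best score any model can reach


def adjusted_pass_rate(passed, total=TOTAL_TASKS, unwinnable=broken):
    """Pass rate over the tasks that can actually be completed."""
    return passed / (total - unwinnable)


example_passed = 300  # hypothetical number of passed tasks, not a real result
raw = example_passed / TOTAL_TASKS
adjusted = adjusted_pass_rate(example_passed)
```

Because the affected tasks can never pass, excluding them from the denominator only ever raises a model's score relative to the raw figure.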
