[New Task] find-fingers by Eric123-tech · Pull Request #21 · cocoabench/cocoa-agent

Eric123-tech · 2026-01-16T04:09:45Z

This task provides an image and requires identifying and distinguishing the objects specified in the instructions. The image contains many misleading elements, testing the agent's ability to recognize the image content accurately, preprocess it, and carry out deep reasoning.

Copilot

Pull request overview

This PR adds three new image recognition tasks to the CocoaBench benchmark: find-players, find-hero, and find-fingers. Each task requires agents to identify and distinguish specific objects in images containing misleading elements, testing their visual recognition, preprocessing capabilities, and reasoning abilities.

Changes:

Added three new encrypted benchmark tasks (find-players, find-hero, find-fingers) following the contribution guidelines structure
Each task includes encrypted instruction, evaluation, solution, and metadata files
Added image URLs hosted on postimg.cc for each task

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

File	Description
cocoabench-head/find-players/*	New task files for identifying players in an image
cocoabench-head/find-hero/*	New task files for identifying heroes in an image
cocoabench-head/find-fingers/*	New task files for identifying fingers in an image

Copilot · 2026-01-16T04:14:02Z

@@ -0,0 +1 @@
+02c10714a5911f76


The PR title indicates a single new task "[New Task] find-fingers", but this PR actually adds three separate tasks: find-players, find-hero, and find-fingers. The PR description should be updated to mention all three tasks being added, or explain why they are being grouped together in a single PR.

Leolty · 2026-01-16T09:41:40Z

Hi @Eric123-tech! Thanks for the contribution!

I read all three tasks carefully. My current impression is that they mostly look like single-image VQA / counting / ID-style problems (even though models can still make mistakes). In contrib/CONTRIBUTING.md, we emphasize multi-step/multi-tool/multi-cognition tasks, typically ones that are difficult to solve in just a few steps and require more extensive work (e.g., search/browse, reasoning, and coding). From that lens, these may not align as strongly with the benchmark direction as we’re aiming for.

For the find-hero task specifically, it does feel somewhat more promising, but currently it reads like a prior-knowledge recognition question. Is the intended workflow that the agent should actually search/browse all Honor of Kings heroes and compare candidates one by one?

Eric123-tech · 2026-01-16T17:59:31Z

Hi @Leolty, thanks for the feedback!

You are right that in a general scenario, one-to-one visual matching might be needed. However, in the image provided for this task, the original hero's key quote/line appears at the bottom. The intended workflow is that the agent can extract this text and use it to search for the specific hero directly, rather than browsing through all candidates visually.
Also, I think this is a fun contribution process, and I'll keep iterating to come up with tasks that better fit the benchmark's goals.

Copilot AI review requested due to automatic review settings January 16, 2026 04:09

Copilot started reviewing on behalf of Eric123-tech January 16, 2026 04:10

Add find-fingers only (commit 46ea831)

Eric123-tech force-pushed the task/find-fingers branch from 9140c12 to 46ea831 on January 16, 2026 17:44

Eric123-tech wants to merge 1 commit into cocoabench:main from Eric123-tech:task/find-fingers
