
[New Task] find-fingers#21

Open
Eric123-tech wants to merge 1 commit into cocoabench:main from Eric123-tech:task/find-fingers


@Eric123-tech (Contributor)

This task provides an image and asks the agent to identify and distinguish the objects specified in the instructions. The image contains many misleading elements, testing the agent's ability to recognize image content accurately, preprocess the image, and reason in depth.

Copilot AI review requested due to automatic review settings January 16, 2026 04:09

Copilot AI left a comment


Pull request overview

This PR adds three new image recognition tasks to the CocoaBench benchmark: find-players, find-hero, and find-fingers. Each task requires agents to identify and distinguish specific objects in images containing misleading elements, testing their visual recognition, preprocessing capabilities, and reasoning abilities.

Changes:

  • Added three new encrypted benchmark tasks (find-players, find-hero, find-fingers) following the contribution guidelines structure
  • Each task includes encrypted instruction, evaluation, solution, and metadata files
  • Added image URLs hosted on postimg.cc for each task

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

File | Description
cocoabench-head/find-players/* | New task files for identifying players in an image
cocoabench-head/find-hero/* | New task files for identifying heroes in an image
cocoabench-head/find-fingers/* | New task files for identifying fingers in an image


Comment on cocoabench-head/find-players/canary.txt (outdated)
@@ -0,0 +1 @@
02c10714a5911f76
\ No newline at end of file

Copilot AI Jan 16, 2026


The PR title indicates a single new task "[New Task] find-fingers", but this PR actually adds three separate tasks: find-players, find-hero, and find-fingers. The PR description should be updated to mention all three tasks being added, or explain why they are being grouped together in a single PR.

@Leolty (Collaborator)

Leolty commented Jan 16, 2026

Hi @Eric123-tech! Thanks for the contribution!

I read all three tasks carefully. My current impression is that they mostly look like single-image VQA / counting / ID-style problems (even though models can still make mistakes). In contrib/CONTRIBUTING.md, we emphasize multi-step/multi-tool/multi-cognition tasks, typically ones that are difficult to solve in just a few steps and require more extensive work (e.g., search/browse, reasoning, and coding). From that lens, these tasks may not align well with the direction we're aiming for.

For the find-hero task specifically, it does feel somewhat more promising, but currently it reads like a prior-knowledge recognition question. Is the intended workflow that the agent should actually search/browse all Honor of Kings heroes and compare candidates one by one?

@Eric123-tech (Contributor, Author)

Hi @Leolty, thanks for the feedback!

You are right that in the general case, one-to-one visual matching might be needed. However, in the image provided for this task, the original hero's key quote (voice line) appears at the bottom. The intended workflow is for the agent to extract this text and use it to search for the specific hero directly, rather than comparing all candidates visually.
I've also found this contribution process fun, and I'll keep iterating on tasks that better fit the benchmark's goals.


Labels

None yet

Projects

None yet


3 participants