[New Task] find hero #19

Open
Eric123-tech wants to merge 3 commits into cocoabench:main from Eric123-tech:task/find-hero

@Eric123-tech
Contributor

An image shows a fan-made character that is a derivative work based on a hero from Honor of Kings. The goal is to infer which Honor of Kings hero it imitates, using the character’s visual traits and/or the key lines of dialogue shown in the image. This task evaluates the agent’s ability to analyze and reason about visual features, perform text recognition, and search for and compare hero characteristics to identify the closest match.

Copilot AI review requested due to automatic review settings January 16, 2026 01:31

Copilot AI left a comment

Pull request overview

This PR adds a new task called "find-hero" to the CocoaBench benchmark. The task evaluates an AI agent's ability to identify which Honor of Kings hero a fan-made character is imitating, based on visual features and dialogue from an image. This tests multi-modal reasoning, text recognition, and search capabilities.

Changes:

  • Added new encrypted task files (instruction, evaluation, solution, metadata) following the CocoaBench contribution format
  • Included asset reference URL pointing to the task image
  • Added canary token for encryption verification

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| cocoabench-head/find-hero/instruction.md.enc | Encrypted task instruction file for the agent |
| cocoabench-head/find-hero/evaluation.md.enc | Encrypted evaluation criteria and expected answer |
| cocoabench-head/find-hero/solution.md.enc | Encrypted human solution walkthrough |
| cocoabench-head/find-hero/metadata.json.enc | Encrypted task metadata including task ID and properties |
| cocoabench-head/find-hero/canary.txt | Encryption key token for decryption |
| cocoabench-head/find-hero/assets/urls.txt | URL reference to the task image asset |
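
As a rough illustration of the file layout summarized above, the following sketch checks whether a task directory contains the six files listed for this PR. The file names come from the per-file summary; the function name `missing_task_files` and the idea that these six files form the complete required set are assumptions made for illustration, not part of CocoaBench's tooling.

```python
from pathlib import Path
import tempfile

# Relative paths taken from the per-file summary of this PR.
# Assumption: this is the full set a task directory is expected to contain.
EXPECTED_FILES = [
    "instruction.md.enc",
    "evaluation.md.enc",
    "solution.md.enc",
    "metadata.json.enc",
    "canary.txt",
    "assets/urls.txt",
]


def missing_task_files(task_dir):
    """Return the expected relative paths that are absent from task_dir."""
    root = Path(task_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory that lacks the asset URL file.
    with tempfile.TemporaryDirectory() as tmp:
        task = Path(tmp) / "find-hero"
        task.mkdir()
        for name in EXPECTED_FILES[:-1]:
            (task / name).touch()
        print(missing_task_files(task))
```

A check like this only confirms that the files exist; it does not decrypt them or verify that the URL in `assets/urls.txt` is publicly reachable.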

@zzn-nzz
Collaborator

zzn-nzz commented Feb 4, 2026

Hi @Eric123-tech, thanks for the contribution!

I noticed that the Google Drive link under assets/ currently requires permission to access. Could you please make it public and update the link so others can view it?

Additionally, I’m a bit concerned that this image may be easily searchable online, which could lead to trivial or obvious answers. I’m also not sure whether the current setup sufficiently involves long-horizon planning and multi-step reasoning.

We’d appreciate it if you could revise the task with these points in mind. Thanks again, and we look forward to the updated version!

@Eric123-tech
Contributor Author

> Hi @Eric123-tech, thanks for the contribution!
>
> I noticed that the Google Drive link under assets/ currently requires permission to access. Could you please make it public and update the link so others can view it?
>
> I think this is overall an interesting task. We’d appreciate it if you could revise the gdrive link. Thanks again, and we look forward to the updated version!

Hi @zzn-nzz, sorry for the late reply! Sure, I have made the original gdrive link public now:
https://drive.google.com/drive/folders/17JN-btYlcrucxfdXp2a4HA2cvXb4uRlY?usp=drive_link

@zzn-nzz
Collaborator

zzn-nzz commented Feb 6, 2026

> > Hi @Eric123-tech, thanks for the contribution!
> > I noticed that the Google Drive link under assets/ currently requires permission to access. Could you please make it public and update the link so others can view it?
> > I think this is overall an interesting task. We’d appreciate it if you could revise the gdrive link. Thanks again, and we look forward to the updated version!
>
> Hi @zzn-nzz, sorry for the late reply! Sure, I have made the original gdrive link public now: https://drive.google.com/drive/folders/17JN-btYlcrucxfdXp2a4HA2cvXb4uRlY?usp=drive_link

Hi @Eric123-tech,

Sorry for the message I posted here by mistake earlier. It should have been posted to PR #20.

Could you please address the previous concerns? We would appreciate it if you could consider incorporating more long-horizon aspects and involving a broader range of capabilities. In addition, the current solution feels a bit high-level.

It would be very helpful if you could provide a more concrete and step-by-step solution that someone without domain-specific knowledge could reasonably follow and use to solve the problem.

Thanks! Looking forward to the updated version.

@zzn-nzz zzn-nzz closed this Apr 9, 2026
@zzn-nzz zzn-nzz reopened this Apr 9, 2026