[New Task] find hero #19
Pull request overview
This PR adds a new task called "find-hero" to the CocoaBench benchmark. The task evaluates an AI agent's ability to identify which Honor of Kings hero a fan-made character is imitating, based on visual features and dialogue from an image. This tests multi-modal reasoning, text recognition, and search capabilities.
Changes:
- Added new encrypted task files (instruction, evaluation, solution, metadata) following the CocoaBench contribution format
- Included asset reference URL pointing to the task image
- Added canary token for encryption verification
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cocoabench-head/find-hero/instruction.md.enc | Encrypted task instruction file for the agent |
| cocoabench-head/find-hero/evaluation.md.enc | Encrypted evaluation criteria and expected answer |
| cocoabench-head/find-hero/solution.md.enc | Encrypted human solution walkthrough |
| cocoabench-head/find-hero/metadata.json.enc | Encrypted task metadata including task ID and properties |
| cocoabench-head/find-hero/canary.txt | Encryption key token for decryption |
| cocoabench-head/find-hero/assets/urls.txt | URL reference to the task image asset |
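The layout in the table above can be checked locally before review. The snippet below is only a minimal sketch: the file names come from the table, while the helper name `missing_task_files` and the idea of running such a check are assumptions for illustration, not part of the CocoaBench tooling.

```python
from pathlib import Path

# File names taken from the per-file summary table above.
EXPECTED_FILES = [
    "instruction.md.enc",
    "evaluation.md.enc",
    "solution.md.enc",
    "metadata.json.enc",
    "canary.txt",
    "assets/urls.txt",
]


def missing_task_files(task_dir):
    """Return the expected files that are absent from task_dir.

    Hypothetical helper: it only checks that each file exists and does
    not attempt to decrypt or validate the contents.
    """
    root = Path(task_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Path as listed in the table; adjust to the local checkout.
    missing = missing_task_files("cocoabench-head/find-hero")
    print("missing:", missing or "none")
```

An empty result means every file listed in the table is present; it says nothing about whether the encrypted contents are valid.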
Hi @Eric123-tech, thanks for the contribution! I noticed that the Google Drive link under Additionally, I’m a bit concerned that this image may be easily searchable online, which could lead to trivial or obvious answers. I’m also not sure whether the current setup sufficiently involves long-horizon planning and multi-step reasoning. We’d appreciate it if you could revise the task with these points in mind. Thanks again, and we look forward to the updated version!
Hi @zzn-nzz, sorry for the late reply! Sure, I have made the original gdrive link public now:
Hi @Eric123-tech, sorry for the misposted message earlier; it should have been posted to PR #20. Could you please address the previous concerns? We would appreciate it if you could consider incorporating more long-horizon aspects and involving a broader range of capabilities. In addition, the current solution feels a bit high-level. It would be very helpful if you could provide a more concrete, step-by-step solution that someone without domain-specific knowledge could reasonably follow and use to solve the problem. Thanks! Looking forward to the updated version.
An image shows a fan-made character that is a secondary creation based on a hero from Honor of Kings. The goal is to infer which Honor of Kings hero it is imitating, using the character’s visual traits in the image and/or key lines of dialogue shown in the picture. This task evaluates the agent’s ability to analyze and reason about visual features, perform text recognition, and search for and compare hero characteristics to identify the closest match.