[Team Falcon] Oracle Forge — TRP1 FDE Programme, April 2026#35

Open
richh-s wants to merge 1 commit into ucbepic:main from richh-s:main

richh-s commented Apr 18, 2026

Oracle Forge — Team Falcon

Agent name: Oracle Forge
Team: Team Falcon, TRP1 FDE Programme, April 2026
Backbone LLM: gemini/gemini-2.0-flash-001 via OpenRouter (OpenAI-compatible API)
Trial count: 5 trials per query
Hints used: Yes — three-layer context system

Context Layers

  • Layer 1 (schema): agent/AGENT.md — full schema for 8 DAB datasets, join key rules, behavioral rules
  • Layer 2 (domain): kb/domain/ — Yelp field map, join key glossary, query skeletons, anti-patterns (15 entries)
  • Layer 3 (corrections): kb/corrections/corrections_log.md — 32 structured failure entries read at session start
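
A minimal sketch of how the three layers could be assembled into one prompt prefix at session start. The function name `build_context` is hypothetical, and the assumption that the files under kb/domain/ are Markdown (`*.md`) is mine; only the three paths come from the list above.

```python
from pathlib import Path


def build_context(root: Path) -> str:
    """Concatenate the three context layers in order (hypothetical helper)."""
    parts = []
    # Layer 1: schema and behavioral rules.
    parts.append((root / "agent" / "AGENT.md").read_text(encoding="utf-8"))
    # Layer 2: domain notes. Assumes Markdown files; sorted for a stable order.
    for path in sorted((root / "kb" / "domain").glob("*.md")):
        parts.append(path.read_text(encoding="utf-8"))
    # Layer 3: corrections log, read last so it sits closest to the question.
    parts.append(
        (root / "kb" / "corrections" / "corrections_log.md").read_text(encoding="utf-8")
    )
    return "\n\n".join(parts)
```

Placing the corrections log last is a design choice of this sketch, not something the PR states.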

Architecture

  • Self-correcting execution loop: 4 failure categories (syntax_error, join_key_format, wrong_table, domain_knowledge_gap), retry up to 3×
  • MCP Toolbox JSON-RPC server routing to PostgreSQL, MongoDB, SQLite, DuckDB
  • Best result: 100% pass@1 on Yelp (7/7 queries, 5 trials)
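
The self-correcting loop described above might look roughly like the following. This is a sketch under stated assumptions, not the repository's code: `run_with_correction`, `execute`, `classify`, and `repair` are hypothetical names, and "retry up to 3×" is read here as one initial attempt plus at most three retries.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

# The four failure categories named in the PR description.
FAILURE_CATEGORIES = (
    "syntax_error",
    "join_key_format",
    "wrong_table",
    "domain_knowledge_gap",
)
MAX_RETRIES = 3  # assumption: retries in addition to the first attempt


@dataclass
class Attempt:
    query: str
    ok: bool
    category: Optional[str] = None  # set only when the attempt failed


def run_with_correction(
    query: str,
    execute: Callable[[str], Any],           # runs the query, raises on failure
    classify: Callable[[Exception], str],    # maps an error to a failure category
    repair: Callable[[str, str], str],       # rewrites the query for that category
    max_retries: int = MAX_RETRIES,
) -> Tuple[Any, List[Attempt]]:
    attempts: List[Attempt] = []
    for _ in range(max_retries + 1):
        try:
            result = execute(query)
        except Exception as exc:
            category = classify(exc)
            attempts.append(Attempt(query, False, category))
            query = repair(query, category)
            continue
        attempts.append(Attempt(query, True))
        return result, attempts
    # All attempts failed; the caller decides what to record.
    return None, attempts
```

Returning the attempt history alongside the result would make it straightforward to append new entries to a corrections log, although the PR does not say how that log is populated.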

Results Summary

| Dataset | Queries | Best pass@1 | Trials |
|---|---|---|---|
| yelp | 7 | 100% | 5 |
| bookreview | 3 | 33% | 1 |
| GITHUB_REPOS | 4 | 0% | 1 |
| stockmarket | 5 | 0% | 1 |

Submission File

leaderboard_submissions/oracleforge_teamfalcon_trp1_n5.json — 47 entries (dataset/query/run/answer)

Code

https://github.com/Natnael-Alemseged/oracle-forge

@shreyashankar (Collaborator)

Hi @richh-s — we're missing coverage. The file has 47 entries across 4 of 12 datasets (yelp, bookreview, GITHUB_REPOS, stockmarket) and 19 of 54 queries, with 1 run on most of them. Per the instructions in the README, we need every query across all 12 datasets with at least 5 runs per query. If you didn't attempt some queries, include those entries with "answer": "". Once it's in I'll re-run verification and post the Pass@1 here.
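
One way to close the coverage gap is to pad the submission with empty answers for every missing (dataset, query, run) combination. The sketch below assumes each entry is a dict with the keys `dataset`, `query`, `run`, and `answer` (inferred from the "dataset/query/run/answer" description in the PR), that runs are numbered from 0, and that the full list of required queries is supplied by the caller. `pad_submission` is a hypothetical helper; the README's actual schema should take precedence.

```python
from typing import Dict, List


def pad_submission(
    entries: List[dict],
    required: Dict[str, List[str]],  # dataset name -> query ids that must appear
    runs_per_query: int = 5,
) -> List[dict]:
    """Return entries plus an empty-answer entry for each missing run."""
    seen = {(e["dataset"], e["query"], e["run"]) for e in entries}
    padded = list(entries)
    for dataset, queries in required.items():
        for query in queries:
            for run in range(runs_per_query):  # assumption: 0-based run index
                if (dataset, query, run) not in seen:
                    padded.append(
                        {"dataset": dataset, "query": query, "run": run, "answer": ""}
                    )
    return padded
```

With all 54 queries listed and 5 runs each, the padded file would hold at least 270 entries, compared with the 47 currently submitted.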
