Popular repositories
- skillsbench Public
SkillsBench evaluates how well skills work and how effective agents are at using them
Repositories
- cli Public Forked from googleworkspace/cli
Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.
- harbor Public Forked from harbor-framework/harbor
Harbor is a framework for running agent evaluations and creating and using RL environments.
- gepa Public Forked from gepa-ai/gepa
Optimize prompts, code, and more with AI-powered Reflective Text Evolution
- skillsbench Public
SkillsBench evaluates how well skills work and how effective agents are at using them
- terminal-bench-3 Public Forked from harbor-framework/terminal-bench-3
🚧 Accepting Task Submissions 🚧
- skillsbench-trajectories Public
- llm-builds-linux Public
- benchflow Public
AI benchmark runtime framework for integrating and evaluating AI tasks using Docker-based benchmarks.
- pokemon-gym Public