Add custom evaluation function and configuration loading by behrrad · Pull Request #27 · zenbase-ai/aiai-cli

behrrad · 2025-05-21T00:38:32Z

This pull request introduces enhancements to the evaluation process, configuration management, and modularity in the aiai codebase. Key changes include the addition of a utility function to dynamically load evaluation logic, support for configuration via JSON files, and the removal of redundant code for custom evaluation file handling.

Enhancements to Evaluation Process:

Dynamic Evaluation Loading: Added a new get_evaluation function in aiai/main.py that dynamically loads a callable named eval from the entrypoint module, or falls back to generating evaluation criteria when no such callable exists. This simplifies and centralizes evaluation logic.
Custom Evaluation Example: Introduced a template evaluation function test_eval in aiai/examples/openai_agent.py to serve as a guide for defining custom evaluation logic.
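
The loading behavior described above can be sketched as follows. This is a minimal illustration and not the code from aiai/main.py: the names load_named_callable, get_evaluation_sketch, and generate_rules are hypothetical, and returning None for the rules when a custom eval is found is an assumption. Only the callable name eval and the (custom_eval_fn, rules_eval) return order come from this PR.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


def load_named_callable(entrypoint: Path, name: str = "eval") -> Optional[Callable]:
    """Import the file at `entrypoint` and return its callable `name`, or None."""
    spec = importlib.util.spec_from_file_location("entrypoint_module", entrypoint)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the entrypoint's top-level code
    candidate = getattr(module, name, None)
    return candidate if callable(candidate) else None


def get_evaluation_sketch(
    entrypoint: Path, generate_rules: Callable[[], Any]
) -> Tuple[Optional[Callable], Any]:
    """Hypothetical stand-in for get_evaluation: (custom_eval_fn, rules_eval)."""
    custom_eval_fn = load_named_callable(entrypoint)
    if custom_eval_fn is not None:
        return custom_eval_fn, None  # assumption: no generated rules needed
    return None, generate_rules()  # fall back to generated criteria


with tempfile.TemporaryDirectory() as tmp:
    # An entrypoint that defines `eval` (illustrative scoring rule only).
    with_eval = Path(tmp) / "agent_with_eval.py"
    with_eval.write_text("def eval(output):\n    return 1.0 if output else 0.0\n")
    # An entrypoint without `eval`, which triggers the fallback.
    without_eval = Path(tmp) / "agent_without_eval.py"
    without_eval.write_text("def main():\n    return 'hello'\n")

    custom_fn, rules = get_evaluation_sketch(with_eval, lambda: "generated criteria")
    fallback_fn, fallback_rules = get_evaluation_sketch(
        without_eval, lambda: "generated criteria"
    )
```

Looking the name up with getattr on the loaded module does not consult builtins, so an entrypoint that omits eval yields None rather than Python's built-in eval.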

Configuration Management:

JSON Configuration Support: Added support for loading configuration parameters from a JSON file specified via the config environment variable. This allows overriding CLI/default values with JSON-defined settings.
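
As a rough sketch of the override behavior described above (not the PR's actual code): the helper apply_json_config and the keys entrypoint and examples are hypothetical, and the exact merge semantics in aiai/main.py may differ. The only detail taken from the PR is that the JSON file path comes from the config environment variable.

```python
import json
import os
import tempfile
from typing import Any, Dict


def apply_json_config(defaults: Dict[str, Any], env_var: str = "config") -> Dict[str, Any]:
    """Return a copy of `defaults` overridden by the JSON file named in `env_var`."""
    merged = dict(defaults)
    config_path = os.environ.get(env_var)
    if not config_path:
        return merged  # no config file requested, keep CLI/default values
    with open(config_path, encoding="utf-8") as fh:
        overrides = json.load(fh)
    if not isinstance(overrides, dict):
        raise ValueError("config file must contain a JSON object")
    merged.update(overrides)  # JSON-defined settings win over CLI/defaults
    return merged


# Hypothetical CLI/default values, for illustration only.
cli_values = {"entrypoint": "agent.py", "examples": 25}

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "aiai.json")
    with open(config_path, "w", encoding="utf-8") as fh:
        json.dump({"examples": 5}, fh)
    os.environ["config"] = config_path
    try:
        merged = apply_json_config(cli_values)
    finally:
        del os.environ["config"]  # do not leak the variable to later code

unchanged = apply_json_config(cli_values)  # variable unset: defaults kept
```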

Code Cleanup and Simplification:

Removed Custom Eval File Handling: Removed redundant logic for handling custom evaluation files in main and replaced it with the new get_evaluation function. This streamlines the code and reduces duplication.
Miscellaneous Updates: Added the random module import in aiai/examples/openai_agent.py to support the new test_eval function.

vercel · 2025-05-21T00:38:37Z

Name	Status	Preview	Comments	Updated (UTC)
aiai-cli-docs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 21, 2025 11:25pm

Copilot

Pull Request Overview

This PR refactors evaluation loading into a centralized function, adds JSON-based configuration in the CLI, and updates tests and examples to align with the new evaluation approach.

- Extracted eval-loading logic into get_evaluation and removed ad-hoc custom-file handling
- Introduced JSON configuration support via the config environment variable
- Updated tests to drive eval loading through entrypoints and added a template test_eval in examples

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
aiai/main.py	Added `get_evaluation`, removed `custom_eval_file` option and logic, added JSON config logic
aiai/test_main.py	Removed legacy custom-eval tests, adapted existing test, and added `test_cli_with_entrypoint_eval_function`
aiai/examples/openai_agent.py	Imported `random` and added a `test_eval` template for custom evaluation functions

Comments suppressed due to low confidence (1)

aiai/test_main.py:123

The multiline string passed to temp_file.write has inconsistent indentation and context comments that may produce invalid Python syntax. Consider using textwrap.dedent or aligning the triple-quoted content so the generated file is syntactically correct.

temp_file.write(
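
A minimal sketch of the textwrap.dedent approach suggested above. The helper write_entrypoint_with_eval and the generated eval body are hypothetical and do not reproduce the test's actual file contents.

```python
import os
import tempfile
import textwrap
from pathlib import Path


def write_entrypoint_with_eval() -> str:
    """Write a small module defining `eval` to a temp file and return its path."""
    # dedent strips the common leading whitespace, so the literal can be
    # indented to match this function without breaking the generated file.
    source = textwrap.dedent(
        """\
        def eval(output):
            # Illustrative scoring rule only.
            return 1.0 if output else 0.0
        """
    )
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as temp_file:
        temp_file.write(source)
    return temp_file.name


path = write_entrypoint_with_eval()
try:
    content = Path(path).read_text()
    compile(content, path, "exec")  # raises SyntaxError if the file is invalid
finally:
    os.remove(path)  # clean up the temporary file
```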

Copilot · 2025-05-21T03:19:58Z

+@patch.object(sys, "argv", ["aiai"])
+def test_cli_with_entrypoint_eval_function(
+    mock_load_dotenv,
+    mock_reset_db,
+    mock_optimization_run,
+    mock_analyze_code,
+    mock_validate_entrypoint,
+    mock_prompt,


Patching sys.argv is unnecessary because runner.invoke controls the CLI arguments. Additionally, mock_prompt is never called now that custom-file prompting is removed. Update the test to pass --entrypoint via runner.invoke or remove the unused patches.

Suggested change

-@patch.object(sys, "argv", ["aiai"])
-def test_cli_with_entrypoint_eval_function(
-    mock_load_dotenv,
-    mock_reset_db,
-    mock_optimization_run,
-    mock_analyze_code,
-    mock_validate_entrypoint,
-    mock_prompt,
+def test_cli_with_entrypoint_eval_function(
+    mock_load_dotenv,
+    mock_reset_db,
+    mock_optimization_run,
+    mock_analyze_code,
+    mock_validate_entrypoint,

Copilot · 2025-05-21T03:19:58Z

-        )
-        rules_eval = None
-        custom_eval_fn = None
+    custom_eval_fn, rules_eval = get_evaluation(entrypoint, opt_ctx=opt_ctx, evaluator=evaluator)


The returned rules_eval is not clearly forwarded to _optimization_run. If _optimization_run expects both the custom function and generated rules, ensure you pass rules_eval=rules_eval (or merge it into the call signature) so fallback evaluations are used.
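
To illustrate the concern, here is a small sketch using stand-in functions. Both get_evaluation_stub and optimization_run_stub are hypothetical; the real signatures of get_evaluation and _optimization_run are not shown on this page. The point is only that both tuple elements need to reach the consumer for the fallback path to work.

```python
from typing import Callable, Optional, Tuple


def get_evaluation_stub(
    custom_eval_fn: Optional[Callable] = None,
) -> Tuple[Optional[Callable], Optional[str]]:
    """Stand-in returning (custom_eval_fn, rules_eval), like the call in the diff."""
    if custom_eval_fn is not None:
        return custom_eval_fn, None
    return None, "generated rules"  # fallback evaluation criteria


def optimization_run_stub(
    *, custom_eval_fn: Optional[Callable] = None, rules_eval: Optional[str] = None
) -> str:
    """Stand-in consumer: prefers the custom function, else the generated rules."""
    if custom_eval_fn is not None:
        return "custom"
    if rules_eval is not None:
        return "rules"
    raise ValueError("no evaluation available")


# Fallback case: no custom eval, so only rules_eval is populated.
custom_eval_fn, rules_eval = get_evaluation_stub()

# Forwarding both values keeps the fallback evaluation in use.
forwarded = optimization_run_stub(custom_eval_fn=custom_eval_fn, rules_eval=rules_eval)

# Dropping rules_eval leaves the consumer with nothing to evaluate against.
try:
    optimization_run_stub(custom_eval_fn=custom_eval_fn)
    dropped_ok = True
except ValueError:
    dropped_ok = False
```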

ammirsm

LGTM.

Add custom evaluation function and configuration loading

da6bf42

behrrad requested a review from ammirsm May 21, 2025 00:38

Improve eval function testing

3b59858

- Remove redundant code related to test cases using custom evaluation functions.
- Enhance tests to verify that the custom eval function is executed correctly during optimization.
- Ensure proper cleanup of temporary files created during tests.

vercel Bot deployed to Preview May 21, 2025 00:59

ammirsm requested a review from Copilot May 21, 2025 03:15

Copilot AI reviewed May 21, 2025

Remove patching of sys.argv in test_cli_with_entrypoint_eval_function

9336f10

vercel Bot deployed to Preview May 21, 2025 23:25

ammirsm approved these changes May 21, 2025

behrrad wants to merge 3 commits into main from update-eval-function
