
Add custom evaluation function and configuration loading #27

Open: behrrad wants to merge 3 commits into main from update-eval-function


behrrad (Contributor) commented May 21, 2025

This pull request introduces enhancements to the evaluation process, configuration management, and modularity in the aiai codebase. Key changes include the addition of a utility function to dynamically load evaluation logic, support for configuration via JSON files, and the removal of redundant code for custom evaluation file handling.

Enhancements to Evaluation Process:

  • Dynamic Evaluation Loading: Added a new get_evaluation function in aiai/main.py that dynamically loads a callable named eval from the entrypoint module or, if none exists, falls back to generating evaluation criteria. This centralizes the evaluation logic in one place.

  • Custom Evaluation Example: Introduced a template evaluation function test_eval in aiai/examples/openai_agent.py to serve as a guide for defining custom evaluation logic.
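A minimal sketch of the loading pattern described above, assuming the entrypoint is a plain Python file. The real get_evaluation takes opt_ctx and evaluator arguments that this sketch does not model; the generate_rules callable is a hypothetical stand-in for the fallback that generates evaluation criteria.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


def _load_module(entrypoint: str):
    """Import a Python file as a module without requiring it to be on sys.path."""
    path = Path(entrypoint)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {entrypoint}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


def get_evaluation(
    entrypoint: str,
    generate_rules: Callable[[], Any],  # hypothetical fallback generator
) -> Tuple[Optional[Callable[..., Any]], Optional[Any]]:
    """Return (custom_eval_fn, rules_eval); exactly one of them is not None."""
    module = _load_module(entrypoint)
    # Module attributes do not include builtins, so this only finds an
    # `eval` that the entrypoint itself defines.
    candidate = getattr(module, "eval", None)
    if callable(candidate):
        return candidate, None
    return None, generate_rules()
```

Returning a pair keeps the call site in main simple: it unpacks both values and uses whichever one is set.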

Configuration Management:

  • JSON Configuration Support: Added support for loading configuration parameters from a JSON file whose path is given by the config environment variable. Values defined in the JSON file override CLI and default values.
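A minimal sketch of the JSON override, assuming the environment variable holds a path to a file containing a single JSON object. The load_config helper and the example keys are hypothetical, not taken from aiai/main.py.

```python
import json
import os
from typing import Any, Dict


def load_config(cli_values: Dict[str, Any], env_var: str = "config") -> Dict[str, Any]:
    """Hypothetical helper: merge CLI/default values with a JSON config file.

    The file path is read from the environment variable `env_var`.
    Keys present in the JSON file override the CLI/default values.
    """
    path = os.environ.get(env_var)
    if not path:
        # No config file configured: keep CLI/default values unchanged.
        return dict(cli_values)
    with open(path, encoding="utf-8") as fh:
        file_values = json.load(fh)
    if not isinstance(file_values, dict):
        raise ValueError(f"{path} must contain a JSON object")
    # Unpacking file_values last lets JSON-defined keys win.
    return {**cli_values, **file_values}
```

Because the JSON mapping is unpacked last, any key it defines takes precedence over the CLI/default value, which matches the override order described above.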

Code Cleanup and Simplification:

  • Removed Custom Eval File Handling: Removed the redundant custom-evaluation-file handling in main and replaced it with the new get_evaluation function, which streamlines the code and reduces duplication.

  • Miscellaneous Updates: Added the random module import in aiai/examples/openai_agent.py to support the new test_eval function.

behrrad requested a review from ammirsm May 21, 2025 00:38

vercel (bot) commented May 21, 2025

The latest updates on your projects:

Name | Status | Updated (UTC)
aiai-cli-docs | ✅ Ready | May 21, 2025 11:25pm

- Remove redundant code related to test cases using custom evaluation functions.
- Enhance tests to verify that the custom eval function is executed correctly during optimization.
- Ensure proper cleanup of temporary files created during tests.

Copilot AI left a comment

Pull Request Overview

This PR refactors evaluation loading into a centralized function, adds JSON-based configuration in the CLI, and updates tests and examples to align with the new evaluation approach.

  • Extracted eval-loading logic into get_evaluation and removed ad-hoc custom-file handling
  • Introduced JSON configuration support via the config environment variable
  • Updated tests to drive eval loading through entrypoints and added a template test_eval in examples

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
aiai/main.py | Added get_evaluation; removed the custom_eval_file option and logic; added JSON config logic
aiai/test_main.py | Removed legacy custom-eval tests, adapted an existing test, and added test_cli_with_entrypoint_eval_function
aiai/examples/openai_agent.py | Imported random and added a test_eval template for custom evaluation functions
Comments suppressed due to low confidence (1)

aiai/test_main.py:123

  • The multiline string passed to temp_file.write has inconsistent indentation and context comments that may produce invalid Python syntax. Consider using textwrap.dedent or aligning the triple-quoted content so the generated file is syntactically correct.
temp_file.write(
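One way to apply the dedent suggestion, sketched with a hypothetical entrypoint module. ENTRYPOINT_SOURCE and write_entrypoint are illustrative names, not the test's actual code.

```python
import os
import textwrap

# Source for a hypothetical entrypoint module that defines an `eval` callable.
# textwrap.dedent strips the common leading whitespace, so the triple-quoted
# string can be indented to match the surrounding code and still be valid Python.
ENTRYPOINT_SOURCE = textwrap.dedent(
    """
    def main():
        return "hello"

    def eval(output):
        return 1.0 if output == "hello" else 0.0
    """
)


def write_entrypoint(directory: str) -> str:
    """Write the module into `directory` and return the file path."""
    path = os.path.join(directory, "entrypoint.py")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(ENTRYPOINT_SOURCE)
    return path
```

In a test, calling write_entrypoint inside `with tempfile.TemporaryDirectory() as d:` also deletes the file when the block exits, which covers the temporary-file cleanup mentioned in the commit notes.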

Comment thread: aiai/test_main.py (outdated)
Comment on lines +213 to +220
@patch.object(sys, "argv", ["aiai"])
def test_cli_with_entrypoint_eval_function(
    mock_load_dotenv,
    mock_reset_db,
    mock_optimization_run,
    mock_analyze_code,
    mock_validate_entrypoint,
    mock_prompt,

Copilot AI May 21, 2025

Patching sys.argv is unnecessary because runner.invoke controls the CLI arguments. Additionally, mock_prompt is never called now that custom-file prompting is removed. Update the test to pass --entrypoint via runner.invoke or remove the unused patches.

Suggested change
-@patch.object(sys, "argv", ["aiai"])
-def test_cli_with_entrypoint_eval_function(
-    mock_load_dotenv,
-    mock_reset_db,
-    mock_optimization_run,
-    mock_analyze_code,
-    mock_validate_entrypoint,
-    mock_prompt,
+def test_cli_with_entrypoint_eval_function(
+    mock_load_dotenv,
+    mock_reset_db,
+    mock_optimization_run,
+    mock_analyze_code,
+    mock_validate_entrypoint,

Copilot uses AI. Check for mistakes.
Comment thread: aiai/main.py
)
rules_eval = None
custom_eval_fn = None
custom_eval_fn, rules_eval = get_evaluation(entrypoint, opt_ctx=opt_ctx, evaluator=evaluator)

Copilot AI May 21, 2025

The returned rules_eval is not clearly forwarded to _optimization_run. If _optimization_run expects both the custom function and generated rules, ensure you pass rules_eval=rules_eval (or merge it into the call signature) so fallback evaluations are used.
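A sketch of the forwarding this comment asks for. optimization_run is a hypothetical stand-in for _optimization_run, whose real signature the PR excerpt does not show, and rules_eval is modeled as a scoring callable for simplicity.

```python
from typing import Any, Callable, List, Optional


def optimization_run(
    outputs: List[Any],
    custom_eval_fn: Optional[Callable[[Any], float]] = None,
    rules_eval: Optional[Callable[[Any], float]] = None,
) -> List[float]:
    """Score each output, preferring the custom eval and falling back to rules."""
    scorer = custom_eval_fn if custom_eval_fn is not None else rules_eval
    if scorer is None:
        raise ValueError("need either custom_eval_fn or rules_eval")
    return [scorer(o) for o in outputs]
```

At the call site this would mean passing both values returned by get_evaluation, for example custom_eval_fn=custom_eval_fn, rules_eval=rules_eval, so the generated rules are used whenever the entrypoint defines no eval.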

ammirsm (Contributor) left a comment

LGTM.
