Add custom evaluation function and configuration loading #27
- Remove redundant code related to test cases using custom evaluation functions.
- Enhance tests to verify that the custom eval function is executed correctly during optimization.
- Ensure proper cleanup of temporary files created during tests.
Pull Request Overview
This PR refactors evaluation loading into a centralized function, adds JSON-based configuration in the CLI, and updates tests and examples to align with the new evaluation approach.
- Extracted eval-loading logic into `get_evaluation` and removed ad-hoc custom-file handling
- Introduced JSON configuration support via the `config` environment variable
- Updated tests to drive eval loading through entrypoints and added a template `test_eval` in examples
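As a rough illustration of the eval-loading behaviour summarized above, the sketch below imports an entrypoint file, looks for a callable named `eval`, and otherwise returns a fallback. The name `load_eval_or_fallback`, its signature, and the `make_rules` callback are assumptions made for illustration; the PR's actual `get_evaluation` takes `entrypoint`, `opt_ctx`, and `evaluator`, and may behave differently.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


def load_eval_or_fallback(
    entrypoint: str,
    make_rules: Callable[[], Any],
) -> Tuple[Optional[Callable[..., Any]], Optional[Any]]:
    """Return (custom_eval_fn, rules_eval); exactly one of them is not None.

    Hypothetical stand-in for the PR's get_evaluation, not its real code.
    """
    path = Path(entrypoint)
    # Build a module from the file path without requiring it to be on sys.path.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import entrypoint: {entrypoint}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Only a module-level attribute counts; the builtin eval is not picked up.
    candidate = getattr(module, "eval", None)
    if callable(candidate):
        return candidate, None
    # No custom function found: fall back to generated evaluation criteria.
    return None, make_rules()
```

Returning a pair keeps the caller's code path uniform: it can always unpack two values and decide which one to use.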
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| aiai/main.py | Added get_evaluation, removed custom_eval_file option and logic, added JSON config logic |
| aiai/test_main.py | Removed legacy custom-eval tests, adapted existing test, and added test_cli_with_entrypoint_eval_function |
| aiai/examples/openai_agent.py | Imported random and added a test_eval template for custom evaluation functions |
Comments suppressed due to low confidence (1)
aiai/test_main.py:123
- The multiline string passed to `temp_file.write` has inconsistent indentation and context comments that may produce invalid Python syntax. Consider using `textwrap.dedent` or aligning the triple-quoted content so the generated file is syntactically correct.

    temp_file.write(
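One way to apply that suggestion is sketched below: `textwrap.dedent` strips the common leading whitespace from the triple-quoted source before it is written, and `compile` checks that the result parses. The helper name `write_eval_module` and the generated module body are made up for illustration and are not the test's actual content.

```python
import tempfile
import textwrap


def write_eval_module(source: str) -> str:
    """Write dedented source to a temporary .py file and return its path."""
    code = textwrap.dedent(source).lstrip("\n")
    # Fail early if the generated file would not be valid Python.
    compile(code, "<generated>", "exec")
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as temp_file:
        temp_file.write(code)
        return temp_file.name


path = write_eval_module(
    """
    def eval(output):
        # Hypothetical scoring rule, for illustration only.
        return 1.0 if output else 0.0
    """
)
```

Because the file is created with `delete=False`, the caller is responsible for removing it (for example with `os.remove(path)`) once the test finishes.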
    @patch.object(sys, "argv", ["aiai"])
    def test_cli_with_entrypoint_eval_function(
        mock_load_dotenv,
        mock_reset_db,
        mock_optimization_run,
        mock_analyze_code,
        mock_validate_entrypoint,
        mock_prompt,
Patching sys.argv is unnecessary because runner.invoke controls the CLI arguments. Additionally, mock_prompt is never called now that custom-file prompting is removed. Update the test to pass --entrypoint via runner.invoke or remove the unused patches.
Suggested change:

    - @patch.object(sys, "argv", ["aiai"])
    - def test_cli_with_entrypoint_eval_function(
    -     mock_load_dotenv,
    -     mock_reset_db,
    -     mock_optimization_run,
    -     mock_analyze_code,
    -     mock_validate_entrypoint,
    -     mock_prompt,
    + def test_cli_with_entrypoint_eval_function(
    +     mock_load_dotenv,
    +     mock_reset_db,
    +     mock_optimization_run,
    +     mock_analyze_code,
    +     mock_validate_entrypoint,
    )
    rules_eval = None
    custom_eval_fn = None
    custom_eval_fn, rules_eval = get_evaluation(entrypoint, opt_ctx=opt_ctx, evaluator=evaluator)
The returned rules_eval is not clearly forwarded to _optimization_run. If _optimization_run expects both the custom function and generated rules, ensure you pass rules_eval=rules_eval (or merge it into the call signature) so fallback evaluations are used.
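To make the concern concrete, the sketch below uses a hypothetical stand-in, `run_optimization`, rather than the project's `_optimization_run`, whose real signature is not shown in this thread. It also assumes `rules_eval` can be modeled as a callable, which may not match the project. The point it illustrates: if only `custom_eval_fn` is forwarded and it is `None`, the fallback rules are lost.

```python
from typing import Callable, Optional


def run_optimization(
    output: str,
    custom_eval_fn: Optional[Callable[[str], float]] = None,
    rules_eval: Optional[Callable[[str], float]] = None,
) -> float:
    """Score output with the custom function, else with the fallback rules."""
    scorer = custom_eval_fn if custom_eval_fn is not None else rules_eval
    if scorer is None:
        raise ValueError("no evaluation available: pass custom_eval_fn or rules_eval")
    return scorer(output)


# Pretend the loader found no custom function and generated rules instead.
custom_eval_fn = None
rules_eval = lambda output: float(len(output) > 0)  # placeholder rule

# Forwarding both values keeps the fallback usable.
score = run_optimization("hello", custom_eval_fn=custom_eval_fn, rules_eval=rules_eval)
```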
This pull request introduces enhancements to the evaluation process, configuration management, and modularity in the `aiai` codebase. Key changes include the addition of a utility function to dynamically load evaluation logic, support for configuration via JSON files, and the removal of redundant code for custom evaluation file handling.

Enhancements to Evaluation Process:
- Dynamic Evaluation Loading: Added a new `get_evaluation` function in `aiai/main.py` to dynamically load a callable named `eval` from the entrypoint module or fall back to generating evaluation criteria. This simplifies and centralizes evaluation logic.
- Custom Evaluation Example: Introduced a template evaluation function `test_eval` in `aiai/examples/openai_agent.py` to serve as a guide for defining custom evaluation logic.

Configuration Management:
- Introduced JSON configuration support via the `config` environment variable. This allows overriding CLI/default values with JSON-defined settings.

Code Cleanup and Simplification:
- Removed Custom Eval File Handling: Removed redundant logic for handling custom evaluation files in `main` and replaced it with the new `get_evaluation` function. This streamlines the code and reduces duplication.
- Miscellaneous Updates: Added the `random` module import in `aiai/examples/openai_agent.py` to support the new `test_eval` function.
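The configuration override described above could look roughly like the following. This is a sketch under assumptions: it treats the `config` environment variable as a path to a JSON file, and `apply_json_config` and the setting names in the usage note are placeholders. The PR may read the variable differently (for example, as inline JSON).

```python
import json
import os
from typing import Any, Dict, Mapping


def apply_json_config(
    defaults: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Return defaults overridden by the JSON file named in env['config'], if any."""
    settings = dict(defaults)
    config_path = env.get("config")
    if not config_path:
        # No config variable set: keep the CLI/default values untouched.
        return settings
    with open(config_path, encoding="utf-8") as fh:
        overrides = json.load(fh)
    if not isinstance(overrides, dict):
        raise ValueError("config file must contain a JSON object")
    settings.update(overrides)  # JSON values win over CLI/default values
    return settings
```

For example, with defaults of `{"entrypoint": "main.py", "iterations": 1}` and a file containing `{"iterations": 5}`, the result keeps `entrypoint` and replaces `iterations`.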