feat(designer): Agent evaluations tab #8932
🤖 AI PR Validation Report

PR Review Results

Thank you for your submission! Here's detailed feedback on your PR title and body compliance:

✅ PR Title
✅ Commit Type
❌ Risk Level
✅ What & Why
| Section | Status | Recommendation |
|---|---|---|
| Title | ✅ | Title is good; optionally call out the backend dependency in the title if required. |
| Commit Type | ✅ | Correct (feature). |
| Risk Level | ❌ | Update to High and the risk:high label; include rollout coordination notes. |
| What & Why | ✅ | Good; add a note about the backend/feature flag if needed. |
| Impact of Change | ❌ | Add explicit backend/API contract and package/version impact notes. |
| Test Plan | ❌ | Add unit + E2E tests or provide a justified mitigation plan with timeline. |
| Contributors | ✅ | Consider adding other contributors (PM/Design) or acknowledge them. |
| Screenshots/Videos | ❌ | Add screenshots or a short recording of the new UI flows. |
Action items (required before merging):
- Update PR Risk Level to High and add/replace the repo label with risk:high. Document why the risk was raised (broad UI + service + API changes).
- Add automated tests (preferred):
  - Unit tests for new queries and the evaluation slice (reducers/selectors).
  - Unit tests for key UI behavior (form validation, list interactions, enabling/disabling actions).
  - E2E test(s) for the evaluate flow: open the Evaluate tab, select a run/agent, create an evaluator, run an evaluation, assert the result is displayed.
  - If tests are blocked, add a clear Test Plan that explains the blockers and a follow-up ticket/PR with an ETA.
- Add screenshots/videos of the new Evaluate tab and panels (management, form, details, results) or a short walkthrough video.
- Call out backend/API prerequisites (new endpoints) in the PR body and confirm whether the backend is already deployed or will be released simultaneously. If the backend is not ready, mark the PR as draft or gate it behind a feature flag.
- Consider adding a short migration/compatibility note for other consumers of shared packages (designer-v2, logic-apps-shared).
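The feature-flag gating suggested above could look like the following minimal sketch. The flag name `enableAgentEvaluations` and the `DesignerFeatureFlags` shape are illustrative assumptions, not actual repo settings:

```typescript
// Hypothetical feature-flag gate for the Evaluate tab.
// 'enableAgentEvaluations' is an assumed flag name, not a real setting.
interface DesignerFeatureFlags {
  enableAgentEvaluations?: boolean;
}

// Returns true only when the flag is explicitly enabled, so the tab
// stays hidden by default if the backend endpoints are not yet deployed.
function isEvaluateTabEnabled(flags: DesignerFeatureFlags): boolean {
  return flags.enableAgentEvaluations === true;
}

console.assert(isEvaluateTabEnabled({ enableAgentEvaluations: true }));
console.assert(!isEvaluateTabEnabled({}));
```

Defaulting to hidden means a simultaneous backend rollout is not required: the flag can be flipped on per environment once the new endpoints are live.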
Please update the PR title/body and attach tests/screenshots as recommended, then re-submit. Because of the scope and cross-cutting changes, I recommend coordinating a release plan with the backend and QA teams and bumping the risk label to risk:high prior to merging. Thank you for the thorough implementation — once the test coverage and rollout details are provided, this will be much easier to approve.
Last updated: Sat, 21 Mar 2026 06:00:28 GMT
🤖 AI PR Validation Report

PR Review Results

Thank you for your submission! Here's detailed feedback on your PR title and body compliance:

✅ PR Title
✅ Commit Type
| Section | Status | Recommendation |
|---|---|---|
| Title | ✅ | Keep as-is or slightly expand for clarity. |
| Commit Type | ✅ | OK. |
| Risk Level | ❌ | Recommend bumping to risk:high and updating the label. |
| What & Why | ✅ | Good; optionally mention high-level files changed. |
| Impact of Change | ❌ | Expand to list system-level impacts and API changes. |
| Test Plan | ❌ | Add unit/E2E tests or a detailed manual test plan. |
| Contributors | ✅ | OK; add others if applicable. |
| Screenshots/Videos | ❌ | Add visual proof for UI changes. |
Summary:
This PR introduces a large feature set (new evaluation UI, new redux slice, queries, models, and a new StandardEvaluationService). Because this touches core libraries, the store, service initialization, and adds network/API interactions, I recommend raising the risk to High (please update label) and adding tests or a detailed manual test plan. At present, the PR does NOT pass the PR body checklist because the Test Plan is empty — please add automated tests or a robust manual testing section and address the risk label.
Please update the PR title/body with the following specific items and then re-submit:
- Risk label: change to risk:high (comment in the PR explaining why: touches core libs/store/services/API).
- Test Plan: either add test files (unit tests for evaluationSlice, queries, and EvaluateView components; an integration/E2E flow that covers create/run evaluation) OR add a detailed manual testing section with step-by-step instructions and expected results.
- Impact of Change: expand to describe system/backend/API impacts (new endpoints, potential runtime/cost) and any migration steps (none seen; if none, explicitly state so).
- Screenshots/Videos: include a screenshot of the Evaluate tab, the create evaluator form, and an evaluation result (or a short demo GIF).
Thank you for the thorough implementation. Once tests/manual test plan and the risk label are addressed, this will be in much better shape for merging.
Helpful file-specific test suggestions:
- libs/designer-v2/src/lib/core/state/evaluation/evaluationSlice.ts -> unit tests for reducer actions and reset behavior.
- libs/designer-v2/src/lib/core/queries/evaluations.ts -> mock EvaluationService and test query keys, enabled/disabled logic, and onSuccess invalidations for mutations.
- libs/logic-apps-shared/src/designer-client-services/lib/standard/evaluation.ts -> unit tests for URL/HTTP calls using a mocked IHttpClient.
- EvaluateView & panels -> component tests for rendering states (empty, loading, error, result) and form submission flows (EvaluatorFormPanel).
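The reset-behavior test suggested for evaluationSlice.ts could be sketched as below. The state shape, action types, and plain-reducer style are assumptions for illustration; the real slice likely uses Redux Toolkit's `createSlice` and may differ:

```typescript
// Hypothetical shape of the evaluation slice state; the real
// evaluationSlice.ts may hold different fields.
interface EvaluationState {
  selectedRunId: string | null;
  results: string[];
}

const initialState: EvaluationState = { selectedRunId: null, results: [] };

type EvaluationAction =
  | { type: 'evaluation/setRun'; runId: string }
  | { type: 'evaluation/addResult'; result: string }
  | { type: 'evaluation/reset' };

// Plain reducer standing in for the slice's generated reducer.
function evaluationReducer(
  state: EvaluationState = initialState,
  action: EvaluationAction
): EvaluationState {
  switch (action.type) {
    case 'evaluation/setRun':
      return { ...state, selectedRunId: action.runId };
    case 'evaluation/addResult':
      return { ...state, results: [...state.results, action.result] };
    case 'evaluation/reset':
      return initialState;
    default:
      return state;
  }
}

// The key assertion: reset returns the slice to its initial state
// regardless of what actions preceded it.
let state = evaluationReducer(initialState, { type: 'evaluation/setRun', runId: 'run-1' });
state = evaluationReducer(state, { type: 'evaluation/addResult', result: 'pass' });
state = evaluationReducer(state, { type: 'evaluation/reset' });
console.assert(state.selectedRunId === null && state.results.length === 0);
```

In a real test file this pattern maps directly onto the slice's exported reducer and action creators, with one `it(...)` block per reducer action plus one for reset.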
Please update and ping reviewers when ready. Thank you!
Last updated: Tue, 17 Mar 2026 17:31:38 GMT
📊 Coverage check completed. See workflow run for details.
…vice, update views
Commit Type
Risk Level
What & Why
Add agent evaluations functionality in a new designer tab, allowing users to evaluate A2A/agentic workflow runs using a predefined set of evaluators (tool call trajectory, semantic similarity, custom prompt). Each evaluator uses either a reference run as ground truth or a separate evaluator model as a judge.
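One way to model the three evaluator kinds described above is a discriminated union; the type and field names here are illustrative assumptions, not the PR's actual model definitions:

```typescript
// Hypothetical model of the three evaluator kinds; field names are
// illustrative, not the actual types added by this PR.
type Evaluator =
  | { kind: 'toolCallTrajectory'; referenceRunId: string }
  | { kind: 'semanticSimilarity'; referenceRunId: string }
  | { kind: 'customPrompt'; prompt: string; judgeModelId: string };

// Trajectory and similarity evaluators compare against a reference run
// (ground truth); the custom prompt evaluator delegates to a judge model.
function usesReferenceRun(e: Evaluator): boolean {
  return e.kind !== 'customPrompt';
}

console.assert(usesReferenceRun({ kind: 'toolCallTrajectory', referenceRunId: 'run-1' }));
console.assert(!usesReferenceRun({ kind: 'customPrompt', prompt: 'Rate accuracy 1-5', judgeModelId: 'judge-model-1' }));
```

A discriminated union like this lets the UI narrow on `kind` to show the right form fields (reference run picker vs. prompt and judge model inputs).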
Impact of Change
Test Plan
Contributors
@andrew-eldridge
Screenshots/Videos