
Kba draft review fixes #20

Merged
abossard merged 12 commits into main from kba-draft-review-fixes
Mar 10, 2026
Conversation

@abossard
Owner

This pull request introduces significant updates to support the new KBA Drafter feature, which leverages OpenAI for Knowledge Base Article generation, and removes the previous Ollama-based LLM integration. The changes include new configuration options, improved documentation, enhanced error handling, and additional API endpoints. Below are the most important changes grouped by theme:

KBA Drafter & OpenAI Integration

  • Added new environment variables to .env.example for configuring OpenAI and future KBA publishing adapters, with Ollama config retained as an alternative backend for the KBA Drafter, enabling flexible LLM support.
  • Updated documentation (README.md) to make OpenAI the primary LLM provider, removing Ollama setup instructions and API references and adding new guides for the KBA Drafter feature.

Scheduler and Application Lifecycle

  • Introduced application lifecycle hooks in backend/app.py to start and stop an auto-generation scheduler for KBA drafts on app startup and shutdown, improving reliability and automation.

Error Handling Enhancements

  • Implemented custom error handlers in backend/app.py for KBA Drafter-specific exceptions, providing clear and actionable API error responses for issues like LLM unavailability, authentication, rate limits, and publishing errors.
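The mapping such handlers implement might look roughly like this sketch; the exception and error-code names here are illustrative, the real hierarchy lives in backend/kba_exceptions.py and may differ:

```python
# Illustrative sketch of KBA Drafter error mapping -- names are hypothetical,
# not copied from backend/kba_exceptions.py.
class KBAError(Exception):
    """Base class for KBA Drafter errors."""

class LLMUnavailableError(KBAError): ...
class LLMAuthError(KBAError): ...
class LLMRateLimitError(KBAError): ...
class PublishError(KBAError): ...

# Map exception types to (HTTP status, machine-readable error code).
ERROR_MAP = {
    LLMUnavailableError: (503, "llm_unavailable"),
    LLMAuthError: (401, "llm_auth_failed"),
    LLMRateLimitError: (429, "llm_rate_limited"),
    PublishError: (502, "publish_failed"),
}

def to_api_error(exc: Exception) -> tuple:
    """Turn a KBA exception into a (status, JSON body) pair for the API layer."""
    status, code = ERROR_MAP.get(type(exc), (500, "internal_error"))
    return status, {"error": code, "detail": str(exc)}
```

Registering one handler per exception class (e.g. via Quart's errorhandler decorator) and delegating to a mapping like this keeps status codes and error payloads consistent across endpoints.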

API Improvements

  • Added a new endpoint /api/csv-tickets/by-incident/<incident_id> to fetch tickets by incident ID, supporting optional field selection for more flexible data retrieval.
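Optional field selection of this kind usually reduces to a small filter over the serialized ticket; a sketch, assuming a hypothetical ?fields=a,b query parameter (the actual parameter name in the endpoint may differ):

```python
def select_fields(ticket: dict, fields_param) -> dict:
    """Return only the requested ticket fields.

    `fields_param` mimics a hypothetical ?fields=incident_id,summary
    query parameter; None or empty means "return everything".
    """
    if not fields_param:
        return ticket
    wanted = {f.strip() for f in fields_param.split(",") if f.strip()}
    return {k: v for k, v in ticket.items() if k in wanted}
```

The endpoint can then serialize the full ticket once and apply the filter just before responding.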

Dependency Updates

  • Added APScheduler and tzlocal as backend dependencies to support scheduled KBA auto-generation.

SubSonic731 and others added 11 commits March 3, 2026 17:09
- cleaned up structure
- updated README.md
- created learning_mechanism.md plan
- design fixes
Database & Backend:
- Add search_questions column migration in operations.py (ALTER TABLE for existing databases)
- Add /api/kba/drafts/{id}/replace endpoint in app.py
- Fix backward compatibility in kba_service.py (_table_to_draft, _draft_to_table)
- Add search questions generation to replace_draft workflow
- Fix NULL constraint errors by ensuring empty strings for required fields
- Update related_tickets validation: accept INC + 9-12 digits (was fixed at 12)
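The relaxed related_tickets rule ("INC" plus 9 to 12 digits, previously exactly 12) can be expressed as a single regex; a minimal sketch:

```python
import re

# Accept "INC" followed by 9 to 12 digits (previously fixed at 12).
TICKET_ID_RE = re.compile(r"^INC\d{9,12}$")

def is_valid_ticket_id(ticket_id: str) -> bool:
    """Validate a related_tickets entry against the relaxed format."""
    return TICKET_ID_RE.fullmatch(ticket_id) is not None
```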

Frontend:
- Add Text component import to KBADrafterPage.jsx (fix TypeError)
- Add full-screen blur overlay with centered spinner during KBA generation
- Show overlay for both new draft creation and replacement operations
- Update styles: loadingOverlay with backdrop-filter blur effect

Documentation:
- Update kba_prompts.py: clarify related_tickets format with examples
- Update GENERAL.md: correct related_tickets format specification

Fixes #1 - KBA drafts not loading (missing DB column)
Fixes #2 - Replace endpoint not found (405 error)
Fixes #3 - Ticket ID validation too strict
- Add "Zurück zu Entwurf" button for reviewed status KBAs
- Add handleUnreview() handler to update status from "reviewed" to "draft"
- Import ArrowUndo24Regular icon for the unreview action
- Allow users to continue editing KBAs after review without deletion

This enables editing of reviewed KBAs that need changes before publishing.
… improvements

- Add ticket viewer dialog to display original incident details
  * New "Ticket" button in KBA header with DocumentSearch icon
  * Modal dialog showing incident data (ID, summary, status, priority, assignee, notes, resolution)
  * Backend endpoint /api/csv-tickets/by-incident/<incident_id> for incident ID lookup
  * Frontend API function getCSVTicketByIncident()

- Add unreview functionality for reviewed KBAs
  * "Zurück zu Entwurf" button with ArrowUndo icon
  * Allows resetting reviewed KBAs back to draft status for further editing

- Redesign KBA overview list
  * Replace corner delete button with professional overflow menu (⋮)
  * Horizontal layout: content left, status badge right-aligned, menu button
  * Menu component with delete option

- Add status filter dropdown to KBA overview
  * Filter options: All, draft, reviewed, published
  * Dropdown in card header for easy filtering

- Align EditableList "Add" button width with input fields
  * Use invisible placeholder buttons for exact width matching
  * Ensures consistent layout regardless of allowReorder setting

Files modified:
- frontend/src/features/kba-drafter/KBADrafterPage.jsx
- frontend/src/features/kba-drafter/components/EditableList.jsx
- frontend/src/services/api.js
- backend/app.py
- Fix delete draft error: use response.items instead of response.drafts
- Make AutoGenSettings card collapsible with chevron icon
  - Starts collapsed to reduce visual dominance
  - Smooth slide-down animation when expanded
  - Status badge visible in collapsed header
  - Clickable header with keyboard support (Enter key)
When clicking on a draft from the list after scrolling down,
the page now automatically scrolls to the top with a smooth animation.
This ensures users always start at the beginning of the draft content.
…changes

Replace native window.confirm() with ConfirmDialog component for better UX
consistency and modern appearance. Adds centered warning modal when user
attempts to discard unsaved changes (close draft, switch to preview, or
load different draft).

Changes:
- Add unsavedChangesDialogOpen and pendingAction states
- Update toggleEditMode, loadDraft, and handleClose to trigger modal
- Add handleDiscardChanges and handleCancelDiscard handlers
- Add ConfirmDialog with warning intent at end of component
Fixes:
- Fix CSV folder case mismatch (CSV -> csv) in app.py and operations.py
- Remove duplicate get_ticket_by_incident_id method in csv_data.py
- Replace inefficient len(session.exec().all()) with SQL COUNT(*) in kba_service.py
- Replace hardcoded placeholder credentials with env var lookups in kba_service.py
- Fix scheduler swallowing exceptions (remove bare raise, return None)
- Add settings reload at start of each scheduler run to fix race condition
- Add generation_warnings field to surface search questions failures to users
- Add schema migration for generation_warnings column
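The two scheduler fixes above (reloading settings on every run, and never letting an exception escape the job) can be sketched as a job wrapper; the injected callables and the "enabled" setting name are illustrative stand-ins, not the repo's actual service code:

```python
import logging

logger = logging.getLogger("kba.scheduler")

def run_auto_generation(load_settings, generate):
    """One scheduler tick, illustrating the two fixes described above.

    `load_settings()` and `generate(settings)` stand in for the real
    service functions.
    """
    try:
        # Re-read settings on every run so changes made via the API take
        # effect without restarting the scheduler (race-condition fix).
        settings = load_settings()
        if not settings.get("enabled", False):
            return None
        return generate(settings)
    except Exception:
        # Log instead of re-raising so one failed run cannot kill the
        # scheduled job (exception-swallowing fix).
        logger.exception("auto-generation run failed")
        return None
```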

Tests:
- Add 19 Playwright e2e tests for KBA Drafter feature covering:
  page load, navigation, LLM health status, draft generation,
  draft display, draft list, editing, review workflow,
  duplicate handling, and backend API integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 10, 2026 20:13

Copilot AI left a comment


Pull request overview

This PR adds the new KBA Drafter feature (OpenAI-backed KBA draft generation + auto-generation scheduling), expands frontend API helpers/UI to support KBA draft workflows, and updates documentation/guidelines to formalize KBA structure, quality checks, and publishing rules.

Changes:

  • Added KBA Drafter frontend API methods and new KBA-focused UI components/pages.
  • Introduced backend OpenAI LLM service + KBA schemas/models/audit + auto-generation scheduler and settings.
  • Added comprehensive KBA guideline docs (system + category) and updated repo docs/README for OpenAI-first setup.

Reviewed changes

Copilot reviewed 59 out of 67 changed files in this pull request and generated 2 comments.

File Description
frontend/src/services/api.js Adds KBA Drafter + auto-gen API calls; improves fetchJSON error handling for 409.
frontend/src/features/tickets/TicketsWithoutAnAssignee.jsx Improves text wrapping in message bars and detail fields.
frontend/src/features/tickets/TicketList.jsx Improves text wrapping in table/detail UI.
frontend/src/features/kba-drafter/components/TagEditor.jsx New tag editing component for KBA drafts.
frontend/src/features/kba-drafter/components/EditableList.jsx New reusable list editor (symptoms/steps/etc.).
frontend/src/features/kba-drafter/components/DuplicateKBADialog.jsx New dialog for handling duplicate KBA drafts.
frontend/src/features/kba-drafter/components/ConfirmDialog.jsx New reusable confirmation dialog.
frontend/src/features/kba-drafter/components/AutoGenSettings.jsx New settings panel for auto-generation configuration.
frontend/src/features/kba-drafter/components/AutoGenSettings.css Styling for auto-generation settings panel.
frontend/src/App.jsx Adds “KBA Drafter” tab/route.
docs/kba_guidelines/system/*.md Defines system-level KBA rules (structure/style/quality/publish workflow).
docs/kba_guidelines/categories/*.md Adds category-specific KBA guidance (VPN/Network/Password Reset/General).
docs/KBA_OPENAI_INTEGRATION.md Documents OpenAI structured output integration and error mapping.
docs/KBA_DRAFTER_QUICKSTART.md Quickstart instructions for running the KBA Drafter.
docs/KBA_DRAFTER.md Full technical documentation for KBA Drafter.
backend/tests/test_search_questions.py Adds tests for search question generation/validation.
backend/tests/test_kba_schema.py Adds/updates tests validating the legacy JSON schema example.
backend/tests/test_guidelines_loader.py(.broken) Updates guidelines loader tests (note: a “.broken” copy exists).
backend/test_auto_gen.py Adds a manual test script for auto-generation functionality.
backend/scheduler.py Adds APScheduler-based auto-generation scheduler.
backend/requirements.txt Adds APScheduler/jsonschema/pandas dependencies.
backend/pytest.ini Adds pytest configuration (asyncio auto mode, discovery patterns).
backend/operations.py Adds KBA Drafter operations and KBA DB initialization/migration helpers.
backend/llm_service.py Introduces OpenAI Async client wrapper with structured output parsing.
backend/kba_schemas.py Adds legacy JSON schema used for docs/back-compat reference.
backend/kba_prompts.py Builds prompts for KBA generation and search question generation.
backend/kba_output_models.py Adds Pydantic schemas/validators for structured output + search questions.
backend/kba_exceptions.py Adds custom exception hierarchy for KBA/LLM/publishing.
backend/kba_audit.py Adds audit trail logging/retrieval for KBA draft lifecycle.
backend/kb_adapters.py Adds publishing adapter pattern (file adapter implemented; others stubbed).
backend/csv_data.py Enhances lookup by incident_id with UUID fallback.
backend/auto_gen_service.py Adds auto-generation settings + ticket selection + generation runner.
backend/auto_gen_models.py Adds settings and run-result models/tables for auto-generation.
backend/=3.10.4 New file added (appears to be accidental pip output artifact).
README.md Updates repo documentation to OpenAI-first and adds KBA Drafter links.
.env.example Adds OpenAI + publishing + (optional) Ollama configuration placeholders.

…dependency

- LiteLLM is now the default LLM backend (no .env or API key needed)
- Multistage model fallback chain: claude-sonnet-4 → gpt-4o → gpt-4o-mini
- OpenAI SDK still used when OPENAI_API_KEY is explicitly set
- agents.py and workbench service use ChatLiteLLM when no OpenAI key
- Added csv_ticket_stats and csv_sla_breach_tickets to agent tools
- Added KBA Drafter to Playwright nav tests and menu screenshots
- Added e2e tests: publish, delete, status filter, ticket viewer
- 32 unit tests + 5 live integration tests for LLM service
- Updated .env.example with LiteLLM-first documentation
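The multistage fallback can be sketched generically; `call_model` below is a stand-in for the actual LiteLLM call, and only the model order is taken from the commit message:

```python
def complete_with_fallbacks(call_model, prompt,
                            models=("claude-sonnet-4", "gpt-4o", "gpt-4o-mini")):
    """Try each model in order; return (model, response) from the first success.

    `call_model(model, prompt)` is a hypothetical stand-in for the real
    LiteLLM completion call.
    """
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real impl would catch provider errors only
            last_error = exc
    raise RuntimeError(f"all models in fallback chain failed: {last_error}")
```

LiteLLM also ships its own fallback configuration; this sketch just makes the chain's control flow explicit.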

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI commented Mar 10, 2026

@abossard I've opened a new pull request, #21, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Mar 10, 2026

@abossard I've opened a new pull request, #22, to work on those changes. Once the pull request is ready, I'll request review from you.

@abossard abossard merged commit 7ac9bb4 into main Mar 10, 2026
2 checks passed
@abossard abossard deleted the kba-draft-review-fixes branch March 10, 2026 21:52
abossard added a commit that referenced this pull request Mar 18, 2026
* Optimize agent runtime and SLA demo flow (#15)

* feat: Enhance SLA Breach Risk functionality and UI integration
- Increased max_length for agent prompt to 5000
- Added fields parameter to list and search tickets for selective data retrieval
- Updated timeout for usecase demo agent to 300 seconds
- Introduced SLA Breach Risk demo with detailed prompt and ticket analysis
- Added E2E tests for SLA Breach Risk demo page

* feat: add incident_id field to ticket model and related components

- Added incident_id to the ticket mapping in app.py.
- Updated csv_data.py to include incident_id when converting CSV rows to tickets.
- Modified operations.py to define incident_id as a CSV ticket field.
- Enhanced the Ticket model in tickets.py to include incident_id.
- Updated usecase_demo.py to accommodate changes in ticket structure.
- Modified CSVTicketTable.jsx to display incident_id in the ticket table.
- Updated TicketList.jsx to filter and display incident_id in the ticket list.
- Enhanced TicketsWithoutAnAssignee.jsx to include incident_id in ticket operations.
- Updated UsecaseDemoPage.jsx to pass matchingTickets to the render function.
- Enhanced demoDefinitions.js to improve prompts for use case demos.
- Added SLA Breach Overview result view in resultViews.jsx to visualize SLA status of tickets.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: clean up import statements across multiple components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: standardize import statement formatting in resultViews.jsx

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: add SLA breach reporting functionality and related API endpoints

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement SLA breach report retrieval for unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: update API proxy target from localhost to 127.0.0.1 in vite.config.js (#16)

Co-authored-by: luca Spring <luca.spring@bit.admin.ch>

* Agent fabric (#17)

* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Agent fabric (#18)

* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding incident ID support and improve UI components for better user experience

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add tool invocation logging with latency tracking in WorkbenchService

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Kba draft review fixes (#20)

* kba-draft implemented

* removed test files, cleaned up structure, updated README.md, created learning_mechanism.md plan, design fixes

* feat: add search questions generation with database migration and UI

* view tickets in popup

* feat(kba-drafter): add ability to reset reviewed KBAs back to draft

* feat(kba-drafter): add ticket viewer, unreview, status filter, and UI improvements

* fix(kba): fix draft deletion bug and add collapsible AutoGenSettings

* fix(kba): auto-scroll to top when opening draft

* feat: replace browser confirms with custom modal dialogs for unsaved changes

* fix: address code review issues and add KBA drafter e2e tests

* feat: add LiteLLM fallback, Playwright tests, and remove OpenAI hard dependency

---------

Co-authored-by: SubSonic731 <alessandro.roschi@bit.admin.ch>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Agent workbench v2 (#19)

* Extract agent builder into extensible module with tests

Create backend/agent_builder/ as a standalone, deeply layered module
following Grokking Simplicity (data/calculations/actions separation)
and A Philosophy of Software Design (deep modules).

Structure:
- models/: Pure data (Pydantic/SQLModel) - agent, run, evaluation, chat
- tools/: ToolRegistry, schema converter, MCP adapter
- engine/: Unified ReAct runner, callbacks, prompt builder
- evaluator.py: Success criteria evaluation (mostly calculations)
- persistence/: DB engine setup + repository pattern
- service.py: WorkbenchService (deep module facade)
- chat_service.py: ChatService using shared ReAct engine
- routes.py: Quart Blueprint replacing 200+ lines from app.py
- tests/: 107 tests (unit + integration + E2E)

Key improvements:
- Eliminated duplicate ReAct agent building (was in both agents.py
  and agent_workbench/service.py)
- DRY error handling in routes via Blueprint
- Repository pattern isolates DB from business logic
- Pure calculation modules (prompt_builder, schema_converter,
  evaluator) are independently testable
- Backward-compatible: agent_workbench/__init__.py shims to new module

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add per-agent LLM config: model, temperature, recursion_limit, max_tokens, output_instructions

Each AgentDefinition now stores configurable LLM parameters:
- model: override service default (e.g. gpt-4o vs gpt-4o-mini)
- temperature: 0.0-2.0 (deterministic to creative)
- recursion_limit: 1-100 max ReAct loop iterations
- max_tokens: cap response length (0 = unlimited)
- output_instructions: custom formatting (replaces default markdown)

Changes:
- models/agent.py: 5 new fields with validation (ge/le bounds)
- persistence/database.py: migrations for existing DBs
- engine/react_runner.py: build_llm accepts temperature+max_tokens
- engine/prompt_builder.py: append_output_instructions for custom formatting
- service.py: _resolve_llm_for_agent builds per-agent LLM when config differs
- routes.py: ui-config v2 exposes llm_config_fields and defaults
- 12 new tests (model validation, CRUD, E2E roundtrip via REST)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add output_schema for type-safe structured output, fix defaults

Changes:
- recursion_limit default: 10 → 3 (most agents finish in 1-3 tool calls)
- max_tokens default: 0 → 4096 (sensible cap instead of unlimited)
- New field: output_schema (JSON Schema stored as JSON in DB)

output_schema is config, not code. You define the expected response
shape as a JSON Schema:
  {"type":"object","properties":{"breaches":{"type":"array",...}}}

At runtime this does two things:
1. Injected into system prompt so the LLM knows the expected structure
2. Takes priority over output_instructions and default markdown

Priority chain for output formatting:
  output_schema (strict JSON) > output_instructions (free text) > default markdown

128 tests pass (9 new tests for schema handling).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
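The priority chain for output formatting reduces to a small pure function; a sketch with illustrative names:

```python
# Hypothetical default; the repo's actual default instructions may differ.
DEFAULT_MARKDOWN_INSTRUCTIONS = "Respond in GitHub-flavored Markdown."

def resolve_output_format(output_schema=None, output_instructions=None):
    """Apply the priority chain:
    output_schema (strict JSON) > output_instructions (free text) > default markdown.

    Returns a (kind, value) pair; names are illustrative.
    """
    if output_schema:
        return "schema", output_schema
    if output_instructions:
        return "instructions", output_instructions
    return "markdown", DEFAULT_MARKDOWN_INSTRUCTIONS
```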

* Add suggest-schema endpoint and UI button

New endpoint: POST /api/workbench/suggest-schema
Takes agent name, description, system_prompt and asks the LLM to
propose a JSON Schema for the agent's structured output.

Backend:
- service.py: suggest_schema() method - builds a prompt, calls LLM,
  parses JSON response (handles markdown fences), falls back to
  generic schema on parse failure
- routes.py: POST /api/workbench/suggest-schema route

Frontend:
- api.js: suggestOutputSchema() function
- WorkbenchPage.jsx: output schema textarea + Suggest Schema button
  in the create form. Schema is editable JSON, sent as output_schema
  on agent creation. Button disabled until name or prompt is filled.

129 tests pass (1 new E2E test for suggest-schema endpoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
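The fence-tolerant JSON parsing with generic-schema fallback described for suggest_schema() might look roughly like this sketch (the details are illustrative, not the repo's code):

```python
import json
import re

# Hypothetical fallback schema used when the LLM reply cannot be parsed.
GENERIC_SCHEMA = {"type": "object", "properties": {}}

def parse_schema_reply(text: str) -> dict:
    """Extract a JSON object from an LLM reply, tolerating ```json fences.

    Falls back to a generic schema on parse failure, mirroring the
    behavior described above.
    """
    # Strip a ```json ... ``` (or bare ```) fence if present.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    candidate = match.group(1) if match else text.strip()
    try:
        parsed = json.loads(candidate)
        return parsed if isinstance(parsed, dict) else GENERIC_SCHEMA
    except json.JSONDecodeError:
        return GENERIC_SCHEMA
```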

* Wire output_schema to LangGraph response_format for SDK-level enforcement

When an agent has output_schema configured, it now does TWO things:

1. Prompt injection (existing) — schema is described in the system prompt
   so the LLM understands the expected structure
2. SDK enforcement (new) — schema is passed as response_format to
   create_react_agent(), which uses LangGraph's built-in structured
   output mechanism (provider-native or tool-based)

At runtime, structured_response from the LangGraph result takes
priority over raw message content. If the agent has no output_schema,
behavior is unchanged (markdown output from final message).

The output pipeline:
  output_schema defined → response_format=schema → structured_response → JSON
  no output_schema → final message content → markdown (default)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Always use structured_response with default schema

Every agent now always returns structured output via LangGraph's
response_format — no more untyped markdown strings.

Default schema (when no custom output_schema is set):
  {
    "message": "string (markdown)",
    "referenced_tickets": ["string"]
  }

This means:
- Plain agents → get {message: '...markdown...', referenced_tickets: [...]}
- Custom schema agents → get whatever schema they define
- Both enforced at SDK level via response_format, not just prompt

Changes:
- prompt_builder.py: DEFAULT_OUTPUT_SCHEMA, resolve_output_schema()
- service.py: always passes effective schema to create_react_agent
- routes.py: ui-config exposes default_output_schema for frontend
- Tests updated (132 pass)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add comprehensive docs with mermaid diagrams, clean up stale docs

New: docs/AGENT_BUILDER.md — full architecture documentation with:
- Architecture diagram (module layers + data flow)
- Sequence diagram (agent run lifecycle)
- Structured output pipeline flowchart
- ER diagram (DB schema)
- Data/Calculations/Actions separation diagram
- Deep modules table
- Extensibility flowchart
- API endpoint reference
- Testing commands

Updated:
- AGENTS_IMPLEMENTATION.md — replaced stale content with summary + pointer
- docs/AGENTS.md — replaced stale architecture with mermaid + pointer
- docs/PROJECT_STRUCTURE.md — added agent_builder/ to tree

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Docs overhaul + remove ~1800 lines of dead code/stale docs

Documentation:
- README.md: Complete rewrite with features table, screenshots, mermaid
  architecture diagram, agent builder section, correct tech stack
- PROJECT_STRUCTURE.md: Full rewrite matching actual codebase
- AGENTS.md: Fixed AgentService→WorkbenchService, updated examples
- LEARNING.md: Fixed broken link

Deleted stale docs:
- AGENTS_IMPLEMENTATION.md (was a 3-line redirect stub)
- docs/RULES.md (empty file)
- docs/SQLMODEL_MIGRATION.md (historical, migration complete)

Dead code removed from agents.py (~250 lines):
- MCP client stubs (_mcp_tool_to_langchain, _ensure_ticket_mcp_connection, close)
- Schema helpers only used by dead MCP code (_json_type_to_python, _schema_to_pydantic)
- OpenAI logging callback (duplicated in agent_builder/engine/callbacks.py)
- _build_state_graph learning example (dead code)
- Unused imports (get_langchain_tools, MCPClient, create_model)

Deleted old agent_workbench/ source files (~1030 lines):
- models.py, service.py, evaluator.py, tool_registry.py
- Only __init__.py shim remains for backward compatibility

132 backend tests + 15 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Playwright tests for suggest-schema and agent chat

New E2E tests in workbench.spec.js:
- 'creates agent with output schema via suggest button' — mocks
  /api/workbench/suggest-schema, clicks Suggest Schema, verifies
  schema populates textarea, creates agent, deletes it
- 'sends message and displays mocked response' (Agent Chat UI) —
  mocks /api/agents/run, types message, clicks send, verifies
  markdown heading and tool badge render

17 Playwright tests pass (was 15, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add VPN agent and failure handling Playwright tests

New Agent Fabric E2E tests:
- 'runs VPN troubleshooting agent and verifies structured output'
  Creates agent with VPN analysis prompt, runs it (mocked),
  verifies structured JSON output with ticket IDs (INC-101, INC-312),
  referenced_tickets field, and VPN content in rendered output
- 'handles agent run failure gracefully'
  Creates agent, runs it with mocked failure response,
  verifies UI doesn't crash and shows completion state

19 Playwright tests pass (was 17, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix structured output rendering in Agent Fabric UI

The output is now always structured JSON ({message, referenced_tickets}).
The UI now parses it and renders each part appropriately:

- message → rendered as GitHub-flavored Markdown (ReactMarkdown)
- referenced_tickets → rendered as monospace badges below the output
- Extra custom schema fields → rendered as formatted JSON in a pre block
- Button preview → shows message text, not raw JSON

Also handles non-JSON output gracefully (falls back to raw markdown).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add MCP App technical documentation

New: docs/MCP_APP.md — comprehensive guide on how this project
works as an MCP application:

- What an MCP App is (app that exposes business logic via MCP protocol)
- Architecture diagrams: consumers (Claude, Copilot, agents) → MCP endpoint
- Full protocol sequence diagram (initialize → tools/list → tools/call)
- The @operation decorator: single source of truth for REST + MCP + LangChain
- How to connect clients (Claude Desktop, Python, curl examples)
- 4-layer architecture diagram (business logic → operations → adapters → consumers)
- Extension roadmap: Resources, Prompts, SSE streaming
- Security considerations table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add SchemaRenderer + visual SchemaEditor with x-ui widget system

SchemaRenderer (frontend/src/features/workbench/SchemaRenderer.jsx):
- Generic component: takes {data, schema} and renders each property
  using x-ui widget annotations
- Widgets: markdown, table, badge-list, stat-card, bar-chart (Nivo),
  pie-chart (Nivo), json, hidden
- Auto-detection when no x-ui: string→markdown, integer→stat-card,
  array of objects→table, array of strings→badge-list, object→json
- Console debug logging, data-testid per field for E2E testing

SchemaEditor (frontend/src/features/workbench/SchemaEditor.jsx):
- Visual property list editor (no raw JSON editing needed)
- Add/remove properties, set name/type/description
- Widget picker dropdown with all available widgets
- Context-sensitive options (columns for table, label for stat-card,
  indexBy/keys for bar-chart)
- Syncs with suggest-schema: LLM suggestion populates visual editor
- Outputs valid JSON Schema with x-ui annotations

Backend:
- DEFAULT_OUTPUT_SCHEMA now has x-ui annotations (markdown + badge-list)
- suggest_schema prompt updated to suggest x-ui widgets per property

Wiring:
- WorkbenchPage uses SchemaRenderer for run output (replaces hardcoded)
- WorkbenchPage uses SchemaEditor for create form (replaces textarea)

20 Playwright tests pass (including new SchemaRenderer widget test).
132 backend tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
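The auto-detection rules above can be sketched as a small lookup. This is an illustrative Python sketch with a hypothetical `pick_widget` helper; the real logic lives in frontend/src/features/workbench/SchemaRenderer.jsx.

```python
# Sketch of the SchemaRenderer auto-detection rules (illustrative only;
# the actual implementation is in SchemaRenderer.jsx).

def pick_widget(prop: dict) -> str:
    """Map a JSON Schema property to a widget when no x-ui annotation is set."""
    if "x-ui" in prop:                      # explicit annotation always wins
        return prop["x-ui"].get("widget", "json")
    t = prop.get("type")
    if t == "string":
        return "markdown"
    if t == "integer":
        return "stat-card"
    if t == "array":
        item_type = prop.get("items", {}).get("type")
        if item_type == "object":
            return "table"
        if item_type == "string":
            return "badge-list"
    return "json"                           # objects and anything else
```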

* Improve suggest-schema prompt with full data domain + widget docs

The suggest-schema LLM prompt now includes:
- Ticket data domain (all field names, types, enum values, example cities)
- Available tools with descriptions (csv_list_tickets, csv_search_tickets, etc.)
- Full widget documentation with use-cases and options for each:
  markdown, table (columns), badge-list, stat-card (label),
  bar-chart (indexBy, keys), pie-chart, json, hidden
- Explicit rules: always include message+referenced_tickets,
  match widget to data shape, use snake_case names

This gives the LLM enough context to suggest schemas that actually
match the ticket data (e.g. status distribution → pie-chart,
ticket list → table with incident_id/summary/status columns).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix latency issues: schema title bug + recursion_limit headroom

Investigation found 3 root causes for slow AI calls:

1. gpt-5-nano is a REASONING model — burns 192-832 reasoning tokens
   per LLM call (invisible chain-of-thought), taking 2-8s each.
   A simple 'say hello' costs 8.4s with 832 reasoning tokens.

2. response_format adds a 3rd LLM call — LangGraph's
   generate_structured_response makes a separate LLM call to format
   the output as JSON after the ReAct loop finishes.
   Without: 4.7s (2 calls). With: 13s (3 calls).

3. Missing 'title' in output_schema crashed with_structured_output.
   OpenAI's API requires a top-level 'title' in the JSON Schema.

Fixes applied:
- resolve_output_schema() now auto-adds 'title': 'AgentOutput'
  when missing (both default and custom schemas)
- DEFAULT_OUTPUT_SCHEMA has explicit 'title' field
- recursion_limit: user's setting (default 3) is now multiplied by 4
  for the actual LangGraph graph, with a floor of 10. This prevents
  GraphRecursionError when response_format adds extra graph steps.

Note: The main latency driver (reasoning tokens) is inherent to the
model choice. Users can switch to gpt-4o-mini via per-agent 'model'
field for ~10x faster non-reasoning responses.

133 backend + 20 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
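The two schema/limit fixes above can be sketched as pure functions. Function names follow the commit message; the exact signatures in the repo may differ.

```python
# Sketch of the fixes described above (illustrative, not the repo's exact code).

def resolve_output_schema(schema: dict) -> dict:
    """OpenAI's structured-output API requires a top-level 'title';
    auto-add one when the user's schema omits it."""
    resolved = dict(schema)
    resolved.setdefault("title", "AgentOutput")
    return resolved

def effective_recursion_limit(user_limit: int) -> int:
    """User setting x4 with a floor of 10, leaving headroom for the
    extra graph steps that response_format introduces."""
    return max(user_limit * 4, 10)
```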

* Fix agent tool token bloat: compact fields + lower default limits

Root cause: csv_list_tickets tool returned full Ticket objects with ALL
fields (notes, description, resolution, work logs) — ~65K tokens for
100 tickets. The LLM had to process all of this, causing 30-60s per
step with a reasoning model.

Changes to operations.py:
- csv_list_tickets: returns compact dicts (10 fields, not 30+),
  default limit 25 (was 100), max limit 100 (was 500)
- csv_search_tickets: same compact treatment, limit 25 (was 50)
- csv_get_ticket: now accepts optional 'fields' parameter for
  selective detail drill-down, returns dict (was full Ticket)
- Tool descriptions updated to guide agents: 'use csv_get_ticket
  for full details' pattern

Token impact per tool call:
  Before: 100 tickets × ~400 tokens = ~65,000 tokens
  After:  25 tickets × ~60 tokens = ~1,500 tokens (97% reduction)

Expected latency improvement:
  Before: ~13s per tool call (65K token input processing)
  After:  ~3-5s per tool call (1.5K token input)

153 tests pass (133 backend + 20 Playwright).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
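The compact projection above can be sketched as a field whitelist. `COMPACT_FIELDS` here is an assumed list for illustration, not the exact set in operations.py.

```python
# Sketch of the compact-fields projection (field list is assumed).

COMPACT_FIELDS = [
    "incident_id", "summary", "status", "priority", "assignee",
    "assignment_group", "city", "opened_at", "closed_at", "category",
]

def to_compact(ticket: dict) -> dict:
    """Drop heavy fields (notes, description, resolution, work logs)
    so a 25-ticket listing costs ~1.5K tokens instead of ~65K."""
    return {k: ticket[k] for k in COMPACT_FIELDS if k in ticket}
```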

* Drop response_format to eliminate extra LLM call

LangGraph 1.0.8 implements response_format via a SEPARATE LLM call
(generate_structured_response) — adding 5-10s latency per run.
The refactor to inline tool-based structured output
(github.com/langchain-ai/langgraph/issues/5872) hasn't shipped yet.

Fix: remove response_format from create_react_agent. The system
prompt already instructs the LLM to produce JSON matching the
schema (via append_output_instructions). The frontend's
SchemaRenderer handles both parsed JSON and raw text gracefully.

Latency impact:
  Before: 3 LLM calls (decide tool + answer + format JSON) ~13s
  After:  2 LLM calls (decide tool + answer as JSON)       ~5s

When LangGraph ships inline structured output, we can re-enable
response_format with zero code changes (just pass it back to
build_react_agent).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enable OpenAI JSON mode for guaranteed valid JSON output

Adds response_format: {type: 'json_object'} to the ChatOpenAI
constructor via model_kwargs. This is a model-level setting that
constrains token generation to valid JSON — no extra LLM call,
no post-processing, just guaranteed JSON from every response.

This is different from LangGraph's response_format parameter
(which adds a separate LLM call). This is OpenAI's native JSON
mode applied at the API level during the same call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert JSON mode — incompatible with non-strict tool schemas

OpenAI's response_format: json_object requires all tools to have
strict schemas. Our tools (from @operation decorator) don't set
strict=True, causing: 'csv_search_tickets is not strict. Only
strict function tools can be auto-parsed'.

Reverting to prompt-only JSON enforcement, which tested at 3/3
reliability with gpt-5-nano. The frontend fallback (wraps non-JSON
as {message: raw_text}) provides additional safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add widget E2E tests + strict tools + Agent Chat JSON mode

New Playwright tests (23 total, +3):
- 'renders bar-chart and pie-chart from x-ui annotations' — injects
  mock agent with output_schema containing x-ui widgets, verifies
  SVG rendering for pie/bar charts, stat-card with label, badges
- 'renders raw JSON for object data' — verifies auto-detection:
  objects render as formatted JSON in pre blocks
- 'falls back gracefully for non-JSON output' — verifies plain
  markdown string wraps as {message: text} and renders correctly

Agent Chat (agents.py) fixes:
- Added JSON output mode (response_format: json_object)
- Added strict=True tool binding for compatibility
- Matches the same pattern as agent_builder

Strict tool binding (react_runner.py):
- build_react_agent pre-binds tools with strict=True
- Required for OpenAI JSON mode (response_format: json_object)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NameError: OpenAICallLoggingCallback was removed but still referenced

The class was deleted in the dead code cleanup but agents.py still
used it. Replaced with make_llm_logging_callback from agent_builder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add 'Show in Menu' — agents appear as tabs in navigation

When an agent has show_in_menu=true, it appears as a tab in the
main navigation bar. Clicking it opens a dedicated run page with
just the input field, run button, and SchemaRenderer output.

Backend:
- AgentDefinition: new show_in_menu bool field (default false)
- AgentDefinitionCreate/Update: show_in_menu parameter
- Migration for existing DBs
- Service wires it through create/update

Frontend:
- WorkbenchPage: 'Show in menu' checkbox in create form
- App.jsx: fetches agents with show_in_menu=true, injects as tabs
- AgentRunPage.jsx: simple standalone run page (title, description,
  optional input, run button, SchemaRenderer output)
- Dynamic routes: /agent-run/{agentId}

E2E test:
- Creates agent via API with show_in_menu=true
- Verifies tab appears in navigation with agent name
- Clicks tab, verifies AgentRunPage renders
- Runs agent (mocked), verifies output with SchemaRenderer

24 Playwright + 133 backend = 157 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add missing tools to chat agent: csv_sla_breach_tickets, csv_ticket_stats

The SLA Breach page was slow because the chat agent (agents.py)
didn't have the csv_sla_breach_tickets tool. The prompt said
'call csv_sla_breach_tickets' but the tool didn't exist, so the
LLM tried to replicate SLA breach logic manually using
csv_list_tickets — fetching many tickets and reasoning over them.

Now the chat agent has all 6 CSV tools matching the operations:
- csv_list_tickets (existing)
- csv_get_ticket (existing)
- csv_search_tickets (existing)
- csv_ticket_fields (existing)
- csv_sla_breach_tickets (NEW — pre-computed, ~1000 tokens)
- csv_ticket_stats (NEW — aggregated stats, ~350 bytes)

Expected improvement: 1 tool call (~1000 tokens) instead of
multiple list calls + manual reasoning (~30-60K tokens).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add ticket detail modal and enhance CSV ticket table functionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor CSVTicketTable component: reorder DialogActions import for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Add reasoning_effort config + new tools for major speed improvement

Performance:
- reasoning_effort='low' as default — reduces gpt-5-nano from
  512 reasoning tokens (~7s) to 0-192 tokens (~1-3s) per LLM call
- Configurable per agent: low (fast), medium, high (deep), default
- Both agent_builder and legacy chat agent use reasoning_effort='low'

New tools:
- csv_count_tickets: count matching tickets WITHOUT returning data.
  Lets the LLM check 'how many VPN tickets?' (~50 tokens) before
  deciding to fetch details (~5000 tokens)
- csv_search_tickets_with_details: search + return full details
  (notes, resolution, description) in ONE call. Eliminates the
  N × csv_get_ticket drill-down pattern that caused the
  'Ticket Knowledgebase Creator' to make 5+ sequential LLM calls

Impact on 'Ticket Knowledgebase Creator' agent:
  Before: search(compact) → get_ticket × N → generate = 5+ LLM calls × ~5s = 25s+
  After:  search_with_details(query, limit=10) → generate = 2 LLM calls × ~2s = 4s

Also fixed: removed stale response_format: json_object from build_llm
(was causing strict tool errors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
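The count-before-fetch pattern above can be sketched with in-memory data. The ticket data and tool bodies here are illustrative stand-ins for the CSV-backed tools.

```python
# Sketch of the count-then-fetch pattern (illustrative data and logic).

TICKETS = [
    {"incident_id": f"INC-{i}", "summary": "VPN tunnel drops"}
    for i in range(40)
] + [{"incident_id": "INC-999", "summary": "Printer jam"}]

def csv_count_tickets(query: str) -> int:
    """Cheap probe: returns only a number (~50 tokens), never ticket bodies."""
    return sum(query in t["summary"].lower() for t in TICKETS)

def csv_search_tickets_with_details(query: str, limit: int = 10) -> list:
    """One call returns full details, avoiding N x csv_get_ticket round trips."""
    return [t for t in TICKETS if query in t["summary"].lower()][:limit]
```

With these two tools the LLM can ask "how many?" first and only then decide whether fetching details is worth the tokens.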

* Update incident details in FALL_2_HARDWARE_PERIPHERIE and FALL_3_ZUGRIFF_BERECHTIGUNG documentation for consistency and clarity

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Fix: all E2E tests now clean up created agents

Two tests were creating agents via the UI but not deleting them,
leaving orphans in the DB after each test run:
- 'runs an agent and appends output to run button'
- 'requires and forwards configured run input'

Added Delete button clicks at the end of both tests.
All 10 agent-creating tests now verified to clean up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite workbench e2e tests for tabbed UI

- Add helpers: goToCreateTab, goToAgentsTab, createAgent, createAgentViaAPI, deleteAgentViaAPI, mockEmptyRuns
- Update 'creates and deletes' to use Create Agent tab and agent cards
- Update 'runs an agent' to verify output in RunsSidePanel
- Update 'requires input' to use card inline input field + Go button
- Update 'suggest schema' to navigate to Create tab first
- Update 'failure handling' to check error in run detail panel
- Refactor SchemaRenderer tests to use setupSchemaTest helper (API-created agents, run output in side panel)
- Keep Agent Chat UI and Show in Menu tests unchanged

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: redesign workbench with agent cards, runs side panel, and tabbed layout

- Split WorkbenchPage into tabbed UI: Agents (cards grid) + Create Agent
- AgentCardsPanel: icon cards with Run/Edit/Delete buttons per agent
- RunsSidePanel: scrollable run history with click-to-view output
- AgentEditDialog: edit existing agents via dialog
- AgentCreateForm: extracted creation form (reusable for create + edit)
- Added API functions: updateWorkbenchAgent, listAllRuns, getRun
- All 47 Playwright tests pass (12 workbench tests updated for new UI)
- Removed Ollama references from setup.sh and package.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: LiteLLM fallback in agent_builder + add live lifecycle test

- Fixed agent_builder/engine/react_runner.py: ChatLiteLLM when no API key
- Fixed agent_builder/service.py: removed hard OpenAI key requirement
- Fixed agent_builder/chat_service.py: same
- Fixed RunsSidePanel output parsing for raw string output
- Added full lifecycle e2e test (live LLM): create → run → edit → re-run → verify history → delete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: suggest schema & tools, default no tools, pure function refactor

- 'Suggest Schema & Tools' button: LLM suggests output schema AND tool selection
- Backend: _build_suggest_prompt and _parse_suggest_response as pure functions
- Frontend: tools default to empty, populated by suggest response
- RunsSidePanel: pure calculations extracted (buildAgentMap, sortRunsNewestFirst,
  resolveOutputSchema, resolveAgentName, parseRunOutput, formatRelativeTime)
- All 49 Playwright tests pass (2 live LLM tests included)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: result dialog, chart rendering, markdown fence parsing

- Run results now open in a large Dialog (900px wide, 85vh max)
- Fixed parseRunOutput: strips markdown code fences from LLM output
- Fixed PieChartWidget: filters non-numeric values, formats labels
- Fixed BarChartWidget: accepts object {key: number} in addition to arrays
- Chart containers: 300px height, 600px max-width
- Tests: close dialog before cleanup (dialog blocks pointer events)
- All 49 Playwright tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: all-live Playwright tests, result dialog fix, runs panel fix

- Rewrote workbench tests: ZERO mocks, all 8 tests use live LLM
- Fixed RunsSidePanel: min-height for layout, runs visible on load
- Fixed parseRunOutput: strips markdown fences from LLM output
- Fixed chart widgets: pie/bar handle non-numeric values, proper sizing
- Fixed dialog close: tests use X button (in viewport) not Close (scrolled)
- Total: 43 tests, all passing, all live (1.1 min)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: extract shared parseRunOutput, add delete-all-runs

- Extracted parseRunOutput (fence-stripping + JSON parsing) into
  outputUtils.js — shared by RunsSidePanel and AgentRunPage
- Fixed AgentRunPage (show_in_menu): renders markdown instead of raw JSON
- Added DELETE /api/workbench/runs endpoint + trash button in Runs panel
- Runs panel: min-height 500px so content is visible on load

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
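The shared fence-stripping parser above can be sketched in Python for illustration; the real helper is the JavaScript `parseRunOutput` in outputUtils.js.

```python
# Python sketch of the parseRunOutput logic: strip markdown code fences,
# try JSON, else wrap the raw text as {"message": ...}.
import json
import re

def parse_run_output(raw: str) -> dict:
    text = raw.strip()
    fence = re.match(r"^```(?:json)?\s*\n(.*)\n```$", text, re.DOTALL)
    if fence:
        text = fence.group(1).strip()
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    return {"message": raw}   # non-JSON output falls back to raw markdown
```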

* feat: add SSE activity monitor, settings page, agent templates & run history

- Agent Activity page with real-time SSE event stream (tool calls, LLM
  thinking, run lifecycle), filterable by run_id via URL query param
- EventBus pub/sub + StreamingCallbackHandler wired into ReAct engine
- Settings page: drag-and-drop tab reorder, hide/show toggles, icon
  picker (57 FluentUI icons), persisted to localStorage
- Agent templates dropdown (KBA from tickets, worklog stats, next step
  advisor) pre-fills the create agent form
- AgentRunPage now shows filtered run history with detail dialog and
  link to Activity page filtered by run_id
- 19 new Playwright E2E tests (8 activity + 11 settings)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
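The EventBus pub/sub above can be sketched minimally. This is an assumption-level sketch; the real implementation also feeds the SSE stream consumed by the Activity page.

```python
# Minimal pub/sub sketch of the EventBus pattern (illustrative).
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # e.g. topic="tool_call", event={"run_id": ..., "tool": ...}
        for handler in self._subscribers[topic]:
            handler(event)
```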

* feat: add Support Workflow canvas page with interactive editor

Purely browser-side workflow visualization using HTML Canvas:
- 5 node types: Start, End, Action, Decision, Wait (each with
  distinct shape and color)
- Drag-and-drop to reposition nodes
- Shift+drag to create connections between nodes
- Double-click to rename nodes inline
- Animate button shows flowing dots along edges
- Toolbar to add/delete nodes, reset to default workflow
- Default workflow: Ticket Created → Auto-Classify → Priority
  decision → L1/L2 paths → Resolved decision → Close/Reopen
- 9 Playwright E2E tests with screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: metro-map workflow with presets, color picker, agent assignment

Rewrite WorkflowPage in metro-map style, inspired by the Incident &
Problem Solving methodology:
- 3 workflow presets: Incident Solving, Problem Solving, Change Mgmt
- Metro station circle nodes with thick colored edge lines
- Edge color inherited from outgoing node
- Click node → dialog with color picker (8 colors) and agent selector
  (10 agent presets)
- Agent indicator dot on nodes with assigned agents
- Color legend auto-generated from used colors
- 12 Playwright E2E tests covering presets, node config, animation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: friendlier workflow editor — connect mode, double-click add, dialog edges

- Connect Mode toggle button: click source node then target to draw edge
  (no shift key needed). Crosshair cursor + green '+' hint on target.
- Double-click empty canvas area to add a node at that position
- Node dialog now has 'Connect to…' section with buttons for each
  unconnected node — draw edges without touching the canvas
- Add Node button opens config dialog immediately for the new node
- Dynamic help text updates based on current mode
- Escape key exits connect mode
- Updated Playwright tests for new UX

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add 'Improve my Prompt' button to Agent Fabric

LLM-powered prompt improvement following 2025 best practices:
- Backend: /api/workbench/improve-prompt endpoint + service method
  that rewrites prompts with clear role, goals, numbered steps,
  tool references, output format, and constraints
- Frontend: '✨ Improve my Prompt' button below the system prompt
  textarea, disabled when empty, replaces prompt with improved version
- 4 Playwright E2E tests with before/after screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: prompt improvement skips output format (handled by schema)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve-prompt uses selected tools, not all available

Pass tool_names from frontend form state so the LLM only references
tools the user actually selected for this agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove maxHeight on tools list to avoid scrolling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace worklog template with Topic & Product Analysis

Worklog columns in data.csv are all empty/zero. New template analyzes
topics, products, services, priority distribution, and group workload
using data that actually exists in the CSV.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: enhance CSV ticket handling and update LLM backend initialization (#25)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Initial plan

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Andre Bossard <abossard@users.noreply.github.com>
Co-authored-by: luca Spring <luca.spring@bit.admin.ch>
Co-authored-by: SubSonic731 <alessandro.roschi@bit.admin.ch>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
abossard added a commit that referenced this pull request Mar 25, 2026
* implemented kba-draft

* - removed test files
- cleaned up structure
- updated README.md
- created learning_mechanism.md plan
- design fixes

* feat: add search questions generation with database migration and UI

Database & Backend:
- Add search_questions column migration in operations.py (ALTER TABLE for existing databases)
- Add /api/kba/drafts/{id}/replace endpoint in app.py
- Fix backward compatibility in kba_service.py (_table_to_draft, _draft_to_table)
- Add search questions generation to replace_draft workflow
- Fix NULL constraint errors by ensuring empty strings for required fields
- Update related_tickets validation: accept INC + 9-12 digits (was fixed at 12)

Frontend:
- Add Text component import to KBADrafterPage.jsx (fix TypeError)
- Add full-screen blur overlay with centered spinner during KBA generation
- Show overlay for both new draft creation and replacement operations
- Update styles: loadingOverlay with backdrop-filter blur effect

Documentation:
- Update kba_prompts.py: clarify related_tickets format with examples
- Update GENERAL.md: correct related_tickets format specification

Fixes #1 - KBA drafts not loading (missing DB column)
Fixes #2 - Replace endpoint not found (405 error)
Fixes #3 - Ticket ID validation too strict
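The additive-column migration above can be sketched with sqlite3 directly; the project wires this pattern through operations.py, so the helper name here is illustrative.

```python
# Sketch of the idempotent ALTER TABLE migration pattern described above.
import sqlite3

def ensure_column(conn, table: str, column: str, ddl: str) -> None:
    """Add a column to an existing table only if it is missing."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kba_draft (id INTEGER PRIMARY KEY, title TEXT)")
# DEFAULT '' avoids NULL constraint errors on pre-existing rows
ensure_column(conn, "kba_draft", "search_questions", "TEXT NOT NULL DEFAULT ''")
ensure_column(conn, "kba_draft", "search_questions", "TEXT NOT NULL DEFAULT ''")  # no-op
```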

* view tickets in a popup

* feat(kba-drafter): add ability to reset reviewed KBAs back to draft

- Add "Zurück zu Entwurf" button for reviewed status KBAs
- Add handleUnreview() handler to update status from "reviewed" to "draft"
- Import ArrowUndo24Regular icon for the unreview action
- Allow users to continue editing KBAs after review without deletion

This enables editing of reviewed KBAs that need changes before publishing.

* feat(kba-drafter): add ticket viewer, unreview, status filter, and UI improvements

- Add ticket viewer dialog to display original incident details
  * New "Ticket" button in KBA header with DocumentSearch icon
  * Modal dialog showing incident data (ID, summary, status, priority, assignee, notes, resolution)
  * Backend endpoint /api/csv-tickets/by-incident/<incident_id> for incident ID lookup
  * Frontend API function getCSVTicketByIncident()

- Add unreview functionality for reviewed KBAs
  * "Zurück zu Entwurf" button with ArrowUndo icon
  * Allows resetting reviewed KBAs back to draft status for further editing

- Redesign KBA overview list
  * Replace corner delete button with professional overflow menu (⋮)
  * Horizontal layout: content left, status badge right-aligned, menu button
  * Menu component with delete option

- Add status filter dropdown to KBA overview
  * Filter options: All, draft, reviewed, published
  * Dropdown in card header for easy filtering

- Align EditableList "Add" button width with input fields
  * Use invisible placeholder buttons for exact width matching
  * Ensures consistent layout regardless of allowReorder setting

Files modified:
- frontend/src/features/kba-drafter/KBADrafterPage.jsx
- frontend/src/features/kba-drafter/components/EditableList.jsx
- frontend/src/services/api.js
- backend/app.py
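The by-incident lookup with optional field selection can be sketched as a plain function; the ticket data and helper name here are illustrative stand-ins for the endpoint handler in backend/app.py.

```python
# Sketch of the /api/csv-tickets/by-incident/<incident_id> lookup logic.
TICKETS = [
    {"incident_id": "INC-000000101", "summary": "VPN down",
     "status": "open", "notes": "long text ..."},
]

def get_ticket_by_incident(incident_id: str, fields=None):
    """Find one ticket by incident ID; optionally project to a field subset."""
    for ticket in TICKETS:
        if ticket["incident_id"] == incident_id:
            if fields:
                return {k: ticket[k] for k in fields if k in ticket}
            return ticket
    return None
```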

* fix(kba): fix draft deletion bug and add collapsible AutoGenSettings

- Fix delete draft error: use response.items instead of response.drafts
- Make AutoGenSettings card collapsible with chevron icon
  - Starts collapsed to reduce visual dominance
  - Smooth slide-down animation when expanded
  - Status badge visible in collapsed header
  - Clickable header with keyboard support (Enter key)

* fix(kba): auto-scroll to top when opening draft

When clicking on a draft from the list after scrolling down,
the page now automatically scrolls to the top with a smooth animation.
This ensures users always start at the beginning of the draft content.

* feat: replace browser confirms with custom modal dialogs for unsaved changes

Replace native window.confirm() with ConfirmDialog component for better UX
consistency and modern appearance. Adds centered warning modal when user
attempts to discard unsaved changes (close draft, switch to preview, or
load different draft).

Changes:
- Add unsavedChangesDialogOpen and pendingAction states
- Update toggleEditMode, loadDraft, and handleClose to trigger modal
- Add handleDiscardChanges and handleCancelDiscard handlers
- Add ConfirmDialog with warning intent at end of component

* fix: address code review issues and add KBA drafter e2e tests

Fixes:
- Fix CSV folder case mismatch (CSV -> csv) in app.py and operations.py
- Remove duplicate get_ticket_by_incident_id method in csv_data.py
- Replace inefficient len(session.exec().all()) with SQL COUNT(*) in kba_service.py
- Replace hardcoded placeholder credentials with env var lookups in kba_service.py
- Fix scheduler swallowing exceptions (remove bare raise, return None)
- Add settings reload at start of each scheduler run to fix race condition
- Add generation_warnings field to surface search questions failures to users
- Add schema migration for generation_warnings column

Tests:
- Add 19 Playwright e2e tests for KBA Drafter feature covering:
  page load, navigation, LLM health status, draft generation,
  draft display, draft list, editing, review workflow,
  duplicate handling, and backend API integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
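The COUNT(*) fix above can be sketched with sqlite3 for illustration (the repo uses SQLModel sessions, but the principle is the same: let SQL count instead of materializing rows).

```python
# Sketch of the COUNT(*) fix described above (sqlite3 for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kba_draft (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO kba_draft (status) VALUES (?)",
                 [("draft",)] * 3 + [("reviewed",)] * 2)

# Before (inefficient): fetch every row, count in Python
count_slow = len(conn.execute("SELECT * FROM kba_draft").fetchall())

# After: the database counts — no rows transferred
count_fast = conn.execute("SELECT COUNT(*) FROM kba_draft").fetchone()[0]
```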

* feat: add LiteLLM fallback, Playwright tests, and remove OpenAI hard dependency

- LiteLLM is now the default LLM backend (no .env or API key needed)
- Multistage model fallback chain: claude-sonnet-4 → gpt-4o → gpt-4o-mini
- OpenAI SDK still used when OPENAI_API_KEY is explicitly set
- agents.py and workbench service use ChatLiteLLM when no OpenAI key
- Added csv_ticket_stats and csv_sla_breach_tickets to agent tools
- Added KBA Drafter to Playwright nav tests and menu screenshots
- Added e2e tests: publish, delete, status filter, ticket viewer
- 32 unit tests + 5 live integration tests for LLM service
- Updated .env.example with LiteLLM-first documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
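The multistage fallback chain above can be sketched as a simple loop. The model names match the commit; `invoke` is a stand-in for the actual LiteLLM call, so this is a sketch of the control flow, not the real client code.

```python
# Sketch of the model fallback chain (invoke is an injected stand-in).
FALLBACK_CHAIN = ["claude-sonnet-4", "gpt-4o", "gpt-4o-mini"]

def complete_with_fallback(prompt: str, invoke, chain=FALLBACK_CHAIN):
    """Try each model in order; return (model, response) from the first success."""
    last_error = None
    for model in chain:
        try:
            return model, invoke(model, prompt)
        except Exception as exc:          # a real impl would narrow this
            last_error = exc
    raise RuntimeError("all models in fallback chain failed") from last_error
```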

---------

Co-authored-by: SubSonic731 <alessandro.roschi@bit.admin.ch>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
abossard added a commit that referenced this pull request Mar 25, 2026
* Add comprehensive Ubuntu installation guide for 22.04 and 24.04 LTS

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>

* Fix footer to say "Target Platforms" instead of falsely claiming "Tested On"

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>

* Simplify guide: Ubuntu 22.04 only, one method per tool, Python 3.13, Node 20 LTS

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>

* Update setup for Python virtual environment and improve documentation

- Change virtual environment creation to use `.venv` at the repo root
- Update activation commands in various documentation files
- Modify setup and start scripts to reflect new virtual environment structure
- Ensure consistency across installation guides and troubleshooting documentation

* Add Chromium installation instructions and verification step to Ubuntu guide

* Add launch configuration for Python Quart backend and frontend development

* Update .gitignore to include vscode-chromium-profile and exclude launch.json

* Add VSCode extensions recommendations for Python and JavaScript development

* Update LEARNING.md

* feat: Integrate Ollama LLM for AI chat functionality (#3)

- Added `httpx` dependency for async HTTP requests to Ollama API.
- Implemented OllamaChat component in frontend for user interaction with the LLM.
- Created backend service for handling chat requests and model listing.
- Updated setup scripts to check for Ollama installation and pull required models.
- Added API endpoints for chat and model listing in the backend.
- Implemented end-to-end tests for Ollama integration, covering model listing and chat functionality.
- Enhanced error handling and user feedback in the chat interface.

* Andre prepare day 4 (#4)

* feat: Implement MCP JSON-RPC 2.0 handler and refactor API decorators

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: Update API decorators for optional HTTP path and clean up imports
docs: Enhance LEVEL_UP.md with Copilot chat testing instructions

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Migrate task management to SQLModel ORM and update related documentation

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Implement LangGraph agent with Azure OpenAI integration and extend API decorators for tool conversion

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor AgentService to use OpenAI SDK and enhance tool integration

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor AgentService to integrate LangGraph and replace OpenAI SDK usage

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: Correct docstring formatting in tool_wrapper for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Update documentation structure for Day 4 lessons and announcements

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: remove agent service initialization and related endpoints (#7)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* OpenAI and ticket feed (#9)

* refactor: update Azure OpenAI configuration and streamline environment variable usage

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: integrate FastMCP client for external tool support and add tests

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement Ticket MCP integration with FastMCP client and add REST endpoints for ticket management

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Update for support (#10)

* feat: Add QA tickets management with new TicketList component and API integration

* feat: Add initial diagram for project planning in explain.drawio

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add RULES.md to document project guidelines

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add ticket models and reminder functionality for "Assigned without Assignee"

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Rearrange imports and enhance startup logging for REST API and MCP JSON-RPC

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding mapping functions and updating QA tickets endpoint

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add TicketsWithoutAnAssignee component to display unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Clean up code formatting and improve ticket handling in various components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor Ollama integration to use Azure OpenAI agent; remove OllamaChat component and related API calls, add AgentChat component for task management; update frontend routing and backend operations accordingly.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance AgentService with detailed logging for MCP tool calls and agent execution

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: update architecture documentation and improve environment variable handling in agents.py (#11)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Work on csv (#12)

* feat: Implement CSV Ticket Viewer

- Refactor App component to replace existing features with CSV Ticket Table.
- Add CSVTicketTable component for displaying tickets from CSV data source.
- Introduce API functions for fetching CSV ticket fields, tickets, and statistics.
- Create CSV data source in backend to handle loading and processing of CSV files.
- Enhance AgentChat component to display error details from API responses.
- Update styles and layout for improved user experience in ticket viewing.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Update import formatting and enhance status badge display in CSVTicketTable component

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add Nivo chart visualizations for CSV tickets and enhance documentation

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Ai baby workbench (#13)

* refactor: update configuration from Azure OpenAI to OpenAI and enhance agent service initialization

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: update AgentChat component for OpenAI integration and enhance markdown support

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Implement Usecase Demo Agent orchestration and UI components (#14)

- Added backend orchestration for usecase demo agent runs in `usecase_demo.py`.
- Created documentation for CSV ticket guidance in `CSV_AI_GUIDANCE.md`.
- Developed frontend components for usecase demo description and page in `UsecaseDemoDescription.jsx` and `UsecaseDemoPage.jsx`.
- Introduced demo definitions for usecase demos in `demoDefinitions.js`.
- Implemented result views for structured table and markdown in `resultViews.jsx`.
- Added utility functions for handling usecase demo runs in `usecaseDemoUtils.js`.
- Included a network diagram in `net.drawio`.

* Optimize agent runtime and SLA demo flow (#15)

* feat: Enhance SLA Breach Risk functionality and UI integration
- Increased max_length for agent prompt to 5000
- Added fields parameter to list and search tickets for selective data retrieval
- Updated timeout for usecase demo agent to 300 seconds
- Introduced SLA Breach Risk demo with detailed prompt and ticket analysis
- Added E2E tests for SLA Breach Risk demo page

* feat: add incident_id field to ticket model and related components

- Added incident_id to the ticket mapping in app.py.
- Updated csv_data.py to include incident_id when converting CSV rows to tickets.
- Modified operations.py to define incident_id as a CSV ticket field.
- Enhanced the Ticket model in tickets.py to include incident_id.
- Updated usecase_demo.py to accommodate changes in ticket structure.
- Modified CSVTicketTable.jsx to display incident_id in the ticket table.
- Updated TicketList.jsx to filter and display incident_id in the ticket list.
- Enhanced TicketsWithoutAnAssignee.jsx to include incident_id in ticket operations.
- Updated UsecaseDemoPage.jsx to pass matchingTickets to the render function.
- Enhanced demoDefinitions.js to improve prompts for use case demos.
- Added SLA Breach Overview result view in resultViews.jsx to visualize SLA status of tickets.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: clean up import statements across multiple components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: standardize import statement formatting in resultViews.jsx

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: add SLA breach reporting functionality and related API endpoints

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement SLA breach report retrieval for unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: update API proxy target from localhost to 127.0.0.1 in vite.config.js (#16)

Co-authored-by: luca Spring <luca.spring@bit.admin.ch>

* Agent fabric (#17)

* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Agent fabric (#18)

* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding incident ID support and improve UI components for better user experience

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add tool invocation logging with latency tracking in WorkbenchService

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Kba draft review fixes (#20)

* implement kba-draft

* - removed test files
- cleaned up structure
- updated README.md
- created learning_mechanism.md plan
- design fixes

* feat: add search questions generation with database migration and UI

Database & Backend:
- Add search_questions column migration in operations.py (ALTER TABLE for existing databases)
- Add /api/kba/drafts/{id}/replace endpoint in app.py
- Fix backward compatibility in kba_service.py (_table_to_draft, _draft_to_table)
- Add search questions generation to replace_draft workflow
- Fix NULL constraint errors by ensuring empty strings for required fields
- Update related_tickets validation: accept INC + 9-12 digits (was fixed at 12)

Frontend:
- Add Text component import to KBADrafterPage.jsx (fix TypeError)
- Add full-screen blur overlay with centered spinner during KBA generation
- Show overlay for both new draft creation and replacement operations
- Update styles: loadingOverlay with backdrop-filter blur effect

Documentation:
- Update kba_prompts.py: clarify related_tickets format with examples
- Update GENERAL.md: correct related_tickets format specification

Fixes #1 - KBA drafts not loading (missing DB column)
Fixes #2 - Replace endpoint not found (405 error)
Fixes #3 - Ticket ID validation too strict
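
The relaxed related_tickets validation described above can be sketched as a regex check (a minimal illustration; the actual validation lives in the KBA service, and the function name here is hypothetical):

```python
import re

# Ticket IDs are "INC" followed by 9-12 digits (previously fixed at 12).
TICKET_ID_PATTERN = re.compile(r"INC\d{9,12}")

def is_valid_ticket_id(ticket_id: str) -> bool:
    """Return True if the ID matches the INC + 9-12 digit format."""
    return TICKET_ID_PATTERN.fullmatch(ticket_id) is not None
```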

* view tickets in a popup

* feat(kba-drafter): add ability to reset reviewed KBAs back to draft

- Add "Zurück zu Entwurf" button for reviewed status KBAs
- Add handleUnreview() handler to update status from "reviewed" to "draft"
- Import ArrowUndo24Regular icon for the unreview action
- Allow users to continue editing KBAs after review without deletion

This enables editing of reviewed KBAs that need changes before publishing.

* feat(kba-drafter): add ticket viewer, unreview, status filter, and UI improvements

- Add ticket viewer dialog to display original incident details
  * New "Ticket" button in KBA header with DocumentSearch icon
  * Modal dialog showing incident data (ID, summary, status, priority, assignee, notes, resolution)
  * Backend endpoint /api/csv-tickets/by-incident/<incident_id> for incident ID lookup
  * Frontend API function getCSVTicketByIncident()

- Add unreview functionality for reviewed KBAs
  * "Zurück zu Entwurf" button with ArrowUndo icon
  * Allows resetting reviewed KBAs back to draft status for further editing

- Redesign KBA overview list
  * Replace corner delete button with professional overflow menu (⋮)
  * Horizontal layout: content left, status badge right-aligned, menu button
  * Menu component with delete option

- Add status filter dropdown to KBA overview
  * Filter options: All, draft, reviewed, published
  * Dropdown in card header for easy filtering

- Align EditableList "Add" button width with input fields
  * Use invisible placeholder buttons for exact width matching
  * Ensures consistent layout regardless of allowReorder setting

Files modified:
- frontend/src/features/kba-drafter/KBADrafterPage.jsx
- frontend/src/features/kba-drafter/components/EditableList.jsx
- frontend/src/services/api.js
- backend/app.py

* fix(kba): fix draft deletion bug and add collapsible AutoGenSettings

- Fix delete draft error: use response.items instead of response.drafts
- Make AutoGenSettings card collapsible with chevron icon
  - Starts collapsed to reduce visual dominance
  - Smooth slide-down animation when expanded
  - Status badge visible in collapsed header
  - Clickable header with keyboard support (Enter key)

* fix(kba): auto-scroll to top when opening draft

When clicking on a draft from the list after scrolling down,
the page now automatically scrolls to the top with a smooth animation.
This ensures users always start at the beginning of the draft content.

* feat: replace browser confirms with custom modal dialogs for unsaved changes

Replace native window.confirm() with ConfirmDialog component for better UX
consistency and modern appearance. Adds centered warning modal when user
attempts to discard unsaved changes (close draft, switch to preview, or
load different draft).

Changes:
- Add unsavedChangesDialogOpen and pendingAction states
- Update toggleEditMode, loadDraft, and handleClose to trigger modal
- Add handleDiscardChanges and handleCancelDiscard handlers
- Add ConfirmDialog with warning intent at end of component

* fix: address code review issues and add KBA drafter e2e tests

Fixes:
- Fix CSV folder case mismatch (CSV -> csv) in app.py and operations.py
- Remove duplicate get_ticket_by_incident_id method in csv_data.py
- Replace inefficient len(session.exec().all()) with SQL COUNT(*) in kba_service.py
- Replace hardcoded placeholder credentials with env var lookups in kba_service.py
- Fix scheduler swallowing exceptions (remove bare raise, return None)
- Add settings reload at start of each scheduler run to fix race condition
- Add generation_warnings field to surface search questions failures to users
- Add schema migration for generation_warnings column

Tests:
- Add 19 Playwright e2e tests for KBA Drafter feature covering:
  page load, navigation, LLM health status, draft generation,
  draft display, draft list, editing, review workflow,
  duplicate handling, and backend API integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
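
The COUNT(*) fix above can be illustrated with plain sqlite3 (the real code uses SQLModel sessions in kba_service.py; table name here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kba_draft (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO kba_draft (title) VALUES (?)", [("a",), ("b",)])

# Before: fetch every row just to count them in Python.
slow_count = len(conn.execute("SELECT * FROM kba_draft").fetchall())

# After: let the database compute COUNT(*) and return a single value.
fast_count = conn.execute("SELECT COUNT(*) FROM kba_draft").fetchone()[0]

assert slow_count == fast_count == 2
```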

* feat: add LiteLLM fallback, Playwright tests, and remove OpenAI hard dependency

- LiteLLM is now the default LLM backend (no .env or API key needed)
- Multistage model fallback chain: claude-sonnet-4 → gpt-4o → gpt-4o-mini
- OpenAI SDK still used when OPENAI_API_KEY is explicitly set
- agents.py and workbench service use ChatLiteLLM when no OpenAI key
- Added csv_ticket_stats and csv_sla_breach_tickets to agent tools
- Added KBA Drafter to Playwright nav tests and menu screenshots
- Added e2e tests: publish, delete, status filter, ticket viewer
- 32 unit tests + 5 live integration tests for LLM service
- Updated .env.example with LiteLLM-first documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
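
The multistage fallback chain above amounts to trying each model in order until one succeeds — a hedged sketch, not the actual LLM service code; the `call_model` hook and function name are illustrative:

```python
FALLBACK_CHAIN = ["claude-sonnet-4", "gpt-4o", "gpt-4o-mini"]

def complete_with_fallback(prompt, call_model, chain=FALLBACK_CHAIN):
    """Try each model in order; return the first successful response."""
    last_error = None
    for model in chain:
        try:
            return call_model(model=model, prompt=prompt)
        except Exception as exc:  # e.g. auth failure, rate limit, unavailable
            last_error = exc
    raise RuntimeError("all fallback models failed") from last_error
```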

---------

Co-authored-by: SubSonic731 <alessandro.roschi@bit.admin.ch>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Agent workbench v2 (#19)

* Extract agent builder into extensible module with tests

Create backend/agent_builder/ as a standalone, deeply layered module
following Grokking Simplicity (data/calculations/actions separation)
and A Philosophy of Software Design (deep modules).

Structure:
- models/: Pure data (Pydantic/SQLModel) - agent, run, evaluation, chat
- tools/: ToolRegistry, schema converter, MCP adapter
- engine/: Unified ReAct runner, callbacks, prompt builder
- evaluator.py: Success criteria evaluation (mostly calculations)
- persistence/: DB engine setup + repository pattern
- service.py: WorkbenchService (deep module facade)
- chat_service.py: ChatService using shared ReAct engine
- routes.py: Quart Blueprint replacing 200+ lines from app.py
- tests/: 107 tests (unit + integration + E2E)

Key improvements:
- Eliminated duplicate ReAct agent building (was in both agents.py
  and agent_workbench/service.py)
- DRY error handling in routes via Blueprint
- Repository pattern isolates DB from business logic
- Pure calculation modules (prompt_builder, schema_converter,
  evaluator) are independently testable
- Backward-compatible: agent_workbench/__init__.py shims to new module

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add per-agent LLM config: model, temperature, recursion_limit, max_tokens, output_instructions

Each AgentDefinition now stores configurable LLM parameters:
- model: override service default (e.g. gpt-4o vs gpt-4o-mini)
- temperature: 0.0-2.0 (deterministic to creative)
- recursion_limit: 1-100 max ReAct loop iterations
- max_tokens: cap response length (0 = unlimited)
- output_instructions: custom formatting (replaces default markdown)

Changes:
- models/agent.py: 5 new fields with validation (ge/le bounds)
- persistence/database.py: migrations for existing DBs
- engine/react_runner.py: build_llm accepts temperature+max_tokens
- engine/prompt_builder.py: append_output_instructions for custom formatting
- service.py: _resolve_llm_for_agent builds per-agent LLM when config differs
- routes.py: ui-config v2 exposes llm_config_fields and defaults
- 12 new tests (model validation, CRUD, E2E roundtrip via REST)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add output_schema for type-safe structured output, fix defaults

Changes:
- recursion_limit default: 10 → 3 (most agents finish in 1-3 tool calls)
- max_tokens default: 0 → 4096 (sensible cap instead of unlimited)
- New field: output_schema (JSON Schema stored as JSON in DB)

output_schema is config, not code. You define the expected response
shape as a JSON Schema:
  {"type":"object","properties":{"breaches":{"type":"array",...}}}

At runtime this does two things:
1. Injected into system prompt so the LLM knows the expected structure
2. Takes priority over output_instructions and default markdown

Priority chain for output formatting:
  output_schema (strict JSON) > output_instructions (free text) > default markdown

128 tests pass (9 new tests for schema handling).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
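
The priority chain above reduces to a small resolver — a hedged sketch assuming the three-way precedence described in this commit; names are illustrative, not the service's actual API:

```python
DEFAULT_MARKDOWN_INSTRUCTIONS = "Respond in GitHub-flavored Markdown."

def resolve_output_format(output_schema=None, output_instructions=None):
    """Pick the formatting directive appended to the system prompt."""
    if output_schema:
        return ("json", output_schema)        # strict JSON schema wins
    if output_instructions:
        return ("text", output_instructions)  # custom free text is next
    return ("markdown", DEFAULT_MARKDOWN_INSTRUCTIONS)
```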

* Add suggest-schema endpoint and UI button

New endpoint: POST /api/workbench/suggest-schema
Takes agent name, description, system_prompt and asks the LLM to
propose a JSON Schema for the agent's structured output.

Backend:
- service.py: suggest_schema() method - builds a prompt, calls LLM,
  parses JSON response (handles markdown fences), falls back to
  generic schema on parse failure
- routes.py: POST /api/workbench/suggest-schema route

Frontend:
- api.js: suggestOutputSchema() function
- WorkbenchPage.jsx: output schema textarea + Suggest Schema button
  in the create form. Schema is editable JSON, sent as output_schema
  on agent creation. Button disabled until name or prompt is filled.

129 tests pass (1 new E2E test for suggest-schema endpoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Wire output_schema to LangGraph response_format for SDK-level enforcement

When an agent has output_schema configured, it now does TWO things:

1. Prompt injection (existing) — schema is described in the system prompt
   so the LLM understands the expected structure
2. SDK enforcement (new) — schema is passed as response_format to
   create_react_agent(), which uses LangGraph's built-in structured
   output mechanism (provider-native or tool-based)

At runtime, structured_response from the LangGraph result takes
priority over raw message content. If the agent has no output_schema,
behavior is unchanged (markdown output from final message).

The output pipeline:
  output_schema defined → response_format=schema → structured_response → JSON
  no output_schema → final message content → markdown (default)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Always use structured_response with default schema

Every agent now always returns structured output via LangGraph's
response_format — no more untyped markdown strings.

Default schema (when no custom output_schema is set):
  {
    "message": "string (markdown)",
    "referenced_tickets": ["string"]
  }

This means:
- Plain agents → get {message: '...markdown...', referenced_tickets: [...]}
- Custom schema agents → get whatever schema they define
- Both enforced at SDK level via response_format, not just prompt

Changes:
- prompt_builder.py: DEFAULT_OUTPUT_SCHEMA, resolve_output_schema()
- service.py: always passes effective schema to create_react_agent
- routes.py: ui-config exposes default_output_schema for frontend
- Tests updated (132 pass)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add comprehensive docs with mermaid diagrams, clean up stale docs

New: docs/AGENT_BUILDER.md — full architecture documentation with:
- Architecture diagram (module layers + data flow)
- Sequence diagram (agent run lifecycle)
- Structured output pipeline flowchart
- ER diagram (DB schema)
- Data/Calculations/Actions separation diagram
- Deep modules table
- Extensibility flowchart
- API endpoint reference
- Testing commands

Updated:
- AGENTS_IMPLEMENTATION.md — replaced stale content with summary + pointer
- docs/AGENTS.md — replaced stale architecture with mermaid + pointer
- docs/PROJECT_STRUCTURE.md — added agent_builder/ to tree

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Docs overhaul + remove ~1800 lines of dead code/stale docs

Documentation:
- README.md: Complete rewrite with features table, screenshots, mermaid
  architecture diagram, agent builder section, correct tech stack
- PROJECT_STRUCTURE.md: Full rewrite matching actual codebase
- AGENTS.md: Fixed AgentService→WorkbenchService, updated examples
- LEARNING.md: Fixed broken link

Deleted stale docs:
- AGENTS_IMPLEMENTATION.md (was a 3-line redirect stub)
- docs/RULES.md (empty file)
- docs/SQLMODEL_MIGRATION.md (historical, migration complete)

Dead code removed from agents.py (~250 lines):
- MCP client stubs (_mcp_tool_to_langchain, _ensure_ticket_mcp_connection, close)
- Schema helpers only used by dead MCP code (_json_type_to_python, _schema_to_pydantic)
- OpenAI logging callback (duplicated in agent_builder/engine/callbacks.py)
- _build_state_graph learning example (dead code)
- Unused imports (get_langchain_tools, MCPClient, create_model)

Deleted old agent_workbench/ source files (~1030 lines):
- models.py, service.py, evaluator.py, tool_registry.py
- Only __init__.py shim remains for backward compatibility

132 backend tests + 15 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Playwright tests for suggest-schema and agent chat

New E2E tests in workbench.spec.js:
- 'creates agent with output schema via suggest button' — mocks
  /api/workbench/suggest-schema, clicks Suggest Schema, verifies
  schema populates textarea, creates agent, deletes it
- 'sends message and displays mocked response' (Agent Chat UI) —
  mocks /api/agents/run, types message, clicks send, verifies
  markdown heading and tool badge render

17 Playwright tests pass (was 15, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add VPN agent and failure handling Playwright tests

New Agent Fabric E2E tests:
- 'runs VPN troubleshooting agent and verifies structured output'
  Creates agent with VPN analysis prompt, runs it (mocked),
  verifies structured JSON output with ticket IDs (INC-101, INC-312),
  referenced_tickets field, and VPN content in rendered output
- 'handles agent run failure gracefully'
  Creates agent, runs it with mocked failure response,
  verifies UI doesn't crash and shows completion state

19 Playwright tests pass (was 17, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix structured output rendering in Agent Fabric UI

The output is now always structured JSON ({message, referenced_tickets}).
The UI now parses it and renders each part appropriately:

- message → rendered as GitHub-flavored Markdown (ReactMarkdown)
- referenced_tickets → rendered as monospace badges below the output
- Extra custom schema fields → rendered as formatted JSON in a pre block
- Button preview → shows message text, not raw JSON

Also handles non-JSON output gracefully (falls back to raw markdown).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add MCP App technical documentation

New: docs/MCP_APP.md — comprehensive guide on how this project
works as an MCP application:

- What an MCP App is (app that exposes business logic via MCP protocol)
- Architecture diagrams: consumers (Claude, Copilot, agents) → MCP endpoint
- Full protocol sequence diagram (initialize → tools/list → tools/call)
- The @operation decorator: single source of truth for REST + MCP + LangChain
- How to connect clients (Claude Desktop, Python, curl examples)
- 4-layer architecture diagram (business logic → operations → adapters → consumers)
- Extension roadmap: Resources, Prompts, SSE streaming
- Security considerations table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add SchemaRenderer + visual SchemaEditor with x-ui widget system

SchemaRenderer (frontend/src/features/workbench/SchemaRenderer.jsx):
- Generic component: takes {data, schema} and renders each property
  using x-ui widget annotations
- Widgets: markdown, table, badge-list, stat-card, bar-chart (Nivo),
  pie-chart (Nivo), json, hidden
- Auto-detection when no x-ui: string→markdown, integer→stat-card,
  array of objects→table, array of strings→badge-list, object→json
- Console debug logging, data-testid per field for E2E testing

SchemaEditor (frontend/src/features/workbench/SchemaEditor.jsx):
- Visual property list editor (no raw JSON editing needed)
- Add/remove properties, set name/type/description
- Widget picker dropdown with all available widgets
- Context-sensitive options (columns for table, label for stat-card,
  indexBy/keys for bar-chart)
- Syncs with suggest-schema: LLM suggestion populates visual editor
- Outputs valid JSON Schema with x-ui annotations

Backend:
- DEFAULT_OUTPUT_SCHEMA now has x-ui annotations (markdown + badge-list)
- suggest_schema prompt updated to suggest x-ui widgets per property

Wiring:
- WorkbenchPage uses SchemaRenderer for run output (replaces hardcoded)
- WorkbenchPage uses SchemaEditor for create form (replaces textarea)

20 Playwright tests pass (including new SchemaRenderer widget test).
132 backend tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Improve suggest-schema prompt with full data domain + widget docs

The suggest-schema LLM prompt now includes:
- Ticket data domain (all field names, types, enum values, example cities)
- Available tools with descriptions (csv_list_tickets, csv_search_tickets, etc.)
- Full widget documentation with use-cases and options for each:
  markdown, table (columns), badge-list, stat-card (label),
  bar-chart (indexBy, keys), pie-chart, json, hidden
- Explicit rules: always include message+referenced_tickets,
  match widget to data shape, use snake_case names

This gives the LLM enough context to suggest schemas that actually
match the ticket data (e.g. status distribution → pie-chart,
ticket list → table with incident_id/summary/status columns).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix latency issues: schema title bug + recursion_limit headroom

Investigation found 3 root causes for slow AI calls:

1. gpt-5-nano is a REASONING model — burns 192-832 reasoning tokens
   per LLM call (invisible chain-of-thought), taking 2-8s each.
   A simple 'say hello' costs 8.4s with 832 reasoning tokens.

2. response_format adds a 3rd LLM call — LangGraph's
   generate_structured_response makes a separate LLM call to format
   the output as JSON after the ReAct loop finishes.
   Without: 4.7s (2 calls). With: 13s (3 calls).

3. Missing 'title' in output_schema crashed with_structured_output.
   OpenAI's API requires a top-level 'title' in the JSON Schema.

Fixes applied:
- resolve_output_schema() now auto-adds 'title': 'AgentOutput'
  when missing (both default and custom schemas)
- DEFAULT_OUTPUT_SCHEMA has explicit 'title' field
- recursion_limit: user's setting (default 3) is now multiplied by 4
  for the actual LangGraph graph, with a floor of 10. This prevents
  GraphRecursionError when response_format adds extra graph steps.

Note: The main latency driver (reasoning tokens) is inherent to the
model choice. Users can switch to gpt-4o-mini via per-agent 'model'
field for ~10x faster non-reasoning responses.

133 backend + 20 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
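
The two mechanical fixes above can be sketched as follows (a hedged illustration; the real logic lives in resolve_output_schema() and the LangGraph wiring, and the function names here are illustrative):

```python
def ensure_schema_title(schema: dict) -> dict:
    """OpenAI's structured output requires a top-level 'title' in the
    JSON Schema; add a default when it is missing."""
    if "title" not in schema:
        return {**schema, "title": "AgentOutput"}
    return schema

def effective_recursion_limit(user_limit: int) -> int:
    """Multiply the user's limit by 4 with a floor of 10, leaving headroom
    for the extra graph steps that response_format introduces."""
    return max(user_limit * 4, 10)
```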

* Fix agent tool token bloat: compact fields + lower default limits

Root cause: csv_list_tickets tool returned full Ticket objects with ALL
fields (notes, description, resolution, work logs) — ~65K tokens for
100 tickets. The LLM had to process all of this, causing 30-60s per
step with a reasoning model.

Changes to operations.py:
- csv_list_tickets: returns compact dicts (10 fields, not 30+),
  default limit 25 (was 100), max limit 100 (was 500)
- csv_search_tickets: same compact treatment, limit 25 (was 50)
- csv_get_ticket: now accepts optional 'fields' parameter for
  selective detail drill-down, returns dict (was full Ticket)
- Tool descriptions updated to guide agents: 'use csv_get_ticket
  for full details' pattern

Token impact per tool call:
  Before: 100 tickets × ~650 tokens = ~65,000 tokens
  After:  25 tickets × ~60 tokens = ~1,500 tokens (97% reduction)

Expected latency improvement:
  Before: ~13s per tool call (65K token input processing)
  After:  ~3-5s per tool call (1.5K token input)

153 tests pass (133 backend + 20 Playwright).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Drop response_format to eliminate extra LLM call

LangGraph 1.0.8 implements response_format via a SEPARATE LLM call
(generate_structured_response) — adding 5-10s latency per run.
The refactor to inline tool-based structured output (github.com/
langchain-ai/langgraph/issues/5872) hasn't shipped yet.

Fix: remove response_format from create_react_agent. The system
prompt already instructs the LLM to produce JSON matching the
schema (via append_output_instructions). The frontend's
SchemaRenderer handles both parsed JSON and raw text gracefully.

Latency impact:
  Before: 3 LLM calls (decide tool + answer + format JSON) ~13s
  After:  2 LLM calls (decide tool + answer as JSON)       ~5s

When LangGraph ships inline structured output, we can re-enable
response_format with zero code changes (just pass it back to
build_react_agent).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enable OpenAI JSON mode for guaranteed valid JSON output

Adds response_format: {type: 'json_object'} to the ChatOpenAI
constructor via model_kwargs. This is a model-level setting that
constrains token generation to valid JSON — no extra LLM call,
no post-processing, just guaranteed JSON from every response.

This is different from LangGraph's response_format parameter
(which adds a separate LLM call). This is OpenAI's native JSON
mode applied at the API level during the same call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert JSON mode — incompatible with non-strict tool schemas

OpenAI's response_format: json_object requires all tools to have
strict schemas. Our tools (from @operation decorator) don't set
strict=True, causing: 'csv_search_tickets is not strict. Only
strict function tools can be auto-parsed'.

Reverting to prompt-only JSON enforcement, which tested at 3/3
reliability with gpt-5-nano. The frontend fallback (wraps non-JSON
as {message: raw_text}) provides additional safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
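
The frontend safety net mentioned above — wrapping non-JSON output as {message: raw_text} — can be sketched like this (the real code is in SchemaRenderer.jsx; shown here in Python for illustration):

```python
import json

def parse_agent_output(raw: str) -> dict:
    """Parse structured agent output; wrap anything that is not a JSON
    object as {"message": raw} so markdown still renders."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    return {"message": raw}
```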

* Add widget E2E tests + strict tools + Agent Chat JSON mode

New Playwright tests (23 total, +3):
- 'renders bar-chart and pie-chart from x-ui annotations' — injects
  mock agent with output_schema containing x-ui widgets, verifies
  SVG rendering for pie/bar charts, stat-card with label, badges
- 'renders raw JSON for object data' — verifies auto-detection:
  objects render as formatted JSON in pre blocks
- 'falls back gracefully for non-JSON output' — verifies plain
  markdown string wraps as {message: text} and renders correctly

Agent Chat (agents.py) fixes:
- Added JSON output mode (response_format: json_object)
- Added strict=True tool binding for compatibility
- Matches the same pattern as agent_builder

Strict tool binding (react_runner.py):
- build_react_agent pre-binds tools with strict=True
- Required for OpenAI JSON mode (response_format: json_object)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NameError: OpenAICallLoggingCallback was removed but still referenced

The class was deleted in the dead code cleanup but agents.py still
used it. Replaced with make_llm_logging_callback from agent_builder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add 'Show in Menu' — agents appear as tabs in navigation

When an agent has show_in_menu=true, it appears as a tab in the
main navigation bar. Clicking it opens a dedicated run page with
just the input field, run button, and SchemaRenderer output.

Backend:
- AgentDefinition: new show_in_menu bool field (default false)
- AgentDefinitionCreate/Update: show_in_menu parameter
- Migration for existing DBs
- Service wires it through create/update

Frontend:
- WorkbenchPage: 'Show in menu' checkbox in create form
- App.jsx: fetches agents with show_in_menu=true, injects as tabs
- AgentRunPage.jsx: simple standalone run page (title, description,
  optional input, run button, SchemaRenderer output)
- Dynamic routes: /agent-run/{agentId}

E2E test:
- Creates agent via API with show_in_menu=true
- Verifies tab appears in navigation with agent name
- Clicks tab, verifies AgentRunPage renders
- Runs agent (mocked), verifies output with SchemaRenderer

24 Playwright + 133 backend = 157 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add missing tools to chat agent: csv_sla_breach_tickets, csv_ticket_stats

The SLA Breach page was slow because the chat agent (agents.py)
didn't have the csv_sla_breach_tickets tool. The prompt said
'call csv_sla_breach_tickets' but the tool didn't exist, so the
LLM tried to replicate SLA breach logic manually using
csv_list_tickets — fetching many tickets and reasoning over them.

Now the chat agent has all 6 CSV tools matching the operations:
- csv_list_tickets (existing)
- csv_get_ticket (existing)
- csv_search_tickets (existing)
- csv_ticket_fields (existing)
- csv_sla_breach_tickets (NEW — pre-computed, ~1000 tokens)
- csv_ticket_stats (NEW — aggregated stats, ~350 bytes)

Expected improvement: 1 tool call (~1000 tokens) instead of
multiple list calls + manual reasoning (~30-60K tokens).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add ticket detail modal and enhance CSV ticket table functionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor CSVTicketTable component: reorder DialogActions import for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Add reasoning_effort config + new tools for major speed improvement

Performance:
- reasoning_effort='low' as default — reduces gpt-5-nano from
  512 reasoning tokens (~7s) to 0-192 tokens (~1-3s) per LLM call
- Configurable per agent: low (fast), medium, high (deep), default
- Both agent_builder and legacy chat agent use reasoning_effort='low'

New tools:
- csv_count_tickets: count matching tickets WITHOUT returning data.
  Lets the LLM check 'how many VPN tickets?' (~50 tokens) before
  deciding to fetch details (~5000 tokens)
- csv_search_tickets_with_details: search + return full details
  (notes, resolution, description) in ONE call. Eliminates the
  N × csv_get_ticket drill-down pattern that caused the
  'Ticket Knowledgebase Creator' to make 5+ sequential LLM calls

Impact on 'Ticket Knowledgebase Creator' agent:
  Before: search(compact) → get_ticket × N → generate = 5+ LLM calls × ~5s = 25s+
  After:  search_with_details(query, limit=10) → generate = 2 LLM calls × ~2s = 4s

Also fixed: removed stale response_format: json_object from build_llm
(was causing strict tool errors).
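
The count-before-fetch pattern behind the two new tools can be sketched like this (hypothetical data shape and helper names, not the actual tool implementations):

```python
def count_tickets(tickets: list[dict], query: str) -> int:
    """Cheap check: how many tickets match, without returning any rows."""
    q = query.lower()
    return sum(1 for t in tickets if q in t.get("title", "").lower())

def search_with_details(tickets: list[dict], query: str, limit: int = 10) -> list[dict]:
    """Search and return full records in one call, avoiding N follow-up fetches."""
    q = query.lower()
    return [t for t in tickets if q in t.get("title", "").lower()][:limit]
```

The LLM can call the cheap counter first, then decide whether a single detailed search is worth the tokens.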

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update incident details in FALL_2_HARDWARE_PERIPHERIE and FALL_3_ZUGRIFF_BERECHTIGUNG documentation for consistency and clarity

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Fix: all E2E tests now clean up created agents

Two tests were creating agents via the UI but not deleting them,
leaving orphans in the DB after each test run:
- 'runs an agent and appends output to run button'
- 'requires and forwards configured run input'

Added Delete button clicks at the end of both tests.
All 10 agent-creating tests now verified to clean up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite workbench e2e tests for tabbed UI

- Add helpers: goToCreateTab, goToAgentsTab, createAgent, createAgentViaAPI, deleteAgentViaAPI, mockEmptyRuns
- Update 'creates and deletes' to use Create Agent tab and agent cards
- Update 'runs an agent' to verify output in RunsSidePanel
- Update 'requires input' to use card inline input field + Go button
- Update 'suggest schema' to navigate to Create tab first
- Update 'failure handling' to check error in run detail panel
- Refactor SchemaRenderer tests to use setupSchemaTest helper (API-created agents, run output in side panel)
- Keep Agent Chat UI and Show in Menu tests unchanged

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: redesign workbench with agent cards, runs side panel, and tabbed layout

- Split WorkbenchPage into tabbed UI: Agents (cards grid) + Create Agent
- AgentCardsPanel: icon cards with Run/Edit/Delete buttons per agent
- RunsSidePanel: scrollable run history with click-to-view output
- AgentEditDialog: edit existing agents via dialog
- AgentCreateForm: extracted creation form (reusable for create + edit)
- Added API functions: updateWorkbenchAgent, listAllRuns, getRun
- All 47 Playwright tests pass (12 workbench tests updated for new UI)
- Removed Ollama references from setup.sh and package.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: LiteLLM fallback in agent_builder + add live lifecycle test

- Fixed agent_builder/engine/react_runner.py: ChatLiteLLM when no API key
- Fixed agent_builder/service.py: removed hard OpenAI key requirement
- Fixed agent_builder/chat_service.py: same
- Fixed RunsSidePanel output parsing for raw string output
- Added full lifecycle e2e test (live LLM): create → run → edit → re-run → verify history → delete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: suggest schema & tools, default no tools, pure function refactor

- 'Suggest Schema & Tools' button: LLM suggests output schema AND tool selection
- Backend: _build_suggest_prompt and _parse_suggest_response as pure functions
- Frontend: tools default to empty, populated by suggest response
- RunsSidePanel: pure calculations extracted (buildAgentMap, sortRunsNewestFirst,
  resolveOutputSchema, resolveAgentName, parseRunOutput, formatRelativeTime)
- All 49 Playwright tests pass (2 live LLM tests included)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: result dialog, chart rendering, markdown fence parsing

- Run results now open in a large Dialog (900px wide, 85vh max)
- Fixed parseRunOutput: strips markdown code fences from LLM output
- Fixed PieChartWidget: filters non-numeric values, formats labels
- Fixed BarChartWidget: accepts object {key: number} in addition to arrays
- Chart containers: 300px height, 600px max-width
- Tests: close dialog before cleanup (dialog blocks pointer events)
- All 49 Playwright tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: all-live Playwright tests, result dialog fix, runs panel fix

- Rewrote workbench tests: ZERO mocks, all 8 tests use live LLM
- Fixed RunsSidePanel: min-height for layout, runs visible on load
- Fixed parseRunOutput: strips markdown fences from LLM output
- Fixed chart widgets: pie/bar handle non-numeric values, proper sizing
- Fixed dialog close: tests use X button (in viewport) not Close (scrolled)
- Total: 43 tests, all passing, all live (1.1 min)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: extract shared parseRunOutput, add delete-all-runs

- Extracted parseRunOutput (fence-stripping + JSON parsing) into
  outputUtils.js — shared by RunsSidePanel and AgentRunPage
- Fixed AgentRunPage (show_in_menu): renders markdown instead of raw JSON
- Added DELETE /api/workbench/runs endpoint + trash button in Runs panel
- Runs panel: min-height 500px so content is visible on load
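
The fence-stripping + JSON parsing that parseRunOutput performs can be sketched in Python (the actual helper lives in outputUtils.js; regex and fallback behavior here are assumptions):

```python
import json
import re

# matches a whole output wrapped in one markdown code fence, e.g. ```json ... ```
_FENCE_RE = re.compile(r"^```[a-zA-Z0-9_-]*\n(.*?)\n```\s*$", re.DOTALL)

def parse_run_output(raw: str):
    """Strip a surrounding markdown fence, try JSON, fall back to the raw string."""
    text = raw.strip()
    m = _FENCE_RE.match(text)
    if m:
        text = m.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text
```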

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add SSE activity monitor, settings page, agent templates & run history

- Agent Activity page with real-time SSE event stream (tool calls, LLM
  thinking, run lifecycle), filterable by run_id via URL query param
- EventBus pub/sub + StreamingCallbackHandler wired into ReAct engine
- Settings page: drag-and-drop tab reorder, hide/show toggles, icon
  picker (57 FluentUI icons), persisted to localStorage
- Agent templates dropdown (KBA from tickets, worklog stats, next step
  advisor) pre-fills the create agent form
- AgentRunPage now shows filtered run history with detail dialog and
  link to Activity page filtered by run_id
- 19 new Playwright E2E tests (8 activity + 11 settings)
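
The EventBus pub/sub mentioned above is, at its core, a topic-to-handlers map. A minimal sketch (the real implementation also feeds the SSE stream; names here are illustrative):

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal in-process pub/sub, a sketch of the pattern described above."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # deliver to every handler registered for this topic
        for handler in self._subscribers[topic]:
            handler(event)
```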

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add Support Workflow canvas page with interactive editor

Purely browser-side workflow visualization using HTML Canvas:
- 5 node types: Start, End, Action, Decision, Wait (each with
  distinct shape and color)
- Drag-and-drop to reposition nodes
- Shift+drag to create connections between nodes
- Double-click to rename nodes inline
- Animate button shows flowing dots along edges
- Toolbar to add/delete nodes, reset to default workflow
- Default workflow: Ticket Created → Auto-Classify → Priority
  decision → L1/L2 paths → Resolved decision → Close/Reopen
- 9 Playwright E2E tests with screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: metro-map workflow with presets, color picker, agent assignment

Rewrite WorkflowPage as metro-map style inspired by Incident &
Problem Solving methodology:
- 3 workflow presets: Incident Solving, Problem Solving, Change Mgmt
- Metro station circle nodes with thick colored edge lines
- Edge color inherited from outgoing node
- Click node → dialog with color picker (8 colors) and agent selector
  (10 agent presets)
- Agent indicator dot on nodes with assigned agents
- Color legend auto-generated from used colors
- 12 Playwright E2E tests covering presets, node config, animation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: friendlier workflow editor — connect mode, double-click add, dialog edges

- Connect Mode toggle button: click source node then target to draw edge
  (no shift key needed). Crosshair cursor + green '+' hint on target.
- Double-click empty canvas area to add a node at that position
- Node dialog now has 'Connect to…' section with buttons for each
  unconnected node — draw edges without touching the canvas
- Add Node button opens config dialog immediately for the new node
- Dynamic help text updates based on current mode
- Escape key exits connect mode
- Updated Playwright tests for new UX

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add 'Improve my Prompt' button to Agent Fabric

LLM-powered prompt improvement following 2025 best practices:
- Backend: /api/workbench/improve-prompt endpoint + service method
  that rewrites prompts with clear role, goals, numbered steps,
  tool references, output format, and constraints
- Frontend: '✨ Improve my Prompt' button below the system prompt
  textarea, disabled when empty, replaces prompt with improved version
- 4 Playwright E2E tests with before/after screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: prompt improvement skips output format (handled by schema)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve-prompt uses selected tools, not all available

Pass tool_names from frontend form state so the LLM only references
tools the user actually selected for this agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove maxHeight on tools list to avoid scrolling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace worklog template with Topic & Product Analysis

Worklog columns in data.csv are all empty/zero. New template analyzes
topics, products, services, priority distribution, and group workload
using data that actually exists in the CSV.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: enhance CSV ticket handling and update LLM backend initialization (#25)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Implement code changes to enhance functionality and improve performance

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: enhance UI components and improve test coverage for agent functionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Add model selection to agent workbench

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add DSPy Prompt Tuning Playground — 8 notebooks, 20 tasks, 165 tests

Interactive Jupyter notebook series teaching prompt optimization with DSPy.
Organized by learning concepts from Grokking Simplicity and A Philosophy
of Software Design.

Structure:
- 8 notebooks (00-07): Introduction → Data/Calc/Actions → Deep Modules →
  Evaluation as Spec → Optimizer as Compiler → Domain Tuning → Agentic → Finale
- 20 tasks across 4 tiers: Basics, Reasoning, Composition, Agentic
- dspy_tasks/ library: data.py (DATA), calculations.py (CALCULATIONS),
  actions.py (ACTIONS), tools.py, visualize.py (ipywidgets + Plotly)
- 16 JSON datasets (13 generic + 3 CSV-derived from ticket data)
- 165 passing pytest tests covering signatures, metrics, and registry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Integrate DSPy notebooks with project LiteLLM/Copilot config

Add config.py that reads .env and dynamically discovers models via
litellm.get_valid_models() — same env vars as the backend (LITELLM_MODEL,
LITELLM_FALLBACK_MODELS). Replace all hardcoded model lists in 8 notebooks
with get_available_models(). Replace raw dspy.LM() calls with
configure_dspy(). 192 tests passing (27 new config tests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add DSPy Playground section to README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove unused screenshot files from the repository

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Add cross-platform start.sh for notebook playground

Auto-creates venv, installs/updates deps, launches Jupyter Lab.
Works on macOS (zsh/bash) and Ubuntu (bash).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace pure-function tests with E2E tests hitting live LiteLLM

Remove test_calculations.py, test_data.py, test_config.py (pure function tests).
Add test_e2e.py: 11 tests covering config discovery, tier 1-3 predictions,
baseline scoring, BootstrapFewShot optimization, and cross-model comparison
— all running against real Copilot models via LiteLLM.

Also fix configure_dspy() to inject Editor-Version headers for Copilot models.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Translate notebooks to informal German, add Mermaid diagrams + quiz

All 8 notebooks: markdown translated to informal German (du form).
Code cells unchanged. Coherent story arc with bridges between notebooks.

- 5 Mermaid diagrams (learning path, DATA/CALC/ACTIONS, module depth,
  optimizer pipeline, full architecture)
- Interactive quiz in Notebook 07 (7 MC questions via ipywidgets)
- mermaid() and quiz() helpers added to visualize.py
- 11 E2E tests still passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tone down DSPy, add HTML/CSS diagrams, collapse setup cells

- DSPy mentions in markdown: 29 → 4 (only where referencing code)
- Focus on universal concepts: evaluation, tuning, optimization
- mermaid() (CDN) → diagram()/diagram_compare() (pure HTML/CSS, zero deps)
- Setup code cells collapsed via jupyter.source_hidden metadata
- Hands-on narrative: baseline → improve → see difference
- 11 E2E tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add hands-on learning flow: LLM failures, manual tuning, benchmarks

New library functions:
- run_with_prompt(): evaluate custom user instructions
- run_on_examples(): evaluate on benchmark datasets
- prompt_workshop(): prefilled Textarea + Run + score history widget
- benchmarks.py: load_hotpotqa(), load_math(), load_truthfulqa()
- datasets/truthfulqa_sample.json: 30 validated hallucination questions

Notebook enhancements (user edits text only, never code):
- NB01: Tricky sarcasm examples showing LLM failures + TruthfulQA
  benchmark + 'Was ist Accuracy?' explanation
- NB03: Interactive prompt_workshop() for manual tuning with score
  history tracking + TruthfulQA editing exercise
- NB04: 'Erst du, dann die Maschine' — manual attempt before
  auto-optimization with side-by-side comparison

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix diagram_compare() call in NB05 (left/right, not before/after)

All 8 notebooks verified: execute headless without errors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restructure: Grokking Simplicity + Deep Modules to appendix, print→display

Main flow streamlined to 6 notebooks:
00 Intro → 01 Evaluation → 02 Optimization → 03 Domain → 04 Agents → 05 Finale

Grokking Simplicity and Deep Modules moved to optional appendices.
Replaced print() with styled display(HTML()) in standalone cells.
Removed '20 Aufgaben' listing from intro. Updated all bridges,
learning path diagram, and README.

All 8 notebooks execute headless without errors. 11 E2E tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix HTML string escaping in NB01 metrics cell

Use triple-quoted f-string and HTML entities (&lsquo;) instead of
raw single quotes inside single-quoted strings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace display(HTML()) with native print/display in all notebooks

No more inline HTML in code cells. Use print() with emoji formatting,
pandas DataFrames, and display(Markdown()) for the finale only.
15 cells rewritten across 7 notebooks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix token_f1 examples to show precision vs recall tradeoff

Old example had precision=recall=0.5 which doesn't teach anything.
New examples show: cautious (high P, low R), overeager (low P, high R),
perfect, and completely wrong.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add clickable links to next notebook at bottom of each

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Merge 00+01 into single notebook, streamline to 5+2 structure

00_introduction + 01_evaluation merged into 01_evaluation_and_tuning.
Flow: Setup → first call → LLM failures → accuracy → manual tuning
all in one notebook. Normalized cell IDs. Updated README + links.

Structure: 01-05 main path + 2 appendices.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace 'Deine erste Vorhersage' with myth-busting quiz

Instead of DSPy-specific Predict demo, start with factual questions
where the model gets things WRONG (Australian capital, glass myth,
goldfish memory). Shows immediately: LLMs sound confident but aren't
always correct. Motivates why evaluation matters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add 5 evaluation strategies with live demos

After showing that LLMs return unpredictable strings, teach 5 solutions:
1. Constrain answers (number/yes/no) — live demo
2. Multiple choice
3. Keyword matching
4. Semantic similarity (Token F1)
5. LLM-as-Judge — live demo with a judge LLM

Includes tradeoff comparison table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Single model config: remove model arg from all action functions

Model is configured ONCE via configure_dspy() at notebook startup.
Action functions (run_baseline, run_optimization, etc.) use the
already-configured LM. No more threading issues from widget callbacks.

Fixed broken prompt_workshop() calls (missing commas from regex cleanup).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add markdown explanations before every code cell, collapse setup

Every visible code cell now has a German markdown header explaining
what the user will see and why it matters. Setup cells collapsed.
Fixed compare_models import (removed from simplified actions.py).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nb01: swap hidden setup cell before markdown to ensure visible code has header

Reorder cells [22] and [23] so the hidden setup code comes before
the 'Klassische Tests vs KI-Metriken' markdown, which then directly
precedes the visible diagram_compare code cell.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restructure NB01: numbered chapters, code gen moved to end with exec

Clean chapter flow:
1. Wie gut ist das Modell? (quiz + failures)
2. Wie bewertet man LLM-Antworten? (5 strategies + live demos)
3. Metriken in der Praxis (F1, composite, tradeoffs)
4. Prompt-Tuning Workshop (interactive text box)
5. Kann das Modell Code schreiben? (generates + executes code)

Removed 'Teil 2' divider, removed orphan cells.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename 'Workshop' to 'Kann ich mit dem Prompt die Genauigkeit verbessern?'

Clearer framing: not a DSPy workshop, but the natural question after
seeing failures + metrics. Bridges to automatic optimization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace model picker widgets with simple code variable

MODEL = 'github_copilot/gpt-5.1' — user edits the string to switch.
Available models listed via print(). No more dropdown widgets.
Cleaned all model_dd/model_picker references across all notebooks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix table truncation: show full content with word-wrap

Removed [:80] and [:100] truncation from actions.py and visualize.py.
Table uses table-layout:fixed + word-break:break-word for wrapping.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove ALL widgets — pure executable notebooks, no interaction needed

Every notebook now runs top-to-bottom without clicks or input:
- Model selection: MODEL = 'github_copilot/gpt-5.1' (edit the string)
- Task selection: TASK = 'ticket_routing' (edit the string)
- ROI calculator: plain variables instead of sliders
- Prompt tuning: PROMPT_V1/V2 variables instead of Textarea
- No more buttons, dropdowns, or callbacks anywhere

All 5 notebooks verified headless (NB03 hit a token expiry during the long run).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enhance code generation dataset and improve example evaluation with timeout handling

- Added a new function to evaluate mathematical expressions respecting operator precedence and parentheses to the code generation dataset.
- Introduced a function to merge overlapping intervals in the code generation dataset.
- Modified the _evaluate_examples function to include a timeout feature for LLM calls, ensuring that each example is evaluated within a specified time limit.
- Improved error handling and output formatting during example evaluations to provide clearer feedback on timeouts and errors.

* Refactor code generation examples: simplify Fibonacci and palindrome functions, and enhance flatten function implementation

* Add 5s timeout to code generation exec() — prevents infinite loops

Both compilation and test execution are wrapped with signal.alarm(5).
If generated code runs too long (infinite loop), it times out cleanly
with '❌ Timeout: Code läuft zu lange (Endlosschleife?)'.

NB01 verified: executes headless without errors.
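
The signal.alarm pattern looks roughly like this (a sketch under assumptions: Unix only, main thread only; the notebook's actual wrapper and error message differ):

```python
import signal

class ExecTimeout(Exception):
    pass

def run_with_timeout(code: str, seconds: int = 5) -> dict:
    """exec() a code string under a SIGALRM deadline."""
    def _on_alarm(signum, frame):
        raise ExecTimeout(f"code ran longer than {seconds}s")

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    namespace: dict = {}
    try:
        exec(code, namespace)  # intentionally executing generated code
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
    return namespace
```

An infinite loop in the generated code raises ExecTimeout once the alarm fires, instead of hanging the notebook.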

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add baseline individual scores to optimization results

- Updated the `run_optimization` function to include `baseline_individual_scores` in the returned results, capturing per-example results before optimization.
- Modified the `OptimizationResult` class to define `baseline_individual_scores` as a list of dictionaries, allowing for detailed tracking of individual scores pre-optimization.

* Rewrite NB03: clean 4-step arc, no duplicates

1. See your real ticket data
2. Run generic prompt → mediocre score
3. Tune with your data → big improvement
4. Takeaway: your data is your moat

Removed 3x duplicate headers, old button references, and
repeated explanations. 21 cells → 11 cells.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Strip notebook outputs + install nbstripout git filter

Outputs are automatically stripped on git add via .gitattributes filter.
Notebooks checked out from git will have no outputs — run them fresh.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Integrate notebook setup into setup.sh, fix start.sh

setup.sh: adds notebook venv + deps install after main setup.
start.sh: --install-only flag for non-interactive use,
fixed filename reference (00→01). Both work on bash + zsh.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor optimization notebook and enhance actions module

- Updated the optimization notebook to improve prompt instructions and evaluation process.
- Changed the manual prompt variable name for clarity and adjusted the evaluation metrics.
- Enhanced the `run_optimization` function to accept custom instructions for better prompt optimization.
- Added a new function to format optimized prompts for improved readability.
- Introduced a new CSV analysis script to summarize categories, priorities, and assigned groups from the dataset.

* Add data processing scripts for ticket analysis and routing

- Implemented _check_fields.py to analyze fields in the CSV data and print useful content.
- Created _check_more_fields.py to explore additional fields of interest and their fill rates.
- Developed _curate_data.py to curate and score tickets for training, focusing on informative content.
- Added _import_csv.py to import CSV data, analyze incident types, and generate new tickets for underrepresented groups.
- Introduced _predict_group.py to evaluate predictive power of various fields for ticket assignment.
- Built _rebuild_data.py to create a balanced dataset for ticket routing, ensuring representation across groups.

* Enhance run_optimization function: adjust BootstrapFewShot parameters for improved performance

* SECURITY: Pin litellm to safe versions (supply chain attack)

litellm PyPI versions 1.82.7 and 1.82.8 were compromised by attacker
TeamPCP with credential-stealing malware. See:
https://github.com/BerriAI/litellm/issues/24518

Pinned to known-safe versions:
- backend: litellm==1.82.1
- notebooks: litellm==1.82.6

Do NOT upgrade until BerriAI confirms PyPI is clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Move task picker from NB02 to NB03, expand domain tuning

NB02: removed the 'Beliebige Aufgabe' ("any task") section (belongs in NB03)
NB03: 6-step arc with task catalog + domain tuning:
  1. See all 20 tasks
  2. Pick any task and optimize it
  3. Load real ticket data
  4. Run generic prompt → mediocre
  5. Tune with domain data → much better
  6. Takeaway: your data is your moat

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix _format_optimized_prompt for different dump_state return types

dump_state() can return a list or dict depending on DSPy version.
Handle both gracefully with type checks.
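
A type-tolerant normalization of that return value might look like this (the list-item shape is an assumption for illustration; the real _format_optimized_prompt handles DSPy's actual structures):

```python
def normalize_state(state) -> dict:
    """Coerce dump_state() output (dict in some DSPy versions, list in others) to a dict."""
    if isinstance(state, dict):
        return state
    if isinstance(state, list):
        # assume a list of per-predictor dicts; merge shallowly, later entries win
        merged: dict = {}
        for item in state:
            if isinstance(item, dict):
                merged.update(item)
        return merged
    return {"value": state}
```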

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor and enhance agent tasks and tools

- Updated task definitions in notebooks to use more descriptive field names (e.g., 'query' to 'question').
- Changed the default task in domain tuning notebook from "sentiment" to "plan_execute".
- Improved agent behavior optimization by refining prompts and adding explanations for model choices.
- Enhanced search functionality in tools to provide better ticket search results and counts.
- Updated calculations for plan quality and self-correct accuracy to align with new output structures.
- Added MIPROv2 optimization step to improve agent responses based on vague prompts.
- Adjusted dataset for search agent to include more complex queries and answers.
- Updated kernel specifications across notebooks to use Python 3.13.12.

* Update domain tuning notebook to use 'plan_execute' task and improve agent optimization examples. Change model to 'github_copilot/gpt-4o-mini' for faster performance. Enhance explanations for prompt optimization and MIPROv2. Adjust markdown formatting and update kernel specifications across notebooks.

* Refactor domain tuning notebook: streamline optimization section and enhance takeaway insights

* Refactor environment loading: support .env files from both project root and notebooks directory

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor test files: streamline imports and enhance readability in LLM service tests and evaluation…
