OpenBB-finance · HOYALIM · Apr 8, 2026 · Apr 8, 2026
diff --git a/content/workspace/developers/agents-integration.md b/content/workspace/developers/agents-integration.md
@@ -26,6 +26,8 @@ Recommended stack: FastAPI with `EventSourceResponse` from `sse_starlette` and O
 
 See this repository to [get started](https://github.com/OpenBB-finance/agents-for-openbb).
 
+If you need a repeatable way to debug unstable agent behavior, see [External robustness testing](./ai-features/external-robustness-testing).
+
 ## Adding an Agent in Workspace
 
 1. Deploy your service (locally or remote).

diff --git a/content/workspace/developers/ai-features/external-robustness-testing.md b/content/workspace/developers/ai-features/external-robustness-testing.md
@@ -0,0 +1,88 @@
+---
+title: External robustness testing
+sidebar_position: 9
+description: Use an external failure map to diagnose unstable OpenBB AI workflows
+keywords:
+- robustness
+- RAG
+- LLM
+- agents
+- evaluation
+- troubleshooting
+---
+
+import HeadTitle from '@site/src/components/General/HeadTitle.tsx';
+
+<HeadTitle title="AI Features — External robustness testing | OpenBB Workspace Docs" />
+
+Use an external failure map when an OpenBB AI workflow works once but behaves inconsistently across document retrieval, tool execution, or multi-step reasoning. This page shows how to use the [WFGY ProblemMap](https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md) as an optional diagnostic reference for OpenBB-based chat and agent workflows.
+
+This is a docs-only integration pattern. OpenBB remains independent from any specific evaluation toolkit.
+
+## Architecture
+
+The goal is not to replace your agent logic. The goal is to give you a repeatable way to classify failures before you change prompts, tools, or retrieval settings.
+
+An OpenBB workflow usually combines several moving parts:
+
+- context added from widgets or PDFs
+- MCP tools or other external tools
+- agent reasoning over multiple steps
+- model output grounded in retrieved data
+
+When one of those layers becomes unstable, use a single failure map to decide what to inspect first.
+
+### Good fits for this pattern
+
+- the same question sometimes produces different answers
+- PDF or filing retrieval appears incomplete
+- a tool call succeeds but the final answer is still wrong
+- a multi-step agent flow loops, skips steps, or loses grounding
+
+### Related OpenBB patterns
+
+- [Parse PDF context](./parse-pdf-context)
+- [Citations for documents](./citations-for-documents)
+- [MCP tools integration](./mcp-tools)
+- [Share step-by-step reasoning](./share-step-by-step-reasoning)
+
+## Symptom map
+
+Use the external map as a triage layer, then inspect the matching OpenBB workflow component.
+
+| Symptom in an OpenBB workflow | Example ProblemMap entry | Inspect first in your OpenBB setup |
+| --- | --- | --- |
+| Answer cites the wrong filing or misses a passage | No.1 hallucination & chunk drift; No.5 semantic ≠ embedding | PDF parsing, context assembly, citations, retrieval boundaries |
+| The agent calls the wrong tool or uses the right tool with bad arguments | No.13 multi-agent chaos | MCP tool metadata, tool descriptions, tool-call prompt framing |
+| The model answers confidently without grounding | No.1 hallucination & chunk drift | context assembly, citation handling, prompt instructions, response checks |
+| The workflow works once, then drifts on follow-up turns | No.14 bootstrap ordering | conversation state, tool-result handoff, multi-request flow |
+| A multi-step task stalls or loops | No.13 multi-agent chaos; No.14 bootstrap ordering | reasoning visibility, tool sequencing, explicit stop/continue conditions |
+
+## Example workflow
+
+Assume you built a custom OpenBB agent that:
+
+1. reads a PDF filing with [Parse PDF context](./parse-pdf-context)
+2. adds citations with [Citations for documents](./citations-for-documents)
+3. calls an external tool through [MCP tools integration](./mcp-tools)
+4. shows intermediate steps with [Share step-by-step reasoning](./share-step-by-step-reasoning)
+
+If the final answer is unstable, classify the failure before changing code:
+
+- if the source text is missing, inspect the PDF/context layer first
+- if the tool call is wrong, inspect the MCP tool description and arguments first
+- if the data is correct but the answer is not, inspect the prompt, reasoning, and response checks first
+
+That keeps debugging focused on the failing layer instead of changing multiple parts of the workflow at once.
+
+## Minimal process
+
+1. Reproduce the failure with the smallest possible prompt.
+2. Record what context was available: widgets, PDFs, tools, and prior messages.
+3. Classify the failure with an external map such as WFGY.
+4. Change only the layer that matches the failure class.
+5. Re-run the same prompt and compare the result.
+
+## Scope note
+
+The [WFGY ProblemMap](https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md) is one external option for this kind of troubleshooting. You can use the same approach with any internal or external checklist, as long as it helps your team separate retrieval issues, tool issues, and reasoning issues before making changes.
diff --git a/content/workspace/developers/openbb-ai-sdk.md b/content/workspace/developers/openbb-ai-sdk.md
@@ -27,6 +27,8 @@ pip install openbb-ai
 
 The code is open source and is [available in this repository](https://github.com/OpenBB-finance/openbb-ai).
 
+For workflow-level debugging across retrieval, tool calls, and reasoning, see [External robustness testing](./ai-features/external-robustness-testing).
+
 ## Building Your First Agent
 
 Every agent starts with a query handler that receives a `QueryRequest` object containing everything your agent needs: