Explorbot integrates with Langfuse for tracing and observability. This lets you analyze what happened during a session — what data was received, which tools were called, and how the agents made decisions.
When Explorbot runs autonomously, you need visibility into:
- What prompts were sent to the AI
- What tools were called and with what parameters
- Token usage and costs per session
- Timing of each operation
- Errors and retries that occurred
This data helps you:
- Debug failed tests — see exactly what the AI saw and decided
- Create Knowledge fixes — understand what context was missing
- Optimize prompts and agent performance
- Understand why a test passed or failed
- Export sessions for analysis with the `/explorbot-debug` skill
Sign up at langfuse.com (free tier available) or self-host.
From your Langfuse project settings, copy:
- Public Key
- Secret Key
Add credentials to your `.env` file:

```bash
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxx
```

Or configure in `explorbot.config.js`:
```js
export default {
  ai: {
    model: groq('gpt-oss-20b'),
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY,
      secretKey: process.env.LANGFUSE_SECRET_KEY,
      baseUrl: 'https://cloud.langfuse.com', // or your self-hosted URL
    },
  },
};
```

Once configured, all AI calls are automatically traced. No code changes are needed.
Explorbot uses the Vercel AI SDK integration with Langfuse. Each session captures:
| Trace | Description |
|---|---|
| `tester.loop` | Full test execution cycle |
| `research` | Page analysis by the Researcher agent |
| `navigator.loop` | Navigation and interaction attempts |
| `ai.generateText` | Text generation calls |
| `ai.generateObject` | Structured output calls |
| `codeceptjs.step` | Individual browser actions |
| `I.click`, `I.fillField`, etc. | Specific CodeceptJS commands |
To inspect a session in the Langfuse UI:

- Open your Langfuse project
- Find the session by timestamp or name
- Click it to see the full trace tree
- Inspect individual spans for:
  - Input prompts
  - Output responses
  - Token counts
  - Duration
  - Errors
Export a session as JSON from Langfuse for detailed debugging:
- Open your Langfuse project
- Find the failed `tester.loop` trace
- Click the Export button (or use the API)
- Save it as a JSON file (e.g., `failed-session.json`)
The trace contains the full context: prompts, tool calls, page states, and AI decisions.
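Langfuse also exposes a public REST API, so the export can be scripted. A minimal sketch, assuming the cloud endpoint (swap in your self-hosted `baseUrl` if needed) and a trace ID copied from the trace's URL in the UI:

```bash
# Fetch a single trace as JSON from the Langfuse public API.
# Basic auth: public key as username, secret key as password.
curl -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
  "https://cloud.langfuse.com/api/public/traces/<trace-id>" \
  -o failed-session.json
```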
Explorbot includes a Claude Code skill for analyzing failed sessions.
In Claude Code, run:
```
/explorbot-debug
```
The skill will ask for:
- Langfuse JSON export — path to your exported trace file
- Or nothing — it will analyze `output/explorbot.log` instead
The debug skill looks for three failure patterns:
| Pattern | Symptoms | Solution |
|---|---|---|
| Missing Context | Wrong element clicked, didn't understand UI | Add Knowledge file with disambiguation rules |
| Wrong Prompts | Incorrect assumptions, wrong flow | Add Knowledge with business context |
| Wrong Tool Choice | Used click when form needed, typing issues | Add Knowledge with CodeceptJS code examples |
The skill then:

- Extracts key data from the trace using jq (see the sketch after this list):
  - Failed tool calls
  - URLs visited
  - Prompts sent to the AI
- Identifies the root cause of the failure
- Suggests Knowledge files to fix the issue, for example:

  ```markdown
  ---
  url: /admin/users/*
  ---

  ## User Table

  Each row has the same buttons. Use a container:

  I.click('Delete', '[data-user-id="123"]')
  ```

- Can try interactions using browser tools (if available) and document working CodeceptJS code
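The exact JSON shape of a Langfuse export varies by version, so the field names below are illustrative rather than exact; the kind of jq extraction the skill performs looks like this:

```bash
# Illustrative field names — inspect your export first with: jq 'keys' failed-session.json
# Tool calls that ended in an error:
jq '.. | objects | select(.type? == "tool-call" and .error? != null)' failed-session.json

# Every URL mentioned anywhere in the trace:
jq '[.. | strings | select(startswith("http"))] | unique' failed-session.json
```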
Without Langfuse, you only see:
- Final test result (pass/fail)
- Basic logs
With Langfuse traces, you can see:
- Exact prompts the AI received at each step
- What page state the AI was analyzing
- Which tool calls succeeded/failed and why
- Token usage and timing
- Full decision chain
This makes debugging AI behavior possible — you can trace exactly where and why it went wrong.
A typical end-to-end debugging workflow:

```bash
# 1. Test fails
explorbot explore --from /admin/users
# 2. Open Langfuse, find tester.loop trace, export JSON
# Save to: ./traces/failed-users-test.json
# 3. In Claude Code:
/explorbot-debug
# Provide path: ./traces/failed-users-test.json
# 4. Skill analyzes and suggests Knowledge fix
# 5. Create knowledge file
explorbot know "/admin/users/*" "Use container context for table actions"
# 6. Re-run test
```

Run with the verbose flag to get detailed logs directly in the terminal:

```bash
explorbot explore --verbose
```

Or set the environment variable:

```bash
DEBUG=explorbot:* explorbot explore
```

This shows detailed logs including:
- Prompts sent to AI
- Tool calls and results
- State transitions
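To keep a copy of the debug stream for later analysis, redirect it to a file. A minimal sketch, assuming Explorbot uses the standard Node `debug` package (which writes to stderr); the file path here is arbitrary:

```bash
# Redirect stderr (where the debug package logs) to a file,
# while normal output still goes to the terminal.
DEBUG=explorbot:* explorbot explore 2> output/explorbot-debug.log
```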
You can filter by namespace:

```bash
# AI provider calls only
DEBUG=explorbot:provider explorbot explore

# Navigator agent only
DEBUG=explorbot:navigator explorbot explore

# Multiple namespaces
DEBUG=explorbot:tester,explorbot:navigator explorbot explore
```

| Namespace | What it shows |
|---|---|
| `explorbot:provider` | AI API calls, responses |
| `explorbot:provider:out` | Outgoing prompts |
| `explorbot:provider:in` | Incoming responses |
| `explorbot:navigator` | Navigation decisions |
| `explorbot:researcher` | Page analysis |
| `explorbot:planner` | Test scenario generation |
| `explorbot:tester` | Test execution |
| `explorbot:historian` | Experience saving |
| `explorbot:quartermaster` | A11y analysis |
Langfuse tracks token usage per call. Use this to:
- Monitor costs across sessions
- Compare model efficiency
- Identify expensive operations
- Optimize prompts to reduce tokens
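You can also pull usage aggregates programmatically via Langfuse's daily metrics endpoint. A sketch against the cloud URL (check the Langfuse API reference for the exact response shape):

```bash
# Daily usage and cost aggregates for the project from the Langfuse public API.
curl -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
  "https://cloud.langfuse.com/api/public/metrics/daily" | jq '.data'
```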
For privacy or compliance, you can self-host Langfuse:
```bash
# Docker
docker run -d -p 3000:3000 langfuse/langfuse
```

A bare container is only a starting point: Langfuse also needs a database, so see the Langfuse self-hosting docs for a complete setup. Then set `baseUrl` in your config:
```js
langfuse: {
  baseUrl: 'http://localhost:3000',
},
```