
Feature tabular data#768

Open
paullizer wants to merge 6 commits into Development from feature-tabular-data

Conversation

@paullizer
Contributor

  • Tabular Data Analysis — SK Mini-Agent for Normal Chat

    • Tabular files (CSV, XLSX, XLS, XLSM) detected in search results now trigger a lightweight Semantic Kernel mini-agent that pre-computes data analysis before the main LLM response, bringing to every normal chat conversation the same analytical depth previously available only in full agent mode.
    • Automatic Detection: When AI Search results include tabular files from any workspace (personal, group, or public) or chat-uploaded documents, the system automatically identifies them via the TABULAR_EXTENSIONS configuration and routes the query through the SK mini-agent pipeline.
    • Unified Workspace and Chat Handling: Tabular files are processed identically regardless of their storage location. The plugin resolves blob paths across all four container types (user-documents, group-documents, public-documents, personal-chat) with automatic fallback resolution if the primary source lookup fails. A user asking about an Excel file in their personal workspace gets the same analytical treatment as one asking about a CSV uploaded directly to a chat.
    • Six Data Analysis Functions: The TabularProcessingPlugin exposes describe_tabular_file, aggregate_column (sum, mean, count, min, max, median, std, nunique, value_counts), filter_rows (==, !=, >, <, >=, <=, contains, startswith, endswith), query_tabular_data (pandas query syntax), group_by_aggregate, and list_tabular_files — all registered as Semantic Kernel functions that the mini-agent orchestrates autonomously.
    • Pre-Computed Results Injected as Context: The mini-agent's computed analysis (exact numerical results, aggregations, filtered data) is injected into the main LLM's system context so it can present accurate, citation-backed answers without hallucinating numbers.
    • Graceful Degradation: If the mini-agent analysis fails for any reason, the system falls back to instructing the main LLM to use the tabular processing plugin functions directly, preserving full functionality.
    • Non-Streaming and Streaming Support: Both chat modes are supported. The mini-agent runs synchronously before the main LLM call in both paths.
    • Requires Enhanced Citations: The tabular processing plugin depends on the blob storage client initialized by the enhanced citations system. The enable_enhanced_citations admin setting must be enabled for tabular data analysis to activate.
    • Files Modified: route_backend_chats.py, semantic_kernel_plugins/tabular_processing_plugin.py, config.py.
    • (Ref: run_tabular_sk_analysis(), TabularProcessingPlugin, collect_tabular_sk_citations(), TABULAR_EXTENSIONS)
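The detection-and-routing step can be sketched roughly as follows. `TABULAR_EXTENSIONS` is named in the PR's `config.py`, but its exact value, the helper names, and the result shape here are illustrative assumptions rather than the actual `route_backend_chats.py` code:

```python
import os

# Named in the PR's config.py; the exact value here is an assumption.
TABULAR_EXTENSIONS = {".csv", ".xlsx", ".xls", ".xlsm"}

def is_tabular(file_name: str) -> bool:
    """Check a search-result file name against the tabular extension list."""
    return os.path.splitext(file_name)[1].lower() in TABULAR_EXTENSIONS

def split_search_results(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition AI Search hits: tabular files route to the SK mini-agent path,
    everything else continues through the normal chat pipeline."""
    tabular = [r for r in results if is_tabular(r.get("file_name", ""))]
    other = [r for r in results if not is_tabular(r.get("file_name", ""))]
    return tabular, other
```

Because the check is on the file extension alone, it works the same for workspace documents and chat uploads, matching the unified handling described above.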
  • Tabular Tool Execution Citations

    • Every tool call made by the SK mini-agent during tabular analysis is captured and surfaced as an agent citation, providing full transparency into the data analysis pipeline.
    • Automatic Capture: The existing @plugin_function_logger decorator on all TabularProcessingPlugin functions records each invocation including function name, input parameters, returned results, execution duration, and success/failure status.
    • Citation Format: Tool execution citations appear in the same "Agent Tool Execution" modal used by full agent mode, showing tool_name (e.g., TabularProcessingPlugin.aggregate_column), function_arguments (the exact parameters passed), and function_result (the computed data returned).
    • End-to-End Auditability: Users can verify exactly which aggregations, filters, or queries were run against their data, what parameters were used, and what raw results were returned — before the LLM summarized them into the final response.
    • Files Modified: route_backend_chats.py.
    • (Ref: collect_tabular_sk_citations(), plugin_invocation_logger.py)
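A minimal sketch of how a decorator like `@plugin_function_logger` can capture each invocation. The decorator name comes from the PR; the log structure, field names shown here, and the toy `aggregate_column` body are assumptions for illustration only:

```python
import functools
import time

INVOCATION_LOG = []  # stand-in for the real plugin invocation logger

def plugin_function_logger(func):
    """Record name, arguments, result, duration, and status of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result, status = None, "failure"
        try:
            result = func(*args, **kwargs)
            status = "success"
            return result
        finally:
            INVOCATION_LOG.append({
                "tool_name": f"TabularProcessingPlugin.{func.__name__}",
                "function_arguments": {"args": args, "kwargs": kwargs},
                "function_result": result,
                "duration_s": time.perf_counter() - start,
                "status": status,
            })
    return wrapper

@plugin_function_logger
def aggregate_column(column: str, op: str):
    # Toy stand-in for the real pandas-backed aggregation.
    data = {"amount": [10, 20, 30]}
    return sum(data[column]) if op == "sum" else None
```

Each entry in the log can then be surfaced unchanged as an "Agent Tool Execution" citation, which is what gives the end-to-end auditability described above.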
  • SK Mini-Agent Performance Optimization

    • Reduced typical tabular analysis time from ~74 seconds to an estimated ~30-33 seconds (55-60% reduction) through three complementary optimizations.
    • DataFrame Caching: Per-request in-memory cache eliminates redundant blob downloads. Previously, each of the ~8 tool calls in a typical analysis downloaded and parsed the same file independently. Now the file is downloaded once and subsequent calls read from cache. Cache is automatically scoped to the request (new plugin instance per analysis) and garbage-collected afterward.
    • Pre-Dispatch Schema Injection: File schemas (columns, data types, row counts, and a 3-row preview) are pre-loaded and injected into the SK mini-agent's system prompt before execution begins. This eliminates 2 LLM round-trips that were previously spent on file discovery (list_tabular_files) and schema inspection (describe_tabular_file), allowing the model to jump directly to analysis tool calls.
    • Async Plugin Functions: All six @kernel_function methods were converted to async def wrappers around asyncio.to_thread(). This lets Semantic Kernel's built-in asyncio.gather() truly parallelize batched tool calls (e.g., three simultaneous aggregate_column calls) instead of executing them serially on the event loop.
    • Batching Instructions: The system prompt now instructs the model to batch multiple independent function calls in a single response, reducing LLM round-trips further.
    • Files Modified: tabular_processing_plugin.py, route_backend_chats.py, config.py.
    • (Ref: _df_cache, asyncio.to_thread, pre-dispatch schema injection in run_tabular_sk_analysis())
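The caching and async patterns combine roughly as below. `_df_cache` and `asyncio.to_thread` are referenced in the PR; the class shape, method body, and the dict standing in for a parsed DataFrame are illustrative assumptions:

```python
import asyncio

class TabularProcessingPlugin:
    """One instance per request, so the cache is scoped to a single analysis
    and garbage-collected when the request ends."""

    def __init__(self):
        self._df_cache = {}   # blob path -> parsed file (dict stands in for a DataFrame)
        self.downloads = 0    # counts simulated blob downloads, for demonstration

    def _load_sync(self, path: str) -> dict:
        """Blocking download + parse; served from cache after the first call."""
        if path not in self._df_cache:
            self.downloads += 1  # would be a blob download + pandas parse in the real plugin
            self._df_cache[path] = {"rows": 3, "columns": ["a", "b"]}
        return self._df_cache[path]

    async def describe_tabular_file(self, path: str) -> dict:
        # async wrapper: lets the mini-agent's batched tool calls run in
        # parallel via asyncio.gather() instead of serially on the event loop
        return await asyncio.to_thread(self._load_sync, path)

async def analyze():
    plugin = TabularProcessingPlugin()
    await plugin.describe_tabular_file("sales.csv")          # first call downloads the file
    await asyncio.gather(*(                                  # batched calls hit the cache
        plugin.describe_tabular_file("sales.csv") for _ in range(3)
    ))
    return plugin.downloads
```

Running `asyncio.run(analyze())` performs a single download despite four tool calls, which is the effect the ~8-call analysis relies on for its speedup.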

