Improve all samples with cache-awareness, add 4 new samples, fix SDK versions, and prepare repo for public sharing #546

Open
leestott wants to merge 8 commits into microsoft:main from leestott:samples/improve-and-add-new-samples


leestott commented Mar 23, 2026

Summary

This PR improves every existing sample across all languages (C#, JavaScript, Python, Rust) with cache-awareness and visual feedback, adds 4 brand-new samples, fixes SDK version inconsistencies across the repo, and addresses repo hygiene issues for public sharing readiness.

93 files changed — 67 new files, 26 modified files.


What's Changed

1. New Samples (4)

samples/js/local-cag/ — Context-Augmented Generation (12 files)

Offline CAG-powered support agent for gas field engineers. Pre-loads domain documents (valve inspections, PPE requirements, emergency shutdown procedures, etc.) directly into the context window — no vector database, no embeddings, no retrieval pipeline needed.

  • Express web server with streaming chat UI
  • Full document context pre-loading at startup
  • Model auto-selection with cache awareness
  • Domain-specific gas field safety documentation included
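
As a rough illustration of the context pre-loading idea, the sketch below keeps a few documents in memory and assembles a prompt context for a query by simple keyword overlap, within a character budget. It is not the sample's actual code: the `docs` array, `scoreDoc`, `buildContext`, and the `maxChars` budget are all hypothetical names and values chosen for this example.

```javascript
// Hypothetical in-memory documents; a real sample would load files at startup.
const docs = [
  { title: "Valve Inspection", text: "Inspect valve seals and actuators for leaks before each shift." },
  { title: "PPE Requirements", text: "Wear flame-resistant clothing, gloves, and a gas monitor on site." },
  { title: "Emergency Shutdown", text: "Trigger the emergency shutdown valve and evacuate upwind." },
];

// Count how many distinct query words occur in a document (substring match).
function scoreDoc(doc, query) {
  const words = new Set(query.toLowerCase().match(/[a-z0-9]+/g) || []);
  const haystack = `${doc.title} ${doc.text}`.toLowerCase();
  let score = 0;
  for (const w of words) if (haystack.includes(w)) score++;
  return score;
}

// Pick the best-scoring documents that fit within a character budget.
function buildContext(query, maxChars = 200) {
  const ranked = docs
    .map((doc) => ({ doc, score: scoreDoc(doc, query) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
  const parts = [];
  let used = 0;
  for (const { doc } of ranked) {
    const block = `## ${doc.title}\n${doc.text}`;
    if (used + block.length > maxChars) continue; // skip blocks that do not fit
    parts.push(block);
    used += block.length;
  }
  return parts.join("\n\n");
}
```

The returned string would typically be placed in the system prompt ahead of the user's question; no embeddings or vector store are involved.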

samples/js/local-rag/ — Retrieval-Augmented Generation (11 files)

Offline RAG-powered support agent using SQLite + term-frequency vectors for document retrieval. Demonstrates the full RAG pipeline running 100% locally.

  • Document ingestion with chunking (npm run ingest)
  • SQLite-backed vector store with term-frequency ranking
  • Express web server with streaming chat UI
  • Same gas field domain docs for direct comparison with CAG approach
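
To make "term-frequency ranking" concrete, here is a minimal sketch that builds raw term-frequency maps (no IDF weighting) and ranks chunks by cosine similarity. It leaves out the SQLite storage entirely, and the function names `termFrequency`, `cosineSimilarity`, and `rankChunks` are assumptions for illustration rather than the sample's API.

```javascript
// Build a raw term-frequency map: token -> count (no IDF weighting).
function termFrequency(text) {
  const tf = new Map();
  for (const token of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    tf.set(token, (tf.get(token) || 0) + 1);
  }
  return tf;
}

// Cosine similarity between two sparse term-frequency maps.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, count] of a) dot += count * (b.get(term) || 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((s, v) => s + v * v, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Rank chunks against a query, highest score first, and keep the top K.
function rankChunks(query, chunks, topK = 2) {
  const q = termFrequency(query);
  return chunks
    .map((text) => ({ text, score: cosineSimilarity(q, termFrequency(text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because there is no IDF term, very common words weigh as much as rare ones, which is the trade-off of calling this "term-frequency" rather than "TF-IDF".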

samples/python/agent-framework/ — Microsoft Agent Framework Integration (24 files)

Full-featured agent framework sample showing Foundry Local as the LLM backend for agentic AI workflows.

  • 5 interactive demos: weather tools, code reviewer, math agent, sentiment analyzer, multi-agent debate
  • Tool calling with automatic function dispatch
  • RAG pipeline with document ingestion
  • Flask web UI with streaming responses
  • Orchestrator pattern for multi-step reasoning
  • Comprehensive README with architecture diagrams

samples/cs/whisper-transcription/ — ASP.NET Core Whisper Transcription (13 files)

Production-quality audio transcription service using Foundry Local's Whisper model via WinML.

  • ASP.NET Core Minimal API with proper service architecture
  • Drag-and-drop audio upload UI
  • Real-time recording via MediaRecorder API
  • Health checks for Foundry service availability
  • Error handling middleware
  • Clean separation: FoundryModelService, TranscriptionService, FoundryHealthCheck

2. Cache-Awareness Improvements (All Existing Samples)

Every existing sample was updated to check the local model cache before attempting downloads. This provides:

  • Visual feedback — users see whether their model is already cached or needs downloading
  • Faster startup — skips unnecessary download operations
  • Better UX — clear progress indicators with ✓ Model already cached or ⏳ Downloading...
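
The cache-first flow described above might look roughly like the following. This is a sketch against a fake model object, not the Foundry Local SDK: `makeFakeModel`, its `download(onProgress)` method, and `ensureModel` are hypothetical, and the real SDK surface may differ.

```javascript
// Hypothetical stand-in for an SDK model handle; the real SDK may differ.
function makeFakeModel(cached) {
  return {
    alias: "demo-model",
    isCached: cached,
    downloads: 0,
    async download(onProgress) {
      this.downloads++;
      for (const pct of [0, 50, 100]) onProgress(pct); // simulated progress
      this.isCached = true;
    },
  };
}

// Check the cache first, and only download when the model is missing.
async function ensureModel(model, log = console.log) {
  if (model.isCached) {
    log(`✓ Model already cached: ${model.alias}`);
    return "cached";
  }
  log(`⏳ Downloading ${model.alias}...`);
  await model.download((pct) => log(`  ${Math.round(pct)}%`));
  log(`✓ Download complete: ${model.alias}`);
  return "downloaded";
}
```

Calling `ensureModel(makeFakeModel(true))` logs the cached message and never invokes `download`, which is the startup saving the list above refers to.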

C# samples updated (6 files):

  • AudioTranscriptionExample/Program.cs
  • FoundryLocalWebServer/Program.cs
  • HelloFoundryLocalSdk/Program.cs
  • ModelManagementExample/Program.cs
  • ToolCallingFoundryLocalSdk/Program.cs
  • ToolCallingFoundryLocalWebServer/Program.cs

JavaScript samples updated (7 files):

  • audio-transcription-example/app.js
  • copilot-sdk-foundry-local/src/app.ts and src/tool-calling.ts
  • langchain-integration-example/app.js
  • native-chat-completions/app.js
  • tool-calling-foundry-local/src/app.js
  • web-server-example/app.js

Python samples updated (4 files):

  • hello-foundry-local/src/app.py
  • summarize/summarize.py
  • functioncalling/fl_tools.ipynb
  • functioncalling/README.md

Notebooks updated (1 file):

  • rag/rag_foundrylocal_demo.ipynb — significant rewrite with cache detection, clearer cell structure, and improved RAG pipeline

3. SDK API Correctness Fixes (7 files)

Validated all samples against the latest public SDK APIs (JS SDK sdk/js/src, Python SDK sdk_legacy/python, C# SDK sdk/cs/src) and fixed:

| File | Issue | Fix |
| --- | --- | --- |
| js/local-cag/src/modelSelector.js | Used private selectedVariant._modelInfo | Switched to public model.variants / variant.modelInfo / model.isCached |
| js/local-rag/src/chatEngine.js | progress * 100 yielded 0–10000 (SDK reports 0–100) | Changed to Math.round(progress) for display, progress / 100 for normalized value |
| python/summarize/summarize.py | load_model(cached_models[0].id) inconsistent with alias pattern | Changed to load_model(cached_models[0].alias) |
| python/agent-framework/foundry_boot.py | Fragile str(m) substring match for model ID resolution | Replaced with manager.get_model_info(alias).id |
| python/agent-framework/web.py | drain() buffered all SSE events before yielding | Replaced with incremental __anext__() loop for real-time streaming |
| cs/whisper-transcription/TranscriptionService.cs | CancellationToken.None hardcoded | Threaded CancellationToken through method and into all async calls |
| cs/whisper-transcription/FoundryModelService.cs | progress % 10 == 0 unreliable for float | Replaced with Math.Floor(progress / 10) threshold bucket approach |
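
The bucket idea from the last row carries over to any language. The JavaScript sketch below assumes progress arrives as a float in the 0–100 range and logs at most once per 10% bucket; `makeProgressLogger` is a hypothetical helper, not code from the PR.

```javascript
// Returns a callback that logs at most once per 10% bucket.
// A check like `progress % 10 === 0` almost never fires for float
// values such as 10.2, so compare integer buckets instead.
function makeProgressLogger(log = console.log) {
  let lastBucket = -1;
  return (progress) => {
    const bucket = Math.floor(progress / 10);
    if (bucket > lastBucket) {
      lastBucket = bucket;
      log(`Downloading... ${Math.round(progress)}%`);
    }
  };
}
```

For a 0–1 normalized value, `progress / 100` can be computed separately, as the chatEngine.js row describes.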

4. Review Feedback Fixes — Round 2 (3 files)

| File | Issue | Fix |
| --- | --- | --- |
| cs/whisper-transcription/FoundryModelService.cs | InitializeAsync() not thread-safe; concurrent ASP.NET requests could double-initialize | Added SemaphoreSlim with double-check locking pattern |
| python/summarize/README.md | Claimed default model is phi-4-mini but code uses first cached model | Aligned README with actual behavior |
| js/local-rag/README.md | Claimed "TF-IDF" throughout but implementation uses raw term-frequency (no IDF) | Replaced all "TF-IDF" references with "term-frequency" |

5. Review Feedback Fixes — Round 3 (7 files)

| File | Issue | Fix |
| --- | --- | --- |
| python/agent-framework/README.md | Troubleshooting referenced FLASK_PORT env var that doesn't exist in code | Changed to --port <number> CLI flag, which matches __main__.py |
| js/local-rag/package.json | "tfidf" keyword misleading; implementation is term-frequency only | Changed keyword to "term-frequency" |
| python/agent-framework/web.py | asyncio.new_event_loop() without set_event_loop(); breaks on Python 3.10+ | Added asyncio.set_event_loop(loop) after creation and cleared it in a finally block |
| cs/whisper-transcription/FoundryModelService.cs | EnsureModelReadyAsync lacked CancellationToken | Added CancellationToken ct = default parameter, threaded through IsCachedAsync(ct), DownloadAsync(..., ct), LoadAsync(ct) |
| cs/whisper-transcription/TranscriptionService.cs | Caller didn't pass ct to EnsureModelReadyAsync | Now passes ct from TranscribeAsync |
| js/local-cag/src/config.js | host hardcoded to "127.0.0.1" despite README documenting HOST env var | Changed to process.env.HOST \|\| "127.0.0.1" |
| js/local-rag/src/config.js | All config values hardcoded; FOUNDRY_MODEL, PORT, HOST env vars documented but not read | Added process.env.FOUNDRY_MODEL, parseInt(process.env.PORT, 10), process.env.HOST with sensible defaults |
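
A config module that reads the documented environment variables with fallbacks could be sketched like this. The default model alias and the default port below are placeholders invented for the example; only the `127.0.0.1` host default and the `parseInt(..., 10)` call come from the rows above.

```javascript
// Read settings from an env-like object, falling back to defaults.
function loadConfig(env = process.env) {
  const port = parseInt(env.PORT, 10); // NaN when PORT is unset or not numeric
  return {
    model: env.FOUNDRY_MODEL || "example-model-alias", // hypothetical default
    host: env.HOST || "127.0.0.1",
    port: Number.isNaN(port) ? 3000 : port, // 3000 is a hypothetical default
  };
}
```

Passing the environment in as a parameter keeps the function easy to test, for example `loadConfig({ PORT: "8080" })`.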

6. SDK Version Fixes

| File | Before | After | Issue |
| --- | --- | --- | --- |
| samples/js/local-cag/package.json | ^0.9.0 | ^0.5.1 | Version 0.9.0 doesn't exist on npm |
| samples/js/local-rag/package.json | ^0.9.0 | ^0.5.1 | Version 0.9.0 doesn't exist on npm |
| samples/js/copilot-sdk-foundry-local/package.json | "latest" | ^0.5.1 | Unpinned; could break at any time |
| samples/js/chat-and-audio-foundry-local/package.json | "latest" | ^0.5.1 | Unpinned; could break at any time |
| samples/js/electron-chat-application/package.json | (missing) | ^0.5.1 | foundry-local-sdk not listed despite import in main.js |
| samples/python/summarize/requirements.txt | >=0.3.1 | >=0.5.1 | Outdated min version |
| samples/python/hello-foundry-local/requirements.txt | (file missing) | Created with >=0.5.1 | No requirements.txt existed at all |

7. Repo Hygiene

  • SUPPORT.md — Replaced the default GitHub template (contained TODO and REPO MAINTAINER: INSERT INSTRUCTIONS HERE placeholders) with actual content pointing to GitHub Issues, docs, and samples.

Validation Performed

| Check | Result |
| --- | --- |
| SDK API correctness: validated all samples against latest SDK source in sdk/js/src, sdk_legacy/python, sdk/cs/src | ✅ 7 issues fixed |
| Thread safety: FoundryModelService.InitializeAsync uses SemaphoreSlim | ✅ Fixed |
| CancellationToken propagation: EnsureModelReadyAsync threads ct through all async calls | ✅ Fixed |
| Event loop safety: web.py sets event loop for Python 3.10+ compatibility | ✅ Fixed |
| Env var consistency: config.js files in local-cag and local-rag read documented env vars | ✅ Fixed |
| README accuracy: all READMEs match actual implementation behavior | ✅ Fixed |
| Security scan: searched all samples for hardcoded secrets, API keys, tokens | ✅ Clean (all api_key references are programmatic) |
| SDK version consistency: cross-referenced every dependency file against published SDK versions | ✅ Fixed (7 issues resolved) |
| .gitignore coverage: verified no build artifacts can be committed | ✅ Comprehensive |
| README coverage: checked every sample has documentation | ✅ 35 README.md files |
| License files: verified legal files present | ✅ All present |
| No TODO/FIXME in shipping code | ✅ Clean |
| No committed build artifacts | ✅ Clean |
| C# compile errors: checked TranscriptionService.cs and FoundryModelService.cs | ✅ No errors |

SDK Version Matrix (Current State)

| Language | Package | Version | Source |
| --- | --- | --- | --- |
| C# | Microsoft.AI.Foundry.Local | 0.9.0 | Central Directory.Packages.props |
| C# | Microsoft.AI.Foundry.Local.WinML | 0.9.0 | Central Directory.Packages.props |
| JavaScript | foundry-local-sdk | ^0.5.1 | All sample package.json files |
| Python | foundry-local-sdk | >=0.5.1 | All sample requirements.txt files |
| Rust | foundry-local-sdk | 0.1.0 | Path reference to sdk/rust/ |

Notes

  • 4 JS samples (native-chat-completions, web-server-example, audio-transcription-example, langchain-integration-example) are intentionally single-file with no package.json — their READMEs instruct users to npm install manually.
  • Rust samples all use path = "../../../sdk/rust" which always resolves to the latest local SDK.
  • Python functioncalling notebook uses ! pip install foundry-local-sdk without version pin — standard for notebooks.

Copilot AI review requested due to automatic review settings March 23, 2026 23:37

vercel bot commented Mar 23, 2026

@leestott is attempting to deploy a commit to the MSFT-AIP Team on Vercel.

A member of the Team first needs to authorize it.

Copilot AI left a comment

Pull request overview

This PR updates the repository’s samples to be more “cache-aware” (skip redundant model downloads and provide clearer progress UX), adds several new end-to-end samples (JS local CAG/RAG, Python agent framework, C# Whisper transcription), and tightens repo hygiene/version consistency in preparation for public sharing.

Changes:

  • Added new JS offline CAG and offline RAG samples with web UIs + model init progress reporting.
  • Added a new Python “agent-framework” sample (multi-agent orchestration + Flask SSE UI) and smoke tests.
  • Updated multiple existing samples/notebooks/docs to use cache checks, clearer lifecycle steps, and pinned SDK versions (plus SUPPORT.md refresh).

Reviewed changes

Copilot reviewed 93 out of 93 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| samples/rag/rag_foundrylocal_demo.ipynb | Updates notebook to use Foundry Local C# SDK lifecycle + SDK-managed endpoint. |
| samples/rag/README.md | Documents SDK-based lifecycle and removes hardcoded endpoint/variant guidance. |
| samples/python/summarize/summarize.py | Adds cache-aware model selection/download UX for summarize CLI. |
| samples/python/summarize/requirements.txt | Bumps minimum foundry-local-sdk version. |
| samples/python/summarize/README.md | Adds feature notes for cache-awareness + UX improvements. |
| samples/python/hello-foundry-local/src/app.py | Adds cache-check + explicit lifecycle steps before streaming chat. |
| samples/python/hello-foundry-local/requirements.txt | Adds missing requirements file with SDK + OpenAI deps. |
| samples/python/hello-foundry-local/README.md | Adds cache-aware feature notes + clarifies run steps. |
| samples/python/functioncalling/fl_tools.ipynb | Adds explicit lifecycle (start/cache/download/load) before tool-calling demo. |
| samples/python/functioncalling/README.md | Fixes notebook link + adds prerequisites/features. |
| samples/python/agent-framework/tests/test_smoke.py | Adds smoke tests for imports, doc loading, env override, demo registry. |
| samples/python/agent-framework/src/app/web.py | Flask web UI + SSE endpoints for orchestrator + demos. |
| samples/python/agent-framework/src/app/tool_demo.py | Standalone tool-calling validation for direct + LLM-driven tools. |
| samples/python/agent-framework/src/app/orchestrator.py | Implements sequential/concurrent/hybrid orchestration as async generators. |
| samples/python/agent-framework/src/app/foundry_boot.py | Bootstrapper for Foundry Local endpoint/model selection + env override. |
| samples/python/agent-framework/src/app/documents.py | Loads/chunks local docs into retriever context. |
| samples/python/agent-framework/src/app/demos/weather_tools.py | Adds multi-tool weather demo. |
| samples/python/agent-framework/src/app/demos/sentiment_analyzer.py | Adds sentiment/emotion/key-phrase tools demo. |
| samples/python/agent-framework/src/app/demos/registry.py | Central demo registry for web UI listing/routing. |
| samples/python/agent-framework/src/app/demos/multi_agent_debate.py | Adds multi-agent debate demo. |
| samples/python/agent-framework/src/app/demos/math_agent.py | Adds math/tools demo (includes expression evaluation). |
| samples/python/agent-framework/src/app/demos/code_reviewer.py | Adds code review tools demo. |
| samples/python/agent-framework/src/app/demos/__init__.py | Exposes demos + registry helpers for import/registration. |
| samples/python/agent-framework/src/app/agents.py | Agent factories + shared tool functions. |
| samples/python/agent-framework/src/app/__main__.py | CLI entry (web/cli modes) + orchestrator runner. |
| samples/python/agent-framework/src/app/__init__.py | Defines package root. |
| samples/python/agent-framework/requirements.txt | Declares runtime dependencies for the new sample. |
| samples/python/agent-framework/pyproject.toml | Packaging metadata + deps + dev extras (pytest). |
| samples/python/agent-framework/data/orchestration_patterns.md | Sample docs for retriever context. |
| samples/python/agent-framework/data/foundry_local_overview.md | Sample docs for retriever context. |
| samples/python/agent-framework/data/agent_framework_guide.md | Sample docs for retriever context. |
| samples/python/agent-framework/README.md | Full sample documentation + quickstart + structure. |
| samples/python/agent-framework/.env.example | Environment template for model/docs/log level. |
| samples/js/web-server-example/app.js | Adds cache check + progress bar before downloading models. |
| samples/js/tool-calling-foundry-local/src/app.js | Adds cache check + progress bar before downloading models. |
| samples/js/native-chat-completions/app.js | Adds cache check + reusable progress bar for model download. |
| samples/js/local-rag/src/vectorStore.js | New SQLite-backed TF store with inverted index + caching. |
| samples/js/local-rag/src/server.js | New Express server with SSE status + chat + upload + ingestion. |
| samples/js/local-rag/src/prompts.js | System prompts for gas-field RAG agent (full + compact). |
| samples/js/local-rag/src/ingest.js | New ingestion script to chunk + index docs into SQLite. |
| samples/js/local-rag/src/config.js | Config for model, chunking, paths, and server settings. |
| samples/js/local-rag/src/chunker.js | Front-matter parsing + chunking + cosine similarity helpers. |
| samples/js/local-rag/src/chatEngine.js | Initializes SDK/model + retrieval + streaming/non-streaming responses. |
| samples/js/local-rag/package.json | New package manifest for local-rag sample. |
| samples/js/local-rag/docs/valve-inspection.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/pressure-testing.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/ppe-requirements.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/gas-leak-detection.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/emergency-shutdown.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/README.md | New sample documentation (setup/ingest/architecture). |
| samples/js/local-cag/src/server.js | New Express server for CAG sample + init status SSE. |
| samples/js/local-cag/src/prompts.js | System prompts for gas-field CAG agent (full + compact). |
| samples/js/local-cag/src/modelSelector.js | Auto model selection based on RAM + caching preference. |
| samples/js/local-cag/src/context.js | Loads docs + keyword scoring + builds selected context per query. |
| samples/js/local-cag/src/config.js | Config for model selection, RAM budget, server, and context size. |
| samples/js/local-cag/src/chatEngine.js | Initializes SDK/model + injects preloaded context per query. |
| samples/js/local-cag/package.json | New package manifest for local-cag sample. |
| samples/js/local-cag/docs/valve-inspection.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/pressure-testing.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/ppe-requirements.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/gas-leak-detection.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/emergency-shutdown.md | Domain doc for CAG startup context. |
| samples/js/local-cag/README.md | New sample documentation (setup/architecture/config). |
| samples/js/langchain-integration-example/app.js | Adds cache check + progress bar before downloading models. |
| samples/js/electron-chat-application/package.json | Adds missing foundry-local-sdk dependency. |
| samples/js/copilot-sdk-foundry-local/src/tool-calling.ts | Pins SDK version + cache-aware model download. |
| samples/js/copilot-sdk-foundry-local/src/app.ts | Pins SDK version + cache-aware model download. |
| samples/js/copilot-sdk-foundry-local/package.json | Pins foundry-local-sdk version. |
| samples/js/chat-and-audio-foundry-local/package.json | Pins foundry-local-sdk version. |
| samples/js/audio-transcription-example/app.js | Adds cache check + progress bar before downloading models. |
| samples/cs/whisper-transcription/wwwroot/styles.css | New UI styling for Whisper transcription sample. |
| samples/cs/whisper-transcription/wwwroot/index.html | New drag/drop UI for uploading and transcribing audio. |
| samples/cs/whisper-transcription/wwwroot/app.js | Client-side upload/transcribe/copy + health polling. |
| samples/cs/whisper-transcription/nuget.config | Adds package source mapping for Foundry packages. |
| samples/cs/whisper-transcription/appsettings.json | Adds Foundry config (model alias, log level). |
| samples/cs/whisper-transcription/WhisperTranscription.csproj | New ASP.NET Core project for transcription service. |
| samples/cs/whisper-transcription/Services/TranscriptionService.cs | Implements streaming transcription via Foundry SDK audio client. |
| samples/cs/whisper-transcription/Services/FoundryOptions.cs | Options binding for model alias + logging. |
| samples/cs/whisper-transcription/Services/FoundryModelService.cs | Initializes Foundry manager + cache-aware download + load. |
| samples/cs/whisper-transcription/README.md | New sample documentation + endpoints + setup. |
| samples/cs/whisper-transcription/Program.cs | Minimal API endpoints + swagger + error middleware. |
| samples/cs/whisper-transcription/Middleware/ErrorHandlingMiddleware.cs | Centralized exception-to-JSON error handling. |
| samples/cs/whisper-transcription/Health/FoundryHealthCheck.cs | Health check that validates model availability. |
| samples/cs/GettingStarted/src/ToolCallingFoundryLocalWebServer/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/ToolCallingFoundryLocalSdk/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/ModelManagementExample/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/HelloFoundryLocalSdk/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/FoundryLocalWebServer/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/AudioTranscriptionExample/Program.cs | Adds explicit cache check + download progress bar. |
| SUPPORT.md | Replaces template placeholders with real support guidance. |

leestott marked this pull request as draft March 24, 2026 21:38
leestott requested a review from Copilot March 24, 2026 23:42
leestott closed this Mar 24, 2026
leestott reopened this Mar 24, 2026
leestott marked this pull request as ready for review March 24, 2026 23:47

Copilot AI left a comment

Pull request overview

Copilot reviewed 93 out of 93 changed files in this pull request and generated 4 comments.


… claims

- FoundryModelService.cs: add SemaphoreSlim for thread-safe InitializeAsync
  to prevent concurrent callers from double-initializing in ASP.NET
- summarize/README.md: align docs with code (uses first cached model,
  not phi-4-mini default)
- local-rag/README.md: replace 'TF-IDF' with 'term-frequency' throughout
  since the implementation uses raw term-frequency maps without IDF weighting

Copilot AI left a comment

Pull request overview

Copilot reviewed 93 out of 93 changed files in this pull request and generated 9 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 93 out of 93 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

samples/python/agent-framework/src/app/web.py:177

  • api_demo_run() creates a new event loop but doesn't call asyncio.set_event_loop(loop) (and doesn't clear it). For consistency with api_run() and to avoid libraries failing due to missing current event loop, set/clear the loop in a try/finally around run_until_complete.

Copilot AI left a comment

Pull request overview

Copilot reviewed 94 out of 94 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

samples/python/agent-framework/src/app/web.py:183

  • SSE responses for /api/demo/<demo_id>/run are returned with only mimetype="text/event-stream". For consistent real-time streaming (especially behind proxies), add the usual SSE headers (Cache-Control: no-cache, Connection: keep-alive, and optionally X-Accel-Buffering: no) to this Response as well.

Copilot AI left a comment

Pull request overview

Copilot reviewed 94 out of 94 changed files in this pull request and generated 4 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 94 out of 94 changed files in this pull request and generated 3 comments.


Comment on lines +180 to +216
// Buffer chunks from the callback and yield them as an async iterable
const textChunks = [];
let resolve;
let done = false;

const streamPromise = this.chatClient.completeStreamingChat(messages, (chunk) => {
  textChunks.push(chunk);
  if (resolve) { resolve(); resolve = null; }
}).then(() => {
  done = true;
  if (resolve) { resolve(); resolve = null; }
});

// Yield sources metadata first
yield {
  type: "sources",
  data: chunks.map((c) => ({
    title: c.title,
    category: c.category,
    docId: c.doc_id,
    score: Math.round(c.score * 100) / 100,
  })),
};

// Yield text chunks from the SDK streaming callback buffer
let head = 0;
while (!done || head < textChunks.length) {
  if (head >= textChunks.length && !done) {
    await new Promise((r) => { resolve = r; });
  }
  while (head < textChunks.length) {
    const chunk = textChunks[head++];
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      yield { type: "text", data: content };
    }
  }

Copilot AI Mar 25, 2026

queryStream() buffers every streaming chunk into textChunks and never removes processed entries. For long responses this can grow unbounded and increase memory usage. Consider storing only the extracted delta.content strings and periodically compacting the buffer (e.g., slice/splice once head passes a threshold) to keep memory bounded.

Comment on lines +184 to +217
// Collect streamed chunks via callback and yield them
const chunks = [];
let resolve;
let done = false;

const promise = this.chatClient
  .completeStreamingChat(messages, (chunk) => {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      chunks.push(content);
      if (resolve) {
        const r = resolve;
        resolve = null;
        r();
      }
    }
  })
  .then(() => {
    done = true;
    if (resolve) {
      const r = resolve;
      resolve = null;
      r();
    }
  });

let index = 0;
while (!done || index < chunks.length) {
  if (index < chunks.length) {
    yield { type: "text", data: chunks[index++] };
  } else {
    await new Promise((r) => { resolve = r; });
  }
}

Copilot AI Mar 25, 2026

queryStream() accumulates streamed text in the chunks array but never clears already-yielded entries. Over time (or with many concurrent clients) this can lead to unnecessary memory growth. Consider using a bounded queue/compaction strategy (drop entries once yielded) so memory usage stays proportional to the largest in-flight gap, not total response size.
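
One possible shape for the "drop entries once yielded" suggestion is sketched below. It is not the PR's code: `streamFromCallback` is a hypothetical helper, and `start` stands in for any function that takes a per-chunk callback and returns a promise, which is how `completeStreamingChat` is used in the snippets above.

```javascript
// Bridge a callback-based producer to an async generator without
// retaining chunks that have already been yielded.
async function* streamFromCallback(start) {
  const queue = [];
  let wake = null;
  let done = false;
  let error = null;

  // Wake the consumer if it is currently waiting for data.
  const notify = () => {
    if (wake) {
      const w = wake;
      wake = null;
      w();
    }
  };

  // `start` receives the per-chunk callback and returns a promise.
  start((chunk) => {
    queue.push(chunk);
    notify();
  }).then(
    () => { done = true; notify(); },
    (err) => { error = err; done = true; notify(); }
  );

  while (true) {
    if (queue.length > 0) {
      yield queue.shift(); // removes the entry so it can be garbage-collected
    } else if (done) {
      if (error) throw error; // surface producer failures to the consumer
      return;
    } else {
      await new Promise((r) => { wake = r; });
    }
  }
}
```

With this shape the queue only holds chunks produced since the consumer last caught up, so memory tracks the in-flight gap rather than the total response size. A `queryStream()` could then iterate with `for await (const chunk of streamFromCallback((cb) => client.completeStreamingChat(messages, cb)))`, where `client` is whatever chat client the sample already holds.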
