Improve all samples with cache-awareness, add 4 new samples, fix SDK versions, and prepare repo for public sharing #546
@leestott is attempting to deploy a commit to the MSFT-AIP Team on Vercel. A member of the Team first needs to authorize it.
Pull request overview
This PR updates the repository’s samples to be more “cache-aware” (skip redundant model downloads and provide clearer progress UX), adds several new end-to-end samples (JS local CAG/RAG, Python agent framework, C# Whisper transcription), and tightens repo hygiene/version consistency in preparation for public sharing.
Changes:
- Added new JS offline CAG and offline RAG samples with web UIs + model init progress reporting.
- Added a new Python “agent-framework” sample (multi-agent orchestration + Flask SSE UI) and smoke tests.
- Updated multiple existing samples/notebooks/docs to use cache checks, clearer lifecycle steps, and pinned SDK versions (plus SUPPORT.md refresh).
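The cache-aware flow summarized above (check the local cache, download only on a miss, report progress while downloading) could look roughly like the sketch below. The model object here is a hypothetical stand-in, not the Foundry Local SDK API: `createFakeModel`, `isCached`, `download`, `renderBar`, and `ensureModel` are illustrative names, and the real SDK's methods and signatures may differ.

```javascript
// Hypothetical stand-in for an SDK model handle (NOT the real SDK API).
function createFakeModel({ cached }) {
  let inCache = cached;
  return {
    alias: "demo-model",
    async isCached() { return inCache; },
    async download(onProgress) {
      // Simulate a download that reports progress as a 0-100 percentage.
      for (const pct of [0, 25, 50, 75, 100]) onProgress(pct);
      inCache = true;
    },
  };
}

// Render a fixed-width text progress bar for a 0-100 percentage.
function renderBar(percent, width = 20) {
  const clamped = Math.min(100, Math.max(0, percent));
  const filled = Math.round((clamped / 100) * width);
  return `[${"#".repeat(filled)}${"-".repeat(width - filled)}] ${Math.round(clamped)}%`;
}

// Download only on a cache miss; return what happened so callers can react.
async function ensureModel(model, log = console.log) {
  if (await model.isCached()) {
    log(`✓ Model already cached: ${model.alias}`);
    return "cached";
  }
  log(`⏳ Downloading ${model.alias}...`);
  await model.download((pct) => log(renderBar(pct)));
  return "downloaded";
}
```

Calling `ensureModel(createFakeModel({ cached: false }))` prints the download banner followed by one bar per progress callback; a second call on the same object takes the cached path.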
Reviewed changes
Copilot reviewed 93 out of 93 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| samples/rag/rag_foundrylocal_demo.ipynb | Updates notebook to use Foundry Local C# SDK lifecycle + SDK-managed endpoint. |
| samples/rag/README.md | Documents SDK-based lifecycle and removes hardcoded endpoint/variant guidance. |
| samples/python/summarize/summarize.py | Adds cache-aware model selection/download UX for summarize CLI. |
| samples/python/summarize/requirements.txt | Bumps minimum foundry-local-sdk version. |
| samples/python/summarize/README.md | Adds feature notes for cache-awareness + UX improvements. |
| samples/python/hello-foundry-local/src/app.py | Adds cache-check + explicit lifecycle steps before streaming chat. |
| samples/python/hello-foundry-local/requirements.txt | Adds missing requirements file with SDK + OpenAI deps. |
| samples/python/hello-foundry-local/README.md | Adds cache-aware feature notes + clarifies run steps. |
| samples/python/functioncalling/fl_tools.ipynb | Adds explicit lifecycle (start/cache/download/load) before tool-calling demo. |
| samples/python/functioncalling/README.md | Fixes notebook link + adds prerequisites/features. |
| samples/python/agent-framework/tests/test_smoke.py | Adds smoke tests for imports, doc loading, env override, demo registry. |
| samples/python/agent-framework/src/app/web.py | Flask web UI + SSE endpoints for orchestrator + demos. |
| samples/python/agent-framework/src/app/tool_demo.py | Standalone tool-calling validation for direct + LLM-driven tools. |
| samples/python/agent-framework/src/app/orchestrator.py | Implements sequential/concurrent/hybrid orchestration as async generators. |
| samples/python/agent-framework/src/app/foundry_boot.py | Bootstrapper for Foundry Local endpoint/model selection + env override. |
| samples/python/agent-framework/src/app/documents.py | Loads/chunks local docs into retriever context. |
| samples/python/agent-framework/src/app/demos/weather_tools.py | Adds multi-tool weather demo. |
| samples/python/agent-framework/src/app/demos/sentiment_analyzer.py | Adds sentiment/emotion/key-phrase tools demo. |
| samples/python/agent-framework/src/app/demos/registry.py | Central demo registry for web UI listing/routing. |
| samples/python/agent-framework/src/app/demos/multi_agent_debate.py | Adds multi-agent debate demo. |
| samples/python/agent-framework/src/app/demos/math_agent.py | Adds math/tools demo (includes expression evaluation). |
| samples/python/agent-framework/src/app/demos/code_reviewer.py | Adds code review tools demo. |
| samples/python/agent-framework/src/app/demos/__init__.py | Exposes demos + registry helpers for import/registration. |
| samples/python/agent-framework/src/app/agents.py | Agent factories + shared tool functions. |
| samples/python/agent-framework/src/app/__main__.py | CLI entry (web/cli modes) + orchestrator runner. |
| samples/python/agent-framework/src/app/__init__.py | Defines package root. |
| samples/python/agent-framework/requirements.txt | Declares runtime dependencies for the new sample. |
| samples/python/agent-framework/pyproject.toml | Packaging metadata + deps + dev extras (pytest). |
| samples/python/agent-framework/data/orchestration_patterns.md | Sample docs for retriever context. |
| samples/python/agent-framework/data/foundry_local_overview.md | Sample docs for retriever context. |
| samples/python/agent-framework/data/agent_framework_guide.md | Sample docs for retriever context. |
| samples/python/agent-framework/README.md | Full sample documentation + quickstart + structure. |
| samples/python/agent-framework/.env.example | Environment template for model/docs/log level. |
| samples/js/web-server-example/app.js | Adds cache check + progress bar before downloading models. |
| samples/js/tool-calling-foundry-local/src/app.js | Adds cache check + progress bar before downloading models. |
| samples/js/native-chat-completions/app.js | Adds cache check + reusable progress bar for model download. |
| samples/js/local-rag/src/vectorStore.js | New SQLite-backed TF store with inverted index + caching. |
| samples/js/local-rag/src/server.js | New Express server with SSE status + chat + upload + ingestion. |
| samples/js/local-rag/src/prompts.js | System prompts for gas-field RAG agent (full + compact). |
| samples/js/local-rag/src/ingest.js | New ingestion script to chunk + index docs into SQLite. |
| samples/js/local-rag/src/config.js | Config for model, chunking, paths, and server settings. |
| samples/js/local-rag/src/chunker.js | Front-matter parsing + chunking + cosine similarity helpers. |
| samples/js/local-rag/src/chatEngine.js | Initializes SDK/model + retrieval + streaming/non-streaming responses. |
| samples/js/local-rag/package.json | New package manifest for local-rag sample. |
| samples/js/local-rag/docs/valve-inspection.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/pressure-testing.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/ppe-requirements.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/gas-leak-detection.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/docs/emergency-shutdown.md | Domain doc for RAG ingestion. |
| samples/js/local-rag/README.md | New sample documentation (setup/ingest/architecture). |
| samples/js/local-cag/src/server.js | New Express server for CAG sample + init status SSE. |
| samples/js/local-cag/src/prompts.js | System prompts for gas-field CAG agent (full + compact). |
| samples/js/local-cag/src/modelSelector.js | Auto model selection based on RAM + caching preference. |
| samples/js/local-cag/src/context.js | Loads docs + keyword scoring + builds selected context per query. |
| samples/js/local-cag/src/config.js | Config for model selection, RAM budget, server, and context size. |
| samples/js/local-cag/src/chatEngine.js | Initializes SDK/model + injects preloaded context per query. |
| samples/js/local-cag/package.json | New package manifest for local-cag sample. |
| samples/js/local-cag/docs/valve-inspection.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/pressure-testing.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/ppe-requirements.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/gas-leak-detection.md | Domain doc for CAG startup context. |
| samples/js/local-cag/docs/emergency-shutdown.md | Domain doc for CAG startup context. |
| samples/js/local-cag/README.md | New sample documentation (setup/architecture/config). |
| samples/js/langchain-integration-example/app.js | Adds cache check + progress bar before downloading models. |
| samples/js/electron-chat-application/package.json | Adds missing foundry-local-sdk dependency. |
| samples/js/copilot-sdk-foundry-local/src/tool-calling.ts | Pins SDK version + cache-aware model download. |
| samples/js/copilot-sdk-foundry-local/src/app.ts | Pins SDK version + cache-aware model download. |
| samples/js/copilot-sdk-foundry-local/package.json | Pins foundry-local-sdk version. |
| samples/js/chat-and-audio-foundry-local/package.json | Pins foundry-local-sdk version. |
| samples/js/audio-transcription-example/app.js | Adds cache check + progress bar before downloading models. |
| samples/cs/whisper-transcription/wwwroot/styles.css | New UI styling for Whisper transcription sample. |
| samples/cs/whisper-transcription/wwwroot/index.html | New drag/drop UI for uploading and transcribing audio. |
| samples/cs/whisper-transcription/wwwroot/app.js | Client-side upload/transcribe/copy + health polling. |
| samples/cs/whisper-transcription/nuget.config | Adds package source mapping for Foundry packages. |
| samples/cs/whisper-transcription/appsettings.json | Adds Foundry config (model alias, log level). |
| samples/cs/whisper-transcription/WhisperTranscription.csproj | New ASP.NET Core project for transcription service. |
| samples/cs/whisper-transcription/Services/TranscriptionService.cs | Implements streaming transcription via Foundry SDK audio client. |
| samples/cs/whisper-transcription/Services/FoundryOptions.cs | Options binding for model alias + logging. |
| samples/cs/whisper-transcription/Services/FoundryModelService.cs | Initializes Foundry manager + cache-aware download + load. |
| samples/cs/whisper-transcription/README.md | New sample documentation + endpoints + setup. |
| samples/cs/whisper-transcription/Program.cs | Minimal API endpoints + swagger + error middleware. |
| samples/cs/whisper-transcription/Middleware/ErrorHandlingMiddleware.cs | Centralized exception-to-JSON error handling. |
| samples/cs/whisper-transcription/Health/FoundryHealthCheck.cs | Health check that validates model availability. |
| samples/cs/GettingStarted/src/ToolCallingFoundryLocalWebServer/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/ToolCallingFoundryLocalSdk/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/ModelManagementExample/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/HelloFoundryLocalSdk/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/FoundryLocalWebServer/Program.cs | Adds explicit cache check + download progress bar. |
| samples/cs/GettingStarted/src/AudioTranscriptionExample/Program.cs | Adds explicit cache check + download progress bar. |
| SUPPORT.md | Replaces template placeholders with real support guidance. |
Copilot reviewed 93 out of 93 changed files in this pull request and generated 4 comments.
… claims - FoundryModelService.cs: add SemaphoreSlim for thread-safe InitializeAsync to prevent concurrent callers from double-initializing in ASP.NET - summarize/README.md: align docs with code (uses first cached model, not phi-4-mini default) - local-rag/README.md: replace 'TF-IDF' with 'term-frequency' throughout since the implementation uses raw term-frequency maps without IDF weighting
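To make the term-frequency versus TF-IDF distinction in this commit concrete, here is a minimal sketch of raw term-frequency vectors compared with cosine similarity, plus an IDF function shown only for contrast. It is not the code from `vectorStore.js` or `chunker.js`; the tokenizer, function names, and the smoothed IDF formula are assumptions chosen for illustration.

```javascript
// Lowercase and split on non-alphanumerics; a deliberately simple tokenizer.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Raw term-frequency map: term -> count. No IDF weighting is applied.
function termFrequency(text) {
  const tf = new Map();
  for (const term of tokenize(text)) tf.set(term, (tf.get(term) || 0) + 1);
  return tf;
}

// Cosine similarity between two sparse term-frequency maps.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, count] of a) {
    normA += count * count;
    if (b.has(term)) dot += count * b.get(term);
  }
  for (const count of b.values()) normB += count * count;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// For contrast only: one common smoothed IDF variant. It down-weights terms
// that occur in many documents, which raw term frequency does not do.
function inverseDocumentFrequency(term, docs) {
  const containing = docs.filter((d) => d.has(term)).length;
  return Math.log((1 + docs.length) / (1 + containing)) + 1;
}
```

With raw counts, a term that appears in every document contributes to the similarity score as much as a rare one, which is why calling the approach "term-frequency" is the more accurate label.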
Copilot reviewed 93 out of 93 changed files in this pull request and generated 9 comments.
…onToken, README accuracy
Copilot reviewed 93 out of 93 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
samples/python/agent-framework/src/app/web.py:177
`api_demo_run()` creates a new event loop but doesn't call `asyncio.set_event_loop(loop)` (and doesn't clear it). For consistency with `api_run()` and to avoid libraries failing due to a missing current event loop, set/clear the loop in a `try/finally` around `run_until_complete`.
Copilot reviewed 94 out of 94 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
samples/python/agent-framework/src/app/web.py:183
- SSE responses for `/api/demo/<demo_id>/run` are returned with only `mimetype="text/event-stream"`. For consistent real-time streaming (especially behind proxies), add the usual SSE headers (`Cache-Control: no-cache`, `Connection: keep-alive`, and optionally `X-Accel-Buffering: no`) to this `Response` as well.
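The comment above concerns a Flask `Response`, but the header set is framework-independent. As a rough illustration, written in JavaScript to match the JS samples' SSE endpoints, the headers and the event wire format might be produced as below. `sseHeaders` and `formatSseEvent` are hypothetical helper names, not functions from this PR.

```javascript
// Headers commonly sent with Server-Sent Events responses.
function sseHeaders() {
  return {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "X-Accel-Buffering": "no", // asks nginx-style proxies not to buffer
  };
}

// Serialize one SSE message. Each line of a multi-line payload needs its own
// "data:" prefix, and a blank line terminates the message.
function formatSseEvent(data, event) {
  const payload = typeof data === "string" ? data : JSON.stringify(data);
  const lines = payload.split("\n").map((line) => `data: ${line}`);
  if (event) lines.unshift(`event: ${event}`);
  return lines.join("\n") + "\n\n";
}
```

In a plain Node.js `http` handler this would be used as `res.writeHead(200, sseHeaders())` followed by `res.write(formatSseEvent({ type: "text", data: "hi" }))`; the Flask equivalent would set the same header names on its response object.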
Copilot reviewed 94 out of 94 changed files in this pull request and generated 4 comments.
Copilot reviewed 94 out of 94 changed files in this pull request and generated 3 comments.
```javascript
// Buffer chunks from the callback and yield them as an async iterable
const textChunks = [];
let resolve;
let done = false;

const streamPromise = this.chatClient.completeStreamingChat(messages, (chunk) => {
  textChunks.push(chunk);
  if (resolve) { resolve(); resolve = null; }
}).then(() => {
  done = true;
  if (resolve) { resolve(); resolve = null; }
});

// Yield sources metadata first
yield {
  type: "sources",
  data: chunks.map((c) => ({
    title: c.title,
    category: c.category,
    docId: c.doc_id,
    score: Math.round(c.score * 100) / 100,
  })),
};

// Yield text chunks from the SDK streaming callback buffer
let head = 0;
while (!done || head < textChunks.length) {
  if (head >= textChunks.length && !done) {
    await new Promise((r) => { resolve = r; });
  }
  while (head < textChunks.length) {
    const chunk = textChunks[head++];
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      yield { type: "text", data: content };
    }
  }
```
queryStream() buffers every streaming chunk into textChunks and never removes processed entries. For long responses this can grow unbounded and increase memory usage. Consider storing only the extracted delta.content strings and periodically compacting the buffer (e.g., slice/splice once head passes a threshold) to keep memory bounded.
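One way to apply the compaction suggestion is a small FIFO that tracks a read index and splices off consumed entries once that index passes a threshold. This is a sketch under assumptions, not the fix that landed in the PR: `CompactingBuffer` and its `threshold` parameter are hypothetical, and the buffer is meant to hold the extracted `delta.content` strings rather than whole chunk objects.

```javascript
// FIFO buffer that periodically discards entries that were already consumed.
class CompactingBuffer {
  constructor(threshold = 64) {
    this.items = [];
    this.head = 0; // index of the next unread item
    this.threshold = threshold;
  }

  push(item) {
    this.items.push(item);
  }

  // Number of items that have been pushed but not yet read.
  get pending() {
    return this.items.length - this.head;
  }

  shift() {
    if (this.pending === 0) return undefined;
    const item = this.items[this.head++];
    // Once enough consumed entries accumulate, drop them in one splice.
    if (this.head >= this.threshold) {
      this.items.splice(0, this.head);
      this.head = 0;
    }
    return item;
  }
}
```

Compacting in batches keeps each read cheap while bounding the retained array to roughly `threshold` consumed entries plus whatever is still unread.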
```javascript
// Collect streamed chunks via callback and yield them
const chunks = [];
let resolve;
let done = false;

const promise = this.chatClient
  .completeStreamingChat(messages, (chunk) => {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      chunks.push(content);
      if (resolve) {
        const r = resolve;
        resolve = null;
        r();
      }
    }
  })
  .then(() => {
    done = true;
    if (resolve) {
      const r = resolve;
      resolve = null;
      r();
    }
  });

let index = 0;
while (!done || index < chunks.length) {
  if (index < chunks.length) {
    yield { type: "text", data: chunks[index++] };
  } else {
    await new Promise((r) => { resolve = r; });
  }
}
```
queryStream() accumulates streamed text in the chunks array but never clears already-yielded entries. Over time (or with many concurrent clients) this can lead to unnecessary memory growth. Consider using a bounded queue/compaction strategy (drop entries once yielded) so memory usage stays proportional to the largest in-flight gap, not total response size.
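The "drop entries once yielded" idea can be sketched as a generic bridge from a callback-style stream to an async generator, where each item leaves the queue as it is yielded. `streamToAsyncIterable` and the `startStream(onChunk)` contract are hypothetical; the sketch assumes `startStream` calls `onChunk` for every piece of text and returns a promise that settles when the stream ends, similar in shape to the callback usage in the excerpt above.

```javascript
// Bridge a callback-style streaming API to an async generator.
// `startStream(onChunk)` is a hypothetical stand-in for the SDK call.
async function* streamToAsyncIterable(startStream) {
  const queue = [];
  let wake = null;
  let done = false;
  let error = null;

  // Wake the consumer if it is currently waiting for data.
  const notify = () => {
    if (wake) {
      const w = wake;
      wake = null;
      w();
    }
  };

  startStream((text) => { queue.push(text); notify(); })
    .catch((err) => { error = err; })
    .finally(() => { done = true; notify(); });

  while (true) {
    // Entries are removed as they are yielded, so memory tracks the backlog
    // between producer and consumer rather than the full response.
    while (queue.length > 0) yield queue.shift();
    if (done) break;
    await new Promise((r) => { wake = r; });
  }
  if (error) throw error;
}
```

A consumer would iterate with `for await (const text of streamToAsyncIterable(start)) { ... }`. Unlike the excerpt, this version also surfaces a rejected stream promise to the consumer instead of leaving it unobserved.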
Summary
This PR improves every existing sample across all languages (C#, JavaScript, Python, Rust) with cache-awareness and visual feedback, adds 4 brand-new samples, fixes SDK version inconsistencies across the repo, and addresses repo hygiene issues for public sharing readiness.
93 files changed — 67 new files, 26 modified files.
What's Changed
1. New Samples (4)
`samples/js/local-cag/`: Context-Augmented Generation (12 files)
Offline CAG-powered support agent for gas field engineers. Pre-loads domain documents (valve inspections, PPE requirements, emergency shutdown procedures, etc.) directly into the context window, with no vector database, embeddings, or retrieval pipeline needed.
`samples/js/local-rag/`: Retrieval-Augmented Generation (11 files)
Offline RAG-powered support agent using SQLite + term-frequency vectors for document retrieval. Demonstrates the full RAG pipeline running 100% locally.
- Document ingestion (`npm run ingest`)
`samples/python/agent-framework/`: Microsoft Agent Framework Integration (24 files)
Full-featured agent framework sample showing Foundry Local as the LLM backend for agentic AI workflows.
`samples/cs/whisper-transcription/`: ASP.NET Core Whisper Transcription (13 files)
Production-quality audio transcription service using Foundry Local's Whisper model via WinML.
- `FoundryModelService`, `TranscriptionService`, `FoundryHealthCheck`
2. Cache-Awareness Improvements (All Existing Samples)
Every existing sample was updated to check the local model cache before attempting downloads. This provides:
- Status output such as `✓ Model already cached` or `⏳ Downloading...`
C# samples updated (6 files):
- `AudioTranscriptionExample/Program.cs`
- `FoundryLocalWebServer/Program.cs`
- `HelloFoundryLocalSdk/Program.cs`
- `ModelManagementExample/Program.cs`
- `ToolCallingFoundryLocalSdk/Program.cs`
- `ToolCallingFoundryLocalWebServer/Program.cs`
JavaScript samples updated (7 files):
- `audio-transcription-example/app.js`
- `copilot-sdk-foundry-local/src/app.ts` and `src/tool-calling.ts`
- `langchain-integration-example/app.js`
- `native-chat-completions/app.js`
- `tool-calling-foundry-local/src/app.js`
- `web-server-example/app.js`
Python samples updated (4 files):
- `hello-foundry-local/src/app.py`
- `summarize/summarize.py`
- `functioncalling/fl_tools.ipynb`
- `functioncalling/README.md`
Notebooks updated (1 file):
- `rag/rag_foundrylocal_demo.ipynb`: significant rewrite with cache detection, clearer cell structure, and improved RAG pipeline
3. SDK API Correctness Fixes (7 files)
Validated all samples against the latest public SDK APIs (JS SDK `sdk/js/src`, Python SDK `sdk_legacy/python`, C# SDK `sdk/cs/src`) and fixed:
| File | Issue | Fix |
|---|---|---|
| js/local-cag/src/modelSelector.js | `selectedVariant._modelInfo` | `model.variants` / `variant.modelInfo` / `model.isCached` |
| js/local-rag/src/chatEngine.js | `progress * 100` yielded 0–10000 (SDK reports 0–100) | `Math.round(progress)` for display, `progress / 100` for normalized value |
| python/summarize/summarize.py | `load_model(cached_models[0].id)` inconsistent with alias pattern | `load_model(cached_models[0].alias)` |
| python/agent-framework/foundry_boot.py | `str(m)` substring match for model ID resolution | `manager.get_model_info(alias).id` |
| python/agent-framework/web.py | `drain()` buffered all SSE events before yielding | `__anext__()` loop for real-time streaming |
| cs/whisper-transcription/TranscriptionService.cs | `CancellationToken.None` hardcoded | `CancellationToken` threaded through method and into all async calls |
| cs/whisper-transcription/FoundryModelService.cs | `progress % 10 == 0` unreliable for float | `Math.Floor(progress / 10)` threshold bucket approach |
4. Review Feedback Fixes: Round 2 (3 files)
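| File | Issue | Fix |
|---|---|---|
| cs/whisper-transcription/FoundryModelService.cs | `InitializeAsync()` not thread-safe: concurrent ASP.NET requests could double-initialize | `SemaphoreSlim` with double-check locking pattern |
| python/summarize/README.md | Docs described `phi-4-mini` as the default, but code uses first cached model | Docs aligned with code |
| js/local-rag/README.md | 'TF-IDF' wording, although the implementation uses raw term-frequency maps without IDF weighting | Replaced with 'term-frequency' throughout |
5. Review Feedback Fixes: Round 3 (7 files)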
| File | Issue | Fix |
|---|---|---|
| python/agent-framework/README.md | `FLASK_PORT` env var that doesn't exist in code | `--port <number>` CLI flag which matches `__main__.py` |
| js/local-rag/package.json | `"tfidf"` keyword misleading: implementation is term-frequency only | `"term-frequency"` |
| python/agent-framework/web.py | `asyncio.new_event_loop()` without `set_event_loop()`: breaks on Python 3.10+ | `asyncio.set_event_loop(loop)` after creation, clears in `finally` block |
| cs/whisper-transcription/FoundryModelService.cs | `EnsureModelReadyAsync` lacked `CancellationToken` | `CancellationToken ct = default` parameter, threaded through `IsCachedAsync(ct)`, `DownloadAsync(..., ct)`, `LoadAsync(ct)` |
| cs/whisper-transcription/TranscriptionService.cs | `ct` not passed to `EnsureModelReadyAsync` | `ct` passed through from `TranscribeAsync` |
| js/local-cag/src/config.js | `host` hardcoded to `"127.0.0.1"` despite README documenting `HOST` env var | `process.env.HOST \|\| "127.0.0.1"` |
| js/local-rag/src/config.js | `FOUNDRY_MODEL`, `PORT`, `HOST` env vars documented but not read | `process.env.FOUNDRY_MODEL`, `parseInt(process.env.PORT, 10)`, `process.env.HOST` with sensible defaults |
6. SDK Version Fixes
| File | Before | After | Notes |
|---|---|---|---|
| samples/js/local-cag/package.json | `^0.9.0` | `^0.5.1` | |
| samples/js/local-rag/package.json | `^0.9.0` | `^0.5.1` | |
| samples/js/copilot-sdk-foundry-local/package.json | `"latest"` | `^0.5.1` | |
| samples/js/chat-and-audio-foundry-local/package.json | `"latest"` | `^0.5.1` | |
| samples/js/electron-chat-application/package.json | (not listed) | `^0.5.1` | `foundry-local-sdk` not listed despite `import` in `main.js` |
| samples/python/summarize/requirements.txt | `>=0.3.1` | `>=0.5.1` | |
| samples/python/hello-foundry-local/requirements.txt | (new file) | `>=0.5.1` | |
7. Repo Hygiene
- `SUPPORT.md`: replaced the template placeholders (`TODO` and `REPO MAINTAINER: INSERT INSTRUCTIONS HERE`) with actual content pointing to GitHub Issues, docs, and samples.
Validation Performed
- SDK sources checked: `sdk/js/src`, `sdk_legacy/python`, `sdk/cs/src`
- `api_key` references are programmatic
SDK Version Matrix (Current State)
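| Package | Version defined in |
|---|---|
| `Microsoft.AI.Foundry.Local` | `Directory.Packages.props` |
| `Microsoft.AI.Foundry.Local.WinML` | `Directory.Packages.props` |
| `foundry-local-sdk` | `package.json` files |
| `foundry-local-sdk` | `requirements.txt` files |
| `foundry-local-sdk` | `sdk/rust/` |
Notes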
- Four JS samples (`native-chat-completions`, `web-server-example`, `audio-transcription-example`, `langchain-integration-example`) are intentionally single-file with no `package.json`; their READMEs instruct users to `npm install` manually.
- The Rust SDK dependency uses `path = "../../../sdk/rust"`, which always resolves to the latest local SDK.
- The `functioncalling` notebook uses `! pip install foundry-local-sdk` without a version pin, which is standard for notebooks.
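As an illustration of the two progress-reporting fixes listed under SDK API Correctness Fixes, the sketch below treats progress as a 0–100 value (as the description says the SDK reports it) and logs once per 10% bucket instead of testing `progress % 10 == 0`. It is written in JavaScript for consistency with the other examples, although one of the original fixes is in C#; `describeProgress` and `createBucketLogger` are hypothetical names.

```javascript
// Convert a 0-100 progress value into display and normalized forms.
function describeProgress(progress) {
  return { percent: Math.round(progress), fraction: progress / 100 };
}

// Returns a callback that logs once each time progress enters a new bucket.
// Comparing bucket indices avoids `progress % 10 === 0`, which a fractional
// value such as 10.3 never satisfies.
function createBucketLogger(log, bucketSize = 10) {
  let lastBucket = -1;
  return (progress) => {
    const bucket = Math.floor(progress / bucketSize);
    if (bucket > lastBucket) {
      lastBucket = bucket;
      log(`Downloading: ${Math.round(progress)}%`);
    }
  };
}
```

Feeding the logger a sequence such as 0.4, 3.7, 10.3, 12.9 produces one line for the 0–10 bucket and one for the 10–20 bucket, regardless of whether any reported value lands exactly on a multiple of ten.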