feat: add HuggingFace as a local-first model provider (GGUF/llama-cpp)#12976
Empreiteiro wants to merge 34 commits into main
Conversation
Registers HuggingFace alongside the other configurable providers in the unified model catalog, but runs models locally via langchain-huggingface's HuggingFacePipeline + transformers — no external API calls required.
- Default bundled model: HuggingFaceTB/SmolLM2-360M-Instruct (~720MB), small and CPU-friendly so a fresh install can answer prompts after the first lazy download.
- Catalog ships small instruct checkpoints (SmolLM2 135M/1.7B, Qwen2.5 0.5B/1.5B) plus larger gated options (Llama-3.2 1B/3B, Phi-3.5-mini).
- HUGGINGFACEHUB_API_TOKEN is optional — only needed to pull gated repos.
- Providers with no required variables now stay enabled by default so the HF entry surfaces without the user having to configure credentials.
- New endpoints: GET /api/v1/models/huggingface/installed lists repos present in the local Hub cache, and POST /api/v1/models/huggingface/download eagerly fetches a model via huggingface_hub.snapshot_download (reusing the user's saved token for gated downloads).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
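For orientation, a minimal sketch of what the eager-download endpoint could look like: the route path is from this commit, but the handler shape, request model, and reading the token from the environment are assumptions, not the PR's exact code.

```python
# Hypothetical sketch of POST /api/v1/models/huggingface/download.
import asyncio
import os

from fastapi import APIRouter
from huggingface_hub import snapshot_download
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1/models/huggingface")


class DownloadRequest(BaseModel):
    model_id: str  # e.g. "HuggingFaceTB/SmolLM2-360M-Instruct"


@router.post("/download")
async def download_endpoint(req: DownloadRequest) -> dict:
    # Run the blocking Hub download off the event loop; the saved token
    # (read from the env here for brevity) lets gated repos resolve.
    path = await asyncio.to_thread(
        snapshot_download,
        repo_id=req.model_id,
        token=os.environ.get("HUGGINGFACEHUB_API_TOKEN"),
    )
    return {"model_id": req.model_id, "path": path}
```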
The HuggingFace provider failed at build time with "HUGGINGFACEHUB_API_TOKEN variable not found." even though its token is documented as optional.
Root cause: apply_provider_variable_config_to_build_config unconditionally set load_from_db=True with the canonical variable key on the provider's api_key field, so the runtime tried to resolve a value that the user had never configured and raised.
For *required* provider variables the behavior is unchanged. For optional ones (top-level required=False) we now only auto-install load_from_db=True when the variable is actually present in the user's globals or in the process environment; otherwise we leave the field empty so the runtime gets a None api_key (which the local HuggingFace adapter handles fine).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
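An illustrative sketch of that guard: the field dict shape and the configure helper are simplified assumptions, though `_is_optional_var_configured` is the check this branch introduces.

```python
# Sketch only: optional provider variables resolve from the DB only when
# the user actually configured them; otherwise the field stays empty.
import os


def _is_optional_var_configured(var_name: str, user_globals: set[str]) -> bool:
    """True when the user saved the variable or it is in the process env."""
    return var_name in user_globals or var_name in os.environ


def configure_api_key_field(field: dict, var_name: str, required: bool,
                            user_globals: set[str]) -> None:
    if required or _is_optional_var_configured(var_name, user_globals):
        # Required vars keep the old behavior: resolve the value from the DB.
        field["value"] = var_name
        field["load_from_db"] = True
    else:
        # Optional and unconfigured: leave the field empty so the runtime
        # sees api_key=None, which the local HuggingFace adapter accepts.
        field["value"] = ""
        field["load_from_db"] = False
```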
…toggle
Onboarding simplification: the HuggingFace catalog now ships exactly one model (HuggingFaceTB/SmolLM2-360M-Instruct, ~720MB, fast on CPU). The Settings → Model Providers screen shows a single toggle for it, defaulting to ON.
When the user flips a HuggingFace model toggle on, POST /enabled_models now schedules a background snapshot_download into the local Hub cache so the first flow invocation doesn't pay the cold-start latency. Failures are logged but never block the toggle from being saved. Strong refs to in-flight tasks live at module scope to satisfy RUF006 (see the sketch below).
The unified catalog's "first 5 are default" auto-promotion now defers to explicit per-model `default=True` declarations when any are present, so HuggingFace gets exactly the bundled model on by default while the other providers keep their existing behavior. Additional HF models can still be installed via POST /api/v1/models/huggingface/download with an arbitrary repo id.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
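A hedged sketch of that toggle-time scheduling: the names (`_download_tasks`, `schedule_download`) are illustrative, but the pattern is the one described, a fire-and-forget snapshot_download whose task object is pinned at module scope so RUF006-style garbage collection can't drop it mid-flight.

```python
# Sketch only, not the PR's exact identifiers.
import asyncio

from huggingface_hub import snapshot_download
from loguru import logger

_download_tasks: set[asyncio.Task] = set()  # module-scope strong refs (RUF006)


def _log_and_release(task: asyncio.Task) -> None:
    _download_tasks.discard(task)
    if not task.cancelled() and task.exception() is not None:
        # Failures are logged but never block the toggle from being saved.
        logger.warning(f"HF background download failed: {task.exception()}")


def schedule_download(repo_id: str) -> None:
    task = asyncio.create_task(asyncio.to_thread(snapshot_download, repo_id))
    _download_tasks.add(task)
    task.add_done_callback(_log_and_release)
```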
The earlier "honour explicit default=True per model" change broke the existing TestUnifiedModelsDefaults invariants:
- IBM WatsonX declares default=True on all 7 of its models, so honouring the explicit flags returned 7 defaults where the test expects ≤5.
- Google Generative AI doesn't declare any explicit defaults, so the fallback path was the only one exercised — but the override still changed the contract.
Revert to the original "first 5 models per provider are default" behavior. The HuggingFace onboarding goal (single bundled model toggled on by default) is satisfied automatically because HUGGINGFACE_MODELS_DETAILED now contains exactly one entry, and i=0 < 5 lands it in the default set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local pipeline was triggering worker SIGSEGV on first model load on macOS arm64 + Python 3.12. The crash happened at the very start of the weight download (0% progress), which points at torch's device init path running inside a forked uvicorn worker rather than the download itself.
- device=-1 — force CPU and skip MPS/CUDA negotiation, which is the most fragile leg of torch on first-import-after-fork.
- low_cpu_mem_usage=True — stream weights through the model during from_pretrained instead of double-buffering them, lowering peak RAM.
If the SIGSEGV still happens, the workaround is to start the server with OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES exported (a known torch+Objective-C fork-safety interaction on macOS, not specific to this adapter). Documented inline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
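A minimal sketch of the mitigated construction, assuming the adapter builds on HuggingFacePipeline.from_model_id from langchain-huggingface (the PR names the library; the exact call site is an assumption):

```python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceTB/SmolLM2-360M-Instruct",
    task="text-generation",
    device=-1,  # force CPU: skip the fragile post-fork MPS/CUDA negotiation
    model_kwargs={"low_cpu_mem_usage": True},  # stream weights, lower peak RAM
)
```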
The first flow run that uses the local HuggingFace provider would block the request thread for tens of seconds while transformers pulled ~720MB to ~/.cache/huggingface. Worse, on macOS arm64 + Python 3.12 the load inside a uvicorn worker can SIGSEGV on torch's device init.
Pre-warming the cache during lifespan startup uses huggingface_hub.snapshot_download exclusively (no torch import), so it cannot trigger the worker SIGSEGV — and by the time the user sends the first message, the weights are already on disk and the inference path only pays the load + generate cost.
- Runs as a background task; tracked alongside sync_flows_from_fs_task and mcp_init_task and cancelled on lifespan shutdown.
- Skippable via LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD=true (1/yes also work).
- Forwards HUGGINGFACEHUB_API_TOKEN if set in env so gated default models would still pull.
- Failures are logged at warning and never block startup; the first inference call will retry the download on demand.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
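A sketch of the prefetch under those constraints; the task wiring around lifespan is simplified and the function name is illustrative.

```python
# Sketch only: torch-free cache warm-up that never blocks startup.
import asyncio
import os

from huggingface_hub import snapshot_download
from loguru import logger


def _prefetch_default_model() -> None:
    if os.environ.get("LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD", "").lower() in {"true", "1", "yes"}:
        return
    try:
        snapshot_download(
            repo_id="HuggingFaceTB/SmolLM2-360M-Instruct",
            token=os.environ.get("HUGGINGFACEHUB_API_TOKEN"),  # gated repos
        )
    except Exception as exc:  # logged at warning; first inference retries
        logger.warning(f"HF prefetch failed, will retry on demand: {exc}")


# inside lifespan(), alongside sync_flows_from_fs_task / mcp_init_task:
# prefetch_task = asyncio.create_task(asyncio.to_thread(_prefetch_default_model))
```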
…tened
The startup prefetch was triggering a server crash loop on macOS arm64:
huggingface_hub.snapshot_download itself segfaulted at 0% (parallel
download backend interacting badly with forked uvicorn workers), the
worker died, uvicorn auto-reload restarted, and the cycle repeated. The
log also showed a "4.66 GB" total because the unfiltered snapshot pulled
every weight format the repo carries (safetensors + pytorch_model.bin +
ONNX + GGML).
Two changes:
1. Flip prefetch to opt-in: LANGFLOW_PREFETCH_HF_DEFAULT=true (was
"skip" via LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD). Default is now OFF so
a fresh install never crash-loops; users who actually want the warm
cache enable it explicitly.
2. Harden download_model:
- allow_patterns restricts the snapshot to safetensors + tokenizer +
config (no pytorch_model.bin, ONNX, GGML, etc.) - typically cuts
download size by 4-6x.
- max_workers=1 serializes file fetches; the multi-thread path is
what was crashing inside the worker on macOS arm64.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
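A sketch of the hardened download described in change 2 above; the helper name download_model is from this commit, while the exact pattern list is an assumption that matches "safetensors + tokenizer + config".

```python
from huggingface_hub import snapshot_download


def download_model(repo_id: str, token: str | None = None) -> str:
    return snapshot_download(
        repo_id=repo_id,
        # Only safetensors + tokenizer + config; skipping pytorch_model.bin,
        # ONNX, and GGML typically cuts the download size by 4-6x.
        allow_patterns=["*.safetensors", "*.json", "tokenizer*", "*.txt"],
        # Serialized fetches: the multi-thread path is what was crashing
        # inside forked uvicorn workers on macOS arm64.
        max_workers=1,
        token=token,
    )
```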
… (GGUF)
The transformers + torch path was unsalvageable on macOS arm64 + Python
3.12: both the inference load (torch device init in a forked uvicorn
worker) and the snapshot_download parallel fetcher SIGSEGV'd, with no
Python-level recovery possible. Mitigations like device=-1,
low_cpu_mem_usage, max_workers=1, and the OBJC fork-safety env var only
narrowed the failure window without closing it.
This commit replaces the backend wholesale:
- ChatHuggingFace now produces a langchain_community ChatLlamaCpp
(llama-cpp-python under the hood). No torch import, no fork-safety
pitfall, fast on CPU thanks to quantization.
- The bundled default flips from
HuggingFaceTB/SmolLM2-360M-Instruct (~720MB safetensors)
to
bartowski/SmolLM2-360M-Instruct-GGUF, file
SmolLM2-360M-Instruct-Q4_K_M.gguf (~270MB).
Smaller download, similar quality, runs in <500MB RAM.
- download_model uses hf_hub_download for a single .gguf file (no
snapshot_download, no parallel fetcher).
- list_installed_models now filters cache entries to repos that actually
contain a .gguf file we can load.
- A small in-process cache keys ChatLlamaCpp instances by
(model_path, temperature, max_tokens) so repeat calls reuse the same
mmapped model instead of reloading (see the sketch after this commit).
- Filename selection: catalog overrides via GGUF_FILENAME_BY_REPO; fallback
is "<model-name>-Q4_K_M.gguf" which works for all bartowski-style repos.
llama-cpp-python already ships in langflow-base[llama-cpp], which is part
of the full langflow install.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
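The single-file fetch and the instance cache, sketched together; the fallback-filename derivation via removesuffix is an assumption that matches the bartowski naming described above, and lru_cache stands in for whatever cache structure the adapter actually uses.

```python
from functools import lru_cache

from huggingface_hub import hf_hub_download
from langchain_community.chat_models import ChatLlamaCpp


def download_model(repo_id: str, filename: str | None = None) -> str:
    # Single-file fetch: no snapshot_download, no parallel fetcher.
    # Fallback "<model-name>-Q4_K_M.gguf" matches bartowski-style repos.
    name = filename or f"{repo_id.split('/')[-1].removesuffix('-GGUF')}-Q4_K_M.gguf"
    return hf_hub_download(repo_id=repo_id, filename=name)


@lru_cache(maxsize=8)
def get_chat_model(model_path: str, temperature: float, max_tokens: int) -> ChatLlamaCpp:
    # Keyed by (model_path, temperature, max_tokens): repeat calls reuse
    # the same mmapped weights instead of reloading them.
    return ChatLlamaCpp(model_path=model_path, temperature=temperature,
                        max_tokens=max_tokens)
```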
…allback)
The single-file hf_hub_download path was still crashing uvicorn workers on macOS arm64. Two complementary fixes:
1. Disable accelerated backends at module import time. xet and hf_transfer both spawn worker threads/processes whose fork-safety is broken on this platform. Forcing the plain HTTP path is more than fast enough for ~270MB GGUFs.
   - HF_HUB_DISABLE_XET=1
   - HF_HUB_ENABLE_HF_TRANSFER=0
   - HF_HUB_DISABLE_TELEMETRY=1 (drops one more import)
   - HF_HUB_DISABLE_PROGRESS_BARS=1 (drops tqdm in worker context)
2. If the in-process call still raises, retry the exact same hf_hub_download in an isolated subprocess via subprocess.run. A child process that crashes can't take the parent uvicorn worker with it; the parent recovers, logs the failure, and propagates the path captured from stdout when the subprocess succeeds. 600s timeout to bound network stalls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
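A sketch of the subprocess fallback in point 2; the function name and the inline `-c` payload are assumptions, but the mechanics (child prints the path, parent reads stdout, 600s timeout) follow the commit.

```python
# Sketch only: a crashing child cannot take the parent uvicorn worker down.
import subprocess
import sys


def download_in_subprocess(repo_id: str, filename: str, timeout: int = 600) -> str:
    code = (
        "from huggingface_hub import hf_hub_download;"
        f"print(hf_hub_download(repo_id={repo_id!r}, filename={filename!r}))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    # The child prints the resolved cache path as its last stdout line.
    return result.stdout.strip().splitlines()[-1]
```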
llama-cpp-python lives in the [local] extra of langflow-base but wasn't included in [complete], so the langflow main install (which pulls langflow-base[complete]) shipped without it. The new HuggingFace local provider needs llama-cpp-python at runtime, so the user saw: ImportError: Could not import llama-cpp-python library.
Pulling [local] into [complete] makes the full langflow install include it without bloating bare langflow-base setups (which can still skip it). For an existing dev env, install on demand: uv pip install llama-cpp-python, or: uv sync --reinstall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Makefile install_backend target uses 'uv sync --frozen', which only installs what's in uv.lock and ignores fresh additions to pyproject.toml. Without regenerating the lockfile, contributors running 'make run_cli' or 'make backend' wouldn't get llama-cpp-python even though [complete] now references [local].
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Language Model component renders the HF logo correctly because it reads icon directly from the backend's option metadata. The Agent component filters models by tool_calling=True; HF (which doesn't claim tool_calling) doesn't land in that filtered list, so the trigger falls through to providersData[*].icon — which goes through the frontend's hardcoded getProviderIcon lookup. That map didn't include HuggingFace, so it returned 'Bot' and rendered the lucide robot icon next to the HF model in the trigger.
Adding HuggingFace -> "HuggingFace" to the lookup makes the trigger match the dropdown list and the Language Model component.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctivate
Summary
- Add display_name and url to ModelMetadata so the UI can render short Ollama-style slugs (smollm2, granite3.1:2b, qwen2.5:3b, hermes3, ...) while keeping the full HF repo id as the canonical identifier used for downloads and for the in-process model cache.
- Expand the bundled HuggingFace catalog from one entry to seven curated laptop-friendly GGUF models (SmolLM2, Granite 3.1 2B/8B, Qwen2.5 1.5B/3B, Hermes 3, Phi 3.5 mini). All resolve via the existing bartowski-style filename heuristic — no GGUF_FILENAME_BY_REPO entries needed.
- Frontend ModelTrigger and ModelList render display_name with a fallback to name; a hover-only ExternalLink icon next to each row opens the model's HF page so the canonical repo id stays one click away.
- Provider settings now show a single primary/destructive Activate / Deactivate toggle for providers that don't require credentials, replacing the redundant disabled "X Activated" + separate "Deactivate" pair the previous flow rendered for HuggingFace and Ollama-style providers.
…calmer credentialless UX
Catalog default override
- model_catalog.get_unified_models_detailed used to force the first 5
models per provider to default=True regardless of what the catalog
said. That made every laptop-friendly HF GGUF auto-enabled (and on
toggle, auto-downloaded). Now: if any catalog entry sets default=True
explicitly, only those entries stay default; legacy catalogs with no
explicit default keep the position-based first-N fallback.
- Net effect for the new HF catalog: only the bundled SmolLM2 ships as
default-on. The other six light up only when the user toggles them in
Settings.
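A sketch of that default-selection rule; the function name and dict shape are illustrative, the branch logic is the one described above.

```python
# Sketch only: explicit default=True flags win; legacy catalogs with no
# explicit defaults keep the position-based first-N fallback.
def mark_defaults(models: list[dict], first_n: int = 5) -> None:
    if any(m.get("default") is True for m in models):
        for m in models:
            m["default"] = m.get("default") is True
    else:
        for i, m in enumerate(models):
            m["default"] = i < first_n
```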
display_name reaches the canvas dropdown
- get_language_model_options rebuilt option dicts from scratch and
dropped display_name and url from the catalog metadata. Forwarded
both onto option_metadata so build_config-sourced options carry the
short slug all the way to the trigger.
- modelInputComponent's first loop now lifts display_name and url out
of option.metadata onto the top-level ModelOption fields, matching
what the augmentation loop already did. Without this, saved options
rendered the canonical repo id even after the augmentation was fine.
- ModelSelection (the per-provider toggle list in Settings) now renders
display_name with a hover-only ExternalLink to the model's page,
mirroring the dropdown affordance.
Credentialless provider UX
- ProviderConfigurationForm no longer titles the panel with the optional
variable's name ("HuggingFace Hub API Token") for providers that have
no required credentials — that text was misleading because the
variable is genuinely optional. The header is just the provider name.
- Subtext switches to "Runs locally — no credentials needed." for the
no-required-vars branch, dropping the "Activate to enable these
models" boilerplate that didn't apply to local inference.
- The activate/deactivate control collapses to a small right-aligned
button: outline+destructive accent when enabled, primary when not.
No more full-width red banner for a toggle.
The model-card link icon next to dropdown rows was rejected as visual noise, so this removes the rendering and the unused plumbing that backed it. ``url`` comes off the ModelMetadata TypedDict, the ``create_model_metadata`` helper, the HuggingFace catalog entries, the catalog-to-options forwarder, and the ``ModelOption`` frontend type + extraction sites in modelInputComponent / ModelSelection / ModelList. ``display_name`` and the rest of the work in this branch stay.
…er api_key
Wrench-on-enabled-model bug
- modelInputComponent's first loop honored a sticky not_enabled_locally flag that the backend bakes into a flow's saved options when the model wasn't in the user's enabled list at the time. Once the user enabled the provider afterwards (e.g. flipped HuggingFace on), the flag stayed glued to the saved option and rendered a "Configure" wrench next to a perfectly valid selection — even though enabled_models reported the model as enabled.
- Now the loop detects "sticky flag set but the model is enabled right now" and strips the flag from the option's metadata before grouping. The legitimate use (model genuinely disabled in settings) still surfaces the wrench.
HUGGINGFACEHUB_API_TOKEN variable not found
- apply_provider_variable_config_to_build_config was leaving the api_key field's stale state intact when switching from a configured provider (OpenAI) to a credentialless one (HuggingFace) where the optional variable wasn't configured. The runtime then tried to resolve the previous provider's var name — or, in the broken case, a HuggingFace var name the user never set — and raised "<VAR> variable not found."
- Added an explicit cleanup branch under the existing skip_optional path: clear value and load_from_db when the field is left pointing at any unconfigured optional variable (cross-provider stale or pointing at the new provider's own unconfigured var). The user has to re-select the provider once for the cleanup to fire on already-bad flows; new selections come out clean automatically.
…rs from assistant
- Add readModelDisplayName helper at utils/modelDisplay.ts and route every metadata.display_name read through it (modelInputComponent's four ModelOption-building paths, the Settings ModelSelection row, and the Assistant ModelSelector's two callsites). Removes seven copies of the same "typeof md.display_name === string ? ... : ..." check.
- Simplify the stale not_enabled_locally cleanup in modelInputComponent via destructuring instead of Object.fromEntries(filter).
- Hide LOCAL_INFERENCE_PROVIDERS (HuggingFace) from the Assistant's model picker so its auto-default never lands on a model the assistant code path can't run yet, and have the Assistant render display_name with a fallback to the canonical id.
…lless providers
The activate/deactivate toggle for credentialless providers (HuggingFace
local) was using machinery built for credentialed providers and did
nothing useful:
- "Deactivate" called handleDisconnect, which only deletes a saved
credential variable. HuggingFace doesn't require one and the user
hadn't set the optional token, so the handler early-returned and the
click was a silent no-op.
- "Activate" called handleActivateProvider, which created a fake
HUGGINGFACEHUB_API_TOKEN with the Ollama URL placeholder
("http://localhost:11434") just to make a credential row exist.
- The credentialless deactivate path also routed through the
destructive DisconnectWarning dialog inherited from the credentialed
branch, which is over-the-top for what is really a toggle.
Add a credentialless toggle path that operates on the model layer
instead of the credential layer:
- toggleAllProviderModels(action) batches a single POST to
/api/v1/models/enabled_models for every model the provider ships.
``deactivate`` flips them all off (default models land in
``__disabled_models__``); ``activate`` flips only catalog defaults
back on. The provider's ``is_enabled`` flag is computed from
has_active_model on the API side, so this drives the UI's Activate
vs Deactivate label.
- ProviderConfigurationForm's credentialless branch now wires its
single button to onActivateDefaultModels / onDeactivateAllModels and
no longer pops the warning dialog.
- handleActivateProvider and the form's ``onActivate`` prop are dead
code now that the only credentialless provider doesn't need them;
drop both rather than leaving misleading wiring around.
Per-model switches in the provider settings panel stayed interactive even
after the user clicked Deactivate on a credentialless provider, so
toggling an individual model on silently re-activated the whole provider
without ever going through the explicit Activate button — exactly the
state the new credentialless toggle path was meant to gate.
Two narrow changes:
- ModelProvidersContent passes ``isEnabledModel = !!is_enabled`` instead
of ``is_enabled || is_configured``. ``is_configured`` is True for
HuggingFace from the moment the provider exists (no required
credential to satisfy), so the OR was always true and the gate did
nothing for the new credentialless flow. ``is_enabled`` reflects
``has_active_model`` on the API side, which matches what the user
expects "is the provider on right now" to mean.
- ModelSelection's ModelRow now always renders the Switch but passes
``disabled={!isEnabledModel}`` rather than gating the whole render.
The user keeps a visible read of each model's saved state and the
built-in disabled styling (opacity-50 + cursor-not-allowed) makes the
inactive state obvious. Toggling is blocked until the user clicks
Activate.
…f bleeding over the panel
Bring back the destructive-action confirmation for the credentialless Deactivate flow but render it inline where the toggle button used to sit, sized to its own content. The existing credentialed branch still uses the absolute inset-0 overlay because that's what its layout budgets for; the credentialless branch never had that space allocated and the overlay leaked across the form, the model list, and the panel edges.
- Credentialless Deactivate button now flips ``showDisconnectWarning`` on instead of calling ``onDeactivateAllModels`` directly. Confirm fires the model-toggle path, Cancel restores the button.
- The DisconnectWarning swaps in for the toggle row when shown (if/else instead of overlapping render) so the form's flex flow stays clean — no absolute positioning, no fixed h-[165px], no margin-from-the-edge gymnastics.
- The credentialed branch's overlay stays put behind a ``requiresConfiguration`` guard so it doesn't render for the new credentialless path.
…rlay as Disconnect
Render exactly one DisconnectWarning for both credentialed (Disconnect) and credentialless (Deactivate) paths. Same absolute inset-0 overlay, same dimensions, same animation — the only difference is the message text and which handler ``onConfirm`` calls. Eliminates the inline variant added in the previous commit so the destructive-confirm UX is consistent across providers.
…s ancestor
The overlay used ``absolute inset-0 m-4 ... h-[165px]`` but the form container had no ``position`` set, so the absolute positioning resolved against a larger ancestor (the modal panel). For credentialed providers the form is tall enough that 165px sits inside the form area; for the credentialless branch the form is ~100px tall and the same overlay spilled into the per-model toggle list rendered below.
- Add ``relative`` to the form's outer div so the overlay anchors to the form itself.
- Drop the fixed ``h-[165px]`` and replace ``inset-0`` with ``inset-x-0 top-0``: the warning now sizes to its message + buttons via ``h-fit`` (already on DisconnectWarning) and pins to the top of the form.
Credentialed and credentialless render identically — the height just adapts.
The custom DisconnectWarning was rendered as an absolutely-positioned panel laid over the form. Sizing it required guessing how tall the host form would be (the credentialed branch was tuned to ~165px; the new credentialless form is ~100px, so the overlay leaked into the per-model toggle list). Repeated attempts to anchor it cleanly produced visual regressions — empty bordered box on top, content rendering through to the rows behind, second ghost frame at the bottom.
Step back: use the project's standard Dialog component for both branches and stop hand-rolling overlay positioning.
- Both Disconnect (credentialed) and Deactivate (credentialless) now open a centered modal Dialog with header/description/footer; same shell, branched only on the message text and which handler the Confirm button calls.
- Drop the now-orphaned DisconnectWarning component and its tests — nothing else imported it.
…penAI's toggle list
Two regressions to undo:
1. The previous commit replaced DisconnectWarning with a brand-new Dialog. That broke the platform-wide pattern five other providers already use — there was no reason to invent a second confirm UX. Restore DisconnectWarning, delete the Dialog imports, and route both the credentialed Disconnect and the new credentialless Deactivate through it (same overlay, same height, same styling, branched only on message text and which handler Confirm calls).
2. ModelProvidersContent's per-model gate was tightened to ``!!is_enabled`` so credentialless providers couldn't silently re-activate via toggling a single model. That regressed credentialed providers: with all models disabled, ``is_enabled`` (= has_active_model) was false, the toggle list went disabled, and the user lost the only path to re-enable any model — looking for all the world like the provider had auto-disconnected. Branch on ``requiresConfiguration`` instead so credentialed keeps the legacy ``is_configured`` gate and only credentialless gets the new ``is_enabled`` gate.
…overlay
When DisconnectWarning is visible the credentialless form was still ~100px tall, so the next flex sibling (ModelSelection) sat under the overlay's bottom edge — the first model row peeked out from beneath the Cancel/Confirm buttons.
Force a min-height of 200px on the form while ``showDisconnectWarning`` is true (overlay is 165px high + 32px of inset margins ≈ 197px). The form then occupies enough space for ModelSelection to slide down past the overlay automatically. Credentialed forms (OpenAI, Anthropic, ...) are already taller than 200px, so the min-h is a no-op there.
…ctivated look of the others
ProviderListItem treated ``is_configured`` as "active enough to badge". HuggingFace is configured for free (no required credential variable), so even after Deactivate the row stayed colored and showed the model count badge — visually it looked just like a configured cloud provider, nothing like Anthropic / Google / Watsonx / Ollama in their not-yet-set-up state.
Drop the ``is_configured`` fallback for credentialless providers: their ``isActive`` collapses to ``is_enabled`` (= has any active model). When HuggingFace is deactivated the row goes grayscale + muted-text + plus icon, consistent with the rest of the unconfigured list.
Also extract the provider-category list to ``utils/providerCategories`` and reuse it in ``use-enabled-models`` (which had its own copy of the same set under a different name) instead of keeping two ad-hoc string sets in sync.
…ware active flag
The provider list relied on the backend's sort_key, which puts configured providers above unconfigured ones. HuggingFace is configured for free (no required credential variable), so the backend sorts it with the active providers even when the user has clicked Deactivate — visually it sat between Anthropic and OpenAI instead of alphabetically with the other not-yet-set-up providers.
ProviderList now re-sorts client-side using ``isCredentiallessProvider`` to decide what counts as "active": credentialed providers keep the ``is_enabled || is_configured`` definition; credentialless providers collapse to ``is_enabled``. Active rows still come first, alphabetical within each group, so deactivated HuggingFace ends up between Google and IBM as expected.
…r metadata
Replace the hardcoded ``CREDENTIALLESS_PROVIDERS = {"HuggingFace"}`` set
with a check against the provider's own variables list, which the API
already exposes alongside is_enabled/is_configured. Mirrors the
backend's own ``has_required_vars`` heuristic in
``_validate_and_get_enabled_providers``: a provider is credentialless
when it declares no variables marked ``required: true``.
- ``isCredentiallessProvider(provider)`` now takes the provider object
and inspects ``provider.variables`` (defensive: missing/empty list
falls back to credentialed so we don't accidentally drop the legacy
``is_enabled || is_configured`` activation rule).
- ``ModelProviderInfo`` (the API type) and the modal's ``Provider``
type both gain the ``variables`` array. ``ProviderList`` forwards
it through to ``ProviderListItem``.
- Adding a new credentialless provider on the backend (e.g. another
local-inference adapter) now lights up automatically — no second
list to update on the frontend.
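A Python sketch mirroring the backend's ``has_required_vars`` heuristic named above; the function name and dict shape are assumptions, and the empty-list case deliberately falls back to credentialed, matching the defensive rule.

```python
# Sketch only: a provider is credentialless when it declares variables
# and none of them is marked required.
def is_credentialless(provider: dict) -> bool:
    variables = provider.get("variables") or []
    if not variables:
        # Defensive: missing/empty list keeps the legacy credentialed rule.
        return False
    return not any(v.get("required") for v in variables)
```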
Walkthrough
This PR adds local HuggingFace GGUF model support via
Changes
- Model Metadata & Provider Infrastructure
- HuggingFace Local Model Backend
- Frontend Model Display & Provider UI
Sequence Diagram
```mermaid
sequenceDiagram
    participant Frontend as Frontend App
    participant API as Langflow API
    participant HFHub as HuggingFace Hub
    participant LocalCache as Local Cache
    participant UI as UI Components
    rect rgba(200, 150, 255, 0.5)
        note over Frontend,UI: Model Display & Provider Classification
        Frontend->>UI: Render ModelSelector/ModelList
        activate UI
        UI->>UI: readModelDisplayName(metadata)
        UI->>UI: isCredentiallessProvider(variables)
        UI->>Frontend: display_name + isEnabled state
        deactivate UI
    end
    rect rgba(150, 200, 255, 0.5)
        note over Frontend,API: Background Download Flow
        Frontend->>API: POST /models/huggingface/download {model_id}
        activate API
        API->>API: _maybe_schedule_huggingface_downloads()
        API->>API: asyncio.to_thread(download_model)
        API-->>Frontend: {model_id, path} (async)
        API->>HFHub: hf_hub_download (background)
        HFHub->>LocalCache: Store .gguf file
        deactivate API
    end
    rect rgba(200, 255, 150, 0.5)
        note over Frontend,API: Startup Warm-Up
        API->>API: _prefetch_default_huggingface_model()
        activate API
        API->>HFHub: Download default model (task)
        HFHub->>LocalCache: Cache default .gguf
        API->>API: Log completion (non-blocking)
        deactivate API
    end
    rect rgba(255, 200, 150, 0.5)
        note over Frontend,API: Provider Activation/Deactivation
        Frontend->>API: updateEnabledModelsAsync({updates})
        activate API
        API->>API: apply_provider_variable_config (per user_id)
        API->>API: _is_optional_var_configured check
        API-->>Frontend: Success alert
        deactivate API
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Drop granite-3.1-2b/8b, Qwen2.5-1.5b/3b, Hermes-3, and Phi-3.5 from ``HUGGINGFACE_MODELS_DETAILED`` so the providers modal shows just the bundled SmolLM2 entry. Users who want a different GGUF can still download one via ``POST /api/v1/models/huggingface/download`` with any HF repo id; the catalog only governs what's preselectable in the UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… selection
HuggingFace declares no required variables, so it always lands in ``enabled_providers`` even when the user hasn't configured anything. The naive ``options[0]`` / ``flatOptions[0]`` fallbacks (backend ``update_model_options_in_build_config`` and frontend ``ModelInputComponent``) therefore let HF outrank credentialed providers when no user default existed, and the auto-select effect wrote that HF entry back as the saved value — overwriting what the user actually configured.
Both call sites now prefer the first option from a credentialed provider, falling back to the very first option only when *every* available provider is credentialless. Existing sticky-default behavior is untouched, so an explicit user selection still persists across re-renders and refreshes (verified by the existing ``test_update_model_options_injects_saved_value_when_missing_from_options`` and ``test_update_model_options_does_not_duplicate_saved_value_already_in_options``).
Adds ``is_credentialless_provider`` in ``provider_queries`` (mirrors the existing ``isCredentiallessProvider`` frontend helper) so the rule is derived from ``MODEL_PROVIDER_METADATA`` rather than hard-coded against HuggingFace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
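A sketch of the credentialed-first fallback both call sites now apply; the option shape and passing the credentialless set as a parameter are simplifications of the ``is_credentialless_provider`` helper this commit adds.

```python
# Sketch only: prefer the first option whose provider needs credentials.
def pick_default_option(options: list[dict], credentialless: set[str]) -> dict | None:
    if not options:
        return None
    for opt in options:
        if opt["provider"] not in credentialless:
            return opt  # first option from a credentialed provider wins
    # Every available provider is credentialless: fall back to options[0].
    return options[0]


# e.g. pick_default_option(opts, credentialless={"HuggingFace"})
```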
The previous default (SmolLM2-360M-Instruct) wasn't fine-tuned for
tool calling. The Agent component filters its model dropdown by
``tool_calling=True``, so HF-only setups with the trimmed catalog
landed on an empty Agent dropdown and the model wasn't auto-selected.
Qwen2.5-0.5B-Instruct (Q4_K_M, ~400 MB on disk) is the smallest model
in the Qwen2.5 family that ships with tool-calling fine-tuning, so the
bundled default now satisfies the Agent's filter and the dropdown
auto-populates as expected.
Updates:
* ``DEFAULT_HUGGINGFACE_MODEL`` →
``bartowski/Qwen2.5-0.5B-Instruct-GGUF``
* ``DEFAULT_GGUF_FILENAME`` →
``Qwen2.5-0.5B-Instruct-Q4_K_M.gguf``
* Catalog entry: ``display_name="qwen2.5:0.5b"``,
``tool_calling=True``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- llama-cpp-python + GGUF (no torch, no transformers, fork-safe on macOS arm64).
- bartowski/SmolLM2-360M-Instruct-GGUF (Q4_K_M, ~270MB). Settings → Model Providers shows one toggle for it, on by default.
- POST /enabled_models schedules a single-file download so the cache is warm by the next invocation.
- POST /api/v1/models/huggingface/download {"model_id": "<gguf-repo-id>"} accepts any HF repo id that publishes GGUF weights.
- LANGFLOW_PREFETCH_HF_DEFAULT=true warms the cache on lifespan start. Off by default.
- If hf_hub_download ever crashes the worker, a subprocess retry kicks in so the parent uvicorn worker survives.
Summary by CodeRabbit
New Features
Improvements