
feat: add HuggingFace as a local-first model provider (GGUF/llama-cpp) #12976

Open
Empreiteiro wants to merge 34 commits into main from fix/hf-pr-improvements

Conversation

Empreiteiro (Collaborator) commented May 4, 2026

Summary

  • Registers HuggingFace alongside the other configurable providers, running models locally via llama-cpp-python + GGUF (no torch, no transformers, fork-safe on macOS arm64).
  • Onboarding ships a single bundled model, bartowski/SmolLM2-360M-Instruct-GGUF (Q4_K_M, ~270MB). Settings → Model Providers shows one toggle for it, on by default.
  • Toggle-on triggers background download — flipping a HuggingFace model on in POST /enabled_models schedules a single-file download so the cache is warm by the next invocation.
  • Add-more-models via API: POST /api/v1/models/huggingface/download {"model_id": "<gguf-repo-id>"} accepts any HF repo id that publishes GGUF weights.
  • Optional startup prefetch: LANGFLOW_PREFETCH_HF_DEFAULT=true warms the cache on lifespan start. Off by default.
  • Subprocess-isolated downloads — if hf_hub_download ever crashes the worker, a subprocess retry kicks in so the parent uvicorn worker survives.
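
For illustration, a client call to the new download endpoint might look like the sketch below (the base URL and x-api-key header are assumptions about a local deployment, not part of this PR):

```python
# Hypothetical client sketch: request a GGUF download through the new endpoint.
# The route and payload come from this PR; the base URL and API key are assumptions.
import httpx

LANGFLOW_URL = "http://localhost:7860"  # assumed local dev server
API_KEY = "sk-..."                      # assumed Langflow API key

resp = httpx.post(
    f"{LANGFLOW_URL}/api/v1/models/huggingface/download",
    json={"model_id": "bartowski/SmolLM2-360M-Instruct-GGUF"},
    headers={"x-api-key": API_KEY},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. the scheduled model_id / cache path returned by the endpoint
```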

Summary by CodeRabbit

  • New Features

    • Added local HuggingFace model download and management capabilities with new API endpoints to list and download models.
    • Introduced background task tracking for model downloads to prevent task loss.
    • Added optional automatic prefetch of default HuggingFace models on startup.
  • Improvements

    • Enhanced model display with customizable display names in model selectors.
    • Improved handling of credentialless model providers.
    • Better metadata and variable information for provider credentials.

Empreiteiro and others added 28 commits May 1, 2026 16:21
Registers HuggingFace alongside the other configurable providers in the
unified model catalog, but runs models locally via langchain-huggingface's
HuggingFacePipeline + transformers — no external API calls required.

- Default bundled model: HuggingFaceTB/SmolLM2-360M-Instruct (~720MB),
  small and CPU-friendly so a fresh install can answer prompts after the
  first lazy download.
- Catalog ships small instruct checkpoints (SmolLM2 135M/1.7B, Qwen2.5
  0.5B/1.5B) plus larger gated options (Llama-3.2 1B/3B, Phi-3.5-mini).
- HUGGINGFACEHUB_API_TOKEN is optional — only needed to pull gated repos.
- Providers with no required variables now stay enabled by default so the
  HF entry surfaces without the user having to configure credentials.
- New endpoints: GET  /api/v1/models/huggingface/installed lists repos
  present in the local Hub cache, and POST /api/v1/models/huggingface/
  download eagerly fetches a model via huggingface_hub.snapshot_download
  (reusing the user's saved token for gated downloads).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The HuggingFace provider failed at build time with "HUGGINGFACEHUB_API_TOKEN
variable not found." even though its token is documented as optional. Root
cause: apply_provider_variable_config_to_build_config unconditionally set
load_from_db=True with the canonical variable key on the provider's
api_key field, so the runtime tried to resolve a value that the user had
never configured and raised.

For *required* provider variables the behavior is unchanged. For optional
ones (top-level required=False) we now only auto-install load_from_db=True
when the variable is actually present in the user's globals or in the
process environment; otherwise we leave the field empty so the runtime
gets a None api_key (which the local HuggingFace adapter handles fine).
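
Roughly, the check amounts to something like this sketch (the helper name matches the one added in this PR, but the signature and lookup plumbing here are illustrative):

```python
# Minimal sketch, assuming a helper that decides whether an *optional* provider
# variable is actually configured before build config sets load_from_db=True.
import os


def _is_optional_var_configured(var_name: str, user_global_variables: set[str]) -> bool:
    """True when the optional provider variable has a value in the user's globals or the env."""
    return var_name in user_global_variables or bool(os.environ.get(var_name))

# Only when this returns True does the api_key field get load_from_db=True; otherwise
# the field stays empty and the local HuggingFace adapter receives a None api_key.
```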

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…toggle

Onboarding simplification: the HuggingFace catalog now ships exactly one
model (HuggingFaceTB/SmolLM2-360M-Instruct, ~720MB, fast on CPU). The
Settings → Model Providers screen shows a single toggle for it, defaulting
to ON.

When the user flips a HuggingFace model toggle on, POST /enabled_models
now schedules a background snapshot_download into the local Hub cache so
the first flow invocation doesn't pay the cold-start latency. Failures
are logged but never block the toggle from being saved. Strong refs to
in-flight tasks live at module scope to satisfy RUF006.
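
The scheduling pattern amounts to roughly the following sketch (function and set names are illustrative, not the PR's actual identifiers):

```python
# Illustrative sketch of fire-and-forget download scheduling with strong task refs.
import asyncio
import logging

from huggingface_hub import snapshot_download

logger = logging.getLogger(__name__)

# Module scope keeps strong references so tasks aren't garbage-collected mid-flight (RUF006).
_INFLIGHT_DOWNLOADS: set[asyncio.Task] = set()


async def _download_in_background(repo_id: str) -> None:
    try:
        # Run the blocking Hub download off the event loop.
        await asyncio.to_thread(snapshot_download, repo_id=repo_id)
    except Exception as exc:  # noqa: BLE001 - failures are logged, never block the toggle
        logger.warning("Background HuggingFace download failed for %s: %s", repo_id, exc)


def schedule_download(repo_id: str) -> None:
    task = asyncio.create_task(_download_in_background(repo_id))
    _INFLIGHT_DOWNLOADS.add(task)
    task.add_done_callback(_INFLIGHT_DOWNLOADS.discard)
```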

The unified catalog's "first 5 are default" auto-promotion now defers to
explicit per-model `default=True` declarations when any are present, so
HuggingFace gets exactly the bundled model on by default while the other
providers keep their existing behavior.

Additional HF models can still be installed via
POST /api/v1/models/huggingface/download with an arbitrary repo id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier "honour explicit default=True per model" change broke the
existing TestUnifiedModelsDefaults invariants:
- IBM WatsonX declares default=True on all 7 of its models, so honouring
  the explicit flags returned 7 defaults where the test expects ≤5.
- Google Generative AI doesn't declare any explicit defaults, so the
  fallback path was the only one exercised — but the override still
  changed the contract.

Revert to the original "first 5 models per provider are default"
behavior. The HuggingFace onboarding goal (single bundled model toggled
on by default) is satisfied automatically because HUGGINGFACE_MODELS_DETAILED
now contains exactly one entry, and i=0 < 5 lands it in the default set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local pipeline was triggering worker SIGSEGV on first model load on
macOS arm64 + Python 3.12. The crash happened at the very start of the
weight download (0% progress), which points at torch's device init path
running inside a forked uvicorn worker rather than the download itself.

- device=-1 — force CPU and skip MPS/CUDA negotiation, which is the most
  fragile leg of torch on first-import-after-fork.
- low_cpu_mem_usage=True — stream weights through the model during
  from_pretrained instead of double-buffering them, lowering peak RAM.

If the SIGSEGV still happens, the workaround is to start the server with
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES exported (a known
torch+Objective-C fork-safety interaction on macOS, not specific to this
adapter). Documented inline.
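
For reference, the mitigation described above would look roughly like this, assuming langchain-huggingface's HuggingFacePipeline.from_model_id (illustrative only; this transformers/torch path is replaced by the GGUF backend in a later commit):

```python
# Hedged sketch of the CPU-only, low-memory load described above.
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceTB/SmolLM2-360M-Instruct",
    task="text-generation",
    device=-1,  # force CPU, skip MPS/CUDA negotiation after fork
    model_kwargs={"low_cpu_mem_usage": True},  # stream weights, lower peak RAM
    pipeline_kwargs={"max_new_tokens": 256},
)
print(llm.invoke("Hello"))
```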

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first flow run that uses the local HuggingFace provider would block
the request thread for tens of seconds while transformers pulled ~720MB
to ~/.cache/huggingface. Worse, on macOS arm64 + Python 3.12 the load
inside a uvicorn worker can SIGSEGV on torch's device init.

Pre-warming the cache during lifespan startup uses
huggingface_hub.snapshot_download exclusively (no torch import), so it
cannot trigger the worker SIGSEGV — and by the time the user sends the
first message, the weights are already on disk and the inference path
only pays the load + generate cost.

- Runs as a background task; tracked alongside sync_flows_from_fs_task
  and mcp_init_task and cancelled on lifespan shutdown.
- Skippable via LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD=true (1/yes also work).
- Forwards HUGGINGFACEHUB_API_TOKEN if set in env so gated default
  models would still pull.
- Failures are logged at warning and never block startup; the first
  inference call will retry the download on demand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tened

The startup prefetch was triggering a server crash loop on macOS arm64:
huggingface_hub.snapshot_download itself segfaulted at 0% (parallel
download backend interacting badly with forked uvicorn workers), the
worker died, uvicorn auto-reload restarted, and the cycle repeated. The
log also showed a "4.66 GB" total because the unfiltered snapshot pulled
every weight format the repo carries (safetensors + pytorch_model.bin +
ONNX + GGML).

Two changes:

1. Flip prefetch to opt-in: LANGFLOW_PREFETCH_HF_DEFAULT=true (was
   "skip" via LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD). Default is now OFF so
   a fresh install never crash-loops; users who actually want the warm
   cache enable it explicitly.

2. Harden download_model:
   - allow_patterns restricts the snapshot to safetensors + tokenizer +
     config (no pytorch_model.bin, ONNX, GGML, etc.) - typically cuts
     download size by 4-6x.
   - max_workers=1 serializes file fetches; the multi-thread path is
     what was crashing inside the worker on macOS arm64.
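
Taken together, the hardened prefetch looks roughly like this sketch (the env-var gate handling and the allow_patterns list are illustrative, not the exact code):

```python
# Hedged sketch of the opt-in, hardened prefetch: restrict the snapshot to
# safetensors + tokenizer + config, and serialize file fetches.
import os

from huggingface_hub import snapshot_download

DEFAULT_REPO = "HuggingFaceTB/SmolLM2-360M-Instruct"  # the bundled default at this point


def prefetch_default_model() -> str | None:
    if os.getenv("LANGFLOW_PREFETCH_HF_DEFAULT", "").lower() not in {"true", "1", "yes"}:
        return None  # opt-in: a fresh install never prefetches (and never crash-loops)
    return snapshot_download(
        repo_id=DEFAULT_REPO,
        allow_patterns=["*.safetensors", "*.json", "tokenizer*", "*.model"],
        max_workers=1,  # serialize file fetches; the parallel path crashed in forked workers
        token=os.getenv("HUGGINGFACEHUB_API_TOKEN") or None,
    )
```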

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (GGUF)

The transformers + torch path was unsalvageable on macOS arm64 + Python
3.12: both the inference load (torch device init in a forked uvicorn
worker) and the snapshot_download parallel fetcher SIGSEGV'd, with no
Python-level recovery possible. Mitigations like device=-1,
low_cpu_mem_usage, max_workers=1, and the OBJC fork-safety env var only
narrowed the failure window without closing it.

This commit replaces the backend wholesale:

- ChatHuggingFace now produces a langchain_community ChatLlamaCpp
  (llama-cpp-python under the hood). No torch import, no fork-safety
  pitfall, fast on CPU thanks to quantization.
- The bundled default flips from
    HuggingFaceTB/SmolLM2-360M-Instruct (~720MB safetensors)
  to
    bartowski/SmolLM2-360M-Instruct-GGUF, file
    SmolLM2-360M-Instruct-Q4_K_M.gguf (~270MB).
  Smaller download, similar quality, runs in <500MB RAM.
- download_model uses hf_hub_download for a single .gguf file (no
  snapshot_download, no parallel fetcher).
- list_installed_models now filters cache entries to repos that actually
  contain a .gguf file we can load.
- A small in-process cache keys ChatLlamaCpp instances by
  (model_path, temperature, max_tokens) so repeat calls reuse the same
  mmaped model instead of reloading.
- Filename selection: catalog overrides via GGUF_FILENAME_BY_REPO; fallback
  is "<model-name>-Q4_K_M.gguf" which works for all bartowski-style repos.

llama-cpp-python is already in langflow-base[llama-cpp], which is part of
the full langflow install.
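
A condensed sketch of the new wiring: single-file GGUF fetch plus the keyed instance cache (helper names and the filename heuristic are illustrative; ChatLlamaCpp and hf_hub_download are the real classes/functions):

```python
# Hedged sketch of the GGUF backend: download one .gguf file and cache ChatLlamaCpp
# instances keyed by (model_path, temperature, max_tokens) so repeat calls reuse mmapped weights.
from functools import lru_cache

from huggingface_hub import hf_hub_download
from langchain_community.chat_models import ChatLlamaCpp

GGUF_FILENAME_BY_REPO = {
    "bartowski/SmolLM2-360M-Instruct-GGUF": "SmolLM2-360M-Instruct-Q4_K_M.gguf",
}


def _gguf_filename(repo_id: str) -> str:
    # Catalog override first, then the bartowski-style "<model-name>-Q4_K_M.gguf" fallback.
    fallback = f"{repo_id.split('/')[-1].removesuffix('-GGUF')}-Q4_K_M.gguf"
    return GGUF_FILENAME_BY_REPO.get(repo_id, fallback)


def download_model(repo_id: str) -> str:
    return hf_hub_download(repo_id=repo_id, filename=_gguf_filename(repo_id))


@lru_cache(maxsize=8)
def get_chat_model(model_path: str, temperature: float, max_tokens: int) -> ChatLlamaCpp:
    # Same key -> same mmapped model instead of reloading from disk.
    return ChatLlamaCpp(model_path=model_path, temperature=temperature, max_tokens=max_tokens)


llm = get_chat_model(download_model("bartowski/SmolLM2-360M-Instruct-GGUF"), 0.1, 256)
```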

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…allback)

The single-file hf_hub_download path was still crashing uvicorn workers
on macOS arm64. Two complementary fixes:

1. Disable accelerated backends at module import time. xet and
   hf_transfer both spawn worker threads/processes whose fork-safety is
   broken on this platform. Forcing the plain HTTP path is more than
   fast enough for ~270MB GGUFs.

   - HF_HUB_DISABLE_XET=1
   - HF_HUB_ENABLE_HF_TRANSFER=0
   - HF_HUB_DISABLE_TELEMETRY=1 (drops one more import)
   - HF_HUB_DISABLE_PROGRESS_BARS=1 (drops tqdm in worker context)

2. If the in-process call still raises, retry the exact same
   hf_hub_download in an isolated subprocess via subprocess.run. A child
   process that crashes can't take the parent uvicorn worker with it;
   the parent recovers, logs the failure, and propagates the path
   captured from stdout when the subprocess succeeds. 600s timeout to
   bound network stalls.
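
Both layers together amount to roughly this sketch (helper name and exact flag handling are illustrative):

```python
# Hedged sketch: disable accelerated Hub backends before import, then retry a failed
# in-process download in an isolated child process so a crash can't take the worker down.
import os
import subprocess
import sys

os.environ.setdefault("HF_HUB_DISABLE_XET", "1")
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "0")
os.environ.setdefault("HF_HUB_DISABLE_TELEMETRY", "1")
os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")

from huggingface_hub import hf_hub_download  # import after the flags are in place


def download_gguf(repo_id: str, filename: str) -> str:
    try:
        return hf_hub_download(repo_id=repo_id, filename=filename)
    except Exception:  # noqa: BLE001 - fall back to an isolated child process
        script = (
            "from huggingface_hub import hf_hub_download; "
            f"print(hf_hub_download(repo_id={repo_id!r}, filename={filename!r}))"
        )
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            check=True,
            timeout=600,  # bound network stalls
        )
        return result.stdout.strip()  # path printed by the child on success
```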

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
llama-cpp-python lives in the [local] extra of langflow-base but wasn't
included in [complete], so the langflow main install (which pulls
langflow-base[complete]) shipped without it. The new HuggingFace local
provider needs llama-cpp-python at runtime, so the user saw:

  ImportError: Could not import llama-cpp-python library.

Pulling [local] into [complete] makes the full langflow install include
it without bloating bare langflow-base setups (which can still skip it).

For an existing dev env, install on demand:
  uv pip install llama-cpp-python
or:
  uv sync --reinstall

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Makefile install_backend target uses 'uv sync --frozen', which only
installs what's in uv.lock and ignores fresh additions to pyproject.toml.
Without regenerating the lockfile, contributors running 'make run_cli'
or 'make backend' wouldn't get llama-cpp-python even though [complete]
now references [local].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Language Model component renders the HF logo correctly because it
reads icon directly from the backend's option metadata. The Agent
component filters models by tool_calling=True; HF (which doesn't claim
tool_calling) doesn't land in that filtered list, so the trigger falls
through to providersData[*].icon — which goes through the frontend's
hardcoded getProviderIcon lookup. That map didn't include HuggingFace,
so it returned 'Bot' and rendered the lucide robot icon next to the
HF model in the trigger.

Adding HuggingFace -> "HuggingFace" to the lookup makes the trigger
match the dropdown list and the Language Model component.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctivate

Summary
- Add display_name and url to ModelMetadata so the UI can render short
  Ollama-style slugs (smollm2, granite3.1:2b, qwen2.5:3b, hermes3, ...)
  while keeping the full HF repo id as the canonical identifier used for
  downloads and for the in-process model cache.
- Expand the bundled HuggingFace catalog from one entry to seven curated
  laptop-friendly GGUF models (SmolLM2, Granite 3.1 2B/8B, Qwen2.5 1.5B/3B,
  Hermes 3, Phi 3.5 mini). All resolve via the existing bartowski-style
  filename heuristic - no GGUF_FILENAME_BY_REPO entries needed.
- Frontend ModelTrigger and ModelList render display_name with a fallback
  to name; a hover-only ExternalLink icon next to each row opens the
  model's HF page so the canonical repo id stays one click away.
- Provider settings now show a single primary/destructive Activate /
  Deactivate toggle for providers that don't require credentials,
  replacing the redundant disabled "X Activated" + separate "Deactivate"
  pair the previous flow rendered for HuggingFace and Ollama-style
  providers.
…calmer credentialless UX

Catalog default override
- model_catalog.get_unified_models_detailed used to force the first 5
  models per provider to default=True regardless of what the catalog
  said. That made every laptop-friendly HF GGUF auto-enabled (and on
  toggle, auto-downloaded). Now: if any catalog entry sets default=True
  explicitly, only those entries stay default; legacy catalogs with no
  explicit default keep the position-based first-N fallback.
- Net effect for the new HF catalog: only the bundled SmolLM2 ships as
  default-on. The other six light up only when the user toggles them in
  Settings.
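
The selection rule reduces to something like this sketch (names are illustrative):

```python
# Hedged sketch of the default-selection rule: honour explicit default=True entries
# when any exist, otherwise keep the legacy position-based "first N per provider" fallback.
def select_default_models(models: list[dict], first_n: int = 5) -> list[dict]:
    explicit = [m for m in models if m.get("default") is True]
    if explicit:
        return explicit  # e.g. only the bundled SmolLM2 entry for HuggingFace
    return models[:first_n]  # legacy catalogs with no explicit defaults
```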

display_name reaches the canvas dropdown
- get_language_model_options rebuilt option dicts from scratch and
  dropped display_name and url from the catalog metadata. Forwarded
  both onto option_metadata so build_config-sourced options carry the
  short slug all the way to the trigger.
- modelInputComponent's first loop now lifts display_name and url out
  of option.metadata onto the top-level ModelOption fields, matching
  what the augmentation loop already did. Without this, saved options
  rendered the canonical repo id even after the augmentation was fine.
- ModelSelection (the per-provider toggle list in Settings) now renders
  display_name with a hover-only ExternalLink to the model's page,
  mirroring the dropdown affordance.

Credentialless provider UX
- ProviderConfigurationForm no longer titles the panel with the optional
  variable's name ("HuggingFace Hub API Token") for providers that have
  no required credentials — that text was misleading because the
  variable is genuinely optional. The header is just the provider name.
- Subtext switches to "Runs locally — no credentials needed." for the
  no-required-vars branch, dropping the "Activate to enable these
  models" boilerplate that didn't apply to local inference.
- The activate/deactivate control collapses to a small right-aligned
  button: outline+destructive accent when enabled, primary when not.
  No more full-width red banner for a toggle.
The model-card link icon next to dropdown rows was rejected as visual
noise, so this removes the rendering and the unused plumbing that
backed it. ``url`` comes off the ModelMetadata TypedDict, the
``create_model_metadata`` helper, the HuggingFace catalog entries, the
catalog-to-options forwarder, and the ``ModelOption`` frontend type +
extraction sites in modelInputComponent / ModelSelection / ModelList.
``display_name`` and the rest of the work in this branch stay.
…er api_key

Wrench-on-enabled-model bug
- modelInputComponent's first loop honored a sticky not_enabled_locally
  flag that the backend bakes into a flow's saved options when the
  model wasn't in the user's enabled list at the time. Once the user
  enabled the provider afterwards (e.g. flipped HuggingFace on), the
  flag stayed glued to the saved option and rendered a "Configure"
  wrench next to a perfectly valid selection — even though
  enabled_models reported the model as enabled. Now the loop detects
  "sticky flag set but the model is enabled right now" and strips the
  flag from the option's metadata before grouping. The legitimate use
  (model genuinely disabled in settings) still surfaces the wrench.

HUGGINGFACEHUB_API_TOKEN variable not found
- apply_provider_variable_config_to_build_config was leaving the
  api_key field's stale state intact when switching from a configured
  provider (OpenAI) to a credentialless one (HuggingFace) where the
  optional variable wasn't configured. The runtime then tried to
  resolve the previous provider's var name — or, in the broken case,
  a HuggingFace var name the user never set — and raised
  "<VAR> variable not found." Added an explicit cleanup branch under
  the existing skip_optional path: clear value and load_from_db when
  the field is left pointing at any unconfigured optional variable
  (cross-provider stale or pointing at the new provider's own
  unconfigured var). The user has to re-select the provider once for
  the cleanup to fire on already-bad flows; new selections come out
  clean automatically.
…rs from assistant

- Add readModelDisplayName helper at utils/modelDisplay.ts and route every
  metadata.display_name read through it (modelInputComponent's four
  ModelOption-building paths, the Settings ModelSelection row, and the
  Assistant ModelSelector's two callsites). Removes seven copies of the
  same "typeof md.display_name === string ? ... : ..." check.
- Simplify the stale not_enabled_locally cleanup in modelInputComponent
  via destructuring instead of Object.fromEntries(filter).
- Hide LOCAL_INFERENCE_PROVIDERS (HuggingFace) from the Assistant's model
  picker so its auto-default never lands on a model the assistant code
  path can't run yet, and have the Assistant render display_name with a
  fallback to the canonical id.
…lless providers

The activate/deactivate toggle for credentialless providers (HuggingFace
local) was using machinery built for credentialed providers and did
nothing useful:

- "Deactivate" called handleDisconnect, which only deletes a saved
  credential variable. HuggingFace doesn't require one and the user
  hadn't set the optional token, so the handler early-returned and the
  click was a silent no-op.
- "Activate" called handleActivateProvider, which created a fake
  HUGGINGFACEHUB_API_TOKEN with the Ollama URL placeholder
  ("http://localhost:11434") just to make a credential row exist.
- The credentialless deactivate path also routed through the
  destructive DisconnectWarning dialog inherited from the credentialed
  branch, which is over-the-top for what is really a toggle.

Add a credentialless toggle path that operates on the model layer
instead of the credential layer:

- toggleAllProviderModels(action) batches a single POST to
  /api/v1/models/enabled_models for every model the provider ships.
  ``deactivate`` flips them all off (default models land in
  ``__disabled_models__``); ``activate`` flips only catalog defaults
  back on. The provider's ``is_enabled`` flag is computed from
  has_active_model on the API side, so this drives the UI's Activate
  vs Deactivate label.
- ProviderConfigurationForm's credentialless branch now wires its
  single button to onActivateDefaultModels / onDeactivateAllModels and
  no longer pops the warning dialog.
- handleActivateProvider and the form's ``onActivate`` prop are dead
  code now that the only credentialless provider doesn't need them;
  drop both rather than leaving misleading wiring around.
Per-model switches in the provider settings panel stayed interactive even
after the user clicked Deactivate on a credentialless provider, so
toggling an individual model on silently re-activated the whole provider
without ever going through the explicit Activate button — exactly the
state the new credentialless toggle path was meant to gate.

Two narrow changes:
- ModelProvidersContent passes ``isEnabledModel = !!is_enabled`` instead
  of ``is_enabled || is_configured``. ``is_configured`` is True for
  HuggingFace from the moment the provider exists (no required
  credential to satisfy), so the OR was always true and the gate did
  nothing for the new credentialless flow. ``is_enabled`` reflects
  ``has_active_model`` on the API side, which matches what the user
  expects "is the provider on right now" to mean.
- ModelSelection's ModelRow now always renders the Switch but passes
  ``disabled={!isEnabledModel}`` rather than gating the whole render.
  The user keeps a visible read of each model's saved state and the
  built-in disabled styling (opacity-50 + cursor-not-allowed) makes the
  inactive state obvious. Toggling is blocked until the user clicks
  Activate.
…f bleeding over the panel

Bring back the destructive-action confirmation for the credentialless
Deactivate flow but render it inline where the toggle button used to
sit, sized to its own content. The existing credentialed branch still
uses the absolute inset-0 overlay because that's what its layout
budgets for; the credentialless branch never had that space allocated
and the overlay leaked across the form, the model list, and the panel
edges.

- Credentialless Deactivate button now flips ``showDisconnectWarning``
  on instead of calling ``onDeactivateAllModels`` directly. Confirm
  fires the model-toggle path, Cancel restores the button.
- The DisconnectWarning swaps in for the toggle row when shown
  (an ``if``/``else`` swap instead of overlapping render) so the form's flex flow
  stays clean — no absolute positioning, no fixed h-[165px], no
  margin-from-the-edge gymnastics.
- The credentialed branch's overlay stays put behind a
  ``requiresConfiguration`` guard so it doesn't render for the new
  credentialless path.
…rlay as Disconnect

Render exactly one DisconnectWarning for both credentialed (Disconnect)
and credentialless (Deactivate) paths. Same absolute inset-0 overlay,
same dimensions, same animation — the only difference is the message
text and which handler ``onConfirm`` calls. Eliminates the inline
variant added in the previous commit so the destructive-confirm UX is
consistent across providers.
…s ancestor

The overlay used ``absolute inset-0 m-4 ... h-[165px]`` but the form
container had no ``position`` set, so the absolute positioning resolved
against a larger ancestor (the modal panel). For credentialed providers
the form is tall enough that 165px sits inside the form area; for the
credentialless branch the form is ~100px tall and the same overlay
spilled into the per-model toggle list rendered below.

- Add ``relative`` to the form's outer div so the overlay anchors to the
  form itself.
- Drop the fixed ``h-[165px]`` and replace ``inset-0`` with
  ``inset-x-0 top-0``: the warning now sizes to its message + buttons
  via ``h-fit`` (already on DisconnectWarning) and pins to the top of
  the form. Credentialed and credentialless render identically — the
  height just adapts.
The custom DisconnectWarning was rendered as an absolutely-positioned
panel laid over the form. Sizing it required guessing how tall the host
form would be (the credentialed branch was tuned to ~165px; the new
credentialless form is ~100px so the overlay leaked into the per-model
toggle list). Repeated attempts to anchor it cleanly produced visual
regressions — empty bordered box on top, content rendering through to
the rows behind, second ghost frame at the bottom.

Step back: use the project's standard Dialog component for both branches
and stop hand-rolling overlay positioning.

- Both Disconnect (credentialed) and Deactivate (credentialless) now
  open a centered modal Dialog with header/description/footer; same
  shell, branched only on the message text and which handler the
  Confirm button calls.
- Drop the now-orphaned DisconnectWarning component and its tests —
  nothing else imported it.
…penAI's toggle list

Two regressions to undo:

1. The previous commit replaced DisconnectWarning with a brand-new
   Dialog. That broke the platform-wide pattern five other providers
   already use — there was no reason to invent a second confirm UX.
   Restore DisconnectWarning, delete the Dialog imports, and route both
   the credentialed Disconnect and the new credentialless Deactivate
   through it (same overlay, same height, same styling, branched only on
   message text and which handler Confirm calls).

2. ModelProvidersContent's per-model gate was tightened to
   ``!!is_enabled`` so credentialless providers couldn't silently
   re-activate via toggling a single model. That regressed credentialed
   providers: with all models disabled, ``is_enabled`` (=
   has_active_model) was false, the toggle list went disabled, and the
   user lost the only path to re-enable any model — looking for all the
   world like the provider had auto-disconnected. Branch on
   ``requiresConfiguration`` instead so credentialed keeps the legacy
   ``is_configured`` gate and only credentialless gets the new
   ``is_enabled`` gate.
…overlay

When DisconnectWarning is visible the credentialless form was still
~100px tall, so the next flex sibling (ModelSelection) sat under the
overlay's bottom edge — the first model row peeked out from beneath
the Cancel/Confirm buttons.

Force a min-height of 200px on the form while ``showDisconnectWarning``
is true (overlay is 165px high + 32px of inset margins ≈ 197px). The
form then occupies enough space for ModelSelection to slide down past
the overlay automatically. Credentialed forms (OpenAI, Anthropic, ...)
are already taller than 200px so the min-h is a no-op there.
…ctivated look of the others

ProviderListItem treated ``is_configured`` as "active enough to badge".
HuggingFace is configured for free (no required credential variable),
so even after Deactivate the row stayed colored and showed the model
count badge — visually it looked just like a configured cloud provider,
nothing like Anthropic / Google / Watsonx / Ollama in their
not-yet-set-up state.

Drop the ``is_configured`` fallback for credentialless providers: their
``isActive`` collapses to ``is_enabled`` (= has any active model). When
HuggingFace is deactivated the row goes grayscale + muted-text + plus
icon, consistent with the rest of the unconfigured list.

Also extract the provider-category list to ``utils/providerCategories``
and reuse it in ``use-enabled-models`` (which had its own copy of the
same set under a different name) instead of keeping two ad-hoc string
sets in sync.
…ware active flag

The provider list relied on the backend's sort_key, which puts
configured providers above unconfigured ones. HuggingFace is
configured for free (no required credential variable), so the backend
sorts it with the active providers even when the user has clicked
Deactivate — visually it sat between Anthropic and OpenAI instead of
alphabetically with the other not-yet-set-up providers.

ProviderList now re-sorts client-side using ``isCredentiallessProvider``
to decide what counts as "active": credentialed providers keep the
``is_enabled || is_configured`` definition; credentialless providers
collapse to ``is_enabled``. Active rows still come first, alphabetical
within each group, so deactivated HuggingFace ends up between Google
and IBM as expected.
…r metadata

Replace the hardcoded ``CREDENTIALLESS_PROVIDERS = {"HuggingFace"}`` set
with a check against the provider's own variables list, which the API
already exposes alongside is_enabled/is_configured. Mirrors the
backend's own ``has_required_vars`` heuristic in
``_validate_and_get_enabled_providers``: a provider is credentialless
when it declares no variables marked ``required: true``.

- ``isCredentiallessProvider(provider)`` now takes the provider object
  and inspects ``provider.variables`` (defensive: missing/empty list
  falls back to credentialed so we don't accidentally drop the legacy
  ``is_enabled || is_configured`` activation rule).
- ``ModelProviderInfo`` (the API type) and the modal's ``Provider``
  type both gain the ``variables`` array. ``ProviderList`` forwards
  it through to ``ProviderListItem``.
- Adding a new credentialless provider on the backend (e.g. another
  local-inference adapter) now lights up automatically — no second
  list to update on the frontend.
coderabbitai Bot (Contributor) commented May 4, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6fa7df24-d373-45fc-b6af-9eaf0639e2e9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This PR adds local HuggingFace GGUF model support via llama-cpp-python, including backend download infrastructure and API endpoints, plus frontend enhancements to display model display names and distinguish credentialless vs. credentialed providers.

Changes

Model Metadata & Provider Infrastructure

  • Metadata Types (src/lfx/.../model_metadata.py): ModelMetadata gains an optional display_name field; create_model_metadata accepts a display_name parameter; a new "HuggingFace" provider entry is added to MODEL_PROVIDER_METADATA.
  • Provider Variable Handling (src/lfx/.../build_config.py): apply_provider_variable_config_to_build_config accepts user_id to determine per-user optional variable configuration; adds an _is_optional_var_configured helper to skip/clear optional provider fields when unconfigured.
  • Credentials & Instantiation (src/lfx/.../credentials.py, src/lfx/.../instantiation.py): HuggingFace token validation via the whoami-v2 endpoint; get_llm exempts both Ollama and HuggingFace from the API-key requirement.
  • Model Catalog & Registry (src/lfx/.../model_catalog.py, src/lfx/.../class_registry.py, src/lfx/.../provider_queries.py): model defaults now respect the explicit default=True flag; display_name is propagated to options; ChatHuggingFace is registered in class imports; HuggingFace models are included in the catalog.
  • Dependency Configuration (src/backend/base/pyproject.toml): the complete dependency group now includes the "langflow-base[local]" extra.

HuggingFace Local Model Backend

  • GGUF Model Module (src/lfx/.../huggingface_chat_model.py, src/lfx/.../huggingface_constants.py): new module implementing a local GGUF chat adapter using llama-cpp-python, with a subprocess-based download fallback, model caching, and a curated model catalog including one default entry.
  • API Endpoints & Background Tasks (src/backend/base/langflow/api/v1/models.py): adds the module-level task-tracking set _HF_INFLIGHT_DOWNLOADS; update_enabled_models triggers best-effort background downloads via _maybe_schedule_huggingface_downloads; new GET /models/huggingface/installed and POST /models/huggingface/download endpoints with error handling.
  • Startup Warm-Up (src/backend/base/langflow/main.py): adds a _prefetch_default_huggingface_model function guarded by LANGFLOW_PREFETCH_HF_DEFAULT, integrated into the FastAPI lifespan with task creation during startup and cancellation during shutdown.

Frontend Model Display & Provider UI

  • Display Name Utilities (src/frontend/src/utils/modelDisplay.ts, src/frontend/src/utils/providerCategories.ts): new readModelDisplayName utility extracts display_name from metadata; new isCredentiallessProvider utility classifies providers based on required variables.
  • Provider API Shape (src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts, src/frontend/src/modals/modelProviderModal/components/types.ts): ModelProviderInfo adds an optional variables array for per-variable metadata; the Provider type is extended with variables: ProviderVariableInfo[]; a HuggingFace icon mapping is added.
  • Model Selection & Display (src/frontend/src/components/core/assistantPanel/components/model-selector.tsx, src/frontend/src/components/core/assistantPanel/hooks/use-enabled-models.ts, src/frontend/src/components/core/parameterRenderComponent/components/modelInputComponent/*): model display uses readModelDisplayName(metadata) with a fallback to model_name across the selector, hooks, and input components; the ModelOption type gains a display_name field; stale not_enabled_locally flags are detected and cleared.
  • Provider Management UI (src/frontend/src/modals/modelProviderModal/hooks/useProviderConfiguration.ts, src/frontend/src/modals/modelProviderModal/components/*): useProviderConfiguration replaces handleActivateProvider with handleDeactivateAllModels and handleActivateDefaultModels, which batch-update models; the provider list is now credentialless-aware for sorting and active state; ProviderConfigurationForm gains the "Runs locally — no credentials needed" text and new button routing; ProviderListItem uses credentialless logic for isActive; ModelSelection disables switches instead of hiding them for unavailable models.

Sequence Diagram

sequenceDiagram
    participant Frontend as Frontend App
    participant API as Langflow API
    participant HFHub as HuggingFace Hub
    participant LocalCache as Local Cache
    participant UI as UI Components

    rect rgba(200, 150, 255, 0.5)
    note over Frontend,UI: Model Display & Provider Classification
    Frontend->>UI: Render ModelSelector/ModelList
    activate UI
    UI->>UI: readModelDisplayName(metadata)
    UI->>UI: isCredentiallessProvider(variables)
    UI->>Frontend: display_name + isEnabled state
    deactivate UI
    end

    rect rgba(150, 200, 255, 0.5)
    note over Frontend,API: Background Download Flow
    Frontend->>API: POST /models/huggingface/download {model_id}
    activate API
    API->>API: _maybe_schedule_huggingface_downloads()
    API->>API: asyncio.to_thread(download_model)
    API-->>Frontend: {model_id, path} (async)
    API->>HFHub: hf_hub_download (background)
    HFHub->>LocalCache: Store .gguf file
    deactivate API
    end

    rect rgba(200, 255, 150, 0.5)
    note over Frontend,API: Startup Warm-Up
    API->>API: _prefetch_default_huggingface_model()
    activate API
    API->>HFHub: Download default model (task)
    HFHub->>LocalCache: Cache default .gguf
    API->>API: Log completion (non-blocking)
    deactivate API
    end

    rect rgba(255, 200, 150, 0.5)
    note over Frontend,API: Provider Activation/Deactivation
    Frontend->>API: updateEnabledModelsAsync({updates})
    activate API
    API->>API: apply_provider_variable_config (per user_id)
    API->>API: _is_optional_var_configured check
    API-->>Frontend: Success alert
    deactivate API
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

  • Test Coverage For New Implementations (❌ Error): the PR introduces 567+ lines of new functionality without corresponding test coverage for the ChatHuggingFace class, API endpoints, utility functions, prefetch functionality, and the huggingface_constants catalog. Resolution: add comprehensive test coverage for the ChatHuggingFace class, API endpoints, helper functions, prefetch functionality, and frontend utilities following project conventions.
  • Docstring Coverage (⚠️ Warning): docstring coverage is 69.44%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Test Quality And Coverage (⚠️ Warning): the PR adds 257 lines of HuggingFace model implementation with two new async API endpoints and a startup prefetch function but contains zero test coverage for any new code. Resolution: create test_huggingface_models.py for the API endpoints, test_huggingface_chat_model.py for the download/list functions, async tests for prefetch, and frontend tests following existing project patterns.
  • Test File Naming And Structure (⚠️ Warning): the PR adds 436+ lines of backend code for HuggingFace model management (2 new API endpoints, background scheduler, startup prefetch) with no corresponding test files. Resolution: add pytest files for the new API endpoints, background tasks, model caching, and error handling, plus frontend tests for the new provider components.
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title clearly describes the main feature, adding HuggingFace as a local-first model provider using GGUF/llama-cpp, and accurately reflects the primary change across backend and frontend components.
  • Linked Issues Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Excessive Mock Usage Warning (✅ Passed): the PR does not introduce excessive mock usage; the new HuggingFace functionality lacks comprehensive test coverage, so there is no mock obscuration of business logic.


@github-actions github-actions Bot added the enhancement New feature or request label May 4, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 4, 2026
@Empreiteiro Empreiteiro force-pushed the fix/hf-pr-improvements branch from fec45c3 to 4e373b3 on May 5, 2026 17:21
@github-actions github-actions Bot removed the enhancement New feature or request label May 5, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
autofix-ci Bot and others added 2 commits May 5, 2026 17:26
Drop granite-3.1-2b/8b, Qwen2.5-1.5b/3b, Hermes-3, and Phi-3.5 from
``HUGGINGFACE_MODELS_DETAILED`` so the providers modal shows just the
bundled SmolLM2 entry. Users who want a different GGUF can still
download one via ``POST /api/v1/models/huggingface/download`` with any
HF repo id; the catalog only governs what's preselectable in the UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
… selection

HuggingFace declares no required variables, so it always lands in
``enabled_providers`` even when the user hasn't configured anything.
The naive ``options[0]`` / ``flatOptions[0]`` fallbacks (backend
``update_model_options_in_build_config`` and frontend
``ModelInputComponent``) therefore let HF outrank credentialed
providers when no user default existed, and the auto-select effect
wrote that HF entry back as the saved value — overwriting what the
user actually configured.

Both call sites now prefer the first option from a credentialed
provider, falling back to the very first option only when *every*
available provider is credentialless. Existing sticky-default
behavior is untouched, so an explicit user selection still persists
across re-renders and refreshes (verified by the existing
``test_update_model_options_injects_saved_value_when_missing_from_options``
and ``test_update_model_options_does_not_duplicate_saved_value_already_in_options``).

Adds ``is_credentialless_provider`` in ``provider_queries`` (mirrors
the existing ``isCredentiallessProvider`` frontend helper) so the rule
is derived from ``MODEL_PROVIDER_METADATA`` rather than hard-coded
against HuggingFace.
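
A rough sketch of the backend rule and the fallback preference (the metadata shape and option-dict keys are assumptions; only the rule itself, prefer credentialed options and treat "no required variables" as credentialless, comes from this commit):

```python
# Hedged sketch: derive "credentialless" from provider metadata instead of hard-coding
# HuggingFace, and prefer the first option from a credentialed provider as the fallback.
def is_credentialless_provider(provider: str, provider_metadata: dict[str, dict]) -> bool:
    variables = provider_metadata.get(provider, {}).get("variables", [])
    return not any(v.get("required") for v in variables)


def pick_fallback_option(options: list[dict], provider_metadata: dict[str, dict]) -> dict | None:
    if not options:
        return None
    for opt in options:
        if not is_credentialless_provider(opt["provider"], provider_metadata):
            return opt  # first option from a credentialed provider wins
    return options[0]  # every available provider is credentialless: take the very first option
```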

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
The previous default (SmolLM2-360M-Instruct) wasn't fine-tuned for
tool calling. The Agent component filters its model dropdown by
``tool_calling=True``, so HF-only setups with the trimmed catalog
landed on an empty Agent dropdown and the model wasn't auto-selected.

Qwen2.5-0.5B-Instruct (Q4_K_M, ~400 MB on disk) is the smallest model
in the Qwen2.5 family that ships with tool-calling fine-tuning, so the
bundled default now satisfies the Agent's filter and the dropdown
auto-populates as expected.

Updates:

  * ``DEFAULT_HUGGINGFACE_MODEL`` →
    ``bartowski/Qwen2.5-0.5B-Instruct-GGUF``
  * ``DEFAULT_GGUF_FILENAME`` →
    ``Qwen2.5-0.5B-Instruct-Q4_K_M.gguf``
  * Catalog entry: ``display_name="qwen2.5:0.5b"``,
    ``tool_calling=True``.
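
For reference, the updated catalog entry would look roughly like this (the dict shape is an assumption for illustration; the real code goes through the PR's create_model_metadata helper):

```python
# Illustrative constants and catalog entry for the new tool-calling default.
DEFAULT_HUGGINGFACE_MODEL = "bartowski/Qwen2.5-0.5B-Instruct-GGUF"
DEFAULT_GGUF_FILENAME = "Qwen2.5-0.5B-Instruct-Q4_K_M.gguf"

HUGGINGFACE_MODELS_DETAILED = [
    {
        "provider": "HuggingFace",
        "name": DEFAULT_HUGGINGFACE_MODEL,
        "display_name": "qwen2.5:0.5b",
        "tool_calling": True,  # satisfies the Agent component's dropdown filter
        "default": True,       # the single bundled toggle shown in Settings
    },
]
```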

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026