
feat: add HuggingFace as a local-first model provider (GGUF/llama-cpp) #12976

Open
Empreiteiro wants to merge 34 commits into main from fix/hf-pr-improvements

Conversation

Empreiteiro (Collaborator) commented May 4, 2026

Summary

  • Registers HuggingFace alongside the other configurable providers, running models locally via llama-cpp-python + GGUF (no torch, no transformers, fork-safe on macOS arm64).
  • Onboarding ships a single bundled model, bartowski/SmolLM2-360M-Instruct-GGUF (Q4_K_M, ~270MB). Settings → Model Providers shows one toggle for it, on by default.
  • Toggle-on triggers background download — flipping a HuggingFace model on in POST /enabled_models schedules a single-file download so the cache is warm by the next invocation.
  • Add-more-models via API: POST /api/v1/models/huggingface/download {"model_id": "<gguf-repo-id>"} accepts any HF repo id that publishes GGUF weights.
  • Optional startup prefetch: LANGFLOW_PREFETCH_HF_DEFAULT=true warms the cache on lifespan start. Off by default.
  • Subprocess-isolated downloads — if hf_hub_download ever crashes the worker, a subprocess retry kicks in so the parent uvicorn worker survives.
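
For illustration, a client call to the new download endpoint might look like the sketch below (the base URL and x-api-key header are assumptions about a local deployment, not part of this PR):

```python
# Hypothetical client sketch: request a GGUF download through the new endpoint.
# The route and payload come from this PR; the base URL and API key are assumptions.
import httpx

LANGFLOW_URL = "http://localhost:7860"  # assumed local dev server
API_KEY = "sk-..."                      # assumed Langflow API key

resp = httpx.post(
    f"{LANGFLOW_URL}/api/v1/models/huggingface/download",
    json={"model_id": "bartowski/SmolLM2-360M-Instruct-GGUF"},
    headers={"x-api-key": API_KEY},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. the scheduled model_id / cache path returned by the endpoint
```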

Summary by CodeRabbit

  • New Features

    • Added local HuggingFace model download and management capabilities with new API endpoints to list and download models.
    • Introduced background task tracking for model downloads to prevent task loss.
    • Added optional automatic prefetch of default HuggingFace models on startup.
  • Improvements

    • Enhanced model display with customizable display names in model selectors.
    • Improved handling of credentialless model providers.
    • Better metadata and variable information for provider credentials.

Empreiteiro and others added 28 commits May 1, 2026 16:21
Registers HuggingFace alongside the other configurable providers in the
unified model catalog, but runs models locally via langchain-huggingface's
HuggingFacePipeline + transformers — no external API calls required.

- Default bundled model: HuggingFaceTB/SmolLM2-360M-Instruct (~720MB),
  small and CPU-friendly so a fresh install can answer prompts after the
  first lazy download.
- Catalog ships small instruct checkpoints (SmolLM2 135M/1.7B, Qwen2.5
  0.5B/1.5B) plus larger gated options (Llama-3.2 1B/3B, Phi-3.5-mini).
- HUGGINGFACEHUB_API_TOKEN is optional — only needed to pull gated repos.
- Providers with no required variables now stay enabled by default so the
  HF entry surfaces without the user having to configure credentials.
- New endpoints: GET  /api/v1/models/huggingface/installed lists repos
  present in the local Hub cache, and POST /api/v1/models/huggingface/
  download eagerly fetches a model via huggingface_hub.snapshot_download
  (reusing the user's saved token for gated downloads).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The HuggingFace provider failed at build time with "HUGGINGFACEHUB_API_TOKEN
variable not found." even though its token is documented as optional. Root
cause: apply_provider_variable_config_to_build_config unconditionally set
load_from_db=True with the canonical variable key on the provider's
api_key field, so the runtime tried to resolve a value that the user had
never configured and raised.

For *required* provider variables the behavior is unchanged. For optional
ones (top-level required=False) we now only auto-install load_from_db=True
when the variable is actually present in the user's globals or in the
process environment; otherwise we leave the field empty so the runtime
gets a None api_key (which the local HuggingFace adapter handles fine).
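
Roughly, the check amounts to something like this sketch (the helper name matches the one added in this PR, but the signature and lookup plumbing here are illustrative):

```python
# Minimal sketch, assuming a helper that decides whether an *optional* provider
# variable is actually configured before build config sets load_from_db=True.
import os


def _is_optional_var_configured(var_name: str, user_global_variables: set[str]) -> bool:
    """True when the optional provider variable has a value in the user's globals or the env."""
    return var_name in user_global_variables or bool(os.environ.get(var_name))

# Only when this returns True does the api_key field get load_from_db=True; otherwise
# the field stays empty and the local HuggingFace adapter receives a None api_key.
```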

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…toggle

Onboarding simplification: the HuggingFace catalog now ships exactly one
model (HuggingFaceTB/SmolLM2-360M-Instruct, ~720MB, fast on CPU). The
Settings → Model Providers screen shows a single toggle for it, defaulting
to ON.

When the user flips a HuggingFace model toggle on, POST /enabled_models
now schedules a background snapshot_download into the local Hub cache so
the first flow invocation doesn't pay the cold-start latency. Failures
are logged but never block the toggle from being saved. Strong refs to
in-flight tasks live at module scope to satisfy RUF006.
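
The scheduling pattern amounts to roughly the following sketch (function and set names are illustrative, not the PR's actual identifiers):

```python
# Illustrative sketch of fire-and-forget download scheduling with strong task refs.
import asyncio
import logging

from huggingface_hub import snapshot_download

logger = logging.getLogger(__name__)

# Module scope keeps strong references so tasks aren't garbage-collected mid-flight (RUF006).
_INFLIGHT_DOWNLOADS: set[asyncio.Task] = set()


async def _download_in_background(repo_id: str) -> None:
    try:
        # Run the blocking Hub download off the event loop.
        await asyncio.to_thread(snapshot_download, repo_id=repo_id)
    except Exception as exc:  # noqa: BLE001 - failures are logged, never block the toggle
        logger.warning("Background HuggingFace download failed for %s: %s", repo_id, exc)


def schedule_download(repo_id: str) -> None:
    task = asyncio.create_task(_download_in_background(repo_id))
    _INFLIGHT_DOWNLOADS.add(task)
    task.add_done_callback(_INFLIGHT_DOWNLOADS.discard)
```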

The unified catalog's "first 5 are default" auto-promotion now defers to
explicit per-model `default=True` declarations when any are present, so
HuggingFace gets exactly the bundled model on by default while the other
providers keep their existing behavior.

Additional HF models can still be installed via
POST /api/v1/models/huggingface/download with an arbitrary repo id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier "honour explicit default=True per model" change broke the
existing TestUnifiedModelsDefaults invariants:
- IBM WatsonX declares default=True on all 7 of its models, so honouring
  the explicit flags returned 7 defaults where the test expects ≤5.
- Google Generative AI doesn't declare any explicit defaults, so the
  fallback path was the only one exercised — but the override still
  changed the contract.

Revert to the original "first 5 models per provider are default"
behavior. The HuggingFace onboarding goal (single bundled model toggled
on by default) is satisfied automatically because HUGGINGFACE_MODELS_DETAILED
now contains exactly one entry, and i=0 < 5 lands it in the default set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local pipeline was triggering worker SIGSEGV on first model load on
macOS arm64 + Python 3.12. The crash happened at the very start of the
weight download (0% progress), which points at torch's device init path
running inside a forked uvicorn worker rather than the download itself.

- device=-1 — force CPU and skip MPS/CUDA negotiation, which is the most
  fragile leg of torch on first-import-after-fork.
- low_cpu_mem_usage=True — stream weights through the model during
  from_pretrained instead of double-buffering them, lowering peak RAM.

If the SIGSEGV still happens, the workaround is to start the server with
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES exported (a known
torch+Objective-C fork-safety interaction on macOS, not specific to this
adapter). Documented inline.
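
For reference, the mitigation described above would look roughly like this, assuming langchain-huggingface's HuggingFacePipeline.from_model_id (illustrative only; this transformers/torch path is replaced by the GGUF backend in a later commit):

```python
# Hedged sketch of the CPU-only, low-memory load described above.
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceTB/SmolLM2-360M-Instruct",
    task="text-generation",
    device=-1,  # force CPU, skip MPS/CUDA negotiation after fork
    model_kwargs={"low_cpu_mem_usage": True},  # stream weights, lower peak RAM
    pipeline_kwargs={"max_new_tokens": 256},
)
print(llm.invoke("Hello"))
```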

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first flow run that uses the local HuggingFace provider would block
the request thread for tens of seconds while transformers pulled ~720MB
to ~/.cache/huggingface. Worse, on macOS arm64 + Python 3.12 the load
inside a uvicorn worker can SIGSEGV on torch's device init.

Pre-warming the cache during lifespan startup uses
huggingface_hub.snapshot_download exclusively (no torch import), so it
cannot trigger the worker SIGSEGV — and by the time the user sends the
first message, the weights are already on disk and the inference path
only pays the load + generate cost.

- Runs as a background task; tracked alongside sync_flows_from_fs_task
  and mcp_init_task and cancelled on lifespan shutdown.
- Skippable via LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD=true (1/yes also work).
- Forwards HUGGINGFACEHUB_API_TOKEN if set in env so gated default
  models would still pull.
- Failures are logged at warning and never block startup; the first
  inference call will retry the download on demand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tened

The startup prefetch was triggering a server crash loop on macOS arm64:
huggingface_hub.snapshot_download itself segfaulted at 0% (parallel
download backend interacting badly with forked uvicorn workers), the
worker died, uvicorn auto-reload restarted, and the cycle repeated. The
log also showed a "4.66 GB" total because the unfiltered snapshot pulled
every weight format the repo carries (safetensors + pytorch_model.bin +
ONNX + GGML).

Two changes:

1. Flip prefetch to opt-in: LANGFLOW_PREFETCH_HF_DEFAULT=true (was
   "skip" via LANGFLOW_SKIP_HF_DEFAULT_DOWNLOAD). Default is now OFF so
   a fresh install never crash-loops; users who actually want the warm
   cache enable it explicitly.

2. Harden download_model:
   - allow_patterns restricts the snapshot to safetensors + tokenizer +
     config (no pytorch_model.bin, ONNX, GGML, etc.) - typically cuts
     download size by 4-6x.
   - max_workers=1 serializes file fetches; the multi-thread path is
     what was crashing inside the worker on macOS arm64.
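
Taken together, the hardened prefetch looks roughly like this sketch (the env-var gate handling and the allow_patterns list are illustrative, not the exact code):

```python
# Hedged sketch of the opt-in, hardened prefetch: restrict the snapshot to
# safetensors + tokenizer + config, and serialize file fetches.
import os

from huggingface_hub import snapshot_download

DEFAULT_REPO = "HuggingFaceTB/SmolLM2-360M-Instruct"  # the bundled default at this point


def prefetch_default_model() -> str | None:
    if os.getenv("LANGFLOW_PREFETCH_HF_DEFAULT", "").lower() not in {"true", "1", "yes"}:
        return None  # opt-in: a fresh install never prefetches (and never crash-loops)
    return snapshot_download(
        repo_id=DEFAULT_REPO,
        allow_patterns=["*.safetensors", "*.json", "tokenizer*", "*.model"],
        max_workers=1,  # serialize file fetches; the parallel path crashed in forked workers
        token=os.getenv("HUGGINGFACEHUB_API_TOKEN") or None,
    )
```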

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (GGUF)

The transformers + torch path was unsalvageable on macOS arm64 + Python
3.12: both the inference load (torch device init in a forked uvicorn
worker) and the snapshot_download parallel fetcher SIGSEGV'd, with no
Python-level recovery possible. Mitigations like device=-1,
low_cpu_mem_usage, max_workers=1, and the OBJC fork-safety env var only
narrowed the failure window without closing it.

This commit replaces the backend wholesale:

- ChatHuggingFace now produces a langchain_community ChatLlamaCpp
  (llama-cpp-python under the hood). No torch import, no fork-safety
  pitfall, fast on CPU thanks to quantization.
- The bundled default flips from
    HuggingFaceTB/SmolLM2-360M-Instruct (~720MB safetensors)
  to
    bartowski/SmolLM2-360M-Instruct-GGUF, file
    SmolLM2-360M-Instruct-Q4_K_M.gguf (~270MB).
  Smaller download, similar quality, runs in <500MB RAM.
- download_model uses hf_hub_download for a single .gguf file (no
  snapshot_download, no parallel fetcher).
- list_installed_models now filters cache entries to repos that actually
  contain a .gguf file we can load.
- A small in-process cache keys ChatLlamaCpp instances by
  (model_path, temperature, max_tokens) so repeat calls reuse the same
  mmaped model instead of reloading.
- Filename selection: catalog overrides via GGUF_FILENAME_BY_REPO; fallback
  is "<model-name>-Q4_K_M.gguf" which works for all bartowski-style repos.

llama-cpp-python is already in langflow-base[llama-cpp], which is part of
the full langflow install.
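
A condensed sketch of the new wiring: single-file GGUF fetch plus the keyed instance cache (helper names and the filename heuristic are illustrative; ChatLlamaCpp and hf_hub_download are the real classes/functions):

```python
# Hedged sketch of the GGUF backend: download one .gguf file and cache ChatLlamaCpp
# instances keyed by (model_path, temperature, max_tokens) so repeat calls reuse mmapped weights.
from functools import lru_cache

from huggingface_hub import hf_hub_download
from langchain_community.chat_models import ChatLlamaCpp

GGUF_FILENAME_BY_REPO = {
    "bartowski/SmolLM2-360M-Instruct-GGUF": "SmolLM2-360M-Instruct-Q4_K_M.gguf",
}


def _gguf_filename(repo_id: str) -> str:
    # Catalog override first, then the bartowski-style "<model-name>-Q4_K_M.gguf" fallback.
    fallback = f"{repo_id.split('/')[-1].removesuffix('-GGUF')}-Q4_K_M.gguf"
    return GGUF_FILENAME_BY_REPO.get(repo_id, fallback)


def download_model(repo_id: str) -> str:
    return hf_hub_download(repo_id=repo_id, filename=_gguf_filename(repo_id))


@lru_cache(maxsize=8)
def get_chat_model(model_path: str, temperature: float, max_tokens: int) -> ChatLlamaCpp:
    # Same key -> same mmapped model instead of reloading from disk.
    return ChatLlamaCpp(model_path=model_path, temperature=temperature, max_tokens=max_tokens)


llm = get_chat_model(download_model("bartowski/SmolLM2-360M-Instruct-GGUF"), 0.1, 256)
```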

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…allback)

The single-file hf_hub_download path was still crashing uvicorn workers
on macOS arm64. Two complementary fixes:

1. Disable accelerated backends at module import time. xet and
   hf_transfer both spawn worker threads/processes whose fork-safety is
   broken on this platform. Forcing the plain HTTP path is more than
   fast enough for ~270MB GGUFs.

   - HF_HUB_DISABLE_XET=1
   - HF_HUB_ENABLE_HF_TRANSFER=0
   - HF_HUB_DISABLE_TELEMETRY=1 (drops one more import)
   - HF_HUB_DISABLE_PROGRESS_BARS=1 (drops tqdm in worker context)

2. If the in-process call still raises, retry the exact same
   hf_hub_download in an isolated subprocess via subprocess.run. A child
   process that crashes can't take the parent uvicorn worker with it;
   the parent recovers, logs the failure, and propagates the path
   captured from stdout when the subprocess succeeds. 600s timeout to
   bound network stalls.
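
Both layers together amount to roughly this sketch (helper name and exact flag handling are illustrative):

```python
# Hedged sketch: disable accelerated Hub backends before import, then retry a failed
# in-process download in an isolated child process so a crash can't take the worker down.
import os
import subprocess
import sys

os.environ.setdefault("HF_HUB_DISABLE_XET", "1")
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "0")
os.environ.setdefault("HF_HUB_DISABLE_TELEMETRY", "1")
os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")

from huggingface_hub import hf_hub_download  # import after the flags are in place


def download_gguf(repo_id: str, filename: str) -> str:
    try:
        return hf_hub_download(repo_id=repo_id, filename=filename)
    except Exception:  # noqa: BLE001 - fall back to an isolated child process
        script = (
            "from huggingface_hub import hf_hub_download; "
            f"print(hf_hub_download(repo_id={repo_id!r}, filename={filename!r}))"
        )
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            check=True,
            timeout=600,  # bound network stalls
        )
        return result.stdout.strip()  # path printed by the child on success
```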

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
llama-cpp-python lives in the [local] extra of langflow-base but wasn't
included in [complete], so the langflow main install (which pulls
langflow-base[complete]) shipped without it. The new HuggingFace local
provider needs llama-cpp-python at runtime, so the user saw:

  ImportError: Could not import llama-cpp-python library.

Pulling [local] into [complete] makes the full langflow install include
it without bloating bare langflow-base setups (which can still skip it).

For an existing dev env, install on demand:
  uv pip install llama-cpp-python
or:
  uv sync --reinstall

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Makefile install_backend target uses 'uv sync --frozen', which only
installs what's in uv.lock and ignores fresh additions to pyproject.toml.
Without regenerating the lockfile, contributors running 'make run_cli'
or 'make backend' wouldn't get llama-cpp-python even though [complete]
now references [local].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Language Model component renders the HF logo correctly because it
reads icon directly from the backend's option metadata. The Agent
component filters models by tool_calling=True; HF (which doesn't claim
tool_calling) doesn't land in that filtered list, so the trigger falls
through to providersData[*].icon — which goes through the frontend's
hardcoded getProviderIcon lookup. That map didn't include HuggingFace,
so it returned 'Bot' and rendered the lucide robot icon next to the
HF model in the trigger.

Adding HuggingFace -> "HuggingFace" to the lookup makes the trigger
match the dropdown list and the Language Model component.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctivate

Summary
- Add display_name and url to ModelMetadata so the UI can render short
  Ollama-style slugs (smollm2, granite3.1:2b, qwen2.5:3b, hermes3, ...)
  while keeping the full HF repo id as the canonical identifier used for
  downloads and for the in-process model cache.
- Expand the bundled HuggingFace catalog from one entry to seven curated
  laptop-friendly GGUF models (SmolLM2, Granite 3.1 2B/8B, Qwen2.5 1.5B/3B,
  Hermes 3, Phi 3.5 mini). All resolve via the existing bartowski-style
  filename heuristic - no GGUF_FILENAME_BY_REPO entries needed.
- Frontend ModelTrigger and ModelList render display_name with a fallback
  to name; a hover-only ExternalLink icon next to each row opens the
  model's HF page so the canonical repo id stays one click away.
- Provider settings now show a single primary/destructive Activate /
  Deactivate toggle for providers that don't require credentials,
  replacing the redundant disabled "X Activated" + separate "Deactivate"
  pair the previous flow rendered for HuggingFace and Ollama-style
  providers.
…calmer credentialless UX

Catalog default override
- model_catalog.get_unified_models_detailed used to force the first 5
  models per provider to default=True regardless of what the catalog
  said. That made every laptop-friendly HF GGUF auto-enabled (and on
  toggle, auto-downloaded). Now: if any catalog entry sets default=True
  explicitly, only those entries stay default; legacy catalogs with no
  explicit default keep the position-based first-N fallback.
- Net effect for the new HF catalog: only the bundled SmolLM2 ships as
  default-on. The other six light up only when the user toggles them in
  Settings.
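
The selection rule reduces to something like this sketch (names are illustrative):

```python
# Hedged sketch of the default-selection rule: honour explicit default=True entries
# when any exist, otherwise keep the legacy position-based "first N per provider" fallback.
def select_default_models(models: list[dict], first_n: int = 5) -> list[dict]:
    explicit = [m for m in models if m.get("default") is True]
    if explicit:
        return explicit  # e.g. only the bundled SmolLM2 entry for HuggingFace
    return models[:first_n]  # legacy catalogs with no explicit defaults
```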

display_name reaches the canvas dropdown
- get_language_model_options rebuilt option dicts from scratch and
  dropped display_name and url from the catalog metadata. Forwarded
  both onto option_metadata so build_config-sourced options carry the
  short slug all the way to the trigger.
- modelInputComponent's first loop now lifts display_name and url out
  of option.metadata onto the top-level ModelOption fields, matching
  what the augmentation loop already did. Without this, saved options
  rendered the canonical repo id even after the augmentation was fine.
- ModelSelection (the per-provider toggle list in Settings) now renders
  display_name with a hover-only ExternalLink to the model's page,
  mirroring the dropdown affordance.

Credentialless provider UX
- ProviderConfigurationForm no longer titles the panel with the optional
  variable's name ("HuggingFace Hub API Token") for providers that have
  no required credentials — that text was misleading because the
  variable is genuinely optional. The header is just the provider name.
- Subtext switches to "Runs locally — no credentials needed." for the
  no-required-vars branch, dropping the "Activate to enable these
  models" boilerplate that didn't apply to local inference.
- The activate/deactivate control collapses to a small right-aligned
  button: outline+destructive accent when enabled, primary when not.
  No more full-width red banner for a toggle.
The model-card link icon next to dropdown rows was rejected as visual
noise, so this removes the rendering and the unused plumbing that
backed it. ``url`` comes off the ModelMetadata TypedDict, the
``create_model_metadata`` helper, the HuggingFace catalog entries, the
catalog-to-options forwarder, and the ``ModelOption`` frontend type +
extraction sites in modelInputComponent / ModelSelection / ModelList.
``display_name`` and the rest of the work in this branch stay.
…er api_key

Wrench-on-enabled-model bug
- modelInputComponent's first loop honored a sticky not_enabled_locally
  flag that the backend bakes into a flow's saved options when the
  model wasn't in the user's enabled list at the time. Once the user
  enabled the provider afterwards (e.g. flipped HuggingFace on), the
  flag stayed glued to the saved option and rendered a "Configure"
  wrench next to a perfectly valid selection — even though
  enabled_models reported the model as enabled. Now the loop detects
  "sticky flag set but the model is enabled right now" and strips the
  flag from the option's metadata before grouping. The legitimate use
  (model genuinely disabled in settings) still surfaces the wrench.

HUGGINGFACEHUB_API_TOKEN variable not found
- apply_provider_variable_config_to_build_config was leaving the
  api_key field's stale state intact when switching from a configured
  provider (OpenAI) to a credentialless one (HuggingFace) where the
  optional variable wasn't configured. The runtime then tried to
  resolve the previous provider's var name — or, in the broken case,
  a HuggingFace var name the user never set — and raised
  "<VAR> variable not found." Added an explicit cleanup branch under
  the existing skip_optional path: clear value and load_from_db when
  the field is left pointing at any unconfigured optional variable
  (cross-provider stale or pointing at the new provider's own
  unconfigured var). The user has to re-select the provider once for
  the cleanup to fire on already-bad flows; new selections come out
  clean automatically.
…rs from assistant

- Add readModelDisplayName helper at utils/modelDisplay.ts and route every
  metadata.display_name read through it (modelInputComponent's four
  ModelOption-building paths, the Settings ModelSelection row, and the
  Assistant ModelSelector's two callsites). Removes seven copies of the
  same "typeof md.display_name === string ? ... : ..." check.
- Simplify the stale not_enabled_locally cleanup in modelInputComponent
  via destructuring instead of Object.fromEntries(filter).
- Hide LOCAL_INFERENCE_PROVIDERS (HuggingFace) from the Assistant's model
  picker so its auto-default never lands on a model the assistant code
  path can't run yet, and have the Assistant render display_name with a
  fallback to the canonical id.
…lless providers

The activate/deactivate toggle for credentialless providers (HuggingFace
local) was using machinery built for credentialed providers and did
nothing useful:

- "Deactivate" called handleDisconnect, which only deletes a saved
  credential variable. HuggingFace doesn't require one and the user
  hadn't set the optional token, so the handler early-returned and the
  click was a silent no-op.
- "Activate" called handleActivateProvider, which created a fake
  HUGGINGFACEHUB_API_TOKEN with the Ollama URL placeholder
  ("http://localhost:11434") just to make a credential row exist.
- The credentialless deactivate path also routed through the
  destructive DisconnectWarning dialog inherited from the credentialed
  branch, which is over-the-top for what is really a toggle.

Add a credentialless toggle path that operates on the model layer
instead of the credential layer:

- toggleAllProviderModels(action) batches a single POST to
  /api/v1/models/enabled_models for every model the provider ships.
  ``deactivate`` flips them all off (default models land in
  ``__disabled_models__``); ``activate`` flips only catalog defaults
  back on. The provider's ``is_enabled`` flag is computed from
  has_active_model on the API side, so this drives the UI's Activate
  vs Deactivate label.
- ProviderConfigurationForm's credentialless branch now wires its
  single button to onActivateDefaultModels / onDeactivateAllModels and
  no longer pops the warning dialog.
- handleActivateProvider and the form's ``onActivate`` prop are dead
  code now that the only credentialless provider doesn't need them;
  drop both rather than leaving misleading wiring around.
Per-model switches in the provider settings panel stayed interactive even
after the user clicked Deactivate on a credentialless provider, so
toggling an individual model on silently re-activated the whole provider
without ever going through the explicit Activate button — exactly the
state the new credentialless toggle path was meant to gate.

Two narrow changes:
- ModelProvidersContent passes ``isEnabledModel = !!is_enabled`` instead
  of ``is_enabled || is_configured``. ``is_configured`` is True for
  HuggingFace from the moment the provider exists (no required
  credential to satisfy), so the OR was always true and the gate did
  nothing for the new credentialless flow. ``is_enabled`` reflects
  ``has_active_model`` on the API side, which matches what the user
  expects "is the provider on right now" to mean.
- ModelSelection's ModelRow now always renders the Switch but passes
  ``disabled={!isEnabledModel}`` rather than gating the whole render.
  The user keeps a visible read of each model's saved state and the
  built-in disabled styling (opacity-50 + cursor-not-allowed) makes the
  inactive state obvious. Toggling is blocked until the user clicks
  Activate.
…f bleeding over the panel

Bring back the destructive-action confirmation for the credentialless
Deactivate flow but render it inline where the toggle button used to
sit, sized to its own content. The existing credentialed branch still
uses the absolute inset-0 overlay because that's what its layout
budgets for; the credentialless branch never had that space allocated
and the overlay leaked across the form, the model list, and the panel
edges.

- Credentialless Deactivate button now flips ``showDisconnectWarning``
  on instead of calling ``onDeactivateAllModels`` directly. Confirm
  fires the model-toggle path, Cancel restores the button.
- The DisconnectWarning swaps in for the toggle row when shown
  (an ``if``/``else`` swap instead of overlapping render) so the form's flex flow
  stays clean — no absolute positioning, no fixed h-[165px], no
  margin-from-the-edge gymnastics.
- The credentialed branch's overlay stays put behind a
  ``requiresConfiguration`` guard so it doesn't render for the new
  credentialless path.
…rlay as Disconnect

Render exactly one DisconnectWarning for both credentialed (Disconnect)
and credentialless (Deactivate) paths. Same absolute inset-0 overlay,
same dimensions, same animation — the only difference is the message
text and which handler ``onConfirm`` calls. Eliminates the inline
variant added in the previous commit so the destructive-confirm UX is
consistent across providers.
…s ancestor

The overlay used ``absolute inset-0 m-4 ... h-[165px]`` but the form
container had no ``position`` set, so the absolute positioning resolved
against a larger ancestor (the modal panel). For credentialed providers
the form is tall enough that 165px sits inside the form area; for the
credentialless branch the form is ~100px tall and the same overlay
spilled into the per-model toggle list rendered below.

- Add ``relative`` to the form's outer div so the overlay anchors to the
  form itself.
- Drop the fixed ``h-[165px]`` and replace ``inset-0`` with
  ``inset-x-0 top-0``: the warning now sizes to its message + buttons
  via ``h-fit`` (already on DisconnectWarning) and pins to the top of
  the form. Credentialed and credentialless render identically — the
  height just adapts.
The custom DisconnectWarning was rendered as an absolutely-positioned
panel laid over the form. Sizing it required guessing how tall the host
form would be (the credentialed branch was tuned to ~165px; the new
credentialless form is ~100px so the overlay leaked into the per-model
toggle list). Repeated attempts to anchor it cleanly produced visual
regressions — empty bordered box on top, content rendering through to
the rows behind, second ghost frame at the bottom.

Step back: use the project's standard Dialog component for both branches
and stop hand-rolling overlay positioning.

- Both Disconnect (credentialed) and Deactivate (credentialless) now
  open a centered modal Dialog with header/description/footer; same
  shell, branched only on the message text and which handler the
  Confirm button calls.
- Drop the now-orphaned DisconnectWarning component and its tests —
  nothing else imported it.
…penAI's toggle list

Two regressions to undo:

1. The previous commit replaced DisconnectWarning with a brand-new
   Dialog. That broke the platform-wide pattern five other providers
   already use — there was no reason to invent a second confirm UX.
   Restore DisconnectWarning, delete the Dialog imports, and route both
   the credentialed Disconnect and the new credentialless Deactivate
   through it (same overlay, same height, same styling, branched only on
   message text and which handler Confirm calls).

2. ModelProvidersContent's per-model gate was tightened to
   ``!!is_enabled`` so credentialless providers couldn't silently
   re-activate via toggling a single model. That regressed credentialed
   providers: with all models disabled, ``is_enabled`` (=
   has_active_model) was false, the toggle list went disabled, and the
   user lost the only path to re-enable any model — looking for all the
   world like the provider had auto-disconnected. Branch on
   ``requiresConfiguration`` instead so credentialed keeps the legacy
   ``is_configured`` gate and only credentialless gets the new
   ``is_enabled`` gate.
…overlay

When DisconnectWarning is visible the credentialless form was still
~100px tall, so the next flex sibling (ModelSelection) sat under the
overlay's bottom edge — the first model row peeked out from beneath
the Cancel/Confirm buttons.

Force a min-height of 200px on the form while ``showDisconnectWarning``
is true (overlay is 165px high + 32px of inset margins ≈ 197px). The
form then occupies enough space for ModelSelection to slide down past
the overlay automatically. Credentialed forms (OpenAI, Anthropic, ...)
are already taller than 200px so the min-h is a no-op there.
…ctivated look of the others

ProviderListItem treated ``is_configured`` as "active enough to badge".
HuggingFace is configured for free (no required credential variable),
so even after Deactivate the row stayed colored and showed the model
count badge — visually it looked just like a configured cloud provider,
nothing like Anthropic / Google / Watsonx / Ollama in their
not-yet-set-up state.

Drop the ``is_configured`` fallback for credentialless providers: their
``isActive`` collapses to ``is_enabled`` (= has any active model). When
HuggingFace is deactivated the row goes grayscale + muted-text + plus
icon, consistent with the rest of the unconfigured list.

Also extract the provider-category list to ``utils/providerCategories``
and reuse it in ``use-enabled-models`` (which had its own copy of the
same set under a different name) instead of keeping two ad-hoc string
sets in sync.
…ware active flag

The provider list relied on the backend's sort_key, which puts
configured providers above unconfigured ones. HuggingFace is
configured for free (no required credential variable), so the backend
sorts it with the active providers even when the user has clicked
Deactivate — visually it sat between Anthropic and OpenAI instead of
alphabetically with the other not-yet-set-up providers.

ProviderList now re-sorts client-side using ``isCredentiallessProvider``
to decide what counts as "active": credentialed providers keep the
``is_enabled || is_configured`` definition; credentialless providers
collapse to ``is_enabled``. Active rows still come first, alphabetical
within each group, so deactivated HuggingFace ends up between Google
and IBM as expected.
…r metadata

Replace the hardcoded ``CREDENTIALLESS_PROVIDERS = {"HuggingFace"}`` set
with a check against the provider's own variables list, which the API
already exposes alongside is_enabled/is_configured. Mirrors the
backend's own ``has_required_vars`` heuristic in
``_validate_and_get_enabled_providers``: a provider is credentialless
when it declares no variables marked ``required: true``.

- ``isCredentiallessProvider(provider)`` now takes the provider object
  and inspects ``provider.variables`` (defensive: missing/empty list
  falls back to credentialed so we don't accidentally drop the legacy
  ``is_enabled || is_configured`` activation rule).
- ``ModelProviderInfo`` (the API type) and the modal's ``Provider``
  type both gain the ``variables`` array. ``ProviderList`` forwards
  it through to ``ProviderListItem``.
- Adding a new credentialless provider on the backend (e.g. another
  local-inference adapter) now lights up automatically — no second
  list to update on the frontend.
coderabbitai Bot (Contributor) commented May 4, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6fa7df24-d373-45fc-b6af-9eaf0639e2e9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This PR adds local HuggingFace GGUF model support via llama-cpp-python, including backend download infrastructure and API endpoints, plus frontend enhancements to display model display names and distinguish credentialless vs. credentialed providers.

Changes

Model Metadata & Provider Infrastructure

  • Metadata Types (src/lfx/.../model_metadata.py): ModelMetadata gains an optional display_name field; create_model_metadata accepts a display_name parameter; a new "HuggingFace" provider entry is added to MODEL_PROVIDER_METADATA.
  • Provider Variable Handling (src/lfx/.../build_config.py): apply_provider_variable_config_to_build_config accepts user_id to determine per-user optional variable configuration; adds an _is_optional_var_configured helper to skip/clear optional provider fields when unconfigured.
  • Credentials & Instantiation (src/lfx/.../credentials.py, src/lfx/.../instantiation.py): HuggingFace token validation via the whoami-v2 endpoint; get_llm exempts both Ollama and HuggingFace from the API-key requirement.
  • Model Catalog & Registry (src/lfx/.../model_catalog.py, src/lfx/.../class_registry.py, src/lfx/.../provider_queries.py): model defaults now respect the explicit default=True flag; display_name is propagated to options; ChatHuggingFace is registered in class imports; HuggingFace models are included in the catalog.
  • Dependency Configuration (src/backend/base/pyproject.toml): the complete dependency group now includes the "langflow-base[local]" extra.

HuggingFace Local Model Backend

  • GGUF Model Module (src/lfx/.../huggingface_chat_model.py, src/lfx/.../huggingface_constants.py): new module implementing a local GGUF chat adapter using llama-cpp-python, with a subprocess-based download fallback, model caching, and a curated model catalog including one default entry.
  • API Endpoints & Background Tasks (src/backend/base/langflow/api/v1/models.py): adds the module-level task-tracking set _HF_INFLIGHT_DOWNLOADS; update_enabled_models triggers best-effort background downloads via _maybe_schedule_huggingface_downloads; new GET /models/huggingface/installed and POST /models/huggingface/download endpoints with error handling.
  • Startup Warm-Up (src/backend/base/langflow/main.py): adds a _prefetch_default_huggingface_model function guarded by LANGFLOW_PREFETCH_HF_DEFAULT, integrated into the FastAPI lifespan with task creation during startup and cancellation during shutdown.

Frontend Model Display & Provider UI

  • Display Name Utilities (src/frontend/src/utils/modelDisplay.ts, src/frontend/src/utils/providerCategories.ts): new readModelDisplayName utility extracts display_name from metadata; new isCredentiallessProvider utility classifies providers based on required variables.
  • Provider API Shape (src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts, src/frontend/src/modals/modelProviderModal/components/types.ts): ModelProviderInfo adds an optional variables array for per-variable metadata; the Provider type is extended with variables: ProviderVariableInfo[]; a HuggingFace icon mapping is added.
  • Model Selection & Display (src/frontend/src/components/core/assistantPanel/components/model-selector.tsx, src/frontend/src/components/core/assistantPanel/hooks/use-enabled-models.ts, src/frontend/src/components/core/parameterRenderComponent/components/modelInputComponent/*): model display uses readModelDisplayName(metadata) with a fallback to model_name across the selector, hooks, and input components; the ModelOption type gains a display_name field; stale not_enabled_locally flags are detected and cleared.
  • Provider Management UI (src/frontend/src/modals/modelProviderModal/hooks/useProviderConfiguration.ts, src/frontend/src/modals/modelProviderModal/components/*): useProviderConfiguration replaces handleActivateProvider with handleDeactivateAllModels and handleActivateDefaultModels, which batch-update models; the provider list is now credentialless-aware for sorting and active state; ProviderConfigurationForm gains the "Runs locally — no credentials needed" text and new button routing; ProviderListItem uses credentialless logic for isActive; ModelSelection disables switches instead of hiding them for unavailable models.

Sequence Diagram

sequenceDiagram
    participant Frontend as Frontend App
    participant API as Langflow API
    participant HFHub as HuggingFace Hub
    participant LocalCache as Local Cache
    participant UI as UI Components

    rect rgba(200, 150, 255, 0.5)
    note over Frontend,UI: Model Display & Provider Classification
    Frontend->>UI: Render ModelSelector/ModelList
    activate UI
    UI->>UI: readModelDisplayName(metadata)
    UI->>UI: isCredentiallessProvider(variables)
    UI->>Frontend: display_name + isEnabled state
    deactivate UI
    end

    rect rgba(150, 200, 255, 0.5)
    note over Frontend,API: Background Download Flow
    Frontend->>API: POST /models/huggingface/download {model_id}
    activate API
    API->>API: _maybe_schedule_huggingface_downloads()
    API->>API: asyncio.to_thread(download_model)
    API-->>Frontend: {model_id, path} (async)
    API->>HFHub: hf_hub_download (background)
    HFHub->>LocalCache: Store .gguf file
    deactivate API
    end

    rect rgba(200, 255, 150, 0.5)
    note over Frontend,API: Startup Warm-Up
    API->>API: _prefetch_default_huggingface_model()
    activate API
    API->>HFHub: Download default model (task)
    HFHub->>LocalCache: Cache default .gguf
    API->>API: Log completion (non-blocking)
    deactivate API
    end

    rect rgba(255, 200, 150, 0.5)
    note over Frontend,API: Provider Activation/Deactivation
    Frontend->>API: updateEnabledModelsAsync({updates})
    activate API
    API->>API: apply_provider_variable_config (per user_id)
    API->>API: _is_optional_var_configured check
    API-->>Frontend: Success alert
    deactivate API
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

  • Test Coverage For New Implementations (❌ Error): the PR introduces 567+ lines of new functionality without corresponding test coverage for the ChatHuggingFace class, API endpoints, utility functions, prefetch functionality, and the huggingface_constants catalog. Resolution: add comprehensive test coverage for the ChatHuggingFace class, API endpoints, helper functions, prefetch functionality, and frontend utilities following project conventions.
  • Docstring Coverage (⚠️ Warning): docstring coverage is 69.44%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Test Quality And Coverage (⚠️ Warning): the PR adds 257 lines of HuggingFace model implementation with two new async API endpoints and a startup prefetch function but contains zero test coverage for any new code. Resolution: create test_huggingface_models.py for the API endpoints, test_huggingface_chat_model.py for the download/list functions, async tests for prefetch, and frontend tests following existing project patterns.
  • Test File Naming And Structure (⚠️ Warning): the PR adds 436+ lines of backend code for HuggingFace model management (2 new API endpoints, background scheduler, startup prefetch) with no corresponding test files. Resolution: add pytest files for the new API endpoints, background tasks, model caching, and error handling, plus frontend tests for the new provider components.
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title clearly describes the main feature, adding HuggingFace as a local-first model provider using GGUF/llama-cpp, and accurately reflects the primary change across backend and frontend components.
  • Linked Issues Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Excessive Mock Usage Warning (✅ Passed): the PR does not introduce excessive mock usage; the new HuggingFace functionality lacks comprehensive test coverage, so there is no mock obscuration of business logic.


@github-actions github-actions Bot added the enhancement New feature or request label May 4, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 4, 2026
@Empreiteiro Empreiteiro force-pushed the fix/hf-pr-improvements branch from fec45c3 to 4e373b3 on May 5, 2026 17:21
@github-actions github-actions Bot removed the enhancement New feature or request label May 5, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
autofix-ci Bot and others added 2 commits May 5, 2026 17:26
Drop granite-3.1-2b/8b, Qwen2.5-1.5b/3b, Hermes-3, and Phi-3.5 from
``HUGGINGFACE_MODELS_DETAILED`` so the providers modal shows just the
bundled SmolLM2 entry. Users who want a different GGUF can still
download one via ``POST /api/v1/models/huggingface/download`` with any
HF repo id; the catalog only governs what's preselectable in the UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
… selection

HuggingFace declares no required variables, so it always lands in
``enabled_providers`` even when the user hasn't configured anything.
The naive ``options[0]`` / ``flatOptions[0]`` fallbacks (backend
``update_model_options_in_build_config`` and frontend
``ModelInputComponent``) therefore let HF outrank credentialed
providers when no user default existed, and the auto-select effect
wrote that HF entry back as the saved value — overwriting what the
user actually configured.

Both call sites now prefer the first option from a credentialed
provider, falling back to the very first option only when *every*
available provider is credentialless. Existing sticky-default
behavior is untouched, so an explicit user selection still persists
across re-renders and refreshes (verified by the existing
``test_update_model_options_injects_saved_value_when_missing_from_options``
and ``test_update_model_options_does_not_duplicate_saved_value_already_in_options``).

Adds ``is_credentialless_provider`` in ``provider_queries`` (mirrors
the existing ``isCredentiallessProvider`` frontend helper) so the rule
is derived from ``MODEL_PROVIDER_METADATA`` rather than hard-coded
against HuggingFace.
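
A rough sketch of the backend rule and the fallback preference (the metadata shape and option-dict keys are assumptions; only the rule itself, prefer credentialed options and treat "no required variables" as credentialless, comes from this commit):

```python
# Hedged sketch: derive "credentialless" from provider metadata instead of hard-coding
# HuggingFace, and prefer the first option from a credentialed provider as the fallback.
def is_credentialless_provider(provider: str, provider_metadata: dict[str, dict]) -> bool:
    variables = provider_metadata.get(provider, {}).get("variables", [])
    return not any(v.get("required") for v in variables)


def pick_fallback_option(options: list[dict], provider_metadata: dict[str, dict]) -> dict | None:
    if not options:
        return None
    for opt in options:
        if not is_credentialless_provider(opt["provider"], provider_metadata):
            return opt  # first option from a credentialed provider wins
    return options[0]  # every available provider is credentialless: take the very first option
```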

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026
The previous default (SmolLM2-360M-Instruct) wasn't fine-tuned for
tool calling. The Agent component filters its model dropdown by
``tool_calling=True``, so HF-only setups with the trimmed catalog
landed on an empty Agent dropdown and the model wasn't auto-selected.

Qwen2.5-0.5B-Instruct (Q4_K_M, ~400 MB on disk) is the smallest model
in the Qwen2.5 family that ships with tool-calling fine-tuning, so the
bundled default now satisfies the Agent's filter and the dropdown
auto-populates as expected.

Updates:

  * ``DEFAULT_HUGGINGFACE_MODEL`` →
    ``bartowski/Qwen2.5-0.5B-Instruct-GGUF``
  * ``DEFAULT_GGUF_FILENAME`` →
    ``Qwen2.5-0.5B-Instruct-Q4_K_M.gguf``
  * Catalog entry: ``display_name="qwen2.5:0.5b"``,
    ``tool_calling=True``.
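
For reference, the updated catalog entry would look roughly like this (the dict shape is an assumption for illustration; the real code goes through the PR's create_model_metadata helper):

```python
# Illustrative constants and catalog entry for the new tool-calling default.
DEFAULT_HUGGINGFACE_MODEL = "bartowski/Qwen2.5-0.5B-Instruct-GGUF"
DEFAULT_GGUF_FILENAME = "Qwen2.5-0.5B-Instruct-Q4_K_M.gguf"

HUGGINGFACE_MODELS_DETAILED = [
    {
        "provider": "HuggingFace",
        "name": DEFAULT_HUGGINGFACE_MODEL,
        "display_name": "qwen2.5:0.5b",
        "tool_calling": True,  # satisfies the Agent component's dropdown filter
        "default": True,       # the single bundled toggle shown in Settings
    },
]
```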

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 5, 2026