Refactor pipeline actor: fastembed for VectorizerActor #80

@rrrodzilla

Overview

This issue tracks refactoring the VectorizerActor to support local embedding generation with fastembed (v5.2) behind an optional feature flag, while keeping the OpenAI API as the default, batteries-included backend.

Design Philosophy: Batteries Included with Privacy Option

Default: OpenAI API (batteries included, works out of the box)
Optional: fastembed local inference (feature flag for privacy-conscious users with available hardware)

This approach provides the best of both worlds:

  • New users get a working system immediately with minimal configuration
  • Privacy-conscious users can opt into local inference by enabling the local-embeddings feature flag
  • No breaking changes to existing workflows

⚠️ Developer Implementation Notes

IMPORTANT: During implementation, lean heavily on the fastembed-docs MCP server for accurate API usage:

# Query fastembed documentation via MCP
mcp__fastembed-docs__query_rust_docs("How do I initialize TextEmbedding with custom cache directory?")
mcp__fastembed-docs__query_rust_docs("What are the available EmbeddingModel variants and their dimensions?")
mcp__fastembed-docs__query_rust_docs("How do I use InitOptions to configure model downloads?")
mcp__fastembed-docs__query_rust_docs("How do I generate embeddings with batch processing?")

Why This Matters:

  • fastembed API may have changed since initial design
  • MCP server provides accurate, up-to-date documentation
  • Prevents implementation drift from actual library capabilities
  • Ensures proper error handling and edge cases are covered

Usage Pattern:

  1. Before implementing each function, query the MCP server for API details
  2. Verify enum variants and method signatures against current fastembed version
  3. Check for any initialization patterns or best practices
  4. Confirm error types and handling approaches

Why This Change

Benefits of Optional fastembed Support

  • Privacy & Security: No data sent to external services when local mode enabled
  • Offline Capability: No internet required after initial model download
  • Cost Reduction: Zero API costs for embedding generation in local mode
  • Performance: No network latency, true batch processing (up to 256 texts/batch)
  • Flexibility: Multiple model choices (BGE, MixedBread, all-MiniLM with 384-1024 dimensions)

Benefits of OpenAI as Default

  • Ease of Use: Works immediately with just an API key
  • No Setup: No model downloads, no disk space requirements
  • Consistent Quality: OpenAI embeddings are well-tested and reliable
  • Lower Barrier to Entry: Users can start immediately without hardware considerations

Affected Files

Cargo.toml

[dependencies]
# ... existing dependencies ...

# Optional local embedding support via fastembed
fastembed = { version = "5.2", optional = true }

[features]
default = []
local-embeddings = ["fastembed"]

src/actors/config.rs

Extensive VectorizeConfig updates:

  • Add EmbeddingBackend enum (Api as default, Local when feature enabled)
  • Add FastembedModel enum (BgeSmallEnV15, AllMiniLmL6V2, MxbaiEmbedLargeV1, BgeBaseEnV15, BgeLargeEnV15)
  • Add model_cache_dir field (XDG-compliant: $XDG_CACHE_HOME/crately/models)
  • Add show_download_progress field (default: true)
  • Keep existing API fields as primary configuration
  • Add vector_dimension() method (computed from backend selection)
  • Update validation for backend-specific requirements
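The config changes listed above could look roughly like the following. This is a minimal, dependency-free sketch rather than the actual crately code: the fields `api_dimension` and `local_dimension` and the helper `default_api_config()` are hypothetical names introduced for illustration, and serde derives are left out so the snippet compiles on its own.

```rust
/// Backend selector; `Local` only exists when the feature is compiled in.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EmbeddingBackend {
    #[default]
    Api,
    #[cfg(feature = "local-embeddings")]
    Local,
}

/// Trimmed-down config; fields not named in this issue are assumptions.
#[derive(Debug, Clone)]
pub struct VectorizeConfig {
    pub backend: EmbeddingBackend,
    pub api_endpoint: String,
    pub api_model_name: String,
    /// Hypothetical: dimension of the vectors returned by the API model.
    pub api_dimension: usize,
    pub batch_size: usize,
    /// Hypothetical stand-in for something like `local_model.dimension()`.
    #[cfg(feature = "local-embeddings")]
    pub local_dimension: usize,
}

impl VectorizeConfig {
    /// Dimension of the vectors the selected backend will produce.
    pub fn vector_dimension(&self) -> usize {
        match self.backend {
            EmbeddingBackend::Api => self.api_dimension,
            #[cfg(feature = "local-embeddings")]
            EmbeddingBackend::Local => self.local_dimension,
        }
    }

    /// Backend-specific validation; reports the first problem found.
    pub fn validate(&self) -> Result<(), String> {
        if self.batch_size == 0 {
            return Err("batch_size must be greater than zero".into());
        }
        match self.backend {
            EmbeddingBackend::Api => {
                if self.api_endpoint.trim().is_empty() {
                    return Err("api_endpoint is required for the API backend".into());
                }
                if self.api_model_name.trim().is_empty() {
                    return Err("api_model_name is required for the API backend".into());
                }
            }
            #[cfg(feature = "local-embeddings")]
            EmbeddingBackend::Local => {
                if self.local_dimension == 0 {
                    return Err("local model dimension must be non-zero".into());
                }
            }
        }
        Ok(())
    }
}

/// Example values mirroring the default configuration shown in this issue.
pub fn default_api_config() -> VectorizeConfig {
    VectorizeConfig {
        backend: EmbeddingBackend::default(),
        api_endpoint: "https://api.openai.com/v1/embeddings".to_string(),
        api_model_name: "text-embedding-3-small".to_string(),
        api_dimension: 1536, // default output size of text-embedding-3-small
        batch_size: 32,
        #[cfg(feature = "local-embeddings")]
        local_dimension: 384,
    }
}
```

Gating the `Local` variant itself (not just the code that handles it) means an API-only build cannot represent a local backend at all, which is one way to satisfy the "Local when feature enabled" requirement.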

src/actors/vectorizer_actor.rs

Dual-backend architecture with compile-time feature gating:

  • Update actor state: embedding_model: Option<Arc<Mutex<TextEmbedding>>> (conditional compilation)
  • Update new() constructor to initialize fastembed model when feature enabled
  • Update spawn() to handle both backends based on feature flag
  • Add vectorize_chunks() dispatcher function
  • Add vectorize_chunks_local() with true batching (32+ texts/batch) - gated by feature flag
  • Keep vectorize_chunks_api() as primary implementation (renamed from existing code)
  • Update EmbeddingError with fastembed-specific variants (conditional)
  • Add spawn_blocking for sync fastembed API in async context (feature-gated)
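As a rough illustration of the batching that vectorize_chunks_local() would perform, the sketch below splits the input into batches and makes one model call per batch. It is synchronous and uses only the standard library: the `embed_batch` closure is a hypothetical stand-in for the real fastembed call, and in the actor this loop would presumably run inside spawn_blocking.

```rust
/// Embed `texts` in batches of at most `batch_size`, calling `embed_batch`
/// once per batch. `embed_batch` is a stand-in for the real model call.
pub fn embed_in_batches<F>(
    texts: &[String],
    batch_size: usize,
    mut embed_batch: F,
) -> Result<Vec<Vec<f32>>, String>
where
    F: FnMut(&[String]) -> Result<Vec<Vec<f32>>, String>,
{
    // `chunks(0)` would panic, so reject a zero batch size up front.
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_string());
    }

    let mut out = Vec::with_capacity(texts.len());
    for batch in texts.chunks(batch_size) {
        let vectors = embed_batch(batch)?;
        // Guard against a backend returning fewer or more vectors than inputs.
        if vectors.len() != batch.len() {
            return Err(format!(
                "expected {} vectors, got {}",
                batch.len(),
                vectors.len()
            ));
        }
        out.extend(vectors);
    }
    Ok(out)
}
```

With 70 texts and a batch size of 32, the closure is invoked three times (32, 32, and 6 texts), and the output keeps the input order.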

Implementation Strategy

Phase 1: Feature Flag Infrastructure

  1. Add fastembed as optional dependency in Cargo.toml
  2. Create local-embeddings feature flag
  3. Add conditional compilation attributes throughout
  4. Use MCP: Query fastembed-docs for correct dependency configuration

Phase 2: Configuration Updates

  1. Add new enums and types to config.rs with feature gates
  2. Extend VectorizeConfig with local-specific fields (behind feature flag)
  3. Add FastembedModel enum with dimension() and conversion methods
  4. Implement vector_dimension() that works with both backends
  5. Update config validation for dual-backend support
  6. Use MCP: Verify EmbeddingModel enum variants and their properties

Phase 3: VectorizerActor Refactoring

  1. Add conditional embedding_model field to actor state
  2. Update new() constructor with feature-gated initialization
  3. Implement vectorize_chunks_local() behind feature flag
  4. Refactor existing code into vectorize_chunks_api()
  5. Create dispatcher that selects backend based on config + feature flag
  6. Update error types with feature-gated variants
  7. Use MCP: Query for TextEmbedding initialization, InitOptions configuration, and embed() method usage

Phase 4: Testing & Documentation

  1. Add unit tests for both backends
  2. Add integration tests with feature flag variations
  3. Update config.toml examples with feature flag documentation
  4. Add migration guide for users wanting to switch to local embeddings
  5. Document hardware requirements and model sizes
  6. Use MCP: Verify error handling patterns and edge cases

Configuration Examples

Default (OpenAI API)

[pipeline.vectorize]
# Works out of the box with API key in environment
# No feature flag needed
api_endpoint = "https://api.openai.com/v1/embeddings"
api_model_name = "text-embedding-3-small"
batch_size = 32

Optional Local Embeddings (Requires Feature Flag)

# In Cargo.toml:
# crately = { version = "0.1.0", features = ["local-embeddings"] }

[pipeline.vectorize]
backend = "local"  # Only available with local-embeddings feature
local_model = "bge-small-en-v15"
model_cache_dir = "/home/user/.cache/crately/models"
show_download_progress = true
batch_size = 32
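When model_cache_dir is not set explicitly, the XDG-compliant default ($XDG_CACHE_HOME/crately/models) has to be resolved at runtime. The sketch below shows one way to do that under the XDG Base Directory rules, falling back to $HOME/.cache when XDG_CACHE_HOME is unset, empty, or not absolute. The function names are hypothetical, and the pure function takes the environment values as parameters so it can be tested without touching the process environment.

```rust
use std::path::{Path, PathBuf};

/// Resolve the model cache directory: use `$XDG_CACHE_HOME` when it is an
/// absolute path, otherwise fall back to `$HOME/.cache`. Returns `None`
/// when neither value is usable.
pub fn model_cache_dir(xdg_cache_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = xdg_cache_home
        .map(|s| Path::new(s))
        // An empty or relative value is treated as unset.
        .filter(|p| p.is_absolute())
        .map(|p| p.to_path_buf())
        .or_else(|| {
            home.filter(|h| !h.is_empty())
                .map(|h| Path::new(h).join(".cache"))
        })?;
    Some(base.join("crately").join("models"))
}

/// Convenience wrapper that reads the real environment variables.
pub fn model_cache_dir_from_env() -> Option<PathBuf> {
    let xdg = std::env::var("XDG_CACHE_HOME").ok();
    let home = std::env::var("HOME").ok();
    model_cache_dir(xdg.as_deref(), home.as_deref())
}
```

For a user whose home directory is /home/user and who has no XDG_CACHE_HOME set, this yields /home/user/.cache/crately/models, matching the path in the example config above.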

Testing Requirements

Without Feature Flag (Default)

  • Verify fastembed code not included in binary
  • Test API-based embedding generation works as before
  • Verify config validation for API mode
  • Test error handling for missing API key
  • Integration test with OpenAI API

With Feature Flag

  • Test fastembed model initialization
  • Test batch processing with real fastembed model
  • Test model download on first use
  • Test backend switching in config
  • Unit test FastembedModel dimensions
  • Verify config validation for local mode
  • Integration test with local embeddings

Both Modes

  • Zero clippy lints in both configurations
  • All tests pass with and without feature flag
  • Documentation builds correctly
  • Binary size comparison (with vs without feature)

Migration Path

For Existing Users

No action required - OpenAI API remains default, existing configurations continue to work unchanged.

For Users Wanting Local Embeddings

  1. Add feature flag to Cargo.toml: features = ["local-embeddings"]
  2. Update config.toml: set backend = "local" and configure model
  3. Rebuild and restart the service to use local embeddings
  4. On the first run, the model is downloaded to the cache (~100-500MB)

Success Criteria

  • fastembed added as optional dependency
  • local-embeddings feature flag created
  • VectorizeConfig extended with local-specific fields (feature-gated)
  • VectorizerActor supports both backends with compile-time selection
  • All helper functions implemented and documented
  • Comprehensive unit tests pass for both configurations
  • Integration tests verify both backends
  • Zero clippy lints in both build configurations
  • Documentation updated with feature flag usage
  • Binary compiles successfully without feature flag (API-only)
  • Binary compiles successfully with feature flag (dual-backend)
  • MCP Usage: All fastembed implementation verified against fastembed-docs MCP server

Architecture Compliance

  • Actor Factory Pattern: spawn() method maintains consistency
  • Message Organization: No new messages required
  • No Unsafe Code: no unsafe code is added to this crate; fastembed is used only through its safe Rust API
  • Feature Flags: Proper conditional compilation for optional functionality
  • Minimal Mutex Use: Arc<Mutex<>> is used only to wrap the fastembed model (sync library in async context); no other mutexes are introduced
  • Comprehensive Docs: All functions fully documented
  • Complete Implementation: No placeholders or TODOs
  • Error Handling: Result types with anyhow context
  • Testing: Coverage for both feature configurations
  • MCP Integration: fastembed-docs MCP server used throughout implementation

Technical Details

FastembedModel Enum

#[cfg(feature = "local-embeddings")]
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum FastembedModel {
    BgeSmallEnV15,       // 384 dim, default, fast
    AllMiniLmL6V2,       // 384 dim, very fast
    MxbaiEmbedLargeV1,   // 1024 dim, high quality
    BgeBaseEnV15,        // 768 dim, balanced
    BgeLargeEnV15,       // 1024 dim, high quality
}
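The dimension() method mentioned in Phase 2 could be a plain match over the variants, using the dimensions from the comments above. The sketch below omits the serde derives and the feature gate so it compiles standalone. The `config_name()` helper is a hypothetical addition that spells out the strings `rename_all = "kebab-case"` is expected to produce for each variant; these should be verified against serde before relying on them.

```rust
/// Standalone copy of the enum (serde derives and feature gate omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FastembedModel {
    #[default]
    BgeSmallEnV15,
    AllMiniLmL6V2,
    MxbaiEmbedLargeV1,
    BgeBaseEnV15,
    BgeLargeEnV15,
}

impl FastembedModel {
    /// Embedding dimension, taken from the comments in this issue.
    pub fn dimension(self) -> usize {
        match self {
            Self::BgeSmallEnV15 | Self::AllMiniLmL6V2 => 384,
            Self::BgeBaseEnV15 => 768,
            Self::MxbaiEmbedLargeV1 | Self::BgeLargeEnV15 => 1024,
        }
    }

    /// Hypothetical helper: the kebab-case name expected in config.toml.
    pub fn config_name(self) -> &'static str {
        match self {
            Self::BgeSmallEnV15 => "bge-small-en-v15",
            Self::AllMiniLmL6V2 => "all-mini-lm-l6-v2",
            Self::MxbaiEmbedLargeV1 => "mxbai-embed-large-v1",
            Self::BgeBaseEnV15 => "bge-base-en-v15",
            Self::BgeLargeEnV15 => "bge-large-en-v15",
        }
    }
}
```

Because the match is exhaustive, adding a new model variant later fails to compile until its dimension is filled in, which keeps vector_dimension() from silently returning a wrong size.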

Conditional Compilation Pattern

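#[cfg(feature = "local-embeddings")]
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
// Assumption: std's Mutex, locked only inside spawn_blocking.
#[cfg(feature = "local-embeddings")]
use std::sync::{Arc, Mutex};

pub struct VectorizerActor {
    config: VectorizeConfig,
    // Field exists only when the feature is compiled in.
    #[cfg(feature = "local-embeddings")]
    embedding_model: Option<Arc<Mutex<TextEmbedding>>>,
}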

Backend Dispatcher

async fn vectorize_chunks(...) -> Result<u32> {
    #[cfg(feature = "local-embeddings")]
    if config.backend == EmbeddingBackend::Local {
        return vectorize_chunks_local(...).await;
    }
    
    vectorize_chunks_api(...).await
}
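A variation on the dispatcher above, sketched synchronously and with the standard library only, uses an exhaustive match with a feature-gated arm instead of a gated early return. The function bodies are hypothetical stand-ins that just count chunks, and the parameter lists are assumptions, since the real signatures are elided in this issue. The sketch also shows a feature-gated error variant of the kind planned for EmbeddingError.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingBackend {
    Api,
    #[cfg(feature = "local-embeddings")]
    Local,
}

#[derive(Debug, PartialEq)]
pub enum EmbeddingError {
    /// The API request could not be made or failed.
    Api(String),
    /// Local inference failed; exists only with the feature enabled.
    #[cfg(feature = "local-embeddings")]
    Local(String),
}

/// Stand-in for the API path: reports how many chunks were vectorized.
fn vectorize_chunks_api(chunks: &[String]) -> Result<u32, EmbeddingError> {
    if chunks.len() > u32::MAX as usize {
        return Err(EmbeddingError::Api("too many chunks".to_string()));
    }
    Ok(chunks.len() as u32)
}

/// Stand-in for the local path; compiled only with the feature enabled.
#[cfg(feature = "local-embeddings")]
fn vectorize_chunks_local(chunks: &[String]) -> Result<u32, EmbeddingError> {
    if chunks.len() > u32::MAX as usize {
        return Err(EmbeddingError::Local("too many chunks".to_string()));
    }
    Ok(chunks.len() as u32)
}

/// Dispatch to the selected backend. Without the feature, the `Local` arm
/// and variant do not exist, so the match is exhaustive with `Api` alone.
pub fn vectorize_chunks(
    backend: EmbeddingBackend,
    chunks: &[String],
) -> Result<u32, EmbeddingError> {
    match backend {
        EmbeddingBackend::Api => vectorize_chunks_api(chunks),
        #[cfg(feature = "local-embeddings")]
        EmbeddingBackend::Local => vectorize_chunks_local(chunks),
    }
}
```

Compared with the early-return form, the match makes the compiler flag any backend variant that is added later without a corresponding dispatch arm. Either shape keeps the API path free of fastembed code when the feature is off.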

MCP Server Queries for Implementation

Phase 1 - Dependency Setup:

Query: "What is the correct Cargo.toml syntax for adding fastembed as an optional dependency?"
Query: "What features does fastembed expose and which should be default?"

Phase 2 - Model Configuration:

Query: "List all available EmbeddingModel enum variants in fastembed with their dimensions"
Query: "How do I convert between custom enum types and fastembed's EmbeddingModel?"

Phase 3 - Initialization & Usage:

Query: "Show the complete initialization pattern for TextEmbedding with InitOptions including cache directory"
Query: "What is the correct method signature for generating embeddings with batch size?"
Query: "How do I handle model downloads and what errors can occur?"
Query: "What is the structure of embedding vectors returned by fastembed?"

Phase 4 - Error Handling:

Query: "What error types does fastembed use and how should they be handled?"
Query: "What can go wrong during model initialization and embedding generation?"

Notes

This issue focuses exclusively on VectorizerActor refactoring. ProcessorActor refactoring with text-splitter + tiktoken-rs is tracked in a separate issue.

The feature flag approach ensures:

  • Zero impact on existing users
  • Optional privacy-focused local embeddings
  • Smaller binary size when feature not needed
  • Clear separation of concerns
  • Easy testing of both configurations

Implementation Mandate: Use the fastembed-docs MCP server extensively throughout implementation to ensure accuracy and prevent API drift from the design phase.

Metadata

Labels: enhancement (New feature or request)
