Overview
This issue tracks the refactoring of the VectorizerActor to support local embedding generation using fastembed (v5.2) as an optional feature flag, while maintaining OpenAI API as the default batteries-included approach.
Design Philosophy: Batteries Included with Privacy Option
Default: OpenAI API (batteries included, works out of the box)
Optional: fastembed local inference (feature flag for privacy-conscious users with available hardware)
This approach provides the best of both worlds:
- New users get a working system immediately with minimal configuration
- Privacy-conscious users can opt into local inference by enabling the local-embeddings feature flag
- No breaking changes to existing workflows
⚠️ Developer Implementation Notes
IMPORTANT: During implementation, lean heavily on the fastembed-docs MCP server for accurate API usage:
# Query fastembed documentation via MCP
mcp__fastembed-docs__query_rust_docs("How do I initialize TextEmbedding with custom cache directory?")
mcp__fastembed-docs__query_rust_docs("What are the available EmbeddingModel variants and their dimensions?")
mcp__fastembed-docs__query_rust_docs("How do I use InitOptions to configure model downloads?")
mcp__fastembed-docs__query_rust_docs("How do I generate embeddings with batch processing?")
Why This Matters:
- fastembed API may have changed since initial design
- MCP server provides accurate, up-to-date documentation
- Prevents implementation drift from actual library capabilities
- Ensures proper error handling and edge cases are covered
Usage Pattern:
- Before implementing each function, query the MCP server for API details
- Verify enum variants and method signatures against current fastembed version
- Check for any initialization patterns or best practices
- Confirm error types and handling approaches
Why This Change
Benefits of Optional fastembed Support
- Privacy & Security: No data sent to external services when local mode enabled
- Offline Capability: No internet required after initial model download
- Cost Reduction: Zero API costs for embedding generation in local mode
- Performance: No network latency, true batch processing (up to 256 texts/batch)
- Flexibility: Multiple model choices (BGE, MixedBread, all-MiniLM with 384-1024 dimensions)
Benefits of OpenAI as Default
- Ease of Use: Works immediately with just an API key
- No Setup: No model downloads, no disk space requirements
- Consistent Quality: OpenAI embeddings are well-tested and reliable
- Lower Barrier to Entry: Users can start immediately without hardware considerations
Affected Files
Cargo.toml
[dependencies]
# ... existing dependencies ...
# Optional local embedding support via fastembed
fastembed = { version = "5.2", optional = true }
[features]
default = []
local-embeddings = ["fastembed"]
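With the feature declared as above, the rest of the crate can ask at compile time whether local support was built in. The following is a minimal sketch only; the helper names `local_embeddings_enabled` and `available_backends` are hypothetical and do not exist in the codebase.

```rust
/// Reports whether this build includes the optional local backend.
/// `cfg!` is evaluated at compile time, so the result is a constant.
pub fn local_embeddings_enabled() -> bool {
    cfg!(feature = "local-embeddings")
}

/// Hypothetical helper: names of the backends this build can serve.
/// The API backend is always present; "local" appears only when the
/// `local-embeddings` feature was enabled at build time.
pub fn available_backends() -> Vec<&'static str> {
    let mut backends = vec!["api"];
    if local_embeddings_enabled() {
        backends.push("local");
    }
    backends
}
```

A check like this could back a startup log line or a clearer error message when a config file requests a backend that the binary was not built with.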
src/actors/config.rs
Extensive VectorizeConfig updates:
- Add EmbeddingBackend enum (Api as default, Local when feature enabled)
- Add FastembedModel enum (BgeSmallEnV15, AllMiniLmL6V2, MxbaiEmbedLargeV1, BgeBaseEnV15, BgeLargeEnV15)
- Add model_cache_dir field (XDG-compliant: $XDG_CACHE_HOME/crately/models)
- Add show_download_progress field (default: true)
- Keep existing API fields as primary configuration
- Add vector_dimension() method (computed from backend selection)
- Update validation for backend-specific requirements
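The backend enum and the computed dimension listed above could look roughly like the sketch below. It is an illustration under assumptions, not the final design: the `api_dimension` and `local_dimension` fields are hypothetical stand-ins (in the real config the local value would presumably come from the selected FastembedModel), and feature gating of the Local variant is left out so the sketch compiles on its own.

```rust
/// Which embedding backend to use. Api is the default, as described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EmbeddingBackend {
    #[default]
    Api,
    /// In the real crate this variant would sit behind the
    /// `local-embeddings` feature; the gate is omitted in this sketch.
    Local,
}

/// Trimmed-down stand-in for VectorizeConfig (hypothetical fields).
pub struct VectorizeConfig {
    pub backend: EmbeddingBackend,
    /// Dimension of the vectors returned by the API model (assumed field).
    pub api_dimension: usize,
    /// Dimension of the selected local model (assumed field).
    pub local_dimension: usize,
}

impl VectorizeConfig {
    /// Vector dimension computed from the backend selection.
    pub fn vector_dimension(&self) -> usize {
        match self.backend {
            EmbeddingBackend::Api => self.api_dimension,
            EmbeddingBackend::Local => self.local_dimension,
        }
    }
}
```

Deriving the dimension from the backend in one place keeps downstream storage code from hard-coding a vector size that changes when the backend does.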
src/actors/vectorizer_actor.rs
Dual-backend architecture with compile-time feature gating:
- Update actor state: embedding_model: Option<Arc<Mutex<TextEmbedding>>> (conditional compilation)
- Update new() constructor to initialize fastembed model when feature enabled
- Update spawn() to handle both backends based on feature flag
- Add vectorize_chunks() dispatcher function
- Add vectorize_chunks_local() with true batching (32+ texts/batch) - gated by feature flag
- Keep vectorize_chunks_api() as primary implementation (renamed from existing code)
- Update EmbeddingError with fastembed-specific variants (conditional)
- Add spawn_blocking for sync fastembed API in async context (feature-gated)
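The batching and blocking-offload items above can be sketched without any external crates. In this illustration the `SyncEmbedder` trait and `LengthEmbedder` type are hypothetical stand-ins for a synchronous model (they are not the fastembed API), and `std::thread::spawn` stands in for tokio's `spawn_blocking`; the shape of the code, not the exact calls, is the point.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for a synchronous embedding model (hypothetical trait).
pub trait SyncEmbedder: Send + 'static {
    fn embed_batch(&mut self, texts: &[String]) -> Vec<Vec<f32>>;
}

/// Toy embedder: maps each text to a one-element vector holding its byte length.
pub struct LengthEmbedder;

impl SyncEmbedder for LengthEmbedder {
    fn embed_batch(&mut self, texts: &[String]) -> Vec<Vec<f32>> {
        texts.iter().map(|t| vec![t.len() as f32]).collect()
    }
}

/// Embeds `texts` in batches of `batch_size` on a separate thread, so the
/// caller's thread is not tied up by the synchronous model.
pub fn embed_in_batches<E: SyncEmbedder>(
    model: Arc<Mutex<E>>,
    texts: Vec<String>,
    batch_size: usize,
) -> Result<Vec<Vec<f32>>, String> {
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_string());
    }
    // In the actor this would be tokio::task::spawn_blocking (assumption).
    let handle = thread::spawn(move || -> Result<Vec<Vec<f32>>, String> {
        let mut guard = model
            .lock()
            .map_err(|_| "embedding model mutex poisoned".to_string())?;
        let mut vectors = Vec::with_capacity(texts.len());
        // Each slice of up to `batch_size` texts goes to the model in one call.
        for batch in texts.chunks(batch_size) {
            vectors.extend(guard.embed_batch(batch));
        }
        Ok(vectors)
    });
    handle
        .join()
        .map_err(|_| "embedding thread panicked".to_string())?
}
```

Holding the mutex for the whole job keeps the sketch simple; locking per batch instead would let other callers interleave, at the cost of more lock traffic.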
Implementation Strategy
Phase 1: Feature Flag Infrastructure
- Add fastembed as optional dependency in Cargo.toml
- Create local-embeddings feature flag
- Add conditional compilation attributes throughout
- Use MCP: Query fastembed-docs for correct dependency configuration
Phase 2: Configuration Updates
- Add new enums and types to config.rs with feature gates
- Extend VectorizeConfig with local-specific fields (behind feature flag)
- Add FastembedModel enum with dimension() and conversion methods
- Implement vector_dimension() that works with both backends
- Update config validation for dual-backend support
- Use MCP: Verify EmbeddingModel enum variants and their properties
Phase 3: VectorizerActor Refactoring
- Add conditional embedding_model field to actor state
- Update new() constructor with feature-gated initialization
- Implement vectorize_chunks_local() behind feature flag
- Refactor existing code into vectorize_chunks_api()
- Create dispatcher that selects backend based on config + feature flag
- Update error types with feature-gated variants
- Use MCP: Query for TextEmbedding initialization, InitOptions configuration, and embed() method usage
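For the feature-gated error variants mentioned in Phase 3, one possible shape is sketched below. The variant names `ApiRequest`, `ModelInit`, and `LocalInference` are hypothetical; the real EmbeddingError already exists in the codebase and its variants may differ.

```rust
use std::fmt;

/// Sketch of an error type whose local-only variants disappear
/// when the `local-embeddings` feature is disabled.
#[derive(Debug)]
pub enum EmbeddingError {
    /// Failure talking to the embeddings API (hypothetical variant).
    ApiRequest(String),
    /// Local model could not be loaded or downloaded (hypothetical variant).
    #[cfg(feature = "local-embeddings")]
    ModelInit(String),
    /// Local inference failed (hypothetical variant).
    #[cfg(feature = "local-embeddings")]
    LocalInference(String),
}

impl fmt::Display for EmbeddingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ApiRequest(msg) => write!(f, "embedding API request failed: {msg}"),
            // Match arms must carry the same gate as the variants they name.
            #[cfg(feature = "local-embeddings")]
            Self::ModelInit(msg) => write!(f, "local model initialization failed: {msg}"),
            #[cfg(feature = "local-embeddings")]
            Self::LocalInference(msg) => write!(f, "local inference failed: {msg}"),
        }
    }
}

impl std::error::Error for EmbeddingError {}
```

Because the type implements `std::error::Error`, it can still be wrapped with anyhow context at call sites, in line with the error-handling criterion listed later in this issue.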
Phase 4: Testing & Documentation
- Add unit tests for both backends
- Add integration tests with feature flag variations
- Update config.toml examples with feature flag documentation
- Add migration guide for users wanting to switch to local embeddings
- Document hardware requirements and model sizes
- Use MCP: Verify error handling patterns and edge cases
Configuration Examples
Default (OpenAI API)
[pipeline.vectorize]
# Works out of the box with API key in environment
# No feature flag needed
api_endpoint = "https://api.openai.com/v1/embeddings"
api_model_name = "text-embedding-3-small"
batch_size = 32
Optional Local Embeddings (Requires Feature Flag)
# In Cargo.toml:
# crately = { version = "0.1.0", features = ["local-embeddings"] }
[pipeline.vectorize]
backend = "local" # Only available with local-embeddings feature
local_model = "bge-small-en-v15"
model_cache_dir = "/home/user/.cache/crately/models"
show_download_progress = true
batch_size = 32
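When model_cache_dir is not set explicitly, the XDG-compliant default named earlier ($XDG_CACHE_HOME/crately/models) has to be resolved somewhere. A minimal sketch follows; the function name `resolve_model_cache_dir` is hypothetical, and the environment values are passed in as arguments so the logic can be tested without touching the real environment. The fallback to $HOME/.cache when XDG_CACHE_HOME is unset or empty follows the XDG Base Directory convention.

```rust
use std::path::PathBuf;

/// Resolves the default model cache directory.
///
/// `xdg_cache_home` and `home` are the values of the XDG_CACHE_HOME and
/// HOME environment variables, if set. Returns None when neither yields
/// a usable base directory.
pub fn resolve_model_cache_dir(
    xdg_cache_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    let base = match xdg_cache_home {
        // A non-empty XDG_CACHE_HOME wins.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Otherwise fall back to $HOME/.cache, if HOME is usable.
        _ => PathBuf::from(home.filter(|h| !h.is_empty())?).join(".cache"),
    };
    Some(base.join("crately").join("models"))
}
```

A caller would typically pass `std::env::var("XDG_CACHE_HOME").ok().as_deref()` and the same for HOME.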
Testing Requirements
Without Feature Flag (Default)
With Feature Flag
Both Modes
Migration Path
For Existing Users
No action required - OpenAI API remains default, existing configurations continue to work unchanged.
For Users Wanting Local Embeddings
- Add feature flag to Cargo.toml: features = ["local-embeddings"]
- Update config.toml: set backend = "local" and configure model
- First run downloads model to cache (~100-500MB)
- Restart service to use local embeddings
Success Criteria
- local-embeddings feature flag created
Architecture Compliance
✅ Actor Factory Pattern: spawn() method maintains consistency
✅ Message Organization: No new messages required
✅ No Unsafe Code: fastembed uses safe Rust
✅ Feature Flags: Proper conditional compilation for optional functionality
✅ No Mutexes: the only exception is the Arc<Mutex<>> wrapping the fastembed model (sync library in async context)
✅ Comprehensive Docs: All functions fully documented
✅ Complete Implementation: No placeholders or TODOs
✅ Error Handling: Result types with anyhow context
✅ Testing: Coverage for both feature configurations
✅ MCP Integration: fastembed-docs MCP server used throughout implementation
Technical Details
FastembedModel Enum
use serde::{Deserialize, Serialize};

#[cfg(feature = "local-embeddings")]
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum FastembedModel {
    BgeSmallEnV15,     // 384 dim, default, fast
    AllMiniLmL6V2,     // 384 dim, very fast
    MxbaiEmbedLargeV1, // 1024 dim, high quality
    BgeBaseEnV15,      // 768 dim, balanced
    BgeLargeEnV15,     // 1024 dim, high quality
}
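The dimension() and conversion methods planned for this enum might look like the sketch below. The dimensions are taken from the comments on the enum in this issue. The string names other than bge-small-en-v15 (which appears in the config example) are assumptions that follow the kebab-case rename, and should be checked against what serde actually produces. The serde derives and the feature gate are omitted so the sketch stands alone.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FastembedModel {
    #[default]
    BgeSmallEnV15,
    AllMiniLmL6V2,
    MxbaiEmbedLargeV1,
    BgeBaseEnV15,
    BgeLargeEnV15,
}

impl FastembedModel {
    /// Embedding dimension, as listed in the enum comments of this issue.
    pub fn dimension(self) -> usize {
        match self {
            Self::BgeSmallEnV15 | Self::AllMiniLmL6V2 => 384,
            Self::BgeBaseEnV15 => 768,
            Self::MxbaiEmbedLargeV1 | Self::BgeLargeEnV15 => 1024,
        }
    }
}

impl FromStr for FastembedModel {
    type Err = String;

    /// Parses the config-file spelling of a model name (assumed spellings).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "bge-small-en-v15" => Ok(Self::BgeSmallEnV15),
            "all-mini-lm-l6-v2" => Ok(Self::AllMiniLmL6V2),
            "mxbai-embed-large-v1" => Ok(Self::MxbaiEmbedLargeV1),
            "bge-base-en-v15" => Ok(Self::BgeBaseEnV15),
            "bge-large-en-v15" => Ok(Self::BgeLargeEnV15),
            other => Err(format!("unknown local model: {other}")),
        }
    }
}
```

The mapping from this enum to fastembed's own EmbeddingModel variants is deliberately left out, since those variant names should be verified against the fastembed documentation first.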
Conditional Compilation Pattern
#[cfg(feature = "local-embeddings")]
use std::sync::{Arc, Mutex};
#[cfg(feature = "local-embeddings")]
use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};

pub struct VectorizerActor {
    config: VectorizeConfig,
    #[cfg(feature = "local-embeddings")]
    embedding_model: Option<Arc<Mutex<TextEmbedding>>>,
}
Backend Dispatcher
async fn vectorize_chunks(...) -> Result<u32> {
    #[cfg(feature = "local-embeddings")]
    if config.backend == EmbeddingBackend::Local {
        return vectorize_chunks_local(...).await;
    }
    vectorize_chunks_api(...).await
}
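One detail the dispatcher may need to settle is what happens when the config requests the local backend but the binary was built without the feature. As written above, the call would fall through to the API path. The sketch below shows one alternative, returning an error instead; this is a suggestion rather than a decided behavior. It is synchronous, uses placeholder function bodies that only count chunks, and does not gate the Local variant, all of which are simplifications.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingBackend {
    Api,
    Local,
}

/// Placeholder for the API path: pretends every chunk was vectorized.
fn vectorize_chunks_api(chunks: &[&str]) -> Result<u32, String> {
    Ok(chunks.len() as u32)
}

/// Placeholder for the local path; only compiled with the feature.
#[cfg(feature = "local-embeddings")]
fn vectorize_chunks_local(chunks: &[&str]) -> Result<u32, String> {
    Ok(chunks.len() as u32)
}

/// Selects a backend from the config value and the compiled feature set.
pub fn vectorize_chunks(backend: EmbeddingBackend, chunks: &[&str]) -> Result<u32, String> {
    #[cfg(feature = "local-embeddings")]
    if backend == EmbeddingBackend::Local {
        return vectorize_chunks_local(chunks);
    }

    // Without the feature, refuse rather than silently sending data to the API.
    #[cfg(not(feature = "local-embeddings"))]
    if backend == EmbeddingBackend::Local {
        return Err("backend = \"local\" requires the local-embeddings feature".to_string());
    }

    vectorize_chunks_api(chunks)
}
```

Failing loudly here matters mostly for the privacy use case: a user who set backend = "local" would likely not expect their text to reach an external service.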
MCP Server Queries for Implementation
Phase 1 - Dependency Setup:
Query: "What is the correct Cargo.toml syntax for adding fastembed as an optional dependency?"
Query: "What features does fastembed expose and which should be default?"
Phase 2 - Model Configuration:
Query: "List all available EmbeddingModel enum variants in fastembed with their dimensions"
Query: "How do I convert between custom enum types and fastembed's EmbeddingModel?"
Phase 3 - Initialization & Usage:
Query: "Show the complete initialization pattern for TextEmbedding with InitOptions including cache directory"
Query: "What is the correct method signature for generating embeddings with batch size?"
Query: "How do I handle model downloads and what errors can occur?"
Query: "What is the structure of embedding vectors returned by fastembed?"
Phase 4 - Error Handling:
Query: "What error types does fastembed use and how should they be handled?"
Query: "What can go wrong during model initialization and embedding generation?"
References
Notes
This issue focuses exclusively on VectorizerActor refactoring. ProcessorActor refactoring with text-splitter + tiktoken-rs is tracked in a separate issue.
The feature flag approach ensures:
- Zero impact on existing users
- Optional privacy-focused local embeddings
- Smaller binary size when feature not needed
- Clear separation of concerns
- Easy testing of both configurations
Implementation Mandate: Use the fastembed-docs MCP server extensively throughout implementation to ensure accuracy and prevent API drift from the design phase.