Refactor pipeline actor: fastembed for VectorizerActor (#80)

## Overview

This issue tracks refactoring the VectorizerActor to support local embedding generation with `fastembed` (v5.2), an optional dependency enabled through a feature flag. The OpenAI API remains the default, batteries-included backend.

## Design Philosophy: Batteries Included with Privacy Option

**Default**: OpenAI API (batteries included, works out of the box)  
**Optional**: fastembed local inference (feature flag for privacy-conscious users with available hardware)

This approach provides the best of both worlds:
- New users get a working system immediately with minimal configuration
- Privacy-conscious users can opt into local inference by enabling the `local-embeddings` feature flag
- No breaking changes to existing workflows

## ⚠️ Developer Implementation Notes

**IMPORTANT**: During implementation, lean heavily on the `fastembed-docs` MCP server for accurate API usage:

```
# Query fastembed documentation via MCP
mcp__fastembed-docs__query_rust_docs("How do I initialize TextEmbedding with custom cache directory?")
mcp__fastembed-docs__query_rust_docs("What are the available EmbeddingModel variants and their dimensions?")
mcp__fastembed-docs__query_rust_docs("How do I use InitOptions to configure model downloads?")
mcp__fastembed-docs__query_rust_docs("How do I generate embeddings with batch processing?")
```

**Why This Matters**:
- fastembed API may have changed since initial design
- MCP server provides accurate, up-to-date documentation
- Prevents implementation drift from actual library capabilities
- Ensures proper error handling and edge cases are covered

**Usage Pattern**:
1. Before implementing each function, query the MCP server for API details
2. Verify enum variants and method signatures against current fastembed version
3. Check for any initialization patterns or best practices
4. Confirm error types and handling approaches

## Why This Change

### Benefits of Optional fastembed Support
- **Privacy & Security**: No data sent to external services when local mode enabled
- **Offline Capability**: No internet required after initial model download
- **Cost Reduction**: Zero API costs for embedding generation in local mode
- **Performance**: No network latency, and true batch processing (fastembed's default batch size is 256 texts)
- **Flexibility**: Multiple model choices (BGE, MixedBread, all-MiniLM with 384-1024 dimensions)

### Benefits of OpenAI as Default
- **Ease of Use**: Works immediately with just an API key
- **No Setup**: No model downloads, no disk space requirements
- **Consistent Quality**: OpenAI embeddings are well-tested and reliable
- **Lower Barrier to Entry**: Users can start immediately without hardware considerations

## Affected Files

### Cargo.toml
```toml
[dependencies]
# ... existing dependencies ...

# Optional local embedding support via fastembed
fastembed = { version = "5.2", optional = true }

[features]
default = []
local-embeddings = ["fastembed"]
```

### src/actors/config.rs
Extensive VectorizeConfig updates:
- Add `EmbeddingBackend` enum (Api as default, Local when feature enabled)
- Add `FastembedModel` enum (BgeSmallEnV15, AllMiniLmL6V2, MxbaiEmbedLargeV1, BgeBaseEnV15, BgeLargeEnV15)
- Add `model_cache_dir` field (XDG-compliant: $XDG_CACHE_HOME/crately/models)
- Add `show_download_progress` field (default: true)
- Keep existing API fields as primary configuration
- Add `vector_dimension()` method (computed from backend selection)
- Update validation for backend-specific requirements
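
A minimal sketch of how `vector_dimension()` and backend-specific validation might fit together. The `local_dimension` field is a hypothetical stand-in for the selected `FastembedModel`, the error strings are illustrative, and the `#[cfg(feature = "local-embeddings")]` gates are omitted so the snippet compiles on its own. The OpenAI dimensions are the default output sizes for those models; unknown model names return `None` rather than guessing.

```rust
/// Which backend produces embeddings (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EmbeddingBackend {
    /// Remote OpenAI-compatible API; the default.
    #[default]
    Api,
    /// Local fastembed inference; would be gated by `local-embeddings`.
    Local,
}

#[derive(Debug, Clone)]
pub struct VectorizeConfig {
    pub backend: EmbeddingBackend,
    pub api_endpoint: String,
    pub api_model_name: String,
    /// Assumption: stands in for `FastembedModel::dimension()`.
    pub local_dimension: usize,
    pub batch_size: usize,
}

impl VectorizeConfig {
    /// Vector size implied by the selected backend, if it is known.
    pub fn vector_dimension(&self) -> Option<usize> {
        match self.backend {
            EmbeddingBackend::Local => Some(self.local_dimension),
            EmbeddingBackend::Api => match self.api_model_name.as_str() {
                "text-embedding-3-small" | "text-embedding-ada-002" => Some(1536),
                "text-embedding-3-large" => Some(3072),
                _ => None,
            },
        }
    }

    /// Checks only the fields that the selected backend actually uses.
    pub fn validate(&self) -> Result<(), String> {
        if self.batch_size == 0 {
            return Err("batch_size must be greater than zero".to_string());
        }
        match self.backend {
            EmbeddingBackend::Api if self.api_endpoint.is_empty() => {
                Err("api_endpoint is required for the API backend".to_string())
            }
            EmbeddingBackend::Api if self.api_model_name.is_empty() => {
                Err("api_model_name is required for the API backend".to_string())
            }
            EmbeddingBackend::Local if self.local_dimension == 0 => {
                Err("local model dimension must be non-zero".to_string())
            }
            _ => Ok(()),
        }
    }
}
```

Returning `Option<usize>` keeps custom API model names usable without hard-coding a dimension for them.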

### src/actors/vectorizer_actor.rs
Dual-backend architecture with compile-time feature gating:
- Update actor state: `embedding_model: Option<Arc<Mutex<TextEmbedding>>>` (conditional compilation)
- Update `new()` constructor to initialize fastembed model when feature enabled
- Update `spawn()` to handle both backends based on feature flag
- Add `vectorize_chunks()` dispatcher function
- Add `vectorize_chunks_local()` with true batching (32+ texts/batch) - gated by feature flag
- Keep `vectorize_chunks_api()` as primary implementation (renamed from existing code)
- Update `EmbeddingError` with fastembed-specific variants (conditional)
- Add `spawn_blocking` for sync fastembed API in async context (feature-gated)

## Implementation Strategy

### Phase 1: Feature Flag Infrastructure
1. Add fastembed as optional dependency in Cargo.toml
2. Create `local-embeddings` feature flag
3. Add conditional compilation attributes throughout
4. **Use MCP**: Query fastembed-docs for correct dependency configuration

### Phase 2: Configuration Updates
1. Add new enums and types to config.rs with feature gates
2. Extend VectorizeConfig with local-specific fields (behind feature flag)
3. Add FastembedModel enum with dimension() and conversion methods
4. Implement vector_dimension() that works with both backends
5. Update config validation for dual-backend support
6. **Use MCP**: Verify EmbeddingModel enum variants and their properties

### Phase 3: VectorizerActor Refactoring
1. Add conditional embedding_model field to actor state
2. Update new() constructor with feature-gated initialization
3. Implement vectorize_chunks_local() behind feature flag
4. Refactor existing code into vectorize_chunks_api()
5. Create dispatcher that selects backend based on config + feature flag
6. Update error types with feature-gated variants
7. **Use MCP**: Query for TextEmbedding initialization, InitOptions configuration, and embed() method usage
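
For step 6, one possible shape for the extended error type is sketched below. Every variant name here is hypothetical, since the existing `EmbeddingError` is not shown in this issue. In the real code the two local variants would carry `#[cfg(feature = "local-embeddings")]`; the gates are left out so the snippet compiles standalone.

```rust
use std::fmt;

/// Hypothetical error type; the variant names are illustrative only.
#[derive(Debug)]
pub enum EmbeddingError {
    /// Stand-in for an existing API-backend failure.
    ApiRequest { status: u16, message: String },
    /// Model files could not be downloaded or loaded (would be feature-gated).
    ModelInit(String),
    /// Local inference failed (would be feature-gated).
    LocalInference(String),
    /// The backend returned vectors of an unexpected size.
    DimensionMismatch { expected: usize, actual: usize },
}

impl fmt::Display for EmbeddingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ApiRequest { status, message } => {
                write!(f, "embedding API request failed with status {status}: {message}")
            }
            Self::ModelInit(reason) => {
                write!(f, "failed to initialize local embedding model: {reason}")
            }
            Self::LocalInference(reason) => {
                write!(f, "local embedding inference failed: {reason}")
            }
            Self::DimensionMismatch { expected, actual } => {
                write!(f, "embedding dimension mismatch: expected {expected}, got {actual}")
            }
        }
    }
}

impl std::error::Error for EmbeddingError {}
```

Because the type implements `std::error::Error` and is `Send + Sync + 'static`, it converts into `anyhow::Error` through `?`, which fits the existing anyhow-based handling.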

### Phase 4: Testing & Documentation
1. Add unit tests for both backends
2. Add integration tests with feature flag variations
3. Update config.toml examples with feature flag documentation
4. Add migration guide for users wanting to switch to local embeddings
5. Document hardware requirements and model sizes
6. **Use MCP**: Verify error handling patterns and edge cases

## Configuration Examples

### Default (OpenAI API)
```toml
[pipeline.vectorize]
# Works out of the box with API key in environment
# No feature flag needed
api_endpoint = "https://api.openai.com/v1/embeddings"
api_model_name = "text-embedding-3-small"
batch_size = 32
```

### Optional Local Embeddings (Requires Feature Flag)
```toml
# In Cargo.toml:
# crately = { version = "0.1.0", features = ["local-embeddings"] }

[pipeline.vectorize]
backend = "local"  # Only available with local-embeddings feature
local_model = "bge-small-en-v15"
model_cache_dir = "/home/user/.cache/crately/models"
show_download_progress = true
batch_size = 32
```
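
The `model_cache_dir` above is an explicit path. When the field is left unset, a default could be derived from the XDG convention (`$XDG_CACHE_HOME`, falling back to `$HOME/.cache`). The helper below is a sketch under that assumption. The function name is hypothetical, and it takes the environment values as arguments so it can be tested without touching the process environment. It does not reject relative paths, which a strict XDG implementation should ignore.

```rust
use std::path::PathBuf;

/// Resolves `<cache base>/crately/models`, where the cache base is
/// `xdg_cache_home` if set and non-empty, otherwise `<home>/.cache`.
/// Returns `None` when neither value is usable.
pub fn default_model_cache_dir(
    xdg_cache_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    let base = match xdg_cache_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Unset or empty XDG_CACHE_HOME falls back to $HOME/.cache.
        _ => PathBuf::from(home.filter(|h| !h.is_empty())?).join(".cache"),
    };
    Some(base.join("crately").join("models"))
}
```

A caller might pass `std::env::var("XDG_CACHE_HOME").ok().as_deref()` and `std::env::var("HOME").ok().as_deref()` as the two arguments.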

## Testing Requirements

### Without Feature Flag (Default)
- [ ] Verify fastembed code not included in binary
- [ ] Test API-based embedding generation works as before
- [ ] Verify config validation for API mode
- [ ] Test error handling for missing API key
- [ ] Integration test with OpenAI API

### With Feature Flag
- [ ] Test fastembed model initialization
- [ ] Test batch processing with real fastembed model
- [ ] Test model download on first use
- [ ] Test backend switching in config
- [ ] Unit test FastembedModel dimensions
- [ ] Verify config validation for local mode
- [ ] Integration test with local embeddings

### Both Modes
- [ ] Zero clippy lints in both configurations
- [ ] All tests pass with and without feature flag
- [ ] Documentation builds correctly
- [ ] Binary size comparison (with vs without feature)

## Migration Path

### For Existing Users
**No action required** - OpenAI API remains default, existing configurations continue to work unchanged.

### For Users Wanting Local Embeddings
1. Enable the feature flag in Cargo.toml (`features = ["local-embeddings"]`) and rebuild
2. Update config.toml: set `backend = "local"` and choose a model
3. Restart the service to use local embeddings
4. On first run, the model is downloaded to the cache (~100-500MB)

## Success Criteria

- [ ] fastembed added as optional dependency
- [ ] `local-embeddings` feature flag created
- [ ] VectorizeConfig extended with local-specific fields (feature-gated)
- [ ] VectorizerActor supports both backends with compile-time selection
- [ ] All helper functions implemented and documented
- [ ] Comprehensive unit tests pass for both configurations
- [ ] Integration tests verify both backends
- [ ] Zero clippy lints in both build configurations
- [ ] Documentation updated with feature flag usage
- [ ] Binary compiles successfully without feature flag (API-only)
- [ ] Binary compiles successfully with feature flag (dual-backend)
- [ ] **MCP Usage**: All fastembed implementation verified against fastembed-docs MCP server

## Architecture Compliance

✅ **Actor Factory Pattern**: spawn() method maintains consistency  
✅ **Message Organization**: No new messages required  
✅ **No Unsafe Code**: No `unsafe` added in this crate; fastembed is used through its safe Rust API  
✅ **Feature Flags**: Proper conditional compilation for optional functionality  
✅ **No Mutexes**: Sole exception is `Arc<Mutex<TextEmbedding>>` for fastembed (sync library in async context)  
✅ **Comprehensive Docs**: All functions fully documented  
✅ **Complete Implementation**: No placeholders or TODOs  
✅ **Error Handling**: Result types with anyhow context  
✅ **Testing**: Coverage for both feature configurations  
✅ **MCP Integration**: fastembed-docs MCP server used throughout implementation

## Technical Details

### FastembedModel Enum
```rust
#[cfg(feature = "local-embeddings")]
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum FastembedModel {
    BgeSmallEnV15,       // 384 dim, default, fast
    AllMiniLmL6V2,       // 384 dim, very fast
    MxbaiEmbedLargeV1,   // 1024 dim, high quality
    BgeBaseEnV15,        // 768 dim, balanced
    BgeLargeEnV15,       // 1024 dim, high quality
}
```
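
The `dimension()` method planned in Phase 2 could look like the sketch below, which takes its values from the comments in the enum above. The serde derives and the feature gate are omitted so the snippet compiles standalone, and `Default` is derived on the assumption that `BgeSmallEnV15` is the intended default. The conversion to fastembed's own `EmbeddingModel` is left out on purpose, since its variant names should be verified against the crate docs.

```rust
/// Standalone copy of the enum, without serde or feature gating.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FastembedModel {
    #[default]
    BgeSmallEnV15,
    AllMiniLmL6V2,
    MxbaiEmbedLargeV1,
    BgeBaseEnV15,
    BgeLargeEnV15,
}

impl FastembedModel {
    /// Output vector size, as listed in this issue.
    pub const fn dimension(self) -> usize {
        match self {
            Self::BgeSmallEnV15 | Self::AllMiniLmL6V2 => 384,
            Self::BgeBaseEnV15 => 768,
            Self::MxbaiEmbedLargeV1 | Self::BgeLargeEnV15 => 1024,
        }
    }
}
```

An exhaustive `match` means that adding a model variant later fails to compile until its dimension is supplied.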

### Conditional Compilation Pattern
```rust
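#[cfg(feature = "local-embeddings")]
use std::sync::{Arc, Mutex};

#[cfg(feature = "local-embeddings")]
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

pub struct VectorizerActor {
    config: VectorizeConfig,
    /// Local model handle; the field exists only when the feature is enabled.
    #[cfg(feature = "local-embeddings")]
    embedding_model: Option<Arc<Mutex<TextEmbedding>>>,
}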
```

### Backend Dispatcher
```rust
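// Parameter lists are elided (`...`) in this sketch.
async fn vectorize_chunks(...) -> Result<u32> {
    // Attributes are not allowed directly on `if` expressions,
    // so the feature-gated branch is wrapped in a block.
    #[cfg(feature = "local-embeddings")]
    {
        if config.backend == EmbeddingBackend::Local {
            return vectorize_chunks_local(...).await;
        }
    }

    // Default path: API backend.
    vectorize_chunks_api(...).await
}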
```

## MCP Server Queries for Implementation

**Phase 1 - Dependency Setup**:
```
Query: "What is the correct Cargo.toml syntax for adding fastembed as an optional dependency?"
Query: "What features does fastembed expose and which should be default?"
```

**Phase 2 - Model Configuration**:
```
Query: "List all available EmbeddingModel enum variants in fastembed with their dimensions"
Query: "How do I convert between custom enum types and fastembed's EmbeddingModel?"
```

**Phase 3 - Initialization & Usage**:
```
Query: "Show the complete initialization pattern for TextEmbedding with InitOptions including cache directory"
Query: "What is the correct method signature for generating embeddings with batch size?"
Query: "How do I handle model downloads and what errors can occur?"
Query: "What is the structure of embedding vectors returned by fastembed?"
```

**Phase 4 - Error Handling**:
```
Query: "What error types does fastembed use and how should they be handled?"
Query: "What can go wrong during model initialization and embedding generation?"
```

## References

- **fastembed-docs MCP Server**: Primary reference for implementation (USE THIS HEAVILY!)
- fastembed docs: https://docs.rs/fastembed/5.2.0
- fastembed repo: https://github.com/Anush008/fastembed-rs
- Architecture guides: .agents/important-info/
- Related issue (ProcessorActor): #79

## Notes

This issue focuses **exclusively** on VectorizerActor refactoring. ProcessorActor refactoring with text-splitter + tiktoken-rs is tracked in a separate issue.

The feature flag approach ensures:
- Zero impact on existing users
- Optional privacy-focused local embeddings
- Smaller binary size when feature not needed
- Clear separation of concerns
- Easy testing of both configurations

**Implementation Mandate**: Use the fastembed-docs MCP server extensively throughout implementation to ensure accuracy and prevent API drift from the design phase.
