Configuration for a feature extractor with field passthrough support.

A feature extractor processes source data (from buckets or collections) and produces features (embeddings, extracted text, detected objects, etc.). With field passthrough, you can also include selected source fields in the output documents alongside the computed features.

Core Concepts:

1. Feature Extraction: Extractors compute features from input data (e.g., text → embeddings, image → detections, video → scenes).
2. Field Passthrough: Selectively preserve source fields in the output (e.g., title, category, campaign_id from source → output documents).
3. Output Schema: Combination of passed-through fields + extractor outputs (e.g., {title, category, text_embedding} all in one document).

How Field Passthrough Works:

1. Define which source fields to include via the field_passthrough list.
2. During processing, these fields are extracted from the source.
3. They appear in output documents at the root level.
4. They are combined with extractor outputs to form complete documents.
5. Use target_path to rename fields for cleaner schemas.

Field Selection Modes:

- Explicit (field_passthrough + include_all_source_fields=False): Only listed fields pass through. Clean, controlled output. Example: passthrough=[title, category] → output has ONLY title, category, embedding.
- Inclusive (include_all_source_fields=True): All source fields pass through; field_passthrough entries are used for renaming. Example: source has 10 fields → output has all 10 + embedding.
- None (no field_passthrough): Only extractor outputs appear in documents. Example: output has ONLY embedding (no source fields).

Use Cases:

- Preserve Identifiers: Keep campaign_id, product_sku, order_id for tracking.
- Enable Filtering: Pass category, status, department for query filters.
- Maintain Context: Include title, description for display.
- Track Metadata: Preserve author, created_at, source for lineage.
- Business Logic: Keep priority, region, type for application logic.

Common Patterns:

1. Minimal Passthrough (recommended): field_passthrough=[{"source_path": "id"}], include_all_source_fields=False → Clean output, only ID + extractor features.
2. Metadata Preservation: field_passthrough=[{"source_path": "title"}, {"source_path": "category"}, {"source_path": "created_at"}] → Document has context for display and filtering.
3. Field Renaming: field_passthrough=[{"source_path": "doc_title", "target_path": "title"}, {"source_path": "metadata.author", "target_path": "author"}] → Cleaner output schema with flattened fields.
4. Required Fields: field_passthrough=[{"source_path": "campaign_id", "required": True}, {"source_path": "priority", "default": 0}] → Ensures critical fields are always present.

Requirements:

- feature_extractor_name: REQUIRED - name of the extractor
- version: REQUIRED - extractor version (e.g., "v1")
- parameters: NOT REQUIRED - extractor-specific config (model, thresholds, etc.)
- input_mappings: NOT REQUIRED - maps extractor inputs to source fields
- field_passthrough: NOT REQUIRED - which source fields to preserve (default: none)
- include_all_source_fields: NOT REQUIRED - preserve all fields (default: false)

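
The selection modes and patterns above can be sketched in plain Python. This is only an illustration of the described behavior, not the service's implementation: `apply_passthrough`, `get_path`, and the `_MISSING` sentinel are hypothetical names, and the handling of dotted paths and renamed fields in inclusive mode is an assumption inferred from the examples in this document.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'metadata.author'; return _MISSING if absent."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def apply_passthrough(
    source: Dict[str, Any],
    extractor_output: Dict[str, Any],
    field_passthrough: List[Dict[str, Any]],
    include_all_source_fields: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: merge selected source fields with extractor outputs."""
    # Inclusive mode starts from every source field; explicit mode starts empty.
    output: Dict[str, Any] = dict(source) if include_all_source_fields else {}
    for rule in field_passthrough:
        source_path = rule["source_path"]
        target = rule.get("target_path") or source_path
        value = get_path(source, source_path)
        if value is _MISSING:
            if rule.get("required", False):
                raise KeyError(f"required field missing: {source_path}")
            if "default" not in rule:
                continue  # optional field without a default is omitted
            value = rule["default"]
        if target != source_path:
            output.pop(source_path, None)  # assumed: a renamed field drops its old name
        output[target] = value
    output.update(extractor_output)  # extractor features sit next to the source fields
    return output


source = {"a": 1, "b": 2, "c": 3, "d": 4}
features = {"embedding": [0.1, 0.2]}

# Explicit mode: only the listed fields plus the extractor output.
explicit = apply_passthrough(source, features, [{"source_path": "a"}, {"source_path": "b"}])

# Inclusive mode: every source field, with 'a' renamed to 'x'.
inclusive = apply_passthrough(
    source,
    features,
    [{"source_path": "a", "target_path": "x"}],
    include_all_source_fields=True,
)
```

Here `explicit` contains the keys a, b, and embedding, while `inclusive` contains x, b, c, d, and embedding, matching the examples given for the two modes.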
| Name | Type | Description | Notes |
|---|---|---|---|
| feature_extractor_name | str | Name of the feature extractor | |
| version | str | Version of the feature extractor (e.g., 'v1', 'v2') | |
| params | Dict[str, object] | Optional extractor parameters that affect vector index configuration. Parameters set here are locked at namespace creation and determine vector dimensions in Qdrant. Collections using this extractor must use compatible params. Example: {'model': 'siglip_base'} | [optional] |
| parameters | Parameters | Extractor-specific configuration (model, thresholds, etc.) | [optional] |
| input_mappings | InputMappings | Maps extractor inputs to source fields | [optional] |
| field_passthrough | List[FieldPassthrough] | NOT REQUIRED. List of specific fields to pass through from source to output documents. These fields are included alongside extractor-computed features (embeddings, detections, etc.). An empty list (the default) yields only extractor outputs in documents; with entries, documents contain the specified fields + extractor outputs. How it works: (1) during processing, fields are extracted from the source object/document; (2) they appear in output documents at the root level; (3) field filtering happens automatically (only listed fields are included); (4) use target_path to rename fields for cleaner schemas. Common use cases: preserve identifiers (campaign_id, product_sku, order_id); keep metadata (category, tags, author, created_at); enable filtering (department, status, priority, region); maintain context (title, description, source_url). Behavior: with include_all_source_fields=False (default), ONLY these fields are included; with include_all_source_fields=True, these entries are used for renaming/defaults; fields must exist in the source bucket_schema or the upstream collection output_schema; missing optional fields are omitted (unless a default is provided); missing required fields cause processing errors. Output schema: output_schema = field_passthrough fields + extractor output fields. Example: ['title', 'category', 'text_extractor_v1_embedding'] | [optional] |
| include_all_source_fields | bool | NOT REQUIRED. Whether to include ALL fields from the source object/document in the output. Default: False (only field_passthrough fields are included). When False (RECOMMENDED): only fields listed in field_passthrough are included in the output; output schemas stay clean and predictable; unwanted fields cannot leak; output = field_passthrough fields + extractor outputs. When True (USE WITH CAUTION): ALL source fields are included in output documents; field_passthrough is still used for renaming/defaults/requirements; documents can become large if the source has many fields; sensitive or unnecessary data can leak; output = all source fields + extractor outputs. Use True when: you want to preserve complete source data; the source has limited, well-defined fields; downstream processing needs all context. Use False when (MOST CASES): you want clean, controlled output schemas; the source has many fields you don't need; you want explicit field selection; you're concerned about document size. Examples: False: source={a,b,c,d} + passthrough=[a,b] → output={a,b,embedding}. True: source={a,b,c,d} + passthrough=[a→x] → output={x,b,c,d,embedding} | [optional] [default to False] |
| feature_extractor_id | str | Unique identifier for the feature extractor instance, constructed from name + version. | [readonly] |
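
As a rough illustration of the output schema rule in the table (passthrough fields + extractor output fields), the sketch below computes the expected output field names for both values of include_all_source_fields. The function `output_schema_fields` and its arguments are hypothetical and not part of the SDK; the assumption that a renamed field replaces its source name in inclusive mode comes from the a→x example in the table.

```python
from typing import Dict, List, Optional


def output_schema_fields(
    field_passthrough: List[Dict[str, str]],
    extractor_outputs: List[str],
    source_fields: Optional[List[str]] = None,
    include_all_source_fields: bool = False,
) -> List[str]:
    """Hypothetical helper: list the field names expected in output documents."""
    # Map each source field to its output name (target_path wins when present).
    renames = {
        rule["source_path"]: rule.get("target_path") or rule["source_path"]
        for rule in field_passthrough
    }
    if include_all_source_fields:
        # Every source field is kept; renamed ones appear under their new name.
        names = [renames.get(field, field) for field in (source_fields or [])]
    else:
        # Only the listed fields are kept.
        names = list(renames.values())
    return names + list(extractor_outputs)


explicit_schema = output_schema_fields(
    [{"source_path": "title"}, {"source_path": "category"}],
    ["text_extractor_v1_embedding"],
)

inclusive_schema = output_schema_fields(
    [{"source_path": "a", "target_path": "x"}],
    ["embedding"],
    source_fields=["a", "b", "c", "d"],
    include_all_source_fields=True,
)
```

With these inputs, `explicit_schema` reproduces the example list from the field_passthrough row, and `inclusive_schema` lists x, b, c, d, and embedding.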

from mixpeek.models.shared_collection_features_extractors_models_feature_extractor_config_output import SharedCollectionFeaturesExtractorsModelsFeatureExtractorConfigOutput

# illustrative values only; replace with your extractor's name and version
json_str = '{"feature_extractor_name": "text_extractor", "version": "v1"}'
# create an instance of SharedCollectionFeaturesExtractorsModelsFeatureExtractorConfigOutput from a JSON string
config_instance = SharedCollectionFeaturesExtractorsModelsFeatureExtractorConfigOutput.from_json(json_str)
# print the JSON string representation of the object (called on the instance, not the class)
print(config_instance.to_json())
# convert the object into a dict
config_dict = config_instance.to_dict()
# create an instance of SharedCollectionFeaturesExtractorsModelsFeatureExtractorConfigOutput from a dict
config_from_dict = SharedCollectionFeaturesExtractorsModelsFeatureExtractorConfigOutput.from_dict(config_dict)
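
The JSON string passed to `from_json` in the example above can be generated from a plain dict rather than written by hand. The sketch below uses only the standard `json` module and combines the renaming, required, and default patterns described earlier. The extractor name `text_extractor` and the field names are illustrative assumptions, and whether the model accepts every key exactly as written should be checked against the FieldPassthrough model.

```python
import json

# Illustrative configuration; names and values are assumptions, not SDK defaults.
config = {
    "feature_extractor_name": "text_extractor",
    "version": "v1",
    "field_passthrough": [
        {"source_path": "campaign_id", "required": True},      # must exist in the source
        {"source_path": "doc_title", "target_path": "title"},  # renamed in the output
        {"source_path": "priority", "default": 0},             # falls back to 0 when missing
    ],
    "include_all_source_fields": False,  # explicit mode: only the listed fields pass through
}

# Serialize to the JSON string form and parse it back to confirm it round-trips.
config_json = json.dumps(config)
round_tripped = json.loads(config_json)
```

Keeping the configuration as a dict until the last step makes it easier to build field_passthrough entries programmatically, for example from a list of field names.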