Configuration for collection source (bucket(s) or collection).

Collections can process data from two types of sources:

1. **Bucket source**: process raw objects from one or more buckets (first-stage processing)
   - Use this to create your initial collections from uploaded data
   - Can specify multiple buckets to consolidate data from different sources
   - All buckets must have compatible schemas (validated at creation)
   - Example: videos from multiple regions → frame extraction collection
2. **Collection source**: process documents from another collection (decomposition trees)
   - Use this to create multi-stage processing pipelines
   - Example: frames collection → scene detection collection

Multi-bucket requirements:
- All buckets must have compatible schemas (same fields, types, and required status)
- Schema compatibility is validated when the collection is created
- Documents track which specific bucket they came from via `root_bucket_id`
- Useful for consolidating data from multiple regions, teams, or environments

The source determines:
- What data the feature extractor receives as input
- The `input_schema` available for `input_mappings` and `field_passthrough`
- The lineage tracking in output documents

Examples:
- Single bucket: `{"type": "bucket", "bucket_ids": ["bkt_products"]}`
- Multi-bucket: `{"type": "bucket", "bucket_ids": ["bkt_us", "bkt_eu", "bkt_asia"]}`
- Collection: `{"type": "collection", "collection_id": "col_frames"}`
| Name | Type | Description | Notes |
|---|---|---|---|
| type | SourceType | REQUIRED. Type of source for this collection. 'bucket': Process objects from one or more buckets (first-stage processing). 'collection': Process documents from another collection (downstream processing). Use 'bucket' for initial data ingestion, 'collection' for decomposition trees. | |
| bucket_ids | List[str] | List of bucket IDs when type='bucket'. REQUIRED when type='bucket'. NOT ALLOWED when type='collection'. Can specify one or more buckets to process. Single bucket: Use array with one element ['bkt_id']. Multiple buckets: All buckets MUST have compatible schemas. Schema compatibility validated at collection creation. Compatible schemas have: 1) Same field names, 2) Same field types, 3) Same required status. Documents will include root_bucket_id to track which bucket they came from. Use cases: multi-region data, multi-team consolidation, environment aggregation. | [optional] |
| source_namespace_id | str | Namespace ID where the source buckets reside. Use this to process buckets from a different namespace within the same organization. When omitted, buckets are looked up in the current (collection's) namespace. Only valid when type='bucket'. | [optional] |
| collection_id | str | Collection ID when type='collection' (single collection). Use this OR collection_ids (not both). REQUIRED when type='collection' and processing single collection. NOT ALLOWED when type='bucket'. The collection will process documents from this upstream collection. The upstream collection's output_schema becomes this collection's input_schema. This enables decomposition trees (multi-stage pipelines). Example: Process frames collection → create scenes collection. | [optional] |
| collection_ids | List[str] | List of collection IDs when type='collection' (multiple collections). Use this OR collection_id (not both). REQUIRED when type='collection' and processing multiple collections. NOT ALLOWED when type='bucket'. Used for operations that consolidate multiple upstream collections. Example: Clustering across multiple collections → cluster output collection. All collections must have compatible schemas for consolidation operations. | [optional] |
| inherited_bucket_ids | List[str] | List of original bucket IDs that source collections originated from. OPTIONAL. Only used when type='collection'. Tracks the complete lineage chain: buckets → collections → derived collections. Extracted from upstream collection metadata at collection creation time. Enables tracing derived collections (like cluster outputs) back to original data sources. Example: Cluster output collection inherits bucket IDs from its source collections. Format: List of bucket IDs with 'bkt_' prefix. | [optional] |
| source_filters | SourceFiltersOutput | Optional filters to apply to source data. When specified, only objects/documents matching these filters will be processed by this collection. Filters are evaluated at batch creation time. Uses same LogicalOperator model as list APIs for consistency. | [optional] |
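The compatibility rule described for `bucket_ids` (same field names, types, and required status) can be sketched in plain Python. This is an illustration only, assuming each schema is represented as a dict of field name → `(type, required)` pairs; the authoritative validation happens server-side at collection creation:

```python
def schemas_compatible(schemas: list[dict]) -> bool:
    """Return True if every schema has identical fields, types, and required flags.

    Each schema is assumed (for this sketch) to be a dict mapping
    field name -> (type_name, required).
    """
    if not schemas:
        return True
    first = schemas[0]
    return all(schema == first for schema in schemas[1:])

# Hypothetical schemas for three regional buckets:
us = {"video": ("file", True), "region": ("string", True)}
eu = {"video": ("file", True), "region": ("string", True)}
asia = {"video": ("file", True), "region": ("string", False)}  # 'required' differs

print(schemas_compatible([us, eu]))        # True  - identical schemas
print(schemas_compatible([us, eu, asia]))  # False - 'region' required flag differs
```

A mismatch in any of the three dimensions (name, type, required status) makes the buckets incompatible, which is why the `asia` schema above fails the check.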
```python
from mixpeek.models.source_config_output import SourceConfigOutput

# TODO update the JSON string below
json = "{}"
# create an instance of SourceConfigOutput from a JSON string
source_config_output_instance = SourceConfigOutput.from_json(json)
# print the JSON string representation of the object
print(source_config_output_instance.to_json())
# convert the object into a dict
source_config_output_dict = source_config_output_instance.to_dict()
# create an instance of SourceConfigOutput from a dict
source_config_output_from_dict = SourceConfigOutput.from_dict(source_config_output_dict)
```
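The mutual-exclusion rules in the properties table (`bucket_ids` only with `type='bucket'`; exactly one of `collection_id` or `collection_ids` with `type='collection'`) can be checked client-side before sending a config. This is a minimal sketch over plain dicts, not part of the SDK — the server performs the authoritative validation:

```python
def check_source_config(cfg: dict) -> list[str]:
    """Return a list of rule violations for a source config dict (sketch only)."""
    errors = []
    source_type = cfg.get("type")
    if source_type == "bucket":
        if not cfg.get("bucket_ids"):
            errors.append("bucket_ids is required when type='bucket'")
        if "collection_id" in cfg or "collection_ids" in cfg:
            errors.append("collection fields are not allowed when type='bucket'")
    elif source_type == "collection":
        if "bucket_ids" in cfg:
            errors.append("bucket_ids is not allowed when type='collection'")
        # exactly one of collection_id / collection_ids must be set
        if bool(cfg.get("collection_id")) == bool(cfg.get("collection_ids")):
            errors.append("exactly one of collection_id or collection_ids is required")
    else:
        errors.append("type must be 'bucket' or 'collection'")
    return errors

print(check_source_config({"type": "bucket", "bucket_ids": ["bkt_us", "bkt_eu"]}))  # []
print(check_source_config({"type": "collection"}))  # missing collection_id/collection_ids
```

The same shapes shown in the examples above (single bucket, multi-bucket, single collection) all pass this check; a config that mixes bucket and collection fields does not.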