CreateCollectionRequest

Request model for creating a new collection. Collections process data from buckets or other collections using a single feature extractor.

KEY ARCHITECTURAL CHANGE: Each collection has EXACTLY ONE feature extractor.

- Use field_passthrough to include additional source fields in output documents
- Multiple extractors = multiple collections
- This simplifies processing and makes output schema deterministic

CRITICAL: To use input_mappings:

1. Your source bucket MUST have a bucket_schema defined
2. The input_mappings reference fields from that bucket_schema
3. The system validates that mapped fields exist in the source schema

Example workflow:

1. Create bucket with schema: { "properties": { "image": {"type": "image"}, "title": {"type": "string"} } }
2. Upload objects conforming to that schema
3. Create collection with:
   - input_mappings: { "image": "image" }
   - field_passthrough: [{"source_path": "title"}]
4. Output documents will have both extractor outputs AND passthrough fields

Schema Computation:

- output_schema is computed IMMEDIATELY when collection is created
- output_schema = field_passthrough fields + extractor output fields
- No waiting for documents to be processed!
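The validation and schema-computation rules above can be sketched with plain dictionaries. This is an illustration only, not the service's implementation: the helper names `missing_mapped_fields` and `compute_output_schema` are hypothetical, the `embedding` output field is a made-up stand-in for whatever the chosen extractor produces, and the sketch assumes the values of input_mappings name fields in the bucket_schema.

```python
# Sketch only: plain dicts stand in for the real request models.
bucket_schema = {
    "properties": {
        "image": {"type": "image"},
        "title": {"type": "string"},
    }
}
input_mappings = {"image": "image"}
field_passthrough = [{"source_path": "title"}]

# Hypothetical extractor output; the real fields depend on the extractor.
extractor_output_fields = {"embedding": {"type": "array"}}


def missing_mapped_fields(mappings, schema):
    """Return mapped source fields that are absent from the schema."""
    known = set(schema["properties"])
    return sorted(src for src in mappings.values() if src not in known)


def compute_output_schema(schema, passthrough, extractor_fields):
    """Combine passthrough fields with extractor output fields."""
    properties = {}
    for entry in passthrough:
        name = entry["source_path"]
        properties[name] = schema["properties"][name]
    properties.update(extractor_fields)
    return {"properties": properties}


missing = missing_mapped_fields(input_mappings, bucket_schema)
output_schema = compute_output_schema(
    bucket_schema, field_passthrough, extractor_output_fields
)
print(missing)
print(sorted(output_schema["properties"]))
```

Because both inputs are known at creation time, the combined schema can be produced without processing any documents, which is the point the Schema Computation notes make.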

Properties

| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| collection_name | str | Name of the collection to create | |
| description | str | Description of the collection | [optional] |
| source | SourceConfigInput | Source configuration (bucket or collection) for this collection | |
| input_schema | BucketSchemaInput | Input schema for the collection. If not provided, inferred from source bucket's bucket_schema or source collection's output_schema. REQUIRED for input_mappings to work - defines what fields can be mapped to feature extractors. | [optional] |
| feature_extractor | SharedCollectionFeaturesExtractorsModelsFeatureExtractorConfigInput | Single feature extractor for this collection. Use field_passthrough within the extractor config to include additional source fields. For multiple extractors, create multiple collections and use collection-to-collection pipelines. | |
| enabled | bool | Whether the collection is enabled | [optional] [default to True] |
| metadata | Dict[str, object] | Additional metadata for the collection | [optional] |
| schedule | CollectionScheduleConfig | Optional schedule for automatic re-processing. Creates a COLLECTION_TRIGGER trigger behind the scenes. Supports cron and interval schedules. | [optional] |
| taxonomy_applications | List[TaxonomyApplicationConfigInput] | Optional taxonomy applications to automatically enrich documents in this collection. Each taxonomy will classify/enrich documents based on configured retriever matches. | [optional] |
| cluster_applications | List[ClusterApplicationConfig] | Optional cluster applications to automatically execute when batch processing completes. Each cluster enriches documents with cluster assignments (cluster_id, cluster_label, etc.). | [optional] |
| alert_applications | List[AlertApplicationConfigInput] | Optional alert applications to automatically execute when documents are ingested. Each alert runs a retriever against new documents and sends notifications if matches are found. Supports both ON_INGEST (triggered per batch) and SCHEDULED (periodic) execution modes. | [optional] |
| retriever_enrichments | List[RetrieverEnrichmentConfigInput] | Optional retriever enrichments to run on documents during post-processing. Each enrichment executes a retriever pipeline and writes selected result fields back to the document. Use for: LLM classification, cross-collection joins, multi-stage enrichment at ingestion time. | [optional] |
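As a rough illustration of how these properties fit together, the sketch below assembles a request payload as a plain dict and checks that the three fields not marked [optional] are present. The nested shapes of `source` and `feature_extractor` are assumptions made for this example (including `type`, `bucket_id`, `feature_extractor_name`, and the placement of `input_mappings`); consult the SourceConfigInput and feature extractor config models for the actual fields.

```python
import json

# Fields not marked [optional] in the properties table.
REQUIRED_FIELDS = ("collection_name", "source", "feature_extractor")

payload = {
    "collection_name": "product_images",
    "description": "Image features with passthrough titles",
    # Hypothetical shape; see SourceConfigInput for the real fields.
    "source": {"type": "bucket", "bucket_id": "bkt_example"},
    # Hypothetical shape; only field_passthrough living inside the
    # extractor config is stated by the table.
    "feature_extractor": {
        "feature_extractor_name": "example_extractor",
        "input_mappings": {"image": "image"},
        "field_passthrough": [{"source_path": "title"}],
    },
    "enabled": True,  # matches the documented default
}

# Fail early if a required top-level field is missing.
missing = [name for name in REQUIRED_FIELDS if name not in payload]
if missing:
    raise ValueError(f"missing required fields: {missing}")

request_json = json.dumps(payload, indent=2)
print(request_json)
```

A JSON string built this way could then be passed to `CreateCollectionRequest.from_json`, provided the nested objects match the real models.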

Example

from mixpeek.models.create_collection_request import CreateCollectionRequest

# TODO update the JSON string below
json = "{}"
# create an instance of CreateCollectionRequest from a JSON string
create_collection_request_instance = CreateCollectionRequest.from_json(json)
# print the JSON string representation of the object
print(create_collection_request_instance.to_json())

# convert the object into a dict
create_collection_request_dict = create_collection_request_instance.to_dict()
# create an instance of CreateCollectionRequest from a dict
create_collection_request_from_dict = CreateCollectionRequest.from_dict(create_collection_request_dict)
