VectorBasedConfigOutput

Configuration for vector-based clustering. Use canonical feature URIs to specify which vector embeddings to cluster.

Feature URIs follow the format: mixpeek://{extractor}@{version}/{output}

Supports both single and multi-feature clustering:

- Single feature: Provide one feature_uri for standard clustering
- Multi-feature: Provide multiple feature_uris for hybrid clustering

Examples:

Single feature:

    {
      "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
      "clustering_method": "hdbscan",
      "sample_size": 1000
    }

Multi-feature:

    {
      "feature_uris": [
        "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
        "mixpeek://image_extractor@v1/embedding"
      ],
      "clustering_method": "hdbscan",
      "multi_feature_strategy": "concatenate"
    }
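The URI format above can be checked before a config is submitted. The following is a minimal sketch, not part of the Mixpeek SDK: `parse_feature_uri` and `FeatureURI` are hypothetical helper names, and the rule that no component may contain `@` or `/` is an assumption rather than something the documentation states.

```python
import re
from typing import NamedTuple

# Pattern for mixpeek://{extractor}@{version}/{output}.
# Assumption: components are non-empty and contain neither '@' nor '/'.
_FEATURE_URI_RE = re.compile(
    r"^mixpeek://(?P<extractor>[^@/]+)@(?P<version>[^@/]+)/(?P<output>[^@/]+)$"
)


class FeatureURI(NamedTuple):
    """Hypothetical container for the three parts of a feature URI."""
    extractor: str
    version: str
    output: str


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a canonical feature URI into extractor, version, and output."""
    match = _FEATURE_URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not a canonical feature URI: {uri!r}")
    return FeatureURI(**match.groupdict())


parsed = parse_feature_uri(
    "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1"
)
print(parsed.extractor, parsed.version, parsed.output)
```

Running a check like this over every entry in feature_uris catches malformed URIs locally, before the request reaches the API.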

Properties

| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| feature_uri | str | DEPRECATED: Use feature_uris instead. Canonical feature URI for the vector embedding to cluster. Format: 'mixpeek://{extractor}@{version}/{output}'. For multi-feature clustering, use feature_uris (plural) instead. | [optional] |
| feature_uris | List[str] | RECOMMENDED. List of feature URIs to cluster. Format: 'mixpeek://{extractor}@{version}/{output}'. For single-feature clustering, provide a list with one element. For multi-feature clustering, provide multiple feature URIs. Each feature must exist in all input collections. | [optional] |
| clustering_method | ClusteringAlgorithm | Clustering algorithm to use | |
| sample_size | int | Number of samples to use for clustering | [optional] |
| kmeans_parameters | KmeansParameters | | [optional] |
| dbscan_parameters | DbscanParameters | | [optional] |
| hdbscan_parameters | HdbscanParameters | | [optional] |
| algorithm_params | AlgorithmParams | | [optional] |
| multi_feature_strategy | str | Strategy for handling multiple feature vectors. concatenate: Combine embeddings into one vector, single clustering. independent: Run separate clustering per feature. weighted: Learn optimal feature weights. | [optional] [default to 'concatenate'] |
| normalize_features | bool | Apply L2 normalization to each feature block before concatenation. Prevents feature dominance when combining different modalities. Only applies when multi_feature_strategy='concatenate'. | [optional] [default to True] |
| feature_weights | Dict[str, float] | Optional per-feature weights (0.0-1.0) for concatenation strategy. Keys are feature URIs, values are weights. Example: {'mixpeek://text@v1/emb': 0.7, 'mixpeek://image@v1/emb': 0.3}. Defaults to equal weights (1.0) if not specified. Only applies when multi_feature_strategy='concatenate'. If multi_feature_strategy='weighted' and this is None, weights are learned automatically using weight_learning_config. | [optional] |
| weight_learning_config | WeightLearningConfig | Configuration for automatic feature weight learning. Only used when multi_feature_strategy='weighted' and feature_weights is None. If feature_weights is provided, manual weights are used instead of learning. If this is None when learning is needed, default WeightLearningConfig is used. | [optional] |
| output_strategy | str | Output collection creation strategy. single: Create one collection with all feature vectors. per_feature: Create separate collections for each feature (for hierarchical taxonomies). | [optional] [default to 'single'] |
| effective_feature_method | str | Method for calculating cluster centroids. mean: Average of all vectors in cluster. median: Median vector (robust to outliers). medoid: Actual cluster member closest to centroid. | [optional] [default to 'mean'] |
| enrich_source | bool | Whether to enrich source documents with cluster_id | [optional] [default to False] |
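To make the interaction of normalize_features and feature_weights under the 'concatenate' strategy concrete, here is a rough sketch in plain Python. It is an illustration of the documented behaviour, not Mixpeek's implementation: `l2_normalize` and `concatenate_features` are hypothetical names, and applying the weight as a scalar multiplier after normalization is an assumption.

```python
import math
from typing import Dict, List, Optional


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def concatenate_features(
    blocks: Dict[str, List[float]],
    normalize: bool = True,
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Combine per-feature embeddings into a single vector.

    blocks maps a feature URI to that feature's embedding for one document.
    Each block is optionally L2-normalized, then scaled by its weight
    (1.0 when no weight is given), then appended to the combined vector.
    """
    combined: List[float] = []
    for uri, vec in blocks.items():
        block = l2_normalize(vec) if normalize else list(vec)
        weight = 1.0 if weights is None else weights.get(uri, 1.0)
        combined.extend(weight * x for x in block)
    return combined


# Two toy embeddings with very different magnitudes.
doc = {
    "mixpeek://text@v1/emb": [3.0, 4.0],
    "mixpeek://image@v1/emb": [0.0, 10.0],
}
print(concatenate_features(doc))
print(concatenate_features(doc, weights={"mixpeek://text@v1/emb": 0.7,
                                         "mixpeek://image@v1/emb": 0.3}))
```

Without normalization the image block in this toy input would contribute a norm of 10 against the text block's 5, which is the kind of feature dominance the normalize_features description refers to.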

Example

from mixpeek.models.vector_based_config_output import VectorBasedConfigOutput

# TODO update the JSON string below (clustering_method is not marked [optional], so it is presumably required)
json_str = "{}"
# create an instance of VectorBasedConfigOutput from a JSON string
vector_based_config_output_instance = VectorBasedConfigOutput.from_json(json_str)
# print the JSON string representation of the object
print(vector_based_config_output_instance.to_json())

# convert the object into a dict
vector_based_config_output_dict = vector_based_config_output_instance.to_dict()
# create an instance of VectorBasedConfigOutput from a dict
vector_based_config_output_from_dict = VectorBasedConfigOutput.from_dict(vector_based_config_output_dict)
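As a starting point for the JSON string used above, the sketch below builds a multi-feature payload with the standard library only. Field names and the two feature URIs come from this page; the 0.7/0.3 weights are illustrative values, and the assumption that the resulting string is accepted by `VectorBasedConfigOutput.from_json` has not been verified against the SDK.

```python
import json

# Multi-feature configuration assembled from the properties documented above.
text_uri = "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1"
image_uri = "mixpeek://image_extractor@v1/embedding"

config = {
    "feature_uris": [text_uri, image_uri],
    "clustering_method": "hdbscan",
    "multi_feature_strategy": "concatenate",
    "normalize_features": True,
    # Illustrative weights; keys must be feature URIs from feature_uris.
    "feature_weights": {text_uri: 0.7, image_uri: 0.3},
}

# Local sanity check: every weighted URI is also listed in feature_uris.
unknown = set(config["feature_weights"]) - set(config["feature_uris"])
if unknown:
    raise ValueError(f"weights given for unlisted features: {sorted(unknown)}")

config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed string can then be used in place of the placeholder JSON string in the example.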
