TextExtractorParams

Parameters for the text extractor. The text extractor generates dense vector embeddings optimized for semantic similarity search. It uses the E5-Large multilingual model to convert text into 1024-dimensional vectors. When source_type is "youtube", the extractor first resolves YouTube URLs to caption text via yt-dlp, then chunks and embeds that text. To segment captions by time window, set split_by="time_segments" and choose the window with segment_length_seconds.
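As a minimal sketch of the YouTube configuration described above, the following builds the parameters as a plain JSON string using only the standard library. The field names and the "time_segments" value are taken from this page; the 60-second window is an arbitrary illustrative choice, and how the string is submitted to the API is not shown here.

```python
import json

# Field names come from the Properties table on this page.
# The 60-second window is an illustrative choice, not a recommendation.
youtube_params = {
    "extractor_type": "text_extractor",
    "source_type": "youtube",          # resolve YouTube URLs to caption text first
    "split_by": "time_segments",       # segment captions by time window
    "segment_length_seconds": 60,      # shorter windows: more precise hits, more documents
    "language": "en",                  # preferred caption language
    "extract_captions": True,          # use captions rather than the video description
}

# Serialize to the JSON string form accepted by from_json-style constructors.
params_json = json.dumps(youtube_params, indent=2)
print(params_json)
```

With the mixpeek package installed, a string like this could presumably be passed to TextExtractorParams.from_json, as in the Example section.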

Properties

Name	Type	Description	Notes
extractor_type	str	Discriminator field for parameter type identification.	[optional] [default to 'text_extractor']
source_type	str	Source content type. Use 'youtube' to resolve YouTube URLs to caption text before embedding. Default: 'text' (plain text input).	[optional] [default to 'text']
split_by	TextSplitStrategy	Strategy for splitting text into multiple documents.	[optional]
chunk_size	int	Target size for each chunk.	[optional] [default to 1000]
chunk_overlap	int	Number of units to overlap between consecutive chunks.	[optional] [default to 0]
segment_length_seconds	int	Length of each transcript segment in seconds (for time_segments split strategy). Shorter segments give more precise search results but more documents.	[optional] [default to 120]
language	str	Preferred language code for YouTube captions (when source_type='youtube').	[optional] [default to 'en']
extract_captions	bool	Extract auto-captions or manual subtitles from YouTube videos (when source_type='youtube'). Falls back to video description if False.	[optional] [default to True]
response_shape	ResponseShape2		[optional]
llm_provider	str	LLM provider for structured extraction (openai, google, anthropic).	[optional]
llm_model	str	Specific LLM model for structured extraction.	[optional]
llm_api_key	str	API key for LLM operations (BYOK - Bring Your Own Key). Accepts either a direct key (e.g. 'sk-proj-abc123...') or a secret reference (e.g. '{{SECRET.openai_api_key}}'). A secret reference is resolved from your organization's secrets vault at runtime; store secrets via POST /v1/organizations/secrets. If not provided, Mixpeek's default API keys are used.	[optional]
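To illustrate how chunk_size and chunk_overlap interact, here is a rough sliding-window sketch. It is not the extractor's actual implementation: the function name chunk_text is hypothetical, and it assumes the unit is characters, whereas the real unit presumably depends on the chosen split_by strategy.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper
    for illustration only; character units are an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk is
        # fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap preserves more context across chunk boundaries at the cost of producing more, partly redundant, chunks.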

Example

from mixpeek.models.text_extractor_params import TextExtractorParams

# JSON string to parse; replace "{}" with your own parameters
params_json = "{}"
# create an instance of TextExtractorParams from a JSON string
text_extractor_params_instance = TextExtractorParams.from_json(params_json)
# print the JSON string representation of the object
print(text_extractor_params_instance.to_json())

# convert the object into a dict
text_extractor_params_dict = text_extractor_params_instance.to_dict()
# create an instance of TextExtractorParams from a dict
text_extractor_params_from_dict = TextExtractorParams.from_dict(text_extractor_params_dict)
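As a sketch of a populated JSON string for the LLM-related fields, the following uses the secret-reference form of llm_api_key documented in the Properties table. The llm_model value is a placeholder, not a real model name. The helper is_secret_reference and its regular expression are hypothetical: they assume references always look like '{{SECRET.name}}', which this page shows only by example.

```python
import json
import re

# Assumed pattern for secret references, inferred from the single
# example '{{SECRET.openai_api_key}}' on this page.
SECRET_REF = re.compile(r"^\{\{SECRET\.([A-Za-z0-9_]+)\}\}$")


def is_secret_reference(value: str) -> bool:
    """Return True if value looks like a '{{SECRET.name}}' reference (hypothetical helper)."""
    return SECRET_REF.match(value) is not None


llm_params = {
    "extractor_type": "text_extractor",
    "llm_provider": "openai",                    # one of: openai, google, anthropic
    "llm_model": "your-model-name",              # placeholder, substitute a real model
    "llm_api_key": "{{SECRET.openai_api_key}}",  # resolved from the secrets vault at runtime
}

# Guard against accidentally embedding a raw key in configuration.
assert is_secret_reference(llm_params["llm_api_key"])

params_json = json.dumps(llm_params)
# With the mixpeek package installed, this string could be parsed with:
# TextExtractorParams.from_json(params_json)
print(params_json)
```

Using a secret reference rather than a direct key keeps credentials out of stored configuration and request logs.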

