crawl4ai version
0.7.6
Expected Behavior
Adaptive crawler configured with:
strategy="embedding",
embedding_model="openai/text-embedding-3-small",
embedding_llm_config=LLMConfig(
    provider="openai/gpt-4o-mini",  # openai/text-embedding-3-small fails here too
    api_token=OPENAI_API_KEY,
),
Query:
'capabilities, services,
certifications, description of work,
products'
Query Expansion:
Original query expanded to 12 variations
- Where can I compare prices for various products?
- What new products have been launched this year?
- What products are recommended for pet owners?
- What are the must-have products for outdoor activities?
Current Behavior
• Query variations are replaced by a hard-coded “fried rice” list.
• embedding_llm_config is reused for both generation and embeddings, so the wrong provider/model can hit the wrong API:
    • Chat model sent to the embeddings endpoint → 403.
    • Embedding model used as a “provider” for text generation → failures or zero variations.
• Embedding dimension sometimes mismatches the configured embedding_model.
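To make the second point concrete, here is a minimal sketch of keeping the two roles separate. It is not crawl4ai's actual code: `ModelRoute`, `resolve_models`, and the `default_generation_model` fallback are hypothetical names used only for illustration, and the "embedding in the name" check is a rough heuristic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRoute:
    """Hypothetical container: which model serves which kind of request."""
    generation_model: str
    embedding_model: str


def resolve_models(
    embedding_model: str,
    llm_provider: Optional[str],
    default_generation_model: str = "openai/gpt-4o-mini",  # assumed fallback
) -> ModelRoute:
    """Keep the two roles separate instead of reusing one setting for both.

    Embedding requests always use `embedding_model`. Query expansion uses
    `llm_provider`, falling back to a chat default when it is missing or
    looks like an embedding model.
    """

    def looks_like_embedding(name: str) -> bool:
        # Heuristic only: matches names such as "openai/text-embedding-3-small".
        return "embedding" in name.lower()

    generation = llm_provider
    if not generation or looks_like_embedding(generation):
        generation = default_generation_model
    return ModelRoute(generation_model=generation, embedding_model=embedding_model)


route = resolve_models("openai/text-embedding-3-small", "openai/gpt-4o-mini")
print(route)
```

With routing like this, the chat model would never be sent to the embeddings endpoint, and an embedding model passed as the provider would not be used for text generation.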
Is this reproducible?
Yes
Inputs Causing the Bug
Around line 700, the map_query_semantic_space function:
- uses leftover mock data for the query variations
- does not use the correct model for expansion or embedding
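A quick way to confirm the leftover mock data from the outside is to check whether the returned variations share any content word with the query. The helper below is a hypothetical sketch for this report (`content_words` and `unrelated_variations` are not part of crawl4ai), and word overlap is only a rough proxy for relevance.

```python
import re

# Small stop list so common words do not count as overlap (illustrative only).
STOP_WORDS = {"the", "and", "for", "are", "any", "with", "from", "what", "how"}


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words longer than two letters, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOP_WORDS}


def unrelated_variations(query: str, variations: list[str]) -> list[str]:
    """Return the variations that share no content word with the query."""
    query_words = content_words(query)
    return [v for v in variations if not (content_words(v) & query_words)]


query = "capabilities, services, certifications, description of work, products"
observed = [
    "how to add flavor to vegetable fried rice?",
    "what are the best vegetables to use in fried rice?",
    "are there any tips for making healthy fried rice with vegetables?",
    "how do I make vegetable fried rice from scratch?",
]
print(unrelated_variations(query, observed))
```

For the query in this report, all four observed "fried rice" variations come back as unrelated, while a variation such as "Where can I compare prices for various products?" does not.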
Steps to Reproduce
A) Hard-coded query variations:
1. Use strategy="embedding" and call AdaptiveCrawler.digest(...).
2. Observe variations list: always food-related (“fried rice…”) regardless of query.
B) 403 when embeddings are requested:
1. Create the config:
`AdaptiveConfig(
    strategy="embedding",
    embedding_model="openai/text-embedding-3-small",
    embedding_llm_config=LLMConfig(
        provider="openai/gpt-4o-mini",
        api_token=OPENAI_API_KEY,
    ),
    n_query_variations=12,
)`
2. Run digest(...).
3. Intermittently, either the request fails with `403 - You are not allowed to generate embeddings from this model`, or the run ends with variations: 0 and an embedding dimension or behavior that is inconsistent with the configured model.
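The dimension inconsistency in step 3 can be checked explicitly. The sketch below is a hypothetical helper (`check_embedding_dims` and the `EXPECTED_DIMS` table are not part of crawl4ai); the table lists the default output sizes of the OpenAI models named, and it assumes no custom `dimensions` value was requested.

```python
from typing import Sequence

# Default output dimensions of these OpenAI embedding models.
# Assumption: the request did not ask for a shortened vector.
EXPECTED_DIMS = {
    "openai/text-embedding-3-small": 1536,
    "openai/text-embedding-3-large": 3072,
    "openai/text-embedding-ada-002": 1536,
}


def check_embedding_dims(model: str, vectors: Sequence[Sequence[float]]) -> None:
    """Raise ValueError if any vector's length differs from the model's default."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        return  # Unknown model: nothing to compare against.
    for index, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {index} has {len(vector)} dims, "
                f"expected {expected} for {model}"
            )


# A correctly sized vector passes silently.
check_embedding_dims("openai/text-embedding-3-small", [[0.0] * 1536])
```

Running a check like this on the embeddings produced during digest(...) would show whether they actually come from the configured embedding_model or from a different one.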
Code snippets
adaptive = AdaptiveCrawler(crawler, adaptive_cfg)
result = await adaptive.digest(start_url=start_url, query=query)
Alternatively, run the adaptive crawler example from the Crawl4AI repository.
OS
macOS
Python version
3.13.5
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
litellm.exceptions.BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 403 - {'error': {'message': 'You are not allowed to generate embeddings from this model', 'type': 'invalid_request_error', 'param': None, 'code': None}}
and
Adaptive Crawl Stats - Query:
'capabilities, services,
certifications, description of work,
products'
Query Expansion:
Original query expanded to 4 variations
- how to add flavor to vegetable fried rice?
- what are the best vegetables to use in fried rice?
- are there any tips for making healthy fried rice with vegetables?
- how do I make vegetable fried rice from scratch?
...