feat: vision model support — image-to-text analysis capability #35

@stackbilt-admin

Description

Summary

FoodFiles needs to route vision/image analysis calls through llm-providers for its core photo-to-recipe workflow: a user uploads a photograph of a recipe card or dish, and the system extracts structured recipe data (ingredients, instructions, nutrition estimates) from the image.

Currently the FoodFiles demo uses static/hardcoded recipes. The real processing path needs vision model support in llm-providers so we can send an image (base64 or URL) and get structured text back.

Requested capability

A way to send an image (or an image plus a text prompt) through the existing llm-providers routing/fallback infrastructure and receive a text completion back. Essentially, either extend generate() or add a new analyzeImage() method that accepts a multimodal message (image + prompt) and routes it to a vision-capable model.
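To make the request concrete, here is one possible shape for a multimodal message. Everything here (ContentPart, MultimodalMessage, buildImageMessage) is a hypothetical sketch of what the caller-facing type could look like, not the actual llm-providers interface:

```typescript
// Hypothetical multimodal message shape for llm-providers.
// The real type names and fields are up to the llm-providers team.

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; source: { kind: "base64"; mediaType: string; data: string } }
  | { type: "image"; source: { kind: "url"; url: string } };

interface MultimodalMessage {
  role: "user" | "assistant";
  content: ContentPart[];
}

// Convenience helper: pair one image with an extraction prompt.
function buildImageMessage(
  prompt: string,
  imageBase64: string,
  mediaType: string,
): MultimodalMessage {
  return {
    role: "user",
    content: [
      { type: "image", source: { kind: "base64", mediaType, data: imageBase64 } },
      { type: "text", text: prompt },
    ],
  };
}
```

A text-only call would just pass a single text part, so existing generate() callers would be unaffected if the content field accepted `string | ContentPart[]`.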

Use case

User uploads photo of handwritten recipe card
  → edge-auth: authn + quota check
  → llm-providers: vision model extracts text + structure from image
  → foodfiles-v2: formats into structured recipe, stores in D1
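The flow above can be sketched as a single orchestration function, with the two upstream services modeled as injected interfaces. All names here (EdgeAuth, LlmProviders, analyzeImage, extractRecipe) are illustrative, not the real service bindings:

```typescript
// Illustrative photo-to-recipe pipeline for foodfiles-v2.
// EdgeAuth and LlmProviders stand in for the real service bindings.

interface EdgeAuth {
  check(userToken: string): Promise<{ ok: boolean; reason?: string }>;
}

interface LlmProviders {
  // Assumed vision entry point; see "Requested capability" above.
  analyzeImage(prompt: string, imageBase64: string): Promise<string>;
}

async function extractRecipe(
  auth: EdgeAuth,
  llm: LlmProviders,
  userToken: string,
  imageBase64: string,
): Promise<string> {
  const session = await auth.check(userToken); // authn + quota check
  if (!session.ok) throw new Error(`auth failed: ${session.reason}`);

  // Vision model turns the photo into structured text; foodfiles-v2
  // would then validate this output and store the recipe in D1.
  return llm.analyzeImage(
    "Extract ingredients, instructions, and nutrition estimates as JSON.",
    imageBase64,
  );
}
```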

Models to consider

  • Claude (Anthropic) — vision via messages API, strong at structured extraction
  • GPT-4o (OpenAI) — vision via chat completions, good at OCR-like tasks
  • Gemini (Google) — multimodal native
  • Cloudflare Workers AI — if any vision models are available on the platform

The choice of model/provider can be opaque to FoodFiles — we just need to send an image and a prompt and get structured text back. The fallback chain and cost routing should work the same as text-only calls.
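One way to keep the fallback chain and cost routing untouched is to filter the existing chain down to vision-capable entries when a request contains an image. The ProviderEntry shape below is an assumption about how llm-providers models its chain:

```typescript
// Sketch of capability-aware fallback: reuse the existing chain, but
// restrict it to vision-capable entries for image requests.
// ProviderEntry and its fields are hypothetical.

interface ProviderEntry {
  name: string;
  model: string;
  vision: boolean;      // does this model accept image inputs?
  costPerMTok: number;  // consumed by the existing cost routing
}

function visionChain(chain: ProviderEntry[]): ProviderEntry[] {
  const capable = chain.filter((p) => p.vision);
  if (capable.length === 0) {
    throw new Error("no vision-capable provider configured in fallback chain");
  }
  // Preserve the chain's existing order so cost routing behaves
  // exactly as it does for text-only calls.
  return capable;
}
```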

Questions for the llm-providers team

  1. Does the current generate() interface already support multimodal messages (image content parts), so that we only need to ensure at least one provider in the chain offers a vision-capable model?
  2. Or does this need a new method/interface for image inputs?
  3. Any considerations around image size limits, base64 vs URL, or preprocessing that should happen caller-side vs provider-side?
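On question 3: the providers accept images in different wire shapes, which is an argument for a neutral caller-side format translated per-provider inside llm-providers. The shapes below reflect the public Anthropic Messages and OpenAI Chat Completions APIs; the NeutralImage type is an assumption about the caller-facing format:

```typescript
// Per-provider translation of a neutral image reference.
// NeutralImage is a hypothetical caller-facing type; the output shapes
// follow the documented Anthropic and OpenAI request formats.

type NeutralImage =
  | { kind: "base64"; mediaType: string; data: string }
  | { kind: "url"; url: string };

function toAnthropicPart(img: NeutralImage) {
  // Anthropic: image block with a base64 or url source.
  if (img.kind === "base64") {
    return {
      type: "image",
      source: { type: "base64", media_type: img.mediaType, data: img.data },
    };
  }
  return { type: "image", source: { type: "url", url: img.url } };
}

function toOpenAIPart(img: NeutralImage) {
  // OpenAI: image_url part; base64 goes in as a data URL.
  const url =
    img.kind === "url" ? img.url : `data:${img.mediaType};base64,${img.data}`;
  return { type: "image_url", image_url: { url } };
}
```

Either way, size limits differ per provider, so FoodFiles would likely still want caller-side downscaling of large photos before upload.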

Context

  • Consumer: foodfiles-v2 worker (Stackbilt-dev/foodfiles)
  • Auth: edge-auth service binding (already wired)
  • Current demo: static recipes in RecipeGeneratorDemo.tsx — no live inference
  • The old implementation used a direct Groq API key in the worker, which was flagged as a security issue (Stackbilt-dev/foodfiles#65) and removed

Filed from: Stackbilt-dev/foodfiles context (editorial design Phase 1 + demo section work)
