A high-performance .NET tool for batch image analysis and labeling using Vision-Language Models (VLM). This application processes images in bulk, optimizes them for AI inference, and generates structured JSON sidecar files for each image.
- Batch Processing: Efficiently scans directories for common image formats (
.jpg,.png,.webp,.bmp,.tiff). - Parallel Execution: Processes multiple images concurrently (default: 4 at a time) using
Parallel.ForEachAsync. - Image Optimization: Automatically resizes images to a maximum longest edge of 1024px and compresses them to 80% quality JPEG in-memory. This reduces API latency and bandwidth without sacrificing analysis quality.
- Dynamic Configuration: Customize the labeling schema and instructions via
prompt.json. - Idempotent Updates: Automatically skips files that already have a corresponding
.jsonsidecar file, allowing you to resume interrupted jobs. - Regex Extraction: Robustly extracts JSON results from AI responses, even if the model includes additional conversational text.
- .NET SDK (8.0 or later recommended)
- AI Backend: An OpenAI-compatible API providing Vision-Language capabilities (e.g., LM Studio, Ollama, or remote providers).
- Vision Model: A model capable of image analysis (e.g.,
qwen/qwen3-vl-4b).
The application requires a prompt.json file in the working directory to define the output format and labeling rules.
{
"additional_rules": [
"Describe the lighting and atmosphere.",
"Identify any prominent objects or subjects."
],
"schema": {
"subject": "string",
"location": "indoor|outdoor",
"lighting": "natural|studio|low_light",
"tags": ["list", "of", "keywords"],
"summary": "Short description"
}
}Run the tool using the dotnet CLI:
dotnet run -- <folder_path> [model_name] [api_url]<folder_path>(Required): The absolute or relative path to the directory containing images.[model_name](Optional): The model identifier to use. Defaults toqwen/qwen3-vl-4b.[api_url](Optional): The endpoint for the AI service. Defaults tohttp://localhost:1234/v1/chat/completions.
dotnet run -- ~/Pictures/Vacation "qwen/qwen3-vl-4b" "http://localhost:1234/v1/chat/completions"- Scanning: The app finds all images in the specified folder.
- Skipping: It checks for existing
.jsonfiles to avoid redundant work. - Optimization: Images are loaded, resized, and encoded as base64 strings.
- Inference: A system prompt is constructed using your
schemaandrules, then sent to the AI alongside the image. - Extraction: The structured JSON is extracted from the model's response.
- Storage: The result is saved as
image_name.ext.jsonin the same directory.