morgang5522/image-analysis

ImageAnalysis

A high-performance .NET tool for batch image analysis and labeling using Vision-Language Models (VLM). This application processes images in bulk, optimizes them for AI inference, and generates structured JSON sidecar files for each image.

Features

  • Batch Processing: Efficiently scans directories for common image formats (.jpg, .png, .webp, .bmp, .tiff).
  • Parallel Execution: Processes multiple images concurrently (default: 4 at a time) using Parallel.ForEachAsync.
  • Image Optimization: Automatically resizes images so their longest edge is at most 1024px and re-encodes them in-memory as JPEG at 80% quality. This reduces API latency and bandwidth without sacrificing analysis quality.
  • Dynamic Configuration: Customize the labeling schema and instructions via prompt.json.
  • Idempotent Updates: Automatically skips files that already have a corresponding .json sidecar file, allowing you to resume interrupted jobs.
  • Regex Extraction: Robustly extracts JSON results from AI responses, even if the model includes additional conversational text.
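The regex-extraction feature can be sketched as follows. This is an illustrative Python version of the idea (the tool itself is .NET, and its exact pattern is not shown here): grab the outermost brace-delimited span from a possibly chatty reply and parse it as JSON.

```python
import json
import re

def extract_json(response_text: str) -> dict:
    """Pull a JSON object out of a model reply that may also contain
    conversational text or a fenced code block around the JSON."""
    # DOTALL lets '.' span newlines; greedy match covers first '{' to last '}'.
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```

For example, a reply like "Sure! Here is the result: ```json {...} ``` Let me know if you need more" still yields clean JSON, because everything outside the braces is ignored.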

Prerequisites

  • .NET SDK (8.0 or later recommended)
  • AI Backend: An OpenAI-compatible API providing Vision-Language capabilities (e.g., LM Studio, Ollama, or remote providers).
  • Vision Model: A model capable of image analysis (e.g., qwen/qwen3-vl-4b).

Configuration

The application requires a prompt.json file in the working directory to define the output format and labeling rules.

Example prompt.json

{
  "additional_rules": [
    "Describe the lighting and atmosphere.",
    "Identify any prominent objects or subjects."
  ],
  "schema": {
    "subject": "string",
    "location": "indoor|outdoor",
    "lighting": "natural|studio|low_light",
    "tags": ["list", "of", "keywords"],
    "summary": "Short description"
  }
}
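To see how this file drives the model, here is a rough Python sketch of turning prompt.json into a system prompt. The function name and exact wording are assumptions for illustration; the C# implementation's prompt text may differ.

```python
import json

def build_system_prompt(prompt_path: str = "prompt.json") -> str:
    """Illustrative: combine the schema and additional rules from
    prompt.json into one system prompt requesting JSON output."""
    with open(prompt_path) as f:
        cfg = json.load(f)
    rules = "\n".join(f"- {r}" for r in cfg.get("additional_rules", []))
    schema = json.dumps(cfg["schema"], indent=2)
    return (
        "Analyze the image and respond with a single JSON object "
        f"matching this schema:\n{schema}\n"
        f"Additional rules:\n{rules}"
    )
```

The key point is that the schema is passed to the model verbatim, so value hints like "indoor|outdoor" act as soft enumerations the model is asked to respect.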

Usage

Run the tool using the dotnet CLI:

dotnet run -- <folder_path> [model_name] [api_url]

Arguments

  1. <folder_path> (Required): The absolute or relative path to the directory containing images.
  2. [model_name] (Optional): The model identifier to use. Defaults to qwen/qwen3-vl-4b.
  3. [api_url] (Optional): The endpoint for the AI service. Defaults to http://localhost:1234/v1/chat/completions.

Example

dotnet run -- ~/Pictures/Vacation "qwen/qwen3-vl-4b" "http://localhost:1234/v1/chat/completions"

How It Works

  1. Scanning: The app finds all images in the specified folder.
  2. Skipping: It checks for existing .json files to avoid redundant work.
  3. Optimization: Images are loaded, resized, and encoded as base64 strings.
  4. Inference: A system prompt is constructed using your schema and rules, then sent to the AI alongside the image.
  5. Extraction: The structured JSON is extracted from the model's response.
  6. Storage: The result is saved as image_name.ext.json in the same directory.
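Steps 3–4 amount to building an OpenAI-compatible chat request that pairs the system prompt with a base64 data URL. A minimal Python sketch of that payload, assuming the standard chat-completions vision format (the actual C# request construction is not shown in this README):

```python
import base64

def build_request(image_bytes: bytes, system_prompt: str,
                  model: str = "qwen/qwen3-vl-4b") -> dict:
    """Illustrative: OpenAI-compatible chat payload embedding the
    optimized JPEG as a base64 data URL alongside the system prompt."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ]},
        ],
    }
```

Per step 6, the sidecar keeps the full original filename, so ~/Pictures/Vacation/beach.jpg produces ~/Pictures/Vacation/beach.jpg.json next to it.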
