
comfyui_dagthomas

您可δ»₯εœ¨θΏ™ι‡Œζ‰Ύεˆ°δΈ­ζ–‡δΏ‘ζ―

plugin.aix.ink

Advanced Prompt Generation & Multi-Model AI Integration for ComfyUI

A comprehensive suite of nodes for ComfyUI featuring multi-provider LLM support (OpenAI, Gemini, Claude, Grok, Groq, QwenVL), local model inference (Phi, MiniCPM, Ollama), professional image effects, and advanced prompt generation tools.


πŸ“¦ Installation

Method 1: ComfyUI Manager (Recommended)

Search for "comfyui_dagthomas" in ComfyUI Manager and click Install.

Method 2: Manual Installation

cd ComfyUI/custom_nodes
git clone https://github.com/dagthomas/comfyui_dagthomas
cd comfyui_dagthomas
pip install -r requirements.txt

πŸ”‘ API Key Configuration

Set your API keys as environment variables. The examples below use Windows cmd syntax (set); on Linux/macOS, use export instead:

# OpenAI GPT
set OPENAI_API_KEY=sk-your-key-here

# Google Gemini
set GEMINI_API_KEY=your-key-here

# Anthropic Claude
set ANTHROPIC_API_KEY=your-key-here
# or
set CLAUDE_API_KEY=your-key-here

# xAI Grok
set XAI_API_KEY=your-key-here
# or
set GROK_API_KEY=your-key-here

# Groq
set GROQ_API_KEY=your-key-here

🧩 Node Categories

πŸ“ Universal Nodes (Model-Agnostic)

APNext Universal Generator

Display Name: APNext Universal Generator

A model-agnostic prompt generator that automatically detects available API keys and supports all major LLM providers.

Input Description
input_text Base text to enhance
model Select provider:model or "auto-detect"
generation_mode Creative, Balanced, Focused, or Custom
seed Seed for reproducible variations
style_preference Cinematic, Photorealistic, Artistic, etc.
detail_level Brief to Very Detailed output

Supported Models:

  • gpt:gpt-4o, gpt:gpt-4o-mini, gpt:gpt-4-turbo
  • gemini:gemini-2.5-flash, gemini:gemini-2.5-pro
  • claude:claude-sonnet-4.5, claude:claude-3-5-sonnet
  • grok:grok-beta, grok:grok-2-vision
  • groq:llama-3.3-70b-versatile

Returns: (generated_prompt, model_used, seed_used)
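The "auto-detect" option can be pictured as a scan over the environment variables listed in the API Key Configuration section. A minimal sketch, assuming nothing about the node's internals (the function name and provider ordering are illustrative):

```python
import os

# Provider -> environment variable(s), per the configuration section above.
PROVIDER_ENV_KEYS = {
    "gpt": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "claude": ["ANTHROPIC_API_KEY", "CLAUDE_API_KEY"],
    "grok": ["XAI_API_KEY", "GROK_API_KEY"],
    "groq": ["GROQ_API_KEY"],
}

def detect_available_providers(env=os.environ):
    """Return the providers that have at least one API key set."""
    return [
        provider
        for provider, keys in PROVIDER_ENV_KEYS.items()
        if any(env.get(k) for k in keys)
    ]
```

Providers with an alias key (Claude, Grok) count as available if either variable is set.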


APNext Universal Vision Cloner

Display Name: APNext Universal Vision Cloner

Analyze images with any supported vision model to generate detailed descriptions or clone image styles.

Input Description
images One or more images to analyze
model Vision model to use (auto-detect available)
fade_percentage Blend percentage for multiple images
analysis_mode Detailed Analysis, Style Cloning, Scene Description, Creative Interpretation
output_format Text Only, JSON Structure, or Formatted Prompt

Returns: (formatted_output, raw_response, faded_image, model_used)
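The fade_percentage blend for multiple input images can be approximated as a weighted cross-fade. A hedged sketch assuming float images in [0, 1]; the node's exact blend math may differ:

```python
import numpy as np

def fade_images(images, fade_percentage=50.0):
    """Blend a list of same-sized images into one: each successive
    image is mixed in at fade_percentage opacity over the running
    result. Illustrative approximation of the node's fade blend."""
    alpha = fade_percentage / 100.0
    result = images[0].astype(np.float32)
    for img in images[1:]:
        result = (1 - alpha) * result + alpha * img.astype(np.float32)
    return result
```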


πŸ€– Google Gemini Nodes

Gemini Prompt Enhancer

Display Name: APNext Gemini Prompt Enhancer

Enhances prompts with cinematic terminology and LLM refinement for video/image generation.

Input Description
base_prompt Original prompt to enhance
enhancement_mode Random Mix, Cinematic/Lighting/Camera/Motion/Style Focus, Full Enhancement, or LLM Only
use_llm Enable Gemini LLM enhancement
intensity Enhancement intensity (0.1-2.0)
Optional dropdowns visual_style, lighting_type, camera_angle, shot_size, lens_type, color_tone, etc.

Returns: (enhanced_prompt, random_enhanced, llm_enhanced)


Gemini Custom Vision

Display Name: APNext Gemini Custom Vision

Analyze multiple images with custom prompts. Supports dynamic prompt templates with variable substitution.

Input Description
images Input images
custom_prompt Custom analysis prompt
dynamic_prompt Enable ##TAG##, ##SEX##, ##PRONOUNS##, ##WORDS## substitution
fade_percentage Blend multiple images together

Returns: (output, clip_l, faded_image)


Gemini Text Only

Display Name: APNext Gemini Text Only

Pure text generation with Gemini models. Supports dynamic prompt templates.

Returns: (output, clip_l)


Gemini Next Scene

Display Name: APNext Gemini Next Scene

Generate cinematic transitions for visual narratives. Creates the "next scene" based on a previous prompt and current frame.

Input Description
image Current frame image
original_prompt Previous scene description
focus_on Camera Movement, Framing Evolution, Environmental Reveals, Atmospheric Shifts
transition_intensity Subtle, Moderate, or Dramatic

Returns: (next_scene_prompt, short_description)


πŸ’¬ OpenAI GPT Nodes

GPT Mini Generator

Display Name: APNext GPT Mini Generator

Efficient text generation using GPT-4o-mini.

Input Description
input_text Text to enhance
happy_talk Enthusiastic vs professional tone
compress Enable output compression
poster Movie poster style formatting

GPT Vision Cloner

Display Name: APNext GPT Vision Cloner

Clone image styles using GPT-4o vision capabilities with custom prompts.


GPT Custom Vision

Display Name: APNext GPT Custom Vision

Full custom vision analysis with GPT-4o.


🧠 Anthropic Claude Nodes

Claude Text Generator

Display Name: APNext Claude Text Generator

Text generation with Claude models (Claude 3.5 Sonnet, Claude Sonnet 4.5).

Input Description
input_text Text to process
claude_model Model selection
happy_talk, compress, poster Output style controls
variation_instruction Custom instruction for creative variations

Claude Vision Analyzer

Display Name: APNext Claude Vision Analyzer

Image analysis with Claude's multimodal capabilities.


⚑ xAI Grok Nodes

Grok Text Generator

Display Name: APNext Grok Text Generator

Text generation using xAI's Grok models.


Grok Vision Analyzer

Display Name: APNext Grok Vision Analyzer

Image analysis with Grok vision models.


πŸš€ Groq Nodes (Ultra-Fast Inference)

Groq Text Generator

Display Name: APNext Groq Text Generator

Lightning-fast text generation using Groq's optimized infrastructure with Llama and Mixtral models.

Input Description
groq_model llama-3.3-70b-versatile, llama-3.1-8b-instant, etc.
Other standard LLM inputs

Groq Vision Analyzer

Display Name: APNext Groq Vision Analyzer

Fast image analysis with Groq vision models.


πŸ” QwenVL Nodes (Local Vision)

QwenVL Vision Analyzer

Display Name: APNext QwenVL Vision Analyzer

Local vision analysis using Qwen-VL models. Downloads models automatically.

Input Description
images Input images
qwen_model Qwen3-VL-4B-Instruct, etc.
max_tokens Maximum response length
keep_model_loaded Cache model in memory

QwenVL Vision Cloner

Display Name: APNext QwenVL Vision Cloner

Clone image styles locally without API calls.


QwenVL Video Analyzer

Display Name: APNext QwenVL Video Analyzer

Analyze video content frame-by-frame.


QwenVL Next Scene

Display Name: APNext QwenVL Next Scene

Generate cinematic scene transitions locally using QwenVL models. Takes a previous scene description and 1-5 frame images, then creates natural camera movements, framing evolution, and atmospheric shifts. Multiple frames help the model understand motion/progression.

Input Description
images 1-5 frame images (batch)
original_prompt Previous scene description
qwen_model QwenVL model to use
prompt_file Custom prompt template file
custom_prompt Override with inline prompt (optional)
max_frames Max frames to use from batch (1-5)
focus_on Camera Movement, Framing Evolution, Environmental Reveals, Atmospheric Shifts
transition_intensity Subtle, Moderate, or Dramatic
keep_model_loaded Cache model in memory

Returns: (next_scene_prompt, short_description)

Custom Prompts: Create your own prompt templates in data/custom_prompts/. Use ##ORIGINAL_PROMPT## as a placeholder for the previous scene description. Included templates:

  • next_scene.txt - Default detailed cinematography prompt
  • qwen_next_scene_simple.txt - Simplified version
  • qwen_next_scene_video.txt - Optimized for AI video generation

QwenVL Frame Prep

Display Name: APNext QwenVL Frame Prep

Utility node to prepare multiple images for QwenVL Next Scene. Accepts up to 5 individual images or a batch, scales them to max dimensions, and outputs a batched tensor.

Input Description
max_width Maximum width (default 1024)
max_height Maximum height (default 1024)
image_1 - image_5 Individual image inputs
image_batch Pre-batched images (optional)

Returns: (images, frame_count)
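The scale-to-max-dimensions step preserves aspect ratio and never upscales. A sketch of the dimension math (the rounding behaviour is an assumption):

```python
def fit_within(width, height, max_width=1024, max_height=1024):
    """Scale a frame's dimensions down (never up) to fit inside the
    max box while preserving aspect ratio, as Frame Prep does before
    batching frames for QwenVL Next Scene."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)
```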


QwenVL Z-Image Vision

Display Name: APNext QwenVL Z-Image Vision

Analyzes images and outputs in Z-Image TurnBuilder chat format with <|im_start|>/<|im_end|> tokens.


πŸ¦™ Ollama Nodes (Local LLM)

Ollama Node

Display Name: APNext OllamaNode

Local LLM inference using Ollama. Supports any model installed in your Ollama instance.

Input Description
input_text Text to process
model_name Any Ollama model (llama3, mistral, etc.)
happy_talk, compress Output controls

Ollama Vision

Display Name: APNext OllamaVision

Local vision analysis with Ollama multimodal models (llava, bakllava, etc.).


πŸ“Έ MiniCPM Nodes (Local Vision)

MiniCPM Image Node

Display Name: APNext MiniCPM Image

Image understanding with MiniCPM-V 4.5 (OpenBMB). Supports thinking mode for complex reasoning.

Input Description
images Input images
question Question about the image
enable_thinking Deep reasoning mode
precision bfloat16 or float16
unload_after_inference Free memory after use

MiniCPM Video Node

Display Name: APNext MiniCPM Video

Video understanding and analysis.


πŸ”¬ Phi Nodes (Microsoft Vision)

Phi Model Loader

Display Name: APNext Phi Model Loader

Load Microsoft Phi-3.5-vision-instruct model.

Input Description
model_version Phi-3.5-vision-instruct
image_crops 4 or 16 crops for detail
attention_mechanism flash_attention_2, sdpa, or eager

Phi Model Inference / Custom Inference

Display Name: APNext Phi Model Inference

Run inference with loaded Phi model.


🎨 Image FX Nodes

Professional image effects using optimized tensor operations.

APNext Bloom FX

Creates a bloom/glow effect on bright areas.

Input Description
intensity Bloom strength (0-5)
threshold Brightness threshold (0-1)
blur_radius Glow spread (1-50)
blend_mode additive, screen, or overlay
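Conceptually, bloom isolates pixels above the brightness threshold, blurs them, and blends the glow back (additive mode shown). A toy grayscale sketch using a naive box blur in place of the node's optimized tensor operations:

```python
import numpy as np

def bloom(img, intensity=1.0, threshold=0.8, blur_radius=3):
    """Toy bloom on a 2D float image in [0, 1]: mask pixels above
    threshold, box-blur the mask, add the glow back, then clip.
    Illustrative only; not the node's implementation."""
    glow = np.where(img > threshold, img, 0.0)
    r = blur_radius
    padded = np.pad(glow, r, mode="edge")
    acc = np.zeros_like(glow)
    # crude box blur: average over a (2r+1)^2 neighbourhood
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += padded[r + dy : r + dy + glow.shape[0],
                          r + dx : r + dx + glow.shape[1]]
    blurred = acc / (2 * r + 1) ** 2
    return np.clip(img + intensity * blurred, 0.0, 1.0)
```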

APNext Color Grading FX

Professional color grading with LUT support or manual controls.

Input Description
method manual or lut_file
lut_file .cube, .3dl, or image LUT
exposure -3 to +3 stops
contrast, saturation Standard adjustments
highlights, shadows Tone controls
temperature, tint White balance

Supported LUT Formats: .cube (Adobe/Blackmagic), .3dl (Autodesk/Flame), Image LUTs (.png, .jpg)


APNext Sharpen FX

Intelligent image sharpening.


APNext Noise FX

Add film grain and noise effects.


APNext Rough FX

Add texture and roughness.


APNext Cross Processing FX

Film cross-processing color effects.


APNext Split Toning FX

Separate color toning for highlights and shadows.


APNext HDR Tone Mapping FX

HDR-style tone mapping.


APNext Glitch Art FX

Digital glitch and databending effects.


APNext Film Halation FX

Classic film halation (light bleeding) effect.


πŸ“ Latent Generators

APNext Latent Generator

Display Name: APNext Latent Generator

Generate latent tensors with intelligent dimension calculation.

Input Description
width, height Base dimensions (0 = auto-calculate)
megapixel_scale Target megapixels (0.1-2.0)
aspect_ratio 1:1, 3:2, 4:3, 16:9, 21:9
is_portrait Portrait orientation

Returns: (LATENT, width, height)
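The auto-calculation can be read as: derive total pixels from megapixel_scale, split them according to the aspect ratio, then snap to model-friendly multiples. A sketch under the assumption that dimensions snap to multiples of 64 (the node's actual rounding rules may differ):

```python
import math

def calc_dimensions(megapixel_scale=1.0, aspect_ratio="16:9",
                    is_portrait=False, multiple=64):
    """Derive (width, height) from a target megapixel count and an
    aspect ratio string like "16:9". Hypothetical sketch."""
    w_ratio, h_ratio = (int(x) for x in aspect_ratio.split(":"))
    if is_portrait:
        w_ratio, h_ratio = h_ratio, w_ratio
    total_pixels = megapixel_scale * 1_000_000
    unit = math.sqrt(total_pixels / (w_ratio * h_ratio))
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(w_ratio * unit), snap(h_ratio * unit)
```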


PGSD3 Latent Generator

Display Name: APNext PGSD3LatentGenerator

Optimized latent generation for Stable Diffusion 3 pipelines.


🎲 Prompt Generators

Auto Prompter

Display Name: Auto Prompter

Generate random prompts from extensive category databases.

Input Description
subject Main subject (can include LoRA triggers)
custom Prefix text for styling
artform Photography, digital art, etc.
Various category selections Random or specific choices

APNext Node

Display Name: APNext Node

Advanced prompt building with category-based enhancements.

Overview

The system includes numerous nodes that can be chained together to create complex workflows.

Supports 24 main categories with subcategories:

  • Architecture: styles, buildings, interiors, materials
  • Art: painting, sculpture, techniques, palettes
  • Artist: concept artists, illustrators, painters
  • Character: anime, fantasy, sci-fi, superheroes
  • Cinematic: directors, genres, effects, color grading
  • Fashion: designers, outfits, accessories
  • Feelings: emotional modifiers
  • Geography: countries, nationalities
  • Human: jobs, hobbies, groups
  • Interaction: individual, couple, group, crowd interactions
  • Keywords: modifiers, genres, trending terms
  • People: archetypes, body types, expressions
  • Photography: cameras, lenses, lighting, film types
  • Plots: action, romance, horror, sci-fi scenarios
  • Poses: portrait and action poses
  • Scene: weather, textures, environments
  • Science: astronomy, mathematics, medical
  • Stuff: seasonal objects, gadgets, fantasy items
  • Time: eras, decades, centuries
  • Typography: fonts, word art styles
  • Vehicle: cars, classic cars, vehicle types
  • Video Game: games, engines, actions

πŸ”§ Utility Nodes

String Merger

Display Name: APNext String Merger

Combine multiple strings with separators.


Flexible String Merger

Display Name: APNext Flexible String Merger

Advanced string combining with custom formatting.


Sentence Mixer

Display Name: APNext Sentence Mixer

Shuffle and mix sentences from multiple inputs for creative variations.


Custom Prompt Loader

Display Name: APNext Custom Prompts

Load prompt templates from the data/custom_prompts/ directory.

Included templates:

  • promptcreator.txt - Full creative prompt generation
  • image_analyze.txt - Image analysis prompts
  • gemini_video.txt - Video generation prompts
  • cloner.txt - Style cloning prompts
  • Various LoRA-specific templates (ohwx, t5xxl, etc.)

Local Random Prompt

Display Name: APNext Local random prompt

Load random prompts from local text files.


Random Integer Generator

Display Name: APNext Random Integer Generator

Generate random integers with min/max range.


πŸ“ Adding Custom Categories

Create your own categories for APNextNode:

  1. Create a folder in data/next/ (e.g., data/next/mycategory/)
  2. Add JSON files for each field

Simple Format

["item1", "item2", "item3"]

Advanced Format

{
  "preprompt": "with",
  "separator": " and ",
  "endprompt": "visual effects",
  "items": ["motion blur", "lens flare", "particle effects"],
  "attributes": {
    "motion blur": ["dynamic", "cinematic"],
    "lens flare": ["bright", "atmospheric"]
  }
}
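The advanced format can be read as: pick items, optionally prefix each with one of its attributes, join them with the separator, and wrap the result in preprompt/endprompt. A sketch of that assembly (build_fragment and the two-item sample size are illustrative, not the node's code):

```python
import random

def build_fragment(category, rng=None):
    """Assemble a prompt fragment from the advanced JSON format shown
    above, e.g. "with cinematic motion blur and bright lens flare
    visual effects"."""
    rng = rng or random.Random()
    parts = []
    for item in rng.sample(category["items"], k=2):
        attrs = category.get("attributes", {}).get(item)
        parts.append(f"{rng.choice(attrs)} {item}" if attrs else item)
    return " ".join([category["preprompt"],
                     category["separator"].join(parts),
                     category["endprompt"]])
```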

πŸ“ Custom Prompt Templates

Create your own prompt templates for use with the Custom Prompt Loader node.

Location

Place .txt files in: data/custom_prompts/

Creating a Template

Templates are plain text files containing instructions for LLM nodes. They support dynamic variable substitution:

Variable Description
##TAG## Replaced with the tag input (e.g., "ohwx man")
##SEX## Replaced with the sex input (e.g., "male", "female")
##PRONOUNS## Replaced with pronouns (e.g., "him, his")
##WORDS## Replaced with target word count
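Substitution is plain string replacement of the ##...## markers. A hypothetical helper mirroring the table above (the node performs an equivalent substitution):

```python
def fill_template(template, tag="", sex="", pronouns="", words=150):
    """Replace the documented placeholders in a prompt template."""
    return (
        template.replace("##TAG##", tag)
        .replace("##SEX##", sex)
        .replace("##PRONOUNS##", pronouns)
        .replace("##WORDS##", str(words))
    )
```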

Example Template

Create a file data/custom_prompts/my_style.txt:

As a professional art critic, describe the provided image in detail.
Focus on creating a cohesive scene as if describing a movie still.

If the subject is ##TAG##, use ##PRONOUNS## pronouns appropriately.
The subject is ##SEX##.

Include:
- Main subject description with clothing, accessories, position
- Setting and environment details
- Lighting type, direction, and atmosphere
- Color palette and emotional tone
- Camera angle and composition

Output approximately ##WORDS## words.
Do not use JSON format. Provide a single cohesive paragraph.

Included Templates

Template Purpose
promptcreator.txt Detailed image analysis (~150 words)
promptcreator_small.txt Concise image analysis
image_analyze.txt General image description
cloner.txt Style cloning prompts
gemini_video.txt Video generation prompts
gemini_ohwx.txt LoRA trigger-aware prompts
t5xxl.txt T5-XXL optimized prompts
ltxv.txt LTX Video model prompts
next_scene.txt Cinematic scene transitions

βš™οΈ Configuring LLM Models

Customize available models by editing JSON configuration files in the data/ folder.

Model Configuration Files

File Provider Description
gemini_models.json Google Gemini Gemini model list
gpt_models.json OpenAI GPT model list
claude_models.json Anthropic Claude model list
grok_models.json xAI Grok model list
groq_models.json Groq Groq model list (text + vision)
qwenvl_models.json QwenVL Local Qwen vision models

QwenVL Models - Adding Private/Custom Models

QwenVL nodes support loading additional models from private configuration files. This allows you to add custom or uncensored models without modifying the main configuration.

How to add private models:

  1. Create a JSON file in data/ with a name matching private_*qwenvl*.json

    • Examples: private_qwenvl_models.json, private_uncensored.qwenvl_models.json
  2. Use the same format as qwenvl_models.json:

{
    "models": [
        "huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated",
        "huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated",
        "another-namespace/custom-model"
    ]
}
  3. Restart ComfyUI - the models will appear in the QwenVL node dropdowns

Notes:

  • Private files are loaded in addition to the main qwenvl_models.json
  • Duplicate models are automatically filtered out
  • Supports full HuggingFace repo paths (namespace/model-name)
  • Models are downloaded to ComfyUI/models/LLM/Qwen-VL/ on first use

Basic Format

Most model files use a simple array format:

{
    "models": [
        "model-name-1",
        "model-name-2",
        "model-name-3"
    ]
}

Example: Adding New Gemini Models

Edit data/gemini_models.json:

{
    "models": [
        "gemini-2.5-pro",
        "gemini-2.5-flash",
        "gemini-flash-latest",
        "gemini-flash-lite-latest",
        "gemini-2.5-flash-lite",
        "gemini-exp-1206"
    ]
}

Example: Adding New Claude Models

Edit data/claude_models.json:

{
    "models": [
        "claude-sonnet-4.5",
        "claude-sonnet-4",
        "claude-sonnet-3.7",
        "claude-opus-4.1",
        "claude-opus-4",
        "claude-haiku-3.5",
        "claude-haiku-3"
    ]
}

Groq Models (Advanced Format)

Groq supports separate text and vision model lists:

{
    "text_models": [
        "llama-3.3-70b-versatile",
        "llama-3.1-8b-instant",
        "groq/compound",
        "qwen/qwen3-32b"
    ],
    "vision_models": [
        "meta-llama/llama-4-scout-17b-16e-instruct",
        "meta-llama/llama-4-maverick-17b-128e-instruct"
    ],
    "note": "Edit this file to add/remove models"
}

Notes

  • Restart ComfyUI after editing model configuration files
  • For Groq, the system will first try to fetch models from the API, then fall back to the JSON file
  • Model names must match exactly what the provider's API expects
  • Invalid model names will cause API errors at runtime

πŸ–ΌοΈ Example Workflows

Example workflows are available in the examples/ directory:

  • APNext workflows: examples/flux/apnext/
  • Florence2 local: examples/flux/florence2/
  • GPT-4o Vision: examples/flux/gpt-4o_vision/
  • Ollama local: examples/flux/ollama_local_llm/
  • MiniCPM: examples/minicpm/

πŸ“‹ Requirements

Pillow>=10.4.0
requests>=2.32.5
openai>=1.44.0
blend-modes>=2.1.0
huggingface_hub>=0.34.0
color_matcher>=0.5.0
chardet>=5.2.0
google-generativeai>=0.7.2
anthropic
transformers>=4.40.0
decord>=0.6.0
scipy>=1.10.0
tqdm>=4.67.1

πŸ”„ Model Support Matrix

Provider Text Vision Video Local
OpenAI GPT βœ… βœ… ❌ ❌
Google Gemini βœ… βœ… βœ… ❌
Anthropic Claude βœ… βœ… ❌ ❌
xAI Grok βœ… βœ… ❌ ❌
Groq βœ… βœ… ❌ ❌
QwenVL βœ… βœ… βœ… βœ…
Ollama βœ… βœ… ❌ βœ…
MiniCPM βœ… βœ… βœ… βœ…
Phi-3.5 βœ… βœ… ❌ βœ…

πŸ“ License

MIT License


πŸ™ Acknowledgments

Built for the ComfyUI community. Special thanks to all contributors and users providing feedback.
