Advanced Prompt Generation & Multi-Model AI Integration for ComfyUI
A comprehensive suite of nodes for ComfyUI featuring multi-provider LLM support (OpenAI, Gemini, Claude, Grok, Groq, QwenVL), local model inference (Phi, MiniCPM, Ollama), professional image effects, and advanced prompt generation tools.
Search for "comfyui_dagthomas" in ComfyUI Manager and click Install.
Or install manually:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/dagthomas/comfyui_dagthomas
cd comfyui_dagthomas
pip install -r requirements.txt
```

Set your API keys as environment variables:
```
# OpenAI GPT
set OPENAI_API_KEY=sk-your-key-here

# Google Gemini
set GEMINI_API_KEY=your-key-here

# Anthropic Claude
set ANTHROPIC_API_KEY=your-key-here
# or
set CLAUDE_API_KEY=your-key-here

# xAI Grok
set XAI_API_KEY=your-key-here
# or
set GROK_API_KEY=your-key-here

# Groq
set GROQ_API_KEY=your-key-here
```

(The `set` syntax above is for Windows; use `export` on Linux/macOS.)

Display Name: APNext Universal Generator
A model-agnostic prompt generator that automatically detects available API keys and supports all major LLM providers.
| Input | Description |
|---|---|
| `input_text` | Base text to enhance |
| `model` | Select provider:model or "auto-detect" |
| `generation_mode` | Creative, Balanced, Focused, or Custom |
| `seed` | Seed for reproducible variations |
| `style_preference` | Cinematic, Photorealistic, Artistic, etc. |
| `detail_level` | Brief to Very Detailed output |
Supported Models:
- `gpt:gpt-4o`, `gpt:gpt-4o-mini`, `gpt:gpt-4-turbo`
- `gemini:gemini-2.5-flash`, `gemini:gemini-2.5-pro`
- `claude:claude-sonnet-4.5`, `claude:claude-3-5-sonnet`
- `grok:grok-beta`, `grok:grok-2-vision`
- `groq:llama-3.3-70b-versatile`
Returns: (generated_prompt, model_used, seed_used)
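The auto-detect behavior can be approximated as follows; this is a minimal sketch (not the node's actual code), and the provider-to-key mapping simply mirrors the environment variables documented above.

```python
import os

# Mapping of provider prefixes to the environment variables documented above.
PROVIDER_KEYS = {
    "gpt": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "claude": ["ANTHROPIC_API_KEY", "CLAUDE_API_KEY"],
    "grok": ["XAI_API_KEY", "GROK_API_KEY"],
    "groq": ["GROQ_API_KEY"],
}

def detect_available_providers():
    """Return the providers whose API key is present in the environment."""
    return [
        provider
        for provider, keys in PROVIDER_KEYS.items()
        if any(os.environ.get(k) for k in keys)
    ]

print(detect_available_providers())  # e.g. ['gpt', 'groq']
```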
Display Name: APNext Universal Vision Cloner
Analyze images with any supported vision model to generate detailed descriptions or clone image styles.
| Input | Description |
|---|---|
| `images` | One or more images to analyze |
| `model` | Vision model to use (auto-detect available) |
| `fade_percentage` | Blend percentage for multiple images |
| `analysis_mode` | Detailed Analysis, Style Cloning, Scene Description, Creative Interpretation |
| `output_format` | Text Only, JSON Structure, or Formatted Prompt |
Returns: (formatted_output, raw_response, faded_image, model_used)
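Exactly how `fade_percentage` blends multiple images is not spelled out here; below is a minimal sketch of one plausible interpretation (a running cross-fade over ComfyUI's `[B, H, W, C]` image tensors, with the weighting made up for illustration).

```python
import torch

def fade_images(images: torch.Tensor, fade_percentage: float) -> torch.Tensor:
    """Blend a batch of images into one, assuming a simple linear cross-fade.

    images: ComfyUI IMAGE tensor of shape [B, H, W, C] with values in 0-1.
    fade_percentage: 0 keeps the first image; higher values mix later images in more strongly.
    """
    if images.shape[0] == 1:
        return images
    weight = fade_percentage / 100.0
    base = images[0]
    for img in images[1:]:
        base = (1.0 - weight) * base + weight * img
    return base.unsqueeze(0)  # back to a [1, H, W, C] batch
```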
Display Name: APNext Gemini Prompt Enhancer
Enhances prompts with cinematic terminology and LLM refinement for video/image generation.
| Input | Description |
|---|---|
| `base_prompt` | Original prompt to enhance |
| `enhancement_mode` | Random Mix, Cinematic/Lighting/Camera/Motion/Style Focus, Full Enhancement, or LLM Only |
| `use_llm` | Enable Gemini LLM enhancement |
| `intensity` | Enhancement intensity (0.1-2.0) |
| Optional dropdowns | visual_style, lighting_type, camera_angle, shot_size, lens_type, color_tone, etc. |
Returns: (enhanced_prompt, random_enhanced, llm_enhanced)
Display Name: APNext Gemini Custom Vision
Analyze multiple images with custom prompts. Supports dynamic prompt templates with variable substitution.
| Input | Description |
|---|---|
| `images` | Input images |
| `custom_prompt` | Custom analysis prompt |
| `dynamic_prompt` | Enable ##TAG##, ##SEX##, ##PRONOUNS##, ##WORDS## substitution |
| `fade_percentage` | Blend multiple images together |
Returns: (output, clip_l, faded_image)
Display Name: APNext Gemini Text Only
Pure text generation with Gemini models. Supports dynamic prompt templates.
Returns: (output, clip_l)
Display Name: APNext Gemini Next Scene
Generate cinematic transitions for visual narratives. Creates the "next scene" based on a previous prompt and current frame.
| Input | Description |
|---|---|
| `image` | Current frame image |
| `original_prompt` | Previous scene description |
| `focus_on` | Camera Movement, Framing Evolution, Environmental Reveals, Atmospheric Shifts |
| `transition_intensity` | Subtle, Moderate, or Dramatic |
Returns: (next_scene_prompt, short_description)
Display Name: APNext GPT Mini Generator
Efficient text generation using GPT-4o-mini.
| Input | Description |
|---|---|
| `input_text` | Text to enhance |
| `happy_talk` | Enthusiastic vs. professional tone |
| `compress` | Enable output compression |
| `poster` | Movie poster style formatting |
Display Name: APNext GPT Vision Cloner
Clone image styles using GPT-4o vision capabilities with custom prompts.
Display Name: APNext GPT Custom Vision
Full custom vision analysis with GPT-4o.
Display Name: APNext Claude Text Generator
Text generation with Claude models (Claude 3.5 Sonnet, Claude Sonnet 4.5).
| Input | Description |
|---|---|
| `input_text` | Text to process |
| `claude_model` | Model selection |
| `happy_talk`, `compress`, `poster` | Output style controls |
| `variation_instruction` | Custom instruction for creative variations |
Display Name: APNext Claude Vision Analyzer
Image analysis with Claude's multimodal capabilities.
Display Name: APNext Grok Text Generator
Text generation using xAI's Grok models.
Display Name: APNext Grok Vision Analyzer
Image analysis with Grok vision models.
Display Name: APNext Groq Text Generator
Lightning-fast text generation using Groq's optimized infrastructure with Llama and Mixtral models.
| Input | Description |
|---|---|
| `groq_model` | llama-3.3-70b-versatile, llama-3.1-8b-instant, etc. |

The other standard LLM text-generation inputs are also available.
Display Name: APNext Groq Vision Analyzer
Fast image analysis with Groq vision models.
Display Name: APNext QwenVL Vision Analyzer
Local vision analysis using Qwen-VL models. Downloads models automatically.
| Input | Description |
|---|---|
| `images` | Input images |
| `qwen_model` | Qwen3-VL-4B-Instruct, etc. |
| `max_tokens` | Maximum response length |
| `keep_model_loaded` | Cache model in memory |
Display Name: APNext QwenVL Vision Cloner
Clone image styles locally without API calls.
Display Name: APNext QwenVL Video Analyzer
Analyze video content frame-by-frame.
Display Name: APNext QwenVL Next Scene
Generate cinematic scene transitions locally using QwenVL models. Takes a previous scene description and 1-5 frame images, then creates natural camera movements, framing evolution, and atmospheric shifts. Multiple frames help the model understand motion/progression.
| Input | Description |
|---|---|
| `images` | 1-5 frame images (batch) |
| `original_prompt` | Previous scene description |
| `qwen_model` | QwenVL model to use |
| `prompt_file` | Custom prompt template file |
| `custom_prompt` | Override with inline prompt (optional) |
| `max_frames` | Max frames to use from batch (1-5) |
| `focus_on` | Camera Movement, Framing Evolution, Environmental Reveals, Atmospheric Shifts |
| `transition_intensity` | Subtle, Moderate, or Dramatic |
| `keep_model_loaded` | Cache model in memory |
Returns: (next_scene_prompt, short_description)
Custom Prompts: Create your own prompt templates in data/custom_prompts/. Use ##ORIGINAL_PROMPT## as placeholder for the previous scene description. Included templates:
- `next_scene.txt` - Default detailed cinematography prompt
- `qwen_next_scene_simple.txt` - Simplified version
- `qwen_next_scene_video.txt` - Optimized for AI video generation
Display Name: APNext QwenVL Frame Prep
Utility node to prepare multiple images for QwenVL Next Scene. Accepts up to 5 individual images or a batch, scales them to max dimensions, and outputs a batched tensor.
| Input | Description |
|---|---|
| `max_width` | Maximum width (default 1024) |
| `max_height` | Maximum height (default 1024) |
| `image_1` - `image_5` | Individual image inputs |
| `image_batch` | Pre-batched images (optional) |
Returns: (images, frame_count)
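A rough sketch of the scale-and-batch idea, assuming ComfyUI's `[B, H, W, C]` image tensors and bilinear resampling (the node's actual resampling and padding behavior is not documented here):

```python
import torch
import torch.nn.functional as F

def prep_frame(image: torch.Tensor, max_width: int = 1024, max_height: int = 1024) -> torch.Tensor:
    """Downscale one [1, H, W, C] image to fit within the max dimensions, keeping aspect ratio."""
    _, h, w, _ = image.shape
    scale = min(max_width / w, max_height / h, 1.0)  # never upscale
    if scale < 1.0:
        new_h, new_w = int(h * scale), int(w * scale)
        # F.interpolate expects [B, C, H, W], so permute around the resize.
        image = F.interpolate(image.permute(0, 3, 1, 2), size=(new_h, new_w),
                              mode="bilinear", align_corners=False).permute(0, 2, 3, 1)
    return image

def batch_frames(frames: list) -> tuple:
    """Stack prepared frames into one batch (this sketch assumes they end up the same size)."""
    batch = torch.cat([prep_frame(f) for f in frames], dim=0)
    return batch, batch.shape[0]
```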
Display Name: APNext QwenVL Z-Image Vision
Analyzes images and outputs in Z-Image TurnBuilder chat format with `<|im_start|>`/`<|im_end|>` tokens.
Display Name: APNext OllamaNode
Local LLM inference using Ollama. Supports any model installed in your Ollama instance.
| Input | Description |
|---|---|
| `input_text` | Text to process |
| `model_name` | Any Ollama model (llama3, mistral, etc.) |
| `happy_talk`, `compress` | Output controls |
Display Name: APNext OllamaVision
Local vision analysis with Ollama multimodal models (llava, bakllava, etc.).
Display Name: APNext MiniCPM Image
Image understanding with MiniCPM-V 4.5 (OpenBMB). Supports thinking mode for complex reasoning.
| Input | Description |
|---|---|
| `images` | Input images |
| `question` | Question about the image |
| `enable_thinking` | Deep reasoning mode |
| `precision` | bfloat16 or float16 |
| `unload_after_inference` | Free memory after use |
Display Name: APNext MiniCPM Video
Video understanding and analysis.
Display Name: APNext Phi Model Loader
Load Microsoft Phi-3.5-vision-instruct model.
| Input | Description |
|---|---|
| `model_version` | Phi-3.5-vision-instruct |
| `image_crops` | 4 or 16 crops for detail |
| `attention_mechanism` | flash_attention_2, sdpa, or eager |
Display Name: APNext Phi Model Inference
Run inference with loaded Phi model.
Professional image effects using optimized tensor operations.
Creates a bloom/glow effect on bright areas.
| Input | Description |
|---|---|
| `intensity` | Bloom strength (0-5) |
| `threshold` | Brightness threshold (0-1) |
| `blur_radius` | Glow spread (1-50) |
| `blend_mode` | additive, screen, or overlay |
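As a general illustration of the technique (not this node's exact implementation), a bloom pass typically masks pixels above the brightness threshold, blurs them, and blends the result back:

```python
import torch
import torchvision.transforms.functional as TF

def bloom(image: torch.Tensor, intensity: float = 1.0,
          threshold: float = 0.8, blur_radius: int = 15) -> torch.Tensor:
    """Additive bloom on a [B, H, W, C] image in the 0-1 range (illustrative only)."""
    luma = image.mean(dim=-1, keepdim=True)           # rough brightness estimate
    highlights = torch.where(luma > threshold, image, torch.zeros_like(image))
    # Gaussian blur wants [B, C, H, W]; kernel size must be odd.
    k = blur_radius * 2 + 1
    glow = TF.gaussian_blur(highlights.permute(0, 3, 1, 2), kernel_size=k)
    glow = glow.permute(0, 2, 3, 1)
    return (image + intensity * glow).clamp(0.0, 1.0)  # "additive" blend mode
```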
Professional color grading with LUT support or manual controls.
| Input | Description |
|---|---|
| `method` | manual or lut_file |
| `lut_file` | .cube, .3dl, or image LUT |
| `exposure` | -3 to +3 stops |
| `contrast`, `saturation` | Standard adjustments |
| `highlights`, `shadows` | Tone controls |
| `temperature`, `tint` | White balance |
Supported LUT Formats: .cube (Adobe/Blackmagic), .3dl (Autodesk/Flame), Image LUTs (.png, .jpg)
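For orientation, the manual controls correspond to simple per-pixel math: exposure in stops is a power-of-two gain, and saturation blends each pixel against its gray value. A sketch under those assumptions (not the node's code):

```python
import torch

def basic_grade(image: torch.Tensor, exposure: float = 0.0,
                contrast: float = 1.0, saturation: float = 1.0) -> torch.Tensor:
    """Apply exposure (stops), contrast, and saturation to a [B, H, W, C] image."""
    out = image * (2.0 ** exposure)            # +1 stop doubles brightness
    out = (out - 0.5) * contrast + 0.5         # pivot contrast around mid-gray
    gray = out.mean(dim=-1, keepdim=True)      # crude luminance
    out = gray + (out - gray) * saturation     # 0 = grayscale, 1 = unchanged
    return out.clamp(0.0, 1.0)
```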
Intelligent image sharpening.
Add film grain and noise effects.
Add texture and roughness.
Film cross-processing color effects.
Separate color toning for highlights and shadows.
HDR-style tone mapping.
Digital glitch and databending effects.
Classic film halation (light bleeding) effect.
Display Name: APNext Latent Generator
Generate latent tensors with intelligent dimension calculation.
| Input | Description |
|---|---|
| `width`, `height` | Base dimensions (0 = auto-calculate) |
| `megapixel_scale` | Target megapixels (0.1-2.0) |
| `aspect_ratio` | 1:1, 3:2, 4:3, 16:9, 21:9 |
| `is_portrait` | Portrait orientation |
Returns: (LATENT, width, height)
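The dimension calculation presumably derives pixel dimensions from the megapixel target and aspect ratio, then snaps them to a VAE-friendly multiple. A sketch of that arithmetic (the rounding multiple of 8 is an assumption):

```python
def calc_dimensions(megapixel_scale: float = 1.0, aspect_ratio: str = "16:9",
                    is_portrait: bool = False, multiple: int = 8):
    """Derive (width, height) from a megapixel budget and an aspect ratio string."""
    ar_w, ar_h = (float(x) for x in aspect_ratio.split(":"))
    if is_portrait:
        ar_w, ar_h = ar_h, ar_w
    total_pixels = megapixel_scale * 1_000_000
    height = (total_pixels * ar_h / ar_w) ** 0.5
    width = height * ar_w / ar_h

    def snap(value: float) -> int:
        # Round to the nearest multiple so the latent dimensions divide cleanly.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)

print(calc_dimensions(1.0, "16:9"))  # roughly (1336, 752)
```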
Display Name: APNext PGSD3LatentGenerator
Optimized latent generation for Stable Diffusion 3 pipelines.
Display Name: Auto Prompter
Generate random prompts from extensive category databases.
| Input | Description |
|---|---|
| `subject` | Main subject (can include LoRA triggers) |
| `custom` | Prefix text for styling |
| `artform` | Photography, digital art, etc. |
| Various category selections | Random or specific choices |
Display Name: APNext Node
Advanced prompt building with category-based enhancements.
The system includes numerous nodes that can be chained together to create complex workflows:
Supports 24 main categories with subcategories:
- Architecture: styles, buildings, interiors, materials
- Art: painting, sculpture, techniques, palettes
- Artist: concept artists, illustrators, painters
- Character: anime, fantasy, sci-fi, superheroes
- Cinematic: directors, genres, effects, color grading
- Fashion: designers, outfits, accessories
- Feelings: emotional modifiers
- Geography: countries, nationalities
- Human: jobs, hobbies, groups
- Interaction: individual, couple, group, crowd interactions
- Keywords: modifiers, genres, trending terms
- People: archetypes, body types, expressions
- Photography: cameras, lenses, lighting, film types
- Plots: action, romance, horror, sci-fi scenarios
- Poses: portrait and action poses
- Scene: weather, textures, environments
- Science: astronomy, mathematics, medical
- Stuff: seasonal objects, gadgets, fantasy items
- Time: eras, decades, centuries
- Typography: fonts, word art styles
- Vehicle: cars, classic cars, vehicle types
- Video Game: games, engines, actions
Display Name: APNext String Merger
Combine multiple strings with separators.
Display Name: APNext Flexible String Merger
Advanced string combining with custom formatting.
Display Name: APNext Sentence Mixer
Shuffle and mix sentences from multiple inputs for creative variations.
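Conceptually this amounts to splitting the inputs into sentences, shuffling, and rejoining; a minimal sketch under that assumption (the node's actual splitting rules may differ):

```python
import random
import re

def mix_sentences(*texts: str, seed=None) -> str:
    """Shuffle sentences drawn from any number of input strings."""
    sentences = []
    for text in texts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip())
    rng = random.Random(seed)
    rng.shuffle(sentences)
    return " ".join(sentences)

print(mix_sentences("A red fox. It runs fast.", "The forest is dark. Moonlight glows."))
```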
Display Name: APNext Custom Prompts
Load prompt templates from the data/custom_prompts/ directory.
Included templates:
- `promptcreator.txt` - Full creative prompt generation
- `image_analyze.txt` - Image analysis prompts
- `gemini_video.txt` - Video generation prompts
- `cloner.txt` - Style cloning prompts
- Various LoRA-specific templates (ohwx, t5xxl, etc.)
Display Name: APNext Local random prompt
Load random prompts from local text files.
Display Name: APNext Random Integer Generator
Generate random integers with min/max range.
Create your own categories for APNextNode:
- Create a folder in `data/next/` (e.g., `data/next/mycategory/`)
- Add JSON files for each field

Simple list format:

```json
["item1", "item2", "item3"]
```

Format with prompt framing and attributes:

```json
{
"preprompt": "with",
"separator": " and ",
"endprompt": "visual effects",
"items": ["motion blur", "lens flare", "particle effects"],
"attributes": {
"motion blur": ["dynamic", "cinematic"],
"lens flare": ["bright", "atmospheric"]
}
}
```

Create your own prompt templates for use with the Custom Prompt Loader node.
Place .txt files in: data/custom_prompts/
Templates are plain text files containing instructions for LLM nodes. They support dynamic variable substitution:
| Variable | Description |
|---|---|
| `##TAG##` | Replaced with the tag input (e.g., "ohwx man") |
| `##SEX##` | Replaced with the sex input (e.g., "male", "female") |
| `##PRONOUNS##` | Replaced with pronouns (e.g., "him, his") |
| `##WORDS##` | Replaced with the target word count |
Create a file `data/custom_prompts/my_style.txt`:

```text
As a professional art critic, describe the provided image in detail.
Focus on creating a cohesive scene as if describing a movie still.
If the subject is ##TAG##, use ##PRONOUNS## pronouns appropriately.
The subject is ##SEX##.
Include:
- Main subject description with clothing, accessories, position
- Setting and environment details
- Lighting type, direction, and atmosphere
- Color palette and emotional tone
- Camera angle and composition
Output approximately ##WORDS## words.
Do not use JSON format. Provide a single cohesive paragraph.
```
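The substitution itself is plain string replacement; a minimal sketch of filling such a template (the function name and arguments are illustrative, not the node's API):

```python
from pathlib import Path

def fill_template(path: str, tag: str, sex: str, pronouns: str, words: int) -> str:
    """Load a custom prompt template and replace the documented placeholders."""
    template = Path(path).read_text(encoding="utf-8")
    return (template
            .replace("##TAG##", tag)
            .replace("##SEX##", sex)
            .replace("##PRONOUNS##", pronouns)
            .replace("##WORDS##", str(words)))

prompt = fill_template("data/custom_prompts/my_style.txt",
                       tag="ohwx man", sex="male", pronouns="him, his", words=150)
print(prompt)
```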
| Template | Purpose |
|---|---|
| `promptcreator.txt` | Detailed image analysis (~150 words) |
| `promptcreator_small.txt` | Concise image analysis |
| `image_analyze.txt` | General image description |
| `cloner.txt` | Style cloning prompts |
| `gemini_video.txt` | Video generation prompts |
| `gemini_ohwx.txt` | LoRA trigger-aware prompts |
| `t5xxl.txt` | T5-XXL optimized prompts |
| `ltxv.txt` | LTX Video model prompts |
| `next_scene.txt` | Cinematic scene transitions |
Customize available models by editing JSON configuration files in the data/ folder.
| File | Provider | Description |
|---|---|---|
| `gemini_models.json` | Google Gemini | Gemini model list |
| `gpt_models.json` | OpenAI | GPT model list |
| `claude_models.json` | Anthropic | Claude model list |
| `grok_models.json` | xAI | Grok model list |
| `groq_models.json` | Groq | Groq model list (text + vision) |
| `qwenvl_models.json` | QwenVL | Local Qwen vision models |
QwenVL nodes support loading additional models from private configuration files. This allows you to add custom or uncensored models without modifying the main configuration.
How to add private models:

1. Create a JSON file in `data/` with a name matching `private_*qwenvl*.json`
   - Examples: `private_qwenvl_models.json`, `private_uncensored.qwenvl_models.json`
2. Use the same format as `qwenvl_models.json`:

```json
{
"models": [
"huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated",
"huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated",
"another-namespace/custom-model"
]
}
```

3. Restart ComfyUI; the models will appear in the QwenVL node dropdowns
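The private-file loading described in the notes below roughly amounts to globbing for matching files and merging their model lists with the main one; a sketch under that assumption (the helper name and details are illustrative):

```python
import json
from pathlib import Path

def load_qwenvl_models(data_dir: str = "data") -> list:
    """Merge qwenvl_models.json with any private_*qwenvl*.json files, dropping duplicates."""
    data = Path(data_dir)
    files = [data / "qwenvl_models.json", *sorted(data.glob("private_*qwenvl*.json"))]
    models = []
    for f in files:
        if f.exists():
            for name in json.loads(f.read_text(encoding="utf-8")).get("models", []):
                if name not in models:   # duplicates are filtered out
                    models.append(name)
    return models
```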
Notes:
- Private files are loaded in addition to the main `qwenvl_models.json`
- Duplicate models are automatically filtered out
- Supports full HuggingFace repo paths (`namespace/model-name`)
- Models are downloaded to `ComfyUI/models/LLM/Qwen-VL/` on first use
Most model files use a simple array format:

```json
{
"models": [
"model-name-1",
"model-name-2",
"model-name-3"
]
}
```

Edit `data/gemini_models.json`:

```json
{
"models": [
"gemini-2.5-pro",
"gemini-2.5-flash",
"gemini-flash-latest",
"gemini-flash-lite-latest",
"gemini-2.5-flash-lite",
"gemini-exp-1206"
]
}
```

Edit `data/claude_models.json`:

```json
{
"models": [
"claude-sonnet-4.5",
"claude-sonnet-4",
"claude-sonnet-3.7",
"claude-opus-4.1",
"claude-opus-4",
"claude-haiku-3.5",
"claude-haiku-3"
]
}
```

Groq supports separate text and vision model lists:

```json
{
"text_models": [
"llama-3.3-70b-versatile",
"llama-3.1-8b-instant",
"groq/compound",
"qwen/qwen3-32b"
],
"vision_models": [
"meta-llama/llama-4-scout-17b-16e-instruct",
"meta-llama/llama-4-maverick-17b-128e-instruct"
],
"note": "Edit this file to add/remove models"
}
```

- Restart ComfyUI after editing model configuration files
- For Groq, the system will first try to fetch models from the API, then fall back to the JSON file (see the sketch after this list)
- Model names must match exactly what the provider's API expects
- Invalid model names will cause API errors at runtime
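A sketch of the API-first-with-JSON-fallback idea mentioned above for Groq; the endpoint shown is Groq's OpenAI-compatible model listing, and the exact URL and response fields should be treated as assumptions:

```python
import json
import os
from pathlib import Path

import requests

def list_groq_text_models() -> list:
    """Try the Groq API for available models, falling back to data/groq_models.json."""
    api_key = os.environ.get("GROQ_API_KEY")
    if api_key:
        try:
            resp = requests.get("https://api.groq.com/openai/v1/models",
                                headers={"Authorization": f"Bearer {api_key}"}, timeout=10)
            resp.raise_for_status()
            return [m["id"] for m in resp.json().get("data", [])]
        except (requests.RequestException, KeyError, ValueError):
            pass  # fall through to the local config file
    cfg = json.loads(Path("data/groq_models.json").read_text(encoding="utf-8"))
    return cfg.get("text_models", [])
```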
Example workflows are available in the examples/ directory:
- APNext workflows: `examples/flux/apnext/`
- Florence2 local: `examples/flux/florence2/`
- GPT-4o Vision: `examples/flux/gpt-4o_vision/`
- Ollama local: `examples/flux/ollama_local_llm/`
- MiniCPM: `examples/minicpm/`
Key dependencies (from `requirements.txt`):

```text
Pillow>=10.4.0
requests>=2.32.5
openai>=1.44.0
blend-modes>=2.1.0
huggingface_hub>=0.34.0
color_matcher>=0.5.0
chardet>=5.2.0
google-generativeai>=0.7.2
anthropic
transformers>=4.40.0
decord>=0.6.0
scipy>=1.10.0
tqdm>=4.67.1
```
| Provider | Text | Vision | Video | Local |
|---|---|---|---|---|
| OpenAI GPT | ✅ | ✅ | ❌ | ❌ |
| Google Gemini | ✅ | ✅ | ❌ | ❌ |
| Anthropic Claude | ✅ | ✅ | ❌ | ❌ |
| xAI Grok | ✅ | ✅ | ❌ | ❌ |
| Groq | ✅ | ✅ | ❌ | ❌ |
| QwenVL | ❌ | ✅ | ✅ | ✅ |
| Ollama | ✅ | ✅ | ❌ | ✅ |
| MiniCPM | ❌ | ✅ | ✅ | ✅ |
| Phi-3.5 | ❌ | ✅ | ❌ | ✅ |
MIT License
Built for the ComfyUI community. Special thanks to all contributors and users providing feedback.

