diff --git a/README.md b/README.md index 9f9f530..b573e22 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,285 @@ # Characterpose -various poses + +You are right: the **primary goal** is a prompt-based image search app. + +This repo now includes a local app that can: +- scan/index images from your folders/drives, +- search by text prompts like "pile of newspaper" or "blue and white shoes", +- show matching results in a local UI, +- and (optionally) run duplicate cleanup with recovery bin. + +## Yes — you can download everything at once (no Notepad copy/paste) + +Use **one** of these options: + +1. **GitHub ZIP (easiest)** + - Open the repo page on GitHub. + - Click **Code** → **Download ZIP**. + - Extract it to a folder like `C:\ImageSearchApp`. + +2. **Git clone (if Git is installed)** + ```bash + git clone + ``` + +3. **GitHub Desktop** + - File → Clone repository → choose local folder. + +All required files come down with the correct file types automatically (`.py`, `.bat`, `.txt`, etc.). +You do **not** need to create files manually in Notepad. + + +## Do I need `git apply`? + +Short answer: **No**, not for normal use. + +- `git apply` is a developer command that applies a patch/diff file to source code. +- It is only needed if someone sends you a `.patch`/`.diff` instead of the full project files. +- For you, the easiest path is still: **Code → Download ZIP** and extract it. + +So no — copying code into Notepad is **not** your only option, and usually you should avoid that. + + +## Where is the GitHub page? + +If you are asking this, the project may **not be published to GitHub yet**. + +How to check quickly: +- If someone gave you a GitHub link, that is the page to use. +- If you have no link, ask the person who shared the files for the repository URL. +- In this copy of the project, there is no configured Git remote URL, so a GitHub page cannot be auto-detected from here. + +If you want your own GitHub page for this project: +1. 
Create a new empty repository on GitHub. +2. Upload this folder (or push with Git/GitHub Desktop). +3. Then use that new repo page's **Code → Download ZIP** button. + +So the issue is likely exactly what you suspected: you may be working from a local copy that is not connected to a GitHub repo URL yet. + + +## What changed in the latest update (plain English) + +The latest update focused on reliability/troubleshooting: +- Indexing now prints a `scan_summary` per source folder/drive (whether it exists, plus the candidate image count). +- Drive path handling is safer (`C:` is treated as `C:\`). +- UI launcher now pins Streamlit to local defaults (`127.0.0.1:8501`) and disables usage stats prompt noise. + +If you already have a working folder, these are the only files changed in that update: +- `image_prompt_search.py` +- `run_image_search_windows.bat` +- `README.md` + +## Which files do you need to copy/paste? + +You should **not** need copy/paste if you can use GitHub **Code → Download ZIP**. + +If ZIP still fails and you must manually copy files, use this minimum set in one folder: +- `setup_windows.bat` +- `run_image_search_windows.bat` +- `diagnose_windows.bat` +- `image_prompt_search.py` +- `streamlit_image_search_app.py` +- `requirements-image-search.txt` + +Optional (only for duplicate cleanup): +- `run_dedupe_windows.bat` +- `image_dedupe_manager.py` +- `requirements-image-dedupe.txt` + +Also create `python_path.txt` in the same folder with one line: +`C:\PortableTools\python-3.12.10-amd64\python.exe` + +## What to keep together + +Keep these files in the same extracted folder: +- `setup_windows.bat` +- `diagnose_windows.bat` +- `run_image_search_windows.bat` +- `run_dedupe_windows.bat` +- `image_prompt_search.py` +- `streamlit_image_search_app.py` +- `image_dedupe_manager.py` +- `requirements-image-search.txt` +- `requirements-image-dedupe.txt` + +Important: there is no separate “dedupe folder.” Dedupe is handled by files (`run_dedupe_windows.bat` + `image_dedupe_manager.py`) in 
the same app folder. +If any of those files are missing, download the latest ZIP again and extract to a fresh folder (for example `C:\ImageSearchApp`) before running setup. + + +## Quick fix (if you are stuck right now) + +Do these 6 steps exactly: + +1. Put the project in one clean folder, e.g. `C:\ImageSearchApp`. +2. In that folder, create `python_path.txt` with one line: + `C:\PortableTools\python-3.12.10-amd64\python.exe` +3. In File Explorer, open `C:\ImageSearchApp`, click the address bar, type `cmd`, press Enter. +4. Run: `setup_windows.bat` +5. Wait for `[OK] core modules import successfully` +6. Start app: `run_image_search_windows.bat ui` + +If it still shows `K:\...python_embeded`, you are running a different copy of the `.bat` files; search for duplicate `setup_windows.bat` and delete old copies. + +## Hard pivot summary (what changed) + +This project now uses a **no-venv setup** on Windows: +- `setup_windows.bat` installs dependencies into `./.deps` +- launchers set `PYTHONPATH` to `./.deps` +- this avoids the `No module named venv` failure path entirely + +If you can run normal Python + pip, this app can run without creating a virtual environment. + +## 1) One-time setup (Windows) + +1. Install Python 3.11+ from [python.org](https://www.python.org/downloads/). +2. Open your extracted folder. +3. Double-click `setup_windows.bat`. + +You can run the scripts in **either** of these ways: +- **Double-click way (easiest):** double-click the `.bat` file in File Explorer. If no command is provided, launchers now open an interactive menu instead of closing immediately. +- **Terminal way (recommended for seeing errors):** open Command Prompt in that folder, then type the `.bat` command. 
+ +### What “same folder” means (important) + +“Same folder” means the directory that contains these files together: +- `setup_windows.bat` +- `run_image_search_windows.bat` +- `diagnose_windows.bat` +- `python_path.txt` (if you use the C: override) + +Example of a correct folder: +- `C:\ImageSearchApp\` + +In that case, `python_path.txt` should be: +- `C:\ImageSearchApp\python_path.txt` + +### How to open a terminal in that exact folder + +1. Open File Explorer. +2. Browse to your project folder (example: `C:\ImageSearchApp`). +3. Click the address bar, type `cmd`, press Enter. +4. A Command Prompt opens already in the right folder. +5. Run commands there, for example: + +```bat +setup_windows.bat +run_image_search_windows.bat ui +``` + +## 2) Build your image index (first run) + +Open **Command Prompt** in this folder and run: + +```bat +run_image_search_windows.bat index "D:\Photos" "E:\Archive" +``` + +This scans images and creates `./.image_search_index`. + +Tips for drive-level indexing on Windows: +- Prefer root drives without a trailing slash in quotes: `"C:" "D:" "E:"` +- Or target known media folders first (`Pictures`, camera dumps, archives) to validate quickly. +- If indexing fails, the script now prints a `scan_summary` showing each source path, whether it exists, and how many candidate image files were found. + +## 3) Launch the search app UI + +```bat +run_image_search_windows.bat ui +``` + +Then use the local browser page to type prompts and view matching images. + +## 4) Quick CLI search (optional) + +```bat +run_image_search_windows.bat search blue and white shoes +``` + +## Optional: duplicate cleanup tool + +If you also want duplicate cleanup with app recycle bin, use: + +```bat +run_dedupe_windows.bat scan "D:\Photos" +run_dedupe_windows.bat scan-visual "D:\Photos" +run_dedupe_windows.bat restore +``` + +## Notes + +- Prompt search quality depends on the model and image quality. +- Re-run `index` when you add many new images. 
+- Everything runs locally on your machine (no cloud required by this tool itself). + + +## If you hit an error after setup + +I could not open your shared ChatGPT link from this environment (network/proxy blocked), so I added a one-click local diagnostics script. + +Run this in your tool folder: + +```bat +diagnose_windows.bat +``` + +It checks: +- Python installed and selected correctly +- `.deps` exists (local dependency folder) +- required modules installed (`Pillow`, `numpy`, `sentence-transformers`, `streamlit`, `imagehash`) +- app Python files compile + +Then follow the suggested next command it prints. + +If it still fails, copy the exact error text from that window and share it. + + +## Force the app to use your C: Python (avoid ComfyUI K: embedded Python) + +If diagnostics show paths like `K:\...\python_embeded\python.exe`, create a file named: + +`python_path.txt` + +in this project folder, with exactly one line, for example: + +`C:\PortableTools\python-3.12.10-amd64\python.exe` + +Then run: + +```bat +setup_windows.bat +``` + +All launchers in this repo (`setup_windows.bat`, `run_image_search_windows.bat`, `run_dedupe_windows.bat`, `diagnose_windows.bat`) prefer `python_path.txt` first. + +Important: keep `python_path.txt` in the same folder as the `.bat` files. +The scripts switch to their own folder first, so this works even if you launch from another directory. + +This pivot removes virtualenv dependency entirely: setup installs packages into a local `.deps` folder and launchers set `PYTHONPATH` to use it. + + +## If you STILL see `K:\...\python_embeded\python.exe` + +You are almost certainly running an older copy of `setup_windows.bat` from a different folder. + +Do this exactly: +1. In the folder where your `.bat` files are, run `diagnose_windows.bat`. +2. Confirm it prints `Script folder: ...` at the top and check that path is the one you expect. +3. 
In that **same folder**, create `python_path.txt` with one line only: + `C:\PortableTools\python-3.12.10-amd64\python.exe` +4. Run `setup_windows.bat` from that same folder. + +If you still get K:, you are launching a different script copy. Search your PC for duplicate `setup_windows.bat` files and remove old copies. + + +## What your latest diagnostic output means + +Your output is actually good progress: +- It shows your C: Python is selected correctly. +- If `.deps` is missing, setup has not completed yet. +- Missing-module errors mean dependencies were not installed to `.deps` yet. + +Run this next in the same folder: + +```bat +setup_windows.bat +``` diff --git a/diagnose_windows.bat b/diagnose_windows.bat new file mode 100644 index 0000000..9236bb7 --- /dev/null +++ b/diagnose_windows.bat @@ -0,0 +1,90 @@ +@echo off +setlocal + +set "SCRIPT_DIR=%~dp0" +cd /d "%SCRIPT_DIR%" + +echo ========================================== +echo Prompt Image Search - Quick Diagnose (pivot: no venv) +echo Script folder: %SCRIPT_DIR% +echo ========================================== +echo. + +set PYTHON_EXE= +if exist "python_path.txt" ( + for /f "usebackq delims=" %%I in ("python_path.txt") do ( + set "PYTHON_EXE=%%~I" + goto :path_loaded + ) +) + +:path_loaded +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:\"=%" +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:"=%" + +if defined PYTHON_EXE echo [OK] python_path.txt requests: %PYTHON_EXE% + +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3.12 -c "import sys; print(sys.executable)" 2^>nul') do set PYTHON_EXE=%%I +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3 -c "import sys; print(sys.executable)" 2^>nul') do set PYTHON_EXE=%%I +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('python -c "import sys; print(sys.executable)" 2^>nul') do set PYTHON_EXE=%%I +) + +if not defined PYTHON_EXE ( + echo [FAIL] Could not detect Python. 
+ goto end +) + +echo [OK] Python detected: %PYTHON_EXE% + +echo %PYTHON_EXE% | find /I "python_embeded" >nul +if not errorlevel 1 ( + echo [FAIL] You are still using ComfyUI embedded Python. + echo Create or fix python_path.txt in this folder with a C: Python path. + goto end +) + +echo. +"%PYTHON_EXE%" -c "import sys; print('Executable:', sys.executable)" +"%PYTHON_EXE%" -m pip --version +if errorlevel 1 ( + echo [FAIL] pip not available for %PYTHON_EXE% + goto end +) + +echo. +if not exist ".deps" ( + echo [WARN] .deps not found yet. Run setup_windows.bat first. + goto compile_check +) + +echo [OK] .deps folder found +set "PYTHONPATH=%SCRIPT_DIR%.deps" +echo Checking required modules from .deps... +"%PYTHON_EXE%" -c "import PIL, numpy, sentence_transformers, streamlit, imagehash; print('[OK] core modules import successfully')" +if errorlevel 1 ( + echo [FAIL] Some modules missing in .deps. Run setup_windows.bat + goto end +) + +:compile_check +echo. +echo Checking app scripts compile... +"%PYTHON_EXE%" -m py_compile image_prompt_search.py streamlit_image_search_app.py image_dedupe_manager.py +if errorlevel 1 ( + echo [FAIL] Compile check failed. + goto end +) + +echo [OK] Local setup looks good. +echo Next: +echo run_image_search_windows.bat index "D:\Photos" +echo run_image_search_windows.bat ui + +:end +pause +endlocal diff --git a/image_dedupe_manager.py b/image_dedupe_manager.py new file mode 100644 index 0000000..7d49670 --- /dev/null +++ b/image_dedupe_manager.py @@ -0,0 +1,369 @@ +#!/usr/bin/env python3 +"""Image duplicate manager with a local app recycle bin. + +Features +- Exact duplicate detection via SHA-256. +- Optional visual duplicate detection via perceptual hash. +- Safety guards for visual mode: do not delete when dimensions or DPI differ. +- Quarantine delete candidates into an app-managed recycle bin. +- Restore files from the app recycle bin. +- Optional purge from app recycle bin to Windows recycle bin or permanent delete. 
+""" + +from __future__ import annotations + +import argparse +import hashlib +import json +import os +import shutil +from dataclasses import dataclass +from datetime import datetime +from pathlib import Path +from typing import Dict, Iterable, List, Optional, Sequence, Tuple + +from PIL import Image +import imagehash + +try: + from send2trash import send2trash +except Exception: # pragma: no cover - optional dependency fallback + send2trash = None + +IMAGE_EXTENSIONS = { + ".jpg", + ".jpeg", + ".png", + ".webp", + ".bmp", + ".gif", + ".tif", + ".tiff", + ".heic", + ".heif", +} + + +@dataclass +class ImageMeta: + path: Path + mtime: float + size_bytes: int + width: int + height: int + dpi_x: Optional[float] + dpi_y: Optional[float] + sha256: str + phash: Optional[imagehash.ImageHash] + + +def iter_images(roots: Sequence[Path]) -> Iterable[Path]: + for root in roots: + if not root.exists(): + continue + for path in root.rglob("*"): + if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS: + yield path + + +def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str: + digest = hashlib.sha256() + with path.open("rb") as f: + while True: + chunk = f.read(chunk_size) + if not chunk: + break + digest.update(chunk) + return digest.hexdigest() + + +def read_meta(path: Path, include_visual_hash: bool) -> Optional[ImageMeta]: + stat = path.stat() + width = 0 + height = 0 + dpi_x: Optional[float] = None + dpi_y: Optional[float] = None + phash: Optional[imagehash.ImageHash] = None + + try: + with Image.open(path) as img: + width, height = img.size + raw_dpi = img.info.get("dpi") + if isinstance(raw_dpi, tuple) and len(raw_dpi) >= 2: + dpi_x = float(raw_dpi[0]) + dpi_y = float(raw_dpi[1]) + elif isinstance(raw_dpi, (int, float)): + dpi_x = float(raw_dpi) + dpi_y = float(raw_dpi) + if include_visual_hash: + phash = imagehash.phash(img) + except Exception: + return None + + return ImageMeta( + path=path, + mtime=stat.st_mtime, + size_bytes=stat.st_size, + 
width=width, + height=height, + dpi_x=dpi_x, + dpi_y=dpi_y, + sha256=sha256_file(path), + phash=phash, + ) + + +def choose_keep_newest(members: Sequence[ImageMeta]) -> ImageMeta: + return sorted( + members, + key=lambda m: (m.mtime, m.size_bytes, str(m.path).lower()), + reverse=True, + )[0] + + +def same_dimensions_and_dpi(a: ImageMeta, b: ImageMeta) -> bool: + if (a.width, a.height) != (b.width, b.height): + return False + + if a.dpi_x is None or a.dpi_y is None or b.dpi_x is None or b.dpi_y is None: + return a.dpi_x is None and a.dpi_y is None and b.dpi_x is None and b.dpi_y is None + + return round(a.dpi_x, 2) == round(b.dpi_x, 2) and round(a.dpi_y, 2) == round(b.dpi_y, 2) + + +def ensure_run_bin(base_bin: Path) -> Path: + run_bin = base_bin / datetime.now().strftime("%Y%m%d_%H%M%S") + run_bin.mkdir(parents=True, exist_ok=True) + return run_bin + + +def safe_relpath(path: Path) -> str: + drive = path.drive.replace(":", "") if path.drive else "root" + without_anchor = Path(*path.parts[1:]) if path.is_absolute() and len(path.parts) > 1 else path + return str(Path(drive) / without_anchor) + + +def move_to_app_bin(meta: ImageMeta, run_bin: Path) -> Path: + rel = safe_relpath(meta.path) + destination = run_bin / rel + destination.parent.mkdir(parents=True, exist_ok=True) + shutil.move(str(meta.path), str(destination)) + return destination + + +def write_manifest(manifest_path: Path, rows: List[Dict[str, object]]) -> None: + with manifest_path.open("a", encoding="utf-8") as f: + for row in rows: + f.write(json.dumps(row, ensure_ascii=False) + "\n") + + +def dedupe( + source_dirs: Sequence[Path], + app_bin: Path, + include_visual: bool, + visual_distance: int, + dry_run: bool, +) -> Dict[str, int]: + metas: List[ImageMeta] = [] + for img_path in iter_images(source_dirs): + meta = read_meta(img_path, include_visual_hash=include_visual) + if meta: + metas.append(meta) + + stats = { + "scanned": len(metas), + "exact_groups": 0, + "visual_groups": 0, + "moved": 0, + 
"protected": 0, + } + + by_sha: Dict[str, List[ImageMeta]] = {} + for meta in metas: + by_sha.setdefault(meta.sha256, []).append(meta) + + duplicate_rows: List[Dict[str, object]] = [] + run_bin = ensure_run_bin(app_bin) if not dry_run else app_bin / "DRY_RUN" + manifest_path = run_bin / "manifest.jsonl" + + removed_paths: set[Path] = set() + + for group in by_sha.values(): + if len(group) < 2: + continue + stats["exact_groups"] += 1 + keep = choose_keep_newest(group) + for candidate in group: + if candidate.path == keep.path: + continue + removed_paths.add(candidate.path) + if not dry_run: + moved_to = move_to_app_bin(candidate, run_bin) + stats["moved"] += 1 + duplicate_rows.append( + { + "reason": "exact_duplicate", + "kept": str(keep.path), + "removed": str(candidate.path), + "stored_in_app_bin": str(moved_to), + "sha256": candidate.sha256, + "width": candidate.width, + "height": candidate.height, + "dpi": [candidate.dpi_x, candidate.dpi_y], + "timestamp": datetime.utcnow().isoformat() + "Z", + } + ) + + if include_visual: + remaining = [m for m in metas if m.path not in removed_paths and m.phash is not None] + remaining_sorted = sorted(remaining, key=lambda m: m.mtime, reverse=True) + seen: set[Path] = set() + + for i, anchor in enumerate(remaining_sorted): + if anchor.path in seen: + continue + group = [anchor] + for other in remaining_sorted[i + 1 :]: + if other.path in seen: + continue + if other.phash is None or anchor.phash is None: + continue + if anchor.phash - other.phash <= visual_distance: + group.append(other) + if len(group) < 2: + continue + + stats["visual_groups"] += 1 + keep = choose_keep_newest(group) + for candidate in group: + seen.add(candidate.path) + if candidate.path == keep.path: + continue + if not same_dimensions_and_dpi(keep, candidate): + stats["protected"] += 1 + continue + if not dry_run: + moved_to = move_to_app_bin(candidate, run_bin) + stats["moved"] += 1 + duplicate_rows.append( + { + "reason": "visual_duplicate", + "kept": 
str(keep.path), + "removed": str(candidate.path), + "stored_in_app_bin": str(moved_to), + "phash_keep": str(keep.phash), + "phash_removed": str(candidate.phash), + "width": candidate.width, + "height": candidate.height, + "dpi": [candidate.dpi_x, candidate.dpi_y], + "timestamp": datetime.utcnow().isoformat() + "Z", + } + ) + + if duplicate_rows and not dry_run: + write_manifest(manifest_path, duplicate_rows) + + return stats + + +def restore_from_app_bin(app_bin: Path) -> int: + restored = 0 + manifests = sorted(app_bin.glob("*/manifest.jsonl")) + for manifest in manifests: + with manifest.open("r", encoding="utf-8") as f: + rows = [json.loads(line) for line in f if line.strip()] + + for row in reversed(rows): + stored = Path(row["stored_in_app_bin"]) + original = Path(row["removed"]) + if stored.exists(): + original.parent.mkdir(parents=True, exist_ok=True) + shutil.move(str(stored), str(original)) + restored += 1 + return restored + + +def purge_app_bin(app_bin: Path, mode: str) -> int: + purged = 0 + for run_dir in sorted(p for p in app_bin.glob("*") if p.is_dir()): + for file in run_dir.rglob("*"): + if not file.is_file() or file.name == "manifest.jsonl": + continue + if mode == "recycle": + if send2trash is None: + raise RuntimeError("send2trash is not installed. 
Install it to use recycle mode.") + send2trash(str(file)) + elif mode == "permanent": + file.unlink(missing_ok=True) + purged += 1 + shutil.rmtree(run_dir, ignore_errors=True) + return purged + + +def build_parser() -> argparse.ArgumentParser: + p = argparse.ArgumentParser(description="Image duplicate manager with app recycle bin") + sub = p.add_subparsers(dest="command", required=True) + + scan = sub.add_parser("scan", help="Find duplicates and move old versions to app recycle bin") + scan.add_argument("--source", nargs="+", required=True, help="One or more source folders to scan") + scan.add_argument( + "--app-bin", + default="./app_recycle_bin", + help="Folder used as app-managed recycle bin", + ) + scan.add_argument( + "--visual", + action="store_true", + help="Enable visual duplicate mode (protected by same dimensions and DPI rule)", + ) + scan.add_argument( + "--visual-distance", + type=int, + default=2, + help="pHash distance threshold for visual matching (lower is stricter)", + ) + scan.add_argument("--dry-run", action="store_true", help="Only report stats, do not move files") + + restore = sub.add_parser("restore", help="Restore files from app recycle bin to original paths") + restore.add_argument("--app-bin", default="./app_recycle_bin") + + purge = sub.add_parser( + "purge", + help="Purge files currently in app recycle bin to Windows recycle bin or permanently", + ) + purge.add_argument("--app-bin", default="./app_recycle_bin") + purge.add_argument("--mode", choices=["recycle", "permanent"], default="recycle") + + return p + + +def main() -> int: + parser = build_parser() + args = parser.parse_args() + + if args.command == "scan": + stats = dedupe( + source_dirs=[Path(s).expanduser().resolve() for s in args.source], + app_bin=Path(args.app_bin).expanduser().resolve(), + include_visual=args.visual, + visual_distance=args.visual_distance, + dry_run=args.dry_run, + ) + print(json.dumps(stats, indent=2)) + return 0 + + if args.command == "restore": + 
restored = restore_from_app_bin(Path(args.app_bin).expanduser().resolve()) + print(json.dumps({"restored": restored}, indent=2)) + return 0 + + if args.command == "purge": + purged = purge_app_bin(Path(args.app_bin).expanduser().resolve(), mode=args.mode) + print(json.dumps({"purged": purged, "mode": args.mode}, indent=2)) + return 0 + + return 1 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/image_prompt_search.py b/image_prompt_search.py new file mode 100644 index 0000000..5cb1f44 --- /dev/null +++ b/image_prompt_search.py @@ -0,0 +1,208 @@ +#!/usr/bin/env python3 +"""Prompt-based local image search tool. + +Primary goal: +- Index images from one or more folders. +- Search by natural language prompt ("pile of newspaper", "blue and white shoes"). +- Return best matching file paths by semantic similarity. + +This tool stores a local index in a folder (default: ./.image_search_index). +""" + +from __future__ import annotations + +import argparse +import json +import os +from dataclasses import dataclass, asdict +from pathlib import Path +from typing import Iterable, List, Sequence + +import numpy as np +from PIL import Image +from sentence_transformers import SentenceTransformer + +IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif", ".tif", ".tiff", ".heic", ".heif"} + + +@dataclass +class ImageRecord: + path: str + width: int + height: int + mtime: float + + +def iter_images(roots: Sequence[Path]) -> Iterable[Path]: + for root in roots: + if not root.exists(): + continue + for dirpath, _, filenames in os.walk(root, onerror=lambda _: None): + for filename in filenames: + suffix = Path(filename).suffix.lower() + if suffix in IMAGE_EXTENSIONS: + yield Path(dirpath) / filename + + +def source_scan_summary(source_dirs: Sequence[Path]) -> list[dict]: + summary: list[dict] = [] + for src in source_dirs: + src_path = Path(src) + row = { + "source": str(src_path), + "exists": src_path.exists(), + "candidate_images": 0, + } + if 
src_path.exists(): + row["candidate_images"] = sum(1 for _ in iter_images([src_path])) + summary.append(row) + return summary + + +def load_model(model_name: str) -> SentenceTransformer: + return SentenceTransformer(model_name) + + +def normalize(vectors: np.ndarray) -> np.ndarray: + norms = np.linalg.norm(vectors, axis=1, keepdims=True) + norms[norms == 0] = 1.0 + return vectors / norms + + +def build_index(source_dirs: Sequence[Path], index_dir: Path, model_name: str, batch_size: int = 16) -> dict: + model = load_model(model_name) + scan = source_scan_summary(source_dirs) + print(json.dumps({"scan_summary": scan}, indent=2)) + image_paths = list(iter_images(source_dirs)) + + records: List[ImageRecord] = [] + pil_images: List[Image.Image] = [] + + for p in image_paths: + try: + with Image.open(p) as img: + width, height = img.size + pil_images.append(img.convert("RGB")) + records.append(ImageRecord(path=str(p), width=width, height=height, mtime=p.stat().st_mtime)) + except Exception: + continue + + if not records: + raise RuntimeError( + "No readable images found in provided source folders. " + "Check scan_summary above for missing paths or zero candidate images. " + "Tip: point to folders that definitely contain supported image types " + f"({', '.join(sorted(IMAGE_EXTENSIONS))})." 
+ ) + + embeddings = model.encode(pil_images, batch_size=batch_size, convert_to_numpy=True, show_progress_bar=True) + embeddings = normalize(embeddings.astype(np.float32)) + + index_dir.mkdir(parents=True, exist_ok=True) + np.save(index_dir / "embeddings.npy", embeddings) + + with (index_dir / "metadata.jsonl").open("w", encoding="utf-8") as f: + for r in records: + f.write(json.dumps(asdict(r), ensure_ascii=False) + "\n") + + with (index_dir / "index_config.json").open("w", encoding="utf-8") as f: + json.dump({"model": model_name, "count": len(records)}, f, indent=2) + + return {"indexed": len(records), "index_dir": str(index_dir), "model": model_name} + + +def load_index(index_dir: Path) -> tuple[np.ndarray, List[ImageRecord], str]: + cfg = json.loads((index_dir / "index_config.json").read_text(encoding="utf-8")) + model_name = cfg["model"] + embeddings = np.load(index_dir / "embeddings.npy") + + records: List[ImageRecord] = [] + with (index_dir / "metadata.jsonl").open("r", encoding="utf-8") as f: + for line in f: + if line.strip(): + row = json.loads(line) + records.append(ImageRecord(**row)) + + if len(records) != embeddings.shape[0]: + raise RuntimeError("Index files are inconsistent (metadata count != embedding count).") + + return embeddings, records, model_name + + +def search(index_dir: Path, query: str, top_k: int = 30) -> list[dict]: + embeddings, records, model_name = load_index(index_dir) + model = load_model(model_name) + + q = model.encode([query], convert_to_numpy=True).astype(np.float32) + q = normalize(q) + + scores = embeddings @ q[0] + order = np.argsort(-scores)[:top_k] + + results = [] + for i in order: + rec = records[int(i)] + results.append( + { + "path": rec.path, + "score": float(scores[i]), + "width": rec.width, + "height": rec.height, + } + ) + return results + + +def parser() -> argparse.ArgumentParser: + p = argparse.ArgumentParser(description="Prompt-based local image search") + sub = p.add_subparsers(dest="command", required=True) 
+ + index = sub.add_parser("index", help="Build (or rebuild) local image search index") + index.add_argument("--source", nargs="+", required=True, help="Folders to scan for images") + index.add_argument("--index-dir", default="./.image_search_index", help="Where to store index files") + index.add_argument("--model", default="clip-ViT-B-32", help="SentenceTransformer model name") + index.add_argument("--batch-size", type=int, default=16) + + find = sub.add_parser("search", help="Search indexed images by text prompt") + find.add_argument("--index-dir", default="./.image_search_index") + find.add_argument("--query", required=True, help="Text query, e.g. 'pile of newspaper'") + find.add_argument("--top-k", type=int, default=30) + + return p + + +def normalize_source_path(raw: str) -> Path: + s = raw.strip().strip('"').strip("'") + # Treat bare drive letters as roots on Windows ("C:" -> "C:\\"). + if len(s) == 2 and s[1] == ":": + s = s + "\\" + return Path(s).expanduser().resolve() + + +def main() -> int: + args = parser().parse_args() + + if args.command == "index": + out = build_index( + source_dirs=[normalize_source_path(s) for s in args.source], + index_dir=Path(args.index_dir).expanduser().resolve(), + model_name=args.model, + batch_size=args.batch_size, + ) + print(json.dumps(out, indent=2)) + return 0 + + if args.command == "search": + out = search( + index_dir=Path(args.index_dir).expanduser().resolve(), + query=args.query, + top_k=args.top_k, + ) + print(json.dumps(out, indent=2)) + return 0 + + return 1 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/requirements-image-dedupe.txt b/requirements-image-dedupe.txt new file mode 100644 index 0000000..ed2f365 --- /dev/null +++ b/requirements-image-dedupe.txt @@ -0,0 +1,3 @@ +Pillow>=10.0.0 +ImageHash>=4.3.1 +send2trash>=1.8.2 diff --git a/requirements-image-search.txt b/requirements-image-search.txt new file mode 100644 index 0000000..31d106a --- /dev/null +++ 
b/requirements-image-search.txt @@ -0,0 +1,7 @@ +Pillow>=10.0.0 +numpy>=1.24.0 +sentence-transformers>=2.7.0 +streamlit>=1.36.0 +tqdm>=4.66.0 +ImageHash>=4.3.1 +send2trash>=1.8.2 diff --git a/run_dedupe_windows.bat b/run_dedupe_windows.bat new file mode 100644 index 0000000..c1a55d5 --- /dev/null +++ b/run_dedupe_windows.bat @@ -0,0 +1,127 @@ +@echo off +setlocal EnableDelayedExpansion + +set "SCRIPT_DIR=%~dp0" +cd /d "%SCRIPT_DIR%" + +set "PYTHON_EXE=" +if exist "python_path.txt" ( + for /f "usebackq delims=" %%I in ("python_path.txt") do ( + set "PYTHON_EXE=%%~I" + goto :path_loaded + ) +) + +:path_loaded +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:\"=%" +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:"=%" + +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3.12 -c "import sys; print(sys.executable)" 2^>nul') do set "PYTHON_EXE=%%I" +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3 -c "import sys; print(sys.executable)" 2^>nul') do set "PYTHON_EXE=%%I" +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('python -c "import sys; print(sys.executable)" 2^>nul') do set "PYTHON_EXE=%%I" +) + +if not defined PYTHON_EXE ( + echo [FAIL] Could not find Python. Run setup_windows.bat first. + goto end +) + +if not exist ".deps" ( + echo [WARN] .deps folder not found. Run setup_windows.bat first. 
+) +set "PYTHONPATH=%SCRIPT_DIR%.deps" + +set "INTERACTIVE=" +if "%~1"=="" goto menu + +set "CMD=%~1" +shift + +if /I "%CMD%"=="scan" ( + "%PYTHON_EXE%" image_dedupe_manager.py scan --source %* --app-bin .\app_recycle_bin + goto end +) + +if /I "%CMD%"=="scan-visual" ( + "%PYTHON_EXE%" image_dedupe_manager.py scan --source %* --app-bin .\app_recycle_bin --visual + goto end +) + +if /I "%CMD%"=="restore" ( + "%PYTHON_EXE%" image_dedupe_manager.py restore --app-bin .\app_recycle_bin + goto end +) + +if /I "%CMD%"=="purge-recycle" ( + "%PYTHON_EXE%" image_dedupe_manager.py purge --app-bin .\app_recycle_bin --mode recycle + goto end +) + +if /I "%CMD%"=="purge-permanent" ( + "%PYTHON_EXE%" image_dedupe_manager.py purge --app-bin .\app_recycle_bin --mode permanent + goto end +) + +goto help + +:menu +set "INTERACTIVE=1" +echo. +echo Dedupe Manager +echo 1^) Scan exact duplicates +echo 2^) Scan exact + visual duplicates +echo 3^) Restore from app recycle bin +echo 4^) Purge app bin to Windows recycle bin +echo 5^) Purge app bin permanently +echo 6^) Exit +set /p CHOICE=Choose 1-6: + +if "%CHOICE%"=="1" ( + set /p SRC=Enter folders/drives in quotes ^(example: "C:\Users\You\Pictures" "D:\Photos"^): + if not defined SRC goto end + "%PYTHON_EXE%" image_dedupe_manager.py scan --source !SRC! --app-bin .\app_recycle_bin + goto end +) +if "%CHOICE%"=="2" ( + set /p SRC=Enter folders/drives in quotes ^(example: "C:\Users\You\Pictures" "D:\Photos"^): + if not defined SRC goto end + "%PYTHON_EXE%" image_dedupe_manager.py scan --source !SRC! 
--app-bin .\app_recycle_bin --visual + goto end +) +if "%CHOICE%"=="3" ( + "%PYTHON_EXE%" image_dedupe_manager.py restore --app-bin .\app_recycle_bin + goto end +) +if "%CHOICE%"=="4" ( + "%PYTHON_EXE%" image_dedupe_manager.py purge --app-bin .\app_recycle_bin --mode recycle + goto end +) +if "%CHOICE%"=="5" ( + "%PYTHON_EXE%" image_dedupe_manager.py purge --app-bin .\app_recycle_bin --mode permanent + goto end +) +if "%CHOICE%"=="6" goto end +echo Invalid choice. +goto menu + +:help +echo. +echo Unknown or missing command. +echo. +echo Commands: +echo scan [folders...] - exact duplicates only +echo scan-visual [folders...] - exact + visual duplicates +echo restore - restore from app recycle bin +echo purge-recycle - send app recycle bin files to Windows recycle bin +echo purge-permanent - permanently delete app recycle bin files + +goto end + +:end +if defined INTERACTIVE pause +endlocal diff --git a/run_image_search_windows.bat b/run_image_search_windows.bat new file mode 100644 index 0000000..d0429fd --- /dev/null +++ b/run_image_search_windows.bat @@ -0,0 +1,114 @@ +@echo off +setlocal EnableDelayedExpansion + +set "SCRIPT_DIR=%~dp0" +cd /d "%SCRIPT_DIR%" + +set "PYTHON_EXE=" +if exist "python_path.txt" ( + for /f "usebackq delims=" %%I in ("python_path.txt") do ( + set "PYTHON_EXE=%%~I" + goto :path_loaded + ) +) + +:path_loaded +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:\"=%" +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:"=%" + +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3.12 -c "import sys; print(sys.executable)" 2^>nul') do set "PYTHON_EXE=%%I" +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3 -c "import sys; print(sys.executable)" 2^>nul') do set "PYTHON_EXE=%%I" +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('python -c "import sys; print(sys.executable)" 2^>nul') do set "PYTHON_EXE=%%I" +) + +if not defined PYTHON_EXE ( + echo [FAIL] Could not find Python. Run setup_windows.bat first. 
+ goto end +) + +if not exist ".deps" ( + echo [WARN] .deps folder not found. 
Run setup_windows.bat first. +) +set "PYTHONPATH=%SCRIPT_DIR%.deps" + +set "INTERACTIVE=" +if "%~1"=="" goto menu + +set "CMD=%~1" +shift +rem SHIFT does not alter the all-arguments variable, so gather the remaining arguments one by one. +set "ARGS=" +set "QUERY=" +:collect_args +if "%~1"=="" goto args_done +set ARGS=%ARGS% "%~1" +set "QUERY=%QUERY% %~1" +shift +goto collect_args +:args_done +if defined QUERY set "QUERY=%QUERY:~1%" + +if /I "%CMD%"=="index" ( + "%PYTHON_EXE%" image_prompt_search.py index --source %ARGS% --index-dir .\.image_search_index + goto end +) + +if /I "%CMD%"=="search" ( + "%PYTHON_EXE%" image_prompt_search.py search --index-dir .\.image_search_index --query "%QUERY%" + goto end +) + +if /I "%CMD%"=="ui" ( + "%PYTHON_EXE%" -m streamlit run streamlit_image_search_app.py --server.address 127.0.0.1 --server.port 8501 --browser.gatherUsageStats false --server.enableCORS true --server.enableXsrfProtection true + goto end +) + +goto help + +:menu +set "INTERACTIVE=1" +echo. +echo Prompt Image Search +echo 1^) Build/Rebuild index +echo 2^) Search by prompt (CLI) +echo 3^) Launch UI +echo 4^) Exit +set /p CHOICE=Choose 1-4: + +if "%CHOICE%"=="1" ( + set /p SRC=Enter folders/drives in quotes ^(example: "C:\Users\You\Pictures" "D:\Photos"^): + if not defined SRC goto end + "%PYTHON_EXE%" image_prompt_search.py index --source !SRC! --index-dir .\.image_search_index + goto end +) +if "%CHOICE%"=="2" ( + set /p QUERY=Enter search prompt ^(example: blue and white shoes^): + if not defined QUERY goto end + "%PYTHON_EXE%" image_prompt_search.py search --index-dir .\.image_search_index --query "!QUERY!" + goto end +) +if "%CHOICE%"=="3" ( + "%PYTHON_EXE%" -m streamlit run streamlit_image_search_app.py --server.address 127.0.0.1 --server.port 8501 --browser.gatherUsageStats false --server.enableCORS true --server.enableXsrfProtection true + goto end +) +if "%CHOICE%"=="4" goto end +echo Invalid choice. +goto menu + +:help +echo. 
+echo Usage: +echo run_image_search_windows.bat index [folders...] +echo run_image_search_windows.bat search [query words...] 
+echo run_image_search_windows.bat ui + +goto end + +:end +if defined INTERACTIVE pause +endlocal diff --git a/setup_windows.bat b/setup_windows.bat new file mode 100644 index 0000000..00abe21 --- /dev/null +++ b/setup_windows.bat @@ -0,0 +1,111 @@ +@echo off +setlocal + +set "SCRIPT_DIR=%~dp0" +cd /d "%SCRIPT_DIR%" + +echo ========================================== +echo Prompt Image Search + Dedupe - Windows Setup (Pivot: no venv) +echo Script folder: %SCRIPT_DIR% +echo ========================================== +echo. + +set PYTHON_EXE= + +if exist "python_path.txt" ( + for /f "usebackq delims=" %%I in ("python_path.txt") do ( + set "PYTHON_EXE=%%~I" + goto :path_loaded + ) +) + +:path_loaded +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:\"=%" +if defined PYTHON_EXE set "PYTHON_EXE=%PYTHON_EXE:"=%" + +if defined PYTHON_EXE echo python_path.txt found. Requested interpreter: %PYTHON_EXE% + +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3.12 -c "import sys; print(sys.executable)" 2^>nul') do set PYTHON_EXE=%%I +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('py -3 -c "import sys; print(sys.executable)" 2^>nul') do set PYTHON_EXE=%%I +) +if not defined PYTHON_EXE ( + for /f "delims=" %%I in ('python -c "import sys; print(sys.executable)" 2^>nul') do set PYTHON_EXE=%%I +) + +if not defined PYTHON_EXE ( + echo [FAIL] Could not find a usable Python interpreter. + echo Install Python 3.11+ from https://www.python.org/downloads/windows/ + pause + exit /b 1 +) + +if not exist "%PYTHON_EXE%" ( + echo [FAIL] Selected python path does not exist: "%PYTHON_EXE%" + echo Update python_path.txt with a valid full path to python.exe + pause + exit /b 1 +) + +echo Using Python: %PYTHON_EXE% + +echo %PYTHON_EXE% | find /I "python_embeded" >nul +if not errorlevel 1 ( + echo [FAIL] Detected ComfyUI embedded Python path. + echo This setup must use a full Python install, not python_embeded. 
+ echo Create python_path.txt in this same folder with one line, e.g.: + echo C:\PortableTools\python-3.12.10-amd64\python.exe + pause + exit /b 1 +) + +echo Preparing local dependency folder (.deps)... +if not exist ".deps" mkdir ".deps" + +"%PYTHON_EXE%" -m pip --version >nul 2>nul +if errorlevel 1 ( + echo [FAIL] pip is not available on selected Python. + echo Try: "%PYTHON_EXE%" -m ensurepip --upgrade + pause + exit /b 1 +) + +echo Installing/updating dependencies into .deps (no virtualenv)... +"%PYTHON_EXE%" -m pip install --upgrade pip +"%PYTHON_EXE%" -m pip install --upgrade --target .deps -r requirements-image-search.txt +if errorlevel 1 ( + echo [FAIL] Failed installing requirements-image-search.txt + pause + exit /b 1 +) +"%PYTHON_EXE%" -m pip install --upgrade --target .deps -r requirements-image-dedupe.txt +if errorlevel 1 ( + echo [FAIL] Failed installing requirements-image-dedupe.txt + pause + exit /b 1 +) + +echo. +echo Verifying imports using local .deps... +set "PYTHONPATH=%SCRIPT_DIR%.deps" +"%PYTHON_EXE%" -c "import PIL, numpy, sentence_transformers, streamlit, imagehash; print('[OK] core modules import successfully')" +if errorlevel 1 ( + echo [FAIL] Import verification failed. + pause + exit /b 1 +) + +echo. +echo Setup complete. No .venv required. +echo. +echo Primary goal (prompt image search): +echo run_image_search_windows.bat index "D:\Photos" "E:\Archive" +echo run_image_search_windows.bat ui +echo. +echo Optional duplicate cleanup: +echo run_dedupe_windows.bat scan "D:\Photos" +echo. 
+pause +endlocal diff --git a/streamlit_image_search_app.py b/streamlit_image_search_app.py new file mode 100644 index 0000000..bf1a367 --- /dev/null +++ b/streamlit_image_search_app.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +from pathlib import Path + +import streamlit as st +from PIL import Image + +from image_prompt_search import search + +st.set_page_config(page_title="Prompt Image Search", layout="wide") +st.title("Prompt Image Search") +st.caption("Search your local indexed images with natural language prompts.") + +index_dir = st.text_input("Index directory", value="./.image_search_index") +query = st.text_input("Search prompt", value="blue and white shoes") +top_k = st.slider("Results", min_value=5, max_value=100, value=30, step=5) + +if st.button("Search"): + idx = Path(index_dir).expanduser().resolve() + if not idx.exists(): + st.error("Index directory not found. Build an index first with image_prompt_search.py.") + elif not query.strip(): + st.error("Please enter a prompt.") + else: + with st.spinner("Searching..."): + try: + results = search(idx, query=query.strip(), top_k=top_k) + except Exception as exc: + st.exception(exc) + st.stop() + + st.success(f"Found {len(results)} results") + cols = st.columns(3) + for i, item in enumerate(results): + col = cols[i % 3] + with col: + path = Path(item["path"]) + st.write(f"**Score:** {item['score']:.3f}") + st.caption(str(path)) + if path.exists(): + try: + img = Image.open(path) + st.image(img, use_container_width=True) + except Exception: + st.warning("Preview unavailable") + else: + st.warning("File not found")