Releases: nothingmn/echonotes

v1.2.0

07 Mar 22:40

EchoNotes v1.2.0

This release turns EchoNotes into a more capable long-running transcription worker and upgrades the Obsidian output into a structured note format.

Highlights

  • Migrated the audio pipeline from openai-whisper to WhisperX and changed the runtime so transcription models load once and stay hot inside the worker process.
  • Refactored the app into a queue-backed worker architecture with a worker pool, so the watcher stays responsive while heavy transcription and summarization run in the background.
  • Hardened ingestion:
    • waits for files to become stable before processing
    • ignores temporary download files
    • gracefully skips encrypted PDFs and missing-OCR-dependency cases
    • periodically rescans /app/incoming so files arriving via Syncthing or Windows-backed mounts are still picked up even when filesystem events are missed
  • Expanded ingest support by normalizing many common audio formats through FFmpeg before transcription.
  • Added provider-based LLM support with YAML configuration for:
    • Open WebUI
    • Ollama
    • OpenAI
    • Anthropic / Claude
    • OpenRouter
  • Added chunked transcript formatting and summarization for long meetings so large inputs do not overflow model context.
  • Added Obsidian vault export:
    • audio/video notes now produce vault-ready markdown, transcript, summary, and final MP3 artifacts
    • timestamped transcript links are generated in Obsidian-friendly format
  • Added optional WhisperX diarization so transcripts can include Speaker 1, Speaker 2, and so on.
  • Added runtime and GPU hardening:
    • WhisperX/PyTorch compatibility fallback for newer torch.load behavior
    • markdown fence cleanup for LLM outputs
    • adaptive CUDA OOM retry with smaller batch sizes and optional CPU fallback per file
  • Refactored Docker delivery:
    • separate CPU and CUDA images
    • mounted runtime layout for incoming, vault, config, and optional model-cache
    • warm-only model-cache workflow in build.sh
  • Upgraded Obsidian note generation:
    • structured JSON extraction prompt
    • deterministic Python rendering of final notes
    • richer front matter and linked entity sections
    • shipped default obsidian-template.md and obsidian-extract.md
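
A provider entry in the YAML configuration might look roughly like the sketch below. Only the provider list (Open WebUI, Ollama, OpenAI, Anthropic/Claude, OpenRouter) is confirmed by this release; the key names shown are illustrative assumptions, not the shipped schema.

```yaml
# Hypothetical config fragment -- key names are illustrative,
# not taken from the shipped default configuration.
llm:
  provider: ollama                     # one of: open-webui, ollama, openai, anthropic, openrouter
  model: llama3                        # model identifier passed to the provider
  base_url: http://localhost:11434    # endpoint, relevant for self-hosted providers
  api_key: ""                          # required for hosted providers such as OpenAI or OpenRouter
```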

Docker Images

This release publishes:

  • robchartier/echonotes:1.2.0
  • robchartier/echonotes:1.2.0-cuda12.8
  • robchartier/echonotes:latest
  • robchartier/echonotes:latest-cuda12.8
  • robchartier/echonotes:gpu

Operational Notes

  • For GPU systems, keep worker_count: 1 unless you have explicitly verified VRAM headroom for more.
  • If your files arrive through Syncthing or Windows-backed mounts, EchoNotes now has a periodic fallback scan so missed filesystem events do not strand files.
  • If you mount /app/model-cache, WhisperX, alignment, and diarization assets can be reused across container restarts.
  • Timestamp links inside Obsidian work best with a media plugin such as Media Extended.
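
Putting the operational notes together, a deployment might be wired up as in this Compose sketch. The image tags and the /app/incoming and /app/model-cache mount points come from this release; the host paths, and the /app/vault and /app/config container paths, are assumptions based on the "incoming, vault, config, and optional model-cache" layout described above.

```yaml
# Hypothetical docker-compose.yml sketch -- host paths are illustrative.
services:
  echonotes:
    image: robchartier/echonotes:1.2.0-cuda12.8
    volumes:
      - ./incoming:/app/incoming          # drop audio/video/document files here
      - ./vault:/app/vault                # Obsidian vault output (assumed path)
      - ./config:/app/config              # YAML configuration (assumed path)
      - ./model-cache:/app/model-cache    # optional: reuse WhisperX/diarization assets across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```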

v1.0.2

07 Mar 17:42

EchoNotes v1.0.2

This release focuses on ingestion reliability and long-running GPU stability.

Highlights

  • Added a periodic fallback rescan for /app/incoming so files are still queued when filesystem events are missed on Syncthing, Windows-backed bind mounts, or other unreliable mounted paths.
  • Added a pending-job registry so watcher events and periodic rescans do not enqueue the same file multiple times.
  • Added adaptive WhisperX GPU OOM handling:
    • retries with progressively smaller batch sizes on CUDA out-of-memory
    • clears CUDA memory between retries
    • optionally falls back to CPU for that file instead of wedging the queue
  • Added new runtime knobs for transcription memory behavior:
    • whisper_batch_size
    • whisper_min_batch_size
    • gpu_oom_fallback
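
Together, these knobs might be set like this. The key names are from this release; the values shown are examples, not shipped defaults.

```yaml
# Sketch of the new transcription memory knobs.
whisper_batch_size: 16       # starting WhisperX batch size
whisper_min_batch_size: 2    # OOM retries shrink the batch down to this floor
gpu_oom_fallback: true       # if OOM persists, fall back to CPU for that file
```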

Docker Images

This release publishes:

  • robchartier/echonotes:1.0.2
  • robchartier/echonotes:1.0.2-cuda12.8
  • robchartier/echonotes:latest
  • robchartier/echonotes:latest-cuda12.8
  • robchartier/echonotes:gpu

Operational Notes

  • For GPU systems, keep worker_count: 1 unless you have explicitly verified VRAM headroom for multiple concurrent WhisperX workers.
  • If files are arriving through Syncthing or Windows-backed mounts, the new rescan loop should pick them up even when Docker does not surface create/move events into the container.
  • If very large audio still pressures VRAM, EchoNotes now degrades per file instead of failing the whole queue. You can tune this behavior with whisper_batch_size, whisper_min_batch_size, and gpu_oom_fallback.

v1.0.1

06 Mar 23:28

EchoNotes v1.0.1

This release packages the work from the last development stretch into the first stable Dockerized worker release of EchoNotes.

Highlights

  • Switched audio transcription from Whisper to WhisperX.
  • Added persistent model loading so ASR models are loaded once and reused.
  • Refactored the app into a queue-backed worker pool instead of processing inline in the watcher.
  • Added transcript formatting, chunked summarization, and provider-based LLM support for Open WebUI, Ollama, OpenAI, Anthropic/Claude, and OpenRouter.
  • Added Obsidian vault export with linked MP3 timestamp references and speaker-aware transcript output.
  • Added WhisperX diarization support for Speaker 1, Speaker 2, and so on.
  • Expanded audio ingest to common FFmpeg-readable formats and normalized them to MP3 before transcription.
  • Hardened file ingestion against partial uploads, temporary files, encrypted PDFs, and OCR dependency failures.
  • Added CPU and CUDA Docker image variants with mounted config, incoming, vault, and model-cache directories.
  • Added warm-cache tooling for WhisperX models and documented the Docker build/runtime flow.

Docker Images

This release publishes:

  • robchartier/echonotes:1.0.1
  • robchartier/echonotes:1.0.1-cuda12.8
  • robchartier/echonotes:latest
  • robchartier/echonotes:latest-cuda12.8
  • robchartier/echonotes:gpu

Operational Notes

  • GPU deployments should generally use worker_count: 1 unless you have verified VRAM headroom for multiple concurrent WhisperX workers.
  • WhisperX diarization depends on a Hugging Face token and will fall back cleanly if diarization is not configured.
  • Files copied into Windows-backed bind mounts may not always emit reliable filesystem events into Docker; EchoNotes now queues files already present at startup, but Linux-side writes remain the most reliable path.
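
Diarization might be enabled with a fragment like the following. Only the Hugging Face token requirement is stated above; the key names here are illustrative assumptions.

```yaml
# Hypothetical config fragment -- key names are illustrative.
diarization: true
hf_token: "hf_..."    # Hugging Face access token required by WhisperX diarization
```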

QA Notes

Validated against the current Testing/ corpus on the CUDA image:

  • Standard PDF, DOCX, TXT, FLAC, and WAV samples completed successfully.
  • Encrypted PDFs were skipped gracefully without crashing the worker.
  • Short audio files produced transcript, summary, Obsidian note, and vault copies correctly.

Known issues from QA:

  • Very large audio can still hit GPU memory limits depending on model choice and VRAM.
  • Very large text inputs can take long enough to block a single-worker queue under slow local LLMs.