Releases · nothingmn/echonotes
v1.2.0
EchoNotes v1.2.0
This release turns EchoNotes into a more capable long-running transcription worker and upgrades the Obsidian output into a structured note format.
Highlights
- Migrated the audio pipeline from `openai-whisper` to WhisperX and changed the runtime so transcription models load once and stay hot inside the worker process.
- Refactored the app into a queue-backed worker architecture with a worker pool, so the watcher stays responsive while heavy transcription and summarization run in the background.
- Hardened ingestion:
- waits for files to become stable before processing
- ignores temporary download files
- gracefully skips encrypted PDFs and missing-OCR-dependency cases
- periodically rescans `/app/incoming` so Syncthing and Windows-backed mounts still get picked up even when filesystem events are missed
- Expanded ingest support by normalizing many common audio formats through FFmpeg before transcription.
- Added provider-based LLM support with YAML configuration (see the sketch after this list) for:
- Open WebUI
- Ollama
- OpenAI
- Anthropic / Claude
- OpenRouter
- Added chunked transcript formatting and summarization for long meetings so large inputs do not overflow model context.
- Added Obsidian vault export:
- audio/video notes now produce vault-ready markdown, transcript, summary, and final MP3 artifacts
- timestamped transcript links are generated in Obsidian-friendly format
- Added optional WhisperX diarization so transcripts can include `Speaker 1`, `Speaker 2`, and so on.
- Added GPU runtime hardening:
- WhisperX/PyTorch compatibility fallback for newer `torch.load` behavior
- markdown fence cleanup for LLM outputs
- adaptive CUDA OOM retry with smaller batch sizes and optional CPU fallback per file
- Refactored Docker delivery:
- separate CPU and CUDA images
- mounted runtime layout for `incoming`, `vault`, `config`, and optional `model-cache`
- warm-only model-cache workflow in `build.sh`
- Upgraded Obsidian note generation:
- structured JSON extraction prompt
- deterministic Python rendering of final notes
- richer front matter and linked entity sections
- shipped default `obsidian-template.md` and `obsidian-extract.md`
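
As a rough illustration of the provider-based YAML configuration mentioned above, a minimal sketch might look like the following. The key names (`llm_provider`, `model`, `base_url`, `api_key`) and values are assumptions for illustration, not the shipped schema; check the bundled config for the actual field names.

```yaml
# Hypothetical provider configuration sketch -- key names are illustrative,
# not the project's documented schema.
llm_provider: ollama                 # open-webui | ollama | openai | anthropic | openrouter
model: llama3.1                      # example model name for the chosen provider
base_url: "http://localhost:11434"   # local endpoint, e.g. for Ollama or Open WebUI
api_key: ""                          # required for hosted providers such as OpenAI or OpenRouter
```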
Docker Images
This release publishes:
- `robchartier/echonotes:1.2.0`
- `robchartier/echonotes:1.2.0-cuda12.8`
- `robchartier/echonotes:latest`
- `robchartier/echonotes:latest-cuda12.8`
- `robchartier/echonotes:gpu`
Operational Notes
- For GPU systems, keep `worker_count: 1` unless you have explicitly verified VRAM headroom for more.
- If your files arrive through Syncthing or Windows-backed mounts, EchoNotes now has a periodic fallback scan so missed filesystem events do not strand files.
- If you mount `/app/model-cache`, WhisperX, alignment, and diarization assets can be reused across container restarts (see the compose sketch below).
- Timestamp links inside Obsidian work best with a media plugin such as Media Extended.
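
A minimal docker-compose sketch for the CUDA image, assuming the mount points described in this release. `/app/incoming` and `/app/model-cache` appear in these notes; the `/app/vault` and `/app/config` container paths and the GPU reservation block are illustrative assumptions, not the project's official compose file.

```yaml
# Illustrative compose sketch -- adjust paths to your host layout.
services:
  echonotes:
    image: robchartier/echonotes:1.2.0-cuda12.8
    volumes:
      - ./incoming:/app/incoming          # watched ingest folder
      - ./vault:/app/vault                # assumed vault mount point
      - ./config:/app/config              # assumed config mount point
      - ./model-cache:/app/model-cache    # optional: reuse WhisperX/alignment/diarization assets
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```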
v1.0.2
EchoNotes v1.0.2
This release focuses on ingestion reliability and long-running GPU stability.
Highlights
- Added a periodic fallback rescan for `/app/incoming` so files are still queued when filesystem events are missed on Syncthing, Windows-backed bind mounts, or other unreliable mounted paths.
- Added a pending-job registry so watcher events and periodic rescans do not enqueue the same file multiple times.
- Added adaptive WhisperX GPU OOM handling:
- retries with progressively smaller batch sizes on CUDA out-of-memory
- clears CUDA memory between retries
- optionally falls back to CPU for that file instead of wedging the queue
- Added new runtime knobs for transcription memory behavior (see the config sketch after this list):
- `whisper_batch_size`
- `whisper_min_batch_size`
- `gpu_oom_fallback`
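
A sketch of how these knobs might sit in the YAML config; the knob names come from this release, but the values and types shown are illustrative guesses, not documented defaults.

```yaml
# Illustrative values -- tune to your GPU's VRAM.
worker_count: 1            # keep at 1 on GPU unless VRAM headroom is verified
whisper_batch_size: 16     # starting WhisperX batch size
whisper_min_batch_size: 2  # smallest batch size tried on repeated CUDA OOM
gpu_oom_fallback: true     # fall back to CPU for the affected file instead of wedging the queue
```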
Docker Images
This release publishes:
- `robchartier/echonotes:1.0.2`
- `robchartier/echonotes:1.0.2-cuda12.8`
- `robchartier/echonotes:latest`
- `robchartier/echonotes:latest-cuda12.8`
- `robchartier/echonotes:gpu`
Operational Notes
- For GPU systems, keep `worker_count: 1` unless you have explicitly verified VRAM headroom for multiple concurrent WhisperX workers.
- If files are arriving through Syncthing or Windows-backed mounts, the new rescan loop should pick them up even when Docker does not surface create/move events into the container.
- If very large audio still pressures VRAM, EchoNotes now degrades per file instead of failing the whole queue. You can tune this behavior with `whisper_batch_size`, `whisper_min_batch_size`, and `gpu_oom_fallback`.
v1.0.1
EchoNotes v1.0.1
This release packages the work from the last development stretch into the first stable Dockerized worker release of EchoNotes.
Highlights
- Switched audio transcription from Whisper to WhisperX.
- Added persistent model loading so ASR models are loaded once and reused.
- Refactored the app into a queue-backed worker pool instead of processing inline in the watcher.
- Added transcript formatting, chunked summarization, and provider-based LLM support for Open WebUI, Ollama, OpenAI, Anthropic/Claude, and OpenRouter.
- Added Obsidian vault export with linked MP3 timestamp references and speaker-aware transcript output.
- Added WhisperX diarization support for `Speaker 1`, `Speaker 2`, and so on.
- Expanded audio ingest to common FFmpeg-readable formats and normalized them to MP3 before transcription.
- Hardened file ingestion against partial uploads, temporary files, encrypted PDFs, and OCR dependency failures.
- Added CPU and CUDA Docker image variants with mounted config, incoming, vault, and model-cache directories.
- Added warm-cache tooling for WhisperX models and documented the Docker build/runtime flow.
Docker Images
This release publishes:
- `robchartier/echonotes:1.0.1`
- `robchartier/echonotes:1.0.1-cuda12.8`
- `robchartier/echonotes:latest`
- `robchartier/echonotes:latest-cuda12.8`
- `robchartier/echonotes:gpu`
Operational Notes
- GPU deployments should generally use `worker_count: 1` unless you have verified VRAM headroom for multiple concurrent WhisperX workers.
- WhisperX diarization depends on a Hugging Face token and will fall back cleanly if diarization is not configured (see the config sketch below).
- Files copied into Windows-backed bind mounts may not always emit reliable filesystem events into Docker; EchoNotes now queues files already present at startup, but Linux-side writes remain the most reliable path.
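
Since diarization depends on a Hugging Face token, the relevant configuration might look roughly like this; only the token requirement is stated in these notes, and the key names (`diarization_enabled`, `hf_token`) are assumptions for illustration.

```yaml
# Hypothetical diarization settings -- key names are illustrative.
diarization_enabled: true
hf_token: "hf_..."   # Hugging Face access token used by the diarization pipeline
```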
QA Notes
Validated against the current Testing/ corpus on the CUDA image:
- Standard PDF, DOCX, TXT, FLAC, and WAV samples completed successfully.
- Encrypted PDFs were skipped gracefully without crashing the worker.
- Short audio files produced transcript, summary, Obsidian note, and vault copies correctly.
Known issues from QA:
- Very large audio can still hit GPU memory limits depending on model choice and VRAM.
- Very large text inputs can take long enough to block a single-worker queue under slow local LLMs.