egs/vctk/TTS: Add --local-data-dir option to skip VCTK download#2078

Open
drakempham wants to merge 4 commits into k2-fsa:master from drakempham:master

Conversation

@drakempham

@drakempham drakempham commented Apr 8, 2026

I think we should allow people to point prepare.sh at an existing VCTK directory (aiming for ease of use).

Currently, prepare.sh attempts to download the VCTK corpus using lhotse in Stage 0. The Edinburgh DataShare server can be slow or frequently drops connections, and many users (like me, on Kaggle) already have the dataset mounted locally.

Changes:

Added --local-data-dir option to prepare.sh.
If provided, the script symlinks the existing directory to download/VCTK and safely skips the download.
Updated README.md to document this new option.
Example usage: bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus
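The Stage 0 change can be sketched as a small branch (a hedged outline, not the exact merged code; the `log` helper mirrors prepare.sh, and the `stage0` wrapper function is an assumption for illustration — quoting is added here per the review feedback below):

```shell
#!/usr/bin/env bash
# Sketch of the new Stage 0 logic in prepare.sh (illustrative only).
# local_data_dir would be set via --local-data-dir through parse_options.sh.

log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*"; }

stage0() {
  local local_data_dir=$1 dl_dir=$2
  if [ -n "$local_data_dir" ]; then
    # Reuse an already-mounted copy: symlink it to download/VCTK
    # so later stages find the data where they expect it.
    log "Using local data directory: $local_data_dir"
    mkdir -p "$dl_dir"
    ln -sfv "$local_data_dir" "$dl_dir/VCTK"
  else
    log "Downloading VCTK via lhotse"
    # lhotse download vctk --use-edinburgh-vctk-url true "$dl_dir"
  fi
}

# Example: stage0 /kaggle/input/vctk-corpus download
```

When `--local-data-dir` is given, the download step is skipped entirely and only a symlink is created.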

Summary by CodeRabbit

  • New Features

    • Added local data directory option for custom dataset preparation in training workflow
  • Documentation

    • Updated README with VITS model architecture overview and data preparation instructions
    • Added comprehensive guides for training models from scratch and running pretrained inference
    • New documentation explaining VITS model design and technical components
  • Chores

    • Updated .gitignore to exclude Python virtual environments

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

📝 Walkthrough


Added Python virtual environment exclusion to version control, updated VCTK TTS README with VITS architecture details and data preparation guidance, extended prepare.sh with optional local data directory support, and introduced three new documentation files for inference, knowledge overview, and training workflows.

Changes

Cohort / File(s) Summary
Version control configuration
.gitignore
Added .venv/ entry to exclude Python virtual environment directories from version control.
Documentation and training guidance
egs/vctk/TTS/README.md
Replaced VCTK corpus description with VITS model features section (VAE, normalizing flow, GAN), added Data Preparation section documenting prepare.sh usage with --local-data-dir option, and reformatted training command with line continuation.
Data preparation script enhancement
egs/vctk/TTS/prepare.sh
Added local_data_dir option to Stage 0 to support creating a symlink from $local_data_dir to $dl_dir/VCTK as an alternative to downloading data via lhotse.
New workflow documentation
egs/vctk/TTS/infer_pretrained.md, egs/vctk/TTS/knowledge.md, egs/vctk/TTS/train_from_scratch.md
Added three new markdown documents: infer_pretrained.md provides step-by-step cells for installing dependencies and running inference with a pretrained VITS checkpoint; knowledge.md explains VITS architecture and training components (VAE, normalizing flows, duration prediction, alignment search, adversarial training); train_from_scratch.md documents a complete training workflow from data preparation through ONNX export.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A hop and a skip through text-to-speech dreams,
VITS flows like water through digital streams,
With symlinks and scripts, we prepare the way,
Training and inference—let the models play! 🎵

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and accurately summarizes the main change: adding a --local-data-dir option to prepare.sh to skip VCTK download, which is the core feature across prepare.sh and README.md updates.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds .venv/ to the .gitignore and introduces a --local-data-dir option to the VCTK TTS prepare.sh script, enabling the use of local datasets via symlinking. Feedback focuses on improving the robustness of the bash script by properly quoting path variables, ensuring absolute paths are used for symlinks to avoid broken references, and adding validation to verify the existence of the local data directory.

Comment thread egs/vctk/TTS/prepare.sh
Comment on lines +58 to +64
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
mkdir -p $dl_dir
ln -sfv $local_data_dir $dl_dir/VCTK
else
lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
fi


medium

The implementation of the --local-data-dir option should be more robust to handle common edge cases:

  1. Quoting: Path variables (like $local_data_dir and $dl_dir) should be double-quoted to prevent the script from breaking if the user provides a path containing spaces.
  2. Absolute Paths: If a relative path is passed to --local-data-dir, the symlink created at $dl_dir/VCTK will likely be broken because symlinks are resolved relative to their parent directory. Converting the path to an absolute one using readlink -f (or similar) ensures the symlink remains valid.
  3. Validation: It is better to verify that the provided directory actually exists before attempting to symlink it, providing a clear error message if it is missing.
References
  1. Always quote variables that contain file names or paths to handle spaces and special characters correctly.
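Taken together, the three points could be sketched like this (a hypothetical helper named `use_local_data_dir`; `log` mirrors prepare.sh, and the exact messages are illustrative, not the merged implementation):

```shell
#!/usr/bin/env bash
# Sketch combining the review's three suggestions:
# validation, absolute-path resolution, and quoting.

log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*"; }

use_local_data_dir() {
  local src=$1 dl_dir=$2
  # 3. Validation: fail fast with a clear message.
  if [ ! -d "$src" ]; then
    log "Error: local data directory does not exist: $src"
    return 1
  fi
  # 2. Absolute path: a relative path would leave a dangling symlink,
  #    since symlinks resolve relative to their parent directory.
  src=$(readlink -f "$src")
  # 1. Quoting: handles paths containing spaces or glob characters.
  mkdir -p "$dl_dir"
  ln -sfv "$src" "$dl_dir/VCTK"
}
```

With this shape, `use_local_data_dir "/kaggle/input/vctk-corpus" "$dl_dir"` either creates a valid absolute symlink or exits early with a readable error.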


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e3c9f0099

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread egs/vctk/TTS/prepare.sh
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
mkdir -p $dl_dir
ln -sfv $local_data_dir $dl_dir/VCTK


P2 Badge Quote local-data-dir when creating symlink

The new --local-data-dir path is expanded unquoted in ln -sfv $local_data_dir $dl_dir/VCTK, so any directory containing whitespace (or shell glob characters) is split into multiple arguments and Stage 0 fails before data preparation starts. This makes the new option unreliable for valid filesystem paths such as mounted datasets with spaces in their names; quoting both operands avoids this regression.

Useful? React with 👍 / 👎.
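The failure mode the comment describes is easy to reproduce in isolation (a contrived demo under temporary paths, not code from the PR):

```shell
#!/usr/bin/env bash
# Demonstrates the word-splitting failure with an unquoted path.
tmp=$(mktemp -d)
local_data_dir="$tmp/vctk corpus"   # a path with a space, as on some mounts
mkdir -p "$local_data_dir" "$tmp/download"

# Unquoted: the path splits into two arguments, so ln sees multiple
# sources and a non-directory target, and the command fails.
if ! ln -sf $local_data_dir "$tmp/download/VCTK" 2>/dev/null; then
  echo "unquoted: ln failed"
fi

# Quoted: a single argument; the symlink is created as intended.
ln -sf "$local_data_dir" "$tmp/download/VCTK"
echo "quoted: link created"
```

The unquoted form prints "unquoted: ln failed"; the quoted form succeeds, which is exactly the regression the reviewer is pointing at.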

Comment thread egs/vctk/TTS/prepare.sh
if [ ! -d $dl_dir/VCTK ]; then
lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
Collaborator


Can you check that "$local_data_dir" exists?


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
egs/vctk/TTS/prepare.sh (1)

57-57: Quote the variable to handle paths with spaces.

Per shellcheck SC2086, $dl_dir should be quoted to prevent issues with paths containing spaces or special characters.

♻️ Proposed fix
-  if [ ! -d $dl_dir/VCTK ]; then
+  if [ ! -d "$dl_dir/VCTK" ]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/prepare.sh` at line 57, The test for the VCTK directory uses an
unquoted variable; update the conditional in prepare.sh to quote the dl_dir path
so paths with spaces are handled correctly (change the check that references
$dl_dir/VCTK to use "$dl_dir/VCTK"). Ensure other occurrences in the same script
that reference dl_dir are also quoted consistently to avoid SC2086 issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@egs/vctk/TTS/prepare.sh`:
- Around line 58-64: In prepare.sh, validate that the provided local_data_dir
exists and is a directory before creating the symlink: check [ -n
"$local_data_dir" ] && [ -d "$local_data_dir" ] (or use test -d) and if the
check fails call log with a clear error and exit non-zero; only run mkdir -p
"$dl_dir" and ln -sfv "$local_data_dir" "$dl_dir/VCTK" when the directory check
passes so invalid paths don't produce confusing errors later.

---

Nitpick comments:
In `@egs/vctk/TTS/prepare.sh`:
- Line 57: The test for the VCTK directory uses an unquoted variable; update the
conditional in prepare.sh to quote the dl_dir path so paths with spaces are
handled correctly (change the check that references $dl_dir/VCTK to use
"$dl_dir/VCTK"). Ensure other occurrences in the same script that reference
dl_dir are also quoted consistently to avoid SC2086 issues.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f6b4434-54ed-4fba-ab8c-643760c532f8

📥 Commits

Reviewing files that changed from the base of the PR and between e9f4eb0 and 4e3c9f0.

📒 Files selected for processing (3)
  • .gitignore
  • egs/vctk/TTS/README.md
  • egs/vctk/TTS/prepare.sh

Comment thread egs/vctk/TTS/prepare.sh
Comment on lines +58 to +64
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
mkdir -p $dl_dir
ln -sfv $local_data_dir $dl_dir/VCTK
else
lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
fi


⚠️ Potential issue | 🟡 Minor

Add validation for the local data directory path.

The script doesn't verify that $local_data_dir exists before creating the symlink. If an invalid path is provided, subsequent stages will fail with confusing errors.

🛡️ Proposed fix to validate the directory exists
     if [ -n "$local_data_dir" ]; then
+      if [ ! -d "$local_data_dir" ]; then
+        log "Error: local data directory does not exist: $local_data_dir"
+        exit 1
+      fi
       log "Using local data directory: $local_data_dir"
-      mkdir -p $dl_dir
-      ln -sfv $local_data_dir $dl_dir/VCTK
+      mkdir -p "$dl_dir"
+      ln -sfv "$local_data_dir" "$dl_dir/VCTK"
     else
-      lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
+      lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} "$dl_dir"
     fi
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 60-60: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 61-61: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 61-61: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 63-63: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/prepare.sh` around lines 58 - 64, In prepare.sh, validate that
the provided local_data_dir exists and is a directory before creating the
symlink: check [ -n "$local_data_dir" ] && [ -d "$local_data_dir" ] (or use test
-d) and if the check fails call log with a clear error and exit non-zero; only
run mkdir -p "$dl_dir" and ln -sfv "$local_data_dir" "$dl_dir/VCTK" when the
directory check passes so invalid paths don't produce confusing errors later.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
egs/vctk/TTS/runner.md (3)

110-112: Add language specifier to fenced code block.

The code block should specify a language for proper syntax highlighting.

📝 Proposed fix
-```
+```text
 Text ──→ [Phoneme Encoder] ──→ [Latent Flow] ──→ [Waveform Decoder] ──→ Audio

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 110 - 112, The fenced code block showing
the TTS pipeline uses a plain triple-backtick fence; update that fence to
include a language specifier (e.g., change ``` to ```text) so the block with
"Text ──→ [Phoneme Encoder] ──→ [Latent Flow] ──→ [Waveform Decoder] ──→ Audio"
gets proper syntax highlighting; locate the fenced block in runner.md and
replace the opening fence accordingly.

---

123-130: Add blank lines around the table.

Markdown tables should be surrounded by blank lines for proper rendering.

📝 Proposed fix
 **Key dependencies:**
+
 | Library | Role |
 |---|---|
 | `k2` | FSA-based loss computation on GPU (must match CUDA + PyTorch version exactly) |
 | `lhotse` | Audio dataset loading, manifest creation, spectrogram computation |
 | `piper_phonemize` | Converts text → IPA phonemes used by VITS |
 | `icefall` (package) | Training utilities, logging, data pipeline code |
+
 ---

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 123 - 130, The Markdown block under the
"Key dependencies:" header contains a table that lacks surrounding blank lines;
update the section so there is a blank line immediately before the table (i.e.,
after the "Key dependencies:" line) and a blank line immediately after the table
to ensure correct Markdown rendering of the table in runner.md.

---

347-373: Add language specifier to fenced code block.

The pipeline summary code block should specify a language for proper syntax highlighting.

📝 Proposed fix
-```
+```text
 [Step 1] Install: icefall + k2 + piper_phonemize + lhotse
          ↓

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 347 - 373, The fenced code block that
contains the pipeline summary (the block starting with the lines showing "[Step
1] Install: icefall + k2 + piper_phonemize + lhotse" through "[Step 7]
export-onnx.py") lacks a language specifier for syntax highlighting; update the
opening fence from ``` to ```text (or another appropriate language tag) to
ensure proper rendering and highlighting in runner.md.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@egs/vctk/TTS/runner.md`:

  • Around line 206-223: The downloaded checkpoint is placed at {BASE_DIR}/exp/
    but train.py (via load_checkpoint_if_available) expects it at vits/exp/, causing
    start-epoch loading to fail; after calling hf_hub_download for
    "exp/epoch-1000.pt", copy or move that file into f"{BASE_DIR}/vits/exp/" (create
    the dir if missing) so params.exp_dir (default vits/exp) contains epoch-1000.pt
    before users run train.py --start-epoch 1001; ensure the copy uses the actual
    downloaded path returned by hf_hub_download and preserves file permissions.
  • Around line 254-267: Update the documentation to remove the claim that
    --local-data-dir does not work with parse_options.sh and instead show the new
    PR-supported usage: replace the manual symlink example with a direct invocation
    of prepare.sh using the --local-data-dir /kaggle/input/vctk-corpus flag
    (keeping the existing --stage/--stop_stage examples), and correct the
    example dataset path from /kaggle/input/datasets/ to
    /kaggle/input/vctk-corpus; reference the prepare.sh script and the
    --local-data-dir flag in the text so readers know the script will symlink the
    provided directory and skip Stage 0.
  • Around line 195-204: Update the directory name in the example block to match
    the 2024 model used elsewhere: replace the header string
    "icefall-tts-vctk-vits-2023-12-05/" with "icefall-tts-vctk-vits-2024-03-18/" in
    egs/vctk/TTS/runner.md so the displayed tree (tokens.txt, exp/epoch-1000.pt,
    vits-epoch-1000.onnx, etc.) is consistent with the model referenced elsewhere in
    the document.

Nitpick comments:
In `@egs/vctk/TTS/runner.md`:

  • Around line 110-112: The fenced code block showing the TTS pipeline uses a
    plain triple-backtick fence; update that fence to include a language specifier
    (e.g., change ``` to ```text) so the block with "Text ──→ [Phoneme Encoder] ──→
    [Latent Flow] ──→ [Waveform Decoder] ──→ Audio" gets proper syntax highlighting;
    locate the fenced block in runner.md and replace the opening fence accordingly.
  • Around line 123-130: The Markdown block under the "Key dependencies:" header
    contains a table that lacks surrounding blank lines; update the section so there
    is a blank line immediately before the table (i.e., after the "Key
    dependencies:" line) and a blank line immediately after the table to ensure
    correct Markdown rendering of the table in runner.md.
  • Around line 347-373: The fenced code block that contains the pipeline summary
    (the block starting with the lines showing "[Step 1] Install: icefall + k2 +
    piper_phonemize + lhotse" through "[Step 7] export-onnx.py") lacks a language
    specifier for syntax highlighting; update the opening fence from ``` to ```text
    (or another appropriate language tag) to ensure proper rendering and
    highlighting in runner.md.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

---

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d40eb37-0d25-429b-87fb-e92956afe430

📥 Commits

Reviewing files that changed from the base of the PR and between 4e3c9f00991ae635e21e116ef7889c62f4451e42 and 35b38c42c6b57bbbe6e89a475fa7ccab026f9163.

📒 Files selected for processing (3)
  • egs/vctk/TTS/README.md
  • egs/vctk/TTS/runner.md
  • egs/vctk/TTS/vctk-vits-training.ipynb
✅ Files skipped from review due to trivial changes (1)
  • egs/vctk/TTS/README.md

Comment thread egs/vctk/TTS/runner.md Outdated
Comment on lines +206 to +223
```python
from huggingface_hub import hf_hub_download
import os

MODEL_ID = "zrjin/icefall-tts-vctk-vits-2024-03-18"
BASE_DIR = "/kaggle/working/icefall/egs/vctk/TTS"

os.makedirs(f"{BASE_DIR}/vits/exp", exist_ok=True)
os.makedirs(f"{BASE_DIR}/data", exist_ok=True)

# Download checkpoint (~1.08 GB) → vits/exp/epoch-1000.pt
hf_hub_download(repo_id=MODEL_ID, filename="exp/epoch-1000.pt", local_dir=BASE_DIR)

# Download token file → data/tokens.txt
hf_hub_download(repo_id=MODEL_ID, filename="data/tokens.txt", local_dir=BASE_DIR)

print("Done.")
```


⚠️ Potential issue | 🔴 Critical

Missing checkpoint copy step breaks training.

Unlike the Quick Start section (lines 54-55), this code downloads epoch-1000.pt to {BASE_DIR}/exp/ but does NOT copy it to {BASE_DIR}/vits/exp/ where train.py expects it. When users run Step 5 (train.py --start-epoch 1001), it will fail with "epoch-1000.pt does not exist" because load_checkpoint_if_available looks for params.exp_dir / f"epoch-{params.start_epoch-1}.pt" where exp_dir defaults to vits/exp.

🔧 Proposed fix
 from huggingface_hub import hf_hub_download
-import os
+import os, shutil
 
 MODEL_ID = "zrjin/icefall-tts-vctk-vits-2024-03-18"
 BASE_DIR  = "/kaggle/working/icefall/egs/vctk/TTS"
 
 os.makedirs(f"{BASE_DIR}/vits/exp", exist_ok=True)
 os.makedirs(f"{BASE_DIR}/data", exist_ok=True)
 
-# Download checkpoint (~1.08 GB) → vits/exp/epoch-1000.pt
+# Download checkpoint (~1.08 GB).
+# hf_hub_download preserves the repo path → saves to BASE_DIR/exp/epoch-1000.pt
+# But train.py expects it at BASE_DIR/vits/exp/epoch-1000.pt
+# So: download first, then move to the correct location.
 hf_hub_download(repo_id=MODEL_ID, filename="exp/epoch-1000.pt",  local_dir=BASE_DIR)
+shutil.copy2(f"{BASE_DIR}/exp/epoch-1000.pt", f"{BASE_DIR}/vits/exp/epoch-1000.pt")
 
-# Download token file → data/tokens.txt
+# Phoneme token map (required by both train.py and infer.py)
 hf_hub_download(repo_id=MODEL_ID, filename="data/tokens.txt", local_dir=BASE_DIR)
 
 print("Done.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```python
from huggingface_hub import hf_hub_download
import os
MODEL_ID = "zrjin/icefall-tts-vctk-vits-2024-03-18"
BASE_DIR = "/kaggle/working/icefall/egs/vctk/TTS"
os.makedirs(f"{BASE_DIR}/vits/exp", exist_ok=True)
os.makedirs(f"{BASE_DIR}/data", exist_ok=True)
# Download checkpoint (~1.08 GB) → vits/exp/epoch-1000.pt
hf_hub_download(repo_id=MODEL_ID, filename="exp/epoch-1000.pt", local_dir=BASE_DIR)
# Download token file → data/tokens.txt
hf_hub_download(repo_id=MODEL_ID, filename="data/tokens.txt", local_dir=BASE_DIR)
print("Done.")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 206 - 223, The downloaded checkpoint is
placed at {BASE_DIR}/exp/ but train.py (via load_checkpoint_if_available)
expects it at vits/exp/, causing start-epoch loading to fail; after calling
hf_hub_download for "exp/epoch-1000.pt", copy or move that file into
f"{BASE_DIR}/vits/exp/" (create the dir if missing) so params.exp_dir (default
vits/exp) contains epoch-1000.pt before users run train.py --start-epoch 1001;
ensure the copy uses the actual downloaded path returned by hf_hub_download and
preserves file permissions.

Comment thread egs/vctk/TTS/runner.md Outdated
Comment on lines +254 to +267
The flag `--local-data-dir` does NOT work with parse_options.sh.
The correct approach (documented in prepare.sh itself) is to create a symlink manually:

```bash
# Point download/VCTK to your existing Kaggle dataset
!mkdir -p /kaggle/working/icefall/egs/vctk/TTS/download
!ln -sfv /kaggle/input/datasets/ \
/kaggle/working/icefall/egs/vctk/TTS/download/VCTK

# Now run stages 1–4. Stage 0 (download) is automatically skipped
# because download/VCTK already exists.
%cd /kaggle/working/icefall/egs/vctk/TTS
!bash prepare.sh --stage 1 --stop_stage 4
```


⚠️ Potential issue | 🟠 Major

Documentation contradicts the PR's feature addition.

Lines 254-255 state that --local-data-dir does NOT work with parse_options.sh and recommend manual symlinking. However, the PR's entire purpose is to add a working --local-data-dir option to prepare.sh. The PR description explicitly shows: bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus.

Additionally, line 260 uses /kaggle/input/datasets/ which appears incomplete or incorrect compared to the PR's example path /kaggle/input/vctk-corpus.

📝 Suggested revision to match PR feature
-The flag `--local-data-dir` does NOT work with parse_options.sh.
-The correct approach (documented in prepare.sh itself) is to create a symlink manually:
+VCTK is a large dataset (~11 GB). If you already have it mounted locally (e.g., on Kaggle),
+use `--local-data-dir` to skip the download:
 
 ```bash
-# Point download/VCTK to your existing Kaggle dataset
-!mkdir -p /kaggle/working/icefall/egs/vctk/TTS/download
-!ln -sfv /kaggle/input/datasets/ \
-          /kaggle/working/icefall/egs/vctk/TTS/download/VCTK
-
-# Now run stages 1–4. Stage 0 (download) is automatically skipped
-# because download/VCTK already exists.
 %cd /kaggle/working/icefall/egs/vctk/TTS
-!bash prepare.sh --stage 1 --stop_stage 4
+!bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus --stage 1 --stop_stage 4

+This tells prepare.sh to symlink your existing VCTK directory and skip Stage 0 (download).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
VCTK is a large dataset (~11 GB). If you already have it mounted locally (e.g., on Kaggle),
use `--local-data-dir` to skip the download:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 254 - 267, Update the documentation to
remove the claim that `--local-data-dir` does not work with parse_options.sh and
instead show the new PR-supported usage: replace the manual symlink example with
a direct invocation of prepare.sh using the `--local-data-dir
/kaggle/input/vctk-corpus` flag (keeping the existing `--stage`/`--stop_stage`
examples), and correct the example dataset path from `/kaggle/input/datasets/`
to `/kaggle/input/vctk-corpus`; reference the `prepare.sh` script and the
`--local-data-dir` flag in the text so readers know the script will symlink the
provided directory and skip Stage 0.

@csukuangfj
Collaborator

[Screenshot: Colab notebook error, 2026-04-09 at 11:13]

There is an error in the Colab notebook. Can you fix it?


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
egs/vctk/TTS/infer_pretrained.md (1)

75-79: Guard against missing output directory before listing WAV files.

os.listdir(wav_dir_test) will throw if inference didn’t produce that path. Add an existence check and a clear message.

Suggested robustness tweak
 wav_dir_test = os.path.join(wav_dir, "test")
-wav_files = sorted(os.listdir(wav_dir_test))
+if not os.path.isdir(wav_dir_test):
+    raise FileNotFoundError(f"Inference output not found: {wav_dir_test}")
+wav_files = sorted(os.listdir(wav_dir_test))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` around lines 75 - 79, The code calls
os.listdir(wav_dir_test) without verifying the directory exists (wav_dir_test),
so add a pre-check using os.path.exists/os.path.isdir for wav_dir_test before
building wav_files; if the path is missing, raise a clear error or log a message
that inference output is absent (e.g., "Inference output not found at
{wav_dir_test}, run inference or check path") instead of letting os.listdir
throw, or optionally create the directory if that fits the workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@egs/vctk/TTS/infer_pretrained.md`:
- Line 3: The heading "Cell 1: Install Dependencies" is using an H3 (###) which
jumps from the document H1 to H3; change that heading to H2 (replace "### Cell
1: Install Dependencies" with "## Cell 1: Install Dependencies") so heading
levels are incremental and consistent.
- Around line 27-35: The current notebook creates an ambiguous symlink to
/kaggle/input/datasets/ then runs prepare.sh; instead remove the ln -sfv step
and call the script with the explicit --local-data-dir argument (prepare.sh
--local-data-dir /path/to/VCTK) so the prep stages use the precise VCTK root;
update the two prepare.sh invocations (the build/extension call and the
manifest/spectrogram/token/data-split call) to pass --local-data-dir where
needed and delete the symlink creation to make setup reproducible.
- Around line 14-16: Replace the hardcoded k2 pip install line with a
compatibility check that inspects the runtime PyTorch and CUDA versions (e.g.,
using torch.__version__ and torch.version.cuda) and selects the matching k2
wheel from the k2 CUDA wheel index (or else print a clear error/requirement
message); alternatively add a short documented note above the existing pip
install explaining the exact required CUDA and PyTorch versions and listing the
alternate wheel tags (CUDA 12.6, 12.8, 12.9, 13.0 for torch 2.10.0) so users can
choose the correct pip target instead of the pinned cuda12.8.torch2.10.0 wheel.

In `@egs/vctk/TTS/knowledge.md`:
- Line 13: Replace the misspelled Vietnamese technical term "khuyếch đại" with
the correct spelling "khuếch đại" in the sentence describing the drawback (Error
Accumulation) so the line reads that the Vocoder will "khuếch đại" the blurred
spectrogram into artifacts or robotic noise; update the Occurrence of "khuyếch
đại" in the TTS knowledge text to the correct form.

In `@egs/vctk/TTS/train_from_scratch.md`:
- Line 3: Heading "Cell 1: Install Dependencies" uses H3 (###) causing a jump
from H1 to H3; change the Markdown heading to H2 by replacing the leading "###"
with "##" so the file's heading progression is H1 → H2 → ... and validates
properly.
- Around line 27-35: The doc currently creates a fragile symlink to a broad path
instead of using the new CLI flag; update the instructions to remove the
mkdir/ln steps and call prepare.sh with the explicit --local-data-dir pointing
at the actual VCTK dataset root (use the same --local-data-dir flag introduced
in the PR) so subsequent steps (monotonic_align build via prepare.sh and stages
1–6) consume the correct dataset; reference the prepare.sh invocation and the
--local-data-dir flag when making this change.

---

Nitpick comments:
In `@egs/vctk/TTS/infer_pretrained.md`:
- Around line 75-79: The code calls os.listdir(wav_dir_test) without verifying
the directory exists (wav_dir_test), so add a pre-check using
os.path.exists/os.path.isdir for wav_dir_test before building wav_files; if the
path is missing, raise a clear error or log a message that inference output is
absent (e.g., "Inference output not found at {wav_dir_test}, run inference or
check path") instead of letting os.listdir throw, or optionally create the
directory if that fits the workflow.
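
The guarded listing could look like this minimal sketch; `list_inference_wavs` and the error message are illustrative names, not the notebook's actual identifiers:

```python
import os

def list_inference_wavs(wav_dir_test: str) -> list[str]:
    # Pre-check the directory so a missing inference output yields a clear
    # error instead of an unhandled os.listdir failure.
    if not os.path.isdir(wav_dir_test):
        raise FileNotFoundError(
            f"Inference output not found at {wav_dir_test}; "
            "run inference or check the path."
        )
    return sorted(f for f in os.listdir(wav_dir_test) if f.endswith(".wav"))
```

Sorting also makes downstream processing deterministic regardless of filesystem ordering.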

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66142269-62a0-4bc4-baf3-fbf8129d3651

📥 Commits

Reviewing files that changed from the base of the PR and between 30d80c9 and 5696479.

⛔ Files ignored due to path filters (1)
  • egs/vctk/TTS/image.png is excluded by !**/*.png
📒 Files selected for processing (4)
  • egs/vctk/TTS/README.md
  • egs/vctk/TTS/infer_pretrained.md
  • egs/vctk/TTS/knowledge.md
  • egs/vctk/TTS/train_from_scratch.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • egs/vctk/TTS/README.md

@@ -0,0 +1,84 @@
# Inference with Pretrained VITS Model

### Cell 1: Install Dependencies

⚠️ Potential issue | 🟡 Minor

Fix heading level jump (H1 → H3).

Line 3 should use ## to keep heading levels incremental.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 3-3: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` at line 3, The heading "Cell 1: Install
Dependencies" is using an H3 (###) which jumps from the document H1 to H3;
change that heading to H2 (replace "### Cell 1: Install Dependencies" with "##
Cell 1: Install Dependencies") so heading levels are incremental and consistent.

Comment on lines +14 to +16
```python
# Install k2 (must match CUDA 12.8 + PyTorch 2.10.0)
!pip install k2==1.24.4.dev20260306+cuda12.8.torch2.10.0 -f https://k2-fsa.github.io/k2/cuda.html
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For k2 installation, what are the supported CUDA/PyTorch wheel combinations and how should users choose the correct wheel in notebooks?

💡 Result:

Supported CUDA/PyTorch wheel combinations (prebuilt Linux CUDA wheels)

k2’s CUDA wheels are built for specific (CUDA runtime, PyTorch) pairs; you must match both (as encoded in the wheel tag +cudaX.Y.torchA.B.C). The currently listed combinations include, for example:

  • torch 2.10.0 with CUDA 12.6 / 12.8 / 12.9 / 13.0 wheels (multiple Python versions). [1]
  • torch 2.4.0 with CUDA 11.8 / 12.1 / 12.4 wheels (multiple Python versions). [1]

The authoritative, up-to-date list is the wheel index page: https://k2-fsa.github.io/k2/cuda.html. [1]

How users should choose the correct wheel in notebooks

  1. Check the PyTorch you already have in the notebook (this is what you must match):

```python
import torch, sys
print("python:", sys.version)
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)  # None if CPU-only torch
```

  2. Pick the k2 wheel whose tag matches exactly:
  • torch.__version__ → matches the torch… part
  • torch.version.cuda → matches the cuda… part
    (Example naming: ...+cuda11.7.torch2.0.1...). [2]
  3. Install following the documented pattern: install torch first, then install the matching k2 wheel from the wheel index:

```shell
pip install k2==<k2_version>+cuda<cuda_version>.torch<torch_version> -f https://k2-fsa.github.io/k2/cuda.html
```

(And if you’re on Windows, CUDA wheels aren’t provided; you’d use CPU wheels or build from source.) [2]

Rule of thumb for notebooks: don’t “guess” a k2 wheel—read torch.__version__ / torch.version.cuda, then select the matching entry from cuda.html. [1][2]


Add compatibility check before k2 installation to match user's CUDA/PyTorch versions.

The pinned wheel k2==1.24.4.dev20260306+cuda12.8.torch2.10.0 only works for CUDA 12.8 + PyTorch 2.10.0; users with different versions will fail. For PyTorch 2.10.0, k2 also provides wheels for CUDA 12.6, 12.9, and 13.0. Add a check to detect the user's current versions and select the matching wheel, or document which CUDA versions the notebook requires.

```python
import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
# Then select the k2 wheel from https://k2-fsa.github.io/k2/cuda.html matching your versions
```
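
To illustrate the matching rule, a small helper (hypothetical, not part of k2 or this recipe) can assemble the pip requirement string from the detected versions; the tag convention follows the wheels listed at https://k2-fsa.github.io/k2/cuda.html:

```python
def k2_wheel_target(k2_version: str, torch_version: str, cuda_version: str) -> str:
    # Wheel tags are named k2==<ver>+cuda<cuda>.torch<torch>, so both the
    # CUDA and PyTorch versions must match the runtime exactly.
    return f"k2=={k2_version}+cuda{cuda_version}.torch{torch_version}"


# Example: the wheel pinned in this notebook.
target = k2_wheel_target("1.24.4.dev20260306", "2.10.0", "12.8")
print(target)  # k2==1.24.4.dev20260306+cuda12.8.torch2.10.0
```

At runtime you would feed in `torch.version.cuda` and `torch.__version__` (stripped of any local suffix such as `+cu128`) rather than hardcoded strings.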
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` around lines 14 - 16, Replace the hardcoded
k2 pip install line with a compatibility check that inspects the runtime PyTorch
and CUDA versions (e.g., using torch.__version__ and torch.version.cuda) and
selects the matching k2 wheel from the k2 CUDA wheel index (or else print a
clear error/requirement message); alternatively add a short documented note
above the existing pip install explaining the exact required CUDA and PyTorch
versions and listing the alternate wheel tags (CUDA 12.6, 12.8, 12.9, 13.0 for
torch 2.10.0) so users can choose the correct pip target instead of the pinned
cuda12.8.torch2.10.0 wheel.

Comment on lines +27 to +35
```shell
!mkdir -p download
!ln -sfv /kaggle/input/datasets/ download/VCTK

# Build monotonic_align C extension
!bash prepare.sh --stage -1 --stop_stage -1

# Create manifests, spectrograms, tokens, and data splits
!bash prepare.sh --stage 1 --stop_stage 4
```

⚠️ Potential issue | 🟠 Major

Prefer --local-data-dir over symlinking an ambiguous Kaggle path.

Line 28 symlinks a generic directory and may not resolve to the VCTK root. Use the new prepare.sh --local-data-dir option for a precise, reproducible setup.

Suggested doc update:

```diff
 !mkdir -p download
-!ln -sfv /kaggle/input/datasets/ download/VCTK
-
-# Build monotonic_align C extension
-!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus --stage 0 --stop_stage 0

 # Create manifests, spectrograms, tokens, and data splits
 !bash prepare.sh --stage 1 --stop_stage 4
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` around lines 27 - 35, The current notebook
creates an ambiguous symlink to /kaggle/input/datasets/ then runs prepare.sh;
instead remove the ln -sfv step and call the script with the explicit
--local-data-dir argument (prepare.sh --local-data-dir /path/to/VCTK) so the
prep stages use the precise VCTK root; update the two prepare.sh invocations
(the build/extension call and the manifest/spectrogram/token/data-split call) to
pass --local-data-dir where needed and delete the symlink creation to make setup
reproducible.

Comment thread egs/vctk/TTS/knowledge.md
1. **Acoustic Model** (Tacotron 2, FastSpeech): converts **Text** into a **Mel-spectrogram** (an image-like time-frequency representation of the audio).
2. **Vocoder** (WaveNet, HiFi-GAN): converts the **Mel-spectrogram** into a **Waveform** (the raw audio signal sent to the speaker).

**Drawback of the old two-stage approach:** error accumulation. If the Acoustic Model predicts a slightly blurry spectrogram, the Vocoder amplifies ("khuyếch đại") that blur into artifacts or robotic noise.

⚠️ Potential issue | 🟡 Minor

Fix spelling in the Vietnamese technical term.

Line 13 uses “khuyếch đại”; the standard spelling is “khuếch đại”.

🧰 Tools
🪛 LanguageTool

[grammar] ~13-~13: Ensure spelling is correct
Context: ... dự đoán Spectrogram hơi mờ, Vocoder sẽ khuyếch đại cái "mờ" đó thành tiếng nhiễu (arti...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/knowledge.md` at line 13, Replace the misspelled Vietnamese
technical term "khuyếch đại" with the correct spelling "khuếch đại" in the
sentence describing the drawback (Error Accumulation) so the line reads that the
Vocoder will "khuếch đại" the blurred spectrogram into artifacts or robotic
noise; update the occurrence of "khuyếch đại" in the TTS knowledge text to the
correct form.

@@ -0,0 +1,64 @@
# Train VITS Model From Scratch

### Cell 1: Install Dependencies
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix heading level jump (H1 → H3).

Line 3 should be ## instead of ### to satisfy Markdown heading progression.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 3-3: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/train_from_scratch.md` at line 3, Heading "Cell 1: Install
Dependencies" uses H3 (###) causing a jump from H1 to H3; change the Markdown
heading to H2 by replacing the leading "###" with "##" so the file's heading
progression is H1 → H2 → ... and validates properly.

Comment on lines +27 to +35
```shell
!mkdir -p download
!ln -sfv /kaggle/input/datasets/ download/VCTK

# Build monotonic_align C extension
!bash prepare.sh --stage -1 --stop_stage -1

# Create manifests, spectrograms, tokens, and data splits
!bash prepare.sh --stage 1 --stop_stage 6
```

⚠️ Potential issue | 🟠 Major

Use --local-data-dir directly instead of a fragile manual symlink target.

Line 28 points to a broad directory (/kaggle/input/datasets/) rather than a concrete VCTK root, which can break downstream prep. This PR already introduces --local-data-dir; use it in the doc to make the flow deterministic.

Suggested doc update:

```diff
 !mkdir -p download
-!ln -sfv /kaggle/input/datasets/ download/VCTK
-
-# Build monotonic_align C extension
-!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus --stage 0 --stop_stage 0

 # Create manifests, spectrograms, tokens, and data splits
 !bash prepare.sh --stage 1 --stop_stage 6
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/train_from_scratch.md` around lines 27 - 35, The doc currently
creates a fragile symlink to a broad path instead of using the new CLI flag;
update the instructions to remove the mkdir/ln steps and call prepare.sh with
the explicit --local-data-dir pointing at the actual VCTK dataset root (use the
same --local-data-dir flag introduced in the PR) so subsequent steps
(monotonic_align build via prepare.sh and stages 1–6) consume the correct
dataset; reference the prepare.sh invocation and the --local-data-dir flag when
making this change.
