egs/vctk/TTS: Add --local-data-dir option to skip VCTK download#2078

Open
drakempham wants to merge 4 commits into k2-fsa:master from drakempham:master

Conversation

@drakempham

@drakempham drakempham commented Apr 8, 2026

I think we should allow people to point prepare.sh at an existing VCTK directory (aiming for ease of use).

Currently, prepare.sh attempts to download the VCTK corpus using lhotse in Stage 0. The Edinburgh DataShare server can be slow or frequently drops connections, and many users (like me, on Kaggle) already have the dataset mounted locally.

Changes:

Added --local-data-dir option to prepare.sh.
If provided, the script symlinks the existing directory to download/VCTK and safely skips the download.
Updated README.md to document this new option.
Example usage: bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus
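The Stage 0 change can be sketched as a small branch (a hedged outline, not the exact merged code; the `log` helper mirrors prepare.sh, and the `stage0` wrapper function is an assumption for illustration — quoting is added here per the review feedback below):

```shell
#!/usr/bin/env bash
# Sketch of the new Stage 0 logic in prepare.sh (illustrative only).
# local_data_dir would be set via --local-data-dir through parse_options.sh.

log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*"; }

stage0() {
  local local_data_dir=$1 dl_dir=$2
  if [ -n "$local_data_dir" ]; then
    # Reuse an already-mounted copy: symlink it to download/VCTK
    # so later stages find the data where they expect it.
    log "Using local data directory: $local_data_dir"
    mkdir -p "$dl_dir"
    ln -sfv "$local_data_dir" "$dl_dir/VCTK"
  else
    log "Downloading VCTK via lhotse"
    # lhotse download vctk --use-edinburgh-vctk-url true "$dl_dir"
  fi
}

# Example: stage0 /kaggle/input/vctk-corpus download
```

When `--local-data-dir` is given, the download step is skipped entirely and only a symlink is created.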

Summary by CodeRabbit

  • New Features

    • Added local data directory option for custom dataset preparation in training workflow
  • Documentation

    • Updated README with VITS model architecture overview and data preparation instructions
    • Added comprehensive guides for training models from scratch and running pretrained inference
    • New documentation explaining VITS model design and technical components
  • Chores

    • Updated .gitignore to exclude Python virtual environments

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

📝 Walkthrough


Added Python virtual environment exclusion to version control, updated VCTK TTS README with VITS architecture details and data preparation guidance, extended prepare.sh with optional local data directory support, and introduced three new documentation files for inference, knowledge overview, and training workflows.

Changes

Cohort / File(s) Summary
Version control configuration
.gitignore
Added .venv/ entry to exclude Python virtual environment directories from version control.
Documentation and training guidance
egs/vctk/TTS/README.md
Replaced VCTK corpus description with VITS model features section (VAE, normalizing flow, GAN), added Data Preparation section documenting prepare.sh usage with --local-data-dir option, and reformatted training command with line continuation.
Data preparation script enhancement
egs/vctk/TTS/prepare.sh
Added local_data_dir option to Stage 0 to support creating a symlink from $local_data_dir to $dl_dir/VCTK as an alternative to downloading data via lhotse.
New workflow documentation
egs/vctk/TTS/infer_pretrained.md, egs/vctk/TTS/knowledge.md, egs/vctk/TTS/train_from_scratch.md
Added three new markdown documents: infer_pretrained.md provides step-by-step cells for installing dependencies and running inference with a pretrained VITS checkpoint; knowledge.md explains VITS architecture and training components (VAE, normalizing flows, duration prediction, alignment search, adversarial training); train_from_scratch.md documents a complete training workflow from data preparation through ONNX export.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A hop and a skip through text-to-speech dreams,
VITS flows like water through digital streams,
With symlinks and scripts, we prepare the way,
Training and inference—let the models play! 🎵

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and accurately summarizes the main change: adding a --local-data-dir option to prepare.sh to skip VCTK download, which is the core feature across prepare.sh and README.md updates.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds .venv/ to the .gitignore and introduces a --local-data-dir option to the VCTK TTS prepare.sh script, enabling the use of local datasets via symlinking. Feedback focuses on improving the robustness of the bash script by properly quoting path variables, ensuring absolute paths are used for symlinks to avoid broken references, and adding validation to verify the existence of the local data directory.

Comment thread egs/vctk/TTS/prepare.sh
Comment on lines +58 to +64
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
mkdir -p $dl_dir
ln -sfv $local_data_dir $dl_dir/VCTK
else
lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
fi


medium

The implementation of the --local-data-dir option should be more robust to handle common edge cases:

  1. Quoting: Path variables (like $local_data_dir and $dl_dir) should be double-quoted to prevent the script from breaking if the user provides a path containing spaces.
  2. Absolute Paths: If a relative path is passed to --local-data-dir, the symlink created at $dl_dir/VCTK will likely be broken because symlinks are resolved relative to their parent directory. Converting the path to an absolute one using readlink -f (or similar) ensures the symlink remains valid.
  3. Validation: It is better to verify that the provided directory actually exists before attempting to symlink it, providing a clear error message if it is missing.
References
  1. Always quote variables that contain file names or paths to handle spaces and special characters correctly.
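Taken together, the three points could be sketched like this (a hypothetical helper named `use_local_data_dir`; `log` mirrors prepare.sh, and the exact messages are illustrative, not the merged implementation):

```shell
#!/usr/bin/env bash
# Sketch combining the review's three suggestions:
# validation, absolute-path resolution, and quoting.

log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*"; }

use_local_data_dir() {
  local src=$1 dl_dir=$2
  # 3. Validation: fail fast with a clear message.
  if [ ! -d "$src" ]; then
    log "Error: local data directory does not exist: $src"
    return 1
  fi
  # 2. Absolute path: a relative path would leave a dangling symlink,
  #    since symlinks resolve relative to their parent directory.
  src=$(readlink -f "$src")
  # 1. Quoting: handles paths containing spaces or glob characters.
  mkdir -p "$dl_dir"
  ln -sfv "$src" "$dl_dir/VCTK"
}
```

With this shape, `use_local_data_dir "/kaggle/input/vctk-corpus" "$dl_dir"` either creates a valid absolute symlink or exits early with a readable error.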


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e3c9f0099

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread egs/vctk/TTS/prepare.sh
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
mkdir -p $dl_dir
ln -sfv $local_data_dir $dl_dir/VCTK


P2 Badge Quote local-data-dir when creating symlink

The new --local-data-dir path is expanded unquoted in ln -sfv $local_data_dir $dl_dir/VCTK, so any directory containing whitespace (or shell glob characters) is split into multiple arguments and Stage 0 fails before data preparation starts. This makes the new option unreliable for valid filesystem paths such as mounted datasets with spaces in their names; quoting both operands avoids this regression.

Useful? React with 👍 / 👎.
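The failure mode the comment describes is easy to reproduce in isolation (a contrived demo under temporary paths, not code from the PR):

```shell
#!/usr/bin/env bash
# Demonstrates the word-splitting failure with an unquoted path.
tmp=$(mktemp -d)
local_data_dir="$tmp/vctk corpus"   # a path with a space, as on some mounts
mkdir -p "$local_data_dir" "$tmp/download"

# Unquoted: the path splits into two arguments, so ln sees multiple
# sources and a non-directory target, and the command fails.
if ! ln -sf $local_data_dir "$tmp/download/VCTK" 2>/dev/null; then
  echo "unquoted: ln failed"
fi

# Quoted: a single argument; the symlink is created as intended.
ln -sf "$local_data_dir" "$tmp/download/VCTK"
echo "quoted: link created"
```

The unquoted form prints "unquoted: ln failed"; the quoted form succeeds, which is exactly the regression the reviewer is pointing at.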

Comment thread egs/vctk/TTS/prepare.sh
if [ ! -d $dl_dir/VCTK ]; then
lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
Collaborator


Can you check that "$local_data_dir" exists?


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
egs/vctk/TTS/prepare.sh (1)

57-57: Quote the variable to handle paths with spaces.

Per shellcheck SC2086, $dl_dir should be quoted to prevent issues with paths containing spaces or special characters.

♻️ Proposed fix
-  if [ ! -d $dl_dir/VCTK ]; then
+  if [ ! -d "$dl_dir/VCTK" ]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/prepare.sh` at line 57, The test for the VCTK directory uses an
unquoted variable; update the conditional in prepare.sh to quote the dl_dir path
so paths with spaces are handled correctly (change the check that references
$dl_dir/VCTK to use "$dl_dir/VCTK"). Ensure other occurrences in the same script
that reference dl_dir are also quoted consistently to avoid SC2086 issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@egs/vctk/TTS/prepare.sh`:
- Around line 58-64: In prepare.sh, validate that the provided local_data_dir
exists and is a directory before creating the symlink: check [ -n
"$local_data_dir" ] && [ -d "$local_data_dir" ] (or use test -d) and if the
check fails call log with a clear error and exit non-zero; only run mkdir -p
"$dl_dir" and ln -sfv "$local_data_dir" "$dl_dir/VCTK" when the directory check
passes so invalid paths don't produce confusing errors later.

---

Nitpick comments:
In `@egs/vctk/TTS/prepare.sh`:
- Line 57: The test for the VCTK directory uses an unquoted variable; update the
conditional in prepare.sh to quote the dl_dir path so paths with spaces are
handled correctly (change the check that references $dl_dir/VCTK to use
"$dl_dir/VCTK"). Ensure other occurrences in the same script that reference
dl_dir are also quoted consistently to avoid SC2086 issues.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f6b4434-54ed-4fba-ab8c-643760c532f8

📥 Commits

Reviewing files that changed from the base of the PR and between e9f4eb0 and 4e3c9f0.

📒 Files selected for processing (3)
  • .gitignore
  • egs/vctk/TTS/README.md
  • egs/vctk/TTS/prepare.sh

Comment thread egs/vctk/TTS/prepare.sh
Comment on lines +58 to +64
if [ -n "$local_data_dir" ]; then
log "Using local data directory: $local_data_dir"
mkdir -p $dl_dir
ln -sfv $local_data_dir $dl_dir/VCTK
else
lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
fi


⚠️ Potential issue | 🟡 Minor

Add validation for the local data directory path.

The script doesn't verify that $local_data_dir exists before creating the symlink. If an invalid path is provided, subsequent stages will fail with confusing errors.

🛡️ Proposed fix to validate the directory exists
     if [ -n "$local_data_dir" ]; then
+      if [ ! -d "$local_data_dir" ]; then
+        log "Error: local data directory does not exist: $local_data_dir"
+        exit 1
+      fi
       log "Using local data directory: $local_data_dir"
-      mkdir -p $dl_dir
-      ln -sfv $local_data_dir $dl_dir/VCTK
+      mkdir -p "$dl_dir"
+      ln -sfv "$local_data_dir" "$dl_dir/VCTK"
     else
-      lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} $dl_dir
+      lhotse download vctk --use-edinburgh-vctk-url ${use_edinburgh_vctk_url} "$dl_dir"
     fi
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 60-60: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 61-61: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 61-61: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 63-63: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/prepare.sh` around lines 58 - 64, In prepare.sh, validate that
the provided local_data_dir exists and is a directory before creating the
symlink: check [ -n "$local_data_dir" ] && [ -d "$local_data_dir" ] (or use test
-d) and if the check fails call log with a clear error and exit non-zero; only
run mkdir -p "$dl_dir" and ln -sfv "$local_data_dir" "$dl_dir/VCTK" when the
directory check passes so invalid paths don't produce confusing errors later.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
egs/vctk/TTS/runner.md (3)

110-112: Add language specifier to fenced code block.

The code block should specify a language for proper syntax highlighting.

📝 Proposed fix
-```
+```text
 Text ──→ [Phoneme Encoder] ──→ [Latent Flow] ──→ [Waveform Decoder] ──→ Audio

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 110 - 112, The fenced code block showing
the TTS pipeline uses a plain triple-backtick fence; update that fence to
include a language specifier (e.g., change ``` to ```text) so the block with
"Text ──→ [Phoneme Encoder] ──→ [Latent Flow] ──→ [Waveform Decoder] ──→ Audio"
gets proper syntax highlighting; locate the fenced block in runner.md and
replace the opening fence accordingly.

---

123-130: Add blank lines around the table.

Markdown tables should be surrounded by blank lines for proper rendering.

📝 Proposed fix
 **Key dependencies:**
+
 | Library | Role |
 |---|---|
 | `k2` | FSA-based loss computation on GPU (must match CUDA + PyTorch version exactly) |
 | `lhotse` | Audio dataset loading, manifest creation, spectrogram computation |
 | `piper_phonemize` | Converts text → IPA phonemes used by VITS |
 | `icefall` (package) | Training utilities, logging, data pipeline code |
+
 ---

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 123 - 130, The Markdown block under the
"Key dependencies:" header contains a table that lacks surrounding blank lines;
update the section so there is a blank line immediately before the table (i.e.,
after the "Key dependencies:" line) and a blank line immediately after the table
to ensure correct Markdown rendering of the table in runner.md.

---

347-373: Add language specifier to fenced code block.

The pipeline summary code block should specify a language for proper syntax highlighting.

📝 Proposed fix
-```
+```text
 [Step 1] Install: icefall + k2 + piper_phonemize + lhotse
          ↓

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 347 - 373, The fenced code block that
contains the pipeline summary (the block starting with the lines showing "[Step
1] Install: icefall + k2 + piper_phonemize + lhotse" through "[Step 7]
export-onnx.py") lacks a language specifier for syntax highlighting; update the
opening fence from ``` to ```text (or another appropriate language tag) to
ensure proper rendering and highlighting in runner.md.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@egs/vctk/TTS/runner.md`:

  • Around line 206-223: The downloaded checkpoint is placed at {BASE_DIR}/exp/
    but train.py (via load_checkpoint_if_available) expects it at vits/exp/, causing
    start-epoch loading to fail; after calling hf_hub_download for
    "exp/epoch-1000.pt", copy or move that file into f"{BASE_DIR}/vits/exp/" (create
    the dir if missing) so params.exp_dir (default vits/exp) contains epoch-1000.pt
    before users run train.py --start-epoch 1001; ensure the copy uses the actual
    downloaded path returned by hf_hub_download and preserves file permissions.
  • Around line 254-267: Update the documentation to remove the claim that
    --local-data-dir does not work with parse_options.sh and instead show the new
    PR-supported usage: replace the manual symlink example with a direct invocation
    of prepare.sh using the --local-data-dir /kaggle/input/vctk-corpus flag
    (keeping the existing --stage/--stop_stage examples), and correct the
    example dataset path from /kaggle/input/datasets/ to
    /kaggle/input/vctk-corpus; reference the prepare.sh script and the
    --local-data-dir flag in the text so readers know the script will symlink the
    provided directory and skip Stage 0.
  • Around line 195-204: Update the directory name in the example block to match
    the 2024 model used elsewhere: replace the header string
    "icefall-tts-vctk-vits-2023-12-05/" with "icefall-tts-vctk-vits-2024-03-18/" in
    egs/vctk/TTS/runner.md so the displayed tree (tokens.txt, exp/epoch-1000.pt,
    vits-epoch-1000.onnx, etc.) is consistent with the model referenced elsewhere in
    the document.

Nitpick comments:
In `@egs/vctk/TTS/runner.md`:

  • Around line 110-112: The fenced code block showing the TTS pipeline uses a
    plain triple-backtick fence; update that fence to include a language specifier
    (e.g., change ``` to ```text) so the block with "Text ──→ [Phoneme Encoder] ──→
    [Latent Flow] ──→ [Waveform Decoder] ──→ Audio" gets proper syntax highlighting;
    locate the fenced block in runner.md and replace the opening fence accordingly.
  • Around line 123-130: The Markdown block under the "Key dependencies:" header
    contains a table that lacks surrounding blank lines; update the section so there
    is a blank line immediately before the table (i.e., after the "Key
    dependencies:" line) and a blank line immediately after the table to ensure
    correct Markdown rendering of the table in runner.md.
  • Around line 347-373: The fenced code block that contains the pipeline summary
    (the block starting with the lines showing "[Step 1] Install: icefall + k2 +
    piper_phonemize + lhotse" through "[Step 7] export-onnx.py") lacks a language
    specifier for syntax highlighting; update the opening fence from ``` to ```text
    (or another appropriate language tag) to ensure proper rendering and
    highlighting in runner.md.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

---

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d40eb37-0d25-429b-87fb-e92956afe430

📥 Commits

Reviewing files that changed from the base of the PR and between 4e3c9f00991ae635e21e116ef7889c62f4451e42 and 35b38c42c6b57bbbe6e89a475fa7ccab026f9163.

📒 Files selected for processing (3)
  • egs/vctk/TTS/README.md
  • egs/vctk/TTS/runner.md
  • egs/vctk/TTS/vctk-vits-training.ipynb
✅ Files skipped from review due to trivial changes (1)
  • egs/vctk/TTS/README.md

Comment thread egs/vctk/TTS/runner.md Outdated
Comment on lines +206 to +223
```python
from huggingface_hub import hf_hub_download
import os

MODEL_ID = "zrjin/icefall-tts-vctk-vits-2024-03-18"
BASE_DIR = "/kaggle/working/icefall/egs/vctk/TTS"

os.makedirs(f"{BASE_DIR}/vits/exp", exist_ok=True)
os.makedirs(f"{BASE_DIR}/data", exist_ok=True)

# Download checkpoint (~1.08 GB) → vits/exp/epoch-1000.pt
hf_hub_download(repo_id=MODEL_ID, filename="exp/epoch-1000.pt", local_dir=BASE_DIR)

# Download token file → data/tokens.txt
hf_hub_download(repo_id=MODEL_ID, filename="data/tokens.txt", local_dir=BASE_DIR)

print("Done.")
```


⚠️ Potential issue | 🔴 Critical

Missing checkpoint copy step breaks training.

Unlike the Quick Start section (lines 54-55), this code downloads epoch-1000.pt to {BASE_DIR}/exp/ but does NOT copy it to {BASE_DIR}/vits/exp/ where train.py expects it. When users run Step 5 (train.py --start-epoch 1001), it will fail with "epoch-1000.pt does not exist" because load_checkpoint_if_available looks for params.exp_dir / f"epoch-{params.start_epoch-1}.pt" where exp_dir defaults to vits/exp.

🔧 Proposed fix
 from huggingface_hub import hf_hub_download
-import os
+import os, shutil
 
 MODEL_ID = "zrjin/icefall-tts-vctk-vits-2024-03-18"
 BASE_DIR  = "/kaggle/working/icefall/egs/vctk/TTS"
 
 os.makedirs(f"{BASE_DIR}/vits/exp", exist_ok=True)
 os.makedirs(f"{BASE_DIR}/data", exist_ok=True)
 
-# Download checkpoint (~1.08 GB) → vits/exp/epoch-1000.pt
+# Download checkpoint (~1.08 GB).
+# hf_hub_download preserves the repo path → saves to BASE_DIR/exp/epoch-1000.pt
+# But train.py expects it at BASE_DIR/vits/exp/epoch-1000.pt
+# So: download first, then move to the correct location.
 hf_hub_download(repo_id=MODEL_ID, filename="exp/epoch-1000.pt",  local_dir=BASE_DIR)
+shutil.copy2(f"{BASE_DIR}/exp/epoch-1000.pt", f"{BASE_DIR}/vits/exp/epoch-1000.pt")
 
-# Download token file → data/tokens.txt
+# Phoneme token map (required by both train.py and infer.py)
 hf_hub_download(repo_id=MODEL_ID, filename="data/tokens.txt", local_dir=BASE_DIR)
 
 print("Done.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```python
from huggingface_hub import hf_hub_download
import os
MODEL_ID = "zrjin/icefall-tts-vctk-vits-2024-03-18"
BASE_DIR = "/kaggle/working/icefall/egs/vctk/TTS"
os.makedirs(f"{BASE_DIR}/vits/exp", exist_ok=True)
os.makedirs(f"{BASE_DIR}/data", exist_ok=True)
# Download checkpoint (~1.08 GB) → vits/exp/epoch-1000.pt
hf_hub_download(repo_id=MODEL_ID, filename="exp/epoch-1000.pt", local_dir=BASE_DIR)
# Download token file → data/tokens.txt
hf_hub_download(repo_id=MODEL_ID, filename="data/tokens.txt", local_dir=BASE_DIR)
print("Done.")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 206 - 223, The downloaded checkpoint is
placed at {BASE_DIR}/exp/ but train.py (via load_checkpoint_if_available)
expects it at vits/exp/, causing start-epoch loading to fail; after calling
hf_hub_download for "exp/epoch-1000.pt", copy or move that file into
f"{BASE_DIR}/vits/exp/" (create the dir if missing) so params.exp_dir (default
vits/exp) contains epoch-1000.pt before users run train.py --start-epoch 1001;
ensure the copy uses the actual downloaded path returned by hf_hub_download and
preserves file permissions.

Comment thread egs/vctk/TTS/runner.md Outdated
Comment on lines +254 to +267
The flag `--local-data-dir` does NOT work with parse_options.sh.
The correct approach (documented in prepare.sh itself) is to create a symlink manually:

```bash
# Point download/VCTK to your existing Kaggle dataset
!mkdir -p /kaggle/working/icefall/egs/vctk/TTS/download
!ln -sfv /kaggle/input/datasets/ \
/kaggle/working/icefall/egs/vctk/TTS/download/VCTK

# Now run stages 1–4. Stage 0 (download) is automatically skipped
# because download/VCTK already exists.
%cd /kaggle/working/icefall/egs/vctk/TTS
!bash prepare.sh --stage 1 --stop_stage 4
```


⚠️ Potential issue | 🟠 Major

Documentation contradicts the PR's feature addition.

Lines 254-255 state that --local-data-dir does NOT work with parse_options.sh and recommend manual symlinking. However, the PR's entire purpose is to add a working --local-data-dir option to prepare.sh. The PR description explicitly shows: bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus.

Additionally, line 260 uses /kaggle/input/datasets/ which appears incomplete or incorrect compared to the PR's example path /kaggle/input/vctk-corpus.

📝 Suggested revision to match PR feature
-The flag `--local-data-dir` does NOT work with parse_options.sh.
-The correct approach (documented in prepare.sh itself) is to create a symlink manually:
+VCTK is a large dataset (~11 GB). If you already have it mounted locally (e.g., on Kaggle),
+use `--local-data-dir` to skip the download:
 
 ```bash
-# Point download/VCTK to your existing Kaggle dataset
-!mkdir -p /kaggle/working/icefall/egs/vctk/TTS/download
-!ln -sfv /kaggle/input/datasets/ \
-          /kaggle/working/icefall/egs/vctk/TTS/download/VCTK
-
-# Now run stages 1–4. Stage 0 (download) is automatically skipped
-# because download/VCTK already exists.
 %cd /kaggle/working/icefall/egs/vctk/TTS
-!bash prepare.sh --stage 1 --stop_stage 4
+!bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus --stage 1 --stop_stage 4

+This tells prepare.sh to symlink your existing VCTK directory and skip Stage 0 (download).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
VCTK is a large dataset (~11 GB). If you already have it mounted locally (e.g., on Kaggle),
use `--local-data-dir` to skip the download:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/runner.md` around lines 254 - 267, Update the documentation to
remove the claim that `--local-data-dir` does not work with parse_options.sh and
instead show the new PR-supported usage: replace the manual symlink example with
a direct invocation of prepare.sh using the `--local-data-dir
/kaggle/input/vctk-corpus` flag (keeping the existing `--stage`/`--stop_stage`
examples), and correct the example dataset path from `/kaggle/input/datasets/`
to `/kaggle/input/vctk-corpus`; reference the `prepare.sh` script and the
`--local-data-dir` flag in the text so readers know the script will symlink the
provided directory and skip Stage 0.

@csukuangfj
Collaborator

[Screenshot: Colab notebook error, 2026-04-09 at 11:13]

There is an error in the Colab notebook. Can you fix it?


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
egs/vctk/TTS/infer_pretrained.md (1)

75-79: Guard against missing output directory before listing WAV files.

os.listdir(wav_dir_test) will throw if inference didn’t produce that path. Add an existence check and a clear message.

Suggested robustness tweak
 wav_dir_test = os.path.join(wav_dir, "test")
-wav_files = sorted(os.listdir(wav_dir_test))
+if not os.path.isdir(wav_dir_test):
+    raise FileNotFoundError(f"Inference output not found: {wav_dir_test}")
+wav_files = sorted(os.listdir(wav_dir_test))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` around lines 75 - 79, The code calls
os.listdir(wav_dir_test) without verifying the directory exists (wav_dir_test),
so add a pre-check using os.path.exists/os.path.isdir for wav_dir_test before
building wav_files; if the path is missing, raise a clear error or log a message
that inference output is absent (e.g., "Inference output not found at
{wav_dir_test}, run inference or check path") instead of letting os.listdir
throw, or optionally create the directory if that fits the workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@egs/vctk/TTS/infer_pretrained.md`:
- Line 3: The heading "Cell 1: Install Dependencies" is using an H3 (###) which
jumps from the document H1 to H3; change that heading to H2 (replace "### Cell
1: Install Dependencies" with "## Cell 1: Install Dependencies") so heading
levels are incremental and consistent.
- Around line 27-35: The current notebook creates an ambiguous symlink to
/kaggle/input/datasets/ then runs prepare.sh; instead remove the ln -sfv step
and call the script with the explicit --local-data-dir argument (prepare.sh
--local-data-dir /path/to/VCTK) so the prep stages use the precise VCTK root;
update the two prepare.sh invocations (the build/extension call and the
manifest/spectrogram/token/data-split call) to pass --local-data-dir where
needed and delete the symlink creation to make setup reproducible.
- Around line 14-16: Replace the hardcoded k2 pip install line with a
compatibility check that inspects the runtime PyTorch and CUDA versions (e.g.,
using torch.__version__ and torch.version.cuda) and selects the matching k2
wheel from the k2 CUDA wheel index (or else print a clear error/requirement
message); alternatively add a short documented note above the existing pip
install explaining the exact required CUDA and PyTorch versions and listing the
alternate wheel tags (CUDA 12.6, 12.8, 12.9, 13.0 for torch 2.10.0) so users can
choose the correct pip target instead of the pinned cuda12.8.torch2.10.0 wheel.

In `@egs/vctk/TTS/knowledge.md`:
- Line 13: Replace the misspelled Vietnamese technical term "khuyếch đại" with
the correct spelling "khuếch đại" in the sentence describing the drawback (Error
Accumulation) so the line reads that the Vocoder will "khuếch đại" the blurred
spectrogram into artifacts or robotic noise; update the Occurrence of "khuyếch
đại" in the TTS knowledge text to the correct form.

In `@egs/vctk/TTS/train_from_scratch.md`:
- Line 3: Heading "Cell 1: Install Dependencies" uses H3 (###) causing a jump
from H1 to H3; change the Markdown heading to H2 by replacing the leading "###"
with "##" so the file's heading progression is H1 → H2 → ... and validates
properly.
- Around line 27-35: The doc currently creates a fragile symlink to a broad path
instead of using the new CLI flag; update the instructions to remove the
mkdir/ln steps and call prepare.sh with the explicit --local-data-dir pointing
at the actual VCTK dataset root (use the same --local-data-dir flag introduced
in the PR) so subsequent steps (monotonic_align build via prepare.sh and stages
1–6) consume the correct dataset; reference the prepare.sh invocation and the
--local-data-dir flag when making this change.

---

Nitpick comments:
In `@egs/vctk/TTS/infer_pretrained.md`:
- Around line 75-79: The code calls os.listdir(wav_dir_test) without verifying
the directory exists (wav_dir_test), so add a pre-check using
os.path.exists/os.path.isdir for wav_dir_test before building wav_files; if the
path is missing, raise a clear error or log a message that inference output is
absent (e.g., "Inference output not found at {wav_dir_test}, run inference or
check path") instead of letting os.listdir throw, or optionally create the
directory if that fits the workflow.
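
The guarded listing could look like this minimal sketch; `list_inference_wavs` and the error message are illustrative names, not the notebook's actual identifiers:

```python
import os

def list_inference_wavs(wav_dir_test: str) -> list[str]:
    # Pre-check the directory so a missing inference output yields a clear
    # error instead of an unhandled os.listdir failure.
    if not os.path.isdir(wav_dir_test):
        raise FileNotFoundError(
            f"Inference output not found at {wav_dir_test}; "
            "run inference or check the path."
        )
    return sorted(f for f in os.listdir(wav_dir_test) if f.endswith(".wav"))
```

Sorting also makes downstream processing deterministic regardless of filesystem ordering.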

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66142269-62a0-4bc4-baf3-fbf8129d3651

📥 Commits

Reviewing files that changed from the base of the PR and between 30d80c9 and 5696479.

⛔ Files ignored due to path filters (1)
  • egs/vctk/TTS/image.png is excluded by !**/*.png
📒 Files selected for processing (4)
  • egs/vctk/TTS/README.md
  • egs/vctk/TTS/infer_pretrained.md
  • egs/vctk/TTS/knowledge.md
  • egs/vctk/TTS/train_from_scratch.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • egs/vctk/TTS/README.md

@@ -0,0 +1,84 @@
# Inference with Pretrained VITS Model

### Cell 1: Install Dependencies

⚠️ Potential issue | 🟡 Minor

Fix heading level jump (H1 → H3).

Line 3 should use ## to keep heading levels incremental.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 3-3: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` at line 3, The heading "Cell 1: Install
Dependencies" is using an H3 (###) which jumps from the document H1 to H3;
change that heading to H2 (replace "### Cell 1: Install Dependencies" with "##
Cell 1: Install Dependencies") so heading levels are incremental and consistent.

Comment on lines +14 to +16
```python
# Install k2 (must match CUDA 12.8 + PyTorch 2.10.0)
!pip install k2==1.24.4.dev20260306+cuda12.8.torch2.10.0 -f https://k2-fsa.github.io/k2/cuda.html
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For k2 installation, what are the supported CUDA/PyTorch wheel combinations and how should users choose the correct wheel in notebooks?

💡 Result:

Supported CUDA/PyTorch wheel combinations (prebuilt Linux CUDA wheels)

k2’s CUDA wheels are built for specific (CUDA runtime, PyTorch) pairs; you must match both (as encoded in the wheel tag +cudaX.Y.torchA.B.C). The currently listed combinations include, for example:

  • torch 2.10.0 with CUDA 12.6 / 12.8 / 12.9 / 13.0 wheels (multiple Python versions). [1]
  • torch 2.4.0 with CUDA 11.8 / 12.1 / 12.4 wheels (multiple Python versions). [1]

The authoritative, up-to-date list is the wheel index page: https://k2-fsa.github.io/k2/cuda.html. [1]

How users should choose the correct wheel in notebooks

  1. Check the PyTorch you already have in the notebook (this is what you must match):

```python
import torch, sys
print("python:", sys.version)
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)  # None if CPU-only torch
```

  2. Pick the k2 wheel whose tag matches exactly:
  • torch.__version__ → matches the torch… part
  • torch.version.cuda → matches the cuda… part
    (Example naming: ...+cuda11.7.torch2.0.1...). [2]
  3. Install following the documented pattern: install torch first, then install the matching k2 wheel from the wheel index:

```shell
pip install k2==<k2_version>+cuda<cuda_version>.torch<torch_version> -f https://k2-fsa.github.io/k2/cuda.html
```

(And if you’re on Windows, CUDA wheels aren’t provided; you’d use CPU wheels or build from source.) [2]

Rule of thumb for notebooks: don’t “guess” a k2 wheel—read torch.__version__ / torch.version.cuda, then select the matching entry from cuda.html. [1][2]


Add compatibility check before k2 installation to match user's CUDA/PyTorch versions.

The pinned wheel k2==1.24.4.dev20260306+cuda12.8.torch2.10.0 only works for CUDA 12.8 + PyTorch 2.10.0; users with different versions will fail. For PyTorch 2.10.0, k2 also provides wheels for CUDA 12.6, 12.9, and 13.0. Add a check to detect the user's current versions and select the matching wheel, or document which CUDA versions the notebook requires.

```python
import torch
print(f"PyTorch: {torch.__version__}, CUDA: {torch.version.cuda}")
# Then select the k2 wheel from https://k2-fsa.github.io/k2/cuda.html matching your versions
```
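
To illustrate the matching rule, a small helper (hypothetical, not part of k2 or this recipe) can assemble the pip requirement string from the detected versions; the tag convention follows the wheels listed at https://k2-fsa.github.io/k2/cuda.html:

```python
def k2_wheel_target(k2_version: str, torch_version: str, cuda_version: str) -> str:
    # Wheel tags are named k2==<ver>+cuda<cuda>.torch<torch>, so both the
    # CUDA and PyTorch versions must match the runtime exactly.
    return f"k2=={k2_version}+cuda{cuda_version}.torch{torch_version}"


# Example: the wheel pinned in this notebook.
target = k2_wheel_target("1.24.4.dev20260306", "2.10.0", "12.8")
print(target)  # k2==1.24.4.dev20260306+cuda12.8.torch2.10.0
```

At runtime you would feed in `torch.version.cuda` and `torch.__version__` (stripped of any local suffix such as `+cu128`) rather than hardcoded strings.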
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` around lines 14 - 16, Replace the hardcoded
k2 pip install line with a compatibility check that inspects the runtime PyTorch
and CUDA versions (e.g., using torch.__version__ and torch.version.cuda) and
selects the matching k2 wheel from the k2 CUDA wheel index (or else print a
clear error/requirement message); alternatively add a short documented note
above the existing pip install explaining the exact required CUDA and PyTorch
versions and listing the alternate wheel tags (CUDA 12.6, 12.8, 12.9, 13.0 for
torch 2.10.0) so users can choose the correct pip target instead of the pinned
cuda12.8.torch2.10.0 wheel.

Comment on lines +27 to +35
```shell
!mkdir -p download
!ln -sfv /kaggle/input/datasets/ download/VCTK

# Build monotonic_align C extension
!bash prepare.sh --stage -1 --stop_stage -1

# Create manifests, spectrograms, tokens, and data splits
!bash prepare.sh --stage 1 --stop_stage 4
```

⚠️ Potential issue | 🟠 Major

Prefer --local-data-dir over symlinking an ambiguous Kaggle path.

Line 28 symlinks a generic directory and may not resolve to the VCTK root. Use the new prepare.sh --local-data-dir option for a precise, reproducible setup.

Suggested doc update:

```diff
 !mkdir -p download
-!ln -sfv /kaggle/input/datasets/ download/VCTK
-
-# Build monotonic_align C extension
-!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus --stage 0 --stop_stage 0

 # Create manifests, spectrograms, tokens, and data splits
 !bash prepare.sh --stage 1 --stop_stage 4
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/infer_pretrained.md` around lines 27 - 35, The current notebook
creates an ambiguous symlink to /kaggle/input/datasets/ then runs prepare.sh;
instead remove the ln -sfv step and call the script with the explicit
--local-data-dir argument (prepare.sh --local-data-dir /path/to/VCTK) so the
prep stages use the precise VCTK root; update the two prepare.sh invocations
(the build/extension call and the manifest/spectrogram/token/data-split call) to
pass --local-data-dir where needed and delete the symlink creation to make setup
reproducible.

Comment thread egs/vctk/TTS/knowledge.md
1. **Acoustic Model** (Tacotron 2, FastSpeech): converts **Text** into a **Mel-spectrogram** (an image-like time-frequency representation of the audio).
2. **Vocoder** (WaveNet, HiFi-GAN): converts the **Mel-spectrogram** into a **Waveform** (the raw audio signal sent to the speaker).

**Drawback of the old two-stage approach:** error accumulation. If the Acoustic Model predicts a slightly blurry spectrogram, the Vocoder amplifies ("khuyếch đại") that blur into artifacts or robotic noise.

⚠️ Potential issue | 🟡 Minor

Fix spelling in the Vietnamese technical term.

Line 13 uses “khuyếch đại”; the standard spelling is “khuếch đại”.

🧰 Tools
🪛 LanguageTool

[grammar] ~13-~13: Ensure spelling is correct
Context: ... dự đoán Spectrogram hơi mờ, Vocoder sẽ khuyếch đại cái "mờ" đó thành tiếng nhiễu (arti...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/knowledge.md` at line 13, Replace the misspelled Vietnamese
technical term "khuyếch đại" with the correct spelling "khuếch đại" in the
sentence describing the drawback (Error Accumulation) so the line reads that the
Vocoder will "khuếch đại" the blurred spectrogram into artifacts or robotic
noise; update the occurrence of "khuyếch đại" in the TTS knowledge text to the
correct form.

@@ -0,0 +1,64 @@
# Train VITS Model From Scratch

### Cell 1: Install Dependencies
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix heading level jump (H1 → H3).

Line 3 should be ## instead of ### to satisfy Markdown heading progression.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 3-3: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/train_from_scratch.md` at line 3, Heading "Cell 1: Install
Dependencies" uses H3 (###) causing a jump from H1 to H3; change the Markdown
heading to H2 by replacing the leading "###" with "##" so the file's heading
progression is H1 → H2 → ... and validates properly.

Comment on lines +27 to +35
```shell
!mkdir -p download
!ln -sfv /kaggle/input/datasets/ download/VCTK

# Build monotonic_align C extension
!bash prepare.sh --stage -1 --stop_stage -1

# Create manifests, spectrograms, tokens, and data splits
!bash prepare.sh --stage 1 --stop_stage 6
```

⚠️ Potential issue | 🟠 Major

Use --local-data-dir directly instead of a fragile manual symlink target.

Line 28 points to a broad directory (/kaggle/input/datasets/) rather than a concrete VCTK root, which can break downstream prep. This PR already introduces --local-data-dir; use it in the doc to make the flow deterministic.

Suggested doc update:

```diff
 !mkdir -p download
-!ln -sfv /kaggle/input/datasets/ download/VCTK
-
-# Build monotonic_align C extension
-!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --stage -1 --stop_stage -1
+!bash prepare.sh --local-data-dir /kaggle/input/vctk-corpus --stage 0 --stop_stage 0

 # Create manifests, spectrograms, tokens, and data splits
 !bash prepare.sh --stage 1 --stop_stage 6
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@egs/vctk/TTS/train_from_scratch.md` around lines 27 - 35, The doc currently
creates a fragile symlink to a broad path instead of using the new CLI flag;
update the instructions to remove the mkdir/ln steps and call prepare.sh with
the explicit --local-data-dir pointing at the actual VCTK dataset root (use the
same --local-data-dir flag introduced in the PR) so subsequent steps
(monotonic_align build via prepare.sh and stages 1–6) consume the correct
dataset; reference the prepare.sh invocation and the --local-data-dir flag when
making this change.
