Support applying pitch from wave part #2020
Merged
stakira merged 7 commits into stakira:master on Mar 22, 2026
Pull request overview
Adds a new workflow to extract pitch (PITD) from an existing wave part and apply it to an existing voice part, and extends transcription-related cancellation/interruption support so long-running ONNX inference can be stopped when the progress dialog is closed.
Changes:
- Add a voice-part context submenu (“Apply pitch from…”) listing wave parts as pitch sources, and run RMVPE over the overlapped time window.
- Add cancellation-driven interruption plumbing for SOME/GAME/RMVPE inference via ONNX `RunOptions.Terminate`.
- Extend RMVPE application to support time offsets and apply PITD via undoable curve commands.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| OpenUtau/Views/MainWindow.axaml.cs | Builds the new context menu, implements “apply pitch from wave part”, and updates transcription flow cancellation + offset application. |
| OpenUtau/Views/MainWindow.axaml | Adds the “Apply pitch from…” submenu binding in the parts context menu. |
| OpenUtau/ViewModels/MainWindowViewModel.cs | Extends context menu args with PartApplyPitchMenuItems. |
| OpenUtau/Strings/Strings.axaml | Adds new context-menu and progress strings; updates RMVPE checkbox label wording. |
| OpenUtau/Strings/Strings.zh-CN.axaml | Adds Chinese translations for the new strings and RMVPE label. |
| OpenUtau.Core/Analysis/MidiExtractor.cs | Adds cancellation handling and a base Interrupt() hook for extractors. |
| OpenUtau.Core/Analysis/Some.cs | Implements interruption by terminating ONNX runs and mapping termination to cancellation. |
| OpenUtau.Core/Analysis/Game.cs | Implements interruption/termination for GAME’s multiple ONNX sessions and pipeline. |
| OpenUtau.Core/Analysis/Rmvpe.cs | Adds inference windowing + interruption, offset support when applying pitch, and uses undoable curve command to set PITD. |
Please merge #2019 first.
Motivation
The previous PR #2004 introduced RMVPE as an optional step after MIDI transcription that extracts pitch from the wave part and applies it to the transcribed voice part.
This PR extends the scenario: users should be able to extract pitch from a wave part and apply it to an existing voice part.
Implementation
A new context menu item, `Apply pitch from...`, is added to voice parts in the track window. The user can then select a wave part, extract pitch from it, and apply that pitch to the voice part. This action takes the time offset into account, so the user can align the audio with the notes beforehand. To avoid wasting computation, pitch is extracted only from the region where the voice part and the audio part overlap (with 1000 ms of padding for context).
This PR also includes some minor changes to the string resources.
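
The overlap-with-padding rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in this PR: the function name `extraction_window` is hypothetical, all positions are assumed to be milliseconds on the project timeline, and clamping the padded window to the wave part's bounds is an assumption made here.

```python
from typing import Optional, Tuple

PADDING_MS = 1000.0  # context padding mentioned in the PR description


def extraction_window(
    voice_start_ms: float,
    voice_end_ms: float,
    wave_start_ms: float,
    wave_end_ms: float,
    padding_ms: float = PADDING_MS,
) -> Optional[Tuple[float, float]]:
    """Return the (start, end) span of audio to analyze, or None if the
    voice part and the wave part do not overlap."""
    overlap_start = max(voice_start_ms, wave_start_ms)
    overlap_end = min(voice_end_ms, wave_end_ms)
    if overlap_start >= overlap_end:
        return None
    # Assumption: pad for context, but never beyond the wave part,
    # since there is no audio outside it.
    start = max(wave_start_ms, overlap_start - padding_ms)
    end = min(wave_end_ms, overlap_end + padding_ms)
    return start, end
```

For a voice part spanning 5000 to 9000 ms over a wave part spanning 0 to 20000 ms, only 4000 to 10000 ms of audio would be analyzed instead of the full 20 seconds.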