- OS: Windows 11 (64-bit, version 25H2)
- IDE: Visual Studio 2022 (version 17.14.19)
- Framework: OpenFrameworks (version 0.12.1) + addons (ofxCv, ofxOpenCv, ofxGui)
- Compiler: MSVC v143 (Visual Studio 2022)
Important: OpenFrameworks 0.12.1 is required. Versions 0.11.x and earlier cannot load DNN models (the project will not even compile).
/ yolomug
/bin
/data
/assets # images, fonts, etc.
/model # YOLO hand detection model files
/songs # video files, thumbnails, and chart data
yolomug.exe # compiled executable
/dll # empty but required by OpenFrameworks
/src # source code files
README.md # this file
addons.make # list of addons used
icon.rc
yolomug.sln # Visual Studio solution file
yolomug.vcxproj
yolomug.vcxproj.filters
You can find the compiled exe at bin\yolomug.exe.
To build from source:
- Copy the project folder into `openFrameworks/apps/myApps/`. Replace the first part of the path with your OpenFrameworks installation path (e.g. `D:\archive\2025A\CS4187\of-0.12.1\apps\myApps`).
- You should be able to run the project by opening `yolomug.sln` in Visual Studio 2022.
- Switch to Release x64 mode before building for better performance.
Note: The YOLO model runs on the CPU and is computationally intensive. On my machine (AMD Ryzen 7 8745HS), running in Debug mode causes noticeable lag, while Release mode runs smoothly. Therefore, please always use Release mode for testing and playing.
If you encounter issues with paths or missing addons, please first check the version of Visual Studio (must be 2022, minor version should be irrelevant) and OpenFrameworks (must be 0.12.x, though 0.12.1 is recommended). If the problem persists, try the following steps:
- In your `openFrameworks/apps/myApps` path, manually create a new project folder called `yolomug`.
- Copy the `/src` and `/bin/data` folders from this project into the newly created `yolomug` folder.
- Open the OpenFrameworks Project Generator, select the `yolomug` folder, and add the required addons: `ofxCv`, `ofxOpenCv`, and `ofxGui`. Click generate.
- This should resolve any path-related issues.
This project is an OpenFrameworks application called Hand Dance, a rhythm game controlled by hand movements detected via a YOLO-based hand detection model. The game features multiple stages, including menu navigation, video selection, gameplay, note creation, and a finish screen. The system integrates camera input, deep neural network (DNN) hand detection, and a note/scoring pipeline to provide an interactive gaming experience.
flowchart TD
A[ofApp] --> |"setup()/update()/draw()"| B[StageManager]
A --> |"update(): hand detection"| C["Hand Detection (YOLO)"]
B --> Stages
C --> B
subgraph Stages
S1[SMenu]
S2[SSelect]
S3[SPlay]
S4[SCreate]
S5[SFinish]
S1 --> |Select mode| S2
S2 --> |Choose song| S3
S2 --> |Choose song| S4
S3 --> S5
S4 --> S5
end
subgraph Notes
N0[Note]
N1[NTap]
N2[NLong]
N3[NSlide]
N0 --> N1
N0 --> N2
N0 --> N3
V[enum Verdict]
N1 --> V
N2 --> V
N3 --> V
SM[ScoreManager]
V --> SM
end
subgraph IO
IORead[Note File Reader]
IOWrite[Note File Writer]
end
S3 --> IORead
S4 --> IOWrite
IORead --> Notes
IOWrite --> Notes
A --> FM[FontManager]
A --> HM[HoverManager]
- Hand tracking: Utilizes a lightweight YOLO-based model for real-time hand tracking.
- Rhythm game mechanics: Implements a note system with different note types (tap, long, slide) and scoring based on player performance.
- Chart creation: Allows users to create and edit note charts for custom gameplay experiences.
- Multiple input methods: Aside from the gameplay stage, users can navigate the menu, select songs or even create their own charts using either hand hovering or mouse/keyboard input.
- Modular design: The project is structured with a clear separation of concerns, making it easy to extend functionality or modify existing features.
- Launch the application.
- Main menu: you can see yourself through the camera feed. Hover over "Play" or "Create" to select a mode. You can also use keyboard input: P for play and C for create.
- Song selection: hover over a song thumbnail to select it. You can also press number keys 1 to 9 to select a song (each song's number is displayed in parentheses).
- Play mode/Create mode: see below for details.
For best performance, please ensure only one hand is visible to the camera.
The window resolution matches your webcam resolution, which is 1280x720 by default. If your webcam has a different resolution, you may see slight UI misalignment, but gameplay should not be affected, since notes are positioned in normalized coordinates. Still, for the best experience, please use a 1280x720 webcam or modify the code to fit your webcam's resolution.
Basic Controls
- Notes will appear in sync with the video. Move your hand into the note area to hit notes.
- For tap notes, simply move your hand into the note area when the note reaches the hit line.
- For long notes, keep your hand in the note area for the duration of the note.
- For slide notes, move your hand along the path of the note from start to end.
Visual Hints
- A blue note means "Early", an orange note means "Late".
- A shrinking circle around the note indicates the timing window for hitting the note. When it completely collapses, it is the exact time to hit the note.
- You will see the verdict (Pure, Far, Lost) displayed briefly when you hit or miss a note.
- Note: the hitbox of each note is larger than its visual representation (it matches the size of the shrinking circle before it collapses) to accommodate hand detection inaccuracies.
Scoring and Combo
- Your combo and score will be displayed on the top left and right corners, respectively.
- Your current combo increases with each successful hit (Pure or Far) and resets to zero upon a Lost note.
- Your score increases based on the verdict of each note hit. Pure gives 100% score, Far gives 65%, and Lost gives 0%. The maximum score is 1000000.
Finish Screen
- At the end of the video, a finish screen will display your performance statistics, including total score, max combo, and counts of Pure, Far, and Lost notes.
- Please exit the application to play again.
Basic Controls
There are two ways to add notes: mouse mode and keyboard mode.
Mouse Mode
In mouse mode, you can add notes at the current mouse position:
- Left click: Tap note
- Middle button: Hold to create a Long note. Release to set the end time.
- Right button: Hold to create a Slide note from the current mouse position. Drag to set the end position, then release to set the end time.
Keyboard Mode
In keyboard mode, you can add notes at the current hand position:
- Any normal key (not space or S): Tap note
- Space: Hold to create a Long note. Release to set the end time.
- S: Hold to create a Slide note from the current hand position. Release to set the end position.
Visual Hints
- After you add a note, it will appear on the screen with its type indicated by text and color.
- The number of notes created will be displayed at the top left corner.
- The current time (MM:SS.sss) is displayed at the top right corner to help you time your notes.
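For illustration, the MM:SS.sss display could be produced by a small helper like the one below; the function name is an assumption, not the project's actual code:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a time in seconds as MM:SS.sss, matching the
// timer shown in the top right corner of Create mode.
std::string formatTime(double seconds) {
    int minutes = static_cast<int>(seconds) / 60;
    double rem = seconds - minutes * 60;     // remaining seconds with millis
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02d:%06.3f", minutes, rem);
    return buf;
}
```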
Saving Chart
- The chart will be automatically saved to `songs/<videoName>_notes.txt` when the video ends.
- `songs/<videoName>_notes.txt` has higher priority than the default chart `songs/<videoName>.txt`, so you can test your created chart immediately. You can delete the `_notes.txt` file to revert to the default chart.
- The txt file uses the following format:
0 t x y # Tap note at time t at position (x, y)
1 t x y d # Long note at time t at position (x, y) with duration d
2 t x y d ex ey # Slide note at time t at position (x, y) with duration d and end position (ex, ey)
Where t and d are in seconds, and x, y, ex, ey are in normalized coordinates (0 to 1).
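A minimal reader for this format might look like the sketch below; the `NoteData` and `parseChart` names are illustrative, not the project's actual Note File Reader:

```cpp
#include <sstream>
#include <string>
#include <vector>

// One parsed chart line. Extra fields are read only for the note types
// that have them, per the format above.
struct NoteData {
    int type = 0;              // 0 = tap, 1 = long, 2 = slide
    double t = 0.0;            // hit time in seconds
    float x = 0.f, y = 0.f;    // normalized start position (0..1)
    double d = 0.0;            // duration (long/slide only)
    float ex = 0.f, ey = 0.f;  // normalized end position (slide only)
};

std::vector<NoteData> parseChart(std::istream& in) {
    std::vector<NoteData> notes;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        NoteData n;
        if (!(ss >> n.type >> n.t >> n.x >> n.y)) continue;  // skip blanks
        if (n.type >= 1) ss >> n.d;                          // long & slide
        if (n.type == 2) ss >> n.ex >> n.ey;                 // slide only
        notes.push_back(n);
    }
    return notes;
}
```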
Finish Screen
- At the end of the video, a finish screen will display the total number of notes created.
- Please exit the application to create another chart or to switch to play mode.
The codebase is clearly organized, well named, and well commented for easy understanding. Below are some key components and their functionalities.
- In `ofApp::setup()`, the camera is initialized, and the YOLOv3-tiny-PRN model is loaded using OpenFrameworks' DNN module. The model files (`.cfg` and `.weights`) are loaded from the `model/` directory.
- This model features Partial Residual Networks (PRN) [1], which provide more information to each layer through gradient combination. Compared to YOLOv3-tiny, it maintains accuracy while reducing the model size (8.86M to 4.95M) and increasing inference speed, making it suitable for real-time applications.
- In `ofApp::update()`, each new camera frame is passed to the DNN for hand detection. The frame is preprocessed into a blob: downscaled to 320x320, converted to the RGB color space, and normalized, then fed into the darknet model.
- The DNN outputs bounding boxes, confidence scores, and class IDs for potential matches.
- Non-maximum suppression (NMS) is applied to filter overlapping boxes based on confidence; the detection with the highest confidence is used, and its bounding box is stored in an `ofRectangle` variable called `handBox`.
- `handBox` is then used to interact with the game stages, allowing hover-based selection and gameplay control.
- If no detection is found for a short period, `handBox` is reset to the top-left corner.
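The selection and reset logic described above can be sketched in plain C++ without the OpenCV/OpenFrameworks types; the `Detection`/`HandTracker` names and the 0.5 s timeout are illustrative assumptions, not the project's actual API:

```cpp
#include <vector>

// Minimal stand-ins for ofRectangle and a post-NMS detection result.
struct Rect { float x = 0, y = 0, w = 0, h = 0; };
struct Detection { Rect box; float confidence; };

struct HandTracker {
    Rect handBox;               // last known hand bounding box
    float lostTime = 0.f;       // seconds since the last detection
    float resetAfter = 0.5f;    // assumed timeout before resetting

    void update(const std::vector<Detection>& dets, float dt) {
        // Keep the surviving detection with the highest confidence.
        const Detection* best = nullptr;
        for (const auto& d : dets)
            if (!best || d.confidence > best->confidence) best = &d;
        if (best) {
            handBox = best->box;
            lostTime = 0.f;
        } else if ((lostTime += dt) >= resetAfter) {
            handBox = Rect{};   // reset to the top-left corner
        }
    }
};
```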
The app is organized as stages, managed by a singleton StageManager class. The following classes implement interface Stage:
- `SMenu`: The main menu where users choose "Play" or "Create" mode by hovering over options or pressing the "P" or "C" keys.
- `SSelect`: The song selection screen where users browse available songs (it scans the `data/songs/` directory for `.mp4` files) and select one to play or create notes for.
- `SPlay`: The gameplay stage where notes appear in sync with the selected video. Players hit notes by moving their hand into the note area.
- `SCreate`: The chart creation stage where users add and save notes for the selected song.
- `SFinish`: The finish screen that displays performance statistics after a play session or a summary after chart creation.
Stage management simplifies event routing and separates responsibilities. In addition, the Stage interface provides input methods like keyPressed(), mouseMoved(), etc., which are called by ofApp and routed to the current stage.
The HoverManager class manages hover-based interactions across different stages. It checks if the handBox intersects with the corresponding buttons and draws a circular progress indicator (a pair of concentric circles that gradually fills up) to provide visual feedback. Once the hover duration exceeds a threshold, the associated action is triggered.
It also ensures that progress accumulates only while the user hovers over the same button continuously; moving to another button or away resets the progress.
It is used in SMenu for mode selection and in SSelect for song selection.
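The accumulate-and-reset rule can be sketched as follows; the `HoverProgress` name and the 1-second dwell threshold are assumptions for illustration, not the project's actual values:

```cpp
// Tracks dwell time on a single button; switching buttons resets progress.
struct HoverProgress {
    int currentButton = -1;     // -1 = hovering nothing
    float held = 0.f;           // seconds accumulated on currentButton
    float threshold = 1.0f;     // assumed dwell time before triggering

    // Returns true when the hover completes and the action should fire.
    bool update(int hoveredButton, float dt) {
        if (hoveredButton != currentButton) {   // moved away or switched
            currentButton = hoveredButton;
            held = 0.f;
        }
        if (currentButton < 0) return false;    // nothing hovered
        held += dt;
        return held >= threshold;
    }
};
```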
All types of notes inherit from the base Note class, which defines common properties and methods such as position, timing and drawing. Each note type implements its own update() and draw() methods to handle specific behaviors and visual representations.
There are 3 types of notes:
- `NTap(time, x, y)`: A single tap note at time `t` at position `(x, y)`.
- `NLong(time, x, y, duration)`: A long note starting at time `t` at position `(x, y)` with a specified duration.
- `NSlide(time, x, y, duration, endX, endY)`: A slide note starting at time `t` at position `(x, y)` with a specified duration and ending at `(endX, endY)`.
Hitting or failing to hit notes results in different verdicts, defined in the Verdict enum:
- Pure (`VPure`), Far (`VFar`), Lost (`VLost`), and None (`VNone`, for pending notes).
Details:
- Tap: A hit within -300 ms to +200 ms is Pure, within ±400 ms is Far, else Lost.
- Long: A total hold duration ≥ 75% of the note duration is Pure, ≥ 50% is Far, else Lost.
- Slide: Both head and tail hit within -300 ms to +200 ms is Pure, both within ±400 ms is Far, else Lost.
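As a sketch, the tap windows above translate into a judgment function like the one below (offset = hit time minus note time, in milliseconds; the enum and function names are illustrative, not the project's actual API):

```cpp
#include <cmath>

enum class Verdict { Pure, Far, Lost };

// Judge a tap note from its timing offset. A note that was never hit
// is Lost regardless of timing.
Verdict judgeTap(double offsetMs, bool hit) {
    if (!hit) return Verdict::Lost;
    if (offsetMs >= -300.0 && offsetMs <= 200.0) return Verdict::Pure;
    if (std::fabs(offsetMs) <= 400.0) return Verdict::Far;
    return Verdict::Lost;
}
```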
In addition, the game maintains a list of active notes currently on screen while playing. Each note defines when it enters the screen via hasEntered(deltaTime) and when it should be removed via isExpired(deltaTime). The SPlay stage updates and draws only the active notes, improving performance.
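The culling this enables might be sketched like this; the approach and linger margins are assumed values, not the project's actual timings:

```cpp
#include <vector>

// Minimal stand-in for the Note base class: a note is on screen from
// (time - approach) until (time + linger).
struct Note {
    double time = 0.0;                     // scheduled hit time (seconds)
    double approach = 2.0, linger = 0.5;   // assumed on-screen margins
    bool hasEntered(double now) const { return now >= time - approach; }
    bool isExpired(double now) const { return now > time + linger; }
};

// Collect only the notes whose lifetime overlaps the current time,
// so update()/draw() touch a small window of the full chart.
std::vector<const Note*> activeNotes(const std::vector<Note>& all, double now) {
    std::vector<const Note*> active;
    for (const auto& n : all)
        if (n.hasEntered(now) && !n.isExpired(now)) active.push_back(&n);
    return active;
}
```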
When playing, ScoreManager keeps track of:
- The number of Pure, Far, and Lost notes.
- The current combo and score.
- The maximum combo achieved during the session.
At the end of the play session, these statistics are displayed on the SFinish stage.
The score is scaled to a maximum of 1,000,000; all notes share the same weight, with each Pure worth 100%, Far 65%, and Lost 0% of a note's value.
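This scaling works out to a simple ratio; a sketch (function name illustrative, not the project's actual ScoreManager API):

```cpp
// Final score out of 1,000,000: equal note weights,
// Pure = 100%, Far = 65%, Lost = 0%.
long long finalScore(int pure, int far_, int lost) {
    int total = pure + far_ + lost;
    if (total == 0) return 0;
    double ratio = (pure * 1.0 + far_ * 0.65) / total;
    return static_cast<long long>(ratio * 1000000.0 + 0.5);  // round
}
```

For example, an all-Pure run scores exactly 1,000,000, and a chart with equal Pure and Far counts scores 825,000.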
For chart creation, users can add notes using either mouse input or hand hovering:
- Mouse mode will add notes at mouse position:
- Left click: Tap
- Middle press: Long (hold and release)
- Right press: Slide (start at press pos, then hold and drag, end at release pos)
- Keyboard mode will add notes at current hand position:
- Any normal key press: Tap
- Hold Space: Long (hold and release)
- Hold S: Slide (start at hand pos, end at hand pos on release)
Output chart files are saved at songs/*_notes.txt where * is the song name.
/bin/data
/assets
hand.png # hand icon image
longe.png # long note image
slide.png # slide note image
tap.png # tap note image
verdana.ttf # font file
/model
yolo-v3-tiny-prn.cfg
yolo-v3-tiny-prn.weights
/songs
Last Moment.jpg
Last Moment.mp4
Last Moment.txt
Undying Macula.jpg
Undying Macula.mp4
Undying Macula.txt
All videos and images are in 1280x720 resolution. You can add your own songs by placing the video file (.mp4), thumbnail image (.jpg), and chart data (.txt) in the songs/ directory. Or, you can skip chart data and create your own chart using the Create mode.
- Improve hand detection accuracy and robustness, possibly by training a custom model or using a more advanced architecture.
- Enable multi-hand detection for two-player mode or more complex interactions.
- Enhance the note chart editor with more features, such as undo/redo, note snapping, and visual timeline.
- Add sound effects and visual effects when hitting notes to make the game more engaging.
The YOLOv3-tiny-PRN model is provided by cansik under the MIT License.
The icons used in the game are sourced from Flaticon.
The songs used in the demo are from Arcaea, a rhythm game developed by lowiro. The music videos are from official sources: Last Moment and Undying Macula.
All the other code and assets are created by the author.
[1] C.-Y. Wang, H.-Y. M. Liao, P.-Y. Chen, and J.-W. Hsieh, “Enriching Variety of Layer-Wise Learning Information by Gradient Combination,” 2019 ICCV Workshop on Low-Power Computer Vision, Oct. 2019, doi: 10.1109/iccvw.2019.00303.