mojimoon/CS4187_HandDance

Hand Dance

Setup Instructions

Development Environment

  • OS: Windows 11 (64-bit, version 25H2)
  • IDE: Visual Studio 2022 (version 17.14.19)
  • Framework: OpenFrameworks (version 0.12.1) + addons (ofxCv, ofxOpenCv, ofxGui)
  • Compiler: MSVC v143 (Visual Studio 2022)

Important: OpenFrameworks 0.12.1 is required. With 0.11.x or earlier, the DNN model cannot be loaded; the project does not even compile.

Folder Structure

/yolomug
    /bin
        /data
            /assets         # images, fonts, etc.
            /model          # YOLO hand detection model files
            /songs          # video files, thumbnails, and chart data
        yolomug.exe         # compiled executable
    /dll                    # empty but required by OpenFrameworks
    /src                    # source code files
    README.md               # this file
    addons.make             # list of addons used
    icon.rc
    yolomug.sln             # Visual Studio solution file
    yolomug.vcxproj
    yolomug.vcxproj.filters

How to Build

You can find the compiled exe at bin\yolomug.exe.

To build from source:

  1. Copy the project folder into the apps/myApps/ directory of your OpenFrameworks installation (e.g. D:\archive\2025A\CS4187\of-0.12.1\apps\myApps).
  2. Open yolomug.sln in Visual Studio 2022. The project should build and run from there.
  3. Switch to Release x64 mode before building for better performance.

Note: The YOLO model runs on the CPU and is computationally intensive. On my machine (AMD Ryzen 7 8745HS), running in Debug mode causes noticeable lag, while Release mode runs smoothly. Therefore, please always use Release mode for testing and playing.

If you encounter issues with paths or missing addons, please first check the version of Visual Studio (must be 2022, minor version should be irrelevant) and OpenFrameworks (must be 0.12.x, though 0.12.1 is recommended). If the problem persists, try the following steps:

  1. In your openFrameworks/apps/myApps path, manually create a new project folder called yolomug.
  2. Copy the /src and /bin/data folders from this project into the newly created yolomug folder.
  3. Open the OpenFrameworks Project Generator, select the yolomug folder, and add the required addons: ofxCv, ofxOpenCv, and ofxGui. Click generate.
  4. Open the regenerated solution and build it again. Regenerating the project files should resolve any path-related issues.

Project Overview

Summary

This project is an OpenFrameworks application called Hand Dance, a rhythm game controlled by hand movements detected via a YOLO-based hand detection model. The game features multiple stages, including menu navigation, video selection, gameplay, note creation, and a finish screen. The system integrates camera input, deep neural network (DNN) hand detection, and a note/scoring pipeline to provide an interactive gaming experience.

Project Structure

flowchart TD
    A[ofApp] --> |"setup()/update()/draw()"| B[StageManager]
    A --> |"update(): hand detection"| C["Hand Detection (YOLO)"]
    B --> Stages
    C --> B

    subgraph Stages
        S1[SMenu]
        S2[SSelect]
        S3[SPlay]
        S4[SCreate]
        S5[SFinish]
        S1 --> |Select mode| S2
        S2 --> |Choose song| S3
        S2 --> |Choose song| S4
        S3 --> S5
        S4 --> S5
    end

    subgraph Notes
        N0[Note]
        N1[NTap]
        N2[NLong]
        N3[NSlide]
        N0 --> N1
        N0 --> N2
        N0 --> N3
        V[enum Verdict]
        N1 --> V
        N2 --> V
        N3 --> V
        SM[ScoreManager]
        V --> SM
    end

    subgraph IO
        IORead[Note File Reader]
        IOWrite[Note File Writer]
    end
    S3 --> IORead
    S4 --> IOWrite
    IORead --> Notes
    IOWrite --> Notes

    A --> FM[FontManager]
    A --> HM[HoverManager]

Features

  • Hand tracking: Utilizes a lightweight YOLO-based model for real-time hand tracking.
  • Rhythm game mechanics: Implements a note system with different note types (tap, long, slide) and scoring based on player performance.
  • Chart creation: Allows users to create and edit note charts for custom gameplay experiences.
  • Multiple input methods: Aside from the gameplay stage, users can navigate the menu, select songs or even create their own charts using either hand hovering or mouse/keyboard input.
  • Modular design: The project is structured with a clear separation of concerns, making it easy to extend functionality or modify existing features.

How to Use

  1. Launch the application.
  2. Main menu: you can see yourself through the camera feed. Hover over "Play" or "Create" to select a mode. You can also use keyboard input: P for play and C for create.
  3. Song selection: hover over a song thumbnail to select it. You can also press a number key from 1 to 9 to select a song (the number is displayed in parentheses).
  4. Play mode/Create mode: see below for details.

For best performance, please ensure only one hand is visible to the camera.

The window resolution follows your webcam resolution, which defaults to 1280x720. If your webcam uses a different resolution, you may see slight UI misalignment, but gameplay should not be affected because notes are positioned in normalized coordinates. Still, for the best experience, please use a 1280x720 webcam or modify the code to fit your webcam resolution.

Play Mode Instructions

Basic Controls

  • Notes will appear in sync with the video. Move your hand into the note area to hit notes.
  • For tap notes, simply move your hand into the note area when the note reaches the hit line.
  • For long notes, keep your hand in the note area for the duration of the note.
  • For slide notes, move your hand along the path of the note from start to end.

Visual Hints

  • A blue note means "Early", an orange note means "Late".
  • A shrinking circle around the note indicates the timing window for hitting the note. When it completely collapses, it is the exact time to hit the note.
  • You will see the verdict (Pure, Far, Lost) displayed briefly when you hit or miss a note.
  • Note: the hitbox of each note is larger than its visual representation (it matches the size of the shrinking circle before it collapses) to accommodate hand detection inaccuracies.

Scoring and Combo

  • Your combo and score are displayed in the top-left and top-right corners, respectively.
  • Your current combo increases with each successful hit (Pure or Far) and resets to zero upon a Lost note.
  • Your score increases based on the verdict of each note hit. Pure gives 100% score, Far gives 65%, and Lost gives 0%. The maximum score is 1000000.

Finish Screen

  • At the end of the video, a finish screen will display your performance statistics, including total score, max combo, and counts of Pure, Far, and Lost notes.
  • Please exit the application to play again.

Create Mode Instructions

Basic Controls

There are two ways to add notes: mouse mode and keyboard mode.

Mouse Mode

In mouse mode, you can add notes at the current mouse position:

  • Left click: Tap note
  • Middle button: Hold to create a Long note. Release to set the end time.
  • Right button: Hold to create a Slide note from the current mouse position. Drag to set the end position, release to set the end time.

Keyboard Mode

In keyboard mode, you can add notes at the current hand position:

  • Any other key (except Space and S): Tap note
  • Space: Hold to create a Long note. Release to set the end time.
  • S: Hold to create a Slide note from the current hand position. Release to set the end position.

Visual Hints

  • After you add a note, it will appear on the screen with its type indicated by text and color.
  • The number of notes created will be displayed at the top left corner.
  • The current time (MM:SS.sss) is displayed at the top right corner to help you time your notes.

Saving Chart

  • The chart will be automatically saved to songs/<videoName>_notes.txt when the video ends.
  • The songs/<videoName>_notes.txt has higher priority than the default chart songs/<videoName>.txt so you can test your created chart immediately. You can delete the _notes.txt file to revert to the default chart.
  • The txt file is in the following format:
0 t x y          # Tap note at time t at position (x, y)
1 t x y d        # Long note at time t at position (x, y) with duration d
2 t x y d ex ey  # Slide note at time t at position (x, y) with duration d and end position (ex, ey)

Where t and d are in seconds, and x, y, ex, ey are in normalized coordinates (0 to 1).
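
The line format above can be read with a small stream-based parser. The following is a minimal sketch, not the project's actual reader: the ChartNote struct and both function names are hypothetical, and it assumes fields are separated by whitespace and that anything after the expected fields (such as a trailing comment) can be ignored.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain-data record for one chart line; not the project's Note class.
struct ChartNote {
    int type = 0;               // 0 = tap, 1 = long, 2 = slide
    double t = 0.0;             // start time in seconds
    double x = 0.0, y = 0.0;    // start position, normalized to [0, 1]
    double d = 0.0;             // duration in seconds (long and slide only)
    double ex = 0.0, ey = 0.0;  // end position, normalized (slide only)
};

// Parses one line into `out`. Returns false for blank or malformed lines,
// in which case `out` is left untouched.
bool parseChartLine(const std::string& line, ChartNote& out) {
    std::istringstream in(line);
    ChartNote n;
    if (!(in >> n.type >> n.t >> n.x >> n.y)) return false;
    if (n.type < 0 || n.type > 2) return false;
    if (n.type >= 1 && !(in >> n.d)) return false;
    if (n.type == 2 && !(in >> n.ex >> n.ey)) return false;
    out = n;
    return true;
}

// Reads every valid line from a stream, skipping lines that do not parse.
std::vector<ChartNote> parseChart(std::istream& in) {
    std::vector<ChartNote> notes;
    std::string line;
    ChartNote n;
    while (std::getline(in, line)) {
        if (parseChartLine(line, n)) notes.push_back(n);
    }
    return notes;
}
```

Passing an std::ifstream opened on the chart file to parseChart would return the notes in file order. Multiplying x and y by the window width and height then gives pixel positions.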

Finish Screen

  • At the end of the video, a finish screen will display the total number of notes created.
  • Please exit the application to create another chart or to switch to Play mode.

Details

The codebase is clearly organized, well named and commented for easy understanding. Below are some key components and their functionalities.

Camera and Hand Tracking

  • In ofApp::setup(), the camera is initialized and the YOLOv3-tiny-PRN model is loaded using the OpenCV DNN module that ships with OpenFrameworks. The model files (.cfg and .weights) are read from the model/ directory.
  • This model uses Partial Residual Networks (PRN) [1], which enrich the information available to each layer through gradient combination. Compared to YOLOv3-tiny, it maintains accuracy while reducing the model size (8.86M to 4.95M) and increasing inference speed, making it suitable for real-time applications.
  • In ofApp::update(), each new camera frame is passed to the DNN for hand detection. The frame is first converted into a blob: it is downscaled to 320x320, converted to RGB, and normalized. The blob is then fed into the Darknet model.
  • The DNN outputs bounding boxes, confidence scores, and class IDs for potential matches.
  • Non-maximum suppression (NMS) filters overlapping boxes based on confidence. The detection with the highest confidence is kept, and its bounding box is stored in an ofRectangle variable called handBox.
  • The handBox is then used to interact with the game stages, allowing for hover-based selection and gameplay control.
  • If no detection is found for a short period, handBox is reset to the top-left corner.
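
The NMS and best-box selection steps can be illustrated without any library dependency. This is a sketch of greedy NMS in plain C++, not the project's code: the Detection struct, the function names, and the two thresholds are assumptions made for illustration. OpenCV also provides cv::dnn::NMSBoxes, which a project could call instead of a hand-written loop.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical detection record: axis-aligned box plus confidence score.
struct Detection {
    float x, y, w, h;   // top-left corner and size, in pixels
    float confidence;   // score in [0, 1]
};

// Intersection over union of two boxes; 0 when they do not overlap.
float iou(const Detection& a, const Detection& b) {
    float x1 = std::max(a.x, b.x);
    float y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: drop low-confidence boxes, visit the rest from most to least
// confident, and keep a box only if it does not overlap an already kept box
// by more than iouThreshold.
std::vector<Detection> nonMaxSuppression(std::vector<Detection> dets,
                                         float minConfidence,
                                         float iouThreshold) {
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [&](const Detection& d) {
                                  return d.confidence < minConfidence;
                              }),
               dets.end());
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) {
                  return a.confidence > b.confidence;
              });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (iou(d, k) > iouThreshold) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;  // sorted by confidence, so kept.front() is the best box
}
```

If the returned vector is non-empty, its first element plays the role of handBox; an empty result corresponds to a frame with no detection.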

Stage Management

The app is organized into stages, managed by a singleton StageManager class. The following classes implement the Stage interface:

  • SMenu: The main menu where users can choose between "Play" and "Create" modes by hovering over an option or pressing the P or C key.
  • SSelect: The song selection screen where users can browse available songs (it scans the data/songs/ directory for .mp4 files) and select one to play or create notes for.
  • SPlay: The gameplay stage where notes appear in sync with the selected video. Players hit notes by moving their hand into the note area.
  • SCreate: The chart creation stage where users can add and save notes for the selected song.
  • SFinish: The finish screen that displays performance statistics after a play session or a summary after chart creation.

Stage management simplifies event routing and separates responsibilities. In addition, the Stage interface provides input handlers such as keyPressed() and mouseMoved(), which are called by ofApp and routed to the current stage.
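
The routing pattern described above can be sketched as follows. This is a simplified illustration rather than the project's implementation: only keyPressed() and mouseMoved() come from the text, while setStage(), instance(), and the KeyLogStage example class are hypothetical names.

```cpp
#include <memory>
#include <utility>

// Hypothetical minimal Stage interface; the real one has more handlers.
class Stage {
public:
    virtual ~Stage() = default;
    virtual void update() {}
    virtual void draw() {}
    virtual void keyPressed(int /*key*/) {}
    virtual void mouseMoved(int /*x*/, int /*y*/) {}
};

// Singleton that owns the current stage and forwards events to it.
class StageManager {
public:
    static StageManager& instance() {
        static StageManager manager;  // constructed on first use
        return manager;
    }
    void setStage(std::unique_ptr<Stage> next) { current_ = std::move(next); }
    void update() { if (current_) current_->update(); }
    void draw() { if (current_) current_->draw(); }
    void keyPressed(int key) { if (current_) current_->keyPressed(key); }
    void mouseMoved(int x, int y) { if (current_) current_->mouseMoved(x, y); }

private:
    StageManager() = default;
    StageManager(const StageManager&) = delete;
    StageManager& operator=(const StageManager&) = delete;
    std::unique_ptr<Stage> current_;
};

// Example stage that records the last key it received.
class KeyLogStage : public Stage {
public:
    explicit KeyLogStage(int& sink) : sink_(sink) {}
    void keyPressed(int key) override { sink_ = key; }
private:
    int& sink_;
};
```

With this layout, the application class only forwards each event once, for example StageManager::instance().keyPressed(key), and never needs to know which stage is active.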

Hover Interaction

The HoverManager class manages hover-based interactions across different stages. It checks if the handBox intersects with the corresponding buttons and draws a circular progress indicator (a pair of concentric circles that gradually fills up) to provide visual feedback. Once the hover duration exceeds a threshold, the associated action is triggered.

Progress accumulates only while the user hovers continuously over the same button; moving to another button or away from all buttons resets it.

It is used in SMenu for mode selection and in SSelect for song selection.
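
The accumulate-and-reset behavior can be sketched with a small timer class. This is an illustration under assumptions, not the HoverManager itself: the HoverTimer name, the integer button ids (with -1 meaning no button), and the per-frame update() signature are all hypothetical.

```cpp
#include <algorithm>

// Hypothetical hover timer; button ids are arbitrary integers, -1 means "none".
class HoverTimer {
public:
    explicit HoverTimer(float thresholdSeconds) : threshold_(thresholdSeconds) {}

    // Call once per frame with the button under the hand (or -1) and the
    // frame time. Returns true on the frame where the threshold is reached.
    bool update(int hoveredId, float dtSeconds) {
        if (hoveredId != activeId_) {  // moved to another button or away
            activeId_ = hoveredId;
            elapsed_ = 0.0f;
            return false;
        }
        if (activeId_ < 0) return false;
        elapsed_ += dtSeconds;
        if (elapsed_ >= threshold_) {
            elapsed_ = 0.0f;           // reset so the action fires once
            return true;
        }
        return false;
    }

    // Fraction in [0, 1] that a progress ring could display.
    float progress() const { return std::min(1.0f, elapsed_ / threshold_); }

private:
    float threshold_;
    float elapsed_ = 0.0f;
    int activeId_ = -1;
};
```

A stage would compute which button, if any, intersects handBox each frame, pass that id to update(), and trigger the button's action when update() returns true.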

Note and Verdict

All types of notes inherit from the base Note class, which defines common properties and methods such as position, timing and drawing. Each note type implements its own update() and draw() methods to handle specific behaviors and visual representations.

There are 3 types of notes:

  • NTap(time, x, y): A single tap note at the given time at position (x, y).
  • NLong(time, x, y, duration): A long note starting at the given time at position (x, y) with the specified duration.
  • NSlide(time, x, y, duration, endX, endY): A slide note starting at the given time at position (x, y), lasting the specified duration, and ending at (endX, endY).

Hitting or failing to hit notes results in different verdicts, defined in the Verdict enum:

  • Pure (VPure), Far (VFar), Lost (VLost), and None (VNone, for pending notes).

Details:

  • Tap: A hit within -300 ms to +200 ms of the note time is Pure, within ±400 ms is Far, otherwise Lost.
  • Long: A total hold duration ≥ 75% of the note duration is Pure, ≥ 50% is Far, otherwise Lost.
  • Slide: If both head and tail are hit within -300 ms to +200 ms, the verdict is Pure; if both are within ±400 ms, it is Far; otherwise Lost.
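
These rules translate directly into three small functions. The sketch below is illustrative only: the function names are hypothetical, the numeric ordering of the enum values is an assumption, and the sign convention (offset = hit time minus note time, so negative means early) is assumed rather than stated in the text.

```cpp
#include <algorithm>

// Verdict names follow the text; ordering better < worse is an assumption.
enum Verdict { VPure = 0, VFar = 1, VLost = 2, VNone = 3 };

// offsetMs = hit time minus note time; negative is assumed to mean "early".
Verdict judgeTap(double offsetMs) {
    if (offsetMs >= -300.0 && offsetMs <= 200.0) return VPure;
    if (offsetMs >= -400.0 && offsetMs <= 400.0) return VFar;
    return VLost;
}

// heldSeconds = total time the hand stayed inside the note area.
Verdict judgeLong(double heldSeconds, double durationSeconds) {
    if (durationSeconds <= 0.0) return VLost;
    double ratio = heldSeconds / durationSeconds;
    if (ratio >= 0.75) return VPure;
    if (ratio >= 0.50) return VFar;
    return VLost;
}

// A slide is only as good as the worse of its head and tail hits.
Verdict judgeSlide(double headOffsetMs, double tailOffsetMs) {
    return std::max(judgeTap(headOffsetMs), judgeTap(tailOffsetMs));
}
```

For example, judgeSlide(0, 300) yields VFar: the head is within the Pure window, but the tail is only within ±400 ms, so the whole slide is Far.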

In addition, during play the game maintains a list of active notes that are currently on screen. Each note defines when it enters the screen via hasEntered(deltaTime) and when it should be removed via isExpired(deltaTime). The SPlay stage updates and draws only the active notes, which improves performance.
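
One way to maintain such an active list is sketched below. Everything beyond the two method names is an assumption: the argument is treated as the current playback time in seconds, the 1 s lead-in and 0.4 s grace period are invented values, and SimpleNote and ActiveNotes are hypothetical types.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a note. The lead-in and grace values are assumed.
struct SimpleNote {
    double time;      // hit time in seconds
    double duration;  // 0 for tap notes
    bool hasEntered(double now) const { return now >= time - 1.0; }
    bool isExpired(double now) const { return now > time + duration + 0.4; }
};

// Tracks which notes are on screen. `pending` must be sorted by time.
class ActiveNotes {
public:
    explicit ActiveNotes(std::vector<SimpleNote> pending)
        : pending_(std::move(pending)) {}

    void update(double now) {
        // Activate notes whose lead-in has started.
        while (next_ < pending_.size() && pending_[next_].hasEntered(now)) {
            active_.push_back(next_++);
        }
        // Compact the list in place, dropping expired notes.
        std::size_t kept = 0;
        for (std::size_t i = 0; i < active_.size(); ++i) {
            if (!pending_[active_[i]].isExpired(now)) active_[kept++] = active_[i];
        }
        active_.resize(kept);
    }

    std::size_t activeCount() const { return active_.size(); }

private:
    std::vector<SimpleNote> pending_;
    std::vector<std::size_t> active_;  // indices into pending_
    std::size_t next_ = 0;             // first note not yet activated
};
```

Because the chart is sorted by time, each frame touches only the handful of on-screen notes plus at most a few newly entering ones, rather than the whole chart.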

Scoring

When playing, ScoreManager keeps track of:

  • The number of Pure, Far, and Lost notes.
  • The current combo and score.
  • The maximum combo achieved during the session.

At the end of the play session, these statistics are displayed on the SFinish stage.

The score is scaled to a maximum of 1000000. All notes carry equal weight: each Pure is worth 100% of a note's share, each Far 65%, and each Lost 0%.
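
Under these rules the score is 1000000 × (Pure + 0.65 × Far) / total notes. The sketch below shows one way to track it together with the combo. It is not the project's ScoreManager: the ScoreTracker name and its methods are hypothetical, and rounding to the nearest integer is an assumption.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical score tracker; names do not mirror the project's ScoreManager.
class ScoreTracker {
public:
    explicit ScoreTracker(int totalNotes) : total_(totalNotes) {}

    void addPure() { ++pure_; bumpCombo(); }
    void addFar()  { ++far_;  bumpCombo(); }
    void addLost() { ++lost_; combo_ = 0; }  // a Lost note breaks the combo

    // Each note is worth 1000000 / totalNotes; Pure earns all of it, Far 65%.
    // Rounding to the nearest integer is an assumption.
    int score() const {
        if (total_ <= 0) return 0;
        double earned = pure_ + 0.65 * far_;
        return static_cast<int>(std::lround(1000000.0 * earned / total_));
    }
    int combo() const { return combo_; }
    int maxCombo() const { return maxCombo_; }
    int pureCount() const { return pure_; }
    int farCount() const { return far_; }
    int lostCount() const { return lost_; }

private:
    void bumpCombo() { maxCombo_ = std::max(maxCombo_, ++combo_); }
    int total_;
    int pure_ = 0, far_ = 0, lost_ = 0;
    int combo_ = 0, maxCombo_ = 0;
};
```

For a four-note chart played as Pure, Pure, Far, Lost, the score is 1000000 × 2.65 / 4 = 662500, the maximum combo is 3, and the current combo ends at 0.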

Chart Creation

For chart creation, users can add notes using either the mouse or the keyboard (with the hand position setting where the note is placed):

  1. Mouse mode adds notes at the mouse position:
       • Left click: Tap
       • Middle press: Long (hold and release)
       • Right press: Slide (start at the press position, hold and drag, end at the release position)
  2. Keyboard mode adds notes at the current hand position:
       • Any normal key press: Tap
       • Hold Space: Long (hold and release)
       • Hold S: Slide (start at the hand position, end at the hand position on release)

Output chart files are saved at songs/*_notes.txt where * is the song name.
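
Writing the chart amounts to emitting one line per note with only the fields its type uses (tap: type, time, x, y; long: plus duration; slide: plus end position). The sketch below is an assumption-laden illustration, not the project's writer: CreatedNote, userChartPath(), and writeChart() are hypothetical names, and the path is relative to the data directory.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for a note captured in Create mode.
struct CreatedNote {
    int type;          // 0 = tap, 1 = long, 2 = slide
    double t, x, y;    // start time (s) and normalized start position
    double d;          // duration in seconds (long and slide only)
    double ex, ey;     // normalized end position (slide only)
};

// Builds the user-chart path: songs/<videoName>_notes.txt.
std::string userChartPath(const std::string& videoName) {
    return "songs/" + videoName + "_notes.txt";
}

// Writes one line per note, emitting only the fields the note type uses.
void writeChart(std::ostream& out, const std::vector<CreatedNote>& notes) {
    for (const CreatedNote& n : notes) {
        out << n.type << ' ' << n.t << ' ' << n.x << ' ' << n.y;
        if (n.type >= 1) out << ' ' << n.d;
        if (n.type == 2) out << ' ' << n.ex << ' ' << n.ey;
        out << '\n';
    }
}
```

Streams print doubles with six significant digits by default, which keeps millisecond precision for videos shorter than about 16 minutes; std::setprecision can raise this if needed.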

Assets

/bin/data
    /assets
        hand.png      # hand icon image
        longe.png     # long note image
        slide.png     # slide note image
        tap.png       # tap note image
        verdana.ttf   # font file
    /model
        yolo-v3-tiny-prn.cfg
        yolo-v3-tiny-prn.weights
    /songs
        Last Moment.jpg
        Last Moment.mp4
        Last Moment.txt
        Undying Macula.jpg
        Undying Macula.mp4
        Undying Macula.txt

All videos and images are in 1280x720 resolution. You can add your own songs by placing the video file (.mp4), thumbnail image (.jpg), and chart data (.txt) in the songs/ directory. Alternatively, you can omit the chart data and create your own chart in Create mode.

Future Work

  • Improve hand detection accuracy and robustness, possibly by training a custom model or using a more advanced architecture.
  • Enable multi-hand detection for two-player mode or more complex interactions.
  • Enhance the note chart editor with more features, such as undo/redo, note snapping, and visual timeline.
  • Add sound effects and visual effects when hitting notes to make the game more engaging.

Credits

The YOLOv3-tiny-PRN model is provided by cansik under the MIT License.

The icons used in the game are sourced from Flaticon.

The songs used in the demo are from Arcaea, a rhythm game developed by lowiro. The music videos are from official sources: Last Moment and Undying Macula.

All the other code and assets are created by the author.

References

[1] C.-Y. Wang, H.-Y. M. Liao, P.-Y. Chen, and J.-W. Hsieh, “Enriching Variety of Layer-Wise Learning Information by Gradient Combination,” 2019 ICCV Workshop on Low-Power Computer Vision, Oct. 2019, doi: 10.1109/iccvw.2019.00303.
