
Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement

[arXiv 2305.16140]

Jiawei Qin1, Takuru Shimoyama1, Xucong Zhang2, Yusuke Sugano1
1The University of Tokyo    2Delft University of Technology


ETH-XGaze 3D

ETH-XGaze is a large‑scale full‑face gaze dataset captured with 18 synchronized cameras.
However, its original camera parameters are noisy, and its 2D landmark annotations contain mismatches.

  • We use Agisoft Metashape to re-calibrate the camera extrinsic parameters for each frame, then compute averaged camera extrinsic parameters for each subject (see the averaging sketch below).
  • We re-detect the 2D landmarks and compute 3D landmarks by jointly optimizing over all 18 views.
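For reference, the per-subject averaging can be done by averaging translations component-wise and rotations via a rotation mean. A minimal sketch using scipy (illustrative only, not the repo's actual code; the function and variable names are hypothetical):

import numpy as np
from scipy.spatial.transform import Rotation

def average_extrinsics(rotations, translations):
    """Average per-frame camera extrinsics into one per-subject estimate."""
    # rotations: list of (3, 3) rotation matrices; translations: list of (3,) vectors
    R_avg = Rotation.from_matrix(np.stack(rotations)).mean().as_matrix()  # chordal mean
    t_avg = np.mean(np.stack(translations), axis=0)  # component-wise mean
    return R_avg, t_avg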

We directly provide the updated camera parameters and annotations, which can be used to re-normalize ETH-XGaze (the normalization code itself is not included in this repo).
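For context, re-normalization refers to the standard data-normalization procedure for full-face gaze estimation: the image is warped so that a virtual camera looks straight at the face center from a fixed distance, following the common formulation W = K_norm · S · R · K_real^-1. A minimal sketch of that warp (a standard technique from the literature, not code from this repo; argument names are illustrative):

import cv2
import numpy as np

def normalize_face(img, face_center, head_R, K_real, K_norm,
                   distance_norm=600.0, size=(448, 448)):
    """Warp img so that a virtual camera faces face_center head-on."""
    distance = np.linalg.norm(face_center)             # face distance in camera units
    forward = face_center / distance                   # virtual camera z-axis
    down = np.cross(forward, head_R[:, 0])             # y-axis, orthogonal to the head x-axis
    down /= np.linalg.norm(down)
    right = np.cross(down, forward)                    # x-axis
    right /= np.linalg.norm(right)
    R = np.vstack([right, down, forward])              # rotation into the virtual camera
    S = np.diag([1.0, 1.0, distance_norm / distance])  # move to the canonical distance
    W = K_norm @ S @ R @ np.linalg.inv(K_real)         # full perspective warp
    return cv2.warpPerspective(img, W, size), R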

With the refined cameras and landmarks, the normalized XGaze dataset is less noisy, as the comparison below illustrates:

Original normalized XGaze

Updated normalized XGaze

Overview

This repo contains:

  1. Accurate multi‑view 3D reconstruction for every frame (via Agisoft Metashape).
  2. Photo‑realistic novel‑view rendering with PyTorch3D, producing synthetic images. The synthetic images yield gaze-estimation performance comparable to real data under the same head pose/gaze.

Installation

Tested on Ubuntu 20.04, CUDA 12.2, Python 3.8, PyTorch3D 0.7.8

git clone https://github.com/ut-vision/XGaze3D.git
cd XGaze3D

Download Metashape-2.2.1-cp37.cp38.cp39.cp310.cp311-abi3-linux_x86_64.whl from https://www.agisoft.com/downloads/installer, and put it inside XGaze3D/

uv (Recommended)

(Install uv)

curl -Ls https://astral.sh/uv/install.sh | sh
uv python install 3.8
uv python pin 3.8 

If you downloaded a Metashape wheel of a different version, update the entry in pyproject.toml, metashape = { path = "Metashape-2.2.1-cp37.cp38.cp39.cp310.cp311-abi3-linux_x86_64.whl" }, to match your file name.

uv sync
source .venv/bin/activate
Conda
conda create -n xgaze3d python=3.8
conda activate xgaze3d
pip install -r requirements.txt
pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu121
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install face_alignment
pip install Metashape-2.2.1-cp37.cp38.cp39.cp310.cp311-abi3-linux_x86_64.whl

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libffi.so.7 is needed if you use a Conda environment.

Activate/de-activate Metashape

## activate 
python activate_metashape.py
# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libffi.so.7 python activate_metashape.py

## de-activate
python deactivate_metashape.py
# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libffi.so.7 python deactivate_metashape.py

Data Preparation

  • ETH‑XGaze
    Download the raw dataset from the ETH-XGaze project page.

  • Updated files (cameras & annotations)
    Download the updated files from Google Drive.

    • unzip avg_cams_final.zip and annotation_updated.zip
    • Place the files as follows:
    ETH-XGaze/
    ├─ calibration/cam_calibration/
    ├─ avg_cams_final/            # ★ re‑calibrated cameras
    ├─ data/
    │  ├─ train/
    │  ├─ annotation_train/       # original annotations
    │  └─ annotation_updated/     # ★ refined annotations
    └─ light_meta.yaml            # ★ lighting-condition metadata; only full-light frames are used
    
  • Places365: Download the validation split (val_256) from Places365.

1. Run 3D Reconstruction

  • The raw images are pre-processed: cropping, resizing, and background removal.
  • --resize 1200: the cropped face is resized to 1200x1200, which reduces processing time without harming quality; adjust this size to find a better trade-off.
cd src
# For Conda, prefix with: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libffi.so.7
# --resize: trade-off between quality & speed; --grp: subjects to process
python main_reconstruct.py \
  --xgaze_basedir <PATH_TO_ETH-XGaze> \
  --output_path <SAVE_DIR> \
  --resize 1200 \
  --grp configs/group.yaml
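For orientation, the reconstruction step drives the Metashape Python API roughly as in the following minimal sketch (simplified from what main_reconstruct.py does; the quality settings shown are illustrative, not the repo's actual values):

import Metashape

def reconstruct_frame(image_paths, out_obj_path):
    # Build a textured mesh from one frame's multi-view images.
    doc = Metashape.Document()
    chunk = doc.addChunk()
    chunk.addPhotos(image_paths)
    chunk.matchPhotos(downscale=1)      # feature matching
    chunk.alignCameras()                # re-estimates camera extrinsics
    chunk.buildDepthMaps(downscale=2)
    chunk.buildModel(source_data=Metashape.DepthMapsData)
    chunk.buildUV()
    chunk.buildTexture(texture_size=4096)
    chunk.exportModel(out_obj_path)     # writes the .obj/.mtl/texture triplet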

Output Format

<SAVE_DIR>/
└─ yyyy‑mm‑dd/hh‑mm‑ss/
   └─ 1200_data/train/
      └─ subject0000/frame0000/
         └─ final_obj/
            ├─ mvs_3d.obj
            ├─ mvs_3d.mtl
            └─ mvs_3d.jpg

We include one 3D sample in assets/subject0000/frame0000/final_obj, used in the rendering sketch below.
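You can sanity-check a reconstructed mesh by rendering the included sample from an arbitrary viewpoint with PyTorch3D. A minimal sketch, independent of main_render.py (the camera distance and angles are arbitrary and may need tuning to the mesh scale):

import torch
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, MeshRasterizer, MeshRenderer, PointLights,
    RasterizationSettings, SoftPhongShader, look_at_view_transform,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mesh = load_objs_as_meshes(
    ["assets/subject0000/frame0000/final_obj/mvs_3d.obj"], device=device)

R, T = look_at_view_transform(dist=2.0, elev=10.0, azim=30.0)  # novel view
cameras = FoVPerspectiveCameras(R=R, T=T, device=device)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=RasterizationSettings(image_size=448)),
    shader=SoftPhongShader(
        cameras=cameras, device=device,
        lights=PointLights(location=[[0.0, 0.0, 3.0]], device=device)),
)
image = renderer(mesh)  # (1, 448, 448, 4) RGBA tensor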

2. Run Rendering

# --renderer_split_verts 2: split meshes if GPU RAM is low
python3 main_render.py \
  --renderer_split_verts 2 \
  --xgaze_basedir <PATH_TO_ETH-XGaze> \
  --xgaze_3d_basedir <PATH_TO_THE_3D_OUTPUT> \
  --output_path <RENDER_OUT> \
  --grp configs/group.yaml \
  --place365_dir <PATH_TO_Places365_val_256>

Output and Visualization

The rendering code produces two versions:

  • Rendered with a green background under full light
  • aug: rendered with random Places365 images as the background and random low light (see the sketch after the directory tree below)
<RENDER_OUT>/render_<time>/
├─ subject0000.h5
├─ subject0003.h5
├─ ...
└─ aug/
    ├─ subject0000.h5
    ├─ subject0003.h5
    └─ ...
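The aug version amounts to chroma-keying the green background and dimming the result. A rough sketch of that idea, assuming a green-screen render and a same-size Places365 crop (simplified; not the repo's exact compositing code):

import numpy as np

def composite_background(render, background, darken=0.6):
    # render, background: (H, W, 3) uint8 RGB arrays of the same size.
    r = render[..., 0].astype(int)
    g = render[..., 1].astype(int)
    b = render[..., 2].astype(int)
    mask = (g > 150) & (g > r + 40) & (g > b + 40)  # crude chroma key
    out = render.copy()
    out[mask] = background[mask]
    # Simulate low light by scaling the brightness of the whole composite.
    return (out.astype(np.float32) * darken).clip(0, 255).astype(np.uint8)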
         
Key               Shape               Description
face_gaze         (N, 2)              Gaze angles (pitch, yaw)
face_head_pose    (N, 3)              Head pose (roll, pitch, yaw)
face_patch        (N, 448 × 448 × 3)  Rendered face
rotation_matrix   (N, 3 × 3)          Source → target rotation
face_mat_norm     (N, 3 × 3)          Camera normalization matrix
landmarks_norm    (N, 68, 2)          2D landmark positions in normalized space

To inspect the rendered results:

python3 print_result.py --data_dir <SAVE_DIR>
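Alternatively, the HDF5 files can be inspected directly. A minimal sketch using h5py, assuming the keys listed above; it also converts the first (pitch, yaw) pair to a 3D gaze vector with the conversion conventionally used for ETH-XGaze:

import h5py
import numpy as np

with h5py.File("subject0000.h5", "r") as f:
    for key in f.keys():
        print(key, f[key].shape)
    pitch, yaw = f["face_gaze"][0]  # radians

# Conventional pitch/yaw -> unit gaze vector conversion.
vec = np.array([
    -np.cos(pitch) * np.sin(yaw),
    -np.sin(pitch),
    -np.cos(pitch) * np.cos(yaw),
])
print("gaze vector:", vec)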

Updated Annotation Format

subject0000_updated.csv:

  • column 1: frame folder name
  • column 2: image file name
  • columns 3-4: gaze point in the screen coordinate system; it is the same for all samples in the same frame folder
  • columns 5-7: gaze point location in the current camera coordinate system
  • columns 8-10: (UPDATED) head pose rotation in the current camera coordinate system, estimated from the detected 2D facial landmarks
  • columns 11-13: (UPDATED) head pose translation in the current camera coordinate system, estimated from the detected 2D facial landmarks
  • columns 14-150: the 68 detected 2D landmarks
  • columns 14-114: (ADDED) reprojected 2D facial landmarks (only 50 landmarks)
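A minimal sketch of parsing one updated annotation file with the layout above (assuming a comma-separated file without a header row; the landmark slice assumes 68 landmarks stored as x, y pairs):

import csv
import numpy as np

with open("subject0000_updated.csv") as f:
    for row in csv.reader(f):
        frame_folder, image_name = row[0], row[1]
        gaze_screen = np.array(row[2:4], dtype=float)    # columns 3-4
        gaze_cam = np.array(row[4:7], dtype=float)       # columns 5-7
        head_rot = np.array(row[7:10], dtype=float)      # columns 8-10 (UPDATED)
        head_trans = np.array(row[10:13], dtype=float)   # columns 11-13 (UPDATED)
        landmarks_2d = np.array(row[13:149], dtype=float).reshape(68, 2)
        break  # first sample only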

Citation

If you find the dataset useful for your research, please consider citing:

@article{qin2023domain,
  title={Domain-adaptive full-face gaze estimation via novel-view-synthesis and feature disentanglement},
  author={Qin, Jiawei and Shimoyama, Takuru and Zhang, Xucong and Sugano, Yusuke},
  journal={arXiv preprint arXiv:2305.16140},
  year={2023}
}

🔥 Huge thanks to Xucong Zhang for contributing to the Metashape multi‑view reconstruction scripts!

License

ETH‑XGaze, Metashape, and Places365 are subject to their respective licenses; please comply with their terms.

Contact

If you have any questions, feel free to contact Jiawei Qin at jqin@iis.u-tokyo.ac.jp.
