VineetKumarRakesh/thg

🧠 Talking Heads, but Make It Science

A Funny Yet Serious Survey on Deepfake’s Nerdy Cousin: Talking Head Generation

“Ever wished your selfie could talk during Zoom calls? We’re not quite there... but we’ve already made it nod in agreement and blink suspiciously.”
— @mazumdarsoumya


📘 What’s Going On Here?

Welcome to the wildest ride in neural rendering: Talking Head Generation (THG). This repo is your cheat code to understanding a field that blends deep learning with deep confusion, peppered with a little madness and a lot of metrics.

⚠️ DISCLAIMER:
The preprint paper titled
📄 “Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions” (arXiv:2507.02900) is NOT going to be published. It’s the mother of all drafts, split into multiple, more detailed children destined for Scopus-indexed journals and conferences.


📚 The Paper Children (Coming Soon™️, but Real)

These aren’t clones. They are deeper, sharper, peer-reviewed cousins of the main survey. Once they pass reviewer boss fights, we’ll drop the DOIs and links here:

🔍 1. Advancements in talking head generation: a comprehensive review of techniques, metrics, and challenges

🧪 Journal - The Visual Computer, Springer Nature. DOI: 10.1007/s00371-025-04232-w

Dive into the model jungle — from GANs to NeRFs and Transformers, all neatly dissected like a biology lab frog.

📃️ 2. Comprehensive Dataset Analysis for Talking Head Generation

📘 Book Chapter - Centering Transparency and Trust in Data and AI Ecosystems, IGI Global

What’s inside VoxCeleb? Why is GRID so... griddy? This one's for dataset diggers.

📏 3. Quantitative Assessment in Talking Head Generation: Metrics and Loss Functions

📗 Journal Submission

If you've ever said "SSIM is enough," this paper is about to throw shade and math at you.

🧪 4. Empirical Evaluation of State-of-the-Art Talking Head Generation Models

🛠️ Conference Paper - 3rd International Conference on Recent Advances in Artificial Intelligence and Smart Applications

Benchmarks, metrics, results, regrets — actual experimental results with charts and enough tables to furnish an IKEA showroom.

🔬 Links will appear here like magic scrolls, post-acceptance. Until then... stay tuned.


🧪 What's in This Repo?

A research buffet for your GPU and gray matter:

  • 🧠 500+ Research Papers distilled into human language
  • 🧑‍🏫 Categorization across modalities: Audio, Video, Image, Text, 2D/3D, GAN, NeRF, Transformer-based, and more
  • 🎞️ Datasets Decoded: VoxCeleb, GRID, LRS3, CelebV, and other acronyms we pretend to remember
  • 🔬 Evaluation Metrics Galore: SSIM, PSNR, CPBD, LPIPS, LMD, WER, CSIM — enough for a PhD defense
  • 💥 Loss Functions Explained: From Mean Squared Error to “Oh no, my perceptual loss exploded”
  • 🧪 Code + Sample Outputs — because seeing is believing, and benchmarks don’t screenshot themselves
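To make one of the metrics listed above concrete, here is a minimal PSNR sketch in plain Python. It is an illustration, not this repo's code: the function name `psnr`, the nested-list frame format, and the default `max_value` of 255 (8-bit grayscale) are all assumptions made for the example.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two same-sized grayscale frames.

    Frames are nested lists of pixel intensities (a list of rows).
    This frame format is an assumption made for the sketch.
    """
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if not ref or len(ref) != len(gen):
        raise ValueError("frames must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames: no noise to measure

    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1, so MSE = 1 and PSNR = 20 * log10(255).
score = psnr([[10, 20], [30, 40]], [[11, 21], [31, 41]])
```

Higher is better. An infinite score usually means a frame was compared against itself.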

🎮 Sample Outputs

| Model | Output |
| --- | --- |
| Wav2Lip | 🎥 Watch |
| Wav2Lip (Generated) | 🎥 Watch |
| SadTalker | 🎥 Watch |
| SadTalker (Generated) | 🎥 Watch |

Not DeepFakes. Just DeepWork.


⚙️ Code Zone

Coming soon in eval-scripts/:

  • compute_metrics.py – For SSIM, PSNR, CPBD, LMD, and that one metric your professor insists on using
  • align_faces.py – Because misaligned faces are worse than misaligned deadlines
  • visualize_lipsync.py – For pixel-by-pixel judgement of your model's karaoke skills
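The scripts above are still marked as coming soon, so here is a rough idea of what the LMD (landmark distance) part of a metrics script might compute. This is a sketch under assumptions, not the contents of compute_metrics.py: the function name `landmark_distance` and the input layout (one list of `(x, y)` landmark tuples per frame) are hypothetical.

```python
import math


def landmark_distance(pred_frames, true_frames):
    """Mean Euclidean distance between matching landmarks over all frames.

    Each argument is a list of frames; each frame is a list of (x, y)
    landmark coordinates. This layout is an assumption made for the sketch.
    """
    if not pred_frames or len(pred_frames) != len(true_frames):
        raise ValueError("need the same non-zero number of frames")

    total, count = 0.0, 0
    for pred, true in zip(pred_frames, true_frames):
        if len(pred) != len(true):
            raise ValueError("each frame pair needs the same number of landmarks")
        for p, t in zip(pred, true):
            total += math.dist(p, t)  # Euclidean distance, Python 3.8+
            count += 1

    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


# One frame, two landmarks: distances are 0 and 5, so the mean is 2.5.
lmd = landmark_distance([[(0, 0), (3, 4)]], [[(0, 0), (0, 0)]])
```

Lower is better. In practice the landmarks would come from a face landmark detector run on both the generated and the ground-truth video, usually restricted to the mouth region.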

Wanna try it? Just:

git clone --depth 1 https://github.com/VineetKumarRakesh/thg.git

(Replace VineetKumarRakesh with your own username if you’re cloning a fork.)


📊 Benchmarks Snapshot

| Model | Dataset | SSIM ↑ | PSNR ↑ | LMD ↓ |
| --- | --- | --- | --- | --- |
| Wav2Lip | VoxCeleb2 | 0.74 | 32.5 | 1.21 |
| FOMM | VoxCeleb1 | 0.68 | 29.8 | 1.49 |
| Face-vid2vid | GRID | 0.72 | 31.2 | 1.33 |

↑ Higher is better. ↓ Lower is better. ∞? You messed up somewhere.
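The arrow convention can be sketched as a tiny ranking helper. The numbers are copied from the snapshot table. The names `BENCHMARKS`, `HIGHER_IS_BETTER`, and `rank_by` are hypothetical and not part of this repo. Each row uses a different dataset, so the ordering is illustrative rather than a fair head-to-head comparison.

```python
# Rows copied from the Benchmarks Snapshot table.
BENCHMARKS = [
    {"model": "Wav2Lip", "dataset": "VoxCeleb2", "SSIM": 0.74, "PSNR": 32.5, "LMD": 1.21},
    {"model": "FOMM", "dataset": "VoxCeleb1", "SSIM": 0.68, "PSNR": 29.8, "LMD": 1.49},
    {"model": "Face-vid2vid", "dataset": "GRID", "SSIM": 0.72, "PSNR": 31.2, "LMD": 1.33},
]

# Direction of each metric: True for the up arrow, False for the down arrow.
HIGHER_IS_BETTER = {"SSIM": True, "PSNR": True, "LMD": False}


def rank_by(metric, rows=BENCHMARKS):
    """Return model names ordered best-first for the given metric."""
    ordered = sorted(rows, key=lambda r: r[metric], reverse=HIGHER_IS_BETTER[metric])
    return [r["model"] for r in ordered]


by_ssim = rank_by("SSIM")  # highest SSIM first
by_lmd = rank_by("LMD")    # lowest LMD first
```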


🧠 Core Contributions (aka "Too Long, Just Tell Me Why It Matters")

  • 📀 Unified taxonomy for THG — no more buzzword soup
  • 🔬 Benchmarked open-source models — because someone had to do the hard part
  • 📦 Dataset comparison — what’s hot, what’s not
  • 📏 Metrics mayhem — why SSIM isn’t always your friend
  • 🎼 Loss functions and why they love making your training unstable

🔖 Citation (For When It’s Published. Soon.)

@misc{rakesh2025talkingheadreview,
  title={Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions},
  author={Vineet Kumar Rakesh and Soumya Mazumdar and Research Pratim Maity and Sarbajit Pal and Amitabha Das and Tapas Samanta},
  year={2025},
  eprint={2507.02900},
  archivePrefix={arXiv},
  note={Preprint – Will not be published. Child papers incoming.}
}

📣 Final Words

If you're into:

  • Machines that talk with your face
  • Academic deep dives that make your brain sweat
  • And humor that makes research tolerable

You're in the right repo.

Stars appreciated. Forks encouraged. Pull requests cautiously welcomed.
🥸 Your talking head just said thanks.

About

An exhaustive survey on Talking Head Generation — exploring state-of-the-art methods, datasets, loss functions, evaluation metrics, and empirical benchmarks. A valuable reference for researchers and practitioners in deep generative modeling, multimedia synthesis, and human-avatar interaction.
