---
layout: page
title: Hi, I'm Desh
subtitle: Senior Research Scientist at NVIDIA
use-site-title: true
---
<br>I am a Senior Research Scientist at NVIDIA,
where I work on speech capabilities for LLMs. My research interests lie in the application
of machine learning methods for speech and language tasks.<br><br>
Before joining NVIDIA, I did speech research at Meta Superintelligence Labs,
where we built the first production-grade full-duplex voice agent.<br><br>
I received my PhD from <a href="https://www.cs.jhu.edu/">Johns Hopkins University</a>, working in the <a
href="https://www.clsp.jhu.edu/">Center for Language and Speech Processing (CLSP)</a>, advised by <a
href="https://clsp.wse.jhu.edu/faculty-pages/sanjeev/">Sanjeev Khudanpur</a> and <a
href="http://www.danielpovey.com/">Dan Povey</a>. I was a JHU-Amazon AI2AI fellow, a Fred Jelinek fellow, and
an IEEE Rising Star in Signal Processing.<br><br>
<!-- I have interned in the speech groups at Microsoft (in 2021) and Meta (in 2022). <br><br> -->
<!-- My bachelor thesis was on deep learning methods
for relation extraction in clinical text, supervised by <a href="http://www.iitg.ac.in/anand.ashish/index.html">Ashish
Anand</a>.<br><br> -->
When I’m not doing ML, I like to work out, climb boulders, play guitar, and <a
href="https://www.goodreads.com/review/list/62772844-desh-raj?shelf=read&sort=date_read">read fiction</a>.<br>
<hr style="height:2px;border-width:0;color:gray;background-color:gray">
<b>Updates:</b><br><br>
<div class="updates-container">
<ul class="updates-list">
<li><i>January 2026:</i> I joined the NeMo Speech AI team at NVIDIA. Excited to do open research
on speech LLMs and full-duplex voice models!
</li><br>
<li><i>October 2025:</i> New paper from <a href="https://www.cs.utexas.edu/~yjshih/">Ian Shih's</a>
internship on reasoning in SpeechLLMs! Check it out <a href="https://arxiv.org/abs/2510.07497">here</a>.
</li><br>
<li><i>April 2025:</i> <b>3 papers accepted at IEEE ICASSP 2025</b>, spanning topics like SpeechLLMs
and multi-channel speech foundation models. Check out the <a href="/publications">publications</a>
page for more info!
</li><br>
<li><i>July 2024:</i> Our AI Speech team became a part of GenAI, and was tasked with developing
speech capabilities for LLaMA models.
</li><br>
<li><i>January 2024:</i> I joined Meta in NYC as a Research Scientist! I will be working
on robust on-device ASR in the AI Speech & EMG team led by Mike Seltzer.
</li><br>
<li><i>January 2024:</i> I defended my PhD! You can find the slides and video on the
<a href="/talks">Talks</a> page.
</li><br>
<li><i>November 2023:</i> I presented a <a href="./static/poster/ccri-2023.pdf">poster</a>
about the next-generation Kaldi toolkits at the NSF CIRC PI meeting in Salt Lake City.
</li><br>
<li><i>September 2023:</i> I have been awarded a <a
href="https://www.clsp.jhu.edu/about/jelinek-fellowship/">Fred
Jelinek fellowship</a>
by Johns Hopkins, for the academic year 2023-24.
</li><br>
<li><i>June 2023:</i> I will be spending this summer in Le Mans (France), participating in
<a href="https://jsalt2023.univ-lemans.fr/en/index.html">JSALT 2023</a>. Our team will be working on
WFST+end-to-end methods for speech.
</li><br>
<li><i>June 2023:</i> I was selected as an <b>ICASSP Rising Star in Signal Processing</b>.
</li><br>
<li><i>May 2023:</i> <b>GSS paper</b> accepted at <a href="https://www.interspeech2023.org/">InterSpeech
2023</a>.
This implementation is used in the baseline for the <a
href="https://www.chimechallenge.org/current/task1/index">CHiME-7 DASR challenge</a>.
</li><br>
<li><i>February 2023:</i> <b>2 papers</b> accepted at <a href="https://2023.ieeeicassp.org/">IEEE ICASSP
2023</a>.
These papers investigate target-speaker ASR using transducers (work done at Meta AI), and using
self-supervised
models (led by my colleague <a href="https://scholar.google.com/citations?user=iQ-S0fQAAAAJ&hl=en">Zili
Huang</a>).
</li><br>
<li><i>October 2022:</i> I was selected as a recipient of the inaugural JHU+Amazon <a
href="https://ai2ai.engineering.jhu.edu/2022-2023-ai2ai-fellows/">AI2AI fellowship</a> for 2022-23.
</li><br>
<li><i>May 2022:</i> I passed my GBO (JHU CS qualifying exam) and officially became a Ph.D. candidate (<a
href="./static/ppt/gbo_presentation.pdf">here</a> are
the slides for my presentation). Also,
I'll be starting an internship at Meta AI (Menlo Park) in the Speech team.
</li><br>
<li><i>January 2022:</i> <b>2 papers</b> accepted at <a href="https://2022.ieeeicassp.org/">IEEE ICASSP
2022</a>.
These papers investigate multi-talker ASR with neural transducers, and adding domain knowledge for
fine-tuning of large self-supervised models. <a href="./static/pdf/clsp_recruitment_poster.pdf">Here</a> is
a
poster describing both papers.
</li><br>
<li><i>January 2022:</i> I participated in the Mini SCALE workshop organized by HLTCOE. I was in the
<b>"Improving speech analytics for room audio"</b> team led by <a
href="https://m-wiesner.github.io/">Matthew
Wiesner</a>.
</li><br>
<li><i>June 2021:</i> <b>4 papers</b> accepted at <a href="https://www.interspeech2021.org/">INTERSPEECH
2021</a>.
Check out the <a href="/publications">publications</a> page for more info! Also, I am attending ICASSP 2021
virtually :)
</li><br>
<li><i>April 2021:</i> Our JHU-GoVivace team placed <b>2nd</b> (and 1st in the Hindi-English task) in the <a
href="https://navana-tech.github.io/IS21SS-indicASRchallenge/leaderboard.html">Indic code-switching
challenge</a>.</li><br>
<li><i>March 2021:</i> I will be interning (virtually) with <a
href="https://www.microsoft.com/en-us/research/people/jinyli/">Dr. Jinyu Li</a> at Microsoft this
summer.
</li><br>
<li><i>January 2021:</i> Our Hitachi-JHU team obtained <b>2nd best DER</b> in the <a
href="https://sat.nist.gov/dihard3#tab_leaderboard">Third DIHARD challenge</a>. We used several systems,
and
combined their outputs with a modified version of <a
href="https://github.com/desh2608/dover-lap">DOVER-Lap</a>.
Register for the workshop for more details!</li><br>
<li><i>November 2020:</i> <b>4 papers</b> accepted at <a href="http://slt2020.org/">IEEE SLT 2021</a>. Check out
the publications page for more info!</li><br>
<li><i>August 2020:</i> I will be a TA for <a href="https://jhu-intro-hlt.github.io/">Intro to HLT</a> in the
fall.
</li><br>
<li><i>June 2020:</i> I am participating in <a
href="https://www.clsp.jhu.edu/speech-recognition-and-diarization-for-unsegmented-multi-talker-recordings-with-speaker-overlaps/">JSALT
2020</a>. I will be working on informed target speaker ASR with <a
href="http://www.kecl.ntt.co.jp/icl/signal/member/marcd/">Marc Delcroix</a> and <a
href="https://sites.google.com/view/shinjiwatanabe">Shinji Watanabe</a>.</li><br>
<li><i>May 2020:</i> Our JHU submission to the <a
href="https://chimechallenge.github.io/chime6/results.html">CHiME-6 challenge</a> obtained
<b>second-best</b> results in Track 2 (diarization + ASR track). The system description paper is available
<a href="https://arxiv.org/abs/2006.07898">here</a>.
</li><br>
</ul>
</div>
<!-- <hr style="height:2px;border-width:0;color:gray;background-color:gray"> -->
<!-- <b>Referral policy:</b> Please check out my <a href="/referral">referral policy</a> before reaching out to me for
referrals. -->