---
layout: page
title: Hi, I'm Desh
subtitle: Senior Research Scientist at NVIDIA
use-site-title: true
---
<br>I am a Senior Research Scientist at NVIDIA,
where I work on speech capabilities for LLMs. My research interests lie in the application
of machine learning methods for speech and language tasks.<br><br>
Before joining NVIDIA, I did speech research at Meta Superintelligence Labs,
where we built the first production-grade full-duplex voice agent.<br><br>
I received my PhD from <a href="https://www.cs.jhu.edu/">Johns Hopkins University</a>, working in the <a
href="https://www.clsp.jhu.edu/">Center for Language and Speech Processing (CLSP)</a>, advised by <a
href="https://clsp.wse.jhu.edu/faculty-pages/sanjeev/">Sanjeev Khudanpur</a> and <a
href="http://www.danielpovey.com/">Dan Povey</a>. I was a JHU-Amazon AI2AI fellow, a Fred Jelinek fellow, and
an IEEE Rising Star in Signal Processing.<br><br>
<!-- I have interned in the speech groups at Microsoft (in 2021) and Meta (in 2022). <br><br> -->
<!-- My bachelor thesis was on deep learning methods
for relation extraction in clinical text, supervised by <a href="http://www.iitg.ac.in/anand.ashish/index.html">Ashish
Anand</a>.<br><br> -->
When I’m not doing ML, I like to work out, climb boulders, play guitar, and <a
href="https://www.goodreads.com/review/list/62772844-desh-raj?shelf=read&sort=date_read">read fiction</a>.<br>
<hr style="height:2px;border-width:0;color:gray;background-color:gray">
<b>Updates:</b><br><br>
<div class="updates-container">
<ul class="updates-list">
<li><i>January 2026:</i> I joined the NeMo Speech AI team at NVIDIA. Excited to do open research
on speech LLMs and full-duplex voice models!
</li><br>
<li><i>October 2025:</i> New paper from <a href="https://www.cs.utexas.edu/~yjshih/">Ian Shih's</a>
internship on reasoning in SpeechLLMs! Check it out <a href="https://arxiv.org/abs/2510.07497">here</a>.
</li><br>
<li><i>April 2025:</i> <b>3 papers accepted at IEEE ICASSP 2025</b>, spanning topics like SpeechLLMs
and multi-channel speech foundation models. Check out the <a href="/publications">publications</a>
page for more info!
</li><br>
<li><i>July 2024:</i> Our AI Speech team became a part of GenAI, and was tasked with developing
speech capabilities for LLaMA models.
</li><br>
<li><i>January 2024:</i> I joined Meta in NYC as a Research Scientist! I will be working
on robust on-device ASR in the AI Speech & EMG team led by Mike Seltzer.
</li><br>
<li><i>January 2024:</i> I defended my PhD! You can find the slides and video on the
<a href="/talks">Talks</a> page.
</li><br>
<li><i>November 2023:</i> I presented a <a href="./static/poster/ccri-2023.pdf">poster</a>
about the next-generation Kaldi toolkits at the NSF CIRC PI meeting in Salt Lake City.
</li><br>
<li><i>September 2023:</i> I have been awarded a <a
href="https://www.clsp.jhu.edu/about/jelinek-fellowship/">Fred
Jelinek fellowship</a>
by Johns Hopkins, for the academic year 2023-24.
</li><br>
<li><i>June 2023:</i> I will be spending this summer in Le Mans (France), participating in
<a href="https://jsalt2023.univ-lemans.fr/en/index.html">JSALT 2023</a>. Our team will be working on
WFST+end-to-end methods for speech.
</li><br>
<li><i>June 2023:</i> I was selected as an <b>ICASSP Rising Star in Signal Processing</b>.
</li><br>
<li><i>May 2023:</i> <b>GSS paper</b> accepted at <a href="https://www.interspeech2023.org/">InterSpeech
2023</a>.
This implementation is used in the baseline for the <a
href="https://www.chimechallenge.org/current/task1/index">CHiME-7 DASR challenge</a>.
</li><br>
<li><i>February 2023:</i> <b>2 papers</b> accepted at <a href="https://2023.ieeeicassp.org/">IEEE ICASSP
2023</a>.
These papers investigate target-speaker ASR using transducers (work done at Meta AI), and using
self-supervised
models (led by my colleague <a href="https://scholar.google.com/citations?user=iQ-S0fQAAAAJ&hl=en">Zili
Huang</a>).
</li><br>
<li><i>October 2022:</i> I was selected as a recipient of the inaugural JHU+Amazon <a
href="https://ai2ai.engineering.jhu.edu/2022-2023-ai2ai-fellows/">AI2AI fellowship</a> for 2022-23.
</li><br>
<li><i>May 2022:</i> I passed my GBO (JHU CS qualifying exam) and officially became a Ph.D. candidate (<a
href="./static/ppt/gbo_presentation.pdf">here</a> are
the slides for my presentation). Also,
I'll be starting an internship at Meta AI (Menlo Park) in the Speech team.
</li><br>
<li><i>January 2022:</i> <b>2 papers</b> accepted at <a href="https://2022.ieeeicassp.org/">IEEE ICASSP
2022</a>.
These papers investigate multi-talker ASR with neural transducers, and adding domain knowledge for
fine-tuning of large self-supervised models. <a href="./static/pdf/clsp_recruitment_poster.pdf">Here</a> is
a
poster describing both papers.
</li><br>
<li><i>January 2022:</i> I participated in the Mini SCALE workshop organized by HLTCOE. I was in the
<b>"Improving speech analytics for room audio"</b> team led by <a
href="https://m-wiesner.github.io/">Matthew
Wiesner</a>.
</li><br>
<li><i>June 2021:</i> <b>4 papers</b> accepted at <a href="https://www.interspeech2021.org/">INTERSPEECH
2021</a>.
Check out the <a href="/publications">publications</a> page for more info! Also, I am attending ICASSP 2021
virtually :)
</li><br>
<li><i>April 2021:</i> Our JHU-GoVivace team placed <b>2nd</b> (and 1st in the Hindi-English task) in the <a
href="https://navana-tech.github.io/IS21SS-indicASRchallenge/leaderboard.html">Indic code-switching
challenge</a>.</li><br>
<li><i>March 2021:</i> I will be interning (virtually) with <a
href="https://www.microsoft.com/en-us/research/people/jinyli/">Dr. Jinyu Li</a> at Microsoft this
summer.
</li><br>
<li><i>January 2021:</i> Our Hitachi-JHU team obtained <b>2nd best DER</b> in the <a
href="https://sat.nist.gov/dihard3#tab_leaderboard">Third DIHARD challenge</a>. We used several systems,
and
combined their outputs with a modified version of <a
href="https://github.com/desh2608/dover-lap">DOVER-Lap</a>.
Register for the workshop for more details!</li><br>
<li><i>November 2020:</i> <b>4 papers</b> accepted at <a href="http://slt2020.org/">IEEE SLT 2021</a>. Check out
the publications page for more info!</li><br>
<li><i>August 2020:</i> I will be a TA for <a href="https://jhu-intro-hlt.github.io/">Intro to HLT</a> in the
fall.
</li><br>
<li><i>June 2020:</i> I am participating in <a
href="https://www.clsp.jhu.edu/speech-recognition-and-diarization-for-unsegmented-multi-talker-recordings-with-speaker-overlaps/">JSALT
2020</a>. I will be working on informed target speaker ASR with <a
href="http://www.kecl.ntt.co.jp/icl/signal/member/marcd/">Marc Delcroix</a> and <a
href="https://sites.google.com/view/shinjiwatanabe">Shinji Watanabe</a>.</li><br>
<li><i>May 2020:</i> Our JHU submission to the <a
href="https://chimechallenge.github.io/chime6/results.html">CHiME-6 challenge</a> obtained
<b>second-best</b> results in Track 2 (diarization + ASR track). The system description paper is available
<a href="https://arxiv.org/abs/2006.07898">here</a>.
</li><br>
</ul>
</div>
<!-- <hr style="height:2px;border-width:0;color:gray;background-color:gray"> -->
<!-- <b>Referral policy:</b> Please check out my <a href="/referral">referral policy</a> before reaching out to me for
referrals. -->