Full-text transcripts of 354 episodes of the Huberman Lab Podcast, hosted by Dr. Andrew Huberman.
| Path | Format | Description |
|---|---|---|
| `vtt/` | WebVTT (.vtt) | Original subtitle files with timestamps |
| `transcripts/` | Plain text (.txt) | Clean text without timestamps, formatted into paragraphs |
| `episodes.csv` | CSV | Episode index with titles and YouTube video IDs |
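The `.vtt` files keep the caption timing that the plain-text transcripts drop. The sketch below pulls `(start, end, text)` cues out of a WebVTT file using only the standard library. It is a minimal illustration, not the tooling used to build `transcripts/`: `parse_vtt` is a hypothetical helper name, it strips inline tags such as `<c>` or word-level timestamps, and it does not deduplicate the rolling, repeated lines that auto-generated captions often contain.

```python
import re

# A cue timing line looks like "00:01:02.500 --> 00:01:05.000 align:start position:0%"
TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)")
# Inline markup such as <c>, </c> or <00:00:01.234> inside a cue payload
INLINE_TAG = re.compile(r"<[^>]+>")


def parse_vtt(text):
    """Return a list of (start, end, caption_text) tuples from WebVTT source text."""
    cues = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = TIMING.match(lines[i].strip())
        i += 1
        if not match:
            continue  # header, metadata, cue identifier, NOTE block, or blank line
        payload = []
        # The cue payload runs until the next blank line (or end of file)
        while i < len(lines) and lines[i].strip():
            payload.append(INLINE_TAG.sub("", lines[i]).strip())
            i += 1
        cues.append((match.group(1), match.group(2), " ".join(payload)))
    return cues
```

To use it on a real file, read the file as text first, e.g. `parse_vtt(open(path, encoding="utf-8").read())`, where `path` is any file in `vtt/` (UTF-8 encoding is an assumption).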
These transcripts are auto-generated YouTube captions, not official transcripts. They may contain:
- Inaccurate transcriptions of technical/scientific terms
- Missing punctuation or incorrect sentence breaks
- Occasional misheard words
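Because technical terms are sometimes misheard, an exact-match search can miss passages. One possible workaround is fuzzy matching with the standard library's `difflib`, sketched below. The helper name `fuzzy_find` and the `0.8` similarity cutoff are illustrative assumptions to tune, and the misspelling in the usage example is hypothetical, not taken from the transcripts.

```python
import difflib
import re


def fuzzy_find(term, text, cutoff=0.8):
    """Return distinct words in `text` that closely resemble `term` (case-insensitive)."""
    # Lowercase and split into simple word tokens (letters and apostrophes only)
    words = set(re.findall(r"[a-z']+", text.lower()))
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    return difflib.get_close_matches(term.lower(), words, n=10, cutoff=cutoff)
```

For example, `fuzzy_find("adenosine", "adenosine and adenosene levels")` returns both spellings, which can then be fed into an exact search.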
Each episode is available as a .txt file in the transcripts/ directory. File names match episode titles.
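A minimal sketch for loading every transcript into memory, keyed by file name without the `.txt` extension. It assumes UTF-8 files; since characters that are not allowed in file names may have been replaced, the keys are not guaranteed to match the `title` column of `episodes.csv` character for character. `load_transcripts` is a hypothetical helper name.

```python
from pathlib import Path


def load_transcripts(directory="transcripts"):
    """Map each transcript's file name (minus the .txt suffix) to its full text."""
    return {
        # Path.stem drops only the final suffix, so titles like "Dr. X" survive intact
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }
```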
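```python
import csv

# Print each episode's title and YouTube video ID from the index file
with open('episodes.csv', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['title'], row['youtube_id'])
```

```bash
# Search all transcripts for a topic (list matching file names)
grep -rl "dopamine" transcripts/

# Search with context (line numbers plus 2 lines before and after each match)
grep -n -C 2 "cold exposure" transcripts/*.txt
```

Possible uses include:

- Research and reference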
- Building search indexes over podcast content
- Training data for NLP projects
- Accessibility
- Personal study and note-taking
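As a starting point for building a search index, the sketch below builds a simple in-memory inverted index (token to set of document names) and answers multi-word queries with AND semantics. It is a toy illustration under stated assumptions: `docs` is a dict mapping episode names to transcript text (for example, read from the files in `transcripts/`), tokenization is a plain lowercase regex split, and `build_index`/`search` are hypothetical names. There is no stemming or ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every token in `query`."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # .get avoids inserting empty entries into the defaultdict for unknown tokens
    return set.intersection(*(index.get(t, set()) for t in tokens))
```

An inverted index trades memory for speed: each query touches only the posting sets of its tokens instead of rescanning all 354 transcripts the way `grep` does.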
These transcripts are derived from YouTube's auto-generated captions. All content is the intellectual property of Huberman Lab. This repository is provided for educational and research purposes. If you are the content owner and would like this repository removed, please open an issue.
The code and tooling in this repository are released under the MIT License. The transcript content itself is subject to the original copyright of Huberman Lab.
- Episodes: 354
- Format: WebVTT + Plain Text
- Language: English
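To sanity-check a local copy against the totals above, one option is to count the index rows and the files in each directory. This is a minimal sketch that assumes the layout in the table (`episodes.csv`, `transcripts/*.txt`, `vtt/*.vtt`) and a UTF-8 CSV with a header row. `dataset_counts` is a hypothetical helper. If every episode has exactly one file of each kind, all three counts would be expected to equal 354.

```python
import csv
from pathlib import Path


def dataset_counts(root="."):
    """Count index rows and transcript files so they can be compared with the stated totals."""
    root = Path(root)
    with open(root / "episodes.csv", newline="", encoding="utf-8") as f:
        # DictReader consumes the header row, so this counts data rows only
        rows = sum(1 for _ in csv.DictReader(f))
    return {
        "csv_rows": rows,
        "txt_files": len(list((root / "transcripts").glob("*.txt"))),
        "vtt_files": len(list((root / "vtt").glob("*.vtt"))),
    }
```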