Analysis of every YC batch ever. Read the initial blog post here.
Y Combinator is one of the largest startup accelerators in the world, with one of the highest concentrations of technical founders. Companies like Airbnb, Docker, Instacart, and Coinbase all came through the accelerator, but they represent only the top percentile.
YC Vault is my attempt to make sense of the entire Y Combinator directory.
Any language model of your choice is supported through LiteLLM. High-performing models like GPT-4o-mini are recommended for their data-extraction accuracy.
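As a rough sketch of what a LiteLLM-backed extraction call looks like (the model name, prompt, and function name here are illustrative, not the repo's actual code):

```python
def extract_company(page_text: str, model: str = "gpt-4o-mini") -> str:
    """Ask an LLM to pull structured fields out of a scraped YC page."""
    # Lazy import so the sketch only needs `litellm` when actually called.
    from litellm import completion

    response = completion(
        model=model,  # any LiteLLM-supported model id works here
        messages=[{
            "role": "user",
            "content": "Extract the company name, batch, and one-line "
                       "description from this YC page:\n" + page_text,
        }],
    )
    return response.choices[0].message.content
```

Swapping models is a one-string change, which is the main reason to route calls through LiteLLM rather than a provider-specific SDK.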
```shell
git clone https://github.com/lukafilipxvic/YC-Vault.git
uv sync
```
- Set up environment:
  - Create a `.env` file using the `.env.example` file as a template
  - Example `.env` file:

    ```
    [llm]
    OPENAI_API_KEY=your_api_key_here

    [data]
    DATA_DIR=./data
    ```

- Configure your data sources:
  - Update the `YC_Batches.csv` file with all batch IDs
  - This file will need updating as new batches are launched
- Run the pipeline:

  ```shell
  uv run python scraper/run_pipeline.py
  ```
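The example `.env` above uses INI-style sections (`[llm]`, `[data]`), so it can be parsed with Python's built-in `configparser`. A minimal sketch, using the placeholder values from the example (the repo's actual config loading may differ):

```python
import configparser

# The INI-style .env content shown above, with placeholder values.
ENV_EXAMPLE = """
[llm]
OPENAI_API_KEY=your_api_key_here

[data]
DATA_DIR=./data
"""

config = configparser.ConfigParser()
config.read_string(ENV_EXAMPLE)

api_key = config["llm"]["OPENAI_API_KEY"]
data_dir = config["data"]["DATA_DIR"]
print(data_dir)  # → ./data
```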
- `get_yc_urls.py`: ~2.5 minutes to scrape all YC URLs
- `get_yc_data.py`: ~2.52 seconds per company (approximately 4.2 hours to scrape 6,000 YC companies synchronously)
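The ~4.2-hour figure follows directly from the per-company timing:

```python
# Back-of-envelope check of the synchronous runtime estimate above.
seconds_per_company = 2.52
companies = 6_000

total_hours = seconds_per_company * companies / 3600
print(f"{total_hours:.1f} hours")  # → 4.2 hours
```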
- Using GPT-4.1-nano, it costs ~$0.0002 to extract one YC company page.
- Total cost for 6,000 YC companies = ~$1.23
- For comparison, Gumloop costs ~$48.50 for the same data (39.43x more expensive).
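The price comparison works out the same way, using the two estimates above:

```python
# Cost figures taken from the estimates above.
yc_vault_cost = 1.23  # ~$ for 6,000 companies via GPT-4.1-nano
gumloop_cost = 48.5   # ~$ for the same data on Gumloop

ratio = gumloop_cost / yc_vault_cost
print(f"{ratio:.2f}x more expensive")  # → 39.43x more expensive
```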
The scraping pipeline generates 3 CSV files:
- `YC_Companies.csv`: Company profiles and metrics
- `YC_Founders.csv`: Founder information and backgrounds
- `YC_URLs.csv`: Source URLs for all scraped data
Contributions are welcome! Please feel free to submit a Pull Request.
Licensed under AGPL-3.0