pdf2vitepress

Converts PDF documents to Markdown with dots.ocr, and supports translating the resulting Markdown book in full with an LLM.

Requirements

Python 3.12+
uv

Installation

Install the project dependencies:

uv sync

Prepare the environment variables (used by translate):

cp .env.example .env

Then set your LLM_API_KEY in .env.
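Before running translate, it can help to confirm that the key was actually filled in. The following is a minimal sketch, not part of the project: `parse_env` and `has_api_key` are hypothetical helpers, and the parser only handles plain KEY=VALUE lines, which is an assumption about the layout of .env.example.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_api_key(env: dict[str, str]) -> bool:
    """Return True when LLM_API_KEY is present and non-empty."""
    return bool(env.get("LLM_API_KEY"))
```

For example, `has_api_key(parse_env(open(".env").read()))` should return True once the key is set.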

Important (read before running `just ocr`)

just ocr invokes vllm to start the OCR service. Install the vllm tool first; otherwise the command cannot run:

uv tool install vllm

Common commands

List all commands:

just

1) Start the OCR service

just ocr

By default, this serves an OpenAI-compatible API at http://localhost:8000/v1 (matching [ocr].endpoint in config.toml).

2) Convert a PDF to Markdown

just convert 'Advances in Financial Machine Learning 2018.pdf'

3) Translate the output Markdown

just translate 'output/Advances in Financial Machine Learning 2018/book.md' lang="zh-CN"
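Judging from the two example commands, the converted book seems to land at output/&lt;PDF name without extension&gt;/book.md. The sketch below encodes that inferred pattern; `expected_markdown_path` is a hypothetical helper, and the layout is an assumption drawn from the examples rather than a documented guarantee.

```python
from pathlib import Path


def expected_markdown_path(pdf: str, output_dir: str = "output") -> Path:
    """Guess where convert writes the Markdown for a given PDF (assumed layout)."""
    # Assumption: one sub-directory per PDF, named after the file stem.
    return Path(output_dir) / Path(pdf).stem / "book.md"
```

This can be handy for scripting convert and translate back to back without retyping the path.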

Output directory

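Default output directory: ./output

Key settings live in config.toml:
- [ocr]: OCR service endpoint, model, and prompt
- [pdf]: rendering DPI and number of parallel threads
- [output]: output directory, page separator, and image format
- [llm]: translation service endpoint, model, concurrency, and target language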
