An extended PDF translation application built on top of the original PDFMathTranslate project.
This repository focuses on:
- adding Vietnamese translation support
- providing practical user interfaces for running the translation workflow
- exposing the translation flow through Gradio and a web interface
The core PDF parsing, layout preservation, and translation engine are still based on the upstream project. In other words, this repository is an integration and extension layer rather than a complete rewrite of the original engine.
This project translates PDF documents while trying to preserve:
- document layout
- mathematical expressions

It can produce both bilingual (dual) and mono-language output PDFs.
It supports multiple translation services and offers three ways to use the project:
- Gradio interface
- Web interface (React + Flask)
- Command-line interface
To avoid confusion, this is the exact scope of the work in this repository:
- The original core engine comes from PDFMathTranslate.
- This fork extends the project to support Vietnamese translation scenarios.
- This fork adds and adapts runnable interfaces, especially:
  - gui.py for Gradio
  - backend/app.py for the Flask API
  - myserver/frontend/ for the React web UI
If you are evaluating the project architecture, think of it like this:
- pdf2zh/ is the engine room
- gui.py, backend/, and myserver/frontend/ are the dashboards and control panels
- Translate PDF files from local uploads or remote URLs
- Preserve formatting, layout, and mathematical expressions as much as possible
- Generate two output variants:
- mono-language PDF
- dual-language PDF
- Support page-range based translation
- Support multiple translation providers
- Support advanced runtime options such as threading, cache control, and font handling
- Provide both Gradio and web-based interfaces
- Add Vietnamese as a supported target language in the workflow
- Python 3.8 or newer
- Node.js 16 or newer
- npm or yarn
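To confirm a machine meets these requirements before installing anything, a small check script can help. The sketch below is not part of the repository; the function names are hypothetical, and it assumes `node --version` prints something like `v18.19.0`, which is the usual format.

```python
import re
import sys

# Minimum versions from the requirements list above.
MIN_PYTHON = (3, 8)
MIN_NODE_MAJOR = 16


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the Python requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def node_ok(node_version_output: str) -> bool:
    """Check the text printed by `node --version`, e.g. 'v18.19.0'."""
    match = re.match(r"v?(\d+)\.", node_version_output.strip())
    if match is None:
        return False  # unrecognized output, treat as not satisfied
    return int(match.group(1)) >= MIN_NODE_MAJOR


if __name__ == "__main__":
    import shutil
    import subprocess

    print("Python OK:", python_ok())
    node = shutil.which("node")
    if node is None:
        print("Node.js not found on PATH")
    else:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        print("Node.js OK:", node_ok(out))
```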
pdf_translate/
├── pdf2zh/ # Core engine inherited from the upstream project
│ ├── __init__.py
│ ├── high_level.py
│ ├── translator.py
│ ├── config.py
│ ├── doclayout.py
│ ├── converter.py
│ ├── pdfinterp.py
│ └── ...
├── backend/
│ └── app.py # Flask API for the web mode
├── myserver/
│ ├── package.json # Node.js dependencies for the server layer
│ ├── server.js # Express server for production-style serving
│ └── frontend/
│ ├── package.json # React dependencies
│ ├── public/
│ └── src/
├── gui.py # Gradio interface entry point
├── pdf_translate.py # CLI entry point
├── requirements.txt # Python dependencies
└── README.md
git clone https://github.com/your-username/pdf_translate.git
cd pdf_translate

Windows:
python -m venv venv
venv\Scripts\activate

Linux/macOS:
python -m venv venv
source venv/bin/activate

Install the Python dependencies:
pip install -r requirements.txt

Install the Node.js dependencies:
cd myserver
npm install
cd frontend
npm install
cd ../..

Build the frontend. This step is required if you want to serve the built frontend through myserver/server.js.
cd myserver/frontend
npm run build
cd ../..

From the repository root, start the Gradio interface:
python gui.py

The Gradio interface usually starts at:
http://localhost:7860
Files involved in this mode:
- gui.py
- pdf2zh/ core modules
- requirements.txt (Python dependencies)
Gradio stores working files and outputs in:
pdf2zh_files/
That directory contains:
- copied input PDFs
- mono output PDFs
- dual output PDFs
The Gradio interface exposes several execution options. Here is what they mean in practical terms:
- Type
  - File: upload a local PDF from your machine
  - Link: download a PDF from a URL before translating
- Service
  - Chooses which translation backend translates the extracted text
  - Examples include google, deepl, openai, gemini, ollama, and others
  - Quality, speed, cost, and credential requirements depend on the provider you select
- Translate from
  - Source language of the original PDF content
  - Example: English
- Translate to
  - Target language of the translation output
  - Example: Vietnamese
- Pages
  - All: translate the entire PDF
  - First: translate only the first page
  - First 5 pages: translate the first five pages
  - Others: manually enter page numbers or ranges
- Page range
  - Used when Pages = Others
  - Example values: 1,3,5 or 2-8 or 1,4-6,10
- number of threads
  - Controls how many worker threads are used for translation tasks
  - Higher values may improve speed, but they can also increase CPU and memory usage
  - If you imagine the translator as a team of workers, this option chooses how many workers can handle tasks in parallel
- Skip font subsetting
  - When disabled, the output PDF embeds only the font glyphs it actually uses
  - When enabled, font subsetting is skipped and the fonts are embedded in full
  - Practical effect when enabled:
    - may improve compatibility in some PDF viewers or workflows
    - may make the output file larger
- Ignore cache
  - Forces the system to retranslate content instead of reusing cached translation results
  - Useful when:
    - you changed model settings
    - you changed prompt behavior
    - you want a clean rerun
- Custom Prompt
  - Used only for LLM-style providers that support custom prompting
  - Lets you influence translation style or wording
- Use BabelDOC
  - Uses the experimental BabelDOC-based backend path instead of the standard path
  - Useful for experimentation, but behavior may differ from the default translation path
- Service-specific credential fields
  - Some providers require API keys, endpoints, model names, or host values
  - These fields appear dynamically depending on the selected service
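To make the Ignore cache option concrete, here is a minimal in-memory stand-in for a translation cache. It is an illustration only, not the cache the pdf2zh engine actually uses; the class name and the key layout (service, source language, target language, source text) are assumptions. Because settings such as the model or the prompt are not part of this key, changing them would not invalidate old entries, which is exactly the situation where forcing a fresh translation helps.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (service, lang_in, lang_out, source_text).
# Model name and prompt are deliberately NOT part of the key.
CacheKey = Tuple[str, str, str, str]


class TranslationCache:
    """Minimal in-memory translation cache (illustration, not pdf2zh's cache)."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, str] = {}

    def translate(
        self,
        text: str,
        translate_fn: Callable[[str], str],
        *,
        service: str,
        lang_in: str,
        lang_out: str,
        ignore_cache: bool = False,
    ) -> str:
        key = (service, lang_in, lang_out, text)
        # Reuse a stored result unless the caller asked for a fresh translation.
        if not ignore_cache and key in self._store:
            return self._store[key]
        result = translate_fn(text)
        # Store (or overwrite) the newest result so later runs can reuse it.
        self._store[key] = result
        return result


if __name__ == "__main__":
    calls = []

    def fake_translate(text: str) -> str:
        calls.append(text)  # count how often the "provider" is actually called
        return text.upper()

    cache = TranslationCache()
    opts = dict(service="google", lang_in="en", lang_out="vi")
    cache.translate("hello", fake_translate, **opts)
    cache.translate("hello", fake_translate, **opts)  # served from the cache
    cache.translate("hello", fake_translate, ignore_cache=True, **opts)  # forced retranslation
    print(len(calls))  # 2
```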
After translation, Gradio returns:
- *-mono.pdf: translated PDF only
- *-dual.pdf: bilingual PDF, typically combining source and translated content
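If you script around the Gradio outputs, you may want to predict the output file names. The helper below is a hypothetical sketch that assumes the `*` in `*-mono.pdf` and `*-dual.pdf` is the stem of the input file name (so `paper.pdf` yields `paper-mono.pdf` and `paper-dual.pdf`); check the actual files in pdf2zh_files/ before relying on it.

```python
from pathlib import Path
from typing import Tuple


def output_names(input_pdf: str, output_dir: str = ".") -> Tuple[Path, Path]:
    """Return the expected (mono, dual) output paths for an input PDF.

    Assumption: the output prefix is the input file's stem (name without .pdf).
    """
    stem = Path(input_pdf).stem
    out = Path(output_dir)
    return out / f"{stem}-mono.pdf", out / f"{stem}-dual.pdf"


if __name__ == "__main__":
    mono, dual = output_names("pdf2zh_files/paper.pdf", "pdf2zh_files")
    print(mono.as_posix(), dual.as_posix())
```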
The web mode (React + Flask) is the recommended mode if you want a browser-based workflow with a separate frontend and backend.
Open Terminal 1:
cd backend
python app.py

The Flask API usually starts at:
http://localhost:5000

Open Terminal 2:
cd myserver/frontend
npm start

The React app usually starts at:
http://localhost:3000
Files involved in this mode:
- backend/app.py
- myserver/frontend/src/
- myserver/frontend/public/
- pdf2zh/ core modules
When started as documented above, Flask stores files in backend/uploads/ and backend/outputs/.
Those folders are used for:
- uploaded source PDFs
- per-session translated outputs
The web interface exposes nearly the same runtime concepts as the Gradio UI:
- Upload from computer / Upload from URL: defines where the source PDF comes from
- Service: chooses the translation backend
- From language: source language
- To language: target language, including Vietnamese
- Page range: All, First, First 5 pages, or custom pages
- Threads: controls parallel work during translation
- Skip font subsetting: may improve compatibility, but can increase output size
- Ignore cache: forces fresh translation instead of reusing cached results
- Custom prompt: available for LLM-capable providers
- Use BabelDOC: switches to the experimental translation path
- Provider-specific environment values: used for API keys, models, endpoints, or host configuration
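The Threads option only pays off when translation work can run concurrently. The sketch below shows the general idea with Python's standard `concurrent.futures.ThreadPoolExecutor`; it assumes the text has already been split into independent segments, and `translate_segments` and `translate_fn` are hypothetical placeholders for a provider call, so this is not the engine's actual scheduling code. Threads suit this workload because most of the time is spent waiting on network responses rather than computing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def translate_segments(
    segments: List[str],
    translate_fn: Callable[[str], str],
    threads: int = 4,
) -> List[str]:
    """Translate independent text segments using a pool of worker threads."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Each worker picks up the next pending segment; pool.map still returns
    # results in input order, so output segment i matches input segment i.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(translate_fn, segments))


if __name__ == "__main__":
    import time

    def fake_provider_call(text: str) -> str:
        time.sleep(0.1)  # stands in for network latency to a translation service
        return f"[vi] {text}"

    lines = [f"line {i}" for i in range(8)]
    for n in (1, 4):
        start = time.perf_counter()
        translate_segments(lines, fake_provider_call, threads=n)
        print(f"threads={n}: {time.perf_counter() - start:.2f}s")
```

With a latency-bound fake provider like this one, raising the worker count shortens wall-clock time roughly in proportion, until provider rate limits or local CPU become the bottleneck.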
This is the actual execution flow:
- The frontend uploads a local file or sends a URL to the Flask backend.
- The backend stores the input file under backend/uploads/.
- The backend starts translation in a background thread.
- Progress is exposed through a status endpoint.
- Output files are written to backend/outputs/<session_id>/.
- The frontend lets the user download mono and dual PDFs.
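The flow above (a background thread per job, a pollable status, and a cancel request) can be sketched without Flask as a small in-memory registry. Everything here, including `JobRegistry`, the `work(out_dir, report, cancel)` callback, and the status strings, is a hypothetical illustration rather than the code in backend/app.py. In a setup like this, a status route could return `registry.status(session_id)` as JSON, and a cancel route could call `registry.cancel(session_id)`.

```python
import threading
import time
import uuid
from pathlib import Path
from typing import Callable, Dict

# Hypothetical callback signature: work(out_dir, report_progress, cancel_event)
WorkFn = Callable[[Path, Callable[[float], None], threading.Event], None]


class JobRegistry:
    """In-memory tracker for background translation jobs (illustrative only)."""

    def __init__(self, outputs_root: str = "backend/outputs") -> None:
        self.outputs_root = Path(outputs_root)
        self._lock = threading.Lock()
        self._jobs: Dict[str, dict] = {}

    def start(self, work: WorkFn) -> str:
        session_id = uuid.uuid4().hex
        out_dir = self.outputs_root / session_id  # mirrors backend/outputs/<session_id>/
        cancel = threading.Event()
        with self._lock:
            self._jobs[session_id] = {"status": "running", "progress": 0.0, "cancel": cancel}

        def report(progress: float) -> None:
            with self._lock:
                self._jobs[session_id]["progress"] = progress

        def run() -> None:
            try:
                work(out_dir, report, cancel)
                final = "cancelled" if cancel.is_set() else "done"
            except Exception as exc:  # report failures through status() instead of crashing
                final = f"error: {exc}"
            with self._lock:
                self._jobs[session_id]["status"] = final

        # Daemon thread so a stuck job does not block interpreter shutdown.
        threading.Thread(target=run, daemon=True).start()
        return session_id

    def status(self, session_id: str) -> dict:
        with self._lock:
            job = self._jobs[session_id]
            return {"status": job["status"], "progress": job["progress"]}

    def cancel(self, session_id: str) -> None:
        # Cooperative cancellation: the work function must check the event.
        with self._lock:
            self._jobs[session_id]["cancel"].set()


if __name__ == "__main__":
    def fake_work(out_dir: Path, report: Callable[[float], None], cancel: threading.Event) -> None:
        for step in range(1, 6):
            if cancel.is_set():
                return
            time.sleep(0.05)  # stands in for translating one page
            report(step / 5)

    registry = JobRegistry()
    sid = registry.start(fake_work)
    while registry.status(sid)["status"] == "running":
        time.sleep(0.02)
    print(registry.status(sid))
```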
That means the web UI is mainly a control panel, while the heavy PDF processing still happens inside the Python core.
You can also use the project from the command line.
python pdf_translate.py --help
python pdf_translate.py paper.pdf --lang-in en --lang-out vi --service google

Main options:
- files: one or more input PDF files
- --lang-in: source language code, for example en
- --lang-out: target language code, for example vi
- --service: translation service name, for example google, openai, gemini, ollama
- --pages: page selection string, for example 1,3,5-8
- --output: output directory for generated PDFs
- --thread: number of worker threads
- --interactive: starts the Gradio interface instead of direct CLI translation
- --share: enables Gradio sharing mode
- --prompt: path to a prompt template file
- --babeldoc: uses the experimental BabelDOC backend
- --skip-subset-fonts: skips font subsetting
- --ignore-cache: forces fresh translation
- --compatible: converts the input PDF to PDF/A first to improve compatibility in some cases
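The `--pages` string (and the custom page range in the UIs) combines single pages and inclusive ranges. A minimal parser for that syntax might look like the following. `parse_pages` is a hypothetical name; the sketch assumes 1-based page numbers in the input and returns 0-based indices, which is a common convention in PDF libraries but an assumption about how this project consumes them.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Turn a page selection string like "1,3,5-8" into 0-based page indices.

    Pages in the string are 1-based (as a user would type them); ranges are
    inclusive on both ends. Duplicates are removed and the result is sorted.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start - 1, end))  # 1-based inclusive -> 0-based
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page - 1)
    return sorted(pages)


if __name__ == "__main__":
    print(parse_pages("1,3,5-8"))  # [0, 2, 4, 5, 6, 7]
```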
The repository exposes multiple providers through the UI and backend, including:
- Google Translate
- Bing Translator
- DeepL
- DeepLX
- Ollama
- Xinference
- Azure OpenAI
- OpenAI
- Zhipu
- ModelScope
- Silicon
- Gemini
- Azure Translator
- Tencent
- Dify
- AnythingLLM
- Argos
- Grok
- Groq
- Deepseek
- Qwen-MT
Availability depends on:
- installed dependencies
- provider credentials
- selected runtime mode
- local environment configuration
The Flask backend exposes the following endpoints:
- GET /api/services: returns available translation services and languages
- POST /api/upload: uploads a PDF file
- POST /api/url: downloads a PDF from a URL
- POST /api/translate: starts the translation process
- GET /api/status/<session_id>: returns translation progress and status
- POST /api/cancel/<session_id>: requests cancellation of an active translation job
- GET /api/download/<file_path>: downloads a generated output PDF
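A client that waits for a job to finish would poll the status endpoint. This is a sketch using only the standard library; the `status` field and its terminal values (`completed`, `error`, `cancelled`) are assumptions about the response format, so inspect a real response from `/api/status/<session_id>` and adjust `TERMINAL` accordingly. The `fetch` parameter makes the polling loop testable without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:5000"  # default Flask address from this README

# Assumption: the JSON payload has a "status" field, and these values mean "finished".
TERMINAL = {"completed", "error", "cancelled"}


def http_get_json(url: str) -> Dict[str, Any]:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(
    session_id: str,
    fetch: Callable[[str], Dict[str, Any]] = http_get_json,
    interval: float = 1.0,
    max_polls: int = 600,
) -> Dict[str, Any]:
    """Poll GET /api/status/<session_id> until the job reaches a terminal state."""
    url = f"{BASE_URL}/api/status/{session_id}"
    for _ in range(max_polls):
        payload = fetch(url)
        if payload.get("status") in TERMINAL:
            return payload
        time.sleep(interval)  # back off between polls to avoid hammering the backend
    raise TimeoutError(f"job {session_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python wait_for_job.py <session_id>
    # Requires the Flask backend to be running.
    if len(sys.argv) > 1:
        print(wait_for_job(sys.argv[1]))
```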
Common variables used by the project include:
- PORT: port for the Node.js server (default: 3000)
- FLASK_API_URL: backend URL used by the Node.js server (default: http://localhost:5000)
- REACT_APP_API_URL: backend URL used by the React frontend (default: http://localhost:5000)
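A quick way to see which values a process will actually pick up is to resolve the variables against their documented defaults. The `resolve_settings` helper below is a hypothetical illustration, not part of the repository. Note that if the React frontend is a Create React App project (the `REACT_APP_` prefix suggests it is), `REACT_APP_API_URL` is read when `npm start` or `npm run build` runs, so changing it later requires a restart or rebuild.

```python
import os
from typing import Dict, Mapping, Optional

# Documented defaults for the web-mode variables.
DEFAULTS: Dict[str, str] = {
    "PORT": "3000",
    "FLASK_API_URL": "http://localhost:5000",
    "REACT_APP_API_URL": "http://localhost:5000",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each documented variable's effective value (environment first, then default)."""
    source = os.environ if env is None else env
    settings = {name: source.get(name, default) for name, default in DEFAULTS.items()}
    # Catch an obviously broken PORT early instead of at server start-up.
    if not settings["PORT"].isdigit():
        raise ValueError(f"PORT must be a number, got {settings['PORT']!r}")
    return settings


if __name__ == "__main__":
    for name, value in resolve_settings().items():
        print(f"{name}={value}")
```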
Provider-specific variables depend on the selected translation service. For example:
- OpenAI-style providers may require API keys and model names
- self-hosted providers may require a host or endpoint URL
- enterprise providers may require both an endpoint and a credential key
- This repository is not a replacement for the original PDFMathTranslate engine.
- The core translation and PDF reconstruction logic still relies on the upstream project architecture.
- Vietnamese support and the runnable interfaces are the main custom focus of this repository.
- Output quality depends heavily on:
- source PDF quality
- selected translation provider
- page complexity
- mathematical layout density
This repository continues to depend on the upstream project. Please review the original project license and all third-party service terms before production use.