Make LLM batch translation thresholds configurable to reduce API calls for multi-page PDFs by reflyable · Pull Request #591 · funstory-ai/BabelDOC

reflyable · 2026-05-12T11:36:46Z

PR Title

[PR] Make LLM batch translation thresholds configurable to reduce API calls for multi-page PDFs

Motivation and Context

Summary of Changes

When translating multi-page PDFs, the LLM batch translation thresholds were hardcoded to 200 tokens or 5 paragraphs. This caused excessive fragmentation: a 30-page PDF with ~10 paragraphs per page would generate 90–150 individual LLM API calls. Each call incurs rate-limiter queuing, network round-trip time, and LLM inference overhead, making the translation of long documents unnecessarily slow.
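
To make the fragmentation concrete, here is a minimal sketch of greedy batching under two thresholds. It is not BabelDOC's actual batching code: the names `BatchLimits` and `count_batches`, the "flush once either threshold is reached" rule, and the uniform 70-token paragraphs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BatchLimits:
    # Hypothetical container; defaults mirror the previously hardcoded values.
    max_tokens: int = 200
    max_paragraphs: int = 5


def count_batches(paragraph_tokens: list[int], limits: BatchLimits) -> int:
    """Count LLM requests when paragraphs are packed greedily.

    Assumed rule: a batch is sent as soon as it reaches either threshold.
    """
    batches = 0
    tokens_in_batch = 0
    paragraphs_in_batch = 0
    for tokens in paragraph_tokens:
        tokens_in_batch += tokens
        paragraphs_in_batch += 1
        if (
            tokens_in_batch >= limits.max_tokens
            or paragraphs_in_batch >= limits.max_paragraphs
        ):
            batches += 1
            tokens_in_batch = 0
            paragraphs_in_batch = 0
    if paragraphs_in_batch:
        # Send whatever is left over as a final, smaller batch.
        batches += 1
    return batches


# 30 pages x 10 paragraphs, each assumed to be 70 tokens long.
document = [70] * 300
print(count_batches(document, BatchLimits()))          # 100 requests
print(count_batches(document, BatchLimits(2000, 50)))  # 11 requests
```

Under these assumptions the token limit, not the paragraph limit, is what closes most batches, so raising only the paragraph threshold would have little effect.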

PR Type

Breaking Changes

No, this PR does not introduce breaking changes.

Default behavior now packs more paragraphs per LLM request, which may increase individual request latency but significantly reduces the total number of API calls. Users can tune the thresholds via CLI or config file to match their model's context window and desired latency profile.

Contributor Checklist

I have fully read and understood the CONTRIBUTING.md guide.
I have performed a self-review of my own code.
My changes follow the project's code style and guidelines.
I have linked the related issue(s) in the description above (if applicable).
I have updated relevant documentation (if applicable).
I have added necessary tests that prove my fix is effective or that my feature works (if applicable).
All new and existing tests passed locally with my changes.
My changes generate no new warnings or errors.
I understand that due to limited maintainer resources, only small PRs are accepted. Suggestions with proof-of-concept patches are appreciated, and my patch may be rewritten if necessary.

Summary by cubic

Makes LLM batch translation thresholds configurable to pack more paragraphs per request and cut API calls for multi-page PDFs. Defaults match previous behavior (200 tokens, 5 paragraphs), so you can opt in to larger batches as needed.

New Features
- Added --llm-batch-max-tokens and --llm-batch-max-paragraphs to control batch sizing.
- PDF translator now reads limits from TranslationConfig instead of hardcoded values.
- Raise thresholds to reduce requests on long PDFs based on your model’s context window and latency needs.
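
The two flags above could be wired into a config object roughly as follows. This is a sketch only: it uses the standard library's `argparse`, and `BatchConfig` and `parse_batch_config` are hypothetical stand-ins rather than the project's real `TranslationConfig` or CLI code. Only the flag names and the 200/5 defaults come from this PR.

```python
import argparse
from dataclasses import dataclass


@dataclass
class BatchConfig:
    # Hypothetical stand-in for the batch-related fields of TranslationConfig.
    llm_batch_max_tokens: int = 200
    llm_batch_max_paragraphs: int = 5


def parse_batch_config(argv: list[str]) -> BatchConfig:
    """Parse the two batch-sizing flags and reject non-positive values."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--llm-batch-max-tokens", type=int, default=200)
    parser.add_argument("--llm-batch-max-paragraphs", type=int, default=5)
    args = parser.parse_args(argv)
    if args.llm_batch_max_tokens <= 0 or args.llm_batch_max_paragraphs <= 0:
        parser.error("batch thresholds must be positive integers")
    return BatchConfig(
        llm_batch_max_tokens=args.llm_batch_max_tokens,
        llm_batch_max_paragraphs=args.llm_batch_max_paragraphs,
    )


# Opting in to larger batches for a model with a large context window.
config = parse_batch_config(
    ["--llm-batch-max-tokens", "2000", "--llm-batch-max-paragraphs", "50"]
)
print(config)
```

Keeping the defaults at the old values means existing invocations behave as before, and larger batches are strictly opt-in.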

Written for commit d7e3c4f. Summary will update on new commits.

cubic-dev-ai

No issues found across 3 files

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="babeldoc/format/pdf/translation_config.py">

<violation number="1" location="babeldoc/format/pdf/translation_config.py:220">
P1: LLM batch defaults (200 tokens/5 paragraphs) contradict PR goal of larger batches to reduce API calls</violation>
</file>


Add configuration for the maximum tokens and paragraph count of LLM batch translation requests

d74fa46

cubic-dev-ai Bot reviewed May 12, 2026

awwaawwa requested changes May 14, 2026

Comment thread babeldoc/format/pdf/translation_config.py Outdated

Comment thread babeldoc/main.py Outdated

Adjust configuration for the maximum tokens and paragraph count of LLM batch translation requests

d7e3c4f

cubic-dev-ai Bot reviewed May 14, 2026

Comment thread babeldoc/format/pdf/translation_config.py

reflyable requested a review from awwaawwa May 14, 2026 12:15

reflyable wants to merge 2 commits into funstory-ai:main from reflyable:main

2 participants
