Skip to content

⚡ Bolt: optimize quota calculation and Claude detection#338

Open
Laisky wants to merge 1 commit into
mainfrom
bolt-optimize-quota-calculation-1781255884396622314
Open

⚡ Bolt: optimize quota calculation and Claude detection#338
Laisky wants to merge 1 commit into
mainfrom
bolt-optimize-quota-calculation-1781255884396622314

Conversation

@Laisky
Copy link
Copy Markdown
Owner

@Laisky Laisky commented Apr 24, 2026

This PR introduces performance optimizations in the quota calculation hot path:

  • Consolidates completion ratio resolution in Compute to avoid redundant lookups.
  • Eliminates manual tier iteration loops by deriving effective completion ratios from eff.OutputRatio / eff.InputRatio.
  • Optimizes containsASCIIFold (used for Claude detection) by pre-calculating search character variants, avoiding redundant function calls in the loop.

These changes improve efficiency and reduce CPU overhead for core billing operations. Verified with go test ./relay/quota/....


PR created automatically by Jules for task 1781255884396622314 started by @Laisky

Summary by CodeRabbit

  • Documentation

    • Added documentation for quota performance optimizations.
  • Refactor

    • Optimized quota completion ratio calculation efficiency.
    • Improved string search performance through enhanced processing.

Consolidate ratio resolution and optimize string search in quota.go.
Derive tiered completion ratio from eff ratios to avoid manual looping.
Optimize containsASCIIFold for faster Claude detection.

Co-authored-by: Laisky <4532436+Laisky@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings April 24, 2026 16:14
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

This PR optimizes the quota computation hot path by inlining completion ratio resolution and deriving it from effective pricing ratios (eff.OutputRatio / eff.InputRatio) instead of iterating through tiers. It also optimizes ASCII-fold string detection via precomputed uppercase variants and fast byte-matching.

Changes

Cohort / File(s) Summary
Documentation
.jules/bolt.md
Added documentation section describing completion ratio optimization and ASCII-fold detection improvements on the quota hot path.
Quota Computation Logic
relay/quota/quota.go
Refactored Compute to inline completionRatioResolved derivation with priority fallback (channel ratio → model config → three-layer pricing), replaced tier-iteration completion ratio calculation with direct ratio derivation (eff.OutputRatio / eff.InputRatio), and optimized containsASCIIFold with precomputed uppercase byte and fast path matching.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 Ratios dance in pricing's gleam,
Where tiers once toiled in loops extreme,
Now effective paths shine bright and fast,
No fold too costly, folds unsurpassed! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: optimizing quota calculation and Claude detection through performance improvements in the hot path.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bolt-optimize-quota-calculation-1781255884396622314

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

Merging this branch will not change overall coverage

Impacted Packages Coverage Δ 🤖
github.com/Laisky/one-api/relay/quota 0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/Laisky/one-api/relay/quota/quota.go 0.00% (ø) 0 0 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the quota calculation hot path in 'relay/quota/quota.go' by streamlining tier resolution and introducing a fast-path byte search in 'containsASCIIFold'. While the changes improve performance, the reviewer identified a bug in the new logic where manual ratio overrides are ignored when the input ratio is zero. A code suggestion has been provided to correctly handle these overrides.

Comment thread relay/quota/quota.go
Comment on lines +96 to 100
if eff.InputRatio != 0 {
usedCompletionRatio = eff.OutputRatio / eff.InputRatio
} else {
baseComp = 1.0
usedCompletionRatio = 1.0
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The optimization of deriving usedCompletionRatio via eff.OutputRatio / eff.InputRatio introduces a bug when the model is free in the configuration (eff.InputRatio == 0) but has a manual ratio override (hasChannelModelRatioOverride is true). In this scenario, the division is skipped and the ratio defaults to 1.0, whereas it should use the completionRatioResolved value (which respects channel-level or global defaults). This leads to incorrect quota calculation for completion tokens on overridden free models.

Suggested change
if eff.InputRatio != 0 {
usedCompletionRatio = eff.OutputRatio / eff.InputRatio
} else {
baseComp = 1.0
usedCompletionRatio = 1.0
}
if eff.InputRatio != 0 {
usedCompletionRatio = eff.OutputRatio / eff.InputRatio
} else if hasChannelModelRatioOverride {
usedCompletionRatio = completionRatioResolved
} else {
usedCompletionRatio = 1.0
}

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR optimizes the quota computation hot path by reducing redundant completion-ratio lookups and eliminating manual tier iteration, and it speeds up Claude model detection via a faster ASCII-fold substring check.

Changes:

  • Consolidate completion ratio resolution in Compute to reduce repeated lookups.
  • Derive effective tier completion ratios from eff.OutputRatio / eff.InputRatio to avoid tier loops.
  • Optimize containsASCIIFold by precomputing the first-byte case variants to reduce per-iteration work.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
relay/quota/quota.go Optimizes quota ratio resolution and speeds up ASCII-fold substring matching used in model detection.
.jules/bolt.md Documents the performance learnings and actions taken in this optimization pass.

Comment thread relay/quota/quota.go
Comment on lines +265 to 281
// substr is already expected to be lowercase from the caller (isClaudeModelName).
// We pre-calculate the uppercase variant of the first byte to allow a fast search
// that avoids calling asciiLower on every character in the model name string.
firstLower := substr[0]
var firstUpper byte
if firstLower >= 'a' && firstLower <= 'z' {
firstUpper = firstLower - ('a' - 'A')
} else {
firstUpper = firstLower
}

last := len(s) - len(substr)
for i := 0; i <= last; i++ {
if asciiLower(s[i]) != substr[0] {
// Fast path: match the first byte against both possible cases.
if s[i] != firstLower && s[i] != firstUpper {
continue
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containsASCIIFold implies case-insensitive matching, but the implementation assumes substr is already lowercase (and will fail if substr[0] is uppercase). To make the helper robust and self-contained, normalize the first search byte inside this function (e.g., compute firstLower from the folded/normalized value of substr[0]) and derive firstUpper from that; alternatively, rename the function to reflect the lowercase-needle requirement so callers don’t accidentally misuse it.

Copilot uses AI. Check for mistakes.
Comment thread relay/quota/quota.go
Comment on lines 40 to 41
// Compute calculates the quota required for the provided usage snapshot.
// It mirrors the logic used in controller helper functions so streaming
// billing and final reconciliation share the same pricing semantics.
func Compute(input ComputeInput) ComputeResult {
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removed doc comment lines explained an important invariant: Compute mirrors controller helper logic so streaming billing and final reconciliation share pricing semantics. Consider restoring that explanation (or an equivalent) since it’s a key behavioral contract for a billing hot path and helps prevent accidental divergence in future changes.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
relay/quota/quota.go (1)

230-247: 🛠️ Refactor suggestion | 🟠 Major

Remove orphaned helper resolveCompletionRatio—unused after inlining into Compute.

This unexported function (lines 230–247) was inlined into Compute and is now dead code with no remaining callers. Remove it to keep the package surface tight and avoid confusion.

Removal diff
-// resolveCompletionRatio returns the effective completion ratio for modelName.
-// Parameters: resolvedModelCfg is the already-loaded model config and hasResolvedModelCfg reports whether it exists.
-// Returns: the completion ratio after applying channel overrides, config values, and fallback pricing lookup.
-func resolveCompletionRatio(
-	modelName string,
-	resolvedModelCfg adaptor.ModelConfig,
-	hasResolvedModelCfg bool,
-	channelOverrides map[string]float64,
-	provider adaptor.Adaptor,
-) float64 {
-	if override, ok := channelOverrides[modelName]; ok {
-		return override
-	}
-	if hasResolvedModelCfg && resolvedModelCfg.CompletionRatio != 0 {
-		return resolvedModelCfg.CompletionRatio
-	}
-	return pricing.GetCompletionRatioWithThreeLayers(modelName, channelOverrides, provider)
-}
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@relay/quota/quota.go` around lines 230 - 247, Delete the now-unused helper
function resolveCompletionRatio (the unexported function that takes modelName,
resolvedModelCfg adaptor.ModelConfig, hasResolvedModelCfg, channelOverrides,
provider) from quota.go; ensure any remaining logic is served by the inlined
code in Compute and that there are no lingering references to
resolveCompletionRatio or calls to pricing.GetCompletionRatioWithThreeLayers
left elsewhere; remove the entire function block to keep the package surface
minimal and run a quick build to confirm no missing-symbol errors.
🧹 Nitpick comments (2)
relay/quota/quota.go (1)

96-100: Consider preserving completionRatioResolved in the zero-InputRatio fallback.

The derivation eff.OutputRatio / eff.InputRatio is mathematically equivalent to the previous tier-iteration (since eff.OutputRatio = in * comp in ResolveEffectivePricingFromConfig), so the optimization is correct. However, when eff.InputRatio == 0 you now hard-code usedCompletionRatio = 1.0, whereas the previous default was completionRatioResolved (set at line 86). Billing isn't affected in that branch because usedModelRatio is also 0, but the debug log at lines 199–207 will report completion_ratio=1.0 instead of the configured ratio, degrading diagnosability.

♻️ Proposed fix
-		if eff.InputRatio != 0 {
-			usedCompletionRatio = eff.OutputRatio / eff.InputRatio
-		} else {
-			usedCompletionRatio = 1.0
-		}
+		if eff.InputRatio != 0 {
+			usedCompletionRatio = eff.OutputRatio / eff.InputRatio
+		}
+		// else: keep the initial completionRatioResolved for accurate logging

Apply the same change to the adaptor-only branch at lines 107–111.

Also applies to: 107-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@relay/quota/quota.go` around lines 96 - 100, The zero-InputRatio fallback
currently assigns usedCompletionRatio = 1.0 which hides the configured
completionRatioResolved value in logs; instead, when eff.InputRatio == 0 set
usedCompletionRatio = completionRatioResolved so diagnostics reflect the
configured ratio; apply this fix in both places where usedCompletionRatio is
computed (the main branch using eff.InputRatio and the adaptor-only branch) and
keep the rest of the logic (usedModelRatio and
ResolveEffectivePricingFromConfig) unchanged.
.jules/bolt.md (1)

42-44: Nit: add a blank line after the heading for consistency.

Every other entry in this file separates the ## heading from the **Learning:** line with a blank line; this new entry does not.

♻️ Proposed fix
 ## 2026-04-24 - [Streamline Quota Calculation and Fast ASCII Fold]
+
 **Learning:** The quota calculation hot path in `relay/quota/quota.go` was performing manual tier iterations to resolve completion ratios, which is redundant since `eff.OutputRatio / eff.InputRatio` already provides the effective ratio for the matched tier. Additionally, string searches like `containsASCIIFold` (used for Claude detection) can be optimized by pre-calculating uppercase/lowercase variants of the first search byte to avoid expensive normalization calls in the search loop.
 **Action:** Consolidate ratio resolution by deriving tiered completion ratios from effective pricing results and use fast-path byte matching in string search functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.jules/bolt.md around lines 42 - 44, Add a single blank line immediately
after the "## 2026-04-24 - [Streamline Quota Calculation and Fast ASCII Fold]"
heading so the heading is separated from the subsequent "**Learning:**"
paragraph, matching the file's established spacing convention for other entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@relay/quota/quota.go`:
- Around line 230-247: Delete the now-unused helper function
resolveCompletionRatio (the unexported function that takes modelName,
resolvedModelCfg adaptor.ModelConfig, hasResolvedModelCfg, channelOverrides,
provider) from quota.go; ensure any remaining logic is served by the inlined
code in Compute and that there are no lingering references to
resolveCompletionRatio or calls to pricing.GetCompletionRatioWithThreeLayers
left elsewhere; remove the entire function block to keep the package surface
minimal and run a quick build to confirm no missing-symbol errors.

---

Nitpick comments:
In @.jules/bolt.md:
- Around line 42-44: Add a single blank line immediately after the "##
2026-04-24 - [Streamline Quota Calculation and Fast ASCII Fold]" heading so the
heading is separated from the subsequent "**Learning:**" paragraph, matching the
file's established spacing convention for other entries.

In `@relay/quota/quota.go`:
- Around line 96-100: The zero-InputRatio fallback currently assigns
usedCompletionRatio = 1.0 which hides the configured completionRatioResolved
value in logs; instead, when eff.InputRatio == 0 set usedCompletionRatio =
completionRatioResolved so diagnostics reflect the configured ratio; apply this
fix in both places where usedCompletionRatio is computed (the main branch using
eff.InputRatio and the adaptor-only branch) and keep the rest of the logic
(usedModelRatio and ResolveEffectivePricingFromConfig) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b52b8e6-073d-4103-be92-589d341b448c

📥 Commits

Reviewing files that changed from the base of the PR and between fd96426 and 234a147.

📒 Files selected for processing (2)
  • .jules/bolt.md
  • relay/quota/quota.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants