Skip to content

Limit diff output to 55,000 characters#26

Open
chrisbuetti wants to merge 1 commit intom1guelpf:mainfrom
chrisbuetti:MaxError
Open

Limit diff output to 55,000 characters#26
chrisbuetti wants to merge 1 commit intom1guelpf:mainfrom
chrisbuetti:MaxError

Conversation

@chrisbuetti
Copy link
Copy Markdown

Limit the diff output to 55,000 characters and add a message indicating truncation.

Limit the diff output to 55,000 characters and add a message indicating truncation.
// At ~4 chars/token, that's roughly 55,000 chars of diff we can safely include.
const MAX_DIFF_CHARS: usize = 55_000;
let output = if output.len() > MAX_DIFF_CHARS {
let truncated = &output[..output[..MAX_DIFF_CHARS].rfind('\n').unwrap_or(MAX_DIFF_CHARS)];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code slices a string at a fixed byte offset MAX_DIFF_CHARS. This will panic if the offset falls in the middle of a multi-byte UTF-8 character, crashing the program.
Severity: CRITICAL

Suggested Fix

Instead of slicing by a raw byte index, find a valid character boundary at or before MAX_DIFF_CHARS. You can achieve this by iterating through the string's characters and stopping once the byte length exceeds the limit, or by using is_char_boundary to find the last valid boundary before the limit. For example: let mut end = MAX_DIFF_CHARS; while !output.is_char_boundary(end) { end -= 1; } and then use end as the slicing index.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/main.rs#L117

Potential issue: The code at `src/main.rs:117` truncates diff output by slicing the
string `output` at a fixed byte offset, `MAX_DIFF_CHARS`. However, `output` is a UTF-8
string, and slicing a `&str` at a byte index that does not align with a character
boundary will cause a panic. Since `git diff` output can contain multi-byte Unicode
characters (e.g., emoji, non-Latin scripts), there is a significant risk that the
`MAX_DIFF_CHARS` boundary will fall within a multi-byte character. This will trigger a
panic and crash the application when a user generates a large diff containing such
characters. The subsequent `rfind` call does not prevent this, as the problematic slice
occurs first.

Did we get this right? 👍 / 👎 to inform future reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant