Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,21 @@ async fn main() -> Result<(), ()> {
.stdout;
let output = str::from_utf8(&output).unwrap();

// 16,385 token context minus 2,000 for completion and ~500 for system/function overhead.
// At ~4 chars/token, that's roughly 55,000 chars of diff we can safely include.
const MAX_DIFF_CHARS: usize = 55_000;
let output = if output.len() > MAX_DIFF_CHARS {
let truncated = &output[..output[..MAX_DIFF_CHARS].rfind('\n').unwrap_or(MAX_DIFF_CHARS)];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code slices a string at a fixed byte offset MAX_DIFF_CHARS. This will panic if the offset falls in the middle of a multi-byte UTF-8 character, crashing the program.
Severity: CRITICAL

Suggested Fix

Instead of slicing by a raw byte index, find a valid character boundary at or before MAX_DIFF_CHARS. You can achieve this by iterating through the string's characters and stopping once the byte length exceeds the limit, or by using is_char_boundary to find the last valid boundary before the limit. For example: let mut end = MAX_DIFF_CHARS; while !output.is_char_boundary(end) { end -= 1; } and then use end as the slicing index.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/main.rs#L117

Potential issue: The code at `src/main.rs:117` truncates diff output by slicing the
string `output` at a fixed byte offset, `MAX_DIFF_CHARS`. However, `output` is a UTF-8
string, and slicing a `&str` at a byte index that does not align with a character
boundary will cause a panic. Since `git diff` output can contain multi-byte Unicode
characters (e.g., emoji, non-Latin scripts), there is a significant risk that the
`MAX_DIFF_CHARS` boundary will fall within a multi-byte character. This will trigger a
panic and crash the application when a user generates a large diff containing such
characters. The subsequent `rfind` call does not prevent this, as the problematic slice
occurs first.

Did we get this right? 👍 / 👎 to inform future reviews.

format!(
"{}\n\n[diff truncated — {}/{} bytes shown]",
truncated,
truncated.len(),
output.len()
)
} else {
output.to_string()
};

if !cli.dry_run {
info!("Loading Data...");
}
Expand Down