49 changes: 34 additions & 15 deletions chuck/tasks/graph_analytics/task.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import numpy as np
 from random import Random
 from typing import Any

@@ -21,26 +22,44 @@ def generate(node_count: int, seed: int) -> dict[str, list[str]]:
 
 def solve(graph: dict[str, list[str]], iterations: int = 16, damping: float = 0.85) -> dict[str, Any]:
     nodes = sorted(graph)
-    if not nodes:
+    n = len(nodes)
+    if n == 0:
         return {"node_count": 0, "top_node": "", "top_score": 0.0, "checksum": 0.0}
 
-    rank = {node: 1.0 / len(nodes) for node in nodes}
-    outgoing = {node: graph[node] if graph[node] else nodes for node in nodes}
-    base = (1.0 - damping) / len(nodes)
+    node_to_idx = {node: i for i, node in enumerate(nodes)}
+
+    src_list = []
+    dst_list = []
+    out_degrees = np.zeros(n)
+
+    for i, node in enumerate(nodes):
+        neighbors = graph[node]
+        out_degrees[i] = len(neighbors)
+        for neighbor in neighbors:
+            src_list.append(i)
+            dst_list.append(node_to_idx[neighbor])
+
+    src_indices = np.array(src_list)
+    dst_indices = np.array(dst_list)
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Comment on lines +42 to +43

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python -m pip install --quiet numpy
python - <<'PY'
import numpy as np

bad = np.array([])
good = np.array([], dtype=np.intp)
ranks = np.array([1.0])

for label, idx in [("bad", bad), ("good", good)]:
    print(label, idx.dtype)
    try:
        print(ranks[idx])
    except Exception as exc:
        print(type(exc).__name__, exc)
PY

Repository: iiitl/chuck

Length of output: 151


🏁 Script executed:

head -60 chuck/tasks/graph_analytics/task.py | tail -30

Repository: iiitl/chuck

Length of output: 921


🏁 Script executed:

head -70 chuck/tasks/graph_analytics/task.py | cat -n

Repository: iiitl/chuck

Length of output: 2466


Force an integer dtype for empty edge lists.

If a caller passes a graph whose adjacency lists are all empty, these arrays become float64, and line 50 then raises IndexError on ranks[src_indices]. Using an explicit integer index dtype keeps the zero-edge case as an empty no-op instead of crashing.

🩹 Proposed fix
-    src_indices = np.array(src_list)
-    dst_indices = np.array(dst_list)
+    src_indices = np.array(src_list, dtype=np.intp)
+    dst_indices = np.array(dst_list, dtype=np.intp)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/task.py` around lines 42 - 43, src_list and
dst_list are converted to NumPy arrays without a dtype, so when they are empty
the arrays default to float64 and the later ranks[src_indices] lookup in solve
raises IndexError. Fix by constructing src_indices and dst_indices with an
explicit integer dtype (e.g., np.intp or np.int64) in the np.array(src_list)
and np.array(dst_list) calls, so empty edge lists remain integer-typed and
indexing becomes a no-op.
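
The zero-edge case described in this thread can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical helper `build_edge_indices` (not part of the PR) that mirrors the edge-list loop in `solve` and applies the proposed `dtype=np.intp`. The two-node graph is made up for illustration.

```python
import numpy as np


def build_edge_indices(graph):
    """Hypothetical helper mirroring the edge-list loop in solve()."""
    nodes = sorted(graph)
    node_to_idx = {node: i for i, node in enumerate(nodes)}
    src_list, dst_list = [], []
    for i, node in enumerate(nodes):
        for neighbor in graph[node]:
            src_list.append(i)
            dst_list.append(node_to_idx[neighbor])
    # An explicit integer dtype keeps empty lists usable as index arrays;
    # without it, np.array([]) would be float64.
    src = np.array(src_list, dtype=np.intp)
    dst = np.array(dst_list, dtype=np.intp)
    return src, dst


# Every adjacency list is empty, so both index arrays have length zero.
src, dst = build_edge_indices({"a": [], "b": []})
ranks = np.full(2, 0.5)
selected = ranks[src]  # empty selection rather than an IndexError
print(src.dtype, selected.shape)
```

With integer-typed indices the per-iteration `np.add.at` call receives empty arrays and leaves `new_ranks` untouched, which is the no-op behavior the comment asks for.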


+
+    ranks = np.full(n, 1.0 / n)
+    teleport_base = (1.0 - damping) / n
+
     for _ in range(iterations):
-        new_rank = {node: base for node in nodes}
-        for node in nodes:
-            share = rank[node] / len(outgoing[node])
-            for target in outgoing[node]:
-                new_rank[target] += damping * share
-        rank = new_rank
-    top_node = max(nodes, key=lambda node: (rank[node], node))
-    checksum = sum((index + 1) * rank[node] for index, node in enumerate(nodes))
+        new_ranks = np.full(n, teleport_base)
+        contributions = (ranks[src_indices] / out_degrees[src_indices]) * damping
+        np.add.at(new_ranks, dst_indices, contributions)
+        ranks = new_ranks
+
+    top_idx = np.argmax(ranks)
+    top_node = nodes[top_idx]
+    checksum = np.sum(np.arange(1, n + 1) * ranks)
Comment on lines +54 to +56

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for original PageRank top-node selection logic in git history
git log -p --all -S 'max(nodes' -- '*.py' | head -80

Repository: iiitl/chuck

Length of output: 37


🏁 Script executed:

# First, check the current state of the file around lines 54-56
wc -l chuck/tasks/graph_analytics/task.py

Repository: iiitl/chuck

Length of output: 93


🏁 Script executed:

# Show the context around lines 54-56
sed -n '45,65p' chuck/tasks/graph_analytics/task.py

Repository: iiitl/chuck

Length of output: 650


🏁 Script executed:

# Check git history for this file to find the original implementation
git log --oneline chuck/tasks/graph_analytics/task.py | head -20

Repository: iiitl/chuck

Length of output: 127


🏁 Script executed:

# Look for the original code in git diff
git log -p chuck/tasks/graph_analytics/task.py | head -200

Repository: iiitl/chuck

Length of output: 37


🏁 Script executed:

# Show the specific commit that changed the PageRank implementation
git show 180e97b

Repository: iiitl/chuck

Length of output: 50370


🏁 Script executed:

# Check if there are any earlier commits or if this is the initial commit
git log --all -- chuck/tasks/graph_analytics/task.py

Repository: iiitl/chuck

Length of output: 434


🏁 Script executed:

# Look for the original code before the optimization commit
git show 180e97b^:chuck/tasks/graph_analytics/task.py 2>/dev/null || echo "No previous version found"

Repository: iiitl/chuck

Length of output: 80


🏁 Script executed:

# Check the previous commit to see if there was an original implementation
git show 3bb6d58:chuck/tasks/graph_analytics/task.py 2>/dev/null | head -100

Repository: iiitl/chuck

Length of output: 1708


Tie-breaking behavior differs from original implementation.

The original max(nodes, key=lambda node: (rank[node], node)) selects the lexicographically larger node name when ranks are tied, while np.argmax(ranks) returns the first occurrence, selecting the lexicographically smaller node name (since nodes are sorted).

Though regression tests pass (suggesting ties may not occur in practice), this is a correctness discrepancy. To align with the original behavior:

-    top_idx = np.argmax(ranks)
-    top_node = nodes[top_idx]
+    max_rank = np.max(ranks)
+    # Among nodes with max rank, pick lexicographically largest (matches original behavior)
+    top_indices = np.where(ranks == max_rank)[0]
+    top_idx = top_indices[-1]  # Last index among ties = largest node name
+    top_node = nodes[top_idx]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    top_idx = np.argmax(ranks)
-    top_node = nodes[top_idx]
-    checksum = np.sum(np.arange(1, n + 1) * ranks)
+    max_rank = np.max(ranks)
+    # Among nodes with max rank, pick lexicographically largest (matches original behavior)
+    top_indices = np.where(ranks == max_rank)[0]
+    top_idx = top_indices[-1]  # Last index among ties = largest node name
+    top_node = nodes[top_idx]
+    checksum = np.sum(np.arange(1, n + 1) * ranks)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/task.py` around lines 54 - 56, the current
selection using np.argmax(ranks) (top_idx/top_node) breaks ties differently than
the original max(nodes, key=...) behavior: np.argmax picks the first occurrence,
while the original chose the lexicographically larger node when ranks tie.
Replace the np.argmax logic with a tie-aware selection. Because ranks is a NumPy
array indexed by position rather than by node name, select over indices, e.g.,
top_idx = max(range(n), key=lambda i: (ranks[i], nodes[i])) followed by
top_node = nodes[top_idx], or take the last index among the maxima. Keep top_idx
defined, since the return value still reads ranks[top_idx]; leave the checksum
calculation (checksum = np.sum(np.arange(1, n + 1) * ranks)) unchanged.
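
The three selection rules discussed in this thread can be compared side by side. This is a minimal sketch on made-up data (three nodes, with `a` and `c` tied for the top rank); the variable names `original`, `first`, and `last` are illustrative and do not appear in the PR.

```python
import numpy as np

nodes = ["a", "b", "c"]             # sorted, as in solve()
ranks = np.array([0.4, 0.2, 0.4])   # "a" and "c" tie for the top score

# Rule from the removed code: highest rank first, then largest node name.
original = max(range(len(nodes)), key=lambda i: (ranks[i], nodes[i]))

# np.argmax returns the first maximum, i.e. the smallest name among ties.
first = int(np.argmax(ranks))

# The last index among the maxima matches the removed rule because nodes are sorted.
last = int(np.flatnonzero(ranks == ranks.max())[-1])

print(nodes[original], nodes[first], nodes[last])
```

Both tie-aware variants rely on exact floating-point equality between tied ranks, which is also what the removed tuple comparison did.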


+
     return {
-        "node_count": len(nodes),
+        "node_count": n,
         "top_node": top_node,
-        "top_score": round6(rank[top_node]),
-        "checksum": round6(checksum),
+        "top_score": round6(ranks[top_idx]),
+        "checksum": round6(float(checksum)),
     }
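
A small harness can check whether the vectorized loop reproduces the dict-based loop it replaces. The sketch below transcribes both loops from this diff into two functions, `pagerank_dict` and `pagerank_numpy` (names invented here, not part of the PR), and runs them on two made-up toy graphs. It assumes the explicit integer index dtype proposed in review. One caveat is visible from the diff itself: the removed code treats a node with no outgoing edges as linking to every node, while the added code as shown has no such fallback. The two versions are therefore expected to agree only when every node has at least one outgoing edge. Whether `generate` can emit empty adjacency lists is not visible in this diff.

```python
import numpy as np


def pagerank_dict(graph, iterations=16, damping=0.85):
    """Dict-based loop, transcribed from the removed lines."""
    nodes = sorted(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    # A node without outgoing edges is treated as linking to every node.
    outgoing = {node: graph[node] if graph[node] else nodes for node in nodes}
    base = (1.0 - damping) / len(nodes)
    for _ in range(iterations):
        new_rank = {node: base for node in nodes}
        for node in nodes:
            share = rank[node] / len(outgoing[node])
            for target in outgoing[node]:
                new_rank[target] += damping * share
        rank = new_rank
    return np.array([rank[node] for node in nodes])


def pagerank_numpy(graph, iterations=16, damping=0.85):
    """Vectorized loop, transcribed from the added lines (integer index dtype assumed)."""
    nodes = sorted(graph)
    n = len(nodes)
    node_to_idx = {node: i for i, node in enumerate(nodes)}
    src_list, dst_list = [], []
    out_degrees = np.zeros(n)
    for i, node in enumerate(nodes):
        out_degrees[i] = len(graph[node])
        for neighbor in graph[node]:
            src_list.append(i)
            dst_list.append(node_to_idx[neighbor])
    src = np.array(src_list, dtype=np.intp)
    dst = np.array(dst_list, dtype=np.intp)
    ranks = np.full(n, 1.0 / n)
    for _ in range(iterations):
        new_ranks = np.full(n, (1.0 - damping) / n)
        # np.add.at accumulates correctly when dst contains repeated indices.
        np.add.at(new_ranks, dst, ranks[src] / out_degrees[src] * damping)
        ranks = new_ranks
    return ranks


# Toy graphs, made up for illustration.
connected = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
dangling = {"a": ["b"], "b": []}  # "b" has no outgoing edges

same = np.allclose(pagerank_dict(connected), pagerank_numpy(connected))
differs = not np.allclose(pagerank_dict(dangling), pagerank_numpy(dangling))
print(same, differs)
```

If the benchmark inputs can contain nodes without outgoing edges, a comparison like this on generated graphs would show whether `top_score` and `checksum` still match the previous implementation.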


3 changes: 3 additions & 0 deletions pyproject.toml
@@ -8,6 +8,9 @@ version = "0.1.0"
 description = "Solver toolkit for 10 computational tasks with optional native C++ backends"
 requires-python = ">=3.10"
 readme = "README.md"
+dependencies = [
+    "numpy>=1.20.0",
+]
 
 [tool.setuptools]

1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
+numpy