
chore(ai): Add check-code-attribution skill (JAVA-499)#5449

Open
0xadam-brown wants to merge 1 commit into main from chore/check-code-attribution-skill-with-warden

@0xadam-brown (Member) commented May 19, 2026

📜 Description

Adds a check-code-attribution skill that validates license headers and THIRD_PARTY_NOTICES.md entries for code copied or adapted from third parties. It also verifies license compatibility against Sentry's Licensing Policy.

The skill only looks at the branch diff. It takes a pure-LLM approach, in contrast to the part-deterministic, part-LLM approach from #5401 that we decided against.

Reports findings via PR comments when run on CI, or to the terminal when run locally.

Local

To run it from Claude Code:

/check-code-attribution

CI

Warden configuration ensures the skill runs automatically on all PRs:

  • Purely advisory / does not block merge (we can enable blocking later, once we've worked out any rough edges)
  • Generates PR comments with code suggestions for discovered issues
  • Automatically removes stale comments as PRs are updated

💡 Motivation and Context

Third-party code attribution is a legal and compliance requirement. Currently, attribution errors are only caught during manual code review. This skill automates detection of vendored code in branch diffs and can help us flag missing or incomplete attributions before a PR is merged.

Background

Sentry SDKs and third-party code

There are three ways third-party code can enter Sentry’s SDKs (including sentry-java):

  1. Plain vanilla dependencies
  2. Shaded code
  3. Vendored code

All third-party code must be properly attributed, and its license must be compatible with Sentry’s licensing policies.

  • Plain deps + shaded code: We run an enforce-license-compliance GitHub workflow that applies a FOSSA check to all plain vanilla dependencies and our few shaded dependencies. It ensures their licenses are properly attributed and compatible with Sentry’s licensing policies.

  • Vendored code: Relies on a manual process in which developers add attributions to files containing vendored code and add a corresponding entry to the THIRD_PARTY_NOTICES.md file that ships with the SDK. Developers are also responsible for ensuring license compatibility.

The criteria for what counts as proper attribution of vendored code live in the AGENTS.md file under the heading “Third-Party Code Attribution”.

Goal of this PR: Create a skill that helps us properly attribute vendored code

Types of vendored code: 

  1. Vendored code that’s already properly attributed.
  2. Vendored code that has an attribution, but it’s incomplete or doesn’t otherwise conform to the criteria from AGENTS.md.
  3. Vendored code that has no attribution / no indication that it’s vendored. 

The skill introduced in this PR protects (1) from regression and identifies instances of (2). (Addressing (3) is out of scope – and is obviously non-trivial.)

  • addresses: JAVA-499

⚠️ Callouts

The skill does not require license headers to exactly match the template in AGENTS.md, as long as all template fields are present.

This lets us keep our current, diverse header formats and remain relatively unopinionated going forward.
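To make "all template fields present" concrete, here is a minimal, format-agnostic presence check in Java. It is an illustration only (the skill itself is pure-LLM): `HeaderFieldCheck` is a hypothetical class, and the field list and regexes are assumptions inferred from the example findings in this PR rather than copied from the AGENTS.md template.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical check: a header passes if every required field appears somewhere in it. */
final class HeaderFieldCheck {
  // Assumed field set, inferred from this PR's example findings (not the AGENTS.md text).
  private static final Map<String, Pattern> REQUIRED = new LinkedHashMap<>();

  static {
    REQUIRED.put("vendoring statement", Pattern.compile(
        "(?i)\\b(adapted|backported|copied|derived|ported|translated) from\\b|\\bbased on\\b"));
    REQUIRED.put("source URL", Pattern.compile("https?://\\S+"));
    REQUIRED.put("copyright year and holder", Pattern.compile(
        "(?i)\\bcopyright\\s+(\\(c\\)\\s+)?\\d{4}(-\\d{4})?\\s+\\S+"));
    // Only checks that a license is named at all; a real check would validate the name.
    REQUIRED.put("license name", Pattern.compile(
        "(?i)\\blicensed under\\b|\\bSPDX-License-Identifier:|\\bLicense\\b"));
  }

  /** Names of required fields absent from the header, in template order. */
  static List<String> missingFields(String header) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<String, Pattern> field : REQUIRED.entrySet()) {
      if (!field.getValue().matcher(header).find()) {
        missing.add(field.getKey());
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Comment style and line order don't matter; only field presence does.
    String complete = "// Adapted from Metrics-Java SlidingWindowReservoir.\n"
        + "// Copyright 2010-2023 Coda Hale and Yammer, Inc.\n"
        + "// Licensed under the Apache License, Version 2.0.\n"
        + "// https://github.com/dropwizard/metrics";
    String sourceOnly = "// Adapted from Guava RateLimiter.\n"
        + "// https://github.com/google/guava";
    System.out.println(missingFields(complete));   // []
    System.out.println(missingFields(sourceOnly)); // [copyright year and holder, license name]
  }
}
```

Because each field is matched independently of comment style and line order, headers in any of the existing formats pass as long as every field appears somewhere in the text.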

Example output

Local runs

[Screenshot: local output]

PR comments

[Screenshot: GitHub output]

💚 How did you test it?

[1] Automated validation tests (check-code-attribution-tests.sh) with scenario files covering:

[Screenshot: test results]

Note: the tests are not run on CI; for now they are only run manually (see the check-code-attribution-tests.sh script).

[2] Ran the skill on branches with known attribution issues to verify correct detection and reporting.

Manual tests + output

Note the skill's output format has changed since these tests were run, but the behavior remains the same.

Diff 1: Remove entire license header

diff --git a/sentry-android-core/src/main/java/io/sentry/android/core/ANRWatchDog.java b/sentry-android-core/src/main/java/io/sentry/android/core/ANRWatchDog.java
index b726dd0c8..0c11522c1 100644
--- a/sentry-android-core/src/main/java/io/sentry/android/core/ANRWatchDog.java
+++ b/sentry-android-core/src/main/java/io/sentry/android/core/ANRWatchDog.java
@@ -1,27 +1,4 @@
-/*
- * Adapted from https://github.com/SalomonBrys/ANR-WatchDog/blob/1969075f75f5980e9000eaffbaa13b0daf282dcb/anr-watchdog/src/main/java/com/github/anrwatchdog/ANRWatchDog.java
- *
- * The MIT License (MIT)
- *
- * Copyright (c) 2016 Salomon BRYS
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy of
- * this software and associated documentation files (the "Software"), to deal in
- * the Software without restriction, including without limitation the rights to
- * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
- * the Software, and to permit persons to whom the Software is furnished to do so,
- * subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in all
- * copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
- * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
- * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
- * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- */
+/* ANRWatchDog implementation */

Output 1

  1. ⚠️ File: io.sentry.android.core.ANRWatchDog
    Required attribution field(s) removed:
    • The entire MIT license header (ANR-WatchDog by Salomon BRYS) was stripped and replaced with a generic comment. Restore the original attribution header.

Diff 2: Modify existing license header, but retain all required fields

diff --git a/sentry/src/main/java/io/sentry/CircularFifoQueue.java b/sentry/src/main/java/io/sentry/CircularFifoQueue.java
index 8fa72e39d..f0d7d6084 100644
--- a/sentry/src/main/java/io/sentry/CircularFifoQueue.java
+++ b/sentry/src/main/java/io/sentry/CircularFifoQueue.java
@@ -1,20 +1,12 @@
 /*
  * Adapted from https://github.com/apache/commons-collections/blob/fce46cdcc6fa33ba9472921d4b3ec3f548d8cbcc/src/main/java/org/apache/commons/collections4/queue/CircularFifoQueue.java
  *
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
+ * Copyright 2025 The Apache Software Foundation.
+ * Licensed under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *      http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
  */

 package io.sentry;

Output 2

  1. 👀 File: io.sentry.CircularFifoQueue
    Vendored code detected (Apache Commons Collections) – verify that THIRD_PARTY_NOTICES.md reflects your updates.

Diff 3: Modify existing license header by removing one or more required fields

diff --git a/sentry/src/main/java/io/sentry/transport/ReusableCountLatch.java b/sentry/src/main/java/io/sentry/transport/ReusableCountLatch.java
index cfa0b68a6..fb7c4171e 100644
--- a/sentry/src/main/java/io/sentry/transport/ReusableCountLatch.java
+++ b/sentry/src/main/java/io/sentry/transport/ReusableCountLatch.java
@@ -1,8 +1,6 @@
 /*
  * Adapted from https://github.com/MatejTymes/JavaFixes/blob/37e74b9d0a29f7a47485c6d1bb1307f01fb93634/src/main/java/javafixes/concurrency/ReusableCountLatch.java
  *
- * Copyright (C) 2016 Matej Tymes
- *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
  * You may obtain a copy of the License at
@@ -14,8 +12,6 @@
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
- *
- * Additional Sentry modifications: Updated synchronizer for virtual thread compatibility.
  */

 package io.sentry.transport;

Output 3

  1. ⚠️ File: io.sentry.transport.ReusableCountLatch
    Required attribution field(s) removed:
    • The copyright line "Copyright (C) 2016 Matej Tymes" was removed from the license header. Please restore it.

Diff 4: Leave existing license header unchanged, but make an inconsistent modification to THIRD_PARTY_NOTICES.md entry

diff --git a/THIRD_PARTY_NOTICES.md b/THIRD_PARTY_NOTICES.md
index 5a48d567f..08b6cd78a 100644
--- a/THIRD_PARTY_NOTICES.md
+++ b/THIRD_PARTY_NOTICES.md
@@ -94,42 +94,14 @@ limitations under the License.

 **Source:** https://github.com/square/tape (Commit: 445cd3fd0a7b3ec48c9ea3e0e86663fe6d3735d8)<br>
 **License:** Apache License 2.0<br>
-**Copyright:** Copyright (C) 2010 Square, Inc.
+**Copyright:** Copyright (C) 2015 Square, Inc.

Output 4

  1. ⚠️ NOTICES entry modified: Square — Tape (Apache 2.0)
    Entry metadata inconsistent with source file headers:
    • Copyright year changed to 2015 in THIRD_PARTY_NOTICES.md, but the source files (QueueFile.java, FileObjectQueue.java, ObjectQueue.java) all still say "Copyright (C) 2010 Square, Inc."
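The inconsistency in Output 4 comes down to comparing one copyright statement against several file headers. A rough Java sketch of that comparison follows; `CopyrightConsistency` is hypothetical, and the normalization (year or year range plus holder, trailing period ignored) is an assumption about what counts as "the same" copyright.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical cross-check of a NOTICES entry's copyright against each vendored file header. */
final class CopyrightConsistency {
  // "Copyright [(C)] <year or range> <holder>", holder running to end of line (or a '*').
  private static final Pattern COPYRIGHT = Pattern.compile(
      "(?i)\\bcopyright\\s+(?:\\(c\\)\\s+)?(\\d{4}(?:-\\d{4})?)\\s+([^\\r\\n*]+)");

  /** Normalized "year holder" from the first copyright statement found, if any. */
  static Optional<String> copyrightOf(String text) {
    Matcher m = COPYRIGHT.matcher(text);
    if (!m.find()) {
      return Optional.empty();
    }
    String holder = m.group(2).trim();
    // Drop a trailing period so "Ben Manes." and "Ben Manes" compare equal.
    if (holder.endsWith(".")) {
      holder = holder.substring(0, holder.length() - 1);
    }
    return Optional.of(m.group(1) + " " + holder);
  }

  /** Files whose header copyright differs from (or is missing relative to) the NOTICES entry. */
  static List<String> inconsistentFiles(String noticesEntry, Map<String, String> headersByFile) {
    Optional<String> expected = copyrightOf(noticesEntry);
    List<String> mismatched = new ArrayList<>();
    for (Map.Entry<String, String> e : headersByFile.entrySet()) {
      if (!copyrightOf(e.getValue()).equals(expected)) {
        mismatched.add(e.getKey());
      }
    }
    return mismatched;
  }

  public static void main(String[] args) {
    Map<String, String> headers = new TreeMap<>(); // sorted for stable output
    headers.put("FileObjectQueue.java", " * Copyright (C) 2010 Square, Inc.");
    headers.put("ObjectQueue.java", " * Copyright (C) 2010 Square, Inc.");
    headers.put("QueueFile.java", " * Copyright (C) 2010 Square, Inc.");
    String entry = "**Copyright:** Copyright (C) 2015 Square, Inc.";
    // Prints: [FileObjectQueue.java, ObjectQueue.java, QueueFile.java]
    System.out.println(inconsistentFiles(entry, headers));
  }
}
```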

Diff 5: Leave existing license header unchanged, but remove THIRD_PARTY_NOTICES.md entry

diff --git a/THIRD_PARTY_NOTICES.md b/THIRD_PARTY_NOTICES.md
index 5a48d567f..08b6cd78a 100644
--- a/THIRD_PARTY_NOTICES.md
+++ b/THIRD_PARTY_NOTICES.md
@@ -94,42 +94,14 @@ limitations under the License.
-
-## Square — Seismic (Apache 2.0)
-
-**Source:** https://github.com/square/seismic<br>
-**License:** Apache License 2.0<br>
-**Copyright:** Copyright 2010 Square, Inc.
-
-### Scope
-
-The Sentry Java SDK includes an adapted version of Square's Seismic shake detection algorithm. The rolling sample window approach and `SampleQueue`/`SamplePool` data structures in `io.sentry.android.core.SentryShakeDetector` are based on Seismic's `ShakeDetector`.

Output 5

  1. ⚠️ NOTICES entry removed: Square — Seismic (Apache 2.0)
    Source file(s) still reference this library:
    - io.sentry.android.core.SentryShakeDetector still contains attribution header for Square's Seismic. Either restore the THIRD_PARTY_NOTICES.md entry or remove the vendored code.
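Output 5 pairs a deleted NOTICES entry with files that still cite it. One way to sketch that in Java, assuming the deleted entry's `**Source:**` line appears in the diff and that vendored headers cite a URL under that source (for example a commit-pinned blob URL). `RemovedNoticeCheck` is hypothetical, as is the header text in `main`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check for NOTICES entries deleted while vendored files still cite them. */
final class RemovedNoticeCheck {
  // A deleted "**Source:** <url><br>" line in a unified diff of THIRD_PARTY_NOTICES.md.
  private static final Pattern REMOVED_SOURCE =
      Pattern.compile("^-\\*\\*Source:\\*\\*\\s*(\\S+?)(?:<br>)?$");

  /** Source URLs whose NOTICES "**Source:**" line was deleted in the diff. */
  static List<String> removedSources(String noticesDiff) {
    List<String> urls = new ArrayList<>();
    for (String line : noticesDiff.split("\\R")) {
      Matcher m = REMOVED_SOURCE.matcher(line);
      if (m.matches()) {
        urls.add(m.group(1));
      }
    }
    return urls;
  }

  /** Files whose header still contains a removed source URL (substring match, so blob URLs count). */
  static List<String> orphanedFiles(List<String> removedUrls, Map<String, String> headersByFile) {
    List<String> orphaned = new ArrayList<>();
    for (Map.Entry<String, String> e : headersByFile.entrySet()) {
      for (String url : removedUrls) {
        if (e.getValue().contains(url)) {
          orphaned.add(e.getKey());
          break;
        }
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    String diff = "-## Square - Seismic (Apache 2.0)\n"
        + "-\n"
        + "-**Source:** https://github.com/square/seismic<br>\n"
        + "-**License:** Apache License 2.0<br>";
    Map<String, String> headers = new TreeMap<>();
    // Hypothetical header text, for illustration only.
    headers.put("io.sentry.android.core.SentryShakeDetector",
        "// Adapted from https://github.com/square/seismic");
    headers.put("io.sentry.util.SlidingWindow", "// https://github.com/dropwizard/metrics");
    List<String> removed = removedSources(diff);
    System.out.println(removed);                         // [https://github.com/square/seismic]
    System.out.println(orphanedFiles(removed, headers)); // [io.sentry.android.core.SentryShakeDetector]
  }
}
```

The substring match is deliberately loose; it would also match a different repo whose URL merely starts with the removed one, which is the kind of ambiguity the LLM-based skill is meant to reason through.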

Diff 6: Add newly-vendored code with valid license header and THIRD_PARTY_NOTICES.md entry

diff --git a/sentry/src/main/java/io/sentry/util/SlidingWindow.java b/sentry/src/main/java/io/sentry/util/SlidingWindow.java
new file mode 100644
index 000000000..936aa0687
--- /dev/null
+++ b/sentry/src/main/java/io/sentry/util/SlidingWindow.java
@@ -0,0 +1,42 @@
+// Adapted from Metrics-Java SlidingWindowReservoir.
+// Copyright 2010-2023 Coda Hale and Yammer, Inc.
+// Licensed under the Apache License, Version 2.0.
+// https://github.com/dropwizard/metrics/blob/main/metrics-core/src/main/java/com/codahale/metrics/SlidingWindowReservoir.java
+package io.sentry.util;
+
+import java.util.concurrent.atomic.AtomicLong;
+
+public final class SlidingWindow<T> {

Output 6

  1. 👀 File: io.sentry.util.SlidingWindow
    Vendored code detected (Dropwizard Metrics SlidingWindowReservoir) – verify that THIRD_PARTY_NOTICES.md reflects your updates.

Diff 7: Add newly-vendored code with valid license header but no THIRD_PARTY_NOTICES.md entry

diff --git a/sentry/src/main/java/io/sentry/util/ConcurrentLruCache.java b/sentry/src/main/java/io/sentry/util/ConcurrentLruCache.java
new file mode 100644
index 000000000..330b92794
--- /dev/null
+++ b/sentry/src/main/java/io/sentry/util/ConcurrentLruCache.java
@@ -0,0 +1,36 @@
+// Adapted from Caffeine Cache.
+// Copyright 2024 Ben Manes.
+// Licensed under the Apache License 2.0.
+// https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
+package io.sentry.util;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * A simple thread-safe LRU cache backed by a synchronized LinkedHashMap.
+ */
+public final class ConcurrentLruCache<K, V> {

Output 7

  1. ⚠️ File: io.sentry.util.ConcurrentLruCache
    Vendored code detected (Caffeine Cache) — attribution header is complete.
    • No corresponding entry in THIRD_PARTY_NOTICES.md. An entry needs to be added.

Diff 8: Add newly-vendored code with an invalid license header and existing THIRD_PARTY_NOTICES.md entry

diff --git a/sentry/src/main/java/io/sentry/util/LeakyBucket.java b/sentry/src/main/java/io/sentry/util/LeakyBucket.java
new file mode 100644
index 000000000..4bce4f602
--- /dev/null
+++ b/sentry/src/main/java/io/sentry/util/LeakyBucket.java
@@ -0,0 +1,38 @@
+// Adapted from Resilience4j RateLimiter.
+// https://github.com/resilience4j/resilience4j/blob/master/resilience4j-ratelimiter/src/main/java/io/github/resilience4j/ratelimiter/internal/AtomicRateLimiter.java
+package io.sentry.util;
+
+public final class LeakyBucket {

Output 8

  1. ⚠️ File: io.sentry.util.LeakyBucket
    Vendored code detected (Resilience4j RateLimiter) — missing required fields:
    • Header is missing the copyright holder and license name. Add lines for "Copyright 2019 Robert Winkler and Bohdan Storozhuk" and "Licensed under the Apache License, Version 2.0" (per the existing THIRD_PARTY_NOTICES.md entry).

Diff 9: Add newly-vendored code with an invalid license header and no THIRD_PARTY_NOTICES.md entry

diff --git a/sentry/src/main/java/io/sentry/util/TokenBucket.java b/sentry/src/main/java/io/sentry/util/TokenBucket.java
new file mode 100644
index 000000000..d09f06af9
--- /dev/null
+++ b/sentry/src/main/java/io/sentry/util/TokenBucket.java
@@ -0,0 +1,36 @@
+// Adapted from Guava RateLimiter.
+// https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/RateLimiter.java
+package io.sentry.util;
+
+/**
+ * A simple token bucket rate limiter.
+ */
+public final class TokenBucket {

Output 9

  1. ⚠️ File: io.sentry.util.TokenBucket
    Vendored code detected (Guava RateLimiter) — missing required fields:
    • Header is missing copyright year, copyright holder, and license name. Only the source URL and library name are present.
    • No corresponding entry in THIRD_PARTY_NOTICES.md. An entry needs to be added for Guava RateLimiter.

Diff 10: Add newly-vendored code with an invalid license header, no THIRD_PARTY_NOTICES.md entry, and a new license type

diff --git a/sentry/src/main/java/io/sentry/util/CompactJsonWriter.java b/sentry/src/main/java/io/sentry/util/CompactJsonWriter.java
new file mode 100644
index 000000000..d9c9a342e
--- /dev/null
+++ b/sentry/src/main/java/io/sentry/util/CompactJsonWriter.java
@@ -0,0 +1,60 @@
+/*
+ * Copyright 2024 Example JSON Project Contributors.
+ * SPDX-License-Identifier: BSD-3-Clause
+ * https://github.com/example-json/compact-writer
+ */
+package io.sentry.util;
+
+import java.io.IOException;
+import java.io.Writer;
+
+/**
+ * A lightweight JSON writer that produces compact (no whitespace) output.
+ */
+public final class CompactJsonWriter {

Output 10

  1. ⚠️ File: io.sentry.util.CompactJsonWriter
    Vendored code detected (compact-writer) — missing required fields:
    • Header is missing an "Adapted from" statement identifying the library name. Copyright, license (BSD-3-Clause), and source URL are present.
    • No corresponding entry in THIRD_PARTY_NOTICES.md. An entry needs to be added.
    • ❗This license type (BSD-3-Clause) is not yet represented in THIRD_PARTY_NOTICES.md. Please verify it is compatible with Sentry's licensing policies: https://open.sentry.io/licensing/.
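The "license type not yet represented" finding only asks whether the license already appears under some `**License:**` line in THIRD_PARTY_NOTICES.md. A minimal Java sketch of that lookup; `LicenseCoverage` is hypothetical, and its crude normalization is an assumption of this example (it would not, for instance, equate the SPDX id `Apache-2.0` with `Apache License 2.0`).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: is this license already named by a "**License:**" line in NOTICES? */
final class LicenseCoverage {
  private static final Pattern LICENSE_LINE =
      Pattern.compile("^\\*\\*License:\\*\\*\\s*(.+?)(?:<br>)?$", Pattern.MULTILINE);

  /** Crude: "Apache License, Version 2.0" and "Apache License 2.0" both become "apachelicense20". */
  static String normalize(String license) {
    return license.toLowerCase(Locale.ROOT).replace("version", "").replaceAll("[^a-z0-9]", "");
  }

  static Set<String> licensesIn(String notices) {
    Set<String> found = new HashSet<>();
    Matcher m = LICENSE_LINE.matcher(notices);
    while (m.find()) {
      found.add(normalize(m.group(1)));
    }
    return found;
  }

  /** True when the license is new to NOTICES and therefore needs a manual policy check. */
  static boolean needsPolicyReview(String license, String notices) {
    return !licensesIn(notices).contains(normalize(license));
  }

  public static void main(String[] args) {
    String notices = "**License:** Apache License 2.0<br>\n"
        + "**Copyright:** Copyright (C) 2010 Square, Inc.\n"
        + "**License:** Eclipse Public License 2.0<br>\n";
    System.out.println(needsPolicyReview("Apache License, Version 2.0", notices)); // false
    System.out.println(needsPolicyReview("BSD-3-Clause", notices));                // true
  }
}
```

A `true` result does not mean the license is incompatible; it only means someone still has to check it against https://open.sentry.io/licensing/.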

Diff 11: False positive

diff --git a/THIRD_PARTY_NOTICES.md b/THIRD_PARTY_NOTICES.md
index 5a48d567f..57c0cc359 100644
--- a/THIRD_PARTY_NOTICES.md
+++ b/THIRD_PARTY_NOTICES.md
@@ -484,3 +484,81 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+
+---
+
+## Eclipse Collections — CircularArrayList (EPL 2.0)
+
+**Source:** https://github.com/eclipse/eclipse-collections/blob/master/eclipse-collections/src/main/java/org/eclipse/collections/impl/list/mutable/CircularArrayList.java<br>
+**License:** Eclipse Public License 2.0<br>
+**Copyright:** Copyright (c) 2022 Goldman Sachs and others
+
+### Scope
+
+The Sentry Java SDK includes an adapted circular buffer implementation from Eclipse Collections. The code resides in io.sentry.util.CircularBuffer.
+
+Copyright (c) 2022 Goldman Sachs and others.
+
+This program and the accompanying materials are made available under the
+terms of the Eclipse Public License 2.0 which is available at
+http://www.eclipse.org/legal/epl-2.0.
+
+SPDX-License-Identifier: EPL-2.0

Output 11 (numbered as “1.” because it’s the first entry in the False Positives section)

  1. THIRD_PARTY_NOTICES.md — Flagged because attribution markers were added to it. This is expected; the file is the notices file itself.

[3] Local Warden runs.

Running Warden locally

Added to my .*profile:

# Warden: local PR-style run for the current repo branch (zsh; uses the `print` builtin)
# Usage: warden-local <skill-name> [extra warden args...]
# Example: warden-local check-code-attribution
#          warden check-code-attribution --fail-on high   (via the alias below)
warden-local() {
  local skill="${1:?Usage: warden-local <skill-name> [warden args...]}"
  shift
  local repo_root
  repo_root="$(git rev-parse --show-toplevel 2>/dev/null)" || {
    print -u2 "warden-local: not inside a git repository"
    return 1
  }
  if [[ ! -f "${repo_root}/warden.toml" ]]; then
    print -u2 "warden-local: no warden.toml in ${repo_root}"
    return 1
  fi
  local base_ref="origin/main"
  if ! git -C "${repo_root}" rev-parse --verify "${base_ref}" &>/dev/null; then
    base_ref="main"
  fi
  (
    cd "${repo_root}" || return 1
    npx @sentry/warden "${base_ref}..HEAD" --skill "${skill}" -vv "$@"
  )
}

alias warden='warden-local'

[4] Pushed up diffs with attribution violations in a draft PR to vet the UX (see #5444).

📝 Checklist

  • I added GH Issue ID & Linear ID
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

  • Consider enabling failOn / failCheck once we've vetted behavior in the wild

#skip-changelog

linear-code Bot commented May 19, 2026

JAVA-499

sentry Bot commented May 19, 2026

📲 Install Builds

Android

| App Name | App ID | Version | Configuration |
| --- | --- | --- | --- |
| SDK Size | io.sentry.tests.size | 8.41.0 (1) | release |


Adds a check-code-attribution skill that validates license headers and THIRD_PARTY_NOTICES.md entries for code copied or adapted from third parties. It also verifies license compatibility against Sentry's licensing policy.

The skill only looks at the branch diff. It reports any issues it finds via PR comments (when run on CI) or to the terminal (when run locally).

To run it in Claude Code:

 ```
 /check-code-attribution
 ```

Runs on CI automatically via [Warden](https://warden.sentry.dev/).

- Purely advisory / does not block merge.
- Generates PR comments with code suggestions for all discovered issues.
- Automatically removes stale comments as PRs are updated.

Current Warden configs:

  ┌─────────────────┬─────────────────────────────┬───────────────────────────────────────────────────┐
  │     Setting     │            Value            │                      Effect                       │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ model           │ anthropic/claude-sonnet-4-6 │ Model used for analysis                           │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ maxTurns        │ 30                          │ Max tool calls per chunk                          │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ skill           │ check-code-attribution      │ Per-file vendored code attribution check          │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ failOn          │ off                         │ Do not fail workflow if attribution issues found  │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ reportOn        │ medium                      │ Show findings at >= medium severity via PR comment│
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ requestChanges  │ false                       │ Never post REQUEST_CHANGES comments on PRs        │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ failCheck       │ false                       │ No red X on workflow in GitHub UI if it fails     │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ triggers        │ pull_request + local        │ Runs on PR open/sync and local warden invocations │
  ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
  │ reportOnSuccess │ false (default)             │ No comment when everything is clean               │
  └─────────────────┴─────────────────────────────┴───────────────────────────────────────────────────┘
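The `reportOn` / `failOn` pair can be read as two severity thresholds applied to the same set of findings. The Java sketch below illustrates that reading only; `SeverityGate`, the three-level `Severity` enum, and modeling "off" as an absent threshold are assumptions, not Warden's actual implementation.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical model of reportOn / failOn as severity thresholds (not Warden's code). */
final class SeverityGate {
  // Assumed severity scale, lowest to highest; Warden's real levels may differ.
  enum Severity { LOW, MEDIUM, HIGH }

  /** Findings at or above the threshold; an absent threshold ("off") selects nothing. */
  static List<Severity> atOrAbove(List<Severity> findings, Optional<Severity> threshold) {
    if (!threshold.isPresent()) {
      return List.of();
    }
    Severity min = threshold.get();
    return findings.stream().filter(s -> s.compareTo(min) >= 0).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Severity> findings = List.of(Severity.LOW, Severity.MEDIUM, Severity.HIGH);
    Optional<Severity> reportOn = Optional.of(Severity.MEDIUM); // reportOn = "medium"
    Optional<Severity> failOn = Optional.empty();               // failOn = "off"
    System.out.println("commented: " + atOrAbove(findings, reportOn));              // commented: [MEDIUM, HIGH]
    System.out.println("fails workflow: " + !atOrAbove(findings, failOn).isEmpty()); // fails workflow: false
  }
}
```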

Going forward, we can consider blocking PRs once we've had a chance to vet behavior in the wild.
0xadam-brown force-pushed the chore/check-code-attribution-skill-with-warden branch from 9ced7c7 to 93b92d7 May 19, 2026 22:00
0xadam-brown marked this pull request as ready for review May 19, 2026 22:08
Comment on lines +299 to +305
} else {
const reason = s.expectFinding
? 'expected finding (>= medium), got none'
: `expected no finding (>= medium), got ${count}`;
console.log(`${RED}FAIL${RESET} ${s.id} (${reason})`);

failures.push({

Bug: The validateExpected function only validates .java fixture files, ignoring other file types like .md. This can lead to silently passing tests if non-Java fixtures are missing.
Severity: MEDIUM

Suggested Fix

Modify the validateExpected function to check all files in the scenarios directory against the EXPECTED.json manifest, not just .java files. This ensures all test fixtures are accounted for during validation.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location:
.claude/skills/check-code-attribution/validation-tests/assert-scenarios.mjs#L299-L305

Potential issue: The validation logic in the `validateExpected` function is incomplete.
It filters for and validates only `.java` files found on disk against the
`EXPECTED.json` manifest. Consequently, other required test fixture files, such as
`THIRD_PARTY_NOTICES.mismatch-snippet.md`, are not validated. If such a file were
accidentally deleted or renamed, the validation step would pass silently. The test would
only fail later when a shell script attempts to access the non-existent file, making the
root cause harder to identify.
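A sketch of the suggested fix, written in Java for consistency with the other examples (the real `validateExpected` lives in assert-scenarios.mjs). It assumes EXPECTED.json lists every fixture file by name and that the caller passes the directory listing without EXPECTED.json itself; `FixtureManifestCheck` and the fixture names in `main` are hypothetical. The key points are the absence of any extension filter and checking in both directions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical two-way comparison between fixture files on disk and the manifest. */
final class FixtureManifestCheck {
  /** One message per discrepancy; an empty list means disk and manifest agree. */
  static List<String> validate(Collection<String> filesOnDisk, Collection<String> manifestEntries) {
    // No extension filter: .md fixtures are checked exactly like .java ones.
    Set<String> onDisk = new TreeSet<>(filesOnDisk);
    Set<String> expected = new TreeSet<>(manifestEntries);
    List<String> problems = new ArrayList<>();
    for (String name : expected) {
      if (!onDisk.contains(name)) {
        problems.add("missing fixture: " + name);
      }
    }
    for (String name : onDisk) {
      if (!expected.contains(name)) {
        problems.add("not listed in EXPECTED.json: " + name);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<String> onDisk = List.of("HeaderRemoved.java", "Extra.java");
    List<String> manifest = List.of("HeaderRemoved.java", "THIRD_PARTY_NOTICES.mismatch-snippet.md");
    // Prints "missing fixture: THIRD_PARTY_NOTICES.mismatch-snippet.md"
    // then "not listed in EXPECTED.json: Extra.java"
    validate(onDisk, manifest).forEach(System.out::println);
  }
}
```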


@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 93b92d7.


## Example — HeaderCompleteAndNoticePresent (Apache 2.0)

**Source:** https://github.com/example/complete-with-notices<br>

Catalog source URL mismatches scenario header URLs

Medium Severity

The header-complete-and-notice-present scenario expects no findings, but the catalog Source URL (https://github.com/example/complete-with-notices) doesn't match either URL in the Java file header (https://github.com/example, https://github.com/example/something). Since source URL is a required field and SKILL.md flags header-vs-NOTICES inconsistencies on required fields, the AI can intermittently report a finding here, making this negative test case flaky.

Additional Locations (1)
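To see why this negative scenario can flip, consider a strict deterministic version of the header-vs-NOTICES URL check. In this Java sketch, `SourceUrlMatch` is hypothetical, and treating "the same URL or a path beneath it" as a match is an assumption. Under that rule the scenario's header URLs fail against the catalog's Source URL, whereas a blob URL under the catalog's repo would pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical header-vs-NOTICES source URL comparison. */
final class SourceUrlMatch {
  private static final Pattern URL = Pattern.compile("https?://[^\\s<>()\"]+");

  static List<String> urlsIn(String text) {
    List<String> urls = new ArrayList<>();
    Matcher m = URL.matcher(text);
    while (m.find()) {
      urls.add(trim(m.group()));
    }
    return urls;
  }

  // Drop trailing '/', '.' or ',' so ".../repo/" and ".../repo." compare like ".../repo".
  private static String trim(String url) {
    int end = url.length();
    while (end > 0 && "/.,".indexOf(url.charAt(end - 1)) >= 0) {
      end--;
    }
    return url.substring(0, end);
  }

  /** True if some header URL is the NOTICES source itself or a path beneath it. */
  static boolean headerMatchesNotices(String header, String noticesSource) {
    String base = trim(noticesSource);
    for (String url : urlsIn(header)) {
      if (url.equals(base) || url.startsWith(base + "/")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String header = " * Adapted from https://github.com/example\n * https://github.com/example/something";
    System.out.println(headerMatchesNotices(
        header, "https://github.com/example/complete-with-notices")); // false
    // Hypothetical header URL under the catalog's repo.
    System.out.println(headerMatchesNotices(
        "// https://github.com/square/tape/blob/main/QueueFile.java",
        "https://github.com/square/tape")); // true
  }
}
```

Aligning the fixture's header URL with the catalog entry (or vice versa) should make the expected "no finding" outcome unambiguous for the LLM as well.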

@@ -0,0 +1,21 @@
/*
@0xadam-brown (Member, Author)

Most of this PR's diff is test fixtures for the automated tests (runnable via check-code-attribution-tests.sh).

The interesting (i.e., non-test) parts live in SKILL.md and warden.toml.

@runningcode (Contributor) left a comment

Thanks for the well-thought-out approach and all the tests. I don't think we have much precedent for testing skills. Other repos use an LLM to judge whether skills work correctly. I think that's a bit overkill here, so your approach with the bash script is a good compromise. To be honest, I wasn't expecting any tests here.

I've approved this PR. You can consider all my comments nits, and I'm happy to discuss them.
I wonder whether we (or someone else) have best practices for creating skills. I imagined that skills should only describe "what" needs to be done and let the LLM figure out the "how", rather than explaining the "how" in the skill. I've left some comments to that effect.

- **Escape the period** after the number (`1\.` not `1.`) so markdown does not collapse entries into a tight list.
- Leave an empty line between each numbered finding.

## Validation (maintainers)
Contributor

Should validation of the skill itself be a part of the skill? I feel like this could pollute the context window.


### Warden CLI (optional local parity check)

Warden does **not** use Cursor auth. Before running Warden locally, configure a provider (same model family as `warden.toml`, or override with `-m`):
Contributor

Why do we need these instructions in this skill?

For all other files, perform these checks **before** deciding whether to proceed:

1. **Read the file header** — use the Read tool to read the first 50 lines of the file. Look for vendored-code signals: `Copyright`, `Licensed under`, `SPDX-License-Identifier`, or vendoring language ("adapted from", "backported from", "based on", "copied from", "derived from", "inspired by", "ported from", "translated from", "vendored").
2. **Check THIRD_PARTY_NOTICES.md** — use Grep to search `THIRD_PARTY_NOTICES.md` for the file name without extension (e.g., search for `ANRWatchDog` when reviewing `ANRWatchDog.java`). A match means this is a known vendored file. **Renames:** if the diff is a rename (`similarity index` / `rename from` in the diff, or a delete of one path and add of another with the same content), also Grep for the **old** basename and read **Scope** sections in matching entries — NOTICES may still reference the previous class or path name.
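The two triage steps above amount to a cheap pre-filter. Here is a hypothetical Java rendering for illustration only; the reviewer's point that the skill shouldn't prescribe tools still applies, and this sketch skips the rename handling. `VendoredSignals` is a made-up name, the signal list is taken from step 1, and the base-name lookup is a plain substring match that could over-match short class names.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-filter mirroring the two triage steps (header signals + NOTICES lookup). */
final class VendoredSignals {
  private static final int HEADER_LINES = 50;
  // Markers from step 1, matched case-insensitively anywhere in a line.
  private static final Pattern SIGNAL = Pattern.compile(
      "(?i)\\bcopyright\\b|\\blicensed under\\b|SPDX-License-Identifier|\\bvendored\\b"
          + "|\\b(adapted|backported|copied|derived|ported|translated) from\\b"
          + "|\\bbased on\\b|\\binspired by\\b");

  /** Step 1: does any of the first 50 lines contain a vendoring signal? */
  static boolean headerHasSignal(String content) {
    String[] lines = content.split("\\R");
    for (int i = 0; i < Math.min(lines.length, HEADER_LINES); i++) {
      if (SIGNAL.matcher(lines[i]).find()) {
        return true;
      }
    }
    return false;
  }

  /** Step 2: does NOTICES mention the file's base name (e.g. "ANRWatchDog" for ANRWatchDog.java)? */
  static boolean noticesMentions(String notices, String path) {
    String file = path.substring(path.lastIndexOf('/') + 1);
    int dot = file.lastIndexOf('.');
    String baseName = dot > 0 ? file.substring(0, dot) : file;
    return notices.contains(baseName);
  }

  /** Either step firing makes the file a candidate for a full attribution review. */
  static boolean isCandidate(String path, String content, String notices) {
    return headerHasSignal(content) || noticesMentions(notices, path);
  }

  public static void main(String[] args) {
    // Like Diff 1: the header was stripped, but NOTICES (hypothetical text here) still names the class.
    String notices = "The code resides in `io.sentry.android.core.ANRWatchDog`.";
    System.out.println(isCandidate(
        "sentry-android-core/src/main/java/io/sentry/android/core/ANRWatchDog.java",
        "/* ANRWatchDog implementation */\npackage io.sentry.android.core;",
        notices)); // true
  }
}
```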
Contributor

nit: I don't think we should be prescriptive about what tools to use.

**Mandatory on every run (do not skip):**

1. `Read` the first 50 lines of the changed file.
2. `Grep` `THIRD_PARTY_NOTICES.md` for the class name (filename without extension, e.g. `ANRWatchDog` for `ANRWatchDog.java`). On renames, also grep the old basename and read Scope sections (see Quick triage).
Contributor

nit: I don't think we should be prescriptive about what tools to use (like git).

allowed-tools: Bash Read Grep Glob
---

**Maintainers:** Only edit files in `.claude/skills/check-code-attribution` (the committed file) and run `npx @sentry/dotagents sync` from the command line to automatically update the matching files in `.agents/skills/check-code-attribution`.
Contributor

I think this line belongs in README.md, not SKILL.md; otherwise we are polluting the context window.


You are reviewing changed files for third-party code attribution compliance in **sentry-java**, an MIT-licensed repository.

## Local runs — discover changed files first
Contributor

Do we really need the local use case? I feel like it adds a lot of tokens.

I would ask the same thing about the warden CLI. What happens if you don't have these sections?

Comment thread warden.toml
# globally but are tuned for check-code-attribution. Attribution checks need the full
# file header and a NOTICES cross-check — not isolated diff hunks.
[[defaults.chunking.filePatterns]]
pattern = "**/*.api"
Contributor

Why don't we want warden looking at .api?

Comment thread warden.toml
# Tighten to failOn = "medium" / requestChanges = true once the false-positive baseline is established.
failOn = "off"
reportOn = "medium"
ignorePaths = [
Contributor

Should we ignore the build directory?

Comment thread warden.toml
]

[[skills.triggers]]
type = "pull_request"
Contributor

Do these change the defaults?

Comment thread warden.toml
version = 1

[defaults]
model = "anthropic/claude-sonnet-4-6"
Contributor

Is this a change from the default?


2 participants