feat(lsp): validate conditional requirements from RFC 2119 descriptions #257
bennypowers wants to merge 1 commit into main from
Extract and enforce RFC 2119 conditional requirements from element attribute descriptions. When a description says, e.g., "If you set `variant` to 'icon', you MUST also set `accessible-label`", the LSP now produces an error diagnostic if the HTML violates that rule.

It uses a signal-based NLP approach: rather than matching whole-sentence regex templates, it decomposes sentences into signals (backtick-quoted attribute names, RFC 2119 keywords, conditional markers, quoted values, negation words) and infers relationships from their relative positions.

Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
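To make the signal idea concrete, here is a minimal Go sketch of the decomposition step only. The names `Signal`, `SignalKind`, and `Extract`, and the keyword, conditional, and negation word lists, are hypothetical illustrations rather than the PR's actual code. It assumes one sentence at a time and ASCII quote characters.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// SignalKind classifies one piece of evidence found in a description.
type SignalKind int

const (
	AttrName    SignalKind = iota // backtick-quoted attribute name, e.g. `variant`
	Keyword                       // RFC 2119 keyword, e.g. MUST, SHOULD NOT
	Conditional                   // conditional marker, e.g. if, when
	Value                         // quoted value, e.g. 'icon'
	Negation                      // negation word, e.g. not, never
)

// Signal is one extracted token plus its byte offset in the sentence,
// so that a later stage can reason about relative positions.
type Signal struct {
	Kind SignalKind
	Text string
	Pos  int
}

// Patterns are tried in priority order; a later pattern may not claim bytes
// an earlier one already matched, so the NOT in "MUST NOT" is not also a
// Negation. The word lists are illustrative assumptions.
var patterns = []struct {
	kind SignalKind
	re   *regexp.Regexp
}{
	{AttrName, regexp.MustCompile("`([a-z][a-z0-9-]*)`")},
	{Keyword, regexp.MustCompile(`\b(MUST NOT|MUST|REQUIRED|SHALL NOT|SHALL|SHOULD NOT|SHOULD|RECOMMENDED|MAY|OPTIONAL)\b`)},
	{Conditional, regexp.MustCompile(`(?i)\b(if|when|whenever|unless)\b`)},
	{Value, regexp.MustCompile(`'([^']*)'|"([^"]*)"`)},
	{Negation, regexp.MustCompile(`(?i)\b(not|never|without)\b`)},
}

// Extract decomposes a sentence into signals ordered by position.
func Extract(sentence string) []Signal {
	claimed := make([]bool, len(sentence))
	var signals []Signal
	for _, p := range patterns {
		for _, m := range p.re.FindAllStringSubmatchIndex(sentence, -1) {
			start, end := m[0], m[1]
			if isClaimed(claimed, start, end) {
				continue
			}
			for i := start; i < end; i++ {
				claimed[i] = true
			}
			// Keep the first capture group that took part in the match.
			text := sentence[start:end]
			for g := 2; g+1 < len(m); g += 2 {
				if m[g] >= 0 {
					text = sentence[m[g]:m[g+1]]
					break
				}
			}
			signals = append(signals, Signal{Kind: p.kind, Text: text, Pos: start})
		}
	}
	sort.Slice(signals, func(i, j int) bool { return signals[i].Pos < signals[j].Pos })
	return signals
}

// isClaimed reports whether any byte in [start, end) is already matched.
func isClaimed(claimed []bool, start, end int) bool {
	for i := start; i < end; i++ {
		if claimed[i] {
			return true
		}
	}
	return false
}

func main() {
	desc := "If you set `variant` to 'icon', you MUST also set `accessible-label`."
	for _, s := range Extract(desc) {
		fmt.Printf("kind=%d text=%q pos=%d\n", s.Kind, s.Text, s.Pos)
	}
}
```

Running the patterns in priority order and rejecting overlapping matches is one simple way to keep a multi-word keyword such as `MUST NOT` from leaking a spurious negation signal.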
CodeRabbit: Review skipped (draft detected). Configuration used: defaults. Review profile: CHILL. Plan: Pro.
LSP Benchmark Results
View this benchmark run in GitHub Actions. 💡 Tip: Raw JSON results are available in workflow artifacts if needed.

Generate Benchmarks

Perf/kb delta ratio: 1.00x 👍
@paceaux I need a linguist's eye; WDYT about this? The goal is to extract meaning from user-written documentation in order to validate element usage, e.g. a user documents a button element with "When
@bennypowers This is a complex task, and it's best accomplished with a variety of approaches (and, FWIW, none of the alternative options you listed are mutually exclusive). The absolute best approach would be to use a library, but for the reasons you've established (and I agree with), that would be overkill for the task at hand. I'm not familiar with Go at all, so that's a bit of a limiting factor here.

Based on what I think I understand about your stated goals (extract natural-language statements from an element manifest, use RFC 2119 as a kind of "mapping" from those statements, evaluate an element's rules against those reported, and report on that), I think your described approach is naive, but it may still work well because of the narrow range you're working in. You need to test this with non-happy-path statements: misspellings, capitalization issues, mismatched quotes, rearranged words, etc.

What we're talking about are first conditional sentences: sentences where the conditional signal has a single known and expected implication. "if" and "when" are the most common, for sure. But you've also got

I'd like to see some of those included in your

You may also want to account for at least a few modal verbs, because they could influence how you map to RFC 2119:

Also don't forget the French quotes and the annoying apostrophe-as-quote for

What I don't see (and maybe I missed it?) is text normalization: where you set all your text to lowercase and remove special characters. That may be useful. In cases like these, usually it goes:
But all the same, I think this is a good start, and I'd like to see a healthy number of examples somewhere of what the text looks like.
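The normalization step suggested above could start as small as the following Go sketch, which folds French guillemets, curly quotes, and the typographic apostrophe to ASCII and collapses whitespace. `Normalize` is a hypothetical helper, and the character set is an assumption rather than an exhaustive list.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteFolder rewrites typographic quotes to ASCII so one pattern can find
// quoted values. strings.Replacer compares old strings in argument order,
// so the guillemet-plus-space forms are listed before the bare guillemets.
var quoteFolder = strings.NewReplacer(
	"\u00AB ", `"`, "\u00AB", `"`, // « opening guillemet, with or without inner space
	" \u00BB", `"`, "\u00BB", `"`, // » closing guillemet
	"\u2018", "'", "\u2019", "'", // curly single quotes; U+2019 is also the typographic apostrophe
	"\u201C", `"`, "\u201D", `"`, // curly double quotes
)

// Normalize first collapses every run of Unicode whitespace (including the
// no-break spaces French typography puts inside guillemets) to one ASCII
// space, then folds quote variants to ASCII.
func Normalize(s string) string {
	s = strings.Join(strings.Fields(s), " ")
	return quoteFolder.Replace(s)
}

func main() {
	fmt.Println(Normalize("If `variant` is « icon », you MUST set `accessible-label`."))
	fmt.Println(Normalize("If `variant` is \u2019icon\u2019,   you MUST set\t`accessible-label`."))
}
```

This sketch deliberately does not lowercase. RFC 8174 clarifies that the RFC 2119 key words carry their special meaning only when written in all capitals, so case-folding the whole string would discard that signal; matching conditional markers and negation words case-insensitively covers the capitalization concern instead. Note also that folding U+2019 means contractions such as "don’t" yield stray ASCII apostrophes, which a quoted-value pattern has to tolerate.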
Summary
`variant` to 'icon', you MUST also set `accessible-label`")

Approach: Signal-based extraction
Rather than matching whole-sentence regex templates (fragile, limited to exact phrasings), this uses a signal-based pipeline:
This handles arbitrary verbs, passive voice, negated conditions, OR values, and any word order, because it never looks at verbs or sentence structure.
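One way to infer a rule from relative positions alone, without inspecting verbs, is sketched below in Go. It assumes signals have already been extracted in sentence order. `Rule`, `Infer`, and `Check` are hypothetical names, and the heuristics (the first attribute after the conditional marker is the condition, the next attribute outside that clause is the target, multiple quoted values are OR'd, a negated condition also fires when the attribute is absent) are illustrative assumptions, not the PR's implementation.

```go
package main

import (
	"fmt"
	"slices"
)

// SignalKind mirrors the signal categories described in the PR (names hypothetical).
type SignalKind int

const (
	AttrName SignalKind = iota
	Keyword
	Conditional
	Value
	Negation
)

// Signal is one extracted token; input slices are assumed to be in sentence order.
type Signal struct {
	Kind SignalKind
	Text string
}

// Rule is a hypothetical structured form of one conditional requirement.
type Rule struct {
	IfAttr    string
	IfValues  []string // any listed value triggers the rule (OR); empty means "is set"
	IfNegated bool     // the condition was negated, e.g. "is not 'icon'"
	Keyword   string   // RFC 2119 keyword, e.g. MUST or MUST NOT
	ThenAttr  string
}

// Infer assigns roles purely by position: the clause opened by a conditional
// marker holds the condition, and the first attribute outside it is the target.
func Infer(sigs []Signal) (Rule, bool) {
	var r Rule
	inCond, seenCond := false, false
	for _, s := range sigs {
		switch s.Kind {
		case Conditional:
			inCond, seenCond = true, true
			continue
		case Keyword:
			inCond, r.Keyword = false, s.Text
			continue
		}
		if inCond {
			switch s.Kind {
			case AttrName:
				if r.IfAttr == "" {
					r.IfAttr = s.Text
					continue
				}
				inCond = false // a second attribute opens the consequence clause
			case Value:
				r.IfValues = append(r.IfValues, s.Text)
				continue
			case Negation:
				r.IfNegated = true
				continue
			}
		}
		if s.Kind == AttrName && r.ThenAttr == "" {
			r.ThenAttr = s.Text
		}
	}
	return r, seenCond && r.Keyword != "" && r.IfAttr != "" && r.ThenAttr != ""
}

// triggered reports whether an element's attributes satisfy the condition.
func triggered(r Rule, attrs map[string]string) bool {
	val, present := attrs[r.IfAttr]
	hit := present
	if len(r.IfValues) > 0 {
		hit = present && slices.Contains(r.IfValues, val)
	}
	if r.IfNegated {
		return !hit
	}
	return hit
}

// Check returns a diagnostic message, or "" if the element conforms.
// Only MUST/SHALL and their negations are treated as errors in this sketch.
func Check(r Rule, attrs map[string]string) string {
	if !triggered(r, attrs) {
		return ""
	}
	_, hasTarget := attrs[r.ThenAttr]
	switch r.Keyword {
	case "MUST", "SHALL":
		if hasTarget {
			return ""
		}
	case "MUST NOT", "SHALL NOT":
		if !hasTarget {
			return ""
		}
	default:
		return ""
	}
	return fmt.Sprintf("attribute %q %s be set (condition on %q)", r.ThenAttr, r.Keyword, r.IfAttr)
}

func main() {
	// Hand-built signals for: "You MUST set `accessible-label` when `variant` is 'icon'."
	rule, ok := Infer([]Signal{
		{Keyword, "MUST"}, {AttrName, "accessible-label"},
		{Conditional, "when"}, {AttrName, "variant"}, {Value, "icon"},
	})
	fmt.Println(ok, rule.IfAttr, rule.IfValues, rule.ThenAttr)
	fmt.Println(Check(rule, map[string]string{"variant": "icon"}))
}
```

Because the condition clause is delimited by the conditional marker rather than by its place in the sentence, the same rule comes out of "If `variant` is 'icon', you MUST set `accessible-label`" and "You MUST set `accessible-label` when `variant` is 'icon'". Compound conditions ("if `a` and `b`") are out of scope here; mapping SHOULD to a warning-level diagnostic would be a natural extension.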
Alternatives considered
- `if/when ATTR is VALUE, MUST VERB ATTR`
- `MODAL+VERB+NOUN`
- `constraints` field to the CEM schema
- `cem generate` to extract rules, store as structured data

Test plan
🤖 Generated with Claude Code