feat(lsp): validate conditional requirements from RFC 2119 descriptions by bennypowers · Pull Request #257 · bennypowers/cem

bennypowers · 2026-03-06T09:13:07Z

Experimental — this branch explores extracting and enforcing conditional requirements from natural language descriptions in custom elements manifests.

Summary

Extracts RFC 2119 conditional requirements from attribute/slot descriptions (e.g. "If you set `variant` to 'icon', you MUST also set `accessible-label`")
Produces LSP error diagnostics when HTML violates those requirements
Only MUST/REQUIRED/SHALL keywords trigger diagnostics (SHOULD/MAY ignored to reduce noise)
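The keyword filter in the last point could look roughly like the sketch below. It is not taken from the PR: `keywordRe`, `enforced`, and `isEnforced` are hypothetical names, and the match is case-sensitive on the assumption that only uppercase keywords count, which is how RFC 8174 reads RFC 2119.

```go
package main

import (
	"fmt"
	"regexp"
)

// keywordRe matches uppercase RFC 2119 keywords as whole words.
// Hypothetical pattern for illustration; the PR's own expression may differ.
var keywordRe = regexp.MustCompile(`\b(MUST|REQUIRED|SHALL|SHOULD|RECOMMENDED|MAY|OPTIONAL)\b`)

// enforced holds the keywords that the summary says trigger diagnostics.
var enforced = map[string]bool{"MUST": true, "REQUIRED": true, "SHALL": true}

// isEnforced reports whether the sentence contains at least one keyword
// strong enough to produce an error diagnostic.
func isEnforced(sentence string) bool {
	for _, kw := range keywordRe.FindAllString(sentence, -1) {
		if enforced[kw] {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{
		"If you set `variant` to 'icon', you MUST also set `accessible-label`.",
		"You SHOULD provide a `label`.",
		"You must provide a `label`.", // lowercase: not treated as a keyword here
	} {
		fmt.Printf("%v  %s\n", isEnforced(s), s)
	}
}
```

Filtering on a small allow-list keeps SHOULD and MAY sentences out of the diagnostics without needing separate handling for each.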

Approach: Signal-based extraction

Rather than matching whole-sentence regex templates (fragile, limited to exact phrasings), this uses a signal-based pipeline:

Extract signals from each sentence: backtick-quoted attribute names, RFC 2119 keywords, conditional markers (if/when), quoted values, negation words
Infer relationships from relative positions of those signals — the attr nearest the conditional marker is the condition; the attr nearest the MUST keyword is the requirement
Evaluate rules against the actual HTML element's attributes

This handles arbitrary verbs, passive voice, negated conditions, OR values, and any word order — because it never looks at verbs or sentence structure.
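A minimal sketch of the three steps, under stated assumptions: the type and function names (`signal`, `extract`, `nearest`, `infer`, `violates`) and all four patterns are hypothetical, not the PR's code. Distance is measured between byte offsets of match starts, and the requirement is taken as the attribute nearest the keyword other than the one already chosen as the condition, a tie-break this sketch adds so passive phrasings resolve.

```go
package main

import (
	"fmt"
	"regexp"
)

// signal is one extracted cue and its byte offset within the sentence.
type signal struct {
	kind string // "attr", "keyword", "cond", or "value"
	text string
	pos  int
}

// patterns are illustrative stand-ins for the real extraction rules.
var patterns = []struct {
	kind string
	re   *regexp.Regexp
}{
	{"attr", regexp.MustCompile("`([a-z][a-z0-9-]*)`")},
	{"keyword", regexp.MustCompile(`\b(MUST|REQUIRED|SHALL)\b`)},
	{"cond", regexp.MustCompile(`(?i)\b(if|when)\b`)},
	{"value", regexp.MustCompile(`'([^']*)'`)},
}

// extract collects every signal found in the sentence.
func extract(sentence string) []signal {
	var out []signal
	for _, p := range patterns {
		for _, m := range p.re.FindAllStringSubmatchIndex(sentence, -1) {
			out = append(out, signal{p.kind, sentence[m[2]:m[3]], m[0]})
		}
	}
	return out
}

func abs(n int) int {
	if n < 0 {
		return -n
	}
	return n
}

// nearest returns the signal of the given kind closest to anchor,
// ignoring any signal that starts at skipPos (pass -1 to skip nothing).
func nearest(sigs []signal, kind string, anchor, skipPos int) (signal, bool) {
	var best signal
	found := false
	for _, s := range sigs {
		if s.kind != kind || s.pos == skipPos {
			continue
		}
		if !found || abs(s.pos-anchor) < abs(best.pos-anchor) {
			best, found = s, true
		}
	}
	return best, found
}

// rule reads: if condAttr is present (with condValue, when non-empty),
// requiredAttr must be present too.
type rule struct{ condAttr, condValue, requiredAttr string }

// infer builds a rule from the relative positions of the signals.
func infer(sentence string) (rule, bool) {
	sigs := extract(sentence)
	cond, okCond := nearest(sigs, "cond", 0, -1)
	kw, okKw := nearest(sigs, "keyword", 0, -1)
	if !okCond || !okKw {
		return rule{}, false
	}
	ca, okCA := nearest(sigs, "attr", cond.pos, -1)
	ra, okRA := nearest(sigs, "attr", kw.pos, ca.pos)
	if !okCA || !okRA {
		return rule{}, false // need two distinct attribute mentions
	}
	val, _ := nearest(sigs, "value", ca.pos, -1)
	return rule{ca.text, val.text, ra.text}, true
}

// violates evaluates the rule against an element's attributes.
func violates(r rule, attrs map[string]string) bool {
	v, has := attrs[r.condAttr]
	if !has || (r.condValue != "" && v != r.condValue) {
		return false
	}
	_, ok := attrs[r.requiredAttr]
	return !ok
}

func main() {
	r, ok := infer("If you set `variant` to 'icon', you MUST also set `accessible-label`.")
	fmt.Println(r, ok, violates(r, map[string]string{"variant": "icon"}))
}
```

Because nothing here inspects verbs, the passive form "`accessible-label` is REQUIRED when `variant` is 'icon'." yields the same rule. Negation and OR values are left out to keep the sketch short.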

Alternatives considered

Approach	Description	Trade-off
Regex templates	Match exact sentence patterns like `if/when ATTR is VALUE, MUST VERB ATTR`	Brittle — fails on passive voice, unusual verbs, different word order
Clause parser	Split sentences into clauses, classify each as conditional/declarative	More structured but higher complexity for similar coverage
POS tagging (prose)	Use part-of-speech tags to match grammatical patterns like `MODAL+VERB+NOUN`	Adds dependency, POS errors on domain-specific terms
Structured manifest field	Add a `constraints` field to the CEM schema	Reliable but requires schema changes and author buy-in
Build-time LLM extraction	Run LLM at `cem generate` to extract rules, store as structured data	Powerful but adds LLM dependency to build step

Test plan

41 unit tests for signal extraction and rule evaluation
5 LSP integration tests for end-to-end diagnostic generation
Full test suite passes (2214 tests)
Manual testing with real-world manifests (RHDS, PFE)

🤖 Generated with Claude Code

Extract and enforce RFC 2119 conditional requirements from element attribute descriptions. When a description says e.g. "If you set `variant` to 'icon', you MUST also set `accessible-label`", the LSP now produces an error diagnostic if the HTML violates that rule.

Uses a signal-based NLP approach: rather than matching whole-sentence regex templates, decomposes sentences into signals (backtick-quoted attr names, RFC 2119 keywords, conditional markers, quoted values, negation words) and infers relationships from relative positions.

Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-06T09:13:21Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



github-actions · 2026-03-06T09:14:58Z

LSP Benchmark Results

Benchmark	PR Mean (ms)	Base Mean (ms)	Delta	Success Rate
Startup	2.789231	2.848442	-0.06 (-2.1%) ✅	100%
Hover	0.47224	0.419426	0.05 (12.6%) 🐢	100%
Completion	1.755401	1.832363	-0.08 (-4.2%) ✅	100%
Diagnostics	2028.22574	2030.647938	-2.42 (-0.1%) ✅	100%
Attribute Hover	0	0	0.00 (0.0%) ➖	100%
References	33.7515	31.6046	2.15 (6.8%) ⚠️	100%


Generate Benchmarks

	Branch	Total Time (s)	# Runs	Avg Time/run (s)	Output Size (kb)	Perf/kb (s/kb)
Base	`main`	4.37466	6	0.72911	156	0.0280427
PR	`lsp/validations-from-docs-rfc2119`	4.37375	6	0.728958	156	0.0280369
Δ		-0.0009	0	-0.0002	0	-0.0000 👍

Perf/kb delta ratio: 1.00x 👍


bennypowers · 2026-03-06T09:16:09Z

@paceaux I need a linguist's eye, WDYT about this? The goal is to extract meaning from user-written documentation, to validate element usage.

For example, a user documents a button element with "When variant attribute is set to icon, you MUST include an accessible-label attribute". This PR attempts to extract rules from such texts, which the LSP can then use to flag invalid usage in HTML documents or templates. I'd like to avoid shipping an LLM engine for this, for performance reasons.

paceaux · 2026-03-06T15:38:51Z

@bennypowers This is a complex task, and it's best accomplished with a variety of approaches. (And, FWIW, none of the alternative options you listed are mutually exclusive.)

The absolute best approach would be to use a library, but for reasons you've established (and I agree with) that'd be overkill for the task at hand.

I'm not familiar with Go at all, so that's a bit of a limiting factor here.

Based on what I think I understand about your stated goals (extract natural language statements from an element manifest, use RFC 2119 as a kind of "mapping" from those statements, evaluate an element's rules against those reported, and report on that), I think your described approach is naive but may still work well because of the narrow range you're working in.

You need to test this with non-happy path statements: misspellings, capitalization issues, mismatched quotes, rearranged words, etc.

What we're talking about are first conditional sentences: sentences where the conditional signal has a single known and expected implication. "if" and "when" are the most common, for sure. But you've also got:

unless
until
in case
as long as
provided

I'd like to see some of those included in your conditionalRegex
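A hedged sketch of what a wider marker pattern might look like. `conditionalRe`, `negating`, and `findMarkers` are hypothetical names standing in for whatever the PR's `conditionalRegex` actually is, and treating "unless" and "until" as negated conditions is this sketch's own assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// conditionalRe covers if/when plus the markers listed above.
// Multi-word markers tolerate runs of whitespace between words.
var conditionalRe = regexp.MustCompile(
	`(?i)\b(if|when|unless|until|in\s+case|as\s+long\s+as|provided)\b`)

// negating lists markers read here as "if NOT <condition>" (an assumption).
var negating = map[string]bool{"unless": true, "until": true}

// marker is a conditional cue, its polarity, and its byte offset.
type marker struct {
	text    string
	negated bool
	pos     int
}

// findMarkers returns every conditional marker in the sentence,
// lowercased and with internal whitespace collapsed.
func findMarkers(sentence string) []marker {
	var out []marker
	for _, m := range conditionalRe.FindAllStringIndex(sentence, -1) {
		raw := strings.ToLower(sentence[m[0]:m[1]])
		text := strings.Join(strings.Fields(raw), " ")
		out = append(out, marker{text, negating[text], m[0]})
	}
	return out
}

func main() {
	for _, s := range []string{
		"Unless `disabled` is set, you MUST set `href`.",
		"As long as `open` is set, you MUST set `label`.",
	} {
		fmt.Println(findMarkers(s))
	}
}
```

One caveat: "provided" also matches ordinary uses such as "the value provided", so a real implementation would probably need extra context before trusting it as a conditional.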

You may also want to account for at least a few modal verbs because that could influence how you map to RFC2119:

may
might
could
should
would

Also, don't forget the French quotes (guillemets, « ») and the annoying apostrophe-as-quote for quotedValueRegex.
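As an illustration only, a quoted-value pattern that accepts straight, curly, and French quotes could be sketched like this. `quotedValueRe` and `quotedValues` are hypothetical names, not the PR's `quotedValueRegex`, and each quote style must open and close with its own pair.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedValueRe has one alternative per quote style; each captures the
// text between a matching pair. Guillemets may pad the value with spaces.
var quotedValueRe = regexp.MustCompile(
	`'([^']+)'` + // straight single quotes
		`|"([^"]+)"` + // straight double quotes
		`|‘([^‘’]+)’` + // curly single quotes
		`|“([^“”]+)”` + // curly double quotes
		`|«\s*([^«»]+?)\s*»`) // French quotes

// quotedValues returns the quoted values in order of appearance.
func quotedValues(s string) []string {
	var out []string
	for _, m := range quotedValueRe.FindAllStringSubmatch(s, -1) {
		for _, g := range m[1:] {
			if g != "" {
				out = append(out, g)
				break
			}
		}
	}
	return out
}

func main() {
	fmt.Println(quotedValues("Set `variant` to ‘icon’, «plain», « ghost » or \"link\"."))
}
```

Contractions are the weak spot: two apostrophes in one sentence ("don't ... can't") would be read as a quoted span, so apostrophes may need filtering before this step.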

What I don't see (and maybe I missed it?) is text normalization, where you set all your text to lowercase and remove special characters. That may be useful.

In cases like these, usually it goes:

normalize
tokenize (split on spaces or inter-sentence punctuation)
do all the other things
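The first two steps might be sketched as below, with some assumptions worth stating. All names are hypothetical. Normalization here folds typographic quotes and whitespace but keeps case and backticks, because the extraction described in the PR relies on uppercase RFC 2119 keywords and backtick-quoted attribute names; lowercasing is applied only when producing word tokens.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quoteFolder maps typographic quotes onto their ASCII equivalents.
var quoteFolder = strings.NewReplacer(
	"‘", "'", "’", "'",
	"“", `"`, "”", `"`,
	"«", `"`, "»", `"`,
)

var (
	spaceRe       = regexp.MustCompile(`\s+`)
	sentenceEndRe = regexp.MustCompile(`[.!?]+\s+`)
)

// normalize folds quotes and collapses whitespace. Case and backticks are
// kept so later steps can still see MUST and `attr` signals.
func normalize(s string) string {
	s = quoteFolder.Replace(s)
	return strings.TrimSpace(spaceRe.ReplaceAllString(s, " "))
}

// sentences splits normalized text on sentence-ending punctuation followed
// by whitespace. Abbreviations such as "e.g. " will split too early.
func sentences(s string) []string {
	var out []string
	for _, part := range sentenceEndRe.Split(normalize(s), -1) {
		part = strings.TrimRight(strings.TrimSpace(part), ".!?")
		if part != "" {
			out = append(out, part)
		}
	}
	return out
}

// tokens lowercases a sentence and splits it on whitespace. Use it for
// matching markers, not for keyword detection, since case is lost.
func tokens(sentence string) []string {
	return strings.Fields(strings.ToLower(sentence))
}

func main() {
	doc := "Sets the variant.  If you set `variant` to ‘icon’, you MUST set `accessible-label`."
	for _, s := range sentences(doc) {
		fmt.Printf("%q -> %q\n", s, tokens(s))
	}
}
```

Keeping the case-preserving sentence alongside the lowercased tokens lets the same text serve both the keyword check and looser marker matching.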

But all the same, I think this is a good start and I'd like to see a healthy number of examples somewhere of what the text looks like.

bennypowers wants to merge 1 commit into main from lsp/validations-from-docs-rfc2119
