Fix rewriting of CSS @import statements that use strings instead of url() #639
jaltekruse wants to merge 1 commit into learningequality:main
While I'm working on a test, I'm also going to look for any other lingering cases where CSS accepts a bare string as a URL, with no url() wrapper. So far a quick search only turned up relevant info in the Gemini AI summary from Google; I didn't find an independent page or discussion on this topic:

URLs in CSS can appear as plain strings in the following contexts:

This is equivalent to using the url() function:

content property (for generated content): When using the content property with pseudo-elements (::before, ::after), a URL can be included as part of the string content.

cursor property (for custom cursors): A URL to an image file can be provided as a string value for a custom cursor.

In other contexts where URLs are used in CSS, such as for background-image, list-style-image, border-image-source, or @font-face src, the url() functional notation is typically required to explicitly denote a URL value.
Hi @jaltekruse, we will have a look, thanks!
Introduces ricecooker/utils/url_utils.py with pure-function utilities for detecting and rewriting external URL references in HTML, CSS, and H5P JSON content. These will be used by the archive processor (Phase 3) to download external resources and bundle them into archives for offline use.

Functions: is_external_url, derive_local_filename, extract_urls_from_html, extract_urls_from_css, extract_urls_from_h5p_json, rewrite_urls_in_html, rewrite_urls_in_css, rewrite_urls_in_h5p_json.

Includes bug fixes from PRs learningequality#636 (css_node_filter attr check) and learningequality#639 (CSS @import bare string regex), incorporated into the new utilities.

64 new tests in tests/test_url_utils.py, all pure-function, no I/O.

Ref: learningequality#233, supersedes learningequality#303

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
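To illustrate what a pure-function check for external references might look like, here is a minimal sketch. It is not the code in ricecooker/utils/url_utils.py: the name `looks_external` and the rule it applies (http, https, or protocol-relative counts as external) are assumptions made for this example.

```python
from urllib.parse import urlparse


def looks_external(url):
    """Hypothetical helper: True if url points at another host.

    Assumption: only http(s) and protocol-relative ("//host/...") URLs
    count as external; relative paths, fragments, and data: URIs do not.
    """
    url = url.strip()
    if url.startswith("//"):
        # Protocol-relative URL, e.g. //cdn.example.com/lib.js
        return True
    return urlparse(url).scheme in ("http", "https")
```

Because the function takes a string and returns a bool with no I/O, it can be tested without network access or fixtures.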
…egexes

- Import _CSS_IMPORT_RE from url_utils instead of defining locally, consolidating the @import regex from PR learningequality#639 into the shared module
- Use _CSS_URL_RE.search() for background-image extraction instead of a separate inline regex
- Clean up srcset parsing to match url_utils conventions
- Fix linting issues in test_downloader.py from PR learningequality#636 merge
- Make parse_srcset public (renamed from _parse_srcset)
- Relax _CSS_IMPORT_RE to allow optional whitespace (\s*) for compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
While trying to import a textbook written in PreTeXt, I found a missing case in the code that finds referenced resources inside CSS files. While most cases require the url("http://abc.com") syntax, the CSS spec accepts bare "string" values in a few places; the one I hit was in an @import statement.
References
https://developer.mozilla.org/en-US/docs/Web/CSS/Reference/At-rules/@import
url - Is a <string> or a <url> type representing the location of the resource to import. The URL may be absolute or relative.
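As a rough illustration of handling both forms that MDN describes, the sketch below matches `@import` followed by either a quoted string or a url() value, and rewrites the target. It is not the regex from this PR: `IMPORT_RE`, `find_css_imports`, `rewrite_css_imports`, and the `mapping` argument are all hypothetical names chosen for the example.

```python
import re

# Hypothetical pattern (not the PR's regex). Two alternatives after @import:
# a url(...) value with optional quotes, or a bare quoted string.
IMPORT_RE = re.compile(
    r"""@import\s*
        (?:
            url\(\s*(?P<q1>["']?)(?P<u1>[^"')\s]+)(?P=q1)\s*\)   # url(...) form
          |
            (?P<q2>["'])(?P<u2>[^"']+)(?P=q2)                    # bare string form
        )""",
    re.IGNORECASE | re.VERBOSE,
)


def find_css_imports(css):
    """Return the target of every @import rule, in document order."""
    return [m.group("u1") or m.group("u2") for m in IMPORT_RE.finditer(css)]


def rewrite_css_imports(css, mapping):
    """Replace @import targets found in mapping; others keep their URL.

    Every matched rule is re-emitted in the string form, which MDN
    describes as interchangeable with the url() form for @import.
    """
    def repl(match):
        url = match.group("u1") or match.group("u2")
        return '@import "{}"'.format(mapping.get(url, url))

    return IMPORT_RE.sub(repl, css)
```

Anything after the URL, such as a media query, is left untouched because the pattern stops at the closing quote or parenthesis.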
Reviewer guidance
I'll add an automated test. I just wanted to create this PR as a place to document my work in the open and write some additional notes.