Feat/keycloak#53
Conversation
- Added `setup_keycloak.py` script for automating Keycloak realm and client setup. - Introduced logging for better traceability during Keycloak operations. - Updated `pyproject.toml` to include `python-keycloak` dependency. - Created empty logging and sys files for future enhancements.
…t and update README for configuration details
- Add KnowledgeSpaceSettings for configuration of knowledge spaces and multitenancy strategies. - Introduce KnowledgeSpaceAccessService for authorization and scope resolution of knowledge spaces. - Create KnowledgeSpaceCollectionRouter to map logical knowledge spaces to physical collections. - Define KnowledgeSpace and DocumentVisibilityMetadata models for knowledge space representation. - Implement operational tools for managing knowledge spaces, including state loading and saving. - Add migration script for backfilling legacy metadata in Qdrant collections. - Develop tests for knowledge space access service, settings validation, and operational tools.
|
|
||
| def _build_trusted_issuers(self) -> set[str]: | ||
| trusted = self._default_trusted_issuers() | self._configured_trusted_issuers() | ||
| logger.info("Configured trusted token issuers: %s", sorted(trusted)) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
In general, to fix clear-text logging issues, we should avoid logging sensitive or potentially sensitive values directly. Instead, we can either remove such logs, reduce their granularity (e.g., log only counts or high-level status), or redact/obfuscate the sensitive parts before logging. The fix must preserve existing functional behavior (authentication logic) while changing only what is sent to the logger.
For this specific case, the problematic code is in _build_trusted_issuers in libs/rag-core-api/src/rag_core_api/auth.py, at line 93–95. The function currently logs the exact list of trusted issuers with: logger.info("Configured trusted token issuers: %s", sorted(trusted)). To eliminate the clear-text logging of the issuers while preserving usefulness, we can either (a) remove the dynamic value entirely and log only that trusted issuers were configured, or (b) log only aggregate information such as the number of configured issuers. Option (b) still gives operators some visibility without exposing actual issuer values, and is fully compatible with existing logic, since the returned set trusted remains unchanged and used as before. Concretely, we will replace the existing log call with something like: logger.info("Configured trusted token issuers: %d issuers", len(trusted)). No new imports or helper methods are needed; we only modify that one logging line.
| @@ -91,7 +91,7 @@ | ||
|
|
||
| def _build_trusted_issuers(self) -> set[str]: | ||
| trusted = self._default_trusted_issuers() | self._configured_trusted_issuers() | ||
| logger.info("Configured trusted token issuers: %s", sorted(trusted)) | ||
| logger.info("Configured trusted token issuers: %d issuers", len(trusted)) | ||
| return trusted | ||
|
|
||
| def _is_trusted_issuer(self, issuer: str | None) -> bool: |
| "JWKS signature failed (kid=%s, alg=%s, source=%s), refreshing and retrying", | ||
| kid, | ||
| alg, | ||
| name, |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
General fix: ensure that log messages never include sensitive or untrusted values unless they are first redacted or otherwise sanitized. In this case, we want to keep the diagnostic value of knowing which JWKS source failed (issuer vs. default, and whether it was a refreshed set) while guaranteeing that no tainted issuer or URL data is written to logs. That means ensuring the source field in the log message is always a static, non-sensitive label.
Best concrete fix here: avoid using a possibly tainted name derived from jwks_sources directly in the log call. Instead, introduce a local source_label that is computed in a controlled, constant way (e.g., mapping the known sources to "issuer" / "issuer-refreshed" / "default" / "default-refreshed"), and log that label instead of name. Since in the current code name is already a constant string literal when the tuples are created, we can simply replace the placeholder %s argument from name to a new, clearly non-tainted local that is derived from those known constants, not any issuer-derived value.
Concretely, in libs/rag-core-api/src/rag_core_api/auth.py, in the loop starting at line 230, change the logger.info call at lines 236–241 so that it does not log the potentially tainted name. Introduce a source_label computed from name (e.g., source_label = "issuer" if name.startswith("issuer") else "default" or simply source_label = name.split("-")[0]), and then pass source_label instead of name to logger.info. This keeps functionality effectively the same (we still distinguish issuer/default, and refreshed vs not can be inferred from context or by tweaking the label logic if needed) while structurally eliminating any propagation of taint into the logging sink. No new imports or external methods are required.
| @@ -233,11 +233,13 @@ | ||
| self._assert_trusted_issuer(decoded.get("iss")) | ||
| return decoded | ||
| except InvalidSignatureError as sig_err: | ||
| # Use a sanitized, static label to avoid logging any tainted data | ||
| source_label = "issuer" if name.startswith("issuer") else "default" | ||
| logger.info( | ||
| "JWKS signature failed (kid=%s, alg=%s, source=%s), refreshing and retrying", | ||
| kid, | ||
| alg, | ||
| name, | ||
| source_label, | ||
| ) | ||
| last_error = sig_err | ||
| try: |
| except ValueError: | ||
| data = response_text | ||
| elif re.match(r"^application/(json|[\w!#$&.+-^_]+\+json)\s*(;|$)", content_type, re.IGNORECASE): | ||
| elif re.match(r'^application/(json|[\w!#$&.+-^_]+\+json)\s*(;|$)', content_type, re.IGNORECASE): |
Check warning
Code scanning / CodeQL
Overly permissive regular expression range Medium
…ensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
| self._assert_trusted_issuer(decoded.get("iss")) | ||
| return decoded | ||
| except InvalidSignatureError as sig_err: | ||
| logger.info("JWKS signature failed (source=%s), refreshing and retrying", name) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Copilot Autofix
AI 3 months ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
Thank you for contributing to the RAG Core Library!
Please ensure your PR meets the following requirements:
PR Title: Follow the format "type: description"
PR Description: Replace this checklist with:
Additional Guidelines:
Thank you for your contribution!