Skip to content

Feat/keycloak#53

Open
robodev-r2d2 wants to merge 16 commits into
mainfrom
feat/keycloak
Open

Feat/keycloak#53
robodev-r2d2 wants to merge 16 commits into
mainfrom
feat/keycloak

Conversation

@robodev-r2d2
Copy link
Copy Markdown
Owner

Thank you for contributing to the RAG Core Library!

Please ensure your PR meets the following requirements:

  • PR Title: Follow the format "type: description"

  • PR Description: Replace this checklist with:

    • Description: Provide a detailed description of the changes made.
    • Issue: Mention the issue number this PR fixes, if applicable.
    • Dependencies: List any dependencies required for this change.

Additional Guidelines:

  • Ensure your code follows established coding conventions
  • Include relevant tests and documentation updates.
  • If no one reviews your PR within a few days, please @-mention a-klos.

Thank you for your contribution!

NewDev16 and others added 13 commits December 3, 2025 21:44
- Added `setup_keycloak.py` script for automating Keycloak realm and client setup.
- Introduced logging for better traceability during Keycloak operations.
- Updated `pyproject.toml` to include `python-keycloak` dependency.
- Created empty logging and sys files for future enhancements.
…t and update README for configuration details
- Add KnowledgeSpaceSettings for configuration of knowledge spaces and multitenancy strategies.
- Introduce KnowledgeSpaceAccessService for authorization and scope resolution of knowledge spaces.
- Create KnowledgeSpaceCollectionRouter to map logical knowledge spaces to physical collections.
- Define KnowledgeSpace and DocumentVisibilityMetadata models for knowledge space representation.
- Implement operational tools for managing knowledge spaces, including state loading and saving.
- Add migration script for backfilling legacy metadata in Qdrant collections.
- Develop tests for knowledge space access service, settings validation, and operational tools.
Comment thread libs/admin-api-lib/src/admin_api_lib/auth.py Fixed
Comment thread libs/admin-api-lib/src/admin_api_lib/auth.py Fixed

def _build_trusted_issuers(self) -> set[str]:
trusted = self._default_trusted_issuers() | self._configured_trusted_issuers()
logger.info("Configured trusted token issuers: %s", sorted(trusted))

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (secret)
as clear text.

Copilot Autofix

AI 3 months ago

In general, to fix clear-text logging issues, we should avoid logging sensitive or potentially sensitive values directly. Instead, we can either remove such logs, reduce their granularity (e.g., log only counts or high-level status), or redact/obfuscate the sensitive parts before logging. The fix must preserve existing functional behavior (authentication logic) while changing only what is sent to the logger.

For this specific case, the problematic code is in _build_trusted_issuers in libs/rag-core-api/src/rag_core_api/auth.py, at line 93–95. The function currently logs the exact list of trusted issuers with: logger.info("Configured trusted token issuers: %s", sorted(trusted)). To eliminate the clear-text logging of the issuers while preserving usefulness, we can either (a) remove the dynamic value entirely and log only that trusted issuers were configured, or (b) log only aggregate information such as the number of configured issuers. Option (b) still gives operators some visibility without exposing actual issuer values, and is fully compatible with existing logic, since the returned set trusted remains unchanged and used as before. Concretely, we will replace the existing log call with something like: logger.info("Configured trusted token issuers: %d issuers", len(trusted)). No new imports or helper methods are needed; we only modify that one logging line.

Suggested changeset 1
libs/rag-core-api/src/rag_core_api/auth.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/libs/rag-core-api/src/rag_core_api/auth.py b/libs/rag-core-api/src/rag_core_api/auth.py
--- a/libs/rag-core-api/src/rag_core_api/auth.py
+++ b/libs/rag-core-api/src/rag_core_api/auth.py
@@ -91,7 +91,7 @@
 
     def _build_trusted_issuers(self) -> set[str]:
         trusted = self._default_trusted_issuers() | self._configured_trusted_issuers()
-        logger.info("Configured trusted token issuers: %s", sorted(trusted))
+        logger.info("Configured trusted token issuers: %d issuers", len(trusted))
         return trusted
 
     def _is_trusted_issuer(self, issuer: str | None) -> bool:
EOF
@@ -91,7 +91,7 @@

def _build_trusted_issuers(self) -> set[str]:
trusted = self._default_trusted_issuers() | self._configured_trusted_issuers()
logger.info("Configured trusted token issuers: %s", sorted(trusted))
logger.info("Configured trusted token issuers: %d issuers", len(trusted))
return trusted

def _is_trusted_issuer(self, issuer: str | None) -> bool:
Copilot is powered by AI and may make mistakes. Always verify output.
"JWKS signature failed (kid=%s, alg=%s, source=%s), refreshing and retrying",
kid,
alg,
name,

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.

Copilot Autofix

AI 3 months ago

General fix: ensure that log messages never include sensitive or untrusted values unless they are first redacted or otherwise sanitized. In this case, we want to keep the diagnostic value of knowing which JWKS source failed (issuer vs. default, and whether it was a refreshed set) while guaranteeing that no tainted issuer or URL data is written to logs. That means ensuring the source field in the log message is always a static, non-sensitive label.

Best concrete fix here: avoid using a possibly tainted name derived from jwks_sources directly in the log call. Instead, introduce a local source_label that is computed in a controlled, constant way (e.g., mapping the known sources to "issuer" / "issuer-refreshed" / "default" / "default-refreshed"), and log that label instead of name. Since in the current code name is already a constant string literal when the tuples are created, we can simply replace the placeholder %s argument from name to a new, clearly non-tainted local that is derived from those known constants, not any issuer-derived value.

Concretely, in libs/rag-core-api/src/rag_core_api/auth.py, in the loop starting at line 230, change the logger.info call at lines 236–241 so that it does not log the potentially tainted name. Introduce a source_label computed from name (e.g., source_label = "issuer" if name.startswith("issuer") else "default" or simply source_label = name.split("-")[0]), and then pass source_label instead of name to logger.info. This keeps functionality effectively the same (we still distinguish issuer/default, and refreshed vs not can be inferred from context or by tweaking the label logic if needed) while structurally eliminating any propagation of taint into the logging sink. No new imports or external methods are required.

Suggested changeset 1
libs/rag-core-api/src/rag_core_api/auth.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/libs/rag-core-api/src/rag_core_api/auth.py b/libs/rag-core-api/src/rag_core_api/auth.py
--- a/libs/rag-core-api/src/rag_core_api/auth.py
+++ b/libs/rag-core-api/src/rag_core_api/auth.py
@@ -233,11 +233,13 @@
                 self._assert_trusted_issuer(decoded.get("iss"))
                 return decoded
             except InvalidSignatureError as sig_err:
+                # Use a sanitized, static label to avoid logging any tainted data
+                source_label = "issuer" if name.startswith("issuer") else "default"
                 logger.info(
                     "JWKS signature failed (kid=%s, alg=%s, source=%s), refreshing and retrying",
                     kid,
                     alg,
-                    name,
+                    source_label,
                 )
                 last_error = sig_err
                 try:
EOF
@@ -233,11 +233,13 @@
self._assert_trusted_issuer(decoded.get("iss"))
return decoded
except InvalidSignatureError as sig_err:
# Use a sanitized, static label to avoid logging any tainted data
source_label = "issuer" if name.startswith("issuer") else "default"
logger.info(
"JWKS signature failed (kid=%s, alg=%s, source=%s), refreshing and retrying",
kid,
alg,
name,
source_label,
)
last_error = sig_err
try:
Copilot is powered by AI and may make mistakes. Always verify output.
except ValueError:
data = response_text
elif re.match(r"^application/(json|[\w!#$&.+-^_]+\+json)\s*(;|$)", content_type, re.IGNORECASE):
elif re.match(r'^application/(json|[\w!#$&.+-^_]+\+json)\s*(;|$)', content_type, re.IGNORECASE):

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with \w in the same character class, and is equivalent to \[+,\-.\/0-9:;<=>?@A-Z\\[\\\\]^\].
robodev-r2d2 and others added 2 commits February 13, 2026 16:39
…ensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
self._assert_trusted_issuer(decoded.get("iss"))
return decoded
except InvalidSignatureError as sig_err:
logger.info("JWKS signature failed (source=%s), refreshing and retrying", name)

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.

Copilot Autofix

AI 3 months ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants