Pr/add gitlab adapter #24

Open
mjochum64 wants to merge 8 commits into castai:main from mjochum64:pr/add-gitlab-adapter

@mjochum64

Summary

  • Adds a new GitLab adapter that syncs repository files into OpenWebUI knowledge bases, analogous to the existing GitHub adapter
  • Supports self-hosted GitLab instances via configurable base_url
  • Supports optional path filter per repository mapping to restrict sync to a subfolder (e.g. docs/)
  • Extends file support with PDF uploads (GitLab adapter only; base64 decoding handles binary content correctly)
  • Fixes a bug in GetKnowledgeFiles that called the list endpoint instead of the specific knowledge endpoint, causing an empty file index on startup
  • Fixes ListKnowledge to handle both array and paginated object response formats
  • Fixes a ContentLength mismatch on upload retries caused by a consumed bytes.Buffer

Test plan

  • Unit tests pass: go test ./...
  • GitLab adapter constructor validates missing token and base URL
  • Files are fetched from configured repositories and synced to the correct knowledge base
  • Path filtering restricts sync to the configured subfolder
  • PDF files are uploaded and processed by OpenWebUI
  • Upload retries succeed after transient errors (no ContentLength mismatch)

Jochum, Martin added 8 commits March 4, 2026 08:55
Adds a new GitLab adapter that synchronizes repository files from
self-hosted GitLab instances into OpenWebUI knowledge bases.

The adapter follows the same pattern as the existing GitHub adapter:
- Per-repository knowledge base mappings
- Recursive file tree fetching via GitLab API v4
- Text file filtering (reuses existing isTextFile helper)
- SHA256-based change detection
- Pagination support for large repositories
- Deterministic iteration order via ordered repository slice
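
The SHA256-based change detection listed above could look roughly like the sketch below. It is an illustration only, not the adapter's code: `contentHash`, `needsSync`, and the path-to-hash index map are hypothetical names, and the sketch assumes the hash recorded at the last sync is available per file path.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 digest of a file's content.
func contentHash(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// needsSync reports whether a file must be uploaded again.
// index is a hypothetical map from file path to the hash stored at the last sync.
func needsSync(index map[string]string, path string, content []byte) bool {
	prev, ok := index[path]
	return !ok || prev != contentHash(content)
}

func main() {
	index := map[string]string{"docs/a.md": contentHash([]byte("hello"))}

	fmt.Println(needsSync(index, "docs/a.md", []byte("hello")))   // false: unchanged
	fmt.Println(needsSync(index, "docs/a.md", []byte("changed"))) // true: content differs
	fmt.Println(needsSync(index, "docs/b.md", []byte("new")))     // true: not seen before
}
```

Comparing digests rather than timestamps keeps the decision independent of commit times, at the cost of fetching the content before deciding.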

Authentication uses Personal Access Tokens. The base URL is
configurable for self-hosted instances via config file or the
GITLAB_BASE_URL and GITLAB_TOKEN environment variables.
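
As a rough sketch of how this configuration might be resolved and validated: the struct, field names, and function below are hypothetical, and the precedence (config file values first, environment variables as fallback) is an assumption that the description above does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// gitlabSettings is a hypothetical struct; the field names are illustrative.
type gitlabSettings struct {
	BaseURL string
	Token   string
}

// resolveSettings fills empty fields from the environment, then validates.
// getenv is injected so the function can be exercised without touching the
// process environment. Assumed precedence: config value, then env variable.
func resolveSettings(cfg gitlabSettings, getenv func(string) string) (gitlabSettings, error) {
	if cfg.BaseURL == "" {
		cfg.BaseURL = getenv("GITLAB_BASE_URL")
	}
	if cfg.Token == "" {
		cfg.Token = getenv("GITLAB_TOKEN")
	}
	if cfg.BaseURL == "" {
		return cfg, errors.New("gitlab: base URL is required")
	}
	if cfg.Token == "" {
		return cfg, errors.New("gitlab: token is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := resolveSettings(gitlabSettings{}, os.Getenv)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("using GitLab at", cfg.BaseURL)
}
```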

Uses github.com/xanzy/go-gitlab v0.115.0.
- Add optional path field to RepositoryMapping to restrict syncing to
  a subfolder within a repository (e.g. docs/)
- Refactor GitLabAdapter to store per-repo path alongside knowledge ID
- Pass path to ListTree API call when configured
- Fix ContentLength mismatch on upload retries: bytes.Buffer is consumed
  after the first Do() call; capture body bytes once and create a fresh
  bytes.NewReader per retry attempt
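
The retry fix described above can be sketched as follows. This is a minimal illustration under assumptions, not the code from this PR: `postWithRetry` and `runDemo` are hypothetical names, and `flakyTransport` is a test double that stands in for a server with one transient failure and that rejects any body shorter than the declared `ContentLength`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// flakyTransport fails the first call with 503 and returns 400 whenever the
// body length does not match the request's ContentLength.
type flakyTransport struct{ calls int }

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	got, _ := io.ReadAll(req.Body)
	req.Body.Close()
	status := http.StatusOK
	switch {
	case int64(len(got)) != req.ContentLength:
		status = http.StatusBadRequest
	case t.calls == 1:
		status = http.StatusServiceUnavailable // simulated transient error
	}
	return &http.Response{StatusCode: status, Header: http.Header{}, Body: http.NoBody, Request: req}, nil
}

// postWithRetry sends body up to maxAttempts times and returns the last status.
func postWithRetry(client *http.Client, url string, body []byte, maxAttempts int) (int, error) {
	var lastErr error
	status := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		// A fresh reader per attempt always starts at offset 0, so the bytes
		// sent match the ContentLength that NewRequest derives from the reader.
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return 0, err
		}
		resp, err := client.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		resp.Body.Close()
		status = resp.StatusCode
		if status < 500 {
			return status, nil // success or a non-retryable client error
		}
		lastErr = fmt.Errorf("server returned %d", status)
	}
	return status, lastErr
}

// runDemo performs one upload against the flaky transport.
func runDemo() (status, calls int, err error) {
	tr := &flakyTransport{}
	status, err = postWithRetry(&http.Client{Transport: tr}, "http://upload.invalid/files", []byte("payload"), 3)
	return status, tr.calls, err
}

func main() {
	status, calls, err := runDemo()
	fmt.Println(status, calls, err) // 200 2 <nil>
}
```

Reusing one `bytes.Buffer` across attempts would leave it empty after the first send, so the second request would declare the original length but carry no bytes.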
- Add namespace.yaml for dedicated content-sync namespace
- Add namespace: content-sync to all k8s resources
- Add GITLAB_TOKEN env var from gitlab-secrets in deployment
- Update configmap with GitLab adapter configuration
- Fix invalid base64 placeholders in secrets.yaml
- Change storageClassName to local-path for local development
The function was calling /api/v1/knowledge/ (list endpoint) and trying
to unmarshal the response as []*Knowledge, causing a JSON decode error
when the API returns a single object. This resulted in an empty file
index on startup, leading to duplicate content errors on re-upload.

Fix: call /api/v1/knowledge/{id} and decode as a single Knowledge object.
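
A minimal sketch of the corrected decode path, for illustration. The `Knowledge` struct here is a reduced stand-in with assumed `id` and `name` fields, the sample JSON is invented, and `knowledgeURL` and `decodeKnowledge` are hypothetical helper names rather than functions from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// Knowledge is a reduced stand-in; the real type has more fields.
type Knowledge struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// knowledgeURL builds the per-knowledge endpoint instead of the list endpoint.
func knowledgeURL(baseURL, id string) string {
	return strings.TrimRight(baseURL, "/") + "/api/v1/knowledge/" + url.PathEscape(id)
}

// decodeKnowledge decodes the response body as a single object.
func decodeKnowledge(body []byte) (*Knowledge, error) {
	var k Knowledge
	if err := json.Unmarshal(body, &k); err != nil {
		return nil, fmt.Errorf("decode knowledge: %w", err)
	}
	return &k, nil
}

func main() {
	body := []byte(`{"id":"kb-1","name":"Docs"}`) // invented sample payload

	// The previous behaviour: decoding a single object into a slice fails.
	var asList []*Knowledge
	fmt.Println("slice decode failed:", json.Unmarshal(body, &asList) != nil) // true

	k, err := decodeKnowledge(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(knowledgeURL("http://localhost:8080/", k.ID))
}
```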
…mats

The /api/v1/knowledge/ endpoint may return either a JSON array or a
paginated wrapper object (items/data field). Try array decode first,
fall back to wrapper struct to handle both API versions.
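
The array-first, wrapper-fallback decode could be sketched like this. The wrapper field names `items` and `data` come from the commit message; the `Knowledge` stand-in, the `knowledgeList` struct, and the preference for `items` over `data` when both are present are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Knowledge is a reduced stand-in for the real type.
type Knowledge struct {
	ID string `json:"id"`
}

// knowledgeList models the paginated wrapper; either field may be populated.
type knowledgeList struct {
	Items []*Knowledge `json:"items"`
	Data  []*Knowledge `json:"data"`
}

// decodeKnowledgeList accepts both a bare JSON array and a wrapper object.
func decodeKnowledgeList(body []byte) ([]*Knowledge, error) {
	// First attempt: the plain-array response format.
	var list []*Knowledge
	if err := json.Unmarshal(body, &list); err == nil {
		return list, nil
	}
	// Fallback: the paginated wrapper format.
	var wrapped knowledgeList
	if err := json.Unmarshal(body, &wrapped); err != nil {
		return nil, fmt.Errorf("decode knowledge list: %w", err)
	}
	if wrapped.Items != nil {
		return wrapped.Items, nil
	}
	return wrapped.Data, nil
}

func main() {
	for _, body := range []string{
		`[{"id":"a"}]`,
		`{"items":[{"id":"a"},{"id":"b"}]}`,
		`{"data":[{"id":"c"}]}`,
	} {
		list, err := decodeKnowledgeList([]byte(body))
		fmt.Println(len(list), err)
	}
}
```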
Extend file filter with isGitLabSupportedFile() which adds .pdf to the
existing text file whitelist. The GitLab API returns file content as
base64-encoded bytes, making binary files like PDFs safe to transfer.
OpenWebUI handles PDF text extraction automatically on upload.
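
For illustration, the filter and the base64 step might look like the sketch below. `isTextFile` here is a stand-in with an invented extension list, since the real helper is not shown in this PR text, and `decodeContent` is a hypothetical name. The sketch also assumes the filter is extension-based.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"path"
	"strings"
)

// isTextFile is a stand-in for the project's existing helper; the extension
// list here is invented for the example.
func isTextFile(name string) bool {
	switch strings.ToLower(path.Ext(name)) {
	case ".md", ".txt", ".go":
		return true
	}
	return false
}

// isGitLabSupportedFile accepts everything isTextFile accepts, plus PDFs.
func isGitLabSupportedFile(name string) bool {
	return isTextFile(name) || strings.EqualFold(path.Ext(name), ".pdf")
}

// decodeContent turns base64-encoded file content into raw bytes, so binary
// data is never treated as text on the way to the upload.
func decodeContent(encoded string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(encoded)
}

func main() {
	// Bytes that are not valid UTF-8 survive the base64 round trip unchanged.
	raw := []byte("%PDF-1.7\x00\xff\xfe")
	decoded, err := decodeContent(base64.StdEncoding.EncodeToString(raw))
	fmt.Println(bytes.Equal(raw, decoded), err) // true <nil>

	fmt.Println(isGitLabSupportedFile("docs/manual.pdf")) // true
	fmt.Println(isGitLabSupportedFile("logo.png"))        // false
}
```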
…ageClass

Replace environment-specific values with generic placeholders suitable
for upstream use. Restore storageClassName to ebs-sc (AWS EKS default).