refactor!: monorepo#34

Closed
a-klos wants to merge 185 commits into dev-main from refactor/monorepo

Conversation

@a-klos
Member

@a-klos a-klos commented Jul 3, 2025

This pull request introduces several foundational updates to the infrastructure repository, including improvements to workflows, documentation, and configuration files. The most significant changes involve the addition of a semantic release workflow, a detailed Code of Conduct, a new LICENSE file, and updates to .gitignore. These changes aim to enhance project governance, streamline contributions, and improve the development environment.

Project Governance and Contribution Guidelines:

  • infrastructure/CODE_OF_CONDUCT.md: Added a Contributor Covenant Code of Conduct to establish community standards and enforcement guidelines for behavior.
  • infrastructure/CONTRIBUTING.md: Introduced a comprehensive guide for contributors, including steps for contributing, guidelines, and support resources.
  • infrastructure/LICENSE: Added an Apache License 2.0 file to define terms for usage, reproduction, and distribution of the project.

Development Workflow Enhancements:

Configuration and Environment Updates:

Cleanup:

  • .gitmodules: Removed unused submodule references to streamline repository structure.

a-klos and others added 30 commits January 21, 2025 12:49
dockerfile revision
best bugfix ever
Updated deployment.yaml
added a fixed tag to the langfuse image to avoid future bugs
adds embedder types selector
- Added the configuration needed for the STACKIT vLLM in the infrastructure
old helm chart version seems to have incompatibilities with new k8s version
missing protocol for ingress cors annotations
a-klos and others added 25 commits May 7, 2025 15:51
* feat: add terraform scripts

* chore: add readme

* fix: set max_surge to 1

* chore: set machine_type to g1.2

* fix: increase machine_type; langfuse needs 6 CPUs and 12 GB RAM

---------

Co-authored-by: Sebastian Heußer <sebastian.heusser@stackit.cloud>
Make the API definition more generic: one endpoint in the admin backend for files and one for sources such as Confluence.

The extractor endpoints have been adjusted as well.
A timeout parameter can now be configured; it defaults to 1h.
Conventional commits will be analyzed now.
Add a new extractor that can extract content from sitemaps.
…naming (#17)

### PR Description

This PR introduces improvements to the naming conventions used in the Terraform configuration for DNS and object storage resources.

### Summary of Changes

* **Feature:** Updated the DNS name variable for better clarity and configuration flexibility.
* **Enhancement:** Improved object storage bucket naming by appending a deployment timestamp for uniqueness and traceability.
* **Refactor:** Replaced the use of a local timestamp with a centralized deployment timestamp to ensure consistent bucket naming across resources.
* **Fix:** Corrected a regex condition to ensure accurate pattern matching.
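
The centralized-timestamp idea above can be sketched outside Terraform as well. The following Python snippet is only an illustration, not the configuration from this PR: `deployment_timestamp`, `bucket_name`, the prefixes, and the name-validation pattern (an S3-style rule of 3 to 63 lowercase letters, digits, and hyphens) are all assumptions. It shows the point of the refactor: the timestamp is computed once per deployment and passed to every resource, so all bucket names share the same suffix.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed S3-style constraint: 3-63 chars, lowercase letters, digits, hyphens,
# starting and ending with a letter or digit. The real rule may differ.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def deployment_timestamp(now: Optional[datetime] = None) -> str:
    """Return one timestamp string for the whole deployment (UTC)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%d%H%M%S")


def bucket_name(prefix: str, timestamp: str) -> str:
    """Append the shared deployment timestamp to a bucket prefix."""
    name = f"{prefix}-{timestamp}".lower()
    if not BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


# Compute the timestamp once, then reuse it for every bucket.
stamp = deployment_timestamp(datetime(2025, 7, 3, 10, 52, tzinfo=timezone.utc))
names = [bucket_name(prefix, stamp) for prefix in ("documents", "langfuse")]
print(names)
```

Calling `deployment_timestamp()` separately for each resource would reintroduce the drift that the refactor removes, which is why the value is passed in as an argument rather than computed inside `bucket_name`.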
* feat: Update langfuse dependency to version 3.0.0 and adjust related imports

- Updated langfuse version in pyproject.toml and poetry.lock files.
- Modified import statements in langfuse_ragas_evaluator.py to reflect new package structure.
- Adjusted langfuse_manager.py to use labels instead of is_active for prompt management.
- Refactored langfuse_traced_chain.py to utilize the new CallbackHandler import.
- Enhanced traced_chain.py to initialize langfuse client and update tracing logic.
* feat: Update langfuse dependency to version 3.0.0 and adjust related imports

- Updated langfuse version in pyproject.toml and poetry.lock files.
- Modified import statements in langfuse_ragas_evaluator.py to reflect new package structure.
- Adjusted langfuse_manager.py to use labels instead of is_active for prompt management.
- Refactored langfuse_traced_chain.py to utilize the new CallbackHandler import.
- Enhanced traced_chain.py to initialize langfuse client and update tracing logic.

* Add comprehensive tests for PDFExtractor functionality

- Introduced test suite for enhanced PDF extraction capabilities in `test_enhanced_pdfs.py`.
- Created new test files for various PDF types including text-based, mixed content, and scanned documents.
- Implemented detailed tests for PDFExtractor's classification, extraction, and linking functionalities in `test_pdf_extractorv2_new.py`.
- Added quick functionality verification tests in `test_pdf_functionality.py` to ensure correct operation with real PDF files.
- Established mock classes and fixtures to facilitate unit testing of PDF extraction methods.

* feat: Update dependencies and modify PDF extractor import

- Added a new source for PyTorch and its related packages with CPU support in pyproject.toml.
- Included additional dependencies: camelot-py, tabula, and easyocr.
- Changed the import statement for PDFExtractor to use the new version (pdf_extractorv2) in dependency_container.py.

* feat: add pytest-asyncio support for asynchronous testing

* Refactor PDF extractor tests: remove old test files and implement comprehensive test suite for PDFExtractor class

- Deleted outdated test files: test_pdf_extractorv2.py, test_pdf_extractorv2_new.py, and test_pdf_functionality.py.
- Introduced a new comprehensive test suite for the PDFExtractor class, covering various functionalities including content extraction from different PDF types, error handling, and performance testing.
- Added mock dependencies and fixtures to streamline testing processes.
- Implemented tests for text extraction, table extraction, language detection, and related ID mapping.
- Ensured compatibility with multiple PDF formats and validated metadata completeness in extracted content.

* refactor: Moved tests from test_pdf_extractor.py to pdf_extractor_test.py, ensuring comprehensive coverage and maintaining functionality. Removed old test file to streamline the testing structure.

* refactor: update flake8 exclusions and clean up PDFExtractor tests for improved readability and maintainability

* chore: add pdf files using git lfs

* refactor: update parameter names in PDFExtractor class for clarity and consistency; enhance test suite with additional logging and assertions

* chore: remove PyTorch and related dependencies from pyproject.toml

* refactor: remove unused text-based PDF document from test data

* chore: add sample PDF document for testing in extractor-api-lib

* refactor: remove unused test methods and main execution block from pdf_extractor_test.py

* chore: add pytest-asyncio as a development dependency

* Remove unused dependencies: tabula and easyocr from pyproject.toml
* add infrastructure for mcp

* feat: update mcp configuration and improve ingress rules

* feat: add MCP server configuration and update related templates

* refactor: rag backend main and mcp ingress

---------

Co-authored-by: Melvin Klein <melvin.klein@stackit.cloud>
…#19)

* fix: update project title from "RAG SIT x Stackit" to "STACKIT RAG" across multiple files

* fix: remove unnecessary condition in sitemap loader parameter processing
@a-klos a-klos changed the base branch from main to dev-main July 3, 2025 10:52
@a-klos a-klos changed the title from "Refactor/monorepo" to "refactor!: monorepo" Jul 3, 2025
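
The `!` in the title `refactor!: monorepo` is the Conventional Commits marker for a breaking change. As a rough sketch of how a release tool might read such a header, the Python snippet below parses the `type(scope)!: description` shape and maps it to a version bump. The names `parse_header` and `bump_for` are hypothetical, and the mapping (breaking to major, `feat` to minor, `fix` to patch) is a commonly used default; the rules actually configured in this repository's release workflow are not shown in the PR.

```python
import re
from typing import NamedTuple, Optional

# Header shape from the Conventional Commits spec: type(scope)!: description
HEADER_RE = re.compile(
    r"^(?P<type>[a-zA-Z]+)"
    r"(?:\((?P<scope>[^)]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>.+)$"
)


class CommitHeader(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    description: str


def parse_header(header: str) -> Optional[CommitHeader]:
    """Parse a commit header; return None if it is not conventional."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return CommitHeader(
        type=match["type"].lower(),
        scope=match["scope"],
        breaking=match["breaking"] is not None,
        description=match["description"],
    )


def bump_for(header: str) -> Optional[str]:
    """Map a header to a release bump (assumed default rules)."""
    parsed = parse_header(header)
    if parsed is None:
        return None
    if parsed.breaking:
        return "major"
    if parsed.type == "feat":
        return "minor"
    if parsed.type == "fix":
        return "patch"
    return None


for title in ("Refactor/monorepo", "refactor!: monorepo", "fix: set max_surge to 1"):
    print(title, "->", bump_for(title))
```

Under these assumed rules the original title `Refactor/monorepo` would not be recognized at all, while the renamed title would count as a breaking change.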
Collaborator

@MelvinKl MelvinKl left a comment


Approved without checking. There are too many changes.

This pull request includes significant changes to the codebase,
primarily focused on removing unused utility functions, adding new
infrastructure setup files, and introducing project documentation. Below
is a categorized summary of the most important changes:

### Removal of Unused Utility Functions
* [`frontend/libs/shared/utils/src/lib/date.utils.ts`](diffhunk://#diff-26b2a76e608e2e9ccbfc3aece1d862490f8289dc4036f5b5e7d11cc6365f2872L1-L5): Removed `extractTime` function, which formatted a `Date` object into `HH:MM` format.
* [`frontend/libs/shared/utils/src/lib/file-size-formatter.utils.ts`](diffhunk://#diff-fd3fd574bfaae88cf015ee52d5029c3ddc1047f229d6afac3755c20bac194422L1-L14): Removed `formatFileSizeToString` function, which converted file sizes into human-readable strings.
* [`frontend/libs/shared/utils/src/lib/is-empty.utils.ts`](diffhunk://#diff-dea95869e418d8bef72681478516ad4e4e7d7a4c499674c912cf0348b167f957L1-L2): Removed `isEmpty` and `isNotEmpty` functions for checking object emptiness.
* [`frontend/libs/shared/utils/src/lib/marked.utils.ts`](diffhunk://#diff-3fbeaebe99de1bfc51fab8c74ab8984f15ec8e1447ce521d71467fcfe928ce39L1-L45): Removed `initializeMarkdown` function, which customized `marked` for rendering markdown with modals and tables.
* [`frontend/libs/shared/utils/src/lib/uuid.util.ts`](diffhunk://#diff-dc4296fa6f377c72edfbc43bc9229f80638d63c1dc558d54928bd57f5535305eL1-L4): Removed `newUid` function, which generated random UUIDs.

### Infrastructure Setup
* [`infrastructure/local-cluster-setup/k3d-cluster-config.yaml`](diffhunk://#diff-218f3fa08a5a5389f37836ff03b08ba96434f17c3136d22410a20c6051e38471R1-R26): Added configuration for setting up a local Kubernetes cluster using `k3d`, including eviction policies and a local registry.
* [`infrastructure/local-cluster-setup/setup-k3d-cluster.sh`](diffhunk://#diff-170733500357d283873c5a4d86020aa767b6d5e04b788bdc008162ee27824b35R1-R17): Added a script to create the Kubernetes cluster, configure Helm, and install an NGINX ingress controller.

### Project Documentation
* [`infrastructure/CODE_OF_CONDUCT.md`](diffhunk://#diff-ee0c6213e1d5121be8db4a03f57d7d1bc47fb775fb8faf525b5bb06abf7d9afcR1-R133): Added a Contributor Covenant Code of Conduct to establish community standards and enforcement guidelines.
* [`infrastructure/CONTRIBUTING.md`](diffhunk://#diff-acd7d8b10b0e206f3e9e235c568079b8649204a184dfa4720d350aa4101d029cR1-R20): Added a guide for contributors, outlining steps for making contributions and expectations during the review process.
* [`infrastructure/LICENSE`](diffhunk://#diff-e7f1f2efceac842941f35c6bf15a05b9b612882b5a2c9de6419ac9eedb45563eR1-R201): Added Apache License 2.0 to define terms for use, reproduction, and distribution of the project.
@a-klos a-klos closed this Jul 15, 2025

8 participants