8 changes: 8 additions & 0 deletions .pre-commit-config.yaml
@@ -37,3 +37,11 @@ repos:
        entry: djhtml --tabwidth 2
        files: .*/templates/.*\.html$
        alias: autoformat
  - repo: local
    hooks:
      - id: generate-service-docs
        name: check service API docs are up to date
        entry: python scripts/generate_service_docs.py --check
        language: system
        pass_filenames: false
        always_run: true
2 changes: 1 addition & 1 deletion README.md
@@ -215,7 +215,7 @@ Docs are organized **by topic** (one doc per concern: workflow, workspace, servi
- [Workspace.md](docs/Workspace.md) – Workspace layout and usage for file processing.
- [Schema.md](docs/Schema.md) – Database schema and table relationships.
- [Development_guideline.md](docs/Development_guideline.md) – Development setup, app requirements, and step-by-step workflow.
- [Contributing.md](docs/Contributing.md) – Service layer (single place for writes) and contributor guidelines.
- [Contributing.md](docs/Contributing.md) – Service layer (single place for writes), **regenerating service API docs** (`scripts/generate_service_docs.py`), and contributor guidelines.
- [Service_API.md](docs/Service_API.md) – API reference and index for all service layer functions.
- [service_api/](docs/service_api/) – Per-app service API docs (name, description, parameters, return types, validation).

18 changes: 16 additions & 2 deletions docs/Contributing.md
@@ -28,8 +28,22 @@ Each Django app that has **models** provides a **`services.py`** module. This is
| `boost_usage_tracker` | `boost_usage_tracker/services.py` | External repos, Boost usage, missing-header tmp. |
| `cppa_pinecone_sync` | `cppa_pinecone_sync/services.py` | Pinecone fail list and sync status writes. |
| `discord_activity_tracker` | `discord_activity_tracker/services.py` | Servers, channels, messages, reactions (Discord user profiles in cppa_user_tracker). |
| `cppa_youtube_script_tracker` | `cppa_youtube_script_tracker/services.py` | YouTube channels, videos, tags, transcript state, speaker links. |
| `clang_github_tracker` | `clang_github_tracker/services.py` | Clang/llvm GitHub issue, PR, and commit upserts; fetch watermarks. |
| `boost_mailing_list_tracker` | `boost_mailing_list_tracker/services.py` | Mailing list messages and names. |
| `cppa_slack_tracker` | `cppa_slack_tracker/services.py` | Slack teams, channels, messages, membership. |
| `wg21_paper_tracker` | `wg21_paper_tracker/services.py` | WG21 papers, authors, mailings. |

For a full list of functions, parameter/return types, and validation (e.g. empty `name` raises `ValueError`), see **[Service_API.md](Service_API.md)** and the per-app docs in **[service_api/](service_api/)** (index: [service_api/README.md](service_api/README.md)).
For a full list of functions, parameter/return types, and validation (e.g. empty `name` raises `ValueError`), see **[Service_API.md](Service_API.md)** and the per-app docs in **[service_api/](service_api/)** (index: [service_api/README.md](service_api/README.md)). DTO protocols shared across trackers are documented in **[service_api/core_protocols.md](service_api/core_protocols.md)** (generated from `core/protocols.py`).

### Regenerating service API docs

Reference tables in `docs/service_api/*.md` are produced by **[`scripts/generate_service_docs.py`](../scripts/generate_service_docs.py)** from each app’s `services.py` and from `core/protocols.py`.

- **Markers:** Each file contains `<!-- SERVICE_API:GENERATED:START -->` … `<!-- SERVICE_API:GENERATED:END -->`. The script replaces **only** that region. Put hand-written notes (usage, cross-app warnings, command help) **below** the `END` marker.
- **Regenerate locally:** `python scripts/generate_service_docs.py` (optional: `--app <django_app_label>` for one module).
- **Check only:** `python scripts/generate_service_docs.py --check` exits non-zero if committed markdown would change.
- **CI / pre-commit:** The **lint** job runs pre-commit, which includes this check. Pull requests that change **only** ignored paths (`**.md`, `docs/**` per `.github/workflows/actions.yml`) do not run CI. Any PR that touches `**/services.py` or `core/protocols.py` still runs the check, so regenerate the docs before pushing.
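
The marker and check behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/generate_service_docs.py`: the helper names `replace_generated_region` and `would_change` are hypothetical, and only the two marker strings come from this document.

```python
START = "<!-- SERVICE_API:GENERATED:START -->"
END = "<!-- SERVICE_API:GENERATED:END -->"


def replace_generated_region(text: str, generated: str) -> str:
    """Return `text` with only the span between the markers replaced.

    Hypothetical helper. Raises ValueError if either marker is missing.
    """
    start = text.index(START) + len(START)
    end = text.index(END, start)
    # Everything before START and after END (hand-written notes) is preserved.
    return text[:start] + "\n\n" + generated.strip("\n") + "\n\n" + text[end:]


def would_change(text: str, generated: str) -> bool:
    """Check-style comparison: True if regenerating would alter the file."""
    return replace_generated_region(text, generated) != text


if __name__ == "__main__":
    doc = f"Intro\n{START}\nold table\n{END}\nHand-written notes\n"
    fresh = replace_generated_region(doc, "| Function | Summary |")
    print(would_change(doc, "| Function | Summary |"))    # stale file
    print(would_change(fresh, "| Function | Summary |"))  # up-to-date file
```

Because the replacement is idempotent, a check mode can be built on the same function: regenerate in memory, compare with the committed text, and exit non-zero when they differ.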

### How to use

@@ -61,7 +75,7 @@ For a full list of functions, parameter/return types, and validation (e.g. empty
- **Branching:** Create feature branches from `develop`. Open pull requests against `develop`. See [Development_guideline.md](Development_guideline.md).
- **Code style:** Use Python 3.11+ and follow Django and project conventions. Use the project’s logging (`logging.getLogger(__name__)`). Before pushing, run **`uv run pyright`** (with dev deps) for the paths covered by **`pyrightconfig.json`**, and ensure CI’s **lint** / **pyright** / **test** jobs would pass.
- **Database:** Use the Django ORM and migrations. Writes only through the service layer as above.
- **Docs:** Update this doc (and app `services.py` docstrings) when adding new apps or changing the write rules.
- **Docs:** Update this doc (and app `services.py` docstrings) when adding new apps or changing the write rules. After changing `services.py` or `core/protocols.py`, run `python scripts/generate_service_docs.py` and commit the updated `docs/service_api/` files.

## Related documentation

13 changes: 13 additions & 0 deletions docs/Service_API.md
@@ -15,6 +15,11 @@ All writes to app models must go through the service layer. The API is documente
| **boost_library_docs_tracker** | `boost_library_docs_tracker.services` | Globally unique doc content (BoostDocContent) and (library-version, page) relation tracking (BoostLibraryDocumentation). |
| **boost_usage_tracker** | `boost_usage_tracker.services` | External repos, Boost usage, missing-header tmp. |
| **discord_activity_tracker** | `discord_activity_tracker.services` | Discord servers, channels, messages, reactions (authors: `cppa_user_tracker.DiscordProfile`). |
| **cppa_youtube_script_tracker** | `cppa_youtube_script_tracker.services` | YouTube channels, videos, tags, transcript state; speaker links. |
| **clang_github_tracker** | `clang_github_tracker.services` | Upsert llvm issue/PR/commit rows; fetch watermarks. |
| **boost_mailing_list_tracker** | `boost_mailing_list_tracker.services` | Mailing list messages and names. |
| **cppa_slack_tracker** | `cppa_slack_tracker.services` | Slack teams, channels, messages, membership. |
| **wg21_paper_tracker** | `wg21_paper_tracker.services` | WG21 papers, authors, mailings. |

---

@@ -28,6 +33,14 @@ All writes to app models must go through the service layer. The API is documente
- **[service_api/cppa_pinecone_sync.md](service_api/cppa_pinecone_sync.md)** – API for `cppa_pinecone_sync.services`.
- **[service_api/boost_usage_tracker.md](service_api/boost_usage_tracker.md)** – API for `boost_usage_tracker.services`.
- **[service_api/discord_activity_tracker.md](service_api/discord_activity_tracker.md)** – API for `discord_activity_tracker.services`; management commands, sync modules, and Pinecone notes.
- **[service_api/cppa_youtube_script_tracker.md](service_api/cppa_youtube_script_tracker.md)** – API for `cppa_youtube_script_tracker.services`; preprocessor, fetcher, workspace, and transcript helpers.
- **[service_api/clang_github_tracker.md](service_api/clang_github_tracker.md)** – API for `clang_github_tracker.services`.
- **[service_api/boost_mailing_list_tracker.md](service_api/boost_mailing_list_tracker.md)** – API for `boost_mailing_list_tracker.services`.
- **[service_api/cppa_slack_tracker.md](service_api/cppa_slack_tracker.md)** – API for `cppa_slack_tracker.services`.
- **[service_api/wg21_paper_tracker.md](service_api/wg21_paper_tracker.md)** – API for `wg21_paper_tracker.services`.
- **[service_api/core_protocols.md](service_api/core_protocols.md)** – `core.protocols` DTO protocols (`TrackerResult`, `ActivityRecord`, `IncrementalState`).

Tables in each file are **generated** from source; see [Contributing.md](Contributing.md#regenerating-service-api-docs).

---

10 changes: 9 additions & 1 deletion docs/service_api/README.md
@@ -15,6 +15,10 @@ Index of all app service modules. All writes to app models must go through the s
| [discord_activity_tracker.services](discord_activity_tracker.md) | discord_activity_tracker | Servers, channels, messages, reactions (user profiles in cppa_user_tracker). |
| [cppa_youtube_script_tracker.services](cppa_youtube_script_tracker.md) | cppa_youtube_script_tracker | YouTube channels, videos, transcript state, and speaker links for C++ conference talks. |
| [clang_github_tracker.services](clang_github_tracker.md) | clang_github_tracker | Upsert llvm issue/PR/commit rows; DB watermarks for API fetch windows. |
| [boost_mailing_list_tracker.services](boost_mailing_list_tracker.md) | boost_mailing_list_tracker | Mailing list messages and list names. |
| [cppa_slack_tracker.services](cppa_slack_tracker.md) | cppa_slack_tracker | Slack teams, channels, messages, and membership changes. |
| [wg21_paper_tracker.services](wg21_paper_tracker.md) | wg21_paper_tracker | WG21 papers, authors, and mailings. |
| [core.protocols](core_protocols.md) | core | Runtime-checkable DTO protocols (`TrackerResult`, `ActivityRecord`, `IncrementalState`); see also [Core public API](../Core_public_API.md). |

---

@@ -29,5 +33,9 @@ Index of all app service modules. All writes to app models must go through the s
- **cppa_youtube_script_tracker** – Get-or-create YouTubeChannel, YouTubeVideo; update transcript state; link speakers to videos. Speaker profiles (`YoutubeSpeaker`) in cppa_user_tracker.
- **cppa_pinecone_sync** – Get/clear/record failed IDs in PineconeFailList; get/update PineconeSyncStatus.
- **clang_github_tracker** – Upsert `ClangGithubIssueItem` / `ClangGithubCommit` during sync or backfill; read `Max(github_updated_at)` / `Max(github_committed_at)` for fetch cursors.
- **boost_mailing_list_tracker** – Mailing list message and name helpers.
- **cppa_slack_tracker** – Slack team/channel/message persistence and membership sync.
- **wg21_paper_tracker** – WG21 paper and author persistence.
- **core.protocols** – Structural contracts for sync outcomes and activity payloads (see [core_protocols.md](core_protocols.md)).
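
As a rough illustration of what a runtime-checkable structural contract looks like, here is a small sketch. The protocol name `TrackerResultLike`, its members `items_processed` and `errors`, and the `SlackSyncOutcome` class are all hypothetical; the real definitions live in `core/protocols.py` and may differ.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class TrackerResultLike(Protocol):
    # Hypothetical members, chosen only for illustration.
    items_processed: int
    errors: list[str]


@dataclass
class SlackSyncOutcome:
    # Does not inherit from the protocol; it matches by structure alone.
    items_processed: int
    errors: list[str] = field(default_factory=list)


def summarize(result: TrackerResultLike) -> str:
    """Format any object that satisfies the structural contract."""
    return f"{result.items_processed} processed, {len(result.errors)} errors"


if __name__ == "__main__":
    outcome = SlackSyncOutcome(items_processed=3)
    print(isinstance(outcome, TrackerResultLike))
    print(summarize(outcome))
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the named attributes exist, not that their types match.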

See [Contributing.md](../Contributing.md) for the rule that all writes go through the service layer.
See [Contributing.md](../Contributing.md) for the rule that all writes go through the service layer, and for **regenerating** these docs from source.
24 changes: 13 additions & 11 deletions docs/service_api/boost_library_docs_tracker.md
@@ -8,15 +8,20 @@
**Pinecone upsert state** is stored on `BoostDocContent.is_upserted`, not on `BoostLibraryDocumentation` (the join table has only the two FKs plus `created_at`).

---
<!-- SERVICE_API:GENERATED:START -->

## BoostDocContent
## Public API (generated)

| Function | Parameter types | Return type | Notes |
| -------------------------------- | ------------------------------------------------------------------- | ----------------------------- | --------------------------------------------------------------------- |
| `get_or_create_doc_content` | `url: str`, `content_hash: str`, `version_id: int \| None = None` | `tuple[BoostDocContent, str]` | See return values below. `ValueError` if `url` is empty. |
| `set_doc_content_upserted` | `doc: BoostDocContent`, `value: bool` | `BoostDocContent` | Sets `is_upserted`. |
| `set_doc_content_upserted_by_ids`| `ids: list[int]`, `value: bool` | `int` | Bulk `UPDATE`; returns number of rows updated. |
| `get_unupserted_doc_contents` | — | `QuerySet[BoostDocContent]` | `is_upserted=False`; used for Pinecone sync worklists. |
| Function | Parameters | Return type | Summary |
| --- | --- | --- | --- |
| `get_docs_for_library_version` | library_version_id: int | django_models.QuerySet | Return all BoostLibraryDocumentation rows for this library-version. |
| `get_or_create_doc_content` | url: str, content_hash: str, version_id: int \| None = None | tuple[BoostDocContent, str] | Get or create a BoostDocContent row for the given content_hash. Page content is NOT stored in the DB; it lives in workspace files. |
| `get_unupserted_doc_contents` | | django_models.QuerySet | Return all BoostDocContent rows that have not been upserted to Pinecone. |
| `link_content_to_library_version` | library_version_id: int, doc_content_id: int | tuple[BoostLibraryDocumentation, bool] | Get or create a BoostLibraryDocumentation row for the (library_version, doc_content) pair. Returns (relation, created). |
| `set_doc_content_upserted` | doc: BoostDocContent, value: bool | BoostDocContent | Set is_upserted on a BoostDocContent row. |
| `set_doc_content_upserted_by_ids` | ids: list[int], value: bool | int | Bulk-set is_upserted for BoostDocContent rows with the given PKs. Returns the number of rows updated. |

<!-- SERVICE_API:GENERATED:END -->

### `get_or_create_doc_content` return values

@@ -33,7 +38,4 @@ The second element is a `str` indicating what changed:

Join table: one row per `(boost_library_version, boost_doc_content)` pair. **No** `page_count`, status fields, or `updated_at` on the model.

| Function | Parameter types | Return type | Notes |
| --------------------------------- | ---------------------------------------------------- | ---------------------------------------- | --------------------------------------------------------------------- |
| `link_content_to_library_version` | `library_version_id: int`, `doc_content_id: int` | `tuple[BoostLibraryDocumentation, bool]` | `get_or_create` on the pair. Second value is `created`. |
| `get_docs_for_library_version` | `library_version_id: int` | `QuerySet[BoostLibraryDocumentation]` | All join rows for that library version. |
See the generated **Public API** table above for `link_content_to_library_version` and `get_docs_for_library_version`.
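
The `(relation, created)` return shape of `link_content_to_library_version` can be pictured with a small in-memory sketch. This is not the Django implementation: `Link` and `InMemoryLinks` are hypothetical stand-ins for the join table, used only to show the get-or-create contract on the `(library_version, doc_content)` pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for one join-table row."""
    library_version_id: int
    doc_content_id: int


class InMemoryLinks:
    """Hypothetical store keyed by the (library_version, doc_content) pair."""

    def __init__(self) -> None:
        self._rows: dict[tuple[int, int], Link] = {}

    def link(self, library_version_id: int, doc_content_id: int) -> tuple[Link, bool]:
        """Return (relation, created); created is False when the pair exists."""
        key = (library_version_id, doc_content_id)
        existing = self._rows.get(key)
        if existing is not None:
            return existing, False
        row = Link(library_version_id, doc_content_id)
        self._rows[key] = row
        return row, True

    def for_library_version(self, library_version_id: int) -> list[Link]:
        """Return all join rows for one library version."""
        return [r for r in self._rows.values()
                if r.library_version_id == library_version_id]


if __name__ == "__main__":
    links = InMemoryLinks()
    print(links.link(1, 10))  # first call creates the row
    print(links.link(1, 10))  # second call returns the same row
```

Callers can branch on the second element to count newly linked pages without a separate existence query.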