feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs #1178

Open
belumontoya wants to merge 3 commits into staging from feature/DLT-3109-ai-docs-generator

Conversation

Collaborator

@belumontoya belumontoya commented Apr 7, 2026

feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs

🛠️ Type Of Change

  • Feature

📖 Jira Ticket

https://dialpad.atlassian.net/browse/DLT-3109

📖 Description

Adds the markdown-to-JSON build pipeline for the dialtone-docs package:

  • src/generators/build-ai-docs.mjs — Reads all markdown files under src/content/, parses YAML frontmatter (type, category, keywords, ai_summary), strips markdown syntax, and compiles everything into dist/ai-docs.json — a flat JSON array of document entries for AI consumption.
  • src/utils/strip-markdown.mjs — Utility that strips frontmatter, code blocks, HTML, links, headings, emphasis, and other markdown syntax to produce searchable plain text.
  • package.json / project.json — Added build script and NX target so pnpm nx run dialtone-docs:build triggers the generator.
  • tests/tests/build-output.test.js — 11 tests validating the JSON output schema (required fields, types, no markdown artifacts in content, file path integrity, no duplicate IDs).
  • tests/tests/strip-markdown.test.js — Unit tests for the strip-markdown utility (headings, code blocks, links, emphasis, frontmatter removal).
  • tests/helpers/markdownParser.js — Refactored to import stripMarkdown/stripFrontmatter from the new utility instead of bundling its own copy.
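The stripping step described above can be illustrated with a small regex pipeline. This is a minimal sketch only, not the contents of the PR's strip-markdown.mjs: the function names match those mentioned in the description, but every regex, the `FENCE` constant, and the `sample` input are assumptions made for illustration.

```javascript
// Hypothetical sketch; not the PR's actual strip-markdown.mjs.
const FENCE = '`'.repeat(3); // built at runtime to avoid a literal fence in this snippet

// Remove a leading YAML frontmatter block delimited by --- lines.
function stripFrontmatter(markdown) {
  return markdown.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, '');
}

// Reduce markdown to plain text. Order matters: code blocks go first so
// their contents are never treated as links or emphasis.
function stripMarkdown(markdown) {
  return stripFrontmatter(markdown)
    .replace(new RegExp(FENCE + '[\\s\\S]*?' + FENCE, 'g'), '') // fenced code blocks
    .replace(/<[^>]+>/g, '')                    // HTML tags
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')   // images -> alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1')    // links -> link text
    .replace(/^#{1,6}\s+/gm, '')                // heading markers
    .replace(/(\*\*|__)(.+?)\1/g, '$2')         // bold
    .replace(/\*(.+?)\*/g, '$1')                // italic (asterisks only)
    .replace(/`([^`]+)`/g, '$1')                // inline code
    .replace(/\n{3,}/g, '\n\n')                 // collapse extra blank lines
    .trim();
}

// Made-up input covering frontmatter, a heading, emphasis, a link, and a code block.
const sample = [
  '---',
  'type: component',
  '---',
  '# Button',
  '',
  'Use the **primary** button. See [usage](https://example.com/usage).',
  '',
  FENCE + 'js',
  'const x = 1;',
  FENCE,
].join('\n');

console.log(stripMarkdown(sample));
// Button
//
// Use the primary button. See usage.
```

The sketch handles italics written with asterisks only, so that underscores inside identifiers such as ai_summary are left alone; whether the real utility makes the same trade-off is not stated in the PR.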

💡 Context

The dialtone-docs package provides AI-discoverable documentation for the Dialtone monorepo. This PR adds Milestone 3: the build step that compiles markdown content into a structured JSON file (ai-docs.json). This JSON output will serve as the data source for MCP server and CLI search tools, enabling AI agents to search the entire documentation site programmatically.

Each JSON entry includes: id, title, type, category, keywords, summary, content (plain text), filePath, lastUpdated, and relatedPackages.
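To make the entry shape concrete, here is a hedged sketch of one entry and a small validator covering two of the checks the description mentions (required fields and duplicate IDs). The field names come from the list above; the `validateEntries` helper and every value in `exampleEntry` are hypothetical and are not taken from the PR.

```javascript
// Field names come from the PR description; the validator is hypothetical.
const REQUIRED_FIELDS = [
  'id', 'title', 'type', 'category', 'keywords',
  'summary', 'content', 'filePath', 'lastUpdated', 'relatedPackages',
];

// Returns a list of human-readable problems; an empty list means the
// array passed these two checks.
function validateEntries(entries) {
  const problems = [];
  const seenIds = new Set();
  for (const entry of entries) {
    for (const field of REQUIRED_FIELDS) {
      if (!(field in entry)) problems.push(`"${entry.id}" is missing "${field}"`);
    }
    if (seenIds.has(entry.id)) problems.push(`duplicate id "${entry.id}"`);
    seenIds.add(entry.id);
  }
  return problems;
}

// Illustrative entry; every value here is made up.
const exampleEntry = {
  id: 'components/button',
  title: 'Button',
  type: 'component',
  category: 'components',
  keywords: ['button', 'action'],
  summary: 'A clickable control that triggers an action.',
  content: 'Use the primary button for the main action on a page.',
  filePath: 'src/content/components/button.md',
  lastUpdated: '2026-04-07',
  relatedPackages: [],
};

console.log(validateEntries([exampleEntry]));               // []
console.log(validateEntries([exampleEntry, exampleEntry])); // [ 'duplicate id "components/button"' ]
```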

📝 Checklist

  • I have ensured no private Dialpad links or info are in the code or pull request description (Dialtone is a public repo!).
  • I have reviewed my changes.
  • I have added all relevant documentation.
  • I have considered the performance impact of my change.
  • I have added / updated unit tests.

🔮 Next Steps

  • Refactor existing test suite (consolidate 6 test files down to 3, remove hardcoded content assertions that do not scale)
  • Integrate ai-docs.json into the MCP server and CLI as a search data source

Updated date serialization in the markdown-to-JSON generator to use deterministic ISO format. Refactored test setup to check for pre-built dist/ai-docs.json instead of executing builds synchronously. Added build target dependency to ensure compilation before tests.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 208c3b30bd

Comment thread packages/dialtone-docs/src/generators/build-ai-docs.mjs Outdated
Contributor

@braddialpad Brad Paugh (braddialpad) left a comment

Looks really good. My only concern is the large amount of hard-to-follow regex used to parse the markdown; however, it is necessary for this change and is easier to understand in the age of AI.

Couple of small comments, nothing major.

Comment on lines +45 to +48
for (const doc of docs) {
  expect(doc.type, `"${doc.id}" type is null`).not.toBeNull();
  expect(ALLOWED_TYPES, `"${doc.id}" invalid type "${doc.type}"`).toContain(doc.type);
}
Contributor

For all tests where we are looping through arrays like this, we should be using test.each instead of a for loop.

A single failure will mask all subsequent ones when doing it the current way.

Collaborator Author

Agreed, but one thing worth noting: docs is built in beforeAll, so test.each can't access it at test definition time. I could build synchronously at module scope, or keep the loop but use soft assertions. Any preference? Or a better idea?

Contributor

I guess we can just leave docs alone, and do something like this at least:

test.each(REQUIRED_FIELDS)('all entries have field "%s"', (field) => {
  for (const doc of docs) {
    expect(doc).toHaveProperty(field);
  }
});

Moving the file reads outside of test execution has worse side effects than the current method IMO
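For the failure-masking concern raised in this thread, one framework-agnostic option is to collect every failure inside the loop and assert once at the end. The following is only a sketch of that idea: `collectFailures` is a hypothetical helper, and the `ALLOWED_TYPES` values and the `sampleDocs` array are made up for illustration rather than taken from the PR.

```javascript
// Hypothetical helper: run a check against every doc and collect all
// failures instead of stopping at the first one.
function collectFailures(docs, check) {
  const failures = [];
  for (const doc of docs) {
    try {
      check(doc);
    } catch (error) {
      failures.push(`${doc.id}: ${error.message}`);
    }
  }
  return failures;
}

const ALLOWED_TYPES = ['component', 'guide']; // illustrative values only

// Made-up data: "b" and "c" both have invalid types.
const sampleDocs = [
  { id: 'a', type: 'component' },
  { id: 'b', type: 'widget' },
  { id: 'c', type: null },
];

const typeFailures = collectFailures(sampleDocs, (doc) => {
  if (!ALLOWED_TYPES.includes(doc.type)) {
    throw new Error(`invalid type "${doc.type}"`);
  }
});

console.log(typeFailures);
// [ 'b: invalid type "widget"', 'c: invalid type "null"' ]
```

Inside a test, a single `expect(typeFailures).toEqual([])` would then report both bad entries at once, whereas a loop that throws on the first failure would stop at "b".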

Comment thread packages/dialtone-docs/src/utils/strip-markdown.mjs
Comment thread packages/dialtone-docs/tests/tests/build-output.test.js Outdated
- Fix non-deterministic date serialization: String(Date) is
  timezone-dependent, use toISOString().split('T')[0] for stable
  YYYY-MM-DD output
- Remove redundant null assertion in type field test
# Conflicts:
#	packages/dialtone-docs/src/generators/build-ai-docs.mjs
#	packages/dialtone-docs/tests/helpers/markdownParser.js
Contributor

coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 3c30371c-294f-435c-8b65-8355eaeb474d

📥 Commits

Reviewing files that changed from the base of the PR and between 9971fc0 and ec18346.

📒 Files selected for processing (3)
  • packages/dialtone-docs/project.json
  • packages/dialtone-docs/src/generators/build-ai-docs.mjs
  • packages/dialtone-docs/tests/tests/schema.test.js

Walkthrough

The dialtone-docs package build and test configuration was updated: the test target now depends on the build target, the lastUpdated field derivation was improved to handle Date instances, and test setup was refactored to check for generated files rather than execute the build synchronously.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build Configuration (packages/dialtone-docs/project.json) | Added dependsOn: ["build"] to test target, ensuring build runs before test execution. |
| Generator Logic (packages/dialtone-docs/src/generators/build-ai-docs.mjs) | Updated lastUpdated field derivation to check for Date instances and convert to UTC string (YYYY-MM-DD); otherwise coerce to null. |
| Test Setup (packages/dialtone-docs/tests/tests/schema.test.js) | Removed synchronous build execution from beforeAll; replaced with file existence check for dist/ai-docs.json and error guidance. |
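The lastUpdated change summarized above can be sketched as a small helper. This is an illustration under assumptions, not the generator's actual code: the name `normalizeLastUpdated` is hypothetical, and the guard against invalid dates is an addition of this sketch (toISOString throws a RangeError on an invalid Date). Date instances can reach this code because many YAML parsers turn unquoted frontmatter dates into Date objects.

```javascript
// Hypothetical helper mirroring the described behavior: Date instances
// become a UTC YYYY-MM-DD string, anything else becomes null.
function normalizeLastUpdated(value) {
  // The validity check is an assumption of this sketch; it avoids the
  // RangeError that toISOString throws for invalid dates.
  if (value instanceof Date && !Number.isNaN(value.getTime())) {
    return value.toISOString().split('T')[0];
  }
  return null;
}

console.log(normalizeLastUpdated(new Date(Date.UTC(2026, 3, 7)))); // 2026-04-07
console.log(normalizeLastUpdated('2026-04-07'));                    // null
console.log(normalizeLastUpdated(undefined));                       // null
```

Unlike String(date), whose output depends on the machine's timezone and locale formatting, toISOString always reports UTC, so the same source file yields the same lastUpdated value on every build machine.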

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes
