A document conversion and publishing toolkit for an org-centric knowledge workflow.
memex-kb started as a Google Docs → Denote knowledge base converter and has grown into a broader toolbox for turning legacy or platform-bound content into plain-text, version-controlled, and AI-friendly formats.
Today the repository combines three layers:
- Knowledge ingestion — Google Docs, Threads, Confluence, GitHub Stars, and blog exports
- Structured document pipelines — proposal workflows, Org/ODT/HWP-oriented transformations
- Reusable publishing templates — paper, presentation, and PowerPoint template injection workflows
The guiding idea is simple:
Legacy content → structured text → reproducible artifacts → human + AI collaboration
memex-kb is useful when you want to:
- convert documents into Markdown, Org-mode, BibTeX, ODT, DOC, PDF, or PPTX-adjacent workflows
- preserve structure well enough for search, versioning, and AI-assisted editing
- standardize output with Denote-style naming, rule-based classification, and template-driven publishing
- keep the whole workflow reproducible with Nix Flakes and CLI-first tooling
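Denote names each note `IDENTIFIER--title-slug__keyword1_keyword2.ext`, where the identifier is a `%Y%m%dT%H%M%S` timestamp. The sketch below approximates that default convention in Python. It is an illustration only: the project's actual naming rules live in `DENOTE-RULES.md`, and the helpers `slug_title`, `slug_keyword`, and `denote_filename` are hypothetical names, not functions from this repository.

```python
import re
from datetime import datetime


def slug_title(title: str) -> str:
    """Lower-case the title and collapse runs of non-alphanumeric characters into one hyphen."""
    # \W is Unicode-aware, so Hangul and other letters survive; "_" is treated as a separator too.
    slug = re.sub(r"[\W_]+", "-", title.lower())
    return slug.strip("-")


def slug_keyword(keyword: str) -> str:
    """Turn a keyword into a single lower-case token with no separators."""
    return re.sub(r"[\W_]+", "", keyword.lower())


def denote_filename(title: str, keywords: list[str], created: datetime, ext: str = "org") -> str:
    """Build a Denote-style file name (hypothetical helper approximating Denote's defaults)."""
    identifier = created.strftime("%Y%m%dT%H%M%S")
    name = f"{identifier}--{slug_title(title)}"
    tags = [tag for tag in (slug_keyword(k) for k in keywords) if tag]
    if tags:
        name += "__" + "_".join(tags)
    return f"{name}.{ext}"
```

For example, `denote_filename("Confluence Export: Q3 Plan", ["work", "Confluence"], datetime(2024, 5, 1, 9, 30))` yields `20240501T093000--confluence-export-q3-plan__work_confluence.org`. Keeping the timestamp as the identifier means renaming the title never breaks links that point at the identifier.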
| Backend / Source | Status | Main entry point | Output |
|---|---|---|---|
| Google Docs | Stable | `scripts/gdocs_md_processor.py`, `./run.sh gdocs-export` | Markdown, DOCX, PDF, HTML, TXT |
| Threads | Stable | `scripts/threads_exporter.py`, `./run.sh threads-export` | Org-mode + images |
| Confluence export (`.doc` MIME HTML) | Stable | `scripts/confluence_to_markdown.py` | Clean Markdown |
| GitHub Stars | Stable | `scripts/gh_starred_to_bib.sh`, `./run.sh github-starred-export` | BibTeX |
| Naver Blog | Active | `scripts/naver_blog_crawler.py`, `./run.sh naver-*` | Denote-style Org + assets |
| HWPX / OWPML-related workflows | Active | `hwpx2org/`, `orgadoc2odt/`, `proposal-pipeline/` | Org, ODT, DOC, HWP-oriented outputs |
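The table describes Confluence `.doc` exports as MIME-wrapped HTML rather than a real Word binary. A minimal standard-library sketch of the first conversion step, extracting the HTML part and normalizing it to Unicode NFC, might look like the following. It is not the code in `scripts/confluence_to_markdown.py`; the function name `extract_html_from_confluence_doc` is hypothetical, and the HTML → Markdown step is left out.

```python
import email
import unicodedata
from email import policy


def extract_html_from_confluence_doc(raw: bytes) -> str:
    """Return the first text/html part of a MIME-wrapped export, normalized to NFC.

    Hypothetical helper: assumes the export is a standard MIME message
    (typically multipart/related) with at least one text/html part.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():  # walk() yields the container first, then every sub-part
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and decodes the declared charset.
            html = part.get_content()
            # NFC merges decomposed sequences (e.g. "e" + U+0301) into precomposed characters.
            return unicodedata.normalize("NFC", html)
    raise ValueError("no text/html part found in the export")
```

Image attachments in the same message could be collected in the same loop by matching `image/*` content types, which keeps the HTML and its assets together before the Markdown conversion.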
| Template / Pipeline | Purpose |
|---|---|
| `templates/arxiv-acm/` | Org-mode → ACM `acmart` → PDF / arXiv-ready source workflow |
| `templates/presentation/` | Quarto / Reveal.js HTML presentation template |
| `templates/presentation-pptx/` | org2pptx: inject Org-mode content into an existing PPTX template while preserving layout/design |
| `proposal-pipeline/` | Google Docs → Markdown → Org-mode → ODT/DOC proposal workflow |
```text
memex-kb/
├── README.md
├── AGENTS.md
├── BACKENDS.md
├── DEVELOPMENT.md
├── DENOTE-RULES.md
├── run.sh                 # Primary command entry point
├── flake.nix              # Reproducible dev environment
├── config/                # Local env/config templates
├── scripts/               # Main backend and utility scripts
│   ├── adapters/
│   ├── gdocs_md_processor.py
│   ├── threads_exporter.py
│   ├── confluence_to_markdown.py
│   ├── gh_starred_to_bib.sh
│   ├── md_to_gdocs.py
│   ├── md_to_gdocs_html.py
│   └── naver_blog_crawler.py
├── templates/
│   ├── arxiv-acm/
│   ├── presentation/
│   └── presentation-pptx/
├── proposal-pipeline/     # Proposal authoring and export pipeline
├── hwpx2org/              # HWPX/Org-related conversion utilities
├── orgadoc2odt/           # AsciiDoc/ODT conversion utilities
├── office/                # Real project working materials and samples
├── docs/                  # Converted output and project notes
└── logs/                  # Execution logs
```
- `scripts/`: the main place for backend integrations and conversion entry points
- `templates/`: reusable starter templates for papers and presentations
- `proposal-pipeline/`: the most opinionated end-to-end workflow in the repo
- `office/`: practical working examples and proposal artifacts
- `hwpx2org/` and `orgadoc2odt/`: lower-level format conversion experiments and tools
This project uses Nix Flakes.
Use one of the following:
```bash
# interactive shell
nix develop

# one-off command
nix develop --command python scripts/threads_exporter.py --download-images

# recommended for regular work
direnv allow
```

This gives you:

- reproducible dependencies
- no ad-hoc `pip install`
- consistent Python / Pandoc / CLI tooling
- easier agent automation
`./run.sh` is the primary command entry point:

```bash
./run.sh
```

Google Docs:

```bash
./run.sh gdocs-export <DOC_ID>
./run.sh gdocs-export <DOC_ID> --format md
./run.sh gdocs-export <DOC_ID> --format docx --depth 0
```

Threads:

```bash
./run.sh threads-export --download-images
./run.sh threads-export --max-posts 5 --download-images
```

Confluence:

```bash
./run.sh confluence-convert document.doc
./run.sh confluence-batch ./input-dir ./output-dir
```

GitHub Stars:

```bash
./run.sh github-starred-export
./run.sh github-starred-export ~/org/resources/github-starred.bib
```

Proposal pipeline:

```bash
./run.sh proposal-build --export-md
./run.sh proposal-merge --strip-hwpx-idx --org-tables
./run.sh proposal-export-odt
```

ACM / arXiv paper template:

```bash
./run.sh arxiv-build
./run.sh arxiv-build templates/arxiv-acm/sample.org
```

`templates/arxiv-acm/` is a complete sample for:

- Org-mode authoring
- ACM `acmart` LaTeX export
- PDF generation suitable for paper drafting / arXiv submission workflows
See: templates/arxiv-acm/README.md
`templates/presentation/` is a Quarto / Reveal.js presentation starter for browser-based slide decks.
See: templates/presentation/README.md
`templates/presentation-pptx/` contains a newer org2pptx pipeline for teams that must submit or reuse a branded PowerPoint template.
Instead of rendering slides from scratch, it:
- parses an Org file
- injects content into an existing `.pptx` template
- preserves original slide backgrounds, logos, layouts, and branding

This is especially useful when `pandoc --reference-doc` or layout-name-based approaches fail on localized corporate templates.
See: templates/presentation-pptx/README.md
Which tool to use:

- Need Google Docs tabs exported cleanly → use `gdocs-export`
- Need social writing archived into Org → use `threads-export`
- Need legacy Confluence exports cleaned up → use `confluence-convert`
- Need citation-ready GitHub Stars → use `github-starred-export`
- Need proposal submission artifacts → use `proposal-pipeline/`
- Need a paper PDF from Org → use `templates/arxiv-acm/`
- Need HTML slides → use `templates/presentation/`
- Need content injected into an existing company PPTX → use `templates/presentation-pptx/`
| File | Purpose |
|---|---|
| `AGENTS.md` | Working guidance for coding agents and maintainers |
| `BACKENDS.md` | Backend-specific notes and usage details |
| `DEVELOPMENT.md` | Development guidance for extending the project |
| `DENOTE-RULES.md` | Naming and structuring rules for Denote-style output |
| `proposal-pipeline/README.md` | Detailed proposal workflow documentation |
| `office/README.md` | Real-world working context and example materials |
Recent changes:

- Added `templates/presentation-pptx/`
- Introduced an Org-mode → PPTX template injection workflow using `python-pptx`
- Preserves branded PowerPoint templates instead of recreating slides from scratch
- Added `templates/arxiv-acm/`
- Added an Org-mode → `acmart` → PDF sample pipeline
- Exposed `./run.sh arxiv-build`
- Added `md_to_gdocs.py` and `md_to_gdocs_html.py`
- Optimized the Markdown → Org/HTML/Docx path for Google Docs import workflows
- Added listing, crawling, verification, retry, title-fix, and wordmap commands
- Improved image handling, slug normalization, and title cleanup
- Added `scripts/gh_starred_to_bib.sh`
- Added `./run.sh github-starred-export`
- Preserved `starred_at`, `pushed_at`, and `updated_at` metadata for BibTeX output
- Added HWPX/AsciiDoc-related tooling
- Added EPUB → Org workflows
- Added HTML → EPUB → Org experiments
- Added MIME-aware Confluence export parsing
- Normalized UTF-8/NFC issues and cleaned noisy markup
- Migrated from `shell.nix` to `flake.nix`
- Added `direnv` integration
- Replaced secretlint with gitleaks
- Improved Threads OAuth token management
- Project began as a Google Docs → Denote knowledge base converter
- Evolved toward a multi-backend, template-oriented, AI-friendly document workflow toolkit
License: MIT