This repository contains extraction tools for the Tipitakapali.org Android APK (v26.2.5), together with the data extracted from it. The app provides access to the complete Pali Tipitaka (Buddhist canon) with full-text search capabilities and multiple dictionaries.
The APK contains 8 SQLite databases that together form a comprehensive Pali Buddhist text corpus with:
- 522,747 text segments from the complete Tipitaka
- Full-text search index for the entire corpus
- 8,967 HTML files with formatted book content
- Multiple dictionaries: Digital Pali Dictionary (DPD), PTS Pali-English Dictionary, Abhidhana (Myanmar)
- Grammatical tools: inflection tables, compound word splitter, synonyms
├── README.md # This file
├── DATABASE_DOCUMENTATION.md # Detailed database schema documentation
├── DATA_EXTRACTION_PLAN.md # Extraction methodology and plans
├── extract_tipitaka.py # Main extraction script
├── Tipitakapali.org_v26.2.5.apk # Original APK file
├── apk_extracted/ # Decompiled APK contents
├── db_extracted/ # Extracted SQLite databases
└── output/ # Processed and extracted data
    ├── texts/ # JSON files with extracted text content
    └── metadata/ # Database metadata and schemas
- fts_tipitaka.db - Full-text search index (522,747 segments)
- cstpali.db - Complete HTML book content (8,967 files)
- dpd_tipitakapali.db - Digital Pali Dictionary
- dpd_inflection_tipitakapali.db - Grammatical inflection tables
- dpd_synonyms_tipitakapali.db - Synonym mappings
- dpd_splitter_tipitakapali.db - Compound word splitter
- abhidhan_tipitakapali.db - Abhidhana Myanmar dictionary
- ptsped2015ed_tipitakapali.db - PTS Pali-English Dictionary
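Before querying any of these databases it helps to see which tables each one contains. The following is a minimal sketch, not part of the repository's tooling: `list_tables` is a hypothetical helper, and it assumes the `.db` files sit directly inside `db_extracted/`. It reads only SQLite's built-in `sqlite_master` catalog, so it makes no assumptions about the actual schemas (see DATABASE_DOCUMENTATION.md for those).

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in a SQLite database file.

    Hypothetical helper for illustration; not taken from extract_tipitaka.py.
    """
    # Open the file read-only so the extracted database is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Example usage (assumes the databases were extracted to db_extracted/):
# for db_file in sorted(Path("db_extracted").glob("*.db")):
#     print(db_file.name, list_tables(db_file))
```

Opening in read-only mode is a deliberate choice here: the extracted files are treated as reference copies of the original data.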
- Vinaya Piṭaka (Monastic rules)
- Sutta Piṭaka (Discourses)
- Abhidhamma Piṭaka (Analytical teachings)
- Commentaries (Aṭṭhakathā)
- Sub-commentaries (Ṭīkā)
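The sqlite3, json, and zipfile modules used by the script are part of Python's standard library, so no pip install step is needed. Run the extraction with:
python extract_tipitaka.py
This will: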
- Extract all databases from the APK
- Process the full-text search database
- Generate JSON files with structured text data
- Create metadata files with database schemas
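The first of these steps can be sketched with the standard `zipfile` module, since an APK is a ZIP archive. This is an illustration only and may differ from what `extract_tipitaka.py` actually does: `extract_databases` is a hypothetical function, and it assumes the databases are stored in the APK as ordinary entries whose names end in `.db`.

```python
import shutil
import zipfile
from pathlib import Path


def extract_databases(apk_path, out_dir):
    """Copy every *.db entry from an APK (a ZIP archive) into out_dir.

    Hypothetical sketch; assumes databases are plain entries ending in ".db".
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(apk_path) as apk:
        for info in apk.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".db"):
                continue
            # Keep only the base name so archive paths cannot escape out_dir.
            target = out_dir / Path(info.filename).name
            # Stream the entry to disk instead of loading it into memory,
            # since the databases may be large.
            with apk.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return extracted


# Example usage:
# extract_databases("Tipitakapali.org_v26.2.5.apk", "db_extracted")
```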
The extracted data is available in multiple formats:
- JSON files in output/texts/ - structured text data by book
- SQLite databases in db_extracted/ - original database files
- Metadata in output/metadata/ - database schemas and statistics
Each JSON file contains structured text segments:
{
  "book_info": {
    "code": "vin01m",
    "total_segments": 1234,
    "description": "Vinaya Mahavibhanga"
  },
  "segments": [
    {
      "path": "11@vin01m.mul0@k1",
      "content": "Pali text content...",
      "book_code": "11",
      "filename": "vin01m.mul0",
      "paragraph": "k1"
    }
  ]
}
- DATABASE_DOCUMENTATION.md - Complete database schema analysis
- DATA_EXTRACTION_PLAN.md - Extraction methodology and future plans
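In the JSON segment format shown earlier, the `path` value `11@vin01m.mul0@k1` appears to combine the `book_code`, `filename`, and `paragraph` fields, separated by `@`. The sketch below splits a path on that basis. It is inferred from that single example, so treat the three-field layout as an assumption; `SegmentPath` and `parse_segment_path` are hypothetical names, not part of the repository.

```python
from typing import NamedTuple


class SegmentPath(NamedTuple):
    """The three components assumed to make up a segment path."""
    book_code: str
    filename: str
    paragraph: str


def parse_segment_path(path):
    """Split a path such as '11@vin01m.mul0@k1' into its components.

    Assumes exactly three '@'-separated fields, as in the README example;
    raises ValueError for anything else rather than guessing.
    """
    parts = path.split("@")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '@'-separated fields, got {len(parts)}: {path!r}")
    return SegmentPath(*parts)


print(parse_segment_path("11@vin01m.mul0@k1"))
# prints SegmentPath(book_code='11', filename='vin01m.mul0', paragraph='k1')
```

Raising on unexpected input makes it easy to find out whether real paths in the corpus deviate from this assumed layout.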
This project extracts data from the Tipitakapali.org APK for research and preservation purposes. Please respect the original creators' work and any applicable licenses.
Contributions are welcome! Please feel free to submit issues or pull requests to improve the extraction tools or documentation.
- Tipitakapali.org - Original APK developers
- Digital Pali Dictionary (DPD) - Comprehensive Pali dictionary project
- Chaṭṭha Saṅgāyana - Digital Tipitaka source text
- PTS - Pali Text Society dictionary