BDneX is a French comics (BD) metadata tagger and library manager. It automatically retrieves metadata from bedetheque.com and embeds it into your comic files using the ComicRack standard format.
- Motivation
- Features
- Installation
- Quick Start
- Usage
- Configuration
- Testing
- Architecture
- Contributing
- Roadmap
- Troubleshooting
- License
Unlike music tagging, comics tagging has no agreed-upon standard vocabulary. However, the ComicRack standard is supported by most library managers, such as Komga.
While tools like ComicTagger exist for American comics (using the Comic Vine API), French comics (bandes dessinées) are largely underrepresented in these databases.
BDneX fills this gap by:
- Providing comprehensive metadata for French comics from bedetheque.com
- Using intelligent fuzzy matching to identify your comics
- Automatically embedding metadata in CBZ and CBR files
- Making it easy to organize large comic libraries by genre, author, rating, and more
- Enabling sharing of reading lists based on metadata rather than obscure filenames
Inspired by the excellent beets music manager.
- 🔍 Smart Search: Retrieves sitemaps from bedetheque.com for comprehensive album matching
- 🎯 Fuzzy Matching: Levenshtein distance algorithm for finding album names even with typos
- 🌐 Web Scraping: Parses webpage content with BeautifulSoup
- 📋 ComicRack Format: Converts parsed metadata to ComicInfo.xml (ComicRack standard)
- 🖼️ Cover Verification: Image comparison between online cover and archive cover for confidence scoring
- 💾 Multiple Formats: Supports both CBZ and CBR archive formats
- 🔄 Batch Processing: Process entire directories of comics at once
- ⚙️ Configurable: Customizable settings via YAML configuration file
- Title, Series, Volume Number
- Writers, Pencillers, Colorists, Inkers
- Publisher, Publication Year
- Synopsis/Summary
- Genre and Tags
- Community Rating
- Page Count
- Language
- ISBN
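As an illustration of how fields like these could end up in a ComicInfo.xml document, here is a minimal sketch using only the Python standard library. The function name `build_comicinfo` and the sample values are hypothetical and are not taken from the BDneX code base. The element names (`Title`, `Series`, `Number`, `Writer`, `LanguageISO`) follow the ComicInfo schema, which may also impose an element order that this sketch does not validate.

```python
import xml.etree.ElementTree as ET


def build_comicinfo(meta: dict) -> bytes:
    """Serialize a flat metadata dict to ComicInfo.xml bytes (hypothetical helper)."""
    root = ET.Element("ComicInfo")
    for tag, value in meta.items():
        if value is None:
            continue  # skip fields the metadata source did not provide
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Sample values are illustrative only.
sample = {
    "Title": "Redwin de la Forge",
    "Series": "Nains",
    "Number": 1,
    "Writer": None,  # unknown, so no <Writer> element is emitted
    "LanguageISO": "fr",
}
xml_bytes = build_comicinfo(sample)
```

Skipping `None` values keeps the output free of empty elements, which is one reasonable choice; a real implementation might prefer to validate the result against ComicInfo.xsd before embedding it.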
- Python 3.8 or higher
- pip (Python package manager)
- (Optional) Conda for environment management
Create and activate a virtual environment:
# Create environment from the provided file
conda env create --file=environment.yml
# Activate the environment
conda activate bdnex
# Create a virtual environment
python3 -m venv bdnex-env
# Activate it (Linux/Mac)
source bdnex-env/bin/activate
# Activate it (Windows)
bdnex-env\Scripts\activate
User Installation (for general use):
pip install .
Development Installation (for contributing):
pip install -e .[dev]
This installs additional development tools like pytest and ipdb.
After installation, initialize BDneX to download bedetheque.com sitemaps:
bdnex --init
This downloads and caches sitemap data for faster comic matching (may take a few minutes on first run).
Process a single comic file:
bdnex -f /path/to/comic.cbz
Process an entire directory:
bdnex -d /path/to/comics/folder
The tool will:
- Extract the comic filename and attempt to match it with bedetheque.com entries
- Download metadata and cover image
- Compare covers to verify the match
- Embed metadata as ComicInfo.xml inside the archive
- Save the updated comic file
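The last two steps above can be sketched for the CBZ case with the standard `zipfile` module. This is a sketch under stated assumptions, not BDneX's actual implementation: the function name `embed_comicinfo` is hypothetical, and CBR (RAR) archives are not covered because the standard library cannot write them.

```python
import zipfile
from pathlib import Path


def embed_comicinfo(cbz_path: Path, xml_bytes: bytes) -> None:
    """Write ComicInfo.xml into a CBZ, replacing any existing copy (hypothetical helper)."""
    tmp_path = cbz_path.with_suffix(".tmp")
    with zipfile.ZipFile(cbz_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
        for item in src.infolist():
            # Copy every entry except a stale ComicInfo.xml.
            if item.filename != "ComicInfo.xml":
                dst.writestr(item, src.read(item.filename))
        dst.writestr("ComicInfo.xml", xml_bytes)
    # Swap the rewritten archive into place only after it was fully written.
    tmp_path.replace(cbz_path)
```

Rewriting into a temporary file avoids leaving two `ComicInfo.xml` entries in the archive, which would happen if the file were simply opened in append mode.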
bdnex [OPTIONS]
Options:
- -f, --input-file <path>: Process a single comic file
- -d, --input-dir <path>: Process all comics in a directory (recursively searches for .cbz and .cbr files)
- -i, --init: Initialize or force re-download of bedetheque.com sitemaps
- -v, --verbose <level>: Set logging verbosity (default: info)
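For readers curious how such an interface might be wired up, the following is a minimal `argparse` sketch mirroring the four documented options, plus a recursive search for `.cbz` and `.cbr` files. The function names `build_parser` and `find_comics` are hypothetical, and the case-insensitive extension check is an assumption rather than documented behavior.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented bdnex options (sketch only)."""
    parser = argparse.ArgumentParser(prog="bdnex")
    parser.add_argument("-f", "--input-file", type=Path, help="process a single comic file")
    parser.add_argument("-d", "--input-dir", type=Path, help="process all comics in a directory")
    parser.add_argument("-i", "--init", action="store_true", help="(re)download sitemaps")
    parser.add_argument("-v", "--verbose", default="info", help="logging verbosity")
    return parser


def find_comics(root: Path) -> list:
    """Recursively collect .cbz and .cbr files under root, sorted by path."""
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in {".cbz", ".cbr"})


args = build_parser().parse_args(["-d", "/comics/new-additions", "-v", "debug"])
```

Here `args.input_dir` holds the directory as a `Path` and `args.verbose` holds the string `"debug"`; the result could then be passed to `find_comics` to obtain the files to process.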
Process a single file:
bdnex -f "/comics/Asterix Tome 1 - Asterix le Gaulois.cbz"
Process entire directory:
bdnex -d /comics/collection
Force sitemap update:
bdnex --init
Combine options:
bdnex -d /comics/new-additions -v debug
When processing a comic, you'll see output like:
2024-12-29 15:30:00,123 - INFO - bdnex.ui - Processing /comics/Nains Tome 1.cbz
2024-12-29 15:30:00,234 - INFO - bdnex.lib.bdgest - Searching for "Nains Tome 1" in bedetheque.com sitemap files
2024-12-29 15:30:00,345 - DEBUG - bdnex.lib.bdgest - Match album name succeeded
2024-12-29 15:30:00,456 - DEBUG - bdnex.lib.bdgest - Levenshtein score: 87.5
2024-12-29 15:30:00,567 - DEBUG - bdnex.lib.bdgest - Matched url: https://m.bedetheque.com/BD-Nains-Tome-1-Redwin-de-la-Forge-245127.html
2024-12-29 15:30:01,678 - INFO - bdnex.lib.bdgest - Converting parsed metadata to ComicRack template
2024-12-29 15:30:01,789 - INFO - bdnex.lib.cover - Checking Cover from input file with online cover
2024-12-29 15:30:02,890 - INFO - bdnex.lib.cover - Cover matching percentage: 92.5
2024-12-29 15:30:02,901 - INFO - bdnex.lib.comicrack - Add ComicInfo.xml to /comics/Nains Tome 1.cbz
2024-12-29 15:30:03,012 - INFO - bdnex.ui - Processing album done
If automatic matching fails or confidence is low, BDneX will prompt you:
- To manually enter a bedetheque.com URL
- To search interactively for the correct album
- To confirm whether to proceed with metadata embedding
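The decision between automatic tagging and prompting can be pictured as a simple threshold check on the cover similarity score. The sketch below is an assumption about the logic, not the actual BDneX code: the function name `needs_confirmation` is hypothetical, the default of 40 is borrowed from the `cover.match_percentage` setting, and whether the real comparison is strict or inclusive is not documented.

```python
def needs_confirmation(cover_score: float, threshold: float = 40.0) -> bool:
    """Return True when the cover similarity is too low to auto-confirm (sketch)."""
    return cover_score < threshold


# A score like the 92.5 shown in the sample log would be auto-confirmed.
auto_ok = not needs_confirmation(92.5)
```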
BDneX uses a YAML configuration file located at:
- Linux/Mac: ~/.config/bdnex/bdnex.yaml
- Windows: %USERPROFILE%\.config\bdnex\bdnex.yaml
The configuration file is created automatically on first run from the default template.
bdnex:
  config_path: ~/.config/bdnex # Configuration directory
  share_path: ~/.local/share/bdnex # Data/cache directory
directory: /path/to/comics/library # Default library directory
import:
  copy: no # Copy files during import
  move: yes # Move files during import
  replace: yes # Replace existing files
  autotag: no # Automatically tag without confirmation
  rename: yes # Rename files based on metadata
library: ~/.local/share/bdnex/bdnex.sqlite # Future feature: database
paths:
  # Naming conventions for organized libraries
  default: '%language/%type/%title (%author) [%year]/%title - %volume (%author) [%year]'
  oneshot: '%language/oneShots/%title (%author) [%year]/%title (%author) [%year]'
  series: '%language/series/%title (%author)/%title - %volume'
cover:
  match_percentage: 40 # Minimum cover similarity percentage for auto-confirmation
BDneX stores cached data in ~/.local/share/bdnex/:
- bedetheque/sitemaps/: Cached sitemap files
- bedetheque/albums_html/: Downloaded album pages
- bedetheque/albums_json/: Parsed metadata in JSON format
- bedetheque/covers/: Downloaded cover images
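Returning to the `paths` templates in the configuration, the sketch below shows one way a template of that shape could be expanded into a relative path. It is purely illustrative: renaming is listed as a planned feature, so the function name `expand_template`, the placeholder syntax rules, and the sample field values are all assumptions.

```python
import re


def expand_template(template: str, fields: dict) -> str:
    """Replace each %name placeholder with fields[name] (hypothetical helper)."""
    def substitute(match):
        # A missing field raises KeyError, which surfaces template typos early.
        return str(fields[match.group(1)])

    return re.sub(r"%([a-z]+)", substitute, template)


series_template = "%language/series/%title (%author)/%title - %volume"
# Sample values are illustrative only.
fields = {"language": "fr", "title": "Nains", "author": "Some Author", "volume": 1}
relative_path = expand_template(series_template, fields)
# relative_path == "fr/series/Nains (Some Author)/Nains - 1"
```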
BDneX uses pytest for testing. To run the test suite:
# Run all tests
pytest
# Run with verbose output
pytest -v
# Run specific test file
pytest test/test_utils.py
# Run specific test
pytest test/test_cover.py::TestCover::test_front_cover_similarity_good_match
Check code coverage:
# Install coverage tool (if not installed with dev dependencies)
pip install coverage
# Run tests with coverage
coverage run -m pytest
# View coverage report
coverage report
# Generate HTML coverage report
coverage html
# Open htmlcov/index.html in your browser
Current test coverage:
- Overall: ~74%
- archive_tools.py: 100%
- cover.py: 92%
- bdgest.py: 82%
- utils.py: 62%
Tests are organized in the test/ directory:
- test_archive_tools.py: Archive extraction and manipulation
- test_bdgest.py: BedeTheque scraping and metadata parsing
- test_cover.py: Cover image comparison and download
- test_utils.py: Utility functions (config, JSON, file operations)
- test_comicrack.py: ComicInfo.xml generation and embedding
bdnex/
├── bdnex/ # Main package
│ ├── conf/ # Configuration files and schemas
│ │ ├── ComicInfo.xsd # ComicRack XML schema
│ │ ├── bdnex.yaml # Default configuration
│ │ └── logging.conf # Logging configuration
│ ├── lib/ # Core library modules
│ │ ├── archive_tools.py # CBZ/CBR file handling
│ │ ├── bdgest.py # BedeTheque scraper
│ │ ├── comicrack.py # ComicInfo.xml generation
│ │ ├── cover.py # Cover image operations
│ │ └── utils.py # Utility functions
│ └── ui/ # User interface
│ └── __init__.py # CLI implementation
├── test/ # Test suite
├── README.md
├── setup.py
└── environment.yml
graph TB
subgraph CLI["User Interface"]
UI[ui/__init__.py<br>CLI & Arguments]
end
subgraph Core["Core Library"]
BDGEST[bdgest.py<br>Web Scraper & Matcher]
COVER[cover.py<br>Image Operations]
ARCHIVE[archive_tools.py<br>CBZ/CBR Handler]
COMICRACK[comicrack.py<br>ComicInfo.xml Generator]
UTILS[utils.py<br>Utilities & Config]
end
subgraph External["External Resources"]
BEDETHEQUE[(bedetheque.com<br>Metadata Source)]
CACHE[(Local Cache<br>~/.local/share/bdnex)]
CONFIG[(Config<br>~/.config/bdnex)]
end
subgraph Files["Comic Files"]
CBZ[CBZ/CBR Files]
end
UI --> BDGEST
UI --> COVER
UI --> ARCHIVE
UI --> COMICRACK
UI --> UTILS
BDGEST --> BEDETHEQUE
BDGEST --> CACHE
BDGEST --> COMICRACK
COVER --> BEDETHEQUE
COVER --> CACHE
COVER --> ARCHIVE
ARCHIVE --> CBZ
COMICRACK --> ARCHIVE
COMICRACK --> CBZ
UTILS --> CONFIG
UTILS --> CACHE
style CLI fill:#e1f5ff
style Core fill:#fff3e0
style External fill:#f3e5f5
style Files fill:#e8f5e9
- bdgest.py:
  - Downloads and processes bedetheque.com sitemaps
  - Performs fuzzy string matching using Levenshtein distance
  - Scrapes and parses album metadata
  - Converts to ComicRack format
- cover.py:
  - Downloads cover images from bedetheque.com
  - Uses SIFT feature detection for image comparison
  - Calculates similarity percentage
- comicrack.py:
  - Generates ComicInfo.xml from metadata
  - Validates against ComicInfo.xsd schema
  - Embeds XML into comic archives
  - Handles existing ComicInfo.xml (with diff display)
- archive_tools.py:
  - Extracts front covers from archives
  - Supports both ZIP (CBZ) and RAR (CBR) formats
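To make the fuzzy matching step in bdgest.py more concrete, here is a self-contained sketch of Levenshtein-based matching in plain Python. It is not the project's implementation, which may rely on a dedicated library and a different scoring formula. The function names and the 0 to 100 scaling are assumptions chosen to resemble the score shown in the sample log.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (free when characters match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a 0-100 score; 100 means identical (ignoring case)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_match(query: str, candidates: list) -> tuple:
    """Return the (candidate, score) pair with the highest similarity to query."""
    return max(((c, similarity(query, c)) for c in candidates), key=lambda pair: pair[1])
```

For example, `best_match("Asterix le Galois", ["Asterix le Gaulois", "Nains"])` selects the first candidate despite the typo, because only one character needs to be inserted.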
Comic File → Extract Filename → Fuzzy Match → Scrape Metadata
↓
Download Cover Image
↓
Compare Covers (SIFT)
↓
Generate ComicInfo.xml
↓
Embed in Archive → Updated Comic File
sequenceDiagram
actor User
participant CLI as CLI Interface
participant FS as File System
participant BDG as bdgest.py
participant CACHE as Local Cache
participant WEB as bedetheque.com
participant COV as cover.py
participant ARC as archive_tools.py
participant CR as comicrack.py
User->>CLI: bdnex -f comic.cbz
CLI->>FS: Read comic file
FS-->>CLI: File info
CLI->>BDG: Extract and match filename
BDG->>CACHE: Check sitemap cache
alt Sitemap cached
CACHE-->>BDG: Return sitemap data
else No cache
BDG->>WEB: Download sitemap
WEB-->>BDG: Sitemap data
BDG->>CACHE: Store sitemap
end
BDG->>BDG: Fuzzy match Levenshtein
BDG->>WEB: Scrape album page
WEB-->>BDG: HTML metadata
BDG->>BDG: Parse metadata
BDG->>CACHE: Store JSON metadata
CLI->>COV: Download cover
COV->>WEB: Fetch cover image
WEB-->>COV: Cover image
COV->>CACHE: Store cover
CLI->>ARC: Extract comic cover
ARC->>FS: Read from CBZ/CBR
FS-->>ARC: Cover image
CLI->>COV: Compare covers SIFT
COV-->>CLI: Similarity percentage
alt High confidence match
CLI->>CR: Generate ComicInfo.xml
CR->>CR: Validate against schema
CR->>ARC: Embed XML in archive
ARC->>FS: Update CBZ/CBR
CLI-->>User: Success message
else Low confidence
CLI-->>User: Request manual confirmation
User->>CLI: Provide URL or confirm
CLI->>CR: Generate ComicInfo.xml
CR->>ARC: Embed XML in archive
ARC->>FS: Update CBZ/CBR
CLI-->>User: Success message
end
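The "sitemap cached / no cache" branch in the diagram above follows a common cache-or-download pattern, sketched below. The function name `load_sitemap` and the injected `download` callable are hypothetical; the real code may structure caching differently.

```python
from pathlib import Path
from typing import Callable


def load_sitemap(cache_file: Path, download: Callable[[], str]) -> str:
    """Return cached sitemap text, downloading and caching it on a miss (sketch)."""
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = download()
    # Create the cache directory lazily so the first run works on a clean system.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

Passing the download step in as a callable keeps the function free of network code, which makes it straightforward to exercise in tests without reaching bedetheque.com.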
Contributions are welcome! Here's how to get started:
- Fork and clone the repository:
git clone https://github.com/yourusername/bdnex.git
cd bdnex
- Install in development mode:
pip install -e .[dev]
- Make your changes and add tests
- Run the test suite:
pytest
- Check code coverage:
coverage run -m pytest
coverage report
- Follow PEP 8 style guidelines
- Use descriptive variable and function names
- Add docstrings to functions and classes
- Keep functions focused and single-purpose
- Add type hints where appropriate
When adding new features:
- Create tests in the appropriate test/test_*.py file
- Use unittest.mock for external dependencies
- Aim for high code coverage (>80%)
- Test edge cases and error conditions
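As an example of the `unittest.mock` guideline above, the sketch below tests a small download helper without touching the network. The helper `fetch_page` and the URL are hypothetical stand-ins; they do not correspond to actual BDneX functions.

```python
import urllib.request
from unittest import mock


def fetch_page(url: str) -> str:
    """Hypothetical helper: download a page and decode it as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def test_fetch_page_decodes_body():
    # Fake response object that also works as a context manager.
    fake_response = mock.MagicMock()
    fake_response.read.return_value = b"<html>Nains</html>"
    fake_response.__enter__.return_value = fake_response

    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_open:
        assert fetch_page("https://example.invalid/album") == "<html>Nains</html>"

    # The helper should have requested exactly the URL it was given.
    fake_open.assert_called_once_with("https://example.invalid/album")
```

Saved in a file such as `test/test_example.py` (a hypothetical name), this would be collected by `pytest` like any other test function.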
- Create a feature branch: git checkout -b feature/my-feature
- Make your changes with clear commit messages
- Ensure all tests pass
- Update documentation if needed
- Submit a pull request with a clear description
Planned features for future releases:
- SQLite Database: Keep records of already processed comics
- Interactive Mode: Enhanced CLI with selection menus
- Catalog Manager: Browse and manage your tagged collection
- Renaming Convention: Auto-rename files based on metadata and user config
- Additional Sources: Support for bdfugue.com and other French comic databases
- Resume Support: Pick up where you left off in batch processing
- GUI Application: Desktop application with visual interface
- Plugin System: Extensible architecture for custom metadata sources
- Duplicate Detection: Find and manage duplicate comics
- Reading Lists: Create and manage reading lists
- Web Interface: Browser-based management interface
Inspired by the beets music manager.
Problem: "Cover matching percentage is low"
- The automatic match may be incorrect
- You'll be prompted to manually enter the bedetheque.com URL
- You can adjust cover.match_percentage in the config to be more or less strict
Problem: "Album not found in sitemap"
- Run bdnex --init to update sitemaps
- Try simplifying the filename (remove special characters, edition info)
- Use interactive mode to search manually
Problem: "Import Error: No module named 'cv2'"
- OpenCV is not installed correctly
- Run: pip install opencv-contrib-python-headless
Problem: "RAR files not extracting"
- Install unrar: sudo apt-get install unrar (Linux) or download from rarlab.com
Problem: Tests failing with "No source for code: config-3.py"
- This is a coverage tool artifact and can be ignored
- Tests should still pass successfully
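Two of the problems above (the missing `cv2` module and RAR extraction failures) come down to missing optional dependencies. A quick way to check for both from Python is sketched below; the function name `check_optional_dependencies` is hypothetical, and the assumption that the extractor is found on `PATH` under the name `unrar` may not hold on every system.

```python
import importlib.util
import shutil


def check_optional_dependencies() -> dict:
    """Report whether OpenCV and an unrar executable can be found (sketch)."""
    return {
        # find_spec returns None when the module cannot be imported.
        "cv2": importlib.util.find_spec("cv2") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "unrar": shutil.which("unrar") is not None,
    }


status = check_optional_dependencies()
```

Any entry in `status` that is `False` points at the corresponding fix listed above.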
Run with verbose debug output:
bdnex -d /comics -v debug
- Check existing GitHub Issues
- Open a new issue with:
- Your OS and Python version
- Command you ran
- Full error message
- Example filename causing issues
This project is licensed under the MIT License - see the LICENSE file for details.
- bedetheque.com for comprehensive French comics database
- beets for inspiration on music library management
- ComicRack for the metadata standard
- All contributors who help improve BDneX
Note: BDneX is currently in active development. Some features mentioned in the roadmap are planned but not yet implemented. The tool is functional for its core purpose of tagging French comics.