BDneX

BDneX is a French comics (BD) metadata tagger and library manager. It automatically retrieves metadata from bedetheque.com and embeds it into your comic files using the ComicRack standard format.

📖 Version française | 🗺️ Roadmap

Motivation

Unlike music tagging, comics tagging has no agreed standard vocabulary. However, the ComicRack format (ComicInfo.xml) is supported by most library managers, such as Komga.

While tools like ComicTagger exist for American comics (using the Comic Vine API), French comics (bandes dessinées) are largely underrepresented in these databases.

BDneX fills this gap by:

  • Providing comprehensive metadata for French comics from bedetheque.com
  • Using intelligent fuzzy matching to identify your comics
  • Automatically embedding metadata in CBZ and CBR files
  • Making it easy to organize large comic libraries by genre, author, rating, and more
  • Enabling sharing of reading lists based on metadata rather than obscure filenames

Inspired by the excellent beets music manager.

Features

Current Features

  • 🔍 Smart Search: Retrieves sitemaps from bedetheque.com for comprehensive album matching
  • 🎯 Fuzzy Matching: Levenshtein distance algorithm for finding album names even with typos
  • 🌐 Web Scraping: Parses webpage content with BeautifulSoup
  • 📋 ComicRack Format: Converts parsed metadata to ComicInfo.xml (ComicRack standard)
  • 🖼️ Cover Verification: Image comparison between online cover and archive cover for confidence scoring
  • 💾 Multiple Formats: Supports both CBZ and CBR archive formats
  • 🔄 Batch Processing: Process entire directories of comics at once
  • ⚙️ Configurable: Customizable settings via YAML configuration file

Supported Metadata

  • Title, Series, Volume Number
  • Writers, Pencillers, Colorists, Inkers
  • Publisher, Publication Year
  • Synopsis/Summary
  • Genre and Tags
  • Community Rating
  • Page Count
  • Language
  • ISBN
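
As an illustration of how fields like these map onto ComicInfo.xml, here is a minimal sketch using only the standard library. It is not BDneX's actual generator: `build_comicinfo` is a hypothetical helper, and the sample values are illustrative. The element names (`Title`, `Series`, `Number`, `Writer`, `LanguageISO`) follow the ComicInfo schema.

```python
import xml.etree.ElementTree as ET


def build_comicinfo(meta: dict) -> str:
    """Serialize a flat metadata dict into a ComicInfo.xml string (hypothetical helper)."""
    root = ET.Element("ComicInfo")
    for tag, value in meta.items():
        if value is None:
            continue  # skip fields that could not be filled in
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; a real run would take them from the scraped album page.
example = {
    "Title": "Redwin de la Forge",
    "Series": "Nains",
    "Number": 1,
    "Writer": None,  # unknown here, so the element is omitted
    "LanguageISO": "fr",
}
xml_text = build_comicinfo(example)
print(xml_text)
```

Omitting empty elements rather than writing blank ones keeps the file small and avoids overwriting values a reader might infer elsewhere.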

Installation

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)
  • (Optional) Conda for environment management

Option 1: Using Conda (Recommended)

Create and activate a virtual environment:

# Create environment from the provided file
conda env create --file=environment.yml

# Activate the environment
conda activate bdnex

Option 2: Using venv

# Create a virtual environment
python3 -m venv bdnex-env

# Activate it (Linux/Mac)
source bdnex-env/bin/activate

# Activate it (Windows)
bdnex-env\Scripts\activate

Installation Modes

User Installation (for general use):

pip install .

Development Installation (for contributing):

pip install -e ".[dev]"

This installs additional development tools like pytest and ipdb.

First-Time Setup

After installation, initialize BDneX to download bedetheque.com sitemaps:

bdnex --init

This downloads and caches sitemap data for faster comic matching (may take a few minutes on first run).
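
To give an idea of what the cached sitemap data contains, the sketch below extracts album URLs from a sitemap document in the standard sitemaps.org XML format. It assumes the bedetheque.com sitemaps follow that format; `sitemap_urls` is a hypothetical helper, not part of BDneX.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> URL found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://m.bedetheque.com/BD-Nains-Tome-1-Redwin-de-la-Forge-245127.html</loc></url>
</urlset>"""

urls = sitemap_urls(sample)
print(urls)
```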

Quick Start

Process a single comic file:

bdnex -f /path/to/comic.cbz

Process an entire directory:

bdnex -d /path/to/comics/folder

The tool will:

  1. Extract the comic filename and attempt to match it with bedetheque.com entries
  2. Download metadata and cover image
  3. Compare covers to verify the match
  4. Embed metadata as ComicInfo.xml inside the archive
  5. Save the updated comic file
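
Steps 4 and 5 can be sketched for CBZ files with the standard `zipfile` module. This is one possible approach, not necessarily the one BDneX uses: `embed_comicinfo` is a hypothetical helper that rewrites the archive into a temporary file so that a stale ComicInfo.xml is replaced rather than duplicated.

```python
import os
import shutil
import tempfile
import zipfile


def embed_comicinfo(cbz_path: str, xml_text: str) -> None:
    """Rewrite a CBZ so it contains exactly one ComicInfo.xml (hypothetical helper)."""
    fd, tmp_path = tempfile.mkstemp(suffix=".cbz")
    os.close(fd)
    try:
        with zipfile.ZipFile(cbz_path) as src, \
                zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                if item.filename != "ComicInfo.xml":  # drop any existing copy
                    dst.writestr(item, src.read(item.filename))
            dst.writestr("ComicInfo.xml", xml_text)
        # Replace the original archive only after the new one is complete.
        shutil.move(tmp_path, cbz_path)
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Writing to a temporary file first means an interrupted run leaves the original archive untouched.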

Usage

Command Line Options

bdnex [OPTIONS]

Options:

  • -f, --input-file <path>: Process a single comic file
  • -d, --input-dir <path>: Process all comics in a directory (recursively searches for .cbz and .cbr files)
  • -i, --init: Initialize or force re-download of bedetheque.com sitemaps
  • -v, --verbose <level>: Set logging verbosity (default: info)
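
For readers who want to script around the CLI, the options above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags; BDneX's real parser may be organized differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="bdnex")
    parser.add_argument("-f", "--input-file", help="process a single comic file")
    parser.add_argument("-d", "--input-dir", help="process all comics in a directory")
    parser.add_argument("-i", "--init", action="store_true",
                        help="initialize or re-download the sitemaps")
    parser.add_argument("-v", "--verbose", default="info",
                        help="logging verbosity (default: info)")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["-d", "/comics/new-additions", "-v", "debug"])
print(args)
```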

Examples

Process a single file:

bdnex -f "/comics/Asterix Tome 1 - Asterix le Gaulois.cbz"

Process entire directory:

bdnex -d /comics/collection

Force sitemap update:

bdnex --init

Combine options:

bdnex -d /comics/new-additions -v debug
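
The -d option searches a directory recursively for .cbz and .cbr files. A minimal sketch of that kind of discovery with `pathlib` is shown here; `find_comics` is a hypothetical helper, and the case-insensitive suffix check is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import List

COMIC_SUFFIXES = {".cbz", ".cbr"}


def find_comics(root: str) -> List[Path]:
    """Return all comic archives below root, in sorted order (hypothetical helper)."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in COMIC_SUFFIXES
    )
```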

Example Output

When processing a comic, you'll see output like:

2024-12-29 15:30:00,123 - INFO     - bdnex.ui - Processing /comics/Nains Tome 1.cbz
2024-12-29 15:30:00,234 - INFO     - bdnex.lib.bdgest - Searching for "Nains Tome 1" in bedetheque.com sitemap files
2024-12-29 15:30:00,345 - DEBUG    - bdnex.lib.bdgest - Match album name succeeded
2024-12-29 15:30:00,456 - DEBUG    - bdnex.lib.bdgest - Levenshtein score: 87.5
2024-12-29 15:30:00,567 - DEBUG    - bdnex.lib.bdgest - Matched url: https://m.bedetheque.com/BD-Nains-Tome-1-Redwin-de-la-Forge-245127.html
2024-12-29 15:30:01,678 - INFO     - bdnex.lib.bdgest - Converting parsed metadata to ComicRack template
2024-12-29 15:30:01,789 - INFO     - bdnex.lib.cover - Checking Cover from input file with online cover
2024-12-29 15:30:02,890 - INFO     - bdnex.lib.cover - Cover matching percentage: 92.5
2024-12-29 15:30:02,901 - INFO     - bdnex.lib.comicrack - Add ComicInfo.xml to /comics/Nains Tome 1.cbz
2024-12-29 15:30:03,012 - INFO     - bdnex.ui - Processing album done
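
The layout of these lines (timestamp, padded level, logger name, message) can be reproduced with the standard `logging` module. The format string below is inferred from the sample output; the project's own logging.conf may differ, and `make_logger` is a hypothetical helper.

```python
import logging

# Inferred from the sample output above: the level name is left-aligned in 8 columns.
LOG_FORMAT = "%(asctime)s - %(levelname)-8s - %(name)s - %(message)s"


def make_logger(name: str, level: str = "info") -> logging.Logger:
    """Create a logger that prints in the layout shown above (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level.upper())
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


log = make_logger("bdnex.ui", "debug")
log.info("Processing %s", "/comics/Nains Tome 1.cbz")
```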

Interactive Mode

If automatic matching fails or confidence is low, BDneX will prompt you:

  • To manually enter a bedetheque.com URL
  • To search interactively for the correct album
  • To confirm whether to proceed with metadata embedding
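
The decision between automatic embedding and prompting can be pictured as a simple threshold check against the cover similarity. The sketch below assumes the comparison is "greater than or equal to" the configured cover.match_percentage; the exact rule in BDneX may differ, and `next_step` is a hypothetical name.

```python
def next_step(cover_similarity: float, match_percentage: float = 40.0) -> str:
    """Return "embed" when the match can be auto-confirmed, else "ask_user".

    Hypothetical helper: the >= comparison is an assumption, not documented behavior.
    """
    if cover_similarity >= match_percentage:
        return "embed"
    return "ask_user"


print(next_step(92.5))  # similarity taken from the example output above
print(next_step(12.0))
```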

Configuration

BDneX uses a YAML configuration file located at:

  • Linux/Mac: ~/.config/bdnex/bdnex.yaml
  • Windows: %USERPROFILE%\.config\bdnex\bdnex.yaml

The configuration file is created automatically on first run from the default template.

Configuration Options

bdnex:
  config_path: ~/.config/bdnex       # Configuration directory
  share_path: ~/.local/share/bdnex   # Data/cache directory

directory: /path/to/comics/library    # Default library directory

import:
  copy: no          # Copy files during import
  move: yes         # Move files during import
  replace: yes      # Replace existing files
  autotag: no       # Automatically tag without confirmation
  rename: yes       # Rename files based on metadata

library: ~/.local/share/bdnex/bdnex.sqlite  # Future feature: database

paths:
  # Naming conventions for organized libraries
  default: '%language/%type/%title (%author) [%year]/%title - %volume (%author) [%year]'
  oneshot: '%language/oneShots/%title (%author) [%year]/%title (%author) [%year]'
  series: '%language/series/%title (%author)/%title - %volume'

cover:
  match_percentage: 40  # Minimum cover similarity percentage for auto-confirmation
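
To show how the `paths` templates are meant to read, here is a sketch that expands `%name` placeholders from a metadata dict. Renaming is listed on the roadmap, so this is only an illustration of the template syntax: `expand_template` is a hypothetical helper and the field values are made up.

```python
import re


def expand_template(template: str, fields: dict) -> str:
    """Replace each %name placeholder with fields[name] (hypothetical helper).

    Unknown placeholders are left untouched so that mistakes stay visible.
    """
    def replace(match):
        key = match.group(1)
        return str(fields.get(key, match.group(0)))

    return re.sub(r"%([A-Za-z_]+)", replace, template)


series_template = "%language/series/%title (%author)/%title - %volume"
path = expand_template(
    series_template,
    {"language": "fr", "title": "Example", "author": "Doe", "volume": 1},
)
print(path)
```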

Data Storage

BDneX stores cached data in ~/.local/share/bdnex/:

  • bedetheque/sitemaps/: Cached sitemap files
  • bedetheque/albums_html/: Downloaded album pages
  • bedetheque/albums_json/: Parsed metadata in JSON format
  • bedetheque/covers/: Downloaded cover images

Testing

Running Tests

BDneX uses pytest for testing. To run the test suite:

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run specific test file
pytest test/test_utils.py

# Run specific test
pytest test/test_cover.py::TestCover::test_front_cover_similarity_good_match

Test Coverage

Check code coverage:

# Install coverage tool (if not installed with dev dependencies)
pip install coverage

# Run tests with coverage
coverage run -m pytest

# View coverage report
coverage report

# Generate HTML coverage report
coverage html
# Open htmlcov/index.html in your browser

Current test coverage:

  • Overall: ~74%
  • archive_tools.py: 100%
  • cover.py: 92%
  • bdgest.py: 82%
  • utils.py: 62%

Test Structure

Tests are organized in the test/ directory:

  • test_archive_tools.py: Archive extraction and manipulation
  • test_bdgest.py: BedeTheque scraping and metadata parsing
  • test_cover.py: Cover image comparison and download
  • test_utils.py: Utility functions (config, JSON, file operations)
  • test_comicrack.py: ComicInfo.xml generation and embedding

Architecture

Project Structure

bdnex/
├── bdnex/                  # Main package
│   ├── conf/              # Configuration files and schemas
│   │   ├── ComicInfo.xsd  # ComicRack XML schema
│   │   ├── bdnex.yaml     # Default configuration
│   │   └── logging.conf   # Logging configuration
│   ├── lib/               # Core library modules
│   │   ├── archive_tools.py   # CBZ/CBR file handling
│   │   ├── bdgest.py          # BedeTheque scraper
│   │   ├── comicrack.py       # ComicInfo.xml generation
│   │   ├── cover.py           # Cover image operations
│   │   └── utils.py           # Utility functions
│   └── ui/                # User interface
│       └── __init__.py    # CLI implementation
├── test/                  # Test suite
├── README.md
├── setup.py
└── environment.yml

Architecture Diagram

graph TB
    subgraph CLI["User Interface"]
        UI[ui/__init__.py<br>CLI & Arguments]
    end
    
    subgraph Core["Core Library"]
        BDGEST[bdgest.py<br>Web Scraper & Matcher]
        COVER[cover.py<br>Image Operations]
        ARCHIVE[archive_tools.py<br>CBZ/CBR Handler]
        COMICRACK[comicrack.py<br>ComicInfo.xml Generator]
        UTILS[utils.py<br>Utilities & Config]
    end
    
    subgraph External["External Resources"]
        BEDETHEQUE[(bedetheque.com<br>Metadata Source)]
        CACHE[(Local Cache<br>~/.local/share/bdnex)]
        CONFIG[(Config<br>~/.config/bdnex)]
    end
    
    subgraph Files["Comic Files"]
        CBZ[CBZ/CBR Files]
    end
    
    UI --> BDGEST
    UI --> COVER
    UI --> ARCHIVE
    UI --> COMICRACK
    UI --> UTILS
    
    BDGEST --> BEDETHEQUE
    BDGEST --> CACHE
    BDGEST --> COMICRACK
    
    COVER --> BEDETHEQUE
    COVER --> CACHE
    COVER --> ARCHIVE
    
    ARCHIVE --> CBZ
    
    COMICRACK --> ARCHIVE
    COMICRACK --> CBZ
    
    UTILS --> CONFIG
    UTILS --> CACHE
    
    style CLI fill:#e1f5ff
    style Core fill:#fff3e0
    style External fill:#f3e5f5
    style Files fill:#e8f5e9

Key Components

  1. bdgest.py:

    • Downloads and processes bedetheque.com sitemaps
    • Performs fuzzy string matching using Levenshtein distance
    • Scrapes and parses album metadata
    • Converts to ComicRack format
  2. cover.py:

    • Downloads cover images from bedetheque.com
    • Uses SIFT feature detection for image comparison
    • Calculates similarity percentage
  3. comicrack.py:

    • Generates ComicInfo.xml from metadata
    • Validates against ComicInfo.xsd schema
    • Embeds XML into comic archives
    • Handles existing ComicInfo.xml (with diff display)
  4. archive_tools.py:

    • Extracts front covers from archives
    • Supports both ZIP (CBZ) and RAR (CBR) formats
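
The fuzzy matching step in bdgest.py relies on Levenshtein distance. The following is a pure-Python sketch of that idea, not the project's implementation (which may use a dedicated library and a different scoring formula). `levenshtein`, `similarity`, and `best_match` are hypothetical names, and the 0 to 100 score is one common normalization.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score from 0 (nothing in common) to 100 (identical)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_match(query: str, candidates: Iterable[str]) -> str:
    """Return the candidate with the highest similarity to the query."""
    return max(candidates, key=lambda candidate: similarity(query, candidate))
```

In practice the candidates would be album names derived from the cached sitemap URLs, and a minimum score would decide whether the best match is trusted.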

Workflow

Comic File → Extract Filename → Fuzzy Match → Scrape Metadata
                                     ↓
                            Download Cover Image
                                     ↓
                            Compare Covers (SIFT)
                                     ↓
                            Generate ComicInfo.xml
                                     ↓
                            Embed in Archive → Updated Comic File
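
Before covers can be compared, the front cover has to be located inside the archive. A minimal sketch for CBZ files is given below, assuming the cover is the first image when entries are sorted by name; that is a common convention but not a guarantee, and `front_cover_name` is a hypothetical helper rather than the archive_tools.py API.

```python
import zipfile
from typing import Optional

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".webp")


def front_cover_name(cbz_path: str) -> Optional[str]:
    """Return the archive entry assumed to be the front cover, or None if no image exists."""
    with zipfile.ZipFile(cbz_path) as archive:
        images = sorted(
            name for name in archive.namelist()
            if name.lower().endswith(IMAGE_SUFFIXES)
        )
    return images[0] if images else None
```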

Workflow Diagram

sequenceDiagram
    actor User
    participant CLI as CLI Interface
    participant FS as File System
    participant BDG as bdgest.py
    participant CACHE as Local Cache
    participant WEB as bedetheque.com
    participant COV as cover.py
    participant ARC as archive_tools.py
    participant CR as comicrack.py
    
    User->>CLI: bdnex -f comic.cbz
    CLI->>FS: Read comic file
    FS-->>CLI: File info
    CLI->>BDG: Extract and match filename
    
    BDG->>CACHE: Check sitemap cache
    alt Sitemap cached
        CACHE-->>BDG: Return sitemap data
    else No cache
        BDG->>WEB: Download sitemap
        WEB-->>BDG: Sitemap data
        BDG->>CACHE: Store sitemap
    end
    
    BDG->>BDG: Fuzzy match Levenshtein
    BDG->>WEB: Scrape album page
    WEB-->>BDG: HTML metadata
    BDG->>BDG: Parse metadata
    BDG->>CACHE: Store JSON metadata
    
    CLI->>COV: Download cover
    COV->>WEB: Fetch cover image
    WEB-->>COV: Cover image
    COV->>CACHE: Store cover
    
    CLI->>ARC: Extract comic cover
    ARC->>FS: Read from CBZ/CBR
    FS-->>ARC: Cover image
    
    CLI->>COV: Compare covers SIFT
    COV-->>CLI: Similarity percentage
    
    alt High confidence match
        CLI->>CR: Generate ComicInfo.xml
        CR->>CR: Validate against schema
        CR->>ARC: Embed XML in archive
        ARC->>FS: Update CBZ/CBR
        CLI-->>User: Success message
    else Low confidence
        CLI-->>User: Request manual confirmation
        User->>CLI: Provide URL or confirm
        CLI->>CR: Generate ComicInfo.xml
        CR->>ARC: Embed XML in archive
        ARC->>FS: Update CBZ/CBR
        CLI-->>User: Success message
    end

Contributing

Contributions are welcome! Here's how to get started:

Development Setup

  1. Fork and clone the repository:

git clone https://github.com/yourusername/bdnex.git
cd bdnex

  2. Install in development mode:

pip install -e ".[dev]"

  3. Make your changes and add tests

  4. Run the test suite:

pytest

  5. Check code coverage:

coverage run -m pytest
coverage report

Code Style

  • Follow PEP 8 style guidelines
  • Use descriptive variable and function names
  • Add docstrings to functions and classes
  • Keep functions focused and single-purpose
  • Add type hints where appropriate

Adding Tests

When adding new features:

  1. Create tests in the appropriate test/test_*.py file
  2. Use unittest.mock for external dependencies
  3. Aim for high code coverage (>80%)
  4. Test edge cases and error conditions
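
As an example of point 2, the test below replaces a network download with a `unittest.mock.Mock`, so it runs without touching bedetheque.com. `fetch_title` is a hypothetical function written for this illustration, not part of BDneX, and the URL is a placeholder.

```python
import unittest
from unittest import mock


def fetch_title(download, url: str) -> str:
    """Hypothetical helper: return the <title> text of a downloaded page."""
    html = download(url)
    start = html.index("<title>") + len("<title>")
    return html[start:html.index("</title>", start)].strip()


class TestFetchTitle(unittest.TestCase):
    def test_title_is_parsed_without_network(self):
        # The mock stands in for the real download function.
        fake_download = mock.Mock(return_value="<html><title> Nains </title></html>")
        title = fetch_title(fake_download, "https://example.invalid/album")
        self.assertEqual(title, "Nains")
        fake_download.assert_called_once_with("https://example.invalid/album")
```

pytest collects `unittest.TestCase` classes, so a test written this way runs with the commands shown in the Testing section.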

Pull Request Process

  1. Create a feature branch: git checkout -b feature/my-feature
  2. Make your changes with clear commit messages
  3. Ensure all tests pass
  4. Update documentation if needed
  5. Submit a pull request with a clear description

Roadmap

Planned features for future releases:

  • SQLite Database: Keep records of already processed comics
  • Interactive Mode: Enhanced CLI with selection menus
  • Catalog Manager: Browse and manage your tagged collection
  • Renaming Convention: Auto-rename files based on metadata and user config
  • Additional Sources: Support for bdfugue.com and other French comic databases
  • Resume Support: Pick up where you left off in batch processing
  • GUI Application: Desktop application with visual interface
  • Plugin System: Extensible architecture for custom metadata sources
  • Duplicate Detection: Find and manage duplicate comics
  • Reading Lists: Create and manage reading lists
  • Web Interface: Browser-based management interface

Troubleshooting

Common Issues

Problem: "Cover matching percentage is low"

  • The automatic match may be incorrect
  • You'll be prompted to manually enter the bedetheque.com URL
  • Raise or lower cover.match_percentage in the configuration file to make cover matching stricter or more lenient

Problem: "Album not found in sitemap"

  • Run bdnex --init to update sitemaps
  • Try simplifying the filename (remove special characters, edition info)
  • Use interactive mode to search manually

Problem: "ModuleNotFoundError: No module named 'cv2'"

  • OpenCV is not installed correctly
  • Run: pip install opencv-contrib-python-headless

Problem: "RAR files not extracting"

  • Install unrar: sudo apt-get install unrar (Linux) or download from rarlab.com

Problem: Tests failing with "No source for code: config-3.py"

  • This is a coverage tool artifact and can be ignored
  • Tests should still pass successfully

Debug Mode

Run with verbose debug output:

bdnex -d /comics -v debug

Getting Help

  • Check existing GitHub Issues
  • Open a new issue with:
    • Your OS and Python version
    • Command you ran
    • Full error message
    • Example filename causing issues

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • bedetheque.com for comprehensive French comics database
  • beets for inspiration on music library management
  • ComicRack for the metadata standard
  • All contributors who help improve BDneX

Note: BDneX is currently in active development. Some features mentioned in the roadmap are planned but not yet implemented. The tool is functional for its core purpose of tagging French comics.
