Celestium has been upgraded with advanced speaker diarization and the latest speech recognition libraries for enhanced security and performance.
| Library | Old Version | New Version | Notes |
|---|---|---|---|
| speechrecognition | 3.10.4 | 3.14.3 | Latest 2025 release with Whisper support |
| python | ^3.10.0 | ^3.10.0,<3.13 | Added upper bound for compatibility |

New dependencies:

| Library | Version | Purpose |
|---|---|---|
| openai-whisper | ^20231117 | Enhanced speech-to-text quality |
| pyannote-audio | ^3.0.0 | Speaker diarization (who spoke when) |
| torch | ^2.0.0 | PyTorch ML framework |
| torchaudio | ^2.0.0 | Audio processing for PyTorch |
| python-dotenv | ^1.0.0 | Environment variable management |
| soundfile | ^0.12.0 | Audio file I/O |

- Speaker Diarization
  - Detects the number of speakers in the audio
  - Rejects authentication if multiple speakers are present
  - Prevents coerced or influenced transactions
- Multi-Layer Voice Authentication
  - Layer 1: Speaker Count Verification (NEW!)
  - Layer 2: Voice Biometric Matching (GMM)
  - Layer 3: Spoken Password Verification
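
The three layers above can be read as a chain of checks that stops at the first failure. The following is a minimal sketch of that idea, not Celestium's actual code: the `Layer` class, the `authenticate` function, and the three stand-in checks are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each check takes the path to a recording and returns (passed, message).
Check = Callable[[str], Tuple[bool, str]]


@dataclass
class Layer:
    name: str
    check: Check


def authenticate(audio_path: str, layers: List[Layer]) -> Tuple[bool, str]:
    """Run the layers in order and stop at the first one that fails."""
    for layer in layers:
        passed, message = layer.check(audio_path)
        if not passed:
            return False, f"{layer.name} failed: {message}"
    return True, "All layers passed"


# Hypothetical stand-ins for the real checks (assumptions, not the project's code).
def speaker_count_ok(path: str) -> Tuple[bool, str]:
    return True, "single speaker"


def voice_matches(path: str) -> Tuple[bool, str]:
    return True, "voice matched"


def password_ok(path: str) -> Tuple[bool, str]:
    return False, "password mismatch"


LAYERS = [
    Layer("Speaker count", speaker_count_ok),
    Layer("Voice biometric", voice_matches),
    Layer("Spoken password", password_ok),
]

if __name__ == "__main__":
    print(authenticate("test.wav", LAYERS))
```

Running the cheapest rejection first (speaker count) means a recording with several voices never reaches the biometric or password stages.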

Install the new dependencies:

    cd /Users/darkmatter/projects/transia/celestium
    poetry install

    # Copy example file
    cp .env.example .env
    # Edit and add your HuggingFace token
    nano .env

Get your token:
- Visit https://huggingface.co/settings/tokens
- Create a new token (read access)
- Accept the user agreement: https://huggingface.co/pyannote/speaker-diarization-3.1
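
Once the token is in `.env`, a quick sanity check can catch a missing or mistyped value before the model download starts. The sketch below is not the project's `Config` class. It assumes the `.env` values have already been loaded into the process environment (for example by python-dotenv's `load_dotenv()`), that `DEVICE` defaults to `cpu`, and that tokens start with `hf_` as in the placeholder used in this guide. `read_settings` and `validate_settings` are hypothetical helper names.

```python
import os
from typing import Dict, List, Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the two settings this guide mentions from an environment mapping."""
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    # Assumption: an unset or empty DEVICE means CPU.
    device = env.get("DEVICE", "cpu").strip().lower() or "cpu"
    return {"token": token, "device": device}


def validate_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not settings["token"]:
        problems.append("HUGGINGFACE_TOKEN is not set")
    elif not settings["token"].startswith("hf_"):
        # Assumption: tokens follow the hf_... shape shown in this guide.
        problems.append("HUGGINGFACE_TOKEN does not start with 'hf_'")
    return problems


if __name__ == "__main__":
    settings = read_settings()
    issues = validate_settings(settings)
    print("HuggingFace token:", "Set" if settings["token"] else "Not set")
    print("Device:", settings["device"])
    for issue in issues:
        print("Problem:", issue)
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to test without touching the real environment.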

Verify the configuration:

    from celestium.config import Config
    Config.print_config()
    # Should show: HuggingFace Token: ✅ Set

Test speaker verification:

    from celestium.speaker_verification import validate_authorized_speaker
    # Test with a recording
    is_valid, message = validate_authorized_speaker("test.wav")
    print(message)

Added: Speaker diarization checks in two places

    # NEW: Verify single speaker before processing
    print("Verifying speaker identity...")
    is_valid, message = validate_authorized_speaker(FILENAME, threshold=0.90)
    if not is_valid:
        print(f"❌ {message}")
        speak("Multiple speakers detected. Please ensure you are alone.")
        return None

    # NEW: Verify speaker count before authentication
    print("Verifying speaker count...")
    is_valid, message = validate_authorized_speaker(FILENAME, threshold=0.90)
    if not is_valid:
        print(f"❌ {message}")
        speak("Multiple speakers detected. Authentication failed.")
        return False

New files:

    celestium/
    ├── speaker_verification.py   # NEW: Speaker diarization module
    ├── config.py                 # NEW: Configuration management
    ├── .env.example              # NEW: Environment template
    ├── SPEAKER_DIARIZATION.md    # NEW: Detailed documentation
    └── UPGRADE_GUIDE.md          # NEW: This file
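
This guide does not say what the `threshold` argument of `validate_authorized_speaker` measures. One plausible reading, and it is only an assumption, is the minimum share of total speech time that must belong to the dominant speaker. The sketch below computes that share from diarization segments given as plain `(start, end, speaker)` tuples. The helper names `dominant_speaker_share` and `is_single_speaker` are hypothetical and do not reflect the actual module. With pyannote, such tuples would typically be built from the diarization result, for example via `itertracks(yield_label=True)`.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# (start_sec, end_sec, speaker_label); labels mimic pyannote's "SPEAKER_00" style.
Segment = Tuple[float, float, str]


def dominant_speaker_share(segments: Iterable[Segment]) -> float:
    """Return the fraction of total speech time held by the most active speaker."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += max(0.0, end - start)
    speech = sum(totals.values())
    return max(totals.values()) / speech if speech > 0 else 0.0


def is_single_speaker(
    segments: Iterable[Segment], threshold: float = 0.90
) -> Tuple[bool, str]:
    """Assumed meaning of threshold: minimum dominant-speaker share to accept."""
    share = dominant_speaker_share(segments)
    if share == 0.0:
        return False, "No speech detected"
    if share >= threshold:
        return True, f"Single speaker ({share:.0%} of speech)"
    return False, f"Multiple speakers detected (dominant share {share:.0%})"


if __name__ == "__main__":
    # Nine seconds from one voice, one second from another.
    recording = [(0.0, 9.0, "SPEAKER_00"), (9.0, 10.0, "SPEAKER_01")]
    for threshold in (0.85, 0.90, 0.95):
        print(threshold, is_single_speaker(recording, threshold))
```

Under this reading, a short interjection from a second voice passes at 0.85 but is rejected at 0.95, which is the security side of the tradeoff.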

Good news! No migration is needed for existing users:
- ✅ Existing GMM models still work
- ✅ Existing voice recordings are unchanged
- ✅ Existing password hashes remain compatible
- ✅ Existing encrypted wallets still work

The new feature is additive: it adds speaker verification on top of the existing authentication.

Comment out the validation calls in `approval.py`:

    # Disable speaker diarization temporarily
    # is_valid, message = validate_authorized_speaker(FILENAME)
    # if not is_valid:
    #     return False

- Downloads ~300MB pyannote model (one-time)
- Takes 30-60 seconds to initialize
- Models cached locally for future use
- CPU: ~5-10 seconds per verification
- GPU: ~1-2 seconds per verification
- Memory: ~2GB RAM for model

- Use GPU (roughly 5x faster, going by the timings above)

      # In .env file
      DEVICE=cuda

- Reuse verifier instance (already implemented)
  - Singleton pattern avoids reloading model
  - First call loads model, subsequent calls are fast
- Adjust threshold for speed/security tradeoff

      # More lenient: fewer rejections, so fewer retries
      validate_authorized_speaker(audio, threshold=0.85)
      # More strict: more rejections, so more retries
      validate_authorized_speaker(audio, threshold=0.95)

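
The "reuse verifier instance" tip above relies on loading the model once per process. A minimal sketch of that pattern using `functools.lru_cache` is shown here. The `SpeakerVerifier` class is a hypothetical stand-in, and the real `get_speaker_verifier` in `speaker_verification.py` may be implemented differently.

```python
from functools import lru_cache


class SpeakerVerifier:
    """Hypothetical stand-in; in the real module, construction loads the model."""

    load_count = 0  # Counts how many times the expensive load actually ran.

    def __init__(self) -> None:
        SpeakerVerifier.load_count += 1
        self.model = "loaded"  # Placeholder for the diarization pipeline.


@lru_cache(maxsize=1)
def get_speaker_verifier() -> SpeakerVerifier:
    """Create the verifier on first call and return the cached one afterwards."""
    return SpeakerVerifier()


if __name__ == "__main__":
    first = get_speaker_verifier()
    second = get_speaker_verifier()
    print(first is second, SpeakerVerifier.load_count)
```

Because the function takes no arguments, `lru_cache` stores exactly one instance, so only the first call pays the initialization cost.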
- Supported: Python 3.10, 3.11, 3.12
- Not Supported: Python 3.13 (Whisper incompatibility)
- ✅ macOS (Intel & Apple Silicon)
- ✅ Linux (Ubuntu, Debian, etc.)
- ✅ Windows 10/11 (with PyTorch)
- Minimum: 4GB RAM, dual-core CPU
- Recommended: 8GB RAM, quad-core CPU
- Optimal: 8GB+ RAM, NVIDIA GPU with CUDA
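
The Python bounds listed above (3.10 through 3.12, matching `^3.10.0,<3.13`) can be checked at startup so that an unsupported interpreter fails with a clear message. This is a small sketch, and `python_supported` is a hypothetical helper name, not part of Celestium.

```python
import sys
from typing import Optional, Sequence

MIN_VERSION = (3, 10)    # Lowest supported (inclusive)
MAX_EXCLUSIVE = (3, 13)  # First unsupported version


def python_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) version falls in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    if not python_supported():
        raise SystemExit(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is not supported; "
            "use 3.10, 3.11, or 3.12."
        )
    print("Python version OK")
```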

Solution: Run `poetry install` to install new dependencies

Solution: Set token in `.env` file

    HUGGINGFACE_TOKEN=hf_xxxxx

Solution: Visit and accept: https://huggingface.co/pyannote/speaker-diarization-3.1

Solutions:
- Use GPU: `DEVICE=cuda` in `.env`
- Lower threshold: `threshold=0.85`
- Disable temporarily during development

- Check SPEAKER_DIARIZATION.md for detailed docs
- Review error messages for specific guidance
- Test speaker diarization independently:

      from celestium.speaker_verification import get_speaker_verifier
      verifier = get_speaker_verifier()

If you need to roll back to the previous version:

    # Restore old pyproject.toml
    git checkout HEAD~1 pyproject.toml
    # Remove new files
    rm celestium/speaker_verification.py
    rm celestium/config.py
    rm .env.example
    # Restore old approval.py
    git checkout HEAD~1 celestium/approval.py
    # Reinstall old dependencies
    poetry install

After upgrading, verify:
- `poetry install` completes successfully
- HuggingFace token is set in `.env`
- User agreement accepted at HuggingFace
- Config validation passes: `Config.validate()`
- Existing users can still authenticate
- Speaker diarization detects single speaker
- Speaker diarization rejects multiple speakers
- Transaction approval flow works end-to-end
Future enhancements planned:
- Voice enrollment with speaker diarization
- Continuous authentication during long sessions
- Voice liveness detection (anti-spoofing)
- Multi-language support for commands
- Real-time speaker tracking
Questions or issues? Check the documentation or review the code:
- `celestium/speaker_verification.py` - Core implementation
- `celestium/approval.py` - Integration points
- `SPEAKER_DIARIZATION.md` - Detailed guide