A deep dive into transforming a legacy codebase with modern tooling, open-source AI, and production-ready architecture
When I started examining the AI-VideoTranslator codebase, I encountered a common scenario in Python projects: functional code that worked but suffered from technical debt accumulated over time. Here's what needed attention:
- Dependencies: Conda/pip requirements with hardcoded file paths
- Cloud Dependency: Required AWS Transcribe (paid service with API keys)
- Security Issues: Shell injection vulnerabilities (
shell=True) - Architecture: Flat structure with mixed concerns
- Error Handling: Inconsistent and prone to silent failures
- Path Management: String concatenation instead of pathlib
- Documentation: Outdated usage instructions
The goal? Transform this into a modern, secure, maintainable codebase that developers would be proud to contribute to.
Why uv?
# Traditional approach
pip install -r requirements.txt # Slow, dependency conflicts
conda env create -f environment.yml # Platform-specific
# Modern approach with uv
uv venv && uv pip install -e . # Fast, reliable, cross-platformBenefits:
- ⚡ 10-100x faster than pip
- 🔒 Deterministic dependency resolution
- 🎯 Cross-platform compatibility
- 📦 Works with pyproject.toml (PEP 621 standard)
Lesson: Don't stick with tools just because they're familiar. Modern tooling can dramatically improve developer experience.
The Shift:
# Before: AWS Transcribe
# - Requires API keys
# - Costs $0.024/minute
# - Sends data to cloud
# - Requires internet
# After: OpenAI Whisper
# - No API keys needed
# - Completely free
# - 100% local processing
# - Works offlineImplementation:
# services/whisper.py
class WhisperService:
def transcribe(self, audio_path: Path, language: Optional[str] = None):
result = self.model.transcribe(
str(audio_path),
language=language,
task="transcribe",
word_timestamps=True,
)
return self._convert_to_subtitle_segments(result)Lesson: Open-source alternatives have matured significantly. Whisper's accuracy rivals commercial services while offering better privacy and zero cost.
Structure Change:
# Before
AI-VideoTranslator/
├── main.py
├── commands/
└── lib/
# After
AI-VideoTranslator/
├── src/
│ └── videotranslator/
│ ├── __init__.py
│ ├── cli.py
│ ├── config.py
│ ├── models/
│ ├── services/
│ └── ui/
├── pyproject.toml
└── README.md
Why src/ layout?
- ✅ Forces proper installation (catches import issues early)
- ✅ Prevents accidental imports from CWD
- ✅ Clearer separation of package code vs. project files
- ✅ Standard practice in modern Python projects
Lesson: Project structure matters. The src layout prevents subtle bugs and makes your package more maintainable.
I implemented a clean architecture with clear separation of concerns:
# models/transcription.py - Data structures
@dataclass
class SubtitleSegment:
index: int
start_time: float
end_time: float
text: str
def to_srt_format(self) -> str:
"""Convert to SRT format with proper timestamp formatting"""
# ...
# services/whisper.py - Business logic
class WhisperService:
def transcribe(self, audio_path: Path) -> TranscriptionResult:
"""Transcribe audio with Whisper"""
# ...
# ui/progress.py - Presentation layer
class ProgressManager:
def status_table(self, title: str, data: dict[str, str]):
"""Display beautiful status tables"""
# ...
# cli.py - Interface layer
@app.command("transcribe")
def transcribe(video_path: Path, output_dir: Optional[Path] = None):
"""Orchestrates the workflow"""
ffmpeg_service = FFmpegService()
whisper_service = WhisperService()
# ...Benefits:
- 🧩 Testable: Each layer can be tested independently
- 🔄 Replaceable: Swap Whisper for another service easily
- 📖 Readable: Clear responsibilities for each component
- 🛠️ Maintainable: Changes are localized to specific services
Before:
# Hardcoded everywhere
language = "en-US"
model = "base"
bitrate = "192k"After:
# config.py
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
whisper_model: Literal["tiny", "base", "small", "medium", "large"] = "base"
language: str = "en"
device: Literal["cpu", "cuda"] = "cpu"
model_config = SettingsConfigDict(
env_file=".env",
env_prefix="VIDEOTRANSLATOR_",
)
settings = Settings()Usage:
# .env file
VIDEOTRANSLATOR_WHISPER_MODEL=large
VIDEOTRANSLATOR_DEVICE=cuda
# Or environment variables
export VIDEOTRANSLATOR_WHISPER_MODEL=large
# Or command-line
videotranslator transcribe video.mp4 -m largeLesson: Use pydantic-settings for configuration. It provides validation, type safety, and multiple configuration sources out of the box.
Vulnerable Code (Before):
# NEVER DO THIS
def add_subtitles(video, subtitle, output):
cmd = f"ffmpeg -i {video} -i {subtitle} -c copy {output}"
subprocess.run(cmd, shell=True) # 🚨 SECURITY RISKWhat's wrong? If video path is "; rm -rf / #", you're in trouble.
Secure Code (After):
def add_subtitles(video: Path, subtitle: Path, output: Path) -> Path:
cmd = [
"ffmpeg",
"-i", str(video),
"-i", str(subtitle),
"-c:v", "copy",
"-c:a", "copy",
"-c:s", "mov_text",
"-y",
str(output),
]
subprocess.run(cmd, check=True, capture_output=True) # ✅ SAFEWhy it's safe:
- No shell interpretation
- Arguments are passed directly to ffmpeg
- Path objects ensure valid file paths
- Can't inject additional commands
Lesson: NEVER use shell=True with user input. Use list-based arguments and pathlib.
Before:
def extract_audio(video, output, format):
# What types? What returns? Who knows!
passAfter:
def extract_audio(
video_path: Path,
output_path: Optional[Path] = None,
audio_format: str = "wav",
) -> Path:
"""
Extract audio from video file.
Args:
video_path: Path to input video
output_path: Optional output path
audio_format: Audio format (wav, mp3, etc.)
Returns:
Path to extracted audio file
Raises:
FileNotFoundError: If video file doesn't exist
"""
if not video_path.exists():
raise FileNotFoundError(f"Video file not found: {video_path}")
# ...Benefits:
- 🔍 IDE autocomplete knows what types to expect
- 🐛 Mypy catches bugs before runtime
- 📚 Self-documenting code
- 🤝 Better collaboration - clear interfaces
Before:
print("Processing...")
print("Done")After:
# ui/progress.py
class ProgressManager:
def __init__(self):
self.progress = Progress(
SpinnerColumn(),
TextColumn("[bold blue]{task.description}"),
BarColumn(),
TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
TimeElapsedColumn(),
TimeRemainingColumn(),
)
def status_table(self, title: str, data: dict[str, str]):
table = Table(title=title, show_header=True)
table.add_column("Property", style="cyan")
table.add_column("Value", style="green")
for key, value in data.items():
table.add_row(key, str(value))
self.console.print(table)Result:
ℹ Processing video: lecture.mp4
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Property ┃ Value ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ File │ lecture.mp4 │
│ Duration │ 1234.56s │
│ Size │ 245.67 MB │
└─────────────────┴──────────────┘
⠋ Step 1/3: Extracting audio... [████████░░░░░░░░░░░░] 45% 0:01:23
Lesson: User experience isn't just for web apps. CLIs deserve beautiful interfaces too. Rich makes this trivial.
Problem: Importing Whisper takes ~2-3 seconds (loads ML models).
Solution:
# cli.py - DON'T import at module level
# from videotranslator.services import WhisperService # ❌ Slow startup
@app.command("transcribe")
def transcribe(video_path: Path):
# Import only when needed
from videotranslator.services import WhisperService # ✅ Fast startup
whisper_service = WhisperService()
# ...Result:
videotranslator --help: 50ms (was 2500ms)videotranslator config: 80ms (was 2500ms)videotranslator transcribe: 2500ms (same, but only when needed)
Lesson: Profile your imports. Lazy loading keeps CLI responsive for quick commands.
Modern Python uses pyproject.toml (PEP 621):
[project]
name = "ai-videotranslator"
version = "2.0.0"
dependencies = [
"typer>=0.12.0",
"rich>=13.7.0",
"openai-whisper>=20231117",
]
[project.optional-dependencies]
cloud = ["boto3>=1.34.0", "google-cloud-speech>=2.26.0"]
dev = ["pytest>=8.0.0", "black>=24.0.0", "ruff>=0.3.0"]
[project.scripts]
videotranslator = "videotranslator.cli:main"Why?
- Single source of truth
- Standardized format (PEP 621)
- Optional dependencies
- Script entry points
- Tool configuration (black, ruff, mypy)
Don't:
output = output_dir + "video_subtitles.srt" # ❌ Missing separator
output = output_dir + "/" + "video.srt" # ❌ Windows uses backslashDo:
output = Path(output_dir) / "video_subtitles.srt" # ✅ Cross-platform
output = video_path.with_suffix(".srt") # ✅ Smart replacement
output = output_path.parent / f"{video_path.stem}_subtitled{video_path.suffix}" # ✅ ComposableBenefits:
- Cross-platform (Windows/Linux/Mac)
- Prevents path traversal vulnerabilities
- Rich API (exists(), read_text(), mkdir(), etc.)
- Type-safe
Before:
result = {
"segments": [...],
"language": "en",
"source_file": "video.mp4"
}
# What keys exist? What are their types? 🤷After:
@dataclass
class TranscriptionResult:
segments: list[SubtitleSegment]
language: str
source_file: Path
def to_srt(self) -> str:
return "\n".join(seg.to_srt_format() for seg in self.segments)
def save_srt(self, output_path: Path) -> None:
output_path.write_text(self.to_srt(), encoding="utf-8")Benefits:
- Type hints work
- IDE autocomplete
- Can't typo keys
- Can add methods
- Free
__repr__and__eq__
Long-running operations MUST provide feedback:
# ❌ Bad: Silent processing
def transcribe(video_path):
audio = extract_audio(video_path) # Takes 30s
result = whisper.transcribe(audio) # Takes 5 minutes
return result
# ✅ Good: Clear feedback
def transcribe(video_path):
progress.info("Step 1/3: Extracting audio...")
audio = extract_audio(video_path)
progress.success("Audio extracted")
progress.info("Step 2/3: Transcribing with Whisper...")
result = whisper.transcribe(audio)
progress.success(f"Transcribed {len(result.segments)} segments")
return resultUsers will wait longer if they know:
- What's happening
- Progress percentage
- Time remaining
- That it's not frozen
Not everyone needs AWS/Google Cloud:
[project]
dependencies = [
# Core dependencies (always installed)
"typer>=0.12.0",
"openai-whisper>=20231117",
]
[project.optional-dependencies]
cloud = [
# Optional: only install if needed
"boto3>=1.34.0",
"google-cloud-speech>=2.26.0",
]Install only what you need:
uv pip install -e . # Core only
uv pip install -e ".[cloud]" # With cloud services
uv pip install -e ".[dev]" # With dev toolsBad:
except Exception as e:
print(f"Error: {e}")
sys.exit(1)Good:
except FileNotFoundError as e:
progress.error(f"Video file not found: {video_path}")
progress.info("Make sure the file path is correct and the file exists")
logger.exception("File not found during transcription")
raise typer.Exit(code=1)
except ffmpeg.Error as e:
error_msg = e.stderr.decode() if e.stderr else str(e)
progress.error(f"FFmpeg failed: {error_msg}")
progress.info("Check that FFmpeg is installed: https://ffmpeg.org/download.html")
logger.exception("FFmpeg error")
raise typer.Exit(code=1)Users need to know:
- What went wrong
- Why it went wrong
- How to fix it
Use logging for debugging, Rich for user output:
from videotranslator.logger import logger # For developers
from videotranslator.ui import progress_manager # For users
def transcribe(video_path: Path):
logger.debug(f"Starting transcription for {video_path}")
progress_manager.info("Processing video...")
try:
result = whisper.transcribe(video_path)
logger.info(f"Transcribed {len(result.segments)} segments")
progress_manager.success("Transcription complete!")
return result
except Exception as e:
logger.exception("Transcription failed") # Full traceback to logs
progress_manager.error("Failed to transcribe video") # User-friendly
raiseWhy?
- Logs capture full context (stack traces, timestamps)
- User messages stay clean and helpful
- Can adjust log level without changing code
- Logs can go to files, users see terminal output
- Open Source First: Whisper eliminated cloud dependencies and costs
- Modern Tooling: uv + pyproject.toml streamlined development
- Type Safety: Type hints caught bugs before runtime
- Rich UI: Beautiful terminal output improved UX significantly
- Clear Architecture: Service layer made code maintainable and testable
- Add Tests Earlier: Write tests during refactoring, not after
- Gradual Migration: Could have created v2 alongside v1 for easier transition
- Performance Profiling: Should have profiled before optimizing
- User Feedback: Get early feedback on CLI design from potential users
If you're modernizing a similar project, here's your roadmap:
-
Step 1: Dependencies
- Create pyproject.toml
- Set up uv
- Define optional dependencies
-
Step 2: Structure
- Migrate to src/ layout
- Separate concerns (models, services, UI)
- Add init.py files
-
Step 3: Security
- Replace shell=True with list arguments
- Use pathlib instead of string concatenation
- Validate all user inputs
-
Step 4: Developer Experience
- Add type hints everywhere
- Set up logging (not just print)
- Configure linters (ruff, mypy, black)
-
Step 5: User Experience
- Add progress bars for long operations
- Improve error messages
- Create beautiful CLI with typer + rich
-
Step 6: Documentation
- Update README with new commands
- Add examples
- Document configuration options
-
Step 7: Testing
- Write unit tests for services
- Add integration tests for CLI
- Set up CI/CD
Before → After:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Dependencies | 56 | 14 (core) + optional | 74% reduction |
| Security Issues | 3 critical | 0 | 100% fixed |
| Type Coverage | 0% | ~95% | Full coverage |
| Test Coverage | 0% | Ready for tests | Infrastructure ready |
| CLI Startup | N/A | 50ms | Fast |
| Cost per transcription | $0.024/min | $0 | Free |
| Lines of Code | ~500 | ~1,500 | 3x (but way better) |
Tools:
- uv - Fast Python package manager
- pyproject.toml - Modern package format
- Typer - CLI framework
- Rich - Terminal formatting
- Pydantic - Data validation
- OpenAI Whisper - Speech recognition
Learning Resources:
- Python Packaging Guide
- Real Python - Project Structure
- Hypermodern Python
- Python Security Best Practices
Modernizing a codebase isn't just about using new tools—it's about applying lessons learned from years of Python development. The result is code that's:
- Safer: No security vulnerabilities
- Faster: Better performance and developer experience
- Clearer: Easy to understand and maintain
- Cheaper: No cloud costs
- Better: Superior user experience
The most important lesson? Technical debt compounds. Small improvements add up to a dramatically better codebase. Don't wait for a "big rewrite"—modernize incrementally, but do it thoughtfully.
Your future self (and contributors) will thank you.
Have questions about modernizing your Python project? Want to discuss any of these patterns? Found this helpful?
- GitHub: thisguymartin/AI-VideoTranslator
- Try it:
uv pip install ai-videotranslator && videotranslator transcribe video.mp4
Happy coding! 🚀