A comprehensive guide to fine-tuning EasyOCR's text recognition models on custom datasets. This repository provides a step-by-step tutorial for training domain-specific OCR models that outperform generic pre-trained models on specialized text.
While EasyOCR provides excellent out-of-the-box performance for general text recognition, it may struggle with:
- Technical jargon and domain-specific terminology
- Unique fonts or stylized text
- Specialized formatting (invoices, forms, technical documents)
- Low-quality or degraded text
- Languages or scripts with limited training data
This tutorial shows you how to fine-tune EasyOCR to achieve superior accuracy on your specific use case.
- Complete end-to-end training pipeline
- Automated data preprocessing and LMDB conversion
- Detailed explanations for each training step
- Model conversion utilities for EasyOCR compatibility
- Best practices and troubleshooting tips
- Ready-to-use Google Colab notebook
- Python 3.11 or higher
- Basic understanding of machine learning concepts
- Training images with corresponding text labels
- Google Colab account (for GPU training) or local GPU setup
Click the badge below to open the notebook directly in Google Colab:
- Clone the repository
```
git clone https://github.com/AbdullahButt2611/EasyOCR-Custom-Training.git
cd EasyOCR-Custom-Training
```
- Install dependencies
```
pip install -r requirements.txt
```
- Prepare your training data
```
train_data/
├── image1.jpg
├── image2.jpg
├── ...
└── gt.txt
```
- Run the notebook
```
jupyter notebook EasyOCR_Custom_Training.ipynb
```

```
easyocr-custom-training/
│
├── EasyOCR_Custom_Training.ipynb   # Main training notebook
├── README.md                       # This file
├── requirements.txt                # Python dependencies
├── LICENSE                         # MIT License
│
├── examples/                       # Example data and results
│   ├── sample_data/                # Sample training images
│   └── results/                    # Example outputs
│
└── utils/                          # Helper scripts (optional)
    ├── data_preparation.py
    └── model_converter.py
```
Your training data should follow this structure:
```
train_data/
├── image1.jpg
├── image2.jpg
├── image3.jpg
└── gt.txt
```
The ground truth file should contain tab-separated values:
```
image1.jpg	Hello World
image2.jpg	Sample Text
image3.jpg	Custom Label
```
Important Notes:
- Use a TAB character (not spaces) between the filename and the label
- One line per image
- UTF-8 encoding
- Labels should match the exact text in the image
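Formatting mistakes in `gt.txt` (spaces instead of tabs, missing images, empty labels) are a common cause of silent training failures, so it is worth checking the file before converting it to LMDB. A minimal validation sketch, assuming only the tab-separated layout described above:

```python
from pathlib import Path

def validate_gt(gt_path: str) -> list[str]:
    """Return a list of problems found in a tab-separated gt.txt file."""
    problems = []
    root = Path(gt_path).parent
    lines = Path(gt_path).read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        if "\t" not in line:
            problems.append(f"line {lineno}: no TAB separator")
            continue
        filename, label = line.split("\t", 1)
        if not label.strip():
            problems.append(f"line {lineno}: empty label")
        if not (root / filename).exists():
            problems.append(f"line {lineno}: missing image {filename}")
    return problems
```

An empty return value means every line has a tab, a non-empty label, and an image file that actually exists next to `gt.txt`.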
The notebook covers:
1. Environment Setup
   - Installing dependencies
   - Cloning the Deep Text Recognition Benchmark
2. Data Preprocessing
   - Ground truth file formatting
   - LMDB dataset creation
3. Framework Compatibility
   - PyTorch compatibility fixes
   - CPU/GPU configuration
4. Model Training
   - Architecture selection (VGG + BiLSTM + CTC)
   - Hyperparameter configuration
   - Training monitoring
5. Model Conversion
   - Converting to EasyOCR format
   - Model deployment preparation
6. Testing & Evaluation
   - Testing on sample images
   - Performance evaluation
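The LMDB datasets used by the Deep Text Recognition Benchmark store one entry per sample under `image-%09d` / `label-%09d` keys, plus a `num-samples` count. A minimal sketch of that key layout, using a plain dict in place of a real LMDB environment (assumption: the standard benchmark key scheme; the notebook's conversion step uses the benchmark's own `create_lmdb_dataset.py`):

```python
def build_lmdb_records(samples):
    """samples: list of (image_bytes, label) pairs.

    Returns the key/value mapping that would be written to LMDB,
    following the deep-text-recognition-benchmark convention:
    image-%09d -> raw image bytes, label-%09d -> UTF-8 label.
    """
    records = {}
    for i, (image_bytes, label) in enumerate(samples, start=1):
        records[f"image-{i:09d}".encode()] = image_bytes
        records[f"label-{i:09d}".encode()] = label.encode("utf-8")
    records[b"num-samples"] = str(len(samples)).encode()
    return records
```

Knowing this layout makes it easy to spot-check a generated dataset (e.g. confirm `num-samples` matches the number of lines in `gt.txt`).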
Choose from different combinations:
| Component | Options |
|---|---|
| Transformation | None, TPS |
| Feature Extraction | VGG, RCNN, ResNet |
| Sequence Modeling | None, BiLSTM |
| Prediction | CTC, Attn |
```
--exp_name my_model    # Experiment name
--batch_size 8         # Batch size (adjust for GPU memory)
--num_iter 3000        # Total training iterations
--valInterval 100      # Validation frequency
--lr 1                 # Learning rate
--workers 4            # Number of data loading workers
```
With proper training data (200+ samples):
- Training time: 30-60 minutes (1000 iterations on GPU)
- Accuracy improvement: 20-50% over generic models
- Best for: Domain-specific text with 50+ unique vocabulary items
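Putting the architecture choice and the flags above together, a full invocation of the benchmark's `train.py` might look like the following sketch (the dataset paths and train/validation split names are assumptions; adjust them to your LMDB output):

```shell
python train.py \
  --train_data lmdb_output/train \
  --valid_data lmdb_output/val \
  --select_data "/" --batch_ratio 1 \
  --Transformation None --FeatureExtraction VGG \
  --SequenceModeling BiLSTM --Prediction CTC \
  --exp_name my_model --batch_size 8 \
  --num_iter 3000 --valInterval 100 \
  --lr 1 --workers 4
```

The four architecture flags correspond directly to the component table above, so swapping in `TPS`, `ResNet`, or `Attn` only requires changing those values.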
Monitor these during training:
- Train Loss: Should steadily decrease
- Validation Loss: Should decrease without diverging from train loss
- Accuracy: Target 80%+ on validation set
- Normalized Edit Distance: Target < 0.10
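The normalized edit distance target above can be checked by hand: it is the Levenshtein distance between prediction and ground truth divided by the longer string's length. A minimal standalone sketch (the benchmark's own implementation may normalize slightly differently):

```python
def normalized_edit_distance(pred: str, truth: str) -> float:
    """Levenshtein distance divided by the longer string's length (0 = exact match)."""
    if not pred and not truth:
        return 0.0
    # Classic dynamic-programming edit distance over a rolling row.
    prev = list(range(len(truth) + 1))
    for i, pc in enumerate(pred, start=1):
        curr = [i]
        for j, tc in enumerate(truth, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (pc != tc)))   # substitution
        prev = curr
    return prev[-1] / max(len(pred), len(truth))
```

For example, predicting "Helo" for "Hello" gives 1/5 = 0.20, which already misses the < 0.10 target on that sample.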
Q: Training loss not decreasing
- Check data quality and label accuracy
- Increase training iterations
- Try different learning rates (0.5, 1.0, 2.0)
Q: Out of memory errors
- Reduce `batch_size`
- Use GPU runtime in Colab
- Reduce image resolution
Q: Model not loading in EasyOCR
- Verify model conversion completed successfully
- Check that converted model is in correct directory
- Ensure key names match EasyOCR's expected format
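One frequent key-name mismatch is the `module.` prefix that `DataParallel` training adds to every parameter name in the checkpoint. A minimal sketch of stripping it from a loaded state dict (assumption: the prefix is the only difference; a real conversion may need further renames, which is what `utils/model_converter.py` handles):

```python
from collections import OrderedDict

def strip_module_prefix(state_dict):
    """Remove a leading 'module.' from each key, leaving other keys untouched."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len("module."):] if key.startswith("module.") else key
        cleaned[new_key] = value
    return cleaned
```

Applied to a checkpoint loaded with `torch.load(...)`, this returns a dict whose keys can be compared directly against the target model's `state_dict().keys()` to find any remaining mismatches.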
Q: Low accuracy on validation set
- Add more diverse training samples
- Increase `num_iter` to 3000-5000
- Try different model architectures
Q: Overfitting (train accuracy >> validation accuracy)
- Add more training data
- Reduce model complexity
- Implement data augmentation
- Minimum 200 images recommended
- Cover all characters/symbols you need to recognize
- Include variations in lighting, angles, and quality
- Balance dataset (similar samples per class)
- Start with 1000 iterations, increase if needed
- Monitor validation metrics closely
- Save checkpoints regularly
- Test on completely unseen data
- For short text (1-10 chars): VGG + BiLSTM + CTC
- For longer text: ResNet + BiLSTM + Attn
- For simple fonts: VGG + None + CTC (faster)
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch
```
git checkout -b feature/amazing-feature
```
- Commit your changes
```
git commit -m 'Add amazing feature'
```
- Push to the branch
```
git push origin feature/amazing-feature
```
- Open a Pull Request
- Add example datasets for different domains
- Implement data augmentation utilities
- Create model evaluation scripts
- Add support for additional architectures
- Improve documentation
- EasyOCR Official Repository
- Deep Text Recognition Benchmark Paper
- CTC Loss Explanation
- LMDB Documentation
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: abutt2210@gmail.com
If you find this repository helpful, please consider giving it a star! It helps others discover this resource.
Made with ❤️ for the OCR community
Last updated: January 2026