# Contributing to Digital Collections Explorer

Thank you for your interest in contributing to Digital Collections Explorer! This document provides guidelines and information to help you contribute effectively.
## Table of Contents

- Code of Conduct
- How Can I Contribute?
- Development Setup
- Coding Standards
- Commit Guidelines
- Pull Request Process
- Testing Guidelines
- Documentation
- Research Contributions
## Code of Conduct

This project adheres to a Code of Conduct that all contributors are expected to follow. Please read CODE_OF_CONDUCT.md before contributing.
## How Can I Contribute?

### Reporting Bugs

Before creating bug reports, please check existing issues to avoid duplicates. When creating a bug report, include as many details as possible using our bug report template:
- Clear and descriptive title
- Steps to reproduce the behavior
- Expected vs. actual behavior
- Screenshots or error logs
- Environment details (OS, Python/Node versions, collection type)
### Suggesting Enhancements

Enhancement suggestions are tracked as GitHub issues. When creating an enhancement suggestion:
- Use a clear and descriptive title
- Provide a detailed description of the proposed functionality
- Explain why this enhancement would be useful
- Include mockups or examples if applicable
As an academic project, we especially welcome:
- New CLIP model fine-tuning approaches for digital collections
- Evaluation methodologies and benchmark datasets
- User studies or usability research findings
- Domain-specific optimizations for different collection types
- Performance improvements in embedding generation or search
Please open an issue or discussion to share your research ideas and findings.
### Improving Documentation

Documentation contributions are always welcome:

- Fix typos or clarify existing documentation
- Add examples or tutorials
- Translate documentation (if multilingual support is needed)
- Improve API documentation or code comments
### Contributing Code

We welcome code contributions including:
- Bug fixes
- New features
- Performance improvements
- Code refactoring
- Test coverage improvements
## Development Setup

### Backend Setup

1. Fork and clone the repository

   ```bash
   git clone https://github.com/YOUR-USERNAME/digital-collections-explorer.git
   cd digital-collections-explorer
   ```

2. Set up the Python environment

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. Configure the project

   ```bash
   npm install
   npm run setup -- --type=photographs  # or maps, documents
   ```

4. Generate test embeddings (optional, for testing)

   ```bash
   # Add some test images to data/raw
   python -m src.models.clip.generate_embeddings
   ```

5. Start the backend server

   ```bash
   python -m src.backend.main
   ```

### Frontend Setup

1. Navigate to the frontend directory

   ```bash
   cd src/frontend/photographs  # or maps, documents
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Start the development server

   ```bash
   npm run dev
   ```
The frontend will be available at http://localhost:5173 and will proxy API requests to http://localhost:8000.
## Coding Standards

### Python (Backend)

- **Formatting**: Use `black` with default settings

  ```bash
  black .
  ```

- **Import sorting**: Use `isort` with the black-compatible profile

  ```bash
  isort .
  ```

- **Style guidelines**:
  - Follow PEP 8
  - Use type hints where applicable
  - Write docstrings for public functions and classes
  - Keep functions focused and single-purpose

- **File organization**:
  - `api/routes/`: HTTP endpoint handlers
  - `services/`: Business logic and ML operations
  - `models/schemas.py`: Pydantic models for validation
  - `core/`: Configuration and shared utilities
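To keep `black` and `isort` in agreement, their settings can be pinned in a shared config file. The fragment below is a sketch assuming the project keeps tool configuration in `pyproject.toml` (adjust to wherever configuration actually lives):

```toml
# Hypothetical pyproject.toml fragment: isort's "black" profile matches
# black's defaults for line length, trailing commas, and wrapping.
[tool.isort]
profile = "black"
```

With this in place, running `isort .` followed by `black .` produces stable, non-conflicting output.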
### JavaScript/React (Frontend)

- **Linting**: Run ESLint before committing

  ```bash
  npm run lint
  ```

- **Style guidelines**:
  - Use functional components with hooks
  - Keep components small and reusable
  - Use meaningful variable and function names
  - Add comments for complex logic

- **Component structure**:
  - One component per file
  - Co-locate CSS with components
  - Use absolute imports where configured
## Commit Guidelines

We encourage (but don't require) the Conventional Commits format:

```
<type>(<scope>): <subject>

<body>

<footer>
```

Types:

- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, no logic change)
- `refactor`: Code refactoring
- `perf`: Performance improvements
- `test`: Adding or updating tests
- `chore`: Maintenance tasks
- `research`: Research-related contributions (papers, evaluations, benchmarks)

Examples:

```
feat(search): add image similarity threshold parameter
fix(embeddings): handle grayscale images correctly
docs(readme): clarify Docker deployment steps
research(clip): add fine-tuning results for historical maps
```
## Pull Request Process

1. Create a feature branch

   ```bash
   git checkout -b feature/your-feature-name
   # or
   git checkout -b fix/bug-description
   ```

2. Make your changes
   - Write clear, focused commits
   - Add tests if applicable
   - Update documentation as needed

3. Run quality checks

   ```bash
   # Python
   black .
   isort .
   pytest  # if you added tests

   # JavaScript (in frontend directory)
   npm run lint
   ```

4. Push your branch

   ```bash
   git push origin feature/your-feature-name
   ```

5. Open a Pull Request
   - Use the PR template
   - Reference related issues
   - Provide a clear description of the changes
   - Add screenshots for UI changes

6. Code review
   - Address review feedback
   - Keep the PR focused on a single concern
   - Be responsive to comments

7. Merge
   - PRs require approval from maintainers
   - Ensure all CI checks pass
   - Squashing commits may be requested for a cleaner history
## Testing Guidelines

### Backend Testing

We use pytest for Python testing:

```bash
# Run all tests
pytest

# Run a specific test file
pytest tests/test_embedding_service.py

# Run with coverage
pytest --cov=src
```

Test guidelines:

- Write unit tests for `services/` logic
- Add integration tests for API routes
- Test edge cases and error handling
- Mock external dependencies (CLIP model, file I/O)
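As a sketch of the mocking guideline above, a unit test can stand in for the CLIP model with `unittest.mock` so no model weights are loaded. The `EmbeddingService` class and `encode_image` method here are hypothetical illustrations, not the project's actual API:

```python
from unittest.mock import MagicMock

# Hypothetical service under test: in the real project this kind of logic
# would live under services/ and call into a CLIP model wrapper.
class EmbeddingService:
    def __init__(self, model):
        self.model = model

    def embed(self, image_path: str):
        """Return the embedding vector for an image via the injected model."""
        return self.model.encode_image(image_path)

def test_embed_delegates_to_model():
    # Replace the heavyweight CLIP model with a mock.
    fake_model = MagicMock()
    fake_model.encode_image.return_value = [0.1, 0.2, 0.3]

    service = EmbeddingService(fake_model)
    result = service.embed("data/raw/example.jpg")

    # The service should pass the path through and return the model's output.
    fake_model.encode_image.assert_called_once_with("data/raw/example.jpg")
    assert result == [0.1, 0.2, 0.3]

test_embed_delegates_to_model()
```

Injecting the model as a constructor argument (rather than instantiating it inside the service) is what makes this substitution possible.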
### Frontend Testing

Currently, frontend testing is minimal. Contributions to improve test coverage are welcome:
- Unit tests for utility functions
- Component tests with React Testing Library
- Integration tests for search workflows
## Documentation

### Code Documentation

- **Python**: Use docstrings (Google or NumPy style)

  ```python
  def generate_embedding(image_path: str) -> torch.Tensor:
      """Generate CLIP embedding for an image.

      Args:
          image_path: Path to the input image file

      Returns:
          torch.Tensor: The embedding vector

      Raises:
          FileNotFoundError: If image file doesn't exist
      """
  ```

- **JavaScript**: Use JSDoc for complex functions

  ```javascript
  /**
   * Performs semantic search using text or image input
   * @param {string} query - Text query or image URL
   * @param {number} limit - Maximum number of results
   * @returns {Promise<Array>} Search results
   */
  ```
### Project Documentation

- Keep README.md up to date with new features
- Add examples for new functionality
- Update configuration documentation
- Create tutorial content for complex workflows
## Research Contributions

If you use this project in research:
- **Cite the project**: Use the DOI badge in README.md
- **Share your findings**: Open an issue or PR with your results
- **Contribute improvements**: If your research leads to better models or methods, consider contributing them back
- **Datasets**: If you create benchmark datasets, consider sharing them (with appropriate licenses)
- **Fine-tuned models**: Share model weights via Hugging Face or similar platforms
- **Evaluation scripts**: Contribute evaluation code to help others reproduce results
We welcome collaboration with researchers:
- Joint development of new features
- Evaluation and benchmarking studies
- User research and usability studies
- Domain-specific applications
Please reach out via issues or email to discuss research collaborations.
## Getting Help

- **Documentation**: Check README.md and code comments
- **Issues**: Search existing issues or create a new one
- **Discussions**: Use GitHub Discussions for questions and ideas
- **Email**: For private inquiries or collaboration proposals
## Recognition

Contributors will be:
- Listed in project documentation
- Acknowledged in academic papers (if applicable)
- Given credit in release notes
Thank you for contributing to Digital Collections Explorer! 🎉