We welcome contributions from the research community! This guide outlines how to participate in this project and contribute to the development of benchmarks for evaluating LLM performance on digital humanities tasks.
We use a structured approach to contributions based on five key roles in digital humanities benchmarking:
Domain Expert — what it involves: provide subject knowledge and shape research questions
- Help define relevant task categories for humanities research
- Contribute to annotation design and interpretation of model outputs
- Point out cultural and linguistic complexities that may affect performance
- Review benchmark designs for domain relevance
Data Curator — what it involves: collect, clean, and prepare high-quality datasets
- Source historical documents, images, or other humanities materials
- Clean and format data following FAIR principles (Findable, Accessible, Interoperable, Reusable)
- Handle licensing and permissions for data use
- Address potential biases in digitization, selection, and representation
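As an illustration only (not a required schema), a curated dataset's metadata record might capture provenance, licensing, and known biases in Dublin Core-style fields; all field names and values below are hypothetical examples:

```python
import json

# Hypothetical metadata record for a curated dataset. Field names are
# illustrative, not a schema required by this project.
dataset_metadata = {
    "title": "Example parish register scans, 1850-1900",
    "creator": "Example Archive",
    "license": "CC BY 4.0",
    "source": "https://example.org/collection/123",
    "description": "Digitised register pages with known gaps in coverage",
    "known_biases": "Only urban parishes digitised; rural records underrepresented",
}

# Serialising to JSON keeps the record machine-readable alongside the data.
print(json.dumps(dataset_metadata, indent=2))
```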
Annotator — what it involves: create ground truth data and reference annotations
- Develop accurate transcriptions, tags, labels, or other annotations
- Follow clear guidelines to ensure consistency across annotators
- Recognize that interpretive variation is often part of humanities tasks
- Validate existing ground truth data
Analyst — what it involves: develop meaningful scoring and evaluation criteria
- Propose evaluation criteria early in the benchmark development process
- Use standard measures where appropriate (e.g., precision, recall, edit distance)
- Adapt metrics to reflect what is meaningful in specific scholarly contexts
- Analyze benchmark results and identify patterns
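As a sketch of how a standard measure can be adapted to a scholarly task, the snippet below turns Levenshtein edit distance into a character error rate for transcription tasks. Function names are illustrative and not part of the project's actual scoring code:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character
                            curr[j - 1] + 1,      # insert a character
                            prev[j - 1] + cost))  # substitute a character
        prev = curr
    return prev[-1]


def cer(prediction: str, ground_truth: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    return edit_distance(prediction, ground_truth) / max(len(ground_truth), 1)
```

A lower CER is better; for historical transcriptions it can be more informative than exact-match accuracy because it credits partially correct readings.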
Engineer — what it involves: implement benchmarks and build reproducible workflows
- Build and document baseline systems and reproducible workflows
- Implement scoring functions and evaluation metrics
- Support comparisons through tools like dashboards or summary reports
- Aim for accessibility across different levels of technical experience
- Explore existing benchmarks - Review our benchmark documentation to understand the current scope
- Identify your expertise - Consider which of the five contribution areas align with your skills
- Join the conversation - Contact us (see Contact section) to discuss potential contributions
- Start small - Consider validating existing ground truth or reviewing benchmark designs before proposing new benchmarks
If you want to create a new benchmark:
- Propose your idea - Contact the team to discuss scope, feasibility, and fit with project goals
- Assemble a team - Ideally include contributors covering multiple roles (domain expert, data curator, annotator, analyst, engineer)
- Follow our structure - Use our benchmark template and directory structure
- Iterate and refine - Work with the team to refine your benchmark based on feedback
- Clone the repository

  ```shell
  git clone https://github.com/RISE-UNIBAS/humanities_data_benchmark.git
  cd humanities_data_benchmark
  ```

- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Set up API keys (if testing models)

  ```shell
  cp .env.example .env
  # Edit the .env file with your API keys
  ```
Each benchmark must include:
- `README.md` - Description using our template
- `images/` - Source images or documents
- `prompts/` - Text prompts for the models
- `ground_truths/` - Reference answers in JSON or text format
- `benchmark.py` (optional) - Custom scoring logic
- `dataclass.py` (optional) - Structured output schemas
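The optional `benchmark.py` holds custom scoring logic. The interface expected by the benchmark runner is defined by the project's template, so treat everything below as a hypothetical sketch with placeholder names: a scoring function that compares a model's JSON response to a ground-truth file and reports field-level accuracy.

```python
import json
from pathlib import Path


def score_response(response_text: str, ground_truth_path: Path) -> dict:
    """Illustrative scorer: compare a model's JSON answer to a
    ground-truth JSON file, field by field. Names are placeholders,
    not the project's actual scoring interface."""
    truth = json.loads(Path(ground_truth_path).read_text(encoding="utf-8"))
    try:
        answer = json.loads(response_text)
    except json.JSONDecodeError:
        # Models sometimes return malformed JSON; score that explicitly
        # rather than crashing the run.
        return {"valid_json": False, "accuracy": 0.0}
    correct = sum(answer.get(key) == value for key, value in truth.items())
    return {"valid_json": True, "accuracy": correct / max(len(truth), 1)}
```

Returning a small dict of named scores (rather than a single number) makes it easier to aggregate results across models in dashboards or summary reports.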
- Follow existing code style and conventions
- Include clear documentation for any new functions or classes
- Test your benchmark with at least one model before submitting
- Use meaningful variable and function names
- Follow FAIR principles for data management
- Include proper attribution and licensing information
- Ensure data quality through validation and review
- Document any preprocessing steps clearly
- Consider privacy and ethical implications of your data
- Provide clear annotation guidelines
- Include multiple examples of edge cases
- Document any subjective decisions or interpretations
- Ensure consistency across annotators
- Validate ground truth through peer review when possible
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-benchmark`)
- Make your changes following our standards
- Test your benchmark thoroughly
- Submit a pull request with clear description of changes
- Contact the team before collecting or preparing large datasets
- Ensure you have proper permissions and licenses
- Follow our data structure and documentation standards
- Provide metadata about your dataset
- Submit through the agreed process with the team
- Follow existing annotation guidelines or propose new ones
- Include documentation of your annotation process
- Have your work reviewed by domain experts
- Provide inter-annotator agreement statistics if applicable
- Submit alongside documentation of annotation decisions
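For agreement statistics, Cohen's kappa is a common choice when two annotators assign one categorical label per item. A minimal sketch (names are illustrative; the project may prefer other statistics such as Krippendorff's alpha for more than two annotators):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators with categorical labels."""
    assert len(labels_a) == len(labels_b), "annotators must label the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # degenerate case: both always use the same single label
        return 1.0
    return (observed - expected) / (1 - expected)
```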
Contributors are recognized in our CONTRIBUTORS.md file based on their specific contributions across the five roles. When contributing, please:
- Provide your ORCID identifier if available
- Specify which roles you contributed to
- Include any institutional affiliations
- Provide GitHub username for technical contributions
We are committed to fostering an inclusive and respectful research environment:
- Be respectful and constructive in all interactions
- Acknowledge and cite others' work appropriately
- Be open to feedback and different perspectives
- Focus on what is best for the research community
- Report any unacceptable behavior to the project maintainers
- Check existing GitHub issues to see if your topic has been discussed
- Create a new issue to report bugs, suggest features, or discuss improvements
- Provide detailed descriptions and steps to reproduce for bug reports
- Clearly explain the motivation and use case for feature requests
- Contact the research team for discussions about scope, methodology, or domain-specific questions
- Join our research meetings (contact us for information)
- Use GitHub Discussions for general questions about the project
- Email the project maintainers for sensitive or private matters
For questions about contributions, please contact:
- Maximilian Hindermann: maximilian.hindermann@unibas.ch
- Sorin Marti: sorin.marti@unibas.ch
- Project README - Overview of the benchmark suite
- CONTRIBUTORS.md - Detailed attribution information
- Benchmark Template - Template for new benchmarks
- DataCite Contributor Types - Standard contributor classification