We welcome contributions from the research community! This guide outlines how to participate in this project and contribute to the development of benchmarks for evaluating LLM performance on digital humanities tasks.
We use a structured approach to contributions based on five key roles in digital humanities benchmarking:
Domain Expert — what it involves: provide subject knowledge and shape research questions
- Help define relevant task categories for humanities research
- Contribute to annotation design and interpretation of model outputs
- Point out cultural and linguistic complexities that may affect performance
- Review benchmark designs for domain relevance
Data Curator — what it involves: collect, clean, and prepare high-quality datasets
- Source historical documents, images, or other humanities materials
- Clean and format data following FAIR principles (Findable, Accessible, Interoperable, Reusable)
- Handle licensing and permissions for data use
- Address potential biases in digitization, selection, and representation
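As an illustration only (not a required schema), a curated dataset's metadata record might capture provenance, licensing, and known biases in Dublin Core-style fields; all field names and values below are hypothetical examples:

```python
import json

# Hypothetical metadata record for a curated dataset. Field names are
# illustrative, not a schema required by this project.
dataset_metadata = {
    "title": "Example parish register scans, 1850-1900",
    "creator": "Example Archive",
    "license": "CC BY 4.0",
    "source": "https://example.org/collection/123",
    "description": "Digitised register pages with known gaps in coverage",
    "known_biases": "Only urban parishes digitised; rural records underrepresented",
}

# Serialising to JSON keeps the record machine-readable alongside the data.
print(json.dumps(dataset_metadata, indent=2))
```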
Annotator — what it involves: create ground truth data and reference annotations
- Develop accurate transcriptions, tags, labels, or other annotations
- Follow clear guidelines to ensure consistency across annotators
- Recognize that interpretive variation is often part of humanities tasks
- Validate existing ground truth data
Analyst — what it involves: develop meaningful scoring and evaluation criteria
- Propose evaluation criteria early in the benchmark development process
- Use standard measures where appropriate (e.g., precision, recall, edit distance)
- Adapt metrics to reflect what is meaningful in specific scholarly contexts
- Analyze benchmark results and identify patterns
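As a sketch of how a standard measure can be adapted to a scholarly task, the snippet below turns Levenshtein edit distance into a character error rate for transcription tasks. Function names are illustrative and not part of the project's actual scoring code:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character
                            curr[j - 1] + 1,      # insert a character
                            prev[j - 1] + cost))  # substitute a character
        prev = curr
    return prev[-1]


def cer(prediction: str, ground_truth: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    return edit_distance(prediction, ground_truth) / max(len(ground_truth), 1)
```

A lower CER is better; for historical transcriptions it can be more informative than exact-match accuracy because it credits partially correct readings.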
Engineer — what it involves: implement benchmarks and build reproducible workflows
- Build and document baseline systems and reproducible workflows
- Implement scoring functions and evaluation metrics
- Support comparisons through tools like dashboards or summary reports
- Aim for accessibility across different levels of technical experience
- Explore existing benchmarks - Review our benchmark documentation to understand the current scope
- Identify your expertise - Consider which of the five contribution areas align with your skills
- Join the conversation - Contact us (see Contact section) to discuss potential contributions
- Start small - Consider validating existing ground truth or reviewing benchmark designs before proposing new benchmarks
If you want to create a new benchmark:
- Propose your idea - Contact the team to discuss scope, feasibility, and fit with project goals
- Assemble a team - Ideally include contributors covering multiple roles (domain expert, data curator, annotator, analyst, engineer)
- Follow our structure - Use our benchmark template and directory structure
- Iterate and refine - Work with the team to refine your benchmark based on feedback
- Clone the repository

  ```shell
  git clone https://github.com/RISE-UNIBAS/humanities_data_benchmark.git
  cd humanities_data_benchmark
  ```

- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Set up API keys (if testing models)

  ```shell
  cp .env.example .env
  # Edit the .env file with your API keys
  ```
Each benchmark must include:
- `README.md` - Description using our template
- `images/` - Source images or documents
- `prompts/` - Text prompts for the models
- `ground_truths/` - Reference answers in JSON or text format
- `benchmark.py` (optional) - Custom scoring logic
- `dataclass.py` (optional) - Structured output schemas
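The optional `benchmark.py` holds custom scoring logic. The interface expected by the benchmark runner is defined by the project's template, so treat everything below as a hypothetical sketch with placeholder names: a scoring function that compares a model's JSON response to a ground-truth file and reports field-level accuracy.

```python
import json
from pathlib import Path


def score_response(response_text: str, ground_truth_path: Path) -> dict:
    """Illustrative scorer: compare a model's JSON answer to a
    ground-truth JSON file, field by field. Names are placeholders,
    not the project's actual scoring interface."""
    truth = json.loads(Path(ground_truth_path).read_text(encoding="utf-8"))
    try:
        answer = json.loads(response_text)
    except json.JSONDecodeError:
        # Models sometimes return malformed JSON; score that explicitly
        # rather than crashing the run.
        return {"valid_json": False, "accuracy": 0.0}
    correct = sum(answer.get(key) == value for key, value in truth.items())
    return {"valid_json": True, "accuracy": correct / max(len(truth), 1)}
```

Returning a small dict of named scores (rather than a single number) makes it easier to aggregate results across models in dashboards or summary reports.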
- Follow existing code style and conventions
- Include clear documentation for any new functions or classes
- Test your benchmark with at least one model before submitting
- Use meaningful variable and function names
- Follow FAIR principles for data management
- Include proper attribution and licensing information
- Ensure data quality through validation and review
- Document any preprocessing steps clearly
- Consider privacy and ethical implications of your data
- Provide clear annotation guidelines
- Include multiple examples of edge cases
- Document any subjective decisions or interpretations
- Ensure consistency across annotators
- Validate ground truth through peer review when possible
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-benchmark`)
- Make your changes following our standards
- Test your benchmark thoroughly
- Submit a pull request with clear description of changes
- Contact the team before collecting or preparing large datasets
- Ensure you have proper permissions and licenses
- Follow our data structure and documentation standards
- Provide metadata about your dataset
- Submit through the agreed process with the team
- Follow existing annotation guidelines or propose new ones
- Include documentation of your annotation process
- Have your work reviewed by domain experts
- Provide inter-annotator agreement statistics if applicable
- Submit alongside documentation of annotation decisions
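For agreement statistics, Cohen's kappa is a common choice when two annotators assign one categorical label per item. A minimal sketch (names are illustrative; the project may prefer other statistics such as Krippendorff's alpha for more than two annotators):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators with categorical labels."""
    assert len(labels_a) == len(labels_b), "annotators must label the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # degenerate case: both always use the same single label
        return 1.0
    return (observed - expected) / (1 - expected)
```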
Contributors are recognized in our CONTRIBUTORS.md file based on their specific contributions across the five roles. When contributing, please:
- Provide your ORCID identifier if available
- Specify which roles you contributed to
- Include any institutional affiliations
- Provide GitHub username for technical contributions
We are committed to fostering an inclusive and respectful research environment:
- Be respectful and constructive in all interactions
- Acknowledge and cite others' work appropriately
- Be open to feedback and different perspectives
- Focus on what is best for the research community
- Report any unacceptable behavior to the project maintainers
- Check existing GitHub issues to see if your topic has been discussed
- Create a new issue to report bugs, suggest features, or discuss improvements
- Provide detailed descriptions and steps to reproduce for bug reports
- Clearly explain the motivation and use case for feature requests
- Contact the research team for discussions about scope, methodology, or domain-specific questions
- Join our research meetings (contact us for information)
- Use GitHub Discussions for general questions about the project
- Email the project maintainers for sensitive or private matters
For questions about contributions, please contact:
- Maximilian Hindermann: maximilian.hindermann@unibas.ch
- Sorin Marti: sorin.marti@unibas.ch
- Project README - Overview of the benchmark suite
- CONTRIBUTORS.md - Detailed attribution information
- Benchmark Template - Template for new benchmarks
- DataCite Contributor Types - Standard contributor classification