Adding CrowsPairs task for English and French by oskarvanderwal · Pull Request #25 · bigscience-workshop/lm-evaluation-harness

oskarvanderwal · 2022-04-28T19:17:35Z

I can successfully run the CrowS-Pairs (multilingual) tasks for the prompts we have written.

For `python main.py --model gpt2 --device cpu --tasks crows_pairs_english` I get:

gpt2 (), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|       Task        |     Prompt      |Version|Metric|Value |   |Stderr|
|-------------------|-----------------|------:|------|-----:|---|-----:|
|crows_pairs_english|                1|      0|acc   |0.5092|±  |0.0122|
|crows_pairs_english|                2|      0|acc   |0.5069|±  |0.0122|
|crows_pairs_english|                3|      0|acc   |0.5069|±  |0.0122|
|crows_pairs_english|                4|      0|acc   |0.5194|±  |0.0122|
|crows_pairs_english|A_preference     |      0|acc   |0.4764|±  |0.0122|
|crows_pairs_english|A_stereotype_true|      0|acc   |0.4949|±  |0.0122|
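As a sanity check on the `Stderr` column, here is a minimal sketch that assumes the reported value is the usual standard error of a mean of 0/1 outcomes, i.e. roughly sqrt(acc * (1 - acc) / n). The helper names are hypothetical and not part of the harness. Under that assumption, a stderr of 0.0122 at an accuracy near 0.5 corresponds to an evaluation set of roughly 1,700 examples.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent 0/1 outcomes."""
    return math.sqrt(acc * (1.0 - acc) / n)


def implied_sample_size(acc: float, stderr: float) -> float:
    """Invert the formula above: which n would produce this stderr?"""
    return acc * (1.0 - acc) / stderr ** 2


# Values copied from the first row of the English table above.
n_est = implied_sample_size(0.5092, 0.0122)
print(f"implied number of examples: about {n_est:.0f}")

# Round trip: plugging the estimate back in reproduces the reported stderr.
print(f"stderr at that size: {binomial_stderr(0.5092, round(n_est)):.4f}")
```

The estimate is only approximate, because the reported stderr is rounded to four decimals and the harness may use a slightly different variance estimator.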

There are no official implementations of the CrowS-Pairs benchmark that work for autoregressive models like GPT-2.
With another implementation of CrowS-Pairs (an older version, though), I get a bias score of 0.593501326259947 for GPT-2. That is quite a bit higher, but it doesn't say much, since the two measures are operationalized so differently.
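For context on why the two numbers are hard to compare: a likelihood-based bias score is commonly defined as the fraction of sentence pairs for which the model scores the stereotypical sentence higher than the anti-stereotypical one, whereas the tables here report accuracy on prompted tasks. The sketch below illustrates only that generic definition. It is an assumption that the other implementation works this way, the function name is hypothetical, and the scores are made-up toy values rather than GPT-2 outputs.

```python
from typing import Iterable, Tuple


def bias_score(pair_scores: Iterable[Tuple[float, float]]) -> float:
    """Fraction of pairs where the stereotypical sentence gets the higher score.

    Each item is (score_stereotypical, score_anti_stereotypical), for example
    sentence log-likelihoods under a language model. 0.5 would mean no
    preference in either direction.
    """
    pairs = list(pair_scores)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for stereo, anti in pairs if stereo > anti)
    return preferred / len(pairs)


# Toy log-likelihoods, invented purely for illustration.
toy_scores = [(-41.2, -43.0), (-37.5, -36.9), (-52.1, -55.4), (-29.8, -29.9)]
print(bias_score(toy_scores))  # 3 of the 4 toy pairs favour the stereotype -> 0.75
```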

For `python main.py --model gpt2 --device cpu --tasks crows_pairs_french` I get:

|       Task       |       Prompt       |Version|Metric|Value |   |Stderr|
|------------------|--------------------|------:|------|-----:|---|-----:|
|crows_pairs_french|A_preference_fr     |      0|acc   |0.4997|±  |0.0122|
|crows_pairs_french|A_reality_check_fr  |      0|acc   |0.5134|±  |0.0122|
|crows_pairs_french|A_stereotype_true_fr|      0|acc   |0.5224|±  |0.0122|
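To read the two tables together, the sketch below expresses each accuracy as a distance from 0.5 in units of the reported stderr. It assumes that 0.5 is the chance level for every prompt, which the tables themselves do not state. The values are copied from the English and French tables above.

```python
# (task, prompt) -> accuracy, copied from the two result tables.
results = {
    ("crows_pairs_english", "1"): 0.5092,
    ("crows_pairs_english", "2"): 0.5069,
    ("crows_pairs_english", "3"): 0.5069,
    ("crows_pairs_english", "4"): 0.5194,
    ("crows_pairs_english", "A_preference"): 0.4764,
    ("crows_pairs_english", "A_stereotype_true"): 0.4949,
    ("crows_pairs_french", "A_preference_fr"): 0.4997,
    ("crows_pairs_french", "A_reality_check_fr"): 0.5134,
    ("crows_pairs_french", "A_stereotype_true_fr"): 0.5224,
}

STDERR = 0.0122  # identical for every row in both tables
CHANCE = 0.5     # assumed chance level, not stated in the tables


def z_vs_chance(acc: float, stderr: float = STDERR, chance: float = CHANCE) -> float:
    """How many standard errors the accuracy lies from the chance level."""
    return (acc - chance) / stderr


z_scores = {key: z_vs_chance(acc) for key, acc in results.items()}

for (task, prompt), z in z_scores.items():
    print(f"{task:20s} {prompt:22s} z = {z:+.2f}")
```

Under that assumption, every prompt lands within about two standard errors of 0.5, so none of the individual GPT-2 results is clearly separated from chance.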

StellaAthena · 2022-04-28T20:35:31Z

Great work, thanks for the PR

oskarvanderwal and others added 2 commits April 28, 2022 21:10

Added CrowsPairs for English and French

2d861a2

Merge branch 'bigscience-workshop:master' into master

073b080

StellaAthena merged commit 22155f7 into bigscience-workshop:master Apr 28, 2022

StellaAthena merged 2 commits into bigscience-workshop:master from oskarvanderwal:master
