Hi! I've set up my dataset as a benchmark following the eval-results docs — eval.yaml is in the repo root. Could you add it to the benchmark allow-list?
Dataset: https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1
It's a 209-question multiple-choice benchmark on Taiwan law (Traditional Chinese), using the inspect-ai framework. Thanks!