We need to run a benchmark on the BEIR MSMARCO dataset to get a better understanding of how the models perform on retrieval tasks.
We can use the test split available on the Hugging Face hub (see the loading sketch below):
- QRels
- Corpus
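A minimal sketch of loading these from the hub with the `datasets` library; the dataset IDs (`BeIR/msmarco`, `BeIR/msmarco-qrels`) and split names are assumptions about how the BEIR datasets are published there:

```python
from datasets import load_dataset

# Corpus: passage ids, titles, and text (assumed "corpus" config/split).
corpus = load_dataset("BeIR/msmarco", "corpus", split="corpus")

# Queries: all MSMARCO queries; only those with test qrels are needed.
queries = load_dataset("BeIR/msmarco", "queries", split="queries")

# QRels: relevance judgments for the test split (assumed split name).
qrels = load_dataset("BeIR/msmarco-qrels", split="test")

print(corpus[0], queries[0], qrels[0])
```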
Proposed metrics:
- NDCG@10
- Precision@10
- Recall@100
Non-judged documents should be considered non-relevant (see the scoring sketch below).
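A minimal sketch of computing these metrics with `pytrec_eval`, assuming a `run` dict produced by whichever model is under evaluation; note that `pytrec_eval`, like `trec_eval`, only counts judged documents as relevant, so non-judged documents are treated as non-relevant by default:

```python
import pytrec_eval

# Hypothetical toy data: qrels map query id -> {doc id: relevance};
# the run maps query id -> {doc id: retrieval score} from the model.
qrels = {"q1": {"d1": 1, "d2": 0}}
run = {"q1": {"d1": 0.9, "d3": 0.8, "d2": 0.1}}  # d3 is non-judged

evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {"ndcg_cut.10", "P.10", "recall.100"}
)
per_query = evaluator.evaluate(run)

# Average each metric over all queries.
for metric in ("ndcg_cut_10", "P_10", "recall_100"):
    mean = sum(q[metric] for q in per_query.values()) / len(per_query)
    print(f"{metric}: {mean:.4f}")
```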