Detailed Leaderboard

Comprehensive breakdown of performance across 29 domains and 4 retrieval tasks.

Task 1: Query → Documents (Text-only Retrieval)

nDCG@10 scores, reported on a 0-100 scale, for 11 text retrieval models across 29 domains grouped into 4 categories.

Domain BM25 Contriever DiVeR E5 GritLM OpenAI Qwen Qwen2 Rader ReasonIR SFR
STEM & Life Sciences
Acad 3.1 26.8 42.9 35.7 36.7 38.5 22.7 35.7 29.7 40.1 35.7
Bio 8.1 16.7 32.6 21.8 21.9 33.7 21.6 27.4 26.6 26.3 26.9
Chem 18.2 19.2 25.9 24.7 31.7 32.6 21.9 32.9 29.6 37.0 29.0
Phys 5.5 15.7 25.6 21.9 20.0 27.7 20.3 27.4 22.9 27.0 25.5
Math 5.2 16.9 31.9 26.7 22.9 25.3 21.4 24.9 27.0 28.7 24.7
Earth 7.7 21.9 38.9 26.4 23.4 27.7 20.6 28.9 23.9 31.1 26.7
BioAc 4.7 22.3 29.7 20.7 19.9 25.2 20.9 22.9 22.1 22.9 24.2
BioInf 5.5 17.4 35.7 24.7 23.7 28.7 22.2 32.9 22.7 31.4 26.4
Med 6.7 17.9 32.7 26.4 22.9 28.9 20.4 25.9 25.7 29.7 27.7
Software & Technical Systems
Ubuntu 9.7 24.9 36.7 29.7 30.9 35.7 25.7 34.9 27.9 34.7 29.7
BTC 7.4 18.9 29.4 22.7 22.9 25.7 18.9 25.7 23.4 27.4 24.7
Crypto 4.7 14.4 24.7 17.7 18.4 22.7 16.7 22.9 17.9 21.7 19.7
QC 3.4 12.7 22.9 16.7 14.7 19.9 13.9 18.9 17.4 19.7 17.9
Robot 8.9 21.7 33.9 25.7 27.9 31.7 23.9 29.9 26.7 30.9 27.9
Sales 12.7 25.9 39.9 32.7 33.9 36.7 27.9 35.9 29.7 36.7 32.7
GIS 7.9 19.9 31.9 23.7 24.9 28.7 21.9 27.9 24.7 28.9 26.7
Apple 9.4 22.7 34.9 27.7 28.9 32.7 24.7 31.9 27.4 32.4 29.7
Social Sciences & Humanities
Econ 4.0 15.0 31.9 27.7 17.7 26.9 18.3 21.8 28.5 22.7 26.8
Psych 5.3 20.0 29.3 22.8 21.2 32.2 21.0 24.5 20.5 27.7 27.2
Phil 4.1 16.1 17.4 21.3 19.1 22.1 15.6 17.2 17.5 21.4 23.9
Law 6.2 49.7 52.3 50.2 42.5 51.4 47.1 59.3 44.2 57.8 45.4
Christ 30.3 21.3 35.2 29.9 28.9 21.5 24.3 37.4 14.1 26.6 24.7
Islam 9.8 21.0 34.9 27.5 28.8 27.3 17.7 34.3 20.6 34.3 27.5
Applied Domains
Aviat 1.1 21.2 29.5 21.9 29.2 30.4 12.6 31.3 21.7 25.3 28.0
Game 36.0 23.0 50.4 42.1 50.1 45.1 48.2 56.8 35.0 44.8 38.4
PM 18.0 24.9 40.4 31.0 34.8 24.5 39.9 36.5 27.7 34.7 27.8
Sustain 11.2 25.5 34.8 30.6 23.3 26.5 19.5 34.0 20.2 36.4 29.4
Travel 22.2 25.7 38.3 24.8 30.7 37.0 24.0 32.4 28.2 38.6 28.5
Quant 2.3 12.3 23.2 22.5 16.4 22.1 16.9 18.6 25.5 25.7 24.7
Average 8.5 20.1 32.2 25.3 25.3 28.8 21.5 28.1 24.9 28.6 26.9
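As a reference point for reading the scores above, the following is a minimal sketch of how nDCG@10 can be computed for one query and averaged over a set of queries. It assumes linear gains (rel_i) with a log2(rank + 1) discount, and it assumes a domain score is the unweighted mean over that domain's queries; the page specifies neither choice. Variants that use 2^rel - 1 gains give identical results for binary relevance. The function names and document IDs are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain of the first k gains.

    enumerate() yields 0-based indices, so rank r = index + 1 and the
    discount log2(r + 1) becomes log2(index + 2).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query (assumes linear gains; unjudged docs count as 0)."""
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    # Ideal ranking: all judged documents sorted by relevance, best first.
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    if idcg == 0.0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / idcg


def mean_ndcg_at_k(
    runs: Dict[str, List[str]],
    qrels_by_query: Dict[str, Dict[str, float]],
    k: int = 10,
) -> float:
    """Unweighted mean of per-query nDCG@k (an assumed domain-level aggregation)."""
    scores = [ndcg_at_k(runs.get(q, []), qrels, k) for q, qrels in qrels_by_query.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy query with hypothetical IDs: relevant docs d1 and d7 retrieved at ranks 1 and 3.
    qrels = {"d1": 1, "d7": 1}
    run = ["d1", "d4", "d7", "d2"]
    print(round(100 * ndcg_at_k(run, qrels), 1))  # prints 92.0 on the 0-100 scale
```

In the toy run, DCG = 1 + 1/log2(4) = 1.5 and the ideal DCG = 1 + 1/log2(3), giving about 0.92, which the leaderboard's 0-100 scale would show as 92.0.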

Dataset Summary

Total Queries: 2,803 (Tasks 1-2: 1,585 | Tasks 3-4: 1,218)

Document Corpus: 2.5M text and multimodal documents

Annotated Images: 7,621 (human-verified annotations)