Ir Datasets Versions

Provides a common interface to many IR ranking datasets.

v0.5.7

2 weeks ago

v0.5.6

3 months ago

What's Changed

Full Changelog: https://github.com/allenai/ir_datasets/compare/v0.5.5...v0.5.6

v0.5.5

11 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/allenai/ir_datasets/compare/v0.5.4...v0.5.5

v0.5.4b

1 year ago

What's Changed

Full Changelog: https://github.com/allenai/ir_datasets/compare/v0.5.3...v0.5.4b

v0.5.3

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/allenai/ir_datasets/compare/v0.5.2...v0.5.3

v0.5.2

1 year ago

New Datasets

  • TREC Clinical Trials 2022
  • TREC Fair Ranking 2022
  • CODEC

Features / Bugfixes

  • Fix TREC Genomics Track 2005 description
  • Allow downloads to resume for all MSMARCO dataset resources larger than 500MB
  • For format support for disks45

Full Changelog: https://github.com/allenai/ir_datasets/compare/v0.5.1...v0.5.2
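
The resumable-download item in v0.5.2 can be illustrated with a small sketch. This is not the package's own implementation. It only shows the general HTTP technique of asking a server for the bytes that are still missing, and the function name `range_header_for_resume` is hypothetical.

```python
import os


def range_header_for_resume(partial_path):
    """Build request headers that ask the server for only the missing bytes.

    Hypothetical helper for illustration; not part of ir_datasets.
    """
    # Bytes already on disk from an earlier, interrupted attempt (0 if none).
    if os.path.exists(partial_path):
        already = os.path.getsize(partial_path)
    else:
        already = 0
    # "bytes=N-" is an open-ended range: from offset N to the end of the file.
    # With nothing on disk, send no Range header and download from the start.
    return {"Range": f"bytes={already}-"} if already > 0 else {}
```

A caller would typically append to the partial file when the server answers with status 206 (Partial Content) and start over when it answers with 200, since a 200 response means the range request was ignored.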

v0.5.1

2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/allenai/ir_datasets/compare/v0.5.0...v0.5.1

v0.5.0

2 years ago

New Features:

  • Metadata is included for all datasets, including record counts, without needing to download or process the data.
  • New entity type (qlogs) for query log records

New datasets:

  • argsme & touche (thanks @heinrichreimer!)
  • aol-ia dataset
  • tripclick logs
  • trec-dl-2021 qrels (active participants only for now)

Miscellaneous:

  • No longer updates the root logger instance, allowing other applications to easily customise logging output from this package
  • Updates to documentation
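
The logging change in v0.5.0 follows a common pattern for Python libraries, sketched below with the standard `logging` module. The logger name `example_pkg` and the `ListHandler` class are assumptions made for illustration; the logger name the package really uses is not stated in these notes.

```python
import logging

# Library side: log through a named logger and leave the root logger alone.
# "example_pkg" is a hypothetical package name.
lib_logger = logging.getLogger("example_pkg")
lib_logger.addHandler(logging.NullHandler())  # silent until the app adds output


def library_work():
    lib_logger.info("downloading resource")


# Application side: attach a custom handler and format to the library's logger.
class ListHandler(logging.Handler):
    """Collects formatted records in a list; stands in for any custom handler."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


app_handler = ListHandler()
app_handler.setFormatter(logging.Formatter("[%(name)s] %(message)s"))
lib_logger.addHandler(app_handler)
lib_logger.setLevel(logging.INFO)

library_work()
```

Because the library never reconfigures the root logger, the application decides where these messages go and how they are formatted.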

v0.4.3

2 years ago

Added:

  • trec-fair-2021/eval topics
  • clinicaltrials/2021/trec-ct-2021
  • c4 and c4/en-noclean-tr/trec-misinfo-2021
  • wikir/en78k and wikir/ens78k
  • msmarco-passage-v2/trec-dl-2021 and msmarco-document-v2/trec-dl-2021
  • mr-tydi
  • mmarco

Misc:

  • some minor changes to clean command
  • msmarco-passage-v2 lookups now performed by ID instead of lz4
  • file linking info not shown when downloading small files
  • fixed cord19/fulltext
  • other minor fixes

v0.4.2

2 years ago

Adds the following datasets:

  • MS MARCO Passage version 2
  • TREC Fair Ranking 2021

A few other minor improvements:

  • Progress bars: units + totals in a few more places
  • Checks for adequate disk space before big downloads (can be disabled with an environment variable)
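
A pre-download disk space check like the one mentioned for v0.4.2 could look roughly like this. It is a minimal sketch using `shutil.disk_usage`. The environment variable name `EXAMPLE_SKIP_DISK_CHECK` and the function `has_room_for` are hypothetical, since these notes do not name the real variable.

```python
import os
import shutil

# Hypothetical variable name; the real one is not given in these notes.
SKIP_ENV_VAR = "EXAMPLE_SKIP_DISK_CHECK"


def has_room_for(download_bytes, target_dir, free_bytes=None):
    """Return True if the filesystem holding target_dir can fit download_bytes.

    free_bytes may be supplied directly (handy for testing); otherwise it is
    read from the filesystem.
    """
    # Setting the environment variable to any non-empty value skips the check.
    if os.environ.get(SKIP_ENV_VAR):
        return True
    if free_bytes is None:
        free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= download_bytes
```

A downloader would call this before fetching a large resource and stop with a clear message when it returns False.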