OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!
[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/964
[Fix] Update links and link checkers by @Leymore in https://github.com/open-compass/opencompass/pull/890
[Feat] support apps by @Connor-Shen in https://github.com/open-compass/opencompass/pull/963
fix doc problem by @seanzhang-zhichen in https://github.com/open-compass/opencompass/pull/975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in https://github.com/open-compass/opencompass/pull/983
[Feature] add support for set prediction path by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/984
[Feat] Support TACO by @Connor-Shen in https://github.com/open-compass/opencompass/pull/966
[Feature] update apps by @Connor-Shen in https://github.com/open-compass/opencompass/pull/985
[Fix] update apps/taco by @Connor-Shen in https://github.com/open-compass/opencompass/pull/988
[Feature] add one script for subjective by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/993
Fix running issues in turbomind_tis by @ispobock in https://github.com/open-compass/opencompass/pull/992
[Fix] base.py change status into list by @Chaseldot in https://github.com/open-compass/opencompass/pull/994
[Fix] quick fix for configs by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/995
[Feature] update needlebench and configs by @DseidLi in https://github.com/open-compass/opencompass/pull/986
[Feature] support alpacaeval_v2 by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/1006
updates docs by @Y0oMu in https://github.com/open-compass/opencompass/pull/1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in https://github.com/open-compass/opencompass/pull/1020
[Feature] Add ATC Choice Version by @DseidLi in https://github.com/open-compass/opencompass/pull/1019
[Fix] Simplify needlebench summarizer by @DseidLi in https://github.com/open-compass/opencompass/pull/1024
For a detailed overview of all changes, check out our Full Changelog.
More parsed datasets are provided:
OpenCompassData-complete-20240325.zip
Important updates compared to the previous version are as follows:
Subjective: Add MTBench
LongText: Support Needle-In-Haystack Test Dataset
Code: Update generation version of CIBench
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.3! This version is packed with new features, crucial fixes, and documentation updates to improve your experience. We're continuously working to enhance OpenCompass, making it more robust and versatile for all users.
Explore the key updates in this release:
Dataset and Benchmark Expansion:
Model and API Integrations:
Documentation and Sync Updates:
For a detailed overview of all changes, check out our Full Changelog.
Welcome to OpenCompass v0.2.2, a release brimming with new features, essential fixes, and significant improvements across the board. With a focus on enhancing functionality and expanding dataset support, this update underscores our commitment to providing a robust platform for our users.
New datasets include T-Eval, CIBench, IFEval, NPHardEval, and more, broadening the horizons for research and evaluation.
Dive into what's new and improved:
Datasets Expansion:
API and Model Enhancements:
Documentation and CI Enhancements:
For a full list of updates, visit our Full Changelog.
Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool.
Remember to star our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.
More parsed datasets are provided:
OpenCompassData-core-20240207.zip
OpenCompassData-complete-20240207.zip
Important updates compared to the previous version are as follows:
OpenCompassData-core-20240207.zip
AGIEval | ARC | BBH | ceval | CLUE | cmmlu |
commonsenseqa | drop | FewCLUE | flores_first100 | GAOKAO-BENCH | gsm8k |
hellaswag | humaneval | lambada | LCSTS | math | mbpp |
mmlu | nq | openbookqa | piqa | race | siqa |
strategyqa | summedits | SuperGLUE | TheoremQA | triviaqa | tydiqa |
winogrande | xstory_cloze | Xsum |
OpenCompassData-complete-20240207.zip
AGIEval | anli | ARC | BBH | CDME | ceval |
cibench_dataset | cleva | clozeTest-maxmin | CLUE | CMB | cmmlu |
commonsenseqa | commonsenseqa_cn | crowspairs_cn | drop | ds1000_data | FewCLUE |
FinanceIQ | flores200_dataset | flores_first100 | FunctionalMT | game24 | GAOKAO-BENCH |
gpqa | gsm8k | hellaswag | humaneval | humaneval_cn | humaneval_multipl-e |
humanevalx | HungarianExamMath | InfiniteBench | lambada | lanQ | lawbench |
LCSTS | math | math401 | mbpp | mbpp_cn | mbpp_plus |
MedBench | mmlu | MNIST | NPHardEval | nq | nq_cn |
nq-open | openbookqa | piqa | py150 | qabench | race |
scibench | siqa | SQuAD2.0 | strategyqa | alignment_bench | mtbench |
summedits | SuperGLUE | svamp | teval | TheoremQA | triviaqa |
tydiqa | winogrande | xiezhi | xlsum | xstory_cloze | Xsum |
We're thrilled to announce OpenCompass v0.2.1, loaded with new datasets, features, and vital fixes. This release is a testament to our ongoing commitment to enhancing user experience and broadening research capabilities.
New datasets include GPQA, mastermath2024v1, and more, significantly expanding the scope of OpenCompass.
Here's what's new:
Dataset Expansion:
Functional Enhancements:
Documentation Updates:
For a full list of updates, visit our Full Changelog.
Thank you to all contributors for your hard work and dedication. OpenCompass v0.2.0 marks another step forward in our journey, bringing enhanced features and capabilities to the community. Let's continue to innovate and expand the horizons of OpenCompass!
Welcoming new contributors to the OpenCompass family!
Explore the detailed changes in the full changelog.
Thank you to all the contributors for this release. Your dedication and hard work continue to enhance OpenCompass, making it an ever-evolving and dynamic tool for the community. Let's dive into the new possibilities with OpenCompass v0.1.9!
A warm welcome to the new members of the OpenCompass community!
Explore the detailed changes in the full changelog.
Thank you to everyone who contributed to this release. Your efforts are immensely appreciated and are helping to make OpenCompass a more robust and versatile tool. Let's continue to push the boundaries with OpenCompass v0.1.8!