OpenCompass Versions

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

0.2.5.rc1

2 weeks ago

0.2.4

1 month ago

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!

🌟 Highlights

Enhanced support for multiple datasets including QuALITY, APPS and TACO.
Introduced multi-model judging for subjective tests.
Bug fixes and improvements in configurations and documentation.

🚀 New Features

🌐 General

Feature #963 - Support for the APPS dataset.
Feature #976 - Add the implementation of QuALITY datasets.
Feature #984 - Add support for setting prediction paths.
Feature #1006 - Support alpacaeval_v2.
Feature #1016 - Add multi-model judge.
Feature #1019 - Add ATC Choice Version.

📖 Documentation

Updates docs #1015 - General documentation updates and improvements.

🐛 Bug Fixes

Fix #964 - Fix the config's name of deepseek-coder.
Fix #890 - Update links and link checkers.
Fix #977 - Fix a bug in internlm2 series configs.
Fix #975 - Fix documentation issues.
Fix #992 - Fix running issues in turbomind_tis.
Fix #994 - Change status to list in base.py.
Fix #995, Fix #1020 - Quick fixes and refactors for configs.

⚙ Enhancements and Refactors

Modify requirements/runtime.txt #983 - Update numpy version requirement.
Update Needlebench and configs #986 - Enhancements in Needlebench configurations.
Simplify needlebench summarizer #1024 - Streamline Needlebench summarizer for better efficiency.
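
The numpy change in #983 relaxes an exact pin (numpy==1.23.4) to a lower bound (numpy>=1.23.4). The sketch below illustrates the practical difference. It is not pip's resolver: `parse` and `satisfies` are hypothetical helpers that handle only plain dotted numeric versions and the two operators involved.

```python
def parse(version: str) -> tuple:
    """Turn '1.23.4' into (1, 23, 4); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, requirement: str) -> bool:
    """Check an installed version against a '==X' or '>=X' requirement.

    Hypothetical helper for illustration only; pip's actual resolver
    supports many more operators and version formats.
    """
    if requirement.startswith("=="):
        return parse(installed) == parse(requirement[2:])
    if requirement.startswith(">="):
        return parse(installed) >= parse(requirement[2:])
    raise ValueError(f"unsupported requirement: {requirement}")


# Old pin: only numpy 1.23.4 itself is accepted.
print(satisfies("1.26.0", "==1.23.4"))  # False
# New lower bound: any release from 1.23.4 upward is accepted.
print(satisfies("1.26.0", ">=1.23.4"))  # True
```

With the lower bound, environments that already ship a newer numpy no longer need a downgrade to install the requirements.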

🎉 Welcome New Contributors

@seanzhang-zhichen, @kleinzcy, @ispobock, @Chaseldot, and @Y0oMu made their first contributions. Welcome to the OpenCompass community!

🔗 Full Change Logs

[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/964
[Fix] Update links and link checkers by @Leymore in https://github.com/open-compass/opencompass/pull/890
[Feat] support apps by @Connor-Shen in https://github.com/open-compass/opencompass/pull/963
fix doc problem by @seanzhang-zhichen in https://github.com/open-compass/opencompass/pull/975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in https://github.com/open-compass/opencompass/pull/983
[Feature] add support for set prediction path by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/984
[Feat] Support TACO by @Connor-Shen in https://github.com/open-compass/opencompass/pull/966
[Feature] update apps by @Connor-Shen in https://github.com/open-compass/opencompass/pull/985
[Fix] update apps/taco by @Connor-Shen in https://github.com/open-compass/opencompass/pull/988
[Feature] add one script for subjective by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/993
Fix running issues in turbomind_tis by @ispobock in https://github.com/open-compass/opencompass/pull/992
[Fix] base.py change status into list by @Chaseldot in https://github.com/open-compass/opencompass/pull/994
[Fix] quick fix for configs by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/995
[Feature] update needlebench and configs by @DseidLi in https://github.com/open-compass/opencompass/pull/986
[Feature] support alpacaeval_v2 by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/1006
updates docs by @Y0oMu in https://github.com/open-compass/opencompass/pull/1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in https://github.com/open-compass/opencompass/pull/1020
[Feature] Add ATC Choice Version by @DseidLi in https://github.com/open-compass/opencompass/pull/1019
[Fix] Simplify needlebench summarizer by @DseidLi in https://github.com/open-compass/opencompass/pull/1024

For a detailed overview of all changes, check out our Full Changelog.

0.2.4.rc1

1 month ago

This release provides more parsed datasets:

OpenCompassData-complete-20240325.zip

Important updates compared to the previous version are as follows:

Subjective: Add MTBench
LongText: Support Needle-In-Haystack Test Dataset
Code: Update generation version of CIBench

0.2.3

1 month ago

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.3! This version is packed with new features, crucial fixes, and documentation updates to improve your experience. We're continuously working to enhance OpenCompass, making it more robust and versatile for all users.

🌟 Highlights:

Enhanced Model Support: Introduction of new models and configurations, including support for the LightllmApi, lmdeploy pytorch engine, and more.
New Datasets and Benchmarks: Expanding our dataset repository with additions like OpenFinData, lveval benchmark, and an upgrade to Needlebench.
Documentation and Sync Improvements: Updated dataset pack URLs, fixed documentation errors, and synchronized with internal codes for consistency.

Explore the key updates in this release:

🌟 New Features:

📦 Dataset and Benchmark Expansion:
- Support for new datasets like OpenFinData and an upgrade to Needlebench, offering broader evaluation capabilities (#896, #913).
- Introduction of the lveval benchmark to enrich the evaluation landscape (#914).
🛠 Model and API Integrations:
- Enhanced functionality with support for LightllmApi input_format and prompt templates, alongside the introduction of get_ppl for TurbomindModel (#888, #878).
- New model configurations added, including support for gemini and deepseek-coder, further broadening the tools available for users (#931, #943).
📖 Documentation and Sync Updates:
- Updated dataset pack URLs and rank link in README to ensure users have access to the latest resources (#922, #911).
- Several syncs with internal code and a GitHub blacklist update to maintain consistency and integrity (#929, #953).

🐛 Bug Fixes:

Addressed various configuration and template issues to ensure smoother operation across different models and benchmarks (#894, #893).
Fixed issues related to IFEval, including type hints and config bugs, enhancing evaluation accuracy and functionality (#906, #915).
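
The type-hint fix (#915) targets Python 3.8 and earlier. These notes do not show the change itself. A common cause of this kind of breakage is subscripting built-in containers in annotations (for example `list[str]`), which raises `TypeError` at definition time before Python 3.9. The sketch below shows the portable form; `count_keywords` is a hypothetical function, not taken from the IFEval code.

```python
from typing import Dict, List


def count_keywords(responses: List[str], keyword: str) -> Dict[str, int]:
    """Count how often `keyword` appears in each response.

    Hypothetical function for illustration. The point is the annotations:
    typing.List / typing.Dict work on Python 3.8, whereas list[str] /
    dict[str, int] need Python 3.9 or newer.
    """
    return {response: response.count(keyword) for response in responses}


print(count_keywords(["a b a", "b"], "a"))  # {'a b a': 2, 'b': 0}
```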

🎉 Welcome New Contributors:

We're delighted to welcome our new contributors: @xu-song, @x22x22, @yuantao2108, and @fanqiNO1. Your contributions are invaluable to the growth of OpenCompass!

🔗 Full Changelog

Support LightllmApi input_format by @helloyongyang in https://github.com/open-compass/opencompass/pull/888
[Fix] rename qwen2-beta -> qwen1.5 by @Leymore in https://github.com/open-compass/opencompass/pull/894
[Fix] Fix chatglm2 config by @Leymore in https://github.com/open-compass/opencompass/pull/893
[Fix] Fix moss template config by @xu-song in https://github.com/open-compass/opencompass/pull/897
Support lmdeploy pytorch engine by @RunningLeon in https://github.com/open-compass/opencompass/pull/875
[Fix] fix ifeval by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/906
[Fix] fix ifeval by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/909
[Fix] Fix type hint in IFEval for python<=3.8 by @Leymore in https://github.com/open-compass/opencompass/pull/915
[Docs] Update dataset pack urls by @Leymore in https://github.com/open-compass/opencompass/pull/922
[Sync] update github blacklist by @Leymore in https://github.com/open-compass/opencompass/pull/929
[Feature] add support for gemini by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/931
[Feature] Support OpenFinData by @Skyfall-xzz in https://github.com/open-compass/opencompass/pull/896
[Fix]Fixed the problem of never entering task.run() mode in local scheduling mode. by @x22x22 in https://github.com/open-compass/opencompass/pull/930
Add VLLM Model Configs by @DseidLi in https://github.com/open-compass/opencompass/pull/938
[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench by @DseidLi in https://github.com/open-compass/opencompass/pull/913
[Feature] add lveval benchmark by @yuantao2108 in https://github.com/open-compass/opencompass/pull/914
[Sync] Sync with internal 2023.03.04 by @Leymore in https://github.com/open-compass/opencompass/pull/941
[Fix] fix a bug of humanevalplus config by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/944
[Feature] Add configs of deepseek-coder by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/943
Fix FinanceIQ_datasets import error by @xu-song in https://github.com/open-compass/opencompass/pull/939
[Docs] Update rank link in README by @fanqiNO1 in https://github.com/open-compass/opencompass/pull/911
Support get_ppl for TurbomindModel by @RunningLeon in https://github.com/open-compass/opencompass/pull/878
Support prompt template for LightllmApi. Update LightllmApi token bucket. by @helloyongyang in https://github.com/open-compass/opencompass/pull/945
Fix LightllmApi ppl test by @helloyongyang in https://github.com/open-compass/opencompass/pull/951
[Fix] Chinese version of ReadTheDoc by @tonysy in https://github.com/open-compass/opencompass/pull/947
[fix] add different temp for different question in mtbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/954
[Sync] Sync with internal codes 2024.03.08 by @Leymore in https://github.com/open-compass/opencompass/pull/953
[Docs] Update README by @tonysy in https://github.com/open-compass/opencompass/pull/956
[Misc] Update owners by @Leymore in https://github.com/open-compass/opencompass/pull/961
[Fix] Use logger.error on failure by @Leymore in https://github.com/open-compass/opencompass/pull/960
[Sync] Bump version 0.2.3 by @Leymore in https://github.com/open-compass/opencompass/pull/957

For a detailed overview of all changes, check out our Full Changelog.

0.2.2

3 months ago

Welcome to OpenCompass v0.2.2, a release brimming with new features, essential fixes, and significant improvements across the board. With a focus on enhancing functionality and expanding dataset support, this update underscores our commitment to providing a robust platform for our users.

🌟 Highlights:

Broadened Dataset Support: Introduction of diverse datasets like T-Eval, CIBench, IFEval, NPHardEval, and more, broadening the horizons for research and evaluation.
API Integrations and Updates: New support for APIs like Nanbeige and updates to existing ones such as Zhipu and Sensetime, enhancing model interaction capabilities.
Dataset Collection Release: An integrated dataset collection is available in 0.2.2.rc1. The datasets used in the OpenCompass 2.0 leaderboard are NOT included in this collection.

Dive into what's new and improved:

🌟 New Features:

📦 Datasets Expansion:
- Addition of multiple new datasets and evaluations, including IFEval, MTBench, AlpacaEval, and NPHardEval, offering more versatility for users (#813, #829, #809, #835).
🛠 API and Model Enhancements:
- Support for new APIs like Nanbeige and updates to enhance the functionality of existing ones (#786, #847, #834).
- Configurations and support for models and evaluators have been improved and expanded (#791, #812, #845).
📖 Documentation and CI Enhancements:
- Updated FAQs, contribution guides, and added new test runners to improve CI/CD processes (#830, #751, #874).

🐛 Bug Fixes:

Various fixes have been applied to address issues across datasets, evaluators, and configurations, ensuring a smoother experience for all users (#787, #788, #789).

🎉 Welcome New Contributors:

We're excited to welcome our new contributors: @notoschord, @zhulinJulia24, @QipengGuo, @RangiLyu, @del-zhenwu, and @hailsham. Thank you for your valuable contributions!

🔗 Full Changelog

Dev by @xmshi-trio in https://github.com/open-compass/opencompass/pull/779
[Fix] add temperature in alles by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/787
[Feature] Add support of Nanbeige API by @notoschord in https://github.com/open-compass/opencompass/pull/786
[Fix] Update gsm8k agent prompt by @tonysy in https://github.com/open-compass/opencompass/pull/788
[Fix] hot fix for requirements by @yingfhu in https://github.com/open-compass/opencompass/pull/789
[Feature] Add configs for creationbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/791
Add test runner, one case, daily and pr trigger by @zhulinJulia24 in https://github.com/open-compass/opencompass/pull/751
[Fix] reorganize subject files by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/801
Update evaluate turbomind by @RunningLeon in https://github.com/open-compass/opencompass/pull/804
Added support for multi-needle testing in needle-in-a-haystack test by @DseidLi in https://github.com/open-compass/opencompass/pull/802
[Sync] Add InternLM2 Keyset Evaluation Demo by @Leymore in https://github.com/open-compass/opencompass/pull/807
[Doc] Update news by @Leymore in https://github.com/open-compass/opencompass/pull/810
Fix turbomind and update docs by @RunningLeon in https://github.com/open-compass/opencompass/pull/808
fix configs template for yi_6b_200k model by @DseidLi in https://github.com/open-compass/opencompass/pull/815
Test runner update - split step, change schedule time and disable hf cache by @zhulinJulia24 in https://github.com/open-compass/opencompass/pull/814
Add LightllmApi KeyError log & Update doc by @helloyongyang in https://github.com/open-compass/opencompass/pull/816
Update cdme config and evaluator by @QipengGuo in https://github.com/open-compass/opencompass/pull/812
Update hf_internlm2_chat template by @RangiLyu in https://github.com/open-compass/opencompass/pull/823
[Feature] add Compass arena by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/828
[Fix] fix strings by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/833
[Feature] Add IFEval by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/813
[Feature] add mtbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/829
[Feature] Update API implementation by @tonysy in https://github.com/open-compass/opencompass/pull/834
[Doc] Update FAQ & Contribution Guide by @Leymore in https://github.com/open-compass/opencompass/pull/830
add fail notify by @zhulinJulia24 in https://github.com/open-compass/opencompass/pull/836
[Sync] Updata dataset cfg for InternMath by @Leymore in https://github.com/open-compass/opencompass/pull/837
[Fix] fix corev2 by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/838
[Feat] minor update agent related by @yingfhu in https://github.com/open-compass/opencompass/pull/839
[Update] Update Sensetime API by @tonysy in https://github.com/open-compass/opencompass/pull/844
[Fix] Update MedBench by @xmshi-trio in https://github.com/open-compass/opencompass/pull/845
[Fix] Fix acc of IFEval by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/849
[Fix] Update Zhipu API and Fix issue min_out_len issue of API models by @tonysy in https://github.com/open-compass/opencompass/pull/847
Create link-check.yml by @del-zhenwu in https://github.com/open-compass/opencompass/pull/853
Update runtime.txt to fix rouge_chinese bugs. by @QipengGuo in https://github.com/open-compass/opencompass/pull/803
[Fix] fix compass arena by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/854
add end_str for turbomind by @RunningLeon in https://github.com/open-compass/opencompass/pull/859
add daily test case by @zhulinJulia24 in https://github.com/open-compass/opencompass/pull/864
[Feature] support alpacaeval by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/809
[Fix] Fix error in gsm8k evaluator by @yanyc428 in https://github.com/open-compass/opencompass/pull/782
[CI] Update github workflow image by @Leymore in https://github.com/open-compass/opencompass/pull/874
Update daily test by @zhulinJulia24 in https://github.com/open-compass/opencompass/pull/871
support NPHardEval by @Skyfall-xzz in https://github.com/open-compass/opencompass/pull/835
[Fix] add do sample demo for subjective dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/873
[Sync] Sync with internal codes 2024.02.05 by @Leymore in https://github.com/open-compass/opencompass/pull/876
[Fix] hotfix for mtbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/877
fix lawbench 2-1 f0.5 score calculation bug by @Yggdrasill7D6 in https://github.com/open-compass/opencompass/pull/795
[feat] support multipl-e by @Connor-Shen in https://github.com/open-compass/opencompass/pull/846
fix bug of gsm8k_postprocess by @hailsham in https://github.com/open-compass/opencompass/pull/863
[Feature] add global retriever config by @hailsham in https://github.com/open-compass/opencompass/pull/842

For a full list of updates, visit our Full Changelog.

Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉

Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.

0.2.2.rc1

3 months ago

This release provides more parsed datasets:

OpenCompassData-core-20240207.zip
OpenCompassData-complete-20240207.zip

Important updates compared to the previous version are as follows:

Subjective: Add AlignBench, MTBench
Agent: Add T-Eval
Medicine: Add MedBench
Code: Add HumanEval-X, DS-1000
Finance: Add FinanceIQ
Law: Update LawBench Evaluation Assets

OpenCompassData-core-20240207.zip


AGIEval	ARC	BBH	ceval	CLUE	cmmlu
commonsenseqa	drop	FewCLUE	flores_first100	GAOKAO-BENCH	gsm8k
hellaswag	humaneval	lambada	LCSTS	math	mbpp
mmlu	nq	openbookqa	piqa	race	siqa
strategyqa	summedits	SuperGLUE	TheoremQA	triviaqa	tydiqa
winogrande	xstory_cloze	Xsum

OpenCompassData-complete-20240207.zip


AGIEval	anli	ARC	BBH	CDME	ceval
cibench_dataset	cleva	clozeTest-maxmin	CLUE	CMB	cmmlu
commonsenseqa	commonsenseqa_cn	crowspairs_cn	drop	ds1000_data	FewCLUE
FinanceIQ	flores200_dataset	flores_first100	FunctionalMT	game24	GAOKAO-BENCH
gpqa	gsm8k	hellaswag	humaneval	humaneval_cn	humaneval_multipl-e
humanevalx	HungarianExamMath	InfiniteBench	lambada	lanQ	lawbench
LCSTS	math	math401	mbpp	mbpp_cn	mbpp_plus
MedBench	mmlu	MNIST	NPHardEval	nq	nq_cn
nq-open	openbookqa	piqa	py150	qabench	race
scibench	siqa	SQuAD2.0	strategyqa	alignment_bench	mtbench
summedits	SuperGLUE	svamp	teval	TheoremQA	triviaqa
tydiqa	winogrande	xiezhi	xlsum	xstory_cloze	Xsum

0.2.1

4 months ago

We're thrilled to announce OpenCompass v0.2.1, loaded with new datasets, features, and vital fixes. This release is a testament to our ongoing commitment to enhancing user experience and broadening research capabilities.

🌟 Highlights:

Add Agent and Code datasets: Diverse new datasets like GPQA, mastermath2024v1, and more, significantly expanding the scope of OpenCompass.
Support for Different JudgeLLMs in Subjective Evaluation: More options when choosing a judge LLM.
Support Needle in Haystack: Added the Needle in Haystack test for long-text evaluation.
Add VLLM Evaluation: We support VLLM inference and evaluation.
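
As a rough illustration of the Needle in Haystack idea mentioned above: a short fact (the needle) is placed at a chosen depth inside long filler text, and the model is then asked to retrieve it. The sketch below is not OpenCompass's implementation; `build_haystack_prompt`, its parameters, and the prompt layout are hypothetical.

```python
def build_haystack_prompt(filler: str, needle: str, depth: float, question: str) -> str:
    """Insert `needle` into `filler` at a relative depth in [0.0, 1.0].

    depth=0.0 puts the needle at the start, depth=1.0 at the end.
    Insertion happens on a sentence boundary so that no filler sentence
    is cut in half.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [s.strip() for s in filler.split(".") if s.strip()]
    position = round(depth * len(sentences))
    sentences.insert(position, needle.rstrip("."))
    context = ". ".join(sentences) + "."
    return f"{context}\n\nQuestion: {question}\nAnswer:"


filler = "The sky is blue. Grass is green. Water is wet. Snow is cold."
needle = "The secret code is 4721."
prompt = build_haystack_prompt(filler, needle, depth=0.5,
                               question="What is the secret code?")
print(prompt)
```

Sweeping `depth` from 0.0 to 1.0 and varying the filler length gives the usual grid of retrieval results by position and context size.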

Here's what's new:

🚀 New Features:

📦 Dataset and Model Expansion:
- Added rwkv-5-3b model (#666)
- Integration of diverse datasets including GPQA, Creationbench, and more.
- Support for new datasets like mastermath2024v1, mbpp_plus, and sanitized_mbpp (#744, #770, #745)
🛠 Functional Enhancements:
- Subjective evaluation improvements (#692, #724)
- Updated python action, slurm, and docker docs (#694, #718)
- Turbomind API support and Qwen API integration (#693, #735)
📖 Documentation Updates:
- Updated contamination, alignmentbench, and other docs for better clarity (#698, #707)
- Fixed dead links and typos in various documents (#455, #773, #774)

🐛 Bug Fixes:

Addressed various issues including those in alignmentbench, configs, and postprocess scripts.
Fixed bugs concerning subjective evaluation and EOS string detection.
Quick fixes for improved performance and reliability.

🎉 Welcome New Contributors:

A warm welcome to our first-time contributors:
- @BBuf, @DseidLi, @Skyfall-xzz, @RunningLeon, @zehuichen123, @AllentDan, @Connor-Shen, @Francis-llgg, @hzhwcmhf, @ChrisLiu6, @yanyc428, @tpoisonooo, @jiangjin1999

🔗 Full Changelog

add rwkv-5-3b model by @BBuf in https://github.com/open-compass/opencompass/pull/666
[Feature] Add double order of subjective evaluation and removing duplicated response among two models by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/692
[Feat] update python action and slurm by @yingfhu in https://github.com/open-compass/opencompass/pull/694
[Doc] Update contamination docs by @Leymore in https://github.com/open-compass/opencompass/pull/698
alignmentbench infer and judge by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/697
[Fix] Update alignmentbench by @tonysy in https://github.com/open-compass/opencompass/pull/704
removed redundant code in GSM8KDataset.load method. by @DseidLi in https://github.com/open-compass/opencompass/pull/700
[Fix] fix a bug on configs/eval_mixtral_8x7b.py by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/706
[Doc] Update Doc for Alignbench by @tonysy in https://github.com/open-compass/opencompass/pull/707
[Fix] minor fix openai by @yingfhu in https://github.com/open-compass/opencompass/pull/711
Add Judgellms by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/710
[Feat] Update math/agent by @yingfhu in https://github.com/open-compass/opencompass/pull/716
[Docs] update docker docs by @yingfhu in https://github.com/open-compass/opencompass/pull/718
[Fix] Quick fix for max_out_len in subjective evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/719
[Feature] Support the use of humaneval_plus. by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/720
[Feature] Add reasonbench dataset by @Skyfall-xzz in https://github.com/open-compass/opencompass/pull/577
[Feature] Add abbr for judgemodel in subjective evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/724
Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend by @RunningLeon in https://github.com/open-compass/opencompass/pull/721
[News] add news for T-Eval by @zehuichen123 in https://github.com/open-compass/opencompass/pull/727
Add NeedleInAHaystack Test Support by @DseidLi in https://github.com/open-compass/opencompass/pull/714
[Fix] Fixed abbr erro of subjective alignbench and size partition by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/730
add turbomind restful api support by @AllentDan in https://github.com/open-compass/opencompass/pull/693
[Fix] Update merge script for non-split settting by @tonysy in https://github.com/open-compass/opencompass/pull/733
[Sync] Sync with internal codes by @Leymore in https://github.com/open-compass/opencompass/pull/734
[Feature] Add InfiniteBench by @philipwangOvO in https://github.com/open-compass/opencompass/pull/739
Update LightllmApi and Fix mmlu bug by @helloyongyang in https://github.com/open-compass/opencompass/pull/738
[Feature] Add other judgelm prompts for Alignbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/731
[Feat] support sanitized mbpp dataset by @yingfhu in https://github.com/open-compass/opencompass/pull/745
[Fix] SubSizePartition fix by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/746
add chinese version of humaneval, mbpp by @Connor-Shen in https://github.com/open-compass/opencompass/pull/743
[Fix] fix erro in configs by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/750
[Feature] Add Creationbench Dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/753
[Feat] update code config by @yingfhu in https://github.com/open-compass/opencompass/pull/749
update plot function in tools_needleinahaystack.py by @DseidLi in https://github.com/open-compass/opencompass/pull/747
[Feature] Add new dataset mastermath2024v1 by @Francis-llgg in https://github.com/open-compass/opencompass/pull/744
[Feature] Add GPQA Dataset by @Francis-llgg in https://github.com/open-compass/opencompass/pull/729
change NeedleInAHaystackDataset to dynamic loading by @DseidLi in https://github.com/open-compass/opencompass/pull/754
[Feature] Add support of Qwen API by @hzhwcmhf in https://github.com/open-compass/opencompass/pull/735
[Feature] Support LLaMA2-Accessory by @ChrisLiu6 in https://github.com/open-compass/opencompass/pull/732
[Fix] Fix small bug in alignbench by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/764
[Feature] Add multi_round dataset evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/766
[Feature] add subject ir dataset by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/755
[Update] Update introduction of CompassBench-2024-Q1 by @tonysy in https://github.com/open-compass/opencompass/pull/769
[Fix] quick fix for postprocess by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/771
Support Mbpp_plus dataset by @Connor-Shen in https://github.com/open-compass/opencompass/pull/770
[Fix] fix typos in drop prompt by @yanyc428 in https://github.com/open-compass/opencompass/pull/773
typo(installation.md): fix unzip commands by @tpoisonooo in https://github.com/open-compass/opencompass/pull/774
Contamination analysis for MMLU, Hellaswag, and ARC_c by @liyucheng09 in https://github.com/open-compass/opencompass/pull/699
[Docs] Update contamination docs by @Leymore in https://github.com/open-compass/opencompass/pull/775
[Feature] _batch_generate function, add the MultiTokenEOSCriteria by @jiangjin1999 in https://github.com/open-compass/opencompass/pull/772
[Sync] Sync with internal codes 2023.01.08 by @Leymore in https://github.com/open-compass/opencompass/pull/777

For a full list of updates, visit our Full Changelog.

Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉

Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.

0.2.0

4 months ago

🌟 Highlights

🛠 Data Contamination Analysis: A novel feature for analyzing and ensuring the integrity of dataset inputs.
🧠 Enhanced Subjective Evaluation: Implementation of a new subjective judgement system, providing more nuanced and accurate evaluations.
🚀 Chat Style Inferencer Support: Introduction of a new chat style inferencer, enhancing interactive capabilities.
🌐 Multilingual Features: Expansion to support Chinese versions of commonsenseqa, crowspairs, and nq datasets.
📊 New Datasets Integration: Addition of wikibench, rolebench, and updated versions of gsm8k and MathBench datasets for broader research applications.
🛠 Enhancements and Bug Fixes: Numerous improvements including a new subjective judgement system and updates in MathBench CodeInterpreter.
📝 Documentation and API Updates: Comprehensive updates to README and API interfaces for better user guidance and experience.

🚀 New Features & Enhancements

Support for chat style inferencer, offering a more dynamic interaction model (#643).
Addition of Chinese versions for key datasets: commonsenseqa, crowspairs, and nq (#144).
Introduction of the wikibench dataset, providing a new benchmark for knowledge-based tasks (#655).
Updated gsm8k and MathBench configurations for enhanced performance and accuracy (#652, #657).
Addition of rolebench dataset, expanding the range of evaluative scenarios (#633).
Implementation of new subjective judgement criteria for improved assessment accuracy (#660).
Integration of advanced models like qwen-1.8b/72b and deepseek-7b/67b in the platform's configuration (#672).
Launch of Data Contamination Analysis as a new feature, enhancing data integrity checks (#639).

🛠 Improvements & Fixes

Removal of colossalai dependency to streamline operations (#645).
Resolution of various bugs including hellaswag_ppl_47bff9 and standard deviation summarizer issues (#648, #675).
Update and fix of the MathBench CodeInterpreter and related bugs (#657).
Enhancement of API interface for improved functionality and user experience (#681).

📚 Documentation Updates

Updated README for clearer guidance and information (#682).
Documentation and docstring updates for accuracy and comprehensiveness (#684).

🎊 New Contributors

A warm welcome to new contributors @rolellm, @liyucheng09, and @xmshi-trio. Your contributions have significantly enriched OpenCompass!

🔗 Full Changelog

[Fix] remove colossalai dependency by @yingfhu in https://github.com/open-compass/opencompass/pull/645
[Fix] Fix hellaswag_ppl_47bff9 by @Leymore in https://github.com/open-compass/opencompass/pull/648
[Feature] Support chat style inferencer. by @mzr1996 in https://github.com/open-compass/opencompass/pull/643
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq by @liushz in https://github.com/open-compass/opencompass/pull/144
[Feature] Add wikibench dataset by @liushz in https://github.com/open-compass/opencompass/pull/655
[Feat] update gsm8k and math agent config by @yingfhu in https://github.com/open-compass/opencompass/pull/652
[Feature] Update MathBench CodeInterpreter & fix MathBench Bug by @liushz in https://github.com/open-compass/opencompass/pull/657
added rolebench dataset. by @rolellm in https://github.com/open-compass/opencompass/pull/633
New subjective judgement by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/660
[Feature] Add qwen-1.8b/72b and deepseek-7b/67b configs by @Leymore in https://github.com/open-compass/opencompass/pull/672
Add Data Contamination Analysis [New Feature] by @liyucheng09 in https://github.com/open-compass/opencompass/pull/639
[Fix] fix bug on standart_deviation summarizer by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/675
update medbench by @xmshi-trio in https://github.com/open-compass/opencompass/pull/678
[Enhancement] Update API Interface by @tonysy in https://github.com/open-compass/opencompass/pull/681
[Doc] Update README by @kennymckormick in https://github.com/open-compass/opencompass/pull/682
[Feat] support pr merge test ci by @yingfhu in https://github.com/open-compass/opencompass/pull/669
[Feature] enhance the ability of humaneval_postprocess by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/676
[Sync] Update codes by @yingfhu in https://github.com/open-compass/opencompass/pull/683
[Docs] fix docstring by @yingfhu in https://github.com/open-compass/opencompass/pull/684
new version of subject by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/680
fixed small problem of new version subject evaluation by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/686
[Sync] bump version to 0.2.0 by @yingfhu in https://github.com/open-compass/opencompass/pull/690

Explore the detailed changes and contributions in the full changelog: OpenCompass Changelog.

Thank you to all contributors for your hard work and dedication. OpenCompass v0.2.0 marks another step forward in our journey, bringing enhanced features and capabilities to the community. Let's continue to innovate and expand the horizons of OpenCompass! 🎉🌐💡

0.1.9

5 months ago

🌟 Highlights

🚀 New API Integrations: A leap forward with the addition of multiple new APIs, including Baidu, Moonshot, Sensetime, and more, broadening the scope and capabilities of OpenCompass.
🔵 Circular Evaluation Feature: Introducing Circular Eval, an enhancement for comprehensive and dynamic evaluations within the platform.
🤖 Turbomind Inference Integration: Integration of Turbomind inference through its RPC API, enhancing the platform's inferencing capabilities.

🚀 New Features & Enhancements

Model & API Development: Explore new capabilities with DataCanvas Alaya LM, Lightllm API, 360API, and enhanced Turbomind Python API integration (#612, #613, #601, #484).
Circular Evaluation Implementation: Elevate your evaluation methods with the newly added Circular Eval feature, offering a more nuanced and detailed analysis capability (#610).
Rich Dataset Additions: Enrich your research with new datasets - FinanceIQ, SVAMP, GSM_Hard, and updated Mathbench for diverse applications (#596, #604, #619, #580, #607).
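
As a rough illustration of the idea behind Circular Eval, the sketch below assumes the common formulation in which a multiple-choice question is asked once per cyclic rotation of its options and counts as correct only if the model picks the right option in every rotation. The names `rotate_options`, `circular_correct`, and the `predict` callable are hypothetical and are not taken from OpenCompass's API.

```python
from typing import Callable, List, Tuple

LABELS = "ABCDEFGH"  # option letters; supports up to 8 options in this sketch


def rotate_options(options: List[str], answer_idx: int) -> List[Tuple[List[str], int]]:
    """Return every cyclic rotation of `options` with the shifted answer index."""
    n = len(options)
    variants = []
    for shift in range(n):
        rotated = options[shift:] + options[:shift]
        # The correct option moves left by `shift` positions (mod n).
        variants.append((rotated, (answer_idx - shift) % n))
    return variants


def circular_correct(
    question: str,
    options: List[str],
    answer_idx: int,
    predict: Callable[[str, List[str]], str],
) -> bool:
    """True only if `predict` returns the right letter for every rotation."""
    return all(
        predict(question, opts) == LABELS[idx]
        for opts, idx in rotate_options(options, answer_idx)
    )


# A predictor that always answers "B" passes the unrotated question
# but fails once the options are rotated.
always_b = lambda question, opts: "B"
print(circular_correct("2 + 2 = ?", ["3", "4", "5", "6"], 1, always_b))
```

Under this assumption, a model that relies on option position rather than content scores lower than it would under a single-pass evaluation.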

🛠 Improvements & Fixes

Subjective Evaluation Bug Fixes: Improved accuracy in subjective evaluations (#589).
Dataset and Feature Fixes: Fixes for the CMB dataset, the save_every default, the gen inferencer, and ICL evaluation with nested lists (#587, #592, #615, #632).

📚 Documentation Updates

README & FAQ Enhancements: Updated for better clarity and assistance (#582, #622, #628, #629).
Typo and Spelling Corrections: Ensuring accuracy and professionalism in documentation (#594, #637).

🎊 New Contributors

Welcoming new contributors to the OpenCompass family!

@rahidzeynal, @Sniper970119, @ZhangRaymond, @HunterKruger, @helloyongyang, and @Yggdrasill7D6. Your contributions are greatly appreciated!

What's Changed

Add author as: author='OpenCompass Contributors' by @rahidzeynal in https://github.com/open-compass/opencompass/pull/578
[Doc] Update README by @tonysy in https://github.com/open-compass/opencompass/pull/582
[Feature] Update mathbench by @tonysy in https://github.com/open-compass/opencompass/pull/580
Fix bugs in subjective evaluation by @frankweijue in https://github.com/open-compass/opencompass/pull/589
[Fix] fix cmb dataset by @Leymore in https://github.com/open-compass/opencompass/pull/587
[Fix] change save_every defaults to 1 by @yingfhu in https://github.com/open-compass/opencompass/pull/592
update word spell by @Sniper970119 in https://github.com/open-compass/opencompass/pull/594
Add FinanceIQ dataset by @ZhangRaymond in https://github.com/open-compass/opencompass/pull/596
[Feat] support humaneval and mbpp pass@k by @yingfhu in https://github.com/open-compass/opencompass/pull/598
[Feature] Add multi-prompt generation demo by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/568
Mathbench update postprocess by @liushz in https://github.com/open-compass/opencompass/pull/600
[Feature] Add arithmetic to mathbench by @liushz in https://github.com/open-compass/opencompass/pull/607
Add support for DataCanvas Alaya LM by @HunterKruger in https://github.com/open-compass/opencompass/pull/612
[Feature] Support Lightllm api by @helloyongyang in https://github.com/open-compass/opencompass/pull/613
[Feature] Support 360API and FixKRetriever for CSQA dataset by @tonysy in https://github.com/open-compass/opencompass/pull/601
Integrate turbomind python api by @lvhan028 in https://github.com/open-compass/opencompass/pull/484
[Bug] Update api with generation_kargs by @tonysy in https://github.com/open-compass/opencompass/pull/614
[Fix] Fix gen inferencer by @Leymore in https://github.com/open-compass/opencompass/pull/615
[Docs] update ds1000 code eval docs by @yingfhu in https://github.com/open-compass/opencompass/pull/618
[Feature] Add SVAMP dataset by @liushz in https://github.com/open-compass/opencompass/pull/604
[Feature] support download from modelscope by @KevinNuNu in https://github.com/open-compass/opencompass/pull/534
[Doc] Update README and requirements. by @tonysy in https://github.com/open-compass/opencompass/pull/622
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes by @Leymore in https://github.com/open-compass/opencompass/pull/625
[API] Update API by @tonysy in https://github.com/open-compass/opencompass/pull/624
[Feature] Add circular eval by @Leymore in https://github.com/open-compass/opencompass/pull/610
[Doc] Update FAQ by @Leymore in https://github.com/open-compass/opencompass/pull/628
[Doc] Update README by @tonysy in https://github.com/open-compass/opencompass/pull/629
[Bug] fix icl eval with nested list by @yingfhu in https://github.com/open-compass/opencompass/pull/632
Fix LightllmAPI list bug by @helloyongyang in https://github.com/open-compass/opencompass/pull/635
fix typo in README by @Yggdrasill7D6 in https://github.com/open-compass/opencompass/pull/637
[Sync] update codes by @Leymore in https://github.com/open-compass/opencompass/pull/641
[Feature] Add GSM_Hard dataset by @liushz in https://github.com/open-compass/opencompass/pull/619
[Feat] support zhipu post process by @yingfhu in https://github.com/open-compass/opencompass/pull/642
[Sync] Bump version to 0.1.9 by @Leymore in https://github.com/open-compass/opencompass/pull/644
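
For readers unfamiliar with the pass@k metric mentioned in #598, the sketch below shows the widely used unbiased estimator: given n generated samples for a problem, of which c pass the tests, pass@k is estimated as 1 - C(n-c, k) / C(n, k). This is a standalone illustration, not OpenCompass's implementation, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is exactly 1.0
    # whenever fewer than k samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


# 10 samples, 3 of them correct.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

The per-problem estimates are typically averaged over all problems in the benchmark to get the reported score.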

Explore the detailed changes in the full changelog.

Thank you to all the contributors for this release. Your dedication and hard work continue to enhance OpenCompass, making it an ever-evolving and dynamic tool for the community. Let's dive into the new possibilities with OpenCompass v0.1.9! 🎉🧮💻

0.1.8

5 months ago

🔥 Highlights

🌐 New Dataset Integrations: Expanding our dataset collection with Tabmwp, py150, maxmin, and more.
💡 Compatibility and API Support: Compatibility with the latest version of MiniGPT-4, plus support for the MiniMax and Xunfei APIs.
🛠️ Local Environment and Debugging Improvements: Streamlined local debugging and usage of datasets from local paths.

🚀 New Features & Enhancements

Datasets Galore: Unleash the power of new datasets including Tabmwp, py150, maxmin, and updates to existing ones like Mathbench for broader research scope (#505, #546, #562).
MiniGPT-4 & MiniMax API Compatibility: Stay up-to-date with the latest versions and extended API support (#539, #548).
Xunfei API Model & Update: Explore new possibilities with the integration and update of Xunfei API (#547, #572).

🛠 Improvements & Fixes

Local Debug Mode Restriction: Local debug mode now restricts resources as intended (#522 by @yingfhu).
Various Fixes and Updates: Addressing typos, import issues, and log redirections for smoother operation (#520, #549, #551, #555, #564).

📚 Documentation Updates

Enhanced README and FAQs: Get all your queries answered and understand OpenCompass better with updated documentation (#523, #531, #535, #540, #567).
Typo Corrections: Ensuring clarity and accuracy in our documentation (#530, #533).

🎊 New Contributors

A warm welcome to the new members of the OpenCompass community!

@Sanster, @ayushrakesh, @HimanshuMahto, @shresthasurav, @bittersweet1999, and @jingmingzhuo. Thank you for your valuable contributions!

Changelog

add multi model viz by @Sanster in https://github.com/open-compass/opencompass/pull/509
fix typo in WSC prompt by @Sanster in https://github.com/open-compass/opencompass/pull/520
[Fix] fix local debug mode not restrict the resources by @yingfhu in https://github.com/open-compass/opencompass/pull/522
Update README.md - one enhancement. by @ayushrakesh in https://github.com/open-compass/opencompass/pull/523
Typo error in README.md by @HimanshuMahto in https://github.com/open-compass/opencompass/pull/531
docs: fix typos in markdown files by @shresthasurav in https://github.com/open-compass/opencompass/pull/530
[Doc] Update README and FAQ by @tonysy in https://github.com/open-compass/opencompass/pull/535
[Feat] Add an opensource dataset Tabmwp by @bittersweet1999 in https://github.com/open-compass/opencompass/pull/505
[Feature]: To be compatible with the latest version of MiniGPT-4 by @YuanLiuuuuuu in https://github.com/open-compass/opencompass/pull/539
[Doc] Update README by @tonysy in https://github.com/open-compass/opencompass/pull/540
[Feat] support xunfei api model by @yingfhu in https://github.com/open-compass/opencompass/pull/547
[Feature] Add support for MiniMax API by @tonysy in https://github.com/open-compass/opencompass/pull/548
[Feature] Update Mathbench dataset prompt and fix small errors by @liushz in https://github.com/open-compass/opencompass/pull/546
[Fix] fix filename typo by @yingfhu in https://github.com/open-compass/opencompass/pull/549
[Feat] support cidataset by @yingfhu in https://github.com/open-compass/opencompass/pull/538
[Fix] fix registry error with internal by @yingfhu in https://github.com/open-compass/opencompass/pull/551
[Fix] fix unnecessary import and update requirements by @yingfhu in https://github.com/open-compass/opencompass/pull/555
[Fix] fix log re-direct by @yingfhu in https://github.com/open-compass/opencompass/pull/564
Add py150 and maxmin by @jingmingzhuo in https://github.com/open-compass/opencompass/pull/562
[Doc] Update api.txt by @tonysy in https://github.com/open-compass/opencompass/pull/567
[Docs] add humanevalx dataset link in config by @yingfhu in https://github.com/open-compass/opencompass/pull/559
[Docs] fix GLUE_CoLA dataset name error by @KevinNuNu in https://github.com/open-compass/opencompass/pull/533
[Feature] Update xunfei api by @tonysy in https://github.com/open-compass/opencompass/pull/572
[Feature] Add CMB zero-shot evaluation by @Leymore in https://github.com/open-compass/opencompass/pull/571
[Feature] Use dataset in local path by @Leymore in https://github.com/open-compass/opencompass/pull/570
[Sync] update model configs by @Leymore in https://github.com/open-compass/opencompass/pull/574
[Sync] Bump version to 0.1.8 by @Leymore in https://github.com/open-compass/opencompass/pull/576

Explore the detailed changes in the full changelog.

Thank you to everyone who contributed to this release. Your efforts are immensely appreciated and are helping to make OpenCompass a more robust and versatile tool. Let's continue to push the boundaries with OpenCompass v0.1.8! 🚀🌐🛠️