Weekly visualization report of Open LLM model performance based on 4 metrics.
This repository offers visualizations that showcase the performance of open-source Large Language Models (LLMs), based on evaluation metrics sourced from Hugging Face's Open-LLM-Leaderboard. The visualizations are refreshed weekly to keep the information up to date.
You can refer to this CSV file for the underlying data used in the visualizations. The raw data is a JSON file formatted as a 2D list.
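For orientation, a 2D-list JSON file of this shape can be loaded into a table roughly as follows. This is only a sketch: the file path is a placeholder, and it assumes the first row of the list holds column names.

```python
import json

import pandas as pd

# Placeholder path; the actual location is set via config.py.
RAW_JSON_PATH = "assets/raw_data.json"

with open(RAW_JSON_PATH, encoding="utf-8") as f:
    rows = json.load(f)  # 2D list; assumed layout: rows[0] = header, rest = records

# Build a DataFrame from the header row and the data rows.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df.head())
```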
Settings can be adjusted in config.py.
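As a rough sketch, such a config module could hold data paths and output options like the following. All names here are illustrative assumptions, not the repository's actual settings:

```python
# config.py -- illustrative sketch only; the real option names may differ.

# Leaderboard that the evaluation metrics are sourced from.
LEADERBOARD_URL = "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard"

# Where the raw 2D-list JSON and the derived CSV are stored (assumed paths).
RAW_JSON_PATH = "assets/raw_data.json"
CSV_PATH = "assets/leaderboard.csv"

# Output directory for the weekly plots (assumed).
PLOT_DIR = "plots"
```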
git clone https://github.com/dsdanielpark/Open-LLM-Leaderboard-Report
cd Open-LLM-Leaderboard-Report
python main.py
Parameters: The model with the largest parameter count observed so far is set to 100, and all other models are expressed as a percentage of it.
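Put differently, each model's parameter count is divided by the largest count seen so far and scaled to the 0-100 range. A minimal sketch of that normalization, with an assumed "params" column name:

```python
import pandas as pd

def normalize_params(df: pd.DataFrame) -> pd.DataFrame:
    """Scale the 'params' column so the largest model maps to 100."""
    df = df.copy()
    df["params_pct"] = df["params"] / df["params"].max() * 100
    return df
```

For example, if the largest model seen so far has 65B parameters, a 13B model is plotted at 20.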
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
The Open LLM Leaderboard tracks, ranks, and evaluates large language models and chatbots. It evaluates models based on benchmarks from the Eleuther AI Language Model Evaluation Harness, covering science questions, commonsense inference, multitask accuracy, and truthfulness in generating answers.
The benchmarks were chosen because they test reasoning and general knowledge across a wide variety of fields in 0-shot and few-shot settings. Evaluation is performed against 4 popular benchmarks:
- AI2 Reasoning Challenge (ARC, 25-shot): grade-school science questions
- HellaSwag (10-shot): commonsense inference
- MMLU (5-shot): multitask accuracy across a broad range of subjects
- TruthfulQA (0-shot): truthfulness in generated answers
@software{Open-LLM-Leaderboard-Report-2023,
author = {Daniel Park},
title = {{Open-LLM-Leaderboard-Report}},
url = {https://github.com/dsdanielpark/Open-LLM-Leaderboard-Report},
year = {2023}
}