The LLM Evaluation Framework
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Obj...
Python SDK for agent evals and observability
Evaluate your speech-to-text system with similarity measures such as wor...
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Langu...
A Neural Framework for MT Evaluation
LightEval is a lightweight LLM evaluation suite that Hugging Face has be...
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and...
Source code for "Taming Visually Guided Sound Generation" (Oral at the B...
Resources for the "Evaluating the Factual Consistency of Abstractive Tex...
A Python wrapper for the ROUGE summarization evaluation package
Code base for the precision, recall, density, and coverage metrics for g...
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robus...
An implementation of full named-entity evaluation metrics based on Sem...