Test your prompts, models, and RAGs. Catch regressions and improve prompt...
Your Automatic Prompt Engineering Assistant for GenAI Applications
The LLM Evaluation Framework
This is the repository for our article published at RecSys 2019, "Are We Really...
LightEval is a lightweight LLM evaluation suite that Hugging Face has been...
A research library for automating experiments on Deep Graph Networks
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation...
Expressive is a cross-platform expression parsing and evaluation framework...
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection...
Evaluation suite for large-scale language models.
Test and evaluate LLMs, prompts, and other configuration across all the ...
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Official repository of RankEval: An Evaluation and Analysis Framework for...
BIRL: Benchmark on Image Registration Methods with Landmark Validation