Confident AI DeepEval Versions

The LLM Evaluation Framework

v0.21.15

2 months ago

For deepeval's latest release v0.21.15, we release:

v0.20.85

2 months ago

In deepeval v0.20.85:

v0.20.80

2 months ago

In DeepEval's latest release, there is now:

v0.20.73

3 months ago

For the newest release, deepeval is now stable for production use:

  • reduced package size
  • separated the functionality of pytest from the deepeval test run command
  • included a coverage score for summarization
  • fixed a contextual precision node error
  • released docs for better transparency into how metrics are calculated
  • allowed users to configure RAGAS metrics for custom embedding models: https://docs.confident-ai.com/docs/metrics-ragas#example
  • fixed bugs with checking for package updates

v0.20.68

3 months ago

For the latest release, DeepEval:

v0.20.48

4 months ago

v0.20.57

4 months ago

  • LLM-Evals (LLM-evaluated metrics) now support all of LangChain's chat models.
  • LLMTestCase now has execution_time and cost fields, useful for those looking to evaluate on these parameters.
  • minimum_score has been renamed to threshold, so custom metrics can now treat it as either a "minimum" or a "maximum" threshold.
  • LLMEvalMetric has been renamed to GEval.
  • LlamaIndex tracing integration: https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html#deepeval
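
To illustrate the threshold change above, here is a minimal, self-contained sketch of how a metric might treat a threshold as either a minimum or a maximum. It deliberately does not import deepeval: `SketchTestCase`, `ThresholdMetric`, `LatencyMetric`, `LengthMetric`, and the `is_maximum` flag are hypothetical names invented for this example, and the units for `execution_time` and `cost` are assumptions. Check the deepeval docs for the actual custom-metric interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for a test case; NOT deepeval's LLMTestCase."""
    input: str
    actual_output: str
    execution_time: Optional[float] = None  # assumed to be seconds
    cost: Optional[float] = None            # assumed to be a currency amount


class ThresholdMetric:
    """Hypothetical base class: compares a score against a single threshold."""

    def __init__(self, threshold: float, is_maximum: bool = False):
        self.threshold = threshold
        self.is_maximum = is_maximum  # hypothetical flag, not a deepeval parameter

    def score(self, case: SketchTestCase) -> float:
        raise NotImplementedError

    def is_successful(self, case: SketchTestCase) -> bool:
        value = self.score(case)
        # "Maximum" metrics pass at or below the threshold,
        # "minimum" metrics pass at or above it.
        if self.is_maximum:
            return value <= self.threshold
        return value >= self.threshold


class LatencyMetric(ThresholdMetric):
    """Passes when execution_time stays at or below a maximum."""

    def __init__(self, max_seconds: float):
        super().__init__(threshold=max_seconds, is_maximum=True)

    def score(self, case: SketchTestCase) -> float:
        if case.execution_time is None:
            raise ValueError("execution_time is required for LatencyMetric")
        return case.execution_time


class LengthMetric(ThresholdMetric):
    """Passes when the output has at least a minimum number of characters."""

    def __init__(self, min_chars: int):
        super().__init__(threshold=float(min_chars), is_maximum=False)

    def score(self, case: SketchTestCase) -> float:
        return float(len(case.actual_output))
```

With this sketch, a case whose `execution_time` is 1.2 passes `LatencyMetric(2.0)` but fails `LatencyMetric(1.0)`, while the same case passes or fails `LengthMetric` depending on the length of `actual_output`.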

v0.20.43

5 months ago

In this release:

v0.20.35

5 months ago

Lots of new features this release:

  1. JudgementalGPT now supports multiple languages - useful for our APAC and European friends
  2. RAGAS metrics now support all OpenAI models - useful for those running into context length issues
  3. LLMEvalMetric now returns the reasoning behind its score
  4. deepeval test run now has hooks that are called on test run completion
  5. evaluate now displays retrieval_context for RAG evaluation
  6. The RAGAS metric now displays a breakdown of all its distinct metrics
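
As a rough illustration of item 4, the general pattern of a completion hook can be sketched in plain Python. This is not deepeval's hook API: `on_run_complete`, `run_tests`, and `record_summary` are hypothetical names, and the sketch only shows the idea of registering a callback that fires once after every test has run.

```python
from typing import Callable, Dict, List

# Registry of callbacks to invoke after a run finishes (hypothetical).
_completion_hooks: List[Callable[[Dict[str, bool]], None]] = []


def on_run_complete(func: Callable[[Dict[str, bool]], None]):
    """Hypothetical decorator: register func to be called after a test run."""
    _completion_hooks.append(func)
    return func


def run_tests(tests: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every test, then call each registered hook once with the results."""
    results = {name: bool(test()) for name, test in tests.items()}
    for hook in _completion_hooks:
        hook(results)
    return results


# Collected summaries, so the hook's effect can be inspected afterwards.
summaries: List[str] = []


@on_run_complete
def record_summary(results: Dict[str, bool]) -> None:
    passed = sum(results.values())
    summaries.append(f"{passed}/{len(results)} passed")
```

Calling `run_tests` with a mapping of test names to callables returns the pass/fail results and appends one summary string per run, which is the kind of place where a notification or report upload could be triggered.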

v0.20.23

5 months ago

Automatically integrated with Confident AI for continuous evaluation throughout the lifetime of your LLM app:

  • log evaluation results and analyze metric passes / fails
  • compare and pick the optimal hyperparameters (e.g. prompt templates, chunk size, models used, etc.) based on evaluation results
  • debug evaluation results via LLM traces
  • manage evaluation test cases / datasets in one place
  • track events to identify live LLM responses in production
  • add production events to existing evaluation datasets to strengthen evals over time