Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
`transform` property by @typpo in https://github.com/promptfoo/promptfoo/pull/696
`eval -n` arg for running the first n test cases by @typpo in https://github.com/promptfoo/promptfoo/pull/700
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.54.1...0.55.0
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.54.0...0.54.1
Answer-relevance calculation by @anthonyivn2 in https://github.com/promptfoo/promptfoo/pull/683
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.53.0...0.54.0
When using promptfoo as a node library, `Assertion` value functions are now invoked with the same args as when using the CLI. See `AssertionValueFunction` for type details.
In practice, this change means that instead of:

```javascript
function assertValue(output, testCase, assertion) { ... }
```

You can do:

```javascript
function assertValue(output, { prompt, vars, test }) { ... }
```

The reason for this change is that the CLI and the library previously accepted functions with different signatures, which was confusing, and only the CLI signature was documented.
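For illustration, here is a hypothetical standalone assertion function using the new destructured signature. The `topic` var and the GradingResult-style return shape are assumptions for the sketch, not taken from the PR:

```javascript
// Hypothetical assertion value function: passes when the model output
// mentions the test's `topic` variable.
function assertValue(output, { prompt, vars, test }) {
  const mentioned = output.includes(vars.topic);
  return {
    pass: mentioned,
    score: mentioned ? 1 : 0,
    reason: mentioned ? 'output mentions topic' : 'topic missing from output',
  };
}

// Because it is a plain function, it can be called directly with sample arguments:
const result = assertValue('All about cats', {
  prompt: 'Tell me about {{topic}}',
  vars: { topic: 'cats' },
  test: {},
});
console.log(result.pass); // true
```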
assert function call consistent with external js function call by @typpo in https://github.com/promptfoo/promptfoo/pull/674
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.52.0...0.53.0
`E2BIG` error during the execution of Python asserts by @sangwoo-joh in https://github.com/promptfoo/promptfoo/pull/660
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.51.0...0.52.0
Python assertions now expect a `get_assert` function which returns a native value, rather than parsing stdout (#594). This means instead of:
```python
print(json.dumps(result))
```
You should just return the assertion result:
```python
return result
```
Here's a full example of a `custom_assert.py`:
```python
from typing import Any, Dict, Union

def get_assert(output, context) -> Union[bool, float, Dict[str, Any]]:
    print('Prompt:', context['prompt'])
    print('Vars:', context['vars']['topic'])

    # Determine the result...
    result = test_output(output)

    # Here's an example GradingResult dict
    result = {
        'pass': True,
        'score': 0.6,
        'reason': 'Looks good to me',
    }
    return result
```
See documentation
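Since `get_assert` is a plain function, it can also be exercised outside promptfoo. A minimal sketch, where the `topic` var and the returned dict's contents are illustrative assumptions:

```python
from typing import Any, Dict, Union

def get_assert(output: str, context: Dict[str, Any]) -> Union[bool, float, Dict[str, Any]]:
    # Hypothetical check: pass when the configured topic appears in the model output.
    topic = context['vars']['topic']
    passed = topic in output
    return {
        'pass': passed,
        'score': 1.0 if passed else 0.0,
        'reason': f'topic {topic!r} {"found" if passed else "missing"}',
    }

# Called directly with sample arguments:
result = get_assert('All about cats', {'prompt': 'Tell me about {{topic}}', 'vars': {'topic': 'cats'}})
print(result['pass'])  # True
```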
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.50.1...0.51.0
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.50.0...0.50.1
`transform` by @typpo in https://github.com/promptfoo/promptfoo/pull/605
`NEXT_PUBLIC_PROMPTFOO_REMOTE_BASE_URL` by @typpo in https://github.com/promptfoo/promptfoo/pull/609
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.49.3...0.50.0
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.49.2...0.49.3
Full Changelog: https://github.com/promptfoo/promptfoo/compare/0.49.1...0.49.2