Test and evaluate LLMs, prompts and other configurations, across all the scenarios that matter for your application
Empirical is the fastest way to test different LLMs, prompts and other model configurations, across all the scenarios that matter for your application.
With Empirical, you can:
Empirical bundles together a CLI and a web app. The CLI handles running tests and the web app visualizes results.
Everything runs locally, with a JSON configuration file, `empiricalrc.json`.
Required: Node.js 20+ needs to be installed on your system.
In this example, we will ask an LLM to parse user messages to extract entities and give us structured JSON output. For example, "I'm Alice from Maryland" should become `{"name": "Alice", "location": "Maryland"}`.
Our test will succeed if the model outputs valid JSON.
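The success criterion here is simply JSON validity. As a minimal sketch of what that check means, in Node.js (the `isValidJson` helper is our own illustration, not part of the Empirical CLI):

```javascript
// Hypothetical helper illustrating the "valid JSON" criterion:
// a model output passes if JSON.parse accepts it.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"name": "Alice", "location": "Maryland"}')); // true
console.log(isValidJson("{name: 'Alice'}")); // false: JSON keys and strings need double quotes
```

Note that the second output fails even though it looks JSON-like: strict JSON parsing is exactly what makes this a useful automated check.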
Use the CLI to create a sample configuration file called `empiricalrc.json`.
npx @empiricalrun/cli init
cat empiricalrc.json
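The generated file describes which models to run, which inputs to test them on, and how to score the outputs. The sketch below only illustrates the general shape such a config might take; the field names here are assumptions for illustration, and the file that `init` actually writes is the authoritative reference:

```json
{
  "runs": [
    {
      "type": "model",
      "provider": "openai",
      "model": "gpt-3.5-turbo",
      "prompt": "Extract the name and location from the user message as JSON: {{user_message}}"
    }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "user_message": "I'm Alice from Maryland" } }
    ]
  }
}
```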
Run the test samples against the models with the `run` command. This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. This execution will cost $0.0026, based on the selected models.
npx @empiricalrun/cli run
Use the `ui` command to open the reporter web app and see side-by-side results.
npx @empiricalrun/cli ui
Edit the `empiricalrc.json` file to make Empirical work for your use-case.
See development docs.