Scripts that run against Watson Assistant for k-fold validation on a training set, testing on a blind test set, and drawing precision curves for comparison.
Scripts that run against Watson Assistant for:
KFOLD: K-fold cross validation on a training set,
BLIND: Evaluating a blind test, and
TEST: Testing the WA against a list of utterances.
In the case of a k-fold cross validation or a blind set, the tool will output a precision curve, in addition to per-intent precision and recall rates, and a confusion matrix.
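As a rough illustration of what k-fold cross validation does with a training set, the sketch below splits (utterance, intent) pairs into k folds and yields one train/test partition per fold. It is not the tool's own implementation: the function name `k_fold_splits`, the `seed` parameter, and the sample data are hypothetical, and the real tool works against Watson Assistant workspaces rather than returning plain lists.

```python
import random


def k_fold_splits(examples, k, seed=0):
    """Yield (train, test) partitions for k-fold cross validation."""
    if k < 2 or k > len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Deal examples round-robin so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th one becomes training data.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


# Hypothetical training data: six utterances spread over two intents.
examples = [(f"utterance {n}", f"intent {n % 2}") for n in range(6)]
splits = list(k_fold_splits(examples, k=3))
for train, test in splits:
    print(len(train), "training examples,", len(test), "held out")
```

Each example is held out exactly once, so every utterance in the training set contributes to the reported metrics.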
Mac users may need to initialize Python's SSL certificate store by running Install Certificates.command, found in /Applications/Python.
Pre-work: cd into the projects folder where you will clone this GitHub repo. After cloning, cd into the WA-Testing-Tool folder.
git clone https://github.com/cognitive-catalyst/WA-Testing-Tool.git
pip3 install --upgrade -r requirements.txt
Create a configuration file (ex: config.ini). Use config.ini.sample to bootstrap your configuration.
a. In your terminal, copy the sample config file to a new one: cp config.ini.sample config.ini
b. Open the config.ini file in your favorite text editor, edit and save the following information with your actual credentials:
API Key
url
workspace_id (Watson Assistant v1) or environment_id (Watson Assistant v2)
c. Set the mode and the mode-specific parameters.
Run the tool with python3 run.py -c config.ini or python3 run.py -c <path to your config file>
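Before running the tool, it can help to confirm that the credentials from the configuration step are actually present in the file. The sketch below is one way to do that with Python's standard `configparser`. The key names in `REQUIRED_KEYS`, the section names, and the `mode` value in the sample are assumptions for illustration; check config.ini.sample for the names the tool really expects (Watson Assistant v2 uses environment_id instead of workspace_id).

```python
import configparser

# Assumed key names; check config.ini.sample for the real ones.
REQUIRED_KEYS = ("apikey", "url", "workspace_id")


def missing_keys(config_text, required=REQUIRED_KEYS):
    """Return the required keys that appear in no section of the config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    present = set()
    for section in parser.sections():
        present.update(parser[section].keys())
    return [key for key in required if key not in present]


# Hypothetical config contents with workspace_id left out.
sample = """
[ASSISTANT CREDENTIALS]
apikey = <your API key>
url = <your service url>

[SETTINGS]
mode = kfold
"""
print(missing_keys(sample))  # ['workspace_id']
```

To check a real file, pass its contents instead of the sample, for example `missing_keys(open("config.ini").read())`.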
If you have already installed this utility, use these steps to get the latest code.
pip3 install --upgrade -r requirements.txt
git pull
config.ini - Configuration file for run.py. This is formatted differently for each mode. Review the Examples below to explore the possible modes and how each is configured.
test_input_file.csv - Test set for blind testing and standard test.
For blind test with golden intent used for comparison:
utterance | golden intent |
---|---|
utterance 0 | intent 0 |
utterance 1 | intent 0 |
utterance 2 | intent 1 |
For a standard test, the input must have exactly one column; otherwise an error is thrown:
utterance |
---|
utterance 0 |
utterance 1 |
utterance 2 |
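The two input layouts above can be checked before a run with a few lines of Python. This is a minimal sketch, assuming the file is comma-separated with no header row. The function name `check_test_input` and the lowercase mode names `blind` and `test` are hypothetical and are not taken from the tool's code.

```python
import csv
import io


def check_test_input(csv_text, mode):
    """Validate column counts: 2 for a blind test, 1 for a standard test."""
    expected = {"blind": 2, "test": 1}[mode]
    # Skip completely empty lines, which csv.reader returns as [].
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            raise ValueError(
                f"row {number}: expected {expected} column(s), found {len(row)}"
            )
    return rows


blind_rows = check_test_input("utterance 0,intent 0\nutterance 1,intent 0\n", "blind")
print(len(blind_rows), "blind test rows look well-formed")
```

Passing a two-column file with mode `test` raises `ValueError`, mirroring the one-column requirement for a standard test.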
There are a variety of ways to use this tool. Primarily you will execute a k-folds, blind, or standard test.
Run standard test without ground truth
Generate precision/recall for classification test
Generate confusion matrix for classification test
Compare two different blind test results
Generate description for intents
Generate long-tail classification results
Run syntax validation patterns on a workspace
Extract and analyze Watson Assistant log data
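To make the precision/recall and confusion matrix outputs concrete, the sketch below computes both from a list of golden intents and a list of predicted intents. It is an illustration of the standard definitions, not the tool's own code; the function names and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(golden, predicted):
    """Count (golden intent, predicted intent) pairs."""
    return Counter(zip(golden, predicted))


def precision_recall(golden, predicted):
    """Return {intent: (precision, recall)} for every intent seen."""
    matrix = confusion_matrix(golden, predicted)
    report = {}
    for intent in sorted(set(golden) | set(predicted)):
        true_pos = matrix[(intent, intent)]
        # Column total: how often this intent was predicted.
        predicted_total = sum(n for (_, p), n in matrix.items() if p == intent)
        # Row total: how often this intent was the golden label.
        golden_total = sum(n for (g, _), n in matrix.items() if g == intent)
        precision = true_pos / predicted_total if predicted_total else 0.0
        recall = true_pos / golden_total if golden_total else 0.0
        report[intent] = (precision, recall)
    return report


# Hypothetical blind test results: one "intent 0" utterance was misclassified.
golden = ["intent 0", "intent 0", "intent 1", "intent 1"]
predicted = ["intent 0", "intent 1", "intent 1", "intent 1"]
report = precision_recall(golden, predicted)
for intent, (precision, recall) in report.items():
    print(f"{intent}: precision={precision:.2f} recall={recall:.2f}")
```

In this sample, "intent 0" has perfect precision but only half recall, while "intent 1" has perfect recall and precision of two thirds, which is the kind of imbalance the per-intent report is meant to surface.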
Long-form resources available in Article and Video form:
Title | Article | Video |
---|---|---|
Testing a Chatbot with k-folds Cross Validation | https://medium.com/ibm-watson/testing-a-chatbot-with-k-folds-cross-validation-68dab111a6b | https://www.youtube.com/watch?v=FrhK68WyOK4 |
Analyze chatbot classifier performance from logs | https://medium.com/ibm-watson/analyze-chatbot-classifier-performance-from-logs-e9cf2c7ca8fd | https://www.youtube.com/watch?v=yd89DKyf6hc |
Improve a chatbot classifier with production data | https://medium.com/ibm-watson/improve-a-chatbot-classifier-with-production-data-22a437f419b4 | https://www.youtube.com/watch?v=ftFIQtHiQY8 |
Watson Assistant is commonly paired with IBM Speech services to build voice-driven Conversational AI solutions. Check out these tools to assess and tune your speech models!
This tool can also be used to test a trained Natural Language Understanding (NLU) Classifier. The configuration is similar to testing Watson Assistant except:
Set the NLU service URL in the url parameter (ex: https://api.us-south.natural-language-understanding.watson.cloud.ibm.com)
Put the <model_id> in the workspace_id parameter in the configuration
(... train_input_file parameter)
Due to different coverage among service plans, users may need to adjust max_test_rate accordingly to avoid network connection errors.
Users on Lite plans are only able to create 5 workspaces. They should set fold_num=3 in their k-fold configuration file.
In case of interrupted execution, the tool may not be able to clean up the workspaces it creates. In this case you will need to manually delete the extra workspaces.
Workspace ID is not the Skill ID. In the Watson Assistant user interface, the Workspace ID can be found on the Skills tab by clicking the three dots (top-right of the skill) and choosing View API Details.
SSL: [CERTIFICATE_VERIFY_FAILED] on Mac means you may need to initialize Python's SSL certificate store by running Install Certificates.command, found in /Applications/Python.
"This utility used to work and now it doesn't." Upgrade to the latest dependencies with pip3 install --upgrade -r requirements.txt and the latest code with git pull.
If you get a Python module loading error, confirm that you are using matching pip and python versions, i.e. pip3 and python3, or pip and python.
Watson Assistant v2 configuration does not support k-folds mode. Watson Assistant v2 is tested "in-place" rather than creating temporary skills for this tool. Actions users may prefer to use Dialog Skill Analysis notebooks - these notebooks have additional capabilities for analyzing Dialog or Action skills.