Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Full Changelog: https://github.com/jitsi/jiwer/compare/v3.0.2...v3.0.3
Full Changelog: https://github.com/jitsi/jiwer/compare/v3.0.1...v3.0.2
Minor release for fixing #76.
Full Changelog: https://github.com/jitsi/jiwer/compare/v3.0.0...v3.0.1
This release makes breaking changes to the jiwer API.
First, we introduce 3 new methods:
1. jiwer.compute_measures() is renamed to jiwer.process_words, and returns everything in a dataclass named WordOutput.
2. jiwer.cer(return_dict=True) is deprecated, and is superseded by jiwer.process_characters, which returns everything in a dataclass named CharacterOutput.
3. jiwer.visualize_measures is renamed to jiwer.visualize_alignment. Moreover, the keyword argument visualize_cer: bool = False has been removed, and the output keyword argument is now of expected type Union[WordOutput, CharacterOutput].
I've also decided to rename all mentions of the concept "(ground) truth" to "reference", in light of the Whisper speech-to-text model showing that future ASR models might not be trained on something like a "ground truth". Therefore, in the following methods, the keyword arguments truth and truth_transform have been renamed to reference and reference_transform:
jiwer.cer()
jiwer.mer()
jiwer.wer()
jiwer.wil()
jiwer.wip()
The alignments are now stored as a list of lists containing jiwer.AlignmentChunk dataclass objects instead of hard-to-document tuples.
Lastly, I've added jiwer.transformations.cer_contiguous for optionally calculating the CER with an unequal number of reference and hypothesis sentences. I've also changed wer_standardize and wer_standardize_contiguous so that the last 3 transformations are now:
tr.Strip(),
tr.ReduceToSingleSentence(),
tr.ReduceToListOfListOfWords(),
This release also introduces a documentation website. See https://jitsi.github.io/jiwer.
Full Changelog: https://github.com/jitsi/jiwer/compare/v2.6.0...v3.0.0
The return dictionary of jiwer.cer() and jiwer.compute_measures() now has 3 additional keys: ops, truth, and hypothesis. See the alignment section of the README, and the docstrings of the methods, for more details.
Also adds jiwer.visualize_measures() to visualize the alignment of all ground-truth/hypothesis pairs.
Finally, the jiwer command is automatically installed upon installation of jiwer, which provides a simple CLI for interacting with jiwer.
Commit list:
Full Changelog: https://github.com/jitsi/jiwer/compare/v2.5.2...v2.6.0
Full Changelog: https://github.com/jitsi/jiwer/compare/v2.5.1...v2.5.2
Full Changelog: https://github.com/jitsi/jiwer/compare/v.2.5.0...v2.5.1
standardize and words_to_filter by @nikvaessen in https://github.com/jitsi/jiwer/pull/65
Full Changelog: https://github.com/jitsi/jiwer/compare/v2.4.0...v.2.5.0
Full Changelog: https://github.com/jitsi/jiwer/compare/v2.3.0...v2.4.0
words_to_filter and standardize keywords in all measure functions.
Full Changelog: https://github.com/jitsi/jiwer/compare/v2.2.1...v2.3.0