Spell checking and fuzzy search suggestion written in Go
Fuzzy is a very fast spell checker and query suggester written in Go.
Motivation:
Notes:
Config:
"threshold"
is the trigger point at which a word becomes popular enough to build lookup keys for it. Setting this to "1" means any single instance of a word makes it a legitimate spelling. This typically corrects the most errors, but it can cause false positives if the training data contains incorrect spellings, and it builds a much larger index. The default is 4.
"depth"
is the Levenshtein distance for which the model builds lookup keys. For spelling correction, a setting of "2" is typically very good. At a distance of "3" the number of candidate words is much larger, with little benefit to accuracy. For query prediction a larger number can be useful, but it is again much more expensive. A depth of "1" and a threshold of "1" on the 1st Norvig test set give ~70% correction accuracy at ~5 µs per check (i.e. ~200 kHz, or about 200,000 checks per second); for many applications this will be good enough. At depths > 2, false positives begin to hurt accuracy.
Future improvements:
Usage:
package main

import (
	"fmt"

	"github.com/sajari/fuzzy"
)

func main() {
	model := fuzzy.NewModel()

	// For testing only; this is not advisable in production
	model.SetThreshold(1)

	// This expands the distance searched, but costs more resources (memory and time).
	// For spell checking, "2" is typically enough; for query suggestions this can be higher
	model.SetDepth(5)

	// Train multiple words at once by passing a slice of strings to the "Train" function
	words := []string{"bob", "your", "uncle", "dynamite", "delicate", "biggest", "big", "bigger", "aunty", "you're"}
	model.Train(words)

	// Train word by word (typically triggered in your application once a given word is popular enough)
	model.TrainWord("single")

	// Check spelling
	fmt.Println("\nSPELL CHECKS")
	fmt.Println(" Deletion test (yor) : ", model.SpellCheck("yor"))
	fmt.Println(" Swap test (uncel) : ", model.SpellCheck("uncel"))
	fmt.Println(" Replace test (dynemite) : ", model.SpellCheck("dynemite"))
	fmt.Println(" Insert test (dellicate) : ", model.SpellCheck("dellicate"))
	fmt.Println(" Two char test (dellicade) : ", model.SpellCheck("dellicade"))

	// Suggest completions
	fmt.Println("\nQUERY SUGGESTIONS")
	fmt.Println(" \"bigge\". Did you mean?: ", model.Suggestions("bigge", false))
	fmt.Println(" \"bo\". Did you mean?: ", model.Suggestions("bo", false))
	fmt.Println(" \"dyn\". Did you mean?: ", model.Suggestions("dyn", false))

	// Autocomplete suggestions
	suggested, _ := model.Autocomplete("bi")
	fmt.Printf(" \"bi\". Suggestions: %v\n", suggested)
}
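To give a feel for what the "depth" setting means for the misspellings used above, here is a minimal, standalone sketch that computes the plain Levenshtein distance between each test input and its intended word. It does not use the fuzzy package and is not the library's implementation; the helper names levenshtein and min3 are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (hypothetical helper for this sketch).
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b.
// This is the textbook two-row dynamic-programming version, written for
// illustration only; it is not taken from the fuzzy package.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // distance to the empty prefix of b
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	// Misspelling / intended word pairs taken from the usage example.
	pairs := [][2]string{
		{"yor", "your"},
		{"uncel", "uncle"},
		{"dynemite", "dynamite"},
		{"dellicate", "delicate"},
		{"dellicade", "delicate"},
	}
	for _, p := range pairs {
		fmt.Printf("%-10s -> %-9s distance %d\n", p[0], p[1], levenshtein(p[0], p[1]))
	}
}
```

Under plain Levenshtein distance, "yor", "dynemite" and "dellicate" are one edit away from their targets, while "uncel" and "dellicade" are two edits away (a swap of adjacent letters counts as two substitutions in this metric). Whether the library treats transpositions differently is not stated here, so take this only as a rough guide to why a depth of "2" covers the typical cases.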