SuffixTree Save

Optimized implementation of suffix tree in python using Ukkonen's algorithm.

Project README

This module is an optimized implementation of Ukkonen's suffix tree algorithm in python which will be having most of the important text processing functionalities such as:

Search for strings:

Check if a string P of length m is a substring in O(m) time.
Find the first occurrence of the patterns P1,... ,Pq of total length m as substrings in O(m) time.

Find all z occurrences of the patterns P1,... ,Pq of total length m as substrings in O(m+z) time.

  • Search for a regular expression P in time expected sublinear in n
  • Find for each suffix of a pattern P the length of the longest match between a prefix of P[i... m] and a substring in D in image time. This is termed the matching statistics for P

Find properties of the strings:

  • Find the longest common substrings of the string Si and Sj in image time.
  • Find all maximal pairs, maximal repeats or supermaximal repeats in image time.
  • Find the Lempel–Ziv decomposition in image time.[10]
  • Find the longest repeated substrings in image time.
  • Find the most frequently occurring substrings of a minimum length in image time.
  • Find the shortest strings from image that do not occur in D in O(n+z) time, if there are z such strings.
  • Find the shortest substrings occurring only once in image time.
  • Find, for each i the shortest substrings of Si not occurring elsewhere in D in image time.

sources:

Open Source Agenda is not affiliated with "SuffixTree" Project. README Source: kasraavand/SuffixTree
Stars
48
Open Issues
3
Last Commit
1 year ago

Open Source Agenda Badge

Open Source Agenda Rating