DavidSkrundz Regex

A pure Swift NFA implementation of a regular expression engine

Regex (V2 WIP)

A pure Swift implementation of a Regular Expression Engine

V2 is a second attempt that uses DFAs instead of NFAs, aiming for grep-like performance.

Usage

To avoid recompiling the pattern on every use, create a Regex instance once and reuse it:

// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")

let string = "RegEx is tough, but useful."

// Search for matches
let words = regex.match(string)

/*
words = [
	RegexMatch(match: "RegEx", groups: []),
	RegexMatch(match: "is", groups: []),
	RegexMatch(match: "tough", groups: []),
	RegexMatch(match: "but", groups: []),
	RegexMatch(match: "useful", groups: []),
]
*/

If compilation overhead is not a concern, use the =~ operator to match a string against a pattern directly:

let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []

/*
fourLetterWords = [
	RegexMatch(match: "beer", groups: []),
	RegexMatch(match: "very", groups: []),
	RegexMatch(match: "nice", groups: []),
]
*/

By default the Global flag is active. To change which flags are active, add a / at the start of the pattern and /<flags> at the end. The available flags are:

  • g Global - Allows multiple matches
  • i Case Insensitive - Case insensitive matching
  • m Multiline - ^ and $ also match the beginning and end of a line

// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
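
The /pattern/flags convention described above can be illustrated with a small standalone sketch. It does not show how this library parses flags; `RegexFlags` and `splitPattern` are hypothetical names introduced here, and the choice to reject unknown flag letters is an assumption.

```swift
// Hypothetical flag set; the members mirror the g / i / m flags listed above.
struct RegexFlags: OptionSet {
    let rawValue: Int
    static let global          = RegexFlags(rawValue: 1 << 0)
    static let caseInsensitive = RegexFlags(rawValue: 1 << 1)
    static let multiline       = RegexFlags(rawValue: 1 << 2)
}

// Splits "/body/flags" into its body and flag set.
// A source that is not wrapped in slashes is returned unchanged with only
// the Global flag set, matching the documented default.
// Returns nil for an unknown flag letter (an assumption of this sketch).
func splitPattern(_ source: String) -> (body: String, flags: RegexFlags)? {
    guard source.hasPrefix("/"),
          let lastSlash = source.lastIndex(of: "/"),
          lastSlash != source.startIndex else {
        return (body: source, flags: [.global])
    }
    let bodyStart = source.index(after: source.startIndex)
    let body = String(source[bodyStart..<lastSlash])
    var flags: RegexFlags = []
    for letter in source[source.index(after: lastSlash)...] {
        switch letter {
        case "g": flags.insert(.global)
        case "i": flags.insert(.caseInsensitive)
        case "m": flags.insert(.multiline)
        default: return nil
        }
    }
    return (body: body, flags: flags)
}

let parsed = splitPattern("/\\w+/ig")
// parsed?.body is the three characters \w+
// parsed?.flags contains .global and .caseInsensitive
```

Because the closing delimiter is taken to be the last slash in the source, any escaped \/ inside the pattern body stays part of the body.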

Supported Operations

Character Classes

Pattern   Description                     Supported
.         [^\n\r]                         [ ]
[^]       [\s\S]                          [ ]
\w        [A-Za-z0-9_]                    [ ]
\W        [^A-Za-z0-9_]                   [ ]
\d        [0-9]                           [ ]
\D        [^0-9]                          [ ]
\s        [\ \r\n\t\v\f]                  [ ]
\S        [^\ \r\n\t\v\f]                 [ ]
[ABC]     Any in the set                  [ ]
[^ABC]    Any not in the set              [ ]
[A-Z]     Any in the range inclusively    [ ]

Anchors (Match positions not characters)

Pattern   Description            Supported
^         Beginning of string    [ ]
$         End of string          [ ]
\b        Word boundary          [ ]
\B        Not word boundary      [ ]
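
Since anchors match positions rather than characters, a short sketch may help show what a \b position looks like. It assumes \w means [A-Za-z0-9_] as listed under Character Classes; `isWordCharacter` and `wordBoundaryOffsets` are hypothetical helpers, not part of this library.

```swift
// \w as listed under Character Classes: [A-Za-z0-9_]
func isWordCharacter(_ c: Character) -> Bool {
    return c.isASCII && (c.isLetter || c.isNumber || c == "_")
}

// Returns every offset (0...count) where \b would match: the positions
// where exactly one of the two neighbouring characters is a word character.
func wordBoundaryOffsets(in text: String) -> [Int] {
    let chars = Array(text)
    var offsets: [Int] = []
    for position in 0...chars.count {
        let before = position > 0 && isWordCharacter(chars[position - 1])
        let after = position < chars.count && isWordCharacter(chars[position])
        if before != after {
            offsets.append(position)
        }
    }
    return offsets
}

let offsets = wordBoundaryOffsets(in: "it's ok")
// offsets == [0, 2, 3, 4, 5, 7]
```

Every offset not in the result is a \B position under the same definition.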

Escaped Characters

Pattern   Description                  Supported
\0        Octal escaped character      [ ]
\00       Octal escaped character      [ ]
\000      Octal escaped character      [ ]
\xFF      Hex escaped character        [ ]
\uFFFF    Unicode escaped character    [ ]
\cA       Control character            [ ]
\t        Tab                          [ ]
\n        Newline                      [ ]
\v        Vertical tab                 [ ]
\f        Form feed                    [ ]
\r        Carriage return              [ ]
\0        Null                         [ ]
\.        .                            [ ]
\\        \                            [ ]
\+        +                            [ ]
\*        *                            [ ]
\?        ?                            [ ]
\^        ^                            [ ]
\$        $                            [ ]
\{        {                            [ ]
\}        }                            [ ]
\[        [                            [ ]
\]        ]                            [ ]
\(        (                            [ ]
\)        )                            [ ]
\/        /                            [ ]
\|        |                            [ ]
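
As an illustration of the numeric and control escapes listed above, the sketch below decodes \xFF, \uFFFF and \cA style escapes into a Unicode scalar. `decodeEscape` is a hypothetical helper. The mapping of \cA through \cZ to code points 1 through 26 follows the usual convention in other regex flavours and is an assumption here, not something the README states.

```swift
// Decodes the text that follows the backslash, e.g. "x41", "u00E9" or "cA".
// Returns nil for anything this sketch does not recognise.
func decodeEscape(_ escape: String) -> Unicode.Scalar? {
    guard let kind = escape.first else { return nil }
    let rest = String(escape.dropFirst())
    switch kind {
    case "x" where rest.count == 2, "u" where rest.count == 4:
        // Exactly two (hex) or four (unicode) hexadecimal digits.
        guard rest.allSatisfy({ $0.isHexDigit }),
              let value = UInt32(rest, radix: 16) else { return nil }
        return Unicode.Scalar(value)
    case "c":
        // Assumed convention: \cA ... \cZ map to code points 1 ... 26.
        guard rest.unicodeScalars.count == 1,
              let letter = rest.unicodeScalars.first,
              ("A"..."Z").contains(letter) else { return nil }
        return Unicode.Scalar(letter.value - 64)
    default:
        return nil
    }
}

let capitalA = decodeEscape("x41")    // U+0041
let eAcute = decodeEscape("u00E9")    // U+00E9
let controlA = decodeEscape("cA")     // U+0001
```

Octal escapes are left out of the sketch because \0 appears in the list both as an octal escape and as Null, and the README does not say how the two are told apart.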

Groups and Lookaround

Pattern        Description             Supported
(ABC)          Capture group           [ ]
(<name>ABC)    Named capture group     [ ]
\1             Back reference          [ ]
\'name'        Named back reference    [ ]
(?:ABC)        Non-capturing group     [ ]
(?=ABC)        Positive lookahead      [ ]
(?!ABC)        Negative lookahead      [ ]
(?<=ABC)       Positive lookbehind     [ ]
(?<!ABC)       Negative lookbehind     [ ]

Greedy Quantifiers

Pattern   Description     Supported
+         One or more     [ ]
*         Zero or more    [ ]
?         Optional        [ ]
{n}       n               [ ]
{,}       Same as *       [ ]
{,n}      n or less       [ ]
{n,}      n or more       [ ]
{n,m}     n to m          [ ]
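
Each quantifier above amounts to a minimum and an optional maximum repeat count, which is why {,} is the same as *. The sketch below makes that mapping explicit. `RepeatRange` and `repeatRange(for:)` are hypothetical names used only for illustration.

```swift
// Hypothetical repeat range; max == nil means "no upper bound".
struct RepeatRange: Equatable {
    var min: Int
    var max: Int?
}

// Maps a greedy quantifier spelling to its repeat range.
// Returns nil for text that is not one of the listed forms.
func repeatRange(for quantifier: String) -> RepeatRange? {
    switch quantifier {
    case "+": return RepeatRange(min: 1, max: nil)
    case "*": return RepeatRange(min: 0, max: nil)
    case "?": return RepeatRange(min: 0, max: 1)
    default: break
    }
    guard quantifier.hasPrefix("{"), quantifier.hasSuffix("}") else { return nil }
    let inner = quantifier.dropFirst().dropLast()
    let parts = inner.split(separator: ",", omittingEmptySubsequences: false)
    guard parts.count == 1 || parts.count == 2 else { return nil }
    // An empty lower bound means 0, as in {,n} and {,}.
    guard let low = parts[0].isEmpty ? 0 : Int(parts[0]), low >= 0 else { return nil }
    if parts.count == 1 {
        // {n}; a bare {} carries no bound and is rejected.
        return parts[0].isEmpty ? nil : RepeatRange(min: low, max: low)
    }
    if parts[1].isEmpty {
        return RepeatRange(min: low, max: nil)   // {n,} and {,}
    }
    guard let high = Int(parts[1]), high >= low else { return nil }
    return RepeatRange(min: low, max: high)      // {n,m} and {,n}
}

let star = repeatRange(for: "*")
let emptyBraces = repeatRange(for: "{,}")
// star == emptyBraces: both are "zero or more"
```

The lazy forms share the same ranges; only the preference for the shortest match over the longest differs.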

Lazy Quantifiers

Pattern   Description     Supported
+?        One or more     [ ]
*?        Zero or more    [ ]
??        Optional        [ ]
{n}?      n               [ ]
{,n}?     n or less       [ ]
{n,}?     n or more       [ ]
{n,m}?    n to m          [ ]

Alternation

Pattern   Description                                Supported
|         Everything before or everything after      [ ]

Flags

Pattern   Description         Supported
i         Case insensitive    [ ]
g         Global              [ ]
m         Multiline           [ ]

Inner Workings

(Similar to V1)

  • Lexer (String input to Tokens)
  • Parser (Tokens to NFA)
  • Compiler (NFA to DFA)
  • Optimizer (Simplifies the DFA for better performance, e.g. char(a), char(b) -> string(ab))
  • Engine (Matches an input String using the DFA)
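
The Compiler and Engine stages listed above can be sketched with the textbook subset construction. This is a generic illustration under stated assumptions, not the project's code: the `NFA` and `DFA` types, `epsilonClosure` and `compile` are all hypothetical, and the engine shown only tests whether a whole string matches.

```swift
// Hypothetical NFA: states are Ints; a nil input marks an epsilon transition.
struct NFA {
    var transitions: [Int: [(input: Character?, target: Int)]]
    var start: Int
    var accepting: Set<Int>
}

// Hypothetical DFA; state 0 is always the start state.
struct DFA {
    var transitions: [Int: [Character: Int]] = [:]
    var accepting: Set<Int> = []

    // Engine: follow one transition per input character.
    func matches(_ input: String) -> Bool {
        var state = 0
        for character in input {
            guard let next = transitions[state]?[character] else { return false }
            state = next
        }
        return accepting.contains(state)
    }
}

// All states reachable from `states` using only epsilon transitions.
func epsilonClosure(of states: Set<Int>, in nfa: NFA) -> Set<Int> {
    var closure = states
    var pending = Array(states)
    while let state = pending.popLast() {
        for edge in nfa.transitions[state] ?? [] where edge.input == nil {
            if closure.insert(edge.target).inserted {
                pending.append(edge.target)
            }
        }
    }
    return closure
}

// Compiler: each DFA state stands for a set of NFA states.
func compile(_ nfa: NFA) -> DFA {
    var dfa = DFA()
    let startSet = epsilonClosure(of: [nfa.start], in: nfa)
    var ids: [Set<Int>: Int] = [startSet: 0]
    var pending = [startSet]
    while let current = pending.popLast() {
        let id = ids[current]!
        if !current.isDisjoint(with: nfa.accepting) {
            dfa.accepting.insert(id)
        }
        // Group the NFA targets reachable from this subset by input character.
        var moves: [Character: Set<Int>] = [:]
        for state in current {
            for edge in nfa.transitions[state] ?? [] {
                if let input = edge.input {
                    moves[input, default: []].insert(edge.target)
                }
            }
        }
        for (input, targets) in moves {
            let next = epsilonClosure(of: targets, in: nfa)
            if ids[next] == nil {
                let newID = ids.count
                ids[next] = newID
                pending.append(next)
            }
            dfa.transitions[id, default: [:]][input] = ids[next]!
        }
    }
    return dfa
}

// An NFA for ab*: 0 -a-> 1, 1 -epsilon-> 2, 2 -b-> 1, with state 2 accepting.
let nfa = NFA(
    transitions: [
        0: [(input: "a", target: 1)],
        1: [(input: nil, target: 2)],
        2: [(input: "b", target: 1)],
    ],
    start: 0,
    accepting: [2]
)
let dfa = compile(nfa)
// dfa.matches("abb") is true; dfa.matches("ba") is false
```

Once compiled, matching costs one dictionary lookup per input character regardless of how many NFA states are active, which is the usual motivation for moving from an NFA to a DFA.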

Note

Swift treats \r\n as a single Character. Write \n\r instead when both characters are needed as separate Characters.
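
The behaviour is easy to confirm with the standard library alone, since CR followed by LF forms a single extended grapheme cluster:

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

print(crlf.count)                 // 1: CR LF is a single Character
print(lfcr.count)                 // 2: LF then CR stays two Characters
print(crlf.unicodeScalars.count)  // 2: both scalars are still present underneath
```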

Resources

Open Source Agenda is not affiliated with "DavidSkrundz Regex" Project. README Source: DavidSkrundz/Regex