Pire Save

Perl Incompatible Regular Expressions library

Project README

This is PIRE, Perl Incompatible Regular Expressions library.

This library is aimed at checking a huge amount of text against relatively many regular expressions. Roughly speaking, it can just check whether given text maches the certain regexp, but can do it really fast (more than 400 MB/s on our hardware is common). Even more, multiple regexps can be combined together, giving capability to check the text against apx.10 regexps in a single pass (and mantaining the same speed).

Since Pire examines each character only once, without any lookaheads or rollbacks, spending about five machine instructions per each character, it can be used even in realtime tasks.

On the other hand, Pire has very limited functionality (compared to other regexp libraries). Pire does not have any Perlish conditional regexps, lookaheads & backtrackings, greedy/nongreedy matches; neither has it any capturing facilities.

Pire was developed in Yandex (http://company.yandex.ru/) as a part of its web crawler.

More information can be found in README.ru (in Russian), which is yet to be translated.

Please report bugs to [email protected] or [email protected].

Quick Start

#include <stdio.h> #include #include <pire/pire.h>

Pire::NonrelocScanner CompileRegexp(const char* pattern) { // Transform the pattern from UTF-8 into UCS4 std::vectorPire::wchar32 ucs4; Pire::Encodings::Utf8().FromLocal(pattern, pattern + strlen(pattern), std::back_inserter(ucs4));

return Pire::Lexer(ucs4.begin(), ucs4.end())
	.AddFeature(Pire::Features::CaseInsensitive())	// enable case insensitivity
	.SetEncoding(Pire::Encodings::Utf8())		// set input text encoding
	.Parse() 					// create an FSM 
	.Surround()					// PCRE_ANCHORED behavior
	.Compile<Pire::NonrelocScanner>();		// compile the FSM

}

bool Matches(const Pire::NonrelocScanner& scanner, const char* ptr, size_t len) { return Pire::Runner(scanner) .Begin() // '^' .Run(ptr, len) // the text .End(); // '$' // implicitly cast to bool }

int main() { char re[] = "hello\s+w.+d$"; char str[] = "Hello world";

Pire::NonrelocScanner sc = CompileRegexp(re);

bool res = Matches(sc, str, strlen(str));

printf("String \"%s\" %s \"%s\"\n", str, (res ? "matches" : "doesn't match"), re);
	
return 0;

}

Open Source Agenda is not affiliated with "Pire" Project. README Source: yandex/pire
Stars
328
Open Issues
19
Last Commit
3 years ago
Repository
Tags

Open Source Agenda Badge

Open Source Agenda Rating