the original test environmet by dark100 at http://sljit.sourceforge.net/regex_perf.html

clone this test environmet

Performance comparison of regular expression engines

Introduction

Processing text or raw byte-sequence are among the common tasks performed by most software tools. These tasks usually involve pattern matching algorithms, and the most popular tool for such purpose is regular expressions. Regular expressions has been evolved a lot since Kleene defined the regular sets in the 1950's. Today we have several widely used regular expression engines which have different features which makes any performance comparison a difficult task, since a faster engine is not necessary better. Depending on the use case it might be enough to know whether a POSIX compatible regular expression matches to a line, even the position of the match is unneeded (grep utility). On the contrary other use cases require the position of capturing brackets, unicode support, conditional and atomic block (handling a byte sequence as a single character, like 'sch' in German language) support just to name a few. The former case needs a less sophisticated algorithm, which is likely be much faster than the latter, but again, that does not mean the former is better. More about these engine types can be found here.

Participants

The following popular engines were choosen: and Before anyone jump to any conclusions, I should note the followings:

Results

x86-64 bit Intel Cerelon 847 1.1GHz (GCC 7.2.0, Go 1.9.1 – GNU/Linux)

Regular expressiontrere2Golangre2pcre-JITpcre-DFApcreonigregexp4Golangregexp4regexp3
".|\n"
"."
6486ms (19285324)34159ms (19285324)9072ms (19285324)1130ms (19285324)6677ms (19285324)4448ms (19285324)12876ms (19285324)1553ms (19285324)605ms (19285324)1808ms (19285324)
"\d"
":d"
1044ms (27084)3166ms (27084)134ms (27084)55ms (27084)68ms (27084)65ms (27084)127ms (27084)1340ms (27084)594ms (27084)1853ms (27084)
"\S"
":S"
4633ms (15451664)32530ms (15451664)7405ms (15451664)931ms (15451664)4593ms (15451664)3086ms (15451664)10453ms (15451664)2056ms (15451664)810ms (15451664)1912ms (15451664)
"\S+"
":S+"
2565ms (3414592)8000ms (3414592)1957ms (3414592)333ms (3414592)1619ms (3414592)924ms (3414592)3114ms (3414592)1521ms (3414592)654ms (3414592)1059ms (3414592)
"\w"
":w"
4769ms (14751878)32506ms (14751878)7052ms (14751878)952ms (14751878)4537ms (14751878)2994ms (14751878)10865ms (14751878)1811ms (14750958)819ms (14750958)1888ms (14750958)
"[:w_]"
"\w"
4762ms (14751878)29682ms (14751878)7066ms (14751878)952ms (14751878)4486ms (14751878)2989ms (14751878)10663ms (14751878)2472ms (14751878)1160ms (14751878)3171ms (14751878)
"[a-zA-Z0-9_]"
"[a-zA-Z0-9_]"
4650ms (14751878)29437ms (14751878)7121ms (14751878)994ms (14751878)4538ms (14751878)3074ms (14751878)10483ms (14751878)2735ms (14751878)1104ms (14751878)5260ms (14751878)
"[a-zA-Z]+"
"[a-zA-Z]+"
2317ms (3495761)7582ms (3495761)2048ms (3495761)327ms (3495761)1551ms (3495761)1007ms (3495761)2956ms (3495761)2250ms (3495761)860ms (3495761)2771ms (3495761)
"[.\s]+"
"[.:s]+"
2205ms (991813)9541ms (3430783)2036ms (3430783)399ms (3430783)1037ms (3430783)949ms (3430783)2676ms (3430783)2851ms (3430783)1363ms (3430783)3741ms (3430783)
"([^\n]+)"
"<[^\\n]+>"
1591ms (314387)5611ms (314387)430ms (314387)82ms (314387)1038ms (314387)210ms (314387)697ms (314387)1301ms (314387)578ms (314387)788ms (314387)
"e"
"e"
520ms (1781425)2842ms (1781425)1000ms (1781425)139ms (1781425)439ms (1781425)380ms (1781425)1434ms (1781425)1739ms (1781425)663ms (1781425)1809ms (1781425)
"(((((e)))))"
"<<<<<e>>>>>"
529ms (1781425)4399ms (1781425)1001ms (1781425)203ms (1781425)1269ms (1781425)1256ms (1781425)1971ms (1781425)6168ms (1781425)2360ms (1781425)19939ms (1781425)
"((((((((((e))))))))))"
"<<<<<<<<<<e>>>>>>>>>>"
525ms (1781425)6709ms (1781425)1006ms (1781425)298ms (1781425)1850ms (1781425)1945ms (1781425)2176ms (1781425)12171ms (1781425)4034ms (1781425)58376ms (1781425)
"Twain"
"Twain"
1022ms (2388)12ms (2388)8ms (2388)49ms (2388)47ms (2388)10ms (2388)52ms (2388)1610ms (2388)589ms (2388)2450ms (2388)
"(Twain)"
"<Twain>"
1015ms (2388)13ms (2388)8ms (2388)49ms (2388)48ms (2388)14ms (2388)53ms (2388)2294ms (2388)927ms (2388)6027ms (2388)
"(?i)Twain"
"#*Twain"
1337ms (2657)3515ms (2657)166ms (2657)51ms (2657)287ms (2657)203ms (2657)349ms (2657)1616ms (2657)727ms (2657)2575ms (2657)
"(?i:Tw)(?:(?:a?A?[i_I]{0,2})?[nN])"
"(Tw)#*((a?A?[i_I]{0,2})?[nN])"
1375ms (2989)3474ms (2989)168ms (2989)56ms (2989)319ms (2989)236ms (2989)353ms (2989)2154ms (2989)1021ms (2989)9216ms (2989)
"[a-z]shing"
"[a-z]shing"
1597ms (1877)3614ms (1877)277ms (1877)47ms (1877)2302ms (1877)1487ms (1877)49ms (1877)3250ms (1877)1267ms (1877)5413ms (1877)
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
1779ms (396)4431ms (396)126ms (396)8ms (396)72ms (396)70ms (396)117ms (396)3894ms (396)1456ms (396)6599ms (396)
"[a-q][^u-z]{13}x"
"[a-q][^u-z]{13}x"
4446ms (5021)11181ms (5021)613ms (5021)5ms (5021)6169ms (5021)1742ms (5021)135ms (5021)9796ms (5021)3347ms (5021)10391ms (5021)
"Tom|Sawyer|Huckleberry|Finn"
"Tom|Sawyer|Huckleberry|Finn"
3151ms (3015)9003ms (3015)134ms (3015)89ms (3015)96ms (3015)97ms (3015)141ms (3015)6608ms (3015)2608ms (3015)10659ms (3015)
"(Tom|Sawyer|Huckleberry|Finn)"
"<Tom|Sawyer|Huckleberry|Finn>"
3302ms (3015)8876ms (3015)131ms (3015)87ms (3015)100ms (3015)102ms (3015)142ms (3015)7240ms (3015)2964ms (3015)23149ms (3015)
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
3656ms (534)4503ms (534)250ms (534)247ms (534)889ms (534)638ms (534)687ms (534)2903ms (534)1589ms (534)10916ms (534)
"Tom.{10,25}river|river.{10,25}Tom"
1844ms (2)4541ms (2)152ms (2)44ms (2)247ms (2)211ms (2)236ms (2)N/AN/AN/A
"ing[^a-zA-Z]"
"ing[^a-zA-Z]"
1149ms (85956)228ms (85956)111ms (85956)54ms (85956)243ms (85956)140ms (85956)140ms (85956)1611ms (85956)627ms (85956)3131ms (85956)
"[a-zA-Z]ing[^a-zA-Z]"
"[a-zA-Z]ing[^a-zA-Z]"
1859ms (85823)3884ms (85823)309ms (85823)56ms (85823)2376ms (85823)1549ms (85823)143ms (85823)3274ms (85823)1328ms (85823)6761ms (85823)
"([a-zA-Z]+ing)"
2070ms (95863)5411ms (95863)318ms (95863)213ms (95863)6616ms (95863)4174ms (95863)2603ms (95863)N/AN/AN/A

x86-64 bit AMD A6-9500 3.8GHz (GCC 7.2.1, Go 1.9.1 – GNU/Linux)

Regular expressiontrere2Golangre2pcre-JITpcre-DFApcreonigregexp4Golangregexp4regexp3
".|\n"
"."
2039ms (19285324)12477ms (19285324)3361ms (19285324)594ms (19285324)1673ms (19285324)1379ms (19285324)4188ms (19285324)559ms (19285324)190ms (19285324)660ms (19285324)
"\d"
":d"
347ms (27084)1064ms (27084)46ms (27084)16ms (27084)19ms (27084)20ms (27084)43ms (27084)478ms (27084)253ms (27084)669ms (27084)
"\S"
":S"
1468ms (15451664)9661ms (15451664)2779ms (15451664)339ms (15451664)1040ms (15451664)964ms (15451664)3649ms (15451664)721ms (15451664)287ms (15451664)709ms (15451664)
"\S+"
":S+"
908ms (3414592)2793ms (3414592)728ms (3414592)113ms (3414592)501ms (3414592)303ms (3414592)943ms (3414592)531ms (3414592)200ms (3414592)368ms (3414592)
"\w"
":w"
1483ms (14751878)9036ms (14751878)2637ms (14751878)345ms (14751878)1001ms (14751878)927ms (14751878)3387ms (14751878)637ms (14750958)305ms (14750958)693ms (14750958)
"\w"
"[:w_]"
1479ms (14751878)9049ms (14751878)2627ms (14751878)345ms (14751878)1003ms (14751878)920ms (14751878)3364ms (14751878)849ms (14751878)433ms (14751878)1190ms (14751878)
"[a-zA-Z0-9_]"
"[a-zA-Z0-9_]"
1476ms (14751878)9898ms (14751878)2636ms (14751878)346ms (14751878)1016ms (14751878)966ms (14751878)3384ms (14751878)920ms (14751878)390ms (14751878)1953ms (14751878)
"[a-zA-Z]+"
"[a-zA-Z]+"
782ms (3495761)3242ms (3495761)744ms (3495761)116ms (3495761)491ms (3495761)292ms (3495761)977ms (3495761)750ms (3495761)301ms (3495761)927ms (3495761)
"[.\s]+"
"[.:s]+"
658ms (991813)3080ms (3430783)684ms (3430783)117ms (3430783)329ms (3430783)276ms (3430783)865ms (3430783)955ms (3430783)484ms (3430783)1338ms (3430783)
"([^\n]+)"
"<[^\\n]+>"
562ms (314387)1909ms (314387)149ms (314387)32ms (314387)342ms (314387)64ms (314387)234ms (314387)433ms (314387)174ms (314387)246ms (314387)
"e"
"e"
166ms (1781425)952ms (1781425)339ms (1781425)54ms (1781425)149ms (1781425)120ms (1781425)440ms (1781425)581ms (1781425)231ms (1781425)679ms (1781425)
"(((((e)))))"
"<<<<<e>>>>>"
166ms (1781425)1472ms (1781425)347ms (1781425)94ms (1781425)433ms (1781425)381ms (1781425)540ms (1781425)2126ms (1781425)824ms (1781425)7649ms (1781425)
"((((((((((e))))))))))"
"<<<<<<<<<<e>>>>>>>>>>"
166ms (1781425)2267ms (1781425)342ms (1781425)136ms (1781425)841ms (1781425)617ms (1781425)678ms (1781425)3981ms (1781425)1465ms (1781425)24102ms (1781425)
"Twain"
"Twain"
334ms (2388)5ms (2388)3ms (2388)12ms (2388)16ms (2388)4ms (2388)13ms (2388)559ms (2388)211ms (2388)1030ms (2388)
"(Twain)"
"<Twain>"
333ms (2388)5ms (2388)3ms (2388)12ms (2388)16ms (2388)5ms (2388)14ms (2388)781ms (2388)336ms (2388)2506ms (2388)
"#*Twain"
"(?i)Twain"
445ms (2657)1204ms (2657)65ms (2657)13ms (2657)96ms (2657)71ms (2657)117ms (2657)543ms (2657)246ms (2657)1061ms (2657)
"(?i:Tw)(?:(?:a?A?[i_I]{0,2})?[nN])"
"(Tw)#*((a?A?[i_I]{0,2})?[nN])"
458ms (2989)1191ms (2989)68ms (2989)25ms (2989)102ms (2989)80ms (2989)113ms (2989)729ms (2989)334ms (2989)3617ms (2989)
"[a-z]shing"
"[a-z]shing"
524ms (1877)1320ms (1877)90ms (1877)11ms (1877)750ms (1877)498ms (1877)12ms (1877)1073ms (1877)436ms (1877)1837ms (1877)
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
"Huck[a-zA-Z]+|Saw[a-zA-Z]+"
548ms (396)1511ms (396)42ms (396)4ms (396)23ms (396)23ms (396)39ms (396)1271ms (396)494ms (396)2483ms (396)
"[a-q][^u-z]{13}x"
"[a-q][^u-z]{13}x"
1467ms (5021)3540ms (5021)359ms (5021)3ms (5021)1985ms (5021)588ms (5021)40ms (5021)3306ms (5021)1190ms (5021)3356ms (5021)
"Tom|Sawyer|Huckleberry|Finn"
"Tom|Sawyer|Huckleberry|Finn"
931ms (3015)2794ms (3015)44ms (3015)24ms (3015)30ms (3015)31ms (3015)47ms (3015)2295ms (3015)864ms (3015)4132ms (3015)
"(Tom|Sawyer|Huckleberry|Finn)"
"<Tom|Sawyer|Huckleberry|Finn>"
937ms (3015)2956ms (3015)44ms (3015)26ms (3015)31ms (3015)33ms (3015)47ms (3015)2511ms (3015)1011ms (3015)9374ms (3015)
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
1104ms (534)1526ms (534)90ms (534)81ms (534)288ms (534)203ms (534)227ms (534)954ms (534)594ms (534)3782ms (534)
"Tom.{10,25}river|river.{10,25}Tom"
604ms (2)1520ms (2)57ms (2)20ms (2)76ms (2)71ms (2)76ms (2)N/AN/AN/A
"ing[^a-zA-Z]"
"ing[^a-zA-Z]"
379ms (85956)76ms (85956)41ms (85956)23ms (85956)79ms (85956)48ms (85956)39ms (85956)592ms (85956)222ms (85956)1255ms (85956)
"[a-zA-Z]ing[^a-zA-Z]"
"[a-zA-Z]ing[^a-zA-Z]"
639ms (85823)1341ms (85823)103ms (85823)24ms (85823)767ms (85823)485ms (85823)39ms (85823)1143ms (85823)461ms (85823)2355ms (85823)
"([a-zA-Z]+ing)"
717ms (95863)1903ms (95863)106ms (95863)75ms (95863)1766ms (95863)1424ms (95863)925ms (95863)N/AN/AN/A

Compile

Deps

get data

  1. $ wget http://www.gutenberg.org/files/3200/old/mtent12.zip
  2. $ dos2unix mtent12.txt data.txt
  3. $ dos2unix data.txt
  4. $ rm mtent12*

build benchmarks

  1. $ ./rebuild.sh
  2. $ ./rebench.sh

clean

  1. $ ./reclean.sh