Regex Reference
The regular expression (regex) syntax and semantics implemented in BareGrep are common to PHP, Perl and Java.
BareGrep provides a common subset of regex functionality available in these languages.
Characters and Escapes
Logical Operators
Character Classes
Quantifiers
There is also an example.
Characters and Escapes
.
|
Any character
"." matches any character.
For example:
...
would match any three-character sequence.
To specify a literal ".", escape it with "\". For example:
www\.baremetalsoft\.com
would match "www.baremetalsoft.com".
|
x
|
The literal character x
All characters which do not have a special meaning in a regex match themselves.
For example:
fred
would match the string "fred".
|
\a
|
Alert character (bell)
BEL - ASCII code 07.
|
\cx
|
Control-x character
For example:
\cM
would be equivalent to the key sequence Control-M, i.e. the character with ASCII code 13 (0D hexadecimal).
|
\d
|
A digit
A digit from 0 to 9.
This is equivalent to the regex:
[0-9]
|
\D
|
Any non-digit
Any character which is not a digit.
This is equivalent to the regex:
[^0-9]
|
\e
|
Escape character
ESC - ASCII code 27 (1B hexadecimal).
|
\f
|
Form feed character
FF - ASCII code 12 (0C hexadecimal).
|
\n
|
New line character
LF - ASCII code 10 (0A hexadecimal).
|
\r
|
Carriage return character
CR - ASCII code 13 (0D hexadecimal).
|
\s
|
Any whitespace character
The whitespace characters include space, tab, new-line, carriage-return and form-feed.
This is equivalent to the regex:
[ \t\r\n\f]
|
\S
|
Any non-whitespace character
This is equivalent to the regex:
[^ \t\r\n\f]
|
\t
|
Tab character
A horizontal tab character.
HT - ASCII code 09.
|
\nnn
|
The character with octal value nnn
|
\w
|
Any word character
Any word character (in the set "A" to "Z", "a"
to "z", "0" to "9" and "_").
This is equivalent to the regex:
[0-9_A-Za-z]
|
\W
|
Any non-word character
Any character which is not a word character.
This is equivalent to the regex:
[^0-9_A-Za-z]
|
\xhh
|
The character with hexadecimal value hh
|
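The escapes above can be tried out with a short script. The following is only a sketch using Python's `re` module, which is an assumption made for illustration: it is not the engine BareGrep uses, and it differs in places (for instance, its "." does not match a newline by default, and it does not accept `\cx` or `\e`, so those are left out). The names `three`, `host` and `same_matches` are hypothetical.

```python
import re

# "..." matches any three-character sequence.
three = re.compile(r"...")

# Escaping the dots makes them literal, as in the table above.
host = re.compile(r"www\.baremetalsoft\.com")

def same_matches(shorthand, explicit, text):
    """True if a shorthand class and its explicit set find the same characters."""
    # re.ASCII keeps \d, \w and \s from also matching non-ASCII characters.
    return re.findall(shorthand, text, re.ASCII) == re.findall(explicit, text)

sample = "id_42 =\t0x1F;\r\n"
equivalents = [
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\s", r"[ \t\r\n\f]"),
    (r"\S", r"[^ \t\r\n\f]"),
    (r"\w", r"[0-9_A-Za-z]"),
    (r"\W", r"[^0-9_A-Za-z]"),
]
all_equivalent = all(same_matches(s, e, sample) for s, e in equivalents)

# \xhh and \nnn name a character by its hexadecimal or octal code.
cr_by_hex = re.fullmatch(r"\x0D", "\r") is not None
cr_by_octal = re.fullmatch(r"\015", "\r") is not None
```

On this sample text each shorthand class finds exactly the same characters as its explicit set, and both `\x0D` and `\015` match a carriage return.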
Logical Operators
XY
|
Catenation
Regex X followed by regex Y.
For example:
abc
would match the string "abc".
|
X|Y
|
Alternation
X or Y
For example:
ERROR|FATAL
would match "ERROR" or "FATAL".
|
(?:X)
|
Group
Grouping and operator precedence override. For example:
(?:A|B)(?:C|D)
would match "AC", "BC", "AD" or "BD". Whereas:
A|BC|D
would match "A", "BC", or "D".
|
(X)
|
Capturing Group
Grouping and capturing of the regex X.
Capturing causes the string which matched the regex X to be displayed
in a separate column in BareGrep.
Capturing groups also override operator precedence. For example:
(A|B)(C|D)
would match "AC", "BC", "AD" or "BD". Whereas:
A|BC|D
would match "A", "BC", or "D".
Note: Capturing involves a significant performance overhead (the search runs slower),
so it is preferable to use non-capturing groups when capturing is not required.
Nesting of capturing groups can result in regexes which are particularly
slow to execute.
|
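The effect of grouping on alternation, and the difference between capturing and non-capturing groups, can be checked with a small sketch. It assumes Python's `re` module rather than BareGrep itself, and the helper `matches` is a hypothetical name introduced for this example.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["A", "D", "AC", "AD", "BC", "BD"]

# With groups, each alternation is confined to its own parentheses.
grouped = matches(r"(?:A|B)(?:C|D)", candidates)

# Without groups, "|" splits the whole regex into A, BC and D.
ungrouped = matches(r"A|BC|D", candidates)

# A capturing group records what it matched; a non-capturing group does not.
captured = re.fullmatch(r"(A|B)(C|D)", "BC").groups()
not_captured = re.fullmatch(r"(?:A|B)(?:C|D)", "BC").groups()
```

Here `captured` holds the two sub-matches that BareGrep would display in separate columns, while `not_captured` is empty.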
Character Classes
[abc]
|
Character Set
A single a, b or c character.
For example:
[0123456789ABCDEFabcdef]
would match any hexadecimal digit character
(in the set "0" to "9", "A" to "F" and "a" to "f").
|
[^abc]
|
Inverse Character Set
Any character other than a, b or c.
For example:
[^0123456789ABCDEFabcdef]
would match any character which is not a hexadecimal digit character.
|
[a-b]
|
Character Set Range
A character in the range a to b.
For example:
[0-9_A-Za-z]
would match any word character (in the set "A" to "Z", "a"
to "z", "0" to "9" and "_").
|
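As a quick check of the three forms above, the sketch below compares a listed set, the same set written with ranges, and the inverse set. Python's `re` module is assumed for illustration only, and `chars_matching` is a hypothetical helper.

```python
import re

def chars_matching(pattern, text):
    """Join every single character in text that the class matches."""
    return "".join(re.findall(pattern, text))

text = "0x3fA9-Gz"

# The hexadecimal digits, listed one by one and written as ranges.
listed = chars_matching(r"[0123456789ABCDEFabcdef]", text)
ranged = chars_matching(r"[0-9A-Fa-f]", text)

# The inverse set picks up everything the set above leaves out.
inverse = chars_matching(r"[^0123456789ABCDEFabcdef]", text)
```

The listed and ranged forms select the same characters, and the inverse set selects the remainder.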
Quantifiers
X*
|
Kleene Closure
The regex X zero or more times.
For example:
.*
would match anything (or nothing, because it may match zero times).
As another example:
A\s*=\s*B
would match "A=B", "A = B" or even "A= B" (ignoring whitespace around the "=").
|
X+
|
Positive Closure
The regex X one or more times.
For example:
\d+
would match a sequence of digits that is at least one character in length.
|
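The difference between "zero or more" and "one or more" is easiest to see on the empty string. The following sketch assumes Python's `re` module, not BareGrep's own engine, and the variable names are hypothetical.

```python
import re

assignment = re.compile(r"A\s*=\s*B")
digits = re.compile(r"\d+")

# "\s*" tolerates any amount of whitespace, including none, around "=".
accepted = [s for s in ["A=B", "A = B", "A= B", "A - B"] if assignment.fullmatch(s)]

# X* may match zero times, so ".*" also matches the empty string.
star_matches_empty = re.fullmatch(r".*", "") is not None

# X+ needs at least one occurrence, so "\d+" rejects the empty string.
plus_matches_empty = digits.fullmatch("") is not None

# "\d+" takes the whole run of digits it first finds.
first_number = digits.search("status 302 after 288 bytes").group()
```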
Example
Question
Given the following lines:
#Fields: date time c-ip cs-username s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken s-port cs(User-Agent)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/ - 302 288 241 0 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/login.html - 200 1337 242 125 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/FAHC/sm_idoclogo2.gif - 200 1898 310 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /intradoc-cgi/idc_cgi_isapi.dll IdcService=LOGIN&Action=GetTemplatePage&Page=HOME_PAGE&Auth=Intranet 200 15431 546 141 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /intradoc-cgi/idc_cgi_isapi.dll IdcService=LOGIN&Action=GetTemplatePage&Page=HOME_PAGE&Auth=Intranet 200 23943 768 390 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/enthome2.gif - 200 650 494 32 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/enthome.gif - 200 662 493 62 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/home.gif - 200 523 490 62 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/home2.gif - 200 525 491 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/library2.gif - 200 698 494 47 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/library.gif - 200 701 493 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search2.pdf - 200 570 493 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/help2.gif - 200 553 491 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search.gif - 200 574 492 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
I need to generate a report with the FAHC\username and the .pdf file
they accessed. I can do either one individually, but not sure how to do both.
Here's the regex that works for the username:
(FAHC\\\S+)
and the regex that works for the .pdf file:
(\S+\.pdf)
but how do I format the "find" field for both?
Answer
In this case, as every line has the same format, I would first try to
construct a regex which matches the entire line.
So I'd start with something like:
\S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+
Then I'd pick out the columns I'm interested in:
\S+ \S+ \S+ (\S+) \S+ \S+ \S+ (\S+) \S+ \S+ \S+ \S+ \S+ \S+ \S+
You can then refine the sub-regex for the two columns you're
interested in:
\S+ \S+ \S+ FAHC\\(\S+) \S+ \S+ \S+ (\S+\.pdf) \S+ \S+ \S+ \S+ \S+ \S+ \S+
There are various other ways this could be done, but this is the
first way that sprang to mind.
Another way would be:
FAHC\\(\S+).* (\S+\.pdf)
This uses ".*" in the middle, which matches any run of characters between the username and the file name.
|
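Both answers can be tried against two of the log lines from the question. This is a sketch under the assumption that Python's `re` module behaves like BareGrep for these patterns; `report`, `WHOLE_LINE` and `SHORT` are hypothetical names, and the whole-line pattern is written with every field as `\S+`.

```python
import re

AGENT = ("Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;"
         "+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)")

# Two lines copied from the question: one .pdf request by a FAHC user,
# one .gif request with no username.
LINES = [
    r"2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET "
    r"/xpedio/images/xpedio/search2.pdf - 200 570 493 31 80 " + AGENT,
    r"2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET "
    r"/xpedio/images/xpedio/help2.gif - 200 553 491 31 80 " + AGENT,
]

# First approach: one \S+ per field, capturing the two fields of interest.
WHOLE_LINE = re.compile(
    r"\S+ \S+ \S+ FAHC\\(\S+) \S+ \S+ \S+ (\S+\.pdf) "
    r"\S+ \S+ \S+ \S+ \S+ \S+ \S+"
)

# Second approach: ".*" skips whatever lies between the two captures.
SHORT = re.compile(r"FAHC\\(\S+).* (\S+\.pdf)")

def report(pattern, lines):
    """Return (username, file) pairs for the lines the pattern matches."""
    rows = []
    for line in lines:
        m = pattern.search(line)
        if m:
            rows.append(m.groups())
    return rows

whole_rows = report(WHOLE_LINE, LINES)
short_rows = report(SHORT, LINES)
```

Both patterns yield the same single row for these two lines: the username without the `FAHC\` prefix, and the path of the .pdf file. The second line produces no row because it has neither a FAHC username nor a .pdf path.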