Bare Metal Software Home

Last updated May 04 2016 11:09:43
Copyright © 2003, 2004, 2005, 2006 Bare Metal Software Pty Ltd.

Home | BareTail | BareTailPro | BareGrep | BareGrepPro | Buy Now | News | Contact Us

BareGrep

Version 3.50a 2006-11-02 What's new?
Win32 (Windows 95, 98, ME, NT, 2000, XP, 2003)

Free Download - baregrep.exe (246k) Licence

Buy Now - Only $US 25 Licence
- Option to disable startup splash screen


Regex Reference

The regular expression (regex) syntax and semantics implemented in BareGrep are common to PHP, Perl and Java. BareGrep provides a common subset of regex functionality available in these languages.

Characters and Escapes

Logical Operators

Character Classes

Quantifiers

There is also an example.


Characters and Escapes

.

Any character

"." matches any character.

For example:

...

would match any three character sequence.

To specify a literal ".", escape it with "\". For example:

www\.baremetalsoft\.com

would match "www.baremetalsoft.com".
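
The difference between "." and "\." can be checked outside BareGrep. The following is a minimal sketch using Python's re module, chosen here only for illustration; the test string "wwwXbaremetalsoft.com" is made up.

```python
import re

# Unescaped "." matches any single character, so a string with "X" in place
# of the first dot still matches.
loose = re.compile(r"www.baremetalsoft.com")
# Escaped "\." matches only a literal dot.
strict = re.compile(r"www\.baremetalsoft\.com")

loose_hit = loose.search("wwwXbaremetalsoft.com") is not None
strict_hit = strict.search("wwwXbaremetalsoft.com") is not None
literal_hit = strict.search("www.baremetalsoft.com") is not None

# "..." consumes three characters at a time; the trailing "g" is left over.
three_chars = re.findall(r"...", "abcdefg")
```

Only the escaped form rejects the made-up string, which is why literal dots in host names and file extensions should be escaped.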

x

The literal character x

All characters which do not have a special meaning in a regex match themselves.

For example:

fred

would match the string "fred".

\a

Alert character (bell)

BEL - ASCII code 07.

\cx

Control-x character

For example:

\cM

is equivalent to the key sequence Control-M, which is the character with ASCII code 0D hexadecimal (carriage return).

\d

A digit

A digit from 0 to 9.

This is equivalent to the regex:

[0-9]

\D

Any non-digit

Any character which is not a digit.

This is equivalent to the regex:

[^0-9]
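
As a quick check of the \d and \D equivalences, here is a small sketch in Python (an assumption for illustration, not part of BareGrep). Python's \d also matches non-ASCII digits by default, so the re.ASCII flag is used to get the 0 to 9 behaviour described here.

```python
import re

sample = "a1b22"  # made-up input

# re.ASCII restricts \d to the characters 0-9.
digits = re.findall(r"\d", sample, flags=re.ASCII)
non_digits = re.findall(r"\D", sample, flags=re.ASCII)

# The shorthand and the explicit classes select the same characters.
same_as_class = digits == re.findall(r"[0-9]", sample)
same_as_negated = non_digits == re.findall(r"[^0-9]", sample)
```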

\e

Escape character

ESC - ASCII code 27 (1B hexadecimal).

\f

Form feed character

FF - ASCII code 12 (0C hexadecimal).

\n

New line character

LF - ASCII code 10 (0A hexadecimal).

\r

Carriage return character

CR - ASCII code 13 (0D hexadecimal).

\s

Any whitespace character

The whitespace characters include space, tab, new-line, carriage-return and form-feed.

This is equivalent to the regex:

[ \t\r\n\f]

\S

Any non-whitespace character

This is equivalent to the regex:

[^ \t\r\n\f]
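
The whitespace shorthands can be tried in the same way. This is a sketch in Python, used here as an assumption for illustration; note that Python's \s additionally matches vertical tab, which is not in the set listed above, so the sample avoids that character.

```python
import re

# Made-up input containing a space, a tab, a carriage return and a new line.
text = "A \t=\r\nB"

whitespace = re.findall(r"\s", text)
explicit = re.findall(r"[ \t\r\n\f]", text)

# \S+ picks out the runs of non-whitespace characters.
tokens = re.findall(r"\S+", text)
```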

\t

Tab character

A horizontal tab character.

HT - ASCII code 09.

\nnn

The character with octal value nnn

\w

Any word character

Any word character (in the set "A" to "Z", "a" to "z", "0" to "9" and "_").

This is equivalent to the regex:

[0-9_A-Za-z]

\W

Any non-word character

Any character which is not a word character. This is equivalent to the regex:

[^0-9_A-Za-z]

\xhh

The character with hexadecimal value hh
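
The numeric escapes \nnn and \xhh name a character by its code. A minimal sketch in Python (an assumption for illustration; Python's re module accepts both forms) using values from the ASCII table:

```python
import re

# 41 hexadecimal and 101 octal are both 65 decimal, the letter "A".
hex_a = re.fullmatch(r"\x41", "A") is not None
octal_a = re.fullmatch(r"\101", "A") is not None

# 09 and 0D hexadecimal are the tab and carriage return characters,
# the same characters that \t and \r match.
tab_by_hex = re.fullmatch(r"\x09", "\t") is not None
cr_by_hex = re.fullmatch(r"\x0D", "\r") is not None
```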


Logical Operators

XY

Catenation

Regex X followed by regex Y.

For example:

abc

would match the string "abc".

X|Y

Alternation

X or Y

For example:

ERROR|FATAL

would match "ERROR" or "FATAL".

(?:X)

Group

Grouping, used to override operator precedence. For example:

(?:A|B)(?:C|D)

would match "AC", "BC", "AD" or "BD". Whereas:

A|BC|D

would match "A", "BC", or "D".
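
The effect of grouping on precedence can be verified with a short sketch. Python is used here as an assumption for illustration, and re.fullmatch is used so that each candidate string must match in its entirety (BareGrep itself searches within lines).

```python
import re

candidates = ["AC", "BC", "AD", "BD", "A", "D"]

# With groups: one of A/B followed by one of C/D.
grouped = [s for s in candidates if re.fullmatch(r"(?:A|B)(?:C|D)", s)]

# Without groups: alternation binds loosest, so the alternatives
# are "A", "BC" and "D".
ungrouped = [s for s in candidates if re.fullmatch(r"A|BC|D", s)]
```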

(X)

Capturing Group

Grouping and capturing of the regex X. Capturing causes the string which matched the regex X to be displayed in a separate column in BareGrep.

Capturing groups also override operator precedence. For example:

(A|B)(C|D)

would match "AC", "BC", "AD" or "BD". Whereas:

A|BC|D

would match "A", "BC", or "D".

Note: Capturing involves a significant performance overhead (the search runs slower), so it is preferable to use non-capturing groups when capturing is not required. Nested capturing groups can produce regexes which are particularly slow to execute.
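
To illustrate what a capturing group records compared with a non-capturing one, here is a sketch in Python (an assumption for illustration; the sample line is hypothetical). The tuple returned by groups() plays roughly the role of the separate columns BareGrep displays.

```python
import re

line = "level=ERROR code=42"  # hypothetical log line

capturing = re.search(r"level=(ERROR|FATAL) code=(\d+)", line)
non_capturing = re.search(r"level=(?:ERROR|FATAL) code=\d+", line)

# Both regexes match the same text, but only the first records sub-matches.
captured = capturing.groups()
uncaptured = non_capturing.groups()
same_text = capturing.group(0) == non_capturing.group(0)
```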


Character Classes

[abc]

Character Set

A single a, b or c character.

For example:

[0123456789ABCDEFabcdef]

would match any hexadecimal digit character (in the set "0" to "9", "A" to "F" and "a" to "f").

[^abc]

Inverse Character Set

Any character other than a, b or c.

For example:

[^0123456789ABCDEFabcdef]

would match any character which is not a hexadecimal digit character.

[a-b]

Character Set Range

A character in the range a to b.

For example:

[0-9_A-Za-z]

would match any word character (in the set "A" to "Z", "a" to "z", "0" to "9" and "_").
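
Ranges are a shorthand for listing every character. As a sketch in Python (an assumption for illustration; the input "0x1fG9" is made up), the hexadecimal digit set from the earlier example can be written with ranges and gives the same result:

```python
import re

sample = "0x1fG9"

listed = re.findall(r"[0123456789ABCDEFabcdef]", sample)
ranged = re.findall(r"[0-9A-Fa-f]", sample)

# The inverse set picks out everything that is not a hexadecimal digit.
non_hex = re.findall(r"[^0-9A-Fa-f]", sample)
```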


Quantifiers

X*

Kleene Closure

The regex X zero or more times.

For example:

.*

would match anything (or nothing, because it may match zero times).

For example:

A\s*=\s*B

would match "A=B", "A = B" or even "A= B" (ignoring whitespace around the "=").

X+

Positive Closure

The regex X one or more times.

For example:

\d+

would match a sequence of digits that is at least one character in length.
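
The two quantifiers can be compared with a short sketch. Python is used as an assumption for illustration, and the inputs are made up apart from the "A=B" variants quoted above.

```python
import re

# "\s*" allows zero or more whitespace characters on either side of "=".
assignment = re.compile(r"A\s*=\s*B")
samples = ["A=B", "A = B", "A= B", "A - B"]
matched = [s for s in samples if assignment.search(s)]

# "\d+" needs at least one digit per match.
numbers = re.findall(r"\d+", "302 288 241 0 80")

# "*" accepts the empty string; "+" does not.
empty_ok = re.fullmatch(r".*", "") is not None
needs_one = re.fullmatch(r"\d+", "") is None
```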


Example

Question

Given the following lines:

#Fields: date time c-ip cs-username s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken s-port cs(User-Agent)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/ - 302 288 241 0 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/login.html - 200 1337 242 125 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/FAHC/sm_idoclogo2.gif - 200 1898 310 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /intradoc-cgi/idc_cgi_isapi.dll IdcService=LOGIN&Action=GetTemplatePage&Page=HOME_PAGE&Auth=Intranet 200 15431 546 141 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /intradoc-cgi/idc_cgi_isapi.dll IdcService=LOGIN&Action=GetTemplatePage&Page=HOME_PAGE&Auth=Intranet 200 23943 768 390 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/enthome2.gif - 200 650 494 32 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/enthome.gif - 200 662 493 62 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/home.gif - 200 523 490 62 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/home2.gif - 200 525 491 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/library2.gif - 200 698 494 47 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/library.gif - 200 701 493 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search2.pdf - 200 570 493 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/help2.gif - 200 553 491 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search.gif - 200 574 492 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)

I need to generate a report with the FAHC\username and the .pdf file they accessed. I can do either one individually, but I am not sure how to do both.

Here's the regex that works for the username:

(FAHC\\\S+)

and the regex that works for the .pdf file:

(\S+\.pdf)

but how do I format the "find" field for both?

Answer

In this case, as every line has the same format, I would first try to construct a regex which matches the entire line.

So I'd start with something like:

\S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+

Then I'd pick out the columns I'm interested in:

\S+ \S+ \S+ (\S+) \S+ \S+ \S+ (\S+) \S+ \S+ \S+ \S+ \S+ \S+ \S+

You can then refine the sub-regex for the two columns you're interested in:

\S+ \S+ \S+ FAHC\\(\S+) \S+ \S+ \S+ (\S+\.pdf) \S+ \S+ \S+ \S+ \S+ \S+ \S+

There are various other ways this could be done, but this is the first way that sprang to mind.

Another way would be:

FAHC\\(\S+).* (\S+\.pdf)

This uses ".*" in the middle, which matches any run of characters (including none) between the user name and the file name.
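
Both answers can be tried against the .pdf line from the question. The following is a sketch in Python, used as an assumption for illustration (BareGrep would show the two captures as columns); the full-line regex is written with every field as "\S+".

```python
import re

# The .pdf request line from the question, copied as a raw string.
line = (
    r"2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET "
    r"/xpedio/images/xpedio/search2.pdf - 200 570 493 31 80 "
    r"Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;"
    r"+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)"
)

# Whole-line regex: fifteen fields, capturing the 4th and 8th.
full = re.compile(
    r"\S+ \S+ \S+ FAHC\\(\S+) \S+ \S+ \S+ (\S+\.pdf)"
    r" \S+ \S+ \S+ \S+ \S+ \S+ \S+"
)
# Shorter regex with ".*" between the two captures.
short = re.compile(r"FAHC\\(\S+).* (\S+\.pdf)")

full_groups = full.search(line).groups()
short_groups = short.search(line).groups()

# A request for a .gif file by the same user should not be reported.
gif_line = line.replace("search2.pdf", "search2.gif")
gif_match = short.search(gif_line)
```

Both regexes yield the same two captures for this line, so the shorter form is sufficient here; the whole-line form is stricter about which column each value comes from.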


