Bare Metal Software Home

Last updated March 08 2007 06:11:42
Copyright © 2003, 2004, 2005, 2006 Bare Metal Software Pty Ltd.


BareGrep

Release 3.50a 2006-11-02 What's new?
Win32 (Windows 95, 98, ME, NT, 2000, XP, 2003, Vista)

Free Version - baregrep.exe (246k) Licence
- Startup splash screen cannot be disabled

Registered Version - Only $US 25 Licence
- Option to disable startup splash screen

Regex Reference

The regular expression (regex) syntax and semantics implemented in BareGrep are common to PHP, Perl and Java.

Characters and Escapes

Logical Operators

Character Classes

Quantifiers

Assertions

There is also an example.


Characters and Escapes

.

Any character

"." matches any character.

For example:

...

would match any three character sequence.

To specify a literal ".", escape it with "\". For example:

www\.baremetalsoft\.com

would match "www.baremetalsoft.com".
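As a rough illustration of the difference between an unescaped and an escaped ".", the sketch below uses Python's `re` module. Python is an assumption made here for demonstration only; it is not part of BareGrep, but it shares this part of the syntax. The sample strings are invented.

```python
import re

# Unescaped "." matches any single character, so "..." matches any three characters.
any_three = re.compile(r"...")
print(any_three.findall("abcdef"))  # ['abc', 'def']

# Escaped "\." matches only a literal dot.
literal = re.compile(r"www\.baremetalsoft\.com")
unescaped = re.compile(r"www.baremetalsoft.com")

print(bool(literal.search("www.baremetalsoft.com")))    # True
print(bool(literal.search("wwwXbaremetalsoftXcom")))    # False
print(bool(unescaped.search("wwwXbaremetalsoftXcom")))  # True: "." also matches "X"
```

Leaving the dots unescaped is a common source of false positives when searching for host names or file extensions.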

x

The literal character x

All characters which do not have a special meaning in a regex match themselves.

For example:

fred

would match the string "fred".

\a

Alert character (bell)

BEL - ASCII code 07.

\cx

Control-x character

For example:

\cM

would be equivalent to the key sequence Control-M, or the character with ASCII code 13 (0D hexadecimal).

\d

A digit

A digit from 0 to 9.

This is equivalent to the regex:

[0-9]

\D

Any non-digit

Any character which is not a digit.

This is equivalent to the regex:

[^0-9]
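The equivalence between the shorthand escapes and the explicit character sets can be checked with a small sketch. It uses Python's `re` module, which is an assumption for illustration and not something BareGrep provides; the sample string is invented.

```python
import re

# re.ASCII limits \d to 0-9, matching the definition given above
# (without it, Python's \d also accepts other Unicode digits).
digit = re.compile(r"\d", re.ASCII)
non_digit = re.compile(r"\D", re.ASCII)

sample = "Room 42B"
print(digit.findall(sample))      # ['4', '2']
print(non_digit.findall(sample))  # ['R', 'o', 'o', 'm', ' ', 'B']

# The shorthand and the explicit class select the same characters.
print(digit.findall(sample) == re.findall(r"[0-9]", sample))       # True
print(non_digit.findall(sample) == re.findall(r"[^0-9]", sample))  # True
```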

\e

Escape character

ESC - ASCII code 27 (1B hexadecimal).

\f

Form feed character

FF - ASCII code 12 (0C hexadecimal).

\r

Carriage return character

CR - ASCII code 13 (0D hexadecimal). Carriage return characters are automatically stripped from the ends of lines. However, this escape can be used to match a carriage return character which is not followed by a new-line character (ASCII code 10, 0A hexadecimal).

Note: to match the start-of-line use the "^" assertion. To match the end-of-line use the "$" assertion.

\s

Any whitespace character

The whitespace characters include space, tab, new-line, carriage-return and form-feed.

This is equivalent to the regex:

[ \t\r\n\f]

\S

Any non-whitespace character

This is equivalent to the regex:

[^ \t\r\n\f]

\t

Tab character

A horizontal tab character.

HT - ASCII code 09.

\nnn

The character with octal value nnn

\w

Any word character

Any word character (in the set "A" to "Z", "a" to "z", "0" to "9" and "_").

This is equivalent to the regex:

[0-9_A-Za-z]

\W

Any non-word character

Any non-word character. This is equivalent to the regex:

[^0-9_A-Za-z]
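A short sketch of the word and non-word escapes, again using Python's `re` module as an assumed stand-in for experimenting outside BareGrep. The sample string is invented.

```python
import re

sample = "user_id=42; ok?"

# re.ASCII keeps \w to A-Z, a-z, 0-9 and "_", as described above.
words = re.findall(r"\w+", sample, re.ASCII)
print(words)  # ['user_id', '42', 'ok']

# The explicit class gives the same result.
print(words == re.findall(r"[0-9_A-Za-z]+", sample))  # True

# \W picks out everything else.
non_words = re.findall(r"\W", sample, re.ASCII)
print(non_words)  # ['=', ';', ' ', '?']
```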

\xhh

The character with hexadecimal value hh
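The octal and hexadecimal escapes can be tried out as follows. This is only a sketch in Python's `re` module (an assumption, chosen because it accepts the same `\nnn` and `\xhh` forms); the test strings are invented.

```python
import re

# "A" is ASCII code 65: 101 in octal, 41 in hexadecimal.
octal_a = re.compile(r"\101")
hex_a = re.compile(r"\x41")

print(octal_a.findall("BANANA"))  # ['A', 'A', 'A']
print(hex_a.findall("BANANA"))    # ['A', 'A', 'A']

# A tab written as a hex escape: HT is ASCII code 09.
tab_found = bool(re.search(r"\x09", "name\tvalue"))
print(tab_found)  # True
```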


Logical Operators

XY

Catenation

The regex X followed by the regex Y.

For example:

abc

would match the string "abc".

X|Y

Alternation

X or Y

For example:

ERROR|FATAL

would match "ERROR" or "FATAL".

(?:X)

Group

Grouping and operator precedence override. For example:

(?:A|B)(?:C|D)

would match "AC", "BC", "AD" or "BD". Whereas:

A|BC|D

would match "A", "BC", or "D".
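The effect of grouping on precedence can be seen by testing both forms against the same candidates. The sketch below uses Python's `re` module, which is an assumption for illustration; `fullmatch` is used so that only whole-string matches count.

```python
import re

grouped = re.compile(r"(?:A|B)(?:C|D)")
ungrouped = re.compile(r"A|BC|D")

candidates = ["AC", "BC", "AD", "BD", "A", "D"]

# fullmatch requires the whole string to match, which makes the precedence visible.
grouped_hits = [s for s in candidates if grouped.fullmatch(s)]
ungrouped_hits = [s for s in candidates if ungrouped.fullmatch(s)]

print(grouped_hits)    # ['AC', 'BC', 'AD', 'BD']
print(ungrouped_hits)  # ['BC', 'A', 'D']
```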

(X)

Capturing group

Grouping and capturing of the regex X. Capturing causes the string which matched the regex X to be displayed in a separate column in BareGrep.

Capturing groups also imply operator precedence override. For example:

(A|B)(C|D)

would match "AC", "BC", "AD" or "BD". Whereas:

A|BC|D

would match "A", "BC", or "D".

Note: Capturing involves a significant performance overhead (the search runs slower), so it is preferable to use non-capturing groups if capturing is not required. Nested capturing groups can result in regexes which are particularly slow to execute.
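To see what a capturing group records compared with a non-capturing one, here is a sketch using Python's `re` module (an assumption; BareGrep shows captures as columns, whereas Python returns them from `groups()`). The log line is invented.

```python
import re

line = "2005-01-04 00:31:32 ERROR disk full"

capturing = re.compile(r"(ERROR|FATAL) (.*)")
non_capturing = re.compile(r"(?:ERROR|FATAL) .*")

# Each capturing group yields its own value, much like a separate column.
captured = capturing.search(line).groups()
print(captured)  # ('ERROR', 'disk full')

# The non-capturing form matches the same text but records no groups.
plain = non_capturing.search(line)
print(plain.groups())  # ()
print(plain.group(0))  # ERROR disk full
```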


Character Classes

[abc]

Character set

A single a, b or c character.

For example:

[0123456789ABCDEFabcdef]

would match any hexadecimal digit character (in the set "0" to "9", "A" to "F" and "a" to "f").

[^abc]

Inverse character set

Any character other than a, b or c.

For example:

[^0123456789ABCDEFabcdef]

would match any character which is not a hexadecimal digit character.

[a-b]

Character set range

A character in the range a to b.

For example:

[0-9_A-Za-z]

would match any word character (in the set "A" to "Z", "a" to "z", "0" to "9" and "_").
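Ranges can shorten the hexadecimal digit sets shown earlier. The following sketch, in Python's `re` module (an assumption for illustration), applies the range form and its inverse to an invented sample string.

```python
import re

# Ranges: the same set as [0123456789ABCDEFabcdef].
hex_digit = re.compile(r"[0-9A-Fa-f]")
# Inverse set: anything that is not a hexadecimal digit.
not_hex_digit = re.compile(r"[^0-9A-Fa-f]")

sample = "0x1F, gz"
print(hex_digit.findall(sample))      # ['0', '1', 'F']
print(not_hex_digit.findall(sample))  # ['x', ',', ' ', 'g', 'z']
```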


Quantifiers

X*

Kleene star

The regex X zero or more times.

For example:

.*

would match anything (or nothing, because it may match zero times).

For example:

A\s*=\s*B

would match "A=B", "A = B" or even "A= B" (ignoring whitespace around the "=").
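The whitespace-tolerant pattern above can be checked against a few variants. This sketch uses Python's `re` module, which is an assumption for illustration; the extra test strings are invented.

```python
import re

assignment = re.compile(r"A\s*=\s*B")

texts = ["A=B", "A = B", "A= B", "A   =\tB", "A-B"]

# \s* allows any amount of whitespace (including none) on either side of "=".
results = [bool(assignment.fullmatch(t)) for t in texts]
print(results)  # [True, True, True, True, False]
```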

X+

Kleene plus

The regex X one or more times.

For example:

\d+

would match a sequence of digits that is at least one character in length.

X?

Zero or one

The regex X zero or one times.

For example:

\d?

would match zero or one digits only.

X{n}

Exactly n times

The regex X exactly n times.

For example:

\d{4}

would match exactly 4 digits.

X{n,}

At least n times

The regex X at least n times.

For example:

\d{4,}

would match 4 or more digits.

X{n,m}

Between n and m times

The regex X at least n times, but no more than m times.

For example:

\d{4,6}

would match 4, 5 or 6 digits.
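The quantifiers can be compared side by side by testing each against digit strings of different lengths. The sketch below uses Python's `re` module (an assumption for illustration); `full_matches` is a hypothetical helper, and `fullmatch` is used because a plain search would also find a shorter run of digits inside a longer one.

```python
import re

samples = ["", "7", "123", "2005", "12345", "123456", "1234567"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(full_matches(r"\d+"))      # ['7', '123', '2005', '12345', '123456', '1234567']
print(full_matches(r"\d?"))      # ['', '7']
print(full_matches(r"\d{4}"))    # ['2005']
print(full_matches(r"\d{4,}"))   # ['2005', '12345', '123456', '1234567']
print(full_matches(r"\d{4,6}"))  # ['2005', '12345', '123456']
```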


Assertions

^

Start-of-line

The start of a line.

For example:

^Status

would match "Status" only at the start of a line.

$

End-of-line

The end of a line.

For example:

Status$

would match "Status" only at the end of a line.
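A sketch of both assertions, using Python's `re` module as an assumed stand-in. Each string in the invented `lines` list plays the role of one line of a file, since the search works line by line.

```python
import re

lines = ["Status: OK", "Final Status", "No Status here"]

starts = re.compile(r"^Status")
ends = re.compile(r"Status$")

at_start = [text for text in lines if starts.search(text)]
at_end = [text for text in lines if ends.search(text)]

print(at_start)  # ['Status: OK']
print(at_end)    # ['Final Status']
```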


Example

Question

Given the following lines:

#Fields: date time c-ip cs-username s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken s-port cs(User-Agent)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/ - 302 288 241 0 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/login.html - 200 1337 242 125 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/FAHC/sm_idoclogo2.gif - 200 1898 310 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 - VENUS 10.7.40.91 GET /intradoc-cgi/idc_cgi_isapi.dll IdcService=LOGIN&Action=GetTemplatePage&Page=HOME_PAGE&Auth=Intranet 200 15431 546 141 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:32 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /intradoc-cgi/idc_cgi_isapi.dll IdcService=LOGIN&Action=GetTemplatePage&Page=HOME_PAGE&Auth=Intranet 200 23943 768 390 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/enthome2.gif - 200 650 494 32 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/enthome.gif - 200 662 493 62 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/home.gif - 200 523 490 62 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/home2.gif - 200 525 491 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/library2.gif - 200 698 494 47 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/library.gif - 200 701 493 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search2.pdf - 200 570 493 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/help2.gif - 200 553 491 31 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)
2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search.gif - 200 574 492 16 80 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)

I need to generate a report with the FAHC\username and the .pdf file they accessed. I can do either one individually, but not sure how to do both.

Here's the regex that works for the username:

(FAHC\\\S+)

and the regex that works for the .pdf file:

(\S+\.pdf)

but how do I format the "find" field for both?

Answer

In this case, as every line has the same format, I would first try to construct a regex which matches the entire line.

So I'd start with something like:

\S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+ \S+

Then I'd pick out the columns I'm interested in:

\S+ \S+ \S+ (\S+) \S+ \S+ \S+ (\S+) \S+ \S+ \S+ \S+ \S+ \S+ \S+

You can then refine the sub-regex for the two columns you're interested in:

\S+ \S+ \S+ FAHC\\(\S+) \S+ \S+ \S+ (\S+\.pdf) \S+ \S+ \S+ \S+ \S+ \S+ \S+

There are various other ways this could be done, but this is the first way that sprang to mind.

Another way would be:

FAHC\\(\S+).* (\S+\.pdf)

This uses ".*" in the middle, which means "match anything" between the username and the file name.
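Both forms can be tried outside BareGrep. The sketch below uses Python's `re` module, which is an assumption for illustration, and runs the whole-line regex and the shorter regex over three of the sample lines from the question. The names `UA`, `log_lines` and `report` are hypothetical.

```python
import re

# User-Agent field shared by every sample line.
UA = ("Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;"
      "+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322)")

# Three of the sample lines from the question (raw strings keep "\" literal).
log_lines = [
    r"2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search2.pdf - 200 570 493 31 80 " + UA,
    r"2005-01-04 00:31:33 10.67.65.57 - VENUS 10.7.40.91 GET /xpedio/images/xpedio/help2.gif - 200 553 491 31 80 " + UA,
    r"2005-01-04 00:31:33 10.67.65.57 FAHC\KioskUser VENUS 10.7.40.91 GET /xpedio/images/xpedio/search.gif - 200 574 492 16 80 " + UA,
]

whole_line = re.compile(
    r"\S+ \S+ \S+ FAHC\\(\S+) \S+ \S+ \S+ (\S+\.pdf) "
    r"\S+ \S+ \S+ \S+ \S+ \S+ \S+"
)
short_form = re.compile(r"FAHC\\(\S+).* (\S+\.pdf)")

def report(pattern):
    """Return (username, pdf path) pairs for every matching line."""
    results = []
    for line in log_lines:
        match = pattern.search(line)
        if match:
            results.append(match.groups())
    return results

print(report(whole_line))  # [('KioskUser', '/xpedio/images/xpedio/search2.pdf')]
print(report(short_form))  # [('KioskUser', '/xpedio/images/xpedio/search2.pdf')]
```

The whole-line form is stricter because each capture is tied to a specific column, while the short form accepts any ".pdf" token that appears after the username.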


