Regular Expressions
- What the heck are regular expressions?
- When would I use them?
- Where can I find more information about them?
- How are Gravity's Regexs different from other packages?
- Using Regexs in Rules
- Using Regexs in Display Filters
- Important User Pointers
- How to Test Regular Expressions
- Examples
- Tutorial for beginners - Class 100
- Tutorial continued - Class 101 Draft Version
What the heck are regular expressions?
Regular expressions are odd-looking but powerful expressions that match patterns
in text strings. In Gravity's case, the string is either the Subject line,
the From line, the date, Message ID or text within the article body. Note
that the string may contain spaces, which count as characters. You may use
regexs in Gravity's rules, display filters, scoring, and search window.
When would I use them?
They are used when you want to:
- specify upper or lower case for the target string
- specify the position of the target within the string
- match a range or exact count of characters
- use wild cards or optional phrases
- match a pattern rather than a fixed word or words
Where can I find more information about them?
The "official" manual is in Gravity's on-line help. Search for reg and you will
find "working with regexs" and the "Regular Expression reference." This is the
only "official" documentation there is.
Regexs are common in UNIX/Linux applications. If you have access to a shell
you could read the man pages (or the info pages) for UNIX tools like ed, grep or perl.
JavaScript regexs are closely modeled on Perl's, so if you have a JavaScript reference
you may find some ideas there. Note that other implementations
will not match Gravity's syntax exactly, but the basics
should be almost the same. The best way to learn is to interpret
examples and try some.
You can find a few tutorials on the web, but most cover the same things.
If you are just starting out, I wrote a two-part
tutorial for beginners. It is very basic, and advanced
users may wish to skip it entirely. There are also examples listed on the
examples page.
How are Gravity's Regexs different from other packages?
Gravity's regex package is REGX PLUS: Regular Expression
Search and Replace Routines, Copyright 1989, 1990, 1991, 1993
by English Knowledge Systems, Inc. All Rights Reserved.
(Version 3.1, I think.)
The main difference between Gravity and Perl-type expressions is that
Gravity's expressions are not case-sensitive unless used within brackets.
Gravity uses simple, ordinary regular expressions. Things like boundaries
(\b) and pre-defined character abbreviations (\w,\W,\d ..)
like you find in Perl are not supported. This is usually no big deal. We
can construct our own; it just takes a little more typing.
(Just a note: Gravity Version 2.7 (Open Source versions)
uses Perl Compatible Regular Expressions.)
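As a rough illustration of "constructing our own," the sketch below uses Python's re module as a stand-in, only because it understands both the Perl shorthand and the bracket form. The bracket classes are an assumption about what a hand-typed equivalent would look like, not something taken from Gravity's manual, so check the Regular Expression reference before relying on them in Gravity itself.

```python
import re

# Perl-style shorthand on the left, a hand-typed bracket class on the right.
# These pairings are assumptions for illustration, not Gravity documentation.
EQUIVALENTS = {
    r"\d": "[0-9]",
    r"\D": "[^0-9]",
    r"\w": "[A-Za-z0-9_]",
    r"\W": "[^A-Za-z0-9_]",
}


def same_matches(shorthand, bracket, text):
    """Return True if both patterns pick out the same characters in text."""
    # re.ASCII limits \w and \d to plain ASCII so the comparison is fair.
    found_short = re.findall(shorthand, text, flags=re.ASCII)
    found_bracket = re.findall(bracket, text, flags=re.ASCII)
    return found_short == found_bracket


sample = "Re: Gravity 2.6 build_2046 (super)"
for shorthand, bracket in EQUIVALENTS.items():
    print(shorthand, bracket, same_matches(shorthand, bracket, sample))
```

Both upper and lower case ranges are spelled out in the bracket classes because, in Gravity, text inside brackets is case-sensitive.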
In Gravity the dollar sign $ (assignment) has a special meaning,
but that meaning is not available to the end user (that is, you).
To match a literal dollar sign it must be escaped (\$) unless it is used in brackets like [$].
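The same two ways of matching a literal dollar sign can be tried out in Python's re module. This is a sketch for comparison only: in Python the unescaped $ means end of line rather than assignment, and the sample subject line is made up.

```python
import re

price_line = "Selling my old modem for $25 or best offer"

# Unescaped, $ is special (an end-of-line anchor in Python),
# so this pattern never finds the dollar character.
anchor = re.search(r"$25", price_line)

# Escaped with a backslash, or placed in brackets, it is a plain character.
escaped = re.search(r"\$25", price_line)
bracketed = re.search(r"[$]25", price_line)

print(anchor)             # None
print(escaped.group())    # $25
print(bracketed.group())  # $25
```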
The beginning-of-line and end-of-line specifiers are different from most
packages. To be more precise, it is the positional specifier < > that
is different. Many packages don't have a positional specifier at all, other than
end or beginning of line. Otherwise, most of the simple specifiers are
the same.
|                   | Gravity | Most others |
| Beginning of line | <0>     | ^           |
| End of line       | <~0>    | $           |
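If you want to try a Gravity-style pattern in a Perl-type tool, the two specifiers in the table can be swapped mechanically. The helper below is a hypothetical sketch (the name gravity_anchors_to_perl is made up) and handles only <0> and <~0>; any other < > positional form is left untouched.

```python
import re


def gravity_anchors_to_perl(pattern):
    """Rewrite Gravity's <0> and <~0> as ^ and $ (hypothetical helper).

    Only these two specifiers are handled. The function does not check
    whether the < is escaped, so treat it as a rough convenience.
    """
    return pattern.replace("<~0>", "$").replace("<0>", "^")


converted = gravity_anchors_to_perl("<0>Re: .*gravity<~0>")
print(converted)  # ^Re: .*gravity$

# re.IGNORECASE stands in for Gravity's case-insensitive matching.
print(bool(re.search(converted, "Re: I love Gravity", flags=re.IGNORECASE)))  # True
```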
Using Regular Expressions in Rules
Use the rules editor (Tools - Rules, edit or add).
On the rules conditions tab - check the box to add your string
as a regular expression rather than a plain text string (default).
NOTE: You must remember to check the checkbox
to add the string as a regular expression. Forgetting the
check box is a common error.
The rules condition should look like this ...
Subject contains reg. expr. "[Gg]ravity"
but NOT this ...
Subject contains "[Gg]ravity"
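To see why the forgotten checkbox matters, here is a small Python sketch of the two interpretations of the same condition string. Modelling the plain-text case as a case-insensitive substring test is an assumption, and the subject lines are invented for the example.

```python
import re

subjects = [
    "Gravity 2.6 released",
    "gravity and scoring",
    "Matching [Gg]ravity literally",
]
condition = "[Gg]ravity"

# Checkbox NOT checked: the condition is plain text, brackets and all.
# (Assumed here to behave like a case-insensitive substring test.)
plain_hits = [s for s in subjects if condition.lower() in s.lower()]

# Checkbox checked: the condition is a pattern, and [Gg] means "G or g".
regex_hits = [s for s in subjects if re.search(condition, s)]

print(plain_hits)  # ['Matching [Gg]ravity literally']
print(regex_hits)  # ['Gravity 2.6 released', 'gravity and scoring']
```

The plain-text version only hits subjects that literally contain the brackets, which is almost never what you meant.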
Using Regular Expressions in Display Filters
You can use regular expressions in Display Filters. Click the Filter
button or go to "Newsgroup - Define Display Filter." Use the
"Advanced" button in the Edit Filter dialog box.
This allows you to create filters like this:
Unread articles
And
(
Subject contains reg. expr. "[Gg]ravity"
)
(OK, it's a pointless expression, but you get the idea.)
Important User Pointers
- Unlike Perl or JavaScript, Gravity's regular expressions are NOT
case sensitive unless they are placed in brackets, for example:
tom - matches tom or TOM
Tom - matches tom or TOM (same thing)
[T]om - matches Tom, does NOT match tom
[T][O][M] - matches TOM, does NOT match Tom or tom
- When using a regular expression in a rule, make sure you check the
checkbox on the rules condition tab. If you forget, the expression will not work.
- To find a dollar sign $ you must escape it like this: \$ , unless it is used inside brackets like so: [$].
- The same goes for other meta characters. A good rule of thumb is that
any symbol used in parsing an expression
might need to be escaped, depending on its location. However, if used in brackets, most
do not need the escape backslash. Sometimes this depends on their position
within the brackets. You will have to experiment to see when this is true.
The manual has some special cases for problem characters like dashes and
brackets.
- Gravity does not support the ? (zero to one replications).
However, you can use {0,1} to do the same
thing. Because it is treated as a regular character, the ? does not need to be escaped.
- If you are a Perl hacker, use <0> and <~0>
in place of ^ and $ for beginning and end of line.
- You can target a whole or complete word (similar to Perl's anchor boundary \b)
by using something like the following:
(<0>| )word_here( |<~0>)
Be sure to include the spaces. This construction defines a "word" as being
preceded by a space, unless it occurs at the beginning of line; or is followed
by a space, unless at the end of line.
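For readers coming from Perl-type tools, the pointers above can be approximated in Python's re module. This is a sketch for comparison, not a description of how Gravity works internally: the scoped (?i:...) group is an assumed way to mimic "case-insensitive outside brackets", and the sample words are made up.

```python
import re

# 1. Case: outside brackets case is ignored, inside brackets it is not.
#    (?i:...) switches case off only for the part it wraps.
tom_any = re.compile(r"(?i:tom)")        # like Gravity's  tom
tom_capital = re.compile(r"[T](?i:om)")  # like Gravity's  [T]om

# 2. Zero or one: {0,1} does the same job as ? in Perl-type packages.
colour_braces = re.compile(r"colou{0,1}r")
colour_qmark = re.compile(r"colou?r")


# 3. Whole word: a space or line start before, a space or line end after.
def whole_word(word):
    """Build the (<0>| )word( |<~0>) construction with ^ and $ anchors."""
    return re.compile(r"(^| )" + re.escape(word) + r"( |$)")


print(bool(tom_any.search("TOM")))                     # True
print(bool(tom_capital.search("Tom")))                 # True
print(bool(tom_capital.search("tom")))                 # False
print(bool(colour_braces.fullmatch("color")))          # True
print(bool(colour_qmark.fullmatch("colour")))          # True
print(bool(whole_word("cat").search("the cat sat")))   # True
print(bool(whole_word("cat").search("concatenate")))   # False
```

Note that the space-based construction is stricter than \b: it will not match "cat." or "cat," because punctuation is not a space. Add those characters to the alternatives if you need them.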
How to Test Regular Expressions
If you are using Gravity Version 2.6 (super), the easiest way to
test regular expressions is to use the "Quick Filter."
Open the Quick Filter box by typing a forward slash "/",
and be sure to check the regular expression check box.
Keep in mind the quick filter only works with articles available
in the current display filter.
Note: There was a little bug in 2.6 builds
before 2046, including 2039. If you selected both From AND Subject, the
boolean logic was AND rather than OR. Later builds use the OR logic.
Another way to test (a little more tedious) is to enter your regular
expression in a display filter and use that test filter to see if the
expression is showing what you want. Set up a new filter under
"Newsgroup - Define Display Filter"; you need to go to "Advanced" to enter
a regular expression.
About posting test articles: If you need to post test articles, do NOT post them to Usenet discussion groups.
One of the better ways is to post test messages to a local ISP test group.
This way the messages won't be propagated over the Net. Or post to test groups
like alt.test or alt.alt.test, which are intended for this purpose.
I have Hamster, a local news server, set up on my machine and I can
post test articles without using my Internet connection.
Another way to test is this:
Create a rule called test, or whatever, and set the rule action
to tag for download (see the following note). Enter your regular expressions in the rule
conditions window. Then run the rule manually. You will see which subjects
are hit because they will have the tagged symbol next to them. Switch to a tagged
article filter to see only the results of the test. To reset the test,
switch filters to show tagged articles, then hit CONTROL-A followed by T to untag all.
Note: You cannot use the tag symbol if you
are storing article bodies (you can't re-tag them). You will have to use another
symbol like important.