
Regular Expression Examples

Note: These examples apply to Gravity versions 2.3 through 2.6.
Gravity 2.7 (open source from SourceForge) uses PCREs (Perl-compatible regular expressions), so some of the examples will not work with version 2.7.
  • The Famous Hot Sex Example (uses of the wild card) 11/21/99
  • Match a Word Starting with a Specific Capital Letter
  • Match "Tom" or "Thomas"
  • Match "Fort" or "Ft" or "Ft."
  • Catch Mis-Spellings of "Tallahassee" or "Gravity"
  • Match a Specific All-Caps Word
  • Match Subject Lines with all Capital Letters
  • Match Subject Lines with all Caps Including a Leading "Re:"
  • Match a Complete Word Only
  • Match a Subject Line Containing One or More All-Caps Words
  • Find Two Words In A Line (in order)
  • Use an AND condition (find two words - any order)
  • Match Multi-Part Binaries
  • Match Binary File Extensions
  • Match Multi-Part Binaries with 50 Or More Parts
  • Match Multi-Part Binary Description Articles
  • Match Multi-Part Binary Parts Except First Parts
  • Match Non Standard Characters

The Famous Hot Sex Example (uses of the wild card)

The following is a nice example that shows various uses of the wild card (dot or period) in a regular expression. The topic is taken from a Usenet post where the poster wanted a way to capture any space or punctuation between "hot" and "sex."
hot.sex
Using one wild card alone will result in matching any SINGLE character (including a space). It would match "hot sex" or "hot,sex" or "HOT!SEX".
hot.+sex
Using the plus operator matches ONE or MORE of any character (i.e. space, letter, number or punctuation). There must be at least one intervening character. It will match even if whole words are between the two words. In other words, it would match "Hot hamster sex!"
hot.*sex
The star operator is almost identical to the plus sign but it means match ZERO to unlimited characters between the words. It acts the same as the plus sign but, in addition, will match "hotsex" (no space or intervening character).
Note that none of the expressions are in brackets and therefore are not case sensitive.
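
If you want to try the three patterns outside Gravity, here is a minimal sketch in Python. It assumes that Python's re module with re.IGNORECASE is a fair stand-in for Gravity's default case-insensitive matching; the helper name hits is made up for this illustration, and Gravity's own engine may differ in details.

```python
import re

def hits(pattern, subject):
    """Return True if pattern is found anywhere in subject, ignoring case."""
    return re.search(pattern, subject, re.IGNORECASE) is not None

subjects = ["hot sex", "HOT!SEX", "Hot hamster sex!", "hotsex"]

# For each pattern, collect the subjects it matches.
results = {p: [s for s in subjects if hits(p, s)]
           for p in ("hot.sex", "hot.+sex", "hot.*sex")}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Only the star form should pick up "hotsex", and only the plus and star forms should pick up "Hot hamster sex!".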

Match A Word Starting with a Specific Capital Letter

Putting the letter in brackets forces case. The following expression matches "Tom" but not "tom."
 Subject contains reg. expr. "[T]om"

Match 'Tom' or 'Thomas'

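All of the following should work (case insensitive). The star means match zero or more occurrences of the preceding regexp, so in effect the preceding expression becomes optional. Because the star also allows repeats, the first two forms will accept strings such as "thhom" as well. The third form, which allows at most one occurrence, is the strictest and may be the most efficient.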
 Subject contains reg. expr. "th*om(as)*"
 Subject contains reg. expr. "th*oma*s*"
 Subject contains reg. expr. "th{0,1}om(as){0,1}"

If you are looking for names and want to specify the capitals use the following.

 Subject contains reg. expr. "[T]h*om(as)*"
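
To see how the three forms differ, here is a small Python sketch. It is an approximation only: it uses re.fullmatch so that each candidate must be matched from start to end (Gravity's "contains" condition searches anywhere in the subject), and re.IGNORECASE stands in for Gravity's default case insensitivity. The candidate list and function name are invented for the example.

```python
import re

candidates = ["Tom", "Thomas", "thhom", "Toms", "Tim"]

def full_matches(pattern):
    """Candidates that the pattern matches in full, ignoring case."""
    return [c for c in candidates if re.fullmatch(pattern, c, re.IGNORECASE)]

star_form = full_matches(r"th*om(as)*")             # star: zero or more repeats
loose_form = full_matches(r"th*oma*s*")             # each letter optional on its own
strict_form = full_matches(r"th{0,1}om(as){0,1}")   # at most one "h", at most one "as"

print(star_form, loose_form, strict_form)
```

Only the {0,1} form limits itself to "Tom" and "Thomas". The star forms also accept "thhom", and the second form accepts "Toms" because the "a" and the "s" are optional independently.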

Match "Fort" or "Ft" or "Ft."

 Subject contains reg. expr. "f(or)*t\.* "

Or, to limit the hits to capital "F"

 Subject contains reg. expr. "[F](or)*t\.* "

Here in Florida we have lots of cities named Fort something or other. Sometimes the spelling is abbreviated. NOTE: There is a space after the last star. Because a city name always follows the "Fort", requiring a space cuts down on false hits. This is important; otherwise "ft" alone will hit "often" or "offtopic." The period must be escaped, otherwise it will act as a wild card.
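
Here is a rough Python check of the trailing-space trick. It assumes re.IGNORECASE approximates Gravity's default behaviour; the sample subjects are made up.

```python
import re

# The trailing space is part of the pattern, as in the Gravity expression.
FORT = re.compile(r"f(or)*t\.* ", re.IGNORECASE)

# Subject -> whether we expect a hit.
subjects = {
    "Jobs in Fort Myers": True,
    "Ft. Lauderdale boat show": True,
    "ft pierce weather": True,
    "This is often offtopic": False,
}
results = {s: FORT.search(s) is not None for s in subjects}
print(results)
```

The space does not remove every false hit. Any word that ends in "ft" and is followed by a space, such as "left", still matches.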


Catch Mis-Spellings of "Tallahassee" or "Gravity"

Do you know the proper spelling of Tallahassee? Many people don't. If you were looking for a job there and reading news postings you might see it spelled several ways. The following expression allows for one, or more, of the troublesome letters.
 Subject contains reg. expr. "tal+ah+as+ee"

Here is one way to catch gravity's mis-spelling as "garvity" ...

 Subject contains reg. expr. "g..vity"

You might get false hits, but probably not in the two groups offline-reader or news.software.readers. Here are three other ways ...

 Subject contains reg. expr. "g(r|a)(a|r)vity"
 Subject contains reg. expr. "g[rRaA]{2}vity"
 Subject contains reg. expr. "g[rRaA]+vity"
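
The four "gravity" expressions are not quite equivalent. The following Python sketch compares them on a handful of invented spellings, again assuming re.IGNORECASE as a stand-in for Gravity's case-insensitive default.

```python
import re

words = ["gravity", "garvity", "grrvity", "gvity", "graavity"]
patterns = [r"g..vity", r"g(r|a)(a|r)vity", r"g[rRaA]{2}vity", r"g[rRaA]+vity"]

# For each pattern, list the spellings it catches.
table = {p: [w for w in words if re.search(p, w, re.IGNORECASE)]
         for p in patterns}

for pattern, caught in table.items():
    print(pattern, "->", caught)
```

The first three catch exactly two letters between "g" and "vity" (the wild card version accepts any two characters, not just r and a), while the + version also catches longer runs such as "graavity".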

Match a Specific Word of all Capital Letters

 Subject contains reg. expr. "[M][O][N][E][Y]"

This will match the all caps word 'MONEY.' Remember the brackets force case sensitivity.


Match Subjects with All Capital Letters

Note the 'does not' condition. This will hit any subject line that does not contain a lower case letter, like:
AMAZING BUSINESS OPPORTUNITY
 Subject does not contain reg. expr. "[a-z]"

This works most of the time; however, it will fail on replies, because the "Re:" prefix contains a lower case 'e'. If you need to match replies too, see the next example.


Match Subjects with All Caps Including a Leading "Re:"

 Subject contains reg. expr. "<0>(R[eE]: )*[^a-z]*<~0>"

Used by permission of the author, Paul Neubauer. In other words, if you want to find articles that are all caps INCLUDING replies use this. It works great.
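
For comparison, here is how the same idea might look in Python. This is a sketch under two assumptions: that Gravity's <0> and <~0> correspond to ^ and $, and that leaving out re.IGNORECASE reproduces the case sensitivity that the brackets force in Gravity. The function name is invented.

```python
import re

# Assumed translation: <0> becomes ^ and <~0> becomes $.
# No IGNORECASE flag, so [^a-z] really means "no lower case letter".
ALL_CAPS = re.compile(r"^(R[eE]: )*[^a-z]*$")

def is_all_caps(subject):
    """True if the subject has no lower case letters apart from leading 'Re: ' prefixes."""
    return ALL_CAPS.search(subject) is not None

for s in ["AMAZING BUSINESS OPPORTUNITY",
          "Re: AMAZING BUSINESS OPPORTUNITY",
          "RE: Re: MAKE MONEY FAST",
          "Re: Amazing offer"]:
    print(s, "->", is_all_caps(s))
```

Note that a subject made up only of digits or punctuation also counts as "all caps" here, since it contains no lower case letter.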


Match a Whole, or Complete, Word

In this example the target is the word "and". If you search for the regular expression string "and" you will also match larger strings such as command, Andy, Finland, hand, dandy. (You would of course replace "and" with your word.)

To find a complete word at the beginning of the line add the beginning of line specifier and a space after the word.

 Subject contains reg. expr. "<0>and "

To find a complete word at the end of the line add the end of line specifier and a space before the word.

  Subject contains reg. expr. " and<~0>"

Finally, to limit the match to the complete word only, use this. If you can't read it clearly, there are spaces after the first pipe symbol and also after the second opening parenthesis. The plus signs are optional.

 Subject contains reg. expr. "(<0>| +)and( +|<~0>)"

Since we are looking at examples, here is an alternate way of doing this, using three separate conditions:

Subject contains reg. expr. " and "
Or
Subject contains reg. expr. "<0>and "
Or
Subject contains reg. expr. " and<~0>"

Obviously the first method is more compact. Basically, to find a single word it must either be surrounded by spaces, or at the beginning of a line followed by a space, or at the end of the line preceded by a space. Simply prepend your target word with (<0>| ) and end the expression with ( |<~0>). Make sure you include the spaces.
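
The whole-word construction can be tried in Python as well. The sketch below assumes <0> and <~0> map to ^ and $ and uses re.IGNORECASE for Gravity's default case insensitivity; has_whole_word is a made-up name.

```python
import re

# (start of line or spaces) + word + (spaces or end of line)
WHOLE_AND = re.compile(r"(^| +)and( +|$)", re.IGNORECASE)

def has_whole_word(subject):
    """True if 'and' appears as a space-delimited word in the subject."""
    return WHOLE_AND.search(subject) is not None

for s in ["and now for something", "salt and pepper", "this and",
          "command line", "Andy in Finland", "salt and, pepper"]:
    print(repr(s), "->", has_whole_word(s))
```

Because the word has to be bounded by spaces or a line end, "and" followed directly by punctuation (as in the last sample) is not counted as a hit.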


Match a Subject Line Containing One or More All-caps Words

This will hit a subject line that contains one or more all-caps word occurring anywhere in the line.
 Subject contains reg. expr. "(<0>| +)[A-Z]{2,}( +|<~0>)"

The {2,} requires at least two capital letters in a row, which avoids hitting one-letter words such as a subject beginning with capital 'A'.


Find Two Words In A Line

The following construction should match two words that are present in a line. They can be separated by anything (or nothing). The target words are "first" and "second". Note that "second" must follow "first"; a line where "second" comes before "first" will not match. Either of these forms works:
Subject contains reg. expr. "(first).*(second)"
OR
Subject contains reg. expr. "first.*second"

You can also use the + sign which means there must be at least one character (spaces count) after the first word. Like this:

Subject contains reg. expr. "first.+second"

To make sure there is at least one "word" between our two words add two spaces like:

Subject contains reg. expr. "first .+ .*second"

Here is another way, I think it is better (note the space after first).

Subject contains reg. expr. "first [^ ]+ .*second"

If you want to make sure there are at least two "words" between the targets use something like this (there are three spaces in the expression):

Subject contains reg. expr. "first .+ .+ .*second"
OR
Subject contains reg. expr. "first [^ ]+ [^ ]+ .*second"

The next example should match EXACTLY one word between the targets, not one or more like the previous example.

Subject contains reg. expr. "first +[^ ]+ +second"

The plain English translation is: Match the regular expressions f, i, r, s, t, followed by one or more spaces, followed by one or more of anything EXCEPT a space, followed by one or more spaces, followed by the regular expressions s, e, c, o, n, and d. You could group the target words but it probably doesn't matter here because the search is in order.

Of course the above concept can be expanded to include any number of intervening words. The next one matches five words (actually non-space regular expressions) between the targets.

first +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +second
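
Typing these long expressions by hand invites mistakes, so here is a small Python sketch that builds one for any number of intervening words. The function name words_between is hypothetical, and the result is a Python pattern; whether Gravity accepts it unchanged is an assumption you would need to check.

```python
import re

def words_between(first, second, n):
    """Build a pattern requiring exactly n space-free 'words' between the targets."""
    middle = " +[^ ]+" * n          # one or more spaces, then a run of non-spaces
    return re.escape(first) + middle + " +" + re.escape(second)

pattern = words_between("first", "second", 5)
print(pattern)

print(bool(re.search(pattern, "first a b c d e second")))   # five words between
print(bool(re.search(pattern, "first a b c d second")))     # only four
```

Each repetition of " +[^ ]+" stands for one intervening word, so changing n is all it takes to adjust the count.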

If you want to match two words in any order use the following. The second form adds an outer set of parentheses, and they matter: under the usual regular expression precedence rules the pipe binds more loosely than everything next to it, so the first form reads as "first", or "second" followed later by "first", or "second", and it will hit any line containing either word. Sometimes grouping, or lack thereof, can have unexpected consequences. Grouping can also make it easier to read.

(first)|(second).*(first)|(second)
OR
((first)|(second)).*((first)|(second))
Even the grouped form does not function exactly like an AND condition, because you could match either word twice. See the next example for an AND condition.
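
A quick check in Python suggests the outer grouping does change the result. This sketch assumes Python's alternation precedence (the pipe splits the whole expression unless it is enclosed in parentheses) and re.IGNORECASE for case insensitivity; Gravity's engine may or may not behave identically.

```python
import re

ungrouped = re.compile(r"(first)|(second).*(first)|(second)", re.IGNORECASE)
grouped = re.compile(r"((first)|(second)).*((first)|(second))", re.IGNORECASE)

samples = ["only first here", "second then first", "first and first again"]
for s in samples:
    print(repr(s),
          "ungrouped:", ungrouped.search(s) is not None,
          "grouped:", grouped.search(s) is not None)
```

In Python the ungrouped form hits a line that contains just one of the words, while the grouped form needs two occurrences. The grouped form still accepts the same word twice.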

Use an AND condition (find two words - any order)

Regular expressions do not have an AND operator. The following form works for two words that occur in any order.
 Subject contains reg. expr. "(first.+second)|(second.+first)"

Before you try to use this, remember that Gravity makes it easy to use an AND condition with rules and filters like this:

 Subject contains reg. expr. "first"
 AND
 Subject contains reg. expr. "second"
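
The two approaches can be compared side by side in Python. This is only a sketch: the function names are invented, and re.IGNORECASE stands in for Gravity's default case insensitivity.

```python
import re

def and_by_regex(subject):
    """Single expression: either order, at least one character in between."""
    return re.search(r"(first.+second)|(second.+first)",
                     subject, re.IGNORECASE) is not None

def and_by_two_conditions(subject):
    """Two separate 'contains' tests joined with AND, as a rule or filter would do."""
    return all(re.search(word, subject, re.IGNORECASE)
               for word in ("first", "second"))

for s in ["second prize, not first", "first things first", "firstsecond"]:
    print(repr(s), and_by_regex(s), and_by_two_conditions(s))
```

The results differ only in edge cases: because of the .+ the single expression needs at least one character between the words, so "firstsecond" passes the two-condition test but not the regex.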

Match Multi-part Binaries

 Subject contains reg. expr. "[0-9]+/[0-9]+"

This will work most of the time and should be suitable for most purposes.

Some posts contain a number - slash - number in the subject that does not indicate the number of parts, for example a date or series, and you could get false hits. If for some reason you want to try to miss most of these, and be sure the numbers and slash are at the end of line you could do something like the following. You need to match the ending bracket or parenthesis also. The way to match them without escaping is by putting them in brackets like [])]. Note that longer expressions take longer to execute. You won't usually need to do this for binaries, but you might and it makes a good example.

 Subject contains reg. expr. "[0-9]+/[0-9]+[])]<~0>"
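
Here is a Python sketch of the difference between the simple and the end-of-line versions. It assumes <~0> maps to $; Python also treats a ] placed first inside brackets as a literal character. The sample subjects are invented.

```python
import re

simple = re.compile(r"[0-9]+/[0-9]+")
at_end = re.compile(r"[0-9]+/[0-9]+[])]$")   # number/number, then ] or ), then end of line

samples = ["pics - file.jpg (03/12)",
           "archive.zip [1/5]",
           "Meeting on 11/21 postponed"]

for s in samples:
    print(repr(s),
          "simple:", simple.search(s) is not None,
          "at_end:", at_end.search(s) is not None)
```

The date in the last subject is a false hit for the simple form but not for the longer one.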

Match Binary File Extensions

 Subject contains reg. expr. "\.jpg|\.gif|\.zip|\.avi|\.mpg|\.mp3"
The wild card (period) must be escaped. However, the following (with no escapes) would also work.
 Subject contains reg. expr. ".jpg|.gif|.zip|.avi|.mpg|.mp3"
In this case the "." is acting as the regular expression wild card and matches anything. You could get more false hits. For example, the ".avi" would also match "David". I think it is much better to escape the "."
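
The "David" problem is easy to reproduce in Python (a sketch, with re.IGNORECASE assumed to mirror Gravity's case-insensitive default and made-up subjects):

```python
import re

escaped = re.compile(r"\.jpg|\.gif|\.zip|\.avi|\.mpg|\.mp3", re.IGNORECASE)
unescaped = re.compile(r".jpg|.gif|.zip|.avi|.mpg|.mp3", re.IGNORECASE)

for s in ["holiday.AVI (1/3)", "Message from David"]:
    print(repr(s),
          "escaped:", escaped.search(s) is not None,
          "unescaped:", unescaped.search(s) is not None)
```

Both forms hit the real file name, but only the unescaped one hits "Message from David".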

Match Multi-part Binaries with 50 or More Parts

You could use these to filter out, or find, large multi-part binaries. They start simple and get progressively complex. It is usually better to use the simplest expression that will do the job with an acceptable level of false hits. The reason is that longer expressions take significantly longer to execute.

So, the moral of the story is: use a long expression, if needed, but if you can use a shorter expression to do the same thing, you would be better off.

To adjust the size replace the 5 in [5-9] with whatever (e.g. to hit 30 or more parts replace [5-9] with [3-9]).

Let's start by looking for a slash followed by a number of 50 or higher, or 100 or higher.
 /(([5-9][0-9]+)|([1-9][0-9]{2,}))

The next one ensures that the slash and number are at the end of the line (which probably isn't necessary in most cases) and also shows another way to handle the numbers. Handling a bracket, ], or a lone parenthesis can be tricky. Sometimes you need to escape them (shown below) and sometimes you don't, depending on their position.

 /(([5-9][0-9])|([1-9][0-9][0-9]+))(\)|\])<~0>

We can get around the need to escape the bracket and parenthesis by including them within brackets, provided the right bracket comes first.

 /(([5-9][0-9])|([1-9][0-9][0-9]+))[])]

If you wanted to include a number before the slash, use something like the following one:

 [0-9]+/(([5-9][0-9]+)|([1-9][0-9]{2,}))

Finally, if you need to capture the opening and closing parentheses or brackets, use this:

 (\[|\()[0-9]+/(([5-9][0-9]+)|([1-9][0-9]{2,}))(\]|\))

NOTE: The last expression looks cool in a geeky kind of way. However, remember that an expression that is too long for no real purpose is inefficient and takes longer (sometimes exponentially) to execute.
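
To get a feel for what the simplest of these expressions catches, here is a Python sketch. The helper part_threshold is hypothetical; it just swaps the leading digit range as described above.

```python
import re

FIFTY_PLUS = re.compile(r"/(([5-9][0-9]+)|([1-9][0-9]{2,}))")

def part_threshold(min_tens):
    """Hypothetical helper: same pattern with the 5 in [5-9] replaced by min_tens."""
    return re.compile(r"/(([%d-9][0-9]+)|([1-9][0-9]{2,}))" % min_tens)

for s in ["movie.avi (01/50)", "movie.avi (12/49)",
          "movie.avi (003/120)", "pic.jpg (1/7)"]:
    print(repr(s), FIFTY_PLUS.search(s) is not None)

thirty_plus = part_threshold(3)
print(thirty_plus.search("clip (01/30)") is not None)
```

One thing this shows: a zero-padded total such as (01/050) is not matched, because neither alternative allows a leading zero after the slash.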


Match Multi-Part Binary Description Articles

This will match article subjects containing the parts of a binary like (0/4) or (00/42) or (000/100). You could use this regexp in a filter for binary groups to browse descriptions before downloading files.
 Subject contains reg. expr. "(\[|\()0+/[0-9]+"

Remember parentheses are meta characters and must be escaped with a backslash unless they are within brackets. A bracket also must be escaped outside of brackets. The one above is more efficient. But, if you need to capture the closing brackets or parentheses use the next one. You could also add the end of line specifier, <~0>, if needed.

 Subject contains reg. expr. "(\(|\[)0+/[0-9]+(\)|\])"

Match Multi-Part Binary Parts Except First Parts

This finds all the parts of a multi-part binary except 0/10 or 00/10. In other words, it finds the attachment parts.
 Subject contains reg. expr. "0*[1-9]+0*/[0-9]"
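
A short Python sketch of the same expression, with invented subjects, shows which part numbers get through:

```python
import re

NOT_FIRST = re.compile(r"0*[1-9]+0*/[0-9]")

samples = ["file.rar (0/10)", "file.rar (00/10)", "file.rar (1/10)",
           "file.rar (010/200)", "file.rar (105/200)"]

matched = [s for s in samples if NOT_FIRST.search(s)]
print(matched)
```

The 0/10 and 00/10 description parts are skipped because at least one non-zero digit is required before the slash.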

Match Non Standard Characters

The following one is a little odd. Basically, Gravity regexp lists are not preprogrammed ranges; instead, the lists are continuous character ranges according to the current code page. So, the following condition will match a subject line which contains any character outside the "normal" or lower range of letters, digits, or punctuation marks.

Note: If you can't read it clearly this is a caret followed by a space followed by a hyphen followed by a tilde.

 Subject contains reg. expr. "[^ -~]"

This would hit the following article subject. (It may not even appear properly in your browser; it should be an upside down question mark followed by a Y with two horizontal lines through it.)

Hello there

If you want to find one oddball character use the character's number with the ALT key. So, to find the funny Y hold ALT and type 157. This method works for the European characters too.

 Subject contains reg. expr. "¥"

NOTE: You may not be able to enter this with the ALT key in the rule window. You probably need to add a dummy string then edit the string directly in the condition window. Also, you don't need a regular expression to do this. It seems to work in a non-regexp rule condition.
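
The space-to-tilde range can be tried in Python too. Bear in mind this is only an analogy: Python compares Unicode code points rather than code page positions, so the sketch assumes the range from space to tilde covers the same printable ASCII characters in both. The odd characters are written as escapes so the example does not depend on how this page is displayed.

```python
import re

ODDBALL = re.compile(r"[^ -~]")   # anything outside space (32) through tilde (126)

plain = "Hello there"
odd = "\u00bf\u00a5Hello there"   # upside down question mark, then the yen sign

print(ODDBALL.search(plain) is not None)
print(ODDBALL.findall(odd))
```

Here findall returns the individual offending characters, which can help when you are trying to work out what is in a subject line.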

