Taken from Wikipedia, the free encyclopedia
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
In this example RegEx:
/([A-Z])\w+/g
There's a lot of dense syntax here, so let's list it out:
- `/` starts the RegEx
- `/` ends the RegEx
- `g`, after the closing `/`, is a flag
- `g` means the global flag, so every match is found rather than only the first
- `()` parentheses define a capture group
- `[]` is a character set, which matches any one character in the set within
- `A-Z` is a range, which matches any character in the range `A` to `Z`, separated by `-`
- `A` to `Z` have ASCII and Unicode character codes 65 to 90 in decimal, in order
- `\w` is a special token that matches a word character (a letter, digit, or underscore)
- `+` is a quantifier that matches 1 or more of the preceding token (here, word characters)

This example can be explored in detail because it is the default pattern on RegExr.com, which offers an interactive playground of text and RegEx patterns. It gives a more visual, easier to follow explanation of this pattern. The web app can also be used to test or practice different patterns and to look up reference material on regular expressions.
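To see the pieces of this pattern working together, here is a minimal JavaScript sketch. The sample sentence and the variable names are made up for illustration and do not come from RegExr.

```javascript
// The pattern from above: one capital letter (captured), then 1 or more word characters.
const pattern = /([A-Z])\w+/g;

// Hypothetical sample text, chosen only for illustration.
const text = "Alice and I met Bob in Paris";

// matchAll needs the g flag and yields one result per match.
const results = [...text.matchAll(pattern)].map((m) => ({
  match: m[0],   // the whole match
  capital: m[1], // capture group 1: the leading capital letter
}));

console.log(results.map((r) => r.match));   // [ 'Alice', 'Bob', 'Paris' ]
console.log(results.map((r) => r.capital)); // [ 'A', 'B', 'P' ]
```

The lone `I` is skipped because `\w+` needs at least one more word character after the capital letter.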
Here are some character classes to help in making RegEx selections. These will help with defining what characters get matched.
Character classes | Example |
---|---|
`\d` any digit | +1- (444) -555-1234 |
`\D` not a digit | +1- (444) -555-1234 * |
`\s` whitespace | glib jocks vex dwarves! * |
`\S` not whitespace | glib jocks vex dwarves! |
`\w` any word character | glib jocks vex dwarves! |
`\W` not a word character | glib jocks vex dwarves! * |
`.` any character except `\n` | glib jocks vex dwarves! * |
`[aeiou]` characters in set | glib jocks vex dwarves! |
`[^aeiou]` negated set | glib jocks vex dwarves! * |
`[g-s]` characters in range | abcdefghijklmnopqrstuvwxyz |
Note: In any row above marked with the `*` footnote, spaces are being matched as well. They don't get picked up with bold markings.
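The character classes in the table can be checked against the same sample strings in code. This is a small sketch in JavaScript; the helper `matched` is a hypothetical name introduced here, and it simply joins every single-character match into one string so the result is easy to read.

```javascript
const phone = "+1- (444) -555-1234";
const phrase = "glib jocks vex dwarves!";
const alphabet = "abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: collect every match of a global regex into one string.
const matched = (regex, text) => (text.match(regex) || []).join("");

console.log(matched(/\d/g, phone));            // "14445551234"
console.log(matched(/\D/g, phone));            // "+- () --"
console.log(matched(/\s/g, phrase).length);    // 3 (the three spaces)
console.log(matched(/\S/g, phrase));           // "glibjocksvexdwarves!"
console.log(matched(/\w/g, phrase));           // "glibjocksvexdwarves"
console.log(matched(/\W/g, phrase));           // "   !" (three spaces, then "!")
console.log(matched(/./g, phrase) === phrase); // true
console.log(matched(/[aeiou]/g, phrase));      // "ioeae"
console.log(matched(/[^aeiou]/g, phrase));     // "glb jcks vx dwrvs!"
console.log(matched(/[g-s]/g, alphabet));      // "ghijklmnopqrs"
```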
Anchors | Example | Result |
---|---|---|
`^` beginning of string | `^\w+` | she sells seashells |
`$` end of string | `\w+$` | she sells seashells |
`\b` word boundary | `s\b` | she sells seashells |
`\B` not word boundary | `s\B` | she sells seashells |
The first two rows above show anchors relating to the beginning and end of strings. The second two are anchors related to the boundaries of words. A word boundary is the position between a word character and a non-word character such as whitespace or punctuation, or the start or end of the string.
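Because anchors match positions rather than characters, it can help to look at where the matches land. The following JavaScript sketch runs the four example patterns from the table; `indexesOf` is a hypothetical helper name used only here.

```javascript
const phrase = "she sells seashells";

// ^ anchors the match to the start of the string, $ to the end.
const firstWord = phrase.match(/^\w+/)[0];
const lastWord = phrase.match(/\w+$/)[0];

// Hypothetical helper: the index of every match of a global regex.
const indexesOf = (regex) => [...phrase.matchAll(regex)].map((m) => m.index);

// An "s" followed by a word boundary (\b) versus not followed by one (\B).
const endOfWord = indexesOf(/s\b/g);
const insideWord = indexesOf(/s\B/g);

console.log(firstWord, lastWord); // she seashells
console.log(endOfWord);           // [ 8, 18 ]
console.log(insideWord);          // [ 0, 4, 10, 13 ]
```

The two `s\b` matches are the final letters of "sells" and "seashells"; every other `s` is followed by another word character and is matched by `s\B` instead.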
Quantifiers & alternation | Example | Result |
---|---|---|
`+` 1 or more of previous | `b\w+` | b be bee beer beers |
`*` 0 or more of previous | `b\w*` | b be bee beer beers |
`{2,3}` 2 to 3 of previous | `b\w{2,3}` | b be bee beer beers |
`?` 0 or 1 of previous (optional) | `colou?r` | color colour |
`?` as few as possible of previous (lazy) | `b\w+?` | b be bee beer beers |
` | ` or | `b (a |
In the first two quantity matchers, `+` and `*`, you can see how the `+` won't select the `b` alone; it needs a `\w`, or word character, to follow. The `*` will select all the words because they all start with `b`, and because it selects 0 or more it selects the lone `b` as well.
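The difference between `+` and `*` is easy to confirm in code. This JavaScript sketch uses the example string and patterns from the table; the variable names are only illustrative.

```javascript
const words = "b be bee beer beers";

// + needs at least one word character after the b, so the lone "b" is skipped.
const oneOrMore = words.match(/b\w+/g);

// * accepts zero word characters after the b, so the lone "b" matches too.
const zeroOrMore = words.match(/b\w*/g);

console.log(oneOrMore);  // [ 'be', 'bee', 'beer', 'beers' ]
console.log(zeroOrMore); // [ 'b', 'be', 'bee', 'beer', 'beers' ]
```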
The `{2,3}` curly braces set how many repetitions of the preceding token we want. In the example, it wants a `b` followed by two to three word characters.
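Running the example pattern `b\w{2,3}` shows one detail that is easy to miss: a longer word can be matched only in part. A short JavaScript sketch, with an illustrative variable name:

```javascript
const words = "b be bee beer beers";

// b followed by at least 2 and at most 3 word characters.
const twoToThree = words.match(/b\w{2,3}/g);

console.log(twoToThree); // [ 'bee', 'beer', 'beer' ]
```

"b" and "be" are too short to match. The last result comes from "beers": the quantifier stops after three word characters, so the trailing "s" is left out of the match.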
The `?` quantifier makes the preceding character optional. By specifying `colou?r` you can check for both the British and American spellings of color. The `u` becomes optional because it is followed by `?`.
It can also be used to make a quantifier lazy, where the preceding quantifier matches as few characters as possible, as in `b\w+?`.
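Both uses of `?` from the table can be tried side by side. This is a minimal JavaScript sketch using the table's example strings; the greedy pattern `b\w+` is included only for comparison, and the variable names are illustrative.

```javascript
// Optional: the u may appear zero or one time.
const spellings = "color colour".match(/colou?r/g);

const words = "b be bee beer beers";

// Greedy: \w+ takes as many word characters as it can.
const greedy = words.match(/b\w+/g);

// Lazy: \w+? stops after the single word character it is required to match.
const lazy = words.match(/b\w+?/g);

console.log(spellings); // [ 'color', 'colour' ]
console.log(greedy);    // [ 'be', 'bee', 'beer', 'beers' ]
console.log(lazy);      // [ 'be', 'be', 'be', 'be' ]
```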
Let's see how we apply these patterns.