Basic regular expression construct
Regular expressions are made up of two types of characters: normal text characters, called literals, and special characters, such as *, +, ?, and ., called metacharacters. There are times when you want to match a metacharacter as a literal character. In such cases, we prefix the metacharacter with a backslash (\), forming an escape sequence that matches the character literally.
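As a minimal sketch of the difference between a metacharacter and its escaped form, the following commands run the same hypothetical two-line input through two patterns. The sample text and variable names are illustrative assumptions, not part of the original example set.

```shell
# Hypothetical sample lines: only the first contains a literal "1.5".
input='version 1.5
version 105'

# Unescaped dot: "." matches any single character, so both lines match.
unescaped=$(printf '%s\n' "$input" | awk '/1.5/')

# Escaped dot: "\." matches only a literal period, so only the first line matches.
escaped=$(printf '%s\n' "$input" | awk '/1\.5/')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
```

With the unescaped pattern, the line containing 105 is selected as well, because the dot is free to match the 0.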
The basic regular expression construct can be summarized as follows:
Here is the list of metacharacters, also known as special characters, that are used in building regular expressions:
\ ^ $ . [ ] | ( ) * + ?
The following table lists the remaining elements that are used in building a basic regular expression, apart from the metacharacters mentioned before:
Literal | A literal character (non-metacharacter), such as A, that matches itself. |
Escape sequence | An escape sequence matches a special symbol; for example, \t matches a tab. |
Quoted metacharacter (\) | A metacharacter prefixed with a backslash, such as \$, matches that metacharacter literally. |
Anchor (^) | Matches the beginning of a string. |
Anchor ($) | Matches the end of a string. |
Dot (.) | Matches any single character. |
Character classes [...] | A character class [ABC] matches any one of the A, B, or C characters. Character classes may include range abbreviations, such as [A-Za-z], which matches any single letter. |
Complemented character classes [^...] | The complemented character class [^0-9] matches any single character except a digit. |
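The elements in the preceding table can be tried out with a short sketch such as the following. The word list and the variable names are hypothetical, chosen only so that each pattern selects a different subset of lines.

```shell
# Hypothetical word list used only for illustration.
words='cat
concat
cot
c9t
Cat'

# Anchor (^): lines that begin with "c".
anchored_start=$(printf '%s\n' "$words" | awk '/^c/')
# Anchor ($): lines that end with "cat".
anchored_end=$(printf '%s\n' "$words" | awk '/cat$/')
# Dot: "c", then any single character, then "t".
any_char=$(printf '%s\n' "$words" | awk '/^c.t$/')
# Character classes: "C" or "c", then "a" or "o", then "t".
in_class=$(printf '%s\n' "$words" | awk '/^[Cc][ao]t$/')
# Complemented class: the middle character must not be a digit.
not_digit=$(printf '%s\n' "$words" | awk '/^c[^0-9]t$/')

printf '%s\n' "$anchored_start" "---" "$anchored_end" "---" "$any_char" \
  "---" "$in_class" "---" "$not_digit"
```

Matching is case-sensitive here, which is why Cat is selected only by the pattern that lists C explicitly in a character class.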
These operators combine regular expressions into larger ones:
Alternation (|) | A|B matches A or B. |
Concatenation | AB matches A immediately followed by B. |
Closure (*) | A* matches zero or more As. |
Positive closure (+) | A+ matches one or more As. |
Zero or one (?) | A? matches the null string or A. |
Parentheses () | Used for grouping regular expressions and for back-referencing: the text matched by a parenthesized expression (r) can be referred to later as \n, where n is a digit (in gawk, for example, in the replacement text of gensub()). |
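The combining operators listed above can be illustrated with a small sketch. The input strings and variable names are hypothetical, and each pattern is anchored with ^ and $ so that it must match the whole line.

```shell
# Hypothetical sample strings.
lines='ac
abc
abbc
adc
abab'

# Alternation inside a group: "a", then "b" or "d", then "c".
alternation=$(printf '%s\n' "$lines" | awk '/^a(b|d)c$/')
# Closure: zero or more "b" between "a" and "c".
closure=$(printf '%s\n' "$lines" | awk '/^ab*c$/')
# Positive closure: one or more "b" between "a" and "c".
positive=$(printf '%s\n' "$lines" | awk '/^ab+c$/')
# Zero or one: an optional single "b" between "a" and "c".
optional=$(printf '%s\n' "$lines" | awk '/^ab?c$/')
# Grouping: the concatenation "ab" repeated one or more times.
grouped=$(printf '%s\n' "$lines" | awk '/^(ab)+$/')

printf '%s\n' "$alternation" "---" "$closure" "---" "$positive" \
  "---" "$optional" "---" "$grouped"
```

Without the parentheses, a pattern such as /^ab+$/ would apply + to b alone, so grouping is what lets a repetition operator act on a longer expression.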
In the next section, we will look at regular expression metacharacters and their examples in AWK in more depth.