Learning AWK Programming
Basic regular expression construct

Regular expressions are made up of two types of characters: normal text characters, called literals, and special characters, such as the asterisk (*), plus sign (+), question mark (?), and dot (.), called metacharacters. There are times when you want to match a metacharacter as a literal character. In such cases, we prefix the metacharacter with a backslash (\); the combination is called an escape sequence.
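As a minimal sketch of the difference between a metacharacter and its escaped form, the following commands run awk over a small, made-up input (the three "price" lines are purely illustrative). They assume a POSIX-compatible awk is available on the PATH:

```shell
# Hypothetical sample input: only the first line contains a literal dot.
input='price 3.50
price 350
price 3x50'

# Unescaped: the dot is a metacharacter and matches any single character,
# so both "3.50" and "3x50" are matched.
unescaped=$(printf '%s\n' "$input" | awk '/3.50/')

# Escaped: \. matches only a literal dot.
escaped=$(printf '%s\n' "$input" | awk '/3\.50/')

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```

The unescaped pattern selects two lines, while the escaped pattern selects only the line that really contains "3.50".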

The basic regular expression construct can be summarized as follows:

Here is the list of metacharacters, also known as special characters, that are used in building regular expressions:

\    ^    $    .    [    ]    |    (    )    *    +    ?

The following table lists the remaining elements that are used in building a basic regular expression, apart from the metacharacters mentioned before:

Literal

A literal character (non-metacharacter), such as A, that matches itself.

Escape sequence

An escape sequence that matches a special symbol; for example, \t matches a tab.

Quoted metacharacter (\)

A metacharacter prefixed with a backslash, such as \$, which matches the metacharacter literally.

Anchor (^)

Matches the beginning of a string.

Anchor ($)

Matches the end of a string.

Dot (.)

Matches any single character.

Character classes [...]

A character class such as [ABC] matches any one of the characters A, B, or C. Character classes may include range abbreviations, such as [A-Za-z], which matches any single letter.

Complemented character classes

A complemented character class such as [^0-9] matches any single character except a digit.
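The anchors, the dot, and the character classes listed above can be tried out with a few awk one-liners. This is only a sketch: the sample lines are invented for illustration, and a POSIX-compatible awk is assumed.

```shell
# Hypothetical sample input used by every command below.
input='apple
Apple pie
pineapple
a1
42'

# ^ anchors the match at the start; [Aa] accepts either letter case.
starts=$(printf '%s\n' "$input" | awk '/^[Aa]pple/')

# $ anchors the match at the end of the line.
ends=$(printf '%s\n' "$input" | awk '/apple$/')

# Two dots between both anchors: lines of exactly two characters.
twochar=$(printf '%s\n' "$input" | awk '/^..$/')

# Complemented class: the first character must not be a digit.
nodigit=$(printf '%s\n' "$input" | awk '/^[^0-9]/')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$starts" "$ends" "$twochar" "$nodigit"
```

Each variable holds the lines selected by one pattern, so the effect of every construct can be inspected separately.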

These operators combine regular expressions into larger ones:

Alternation (|)

A|B matches A or B.

Concatenation

AB matches A immediately followed by B.

Closure (*)

A* matches zero or more As.

Positive closure (+)

A+ matches one or more As.

Zero or one (?)

A? matches the null string or A.

Parentheses ()

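Used for grouping regular expressions and for back-referencing: a parenthesized subexpression (r) can later be referred to as \n, where n is a digit. In gawk, such back-references are available only in the replacement text of gensub(), not inside the pattern itself.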
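The combining operators above can be illustrated with a short sketch. The word list is made up for the example, and the commands assume a POSIX-compatible awk; every pattern is anchored with ^ and $ so that it must match the whole line.

```shell
# Hypothetical sample input used by every command below.
input='color
colour
colouur
ab
abab
cat
dog
bird'

# Alternation inside a group: the line is either "cat" or "dog".
alt=$(printf '%s\n' "$input" | awk '/^(cat|dog)$/')

# Zero or one: the "u" is optional.
opt=$(printf '%s\n' "$input" | awk '/^colou?r$/')

# Closure: zero or more occurrences of "u".
star=$(printf '%s\n' "$input" | awk '/^colou*r$/')

# Positive closure applied to a group: one or more repetitions of "ab".
plus=$(printf '%s\n' "$input" | awk '/^(ab)+$/')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$alt" "$opt" "$star" "$plus"
```

Note how the parentheses change what the operator applies to: without them, ab+ would mean "a followed by one or more b" rather than repetitions of the pair "ab".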

In the next section, we will look at regular expression metacharacters and their examples in AWK in more depth.