Named character classes (POSIX standard)
Named character classes are a feature introduced by the POSIX standard. A named character class is a special notation for a set of characters that share a specific attribute, where the actual characters can vary from country to country or from one character set to another. For example, the set of alphabetic characters differs between India and China.
A named character class is valid in a regexp only when it appears inside the brackets of a bracket expression. The class name itself is enclosed between '[:' and ':]', so a complete expression looks like [[:alpha:]].
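The nesting rule can be sketched with a few inline lines. This is a minimal illustration, assuming a POSIX-compliant awk (such as gawk) is installed as awk; the sample lines and the shell variable names are hypothetical and are not taken from dot_regex.txt:

```shell
# Inline sample input (hypothetical lines, not from dot_regex.txt).
sample=$(printf '%s\n' 'abc' '123' '---' '_')

# The outer [ ] is the bracket expression; [:alpha:] is the class inside it.
only_alpha=$(printf '%s\n' "$sample" | awk '/[[:alpha:]]/')

# Several classes and literal characters can share one bracket expression.
word_chars=$(printf '%s\n' "$sample" | awk '/[[:alpha:][:digit:]_]/')

printf 'alpha only:\n%s\n' "$only_alpha"
printf 'alpha, digit, or underscore:\n%s\n' "$word_chars"
```

The first command should keep only the line abc, while the second should also keep 123 and the lone underscore.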
For example, to search for lines that contain alphabetic characters (uppercase or lowercase), we can write the following:
$ awk '/[[:alpha:]]/' dot_regex.txt
The output on execution of the preceding code is as follows:
Let's go for a walk
Singing is a good hobby
We will talk on this matter
Ping me when you are free
(that is cool)
My son's birthday is on 24/04/14
I will be going to Singapore on 24-04-14
(this)
The preceding regex prints all the lines of the file because each line contains at least one alphabetic character.
Now, let's use the [:digit:] named character class to print the lines with digits in them:
$ awk '/[[:digit:]]/' dot_regex.txt
The output on execution of the given code is as follows:
My son's birthday is on 24/04/14
I will be going to Singapore on 24-04-14
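A named class can also be repeated to describe a fixed layout, or the match can be negated. The following is a small sketch under the same assumption of a POSIX-compliant awk; it feeds three of the sample lines inline instead of reading dot_regex.txt, and the shell variable names are illustrative only:

```shell
# Three lines copied from the sample output above, supplied inline.
lines=$(printf '%s\n' \
  "Ping me when you are free" \
  "My son's birthday is on 24/04/14" \
  "I will be going to Singapore on 24-04-14")

# Two digits, a hyphen, two digits, a hyphen, two digits.
dashed=$(printf '%s\n' "$lines" |
  awk '/[[:digit:]][[:digit:]]-[[:digit:]][[:digit:]]-[[:digit:]][[:digit:]]/')

# Negating the match keeps only the lines without any digit.
no_digits=$(printf '%s\n' "$lines" | awk '!/[[:digit:]]/')

printf '%s\n' "$dashed" "$no_digits"
```

Only the Singapore line uses the hyphenated date layout, and only the "Ping me" line contains no digits at all.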
A summary of the character classes defined by the POSIX standard is as follows:
Class | Meaning
[:digit:] | Numeric characters
[:alpha:] | Alphabetic characters
[:alnum:] | Alphanumeric characters
[:lower:] | Lowercase alphabetic characters
[:upper:] | Uppercase alphabetic characters
[:blank:] | Blank characters: space and tab
[:space:] | Space characters: tab, newline, vertical tab, form feed, carriage return, and space
[:cntrl:] | Control characters: octal codes 000 to 037, and 177
[:xdigit:] | Characters that are hexadecimal digits
[:graph:] | Characters that are both printable and visible: '[:alnum:]' and '[:punct:]' (a space is printable but not visible, whereas an 'a' is both)
[:print:] | Printable characters (characters that are not control characters)
[:punct:] | Punctuation characters (characters that are not letters, digits, control characters, or space characters): ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
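To see several of these classes at work on one string, the sketch below counts how many characters fall into each class. It relies on the fact that awk's gsub function returns the number of substitutions it made; the input string and the variable names are made up for illustration, and a POSIX-compliant awk is assumed:

```shell
counts=$(printf '%s\n' 'Hello, World 42!' | awk '{
  # gsub returns the number of substitutions; "&" puts each match back unchanged.
  upper = gsub(/[[:upper:]]/, "&")
  lower = gsub(/[[:lower:]]/, "&")
  digit = gsub(/[[:digit:]]/, "&")
  punct = gsub(/[[:punct:]]/, "&")
  space = gsub(/[[:space:]]/, "&")
  printf "upper=%d lower=%d digit=%d punct=%d space=%d\n", upper, lower, digit, punct, space
}')
printf '%s\n' "$counts"
```

For this input the counts are upper=2 lower=8 digit=2 punct=2 space=2, which adds up to the 16 characters of the string.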
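The following summary table lists each named character class next to its equivalent bracket expression or, where no short equivalent exists, its definition. The bracket-expression equivalents hold for ASCII text in the C/POSIX locale; in other locales the named classes can match additional characters: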
Named character class | Equivalent
[:digit:] | [0-9]
[:alpha:] | [a-zA-Z]
[:alnum:] | [a-zA-Z0-9]
[:lower:] | [a-z]
[:upper:] | [A-Z]
[:blank:] | Space and tab
[:space:] | Tab, newline, vertical tab, form feed, carriage return, and space
[:cntrl:] | Characters with octal codes 000 to 037, and 177
[:xdigit:] | [0-9A-Fa-f]
[:graph:] | Characters that are both printable and visible: '[:alnum:]' and '[:punct:]' (a space is printable but not visible, whereas an 'a' is both)
[:print:] | Printable characters (characters that are not control characters)
[:punct:] | Punctuation characters (characters that are not letters, digits, control characters, or space characters): ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
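The equivalence between a named class and an explicit range can be checked directly. The sketch below forces the C locale with LC_ALL=C, because the explicit ranges are only guaranteed to match the named classes for ASCII text in that locale; the sample lines and the shell variable names are hypothetical, and a POSIX-compliant awk is assumed:

```shell
# Hypothetical sample lines for the comparison.
sample=$(printf '%s\n' 'room 101' 'no numbers here' 'ff00' 'xyz')

# Lines containing a digit: named class versus explicit range.
named_digit=$(printf '%s\n' "$sample" | LC_ALL=C awk '/[[:digit:]]/')
range_digit=$(printf '%s\n' "$sample" | LC_ALL=C awk '/[0-9]/')

# Lines made up entirely of hexadecimal digits.
named_hex=$(printf '%s\n' "$sample" | LC_ALL=C awk '/^[[:xdigit:]]+$/')
range_hex=$(printf '%s\n' "$sample" | LC_ALL=C awk '/^[0-9A-Fa-f]+$/')

if [ "$named_digit" = "$range_digit" ] && [ "$named_hex" = "$range_hex" ]; then
  echo "named classes and ranges matched the same lines"
fi
```

Outside the C locale, the named classes remain the safer choice, since an explicit range such as [a-z] depends on the collation order of the active locale.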