Learning AWK Programming

Regular expressions as string-matching patterns with AWK

Regular expressions are used as string-matching patterns with AWK in the following three ways. We use the '~' (match) and '!~' (non-match) operators to perform regular expression comparisons:

  • /regexpr/: This matches when the current input line contains a sub-string matched by regexpr. It is the most basic form of regular expression pattern: a regular expression made of ordinary characters simply matches itself wherever it appears in the line. For example, /mail/ matches any input line that contains the string mail, whether on its own or as part of a longer word. So, the following command prints the lines with Gmail as well as Hotmail addresses in the email ID field of the employee database:
$ awk '/mail/' emp.dat

The output on execution of this code is as follows:

Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
John Kapur 9911556789 john@gmail.com M hr 2200
Sam khanna 8856345512 sam@hotmail.com F lgs 2300
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500

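In this example, we do not specify any expression to match against, so the regular expression is automatically matched against the whole input line, $0. The preceding command is therefore equivalent to the following: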

$ awk '$0 ~ /mail/' emp.dat

The output on execution of this code is as follows:

Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
John Kapur 9911556789 john@gmail.com M hr 2200
Sam khanna 8856345512 sam@hotmail.com F lgs 2300
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500
  • expression ~ /regexpr/: This matches if the string value of the expression contains a sub-string matched by regexpr. Generally, the left-hand operand of the match operator is a field. For example, the following command prints all the lines in which the value of the second field contains the string Singh:
$ awk '$2 ~ /Singh/{ print }' emp.dat

We can also write the match as the condition of an if statement inside the action, as follows:

$ awk '{ if($2 ~ /Singh/) print}' emp.dat

The output on execution of the preceding code is as follows:

Jack Singh 9857532312 jack@gmail.com M hr 2000
Hari Singh 8827255666 hari@yahoo.com M Ops 2350
Ginny Singh 9857123466 ginny@yahoo.com F hr 2250
Vina Singh 8811776612 vina@yahoo.com F lgs 2300
  • expression !~ /regexpr/: This matches if the string value of the expression does not contain a sub-string matched by regexpr. Generally, this expression is also a field variable. For example, the following command prints all the lines whose second field does not contain the sub-string Singh:
$ awk '$2 !~ /Singh/{ print }' emp.dat

The output on execution of the preceding code is as follows:

Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Amit Sharma 9911887766 amit@yahoo.com M lgs 2350
Julie Kapur 8826234556 julie@yahoo.com F Ops 2500
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
John Kapur 9911556789 john@gmail.com M hr 2200
Billy Chabra 9911664321 bily@yahoo.com M lgs 1900
Sam khanna 8856345512 sam@hotmail.com F lgs 2300
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500
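
The three forms can be compared side by side with a short script. This is a minimal sketch, assuming a POSIX-like shell with awk and mktemp available; since the complete emp.dat is not reproduced here, the script writes a temporary sample file containing five rows copied from the outputs above. It checks that /mail/ selects the same lines as $0 ~ /mail/, that the pattern form and the if form of $2 ~ /Singh/ agree, and that ~ and !~ applied to the same field split the input into two complementary groups.

```shell
#!/bin/sh
# Sketch: compare the three regular-expression match forms in awk.
# Assumption: the sample rows below are copied from the emp.dat
# outputs shown above; they are not the complete emp.dat file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Amit Sharma 9911887766 amit@yahoo.com M lgs 2350
Hari Singh 8827255666 hari@yahoo.com M Ops 2350
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
EOF

# /regexpr/ on its own is matched against the whole line, $0.
bare=$(awk '/mail/' "$sample")
whole=$(awk '$0 ~ /mail/' "$sample")

# expression ~ /regexpr/ as a pattern, and as an if condition in the action.
as_pattern=$(awk '$2 ~ /Singh/ { print }' "$sample")
as_if=$(awk '{ if ($2 ~ /Singh/) print }' "$sample")

# Count the lines selected by ~ and by !~ on the same field, plus all lines.
matched=$(awk '$2 ~ /Singh/ { n++ } END { print n+0 }' "$sample")
unmatched=$(awk '$2 !~ /Singh/ { n++ } END { print n+0 }' "$sample")
total=$(awk 'END { print NR }' "$sample")

echo "$bare"
echo "Singh in field 2: $matched, not Singh: $unmatched, total: $total"
rm -f "$sample"
```

Running it prints the three Gmail and Hotmail rows, followed by the line Singh in field 2: 2, not Singh: 3, total: 5. The two counts add up to the total because every input line is selected by exactly one of ~ and !~.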

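Any expression may be used in place of /regexpr/ on the right-hand side of ~ and !~; its string value is then treated as a regular expression. Match expressions are also not limited to patterns: they can be used as the conditions of if, while, for, and do statements.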
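
Both points can be sketched with a short script. This assumes a POSIX-like shell with awk and mktemp on the PATH, and writes a temporary sample file holding three rows copied from the outputs above. The variable name pat is purely illustrative: the regular expression is passed in as a string with awk's -v option and used on the right-hand side of ~, while the other two programs use a match expression as the condition of an if statement and of a while loop.

```shell
#!/bin/sh
# Sketch: dynamic regular expressions and match expressions in statements.
# Assumption: the sample rows are copied from the emp.dat outputs above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Hari Singh 8827255666 hari@yahoo.com M Ops 2350
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
EOF

# The right-hand side of ~ is an ordinary string expression (here the
# hypothetical variable pat), used as a regular expression at run time.
kaur_names=$(awk -v pat='Kaur' '$2 ~ pat { print $1 }' "$sample")

# A match expression as the condition of an if statement ...
hr_staff=$(awk '{ if ($6 ~ /hr/) print $1 }' "$sample")

# ... and of a while loop: step through the fields until one contains "@".
emails=$(awk '{
    i = 1
    while (i <= NF && $i !~ /@/)
        i++
    if (i <= NF) print $i
}' "$sample")

echo "$kaur_names"
echo "$hr_staff"
echo "$emails"
rm -f "$sample"
```

It prints Jane and Emily (second field matches Kaur), then Jane (sixth field matches hr), then the three email addresses, one per line. Because pat is an ordinary awk variable, the same program can search for a different surname without editing its text.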