Learning AWK Programming

Using standard input with names in AWK

Sometimes we need to read input from standard input, for example from a pipe. In all versions of AWK, standard input can be named explicitly on the command line with a single minus (dash) sign, -. For example:

$ cat cars.dat | awk '{ print }' -

With gawk, the same thing can also be written using the special filename /dev/stdin:

$ cat cars.dat | awk '{ print }' /dev/stdin

The output on execution of this code is as follows:

maruti          swift       2007        50000       5
honda city 2005 60000 3
maruti dezire 2009 3100 6
chevy beat 2005 33000 2
honda city 2010 33000 6
chevy tavera 1999 10000 4
toyota corolla 1995 95000 2
maruti swift 2009 4100 5
maruti esteem 1997 98000 1
ford ikon 1995 80000 1
honda accord 2000 60000 2
fiat punto 2007 45000 3
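Naming standard input with - is handy when awk sits inside a shell function that is fed through a pipe. The following is a minimal sketch, assuming a POSIX shell and any POSIX awk; the function name honda_models is hypothetical, and the sample lines are copied from the cars.dat output above:

```shell
# Hypothetical helper: print model and year for every Honda line on stdin.
# The "-" operand tells awk explicitly to read standard input.
honda_models() {
    awk '$1 == "honda" { print $2, $3 }' -
}

# Sample lines copied from cars.dat, fed through a pipe.
printf 'maruti swift 2007 50000 5\nhonda city 2005 60000 3\nhonda accord 2000 60000 2\n' | honda_models
```

Piping these three sample lines through honda_models prints city 2005 and accord 2000, one per line.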

We can also read one file, then the standard input coming from a pipe, and then another file. In that case, the data from the first file, from the pipe, and from the second file is read one after another and treated as a single input stream. In the following example, the contents of cars.dat are read first, then the line produced by the echo command, and finally emp.dat. Any pattern in the AWK program is applied to this combined input as a whole rather than to each file separately, and NR keeps counting across all of it:

$ echo "======================================================" | \
awk '{ print NR , $0 }' cars.dat - emp.dat

The output on execution of this code is as follows:

1 maruti          swift       2007        50000       5
2 honda city 2005 60000 3
3 maruti dezire 2009 3100 6
4 chevy beat 2005 33000 2
5 honda city 2010 33000 6
6 chevy tavera 1999 10000 4
7 toyota corolla 1995 95000 2
8 maruti swift 2009 4100 5
9 maruti esteem 1997 98000 1
10 ford ikon 1995 80000 1
11 honda accord 2000 60000 2
12 fiat punto 2007 45000 3
13 ======================================================
14 Jack Singh 9857532312 jack@gmail.com M hr 2000
15 Jane Kaur 9837432312 jane@gmail.com F hr 1800
16 Eva Chabra 8827232115 eva@gmail.com F lgs 2100
17 Amit Sharma 9911887766 amit@yahoo.com M lgs 2350
18 Julie Kapur 8826234556 julie@yahoo.com F Ops 2500
19 Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
20 Hari Singh 8827255666 hari@yahoo.com M Ops 2350
21 Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
22 John Kapur 9911556789 john@gmail.com M hr 2200
23 Billy Chabra 9911664321 bily@yahoo.com M lgs 1900
24 Sam khanna 8856345512 sam@hotmail.com F lgs 2300
25 Ginny Singh 9857123466 ginny@yahoo.com F hr 2250
26 Emily Kaur 8826175812 emily@gmail.com F Ops 2100
27 Amy Sharma 9857536898 amys@hotmail.com F Ops 2500
28 Vina Singh 8811776612 vina@yahoo.com F lgs 2300
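Because the inputs are concatenated, NR keeps counting across all of them. The built-in variable FNR, by contrast, restarts at 1 for each new input, including the standard input named by -. A minimal sketch, assuming a POSIX shell; first.dat, second.dat, and the function number_lines are hypothetical and created only for this demonstration:

```shell
# Scratch directory for two small, hypothetical input files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'maruti swift\nhonda city\n' > "$tmp/first.dat"
printf 'Jack Singh\n' > "$tmp/second.dat"

# Hypothetical helper: print the running record number (NR), the
# per-input record number (FNR), and the record itself.
number_lines() {
    awk '{ print NR, FNR, $0 }' "$@"
}

# Read first.dat, then stdin ("-"), then second.dat.
echo "=====" | number_lines "$tmp/first.dat" - "$tmp/second.dat"
```

This prints 1 1 maruti swift, 2 2 honda city, 3 1 =====, and 4 1 Jack Singh: NR runs from 1 to 4, while FNR starts over for the piped line and again for second.dat.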

Using command-line arguments

The AWK command line can take several forms:

awk 'program' file1 file2 file3 ...
awk -f source_file file1 file2 file3 ...
awk -Fsep 'program' file1 file2 file3 ...
awk -Fsep -f source_file file1 file2 file3 ...
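The two -Fsep forms can be tried side by side. A minimal sketch, assuming a POSIX shell; staff.txt and names.awk are hypothetical scratch files created in a temporary directory:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical colon-separated data file.
printf 'Jack:hr:2000\nJane:hr:1800\n' > "$tmp/staff.txt"

# Form: awk -Fsep 'program' file...
awk -F: '{ print $1, $3 }' "$tmp/staff.txt"

# Form: awk -Fsep -f source_file file... (same program, stored in a file)
printf '{ print $1, $3 }\n' > "$tmp/names.awk"
awk -F: -f "$tmp/names.awk" "$tmp/staff.txt"
```

Both commands print Jack 2000 and Jane 1800; the only difference is whether the program text appears inline or is read from a file with -f.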

In these command lines, file1, file2, file3, and so on are command-line arguments, which usually name input files. Inside an AWK program, the command-line arguments are available through the built-in array ARGV, and the built-in variable ARGC holds the number of elements in ARGV. The value of ARGC is one more than the number of arguments given on the command line. For example:

$ awk -f source_file a b c 

Here, ARGV is AWK's built-in array that stores the command-line arguments. An individual argument is accessed by writing its index in square brackets after the array name:

  • ARGV[0] contains awk
  • ARGV[1] contains a
  • ARGV[2] contains b
  • ARGV[3] contains c

ARGC has the value 4. It is one more than the number of arguments because AWK, like C, counts the name of the command as argument zero.
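These values can be checked directly. A minimal sketch, assuming a POSIX shell; the function name show_args is hypothetical, and the loop starts at 1 because the exact string stored in ARGV[0] may differ between AWK implementations. Since the program has only a BEGIN block, awk never tries to open a, b, or c as files:

```shell
# Hypothetical helper: report ARGC and ARGV[1] onward for its arguments.
# The program has only a BEGIN block, so awk never opens the operands.
show_args() {
    awk 'BEGIN {
        print "ARGC =", ARGC
        for (i = 1; i < ARGC; i++)
            print "ARGV[" i "] =", ARGV[i]
    }' "$@"
}

show_args a b c
```

Calling show_args a b c prints ARGC = 4, followed by ARGV[1] = a, ARGV[2] = b, and ARGV[3] = c.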

For example, the following program displays the number of arguments given to the AWK command, followed by the value of each one:

$ vi displayargs.awk
# echo - print command-line arguments
BEGIN {
    printf "No. of command line args is : %d\n", ARGC - 1
    for (i = 1; i < ARGC; i++)
        printf "ARG[%d] is : %s\n", i, ARGV[i]
}

Now, we call this AWK program with the command-line arguments hello how are you. Here, hello is the first argument, how is the second, are is the third, and you is the fourth:

$ awk -f displayargs.awk hello how are you

The output on execution of the preceding code is as follows:

No. of command line args is : 4
ARG[1] is : hello
ARG[2] is : how
ARG[3] is : are
ARG[4] is : you
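To see that options are not counted, the same loop can be run inline with a -F option placed before the program text. A minimal sketch, assuming a POSIX shell; the function name count_args is hypothetical:

```shell
# Hypothetical inline version of displayargs.awk, with -F, given first.
# awk consumes -F, and the program text itself; only the remaining
# operands end up in ARGV[1] onward.
count_args() {
    awk -F, 'BEGIN {
        printf "No. of command line args is : %d\n", ARGC - 1
        for (i = 1; i < ARGC; i++)
            printf "ARG[%d] is : %s\n", i, ARGV[i]
    }' "$@"
}

count_args hello how
```

This reports 2 arguments, hello and how; neither -F, nor the program text appears in the list.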

The program text and options such as -f followed by a source filename, or -F followed by a field separator, are consumed by AWK itself and are not stored in ARGV as arguments. Let's try another useful example of command-line arguments. This program uses them to generate sequences of integers:

$ vi seq.awk
# Program to print sequences of integers
BEGIN {
    # If only one argument is given, start from 1
    if (ARGC == 2)
        for (i = 1; i <= ARGV[1]; i++)
            print i

    # If two arguments are given, count from the first number up to the second
    else if (ARGC == 3)
        for (i = ARGV[1]; i <= ARGV[2]; i++)
            print i

    # If three arguments are given, count from the first number up to the
    # second, stepping by the third
    else if (ARGC == 4)
        for (i = ARGV[1]; i <= ARGV[2]; i += ARGV[3])
            print i
}

Now, let's execute the preceding script with three different sets of arguments:

$ awk -f seq.awk 10 
$ awk -f seq.awk 1 10
$ awk -f seq.awk 1 10 1

Each of these commands generates the integers 1 through 10. With a single argument, the script prints the numbers from 1 up to that argument. With two arguments, it prints the numbers from the first argument up to the second. With three arguments, it prints the numbers from the first argument up to the second, stepping by the third (here, 1). The output of any of these commands is as follows:

1
2
3
4
5
6
7
8
9
10
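The step argument only makes a visible difference once it is larger than 1. A minimal sketch, assuming a POSIX shell; seq_step is a hypothetical helper that mirrors the three-argument branch of seq.awk, and it assumes a positive step (a zero or negative step would never pass the upper bound, so the loop would not end):

```shell
# Hypothetical helper mirroring the three-argument branch of seq.awk:
# count from $1 up to $2, stepping by $3 (assumed positive).
# The "+ 0" makes the numeric intent of each comparison explicit.
seq_step() {
    awk 'BEGIN {
        for (i = ARGV[1] + 0; i <= ARGV[2] + 0; i += ARGV[3])
            print i
    }' "$1" "$2" "$3"
}

seq_step 1 10 3
```

With a step of 3, the numbers 1, 4, 7, and 10 are printed, one per line.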