
Preprocessor
In Chapter 1, Essential Features, we briefly introduced the C preprocessor. Specifically, we talked about macros, conditional compilation, and header guards.
There, we discussed preprocessing as an essential feature of the C language. Few other programming languages offer anything quite like it. In the simplest terms, preprocessing allows you to modify your source code before it is sent for compilation. It also allows you to divide your source code, especially the declarations, into header files, so that you can later include them in multiple source files and reuse those declarations.
It is vital to remember that if you have a syntax error in your source code, the preprocessor will not find it, because it does not know anything about C syntax. Instead, it just performs some simple tasks, which typically revolve around text substitution. As an example, imagine that you have a text file named sample.c with the following content:
#include <stdio.h>
#define file 1000
Hello, this is just a simple text file but ending with .c extension!
This is not a C file for sure!
But we can preprocess it!
Code Box 2-6: C code containing some text!
Having the preceding code, let us preprocess the file using gcc. Note that some parts of the following shell box have been removed, because including stdio.h makes the translation unit very big:
$ gcc -E sample.c
# 1 "sample.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 341 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "sample.c" 2
# 1 "/usr/include/stdio.h" 1 3 4
# 64 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/_stdio.h" 1 3 4
# 68 "/usr/include/_stdio.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4
# 587 "/usr/include/sys/cdefs.h" 3 4
# 1 "/usr/include/sys/_symbol_aliasing.h" 1 3 4
# 588 "/usr/include/sys/cdefs.h" 2 3 4
# 653 "/usr/include/sys/cdefs.h" 3 4
...
...
extern int __vsnprintf_chk (char * restrict, size_t, int, size_t,
const char * restrict, va_list);
# 412 "/usr/include/stdio.h" 2 3 4
# 2 "sample.c" 2
Hello, this is just a simple text 1000 but ending with .c extension!
This is not a C 1000 for sure!
But we can preprocess it!
$
Shell Box 2-9: The preprocessed sample C code seen in Code Box 2-6
As you can see in the preceding shell box, the content of stdio.h is copied before the text.
If you look more closely, you will see that another interesting substitution has also happened. The occurrences of the word file have been replaced by 1000 in the text.
This example shows us exactly how the preprocessor works. The preprocessor only does simple tasks, such as inclusion, by copying contents from a file, or expanding macros by text substitution. It does not know anything about C, though; it still needs a parser to find the directives in the input file before it can act on them. This means that a C preprocessor uses its own parser, which looks for directives in the input code.
Note:
Generally, a parser is a program that processes the input data and extracts certain parts of it for further analysis and processing. Parsers need to know the structure of the input data in order to break it down into smaller, useful pieces of data.
The preprocessor's parser is different from the parser used by a C compiler because it uses a grammar that is almost independent of the C grammar. This enables us to use it in circumstances other than preprocessing a C file.
Note:
By exploiting the functionality of a C preprocessor, you could use file inclusion and macro expansion for purposes other than building a C program. They could be used to process other text files as well.
The GNU C Preprocessor Internals – http://www.chiark.greenend.org.uk/doc/cpp-4.3-doc/cppinternals.html – is a great source for learning more about the gcc preprocessor. This document is an official source that describes how the GNU C preprocessor works. The GNU C preprocessor is used by the gcc compiler to preprocess the source files.
In the document at the preceding link, you can find how the preprocessor parses the directives and how it creates the parse tree. The document also explains the different macro expansion algorithms. While it is outside the scope of this chapter, if you want to implement your own preprocessor for a specific in-house programming language, or just for processing some text files, then that document provides some great context.
In most Unix-like operating systems, there is a tool called cpp, which stands for C Pre-Processor – and not C Plus Plus! cpp is part of the C development bundle that typically ships with Unix-like systems. It can be used to preprocess a C file. In the background, the tool is used by a C compiler, like gcc, to preprocess a C file. If you have a source file, you can use cpp to preprocess it, as we have done next:
$ cpp ExtremeC_examples_chapter2_1.c
# 1 "ExtremeC_examples_chapter2_1.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 340 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
...
...
# 5 "ExtremeC_examples_chapter2_1.c" 2
double avg(int* array, int length, average_type_t type) {
  if (length <= 0 || type == NONE) {
    return 0;
  }
  double sum = 0;
  for (int i = 0; i < length; i++) {
    if (type == NORMAL) {
      sum += array[i];
    } else if (type == SQUARED) {
      sum += array[i] * array[i];
    }
  }
  return sum / length;
}
$
Shell Box 2-10: Using the cpp utility to preprocess source code
As a final note in this section, if you pass a file with the extension .i to a C compiler, then it will bypass the preprocessor step. It does this because a file with a .i extension is supposed to have already been preprocessed. Therefore, it is sent directly to the compilation step.
If you insist on running the C preprocessor on a file with a .i extension, then you will get the following warning message. Note that the following shell box was produced with the clang compiler:
$ clang -E ExtremeC_examples_chapter2_1.c > ex2_1.i
$ clang -E ex2_1.i
clang: warning: ex2_1.i: previously preprocessed input [-Wunused-command-line-argument]
$
Shell Box 2-11: Passing an already preprocessed file, with extension .i, to the clang compiler
As you can see, clang warns us that the file has already been preprocessed.
In the next section of this chapter, we are going to specifically talk about the compiler component in the C compilation pipeline.