Rust High Performance
上QQ阅读APP看书,第一时间看更新

Link-time optimizations

The next configuration option, useful for improving the performance of the application, is link-time optimizations. Usually, when a program gets built, once all the code has been optimized, it gets linked to other libraries and functions that provide the required functionality. This, however, does not always happen in the most efficient way. Sometimes, a function gets linked twice, or a piece of code gets used in many places, and in that case, the compiler might decide to duplicate some code.

The program will work perfectly, but this has two main disadvantages—first of all, duplicating code and links will make the binary bigger, which is probably something you don't want, and secondly, it will reduce the performance. You might ask why. Well, since it's the same code being accessed from different places in the program, it might make sense that if it gets executed once, it gets added to the L1/L2/L3 caches in the processor. This will enable the future reuse of these instructions without requiring the processor to get them from the RAM memory (much slower) or even the disk/SSD (extremely slow) if the memory has been swapped.

The main advantage when performing Link-Time Optimizations, or LTOs in short, is that while Rust compiles the code file by file, LTOs fit the whole, almost final, representation into a big compilation unit that can be optimized in its entirety, enabling better execution paths.

This can be performed, of course, but at a really high compilation time cost. These optimizations are very costly because they sometimes require changing the final representation, the one ready to be written to the binary. Not only that, but this requires checking lots of execution paths and code samples to find similar blocks. And remember, this is done with the object code, not the Rust code, so the compiler doesn't know about libraries or modules; it only sees instructions.

This costly optimization will improve the performance of your software and the size of your binaries, but being so costly (usually taking as much time as the rest of the compilation, or even more), it's not enabled by default in any of the profiles. And you should not enable it on any but release and maybe benchmarking profiles (you don't want to wait for an LTO every time you make a small change in a function and want to test it). The configuration item to change is the lto configuration item:

    [profile.release]
lto = true

There is a related configuration item, which will be ignored if LTO is turned on for the given profile. I'm talking about the codegen units. This divides the code into multiple smaller code units and compiles each of them separately, enabling parallel compiling, which improves how fast the program compiles. This is one in the case of LTO, but can be modified for the rest. Of course, using separate compilation units avoids some optimizations that can improve the performance of the code, so it might make sense to enable faster compilation in development mode. It will be 16 by default in development mode.

This is as simple as changing the codegen-units configuration option in the development profile:

    [profile.dev]
codegen-units = 32

An example value could be the number of processors/threads in your computer. But remember that this will make the compiled software slower, so do not use it in the release profile. It will, in any case, be ignored if you activate the link-time optimizations.