Inspecting with Linux perf
It's worth keeping in mind that Valgrind's cachegrind is a simulation. If you have access to a Linux system you can make use of the perf tool to get real, honest numbers about your program's cache performance and more. This is highly recommended and something we'll do throughout this book. Like git, perf is many tools arranged under a banner—perf—with its own subcommands that have their own options.
It is a tool well worth reading the documentation for. Anyhow, what does the standard look like under perf?
naive_hashmap > perf stat --event task-clock,context-switches,page-faults,cycles,instructions,branches,branch-misses,cache-references,cache-misses target/release/standard > /dev/null Performance counter stats for 'target/release/standard': 6.923765 task-clock (msec) # 0.948 CPUs utilized 0 context-switches # 0.000 K/sec 633 page-faults # 0.091 M/sec 19,703,427 cycles # 2.846 GHz 26,234,708 instructions # 1.33 insn per cycle 2,802,334 branches # 404.741 M/sec 290,475 branch-misses # 10.37% of all branches 635,526 cache-references # 91.789 M/sec 67,304 cache-misses # 10.590 % of all cache refs 0.007301775 seconds time elapsed
How does the naive implementation stand up?
naive_hashmap > perf stat --event task-clock,context-switches,page-faults,cycles,instructions,branches,branch-misses,cache-references,cache-misses target/release/naive > /dev/null Performance counter stats for 'target/release/naive': 1323.724713 task-clock (msec) # 1.000 CPUs utilized 1 context-switches # 0.001 K/sec 254 page-faults # 0.192 K/sec 3,953,955,131 cycles # 2.987 GHz 15,390,499,356 instructions # 3.89 insn per cycle 4,428,637,974 branches # 3345.588 M/sec 204,132 branch-misses # 0.00% of all branches 455,719,875 cache-references # 344.271 M/sec 21,311 cache-misses # 0.005 % of all cache refs 1.324163884 seconds time elapsed
This squares pretty well with the Valgrind simulation and leads to the same conclusion: too much work is being done to insert. Too many branches, too many instructions, the well-studied reader will have seen this coming a mile off, it's just worthwhile to be able to put a thing you know to numbers.