TL;DR: Each of Linux 2.6.39, 3.6 and 4.0 brings notable performance improvements for the IPv4 route lookup process.

In a previous article, I explained how Linux implements an IPv4 routing table with compressed tries to offer excellent lookup times. The following graph shows the performance progression of Linux through history:

IPv4 route lookup performance
Route lookup times for various Linux kernel versions. Lookup is done on a table with 500,000 routes. Notable performance improvements are highlighted.

Two scenarios are tested:

  • 500,000 routes extracted from an Internet router (half of them are /24), and
  • 500,000 host routes (/32) tightly packed in 4 distinct subnets.

All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels1 as well as current ones. The kernel configuration used is the default one with CONFIG_SMP and CONFIG_IP_MULTIPLE_TABLES options enabled (however, no IP rules are used). Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark.

The measurements are done in a virtual machine with one vCPU2. The host is an Intel Core i5-4670K and the CPU governor was set to “performance”. The benchmark is single-threaded. Implemented as a kernel module, it calls fib_lookup() with various destinations in 100,000 timed iterations and keeps the median. Timings of individual runs are computed from the TSC (and converted to nanoseconds by assuming a constant clock).

The following kernel versions bring a notable performance improvement:

  • In Linux 2.6.39, commit 3630b7c050d9, David Miller removes the hash-based routing table implementation to switch to the LPC-trie implementation (available since Linux 2.6.13 as a compile-time option). This brings a small regression for the scenario with many host routes but boosts the performance for the general case.

  • In Linux 3.0, commit 281dc5c5ec0f, the improvement is not related to the network subsystem. Linus Torvalds disables the compiler size-optimization from the default configuration. It was believed that optimizing for size would help keeping the instruction cache efficient. However, compilers generated under-performing code on x86 when this option was enabled.

  • In Linux 3.6, commit f4530fa574df, David Miller adds an optimization to not evaluate IP rules when they are left unconfigured. From this point, the use of the CONFIG_IP_MULTIPLE_TABLES option doesn’t impact the performances unless some IP rules are configured. This version also removes the route cache (commit 5e9965c15ba8). However, this has no effect on the benchmark as it directly calls fib_lookup() which doesn’t involve the cache.

  • In Linux 4.0, notably commit 9f9e636d4f89, Alexander Duyck adds several optimizations to the trie lookup algorithm. It really pays off!

  • In Linux 4.1, commit 0ddcf43d5d4a, Alexander Duyck collapses the local and main tables when no specific IP rules are configured. For non-local traffic, both those tables were looked up.

  1. Compiling old kernels with an updated userland may still require some small patches

  2. The kernels are compiled with the CONFIG_SMP option to use the hierarchical RCU and activate more of the same code paths as actual routers. However, progress on parallelism are left unnoticed.