Microbenchmarks are supposed to help us compare algorithms, yet the common approach runs each benchmark separately—often minutes apart, sometimes on different machines—and then compares the results. On modern systems where performance fluctuates constantly, that comparison can be surprisingly fragile.
In everyday life, if you want to know who is taller, you don’t measure people hours apart with an extremely precise ruler—you simply stand them next to each other.
In this talk, I’ll argue that microbenchmarks should follow the same principle. Instead of measuring implementations independently, we can run two versions of the code simultaneously on the same machine. Any background noise or system effects influence both versions equally, allowing us to measure the relative performance much more reliably.
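To make the idea concrete, here is a minimal sketch of the paired approach. Note the simplification: instead of running the two versions truly simultaneously, this version interleaves them in randomized order within the same session, so slow drift in machine state still affects both samples alike. All names (`paired_benchmark`, the example workloads, the round count) are my own illustration, not something from the talk.

```python
import random
import time

def paired_benchmark(impl_a, impl_b, rounds=200):
    """Time two implementations back to back in randomized order,
    so ambient noise and machine drift hit both samples alike.
    Returns the ratio of median times (a / b)."""
    times_a, times_b = [], []
    for _ in range(rounds):
        # Randomize which version runs first each round to cancel
        # ordering effects (cache warming, frequency ramp-up, ...).
        pair = [(impl_a, times_a), (impl_b, times_b)]
        random.shuffle(pair)
        for fn, bucket in pair:
            start = time.perf_counter()
            fn()
            bucket.append(time.perf_counter() - start)
    median = lambda xs: sorted(xs)[len(xs) // 2]
    return median(times_a) / median(times_b)

# Hypothetical workloads for illustration: summing with a Python
# loop versus the builtin sum().
def slow_sum():
    total = 0
    for i in range(10_000):
        total += i
    return total

def fast_sum():
    return sum(range(10_000))

if __name__ == "__main__":
    ratio = paired_benchmark(slow_sum, fast_sum)
    print(f"slow_sum / fast_sum ratio: {ratio:.2f}x")
```

Because only the ratio of the two medians is reported, any system-wide slowdown during the run cancels out of the comparison, which is exactly the "stand them next to each other" effect the talk describes.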