It's a lot of code, but it still seems to be the fastest option of all
we have. This was the original idea when implementing half-float support
in 2016 but then I shelved it in favor of a simpler (but slower) code,
keeping the table only for the benchmark, calculated at runtime. But now
we need a batch version of this, so this comes handy.
Most of the code is in the actual test where I'm comparing and
benchmarking three different implementations (a
naive/straightforward/ground-truth one, the chosen one and a fast though
cache-spilling table-based one) to ensure the behavior is consistent
across all of them and that the performance is within reasonable bounds.
The Corrade::TestSuite benchmarking stuff needs serious improvements,
though.