So it's all having the same workflow. This one results in even more
saved UBO slots per-draw than in the case of Flat, and the slowdown on
Intel is as bad as expected.
While it's one additional indirection (that has an extra cost on Intel
GPUs apparently, like with Phong and MeshVisualizer and
DistanceFieldVector already), with the assumption that draws usually
share the material info it allows to cram more draws into the 16/64k UBO
limit as the per-draw data are now one vec4 smaller.
For the indirection overhead I can imagine adding a new flag which makes
material mapping implicit (materialId == drawId). That seems to put the
benchmark numbers back to the original speed. Same could be done for
other shaders.
Interestingly, shaders that have indirect material references are about
2x slower on Intel. Not the Flat or Vector, which contain the full
material in the DrawUniform. Will probably need extra
Intel-specific optimizations (like avoiding the indirection if
MATERIAL_COUNT=1).
Took me a while (several years?) to figure out a way to benchmark this
without basically duplicating the testing effort and without new
variants being too hard to add.