We have half-float vectors and matrices, so why not these as well. Not
sure what all the angle precision is usable for, but at the very least
it could be useful for a compact meshlet occlusion cone / AABB
representation or rough animations.
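Half has a 10-bit mantissa, so for angles in degrees the resolution near 360° should be about 0.25°, plenty for rough cone or animation data. A minimal standalone sketch (not the Magnum API) that emulates the precision loss by rounding a 32-bit float's mantissa to what half can hold, ignoring half's smaller exponent range and denormals:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

/* Round a 32-bit float to the nearest value representable in a 16-bit
   half. Emulates only the mantissa narrowing (round-to-nearest-even),
   ignoring half's exponent range, which is fine for angle magnitudes. */
float roundToHalfPrecision(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, 4);
    /* Add just below half of the dropped 13-bit range, plus the kept LSB
       for ties-to-even, then mask the dropped bits away */
    bits += 0x0FFFu + ((bits >> 13) & 1);
    bits &= 0xFFFFE000u;
    std::memcpy(&x, &bits, 4);
    return x;
}
```

For example, 359.9° rounds to 360.0°, an error of 0.1°, while values like 90.0° that half represents exactly survive unchanged.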
This is already done for the AbstractImporter and the new
AbstractShaderConverter, as there's a common use case of checking just
the filename for input/output path or file type detection and then
delegating to the common implementation that works directly on the data.
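The pattern, sketched in plain C++ with illustrative names (not Magnum's actual interface): the file-taking entry point uses the filename solely for format detection, then hands off to the data-taking implementation, so a plugin only needs to implement the latter.

```cpp
#include <cassert>
#include <string>
#include <vector>

/* Hypothetical sketch of the delegation pattern; names are illustrative,
   not Magnum's actual API. */
struct ConverterSketch {
    /* The file-based entry point only inspects the filename... */
    bool convertFile(const std::string& filename,
                     const std::vector<char>& contents) {
        const std::size_t dot = filename.rfind('.');
        _detectedExtension =
            dot == std::string::npos ? "" : filename.substr(dot + 1);
        /* ...and then delegates to the data-based implementation */
        return convertData(contents);
    }

    /* All the actual work happens here, on raw data */
    bool convertData(const std::vector<char>& data) {
        return !data.empty();
    }

    std::string _detectedExtension;
};
```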
Minor but very important convenience feature, especially useful when
dealing with command-line apps. This now works:
magnum-imageconverter a.png a.jpg -c jpegQuality=0.75
The AnyImageConverter gets the jpegQuality option and then
automatically propagates it to the concrete plugin (which is either
JpegImageConverter or StbImageConverter), possibly warning in case the
target plugin doesn't recognize a given option (i.e., doesn't list it in
its default configuration). Previously the user always had to specify a
concrete converter implementation using -C, which was rather annoying
and unintuitive.
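A sketch of what the propagation could look like internally, with hypothetical names and a plain std::map standing in for the plugin configuration: passed options override the concrete plugin's defaults, and any key not present in the defaults triggers a warning.

```cpp
#include <cassert>
#include <map>
#include <string>

/* Hypothetical sketch of option propagation, not Magnum's actual API.
   Options passed on the command line get merged into the concrete
   plugin's default configuration; unknown keys produce a warning. */
std::map<std::string, std::string> propagateOptions(
    const std::map<std::string, std::string>& defaults,
    const std::map<std::string, std::string>& passed,
    std::string& warnings)
{
    std::map<std::string, std::string> result = defaults;
    for(const auto& option: passed) {
        /* Warn if the target plugin doesn't list this key at all */
        if(defaults.find(option.first) == defaults.end())
            warnings += "option " + option.first +
                " not recognized by the target plugin\n";
        result[option.first] = option.second;
    }
    return result;
}
```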
In cases when specular highlights are not desired, this results in a
~30% speedup on Intel and a ~25% speedup on AMD, compared to setting the
specular color to transparent black.
Testing was easy thanks to already having a ground truth image for this
case.
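The reason a dedicated flag can win over a transparent-black specular color is that the flag removes the specular math, in particular the pow(), at shader compile time instead of computing it and multiplying by zero. A rough C++ stand-in for the per-light GLSL term, with a template bool playing the role of the shader #define; names are illustrative.

```cpp
#include <cassert>
#include <cmath>

/* Sketch of a per-light Blinn-Phong-style term. With specular = false
   the branch is dead code the compiler eliminates, so the pow() is
   never evaluated, instead of being computed and scaled by zero. */
template<bool specular>
float lightContribution(float nDotL, float nDotH, float shininess) {
    float value = std::fmax(nDotL, 0.0f); /* diffuse term */
    if(specular)
        value += std::pow(std::fmax(nDotH, 0.0f), shininess);
    return value;
}
```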
After several failed attempts to make UBO performance not suck on Intel
Mesa and Windows drivers, I ended up hiding the dynamic aspect under a
flag. That way it's still possible to get the proper perf in UBO
workflows that don't do light culling, and for workflows where light
culling matters the 2x slowdown might still be better than looping
through several extra lights that don't contribute anything.
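In plain C++ terms, the flag decides between these two shapes of the per-fragment light loop (names are illustrative). With the static bound the compiler can fully unroll; with a uniform-driven bound it can't, which is presumably where the slowdown on the affected drivers comes from.

```cpp
#include <cassert>

/* Compile-time light count, baked into the shader when the dynamic
   flag is off -- the loop below is fully unrollable */
constexpr int LightCount = 5;

float shadeStatic(const float (&contributions)[LightCount]) {
    float sum = 0.0f;
    for(int i = 0; i != LightCount; ++i)
        sum += contributions[i];
    return sum;
}

/* With the flag on, the bound comes from a uniform and is known only
   at runtime, so the driver can't unroll or specialize the loop */
float shadeDynamic(const float* contributions, int count) {
    float sum = 0.0f;
    for(int i = 0; i != count; ++i)
        sum += contributions[i];
    return sum;
}
```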
While (of course) having zero effect on a single-light scenario, with
five lights it saves about 10% in the classic uniform case (on Intel).
Not bad (also, FFS, what is the compiler doing if it's not able to
optimize this?!).
Hm, I wonder why I did it this way; the operation isn't heavy enough to
benefit from the hardware interpolators, and with a lot of lights we'd
hit the maximum output count in the vertex shader.
With classic uniforms this seems to be ~5% faster on both Intel and AMD
cards. With UBOs this is ~15% (!) faster on AMD (I guess because the
constant UBO access overhead is moved to just one stage instead of
both?) but slower on Intel (of course, sigh... I assume due to UBO reads
being slow and so when done for every fragment instead of every vertex it
costs more?). Since this benefits the *real* GPUs while the card that
was already awful just got more awful, I don't think that's a big deal.
This change stays.