We have half-float vectors and matrices, so why not these as well. Not
sure for what all is the angle precision usable, but at the very least
it could be useful for compact meshlet occlusion cone / AABB
representation or rough animations.
This is already done for the AbstractImporter and the new
AbstractShaderConverter, as there's a common use case of checking just
the filename for input/output path or file type detection and then
delegating to the common implementation working directly on data.
Minor but very important convenience feature, especially useful when
dealing with command-line apps. This now works:
magnum-imageconverter a.png a.jpg -c jpegQuality=0.75
The AnyImageConverter gets the jpegQuality option and then
automatically propagates it to the concrete plugin (which is either
JpegImageConverter or StbImageConverter), possibly warning in case the
target plugin doesn't recognize given option (i.e., doesn't list it in
its default configuration). Previously the user had to always specify a
concrete converter implementation using -C, which was rather annoying
and nonintuitive.
In cases when specular highlights are not desired, results in 30%
speedup (on Intel) and ~25% speedup on AMD, compared to setting the
specular color to transparent black.
Testing was easy thanks to already having a ground truth image for this
case.
After several failed attempts to make UBO performance not suck on Intel
Mesa and Windows drivers, I ended up hiding the dynamic aspect under a
flag. That way it's still possible to get the proper perf in UBO
workflows that don't do light culling, and for workflows where light
culling matters the 2x slowdown might be still better than looping
through several extra lights that don't contribute anything.
While (of course) having zero effect on a single-light scenario, with
five lights it saves about 10% in the classic uniform case (on Intel).
Not bad (also, FFS, what the compiler is doing if it's not able to
optimize this?!).
Hm, I wonder why I did it this way, the operation isn't really heavy to
benefit from the hardware interpolators, and with a lot of lights we'd
hit the maximum output count in the vertex shader.
With classic uniforms this seems to be ~5% faster on both Intel and AMD
cards. With UBOs this is ~15% (!) faster on AMD (I guess because the
constant UBO access overhead is moved to just one stage instead of
both?) but slower on Intel (of course, sigh... I assume due to UBO reads
being slow and so when done for every fragment instead of every vertex it
costs more?). Since this benefits the *real* GPUs while the card that
was already awful is more awful I don't think that's a big deal. This
change stays.
"Luckily", thanks to the DRAW_COUNT=1 and MATERIAL_COUNT=1 optimizations
not everything blows up, so i don't need to skip absolutely everything,
unfortunately Phong lights are affected by this insane crapfest as well
so basically nothing from Phong UBO support is tested there. FFS.
Unlike the drawId optimization before, there's no possibility to
check this anywhere, so the assumption is just documented.
On an Intel 630 this resulted in further significant reduction for the
single-draw single-material case, down to 260 from 440 in the previous
commit, about a 45% reduction compared to the original 550 ms; multidraw
case is still around the 550 there.
This is always true in the single-draw case, since setDrawOffset()
asserts on this. In the multi-draw case this optimization doesn't make
sense, because it doesn't make sense to create a multidraw shader with
just one draw.
On an Intel 630 GPU this resulted in single-draw single-material Phong
to go from 550 ms to 440, which is roughly a 20% improvement. For the
simpler shaders the difference is even higher. The multidraw numbers
stayed the same as before, obviously.