In cases when specular highlights are not desired, results in 30%
speedup (on Intel) and ~25% speedup on AMD, compared to setting the
specular color to transparent black.
Testing was easy thanks to already having a ground truth image for this
case.
After several failed attempts to make UBO performance not suck on Intel
Mesa and Windows drivers, I ended up hiding the dynamic aspect under a
flag. That way it's still possible to get the proper perf in UBO
workflows that don't do light culling, and for workflows where light
culling matters the 2x slowdown might be still better than looping
through several extra lights that don't contribute anything.
While (of course) having zero effect on a single-light scenario, with
five lights it saves about 10% in the classic uniform case (on Intel).
Not bad (also, FFS, what the compiler is doing if it's not able to
optimize this?!).
Hm, I wonder why I did it this way, the operation isn't really heavy to
benefit from the hardware interpolators, and with a lot of lights we'd
hit the maximum output count in the vertex shader.
With classic uniforms this seems to be ~5% faster on both Intel and AMD
cards. With UBOs this is ~15% (!) faster on AMD (I guess because the
constant UBO access overhead is moved to just one stage instead of
both?) but slower on Intel (of course, sigh... I assume due to UBO reads
being slow and so when done for every fragment instead of every vertex it
costs more?). Since this benefits the *real* GPUs while the card that
was already awful is more awful I don't think that's a big deal. This
change stays.
Unlike the drawId optimization before, there's no possibility to
check this anywhere, so the assumption is just documented.
On an Intel 630 this resulted in further significant reduction for the
single-draw single-material case, down to 260 from 440 in the previous
commit, about a 45% reduction compared to the original 550 ms; multidraw
case is still around the 550 there.
This is always true in the single-draw case, since setDrawOffset()
asserts on this. In the multi-draw case this optimization doesn't make
sense, because it doesn't make sense to create a multidraw shader with
just one draw.
On an Intel 630 GPU this resulted in single-draw single-material Phong
to go from 550 ms to 440, which is roughly a 20% improvement. For the
simpler shaders the difference is even higher. The multidraw numbers
stayed the same as before, obviously.
I went through renaming this on many places quite some time ago, but
this one slipped through. Now that UBOs will be a thing, rename to
EXPLICIT_BINDING instead of EXPLICIT_UNIFORM_BINDING.
Not that either way would be more correct than the other (this is what
three.js uses I think), but it was documented everywhere to be
1/(1 + d^2)
but the calculation instead did
1/(1 + d)^2
This now also means the analytical test equation works. I should have
paid more attention to it not matching before, because that obviously
pointed to this problem.
Point lights are now significantly brighter than before, the tests were
updated to use a larger distance to avoid issues with overflows. Does
not affect the (default) directional lights in any way.
This is a -- long overdue -- breaking change to the rendering output of
this shader, finally adding support for lights that get darker over
distance. The attenuation equation is basically what's documented in
LightData, and the distinction between directional and point lights is
made using a newly added the fourth component of position (which means
the old three-component setters are all deprecated). This allows the
shader code to be practically branchless, which I find to be nice.
This breaks basically all rendering output so all existing Phong and
MeshTools::compile() test outputs had to be regenerated.
It's needed to support the new material attributes supported by glTF.
The test output is slightly different as the normal coming from
the texture wasn't normalized before.
Interestingly enough, on Phong it was named textureCoords while on the
C++ side it was asking for textureCoordinates, and I don't remember this
being an issue *ever*, on any driver.
Note -- since there are no visual tests for Phong yet, this is done in
the least intrusive manner to avoid breaking current functionality. It's
likely very underperforming due to the matric calculation per fragment,
it'll get optimized once I have proper tests.
Slow and ugly, is here only for making quick'n'dirty alpha masked
drawing without a need for blending or depth sorting. Oh and also to
support the glTF alpha mask feature. Again, beware: *slow*.