Hah, this took a while, as there was no texture scaffolding in place at
all. Thus all this had to be added and tested for the first time:
* 2D textures
* 2D texture arrays
* Texture transformation uniforms
* Texture transformation UBOs
* Instanced texture offset
This also means that MeshVisualizer can be used to visualize arbitrary
(single-channel) integer textures now, not just render meshes with
object ID textures. Yay for feature parity!
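For illustration, a minimal sketch of what this enables, with a live GL
context assumed and the flag/method names taken from the current
MeshVisualizerGL2D API (treat the exact spelling, and whether additional
flags are required, as assumptions of this note):

    GL::Texture2D ids;      /* any single-channel integer texture, e.g. R16UI */
    GL::Texture2D colorMap; /* a color ramp to map the integer values through */

    Shaders::MeshVisualizerGL2D shader{
        Shaders::MeshVisualizerGL2D::Flag::ObjectIdTexture};
    shader
        .setColorMapTransformation(0.0f, 1.0f/255.0f)
        .bindObjectIdTexture(ids)
        .bindColorMapTexture(colorMap);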
In cases when specular highlights are not desired, this results in a ~30%
speedup on Intel and a ~25% speedup on AMD, compared to setting the
specular color to transparent black.
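A quick sketch of the difference, assuming the Flag::NoSpecular name from
current Magnum (the flag name is an assumption of this note; the
transparent-black color is the old workaround mentioned above):

    using namespace Math::Literals;

    /* Old workaround: the specular term is still evaluated per fragment,
       it just ends up multiplied by zero */
    Shaders::PhongGL withWorkaround;
    withWorkaround.setSpecularColor(0x00000000_rgbaf);

    /* New: the specular term is compiled out of the shader entirely */
    Shaders::PhongGL withoutSpecular{Shaders::PhongGL::Flag::NoSpecular};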
Testing was easy thanks to already having a ground truth image for this
case.
After several failed attempts to make UBO performance not suck on Intel
Mesa and Windows drivers, I ended up hiding the dynamic aspect under a
flag. That way it's still possible to get the proper perf in UBO
workflows that don't do light culling, and for workflows where light
culling matters, the 2x slowdown might still be better than looping
through several extra lights that don't contribute anything.
Hm, I wonder why I did it this way. The operation isn't really heavy
enough to benefit from the hardware interpolators, and with a lot of
lights we'd hit the maximum output count in the vertex shader.
With classic uniforms this seems to be ~5% faster on both Intel and AMD
cards. With UBOs this is ~15% (!) faster on AMD (I guess because the
constant UBO access overhead is moved to just one stage instead of
both?) but slower on Intel (of course, sigh... I assume because UBO reads
are slow there, so doing them for every fragment instead of every vertex
costs more?). Since this benefits the *real* GPUs while the card that was
already awful just gets a bit more awful, I don't think that's a big deal.
This change stays.
This is always true in the single-draw case, since setDrawOffset()
asserts on this. In the multi-draw case this optimization isn't done,
because creating a multidraw shader with just one draw doesn't make sense
anyway.
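In other words, roughly the following, with the counts-taking UBO
constructor assumed (its exact signature, and the PhongGL name, are
assumptions of this note):

    /* A single-draw UBO shader: one light, one material, one draw */
    Shaders::PhongGL shader{Shaders::PhongGL::Flag::UniformBuffers, 1, 1, 1};

    shader.setDrawOffset(0); /* the only value that doesn't assert */
    /* setDrawOffset(1) would assert, since the offset has to be less than
       the draw count -- so the shader code can bake in offset 0 directly */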
On an Intel 630 GPU this made single-draw single-material Phong go from
550 ms to 440 ms, which is roughly a 20% improvement. For the
simpler shaders the difference is even higher. The multidraw numbers
stayed the same as before, obviously.
These deliberately share the same binding (because there's very little
space), but the shader wasn't guarding that. Discovered completely by
accident when adding tests for "multidraw with all the things" -- Mesa
gives just a warning, but ANGLE outright fails the shader compilation,
so better to have an assert there.
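The guard is just a constructor-time assert, roughly like below; FlagA and
FlagB are placeholders, since the note doesn't spell out which two features
share the binding:

    /* Inside the shader constructor; FlagA / FlagB are hypothetical names */
    CORRADE_ASSERT(!(flags & Flag::FlagA) || !(flags & Flag::FlagB),
        "Shaders: FlagA and FlagB share the same texture binding and thus can't be used together", );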
This is a -- long overdue -- breaking change to the rendering output of
this shader, finally adding support for lights that get darker over
distance. The attenuation equation is basically what's documented in
LightData, and the distinction between directional and point lights is
made using a newly added fourth component of the position (which means
the old three-component setters are all deprecated). This allows the
shader code to be practically branchless, which I find to be nice.
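Concretely, with the Vector4-taking setLightPositions() (the exact overload
is an assumption here), the fourth component distinguishes the two light
types and the shader can derive the light vector without a branch:

    shader.setLightPositions({
        {1.0f,  1.0f, 1.0f, 0.0f},  /* w = 0: directional, xyz is a direction */
        {3.0f, -2.0f, 0.5f, 1.0f}   /* w = 1: point light, attenuated over distance */
    });
    /* In the shader this boils down to something like
       lightDirection = lightPosition.xyz - transformedPosition*lightPosition.w,
       which is the same expression for both light types */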
This breaks basically all rendering output, so all existing Phong and
MeshTools::compile() test outputs had to be regenerated.
It's needed to support the new material attributes coming from glTF.
The test output is slightly different as the normal coming from
the texture wasn't normalized before.
This was stupid, eh? Blame Mesa and SwiftShader for not exposing
ANGLE_instanced_arrays, so the only way for me to test this was via the
browser, which is practically impossible. Then found this by accident.
Pushing straight to master because YOLO.
Except MeshVisualizer and VertexColor, which don't have any texturing,
so there it's not needed. In most cases the tests are reusing existing
ground truth files and only modifying transformations / flipping images.
Bloaty says it saved about 10 kB in a Debug build of MagnumGL:
    VM SIZE                      FILE SIZE
 --------------               --------------
  [ = ]       0 .debug_info    +1.59Ki  +0.0%
  +0.4% +1.50Ki .text          +1.50Ki  +0.4%
  [ = ]       0 .debug_str        +409  +0.0%
  [ = ]       0 .debug_line       +276  +0.1%
  [ = ]       0 .debug_abbrev      +20  +0.0%
 -28.6%      -2 [LOAD [RX]]         -2 -28.6%
  [ = ]       0 [Unmapped]     -4.28Ki -41.0%
 -22.7% -9.23Ki .rodata        -9.23Ki -22.7%
  -0.8% -7.73Ki TOTAL          -9.73Ki  -0.1%
And 4 kB in Release:
    VM SIZE                      FILE SIZE
 --------------               --------------
  +1.1% +3.44Ki .text          +3.44Ki  +1.1%
  +1.7% +1.39Ki .eh_frame      +1.39Ki  +1.7%
  [ = ]       0 [Unmapped]        +656   +51%
 -25.5% -9.47Ki .rodata        -9.47Ki -25.5%
  -0.7% -4.64Ki TOTAL          -4.00Ki  -0.4%
That's not negative, so I guess that's good. This change is of course
more significant in the context of a minimal WebGL build, where the exe
can be as little as 50 kB -- there 4 kB is almost 10% of the size.
This only generates code that will never be executed. Tested with Flat
and Phong; the other shaders don't have rendering tests yet, but since
the change is the same I assume it will work there as well.
Note -- since there are no visual tests for Phong yet, this is done in
the least intrusive manner to avoid breaking current functionality. It's
likely performing quite poorly due to the per-fragment matrix calculation;
it'll get optimized once I have proper tests.
What's left is *a lot* of places taking monstrous
std::vector<std::reference_wrapper> arguments, which can't be changed to
std::vector<Containers::Reference> in a source-compatible way. Even that
would be only a temporary change, since the goal is to fully avoid a
dependency on the STL in those cases.
The final version of these APIs should take
Containers::ArrayView<Containers::Reference> and be implicitly
convertible from e.g. std::vector<Containers::Reference>. That's
definitely possible, but not in time for 2019.01, so instead of forcing
users to temporarily pass `{vec.begin(), vec.size()}` everywhere instead
of just `vec`, I'm rather keeping these APIs intact for now.
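For the record, a small sketch of that intended final form, with a
hypothetical process() standing in for the affected APIs; the std::vector
conversion comes from Corrade's ArrayViewStl.h compatibility header:

    #include <vector>

    #include <Corrade/Containers/ArrayView.h>
    #include <Corrade/Containers/ArrayViewStl.h> /* std::vector -> ArrayView */
    #include <Corrade/Containers/Reference.h>

    using namespace Corrade;

    /* Hypothetical API taking a view instead of a std::vector */
    void process(Containers::ArrayView<const Containers::Reference<int>> items) {
        for(int& i: items) ++i;
    }

    int main() {
        int a = 1, b = 2;
        std::vector<Containers::Reference<int>> refs{a, b};
        process(refs); /* implicit conversion, no {vec.begin(), vec.size()} needed */
    }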