Fails spectacularly -- the geometry shader is aware only of the instanced
object ID, and not of the vertex ID at all. The former was a corner-case
TODO left over from adding non-instanced object ID visualization in
033e56ec23, which I kinda forgot about; the latter was discovered while
trying to fix the former.
Fix in the next commit.
Hah this took a while, as there was no texture scaffolding in place at
all. Thus all this had to be added and tested for the first time:
* 2D textures
* 2D texture arrays
* Texture transformation uniforms
* Texture transformation UBOs
* Instanced texture offset
This also means that MeshVisualizer can be used to visualize arbitrary
(single-channel) integer textures now, not just render meshes with
object ID textures. Yay for feature parity!
Forgot that gl_VertexID includes the base offset in the multidraw
scenarios, so we need to take all vertices into account, not just the
largest view. The wraparound would cause nasty output differences among
drivers.
Mainly to have feature parity with Flat and Phong -- otherwise switching
to draw a wireframe on an instanced mesh would be too annoying. Also, if
we have multidraw there already, why not instancing as well.
Originally the uniform wasn't present with the assumption that users
could easily adjust color map offset to achieve the same effect. That
was however unnecessarily annoying and error-prone in cases where it's
essential to have the same object IDs from multiple draws have a
matching color, and it was complicating multidraw workflows as the color
map offset was not a part of per-draw data, but rather material data.
Do it always when Flag::TextureArrays is set, not inside the handling of
some particular texture, because that way it wouldn't work once other
textures are added/tested.
Before, the object ID was always enabled and tested, which could leave
some errors undetected. Plus this makes the test more flexible
for further additions.
The proper practice is to have GLES and WebGL requirements separated,
as the two editions diverge more and more and treating one as a subset
of the other no longer works.
Because a MeshView might not be the best thing to have when you are
submitting a batch of a thousand draws. It takes strided array views to
allow for more flexibility, but can also detect if the input is already
contiguous and use it as-is.
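A minimal sketch of the "use it as-is if contiguous" idea, for
illustration only. The StridedView struct and contiguousOrCopy() are
hypothetical stand-ins, not the actual library types or the code in this
commit; the assumption is that a copy into scratch storage is needed only
when the stride differs from the element size:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a one-dimensional strided view over 32-bit
// values; the stride is in bytes
struct StridedView {
    const void* data;
    std::size_t size;
    std::ptrdiff_t stride;
};

// Returns a pointer to tightly packed values. If the input is already
// contiguous, the pointer refers directly to the input and `scratch` is
// left untouched; otherwise the values are gathered into `scratch`.
const std::uint32_t* contiguousOrCopy(const StridedView& view, std::vector<std::uint32_t>& scratch) {
    if(view.stride == std::ptrdiff_t(sizeof(std::uint32_t)))
        return static_cast<const std::uint32_t*>(view.data);

    scratch.resize(view.size);
    const char* in = static_cast<const char*>(view.data);
    for(std::size_t i = 0; i != view.size; ++i)
        std::memcpy(&scratch[i], in + std::ptrdiff_t(i)*view.stride, sizeof(std::uint32_t));
    return scratch.data();
}
```

With per-draw data interleaved in a struct the stride is larger than the
element size and the copy path is taken; a plain array of counts goes
through without any allocation.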
UNFORTUNATELY the GL 1.0 legacy still continues to stink and so there
has to be a 64-bit-specific overload which is the *actual* variant that
doesn't allocate because glMultiDrawElements takes a `void**` for INDEX
OFFSETS and it's IN BYTES! Which foolish soul designed such a thing back
in the 1860s, I wonder. There's no reason to not have an index offset
in elements because all indices have to have the same type ANYWAY. And
yes, I wasted about three hours debugging driver crashes because I
THOUGHT this parameter takes offset in elements, not bytes.
Also note: on 32-bit platforms this depends on latest Corrade with the
CORRADE_TARGET_32BIT definition. Spent an embarrassing amount of time
wondering why all local builds but Emscripten work.
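For reference, a hedged sketch of the byte-offset conversion that
glMultiDrawElements forces on the caller. indexOffsetsInBytes() is a
hypothetical helper, not the overload added here, and unlike the
64-bit-specific variant described above it does allocate; it only shows
that each element offset has to be multiplied by the index type size and
then smuggled through a pointer:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts per-draw index offsets given in elements
// to the "pointers" glMultiDrawElements() expects in its `indices`
// argument, which are really byte offsets into the bound index buffer.
std::vector<const void*> indexOffsetsInBytes(const std::vector<std::uint32_t>& offsetsInElements, std::size_t indexTypeSize) {
    std::vector<const void*> out;
    out.reserve(offsetsInElements.size());
    for(std::uint32_t offset: offsetsInElements)
        out.push_back(reinterpret_cast<const void*>(std::uintptr_t(offset)*indexTypeSize));
    return out;
}
```

With 16-bit indices, element offsets {0, 3, 6} become byte offsets
{0, 6, 12}. Passing the element offsets directly would make the second
and third draw read from the wrong place in the index buffer.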
In cases when specular highlights are not desired, this results in a 30%
speedup on Intel and a ~25% speedup on AMD, compared to setting the
specular color to transparent black.
Testing was easy thanks to already having a ground truth image for this
case.
After several failed attempts to make UBO performance not suck on Intel
Mesa and Windows drivers, I ended up hiding the dynamic aspect under a
flag. That way it's still possible to get the proper perf in UBO
workflows that don't do light culling, and for workflows where light
culling matters the 2x slowdown might still be better than looping
through several extra lights that don't contribute anything.
While (of course) having zero effect on a single-light scenario, with
five lights it saves about 10% in the classic uniform case (on Intel).
Not bad (also, FFS, what is the compiler doing if it's not able to
optimize this?!).
Hm, I wonder why I did it this way, the operation isn't really heavy
enough to benefit from the hardware interpolators, and with a lot of
lights we'd hit the maximum output count in the vertex shader.
With classic uniforms this seems to be ~5% faster on both Intel and AMD
cards. With UBOs this is ~15% (!) faster on AMD (I guess because the
constant UBO access overhead is moved to just one stage instead of
both?) but slower on Intel (of course, sigh... I assume due to UBO reads
being slow and so when done for every fragment instead of every vertex it
costs more?). Since this benefits the *real* GPUs while the card that
was already awful is more awful I don't think that's a big deal. This
change stays.
"Luckily", thanks to the DRAW_COUNT=1 and MATERIAL_COUNT=1 optimizations
not everything blows up, so I don't need to skip absolutely everything.
Unfortunately Phong lights are affected by this insane crapfest as well,
so basically nothing from Phong UBO support is tested there. FFS.