So far this was only possible by creating a temporary MeshView, while
everything else (index/vertex count, base vertex, base instance, ...)
was changeable directly on the Mesh.
And use static functions with an explicit "self" pointer instead. Those
have half the size (8 vs 16 bytes on 64bit x86), which in turn reduces
the state tracker memory use by about 750 bytes. On desktop GL with an
Intel GPU & Mesa this reduces the state tracker allocation size by almot
10%, from 8.3 kB to 7.6 kB. Not bad.
Apart from small memory savings, this also removes the need to include
the full class definiton from the State headers on MSVC (because
on that compiler the member function pointer size is different based on
whether the type definition is known or not, IMAGINE THAT BEING A
FEATURE AND NOT A BUG), leading to less header dependencies and better
incremental compile times there.
This was already done in some cases (and the Vk library used this from
the beginning), and as I'm about to add some more extension-dependent
functionality it felt like a good time to finish that change, finally.
In some cases the *Implementation() could even be dropped in favor of
pointing to the GL API directly (such as is already done for various
glUniform*() calls), that'd be another step -- this is good enough for
now.
Given that I made a breaking change by returning Containers::String
instead of a std::string, I can take it further and replace also
std::pair with Containers::Pair -- it won't bring any more pain to the
users, they have to change their code anyway.
Co-authored-by: Hugo Amiard <hugo.amiard@wonderlandengine.com>
Same as in the previous commit, most cases are inputs so a StringStl.h
compatibility include will do, the only breaking change is
GL::Shader::sources() which now returns a StringIterable instead of a
std::vector<std::string> (ew).
Awesome about this whole thing is that The Shader API now allows
creating a shader from sources coming either from string view literals
or Utility::Resource completely without having to allocate any strings
internally, because all those can be just non-owning references wrapped
with String::nullTerminatedGlobalView(). The only parts which aren't
references are the #line markers, but (especially on 64bit) those can
easily fit into the 22-byte (or 10-byte on 32bit) SSO storage.
Also, various Shader constructors and assignment operators had to be
deinlined in order to avoid having to include the String header, which
would be needed for Array destruction during a move.
Co-authored-by: Hugo Amiard <hugo.amiard@wonderlandengine.com>
A bunch of new GLES- and WebGL-specific draw() variants got added in
b30d313ecd over a year ago, but so far
they weren't exposed in any of the Shaders because it'd mean a lot of
nasty copypasting and I just didn't like that.
So instead, there's a new macro to handle this messy work, and also
tested in order to ensure everything is still as it should be and that
it works even outside the Magnum namespace. This makes it much easier to
add new draw() variants (such as indirect draws, *finally*), without
having to update every shader implementation under the sun.
One difference is that I'm now allowing drawTransformFeedback() always
-- because that makes sense. It *is* possible to use a regular shader to
draw a result of a XFB, so it doesn't make sense to attempt to block
that.
The change to Shaders will be done in the next commit.
It's four pointers, twice as much as what would be acceptable. Not sure
why this happened, maybe because all those cases used an ArrayView
before and so I just changed the type without considering the difference
in its size?
Unfortunately this change also means a bump in the plugin interface
string, thus all scene converter plugins have to be updated as well.
Consistently with checkLink(), this avoids having to explicitly include
both Iterable and Reference in shader code. Alsod allowing people to
have direct arrays of shaders, runtime-sized lists of shaders etc.
A compat include is provided on a deprecated build to avoid breaking
existing code.
There's no reason for those to exist anymore -- origiinally they were
added in a hopeful attempt to make use of parallel shader compilation,
but in practice that meant compiling at most two or three shaders at
once and still stalling until that was done, so not that great at all.
The new APIs provide much better opportunities for parallelism.
Fun fact:
CORRADE_INTERNAL_ASSERT_OUTPUT(vert.compile() && frag.compile());
is actually one character shorter than
CORRADE_INTERNAL_ASSERT_OUTPUT(GL::Shader::compile({vert, frag}));
so not even typing convenience would be a reason to keep these.
The new async APIs were just checking the link status, and printing the
linker error. Because drivers commonly do all that in a single step,
without really separating compilation from linking (or at least that's
what I thought?), I assumed the linker error would contain *also* the
compilation error, if any.
But on a quick check with Mesa that's not the case, I only get "error:
linking with uncompiled/unspecialized shader", which is very useless.
Which means, to get proper error output, the checkLink() function now
explicitly takes a list of the input shaders. It will unconditionally go
through them at the beginning and call checkCompile() on each.
To further encourage the shaders to be passed, there's no default
argument -- so if the application calls checkCompile() on its own for
some reason, it has to pass an empty list to checkLink().
All the tests were updated to explicitly check that non-null-terminated
strings get handled properly (the GL label APIs have an explicit size,
so it *should*, but just in case). Also, because various subclasses
override the setter to return correct type for method chaining and the
override has to be deinlined to avoid relying on a StringView include,
the tests are now explicitly done for each leaf class, instead of the
subclass
The <string> being removed from the base class for all GL objects may
affect downstream projects which relied on it being included. In case of
Magnum, the breakages were already fixed in the previous commit.
Compile time improvement for the MagnumGL library alone is 0.2 second or
4% (6.1 seconds before, 5.9 after). Not bad, given that there's three
more files to compile and strings are still heavily used in other GL
debug output APIs and all shader stuff. For a build of just the GL
library and all tests, it goes down from 28.9 seconds to 28.1. Most
tests also still rely quite heavily on std::stringstream for debug
output testing, so the numbers still could go further.
Yes, this is only ever provided by an ANGLE extension, and as you might
have guessed, The Great Google Overlords didn't bother caring to have
this supported outside of their browser.
Since the main point of the low-level API is to make use of the
EXT_multi_draw_arrays / ANGLE_multi_draw / WEBGL_multi_draw extensions
for a reduced driver overhead *and* the builtin gl_DrawID variable, it
just doesn't make sense to provide a fallback to draw() in a loop. When
the extensions are not available, the user has to do extra work in order
to supply proper UBO draw offset for each draw call, and thus the
fallback path has basically no practical use.
On the other hand, draw() taking the MeshView instances is kept as-is --
this one is rather old and so breaking compatibility would be an
unnecessary pain point. Plus, since it has to allocate the contiguous
arrays internally, it's not desirable for any fast-path rendering
anyway.
Because a MeshView might not be the best thing to have when you are
submitting a batch of thousand draws. It takes a strided array views to
allow for more flexibility, but can also detect if the input is already
contiguous and use it as-is.
UNFORTUNATELY the GL 1.0 legacy still continues to stink and so there
has to be a 64-bit-specific overload which is the *actual* variant that
doesn't allocate because glMultiDrawElements takes a `void**` for INDEX
OFFSETS and it's IN BYTES! Which foolish soul designed such a thing back
in the 1860s, I wonder. There's no reason to not have an index offset
in elements because all indices have to have the same type ANYWAY. And
yes, I wasted about three hours debugging driver crashes because I
THOUGHT this parameter takes offset in elements, not bytes.
Also note: on 32-bit platforms this depends on latest Corrade with the
CORRADE_TARGET_32BIT definition. Spent an embarrassing amount of time
wondering why all local builds but Emscripten work.
Instead of wrapping glProgramUniform*() inside a member function, we now
call that function directly, which allows us to remove 2/3rds of
AbstractShaderProgram members. The non-DSA code path is still
implemented though member functions, though they are now static,
mimic the signature of the DSA APIs and do a use() + glUniform*()
internally.
This was done silently until now and I think such platform-specific code
should be always exposed as a disableable workaround. Moreover, I need a
similar thing for ANGLE, so this comes handy.
As usual, the old APIs are still present, but marked as deprecated.
Existing code is not updated yet to ensure I didn't break anything with
this.
This way it's much more intuitive and makes the code shorter and nicer
in many cases. Shaders are now also able to hide irrelevant
draw/dispatch APIs to avoid accidents.