While branching on a compiler is rather common, checking a particular
compiler version should be needed only rarely. Thus minimize use of such
macros to make them easier to grep for.
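For illustration, a hypothetical example of the kind of check to keep rare
(and greppable), as opposed to plain compiler branching:

    /* Branching on a compiler: common and fine */
    #if defined(__GNUC__) && !defined(__clang__)
    /* GCC-specific path */
    #endif

    /* Checking a particular compiler version: should stay the exception,
       and a grep for the version macro finds all such places */
    #if defined(__GNUC__) && __GNUC__ < 10
    /* workaround for a bug fixed in GCC 10 */
    #endif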
Using a tuple was very useful, helpful and exciting as I had to make an
explicit comment about what element is what, and then had to remember the
order in all places that accessed the tuple.
Using a struct, however, is rather boring: the fields are just named there,
I don't need any complex std::get<4>() or extra comments explaining what is
where, and it's just not so adventurous anymore. The
build time of a non-deprecated MagnumGL library also dropped from 5.9
seconds to 5.85. SAD!
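For illustration, with made-up element names (not the actual ones from this
change), the difference boils down to roughly this:

    #include <cstddef>
    #include <tuple>

    /* Before: a tuple, where every access needs a comment or a good memory
       to know which index means what */
    using QueryTuple = std::tuple<unsigned, int, bool, bool, std::size_t>;
    std::size_t countFromTuple(const QueryTuple& info) {
        return std::get<4>(info); /* element count? byte size? who knows */
    }

    /* After: a struct, where the fields are, boringly, just named */
    struct Query {
        unsigned id;
        int location;
        bool normalized;
        bool instanced;
        std::size_t count;
    };
    std::size_t countFromStruct(const Query& info) {
        return info.count;
    }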
All the tests were updated to explicitly check that non-null-terminated
strings get handled properly (the GL label APIs have an explicit size, so
they *should* be, but just in case). Also, because various subclasses
override the setter to return the correct type for method chaining and the
override has to be deinlined to avoid relying on a StringView include, the
tests are now explicitly done for each leaf class instead of just the base
class.
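A rough sketch of the pattern in question, heavily simplified and not the
actual Magnum class hierarchy or signatures:

    /* The header needs only a forward declaration of StringView, which the
       Corrade forward-declaration header provides */
    #include <Corrade/Containers/Containers.h>

    class AbstractObject {
        public:
            AbstractObject& setLabel(Corrade::Containers::StringView label);
    };

    class Texture: public AbstractObject {
        public:
            /* Shadows the base setter only to return the leaf type for
               method chaining. Defined out of line (in the *.cpp), so the
               full StringView definition isn't needed here. */
            Texture& setLabel(Corrade::Containers::StringView label);
    };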
The <string> being removed from the base class of all GL objects may
affect downstream projects that relied on it being included. In the case of
Magnum itself, the breakages were already fixed in the previous commit.
Compile time improvement for the MagnumGL library alone is 0.2 seconds or
4% (6.1 seconds before, 5.9 after). Not bad, given that there are three
more files to compile and strings are still heavily used in other GL
debug output APIs and all the shader stuff. For a build of just the GL
library and all tests, it goes down from 28.9 seconds to 28.1. Most tests
also still rely quite heavily on std::stringstream for debug output
testing, so the numbers could still go down further.
This used to somehow get dragged along with <string> on all platforms we
tested on, but once <string> is gone we'd have to include <algorithm>
instead. Not acceptable; besides, it's much shorter and less error-prone
this way.
It's quite easy if WEBGL_debug_renderer_info is present, but if it isn't
then our only hope is to check for ANGLE-specific extensions. Which may
cause some false positives, but since we need to enable various
workarounds for ANGLE to avoid issues, it's better than making the
detection too conservative.
The original ANGLE detection based on line size is probably not so
relevant anymore, but let's keep it there.
Was browsing the extension registry looking for something else and found
this instead. It used to be ES catching up with desktop, now it's the
other way around.
Because its usage is rather specific, there wasn't any extension-dependent
dispatch happening and thus the extension was never marked as used. But
that was confusing since it actually is used, so add an explicit branch
just to list the extension.
It was used just for certain corner cases that weren't already covered by
{EXT,OES}_draw_elements_base_vertex, but because of a misunderstood
extension interaction the EXT/OES multidraw entrypoint (which isn't
implemented on ANGLE) was used from these as well.
And while trying to disable the two extensions for the MeshGLTest I
discovered the test half-expects the ANGLE variant to be implemented, and
so I finished it for both the single-draw and multi-draw cases.
The extension interaction that makes base-vertex multidraw calls broken
on ANGLE will be fixed in the following commit.
Yes, this is only ever provided by an ANGLE extension, and as you might
have guessed, The Great Google Overlords didn't bother to have this
supported outside of their browser.
Since the main point of the low-level API is to make use of the
EXT_multi_draw_arrays / ANGLE_multi_draw / WEBGL_multi_draw extensions
for a reduced driver overhead *and* the builtin gl_DrawID variable, it
just doesn't make sense to provide a fallback to draw() in a loop. When
the extensions are not available, the user has to do extra work in order
to supply a proper UBO draw offset for each draw call, and thus the
fallback path has basically no practical use.
On the other hand, draw() taking the MeshView instances is kept as-is --
this one is rather old and so breaking compatibility would be an
unnecessary pain point. Plus, since it has to allocate the contiguous
arrays internally, it's not desirable for any fast-path rendering
anyway.
Unlike most other extensions, this one has to be explicitly enabled in
Emscripten in order to be used, which is thus done as part of the other
"driver workarounds" performed on startup. To avoid that, the extension can
be explicitly disabled, and thanks to the previous commit the disabling
happens before any attempt to enable the extension.
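For context, a sketch of what the explicit enabling amounts to on the
Emscripten side; where and how exactly Magnum does it may differ:

    #include <emscripten/html5.h>

    /* WEBGL_multi_draw isn't picked up implicitly, so it has to be
       requested on the currently active WebGL context by hand */
    void enableMultiDraw() {
        const EMSCRIPTEN_WEBGL_CONTEXT_HANDLE context =
            emscripten_webgl_get_current_context();
        if(!emscripten_webgl_enable_extension(context, "WEBGL_multi_draw")) {
            /* not supported by the browser, record as unavailable */
        }
    }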
Because a MeshView might not be the best thing to have when you are
submitting a batch of a thousand draws. It takes strided array views to
allow for more flexibility, but it can also detect when the input is
already contiguous and use it as-is.
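A sketch of the contiguity detection mentioned above, using Corrade's
strided views with a made-up helper, not the actual draw API signature:

    #include <cstddef>
    #include <Corrade/Containers/Array.h>
    #include <Corrade/Containers/StridedArrayView.h>

    using namespace Corrade;

    const int* contiguousCounts(
        Containers::StridedArrayView1D<const int> counts,
        Containers::Array<int>& storage)
    {
        /* If the view is already contiguous, use it directly, no copy */
        if(counts.isContiguous())
            return counts.asContiguous().data();

        /* Otherwise a temporary contiguous copy has to be made first */
        storage = Containers::Array<int>{NoInit, counts.size()};
        for(std::size_t i = 0; i != counts.size(); ++i)
            storage[i] = counts[i];
        return storage.data();
    }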
UNFORTUNATELY the GL 1.0 legacy still continues to stink and so there
has to be a 64-bit-specific overload which is the *actual* variant that
doesn't allocate because glMultiDrawElements takes a `void**` for INDEX
OFFSETS and it's IN BYTES! Which foolish soul designed such a thing back
in the 1860s, I wonder. There's no reason to not have an index offset
in elements because all indices have to have the same type ANYWAY. And
yes, I wasted about three hours debugging driver crashes because I
THOUGHT this parameter takes offset in elements, not bytes.
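To make the footgun concrete, a sketch of the kind of conversion an
element-offset-taking variant has to do before calling into GL; this
assumes a GL header / loader that declares glMultiDrawElements() and the
usual GL types:

    #include <cstddef>
    #include <vector>

    void multiDrawSketch(const std::vector<GLsizei>& counts,
                         const std::vector<GLsizei>& offsetsInElements)
    {
        /* glMultiDrawElements() wants an array of *byte* offsets disguised
           as pointers, so element offsets have to be converted, i.e. a
           temporary array allocated. With 16-bit indices one element is
           two bytes. */
        std::vector<const void*> offsetsInBytes(counts.size());
        for(std::size_t i = 0; i != counts.size(); ++i)
            offsetsInBytes[i] = reinterpret_cast<const void*>(
                std::size_t(offsetsInElements[i])*sizeof(unsigned short));

        glMultiDrawElements(GL_TRIANGLES, counts.data(), GL_UNSIGNED_SHORT,
            offsetsInBytes.data(), GLsizei(counts.size()));
    }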
Also note: on 32-bit platforms this depends on the latest Corrade with the
CORRADE_TARGET_32BIT definition. Spent an embarrassing amount of time
wondering why all local builds worked but the Emscripten one didn't.
The 32-bit float depth will be needed for the upcoming OpenEXR plugin; also
added are the remaining ones that will eventually be supported by the KTX
and DDS plugins.
There are no SSO / DSA extensions on that platform, so it doesn't make
sense to do this extra indirection. Saves another kilobyte (237 -> 236
kB) in WebGL 2 magnum-gl-info.wasm.
Instead of wrapping glProgramUniform*() inside a member function, we now
call that function directly, which allows us to remove two thirds of the
AbstractShaderProgram members. The non-DSA code path is still implemented
through member functions, but those are now static, mimic the signature of
the DSA APIs and do a use() + glUniform*() internally.
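A rough sketch of the shape described above, with hypothetical names and
without the actual function-pointer dispatch; assumes a GL header that
declares glUseProgram(), glUniform1fv() and the usual GL types:

    /* With DSA / separate shader objects available, the dispatch can point
       straight at glProgramUniform1fv(), no wrapper needed. Without it, a
       static function mimicking that signature binds the program and calls
       the plain glUniform*(): */
    static void uniform1fvImplementationDefault(GLuint program,
        GLint location, GLsizei count, const GLfloat* values)
    {
        glUseProgram(program); /* the real use() would also cache this */
        glUniform1fv(location, count, values);
    }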
This means that instead of 12 separate allocations we have just one,
allocating everything together in a contiguous piece of memory. That should
also be a bit more cache-friendly when accessing the state, as it's not
scattered around memory like crazy.
Because there are no Pointer indirections needed anymore, the State
members are just references now. That resulted in a lot of sweeping
changes around the whole GL library, but they're all trivial, changing
`->` to `.`, mostly.
There are two more nested allocations in the TextureState struct; those
will be taken care of in a separate commit.
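A sketch of the general idea with made-up names, not the actual GL state
internals:

    #include <memory>

    /* All substates live in a single struct, i.e. one heap allocation... */
    struct BufferState { /* ... */ };
    struct MeshState { /* ... */ };
    struct State {
        BufferState buffer;
        MeshState mesh;
        /* ...and the remaining substates, all by value */
    };

    struct ContextSketch {
        explicit ContextSketch(): _state{std::make_unique<State>()},
            /* ...and the accessors are plain references instead of Pointer
               indirections, so code says bufferState.foo instead of
               bufferState->foo */
            bufferState{_state->buffer},
            meshState{_state->mesh} {}

        std::unique_ptr<State> _state;
        BufferState& bufferState;
        MeshState& meshState;
    };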
Broken since 0ceb54ed7d, but the two code
paths actually differ only by an enum name so it didn't cause any
crashes. (I wonder why I need two different code paths at all.)
Disabling engine startup log or modifying enabled extensions /
workarounds from the application side was one of the common pain
points and this should *finally* solve the problem. This Configuration
is now inherited by the usual Platform::*Application::GLConfiguration /
Platform::Windowless*Application::Configuration classes people are used
to, so for the end user it's just as if these classes got a bunch of new
options.
Having this, I also extended the ContextGLTest to verify that the
Configuration and command-line options do what's expected because that
had no automated tests until now. The test is mostly a copy of what I
did for Vulkan already, nothing special. Additionally all
Platform*ApplicationTest executables gained a new --quiet option to
verify that the GL::Context::Configuration subset gets correctly passed
from the Application code, because that's something we can't really
verify in an automated way.
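From the user side the result could look roughly like the following sketch;
the exact flag name is an assumption here and the actual
GL::Context::Configuration option may be named differently:

    #include <Magnum/Platform/Sdl2Application.h>

    using namespace Magnum;

    class MyApplication: public Platform::Application {
        public:
            explicit MyApplication(const Arguments& arguments):
                Platform::Application{arguments, Configuration{},
                    GLConfiguration{}
                        /* hypothetical flag for suppressing the engine
                           startup log */
                        .addFlags(GLConfiguration::Flag::QuietLog)} {}

            void drawEvent() override {}
    };

    MAGNUM_APPLICATION_MAIN(MyApplication)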
StringViews are great -- given that they are trivially destructible, the
compiler can now tell me that I have unused variables. If the type weren't
trivially destructible (like std::string), the destructor would be treated
as if it has side effects and the compiler wouldn't complain.
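Concretely, a minimal illustration of the behavior described above:

    #include <string>
    #include <Corrade/Containers/StringView.h>

    void example() {
        /* Trivially destructible and provably unused, -Wunused-variable
           complains */
        Corrade::Containers::StringView view = "hello";

        /* Not trivially destructible, the destructor counts as a side
           effect, so the compiler stays silent even though this is just
           as unused */
        std::string string = "hello";
    }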
Ew. At first I tried to just port this to a growable Array of StringViews
(which would already save quite a lot), but then I realized I have a clear
upper bound on the extension count and so I can use a "counting sort"
without having to deduplicate anything afterwards.
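The gist of the trick, with hypothetical names: every extension has a known
index below a fixed upper bound, so a plain flag array replaces sorting and
deduplication:

    #include <cstddef>

    /* assumed upper bound on the number of known extensions */
    constexpr std::size_t ExtensionCount = 256;

    struct UsedExtensions {
        bool used[ExtensionCount]{};

        /* marking is O(1) and duplicates collapse for free */
        void markUsed(std::size_t extensionIndex) {
            used[extensionIndex] = true;
        }

        /* iterating in index order then gives a sorted, deduplicated list
           without any post-processing */
        template<class F> void forEachUsed(F&& f) const {
            for(std::size_t i = 0; i != ExtensionCount; ++i)
                if(used[i]) f(i);
        }
    };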
After the previous (rather minimal) reduction by the Context cleanup,
this reduces the size of magnum-gl-info.wasm from 245 to 237 kB. Quite
significant, I'd say!
These are in most cases the only strings that get used, and I don't think
having to call std::strlen() on each of them is a good idea when we don't
need to.
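For the record, the way to avoid the runtime strlen() for literals in
Corrade is the _s suffix, which bakes the size in at compile time -- a
minimal sketch:

    #include <Corrade/Containers/StringView.h>

    using namespace Corrade::Containers::Literals;

    /* size known at compile time, no std::strlen() at runtime, and the
       view is marked as global so it doesn't need to be copied around */
    const Corrade::Containers::StringView a = "GL_KHR_debug"_s;

    /* as opposed to this, which has to call std::strlen() when converting
       the plain const char* to a sized view */
    const Corrade::Containers::StringView b = "GL_KHR_debug";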
We're going to eventually include this class in all Application classes
(need that in order to inherit a to-be-created Configuration class) and
the <string> and <vector> includes would be just too much. This change
caused magnum-gl-info.wasm (WebGL 2 build) to go down from 247 to 245 kB.
Not much, but I guess that's because there are still a lot of other vectors
of strings elsewhere.
There are a lot more places to clean up; I'll do those in separate
commits. This change is the most atomic one I could do, and it introduces a
breaking change to all APIs that returned a std::vector or a std::string.
Fortunately (or so I hope) those weren't used that much, so it shouldn't
cause build breakages for that many people.
Quite a lot of the optimization ideas are borrowed from the new Vk
library -- such as "interning" the driver workaround strings to avoid
allocating copies of them.
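A sketch of what such interning could look like, with made-up workaround
names, not the actual Magnum list:

    #include <Corrade/Containers/StringView.h>

    using namespace Corrade::Containers::Literals;

    /* one global list of compile-time literals... */
    const Corrade::Containers::StringView KnownWorkarounds[]{
        "some-driver-workaround"_s,
        "another-driver-workaround"_s
    };

    /* ...and "interning" a name means finding it there and keeping the
       canonical view instead of allocating a copy */
    Corrade::Containers::StringView intern(
        Corrade::Containers::StringView name)
    {
        for(Corrade::Containers::StringView known: KnownWorkarounds)
            if(known == name) return known;
        return {}; /* unknown workaround */
    }

    /* after interning, comparing the data pointer is enough */
    bool sameWorkaround(Corrade::Containers::StringView a,
                        Corrade::Containers::StringView b)
    {
        return a.data() == b.data();
    }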