They haven't been parsed since 6b22a11170
(2020), so there's no point in keeping those workarounds. They're only
kept in utility application sources, as those are parsed for the docs,
and in tweakable implementations where it's easier to just copy-paste
the whole ifdef expression from the header every time instead of
modifying it to not include DOXYGEN_GENERATING_OUTPUT.
Vertex buffer offsets are like this already (and I already had a use
case with a mesh larger than 4 GB), with index buffers I so far thought
it wasn't needed, but it makes sense to do that as well -- there can be
a giant index buffer shared by many meshes, and even though the total
drawn element count won't reach 1 billion (or even 1 million), the byte
offset into that buffer can still overflow 32 bits. Since the value was
internally already stored as pointer-sized and some (but not all) code
was treating it as pointer-sized, this change just makes sense.
This also fixes "warning C4244: 'return': conversion from 'const
GLintptr' to 'Magnum::Int', possible loss of data" on MSVC, although in
a very different way.
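A hedged sketch of what this enables -- with the offset parameter being
a pointer-sized GLintptr, an index range sitting past the 4 GB boundary
of a huge shared buffer can now be referenced directly (the offset value
below is made up):

    #include <Magnum/GL/Buffer.h>
    #include <Magnum/GL/Mesh.h>

    using namespace Magnum;

    /* Assumes an active GL context and an index buffer larger than 4 GB */
    void setup(GL::Mesh& mesh, GL::Buffer& indices) {
        /* The second argument is the offset in bytes; being a GLintptr
           it can now exceed the 32-bit range */
        mesh.setIndexBuffer(indices, 5000000000ll,
            GL::MeshIndexType::UnsignedInt);
    }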
Whoops, this got silently omitted during the massive refactor in
f7a6d79aa0 (Nov 23). Buffers *do* get
destroyed, VAOs not. If you got sudden GPU memory usage issues after a
recent Magnum update, this was why.
This "type erased std::vector member" was done in the times before
growable arrays were a thing, and kind of made sense to go the extra way
to avoid a <vector> include in the header. Except that it made rather
unportable assumptions about std::vector size, which weren't correct for
example with _GLIBXX_ASSERTIONS set.
But what was *completely* unacceptable was that the vector was of one or
another type depending on the GL feature set present in the current
context. Apart from adding a lot of extra *nasty* logic to construction,
moves and destruction, this approach led to the mesh instance asking the
current context on destruction in order to know whether a destructor
should be called on std::vector<Buffer> or std::vector<AttributeLayout>.
Ugh.
Now it's a regular Array member (which isn't *that* heavy as to need
such type-erased treatment, although it eventually could be), and thanks
to the AttributeLayout packing improvements in previous commits it's no
longer prohibitively wasteful to abuse AttributeLayout instances for
storing just the owning Buffer instances alone -- doing so now wastes
only 16 bytes per buffer, compared to 36 before. Given there's usually
just one or two vertex buffers per mesh (compared to attributes, of
which there are usually 4 or more), it should be fine.
The MeshGLTest::destructMovedOutInstance() test added a few commits back
also no longer asserts when no GL context is present.
Saves 8 bytes (104 -> 96 bytes), which were previously wasted on 4 bytes
of padding before the 8-byte _indexBufferOffset member and 4 bytes after
_indexBuffer, the latter being new there due to the Buffer being 8
instead of 12 bytes now.
On ES2 there are three 32-bit members fewer, which means this change
only moved the 4-byte hole after _indexBuffer to after _indexType, i.e.
88 bytes before and 88 bytes now as well.
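A simplified standalone sketch of what such reordering does, with
made-up members -- on typical 64-bit ABIs the compiler pads each member
to its alignment, so sorting members by alignment avoids the holes:

    struct Before {
        int indexType;               /* 4 bytes + 4 bytes of padding */
        long long indexBufferOffset; /* 8 bytes */
        int count;                   /* 4 bytes + 4 bytes tail padding */
    };                               /* = 24 bytes */

    struct After {
        long long indexBufferOffset; /* 8 bytes */
        int indexType;               /* 4 bytes */
        int count;                   /* 4 bytes, no padding needed */
    };                               /* = 16 bytes */

    static_assert(sizeof(Before) == 24 && sizeof(After) == 16,
        "assumes a typical 64-bit ABI");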
Partially needed to avoid build breakages because Corrade itself
switched as well, partially because a cleanup is always good. Done
except for (STL-heavy) code that's deprecated or SceneGraph-related APIs
that are still quite full of STL as well.
This reverts commit 6bb0179c65 from 2018,
which in turn reverted commit f6ba4111e1,
which in turn reverted commit 4ce2875262
from 2015. The related Emscripten PR was merged in 2018, so it's safe to
assume everything works as expected nowadays.
Which also means I can finally delete my Emscripten fork that contained
the original branch that attempted to add glDrawRangeElements() in May
2015, before WebGL 2 was even supported in Emscripten, or Firefox.
So far this was only possible by creating a temporary MeshView, while
everything else (index/vertex count, base vertex, base instance, ...)
was changeable directly on the Mesh.
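Assuming the new setter follows the naming of the rest, usage would look
roughly like this (a hypothetical sketch, the exact name may differ):

    /* Previously this needed a temporary GL::MeshView; now directly,
       with the usual method chaining */
    mesh.setCount(60)
        .setIndexOffset(100);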
And use static functions with an explicit "self" pointer instead. Those
have half the size (8 vs 16 bytes on 64-bit x86), which in turn reduces
the state tracker memory use by about 750 bytes. On desktop GL with an
Intel GPU & Mesa this reduces the state tracker allocation size by
almost 10%, from 8.3 kB to 7.6 kB. Not bad.
Apart from the small memory savings, this also removes the need to
include the full class definition from the State headers on MSVC
(because on that compiler the member function pointer size differs based
on whether the type definition is known or not, IMAGINE THAT BEING A
FEATURE AND NOT A BUG), leading to fewer header dependencies and better
incremental compile times there.
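A minimal standalone sketch of the pattern (names are made up, sizes as
on 64-bit x86 with the Itanium ABI):

    #include <cstdio>

    struct Mesh {
        static void bindImplementationDefault(Mesh&) {
            std::printf("binding buffers one by one\n");
        }
        static void bindImplementationVAO(Mesh&) {
            std::printf("binding a VAO\n");
        }
    };

    struct MeshState {
        /* Before: void(Mesh::*bindImplementation)(), which is 16 bytes,
           and on MSVC of a size that even depends on whether the full
           definition of Mesh is visible. After: a plain 8-byte function
           pointer with an explicit "self". */
        void(*bindImplementation)(Mesh&);
    };

    int main() {
        /* Picked once, based on what the context supports */
        MeshState state{Mesh::bindImplementationVAO};

        Mesh mesh;
        state.bindImplementation(mesh); /* instead of (mesh.*ptr)() */
    }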
This was already done in some cases (and the Vk library used this from
the beginning), and as I'm about to add some more extension-dependent
functionality it felt like a good time to finish that change, finally.
In some cases the *Implementation() could even be dropped in favor of
pointing to the GL API directly (such as is already done for various
glUniform*() calls), that'd be another step -- this is good enough for
now.
All the tests were updated to explicitly check that non-null-terminated
strings get handled properly (the GL label APIs take an explicit size,
so it *should* work, but just in case). Also, because various subclasses
override the setter to return the correct type for method chaining, and
the override has to be deinlined to avoid relying on a StringView
include, the tests are now done explicitly for each leaf class instead
of just once in the base class.
The <string> being removed from the base class for all GL objects may
affect downstream projects which relied on it being included. In case of
Magnum, the breakages were already fixed in the previous commit.
Compile time improvement for the MagnumGL library alone is 0.2 seconds
or 4% (6.1 seconds before, 5.9 after). Not bad, given that there are
three more files to compile and strings are still heavily used in other
GL debug output APIs and all the shader code. For a build of just the GL
library and all tests, it goes down from 28.9 seconds to 28.1. Most
tests also still rely quite heavily on std::stringstream for debug
output testing, so the numbers could go down further still.
The type is now extended to 32 bits. In the GL and Vk libraries it means
one can now do things like
    MeshIndexType type = meshIndexTypeWrap(GL_UNSIGNED_BYTE);
and passing that to GL::Mesh or Vk::Mesh will cause it to use the value
directly, instead of doing a mapping from a generic type. The *real* use
case for this is however to allow custom index buffer representations in
Trade::MeshData. Support for that will be hooked up in the following
commit.
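On the consuming side the value can then be detected and used as-is; a
hedged sketch, assuming this mirrors the existing pixelFormatWrap() /
pixelFormatUnwrap() scheme:

    #include <Magnum/Mesh.h>
    #include <Magnum/GL/Mesh.h>

    using namespace Magnum;

    /* Roughly what a consumer like GL::Mesh does internally: if the
       type is a wrapped GL-specific value, unwrap it and use directly,
       otherwise translate the generic enum */
    GLenum resolve(MeshIndexType type) {
        return isMeshIndexTypeImplementationSpecific(type) ?
            meshIndexTypeUnwrap<GLenum>(type) :
            GLenum(GL::meshIndexType(type));
    }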
It was used just for certain corner cases that weren't covered by
{EXT,OES}_draw_elements_base_vertex already, but because of a wrongly
understood extension interaction the EXT/OES multidraw entrypoint (which
isn't implemented on ANGLE) was used from these as well.
And while trying to disable the two extensions for the MeshGLTest I
discovered the test half-expects the ANGLE variant to be implemented and
so I finished it for both the single-draw and multi-draw case.
The extension interaction that makes base-vertex multidraw calls broken
on ANGLE will be fixed in the following commit.
Yes, this is only ever provided by an ANGLE extension, and as you might
have guessed, The Great Google Overlords didn't care to have it
supported outside of their browser.
Since the main point of the low-level API is to make use of the
EXT_multi_draw_arrays / ANGLE_multi_draw / WEBGL_multi_draw extensions
for a reduced driver overhead *and* the builtin gl_DrawID variable, it
just doesn't make sense to provide a fallback to draw() in a loop. When
the extensions are not available, the user has to do extra work in order
to supply proper UBO draw offset for each draw call, and thus the
fallback path has basically no practical use.
On the other hand, draw() taking the MeshView instances is kept as-is --
this one is rather old and so breaking compatibility would be an
unnecessary pain point. Plus, since it has to allocate the contiguous
arrays internally, it's not desirable for any fast-path rendering
anyway.
Because a MeshView might not be the best thing to have when you are
submitting a batch of a thousand draws. It takes strided array views to
allow for more flexibility, but can also detect if the input is already
contiguous and use it as-is.
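A hedged sketch of how a batch submission might look with the
strided-view overload (counts and offsets are made up, assuming 16-bit
indices and a shader/mesh set up elsewhere):

    #include <Magnum/GL/AbstractShaderProgram.h>
    #include <Magnum/GL/Mesh.h>

    using namespace Magnum;

    void submit(GL::AbstractShaderProgram& shader, GL::Mesh& mesh) {
        /* Per-draw counts, vertex offsets and index offsets *in bytes*,
           laid out contiguously so no internal copy is needed */
        const UnsignedInt counts[]{120, 36, 48};
        const UnsignedInt vertexOffsets[]{0, 0, 0};
        const UnsignedInt indexOffsets[]{0, 240, 312};
        shader.draw(mesh, counts, vertexOffsets, indexOffsets);
    }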
UNFORTUNATELY the GL 1.0 legacy still continues to stink and so there
has to be a 64-bit-specific overload which is the *actual* variant that
doesn't allocate because glMultiDrawElements takes a `void**` for INDEX
OFFSETS and it's IN BYTES! Which foolish soul designed such a thing back
in the 1860s, I wonder. There's no reason to not have an index offset
in elements because all indices have to have the same type ANYWAY. And
yes, I wasted about three hours debugging driver crashes because I
THOUGHT this parameter takes offset in elements, not bytes.
Also note: on 32-bit platforms this depends on latest Corrade with the
CORRADE_TARGET_32BIT definition. Spent an embarrassing amount of time
wondering why all local builds worked but the Emscripten one didn't.
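That's also why only the 64-bit variant can avoid the allocation -- an
array of 64-bit byte offsets has exactly the layout glMultiDrawElements()
wants for its void** parameter, so it can be reinterpreted in place. A
standalone sketch of the trick:

    #include <cstdint>

    int main() {
        /* Byte offsets of each draw's first index into the index
           buffer (here for 16-bit indices: draws of 3 and 6 indices) */
        std::uint64_t indexOffsetsInBytes[]{0, 6, 18};

        /* On a 64-bit platform the layout matches an array of
           pointers, so no temporary is needed; on 32-bit a copy has to
           be made, hence the CORRADE_TARGET_32BIT distinction */
        static_assert(sizeof(void*) == sizeof(std::uint64_t),
            "the in-place reinterpret only works on 64-bit platforms");
        const void* const* indices =
            reinterpret_cast<const void* const*>(indexOffsetsInBytes);
        /* `indices` would then be the 4th argument to
           glMultiDrawElements(mode, counts, type, indices, drawCount) */
        (void)indices;
    }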
This means that instead of 12 separate allocations there's now just one,
with everything living together in a contiguous piece of memory. That
should also be a bit more cache-friendly when accessing the state, as
it's not scattered around memory like crazy.
Because there are no Pointer indirections needed anymore, the State
members are just references now. That resulted in a lot of sweeping
changes around the whole GL library, but they're all trivial -- mostly
changing `->` to `.`.
There's two more nested allocations in the TextureState struct, will
take care of them in a separate commit.
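A simplified standalone sketch of the layout change (struct names are
made up, the real State has many more members):

    /* Before, each sub-state was behind a Containers::Pointer, i.e. a
       separate heap allocation. Now everything lives in one block: */
    struct BufferState { unsigned boundBuffer{}; };
    struct MeshState { unsigned boundVAO{}; };

    struct State {
        BufferState bufferState;
        MeshState meshState;
    };

    int main() {
        State* state = new State; /* one allocation for everything */

        /* Accessors hand out references, so call sites changed from
           state->bufferState->boundBuffer to .boundBuffer */
        BufferState& buffers = state->bufferState;
        buffers.boundBuffer = 1;

        delete state;
    }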
As usual, the old APIs are still present, but marked as deprecated.
Existing code is not updated yet to ensure I didn't break anything with
this.
This way it's much more intuitive and makes the code shorter and nicer
in many cases. Shaders are now also able to hide irrelevant
draw/dispatch APIs to avoid accidents.
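In other words, the call direction got flipped -- a hedged before/after,
with a mesh and a shader instance assumed:

    mesh.draw(shader);   /* before, now deprecated */
    shader.draw(mesh);   /* after */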
Better for catching accidents, as picking a wrong primitive / index type
can lead to *serious* rendering issues. Similar to a change done to
(Compressed)PixelFormat in 2019.10.
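A standalone illustration of what the strong typing catches (simplified
names and values, not the actual Magnum definitions):

    /* With a plain enum (or a GLenum typedef), a raw constant would
       convert implicitly; with an enum class it's a compile error */
    enum class MeshPrimitive: unsigned {
        Points = 0x0000,
        Triangles = 0x0004
    };

    void setPrimitive(MeshPrimitive) {}

    int main() {
        setPrimitive(MeshPrimitive::Triangles); /* OK */
        /* setPrimitive(0x0004) or setPrimitive(GL_TRIANGLES) no longer
           compiles, an explicit cast is required */
    }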
Good thing I checked this -- based on WebGL I was under the assumption
that all GPUs have it at just 256, but that was really just a WebGL
limitation. Also, ugh, why wasn't this query there since the 90s?
Bloaty says it saved 10 kB in a Debug build of MagnumGL:
        VM SIZE                      FILE SIZE
     --------------               --------------
      [ = ]       0 .debug_info      +1.59Ki  +0.0%
      +0.4% +1.50Ki .text            +1.50Ki  +0.4%
      [ = ]       0 .debug_str          +409  +0.0%
      [ = ]       0 .debug_line         +276  +0.1%
      [ = ]       0 .debug_abbrev        +20  +0.0%
     -28.6%      -2 [LOAD [RX]]           -2 -28.6%
      [ = ]       0 [Unmapped]       -4.28Ki -41.0%
     -22.7% -9.23Ki .rodata          -9.23Ki -22.7%
      -0.8% -7.73Ki TOTAL            -9.73Ki  -0.1%
And 4 kB in Release:
        VM SIZE                      FILE SIZE
     --------------               --------------
      +1.1% +3.44Ki .text            +3.44Ki  +1.1%
      +1.7% +1.39Ki .eh_frame        +1.39Ki  +1.7%
      [ = ]       0 [Unmapped]          +656   +51%
     -25.5% -9.47Ki .rodata          -9.47Ki -25.5%
      -0.7% -4.64Ki TOTAL            -4.00Ki  -0.4%
That's a net negative, so I guess that's good. This change is of course
more significant in the context of a minimal WebGL build, where the exe
can be as little as 50 kB -- there 4 kB is almost 10% of the size.
Deprecated in 2018.04, so it's been almost a year since. Whoever is
using Magnum regularly has updated already, and whoever hasn't can
always upgrade gradually (2018.02, 2018.04, 2018.10, 2019.01 etc.).