Similarly as done in Aug 2024 in Corrade. When these were a part of the
function signature, they ended up being encoded into the exported
symbol. There are still cases of StridedArrayView slice() having
enable_if in the signature, which amounts to about 18 kB symbols in all
libMagnum*-d.so libraries, but apart from that this is the state before:
$ strings libMagnum*-d.so | grep enable_if | grep -v slice | wc -c
29591
And this is after. All of those are coming from STL, thus from
old or deprecated APIs that still use std::vector, std::tuple and such,
and from the few std::sort() uses.
$ strings libMagnum*-d.so | grep enable_if | grep -v slice | wc -c
4103
In a non-deprecated build it's just this, which is a 10x reduction.
Can't really do much about these maybe except for implementing my own
swap() specializations (sigh?), but I think it's fine.
$ strings libMagnum*-d.so | grep enable_if | grep -v slice | wc -c
2904
I also made it consistently use
typename std::enable_if<..., int>::type = 0
instead of
class = typename std::enable_if<...>::type
because the former also works correctly in the presence of overloads and
having it used consistently everywhere makes it easier to grep & change
later. All SFINAE is now also excluded from Doxygen output, because it
doesn't make much sense there. It's better to just explain the
restriction in words than with this nasty hack.
An ad-hoc solution was already done in DebugTools::screenshot(), now I
need it in another place. While not as fast as the O(1) mapping from
the generic format to the API-specific ones due to the potentially
linear lookup, it definitely could be useful in general.
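A rough sketch of the asymmetry, assuming the lookup in question goes from
an API-specific format back to the generic one. The enums, their values and
both function names are made up for illustration and are not the Magnum
ones.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins with arbitrary values, not real Magnum or GL enums
enum class GenericFormat: std::uint32_t { R8 = 1, RG8, RGBA8 };
enum class ApiFormat: std::uint32_t { R8 = 100, RG8 = 200, RGBA8 = 300 };

// Table indexed by the generic value (minus one, as the enum starts at 1)
constexpr ApiFormat ApiFormats[]{ApiFormat::R8, ApiFormat::RG8, ApiFormat::RGBA8};
constexpr std::size_t ApiFormatCount = sizeof(ApiFormats)/sizeof(ApiFormats[0]);

// Generic to API-specific: a single array access, O(1)
ApiFormat apiFormat(GenericFormat format) {
    return ApiFormats[std::uint32_t(format) - 1];
}

// API-specific to generic: the values are sparse, so this sketch scans the
// table linearly. Returns false if there's no generic equivalent.
bool genericFormat(ApiFormat format, GenericFormat& out) {
    for(std::size_t i = 0; i != ApiFormatCount; ++i) {
        if(ApiFormats[i] == format) {
            out = GenericFormat(i + 1);
            return true;
        }
    }
    return false;
}
```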
Among other things this makes it possible to use Utility::copy() instead
of a manual loop and grow arrays with std::realloc() instead of always
new'ing-copy-deleting because Containers::Pair is trivially copyable.
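A minimal sketch of what trivial copyability buys here. The Pair struct
and the grow() helper below are hypothetical stand-ins, not the actual
Containers::Pair or any Corrade growth API.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for Containers::Pair
template<class F, class S> struct Pair {
    F first;
    S second;
};

static_assert(std::is_trivially_copyable<Pair<int, float>>::value,
    "a pair of trivially copyable types should be trivially copyable too");

// Grows (or initially allocates, if data is null) a buffer to newCapacity
// elements. std::realloc() moves raw bytes without calling any
// constructors, which is only valid for trivially copyable types. On
// failure it returns null and the original buffer stays allocated.
template<class T> T* grow(T* data, std::size_t newCapacity) {
    static_assert(std::is_trivially_copyable<T>::value,
        "std::realloc() can't be used for types with nontrivial copy");
    return static_cast<T*>(std::realloc(data, newCapacity*sizeof(T)));
}
```

Memory obtained this way has to be released with std::free(), not delete[].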
The 32-bit float depth will be needed for the upcoming OpenEXR plugin;
also added the remaining ones that will eventually be supported by the KTX
and DDS plugins.
Together with:
* CommandBuffer::draw()
* Support for indexed and non-indexed meshes
* Support for setting primitive and stride dynamically
I took one shortcut and vkCmdBindVertexBuffers() is currently called
once for each binding. The interface is ready for this, but I'm not yet
100% sure how to test that it actually does batch the buffers, so it's
left at the lazy implementation for now.
I named it RasterizationPipelineCreateInfo and not
GraphicsPipelineCreateInfo because there's now a
RayTracingPipelineCreateInfo as well, which is *also* graphics, and it
would be confusing for everyone except people already drowned in Vulkan
naming quirks.
Similar to PixelFormat, to filter out values that make no sense as a
vertex format (such as sRGB) and add others (such as doubles). And
documenting which are guaranteed to be supported and which not. The
hasVkFormat(Magnum::VertexFormat) and vkFormat(Magnum::VertexFormat)
were also deprecated in favor of the new hasVertexFormat() /
vertexFormat() APIs.
Since depth/stencil images can't be linear, I needed buffer/image copies
to test those, and conversely to test buffer/image copies I needed image
clears.
A pretty big chunk of work, and it led to a discovery of a SwiftShader
bug, which I will work around next. First Vulkan driver workaround, so
the whole scaffolding needs to get added as well.
I was slowly getting cancer from having to write the unreadably awful
VK_FORMAT_R666G666B666A666_SRGB all the time. Besides that:
- All pixel formats are documented to show what's guaranteed for them
by the spec. Pretty useful I'd say.
- The old hasVkFormat() and vkFormat() converters operating on a
VkFormat are deprecated in favor of new hasPixelFormat() and
pixelFormat() that use the PixelFormat enum. Similarly as done in the
GL wrapper.
- All APIs that took a VkFormat before take a PixelFormat now, together
with having convenience overloads for Magnum::PixelFormat and
Magnum::CompressedPixelFormat. Again similarly as done in the GL
wrapper, also the first step on being able to *directly* use data
imported with the Trade library with Vulkan.
This was way more pain than initially expected, especially in regards to
preserving externally-specified pNext chains without writing to them in
any way.
This is what I needed BigEnumSet for -- good thing I didn't even try to
have 128-bit enums because I'm now at 110 values and it's still far from
complete. Next step is enabling those features when creating a device,
which should hopefully be a lot less code, reusing most of what was
here.
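For context, the idea of an enum set that isn't limited to the width of a
single integer can be sketched as below. This is a simplified, hypothetical
BigSet and not the actual BigEnumSet interface; the Feature enum and its
values are made up as well.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical enum whose values are bit indices, going past 64 and 128
enum class Feature: std::uint8_t { A = 0, B = 70, C = 130 };

// Stores Words*64 bits in an array of 64-bit integers, so the number of
// representable enum values is not tied to any builtin integer type
template<class E, std::size_t Words> class BigSet {
    public:
        BigSet& add(E value) {
            _data[std::size_t(value)/64] |= std::uint64_t{1} << (std::size_t(value) % 64);
            return *this;
        }

        bool contains(E value) const {
            return (_data[std::size_t(value)/64] &
                (std::uint64_t{1} << (std::size_t(value) % 64))) != 0;
        }

    private:
        std::uint64_t _data[Words]{};
};
```

Growing past the current limit then means bumping Words instead of looking
for a wider integer type.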
Quite a big chunk of work, further expanded due to how
VK_KHR_create_renderpass2 is designed -- basically, due to the
tightly-packed nested structures that got replaced with their "version
2", we can no longer just extract the previous structure for backwards
compatibility, but instead have to deep-copy everything to a newly
allocated memory.
Thanks to the new ArrayTuple structure and a few design iterations I
managed to kick the backwards-compatibility code into just a single
allocation, while still keeping it possible for the "version 2" code
path to be fully allocation-free (if one passes a completely filled
VkRenderPassCreateInfo2 structure there).
Today I spent six hours wrongly convincing myself that it's a driver bug
when vkGetPhysicalDeviceProperties2() is null on a 1.1 instance for a
1.0 physical device. It's not a bug, it's me not reading specs
carefully.
This commit thus basically moves all Instance-level extension-dependent
state to DeviceProperties, because it's actually device-dependent. Which
makes the DeviceProperties class quite heavy and thus it's good it was
readied to be transferred all the way to a Device instance a few commits
back -- I don't really want to do all the dispatch, string processing,
sorting and other mess more times than strictly necessary.
In addition, DeviceProperties::apiVersion() got renamed to version() and
a new isVersionSupported() API got added, mirroring what's on Device
itself; plus thanks to the chicken-and-egg problem of having to call
vkGetPhysicalDeviceProperties() twice, the device version and other
things can now be retrieved in a slightly more efficient way.
You won't believe it, but it took me over a month of sitting on the
shitter until this design idea materialized out of [..] air. The whole
story, in order:
- Vulkan doesn't allow one VkDeviceMemory to be mapped more than once.
  This is rather sad, because since Vulkan best practices suggest
  allocating a large block and suballocating from that, the engine needs
an extra layer that "emulates" mapping the suballocations for the
users but behind the scenes it inevitably has to map the whole
VkDeviceMemory anyway and keep it mapped for as long as any of the
sub-mappings is active.
- Because if it would map just a certain suballocation and then the
user would want to map another suballocation, it would have to
discard the original mapping and create a new one spanning both
suballocations and that has a risk of suddenly being in a different
VM block, making all pointers to the previous mapping invalid.
- The Vulkan Memory Allocator implements this approach of mapping the
whole thing and because of all the bookkeeping it doesn't give a
direct access to the underlying VkDeviceMemory, making it rather
hard to integrate.
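The "map the whole thing and keep it mapped" bookkeeping from the points
above could look roughly like this. It's a sketch under heavy assumptions:
the Block class is hypothetical and uses plain host memory in place of a
VkDeviceMemory, so it runs without Vulkan. The comments mark where
vkMapMemory() and vkUnmapMemory() would be called.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one large VkDeviceMemory block
class Block {
    public:
        explicit Block(std::size_t size): _storage(size) {}

        // "Maps" a suballocation at given offset. Only the first active
        // mapping maps the memory, and it maps the whole block, so later
        // mappings never move pointers that were handed out earlier.
        char* map(std::size_t offset) {
            if(_mapCount++ == 0)
                _base = _storage.data(); // vkMapMemory() of the whole range
            return _base + offset;
        }

        // The block is unmapped only after the last sub-mapping is gone
        void unmap() {
            if(--_mapCount == 0)
                _base = nullptr; // vkUnmapMemory()
        }

        bool isMapped() const { return _base != nullptr; }

    private:
        std::vector<char> _storage;
        char* _base = nullptr;
        std::size_t _mapCount = 0;
};
```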
Here I realized that:
- Most allocations won't need to be mapped ever, so the hiding and
obfuscation done by VMA isn't needed for those --- and we want
interoperability with 3rd party code, so preventing access to
  VkDeviceMemory is out of the question.
- There's KHR_dedicated_allocation, which (probably?) wasn't around
when VMA was originally designed. The extension was created because
a dedicated allocation actually *does* make sense in certain
cases and on certain architectures. Providing a way to make those
thus shouldn't be something "temporary, until a real allocator
exists" but rather a well-designed API that's there to stay.
- Except for iGPUs, the usual way to populate a GPU buffer would be to
first copy the data to some host-accessible scratch buffer and then
do a GPU-side copy of that buffer to a device-local memory. The
scratch buffer is very likely to have a vastly different
suballocation scheme than GPU buffers (grow & discard everything
once it's all uploaded, for example) so again trying to put the two
under the same allocator umbrella doesn't make sense.
Thus:
- To avoid implementing a full-blown allocator right from the start,
we'll first provide convenience APIs only for dedicated allocations
-- making it possible to transfer memory ownership to an
Image/Buffer so it can be treated the same way as in GL, and later
having the Image/Buffer constructor implicitly allocate a dedicated
VkDeviceMemory.
- This default allocation will be subsequently equipped with
KHR_dedicated_allocation bits.
- Thanks to the extensible/layered nature of the design, the user is
still capable of being completely in control of allocations,
managing VkDeviceMemory sub-allocations by hand.
Finally, once allocator APIs are figured out, the default Buffer/Image
behavior gets switched from a dedicated allocation to using an
allocator, and dedicated allocation will be only used if the
KHR_dedicated_allocation bit is requested.
Memory type flags are put into a new, separate Memory.h header as those
will be needed more often than the (ever-growing) DeviceProperties --
from Image and Buffer constructors, in particular.