Since depth/stencil images can't be linear, I needed buffer/image copies
to test those, and conversely to test buffer/image copies I needed image
clears.
A pretty big chunk of work, and it led to a discovery of a SwiftShader
bug, which I will work around next. First Vulkan driver workaround, so
the whole scaffolding needs to get added as well.
Quite a big chunk of work, further expanded due to how
VK_KHR_create_renderpass2 is designed -- basically, due to the
tightly-packed nested structures that got replaced with their "version
2", we can no longer just extract the previous structure for backwards
compatibility, but instead have to deep-copy everything to a newly
allocated memory.
Thanks to the the new ArrayTuple structure and a few design iterations I
managed to kick the backwards-compatiblity code into just a single
allocation, while still keeping it possible for the "version 2" code
path to be fully allocation-free (if one passes a completely filled
VkRenderPassCreateInfo2 structure there).
You won't believe it, but it took me over a month of sitting on the
shitter until this design idea materialized out of [..] air. The whole
story, in order:
- Vulkan doesn't allow one VkDeviceMemory to be mapped more than once.
This is rather sad, because since Vulkan best practices suggest to
allocate a large block and suballocate from that, the engine needs
an extra layer that "emulates" mapping the suballocations for the
users but behind the scenes it inevitably has to map the whole
VkDeviceMemory anyway and keep it mapped for as long as any of the
sub-mappings is active.
- Because if it would map just a certain suballocation and then the
user would want to map another suballocation, it would have to
discard the original mapping and create a new one spanning both
suballocations and that has a risk of suddenly being in a different
VM block, making all pointers to the previous mapping invalid.
- The Vulkan Memory Allocator implements this approach of mapping the
whole thing and because of all the bookkeeping it doesn't give a
direct access to the underlying VkDeviceMemory, making it rather
hard to integrate.
Here I realized that:
- Most allocations won't need to be mapped ever, so the hiding and
obfuscation done by VMA isn't needed for those --- and we want
interoperability with 3rd party code, so preventing access to
VkDeviceMemory is out of question.
- There's KHR_dedicated_allocation, which (probably?) wasn't around
when VMA was originally designed. The extension was created because
a dedicated allocation actually *does* make sense in certain
cases and on certain architectures. Providing a way to make those
thus shouldn't be something "temporary, until a real allocator
exists" but rather a well-designed API that's there to stay.
- Except for iGPUs, the usual way to populate a GPU buffer would be to
first copy the data to some host-accessible scratch buffer and then
do a GPU-side copy of that buffer to a device-local memory. The
scratch buffer is very likely to have a vastly different
suballocation scheme than GPU buffers (grow & discard everything
once it's all uploaded, for example) so again trying to put the two
under the same allocator umbrella doesn't make sense.
Thus:
- To avoid implementing a full-blown allocator right from the start,
we'll first provide convenience APIs only for dedicated allocations
-- making it possible to transfer memory ownership to an
Image/Buffer so it can be treated the same way as in GL, and later
having the Image/Buffer constructor implicitly allocate a dedicated
VkDeviceMemory.
- This default allocation will be subsequently equipped with
KHR_dedicated_allocation bits.
- Thanks to the extensible/layered nature of the design, the user is
still capable of being completely in control of allocations,
managing VkDeviceMemory sub-allocations by hand.
Finally, once allocator APIs are figured out, the default Buffer/Image
behavior gets switched from a dedicated allocation to using an
allocator, and dedicated allocation will be only used if the
KHR_dedicated_allocation bit is requested.
On desktop this saves about 50 kB in symbols. Was done for Vulkan
already, this follows that (two years later). I need this in order to
solve the problem of static globals being unique across shared libs, and
it sounded better to export just one symbol instead of 689.
Just added them to the list, nothing integrated or implemented yet. Also
added some more stuff into OpenGL mapping table, as I apparently forgot
some entries.
In OpenGL ES 2.0 there is EXT_draw_buffers, which I overlooked somehow,
so I added it to extension list and included in the implementation. It
combines NV_draw_buffers and NV_fbo_color_attachments, so the
implementation now selects one of the two based on which extension is
supported, preferring the EXT one. Updated the documentation to be
less confusing, fixed extension links. Also the single-output
mapForDraw() is not handled separately on ES anymore and just calls
DrawBuffers implementation with single parameter, resulting in less
generated code.
EXT_draw_buffers can also be called on default framebuffer and
apparently in ES there is no way to map front framebuffer for drawing,
so I removed it from the DefaultFramebuffer::DrawAttachment enum.
Using GLES extension functions in the code won't cause linker failures
anymore, but their addresses aren't being loaded yet. As said before,
NaCl and Emscripten is still a special case.