The test passes now. The fix also made the benchmark run significantly
faster -- ~200 µs instead of ~670. Are clears really taking up more time
than all those texture fetches? Strange.
Just cannot use gl_FragCoord in here. That's it, that's the fix. What's
COMPLETELY unexpected, however, is that this simple change made the
process significantly faster on my Intel GPU, from ~815 µs to ~670! I
can't even pretend I understand what's going on here, but maybe doing
less math in the fragment shader when calculating the texture
coordinates (and thus possibly the driver having a better idea how to
prefetch or schedule?) is what made this faster? Or maybe it's due to
one fewer uniform input, and two interpolated values instead of four on
the way from the vertex shader?
The original implementation wrongly assumed that the input and output
pixel centers align, which would only be the case if the ratio of the
input and output sizes was odd. In practice it isn't -- usually it's a
1024x1024 texture scaled down to 128x128 or something like that.
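To make the mismatch concrete, here's a standalone sketch with the
numbers from above, assuming the usual convention of pixel i having its
center at i + 0.5:

    #include <cstdio>

    int main() {
        /* 1024x1024 input scaled down to 128x128, i.e. a ratio of 8 */
        const float ratio = 1024.0f/128.0f;
        for(int i = 0; i != 3; ++i) {
            /* Center of output pixel i in input pixel coordinates */
            const float inputCenter = (float(i) + 0.5f)*ratio;
            /* Prints 4.0, 12.0, 20.0 -- all landing on input pixel
               *boundaries* (integer positions), not centers
               (half-integer positions). Only an odd ratio, say 9, would
               map to 4.5, 13.5, ... -- actual input pixel centers. */
            std::printf("output pixel %d -> input %.1f\n", i, inputCenter);
        }
    }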
The flipped test cases added in the previous commit now pass.
According to the benchmark, the new code is very slightly slower (~815
µs vs ~805 before). The new code isn't really more complex than the old
one, it just does slightly different work -- there are new corner cases
in the initial logic for marking a pixel as inside or outside; on the
other hand, some corner cases that had to be handled in the previous
version are no longer a thing.
If it's not, it's a programmer error (i.e., don't use Luminance or
packed formats, won't work), and since there's no way for the API to
report a failure in a programmatic way, this was causing hard-to-track
errors.
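Meaning, roughly the following -- a hedged sketch where the
isFramebufferReadable() helper and the message are hypothetical, only
the assert-instead-of-silent-GL-error idea is what the change does:

    /* Fires loudly in debug builds instead of letting GL fail later in
       some hard-to-track way; the real condition rejects Luminance and
       packed formats */
    CORRADE_ASSERT(isFramebufferReadable(format),
        "expected a framebuffer-readable format, got" << format, );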
And the thing that changed in SwiftShader 4.1 was the added
EXT_texture_rg support, which made the test work. So no, this was not a
SwiftShader bug at all.
Use a different overload to not have to create a temp framebuffer every
time, also move the error checking outside the loop and increase the
iteration count to actually have a result that isn't wildly different
every time.
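Roughly this shape inside the test case -- a sketch with a hypothetical
doMeasuredThing() standing in for the actual call, not the real test
code:

    CORRADE_BENCHMARK(100) {
        doMeasuredThing(); /* only the measured work inside the loop */
    }

    /* Checked once afterwards, not once per iteration */
    MAGNUM_VERIFY_NO_GL_ERROR();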
No reason not to, even though the move is destructive. Also unblocks
Text::DistanceFieldGlyphCache which wasn't movable due to this and one
other problem.
It's too annoying otherwise, especially if padding is involved. Turns
out that if padding is zero, these don't really contribute to the layout
in any way.
The overhead of maintaining two classes with only very slight
differences in the API and the internals being basically identical is
not worth it. Too much potential for inconsistencies and doc errors.
Additionally, when I attempted to use it for the reworked Text glyph
cache, I realized I'd need to wrap them both under a common interface,
allowing easy use for both 2D and 2D array textures. And then it's
easier to just have the Atlas class done that way directly instead of
papering over that in a downstream API.
Instead of piling up a mountain if the other end is a ditch. Results in
better packing in most cases, but in one case it doesn't, so also adding
an option to disable this.
The docs image is now slightly more leveled, one pixel lower.
It takes too long in debug mode, causing timeouts on the Windows CI due
to older MSVC having truly amazing, virtually unmatched debug build
performance.
Just the dumbest possible idea I had, and it compares surprisingly well
in both efficiency (~comparable to stb_rect_pack) and speed
(significantly faster than stb_rect_pack with tons of tiny images,
slower with larger ones -- would probably need to SIMD Math::max() and
such, haha). It's the very first implementation, without any of the
additional improvements I have in mind, so it'll likely improve further.
Includes a benchmark with a bunch of "datasets" extracted from both
fonts and large glTF models. The stb_rect_pack file isn't committed as
it's not useful apart from this single benchmark -- put it into
AtlasTestFiles/ and recompile.
I mean, yes, it was already SIGNIFICANTLY better than the atlas() that
took a vector and returned one as well, but still. One of the usual use
cases is that there's an array of structs containing both sizes and
offsets, and the offsets get written back.
The original versions are deprecated as I really don't see any
convenient use case for these, especially given they return a pair and
not just an array.
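In code, that use case could look like the following -- a sketch
assuming the new overload taking the sizes and offsets as strided views,
with a hypothetical app-side Glyph struct:

    #include <Corrade/Containers/StridedArrayView.h>
    #include <Magnum/Magnum.h>
    #include <Magnum/Math/Vector3.h>
    #include <Magnum/TextureTools/Atlas.h>

    using namespace Magnum;

    struct Glyph {
        Vector2i size;   /* filled by the app */
        Vector3i offset; /* written back by the atlas call, layer in Z */
    };

    Int pack(Containers::ArrayView<Glyph> glyphs) {
        /* The offsets land directly in the structs, no temporary vector
           and no copy loop afterwards */
        return TextureTools::atlasArrayPowerOfTwo({2048, 2048},
            Containers::stridedArrayView(glyphs).slice(&Glyph::size),
            Containers::stridedArrayView(glyphs).slice(&Glyph::offset));
    }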
So one can directly read it back on GLES without having to wrap the
texture in a framebuffer again.
This change also puts the framebuffer completeness check *before* the
clear() and bind(), which means an incomplete framebuffer no longer
emits a GL error. The failure is still silent though, which isn't nice.
Gotta fix that eventually as well.
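In other words, roughly this ordering -- a sketch of the idea rather
than the actual code:

    /* Completeness checked first, so the subsequent calls don't operate
       on an incomplete framebuffer and emit a GL error */
    if(framebuffer.checkStatus(GL::FramebufferTarget::Draw) !=
       GL::Framebuffer::Status::Complete)
        return; /* still silent, unfortunately */
    framebuffer.clear(GL::FramebufferClear::Color);
    framebuffer.bind();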
Together with the parent commit, it's now possible to do THE
UNTHINKABLE:
    Containers::Array<Trade::ImageData2D> inputImages = ...;

    auto out = TextureTools::atlasArrayPowerOfTwo({2048, 2048},
        stridedArrayView(inputImages).slice(&Trade::ImageData2D::size));
An actual use case, in fact. Because why not.
Similar to the change done in Corrade, see the commit for details:
878624ac36
Wow, this is probably the most backwards-compatibility code I've ever
written. Can't wait until I can drop all that.
It limits the support to CMake 3.12+, but it's much less verbose and I
don't expect people to use ancient CMake versions with IDEs like Xcode
or VS anyway, so this should be fine.