It's faster that way because it doesn't involve a linear lookup: if the
resource is already imported, it's a constant-time check and the call
becomes a no-op.
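A minimal sketch of the pattern, with the group and resource names being
illustrative rather than the actual ones:

    #include <Corrade/Utility/Resource.h>

    static void importShaders() {
        /* hasGroup() is a constant-time lookup, so if the resource is
           already imported (e.g. on a non-static build, or if this ran
           before), the whole function is practically a no-op */
        if(Corrade::Utility::Resource::hasGroup("shaders")) return;

        CORRADE_RESOURCE_INITIALIZE(ShaderResources)
    }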
It just binds a layer of it to a framebuffer internally, so no shader
changes or extra construction flags are needed. Originally I thought
about making the input an array as well, but ultimately that just
doesn't make sense -- the processing would need to be done slice by
slice anyway, and you don't want to allocate the whole excessively sized
texture just for it to be used once, and even then only a part of it at
a time.
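Roughly the following, as a sketch with illustrative names -- the output
array needs nothing special, a single slice of it just becomes a temporary
render target:

    #include <Magnum/GL/Framebuffer.h>
    #include <Magnum/GL/TextureArray.h>
    #include <Magnum/Math/Range.h>

    using namespace Magnum;

    void processSlice(GL::Texture2DArray& output, const Vector2i& size, Int layer) {
        /* Attach just one layer of the array texture; no shader changes
           or extra construction flags needed on the texture itself */
        GL::Framebuffer framebuffer{{{}, size}};
        framebuffer.attachTextureLayer(
            GL::Framebuffer::ColorAttachment{0}, output, 0 /* level */, layer);
        /* ... render the processed slice into it ... */
    }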
Along with the bits in the Text library, this is one of the last things
that still assume OpenGL is present by default.
As usual, the old name is kept as a deprecated typedef and the old
header is deprecated as well.
The test passes now. This made the benchmark run significantly faster --
~200 µs instead of ~670. Are clears really taking up more time than all
those texture fetches? Strange.
Just cannot use gl_FragCoord in here. That's it, that's the fix. What's
COMPLETELY unexpected, however, is that this simple change made the
process significantly faster on my Intel GPU, from ~815 µs to ~670! I
can't even pretend I understand what's going on here, but maybe doing
less math in the fragment shader when calculating the texture
coordinates (and thus possibly the driver having a better idea how to
prefetch or schedule?) is what made this faster? Or maybe it's due to
one less uniform input, and two interpolated values instead of four on
the way from the vertex shader?
The original implementation wrongly assumed that the input and output
pixel centers align, which would only be the case if the ratio of the
input and output sizes was odd -- with a ratio of 3, the center of
output pixel 0 maps to input coordinate 1.5, the exact center of input
pixel 1, but with a ratio of 8 it maps to input coordinate 4.0, a pixel
edge. In practice the ratio isn't odd, usually it's a 1024x1024 texture
scaled down to 128x128 or something like that.
The flipped test cases added in the previous commit now pass.
According to the benchmark, the new code is very slightly slower (~815
µs vs ~805 before). The new code isn't really more complex than the old
one, it just does slightly different work -- there are new corner cases
in the initial logic for marking a pixel as inside or outside, on the
other hand some corner cases that had to be handled in the previous
version are no longer a thing.
If it's not, it's a programmer error (i.e., don't use Luminance or
packed formats, those won't work), and since there's no way for the API
to report a failure programmatically, this was causing hard-to-track
errors.
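Something like the following instead, sketched -- the predicate and the
message are illustrative:

    #include <Corrade/Utility/Assert.h>
    #include <Magnum/GL/TextureFormat.h>

    using namespace Magnum;

    void process(GL::TextureFormat format) {
        /* Fail loudly at the call site instead of causing an inscrutable
           GL error somewhere deep inside */
        CORRADE_ASSERT(format != GL::TextureFormat::Luminance,
            "TextureTools::DistanceField: unsupported input format", );
        /* ... */
    }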
No reason not to, even though the move is destructive. Also unblocks
Text::DistanceFieldGlyphCache which wasn't movable due to this and one
other problem.
Partially needed to avoid build breakages because Corrade itself
switched as well, partially because a cleanup is always good. Done
except for (STL-heavy) code that's deprecated or SceneGraph-related APIs
that are still quite full of STL as well.
With the workarounds moved to the GL::Shader class itself, it's just a
complicated wrapper for adding the compatibility.glsl file and a rather
strange way to define a file-local helper for resource import on static
builds. Do that directly instead.
So one can directly read it back on GLES without having to wrap the
texture in a framebuffer again.
This change also puts the framebuffer completeness check *before* the
clear() and bind() which makes it no longer emit a GL error. The error
is still silent though, which isn't nice. Gotta fix that eventually as
well.
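In code, the reordering amounts to roughly this (a sketch, not the
verbatim source):

    void prepare(GL::Framebuffer& framebuffer) {
        /* Validate completeness first, so clear() and bind() no longer
           get called on an incomplete framebuffer -- that's what emitted
           the GL error. The failure is still swallowed silently here,
           though. */
        if(framebuffer.checkStatus(GL::FramebufferTarget::Draw) !=
           GL::Framebuffer::Status::Complete)
            return;

        framebuffer.clear(GL::FramebufferClear::Color)
            .bind();
    }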
Mainly important for Shader::addSource() to prevent it from creating a
needless copy, but doesn't hurt to do the same also for
uniformLocation(), bindAttributeLocation() etc. -- it'll avoid a runtime
strlen() in that case at least.
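For illustration, with the StringView overloads a literal can be
referenced directly:

    #include <Corrade/Containers/StringView.h>
    #include <Magnum/GL/Shader.h>

    using namespace Corrade::Containers::Literals;
    using namespace Magnum;

    void addDefines(GL::Shader& vert) {
        /* A _s literal is a global, null-terminated StringView, so
           addSource() can keep a non-owning reference instead of an owned
           copy -- and since the view carries its size, there's no runtime
           strlen() either */
        vert.addSource("#define LIGHT_COUNT 4\n"_s);
    }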
Same as in the previous commit, most cases are inputs, so a StringStl.h
compatibility include will do; the only breaking change is
GL::Shader::sources(), which now returns a StringIterable instead of a
std::vector<std::string> (ew).
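So existing code like the following keeps compiling, just with an extra
opt-in include:

    #include <string>
    #include <Corrade/Containers/StringStl.h> /* StringView <-> std::string */
    #include <Magnum/GL/Shader.h>

    using namespace Magnum;

    void addFrom(GL::Shader& shader, const std::string& source) {
        shader.addSource(source); /* implicitly converted to a StringView */
    }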
The awesome thing about this whole change is that the Shader API now
allows creating a shader from sources coming either from string view
literals or Utility::Resource completely without having to allocate any
strings internally, because all those can be just non-owning references
wrapped with String::nullTerminatedGlobalView(). The only parts which
aren't references are the #line markers, but (especially on 64bit) those
can easily fit into the 22-byte (or 10-byte on 32bit) SSO storage.
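Roughly what the internals can now do, sketched:

    #include <Corrade/Containers/String.h>

    using namespace Corrade;

    Containers::String store(Containers::StringView source) {
        /* For a source that's global and null-terminated (a string view
           literal, a file from Utility::Resource) this makes a non-owning
           String without allocating; anything else gets an owned
           null-terminated copy */
        return Containers::String::nullTerminatedGlobalView(source);
    }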
Also, various Shader constructors and assignment operators had to be
deinlined in order to avoid having to include the String header, which
would be needed for Array destruction during a move.
Co-authored-by: Hugo Amiard <hugo.amiard@wonderlandengine.com>
Consistently with checkLink(), this avoids having to explicitly include
both Iterable and Reference in shader code, also allowing people to have
direct arrays of shaders, runtime-sized lists of shaders etc.
A compat include is provided on a deprecated build to avoid breaking
existing code.
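A sketch of what this enables, assuming attachShaders() is among the
affected APIs (it's used here just as an illustrative consumer):

    /* Inside an AbstractShaderProgram subclass constructor */
    GL::Shader shaders[]{std::move(vert), std::move(frag)};
    attachShaders(shaders); /* a plain C array converts to an Iterable */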
There's no reason for those to exist anymore -- originally they were
added in a hopeful attempt to make use of parallel shader compilation,
but in practice that meant compiling at most two or three shaders at
once and still stalling until that was done, so not that great at all.
The new APIs provide much better opportunities for parallelism.
Fun fact:
CORRADE_INTERNAL_ASSERT_OUTPUT(vert.compile() && frag.compile());
is actually one character shorter than
CORRADE_INTERNAL_ASSERT_OUTPUT(GL::Shader::compile({vert, frag}));
so not even typing convenience would be a reason to keep these.
What's left is *a lot* of places taking monstrous
std::vector<std::reference_wrapper> and that can't be changed to
std::vector<Containers::Reference> in a source-compatible way. Even that
would be only a temporary change, since the goal is to fully avoid
dependency on STL in those cases.
The final version of these APIs should take
Containers::ArrayView<Containers::Reference> and be implicitly
convertible from e.g. std::vector<Containers::Reference>. That's
definitely possible, but not in time for 2019.01, so instead of forcing
users to temporarily pass a `{vec.begin(), vec.size()}` everywhere
instead of just `vec`, I'm rather keeping these APIs intact.
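For the record, the eventual shape would be something like this, sketched
with a hypothetical compileAll() and assuming the ArrayViewStl.h
compatibility header for the std::vector conversion:

    #include <vector>
    #include <Corrade/Containers/ArrayViewStl.h>
    #include <Corrade/Containers/Reference.h>
    #include <Magnum/GL/Shader.h>

    using namespace Magnum;

    /* Hypothetical final signature */
    bool compileAll(Containers::ArrayView<const Containers::Reference<GL::Shader>> shaders);

    void use(GL::Shader& vert, GL::Shader& frag) {
        std::vector<Containers::Reference<GL::Shader>> shaders{vert, frag};
        compileAll(shaders); /* no {shaders.data(), shaders.size()} dance */
    }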
The nested for loop is a big problem. Worked around this by putting in a
fixed upper bound and some `break`s. This might result in the code being
slower on desktop drivers; it needs to be redone from scratch later by
generating the code directly.
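The workaround pattern, expressed in C++ for illustration -- the actual
code is the GLSL fragment shader, and the bound value is made up here:

    void scan(int radius) {
        /* Problematic drivers want a compile-time-constant loop bound,
           so iterate up to a fixed maximum and bail out early based on
           the actual (dynamic) radius */
        constexpr int MaxRadius = 16; /* illustrative fixed upper bound */
        for(int i = -MaxRadius; i <= MaxRadius; ++i) {
            if(i < -radius) continue;
            if(i > radius) break;
            /* ... same pattern for the inner loop over the other axis ... */
        }
    }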
Even this minor change caused Mesa drivers to output a slightly
different file. Test output is verbatim below:
============================================================================
FAIL [1] test() at
../src/Magnum/TextureTools/Test/DistanceFieldGLTest.cpp on line 107
Images actualOutputImage and
Utility::Directory::join(DISTANCEFIELDGLTEST_FILES_DIR, "output.tga")
have both max and mean delta above threshold, actual 1/0.000488281 but
at most 0/0 expected. Delta image:
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| M |
| |
| |
| M |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Pixels above max/mean threshold:
[16,41] Vector(175), expected Vector(174) (Δ = 1)
[46,35] Vector(175), expected Vector(174) (Δ = 1)