The test passes now. This made the benchmark run significantly faster.
~200 µs instead of 670. Are clears really taking up more time than all
those texture fetches? Strange.
Just cannot use gl_FragCoord in here. That's it, that's the fix. What's
however COMPLETELY unexpected is that this simple change made the
process significantly faster on my Intel GPU, from ~815 µs to 670! I
can't even pretend I understand what's going on here, but maybe doing
less math in the fragment shader when calculating the texture
coordinates (and thus possibly the driver having a better idea how to
prefetch or schedule?) is what made this faster? Or maybe it's due to
one uniform input less, and two interpolated values instead of four on
the way from the vertex shader?
The original implementation wrongly assumed that the input and output
pixel centers align, which would only be a case if the ratio of the
input and output sizes would be odd. Which it in practice isn't, usually
it's a 1024x1024 texture scaled down to 128x128 or something like that.
The flipped test cases added in the previous commit now pass.
According to the benchmark, the new code is very slightly slower (~815
µs vs ~805 before). The new code isn't really more complex than the old
one, it just does slightly different work -- there are new corner case
in the initial logic for marking the pixel inside or outside, on the
other hand some corner cases that had to be handled in the previous case
are no longer a thing.
If it's not, it's a programmer error (i.e., don't use Luminance or
packed formats, won't work), and since there's no way for the API to
report a failure in a programmatic way, this was causing hard-to-track
errors.
No reason not to, even though the move is destructive. Also unblocks
Text::DistanceFieldGlyphCache which wasn't movable due to this and one
other problem.
Partially needed to avoid build breakages because Corrade itself
switched as well, partially because a cleanup is always good. Done
except for (STL-heavy) code that's deprecated or SceneGraph-related APIs
that are still quite full of STL as well.
With the workarounds moved to the GL::Shader class itself, it's just a
complicated wrapper for adding the compatibility.glsl file and a rather
strange way to define a file-local helper for resource import on static
builds. Do that directly instead.
So one can directly read it back on GLES without having to wrap the
texture in a framebuffer again.
This change also puts the framebuffer completeness check *before* the
clear() and bind() which makes it no longer emit a GL error. The error
is still silent though, which isn't nice. Gotta fix that eventually as
well.
Mainly important for Shader::addSource() to prevent it from creating a
needless copy, but doesn't hurt to do the same also for
uniformLocation(), bindAttributeLocation() etc. -- it'll avoid a runtime
strlen() in that case at least.
Same as in the previous commit, most cases are inputs so a StringStl.h
compatibility include will do, the only breaking change is
GL::Shader::sources() which now returns a StringIterable instead of a
std::vector<std::string> (ew).
Awesome about this whole thing is that The Shader API now allows
creating a shader from sources coming either from string view literals
or Utility::Resource completely without having to allocate any strings
internally, because all those can be just non-owning references wrapped
with String::nullTerminatedGlobalView(). The only parts which aren't
references are the #line markers, but (especially on 64bit) those can
easily fit into the 22-byte (or 10-byte on 32bit) SSO storage.
Also, various Shader constructors and assignment operators had to be
deinlined in order to avoid having to include the String header, which
would be needed for Array destruction during a move.
Co-authored-by: Hugo Amiard <hugo.amiard@wonderlandengine.com>
Consistently with checkLink(), this avoids having to explicitly include
both Iterable and Reference in shader code. Alsod allowing people to
have direct arrays of shaders, runtime-sized lists of shaders etc.
A compat include is provided on a deprecated build to avoid breaking
existing code.
There's no reason for those to exist anymore -- origiinally they were
added in a hopeful attempt to make use of parallel shader compilation,
but in practice that meant compiling at most two or three shaders at
once and still stalling until that was done, so not that great at all.
The new APIs provide much better opportunities for parallelism.
Fun fact:
CORRADE_INTERNAL_ASSERT_OUTPUT(vert.compile() && frag.compile());
is actually one character shorter than
CORRADE_INTERNAL_ASSERT_OUTPUT(GL::Shader::compile({vert, frag}));
so not even typing convenience would be a reason to keep these.
What's left is *a lot* of places taking monstrous
std::vector<std::reference_wrapper> and that can't be changed to
std::vector<Containers::Reference> in a source-compatible way. Even that
would be only a temporary change, since the goal is to fully avoid
dependency on STL in those cases.
The final version of these APIs should take
Containers::ArrayView<Containers::Reference> and be implicitly
convertible froom e.g. std::vector<Containers::Reference>. That's
definitely possible, but not in time for 2019.01, so instead of forcing
users to temporary pass a `{vec.begin(), vec.size()}` everywhere instead
of just `vec`, I'm rather keeping these APIs intact.
The nested for loop is a big problem. Worked around this by putting a
fixed upper bound and some `break`s. This might result in the code
being slower on desktop drivers, needs to be redone from scratch later
by generating the code directly.
Even this minor change caused Mesa drivers to output a slightly
different file. Test output is verbatim below:
============================================================================
FAIL [1] test() at
../src/Magnum/TextureTools/Test/DistanceFieldGLTest.cpp on line 107
Images actualOutputImage and
Utility::Directory::join(DISTANCEFIELDGLTEST_FILES_DIR, "output.tga")
have both max and mean delta above threshold, actual 1/0.000488281 but
at most 0/0 expected. Delta image:
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| M |
| |
| |
| M |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Pixels above max/mean threshold:
[16,41] Vector(175), expected Vector(174) (Δ = 1)
[46,35] Vector(175), expected Vector(174) (Δ = 1)
103% of use cases use the returned value directly without checking, so
we might as well do the check ourselves. Added new function hasCurrent()
and added deprecated backward-compatibility conversion and -> operators.
Wow, that creeped to a lot of places.
Last dinosaur from the pointer age.
The shader code took the image size from uniform if the GL was older
than version 3.2 (GLSL 1.50). But the shader class was setting that
uniform only if the GL was older than version 3.0. Thus the distance
field converter worked only on GL 2.1 and GL >= 3.2.
This was discovered only by accident, thanks to the quite recent
attempts to create core contexts by default. Because apparently setting
explicit version requirements (core GL 3.1) on AMDwill make
wglCreateContext() stay on that version and not choosing any later
compatible version, similarly to how NVidia behaves. That's another bug
for later.
Before the application was just creating the context the old way, thus
all cards compatible with GL3 were at least on GL 3.2 and thus the
missing uniform setting did not affect anybody, thus the bug was
effectively hidden.
Each class/function that needs to access the resources first checks
whether the group exists and the group is registered if not. Thus there
is now no difference and annoying special cases when using static build.
This was a leftover from some not-well-thought-out design decision. The
function is now used exclusively for binding for draw, as all
framebuffer reading functions (blit(), read()) are doing the read
binding internally. Moreover it required the user to be extra careful on
ES2, because in many cases there are no separate binding points for
reading and drawing.
The function is now parameter-less and always bind the framebuffer for
drawing. The logic for internal binding was also simplified and on ES2
there are separate implementations for single/separate binding points.
For *Framebuffer::checkStatus() the documentation was updated to explain
the meaning of the parameter on ES2 implementation. Also removed the
need for FramebufferTarget::ReadDraw binding, as it was rather
confusing.
Old *Framebuffer::bind(FramebufferTarget) is now just an alias to the
parameter-less function, ignoring the parameter. Along with
FramebufferTarget::ReadDraw it is marked as deprecated and will be
removed in some future release.