Just cannot use gl_FragCoord in here. That's it, that's the fix. What's
however COMPLETELY unexpected is that this simple change made the
process significantly faster on my Intel GPU, from ~815 µs to ~670 µs! I
can't even pretend I understand what's going on here, but maybe doing
less math in the fragment shader when calculating the texture
coordinates (and thus possibly the driver having a better idea how to
prefetch or schedule?) is what made this faster? Or maybe it's due to
one fewer uniform input, and two interpolated values instead of four on
the way from the vertex shader?
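For context, the shape of the change is roughly this (names and the coordinate mapping are illustrative, not the actual shader sources):

```glsl
/* Vertex shader: compute the texture coordinate once per vertex and let
   the rasterizer interpolate it on the way to the fragment shader */
layout(location = 0) in vec4 position;
out vec2 textureCoordinate;

void main() {
    gl_Position = position;
    /* Illustrative mapping from clip space [-1, 1] to texture space
       [0, 1]; the actual shader derives the coordinate differently */
    textureCoordinate = position.xy*0.5 + vec2(0.5);
}

/* Fragment shader: a single interpolated input, no gl_FragCoord and no
   viewport size uniform anymore */
uniform sampler2D textureData;
in vec2 textureCoordinate;
out vec4 fragmentColor;

void main() {
    fragmentColor = texture(textureData, textureCoordinate);
}
```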
The original implementation wrongly assumed that the input and output
pixel centers align, which would only be the case if the ratio of the
input and output sizes were odd. In practice it isn't, usually it's a
1024x1024 texture scaled down to 128x128 or something like that.
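To spell out the math: with an integer ratio r, the center of output pixel i maps to (i + 0.5)*r in input pixel coordinates, which coincides with an input pixel center (j + 0.5) only when r is odd. For the 1024x1024 to 128x128 case r = 8, so the center of output pixel 0 maps to 4.0 -- the *edge* between input pixels 3 and 4, not any center.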
The flipped test cases added in the previous commit now pass.
According to the benchmark, the new code is very slightly slower (~815
µs vs ~805 µs before). The new code isn't really more complex than the
old one, it just does slightly different work -- there are new corner
cases in the initial logic for marking a pixel as inside or outside, on
the other hand some corner cases that had to be handled in the previous
version are no longer a thing.
I went through renaming this in many places quite some time ago, but
this one slipped through. Now that UBOs will be a thing, rename
EXPLICIT_UNIFORM_BINDING to EXPLICIT_BINDING.
*Finally* having consistent output on desktop, ES1, ES2, WebGL 1 and
WebGL 2, while also cutting 40% off the processing time. For the record,
the benchmark took 2.3 ms before, now it's 1.4 ms.
The nested for loop is a big problem. Worked around it by adding a
fixed upper bound and some `break`s. This might result in the code
being slower on desktop drivers, needs to be redone from scratch later
by generating the code directly.
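The workaround looks roughly like this (the upper bound and names are illustrative, not what the shader actually uses):

```glsl
precision mediump float;

uniform sampler2D textureData;
uniform vec2 pixelSize; /* illustrative: size of one pixel in UV space */
uniform int radius;     /* the actual, dynamic search radius */

/* GLSL ES 2.0 only guarantees loops whose bounds are known at compile
   time, so the loop runs to a fixed upper limit and the dynamic bound
   is enforced manually with a break */
const int MaxRadius = 16;

float accumulate(vec2 center) {
    float value = 0.0;
    for(int i = 0; i != MaxRadius; ++i) {
        if(i >= radius) break;
        value += texture2D(textureData, center + float(i)*pixelSize).r;
    }
    return value;
}
```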
Even this minor change caused Mesa drivers to output a slightly
different file. Test output is verbatim below:
============================================================================
FAIL [1] test() at
../src/Magnum/TextureTools/Test/DistanceFieldGLTest.cpp on line 107
Images actualOutputImage and
Utility::Directory::join(DISTANCEFIELDGLTEST_FILES_DIR, "output.tga")
have both max and mean delta above threshold, actual 1/0.000488281 but
at most 0/0 expected. Delta image:
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| M |
| |
| |
| M |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Pixels above max/mean threshold:
[16,41] Vector(175), expected Vector(174) (Δ = 1)
[46,35] Vector(175), expected Vector(174) (Δ = 1)
GL 3.2 has texelFetch() and layout(pixel_center_integer), which means
we can use integer coordinates with no precision loss when addressing
individual pixels in the source texture. In the versions before we have
to craft floating-point coordinates for texture() to grab the value of
the wanted pixel with no jumping around or interpolation.
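In other words, roughly this (the uniform name is illustrative, and the two paths of course belong to different GLSL versions):

```glsl
uniform sampler2D textureData;

/* GL 3.2 / GLSL 1.50: address the pixel directly with an integer
   coordinate, no precision loss */
vec4 fetchNew(ivec2 position) {
    return texelFetch(textureData, position, 0);
}

/* Older GLSL: craft a normalized coordinate hitting the center of the
   desired pixel, so texture2D() with nearest-neighbor filtering returns
   exactly that pixel. imageSizeInverted is 1.0/(texture size), which is
   why the size has to be known. */
uniform vec2 imageSizeInverted;
vec4 fetchOld(vec2 position) {
    return texture2D(textureData, (position + vec2(0.5))*imageSizeInverted);
}
```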
This change improves the behavior *a bit*, but not fully. I'm postponing
this until I have a unit test that compares the output with ground
truth.
For some reason this was causing the inner for loop to run indefinitely
on AMD cards. Not a problem on NVidia drivers, Intel Windows drivers or
Mesa. Thanks a lot to @LB-- for the investigation.
Everything that was in src/ is now in src/Magnum, everything from
src/Plugins is now in src/MagnumPlugins, everything from external/ is in
src/MagnumExternal. Added a new CMakeLists.txt file and updated the
other ones for the moves, no other change was made. If
MAGNUM_BUILD_DEPRECATED is set, everything compiles and installs as
previously, except for the plugins, which are now in MagnumPlugins and
not in Magnum/Plugins.
* Older GLSL doesn't have texelFetch() and related things, working
  around it by using classical texture() and normalized floating-point
  coordinates. But that needs to have Texture::imageSize() passed,
  which is not available in OpenGL ES, thus the user must specify the
  size explicitly there. On desktop OpenGL that parameter is ignored.
* Older GLSL doesn't have gl_VertexID, thus a vertex buffer must be
  created and the vertex data passed explicitly (see the sketch after
  this list).
* GLSL ES 2.0 doesn't have a one-component texture format and
TextureFormat::Luminance probably isn't renderable anywhere, thus
TextureFormat::RGB should be used, although it is inefficient.
* Checking for framebuffer completeness; if not complete, nothing is
  done.
* Re-enabled building of the TextureTools library in all ES PKGBUILDs.
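For the gl_VertexID point above, the attribute-less path available in newer GLSL looks something like this (the classic full-screen triangle trick, not necessarily the exact geometry the shader emits):

```glsl
/* gl_VertexID (desktop GLSL 1.30+, GLSL ES 3.00+) makes it possible to
   emit a full-screen triangle without any vertex buffer bound; older
   GLSL has to fetch equivalent positions from an explicitly created
   buffer instead */
void main() {
    gl_Position = vec4(gl_VertexID == 1 ? 3.0 : -1.0,
                       gl_VertexID == 2 ? 3.0 : -1.0,
                       0.0, 1.0);
}
```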