The test passes now. This made the benchmark run significantly faster:
~200 µs instead of ~670 µs. Are clears really taking up more time than
all those texture fetches? Strange.
Just can't use gl_FragCoord in here. That's it, that's the fix. What's
COMPLETELY unexpected, however, is that this simple change made the
process significantly faster on my Intel GPU, from ~815 µs to ~670 µs! I
can't even pretend I understand what's going on here. Maybe doing less
math in the fragment shader when calculating the texture coordinates
(and thus possibly giving the driver a better idea of how to prefetch or
schedule?) is what made this faster? Or maybe it's due to one fewer
uniform input, and two interpolated values instead of four on the way
from the vertex shader?
The original implementation wrongly assumed that the input and output
pixel centers align, which would only be the case if the ratio of the
input and output sizes were an odd integer. In practice it usually
isn't: typically it's a 1024x1024 texture scaled down to 128x128 or
something like that.
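To make the alignment argument concrete: output pixel i covers [i, i + 1), so its center is at i + 0.5, which maps to (i + 0.5)*ratio in input pixel units. That lands on an input pixel center (fractional part 0.5) only when the ratio is an odd integer. A minimal sketch of that arithmetic, with hypothetical helper names that aren't from the actual code:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: maps the center of output pixel `i` into input
// pixel coordinates, where `ratio` is inputSize/outputSize
double inputCoordinateOfOutputCenter(int i, double ratio) {
    assert(ratio > 0.0);
    return (i + 0.5)*ratio;
}

// True if the mapped coordinate hits an input pixel center exactly,
// i.e. its fractional part is 0.5
bool landsOnInputCenter(int i, double ratio) {
    const double x = inputCoordinateOfOutputCenter(i, ratio);
    return x - std::floor(x) == 0.5;
}
```

With 1024 -> 128 (ratio 8), output pixel 0 maps to 4.0, i.e. the boundary between input pixels 3 and 4, not a center. With a ratio of 3 it maps to 1.5, which is exactly the center of input pixel 1.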
The flipped test cases added in the previous commit now pass.
According to the benchmark, the new code is very slightly slower (~815
µs vs ~805 µs before). The new code isn't really more complex than the
old one, it just does slightly different work: there are new corner
cases in the initial logic for marking the pixel as inside or outside,
but on the other hand some corner cases that had to be handled in the
previous code are no longer a thing.
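For readers unfamiliar with the "mark inside or outside, then search" structure, here's a CPU brute-force sketch of it. This is not the actual shader and doesn't reproduce its corner cases; the function name, the binary input image and the pixel-to-pixel distance metric are all assumptions for illustration. The pixel is first classified, then a square neighborhood bounded by `radius` is searched for the nearest pixel in the opposite state.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical brute-force signed distance for pixel (x, y) of a binary
// image. Positive inside, negative outside, magnitude clamped to `radius`.
float signedDistance(const std::vector<bool>& image, int width, int height,
                     int x, int y, int radius) {
    assert(static_cast<std::size_t>(width*height) == image.size());

    // Initial step: mark the pixel as inside or outside
    const bool inside = image[y*width + x];

    // Search a bounded neighborhood for the nearest opposite-state pixel
    float nearest = float(radius);
    for(int j = -radius; j <= radius; ++j) {
        for(int i = -radius; i <= radius; ++i) {
            const int sx = x + i, sy = y + j;
            if(sx < 0 || sy < 0 || sx >= width || sy >= height) continue;
            if(image[sy*width + sx] == inside) continue;
            nearest = std::fmin(nearest, std::sqrt(float(i*i + j*j)));
        }
    }
    return inside ? nearest : -nearest;
}
```

For a 4x1 row {inside, inside, outside, outside} and a radius of 3, pixel 0 gets 2, pixel 1 gets 1, pixel 2 gets -1 and pixel 3 gets -2.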
If it's not, it's a programmer error (i.e., Luminance or packed formats
won't work, don't use them), and since there's no way for the API to
report a failure programmatically, this was causing hard-to-track
errors.
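Turning such a programmer error into an immediate assertion could look roughly like the sketch below. The enum is a hypothetical, heavily reduced stand-in for a real pixel format list, and which formats count as usable is assumed from the "no Luminance, no packed formats" rule only.

```cpp
#include <cassert>

// Hypothetical, reduced list of output formats
enum class PixelFormat { R8, RGBA8, Luminance, RGB565 };

// Luminance formats aren't renderable and packed formats won't work
// either, so passing one of them is a programmer error
bool isUsableOutputFormat(PixelFormat format) {
    switch(format) {
        case PixelFormat::R8:
        case PixelFormat::RGBA8:
            return true;
        case PixelFormat::Luminance:
        case PixelFormat::RGB565:
            return false;
    }
    return false;
}

void process(PixelFormat outputFormat) {
    // Fail loudly at the call site instead of producing hard-to-track
    // errors somewhere down the line
    assert(isUsableOutputFormat(outputFormat) && "unsupported output format");
    // ... the actual processing would go here
}
```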
And the thing that changed for SwiftShader 4.1 was the added
EXT_texture_rg support, which made the test work. So no, this was not a
SwiftShader bug at all.
Use a different overload to avoid creating a temporary framebuffer every
time, also move the error checking outside the loop, and increase the
iteration count to actually get a result that isn't wildly different
every time.
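The general shape of that benchmark change (setup and error checking hoisted out, more iterations averaged) can be sketched with `std::chrono`. The helper name is hypothetical. Note that for GPU work this only measures CPU-side wall time; a real GL benchmark would additionally have to synchronize with the GPU or use timer queries.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `iterations` times and returns the mean
// duration of one call in microseconds. Any setup (creating a framebuffer,
// checking for errors) is expected to happen before this is called so it
// doesn't end up in the measured time.
template<class F> double meanMicroseconds(F&& work, std::size_t iterations) {
    assert(iterations > 0);
    const auto begin = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i != iterations; ++i) work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - begin).count()/iterations;
}
```

Averaging over more iterations smooths out per-run noise, which is what makes the result stop being wildly different between runs.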
No reason not to, even though the move is destructive. This also
unblocks Text::DistanceFieldGlyphCache, which wasn't movable due to this
and one other problem.
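Why one non-movable member blocks the whole containing class: the implicitly generated move constructor of a class is only usable if all its members are movable. A sketch with a hypothetical RAII wrapper around a GL object ID (neither class is the real one). The move leaves the source with ID 0, so the moved-from instance can't be used anymore, though destroying it is still safe.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only wrapper around a GL object ID
class GLObject {
    public:
        explicit GLObject(unsigned id): _id{id} {}
        GLObject(const GLObject&) = delete;
        GLObject& operator=(const GLObject&) = delete;
        // Moving transfers ownership, the source is left empty
        GLObject(GLObject&& other) noexcept: _id{other._id} { other._id = 0; }
        // Swapping means the previously owned object gets destroyed
        // together with `other`
        GLObject& operator=(GLObject&& other) noexcept {
            std::swap(_id, other._id);
            return *this;
        }
        ~GLObject() { /* would delete the GL object here if _id != 0 */ }
        unsigned id() const { return _id; }
    private:
        unsigned _id;
};

// Hypothetical containing class: it gets an implicit move constructor
// only because GLObject is movable
struct HypotheticalGlyphCache {
    GLObject texture;
};
```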
So one can directly read it back on GLES without having to wrap the
texture in a framebuffer again.
This change also puts the framebuffer completeness check *before* the
clear() and bind(), which means it no longer emits a GL error. The error
is still silent though, which isn't nice. Gotta fix that eventually as
well.
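The reordering matters because clearing or drawing into an incomplete framebuffer generates GL_INVALID_FRAMEBUFFER_OPERATION. A sketch of the order using a hypothetical mock that only records the calls, so no GL context is needed. Like in the commit, the failure path just returns without reporting anything.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a framebuffer that records issued operations.
// A real implementation would check the GL framebuffer status, clear and
// bind instead.
struct MockFramebuffer {
    bool complete;
    std::vector<std::string> calls;

    bool checkComplete() { calls.push_back("check"); return complete; }
    void clear() { calls.push_back("clear"); }
    void bind() { calls.push_back("bind"); }
};

// Checking completeness first means an incomplete framebuffer is never
// cleared or bound, so no GL error is emitted. The failure is still
// silent, though.
void runPass(MockFramebuffer& framebuffer) {
    if(!framebuffer.checkComplete()) return;
    framebuffer.clear();
    framebuffer.bind();
    // ... drawing would go here
}
```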
Fully passes only on desktop and ES3 (Mesa), expecting minor differences
on other GPUs. ES2 is slightly broken and needs fixing: it doesn't even
compile on WebGL 1 and causes a serious GPU stall on WebGL 2, in both
cases caused by the unbounded nested loops. Rendering doesn't work on
WebGL 1 at the moment, since luminance formats are not renderable. And
for an RGBA output format I would need some utility to get rid of the
extra channels in order to pass the comparison.
Lots of work to do here.
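The missing "get rid of the extra channels" utility could be as simple as the sketch below. It assumes tightly packed RGBA8 data with no row padding, and the function name is hypothetical. It keeps only the first channel so the result can be compared against single-channel ground truth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keeps only the first (red) channel of tightly
// packed RGBA8 pixels, dropping the extra G, B and A channels
std::vector<std::uint8_t> extractRedChannel(const std::vector<std::uint8_t>& rgba) {
    assert(rgba.size() % 4 == 0);
    std::vector<std::uint8_t> out;
    out.reserve(rgba.size()/4);
    for(std::size_t i = 0; i < rgba.size(); i += 4)
        out.push_back(rgba[i]);
    return out;
}
```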