It's a bit involved as we need to ensure that gl.Context.current doesn't
outlive the Application instance, so we need to:
- remember the Application object when it gets constructed (and clear
it again when it gets destructed)
- in gl.Context.current check if there's an active Application (which
means sharing data across two different Python modules, and even
though pybind11 docs suggest to "simply export a symbol", this
*cannot* possibly work in practice; instead we share data using a
Python capsule), and increase its refcount when returning the Context
instance
- decrease the Application refcount again when the Context gets
destructed
This is so ugly it's beautiful. The translation needed a metaclass to
work properly, but undoubtedly the worst/best part is getting those exposed
nicely in the docs.
This makes Vector3 to np.array conversion over 20x faster. Yes, *that*
much. Crazy. Timings from the benchmark added in the previous commit, before:
    np.array([])                                         0.66096 µs
    np.array([1.0, 2.0, 3.0])                            0.70623 µs
    a = array.array("f", [1.0, 2.0, 3.0]); np.array(a)   0.57877 µs
    a = Vector3(1.0, 2.0, 3.0); np.array(a)             18.18542 µs
after:
    np.array([])                                         0.57162 µs
    np.array([1.0, 2.0, 3.0])                            0.68309 µs
    a = array.array("f", [1.0, 2.0, 3.0]); np.array(a)   0.53958 µs
    a = Vector3(1.0, 2.0, 3.0); np.array(a)              0.74818 µs
I think there's still some overhead that could be removed, which would make
the Vector3-to-numpy conversion faster than list-to-numpy.
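
For reference, per-call numbers like the ones above can be collected with the
standard timeit module. This is only a sketch of the measuring approach, not
the actual benchmark from the repository: the helper name
microseconds_per_call is made up, and the statements are standard-library
stand-ins so the snippet runs without numpy or Vector3 installed.

```python
import timeit


def microseconds_per_call(stmt, setup="pass", number=20_000, repeat=3):
    """Best-of-`repeat` time for one execution of `stmt`, in microseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6


# Stand-in statements; the real benchmark would use np.array(...) instead.
statements = [
    ("list([])", "pass"),
    ("list([1.0, 2.0, 3.0])", "pass"),
    ('a = array.array("f", [1.0, 2.0, 3.0]); memoryview(a)', "import array"),
]

results = {stmt: microseconds_per_call(stmt, setup) for stmt, setup in statements}

for stmt, us in results.items():
    print(f"{stmt:<55} {us:.5f} µs")
```

Taking the minimum over a few repeats is the usual way to reduce noise from
other processes; the absolute numbers depend on the machine.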
Lots of optimization opportunities here. In particular, the conversion
of Vector3 to np.array is *crazy slow*, which turns out to be caused mainly by
the overhead of exception throwing in pybind11. In the case of Matrix3 to
np.array conversion there's no such overhead because the buffer protocol
takes care of that.
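
To illustrate the difference in pure Python: a consumer that only has
__getitem__ to work with has to index until an IndexError tells it to stop,
while a buffer-backed object hands over its size and memory directly. That
the bindings hit exactly this path is my assumption, and SequenceOnlyVector
is a made-up stand-in, not the actual Vector3. In Python the exception is
cheap; the point is only that the sequence path cannot finish without one.

```python
import array


class SequenceOnlyVector:
    """Hypothetical stand-in for a type without buffer protocol support."""

    def __init__(self, *values):
        self._values = values
        self.index_errors = 0  # how many times a consumer ran off the end

    def __getitem__(self, i):
        try:
            return self._values[i]
        except IndexError:
            self.index_errors += 1
            raise


# Sequence path: list() indexes 0, 1, 2, 3 and stops on the IndexError.
vector = SequenceOnlyVector(1.0, 2.0, 3.0)
via_sequence = list(vector)

# Buffer path: the size and item format are known up front, no exception.
buffer_backed = array.array("f", [1.0, 2.0, 3.0])
view = memoryview(buffer_backed)
via_buffer = view.tolist()
```

Both paths produce the same three values, but the sequence one ends with one
raised and caught exception per conversion.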
Another thing is that the pybind11 buffer protocol interface has a
relatively large overhead compared to e.g. Python's own array.array. I
blame the unneeded allocations.