This is so ugly it's beautiful. The translation needed a metaclass to
work properly, but undoubtedly the worst/best part is getting those
exposed nicely in the docs.
Compared to having to subclass every type that can reference external
data, this has several advantages for 3rd party binding code:
* it doesn't need to worry about the additional type when binding
function arguments (previously it had to provide lambdas that accept
the PyFoo subtype instead of just Foo)
* it can now also easily bind those types for function return values
and properties -- the return type doesn't need to be subclassed (which
in case of move-only types is practically impossible) but is instead
just wrapped in a holder along with the memory owner object reference
The new holders also assert that a memory owner is always specified
unless the data is empty.
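A rough Python sketch of the holder idea, for illustration only. The
real holders live on the C++ side; `ViewHolder` and `make_view` are
made-up names and a plain memoryview stands in for the bound type:

```python
import array


class ViewHolder:
    """Hypothetical holder: a view plus the object that owns its memory."""

    def __init__(self, view, owner=None):
        # Same rule as described above: an owner is mandatory unless
        # the wrapped data is empty.
        assert owner is not None or view.nbytes == 0, \
            "non-empty view needs a memory owner"
        self.view = view
        self.owner = owner  # referenced for as long as the holder lives


def make_view():
    data = array.array("f", [1.0, 2.0, 3.0])
    # The plain view type is returned wrapped in a holder, no
    # PyFoo-style subclass involved.
    return ViewHolder(memoryview(data), owner=data)


holder = make_view()
values = holder.view.tolist()
empty = ViewHolder(memoryview(b""))  # empty data, no owner required
```

The point of the design is that the owner reference travels with the
wrapper instead of being baked into a subclass of the wrapped type.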
The constructor order should be (and now is):
1. Magnum's own conversion constructors (double from integer and such)
2. stuff like implicit Color3 -> Color4, if applicable
3. buffer protocol constructors
4. general "init from a tuple" constructors last, because they're the
slowest
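The ordering above can be sketched in plain Python as a dispatcher that
tries the cheap bulk path before the generic one. This is only an
illustration, not the actual binding code; `construct_vector3` is a
hypothetical name and steps 1 and 2 are left out:

```python
import array


def construct_vector3(arg):
    """Hypothetical dispatcher returning (path_taken, values)."""
    # 1./2. Magnum's own and implicit conversions would be tried first;
    # they are omitted from this sketch.
    # 3. Buffer protocol: a single bulk read, tried before the fallback.
    try:
        view = memoryview(arg)
    except TypeError:
        pass  # arg doesn't export a buffer, fall through
    else:
        return "buffer", tuple(view.tolist())
    # 4. Generic "init from a tuple" path, element by element, so it
    # goes last.
    return "tuple", tuple(float(x) for x in arg)


path_a, values_a = construct_vector3(array.array("f", [1.0, 2.0, 3.0]))
path_b, values_b = construct_vector3((1, 2, 3))
```

Here the array goes through the buffer path and the tuple falls
through to the generic one.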
This makes Vector3 to np.array conversion more than 20x faster. Yes,
*that* much. Crazy. Timings from the benchmark added in the previous
commit, before:
np.array([])                                         0.66096 µs
np.array([1.0, 2.0, 3.0])                            0.70623 µs
a = array.array("f", [1.0, 2.0, 3.0]); np.array(a)   0.57877 µs
a = Vector3(1.0, 2.0, 3.0); np.array(a)             18.18542 µs
after:
np.array([])                                         0.57162 µs
np.array([1.0, 2.0, 3.0])                            0.68309 µs
a = array.array("f", [1.0, 2.0, 3.0]); np.array(a)   0.53958 µs
a = Vector3(1.0, 2.0, 3.0); np.array(a)              0.74818 µs
I think there's still some overhead that could be removed, which would
make the Vector3-to-numpy conversion faster than list-to-numpy.
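For reference, per-call numbers of this shape can be collected with
timeit. The sketch below is not the benchmark from the previous commit;
it is a stdlib-only stand-in (no numpy or Magnum needed) that compares
a buffer-based read with an element-wise one, and
`microseconds_per_call` is a made-up helper name:

```python
import timeit


def microseconds_per_call(stmt, setup="pass", number=100_000):
    """Average time of one execution of stmt, in microseconds."""
    total = timeit.timeit(stmt, setup=setup, number=number)
    return total / number * 1e6


# (statement, setup) pairs, both stand-ins chosen for this sketch
cases = [
    ("memoryview(a).tolist()",
     'import array; a = array.array("f", [1.0, 2.0, 3.0])'),
    ("tuple(float(x) for x in t)",
     "t = (1, 2, 3)"),
]

results = [(stmt, microseconds_per_call(stmt, setup))
           for stmt, setup in cases]
for stmt, us in results:
    print(f"{stmt:<30} {us:.5f} µs")
```

Absolute values depend on the machine and Python version, so only the
relative difference between the rows is meaningful.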
Lots of optimization opportunities here. In particular, the conversion
of Vector3 to np.array is *crazy slow*, which turns out to be caused
mainly by the overhead of exception throwing in pybind11. In the case
of Matrix3 to np.array conversion there's no such overhead because the
buffer protocol takes care of that.
Another thing is that the pybind11 buffer protocol interface has a
relatively large overhead compared to e.g. Python's own array.array. I
blame the unneeded allocations.