There are lots of optimization opportunities here. In particular, the
conversion of Vector3 to np.array is *crazy slow*, which turns out to be
caused mainly by the overhead of exception throwing in pybind11. The
Matrix3 to np.array conversion has no such overhead because the buffer
protocol handles the conversion there.
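
To make the difference between the two paths concrete, here is a minimal pure-Python sketch. It is not the actual binding code: `SeqVector3` is a hypothetical stand-in for a type that only exposes `__getitem__`, so iteration over it has to end with an `IndexError`, while `array.array` stands in for a type that exports its data through the buffer protocol. The cost of a C++ exception translated by pybind11 is not reproduced here; the sketch only shows where the exception comes from.

```python
import array
import timeit


class SeqVector3:
    """Hypothetical stand-in for a bound Vector3 exposing only __getitem__.

    With no __iter__ or __len__, Python falls back to calling
    __getitem__(0), __getitem__(1), ... until IndexError is raised.
    """

    def __init__(self, x, y, z):
        self._data = (x, y, z)

    def __getitem__(self, i):
        if not 0 <= i < 3:
            raise IndexError("Vector3 index out of range")
        return self._data[i]


def via_sequence(v):
    # Every conversion ends with one IndexError raised and swallowed.
    return list(v)


def via_buffer(a):
    # array.array exports a buffer; memoryview reads it with no exception.
    return memoryview(a).tolist()


def compare(number=100_000):
    """Time both paths; returns seconds per path for `number` conversions."""
    v = SeqVector3(1.0, 2.0, 3.0)
    a = array.array("d", [1.0, 2.0, 3.0])
    return {
        "sequence": timeit.timeit(lambda: via_sequence(v), number=number),
        "buffer": timeit.timeit(lambda: via_buffer(a), number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Both functions return the same list, so any timing difference comes from how the elements are retrieved. Absolute numbers depend on the machine and the Python version, and are not comparable to the pybind11 measurements.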
Another thing is that the pybind11 buffer protocol interface has a
relatively large overhead compared to e.g. Python's own array.array. I
blame the unneeded allocations.