Basically using the same idea as with the discrete version -- having the
second dimension dynamic, together with restricting the implementation to
just Float and Double.
According to the SubdivideRemoveDuplicatesBenchmark, this makes the
implementation slightly slower. I presume this is due to how minmax and
offsets are calculated which is quite cache-inefficient as it goes over
the same memory block multiple times. Added a TODO for later.