
Conversation

@seberg
Member

@seberg seberg commented Sep 26, 2013

Current status: Everything works. Some reorganizing is likely necessary, but pending input on that.

@njsmith
Member

njsmith commented Sep 27, 2013

Fast paths are horrible beasts -- they increase the number of code paths that need test coverage (probably a bunch of ours actually aren't tested, since we have no coverage tool to check with), they increase the maintenance burden, often they don't even help, etc.

So just a thought: as you're implementing them, you'll probably want to be systematically running microbenchmarks to validate each one. And maybe it would be good to write these microbenchmarks in the form of vbench benchmarks?
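A minimal `timeit` stand-in for the kind of vbench-style microbenchmark suggested here (the array sizes and index pattern are illustrative, not taken from this PR):

```python
import timeit

import numpy as np

# Time fancy getitem in the small and large regimes -- the two cases
# where a fast path may or may not pay for its extra code path.
for n in (10, 10_000):
    a = np.arange(n)
    idx = np.arange(0, n, 2, dtype=np.intp)
    per_call = timeit.timeit(lambda: a[idx], number=1000) / 1000
    print(f"n={n:>6}: {per_call * 1e6:.2f} us per fancy getitem")
```

A real vbench benchmark would wrap the statement and setup strings instead, but the measurement is the same idea.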

On Thu, Sep 26, 2013 at 4:57 PM, seberg [email protected] wrote:

Current status: Everything works, but there is cleanup to do, new tests to add, and optimizations to make. If anyone is interested, everything that is not GetMap/SetMap could already be reviewed, probably.

The code has a lot of TODOs, some of them marking smaller pending decisions...

- Parsing itself should be fine; cleanup is mostly necessary for MapIter (also fixing binary compatibility)
- Still needs tests for the new deprecations (non-integer array-likes)
- Speed improvements/fast path necessary for SetMap
- Speed improvements/cleanups possibly for GetMap
- New 0-d boolean index support can't be tested/used until the deprecation is done...


You can merge this Pull Request by running

git pull https://github.com/seberg/numpy new-index-machinery

Or view, comment on, or merge it at:

#3798

Commit Summary

ENH: Attempt to rewrite the index parsing.
WIP: handle output dim calculation immediately
WIP/BUG: Add missing index DECREF.
WIP: Smaller fixes/moves
WIP: Clean up the error messages
WIP: Always return a view, never self
WIP: Fix error returns up.
MERGE: include old bug fix lost in merge conflict
WIP: Free view correctly for fancy indexing
WIP: Speed improvements by using EXTERNAL_LOOP
WIP: Test fixups
WIP: Implement fast_take and some other speed things
WIP: Fixups, no-fancy indexes fancy iteration, probably more
WIP: Fixup DeprecationWarnings. Only issue remaining...
WIP,TST: Small fixes for the tests
WIP: Py3k fixes
WIP: Change field check and string check order
WIP,TST: Fixup python3 tests
WIP: really fix the order of hasfields...
WIP,BUG: Do not go too far in PyArray_MapIterNext, fix pyobj size
WIP,TST,DOC: tiny fix
WIP: Move Ellipsis check before subclass special case
WIP: Remove debug comments
WIP: Fix mapping.c warnings

File Changes

M numpy/core/include/numpy/ndarraytypes.h (42)
M numpy/core/src/multiarray/mapping.c (2587)
M numpy/core/src/multiarray/mapping.h (5)
M numpy/core/src/multiarray/scalartypes.c.src (29)
M numpy/core/tests/test_deprecations.py (5)
M numpy/core/tests/test_indexing.py (31)
M numpy/core/tests/test_multiarray.py (8)
M numpy/core/tests/test_numerictypes.py (2)
M numpy/core/tests/test_regression.py (7)
M numpy/lib/tests/test_function_base.py (2)

Patch Links:

https://github.com/numpy/numpy/pull/3798.patch
https://github.com/numpy/numpy/pull/3798.diff

@seberg
Member Author

seberg commented Oct 19, 2013

OK, there is a test failing, but that is not a big deal. I have now replaced the whole MapIter with one that specializes no-subspace (iterating everything in one, even the operand) and subspace iteration (as nested iters). The new MapIter will do iteration order optimization (if an index is visited twice, the order is not guaranteed) and for getitem will not keep the original array order.

The code is now generally faster for larger arrays, though for very small advanced indexing operations it is around 1.5x slower. And it is a lot faster for larger subspaces and non-C-order arrays. The big remaining issue is that shape mismatch errors are even more cryptic than before (they are reported by NpyIter with axis remapping and all...), and I think I need a dedicated function to replace the errors with something human readable.

Anyway, overall, it has converged enough that you can look at it/play with it. Technically I don't think the code will need to change much, unless someone has some comments.
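The duplicate-index caveat above can be made concrete with a small, illustrative snippet (values made up):

```python
import numpy as np

# When a fancy index contains duplicates, the value that "wins" in an
# assignment depends on which write happens last -- and with iteration
# order optimization, that order is no longer guaranteed.
a = np.zeros(3)
a[np.array([0, 0, 1])] = [10.0, 20.0, 30.0]
assert a[1] == 30.0
assert a[0] in (10.0, 20.0)   # which of the two wins is unspecified
```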

@seberg
Member Author

seberg commented Oct 21, 2013

For anyone interested: this has converged quite far. It probably needs a few readability fixes for the MapIterNew code, and new tests are still missing. Also, GetMap/SetMap should probably be condensed (they could get more inner loop specializations too, but I am not really interested in that right now). BUT: it is definitely at a stage where you can look at it, and should be in a state ready for usage. NumPy tests run through fine, SciPy tests run through fine, and the broadcasting errors are now nice and clean.

The only disadvantage of the code is that it is actually slower for fancy indexing with fewer than about 100 elements, because of the longer setup time of the NpyIter. I will probably do some timings to show the differences more clearly.

@seberg
Member Author

seberg commented Oct 25, 2013

I added some fast paths (1-d trivial iteration), and if they can be taken it is insanely fast. Uploaded some timings at http://sebastian.sipsolutions.net/indexing_speedup/. Basically, the new stuff is much faster unless buffering is used (or, for smaller arrays, when the special cases cannot be used, i.e. non-contiguous indexes). Especially non-np.intp indexes are slower unless you are indexing with more than around 1000 elements (this might be slower on Windows; also, I just realized that the typenum == NPY_INTP check might miss np.int64). Still needs tests, I admit. The comparisons are a bit unfair due to inner loop optimizations; maybe I will retest with the fairer complex double type (larger itemsize, which is not specialized yet).
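One practical consequence of the non-np.intp slowdown, shown as a hedged sketch (sizes illustrative): if an index array is reused many times, converting it to np.intp once up front avoids the slower path on every lookup.

```python
import numpy as np

a = np.arange(10_000, dtype=np.float64)
idx32 = np.arange(0, 10_000, 7, dtype=np.int32)  # not the native index type
idx = idx32.astype(np.intp)                      # one-time conversion

# Both index types give identical results; only the speed differs.
assert np.array_equal(a[idx32], a[idx])
```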

@pv
Member

pv commented Oct 25, 2013

What happens with indexing with integer scalars? Maybe out of scope for this PR, but I think that case has quite severe overheads.

@seberg
Member Author

seberg commented Oct 25, 2013

It doesn't have any severe overheads that I am aware of (maybe a bit of it was fixed when I sped up the PyArray_IntpFromPyInt parsing function). Indexing with only integers (the full integer special case) is a tiny inkling slower, I think, simply because the old machinery did a special case early on, but that is negligible. Slicing is faster. If scalar indexing (assuming non-fancy here; scalar fancy indexes I optimized away) is slow, it probably needs checking whether PyArray_GetItem can be improved, but I doubt it.

@pv
Member

pv commented Oct 25, 2013

The speedups here look great. What I'm a bit concerned about is the test coverage. It would be very useful if it were possible to run this through some C code coverage tool to see whether all branches are covered.

@seberg
Member Author

seberg commented Oct 25, 2013

The current test coverage is actually not too bad, except maybe for things like non-native byte order, object references, etc., and broadcasting in assignments. I know I have to add some tests there. I uploaded a few timings for simple slicing (basically what I said). You are indeed right about one thing: the current master (and probably also 1.8.x) has a performance bug when indexing one_dimensional_arr[(1,)] (a one-element tuple into a one-dimensional array; arr[1] is fast); that is fixed, of course. Indexing an object array is half as fast as indexing a Python list, and I am not sure we can expect much more.
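For context, a one-element tuple index is by definition the same operation as the bare scalar index, so the two forms should perform identically; the bug was purely a slow path, not a semantic difference:

```python
import numpy as np

arr = np.arange(5)

# arr[(1,)] and arr[1] are the same index expression by definition.
assert arr[(1,)] == arr[1] == 1
assert arr[(1,)].shape == arr[1].shape == ()
```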

@pv
Member

pv commented Oct 25, 2013

gcov points out a couple of code paths never run (apart from memory allocation error handling), see gh-3979, but coverage looks quite OK.

@seberg
Member Author

seberg commented Oct 25, 2013

Thanks a lot, I will have a look at that and add tests for those things that are missing.

@seberg
Member Author

seberg commented Oct 26, 2013

Hmmm, @pv, maybe you know what's going on: your coverage commands don't give any output for me. Is gcc 4.7.3 just too old? I am on Ubuntu, so I somewhat expected it to just work. EDIT: What I mean is, of course, that the .gcda gcov files are not created.

@seberg
Member Author

seberg commented Oct 26, 2013

Nvm that. It ran through fine after I deleted the build directory; dunno, maybe the script just searched the wrong paths.

@seberg
Member Author

seberg commented Oct 28, 2013

Btw., since 1.8 is soon out ;)... This is pretty much done from my side, except some new deprecation tests. I am sure things will crop up, but it is ready to start review, in case someone wants to read 2500 or so lines of code :) (fortunately they are very linear, except MapIterNew, I think).

Member

@seberg can you try to keep the existing struct ABI? If possible, we'd like to keep ABI compatibility in the 1.x branch.

Member Author

I hope I did keep everything ABI compatible (well, everything that is reasonable to use). This is used in two different tests, one for ufunc.at and another more specific one, but as of yet it is only tested for API compatibility, not ABI. I am aware it still needs checking. I think Theano is probably the only project using this API (it was only exposed in 1.8, and that is not even officially released yet).

Contributor

This structure was already exposed in at least 1.7, possibly much earlier.

Member Author

Maybe the structure, but MapIterArray was not. So nobody could possibly have used it before 1.8.x (though I admit it has possibly been in master for a year).

Contributor

Through which functions is it used in 1.8? I could only find PyArray_MapIterNew, which is not exposed.

If it is private, we should move it out of the public header into a private one.

Contributor

For 1.8 we could at least deprecate PyArrayMapIterObject via comments in the code, to avoid new code using it.
What about these:
PyArrayIterObject
PyArrayMultiIterObject
PyArrayNeighborhoodIterObject
Can they be deprecated? The first two are probably more commonly used, so deprecation of direct access might be premature without existing get/set methods.

Member

I've already tagged 1.8 and am waiting for Ralf to do the binaries. If we get this figured out, we could maybe do a 1.8.1, but I'd rather figure out something for 1.9. This doesn't fall into the category of a bug fix, and it isn't clear to me that we yet have a consensus on what should be done, or for that matter, what has already been done. For instance, IIRC, PyArrayNeighborhoodIterObject goes back several releases. All that makes me loath to stop the 1.8 process at this point.

Contributor

Yes, let's figure something out for 1.9 and not delay 1.8 any longer.

Member Author

I know, nobody really has time for this, probably :). But I just tested multiarray_tests.test_inplace_add compiled with master and ran it with this branch, and it works. So I am pretty confident that it is binary compatible for the usage which makes sense, unless there are some subtleties on other architectures or compilers or so.

Member

@juliantaylor @seberg

PyArrayIterObject
PyArrayMultiIterObject
PyArrayNeighborhoodIterObject

I haven't checked the others yet, but it looks like PyArrayNeighborhoodIterObject already has accessor functions, so we could move it into a deprecated file, I think. That should not break anyone's code unless we start moving the internals around, at which point folks will probably need to recompile, as the functions are inline, i.e., the accesses to struct internals will still be there. So we can guarantee API but not ABI. That is already the case for PyArrayObject, but I don't know if that is widely publicised. It may be that we need to make that a policy, because otherwise we are frozen.

@pv
Member

pv commented Oct 28, 2013

@seberg gcov needs a clean rebuild; distutils can't figure out that it needs to rebuild itself.

@charris
Member

charris commented Dec 22, 2013

Time to get this moving again. I think a first step would be to deprecate direct access to the struct internals, but that would also imply having a compatible fallback, which would vitiate this work. Hmm, we really need to know whether people are already accessing the internals, and I don't know how to do that without a deprecation. Thoughts?

@seberg
Member Author

seberg commented Jan 25, 2014

I wouldn't mind resurrecting this... People most definitely are accessing the internals; however, I hope (and actually think it is most likely) that "people" is limited to Theano and possibly one more guy, and if nobody tries to be overly smart using it, then this does not break compatibility (I am sure Theano is not).

The struct should only be accessed for two reasons (which is how the examples you can find use it), I think: 1. to get the operand array (mostly for the dtype), and 2. to check whether transposing is necessary. Both of these could easily be moved into a function (or moved into the transpose function itself). However, for those things I did keep binary compatibility. I know it is a bit hairy, and we should deprecate the access in any case.

Btw. if someone else feels like starting to review, I could move the branch to the numpy repository so that you can just push some changes directly.

@juliantaylor
Contributor

Have you already tried running some reverse dependencies against the branch? E.g. pandas, pytables, scipy, h5py.

@seberg
Member Author

seberg commented Jan 25, 2014

I ran the scipy tests before; they were fine. I just ran pandas and it is fine, except that it found a pretty fat bug in this code, which is fixed now.

@charris
Member

charris commented Jan 30, 2014

@seberg Where is this now? You have quite a few PRs that I'd like to go through, but I'm not sure which ones you consider more or less ready.

@seberg
Member Author

seberg commented Jan 30, 2014

This one is good for review. (The doc one I didn't update; the npyiter one still needs action; the boolean subtract one mostly needs a ping of the mailing list again to make sure we really want it, I think; the dtype shape one also needs a decision on whether it is the right way or not.)

Member

Is this consistent with the rules?

In [5]: array(1)[(True,)]
Out[5]: array([1])

In [6]: array(1)[[True]]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-6-fc18d566cf7d> in <module>()
----> 1 array(1)[[True]]

IndexError: too many indices for array

Member

It is a bit curious that scalar booleans return 1D arrays

In [30]: a[True]
Out[30]: array([1])

Member Author

Hah, I first thought it looked like a bug too, but it is actually meant like that. I agree, it is non-obvious why this should be right, but of course I am convinced it is ;). Here is an example of why:

def filter_subarrays_with_large_sum(array, large):
    """
    arrays : ndarray {..., N}

    out : ndarray {1, N}
    """
    large_sum = array.sum(-1)
    return array[large_sum >= large]

With this change, the result consistently adds one dimension even if array is one-dimensional, and the 0-d change follows the same rule: no dimension is removed (the boolean index is 0-d), but, as always with a boolean index, one dimension is added. That said, it might be a large change.
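A small illustration of this rule as it behaves in current NumPy (array values made up):

```python
import numpy as np

# A boolean scalar index removes zero dimensions (it is 0-d) and, like
# any boolean index, prepends one dimension for the selected elements.
a = np.array([1, 2])
assert a[True].shape == (1, 2)    # "row" selected
assert a[False].shape == (0, 2)   # nothing selected
assert np.array_equal(a[True], a[None])
```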

@seberg
Member Author

seberg commented Feb 7, 2014

Hmmm, I didn't look too long, but I don't quite understand the segfault. I can reproduce it on 3.3 (though not with valgrind) and see the backtrace in gdb, but the PyUString_ConcatAndDel(&errmsg, tmp); call has NULL checks in place, unless something has a wrong refcount...

@seberg
Member Author

seberg commented Feb 7, 2014

Ok, fixed that bug... I forgot to fix the convert_shape_to_string name in common.h, and somehow that destroyed the return pointer.

@SylvainCorlay

0-d arrays are treated as a very special case:

exp(array(1.0)).__class__        # returns numpy.float64  instead of numpy.ndarray
logical_not(array(1)).__class__  # returns numpy.bool_    instead of numpy.ndarray
(array(1.0) > 0).__class__       # returns numpy.bool_    instead of numpy.ndarray

For the latter case, the fix seems to allow 0-d arrays to be indexed by booleans rather than fixing the previous behavior, making 0-d arrays even more special.

Would it be possible instead to make generalized functions of 0d arrays return 0d arrays? (See issue #1421)

@seberg
Member Author

seberg commented Feb 7, 2014

This does not add a special case for 0-d arrays being indexed; it changes the treatment of boolean scalars in indexing to be more sensible. Sure, changing the ufunc behaviour would remove the necessity for it, but even then it would make sense in my opinion (why should arr[True] and arr[np.array(True)] do different things?).

So in my opinion it is simply a different issue (and one that I still think might be quite hairy and complex to change).
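The consistency argument can be checked directly in current NumPy (array values illustrative): a Python bool and a 0-d boolean array behave identically as an index.

```python
import numpy as np

arr = np.arange(3)
# Both forms prepend one dimension and select everything for True.
assert np.array_equal(arr[True], arr[np.array(True)])
assert arr[True].shape == (1, 3)
```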

@charris
Member

charris commented Feb 9, 2014

If there are no more comments in the next day I'm going to put this in.

@charris
Member

charris commented Feb 9, 2014

OK, let's see how this plays. @seberg Adding functions to access the struct internals would be good so we can start down the long road to hiding them.
