fix: iswt/idwt performance regression by grlee77 · Pull Request #162 · PyWavelets/pywt

grlee77 · 2016-02-29T20:21:01Z

This is an attempt to address the performance regression in #157.

Altogether this gives me an approximately 3-fold speedup relative to current master for the test script provided by @zstomp in #157. Roughly half the improvement is from the first commit and most of the rest is from the third.

I am not a big fan of the third commit here, so I would appreciate it if anyone else has a cleaner solution.

Apologies to @kwohlfahrt if you had already started working on this, but perhaps you will have found other areas for improvement that I missed!

Probably the most obvious next step to improve performance for iswt would be moving the loops over to the C/Cython side, as was done for dwtn and idwtn in v0.4.0, but I don't currently have time to work on that.

kwohlfahrt · 2016-02-29T22:19:19Z

Good effort, will review tomorrow. I'm not a huge fan of the check_inputs option either (and I like memoryviews) so I'll see what alternatives there are.

Perhaps this should be re-based onto the ASV commit, with some appropriate benchmarks? That would give a nice way of analysing the performance impact.

I'd started working on moving SWT to C, and decided I should get benchmarks running before that and then got sucked into the setuptools hole. It's still planned.

zstomp · 2016-03-01T12:49:20Z

Would it work (and help) if only the first array element is checked with np.iscomplexobj()?

kwohlfahrt · 2016-03-01T13:50:40Z

According to the numpy docs, np.iscomplexobj only checks the dtype, not the actual values. So I'd be surprised if this makes a difference.
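That reading of the docs can be checked with a minimal sketch that uses only NumPy (the array names and sizes are illustrative). Since `np.iscomplexobj` looks at the dtype, a complex array whose imaginary parts are all zero still counts as complex, and a strided slice reports the same answer as its parent because it shares the parent's dtype.

```python
import numpy as np

# Complex dtype, but every imaginary part is zero.
z = np.zeros(8, dtype=np.complex128)
# Real dtype.
r = np.zeros(8, dtype=np.float64)

# A strided slice is a view that shares the parent's dtype,
# so the answer cannot depend on the values inside the slice.
z_slice = z[1::2]
r_slice = r[1::2]

results = {
    "complex array": np.iscomplexobj(z),
    "real array": np.iscomplexobj(r),
    "complex slice": np.iscomplexobj(z_slice),
    "real slice": np.iscomplexobj(r_slice),
}
for name, value in results.items():
    print(name, value)
```

Checking only the first element would therefore not change the outcome, although it says nothing about how long the call takes.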

grlee77 · 2016-03-01T14:55:48Z

When I did line profiling, the np.iscomplexobj checks did not seem to be the primary source of the slowdown, but they were a contributing factor.

I still have a line profiling output up on the screen from after the first commit above, and the np.iscomplexobj call accounted for 11.8% of the total execution time in idwt. Each np.asarray call was 6-7%, each _check_dtype ~3%, and the various conditionals such as the if cA is not None statements were <2% each. Adding all those up, the time actually spent within idwt_single was only 38.8% of the total execution time for idwt.
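To get a rough feel for per-call overheads like these, the standard `timeit` module can time the individual checks in isolation. This is only a sketch: the array size and call count are arbitrary choices, `per_call_us` is a made-up helper, and it does not reproduce the line-profiler run described above. Absolute numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

cA = np.random.rand(64)  # arbitrary small coefficient array
n_calls = 20000          # arbitrary repeat count

def per_call_us(stmt):
    """Return the average time of one evaluation of stmt, in microseconds."""
    total = timeit.timeit(stmt, globals={"np": np, "cA": cA}, number=n_calls)
    return total / n_calls * 1e6

timings = {
    "np.iscomplexobj(cA)": per_call_us("np.iscomplexobj(cA)"),
    "np.asarray(cA)": per_call_us("np.asarray(cA)"),
    "cA is not None": per_call_us("cA is not None"),
}
for stmt, us in timings.items():
    print(f"{stmt:>22}: {us:.3f} us per call")
```

When the transform itself is short, fixed costs of this kind can make up a large share of each call, which is consistent with the percentages reported above.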

The code now has new features relative to v0.2.2 and those are not entirely without cost:

it works for more input dtypes (integer, complex, etc)
transforms along a specific axis are supported

grlee77 · 2016-03-01T14:57:51Z

Another micro-optimization would be to move the axis checks into the else case of the if ndim == 1 statement. However, this would break one of the current tests, which expects calling idwt on a 1D array with an invalid axis of 1 to raise a ValueError.
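The trade-off can be illustrated with a toy function. This is a hypothetical sketch, not the pywt source: `path_checked_first`, `path_checked_in_else` and `_validate_axis` are made-up names, and the form of the axis test and its error message are assumptions.

```python
import numpy as np

def _validate_axis(ndim, axis):
    # Assumed form of the check; pywt's actual code may differ.
    if not -ndim <= axis < ndim:
        raise ValueError("Axis greater than coefficient dimensions")

def path_checked_first(coeffs, axis=-1):
    """Validate the axis for every input, then branch on ndim."""
    coeffs = np.asarray(coeffs)
    _validate_axis(coeffs.ndim, axis)
    if coeffs.ndim == 1:
        return "1D fast path"
    else:
        return "n-D path"

def path_checked_in_else(coeffs, axis=-1):
    """Validate the axis only on the n-D branch (the micro-optimization)."""
    coeffs = np.asarray(coeffs)
    if coeffs.ndim == 1:
        return "1D fast path"  # axis is never inspected here
    else:
        _validate_axis(coeffs.ndim, axis)
        return "n-D path"

x = np.arange(8.0)
try:
    path_checked_first(x, axis=1)
    first_raised = False
except ValueError:
    first_raised = True

# The optimized variant silently accepts the invalid axis for 1D input.
second_result = path_checked_in_else(x, axis=1)
print(first_raised, second_result)
```

The second variant saves one comparison on the 1D path, at the cost of no longer rejecting an invalid axis for 1D input.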

grlee77 · 2016-03-01T15:01:28Z

@kwohlfahrt
No guarantees, but if I find some time later this evening I will see about rebasing onto the ASV branch.

zstomp · 2016-03-01T15:13:11Z

The reason I'm asking is that when I profiled, the iscomplexobj call took about 30% of iswt time, and it looked as if the slice in iswt was affecting it (I didn't notice any significant change with other occurrences of iscomplexobj). So I am not sure if iscomplexobj can check the dtype of the whole slice or it goes through its elements.

grlee77 · 2016-03-02T22:05:07Z

Here is an ASV plot of the iswt benchmark proposed in #158. These plots show four commits: the leftmost is current master (10b13cb), and the next three are the three commits proposed above. The biggest gain (for both Python 2 and 3) is due to the first commit above. For some reason the third commit above has a substantial benefit on Python 3, but not so much on Python 2.

Here is test_idswt on a linear scale:

Here is test_idswt again, but on a log scale:

The first commit here also improves the test_idwt and test_dwt benchmarks (the last two commits don't touch code related to those functions and can be ignored below). None of the commits makes any difference for test_idwtn, test_dwtn or test_swt.

Here is the dwt result on a linear scale:

and idwt on a linear scale:

Based on these results, it seems that avoiding declaring the output as a memory view and then calling np.asarray() on it has a pretty substantial benefit. I don't think there is a strong need for memory views, as we are not using any fancy indexing or anything in the Cython code for idwt or dwt.

grlee77 · 2016-03-02T22:08:30Z

I rebased off of #158 to generate the figures above, but haven't pushed the rebased branch here. I can do so if you think it is useful, but it is trivial to do the rebase on your end if you want to test it out since there are no overlapping changes between the branches.

kwohlfahrt · 2016-03-03T00:00:36Z

Damn. Oh well, looks like the first commit is too big to ignore. If I find a way to use memoryviews that is equally fast I'll do that, but I don't have time to look into it at the moment. The second one is sensible even if it doesn't give much performance benefit, so I'm happy to merge that too.

Still not a fan of the last one, I really don't like making dwt less neat to work around swt not being properly implemented as an independent transform. Might be best to just live with it being a little slower on python3 than it could be for now?

grlee77 · 2016-03-04T00:44:42Z

Okay. I rebased without the 3rd commit.

kwohlfahrt · 2016-03-04T10:19:45Z

pywt/_extensions/_dwt.pyx


-cpdef idwt_single(data_t[::1] cA, data_t[::1] cD, Wavelet wavelet, MODE mode):
-    cdef data_t[::1] rec
+cpdef idwt_single(np.ndarray cA, np.ndarray cD, Wavelet wavelet, MODE mode):


Can you double check whether there is any performance impact to using a memoryview for the inputs? The advantage is that Cython will automatically check to make sure the dtypes match.

Actually, I've just checked this and it looks like idwt_single currently works correctly with different types for the input arrays. Which according to the documentation should not be the case.

Edit: Missed a bit - it says it can choose the biggest corresponding numerical type here. I'm surprised that applies to arrays as well!

If I recall correctly, it used to be (prior to Cython 0.21 or so) that using fused types created code for all possible combinatorial combinations of the input types, but usually that wasn't desired so it was changed to the present behavior.

I guess that if the types are already being checked then I can remove the cA.dtype != cD.dtype check I had added.

I will check later this evening or weekend if changing just the inputs back to memory views has any effect.

I have Cython 0.23 installed, which is why I am confused that the following works:

>>> import pywt, numpy as np
>>> w = pywt.Wavelet('haar')
>>> m = pywt.Modes.zero
>>> a = np.arange(10, dtype='double') ** 2
>>> cA, cD = pywt.dwt(a, w, m)
>>> pywt._extensions._dwt.idwt_single(cA.astype('float'), cD.astype('double'), w, m)
<MemoryView of 'ndarray' at 0x7f92de51c398>
>>> np.asarray(_)
array([ 0., 1., 4., 9., 16., 25., 36., 49., 64., 81.])
>>> a
array([ 0., 1., 4., 9., 16., 25., 36., 49., 64., 81.])

My impression was that it shouldn't, and if something was going wrong (i.e. a buffer of floats interpreted as doubles) the result should be incorrect.

I tried using a memoryview just for the inputs to idwt_single while leaving the result as np.ndarray, but this negated a large portion of the speedup (final commit of the plot below):

For time_idwt

Huh, very odd. Leave it in then, I'll investigate it later. Thanks for checking it out though.

Also, I figured out why your example above succeeds. .astype('float') and .astype('double') both give arrays with dtype np.float64, since the default float is 64-bit in Python.

If I modify the call as follows, we see the expected failure:

pywt._extensions._dwt.idwt_single(cA.astype(np.float32), cD.astype('double'), w, m)

  File "pywt/_extensions/_dwt.pyx", line 101, in pywt._extensions._dwt.idwt_single (pywt/_extensions/_dwt.c:5454)
  File "pywt/_extensions/_dwt.pyx", line 112, in pywt._extensions._dwt.idwt_single (pywt/_extensions/_dwt.c:4961)
ValueError: Coefficients arrays must have the same dtype.

Ah, I'm an idiot. Mystery solved!
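The dtype aliasing behind this exchange can be confirmed with NumPy alone. In the sketch below, `check_same_dtype` is a hypothetical stand-in for the dtype comparison discussed in this thread (its message is copied from the traceback above); it is not the actual code in `_dwt.pyx`.

```python
import numpy as np

# 'float' and 'double' are both aliases for 64-bit floating point in NumPy.
same = np.dtype('float') == np.dtype('double') == np.dtype(np.float64)

def check_same_dtype(cA, cD):
    # Hypothetical helper; the message is copied from the traceback above.
    if cA.dtype != cD.dtype:
        raise ValueError("Coefficients arrays must have the same dtype.")

coeffs = np.arange(5, dtype='double')

# Both casts yield float64, so no mismatch is detected.
check_same_dtype(coeffs.astype('float'), coeffs.astype('double'))

# A genuine 32-bit array does differ from a 64-bit one.
try:
    check_same_dtype(coeffs.astype(np.float32), coeffs.astype('double'))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(same, mismatch_detected)
```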

grlee77 · 2016-03-08T23:22:12Z

The last commit removes a bit of yellow from the cythonize -a HTML annotations for _dwt.pyx.

kwohlfahrt · 2016-03-10T10:05:09Z

OK, +1 from me. Making a note to investigate memoryviews & efficiency later (after we've sorted out the benchmarking issue).

grlee77 mentioned this pull request Mar 3, 2016

Add ASV #158

Closed

grlee77 force-pushed the dwt_idwt_speed branch from d78a620 to 606a923 Compare March 4, 2016 00:43

grlee77 added 2 commits March 3, 2016 19:46

MAINT: To improve performance, don't use memoryviews for the outputs …

68f0f99

…of dwt_single or idwt_single

MAINT: iswt, iswt2. convert to Wavelet object outside of the loops to…

7c34712

… reduce overhead

grlee77 force-pushed the dwt_idwt_speed branch from 606a923 to 7c34712 Compare March 4, 2016 00:46

kwohlfahrt reviewed Mar 4, 2016

MAINT: In dwt_single and idwt_single, use the existing size_t coeffic…

9f401ad

…ient length attributes in C function calls

grlee77 added a commit that referenced this pull request Mar 18, 2016

Merge pull request #162 from grlee77/dwt_idwt_speed

416c697

fix: iswt/idwt performance regression

grlee77 merged commit 416c697 into PyWavelets:master Mar 18, 2016

grlee77 added this to the v0.5.0 milestone Oct 6, 2016

grlee77 deleted the dwt_idwt_speed branch October 8, 2016 21:16

grlee77 mentioned this pull request Nov 28, 2016

ENH: speed up upfirdn and resample_poly for n-dimensional arrays scipy/scipy#6810

Merged

grlee77 mentioned this pull request Dec 18, 2016

Performance degradation between 0.2.2, 0.3.0, and 0.4.0 #157

Closed
