
fix: iswt/idwt performance regression#162

Merged
grlee77 merged 3 commits into PyWavelets:master from grlee77:dwt_idwt_speed
Mar 18, 2016

Conversation

@grlee77
Contributor

@grlee77 grlee77 commented Feb 29, 2016

This is an attempt to address the performance regression in #157

Altogether this gives me an approximately 3-fold speedup relative to current master for the test script provided by @zstomp in #157. Roughly half the improvement is from the first commit and most of the rest is from the third.

I am not a big fan of the third commit here, so would appreciate if anyone else has a cleaner solution.

Apologies to @kwohlfahrt if you had already started working on this, but perhaps you will have found other areas for improvement that I missed!

Probably the most obvious next step to improve performance for iswt would be moving the loops over to the C/Cython side, as was done for dwtn and idwtn in v0.4.0, but I don't currently have time to work on that.

@kwohlfahrt
Member

Good effort, will review tomorrow. I'm not a huge fan of the check_inputs option either (and I like memoryviews) so I'll see what alternatives there are.

Perhaps this should be re-based onto the ASV commit, with some appropriate benchmarks? That would give a nice way of analysing the performance impact.

I'd started working on moving SWT to C, and decided I should get benchmarks running before that and then got sucked into the setuptools hole. It's still planned.

@zstomp

zstomp commented Mar 1, 2016

Would it work (and help) if only the first array element were checked with np.iscomplexobj()?

@kwohlfahrt
Member

According to the numpy docs, np.iscomplexobj only checks the dtype, not the actual values. So I'd be surprised if this makes a difference.
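For reference, this is easy to verify: np.iscomplexobj inspects only the array's dtype, so its result (and cost) does not depend on the element values or the array size.

```python
import numpy as np

# iscomplexobj looks only at the dtype, never at the element values,
# so an all-real complex array still reports True.
a = np.zeros(1000, dtype=np.complex128)  # every value is 0+0j
b = np.zeros(1000, dtype=np.float64)

print(np.iscomplexobj(a))  # True: complex dtype
print(np.iscomplexobj(b))  # False: real dtype
```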

@grlee77
Contributor Author

grlee77 commented Mar 1, 2016

When I did line profiling, the np.iscomplexobj checks did not seem to be the primary source of the slowdown, but they were a contributing factor.

I still have a line profiling output up on the screen from after the first commit above: the np.iscomplexobj call accounted for 11.8% of the total execution time in idwt. Each np.asarray call was 6-7%, each _check_dtype call ~3%, and the various conditionals such as the if cA is not None statements were <2% each. Adding all those up, the time actually spent within idwt_single was only 38.8% of the total execution time for idwt.
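As a rough illustration (this is a timeit sketch, not the line-profiler numbers quoted above), the per-call overhead of these input checks can be measured directly; for small inputs, a handful of such checks can rival the cost of the transform itself:

```python
import timeit

import numpy as np

a = np.random.rand(256)

# Time the per-call overhead of the kinds of input checks idwt performs.
# Each call is cheap in absolute terms, but several together can exceed
# the cost of the transform itself on small arrays.
n = 100_000
for stmt in ("np.iscomplexobj(a)", "np.asarray(a, dtype=np.float64)"):
    t = timeit.timeit(stmt, globals={"np": np, "a": a}, number=n)
    print(f"{stmt}: {t / n * 1e9:.0f} ns per call")
```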

The code now has new features relative to v0.2.2 and those are not entirely without cost:

  • it works for more input dtypes (integer, complex, etc)
  • transforms along a specific axis are supported

@grlee77
Contributor Author

grlee77 commented Mar 1, 2016

Another micro-optimization would be to move the axis checks into the else case of the if ndim == 1 statement. However, this would break one of the current tests, which expects that calling idwt on a 1D array with an invalid axis of 1 raises a ValueError.

@grlee77
Contributor Author

grlee77 commented Mar 1, 2016

@kwohlfahrt
No guarantees, but if I find some time later this evening I will see about rebasing onto the ASV branch.

@zstomp

zstomp commented Mar 1, 2016

The reason I'm asking is that when I profiled, the iscomplexobj call took about 30% of the iswt time, and it looked as if the slice in iswt was affecting it (I didn't notice any significant change with the other occurrences of iscomplexobj). So I am not sure whether iscomplexobj checks the dtype of the whole slice or goes through its elements.
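For what it's worth, basic slicing returns a view that carries the same dtype as the parent array, so iscomplexobj never needs to go through the elements of a slice:

```python
import numpy as np

a = np.zeros(1_000_000, dtype=np.float64)
s = a[::2]  # basic slicing returns a view with the same dtype

# The check is a dtype lookup, so it costs the same for the slice
# as for the full array; no elements are ever inspected.
print(np.iscomplexobj(s), s.dtype)  # False float64
```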

@grlee77
Contributor Author

grlee77 commented Mar 2, 2016

Here is an ASV plot of the iswt benchmark proposed in #158. These plots show 4 commits, with the leftmost being current master (10b13cb) and the next three being the 3 commits proposed above. The biggest gain (for both Python 2 and 3) is due to the first commit above. For some reason the third commit has a substantial benefit on Python 3, but not so much on 2.

Here is test_idswt on a linear scale:
[plot: idswt_linscale]

Here is test_idswt again, but on a log scale:
[plot: idswt_log_scale]

The first commit here also improves the test_idwt and test_dwt benchmarks (the last two commits don't touch code related to those functions and can be ignored below). There is no difference on any of the commits for test_idwtn, test_dwtn, or test_swt.

Here is the dwt result on a linear scale:
[plot: dwt_lin]

and idwt on a linear scale:
[plot: idwt_lin]

Based on these results, it seems that avoiding declaring the arrays as memoryviews followed by np.asarray() conversion has a pretty substantial benefit. I don't think there is a strong need for memoryviews, as we are not using any fancy indexing in the Cython code for idwt or dwt.
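As a rough stand-in for Cython typed memoryviews (Python's built-in memoryview is not identical, but it illustrates the wrapper overhead), the extra cost of round-tripping through np.asarray can be timed:

```python
import timeit

import numpy as np

a = np.random.rand(64)
mv = memoryview(a)  # rough stand-in for a Cython typed memoryview

# np.asarray on an ndarray is essentially a no-op, while rebuilding an
# ndarray from a buffer/memoryview must construct a new array wrapper.
n = 100_000
t_nd = timeit.timeit("np.asarray(a)", globals={"np": np, "a": a}, number=n)
t_mv = timeit.timeit("np.asarray(mv)", globals={"np": np, "mv": mv}, number=n)
print(f"asarray(ndarray):    {t_nd / n * 1e9:.0f} ns")
print(f"asarray(memoryview): {t_mv / n * 1e9:.0f} ns")
```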

@grlee77
Contributor Author

grlee77 commented Mar 2, 2016

I rebased off of #158 to generate the figures above, but haven't pushed the rebased branch here. I can do so if you think it is useful, but it is trivial to do the rebase on your end if you want to test it out since there are no overlapping changes between the branches.

@kwohlfahrt
Member

Damn. Oh well, looks like the first commit is too big to ignore. If I find a way to use memoryviews that is equally fast I'll do that, but I don't have time to look into it at the moment. The second one is sensible even if it doesn't give much performance benefit, so I'm happy to merge that too.

Still not a fan of the last one; I really don't like making dwt less neat to work around swt not being properly implemented as an independent transform. Might be best to just live with it being a little slower on Python 3 than it could be for now?

@grlee77 grlee77 mentioned this pull request Mar 3, 2016
@grlee77
Contributor Author

grlee77 commented Mar 4, 2016

Okay, I rebased without the 3rd commit.


-cpdef idwt_single(data_t[::1] cA, data_t[::1] cD, Wavelet wavelet, MODE mode):
-    cdef data_t[::1] rec
+cpdef idwt_single(np.ndarray cA, np.ndarray cD, Wavelet wavelet, MODE mode):
Member

Can you double check whether there is any performance impact to using a memoryview for the inputs? The advantage is that Cython will automatically check to make sure the dtypes match.

Member

Actually, I've just checked this, and it looks like idwt_single currently works correctly with different types for the input arrays, which according to the documentation should not be the case.

Edit: Missed a bit - it says it can choose the biggest corresponding numerical type here. I'm surprised that applies to arrays as well!

Contributor Author

If I recall correctly, prior to Cython 0.21 or so, using fused types created code for all possible combinations of the input types, but usually that wasn't desired, so it was changed to the present behavior.

Contributor Author

I guess that if the types are already being checked then I can remove the cA.dtype != cD.dtype check I had added.

I will check later this evening or weekend if changing just the inputs back to memory views has any effect.

Member

I have Cython 0.23 installed, which is why I am confused that the following works:

>>> import pywt, numpy as np
>>> w = pywt.Wavelet('haar')
>>> m = pywt.Modes.zero
>>> a = np.arange(10, dtype='double') ** 2
>>> cA, cD = pywt.dwt(a, w, m)
>>> pywt._extensions._dwt.idwt_single(cA.astype('float'), cD.astype('double'), w, m)
<MemoryView of 'ndarray' at 0x7f92de51c398>
>>> np.asarray(_)
array([  0.,   1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81.])
>>> a
array([  0.,   1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81.])

My impression was that it shouldn't, and if something was going wrong (i.e. a buffer of floats interpreted as doubles) the result should be incorrect.

Contributor Author

I tried using a memoryview for just the inputs to idwt_single, leaving the result as np.ndarray, but this negated a large portion of the speedup (final commit of the plot below):

For time_idwt:
[plot: memoryview_revert]

Member

Huh, very odd. Leave it in then, I'll investigate it later. Thanks for checking it out though.

Contributor Author

Also, I figured out why your example above succeeds: .astype('float') and .astype('double') both give arrays with dtype np.float64, since the default float is 64-bit in Python.

If I modify the call as follows, we see the expected failure:

pywt._extensions._dwt.idwt_single(cA.astype(np.float32), cD.astype('double'), w, m)
  File "pywt/_extensions/_dwt.pyx", line 101, in pywt._extensions._dwt.idwt_single (pywt/_extensions/_dwt.c:5454)

  File "pywt/_extensions/_dwt.pyx", line 112, in pywt._extensions._dwt.idwt_single (pywt/_extensions/_dwt.c:4961)

ValueError: Coefficients arrays must have the same dtype.
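The dtype aliases can be checked directly; 'float' and 'double' both resolve to the same 64-bit dtype, which is why the original example passed the type check:

```python
import numpy as np

# 'float' and 'double' are aliases for the same 64-bit dtype in NumPy,
# so .astype('float') and .astype('double') produce matching arrays.
print(np.dtype('float') == np.dtype('double') == np.float64)  # True
print(np.dtype(np.float32) == np.dtype('double'))             # False
```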

Member

Ah, I'm an idiot. Mystery solved!

@grlee77
Copy link
Contributor Author

grlee77 commented Mar 8, 2016

The last commit removes a bit of yellow from the cythonize -a HTML annotations for _dwt.pyx.

@kwohlfahrt
Member

OK, +1 from me. Making a note to investigate memoryviews & efficiency later (after we've sorted out the benchmarking issue).

grlee77 added a commit that referenced this pull request Mar 18, 2016
fix: iswt/idwt performance regression
@grlee77 grlee77 merged commit 416c697 into PyWavelets:master Mar 18, 2016
@grlee77 grlee77 added this to the v0.5.0 milestone Oct 6, 2016
@grlee77 grlee77 deleted the dwt_idwt_speed branch October 8, 2016 21:16