-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Update scipy hook to support 1.0.0 revision #3048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test failures due to #3045. |
|
The Appveyor tests may also fail because they ran for over an hour, which seemed to be a hard limit for my account. When I forcibly reran the Python2.7 Library tests, it succeeded but took ~55 minutes. |
|
That's unreasonable. The current test takes 16 minutes; that would mean that the scipy test alone takes ~40 minutes. If that's happening, something is wrong. |
|
I'm referring to the configuration with TEST_LIBRARIES=YES. In the other configuration, they do take about 15 minutes. There aren't many timestamps in the Appveyor build-log, but here's the link: According to the "slowest tests" table at the end the scipy tests took around two minutes a piece, and there are only four scipy tests. It's running all the library tests, though, which is around 150 tests. If we assume that every test takes on average 15 seconds, we're at 37 minutes already. There were other worse offenders, like openpyxl and pandas which took almost 3 minutes. |
|
I'm not sure I have a solid grasp of the state for each test, but given that the Travis builds ran in 30 minutes I'm tempted to go down a path that checks if the scipy hook is getting picked up on every test, but I'm not sure how to do that. There is a lot of runtime magic to trace through, and it's unclear how tests are isolated from each other (or the test harness program for that matter). Does this sound like a reasonable line of inquiry given that I made so few changes, and if so, which module makes sense to start in? |
|
Don't spend extra time on this now. I'm going to submit a PR to reduce the test time and then we'll see where we are. |
|
I wrote two test scripts, one that imported scipy and one that did not and ran PyInstaller on them. Both executables bundled the scipy DLLs. I see now that I didn't realize that by adding that hook, it would be executed every time PyInstaller ran, not just every time a script with scipy was analyzed. So those DLLs were added to every test app on Windows, not just the scipy tests' apps. |
|
Wait, what was the test app that did not import SciPy? What is the exact script? |
|
Here is the test script with scipy import numpy as np
from scipy import linalg
from scipy.optimize import leastsq
print(linalg.pinv(np.eye(100)))
def gaussian(x, m, s2):
return 1. / np.sqrt(np.pi * s2 * 2) * np.exp((-(x - m) ** 2) / (2 * s2))
def generate_gaussian(x, m, s2):
return gaussian(x, m, s2)
x = np.linspace(-10, 10)
y = generate_gaussian(x, 0.5, 5)
def fit_gaussian(params, x, y):
m, s2 = params
yhat = gaussian(x, m, s2)
return y - yhat
params, _ = leastsq(fit_gaussian, (0, 1), (x, y))
print(params)Here is the test script without scipy import numpy as np
def gaussian(x, m, s2):
return 1. / np.sqrt(np.pi * s2 * 2) * np.exp((-(x - m) ** 2) / (2 * s2))
def generate_gaussian(x, m, s2):
return gaussian(x, m, s2)
x = np.linspace(-10, 10)
y = generate_gaussian(x, 0.5, 5)
print(y.mean(), y.sd()) |
PyInstaller/hooks/hook-scipy.py
Outdated
|
|
||
| # package the DLL bundle that official scipy wheels for Windows ship | ||
| if "windows" in platform.platform().lower(): | ||
| dll_path = os.path.join(os.path.dirname(scipy.__file__), 'extra-dll', "*.dll") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
from PyInstaller.utils.hooks import get_module_file_attribute
get_module_file_attribute('scipy')|
It seems not all scripts fire the scipy hook. A script with just the standard library didn't trigger that hook. I would have expected scipy to trigger including numpy, but not the other way around. |
|
Can you post the trace log as a gist? It's |
|
Attempting to make a gist with both logs was throwing a 405 error no scipy: https://gist.github.com/mobiusklein/d35e803065ec72b6b929c293aa86cd8e |
|
This is where scipy is coming from: |
|
I don't see an explicit import in that file on https://github.com/numpy/numpy/blob/master/numpy/testing/nose_tools/nosetester.py |
|
There is indeed an import in a class method https://github.com/numpy/numpy/blob/master/numpy/testing/nose_tools/nosetester.py#L246 |
|
That makes sense. The file has been moved in HEAD, but is still there in the released version. |
|
What we need to do is add a |
PyInstaller/hooks/hook-numpy.py
Outdated
|
|
||
| # don't bundle the testing module as this will cause | ||
| # `scipy` to be pulled in unintentionally | ||
| excludedimports = ['numpy.testing'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this should be:
from PyInstaller.utils.hooks import collect_submodules
excludedimports = collect_submodules('numpy.testing')
If I understand correctly.
|
It appears that after excluding numpy.testing, using either mode of expression, numpy becomes unimportable: I'll have to re-enable it for now. The offending line: https://github.com/numpy/numpy/blob/master/numpy/__init__.py#L151 |
|
Oh well. But would you mind modifying the numpy hook to match the scipy hook? These DLLs will be in the next numpy release as well. |
|
Sure. Since numpy drags scipy in, and numpy is used by many, many big C extension libraries, it follows that the scipy hook is firing more often than I expected. I'm still not sure how that translates to nearly doubling the runtime of the test suite beyond adding a few dozen DLLs to probe on a file system walk. Does PyInstaller do more with DLLs than copy them around? |
|
Actually, since they're added to binaries, they're run through pefile to inspect DLL dependencies. That would explain most of the extra time. |
|
@htgoebel What's the best way to solve this? |
|
Thank you for helping me debug this in real-time. I've only written hooks for my own libraries by trial-and-error. I'll have to pick this up again tomorrow. |
htgoebel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this pull-requests. I have some clean-up requests. Some of the comments for the numpy hook are valid for the scipy-hook, too.
When you finished, please cleanup the commits. There should be two: one for each hook. Please also follow the commit-message guide (https://pyinstaller.readthedocs.io/en/latest/development/commit-messages.html#commit-messages) or have a look at other "Add hook" commits.
Thanks.
PyInstaller/hooks/hook-numpy.py
Outdated
| binaries = [] | ||
|
|
||
| # package the DLL bundle that official scipy wheels for Windows ship | ||
| if "windows" in platform.platform().lower(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use PyInstaller.compat.is_win
PyInstaller/hooks/hook-numpy.py
Outdated
|
|
||
| # package the DLL bundle that official scipy wheels for Windows ship | ||
| if "windows" in platform.platform().lower(): | ||
| dll_path = os.path.join(os.path.dirname(get_module_file_attribute('numpy')), 'extra-dll', "*.dll") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest naming this dll_glob sind it is no path, but a glob.
| @@ -0,0 +1,15 @@ | |||
| import os | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright header missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mobiusklein Did you see this feedback?
| import platform | ||
| from PyInstaller.utils.hooks import collect_submodules, get_module_file_attribute | ||
|
|
||
| # if we bundle the testing module, this will cause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "If wee bundle" sounds crooked. I suggest rewording into somethink like: "Including the testing module, this would …"
| # if we bundle the testing module, this will cause | ||
| # `scipy` to be pulled in unintentionally but numpy imports | ||
| # numpy.testing at the top level for historical reasons. | ||
| # excludedimports = collect_submodules('numpy.testing') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this line commented out? Looks reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy/__init__.py imports numpy.testing and instantiates a pair of benchmarking objects for historical reasons.
PyInstaller/hooks/hook-numpy.py
Outdated
|
|
||
| binaries = [] | ||
|
|
||
| # package the DLL bundle that official scipy wheels for Windows ship |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scipy? THis is the numpy hook?!
| import platform | ||
|
|
||
| binaries = [] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line too much.
| @@ -0,0 +1,15 @@ | |||
| from PyInstaller.utils.hooks import get_module_file_attribute | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line. Please also move the PyInstaller import to the end. BTW: platform will no be required when using compat.is_win.
PyInstaller/hooks/hook-numpy.py
Outdated
| @@ -0,0 +1,15 @@ | |||
| import os | |||
| import platform | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
platform will no be required when using compat.is_win.
should be okay, given that this is the only module of the testing package imported superfluously. |
|
I can make these changes, but the tail end of the exploration @xoviat and I did found that we must include This may explain why the tests are taking so long on Windows, since Since this adds ~30 DLLs to the bundle that all have to be inspected as @xoviat mentioned, it's unclear if there is an alternative solution that avoids this expensive step. Furthermore, it sounds like these same DLLs will be put in the |
|
One solution may be to patch numpy with a runtime hook. |
|
Yes, I just screwed up trying to get rid of all those commits. I'm about to start adding the properly committed files if I haven't destroyed things. |
|
Wow, those commits are gone for good! Hopefully you can recover the changes. |
|
I have the files backed up. I've never tried to rewrite published history before. |
|
It looks like you're even with develop, so you can just copy and paste each file from your backup into the repository and commit. Simple but effective. |
|
Did you see some of the feedbacck in the review above? Particularly the copyright header? |
|
I appear to have added it to the scipy hook, but not the NumPy one. Fortunately, this I could just --amend. |
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not perfect, but good enough for me.
|
Nevermind, the backed up files were the wrong version. I'll get it right tomorrow. |
|
If you want to go for another round, please make sure that the files comply with flake8. Particularly the lines are < 80 characters. |
tests/requirements-libraries.txt
Outdated
| # isolated to non-Windows platforms. See also: scipy/scipy#5461 | ||
| scipy==0.19.1; sys_platform != 'win32' | ||
| # SciPy now provides binary wheels for Windows | ||
| scipy==1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, probably should be ;python_version != 3.3.
|
Also, I don't think we can add scipy to requirements.txt due to the significant increase in test time. |
* Added hidden imports for library-wide C-extensions * Added conditional inclusion of DLL bundle included in official Windows wheel
|
Hopefully I've addressed the issues in the format of the hooks and the commits. I've held off on updating the test requirements until discussion confirms whether they should be added. I don't think it's a big time cost for non-Windows builds, but it might be worth noting that the hooks for pandas, h5py, and matplotlib definitely pull in numpy, and openpyxl and PyQT may pull in numpy. That in turn pulls in scipy. |
* Added conditional inclusion of DLL bundle planned to be included in an upcoming official Windows wheel * Noted an unresolvable coupling between numpy and scipy
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It's always possible to avoid the SciPy inclusion with a clean virtualenv, and I don't think this should be blocked on that.
|
@htgoebel Did I make the changes you requested? |
|
@htgoebel I'm not sure what else is needed in order for this to be merged. Did I miss something? |
This hook will add the new private
scipy._libC-extension modules to the archive, and when building on Windows, include the DLL bundle used by the official wheel provider stored insite-packages/scip/extra-dll