Skip to content

Conversation

@tylerjereddy
Copy link
Contributor

@tylerjereddy tylerjereddy commented Feb 20, 2019

Fixes #12667

The simplest & apparently favored fix for failed NumPy imports on (conda) Windows proposed in #12667 is to ensure that requisite DLL(s) are physically present in the numpy/core path.

Before trying a fix that moves the DLL path(s) within a built NumPy wheel, I thought it might be sensible to have a "test" that tries to mimic the conda-related import failures described in the above issue.

Not sure if this is too aggressive, by confining DLL search resolution to system32, but it does seem to nicely generate an ImportError: DLL load failed.

I guess the answer to "is this too aggressive?" is if the physical placement of the DLL(s) at the core path restores the functional numpy import.

@tylerjereddy
Copy link
Contributor Author

Travis missing from CI list it seems -- wonder if that's related to the "draft PR" or something else.

@tylerjereddy
Copy link
Contributor Author

same for circleci -- maybe a longer queue, or related to the draft?

@tylerjereddy
Copy link
Contributor Author

I added a commit with the distutils changes that seemed to restore all-green Azure CI in my debug branch, overcoming the DLL path resolution narrowing from the first commit.

Looks like Shippable really doesn't like this change at first glance! I don't even see suitable error output there.

@mattip
Copy link
Member

mattip commented Feb 20, 2019

Closing/Reopening to restart shippable builds

@mattip mattip closed this Feb 20, 2019
@mattip mattip reopened this Feb 20, 2019
@mattip
Copy link
Member

mattip commented Feb 20, 2019

We should also remove the code that modifies os.environ['PATH'] here and make sure the linalg module is loading the _umath_linalg c-extension. I am pretty sure it will fail, which means this approach cannot be used. :(

@tylerjereddy
Copy link
Contributor Author

You mean remove the line that I changed instead of changing it? Pretty sure I've adjusted that line you link to in this PR, right?

Travis still not triggered for draft PR it seems, which means codecov isn't either.

@tylerjereddy
Copy link
Contributor Author

If an inappropriate extension is currently being loaded, shouldn't a unit test fail somewhere for linalg?

@mattip
Copy link
Member

mattip commented Feb 20, 2019

This chunk of code modifies os.environ['PATH'] at startup https://github.com/numpy/numpy/blob/v1.16.1/numpy/distutils/misc_util.py#L2298 should be removed as part of this PR,

I am pretty sure this change will cause linalg to fail to load, since numpy/linalg/_umath_linalg*.pyd also links to the dll we moved into numpy/core

* OpenBLAS-related DLL is now placed in numpy/core
instead of numpy/.libs to alleviate issues with
DLL resolution that have cropped up in Anaconda
environments

* added a test in Azure pipelines workflow to
mimic the Anaconda DLL resolution mangling
@tylerjereddy tylerjereddy force-pushed the windows_conda_dll_handling branch from 04b5dff to fbb93eb Compare February 20, 2019 18:14
@tylerjereddy tylerjereddy marked this pull request as ready for review February 20, 2019 18:15
@tylerjereddy
Copy link
Contributor Author

tylerjereddy commented Feb 20, 2019

Ok, I removed that chunk of code and everything works just fine for Azure Windows on my branch, so I've pushed the revisions to this branch now & lifted draft status on PR so Travis can run too.

Note that there's ongoing debate between @rgommers & @zooba in the linked issue as to whether this approach is really sustainable for the broader scientific Python ecosystem, though I'm not sure if Ralf is implying we can't do this for NumPy for now.

Anyway, I'll leave this open for now--I'm expecting the CI to pass based on my debug work on another branch, but I guess we'll see.

Not sure if we'd want to make an actual unit test in the test suite instead of a CI stage intervention for "testing" if we do go this route.


# Setup directory for storing generated extra DLL files on Windows
self.extra_dll_dir = os.path.join(self.build_temp, '.libs')
self.extra_dll_dir = os.path.join(self.build_temp, 'core')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes from a generic solution to something that's specific to numpy itself. I think we can do require each package to add some code in combination with changes here, but we can't just make numpy-specific directory layout assumptions here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if other packages are using package root level .libs folder we'd probably need to retain that default and either have a generic way to offer a customized switch for the DLL storage folder, or bake in a switch if building NumPy only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scipy-wheels is still on a commit before 473a6f0, so it still has extra-dll rather than .libs. But it looks like we'll need to upgrade that repo to latest numpy master to get the fix for this PATH issue.

@rgommers
Copy link
Member

though I'm not sure if Ralf is implying we can't do this for NumPy for now.

not sure yet - but as in my other comment, this numpy.distutils code isn't just used by NumPy itself so we need to be clear what other packages like SciPy need to change if/when this PR goes in

@tylerjereddy tylerjereddy changed the title BLD: Test (break) NumPy import from wheel install by obscuring DLL resolution WIP, BLD: Test (break) NumPy import from wheel install by obscuring DLL resolution Feb 20, 2019
@mattip
Copy link
Member

mattip commented Feb 20, 2019

What if we add a extra_dll directory into site.cfg, and for numpy set it to numpy/core. That would solve our problem at least, I don't know how scipy would deal with it.

@tylerjereddy
Copy link
Contributor Author

What if we add a extra_dll directory into site.cfg, and for numpy set it to numpy/core

I think that falls within the lexicon of my previous suggestion above to:

have a generic way to offer a customized switch for the DLL storage folder

But want to be cautious here.

@rgommers
Copy link
Member

rgommers commented Feb 20, 2019

What if we add a extra_dll directory into site.cfg, and for numpy set it to numpy/core. That would solve our problem at least, I don't know how scipy would deal with it.

Works in principle, however a practical issue is that by default there is no site.cfg and we don't want to start requiring one.

I do think I have a preference for sticking with .libs or extra-dll rather than putting things into core, with the LoadLibrary call. That's for generality, see #12667 (comment). I think that needs to be in numpy.distutils. However if for NumPy itself you both prefer to go with numpy/core, I think that's fine too. That will be the numpy-specific part that then has to not be the default; it can go e.g. in a setup.py file or in some script in the numpy-wheels repo.

@rgommers
Copy link
Member

This PR does seem to work for NumPy. I think the numpy.distutils discussing will need resolving, however that could be done later. Right now we need to stop the bleeding. What do you think about this plan:

  1. merging this as is
  2. backport to 1.16.x and release 1.16.2 asap
  3. Come back and fix numpy.distutils so it work for SciPy and other packages
  4. backport that fix and put it in 1.16.3 (not very urgent)

That's an extra release to be done, but we're getting multiple bug reports a day that will mostly dry up then, so probably worth it.

@tylerjereddy
Copy link
Contributor Author

I'm just trying the other solution now, but if that doesn't work out swiftly then we may have to I guess

@rgommers
Copy link
Member

I am pretty sure this change will cause linalg to fail to load, since numpy/linalg/_umath_linalg*.pyd also links to the dll we moved into numpy/core

@mattip this turned out to not be an issue right? reason: numpy.core is imported first and loads the dll already, so numpy/linalg/_umath_linalg*.pyd just picks up the already loaded dll.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Feb 22, 2019
@charris charris added this to the 1.16.2 release milestone Feb 22, 2019
@charris
Copy link
Member

charris commented Feb 22, 2019

I marked this for backport just so it doesn't get lost.

@mattip
Copy link
Member

mattip commented Feb 23, 2019

The fix seems good, linalg still works even without the code to modify os.environ['PATH'] so I guess @rgommers's analysis is correct. The plan to release this as 1.16.2 sounds good to me. Similar to linalg, if other packages like scipy needs the dll and imports numpy first it should Just Work, but maybe worth a check before we release this?

@tylerjereddy
Copy link
Contributor Author

This PR will likely break SciPy because of the distutils lib change. The alternate won't.

Each has its own tradeoffs. I think I like the alternate more to reduce impact on the ecosystem by steering clear of distutils.

@charris charris removed this from the 1.16.2 release milestone Feb 26, 2019
@charris
Copy link
Member

charris commented Mar 11, 2019

Do we want to pursue this test? Anaconda/numpy seems to be working OK now, we have the check in the __init__.py file. There is enough flux in this problem in general that it might be better to wait on developments.

@tylerjereddy
Copy link
Contributor Author

We do have the regression test used in Azure CI here already merged to master with the alternative solution.

I'm still not clear on exactly why either solution only affects conda. I think it is more about likelihood of being helpful--technically it seems like a user could do what I did with the test here / there to break a source install by bricking access to .libs.

But this is so obscure beyond the recent Anaconda event that there's some arguments that the fix should be migrated to the wheels repo only. If we do that, I'd probably want to clone in the modified __init__.py for one of the Azure CI jobs to maintain that guard in dev workflow, although that may be overkill vs. check in nightlies / wheels perhaps.

On the other hand, with Anaconda reversing the design decision that caused the issue in the first place, there's even less justification for the fix.

Can the alternative fix actually do harm? Is guarding against this obscure scenario worth it only for wheels or not even for wheels anymore? I like being defensive against the issue possibly happening again, but there was certainly a wide-ranging set of opinions & some solutions affecting downstream things of course.

Probably want to wait a bit for clear consensus.

@rgommers
Copy link
Member

Do we want to pursue this test?

I don't care so much about the test, but we do need to fix the problem before it re-occurs; it will likely be reintroduced for Python 3.8 through the Windows Store even if Anaconda stays as is.

I'm still not clear on exactly why either solution only affects conda.

It doesn't only affect conda.

There is enough flux in this problem in general that it might be better to wait on developments.

There's no urgency at the moment I think, but let's mark it for the next release to not completely forget about it.

@rgommers rgommers added this to the 1.17.0 release milestone Mar 15, 2019
@mattip
Copy link
Member

mattip commented Jun 11, 2019

I think we should close this. #13019 fixed the PATH problem by preloading the dll, and the fix was backported in #13041

@tylerjereddy
Copy link
Contributor Author

I'll just ping @rgommers to double check on that. I think I wasn't quite getting the full picture, but maybe the "sub-optimal" approach is indeed ok.

@rgommers
Copy link
Member

I think the current approach that's in master is probably fine. However there are still two problems that needs solving before the next release: #13019 (comment)

Could one of you tackle those before closing this (or close this and open a new issue for the todos with the 1.17.0 milestone)?

@charris charris removed this from the 1.17.0 release milestone Jun 14, 2019
@charris
Copy link
Member

charris commented Jun 28, 2019

@mattip is this still needed/wanted?

@mattip
Copy link
Member

mattip commented Jun 28, 2019

Closing. This was resolved and is tested differently

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jul 14, 2019
@rgommers
Copy link
Member

rgommers commented Dec 3, 2019

actually closing now

@rgommers rgommers closed this Dec 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Python from windows 10 store does not extend PATH via numpy.__config__.py

4 participants