Conversation

@tylerjereddy
Contributor

Alternative to #12994

As discussed in #12667 by @rgommers, this may have the advantage of not bricking SciPy / ecosystem projects, since it confines the "guaranteed" openblas-related DLL load to numpy/core instead of changing the distutils machinery.

* add an Azure CI test to mimic recent
Anaconda DLL path resolution restrictions
that prevent NumPy imports

* patch numpy.core initialization to use
absolute DLL path loading to overcome
the scenario described above
@tylerjereddy tylerjereddy changed the title WIP, BLD: Windows absolute path DLL loading BLD: Windows absolute path DLL loading Feb 22, 2019
@tylerjereddy
Contributor Author

CI is all green -- I think this is looking like the safer alternative now?

@tylerjereddy
Contributor Author

I'm not sure that we need to remove the PATH mangling block in numpy/distutils/misc_util.py, so I haven't done that here yet. Matti seemed to think that was important for some reason in the alternative PR.

If the DLL is already in memory from the core load, I don't see the problem, but maybe that just needs to be clarified. Mutating PATH may be ugly, but there is also something to be said for making minimal adjustments to distutils.

@charris charris added 09 - Backport-Candidate PRs tagged should be backported 29 - Intel/Anaconda labels Feb 22, 2019
@charris charris added this to the 1.16.2 release milestone Feb 22, 2019
@rgommers
Member

I don't think this is the fix we were discussing? Maybe my understanding is wrong - it's hard to keep track of all the issues/discussions. But what I thought was that the generic solution that will work for both numpy and scipy is to keep using .libs and add .libs to the search path (rather than explicitly loading DLLs), along the lines discussed at ContinuumIO/anaconda-issues#10628 (comment). That discussion is not yet complete though.

Also, there's this: #10229 (comment) (related to our previous attempt, and to my comment above - which seems not to be correct).

Okay, my conclusion for now is that (a) I need a better summary so I don't get confused, and (b) this is still a numpy-specific fix that doesn't affect scipy and other packages either way.

For now it's not clear to me how best to fix numpy.distutils, and there are likely still Anaconda changes (or a full revert) in the pipeline. So I suspect we'd better go with gh-12994 right now and figure out the rest later. pip install scipy in conda will also be broken right now, but users aren't running into that yet, so let's deal with pip install numpy first.

@tylerjereddy
Contributor Author

This is from @zooba's comment:

The second most reliable way to have the loader use dependencies from a separate directory is to explicitly load them first using LoadLibrary[Ex] with a full path before importing the module that needs them

Not modifying any search path, just loading with the full path to the DLL. His first choice was just copying the DLL right into core.
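For reference, a minimal sketch of that preload approach as it might appear in a package __init__ (the .libs layout matches our wheels; treat the names as illustrative rather than the exact code merged here):

```python
import os
import ctypes

# Load the bundled DLLs by absolute path *before* importing the extension
# modules that depend on them. Once a DLL is in memory, the Windows loader
# resolves further references by name without consulting the search path.
libs_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '.libs'))
if os.path.isdir(libs_dir):
    for filename in os.listdir(libs_dir):
        if filename.lower().endswith('.dll'):
            ctypes.WinDLL(os.path.join(libs_dir, filename))
```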

@tylerjereddy
Contributor Author

Yeah, this deliberately tries to avoid distutils. If SciPy needs access to something/.libs for DLLs, then various SciPy modules might be initialized with explicit loads on that absolute/resolved path to circumvent the conda adjustment to the DLL search path (rather than mangling the user's conda DLL search path again).

If SciPy decides not to do that (it may be a LOT of __init__.py adjustments), I don't think this PR has any bearing on that?

@rgommers
Member

If SciPy needs access to something/.libs for DLLs, then various SciPy modules might be initialized with explicit loads on that absolute/resolved path to circumvent the conda adjustment to the DLL search path (rather than mangling the user's conda DLL search path again).

It starts to look like that's what is needed. Unless the conda change gets reverted or amended to pick up dynamic PATH adjustments, which may still happen.

* Test for DLL pollution in numpy/.libs/ in Azure
CI Windows Python 3.6 builds

* emit a warning with DLL names when > 1 are present
in numpy/.libs/
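A hedged sketch of what that pollution warning might look like in the package __init__ (the message wording and exact condition are illustrative):

```python
import glob
import os
import warnings

# If more than one DLL ends up in numpy/.libs/, warn and list them: stale
# or conflicting copies left behind by older installs could otherwise be
# picked up silently.
libs_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '.libs'))
dlls = glob.glob(os.path.join(libs_dir, '*.dll'))
if len(dlls) > 1:
    warnings.warn("loaded more than 1 DLL from .libs: %s" % "\n".join(dlls),
                  stacklevel=1)
```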
@zooba
Contributor

zooba commented Feb 23, 2019

FYI, since a lot of the relevant people seem to be here, I filed https://bugs.python.org/issue36085 to consider making a potential breaking change in Python 3.8 to help address this. All input is welcome!

@mattip
Member

mattip commented Feb 23, 2019

The reason I asked to remove the os.environ['PATH'] modification is to prove that this actually works; otherwise, for all we know, the WinDLL load is not actually doing anything and the previous mechanism is still the one in play.
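For context, the distutils-side mechanism being debated is a PATH prepend roughly along these lines (a paraphrase, not the exact misc_util.py code):

```python
import os

# Prepend the bundled DLL directory so that, under the classic Windows
# search order, dependencies in numpy/.libs are found by name via PATH.
# Anaconda's patched interpreter stopped consulting PATH, which is why
# this mechanism alone no longer worked there.
extra_dll_dir = os.path.join(os.path.dirname(__file__), '.libs')
if os.path.isdir(extra_dll_dir):
    os.environ['PATH'] = extra_dll_dir + os.pathsep + os.environ.get('PATH', '')
```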

@tylerjereddy
Contributor Author

For both PRs the workflow was like this:

  • on my fork, add a new Azure CI test to mimic Anaconda blocking NumPy access to numpy/.libs (NumPy cannot find any DLL outside the system32 folder via a PATH search; see the sketch after this list)
  • NumPy now fails CI because you can't import numpy, with a DLL resolution failure
  • add one of the two proposed fixes Steve Dower discussed
  • CI passes because DLL resolution now succeeds
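One way a CI step can mimic that restriction in-process (a sketch: it assumes the Anaconda patch behaves like a SetDefaultDllDirectories call; the flag value is LOAD_LIBRARY_SEARCH_DEFAULT_DIRS from the Windows API docs, and the actual Azure job may have used a different mechanism):

```python
import ctypes

# Restrict the DLL search path the way Anaconda's patched interpreter did:
# only the application directory, System32, and directories added via
# AddDllDirectory are searched -- PATH is ignored.
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000
ctypes.windll.kernel32.SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS)

import numpy  # fails with a DLL load error if numpy relies on PATH alone;
              # succeeds once the absolute-path preload is in place
```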

How can either fix not be doing anything if they both restore a working NumPy import through the new Azure test?

I don't see how the PATH code plays into any of that. There's a clear negative test result -> positive test result from the changes in isolation.

If the PATH code had enough influence to restore access to .libs, then how did Anaconda circumvent it?

@mattip
Member

mattip commented Feb 23, 2019

add a new Azure CI test to mimic Anaconda blocking NumPy access to numpy/.libs

Sorry, I missed that part in the changes here. Anyhow, the PATH changes are now not needed, so why leave the extra code?

@tylerjereddy
Contributor Author

why leave the extra code?

Just to minimize the impact on distutils.

@charris
Member

charris commented Feb 25, 2019

@rgommers et al. Is this the preferred solution?

@rgommers
Member

@rgommers et al. Is this the preferred solution?

I think a variant of gh-12994 is better long-term, but this is mergeable as is and gh-12994 isn't yet.

@charris charris merged commit 06c86bc into numpy:master Feb 26, 2019
@charris
Member

charris commented Feb 26, 2019

Let's give this a shot and see if it stops the bleeding. Thanks Tyler.

@charris
Member

charris commented Feb 26, 2019

I wonder where Anaconda puts the DLLs?

@tylerjereddy tylerjereddy deleted the windows_conda_dll_option2 branch February 26, 2019 00:37
@rgommers
Member

I wonder where Anaconda puts the DLLs?

They don't move them around; DLLs from a package build stay within the package install dir.

Of course in this particular case there is no static linking, so the problematic OpenBLAS DLL and gfortran DLL are not shipped with numpy but provided by the openblas and libgfortran packages.

Don't have a Windows machine to check, but it's probably the same as on Linux: they end up at anaconda3/envs/<envname>/lib/libopenblas.so

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Feb 26, 2019
@charris charris removed this from the 1.16.2 release milestone Feb 26, 2019
@cgohlke
Contributor

cgohlke commented Feb 27, 2019

I see two issues with this PR:

  1. The code is distribution-specific (it doesn't even apply to default builds of numpy) and belongs in _distributor_init.py

  2. __file__ may not be defined in frozen environments.

@tylerjereddy
Contributor Author

default builds of numpy

This was mostly targeted at alleviating a specific scenario: you install from one of our pre-built wheels and then, as Anaconda temporarily did, something restricts access to numpy/.libs for DLL resolution.

In fact, I wasn't really targeting builds at all--just the way the files copied from the pre-built wheel behave when resolving DLLs on Windows.

The file you mention clearly indicates:

The numpy standard source distribution will not put code in this file, so you
can safely replace this file with your own version.

It sounds like distributors are free to do that as they see fit.

@rgommers
Member

The code is distribution-specific (it doesn't even apply to default builds of numpy) and belongs in _distributor_init.py

This is a good point; arguably we should move this bit of code to the numpy-wheels repo. That is more or less our "default build", but this code isn't useful outside of numpy-wheels. (We didn't break distributor builds by including it here, though.)

On second thought: we are going to change to another final solution anyway, so let's not move it to numpy-wheels right now.

__file__ may not be defined in frozen environments.

Hmm, I wasn't aware of that. Looks like we also have a usage of __file__ in scipy.optimize.

@cgohlke
Contributor

cgohlke commented Feb 27, 2019

default builds of numpy

Builds from the source distribution without a site.cfg, hence without OpenBLAS, MKL, or BLIS.

@rgommers
Member

rgommers commented Feb 27, 2019

__file__ may not be defined in frozen environments.

Hmm, I wasn't aware of that. Looks like we also have a usage of __file__ in scipy.optimize.

Okay, the problem is that there isn't a good alternative; Python is a bit braindead in this case. In many places one can find discussions about __file__ not being defined in a number of cases; however, the only suggestions for how to load data files are to embed them inside Python code or to use pkg_resources, and neither of those is a reasonable solution.

There's a Python issue for this: https://bugs.python.org/issue21736

EDIT: in this case, just wrapping in try-except is probably the way to go.
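A minimal sketch of that guard (illustrative):

```python
import os

try:
    basedir = os.path.dirname(os.path.abspath(__file__))
except NameError:
    # __file__ can be undefined in frozen environments; skip the preload
    # and let the re-packager supply its own initialization.
    basedir = None

if basedir is not None:
    pass  # proceed with the absolute-path DLL loads from basedir
```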

@tylerjereddy
Contributor Author

@rgommers

arguably we should move this bit of code to the numpy-wheels repo.

Why? Our wheels aren't used exclusively by conda--how do you target just conda wheel usage? Another consumer of wheels could make the same argument that they don't brick DLL resolution, so we'd need yet another layer of abstraction.

And the fix isn't just for conda; it just happens to be motivated by relieving the type of scenario they caused.

And is it really desirable to introduce dissonance between the dev and wheels workflows this way? It may be possible to dig up other pieces of code that primarily benefit wheels and present a similar argument, generating further dissonance / abstraction.

@cgohlke
Contributor

cgohlke commented Feb 27, 2019

Re _distributor_init.py, this is how it is described in the release notes, which I think applies to this case:

Hook in numpy/__init__.py to run distribution-specific checks

Binary distributions of numpy may need to run specific hardware checks or load specific libraries during numpy initialization. For example, if we are distributing numpy with a BLAS library that requires SSE2 instructions, we would like to check the machine on which numpy is running does have SSE2 in order to give an informative error.

Add a hook in numpy/__init__.py to import a numpy/_distributor_init.py file that will remain empty (bar a docstring) in the standard numpy source, but that can be overwritten by people making binary distributions of numpy.

Re __file__: in cases like this I think it is OK to wrap its usage in a try: ... except block and let re-packagers handle distribution-specific initialization code if it fails. NumPy cannot know where and which DLLs end up in a frozen or re-packaged environment.

@rgommers
Member

Why? Our wheels aren't used exclusively by conda--how do you target just conda wheel usage?

This doesn't sound quite right. Admittedly, it's complex and very much under-documented. Conda doesn't use wheels. This code is specific to the wheels that we build with the numpy-wheels repo and put on PyPI; it doesn't make sense for conda packages, and it doesn't make sense for other wheels that users or other distributions build themselves.

Re _distributor_init.py, this is how it is described in the release notes, which I think applies to this case:

Yes, I think you're right. We should not modify the _distributor_init.py in this repo, though; it's empty by design. If we move things to numpy-wheels, then yes, that's where it should live.

@tylerjereddy
Contributor Author

We should not modify the _distributor_init.py in this repo, though

If the change is transplanted to the wheels repo, can we pull it in for one of our dev CI runs as well to retain harmony there?

There's OpenBLAS thread handling behavior in the core __init__ file too--I guess I'm not clear on how that is relevant to all "general" / "default" builds either, since they may use another (or no) linalg backend.

@rgommers
Member

I guess I'm not clear on how that is relevant to all "general" / "default" builds either, since they may use another (or no) linalg backend.

That's applicable to other OpenBLAS-based builds too, though (it should be correct and needed for any user who builds from source against OpenBLAS), not just our numpy-wheels builds.

default builds of numpy

Builds from the source distribution without a site.cfg, hence without OpenBLAS, MKL, or BLIS.

For the record, this is not quite accurate. numpy.distutils will try all of those in the absence of a site.cfg, and will pick the first one found in the order: MKL, BLIS, OpenBLAS, ATLAS, Accelerate. See https://github.com/numpy/numpy/blob/master/numpy/distutils/system_info.py#L1625
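For anyone who would rather pin the backend explicitly than rely on that search order, a site.cfg along these lines selects OpenBLAS (the paths are illustrative):

```
[openblas]
libraries = openblas
library_dirs = C:\opt\openblas\lib
include_dirs = C:\opt\openblas\include
```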

@cgohlke
Contributor

cgohlke commented Feb 28, 2019

numpy.distutils will try all of those in the absence of a site.cfg, and will pick the first one found in the order: MKL, BLIS, OpenBLAS, ATLAS, Accelerate

Is that practically true for CPython/Win32 from python.org (not msys, WSL, cygwin, Anaconda)? I have a dozen versions of MKL, BLIS, OpenBLAS, and LAPACK headers and libraries on my Windows system, but py -m pip install numpy --no-binary :all: (for example) finds none of them unless I explicitly enable a specific build environment beforehand or provide a site.cfg, which is actually a good thing.

@rgommers
Member

Probably not - a number of standard locations are searched, but I'm not sure if those are reasonable on Windows.

CPython/Win32 from python.org

We try to encourage users to use a distribution rather than the python.org installers, because those installers are a recipe for disaster unless you know exactly what you're doing. So python.org Python isn't default or special anymore; it's just one of many options.
