python: fix and refactor RPATHs for compiled extensions 2#21149

Merged
scheibelp merged 3 commits into spack:develop from skosukhin:fix_python_rpath2
Jan 27, 2021

Conversation

@skosukhin
Member

@skosukhin skosukhin commented Jan 19, 2021

I think that the solution in #16527 is rather complex and still does not cover all possible cases. This PR offers a simpler solution with the same practical value.

Closes #16527
Closes #20765

@skosukhin skosukhin changed the title python: fix and refactor RPATHs for compiled extensions python: fix and refactor RPATHs for compiled extensions 2 Jan 19, 2021
# the first ld in the PATH is the Spack wrapper) or somewhere in the
# middle surrounded with spaces:
new_ldshared = config_ldshared.replace(" {0} ".format(config_cc),
" {0} ".format(new_cc))
Member

@scheibelp scheibelp Jan 20, 2021


Could something go wrong if you replace an instance of config_cc that is not surrounded by spaces?

This sets LDSHARED if

  • LDSHARED from sysconfigdata begins with CC from sysconfigdata
  • CC from sysconfigdata appears in LDSHARED from sysconfigdata

In both of these cases it sets it to be the Spack compiler wrapper (+ extra options from LDSHARED)

#16527 sets LDSHARED if

  • LDSHARED from sysconfigdata starts with the underlying compiler chosen by Spack for the dependent

In this case it sets it to be the Spack compiler wrapper (+ extra options from LDSHARED)
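As an illustration, the first substitution condition described above (LDSHARED begins with CC) amounts to something like the following sketch. The values are made up to resemble sysconfigdata entries, and `new_cc` stands in for a hypothetical wrapper path, not the real Spack one:

```python
# Illustrative sysconfigdata-style values (hypothetical, not real Spack code).
config_cc = 'x86_64-linux-gnu-gcc -pthread'
config_ldshared = 'x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1'
new_cc = '/path/to/spack/cc'  # hypothetical compiler wrapper path

if config_ldshared.startswith(config_cc):
    # LDSHARED begins with CC: swap in the wrapper, keeping the
    # extra linker options that follow the compiler invocation.
    new_ldshared = new_cc + config_ldshared[len(config_cc):]
```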

A few notes/questions:

  • These both seem about as complicated to me.
  • In this PR, how will it avoid choosing the Spack compiler wrapper when building with an external Python that used a different compiler?

Member Author


@scheibelp I'll try to explain how this is supposed to work with examples. Hopefully, I'll find answers to your questions there.

There are two variables of interest (for now) in the sysconfigdata file: CC and LDSHARED. The first is used for compilation, the second for linking. Both variables can be overridden with the respective environment variables. Also, LDSHARED normally starts with the value of CC (with a few exceptions): https://github.com/python/cpython/blob/3554fa4abecfb77ac5fcaa5ce8310eeca5683960/configure.ac#L2579-L2707 ($(CC) is expanded later, when the sysconfigdata file is generated: https://github.com/python/cpython/blob/1459fed92c4bf6bb0d18a4c5637e8f1d8662bed5/Lib/distutils/sysconfig.py#L305). Here are several examples of what the variables can be set to:

  1. System installation:
    build_time_vars = {
    'CC': 'gcc -pthread',
    'LDSHARED': 'gcc -pthread -shared'}
  2. Another system installation:
    build_time_vars = {
    'CC': 'x86_64-linux-gnu-gcc -pthread',
    'LDSHARED': 'x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 '
                '-Wl,-Bsymbolic-functions -Wl,-z,relro'}
  3. Spack installation (after filtering the compiler wrappers):
    build_time_vars = {
    'CC': '/sw/gcc/gcc-6.3.0/bin/gcc -pthread',
    'LDSHARED': '/sw/gcc/gcc-6.3.0/bin/gcc -pthread -shared '
                '-L/scratch/local1/spack/opt/spack/linux-debian9-haswell/gcc-6.3.0/gdbm-1.18.1-npx4dgrnybfnuukx3hqcnrvd27pocxgk/lib '
                '-L/scratch/local1/spack/opt/spack/linux-debian9-haswell/gcc-6.3.0/expat-2.2.10-yin3ravxvv2cwrlxuhkt33msje27l4xx/lib '
                '-L/scratch/local1/spack/opt/spack/linux-debian9-haswell/gcc-6.3.0/bzip2-1.0.8-jwsh3efuidvovhri5xhr6it352vcdeei/lib '
                # and so on
    }
  4. Manual installation with ./configure CFLAGS=-pthread LDFLAGS=-pthread:
    build_time_vars = {
    'CC': 'gcc',
    'LDSHARED': 'gcc -shared -pthread'}
  5. Another possible set of values to make a point later:
    build_time_vars = {
    'CC': 'gcc',
    'LDSHARED': 'some/prefix/ld_so_aix gcc -bI:/some/prefix/python.exp '
                '-L/some/path/with/gcc/substring'}
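For reference, these values can be printed from any Python installation via the standard sysconfig module (CC and LDSHARED are real sysconfig keys; which of the patterns above you see depends on how that installation was built):

```python
import sysconfig

# Print the build-time variables discussed above; the values will match
# whichever of the example patterns your installation was configured with.
for var in ('CC', 'LDSHARED'):
    print(var, '=', sysconfig.get_config_var(var))
```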

One thing to notice is that CC often contains an additional flag -pthread. It is needed to build Python but I'm not sure whether the extensions really need it. The thing is that the flag is always dropped when we build an extension with Spack because the latter sets CC to the compiler wrapper, which overrides the value from the sysconfigdata file. We can discuss whether we want to fix this somehow but let's assume that we don't for now.

We need to know the value of CC that will be set by Spack when building an extension. One way to do it is to adjust this code to the context of setup_dependent_build_environment:

new_cc = join_path(spack.paths.build_env_path,
                   dependent_spec.package.compiler.link_paths['cc'])

This is what the original value of CC from the sysconfigdata file will be replaced with. Now we need to replace the old value of CC with the new one in the LDSHARED variable. Unfortunately, we can't simply replace every occurrence of the old value, as the 5th example above shows: if we replaced 'gcc' with 'whatever', the -L flag would be corrupted. Therefore we have to replace ' gcc ' with ' whatever ' (i.e. only the space-delimited token). It might also be the case that CC is not a substring of LDSHARED at all; in that case, we neither modify nor export the latter.
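The space-delimited replacement can be sketched like this, using the values from example 5 above (`new_cc` is a hypothetical wrapper path, not the real Spack one):

```python
# Values from example 5: a naive str.replace of 'gcc' would corrupt the -L flag.
config_cc = 'gcc'
config_ldshared = ('some/prefix/ld_so_aix gcc -bI:/some/prefix/python.exp '
                   '-L/some/path/with/gcc/substring')
new_cc = '/path/to/spack/cc'  # hypothetical compiler wrapper path

# Pad with spaces so only a whole, space-delimited token is replaced.
padded = ' {0} '.format(config_ldshared)
new_ldshared = padded.replace(' {0} '.format(config_cc),
                              ' {0} '.format(new_cc)).strip()
# The 'gcc' inside '-L/some/path/with/gcc/substring' is left untouched.
```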

The good thing about this solution is that it doesn't matter whether we work with an external Python or not (we don't backup the sysconfigdata file when installing Python with Spack). We simply try to make sure that the values of CC and LDSHARED are consistent. Yes, there might be a compiler mismatch but that's the user's choice to run something like spack install -v py-pyyaml%intel ^python%gcc or spack install -v py-pyyaml%[email protected] ^python%[email protected]. We give the user the freedom to do that (unlike in #16527).

A few more things:

  1. We might need to take care of CXX and LDCXXSHARED too. Could someone give me an example of a small extension that is supposed to use those variables?
  2. With this solution, we drop -pthread not only when compiling (which we always did) but also when linking extensions. My tests didn't show any problems with that but maybe I'm missing something. Anyway, if we decide to solve this, I think we should also solve this for the compilation.
  3. Just in case. It does not make much sense to test this on MacOS because of this.

Member


we can't simply replace any occurrence of the old value, which you can see in the 5th example above: if we replace 'gcc' with 'whatever', the -L flag will be corrupted. Therefore we have to replace ' gcc ' with ' whatever '.

I see - thanks for the example!

there might be a compiler mismatch but that's the user's choice to run something like spack install -v py-pyyaml%intel ^python%gcc or spack install -v py-pyyaml%[email protected] ^python%[email protected]. We give the user the freedom to do that (unlike in #16527).

OK got it. I wanted to confirm since I thought that was promoted as a good thing to do in #16527. I think we can handle it either way though.

We might need to take care of CXX and LDCXXSHARED too. Could someone give me an example of a small extension that is supposed to use those variables?

I don't think it qualifies as small, but py-numpy contains .cxx files.

Just in case. It does not make much sense to test this on MacOS because of this.

In the code you pointed out, it appears that MacOS respects LDSHARED if it is set in the environment and does not enter the `if (sys.platform == 'darwin'...` branch when LDSHARED is set, so I think the behavior on MacOS would match what happens on other systems for Spack.

@scheibelp
Member

scheibelp commented Jan 20, 2021

(originally I mistakenly posted this in #16527)

I tried building py-matplotlib and py-tables with this PR, and both succeeded: I mention this as a datapoint to compare with #20765, where these builds did not succeed according to #20765 (comment), although I haven't confirmed the failure so there may just be a difference in terms of the environment.

@skosukhin
Member Author

This should now work for C++ extensions too (tested with py-msgpack and py-numexpr).

@scheibelp
Member

scheibelp commented Jan 26, 2021

(EDIT) Do you agree with the following commit message (if so, then I think this is ready to merge):

Python extensions: consistently set LDSHARED to get Spack RPATHs (#21149)

Python extensions use CC and LDSHARED from the sysconfig module to
build. When Spack installs Python, it replaces the Spack compiler
wrappers in these values with the underlying compilers (since these
wrappers are not useful outside of the context of running Spack).
In order to use the Spack compiler wrappers when building Python
extensions with Spack, Spack sets the LDSHARED environment variable
when running `Python.setup_py` (which overrides sysconfig). However,
many Python extensions use an alternative method to build (namely
PythonPackage.setup_py), which meant that LDSHARED was not set (and
RPATHs were not inserted for dependencies).

This commit makes the following changes:

* Sets LDSHARED in the environment: this applies to all commands
  executed during the build, rather than for a single command
  invocation
* Updates the logic to set LDSHARED: this replaces the compiler
  executable in LDSHARED with the Spack compiler wrapper. This
  means that for some externally-built instances of Python,
  Spack will now switch to using the Spack wrappers when building
  extensions. The behavior is expected to be the same for Spack-
  built instances of Python.
* Performs similar modifications for LDCXXSHARED (to ensure RPATHs
  are included for C++ codes)

@skosukhin
Member Author

@scheibelp yes, I agree with the message

@scheibelp scheibelp merged commit 3cc5b7a into spack:develop Jan 27, 2021
@scheibelp
Member

Thanks!

jppelteret pushed a commit to jppelteret/spack that referenced this pull request Feb 4, 2021