
Symbol hiding when using GNU linker in the Meson build should be implemented #15996

@rgommers


We're using a build_ext override in setup.py to insert a linker script that does symbol hiding for Python extension modules with GCC on Linux: https://github.com/scipy/scipy/blob/main/setup.py#L166-L206. See gh-8463 for why.
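The idea behind that override is a GNU ld version script that keeps only the module's init function global and marks everything else local. The sketch below is hypothetical. It is not the code in setup.py. It assumes GCC or another compiler driver that forwards `-Wl,` options to GNU ld. The function names `make_version_script` and `write_version_script` are illustrative only.

```python
import os
import tempfile


def make_version_script(exported: list[str]) -> str:
    """Return a GNU ld version script that keeps `exported` global and hides the rest."""
    globals_ = "".join(f"    {name};\n" for name in exported)
    # Anonymous version node: listed symbols stay global, everything else is local.
    return "{\n  global:\n" + globals_ + "  local: *;\n};\n"


def write_version_script(exported: list[str]) -> tuple[str, str]:
    """Write the script to a temp file and return (path, flag to add to the link line)."""
    fd, path = tempfile.mkstemp(suffix=".map")
    with os.fdopen(fd, "w") as f:
        f.write(make_version_script(exported))
    return path, f"-Wl,--version-script={path}"


if __name__ == "__main__":
    path, flag = write_version_script(["PyInit__specfun"])
    print(flag)
    os.remove(path)
```

The caller is responsible for removing the temporary file once the link step has finished.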

The script tools/check_pyext_symbol_hiding.sh is run in CI on Azure (see the CI config snippet below), but not yet against a Meson build:

- script: ./tools/check_pyext_symbol_hiding.sh build
  displayName: "Check dynamic symbol hiding works"
  condition: ne(variables['Build.SourceBranch'], 'refs/heads/main')
  failOnStderr: true

I just ran it locally on a Meson build and got:

$ ./tools/check_pyext_symbol_hiding.sh
./build-install/lib/python3.10/site-packages/scipy/linalg/_cythonized_array_utils.cpython-310-x86_64-linux-gnu.so: too many public symbols!
000000000005c550 T _fini
0000000000008000 T _init
000000000000d791 T PyInit__cythonized_array_utils
00000000000734a0 B __pyx_module_is_main_scipy__linalg___cythonized_array_utils

Note that the script exits at the first non-compliant extension module it encounters; if I delete that one, I get the same error for the next .so. Meson does have a built-in feature for this, the gnu_symbol_visibility keyword argument: https://mesonbuild.com/Release-notes-for-0-48-0.html#keyword-argument-for-gnu-symbol-visibility, described in https://mesonbuild.com/Reference-manual_functions.html#arguments9. If I use it, I get:

$ ./tools/check_pyext_symbol_hiding.sh 
./build-install/lib/python3.10/site-packages/scipy/linalg/_matfuncs_expm.cpython-310-x86_64-linux-gnu.so: too many public symbols!
0000000000050744 T _fini
0000000000008000 T _init
000000000000bdc7 T PyInit__matfuncs_expm

So that hides the B (global uninitialized data, i.e. BSS) symbols, but not the T (global text, i.e. code) ones. See the nm documentation for an explanation of the symbol types.
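The check can be approximated in Python. This is a sketch and not a port of tools/check_pyext_symbol_hiding.sh. It assumes the policy is that only `PyInit_*` symbols may be global. The real script's rules may differ. Unlike the shell script, this version reports every offending module instead of stopping at the first one. It shells out to `nm -D --defined-only`. The names `public_symbols`, `unexpected_symbols` and `check_tree` are hypothetical.

```python
import subprocess
from pathlib import Path


def public_symbols(nm_output: str) -> list[str]:
    """Names of global symbols in `nm -D --defined-only` output."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols are printed as: <address> <type letter> <name>.
        # An uppercase type letter (T, B, D, ...) marks a global symbol;
        # a lowercase one marks a local symbol.
        if len(parts) == 3 and parts[1].isupper():
            names.append(parts[2])
    return names


def unexpected_symbols(nm_output: str, allowed_prefixes=("PyInit_",)) -> list[str]:
    """Global symbols that do not start with one of the allowed prefixes."""
    return [n for n in public_symbols(nm_output) if not n.startswith(allowed_prefixes)]


def run_nm(path: Path) -> str:
    """Return the dynamic, defined symbols of a shared object as printed by nm."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_tree(root, run=run_nm) -> dict[str, list[str]]:
    """Map every .so under `root` that has unexpected global symbols to those symbols."""
    report = {}
    for so in sorted(Path(root).rglob("*.so")):
        bad = unexpected_symbols(run(so))
        if bad:
            report[str(so)] = bad
    return report
```

Passing `run` as a parameter keeps the parsing testable without real extension modules. Whether `_init` and `_fini` should be tolerated is a policy decision left open here.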

I've also tried 'inlineshidden' instead of 'hidden' with the same result, plus a warning which seems to be a minor issue in Meson:

cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
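The warning suggests that `'inlineshidden'` also adds the C++-only flag `-fvisibility-inlines-hidden` when compiling C sources. As far as we can tell, Meson's GNU compiler support maps the `gnu_symbol_visibility` values to GCC flags roughly as in the dictionary below. Treat the mapping as an approximation and check it against the Meson version in use. The helper `visibility_flags` is hypothetical. It shows one way to keep the C++-only flag out of C and Fortran compiles.

```python
# Approximate mapping of gnu_symbol_visibility values to GCC flags
# (our reading of Meson's GNU compiler support, not a copy of its source).
GNU_SYMBOL_VISIBILITY_ARGS = {
    "": [],
    "default": ["-fvisibility=default"],
    "internal": ["-fvisibility=internal"],
    "hidden": ["-fvisibility=hidden"],
    "protected": ["-fvisibility=protected"],
    "inlineshidden": ["-fvisibility=hidden", "-fvisibility-inlines-hidden"],
}

# Flags that GCC accepts only for C++/ObjC++ (hence the cc1 warning above).
CXX_ONLY_FLAGS = {"-fvisibility-inlines-hidden"}
CXX_LANGUAGES = {"cpp", "objcpp"}


def visibility_flags(value: str, language: str) -> list[str]:
    """Visibility flags for one language, dropping C++-only ones elsewhere."""
    flags = GNU_SYMBOL_VISIBILITY_ARGS[value]
    if language in CXX_LANGUAGES:
        return list(flags)
    return [f for f in flags if f not in CXX_ONLY_FLAGS]
```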

Running it on special/_specfun.<>.so gives me a huge number of symbols, but I see the same in a conda-forge-installed SciPy:

$ cd build-install/lib/python3.10/site-packages/scipy/
$ nm -D --defined-only special/_specfun.cpython-310-x86_64-linux-gnu.so
0000000000044560 T airya_
000000000001d7a0 T airyb_
000000000002b860 T airyzo_
0000000000019080 T ajyik_
000000000000dab0 T array_from_pyobj
0000000000037120 T aswfa_
00000000000458f0 T aswfb_
0000000000028a00 T bernoa_
00000000000288c0 T bernob_
000000000001af30 T beta_
00000000000264b0 T bjndd_
0000000000013040 T cbk_
000000000003cdd0 T cchg_
000000000001c430 T cerf_
000000000002c030 T cerror_
00000000000385b0 T cerzo_
00000000000182d0 T cfc_
00000000000174c0 T cfs_
0000000000015c90 T cgama_
0000000000049af0 T ch12n_
000000000003aeb0 T chgm_
0000000000038af0 T chgu_
000000000001afa0 T chgubi_
000000000001b820 T chguit_
000000000001be80 T chgul_
000000000001c150 T chgus_
000000000000fa50 T cik01_
0000000000039c80 T ciklv_
0000000000052bc0 T cikna_
0000000000022550 T ciknb_
000000000004d5f0 T cikva_
000000000004b580 T cikvb_
000000000002a4e0 T cisia_
000000000002a1a0 T cisib_
000000000001ad70 T cjk_
0000000000029620 T cjylv_
00000000000204d0 T cjynb_
000000000002e780 T cjyva_
00000000000328e0 T cjyvb_
0000000000027490 T clpmn_
00000000000398b0 T clpn_
0000000000050ee0 T clqmn_
000000000002ae20 T clqn_
0000000000039390 T comelp_
000000000000e7f0 T copy_ND_array
00000000000460c0 T cpbdn_
0000000000016210 T cpdla_
0000000000026820 T cpdsa_
0000000000054150 T cpsi_
000000000001e7d0 T cv0_
000000000002c880 T cva1_
000000000001f3a0 T cva2_
000000000001e360 T cvf_
000000000001e000 T cvql_
000000000001e250 T cvqm_
0000000000014440 T cy01_
00000000000485d0 T cyzo_
00000000000169a0 T dinf_
000000000000ef00 T dnan_
000000000001cad0 T dvla_
000000000001ce90 T dvsa_
0000000000045410 T e1xa_
0000000000011110 T e1xb_
000000000001a560 T e1z_
000000000003ac00 T eix_
000000000003adb0 T eixz_
000000000003a6d0 T elit_
000000000003a9c0 T elit3_
0000000000020180 T envj_
0000000000050690 T enxa_
00000000000501a0 T enxb_
000000000002be70 T error_
000000000002a400 T eulera_
000000000002c730 T eulerb_
000000000000eeb0 T F2PyCapsule_AsVoidPtr
000000000000eee0 T F2PyCapsule_Check
000000000000e810 T F2PyCapsule_FromVoidPtr
000000000000d700 T F2PyDict_SetItemString
000000000000d850 T F2PyGetThreadLocalCallbackPtr
000000000000d750 T F2PySwapThreadLocalCallbackPtr
000000000000ef10 T fcoef_
000000000004f6b0 T fcs_
0000000000044d30 T fcszo_
0000000000043c10 T ffk_
000000000005486c T _fini
0000000000026740 T gaih_
000000000001d740 T gam0_
00000000000112a0 T gamma2_
0000000000012d40 T gmn_
000000000004fe40 T herzo_
000000000003b630 T hygfx_
000000000003e190 T hygfz_
000000000001a000 T ik01a_
0000000000046b40 T ik01b_
0000000000042810 T ikna_
0000000000042bd0 T iknb_
000000000004ab00 T ikv_
00000000000394f0 T incob_
0000000000035260 T incog_
0000000000006000 T _init
0000000000042310 T itairy_
00000000000358d0 T itika_
00000000000354c0 T itikb_
0000000000047d30 T itjya_
0000000000047910 T itjyb_
0000000000038210 T itsh0_
000000000002ab40 T itsl0_
0000000000045ad0 T itth0_
000000000002d720 T ittika_
000000000002cfb0 T ittikb_
0000000000028f00 T ittjya_
0000000000028b30 T ittjyb_
0000000000028140 T jdzo_
0000000000054650 T jelp_
0000000000034aa0 T jy01a_
000000000001d140 T jy01b_
00000000000375c0 T jyna_
0000000000024000 T jynb_
0000000000023850 T jynbh_
0000000000024180 T jyndd_
0000000000035c90 T jyv_
000000000004a1d0 T jyzo_
0000000000011f50 T klvna_
0000000000048c20 T klvnb_
00000000000496b0 T klvnzo_
00000000000137f0 T kmn_
000000000002e490 T lagzo_
0000000000038f60 T lamn_
000000000002dab0 T lamv_
0000000000036db0 T legzo_
0000000000045d80 T lgama_
00000000000431b0 T lpmn_
00000000000165f0 T lpmns_
0000000000045550 T lpmv_
0000000000016cd0 T lpmv0_
0000000000047080 T lpn_
0000000000048380 T lpni_
0000000000026d30 T lqmn_
00000000000252f0 T lqmns_
0000000000045f30 T lqna_
000000000002d310 T lqnb_
0000000000020380 T msta1_
00000000000201f0 T msta2_
00000000000437a0 T mtu0_
0000000000053710 T mtu12_
0000000000065170 B _npy_f2py_ARRAY_API
0000000000049520 T othpl_
00000000000379d0 T pbdv_
0000000000050770 T pbvv_
0000000000047470 T pbwa_
00000000000169b0 T psi_spec_
000000000000d920 T PyFortranObject_New
000000000000d8c0 T PyFortranObject_NewAsAttr
0000000000064f80 D PyFortran_Type
000000000000caa0 T PyInit__specfun
00000000000135d0 T qstar_
000000000004fb80 T rctj_
0000000000048190 T rcty_
000000000001e630 T refine_
0000000000024580 T rmn1_
000000000001f9c0 T rmn2l_
00000000000252a0 T rmn2so_
0000000000025c70 T rmn2sp_
00000000000499c0 T rswfo_
000000000002a090 T rswfp_
0000000000044790 T scka_
0000000000014160 T sckb_
0000000000011770 T sdmn_
0000000000052590 T segv_
0000000000047200 T sphi_
0000000000024280 T sphj_
0000000000050490 T sphk_
000000000001f7c0 T sphy_
000000000001ccb0 T vvla_
0000000000011470 T vvsa_

The symbols are visible in the conda build because the check in setup.py looks for gcc/g++ in the compiler name, whereas the conda compilers are named x86_64-conda-linux-gnu-cc and x86_64-conda-linux-gnu-c++.
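A name-based probe like the one described here can be reconstructed as follows. Both functions are hypothetical and are not the actual setup.py code. `is_gnu_linux_compiler` is one possible broadening that also accepts GNU target-triplet names ending in `-cc` or `-c++`.

```python
import os


def naive_is_gcc(compiler: str) -> bool:
    """Substring check on the executable name (hypothetical reconstruction)."""
    name = os.path.basename(compiler)
    return "gcc" in name or "g++" in name


def is_gnu_linux_compiler(compiler: str) -> bool:
    """Hypothetical broader check that also accepts triplet names like conda's."""
    name = os.path.basename(compiler)
    if naive_is_gcc(name):
        return True
    # e.g. x86_64-conda-linux-gnu-cc, x86_64-conda-linux-gnu-c++
    return "-linux-gnu-" in name and name.endswith(("-cc", "-c++"))
```

Any heuristic based on the executable name will miss some toolchain naming scheme. That is one argument for letting the build system decide based on the detected compiler rather than on its file name.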

Looking at the build.ninja file, Meson already seems to pass -fvisibility=hidden -fvisibility-inlines-hidden to C++ extensions by default, but not to C and Fortran extensions. Yet the C++ extension modules still export a large number of symbols, so the effect of these flags appears to differ from that of the linker script we're using.

This needs more investigation. It's a fairly niche issue, given that no one has complained about symbols not being hidden in the conda build. That said, we shouldn't regress on this, so I've marked it for 1.9.0.

The Meson docs link to https://gcc.gnu.org/wiki/Visibility, which says:
Some people may suggest that GNU linker version scripts can do just as well. Perhaps for C programs this is true, but for C++ it cannot be true - unless you laboriously specify each and every symbol to make public (and the complex mangled name of it), you must use wildcards which tend to let a lot of spurious symbols through. And you have to update the linker script if you decide to change names to the classes or the functions. In the case of the library above, the author couldn't get the symbol table below 40,000 symbols using version scripts. Furthermore, using linker version scripts doesn't permit GCC to better optimise the code.

I think the reason that comment doesn't apply to Python extensions is that only a single PyInit_ symbol needs to be public. The list of public symbols comes from distutils' build_ext: https://github.com/python/cpython/blob/75eee1d57eb28283a8682a660d9949afc89fd010/Lib/distutils/command/build_ext.py#L686-L698
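For an ASCII module name, the one symbol that must stay public is `PyInit_` followed by the last component of the dotted module name. The sketch below is modelled on distutils' `build_ext.get_export_symbols` as we understand it. It is not a copy of that code. Non-ASCII names, which PEP 489 handles with a separate `PyInitU_` scheme, are deliberately rejected rather than guessed at. The function names are hypothetical.

```python
def pyinit_symbol(ext_name: str) -> str:
    """Init-function symbol for a dotted extension name, e.g. scipy.special._specfun."""
    last = ext_name.split(".")[-1]
    if not last.isascii():
        # PEP 489 defines a PyInitU_ + punycode variant; not handled in this sketch.
        raise ValueError(f"non-ASCII module name not supported: {last!r}")
    return "PyInit_" + last


def export_symbols(ext_name: str, extra=()) -> list[str]:
    """Explicitly requested symbols plus the PyInit_ symbol, without duplicates."""
    symbols = list(extra)
    init = pyinit_symbol(ext_name)
    if init not in symbols:
        symbols.append(init)
    return symbols
```

For `scipy.special._specfun` this yields `PyInit__specfun`, the same name that appears in the nm listing above.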

@eli-schwartz do you have any insights or advice here?

Labels: Meson (items related to the introduction of Meson as the new build system for SciPy)