Skip to content

pybind/mgr: Hack around the 'ImportError: PyO3 modules may only be in… #61737

Closed
pecastro wants to merge 12 commits intoceph:mainfrom
pecastro:pyO3-import-issues
Closed

pybind/mgr: Hack around the 'ImportError: PyO3 modules may only be in… #61737
pecastro wants to merge 12 commits intoceph:mainfrom
pecastro:pyO3-import-issues

Conversation

@pecastro
Copy link
Contributor

@pecastro pecastro commented Feb 9, 2025

…itialized once per interpreter process' issue.

Not sure if this ( using /usr/bin/python3 ) is fair use given we can't really seem to be able to get the interpreter being used with 'sys.executable' under CPython.

Fixes: https://tracker.ceph.com/issues/64213

This is an alternative approach to the attempt #57926 at solving the problem.
User hashes don't change, and all the mgr_utils.py use cases have been dealt with.

It's not pretty, but it works.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@pecastro pecastro requested a review from a team as a code owner February 9, 2025 21:04
@pecastro pecastro force-pushed the pyO3-import-issues branch 2 times, most recently from 37da554 to d5ecc78 Compare February 9, 2025 21:43
Copy link
Contributor

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start.
However, I strongly suggest taking the crypto code out of this file entirely and put it new .py file(s). Possibly in src/python-common somewhere and then instead of using python3 -c use python -m with if __name__ == '__main__ type guards in those files.

In the future we could then switch back to importing those new modules if/when the pyO3/rust/subintepreter issues are resolved with an even cleaner architecture overall.

Does this suggestion make sense to you?

@pecastro
Copy link
Contributor Author

It makes total sense however ...

When I had a go at trying to implement it I bumped into a side effect I had not seen before.
As soon as you move that code into a module that you run rather than run as a script,you are greeted with DeprecationWarning: CSR support in pyOpenSSL is deprecated. messages that will interfere with the existing logic to detect errors and the subsequent try/except order of things.

I've pushed the first stab for comparison and I provide a few examples of how to trigger things.

Run newly converted module and validate errors:

PYTHONPATH=/ceph/src/pybind/mgr:/ceph/src/pybind/mgr/.tox/py3/bin:/ceph/src/pybind:/usr/lib64/python312.zip:/usr/lib64/python3.12:/usr/lib64/python3.12/lib-dynload:/ceph/src/pybind/mgr/.tox/py3/lib64/python3.12/site-packages:/ceph/src/pybind/mgr/.tox/py3/lib/python3.12/site-packages:src/python-common \
python3 -m ceph.pybind.mgr.create_self_signed_cert eyJPIjogIkNlcGgiLCAiQ04iOiAibWdyIn0=

/ceph/build/src/python-common/ceph/pybind/mgr/create_self_signed_cert.py:18: DeprecationWarning: CSR support in pyOpenSSL is deprecated. You should use the APIs in cryptography.
  req = crypto.X509Req()
b'-----BEGIN CERTIFICATE-----
....

pyOpenSSL versions

(mgr-virtualenv) [pecastro@stealthix build]$ PYTHONPATH=/ceph/src/pybind/mgr:/ceph/src/pybind/mgr/.tox/py3/bin:/ceph/src/pybind/mgr/$PYTHONPATH:/ceph/src/pybind:/usr/lib64/python312.zip:/usr/lib64/python3.12:/usr/lib64/python3.12/lib-dynload:/ceph/src/pybind/mgr/.tox/py3/lib64/python3.12/site-packages:/ceph/src/pybind/mgr/.tox/py3/lib/python3.12/site-packages:src/python-common  python -m OpenSSL.debug
pyOpenSSL: 25.0.0
cryptography: 44.0.0
cffi: 1.17.1
cryptography's compiled against OpenSSL: OpenSSL 3.4.0 22 Oct 2024
cryptography's linked OpenSSL: OpenSSL 3.4.0 22 Oct 2024
Python's OpenSSL: OpenSSL 3.2.2 4 Jun 2024
Python executable: /ceph/build/mgr-virtualenv/bin/python
Python version: 3.12.7 (main, Oct  1 2024, 00:00:00) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)]
Platform: linux
sys.path: ['/ceph/build', '/ceph/src/pybind/mgr', '/ceph/src/pybind/mgr/.tox/py3/bin', '/ceph/src/pybind', '/usr/lib64/python312.zip', '/usr/lib64/python3.12', '/usr/lib64/python3.12/lib-dynload', '/ceph/src/pybind/mgr/.tox/py3/lib64/python3.12/site-packages', '/ceph/src/pybind/mgr/.tox/py3/lib/python3.12/site-packages', '/ceph/build/src/python-common', '/ceph/build/mgr-virtualenv/lib64/python3.12/site-packages', '/ceph/build/mgr-virtualenv/lib/python3.12/site-packages']

So, the options I think we have are:

  1. Leave it as it is. Document the reasons, Open story ticket to deal with it properly or wait for a fix ....
  2. Continue the refactoring into a separate module but change the logic to accommodate the error in some way shape or form.
  3. Substitute all the deprecated pyOpenSSL code by the pyca/cryptography module, there is prior art in ./src/pybind/mgr/cephadm/ssl_cert_utils.py that at a very quick glance could be used as reference.
  4. Something else I'm not thinking of ?

I'm leaning towards option 3.

@phlogistonjohn
Copy link
Contributor

I'm ok with your suggestions. FWIW we should be able to suppress warning when needed https://docs.python.org/3/library/warnings.html#temporarily-suppressing-warnings

@pecastro
Copy link
Contributor Author

🤦‍♂️ I never pushed the first stab ...

I reworked everything following your initial suggestion and applied the warnings, it turned out only the first block of code had the need for it.
We can leave the substitution for the pyca/cryptography module for a later stage.

@pecastro pecastro force-pushed the pyO3-import-issues branch 2 times, most recently from 0c12faa to f780aa1 Compare February 13, 2025 23:23
@pecastro
Copy link
Contributor Author

WOOM.

[pecastro@host build]$ rm -rf dev/ out/                                                           
[pecastro@host build]$ MDS=1 MGR=1 MON=1 OSD=1 NFS=0 ../src/vstart.sh -n                                                                                
rm -f core*                                                                                                                                                                                                  
hostname host                                                                                                                                                                                           
ip 192.168.12.74                                                                                                                                                                                             
port 40915                                                                                                                                                                                                   
/ceph/build/bin/ceph-authtool --create-keyring --gen-key --name=mon. /ceph/build/keyring --cap mon 'allow *'                                                                
creating /ceph/build/keyring                                                                                                                                                                                 
/ceph/build/bin/ceph-authtool --gen-key --name=client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *' /ceph/build/keyring             
/ceph/build/bin/monmaptool --create --clobber --addv a [v2:192.168.12.74:40915,v1:192.168.12.74:40916] --print /tmp/ceph_monmap.1686593                                     
/ceph/build/bin/monmaptool: monmap file /tmp/ceph_monmap.1686593                                                                                                                                             
setting min_mon_release = quincy                                                                                                                                                                             
/ceph/build/bin/monmaptool: set fsid to b4739f56-ef23-4af4-b7b2-d7a41303e75c                                                                                                                                 
epoch 0                                                                                                                                                                                                      
fsid b4739f56-ef23-4af4-b7b2-d7a41303e75c                                                                                                                                                                    
last_changed 2025-02-13T23:36:50.171548+0000                                                                                                                                                                 
created 2025-02-13T23:36:50.171548+0000                                                                                                                                                                      
min_mon_release 17 (quincy)
...
...
vstart cluster complete. Use stop.sh to stop. See out/* (e.g. 'tail -f out/????') for debug output.

dashboard urls: https://192.168.12.74:41915
  w/ user/pass: admin / admin

export PYTHONPATH=/ceph/src/pybind:/ceph/build/lib/cython_modules/lib.3:/ceph/src/python-common:$PYTHONPATH
export LD_LIBRARY_PATH=/ceph/build/lib:$LD_LIBRARY_PATH
export PATH=/ceph/build/bin:$PATH
export CEPH_CONF=/ceph/build/ceph.conf
alias cephfs-shell=/ceph/src/tools/cephfs/shell/cephfs-shell
CEPH_DEV=1
[pecastro@host build]$ ceph status
2025-02-13T23:38:03.196+0000 7f81085456c0 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-13T23:38:03.203+0000 7f81085456c0 -1 WARNING: all dangerous and experimental features are enabled.
  cluster:
    id:     b4739f56-ef23-4af4-b7b2-d7a41303e75c
    health: HEALTH_WARN
            4 pool(s) have no replicas configured
 
  services:
    mon: 1 daemons, quorum a (age 72s)

image

@pecastro
Copy link
Contributor Author

jenkins test api

@pecastro
Copy link
Contributor Author

jenkins test make check

@pecastro
Copy link
Contributor Author

jenkins test dashboard cephadm

1 similar comment
@pecastro
Copy link
Contributor Author

jenkins test dashboard cephadm

@pecastro
Copy link
Contributor Author

@nizamial09 @epuertat Given you both did some work in this area maybe you can advise forward or to /dev/null.

@rkachach
Copy link
Contributor

@pecastro @phlogistonjohn related to the crypto functions (such as create_self_signed_cert) as @pecastro found out we already have support in cephadm/ssl_cert_utils.py to generate self-signed certificates. Besides, the above function is only used by the dashboard in the following code:

    @CLIWriteCommand("dashboard create-self-signed-cert")
    def set_mgr_created_self_signed_cert(self):
        cert, pkey = create_self_signed_cert('IT', 'ceph-dashboard')
        result = HandleCommandResult(*self.set_ssl_certificate(inbuf=cert))
        if result.retval != 0:
            return result

        result = HandleCommandResult(*self.set_ssl_certificate_key(inbuf=pkey))
        if result.retval != 0:
            return result
        return 0, 'Self-signed certificate created', ''

Which doesn't make sense to me as cephadm now is the responsible for generating self-signed certificates for the any mgr module who needs that (cephadm provides orch certmgr generate-certificates which is already used by dashboard and prometheus mgr-module) ... so moving forward I think we should be able to move away from using OpenSSL and switch to cryptography library probably making the code in cephadm/ssl_cert_utils.py accessible in python-common so other modules can use it.

@phlogistonjohn
Copy link
Contributor

phlogistonjohn commented Feb 20, 2025

@pecastro @phlogistonjohn related to the crypto functions (such as create_self_signed_cert) as @pecastro found out we already have support in cephadm/ssl_cert_utils.py to generate self-signed certificates. Besides, the above function is only used by the dashboard in the following code:

    @CLIWriteCommand("dashboard create-self-signed-cert")
    def set_mgr_created_self_signed_cert(self):
        cert, pkey = create_self_signed_cert('IT', 'ceph-dashboard')
        result = HandleCommandResult(*self.set_ssl_certificate(inbuf=cert))
        if result.retval != 0:
            return result

        result = HandleCommandResult(*self.set_ssl_certificate_key(inbuf=pkey))
        if result.retval != 0:
            return result
        return 0, 'Self-signed certificate created', ''

Which doesn't make sense to me as cephadm now is the responsible for generating self-signed certificates for the any mgr module who needs that (cephadm provides orch certmgr generate-certificates which is already used by dashboard and prometheus mgr-module) ... so moving forward I think we should be able to move away from using OpenSSL and switch to cryptography library probably making the code in cephadm/ssl_cert_utils.py accessible in python-common so other modules can use it.

I'm fine with those suggestions but I also think we need to take things one step at a time. On top of that, cryptography is a library that brings in rust dependencies and would be subject to this subinterpreter problem. So I think this effort is largely justified still. I don't know how much we want to put on @pecastro who is largely just trying to fix the pyO3 side of things AFAIK.

@pecastro
Copy link
Contributor Author

What @phlogistonjohn said.

So I think this effort is largely justified still. I don't know how much we want to put on @pecastro who is largely just trying to fix the pyO3 side of things AFAIK.

But mostly this:

On top of that, cryptography is a library that brings in rust dependencies and would be subject to this subinterpreter problem.

This MR offers a potential solution for what is an existing problem and it does so in a way that is modular and feasible of rework in the future.
The next step could be refactoring these and substituting them by cryptography functions provided that by then they already work in a subinterpreter or by applying a similar adapter pattern to has been done here.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash Apply MR suggestions.

@pecastro
Copy link
Contributor Author

Hi all.
I went MIA on account of family holidays booked.

I'm slowly catching up on some of the nice work @phlogistonjohn did which handles mostly all the of the previous cases you @epuertat had spotted as in need of improvement.

I've just fixed a small typo and pushed and I suspect most of the remaining failures are due to the expectation of bytes vs string across the remote.py plumbing.

The extra shim "has a different opinion" about the type of data returned that might have deeper implications. I like the standardization of returning strings for strings sake but some of the caller functions/methods might not agree with it.
e.g. https://github.com/ceph/ceph/blob/f2edb7268da39edf214c920a84dd4c65c628f272/src/pybind/mgr/dashboard/module.py#L417C44-L417C67

I'm running tests locally to see if I understand the full implications of this as well as the bytes vs strings mismatches.

@nizamial09 nizamial09 mentioned this pull request Apr 22, 2025
14 tasks
@epuertat
Copy link
Member

@phlogistonjohn I was thinking on either:

  • a centralized mgr config option (env. vars are not that nice at cluster scale), or
  • a runtime check:
    • try: import cryptography; except ImportError: PYO3_ISSUE= True;, but that would be problematic since it'd follow 2 different approaches on 1st and 2nd+ runs :/
    • if Version("41.0.0") <= Version(cryptography.__version__) <= Version("Version without Pyo3 sub-interpreter issue")..., but we don't know yet when the pyo3 project will handle this issue...

So my vote would be for the centralized config option.

@phlogistonjohn
Copy link
Contributor

OK, that might be tricky wrt to the dependency flow, but I'll look into it. Right now we're just testing the basic workflow and yesterday we found another use of bcrypt module in the dashboard that was not being handled by the external process. I'm waiting to hear back on the status of that test run to see if we have more imports that need to be cleaned up. If all goes well, I'll continue organizing/refactoring.

@phlogistonjohn
Copy link
Contributor

A brief update: @adk3798 ran the combined patch series (plus a fixup commit) through teuthology and it failed again. After almost a whole day's investigation we determined that the kubernetes module, used by k8sevents and rook mgr modules, imports cryptography and if either of these were imported before cephadm the boostrap would fail. We've summarized some proposed solutions here: https://pad.ceph.com/p/pyo3-cryptography-imports-k8s
For now I will continue helping with this effort to isolate the dashboard from direct uses of cryptography but this is also something that needs considieration. Because one of the options is to remove the k8sevents and rook mgr modules we plain on raising it as a topic at the next CSC meeting (Apr. 28th).

@pecastro
Copy link
Contributor Author

@epuertat When I said the cryptotools needed to die I was talking from a general perspective as in with hope that the issue would be solved by the pyo3 subintepreters folks.
As @phlogistonjohn had suggested earlier in the conversation the process of splitting things out into a module would then make that job easier.

I understand the logic behind keeping the original code but have we considered the cons of this "suggestion"?
There will be two code paths to maintain/manage for as long as the pyo3 bug rages on, this is extra complexity being added.
These code path doesn't seem to be in need to be that high performant, I mean how many certs a minute are we talking of issuing and validating ?
Some food for thought.

I've added and a few more commits, some of which might be a bit controversial, like the last one removing ascii and utf-8 references across encode/decode. The tests seem to pass without those being explicitly stated.
Regarding my earlier comment, I hadn't noticed that everything is already being passed as strings with the exception of verify_tls function in mgr_util which has a test that specifies a bytes based primary key.
So, as long as the return values matches expectations which they seem to based on the testing it should be good.

@nizamial09
Copy link
Member

one of the options is to remove the k8sevents

i had a tracker months ago to deprecate it since no one is actively maintaing it: https://tracker.ceph.com/issues/66091

@phlogistonjohn
Copy link
Contributor

Hi all, I have updated my branch with additional fixes and rebased on main. Unfortunately, one of the things I feared is starting to happen - @pecastro and I are fixing the same set of issues but in different ways. For example - I updated a test to stop encoding a string to bytes while @pecastro changed the code to accept either strings or bytes. Obviously, I prefer my approach and I think I can justify it but we're doing redundant work and creating extra work to reconcile the changes.

To make my changes more visible I am going to create a seperate (draft) PR. My plan is to use it to partially rework the interface to generalize it and allow for either "subprocess mode" or "internal mode" - as discussed with @epuertat earlier in this PR. We will have to come up with a plan to reconcile our changes at some point though.

@phlogistonjohn
Copy link
Contributor

My branch, in PR form: #62951

@pecastro
Copy link
Contributor Author

For example - I updated a test to stop encoding a string to bytes while @pecastro changed the code to accept either strings or bytes. Obviously, I prefer my approach and I think I can justify it but we're doing redundant work and creating extra work to reconcile the changes.

@phlogistonjohn curiously my first fix was the same as yours 😄 , in the meantime before pushing, I added the encoding to honour the test thinking , who knows, maybe there's a legitimate case somewhere where a bytes pkey is passed in.
I'm not married to it.
I know the tests will all pass either way.

Given you're also in flow I can pause / stand down to minimize noise / attrition leaving the remaining bit of changing that verify_cacrt_content function to last.

In case you feel we should talk, I exist on a GMT schedule, but most of the work I do here is on GMT evenings.

@pecastro
Copy link
Contributor Author

jenkins test api

@phlogistonjohn
Copy link
Contributor

For example - I updated a test to stop encoding a string to bytes while @pecastro changed the code to accept either strings or bytes. Obviously, I prefer my approach and I think I can justify it but we're doing redundant work and creating extra work to reconcile the changes.

@phlogistonjohn curiously my first fix was the same as yours 😄 , in the meantime before pushing, I added the encoding to honour the test thinking , who knows, maybe there's a legitimate case somewhere where a bytes pkey is passed in. I'm not married to it. I know the tests will all pass either way.

:-) Agreed. The reason I prefer to change the test and not the library code is that mypy is run (only) on the modules dir and passes indicating that there is a very good chance no other production code is passing in a bytes array. Since tests exist to verify the behavior of the production code and

Given you're also in flow I can pause / stand down to minimize noise / attrition leaving the remaining bit of changing that verify_cacrt_content function to last.

I appreciate your offer, but I don't think I need that (yet). I just won't be rebasing on your branch frequently for the next couple of work days - not until all the fetures @epuertat and I discussed are implemented. I end up spending too much time in rebase, resolving conflicts rather than updating the code. I think it's fine if you continue to test and prove out that your patches work for your use cases. I'm working on making the code more modular and then making it configurable. In the meantime @adk3798 is helping test things in teuthology - main to prove the concept works as well as find other things we had previously overlooked, like the rook/k8sevents issue.

Perhaps late next week I will revisit what you have done in the meantime and reconcile the branches.

In case you feel we should talk, I exist on a GMT schedule, but most of the work I do here is on GMT evenings.

Thank you for the offer. I don't think I need that (yet). More generally, if you are interested there's a Orch Weekly every Tuesday where we discuss design and various related topics and you are certainly welcome to join one of those too.

pecastro and others added 11 commits April 26, 2025 00:07
Where possible try to use structured output in JSON for easier parsing
and interaction with the parent process.

Signed-off-by: John Mulligan <[email protected]>
Create a class to act as a common shim between the cryptotools external
functions and the mgr. It provides common conversion mechanisms and
could possibly act as an abstraction in case we decide to make
the external function calls in different ways in the future.

Signed-off-by: John Mulligan <[email protected]>
@pecastro pecastro force-pushed the pyO3-import-issues branch from 0bc8485 to dbc0d64 Compare April 25, 2025 23:26
@pecastro pecastro force-pushed the pyO3-import-issues branch from dbc0d64 to 7016a39 Compare April 25, 2025 23:26
@pecastro
Copy link
Contributor Author

I think it's fine if you continue to test and prove out that your patches work for your use cases.

I'm done, obviously pending a re-review.

The tests are passing and other than the dual mode running feature which @phlogistonjohn seems to be working on I've proven that this workaround is feasible to implement which was my primary goal with this MR alongside the perceived urgency of this fix.

I've added the function name renaming that @epuertat suggested, and amended the commit messages to the standard but everything should still pretty much be as when @phlogistonjohn last rebased.

@pecastro pecastro closed this Jun 9, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Ceph-Dashboard Jun 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants