run-make: work around package conflicts from llvm.sh #63409

Merged
cbodley merged 1 commit into ceph:main from cbodley:wip-70792
Jun 4, 2025

@cbodley cbodley commented May 21, 2025

packages installed by llvm.sh sometimes conflict with existing packages from earlier versions, leading to errors like:

The following packages have unmet dependencies:
python3-lldb-13 : Conflicts: python3-lldb-x.y
python3-lldb-19 : Conflicts: python3-lldb-x.y

remove any existing packages before running llvm.sh

Fixes: https://tracker.ceph.com/issues/70792

@cbodley cbodley requested review from Matan-B and dmick May 21, 2025 17:27
Comment on lines 58 to 59
ci_debug "Removing existing llvm packages"
$SUDO apt-get purge --auto-remove llvm python3-lldb-13 llvm-13 -y
Contributor

@Matan-B Matan-B May 21, 2025

Thanks for looking into this!
Should we check that clang-13 exists in the first place? After the first runs of this script it shouldn't exist anymore.
Alternatively, we could also clean our machines manually from clang-13 instead of adding this to the script IIUC.

Contributor Author

> Alternatively, we could also clean our machines manually from clang-13 instead of adding this to the script IIUC.

this llvm.sh stuff doesn't exist on the squid/reef branches. would those builds keep installing v19 packages from the repos added by llvm.sh? or would they keep reinstalling v13 packages?

Contributor Author

from https://tracker.ceph.com/issues/70792#note-2:

> The right solution is to fix the 'update clang' build script snippet to remove any other clang versions. This may happen again as we change the required version; removing 13 (or any other versions) now won't fix the problem, if there is one, when we move from 19 to 20, for example.

@dmick doesn't want to manage the builder dependencies manually. when we eventually want to bump this llvm version to 20+, we're likely to hit the same conflicts again

removing any llvm packages is probably the most reliable approach here (ie purging llvm python3-lldb without specifying version), though it would uninstall/reinstall v19 packages for each build

Contributor Author

ubuntu 24.04 builders probably won't default to clang-13 anyway, so we may have conflicts with other versions too

Contributor

Looking at some of the latest makecheck, v14 is actually being installed (not 13) for main/reef:

CI_DEBUG: Running clean_boost_on_ubuntu() in install-deps.sh
Reading package lists...
Building dependency tree...
Reading state information...
clang is already the newest version (1:14.0-55~exp2).

Do we know whether 14 would also conflict with 19? If it won't, it should explain why only some machines are not happy.


> this llvm.sh stuff doesn't exist on the squid/reef branches. would those builds keep installing v19 packages from the repos added by llvm.sh? or would they keep reinstalling v13 packages?

IIUC older branches and main would keep reinstalling v14 as it's part of install-deps. But as long as we only remove 13 here it should be ok (no one should reinstall it).

If 14 would also conflict (hopefully not), then we would keep installing/reinstalling it unless we stop getting 14 as part of install-deps when running makecheck.

FWIW, I still think this would be better done manually if no script keeps reinstalling it, since after the first run this line would be irrelevant. However, this concern is not a blocker and I'm ok with merging as is if I'm missing something else.

Contributor Author

It seems to me that we want to purge any/all earlier versions of llvm-and-friends, not just 13

updated to remove the -13s from the apt purge command

Member

Do we think it's safer to leave the other llvm packages intact, or remove them as well?

Contributor Author

hmm looking at llvm.sh, this is the list of packages it installs:

PKG="clang-$LLVM_VERSION lldb-$LLVM_VERSION lld-$LLVM_VERSION clangd-$LLVM_VERSION"

so i could copy those for the purge command. the conflicting python3-lldb isn't mentioned there, but i assume it would be removed along with 'lldb'?
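One way to check that assumption without changing a builder is to run the purge in apt's simulation mode and list what it would remove. The sketch below is only an illustration: the helper name `simulated_removals` is hypothetical, and it relies on `apt-get -s` printing one `Purg <package>` or `Remv <package>` line per simulated removal.

```shell
# Hypothetical helper, not part of this PR.
# Reads `apt-get -s ...` output on stdin and prints the names of the
# packages that the simulated run would purge or remove.
simulated_removals() {
    awk '$1 == "Purg" || $1 == "Remv" { print $2 }'
}

# Example (not executed here): does purging lldb-13 also take python3-lldb-13?
#   apt-get -s purge --auto-remove lldb-13 | simulated_removals | grep python3-lldb
```

If the `grep` prints nothing, python3-lldb would have to be named explicitly in the purge command.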

Member

and, by the way, the llvm.sh on download.ceph.com is already out of date

Contributor Author

could include python3-lldb anyway to be safe, like this?

$DRY_RUN $SUDO apt-get purge --auto-remove clang lldb lld clangd python3-lldb -y

ci_debug "Removing existing llvm packages"
$SUDO apt-get purge --auto-remove llvm python3-lldb -y
wrap_sudo
$DRY_RUN $SUDO apt-get purge --auto-remove llvm python3-lldb -y
Contributor

We will purge possibly conflicting packages only under the 'if ! type clang-19' check above. Just to make sure: the machines that had 13/19 conflicts were not able to install 19 successfully, so they would fail the 'type clang-19' check and are expected to enter this branch, right?
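The control flow being asked about can be sketched roughly as follows. The function names `have_clang` and `ensure_clang` and the echoed messages are illustrative assumptions, not the actual run-make code; only the `type clang-19` test comes from the thread.

```shell
# Hypothetical sketch of the guard, not the merged code.
have_clang() {
    # `type` succeeds only when clang-<version> resolves to a command on PATH
    type "clang-$1" >/dev/null 2>&1
}

ensure_clang() {
    local wanted=$1
    if ! have_clang "$wanted"; then
        # A builder whose earlier clang-<wanted> install failed on a package
        # conflict ends up here, so the purge runs before llvm.sh is retried.
        echo "clang-$wanted missing: purge older versions, then run llvm.sh"
    else
        echo "clang-$wanted already installed: nothing to do"
    fi
}
```

Under this shape, a builder that already has the wanted version never reaches the purge, which is the behaviour the question is probing.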

Contributor Author

that sounds right

Member

does seem like the wrong approach though; it'd be much less likely to break in future if we made it "uninstall everything that's not the version we're asking for" unconditionally

Contributor Author

happy to move the 'apt purge' command out of the condition, but is there a way to say "uninstall all clang stuff except v19"?

or would we have to uninstall all clang packages, then run llvm.sh unconditionally?
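On the "everything except v19" question, one possible approach is to filter the list of installed package names rather than guess at versions. This is a rough sketch under stated assumptions: the function name `other_llvm_versions` is hypothetical, the name pattern covers only the packages llvm.sh installs plus python3-lldb, and the input is expected one package name per line (for example from `dpkg-query -W -f='${Package}\n'`).

```shell
# Hypothetical filter, not part of the merged change.
# Reads package names on stdin, one per line, and prints the versioned
# clang/clangd/lld/lldb/python3-lldb packages whose version is not $1.
other_llvm_versions() {
    awk -v want="$1" '
        /^(clang|clangd|lld|lldb|python3-lldb)-[0-9]+$/ {
            n = split($0, parts, "-")      # the version is the last field
            if (parts[n] != want) print
        }'
}
```

The output could then feed a single purge, e.g. `dpkg-query -W -f='${Package}\n' | other_llvm_versions 19 | xargs -r $SUDO apt-get purge --auto-remove -y`, so that only packages that are actually installed get named.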

Member

@dmick dmick May 28, 2025

"for i in 13..current-1 do"... ?

cbodley commented May 28, 2025

seeing failures for v16 which we don't have a repo for. testing on my old 20.04 vm doesn't find those v14 packages either

Package 'clangd-14' is not installed, so not removed
Package 'lld-14' is not installed, so not removed
Package 'lldb-14' is not installed, so not removed
Package 'python3-lldb-14' is not installed, so not removed
The following packages will be REMOVED:
  clang* clang-14* libclang-common-14-dev* libz3-4* libz3-dev* llvm-14*
  llvm-14-dev* llvm-14-linker-tools* llvm-14-runtime* llvm-14-tools*
0 upgraded, 0 newly installed, 10 to remove and 215 not upgraded.
After this operation, 428 MB disk space will be freed.
(Reading database ... 206618 files and directories currently installed.)
Removing clang (1:14.0-55~exp2) ...
Removing clang-14 (1:14.0.0-1ubuntu1.1) ...
Removing libclang-common-14-dev (1:14.0.0-1ubuntu1.1) ...
Removing llvm-14-dev (1:14.0.0-1ubuntu1.1) ...
Removing libz3-dev:amd64 (4.8.12-1) ...
Removing libz3-4:amd64 (4.8.12-1) ...
Removing llvm-14 (1:14.0.0-1ubuntu1.1) ...
Removing llvm-14-linker-tools (1:14.0.0-1ubuntu1.1) ...
Removing llvm-14-runtime (1:14.0.0-1ubuntu1.1) ...
Removing llvm-14-tools (1:14.0.0-1ubuntu1.1) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.8) ...
Reading package lists...
Building dependency tree...
Reading state information...
Package 'clang-15' is not installed, so not removed
Package 'clangd-15' is not installed, so not removed
Package 'lld-15' is not installed, so not removed
Package 'lldb-15' is not installed, so not removed
Package 'python3-lldb-15' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 215 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
Package 'lldb-16' is not installed, so not removed
Package 'python3-lldb-16' is not installed, so not removed
E: Unable to locate package clang-16
E: Unable to locate package lld-16
E: Unable to locate package clangd-16
Build step 'Execute shell' marked build as failure

so added '|| true' to ignore failures from the 'apt purge' commands
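A minimal sketch of a per-version purge loop with that failure handling is shown below. It is not the merged run-make code: the function name `purge_llvm_range` and the `APT_GET` override (there only so the loop can be exercised without apt) are assumptions, and the package names follow the PKG line quoted from llvm.sh plus python3-lldb.

```shell
# Hypothetical sketch, not the merged change.
# Purges the llvm.sh package set for every version from $1 up to, but not
# including, $2. DRY_RUN=echo prints the commands instead of running them.
purge_llvm_range() {
    local first=$1 wanted=$2 v
    v=$first
    while [ "$v" -lt "$wanted" ]; do
        # apt-get fails with "E: Unable to locate package ..." when a name is
        # unknown to the configured repositories, as with v16 in the log
        # above; `|| true` keeps one such version from failing the build.
        ${DRY_RUN:-} ${SUDO:-} ${APT_GET:-apt-get} purge --auto-remove -y \
            "clang-$v" "lldb-$v" "lld-$v" "clangd-$v" "python3-lldb-$v" || true
        v=$((v + 1))
    done
}
```

Calling `DRY_RUN=echo purge_llvm_range 13 19` would print one purge command for each of versions 13 through 18 without touching the machine. The trade-off of `|| true` is that real purge failures are also ignored.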

cbodley commented May 28, 2025

https://jenkins.ceph.com/job/ceph-pull-requests/159218/ succeeded, but didn't find any old packages to remove and didn't need to run llvm.sh

cbodley commented Jun 3, 2025

no changes, just squashed the commits

Member

@dmick dmick left a comment

looks like the missing component from the original integration

cbodley commented Jun 3, 2025

from the log, i see 'CI_DEBUG: Removing clang package versions from 13-18' working correctly, but then it later reinstalls clang-14:

CI_DEBUG: Running install-deps.sh
Using apt-get to install dependencies
CI_DEBUG: Running clean_boost_on_ubuntu() in install-deps.sh
Reading package lists...
Building dependency tree...
Reading state information...
debianutils is already the newest version (5.5-1ubuntu2).
ccache is already the newest version (4.5.1-1).
git is already the newest version (1:2.34.1-1ubuntu1.12).
lvm2 is already the newest version (2.03.11-2.1ubuntu5).
The following additional packages will be installed:
  clang-14 libclang-common-14-dev libclang-cpp14 libclang1-14 libz3-4
  libz3-dev llvm-14 llvm-14-dev llvm-14-linker-tools llvm-14-runtime
  llvm-14-tools

added a commit to remove 'clang' from INSTALL_EXTRA_PACKAGES below

packages installed by llvm.sh sometimes conflict with existing packages
from earlier versions, leading to errors like:

> The following packages have unmet dependencies:
> python3-lldb-13 : Conflicts: python3-lldb-x.y
> python3-lldb-19 : Conflicts: python3-lldb-x.y

remove packages from any earlier versions before running llvm.sh

Fixes: https://tracker.ceph.com/issues/70792

Signed-off-by: Casey Bodley <[email protected]>

github-actions bot commented Jun 4, 2025

Config Diff Tool Output

- removed: breakpad
! changed: rgw_sts_key: old: Key used for encrypting/ decrypting role session tokens. This key must consist of 16 hexadecimal characters, which can be generated by the command 'openssl rand -hex 16'. All radosgw instances in a zone should use the same key. In multisite configurations, all zones in a realm should use the same key.
! changed: rgw_sts_key: new: Key used for encrypting/ decrypting session token.
! changed: osd_scrub_max_interval: old: Scrub each PG no less often than this interval. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_scrub_max_interval: new: Scrub each PG no less often than this interval
! changed: osd_scrub_min_interval: old: The desired interval between scrubs of a specific PG. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_scrub_min_interval: new: The desired interval between scrubs of a specific PG.
! changed: osd_deep_scrub_interval: old: Deep scrub each PG (i.e., verify data checksums) at least this often. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_deep_scrub_interval: new: Deep scrub each PG (i.e., verify data checksums) at least this often
! changed: osd_deep_scrub_interval_cv: old: The coefficient of variation for the deep scrub interval, specified as a ratio. On average, the next deep scrub for a PG is scheduled osd_deep_scrub_interval after the last deep scrub . The actual time is randomized to a normal distribution with a standard deviation of osd_deep_scrub_interval * osd_deep_scrub_interval_cv (clamped to within 2 standard deviations). The default value guarantees that 95% of deep scrubs will be scheduled in the range [0.8 * osd_deep_scrub_interval, 1.2 * osd_deep_scrub_interval].
! changed: osd_deep_scrub_interval_cv: new: The coefficient of variation for the deep scrub interval, specified as a ratio. On average, the next deep scrub for a PG is scheduled osd_deep_scrub_interval after the last deep scrub . The actual time is randomized to a normal distribution with a standard deviation of osd_deep_scrub_interval * osd_deep_scrub_interval_cv (clamped to within 2 standard deviations). The default value guarantees that 95% of the deep scrubs will be scheduled in the range [0.8 * osd_deep_scrub_interval, 1.2 * osd_deep_scrub_interval].
! changed: osd_deep_scrub_interval_cv: old: deep scrub intervals are varied by a random amount to prevent stampedes. This parameter determines the amount of variation. Technically ``osd_deep_scrub_interval_cv`` is the coefficient of variation for the deep scrub interval.
! changed: osd_deep_scrub_interval_cv: new: deep scrub intervals are varied by a random amount to prevent stampedes. This parameter determines the amount of variation. Technically - osd_deep_scrub_interval_cv is the coefficient of variation for the deep scrub interval.

The above configuration changes are found in the PR. Please update the relevant release documentation if necessary.

@cbodley cbodley merged commit 03238f0 into ceph:main Jun 4, 2025
12 of 13 checks passed
@cbodley cbodley deleted the wip-70792 branch June 4, 2025 18:57