qa/tasks/cephadm: enable mon_cluster_log_to_file #54312

Merged
adk3798 merged 1 commit into ceph:main from dvanders:dvanders_mcltf_true
Feb 2, 2024

Conversation

@dvanders dvanders commented Nov 2, 2023

Without cluster_log_to_file we have nothing to grep for errors:

2023-10-27T16:06:59.111 DEBUG:teuthology.orchestra.run.smithi150:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/38cc7fce-74d9-11ee-8db9-212e2dc638e7/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-10-27T16:06:59.141 INFO:teuthology.orchestra.run.smithi150.stderr:grep: /var/log/ceph/38cc7fce-74d9-11ee-8db9-212e2dc638e7/ceph.log: No such file or directory

Set mon_cluster_log_to_file = true.

Fixes: https://tracker.ceph.com/issues/63425

See https://pulpito.ceph.com/teuthology-2023-10-28_14:23:03-upgrade:quincy-x-reef-distro-default-smithi/7439369/ for a broken example
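
For readers unfamiliar with the check, the egrep pipeline in the log above can be approximated in Python roughly as follows. This is only a sketch of the filtering logic, not teuthology's actual code: the function name `first_unexpected_line` and the sample log lines are hypothetical, and only the severity tags and the two ignorelist patterns are taken from the logged command.

```python
import re

# Severity tags that the logged egrep command searches for.
BAD_LEVELS = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")


def first_unexpected_line(log_lines, ignorelist):
    """Return the first ERR/WRN/SEC line that no ignorelist regex matches, else None.

    Mirrors `egrep ... | egrep -v ... | head -n 1` from the log above.
    """
    ignore = [re.compile(pattern) for pattern in ignorelist]
    for line in log_lines:
        if not BAD_LEVELS.search(line):
            continue  # not an error, warning, or security line
        if any(pattern.search(line) for pattern in ignore):
            continue  # expected for this test, so skipped
        return line
    return None


# Hypothetical cluster log lines, made up for illustration only.
sample_log = [
    "cluster [INF] overall HEALTH_OK",
    "cluster [WRN] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)",
    "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)",
]
# Patterns copied from the logged command.
ignorelist = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)"]

print(first_unexpected_line(sample_log, ignorelist))
print(first_unexpected_line([], ignorelist))
```

In this sketch an empty input returns `None`, the same result as a clean log. That appears to be the problem described here: when ceph.log is never written, the scan has no lines to flag.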

Without cluster_log_to_file we have nothing to grep for errors:

2023-10-27T16:06:59.111 DEBUG:teuthology.orchestra.run.smithi150:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/38cc7fce-74d9-11ee-8db9-212e2dc638e7/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-10-27T16:06:59.141 INFO:teuthology.orchestra.run.smithi150.stderr:grep: /var/log/ceph/38cc7fce-74d9-11ee-8db9-212e2dc638e7/ceph.log: No such file or directory

Set mon_cluster_log_to_file = true.

Fixes: https://tracker.ceph.com/issues/63425
Signed-off-by: Dan van der Ster <[email protected]>
@github-actions github-actions bot added the tests label Nov 2, 2023
@dvanders dvanders requested review from adk3798 and kamoltat November 2, 2023 23:54
log_to_file = true
log_to_stderr = false
log to journald = false
mon cluster log to file = true
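
The four lines above mix underscores and spaces in option names. As far as I know, Ceph treats the two spellings as equivalent. A minimal sketch for checking such a fragment with Python's standard `configparser` follows. The `[global]` section header and the helper name `normalized_options` are assumptions, since the diff context shown here does not include the section name.

```python
import configparser

# Option lines copied from the diff context above.
# The [global] section header is an assumption; it is not visible in the diff.
CONF_TEXT = """
[global]
log_to_file = true
log_to_stderr = false
log to journald = false
mon cluster log to file = true
"""


def normalized_options(conf_text, section="global"):
    """Parse an INI-style fragment and return {option_name: bool}.

    Spaces and dashes in option names are folded to underscores so that
    'mon cluster log to file' and 'mon_cluster_log_to_file' compare equal.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        key.replace(" ", "_").replace("-", "_"): parser.getboolean(section, key)
        for key in parser[section]
    }


options = normalized_options(CONF_TEXT)
print(options["mon_cluster_log_to_file"])
```

This only checks the text of the configuration. Whether ceph.log actually appears on the test node still has to be confirmed by a run, which is what the discussion below is about.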

Does this serve the same purpose as #48539?

@dvanders dvanders Nov 3, 2023

Here's another example, this time with a list of the out files in the /var/log/ceph/ dir: http://qa-proxy.ceph.com/teuthology/teuthology-2023-10-27_14:23:02-upgrade:pacific-x-quincy-distro-default-smithi/7438907/teuthology.log

You can see that ceph.log is missing.

I couldn't confirm if this cephadm.conf change makes the ceph.log appear -- can you please help testing that?

@idryomov idryomov Nov 3, 2023

We still need #48539 in case something else goes sideways.

@adk3798 Please prioritize adding the necessary ignorelists to the cephadm suite -- it's what that PR is blocked on and why I have been unstaling it for months now. Chris and I didn't want to merge something that would turn one of the major suites red, but there is a limit to how long something like that can wait...

> I couldn't confirm if this cephadm.conf change makes the ceph.log appear -- can you please help testing that?

@dvanders If it does, it would turn cephadm suite red because, unlike in other suites, there are no ignorelists there.

@idryomov At this point, if it's causing failures to be missed elsewhere, I'm okay with it being merged. I have a partially complete ignorelist PR and this will at least force me to finish it. I can deal with sifting through the failures for the time before I finish that as well.

@idryomov idryomov Nov 3, 2023

@adk3798 I have asked @chrisphoffman to move forward with #48539.

> @adk3798 I have asked @chrisphoffman to move forward with #48539.

This hasn't happened. I have unstaled #48539 again, but would suggest moving forward by picking up this PR and seeing if it actually makes ceph.log file appear and causes those ignorelist-related failures.

@idryomov idryomov Jan 31, 2024

> If it does, it would turn cephadm suite red

Adam is asking for another run, but this is definitely working:

https://pulpito.ceph.com/yuriw-2024-01-31_15:48:29-rados:cephadm-main-distro-default-smithi/

Let's get this merged!

github-actions bot commented Jan 3, 2024

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@idryomov

jenkins test api

@adk3798 adk3798 left a comment

ljflores commented Feb 8, 2024

There are a lot of failures in the rados suite from this PR. Until a PR is raised to whitelist expected warnings, a lot of main test batches are blocked.

See https://tracker.ceph.com/issues/64343.

Should be a relatively straightforward fix from the rados team, but pasting here to raise awareness.

ljflores commented Feb 8, 2024

@dvanders the rados team decided it would be best to revert this commit to unblock testing efforts for Squid, with the intent of re-merging the commit with some additions to the allowlist.

#55498
