rgw: replace null_yields with optional ones#50206
Conversation
c92af68 to
9c7e220
Compare
cbodley
left a comment
There was a problem hiding this comment.
looks great overall! 'make check' is catching some unit tests under src/test/rgw/ that need updating also
src/rgw/driver/rados/rgw_gc.cc
Outdated
| const uint64_t queue_size = cct->_conf->rgw_gc_max_queue_size, num_deferred_entries = cct->_conf->rgw_gc_max_deferred; | ||
| gc_log_init2(op, queue_size, num_deferred_entries); | ||
| store->gc_operate(this, obj_names[i], &op); | ||
| store->gc_operate(this, obj_names[i], &op, null_yield); |
There was a problem hiding this comment.
you added an optional_yield y argument to RGWGC::initialize(), so you can pass that here. same with RGWGC::send_split_chain() and several other functions in rgw_gc.cc
|
BTW, you can "make tests" (or "ninja tests") to build the tests that make check uses without running them. |
| auto& ref = obj.get_ref(); | ||
| int ret = cls_timeindex_trim_repeat(dpp, ref, oid, utime_t(start_time), utime_t(end_time), | ||
| from_marker, to_marker); | ||
| from_marker, to_marker, null_yield); |
There was a problem hiding this comment.
You added an optional_yield to objexp_hint_trim so you don't need a null_yield here.
|
|
||
| int ret = exp_store.objexp_hint_trim(dpp, shard, rt_from, rt_to, | ||
| from_marker, to_marker); | ||
| from_marker, to_marker, null_yield); |
There was a problem hiding this comment.
Same here, there's an optional_yield added to trim_chunk.
|
|
||
| if (need_trim) { | ||
| trim_chunk(dpp, shard, last_run, round_start, marker, out_marker); | ||
| trim_chunk(dpp, shard, last_run, round_start, marker, out_marker, null_yield); |
There was a problem hiding this comment.
Same here, there's an optional_yield added to process_single_shard.
| ldpp_dout(dpp, 20) << "processing shard = " << shard << dendl; | ||
|
|
||
| if (! process_single_shard(dpp, shard, last_run, round_start)) { | ||
| if (! process_single_shard(dpp, shard, last_run, round_start, null_yield)) { |
There was a problem hiding this comment.
Same here, there's an optional_yield added to inspect_all_shards.
| int RGWRados::try_refresh_bucket_info(RGWBucketInfo& info, | ||
| ceph::real_time *pmtime, | ||
| const DoutPrefixProvider *dpp, | ||
| const DoutPrefixProvider *dpp, optional_yield y, |
There was a problem hiding this comment.
@cbodley It's a knit, but there are dozens of instances in this PR where the optional_yield argument in a function definition is added on the same line as another argument even though every other argument in the function is on it's own line. I think these optional_yields should go on their own line. What do you think?
There was a problem hiding this comment.
i'm generally fine with arguments on the same line as long as it doesn't go over ~80 columns
but if doubling up arguments looks inconsistent with other nearby function definitions, you're welcome to request changes
src/rgw/driver/rados/rgw_rados.cc
Outdated
| bucket->get_info().owner, obj_ctx, | ||
| dest_obj->clone(), olh_epoch, tag, | ||
| dpp, null_yield); | ||
| dpp, y); |
There was a problem hiding this comment.
src/rgw/rgw_crypt.cc
Outdated
| break; | ||
| } | ||
| res = s->bucket->try_refresh_info(s, nullptr); | ||
| res = s->bucket->try_refresh_info(s, nullptr, null_yield); |
There was a problem hiding this comment.
you can use s->yield here and anywhere else there is a req_state which is commonly called s.
src/rgw/rgw_op.cc
Outdated
| s->bucket->get_info().set_sync_policy(std::move(sync_policy)); | ||
|
|
||
| int ret = s->bucket->put_info(this, false, real_time()); | ||
| int ret = s->bucket->put_info(this, false, real_time(), null_yield); |
There was a problem hiding this comment.
This function takes a y.
src/rgw/rgw_op.cc
Outdated
| s->bucket->get_info().set_sync_policy(std::move(sync_policy)); | ||
|
|
||
| int ret = s->bucket->put_info(this, false, real_time()); | ||
| int ret = s->bucket->put_info(this, false, real_time(), null_yield); |
There was a problem hiding this comment.
This function takes a y.
src/rgw/rgw_op.cc
Outdated
| s->bucket->get_info().has_website = true; | ||
| s->bucket->get_info().website_conf = website_conf; | ||
| op_ret = s->bucket->put_info(this, false, real_time()); | ||
| op_ret = s->bucket->put_info(this, false, real_time(), null_yield); |
There was a problem hiding this comment.
This function takes a y.
src/rgw/rgw_op.cc
Outdated
| s->bucket->get_info().has_website = false; | ||
| s->bucket->get_info().website_conf = RGWBucketWebsiteConf(); | ||
| op_ret = s->bucket->put_info(this, false, real_time()); | ||
| op_ret = s->bucket->put_info(this, false, real_time(), null_yield); |
There was a problem hiding this comment.
This function takes a y.
src/rgw/rgw_op.cc
Outdated
| op_ret = retry_raced_bucket_write(this, s->bucket.get(), [this] { | ||
| s->bucket->get_info().obj_lock = obj_lock; | ||
| op_ret = s->bucket->put_info(this, false, real_time()); | ||
| op_ret = s->bucket->put_info(this, false, real_time(), null_yield); |
There was a problem hiding this comment.
This function takes a y.
|
@kapandya great start to this work! On top of the other comments that were left do you have a list of which ops don't have anymore librados blocking warnings anymore and which still do or are untested? What happens with a full run of s3-tests? |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
e047c57 to
6816554
Compare
6816554 to
7abf53e
Compare
7abf53e to
870c141
Compare
870c141 to
0a8ada6
Compare
0a8ada6 to
bd9c8ea
Compare
Passing optional_yield to rgw_rados.cc Signed-off-by: Kalpesh Pandya <[email protected]>
Changing null_yield in rgw_reshard.cc Signed-off-by: Kalpesh Pandya <[email protected]>
Changing null_yield in rgw_sal_rados.cc Signed-off-by: Kalpesh Pandya <[email protected]>
Signed-off-by: Kalpesh Pandya <[email protected]>
6a85f47 to
05ddbb7
Compare
|
we must be very close here, thanks for the rebase, @TRYTOBE8TME :) |
|
I see that make check failed: [ RUN ] Queue.SpawnAsyncRequest |
|
jenkins test make check |
|
jenkins test api |
These went away but now I see ceph API tests check failing and it doesn't show any reason |
looks great! the s3test failure against dbstore is a new regression, and i've pushed a fix in |
|
jenkins test api |
|
thanks @TRYTOBE8TME! are there more null_yields that we need to catch? @adamemerson had found some in #52036 as a next step, i'd like to find a way to add test coverage in teuthology to make sure that no requests are blocking. that might involve a config variable (something like |
|
@cbodley does this PR need to block on adding tests which find more blocking calls we could fix? I think we want to get this one in when it's complete and stable, no? |
maybe there should be a compilation flag to compile out all the blocking rados calls, and compile the RGW with this flag? |
|
@cbodley yes there are some places where |
@mattbenjamin this PR has merged, sir! i think we're close to the finish line here, so i'd like to add some regression testing to catch any remaining blocking calls, and keep us from adding new ones. then we can do some performance measurements to figure out how to size the thread pool optimally |
@yuvalif we still need the blocking mode for radosgw-admin calls, librgw requests, and background threads. it's just the asio threads (like the beast frontend and |
so, can we force an explicit |
|
i added this topic to next week's agenda https://pad.ceph.com/p/rgw-weekly |
Also clears stale pending entries prior to attempting to clear OLH index entries and remove the OLH object. Signed-off-by: Cory Snyder <[email protected]> (cherry picked from commit 3437897) Conflicts: src/rgw/driver/rados/rgw_rados.cc src/rgw/driver/rados/rgw_rados.h Cherry-pick notes: - conflicts due to ceph#50206 on main but not reef
…ror scenarios Makes request threads clean up after themselves and remove their pending xattrs from OLH objects before they return an error from set_olh. Fixes: https://tracker.ceph.com/issues/59663 Signed-off-by: Cory Snyder <[email protected]> (cherry picked from commit 38133e5) Conflicts: src/rgw/driver/rados/rgw_rados.cc src/rgw/driver/rados/rgw_rados.h Cherry-pick notes: - conflicts due to ceph#50206 on main but not reef
When pending_removal is true and the log is trimmed prior to clearing olh, it allows for the possibility that the olh is not cleared (e.g. due to new pending xattrs on the olh obj), but the olh log entries are removed. Signed-off-by: Cory Snyder <[email protected]> (cherry picked from commit dfbe4f9) Conflicts: src/rgw/driver/rados/rgw_rados.cc Cherry-pick notes: - conflicts due to ceph#50206 on main
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows