Skip to content

Comments

rgw: cksum: implement support for new CRC64NVME checksum algorithm and related fixes#61878

Merged
cbodley merged 5 commits intoceph:mainfrom
linuxbox2:wip-cksum-badalg
Mar 31, 2025
Merged

rgw: cksum: implement support for new CRC64NVME checksum algorithm and related fixes#61878
cbodley merged 5 commits intoceph:mainfrom
linuxbox2:wip-cksum-badalg

Conversation

@mattbenjamin
Copy link
Contributor

@mattbenjamin mattbenjamin commented Feb 18, 2025

Fixes: https://tracker.ceph.com/issues/70040

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@mattbenjamin mattbenjamin self-assigned this Feb 18, 2025
@mattbenjamin mattbenjamin requested a review from a team as a code owner February 18, 2025 16:39
@mattbenjamin mattbenjamin force-pushed the wip-cksum-badalg branch 3 times, most recently from 716488b to d6b81e1 Compare February 24, 2025 19:44
@mattbenjamin
Copy link
Contributor Author

(there is no relationship to nvmeof; one commit incorrectly changed the nvmeof/gateway submodule, that's fixed)

@cbodley
Copy link
Contributor

cbodley commented Feb 26, 2025

now that you're using the madler crc library, do we still need the bits from spdk? if so, is there a reason we can't pull up our spdk submodule and link to that?

are the madler files coming from this https://github.com/madler/crcany repo? why not add that as a submodule?

p.s. please update COPYING with the added licenses

@mattbenjamin
Copy link
Contributor Author

mattbenjamin commented Mar 2, 2025

some todo-list items:

  1. handle client requested checksum "type" (COMPOSITE or FULL_OBJECT)
  2. improve the abstraction around "type" without undue ugliness
  3. unit tests for combiners--would like a full matrix of algorithms and optional type
  4. add to copying (thanks)
  5. add s3-tests

@mattbenjamin
Copy link
Contributor Author

mattbenjamin commented Mar 2, 2025

spdk: as discussed, the checksums here have simd acceleration and do seem to be in both isa-l standalone and the most recent versions of spdk, so there will be some way to remove these, but right now, I seem to need the table versions here. Can be improved with some build work. crcany (madler) provides table-based combining methods that work, I intend to rely on them only for that, and as cross checks on the others.

@mattbenjamin
Copy link
Contributor Author

crcany: it could be a submodule; is it worth it?

@mattbenjamin
Copy link
Contributor Author

related s3-tests branch: ceph/s3-tests#626

@mattbenjamin
Copy link
Contributor Author

@cbodley just rebased, am having confusing misbehavior. I've successfully completed a multipart upload with CRC64NVME checksum. but when my s3-test does a HEAD request with ChecksumMode = ENABLED, it is getting the right data from the stored checksujm it is adding the correct information to the formatter (below), but the s3-test logic is failing with

FAILED s3tests_boto3/functional/test_s3.py::test_multipart_use_cksum_helper_crc64nvme - KeyError: 'ChecksumCRC64NVME'

here's what the rgw log says:
2025-03-03T15:56:30.627-0500 7f809261b6c0 20 req 8392445594298461084 0.004000062s s3:get_obj Read xattr rgw_rados: user.rgw.x-amz-checksum-type 2025-03-03T15:56:30.627-0500 7f809261b6c0 10 req 8392445594298461084 0.004000062s cache get: name=default.rgw.log++script.getdata. : hit (negative entry) 2025-03-03T15:56:30.627-0500 7f809261b6c0 15 req 8392445594298461084 0.004000062s Encryption mode: 2025-03-03T15:56:30.627-0500 7f809261b6c0 16 req 8392445594298461084 0.004000062s s3:get_obj INFO: send_response_data ChecksumMode==ENABLED element-name ChecksumCRC64NVME value i+6LR0y3eFo= 2025-03-03T15:56:30.627-0500 7f809261b6c0 2 req 8392445594298461084 0.004000062s s3:get_obj completing 2025-03-03T15:56:30.627-0500 7f809261b6c0 10 req 8392445594298461084 0.004000062s cache get: name=default.rgw.log++script.postrequest. : hit (negative entry) 2025-03-03T15:56:30.627-0500 7f809261b6c0 2 req 8392445594298461084 0.004000062s s3:get_obj op status=0 2025-03-03T15:56:30.627-0500 7f809261b6c0 2 req 8392445594298461084 0.004000062s s3:get_obj http status=200 2025-03-03T15:56:30.627-0500 7f809261b6c0 1 ====== req done req=0x7f804fd75180 op=get_obj bucket=mattb-buspqoc7wdy5sjs4v14y0go-1 status=0 http_status=200 latency=0.004000062s request_id=tx000007477f4cd8cf3ab9c-0067c6177e-4209-default ======

@mattbenjamin
Copy link
Contributor Author

I've debugged this, the head request is succeeding, but just doesn't have the checksum element in it

@mattbenjamin
Copy link
Contributor Author

@cbodley ok never mind, thanks for spotting the issue in slack

@mattbenjamin mattbenjamin requested a review from a team as a code owner March 5, 2025 18:57
__func__, cksum.element_name(), cksum.to_armor());
dump_header(s, cksum.header_name(), armored_cksum);
}
dump_header(s, "ChecksumType", std::get<1>(cksum_type));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChecksumType -> x-amz-checksum-type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, oops.

@mattbenjamin
Copy link
Contributor Author

wtf? in ceph-api-tests:
image

@mattbenjamin
Copy link
Contributor Author

also, github is saying there are modified submodules, but I don't think that's the case

@mattbenjamin
Copy link
Contributor Author

I'll check into whether jenkins is broken tomorrow, before rebasing

@cbodley
Copy link
Contributor

cbodley commented Mar 26, 2025

wtf? in ceph-api-tests: image

not your fault. there's a fix in #62515

@rzarzynski
Copy link
Contributor

Should be good now; retriggering the check bots.

@rzarzynski
Copy link
Contributor

jenkins retest this please

Support uses conditionally compiled implementations from spdk
and ISA-L.

Validated against SPDK test vectors and one example generated and
passed via awscliv2 (AWS SDK) version 2.24.5.

Restored the ability of unittest_rgw_cksum to create a binary input
file for external checksum testing, but only when requested via
cmdline option.

Fixes: https://tracker.ceph.com/issues/70040

Signed-off-by: Matt Benjamin <[email protected]>
These constructions provide conforming implemenations of Amazon
S3 CRC32 (ISO hdlc), CRC32C (iscsi), and CRC64/NVME checksums,
in particular, linear combining of adjacent checksums, by
Mark Adler.

Source: https://github.com/madler/crcany
License: approximate BSD with "optional" advertising clause

Signed-off-by: Matt Benjamin <[email protected]>
@mattbenjamin
Copy link
Contributor Author

jenkins test submodules

@mattbenjamin mattbenjamin reopened this Mar 27, 2025
@mattbenjamin mattbenjamin force-pushed the wip-cksum-badalg branch 2 times, most recently from 51625bc to 2c037bf Compare March 27, 2025 03:14
@mattbenjamin
Copy link
Contributor Author

@mattbenjamin
Copy link
Contributor Author

mattbenjamin commented Mar 28, 2025

the earlier run passed, but was missing some validation steps, re-run (also, rebased to the latest s3-tests/master):
https://pulpito.ceph.com/mbenjamin-2025-03-28_01:54:23-rgw-wip-cksum-badalg-distro-default-gibba/

@cbodley
Copy link
Contributor

cbodley commented Mar 31, 2025

consider squashing some of the fix commits? especially if you're interested in backports

@mattbenjamin
Copy link
Contributor Author

* internally compensate for at-rest byteswapped crc64 representation (e.g., before combine step)
* generalize Cksum crc api for 32bit and 64bit crcs, other cleanup
* prototype abstract cksum::Combiner workflow
* add support for forward and backward handling of composite vs full object checksums
  by marking the composites and the update flag day
* clean up checksum formatting and checksum type reporting
* add unit tests for Combiner interface
* validate and track requested checksum type (i.e., composite or full), plus
  unit test fixture for combinations of full matrix of cksum types
* doh.  GET/HEAD checksums are in headers
* add crcany license to COPYING
* return ChecksumType as header in GET/HEAD

    Found by Casey in review

* avoid fmt of char* null when no checksum header supplied

    A cksum_hdr_t(nullptr, nullptr) results in this case, and is intended,
    but its components obviously can't be presented to std::format unless
    cast to a safe pointer type.

* fail checksum mismatch with BadDigest

    When uploading an S3 object with an invalid checksum, the return code
    should be BadDigest to mirror more closely the AWS S3 implementation.
    See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

    Fixes: https://tracker.ceph.com/issues/70614

    Reported by Alex Wojno <[email protected]>

* fix comparison and display of composite checksums

A string comparision bounds error and a bitmask comparison
    error are fixed.

* fix build on centos9

On gcc(?13?) we can't declare rgw::cksum::FLAG_NONE
    and also rgw::cksum::Cksum::FlAG_NONE.

    SAD!!

* include <variant> invariantly

* fix checksum type return from complete-multipart

    This one is in the XML response

* aieee, don't leak Combiners

   Use unique_ptr to polymorphic type...correctly.

Signed-off-by: Matt Benjamin <[email protected]>
@mattbenjamin
Copy link
Contributor Author

jenkins test make check

@cbodley cbodley merged commit f9a8403 into ceph:main Mar 31, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants