Skip to content

fix(storage): use hash values in InsertObject()#7025

Merged
coryan merged 2 commits intogoogleapis:mainfrom
coryan:fix-storage-checksum-values-for-insert-object
Jul 20, 2021
Merged

fix(storage): use hash values in InsertObject()#7025
coryan merged 2 commits intogoogleapis:mainfrom
coryan:fix-storage-checksum-values-for-insert-object

Conversation

@coryan
Copy link
Copy Markdown
Contributor

@coryan coryan commented Jul 20, 2021

If the application provides a gcs::Crc32ChecksumValue() or
gcs::MD5HashValue() to InsertObject() we should use those values to
create the metadata portion of the upload. Otherwise we are losing an
opportunity to validate the data integrity during the upload.

I discovered this while refactoring and restructuring the integration
tests for CRC32C checksums and MD5 hashes. Apologies for the massive
changes in the tests for what amounts to a small fix.

The tests now have 3 portions, one for InsertObject(), one for
WriteObject() and one for ReadObject(). Unlike before, the code for
XML and JSON tests is shared, though this required some changes in the
emulator to make it easier to detect what hash fields (if any) were
including in the upload.

For InsertObject() and WriteObject() we test 5 scenarios: (1) using
the default settings, which are enabled for CRC32C, and disabled for
MD5, (2) explicitly disabling the hash, (3) explicitly enabling the
hash, (4) setting the hash to the correct value, and (5) setting the
hash to the incorrect value.

For WriteObject() we also test detecting an invalid hash, both in the
case where the library computes the hash and the case where the
application provides it.

For ReadObject() it is simpler, because we cannot provide expected
hashes. So we only test the cases where the hash check is in its default
setting (enabled for CRC32, disabled for MD5) and that hash mistmatches
are detected when the hash is enabled.

Fixes #7014


This change is Reviewable

If the application provides a `gcs::Crc32ChecksumValue()` or
`gcs::MD5HashValue()` to `InsertObject()` we should use those values to
create the metadata portion of the upload. Otherwise we are losing an
opportunity to validate the data integrity during the upload.

I discovered this while refactoring and restructuring the integration
tests for CRC32C checksums and MD5 hashes. Apologies for the massive
changes in the tests for what amounts to a small fix.

The tests now have 3 portions, one for `InsertObject()`, one for
`WriteObject()` and one for `ReadObject()`. Unlike before, the code for
XML and JSON tests is shared, though this required some changes in the
emulator to make it easier to detect what hash fields (if any) were
including in the upload.

For `InsertObject()` and `WriteObject()` we test 5 scenarios: (1) using
the default settings, which are enabled for CRC32C, and disabled for
MD5, (2) explicitly disabling the hash, (3) explicitly enabling the
hash, (4) setting the hash to the correct value, and (5) setting the
hash to the incorrect value.

For `WriteObject()` we also test detecting an invalid hash, both in the
case where the library computes the hash and the case where the
application provides it.

For `ReadObject()` it is simpler, because we cannot provide expected
hashes. So we only test the cases where the hash check is in its default
setting (enabled for CRC32, disabled for MD5) and that hash mistmatches
are detected when the hash is enabled.
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 20, 2021
@product-auto-label product-auto-label Bot added the api: storage Issues related to the Cloud Storage API. label Jul 20, 2021
@google-cloud-cpp-bot
Copy link
Copy Markdown
Contributor

Google Cloud Build Logs
For commit: 29ccf973fd5e4f394601345bbab209a867eca01d

ℹ️ NOTE: Kokoro logs are linked from "Details" below.

@google-cloud-cpp-bot
Copy link
Copy Markdown
Contributor

Google Cloud Build Logs
For commit: 15bfd91646292bea8c82cf4de848a07c36b94cdd

ℹ️ NOTE: Kokoro logs are linked from "Details" below.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 20, 2021

Codecov Report

Merging #7025 (15bfd91) into main (790904f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #7025   +/-   ##
=======================================
  Coverage   94.47%   94.47%           
=======================================
  Files        1305     1305           
  Lines      113388   113350   -38     
=======================================
- Hits       107119   107084   -35     
+ Misses       6269     6266    -3     
Impacted Files Coverage Δ
google/cloud/storage/internal/curl_client.cc 98.13% <100.00%> (+0.22%) ⬆️
...e/cloud/storage/testing/storage_integration_test.h 100.00% <100.00%> (ø)
.../storage/tests/object_checksum_integration_test.cc 100.00% <100.00%> (ø)
...loud/storage/tests/object_hash_integration_test.cc 100.00% <100.00%> (ø)
...bigtable/examples/bigtable_hello_instance_admin.cc 81.31% <0.00%> (-2.20%) ⬇️
...sub/internal/batching_publisher_connection_test.cc 97.60% <0.00%> (-0.21%) ⬇️
google/cloud/storage/hashing_options.h 100.00% <0.00%> (ø)
...cloud/pubsub/internal/subscription_session_test.cc 98.27% <0.00%> (+0.24%) ⬆️
google/cloud/bigtable/async_read_stream_test.cc 97.99% <0.00%> (+0.66%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 790904f...15bfd91. Read the comment docs.

@coryan coryan marked this pull request as ready for review July 20, 2021 13:18
@coryan coryan requested a review from a team July 20, 2021 13:18
@coryan
Copy link
Copy Markdown
Contributor Author

coryan commented Jul 20, 2021

If you are reviewing this: you might be better off thinking of the integration tests as completely new code.

Comment thread google/cloud/storage/internal/curl_client.cc
@coryan coryan merged commit a3df9c7 into googleapis:main Jul 20, 2021
@coryan coryan deleted the fix-storage-checksum-values-for-insert-object branch July 20, 2021 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the Cloud Storage API. cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit use of pre-computed hashes for uploads and downloads

4 participants