Core, AWS, REST: Promote the S3 signing endpoint to the main spec#15112
Core, AWS, REST: Promote the S3 signing endpoint to the main spec#15112adutra wants to merge 17 commits intoapache:mainfrom
Conversation
ad95a85 to
f3fc095
Compare
open-api/rest-catalog-open-api.yaml
Outdated
|
|
||
| If remote signing for a specific storage provider is enabled, clients must respect the following configurations when creating a remote signer client: | ||
| - `signer.uri`: the base URI of the remote signer endpoint. Optional; if absent, defaults to the catalog's base URI. | ||
| - `signer.endpoint`: the path of the remote signer endpoint. Required. Should be concatenated with `signer.uri` to form the complete URI. |
There was a problem hiding this comment.
It's complicated 😄
The signer client impl uses org.apache.iceberg.rest.RESTUtil#resolveEndpoint to perform the concatenation of signer.uri and signer.endpoint.
So, signer.endpoint could also be an absolute URL, in which case, signer.uri would be ignored.
I will try to come up with a better wording.
There was a problem hiding this comment.
Rephrased, lmk what you think!
| allOf: | ||
| - $ref: '#/components/schemas/Expression' | ||
|
|
||
| MultiValuedMap: |
There was a problem hiding this comment.
I believe this is S3Headers eq section in the s3 signer spec ? can we say like ObjectStoreProviderHeader ?
There was a problem hiding this comment.
I went for a more generic name because there is nothing specific to remote signing here. This component could perfectly be used for something else in the spec.
| - `s3.secret-access-key`: secret for credentials that provide access to data in S3 | ||
| - `s3.session-token`: if present, this value should be used for as the session token | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `RemoteSignRequest` schema section of this spec document. |
There was a problem hiding this comment.
FYI I chose to keep this property specific to S3. I think that even if the signer endpoint is now generic, enablement should be performed for each specific object storage.
There was a problem hiding this comment.
you can actually do google GCS cloud access via its s3 gateway; same signing algorithm, just a few different settings to change listing version, endpoint, &c
aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java
Outdated
Show resolved
Hide resolved
| * @deprecated since 1.11.0, will be removed in 1.12.0; use {@link CatalogProperties#SIGNER_URI} | ||
| * instead. | ||
| */ | ||
| @Deprecated public static final String S3_SIGNER_URI = CatalogProperties.SIGNER_URI; |
There was a problem hiding this comment.
I don't think we can just change the value here as that would break backwards compatibility
| "true", | ||
| CatalogProperties.URI, | ||
| uri, | ||
| CatalogProperties.SIGNER_ENDPOINT, |
There was a problem hiding this comment.
this wasn't needed before but is needed now, which indicates that this is a breaking change for users?
There was a problem hiding this comment.
For now it's not required, but it will become required in a future release (1.12 or later).
There is a check + warning here:
I proactively updated the tests so that they don't break when we make this property required.
|
|
||
| paths: | ||
|
|
||
| /v1/aws/s3/sign: |
There was a problem hiding this comment.
I don't think we would want to remove this spec yet. We should probably first deprecate it
| } | ||
| } | ||
|
|
||
| public static class RemoteSignRequestSerializer<T extends RemoteSignRequest> |
There was a problem hiding this comment.
these all should probably just be package-private and not public
| gen.writeEndArray(); | ||
| } | ||
| gen.writeEndObject(); | ||
| public static void headersToJson( |
There was a problem hiding this comment.
not sure whether we need to make this one and the one below public
| default: | ||
| throw new UnsupportedOperationException("Unsupported grant_type: " + grantType); | ||
| protected void validateSignRequest(RemoteSignRequest request) { | ||
| if ("POST".equalsIgnoreCase(request.method()) && request.uri().getQuery().contains("delete")) { |
There was a problem hiding this comment.
minor : should we use rest constants instead of raw string for "POST" ?
There was a problem hiding this comment.
Sorry I went back to raw strings because org.apache.iceberg.rest.HttpMethod is missing constants for PUT, OPTIONS, etc. which was causing integration tests to fail.
| * @param response the HTTP response to add headers to | ||
| */ | ||
| protected void addSignResponseHeaders(RemoteSignRequest request, HttpServletResponse response) { | ||
| if (request.method().equalsIgnoreCase("GET") || request.method().equalsIgnoreCase("HEAD")) { |
There was a problem hiding this comment.
we previously had this defined in a CACHEABLE_METHODS set, so would be good to keep this for easier readability
| Preconditions.checkArgument( | ||
| properties().containsKey(S3_SIGNER_URI) || properties().containsKey(CatalogProperties.URI), | ||
| properties().containsKey(S3_SIGNER_URI) | ||
| || properties().containsKey(RESTCatalogProperties.SIGNER_URI) |
There was a problem hiding this comment.
please add some tests for these new properties to testS3RemoteSignerWithoutUri()
There was a problem hiding this comment.
Added, but the tests look very similar to the ones in TestS3V4RestSignerClient.legacySignerProperties().
There was a problem hiding this comment.
sorry, I actually meant to only add tests to testS3RemoteSignerWithoutUri(), which verifies that the error msg S3 signer service URI is required is properly thrown when any of these new props are missing. No need to duplicate TestS3V4RestSignerClient.legacySignerProperties() into TestS3FileIOProperties
There was a problem hiding this comment.
But the error is only thrown when all of these props are missing. So the existing test is imho already testing sufficiently. Wdyt?
| } | ||
|
|
||
| /** | ||
| * The base URI of the remote signer endpoint. Optional, defaults to {@linkplain |
There was a problem hiding this comment.
nit: why not just {@link CatalogProperties#URI} here?
core/src/main/java/org/apache/iceberg/rest/requests/RemoteSignRequestParser.java
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/responses/RemoteSignResponseParser.java
Show resolved
Hide resolved
|
|
||
| lint-spec: | ||
| uv run yamllint --strict rest-catalog-open-api.yaml | ||
| uv run yamllint --strict ../aws/src/main/resources/s3-signer-open-api.yaml |
There was a problem hiding this comment.
I think we should still leave this in until we actually remove the openapi spec file
| * @deprecated since 1.11.0, will be removed in 1.12.0; use {@link | ||
| * org.apache.iceberg.rest.requests.RemoteSignRequestParser} instead. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
I actually liked that you switched the impl here to using functionality from the new RemoteSignRequestParser for stuff, not sure why you decided to change that
There was a problem hiding this comment.
OK 😅 I thought it was a bit too invasive. Let me go back to the previous version.
| * @deprecated since 1.11.0, will be removed in 1.12.0; the serializers for S3 signing are now | ||
| * registered in {@link RESTSerializers}. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
this class I would probably just deprecate and not switch to using RESTSerializers.registerAll(MAPPER).
nastra
left a comment
There was a problem hiding this comment.
overall LGTM, but left a few minor comments that would be good to address
7ee473b to
90e4eba
Compare
|
Heads up: I had to rebase because of a conflict with fec9800. |
0fcc2da to
ee33e03
Compare
Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7 This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint. This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition. OpenAPI Specification changes: - Add `/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}` endpoint to the main REST catalog OpenAPI spec - Define `RemoteSignRequest`, `RemoteSignResult` and `RemoteSignResponse` schemas - Remove the separate `s3-signer-open-api.yaml` from the AWS module - Update the Python client Core Module changes (iceberg-core): - Add `RemoteSignRequest` and `RemoteSignResponse` model classes, copied from the iceberg-aws module - Add `RemoteSignRequestParser` and `RemoteSignResponseParser` for JSON serialization, copied from the iceberg-aws module - Add `SIGNER_URI` and `SIGNER_ENDPOINT` properties to `CatalogProperties` for configuring the signing endpoint - Add `V1_TABLE_REMOTE_SIGN` field and `remoteSign()` method to `ResourcePaths` - Register the new endpoint in `Endpoint.java` - Add abstract `RemoteSignerServlet` base class for remote signing tests, copied from the iceberg-aws module AWS Module changes (iceberg-aws): - Deprecate `S3SignRequest` and `S3SignResponse` for removal - Deprecate `S3SignRequestParser` and `S3SignResponseParser` for removal - Deprecate `S3ObjectMapper` for removal - Refactor `S3SignerServlet` to extend `RemoteSignerServlet` - Update `S3V4RestSignerClient`
This reverts commit 432ca04.
914bb9b to
852f1cb
Compare
| for requests where the body of the message contains content which must be validated before a request is | ||
| signed, such as the S3 DeleteObjects call. | ||
| provider: | ||
| type: string |
There was a problem hiding this comment.
should we define an ENUM for this, as s3a / s3e .... can create confusion for the catalog, we can say all are s3 ? similar for GCS / Azure check the resolving file IO i believe we do something similar there
There was a problem hiding this comment.
An enum means we need a spec change for every new provider – unless we go ahead an create enums for all major cloud providers?
There was a problem hiding this comment.
unless we go ahead an create enums for all major cloud providers?
thats what we had in mind, like let say s3 / azure / gcp and we can add more in case to case basis, open string still leave room for interpretation in catalog is what i think, but i can be convinced other wise :)
There was a problem hiding this comment.
OK I'm fine with that! I just introduced the enum, the constants are s3, gcs and adls (these names match the naming convention used by ResolvingFileIO).
singhpk234
left a comment
There was a problem hiding this comment.
Spec changes looks good to me.
Added a suggestion for the provider being ENUM
|
@adutra I'm doing some stuff with the signer (#15417 with draft pr and tests in #15428 15428). The s3 signer needs to track which headers are being signed, so when it is safe/unsafe to use the cached signature. new Range? good everwhere. if-modified-since? only with #15428 in. new aws-encryption option: nothing. Either it parses the signature string coming back from the s3 servlet or it gets back a header explicitly listing those headers signed. That seems cleaner, but might need to be part of this spec
Given this PR is explicitly a promotion of the current signing, it'll have to be a followup. It's just at the moment the signing is very brittle |
| - s3 | ||
| - gcs | ||
| - adls | ||
| default: s3 |
There was a problem hiding this comment.
should we maybe omit the default? I also don't see defaulting to s3 anywhere in the java impl
|
The code changes PR is also up for review: #15451. |
Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7
This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint.
This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition.
OpenAPI Specification changes:
/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}endpoint to the main REST catalog OpenAPI specRemoteSignRequest,RemoteSignResultandRemoteSignResponseschemass3-signer-open-api.yamlfrom the AWS moduleCore Module changes (iceberg-core):
RemoteSignRequestandRemoteSignResponsemodel classes, copied from the iceberg-aws moduleRemoteSignRequestParserandRemoteSignResponseParserfor JSON serialization, copied from the iceberg-aws moduleSIGNER_URIandSIGNER_ENDPOINTproperties toCatalogPropertiesfor configuring the signing endpointV1_TABLE_REMOTE_SIGNfield andremoteSign()method toResourcePathsEndpoint.javaRemoteSignerServletbase class for remote signing tests, copied from the iceberg-aws moduleAWS Module changes (iceberg-aws):
S3SignRequestandS3SignResponsefor removalS3SignRequestParserandS3SignResponseParserfor removalS3ObjectMapperfor removalS3SignerServletto extendRemoteSignerServletS3V4RestSignerClient