Skip to content

Core, AWS, REST: Promote the S3 signing endpoint to the main spec#15112

Closed
adutra wants to merge 17 commits intoapache:mainfrom
adutra:promote-sign-endpoint
Closed

Core, AWS, REST: Promote the S3 signing endpoint to the main spec#15112
adutra wants to merge 17 commits intoapache:mainfrom
adutra:promote-sign-endpoint

Conversation

@adutra
Copy link
Contributor

@adutra adutra commented Jan 22, 2026

Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7

This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint.

This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition.

OpenAPI Specification changes:

  • Add /v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider} endpoint to the main REST catalog OpenAPI spec
  • Define RemoteSignRequest, RemoteSignResult and RemoteSignResponse schemas
  • Remove the separate s3-signer-open-api.yaml from the AWS module
  • Update the Python client

Core Module changes (iceberg-core):

  • Add RemoteSignRequest and RemoteSignResponse model classes, copied from the iceberg-aws module
  • Add RemoteSignRequestParser and RemoteSignResponseParser for JSON serialization, copied from the iceberg-aws module
  • Add SIGNER_URI and SIGNER_ENDPOINT properties to CatalogProperties for configuring the signing endpoint
  • Add V1_TABLE_REMOTE_SIGN field and remoteSign() method to ResourcePaths
  • Register the new endpoint in Endpoint.java
  • Add abstract RemoteSignerServlet base class for remote signing tests, copied from the iceberg-aws module

AWS Module changes (iceberg-aws):

  • Deprecate S3SignRequest and S3SignResponse for removal
  • Deprecate S3SignRequestParser and S3SignResponseParser for removal
  • Deprecate S3ObjectMapper for removal
  • Refactor S3SignerServlet to extend RemoteSignerServlet
  • Update S3V4RestSignerClient
  • Move relevant tests to iceberg-core


If remote signing for a specific storage provider is enabled, clients must respect the following configurations when creating a remote signer client:
- `signer.uri`: the base URI of the remote signer endpoint. Optional; if absent, defaults to the catalog's base URI.
- `signer.endpoint`: the path of the remote signer endpoint. Required. Should be concatenated with `signer.uri` to form the complete URI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHOULD or MUST ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's complicated 😄

The signer client impl uses org.apache.iceberg.rest.RESTUtil#resolveEndpoint to perform the concatenation of signer.uri and signer.endpoint.

So, signer.endpoint could also be an absolute URL, in which case, signer.uri would be ignored.

I will try to come up with a better wording.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased, lmk what you think!

allOf:
- $ref: '#/components/schemas/Expression'

MultiValuedMap:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is S3Headers eq section in the s3 signer spec ? can we say like ObjectStoreProviderHeader ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for a more generic name because there is nothing specific to remote signing here. This component could perfectly be used for something else in the spec.

- `s3.secret-access-key`: secret for credentials that provide access to data in S3
- `s3.session-token`: if present, this value should be used for as the session token
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `RemoteSignRequest` schema section of this spec document.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I chose to keep this property specific to S3. I think that even if the signer endpoint is now generic, enablement should be performed for each specific object storage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can actually do google GCS cloud access via its s3 gateway; same signing algorithm, just a few different settings to change listing version, endpoint, &c

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/third_party_stores.md#google-cloud-storage-through-the-s3a-connector

* @deprecated since 1.11.0, will be removed in 1.12.0; use {@link CatalogProperties#SIGNER_URI}
* instead.
*/
@Deprecated public static final String S3_SIGNER_URI = CatalogProperties.SIGNER_URI;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can just change the value here as that would break backwards compatibility

"true",
CatalogProperties.URI,
uri,
CatalogProperties.SIGNER_ENDPOINT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wasn't needed before but is needed now, which indicates that this is a breaking change for users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it's not required, but it will become required in a future release (1.12 or later).

There is a check + warning here:

https://github.com/adutra/iceberg/blob/f3fc09595bc47d4f0ba7a879d3c7eaf3d70c1e38/aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java#L219-L224

I proactively updated the tests so that they don't break when we make this property required.


paths:

/v1/aws/s3/sign:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we would want to remove this spec yet. We should probably first deprecate it

}
}

public static class RemoteSignRequestSerializer<T extends RemoteSignRequest>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these all should probably just be package-private and not public

gen.writeEndArray();
}
gen.writeEndObject();
public static void headersToJson(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether we need to make this one and the one below public

default:
throw new UnsupportedOperationException("Unsupported grant_type: " + grantType);
protected void validateSignRequest(RemoteSignRequest request) {
if ("POST".equalsIgnoreCase(request.method()) && request.uri().getQuery().contains("delete")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor : should we use rest constants instead of raw string for "POST" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I went back to raw strings because org.apache.iceberg.rest.HttpMethod is missing constants for PUT, OPTIONS, etc. which was causing integration tests to fail.

* @param response the HTTP response to add headers to
*/
protected void addSignResponseHeaders(RemoteSignRequest request, HttpServletResponse response) {
if (request.method().equalsIgnoreCase("GET") || request.method().equalsIgnoreCase("HEAD")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we previously had this defined in a CACHEABLE_METHODS set, so would be good to keep this for easier readability

Preconditions.checkArgument(
properties().containsKey(S3_SIGNER_URI) || properties().containsKey(CatalogProperties.URI),
properties().containsKey(S3_SIGNER_URI)
|| properties().containsKey(RESTCatalogProperties.SIGNER_URI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some tests for these new properties to testS3RemoteSignerWithoutUri()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, but the tests look very similar to the ones in TestS3V4RestSignerClient.legacySignerProperties().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I actually meant to only add tests to testS3RemoteSignerWithoutUri(), which verifies that the error msg S3 signer service URI is required is properly thrown when any of these new props are missing. No need to duplicate TestS3V4RestSignerClient.legacySignerProperties() into TestS3FileIOProperties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the error is only thrown when all of these props are missing. So the existing test is imho already testing sufficiently. Wdyt?

}

/**
* The base URI of the remote signer endpoint. Optional, defaults to {@linkplain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not just {@link CatalogProperties#URI} here?


lint-spec:
uv run yamllint --strict rest-catalog-open-api.yaml
uv run yamllint --strict ../aws/src/main/resources/s3-signer-open-api.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still leave this in until we actually remove the openapi spec file

* @deprecated since 1.11.0, will be removed in 1.12.0; use {@link
* org.apache.iceberg.rest.requests.RemoteSignRequestParser} instead.
*/
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually liked that you switched the impl here to using functionality from the new RemoteSignRequestParser for stuff, not sure why you decided to change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK 😅 I thought it was a bit too invasive. Let me go back to the previous version.

* @deprecated since 1.11.0, will be removed in 1.12.0; the serializers for S3 signing are now
* registered in {@link RESTSerializers}.
*/
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class I would probably just deprecate and not switch to using RESTSerializers.registerAll(MAPPER).

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, but left a few minor comments that would be good to address

@adutra adutra force-pushed the promote-sign-endpoint branch from 7ee473b to 90e4eba Compare January 30, 2026 16:02
@adutra
Copy link
Contributor Author

adutra commented Jan 30, 2026

Heads up: I had to rebase because of a conflict with fec9800.

@adutra adutra force-pushed the promote-sign-endpoint branch from 0fcc2da to ee33e03 Compare January 30, 2026 16:46
Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7

This commit promotes the S3 remote signing endpoint from an AWS-specific
implementation to a first-class REST catalog API endpoint.

This enables other storage providers (GCS, Azure, etc.) to eventually reuse
the same signing endpoint pattern without duplicating the API definition.

OpenAPI Specification changes:

- Add `/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}`
  endpoint to the main REST catalog OpenAPI spec
- Define `RemoteSignRequest`, `RemoteSignResult` and `RemoteSignResponse` schemas
- Remove the separate `s3-signer-open-api.yaml` from the AWS module
- Update the Python client

Core Module changes (iceberg-core):

- Add `RemoteSignRequest` and `RemoteSignResponse` model classes, copied from
  the iceberg-aws module
- Add `RemoteSignRequestParser` and `RemoteSignResponseParser` for JSON
  serialization, copied from the iceberg-aws module
- Add `SIGNER_URI` and `SIGNER_ENDPOINT` properties to `CatalogProperties`
  for configuring the signing endpoint
- Add `V1_TABLE_REMOTE_SIGN` field and `remoteSign()` method to
  `ResourcePaths`
- Register the new endpoint in `Endpoint.java`
- Add abstract `RemoteSignerServlet` base class for remote signing tests, copied
  from the iceberg-aws module

AWS Module changes (iceberg-aws):

- Deprecate `S3SignRequest` and `S3SignResponse` for removal
- Deprecate `S3SignRequestParser` and `S3SignResponseParser` for removal
- Deprecate `S3ObjectMapper` for removal
- Refactor `S3SignerServlet` to extend `RemoteSignerServlet`
- Update `S3V4RestSignerClient`
@adutra adutra force-pushed the promote-sign-endpoint branch from 914bb9b to 852f1cb Compare February 10, 2026 17:12
for requests where the body of the message contains content which must be validated before a request is
signed, such as the S3 DeleteObjects call.
provider:
type: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we define an ENUM for this, as s3a / s3e .... can create confusion for the catalog, we can say all are s3 ? similar for GCS / Azure check the resolving file IO i believe we do something similar there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An enum means we need a spec change for every new provider – unless we go ahead an create enums for all major cloud providers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless we go ahead an create enums for all major cloud providers?

thats what we had in mind, like let say s3 / azure / gcp and we can add more in case to case basis, open string still leave room for interpretation in catalog is what i think, but i can be convinced other wise :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'm fine with that! I just introduced the enum, the constants are s3, gcs and adls (these names match the naming convention used by ResolvingFileIO).

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec changes looks good to me.
Added a suggestion for the provider being ENUM

@steveloughran
Copy link
Contributor

@adutra I'm doing some stuff with the signer (#15417 with draft pr and tests in #15428 15428). The s3 signer needs to track which headers are being signed, so when it is safe/unsafe to use the cached signature. new Range? good everwhere. if-modified-since? only with #15428 in. new aws-encryption option: nothing.

Either it parses the signature string coming back from the s3 servlet or it gets back a header explicitly listing those headers signed. That seems cleaner, but might need to be part of this spec

  • if x-iceberg-cached-headers comes back with a list of headers, include these in the cache alongside the signature
  • when checking to see if a request can be signed, first check verb and uri, then compare the signed headers, and only reuse if the headers values are identical.

Given this PR is explicitly a promotion of the current signing, it'll have to be a followup. It's just at the moment the signing is very brittle

- s3
- gcs
- adls
default: s3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we maybe omit the default? I also don't see defaulting to s3 anywhere in the java impl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra let's move the conversation to #15450.

@adutra
Copy link
Contributor Author

adutra commented Feb 26, 2026

FYI in yesterday's catalog sync it was decided to split this PR in two: one PR for spec changes (#15450) and another for code changes (TDB).

Apologies to the many reviewers for the inconvenience. Please review #15450 if you can 🙏

@adutra adutra closed this Feb 26, 2026
@adutra
Copy link
Contributor Author

adutra commented Feb 26, 2026

The code changes PR is also up for review: #15451.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants