Add tenant resolver by simonswine · Pull Request #3486 · cortexproject/cortex

simonswine · 2020-11-11T11:31:39Z

What this PR does:

Which issue(s) this PR fixes:

This implements the multi tenant resolver as described by the proposal for multi tenant query-federation.

By default it behaves like before, but it's implementation can be swapped out.

This replaces:

user.ExtractOrgID by either TenantID or UserID depending on if in this context a multi tenant usage should be allowed
user.ExtractOrgIDFromHTTPRequest with ExtractTenantIDFromHTTPRequest to gather exactly one tenant id.

I have also added lint test to check for usage of the upstream methods directly

Checklist

Tests updated
[not necessary] Documentation added
[not necessary] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

simonswine · 2020-11-11T11:42:11Z

pkg/tenant/resolver.go

Using a global variable here is not something I am particularly happy about.

I have tried to pass a tenantResolver reference around that would itself be instantiated in a Cortex module. But this approach has a massive changeset, as its touching so many areas of the codebase and I decided to go with this approach.

I think a global is definitely acceptable in this case. A package could always instantiate it's own resolver down the line if there was a reason to go down that path.

jtlisi

LGTM

jtlisi · 2020-11-11T21:13:19Z

pkg/tenant/resolver.go

I think a global is definitely acceptable in this case. A package could always instantiate it's own resolver down the line if there was a reason to go down that path.

jtlisi · 2020-11-11T21:16:47Z

pkg/tenant/resolver.go

Cool, so we are going to continue using the weaveworks/common library as is? It just won't be aware of how the header value will be resolved.

Yes being able intercept headers would require us to do something that I have trailed #3416, which came with a massive change set across the project.

jtlisi · 2020-11-11T21:22:14Z

pkg/alertmanager/api.go

suggestion(if-minor): Instead of retrieving the default resolver and calling these functions, what if you expose the functions of the Resolver interface as top level functions in the tenant package and just assume they will utilize the default in the package. That could make the syntax a bit cleaner.

Suggested change

userID, err := tenant.DefaultResolver().TenantID(r.Context())

userID, err := tenant.TenantID(r.Context())

I have done that, golint complained about the stuttering issue: tenant.TenantID[s], but I think it's important to have it there to make it clear UserID vs TenantID.

pkg/configs/api/api.go

csmarchbanks

I would love some more detail around when UserID will be used vs TenantID/TenantIDs. I don't see any info in the proposal around that (or I am blind which is totally possible). Is this meant to be something like a username? Just an extra header that will be passed?

The comment "used to identify a user in log messages" scares me a bit as we could be getting into PII which I certainly do not want to encourage people to log.

simonswine · 2020-11-19T18:24:34Z

I would love some more detail around when UserID will be used vs TenantID/TenantIDs. I don't see any info in the proposal around that (or I am blind which is totally possible). Is this meant to be something like a username? Just an extra header that will be passed?

@csmarchbanks agreed. This is something where the current comments on the interface tenant.Resolver are not enough and we potentially need to update the proposal to define that properly.

What I think we should be trying to solve is how to move away from the overloaded meaning that is currently derived from user.ExtractOrgID. It is the tenant ID, but it's also used for identifying limits and the origin of certain requests in the logs and metrics. What I tried to do in the scope of this PR is to ensure we call the correct method depending on which of those things the intent is at the point where currently user.ExtractOrgID is called.

While I think using the default settings cortex will always behave TentantID() == TenantIDs[0] == UserID() (and everything else would break compatibility expectations), it would be a good opportunity to evolve the tenant.Resolver by the experiences we gain from adding more experimental features like this one.

As you are aware we are using Cortex as a basis of Grafana Metric Enterprise and we have some plans that involve distinguishing more details between different actors using the API within the same tenant. I see that as a first step in getting a good understanding what is involved there.

I hope that makes sense and wonder what if you think this could go forward, with some of that baked into the comments of the tenant.Resolver?

pstibrany · 2020-11-19T19:10:31Z

What I think we should be trying to solve is how to move away from the overloaded meaning that is currently derived from user.ExtractOrgID. It is the tenant ID, but it's also used for identifying limits and the origin of certain requests in the logs and metrics.

I don't see this as overloaded. Introducing new userID concept is confusing, since in Cortex we don't deal with individual users in any way.

While I think using the default settings cortex will always behave TentantID() == TenantIDs[0] == UserID()

In the proposal, multiple tenants were sorted. I am bit confused by this statement (esp. by "TenantIDs[0]" part, since that will always be tenant with lowest ID??)

pstibrany

Using userID without tenantID is misleading. Different tenants may have same set of users. For log messages, metrics or fairness, both need to be used.

For metrics especially, using user can lead to additional cardinality explosion, and is explicitly called out as bad practice.

From the code it seems like what you're calling "user ID" is basically a sorted set of tenant IDs. This mix of terminology is confusing.

This code replaces user.ExtractOrgID calls, but not user.InjectOrgID calls. Furthermore, there are components from weaveworks library, that we use, that still use original user.ExtractOrgID (instrumentation, tenant propagation for grpc calls). I think this needs to be fixed, preferably in the weaveworks common library. (Otherwise we need to reimplement these helper components on top of new "tenant" package).

simonswine · 2020-11-20T17:08:15Z

Using userID without tenantID is misleading. Different tenants may have same set of users. For log messages, metrics or fairness, both need to be used.

Totally agreed, yes eventually we will need information in the context.Context that will tell us both the user and the tenant, so we can consider both for those decisions.

For metrics especially, using user can lead to additional cardinality explosion, and is explicitly called out as bad practice.

From the code it seems like what you're calling "user ID" is basically a sorted set of tenant IDs. This mix of terminology is confusing.

Both of those are correct and I know this is far from ideal, but that was called out as part of the proposal to be a step-gap solution, that really only affects you if you enabled multi-tenant support and ideally log both and so on

This code replaces user.ExtractOrgID calls, but not user.InjectOrgID calls.

I don't think this is a problem as we ourselves in the tenant package call ExtractOrgID and derive our values from it.

While I think using the default settings cortex will always behave TentantID() == TenantIDs[0] == UserID()

In the proposal, multiple tenants were sorted. I am bit confused by this statement (esp. by "TenantIDs[0]" part, since that will always be tenant with lowest ID??)

Sorry I have really phrased that in aquite confusing way, what I meant to say. Using the default resolver, which is defaultResolver = NewSingleResolver(), all three methods which replace ExtractOrgID basically return the same. The string returned in UserID/TenantID is the same and it also matches with the first and only returned element in the []string in TenantIDs

Furthermore, there are components from weaveworks library, that we use, that still use original user.ExtractOrgID (instrumentation, tenant propagation for grpc calls). I think this needs to be fixed, preferably in the weaveworks common library. (Otherwise we need to reimplement these helper components on top of new "tenant" package).

I think that's how my vision for the future work would look like, either making weaveworks/common more adaptable so that we can reuse our own context extraction/injection from HTTP/GRPC or we have to move them into the project. Especially code which is quite specific to how Cortex works is probably better to live within here.

I am right now not yet in a position to predict how we would achieve that and how interfaces and more specific would look like, but by gradually carving parts out (like this PR) I think we can make this transition work.

simonswine · 2020-11-23T14:33:31Z

I am wondering if this PR would have it easier to get consensus if I would remove the UserID() string method from the Resolver interface:

TenantID() (which will only ever return a single tenant), replaces all usages of ExtractOrgID.
TenantIDs(), would then only be used on the read path where it's necessary for multi-tenant querying.

wdyt @pstibrany?

Withdrawing my request in case someone else wants to approve. (I plan to take another look but not sure when)

pstibrany · 2020-11-23T14:38:31Z

I am wondering if this PR would have it easier to get consensus if I would remove the UserID() string method from the Resolver interface:

TenantID() (which will only ever return a single tenant), replaces all usages of ExtractOrgID.

TenantIDs(), would then only be used on the read path where it's necessary for multi-tenant querying.

These suggestions would make it less controversial

pracucci

Very good job, LGTM!

pracucci · 2020-11-26T11:39:33Z

pkg/tenant/tenant.go

Performance wise a couple of things:

The input ids is just fine if there are no duplicates (which I would expect to be the common case)

result make be pre-allocated with the length of ids

pstibrany

Thank you for addressing the feedback, LGTM. Currently it's a noop in Cortex, and I'm looking forward to see more support for multitenant.

This implements the multi tenant resolver as described by the [proposal] for multi tenant query-federation. By default it behaves like before, but it's implementation can be swapped out. [proposal]: cortexproject#3364 Signed-off-by: Christian Simon <[email protected]>

Use TenantID or UserID depending on which of the methods are meant to be used. Signed-off-by: Christian Simon <[email protected]>

This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure that exactly one tenant ID is set. Signed-off-by: Christian Simon <[email protected]>

Signed-off-by: Christian Simon <[email protected]>

We need a better definition for what we are trying to achieve with UserID before we can add it to the interface Signed-off-by: Christian Simon <[email protected]>

Signed-off-by: Christian Simon <[email protected]>

- reduce allocations by reusing the input slice during de-duplication Signed-off-by: Christian Simon <[email protected]>

* Add tenant resolver package This implements the multi tenant resolver as described by the [proposal] for multi tenant query-federation. By default it behaves like before, but it's implementation can be swapped out. [proposal]: cortexproject/cortex#3364 Signed-off-by: Christian Simon <[email protected]> * Replace usages of `ExtractOrgID` Use TenantID or UserID depending on which of the methods are meant to be used. Signed-off-by: Christian Simon <[email protected]> * Replace usages of `ExtractOrgIDFromHTTPRequest` This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure that exactly one tenant ID is set. Signed-off-by: Christian Simon <[email protected]> * Add methods to `tenant` package to use resolver directly Signed-off-by: Christian Simon <[email protected]> * Remove UserID method from Resolver interface We need a better definition for what we are trying to achieve with UserID before we can add it to the interface Signed-off-by: Christian Simon <[email protected]> * Update comment on the TenantID/TenantIDs Signed-off-by: Christian Simon <[email protected]> * Improve performance of NormalizeTenantIDs - reduce allocations by reusing the input slice during de-duplication Signed-off-by: Christian Simon <[email protected]>

* Add tenant resolver (cortexproject/cortex#3486) * Add tenant resolver package This implements the multi tenant resolver as described by the [proposal] for multi tenant query-federation. By default it behaves like before, but it's implementation can be swapped out. [proposal]: cortexproject/cortex#3364 Signed-off-by: Christian Simon <[email protected]> * Replace usages of `ExtractOrgID` Use TenantID or UserID depending on which of the methods are meant to be used. Signed-off-by: Christian Simon <[email protected]> * Replace usages of `ExtractOrgIDFromHTTPRequest` This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure that exactly one tenant ID is set. Signed-off-by: Christian Simon <[email protected]> * Add methods to `tenant` package to use resolver directly Signed-off-by: Christian Simon <[email protected]> * Remove UserID method from Resolver interface We need a better definition for what we are trying to achieve with UserID before we can add it to the interface Signed-off-by: Christian Simon <[email protected]> * Update comment on the TenantID/TenantIDs Signed-off-by: Christian Simon <[email protected]> * Improve performance of NormalizeTenantIDs - reduce allocations by reusing the input slice during de-duplication Signed-off-by: Christian Simon <[email protected]> * Add multi tenant query federation (cortexproject/cortex#3250) * Add tenant query federation This experimental feature allows queries to cover data from more than a single Cortex tenant and can be activated by supplying `-tenant-federation.enabled` to all cortex instances. To query multiple tenants a `|` separated tenant list can be specified in the `X-Scope-OrgID` header. The source tenant of a metric will be exposed in the label `__tenant_id__`. Signed-off-by: Christian Simon <[email protected]> * Aggregate the limit of maxQueriers correctly This ensures the limit is aggregated correctly of the setting of each individual tenant. It also implements the logic for the v2 query frontend. Signed-off-by: Christian Simon <[email protected]> * Fix tenant labels and make LabelNames more efficient Signed-off-by: Christian Simon <[email protected]> * Use tsdb_errors for error handling Signed-off-by: Christian Simon <[email protected]> * Fix uninitialized label matcher Regexp matcher need to be initialized, this adds a test for regexp matcher and fixes the underlying issue. Signed-off-by: Christian Simon <[email protected]> * Use map for filterValuesByMatchers to reduce allocations Signed-off-by: Christian Simon <[email protected]> * Address review suggestions Signed-off-by: Christian Simon <[email protected]> * Add validation.SmallestPositiveNonZeroIntPerTenant to avoid code duplication Signed-off-by: Christian Simon <[email protected]> * Add limitations and experimental status to docs Signed-off-by: Christian Simon <[email protected]> * Remove unnecessary cast of defaultTenantLabel Signed-off-by: Christian Simon <[email protected]> * Handle query-range limits for multi tenant queries Signed-off-by: Christian Simon <[email protected]> * Skip results cache, if multi tenants query Signed-off-by: Christian Simon <[email protected]> * Add failint to ensure query path supports multiple tenants To avoid any future regressions in the multi tenant support within the query path, a faillint command tests if TenantID is used and if it finds one, it suggestest using TenantIDs instead> Signed-off-by: Christian Simon <[email protected]> * Align CHANGELOG line with the flag description Signed-off-by: Christian Simon <[email protected]> * Add a limitation about missing results cache Signed-off-by: Christian Simon <[email protected]> * Restrict path segments in TenantIDs (cortexproject/cortex#4375) (cortexproject/cortex#4376) This prevents paths generated from TenantIDs to become vulnerable to path traversal attacks. CVE-2021-36157 Signed-off-by: Christian Simon <[email protected]> * Update nolint statement Co-authored-by: Bryan Boreham <[email protected]>

pull-request-size bot added the size/XL label Nov 11, 2020

simonswine commented Nov 11, 2020

View reviewed changes

jtlisi approved these changes Nov 11, 2020

View reviewed changes

simonswine force-pushed the add-tenant-resolver branch from f6e7bb8 to c7f2471 Compare November 12, 2020 11:35

simonswine mentioned this pull request Nov 13, 2020

Add multi tenant query federation #3250

Merged

5 tasks

csmarchbanks reviewed Nov 18, 2020

View reviewed changes

pstibrany previously requested changes Nov 19, 2020

View reviewed changes

simonswine force-pushed the add-tenant-resolver branch 2 times, most recently from 23c4d2b to af2045e Compare November 24, 2020 10:16

simonswine requested review from csmarchbanks and pstibrany November 24, 2020 17:45

pracucci approved these changes Nov 26, 2020

View reviewed changes

pstibrany approved these changes Nov 26, 2020

View reviewed changes

simonswine added 7 commits November 26, 2020 13:17

Replace usages of ExtractOrgID

4d69fe0

Use TenantID or UserID depending on which of the methods are meant to be used. Signed-off-by: Christian Simon <[email protected]>

Replace usages of ExtractOrgIDFromHTTPRequest

2735f98

This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure that exactly one tenant ID is set. Signed-off-by: Christian Simon <[email protected]>

Add methods to tenant package to use resolver directly

184dd16

Signed-off-by: Christian Simon <[email protected]>

Remove UserID method from Resolver interface

57fa806

We need a better definition for what we are trying to achieve with UserID before we can add it to the interface Signed-off-by: Christian Simon <[email protected]>

Update comment on the TenantID/TenantIDs

f46685f

Signed-off-by: Christian Simon <[email protected]>

Improve performance of NormalizeTenantIDs

cd074e9

- reduce allocations by reusing the input slice during de-duplication Signed-off-by: Christian Simon <[email protected]>

simonswine force-pushed the add-tenant-resolver branch from af2045e to cd074e9 Compare November 26, 2020 13:33

pracucci merged commit b7a062f into cortexproject:master Nov 26, 2020

simonswine mentioned this pull request Mar 22, 2022

Add tenant package grafana/dskit#143

Merged

	userID, err := tenant.DefaultResolver().TenantID(r.Context())
	userID, err := tenant.TenantID(r.Context())

Conversation

simonswine commented Nov 11, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jtlisi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

csmarchbanks left a comment

Choose a reason for hiding this comment

Uh oh!

simonswine commented Nov 19, 2020

Uh oh!

pstibrany commented Nov 19, 2020

Uh oh!

pstibrany left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

simonswine commented Nov 20, 2020

Uh oh!

simonswine commented Nov 23, 2020

Uh oh!

pstibrany commented Nov 23, 2020

Uh oh!

pracucci left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pstibrany left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

pstibrany left a comment •

edited

Loading