Ring doc #3486

Closed

kavirajk wants to merge 5 commits into grafana:main from kavirajk:ring-doc

Conversation

Contributor

@kavirajk kavirajk commented Mar 15, 2021

What this PR does / why we need it:

Document the Hash Ring component used in Loki

  • What is a hash ring
  • What problems it solves
  • How it works

Which issue(s) this PR fixes:
NA

Special notes for your reviewer:

Based on discussion with @owen-d :)

Checklist

  • Documentation added
  • Tests updated

@kavirajk kavirajk requested a review from a team March 15, 2021 07:37

A token is the same as a virtual node (or vnode) in consistent hashing.

The number of tokens per ring is configurable via `--ingester.num-tokens`. The default is 128.
Contributor Author

Q: why can't we have it configurable for distributors?

Contributor

because distributors are not part of the ring :)
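The token/vnode mechanics discussed above can be sketched in a few lines. This is a hypothetical illustration, not Loki's actual implementation (which lives in its Go ring package); the function names, the SHA-256 choice, and the ingester names are all made up for the example.

```python
import bisect
import hashlib
import random

NUM_TOKENS = 128  # mirrors the --ingester.num-tokens default


def stream_hash(labels: str, tenant_id: str) -> int:
    """Hash a log stream (tenant + labelset) to a point on the ring."""
    digest = hashlib.sha256(f"{tenant_id}/{labels}".encode()).hexdigest()
    return int(digest, 16) % (2 ** 32)


def build_ring(ingesters, num_tokens=NUM_TOKENS, seed=42):
    """Give every ingester `num_tokens` random tokens (vnodes) on the ring."""
    rng = random.Random(seed)
    return sorted(
        (rng.randrange(2 ** 32), name)
        for name in ingesters
        for _ in range(num_tokens)
    )


def owner_of(ring, point: int) -> str:
    """The stream is owned by the first token clockwise from its hash."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_right(tokens, point) % len(ring)
    return ring[idx][1]


ring = build_ring(["ingester-0", "ingester-1", "ingester-2"])
owner = owner_of(ring, stream_hash('{app="api"}', "tenant-a"))
```

Because each ingester holds many tokens, adding or removing one ingester only moves the streams that fall between its tokens and their predecessors, instead of reshuffling everything.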


@kavirajk kavirajk requested a review from owen-d March 15, 2021 07:39
Comment thread docs/sources/architecture/ring.md Outdated

Loki in microservices mode usually has multiple ingesters and distributors. These multiple instances of the same component (ingester or distributor) form a ring (more precisely, a consistent hash ring).

Both distributors and ingesters have their own ring. The write path looks like Client -> Distributor (ring) -> Ingesters (ring). The read path looks like Client -> Querier -> Ingesters (ring).
Contributor

Not sure what you mean by:

Both distributors and ingesters have their own ring.

Contributor Author

@kavirajk kavirajk Mar 16, 2021

Contributor

@cyriltovena cyriltovena Mar 16, 2021

I see, but they're not all hash rings. It seems that you're explaining hash rings here. At the least I would focus on the ring type to avoid confusion.

Comment thread docs/sources/architecture/ring.md Outdated

Loki aggregates incoming log lines into something called a log stream. A log stream is just a set of logs associated with a single tenant and a unique label set (key-value pairs).

Here is the problem: say I have a large number of log streams coming in and a bunch of Loki servers (these can be ingesters or distributors). Now I want to distribute these log streams across the servers in a way that I can find them later when I want to read a log stream back. Here is the tricky part: I want to do this without any global directory or lookup service.
Contributor

This could be rephrased a bit. The technical assumptions are correct.

Comment thread docs/sources/architecture/ring.md Outdated

Each ingester belongs to a single hash ring. This hash ring, stored in Consul, is used to achieve consistent hashing.

Every ingester (also distributor) that is part of the ring has two things associated with it.
Contributor

Suggested change:
- Every ingester (also distributor) that is part of the ring has two things associated with it.
+ Every ingester that is part of the ring has two things associated with it.

Contributor

See, for distributors we store rate limits, not tokens.

Contributor

@owen-d owen-d left a comment

I think this has some good bits, but we'll need to reorganize it. I'd like you to focus on what the ring is first. I think a section at the end describing how other components use the ring (rulers for scheduling alerting rule evaluation, distributors for calculating rate limits) would be helpful, but we should simplify most of the doc to only talk about our ingestion path.

The Cortex ring docs are a good resource here as well.

I'd like the structure to look like:

- how to spread writes across a pool of Ingesters?
  - why is round robin bad?
    - creates chunks per stream proportional to the number of ingesters
  - how can we make sure streams are only on replication_factor nodes?
    - hashing!
      - show how hashing helps
    - but re-hashing hurts us when ring membership changes :(
      - consistent hashing! tokens, etc.
- what are our ring options? memberlist, etcd, consul
  - Why do we only need 1 consul/etcd replica?
    - nodes can re-register themselves if the ring store restarts

I also think it'd help if you added some diagrams showing hashing into the ring, how that translates to tokens & thus ingesters, etc.

Finally, there seems to be a bit of confusion about ingesters vs distributors here. Distributors technically store their own ring, but only store one token. The only thing this is used for is to get the total number of distributors so they can calculate per distributor rate limits like (total_rate_limit / n_distributors). Ingesters use the ring to register themselves & an associated set of tokens (vnodes) in order to help route log traffic to them deterministically. The distributors read the ingester ring to know which ingesters they need to send logs to. Again, I would defer the distributor & ruler rings to the end of the document in their own section.
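The distributor-ring usage described in the comment above can be sketched as follows. This is a hypothetical illustration under the stated assumption that each distributor enforces an equal share of the global limit; the function name is made up and the real logic lives in Loki/Cortex Go code.

```python
def per_distributor_limit(total_rate_limit: float, n_distributors: int) -> float:
    """Split a tenant's global rate limit evenly across distributors.

    `n_distributors` is the number of healthy instances registered in the
    distributor ring; each instance enforces only its own share.
    """
    if n_distributors <= 0:
        # No ring information yet: fall back to the global limit.
        return total_rate_limit
    return total_rate_limit / n_distributors


# e.g. a 10 MB/s tenant limit enforced by 4 distributors
share = per_distributor_limit(10_000_000, 4)  # 2.5 MB/s per distributor
```

This is why a distributor needs only one token in its ring: the token exists to register membership (so instances can be counted), not to route traffic.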


Another way to solve this problem is via hashing. We assign an integer to each server (say 0, 1, 2, 3, etc.), hash our log stream (labelset + tenantID) to an integer value `h`, and hand it over to the server `h % n`, where `n` is the number of servers.

Interestingly, this approach solves some of the problem we have, say same log stream goes to same server (because same set of labels + tenant gives same hash value), and while reading, we can hash it back to find the server where its value is stored.
Contributor

and while reading, we can hash it back to find the server where its value is stored.

I wouldn't mention this because we don't actually perform this optimization. It's very difficult to do as it changes when the ring membership changes. To get around this, we query all ingesters.
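The modulo-hashing scheme quoted above can be sketched in a few lines. A hypothetical illustration: the function name and SHA-256 choice are assumptions, not Loki's actual hash.

```python
import hashlib


def naive_assign(labels: str, tenant_id: str, n_servers: int) -> int:
    """Plain modulo hashing: map a stream to server h % n."""
    h = int(hashlib.sha256(f"{tenant_id}/{labels}".encode()).hexdigest(), 16)
    return h % n_servers


# The same tenant + labelset always hashes to the same server,
# which keeps each stream's writes on one owner.
first = naive_assign('{app="api"}', "tenant-a", 3)
second = naive_assign('{app="api"}', "tenant-a", 3)
```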


Interestingly, this approach solves some of the problem we have, say same log stream goes to same server (because same set of labels + tenant gives same hash value), and while reading, we can hash it back to find the server where its value is stored.

But this solution lacks some scaling properties. Say if we introduce new server or remove existing server (intentionally or unintentially) then every single log stream will map to different server now.
Contributor

Suggested change:
- But this solution lacks some scaling properties. Say if we introduce new server or remove existing server (intentionally or unintentially) then every single log stream will map to different server now.
+ But this solution lacks some scaling properties. Say if we introduce new server or remove existing server (intentionally or unintentially) then every single log stream may map to different server now.
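The remapping cost that the quoted paragraph and the suggestion describe can be measured directly. A hypothetical sketch (names and hash choice are assumptions):

```python
import hashlib


def server_for(key: str, n_servers: int) -> int:
    """Modulo hashing: the scheme described in the quoted paragraph."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % n_servers


# Count how many of 1000 streams change owner when a 4th server joins.
keys = [f"tenant-a/stream-{i}" for i in range(1000)]
moved = sum(server_for(k, 3) != server_for(k, 4) for k in keys)
# With h % n, roughly (n-1)/n of all streams remap when going from
# 3 to 4 servers, so `moved` lands near 750 out of 1000; consistent
# hashing would remap only about 1/n of them.
```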


If Loki is run in microservices mode, both the ingester and the distributor expose their ring status via the endpoint `api/v1/ruler/ring`

### Other uses of consistent hashing.
Contributor

@owen-d owen-d Mar 17, 2021

This is not actually true. We use the index to find the chunks. Chunks are content addressed to de-amplify writes: there's no need to store replication_factor identical chunks if their contents are the same. Content addressing helps ensure this because writing the same chunk would have the same address, deduping itself naturally.


### Ring Status page

If Loki is run in microservices mode, both the ingester and the distributor expose their ring status via the endpoint `api/v1/ruler/ring`
Contributor

This endpoint is only the ruler ring. The ingester ring is stored at /ring and that's the one I'd mention here. Followup: we should really expose all the rings on their own endpoints.

@CLAassistant

CLAassistant commented Apr 20, 2021

CLA assistant check
All committers have signed the CLA.

@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Jun 2, 2021
@kavirajk kavirajk added keepalive An issue or PR that will be kept alive and never marked as stale. and removed stale A stale issue or PR that will automatically be closed. labels Jun 6, 2021
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
* Add tenant resolver package

This implements the multi tenant resolver as described by the [proposal]
for multi tenant query-federation.

By default it behaves like before, but it's implementation can be
swapped out.

[proposal]: cortexproject/cortex#3364

Signed-off-by: Christian Simon <[email protected]>

* Replace usages of `ExtractOrgID`

Use TenantID or UserID depending on which of the methods are meant to be
used.

Signed-off-by: Christian Simon <[email protected]>

* Replace usages of `ExtractOrgIDFromHTTPRequest`

This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure
that exactly one tenant ID is set.

Signed-off-by: Christian Simon <[email protected]>

* Add methods to `tenant` package to use resolver directly

Signed-off-by: Christian Simon <[email protected]>

* Remove UserID method from Resolver interface

We need a better definition for what we are trying to achieve with
UserID before we can add it to the interface

Signed-off-by: Christian Simon <[email protected]>

* Update comment on the TenantID/TenantIDs

Signed-off-by: Christian Simon <[email protected]>

* Improve performance of NormalizeTenantIDs

- reduce allocations by reusing the input slice during de-duplication

Signed-off-by: Christian Simon <[email protected]>
- LEAVING
- UNHEALTHY

The state ACITVE may receive both read and write requests. While state JOINING can receive only write requests, the state LEAVING may receive read requests.
Contributor

just typo: ACITVE vs ACTIVE
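The read/write permissions the quoted paragraph describes (with ACTIVE spelled correctly) can be summarized in a small table. A hypothetical sketch; Loki's real state handling lives in its Go ring package:

```python
# Which operations each ring state accepts, per the quoted paragraph:
# ACTIVE serves reads and writes, JOINING only writes, LEAVING only reads.
STATE_OPS = {
    "ACTIVE": {"read", "write"},
    "JOINING": {"write"},
    "LEAVING": {"read"},
    "UNHEALTHY": set(),
}


def can_serve(state: str, op: str) -> bool:
    """Return True if an ingester in `state` may serve operation `op`."""
    return op in STATE_OPS.get(state, set())
```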

@kavirajk
Contributor Author

Closing this as there has been no activity for a long time. Feel free to send a new PR if anyone wants to revive this work.

@kavirajk kavirajk closed this Mar 18, 2022
@danpoltawski
Contributor

Seems a shame to have lost this improvement in documentation 😢


Labels

keepalive An issue or PR that will be kept alive and never marked as stale. size/M


6 participants