Fix PuSH to instances where WEB_DOMAIN != LOCAL_DOMAIN #2739
ClearlyClaire wants to merge 4 commits into mastodon:master
Conversation
Force-pushed from 8e5b5db to e6bef1a
Is modifying […]? Also, it would be nice to have a test case to make sure this kind of regression won't happen anymore.
I'm open to using a new Account#followers_subscription_domains or similar, but I'm not sure those will actually be used for subscription (except for Mastodon instances). Moreover, I can't really think of any use for Account#followers_domains, as the domain part of the acct: URI doesn't mean much anyway (the actor URI, the feeds, and the API endpoints are what matter, and they can be hosted on a different domain entirely, as is the case with the issue at hand). Either way, this should be documented.
As for the tests, I agree, but I haven't had the time to write them, and I'm on my way to my day job now.
Yes, but what matters is that Mastodon's code uses them for checking subscription validity. It makes the code easier to understand.
True, I can't think of any either. So we can rename […].
I agree. It should be noted that an assumption is made here.
This becomes hard to read because of "unless not" so maybe "next if hidden and not included"
If WEB_DOMAIN and LOCAL_DOMAIN must be related by a root domain (i.e. one of them is a subdomain) then maybe that can be the check instead?
Yes, it's a bit unreadable; I will change that. As for WEB_DOMAIN being a subdomain of LOCAL_DOMAIN, I'm uneasy about that: it is not an assumption we have made so far, nor one GNU social has made either. I think multiple GNU social users have acct: URIs pointing to entirely different domains (I can think of MMN-o, but some of those can be considered aliases). It would make the check easier, though.
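For reference, the "shared root domain" alternative being discussed could look like the following naive sketch. This is purely illustrative, not something the PR adopts, and it does no Public Suffix List handling at all:

```ruby
# Illustrative sketch of the "related by a root domain" check discussed above.
# This is NOT an assumption Mastodon or GNU social actually makes; it is a
# naive suffix comparison (no Public Suffix List, no IDN handling).
def same_root?(a, b)
  a == b || a.end_with?(".#{b}") || b.end_with?(".#{a}")
end
```

Under this check, `same_root?("web.example.com", "example.com")` would hold, while two unrelated domains would not match.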
app/models/account.rb (outdated)
I'm a bit concerned because this will be a huge array; every remote follower has a unique remote_url, so distinct is kinda useless here.
Ah, yeah, I just copied it; the “distinct” is indeed useless. I'm not sure how to optimize that. We could just .uniq the array, but I'm not sure it would help much. We could also add a “remote_url_domain” field to Account, so the URL is parsed only when refreshing the remote account details rather than at each private/direct message, but I'm not sure how much better that is.
One part of the change was also to compute this array only when pushing private/direct messages, and not public/unlisted ones.
I'll push a slightly optimized version of this, which should probably be OK for about a thousand followers across a few hundred remote domains, but not much more.
Force-pushed from 450b822 to 8ddab9d
I have updated my PR with a few optimizations and fixes, but it will probably not scale well to users with thousands of followers. The best way to handle this would probably be to store the parsed-and-normalized domain alongside the URL in a field of the “accounts” table, but that would probably make for a fairly long-running database migration script…
The last commit introduces a web_domain field to the accounts table as a cache for the normalized domain part of remote_url. This comes with a migration script that I fear will be quite long-running.
Force-pushed from 17a04a3 to e6f416c
You are absolutely right that this migration script is impossible to run. It will take hours if not days, and migrations place a lock on the entire schema meaning it would be hours of downtime. I am almost veering towards your initial approach of just iterating over everything.
Yeah, sorry about making you do the migration bit. See my comment on the other file.
I don't think that would ever take days, but hours is entirely possible, and that's not acceptable for a migration script. For the record, on my really small instance (only ~2k remote accounts known, but also low-end ARM hardware), the migration script took about 2 minutes to run.
There is a second approach. Instead of checking whether the destination is allowed during the loop here, do it in Pubsubhubbub::DeliveryWorker instead.
It should be made a hashtable, or at least .uniq'd, as @ThibG commented in the last review (though he said he is “not sure it will be very useful”). An O(n²) operation is still O(n²) across threads: a few threads are nothing compared to 10k² if the person has a lot of followers.
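The hashtable suggestion can be sketched with Ruby's stdlib Set. This is illustrative only; in the PR the data would come from the followers query, and `follower_domains` is a stand-in name:

```ruby
require 'set'

# Membership tests against a Set are O(1) on average, while Array#include?
# scans the whole array, which is what produces the O(n^2) behaviour described
# above when the check runs once per follower.
follower_domains = Set.new(%w[example.com social.test])

follower_domains.include?('example.com') # constant-time on average
```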
You're distributing to the same number of subscriptions either way; the difference is between checking a huge array for a value that needs to be transformed first (high memory use), or checking just one thing in the responsible job (low memory use).
Also, lots of small jobs are overall better for the system than a few long-running ones, because small jobs return consumed resources like connections faster, so jobs of different varieties can run more or less at the same time, while one long-running job may block a Sidekiq worker for a long time.
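Moving the check into the per-subscription job could look roughly like this. It is a hedged sketch: `DeliveryJob`, `allowed_domains`, and the callback handling are illustrative stand-ins, not Mastodon's actual Pubsubhubbub::DeliveryWorker:

```ruby
require 'uri'
require 'set'

# Sketch: each small delivery job checks only its own subscription's callback
# domain, instead of the distributor building one huge followers-domains array
# up front and passing it around.
class DeliveryJob
  def initialize(allowed_domains)
    @allowed = allowed_domains # e.g. a Set of follower domains
  end

  # Returns true when the payload would actually be delivered to the callback.
  def perform(callback_url, _payload)
    domain = URI.parse(callback_url).host
    @allowed.include?(domain)
  end
end
```

Each job then holds only one URL and does one cheap lookup, which matches the “lots of small jobs” argument above.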
I see what you mean. Not passing the table to the worker, but leveraging the DB index, thus having both low complexity and benefits of small jobs. Much more clever than mine!
EDIT: Realized DB index might not be constant time.
In my last proposal (well, before the last commit, which switches to a database field), I am already using “.uniq” to avoid .normalize()-ing URLs from the same domain, as that is the most expensive part of the computation. This does indeed require a large array at some point, though.
Checking it in Pubsubhubbub::DeliveryWorker would involve doing the whole thing for each PuSH subscriber (so, roughly, multiplying the total running time by the number of remote instances), and avoiding the big array means doing a .normalize() call for each follower, even when several of them happen to use the same domain…
@ThibG is right. I forgot (again) that PuSH subscriptions are anonymous. The DB index cannot be relied on. I believe filling the database gradually is a good idea.
If adding a column is expensive, what about creating a new table? Is it a conceivable situation for a user to have multiple “authorized” domains?
Yes, it is conceivable as it is only a guess. For instance, “authorized” domains include the domain part of the acct: URI too, and we might also allow the domain part of the PuSH or salmon endpoints…
Sorry, but I insist on the approach I described. Please go back to the approach before migrations/db changes, and move the check to the delivery worker.
I'm not sure what you want me to do here. We would not be able to share the domains array anymore, and would need to re-compute it instead. Or iterate over all the followers without deduplicating as early as possible, which would ease the RAM requirements but could make each job run much longer than the single “big” one.
EDIT: To clarify, so far I have a .uniq(&:host) call between parsing and normalizing, as normalizing is much more expensive than parsing, and deduplicating in this order provides a huge performance boost over normalizing each URL individually.
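The parse → uniq(&:host) → normalize ordering described above can be sketched as follows. This is a stdlib-only stand-in: downcasing the host stands in for Addressable's much more expensive normalization, and `urls` stands in for the followers' remote_url values:

```ruby
require 'uri'

# Dedupe on the cheaply parsed host BEFORE the expensive normalization step,
# so each distinct host is normalized only once.
def followers_domains(urls)
  urls.map { |url| URI.parse(url) }       # cheap parse
      .uniq(&:host)                       # dedupe before the expensive step
      .map { |uri| uri.host&.downcase }   # stand-in for normalized_host
      .compact
      .uniq
end
```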
Do you want me to use something like:

```ruby
def is_domain_subscribed?(domain)
  return true if followers.where(domain: domain).count.positive?
  return true if followers.where(web_domain: domain).count.positive?

  followers.reorder(nil)
           .where('accounts.domain IS NOT NULL AND accounts.web_domain IS NULL')
           .select('accounts.remote_url')
           .find_each do |follower|
    return true if domain == Addressable::URI.parse(follower.remote_url).normalized_host
  end

  false
end
```

As I said earlier, I fear this is going to be much slower than what I came up with earlier (at least without the web_domain field in the database). But you're right that it should consume less memory.
Updated with my last suggestion (using both the web_domain value from the database as well as computing a set of domains on the fly for remote followers that don't have the value set). I haven't tested it yet.
Fixed a mistake in an earlier commit (which used […]).
I'm investigating a few ways to significantly reduce the time we spend parsing and normalizing URIs: […]
EDIT: With those changes, parsing 600,000 URLs on my i5 takes ~35s (instead of the 3 minutes it took prior to those changes). The slightly modified migration script takes about 1m30s on my low-end ARM server. Maybe this is a viable solution again?
#2954 changes the distribution to work for everybody for public statuses again. I don't have a good solution to tracking "web_domain" since it's not something the protocol announces in any way. For all I know, the Atom feed, Salmon endpoint and PuSH hub of a user could all be on different domains. :(
On 9 May 2017 at 23:53:25 GMT+02:00, Eugen Rochko <[email protected]> wrote:
> #2954 changes the distribution to work for everybody for public statuses again.
Good! That will already be a much better situation, although it would be good if follower-only toots worked too.
> I don't have a good solution to tracking "web_domain" since it's not something the protocol announces in any way. For all I know, the Atom feed, Salmon endpoint and PuSH hub of a user could all be on different domains. :(
Yes, indeed. This notion of private toots is a departure from OStatus anyway, which does not impose any kind of relation on the different URLs. My proposal was for it to work with *Mastodon* instances where LOCAL_DOMAIN ≠ WEB_DOMAIN.
I have rebased my branch on latest master and completely revamped it: first, there is a bunch of fixes and performance improvements, and then the last commit, the most invasive one, introduces the “web_domain” field to the database. Unlike in my previous approaches, this is meant to be set only when the domain part of the remote_url differs from the domain part of the acct: URI. Moreover, the accounts are not updated by a migration script; the update mechanism relies on the normal remote-account refresh mechanism instead, thus having almost no impact on performance, at the cost of undelivered toots when all of the following conditions are met: […]
EDIT: I haven't thoroughly tested it yet.
Force-pushed from 2cce098 to 1d03265
Force-pushed from 0aff090 to 98aaa62
Let me bump this PR again now that 1.4.1 is released! As before, I have been running with these changes (regularly rebasing them on master) without any issue I am aware of.
@ThibG Yeah, sorry about the delays on this one - I'm always hesitant to change the db schema, especially when, as in this case, it's so specific to WEB_DOMAIN, which is a Mastodon-only thing and not even recommended. However, this led me to thinking: why don't we make more use of Webfinger aliases? Webfinger provides a canonical acct and an array of aliases. When using WEB_DOMAIN, we could list that among the aliases. On the other end, perhaps accounts should have an aliases string-array column, which would keep track of those. We still have the “that's a lot of SELECTs + parsing the domain out of the acct or URL” problem, but at least it's not hardcoded to web_domain.
Yes, I can understand why you would be hesitant to change the db schema for this. Webfinger aliases would contain entire URIs that would probably have to be parsed each time, which would probably be way too slow (my current solution is to save the domain itself in web_domain, at remote account creation/update, so that the lookup is fast). As for including it in the aliases, the profile URL (thus including the WEB_DOMAIN) is already part of the advertised aliases.
Force-pushed from efaedcc to b909cf9
Force-pushed from 26340e5 to 7c4ed60
Force-pushed from 171a38e to c58f4da
When we are only interested in the normalized host, calling normalized_host avoids normalizing the other components of the URI, as well as creating a new object.
…L_DOMAIN

While PuSH subscriptions are anonymous, and instance-wide in practice, Mastodon attempts to deliver private (follower-only) toots to instances known to host followers. To do so, it checks that the PuSH callback has the same host name as one of the followers. This assumption is insufficient for Mastodon instances for which WEB_DOMAIN ≠ LOCAL_DOMAIN.

This fix adds a “web_domain” field to Account, which will be set for remote accounts whose remote_url is hosted on a different domain than the domain part of their “acct:” URI. While this makes assumptions not present in OStatus, it is consistent with Mastodon's behaviour and does not limit federation in any way.
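Under the commit's description, the decision of when to set web_domain could be sketched like this (hypothetical helper name; in the PR the field is filled in the remote-account creation/update path, not by a standalone function):

```ruby
require 'uri'

# Set web_domain only when the remote_url's host differs from the acct: domain;
# otherwise leave it nil, so the plain domain match keeps working unchanged.
def web_domain_for(acct_domain, remote_url)
  host = URI.parse(remote_url).host&.downcase
  host == acct_domain ? nil : host
end
```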
I have a new idea: I'm going to implement HTTP signatures (there is a draft RFC for them). PuSH subscription requests will be signed on behalf of the user who triggered the subscription. We don't really need a specific user, but once we have one, we can store the domain in the subscriptions table, and then filter by it.
That seems sensible to me as long as it is optional (since that's not part of OStatus).
…ance domain (mastodon#4205)

* Fix mastodon#2672 - Connect signed PuSH subscription requests to instance domain

Resolves mastodon#2739

* Fix return of locate_subscription
* Fix tests
This is an attempt to fix #2672 and hasn't been tested yet.
This is done in two ways: […]
RSpec tests should be added, and the way the followers_domains array is built should probably be optimized.