Don't normalize URIs in Unicode NFKC #4837

@unarist

Mastodon normalizes URIs in statuses to Unicode NFKC by using Addressable.

https://github.com/tootsuite/mastodon/blob/85c7c42098b5a5b10033eb8de79f6cb5dce8d462/app/lib/formatter.rb#L102-L109

But what if the linked website uses an unnormalized URI? Normalizing it breaks the link.

For example, we can't link to this article in a toot (although it can be posted to Twitter and Facebook):

http://katsu2000x.hatenablog.com/entry/2017/03/17/android%E3%81%A7%E9%80%9A%E7%9F%A5%E3%81%8C%E6%9D%A5%E3%81%AA%E3%81%84%E4%BA%8B%E8%B1%A1%E3%82%92%E5%9B%9E%E9%81%BF%E3%81%99%E3%82%8B%EF%BC%88IPv6%E7%B7%A8%EF%BC%89_
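To illustrate why this URL breaks: its path contains `%EF%BC%88` and `%EF%BC%89`, the fullwidth parentheses U+FF08 and U+FF09, which NFKC folds into ASCII `(` and `)`. The following is a minimal sketch using only the Ruby standard library, not Addressable or Mastodon's formatter; `nfkc_roundtrip` is a hypothetical helper name, and the exact re-encoding Addressable produces may differ.

```ruby
require 'uri'

# Hypothetical helper (not Mastodon or Addressable code): decode a
# percent-encoded path segment, apply NFKC, and percent-encode it again.
def nfkc_roundtrip(segment)
  decoded = URI.decode_www_form_component(segment)
  normalized = decoded.unicode_normalize(:nfkc)
  URI.encode_www_form_component(normalized)
end

# Tail of the path from the article URL above.
original = '%EF%BC%88IPv6%E7%B7%A8%EF%BC%89_'
roundtrip = nfkc_roundtrip(original)

puts original
puts roundtrip
puts original == roundtrip # false: the server is asked for a different path
```

Because the fullwidth parentheses no longer survive the round trip, the request goes to a path the site never published.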

Also, I don't know why we need to normalize URIs that don't come from user input, such as hub_topic.

RFC 3987 appears to mention this problem in section 5.3.2.2.


  • I searched or browsed the repo’s other issues to ensure this is not a duplicate.
  • This bug happens on a tagged release and not on master (If you're a user, don't worry about this).
