-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Prevent the addition of rel=nofollow for internal URLs #3825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @thomasplevy! 👋 Thank you for your contribution to WordPress! 💖 It looks like this is your first pull request to No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description. Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making. More information about how GitHub pull requests can be used to contribute to WordPress can be found in this blog post. Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook. If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook. The Developer Hub also documents the various coding standards that are followed:
Thank you, |
src/wp-includes/formatting.php
Outdated
| */ | ||
| function _wp_is_url_internal( $href ) { | ||
|
|
||
| if ( in_array( strtolower( wp_parse_url( $href, PHP_URL_SCHEME ) ), array( 'http', 'https' ), true ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not use wp_allowed_protocols() here? Not sure if URLs like mailto:[email protected] being noted as nofollow ugc is flagged as problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not use wp_allowed_protocols() here?
I didn't use it because I wasn't aware of it's existence.
I'll update to use.
Not sure if URLs like mailto:[email protected] being noted as nofollow ugc is flagged as problematic.
I'm not an expert here so I did a bit of googling and found a few thinks on various stack exchanges stating that in general crawlers don't crawl mailto: links so nofollow wouldn't be necessary but I don't think it would harm. But, semantically maybe that means that instead of using wp_is_url_internal() to determine the rel as a binary (either ugc or ugc nofollow) a helper function such as wp_get_link_rel_attr() (or something to that effect) should be created to handle this. This function could use the wp_is_internal_link() boolean return but also add additional conditions for links with mailto: protocol (and I'll look through other protocols to see if there might be any other conditions that should be handled differently).
Looks like a mailto: link should probably be ugc if in a comment but otherwise the nofollow would be superfluous. Maybe this is an unnecessary detail though.
src/wp-includes/formatting.php
Outdated
| * @param array $internal_hosts An array of internal URL hostnames. | ||
| */ | ||
| $internal_hosts = apply_filters( | ||
| 'wp_rel_callback_internal_hosts', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this filter name, but also not sure what would be better. Maybe just wp_rel_internal_hosts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If moving to a new "core" utility function, I think wp_internal_host_urls might be more helpful. Maybe it should even be moved to it's own dedicated function? wp_internal_host_urls() which returns the filtered list only? I like writing code like this myself but didn't want to add additional new functions unless it seemed appropriate.
thomasplevy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @desrosj, I've got some follow up comments / questions, I'll get to work on updating this tomorrow.
src/wp-includes/formatting.php
Outdated
| * @param array $internal_hosts An array of internal URL hostnames. | ||
| */ | ||
| $internal_hosts = apply_filters( | ||
| 'wp_rel_callback_internal_hosts', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If moving to a new "core" utility function, I think wp_internal_host_urls might be more helpful. Maybe it should even be moved to it's own dedicated function? wp_internal_host_urls() which returns the filtered list only? I like writing code like this myself but didn't want to add additional new functions unless it seemed appropriate.
src/wp-includes/formatting.php
Outdated
| */ | ||
| function _wp_is_url_internal( $href ) { | ||
|
|
||
| if ( in_array( strtolower( wp_parse_url( $href, PHP_URL_SCHEME ) ), array( 'http', 'https' ), true ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not use wp_allowed_protocols() here?
I didn't use it because I wasn't aware of it's existence.
I'll update to use.
Not sure if URLs like mailto:[email protected] being noted as nofollow ugc is flagged as problematic.
I'm not an expert here so I did a bit of googling and found a few thinks on various stack exchanges stating that in general crawlers don't crawl mailto: links so nofollow wouldn't be necessary but I don't think it would harm. But, semantically maybe that means that instead of using wp_is_url_internal() to determine the rel as a binary (either ugc or ugc nofollow) a helper function such as wp_get_link_rel_attr() (or something to that effect) should be created to handle this. This function could use the wp_is_internal_link() boolean return but also add additional conditions for links with mailto: protocol (and I'll look through other protocols to see if there might be any other conditions that should be handled differently).
Looks like a mailto: link should probably be ugc if in a comment but otherwise the nofollow would be superfluous. Maybe this is an unnecessary detail though.
|
@desrosj I've updated based on your feedback and also covered the missed point 1 on https://core.trac.wordpress.org/ticket/53290 regarding get_comment_author_link(). |
|
I misunderstood how the commands work. Trying again. The php 8.1 and 8.2 test failures are resulting from deprecation warnings thrown by wordpress-develop/src/wp-includes/formatting.php Line 3140 in 6036ee0
The deprecation expectations were added here: 400982a#diff-5e804da16de66a4a86d069033a8a8173c315510bdb522a108f55059ea9919837 Since the I think this is all correct, of course. I'll remove them and update if anyone agrees with me. |
aaronjorbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really liking the direction this is going. I left a couple comments with some thoughts on improvements. Open to discussion on all of them if you think they aren't valuable.
src/wp-includes/comment-template.php
Outdated
| * | ||
| * @param string[] $comment_authors An array of comment author usernames. | ||
| */ | ||
| $non_ugc_comment_authors = apply_filters( 'non_ugc_comment_authors', array() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this would be better as a filter called comment_is_ugc that could pass in the full $comment rather than a filter of author names. This would be more flexible for plugin authors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... but. Semantically I believe this would be the incorrect place to use a filter/function that determines if the comment itself is UGC. This function is retrieving the comment author's link (as submitted by the comment author). So in this particular area we want to know if the author's link is UGC, not whether or not the whole comment is UGC.
So... Merging this with your feedback I'd update it to
comment_author_link_is_ugc, it'll return a boolean instead of an array of usernames, and as a second param it'll receive the WP_Comment object.
The default value will be true and if the filtered value of the var is true we'll add ugc to the $rel_parts array.
What do you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote the previous comment 3 hours ago and never posted. I rewrote the code and just finished writing tests and now, looking at the new code. I think this particular filter is just unnecessary.
The function get_comment_author_link() shouldn't modify the author link indiscriminately. Although its output can be filtered, it would be better to prevent link tagging for specific authors. This can be achieved by adding the following code on line 223 of wp-includes/comment-template.php:
At the writing of this comment the only available filter is on the full HTML of the anchor via get_comment_author_link
With this PR (and updates from feedback) there will be a new filter specifically on the rel attributes as an array before being composed into the HTML anchor.
If a developer desires to use comment_author_link_rel to make a specific user automatically not ugc, it can be done so there quite easily. I don't see the need to add two filters here, one to build an allowlist of sorts and another to modify the rel attributes directly.
Incoming PR update applies UGC automatically and can be removed via comment_author_link_rel.
src/wp-includes/comment-template.php
Outdated
| * @param string $author The comment author's username. | ||
| * @param string $comment_ID The comment ID as a numeric string. | ||
| */ | ||
| $rel = apply_filters( 'comment_author_link_rel', $rel, $author, $comment->comment_ID ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might just be me, but I think this might be better off filtering the $rel_parts since an array can be a little easier to work with than modifying a string.
I do think that $comment should be the first thing passed in as that contains the most information and then the other parameters used here should follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With regards to $rel vs $rel_parts I think I basically copied from make_clickable_rel:
wordpress-develop/src/wp-includes/formatting.php
Lines 2926 to 2935 in 195393f
| /** | |
| * Filters the rel value that is added to URL matches converted to links. | |
| * | |
| * @since 5.3.0 | |
| * | |
| * @param string $rel The rel value. | |
| * @param string $url The matched URL being converted to a link tag. | |
| */ | |
| $rel = apply_filters( 'make_clickable_rel', $rel, $url ); | |
| $rel = esc_attr( $rel ); |
I'd agree with you in general but was hesitant in case there was a good reason to follow this code. It probably doesn't constitute a common pattern that I should be bound to follow though. I'm just not intimately familiar with as much core code as regular contributors probably are.
Anyway, I'll update it since it's a new filter and it certainly is easier to work with the parts.
As par as the args, it's probably not even really necessary to pass anything other than the comment object, right?
So:
$rel_parts = apply_filters( '...', $rel_parts, $comment );
Where $rel_part is the array of parts to be imploded into the rel string and $comment is the WP_Comment object from get_comment().
|
|
||
| /** | ||
| * Returns an array of URL hosts which are considered to be internal hosts. | ||
| * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I think a longer description that explains a bit about what "internal" means could be helfpul.
| * @param string $link The URL to test. | ||
| * @return bool Returns true for internal URLs and false for all other URLs. | ||
| */ | ||
| function wp_is_internal_link( $link ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to see some tests specific to this new function.
A few specific scenarios:
- Making sure that HTTP and https versions of a URL both work when home_url is https
- Making sure that the filter correctly adds/removes domains
| @@ -0,0 +1,81 @@ | |||
| <?php | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plucked this file from another PR that I got side tracked with when I started working on this. I also plucked a bit of code from that PR on the function get_comment_author_link() to prevent the notices thrown.
See trac https://core.trac.wordpress.org/ticket/57615 and #3980
I realize this now has conflicting code in two semi-related PRs, I can pluck the code out of the other PR or merge it all together but I wanted to keep this PR smaller. But... I also had to add a new tests file to test the changes in the function in this PR... well you know
|
I think this comment got buried: #3825 (comment) I went ahead and dropped the PHP 8.1+ conditions in the related tests. I can rollback the commit if that's problematic. |
|
@aaronjorbin I spoke briefly with @desrosj about this and he said he'd probably be looking at this tomorrow but if you want to push back on any of my above comments let me know and I'll make some changes tomorrow. |
aaronjorbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generally in really good shape. I'm going to make some tweaks to it and get it committed. We may need to iterate on it a bit during beta, but I want it to get in today
|
|
||
| $rel_attr = $rel ? ' rel="' . esc_attr( $rel ) . '"' : ''; | ||
|
|
||
| return "<a {$text}{$rel_attr}>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return "<a {$text}{$rel_attr}>"; | |
| return "<a {$text} {$rel_attr}>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @aaronjorbin this change will result in a double space when the rel attribute is present and an unnecessary space before the closing bracket when the attr is not present since the space is included in the $rel_attr var on line 3180
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I fixed that up
src/wp-includes/comment-template.php
Outdated
| * Default current comment. | ||
| * @return string The comment author | ||
| */ | ||
| function wp_is_comment_ugc( $comment_ID = 0 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be getting used anywhere right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh shoot yes... sorry I wrote it and then thought better of it and neglected to remove. Updating right now.
1c3d423 to
52a37a5
Compare
|
@aaronjorbin I may have deleted and closed too quickly, it looked like you were done. If you want me to reopen let me know Sorry |
|
@thomasplevy It's all good, you closed it right on time. Thank you for your work on this! |
Trac ticket: https://core.trac.wordpress.org/ticket/56444
Trac ticket: https://core.trac.wordpress.org/ticket/53290
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.