Skip to content

Conversation

@thomasplevy
Copy link

Trac ticket: https://core.trac.wordpress.org/ticket/56444
Trac ticket: https://core.trac.wordpress.org/ticket/53290


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions
Copy link

github-actions bot commented Jan 6, 2023

Hi @thomasplevy! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in this blog post.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

*/
function _wp_is_url_internal( $href ) {

if ( in_array( strtolower( wp_parse_url( $href, PHP_URL_SCHEME ) ), array( 'http', 'https' ), true ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not use wp_allowed_protocols() here? Not sure if URLs like mailto:[email protected] being noted as nofollow ugc is flagged as problematic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not use wp_allowed_protocols() here?
I didn't use it because I wasn't aware of it's existence.

I'll update to use.

Not sure if URLs like mailto:[email protected] being noted as nofollow ugc is flagged as problematic.

I'm not an expert here so I did a bit of googling and found a few thinks on various stack exchanges stating that in general crawlers don't crawl mailto: links so nofollow wouldn't be necessary but I don't think it would harm. But, semantically maybe that means that instead of using wp_is_url_internal() to determine the rel as a binary (either ugc or ugc nofollow) a helper function such as wp_get_link_rel_attr() (or something to that effect) should be created to handle this. This function could use the wp_is_internal_link() boolean return but also add additional conditions for links with mailto: protocol (and I'll look through other protocols to see if there might be any other conditions that should be handled differently).

Looks like a mailto: link should probably be ugc if in a comment but otherwise the nofollow would be superfluous. Maybe this is an unnecessary detail though.

* @param array $internal_hosts An array of internal URL hostnames.
*/
$internal_hosts = apply_filters(
'wp_rel_callback_internal_hosts',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this filter name, but also not sure what would be better. Maybe just wp_rel_internal_hosts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If moving to a new "core" utility function, I think wp_internal_host_urls might be more helpful. Maybe it should even be moved to it's own dedicated function? wp_internal_host_urls() which returns the filtered list only? I like writing code like this myself but didn't want to add additional new functions unless it seemed appropriate.

Copy link
Author

@thomasplevy thomasplevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @desrosj, I've got some follow up comments / questions, I'll get to work on updating this tomorrow.

* @param array $internal_hosts An array of internal URL hostnames.
*/
$internal_hosts = apply_filters(
'wp_rel_callback_internal_hosts',
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If moving to a new "core" utility function, I think wp_internal_host_urls might be more helpful. Maybe it should even be moved to it's own dedicated function? wp_internal_host_urls() which returns the filtered list only? I like writing code like this myself but didn't want to add additional new functions unless it seemed appropriate.

*/
function _wp_is_url_internal( $href ) {

if ( in_array( strtolower( wp_parse_url( $href, PHP_URL_SCHEME ) ), array( 'http', 'https' ), true ) ) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not use wp_allowed_protocols() here?
I didn't use it because I wasn't aware of it's existence.

I'll update to use.

Not sure if URLs like mailto:[email protected] being noted as nofollow ugc is flagged as problematic.

I'm not an expert here so I did a bit of googling and found a few thinks on various stack exchanges stating that in general crawlers don't crawl mailto: links so nofollow wouldn't be necessary but I don't think it would harm. But, semantically maybe that means that instead of using wp_is_url_internal() to determine the rel as a binary (either ugc or ugc nofollow) a helper function such as wp_get_link_rel_attr() (or something to that effect) should be created to handle this. This function could use the wp_is_internal_link() boolean return but also add additional conditions for links with mailto: protocol (and I'll look through other protocols to see if there might be any other conditions that should be handled differently).

Looks like a mailto: link should probably be ugc if in a comment but otherwise the nofollow would be superfluous. Maybe this is an unnecessary detail though.

@thomasplevy
Copy link
Author

@desrosj I've updated based on your feedback and also covered the missed point 1 on https://core.trac.wordpress.org/ticket/53290 regarding get_comment_author_link().

@thomasplevy
Copy link
Author

thomasplevy commented Jan 31, 2023

Running tests locally using the in-built env scripts and forcing php 8.1 via LOCAL_PHP="8.1-fpm" npm run env:start results in a passing test suite. I didn't try on 8.2. I haven't used the develop test suites a ton so I'm trying to figure out if these are expected to fail or if new code broke them. It seems like new code broke them but I can't recreate that locally to debug further.

I misunderstood how the commands work. Trying again.

The php 8.1 and 8.2 test failures are resulting from deprecation warnings thrown by strtolower() in this line:

if ( in_array( strtolower( wp_parse_url( $atts['href']['value'], PHP_URL_SCHEME ) ), array( 'http', 'https' ), true ) ) {

The deprecation expectations were added here: 400982a#diff-5e804da16de66a4a86d069033a8a8173c315510bdb522a108f55059ea9919837

Since the strtolower() call is removed (moved around) in this call, it looks like these are now invalid failures and the 8.1+ conditions should be removed.

I think this is all correct, of course. I'll remove them and update if anyone agrees with me.

Copy link
Member

@aaronjorbin aaronjorbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really liking the direction this is going. I left a couple comments with some thoughts on improvements. Open to discussion on all of them if you think they aren't valuable.

*
* @param string[] $comment_authors An array of comment author usernames.
*/
$non_ugc_comment_authors = apply_filters( 'non_ugc_comment_authors', array() );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be better as a filter called comment_is_ugc that could pass in the full $comment rather than a filter of author names. This would be more flexible for plugin authors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... but. Semantically I believe this would be the incorrect place to use a filter/function that determines if the comment itself is UGC. This function is retrieving the comment author's link (as submitted by the comment author). So in this particular area we want to know if the author's link is UGC, not whether or not the whole comment is UGC.

So... Merging this with your feedback I'd update it to

comment_author_link_is_ugc, it'll return a boolean instead of an array of usernames, and as a second param it'll receive the WP_Comment object.

The default value will be true and if the filtered value of the var is true we'll add ugc to the $rel_parts array.
What do you think about that?

Copy link
Author

@thomasplevy thomasplevy Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the previous comment 3 hours ago and never posted. I rewrote the code and just finished writing tests and now, looking at the new code. I think this particular filter is just unnecessary.

The function get_comment_author_link() shouldn't modify the author link indiscriminately. Although its output can be filtered, it would be better to prevent link tagging for specific authors. This can be achieved by adding the following code on line 223 of wp-includes/comment-template.php:

At the writing of this comment the only available filter is on the full HTML of the anchor via get_comment_author_link

With this PR (and updates from feedback) there will be a new filter specifically on the rel attributes as an array before being composed into the HTML anchor.

If a developer desires to use comment_author_link_rel to make a specific user automatically not ugc, it can be done so there quite easily. I don't see the need to add two filters here, one to build an allowlist of sorts and another to modify the rel attributes directly.

Incoming PR update applies UGC automatically and can be removed via comment_author_link_rel.

* @param string $author The comment author's username.
* @param string $comment_ID The comment ID as a numeric string.
*/
$rel = apply_filters( 'comment_author_link_rel', $rel, $author, $comment->comment_ID );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might just be me, but I think this might be better off filtering the $rel_parts since an array can be a little easier to work with than modifying a string.

I do think that $comment should be the first thing passed in as that contains the most information and then the other parameters used here should follow

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regards to $rel vs $rel_parts I think I basically copied from make_clickable_rel:

/**
* Filters the rel value that is added to URL matches converted to links.
*
* @since 5.3.0
*
* @param string $rel The rel value.
* @param string $url The matched URL being converted to a link tag.
*/
$rel = apply_filters( 'make_clickable_rel', $rel, $url );
$rel = esc_attr( $rel );

I'd agree with you in general but was hesitant in case there was a good reason to follow this code. It probably doesn't constitute a common pattern that I should be bound to follow though. I'm just not intimately familiar with as much core code as regular contributors probably are.

Anyway, I'll update it since it's a new filter and it certainly is easier to work with the parts.

As par as the args, it's probably not even really necessary to pass anything other than the comment object, right?

So:

$rel_parts = apply_filters( '...', $rel_parts, $comment );

Where $rel_part is the array of parts to be imploded into the rel string and $comment is the WP_Comment object from get_comment().


/**
* Returns an array of URL hosts which are considered to be internal hosts.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I think a longer description that explains a bit about what "internal" means could be helfpul.

* @param string $link The URL to test.
* @return bool Returns true for internal URLs and false for all other URLs.
*/
function wp_is_internal_link( $link ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see some tests specific to this new function.

A few specific scenarios:

  • Making sure that HTTP and https versions of a URL both work when home_url is https
  • Making sure that the filter correctly adds/removes domains

@@ -0,0 +1,81 @@
<?php
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plucked this file from another PR that I got side tracked with when I started working on this. I also plucked a bit of code from that PR on the function get_comment_author_link() to prevent the notices thrown.

See trac https://core.trac.wordpress.org/ticket/57615 and #3980

I realize this now has conflicting code in two semi-related PRs, I can pluck the code out of the other PR or merge it all together but I wanted to keep this PR smaller. But... I also had to add a new tests file to test the changes in the function in this PR... well you know

@thomasplevy
Copy link
Author

I think this comment got buried: #3825 (comment)

I went ahead and dropped the PHP 8.1+ conditions in the related tests. I can rollback the commit if that's problematic.

@thomasplevy thomasplevy requested review from aaronjorbin and desrosj and removed request for desrosj February 6, 2023 23:47
@thomasplevy
Copy link
Author

@aaronjorbin I spoke briefly with @desrosj about this and he said he'd probably be looking at this tomorrow but if you want to push back on any of my above comments let me know and I'll make some changes tomorrow.

Copy link
Member

@aaronjorbin aaronjorbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally in really good shape. I'm going to make some tweaks to it and get it committed. We may need to iterate on it a bit during beta, but I want it to get in today


$rel_attr = $rel ? ' rel="' . esc_attr( $rel ) . '"' : '';

return "<a {$text}{$rel_attr}>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "<a {$text}{$rel_attr}>";
return "<a {$text} {$rel_attr}>";

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @aaronjorbin this change will result in a double space when the rel attribute is present and an unnecessary space before the closing bracket when the attr is not present since the space is included in the $rel_attr var on line 3180

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I fixed that up

* Default current comment.
* @return string The comment author
*/
function wp_is_comment_ugc( $comment_ID = 0 ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be getting used anywhere right now

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh shoot yes... sorry I wrote it and then thought better of it and neglected to remove. Updating right now.

@thomasplevy thomasplevy closed this Feb 7, 2023
@thomasplevy thomasplevy deleted the fix/rel-ugc-nofollow branch February 7, 2023 19:05
@thomasplevy
Copy link
Author

@aaronjorbin I may have deleted and closed too quickly, it looked like you were done.

If you want me to reopen let me know

Sorry

@aaronjorbin
Copy link
Member

@thomasplevy It's all good, you closed it right on time. Thank you for your work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants