Serve back JSON exclusively in amp_validate requests#3898
Serve back JSON exclusively in amp_validate requests#3898westonruter merged 26 commits intodevelopfrom
Conversation
|
Build for testing: amp.zip (v1.5.0-alpha-20191207T181223Z-c3e9208b) |
|
Build for testing: amp.zip (v1.5.0-alpha-20191208T122237Z-31dc013c) |
69d6f23 to
ef657b5
Compare
| return __( 'The fetched URL was not found. It may have been deleted. If so, you can trash this.', 'amp' ) . $error_message . $support_forum_message; | ||
| return $implode_non_empty_strings_with_spaces( | ||
| [ | ||
| esc_html__( 'The fetched URL was not found. It may have been deleted. If so, you can trash this.', 'amp' ), |
There was a problem hiding this comment.
I don't think "trash'' is the right term here. Maybe it could be something like:
The fetched URL was not found. It may have been deleted. If so, you can <a href=”link to delete validated URL”>remove</a> the cached validation errors for this URL.
There was a problem hiding this comment.
You're right. Trash isn't right. Actually the terminology used is “forget”:
amp-wp/includes/validation/class-amp-validated-url-post-type.php
Lines 2281 to 2292 in e6f9dc3
So how about:
| esc_html__( 'The fetched URL was not found. It may have been deleted. If so, you can trash this.', 'amp' ), | |
| esc_html__( 'The fetched URL was not found. It may have been deleted. If so, you can forget this validated URL.', 'amp' ), |
There was a problem hiding this comment.
forget is what I had originally in my previous comment, but it sounded confusing to me. If forget is going to be used though, I would suggest making it a link and adding a perhaps adding a title attribute to provide context.
There was a problem hiding this comment.
In reality, this error message is very rarely shown. It can only happen if someone has a validated URL post open in one tab, then someone forgets the URL in another tab, and then tries to re-check the URL that had been forgotten. As such, I don't think we need to worry too much about tailoring the error message.
| $enable_response_caching | ||
| && | ||
| ! AMP_Validation_Manager::should_validate_response() | ||
| ! AMP_Validation_Manager::$is_validate_request |
There was a problem hiding this comment.
Wouldn't this be more correct?
| ! AMP_Validation_Manager::$is_validate_request | |
| ! AMP_Validation_Manager::$is_validation_request |
There was a problem hiding this comment.
I went with validate because query param is amp_validate and there are methods like:
\AMP_Validation_Manager::get_amp_validate_nonce()\AMP_Validation_Manager::get_validate_url_error_message()\AMP_Validation_Manager::should_validate_response()
The outlier here is \AMP_Validation_Manager::get_validation_response_data(). So to be consistent, I think actually it should be renamed to \AMP_Validation_Manager::get_validate_response_data().
There was a problem hiding this comment.
You don't really care about grammar, do you...? 😜
There was a problem hiding this comment.
See 792f6bb.
I don't really see it as being ungrammatical. It's a “request to validate” aka a “validate request”, like a “request to pull” is also a “pull request”.
This is somewhat of an attempt to distinguish validation which happens with generation of every AMP page, with the “deep” validation that obtains the source information for the underlying validation errors. In other words, validation happens all the time as a side effect of requesting an AMP page, but validate only happens if you are explicitly requesting the validation results.
Where I know you will agree is that AMP_Validation_Manager is doing too much and needs to be split up. So we'll be sure to revisit the terminology when we do that. 😄
There was a problem hiding this comment.
Ha, dangling refactoring carrots in front of me now...
Alright, works for me! 😄
|
Love the simplification! |
…p-validate-requests * 'develop' of github.com:ampproject/amp-wp: (50 commits) Update dependency terser-webpack-plugin to v2.2.3 (#3913) Update dependency core-js to v3.5.0 (#3922) Add missing src directory to build Update dependency css-loader to v3.3.0 (#3903) Update dependency core-js to v3.4.8 (#3901) Update dependency eslint-plugin-import to v2.19.1 (#3902) Update dependency @babel/core to v7.7.5 (#3889) Update dependency browserslist to v4.8.2 (#3883) Update dependency terser-webpack-plugin to v2.2.2 (#3890) Update dependency @babel/cli to v7.7.5 (#3888) Update dependency postcss to v7.0.24 (#3892) Use SORT_REGULAR flag for array_unique() instead of serializing arrays as strings Harmonize logic for getting stylesheet by URL Fix typos in comments and code style Bring sanity to the code Remove redundant validation error data; test for non-redundant data Add test for INCORRECT_NUM_CHILD_TAGS Add tests for MANDATORY_TAG_ANCESTOR, DISALLOWED_TAG_ANCESTOR, and (new) DISALLOWED_TAG_MULTIPLE_CHOICES Add test for MANDATORY_CDATA_MISSING_OR_INCORRECT Add tests for INVALID_CDATA_CONTENTS and DISALLOWED_RELATIVE_URL ...
| 'message' => $should_validate_response->get_error_message(), | ||
| ], | ||
| 400 | ||
| 401 |
There was a problem hiding this comment.
The 401 code requires a WWW-Authenticate to be sent along as well: https://httpstatuses.com/401
I suggest using something like the following:
WWW-Authenticate: WordPress realm="wp-login.php", title="Log in via /wp-login.php"
The above is a guess based on reading up on the syntax (https://tools.ietf.org/html/rfc7235#section-4.1). Feel free to adapt if you know more about this header tag.
There was a problem hiding this comment.
The authentication is not performed via logging-in to WordPress however. With the changes in this PR, authentication is always performed via nonce query param. So instead of WordPress would it be nonce?
Isn't “requires” a bit strong here? WordPress isn't sending back this header. For example:
$ curl -i -X PUT -d 'title=hello' 'https://wordpressdev.lndo.site/wp-json/wp/v2/posts/1'
HTTP/2 401
access-control-allow-headers: Authorization, Content-Type
access-control-expose-headers: X-WP-Total, X-WP-TotalPages
allow: GET
content-type: application/json; charset=UTF-8
date: Thu, 12 Dec 2019 07:59:25 GMT
link: <https://wordpressdev.lndo.site/wp-json/>; rel="https://api.w.org/"
server: nginx/1.14.2
x-content-type-options: nosniff
x-powered-by: PHP/7.3.12
x-robots-tag: noindex
content-length: 107
{"code":"rest_cannot_edit","message":"Sorry, you are not allowed to edit this post.","data":{"status":401}}
There was a problem hiding this comment.
Well, WordPress is hardly a good reference.
From the RFC 7235 (Hypertext Transfer Protocol (HTTP/1.1): Authentication):
The server generating a 401 response MUST send a WWW-Authenticate header field
(Section 4.1) containing at least one challenge applicable to the target resource.
There was a problem hiding this comment.
I don't think anything breaks if we don't send it, but the response would technically be invalid.
There was a problem hiding this comment.
If I understand the RFCs correctly, nonce would be a property of the authentication type Digest. And when you send nonce, you're supposed to send a new one as a value so the client can retry with a valid nonce:
WWW-Authenticate: Digest nonce=12345
There was a problem hiding this comment.
Google is hardly an authority here!
😉
There was a problem hiding this comment.
Well, we might as well go full on board here and actually send the nonce via the Authorization request header as well…
There was a problem hiding this comment.
Humm. Except this would cause problems for sites that are already behind some HTTP authentication and they'd need to provide double authentication.
There was a problem hiding this comment.
In that case I suppose multiple headers could be sent: https://stackoverflow.com/a/38515091
There was a problem hiding this comment.
I started going down this road, but I think it's overkill for now.
Stashed patch:
diff --git a/includes/validation/class-amp-validation-manager.php b/includes/validation/class-amp-validation-manager.php
index 86740dd4..1e9fcc07 100644
--- a/includes/validation/class-amp-validation-manager.php
+++ b/includes/validation/class-amp-validation-manager.php
@@ -19,6 +19,13 @@ class AMP_Validation_Manager {
*/
const VALIDATE_QUERY_VAR = 'amp_validate';
+ /**
+ * Custom HTTP authorization type that is sent in Authorization request header and WWW-Authenticate response header.
+ *
+ * @var string
+ */
+ const VALIDATE_AUTHORIZATION_TYPE = 'AMP-WP-Validate-Nonce-v1';
+
/**
* Query var for containing the number of validation errors on the frontend after redirection when invalid.
*
@@ -1632,11 +1639,44 @@ class AMP_Validation_Manager {
return false;
}
- $validate_key = wp_unslash( $_GET[ self::VALIDATE_QUERY_VAR ] ); // phpcs:ignore WordPress.Security.NonceVerification.Recommended
- if ( ! hash_equals( self::get_amp_validate_nonce(), $validate_key ) ) {
+ $validate_nonce = self::get_amp_validate_nonce();
+ $validate_key = wp_unslash( $_GET[ self::VALIDATE_QUERY_VAR ] ); // phpcs:ignore WordPress.Security.NonceVerification.Recommended
+ if ( hash_equals( $validate_nonce, $validate_key ) ) {
+ return true;
+ }
+
+ if ( empty( $_SERVER['HTTP_AUTHORIZATION'] ) ) {
+ return new WP_Error(
+ 'http_request_failed',
+ esc_html__( 'Missing authorization header.', 'amp' )
+ );
+ }
+
+ $request_nonce = null;
+ $authorizations = array_map( 'trim', explode( ',', wp_unslash( $_SERVER['HTTP_AUTHORIZATION'] ) ) );
+ $pattern = sprintf( '/^%s\s+(?P<nonce>\w+)$/', self::VALIDATE_AUTHORIZATION_TYPE );
+ foreach ( $authorizations as $authorization ) {
+ if ( preg_match( $pattern, $authorization, $match ) ) {
+ $request_nonce = $match['nonce'];
+ break;
+ }
+ }
+
+ if ( empty( $request_nonce ) ) {
+ return new WP_Error(
+ 'http_request_failed',
+ esc_html__( 'Missing authorization header.', 'amp' )
+ );
+ }
+
+ if ( hash_equals( $validate_nonce, $request_nonce ) ) {
return new WP_Error(
'http_request_failed',
- __( 'Nonce authentication failed.', 'amp' )
+ sprintf(
+ /* translators: %s is the WP-CLI command to generate the nonce */
+ esc_html__( 'Nonce authentication failed. The required nonce is returned by %s.', 'amp' ),
+ '<code>wp amp validation generate-nonce</code>'
+ )
);
}
@@ -1817,7 +1857,7 @@ class AMP_Validation_Manager {
public static function validate_url( $url ) {
$added_query_vars = [
- self::VALIDATE_QUERY_VAR => self::get_amp_validate_nonce(),
+ self::VALIDATE_QUERY_VAR => '',
self::CACHE_BUST_QUERY_VAR => wp_rand(),
];
$validation_url = add_query_arg( $added_query_vars, $url );
@@ -1836,6 +1876,7 @@ class AMP_Validation_Manager {
'redirection' => 0, // Because we're in a loop for redirection.
'headers' => [
'Cache-Control' => 'no-cache',
+ 'Authorization' => sprintf( '%s %s', self::VALIDATE_AUTHORIZATION_TYPE, self::get_amp_validate_nonce() ),
],
]
);Co-Authored-By: Alain Schlesser <[email protected]>
| /** | ||
| * Generate the authorization nonce needed for a validate request. | ||
| * | ||
| * @subcommand generate-nonce | ||
| * @alias nonce | ||
| */ | ||
| public function generate_nonce() { | ||
| WP_CLI::line( AMP_Validation_Manager::get_amp_validate_nonce() ); | ||
| } | ||
|
|
||
| /** | ||
| * Get the validation results for a given URL. | ||
| * | ||
| * The results are returned in JSON format. | ||
| * | ||
| * ## OPTIONS | ||
| * | ||
| * <url> | ||
| * : The URL to check. The host name need not be included. The URL must be local to this WordPress install. | ||
| * | ||
| * ## EXAMPLES | ||
| * | ||
| * wp amp validation check-url /about/ | ||
| * wp amp validation check-url $( wp option get home )/?p=1 | ||
| * | ||
| * @subcommand check-url | ||
| * @alias check | ||
| * | ||
| * @param array $args Args. | ||
| */ | ||
| public function check_url( $args ) { | ||
| list( $url ) = $args; | ||
| AMP_Validation_Manager::get_amp_validate_nonce(); | ||
|
|
||
| $host = wp_parse_url( $url, PHP_URL_HOST ); | ||
| $parsed_home_url = wp_parse_url( home_url( '/' ) ); | ||
|
|
||
| if ( $host && $host !== $parsed_home_url['host'] ) { | ||
| WP_CLI::error( | ||
| sprintf( | ||
| /* translators: %1$s is the expected host, %2$s is the actual host */ | ||
| __( 'Supplied URL must be for this WordPress install. Expected host "%1$s" but provided is "%2$s".', 'amp' ), | ||
| $parsed_home_url['host'], | ||
| $host | ||
| ) | ||
| ); | ||
| } | ||
|
|
||
| if ( ! $host ) { | ||
| $origin = $parsed_home_url['scheme'] . '://' . $parsed_home_url['host']; | ||
| if ( ! empty( $parsed_home_url['port'] ) ) { | ||
| $origin .= ':' . $parsed_home_url['port']; | ||
| } | ||
| $url = $origin . '/' . ltrim( $url, '/' ); | ||
| } | ||
|
|
||
| $result = AMP_Validation_Manager::validate_url( $url ); | ||
| if ( $result instanceof WP_Error ) { | ||
| WP_CLI::error( $result ); | ||
| } | ||
|
|
||
| print wp_json_encode( AMP_Validation_Manager::validate_url( $url ), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES ); | ||
| } | ||
|
|
||
| /** |
| WP_CLI::error( $result ); | ||
| } | ||
|
|
||
| print wp_json_encode( AMP_Validation_Manager::validate_url( $url ), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES ); |
There was a problem hiding this comment.
Would it be better to use WP_CLI::line() here?
There was a problem hiding this comment.
WP_CLI::log() should be used here. WP_CLI::line() is a legacy compat method.
There was a problem hiding this comment.
No, forget my rambling. You are correct, WP_CLI::line() should be used here, as this output should not be suppressed when using the --quiet flag.
There was a problem hiding this comment.
Actually, given the input, the command should support a --format flag and the output should be processed via WP_CLI\Utils\format_items() instead.
This way it would default to the normal WP-CLI tabular output, but you could request JSON if you wanted to.
There was a problem hiding this comment.
I don't think the data would work well in a tabular format. Would it? It looks like this:
{
"results": [
{
"error": {
"node_name": "script",
"parent_name": "div",
"code": "DISALLOWED_TAG",
"type": "js_error",
"node_attributes": [],
"text": "document.write(\"BAD!\");",
"sources": [
{
"hook": "the_content",
"filter": true,
"post_id": 2531,
"post_type": "post",
"sources": [
{
"type": "core",
"name": "wp-includes",
"file": "class-wp-embed.php",
"line": 59,
"function": "WP_Embed::run_shortcode"
},
{
"type": "core",
"name": "wp-includes",
"file": "class-wp-embed.php",
"line": 404,
"function": "WP_Embed::autoembed"
},
{
"type": "core",
"name": "wp-includes",
"file": "blocks.php",
"line": 532,
"function": "do_blocks"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 51,
"function": "wptexturize"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 455,
"function": "wpautop"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 838,
"function": "shortcode_unautop"
},
{
"type": "core",
"name": "wp-includes",
"file": "post-template.php",
"line": 1649,
"function": "prepend_attachment"
},
{
"type": "core",
"name": "wp-includes",
"file": "media.php",
"line": 1432,
"function": "wp_make_content_images_responsive"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 5726,
"function": "wp_staticize_emoji"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 5359,
"function": "capital_P_dangit"
},
{
"type": "core",
"name": "wp-includes",
"file": "shortcodes.php",
"line": 177,
"function": "do_shortcode"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 3335,
"function": "convert_smilies"
}
]
},
{
"block_name": "core/shortcode",
"post_id": 2531,
"block_content_index": 0,
"type": "plugin",
"name": "gutenberg",
"file": "build/block-library/blocks/shortcode.php",
"line": 16,
"function": "gutenberg_render_block_core_shortcode"
},
{
"type": "plugin",
"name": "bad-shortcode.php",
"file": "bad-shortcode.php",
"line": 7,
"function": "Bad\\shortcode",
"shortcode": "bad"
}
]
},
"sanitized": true
},
{
"error": {
"node_name": "script",
"parent_name": "div",
"code": "DISALLOWED_TAG",
"type": "js_error",
"node_attributes": [],
"text": "badness",
"sources": [
{
"hook": "the_content",
"filter": true,
"post_id": 2531,
"post_type": "post",
"sources": [
{
"type": "core",
"name": "wp-includes",
"file": "class-wp-embed.php",
"line": 59,
"function": "WP_Embed::run_shortcode"
},
{
"type": "core",
"name": "wp-includes",
"file": "class-wp-embed.php",
"line": 404,
"function": "WP_Embed::autoembed"
},
{
"type": "core",
"name": "wp-includes",
"file": "blocks.php",
"line": 532,
"function": "do_blocks"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 51,
"function": "wptexturize"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 455,
"function": "wpautop"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 838,
"function": "shortcode_unautop"
},
{
"type": "core",
"name": "wp-includes",
"file": "post-template.php",
"line": 1649,
"function": "prepend_attachment"
},
{
"type": "core",
"name": "wp-includes",
"file": "media.php",
"line": 1432,
"function": "wp_make_content_images_responsive"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 5726,
"function": "wp_staticize_emoji"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 5359,
"function": "capital_P_dangit"
},
{
"type": "core",
"name": "wp-includes",
"file": "shortcodes.php",
"line": 177,
"function": "do_shortcode"
},
{
"type": "core",
"name": "wp-includes",
"file": "formatting.php",
"line": 3335,
"function": "convert_smilies"
}
]
},
{
"block_name": "core/html",
"post_id": 2531,
"block_content_index": 1
}
]
},
"sanitized": true
},
{
"error": {
"node_name": "script",
"parent_name": "body",
"code": "DISALLOWED_TAG",
"type": "js_error",
"node_attributes": [],
"text": "document.write( 'I am invalid!!' );",
"sources": [
{
"type": "plugin",
"name": "print-invaliud-footer.php",
"file": "print-invaliud-footer.php",
"line": 8,
"function": "{closure}",
"hook": "wp_footer",
"priority": 10
}
]
},
"sanitized": true
}
],
"queried_object": {
"id": 2531,
"type": "post"
},
"url": "https://wordpressdev.lndo.site/2019/12/05/bad-shortcode/?amp"
}There was a problem hiding this comment.
I've improved the command to make use of the standard WP-CLI data offerings. This now includes JSON, YAML and table formatting, counting, exit codes. But also filtering on a given field and selecting which fields. For table output, I improved the formatting so it is useful as the human-readable format.
Here's an example of table output:
To filter, you can do something like this:
wp amp validation check-url example.org --type=css_error
What do you think about that format, it seems more useful for human consumption than JSON, doesn't it?
Should I commit this into this current PR, or create a separate one?
There was a problem hiding this comment.
A problem with that is the fields vary by error type. Maybe you accounted for this. Doing this in a separate PR makes sense to me.
The table does look nice. But at the same time, the admin screen has this information already presented in an HTML table. I think that perhaps this command should in addition to obtaining the validation results, also store them and then provide the URL to the validated URL screen via a message to STDERR.
…p-validate-requests * 'develop' of github.com:ampproject/amp-wp: Fix tests after security fix in WP 5.3.1 (r46896) Update dependency css-loader to v3.3.2 (#3924) Update dependency terser-webpack-plugin to v2.3.0 (#3923) Convert `theme_features` variable into `get_theme_config` function Get html element directly instead of performing a query Only apply smooth scroll fix on 'Twenty Twenty' <= 1.0.0 Re-add smooth scrolling fix on WP < 5.3.1 Set `scroll-behavior` to `auto` in paired browsing clients No need to apply smooth scrolling fix for Twenty Twenty theme
| jQuery( function( $ ) { | ||
| var validateUrl, postId; | ||
| validateUrl = <?php echo wp_json_encode( add_query_arg( AMP_Validation_Manager::VALIDATE_QUERY_VAR, AMP_Validation_Manager::get_amp_validate_nonce(), self::get_url_from_post( $post ) ) ); ?>; | ||
| validateUrl = <?php echo wp_json_encode( self::get_markup_status_preview_url( self::get_url_from_post( $post ) ) ); ?>; |
There was a problem hiding this comment.
Contrary to the regular short tags (<?) that cause issues with XML and are not supposed to be used, short echo tags (<?=) can be safely used from PHP 5.4+ onwards.
| validateUrl = <?php echo wp_json_encode( self::get_markup_status_preview_url( self::get_url_from_post( $post ) ) ); ?>; | |
| validateUrl = <?= wp_json_encode( self::get_markup_status_preview_url( self::get_url_from_post( $post ) ) ) ?>; |
There was a problem hiding this comment.
True, but we don't allow it yet in the PHPCS config, so we're not using it anywhere else in the codebase yet.
|
|
||
| $admin_bar_icon = $dom->getElementById( 'amp-admin-bar-item-status-icon' ); | ||
| if ( $admin_bar_icon ) { | ||
| $admin_bar_icon->firstChild->nodeValue = "\xE2\x9A\xA0\xEF\xB8\x8F"; // WARNING SIGN: U+26A0, U+FE0F. |
There was a problem hiding this comment.
Maybe put this into a constant with a proper docblock if it needs a separate comment here?
There was a problem hiding this comment.
Perhaps. Corresponding Unicode character are referenced here, but in HTML entity form:
amp-wp/includes/validation/class-amp-validation-manager.php
Lines 425 to 435 in bf0a0a0
| WP_CLI::error( $result ); | ||
| } | ||
|
|
||
| print wp_json_encode( AMP_Validation_Manager::validate_url( $url ), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES ); |
There was a problem hiding this comment.
Actually, given the input, the command should support a --format flag and the output should be processed via WP_CLI\Utils\format_items() instead.
This way it would default to the normal WP-CLI tabular output, but you could request JSON if you wanted to.
Co-Authored-By: Alain Schlesser <[email protected]>
|
If I'm not mistaken, this modifies the response that the loopback request returns (server => server). Therefore, this cannot easily be verified from the staging site during QA, as the Network tab in the developer tools will only show requests initiated on the client. @westonruter Thoughts on whether this needs QA and how we can do it? Otherwise I'd say the automated tests in place are enough to assert that the response has changed. |
|
Agreed. |

Summary
When performing an AMP validation request, the AMP plugin currently gathers up that information and injects it into the HTML response as an HTML comment after the
</body>. This HTML comment contains the validation data represented as JSON. The HTML comment, however, is prone to be corrupted or (in the case of PageSpeed) removed entirely.In thinking about how validation requests are performed, I realized that there is no need to inject the validation JSON data into an HTML document: we can simply return the JSON as the response entirely. This simplifies the process of generating the response, while also improving performance of validation requests by avoiding needlessly to remove HTML source stack comments and to serialize the DOM into HTML.
Additionally, in this PR:
textContentis not writable, in updating the AMP admin bar item.Fixes #3887. Improvements to error messages follow up on #3793.
Todo
\AMP_Validation_Manager::finalize_validation().Checklist