
Implement Compression Streams API for URL Metrics JSON Compression #1905

Merged
merged 27 commits into from
Mar 14, 2025

Conversation

b1ink0
Contributor

@b1ink0 b1ink0 commented Mar 4, 2025

Summary

Fixes #1893

Relevant technical choices

This PR implements the Compression Streams API to efficiently compress URL Metrics JSON before sending it via navigator.sendBeacon(). This significantly reduces the payload size, allowing more data to be sent without exceeding the 64 KiB limit.
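The approach can be sketched as follows. This is a minimal illustration, not the PR's exact code: the helper name `compress` matches the one referenced later in this PR, but its body here is an assumption based on the standard Compression Streams API.

```javascript
// Sketch: gzip-compress a JSON string with the Compression Streams API
// so it can be sent via navigator.sendBeacon() as application/gzip.
async function compress( jsonBody ) {
	// Pipe the JSON bytes through a gzip CompressionStream.
	const stream = new Blob( [ jsonBody ] )
		.stream()
		.pipeThrough( new CompressionStream( 'gzip' ) );

	// Collect the compressed chunks into a Blob with the gzip MIME type.
	return new Blob( [ await new Response( stream ).arrayBuffer() ], {
		type: 'application/gzip',
	} );
}
```

The resulting Blob's `size` can then be checked against the 64 KiB `sendBeacon()` budget before the request is attempted.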

@b1ink0 b1ink0 added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Mar 4, 2025
@b1ink0 b1ink0 added this to the optimization-detective n.e.x.t milestone Mar 4, 2025

codecov bot commented Mar 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.18%. Comparing base (5600db9) to head (c0e3291).
Report is 79 commits behind head on trunk.

Additional details and impacted files
@@            Coverage Diff             @@
##            trunk    #1905      +/-   ##
==========================================
+ Coverage   71.01%   71.18%   +0.16%     
==========================================
  Files          85       85              
  Lines        6959     7000      +41     
==========================================
+ Hits         4942     4983      +41     
  Misses       2017     2017              
Flag Coverage Δ
multisite 71.18% <100.00%> (+0.16%) ⬆️
single 40.68% <8.51%> (-0.26%) ⬇️


@westonruter
Member

A couple things following up from #1886:

if ( isDebug ) {
	error( 'Unable to look up XPath for element' );
}

For this let's remove the if statement and change error to warn, so the above is just:

warn( 'Unable to look up XPath for element' ); 

Note that warn() will only output when isDebug by default. This really should never even occur because breadcrumbedElementsMap is composed of elements which are explicitly being observed by the Intersection Observer. Given that, the if ( ! xpath ) { continue; } could be removed entirely as well, but doesn't hurt to include along with warn() for development support.

For this:

if ( isDebug ) {
	error(
		`Unable to send URL Metric because it is ${ compressedJsonBody.size.toLocaleString() } bytes, ${ Math.round(
			percentOfBudget
		) }% of ${ maxBodyLengthKiB } KiB limit:`,
		urlMetric
	);
}

Let's just remove the if() statement so we always call error(). This will always output the message to the console, but this is a serious issue which should get logged.

@b1ink0 b1ink0 marked this pull request as ready for review March 6, 2025 12:03
@b1ink0 b1ink0 requested a review from felixarntz as a code owner March 6, 2025 12:03
@b1ink0 b1ink0 requested a review from westonruter March 6, 2025 12:03

github-actions bot commented Mar 6, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: b1ink0 <[email protected]>
Co-authored-by: westonruter <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@westonruter
Member

Amazing! With 200 images being tracked, the result is now:

Note

Sending URL Metric (5,167 bytes, 8% of 64 KiB limit)

Whereas before it was an error:

Caution

Unable to send URL Metric because it is 101,527 bytes, 159% of 64 KiB limit

I can even go up to 1,000 images and it's still well under the limit:

Note

Sending URL Metric (20,343 bytes, 32% of 64 KiB limit)

@@ -1,3 +1,5 @@
// noinspection JSUnusedGlobalSymbols
Member

I added this because otherwise PhpStorm shows a warning:

(screenshot of the PhpStorm unused-export warning)

The reason the IDE doesn't know the export is used is because it is used in PHP and via a dynamic export. See also https://stackoverflow.com/questions/54687009/export-default-unused-in-webstorm

This noinspection annotation is used in Gutenberg once as well: https://github.com/WordPress/gutenberg/blob/8b88ada2fd675509bf0c39a55c23a75cc67987cb/packages/components/src/mobile/link-settings/test/edit.native.js#L1

Member

@westonruter westonruter left a comment

This is so good, but there's one more thing that needs to be done: we need to account for whether gzdecode() is available.

}

try {
$decompressed_body = gzdecode( $compressed_body );
Member

I looked at this in PhpStorm and I noticed an error:

(screenshot of the PhpStorm error)
So then I added it to require in c59c3ed, but then I wondered what core does with this function. I found this:

https://github.com/WordPress/wordpress-develop/blob/d5a3a140ee0e459acedd3c182274a2b14adc5bdb/src/wp-includes/class-wp-http-encoding.php#L72-L78

Note that core checks if the function exists. I think we'll have to do the same, and in od_get_detection_script() we'll need to add a new param which is exported to the client, like gzdecodeAvailable, containing the value of function_exists( 'gzdecode' ). This can then be used to conditionally compress the URL Metric data before sending it to the REST API.

We should use the function in the same way it is being used in core, perhaps even with the error suppression.

Contributor Author

Added logic for handling gzdecode() availability in af1d1dc

function od_decompress_rest_request_body( $result, WP_REST_Server $server, WP_REST_Request $request ) {
	if (
		$request->get_route() === '/' . OD_REST_API_NAMESPACE . OD_URL_METRICS_ROUTE &&
		'application/gzip' === $request->get_header( 'Content-Type' )
Member

@westonruter westonruter Mar 10, 2025

Suggested change
- 'application/gzip' === $request->get_header( 'Content-Type' )
+ 'application/gzip' === $request->get_header( 'Content-Type' ) &&
+ function_exists( 'gzdecode' )

const jsonBody = JSON.stringify( urlMetric );
const compressedJsonBody = await compress( jsonBody );
Member

Per above, this should be conditional based on gzdecodeAvailable.
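The conditional suggested here could look roughly like the sketch below. It assumes `gzdecodeAvailable` is a boolean exported by the server (as proposed earlier in this review); the function name and return shape are illustrative, not the PR's actual API.

```javascript
// Sketch: build the sendBeacon() body, compressing only when the server
// has reported that gzdecode() is available.
async function prepareBeaconBody( urlMetric, gzdecodeAvailable ) {
	const jsonBody = JSON.stringify( urlMetric );

	if ( ! gzdecodeAvailable ) {
		// Server cannot decompress; fall back to plain JSON.
		return new Blob( [ jsonBody ], { type: 'application/json' } );
	}

	// Gzip the JSON via the Compression Streams API.
	const stream = new Blob( [ jsonBody ] )
		.stream()
		.pipeThrough( new CompressionStream( 'gzip' ) );
	return new Blob( [ await new Response( stream ).arrayBuffer() ], {
		type: 'application/gzip',
	} );
}
```

The Blob's type then determines the Content-Type the beacon request is sent with.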

@@ -346,3 +325,65 @@ function od_trigger_page_cache_invalidation( int $cache_purge_post_id ): void {
/** This action is documented in wp-includes/post.php. */
do_action( 'save_post', $post->ID, $post, /* $update */ true );
}

/**
Member

I just had a thought that perhaps an Accept response header should be added for this endpoint. This apparently isn't needed because I don't see Accept: application/json being served anywhere now in REST API responses. Just something to think about.

Member

@westonruter westonruter left a comment

Minor suggestion: Let's have gzdecodeAvailable always passed to the client.

Comment on lines 221 to 241
/*
 * The limit for data sent via navigator.sendBeacon() is 64 KiB. This limit is checked in detect.js so that the
 * request will not even be attempted if the payload is too large. This server-side restriction is added as a
 * safeguard against clients sending possibly malicious payloads much larger than 64 KiB which should never be
 * getting sent.
 */
$max_size       = 64 * 1024;
$content_length = strlen( (string) wp_json_encode( $url_metric ) );
if ( $content_length > $max_size ) {
	return new WP_Error(
		'rest_content_too_large',
		sprintf(
			/* translators: 1: the size of the payload, 2: the maximum allowed payload size */
			__( 'JSON payload size is %1$s bytes which is larger than the maximum allowed size of %2$s bytes.', 'optimization-detective' ),
			number_format_i18n( $content_length ),
			number_format_i18n( $max_size )
		),
		array( 'status' => 413 )
	);
}

Contributor Author

I removed this check from od_handle_rest_request as we can compress the URL metrics. However, in cases where gzdecode is unavailable, should we reintroduce this check in od_handle_rest_request, conditioned upon the availability of gzdecode?

Additionally, if a malicious user sends data with application/json content type and gzdecode is available, there is currently no check enforcing an upper limit on the size of the URL metrics.

Member

Oh yes, good points!

Contributor Author

Should we add a limit of 1 MB for URL Metrics? I am not sure what the limit should be.

Member

I wonder as well, when gzdecode() is available, what happens if I try (maliciously) submitting a URL Metric which has 1,000,000 instances of the same element? Given that compression will likely be very effective in this case with the duplication, will it result in something extremely large being submitted when decompressed? If so, should we have some fallback size constraint for a decompressed URL Metric?

Member

(I think we were thinking along the same lines here and commented at the same time.)

Yes, let's maybe introduce a new function that returns the maximum byte size for a URL Metric serialized to JSON. The function's return value can be filtered, but defaulting to MB_IN_BYTES seems good to me. This should perhaps get exported to the client so that the pre-compressed JSON size can be checked to see if it is too big so the client will skip sending it in that case.

Contributor Author

Just a thought: should we compress the URL metrics before storing them in the database? Would using gzdecode cause performance issues, since we'd need to decompress the data on every request to the optimized page? Compressing the metrics could allow us to store larger URL metric data, but I'm not sure any real-world scenario would generate more than 1 MB of URL metrics.

Member

I don't think we need to compress in the DB, no. That would be an over-optimization and could negatively impact performance at runtime with having to decompress. Storage is cheap. What is expensive is network bandwidth, especially with the limitations of the beacon size. So that's what we want to optimize for.

Member

I tried submitting a URL Metric with various numbers of copies of the first element completely populating the elements. I was able to get up to 18,000 identical elements being copied before going over the 64 KiB limit with compression. Before this URL Metric was compressed, it was a total of 10,584,362 bytes (10.6 MB). So 1,800 elements is roughly 1 MB. This seems like a reasonable limit to impose.

Member

@westonruter westonruter Mar 12, 2025

I tried doing something silly and visited every single tag:

add_action( 'od_register_tag_visitors', function ( OD_Tag_Visitor_Registry $registry ) {
	$registry->register( 'all', '__return_true' );
} );

On a post with 1,000 Image blocks, this resulted in 2,198 elements in the URL Metric. The uncompressed size of the URL Metric is 1,068,353 bytes (about 1 MB), with the compressed size being 33,047 bytes (52% of 64 KiB limit). So indeed, I think 1 MB is probably far larger than any URL Metric will naturally be, so it's a safe upper bound for what should be allowed.

Member

@westonruter westonruter left a comment

❤️

@westonruter westonruter merged commit da320f3 into WordPress:trunk Mar 14, 2025
19 checks passed
westonruter added a commit to hbhalodia/performance that referenced this pull request Mar 14, 2025
@westonruter
Member

@b1ink0 See #1924 for a follow-up to this.

@westonruter
Member

Unfortunately, I found a serious bug here which prevents URL Metrics from being submitted: #1928. I think this needs to be reverted for the next release.

@westonruter westonruter added the skip changelog PRs that should not be mentioned in changelogs label Mar 16, 2025

Successfully merging this pull request may close these issues.

Explore compressing URL Metrics to work within the 64 KiB limit