Prevent URL in `Link` header from including invalid characters by AhmarZaidi · Pull Request #1802 · WordPress/performance

AhmarZaidi · 2025-01-14T07:30:47Z

Summary

This PR addresses an issue where non-ASCII characters in URL filenames were emitted verbatim in the `Link` HTTP response header, breaking some reverse proxies and violating HTTP standards. The solution percent-encodes only the filename part of URLs, ensuring compliance with ISO-8859-1 character requirements. This keeps URLs intact while preventing problems with reverse proxies.

Example URL: https://testsite.com/wp-content/uploads/2025/01/חנות-scaled.avif
Corrected URL: https://testsite.com/wp-content/uploads/2025/01/%D7%97%D7%A0%D7%95%D7%AA-scaled.avif
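For reference, the corrected URL is just the UTF-8 bytes of the Hebrew filename written as %XX escapes. A minimal PHP sketch of filename-only encoding with rawurlencode(); the helper name encode_basename() is hypothetical and this is not the PR's code:

```php
<?php
/**
 * Hypothetical helper (not the plugin's code): percent-encode only the
 * last path segment of a URL, leaving everything before it untouched.
 */
function encode_basename( string $url ): string {
	$last_slash = strrpos( $url, '/' );
	if ( false === $last_slash ) {
		// No slash at all: treat the whole string as the filename.
		return rawurlencode( $url );
	}
	$prefix   = substr( $url, 0, $last_slash + 1 );
	$filename = substr( $url, $last_slash + 1 );
	// rawurlencode() escapes each UTF-8 byte of "חנות" as %XX but leaves
	// ASCII letters, digits, "-", "_", "." and "~" as they are.
	return $prefix . rawurlencode( $filename );
}

echo encode_basename( 'https://testsite.com/wp-content/uploads/2025/01/חנות-scaled.avif' ), "\n";
// https://testsite.com/wp-content/uploads/2025/01/%D7%97%D7%A0%D7%95%D7%AA-scaled.avif
```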

Fixes #1775

Relevant technical choices

Implemented a solution to address the issue where non-ASCII characters in URLs, such as Hebrew characters in filenames, were causing HTTP headers to break reverse proxies and violate HTTP standards.
Updated the URL encoding logic to use a regular expression that matches any character not allowed by RFC 3986, ensuring that all non-ASCII and disallowed ASCII characters are percent-encoded.
Used preg_replace_callback() with a static closure to encode matched characters with rawurlencode(), so that only characters outside the allowed set are encoded.
Added test cases covering international domain names, non-ASCII paths, and URLs with special characters.
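A minimal sketch of the approach described above, under stated assumptions: the allowed set is taken to be RFC 3986's unreserved and reserved characters, and a % is left alone only when it starts a valid %XX escape (the bare-percent handling is an assumption here). The function name encode_url_for_link_header() is hypothetical, and the plugin's actual regular expression may differ:

```php
<?php
/**
 * Hypothetical helper (not the plugin's code): percent-encode every byte
 * that may not appear literally in an RFC 3986 URI.
 */
function encode_url_for_link_header( string $url ): string {
	// Match either a "%" that does not start a %XX escape, or any byte outside
	// the unreserved (A-Z a-z 0-9 - . _ ~) and reserved (: / ? # [ ] @ ! $ & ' ( ) * + , ; =)
	// sets. Without the /u modifier, each byte of a multibyte character matches separately.
	$pattern = '/%(?![0-9A-Fa-f]{2})|[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/';

	$encoded = preg_replace_callback(
		$pattern,
		static fn ( array $matches ): string => rawurlencode( $matches[0] ),
		$url
	);

	// preg_replace_callback() returns null on failure; fall back to the input.
	return $encoded ?? $url;
}

echo encode_url_for_link_header( 'https://testsite.com/חנות/wp-content/uploads/2025/01/example-scaled.avif' ), "\n";
// https://testsite.com/%D7%97%D7%A0%D7%95%D7%AA/wp-content/uploads/2025/01/example-scaled.avif
```

Because valid %XX escapes are preserved, running the function a second time leaves its output unchanged.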

codecov · 2025-01-14T07:38:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.97%. Comparing base (e793289) to head (c3c2512).
Report is 12 commits behind head on trunk.

Additional details and impacted files

@@            Coverage Diff             @@
##            trunk    #1802      +/-   ##
==========================================
+ Coverage   65.92%   65.97%   +0.04%     
==========================================
  Files          88       88              
  Lines        6885     6895      +10     
==========================================
+ Hits         4539     4549      +10     
  Misses       2346     2346

Flag	Coverage Δ
multisite	`65.97% <100.00%> (+0.04%)`	⬆️
single	`38.17% <0.00%> (-0.06%)`	⬇️


westonruter

Thanks for the PR! I left a couple alternative suggestions.

Please add some test coverage for the lines not covered by tests.

plugins/optimization-detective/class-od-link-collection.php

westonruter · 2025-01-14T17:46:08Z

> The solution encodes only the filename part of URLs

What about when an internationalized domain name is used? Couldn't this also cause problems with encoding?

westonruter · 2025-01-21T04:32:15Z

Additionally, on multisite subdirectory installs, in theory the path before wp-content could also include non-ASCII chars:

https://testsite.com/חנות/wp-content/uploads/2025/01/example-scaled.avif
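To make both concerns concrete: in the examples below the filename is plain ASCII, so encoding only the filename changes nothing even though the URL as a whole still contains non-ASCII bytes. A small check using a hypothetical has_non_ascii() helper; the IDN host is an illustrative assumption:

```php
<?php
// Hypothetical helper: true if the string contains any byte outside 7-bit ASCII.
function has_non_ascii( string $s ): bool {
	return 1 === preg_match( '/[^\x00-\x7F]/', $s );
}

$urls = array(
	// Illustrative internationalized domain name (not punycode-encoded), ASCII filename.
	'https://חנות.com/wp-content/uploads/2025/01/example-scaled.avif',
	// Multisite subdirectory install with a non-ASCII site path.
	'https://testsite.com/חנות/wp-content/uploads/2025/01/example-scaled.avif',
);

foreach ( $urls as $url ) {
	$filename = substr( $url, strrpos( $url, '/' ) + 1 );
	printf(
		"filename non-ASCII: %s, URL non-ASCII: %s\n",
		has_non_ascii( $filename ) ? 'yes' : 'no',
		has_non_ascii( $url ) ? 'yes' : 'no'
	);
}
// Both lines print "filename non-ASCII: no, URL non-ASCII: yes".
```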


westonruter · 2025-01-29T21:50:56Z

@AhmarZaidi Hey, are you intending to pick this up again?

AhmarZaidi · 2025-01-30T06:33:02Z

@westonruter Yes, I'll be working on the changes right away.

AhmarZaidi · 2025-01-30T08:11:22Z

> Instead of parsing the URL and then re-constructing it, what if you just check if the href has any non-ASCII chars and then encode the entire URL

@westonruter I've implemented the changes: if the path contains any non-ASCII characters, the whole URL is encoded.

Alternatively, we could use preg_replace_callback to encode only the non-ASCII characters in the URL. Let me know if that approach would be better.

westonruter · 2025-01-30T18:15:25Z

What you did looks good. Could you add some test cases for international domain names and non-ASCII paths to ensure the expected output?

AhmarZaidi · 2025-01-31T13:51:24Z

@westonruter The old solution was also encoding the :// part, so I've updated the code slightly.
Current solution: encode the complete URL after the :// part.

Potential issue: this solution converts slashes (/) to %2F, so the file path becomes incorrect.
For example https://xn--fsq.com/תמונה.jpg will be encoded to https://xn--fsq.com%2F%D7%AA%D7%9E%D7%95%D7%A0%D7%94.jpg
Note: I've written the tests according to the above output.
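Both whole-URL variants can be reproduced with plain rawurlencode(), which leaves only letters, digits, and - _ . ~ unescaped. This is a reconstruction for illustration, not the code from the PR's commits:

```php
<?php
$url = 'https://xn--fsq.com/תמונה.jpg';

// Variant 1: encode the whole URL. Because rawurlencode() escapes ":" and "/",
// the "://" after the scheme is mangled too.
$whole = rawurlencode( $url );

// Variant 2: keep the scheme and "://" but encode everything after it.
// Every "/" in the rest of the URL still becomes %2F, so the path structure is lost.
$sep          = strpos( $url, '://' ) + 3;
$after_scheme = substr( $url, 0, $sep ) . rawurlencode( substr( $url, $sep ) );

echo $whole, "\n";        // https%3A%2F%2Fxn--fsq.com%2F%D7%AA%D7%9E%D7%95%D7%A0%D7%94.jpg
echo $after_scheme, "\n"; // https://xn--fsq.com%2F%D7%AA%D7%9E%D7%95%D7%A0%D7%94.jpg
```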

If we use something like:

$decoded_url = urldecode( $link['href'] );
$encoded_url = preg_replace_callback(
        '/[^\x00-\x7F]/',
        fn( $matches ) => rawurlencode( $matches[0] ),
        $decoded_url
);

Then https://xn--fsq.com/תמונה.jpg will be encoded to https://xn--fsq.com/%D7%AA%D7%9E%D7%95%D7%A0%D7%94.jpg

Let me know if we should make any further changes to the approach.
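Wrapping the proposed snippet in a function (the name encode_non_ascii_after_decode() is hypothetical) makes its edge cases easy to check. Because it calls urldecode() first, an already-encoded %25 turns into a bare %, and + turns into a space; neither is re-encoded, since the regex only matches non-ASCII bytes. A later commit in this PR adds a test case for a bare percent appearing in a URL, and this sketch shows one way such input could slip through:

```php
<?php
// The snippet proposed above, wrapped in a hypothetical function.
function encode_non_ascii_after_decode( string $href ): string {
	$decoded_url = urldecode( $href );
	// Percent-encode each byte outside 7-bit ASCII; ASCII bytes are left untouched.
	$encoded_url = preg_replace_callback(
		'/[^\x00-\x7F]/',
		static fn ( array $matches ): string => rawurlencode( $matches[0] ),
		$decoded_url
	);
	return $encoded_url ?? $decoded_url;
}

// Non-ASCII filename: encoded as expected.
echo encode_non_ascii_after_decode( 'https://xn--fsq.com/תמונה.jpg' ), "\n";
// An already-encoded "%25" is decoded to a bare "%", which is not re-encoded.
echo encode_non_ascii_after_decode( 'https://example.com/100%25-off.jpg' ), "\n";
// urldecode() also turns "+" into a space, which is not re-encoded either.
echo encode_non_ascii_after_decode( 'https://example.com/a+b.jpg' ), "\n";
```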

westonruter · 2025-01-31T16:28:41Z

> If we use something like:

That looks good!

AhmarZaidi · 2025-01-31T19:52:35Z

Implemented the changes.

westonruter · 2025-01-31T19:56:29Z

So this is no longer a draft and is ready for review, correct?

github-actions · 2025-01-31T20:03:29Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Unlinked Accounts

The following contributors have not linked their GitHub and WordPress.org accounts: @amitay-elementor.

Contributors, please read how to link your accounts to ensure your work is properly credited in WordPress releases.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Unlinked contributors: amitay-elementor.

Co-authored-by: AhmarZaidi <[email protected]>
Co-authored-by: westonruter <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

plugins/optimization-detective/class-od-link-collection.php

plugins/optimization-detective/tests/test-class-od-link-collection.php

westonruter · 2025-01-31T20:28:00Z

> Relevant technical choices

Could you update the description based on the latest revisions?

plugins/optimization-detective/class-od-link-collection.php


westonruter

Thank you!

@felixarntz Anything we missed here?


felixarntz

Thank you for the PR @AhmarZaidi, LGTM!

westonruter · 2025-02-12T01:15:02Z

We forgot about the URLs in the imagesrcset parameter. I've opened #1866 to fix that.

westonruter · 2025-03-05T01:22:37Z

Follow-up issue: #1906

Optimize URL encoding logic in get_response_header (700ec77)

AhmarZaidi mentioned this pull request Jan 14, 2025

Optimization detective can return non-ascii characters in the Link header, breaking some reverse proxies and HTTP standards. #1775 (closed)

AhmarZaidi changed the title ~~Fix: Optimize URL encoding logic in get_response_header~~ Fix: Optimization detective can return non-ascii characters in the Link header, breaking some reverse proxies and HTTP standards Jan 14, 2025

westonruter reviewed Jan 14, 2025

plugins/optimization-detective/class-od-link-collection.php (outdated)

plugins/optimization-detective/class-od-link-collection.php (outdated)

westonruter added this to the optimization-detective n.e.x.t milestone Jan 14, 2025

westonruter added the [Type] Bug and [Plugin] Optimization Detective labels Jan 14, 2025

westonruter modified the milestones: optimization-detective 1.0.0-beta1, optimization-detective n.e.x.t Jan 23, 2025

Merge branch 'trunk' into fix/optimization-detective-non-ascii-chars-in-link-header (cd4116d)

Update encoding logic to encode whole path (d1c5afd)

Add tests and update url encode scheme separately (505c5bb)

Encode only non ascii characters (50591ef)

AhmarZaidi marked this pull request as ready for review January 31, 2025 20:03

AhmarZaidi requested a review from felixarntz as a code owner January 31, 2025 20:03

westonruter reviewed Jan 31, 2025

plugins/optimization-detective/class-od-link-collection.php

westonruter requested changes Jan 31, 2025

AhmarZaidi and others added 3 commits February 3, 2025 12:30

Update non-ascii matching logic & tests (8389a40)

Add test case for a responsive preload link without an href (ace40cb)

Add failing test case for bare percent appearing in URL (b1ed4c1)

westonruter reviewed Feb 3, 2025

plugins/optimization-detective/class-od-link-collection.php (outdated)

westonruter changed the title ~~Fix: Optimization detective can return non-ascii characters in the Link header, breaking some reverse proxies and HTTP standards~~ Prevent URL in Link header from including invalid characters Feb 4, 2025

AhmarZaidi and others added 2 commits February 4, 2025 14:15

Remove percentage from regex (7e8867f)

Merge branch 'trunk' into fix/optimization-detective-non-ascii-chars-in-link-header (2d14c6e)

westonruter approved these changes Feb 4, 2025

Merge branch 'trunk' into fix/optimization-detective-non-ascii-chars-in-link-header (c3c2512)

felixarntz approved these changes Feb 5, 2025

westonruter merged commit 40e8c31 into WordPress:trunk Feb 5, 2025
16 checks passed

westonruter mentioned this pull request Feb 12, 2025

Add percent encoding of URLs in imagesrcset param of Link response header #1866 (merged)

westonruter mentioned this pull request Mar 4, 2025

Preload links for image URLs containing commas can be erroneously percent-encoded #1906 (closed)

3 participants
