gh-81074: Allow non-ASCII addr_spec in email.headerregistry.Address by medmunds · Pull Request #122477 · python/cpython

medmunds · 2024-07-30T19:16:42Z

The email.headerregistry.Address constructor raised an error if addr_spec contained a non-ASCII character. (But it fully supports non-ASCII in the separate username and domain args.) This change removes the error for a non-ASCII addr_spec.

This PR implements @bitdancer's suggested fix from #81074 (comment):

… But with RFC6532 support, it should be valid to have a local part that has non-ascii in an Address, and the error, as I noted above, should be raised only at serialization time and when we don't have an original source string. So that raise should be modified to explicitly ignore the NonASCIILocalPartDefect.

(The other bugs discussed in that comment are reported separately as #83938 and #122476.)

Issue: email.headerregistry.Address blocks Unicode local part addr_spec accepted elsewhere #81074

Fixes gh-81074

The email.headerregistry.Address constructor raised an error if addr_spec contained a non-ASCII character. (But it fully supports non-ASCII in the separate username and domain args.) This change removes the error for a non-ASCII addr_spec.

ghost · 2024-07-30T19:16:44Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2024-07-30T19:16:47Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

The email.headerregistry.Address constructor raised an error if addr_spec contained a non-ASCII character. (But it fully supports non-ASCII in the separate username and domain args.) This change removes the error for a non-ASCII addr_spec.

encukou · 2025-03-19T13:58:09Z

@bitdancer, as the email expert, do you want to review this?

bitdancer · 2025-03-19T17:11:39Z

Yes, I'll add it to my list. It will take me quite some time to get through the backlog, unfortunately.

bedevere-app · 2025-05-18T22:17:52Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

…81074

When parsing email messages from Unicode strings (but not bytes), get_local_part() recorded a NonASCIILocalPartDefect for non-ASCII characters. RFC 5322 permits such addresses. This change: - removes the parse-time detection for a non-ASCII local-part (and a related test) - adds tests for passing a non-ASCII addr_spec to email.headerregistry.Address.__init__() - marks the (undocumented) email.errors.NonASCIILocalPartDefect as unused and deprecated This affected parsing email messages from Unicode strings (but not from bytes), and also prevented

medmunds · 2025-05-26T23:19:31Z

I have made the requested changes; please review again

bedevere-app · 2025-05-26T23:19:36Z

Thanks for making the requested changes!

@bitdancer: please review the changes made to this pull request.

github-actions · 2026-04-17T18:17:57Z

This PR is stale because it has been open for 30 days with no activity.

medmunds · 2026-04-17T18:55:25Z

@github-actions that doesn't seem like a good way to encourage contributions from volunteers.

bitdancer · 2026-04-18T19:58:24Z

It did remind me that I hadn't reviewed the PR, which I guess is the point? I've been ignoring a lot of these pings on issues I'm not ready to deal with yet, though, so you also have a point ;)

Now, unfortunately (or fortunately, depending on how you look at it) in the time since you made the change I have discovered, while doing work on the parser, that there actually is a place where the non-ascii local part defect is generated and is appropriate during non-utf8 processing: it happens when the local part is an encoded word that decodes to something containing non-ascii. That's not RFC compliant, of course, but that's why it's a defect ;)

I'm not quite sure where to go with this. On the one hand that's clearly a defect, on the other hand if we took a message like that and refolded it for sending via SMTPUTF8, it would be valid, just as it would be valid if it came in through an Address. So I guess I'm leaning toward still getting rid of the defect, and making sure there is a defect for an encoded word in the local part.

The latter is, unfortunately, not currently the case. I'm in the midst of re-writing the parser, so there are two ways we could go here: go ahead and fix the code now to add the encoded word defect, or just ignore the issue until I've rewritten the parser where I will add that fix. The problem with the latter is that currently the non-ascii defect causes the mailbox to get marked as invalid. The problem with the former is that with the current code it would require scanning the local-part after it is created to see if to contains any encoded words. That's not a huge overhead, so maybe it is worth it. On the third hand, currently an address like =?utf-8?q?foo?= @abc.com will not show as having any defects and will therefore be marked as a valid mailbox. Which is a bug ;) On the fourth hand, nothing is currently done by the rest of the code with mailboxes marked invalid, so it will only matter to people actually digging in to the parse tree, and there are probably very few of those. (There are other problems with the whole 'invalid mailbox' idea, but that's a whole separate set of issues I'm not ready to deal with.)

A third approach would be to keep the current code but replace the non-ascii defect with an encoded word defect, which would preserve the current invalid-mailbox behavior (both right and wrong) except that the defect type would change. That might be the safest option.

Thoughts? Opinions?

bitdancer · 2026-04-18T20:21:24Z

Ah. Another problem. My comment above said "now that we are doing the serialization check", but as far as I can tell we are not doing that. A non-ascii local part results in our writing an RFC invalid email with the local part rfc2047 encoded :( :(

Sigh. I'm not sure what to do with that, because if we fix it, we'll probably break peoples code. Allowing Address to create such addresses will make this worse, so should we add something to generator that will deprecate this behavior? If so we should probably open a new issue and not commit this PR until after the deprecation is in place.

medmunds · 2026-04-18T21:15:39Z

Ah. Another problem. My comment above said "now that we are doing the serialization check", but as far as I can tell we are not doing that. A non-ascii local part results in our writing an RFC invalid email with the local part rfc2047 encoded :( :(

I think PR #122540 was meant to address that?

Sigh. I'm not sure what to do with that, because if we fix it, we'll probably break peoples code. [...]

I'd argue code relying on that is already broken: it's generating undeliverable email. Raising an error instead seems like an improvement. (I arrived in these issues via Django, which had interpreted Python's support for rfc2047-encoded local-parts as spec compliance. We took some steps to correct that in recent releases.)

I'll need a bit to think on your other questions, but would the more detailed errors floated in #122540 (comment) help any?

bitdancer · 2026-04-18T21:27:31Z

Ah, OK, that makes more sense: I have just lost the context :( Sorry about the delay in dealing with these issues, I'll see if I can drive them through to completion. I'll have to re-load my brain with the full context first.

bitdancer · 2026-04-22T19:57:16Z

OK, so #122540 still looks good and I'll get it merged before the beta ;). For this one, I think we should convert the non-ascii defect into a regular InvalidHeaderDefect talking about the invalid encoded word. That will be a little more backward and forward compatible; I'll clean it up more fully in my parser rewrite.

bitdancer · 2026-04-22T20:04:05Z

And having written that I'm having second thoughts. The whole problem is the existence of a defect, and just changing the type would not change the existence of the defect. We'd have to go with the "look for an encoded word" pattern, which would only catch some cases. So I take it back, let's just go with what you have here.

medmunds · 2026-04-30T01:56:39Z

@bitdancer thanks for moving this forward. The context is all slowly coming back to me. Some thoughts…

In a post-EAI world, an addr-spec with non-ASCII characters is valid. So merely representing such an addr-spec in an Address object is not any sort of defect. Parsing one in a non-utf8 context would be a defect, and trying to flatten it with a non-utf8 policy should (and will) be an error.

So the real issue may be that Address(addr_spec=...) borrows parser.get_addr_spec(), and the parser assumes a non-utf8 policy when recording defects. I can't remember if you were thinking about giving the parser access to a policy?

[...] currently an address like =?utf-8?q?foo?= @abc.com will not show as having any defects and will therefore be marked as a valid mailbox. Which is a bug ;) [...]

The defect there seems to be invalid (or obsolete?) CFWS in an addr-spec, with a valid local-part =?utf-8?q?foo?= (since by definition rfc2047 doesn't apply in an addr-spec 😄). Or maybe something involving decoded display-name foo and an addr-spec with multiple defects (missing local-part, missing angle brackets). But not a NonAsciiLocalPartDefect.¹

Backing up a bit, I think the context for gh-81074 was trying to use email.headerregistry.Address as the modern API replacement for legacy email.utils.formataddr. If you have a name and an email address from a user, you want something like this to encode properly and be safe from address header injections:

def send_message(name, email):
    message = email.message.EmailMessage()
    message["To"] = email.headerregistry.Address(
        display_name=name, addr_spec=email
    )
    ...
    # SMTP.send_messages() automatically requires SMTPUTF8
    # and switches to a utf8=True policy if necessary.
    smtp.send_message(message)

The change in this PR would allow that use (either the current version of the PR or the original that kept NonAsciiLocalPartDefect). But it does seem like there are some other considerations around invalid headers, which might be better addressed in your parser rewrite.

Btw (and fwiw), when I looked a few years back, I wasn't able to find any MTAs or mail clients that generated or decoded rfc2047 in addr-specs. So I'm not sure there's practical value in trying to decode it there rather than taking the specs literally.

rfc2047 does often get improperly wrapped in a quoted-string (just to be extra safe with specials, I guess?). And attachment filenames seem at least as likely to be incorrectly encoded with rfc2047 as correctly with rfc2231. ↩

medmunds · 2026-04-30T02:11:11Z

Oh, also while you're looking at parsers and encoded words, you might be interested in PR #130749.

bitdancer · 2026-04-30T16:09:43Z

Yeah, my conclusion is the same: no defect on parsing, raise an error on serialization with utf8=False. Which is what the code in #122540 is going to do, not only for addr-spec but for all other no-encoded-words-allowed cases. Hopefully that won't break anyone's code ;)

As far as decoding them in add-spec, that's mostly a consequence of the parser's "don't raise errors during parsing" contract. I mean, we could leave them un-decoded, but the natural code path in the parser was to decode them, so I just went with it ;). Not decoding them would actually require an extra check, and while that's trivial, I'll probably preserve the behavior in the rewrite for backward compatibility, for whatever that's worth. I'm open to argument on it, though.

I think I'm handing #130749 correctly in the rewrite, but I'll double check. I'll add that issue to my list of things to deal with before finishing the rewrite...I'm thinking of fixing the bugs you and others (and I during the rewrite) have found first, so that the rewrite is changing less behavior when it lands. Which looks like it is going to be quite a while yet.

medmunds requested a review from a team as a code owner July 30, 2024 19:16

bedevere-app Bot mentioned this pull request Jul 30, 2024

email.headerregistry.Address blocks Unicode local part addr_spec accepted elsewhere #81074

Closed

bedevere-app Bot added the awaiting review label Jul 30, 2024

📜🤖 Added by blurb_it.

6b58716

medmunds mentioned this pull request Aug 1, 2024

gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses #122540

Merged

medmunds and others added 4 commits September 9, 2024 14:06

📜🤖 Added by blurb_it.

7a0243f

Merge branch 'main' into fix-issue-81074

2fcd186

Merge branch 'main' into fix-issue-81074

3024a2d

ZeroIntensity added the topic-email label Dec 6, 2024

34j mentioned this pull request Feb 22, 2025

feat(copier): add --yes to npx all-contributors-cli for ci and test initial_commit and add_me_as_contributor browniebroke/pypackage-template#903

Merged

6 tasks

bitdancer requested changes May 18, 2025

View reviewed changes

Comment thread Lib/email/headerregistry.py Outdated

bedevere-app Bot removed the awaiting review label May 18, 2025

bedevere-app Bot added the awaiting changes label May 18, 2025

medmunds added 3 commits May 26, 2025 14:48

Merge branch 'main' into fix-issue-81074

7a09aab

Merge remote-tracking branch 'origin/fix-issue-81074' into fix-issue-…

22dd204

…81074

bedevere-app Bot added awaiting change review and removed awaiting changes labels May 26, 2025

bedevere-app Bot requested a review from bitdancer May 26, 2025 23:19

github-actions Bot added the stale Stale PR or inactive for long period of time. label Apr 17, 2026

bitdancer mentioned this pull request Apr 18, 2026

gh-118716: Do not enforce ASCII to format address #118717

Closed

bitdancer approved these changes Apr 22, 2026

View reviewed changes

bedevere-app Bot added awaiting merge and removed awaiting change review labels Apr 22, 2026

bitdancer merged commit b605573 into python:main May 1, 2026
44 of 47 checks passed

bedevere-app Bot removed the awaiting merge label May 1, 2026

Uh oh!

Conversation

medmunds commented Jul 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Jul 30, 2024 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-app Bot commented Jul 30, 2024

Uh oh!

encukou commented Mar 19, 2025

Uh oh!

bitdancer commented Mar 19, 2025

Uh oh!

Uh oh!

bedevere-app Bot commented May 18, 2025

Uh oh!

medmunds commented May 26, 2025

Uh oh!

bedevere-app Bot commented May 26, 2025

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

medmunds commented Apr 17, 2026

Uh oh!

bitdancer commented Apr 18, 2026

Uh oh!

bitdancer commented Apr 18, 2026

Uh oh!

medmunds commented Apr 18, 2026

Uh oh!

bitdancer commented Apr 18, 2026

Uh oh!

bitdancer commented Apr 22, 2026

Uh oh!

bitdancer commented Apr 22, 2026

Uh oh!

medmunds commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

medmunds commented Apr 30, 2026

Uh oh!

bitdancer commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

medmunds commented Jul 30, 2024 •

edited

Loading

ghost commented Jul 30, 2024 •

edited by ghost

Loading

medmunds commented Apr 30, 2026 •

edited

Loading