Add lint to detect wrongly encoded diacritics due to UTF-8 mistaken for Latin-1 by defacto64 · Pull Request #931 · zmap/zlint

defacto64 · 2025-03-17T07:29:45Z

This lint addresses the error where a UTF8String field in a certificate (e.g. Subject:organizationName) contains, in place of a letter with a diacritic, the Latin-1 (or Windows-1252) characters corresponding to the individual bytes of that letter's UTF-8 encoding. For example, instead of the single letter "ü", whose UTF-8 encoding is 0xC3 0xBC, we find the two characters "Ã" and "¼" (which correspond to 0xC3 and 0xBC, respectively, in the Latin-1 character set).
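To make the mix-up concrete, here is a minimal Go sketch that reproduces it. The helper name `misreadAsLatin1` is made up for illustration and is not part of zlint. It treats every byte of a UTF-8 string as a separate Latin-1 character, which is what the faulty software effectively does.

```go
package main

import "fmt"

// misreadAsLatin1 (hypothetical helper) interprets each byte of s as one
// Latin-1 character and re-encodes the result as UTF-8. Latin-1 byte values
// coincide with Unicode code points U+0000..U+00FF, so a plain conversion
// from byte to rune is enough.
func misreadAsLatin1(s string) string {
	raw := []byte(s)
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	// "ü" is 0xC3 0xBC in UTF-8; read byte by byte it becomes "Ã" and "¼".
	fmt.Println(misreadAsLatin1("Müller")) // MÃ¼ller
}
```

Windows-1252 differs from Latin-1 only in the range 0x80..0x9F, so for this particular example both misreadings give the same result.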

This mix-up is likely caused by processing UTF-8 strings with obsolete and/or buggy software that mistakenly assumes them to be Latin-1 or Windows-1252 strings (or, at any rate, strings made up of 1-byte characters).

Numerous "ever-trusted" certificates affected by this error can be found on Censys, most quite old but some issued as late as December 2023.

Even though it's a problem that can hardly go unnoticed, I think it is useful to introduce a lint for it.

This lint is based on the fact that the two-character sequences resulting from the wrong encoding are highly unlikely in the real names of organizations, localities, persons, etc., so their occurrence in some field of the Subject is an almost certain signal that the mix-up has occurred.
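The heuristic can be sketched in Go roughly as follows. This is only an illustration under assumptions, not the code added by this PR: the function name `looksLikeMixup` is hypothetical, and the check is a generic one (any character in the range U+00C2..U+00DF followed by one in U+0080..U+00BF, i.e. a pair that would form a valid two-byte UTF-8 sequence if read as Latin-1 bytes), whereas the actual lint may use a different or narrower set of sequences.

```go
package main

import "fmt"

// looksLikeMixup (hypothetical name) reports whether s contains two adjacent
// characters that, taken as Latin-1 byte values, form a valid two-byte UTF-8
// sequence: a lead byte 0xC2..0xDF followed by a continuation byte 0x80..0xBF.
func looksLikeMixup(s string) bool {
	runes := []rune(s)
	for i := 0; i+1 < len(runes); i++ {
		lead, cont := runes[i], runes[i+1]
		if lead >= 0xC2 && lead <= 0xDF && cont >= 0x80 && cont <= 0xBF {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksLikeMixup("MÃ¼ller GmbH")) // true
	fmt.Println(looksLikeMixup("Müller GmbH"))  // false
}
```

A correctly encoded "ü" (U+00FC) falls outside the lead-byte range, so it is not flagged. This sketch covers only Latin-1 misreadings; a Windows-1252 misreading maps some bytes in 0x80..0x9F to other characters and would need extra cases.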

christopher-henderson

Naivety regarding encoding is perhaps right up there with off-by-one errors. The ASCII notion that 1 byte == 1 character is both elegant and a death trap.

defacto64 and others added 30 commits March 8, 2024 16:07

Add files via upload

0d4a7d5

Add files via upload

9ae1760

Add files via upload

c66f6f6

Add files via upload

3bd2334

Update lint_invalid_subject_rdn_order_test.go

95e89c8

Added //nolint:all to comment block to keep golangci-lint from complaining about duplicate words in the comment

Update lint_invalid_subject_rdn_order.go

7230486

Fixed import block

Merge branch 'master' into master

983a0df

Update v3/lints/cabf_br/lint_invalid_subject_rdn_order.go

36682ed

Fine to me. Co-authored-by: Christopher Henderson <[email protected]>

Update lint_invalid_subject_rdn_order.go

fc81ece

As per Chris Henderson's suggestion, to "improve readability".

Update lint_invalid_subject_rdn_order_test.go

9e54f08

As per Chris Henderson's suggestion.

Merge branch 'master' into master

e61235c

Update time.go

8ca486a

Added CABFEV_Sec9_2_8_Date

Add files via upload

1df8c9b

Add files via upload

ae29a40

Merge branch 'zmap:master' into master

9f657b2

Revised according to Chris's and Corey's suggestions

faa938d

Add files via upload

d2aa5b1

Add files via upload

b827d18

Merge branch 'zmap:master' into master

89e0ed1

Delete v3/lints/cabf_br/lint_e_invalid_cps_uri.go

e2f2f0e

Delete v3/lints/cabf_br/lint_e_invalid_cps_uri_test.go

126e1ac

Delete v3/testdata/invalid_cps_uri_ko_01.pem

a7fbe52

Delete v3/testdata/invalid_cps_uri_ko_02.pem

b289660

Delete v3/testdata/invalid_cps_uri_ko_03.pem

b5af6be

Delete v3/testdata/invalid_cps_uri_ok_01.pem

d9fea03

Delete v3/testdata/invalid_cps_uri_ok_02.pem

a324160

Delete v3/testdata/invalid_cps_uri_ok_03.pem

9ef6f60

Merge branch 'master' into master

949d3ca

Merge branch 'zmap:master' into master

c827e99

Merge branch 'zmap:master' into master

698d02a

defacto64 and others added 7 commits September 30, 2024 07:16

Merge branch 'zmap:master' into master

9a92f1a

Merge branch 'zmap:master' into master

be3dff5

Add files via upload

acd01d3

Add files via upload

bb3213b

Update config.json

0b8fc9e

Update lint_utf8_latin1_mixup.go

c1ca58c

Merge branch 'master' into utf8_latin1_mixup

baad3b8

christopher-henderson self-requested a review March 23, 2025 15:03

christopher-henderson approved these changes Mar 23, 2025

christopher-henderson merged commit 7a0479c into zmap:master Mar 23, 2025
5 checks passed

christopher-henderson merged 37 commits into zmap:master from defacto64:utf8_latin1_mixup
