Breaking change notification: Obsoletion of UTF-7 code paths within the framework #19274

@GrabYourPitchforks

Obsoletion of UTF-7 code paths within the framework

The UTF-7 encoding is no longer in wide use among applications, and many specs now forbid its use in interchange. It is also occasionally used as an attack vector in applications which don't anticipate encountering UTF-7 encoded data. Microsoft also warns against use of UTF7Encoding in application code.

Consistent with this guidance, .NET has obsoleted the Encoding.UTF7 property and UTF7Encoding constructors. Additionally, the Encoding.GetEncoding and Encoding.GetEncodings APIs no longer allow specifying UTF-7.

Version introduced

.NET 5.0 Preview 8

Old behavior

In .NET Core 2.x - 3.x, and in .NET Framework 2.x - 4.x, the Encoding.GetEncoding API could be used to create an instance of the UTF-7 encoding.

Encoding enc1 = Encoding.GetEncoding("utf-7"); // by name
Encoding enc2 = Encoding.GetEncoding(65000); // or by code page

Additionally, the Encoding.GetEncodings API could be used to enumerate the encodings registered on the system. One of the returned EncodingInfo instances represented the UTF-7 encoding.

foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
    Console.WriteLine(encInfo.Name);
}

// Possible output:
// utf-32
// utf-32BE
// us-ascii
// utf-7 <-- UTF-7 encoding
// utf-8

New behavior

Beginning with .NET 5, the Encoding.UTF7 property getter and the UTF7Encoding constructors are marked obsolete. Using them produces compile-time warning SYSLIB0001.

// line below produces warning SYSLIB0001
Encoding enc = Encoding.UTF7;

// line below produces warning SYSLIB0001
UTF7Encoding utf7 = new UTF7Encoding();

The UTF7Encoding type itself is not marked obsolete. This minimizes the warning count that callers receive when using the UTF7Encoding class.

UTF7Encoding enc = new UTF7Encoding(); // warning SYSLIB0001 on this line
byte[] bytes = enc.GetBytes("Hello world!"); // no warning on this line

Additionally, the Encoding.GetEncoding method will treat the encoding name "utf-7" and the code page 65000 as unknown, causing it to produce the same exception that it would have thrown when given an unsupported encoding name.

// throws ArgumentException, same as calling Encoding.GetEncoding("unknown")
Encoding enc = Encoding.GetEncoding("utf-7");

Finally, the Encoding.GetEncodings method returns an EncodingInfo[] array which does not include the UTF-7 encoding. This exclusion is to avoid the issue where GetEncodings produces an EncodingInfo that cannot be instantiated.

foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
    // line below would throw if GetEncodings included UTF-7
    Encoding enc = Encoding.GetEncoding(encInfo.Name);
}
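To see which behavior a given process actually has, one option is to check whether code page 65000 appears in the enumeration. This is a minimal sketch; the class name `Utf7Probe` and the method `IsUtf7Listed` are made up for illustration.

```csharp
using System;
using System.Text;

public static class Utf7Probe
{
    // Well-known UTF-7 code page, as used elsewhere in this notice.
    private const int Utf7CodePage = 65000;

    // Returns true if Encoding.GetEncodings() currently lists UTF-7.
    public static bool IsUtf7Listed()
    {
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            if (info.CodePage == Utf7CodePage)
            {
                return true;
            }
        }

        return false;
    }
}
```

Based on the behavior described above, this should return false on .NET 5 with default settings, and true on earlier runtimes or when UTF-7 support has been re-enabled.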

Reason for change

Many applications call Encoding.GetEncoding("encoding-name") with an encoding-name value provided by an untrusted source. For example, a web client or server might take the charset portion of the Content-Type header and pass the value directly to Encoding.GetEncoding without any validation. This could allow a malicious endpoint to specify Content-Type: ...; charset=utf-7, which could cause the receiving application to misbehave.

ASP.NET 4.5+, ASP.NET Core (all versions), and ASP.NET 5.0+ all reject request headers of the form Content-Type: ...; charset=utf-7.

Additionally, disabling UTF-7 code paths by default allows optimizing compilers (such as those used by Blazor) to remove these code paths entirely from the resulting application. This results in the compiled applications running more efficiently and taking less disk space.

Recommended action

Most developers do not need to make any change. For certain scenarios where applications may have previously activated UTF-7 related code paths, we provide guidance below.

If you're calling Encoding.GetEncoding with unknown encoding names provided by an untrusted source, we recommend that you instead compare the encoding names against a configurable allow list. The configurable allow list should at minimum include the industry-standard "utf-8". Depending on your clients and regulatory requirements you may also need to allow region-specific encodings like "GB18030".
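The allow-list approach might look roughly like the following sketch. The class name `CharsetAllowList`, the method `TryGetEncodingFromContentType`, and the hard-coded list contents are illustrative assumptions; a real service would load the list from configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http.Headers;
using System.Text;

public static class CharsetAllowList
{
    // Hypothetical allow list; in practice this would come from configuration.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "utf-8" };

    // Extracts the charset from a Content-Type header value and resolves it
    // only if it is on the allow list. Returns false for anything else.
    public static bool TryGetEncodingFromContentType(string contentType, out Encoding encoding)
    {
        encoding = null;

        if (!MediaTypeHeaderValue.TryParse(contentType, out MediaTypeHeaderValue parsed))
        {
            return false; // malformed or missing header
        }

        // The charset parameter may be quoted, so strip surrounding quotes.
        string charset = parsed.CharSet?.Trim('"');
        if (string.IsNullOrEmpty(charset) || !Allowed.Contains(charset))
        {
            return false; // no charset, or charset not allowed
        }

        try
        {
            encoding = Encoding.GetEncoding(charset);
            return true;
        }
        catch (ArgumentException)
        {
            // Allowed by configuration but not available on this system.
            return false;
        }
    }
}
```

Checking the allow list before calling Encoding.GetEncoding means an untrusted charset value never reaches the encoding lookup at all, regardless of which encodings or providers are registered in the process.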

If you do not implement an allow list, Encoding.GetEncoding will return any Encoding that is built into the system or that is registered via a custom EncodingProvider. You should audit your service's requirements to validate that this is the desired behavior. UTF-7 remains disabled by default unless your application re-enables it with the compat switch described below.

If you're using Encoding.UTF7 or UTF7Encoding within your own protocol or file format, we recommend that you switch to using Encoding.UTF8 or UTF8Encoding. UTF-8 is an industry standard and is widely supported across languages, operating systems, and runtimes. Using UTF-8 will ease future maintenance of your code and will make it more interoperable with the rest of the ecosystem.
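For data that already exists in a legacy encoding, a one-time conversion to UTF-8 could be sketched as follows. The helper name `LegacyTextMigration.ToUtf8` is hypothetical; the caller passes in the legacy Encoding instance, which for UTF-7 means constructing a UTF7Encoding and accepting the SYSLIB0001 warning at that call site.

```csharp
using System;
using System.Text;

public static class LegacyTextMigration
{
    // Re-encodes bytes from a legacy encoding into UTF-8 (no byte order mark is added).
    public static byte[] ToUtf8(byte[] legacyBytes, Encoding legacyEncoding)
    {
        if (legacyBytes == null) throw new ArgumentNullException(nameof(legacyBytes));
        if (legacyEncoding == null) throw new ArgumentNullException(nameof(legacyEncoding));

        return Encoding.Convert(legacyEncoding, Encoding.UTF8, legacyBytes);
    }
}
```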

If you're trying to compare an Encoding instance against Encoding.UTF7, consider instead performing a check against the well-known UTF-7 code page (65000). This has the dual advantage that it both avoids the warning and handles some edge cases (such as somebody having called new UTF7Encoding() or having subclassed the type).

void DoSomething(Encoding enc)
{
    // don't perform the check this way; it produces a warning and misses some edge cases
    if (enc == Encoding.UTF7)
    {
        // encoding is UTF-7
    }

    // instead, perform the check this way
    if (enc != null && enc.CodePage == 65000)
    {
        // encoding is UTF-7
    }
}

If you must use Encoding.UTF7 or UTF7Encoding, you can suppress the SYSLIB0001 warning in code or within your project's .csproj file.

In code:

#pragma warning disable SYSLIB0001 // disable the warning
Encoding enc = Encoding.UTF7;
#pragma warning restore SYSLIB0001 // re-enable the warning

In the project's .csproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net5.0</TargetFramework>
    <!-- NoWarn below will suppress SYSLIB0001 project-wide -->
    <NoWarn>$(NoWarn);SYSLIB0001</NoWarn>
  </PropertyGroup>
</Project>

Suppressing SYSLIB0001 disables only the Encoding.UTF7 and UTF7Encoding obsoletion warnings. It does not disable any other warnings, and it does not change the behavior of APIs like Encoding.GetEncoding.
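If UTF-7 is needed in several places, one possible middle ground between per-line pragmas and a project-wide NoWarn is to confine the suppression to a single factory method. This is only a sketch; the `LegacyUtf7` class and its `Create` method are hypothetical names.

```csharp
using System.Text;

internal static class LegacyUtf7
{
    // The only place in the codebase that touches the obsolete constructor.
    public static Encoding Create()
    {
#pragma warning disable SYSLIB0001 // UTF7Encoding constructor is obsolete
        return new UTF7Encoding();
#pragma warning restore SYSLIB0001
    }
}
```

Callers then use `LegacyUtf7.Create()` without triggering the warning themselves, and a text search for the helper shows every remaining UTF-7 dependency.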

If you must support Encoding.GetEncoding("utf-7", ...), you can re-enable support for this via a compat switch. This compat switch can be specified via the application's .csproj file or via a runtime configuration file, as demonstrated below.

In the application's .csproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net5.0</TargetFramework>
    <!-- Re-enable support for UTF-7 -->
    <EnableUnsafeUTF7Encoding>true</EnableUnsafeUTF7Encoding>
  </PropertyGroup>
</Project>

In the application's runtimeconfig.template.json file (see ".NET Core run-time configuration settings"):

{
  "configProperties": {
    "System.Text.Encoding.EnableUnsafeUTF7Encoding": true
  }
}

We recommend that applications which re-enable support for UTF-7 perform a security review of code which calls Encoding.GetEncoding.
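As an aid to such a review, an application could check at startup whether the switch is turned on. The sketch below assumes the runtime configuration property shown above is readable through AppContext under the same name; the class name `Utf7CompatSwitch` is hypothetical.

```csharp
using System;

public static class Utf7CompatSwitch
{
    // Name taken from the runtimeconfig example in this notice.
    private const string SwitchName = "System.Text.Encoding.EnableUnsafeUTF7Encoding";

    // Returns true only if the switch is present and set to true.
    public static bool IsEnabled()
    {
        return AppContext.TryGetSwitch(SwitchName, out bool enabled) && enabled;
    }
}
```

A service might log the result once at startup so that an unexpected re-enablement is visible in its logs.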

Category

  • Core .NET libraries
  • Security

Affected APIs

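The following APIs are now obsolete as warning but have no behavioral changes:

  • Encoding.UTF7 property
  • UTF7Encoding constructors

The following APIs have behavioral changes as described earlier:

  • Encoding.GetEncoding
  • Encoding.GetEncodings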

Issue metadata

  • Issue type: breaking-change

Labels

  • 🏁 Release: .NET 5: Work items for the .NET 5 release
  • breaking-change: Indicates a .NET Core breaking change