Proposal for non-nullable references (and safe nullable references)

#1. Overview

This is my concept for non-nullable references (and safe nullable references) in C#. I have tried to keep my points brief and clear so I hope you will be interested in having a look through my proposal.

I will begin with an extract from the C# Design Meeting Notes for Jan 21, 2015 (https://github.com/dotnet/roslyn/issues/98):

_There's a long-standing request for non-nullable reference types, where the type system helps you ensure that a value can't be null, and therefore is safe to access. Importantly such a feature might go along well with proper safe nullable reference types, where you simply cannot access the members until you've checked for null._

This is my proposal for how this could be designed. The types of references in the language would be:
- General References (Dog) - the traditional references we have always had.
- Mandatory References (Dog!)
- Nullable References (Dog?)

Important points about this proposal:
1. There are no language syntax changes other than the addition of the '!' and '?' syntax when declaring (or casting) references.
2. Null reference exceptions are impossible if the new style references are used throughout the code.
3. **There are no changes to the actual code compilation, by which I mean we are only adding compiler checks - we are not changing anything about the way that the compiled code is generated. The compiled IL code will be identical whether traditional (general) references or the new types of references are used.**
4. It follows from this last point that the runtime will not need to know anything about the new types of references. Once the code is compiled, references are references.
5. All existing code will continue to compile, and the new types of references can interact reasonably easily with existing code.
6. The '!' and '?' can be added to existing code and, if that existing code is 'null safe' already, the code will probably just compile and work as it is. If there are compiler errors, these will indicate where the code is not 'null safe' (or possibly where the 'null safe-ness' of the code is expressed in a way that is too obscure). The compiler errors will be able to be fixed using the same 'plain old C#' constructs that we have always used to enforce 'null safe-ness'.
   Conversely, code will continue to behave identically if the '!' and '?' are removed (but the code will not be protected against any future code changes that are not 'null safe').
7. No doubt there are ideas in here that have been said by others, but I haven't seen this exact concept anywhere. However if I have reproduced someone else's concept it was not intentional! (Edit: I now realise that I have unintentionally stolen the core concept from Kotlin - see http://kotlinlang.org/docs/reference/null-safety.html).

The Design Meeting Notes cite a blog post by Eric Lippert (http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types/#.VM_yZmiUe2E) which points out some of the thorny issues that arise when considering non-nullable reference types. I respond to some of his points in this post.

Here is the Dog class that is used in the examples:

``` csharp
public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }

    public void Bark()
    {
    }
}
```
#2. Background

I will add a bit of context that will hopefully make the intention of the idea clearer.

I have thought about this topic on and off over the years and my thinking has been along the lines of this type of construct (with a new 'check' keyword):

``` csharp
Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

check (nullableDog)
{
    // This code branch is executed if the reference is non-null. The compiler will allow methods to be called and properties to be accessed.
    nullableDog.Bark(); // OK.
}
else
{
    nullableDog.Bark(); // Compiler Error - we know the reference is null in this context.
}
```

The 'check' keyword does two things:
1. It checks whether the reference is null and then switches the control flow just like an 'if' statement.
2. It signals to the compiler to apply certain rules within the code blocks that follow it (most importantly, rules about whether or not nullable references can be dereferenced).

It then occurred to me that since it is easy to achieve the first objective using the existing C# language, why invent a new syntax and/or keyword just for the sake of the second objective? We can achieve the second objective by teaching the compiler to apply its rules wherever it detects this common construct:

``` csharp
if (nullableDog != null)
```

Furthermore it occurred to me that we could extend the idea by teaching the compiler to detect other simple ways of doing null checks that already exist in the language, such as the ternary (?:) operator.

This line of thinking is developed in the explanation below.
#3. Mandatory References

As the name suggests, mandatory references can never be null:

``` csharp
Dog! mandatoryDog = null; // Compiler Error.
```

However the good thing about mandatory references is that the compiler lets us dereference them (i.e. use their methods and properties) any time we want, because it knows at compile time that a null reference exception is impossible:

``` csharp
Dog! mandatoryDog = new Dog("Mandatory");
mandatoryDog.Bark(); // OK - can call method on mandatory reference.
string name = mandatoryDog.Name; // OK - can access property on mandatory reference.
```

(See my additional post for more details.)
#4. Nullable References

As the name suggests, nullable references can be null:

``` csharp
Dog? nullableDog = null; // OK.
```

However the compiler will not allow us (except in circumstances described later) to dereference nullable references, as it can't guarantee that the reference won't be null at runtime:

``` csharp
Dog? nullableDog = new Dog("Nullable");
nullableDog.Bark(); // Compiler Error - cannot call method on nullable reference.
string name = nullableDog.Name; // Compiler Error - cannot access property on nullable reference.
```

This may make nullable references sound pretty useless, but there are further details to follow.
#5. General References

General references are the references that C# has always had. Nothing is changed about them.

``` csharp
Dog generalDog1 = null; // OK.
Dog generalDog2 = new Dog("General"); // OK.

generalDog2.Bark(); // OK at compile time, fingers crossed at runtime.
```
#6. Using Nullable References

So if you can't call methods or access properties on a nullable reference, what's the use of them?

Well, if you do the appropriate null reference check (I mean just an ordinary null reference check using traditional C# syntax), the compiler will detect that the reference can be safely used, and the nullable reference will then behave (within the scope of the check) as if it were a mandatory reference.

In the example below the compiler detects the null check and this affects the way that the nullable reference can be used within the 'if' block and 'else' block:

``` csharp
Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

if (nullableDog != null)
{
    // The compiler knows that the reference cannot be null within this scope.
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    // The compiler knows that the reference is null within this scope.
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
```

The compiler will also recognise this sort of null check:

``` csharp
if (nullableDog == null)
{
    return;
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.
```

And this:

``` csharp
if (nullableDog == null)
{
    throw new Exception("Where is my dog?");
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.
```

The compiler will also recognise when you do the null check using other language features:

``` csharp
string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK
```
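
For readers who want to try these patterns in a compiler today, here is a minimal sketch using the nullable reference types feature that shipped in C# 8.0. It is not the design proposed here: with `#nullable enable`, an unannotated `Dog` plays roughly the role of `Dog!`, and violations are reported as warnings by default rather than errors. The class `FlowAnalysisDemo` and its method names are hypothetical, chosen only for illustration.

```csharp
#nullable enable
using System;

public class Dog
{
    public string Name { get; }

    public Dog(string name)
    {
        Name = name;
    }
}

// Hypothetical demo class: one method per null check pattern listed above.
public static class FlowAnalysisDemo
{
    // Pattern: 'if (x != null)' with an else branch.
    public static string Describe(Dog? dog)
    {
        if (dog != null)
        {
            return dog.Name; // dog is treated as non-null in this branch.
        }
        else
        {
            return "(none)";
        }
    }

    // Pattern: early return when the reference is null.
    public static int NameLength(Dog? dog)
    {
        if (dog == null)
        {
            return 0;
        }

        return dog.Name.Length; // Only reachable when dog is non-null.
    }

    // Pattern: throw when the reference is null.
    public static string RequireName(Dog? dog)
    {
        if (dog == null)
        {
            throw new ArgumentNullException(nameof(dog));
        }

        return dog.Name; // Only reachable when dog is non-null.
    }

    // Patterns: the ternary operator and the null-conditional operator.
    public static string? NameOrNull(Dog? dog)
    {
        string? viaTernary = dog != null ? dog.Name : null;
        string? viaNullConditional = dog?.Name;
        return viaTernary ?? viaNullConditional;
    }
}
```

Under these assumptions each method should compile without nullability warnings, because every dereference of `dog` sits behind one of the recognised checks.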

Hopefully it is now clear that if the new style references are used throughout the code, null reference exceptions are actually impossible. However once the effort has been made to convert the code to the new style references, it is important to guard against the accidental use of general references, as this compromises null safety. There needs to be an attribute such as this to tell the compiler to prevent the use of general references:

``` csharp
[assembly: AllowGeneralReferences(false)] // Defaults to true
```

This attribute could also be applied at the class level, so you could for example forbid general references for the assembly but then allow them for a class (if the class has not yet been converted to use the new style references):

``` csharp
[AllowGeneralReferences(true)]
public class MyClass
{
}
```
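
As a rough sketch of what such an attribute might look like, the declaration below allows it on assemblies and classes, and a small helper shows the intended precedence (a class-level setting overrides the assembly-level one, and the default is true). Only the attribute name comes from this proposal; the `Allow` property, the `GeneralReferencePolicy` helper and the two sample classes are assumptions. In the real feature the compiler would read this information from metadata; reflection is used here only to make the rule executable.

```csharp
using System;

// Hypothetical declaration: the name is from the proposal, the shape is assumed.
[AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class, AllowMultiple = false)]
public sealed class AllowGeneralReferencesAttribute : Attribute
{
    public bool Allow { get; }

    public AllowGeneralReferencesAttribute(bool allow)
    {
        Allow = allow;
    }
}

// Hypothetical helper that mimics the lookup the compiler would perform.
public static class GeneralReferencePolicy
{
    public static bool IsAllowedFor(Type type)
    {
        // A class-level attribute takes precedence.
        var onClass = Attribute.GetCustomAttribute(type, typeof(AllowGeneralReferencesAttribute))
            as AllowGeneralReferencesAttribute;
        if (onClass != null)
        {
            return onClass.Allow;
        }

        // Otherwise fall back to the assembly-level attribute, defaulting to true.
        var onAssembly = Attribute.GetCustomAttribute(type.Assembly, typeof(AllowGeneralReferencesAttribute))
            as AllowGeneralReferencesAttribute;
        return onAssembly == null || onAssembly.Allow;
    }
}

[AllowGeneralReferences(false)]
public class ConvertedClass
{
}

public class UnmarkedClass
{
}
```

With no assembly-level attribute present, `GeneralReferencePolicy.IsAllowedFor(typeof(ConvertedClass))` returns false and `GeneralReferencePolicy.IsAllowedFor(typeof(UnmarkedClass))` returns true.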

(See my additional post for more details.)
#7. Can we develop a reasonable list of null check patterns that the compiler can recognise?

I have not listed every possible way that a developer could do a null check; there are any number of complex and obscure ways of doing it. The compiler can't be expected to handle cases like this: 

``` csharp
if (MyMethodForCheckingNonNull(nullableDog))
{
}
```

However the fact that the compiler will not handle every case is a feature, not a bug. We don't _want_ the compiler to detect every obscure type of null check construct. We want it to detect a finite list of null checking patterns that reflect clear coding practices and appropriate use of the C# language. If the programmer steps outside this list, it will be very clear to them because the compiler will not let them dereference their nullable references, and the compiler will in effect be telling them to express their intention more simply and clearly in their code.
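
A minimal sketch of why a helper-based check is opaque at the point of use: the hypothetical helper below has a bug and reports every reference as non-null, so code that trusts it fails at runtime, whereas the plain `!= null` check is visibly correct where it is written. The helper body and the `NullCheckDemo` class are assumptions for illustration, and the code uses only general references so that it compiles with any C# compiler.

```csharp
using System;

public class Dog
{
    public string Name { get; }

    public Dog(string name)
    {
        Name = name;
    }
}

public static class NullCheckDemo
{
    // Hypothetical helper with a deliberate bug: it reports "non-null" for every input.
    public static bool MyMethodForCheckingNonNull(Dog dog)
    {
        return true;
    }

    // Trusts the helper, so it dereferences a null reference when dog is null.
    public static string NameViaHelper(Dog dog)
    {
        if (MyMethodForCheckingNonNull(dog))
        {
            return dog.Name;
        }

        return "(no dog)";
    }

    // Uses the plain construct that the compiler could recognise.
    public static string NameViaPlainCheck(Dog dog)
    {
        if (dog != null)
        {
            return dog.Name;
        }

        return "(no dog)";
    }
}
```

Calling `NullCheckDemo.NameViaHelper(null)` throws a `NullReferenceException`, while `NullCheckDemo.NameViaPlainCheck(null)` returns `"(no dog)"`.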

So is it possible to develop a reasonable list of null checking constructs that the compiler can enforce? Characteristics of such a list would be:
1. It must be possible for compiler writers to implement.
2. It must be intuitive, i.e. a reasonable programmer should never have to even think about the list, because any sensible code will 'just work'.
3. It must not seem arbitrary, i.e. there must not be situations where a certain null check construct is detected and another that seems just as reasonable is not detected.

I think the list of null check patterns in the previous section, combined with some variations that I am going to put in a more advanced post, is an appropriate and intuitive list. But I am interested to hear what others have to say.  

Am I expecting compiler writers to perform impossible magic here? I hope not - I think that the patterns here are reasonably clear, and the logic is hopefully of the same order of difficulty as the logic in existing compiler warnings and in code checking tools such as ReSharper.
#8. Converting Between Mandatory, Nullable and General References

The principles presented so far lead on to rules about conversions between the three types of references. You don't have to take in every detail of this section to get the general idea of what I'm saying - just skim over it if you want.

Let's define some references to use in the examples that follow.

``` csharp
Dog! myMandatoryDog = new Dog("Mandatory");
Dog? myNullableDog = new Dog("Nullable");
Dog myGeneralDog = new Dog("General");
```

Firstly, any reference can be assigned to another reference if it is the same type of reference:

``` csharp
Dog! yourMandatoryDog = myMandatoryDog; // OK.
Dog? yourNullableDog = myNullableDog; // OK.
Dog yourGeneralDog = myGeneralDog; // OK.
```

Here are all the other possible conversions. Note that when I talk about 'intent' I mean the idea that a traditional (general) reference is **conceptually** either mandatory or nullable at any given point in the code. This intent is explicit and self-documenting in the new style references, but it still exists implicitly in general references (e.g. "I know this reference can't be null because I wrote a null check", or "I know that this reference can't or at least shouldn't be null from my knowledge of the business domain").

``` csharp
Dog! mandatoryDog1 = myNullableDog; // Compiler Error - the nullable reference may be null.
Dog! mandatoryDog2 = myGeneralDog; // Compiler Error - the general reference may be null.
Dog? nullableDog1 = myMandatoryDog; // OK.
Dog? nullableDog2 = myGeneralDog; // Compiler Error - makes an assumption about the intent of the general reference (maybe it is conceptually mandatory, rather than conceptually nullable as assumed here).
Dog generalDog1 = myMandatoryDog; // Compiler Error - loses information about the intent of the mandatory reference (the general reference may be conceptually mandatory, or may be conceptually nullable if the intent is that it could later be made null).
Dog generalDog2 = myNullableDog; // Compiler Error - loses the safety of the nullable reference.
```

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning). 

``` csharp
Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning).
```

Some of the conversions that were not possible by direct assignment can be achieved slightly less directly using existing language features:

``` csharp
Dog! mandatoryDog1 = myNullableDog ?? new Dog("Mandatory"); // OK.
Dog! mandatoryDog2 = (myNullableDog != null ? myNullableDog : new Dog("Mandatory")); // OK.

Dog! mandatoryDog3 = (Dog!)myGeneralDog ?? new Dog("Mandatory"); // OK, but requires a cast to indicate that we are making an assumption about the intent of the general reference.
Dog! mandatoryDog4 = (myGeneralDog != null ? (Dog!)myGeneralDog : new Dog("Mandatory")); // OK, but requires a cast for the same reason as above.
```
#9. Class Libraries

As mentioned previously, the compiled IL code will be the same whether you use the new style references or not. If you compile an assembly, the resulting binary will not know what type of references were used in its source code.

This is fine for executables, but in the case of a class library, where the goal is obviously re-use, the compiler will need a way of knowing the types of references used in the public method and public property signatures of the library.

I don't know much about the internal structure of DLLs, but maybe there could be some metadata embedded in the class library which provides this information.

Or even better, maybe reflection could be used - an enum property indicating the type of reference could be added to the ParameterInfo class. Note that the reflection would be used by the _compiler_ to get the information it needs to do its checks - there would be no reflection imposed at runtime. At runtime everything would be exactly the same as if traditional (general) references were used.
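
One way this could be sketched with features that already exist is a custom attribute stored in the library's metadata and read back through reflection. This differs from the idea above of adding a property to `ParameterInfo` itself, and everything named here (`ReferenceKind`, `ReferenceKindAttribute`, `ReferenceKindReader`, `Kennel`) is hypothetical. The attributes stand in for what the compiler would emit when it sees `Dog!` or `Dog?` in a public signature.

```csharp
using System;
using System.Reflection;

// Hypothetical enum describing the three kinds of reference in this proposal.
public enum ReferenceKind
{
    General,
    Mandatory,
    Nullable
}

// Hypothetical attribute that a compiler could emit into library metadata.
[AttributeUsage(AttributeTargets.Parameter | AttributeTargets.ReturnValue | AttributeTargets.Property)]
public sealed class ReferenceKindAttribute : Attribute
{
    public ReferenceKind Kind { get; }

    public ReferenceKindAttribute(ReferenceKind kind)
    {
        Kind = kind;
    }
}

public class Dog
{
}

public class Kennel
{
    // Stands in for: public void Admit(Dog! dog, Dog? companion, Dog legacy)
    public void Admit(
        [ReferenceKind(ReferenceKind.Mandatory)] Dog dog,
        [ReferenceKind(ReferenceKind.Nullable)] Dog companion,
        Dog legacy)
    {
    }
}

// Hypothetical reader: what a consuming compiler would do at compile time.
public static class ReferenceKindReader
{
    public static ReferenceKind Of(ParameterInfo parameter)
    {
        var attribute = parameter.GetCustomAttribute<ReferenceKindAttribute>();

        // No attribute means the parameter is a traditional (general) reference.
        return attribute == null ? ReferenceKind.General : attribute.Kind;
    }
}
```

Applying `ReferenceKindReader.Of` to the three parameters of `typeof(Kennel).GetMethod("Admit")` yields `Mandatory`, `Nullable` and `General` in that order. The method body and the calling convention are unaffected by the attributes.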

Now say we have an assembly that has not yet been converted to use the new style references, but which needs to use a library that does use the new style references. There needs to be a way of turning off the mechanism described above so that the library appears as a traditional library with only general references. This could be achieved with an attribute like this:

``` csharp
[assembly: IgnoreNewStyleReferences("SomeThirdPartyLibrary")]
```

Perhaps this attribute could also be applied at a class level. The class could remain completely unchanged except for the addition of the attribute, but still be able to make use of a library which uses the new style references.

(See my additional post for more details.)
#10. Constructors

Eric Lippert's post (see reference in the introduction to this post) also raises thorny issues about constructors. Eric points out that "the type system absolutely guarantees that ...[class] fields always contain a valid string reference or null".

A simple (but compromised) way of addressing this may be for mandatory references to behave like nullable references within the scope of a constructor. It is the programmer's responsibility to ensure safety within the constructor, as has always been the case. This is a significant compromise but may be worth it if the thorny constructor issues would otherwise kill off the idea of the new style references altogether.

It could be argued that there is a similar compromise for readonly fields which can be set multiple times in a constructor.

A better option would be to prevent _any_ access to the mandatory field (and to the 'this' reference, which can be used to access it) until the field is initialised:

``` csharp
public class Car
{
    public Engine! Engine { get; private set; }

    public Car(Engine! engine)
    {
        Engine.Start(); // Compiler Error
        CarInitializer.Initialize(this); // Compiler Error - the 'this' reference could be used to access Engine methods and properties
        Engine = engine;
        // Can now use Engine and 'this' at will
    }
}
```

Note that it is not an issue if this forces adjustment of existing code - the programmer has chosen to introduce the new style references and thus will inevitably be adjusting the code in various ways as described earlier in this post.

And what if the programmer initializes the property in some way that still makes everything safe but is a bit more obscure and thus more difficult for the compiler to recognise? Well, the general philosophy of this entire proposal is that the compiler recognises a finite list of sensible constructs, and if you step outside of these you will get a compiler error and you will have to make your code simpler and clearer.
#11. Generics

Using mandatory and nullable references in generics seems to be generally ok if we are prepared to have a class constraint on the generic class:

``` csharp
class GenericClass<T>
    where T : class // Need class constraint to use mandatory and nullable references
{
    public void TestMethod(T? nullableRef)
    {
        T! mandatoryRef = null; // Compiler Error - mandatory reference cannot be null
        string s = nullableRef.ToString(); // Compiler Error - cannot dereference nullable reference
    }
}
```

However there is more to think about with generics - see comments below.
#12. Var

This is the way that I think var would work:

``` csharp
var dog1 = new Dog("Sam"); // var is Dog! (the compiler will keep things as 'tight' as possible unless we tell it otherwise).
var! dog2 = new Dog("Sam"); // var is Dog!
var? dog3 = new Dog("Sam"); // var is Dog?
var dog4 = (Dog)new Dog("Sam"); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningMandatoryRef(); // var is Dog!
var! dog2 = MethodReturningMandatoryRef(); // var is Dog!
var? dog3 = MethodReturningMandatoryRef(); // var is Dog? (see conversion rules)
var dog4 = (Dog)MethodReturningMandatoryRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningNullableRef(); // var is Dog?
var! dog2 = MethodReturningNullableRef(); // Compiler Error (see conversion rules)
var? dog3 = MethodReturningNullableRef(); // var is Dog?
var dog4 = (Dog)MethodReturningNullableRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningGeneralRef(); // var is Dog
var! dog2 = MethodReturningGeneralRef(); // Compiler Error (see conversion rules)
var? dog3 = (Dog?)MethodReturningGeneralRef(); // var is Dog? (see conversion rules - needs cast)
```

The first case in each group would be clearer if we had a suffix to indicate a general reference (say #), rather than having no suffix due to the need for backwards compatibility. This would make it clear that 'var#' would be a general reference whereas 'var' can be mandatory, nullable or general depending on the context.
#13. More Cases

In the process of thinking through this idea as thoroughly as possible, I have come up with some other cases that are mostly variations on what is presented above, and which would just have cluttered up this post if I had put them all in. I'll put these in a separate post in case anyone is keen enough to read them.

