When null is not enough: an option type for C#

source link: http://twistedoakstudios.com/blog/Post1130_when-null-is-not-enough-an-option-type-for-c

posted by Craig Gidney on December 4, 2012

In C# you use “null” to indicate a lack of value. This is, in a sense, both too permissive and too restrictive. I covered the “too permissive” aspect, the fact that you can’t ask for a reference that’s not potentially null, in a previous post. In this post I’ll be covering the “too restrictive” aspect, and implementing a reasonable solution in the form of an option type.

You can use the option type, right now, by referencing its NuGet package. You can also view the source code on GitHub.

Option types

The basic problem with null in C# is that it only allows for one level of indicating that a value is missing. This prevents you from telling why a value is missing in a nice way. For example, consider the method FirstOrDefault. When given an empty list of strings, FirstOrDefault indicates that there is no first value by returning null. But, when given a list starting with a null string, it also returns null. The distinction between these two cases is often important, but FirstOrDefault can’t make it for you (despite containing the necessary logic). We can refactor FirstOrDefault to distinguish between these two cases but… what type of thing should it return? The answer is: an option type.
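Here is a quick illustration of that ambiguity. It uses only the standard System.Linq FirstOrDefault; the class and method names are made up for this sketch.

```csharp
using System;
using System.Linq;

public static class FirstOrDefaultAmbiguity {
    // Returns what FirstOrDefault yields for an empty list (Item1)
    // and for a list whose first element is null (Item2).
    public static Tuple<string, string> Demo() {
        string fromEmpty = new string[0].FirstOrDefault();
        string fromNullFirst = new string[] { null, "b" }.FirstOrDefault();
        return Tuple.Create(fromEmpty, fromNullFirst);
    }
}
```

Both items of the returned tuple are null, so the caller has no way to tell the two situations apart.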

Option types are pretty popular in functional languages. The idea is to have a generic type, typically with a name like Option<T>, Maybe<T>, or Nullable<T> (I’ll be using the shorter May<T>), that can logically contain either a value of type T or no value. Whenever you need the ability to represent lack-of-value, you can return a May<T> instead of a T without worrying about ambiguities. Pretty much anything you can nest (pointers, Tuple<T>, List<T>) is usable as an option type, although it may be awkward to do so.

The ability to nest allows you to distinguish between multiple lack-of-value cases by choosing at what level the lack-of-value occurs. If you need two lack-of-value cases, you can return a May<May<T>>. One case will correspond to a May<May<T>> that does not contain a value of type May<T> and the other case will correspond to a May<May<T>> that contains a value of type May<T> that does not contain a value of type T.

I should point out that C# does have a built-in sorta-kinda option type: Nullable<T>. You use it with special syntax: append “?” to a type T. For example, the type “int?” may contain an int or may contain null. However, T? is only valid when T is a value type. You can’t use it on reference types and you can’t nest nullable inside of itself: neither “object?” nor “int??” is allowed. These restrictions are justified by the special rules for boxing nullable types, but of course cause problems in general. If we want a proper option type, we’ll have to write our own.

Relation to null

Before we start coding an option type, we need to make a choice: how do we interact with null? Null is the built-in way to represent a lack of value, so it’s important to interoperate with it in a useful way. The “right way” might seem obvious on the surface, but it’s actually very controversial. I’m aware of three common choices:

  1. Emulate null. Make your option type behave as much like a reference as possible. This is the approach taken by .NET’s Nullable<T>.
  2. Replace null. Don’t allow a null to be stored inside your option type. Encourage using your option type instead of ever using null. This is how the proposed Optional type for Java 8 works.
  3. Augment null. Treat null like any other value: a possibility for what can be stored inside the optional type, distinct from not storing a value. F# interacts with null values in this way.

Emulating null is ideal, except for the exceptions, but impractical to implement as a library. There’s also already the built-in Nullable type that takes this approach, and it is helped along by special rules that hide some of the differences between it and true reference types (like how boxing works, and the automatic ‘lifting’ of operators). Unfortunately, mere mortals like me don’t have access to the sort of black magic that makes ((object)(int?)1).GetType() return Int32. It would be great to have a proper option type that interoperated well with null and allowed nesting, but I just don’t think it can be done properly without help from the language.
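To make the boxing rules concrete, here is a small sketch of the two special cases for Nullable<T>. The class and method names are invented for illustration; the behavior shown is that of the built-in type.

```csharp
using System;

public static class NullableBoxingDemo {
    // Boxing an int? that holds a value boxes the underlying int,
    // so the runtime type is System.Int32 rather than Nullable<int>.
    public static Type BoxedTypeOfOne() {
        object boxed = (int?)1;
        return boxed.GetType();
    }

    // Boxing an int? that holds no value produces a true null reference.
    public static bool EmptyNullableBoxesToNull() {
        object boxed = (int?)null;
        return boxed == null;
    }
}
```

A library-defined struct gets neither behavior: boxing it always produces a non-null box of the struct type itself.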

Replacing nulls is tempting, but ultimately a bad idea. For example, consider refactoring the FirstOrDefault method I mentioned earlier into an OptionalFirst method that returns an OptionNotNull<T>. What will happen when you invoke this method on a list starting with a null string? Well, the method will attempt to construct an OptionNotNull containing a null, which is not allowed. You need to add logic to detect this case and replace it with a request for an OptionNotNull containing no value instead of one containing a value. Except, that’s the result that you were using to represent the empty list case! You need the ability to represent a second lack-of-value case, which could be done by either adding another layer of optional nesting or returning a null OptionNotNull (ummm…). Both of these options are bad: you’re either confusing people by half-replacing, half-using null or burdening people with the task of always applying the option type twice to be safe. I would strongly recommend not trying to replace nulls, because it leads you to fighting the language instead of using it.

Since emulating nulls is impractical and replacing nulls is a bad idea, I chose to implement an option type that augments nulls. The main downside of this approach is that users must be conscious of both nullability and the option type. However, given that nulls are already everywhere, that was going to be the case anyway.
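As a rough sketch of what “augmenting null” buys you, consider the FirstOrDefault example again. SimpleMay<T> below is a deliberately stripped-down stand-in written for this illustration, not the library’s May<T>, and FirstOrNoValue and Describe are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for an option type that augments null (assumption, not the library's code).
public struct SimpleMay<T> {
    private readonly bool _hasValue;
    private readonly T _value;

    // Null is accepted here like any other value.
    public SimpleMay(T value) { _hasValue = true; _value = value; }

    public TOut Match<TOut>(Func<T, TOut> valueProjection, Func<TOut> alternativeFunc) {
        return _hasValue ? valueProjection(_value) : alternativeFunc();
    }
}

public static class AugmentNullDemo {
    // Hypothetical refactoring of FirstOrDefault that keeps "empty" and "first is null" apart.
    public static SimpleMay<T> FirstOrNoValue<T>(IEnumerable<T> items) {
        foreach (T item in items) return new SimpleMay<T>(item);
        return default(SimpleMay<T>); // the default struct holds no value
    }

    public static string Describe(IEnumerable<string> items) {
        return FirstOrNoValue(items).Match(
            first => first == null ? "first item is null" : "first item is " + first,
            () => "list is empty");
    }
}
```

With this shape, an empty list and a list starting with null produce different results, without any second layer of nesting.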

Avoiding forced errors

Another issue I want to address, before talking about implementation, is the exceptions caused by trying to access a value that isn’t there. The fact that these occur is unfortunate, because it’s possible to avoid them by design (for most cases). A good example of avoiding this type of error by design is the usage of pattern matching in functional languages. You’ll never see a null reference exception caused by pattern matching, because it makes that mistake impossible. When using pattern matching, the compiler will tell you if you haven’t handled the lack-of-value case:

match potentialValue with
| None -> 0 // if you forget this line, the compiler complains
| Some v -> v * v

C# doesn’t have pattern matching, but that doesn’t mean we can’t push users into the pit of success. We can still ensure error cases must be introduced instead of avoided. The safe approach I settled on (I tried a couple) was to just emulate pattern matching: have a Match method that is told what to do in each case. In case you’re worried that supporting only matching is less expressive than exposing a Value property, here’s how the equivalent of Nullable<T>.Value is implemented via Match:

public static T ForceGetValue<T>(this May<T> potentialValue) {
    return potentialValue.Match(
        e => e, 
        () => { throw new InvalidOperationException("No Value"); });
}

There’s a similarly simple construction for ‘HasValue’ but, since HasValue is always well-defined and safe to use, it’s not worth the effort of creating an indirect re-incantation of it.

Implementation Details

The May<T> type is a struct, which is ideal for avoiding null reference exceptions when working with one that happens to contain no value. May<T>’s default value is an instance not containing a value, and you can get such an instance conveniently via May<T>.NoValue. For creating instances containing a value, there’s a simple one-argument constructor.

One unfortunate downside of using a value type instead of a reference type is that it requires us to sacrifice covariance (in C#, only interfaces and delegates may be covariant, and both are reference types). A May<string> is not a May<object>.
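The difference is easy to see side by side. In this sketch, Box<T> is a hypothetical generic struct standing in for May<T>, and the helper names are made up.

```csharp
using System.Collections.Generic;

// Hypothetical generic struct standing in for May<T>.
public struct Box<T> {
    public readonly T Value;
    public Box(T value) { Value = value; }
}

public static class VarianceDemo {
    // Allowed: IEnumerable<out T> is a covariant interface.
    public static IEnumerable<object> WidenSequence(IEnumerable<string> strings) {
        return strings;
    }

    // A struct cannot be covariant, so the value has to be re-wrapped by hand.
    public static Box<object> WidenBox(Box<string> box) {
        // return box; // would not compile: Box<string> is not a Box<object>
        return new Box<object>(box.Value);
    }
}
```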

To avoid requiring the repetition of type information, I included an extension method ‘Maybe’ and a static field ‘May.NoValue’. Maybe wraps whatever you give it into an instance of May<TypeOfGivenThing>. May.NoValue has a type that implicitly casts itself to a NoValue instance of any May<T> type. Both are especially useful when you’re trying to work with an anonymous type, since anonymous types don’t have utterable names.
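One way such a pair could be put together is sketched below. All of the Sketch* names are invented for this illustration, and the details (for example, exposing NoValue as a property) are assumptions rather than the library’s actual code.

```csharp
using System;

// Marker type whose only job is to convert into an empty SketchMay<T> of any T.
public struct SketchNoValue { }

public struct SketchMay<T> {
    private readonly bool _hasValue;
    private readonly T _value;

    public SketchMay(T value) { _hasValue = true; _value = value; }
    public bool HasValue { get { return _hasValue; } }

    // Lets "SketchMayFactory.NoValue" be assigned to any SketchMay<T>.
    public static implicit operator SketchMay<T>(SketchNoValue noValue) {
        return default(SketchMay<T>);
    }
    // Lets a raw value be used where a SketchMay<T> is expected.
    public static implicit operator SketchMay<T>(T value) {
        return new SketchMay<T>(value);
    }
}

public static class SketchMayFactory {
    public static SketchNoValue NoValue { get { return default(SketchNoValue); } }

    // T is inferred from the argument, so no type name has to be written out.
    public static SketchMay<T> Maybe<T>(this T value) {
        return new SketchMay<T>(value);
    }
}

public static class ImplicitConversionDemo {
    public static bool Demo() {
        SketchMay<int> noInt = SketchMayFactory.NoValue;
        SketchMay<string> noString = SketchMayFactory.NoValue;
        // Works even though the anonymous type has no name we could write down.
        var wrappedAnonymous = new { Name = "x" }.Maybe();
        return !noInt.HasValue && !noString.HasValue && wrappedAnonymous.HasValue;
    }
}
```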

The existence of ‘May.NoValue’ has consequences for how equality works: NoValue is considered equivalent across types. For example, May<int>.NoValue is equal to May<string>.NoValue, including details like having the same hash code (but note that they are different from a NoValue nested inside a potential value, like May<int>.NoValue.Maybe()). This is done because equality that breaks when an implicit conversion is omitted is confusing. May.NoValue has to be equal to May<T>.NoValue, or else subtle differences that affect whether or not an implicit cast occurs (like using Object.Equals instead of ==) would start to matter. Since equality has to be transitive, this further implies every May<T>.NoValue must be equal to every other. Internally, equality across the types is implemented by using a hidden IMayHaveValue interface implemented by both May<T> and MayNoValue, the hidden type of May.NoValue. Beware the unfortunate downside of this equality: the incorrect implication that ((string)null).Maybe() will be equal to ((object)null).Maybe() when it really won’t.

Internally, May<T> stores a value and a boolean flag to determine if the value is specified or not. Externally, the only way to get at that value is via the Match method (and equality comparisons, I guess). Match is actually very simple:

public TOut Match<TOut>(Func<T, TOut> valueProjection, Func<TOut> alternativeFunc) {
    if (valueProjection == null) throw new ArgumentNullException("valueProjection");
    if (alternativeFunc == null) throw new ArgumentNullException("alternativeFunc");
    return _hasValue ? valueProjection(_value) : alternativeFunc();
}

Note that the alternative value is given as a function, allowing its potentially expensive computation to be avoided.

Writing other utility methods in terms of _hasValue and _value is possible, but I wanted to keep the core May<T> type minimal and safe. As a result, all of the utility methods for working with May<T> are implemented as extension methods that ultimately delegate to the Match method. For example, the oh-so-useful Else method and the methods that make LINQ queries work are implemented like this:

public static T Else<T>(this May<T> potentialValue, Func<T> alternativeFunc) {
    if (alternativeFunc == null) throw new ArgumentNullException("alternativeFunc");
    return potentialValue.Match(e => e, alternativeFunc);
}
public static May<TOut> Bind<TIn, TOut>(this May<TIn> potentialValue, Func<TIn, May<TOut>> projection){
    if (projection == null) throw new ArgumentNullException("projection");
    return potentialValue.Match(projection, () => NoValue);
}
public static May<TOut> Select<TIn, TOut>(this May<TIn> value, Func<TIn, TOut> projection) {
    if (projection == null) throw new ArgumentNullException("projection");
    return value.Bind(e => projection(e).Maybe());
}
public static May<T> Where<T>(this May<T> value, Func<T, bool> filter) {
    if (filter == null) throw new ArgumentNullException("filter");
    return value.Bind(e => filter(e) ? e.Maybe() : NoValue);
}
public static May<TOut> SelectMany<TIn, TMid, TOut>(this May<TIn> source,
                                                    Func<TIn, May<TMid>> maySelector,
                                                    Func<TIn, TMid, TOut> resultSelector) {
    if (maySelector == null) throw new ArgumentNullException("maySelector");
    if (resultSelector == null) throw new ArgumentNullException("resultSelector");
    return source.Bind(s => maySelector(s).Select(m => resultSelector(s, m)));
}

These methods allow us to write query expressions that treat optional values as if they were lists that contained either 0 or 1 items:

var r = (from v1 in potentialInt1 // the query evaluates to NoValue if potentialInt1 is NoValue
         from v2 in potentialInt2 // the query also evaluates to NoValue if potentialInt2 is NoValue
         where v1 != v2 // the query ALSO evaluates to NoValue when the extracted values are equal
         select 1.0 / (v1 - v2) // the result is the inverse difference, assuming we made it this far
        ).Else(double.NegativeInfinity); // use a default value of -infinity if the query returned NoValue

Fun! There are other utility methods in the MayExtensions class, as well as ones related to other types, like IEnumerable<T>, in the MayUtilities class.
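If you’re curious why a query expression works on a type that isn’t a collection: the compiler just rewrites the query into calls to SelectMany, Where and Select, and uses whatever methods with those names it can find. The sketch below shows a query next to the method chain it roughly corresponds to. SimpleMay<T> and its extension methods are simplified stand-ins written for this example (redefined here so the block stands alone), not the library’s code.

```csharp
using System;

// Simplified stand-in for May<T> (assumption, not the library's code).
public struct SimpleMay<T> {
    private readonly bool _hasValue;
    private readonly T _value;
    public SimpleMay(T value) { _hasValue = true; _value = value; }
    public TOut Match<TOut>(Func<T, TOut> valueProjection, Func<TOut> alternativeFunc) {
        return _hasValue ? valueProjection(_value) : alternativeFunc();
    }
}

public static class SimpleMayQuery {
    public static SimpleMay<TOut> Select<TIn, TOut>(this SimpleMay<TIn> source, Func<TIn, TOut> projection) {
        return source.Match(e => new SimpleMay<TOut>(projection(e)), () => default(SimpleMay<TOut>));
    }
    public static SimpleMay<T> Where<T>(this SimpleMay<T> source, Func<T, bool> filter) {
        return source.Match(e => filter(e) ? new SimpleMay<T>(e) : default(SimpleMay<T>),
                            () => default(SimpleMay<T>));
    }
    public static SimpleMay<TOut> SelectMany<TIn, TMid, TOut>(this SimpleMay<TIn> source,
                                                              Func<TIn, SimpleMay<TMid>> maySelector,
                                                              Func<TIn, TMid, TOut> resultSelector) {
        return source.Match(s => maySelector(s).Select(m => resultSelector(s, m)),
                            () => default(SimpleMay<TOut>));
    }
    public static T Else<T>(this SimpleMay<T> source, T alternative) {
        return source.Match(e => e, () => alternative);
    }

    // The query expression form.
    public static double QueryForm(SimpleMay<int> a, SimpleMay<int> b) {
        return (from v1 in a
                from v2 in b
                where v1 != v2
                select 1.0 / (v1 - v2)
               ).Else(double.NegativeInfinity);
    }

    // Roughly what the compiler turns the query into.
    public static double MethodForm(SimpleMay<int> a, SimpleMay<int> b) {
        return a.SelectMany(v1 => b, (v1, v2) => new { v1, v2 })
                .Where(pair => pair.v1 != pair.v2)
                .Select(pair => 1.0 / (pair.v1 - pair.v2))
                .Else(double.NegativeInfinity);
    }
}
```

Both forms give 0.5 for the inputs 3 and 1, and negative infinity when either input is missing or the two values are equal.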

Usage

The simplest task you can do with the library is produce optional values (instances of May<T>). I’ve already mentioned May.NoValue and the Maybe extension method, but sometimes code is clearer than words:

//using Strilanc.Value

// you can get a type's lack-of-value by asking for it explicitly:
May<int> noInt = May<int>.NoValue;
// or by asking for a default value:
May<int> noIntAgain = default(May<int>);
// but the easiest way is May.NoValue, which doesn't require repeating the type:
May<int> noIntYetAgain = May.NoValue;
May<Dictionary<string, List<int>>> noComplicatedThing = May.NoValue;

// you can get a potential value by using May<T>'s constructor:
May<int> mayZeroAgain = new May<int>(0);
// or you can avoid repeating the type by using the Maybe extension method:
May<int> mayZero = 0.Maybe();
May<string> mayIPlease = "I Please".Maybe();
// and you can even get away with raw values, as long as there's an implicit cast:
May<bool> mayTrue = true;

With the ability to create optional values in hand, you can write useful methods that return them. The convention I’ve settled on for naming methods that return optional results is to use the prefix “May”. For example, one of the methods in the example project is a wrapper around int.TryParse called MayParseInt32:

///<summary>Returns the signed 32 bit integer represented by a string, if there is one.</summary>
public static May<int> MayParseInt32(this string text) {
    int result;
    if (!Int32.TryParse(text, out result)) return May.NoValue;
    return result;
}

The library’s MayUtilities class implements several methods that produce optional values, mostly related to reducing sequences to a single value. These variants all have “May” prefixed to their name, and you use them exactly like the standard enumerable methods. Some examples:

// 'MayFirst' returns the first value in a sequence, unless it's empty
May<int> noInt = new int[0]
    .MayFirst();
May<int> mayZero = new[] { 0, 1, 2, 3 }
    .MayFirst();

// 'MayLast' returns the last value in a sequence, unless it's empty
May<int> noIntAgain = new int[0]
    .MayLast();
May<int> mayThree = new[] { 0, 1, 2, 3 }
    .MayLast();

// 'MayAggregate' combines the values in a sequence together into a single value, unless it's empty
May<string> mayIPlease = new[] {"may", "I", "Please"}
    .MayAggregate((e1, e2) => e1 + e2);

// 'WhereHasValue' skips the lack-of-values in a sequence
IEnumerable<string> acd = new[] { "a".Maybe(), May.NoValue, "c".Maybe(), "d".Maybe() }
    .WhereHasValue();

// 'MayAll' directly enumerates the values in a sequence, unless some are missing
May<IEnumerable<string>> abcd = new[] { "a".Maybe(), "b".Maybe(), "c".Maybe(), "d".Maybe() }
    .MayAll();
May<IEnumerable<string>> noSequence = new[] { "a".Maybe(), May.NoValue, "c".Maybe(), "d".Maybe() }
    .MayAll();

The advantage of methods that return a May<T>, as opposed to a value and a boolean flag, is the manipulations you can do on the result. The MayExtensions file contains manipulation methods. Here are some examples of manipulating potential values:

// 'Select' transforms the potential value
May<double> mayPi = 1.Maybe()
    .Select(e => e * Math.PI);
May<double> noDouble = May<int>.NoValue
    .Select(e => e * Math.PI);
May<double> mayPiAgain = from e in 1.Maybe() select e * Math.PI;

// 'Else' gets the potential value or else uses an alternate value
int two = 2.Maybe().Else(5);
int five = May<int>.NoValue.Else(5);

// 'Match' either transforms the potential value or else uses an alternate
bool truth = "".Maybe()
    .Match(value => true, () => false);
bool falsehood = May<string>.NoValue
    .Match(value => true, () => false);

// 'ForceGetValue' gets the potential value or else throws an exception
int one = 1.Maybe().ForceGetValue();
int throwsException = May<int>.NoValue.ForceGetValue();

// 'IfHasValueThenDo' and 'ElseDo' are useful for performing actions that have no returned result
// you can also use the more imperative "if (x.HasValue) x.ForceGetValue()" instead
1.Maybe()
    .IfHasValueThenDo(value => System.Diagnostics.Debug.WriteLine("" + value))
    .ElseDo(() => System.Diagnostics.Debug.WriteLine("no val"));
May<int>.NoValue
    .IfHasValueThenDo(value => System.Diagnostics.Debug.WriteLine("" + value))
    .ElseDo(() => System.Diagnostics.Debug.WriteLine("no val"));

That’s all there is to it. The library provides methods to create potential values from scratch, to derive potential values from common operations like aggregation, and to safely and conveniently manipulate potential values once you have them. Go nuts!… just don’t try to fight the language. Some of the features of C#, like iterator methods and async methods, work better when paired with an imperative style of programming. Be aware of when it’s simpler to use ForceGetValue rather than the high level methods like Select and IfHasValueThenDo.

There are several examples of usage in the MainWindow.cs file in the example project.

Summary

Option types are a compelling augmentation over null, despite not being built into the language. They’re naturally unambiguous and can even minimize instances of lack-of-value exceptions.

If you’re interested, try out the option type described in this post by referencing the Strilanc.Value.May NuGet package (right-click project references -> Manage NuGet Packages…) or check out the source code on GitHub.
