Thursday, February 23, 2012

Static upcast in C#

I was rather surprised to realize only recently, after using C# for so many years, that it doesn't have a proper static upcast operator. By "static upcast operator" I mean a built-in language operator or a function that upcasts with a static (i.e. compile-time) check.

C# actually does implicit upcasting and most people probably don't even realize it. Consider this simple example:

Stream Fun() {
    return new MemoryStream();
}

Whereas in F# we have to do this upcast explicitly, or we get a compile-time error:

let Fun () : Stream = 
    upcast new MemoryStream()

The reason is that type inference is problematic in the face of subtyping [1].

Now how does this interact with parametric polymorphism (generics)?

C# 4.0 introduced variant interfaces, so we can write:

IEnumerable<IEnumerable<Stream>> Fun() {
    return new List<List<MemoryStream>>();
}

Note that covariance is not implicit upcasting: List<List<MemoryStream>> is not a subtype of IEnumerable<IEnumerable<Stream>>.

But this doesn't compile in C# 3.0, which lacks variant interfaces: when the supertypes are invariant we have to start converting. Even in C# 4.0, the above snippet does not compile if you target .NET 3.5, because there System.Collections.Generic.IEnumerable<T> isn't covariant in T. And even in C# 4.0 targeting .NET 4.0, this doesn't compile:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>();
} 

because ICollection<T> isn't covariant in T. It's not covariant for good reason: it contains mutators (i.e. methods that mutate the object implementing the interface), so making it covariant would make the type system unsound (actually, this already happens in C# and Java) [2][3].
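
To make the soundness problem concrete, here is a minimal sketch using array covariance, which is presumably the existing hole the parenthetical above refers to (that reading is an assumption). The class and method names are made up for illustration. Both assignments compile, and the mistake only surfaces at run time:

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // Hypothetical helper: returns true if writing an int through a
    // covariant view of a string[] is rejected at run time.
    public static bool WriteThroughCovariantArrayFails()
    {
        // Compiles: arrays of reference types are covariant in C#.
        object[] objects = new string[1];
        try
        {
            // Also compiles, because the static type is object[].
            objects[0] = 42;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            // The runtime checks the element type on every store.
            return true;
        }
    }
}
```

A covariant ICollection<T> would have the same problem through Add(), which is why only interfaces without such mutators can safely be marked covariant.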

A programmer new to C# might try the following to appease the compiler (ReSharper suggests this so it must be ok? UPDATE: I submitted this bug and ReSharper fixed it.):

ICollection<ICollection<Stream>> Fun() {
    return (ICollection<ICollection<Stream>>)new List<List<MemoryStream>>();
}

(attempt #1)

It compiles! But upon running the program, our C# learner is greeted with an InvalidCastException.

The second suggestion on ReSharper says "safely cast as...":

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>() as ICollection<ICollection<Stream>>;
}

(attempt #2)

And sure enough, it's safe since it doesn't throw, but all he gets is a null.

So our hypothetical developer googles a bit and learns about Enumerable.Cast<T>(), so he tries:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>()
        .Cast<ICollection<Stream>>().ToList();
}

(attempt #3)

Yay, no errors! Ok, let's add elements to this list:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
        .Cast<ICollection<Stream>>().ToList();
}

(attempt #4)

Oh my, InvalidCastException is back...

Determined to make this work, he learns a bit more about LINQ and gets this to compile:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
    .Select(x => (ICollection<Stream>)x).ToList();
}

(attempt #5)

But gets another InvalidCastException. He forgot to convert the inner list! He tries again:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
        .Select(x => (ICollection<Stream>)x.Select(y => (Stream)y).ToList()).ToList();
}

(attempt #6)

This (finally!) works as expected.

Experienced C# programmers are probably laughing now at these obvious mistakes, but there are two non-trivial lessons to learn here:

  1. Avoid applying Enumerable.Cast<T>() to IEnumerable<U> (for T,U != object). Indeed, Enumerable.Cast<T>() is a source of much confusion, even in cases unrelated to subtyping [4] [5] [6] [7] [8], and yet it is often recommended [9] [10] [11] [12] [13] [14] even though it's essentially not type-safe: Cast<T>() will happily try to cast any type into any other type without any compiler check.
    Other than bringing a non-generic IEnumerable into an IEnumerable<T>, I don't think there's any reason to use Cast<T>() on an IEnumerable<U>.
    The same argument can be applied to OfType<T>().
  2. It's easy to get casting wrong (not as easy as in C, but still), particularly when working with complex types (where the definition of 'complex' depends on each programmer) and when the compiler checks aren't strict enough (here's a scenario that justifies why C# allows seemingly 'wrong' casts as in attempt #5).
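
As an illustration of the first lesson, here is a small sketch (class and method names are hypothetical) showing that Cast<T>() between two unrelated types compiles cleanly and fails only when the sequence is enumerated, whereas the equivalent explicit cast inside Select() is rejected by the compiler:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // Hypothetical helper: returns the name of the exception thrown while
    // enumerating, or "none" if enumeration succeeds.
    public static string EnumerateIntsAsStrings()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Compiles although int and string are unrelated: Cast<T>() accepts
        // a non-generic IEnumerable, so the element type is not checked.
        IEnumerable<string> strings = numbers.Cast<string>();

        // By contrast, the compiler rejects this line (int cannot be
        // converted to string):
        // var bad = numbers.Select(n => (string)n);

        try
        {
            // Cast<T>() is lazy, so the failure appears on enumeration.
            strings.ToList();
            return "none";
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
    }
}
```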

Note how in attempt #6 the conversion involves three upcasts:

  • MemoryStream -> Stream (explicit through casting)
  • List<Stream> -> ICollection<Stream> (explicit through casting)
  • List<ICollection<Stream>> -> ICollection<ICollection<Stream>> (implicit)

What we could use here is a static upcast operator, a function that only does upcasts and no other kind of potentially unsafe casts, that doesn't let us screw things up no matter what types we feed it. It should catch any invalid upcast at compile-time. But as I said at the beginning of the post, this doesn't exist in C#. It's easily doable though:

static U Upcast<T, U>(this T o) where T : U {
    return o;
}

With this we can write:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
    .Select(x => x.Select(y => y.Upcast<MemoryStream, Stream>()).ToList().Upcast<List<Stream>, ICollection<Stream>>()).ToList();
}

You may object that this is awfully verbose. Maybe so, but you can't screw this up no matter what types you change. The verbosity stems from C#'s limited type inference: generic type arguments can't be partially inferred, so both T and U have to be spelled out. You may also want to lift this to operate on IEnumerables to make it a bit shorter, e.g.:

static IEnumerable<U> SelectUpcast<T, U>(this IEnumerable<T> o) where T : U {
    return o.Select(x => x.Upcast<T, U>());
}

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> {
        new List<MemoryStream> {
            new MemoryStream(),
        }
    }
    .Select(x => x.SelectUpcast<MemoryStream, Stream>().ToList().Upcast<List<Stream>, ICollection<Stream>>()).ToList();
}
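
Another way to cut the verbosity is to name only the target type. The sketch below is an assumption on my part, not necessarily what the gist linked in the comments does, and UpcastTo is a made-up name. It relies on the rule that an extension method's receiver may only undergo an identity, implicit reference, or boxing conversion, so the check is still done at compile time:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class UpcastExtensions
{
    // Hypothetical helper: the caller writes only the target type U.
    // The receiver must be implicitly convertible to U, otherwise the
    // call does not compile.
    public static U UpcastTo<U>(this U o)
    {
        return o;
    }
}

public static class UpcastToDemo
{
    public static ICollection<ICollection<Stream>> Fun()
    {
        return new List<List<MemoryStream>> {
            new List<MemoryStream> {
                new MemoryStream(),
            }
        }
        // MemoryStream -> Stream, then List<Stream> -> ICollection<Stream>.
        .Select(x => x.Select(y => y.UpcastTo<Stream>()).ToList().UpcastTo<ICollection<Stream>>())
        .ToList();

        // Neither of these would compile, since no implicit conversion exists:
        // new MemoryStream().UpcastTo<FileStream>();
        // ((Stream)new MemoryStream()).UpcastTo<MemoryStream>();
    }
}
```

One caveat: this version also accepts boxing and variance conversions (for example int to object), so it is slightly more permissive than the two-parameter Upcast<T, U>, although still checked statically.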

Alternatively, we could have used explicitly typed variables to avoid casts:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> {
        new List<MemoryStream> {
            new MemoryStream(),
        }
    }
    .Select(x => {
        ICollection<Stream> l = x.Select((Stream s) => s).ToList();
        return l;
    }).ToList();
}

I mentioned before that F# has a static upcast operator (actually two, one explicit/coercing and one inferencing operator). Here's what the same Fun() looks like in F#:

let Fun(): ICollection<ICollection<Stream>> = 
    List [ List [ new MemoryStream() ]]
    |> Seq.map (fun x -> List (Seq.map (fun s -> s :> Stream) x) :> ICollection<_>)
    |> Enumerable.ToList
    |> fun x -> upcast x

Now if you'll excuse me, I have to go replace a bunch of casts... ;-)

References

3 comments:

Joh. said...

I wonder, wouldn't it be possible to write Upcast so that callers aren't forced to specify the first type parameter? C# should be able to infer it.

Mauricio Scheffer said...

@Joh : turns out it is possible and it doesn't look that bad: https://gist.github.com/1895927

Matthew Rhoden said...

One of the few blogs I've come across that use the exact technical terms. I'm officially reading your blogs until I at least get on your level.