Type Safety, Variance, and Generics

The goal of a type-safe language is to make sure you don't shoot yourself in the foot. One of the cornerstones of modeling is the ability to perform substitutions. One form of substitution is polymorphism: the ability of different objects to accept the same messages. In a type-safe language, the compiler allows you to substitute one object type for another if both types can accept the same messages in some formal way (the important part being "in some formal way" that the compiler can verify). Many type-safe languages formalize this through defined interfaces. An interface specifies a contract consisting of the members that are available, so if two classes implement the same interface, the interface's members are guaranteed to be available on both classes. The same principle applies to a class's "natural" or "default" interface: if one class derives from a base class, it is polymorphic with that base class as well, meaning that it can accept all of the base class's messages. Let's take a look at a (somewhat) classical example:

C#
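public interface IShape
{
    float Height { get; set; }
    float Width { get; set; }
}

public class Rectangle : IShape
{
    // Auto-implemented properties satisfy the IShape contract.
    public float Height { get; set; }
    public float Width { get; set; }
}

public class Triangle : IShape
{
    public float Height { get; set; }
    public float Width { get; set; }
}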
VB.NET
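Public Interface IShape
    Property Height() As Single
    Property Width() As Single
End Interface

Public Class Rectangle
    Implements IShape
    ' Auto-implemented properties satisfy the IShape contract.
    Public Property Height As Single Implements IShape.Height
    Public Property Width As Single Implements IShape.Width
End Class

Public Class Triangle
    Implements IShape
    Public Property Height As Single Implements IShape.Height
    Public Property Width As Single Implements IShape.Width
End Class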

Because Rectangle and Triangle both implement IShape, both guarantee that Height and Width are available. Therefore, it is safe to substitute a reference to IShape with a reference to a Rectangle or a Triangle, since you can call all of IShape's members on either one. For example:

C#
IShape shape = new Rectangle();
shape.Height = 22.5f;   // float literals need the f suffix
shape.Width = 50.2f;
or
IShape shape = new Triangle();
shape.Height = 12.8f;
shape.Width = 12.8f;
VB.NET
Dim shape As IShape = New Rectangle()
shape.Height = 22.5
shape.Width = 50.2
or
Dim shape As IShape = New Triangle()
shape.Height = 12.8
shape.Width = 12.8

And, as I mentioned above, you can do a similar substitution safely with inherited types. In .NET, all classes inherit from Object, so you can substitute a reference to Object with a reference to any class. For example:

C#
object x = new Button();
VB.NET
Dim x As Object = New Button()

We can do this substitution because Button supports all of Object’s members through inheritance. We can call Equals(), ToString(), GetType(), GetHashCode(), etc. from any Button instance.

The reverse, however, isn't true. We can’t safely do the following substitution:

C#
public void DoStuff(object input)
{
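    Button btn = input;   // compile-time error: no implicit conversion from object to Button
    Console.WriteLine(btn.Text);
}
VB.NET
Public Sub DoStuff(input As Object)
    Dim btn As Button = input   ' error with Option Strict On: implicit narrowing conversion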
    Console.WriteLine(btn.Text)
End Sub

You can't safely substitute Object for Button, because Object doesn't implement all of Button's members. In this case, what happens if we send an integer or a Customer class instance to the method? These types do not implement Text, Height, Width, or any of the other Button-specific members. Therefore, the compiler (in VB.NET, provided Option Strict is On) squawks and stops you from assigning the Object to the Button reference.

For C# and VB.NET (with Option Strict On), the rule is that you can safely assign a more derived (narrower) type to a more general reference. In other words, if type Y inherits Z (that is, Y : Z), you can safely assign Z = Y. Sometimes, but not always, a Z reference may actually point to a Y or a Y-derived instance (in our example above, that would mean someone happened to send a Button through the object input parameter). But it is not safe to assume that, so in order to assign Y = Z you need an explicit cast: Y = (Y)Z in C#, or DirectCast (or CType) in VB.NET. This is your way of telling the compiler, "yes, I know it's not safe to do the assignment, but trust me, it will work in this case" (and if it doesn't, you'll get an InvalidCastException at run time).
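Here is a small, self-contained C# sketch of that rule. To keep it runnable without Windows Forms, it uses a hypothetical Widget class as a stand-in for Button; TextViaCast and TextViaAs are likewise illustrative helpers, not part of any library. The implicit assignment Z = Y always compiles; the explicit cast compiles but is only checked at run time, and the as operator performs the same check but returns null instead of throwing.

```csharp
using System;

// Hypothetical stand-in for Button, so the sketch runs without Windows Forms.
public class Widget
{
    public string Text
    {
        get { return "OK"; }
    }
}

public static class CastDemo
{
    // Y = (Y)Z: compiles, but throws InvalidCastException if input is not a Widget.
    public static string TextViaCast(object input)
    {
        Widget w = (Widget)input;
        return w.Text;
    }

    // 'as' does the same run-time check but yields null instead of throwing.
    public static string TextViaAs(object input)
    {
        Widget w = input as Widget;
        return w != null ? w.Text : null;
    }

    public static void Main()
    {
        object z = new Widget();                    // Z = Y: implicit and always safe
        Console.WriteLine(TextViaCast(z));          // OK
        Console.WriteLine(TextViaAs(42) == null);   // True: a boxed int is not a Widget

        try
        {
            TextViaCast(42);                        // compiles, fails at run time
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```

In VB.NET, DirectCast(input, Widget) plays the role of the C# cast, and TryCast(input, Widget) plays the role of as.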

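So now, with all that in mind, let's talk a little bit about variance and generics. First, the variance part. Variance describes whether the substitution rule above carries over to types built from other types, such as a generic type G<T>:
Let's say you have types Y and Z, where Y : Z, so the substitution Z = Y is allowed.
If the substitution G<Z> = G<Y> is also allowed (the same direction), G is covariant.
If the substitution G<Y> = G<Z> is allowed (the reverse direction), G is contravariant.
If neither substitution is allowed, G is invariant.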

That brings us to the generics part, which makes it a little more interesting. Currently, all the C# (and VB) generic types are completely invariant, which really sucks. Let's look at a few scenarios.

First, let's use the simple Nullable type as an example.
Nullable<T> defines a HasValue member and a Value member. HasValue is not type-parameter-dependent (it is always a bool), but Value is (it is a T).

If you have Nullable<int> and Nullable<CustomStruct>, you could call HasValue on both without caring about the specific type parameter. However, you cannot substitute CustomStruct for int (or vice-versa), so you cannot be type-agnostic when calling Value.
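To make this concrete, here is a minimal C# sketch (CustomStruct is a hypothetical placeholder type). HasValue reads the same way on both nullable types, but the two share no common type that declares it: each one's base type is simply System.ValueType, and boxing a Nullable<T> produces either null or a boxed T, so not even object gives you a uniform way to ask for HasValue.

```csharp
using System;

// Hypothetical placeholder struct standing in for any user-defined value type.
public struct CustomStruct
{
    public int Id;
}

public static class NullableDemo
{
    public static void Main()
    {
        int? a = 5;
        CustomStruct? b = null;

        // HasValue does not depend on T, so it reads the same way on both...
        Console.WriteLine(a.HasValue);                 // True
        Console.WriteLine(b.HasValue);                 // False

        // ...while Value does: here it is an int; for b it would be a CustomStruct.
        int i = a.Value;
        Console.WriteLine(i);                          // 5

        // The only common base type is System.ValueType, which has no HasValue.
        Console.WriteLine(typeof(int?).BaseType);            // System.ValueType
        Console.WriteLine(typeof(CustomStruct?).BaseType);   // System.ValueType

        // Boxing unwraps the Nullable: you get a boxed int or null, never a Nullable<T>.
        Console.WriteLine(((object)a).GetType());      // System.Int32
        Console.WriteLine((object)b == null);          // True
    }
}
```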

From this little exercise, you could infer that it should be possible to call all the non-type-parameter-dependent members of two types constructed from the same generic template as if they came from the same "base" class. That is to say, if you have a generic template type G<T> and types Y and Z, and you instantiate G<Y> and G<Z>, both resulting types have all the same members from G<T> that do not depend on Y or Z (the T parameter), and thus you could call all of these non-type-parameter-dependent members on both G<Y> and G<Z> in a polymorphic manner.

But you would be wrong, not because it's impossible or unsafe to do so, but because the CLR/C#/VB guys chose not to support it. You have to understand that constructed generic types (G<Y> and G<Z> in this case) are two completely separate classes generated from the same template (and, somewhat important to this matter, they are generated at run time); no inheritance or interface mapping is involved between them. G<Y> and G<Z> are completely invariant. To make this work, you would have to create an IG interface containing all the non-type-parameter-dependent members and have G<T> implement that interface. Alternatively, you could create an abstract base class G that G<T> inherits. If you implement either of those solutions, you can substitute G<Y> or G<Z> for IG or G. It's a pain, and in some cases (as with value types) it doesn't work well, or at all. There could have been automatic support for this; in fact, I think it would be a very useful addition to generics in .NET.
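The abstract-base-class variant of that workaround might look like the following C# sketch. G and G<T> follow the names used above; the IsEmpty, Describe, and Item members and the BaseClassDemo helper are hypothetical examples of non-type-parameter-dependent and type-parameter-dependent members. C# allows a non-generic G and a generic G<T> to coexist because the number of type parameters is part of a type's identity.

```csharp
using System;
using System.Collections.Generic;

// Non-generic base holding the members that do not depend on T.
public abstract class G
{
    public abstract bool IsEmpty { get; }

    public string Describe()
    {
        return IsEmpty ? "empty" : "has a value";
    }
}

// The generic template inherits the base, so every G<T> is also a G.
public class G<T> : G
{
    private readonly T item;
    private readonly bool hasItem;

    public G() { }

    public G(T item)
    {
        this.item = item;
        hasItem = true;
    }

    public override bool IsEmpty
    {
        get { return !hasItem; }
    }

    // Type-parameter-dependent member: only reachable through G<T> itself.
    public T Item
    {
        get { return item; }
    }
}

public static class BaseClassDemo
{
    // Polymorphic over every G<T>, whatever T is.
    public static int CountEmpty(IEnumerable<G> values)
    {
        int count = 0;
        foreach (G g in values)
            if (g.IsEmpty) count++;
        return count;
    }

    public static void Main()
    {
        List<G> all = new List<G> { new G<int>(42), new G<string>(), new G<DateTime>(DateTime.Now) };
        foreach (G g in all)
            Console.WriteLine(g.Describe());
        Console.WriteLine(CountEmpty(all));   // 1
    }
}
```

The trade-off is the one described above: the base class takes the single inheritance slot, and the same trick is unavailable to structs.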

Secondly, you could infer that for G<Y> and G<Z>, their type-parameter-dependent members would be covariant if Y : Z. That inference holds for members that only hand a T out (return values, read-only properties); members that take a T in, such as List<T>.Add, are a different story, because a List<Object> reference to a List<String> would let you add an integer to a list of strings. Either way, since the current compilation of generics results in two completely different and independent classes, this won't work either. This is the big pain point most people seem to complain about with regard to generic collections. List<Object> and List<String> are not covariant, so you can't assign a List<String> to a List<Object> reference even though String derives from Object, and even though you can put inheritance constraints on type parameters (not that you necessarily need constraints to make this work).
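The invariance is easy to confirm with reflection. A common workaround is to copy the elements into a new list, which the sketch below does with List<T>.ConvertAll; the ListVarianceDemo class and its helpers are hypothetical names for this illustration. Because the result is a copy, adding an int to it cannot corrupt the original List<string>.

```csharp
using System;
using System.Collections.Generic;

public static class ListVarianceDemo
{
    // True if a variable of type 'target' can hold an instance of type 'source'.
    public static bool CanAssign(Type target, Type source)
    {
        return target.IsAssignableFrom(source);
    }

    // Workaround: copy the elements into a brand-new List<object> (a copy, not a view).
    public static List<object> ToObjectList(List<string> strings)
    {
        return strings.ConvertAll<object>(s => s);
    }

    public static void Main()
    {
        Console.WriteLine(CanAssign(typeof(object), typeof(string)));             // True
        Console.WriteLine(CanAssign(typeof(List<object>), typeof(List<string>))); // False

        List<string> names = new List<string> { "a", "b" };
        List<object> copy = ToObjectList(names);
        copy.Add(42);                     // allowed: copy really is a List<object>
        Console.WriteLine(names.Count);   // 2
        Console.WriteLine(copy.Count);    // 3
    }
}
```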

I believe people intuitively feel they can do this partly because arrays are in fact covariant in .NET (and remember that most people use generics for the built-in collections). Someone from MS offhandedly told me this was only because they needed to support J#, and Java has covariant arrays. Regardless, the same logic (sadly) doesn't apply to generic collection types, or to any other generic type for that matter.
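The contrast with arrays shows up in a short C# sketch: a string[] can be assigned to an object[] reference, but the runtime then has to check every store into the array, and putting an int into what is really a string[] throws ArrayTypeMismatchException. StoreFails is a hypothetical helper written for this illustration.

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // Returns true if storing 'item' into array[0] threw ArrayTypeMismatchException.
    public static bool StoreFails(object[] array, object item)
    {
        try
        {
            array[0] = item;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string[] strings = { "x", "y" };

        // Arrays are covariant: a string[] can be used where an object[] is expected.
        object[] objects = strings;
        Console.WriteLine(objects[0]);                 // x

        // The price is a run-time type check on every store.
        Console.WriteLine(StoreFails(objects, "z"));   // False: a string is fine
        Console.WriteLine(StoreFails(objects, 42));    // True: an int is not a string
    }
}
```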

So what are some workarounds? I briefly mentioned two above, and I'll expand on them here.

The first is to make the generic template implement a common interface. For example, if I want to create a generic Field<T> class template and I want some sort of variance, I need to create something like an IField interface.

C#
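using System.ComponentModel;

public interface IField : INotifyPropertyChanged, IDataErrorInfo
{
    bool IsChanged { get; }
    string Name { get; }
}

// Abstract: each concrete field type supplies its own Validate rule.
public abstract class Field<T> : IField
{
    public event PropertyChangedEventHandler PropertyChanged;

    private readonly string nameField;
    private bool isChangedField;
    private T valueField;
    private string errorField;   // last validation error, surfaced via IDataErrorInfo

    protected Field(string name)
    {
        nameField = name;
    }

    public string Name
    {
        get { return nameField; }
    }

    public bool IsChanged
    {
        get { return isChangedField; }
    }

    public T Value
    {
        get { return valueField; }
        set
        {
            if (Validate(value))
            {
                valueField = value;
                isChangedField = true;
                errorField = null;
                PropertyChangedEventHandler handler = PropertyChanged;
                if (handler != null)
                    handler(this, new PropertyChangedEventArgs(nameField));
            }
            else
            {
                errorField = "Invalid value for " + nameField;
            }
        }
    }

    public abstract bool Validate(T newValue);

    // Minimal IDataErrorInfo implementation: one error message for the whole field.
    public string Error
    {
        get { return errorField; }
    }

    public string this[string columnName]
    {
        get { return errorField; }
    }
}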
VB.NET
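Imports System.ComponentModel

Public Interface IField
    Inherits INotifyPropertyChanged, IDataErrorInfo
    ReadOnly Property IsChanged() As Boolean
    ReadOnly Property Name() As String
End Interface

' MustInherit: each concrete field type supplies its own Validate rule.
Public MustInherit Class Field(Of T)
    Implements IField

    Public Event PropertyChanged As PropertyChangedEventHandler _
        Implements INotifyPropertyChanged.PropertyChanged

    Private ReadOnly nameField As String
    Private isChangedField As Boolean
    Private fieldValueField As T
    Private errorField As String   ' last validation error, surfaced via IDataErrorInfo

    Protected Sub New(ByVal name As String)
        nameField = name
    End Sub

    Public ReadOnly Property Name() As String Implements IField.Name
        Get
            Return nameField
        End Get
    End Property

    Public ReadOnly Property IsChanged() As Boolean Implements IField.IsChanged
        Get
            Return isChangedField
        End Get
    End Property

    Public Property FieldValue() As T
        Get
            Return fieldValueField
        End Get
        Set(ByVal value As T)
            If Validate(value) Then
                fieldValueField = value
                isChangedField = True
                errorField = Nothing
                RaiseEvent PropertyChanged(Me, New PropertyChangedEventArgs(nameField))
            Else
                errorField = "Invalid value for " & nameField
            End If
        End Set
    End Property

    Public MustOverride Function Validate(ByVal newValue As T) As Boolean

    ' Minimal IDataErrorInfo implementation: one error message for the whole field.
    Public ReadOnly Property [Error]() As String Implements IDataErrorInfo.Error
        Get
            Return errorField
        End Get
    End Property

    Default Public ReadOnly Property Item(ByVal columnName As String) As String _
        Implements IDataErrorInfo.Item
        Get
            Return errorField
        End Get
    End Property
End Class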

Now, Field<int>, Field<decimal?>, Field<DateTime>, or Field<anything> can all be treated polymorphically through the IField interface: from any instance of any class derived from the Field<T> template, you can interchangeably call Name and IsChanged and hook into the events and members of INotifyPropertyChanged and IDataErrorInfo through IField. Without IField, you could never do any of that. This would also work if you created a Field abstract base class that Field<T> inherited.

The problem is that this won't work if you create a generic struct template instead of a generic class template. Structs (or any value types) can't inherit, so you can't use a base class. If you use an interface (like the IField interface above), you run into issues with boxing, which, aside from the performance cost, also prevents you from changing the original value: calls made through the interface operate on a boxed copy.
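The boxing problem is easy to reproduce. In this minimal C# sketch (ICounter, Counter, and BoxingDemo are hypothetical names), a mutable struct implements an interface; converting it to the interface copies it into a box, so a call made through the interface changes the copy, not the original variable.

```csharp
using System;

// Hypothetical interface of non-type-parameter-dependent members.
public interface ICounter
{
    int Count { get; }
    void Increment();
}

// A mutable value type implementing the interface.
public struct Counter : ICounter
{
    private int count;

    public int Count
    {
        get { return count; }
    }

    public void Increment()
    {
        count++;
    }
}

public static class BoxingDemo
{
    // Works only through the interface, as polymorphic code would.
    public static void Bump(ICounter counter)
    {
        counter.Increment();
    }

    public static void Main()
    {
        Counter c = new Counter();
        Bump(c);                          // c is boxed; only the box is incremented
        Console.WriteLine(c.Count);       // 0: the original struct is unchanged

        ICounter boxed = c;               // keep a reference to a single box
        Bump(boxed);
        Console.WriteLine(boxed.Count);   // 1: the box changed
        Console.WriteLine(c.Count);       // 0: the original still did not
    }
}
```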

This could work just fine without the additional layers of interfaces and inherited base classes if the compilers and the CLR allowed it, and that would be the only way to make it work with structs. The situation is compounded by the fact that both C# and VB.NET make you create template-derived classes as if you were "inheriting" the template (for example, a hypothetical NameField : Field<string>), which means you cannot inherit any other base class. Yet you don't get any of the polymorphic benefits of inheriting a true base class. So while the CLR guys are somewhat shrugging it off, I think adding variance to generics is a very worthwhile endeavor.

posted @ Thursday, November 29, 2007 12:28 PM
