io7m | archive (zip, signature)

Type Tricks 3 - Generic Visitors

Most Java programmers are at least somewhat familiar with the visitor pattern. However, few Java programmers fully understand how the design pattern can be used to emulate a subset of the functionality of pattern matching on values of algebraic types. This article attempts to present a clear introduction to the visitor pattern, to clarify the relationship to algebraic types, and to demonstrate how the formulation in Java actually allows for (amusingly) both a subset and a slight superset of the functionality of the types available in languages such as as ML and Haskell. The article gives a terse introduction to the basics of algebraic types in order to explain how the Java emulation relates to them. For a more complete coverage, see the Crash Course In Algebraic Types article. Almost any introductory material to Haskell will espouse the expressiveness, ability to eliminate large classes of bugs, and the basic ability of algebraic types to subsume entire parts of other programming languages. An excellent presentation on the use of algebraic types to produce reliable software is Effective ML by Yaron Minsky.

Algebraic Types

An algebraic type in most typed functional languages is defined as a closed type with a fixed set of injective and disjoint constructors. In those languages, a constructor (more precisely, a data constructor) of type T is simply a function that takes zero or more arguments and yields a value of type T. Informally, the injective property of constructors means if two values of T are considered equal, then they must have been produced using the same constructor applied to the same arguments. The disjoint property means that if two values of T were constructed with different constructors, then they are not equal. Accordingly, it is assumed that the only way to produce a value of type T is by applying arguments to one of T's constructors. The term closed is taken to mean that all of the constructors of the type are introduced in the type definition itself. In other words, the compiler knows all of the possible constructors of a type by looking at one definition. An open type, in contrast, would be somewhat analogous to Java's classes: New subclasses of a class can be added at any time and it is not possible for the compiler to know all of the existing subclasses. The property of being closed allows the compiler to perform useful static checks, as will be shown shortly.

A simple algebraic type definition in Haskell might look something like the following:

data Color = Red | Green | Blue

The Color type has three constructors: Red, Green, and Blue. None of the constructors take any arguments. Given an arbitrary value v of type Color, the value can be deconstructed by pattern matching with a case expression:

case v of
  Red   -> ...
  Green -> ...
  Blue  -> ...

This is similar to using a switch statement to examine a value of an enum type in C or Java. However, the switch statement as defined in those languages only provides a tiny subset of the functionality provided by full pattern matching. The compiler ensures that all possible cases have been covered, and that cases have not been specified more than once. It can do this because the type is closed, as described earlier. A more complicated algebraic type is as follows:

data Snack =
    Apple Color Integer Integer
  | Cookie Integer Boolean
  | Biltong Integer

Each constructor represents a type of snack food. Note that the constructors take (anonymous) arguments, and each constructor takes a different set of arguments. When pattern matching values of the Snack type, the programmer can assign names to constructor arguments of interest. For example, the first Integer argument of each constructor is intended to represent the weight of the food in question. The following weight function returns the weight of a given Snack value:

weight :: Snack -> Integer
weight s =
  case s of
    Apple  _ w _ -> w
    Cookie w _   -> w
    Biltong w    -> w

The programmer assigns the name _ to constructor arguments that do not need names due to not being used in the definition of the function.

Algebraic types can be recursive (a type can refer to itself in its own definition) and polymorphic (a type can be parameterized by other types. As an example, the type of recursive polymorphic binary trees might look something like:

data BinaryTree a =
    Leaf a
  | Tree (BinaryTree a) (BinaryTree a)

The a parameter refers to an arbitrary type, so the type BinaryTree Integer represents the type of binary trees that hold Integer values in their leaves.

The classic example of a non-recursive but polymorphic type is the Option type:

data Option a =
    None
  | Some a

The Option type is useful for any situation where a function may or may not return a value. In languages that do not have algebraic types, the convention is for such a function to return null or some other value that is arbitrarily assumed to mean "nothing". The problems with this approach are widely documented, but the basic issue is that the programmer is not required by the compiler to explicitly examine the returned value before use. The Option type only exposes a usable value via the Some constructor, and the programmer is required to explicitly pattern match on values of type Option in order to use the value contained within. This simple fact statically eliminates classes of bugs that are rampant in languages that have nullable references. Amusingly, Java 8 adds an Optional type which contains a get method that raises NoSuchElementException if the value is not present, missing the point of explicitly matching values entirely and more or less introducing a new type to reproduce all of the problems of nullable references.

Essentially, pattern matching has two main aspects: The ability to discriminate via case analysis between different categories of values of the same overall type, and the ability to extract the values of fields within a given value via destructuring. In the author's experience, case analysis is the aspect that is most useful when the types are used to enforce safety properties in programs.

Visitors

A visitor in Java is an attempt to emulate a closed type that allows case analysis. Reusing the Snack example above, a SnackMatchableType interface is defined:

interface SnackMatchableType
{
  <A> A matchSnack(SnackMatcherType<A> m);
}

The matchSnack function accepts a value of type SnackMatcherType. It's not yet possible to define SnackMatcherType because it's first necessary to define some types of snacks that can be matched. In languages such as ML and Haskell, the constructors of an algebraic type are represented as functions. In the Java emulation, each constructor is represented by its own class, each of which implement the interface that defines the type as a whole. Therefore, for each of the snack constructors, new types are defined that implement the SnackMatchableType interface:

final class Apple implements SnackMatchableType
{
  ...
}

final class Cookie implements SnackMatchableType
{
  ...
}

final class Biltong implements SnackMatchableType
{
  ...
}

Now, finally, the SnackMatcherType interface:

interface SnackMatcherType<A>
{
  A matchApple(Apple a);
  
  A matchCookie(Cookie c);
  
  A matchBiltong(Biltong b);
}

All that remains is for each snack type to call the correct function on the defined SnackMatcherType interface:

final class Apple implements SnackMatchableType
{
  @Override <A> A matchSnack(SnackMatcherType<A> m)
  {
    return m.matchApple(this);
  }
}

final class Cookie implements SnackMatchableType
{
  @Override <A> A matchSnack(SnackMatcherType<A> m)
  {
    return m.matchCookie(this);
  }
}

final class Biltong implements SnackMatchableType
{
  @Override <A> A matchSnack(SnackMatcherType<A> m)
  {
    return m.matchBiltong(this);
  }
}

With the above infrastructure in place, it's possible to perform case analysis on an arbitrary value of type SnackMatchableType:

SnackMatchableType s;

s.matchSnack(new SnackMatcherType<Void>() {
  @Override public Void matchApple(Apple a)
  {
    System.out.println("Got an apple");
    return null;
  }
  
  @Override public Void matchCookie(Cookie c)
  {
    System.out.println("Got a cookie");
    return null;
  }
  
  @Override public Void matchBiltong(Biltong b)
  {
    System.out.println("Got biltong");
    return null;
  }
});

Aside from the fact that a value of any reference type in Java can be null, the above provides a reasonable emulation of a closed type. If at any point a new class of type SnackMatchableType is added, it will not be possible to implement the SnackMatchableType interface until a new function is added to the SnackMatcherType interface that mentions the new type. The compiler will immediately raise type errors for any existing declared instances of SnackMatcherType indicating all of the places in the program that need to be updated with the new function. The emulation cannot support the arbitrary destructuring that's commonly used in typed functional languages. However, the emulation described does allow for one interesting feature: Because each constructor is represented by a separate class and because classes can implement any number of *MatchableType interfaces, it's effectively possible for a constructor to belong to more than one algebraic type.

Multiple Visitors

In ML and Haskell, a constructor may only belong to one algebraic type. In order to further categorize sets of types, it's necessary to introduce new wrapper types. For example:

data Apple = Apple Color Integer Integer
data Cookie = Cookie Integer Boolean
data Biltong = Biltong Integer

data Healthy =
    HealthyApple Apple
  | HealthyBiltong Biltong
  
data Sweet =
    SweetApple Apple
  | SweetCookie Cookie

data NotFruit =
    NotFruitCookie Cookie
  | NotFruitBiltong Biltong

In the Java emulation, the wrapper types are simply represented as more of the MatchableType and MatcherType interfaces:

interface HealthyType
{
  <A> A matchHealthy(HealthyMatcherType<A> m);
}

interface HealthyMatcherType<A>
{
  A matchHealthyApple(Apple a);

  A matchHealthyBiltong(Biltong b);
}

interface SweetType
{
  <A> A matchSweet(SweetMatcherType<A> m);
}

interface SweetMatcherType<A>
{
  A matchSweetApple(Apple a);

  A matchSweetCookie(Cookie c);
}

interface NotFruitType
{
  <A> A matchNotFruit(NotFruitMatcherType<A> m);
}

interface NotFruitMatcherType<A>
{
  A matchNotFruitCookie(Cookie c);

  A matchNotFruitBiltong(Biltong b);
}

final class Apple implements SnackMatchableType, HealthyType, SweetType
{
  @Override <A> A matchSnack(SnackMatcherType<A> m)
  {
    return m.matchApple(this);
  }

  @Override <A> A matchHealthy(HealthyMatcherType<A> m)
  {
    return m.matchHealthyApple(this);
  }
  
  @Override <A> A matchSweet(SweetMatcherType<A> m)
  {
    return m.matchSweetApple(this);
  }
}

final class Cookie implements SnackMatchableType, NotFruitType, SweetType
{
  @Override <A> A matchSnack(SnackMatcherType<A> m)
  {
    return m.matchCookie(this);
  }

  @Override <A> A matchSweet(SweetMatcherType<A> m)
  {
    return m.matchSweetCookie(this);
  }
  
  @Override <A> A matchNotFruit(NotFruitMatcherType<A> m)
  {
    return m.matchNotFruitCookie(this);
  }
}

final class Biltong implements SnackMatchableType, HealthyType, NotFruitType
{
  @Override <A> A matchSnack(SnackMatcherType<A> m)
  {
    return m.matchBiltong(this);
  }

  @Override <A> A matchHealthy(HealthyMatcherType<A> m)
  {
    return m.matchHealthyBiltong(this);
  }

  @Override <A> A matchNotFruit(NotFruitMatcherType<A> m)
  {
    return m.matchNotFruitBiltong(this);
  }
}

Conclusion

Various degrees of pattern matching have been implemented in other JVM languages. The Ceylon language supports pure case analysis with completeness checking, but without destructuring. The Scala language implements pattern matching on case classes, with match completeness checking for sealed case classes (closed types). Almost all of the io7m packages use generic visitors in the manner described above to prevent bugs and enforce program invariants. Given the typical Java conservatism, Java will probably gain closed types and pattern matching abilities some time after 2025.