This document is a survey of the various implementations of *algebraic data types* on the JVM.

In order to provide the requisite background for discussion, the document attempts to describe the theory behind so-called *algebraic types*, which can be thought of as the general form of the restricted *case classes* typically implemented on the JVM. It makes some attempt to describe the motivation behind having the types, but does not attempt to spend much time advocating for them.

Then, an examination is made of existing JVM languages that have implemented analogous versions of the types. In addition, an attempt is made to examine implementations of the types as libraries for the Java language.

Finally, a simple pattern is described that attempts to combine the best aspects of each implementation in Java.

An *algebraic data type* in most typed functional languages is defined as a *closed* type with a fixed set of *injective* and *disjoint* constructors. In those languages, a *constructor* (more precisely, a *data constructor*) of type `T`

is simply a function that takes zero or more arguments and yields a value of type `T`

. Informally, the *injective* property of constructors means if two values of `T`

are considered equal, then they *must* have been produced using the same constructor applied to the same arguments. The *disjoint* property means that if two values of `T`

were constructed with different constructors, then they are not equal. Accordingly, it is assumed that the only way to produce a value of type `T`

is by applying arguments to one of `T`

's constructors. The term *closed* is taken to mean that all of the constructors of the type are introduced in the type definition itself. In other words, the compiler knows all of the possible constructors of a type by looking at one definition. An *open* type, in contrast, would be somewhat analogous to Java's classes: New subclasses of a class can be added at any time and it is not possible for the compiler to know all of the existing subclasses. The property of being *closed* allows the compiler to perform useful static checks, as will be shown shortly.

The classic example for an *algebraic type* is the modelling of simple arithmetic expressions, using Haskell syntax:

```
data Expression =
Constant Integer
| Add Expression Expression
| Multiply Expression Expression
```

For those unfamiliar with Haskell syntax, the preceding definition defines a new type called `Expression`

. The type has three *data constructors*: `Constant`

, `Add`

, and `Multiply`

. Each constructor is a function that takes values of the specified types and yields a value of type `Expression`

. For example, the `Add`

constructor takes two values of type `Expression`

. The `Expression`

type is *closed*: There is no way to add additional constructors later without modifying the type. The compiler can see all of the possible ways to construct a value of type `Expression`

and makes use of this information as will shortly be demonstrated.

Astute readers will notice that at no point were names specified for any of the parameters. The reason for this is that functional languages such as Haskell rely on *pattern matching* to extract the values of fields specified in types. Pattern matching is achieved using `case`

expressions ^{1}, as used here to implement a trivial evaluator for arithmetic expressions:

```
evaluate :: Expression -> Integer
evaluate x =
case x of
(Constant n) -> n
(Add e0 e1) -> (evaluate e0) + (evaluate e1)
(Multiply e0 e1) -> (evaluate e0) * (evaluate e1)
```

The above code defines a new (recursive) function called `evaluate`

that takes a single argument that we name `x`

. The *type signature* `Expression -> Integer`

indicates that the function takes a value of type `Expression`

and returns a value of type `Integer`

. For those unfamilar with pattern matching, the `case`

expression can seem quite surprising. In order to correctly read the expression, one should note that the left hand side of each branch of the case expression is a *pattern* that *destructures* a given value, optionally binding *names* to each field within the destructured value. The *discriminee* of the `case`

expression (here, `x`

), is matched against each of the patterns from top to bottom, stopping at the first pattern that matches.

For example, the `(Add e0 e1)`

pattern matches all `Expression`

values that were constructed using the `Add`

constructor. It binds the name `e0`

to the first `Expression`

argument, and the name `e1`

to the second. These names are only in scope on the right hand side of that branch. In this case, they are passed recursively to the `evaluate`

function.

Evaluating the function in the `ghci`

command-line interpreter gives the expected results:

```
ghci> evaluate (Add (Multiply (Constant 5) (Constant 10)) (Constant 14))
64
```

In Haskell, patterns can essentially be *nested* to provide extremely specific matches. For example, the following is a valid (although in this case, not very useful) pattern:

` (Add e0@(Add k1 k2) (Multiply k3 k4)) -> ...`

The pattern matches `Expression`

values that were constructed by passing two `Expression`

s that were themselves constructed using the `Add`

and `Multiply`

constructors, respectively, to the `Add`

constructor. The `e0@(...)`

syntax binds the name `e0`

to the given subexpression so that it can be used in the right hand side of the branch, if necessary.

The compiler is capable of detecting *inexhaustive* patterns. For example, the following definition would be rejected at compile-time:

```
evaluate :: Expression -> Integer
evaluate x =
case x of
(Constant n) -> n
(Add e0 e1) -> (evaluate e0) + (evaluate e1)
```

The programmer has forgotten to handle the `Multiply`

case and so the match is inexhaustive. This specific feature is often championed by users of languages such as Haskell and OCaml as one of the reasons why algebraic types are instrumental in making and keeping programs *correct*. When a new constructor is added to the `Expression`

type, the compiler immediately notifies the programmer of all of the places where their code is now incomplete! In the words of Yaron Minksy in his Effective ML ^{2} talk:

Case analysis is a deep, ingrained part of programming. You see it everywhere. Having basic checks like the fact that your case analysis is exhaustive is very powerful. The key benefit you get from it over and over again is as a refactoring tool. As your code evolves, you can use these exhaustiveness checks to figure out where you need to fix things in response to changes in the codebase.

It's hard for me to overemphasize how central this is. Like... Every day that I program, I use this! It is one of the key things that keeps me sane. You can take lots of other features of the language away from me but with my dying breath, I will keep my exhaustiveness checks. They are, I think, the single best feature of the language. This kind of automated checking for what is just a really important part of the day-to-day part of the pragmatics of programming.

Overlapping cases are handled similarly. The following program is rejected at compile-time:

```
evaluate :: Expression -> Integer
evaluate x =
case x of
(Constant n) -> n
(Add e0 e1) -> (evaluate e0) + (evaluate e1)
(Add e0 e1) -> (evaluate e0) + (evaluate e1)
```

This point is somewhat less important for the purposes of enforcing program correctness, but is included here for completeness.

Finally, patterns are checked to ensure that they match the arity of the mentioned constructors. The following program is rejected at compile-time:

```
evaluate :: Expression -> Integer
evaluate x =
case x of
(Constant n) -> n
(Add e0 e1) -> (evaluate e0) + (evaluate e1)
(Multiply e0 e1 e2) -> (evaluate e0) * (evaluate e1)
```

The programmer has accidentally assumed that the `Multiply`

constructor takes three arguments instead of two. This is also useful for the purposes of refactoring: If the number of arguments to a constructor changes, the compiler will highlight every `case`

expression that now needs to be updated.

The point that should be taken from the above examples is that *algebraic types* and *pattern matching* provide two main features: *destructuring* and *case analysis*. *Destructuring* is the process of examining the actual structure of a value, and *Case analysis* is the process of of statically distinguishing between a fixed set of possible cases: "Is this expression a constant, an addition, or a multiplication?".

It is the position of the author of this document that for a language such as Java, destructuring is far less important than case analysis for the purposes of producing correct programs, and for the purposes of getting the compiler to assist during refactoring. This is particularly true when balanced against the intrusive changes that would be required to the language to support destructuring (such as the introduction of some form of pattern language to describe the structure of Java classes). In Java, we simply access fields of a given class by name (or by accessor method). In the experience of the author, the addition or removal of cases to/from a type tends to indicate a more fundamental change to the program than the addition or removal of a single field in a type, and therefore it is much more important that the programmer be told about all of the places where changes to the code are now required.

The Scala language is an impure, strict, hybrid functional/object-oriented programming language for the JVM.

The language provides algebraic types via so-called *case classes*. It implements full structural pattern matching with exhaustiveness checking. The Scala equivalent of the Haskell `Expression`

example given earlier is as follows:

```
sealed trait Expression
case class Constant (n : Int) extends Expression
case class Add (e0 : Expression, e1 : Expression) extends Expression
case class Multiply (e0 : Expression, e1 : Expression) extends Expression
```

This example demonstrates a theme common to all of the implementations discussed here: In contrast to Haskell, where *data constructors* are represented by *functions*, all of the JVM implementations of algebraic types represent the constructors as *classes*. The `sealed`

keyword indicates that the `Expression`

class can only be extended by classes defined *in the same file*. This essentially *closes* the type and allows the compiler to perform exhaustiveness checking when pattern matching on a value of type `Expression`

. Scala also supports non-`sealed`

case classes, but in the opinion of the author, their utility is limited due to lacking exhaustiveness checks. As an additional note, a `case class`

in Scala also contains automatically generated `equals`

, `hashCode`

and `toString`

methods.

Pattern matching in Scala is very similar to Haskell, and full structural matching is supported:

```
object Expression {
def evaluate (e : Expression) : Integer = {
e match {
case Constant (n) => n
case Add (e0, e1) => evaluate (e0) + evaluate (e1)
case Multiply (e0, e1) => evaluate (e0) * evaluate (e1)
}
}
}
```

The `object`

keyword introduces a new singleton object. This is used here simply because the language does not directly support `static`

methods. Examination of the bytecode produced by compilation shows that the above `match`

expression is simply transformed to a chain if `instanceof`

checks. Use of the CFR ^{3} tool makes this clear:

```
/*
* Decompiled with CFR 0_110.
*/
public Integer evaluate(Expression e) {
Expression expression = e;
if (expression instanceof Constant) {
Constant constant = (Constant)expression;
int n = constant.n();
return Predef..MODULE$.int2Integer(n);
}
if (expression instanceof Add) {
Add add = (Add)expression;
Expression e0 = add.e0();
Expression e1 = add.e1();
return Predef..MODULE$.int2Integer(Predef..MODULE$.Integer2int(this.evaluate(e0)) + Predef..MODULE$.Integer2int(this.evaluate(e1)));
}
if (!(expression instanceof Multiply)) throw new MatchError((Object)expression);
Multiply multiply = (Multiply)expression;
Expression e0 = multiply.e0();
Expression e1 = multiply.e1();
return Predef..MODULE$.int2Integer(Predef..MODULE$.Integer2int(this.evaluate(e0)) * Predef..MODULE$.Integer2int(this.evaluate(e1)));
}
```

The Ceylon language is, like Scala, an impure, strict, hybrid functional/object oriented programming language for the JVM.

Ceylon does not implement full algebraic types and pattern matching. It instead implements a subset of the types that it refers to as *enumerated types*. The Ceylon equivalent of the Haskell the `Expression`

example given earlier is as follows:

```
abstract class Expression () of Constant | Add | Multiply { }
class Constant (n) extends Expression () {
shared Integer n;
}
class Add (e0, e1) extends Expression () {
shared Expression e0;
shared Expression e1;
}
class Multiply (e0, e1) extends Expression () {
shared Expression e0;
shared Expression e1;
}
```

The main point to note is that the `Expression`

class declaration explicitly enumerates its subclasses. This allows the compiler to perform exhaustiveness checking when matching on values of type `Expression`

. Ceylon does not implement full pattern matching - only *case analysis*:

```
object expressions {
Integer evaluate (Expression e) {
switch (e)
case (is Constant) { return e.n; }
case (is Add) { return evaluate (e.e0) + evaluate (e.e1); }
case (is Multiply) { return evaluate (e.e0) * evaluate (e.e1); }
}
}
```

In the above code, each `case`

of the `switch`

expression specifies an assertion that `e`

is an instance of one of the subclasses of `Expression`

. The right hand side of each case branch is written as if `e`

had been cast to an instance of the specified subclass. Exhaustiveness checks work as expected:

```
object expressions {
Integer evaluate (Expression e) {
switch (e)
case (is Constant) { return e.n; }
case (is Add) { return evaluate (e.e0) + evaluate (e.e1); }
}
}
```

```
error: case types must cover all cases of the switch type or an else clause must appear: 'Constant|Add' does not cover 'Expression'
switch (e)
^
1 error
```

Inspection of the bytecode demonstrates that the above code is transformed to a simple series of `instanceof`

checks. The CFR tool produces the following Java code when executed on the bytecode produced by the Ceylon compiler:

```
/*
* Decompiled with CFR 0_110.
*/
@TypeInfo(value="ceylon.language::Integer")
private final long evaluate$priv$(@Name(value="e") @TypeInfo(value="Expression") Expression e) {
Expression sel$0 = e;
if (sel$0 instanceof Constant) {
Constant e$3 = (Constant)sel$0;
return e$3.getN();
}
if (sel$0 instanceof Add) {
Add e$2 = (Add)sel$0;
return this.evaluate$priv$(e$2.getE0()) + this.evaluate$priv$(e$2.getE1());
}
if (sel$0 instanceof Multiply) {
Multiply e$1 = (Multiply)sel$0;
return this.evaluate$priv$(e$1.getE0()) * this.evaluate$priv$(e$1.getE1());
}
throw new EnumeratedTypeError("Supposedly exhaustive switch was not exhaustive");
}
```

The Kotlin language is a continuation of the theme of impure, strict, hybrid functional/object oriented programming languages for the JVM.

Like Ceylon, the Kotlin language implements only *case analysis*. A class can be marked as `sealed`

in a similar manner to Scala and this fact can then be used for exhaustiveness checking in the language's analogue of pattern matching. The Kotlin equivalent of the Haskell the `Expression`

example given earlier is as follows:

```
sealed class Expression {
class Constant (val n: Int) : Expression ()
class Add (val e0: Expression, val e1: Expression) : Expression ()
class Multiply (val e0: Expression, val e1: Expression) : Expression ()
}
```

The `evaluate`

function is similarly unsurprising:

```
object Expressions {
fun evaluate (e: Expression): Int =
when (e) {
is Expression.Constant -> e.n
is Expression.Add -> evaluate (e.e0) + evaluate (e.e1)
is Expression.Multiply -> evaluate (e.e0) * evaluate (e.e1)
}
}
```

The definitions are almost identical to those in Ceylon. The generated bytecode is equally unsurprising:

```
/*
* Decompiled with CFR 0_110.
*/
public final int evaluate(@NotNull Expression e) {
int n;
Intrinsics.checkParameterIsNotNull((Object)e, (String)"e");
Expression expression = e;
if (expression instanceof Expression.Constant) {
n = ((Expression.Constant)e).getN();
} else if (expression instanceof Expression.Add) {
n = this.evaluate(((Expression.Add)e).getE0()) + this.evaluate(((Expression.Add)e).getE1());
} else if (expression instanceof Expression.Multiply) {
n = this.evaluate(((Expression.Multiply)e).getE0()) * this.evaluate(((Expression.Multiply)e).getE1());
} else {
throw new NoWhenBranchMatchedException();
}
return n;
}
```

The Frege language is a lazily-evaluated, pure-functional programming language for the JVM, heavily inspired by Haskell.

As Frege more or less aspires to be a direct implementation of Haskell for the JVM, the existing Haskell `Expression`

example compiles without any changes beyond the addition of an explicit module name:

```
module Expressions where
data Expression =
Constant Integer
| Add Expression Expression
| Multiply Expression Expression
evaluate :: Expression -> Integer
evaluate x =
case x of
(Constant n) -> n
(Add e0 e1) -> (evaluate e0) + (evaluate e1)
(Multiply e0 e1) -> (evaluate e0) * (evaluate e1)
```

Although the Frege language follows Haskell in representing *data constructors* as functions in the language itself, it actually transforms the constructors to individual classes in a similar manner to all of the languages surveyed so far. The `evaluate`

function is transformed into the following bytecode:

```
/*
* Decompiled with CFR 0_110.
*/
public static final BigInteger evaluate(TExpression tExpression) {
TExpression.DConstant dConstant = tExpression._Constant();
if (dConstant != null) {
BigInteger bigInteger = (BigInteger)Delayed.forced((Object)dConstant.mem1);
return bigInteger;
}
TExpression.DAdd dAdd = tExpression._Add();
if (dAdd != null) {
TExpression tExpression2 = (TExpression)dAdd.mem2.forced();
TExpression tExpression3 = (TExpression)dAdd.mem1.forced();
return Expressions.evaluate(tExpression3).add(Expressions.evaluate(tExpression2));
}
TExpression.DMultiply dMultiply = tExpression._Multiply();
assert (dMultiply != null);
TExpression tExpression4 = (TExpression)dMultiply.mem2.forced();
TExpression tExpression5 = (TExpression)dMultiply.mem1.forced();
return Expressions.evaluate(tExpression5).multiply(Expressions.evaluate(tExpression4));
}
```

In this case, a somewhat different representation is used than a chain of `instanceof`

checks: The `TExpression`

case contains one method per data constructor, and each data constructor class implements this interface. For example, the generated `DConstant`

class (the class generated for the `Constant`

constructor) is as follows:

```
/*
* Decompiled with CFR 0_110.
*/
public static final class DConstant extends Algebraic implements TExpression {
public final Object mem1;
private DConstant(Object object) {
this.mem1 = object;
}
public final int _constructor() {
return 0;
}
public static final TExpression mk(Object object) {
return new DConstant(object);
}
@Override
public final DConstant _Constant() {
return this;
}
@Override
public final DAdd _Add() {
return null;
}
@Override
public final DMultiply _Multiply() {
return null;
}
}
```

The `DConstant`

class only returns a reference to itself via the `_Constant`

method. The other classes work similarly. In effect, a chain of `instanceof`

checks is transformed into a chain of non-null checks.

Various attempts have been made to retrofit algebraic types into the Java language. The main requirements appear to be that pattern matching should:

Not allocate memory

^{4}.Operate in

`O(1)`

time, if possible.Become a compile-time error if cases are added or removed from the discriminee type.

One representation of algebraic types in Java attempts to re-use the language's `enum`

types. First, the `Expression`

type:

```
import java.util.Objects;
abstract class Expression
{
enum Type
{
TYPE_CONSTANT,
TYPE_ADD,
TYPE_MULTIPLY
}
private final Type type;
private Expression(final Type t)
{
this.type = Objects.requireNonNull(t);
}
public final Type type()
{
return this.type;
}
public final class Constant extends Expression
{
public final int n;
public Constant(int i)
{
super(Type.TYPE_CONSTANT);
this.n = i;
}
}
public final class Add extends Expression
{
public final Expression e0;
public final Expression e1;
public Add(final Expression in_e0, final Expression in_e1)
{
super(Type.TYPE_ADD);
this.e0 = Objects.requireNonNull(in_e0);
this.e1 = Objects.requireNonNull(in_e1);
}
}
public final class Multiply extends Expression
{
public final Expression e0;
public final Expression e1;
public Multiply(final Expression in_e0, final Expression in_e1)
{
super(Type.TYPE_MULTIPLY);
this.e0 = Objects.requireNonNull(in_e0);
this.e1 = Objects.requireNonNull(in_e1);
}
}
}
```

The code assigns one `enum`

value to each class extending the base `Expression`

type. The code attempts to make the `Expression`

type *closed* by making the constructor `private`

. The compiler does not understand that the type is supposed to be a closed type, it merely happens as a side effect of the language's visibility rules.

The `evaluate`

method requires dangerous casts:

```
public class Expressions
{
public static int evaluate(Expression e)
{
switch (e.type())
{
case TYPE_CONSTANT:
{
Expression.Constant c = (Expression.Constant) e;
return c.n;
}
case TYPE_ADD:
{
Expression.Add a = (Expression.Add) e;
return evaluate (a.e0) + evaluate (a.e1);
}
case TYPE_MULTIPLY:
{
Expression.Multiply m = (Expression.Multiply) e;
return evaluate (m.e0) * evaluate (m.e1);
}
}
throw new IllegalStateException("Unreachable code");
}
}
```

Advantages of this approach:

The generated bytecode is very efficient due to the compiler's handling of

`enum`

types: The code is transformed to a`tableswitch`

instruction, which operates on integers. At no point are any allocations performed, and the performance should be`O(1)`

if the`tableswitch`

instruction is compiled to code that is`O(1)`

.Some degree of exhaustiveness checking is performed: If a case is added to or removed from the

`enum`

type, each`switch`

statement that examines it will become a compilation error.The

`switch`

statement is fairly syntactically light.

Disadvantages of this approach:

The programmer is expected to know that each

`enum`

case corresponds to a particular`Expression`

subclass and must manually cast each time.Even for

`enum`

types, the Java compiler cannot prove that no statements beyond each of the`switch`

branches are reachable. The programmer must manually add a non-terminating statement such as a`throw`

to indicate code that is assumed to be unreachable.The actual

`Expression`

declaration contains a lot of boilerplate with the potential for programmers to assign the wrong`enum`

value in the constructor of each case class.

Another approach uses a so-called *generic visitor* type ^{5}. A given type is paired with a *visitor* type (typically just an `interface`

) and case analysis is achieved by passing an instance of this visitor type to a `match`

method declared on each expression class. There are mutual dependencies between the expression class, the visitor type, and each expression case class, so they are presented here in one code segment:

```
import java.util.Objects;
interface Expression
{
<A> A match(ExpressionVisitor<A> m);
}
interface ExpressionVisitor<A>
{
A onConstant(Constant e);
A onAdd(Add e);
A onMultiply(Multiply e);
}
final class Constant implements Expression
{
public final int n;
Constant(int i)
{
this.n = i;
}
public <A> A match(ExpressionVisitor<A> m)
{
return m.onConstant(this);
}
}
final class Add implements Expression
{
public final Expression e0;
public final Expression e1;
Add(final Expression ie0, final Expression ie1)
{
this.e0 = Objects.requireNonNull(ie0);
this.e1 = Objects.requireNonNull(ie1);
}
public <A> A match(ExpressionVisitor<A> m)
{
return m.onAdd(this);
}
}
final class Multiply implements Expression
{
public final Expression e0;
public final Expression e1;
Multiply(final Expression ie0, final Expression ie1)
{
this.e0 = Objects.requireNonNull(ie0);
this.e1 = Objects.requireNonNull(ie1);
}
public <A> A match(ExpressionVisitor<A> m)
{
return m.onMultiply(this);
}
}
```

The definition of the `Expression`

interface is such that subclasses are required to implement the `match`

method. The definition of the `ExpressionVisitor`

interface is such that a method is provided for each possible subclass and no others. The only way for a (terminating) `match`

method to produce a value of an arbitrary type `A`

is to call one of the methods on the given `ExpressionVisitor`

type, passing the implementing class as an argument.

Pattern matching involves constructing a (possibly anonymous) instance of type `ExpressionVisitor`

and passing it to the `match`

method of a given `Expression`

. This is how the `evaluate`

function is implemented:

```
public class Expressions
{
public static int evaluate (Expression e)
{
return e.match(new ExpressionVisitor<Integer>() {
@Override public Integer onConstant(Constant c)
{
return c.n;
}
@Override public Integer onAdd(Add a)
{
return evaluate (a.e0) + evaluate (a.e1);
}
@Override public Integer onMultiply(Multiply a)
{
return evaluate (a.e0) * evaluate (a.e1);
}
});
}
}
```

Advantages of this approach:

There are fewer opportunities for mistakes. The type system requires the programmer to implement certain methods, and barring non-termination, it is easier for the programmer to implement them correctly than incorrectly.

Upon adding a case to the

`Expression`

class, all instances of type`ExpressionVisitor`

now need to be updated. Essentially, this provides the desired exhaustiveness checking.It is

`O(1)`

in terms of time. Each`match`

statement compiles down to a simple`invokeinterface`

instruction. The`evaluate`

method compiles down to a`new`

instruction followed by an`invokeinterface`

instruction.

Disadvantages of this approach:

Every pattern match incurs an allocation. Whether the JIT may or may not be able to eliminate those allocations is entirely JVM-specific

^{6}. In low-latency code that is trying to produce as little garbage as possible, this can mean that the types are outright unusable.The anonymous class instance syntax is verbose. At the very minimum, the programmer is expected to write out the full definition of the

`ExpressionVisitor`

class every time pattern matching is performed.If code inside a pattern match needs to throw a checked exception - it cannot! The visitor type must be extended with another type parameter

`E`

with an upper bound of`Throwable`

or`Exception`

, and each visitor method must be annotated with`throws E`

.

An issue that is either an advantage or a disadvantage depending on your point of view is that nothing prevents a non-final case class from being added. This can actually be used to useful effect, as described in *Type Tricks 3 - Generic Visitors* ^{7}, but could also result in unintended side effects if the programmer designing the API forgets to add a `final`

keyword to one or more classes.

*Catamorphism* is a term from *category theory*^{8}. Without attempting a long-winded and tedious description of the exact meaning, it can informally be considered as a higher order function that accepts one function-typed parameter for each constructor of an algebraic type. The only practical difference between this formulation and the generic visitor approach described above is the use of individual functions as opposed to anonymous class instances containing sets of methods. The two approaches are morally equivalent.

```
import java.util.Objects;
import java.util.function.Function;
public interface Expression
{
<A> A match (
Function<Constant, A> constant,
Function<Add, A> add,
Function<Multiply, A> multiply
);
}
final class Constant implements Expression
{
public final int n;
Constant(int i)
{
this.n = i;
}
public <A> A match(
Function<Constant, A> constant,
Function<Add, A> add,
Function<Multiply, A> multiply)
{
return constant.apply(this);
}
}
final class Add implements Expression
{
public final Expression e0;
public final Expression e1;
Add(final Expression ie0, final Expression ie1)
{
this.e0 = Objects.requireNonNull(ie0);
this.e1 = Objects.requireNonNull(ie1);
}
public <A> A match(
Function<Constant, A> constant,
Function<Add, A> add,
Function<Multiply, A> multiply)
{
return add.apply(this);
}
}
final class Multiply implements Expression
{
public final Expression e0;
public final Expression e1;
Multiply(final Expression ie0, final Expression ie1)
{
this.e0 = Objects.requireNonNull(ie0);
this.e1 = Objects.requireNonNull(ie1);
}
public <A> A match(
Function<Constant, A> constant,
Function<Add, A> add,
Function<Multiply, A> multiply)
{
return multiply.apply(this);
}
}
```

The `evaluate`

method is somewhat refreshingly terse ^{9}:

```
public class Expressions
{
public static int evaluate (Expression e)
{
return e.match(
(Constant constant) -> constant.n,
(Add add) -> evaluate (add.e0) + evaluate (add.e1),
(Multiply multiply) -> evaluate (multiply.e0) * evaluate (multiply.e1));
}
}
```

Advantages of this approach:

As with the generic visitor approach, there are few opportunities for mistakes. The type system requires the programmer to implement certain methods, and barring non-termination, it is easier for the programmer to implement them correctly than incorrectly.

It is

`O(1)`

in terms of time. Each`match`

call appears to be compiled down to three`invokedynamic`

calls followed by a single`invokeinterface`

call. The`invokedynamic`

calls are due to the current implementation of lambda functions on the JVM^{10}.It appears to be better in terms of allocations than the generic visitor approach. The

`evaluate`

method is transformed into a form that appears to avoid allocations by using static lambda instances. Recent work on the JVM has supposedly been focused on having the JIT compiler aggressively inline and cache lambdas for performance reasons, and this approach directly benefits from that work.

Disadvantages of this approach:

If a case class is added or removed,

*every single case class*must now be updated in addition to every`match`

call, due to the fact that the`match`

method mentions all of the possible cases.The Java language specification limits methods to

`256`

parameters. This effectively limits algebraic types to`256`

cases or fewer. Whether or not this is likely to be a problem is debatable.If code inside a pattern match needs to throw a checked exception - it cannot! The passed-in function types must be extended with another type parameter

`E`

with an upper bound of`Throwable`

or`Exception`

, and the match method must be annotated with`throws E`

.

Finally, this approach is somewhat less useful on versions of Java prior to Java 8. Without syntax support for function instances, calls to `match`

typically end up becoming more verbose than the generic visitor approach.

The adt4j package is an implementation of algebraic types in Java.

It is difficult to give a description of the package without simply duplicating the documentation presented in the `adt4j`

repository. Suffice to say, it is an annotation processor that transforms `abstract`

classes or `interfaces`

into a set of classes using the generic visitor formulation for pattern matching. The `adt4j`

documentation provides an example analogous to the `Expression`

type used in this document so far:

```
@GenerateValueClassForVisitor
@Visitor(resultVariableName="R", selfReferenceVariableName="E")
interface ExpressionVisitor<E, R> {
@GeneratePredicate(name = "isLiteral");
R lit(int i);
R sum(@Getter(name = "leftOperand") E e1, @Getter(name = "rightOperand") E e2);
R mul(@Getter(name = "leftOperand") E e1, @Getter(name = "rightOperand") E e2);
}
```

An example of the `evaluate`

function is also provided:

```
int value = e.accept(new ExpressionVisitor<Expression, Integer>() {
@Override
public Integer int(int i) {
return i;
}
@Override
public Integer sum(Expression e1, Expression e2) {
return e1.accept(this) + e2.accept(this);
}
@Override
public Integer mul(Expression e1, Expression e2) {
return e1.accept(this) * e2.accept(this);
}
});
```

The package protects the programmer from some of the verbosity of the type declarations required to implement the generic visitor approach, but otherwise has the same advantages and disadvantages as documented.

The derive4j package is another implementation of algebraic types in Java.

The package has the programmer define a container class with a generic visitor interface called `Cases`

:

```
@Data
public abstract class Expression {
interface Cases<R> {
R Constant (Integer value);
R Add (Expression left, Expression right);
R Multiply (Expression left, Expression right);
}
public abstract <R> R match(Cases<R> cases);
}
```

The package then generates a rather alarming amount of code ^{11} that defines a so-called *fluent* interface that allows the programmer to build, for example, the `evaluate`

function:

```
private static Function<Expression, Integer> eval = Expressions.cases()
.Constant (value -> value)
.Add ((left, right) -> eval (left) + eval (right))
.Mult ((left, right) -> eval (left) * eval (right));
public static Integer evaluate (Expression expression)
{
return eval.apply(expression);
}
```

Suffice to say, the generated code uses aspects of both the generic visitor and catamorphism approaches. An inspection of the generated code seems to suggest that every `match`

call will entail not only the allocation of multiple anonymous `Cases`

class instances, but a set of lambda instances too. The example code appears to work around this issue by declaring static instances and reusing them (as the `evaluate`

example above demonstrates).

*Update: The author of derive4j, JB Giraudeau, kindly contacted me to clarify a few aspects of the above:*

*Firstly, catamorphisms are only defined for non-recursive types, and derive4j derives catamorphisms as shown in the following lines:*

https://gist.github.com/jbgi/3904e696fb27a2e33ae1#file-expressions-java-L61

*Additionally, derive4j supports both the visitor and catamorphism forms as user-visible matching options, so users may pick which approach they want to use. The derive4j implementation generates 2N classes for a type with N cases, and supports more advanced features like (safe) partial pattern matches. Checked exceptions are not supported because a typical functional programming approach will instead use structures such as the Either*

*Finally, derive4j does avoid allocations on visitor-based matching by storing static references to visitors - the allocation is performed once and then is not repeated for subsequent matches.*

The javagadt package is another implementation of algebraic types in Java.

Rather than relying on annotated abstract classes, it defines a description language that is used to generate Java classes directly:

```
data Expression
= Constant (n : Integer)
| Add (e0 : Expression, e1 : Expression)
| Multiply (e0 : Expression, e1 : Expression)
```

The package uses the generic visitor representation for the generated classes, with an unfortunate use of method overloading for the generated visitor:

```
public interface Matcher<A> {
A match(Expression.Constant c);
A match(Expression.Add a);
A match(Expression.Multiply m);
}
int r = e.match(new Matcher<Integer>() {
@Override
public Integer match(Expression.Constant c) {
return c.n();
}
@Override
public Integer match(Expression.Add a) {
return a.e0() + a.e1();
}
@Override
public Integer match(Expression.Multiply m) {
return m.e0() + m.e1();
}
});
```

As expected, the package has the same advantages and disadvantages inherent in the generic visitor representation.

The *library* implementations of algebraic types in Java all suffer from the same weaknesses due to lack of support two concepts in Java: The ability to make a set of classes *closed*, and the ability to efficiently select one class from a closed set in `O(1)`

time and without allocating memory.

A Java implementation that combines as many of the advantages (with as few of the disadvantages as possible) of each possible encoding of algebraic types is fairly straightforward. The catamorphism encoding is used as a base, with a slight modification that should guarantee (on current JVM implementations) that pattern matching will never incur an allocation.

On current JVM implementations, a lambda expression is only guaranteed not to incur an allocation if it does not capture any local variables or fields. It is possible in the majority of cases to avoid capturing variables by threading an explicit *context* value through the lambda applications. Code that previously looked like this:

```
interface Producer
{
int apply();
}
static int runner(Producer f)
{
return f.apply();
}
private String data;
int f(int x)
{
return runner(() -> x + Integer.parseInt(this.data));
}
```

Is rewritten to allow passing of contextual values explicitly:

```
interface Producer<T>
{
int apply(T context);
}
static <T> int runner(T context, Producer<T> f)
{
return f.apply(context);
}
private String data;
private int z;
int f(int x)
{
this.z = x;
return runner(this, (t) -> t.z + Integer.parseInt(t.data));
}
```

The lambda expression passed to `runner`

no longer captures any variables, and an inspection of the bytecode shows a simple transformation to a static method call:

```
private String data;
private int z;
static <T> int runner(T t, Producer<T> producer) {
return producer.apply(t);
}
int f(int n) {
this.z = n;
return Test3.runner(this, (Producer<Test3>)LambdaMetafactory.metafactory(null, null, null, (Ljava/lang/Object;)I, lambda$f$0(Test3 ), (LTest3;)I)());
}
private static /* synthetic */ int lambda$f$0(Test3 test3) {
return test3.z + Integer.parseInt(test3.data);
}
static interface Producer<T> {
public int apply(T var1);
}
```

The first part of the implementation therefore requires the creation of a function type used for matching:

```
interface Matcher<T, C, A, E extends Exception>
{
A apply(T case_type, C context) throws E;
}
```

The `T`

parameter corresponds to the type of a specific case. The `C`

parameter corresponds to the type of context values. The `A`

parameter denotes the type of returned values, and the `E`

parameter denotes the type of exceptions that may be raised.

Then, for a type `K`

with three cases `C0`

, `C1`

, and `C2`

:

```
interface K
{
<C, A, E extends Exception> A match(
C context,
Matcher<C0, C, A, E> on_c0,
Matcher<C1, C, A, E> on_c1,
Matcher<C2, C, A, E> on_c2)
throws E;
}
class C0 implements K
{
private final String data;
C0(String x)
{
this.data = x;
}
public <C, A, E extends Exception> A match(
C context,
Matcher<C0, C, A, E> on_c0,
Matcher<C1, C, A, E> on_c1,
Matcher<C2, C, A, E> on_c2)
throws E
{
return on_c0.apply(this, context);
}
}
class C1 implements K
{
private final Integer data;
C1(Integer x)
{
this.data = x;
}
public <C, A, E extends Exception> A match(
C context,
Matcher<C0, C, A, E> on_c0,
Matcher<C1, C, A, E> on_c1,
Matcher<C2, C, A, E> on_c2)
throws E
{
return on_c1.apply(this, context);
}
}
class C2 implements K
{
private final Boolean data;
C2(Boolean x)
{
this.data = x;
}
public <C, A, E extends Exception> A match(
C context,
Matcher<C0, C, A, E> on_c0,
Matcher<C1, C, A, E> on_c1,
Matcher<C2, C, A, E> on_c2)
throws E
{
return on_c2.apply(this, context);
}
}
```

Then, for matching on a value of type `K`

:

```
private final String extra;
String doSomething(K k)
throws IOException
{
return k.match (this,
(c0, t) -> c0.data + t.extra,
(c1, t) -> { throw new IOException("Ouch!"); },
(c2, t) -> c2.data + t.extra);
}
```

In summary, the advantages of this encoding are:

As with the generic visitor approach, there are few opportunities for mistakes. The type system requires the programmer to implement certain methods, and barring non-termination, it is easier for the programmer to implement them correctly than incorrectly.

It is

`O(1)`

in terms of time. Each`match`

call appears to be compiled down to three`invokedynamic`

calls followed by a single`invokeinterface`

call. The`invokedynamic`

calls are due to the current implementation of lambda functions on the JVM^{13}.With an explicit context parameter and non-capturing lambdas, the code is guaranteed to be better in terms of allocations than any of the other Java encodings. The use of lambdas over anonymous class instances ensures that this code benefits from any future improvements to the JVM infrastructure used to support them.

Functions passed to the

`match`

call can raise checked exceptions via the`E`

type parameter.The encoding has the most terse matching syntax out of any of the encodings.

The disadvantages of this encoding are:

If a case class is added or removed,

*every single case class*must now be updated in addition to every`match`

call, due to the fact that the`match`

method mentions all of the possible cases.The Java language specification limits methods to

`256`

parameters. This effectively limits algebraic types to`256`

cases or fewer. Whether or not this is likely to be a problem is debatable.

This is not the

*only*way to perform pattern matching in the Haskell language, but for the sake of example, we assume that it is.↩An absolutely critical talk for any programmer interested in producing code that is

*correct*: https://blogs.janestreet.com/effective-ml-video/↩Consider what would happen if every day control structures such as conditionals and loops required creating new objects on the heap just to function...↩

Consider what would happen if every day control structures such as conditionals and loops required creating new objects on the heap just to function...↩

Note that the left hand side of each "match branch" could be reduced further (at the risk of making the code potentially incomprehensible) due to type inference for lambda parameters.↩

"From Lambdas to Bytecode", Brian Goetz: http://wiki.jvmlangsummit.com/images/1/1e/2011_Goetz_Lambda.pdf↩

Around 400 lines: https://gist.github.com/jbgi/3904e696fb27a2e33ae1↩

A reasonably readable description of the

`Either`

monad (amongst others) is given here: http://book.realworldhaskell.org/read/error-handling.html↩"From Lambdas to Bytecode", Brian Goetz: http://wiki.jvmlangsummit.com/images/1/1e/2011_Goetz_Lambda.pdf↩