Why?

Overview

In this example, a small arithmetic expression interpreter is developed. This
example highlights the problems caused by the lack of algebraic data types
and pattern matching, and also highlights the various complicated language
features and methodologies that arise to deal with a lack of them.

Specification

An arithmetic expression is one of the following:

- An integer constant.
- The sum of two arithmetic expressions.
- The product of two arithmetic expressions.
- The difference of two arithmetic expressions.

Evaluating an arithmetic expression requires the following steps:

- If the expression is a constant, return the constant.
- If the expression is an addition, evaluate the two subexpressions and then return the sum of their results.
- If the expression is a multiplication, evaluate the two subexpressions and then return the product of their results.
- If the expression is a subtraction, evaluate the two subexpressions and then return the difference of their results.

Implementation

The simplest (and possibly only) way to represent the type of an arithmetic
expression in Java is as a set of classes extending an abstract base class. The
ArithmeticExpression type demonstrates this:

package com.io7m.example.ccatpm; abstract class ArithmeticExpression { /** * An integer constant. */ public final static class ConstantExpression extends ArithmeticExpression { private final int value; @SuppressWarnings("synthetic-access") ConstantExpression( final int value) { super(ExpressionType.EXP_CONSTANT); this.value = value; } public final int getValue() { return this.value; } } static enum ExpressionType { EXP_CONSTANT, EXP_PLUS, EXP_MULTIPLY, EXP_SUBTRACT } /** * The product of two arithmetic expressions. */ public final static class MultiplyExpression extends ArithmeticExpression { private final ArithmeticExpression e_left; private final ArithmeticExpression e_right; @SuppressWarnings("synthetic-access") MultiplyExpression( final ArithmeticExpression e_left, final ArithmeticExpression e_right) { super(ExpressionType.EXP_MULTIPLY); this.e_left = e_left; this.e_right = e_right; } public final ArithmeticExpression getLeft() { return this.e_left; } public final ArithmeticExpression getRight() { return this.e_right; } } /** * The sum of two arithmetic expressions. */ public final static class PlusExpression extends ArithmeticExpression { private final ArithmeticExpression e_left; private final ArithmeticExpression e_right; @SuppressWarnings("synthetic-access") PlusExpression( final ArithmeticExpression e_left, final ArithmeticExpression e_right) { super(ExpressionType.EXP_PLUS); this.e_left = e_left; this.e_right = e_right; } public final ArithmeticExpression getLeft() { return this.e_left; } public final ArithmeticExpression getRight() { return this.e_right; } } /** * The difference of two arithmetic expressions. */ public final static class SubtractExpression extends ArithmeticExpression { private final ArithmeticExpression e_left; private final ArithmeticExpression e_right; @SuppressWarnings("synthetic-access") SubtractExpression( final ArithmeticExpression e_left, final ArithmeticExpression e_right) { super(ExpressionType.EXP_SUBTRACT); this.e_left = e_left; this.e_right = e_right; } public final ArithmeticExpression getLeft() { return this.e_left; } public final ArithmeticExpression getRight() { return this.e_right; } } private final ExpressionType type; private ArithmeticExpression( final ExpressionType type) { this.type = type; } public final ExpressionType getType() { return this.type; } }

This example shows a number of immediate problems:

- There is a fixed set of types of expressions that will be evaluated, but the language cannot know in advance how many classes will really extend the ArithmeticExpression class. Essentially, the type wants to be "closed", but the language requires that it be "open". The best that can be done is to make the single constructor of the class private, and the class itself have default visibility (in other words, only visible to classes in the same package).
- Each subclass of ArithmeticExpression is required to carry around a value of type ExpressionType in order to distinguish the values of the classes later. It's possible to use the instanceof keyword to distinguish classes, but this obviously cannot be used in switch statement, meaning that the compiler cannot help the programmer to ensure that all cases are covered. It's also possible for the programmer that specifies the value of ExpressionType for each class to accidentally re-use values, leading to hard-to-locate bugs later.
- Java has nullable references. Everywhere a non-primitive type appears is a possible source of runtime null pointer exceptions.
- The definitions are annoyingly verbose: The code is already over 100 lines from a four-line specification. It's not obvious, without a lot of tedious scanning, that the type even represents the original specification at all.
- It's possible to create circular structures (by accidentally assigning the wrong values in the constructors of arithmetic expressions). Functions written to work with expressions that don't assume that expressions can be circular (as most won't) will run forever.

The evaluation function is as follows:

package com.io7m.example.ccatpm; import com.io7m.example.ccatpm.ArithmeticExpression.ConstantExpression; import com.io7m.example.ccatpm.ArithmeticExpression.MultiplyExpression; import com.io7m.example.ccatpm.ArithmeticExpression.PlusExpression; import com.io7m.example.ccatpm.ArithmeticExpression.SubtractExpression; public final class Interpreter { public static int run( final ArithmeticExpression expr) { switch (expr.getType()) { case EXP_CONSTANT: { final ConstantExpression actual = (ConstantExpression) expr; return actual.getValue(); } case EXP_MULTIPLY: { final MultiplyExpression actual = (MultiplyExpression) expr; final int left = Interpreter.run(actual.getLeft()); final int right = Interpreter.run(actual.getRight()); return left * right; } case EXP_PLUS: { final PlusExpression actual = (PlusExpression) expr; final int left = Interpreter.run(actual.getLeft()); final int right = Interpreter.run(actual.getRight()); return left + right; } case EXP_SUBTRACT: { final SubtractExpression actual = (SubtractExpression) expr; final int left = Interpreter.run(actual.getLeft()); final int right = Interpreter.run(actual.getRight()); return left - right; } default: throw new AssertionError("unreachable!"); } } private Interpreter() { } }

Again, there are obvious problems:

- The code contains one dangerous cast per expression type. The language cannot be informed of the formal link between the enumeration value and the associated class.
- The code assumes expressions, or the contents of expressions, won't be null.
- The Java compiler can't determine that all cases of the enumeration type are present. It requires an unreachable default clause.

Visitor Implementation

A second attempt at the implementation, using the
visitor pattern,
is even larger than the original, and borders on obfuscated with regards to the
flow of execution. First, the interface that expressions must implement:

package com.io7m.example.ccatpm.visitor; interface Expression { int accept(ExpressionVisitor visitor); }

package com.io7m.example.ccatpm.visitor; abstract class Binary { private final Expression left; private final Expression right; public Binary( final Expression left, final Expression right) { this.left = left; this.right = right; } public final Expression getLeft() { return this.left; } public final Expression getRight() { return this.right; } }

Then, the types of each arithmetic expression case:

package com.io7m.example.ccatpm.visitor; final class Constant implements Expression { private final int value; public Constant( final int value) { this.value = value; } public int getValue() { return this.value; } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

package com.io7m.example.ccatpm.visitor; final class Add extends Binary implements Expression { public Add( final Expression left, final Expression right) { super(left, right); } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

package com.io7m.example.ccatpm.visitor; final class Multiply extends Binary implements Expression { public Multiply( final Expression left, final Expression right) { super(left, right); } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

package com.io7m.example.ccatpm.visitor; final class Subtract extends Binary implements Expression { public Subtract( final Expression left, final Expression right) { super(left, right); } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

Then, the type of arithmetic expression visitors:

package com.io7m.example.ccatpm.visitor; interface ExpressionVisitor { int visit(Add add); int visit(Constant constant); int visit(Multiply multiply); int visit(Subtract subtract); }

Then, finally, the evaluation function:

package com.io7m.example.ccatpm.visitor; public final class Interpreter { public static int evaluate( final Expression expression) { return expression.accept(new ExpressionVisitor() { @Override public int visit( final Add add) { return Interpreter.evaluate(add.getLeft()) + Interpreter.evaluate(add.getRight()); } @Override public int visit( final Constant constant) { return constant.getValue(); } @Override public int visit( final Multiply multiply) { return Interpreter.evaluate(multiply.getLeft()) * Interpreter.evaluate(multiply.getRight()); } @Override public int visit( final Subtract subtract) { return Interpreter.evaluate(subtract.getLeft()) - Interpreter.evaluate(subtract.getRight()); } }); } }

The implementation still has problems:

- The flow of execution is obfuscated. The maintainer of this code has to be familiar with the visitor pattern in order to understand it, as opposed to just being familiar with ordinary recursion.
- Java has nullable references. Everywhere a non-primitive type appears is a possible source of runtime null pointer exceptions.
- The definitions are even more verbose: The code is almost 150 lines. There is a lot of boilerplate code; each expression type has to implement an identical visit function. In a more complex example (such as the abstract syntax tree of a programming language), this quickly becomes excruciating.
- This implementation of the visitor pattern only allows the programmer to write functions that return int. Writing something more general requires even more complicated classes, and possibly even generics.

At the end of this document, an alternative implementation will be given
using algebraic types and pattern matching that avoids all of the problems of
both implementations, is less than a tenth of the size, and gives strong
assurances of safety and correctness.

[0]

Used to cut down on even more boilerplate.