io7m | single-page | multi-page | archive (zip, signature)
4. ShapesA Crash Course In Algebraic Types6. Arithmetic expression interpreter
PreviousUpNext

The Type Zoo
Overview
As stated previously, the combination of algebraic data types and pattern matching is extremely powerful. Whereas languages that lack them have to keep various type system features completely distinct (in both concept and implementation), the language of algebraic types is expressive enough to support a wide range of structures with little or no "magic". Many structures that require explicit language support in other systems can simply be expressed in terms of algebraic types (possibly with extra syntax support).
Unit
The Unit type is the simplest possible algebraic type. It has one data constructor that takes no values. In impure languages like OCaml, the Unit type is often given as the return value of functions that are only of interest for their side effects, such as print_string (which is of type string -> unit, and simply prints its argument to standard out and returns unit). In Haskell, the Unit type is called () and has a single data constructor, also called () [4].
Boolean
The Boolean type is an algebraic type with two data constructors:
module Boolean where

data Boolean
  = True
  | False
  deriving Show
This definition of Boolean type allows languages to implement the usual if .. then .. else construct in terms of pattern matching, simplifying the language definition and semantics.
Enumeration
As may or may not be obvious, it's possible to achieve the normal enumeration types present in other languages. The data constructors of the type are simply declared as not taking any values:
module Enumeration where

data Color
  = Red
  | Blue
  | Green
  | Yellow
  deriving Show
Option
The Option type is the first polymorphic type that has been described so far. The definition of Option is as follows:
module Option where

data Option a
  = Some a
  | None
  deriving Show
The definition introduces a type Option, which is parameterized by a type variable arbitrarily named a. It defines two data constructors, Some and None, the first of which takes a value of type a. The type encodes the concept of an optional value: The programmer is required to pattern match on values of type Option to determine whether or not a value is present.
module OptionPresent where

import Option

present :: Option a -> IO ()
present o =
  case o of
    Some _ -> print "present"
    None   -> print "not present"
Typically, most programs represent the above as a nullable reference type. This leads to endless bugs involving accidentally dereferencing null references. Programs written using Option types are immune to this class of bugs, as it's statically impossible to get access to a nonexistent value.
Note that as Option is polymorphic, it may be instantiated with any other type, including other Option types:
*Option> :type None
None :: Option a

*Option> :type Some
Some :: a -> Option a

*Option> :type Some True
Some True :: Option Bool

*Option> :type Some (23 :: Integer)
Some (23 :: Integer) :: Option Integer

*Option> :type Some (Some (23 :: Integer))
Some (Some (23 :: Integer)) :: Option (Option Integer)

*Option> :type Some None
Some None :: Option (Option a)
Choice
The Choice type is a logical continuation of the Option type. It represents a choice between values of two types:
module Choice where

data Choice a b
  = ChoiceLeft a
  | ChoiceRight b
  deriving Show
As with Option, it may be instantiated with any other types:
*Choice> :type ChoiceLeft
ChoiceLeft :: a -> Choice a b

*Choice> :type ChoiceRight
ChoiceRight :: b -> Choice a b

*Choice> :type ChoiceLeft True
ChoiceLeft True :: Choice Bool b

*Choice> :type ChoiceRight True
ChoiceRight True :: Choice a Bool
Types isomorphic to the Choice type are often used to represent computations that may succeed or fail: On success, a value of type b is returned, and on failure, a value of of type a is returned (usually some sort of error code value) [5].
Pairs
Where the Choice type represents a choice between two possible values, the Pair type represents a structure where values of two (possibly different) types will be present simultaneously. The Pair type is equivalent to the tuple type in other languages:
module Pair where

data Pair a b =
  MakePair a b
  deriving Show
Naturally, Pair types can be nested, forming arbitrarily complex structures:
*Pair> :type MakePair 
MakePair :: a -> b -> Pair a b

*Pair> :type MakePair True
MakePair True :: b -> Pair Bool b

*Pair> :type MakePair True (23 :: Integer)
MakePair True (23 :: Integer) :: Pair Bool Integer

*Pair> :type MakePair (MakePair (23 :: Integer) True) (23 :: Integer)
MakePair (MakePair (23 :: Integer) True) (23 :: Integer)
  :: Pair (Pair Integer Bool) Integer

*Pair> :type MakePair (23 :: Integer) (MakePair (MakePair False True) (MakePair True False))
MakePair (23 :: Integer) (MakePair (MakePair False True) (MakePair True False))
  :: Pair Integer (Pair (Pair Bool Bool) (Pair Bool Bool))
Most languages have shorthand syntax for constructing values of some standard library pair type: "(x, y)".
Naturals (inductive)
Many functional languages (particularly those that act as proof assistants) use the following definition for natural numbers:
module NaturalInd where

data Natural
  = Z
  | S Natural
  deriving Show
This definition is taken directly from axioms 1 and 6 of the Peano axioms, and is probably the first type encountered by programmers upon being introduced to most proof assistants. The type is the first recursive type described so far; that is, a type that refers to itself in its own definition.
*NaturalInd> :type Z
Z :: Natural

*NaturalInd> :type S
S :: Natural -> Natural

-- One
*NaturalInd> :type S Z
S Z :: Natural

-- Two
*NaturalInd> :type S (S Z)
S (S Z) :: Natural

-- Three
*NaturalInd> :type S (S (S Z))
S (S (S Z)) :: Natural
It's also possible to copy the definition of, for example, plus, almost directly:
module NaturalIndPlus where

import NaturalInd

plus :: Natural -> Natural -> Natural
plus x y =
  case (x, y) of
    (n, Z)   -> n
    (n, S m) -> S (plus n m)

-- 0 + 0 = 0
ghci> plus Z Z
Z

-- 0 + 1 = 1
ghci> plus Z (S Z)
S Z

-- 1 + 1 = 2
ghci> plus (S Z) (S Z)
S (S Z)

-- 2 + 2 = 4
ghci> plus (S (S Z)) (S (S Z))
S (S (S (S Z)))
Lists
The List type is a recursive type found in just about every functional language in existence. A List is either empty or an element of some type prepended to another list of elements of that type. By this definition, lists may contain elements of any type, but each element of the list has the same type.
module List where

data List a
  = Empty
  | Cell a (List a)
  deriving Show
Constructing values of type List works as expected:
*List> :type Empty
Empty :: List a

*List> :type Cell
Cell :: a -> List a -> List a

*List> :type Cell True
Cell True :: List Bool -> List Bool

*List> :type Cell True Empty
Cell True Empty :: List Bool

*List> :type Cell True (Cell False Empty)
Cell True (Cell False Empty) :: List Bool

-- A list of lists!
*List> :type Cell (Cell True Empty) Empty
Cell (Cell True Empty) Empty :: List (List Bool)
Typically, the programmer writes recursive functions to manipulate values of recursive types. Using pattern matching, it's possible to write a function such as length in a couple of lines:
module ListLength where

list_length :: List a -> Integer
list_length list =
  case list of
    Null        -> 0
    Cell _ rest -> 1 + (list_length rest)
As with the Pair type, Most languages have shorthand syntax for constructing and matching on lists: "[x, y, ...]".
Trees
The definition of a binary tree type is similarly simple (and is both polymorphic and recursive):
module BinaryTree where

data BTree a
  = Leaf
  | Tree (BTree a) a (BTree a)
  deriving Show
Manipulating values of these types should be unsurprising by this point:
*BinaryTree> :type Leaf
Leaf :: BTree a

*BinaryTree> :type Tree Leaf
Tree Leaf :: a -> BTree a -> BTree a

*BinaryTree> :type Tree Leaf True
Tree Leaf True :: BTree Bool -> BTree Bool

*BinaryTree> :type Tree Leaf True Leaf
Tree Leaf True Leaf :: BTree Bool

*BinaryTree> :type Tree Leaf True (Tree Leaf True Leaf)
Tree Leaf True (Tree Leaf True Leaf) :: BTree Bool
Note that, as demonstrated, it's obviously possible to build binary trees that are unbalanced. In order to preserve invariants such as "the tree must be balanced", it's often necessary to hide the data constructors of the given type and then use the module system of the language in question to expose functions that manipulate values of the tree types internally [6]. As an example, the Haskell standard library offers high performance maps and sets based on red-black trees. The programmer is not exposed to the internal construction of the trees that back the Map and Set types, but internally they are composed of nothing more than the algebraic data types described so far.
The BinaryTree type, as with all the types discussed so far, is an example of a persistent data structure. Persistent data structures offer rather large benefits with regards to both simplification of and confidence in the correctness of the code that deals with them compared to their mutable counterparts (particularly in heavily concurrent/parallel programs). See Purely functional data structures by Chris Okasaki, Cambridge University Press, 1998, ISBN 0-521-66350-4.
Hiding
As mentioned in the previous section, there may sometimes be additional restrictions on the creation of values. For example, if the programmer wants to define a type representing the set of integers greater than or equal to 0, it can't be achieved directly. Most languages have multiple ways of dealing with this problem. In Haskell, the simplest way to achieve this is to hide the data constructors of the declared type, and provide one or more functions that enforce the desired invariants:
module Natural (
  Natural,
  make_natural,
  from_natural
) where

import Option

data Natural =
  MakeNatural Integer
  deriving Show

make_natural :: Integer -> Option Natural
make_natural x =
  if x >= 0
  then Some (MakeNatural x)
  else None

from_natural :: Natural -> Integer
from_natural n =
  case n of
    MakeNatural m -> m
The module declaration exports the Natural type, but does not export its constructor, MakeNatural. This effectively means that it's not possible for the programmer to construct values of type Natural directly! However, the module also exports a function make_natural which takes an integer and returns an optional value based on whether the integer falls within the desired range. It also exports a convenient function, from_natural, which "unwraps" a Natural value and returns the Integer contained within [7].
Essentially, this is a technique to make a given type abstract. That is, the structure of the type essentially becomes private. Languages such as OCaml allow types to be made abstract using module signatures. Java and Ada have the private keyword for controlling visibility of (various parts of) types.
Arrays
Arrays do not have an inductive structure, unlike all of the data structures described so far, and are therefore usually provided by the programming language's standard library as abstract types with associated get and set functions for accessing elements. They are mentioned here for completeness.

[4]
Apparently, to purposefully confuse new users of the language.
[5]
See Lightweight static exceptions for an exploration of this.
[6]
Languages with dependent types allow the expression of types that statically guarantee that the resulting trees will be balanced. See Certified programming with dependent types for some impressive examples.
[7]
This function is optimized away by the compiler. In fact, the MakeNatural constructor would be optimized away too, leaving a plain Integer at run-time, if not for the unfortunate presence of lazy evaluation in Haskell. Other languages do not share this deficiency.

PreviousUpNext
4. ShapesA Crash Course In Algebraic Types6. Arithmetic expression interpreter