Recreating the Haskell List Part 2: Functors

This is part 2 of a 6 part tutorial

Functors

How can we modify or transform a value or values that are contained in a data structure such as Container? Let’s say we have a Container holding the integer 4, and we want to add 1 to it. The problem is that a Container doesn’t have addition defined for it, and really, it shouldn’t considering that any possible type could be stored inside it, any number of which have no meaningful way to respond to addition.

Let’s look at a similar situation in C++:

template <class T>
class Container {
   T _value;
   public:
   Container(T value) : _value(value) { }
   T get_value() {
       return _value;
   }
   void set_value(T value){
       _value = value;
   }
};
Container<int> container = Container<int>(4);
//  Of course, you can't do this to transform the value inside
container = container + 1
//  You must remove it, apply the transformation and put it back
int x = container.get_value();
x = x + 1
container.set_value(x);

The extreme generality of this C++ class means that it would be a mistake to define operator+ on it, as any number of types T also cannot meaningfully respond to addition. Now we find ourselves in a similar situation in Haskell:

let container = Container 4
let container' = container + 1
<interactive>:3:28:
    No instance for (Num (Container Integer))
        arising from a use of `+'
    Possible fix:
        add an instance declaration for (Num (Container Integer))
    In the expression: container + 1
    In an equation for container': container' = container + 1

Haskell has an elegant solution to this problem called a Functor. Ghci is able to print out the full interface for any typeclass with the :info command so let’s see what interface is required for a data type to become an instance of the Functor typeclass:

Using ghci’s :info command

:info Functor
=> class Functor f where
      fmap :: (a -> b) -> f a -> f b

This says there is a class named Functor, its instances will be referred to as f. Instances must define a function called fmap. The first parameter of fmap, is (a -> b). This means the first parameter is a function which accepts something of type a, and returns something of a possibly different type b.

The second parameter f a is a functor (such as Container) wrapping a type a. The last thing in a type signature is the return value. It returns f b, which means it returns the same sort of datatype, (Such as Container) wrapping something of type b.

That can be somewhat difficult to follow at first, but what it is essentially asking you to implement is a way to take the data out of the data structure, apply a transformation function to it, and then put it back in the data structure. Let’s make Container an instance of Functor.

Making Container an Instance of Functor

instance Functor Container where
    fmap f (Container contents) = Container (f contents)
let container = Container 4
container
=> Container 4
fmap (\x -> x + 1) container
=> Container 5

In the function fmap above, on the left hand side of the declaration, we are using Haskell’s pattern matching feature to deconstruct the container and remove the value from it. When called fmap f (Container contents) = binds the function (\x -> x + 1) to f, and in this case the integer 4 to contents.

This is because during pattern matching, the data constructor function Container acts to deconstruct the data type into its components. Later we’ll see data types that contain more than one value and see that it can be used to access any number of data members. Haskell is all about composite data structures and wrapping and unwrapping the components inside them to do work.

On the right hand side of fmap the Container data constructor is again used to wrap up this value, but it is first transformed by the function f. We get the effect of being able to send any function inside the container to be applied to its inner value.

You might be wondering, why wrap this integer in a data structure at all if it just makes it annoying to work on it? The answer is that unless you want to isolate a value or restrict operations that can be performed on it, you probably wouldn’t want to do this if it only held one value. This type of isolation is used in Haskell to separate functions that work with IO and side effects from pure functional code. The IO Monad hides its data constructor so that you cannot create anything of type IO in ‘pure’ code, and you can’t deconstruct an IO and get its values out. This causes you to always need to work with ‘impure’ IO stuff by sending functions into or declaring them inside the IO container, and also serves as a marker for impurity.

The Maybe Monad is another type that can still do something interesting while wrapping only one value, but let’s see an example of a functor working on multiple values first, a list!

A Linked List as a Functor

The native list type in Haskell is a linked list, and it is also a functor. Let’s reimplement it from scratch so that we can see how it works. In functional languages a linked list is often called a Cons List, and is a recursive data structure formed by cons cells in which each cell contains two elements, the first element is called the head or car, and is one value in the list, and the second is called the tail or cdr (pronounced kooder) and is another Cons Cell, which in turn holds one value and another cons cell and so on.

The terms cons, car, and cdr come from Lisp, and they are the three main functions used in that language to work with lists. Cons constructs a list, car returns the head of the list, and cdr returns the tail of the list.
In haskell, the car and cdr functions for lists are actually just named head and tail, but I am using the Lisp-named version here to avoid us getting confused by head and tail already defined in the ghci Prelude’s namespace.

A Cons List

data MyList a =
    Empty |
    Cons {
        car :: a,
        cdr :: MyList a
    } deriving(Show)

Above we define a type MyList to hold a parameterized “variable” type a. This time you can see that there are two data constructors, and that they don’t have to have the same name as the type itself as we chose in the definition of Container. Empty is the data constructor for constructing an empty list, and Cons is a data constructor of two arguments. This could have been written data MyList a = Empty | Cons a (MyList a) deriving(Show) but what we’ve used here is called record syntax.

You can see the first argument to Cons is something of type a, and the second is of type MyList a, record syntax gives names to each argument, and also provides accessor functions by the same name to get at each data member. This is a recursive data structure because MyList itelf is used on the right hand side of the definition. Let’s play with this for a moment, and construct some lists.

--  An empty list
Empty
=> Empty
--  A list with one item in it
Cons 1 Empty
=> Cons {car = 1, cdr = Empty}
--  A list with three items in it
let list = Cons 1 (Cons 2 (Cons 3 Empty))
list
=> Cons {car = 1, cdr = Cons {car = 2, cdr = Cons {car = 3, cdr = Empty}}}
--  Using the functions provided by record syntax
car list
=> 1
cdr list
=> Cons {car = 2, cdr = Cons {car = 3, cdr = Empty}}
car (cdr list)
=> 2

A similar C++ class might look like this:

Quick C++ linked list

template <class T>
class Cons {
    T _car;
    Cons *_cdr;
    public:
    Cons(T car, Cons *cdr) :
        _car(car),
        _cdr(cdr) {}
    void show(){
        cout << _car << " ";
        if(_cdr){
            _cdr->show();
        }
    }
};
typedef Cons<int> ConsInt;
ConsInt list = ConsInt(1, new ConsInt(2, new ConsInt(3, NULL)));
list.show();
=> 1 2 3

Haskell itself has a builtin list data type with some syntactic sugar. The haskell equivalent of the Cons data constructor is :, the Empty data constructor is [], while car and cdr are head and tail. Haskell lists also have a pretty show function defined for them.

Built in List type

--  An empty list
[]
=> []
--  A list with one item in it
1 : []
=> [1]
--  Syntactic Sugar
[1]
=> [1]
--  A list with three items in it
let list = [1, 2, 3]
list
=> [1, 2, 3]
--  Using the functions provided List
head list
=> 1
tail list
=> [2, 3]
head (tail list)
=> 2

So MyList is now a perfectly functioning linked list, but one thing it is missing over the builtin list type is the fact that it is not yet an instance of Functor, but we can fix that.

Before, with Container the idea behind fmap was to unwrap a value, apply a given function to it and wrap it back up again, and here the only difference is that there are two data members in a cons cell; the value, and another cons cell. The recursive nature of a cons list gives us a clue that fmap will also be recursive. We’re going to apply the given function to the value in the cell, and then call fmap again on the cons cell containing the rest of the list so that each of its values will be transformed as well.

MyList becomes a Functor

instance Functor MyList where
    fmap _ Empty = Empty
    fmap f (Cons x xs) = Cons (f x) (fmap f xs)

Here fmap is defined twice. Again we can see data constructors on the left hand side being used to deconstruct our list for pattern matching. If we try to fmap an empty list the first declaration is matched and chosen, the function argument itself is thrown away and and Empty list is constructed and returned. Fmapping an empty list is an empty list.

If a cons cell is matched, the data constructor Cons is used to deconstruct the cell and bind its two values to x and xs. xs as in the plural of x What this does is separates the head of the list from the rest of it, you perform an operation on the head and send the rest for processing by recalling the function recursively. This is a common pattern you’ll see when writing recursive functions.

The base case is encountering the Empty cell at the end of the list, which stops the recursion. The function only knows how to handle one element at a time, and relegates the rest of the work to itself during future calls, and so on until it meets the base case and stops.

Transforming a whole list

let list = Cons 1 (Cons 2 (Cons 3 Empty))
fmap (\x -> x * 2) list
=> Cons {car = 2, cdr = Cons {car = 4, cdr = Cons {car = 6, cdr = Empty}}}
-- Same thing with the builtin list
fmap (\x -> x * 2) [1, 2, 3]
=> [2,4,6]

You’ve probably seen this before in other non-functional languages since many parts of the functional paradigm are being adopted in imperative languages all the time. Ruby and Python both define map on their list types. Ruby passes a code block into its map method and that looks like this:

Ruby’s map

[1, 2, 3].map do |x|
  x * 2
end
=> [2, 4, 6]