There is a saying to the effect of
First get your data structures right, then the program will write itself.
The following rant is about the question: When are (singly linked) lists the right structure to use? My answer is: almost never, despite appearances that would suggest the contrary.
At first glance, the designers of the programming language Haskell were madly in love with singly linked lists. They even granted notational privileges: e.g., the data constructor for a non-empty list ("cons") uses just one letter (`:`).
Is one concrete data type (singly linked lists) more important than the concept of static typing? Looking at the Haskell standard, the answer seems to be Yes. When declaring the type of an identifier, we must use two letters (`::`), because the natural (that is, mathematical) notation `:` is taken.
Also, note that the list type constructor has just two letters (`[]`). Can you guess Haskell's type with the next shortest name (hint: three letters)?
One goal of language design is to make typical expected usage easy to write. Once the design is cast in stone, this becomes self-fulfilling: unsuspecting users of the language will write programs in what they think is idiomatic style, as suggested by built-in syntactical convenience. So it seems that it is typical Haskell to: not declare types for identifiers, and use machine numbers and lists all over the place.
This, with the lists, I mean, needs to stop. Now.
Lists do have their purpose, but it's rather limited. In fact, there is just one: iteration, that is, traversing the elements of a sequence in order, front to back.
If you use lists for anything else, in particular as a data store that you access out of order or by index, then you're doing it wrong. I will explain why, and what you should do instead. This turned out to grow into a rather large text. As an appetizer, here are some bumper-sticker slogans extracted from the text:

- If your program accesses a list by index (the `!!` operator), then your program is wrong.
- If your program uses the `length` function, then your program is wrong.
- Where a function of yours merely iterates over its input, generalize its type from lists to `Foldable`.

Sounds unbelievable? Read on!
Certainly Haskell builds on the legacy of LISP, the very first functional programming language - which has lists in its name already. Indeed, singly linked lists were the only way to structure nested data in LISP.
This is a huge advantage (it's so uniform), but also a huge disadvantage (you cannot distinguish data - they all look like nested lists). For discriminating between different shapes of nested data, people then invented algebraic data types, pattern matching, and static typing. The flip side of static typing is that it seems to hinder uniform data processing. So, language designers invented polymorphism (type constructors) and generic programming, to get back flexibility and uniformity, while keeping static safety.
This was all well known at the time of Haskell's design, so why are lists so prominent in the language? The one major design goal for Haskell was: to define a functional programming language with static typing and lazy evaluation. For functional programming with static typing and strict evaluation, the language ML was already well established, cf. [D. A. Turner: Some History of Functional Programming Languages](https://www.cs.kent.ac.uk/people/staff/dat/tfp12/tfp12.pdf). Haskell has lazy evaluation for function calls, including constructor applications, and a prime showcase for this is that we can handle apparently infinite data, like streams:
data Stream a = Cons a (Stream a)
Such streams never end, but we can still do reasonable things with them by evaluating finite prefixes on demand - in other words, lazily.
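To make this concrete, here is a small sketch (my own example, repeating the Stream type from above for self-containment) of consuming a finite prefix of an infinite stream on demand:

-- Sketch (not from the original text): an infinite stream of naturals,
-- and a function that lazily extracts a finite prefix.
data Stream a = Cons a (Stream a)

nats :: Stream Integer
nats = go 0 where go n = Cons n (go (n + 1))

takeS :: Int -> Stream a -> [a]
takeS n (Cons x xs)
  | n <= 0    = []
  | otherwise = x : takeS (n - 1) xs

-- takeS 5 nats == [0,1,2,3,4]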
Now a standard Haskell list is just such a stream that can end, because there is an extra constructor for the empty stream
data Stream a = Nil | Cons a (Stream a)
and the problem with Haskell lists is that people tend to confuse what the type `[a]` stands for: a stream, that is, a sequence of elements produced on demand and traversed in order, or a collection, that is, a container that stores elements for later access.
If you have an object-oriented background, then this can be phrased as the confusion between Iterator and Collection.
A Haskell list is a very good iterator - we might even say, it is the essence of iteration - but it makes for a very bad implementation of a collection.
The (historic) first impression of "functional programming is inherently list processing" seems hard to erase. Indeed it is perpetuated by some Haskell teaching texts, some of which are popular. If a text(book) aimed at beginners presents `Int`, `String` and `[Int]` on pages 1, 2, 3, but the keywords `data` and `case` (and the assorted concepts of algebraic data types and pattern matching) appear only after the first two hundred pages, well, then something's wrong.
For example, the shiny new http://haskell.org/ gets this exactly wrong. At the very top of the page, it presents example code that is supposed to highlight typical features of the language. We see a version of Eratosthenes' sieve that generates an infinite list of primes. Declarative it certainly is. But see - Lists, numbers. Must be typical Haskell.
Also, that example code features list comprehensions - that's more syntactic sugar to misguide beginners. And it claims "statically typed code" in the previous line, but - there is no type annotation in the code. Yes, I know, the types are still there, because they are inferred by the compiler, but what does a beginner think?
The icing on all this is that the one thing that this example code really exemplifies, namely: infinite streams with lazy evaluation, is not mentioned anywhere in the vicinity.
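For reference, the front-page example is roughly the following (a sketch from memory, not a verbatim copy), with the type annotation that the page omits added:

primes :: [Integer]
primes = filterPrime [2 ..]
  where
    filterPrime (p : xs) =
      p : filterPrime [x | x <- xs, x `mod` p /= 0]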
So, when should we use lists to store data? The answer is simple: never use lists for data storage with out-of-order access.
Accessing an element of a singly linked list at some arbitrary position i takes time proportional to i, and you do not want that.
The only operations that are cheap with a singly linked list are accessing its first element (the head of the list) and prepending a fresh first element.
So can we access any of the other elements in any reasonable way? Yes, we can access the second element, but only after we drop the first one. The second element becomes the "new head", and the "old head" is then gone for good. We are singly linked, we cannot go back. If we want to access any element in a list, we have to access, and then throw away, all the elements to the left of it.
In other words, Haskell lists are very similar to what object-oriented people call the iterator pattern (e.g., the Iterator interface in Java).
Or, we can think of a list as a stack, and going to the next element is its "pop" operation.
Can we really never go back? Well, if `xs` is an iterator for some collection, then `x : xs` is an iterator that will first provide `x`, and after that, all elements of `xs`. So, thinking of a list as a stack, the list constructor (`:`) realizes the "push" operation.
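To spell out the stack analogy (the function names push and pop are mine, not from any library):

-- Sketch: a list viewed as a stack.
push :: a -> [a] -> [a]
push = (:)

pop :: [a] -> Maybe (a, [a])
pop []       = Nothing
pop (x : xs) = Just (x, xs)

-- pop (push 1 [2,3]) == Just (1, [2,3])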
On the other hand, the "merge" function (of "mergesort" fame, it merges two monotonically increasing sequences) is a good use case for Haskell lists, since both inputs are traversed in order (individually - globally, the two traversals are interleaved). Its source code is
merge :: Ord a => [a] -> [a] -> [a]
merge as@(a:as') bs@(b:bs')
  | compare a b == GT = b : merge as bs'
  | otherwise         = a : merge as' bs
merge [] bs = bs
merge as [] = as
We never copy lists, and we only ever pattern match on their beginning.
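For example, `merge [1,3,5] [2,4,6]` evaluates to `[1,2,3,4,5,6]`, and thanks to laziness we can even merge two infinite monotone lists and then consume a finite prefix of the result.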
Indeed you can easily (in Java or C#) write a `merge` function that takes two iterators for monotone sequences, and produces one. Try it, it's a fun exercise. Use it to learn about `yield return` in C#.
Back to Haskell: how should we implement merge sort then, given `merge`? The truthful answer is: we should not, see below. But for the moment, let's try anyway, as an exercise. A straightforward approach is to split long input lists at (or near) the middle:
msort :: Ord a => [a] -> [a]
msort [] = [] ; msort [x] = [x]
msort xs = let (lo, hi) = splitAt (div (length xs) 2) xs
           in  merge (msort lo) (msort hi)
but this is already wrong. Here, `xs` is not used in a linear way, but twice: first in `length xs` (to compute the position of the split), and secondly in `splitAt ... xs`, to actually do the split. This is inefficient, since the list is traversed twice.
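A one-pass alternative (the names halve and msort' are mine) is to deal the elements alternately into two halves; that each half loses the original order does not matter, since we sort them anyway:

-- Sketch: split a list in one traversal by dealing elements alternately.
halve :: [a] -> ([a], [a])
halve []           = ([], [])
halve [x]          = ([x], [])
halve (x : y : zs) = let (xs, ys) = halve zs in (x : xs, y : ys)

msort' :: Ord a => [a] -> [a]
msort' []  = []
msort' [x] = [x]
msort' xs  = let (lo, hi) = halve xs
             in  merge (msort' lo) (msort' hi)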
Indeed, the Haskell standard library contains an implementation of mergesort that does something different. The basic idea is
msort :: Ord a => [a] -> [a]
msort xs = mergeAll (map (\x -> [x]) xs)
  where
    mergeAll []  = []            -- needed so that msort [] does not crash
    mergeAll [x] = x
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (a : b : xs) = merge a b : mergePairs xs
    mergePairs xs           = xs
But the actual implementation has an extra provision for handling monotone segments of the input more efficiently. Make sure to read the discussion mentioned in the comments atop the source.
But - if you sort a list in your application at all, then you're probably doing something wrong already. There's a good chance that your data should have never been stored in a list from the start.
What to use instead is described in any textbook on data structures and algorithms, and if it's in any such book, then there's bound to be a Haskell implementation of it. I do not aim to replace textbooks here, so I just give a summary, and pointers to typical implementations, and I recommend you go visit the library for details. (A library is a house with rooms with books. Yes?)
A data store provides the abstract type of a set (just data) or a map (from keys to data). Which concrete type to use for the store's implementation depends on the operations that you want to perform on the set or map, as a whole and on single elements.
If you want to access data by key, and you have a total order on the keys, use a balanced search tree (Data.Set, Data.Map). You can expect all operations on individual elements to take logarithmic cost in the number of elements stored, and that's much better than the linear cost you will pay for lists.
Internally, these libraries use size-balanced binary trees:
data Map k a = Bin Size k a (Map k a) (Map k a)
             | Tip
Yes, it's as simple as that.
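A minimal sketch of what using such a map looks like in client code (the example data is made up):

import qualified Data.Map.Strict as M

phone :: M.Map String Int
phone = M.fromList [("alice", 1234), ("bob", 5678)]

bobsNumber :: Maybe Int
bobsNumber = M.lookup "bob" phone         -- O(log n)

withCarol :: M.Map String Int
withCarol = M.insert "carol" 9999 phone   -- O(log n), persistent update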
What's more, these libraries provide bulk operations (intersection, union, difference). Use these! They will make your programs more readable and more efficient. Consider this: you could compute the union of (the sets represented by) `A` and `B` by inserting each element of `B` into `A`, one after another. This already works if `B` is a list, and is the idea behind java.util.Collections.addAll. But we know more about `B` - it is represented by a balanced tree. So the library can use the balanced tree structure of both `A` and `B` to speed up the process. A sequence of individual inserts will only use the structure of `A` (and of the trees built from it).
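A sketch of the difference (function names are mine; Data.Set is real):

import qualified Data.Set as S

unionBulk :: Ord a => S.Set a -> S.Set a -> S.Set a
unionBulk a b = S.union a b                        -- exploits both balanced trees

unionOneByOne :: Ord a => S.Set a -> S.Set a -> S.Set a
unionOneByOne a b = foldr S.insert a (S.toList b)  -- only uses the structure of a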
For some applications, access to the collection is restricted. For example, in a stack, you will only operate with its top element. We learned earlier that Haskell lists are stacks. An extension of this is a priority queue, which provides quick access to the entry with highest priority (but only to this one), plus arbitrary inserts.
If your keys are (or can be mapped to) integers, then use a Patricia tree (Data.IntMap) or a hash table (Data.HashMap). Then you can expect effectively constant cost for individual operations. But a Patricia tree is still a tree? Yes, but its depth is at most the bit size of the key, and often smaller. Hashing is even better. The actual cost depends on how well your hash function avoids collisions.
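A small sketch with Data.IntMap (the keys here are made-up user ids):

import qualified Data.IntMap.Strict as IM

users :: IM.IntMap String
users = IM.fromList [(17, "alice"), (42, "bob")]

name42 :: Maybe String
name42 = IM.lookup 42 users   -- effectively constant cost per operation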
If the set of keys forms a contiguous interval of integers, then you want to use them as indices into a contiguous storage area, which is then called an array in some languages, or a vector in a typical Haskell library (Data.Vector).
With that, access operations have no noticeable cost: to get an element by its index, you need one addition (of the index to the base address), and one memory access (which might of course incur a cache miss). There are even some no-cost bulk operations, because of slicing: A slice is a contiguous sub-vector that is denoted simply by using an existing vector with a different base address and length. The internal representation is indeed
data Vector a = Vector Int Int (Array a)
so a vector consists of (a pointer to) an underlying array that holds the actual data, plus an offset (first component) and a length (second). On the other hand, there is no free lunch - updating or concatenating vectors requires fresh allocation (of an area the size of the result) and bulk copies. That looks expensive, but since all areas are contiguous, it might not be too bad in some cases.
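A sketch of constant-time indexing and slicing with Data.Vector:

import qualified Data.Vector as V

v :: V.Vector Int
v = V.fromList [0 .. 9]

atFive :: Int
atFive = v V.! 5          -- O(1): one addition, one memory access

middle :: V.Vector Int
middle = V.slice 3 4 v    -- O(1): same array, different offset and length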
What if keys are contiguous integers, and we need efficient access and efficient concatenation? Use Data.Sequence, which gives us logarithmic cost for access and for concatenation, and, as a bonus, constant time access to first and last element of the sequence. How does this work? It's a balanced tree where the root has extra pointers to leftmost and rightmost leaf. The code uses a nice type-level trick:
newtype Seq a = Seq (FingerTree (Elem a))
data FingerTree a = EmptyT | Single a | Deep Int (Digit a) (FingerTree (Node a)) (Digit a)
data Node a = Node2 Int a a | Node3 Int a a a
`FingerTree` is a non-regular type constructor: notice that the recursion changes the type argument (from `a` to `Node a`). For implementation details, RTFC.
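A sketch of the operations mentioned above, using Data.Sequence:

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>), (><))

s :: Seq Char
s = Seq.fromList "abc" |> 'd'    -- append at the end: O(1)

t :: Seq Char
t = s >< Seq.fromList "efgh"     -- concatenation: O(log (min n m))

atTwo :: Char
atTwo = Seq.index t 2            -- indexed access: O(log (min i (n - i)))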
What about strings? Conceptually, a string is a sequence of characters. So, you could use a list if you access the characters in order; in all other cases, it's something like a vector. Recommended libraries are Data.Text (if encodings are an issue) and Data.ByteString (for raw binary data). Both come in two variants: a strict one, where each string is actually one vector, as described above, and a lazy one, where each string is a lazy singly linked list of vectors.
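A tiny sketch with the strict Data.Text variant:

import qualified Data.Text as T

greeting :: T.Text
greeting = T.toUpper (T.pack "hello, world")   -- packed text, not a [Char]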
There is much more to be said about the (mis-)use of strings in Haskell. The Haskell standard definition `String = [Char]` uses lists, so all the problems of lists mentioned above apply directly, and then some - coming from the implied assumption that characters are basically numbers, which they are not, in today's world of Unicode. Also, the standard prescribes that `String` be used in parsing (class `Read`) and printing (class `Show`). All this makes it a suitable topic for a separate rant, but we'll keep our focus on lists for now.
As an exercise, browse the function signatures and specifications in Data.List and mark those that are misguided and misguiding by the above reasoning.
Hint: a lot of them are. It starts with the index operator
(!!) :: [a] -> Int -> a
Each time you use the `!!` operator, there's a 99 percent chance that you're using a list in a non-linear, and thus inefficient, way. Check the arguments of `xs !! i` - if there will be another call `xs !! j` with the same left argument, then you will be traversing the list `xs` twice.
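A sketch of the anti-pattern, and a fix that traverses the list only once (function names are mine):

-- wrong: each (!!) walks the list from the front again
adjacentPairsBad :: [a] -> [(a, a)]
adjacentPairsBad xs = [ (xs !! i, xs !! (i + 1)) | i <- [0 .. length xs - 2] ]

-- better: one linear traversal, no indices, no length
adjacentPairsGood :: [a] -> [(a, a)]
adjacentPairsGood xs = zip xs (drop 1 xs)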
As an aside, what about lists in other languages' standard libraries? Well, it depends. For instance, java.util uses the name List not for a concrete datatype, but for an interface. This comes with two main implementations: LinkedList and ArrayList. Now, the linked lists are actually doubly linked, which means that their iterator can go back and forth. Other than that, all the reservations about linked lists apply. If you use the iterator of a linked list, then you're fine; if you access it in any other way, then you're not. In particular, access by index is slow.
An ArrayList is essentially what we called a vector. Exercise: do they provide efficient slicing? Answer: yes, but not exactly at this type. Use the API docs and source to find out. Also, lament the fact that their API docs do not give expected runtime costs (leaving you no choice but to read the source), and praise the authors of the Haskell containers library. They document resource usage so well that it deserves to be linked again.
There are two exemptions from the rules stated above.
If you are positively certain that your collections have a size bounded by, say, three, then you can implement them in any way you want, because all operations will run in constant time anyway.
Then, education. With lists, you can exemplify pattern matching and recursion (with streams, you exemplify co-recursion). Certainly each student should see and do recursion early on in their first term. But the teacher must make very clear that this is in the same ballpark as exercises like computing Fibonacci numbers, or implementing multiplication and exponentiation for Peano numbers: the actual output of the computation is largely irrelevant. What is relevant are the techniques they teach (algebraic data types, pattern matching, recursion, induction), because these have broad applications. So, indeed, when I teach algebraic data types, I do also use singly linked lists, among other examples (Bool, Maybe, Either, binary trees), but I always write (and have students write)
data List a = Nil | Cons a (List a)
and I avoid Haskell's built-in lists altogether. But, back to "real code".
We learned above that the one legitimate use case for lists is iteration. This is an important one. It's so important that the Glasgow Haskell compiler applies clever program transformations in order to generate efficient machine code for programs that combine iterators (that is, lists) in typical ways.
The most efficient code is the one that is not there. Indeed, several program transformations in the compiler remove intermediate iterators (lists). Consider the expression
sum $ map (\x -> x*x) [1 .. 1000]
It contains two iterations: the list `[1 .. 1000]` (the producer), and another list (of squares), built by `map`, which is a transformer. Finally, we have `sum` as a consumer (its output is not a list, but a number).
We note that a transformer is just a consumer (of the input list) coupled to a producer (of the output list). Now the important observation is that we can remove the intermediate list between a producer and the following consumer (where each of them may actually be part of a transformer).
This removal is called fusion. With fusion, lists are used prominently in the source code, but the compiler will transform this into machine code that does not mention lists at all! In particular, the above program will run in constant space. It allocates nothing on the heap, and it needs no garbage collector.
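To get an intuition for the result, here is a hand-written sketch of (roughly) the loop that the fused program boils down to; note that no list appears:

-- roughly the shape of the fused code: a strict counting loop, no lists
sumOfSquares :: Int -> Int
sumOfSquares n = go 1 0
  where
    go i acc
      | i > n     = acc
      | otherwise = go (i + 1) (acc + i * i)

-- sumOfSquares 1000 == sum (map (\x -> x * x) [1 .. 1000])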
For historic reference and detail, see Phil Wadler: Deforestation: transforming programs to eliminate trees (1990), and Duncan Coutts, Roman Leshchinskiy, Don Stewart: Stream Fusion: From Lists to Streams to Nothing at All (2007).
Now I will present a more recent development (in the style of writing libraries for Haskell) that aims to remove lists from the actual source code.
We have, e.g., `and [True, False, True] == False`, and you would think that `and` has type `[Bool] -> Bool` (it maps a list of Booleans to a Boolean). But no, the actual type is `and :: Foldable t => t Bool -> Bool`, and indeed we can apply `and` to any object of a type `t a` where `t` implements `Foldable`.
As a first approximation, `Foldable t` means that `t` is a type constructor (it has kind `* -> *`) and there must be a function `toList :: t a -> [a]`. So, `c` of type `t a` with `Foldable t` can be thought of as some kind of collection, and we can get its elements by calling `toList`. This is similar to Java's `Iterable<A>`: for a collection `c`, `toList c` plays the role of `c.iterator()`.
For example, Data.Set (mentioned above) does this, so we can make a set `s = S.fromList [False, True]` and then write `and s`. It will produce the same result as `and (toList s)`, where this time `and :: [Bool] -> Bool`.
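Spelled out as a compilable snippet (assuming the usual qualified import for the S alias used above):

import qualified Data.Set as S

s :: S.Set Bool
s = S.fromList [False, True]

result :: Bool
result = and s    -- False, same as and (S.toList s)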
Why all this? Actually, this is not about `toList`. Quite the contrary - the `Foldable` class exists to make it easier to avoid lists. Typically, we would not implement or call the `toList` function, but use `foldMap` or `foldr` instead.
We give another example. Ancient standard libraries used to have a function `sum :: Num a => [a] -> a`, but now its actual type is `sum :: (Num a, Foldable t) => t a -> a`, and so we can take the sum of all elements of a set `s :: Set Int` simply via `sum s`, instead of writing `sum (toList s)`. See - the lists are gone from the source code.
Let us look at the type of `foldMap`, and the class `Monoid` that it uses. (And, if you're new to Haskell, remember that `class` here is more like `interface` in Java - it contains method signatures.)
class Foldable t where
  foldMap :: Monoid m => (a -> m) -> t a -> m

class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m
This is fancy syntax for the following (slightly re-arranged)
foldMap :: m -> (a -> m) -> (m -> m -> m) -> t a -> m
So when you call `foldMap` over some collection (of type `t a`), then you have to supply the result value (`mempty :: m`) for the empty collection, the mapping (of type `a -> m`) to be applied to individual elements, and a binary operation (`mappend :: m -> m -> m`) to be used to combine results for sub-collections.
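For instance, here is a sketch of a Foldable instance for a plain binary tree (my own simplified type; the real Data.Set instance follows below):

data Tree a = Leaf | Node (Tree a) a (Tree a)

instance Foldable Tree where
  foldMap _ Leaf         = mempty
  foldMap f (Node l k r) = foldMap f l `mappend` (f k `mappend` foldMap f r)

-- sum (Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)) == 6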
Let us see how this is used to implement `sum` over Data.Set, the type mentioned above, repeated here for convenience:
data Set a = Bin Size a (Set a) (Set a)
           | Tip
instance Foldable Set where
  foldMap f t = go t
    where
      go Tip           = mempty
      go (Bin 1 k _ _) = f k
      go (Bin _ k l r) = go l `mappend` (f k `mappend` go r)
This means that `foldMap f s`, for `s :: Set a` of size larger than one, does the following: each branch node `n` is replaced by two calls of `mappend` that combine the results of the recursive calls on the subtrees `l` and `r` of `n`, and `f k` for the key `k` of `n`; and we make sure to traverse the left subtree, then the key at the root, then the right subtree.
The definition of `sum` is
class Foldable t where
  sum :: Num a => t a -> a
  sum = getSum #. foldMap Sum
  -- (#.) is a library-internal, coercion-friendly variant of (.)
with the following auxiliary types
newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a) where
  mempty = Sum 0
  Sum x `mappend` Sum y = Sum (x + y)
So, when we take the `sum` of some `Set a`, we do a computation that adds up all elements of the set, but without an explicit iterator - there is no list in sight.
Now, when the types of `sum`, `and`, and many more functions were generalized from lists to `Foldable` a while back, some over-the-top `Foldable` instances were introduced that result in funny-looking semantics like `maximum (2,1) == 1`, which is only "natural" if you think of the curried type constructor application `(,) a b`, remove the last argument, and demand that `(,) a` "obviously" must be `Functor`, `Foldable`, and so on. Well, this might save some people some keystrokes, but it also moves Haskell into the Matlab league of languages where any code is accepted, with unpredictable results. I know Henning said that, but where exactly? Alas, it's a separate topic, and this page here is long enough already.