Functional Programming for the Functionally Challenged (Like Me) – Part 1

10.02.2015

This is the first post in a series dedicated to presenting solutions to common challenges that developers encounter when moving from an imperative programming approach to functional programming (FP). I will present a series of problems and provide solutions in both Java and Elixir, a functional language running on the Erlang VM.

This is not intended as a Java vs. Elixir comparison. Nor is it intended to be an Elixir primer. It is intended to demonstrate how to solve certain types of problems when moving from an imperative approach to a functional one.

Java was chosen for the imperative solutions because it is familiar to most programmers and has traditionally been used in an imperative style. It is worth noting that Java has begun to adopt some functional aspects, for instance the introduction of the for-each loop way back in Java 5 and more recently lambda expressions in Java 8.

The Challenge

One of the first challenges I encountered when starting with functional programming was how to replace the looping constructs (for loop, while loops, etc.) that I knew and understood from my imperative background. Loops are used all the time in imperative programming for a variety of reasons

We want to execute some side effect(s) related to a collection (or data source), like printing it out.
We want to find some element in a collection (and maybe do something with it).
We want to transform a collection into a different collection (or collections) In a (mostly) imperative language like Java, there are only a few ways to do these things, and they all involve loops of some sort. Let’s look at an example of the first task by printing out a list.

Printing a List

Say we have a list of records and that those records have a “name” field. Now suppose that we want to print out all the names from the records in our list. Strictly speaking, printing a list relies on “side effects” and is not purely functional. How we go about it, however, can follow the imperative style or the FP style. In Java we might store our records as Person objects in an ArrayList and iterate over the list to print out the names like this:

for(int i=0; i < list.size(); i++) {
  Person val = list.get(i);
  System.out.println(val.getName()));
}

This code is fairly straightforward, but it’s worth mentioning that we must set the bound of our loop’s initial and terminating conditions. A mistake here can lead to a missed name or an IndexOutOfBoundsException.

Alternatively we could use an iterator instead of a for loop, which would mitigate some of the bound related concerns, but the code inside the iteration would look similar to the code inside our for loop. Even better would be to use a for-each loop, but then we would be moving into FP territory and I don’t want to blur the line too much between the two approaches.

So how might we do something like this in Elixir given that we don’t have looping control flow?¹

Elixir makes this easy by providing functions for operating on enumerable collections in the Enum module. Several of these let us apply a function to all (or some) of the items in a collection.

In our case we can use the each function to print out our names. We store our records as maps in a list and call each on the list.

Enum.each(list, fn item -> IO.puts item["name"] end)

The each function takes two arguments, a collection and a function²to execute on each element of the collection. In Elixir we define an anonymous function³ using thenfn arg(s) -> ... end syntax. The code between the -> and end defines the function body. So our call to fn defines a function that takes a list element as its item argument and prints the value for theitem’s “name” key using the puts function from the IO module.

If you have explored FP at all you have probably heard of the map function that is similar to each. Enum also provides a map function, and we could use it here as well. There is a difference in the two, however.each is specifically designed to execute side-effects and merely returns the symbol :ok when it completes, whereas map returns a new collection composed of the results of running the given function on each item in the input collection.

Notice that unlike the Java implementation, with Elixir we don’t specify how to do what we want so much as declare what we want do (print out the “name” of each item in our list). This declarative style is at the heart of functional programming and is one of the fundamental differences between imperative programming and FP.

Let’s try something slightly more complicated. Suppose that some of our records don’t have “name” entries. We would like to avoid printing outnulls for those that don’t. We can modify our Java code slightly to accomplish this.

for(int i=0; i < list.size(); i++) {
  Person val = list.get(i);

  String name = val.getName();

  if(name != null) {
    System.out.println(name);
  }
}

This code is the same as the previous example except that we must explicitly check our values to avoid printing nulls.

We could take a similar approach with Elixir and add a check in our anonymous function to only print names that are not nil (Elixir’s version of null). We have other options, however. Enum provides the filter function, which iterates over a collection and applies a given function to each item. Any items for which the given function returns a “truthy” (not false or nil) value are returned in a list.

We can combine this with the each function (or map) in a couple of different ways. One way is to capture the output of the filter function in a variable and pass that to each:

filtered =  Enum.filter(list, fn item -> item["name"] end)
Enum.each(filtered, fn item -> IO.puts item["name"] end)

We can write this more succinctly as:

Enum.each(Enum.filter(list, fn item -> item["name"] end),
  fn item -> IO.puts item["name"] end)

but this is harder to follow (for me at least). Fortunately, Elixir supports a different form of function composition using the pipe operator, |>, which takes the value of the expression on its left and applies it as the first argument to the function on its right. Thus we can write:

list |> Enum.filter(fn item -> item["name"] end) |> Enum.each(fn item -> IO.puts item["name"] end)

You read this as “take the value of list and use it as the first argument to filter with the given function, then take the output from that and use it as the first argument to each with its given function.” Notice that we don’t specify the first argument to the filter or each functions as it is assigned implicitly by the pipe operator. The pipe operator can be used repeatedly to create arbitrarily long function compositions, piping the output of each segment of the chain into the first argument of the next.

This style is reminiscent of the unix pipe operator and allows us to think in terms of data flowing through a pipeline of operations. Contrast this with the imperative approach in the Java example.

We have yet another option with the filer_map function. It provides even tighter composition of the filter and map functionality. filter_map takes as arguments a collection and two functions. Each item in the collection is passed to the first function, and each item for which that function returns a truthy value is passed to the second function. The values of the second function for these items are composed into a list.

Enum.filter_map(list, &(&1["name"], &(IO.puts(&1["name"])))

Here we take advantage of the second way of expressing anonymous functions in Elixir. The &(...) syntax declares an anonymous function, with arguments passed to it bound to &1, &2, etc. This short form is convenient when writing concise functions like the ones we have here. I find the longer form slightly more readable with it’s named arguments, so for the sake of clarity I will stick to the longer form for the rest of this series.

Finding an Element in a List

Moving onto our next challenge, how might we find elements in a list? Let’s continue with our previous example of a collection containing records, and say that we want to find the record with the value “Bob” for the key “name”. Our Java implementation might look like this:

Person bob = null;

for(int i=0; i < list.size(); i++) {
  Person val = list.get(i);

  String name = val.getName();

  if(name != null && name.equals("Bob")) {
    bob = val;
    break;
  }
}

Again, there are points worth mentioning about this code. The first point is that we must declare a variable outside of our loop to keep track of our match and explicitly assign to this variable when we find a match inside our loop.nWhen checking for a match inside the loop, we must test to make sure the name is not null to avoid a NullPointerException on line 8. The final point is that we must explicitly break out of our loop when we find a match to avoid the overhead of processing unnecessary records.

How might we do something like this in Elixir given that we don’t have mutable data⁴ to keep track of our match? Elixir makes this easy by providing the find function in the Enum module.

bob = Enum.find(list, fn item -> item["name"] == "Bob" end)

The find function takes two arguments, the collection we want to search and a function that determines if a collection item matches our query. The body of the anonymous function, item["name"] == "Bob", simply evaluates to true if the value for then“name” key of its item argument equals “Bob” and false if not.

find will iterate over all the items in the collection, passing each item to the anonymous function until the that function returns true. find returns the first item in the collection that matches or nil if there was no match.

Note that we don’t have to deal with any of the issues from the Java implementation. There is no bounds checks, no checking for nulls (nils in Elixir) to avoid exceptions, and no need to terminate our search when we find the item we want (find handles that).

Transforming a Collection

Suppose we want to take the list from the previous example and create a new list that contains the names for all the records for which the value stored for the “age” key is greater than 21.

In Java we might use code like the following:

ArrayList newList = new ArrayList();
for(int i=0; i<list .size(); i++) {
  Person person = list.get(i);

  if (person.age > 21) {
    newList.add(person.getName());
  }
}

The Enum module in Elixir provides many functions for transforming collections, but in this case we probably want to use a list comprehension.

new_list = for item < - list, fn item -> item["age"] > 21 end do item["name"] end

for takes a collection, an optional predicate function that selects a subset of the items in the collection (the anonymous function that tests the item’s age), and an expression that returns some function (in the general sense) of the items in the subset

in this case it simply selects the value for the “name” key.

Read this as “construct a new collection consisting of the names of each item in the list that has a value for “age” greater than 21.”

It’s important to distinguish the for in Elixir from a for loop in Java.for in Elixir is not a control flow construct. Although you can have side effects in a for, its primary purpose is to construct a new collection, and this collection is the return value from the for. This contrasts with the Java implementation in which we have to declare our new collection outside our loop and then append values to it piece by piece in the body of the loop.

Suppose we want to sort the names as well. In Java we either sort the new collection as we construct it (ugh), or we can call the sort method on thenCollections class.

Collections.sort(newList);

This almost has an FP feel to it, but it’s not, because this sorts our new list in place, i.e., mutates it.

With Elixir we can similarly call Enum.sort on new_list, but we want to start thinking about data transformations in FP, so instead we pipe the output of our list comprehension directly to sort.

new_list = for item < - list, item["age"] > 21 do item["name"] end |> Enum.sort()

You might be objecting at this point by asking “But these pipelines are transforming data – how is that different from changing our list in place with Java’s Collections.sort()?” The difference is that collections transformed by Elixir functions are copied – the original collection never changes. And right about now you are probably thinking how inefficient that must be in terms of memory and CPU cycles. I’m going to dodge this as outside the scope of this series, but go read about persistent data structures and have your mind blown.

Let’s look at another example. Suppose we want to partition our collection into two different collections, one collection containing all the items where the value for the “age” key is less than 21 and one consisting of all the rest.

In Java this would not be much different from our previous example:

ArrayList underAge = new ArrayList();
ArrayList legal = new ArrayList();

for(int i=0; i<list .size(); i++) {
  Person person = list.get(i);

  if (person.age < 21) {
    underAge.add(person);
  } else {
    legal.add(person);
  }
}

In Elixir, we have yet another higher order function from Enum, thenpartition function. This function is similar to the filter function, but instead of discarding items it splits them into two lists, one for the items for which the given function returns a truthy value and one for everything else.

{under_age, legal} = Enum.partition(list, fn item -> item["age"] < 21 end)

Again, notice the declarative style here. We want to partition our list into one list with items with “age” under 21 and one with the rest. Also note that partition returns a tuple that we can match against (see Elixir pattern matching) to bind our two lists directly to the variables under_age and legal. Compare this to the Java implementation where we had to declare our lists outside our loop and append items to each list explicitly using conditional logic inside our loop.

To Infinity and Beyond

All the previous examples have one thing in common; they all work with collections that fit into memory. Often we need to read and process a data source that won’t fit into memory (like a really big file) or is effectively infinite (like reading from a data stream). This is a classic use for loops in the imperative approach.

So let’s compare how we handle this using a functional approach.nLet’s implement functionality similar to the Unix grep command in which we will read in a file line by line and print out any lines that contain a given string. We’ll make things more interesting by printing out the line number with each matching line. To keep things simple we will assume that there are no lines too big to fit into memory.

A straightforward imperative⁵ Java implementation looks like this (ignoring error handling):

FileInputStream fs = new FileInputStream(filename);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));

String line;
int lineNum = 1;

while ((line = br.readLine()) != null) {
  if (line.indexOf(query) != -1) {
    System.out.format("%d - %s%n", lineNum, line);
  }
  lineNum++;
}

br.close();

We use a while loop to read from the open file until there are no lines left. Inside the loop we check the current line to see if it contains our query term. If it does, we print out the line with its line number. Fairly simple stuff.

So how do we handle data sources of indefinite size in Elixir? Not surprisingly, Elixir provides a module for this. The Stream module provides the same functionality as the Enum module, but instead of operating on enumerable collections, it operates on streams. In Elixir, streams are enumerables that generate items one by one during enumeration. We will examine a solution based on the Stream module in a later post, but first lets take a look at a more generic solution based on a recursive function.

With a recursive function call we can produce the same effect of an iterative loop in an imperative language. There are two primary differences in the implementation. The first is that instead of specifying a termination condition in a loop predicate we must implement the logic for this in our function. The second difference is that with an imperative approach we can mutate external state with each loop iteration (such as incrementing the line counter), whereas with the recursive approach we must pass any state into our function as normal arguments.

Our Elixir implementation consists of two functions (in Elixir, functions with the same name but different arity are considered different functions). The first function is the actual recursive function that does most of the work.

def grep(file, query, line_num) do
  case IO.read(file, :line) do

    {:error, reason} ->
      IO.puts reason
      File.close(file)

    :eof ->
      File.close(file)

    line ->
      if String.contains?(line, query) do
        IO.puts "#{line_num}: #{line}"
      end
      grep(file, query, line_num + 1)

  end
end

def grep(filename, query) do
  {:ok, file} = File.open(filename, [:read, :utf8])
  grep(file, query, 1)
end

The second function accepts a filename and a query string. It opens the file and passes it to the first function, initializing the line_num argument to 1.

The first function simply reads a line from the file and use a case statement to decide what to do. If the line read results in an error then we simply print the error and close the file.

Similarly, if the output of the line read is and end-of-file then we just close the file. The interesting bit is the last case, where we successfully read a line.nIf the line contains our query string, we print it out along with the line number. In any case, since we did not read an end-of-file, we call ourselves again ⁶ on line 15. We pass the same arguments we received, with the exception of the line number, which we increase by one. In this way we keep reading and processing lines until the file is exhausted.

Notice how we don’t keep track of the line number using an external variable. That bit of state is maintained implicitly in the argument to the function call. There is no mutated state since we are never assigning anything. Every time we want to change state we must pass the new state in as arguments to our function. This can be a tricky bit to understand at first, but its fundamental so take time to let it sink in.

Conclusion

We have seen how we can accomplish tasks in a functional language that typically are solved using looping control flow in imperative languages. Hopefully you are starting to get a feel for the declarative style that is pervasive in FP, and hopefully you are starting to appreciate how it can help you write more expressive, concise code. In the next post we will take a look into our next challenge, working with nested data structures.

Notes

Actually, Elixir does have a for but it is not technically a loop construct, rather it is what is known as a list comprehension. ↩
Yes function – see higher order functions. ↩
Anonymous functions are simply functions that are not named. Technically they are also closures. ↩
Although data is immutable, variables can be rebound to new data, so we could do something similar to the variable assignment happening in the Java loop. Higher order functions like Enum.find prevent a need for this, however. ↩
Java 8 provides streams that provide a more declarative approach to processing I/O. ↩
At this point you may be concerned about a large number of recursive calls exceeding our stack limit. Fortunately Elixir, like many functional languages, provides tail call elimination, so our recursive calls are effectively treated as jumps back to the top of the function. ↩