PROBLEM SOLVING ON UNIX/LINUX SYSTEMS: Java 8: Lambdas, Part 2

by Ted Neward

Learn how to use lambda expressions to your advantage.

The release of Java SE 8 swiftly approaches. With it come not only the new linguistic lambda expressions (also called closures or anonymous methods)—along with some supporting language features—but also API and library enhancements that will make parts of the traditional Java core libraries easier to use. Many of these enhancements and additions are on the Collections API, and because the Collections API is pretty ubiquitous across applications, it makes the most sense to spend the majority of this article on it.

However, most Java developers are likely unfamiliar with the concepts behind lambdas and with how designs that incorporate lambdas look and behave. So it’s best to examine why these designs look the way they do before showing off the final result. To that end, we’ll look at problems in before-and-after form: how to solve each one pre-lambda, and how to solve it post-lambda.

Note: This article was written against the b92 (May 30, 2013) build of Java SE 8, and the APIs, syntax, or semantics might have changed by the time you read this or by the time Java SE 8 is released. However, the concepts behind these APIs, and the approach taken by the Oracle engineers, should be close to what we see here.

Collections and Algorithms

The Collections API has been with us since JDK 1.2, but not all parts of it have received equal attention or love from the developer community. Algorithms, a more functional-centric way of interacting with collections, have been a part of the Collections API since its initial release, but they often get little attention, despite their usefulness. For example, the Collections class sports a dozen or so methods all designed to take a collection as a parameter and perform some operation against the collection or its contents.
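To make that concrete, here is a minimal sketch (not from the original article) that applies a few of those long-standing Collections methods to a plain list of ages. The class name and the data are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// A few of the static "algorithm" methods on java.util.Collections.
class CollectionsAlgorithmsDemo {
    static List<Integer> ages() {
        // Arrays.asList is fixed-size but supports set(), which is all
        // that Collections.sort() and Collections.reverse() need.
        return Arrays.asList(42, 39, 19, 13, 45, 39, 43, 39);
    }

    public static void main(String[] args) {
        List<Integer> ages = ages();
        System.out.println("max: " + Collections.max(ages));           // 45
        System.out.println("min: " + Collections.min(ages));           // 13
        System.out.println("39s: " + Collections.frequency(ages, 39)); // 3
        Collections.sort(ages);     // natural (ascending) order, in place
        Collections.reverse(ages);  // now descending, in place
        System.out.println("descending: " + ages); // [45, 43, 42, 39, 39, 39, 19, 13]
    }
}
```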

Consider, for example, the Person class shown in Listing 1, which in turn is used by a List that holds a handful of Person objects, as shown in Listing 2.

Listing 1

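public class Person {
  private final String firstName;
  private final String lastName;
  private final int age;

  public Person(String fn, String ln, int a) {
    this.firstName = fn; this.lastName = ln; this.age = a;
  }

  public String getFirstName() { return firstName; }
  public String getLastName() { return lastName; }
  public int getAge() { return age; }

  // Used when a Person is printed, as in Listing 14.
  @Override
  public String toString() { return firstName + " " + lastName + " (" + age + ")"; }
}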

Listing 2

List<Person> people = Arrays.asList(
    new Person("Ted", "Neward", 42),
    new Person("Charlotte", "Neward", 39),
    new Person("Michael", "Neward", 19),
    new Person("Matthew", "Neward", 13),
    new Person("Neal", "Ford", 45),
    new Person("Candy", "Ford", 39),
    new Person("Jeff", "Brown", 43),
    new Person("Betsy", "Brown", 39)
);

Now, assuming we want to examine or sort this list by last name and then by age, a naive approach is to write a for loop (in other words, to implement the sort by hand each time we need one). The problem, of course, is that this violates DRY (the Don’t Repeat Yourself principle): a for loop isn’t reusable, so we have to reimplement the sort every time.

The Collections API has a better approach: the Collections class sports a sort method that sorts the contents of a List. Using it, however, requires either that the Person class implement the Comparable interface (which defines a natural ordering, a default ordering for all Person objects) or that we pass in a Comparator instance that defines how Person objects should be sorted.

So, if we want to sort first by last name and then by age (in the event the last names are the same), the code will look something like Listing 3. But that’s a lot of work to do something as simple as sort by last name and then by age. This is exactly where the new closures feature will be of help, making it easier to write the Comparator (see Listing 4).

Listing 3

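Collections.sort(people, new Comparator<Person>() {
  public int compare(Person lhs, Person rhs) {
    if (lhs.getLastName().equals(rhs.getLastName())) {
      // Same last name: fall back to age. Integer.compare avoids the
      // overflow risk of subtracting one int from another.
      return Integer.compare(lhs.getAge(), rhs.getAge());
    }
    else
      return lhs.getLastName().compareTo(rhs.getLastName());
  }
});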

Listing 4

Collections.sort(people, (lhs, rhs) -> {
  if (lhs.getLastName().equals(rhs.getLastName()))
    return Integer.compare(lhs.getAge(), rhs.getAge());
  else
    return lhs.getLastName().compareTo(rhs.getLastName());
});

The Comparator is a prime example of the need for lambdas in the language: it’s one of the dozens of places where a one-off anonymous method is useful. (Bear in mind, this is probably the easiest—and weakest—benefit of lambdas. We’re essentially trading one syntax for another, admittedly terser, syntax, but even if you put this article down and walk away right now, a significant amount of code will be saved just from that terseness.)

If we use this particular comparison repeatedly, we can capture the lambda as a Comparator instance (the lambda fits the signature of Comparator’s method, in this case "int compare(Person, Person)") and store it directly on the Person class. That makes the lambda easy to reuse (see Listing 5) and its use even more readable (see Listing 6).

Listing 5

public class Person {
  // . . .

  public static final Comparator<Person> BY_LAST_AND_AGE =
    (lhs, rhs) -> {
      if (lhs.lastName.equals(rhs.lastName))
        return Integer.compare(lhs.age, rhs.age);
      else
        return lhs.lastName.compareTo(rhs.lastName);
    };
}

Listing 6

 Collections.sort(people, Person.BY_LAST_AND_AGE);

Storing a Comparator<Person> instance on the Person class is a bit odd, though. It would make more sense to define a method that does the comparison and use that instead. Fortunately, Java lets a method reference stand in for a lambda wherever the method’s signature is compatible with the single method on the functional interface (here, Comparator’s compare). So the BY_LAST_AND_AGE logic can just as well be written as an ordinary static (or instance) method on Person (see Listing 7) and passed to sort as a method reference (see Listing 8).

Listing 7

public static int compareLastAndAge(Person lhs, Person rhs) {
  if (lhs.lastName.equals(rhs.lastName))
    return Integer.compare(lhs.age, rhs.age);
  else
    return lhs.lastName.compareTo(rhs.lastName);
}

Listing 8

Collections.sort(people, Person::compareLastAndAge);
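Listing 7 uses a static method, but an instance method works as well. Below is a minimal, self-contained sketch; the trimmed-down nested Person class and its data are hypothetical. An unbound instance-method reference such as Person::compareLastAndAge treats its first argument as the receiver, so it has the same shape as Comparator’s compare(Person, Person).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InstanceMethodRefDemo {
    // Hypothetical, trimmed-down Person with only the fields compared here.
    static class Person {
        final String lastName;
        final int age;

        Person(String lastName, int age) { this.lastName = lastName; this.age = age; }

        // Instance-method version of compareLastAndAge: "this" plays the role of lhs.
        int compareLastAndAge(Person rhs) {
            int byLast = lastName.compareTo(rhs.lastName);
            return byLast != 0 ? byLast : Integer.compare(age, rhs.age);
        }

        @Override public String toString() { return lastName + "/" + age; }
    }

    static List<Person> sorted() {
        List<Person> people = new ArrayList<>(Arrays.asList(
            new Person("Neward", 42), new Person("Ford", 45),
            new Person("Neward", 13), new Person("Brown", 39)));
        // Unbound reference: compare(a, b) becomes a.compareLastAndAge(b).
        Collections.sort(people, Person::compareLastAndAge);
        return people;
    }

    public static void main(String[] args) {
        System.out.println(sorted()); // [Brown/39, Ford/45, Neward/13, Neward/42]
    }
}
```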

Thus, even without any changes to the Collections API, lambdas are already helpful and useful. Again, if you walk away from this article right here, things are pretty good. But they’re about to get a lot better.

Changes in the Collections API

With some additional APIs on the Collection classes themselves, a variety of new and more powerful approaches and techniques open up, most often leveraging techniques drawn from the world of functional programming. No knowledge of functional programming is necessary to use them, fortunately, as long as you can open your mind to the idea that functions are just as valuable to manipulate and reuse as are classes and objects.

Comparisons. One of the drawbacks to the Comparator approach shown earlier is hidden inside the Comparator implementation. The code is actually doing two comparisons, one as a “dominant” comparison over the other, meaning that last names are compared first, and age is compared only if the last names are identical. If project requirements later demand that sorting be done by age first and by last names second, a new Comparator must be written—no parts of compareLastAndAge can be reused.

This is where taking a more functional approach can add some powerful benefits. If we treat each of those comparisons as a separate Comparator instance, we can combine them to create precisely the comparison needed (see Listing 9).

Listing 9

public static final Comparator<Person> BY_FIRST =
  (lhs, rhs) -> lhs.firstName.compareTo(rhs.firstName);
public static final Comparator<Person> BY_LAST =
  (lhs, rhs) -> lhs.lastName.compareTo(rhs.lastName);
public static final Comparator<Person> BY_AGE =
  (lhs, rhs) -> Integer.compare(lhs.age, rhs.age);

Historically, combining comparisons like this by hand hasn’t paid off: by the time you write the code to do the combination, you could just as quickly (if not more quickly) have written the multistage comparison directly.
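For a sense of what combining comparisons by hand involves, here is a minimal sketch of a generic combinator. The `then` helper, the nested Person class, and the sample data are all hypothetical; Java 8’s Comparator.thenComparing() fills the same role. Note that swapping the arguments yields “age, then last name” without writing a new comparison, which is exactly the reuse that compareLastAndAge could not offer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ManualComposeDemo {
    // Hypothetical, trimmed-down Person with only the two fields compared here.
    static class Person {
        final String lastName;
        final int age;

        Person(String lastName, int age) { this.lastName = lastName; this.age = age; }

        @Override public String toString() { return lastName + "/" + age; }
    }

    // Atomic comparisons, each responsible for a single key.
    static final Comparator<Person> BY_LAST = (l, r) -> l.lastName.compareTo(r.lastName);
    static final Comparator<Person> BY_AGE = (l, r) -> Integer.compare(l.age, r.age);

    // Hypothetical hand-written combinator: use `first`, and consult `second`
    // only when `first` reports a tie.
    static <T> Comparator<T> then(Comparator<T> first, Comparator<T> second) {
        return (lhs, rhs) -> {
            int result = first.compare(lhs, rhs);
            return result != 0 ? result : second.compare(lhs, rhs);
        };
    }

    // Sorts a fresh copy of the sample data and returns it as a string.
    static String sortedWith(Comparator<Person> order) {
        List<Person> people = new ArrayList<>(Arrays.asList(
            new Person("Neward", 42), new Person("Ford", 45),
            new Person("Neward", 13), new Person("Ford", 39)));
        Collections.sort(people, order);
        return people.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedWith(then(BY_LAST, BY_AGE))); // [Ford/39, Ford/45, Neward/13, Neward/42]
        System.out.println(sortedWith(then(BY_AGE, BY_LAST))); // [Neward/13, Ford/39, Neward/42, Ford/45]
    }
}
```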

As a matter of fact, this “I want to compare these two X things by comparing values returned to me by a method on each X” approach is so common that the platform provides it out of the box. The Comparator interface has a static comparing method that takes a function (a lambda or method reference) that extracts a comparison key from the object, and it returns a Comparator that sorts based on that key. This means that Listing 9 can be rewritten even more simply, as shown in Listing 10.

Listing 10

// Comparator.comparing is a static factory on java.util.Comparator
// (pre-release builds called it Comparators.comparing).
public static final Comparator<Person> BY_FIRST =
  Comparator.comparing(Person::getFirstName);
public static final Comparator<Person> BY_LAST =
  Comparator.comparing(Person::getLastName);
public static final Comparator<Person> BY_AGE =
  Comparator.comparing(Person::getAge);

Think for a moment about what this is doing: the Person is no longer about sorting, but just about extracting the key by which the sort should be done. This is a good thing—Person shouldn’t have to think about how to sort; Person should just focus on being a Person.

It gets better, though, particularly when we want to compare based on two or more of those values.

Composition. As of Java 8, the Comparator interface comes with several methods that combine Comparator instances by stringing them together. For example, the Comparator.thenComparing() method takes a Comparator to use when the first one reports that two objects are equal. So the “last name then age” comparison can now be written in terms of the two Comparator instances BY_LAST and BY_AGE, as shown in Listing 11. Or, if you prefer to use methods rather than Comparator instances, use the code in Listing 12.

Listing 11

Collections.sort(people, Person.BY_LAST.thenComparing(Person.BY_AGE));

Listing 12

Collections.sort(people,
  Comparator.comparing(Person::getLastName)
            .thenComparing(Person::getAge));

By the way, for those who didn’t grow up using Collections.sort(), there’s now a sort() method directly on List. This is one of the neat things about the introduction of interface default methods: where we used to have to put that kind of noninheritance-based reusable behavior in static methods, now it can be hoisted up into interfaces. (See the previous article in this series for more details.)
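Here is a minimal sketch of List.sort() combined with Comparator.comparing(), thenComparing(), and Comparator.reversed(). The nested Person class and the sample data are hypothetical stand-ins for Listings 1 and 2. Reordering the keys, or reversing the whole ordering, needs no new comparison logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ListSortDemo {
    // Hypothetical, trimmed-down Person with just the getters used below.
    static class Person {
        private final String firstName;
        private final String lastName;
        private final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName; this.lastName = lastName; this.age = age;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
        int getAge() { return age; }
    }

    // Two orderings built from the same key extractors, composed differently.
    static final Comparator<Person> LAST_THEN_AGE =
        Comparator.comparing(Person::getLastName).thenComparing(Person::getAge);
    static final Comparator<Person> AGE_THEN_LAST =
        Comparator.comparing(Person::getAge).thenComparing(Person::getLastName);

    // Sorts a fresh copy of the sample data with List.sort (a default method
    // on List) and returns the first names in the resulting order.
    static List<String> firstNames(Comparator<Person> order) {
        List<Person> people = new ArrayList<>(Arrays.asList(
            new Person("Ted", "Neward", 42),
            new Person("Neal", "Ford", 45),
            new Person("Matthew", "Neward", 13),
            new Person("Candy", "Ford", 39)));
        people.sort(order);
        List<String> names = new ArrayList<>();
        for (Person p : people) {
            names.add(p.getFirstName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(firstNames(LAST_THEN_AGE));            // [Candy, Neal, Matthew, Ted]
        System.out.println(firstNames(AGE_THEN_LAST));            // [Matthew, Candy, Ted, Neal]
        System.out.println(firstNames(LAST_THEN_AGE.reversed())); // [Ted, Matthew, Neal, Candy]
    }
}
```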

Similarly, if the code needs to sort the collection of Person objects by last name and then by first name, no new Comparator needs to be written, because this comparison, too, can be composed from two atomic comparisons, as shown in Listing 13.

Listing 13

Collections.sort(people,
  Comparator.comparing(Person::getLastName)
            .thenComparing(Person::getFirstName));

This combinatory “connection” of methods, known as functional composition, is common in functional programming and at the heart of why functional programming is as powerful as it is.

It’s important to understand that the real benefit here isn’t just in the APIs that enable us to do comparisons, but the ability to pass bits of executable code (and then combine them in new and interesting ways) to create opportunities for reuse and design. Comparator is just the tip of the iceberg. Lots of things can be made more flexible and powerful, particularly when combining and composing them.

Iteration. As another example of how lambdas and functional approaches change the way we write code, consider one of the most fundamental operations on a collection: iterating over it. Java 8 adds a forEach() default method to the Iterable interface (and a corresponding forEachRemaining() default method to Iterator). Printing each of the items in the collection, for example, means passing a lambda to the collection’s forEach method, as shown in Listing 14.

Listing 14

people.forEach((it) -> System.out.println("Person: " + it));

Officially, the type of lambda being passed in is a Consumer instance, defined in the java.util.function package. Unlike traditional Java interfaces, however, Consumer is one of the new functional interfaces, which means you will rarely implement it directly in a class. Instead, the new way to think about it is solely in terms of its single abstract method, accept, which is the method the lambda provides. The rest (such as andThen) are default methods defined in terms of accept and designed to support it.

For example, andThen() chains two Consumer instances into a single Consumer that calls the first one and then immediately calls the second. This provides useful composition techniques that are a little outside the scope of this article.
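A minimal sketch of these pieces together, assuming a plain list of names in place of the Person list: lambdas stored as Consumer values, chained with andThen(), and handed to Iterable.forEach() and Iterator.forEachRemaining(). The class name and the log messages are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class ConsumerDemo {
    // Runs two chained Consumers over each name and returns what they logged.
    static List<String> chained(List<String> names) {
        List<String> log = new ArrayList<>();
        // Lambdas stored as Consumers; accept() is the method they implement.
        Consumer<String> greet = name -> log.add("Hello, " + name);
        Consumer<String> audit = name -> log.add("greeted " + name);
        // andThen() returns one Consumer that calls greet, then audit, per element.
        names.forEach(greet.andThen(audit));   // Iterable.forEach
        return log;
    }

    // Iterator.forEachRemaining hands every element not yet consumed to a Consumer.
    static List<String> remainingAfterFirst(List<String> names) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = names.iterator();
        if (it.hasNext()) {
            it.next();                  // consume the first element by hand
        }
        it.forEachRemaining(seen::add); // a method reference works as a Consumer too
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(chained(Arrays.asList("Ted", "Neal")));
        // [Hello, Ted, greeted Ted, Hello, Neal, greeted Neal]
        System.out.println(remainingAfterFirst(Arrays.asList("Ted", "Neal", "Candy")));
        // [Neal, Candy]
    }
}
```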

Many of the use cases involved in walking through a collection have the purpose of finding items that fit a particular criterion—for example, determining which of the Person objects in the collection are of drinking age, because the automated code system needs to send everyone in that collection a beer. This “act upon a thing coming from a group of things” is actually far more widespread than just operating upon a collection. Think about operating on each line in a file, each row from a result set, each value generated by a random-number generator, and so on. Java SE 8 generalized this concept one step further, outside collections, by lifting it into its own interface: Stream.

Stream. Like several other interfaces in the JDK, the Stream interface is a fundamental interface that is intended for use in a variety of scenarios, including the Collections API. It represents a stream of objects, and on the surface of things, it feels similar to how Iterator gives us access one object at a time through a collection.

However, unlike a collection, a Stream is not guaranteed to be finite. That makes it a viable candidate for pulling strings from a file, for example, or for other kinds of on-demand operations, particularly because it is designed not only to allow composition of functions but also to permit parallelization “under the hood.”
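As a minimal illustration of an unbounded stream, the sketch below uses Stream.iterate, which generates values on demand forever; limit() then makes the pipeline finite so that a terminal operation can complete. (BufferedReader.lines() is the file-reading counterpart, streaming lines as they are requested.) The class name and the powers-of-two example are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamDemo {
    // Stream.iterate produces an unbounded stream: 1, 2, 4, 8, ...
    // limit() cuts it off so that collect() can terminate.
    static List<Integer> powersOfTwo(int count) {
        return Stream.iterate(1, n -> n * 2)
                     .limit(count)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwo(5)); // [1, 2, 4, 8, 16]
    }
}
```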

Consider the earlier requirement: the code needs to filter out any Person object that is not at least 21 years of age. Once a Collection converts to a Stream (via the stream() method defined on the Collection interface), the filter method can be used to produce a new Stream through which only the filtered objects come (see Listing 15).

Listing 15

Stream<Person> adults = people
      .stream()
      .filter(it -> it.getAge() >= 21);

The parameter to filter is a Predicate, a functional interface whose test method takes one parameter of the stream’s element type and returns a boolean. The Predicate determines whether each object is included in the returned stream.

The return from filter() is another Stream, which means that the filtered Stream is also available for further manipulation, such as to forEach() through each of the elements that come through the Stream, in this case to display the results (see Listing 16).

Listing 16

 people.stream()
      .filter((it) -> it.getAge() >= 21)
      .forEach((it) -> 
        System.out.println("Have a beer, " + it.getFirstName()));

This neatly demonstrates the composability of streams: we can take streams and run them through a variety of atomic operations, each of which does one (and only one) thing to the stream. It’s also important to note that filter() is lazy: it filters only on demand, as a later operation such as forEach() pulls elements through, rather than going through the entire collection of Person objects and filtering ahead of time (which is what we’re used to with the Collections API).
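One way to observe that laziness is to count predicate invocations. In this minimal sketch (class name and ages are illustrative), building the filtered stream alone calls the predicate zero times, and a short-circuiting terminal operation such as findFirst() stops pulling elements as soon as one passes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyFilterDemo {
    // Illustrative ages; the first adult (>= 21) is the third element.
    static final List<Integer> AGES = Arrays.asList(19, 13, 42, 39, 45);

    // Builds a filtered stream but never runs a terminal operation.
    static int callsWithoutTerminalOp() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> adults = AGES.stream()
            .filter(age -> { calls.incrementAndGet(); return age >= 21; });
        // Nothing has pulled elements through `adults`, so the predicate never ran.
        return calls.get();
    }

    // findFirst() is short-circuiting: it stops as soon as one element passes.
    static int callsForFirstAdult() {
        AtomicInteger calls = new AtomicInteger();
        Optional<Integer> first = AGES.stream()
            .filter(age -> { calls.incrementAndGet(); return age >= 21; })
            .findFirst();
        System.out.println("first adult age: " + first.orElse(-1)); // 42
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(callsWithoutTerminalOp()); // 0
        System.out.println(callsForFirstAdult());     // 3 (19, 13, then 42 matches)
    }
}
```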

Predicates. It might seem odd at first that the filter() method takes only a single Predicate. After all, if the goal were to find all the Person objects whose age is greater than 21 and whose last name is Neward, it would seem that filter() could or should take a pair of Predicate instances. Of course, this opens a Pandora’s box of possibilities. What if the goal is to find all Person objects with an age greater than 21 and less than 65, and with a first name of at least four characters? Infinite possibilities suddenly open up, and the filter() API would need to somehow accommodate all of them.

Unless, of course, a mechanism were available to coalesce all of these possibilities into a single Predicate. Fortunately, it’s easy to see that any combination of Predicate instances can itself be a single Predicate. In other words, if a given filter requires both condition A and condition B to be true before an object can be included in the filtered stream, that requirement is itself a Predicate (A and B). We can build it by writing a Predicate that takes any two Predicate instances and returns true only if both A and B yield true.

This “and”ing Predicate is—by virtue of the fact that it knows only about the two Predicate instances that it needs to call (and nothing about the parameters being passed in to each of those)— completely generic and can be written well ahead of time.
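Here is a minimal sketch of such a pre-written “and” combinator. The `both` helper is hypothetical (Predicate.and(), discussed next, does the same job); the point is that it is written once, generically, and only ever calls the two predicates it wraps.

```java
import java.util.function.Predicate;

class AndPredicateDemo {
    // Hypothetical helper, reusable for any type T: it never inspects the
    // value itself, it only asks the two wrapped predicates.
    static <T> Predicate<T> both(Predicate<T> a, Predicate<T> b) {
        return value -> a.test(value) && b.test(value);
    }

    // Composed once, then used as a single Predicate afterward.
    static final Predicate<Integer> WORKING_AGE =
        both(age -> age >= 21, age -> age < 65);
    static final Predicate<String> SHORT_N_NAME =
        both(s -> s.startsWith("N"), s -> s.length() <= 4);

    public static void main(String[] args) {
        System.out.println(WORKING_AGE.test(42));         // true
        System.out.println(WORKING_AGE.test(19));         // false
        System.out.println(SHORT_N_NAME.test("Neal"));    // true
        System.out.println(SHORT_N_NAME.test("Neward"));  // false (too long)
    }
}
```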

If the Predicate closures are stored in Predicate references (similar to how Comparator references were used earlier, as members on Person), they can be strung together using the and() method on them, as shown in Listing 17.

Listing 17

 Predicate<Person> drinkingAge = (it) -> it.getAge() >= 21;
    Predicate<Person> brown = (it) -> it.getLastName().equals("Brown");
    people.stream()
      .filter(drinkingAge.and(brown))
      .forEach((it) ->
                System.out.println("Have a beer, " +
                                   it.getFirstName()));

As might be expected, or() and negate() are also available (there is no xor()). Make sure to check the Javadoc for the full set of possibilities.
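A minimal sketch of and(), or(), and negate() in action, using only the last names from Listing 2 as plain strings; the class and constant names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PredicateComboDemo {
    // Last names from Listing 2.
    static final List<String> LAST_NAMES = Arrays.asList(
        "Neward", "Neward", "Neward", "Neward", "Ford", "Ford", "Brown", "Brown");

    // Bound method references to String.equals fit Predicate<String>.
    static final Predicate<String> NEWARD = "Neward"::equals;
    static final Predicate<String> FORD = "Ford"::equals;

    static long count(Predicate<String> p) {
        return LAST_NAMES.stream().filter(p).count();
    }

    public static void main(String[] args) {
        System.out.println(count(NEWARD.or(FORD)));  // 6
        System.out.println(count(NEWARD.negate()));  // 4
        System.out.println(count(NEWARD.and(FORD))); // 0: no name is both
    }
}
```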

map() and reduce(). Other common Stream operations include map(), which applies a function across each element present within a Stream to produce a result out of each element. So, for example, we can obtain the age of each Person in the collection by applying a simple function to retrieve the age out of each Person, as shown in Listing 18.

Listing 18

  IntStream ages =
      people.stream()
            .mapToInt((it) -> it.getAge());

For all practical purposes, IntStream (and its cousins LongStream and DoubleStream) is a specialization of the Stream<T> idea for a primitive type: a separate interface that offers the same kinds of operations but works directly on int values, without boxing each one into an Integer.

This, then, produces a stream of int values out of a Collection of Person instances. This is also sometimes known as a transformation operation, because the code is transforming, or projecting, each Person into an int.

Similarly, reduce() is an operation that takes a stream of values and, through some kind of operation, reduces them into a single value. Reduction is an operation already familiar to developers, though they might not recognize it at first: the COUNT() operator from SQL is one such operation (reducing from a collection of rows to a single integer), as are the SUM(), MAX(), and MIN() operators. Each of these takes a stream of values (rows) and produces a single value (the integer) by applying some operation (for example, increment a counter, add the value to a running total, select the highest, or select the lowest) to each of the values in the stream.

For example, to obtain an average age, you could sum the ages and then divide by the number of elements in the stream. Given the new APIs, the easiest way to get the sum is the built-in sum() method, as shown in Listing 19.

Listing 19

int sum = people.stream()
                .mapToInt(Person::getAge)
                .sum();
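Finishing the average-age calculation takes one more step. This minimal sketch, with the ages from Listing 2 copied into a list of Integer, shows the sum-then-divide approach next to IntStream.average(), which returns an OptionalDouble that is empty for an empty stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

class AverageAgeDemo {
    // Ages from Listing 2.
    static final List<Integer> AGES = Arrays.asList(42, 39, 19, 13, 45, 39, 43, 39);

    // Sum the ages, then divide by the element count.
    static double averageByHand() {
        int sum = AGES.stream().mapToInt(Integer::intValue).sum();
        return (double) sum / AGES.size();
    }

    // Built-in: average() is empty rather than NaN when there are no elements.
    static OptionalDouble averageBuiltIn() {
        return AGES.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        System.out.println(averageByHand());                // 34.875
        System.out.println(averageBuiltIn().getAsDouble()); // 34.875
    }
}
```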

But doing this bypasses an interesting opportunity to explore one of the more powerful features of the new Java API, that of doing a reduction—coalescing a collection of values down into a single one through some custom operation. So, let’s rewrite the summation part of this using the new reduce() method:

int sum = people.stream()
                .mapToInt(Person::getAge)
                .reduce(0, (l, r) -> l + r);

This reduction, also known in functional circles as a fold, starts with a seed value (0, in this case), and applies the closure to the seed and the first element in the stream, taking the result and storing it as the accumulated value that will be used as the seed for the next element in the stream.

In other words, given the integers 1, 2, 3, 4, and 5, the seed 0 is added to 1 and the result (1) is stored as the accumulated value, which then becomes the left-hand operand of the next addition, with the next number in the stream as the right-hand operand (1+2). The result (3) is stored as the accumulated value and used in the next addition (3+3). The result (6) is stored and used in the next addition (6+4), and that result is used in the final addition (10+5), yielding the final result 15. And, sure enough, if we run the code in Listing 20, we get that result.

Listing 20

List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
    int sum = values.stream().reduce(0, (l,r) -> l+r);
    System.out.println(sum);

Note that when reduce is called on an IntStream (as in the mapToInt-based summation above), the closure accepted as its second argument is an IntBinaryOperator, defined as taking two int values and returning an int result; on a Stream<Integer>, as in Listing 20, it is a BinaryOperator<Integer> instead. IntBinaryOperator and ToIntBiFunction are examples of specialized functional interfaces (with counterparts for long and double) that take two parameters (of one or two different types) and return an int. These specialized versions were created mostly to ease the work, and avoid the boxing cost, of using the common primitive types.

IntStream also has a couple of helper methods, including the average(), min(), and max() methods, that do some of the more common integer operations. Additionally, binary operations (such as summing two numbers) are also often defined on the primitive wrapper classes for that type (Integer::sum, Long::max, and so on).
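A minimal sketch of those helpers, again using the ages from Listing 2. Integer::sum and Integer::max fit IntBinaryOperator directly; reduce() without a seed, like max() and min(), returns an OptionalInt because an empty stream has no answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

class IntReduceDemo {
    // Ages from Listing 2.
    static final List<Integer> AGES = Arrays.asList(42, 39, 19, 13, 45, 39, 43, 39);

    // Seeded reduction with a method reference in place of (l, r) -> l + r.
    static int sum() {
        return AGES.stream().mapToInt(Integer::intValue).reduce(0, Integer::sum);
    }

    // Unseeded reduction: max has no natural seed, so the result is optional.
    static OptionalInt max() {
        return AGES.stream().mapToInt(Integer::intValue).reduce(Integer::max);
    }

    // Built-in helper for the same kind of job.
    static OptionalInt min() {
        return AGES.stream().mapToInt(Integer::intValue).min();
    }

    // An empty stream yields an empty OptionalInt rather than a made-up value.
    static boolean emptyMaxIsPresent() {
        return IntStream.empty().reduce(Integer::max).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(sum());               // 279
        System.out.println(max().getAsInt());    // 45
        System.out.println(min().getAsInt());    // 13
        System.out.println(emptyMaxIsPresent()); // false
    }
}
```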

More maps and reduction. Maps and reduction are useful in a variety of situations beyond just simple math. After all, in any case where a collection of objects can be transformed into a different object (or value) and then collected into a single value, map and reduction operations work.

The map operation, for example, can be useful as an extraction or projection operation to take an object and extract portions of it, such as extracting the last name out of a Person object:

Stream<String> lastNames = people.stream().map(Person::getLastName);

Once the last names have been retrieved from the Person stream, the reduction can concatenate strings together, such as transforming the last name into a data representation for XML. See Listing 21.

Listing 21

String xml =
      "<people data='lastname'>" +
      people.stream()
            .map(it -> "<person>" + it.getLastName() + "</person>")
            .reduce("", String::concat)
      + "</people>";
    System.out.println(xml);

And, naturally, if different output formats are required, different operations can be used to control the contents of each format. Those operations can be supplied ad hoc, as in Listing 21, or come from methods defined on other classes, such as the Person class itself (see Listing 22). Such a method can then be used in the map() operation to transform the stream of Person objects into a JSON array of objects, as shown in Listing 23.

Listing 22

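public class Person {
  // . . .
  // Minimal JSON serialization: keys are quoted as JSON requires, but no
  // escaping is done, so names must not contain quotes or backslashes.
  public static String toJSON(Person p) {
    return
      "{" +
        "\"firstName\": \"" + p.firstName + "\", " +
        "\"lastName\": \"" + p.lastName + "\", " +
        "\"age\": " + p.age +
      "}";
  }
}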

Listing 23

String json =
      people.stream()
        .map(Person::toJSON)
        .reduce("[", (l, r) -> l + (l.equals("[") ? "" : ",") + r)
        + "]";
    System.out.println(json);

The ternary operation in the middle of the reduce operation is there to avoid putting a comma in front of the first Person serialized to JSON. A leading comma ([,{...}]) is not valid JSON; a lenient parser might accept it, but that is not guaranteed, and it looks ugly besides.

It is ugly enough, in fact, to fix. The code is a lot easier to write if we use the built-in Collector interface and its partner class Collectors, which are designed specifically for this kind of mutable-reduction operation (see Listing 24). As a bonus, Collectors.joining() appends everything to a single buffer instead of creating a new String at every step, as the reduce and String::concat versions from the earlier examples do, so it is generally faster as well.

Listing 24

String joined = people.stream()
                      .map(Person::toJSON)
                      .collect(Collectors.joining(", "));
System.out.println("[" + joined + "]");

Oh, and lest we forget our old friend Comparator, note that Stream also has an operation to sort a stream in-flight, so the sorted JSON representation of the Person list looks like Listing 25.

Listing 25

String json = people.stream()
                    .sorted(Person.BY_LAST)
                    .map(Person::toJSON)
                    .collect(Collectors.joining(", ", "[", "]"));
System.out.println(json);

This is powerful stuff.

Parallelization. What’s even more powerful is that these operations are entirely independent of the logic that pulls each object through the Stream and acts on it. That matters because the traditional for loop breaks down when we want to iterate, map, or reduce a large collection by splitting it into segments that are each processed by a separate thread: the loop hard-wires a single, sequential traversal.

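The Stream API, however, already has that covered. Running the pipelines shown earlier in parallel takes only a small change: instead of calling stream() to obtain a Stream from the collection, call parallelStream(), as demonstrated in Listing 26. (A reduce() still needs an associative operation and a seed that is a true identity to give the same answer in parallel; the "[" seed in Listing 23 does not qualify, which is one more reason to prefer Collectors.joining().)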

Listing 26

people.parallelStream()
      .filter((it) -> it.getAge() >= 21)
      .forEach((it) ->
                System.out.println("Have a beer, " + it.getFirstName() +
                  " (on " + Thread.currentThread().getName() + ")"));

With a collection of a dozen or so items, at least on my laptop, two threads are used to process the collection: the thread named main, which is the traditional one used to invoke the main() method of a Java class, and another thread named ForkJoinPool.commonPool-worker-1, which is obviously not of our creation.

Obviously, for a collection of a dozen items, this would be hideously unnecessary, but for larger collections, or for more expensive per-element work, it can be the difference between “good enough” and “needs to go faster.” Without these new methods and approaches, you would be staring at some significant code and algorithmic study. With them, you can write parallelized code literally by adding eight keystrokes (nine if you count the Shift key required to capitalize the s in stream) to the previously sequential processing.

And, where necessary, a parallel Stream can be brought back to a sequential one by calling—you can probably guess—sequential() on it.
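A minimal sketch of switching modes, using IntStream for brevity (the class name is illustrative). Because addition is associative and 0 is its identity, the parallel sum matches the sequential one, and the most recent parallel() or sequential() call decides the mode for the whole pipeline, which isParallel() reports.

```java
import java.util.stream.IntStream;

class ParallelToggleDemo {
    // Work may be split across ForkJoinPool threads; the result is unaffected.
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().sum();
    }

    static int sequentialSum(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    // The last parallel()/sequential() call wins, so this reports false.
    static boolean parallelThenSequential() {
        return IntStream.rangeClosed(1, 10).parallel().sequential().isParallel();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000) == sequentialSum(1000)); // true
        System.out.println(parallelThenSequential());                 // false
    }
}
```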

The important thing to note is that regardless of whether the processing is better done sequentially or in parallel, the same Stream interface is used for both. The sequential or parallel implementation becomes entirely an implementation detail, which is exactly where we want it to be when working on code that focuses on business needs (and value); we don’t want to focus on the low-level details of firing up threads in thread pools and synchronizing across them.

Conclusion

Lambdas will bring a lot of change to Java, both in terms of how Java code will be written and how it will be designed. Some of these changes are already taking place within the Java SE libraries, and they will slowly make their way through many other libraries—both those owned by the Java platform and those out in “the wilds” of open source—as developers grow more comfortable with the abilities (and drawbacks) of lambdas.

Numerous other changes are present within the Java SE 8 release. But if you understand how lambdas on collections work, you will have a strong advantage when thinking about how to leverage lambdas within your own designs and code, and you can create better-decoupled code for years to come.

Ted Neward (@tedneward) is an architectural consultant for Neudesic. He has served on several Expert Groups; authored many books, including Effective Enterprise Java (Addison-Wesley Professional, 2004) and Professional F# 2.0 (Wrox, 2010); written hundreds of articles on Java, Scala, and other technologies; and spoken at hundreds of conferences.

Source: http://www.oracle.com/technetwork/articles/java/architect-lambdas-part2-2081439.html

2015-10-17
