Lazy Enumerators in Ruby

I gave a talk on Ruby’s Enumerators at the DC Ruby Users Group in August. I’ve posted my slides from the presentation, if you’re interested.  Basically, I discovered that enumerators make lazy evaluation easy to implement in Ruby, and applying lazy techniques with enumerators may yield more efficient and elegant code.

I’m defining lazy evaluation as the strategy of evaluating expressions only when needed, and only as much as needed. Lazy evaluation is a hallmark of some functional programming languages, like Haskell, Clojure, and Erlang. In those languages, laziness is often the default. In others, it can be hard work setting it up, possibly by iterating over a collection and halting (and resuming) depending on when items from the collection are needed and how many are needed. I thought of Ruby as one of the latter type of programming languages, until I saw enumerators.

Not to be confused with enumerables, classes that mix in the Enumerable module, enumerators allow you to access a collection of objects, or data. Whereas an enumerable, like an array, iterates over its items internally, allowing you to pass to a method like #each or #collect a block that will be yielded each item in turn, an enumerator is like an external iterator. It’s much easier to delay iteration with an enumerator.

Enumerators are available as an extension to Ruby 1.8, but in 1.9 they’re part of the language. My code samples below are in Ruby 1.9. I’ll let you look at the slides for an introduction to coding with enumerators. The new Pick Axe book (a.k.a. Programming Ruby 1.9) from Pragmatic Programmers has a great overview of them.

An enumerator can be created with Enumerator.new &block. The block has to have a single parameter, called yielder by convention, that will yield up a value when the enumerator iterates. The body of the block determines what it yields. Here’s an example: this creates an enumerator that backs the natural numbers. As in, all the natural numbers:

natural_numbers = Enumerator.new do |yielder| 
  number = 1 
  loop do 
    yielder.yield number 
    number += 1 
  end 
end

We can treat natural_numbers as an infinite sequence. When an enumerator is asked for its next item, it executes until it reaches a yield. Then it stops. When the next request comes, it resumes where it stopped, going again until a yield. The loop in the block above ensures that the enumerator will continue supplying numbers. Now we have a concrete example of what’s being lazily evaluated: the incrementing of the number variable. The next natural number is waiting to be calculated.

Brian Candler posted a terse but promising message about chaining enumerable method calls to enumerators. The Pick Axe picks this up and implements a lazy_select method for the Enumerator class. A lazy version is necessary because simply calling select, or collect, on natural_numbers would produce an infinite loop: these methods expect to exhaust the underlying collection, so they’ll keep taking numbers forever.

Here’s the lazy_select. Note how it follows Candler’s pattern of iterating explicitly over the collection and filtering what is output by yielder.yield:

class Enumerator 
  def lazy_select(&block) 
    Enumerator.new do |yielder| 
      self.each do |val| 
      yielder.yield(val) if block.call(val) 
      end 
    end 
  end 
end

This lets us write code to use natural_numbers like this:

p natural_numbers 
    .lazy_select {|n| n % 47 == 0} 
    .lazy_select {|n| palindrome_number?(n)} 
    .first(5)
# => [141, 282, 1551, 14241, 15651]

Each lazy_select returns a new, filtered enumerator. The final line requests 5 items from the final filtered enumerator, which is essentially the series of natural numbers divisible by 47 that are palindrome numbers. This isn’t the only way to generate these numbers in Ruby, by any means, but look at how clean the code is to read. There’s no noise in that code–it’s all intention. Joy!

Why stop with a lazy select?

def lazy_map(&block) 
    Enumerator.new do |yielder| 
      self.each do |value| 
        yielder.yield(block.call(value)) 
      end 
    end 
end
p natural_numbers.lazy_map {|n| n*n}.take(10)  # the first 10 squares
# => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

These are just simple examples, using numbers because they make for concise execution and demonstration, but I can think of a lot of other uses for lazy, possibly infinite evaluation by enumerables. Instead of incrementing an integer, the loop could read another file, or request another resource from a network service. Filtering and mapping could just as easily be applied in these cases.