
Understanding the basics of ruby enumerators by building one

TLDR

  • Ruby has an Enumerator class, which lets us make any object, or even a piece of code, enumerable.
  • These enumerators are evaluated only when used, so they can represent infinite series.
  • To make them work, we have the yielder call a code block whenever data is pushed to it. That code block in turn yields the data to the block passed to the enumerator’s each method.

Enumerator basics

Ruby allows us to make anything enumerable. For example:

# y is a yielder, y.<< and y.yield can be used to push data into the enumerator
e = Enumerator.new do |y|
  (1..).each { |i| y << i }
end

The code above creates an enumerator over all natural numbers. Enumerator includes Enumerable, so we get all of the goodies like map, filter, any?, all?, etc., all built on top of the each method. Values are computed lazily, only when they are requested, so the enumerator can represent an infinite series. We can use it to test the basic maths we studied in school.

e.take(100).inject(&:+) # Should be the sum of the first 100 natural numbers, which we know is n * (n + 1) / 2 => 5050
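
To make the lazy evaluation a bit more concrete, here is a small sketch using only the built-in Enumerator. The variable names are mine, and the endless range (1..) assumes Ruby 2.6 or later. Methods that stop early are safe on an infinite enumerator, while eager ones such as map or select need lazy in front of them, otherwise they never return.

```ruby
# Built-in Enumerator over all natural numbers, as in the example above.
naturals = Enumerator.new do |y|
  (1..).each { |i| y << i }
end

# first stops the underlying block as soon as it has enough values.
first_five = naturals.first(5)

# take_while stops at the first value that fails the test.
below_four = naturals.take_while { |i| i < 4 }

# lazy defers select and map until values are actually requested,
# so this chain terminates even though the source is infinite.
even_squares = naturals.lazy.select(&:even?).map { |i| i * i }.first(3)

p first_five    # [1, 2, 3, 4, 5]
p below_four    # [1, 2, 3]
p even_squares  # [4, 16, 36]
```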

Let’s build our own

We’ll build a very basic version, which makes the same examples as above work. We’ll skip over any sort of checks, as they’d just increase the code size without adding much to our understanding. This should still allow us to understand the basics of how enumerators work. We’ll also reiterate the concept multiple times, as I find that helps a lot when understanding something new.

We need our enumerator to do the following:

  • Take a code block, which is passed a yielder.
  • Yield the values pushed to the yielder to the code block given to the each method.

A basic enumerator that takes a code block can look like this:

class MyEnumerator
  include Enumerable

  def initialize(&block)
    @block = block
  end
end

Now, we need to call this block each time each is called, and pass in a yielder.

# after initialize in MyEnumerator

def each
  @block.call(Yielder.new)
end

class Yielder
end

This yielder needs to support at least the << method, and it should yield each value it receives to the block given to the each method in MyEnumerator. This is the fun part of the code.

# after initialize in MyEnumerator

def each
  p = proc do |*args|
    yield(*args)
  end

  @block.call(Yielder.new(&p))
end

class Yielder
  def initialize(&block)
    @block = block
  end

  def <<(*args)
    @block.call(*args)
  end
end

This is it. This is the entire magic. It all rests on one basic concept: when we call yield within a method body, the value is yielded to the code block passed to that method. That yield can be nested within a proc, but it will still yield to the code block of the method in which the proc is defined. Thus, in each, the proc p yields to the code block each was called with, even if p itself is called somewhere else. So when << is called on the yielder, the proc p is called, and the value is passed to the code block given to each.
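
The rule about yield inside a proc can be checked in isolation. The two methods below are hypothetical helpers, written only for this illustration: one builds a proc containing a yield, the other calls that proc from a completely different method that has no block of its own.

```ruby
# Hypothetical helper: builds a proc that contains a yield and hands it
# to another method to be called there.
def collect_via_proc
  forwarder = proc { |value| yield(value * 10) }
  call_elsewhere(forwarder)
end

# Hypothetical helper: called without a block, it only invokes the proc.
def call_elsewhere(callable)
  callable.call(1)
  callable.call(2)
end

received = []
collect_via_proc { |v| received << v }

p received # [10, 20]
```

Even though the proc runs inside call_elsewhere, the values land in the block given to collect_via_proc, which is the same mechanism each relies on.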

Let me also walk you through how it works.

  • When MyEnumerator.new is called, it simply stores whatever code block is given to it. Let’s call it code block A.
  • When e.each is called with a code block B, it does the following fun thing.
    • It calls code block A with a new instance of a yielder.
    • The yielder is created with its own code block C, which simply yields whatever it is called with to code block B. This is the important part, so I’ll reiterate: whenever code block C is called, it yields its arguments to code block B. Code block B is the code block passed to each, and code block C is called with the values we push into the yielder. So we end up with the convenient interface of a yielder, which passes values to the code block that each is called with.
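
For reference, here are the fragments from above assembled into one file that can be run as is. The classes are the ones from this post; the comments marking blocks A, B and C and the usage lines at the bottom are additions for illustration.

```ruby
class Yielder
  def initialize(&block)
    @block = block # code block C
  end

  def <<(*args)
    @block.call(*args)
  end
end

class MyEnumerator
  include Enumerable

  def initialize(&block)
    @block = block # code block A
  end

  def each
    # Code block C: forwards whatever it receives to the block given to
    # each (code block B).
    forward = proc { |*args| yield(*args) }

    @block.call(Yielder.new(&forward))
  end
end

e = MyEnumerator.new do |y|
  y << 1
  y << 2
  y << 3
end

# Enumerable builds these on top of our each.
doubled = e.map { |i| i * 2 }
has_two = e.include?(2)

p doubled # [2, 4, 6]
p has_two # true
```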

Let’s take a simple example and go through it.

e = MyEnumerator.new do |y|
  y << 1
end

e.each { |i| pp i } # Will print 1 to the screen

  • e stores code block A. All it does is send one value, 1, to the yielder.
  • e.each is called with code block B. All that does is print whatever is passed to it. It should be called only once, with 1.
  • e.each, inside its method body, calls code block A with a new yielder.
  • yielder.<< is called in code block A, which calls C with the value 1, which yields the value to B.

This approach also handles the more complicated cases of break/next in the code block passed to each. For example, the take method that we used above can be thought of simply as:

def take(n)
  arr = []

  each do |i|
    break if arr.size == n

    arr << i
  end

  arr
end

If we now consider our original example:

e = MyEnumerator.new do |y|
  (1..).each { |i| y << i }
end

e.take(100).inject(&:+)

Every time we do y << i, we yield the value to the block that take passes to each, which stores it in arr. When that block finally calls break, the each call is terminated, which stops the execution of the initial code block with the infinite series, and take then returns arr.
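
The effect of break can be observed directly. The sketch below uses the built-in Enumerator to stay short, on the assumption that MyEnumerator follows the same control flow. The counter and the ensure clause are additions for illustration: they show that the infinite loop produces only as many values as were consumed, and that the generator block is unwound when break fires.

```ruby
produced = 0
cleaned_up = false

e = Enumerator.new do |y|
  begin
    (1..).each do |i|
      produced += 1
      y << i
    end
  ensure
    # Runs when break unwinds the stack through this block.
    cleaned_up = true
  end
end

seen = []
e.each do |i|
  seen << i
  break if seen.size == 3
end

p seen       # [1, 2, 3]
p produced   # 3
p cleaned_up # true
```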

Summary

There you have it, the basics of enumerators. Here’s the link to the official docs and a Ruby implementation in Rubinius in case you’re interested. Also, just a note: in real-world usage, you’ll find to_enum much more useful for converting your enumerable methods into enumerators that can be chained.
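
As a rough sketch of the to_enum pattern: the Countdown class below is hypothetical and exists only for this example. Its each method returns an Enumerator when called without a block, which is what makes chaining and external iteration possible.

```ruby
# Hypothetical class, for illustration only.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    # Without a block, return an Enumerator that calls each again later.
    return to_enum(:each) unless block_given?

    @from.downto(1) { |i| yield i }
    self
  end
end

countdown = Countdown.new(3)

enum = countdown.each                   # an Enumerator, nothing has run yet
pairs = countdown.each.with_index.to_a  # chaining on the returned Enumerator
first = enum.next                       # external iteration

p pairs # [[3, 0], [2, 1], [1, 2]]
p first # 3
```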

This post is licensed under CC BY 4.0 by the author.