TLDR
- Ruby has an Enumerator class, which allows us to make any object, or even a piece of code, enumerable.
- These enumerators are only evaluated when used, and thus can contain infinite series.
- To make them work, we make it so that the yielder calls a code block whenever data is yielded to it. That code block in turn yields data to the enumerator’s each method.
Enumerator basics
Ruby allows us to make anything enumerable. e.g.
# y is a yielder; y.<< and y.yield can be used to push data into the enumerator
e = Enumerator.new do |y|
  (1..).each { |i| y << i }
end
The above piece of code creates an enumerator over all natural numbers. Enumerator includes Enumerable and thus gives us all of the goodies like map, filter, any?, all?, etc. just by defining the each method. Values are produced only on demand, so an enumerator can represent an infinite series, as long as we consume it with methods that stop early, such as take or first. We can use it to test the basic maths we studied in school.
e.take(100).inject(&:+) # Should be the sum of the first 100 natural numbers, which we know is n * (n + 1) / 2 => 5050
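To make the on-demand behaviour a little more concrete, here is a small sketch using the built-in Enumerator. The constant name NATURALS is just an illustrative choice, and the endless range assumes Ruby 2.6 or newer.

```ruby
# Endless ranges such as (1..) need Ruby 2.6 or newer.
NATURALS = Enumerator.new do |y|
  (1..).each { |i| y << i }
end

# first and take stop the block as soon as they have enough values.
p NATURALS.first(5)       # => [1, 2, 3, 4, 5]
p NATURALS.take(100).sum  # => 5050

# A plain NATURALS.map { ... } would never return on an infinite series.
# Enumerator#lazy defers map/select until values are actually requested.
p NATURALS.lazy.map { |i| i * i }.select(&:even?).first(3)  # => [4, 16, 36]
```

Each call starts the block again from the beginning, which is why the same enumerator can be reused for all three expressions.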
Let’s build our own
We’ll build a very basic version that makes the examples above work. We’ll skip error checks of any kind, since they would only increase the code size without adding much to our understanding. This should still let us see the basics of how enumerators work. We’ll also reiterate the concept several times, as I find that helps a lot when learning something new.
We need our enumerator to do the following:
- Take a code block, which is passed a yielder.
- Pass the values pushed to the yielder on to the code block given to its each method.
A basic enumerator that takes a code block can look like this:
class MyEnumerator
  include Enumerable

  def initialize(&block)
    @block = block
  end
end
Now we need to call this block each time each is called, and pass in a yielder.
# after initialize in MyEnumerator
def each
  @block.call(Yielder.new)
end

class Yielder
end
This yielder needs to support at least the << method, which should yield the given value to the block passed to the each method of MyEnumerator. This is the fun part of the code.
# after initialize in MyEnumerator
def each
  # p forwards whatever it is called with to the block given to each
  p = proc do |*args|
    yield(*args)
  end
  @block.call(Yielder.new(&p))
end

class Yielder
  def initialize(&block)
    @block = block
  end

  def <<(*args)
    @block.call(*args)
  end
end
This is it. This is the entire magic. It all rests on one basic concept: when we call yield within a method body, the value is yielded to the code block passed to that method. The yield can be nested inside a proc, but it will still yield to the code block of the method in which the proc is defined. Thus, in each, the proc p yields to the code block each was called with, even if p itself is called somewhere else. So, when << is called on the yielder, the proc p is called, and the value is then passed to the code block of the each method.
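Putting the pieces together, the whole thing might look like the following. This is only a sketch of the approach described in this post, not how Ruby itself implements Enumerator. Two small liberties are taken: the proc is named forward rather than p, and << returns self so that pushes can be chained (an addition for convenience, not something the steps above require).

```ruby
class Yielder
  def initialize(&block)
    @block = block
  end

  def <<(value)
    @block.call(value)
    self # assumption: returning self allows chaining, e.g. y << 1 << 2
  end
end

class MyEnumerator
  include Enumerable

  def initialize(&block)
    @block = block
  end

  def each
    # The yield inside this proc targets the block given to each,
    # no matter where the proc ends up being called from.
    forward = proc { |*args| yield(*args) }
    @block.call(Yielder.new(&forward))
  end
end

numbers = MyEnumerator.new { |y| y << 1 << 2 << 3 }
p numbers.map { |i| i * 10 } # => [10, 20, 30]
```

Because MyEnumerator includes Enumerable, map, select, take and friends all come for free once each is in place.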
Let me also walk you through how it works.
- When MyEnumerator.new is called, it simply stores whatever code block is given to it. Let’s call it code block A.
- When e.each is called with a code block B, it does the following fun thing.
  - It calls code block A with a new instance of a yielder.
  - The yielder is created with its own code block C, which simply yields whatever it is called with to code block B. This is the important part, so I’ll reiterate. Whenever code block C is called, it yields its result to code block B. And what is code block B but the code block passed to each? And what is code block C ever called with? Yup, the values we want to pass to the yielder. So, we end up with the convenient interface of a yielder, which passes values to the code block that each is called with.
Let’s take a simple example and go through it.
e = MyEnumerator.new do |y|
  y << 1
end

e.each { |i| pp i } # Will print 1 to the screen
- e stores code block A. All it does is send one value, 1, to the yielder.
- e.each is called with code block B. All that does is print whatever is passed to it. It should be called only once, with 1.
- e.each, inside its method body, calls code block A with a new yielder.
- yielder.<< is called in code block A, which calls C with the value 1, which yields the value to B.
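If you’d like to watch those steps happen, one option is a tracing variant that records each hop in a log. TracingEnumerator, TracingYielder and the log messages below are hypothetical names made up for this sketch; the mechanics are the same as in MyEnumerator.

```ruby
class TracingYielder
  def initialize(log, &block_c)
    @log = log
    @block_c = block_c
  end

  def <<(value)
    @log << "yielder << #{value}: calling C"
    @block_c.call(value)
    self
  end
end

class TracingEnumerator
  include Enumerable
  attr_reader :log

  def initialize(&block_a)
    @block_a = block_a # code block A
    @log = []
  end

  def each
    # Code block C: forwards its argument to code block B via yield.
    block_c = proc do |value|
      @log << "C: yielding #{value} to B"
      yield(value)
    end
    @log << "each: calling A with a new yielder"
    @block_a.call(TracingYielder.new(@log, &block_c))
  end
end

traced = TracingEnumerator.new { |y| y << 1 }         # code block A
traced.each { |i| traced.log << "B: received #{i}" }  # code block B
puts traced.log
```

The log comes out in the order each, yielder, C, B, which matches the walkthrough.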
This approach also handles the more complicated cases of break/next in the code block passed to each. For example, the take method that we used above can be thought of simply as:
def take(n)
  arr = []
  each do |i|
    break if arr.size == n
    arr << i
  end
  arr
end
If we now consider our original example:
e = MyEnumerator.new do |y|
  (1..).each { |i| y << i }
end

e.take(100).inject(&:+)
Every time we do y << i, we yield the value to the block given to each, which in turn stores it in arr in the take method. When we finally call break in that block, it exits the each call inside take, unwinding through the yielder and the initial code block, and thus stops the execution of the infinite series.
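Here is a small sketch that checks this claim. It repeats compact copies of the two classes so it runs on its own; run_break_demo is a hypothetical helper written for this illustration, and it uses an explicit each with break rather than take so that nothing depends on how Enumerable#take is implemented internally.

```ruby
class Yielder
  def initialize(&block)
    @block = block
  end

  def <<(value)
    @block.call(value)
  end
end

class MyEnumerator
  include Enumerable

  def initialize(&block)
    @block = block
  end

  def each
    forward = proc { |*args| yield(*args) }
    @block.call(Yielder.new(&forward))
  end
end

# Hypothetical helper: returns what the consumer took, what the producer
# generated, and whether the producer's ensure clause ran.
def run_break_demo(limit)
  produced = []
  cleaned_up = false

  e = MyEnumerator.new do |y|
    begin
      (1..).each do |i|
        produced << i
        y << i
      end
    ensure
      cleaned_up = true # runs when break unwinds through this block
    end
  end

  taken = []
  e.each do |i|
    taken << i
    break if taken.size == limit
  end

  [taken, produced, cleaned_up]
end

p run_break_demo(3) # => [[1, 2, 3], [1, 2, 3], true]
```

The producer generates exactly as many values as the consumer accepts before breaking, and its ensure clause still runs on the way out.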
Summary
There you have it, the basics of enumerators. Here’s the link to the official docs and a Ruby implementation in Rubinius in case you’re interested. Also, just a note: in real-world usage you’ll find that to_enum is much more useful for converting your enumerable methods into enumerators that can be chained.
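As a rough illustration of that last point, here is the common to_enum idiom. Countdown is a hypothetical class invented for this sketch; the only library calls used are Object#to_enum, Enumerator#with_index and Enumerator#next.

```ruby
# Hypothetical example class, for illustration only.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    # Without a block, hand back an enumerator over this same method.
    return to_enum(:each) unless block_given?

    @from.downto(1) { |i| yield i }
    self
  end
end

countdown = Countdown.new(3)
p countdown.each.next # => 3
p countdown.each.with_index.map { |n, idx| "#{idx}: #{n}" }
# => ["0: 3", "1: 2", "2: 1"]
```

With this in place, each behaves like the built-in collections: call it with a block to iterate, or without one to get an enumerator that can be chained.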