I gave a talk on Ruby’s Enumerators at the DC Ruby Users Group in August. I’ve posted my slides from the presentation, if you’re interested. Basically, I discovered that enumerators make lazy evaluation easy to implement in Ruby, and applying lazy techniques with enumerators may yield more efficient and elegant code.
I count myself lucky that I can earn a living programming with Ruby and in particular with the Ruby on Rails framework. But every tool, even the best, has a few surprises up its sleeve. Sometimes you get hung up on something quirky, but more often you’re having a problem with a feature—something that’s there by design. This is my story about one such problem I had with a Rails feature, how I diagnosed it, and what it means: metaprogramming is awesome, but it comes at a price, and in imperative languages, the price is side effects.
Nothing close to a complete review of all the quality talks I heard at RubyNation, what follows is just a summary of the cool things I learned while attending.
For other resources, and some of my own snarky comments, take a look at the Twitter Channel for the event: #rubynation
TupleSpaces and Rinda
Luc Castera gave an excellent presentation about concurrency and distributed programming in Ruby. After arguing that true concurrency with Ruby’s built-in threads is basically a myth, he laid out several alternative approaches and applications. The most interesting was using Rinda to implement TupleSpaces in Ruby. Rinda can run as a service listening on a port, and it can manage a queue of messages to be processed by a TupleSpace, which is basically an associative value store, or a mini-environment. Other processes can monitor the contents of the spaces and pull data from them to run operations. This is the actor model: the spaces share nothing and communicate asynchronously with messages.
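To make the write/take rhythm concrete, here’s a toy, in-process sketch of the tuple-space idea — not Rinda itself (Rinda’s real Rinda::TupleSpace offers a similar write/take API over DRb), and all the names here are mine:

```ruby
# Toy tuple space: producers write tuples into a shared space; a consumer
# blocks until a tuple is available, then removes it. The two threads
# share nothing and communicate only through messages.
class ToySpace
  def initialize
    @queue = Thread::Queue.new   # thread-safe FIFO from Ruby core
  end

  # Add a tuple to the space.
  def write(tuple)
    @queue << tuple
  end

  # Block until a tuple is available, then remove and return it.
  def take
    @queue.pop
  end
end

space  = ToySpace.new
worker = Thread.new { space.take }   # consumer blocks, waiting for work
space.write([:render, "page-1"])     # producer sends a message
worker.value  # => [:render, "page-1"]
```

Real Rinda adds pattern matching on tuples and network distribution; this just shows the blocking, message-passing shape of the model.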
Take a look at Luc’s slides for all the info. Great talk. I’m going to look into Rinda as soon as I can.
Reek: A code complexity metric tool
Mentioned during Mark Cornick’s excellent talk on refactoring, reek is a tool to measure the perceived complexity of your Ruby code. Reek doesn’t like repetition, and it really hates long methods. It certainly doesn’t like my code: reek returned 41 warnings about my latest controller. More about reek on github.
Micronaut
From Relevance Labs comes a homegrown BDD framework that ended up embracing RSpec while promising to make your tests run faster. Use metadata to group and target your tests, and you may never have to wait for an hour for all your specs to run. According to Aaron Bedra, it’s also beautiful code to look at: a good example of metaprogramming in only 2 KLOC. And someday soon its tiny heart will beat inside RSpec proper. More on micronaut at Relevance’s blog.
Enumerators
How did I miss this? Already available in Ruby 1.8, these constructs become even more important in 1.9. They’re essentially lazy data implementations. Roughly:
>> r = [1,2,3,4].cycle
=> #<Enumerator #f00ba2>
>> r.take(5)
=> [1,2,3,4,1]
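Enumerators can also define infinite sequences from scratch. A sketch using the 1.9-style Enumerator.new with a yielder block (the names are mine):

```ruby
# An infinite Fibonacci sequence as an Enumerator: nothing is computed
# until a consumer like #take actually asks for elements.
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a       # hand one value to the consumer, then pause here
    a, b = b, a + b
  end
end

fib.take(6)  # => [0, 1, 1, 2, 3, 5]
```

The loop is infinite, but take(6) only forces six steps of it — the lazy evaluation the slides are about.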
Directed Graph Data Stores
Holy Cow! I’ve only just learned the difference between Tokyo Cabinet (it’s a key-value store) and CouchDB (document-based data store). Ben Scofield, in his talk on categorizing comic books, explained these both but also brought up Directed Graphs, which I haven’t thought about since SICP. There don’t seem to be any highly-visible implementations of this storage method, although I believe that RDF models data as directed graphs. This is going on the top of my research list. Cool stuff.
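In miniature, a directed-graph store is just nodes plus one-way edges. A sketch as a Ruby hash of adjacency lists — the comic-book data is made up for illustration:

```ruby
# A tiny directed graph: each key maps to the nodes it points at.
# Edges are one-way, which is the whole point of the model.
graph = Hash.new { |h, k| h[k] = [] }

def add_edge(graph, from, to)
  graph[from] << to
end

add_edge(graph, "Batman", "Detective Comics")  # character -> appears in title
add_edge(graph, "Batman", "JLA")
add_edge(graph, "JLA", "Superman")             # team -> includes character

graph["Batman"]  # => ["Detective Comics", "JLA"]
graph["JLA"]     # => ["Superman"]
```

RDF triples (subject, predicate, object) are essentially labeled edges in a graph like this one.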
A Wildcard: Reia
There was a lot of buzz about the new Reia programming language, but little concrete information. The estimable Hal Fulton gave a talk on the Ruby-like language for the Erlang VM, but even he didn’t seem to know much about it. Its creator used to run Revactor, a Ruby framework that implemented the Actor model, and a language with Erlang’s spooky power and Ruby’s “curb appeal”, as Russ Olsen has it, would be superb. But it seems we’re a long way from being able to evaluate whether Reia will make a mark.
Lingo and Memes
Did anyone else hear the phrase “bikeshop it” several times during different talks? And how about the rainbows and puppies in Ben Scofield’s talk? I didn’t make it to Danny Blitz’s talk on Tiger Teams (Raarrr!), so I’d warrant I missed out on some great terms.
My Clojure Lightning Talk
I put together a 6-minute talk on Clojure, with Paul Barry’s help. The slides are here. I wish I’d also been able to mention type hints, for when you want to speed things up with static types; metaprogramming with macros and first-class functions; and Clojure’s robust metadata system. Maybe I’ll get to give another version of it again.
Offered for your approval, one attendee’s observations of JSConf 2009, held this April in Rosslyn, Virginia.
First, propers are due to Chris and Laura Williams for setting this up. A conference for JavaScript is an idea whose time has come. The speakers were outstanding. The food and after-party were fantastic, too. If there’s a JSConf 2010, I’ll be the first to sign up.
On the Clojure Google group there’s been a good discussion about lazy sequences and streams. The exchange goes to some of the fundamentals of the language, but if I have a decent grip on the issues, it sounds as though Mark Engelberg is concerned about the inefficiency of caching the results of evaluating n elements of a sequence like (def whole-numbers (iterate inc 0)). Here’s a bit from an excellent blog post in which he lays out his points:
No one would ever want to cache the whole-numbers sequence. It’s significantly faster to generate the sequence via incrementing every time you need to traverse the sequence, than it is to cache and hold the values. Furthermore, because the sequence represents an infinite sequence, there’s no upper limit to the memory consumption.
Engelberg would prefer non-cached lazy sequences, better described as streams, for most operations. Rich Hickey reveals that he in fact has an implementation of streams more or less up his sleeve, but he’s been holding back on adding it to Clojure’s core because streams bring their own issues and because it sounds non-trivial to add them; it would likely postpone the language’s 1.0 release. That’s my disco summary: please read Mark’s post and the whole discussion thread for everything I’ve missed and left out.
My own contribution to this debate might be to point out that there are plenty of scenarios in which caching is crucial to the construction of procedures. As I often do, I go back to SICP for examples. In the chapter on streams, Abelson and Sussman discuss how nice it would be to write recursive procedures, gaining the simplicity and readability of recursive code while keeping iterative efficiency, that is, linear growth of memory and execution time. They claim that laziness gets them that. As illustration, they provide a recursive definition of the Fibonacci sequence, which I here translate into Clojure:
> (def fibs (lazy-cons 0 (lazy-cons 1 (map + (rest fibs) fibs))))
#'user/fibs
> (take 10 fibs)
(0 1 1 2 3 5 8 13 21 34)
Without caching, this would be a bear. With caching, getting the nth Fibonacci number should be a matter of one addition operation. And we get beautiful code. Sure, this is an academic example, but consider how powerful the one-line definition is. It reads just like the natural language definition of the sequence: the first term is 0, the next is 1, and each successive term is the sum of the two previous terms. Awesoma powa.
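The payoff of caching shows up outside Clojure, too. Here’s the same memoization idea sketched in Ruby (my own illustration, not a translation of the Clojure above):

```ruby
# Memoized Fibonacci: the Hash's default block computes each term once,
# stores it, and later lookups are free -- linear work instead of the
# exponential blowup of naive recursion.
fibs = Hash.new { |cache, n| cache[n] = n < 2 ? n : cache[n - 1] + cache[n - 2] }

(0...10).map { |n| fibs[n] }  # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Without the cache, each term would recompute its entire history; with it, every term costs one addition, which is exactly the trade Rich’s cached lazy sequences make.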
Rich’s main worry about streams seems to be that since Clojure promises to let you treat many Java structures as sequences, it can’t be sure it is wrapping a source of immutable data. Between two evaluations of a lazy sequence, something could change in the Java object, and then the n elements of the sequence that your two evaluations share could be different.
That isn’t supposed to happen. With caching, it wouldn’t. Evaluating the first ten elements in a sequence like this fixes them for as long as the sequence is in scope. Later evaluating the first twenty, you will get the same items 0 through 9 as you got before.
So streams would not promise immutability, necessitating a separate abstraction interface from sequences and complicating the current everything-is-a-sequence/seq-able beauty of Clojure.
Perhaps there are other ways to get non-caching performance out of sequences. For example, some Clojure data structures are lazy without using lazy-cons. Stephen Gilardi explains how a clojure.lang.Range object responds to rest with another Range. Something like (doall (range 10000000000)) actually discards each of these “rest” Ranges as it goes. Maybe if you’re careful about your collections and the operations you perform on them, you can get better performance now without streams. Perhaps new seq-able structures that iterate like Range can help enough that we can avoid muddying the language.
Outside a purely functional setting, that is, when side effects matter, forms like dorun and doseq, which don’t retain the head of the sequence, should provide speed and memory efficiency too.
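Head retention has a rough Ruby analogue — a sketch of the distinction, not a claim about Clojure’s internals: traversing for side effects keeps nothing, while collecting holds every element.

```ruby
# Side-effect traversal: each yields one element at a time and retains
# nothing, like dorun/doseq walking a seq without holding its head.
total = 0
(1..1_000).each { |i| total += i }
total  # => 500500

# Collecting traversal: map materializes the whole result in memory,
# like holding the head of a realized lazy sequence.
squares = (1..1_000).map { |i| i * i }
squares.size  # => 1000
```

The Range itself never becomes an array in the first case; only in the second does the full collection live in memory at once.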
Finally, as Mark pointed out, there’s scoping to consider. If you don’t bind your sequence to a name, then as soon as it’s evaluated, the JVM’s garbage collector should be able to clean it up. Not ideal, but at least the GC is very efficient, and the unneeded sequence will be unlinked quickly.
I’m curious to see what happens next with streams in Clojure. It will be hard to make everyone happy either way.
The DC Clojure Study Group is digging into Clojure. One of our members, Luke VanderHart, contributed a program that parses a text file to build a Markov Chain generator: it will produce an arbitrarily long text that should mimic the style of the file you gave it by basing its output on the proximity of words or characters in the file. Cool stuff.
What I’m posting here is a close reading and interpretation of Luke’s code. I’d like to understand it myself and elucidate it for others, and I want to identify ways to add future lessons about Clojure into its code. For example, how could, or should, concurrency be used to increase the efficiency or modularity of the creation of the index at the heart of the generator? As it stands now, the code makes excellent use of lazy sequences and Java interop for reading the file. So let’s roll up our sleeves and dig in.
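This isn’t Luke’s code, but the core of a word-level Markov generator fits in a few lines — build an index of which words follow which, then walk it, sampling a successor at each step. The names and the sample text below are mine:

```ruby
# Map each word to the list of words observed immediately after it.
def build_index(text)
  index = Hash.new { |h, k| h[k] = [] }
  text.split.each_cons(2) { |a, b| index[a] << b }
  index
end

# Generate up to `length` words by repeatedly sampling a successor of
# the last word emitted; stop early if a word has no known successors.
def generate(index, start, length, rng = Random.new(42))
  out = [start]
  (length - 1).times do
    successors = index[out.last]
    break if successors.empty?
    out << successors.sample(random: rng)
  end
  out.join(" ")
end

index = build_index("the cat sat on the mat the cat ran")
generate(index, "the", 5)  # a short run in the style of the source text
```

Because repeated successors stay in the list (index["the"] holds "cat" twice here), sampling naturally reproduces the source’s word frequencies, which is what makes the output mimic its style.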
I’m reading the news on Twitter that a remarkable and wonderful thing has happened in the world of Ruby web developers: Rails and Merb are going to merge. Here are announcement posts from DHH and Yehuda Katz. And here is the requisite humorous site tracking whether the two are combined. As Matz says (by way of bryanl), “Love matters. It’s the greatest reason behind Ruby.”
Besides the encouragement of seeing two camps of alpha-geeks put ego aside for the common good, I think the rapprochement is tremendously good for the Ruby world and for web developers. The two camps are agreeing to take what’s best about each platform and put it in the new one. As a Rails developer, I get the Merb team’s improvements and advancements without having to choose a different platform. Merb developers get to practice their techniques within the Rails juggernaut and become Rails performance experts in the process. 2009 is already looking better.
People approaching a new language, particularly a new Lisp, often explore it by attempting to implement examples and exercises from canonical Lisp texts. Paul Graham’s On Lisp, Peter Seibel’s Practical Common Lisp, and Abelson and Sussman’s SICP are the usual suspects. I’m not going to attempt any large-scale “translation” of SICP into Clojure, but I do think section 3.5, on streams, sheds a lot of light on the lazy sequences in the new language.
I haven’t posted about Ruby in a while, in part because I’m excited about learning Clojure and (possibly) Haskell with a bunch of fellow language geeks (er, software professionals). But I do >50% of my daily work in Ruby, and I do think about it a lot. Today I was reminded how much useful knowledge and mastery lies hidden in the corners of constructs one may think one knows well. The reminder came to me courtesy of Hash#collect, which returns an array, not a hash. Since I needed a hash back from an operation on a hash, it was time to do a little digging and a little learning.
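For the record, the shape of the problem and the idioms I dug up — collect (a.k.a. map) on a Hash hands back an array of [key, value] pairs, and Hash[] or inject turns pairs back into a hash:

```ruby
h = { "a" => 1, "b" => 2 }

# collect/map on a Hash yields an Array of [key, value] pairs...
pairs = h.collect { |k, v| [k, v * 10] }
pairs        # => [["a", 10], ["b", 20]]

# ...so to get a Hash back, feed the pairs to Hash[]:
Hash[pairs]  # => {"a" => 10, "b" => 20}

# or accumulate one directly with inject:
h.inject({}) { |acc, (k, v)| acc[k] = v * 10; acc }  # => {"a" => 10, "b" => 20}
```

Both routes do the same work; Hash[] reads better when you already have the pairs, inject when you want to filter or transform as you go.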
Members of Fringe DC are organizing a Washington DC area study group around learning Clojure and hacking some righteous artifacts with it. The first meeting is Sunday, December 7, at 1PM at Chief Ike’s Mambo Room, 1725 Columbia Rd NW. We’ll be meeting in person for about 3 hours every 2 or 3 weeks, and we’ll have an online forum for collaboration and contributions. Interested parties are welcome to come by. See some pointers below to prepare yourself so you can make the most of our time together.