-
May 5th, 2008
There you go.
require 'rubygems'
require 'xmpp4r'
require 'xmpp4r/roster'
puts 'Connecting ...'
client = Jabber::Client.new(ARGV[0]).connect('talk.google.com')
client.auth(ARGV[1])
puts 'Receiving'
roster = Jabber::Roster::Helper.new(client)
# Print a line whenever a contact's status message changes.
roster.add_presence_callback do |roster_item, old_presence, new_presence|
  if new_presence
    from = roster_item.iname || "#{new_presence.from.node}@#{new_presence.from.domain}"
    if new_presence.status
      puts "#{from}: #{new_presence.status}"
    end
  end
end
client.send(Jabber::Presence.new)
# Park the main thread; the callbacks keep running until you interrupt.
Thread.stop
client.close
Run from the command line:
$ ruby distwit.rb <jabber_id> <password>
Connecting …
Receiving
Andre Lewis: Away
Matthieu Riou: Enjoying a JavaOne couch
Matthieu Riou: Entertaining Assaf
Alexis Midon: hacking in a couch @community-one
Now just wait for your friends to status away using their IM client.
And don’t forget, please yo-yo the rrm.
Posted by Assaf
Filed in general, ruby
-
February 28th, 2008
Sam Ruby on the trials of getting Ape to run from SVN:
If you do manage to check the file out, you see a Rakefile. This would imply that one might want to use rake. Trying it results in a message “You are missing a dependency required for meta-operations on this gem. No such file to load -- echoe”
Unfortunately that’s the case with too many Ruby projects.
When you gem install, you get all the necessary runtime dependencies; svn checkout doesn’t have quite the same effect. And starting with the gem before diving head into source doesn’t always work: runtime and build dependencies are not the same.
What’s wrong with a few undocumented, tedious, manual steps? As the corollary to Blodgett’s First Law says:
Any step in a process that could be automated must be automated.
We ran into the same problem in Buildr and, as luck would have it, fixed it today. Victor Hugo Borja came up with the code snippet below that works on both RubyGems 0.9.5 and 1.1.0.
Now all you have to do is svn checkout, followed by rake setup and you’re good to go.
desc "If you're building from source, run this task first to setup the necessary dependencies."
task 'setup' do
  gems = Gem::SourceIndex.from_installed_gems
  # Runtime dependencies from the Gem's spec.
  dependencies = spec.dependencies
  # Add build-time dependencies, like this:
  dependencies << Gem::Dependency.new('highline', '~>1.4')
  dependencies.each do |dep|
    if gems.search(dep.name, dep.version_requirements).empty?
      puts "Installing dependency: #{dep}"
      begin
        require 'rubygems/dependency_installer'
        Gem::DependencyInstaller.new(dep.name, dep.version_requirements).install
      rescue LoadError # < rubygems 1.0.1
        require 'rubygems/remote_installer'
        Gem::RemoteInstaller.new.install(dep.name, dep.version_requirements)
      end
    end
  end
end
Most likely your Rakefile will not load at all without some of these dependencies, so code defensively:
begin
  require 'highline'
rescue LoadError
  puts 'HighLine required, please run rake setup first'
end
Check for build dependencies early and often, and fix the Rakefile when necessary. Your community will thank you.
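One way to check early is to probe every build-time library up front and report all the missing ones in a single message. This is only a minimal sketch: the BUILD_LIBS list and the missing_libs helper are names made up for illustration, not part of Rake or RubyGems.

```ruby
# Hypothetical list of libraries this Rakefile needs at build time.
BUILD_LIBS = %w[highline echoe]

# Returns the libraries that cannot be loaded, instead of
# aborting on the first failed require.
def missing_libs(libs)
  libs.reject do |lib|
    begin
      require lib
      true
    rescue LoadError
      false
    end
  end
end

missing = missing_libs(BUILD_LIBS)
unless missing.empty?
  puts "Missing build dependencies: #{missing.join(', ')}. Please run rake setup first."
end
```

Tasks that need those libraries can then bail out with a readable message rather than a stack trace.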
Posted by Assaf
Filed in general, ruby
-
February 17th, 2008
Cover your eyes, you’re about to see some blatant XML abuse. With your eyes closed, create the file /Library/LaunchDaemons/ruby.gem.server.plist and place this into the file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key><string>ruby.gem.server</string>
    <key>ProgramArguments</key><array><string>/usr/bin/gem</string><string>server</string></array>
    <key>KeepAlive</key><true/>
    <key>RunAtLoad</key><true/>
  </dict>
</plist>
Save and open your eyes. From the command line, change ownership on the file, and load to launchd:
$ chown root:wheel /Library/LaunchDaemons/ruby.gem.server.plist
$ launchctl
launchd% load /Library/LaunchDaemons/ruby.gem.server.plist
launchd% exit
Open your browser to http://localhost:8808/. Bask in the glory.
Update: Or you can just use Lingon which reduces all of this to a few clicks, no need for XML or chown heroics (thanks Matt).
Posted by Assaf
Filed in general, ruby
-
January 28th, 2008
First, a quick announcement. We’re wrapping up Ruby In Practice and heading towards final review. Finally.
If you already bought the early access edition from Manning, you’ll get a new PDF with the remaining chapters. If you didn’t, Pat Eyler is running a blogging contest and giving away a free copy. The rules are simple: blog about your Ruby In Practice adventures, and let Pat know about it. I’d love to hear about your experience with Ruby.
With so many Ruby books on the shelf, we decided not to focus on the language itself, but cover the things you use Ruby for. Most of the topics we cover in Ruby In Practice deal with business applications. Yes, that dreaded enterprise stuff. But why not, when Ruby helps you do things simpler and better?
If you’re still sizing up Ruby for your next project, here’s a few things we get to cover in the book (and a couple we don’t).
HTTP and REST
Although some people confuse them, HTTP and REST are not one and the same, and we keep to that distinction.
We have one section dealing with HTTP and covering the basics: URI, open-uri, Net::HTTP, Mongrel. The REST part expands on it to deal with resources and representations, and does so using Rails 2.0 and ActiveResource.
One reason we picked Rails is the wonderfully pragmatic way it uses HTTP and REST. It’s a smart application of Web Architecture practices, one I can point people to and say “here, learn by example”. Even if Rails is not your cup of tea, I recommend looking at it for inspiration and ideas. Other people’s experience, amplified.
SOAP
Wish I could say the same about SOAP. Let me summarize SOAP4R: it smells like Java code built on a Monday morning by an EJB coder.
In its defense, once you get SOAP4R working, it does what you expect it to do. It’s just not pretty, simple or fast about it. SOAP4R is WS-I compliant, so you can use it to interoperate with Java and .Net. In the book we show you how to use it effectively, but clearly SOAP4R is not going to win you over to the WS-* side. If anything, you’ll learn to appreciate the simplicity of REST even more.
There’s also ActionWebServices. Although it cleans up the SOAP4R API, it does so by using Ruby annotations and focusing on RPC-style services. If you are using SOAP, at least do it right with contract-first design and doc/literal messaging. We start the example with a WSDL service definition which we use to generate the client and server code, and SOAP4R was the best option we’ve got.
Two recent additions that appeared too late for inclusion in the book are WSF/Ruby (Axis/C with Ruby bindings) and using JRuby in combination with Axis/J.
Axis has killer support for SOAP, WSDL and various other WS-* specs, not to mention performance and stability. All good things, but right now both options mean working with the low-level APIs. It feels more like coding in C and not what you’d expect to find in Ruby.
For the motivated, here’s a cool project idea. Build a Ruby Goodness WS-* API that can use SOAP4R, WSF/Ruby or Axis/J as the underlying stack. I’d love to see something like that.
And speaking of Ruby Goodness APIs for enterprise messaging …
WebSphere MQ
WebSphere MQ is one software product that captures so well the essence of Enterprise computing. That thing has more knobs and switches than a Boeing 747 cockpit, legacy support going back to the ENIAC, and command line tools that only its mother can love. It comes with an Eclipse-based management tool full of usability mistakes that refuses to admin anything (on my machine, at least).
On the other hand, it’s WebSphere MQ. It has all the options you’ll ever need, broad platform support from PC up to the biggest meanest of mainframes, and everything can be managed from the command line, or scripted. So what’s not to love?
I know what you’re thinking: WebSphere MQ and Ruby are opposite extremes, and putting both in the same room would be a hilarious blind date gone bad. It turns out the two have great chemistry.
J. Reid Morrison developed a beautiful Gem, literally, and in the span of a few hours working with it, RubyWMQ became my favorite add-on feature for WebSphere MQ.
RubyWMQ is a thin wrapper around the C client library. I always preferred working with the WMQ API over the complex abstraction that is JMS. Call WMQ::QueueManager.connect to establish a new connection, WMQ::Message.new to create a new message. No JNDI lookups, factories, sessions, just WMQ served straight up in all its glory.
Here’s the surprising part. Even though the entire thing is written as a C wrapper on top of a C API, it has all the right Ruby idioms that make it such a productive pleasure to use.
One example we get to cover in the book:
queue.each(:sync=>true) do |message|
  puts message.descriptor[:msg_id]
  puts message.data
end
Guess what happens when the processing block raises an exception? It moves to process the next available message, while letting WebSphere MQ deal with re-delivery, backout and dead-letter queuing of the failed message.
The simplest processing loop you’re going to write will buy you the reliability that WebSphere MQ is known for. No heavy-weight app servers, no XML inversions, just simple, intuitive code.
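To make that behaviour concrete without a queue manager at hand, here is a toy in-memory simulation. It is not the RubyWMQ API: ToyQueue, backout_threshold and dead_letters are invented for illustration, and with the real thing the backout and dead-letter handling is done by WebSphere MQ rather than in Ruby.

```ruby
# Toy stand-in for a transactional queue; NOT RubyWMQ.
class ToyQueue
  attr_reader :dead_letters

  def initialize(messages, backout_threshold = 2)
    @pending = messages.map { |data| { :data => data, :attempts => 0 } }
    @backout_threshold = backout_threshold
    @dead_letters = []
  end

  # Yields each message. If the block raises, the message is put back for
  # re-delivery, or dead-lettered once it has failed too many times, and
  # the loop carries on with the next message.
  def each
    until @pending.empty?
      message = @pending.shift
      begin
        yield message[:data]
      rescue StandardError
        message[:attempts] += 1
        if message[:attempts] >= @backout_threshold
          @dead_letters << message[:data]
        else
          @pending.push(message)
        end
      end
    end
  end
end

processed = []
queue = ToyQueue.new(%w[good bad also-good])
queue.each do |data|
  raise ArgumentError, 'cannot process' if data == 'bad'
  processed << data
end
puts processed.inspect
puts queue.dead_letters.inspect
```

The processing block never has to know about retries: it either finishes or raises, and the queue decides what happens to the message.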
It gets better.
ActiveSalesForce
Knowing what a pain WebSphere MQ is, I set aside enough time for this exercise. Installing the beast took forever, but once I got to the Ruby part, everything just flowed and I ended up with time to spare. So I decided to up the stakes.
The WMQ example consisted of a small Ruby program that collects XML messages from various applications and stores them in the database. What would it take to create these records in SalesForce?
One line of code and one configuration change later, and I’m running the program with messages coming over WebSphere MQ and sales leads popping up in my SalesForce dashboard.
Couldn’t be simpler.
Wrap Up
When it comes to Ruby, REST support is ahead of the curve, SOAP support is as bad as in most other languages, and nothing like what you can expect with Java/.Net.
You’re also going to be disappointed if you live for the rush of setting up an ESB that can JMS messages on one end and JCA on the other, coding for generic APIs, and writing XML configurations like there’s no tomorrow. Ruby favors simple steps to get the job done over big infrastructure purchases.
But how well do you know Ruby? Just a scripting language that can also serve Web pages? Think again. Ruby got skills, the little language can do serious heavy lifting. How many languages do you know that can wire your message bus to online services with just a few lines of code?
Posted by Assaf
Filed in ruby
-
December 17th, 2007
1999 called, they want their CGI scripts back.
I think that pretty much sums up people’s response to the horror that is the SimpleDB API. By which, I mean the few of us who can tell what’s wrong with it. Which sadly is not that many people.
But offending it is, and much like MySpace, I can’t bring myself to use it. So I decided to do something about it. Subbu came up with an interesting proposal for a RESTful API for SimpleDB:
In this exercise, my starting point was Amazon’s definition of the REST API, which I refactored into a RESTful version without breaking the usage pattern. Several variations of this approach are possible, but the key point to make is that (a) it is important to identify what the resources are, and (b) then think of mapping various operations into known HTTP verbs for the API to be RESTful, without losing focus on the net benefits of building an API over HTTP. This is not hard.
Not hard at all, Subbu. I took that idea and ran with it, creating DeHorrible.
DeHorrible is a Rails proxy that RESTifies SimpleDB. Or if you insist, GETStifies your resources to use SimpleDB. Either way, it will keep your sensibilities intact. I ended up with a slightly different resource mapping:
- POST /domains, using either name query parameter or name in body, returning 201 with URL in Location.
- GET /domains, with optional limit and token query parameters, which brings you back a list of URLs, with which you can …
- DELETE /domains/[name].
- GET /domains/[name], which takes the optional query parameters query, limit and token, and brings you back a list of URLs, one for each found item.
- POST /domains/[name] to create a new item, returning 201 with the URL in Location. Requires item name and attributes, either URL-encoded, JSON hash, or XML document.
- GET /domains/[name]/items/[name] retrieves all attributes associated with that item.
- PUT /domains/[name]/items/[name] updates (replaces) attributes with new values.
- POST /domains/[name]/items/[name] adds new attribute values.
- DELETE /domains/[name]/items/[name] does what you think it will.
Once you get into attributes (attributes/[name]) you can do a few more funky things like retrieve, replace and append individual attribute values, and delete all of an attribute’s values, or a given value (attributes/[name]/[value]).
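The mapping above can be sketched as a small routing table. The operation names are the ones Amazon documented for SimpleDB (CreateDomain, ListDomains, DeleteDomain, Query, PutAttributes, GetAttributes, DeleteAttributes); which operation sits behind each route is an assumption read off the list, not taken from the DeHorrible source, and ROUTES and simpledb_operation are hypothetical names.

```ruby
# Assumed mapping from RESTful routes to SimpleDB operations.
ITEM = %r{\A/domains/[^/]+/items/[^/]+\z}
DOMAIN = %r{\A/domains/[^/]+\z}
DOMAINS = %r{\A/domains\z}

ROUTES = [
  ['POST',   DOMAINS, 'CreateDomain'],
  ['GET',    DOMAINS, 'ListDomains'],
  ['DELETE', DOMAIN,  'DeleteDomain'],
  ['GET',    DOMAIN,  'Query'],
  ['POST',   DOMAIN,  'PutAttributes'],           # creates a new item
  ['GET',    ITEM,    'GetAttributes'],
  ['PUT',    ITEM,    'PutAttributes (replace)'],  # replaces existing values
  ['POST',   ITEM,    'PutAttributes'],            # adds values
  ['DELETE', ITEM,    'DeleteAttributes']
]

# Returns the operation name for a verb and path, or nil if nothing matches.
def simpledb_operation(verb, path)
  route = ROUTES.find { |v, pattern, _op| v == verb && path =~ pattern }
  route && route.last
end

puts simpledb_operation('GET', '/domains/HelloDomain')
```

Laid out this way, each resource answers to the standard verbs and the SimpleDB operation name never shows up in the URL.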
It uses HTTP Basic authentication, so plug your AWS ID as login name, secret key as password, and rock:
curl http://your_id:key@localhost:3000/domains -X POST -d HelloDomain
curl http://your_id:key@localhost:3000/domains/HelloDomain
So there you go. SimpleDB de-horrified. Now we can all sleep better.
The code is available here.
Posted by Assaf
Filed in general, ruby
-
December 14th, 2007
Not that the first version was all that bad, but the second version is much better. Filters make all the difference (also some bug fixes):
class ItemsController < ApplicationController
  before_filter :item, :only=>[:show, :update]
  if_modified :item, :only=>:show
  if_unmodified :item, :only=>:update
  def show
  end
  def update
    item.update! params[:item]
    render :action=>'show'
  end
  private
  def item
    @item ||= Item.find(params[:id])
  end
end
if_modified
The if_modified filter calculates ETag and Last-Modified values from the instance variable, and compares those to the conditional GET If-Modified-Since/If-None-Match headers. If the data changed since the last request, it performs the action and sets new headers on the response. If the data didn’t change, it sends back 304 (Not Modified).
That means you only need some minimal information to figure out whether or not the action should run its course. For heavy stuff, this can save you from loading a lot of data and rendering the response. And of course you can do it entirely from Memcached and save a trip to the database.
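The decision the filter makes can be sketched in plain Ruby. This is not the plugin’s code: conditional_get_status is a hypothetical helper, and letting If-None-Match take precedence over If-Modified-Since is a simplifying assumption.

```ruby
require 'time'

# Decides how to answer a conditional GET.
# headers:       request headers as a Hash
# etag:          current entity tag, including the quotes
# last_modified: current modification Time
# Returns 304 when the client's copy is still good, 200 otherwise.
def conditional_get_status(headers, etag, last_modified)
  if (none_match = headers['If-None-Match'])
    return none_match.split(/\s*,\s*/).include?(etag) ? 304 : 200
  end
  if (since = headers['If-Modified-Since'])
    return last_modified <= Time.httpdate(since) ? 304 : 200
  end
  200
end

modified = Time.utc(2007, 12, 14, 10, 0, 0)
puts conditional_get_status({ 'If-None-Match' => '"abc"' }, '"abc"', modified)
```

Only the ETag and the timestamp are needed to reach a verdict, which is why the check can run before the expensive part of the action.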
if_unmodified
The if_unmodified filter calculates ETag and Last-Modified values from the instance variable, and compares those to the conditional PUT If-Unmodified-Since/If-Match headers. If the data didn’t change since the last request, it performs the action and sets new headers based on the new (post-update) values. If the data did change, we have a conflicting update, and it sends back 412 (Precondition Failed). That’s a sign for the client to retrieve the resource again and attempt another update.
You can use this one to solve the lost update problem. If two clients are updating the same resource concurrently, one gets served with a Notice Of Conflict, so it can safely run the update again.
It can also be used to reliably create a resource, which I’ll cover in a future post.
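Here is the lost update problem and its fix in miniature. ToyResource is a made-up in-memory stand-in, not part of the plugin, and hashing the body with MD5 is just one assumed way to produce an ETag.

```ruby
require 'digest/md5'

# Toy resource that only accepts a PUT from a client holding the current ETag.
class ToyResource
  attr_reader :body

  def initialize(body)
    @body = body
  end

  def etag
    %("#{Digest::MD5.hexdigest(@body)}")
  end

  # Applies the update only if the client saw the current version.
  # Returns [status, etag].
  def put(new_body, if_match)
    return [412, etag] unless if_match == etag
    @body = new_body
    [200, etag]
  end
end

resource = ToyResource.new('count=1')
seen = resource.etag                      # both clients GET the same version
first = resource.put('count=2', seen)     # first update goes through
second = resource.put('count=5', seen)    # stale ETag, update rejected
puts first.first
puts second.first
```

The second client gets a 412 instead of silently overwriting the first client’s change, and can fetch the resource again before retrying.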
Usage
The first argument to either filter can be a method name (:item) or an instance variable name (:@item), or you can use the :using option with method name, variable name or a block.
Last-Modified is calculated from the updated_at value, the most recent one if the value is an array. ETag is calculated by calling the etag method, or a combined hash in the case of an array.
This also means adding an etag method to your ActiveRecord. One is provided by default and will calculate a unique tag from the object’s attributes, or when using optimistic locks, the record’s id and version column.
Content type is also included in the ETag hash, so two representations of the same resource will not conflict in the cache. But as a side note, if you are using the same action to serve a page and a partial (for XHR), you’ll get two ETags and neither copy will be cached, so try using the .js suffix for the XHR URL.
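A rough sketch of how such validators could be computed. Record is a Struct standing in for an ActiveRecord object, and both helper names and the MD5-over-sorted-attributes scheme are assumptions for illustration; the plugin may calculate its tags differently.

```ruby
require 'digest/md5'

# Stand-in for an ActiveRecord object; only the two fields needed here.
Record = Struct.new(:attributes, :updated_at)

# ETag for one record or an array of records, salted with the content type
# so the XML and JSON representations get different tags.
def etag_for(records, content_type)
  parts = [records].flatten.map { |r| r.attributes.sort.inspect }
  %("#{Digest::MD5.hexdigest(([content_type] + parts).join("\n"))}")
end

# Last-Modified is the most recent updated_at across all records.
def last_modified_for(records)
  [records].flatten.map { |r| r.updated_at }.max
end

foo = Record.new({ :name => 'foo' }, Time.utc(2007, 12, 1))
bar = Record.new({ :name => 'bar' }, Time.utc(2007, 12, 10))
puts last_modified_for([foo, bar])
puts etag_for(foo, 'application/xml') == etag_for(foo, 'application/json')
```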
Code is over here, and to install:
./script/plugin install http://labnotes.org/svn/public/ruby/rails_plugins/if_modified
Posted by Assaf
Filed in ruby
-
December 11th, 2007
Here’s the easiest way to handle JSON requests in Rails 2.0:
./script/plugin install http://labnotes.org/svn/public/ruby/rails_plugins/json_request
The plugin adds a MIME parser that parses requests with the content type application/json, and maps them to a request parameter using the controller name as guidance.
class ItemsController < ApplicationController
  def show
    respond_to do |format|
      format.json { render :json=>@item }
      format.xml { render :xml=>@item }
    end
  end
  def update
    @item.update_attributes! params[:item]
    show
  end
end
It works the same way whether you feed it an XML document, JSON object, or URL-encoded parameters.
Since the JSON object is not annotated, it figures out the parameter name based on the controller name, in this case ItemsController becomes item. If you prefer a different name, add this line to your controller:
json_request :itemz
The json_request method adds a filter and can take the only and except options, if you need to limit it to specific actions.
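The name inference can be approximated in a few lines of plain Ruby. This is a dependency-free sketch, not the plugin’s implementation: inferred_param_name and wrap_json are hypothetical helpers, and the naive “drop a trailing s” rule stands in for the Rails inflector.

```ruby
require 'json'

# Naive inference: strip "Controller", underscore the name, drop a trailing "s".
# An explicit override (as with json_request :itemz) wins.
def inferred_param_name(controller_class_name, override = nil)
  return override.to_s if override
  name = controller_class_name.sub(/Controller\z/, '')
  name = name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  name.sub(/s\z/, '')
end

# Wraps a naked JSON body under the inferred parameter name.
def wrap_json(body, controller_class_name, override = nil)
  { inferred_param_name(controller_class_name, override) => JSON.parse(body) }
end

puts wrap_json('{"name":"foo","count":2}', 'ItemsController').inspect
```

With the body wrapped this way, params[:item] looks the same to the action whether the request arrived as a form, XML or JSON.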
Posted by Assaf
Filed in ruby
-
October 23rd, 2007
If you have a few moments, please read and tell me what you think would be the better option for each of these.
1. JSON says
Here’s a nice party trick:
respond_to do |format|
  format.html
  format.xml { render :xml=>@item }
  format.json { render :json=>@item }
end
When I’m feeling particularly evil, I walk over to someone still struggling to dual-render XML and HTML using indecipherable XSLT, ask them when JSON transformation will hit the roadmap, watch them squirm, jot these few lines on a piece of paper, put it on the table and quietly walk away.
OK, not really. Actually I’m struggling doing it the other way around. Wouldn’t it make sense that I could GET a representation, change it, and PUT it back?
For XML it turns out to be quite simple:
def update
  Item.update params[:id], params['item']
end
Rails grabs the request body, an XML document with the document element item, and stuffs the entire thing into a parameter with the same name.
If you’re lazy you get rewarded. Let Rails do all the form building for you, and its magical helpers will use field names like item[name], item[count] and item[tags][]. Those are nicely parsed into a hash called item, so the above code will also work with hForm.
But what about JSON? Test case please:
curl -d "{ name: \"foo\", count: 2, tags: [ \"bar\" ] }" -H "Content-Type: application/json"
The data is here, alright, and it’s the same structure we got back on the previous GET, but there’s nothing wrapping it up. JSON is naked. No XML document element name we can use to decide which parameter to place it in. Which means, one of these:
Option 1. All JSON objects are called such:
def update
  Item.update params[:id], params['item'] || params[:json]
end
This works nicely as long as you remember to always params[:json], which given my propensity to easily forget, is why I’m not particularly thrilled.
Option 2. All non-named params are always called the same:
def update
  Item.update params[:id], params['item'] || params[:data]
end
Looks the same, but is not. This option is forward looking and anticipates the possibility of some future technology we would want to use without having to go back and fix old code.
Option 3. It always is data:
def update
  Item.update params[:id], params[:data]
end
Even XML documents get called data. Everything gets called data. Which doesn’t work for query parameters or forms, but just for completeness I had to list this as well.
Option 4. Name inference:
def update
  Item.update params[:id], params['item']
end
This only works because the controller is called ItemsController, and the downcased singular name is item, and so we can infer what the JSON object should be called more often than not.
I’m personally leaning towards a combination of both #1 and #4, so you can :json or :item it. What do you think?
2. Are JSON objects hashes or arrays?
All the use cases I have are for receiving records, whether represented as JSON objects or XML documents:
<item>
  <name>foo</name>
</item>
{ name: "foo" }
But I do have code that renders collections, and I’m contemplating the idea of receiving multiple records as inputs:
<items type='array'>
  <item>
    <name>foo</name>
  </item>
</items>
[ { name: "foo" } ]
The thinking at Rails core is that hashes are enough, although there’s no such restriction on XML input. Have you considered a use case for receiving arrays (XML and/or JSON)?
3. redirect_to vs. see_other
Here’s something you see quite often in Rails apps:
def create
  article = Article.create(params[:article])
  redirect_to article
end
It seems like the right thing to do. It handles a POST request by creating the new resource, and then redirects the client to that resource (here, using polymorphic routes, a nice addition in Rails 2.0). Except it doesn’t: redirect_to will send back a 302 (Found) status code.
The 302 status code tells the client that the resource is found in a different location. By which we mean the original resource, so the proper thing to do is head over to the new location and do the POST all over again. Not what the controller author had in mind!
Per the HTTP specification, you would want to send back a 303 (See Other). Redirects (301, 302, 307) tell the client that the resource moved, and please can you try sending the same request over to the new location. 303 tells the client the request went through, got processed, please check the other location for the result.
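From the client’s side, the difference described above comes down to which request gets sent next. A minimal sketch, where next_request is a hypothetical helper rather than part of any HTTP library:

```ruby
# What a spec-following client sends next, given the status it received.
# Returns [method, url], or nil when there is nothing to follow.
def next_request(method, status, location)
  case status
  when 301, 302, 307 then [method, location]  # resource moved: repeat the same request there
  when 303           then ['GET', location]   # request processed: go fetch the result
  end
end

puts next_request('POST', 302, '/articles/1').inspect
puts next_request('POST', 303, '/articles/1').inspect
```

The first call repeats the POST against the new location, the second turns into a GET, which is what the controller author wanted all along.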
So how come this happens and the world doesn’t fall apart? Turns out browser developers got lazy, and some handled 302 and 303 the same way. People wrote CGI scripts by brushing through the HTTP spec and just testing out what works in the browser. And copying other people’s code. Eventually this bug got codified into the Undocumented HTTP Specification. So redirect_to doesn’t break browsers.
When people write client applications that talk to a Web service, they either go lazy or go HTTP. Those that go lazy (send request, extract Location header) won’t break, but they’re losing an important HTTP feature: the ability to move resources and leave behind a forwarding address. Those that go full HTTP will either raise an error or attempt a second POST.
I think we need to fix this, and get people to write services properly from day one.
Option 1. Rails (2.0) gives you two options, both of which are examples of elegant use of hashed arguments:
head :see_other, :location=>article_url(article)
redirect_to article, :status=>:see_other
Both return 303, as HTTP intended it to be, so not much to complain about, except we all know how likely it is that people will take the extra step to add an obscure status code that so far hordes of developers ignored.
Option 2. Temporary/permanent redirect and See Other are not the same thing, let’s make the difference known and introduce a see_other method:
# Use this in response to an HTTP POST (or PUT), telling the client where the new resource is.
# Works just like redirect_to, but sends back a 303 (See Other) status code. Redirects should be used
# to tell the client to repeat the same request on a different resource, and see_other when we want the
# client to follow a POST (on this resource) with a GET (to the new resource).
def see_other(options = {})
  if options.is_a?(Hash)
    redirect_to options.merge(:status=>:see_other)
  else
    redirect_to options, :status=>:see_other
  end
end
Option 3. Or sprinkle a bit of magic on redirect_to:
def redirect_to(options = {}, response_status = {}) #:doc:
  if options.is_a?(Hash) && options[:status]
    status = options.delete(:status)
  elsif response_status[:status]
    status = response_status[:status]
  else
    status = request.post? ? 303 : 302
  end
  . . .
Which one do you think is better?
4. [1, 2, 3].to_xml
This works:
{ 'foo'=>'bar' }.to_xml
This throws a violent exception:
[ 'foo', 'bar' ].to_xml
Any particular reason why it would be bad to XML-ize an array of primitive values?
Posted by Assaf
Filed in ruby
-
October 8th, 2007
Nick Sieger brings up an interesting question. Imagine an n-tier application and you’re using HTTP to connect the tiers together. That wasn’t the question, just good practical advice to follow. The question was, can you place a proxy server between the tiers and reap all the benefits?
Specifically in the context of Rails and ActiveResource:
The great thing is that existing HTTP reverse proxies can be used without having to mix the caching code in with your application code.
You can get a cache installed in a few minutes, switch over by setting HTTP_PROXY, and bingo. Could it be that easy?
The Glass Half Empty
If you’re serving the same data to all your clients and there’s little transformation happening on the client, then a reverse caching proxy is the way to go. It’s as close as it gets to a free lunch, performs and scales better than loading the database.
It breaks in two cases, though.
The first, when you’re serving different content to different clients on the same resource, and you don’t want the proxy server leaking those secrets away. Say you’re authenticating, admins get a different view than pedestrians, and both are served by the same URL. Rails assumes that by default, a good assumption to make, and sets Cache-Control to private, telling the proxy not to cache anything. It still allows caching in the client, but you’re probably not running a caching client.
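The effect of that header on a proxy can be boiled down to a one-line check. This is a deliberately simplified sketch with a hypothetical helper name: a real shared cache also weighs the status code, request method, Vary, Authorization and more.

```ruby
# Whether a shared cache (a proxy) may store a response, judged by
# Cache-Control alone. Private caches, like a browser, are not covered here.
def shared_cache_may_store?(cache_control)
  directives = cache_control.to_s.downcase.split(',').map { |d| d.strip }
  !(directives.include?('private') || directives.include?('no-store'))
end

puts shared_cache_may_store?('private, max-age=0')
puts shared_cache_may_store?('public, must-revalidate')
```

With the private directive in place, every request still goes through the proxy to the application, so the proxy buys you nothing until the header changes.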
The second place it breaks is when the client does non-trivial transformation on the data it receives. Imagine a client pulling in 10MB of data to calculate ‘total of orders’. Hitting the cache when there’s no change could save a trip to the database, but you still end up processing all that data again.
I’m saying “could” because the Rails magic involves running the action in full, creating a response document, calculating the ETag and then deciding not to send it back. On the Web that makes a noticeable difference in client response time, your browser will thank you. When you’re running two machines in the data center or even same rack, you won’t see any performance or scalability gains. It’s still database access all the way down.
The Glass Half Full
All of which is fairly easy to fix, for Rails or any other framework, but requires that we remove some of the child-proofing and expect adults to pay more attention to caching. Not horribly complex, but no transparent benefits either.
cache_in_public. If you only ever return the same content for a given resource, but possibly 403/404 for unauthorized clients, you can change Cache-Control to public/must-revalidate. My HTTP sources tell me this will work for a caching proxy without leaking to rogue clients. Something like:
cache_in_public :index, :show
acts_as_sourced. If you do significant work on the client, you want to handle caching directly, sending ETag/Last-Modified and avoiding redundant work on unchanged data. This can be simplified by creating a plugin that handles conditional GETs:
class Computed < ActiveRecord::Base
  acts_as_sourced
end
Computed.retrieve(url).save
if_modified. Now that you’re bona fide caching all over the place, or as often as you possibly can, you’ll want to eliminate those round trips to the database by checking for changes before doing any work. Something like if_modified:
def show
  if_modified @record do
    render :action=>'show'
  end
end
While we’re at it, paying attention to caching and having fun, we should also consider conditional PUTs:
resource.update do |record|
  record.count += 1
end
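Behind an API like that, a conditional PUT is a read-modify-write loop that starts over whenever the server answers 412. A toy sketch of the idea, assuming an in-memory ToyServer whose version number doubles as the ETag; none of these names come from ActiveResource.

```ruby
# Toy server keeping one counter; the version number plays the role of the ETag.
class ToyServer
  def initialize(count)
    @count, @version = count, 1
  end

  # Returns [count, etag].
  def get
    [@count, @version]
  end

  # Conditional PUT: 412 if the client's version is stale.
  def put(count, if_match)
    return 412 unless if_match == @version
    @count, @version = count, @version + 1
    200
  end
end

# GET, apply the block, PUT with If-Match; start over on 412.
def update(server, max_attempts = 3)
  max_attempts.times do
    count, etag = server.get
    return true if server.put(yield(count), etag) == 200
  end
  false
end

server = ToyServer.new(10)
interfered = false
update(server) do |count|
  # Simulate another client sneaking in an update during the first attempt.
  unless interfered
    interfered = true
    server.put(100, server.get.last)
  end
  count + 1
end
puts server.get.first
```

The first attempt is rejected because the counter changed underneath it; the second attempt increments the fresh value, so neither update is lost.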
Side Effects
And those are quite unfortunate, so we’ll have to keep that in mind.
You see, caches are side effects. Any request going through a cache will not necessarily return the correct result. That depends entirely on the state of the cache and how resources are handled by the server. How do you prove the code is correct?
More than server-side and client-side support, this will need a good testing framework to make sure the cache abstraction doesn’t leak, and developer awareness for what needs to be tested. It’s not rocket science, but there still is a barrier to entry of Memcached complexity, though with more benefits.
Posted by Assaf
Filed in ruby
-
September 28th, 2007
Tim Bray started the WideFinder meme spreading, so I’m going to bite.
But I’m going to talk about something different. Not a language shoot-out or micro benchmark. I’ll let others talk about that. I’m going to use the WideFinder as an example to show you some principles of concurrent processes and how you can use them in your code. I’ll use Ruby here, and link to a Java implementation at the end.
With all the recent interest in coreness and RESTful services, it might be the right time to bring back a favorite subject of mine.
Process Calculus
This is something I got involved with around the turn of the century. I like how that sounds, turn of the century, all credible and awe-inspiring, don’t you think? Sad part is, it took me a decade before I found out about pi-calculus, and only while trailing down the wrong path. I started with and recommend reading A Calculus of Mobile Processes (Milner, Parrow and Walker, 1989).
The essential features of process calculus, courtesy of the Wikipedia entry:
- Representing interactions between independent processes as communication (message-passing), rather than as the modification of shared variables
- Describing processes and systems using a small collection of primitives, and operators for combining those primitives
- Defining algebraic laws for the process operators, which allow process expressions to be manipulated using equational reasoning
Process calculus is to concurrent systems what relational algebra is to databases. And just for saying that, half of you moved on to the next post, and the rest, I can hear you snoring. But hang on, this is seriously cool stuff, especially if you like the science part of computers and like to learn new skills. I’ll do my part by showing more code than concepts.
In process calculus, we describe everything as concurrent processes interacting through message passing, with the ability to pass channels in messages. We can use that to describe protocols from TCP through HTTP all the way to a Web of services talking to each other. We can also use that to describe low-level constructs like threads, locks, semaphores and such. Also stuff that doesn’t happen in parallel.
These processes are not distributed services. They’re not operating system processes. They’re not threads. They’re a conceptual way we can describe any and all of these. So the processes I’m going to show you here deal with writing lock-free asynchronous code and letting go of shared state. It doesn’t prescribe a particular way for distributing the workload.
Why Should I Care?
I find it fascinating, but I guess you need something a bit more concrete.
When you’re building concurrent systems, you can work hard or work smart. Threads, locks and other synchronization constructs help you work hard. Smart is avoiding any shared state. Think about it as functional programming for concurrency. And it’s a great way to pull off asynchronous processing and anything you need to scale larger than a method call.
So that’s one reason.
It will also help you build distributed systems and deal with distributed fallacies. Those are all easier when you’re using messages to transfer state. Substitute process for resource and you get REST. And you know why that one is important. So there.
Pi-calculus Cliff Notes
Pi-calculus was my introduction to process calculus, so I’m going to show it briefly before moving on to code. I do recommend you spend some time reading about it in detail. My apologies to math geeks worldwide: though pi-calculus uses mathematical notation, I’ll approximate it using ASCII.
The smallest process you can have sends a message on a channel, receives a message from a channel, does something we don’t care to describe (call it magic) or does nothing at all:
P = x!y # Send value y on channel x
Q = x?y # Receive on channel x, store in y
0 # Nothing
The simplest composition is just a sequence of processes, separated with dots:
P = x?y
Q = y!z
S = P.Q
Therefore:
S = x?y.y!z
Interesting? Not so much. So let’s add the parallel composition:
P = {x} Q | R
Q = x!y
R = x?y
The process P consists of two other processes and a new channel known to both (think of it as a scoped variable). Process Q sends a message on that channel and then reduces to nothing (0). Process R receives that message and reduces to nothing, at which point process P reduces to … I’ll let you guess. Think of reduction as single-stepping through your code.
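If it helps to see that reduction run, here is a loose Ruby sketch of P = {x} Q | R. It assumes we can stand in a standard library Queue for the channel and a Thread for each process; that mapping is only an approximation for illustration, pi-calculus itself says nothing about threads.

```ruby
require 'thread'

# Sketch only: a Queue stands in for the channel x and a Thread for each
# process. This mapping is an illustrative assumption, not part of pi-calculus.
x = Queue.new                # {x}: a new channel known to both Q and R

q = Thread.new { x << :y }   # Q = x!y, sends and then reduces to nothing
r = Thread.new { x.pop }     # R = x?y, blocks until a message arrives

q.join
received = r.value           # the thread's result is whatever R received
puts "R received #{received.inspect}"
```

Once both threads have finished there is nothing left to run, which is the point where P has reduced as far as it can.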
We’re almost there, but the world is non-deterministic, so we also need to consider those cases:
P = x?y.Q + z?y.R
This just means process P either receives a message on channel x and reduces to Q, or receives a message on channel z and reduces to R. Not both. (That’s a choice, written with +, as opposed to | which runs both sides in parallel.) So now we have conditions.
(Wondering what happened to good old if? It will take a few minutes to think this through, and Google may help, but there is a way to express if with these constructs alone.)
But we still can’t reduce more than once, so let’s introduce the bang:
!P = !P | P
The bang just says that we can reduce the process any number of times, once for each message. And that we can use to implement a Web server, 8080?req, and to handle recursion (and therefore loops). Here’s an example for an infinite loop:
P = x?y.x!y
Q = {x} !P | x!0
So with inputs, outputs, conditions and recursion we can express all sorts of functions (the lambda part) but also describe concurrency and distributed systems.
Code Time
So let’s see what this looks like in code. Obviously we’re working at a higher level than pi-calculus: we have things like variables, functions, loops, ifs, etc. We only use the process model for concurrent work.
We can start by writing a simple DSL. I did. One of my first experiences meta-programming in Ruby ended up as a miserable failure and an important lesson: abstractions are good when they help you express more, not when they limit what you can express. Here, an API will do just fine.
Sending a message to a process:
process.send *args
Receiving a message:
message = receive
This is always called inside the process.
We need conditional receives. In pi-calculus we keep the conceptual model simple, but end up with a boatload of channels. In the real world we don’t want to be in the business of tracking all these channels, so we’ll multiplex different messages on the same communication channel (one per process) and use pattern matching to tell them apart:
receive do |match|
  match.when :foo do |args|
    . . .
  end
  match.when :bar do |args|
    . . .
  end
end
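To show how one mailbox per process plus pattern matching could hang together, here is a minimal sketch. It is not the pic library’s implementation: SketchProcess and Matcher are hypothetical names, the mailbox is assumed to be a Queue, the process is assumed to be a Thread, and matching is done only on the first element of the message.

```ruby
require 'thread'

# Hypothetical sketch, NOT the pic library: one mailbox (a Queue) per
# process, and a matcher that dispatches on the first element of a message.
class Matcher
  def initialize
    @handlers = {}
  end

  # Register a handler for messages whose first element is tag.
  def when(tag, &block)
    @handlers[tag] = block
  end

  # Run the handler registered for this message. Unmatched messages are
  # simply dropped in this sketch.
  def dispatch(message)
    handler = @handlers[message.first]
    handler && handler.call(*message)
  end
end

class SketchProcess
  def initialize(&body)
    @mailbox = Queue.new
    @thread = Thread.new { instance_eval(&body) }
  end

  # Called from outside: drop the message in the mailbox and return at once.
  def send(*args)
    @mailbox << args
    self
  end

  # Called inside the process: block until a message arrives, then match it.
  def receive
    message = @mailbox.pop
    return message unless block_given?
    matcher = Matcher.new
    yield matcher
    matcher.dispatch(message)
  end

  # Wait for the process to finish and return the value of its block.
  def join
    @thread.value
  end
end

greeter = SketchProcess.new do
  receive do |match|
    match.when(:hello) { |_, name| "Hello, #{name}" }
    match.when(:bye)   { |_, name| "Goodbye, #{name}" }
  end
end

greeter.send :hello, "Assaf"
puts greeter.join
```

The sender never touches the receiver’s state, it only drops a message in the mailbox, so there is nothing to lock besides the queue itself.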
Parallel work is easy. Since fork is already in use, we’ll call it spawn instead:
foo = spawn { ... }
spawn { ..., foo, ... }
There’s no bang. We don’t need it when we can just spawn, loop and recurse. (Although, when I implemented a bang method I ran into scoping issues when using it and just defaulted to spawn; perhaps there’s a better way to bang.)
Speaking of recursion, have you heard of the stack limit? Let’s get around that with stack-less recursion:
tail { ... }
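One common way to get stack-less recursion is a trampoline: instead of calling the next step, you hand it back to a loop that calls it for you. The sketch below assumes that approach; TailCall and trampoline are hypothetical names, and the pic library’s own tail may well work differently.

```ruby
# Hypothetical sketch of stack-less recursion using a trampoline; the pic
# library's own tail may be implemented differently.
TailCall = Struct.new(:block)

# Wrap the next step instead of calling it, so the stack can unwind first.
def tail(&block)
  TailCall.new(block)
end

# Keep running steps until one returns something other than a TailCall.
def trampoline
  result = yield
  result = result.block.call while result.is_a?(TailCall)
  result
end

# Recurses 100,000 levels "deep" without growing the stack.
def countdown(n)
  return :done if n.zero?
  tail { countdown(n - 1) }
end

puts trampoline { countdown(100_000) }
```

The catch is that tail only works as the very last expression of the method, since all it does is return the next step to the loop.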
Now we’re ready, let’s use all of these to write a WideFinder.
WideFinder In Ruby Flavored With Pi-C
First thing I’m going to do is pull out counting and reporting into separate functions. I’m using the same code from Tim’s Ruby WideFinder, so we can get that out of the way and move to the more interesting parts:
# I map from lines to hash.
def count(lines)
  lines.inject(Hash.new(0)) do |counts, line|
    if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
      counts[$1] += 1
    end
    counts
  end
end
# I just output the top-ten results.
def report(counts)
  puts "Results are in"
  keys = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
  keys[0 .. 9].each do |key|
    puts "#{counts[key]}: #{key}"
  end
end
Next we’re going to write a process that reads the log line by line and splits it into chunks, then spawns a process to count the lines in each chunk, so all the counting happens in parallel.
There are so many ways to write that. I decided to pick one that has no shared state or synchronization blocks. But we do need to wait for all counters and collect the results, so we’ll add another process for dealing with that. One maps, the other reduces.
Here’s a spawn that starts a process (everything in the block) to count a chunk of lines and send the results to the collecting process:
spawn { collector.send :result, count(lines) }
I’ll do one better and insist on the process not holding any mutable state, only using messages to change from one state to the other. That’s how you build concurrent systems.
So no object state, no global variables, and all local variables are immutable. There’s more overhead, but I’m not shooting for maximum performance, I’m trying to show you how to think asynchronously.
So without variables that can change state, we’ll use the oldest trick in the book and recurse. Since the Ruby stack can only take so much abuse, we’ll optimize tail recursion:
tail { split(collector, limit, source, lines + [source.readline], sets) }
The process looks like this:
def split(collector, limit, source, lines = [], sets = 0)
  if lines.size >= limit
    # Over the limit, spawn a new process to count the lines and pass the
    # results back to collector. Then repeat for a new set of lines.
    spawn { collector.send :result, count(lines) }
    tail { split(collector, limit, source, [], sets + 1) }
  elsif source.eof?
    # Send whatever lines we collected so far to collector. Also tell collector
    # how many sets of results we have.
    spawn { collector.send :result, count(lines) }
    collector.send :expecting, sets + 1
  else
    # Collect the new line, repeat.
    tail { split(collector, limit, source, lines + [source.readline], sets) }
  end
end
The collector receives all the results, groups them together and prints out the report. It follows the same process-functional style.
Since messages may arrive in any order, we can’t tell the collector when we’re done. Instead, we tell it how many result sets to expect and let it figure things out. Here’s what it looks like:
def collect(counts = {}, sets = 0, expecting = nil)
  if expecting == sets
    # All sets are in, report and return.
    report counts
  else
    receive do |match|
      match.when :result do |_, result|
        # Results from counter, combine with what we already have, and loop back.
        counts = result.keys.inject(counts) { |h,k| h.merge!(k=>(h[k] || 0) + result[k]) }
        tail { collect(counts, sets + 1, expecting) }
      end
      match.when :expecting do |_, expecting|
        # If we know how many sets there are, we know when to end.
        tail { collect(counts, sets, expecting) }
      end
    end
  end
end
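The combine step in collect uses merge!, which changes the hash in place. If you want to stay strict about not mutating anything, one option is Hash#merge with a block, which returns a new hash and calls the block only for keys present on both sides. A small sketch, where combine is a hypothetical helper name:

```ruby
# Sketch: combine two count hashes without mutating either one.
# combine is a hypothetical helper, not part of the original example.
def combine(counts, result)
  # The block runs only for keys present in both hashes.
  counts.merge(result) { |_key, left, right| left + right }
end

before = { "a" => 2, "b" => 1 }
totals = combine(before, { "b" => 4, "c" => 3 })
p totals
```

It costs a hash copy per result set, which fits the trade-off already made here: more overhead in exchange for no shared mutable state.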
Now we need to tie it all together:
collector = spawn { collect }
spawn { split(collector, 10000, ARGF) }
And run:
ruby wf.rb o10k.ap
=> Results are in
42: 2006/09/29/Dynamic-IDE
8: 2006/07/28/Open-Data
3: 2003/07/25/NotGaming
3: 2004/04/27/RSSticker
2: 2003/09/18/NXML
2: 2004/10/01/AutumnLeaves
2: 2006/09/07/JRuby-guys
2: 2004/02/27/RSS-Unreal
2: 2003/04/10/Concorde
2: 2005/12/29/Selling-Art
Misc
The entire example is here, and the pic library here.
If Ruby is not your deal and you’d much rather use Java, have a look at Jacob. It’s a much larger framework, so a bit harder to use, but as a bonus it can persist state in the database and pull other cool tricks.
Posted by Assaf
Filed in ruby