
You’re sitting in the weekly design review meeting, taking comfort that it’s down to the last agenda item. Bob needs a new feature done by Friday, and it’s a simple one: add two fields to the user registration form asking people to submit their age and zipcode. How much trouble can two fields be?
Alice quickly points out that each field you add means fewer registrations. Bob retorts that the new information will lead to better lead qualification, and losing a percent or two is “no big deal”. Alice recalls the drop-off per field is much higher than that. Bob is confident people love the service enough to surrender additional data. Alice responds …
At the end of an endless list of opinions, personal biases and supposedly relevant anecdotal stories, it all comes down to who’s higher up the org chart. As Greg Wilson wittily says:
… even the best of us aren’t doing what we expect the makers of acne creams to do.
Opinions are good, but they only get you so far. So are predictions based on other people’s experience. They’re good as a starting point, but the only meaningful numbers are the ones you measure yourself. On your site. With your visitors/customers. From the experiments you designed.
Experiment Driven Development
You’ve got your TDD, your BDD, your load testing, your user testing, your code metrics. All tools for testing your code and improving it. Here’s a question for you: what are you using to test your ideas?
Nathaniel Talbott came up with Experiment Driven Development to describe the process of testing your ideas and using the data to drive development. My best attempt to summarize EDD in one sentence:
Lay each feature as a hypothesis, write an experiment to prove it, run and collect data, use the conclusion to develop your code.
EDD is fact-based software development. It’s the opposite of developing software out of opinions backed by anecdotes based on stories CEOs tell about their cousin-in-law. EDD starts out with ideas and then measures them against real people: your customers, visitors to your site, dog-food eaters. It seeks proof, and it’s iterative.

Where TDD and BDD offer tools that help you improve code quality and make sure your code behaves according to spec, EDD helps you find out what features to develop and where to take them: it helps you discover what will become the spec.
If you’re following the Lean Startup methodology, you’ll recognize where EDD fits in the customer development cycle. If you’re not, I heartily recommend grabbing a copy of The Four Steps to the Epiphany.
EDD in Practice: Proving Bob Wrong
So let’s switch gears and put Bob’s idea to the test. We’ll start by formulating the new feature as an experiment, and we’ll run an A/B test to decide one way or the other. Our hypothesis is that adding two additional fields (age and zipcode) makes no difference. Let’s write an experiment for that:
ab_test "Age and Zipcode" do
  description "Change registration form: add age and zipcode fields."
end
Visitors to our site will see one of two forms. We’ll make both alternatives available from the same Rails view:
<% form_for Registration.new do |f| %>
  Your name: <%= f.text_field :name %>
  <% if ab_test(:age_and_zipcode) %>
    Your age: <%= f.text_field :age %>
    Your zipcode: <%= f.text_field :zipcode %>
  <% end %>
  <%= f.submit "Register" %>
<% end %>
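The view only asks ab_test for a yes or no. For the comparison to be fair, the same visitor should land on the same form every time they come back. One common way to get that is to hash the visitor's identity together with the experiment name. The sketch below illustrates that idea only: choose_alternative and the "visitor-42" identity are hypothetical names, and this is an assumption about how assignment could work, not a description of Vanity's internals.

```ruby
require "digest/md5"

# Hypothetical helper, not part of Vanity's API: map a visitor identity to
# one of the alternatives so the same visitor always gets the same form.
def choose_alternative(experiment, identity, alternatives = [false, true])
  # Hashing the experiment name together with the identity means different
  # experiments split the same set of visitors differently.
  digest = Digest::MD5.hexdigest("#{experiment}/#{identity}")
  alternatives[digest.to_i(16) % alternatives.size]
end

first  = choose_alternative(:age_and_zipcode, "visitor-42")
second = choose_alternative(:age_and_zipcode, "visitor-42")
puts first == second # prints true: same visitor, same alternative
```

Because the choice is computed from the identity, nothing has to be stored per visitor just to keep the forms consistent.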
We’ll need to record each registration, so we can compare how each alternative performed:
class RegistrationController < ApplicationController
  def register
    @reg = Registration.new(params[:registration])
    if @reg.save
      track! :age_and_zipcode # registration is our goal
      redirect_to thankyou_url
    else
      render action: "form"
    end
  end
end
We’ll let this experiment run for a few days, and then look at the results:

Would you have guessed option B performed better? Maybe Bob was right after all.
Testing Your Experiments
Our experiment involves adding two fields to an existing form. What could possibly go wrong?
Unless we add these fields to the underlying model, we’re not actually storing new inputs. That’s fine if all we want to know is “would people fill-in their age and zipcode”. What about the hypothesis that says these two fields lead to higher quality leads? To test that, we’ll need to store age and zipcode in the database.
So now we have one form that expects both fields, and we’ll want to validate and store their values. We have another form that can’t fail when these two fields are empty. Our goal is to get an A/B test to run, not A/broken, so we’ll make sure both forms work as expected.
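Here is a minimal sketch of what “both forms work” could mean at the model level. It uses a plain Ruby stand-in rather than the real ActiveRecord model, and the extra_fields flag, the error messages and the validation rules are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the Registration model (plain Ruby, no Rails),
# showing validation that depends on which form the visitor saw.
class Registration
  attr_reader :name, :age, :zipcode, :errors

  def initialize(attrs, extra_fields)
    @name = attrs[:name]
    @age = attrs[:age]
    @zipcode = attrs[:zipcode]
    @extra_fields = extra_fields # true when the form showed age and zipcode
    @errors = []
  end

  def valid?
    @errors = []
    @errors << "name is required" if blank?(name)
    if @extra_fields
      # Only the three-field form is allowed to fail on these.
      @errors << "age must be a number" unless age.to_s =~ /\A\d+\z/
      @errors << "zipcode is required" if blank?(zipcode)
    end
    @errors.empty?
  end

  private

  def blank?(value)
    value.nil? || value.to_s.strip.empty?
  end
end

short_form = Registration.new({ name: "Alice" }, false)
long_form  = Registration.new({ name: "Bob" }, true)
puts short_form.valid? # true: the one-field form never requires age or zipcode
puts long_form.valid?  # false: the three-field form came in without them
```

In a real application the flag would come from whichever alternative the visitor was shown, so the check lives in one place instead of being scattered around the code.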
Let’s test both alternatives:
class RegistrationControllerTest < ActionController::TestCase
  def test_form_with_one_field
    experiment(:age_and_zipcode).chooses(false)
    . . .
  end

  def test_form_with_three_fields
    experiment(:age_and_zipcode).chooses(true)
    . . .
  end
end
Another benefit that comes from testing your experiments: when we’re done with this experiment, we’ll rip out the definition and all tests that reference the experiment will fail. This will help us find all the places in our code that run the experiment and change them to the chosen behavior.
That’s automated testing; now let’s talk about hands-on testing. You’ll want to make sure both forms look good, and there’s no better way than trying them out yourself. For that, we’ll use the Vanity Dashboard to switch between the alternatives we get to see:

Time to Decide
Once we reach a decision it’s time to end the experiment, change our code to apply the winning alternative, run another round of testing, and deploy a new release. Put that on the todo list.
Meanwhile we’ll let the experiment conclude itself. It might look like this:
ab_test "Age and Zipcode" do
  description "Change registration form: add age and zipcode fields."
  complete_if do
    alternatives.all? { |alt| alt.participants >= 1000 } &&
      score.choice && score.choice.probability >= 95
  end
end
This experiment will complete when a) there are at least 1000 participants for each alternative, which is a big enough sample size, and b) one alternative stands out with a probability of 95% or more. (You’ll want to read more about probabilities and interpreting the results.)
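To get a feel for where a number like 95% can come from, here is a sketch of a textbook two-proportion z-test. It is not necessarily what Vanity calculates internally; z_score and confidence are hypothetical helper names, and the registration counts are made up for illustration.

```ruby
# Sketch of a two-proportion z-test. Inputs are conversions and
# participants for alternatives A and B.
def z_score(conv_a, n_a, conv_b, n_b)
  p_a = conv_a.to_f / n_a
  p_b = conv_b.to_f / n_b
  # Pooled rate under the hypothesis that both forms convert equally well.
  pooled = (conv_a + conv_b).to_f / (n_a + n_b)
  std_error = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  (p_b - p_a) / std_error
end

# Two-sided confidence, in percent, that the two rates really differ,
# taken from the standard normal distribution via the error function.
def confidence(z)
  Math.erf(z.abs / Math.sqrt(2)) * 100
end

# Made-up numbers: 100 of 1000 registered with form A, 135 of 1000 with form B.
z = z_score(100, 1000, 135, 1000)
puts format("z = %.2f, confidence = %.1f%%", z, confidence(z))
```

With these made-up counts the confidence clears the 95% bar, so a rule like the one above would let the experiment conclude. Identical counts for both forms give a z-score of zero and no confidence at all.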
Vanity will then turn the experiment off, stop collecting data, and switch all participants to the chosen alternative. You don’t have to rush to make a new release with the chosen alternative.
This may not sound like much, but the less you need to feed and care for each experiment the more experiments you can run. And that means faster iterations and more hard data to improve your application. Cool.
How Vanity Came To Be
Before Vanity I used a system I like to call “too many things in too many places”. It had ActiveRecord queries, JavaScript tracking feeding into Google Analytics goals, A/Bingo testing with crontab reports, log grepping, and a lot of knowing how to piece data from different places to make sense of it all.
It didn’t scale.
Listening to Nathaniel explain EDD, I had a eureka moment … more like euphoria as I imagined all that complexity disappearing into a framework with a clean, simple API. Then came some experimenting to find out which features I needed, and implementing them one by one.
As I ran through these experiments, I came up with a list of what my EDD framework had to do:
Based in code. I want my experiments to be in source code, checked into source control, and I want to edit them with a text editor.
Self-sufficient. I don’t want to maintain database schema and object models in support of the schema. The framework has to take care of storing its own state, collecting statistics, etc.
Lightweight. I don’t want to choose between performance and experimenting.
Smart. Do all the statistical calculations for me. I suck at math.
Integrated. You can do a lot with changes to HTML content, but some experiments need to reach deep into the application.
Testable. I need a way to test experimental code, and if an experiment has multiple variants, test them all. A/B testing does not mean A/broken.
Reporting. I want a pretty UI to look at, also be able to email reports.
Easy on/easy off. Help me remove experimental code when I’m done experimenting.
Lazy-friendly. Automate as much as possible, less work for me.
Where to Next?
That’s Experiment Driven Development in short, and a brief introduction to Vanity. If you’re curious, watch Nathaniel Talbott present EDD at RubyConf ’09.
Vanity 1.0 is just out the door with support for A/B testing. The next major release will add usage experiments.
You can learn more about Vanity on the site http://vanity.labnotes.org. The code is on GitHub, and there’s also a vanity-talk list you’re more than welcome to join. Now, let’s go experiment.
Awesome, I’ll be trying this out on a live app very soon!
Great work, Assaf! I’ve been very interested in this since you tweeted about it, and the plugin looks great. Can’t wait to try it out!
Vanity looks awesome, Assaf. Exactly what I’ve been looking for :D
so now we soon have A2Z of .DD :) ,
some I can suggest from my freelancing days.
Payment Driven Development(getting paid is important)
User interface Driven Development(screens are ready bid for it)
Vision Driven Development (I have this world-changing vision)
I really like the approach you are taking – using software for intelligent decision making. Every former grad student should recognize and appreciate this experimental design process. Another example of how the Ruby community is really leading the pack in software design and development.
How much extra work does this require in maintaining the different forks?
In the example you gave, presumably you also have to check which variant you’re running at the model level, to disable validations when necessary, and then make that check at every point where the fields would otherwise be required. I really like the concept of A/B testing, but I’m concerned by the extra maintenance overhead while experiments are running.
You start with an idea for a change that will improve your software. Your baseline cost of development is having both alternatives — before and after the change. Without EDD these alternatives will be separate in time, with EDD you’re going to have some overlap (the duration of the experiment).
For the experiment, you only need a skeletal implementation, you’re not committed to fully developing the feature until after it proves itself. For small changes it makes little difference, the cost is the same.
For complex changes, you can save a lot by not fully developing features that don’t matter. You’re going to know whether a feature matters or not quickly enough, and with data to back it up, that you can make the decision to *not* develop it further.
You can also kill unused features early. So these are two ways to reduce development costs using EDD.
Clean looking library Assaf, looking forward to trying it out with a Spree install.
Hi,
How do you think EDD (and perhaps the entire Lean Startup methodology) deals with the arguments presented in this article ?(http://onstartups.com/tabid/3339/bid/11416/Releasing-Early-Is-Not-Always-Good-Heresy.aspx)
And a quick idea – A/B testing should be somehow incorporated into a revision control system, having the system automatically calculate the “best source code paths” for a given scenario/user.
Thank you for the lib and for the blog,
Regards,
Asaf.
That article starts out by taking “release early”, and slippery sloping it into a “release crap” strawman. It attacks design-by-consensus, but either doesn’t mention or doesn’t understand customer development. Strawmen are easy to build, easy to defeat, great link bait, but add nothing to the conversation.
Reasonable people start out with good ideas they want to turn into quality products that people will actually want to use. Good ideas still need tuning, few ever get it right the first time. How do you tune? You can do it based on opinions, or you can do it based on facts.
EDD is about tuning your software to be better based on facts.
Looks quite interesting, can’t wait to give it a try in my next app!
Absolutely brilliant! There’s no doubt in my mind you’ll soon see ports to django and other frameworks.
Design by committee just doesn’t work. Solid, data-driven decisions are what matter. But, doing good A/B testing has to be simple or it just won’t be done. Very, very cool. Thanks for this.
If you’re on Windows and trying to use Vanity – maybe give up. Right now it requires Redis, which doesn’t easily compile on Windows. I read a few posts about getting Redis running with Cygwin successfully, but quite a few more that had issues. Some people are compiling it with Visual Studio too, but some of them also had issues.
Just thought I’d throw that in in case anyone is having difficulties.
Also would love to see a config option to use something besides Redis.
interesting.
how i can see this extended statistics:
a) how many signup (rate)
b) how many signup (rate) with 3 fields (username, age, zip)
– b.1 fill in field username
– b.2 fill in field age
– b.3 fill in field zip
or how i can setup maybe some more subgoals?
Vanity rocks!!! It’s an amazing piece of s/w and I got it up and running in no time at all. Thanks a ton Assaf & other contributors