
You’re sitting in the weekly design review meeting, taking comfort that it’s down to the last agenda item. Bob needs a new feature done by Friday, and it’s a simple one: add two fields to the user registration form asking people to submit their age and zipcode. How much trouble can two fields be?
Alice quickly points out that each field you add means fewer registrations. Bob retorts that the new information will lead to better lead qualification, so losing a percent or two is “no big deal”. Alice recalls the drop-off per field is much higher than that. Bob is confident people love the service enough to surrender additional data. Alice responds …
At the end of an endless list of opinions, personal biases and supposedly relevant anecdotes, it all comes down to who’s higher up the org chart. As Greg Wilson wittily says:
… even the best of us aren’t doing what we expect the makers of acne creams to do.
Opinions are good, but they only get you so far. So are predictions based on other people’s experience. They’re good as a starting point, but the only meaningful numbers are the ones you measure yourself. On your site. With your visitors/customers. From the experiments you designed.
Experiment Driven Development
You’ve got your TDD, your BDD, your load testing, your user testing, your code metrics. All tools for testing your code and improving it. Here’s a question for you: what are you using to test your ideas?
Nathaniel Talbott came up with Experiment Driven Development to describe the process of testing your ideas and using the data to drive development. My best attempt to summarize EDD in one sentence:
Lay out each feature as a hypothesis, write an experiment to prove it, run and collect data, use the conclusion to develop your code.
EDD is fact-based software development. It’s the opposite of developing software out of opinions backed by anecdotes based on stories CEOs tell about their cousin-in-law. EDD starts out with ideas and then measures them against real people: your customers, visitors to your site, dog-food eaters. It seeks proof, and it’s iterative.

Where TDD and BDD offer tools that help you improve code quality and make sure your code behaves according to spec, EDD helps you find out what features to develop and where to take them: it helps you discover what will become the spec.
If you’re following the Lean Startup methodology, you’ll recognize where EDD fits in the customer development cycle. If you’re not, I heartily recommend grabbing a copy of The Four Steps to the Epiphany.
EDD in Practice: Proving Bob Wrong
So let’s switch gears and put Bob’s idea to the test. We’ll start by formulating the new feature as an experiment, and we’ll run an A/B test to decide one way or the other. Our hypothesis is that adding two additional fields (age and zipcode) makes no difference. Let’s write an experiment for that:
ab_test "Age and Zipcode" do
  description "Change registration form: add age and zipcode fields."
end
Visitors to our site will see one of two forms. We’ll make both alternatives available from the same Rails view:
<% form_for Registration.new do |f| %>
  Your name: <%= f.text_field :name %>
  <% if ab_test(:age_and_zipcode) %>
    Your age: <%= f.text_field :age %>
    Your zipcode: <%= f.text_field :zipcode %>
  <% end %>
  <%= f.submit "Register" %>
<% end %>
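How does a visitor end up with one form or the other, and keep seeing the same one on every visit? One common approach is to hash the experiment name together with some visitor identity. The sketch below shows that idea only; it is not taken from Vanity’s source, and the helper name `choose_alternative` and the `visitor-N` identities are hypothetical.

```ruby
require "digest"

# Hypothetical helper (not Vanity's implementation): map a visitor identity
# to one of the experiment's alternatives, the same way every time.
def choose_alternative(experiment_name, identity, alternatives)
  # Hashing the experiment name together with the identity lets the same
  # visitor land in different groups for different experiments.
  digest = Digest::MD5.hexdigest("#{experiment_name}/#{identity}")
  alternatives[digest.to_i(16) % alternatives.size]
end

# false: name only; true: name, age and zipcode
alternatives = [false, true]

# The same visitor gets the same form on every request.
first  = choose_alternative(:age_and_zipcode, "visitor-42", alternatives)
second = choose_alternative(:age_and_zipcode, "visitor-42", alternatives)
puts first == second # prints "true"

# Across many visitors the split should come out roughly even.
with_fields = (1..10_000).count do |i|
  choose_alternative(:age_and_zipcode, "visitor-#{i}", alternatives)
end
puts "#{with_fields} of 10000 visitors see the longer form"
```

Because the choice is a pure function of the experiment and the identity, nothing has to be stored just to remember which form a visitor saw.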
We’ll need to record each registration, so we can compare how each alternative performed:
class RegistrationController < ApplicationController
  def register
    @reg = Registration.new(params[:registration])
    if @reg.save
      track! :age_and_zipcode # registration is our goal
      redirect_to thankyou_url
    else
      render action: "form"
    end
  end
end
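To compare the alternatives we need two numbers for each: how many visitors saw the form, and how many of them registered. Here is a minimal in-memory sketch of that bookkeeping. The `ExperimentTally` class and its method names are hypothetical, and Vanity stores its own state rather than using anything like this.

```ruby
# Hypothetical in-memory tally of participants and conversions per alternative.
class ExperimentTally
  Counts = Struct.new(:participants, :conversions) do
    # Share of participants who reached the goal; 0.0 before anyone took part.
    def conversion_rate
      participants.zero? ? 0.0 : conversions.to_f / participants
    end
  end

  def initialize(alternatives)
    @counts = alternatives.each_with_object({}) do |alt, counts|
      counts[alt] = Counts.new(0, 0)
    end
  end

  # Call when a visitor is shown one of the forms.
  def participated!(alternative)
    @counts.fetch(alternative).participants += 1
  end

  # Call when that visitor reaches the goal (here: a saved registration).
  def converted!(alternative)
    @counts.fetch(alternative).conversions += 1
  end

  def conversion_rate(alternative)
    @counts.fetch(alternative).conversion_rate
  end
end

tally = ExperimentTally.new([false, true])
4.times { tally.participated!(true) }
tally.converted!(true)
puts tally.conversion_rate(true)  # prints 0.25
puts tally.conversion_rate(false) # prints 0.0
```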
We’ll let this experiment run for a few days, and then look at the results:

Would you have guessed option B performed better? Maybe Bob was right after all.
Testing Your Experiments
Our experiment involves adding two fields to an existing form. What could possibly go wrong?
Unless we add these fields to the underlying model, we’re not actually storing the new inputs. That’s fine if all we want to know is “would people fill in their age and zipcode?” What about the hypothesis that says these two fields lead to higher quality leads? To test that, we’ll need to store age and zipcode in the database.
So now we have one form that expects both fields, and we’ll want to validate and store their values. We have another form that can’t fail when these two fields are empty. Our goal is to get an A/B test to run, not A/broken, so we’ll make sure both forms work as expected.
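The rule both forms have to live by can be sketched in plain Ruby: require age and zipcode only when the visitor was actually shown those fields. The function name `registration_errors` and the exact format checks (an age of one to three digits, a five-digit zipcode) are assumptions for illustration, not part of the application above.

```ruby
# Hypothetical validator: age and zipcode are only checked for visitors
# who were shown the longer form.
def registration_errors(params, with_age_and_zipcode:)
  errors = []
  errors << "name is required" if params[:name].to_s.strip.empty?
  if with_age_and_zipcode
    # Assumed formats: age is 1-3 digits, zipcode is exactly 5 digits.
    errors << "age must be a whole number" unless params[:age].to_s.match?(/\A\d{1,3}\z/)
    errors << "zipcode must be five digits" unless params[:zipcode].to_s.match?(/\A\d{5}\z/)
  end
  errors
end

# Short form: missing age and zipcode must not cause a failure.
p registration_errors({ name: "Alice" }, with_age_and_zipcode: false)

# Long form: the same input is now incomplete.
p registration_errors({ name: "Alice" }, with_age_and_zipcode: true)
```

In a Rails model the same idea would usually be expressed as conditional validations, but the point is the same: neither alternative should be able to break the other.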
Let’s test both alternatives:
class RegistrationControllerTest < ActionController::TestCase
  def test_form_with_one_field
    experiment(:age_and_zipcode).chooses(false)
    . . .
  end

  def test_form_with_three_fields
    experiment(:age_and_zipcode).chooses(true)
    . . .
  end
end
Another benefit that comes from testing your experiments: when we’re done with this experiment, we’ll rip out the definition and all tests that reference the experiment will fail. This will help us find all the places in our code that run the experiment and change them to the chosen behavior.
That’s automated testing; now let’s talk about hands-on testing. You’ll want to make sure both forms look good, and there’s no better way than trying them out yourself. For that, we’ll use the Vanity Dashboard to switch between the alternatives we get to see:

Time to Decide
Once we reach a decision it’s time to end the experiment, change our code to apply the winning alternative, run another round of testing, and deploy a new release. Put that on the todo list.
Meanwhile we’ll let the experiment conclude itself. It might look like this:
ab_test "Age and Zipcode" do
  description "Change registration form: add age and zipcode fields."
  complete_if do
    alternatives.all? { |alt| alt.participants >= 1000 } &&
      score.choice && score.choice.probability >= 95
  end
end
This experiment will complete when a) there are at least 1000 participants for each alternative, which is a big enough sample size, and b) one alternative stands out with a probability of 95% or more. (You may want to read more about probabilities and interpreting the results.)
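Where might a number like 95% come from? One standard way to compare two conversion rates is a two-proportion z-test, and the sketch below applies it to the two conditions above. Treat it as an illustration: whether it matches Vanity’s scoring in detail is an assumption, the `Alternative` struct and the `z_score`, `probability` and `complete?` helpers are hypothetical names, and the participant and conversion counts are made up.

```ruby
# Hypothetical stand-in for an experiment alternative and its counts.
Alternative = Struct.new(:name, :participants, :conversions) do
  def conversion_rate
    conversions.to_f / participants
  end
end

# Two-proportion z-score (unpooled standard error) for b relative to a.
def z_score(a, b)
  pa = a.conversion_rate
  pb = b.conversion_rate
  se = Math.sqrt(pa * (1 - pa) / a.participants + pb * (1 - pb) / b.participants)
  return 0.0 if se.zero? # no variation at all, so nothing to compare
  (pb - pa) / se
end

# Percentage of the standard normal distribution that lies below z.
def probability(z)
  50 * (1 + Math.erf(z / Math.sqrt(2)))
end

# Mirrors the two conditions in the text: enough participants in every
# alternative, and one alternative ahead with at least 95% probability.
def complete?(a, b, min_participants: 1000, min_probability: 95)
  return false unless [a, b].all? { |alt| alt.participants >= min_participants }
  probability(z_score(a, b).abs) >= min_probability
end

# Made-up numbers, for illustration only.
a = Alternative.new("A", 1000, 100) # 10% of visitors registered
b = Alternative.new("B", 1000, 130) # 13% of visitors registered

puts format("z-score: %.2f", z_score(a, b))
puts format("probability: %.1f%%", probability(z_score(a, b)))
puts "complete? #{complete?(a, b)}"
```

With fewer than 1000 participants in either alternative, `complete?` returns false no matter how far apart the rates are, which is the point of the sample-size condition.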
Vanity will then turn the experiment off, stop collecting data, and switch all participants to the chosen alternative. You don’t have to rush to make a new release with the chosen alternative.
This may not sound like much, but the less you need to feed and care for each experiment, the more experiments you can run. And that means faster iterations and more hard data to improve your application. Cool.
How Vanity Came To Be
Before Vanity I used a system I like to call “too many things in too many places”. It had ActiveRecord queries, JavaScript tracking feeding into Google Analytics goals, A/Bingo testing with crontab reports, log grepping, and a lot of knowing how to piece data from different places to make sense of it all.
It didn’t scale.
Listening to Nathaniel explain EDD, I had a eureka moment … more like euphoria as I imagined all that complexity disappearing into a framework with a clean, simple API. Then came some experimenting to find out which features I needed, and implementing them one by one.
As I ran through these experiments, I came up with a list of what my EDD framework had to do:
Based in code. I want my experiments to be in source code, checked into source control, and editable with a text editor.
Self-sufficient. I don’t want to maintain database schema and object models in support of the schema. The framework has to take care of storing its own state, collecting statistics, etc.
Lightweight. I don’t want to choose between performance and experimenting.
Smart. Do all the statistical calculations for me. I suck at math.
Integrated. You can do a lot with changes to HTML content, but some experiments need to reach deep into the application.
Testable. I need a way to test experimental code, and if an experiment has multiple variants, test them all. A/B testing does not mean A/broken.
Reporting. I want a pretty UI to look at, and I also want to be able to email reports.
Easy on/easy off. Help me remove experimental code when I’m done experimenting.
Lazy-friendly. Automate as much as possible, less work for me.
Where to Next?
That’s Experiment Driven Development in short, and a brief introduction to Vanity. If you’re curious, watch Nathaniel Talbott present EDD at RubyConf ’09.
Vanity 1.0 is just out the door with support for A/B testing. The next major release will add usage experiments.
You can learn more about Vanity on the site http://vanity.labnotes.org. The code is on GitHub, and there’s also a vanity-talk list you’re more than welcome to join. Now, let’s go experiment.