Nov 19th, 2009

    Vanity: Experiment Driven Development for Rails


    You’re sitting in the weekly design review meeting, taking comfort that it’s down to the last agenda item. Bob needs a new feature done by Friday, and it’s a simple one: add two fields to the user registration form asking people to submit their age and zipcode. How much trouble can two fields be?

    Alice quickly points out that each field you add means fewer registrations. Bob retorts that the new information will lead to better lead qualification, and losing a percent or two is “no big deal”. Alice recalls that the drop-off per field is much higher than that. Bob is confident people love the service enough to surrender additional data. Alice responds …

    At the end of an endless list of opinions, personal biases and supposedly relevant anecdotes, it all comes down to who’s higher up the org chart. As Greg Wilson wittily says:

    … even the best of us aren’t doing what we expect the makers of acne creams to do.

    Opinions are good, but they only get you so far. So are predictions based on other people’s experience. They’re good as a starting point, but the only meaningful numbers are the ones you measure yourself. On your site. With your visitors/customers. From the experiments you designed.

    Experiment Driven Development

    You’ve got your TDD, your BDD, your load testing, your user testing, your code metrics. All tools for testing your code and improving it. Here’s a question for you: what are you using to test your ideas?

    Nathaniel Talbott came up with Experiment Driven Development to describe the process of testing your ideas and using the data to drive development. My best attempt to summarize EDD in one sentence:

    Lay out each feature as a hypothesis, write an experiment to prove it, run it and collect data, then use the conclusion to develop your code.

    EDD is fact-based software development. It’s the opposite of developing software from opinions backed by anecdotes based on stories the CEO tells about their cousin-in-law. EDD starts out with ideas and then measures them against real people: your customers, visitors to your site, dog-food eaters. It seeks proof, and it’s iterative.

    IEDI

    Where TDD and BDD offer tools that help you improve code quality and make sure your code behaves according to spec, EDD helps you find out what features to develop and where to take them: it helps you discover what will become the spec.

    If you’re following the Lean Startup methodology, you’ll recognize where EDD fits in the customer development cycle. If you’re not, I heartily recommend grabbing a copy of The Four Steps to the Epiphany.

    EDD in Practice: Proving Bob Wrong

    So let’s switch gears and put Bob’s idea to the test. We’ll start by formulating the new feature as an experiment, and we’ll run an A/B test to decide one way or the other. Our hypothesis is that adding two fields (age and zipcode) makes no difference. Let’s write an experiment for that:

    ab_test "Age and Zipcode" do
      description "Change registration form: add age and zipcode fields."
    end

    Visitors to our site will see one of two forms. We’ll make both alternatives available from the same Rails view:

    <% form_for Registration.new do |f| %>
      Your name: <%= f.text_field :name %>
      <% if ab_test(:age_and_zipcode) %>
        Your age: <%= f.text_field :age %>
        Your zipcode: <%= f.text_field :zipcode %>
      <% end %>
      <%= f.submit "Register" %>
    <% end %>
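    The view only works if `ab_test(:age_and_zipcode)` gives the same answer every time a given visitor loads the form. One common way to get that is to hash the experiment name together with a per-visitor identity, such as a cookie value, and map the hash onto the list of alternatives. The plain-Ruby sketch below shows that idea. It is an assumption, not a description of how Vanity chooses, and the class name `HypotheticalABTest` is made up for illustration.

```ruby
require "digest/md5"

# Hypothetical sketch, not Vanity's implementation: deterministically assign
# a visitor to one alternative so repeat visits always show the same form.
class HypotheticalABTest
  attr_reader :name, :alternatives

  # false = original form (name only), true = form with age and zipcode.
  def initialize(name, alternatives = [false, true])
    @name = name
    @alternatives = alternatives
  end

  # Hash experiment name + visitor identity, then pick an alternative by index.
  # Including the name lets different experiments split visitors differently.
  def choose(identity)
    digest = Digest::MD5.hexdigest("#{name}:#{identity}")
    alternatives[digest.to_i(16) % alternatives.size]
  end
end

experiment = HypotheticalABTest.new(:age_and_zipcode)
experiment.choose("visitor-42") == experiment.choose("visitor-42") # always true
```

    Because the choice depends only on the identity, no per-visitor state has to be stored to keep the experience consistent. Counting who saw which form is a separate concern.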

    We’ll need to record each registration, so we can compare how each alternative performed:

    class RegistrationController < ApplicationController
      def register
        @reg = Registration.new(params[:registration])
        if @reg.save
          track! :age_and_zipcode  # a successful registration is our goal
          redirect_to thankyou_url
        else
          render :action => "form"
        end
      end
    end
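    Comparing the alternatives takes two counts per alternative: how many visitors saw it, and how many of those reached the goal. Vanity keeps this bookkeeping for you. The `HypotheticalTally` class below is only an in-memory illustration of it, with names of our own choosing.

```ruby
require "set"

# Hypothetical in-memory bookkeeping, for illustration only (not Vanity's storage).
class HypotheticalTally
  def initialize
    @participants = Hash.new { |hash, alt| hash[alt] = Set.new }
    @converted    = Hash.new { |hash, alt| hash[alt] = Set.new }
  end

  # Record that a visitor was shown an alternative (e.g. when the form renders).
  def participate(identity, alternative)
    @participants[alternative] << identity
  end

  # Record the goal (e.g. a saved registration). Only visitors who saw the
  # alternative count, and a Set makes repeat conversions count once.
  def convert(identity, alternative)
    @converted[alternative] << identity if @participants[alternative].include?(identity)
  end

  # Fraction of participants who converted; 0.0 when nobody saw it yet.
  def conversion_rate(alternative)
    seen = @participants[alternative].size
    seen.zero? ? 0.0 : @converted[alternative].size.to_f / seen
  end
end

tally = HypotheticalTally.new
tally.participate("visitor-1", true)
tally.participate("visitor-2", true)
tally.convert("visitor-1", true)
tally.conversion_rate(true) # => 0.5
```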

    We’ll let this experiment run for a few days, and then look at the results:

    [Screenshot: Age and Zipcode experiment results in the Vanity dashboard]

    Would you have guessed option B performed better? Maybe Bob was right after all.

    Testing Your Experiments

    Our experiment involves adding two fields to an existing form. What could possibly go wrong?

    Unless we add these fields to the underlying model, we’re not actually storing new inputs. That’s fine if all we want to know is “would people fill in their age and zipcode”. What about the hypothesis that says these two fields lead to higher quality leads? To test that, we’ll need to store age and zipcode in the database.

    So now we have one form that expects both fields, and we’ll want to validate and store their values. We have another form that must not fail when these two fields are empty. Our goal is to get an A/B test to run, not A/broken, so we’ll make sure both forms work as expected.
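    One way to keep both forms working is to require age and zipcode only from visitors who were shown the longer form. The sketch below is written in plain Ruby so it runs without Rails. In a Rails model, the same idea would typically be a conditional validation, for example `validates_presence_of` with an `:if` option. The `HypotheticalRegistration` class and its `extra_fields_shown` flag are illustrative assumptions, not part of Vanity.

```ruby
# Hypothetical stand-in for the Registration model: age and zipcode are only
# required when the visitor saw the three-field form.
class HypotheticalRegistration
  attr_reader :name, :age, :zipcode, :errors

  def initialize(attrs, extra_fields_shown)
    @name = attrs[:name]
    @age = attrs[:age]
    @zipcode = attrs[:zipcode]
    @extra_fields_shown = extra_fields_shown
    @errors = []
  end

  def valid?
    @errors = []
    @errors << "name can't be blank" if blank?(name)
    # Only the longer form is held to the extra requirements.
    if @extra_fields_shown
      @errors << "age can't be blank" if blank?(age)
      @errors << "zipcode can't be blank" if blank?(zipcode)
    end
    @errors.empty?
  end

  private

  def blank?(value)
    value.nil? || value.to_s.strip.empty?
  end
end

HypotheticalRegistration.new({ name: "Ann" }, false).valid? # => true
HypotheticalRegistration.new({ name: "Ann" }, true).valid?  # => false
```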

    Let’s test both alternatives:

    class RegistrationControllerTest < ActionController::TestCase
      def test_form_with_one_field
        experiment(:age_and_zipcode).chooses(false)
        . . .
      end

      def test_form_with_three_fields
        experiment(:age_and_zipcode).chooses(true)
        . . .
      end
    end

    Another benefit of testing your experiments: when we’re done with this experiment, we’ll rip out the definition, and all tests that reference the experiment will fail. This helps us find all the places in our code that run the experiment, so we can change them to the chosen behavior.
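    Two ideas from this section can be illustrated with a small registry: forcing an alternative in a test, and failing loudly once an experiment's definition is gone. This is a hypothetical sketch. `ExperimentRegistry`, `Experiment#chooses` and `Experiment#choose` are names chosen for illustration, and they only mirror the shape of the test calls above.

```ruby
# Hypothetical sketch, not Vanity's API.
class ExperimentRegistry
  class Experiment
    def initialize(alternatives = [false, true])
      @alternatives = alternatives
      @forced = nil
    end

    # Pin the alternative every visitor gets, as a test would.
    def chooses(value)
      @forced = value
      self
    end

    # Forced value if set (false is a valid choice), otherwise a random pick.
    def choose
      @forced.nil? ? @alternatives.sample : @forced
    end
  end

  def initialize(*ids)
    @experiments = ids.map { |id| [id, Experiment.new] }.to_h
  end

  # Hash#fetch raises KeyError for an id with no definition, so any test
  # still referencing a removed experiment fails immediately.
  def experiment(id)
    @experiments.fetch(id)
  end
end

registry = ExperimentRegistry.new(:age_and_zipcode)
registry.experiment(:age_and_zipcode).chooses(false).choose # => false
```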

    That’s automated testing; now let’s talk about hands-on testing. You’ll want to make sure both forms look good, and there’s no better way than trying them out yourself. For that, we’ll use the Vanity Dashboard to switch between the alternatives and see each one:

    [Screenshot: choosing an alternative from the Vanity Dashboard]

    Time to Decide

    Once we reach a decision it’s time to end the experiment, change our code to apply the winning alternative, run another round of testing, and deploy a new release. Put that on the todo list.

    Meanwhile we’ll let the experiment conclude itself. It might look like this:

    ab_test "Age and Zipcode" do
      description "Change registration form: add age and zipcode fields."

      complete_if do
        alternatives.all? { |alt| alt.participants >= 1000 } &&
          score.choice && score.choice.probability >= 95
      end
    end

    This experiment will complete when a) there are at least 1000 participants for each alternative, which is a big enough sample size, and b) one alternative stands out with a probability of 95% or more. (You may want to read more about probabilities and interpreting the results.)
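    To make the two conditions concrete, here is a sketch of a similar completion rule in plain Ruby. The "probability" it computes is the two-sided confidence from a pooled two-proportion z-test. That is a standard technique swapped in for illustration, and Vanity computes its own score, which may work differently. `Alternative`, `z_score`, `confidence` and `complete?` are hypothetical names, and the counts in the example are made up.

```ruby
# Hypothetical sketch, not Vanity's scoring: a completion rule in the spirit
# of the complete_if block, using a pooled two-proportion z-test.
Alternative = Struct.new(:participants, :conversions) do
  def rate
    participants.zero? ? 0.0 : conversions.to_f / participants
  end
end

# z-score for the difference between two conversion rates.
def z_score(a, b)
  return 0.0 if a.participants.zero? || b.participants.zero?
  pooled = (a.conversions + b.conversions).to_f / (a.participants + b.participants)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / a.participants + 1.0 / b.participants))
  se.zero? ? 0.0 : (a.rate - b.rate) / se
end

# Two-sided confidence, in percent, that the two rates genuinely differ.
def confidence(a, b)
  Math.erf(z_score(a, b).abs / Math.sqrt(2)) * 100
end

# Mirrors the two conditions: enough participants everywhere, and a clear winner.
def complete?(a, b, min_participants: 1000, min_confidence: 95)
  [a, b].all? { |alt| alt.participants >= min_participants } &&
    confidence(a, b) >= min_confidence
end

a = Alternative.new(1000, 100) # made-up counts: 10% converted
b = Alternative.new(1000, 150) # made-up counts: 15% converted
complete?(a, b) # => true
```

    With these made-up counts, z is about 3.4 and the confidence is above 99%, so both checks pass. Identical rates give a confidence of 0 no matter how many participants there are, so such an experiment never completes on its own.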

    Vanity will then turn the experiment off, stop collecting data, and switch all participants to the chosen alternative. You don’t have to rush to make a new release with the chosen alternative.
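    In code, concluding an experiment amounts to short-circuiting the per-visitor choice once an outcome is known. The sketch below is a hypothetical illustration of that behavior, not Vanity's code. While the experiment is running, it hashes the visitor identity and counts each choice it hands out. After `complete!` is called, every visitor gets the winner and nothing more is counted. `ConcludingExperiment` and its methods are made-up names.

```ruby
require "digest/md5"

# Hypothetical sketch, not Vanity's code: once an outcome is set, every visitor
# gets the winning alternative and no further data is collected.
class ConcludingExperiment
  attr_reader :choices_made

  def initialize(alternatives = [false, true])
    @alternatives = alternatives
    @outcome = nil
    @choices_made = 0
  end

  # Called when the completion rule is satisfied.
  def complete!(winner)
    @outcome = winner
  end

  # nil means still running; false is a legitimate winning alternative.
  def completed?
    !@outcome.nil?
  end

  def choose(identity)
    return @outcome if completed?   # concluded: everyone sees the winner
    @choices_made += 1              # still running: keep counting
    index = Digest::MD5.hexdigest(identity.to_s).to_i(16) % @alternatives.size
    @alternatives[index]
  end
end
```

    Because the switch happens inside the choice itself, the running application serves the winner right away, and the code change that hard-wires it can ship later.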

    This may not sound like much, but the less you need to feed and care for each experiment the more experiments you can run. And that means faster iterations and more hard data to improve your application. Cool.

    How Vanity Came To Be

    Before Vanity I used a system I like to call “too many things in too many places”. It had ActiveRecord queries, JavaScript tracking feeding into Google Analytics goals, A/Bingo testing with crontab reports, log grepping, and a lot of knowing how to piece together data from different places to make sense of it all.

    It didn’t scale.

    Listening to Nathaniel explain EDD, I had a eureka moment … more like euphoria, as I imagined all that complexity disappearing into a framework with a clean, simple API. Then came some experimenting to find out which features I needed, and implementing them one by one.

    As I ran through these experiments, I came up with a list of what my EDD framework had to do:

    Based in code. I want my experiments to be in source code, checked into source control, and editable with a text editor.

    Self-sufficient. I don’t want to maintain database schema and object models in support of the schema. The framework has to take care of storing its own state, collecting statistics, etc.

    Lightweight. I don’t want to choose between performance and experimenting.

    Smart. Do all the statistical calculations for me. I suck at math.

    Integrated. You can do a lot with changes to HTML content, but some experiments need to reach deep into the application.

    Testable. I need a way to test experimental code, and if an experiment has multiple variants, test them all. A/B testing does not mean A/broken.

    Reporting. I want a pretty UI to look at, and the ability to email reports.

    Easy on/easy off. Help me remove experimental code when I’m done experimenting.

    Lazy-friendly. Automate as much as possible, less work for me.

    Where to Next?

    That’s Experiment Driven Development in short, and a brief introduction to Vanity. If you’re curious, watch Nathaniel Talbott present EDD at RubyConf ’09.

    Vanity 1.0 is just out the door with support for A/B testing. The next major release will add usage experiments.

    You can learn more about Vanity on the site http://vanity.labnotes.org. The code is on GitHub, and there’s also a vanity-talk list you’re more than welcome to join. Now, let’s go experiment.

    1. Nov 24th, 2009

      Dan McGrady

      Awesome, I’ll be trying this out on a live app very soon!

    2. Nov 25th, 2009

      Christian

      Great work, Assaf! I’ve been very interested in this since you tweeted about it, and the plugin looks great. Can’t wait to try it out!

    3. Nov 30th, 2009

      Jim Neath

      Vanity looks awesome, Assaf. Exactly what I’ve been looking for :D

    4. Nov 30th, 2009

      Senthil Nayagam

      so now we soon have A2Z of .DD :) ,

      some I can suggest from my freelancing days.

      Payment Driven Development (getting paid is important)
      User Interface Driven Development (screens are ready, bid for it)
      Vision Driven Development (I have this world-changing vision)

    5. Nov 30th, 2009

      John

      I really like the approach you are taking – using software for intelligent decision making. Every former grad student should recognize and appreciate this experimental design process. Another example of how the Ruby community is really leading the pack in software design and development.

    6. Dec 8th, 2009

      Jon Wood

      How much extra work does this require in maintaining the different forks?

      In the example you gave, presumably you also have to check which variant you’re running at the model level, to disable validations when necessary, and then make that check at every point where the fields would otherwise be required. I really like the concept of A/B testing, but I’m concerned by the extra maintenance overhead while experiments are running.

    7. Dec 8th, 2009

      Assaf

      You start with an idea for a change that will improve your software. Your baseline cost of development is having both alternatives — before and after the change. Without EDD these alternatives will be separate in time, with EDD you’re going to have some overlap (the duration of the experiment).

      For the experiment, you only need a skeletal implementation, you’re not committed to fully developing the feature until after it proves itself. For small changes it makes little difference, the cost is the same.

      For complex changes, you can save a lot by not fully developing features that don’t matter. You’re going to know whether a feature matters or not quickly enough, and with enough data to back it up, that you can make the decision to *not* develop it further.

      You can also kill unused features early. So these are two ways to reduce development costs using EDD.

    8. Dec 10th, 2009

      Eliot Sykes

      Clean looking library Assaf, looking forward to trying it out with a Spree install.

    9. Dec 26th, 2009

      Asaf Atreides

      Hi,

      How do you think EDD (and perhaps the entire Lean Startup methodology) deals with the arguments presented in this article? (http://onstartups.com/tabid/3339/bid/11416/Releasing-Early-Is-Not-Always-Good-Heresy.aspx)

      And a quick idea – A/B testing should be somehow incorporated into a revision control system, having the system automatically calculate the “best source code paths” for a given scenario/user.

      Thank you for the lib and for the blog,

      Regards,

      Asaf.

    10. Dec 26th, 2009

      Assaf

      That article starts out by taking “release early”, and slippery sloping it into a “release crap” strawman. It attacks design-by-consensus, but either doesn’t mention or doesn’t understand customer development. Strawmen are easy to build, easy to defeat and great link bait, but they add nothing to the conversation.

      Reasonable people start out with good ideas they want to turn into quality products that people will actually want to use. Good ideas still need tuning, few ever get it right the first time. How do you tune? You can do it based on opinions, or you can do it based on facts.

      EDD is about tuning your software to be better based on facts.

    11. Jan 7th, 2010

      Avishai

      Looks quite interesting, can’t wait to give it a try in my next app!

    12. Jan 26th, 2010

      Henrik Joreteg

      Absolutely brilliant! There’s no doubt in my mind you’ll soon see ports to django and other frameworks.

      Design by committee just doesn’t work. Solid, data-driven decisions are what matter. But, doing good A/B testing has to be simple or it just won’t be done. Very, very cool. Thanks for this.

    13. Feb 16th, 2010

      Jesse Crouch

      If you’re on Windows and trying to use Vanity, maybe give up. Right now it requires Redis, which doesn’t easily compile on Windows. I read a few posts about getting Redis running successfully with Cygwin, but quite a few more that had issues. Some people compile it with Visual Studio too, but some of them also had issues.

      Just thought I’d throw that in in case anyone is having difficulties.

      Also would love to see a config option to use something besides Redis.

    14. Feb 20th, 2010

      Arno.Nyhm

      interesting.

      How can I see these extended statistics:

      a) how many signup (rate)

      b) how many signup (rate) with 3 fields (username, age, zip)
      – b.1 fill in field username
      – b.2 fill in field age
      – b.3 fill in field zip

      Or how can I set up some more subgoals?

    15. Mar 30th, 2010

      almost effortless » Weekly Digest, 12-11-09

      [...] Vanity: Experiment Driven Development for Rails You’ve got your TDD, your BDD, your load testing, your user testing, your code metrics. All tools for testing your code and improving it. Here’s a question for you: what are you using to test your ideas? [...]

    16. Mar 31st, 2010

      How we do A/B testing in Rails « Amit Kumar Mondol's Blog

      [...] service." model Account.in_trial end Here I have only described some brief concept Read more: http://labnotes.org/2009/11/19/vanity-experiment-driven-development-for-rails/#ixzz0jZMRkbJi Posted in Web Development. Tags: A/B Testing. Leave a Comment [...]
