Why your company should be experimenting

I was recently approached by a client who needed to explain the benefits of experimentation to his company. My first thought was an infographic. But that just scratches the surface.

Benefits of an experimentation process: De-risk, Convert, Learn, ROI efficiency, Consistency, Data-driven
The benefits of testing

Simply listing the benefits of experimentation didn’t seem right, however. The main problem facing “testing muggles” is that they’re often unaware of the issues with the approaches they’re currently using.

We first need to show the shortcomings of their current approach.

Some companies will do user research, perhaps even conducting limited user testing sessions…

User research involves some form of communication with a small number of users
User research

But the responses from this may not be representative of all visitors to a website or app. Not only is the sample size too small, but what users say and do aren’t necessarily correlated.

“What people say, what people do, and what they say they do are entirely different things.”
— Margaret Mead, Cultural Anthropologist

Psychologists have also long recognised the presence of demand characteristics: subtle cues that can alter participants’ behaviour.

This “user research” is better when backed up by something called pre-post analysis. This means that the data after a change is released (“post”) is compared to the data from a “pre-release” period. It’s the job of an analyst to find a comparable period for this.

You can read a good write-up of this approach here.

Pre-post analysis. Comparing data of one week against data of another
Pre-post analysis. Comparing data (red) with a previous week (grey)
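
To make this concrete, here’s a minimal sketch of what a pre-post comparison might look like in code. It’s purely illustrative: the file, the column names, and the release date are assumptions, not a prescribed method.

```python
import pandas as pd

# Hypothetical daily traffic data; the file and column names are assumptions.
df = pd.read_csv("daily_metrics.csv", parse_dates=["date"])
df["conversion_rate"] = df["orders"] / df["visitors"]

release = pd.Timestamp("2021-06-14")  # assumed release date of the change

# "Pre" = the week before the release, "post" = the week after it.
pre = df[(df["date"] >= release - pd.Timedelta(days=7)) & (df["date"] < release)]
post = df[(df["date"] >= release) & (df["date"] < release + pd.Timedelta(days=7))]

print(f"Pre-release conversion:  {pre['conversion_rate'].mean():.2%}")
print(f"Post-release conversion: {post['conversion_rate'].mean():.2%}")
```

Notice how much this rests on the analyst having picked a genuinely comparable week.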

However, there are shortcomings with this type of analysis as well. For starters, our environment is constantly changing. Not only is the world a different place than it was an hour ago, but our product probably is, too.

All this is compounded by the number of changes rolled into a given release, as well as the variety of traffic we drive to our site. These are all extraneous variables which need to be controlled.

These variables are major caveats: given all of this, it can be difficult for an analyst to determine the right comparison period for the data.

Environment changes x User differences x Number of changes = lots of variables

Meanwhile, with A/B testing:

“You can make your analysts significantly smarter by exposing them to perhaps the most advanced way of doing analysis.”
— Avinash Kaushik, Web Analytics 2.0

Now, I’ll admit that for some features (and some companies), pre-post coupled with user research are the only ways to validate changes. And that’s fine. But we need to be aware of the shortcomings when it comes to using them to evaluate our changes.

What’s the size of this risk?

I did some research recently to answer the following question:

Of all experiments designed to increase a conversion, what percentage were successful?

In other words, what is a typical win rate for companies that experiment? Knowing this will tell us how often assumptions are wrong. This should indicate the scale of the risk involved with releases.

I reviewed not only my own experiences with the teams and companies I’ve worked with but also the experiences of others involved in experimentation (shout out to the Conversion World Roundtable here).

The answers I got were pretty consistent: a win rate ranging from 20–30%. This means that 70–80% of all experiments either had no impact on conversion or worse, they impacted conversion negatively.

According to Booking.com, the picture is much bleaker. The following is from an article in the Harvard Business Review:

“For every experiment that succeeds, nearly 10 don’t.”

But let's be optimistic and use my initial stat. That means that 70–80% of the time, hypotheses were proved wrong.

Being wrong 70–80% of the time is a problem

While this insight is valuable, it’s important to note that it’s based on changes that were designed as experiments, i.e. ideas that wouldn’t exist otherwise.

But what this figure illustrates is how commonplace it is to be wrong. Even if we’re conservative and assume only a 50% chance of a change’s assumptions being wrong, it’s still a compelling statistic. And 50% is no better than a coin toss.

The best-case scenario for releasing changes without testing is a coin toss!

Working in this way reminds me of a Jenga tower built on an unstable base…

An unstable Jenga base, where a tower is built on top of an unstable base of a single block
An unstable Jenga base

…where faulty assumptions are mistakenly regarded as fact and built upon.

Again, I’m not knocking user research or pre-post analysis. They are required in every organisation. They’re just not the best tools to evaluate changes.

There comes a time in every company’s life when they have to ask, “Are we ready to adopt experimentation to combat these shortcomings?”

Even if the answer to that question is “yes”, it’s important we proceed in a diligent and purposeful manner. Not doing so threatens to negate any of the benefits I’ve mentioned.

So, for the rest of this article, I’ll cover the fundamentals underpinning experiments and experiment programs. The goal is to better understand how and why those benefits exist.

What is an “experiment”?

Here’s a dictionary definition of “experiment”:

“…a scientific procedure undertaken to make a discovery, test a hypothesis or demonstrate a known fact.”

The key word here is “scientific”.

Part of being “scientific” means we need to control the number of variables used. That’s important as we’re trying to prove cause and effect (Interesting tidbit: Many psychology studies use twins in an effort to control variables).

A “scientific” procedure also needs to be methodical, objective, and repeatable. So, we need to follow a set of well-defined processes, making everything as transparent as possible.

At a high level, the life cycle of a web experiment looks like this:

Experiment overview in four steps: Hypothesise, design & build, split traffic, compare results
Overview of an experiment

A hypothesis is formed. This communicates an as-yet unproven belief that a specific change will impact something, usually a conversion rate, in a particular way.

The experiment is designed and built to control the number of variables in an effort to answer the hypothesis with as much accuracy as possible.

The experiment is then released. Traffic is split into separate groups, with one group experiencing our change and another experiencing the original. Since we run both at the same time, we limit the problems of pre-post analysis. We also ensure the experiment runs long enough to even out the different types of users visiting our site.

The groups are compared to get an answer to our hypothesis. Statistical methods are used to achieve the desired level of probability and we decide whether the hypothesis has been validated.
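
As a rough sketch of what the “split traffic” and “compare results” steps can look like in practice, here’s one common approach: bucket users deterministically by their ID so they always see the same variant, then compare the two groups with a two-proportion z-test. The function names and numbers below are illustrative assumptions, not the output of any particular tool.

```python
import hashlib
from statsmodels.stats.proportion import proportions_ztest

def assign_group(user_id: str, experiment: str = "new-checkout") -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "variant" if int(digest, 16) % 2 else "control"

# Illustrative results once the experiment has run its course.
conversions = [310, 352]     # conversions in control, variant
visitors = [10_000, 10_000]  # users exposed to each group

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"Control: {conversions[0] / visitors[0]:.2%}")
print(f"Variant: {conversions[1] / visitors[1]:.2%}")
print(f"p-value: {p_value:.3f}")  # judged against the agreed level of certainty
```

Because both groups run at the same time, seasonality, marketing campaigns, and other extraneous variables affect them equally.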

There are three (that I count, anyway) types of experiments categorised by the primary goal.

  • De-risking changes. This is where the business wants to launch some changes but also wants to mitigate any risks whilst doing so. In other words, the hope is for a conversion “win”, but the company will also settle for a “flat” result.
  • Increasing conversion. This is when the hypothesis is about a change designed to increase a financial (or other business-centric) metric. A “win” is favoured here.
  • Learning about users. The hypothesis is about validating and/or learning about user behaviour. This may be used to inform product strategy or develop new hypotheses to test.

While an experiment may hope to cover multiple goals, a decision needs to be made about which goal is primary. It helps focus the process and efforts of everyone involved. This goal should be stated clearly in the hypothesis.

Knowing the goal of an experiment means we can adapt our processes to accommodate it. So, if the primary goal of an experiment is to de-risk, we may accept a lower probability (i.e. level of certainty) for the result.
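
As a sketch of what that accommodation might look like, a team could encode a different evidence threshold for each goal in its guidelines. The thresholds below are assumptions for illustration, not a standard.

```python
# Illustrative significance thresholds per primary goal; the values are assumptions.
ALPHA_BY_GOAL = {
    "de-risk": 0.10,  # accept a lower level of certainty; we mainly want to avoid a clear loss
    "convert": 0.05,  # demand stronger evidence before declaring a win
    "learn": 0.05,    # insights feed strategy, so keep the evidence strong
}

def is_conclusive(goal: str, p_value: float) -> bool:
    """Decide whether a result meets the bar for the experiment's primary goal."""
    return p_value < ALPHA_BY_GOAL[goal]
```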

What are the specific benefits of running an experiment?

The three experiment types conveniently form our first three benefits:

Benefits of using an experiment: de-risk, convert, and learn
  • De-risking means we protect our conversion rate. Remember that at least half of our assumptions are likely wrong. Also, since there’s a risk of accidentally releasing bugs, running an experiment allows us to catch them early and with a better degree of certainty than otherwise possible.
  • Since we’re testing our assumptions, we can try out new ideas to help increase our conversion rate. It also ensures that only the right changes go live.
  • We also learn deeper insights about our users. Each experiment provides insight into user behaviour through accurate comparative analyses. This helps de-risk product strategies as well as inform new ones.

These are great benefits, but we can do even better! Thinking of experiments singularly is limiting. An “experiment program” thinks in terms of multiple experiments.

What is an “experiment program”?

I came across Kolb’s learning cycle during my stint as a teacher. It’s a theoretical model describing how we learn.

Kolb’s learning cycle is a circle containing four steps: experience, reflection, conceptualise, and test
Kolb’s learning cycle

This cycle is also known as the “experiential learning model”. A learner starts from any of the four stages and cycles round, ideally touching every base. The stages are:

  • Experience—the learner actively experiences something
  • Reflection—the learner reflects on that experience
  • Conceptualise—a new theory is conceptualised based on the experience
  • Test—the learner tests this new theory

The learner then goes back through the cycle with new concrete experience. Each time the cycle is iterated, learnings compound over the last.

Just as this cycle works for compounding learnings, it also works for managing multiple experiments. Utilising a process like this is an effective way to test.

Here’s an overview of what that looks like:

An experiment learning cycle based on Kolb’s learning cycle: Data, review/reflect, hypothesise, and test

We might start with some data based on an observation or two. This could be the results of an experiment, or some other concrete data insight.

We then review and reflect on this experience, taking into account other data, like user research, and perhaps other experiment results.

Based on these reflections, we conceptualise new hypotheses.

We then test our hypotheses, validating (or not) with experiments.

Each time we iterate, we compound learnings. The cycle is flexible, allowing us to start from any of the four stages. That said, it’s most effective to start from either the data or the review/reflection stage.

When we analyse Widerfunnel’s Infinity experimentation process (a popular model for experimentation)…

Widerfunnel’s Infinity process with “explore” and “validate” cycles
Widerfunnel’s Infinity Process

…we find Kolb’s learning cycle at its heart. “Explore” and “validate” expand Kolb’s cycle into an experimentation process with multiple sub-processes for prioritisation, experiment design etc.

Because we’ve increased our level of certainty through experimenting, building on our learnings like this makes for a more stable Jenga base:

A stable Jenga base with a foundation of three blocks now

Note: This is a high-level view. When fleshing out an experiment program, each stage would be broken down further into a series of sub-processes, just like the Widerfunnel example.

What are the benefits of an experiment program?

As well as a more stable Jenga base, having an experiment program built with well-defined processes gives us the rest of our benefits:

Benefits of an experiment program: ROI efficient, consistency, and data-driven
The benefits of having an experimentation program
  • Improving products incrementally by isolating variables, along with having procedures to prioritise effectively, creates efficiencies for everyone involved. This leads to a higher return on investment (ROI).
  • A standardised process leads to consistency of experiment builds and approaches. This ensures all experiments are built with the same rigour, maintaining a good level of certainty.
  • Utilising an experiment program means we get everyone thinking in objective, data-driven ways. This reduces subjectivity in the decision-making process, which not only gives us the best chance of succeeding but is also liberating for everyone involved.

Additionally, we amplify the benefits of singular experiments, as we blend the three experiment types in a strategic way.

The experiment benefits: de-risk, convert, and learn are amplified
Benefits are amplified by using a program to manage multiple experiments

Conclusion

So, does this mean we should test everything? No. As previously mentioned, it’s not always possible or practical to test everything, so we need to be pragmatic. Testing should always be the first consideration, though.

Now that we’re aware of the fundamentals of experimentation and how they relate to the benefits, we can champion implementation more effectively.

Not only that, but we’re also better equipped to validate and conserve the integrity of an existing process.

In case you need more reasons, here is a bonus one…

Competition are doing it, so they win all the business and money, while you stand and watch

The competition are likely doing it. Testing is more popular than ever, and there’s no reason to think our competitors aren’t doing it too. Not doing so risks us falling behind, and the better the testing program, the greater the edge.

Furthermore, experimenting is fascinating and fun — a reason in itself in my opinion.

I’m Iqbal Ali, former Head of Optimisation at Trainline. I’m now an Optimisation Specialist, helping companies achieve success with their experimentation programs through training and setting up processes. I’m also a graphic novelist and writer. Here’s my LinkedIn if you want to connect.

The UX Collective donates US$1 for each article published on our platform. This story contributed to Bay Area Black Designers: a professional development community for Black people who are digital designers and researchers in the San Francisco Bay Area. By joining together in community, members share inspiration, connection, peer mentorship, professional development, resources, feedback, support, and resilience. Silence against systemic racism is not an option. Build the design community you believe in.
