Idea Validation — Much More Than Just A/B Experiments

UPDATE: I published a longer and more detailed version of this article in my free eBook, Testing Product Ideas Handbook.

Here are a few important things to know about product ideas: 1) The vast majority of ideas lead to no measurable improvement in business results or value to the customer (some actually cause negative results). 2) No one, no matter how senior or experienced, can predict which ideas will work and which won’t; there are just too many unknowns. 3) Most companies use weak heuristics, opinions and archaic decision processes to place bets on a handful of unproven ideas. I’d argue that this is by far the biggest source of waste in the industry.

The alternative is, of course, evidence-driven product development. It’s a core principle of Lean Startup, Design Thinking, and other modern product development methods, and the default mode of operation for many successful tech companies today. In product management circles the term “Product Discovery” has become almost synonymous with evidence-driven product development. Here’s how product management guru Marty Cagan describes product discovery in his book “Inspired”:

“Our goal in discovery is to validate our ideas the fastest, cheapest way possible. Discovery is about the need for speed. This lets us try out many ideas, and for the promising ideas, try out multiple approaches. There are many different types of ideas, many different types of products, and a variety of different risks that we need to address (value risk, usability risk, feasibility risk, and business risk). So, we have a wide range of techniques, each suitable to different situations.” (Inspired / Marty Cagan)

As Cagan points out, there’s a wide spectrum of validation methods. This is a good thing: there are techniques for any company and product at any stage. It’s also a challenge, because the learning curve is quite steep. Each validation method has its own set of best practices, subtle nuances and pitfalls.

There’s also a fair amount of confusion and misconception about idea validation. Many in the industry believe it’s all about “running experiments”. While experiments are the gold standard of validation, there are many other, far cheaper and more immediate ways to test an idea. By fixating on experiments many companies set the bar too high, miss out on easier opportunities, and often give themselves an excuse to keep doing things the old way.

In this article I’ll run through the spectrum of validation techniques and briefly describe the most important ones. This is easily a topic for a whole book (“Inspired” by Marty Cagan and “Build the Right It” by Alberto Savoia are two great examples), so this article is on the lengthy side. Still, by the end of it I hope to convince you that you too can start validating today, without a huge investment. You really have no excuse not to.


Idea Validation Methods

There are four types, or levels, of idea validation methods: Assessment, Fact Finding, Tests, and Experiments. The diagram shows the most important methods in each, but there are many others, and yet more are being invented as we speak. Product validation is really only limited by our creativity and willingness to step outside our comfort zone.

Assessment

Idea assessment is all about determining quickly, with no external research, whether an idea is worth moving forward with at this time. Assessment alone is never enough to justify building and launching an idea (although that’s the practice in many companies), as it only gives us weak evidence. Still, the techniques below help us evaluate the idea objectively and in a structured way, and thus can act as important filters.

These are some common assessment techniques:

  • Goals alignment — Is this idea helping us achieve any of our goals? If not it’s better to park it and maintain focus (or change the goals).
  • Initial ICE analysis — ICE is a technique to score ideas based on their potential Impact, Confidence (how much evidence we have that the idea will have the expected impact) and Ease (the opposite of effort). I’ve written about ICE before. In initial ICE analysis we quickly guesstimate Impact and Ease based on our past experience or, better, on back-of-the-envelope calculations and rough cost breakdowns. Confidence is always calculated using the Confidence Meter.
  • Business modeling — New products and new business models should first make business sense on paper — this can be a simple revenue/cost projection in a spreadsheet, Strategyzer’s business model canvas, or another financial analysis technique.
  • Stakeholder reviews — You want to review your idea with internal stakeholders not because they can tell you whether it’s a good idea (they will definitely try), but because they can help flush out business risks (legal, brand, PR, security, etc.), and advise how to change the idea to mitigate these risks, or what kind of evidence would show that the risks are minor or non-existent.
  • Assumption mapping — This is a very useful brainstorming technique for finding hidden assumptions and risks in your bigger ideas.
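
To make the ICE step concrete, here is a minimal sketch of scoring and ranking a few ideas. It rests on assumptions that are not spelled out above: each component is scored on a 0 to 10 scale, and the three components are multiplied into a single score. The idea names and numbers are made up for illustration, and the Confidence values are placeholders rather than real Confidence Meter readings.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: float      # guesstimated impact on the goal, 0-10 (assumed scale)
    confidence: float  # strength of supporting evidence, 0-10 (assumed scale)
    ease: float        # opposite of effort, 0-10 (assumed scale)

    def ice_score(self) -> float:
        # Assumption: multiply the components, so a weak value in any
        # one of them pulls the whole score down.
        return self.impact * self.confidence * self.ease


def rank_ideas(ideas: list[Idea]) -> list[Idea]:
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice_score(), reverse=True)


# Hypothetical ideas with made-up scores.
ideas = [
    Idea("4K playback", impact=6, confidence=1, ease=3),
    Idea("Simpler signup form", impact=3, confidence=4, ease=8),
    Idea("Referral program", impact=8, confidence=0.5, ease=4),
]

for idea in rank_ideas(ideas):
    print(f"{idea.name}: {idea.ice_score():.1f}")
```

Note how the high-impact ideas drop to the bottom because there is almost no evidence behind them yet. As validation raises or lowers Confidence, the ranking changes.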

Fact Finding

The next step is to look for available facts and data that support the idea or refute it. For example, if we’re considering offering 4K-resolution playback in our video streaming service, the fact that 4K playback is the top user request supports the idea, while the fact that only 6% of our customers use TVs or devices with 4K resolution does not.
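
A fact like the share of customers with 4K-capable devices usually comes from a simple count over data you already have. The sketch below assumes a hypothetical list of `(customer_id, supports_4k)` records pulled from device logs; the record format and the sample data are invented for illustration.

```python
def share_of_customers_with_4k(device_records):
    """Fraction of distinct customers with at least one 4K-capable device."""
    all_customers = set()
    customers_with_4k = set()
    for customer_id, supports_4k in device_records:
        all_customers.add(customer_id)
        if supports_4k:
            customers_with_4k.add(customer_id)
    if not all_customers:
        return 0.0
    return len(customers_with_4k) / len(all_customers)


# Hypothetical sample: customer c1 owns two devices, one of them 4K-capable.
records = [
    ("c1", False),
    ("c1", True),
    ("c2", False),
    ("c3", False),
    ("c4", False),
]

share = share_of_customers_with_4k(records)
print(f"{share:.0%} of customers can play 4K")
```

Counting distinct customers rather than devices matters here: a customer with several devices should only be counted once.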

Key fact-finding techniques include:

  • Data analysis — You can analyze log data, clickstreams, funnel analytics, screen heatmaps, session replays, feedback, requests, CRM data, and anything else that comes from your users. Beware of analyzing someone else’s data — the results may not be representative of your target audience.
  • User interviews — Every contact with users and customers, whether in a scheduled interview, in a pre/post-sale meeting, or just in a five-minute chat at a conference, is an opportunity to learn something about their needs, what they use now, and what they think about your idea. Interviews give us rich qualitative information: they tell us why people do what they do, what their thought processes are, and what they desire and fear.
  • Surveys — Surveys help us get answers to quantitative and qualitative questions quickly and cheaply. You can survey users in your product, on your website, via email or via a third-party service. Still, survey results should be used with caution, as they are highly sensitive to sampling bias, misinterpretation of questions, non-genuine answers, and other pitfalls.
  • Field research — Observing users in their natural environment — home, workplace or other, can teach us a lot about the context and the reasoning behind their actions and what solution will be most congruent with their lives.
  • Competitor analysis — Looking at competitors will not tell us whether an idea is good or not, but it can show whether competitors feel that this is a problem worth solving, how they position and price their solutions, and most importantly, what real users think of these solutions.

Sidenote: It’s a good idea to build a regular cadence of fact gathering — interview 3–5 users per week, conduct one field study every quarter, hold regular data deep-dive analysis etc.

Tests

Testing an idea means putting a version of it in front of users/customers and measuring the reaction. Sometimes we test a new idea just to learn if it works. Other times we test a hypothesis and thus set clear success/failure criteria in advance; some teams use hypothesis statements for this.

These are some of the most popular types of tests:

  • Usability tests — In Usability Tests we ask users to try using the product under the supervision and guidance of a tester. We may use interactive mockups, a code prototype, or an actual product. In all cases we’ll try to make the user interface look reasonably realistic, but in early tests the functionality may be limited and we may use canned data.
  • Human-operated tests — A fast way to simulate the functionality of a product idea is to have humans do the work that the software would eventually automate. In Concierge Tests team members perform a service on behalf of the customer. For example, the creators of Groupon started out by selling flash deals on their blog, then manually created the coupons and emailed them to customers. In Wizard of Oz tests you typically conduct a usability study in which the user sees a convincing user interface while, behind the scenes, a human does the work.
  • Smoke tests — Smoke tests (also known as Fake Door tests) help test the demand for a non-existent product or feature. Smoke tests create a convincing opportunity for people to check out the new product and “opt in” via ads, product landing pages, in-product calls-to-action, pop-up stores, crowdfunding campaigns and more. Conversion rates at each step of the test give us a sense of how desirable the product is. When users choose to opt in, we inform them that the product or feature isn’t ready yet, and often offer the option to join a waitlist, which is yet another test as well as a way to generate leads.
  • Dogfood — Testing the product with employees first, often called dogfooding (short for “eat your own dogfood”), is a common practice in many large tech companies. Google, Microsoft and Facebook religiously dogfood every product and feature long before they reach the market. At Gmail we often preceded dogfood with fishfood — testing a bare-bones version of the product with members of the teams or with sister teams. There are clear caveats — your colleagues are rarely exactly the same as your target market, and they get to use the product for free. Still they are more tolerant of bugs and missing functionality, and can give you very early feedback on value, usability, and critical bugs.
  • Early-adopter programs — In these programs we give select customers early access to a new product in exchange for candid feedback and direct contact with the product team. The participants are early-adopters — people who are quite willing to use an incomplete and not fully-tested product in their businesses or personal lives, in exchange for being the first to do so and having the opportunity to help shape the product. The programs go by different names — early adopters, reference customers, trusted testers, alpha testing and more, and they usually don’t require the participant to buy the product. However once the product is ready the participants may become our first customers.
  • Large-scale tests — If we’re not worried about keeping the idea a secret, we can announce a preview version, beta program or a lab (an experimental feature that the user may choose to turn on). In these programs customers consciously opt in to test a yet-to-be-launched product. The goal is to test at scale, so we are far less picky about who we let in compared to early-adopter programs. We also don’t offer the same level of direct contact with the team, and support will likely be provided via our usual support channels. These late-stage programs act as a general dress rehearsal before the launch, giving us access to mainstream users, higher loads and a lot more data.
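
For smoke tests, the "conversion rate at each step" mentioned above is simple arithmetic. Here is a minimal sketch, assuming we have a user count for each step of the funnel. The step names and counts are invented for illustration.

```python
def funnel_conversion(steps):
    """Given ordered (step_name, user_count) pairs, return a list of
    (name, count, rate_vs_previous_step, rate_vs_first_step) tuples."""
    if not steps:
        return []
    results = []
    first_count = steps[0][1]
    previous_count = first_count
    for name, count in steps:
        step_rate = count / previous_count if previous_count else 0.0
        overall_rate = count / first_count if first_count else 0.0
        results.append((name, count, step_rate, overall_rate))
        previous_count = count
    return results


# Hypothetical smoke test: ad -> landing page -> call-to-action -> waitlist.
smoke_test = [
    ("Saw the ad", 10_000),
    ("Visited landing page", 800),
    ("Clicked 'Get started'", 120),
    ("Joined waitlist", 60),
]

for name, count, step_rate, overall_rate in funnel_conversion(smoke_test):
    print(f"{name}: {count} users, {step_rate:.1%} of previous step, "
          f"{overall_rate:.2%} of all")
```

Looking at both rates helps: the step-to-step rate shows where people drop off, while the overall rate hints at how much demand the whole funnel might produce.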

Experiments

Experiments are tests that include a control element to guard against false results caused by random chance. Per this definition an A/B test is an experiment, but a usability test is not. I realize that this is not the common definition used in the industry — in many cases the words test and experiment are used interchangeably. I chose this definition because it’s closer to how scientists think of experiments.
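
A control group only helps if users are split between versions in a way that is effectively random and stable over time. One common approach, sketched here as an assumption rather than a prescribed method, is to hash the user id together with the experiment name. The function name, the experiment name and the user ids are all hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: float) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    Hashing the experiment name together with the user id gives each user
    a stable, effectively random bucket between 0 and 9999.
    """
    if not 0 <= treatment_percent <= 100:
        raise ValueError("treatment_percent must be between 0 and 100")
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) % 10_000
    return "treatment" if bucket < treatment_percent * 100 else "control"


# Hypothetical population: expose roughly a quarter of users to the change.
users = [f"user-{i}" for i in range(10_000)]
in_treatment = sum(
    assign_variant(u, "new-checkout", 25) == "treatment" for u in users
)
print(f"{in_treatment / len(users):.1%} of users see the new version")
```

Because the assignment depends only on the user id and the experiment name, a user sees the same version on every visit. The same function can describe a gradual rollout (raise `treatment_percent` over time) or a small group left on the old version (set it to, say, 95).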

Some common types of experiments include:

  • A/B tests — A/B tests typically compare user response to two versions of the product that differ in a single variable. Version A (control) is typically the current version, while version B (treatment) includes the change we wish to test, for example different button text or a new page design. Otherwise the two versions should be the same. As we expose two randomly selected groups of users to versions A and B of the product, we can measure differences in behavior, for example click-through rate on a button. We can attribute the measured differences to the product change only if statistical tests show that the results are statistically significant. For example, testing at a 95% confidence level means that if the change truly had no effect, we would see a difference at least this large only about 5% of the time due to random chance.
  • A/B/n tests — These are essentially the same as A/B tests, but we’re comparing the control version A to multiple treatment versions B, C, D… which may or may not test the same variable.
  • Multivariate tests — These tests allow us to test multiple variables at the same time, for example the text on a button, its color, and its shape. We can test all the combinations or just a subset to see which combination yields the best results. Caveat: multivariate tests require a lot more data, and it is harder to reach statistical significance.
  • Percent experiments — As we start gradually rolling out a product change to all our users, we may choose to stop at a specific rollout milestone — for example 25 percent, and conduct a large-scale A/B test, just to confirm that the results are consistent with what we’ve seen before.
  • Holdback experiment — As we’re reaching full rollout of the change, we may choose to leave a small group of users with the old version to monitor the effects of the change over time.
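
To illustrate the statistical check behind an A/B test, here is a minimal sketch of one common choice, a two-sided two-proportion z-test on conversion counts. The choice of test is an assumption; experimentation platforms may use other methods. The counts below are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns (z, p_value) for a two-sided test of "no difference".
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate if A and B truly behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: 200 of 10,000 users click in A, 260 of 10,000 in B.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 95% level")
else:
    print("Not enough evidence that B differs from A")
```

The p-value answers a narrow question: if A and B really performed the same, how often would random chance alone produce a gap this large? It does not say how big or how valuable the improvement is, so it is worth looking at the size of the difference as well.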

The MVP Principle

The term Minimum Viable Product was first used by Frank Robinson in 2001 to describe the minimal product that can be sold to customers. Robinson argued for using MVP to shorten time-to-market and expedite learning. In 2001 this outcomes-over-output message was definitely very new.

However, MVPs entered the mainstream vocabulary only ten years later, with Eric Ries’s book The Lean Startup. Ries took minimum viable products a few steps further: “MVP is that version of the product that enables a full turn of the Build-Measure-Learn loop with a minimum amount of effort and the least amount of development time.” Per this definition, many of the validation methods I described qualify as MVPs, and indeed the book mentions smoke tests, concierge tests and other forms of validation as MVPs. In other words, in Lean Startup terminology, MVP is a principle rather than a specific validation technique: always validate with the smallest possible method that produces the evidence you need to learn. However, what counts as Minimum and Viable depends on where you are in the development process. MVPs in the early stages of a product idea should definitely be much smaller than the ones in the late stages.

This important message sometimes gets scrambled. Some hear the word Product in MVP and wrongly assume that Lean Startup is all about rushing to market with low-quality, incomplete products. Others go into full waterfall development of what they consider the minimum product they can sell to customers, akin to Robinson’s original definition but very far from minimal. In general, I have observed different people (me included) using MVP to mean different things. This does not detract from the importance of the Lean Startup approach; it’s just an observation that the term MVP is often misunderstood.

The need for a system

Evidence-driven product development requires more than a collection of validation methods. To really reap its benefits we need a system to guide us through the process of idea collection, validation and implementation. If you follow my articles regularly you know my favorite — GIST, the framework I started using at Google and further developed with the kind help of companies and teams that were willing to give it a go. Today GIST is the main thing I teach and coach.

Briefly, GIST has these parts:

  • Goals — Define what we wish to achieve. Without goals any idea may be valid.
  • Ideas — Collect ideas and rank them so we can systematically choose what to validate first.
  • Steps — Iterate through idea validation, results collection and learning.
  • Tasks — The actual management of the work. There is nothing new here: Agile/Kanban works perfectly well.

To learn more about GIST see this article and this talk.

Final thoughts

As I said, a lengthy article for a big (and important) topic. Hopefully you found some useful information to help you tackle idea validation in your own products. This is definitely not an exhaustive list. If you know of any other important techniques, let me know in the comments.
