Case Study: How Fever uses A/B experiments and data analysis to double customer lifetime value

Last April, the mobile event discovery startup Fever invited me to help with its growth initiative. Fever helps users discover and book events — from parties to fashion, food and fitness. The service gained traction in Madrid, New York, London and other cities, generating over 3 million bookings last year and reaching a unique monthly audience of over 30 million people through its platform and media sites.

As the company reached break-even and switched to rapid growth mode, new challenges emerged: how do you rapidly improve key business metrics without hurting the user experience? How do you try out many ideas without significantly increasing headcount?

After an initial analysis, we set out to implement the following changes:

  • Data-driven development — KPIs, metrics, dashboard, data deep-dives
  • A/B experiments
  • Qualitative research (not covered in this post)

Using data to your advantage

Data is a key asset for any company, and used correctly it can create a tremendous competitive advantage. Everyone wants to be as data-driven as Google, Netflix and Booking.com, but many companies fall into the common pitfalls of focusing on vanity metrics, looking at too many metrics, misinterpreting data, and using bad statistics. Here’s what we did at Fever.

Identifying key metrics

  • One metric that matters — We started out by identifying the top metric Fever wanted to further accelerate — customer lifetime value (LTV). Having one metric that matters (OMTM) is of key importance for the focus and velocity of a company. Focusing on too many metrics can be counterproductive, and focusing on vanity metrics like gross revenue can drive short-term, non-sustainable growth.
  • Supporting metrics — The next step was to identify supporting metrics, such as retention, conversion to purchase, and repeat purchase rate, and then break those down further into metrics that we can influence in the product. This is an ongoing, iterative process that is very specific to the product. The result was a “metrics tree” of fewer than 10 fundamental metrics we wanted to focus on, and once in a while a new key metric is identified and added (a simple sketch of such a tree follows below).
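To make the idea concrete, a metrics tree can be captured in something as simple as a nested structure. The sketch below is purely illustrative; the metric names are hypothetical and are not Fever's actual tree:

```python
# A minimal, purely illustrative metrics tree. The one metric that matters
# (customer lifetime value) sits at the root; each child is a supporting
# metric that product teams can influence more directly.
METRICS_TREE = {
    "customer_lifetime_value": {
        "conversion_to_purchase": {
            "event_page_ctr": {},
            "checkout_completion_rate": {},
        },
        "repeat_purchase_rate": {
            "second_purchase_within_7_days": {},
        },
        "retention": {
            "week_1_retention": {},
            "week_4_retention": {},
        },
    },
}


def print_tree(tree, depth=0):
    """Print the hierarchy with indentation so it is easy to scan."""
    for metric, children in tree.items():
        print("  " * depth + metric)
        print_tree(children, depth + 1)


print_tree(METRICS_TREE)
```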

Using a dashboard

Per my suggestion, the Fever data team created a dashboard with the metrics we identified, many of them grouped by weekly or monthly cohorts.

Next we established a weekly growth meeting that starts with a review of the dashboard. As we look at the charts we ask ourselves questions such as:

  • What is the current trend — up/down/flat?
  • Were there trend changes? If so, what caused them?
  • Are there any anomalies in the data, for example a particularly good or bad week, and what caused them?
  • How do different geographies compare?

Here’s an example of a chart we have in the dashboard — purchase buckets by cohort:

Dashboard example: purchase buckets by cohort

Each line represents the percentage of users who booked an event once (G1, in blue) or twice (G2, in orange) within their first week of joining Fever. The chart on the left includes all transactions — paid and free — while the chart on the right shows only paid events. The data is grouped by monthly cohorts, for example all the people who joined Fever in June 2017, July 2017, etc. Note that the two charts use different scales. (A sketch of how buckets like these can be computed follows the observations below.)

Notice that this chart allows us to immediately observe a few things:

  • Right chart (paid events) — The rate of users purchasing one or two paid events is trending up (good). Also, the two lines are quite close, suggesting that Fever is effective at moving users from a first paid purchase to a second (very good).
  • Left chart (free + paid events) — The % of users who made two transactions (G2, orange) is also trending up, but the G1 line (blue) is diverging and less consistent. On further investigation we learned the likely source is large-scale free events such as public pop-ups, big speed-dating nights, Pokémon Go gatherings and others. These have a positive short-term effect of bringing first-time users onto the platform, but most don’t translate into a second transaction within the first 7 days. However, looking at a longer time scale we observed that these free events do increase customer lifetime value and are therefore good to have.
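For the technically inclined, here is a minimal pandas sketch of how purchase buckets like G1 and G2 could be computed per monthly cohort. The tables, column names and values are invented for illustration and are not Fever's actual data model:

```python
import pandas as pd

# Hypothetical users and bookings tables; schema and values are made up.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": pd.to_datetime(["2017-06-03", "2017-06-20", "2017-07-05"]),
})
bookings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "booking_date": pd.to_datetime(
        ["2017-06-04", "2017-06-06", "2017-06-25", "2017-07-08"]),
})

# Keep only bookings made within the first 7 days after signup.
df = bookings.merge(users, on="user_id")
first_week = df[df["booking_date"] <= df["signup_date"] + pd.Timedelta(days=7)]

# Count first-week bookings per user; users with none count as zero.
counts = (first_week.groupby("user_id").size()
          .reindex(users["user_id"], fill_value=0))

users["cohort_month"] = users["signup_date"].dt.to_period("M")
users["g1"] = counts.values >= 1   # booked at least once in the first week
users["g2"] = counts.values >= 2   # booked at least twice in the first week

# Share of each monthly cohort that reached each bucket.
print(users.groupby("cohort_month")[["g1", "g2"]].mean())
```

Filtering the bookings table to paid transactions before the merge would produce the paid-only version of the chart.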

Data deep-dives

When analyzing the dashboard, new questions typically arise, for example:

  • How far in advance do users purchase tickets?
  • Are some event categories causing better retention than others?
  • Do events with good star ratings get users to come back more?

The data team owns finding the answers to these questions and presenting them in the next growth meeting, during the dedicated “deep-dives of the week” section. Data deep-dives are a rich source of insights and experiment ideas.

Data deep-dive example: retention by event category

The chart above shows user retention over time split by the category of the first event the user booked (e.g. party, cinema, food…). As you can see, there is some variance in retention depending on content category. This may mean that some event types are more “sticky”, or that users who select certain event types gain more value from Fever. This insight led to further user and content segmentation, and to changes in the recommendation algorithm.
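As a rough illustration of what such a deep-dive looks like in code, here is a pandas sketch that splits retention by the category of each user's first booking. The schema, the 30-day retention definition and the values are assumptions made for this example, not Fever's actual pipeline:

```python
import pandas as pd

# Hypothetical bookings table; schema and values are invented.
bookings = pd.DataFrame({
    "user_id":      [1, 1, 2, 3, 3, 4],
    "category":     ["party", "food", "cinema", "food", "party", "party"],
    "booking_date": pd.to_datetime(
        ["2017-06-01", "2017-07-15", "2017-06-05", "2017-06-10",
         "2017-06-20", "2017-06-12"]),
}).sort_values("booking_date")

# The user's first booking defines the segment they belong to.
first = (bookings.groupby("user_id", as_index=False).first()
         .rename(columns={"category": "first_category",
                          "booking_date": "first_date"}))

# Define "retained" here as: booked again more than 30 days after the first event.
merged = bookings.merge(first, on="user_id")
merged["retained"] = (merged["booking_date"]
                      > merged["first_date"] + pd.Timedelta(days=30))
retained = merged.groupby("user_id")["retained"].any()

# Retention rate per first-event category.
print(first.set_index("user_id").join(retained)
      .groupby("first_category")["retained"].mean())
```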

Learning

Dashboard analysis + data deep-dives drive learning — gaining a deeper understanding of how users use the product and why, what works well and what doesn’t. This usually generates ideas on how to improve. The next step is to try out these ideas in the product.

A/B experiments

A/B experiments are another super-power tool that every company needs in its arsenal, but few manage to use correctly. They allow testing ideas quickly, cheaply and with a high level of certainty. Companies can go through many ideas in a short amount of time, killing the ones that don’t work and fully launching the ones that do. This greatly reduces cost sunk into bad projects and improves idea-to-launch cycle time.

Success/fail ratio — In my first meeting with Fever I explained that when put to the test, most ideas fail to show any measurable improvement, and some even show negative results. Companies such as Netflix and Microsoft have reported that on average only 1 in 3 of their experiments yields positive results, but in my experience startups should assume a 1-in-10 win rate, as they have less-tested products and newer users. Our experience at Fever supports this rule of thumb, although Fever has done slightly better, and its win rate is constantly improving.
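For readers who want the statistical mechanics: one standard way to decide whether a lift in conversion is real or just noise is a two-proportion z-test. The sketch below is a generic illustration with invented numbers; it is not Fever's actual experimentation framework:

```python
from math import erf, sqrt


def conversion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for conversion rates.

    conv_a / conv_b: number of converting users in control / variant.
    n_a / n_b: number of users exposed to control / variant.
    Returns the z statistic and a two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Invented numbers: 10,000 users per arm, 500 vs 580 purchases.
z, p = conversion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # a p-value below 0.05 suggests a real lift
```

If traffic is low or the effect is small, an experiment simply needs to run longer before the result becomes statistically meaningful.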

Here are some of the experiments we ran at Fever.

Grid view experiment

The Fever app shows one event at a time in a feed. We hypothesized that showing six events per screen in a grid view, although with less detail per event, would grow impressions, click-throughs and eventually purchases.

Result: Failed

The experiment version raised the number of events seen by users by only 11% on average, but decreased CTR by 23% (possible reasons: less info on each event, more distraction) and reduced conversions to purchase by 24%. Overall revenue per user went down by 23%.

Learning: it is important to give users enough information in the feed to make a decision before they enter the event page.

Similar events experiment

We hypothesized that when entering an event, users would benefit from finding similar events, which would lead to an increase in event views and purchases.

Result: Failed

We tested this idea twice with different positioning of the related-events carousel. In one case we got statistically insignificant results. In the other we got a decrease of 3.2% in total revenue per user — most likely because of the added distraction in the UI.

Keyword search experiment

The Fever app offers a curated list of events plus the ability to filter by category and area. We hypothesized that adding keyword search would help users discover events of interest more quickly and would lead to more conversions.

Result: Success!

Both CTR and conversion to purchase went up by around 7% each, and total revenue per user shot up by 12%!

Showing star-rating experiment

Fever customers can rate events they attended on a 1–5 star scale. A deep-dive analysis we performed showed that attending a highly-rated event causes users, on average, to come back to the app more often, so we hypothesized that showing star ratings in the app would drive up conversions.

Result: Success!

The experiment version gained 1% higher CTR, 5.1% higher conversion to payment and 4.4% higher revenue per user. A definite win.

Implementing A/B experiments correctly

A/B experiments are about testing many ideas fast. At Fever we wanted the engineering and design team to “own” the experiments. Here’s what we did:

  • Created eng/ux squads around key metrics — conversion, retention and repeat purchase. Each team “owns” its metric and has lots of freedom to choose how best to improve it.
  • Created an idea bank and asked the teams as well as management to add and vote on ideas.
  • Instructed the teams to work in fixed iteration cycles — 1 week to code an experiment plus 2–3 weeks to run it and collect results.

The results were improved velocity, team ownership and transparency, with lower management overhead. Team members could see their ideas materialize in a matter of days and be judged objectively — not subject to anyone’s opinion. The teams were self-motivated to expedite the process and initiated a number of improvements to the testing and analysis infrastructure. Fever is now running up to 5 experiments per week.

Overall results

The chart below shows LTV7 — average revenue per customer in the first 7 days after signup. You can see that since we started the growth project, LTV7 has nearly doubled, and the trend remains positive.

LTV7 is a good leading indicator for LTV (full customer lifetime value): getting more users to engage and make a purchase within the first week is almost certain to increase overall purchases. Of course, we also regularly look at longer time scales — LTV30, LTV60, LTV90 and LTV365 — to see how they are trending.
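For concreteness, here is a minimal pandas sketch of how an LTV7-style metric could be computed per signup cohort. The tables, column names and values are invented for illustration and are not Fever's actual schema:

```python
import pandas as pd

# Hypothetical users and purchases tables; values are made up.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": pd.to_datetime(["2018-01-02", "2018-01-10", "2018-02-01"]),
})
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "purchase_date": pd.to_datetime(
        ["2018-01-03", "2018-01-06", "2018-01-25", "2018-02-04"]),
    "revenue": [20.0, 15.0, 30.0, 12.0],
})

# Keep only revenue generated within 7 days of signup.
df = purchases.merge(users, on="user_id")
window = df["purchase_date"] <= df["signup_date"] + pd.Timedelta(days=7)

# Revenue per user inside the 7-day window (users with none count as 0).
rev_7d = (df[window].groupby("user_id")["revenue"].sum()
          .reindex(users["user_id"], fill_value=0.0))

users["signup_month"] = users["signup_date"].dt.to_period("M")
users["ltv7"] = rev_7d.values

# Average LTV7 per monthly signup cohort.
print(users.groupby("signup_month")["ltv7"].mean())
```

The same structure extends to LTV30, LTV90 and so on by widening the time window.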

To be clear, not all of this growth came from product changes — the company is also making improvements in marketing, content, partnerships and channels. Still, because we used A/B experiments, we know exactly how much the product changes are contributing.

Final thoughts

By focusing Fever on key metrics, systematic data analysis and hypothesis-based experiments, we were able to accelerate the company’s growth and set it on a trajectory to meet or exceed its objectives. Having said that, data and experiments are only half of the growth story. It’s just as important to step back and try to identify the key driving forces of growth in a company. At Fever we also initiated qualitative user research to understand why people do what they do, content demand vs. supply analysis, and user segmentation. Perhaps that will be the subject of a separate blog post.

Itamar Gilad (itamargilad.com) is a product consultant helping tech companies build products that deliver and capture tremendous value. He specializes in product strategy, product market fit, growth and innovation. Previously he was lead product manager at Google, Microsoft and a number of startups.

If you prefer to receive posts like these by email, sign up to my newsletter.

 
