Experimenting Your Way to a Better Performing Website

Today we catch up with Jon Noronha, Director of Product Management at Optimizely, to discuss data experimentation, product developments at Optimizely, and his own professional journey in the website experimentation space. Optimizely is the world’s leader in digital experience optimization, allowing businesses to dramatically drive up the value of their digital products, commerce, and campaigns through its best-in-class experimentation software platform.

David: What is the scale and scope of Optimizely?

Jon: Optimizely is the world’s largest platform for online experimentation, with over 1 million experiments run so far and 600 new ones started every day. We power A/B testing and feature flagging for over a quarter of the Fortune 100, including companies like Microsoft, Visa, and HP. We also power app experimentation for digital disruptors like Blue Apron, Sonos, and Missguided. Chances are that every day you browse the internet, you’re bucketed into hundreds of different experiments powered by Optimizely.

“Two-sample hypothesis tests” date back to 1908 (according to Wikipedia), and Google engineers ran their first A/B test in the year 2000. Optimizely started in 2009 and you joined in 2014. Could you walk us through how you see the history and future of A/B testing on the internet?

I’d go back even further — the scientific method has been driving human progress since before Isaac Newton! Online experimentation just applies that same mindset to product development and website design. What’s more striking to me is that it’s taken this long. As you point out, it’s been almost 20 years since Google ran its first A/B test.

But in 2018, most companies still aren’t experimenting on most of their key decisions — and they’re generally being outcompeted by a small minority like Amazon and Netflix who are famous for experimenting relentlessly.

Optimizely’s mission is to change that balance by bringing Google-level experimentation technology to everyone else. We’ve been pretty successful: in the last decade, most online businesses have started to experiment online, and those who haven’t are realizing they’re falling behind. Interestingly, though, online A/B testing has mainly taken off with marketing teams. It’s now established practice to experiment on email subjects, ad copy, and landing page layouts to drive conversion rates.

The bigger trend I see for the next few years is the spread of experimentation to product and engineering teams. The top tech companies already have a model where every new feature is launched as an experiment, but we’re starting to see that mindset grow in smaller start-ups and technical teams within more traditional companies. I think in a few years, every competent product team will be adopting feature flagging and experimenting on their features after launch.
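To make the feature-flag idea concrete, here is a minimal Python sketch of one common way to bucket users deterministically into an experiment. It is an illustration under stated assumptions, not Optimizely’s actual implementation: the helper `assign_variation`, the SHA-256 hashing scheme, the 10,000-bucket granularity, and the `rollout_checkout_v2` key are all hypothetical.

```python
import hashlib

NUM_BUCKETS = 10_000  # assumed granularity: each bucket is 0.01% of traffic


def assign_variation(user_id: str, experiment_key: str, treatment_share: float = 0.5) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    Hypothetical helper for illustration; not part of any real SDK.
    """
    # Hash the experiment key together with the user ID so the same user
    # can land in different buckets across different experiments.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()
    # Reduce the first 4 bytes of the hash to a bucket number in 0..9999.
    bucket = int.from_bytes(digest[:4], "big") % NUM_BUCKETS
    return "treatment" if bucket < treatment_share * NUM_BUCKETS else "control"


def checkout_flow(user_id: str) -> str:
    """Gate a made-up new checkout behind the experiment."""
    if assign_variation(user_id, "rollout_checkout_v2") == "treatment":
        return "new checkout"
    return "existing checkout"
```

Because the assignment depends only on the two IDs, a returning user gets the same experience on every visit without any stored state, and raising `treatment_share` gradually rolls the feature out to more users.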

In your 4 years at Optimizely, what product work are you the most proud of?

Optimizely already had a strong product when I joined, but it was focused on a very particular kind of experimenter: a semi-technical lone wolf running sporadic growth A/B tests. We had ambitions to expand far beyond that, to larger teams running more fundamental experiments across all parts of their product stack. So in 2016, we completely revamped our platform to adapt it for the needs of enterprise teams with more modern websites and apps. The result was Optimizely X, a completely redesigned product built for both marketers and developers. It’s been amazing to see this product work translate into culture change across some of the world’s most important businesses.

What do you see as Optimizely’s impact on the internet?

Until a few years ago, the only sites experimenting heavily were tech giants like Facebook or Amazon. These companies built a culture of experimentation that allowed them to disrupt one sector after another. I see Optimizely as leveling the playing field. I’m proud of how we’ve helped iconic companies stay competitive through experimentation. For example, we’ve helped the New York Times navigate a major digital transformation and become a leader in digital subscriptions by testing and personalizing their paywall experience. I’m also excited about how we’ve helped smaller companies like Blue Apron and Rocksbox grow their businesses and refine their messaging as they go.

Most of all, I’m excited about what that means for all of us as end users of these products. The internet is full of broken apps, janky UX, and dumb features that nobody asked for. By helping more companies experiment, we help make these user experiences so much more effective for all of us.

What are your favorite / least favorite misconceptions about data-driven experimentation?

I hate the term “data-driven” in the context of experimentation. Unless you’re the one behind the steering wheel, your data will drive you straight off a cliff. Data is a tool for informing tradeoffs and understanding your customers, but it’s no substitute for a vision or strategy. You should be data-informed, not data-driven.

There’s a misconception that experimentation is all about tiny, incremental changes — like Google famously testing 40 shades of blue for their links. But the best experiments are actually bold — like Amazon introducing Prime, or Facebook building the News Feed. Experimentation lets you bring controversial ideas out of a meeting room into the real world, while mitigating the risk if something goes wrong. Done right, experiments should make you fearless in trying out bold ideas.

What industries are gaining the most traction from the Optimizely X platform? I imagine behavioural targeting is groundbreaking for ecommerce sites.

We certainly see strong traction in retail and travel, where companies can very quickly see a significant impact on online revenue from experimentation. As a typical example, one ecommerce site we worked with redesigned its category page and saw a 20% increase in conversion rates, yielding over $250k per quarter in additional revenue. We’ve also seen experimentation thrive at media companies, where it can be applied to everything from application design to editorial content to subscription flows and advertising.

Lately I’m noticing more growth in industries like financial services and food and beverage. Both industries have had a strong brick and mortar presence that didn’t always translate to the web — but we see rapidly growing adoption in mobile, where experimentation is a must to both improve the user experience and mitigate development risk.

In your series, Product Experimentation Pitfalls, you discuss picking a “north star metric” and how costly it was to pick the wrong one while working at Bing (total number of search queries). How hard was it to admit that your team’s assumption was wrong? And how did the team regain confidence in a new north star? And is a true north star metric even possible?

Choosing the right metrics is essential for experimenting effectively. Two companies with similar businesses, like Airbnb and Booking.com, can end up with completely different user experiences by optimizing for different things. As I describe in the post, it took us years at Bing to realize we were optimizing for the wrong outcome — driving more searching when really users want to search less.

But the truth is it was easy to tell that something was wrong, because our quantitative numbers weren’t lining up with qualitative feedback on our experiments.

That was a powerful hint that something was off, and pursuing that thread made it clear that we needed a metric more in line with our users’ expectations. This is another example of why you can’t be “data-driven”: you have to drive the data!

I think most teams aren’t well served by a single, fixed metric. Instead, we should all be picking a couple different metrics and re-evaluating them every 6–12 months to make sure they’re still serving their purpose. And when you change metrics, don’t forget to re-test some old ideas! It’s the cheapest experiment you can run.

You spent two and a half years at Microsoft. Why did you decide to leave Microsoft for Optimizely?

My time at Microsoft showed the power of experimentation first-hand. In my time there, we went from running just a handful of A/B tests to hundreds of experiments every week, transforming the entire culture of the company in the process. I saw how online experiments could drive a broader trend of agile development and bend the growth curve of a business.

I also saw how difficult it was. It took an enormous effort to build experiments on our in-house platform, and we ran into so many issues along the way with statistics, performance, etc. I thought, “everyone should be experimenting, but there’s no way they will if it’s this hard.” As soon as I discovered Optimizely, it was a no-brainer: the perfect chance to join a growing company working on a problem I was already passionate about.

Across your customers, could you share an anecdote of an unexpected change that yielded major results?

There’s a misconception that A/B testing is all about finding “one weird trick” to get big results, like changing a button color and doubling your conversion rate. And there are real experiments like that, but it misses the point. The biggest benefits come from the compounded impact of many experiments, and the culture of fearless development that they enable.

That collective impact can be very large. One customer of ours tallied it at $21M in incremental revenue after their first 500 experiments.

Instead of asking, “what’s the one experiment that will get me a big impact?” I recommend asking instead, “how do we scale to a hundred experiments to get a guaranteed gain?”
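The compounding argument can be made concrete with some simple arithmetic. The sketch below is illustrative only: the count of 20 winning experiments and the 2% lift per win are assumed numbers, not figures from the interview, and it assumes that the lifts multiply independently of one another.

```python
def compounded_lift(winning_experiments: int, lift_per_win: float) -> float:
    """Cumulative relative lift when each shipped win multiplies the baseline."""
    return (1 + lift_per_win) ** winning_experiments - 1


# Assumed numbers for illustration: 20 winners out of 100 tests, 2% lift each.
total = compounded_lift(winning_experiments=20, lift_per_win=0.02)
print(f"Cumulative lift: {total:.1%}")
```

Under these assumptions, twenty modest wins compound to a lift of roughly 49%, which is more than the 40% you would get by simply adding the individual lifts together.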

Optimizely X Full Stack is A/B testing and feature management for product development teams. What are some of the most advanced algorithms you’ve seen customers build atop Optimizely?

Full Stack is great for testing backend changes. We’ve seen it used effectively for product sorting, search algorithms, and dynamic pricing and offers. Maybe the most common use is recommendation algorithms. Almost every ecommerce site has product recommendations, but I’m always surprised to see how rarely they’re refined with testing. We’ve run several experiments where recommendations actually reduced conversion rates due to a distracting user experience or low-quality results. Conversely, we’ve seen that testing the placement, algorithm, labeling, and number of recommendations can lead to as much as 5–10% incremental revenue per visitor!

How far will personalization on the internet go? Will every site I visit soon say, “Hello David, you’re looking handsome today”?

Personalization is a hot topic, but I think the hype mostly misses the point. “Hello David” doesn’t meaningfully impact the user experience, and if anything it can hurt it by seeming creepy or invasive. Coming from a search background, I think about relevance. I expect to see more of the internet resemble Netflix’s show list or Facebook’s news feed. Every user coming to these apps sees something different tailored to them, but it doesn’t scream “this is personalized!”. The key to achieving that kind of relevance is constant experimentation on the algorithms that power these experiences and the user experience around them.

Michael Phelps is an interesting choice to keynote this year’s Opticon 18. But I imagine his training was data-driven… What do you think he’ll speak about?

Experimentation and personalization are buzzwords, but the real point is winning! Michael Phelps is a living example of what it means to come in first place. The difference between gold and bronze can come down to mere milliseconds. We see this with our customers every day: the difference between the market leader and everyone else can come down to just a few impactful changes to the user experience.

Could we see a screenshot from Optimizely’s dashboard of the product being used on Optimizely.com?

Sure, here’s a list of some of the experiments we’re running on our own users:

And here’s a live example of how we personalize our own homepage — in this case, to a prospect from the travel industry:

What mindsets or traits do you think drive effective product development experimentation?

You have to be incredibly curious about your users and open-minded about how they use your product. You have to be skeptical of conventional wisdom and naive readings of the data. And most of all, you have to have conviction.

Think of experiments as a tool for building something new and bold — it’s the safety net that lets you try something new without too much risk. So be fearless!

What roles are you currently hiring for? What traits do you value when building your product team?

We’re always looking for great engineers, and we’re hiring in San Francisco and Austin! We’re also hiring for a range of customer-facing roles like sales engineering, customer success, and technical support in SF, New York, Amsterdam, London, Cologne, and Munich. Check out our jobs page to learn more. If you’d like to learn more about our product team’s culture, check out our engineering blog for Q&As with some of our developers.

Connect with Jon on Twitter or LinkedIn.


Disclosure: Optimizely has previously sponsored Hacker Noon.