Data-Driven Validation for Business Ideas: A Step-by-Step Guide

Are you looking to start a side project but don't know where to begin? A data-driven approach to validating business ideas is a great way to get started. This method helps you to make decisions based on facts and data, rather than assumptions and guesswork.

In this article, we'll walk step by step through generating business ideas and validating them with data.

In this guide:

How Do We Generate Ideas?
Get The Data
Idea Validation Using AI
Bonus: Customer Discovery
Conclusion

Let's get to it!

💡 How Do We Generate Ideas?

There are many ways to find business ideas: we can identify problems that we or others are having, research what solutions exist, find and talk to potential customers, create surveys, use competitor analysis to understand the market, etc. Or, as Paul Graham recently said:

The way to get new ideas is to notice anomalies: what seems strange, or missing, or broken?

Then, build a minimum viable product (MVP), put it in the hands of users, and measure customer feedback. We then take that feedback and iterate on it until we find the right product-market fit (or pivot, or simply drop the project altogether).

Talking to potential customers is critical throughout the process, but it can be a challenging experience, especially for the more introverted entrepreneurs among us. Not to mention that it's not always clear where to find those potential customers: Twitter, Reddit, etc.

So, what if we could find a way to do some validation before talking to users?

We don’t need to reinvent the wheel. Going from 0 to 1 might not be the only path. Jakob Greenfeld’s excellent article on the topic is spot-on:

Most people don’t get into business to change the world. They simply want to provide value, be compensated for it, and do it on their own terms.

Nowadays, there’s a wealth of data freely available online. The trick is that data doesn’t necessarily look like data from our perspective. In this particular instance, I’m referring to the app stores.

App stores are a great source because they centralize many different types of data: ratings, reviews, etc. Any app store can work for this (Google, Apple, Amazon, etc.). The same approach could also be used with data from software review aggregators (G2, Capterra, etc.). It may even be possible to visualize the data from within the platform, but that would be a paid service.

I’ve used the Google Play Store (Android) for this article because it’s the one I’m most familiar with, but this approach can work with any app store out there.

I’m a firm believer that we can learn just as much from success as we can from failure. With this in mind, using the app store data to find and validate business ideas doesn’t only mean relying on the top-performing apps. We can also analyze the apps that aren’t doing as well as expected or investigate why a promising concept isn’t taking off.

The idea is very simple: a poorly rated yet frequently downloaded app signals that a need exists and that it can be served better.

🧲 Get the Data

Getting the data we need from the Play Store is somewhat straightforward, depending on how tech-savvy we are. We can scrape it using one of the many available tools online, such as Octoparse. If the process is unclear, consider using this guide to sort it out. It should be relatively simple.

We’re looking for apps with over 500k downloads but with a 3-star rating or lower. That will tell us what is in high demand (because why would people download it if they didn't need it?) but where poor execution is causing user dissatisfaction.

In the same way, it might be interesting to check out the best-rated apps with low download numbers. That could be an indication of a great product with potential if it could get in front of the right people.
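The two filters described above can be sketched in a few lines of Python. This is only an illustration: the field names (`name`, `installs`, `rating`) and the sample rows are made up, and the "hidden gem" thresholds are my own assumptions, since the only figures given above are 500k downloads and 3 stars.

```python
# Hypothetical sample rows; names, field names, and values are illustrative only.
apps = [
    {"name": "App A", "installs": 1_000_000, "rating": 2.4},
    {"name": "App B", "installs": 750_000, "rating": 4.5},
    {"name": "App C", "installs": 5_000, "rating": 4.8},
    {"name": "App D", "installs": 600_000, "rating": 3.0},
]


def high_demand_poor_execution(rows, min_installs=500_000, max_rating=3.0):
    """Apps with over `min_installs` downloads and a rating of `max_rating` or lower."""
    return [
        r for r in rows
        if r["installs"] > min_installs and r["rating"] <= max_rating
    ]


def hidden_gems(rows, max_installs=10_000, min_rating=4.5):
    """Well-rated apps with few downloads (both thresholds are assumptions)."""
    return [
        r for r in rows
        if r["installs"] <= max_installs and r["rating"] >= min_rating
    ]


targets = high_demand_poor_execution(apps)
gems = hidden_gems(apps)
print([r["name"] for r in targets])
print([r["name"] for r in gems])
```

Keeping the thresholds as parameters makes it easy to loosen or tighten them once we see how many apps survive each filter.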

The outcome looks like a CSV with 4-5 columns, like so:

This might need some quick rework since the extraction provides JSON-formatted data, which doesn't play nicely with the CSV format. Hit me up if you need help!
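For the rework itself, here is a minimal sketch using only Python's standard library. It assumes the scraper exports a JSON array of flat objects. The field names below are hypothetical, so adjust them to whatever your tool actually produces.

```python
import csv
import io
import json

# Hypothetical raw export: a JSON array of app records (field names are assumptions).
raw_json = """
[
  {"name": "App A", "category": "Tools", "installs": "1,000,000+", "rating": 2.4},
  {"name": "App B", "category": "Finance", "installs": "500,000+", "rating": 2.9}
]
"""


def json_to_csv(json_text, columns):
    """Flatten a JSON array of objects into CSV text with the given columns."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells; extra keys are simply left out.
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()


csv_text = json_to_csv(raw_json, ["name", "category", "installs", "rating"])
print(csv_text)
```

The `csv` module quotes values that contain commas (such as `1,000,000+`), which is usually what breaks a hand-rolled conversion.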

1. Data Cleansing

With the raw data in hand, we’ll want to make some sense of it and find pertinent information.

A quick analysis will allow us to identify some segments that should be excluded. Apps with more than 1 million downloads are usually mobile versions of behemoths (Amazon Prime, Google, etc.). While it’s important to dream big, it’s not relevant at the moment.

Don’t get me wrong; you could choose to take your analysis there and dig deeper; I just don’t think it wise considering the stated objective.

So we’ll filter for apps with between 500k and 1 million downloads.
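A rough sketch of that filter follows. It assumes install counts arrive as text labels such as "500,000+" (an assumption about the export format) and that both bounds are inclusive. The helper names are hypothetical.

```python
def parse_installs(label):
    """Turn a label such as '1,000,000+' into the integer 1000000."""
    return int(label.replace(",", "").replace("+", "").strip())


def in_download_range(rows, low=500_000, high=1_000_000):
    """Keep apps whose install count falls between low and high, inclusive."""
    return [r for r in rows if low <= parse_installs(r["installs"]) <= high]


# Hypothetical rows for illustration only.
rows = [
    {"name": "App A", "installs": "100,000+"},
    {"name": "App B", "installs": "500,000+"},
    {"name": "App C", "installs": "1,000,000+"},
    {"name": "App D", "installs": "50,000,000+"},
]

kept = in_download_range(rows)
print([r["name"] for r in kept])
```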

2. Analysis Praxis

I’ve aggregated the data by categories as follows:

The second field is just the number of apps in that category in our data, and the last one is the average rating for that category.

The top 3 categories are tools, entertainment, and finance, and the average rating hovers around 2.7. That’s 1375 apps across all 3 categories that have been downloaded at least 500k times. That's a lot of unmet needs and dissatisfied users.

For those of us who prefer charts:

The blue graph is the average rating, and the orange graph represents the percentage of that specific category relative to the whole list. The ideal spot is where the orange graph overlaps (by some margin) the blue one, indicating a high concentration of apps in that category while simultaneously displaying bad ratings. The chart confirms what was already apparent in the table above.
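The numbers behind the table and the chart (app count, share of the whole list, and average rating per category) can be reproduced with a small aggregation like the one below. The sample rows and field names are hypothetical, and this is a sketch of the idea rather than the exact spreadsheet work I did.

```python
from collections import defaultdict

# Hypothetical cleaned rows; categories and ratings are illustrative only.
apps = [
    {"category": "Tools", "rating": 2.5},
    {"category": "Tools", "rating": 3.0},
    {"category": "Finance", "rating": 2.0},
    {"category": "Entertainment", "rating": 2.8},
]


def summarize_by_category(rows):
    """Return per-category app count, share of the list (%), and average rating."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["category"]].append(row["rating"])

    total = len(rows)
    summary = [
        {
            "category": category,
            "apps": len(values),
            "share_pct": round(100 * len(values) / total, 1),
            "avg_rating": round(sum(values) / len(values), 2),
        }
        for category, values in ratings.items()
    ]
    # Largest categories first, as in the table.
    return sorted(summary, key=lambda s: s["apps"], reverse=True)


summary = summarize_by_category(apps)
for line in summary:
    print(line)
```

Categories with a high `share_pct` and a low `avg_rating` are the ones worth a closer look.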

From here on out, the world is our oyster. We can take the analysis deeper and in any direction we deem interesting.

Let's take a closer look at the Tools category, for example.

3. The 'Tools' of the trade

Within the ‘Tools’ category apps, we'll search for apps with at least 1 million downloads and sort them by average rating (ascending).
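In code, that drill-down could look like the sketch below. The rows, field names, and function name are hypothetical.

```python
def worst_rated_in_category(rows, category="Tools", min_installs=1_000_000):
    """Apps in one category above an install threshold, lowest rating first."""
    matches = [
        r for r in rows
        if r["category"] == category and r["installs"] >= min_installs
    ]
    return sorted(matches, key=lambda r: r["rating"])


# Hypothetical rows for illustration only.
rows = [
    {"name": "Tool A", "category": "Tools", "installs": 1_000_000, "rating": 2.9},
    {"name": "Tool B", "category": "Tools", "installs": 5_000_000, "rating": 1.8},
    {"name": "Tool C", "category": "Tools", "installs": 500_000, "rating": 1.5},
    {"name": "Bank D", "category": "Finance", "installs": 1_000_000, "rating": 2.0},
]

shortlist = worst_rated_in_category(rows)
print([r["name"] for r in shortlist])
```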

The list is still quite long, but at least we have a niche and some initial targets. We can then drill down into each app and check out their features, who their competitors are, and what the market potential is when appropriate. We could, for example, run some sentiment analysis on the reviews collected for each product on the list and extract the main themes/keywords. That would be a good way to determine the immediate areas of improvement (or even features) for any new product in that niche.
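As a very rough first pass, and not real sentiment analysis, we could simply count the most frequent words in low-star reviews. The sketch below does only that. The reviews, the stopword list, and the 2-star cutoff are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "app", "too", "many", "after", "every", "keeps"}

# Hypothetical reviews for illustration only.
reviews = [
    {"stars": 1, "text": "Too many ads and the app keeps crashing"},
    {"stars": 2, "text": "Crashing after every update, ads everywhere"},
    {"stars": 5, "text": "Great app, works fine"},
]


def top_complaint_keywords(reviews, max_stars=2, top_n=2):
    """Count the most frequent non-stopword terms in reviews rated max_stars or lower."""
    counts = Counter()
    for review in reviews:
        if review["stars"] > max_stars:
            continue  # only look at unhappy users
        words = re.findall(r"[a-z']+", review["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


top = top_complaint_keywords(reviews)
print(top)
```

Even a crude count like this tends to surface the recurring complaints, which is enough to decide where a proper analysis is worth the effort.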

This would be a topic for another day, so stay tuned!

The goal is to get more information about the specific niche and leverage that data into the next step.

And so, on top of the well-known validation methods, I can suggest two additional tools to further validate the ideas. All of these can be combined to unlock useful insights; it's really a matter of determining what's useful in which context.

🤖 Idea validation using AI

Yes, AI is all the rage, and no, it's not just another guide taking advantage of the trend. This is not about ChatGPT, after all. For now.

Roiquant is a startup intelligence platform for founders. They offer different data services, such as competitive landscape, postmortem analysis, etc. In our case, we're interested in their "Idea Validation" tool.

The first component measures the "uniqueness" of our idea based on the input provided, as shown here:

To illustrate, let's use the example of the "Smart Air Conditioner" app from the list above. We'll enter the inputs to the best of our knowledge, and obviously, the more precise the inputs, the better the outcome will be. But as with any other validation process, the goal is not to reach a state of perfect information; it's impossible. Instead, we aim to de-risk as much as possible and confirm the most critical hypotheses before starting to build anything.

The tool was down while I was writing this, so I only got this screenshot after it was fixed. I used somewhat random inputs for illustration purposes.

We get an overall risk of failure measure (59.2% in this example), which is calculated from the different sub-measures for each category. For example, the location and market got a 1% rating, reflecting the difficulty of doing business in those areas of the world.

The novelty score of 20% also shows our idea isn't very innovative, which technically increases the risk of failure (but not always).

There's another component regarding the viability of the business that will also improve the results. However, some of the required inputs (business valuation, monetization, etc.) aren't really within our scope since we're too early for that.

This is the stage where we should be making the go/no-go decision. An idea assessed to be "high risk" (70% and above) should be dropped on the spot. This is a somewhat arbitrary threshold, and it's more art than science.
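Written down as a rule, the decision above is a one-liner. The sketch below only encodes the 70% cutoff and does not reproduce how Roiquant computes its score. The function name is hypothetical.

```python
HIGH_RISK_THRESHOLD = 70.0  # percent; the (admittedly arbitrary) cutoff used above


def go_no_go(risk_of_failure_pct, threshold=HIGH_RISK_THRESHOLD):
    """Return 'no-go' when the assessed risk is at or above the threshold."""
    return "no-go" if risk_of_failure_pct >= threshold else "go"


# 59.2 is the overall risk from the example above.
decision = go_no_go(59.2)
print(decision)
```

Since the threshold is a judgment call, it is passed as a parameter rather than hard-coded into the logic.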

🕵️ Bonus: Customer Discovery

Now that we have a better idea of what we're looking for, it's time to start engaging with potential users and getting their feedback. It's essential to identify the right people to talk to, as they'll be able to provide insights that will help refine our idea.

There are many tools that help locate those potential first users through keyword searches, social listening, etc. We may address that in a future article.

For now, I'd like to talk about two specific solutions. The first one is CustomerDiscovery.io.

The company "helps startups grow faster by giving them an all-in-one workspace to gather, organize, and analyze feedback across several departments".

In a nutshell, this platform allows founders to interview potential early adopters and get valuable feedback, which is exactly what we need at this stage!

The second solution is Respondent.io. The platform is meant for slightly more advanced user research projects and makes it possible to recruit vetted users based on multiple criteria (occupation, location, etc.) to provide deeper insights. It's also possible to offer "incentives" to the participants, i.e., pay the interviewees a specified amount (at the discretion of the Project Owner). And obviously, the higher the reward, the better the feedback we get.

Understandably, this is a tool for when we reach a certain maturity threshold. It might not be a perfect fit for every project (especially for indie hackers/solopreneurs for example) but it's a good resource nonetheless.

Conclusion

Well, there you have it!

The process of validating an idea is complex and requires a lot of effort. It's not just about having a great idea, but also about understanding the market, the competition, and the potential users.

By combining the traditional idea validation methods with AI-based tools and customer discovery platforms, we can gain a better understanding of the idea and its potential. This will help us make an informed decision on whether or not to pursue the idea and how to best go about it.

The process should be iterative with constant refinement and adjustment. With the right approach and the right tools, we can make sure our idea has the best chance of success!

Thanks a lot for your attention! 😄

I'm an entrepreneur on a journey! If you enjoyed this piece, there'll be more where that came from: Twitter & The Generalist’s Thinkbox newsletter.