Critical Musings: Psychology, Validity, and Reliability of Online Ratings

Written by wasehahmad | Published 2021/12/24
Tech Story Tags: consumer-behavior | ratings | psychology | online-reviews | user-ratings | confirmation-bias | behavioral-science | technology


Scene:

“You just got back from a flight and immediately pull out your phone to call an Uber. It's a busy day at the airport, as always, with people bustling out in search of their cars. But you manage to track down your Uber and get in. After that, it's an uneventful ride home. Finally! But you have one thing left to do before you start browsing for your next trip.

You need to rate the Uber ride experience.

If everything went well and you have no complaints, perhaps you'll give it a 4 or 5. But what if you had hailed the ride during peak hours? Your fare surged, and just to get home 1 mile away, you had to pay $100. What rating do you give then?

How about if you've never ridden in an Uber before and don't know what to expect?”

We may think we're rating the driver's skill, but do we account for our own biases and emotions? Or for the design of the rating system itself?

Did you know that if a driver's rating drops below roughly 4.6, they get suspended? Meanwhile, if I found a 4.6-star restaurant on Google, I probably wouldn't have any complaints. The two systems operate over very different ranges.


How Do People Assign Ratings?

Unless the driver did something absolutely preposterous or was a terrible driver, I would likely give them a 5.

Sure, Uber averages a driver's last 500 ratings, so even if I did give them a low rating, it likely wouldn't have much impact.

But once they do drop below the threshold, they're out. So for me, the norm is to hand out 5s. For restaurants, I'd probably give a 4 unless the experience was exceptional or genuinely bad.
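To put rough numbers on that 500-rating window, here is a minimal sketch. It assumes the driver's score is a plain, unweighted mean of their most recent 500 ratings and that the cutoff is exactly 4.6; both are simplifying assumptions, and Uber's real calculation may exclude or weight some ratings. The function names are hypothetical.

```python
from collections import deque

WINDOW = 500      # assumption: only the most recent 500 ratings count
THRESHOLD = 4.6   # assumption: approximate suspension cutoff

def rolling_average(ratings, window=WINDOW):
    """Unweighted mean of the last `window` ratings (hypothetical model)."""
    recent = deque(ratings, maxlen=window)  # older ratings fall out of the window
    return sum(recent) / len(recent)

def one_star_ratings_to_drop_below(threshold=THRESHOLD, window=WINDOW):
    """Smallest number of 1-star ratings, in an otherwise all-5-star window,
    that pushes the average strictly below `threshold`."""
    for k in range(window + 1):
        avg = (5 * (window - k) + 1 * k) / window
        if avg < threshold:
            return k
    return None

# One 1-star rating among 499 five-star ratings barely moves the average.
print(rolling_average([5] * 499 + [1]))   # 4.992
print(one_star_ratings_to_drop_below())   # 51
```

Under these assumptions, a single 1-star rating only drops a perfect driver to 4.992, while it takes 51 one-star ratings out of 500 (roughly one ride in ten) to fall below the cutoff. That is why one harsh rating "likely wouldn't have an impact," yet the threshold still shapes how people rate.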

The question is, how do these norms overlap between groups of people?

1. Lashing Out and Sharing the Joy

As hard as I might try to remain logical in my reasoning, emotions tend to creep in.

Unfortunately, providers of goods and services are sometimes at the mercy of their customers' emotions.

Had a bad day before heading out to a restaurant? Maybe every minute your food doesn't arrive makes you even more upset. Now you feel you have to warn the rest of the world about your terrible 1-star experience.

Conversely, hailing an Uber to go hang out with your friends might put you in a giddy enough mood that you give a 5-star rating and a $10 tip without a second thought.

What are the conventions for giving reviews and assigning ratings? What are we motivated by and what manipulates us into giving reviews that we might think are very much our own?

2. Ratings Based Not on the Object, but on Your Expectations

Ratings are never absolutes.

What does a 5 mean? What does a 3.34 mean? If there was only one product in the world, does it even matter what rating it has?

This means that when people assign ratings, they compare the product or service not only against others, but also against the expectations its current rating sets. What do I expect a 4.5-star laptop to feel like? If it doesn't live up to what other 4.5-star laptops have given me, I might give it a lower rating.

Where does that expectation come from? And how high (or low) can that expectation be?

Kimberly was skeptical about the effectiveness of a carpet cleaner, but ended up giving it a 5-star review afterward! Would it have been just 4 stars if she hadn't had such low expectations, even if the product worked perfectly?

3. Do You Need Social Proof?

How often do you feel like you need to conform to the group? Usually, when you don't already have strong preferences, you are susceptible to being nudged into action.

One such way is through social proof, where people will copy the actions of others when faced with a decision they may be unsure about. We know that these behavioral designs can affect consumer behavior, as analyzed in this literature review on nudges.

This means we can also be 'nudged' into giving reviews that match the current rating of the product or service, at least when we don't have clear preferences.

Asked to review a place on Google Maps? Just pick the average it currently has. I must admit, it's the simplest strategy for earning badges as a Local Guide.

Conversely, those who wish to defy the norm will go against the grain. A study published in the Journal of Interactive Marketing suggested that people may use review writing as a way to build their personal brand and sense of self-worth.

If there is a sea of bad reviews, writing a strongly positive review tends to stand out, especially if you think the bad reviews are unhelpful. This bolsters one's image as a connoisseur within the domain.

Design Rules Us All

These online review systems aren't built randomly. Most companies design their rating system to get what they need out of the reviews. Here, take a look.

This means each product or service defines its own rating conventions. Uber designed its system so that drivers get kicked off below a threshold.

This makes the scale highly sensitive: a driver with a 4.9 looks vastly better than one with a 4.8.

But it also means most of the 1-to-5 range is pretty useless.
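A rough way to see how compressed the scale is: assume, purely for illustration, that a driver's score is an unweighted mean of their last 500 ratings and that every imperfect rating is a 4. Each average then maps to a count of 4-star rides. The helper below is hypothetical and only inverts that simple mean.

```python
WINDOW = 500  # assumption: size of the ratings window

def four_star_rides_for_average(avg, window=WINDOW):
    """Number of 4-star ratings (the rest being 5s) that yields `avg`.

    Each 4-star rating costs 1 star-point relative to a perfect 5.0,
    so the count equals the total shortfall: (5 - avg) * window.
    """
    return round((5 - avg) * window)

for avg in (4.9, 4.8, 4.6):
    print(avg, four_star_rides_for_average(avg))

# Share of the 1-5 scale that sits above the assumed 4.6 cutoff.
operating_band = (5 - 4.6) / (5 - 1)
print(f"{operating_band:.0%}")
```

In this toy model, a 4.9 driver has 50 imperfect rides, a 4.8 driver has 100, and 4.6 corresponds to 200, so a 0.1 gap means twice as many imperfect rides. Every active driver's average also sits in roughly the top 10% of the scale.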

We don't know what a 1-star driver would look like because we have no way to compare.

So we might use other cues, like relative experiences and expectations. If we see that a driver already has a high rating, they might not have to do much to get a high rating from us too!

So what does this mean for us? Well, if you're about to get into an Uber, understand that there might be multiple reasons for the driver's current rating, and maybe the best way to rate is with a simple heuristic.

If you would ride with this driver again, a 5 makes sure your review doesn't push them below the threshold. A restaurant that was decent but that you couldn't be bothered to return to might be a 3 for you; to others, it's a 4, or maybe a 5.

Rating systems carry a lot of bias, so taking them with a grain of salt might be worth it, right?


Written by wasehahmad | Writing stories, insights, and code. Avidly curious learner
Published by HackerNoon on 2021/12/24