The Amazon Go Store: How do they do it?

Written by dconrad | Published 2016/12/08
Tech Story Tags: internet-of-things | amazon | computer-vision | artificial-intelligence | machine-learning


I’m kind of obsessing over Amazon’s new “Just Walk Out” stores.

My first reaction was: Oh, I get it. Put RFID tags on all items. Those big white things you have to walk through when you leave the store — those are RFID readers that scan your items as you leave. Simple.

And RFID is the easiest technical solution. But it doesn’t make sense.

First, there’s the cost of the tags. Even at 2–5¢ per tag, that eats into the thin retail margins on low-cost items; on a dollar item, a few cents of tag can swallow most of whatever profit there was.

Second, there’s the added operational cost of tagging each item. Most convenience stores don’t even bother with price tags anymore.

Do I really think Amazon would enter a market knowing they’d have a higher cost structure than their competitors? That’s not the Amazon I know.

So I watched the video again. Turns out the thing that I thought was an RFID reader is actually a turnstile, the kind big office buildings have. (It might also have RFID readers in it, but there’s no reason to assume so.)

Also, the video clearly shows items being placed on your list as you shop. So Amazon tracks, in real time: 1) which item was taken and 2) who took it.

Knowing who took it is easy enough — cameras in the ceiling can track customer location. Software can associate a person with a head at check-in, then follow that head’s location around the store.
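To make that concrete, here’s a back-of-the-napkin sketch in Python of what that could look like: match each tracked head to the nearest detection in the next frame, carrying along the shopper ID captured at the turnstile. Every function and number here is made up for illustration; Amazon hasn’t said how they actually do it.

```python
import math

# Hypothetical sketch: associate a shopper ID with a tracked head at the
# entry gate, then follow that head from frame to frame by nearest-neighbor
# matching of detected head positions. Names and data are assumptions,
# not Amazon's actual system.

def nearest(prev_pos, detections):
    """Return the detected head position closest to a known previous position."""
    return min(detections, key=lambda d: math.dist(prev_pos, d))

def track(shoppers, frames):
    """shoppers: {shopper_id: (x, y)} captured at check-in.
    frames: iterable of lists of detected head positions, one list per frame."""
    for detections in frames:
        for shopper_id, prev_pos in shoppers.items():
            if detections:
                shoppers[shopper_id] = nearest(prev_pos, detections)
        yield dict(shoppers)  # snapshot of everyone's location this frame

# Toy example: one shopper checked in near the turnstile at (0, 0).
for snapshot in track({"alice": (0.0, 0.0)},
                      [[(0.2, 0.1)], [(0.5, 0.3), (4.0, 4.0)]]):
    print(snapshot)
```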

It could get tricky in the edge cases — say, when you stand next to someone, then reach past them to take something. But how often does that happen?

As for tracking the item taken off the shelf, they could put sensors on the shelves. Like scales, or presence sensors. They probably do use some mix of sensors, as they suggest with their cryptic reference to “Sensor Fusion Technology.”
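If I had to guess what that fusion looks like, it’s something in this spirit: a shelf scale reports a weight drop, the cameras report who was standing in front of that shelf, and the two get stitched into a “shopper X took N of item Y” event. This is a toy sketch with invented numbers and names, not Amazon’s design.

```python
# Invented illustration of "sensor fusion": combine a shelf scale's weight
# change with the camera's guess about who was at the shelf.

def picks_from_weight(weight_before_g, weight_after_g, unit_weight_g):
    """Estimate how many units left the shelf from the scale's readings."""
    delta = weight_before_g - weight_after_g
    return round(delta / unit_weight_g)

def fuse(scale_event, camera_event):
    """scale_event: (shelf_id, before_g, after_g, unit_g).
    camera_event: (shelf_id, shopper_id) -- who the cameras saw reaching in."""
    shelf, before, after, unit = scale_event
    cam_shelf, shopper = camera_event
    if shelf != cam_shelf:
        return None  # the two events don't refer to the same shelf
    count = picks_from_weight(before, after, unit)
    return {"shopper": shopper, "shelf": shelf, "quantity": count}

# Toy example: two 330 g cans disappear while "alice" is at shelf A3.
print(fuse(("A3", 2640.0, 1980.0, 330.0), ("A3", "alice")))
# -> {'shopper': 'alice', 'shelf': 'A3', 'quantity': 2}
```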

But the truth is, they could do the whole damn thing with cameras.

Since they know where the products are on the shelves, it’s relatively easy to track what item you took. Some amount of object identification would help catch edge cases, like when someone takes two bottles off the shelf with one hand. And for now, they probably also use sensors to help with those edge cases (they reference “Sensor Fusion,” after all).
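Since they know which product sits in which slot (the retail term is a planogram), the vision problem mostly reduces to “which slot did that hand reach into?” A minimal sketch of that lookup, with a made-up planogram:

```python
# Minimal sketch: if the store knows which product sits in each shelf slot,
# identifying the taken item reduces to identifying the slot a hand reached
# into. The planogram contents and slot IDs are made up.

PLANOGRAM = {
    ("aisle-2", "shelf-3", "slot-1"): "sparkling water 500ml",
    ("aisle-2", "shelf-3", "slot-2"): "cold brew coffee 330ml",
    ("aisle-2", "shelf-4", "slot-1"): "turkey sandwich",
}

def item_taken(reach_location, quantity=1):
    """reach_location: (aisle, shelf, slot) the cameras say a hand entered."""
    product = PLANOGRAM.get(reach_location)
    if product is None:
        return None  # unknown slot: fall back to object recognition / sensors
    return {"product": product, "quantity": quantity}

# Toy example: a hand reached into aisle 2, shelf 3, slot 2.
print(item_taken(("aisle-2", "shelf-3", "slot-2")))
# -> {'product': 'cold brew coffee 330ml', 'quantity': 1}
```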

Over time, though, cameras alone should be enough. I love this.

So often, technology fails because it creates new operational complexity — e.g., adding RFID tagging to an inventory process.

AI-based tech, in this case computer vision, is different. As the technology gets more advanced, the complexity of deploying it goes down. Rather than needing more expertise to customize the solution to fit your needs, you can just throw more cameras at it.

Let the computer figure out the rest.
