Spotify’s Discover Weekly: How machine learning finds your new music

Written by hackernoon-archives | Published 2017/10/10
Tech Story Tags: artificial-intelligence | music | tech | spotify | machine-learning


The science behind personalized music recommendations

This Monday — just like every Monday — over 100 million Spotify users found a fresh new playlist waiting for them. It’s a custom mixtape of 30 songs they’ve never listened to before but will probably love. It’s called Discover Weekly, and it’s pretty much magic.

I’m a huge fan of Spotify, and particularly Discover Weekly. Why? It makes me feel seen. It knows my musical tastes better than any person in my life ever has, and I am consistently delighted by how it satisfies me just right every week, with tracks I myself would never have found or known I would like.

For those of you who live under a musically soundproof rock, let me introduce you to my virtual best friend:

A Spotify Discover Weekly playlist — specifically, mine.

As it turns out, I’m not alone in my obsession with Discover Weekly—the user base went crazy for it, which has driven Spotify to completely rethink its focus, investing more resources into algorithm-based playlists.


It's scary how well @Spotify Discover Weekly playlists know me. Like former-lover-who-lived-through-a-near-death experience-with-me well.

— @dave_horwitz


At this point @Spotify's discover weekly knows me so well that if it proposed I'd say yes

— @amandawhitbred


Ever since Discover Weekly debuted in 2015, I’ve been dying to know how it worked (plus I’m a fangirl of the company, so sometimes I like to pretend I work there and research their products.) After three weeks of mad googling and a great conversation on Spotify’s roof deck with data engineer Nikhil Tibrewal, I feel grateful to have finally gotten a glimpse behind the curtain.

So how does Spotify do such an amazing job of choosing those 30 songs for each person each week? Let’s zoom out for a second to look at how other music services have done music recommendations, and how Spotify’s doing it better.

A brief history of online music curation

Back in the 2000s, Songza kicked off the online music curation scene using manual curation to create playlists for users. “Manual curation” meant that some team of “music experts” or other curators would put together playlists by hand that they thought sounded good, and then listeners would just listen to their playlists. (Later, Beats Music would employ this same strategy.) Manual curation worked okay, but it was manual and simple, and therefore it couldn’t take into account the nuance of each listener’s individual music taste.

Like Songza, Pandora was also one of the original players in the music curation scene. It employed a slightly more advanced approach, instead manually tagging attributes of songs. This meant a group of people listened to music, chose a bunch of descriptive words for each track, and tagged the tracks with those words. Then, Pandora’s code could simply filter for certain tags to make playlists of similar-sounding music.

Around that same time, a music intelligence agency from the MIT Media Lab called The Echo Nest was born, which took a radically more advanced approach to personalized music. The Echo Nest used algorithms to analyze the audio and textual content of music, allowing it to perform music identification, personalized recommendation, playlist creation, and analysis.

Finally, taking yet another different approach is Last.fm, which still exists today and uses a process called collaborative filtering to identify music its users might like. More on that in a moment.

So if that’s how other music curation services have handled recommendations, how did Spotify come up with its magic engine, which seems to nail individual users’ tastes so much more accurately than any of the other services?

Spotify’s 3 Types of Recommendation Models

Spotify doesn’t actually use a single revolutionary recommendation model — instead, it mixes together some of the best strategies used by other services to create its own uniquely powerful discovery engine.

In 2014, Spotify bought The Echo Nest to gain access to its data and its algorithms for audio and text analysis, and it also uses collaborative filtering algorithms similar to those used at Last.fm.

Therefore, to create Discover Weekly, there are three main types of recommendation models that Spotify employs:

  1. Collaborative Filtering models (i.e. the ones that Last.fm originally used), which work by analyzing your behavior and others’ behavior.
  2. Natural Language Processing (NLP) models, which work by analyzing text.
  3. Audio models, which work by analyzing the raw audio tracks themselves.

Image credit: Chris Johnson, Spotify

Let’s take a dive into how each of these recommendation models work!

Recommendation Model #1: Collaborative Filtering

First, some background: When many people hear the words “collaborative filtering”, they think of Netflix, since it was one of the first companies to use collaborative filtering to power a recommendation model, using users’ star-based movie ratings to inform its understanding of which movies to recommend to other “similar” users.

After Netflix used it successfully, its use spread quickly, and now it’s often considered the starting point for anyone trying to make a recommendation model.

Unlike Netflix, though, Spotify doesn’t have those stars with which users rate their music. Instead, Spotify’s data is implicit feedback — specifically, the stream counts of the tracks we listen to, as well as additional streaming data, including whether a user saved the track to his/her own playlist, or visited the Artist page after listening.

But what is collaborative filtering, and how does it work? Here’s a high-level rundown, as encapsulated in a quick conversation:

Image by Erik Bernhardsson

What’s going on here? Each of these two guys has some track preferences — the guy on the left likes tracks P, Q, R, and S; the guy on the right likes tracks Q, R, S, and T.

Collaborative filtering then uses that data to say,

“Hmmm. You both like three of the same tracks — Q, R, and S — so you are probably similar users. Therefore, you’re each likely to enjoy other tracks that the other person has listened to, that you haven’t heard yet.”

It therefore suggests that the guy on the right check out track P, and the guy on the left check out track T. Simple, right?
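At its core, the idea is just set overlap. Here’s a minimal sketch of the toy example above (hypothetical track letters, not Spotify’s actual code):

```python
# Each "guy" from the picture above, as a set of liked tracks.
left_likes = {"P", "Q", "R", "S"}
right_likes = {"Q", "R", "S", "T"}

# Shared tracks suggest the two users have similar taste.
overlap = left_likes & right_likes           # {"Q", "R", "S"}

# Each user gets the tracks only the *other* user has heard.
suggest_to_right = left_likes - right_likes  # {"P"}
suggest_to_left = right_likes - left_likes   # {"T"}

print(sorted(suggest_to_right))  # ['P']
print(sorted(suggest_to_left))   # ['T']
```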

But how does Spotify actually use that concept in practice to calculate millions of users’ suggested tracks based on millions of other users’ preferences?

…matrix math, done with Python libraries!

In actuality, this matrix you see here is gigantic. Each row represents one of Spotify’s 140 million users (if you use Spotify, you yourself are a row in this matrix) and each column represents one of the 30 million songs in Spotify’s database.

At the matrix’s intersections, where each user meets each song, there is a 1 if the user has listened to that song, and a 0 if the user hasn’t. So, if I listened to the song “Thriller”, the place where my row meets the column representing “Thriller” is going to be a 1. (Note: Spotify has experimented with using the actual number of streams, vs. a simple 1 vs. 0.)

Of course, this makes for a very sparse matrix— there are way more songs a given user hasn’t listened to than the ones he/she has, so the majority of the entries in the matrix are just ‘0’. But the placement of those few ‘1’s holds critical information.
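In code, a matrix this sparse is usually stored by recording only where the 1s are. Here’s a toy sketch with invented users and songs (at Spotify’s scale you’d reach for a dedicated sparse-matrix library, but the principle is the same):

```python
# Sparse storage: instead of billions of zeros, keep only the
# coordinates of the 1s — i.e., which songs each user has streamed.
listened = {  # user -> set of songs that user has streamed (toy data)
    "alice": {"thriller", "billie_jean"},
    "bob": {"thriller"},
    "carol": {"hey_jude"},
}

def cell(user, song):
    """Value at (user, song) in the implicit full matrix."""
    return 1 if song in listened.get(user, set()) else 0

print(cell("alice", "thriller"))  # 1
print(cell("bob", "hey_jude"))    # 0
```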

Then, a Python library runs a long, complicated matrix factorization formula on it:

Some complicated math…

When it finishes, we end up with two types of vectors, represented here by X and Y. X is a user vector, representing one single user’s taste, and Y is a song vector, representing one single song’s profile.
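As a rough illustration of what factorization produces, here’s a toy version using plain SVD from NumPy. This is only a sketch of the shape of the result — one short vector per user, one per song; for implicit feedback at Spotify’s scale, methods like alternating least squares are typically used instead:

```python
import numpy as np

R = np.array([  # toy play matrix: 4 users x 5 songs, 1 = streamed
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(R, full_matrices=False)
X = U[:, :k] * s[:k]   # user vectors: one row per user
Y = Vt[:k, :].T        # song vectors: one row per song

print(X.shape, Y.shape)      # (4, 2) (5, 2)
print(np.round(X @ Y.T, 1))  # approximately reconstructs R
```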

The User/Song matrix produces two types of vectors: User vectors and Song vectors.

Now we‘ve got 140 million user vectors — one for each user — and 30 million song vectors. The actual content of these vectors is just a bunch of numbers that are essentially meaningless on their own, but they are hugely useful for comparison.

To find which users have taste most similar to mine, collaborative filtering compares my vector with all of the other users’ vectors using a mathematical dot product. Whichever produces the highest dot product is the user most similar to me. The same goes for the Y vectors, the songs — you can compare a song’s vector with all the other song vectors, and find which songs are most similar to the one you’re looking at.
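A sketch of that comparison step, with made-up two-dimensional vectors (real latent vectors are much longer and, as noted, not human-readable):

```python
import numpy as np

me = np.array([0.9, 0.1])  # my (invented) taste vector
others = {
    "user_a": np.array([0.8, 0.2]),  # points roughly my way
    "user_b": np.array([0.1, 0.9]),  # points the other way
}

# Higher dot product = more similar taste.
scores = {name: float(np.dot(me, v)) for name, v in others.items()}
most_similar = max(scores, key=scores.get)
print(most_similar)  # user_a
```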

Collaborative filtering does a pretty good job, but Spotify knew they could do even better by adding another engine. Enter NLP.

Recommendation Model #2: Natural Language Processing (NLP)

The second type of recommendation model that Spotify employs are Natural Language Processing (NLP) models. These models’ source data, as the name suggests, are regular ol’ words — track metadata, news articles, blogs, and other text around the internet.

Natural Language Processing — the ability of a computer to understand human language — is a whole vast field unto itself, often harnessed through sentiment analysis APIs.

The exact mechanisms behind NLP are beyond the scope of this article, but here’s what happens on a very high level: Spotify crawls the web constantly looking for blog posts and other written texts about music, and figures out what people are saying about specific artists and songs — what adjectives and language are frequently used about those songs, and which other artists and songs are also discussed alongside them.

The most-used terms bucket up into what Spotify calls “cultural vectors” or “top terms.” Each artist and song has thousands of daily-changing top terms. Each term has an associated weight, which reveals how important the description is (roughly, the probability that someone will describe the music with that term).

“Cultural vectors”, or “top terms”. Table from Brian Whitman

Then, much like in collaborative filtering, the NLP model uses these terms and weights to create a vector representation of the song that can be used to determine if two pieces of music are similar. Cool, right?
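As an illustration, here’s how two songs’ top-term vectors might be compared with cosine similarity — the terms and weights below are invented for the sketch, not real Spotify data:

```python
import math

# Hypothetical "top terms" with weights for two songs.
song_a = {"indie": 0.8, "mellow": 0.6, "acoustic": 0.4}
song_b = {"indie": 0.7, "mellow": 0.5, "electronic": 0.3}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    terms = set(u) | set(v)
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in terms)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm

print(round(cosine(song_a, song_b), 2))  # ~0.88: quite similar songs
```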

Recommendation Model #3: Raw Audio Models

First, a question. You might be thinking:

But, Sophia, we already have so much data from the first two models! Why do we need to analyze the audio itself, too?

Well, first of all, including a third model further improves the accuracy of this amazing recommendation service. But actually, this model serves a secondary purpose, too: Unlike the first two model types, raw audio models take into account new songs.

Take, for example, the song your singer-songwriter friend put up on Spotify. Maybe it only has 50 listens, so there are few other listeners to collaboratively filter it against. It also isn’t mentioned anywhere on the internet yet, so NLP models won’t pick up on it. Luckily, raw audio models don’t discriminate between new tracks and popular tracks, so with their help, your friend’s song can end up in a Discover Weekly playlist alongside popular songs!

Ok, so now for the “how” — How can we analyze raw audio data, which seems so abstract?

…with convolutional neural networks!

Convolutional neural networks are the same technology behind facial recognition. In Spotify’s case, they’ve been modified for use on audio data instead of pixels. Here’s an example of a neural network architecture:

Image credit: Sander Dieleman

This particular neural network has four convolutional layers, seen as the thick bars on the left, and three dense layers, seen as the more narrow bars on the right. The inputs are time-frequency representations of audio frames, which are concatenated to form the spectrogram.

The audio frames go through these convolutional layers, and after the last convolutional layer, you can see a “global temporal pooling” layer, which pools across the entire time axis, effectively computing statistics of the learned features across the time of the song.

All of this information then arrives at the output layer, which predicts an understanding of the song’s personality: Does it have a high tempo? Is it acoustic? Does it have high danceability? All of these characteristics can be found with pretty high accuracy just from letting these neural networks loose on the audio file.
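To make the pooling step concrete, here’s a toy NumPy sketch of how global temporal pooling collapses a variable-length song into a fixed-size summary. The feature values here are invented; in the real model they would come out of the convolutional layers applied to the spectrogram:

```python
import numpy as np

# Pretend output of the last convolutional layer:
# 3 learned features x 6 time frames (toy numbers).
features = np.array([
    [0.1, 0.9, 0.2, 0.8, 0.1, 0.9],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
])

# Global temporal pooling: statistics across the whole time axis,
# so the result has the same size no matter how long the song is.
pooled = np.concatenate([
    features.mean(axis=1),  # average activation of each feature
    features.max(axis=1),   # peak activation of each feature
])
print(pooled.shape)  # (6,)
```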

That covers the basics of the three major types of recommendation models feeding the Recommendations pipeline, and ultimately powering the Discover Weekly playlist!

Of course, these recommendation models are all connected to Spotify’s much larger ecosystem, which includes giant amounts of data storage and uses lots of Hadoop clusters to scale recommendations and make these engines work on giant matrices, endless internet music articles, and huge numbers of audio files.

I hope this was informative and tickled your curiosity like it did mine. For now, I’ll be working my way through my own Discover Weekly, finding my new favorite music, knowing and appreciating all the machine learning that’s going on behind the scenes. 🎶

— — If you enjoyed this piece, I’d love it if you hit the clap button 👏 so others might stumble upon it. You can find my own code on GitHub, and more of my writing and projects at http://www.sophiaciocca.com.

Also, if you work at Spotify or know someone who does, I’d love to connect! I’m putting my dream to work at Spotify out into the world 😊

Thanks also to ladycollective for reading this article over and suggesting edits.

