30 Million Songs Down to 30

Written by helmz | Published 2018/03/25
Tech Story Tags: artificial-intelligence | music | machine-learning | tech | startup


Generating millions of personalized playlists @AnghamiTech

Anghami is the leading music streaming service in the MENA region. My team leads personalization-related features, the champion of which is the “Your Weekly Mixtape” feature. The Mixtape is a playlist of 30 songs that we generate weekly for every Anghami user.

I know! I liked them all 😀

Every week our users get a fresh set of songs based on their taste. The Mixtape uses our recommendation pipeline (codename HARP) to sift through 30 million songs and create a playlist that our users will love, balancing songs a user already knows with songs we think they should discover.

Curating music based on a user’s taste profile (musical fingerprint).

Going deeper into how we generate the Mixtape

First, our algorithms use our data on all user-song (and optionally user-artist) interactions to project songs and artists into what is called a latent space by giving them vector representations (see Song2Vec and Collaborative Filtering; more on these in future posts). Essentially, this lets us take any pair of items and get a “similarity score” between them. Now we can take any item and score all other items by similarity, and we build from there.
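To make the “similarity score” idea concrete, here is a minimal sketch that scores items with cosine similarity. This post does not say which metric HARP uses, so cosine similarity, the three-dimensional vectors, and the names `cosine_similarity` and `rank_by_similarity` are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_id, vectors):
    """Score every other item against the query item, most similar first."""
    query = vectors[query_id]
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in vectors.items()
        if item_id != query_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; real latent vectors would have many more dimensions.
song_vectors = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.8, 0.2, 0.1],
    "song_c": [0.0, 0.1, 0.9],
}

ranked = rank_by_similarity("song_a", song_vectors)
print(ranked[0][0])  # song_b points in nearly the same direction as song_a
```

With 30 million songs, scoring every pair like this would be too slow, so a production system would need some form of indexing or batching on top of the same idea.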

Next we need to transform a user’s abstract “taste” into something we can work with. We take the user’s recent interactions with content on Anghami and apply an exponential time decay to the scores; in simple terms, a more recent interaction is worth more than an older one. This lets us capture a user’s changing taste over time while still incorporating their history, so no single week of changed habits can throw the Mixtape off, i.e. we allow your guilty pleasures :) In more technical terms, we project the user’s “taste” into the same latent space where we projected songs and artists, which lets us assign a perceived “interest score” between users and content.
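As a rough sketch of the time decay, one way to build a user vector is a decay-weighted average of the vectors of the songs they interacted with. The 30-day half-life, the weighted-average projection, and the names `decay_weight` and `user_taste_vector` are assumptions for illustration, not the actual HARP settings.

```python
def decay_weight(age_days, half_life_days=30.0):
    """Exponential decay: an interaction loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)


def user_taste_vector(interactions, song_vectors, half_life_days=30.0):
    """Decay-weighted average of song vectors.

    interactions: list of (song_id, age_days) pairs, where age_days is how
    long ago the interaction happened.
    """
    dims = len(next(iter(song_vectors.values())))
    total = [0.0] * dims
    weight_sum = 0.0
    for song_id, age_days in interactions:
        weight = decay_weight(age_days, half_life_days)
        vec = song_vectors[song_id]
        for i in range(dims):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0:
        return total  # no interactions: return the zero vector
    return [value / weight_sum for value in total]


# Hypothetical two-dimensional song vectors.
toy_song_vectors = {"song_a": [1.0, 0.0], "song_c": [0.0, 1.0]}

# song_a was played today, song_c was played 60 days ago (two half-lives).
taste = user_taste_vector([("song_a", 0), ("song_c", 60)], toy_song_vectors)
print(taste)  # dominated by song_a, with a smaller contribution from song_c
```

Because the resulting vector lives in the same space as the song vectors, the same similarity function used between songs can serve as the user-song “interest score”.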

Now that we can assign a score between each user and song, we can take a crack at generating a playlist. Simply taking the 30 songs closest to a user proves to be problematic. Recommender systems are a great tool, but without some logic overlaid they tend to generate boring results: they are very likely to return songs from artists the user has already interacted with, and they tend to return multiple songs from the same artist.

To work around these aspects of the system we layer some final bits of human logic to aid these machines:

After we generate the first ever Mixtape, we keep track of all songs and artists ever added to any user’s Mixtape to make sure we don’t recommend the same song across different weeks and so we don’t keep shoving the same artists in consecutive weeks.

  1. We make sure the songs haven’t been played by the user recently
  2. We make sure we aren’t giving different versions of the same songs previously recommended
  3. We start off by generating up to 200 “possible” tracks for a user (one per artist)
  4. We factor in how much a user “knows” a given artist (streams + % of discography) and use that in the sorting of the list. This allows for discovery; otherwise we’d just be giving users songs from artists they know and the Mixtape would become boring.
  5. We now have a re-sorted list of the 200 initial tracks from which we select the top 30 tracks for the Mixtape.
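The steps above can be sketched roughly as follows. This is a simplified illustration, not the HARP implementation: the `Candidate` type, the `build_mixtape` function, the `discovery_weight` parameter, and the familiarity penalty formula are all hypothetical, and matching different versions of the same song is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    song_id: str
    artist_id: str
    score: float  # user-song interest score from the latent space


def build_mixtape(candidates, recently_played, previously_recommended,
                  artist_familiarity, pool_size=200, mixtape_size=30,
                  discovery_weight=0.5):
    """Filter, keep one track per artist, re-sort for discovery, take the top N.

    artist_familiarity maps artist_id to a value in [0, 1]; how it is derived
    from streams and discography coverage is not modeled here.
    """
    # Drop songs the user played recently or already received in a Mixtape.
    eligible = [
        c for c in candidates
        if c.song_id not in recently_played
        and c.song_id not in previously_recommended
    ]

    # Keep only the best-scoring track per artist, capped at pool_size.
    best_per_artist = {}
    for c in sorted(eligible, key=lambda c: c.score, reverse=True):
        best_per_artist.setdefault(c.artist_id, c)
    pool = list(best_per_artist.values())[:pool_size]

    # Penalize well-known artists so discovery tracks can rise (assumed formula).
    def adjusted(c):
        familiarity = artist_familiarity.get(c.artist_id, 0.0)
        return c.score * (1.0 - discovery_weight * familiarity)

    return sorted(pool, key=adjusted, reverse=True)[:mixtape_size]


# Hypothetical toy data.
candidates = [
    Candidate("s1", "a1", 0.95),
    Candidate("s2", "a1", 0.90),  # same artist as s1, lower score
    Candidate("s3", "a2", 0.85),
    Candidate("s4", "a3", 0.80),  # played recently
    Candidate("s5", "a4", 0.75),  # already in an earlier Mixtape
]
mixtape = build_mixtape(
    candidates,
    recently_played={"s4"},
    previously_recommended={"s5"},
    artist_familiarity={"a1": 1.0, "a2": 0.0},
    mixtape_size=2,
)
print([c.song_id for c in mixtape])  # the unfamiliar artist's track comes first
```

In this toy run the user knows artist `a1` very well, so its best track is pushed below the track from the unfamiliar artist `a2`, which is the discovery effect described in step 4.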

And there you have it, a simple-ish formula to generate a personalized playlist and get LOVE from users. As is normal with all products, the formula keeps changing as we iterate on the product to make it better.

The chart below gives an idea of how often users interact with the Mixtape. We see a slow start when it was in staged rollout on a small segment for data verification and validation. Then we see the quick rise that happens when we roll out a new feature and users start seeing and interacting with it, followed by a slight drop, stabilization, and a return to growth. This is a pretty standard usage chart, and a healthy one: near the end of the chart, usage is split almost equally between users who have never tried the Mixtape before, users who streamed it the previous week, and users who streamed it before but not the previous week. We had a few dips (2 main ones) where we had issues with sending out notifications and with displaying the Mixtape properly on the user’s homepage 🙊

Usage on the Mixtape by User Segment

More posts on Recommendations coming soon: our take on Collaborative Filtering, “Your Welcome Mixtape”, Content-Based Recommendations (codename Cochlea), Radios@Anghami, NewMusic@Anghami.

A big thanks to Anghami and my team (Ramzi Karam, Abdallah Moussawi, Omar El Zarif).

Shoutout to the Apache Spark team; HARP is built on top of Spark.

Join the fun: https://www.anghami.com/careers


Published by HackerNoon on 2018/03/25