Ever Wonder How Spotify Discover Weekly Works? Data Science

New music is everywhere. Hundreds if not thousands of new albums are released each week between major labels, mid-level subsidiaries, independent shops, and droves of label-less hopefuls. So with all those sweet new tunes out there, how do you dig through the dreck and find what sings to your soul?

Music is a deeply personal experience, and describing what you like or dislike about a particular song or artist can be frustratingly difficult. That makes finding new music hard, and discovering hidden gems nearly impossible.

The answer? Spotify Discover Weekly. As veteran Spotify users know, Discover Weekly is a curated playlist of 30 songs ranging from new releases to deep cuts, personalized just for you. But how does it work? Data science.

“Recommendation is a really common problem for data scientists,” said Lucas Ramadan, a student in GalvanizeU’s data science master’s program. “The most common technique used for recommendation is called collaborative filtering.”

Recommendation engines have become commonplace in our daily lives. Netflix uses them to recommend new movies and TV shows we might like, while Amazon uses them to turn shoppers on to new products. The trick to collaborative filtering is that it recommends new things based on similarity between users, not between items.

In the case of Spotify, that means a huge database of everything users have already listened to, with one row per user and one column per song. A collaborative filtering algorithm finds users who are similar to each other based on their usage (the songs they have both listened to) and then recommends to each of them the songs that only the other has heard.
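
The idea can be sketched in a few lines of Python. This is a minimal illustration, not Spotify's implementation: the users, the songs, and the choice of cosine similarity over binary listen data are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical listening history: user -> set of songs played.
listens = {
    "ana":   {"Midnight City", "Intro", "Reunion"},
    "ben":   {"Midnight City", "Intro", "Kids"},
    "carla": {"Hello", "Sorry"},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users' binary listen vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, listens, top_n=5):
    """Score songs the user has not heard by the similarity of users who have."""
    heard = listens[user]
    scores = {}
    for other, songs in listens.items():
        if other == user:
            continue
        sim = cosine_similarity(heard, songs)
        if sim == 0:
            continue  # no overlap in taste, so no evidence either way
        for song in songs - heard:
            scores[song] = scores.get(song, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]

print(recommend("ana", listens))
```

Here "ana" and "ben" share two songs, so each gets the one track only the other has played, while "carla" overlaps with nobody and gets no recommendations at all.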

But collaborative filtering isn’t the only thing responsible for setting you up with that hot new M83 track. Discover Weekly actually uses what’s known as an ensemble method: a collection of models, of which collaborative filtering is just one.

“A big problem for collaborative filtering is what’s called the ‘cold start problem,’ which is when you’re starting a new product and you have no user data,” Ramadan said. For Spotify, this manifests when you have a new user who hasn’t listened to very much yet, as well as when you have an obscure, unpopular, or new song that not many people have listened to yet.

The data flow of Spotify Discover Weekly. [Image via Spotify]

Spotify wants to be able to recommend these new songs (and deep cuts), so to get around the cold start problem, it uses convolutional neural networks to analyze the songs themselves.

“The convolutional neural network is run over the acoustics of a song itself and analyzed to determine songs that have similar acoustic patterns,” Ramadan said.
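
To make that a little more concrete, here is a toy sketch of the building block involved: a single convolution layer followed by a ReLU and max pooling, run over a made-up one-dimensional signal. Everything in it is an assumption for illustration. The filters are hand-picked rather than learned, and the "songs" are short lists of numbers, whereas a real network would learn many filters from actual audio data.

```python
from math import sqrt

def conv1d(signal, kernel):
    """Slide the kernel along the signal and take a weighted sum at each step."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def features(signal, kernels):
    """One convolution layer, ReLU, then global max pooling per filter."""
    return [max(max(0.0, x) for x in conv1d(signal, kernel))
            for kernel in kernels]

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hand-picked filters for illustration; a trained network learns its own.
kernels = [
    [1.0, -1.0],  # responds to sharp jumps in the signal
    [0.5, 0.5],   # responds to a sustained level
]

# Hypothetical "songs" reduced to tiny signals.
choppy_a = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
choppy_b = [0.8, -0.8, 0.8, -0.8, 0.8, -0.8]
smooth = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

fa, fb, fs = (features(s, kernels) for s in (choppy_a, choppy_b, smooth))
print(distance(fa, fb), distance(fa, fs))
```

The two choppy signals end up close together in feature space and far from the smooth one, which is the sense in which songs with similar acoustic patterns can be matched without any listening data.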

A third method is a form of natural language processing. One technique in that field, Word2Vec, takes words and encodes them into a mathematical representation: a vector. Vectors that sit close together correspond to words with similar meanings. Basically, it’s a mathematical representation of the implicit associations and relationships between words that we know to be true in everyday speech.

What Spotify does is very similar to Word2Vec. It treats each playlist as a paragraph or block of text, and each song in the playlist as an individual word. The result is a vector representation of every song that can be used to judge whether two pieces of music are similar. That lets Spotify tackle the cold start problem and recommend songs with very few plays.
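
A rough sketch of the playlists-as-text idea is below. To keep it self-contained it does not train Word2Vec; it swaps in plain co-occurrence counts, a much simpler stand-in that captures the same intuition that songs which keep the same company are similar. The playlists and song names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical playlists; each one plays the role of a "sentence" of songs.
playlists = [
    ["Song A", "Song B", "Song C"],
    ["Song A", "Song B", "Song D"],
    ["Song C", "Song D", "Song B"],
    ["Song X", "Song Y"],
    ["Song X", "Song Y", "Song Z"],
]

def cooccurrence_vectors(playlists):
    """Map each song to counts of the songs it shares a playlist with."""
    vectors = {}
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            vectors.setdefault(a, {})
            vectors.setdefault(b, {})
            vectors[a][b] = vectors[a].get(b, 0) + 1
            vectors[b][a] = vectors[b].get(a, 0) + 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(song, 0) for song, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

vectors = cooccurrence_vectors(playlists)
print(cosine(vectors["Song A"], vectors["Song C"]))  # share playlist neighbors
print(cosine(vectors["Song A"], vectors["Song X"]))  # never share any
```

Word2Vec itself learns short, dense vectors by training a small neural network rather than counting, but the comparison step at the end, measuring how close two song vectors are, works the same way.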

One of the things that makes Discover Weekly so good is that it employs a technique called outlier detection to pick out the things you actually like. Outlier detection is commonly used in financial security (it’s what banks and credit card companies use to detect fraudulent charges), but it also has uses in recommendation engines.

Essentially, outlier detection is used to determine if a particular usage—that is, listening to a song—is part of a normal pattern of behavior or not. This way, if you usually only listen to classic rock and ’90s alternative, your Discover Weekly playlist won’t get filled up with pop hits when your little brother plays Justin Bieber one time.

“Now, if you keep listening to Bieber 50-50 with other stuff, then it will start to recommend songs similar to Bieber,” Ramadan said. “The idea is that it initially flags it as an outlier and largely ignores it, only adding it to your recommendations if you continue that usage pattern.”
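
One very simple way to picture that behavior is a share-of-plays threshold: a genre only counts toward your taste once it makes up a meaningful slice of your listening. The sketch below assumes a hypothetical play log and an arbitrary 10 percent cutoff; the article does not say which outlier detection method Spotify actually uses.

```python
from collections import Counter

def taste_profile(play_log, min_share=0.1):
    """Return the genres that make up at least min_share of all plays."""
    counts = Counter(play_log)
    total = sum(counts.values())
    return {genre for genre, n in counts.items() if n / total >= min_share}

# Hypothetical play logs, one genre label per play.
one_off = ["classic rock"] * 30 + ["90s alternative"] * 19 + ["pop"]
habit = ["classic rock"] * 25 + ["pop"] * 25

print(taste_profile(one_off))  # the single pop play is treated as an outlier
print(taste_profile(habit))    # pop is now half the listening, so it counts
```

A single pop play out of 50 falls below the cutoff and is ignored, while a steady 50-50 split puts pop firmly in the profile.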

With all these algorithms working together, it’s no wonder that Discover Weekly is a hit. The general sentiment seen on places such as Twitter, as well as feedback collected by Spotify itself, suggests that people are very pleased with the 30 new songs recommended each week.

And if not? Well, all you can blame is the data.
