Week 10/12 #DataScienceBootcamp

Week 10 (30.11.-04.12.)

  • Topic: Recommender Systems
  • Lessons: Unsupervised Learning, Matrix Factorization, Web Development with Flask, Collaborative Filtering, Git Collaboration, PCA, Clustering
  • Project: Create a web-based movie recommender system
  • Dataset: MovieLens (100k)
  • Code: GitHub

This was a really exciting week, because we had a team project which combined the power of Machine Learning algorithms with the beauty of Web Development!

Making a recommender system

Spotify’s Discover Weekly, Netflix’s “Watch Next”, Amazon’s “You might like…” – these are all examples of recommender systems. They use the history of their users (what music they listened to, what series they watched, what they bought) to discover patterns in their preferences and recommend similar products, which in turn keeps users consuming.

There are two main approaches to recommender systems:

Memory-based (heuristic, non-parametric)

  • Content filtering: TF-IDF, similarity, clustering
  • Collaborative filtering: similarity, KNN, clustering

Model-based (algorithmic, parametric)

  • Content filtering: Bayesian classification, Neural Networks
  • Collaborative filtering: Bayesian networks, Neural Networks, SVD

In our project, we used two algorithms:

  • NMF (non-negative matrix factorization)
  • SVD (singular value decomposition)
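To give a feeling for what these two factorizations do, here is a minimal sketch in Python. It is an illustration only, not the code from our project: the tiny ratings matrix is made up, NumPy is assumed to be installed, the NMF uses the classic multiplicative update rules, and unrated movies are simply treated as 0, which is a simplification.

```python
import numpy as np

# Made-up user x movie ratings (0 = not rated); purely illustrative data.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 5],
    [1, 0, 4, 5, 4],
], dtype=float)


def nmf(R, n_components=2, n_iter=200, seed=0):
    """Factorize R into non-negative P (users x k) and Q (k x movies)
    using multiplicative updates that reduce the squared error."""
    rng = np.random.default_rng(seed)
    P = rng.random((R.shape[0], n_components)) + 0.1
    Q = rng.random((n_components, R.shape[1])) + 0.1
    eps = 1e-9  # avoids division by zero
    for _ in range(n_iter):
        Q *= (P.T @ R) / (P.T @ P @ Q + eps)
        P *= (R @ Q.T) / (P @ Q @ Q.T + eps)
    return P, Q


def truncated_svd(R, k=2):
    """Rank-k approximation of R built from its k largest singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]


P, Q = nmf(R)
R_nmf = P @ Q              # predicted ratings from NMF
R_svd = truncated_svd(R)   # predicted ratings from truncated SVD
```

In both cases the reconstructed matrix contains a score for every user and movie pair, including the ones that were 0 in the input, and those filled-in scores are what a recommender can rank.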

Making a web-page with Flask

I chose to focus on the web development part of the project, in order to gain new skills and refresh my HTML and CSS skills (that I last used in one of my jobs over a year ago). For this, I used Flask – a lightweight Python framework for web development, which makes it really easy to create simple web apps. Flask also uses Jinja templates, which are files that contain static data as well as placeholders for dynamic data. In this case, I used Jinja to render the movies from our dataset and the additional information (posters, trailers) that we got with the TMDB API.
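As a rough idea of how Flask and Jinja fit together, here is a minimal sketch. It is not our actual app: the movie list, the poster URLs, and the inline template are hypothetical placeholders, and a real project would usually keep the template in a separate file under templates/ and use render_template instead.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical stand-in for movies and poster links fetched from an API.
MOVIES = [
    {"title": "Toy Story (1995)", "poster_url": "https://example.com/toy-story.jpg"},
    {"title": "Fargo (1996)", "poster_url": "https://example.com/fargo.jpg"},
]

# Jinja template: static HTML plus placeholders filled in for each movie.
PAGE = """
<h1>Rate these movies</h1>
<ul>
{% for movie in movies %}
  <li>
    <img src="{{ movie.poster_url }}" alt="{{ movie.title }}">
    {{ movie.title }}
  </li>
{% endfor %}
</ul>
"""


@app.route("/")
def index():
    # Pass the Python data into the template; Jinja fills the placeholders.
    return render_template_string(PAGE, movies=MOVIES)
```

Starting it with `flask run` (or `app.run()`) and opening the root URL should show one list item per movie.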

On the main page, we display 15 movies randomly selected from the 50 most-rated movies in the dataset, to ensure that there is a high probability that the user knows at least some of them. Users are asked to rate as many as possible, or to leave the slider at 0 if they haven’t seen the movie. In the second step, we ask users what kind of movies they prefer: old, new, or any. The answers to these two questions (movie ratings and year preference) are our input data, which the NMF and SVD algorithms use to make predictions.
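The selection step described above can be sketched like this. It is a minimal illustration, assuming the ratings are available as (user_id, movie_id, rating) tuples; the function name and the toy data are hypothetical and not taken from our project.

```python
import random
from collections import Counter


def pick_movies_to_rate(ratings, n_popular=50, n_shown=15, seed=None):
    """Return n_shown random movie ids drawn from the n_popular most-rated movies."""
    # Count how many ratings each movie received.
    counts = Counter(movie_id for _user, movie_id, _rating in ratings)
    most_rated = [movie_id for movie_id, _ in counts.most_common(n_popular)]
    rng = random.Random(seed)
    # Sample without replacement so no movie is shown twice.
    return rng.sample(most_rated, min(n_shown, len(most_rated)))


# Toy data: movie 1 has 100 ratings, movie 2 has 99, ..., movie 100 has 1.
toy_ratings = [
    (user, movie, 3)
    for movie in range(1, 101)
    for user in range(101 - movie)
]

shown = pick_movies_to_rate(toy_ratings, seed=42)
```

Passing a seed makes the selection reproducible for testing; leaving it as None gives a different set on every page load.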

After submitting their responses (and waiting a bit, because the algorithms take a while to calculate), the users are taken to the recommendations page. Here we display 5 movies that, according to the algorithm, are similar to the movies the user rated highest and that belong to the preferred year category.
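The final filtering step could look roughly like the sketch below. Several things here are assumptions made for illustration: the predicted scores come as a dict, the boundary between "old" and "new" is the hypothetical cutoff_year, and movies the user already rated are skipped.

```python
def top_recommendations(predicted, years, already_rated,
                        preference="any", cutoff_year=1990, n=5):
    """Return the n best-scoring unseen movies that match the year preference."""

    def matches(movie_id):
        # cutoff_year is an assumed boundary between "old" and "new".
        if preference == "old":
            return years[movie_id] < cutoff_year
        if preference == "new":
            return years[movie_id] >= cutoff_year
        return True  # "any"

    candidates = [
        m for m in predicted
        if m not in already_rated and matches(m)
    ]
    # Highest predicted score first.
    return sorted(candidates, key=lambda m: predicted[m], reverse=True)[:n]


# Made-up predicted scores and release years.
predicted = {"A": 4.8, "B": 4.5, "C": 4.1, "D": 3.9, "E": 3.2, "F": 2.0}
years = {"A": 1994, "B": 1972, "C": 1999, "D": 1985, "E": 1996, "F": 1997}

new_picks = top_recommendations(predicted, years, {"A"}, preference="new")
old_picks = top_recommendations(predicted, years, {"A"}, preference="old")
any_picks = top_recommendations(predicted, years, {"A"})
```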

Friday Lightning Talk

This week’s lightning talk will actually take place next Wednesday, to give us more time to work on this complex project while also brainstorming and trying out ideas for the final project. But we are almost done, so here is a demo of our recommender system (feel free to test it out):
