Surprise: Movie Recommender System Example

Surprise is an easy-to-use Python scikit for recommender systems.

This example uses the MovieLens 100K dataset (ml-100k), which contains 100,000 ratings on a 1-to-5 star scale given by 943 users to 1,682 movies. It was collected by GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota.

import numpy
conda install -c conda-forge scikit-surprise

Evaluating the RMSE and MAE of the SVD algorithm.

from surprise import SVD, KNNBasic
from surprise import Dataset
from surprise.model_selection import cross_validate
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k', prompt = False)
# Use the famous SVD algorithm.
algo = SVD()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);
Cell run time: 35.6 s
{'fit_time': (6.243653774261475, 6.378984451293945, 6.2787439823150635, 6.4729087352752686, 6.394500017166138), 'test_mae': array([0.73479121, 0.74070903, 0.73455941, 0.73962714, 0.73664166]), 'test_rmse': array([0.93131499, 0.94057535, 0.93354637, 0.93404636, 0.93581008]), 'test_time': (0.3380296230316162, 0.2197892665863037, 0.1990349292755127, 0.32216334342956543, 0.20287752151489258)}
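
For reference, the two measures reported above can be computed by hand. The following is a minimal sketch of the standard definitions using NumPy. The arrays `true_ratings` and `predicted_ratings` are made-up values for illustration, not MovieLens data, and the helper names `rmse` and `mae` are assumptions, not Surprise functions.

```python
import numpy as np

# Hypothetical true and predicted ratings, for illustration only.
true_ratings = np.array([4.0, 3.0, 5.0, 2.0])
predicted_ratings = np.array([3.5, 3.0, 4.0, 3.0])


def rmse(y_true, y_pred):
    """Root mean squared error: squares each error, so large misses weigh more."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the error, in rating units."""
    return float(np.mean(np.abs(y_true - y_pred)))


print(f"RMSE: {rmse(true_ratings, predicted_ratings):.4f}")
print(f"MAE:  {mae(true_ratings, predicted_ratings):.4f}")
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same predictions, which matches the pattern in the cross-validation output.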

Evaluating the RMSE and MAE of the K-nearest-neighbors algorithm.

# Use KNNBasic().
algo = KNNBasic()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);
Cell run time: 30.7 s
{'fit_time': (0.6743791103363037, 0.633216142654419, 0.6172466278076172, 0.6201856136322021, 0.6394896507263184), 'test_mae': array([0.7712308 , 0.77499722, 0.7739747 , 0.77178195, 0.77739389]), 'test_rmse': array([0.97460773, 0.97645493, 0.98057108, 0.98032195, 0.98431783]), 'test_time': (4.963768720626831, 5.0948402881622314, 5.150168418884277, 5.026345729827881, 5.3033013343811035)}

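Obs 1. SVD is more accurate than KNNBasic (mean RMSE about 0.935 vs. 0.979, mean MAE about 0.737 vs. 0.774), but it takes about ten times longer to fit (roughly 6.4 s vs. 0.6 s per fold). KNNBasic, in turn, is much slower at prediction time (roughly 5.1 s vs. 0.3 s per fold), so the total run times of the two cells are similar.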
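
The comparison in Obs 1 can be made concrete by averaging the per-fold scores. This is a small sketch that copies the `test_rmse` and `test_mae` arrays from the two outputs above into NumPy arrays. The variable names and the `summary` dictionary are illustrative choices, not part of Surprise.

```python
import numpy as np

# Per-fold test scores copied from the cross_validate outputs above.
svd_rmse = np.array([0.93131499, 0.94057535, 0.93354637, 0.93404636, 0.93581008])
svd_mae = np.array([0.73479121, 0.74070903, 0.73455941, 0.73962714, 0.73664166])
knn_rmse = np.array([0.97460773, 0.97645493, 0.98057108, 0.98032195, 0.98431783])
knn_mae = np.array([0.7712308, 0.77499722, 0.7739747, 0.77178195, 0.77739389])

# Average over the 5 folds to get one number per algorithm and measure.
summary = {
    "SVD": {"rmse": svd_rmse.mean(), "mae": svd_mae.mean()},
    "KNNBasic": {"rmse": knn_rmse.mean(), "mae": knn_mae.mean()},
}

for name, scores in summary.items():
    print(f"{name:9s} mean RMSE = {scores['rmse']:.4f}, mean MAE = {scores['mae']:.4f}")
```

In a live notebook the same averages could be taken directly from the dictionary that `cross_validate` returns, instead of copying the numbers by hand.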

Evaluating the RMSE and MAE of the SVD++ algorithm.

from surprise import SVDpp
from surprise import Dataset
from surprise.model_selection import cross_validate
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k', prompt = False)
# Use the SVD++ algorithm, an extension of SVD that also accounts for implicit ratings.
algo = SVDpp()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);
Cell run time: 577.8 s

Obs 2. SVD++ takes far longer: the same 5-fold cross-validation ran for about 578 s, compared with about 36 s for SVD.

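Obs 3. The best error is an RMSE of around 0.93, on a rating scale of 1 to 5 stars. Although the literature does not agree on one convention, we can normalize the RMSE by the scale. Dividing by the range of the scale (5 - 1 = 4) gives about 0.23, and dividing by the maximum rating (5) gives about 0.19. Either way, this loosely means that our model overvalues or undervalues a rating by about 20% of the scale, based on the data.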
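
The normalization in Obs 3 is simple arithmetic and can be checked as follows. This is a sketch under two assumptions: that the rating scale runs from 1 to 5, and that 0.93 is a fair rounding of the best RMSE. The names `nrmse_by_range` and `nrmse_by_max` are illustrative, since there is no single agreed definition of a normalized RMSE.

```python
rmse = 0.93                     # approximate best RMSE from the SVD run
rating_min, rating_max = 1, 5   # assumed bounds of the star scale

# Two common conventions: divide by the range, or by the maximum value.
nrmse_by_range = rmse / (rating_max - rating_min)
nrmse_by_max = rmse / rating_max

print(f"RMSE / range = {nrmse_by_range:.3f}")
print(f"RMSE / max   = {nrmse_by_max:.3f}")
```

The two conventions bracket the "about 20%" figure, which is why it is best read as a rough indication and not as a precise error rate.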

Cheers,
