
Guide on LightFM package in Python

Last updated Aug 12, 2023

Suppose we are a video-on-demand service (e.g. Netflix) and we wish to build a model that recommends relevant movies to viewers. In machine learning jargon, the viewers are known as users while the movies are known as items. Similarly, for an e-commerce service, the users would be the customers and the items would be the products.

Most recommendation models leverage either or both of the following types of data:

  • interaction data that captures how users interact with the items. For example, the interaction data for our video-on-demand service includes users' ratings of the movies, their browsing history and so on.

  • attribute data that captures information about users and items separately. For instance, the users' attribute data may be their profile (e.g. age, nationality) while the items' attribute data may be the year of release or the genre (e.g. horror).

There are mainly two types of recommendation models:

  • content-based filtering models that leverage attribute data.

  • collaborative filtering models that leverage interaction data.

Content-based filtering models recommend items to a user based solely on that user's own past data. For instance, consider the following ratings that a user gave to movies they have watched before:

              Movie 1   Movie 2   Movie 3   Movie 4
Genre         Horror    Romance   Horror    Romance
Release date  2022      2020      1995      2010
User A        5         0         4         1

We can make the following (naive) conclusion:

  • this user prefers horror movies to romance movies.

  • this user prefers newer movies to older ones.

Therefore, it would make more sense to recommend modern horror movies rather than romance movies to this user. Notice that recommendations of this type do not rely on other users: we generate the recommendations based entirely on this user's past information.
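The naive reasoning above can be sketched in a few lines of pandas. This is a toy illustration only, not how LightFM works: we average the user's ratings per genre and use that average to score movies the user has not seen. Movie 5 and Movie 6 are hypothetical additions.

```python
import pandas as pd

# User A's ratings from the table above
df_rated = pd.DataFrame({
    "movie": ["Movie 1", "Movie 2", "Movie 3", "Movie 4"],
    "genre": ["Horror", "Romance", "Horror", "Romance"],
    "rating": [5, 0, 4, 1],
})

# Naive user profile: the average rating this user gives to each genre
genre_preference = df_rated.groupby("genre")["rating"].mean()

# Hypothetical movies that the user has not watched yet
df_unseen = pd.DataFrame({
    "movie": ["Movie 5", "Movie 6"],
    "genre": ["Romance", "Horror"],
})

# Score each unseen movie by the user's average rating for its genre
df_unseen["score"] = df_unseen["genre"].map(genre_preference)
print(df_unseen.sort_values("score", ascending=False))
```

Here the horror movie (Movie 6) gets a score of 4.5 and the romance movie (Movie 5) a score of 0.5, so Movie 6 would be recommended first.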

Collaborative filtering models, in contrast, leverage interaction data. The main idea is that users who have interacted with the same items in similar ways tend to share preferences for other items. For instance, suppose user A and user B both give a 5-star rating to a movie M. Intuitively, we should expect user A and user B to share a similar taste in movies, which means that we can recommend movies to user B based on the interests of user A. The interaction data is typically arranged as a matrix of users' ratings of items, for example:

              Movie 1   Movie 2   Movie 3   Movie 4
User A        5         0         5         1
User B        0         4         2         5
Genre         Horror    Romance   Horror    Romance

Here, user A rates the horror movies highly whereas user B prefers the romance movies, so their ratings suggest that these two users have different tastes.

Because each type of model has its own strengths and weaknesses, both are widely used in practice, and neither is strictly superior to the other. For instance, we may opt for a content-based filtering model simply because we lack interaction data. There also exist hybrid recommendation models that leverage both interaction and attribute data.
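To make the collaborative filtering idea concrete, the sketch below measures how similar the tastes of user A and user B are, using only the two-user rating matrix from the table above. It uses the cosine similarity of mean-centred rating vectors (a common textbook choice, sometimes called Pearson-style similarity); treating the 0 entries as genuine ratings is an assumption of this toy example, and LightFM itself works differently, by learning latent embeddings.

```python
import numpy as np

# Ratings from the table above: rows are users, columns are Movie 1 to Movie 4
ratings = np.array([[5, 0, 5, 1],    # User A
                    [0, 4, 2, 5]],   # User B
                   dtype=float)

def centred_cosine(u, v):
    # Subtract each user's mean rating so that only relative likes/dislikes count
    u_c, v_c = u - u.mean(), v - v.mean()
    return float(u_c @ v_c / (np.linalg.norm(u_c) * np.linalg.norm(v_c)))

similarity_ab = centred_cosine(ratings[0], ratings[1])
print(round(similarity_ab, 2))  # -0.87
```

A strongly negative value means that user A tends to like what user B dislikes, so user A's favourites would be poor recommendations for user B.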

This guide will cover how to use the LightFM package to train recommendation models based on matrix factorization.

Tutorial for LightFM

Suppose we wanted to build a movie recommendation system based on the following dataset:

import numpy as np
import pandas as pd

df_data = pd.read_csv("./data.csv", index_col=0)
print(len(df_data)) # 10987720 rows
df_data
userID itemID rating occupation genre
0 196 242 3.0 writer Comedy
1 196 242 3.0 writer Comedy
2 196 242 3.0 writer Comedy
3 196 242 3.0 writer Comedy
4 196 242 3.0 writer Comedy

Note the following:

  • itemID refers to the movie ID.

  • rating is a numeric value that ranges from 0 to 5.

  • occupation is the profession of the user. A user can only have a single occupation.

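  • genre refers to the type of the movie. An item can have multiple genres (e.g. "Comedy|Musical").

  • the dataset contains many duplicate rows (for instance, the first five rows shown above are identical).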

Our goal is to build a movie recommendation system based on the following features:

  • user-item interaction data - rating.

  • user attribute data - occupation.

  • item attribute data - genre.

Preparing LightFM Dataset

The first step is to prepare a LightFM Dataset object, which builds the internal ID mappings and the matrices that we will feed into our LightFM model. Besides the user and item IDs, there are two things we must supply:

  • a list of unique user occupations.

  • a list of unique movie genres.

The list of unique occupations can be obtained like so:

list_str_occupations_unique = list(df_data["occupation"].drop_duplicates())
print(len(list_str_occupations_unique)) # 21
list_str_occupations_unique
['writer', 'marketing', 'student', 'other', ... ]

The list of unique movie genres can be obtained like so:

series_genre_of_movies = df_data["genre"].str.split("|")
list_str_movie_genre_unique = list(set(np.concatenate(series_genre_of_movies).ravel()))
print(len(list_str_movie_genre_unique)) # 18
list_str_movie_genre_unique
['War', 'Animation', 'Sci-Fi', 'Comedy', 'Film-Noir', ... ]

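Here, we use the Series' str.split(-) method to turn each genre string into a list of genres (e.g. "Comedy|Musical" becomes ["Comedy","Musical"]). We then flatten these lists into a single array with np.concatenate(-) and use set(-) to keep only the unique genres.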
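An arguably more idiomatic way to collect the unique genres is Series.explode(-) (available since pandas 0.25), which puts one list element per row. The sketch below uses a small hypothetical Series in place of df_data["genre"]:

```python
import pandas as pd

# Hypothetical stand-in for df_data["genre"]; multiple genres are separated by "|"
series_genre = pd.Series(["Comedy", "Comedy|Musical", "Action|Comedy"])

# str.split turns each string into a list; explode gives one genre per row
list_unique_genres = sorted(series_genre.str.split("|").explode().unique())
print(list_unique_genres)  # ['Action', 'Comedy', 'Musical']
```

A single-character pattern such as "|" is treated as a literal string by str.split(-), so it does not need escaping.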

We can now build the LightFM dataset like so:

from lightfm.data import Dataset

dataset = Dataset()
dataset.fit(users=df_data["userID"],
            items=df_data["itemID"],
            item_features=list_str_movie_genre_unique,
            user_features=list_str_occupations_unique)

Preparing user features

Next, we must build the users' features. The input format expected by LightFM is as follows:

[(user_id_1, ['feature1']), (user_id_2, ['feature2']), ...]

We can obtain this input format like so:

df_data_with_unique_user_ids = df_data.drop_duplicates("userID")
list_user_features = [(x,[y]) for x,y in zip(df_data_with_unique_user_ids["userID"], df_data_with_unique_user_ids["occupation"])]
# print(len(list_user_features)) # 943
list_user_features
[(196, ['writer']), (63, ['marketing']), (226, ['student']), (154, ['student']), ... ]

We then pass this list to the Dataset's build_user_features(-) method:

sm_user_features = dataset.build_user_features(list_user_features)
sm_user_features
<943x964 sparse matrix of type '<class 'numpy.float32'>'
with 1886 stored elements in Compressed Sparse Row format>

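Here, the shape of sm_user_features is 943 by 964. This is because there are 943 unique users, and by default LightFM treats each userID as a user feature of its own (an identity feature). Adding the 21 unique occupations gives a total of 943 + 21 = 964 user features. Each row therefore has exactly two non-zero entries (the user's identity feature and their occupation), which accounts for the 943 × 2 = 1886 stored elements.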
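To make this layout concrete, the sketch below builds the same kind of matrix by hand with scipy for three hypothetical users: one identity column per user followed by one column per occupation. It mirrors the structure only and is not LightFM's own code; the row normalisation reflects the default normalize=True of build_user_features(-), which makes each row's weights sum to 1.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: 3 users and 2 distinct occupations
user_ids = [196, 63, 226]
occupations = ["writer", "student"]
user_occupation = {196: "writer", 63: "student", 226: "student"}

n_users, n_occupations = len(user_ids), len(occupations)
rows, cols = [], []
for i, user_id in enumerate(user_ids):
    # Column i is the user's own identity feature;
    # column n_users + j is the user's occupation feature
    rows += [i, i]
    cols += [i, n_users + occupations.index(user_occupation[user_id])]

data = np.ones(len(rows), dtype=np.float32)
sm_toy = sparse.csr_matrix((data, (rows, cols)),
                           shape=(n_users, n_users + n_occupations))

# Scale each row to sum to 1, mirroring build_user_features(normalize=True)
row_sums = np.asarray(sm_toy.sum(axis=1)).ravel()
sm_toy = sparse.diags(1.0 / row_sums) @ sm_toy

print(sm_toy.shape, sm_toy.nnz)  # (3, 5) 6
print(sm_toy.toarray())
```

With 943 users and 21 occupations, the same construction gives a 943 × 964 matrix with 1886 stored elements.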

Preparing item features

Similarly, we build the items' features:

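# One row per movie, with its "|"-separated genres split into a list
df_movies_uniq = df_data.drop_duplicates("itemID")
list_item_features = [(x, y.split("|")) for x, y in zip(df_movies_uniq["itemID"], df_movies_uniq["genre"])]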
print(f"Length of item features: {len(list_item_features)}") # 352
list_item_features
[(242, ['Comedy']),
(257, ['Action', 'Adventure', 'Comedy', 'Sci-Fi']),
(111, ['Comedy', 'Romance']),
(25, ['Comedy']),
(382, ['Comedy', 'Drama']), ... ]

We then pass this list to the Dataset's build_item_features(-) method:

sm_item_features = dataset.build_item_features(list_item_features)
sm_item_features
<352x370 sparse matrix of type '<class 'numpy.float32'>'
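with 1097 stored elements in Compressed Sparse Row format>

Likewise, the 370 columns of sm_item_features consist of one identity feature for each of the 352 movies plus the 18 unique genres.
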
Preparing interaction data

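Now that we've prepared the user and item data, we can prepare the interaction data using the build_interactions(-) method. It returns two matrices: sm_interactions, which marks every (user, movie) pair that has an interaction, and sm_weights, which holds the corresponding ratings: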

sm_interactions, sm_weights = dataset.build_interactions(df_data[["userID","itemID","rating"]].values)
sm_interactions
<943x352 sparse matrix of type '<class 'numpy.int32'>'
with 10987720 stored elements in COOrdinate format>
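A 943 × 352 matrix has only 331,936 cells, so the 10,987,720 stored elements above must include many repeated (user, movie) pairs: a COO matrix keeps duplicates as separate entries. The small scipy sketch below (hypothetical toy data) shows that converting such a matrix to CSR sums the duplicates, so a rating of 3 that appears twice becomes 6. How LightFM handles duplicates internally is not covered here; a safe option is to call df_data.drop_duplicates(["userID", "itemID"]) before building the interactions.

```python
import numpy as np
from scipy import sparse

# Hypothetical interactions in which the pair (user 0, item 1) appears twice
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])
vals = np.array([3.0, 3.0, 4.0])

m_coo = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 3))
print(m_coo.nnz)       # 3: COO keeps duplicate entries as separate elements

m_csr = m_coo.tocsr()  # converting to CSR sums the duplicates
print(m_csr.nnz)       # 2
print(m_csr[0, 1])     # 6.0
```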

Internal mapping of IDs

Instead of using the original IDs of the users and items in our dataset, LightFM internally assigns a new consecutive non-negative integer ID to each user and item. We can see the mapping like so:

user_id_map, user_feature_map, item_id_map, item_feature_map = dataset.mapping()

The user_id_map is:

user_id_map
{196: 0,
63: 1,
226: 2, ...

The user_feature_map is:

user_feature_map
{196: 0,
63: 1,
...,
'healthcare': 962,
'marketing': 963}

Remember, LightFM also treats the ID of every user as a feature, which is why the IDs of our users appear in user_feature_map. We will later use these mappings to perform predictions.
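We often need to go the other way as well: predict(-) works with LightFM's internal indices, so to report results in terms of the original IDs we can invert a mapping with a dictionary comprehension. The sketch uses a small hypothetical excerpt of item_id_map:

```python
# Hypothetical excerpt of item_id_map (original movie ID -> internal index)
item_id_map = {242: 0, 257: 1, 111: 2}

# Invert it to translate LightFM's internal indices back to original movie IDs
internal_to_item_id = {internal: original for original, internal in item_id_map.items()}
print(internal_to_item_id[1])  # 257
```

This works because the mapping is one-to-one, so no two original IDs share an internal index.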

Evaluating performance

To evaluate the performance of our LightFM model, we first split the interactions into a training set and a testing set using the library's random_train_test_split(-) function:

from lightfm.cross_validation import random_train_test_split

sm_train_interactions, sm_test_interactions = random_train_test_split(sm_interactions, test_percentage=0.2, random_state=42)
print(f"Shape of train interactions: {sm_train_interactions.shape}")
print(f"Shape of test interactions: {sm_test_interactions.shape}")
Shape of train interactions: (943, 352)
Shape of test interactions: (943, 352)
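Note that the training and testing matrices have the same shape as the full interaction matrix: the split is performed over individual interactions, not over users or movies. The toy function below sketches this idea with numpy and scipy (shuffle the stored interactions, then cut them at test_percentage). It is our own illustration, not LightFM's source code.

```python
import numpy as np
from scipy import sparse

def toy_random_split(interactions, test_percentage=0.2, seed=42):
    coo = interactions.tocoo()
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(coo.nnz)           # shuffle the stored interactions
    n_test = int(test_percentage * coo.nnz)
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

    def subset(idx):
        # Keep only the selected entries but preserve the full matrix shape
        return sparse.coo_matrix(
            (coo.data[idx], (coo.row[idx], coo.col[idx])), shape=coo.shape)

    return subset(train_idx), subset(test_idx)

# Toy 3 x 4 interaction matrix with 11 non-zero entries
m = sparse.coo_matrix(np.arange(12).reshape(3, 4))
train, test = toy_random_split(m)
print(train.shape, test.shape, train.nnz, test.nnz)
```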

It's finally time to fit the LightFM model:

from lightfm import LightFM

LEARNING_RATE = 0.25
NO_EPOCHS = 20
NO_COMPONENTS = 20  # Dimensionality of the latent embeddings
ITEM_ALPHA = 1e-6  # L2 penalty on item features
USER_ALPHA = 1e-6  # L2 penalty on user features

model = LightFM(loss="warp",
                no_components=NO_COMPONENTS,
                learning_rate=LEARNING_RATE,
                item_alpha=ITEM_ALPHA,
                user_alpha=USER_ALPHA,
                random_state=42)

model.fit(interactions=sm_train_interactions,
          user_features=sm_user_features,
          item_features=sm_item_features,
          epochs=NO_EPOCHS)

It took me roughly 5 minutes to train the model on my M1-chip Mac. Let's now evaluate our model on the testing data using precision@k, the fraction of the top k recommended movies that the user actually interacted with in the testing set (LightFM uses k=10 by default):

from lightfm.evaluation import precision_at_k

np_arr_prec = precision_at_k(model,
                             test_interactions=sm_test_interactions,
                             user_features=sm_user_features,
                             item_features=sm_item_features)

print(len(np_arr_prec))  # 943
np_arr_prec[:10]
array([0.3, 0.3, 0.1, 0.3, 0.1, 0.5, 0.1, 0.9, 0.7, 0. ], dtype=float32)

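Here, we have the precision@k value for every user. Averaging these values gives the mean precision@k across users (not to be confused with the mean average precision, MAP@k, which also takes the rank of each correct recommendation into account):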

np_arr_prec.mean()
0.38462353
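To see what these numbers mean, here is a small numpy sketch of precision@k on made-up scores: for each user, we take the k highest-scoring items and count how many of them are held-out positives. LightFM's precision_at_k(-) additionally works on sparse inputs and can exclude known positives through its train_interactions argument, so this illustrates the metric rather than reimplementing the library function.

```python
import numpy as np

def precision_at_k_dense(scores, test, k):
    """Fraction of each user's top-k scored items that appear in their test set.

    scores: (n_users, n_items) predicted scores
    test:   (n_users, n_items) binary matrix of held-out positive interactions
    """
    top_k = np.argsort(-scores, axis=1)[:, :k]      # indices of the k highest scores
    hits = np.take_along_axis(test, top_k, axis=1)  # 1 where a top-k item is a test positive
    return hits.sum(axis=1) / k

scores = np.array([[0.9, 0.1, 0.8, 0.3],
                   [0.2, 0.7, 0.1, 0.6]])
test = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 1]])

per_user = precision_at_k_dense(scores, test, k=2)
print(per_user)         # [0.5 1. ]
print(per_user.mean())  # 0.75
```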

User-to-Item recommendation

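Suppose we want to recommend 5 movies to the user with ID 63. We can use the predict(-) method of our model like so:

user_id = 63
# Pass the same feature matrices that were used when fitting the model
list_scores = model.predict(user_id_map[user_id], list(item_id_map.values()),
                            user_features=sm_user_features,
                            item_features=sm_item_features)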
print(len(list_scores)) # 352
list_scores[:5]
array([-89.237495, -40.246788, -49.907825, -54.08462 , -99.96051 ], dtype=float32)

Note the following:

  • we convert user_id to LightFM's internal user index using the user_id_map dictionary that we obtained earlier from dataset.mapping().

  • we supply the internal indices of the movies that we want recommendation scores for. Since we are interested in the top k movie recommendations for this user, we compute a score for every movie.

  • the item_id_map is a dictionary that maps the original movie IDs to the non-negative consecutive integers used by LightFM internally:

    item_id_map
    {242: 0,
    257: 1,
    111: 2, ...

We convert the list of recommendation scores into a Series such that we can assign the original movie IDs as the index:

series_scores = pd.Series(list_scores)
series_scores.index = item_id_map.keys()
print(len(list_scores)) # 352
series_scores[:5]
242 -89.237495
257 -40.246788
111 -49.907825
25 -54.084621
382 -99.960510
dtype: float32

Finally, we sort the scores in descending order to obtain the top movie recommendations for the user:

series_scores.sort_values(ascending=False, inplace=True)
series_scores[:5]
222 124.631828
1 21.564163
15 -17.863367
288 -31.532494
258 -34.471897
dtype: float32

Here, we see that the movie with ID 222 is the top recommendation for this user. Note that the scores have no absolute meaning; only their relative order matters, so they are used purely for ranking.
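In practice, we usually do not want to recommend movies that the user has already rated. A minimal pandas sketch follows; both the scores and the set of already-rated movie IDs are hypothetical stand-ins (with the real data, the rated IDs could be taken from df_data.loc[df_data["userID"] == user_id, "itemID"]):

```python
import pandas as pd

# Hypothetical scores indexed by original movie ID
series_scores = pd.Series([124.6, 21.6, -17.9, -31.5, -34.5],
                          index=[222, 1, 15, 288, 258])

# Hypothetical set of movies this user has already rated
set_seen = {222, 15}

# Drop already-rated movies, then keep the 2 highest-scoring ones
series_unseen = series_scores.drop(labels=list(set_seen), errors="ignore")
top_2 = series_unseen.nlargest(2)
print(list(top_2.index))  # [1, 288]
```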

Item-to-item recommendation

To obtain the vector embedding of each movie, we can use the get_item_representations(-) method, which returns a tuple containing the item biases and the item embeddings:

_, np_item_embeddings = model.get_item_representations(features=sm_item_features)
print(np_item_embeddings.shape) # (352, 20)
np_item_embeddings[:2]
array([[-0.79904926, -0.85671186, 0.42553982, 0.994905 , -0.06102959,
-0.41155615, -0.64710814, 0.38948753, -0.16504961, -0.24440393,
-0.46848 , -0.00726059, -0.5730575 , -0.12569594, -0.84235895,
0.9981231 , -0.36846963, 0.0336417 , 0.1883249 , 0.7433187 ],
[ 1.3654532 , 1.4837279 , 1.0903912 , -0.14545436, -1.0986278 ,
0.08251551, -2.776378 , -0.20987356, 1.8015835 , 2.2055554 ,
-0.22924855, -3.5627067 , -0.35516343, -0.79560184, -1.4587665 ,
1.6426092 , 1.2299991 , 0.26629227, 0.877507 , -0.35510343]],
dtype=float32)

Note the following:

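  • for every movie, we get a vector representation that encodes the characteristics of the movie. Since we passed sm_item_features, each movie's vector is the weighted sum of the embeddings of its features (its identity feature and its genres).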

  • the shape is 352 by 20 because there are 352 movies in total and we set the latent vector size (NO_COMPONENTS) to 20 during model fitting.

We compute the cosine similarity for each pair of movies:

from sklearn.metrics.pairwise import cosine_similarity

np_item_similarities = cosine_similarity(np_item_embeddings)
print(np_item_similarities.shape) # (352, 352)
np_item_similarities[:2]
array([[ 9.99999940e-01, 9.83655304e-02, 8.39596242e-03,
-1.05853856e-01, 4.35901970e-01, -1.35404930e-01,
-7.39989057e-03, 5.28273523e-01, -5.74964844e-02, ...
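Cosine similarity is simply the dot product of length-normalised vectors, so the same kind of matrix can be computed with plain numpy. The sketch uses three toy 2-dimensional embeddings and assumes no embedding is all zeros; it should agree with sklearn's cosine_similarity(-) up to floating-point error.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Scale each row to unit length; dot products of unit vectors are cosines
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T

emb = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [-1.0, 0.0]])
sim = cosine_similarity_matrix(emb)
print(np.round(sim, 3))
```

Floating-point rounding is also why a movie's similarity with itself can print as 9.99999940e-01 rather than exactly 1.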

It's convenient to convert this NumPy array into a DataFrame so that we can label its rows and columns with the movie IDs:

df_item_similarities = pd.DataFrame(np_item_similarities)
df_item_similarities.columns = item_id_map.keys()
df_item_similarities.index = item_id_map.keys()
df_item_similarities
242 257 111 25 382 202 153 286 66 845 ... 1181
242 1.000000 0.098366 0.008396 -0.105854 0.435902 -0.135405 -0.007400 0.528274 -0.057496 0.247484 ... 0.113449
257 0.098366 1.000000 0.079299 0.134674 -0.236781 -0.060198 0.160672 -0.037351 -0.040263 0.233311 ... 0.121223
111 0.008396 0.079299 1.000000 0.555100 0.156462 0.340305 0.059644 0.220777 0.389159 0.734334 ... 0.113434
25 -0.500537 -0.442127 -0.427205 -0.404253 -0.395989 -0.377560 -0.364232 -0.344025 -0.341550 -0.338591 ... 1.000000
382 0.435902 -0.236781 0.156462 0.271259 1.000000 0.204918 0.178467 0.311732 0.174588 0.147333 ... 0.076015

Recall that item_id_map, obtained earlier from dataset.mapping(), maps the original movie IDs to the consecutive non-negative integers used by LightFM internally:

item_id_map
{242: 0,
257: 1,
111: 2, ...

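To get the top 5 recommendations for a particular movie, say the movie with ID 1049, we sort its row of similarities in descending order. We take the top 6 entries because the movie itself will appear first: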

int_movie_id = 1049
series_rec_movie_ids = df_item_similarities.loc[int_movie_id,:]
series_top_5_rec_movie_ids = series_rec_movie_ids.sort_values(ascending=False).head(6)
series_top_5_rec_movie_ids
1049 1.000000
832 0.786510
1047 0.745040
756 0.741784
930 0.729942
407 0.729872
Name: 1049, dtype: float32

As expected, the first value is the movie itself, which receives a perfect score of 1. The movie with ID 832 is therefore the most similar movie, and hence the top recommendation.
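Rather than fetching n + 1 entries and relying on the movie itself being ranked first, we can drop the query movie explicitly. This guards against ties with other movies at the top. A sketch on a hypothetical 3 × 3 similarity matrix:

```python
import pandas as pd

# Hypothetical similarity matrix indexed by movie ID
df_sim = pd.DataFrame([[1.0, 0.8, 0.2],
                       [0.8, 1.0, 0.5],
                       [0.2, 0.5, 1.0]],
                      index=[1049, 832, 407], columns=[1049, 832, 407])

def most_similar(df_sim, movie_id, n):
    # Remove the query movie itself, then keep the n most similar movies
    return df_sim.loc[movie_id].drop(movie_id).nlargest(n)

print(most_similar(df_sim, 1049, 2))
```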

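So far, our model has used both interaction and attribute data. LightFM can also be trained on interaction data alone, which gives a pure collaborative filtering model. Suppose we have the following dataset of user ratings of movies: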

df_ratings = pd.read_csv("./data/rating.csv")
df_ratings = df_ratings[:50000]
df_ratings.head(10)
userId movieId rating timestamp
0 1 2 3.5 2005-04-02 23:53:47
1 1 29 3.5 2005-04-02 23:31:16
2 1 32 3.5 2005-04-02 23:33:39
3 1 47 3.5 2005-04-02 23:32:07
4 1 50 3.5 2005-04-02 23:29:40

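Using this data, we can create a user × movie interaction matrix whose entries are the ratings (0 where a user has not rated a movie):

def create_interaction_matrix(df_rating):
    # Sum the ratings for each (user, movie) pair and pivot the movies into columns;
    # pairs without a rating become 0
    interactions = (df_rating.groupby(['userId', 'movieId'])['rating']
                    .sum()
                    .unstack()
                    .fillna(0))
    return interactions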

df_interactions = create_interaction_matrix(df_ratings)
df_interactions.head(10)
movieId 1 2 3 4 5 6 7 8 9 10 ... 117590 118696 125916
userId
1 0.0 3.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
2 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
3 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 4.0 ... 0.0 0.0 0.0
5 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
6 5.0 0.0 3.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
7 0.0 0.0 3.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
8 4.0 0.0 5.0 0.0 0.0 3.0 0.0 0.0 0.0 4.0 ... 0.0 0.0 0.0
9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
10 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
10 rows × 6471 columns
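The same user × movie matrix can also be built in a single call with DataFrame.pivot_table(-), which aggregates duplicate pairs (here by summing) and fills missing pairs with 0. A sketch on a few hypothetical rows in the same format as df_ratings:

```python
import pandas as pd

# Hypothetical ratings in the same long format as df_ratings
df_toy = pd.DataFrame({"userId": [1, 1, 2],
                       "movieId": [2, 29, 2],
                       "rating": [3.5, 3.5, 4.0]})

# Rows are users, columns are movies, cells are summed ratings (0 if unrated)
df_wide = df_toy.pivot_table(index="userId", columns="movieId",
                             values="rating", aggfunc="sum", fill_value=0)
print(df_wide)
```

For the full ratings file, a dense matrix like this can become very large, so building a scipy sparse matrix directly is usually preferable.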
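
We then train a LightFM model on this matrix. Since there are no user or item features here, the model learns one embedding per user and one per movie:

from scipy import sparse
from lightfm import LightFM

def train(df_interactions, n_components=30, loss='warp', k=15, epoch=20, n_jobs=4):
    # LightFM expects a sparse user x item interaction matrix
    x = sparse.csr_matrix(df_interactions.values)
    # Note: k is only used by the 'warp-kos' loss and has no effect with 'warp'
    model = LightFM(no_components=n_components, loss=loss, k=k)
    model.fit(x, epochs=epoch, num_threads=n_jobs)
    return model

model = train(df_interactions=df_interactions)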

Let's create a user dictionary that maps each user ID (1 to 91) to its row index in the interaction matrix:

dict_users = create_user_dict(df_interactions)
dict_users
{1: 0,
2: 1,
3: 2,
4: 3,
5: 4,

Note that there are 91 users in total.

Let's also create a movie dictionary that maps the movie ID to the movie title:

dict_movies = create_dict_movies(df_movies=df_movies)
dict_movies
{1: 'Toy Story (1995)',
2: 'Jumanji (1995)',
3: 'Grumpier Old Men (1995)',
4: 'Waiting to Exhale (1995)',
5: 'Father of the Bride Part II (1995)',

Note that dict_movies covers all 27278 movies in the catalogue, even though only a subset of them appears in our interaction matrix.
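The helpers create_user_dict(-) and create_dict_movies(-) are not defined in this guide. Below is a minimal sketch consistent with the outputs above, assuming that df_movies was loaded from the MovieLens movie file with movieId and title columns; the toy DataFrames are hypothetical.

```python
import pandas as pd

def create_user_dict(df_interactions):
    # Map each original user ID to its row position in the interaction matrix
    return {user_id: index for index, user_id in enumerate(df_interactions.index)}

def create_dict_movies(df_movies):
    # Map each movie ID to its title
    return dict(zip(df_movies["movieId"], df_movies["title"]))

# Hypothetical inputs with the same structure as df_interactions and df_movies
df_interactions_toy = pd.DataFrame([[0.0, 3.5], [4.0, 0.0]],
                                   index=pd.Index([1, 2], name="userId"),
                                   columns=pd.Index([1, 2], name="movieId"))
df_movies_toy = pd.DataFrame({"movieId": [1, 2],
                              "title": ["Toy Story (1995)", "Jumanji (1995)"]})

print(create_user_dict(df_interactions_toy))
print(create_dict_movies(df_movies_toy))
```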

Recommending movies to a user

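We can now obtain a score for every movie in the interaction matrix (2889 scores in this case) for the user with ID 20:

int_user_id = 20
int_user_index = dict_users[int_user_id]
n_movies = df_interactions.shape[1]  # number of movies (columns) in the interaction matrix
list_int_all_movie_indexes = np.arange(n_movies)  # [0,1,2,...,2888]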
list_float_scores = model.predict(int_user_index, list_int_all_movie_indexes)
list_float_scores
[-2.4829762 -4.9836407 -4.567343 ... -7.8306265 -5.070689 -5.515493 ]

We then convert the list into a Pandas Series and set the index to the corresponding movie ID:

series_float_scores = pd.Series(list_float_scores)
series_float_scores.index = df_interactions.columns
series_float_scores
movieId
1 -2.482976
2 -4.983641
3 -4.567343
4 -6.787599
5 -3.771592
...
116797 -7.433063
117511 -6.747071
117590 -7.830626
118696 -5.070689
125916 -5.515493
Length: 2889, dtype: float32

We then sort the movie recommendations in descending order of score:

series_float_scores.sort_values(ascending=False, inplace=True)
series_float_scores
movieId
588 1.637854
53125 0.056910
595 -0.019940
5816 -0.071756
4306 -0.114593
...
3035 -8.781129
1632 -8.905715
446 -9.055195
106920 -9.316463
986 -9.662092
Length: 2889, dtype: float32

Finally, we refer to the dict_movies to map the movie IDs to the movie titles:

n_rec_movies = 6  # number of movies to recommend
index_int_top_n_movie_ids = series_float_scores.index[:n_rec_movies]
index_str_top_n_movie_titles = index_int_top_n_movie_ids.map(lambda int_movie_id: dict_movies[int_movie_id])
print(index_str_top_n_movie_titles)
Index(['Aladdin (1992)', 'Pirates of the Caribbean: At World's End (2007)',
'Beauty and the Beast (1991)',
'Harry Potter and the Chamber of Secrets (2002)', 'Shrek (2001)',
'Ace Ventura: Pet Detective (1994)'],
dtype='object', name='movieId')

Item-to-item recommendation

Since this model was trained without item features, the embedding of each movie is simply a row of the model's item_embeddings attribute:

np_2d_item_embeddings = model.item_embeddings
print(np_2d_item_embeddings.shape) # (2889, 30)
np_2d_item_embeddings
array([[-0.09092457, -0.0838875 , -0.12737845, ..., 0.15028293,
0.0911804 , -0.10203753],
[-0.19036956, -0.434018 , -0.01351528, ..., 0.20598511,
0.08963097, -0.65899724],
[-0.32468513, -0.39591578, 0.28006303, ..., 0.04760544,
0.38568467, 0.04577473],
...,
[ 0.22653413, 0.3293239 , 0.03833063, ..., -0.00624772,
-0.06222142, -0.22419415],
[ 0.27651408, -0.17156073, 0.3964168 , ..., 0.03881634,
-0.27182811, 0.00873393],
[ 0.30261683, -0.16283555, 0.41420266, ..., -0.04435388,
-0.2530296 , -0.03735547]], dtype=float32)

We then compute the cosine similarity between every pair of movie embeddings:

np_2d_similarities = cosine_similarity(np_2d_item_embeddings)
print(np_2d_similarities.shape) # (2889, 2889)
print(np_2d_similarities)
[[ 1. 0.40566397 0.3469511 ... -0.36663648 -0.6410434
-0.6627071 ]
[ 0.40566397 1. 0.21013616 ... -0.06052235 -0.31996462
-0.31183812]
[ 0.3469511 0.21013616 1.0000001 ... 0.08269963 -0.05860725
-0.06574367]
...
[-0.36663648 -0.06052235 0.08269963 ... 1. 0.50560886
0.52789694]
[-0.6410434 -0.31996462 -0.05860725 ... 0.50560886 1.
0.98076695]
[-0.6627071 -0.31183812 -0.06574367 ... 0.52789694 0.98076695
1. ]]

As before, we store the similarities in a DataFrame whose rows and columns are labelled by movie ID:

df_movie_movie_embedding_matrix = pd.DataFrame(np_2d_similarities)
df_movie_movie_embedding_matrix.columns = df_interactions.columns
df_movie_movie_embedding_matrix.index = df_interactions.columns
df_movie_movie_embedding_matrix
movieId 1 2 3 4 5 6 7 8 9 10 ... 111921 112138 112290 112556 112852 116797 117511 117590 118696 125916
movieId
1 1.000000 0.405664 0.346951 -0.294259 0.437728 0.512554 0.353473 -0.107797 -0.230798 0.638173 ... -0.627231 -0.352069 -0.387219 -0.437410 -0.560808 -0.337960 -0.561736 -0.366636 -0.641043 -0.662707
2 0.405664 1.000000 0.210136 -0.176198 0.409218 0.277941 0.297220 0.331275 0.292498 0.389678 ... -0.160900 -0.048249 -0.116282 -0.203658 -0.221201 0.048154 -0.237367 -0.060522 -0.319965 -0.311838
3 0.346951 0.210136 1.000000 -0.298944 0.407445 0.088106 0.760060 0.411049 0.341809 0.552191 ... 0.000870 0.126664 -0.045766 0.188171 -0.071493 0.121961 0.044734 0.082700 -0.058607 -0.065744
4 -0.294259 -0.176198 -0.298944 1.000000 -0.614848 -0.435423 -0.160393 -0.232857 -0.387155 -0.672230 ... 0.225883 0.120840 0.320404 0.122739 0.215385 0.198091 0.236011 0.169824 0.257572 0.292765
5 0.437728 0.409218 0.407445 -0.614848 1.000000 0.715999 0.458008 0.280278 0.327384 0.731787 ... -0.381800 -0.347734 -0.546228 -0.333948 -0.496239 -0.410618 -0.543472 -0.406298 -0.521836 -0.536075
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
116797 -0.337960 0.048154 0.121961 0.198091 -0.410618 -0.586303 0.123174 0.118297 0.303710 -0.389772 ... 0.824818 0.945439 0.890395 0.892665 0.374194 1.000000 0.873650 0.947438 0.465295 0.490205
117511 -0.561736 -0.237367 0.044734 0.236011 -0.543472 -0.798511 -0.054796 0.180001 0.286856 -0.523803 ... 0.870660 0.874688 0.848889 0.903736 0.656449 0.873650 1.000000 0.879492 0.747042 0.749149
117590 -0.366636 -0.060522 0.082700 0.169824 -0.406298 -0.576917 0.097847 0.076889 0.306317 -0.424767 ... 0.840736 0.963988 0.956530 0.882439 0.364814 0.947438 0.879492 1.000000 0.505609 0.527897
118696 -0.641043 -0.319965 -0.058607 0.257572 -0.521836 -0.868977 -0.265207 0.194165 -0.009919 -0.602092 ... 0.720856 0.414643 0.517941 0.485705 0.943585 0.465295 0.747042 0.505609 1.000000 0.980767
125916 -0.662707 -0.311838 -0.065744 0.292765 -0.536075 -0.845708 -0.262634 0.177577 0.032490 -0.608097 ... 0.771162 0.439405 0.544692 0.493888 0.938071 0.490205 0.749149 0.527897 0.980767 1.000000
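
To find the movies most similar to a given movie, we sort that movie's row of similarities in descending order. Here, int_movie_id is the ID of the query movie (Star Wars: Episode II - Attack of the Clones, which appears first in the output since every movie is most similar to itself), so we take n_items + 1 entries:

n_items = 10  # number of similar movies to recommend
series_movie_scores = df_movie_movie_embedding_matrix.loc[int_movie_id,:]
index_top_n_rec_movie_ids = series_movie_scores.sort_values(ascending=False).head(n_items+1).index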
index_str_top_n_rec_movie_titles = index_top_n_rec_movie_ids.map(lambda int_movie_id: dict_movies[int_movie_id])
print(index_str_top_n_rec_movie_titles)
Index(['Star Wars: Episode II - Attack of the Clones (2002)',
'Spider-Man 2 (2004)', '50 First Dates (2004)',
'X-Men: The Last Stand (2006)', 'Matrix Revolutions, The (2003)',
'My Big Fat Greek Wedding (2002)',
'Harry Potter and the Chamber of Secrets (2002)', 'Aladdin (1992)',
'Signs (2002)',
'Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)',
'Pirates of the Caribbean: At World's End (2007)'],
dtype='object', name='movieId')
Published by Isshin Inada