lightfm

date: 2022-03-17
excerpt: About LightFM

About LightFM

Overview

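A recommendation library that frames the problem as matrix factorization
Supports AdaGrad and AdaDelta as learning-rate schedules (the learning_schedule parameter)
Supports several loss functions, including WARP (Weighted Approximate-Rank Pairwise), logistic, and BPR (Bayesian Personalised Ranking)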
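To make "matrix factorization form" concrete: LightFM represents each user and item as the sum of the embeddings of its features, and scores a pair as the dot product of the two representations plus a user bias and an item bias. When no metadata is supplied, LightFM uses identity feature matrices, so every user and item has exactly one feature (its own ID). The NumPy sketch below, with toy sizes and random, untrained parameters (all names are illustrative, not LightFM internals), shows that this reduces to plain matrix factorization with biases.

```python
import numpy as np

# Hypothetical toy sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
n_users, n_items, dim = 3, 4, 2

# Identity feature matrices: each user/item is described only by its own ID.
user_features = np.eye(n_users)
item_features = np.eye(n_items)

# One embedding vector and one bias per feature.
user_feature_emb = rng.normal(size=(n_users, dim))
item_feature_emb = rng.normal(size=(n_items, dim))
user_feature_bias = rng.normal(size=n_users)
item_feature_bias = rng.normal(size=n_items)

# Representations = feature values times feature embeddings (a weighted sum).
user_repr = user_features @ user_feature_emb   # shape (n_users, dim)
item_repr = item_features @ item_feature_emb   # shape (n_items, dim)
user_bias = user_features @ user_feature_bias  # shape (n_users,)
item_bias = item_features @ item_feature_bias  # shape (n_items,)

# Score matrix: latent dot product plus both biases, i.e. matrix factorization.
scores = user_repr @ item_repr.T + user_bias[:, None] + item_bias[None, :]
```

With identity features, `scores[u, i]` is just the dot product of user u's and item i's own embedding rows plus their biases.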

Example

Training on a movie dataset (MovieLens 100k)

Loading the data

from IPython.display import display  # already available inside Jupyter/Colab
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k

# Load the MovieLens 100k dataset. Only five
# star ratings are treated as positive.
data = fetch_movielens(min_rating=5.0)
display(data)
display(data["train"].shape)

"""
{'item_feature_labels': array(['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)', ...,
        'Sliding Doors (1998)', 'You So Crazy (1994)',
        'Scream of Stone (Schrei aus Stein) (1991)'], dtype=object),
 'item_features': <1682x1682 sparse matrix of type '<class 'numpy.float32'>'
 	with 1682 stored elements in Compressed Sparse Row format>,
 'item_labels': array(['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)', ...,
        'Sliding Doors (1998)', 'You So Crazy (1994)',
        'Scream of Stone (Schrei aus Stein) (1991)'], dtype=object),
 'test': <943x1682 sparse matrix of type '<class 'numpy.int32'>'
 	with 2153 stored elements in COOrdinate format>,
 'train': <943x1682 sparse matrix of type '<class 'numpy.int32'>'
 	with 19048 stored elements in COOrdinate format>}
(943, 1682)
"""

What the data looks like

display(data["train"].todense())

"""
matrix([[5., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [5., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 5., 0., ..., 0., 0., 0.]], dtype=float32)
"""

Training and evaluation

# Instantiate and train the model
model = LightFM(loss='warp')
model.fit(data['train'], epochs=30, num_threads=2)

# Evaluate the trained model
test_precision = precision_at_k(model, data['test'], k=5).mean()
display(test_precision) # 0.05019815
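`precision_at_k(..., k=5)` returns one value per user: the fraction of that user's 5 highest-scoring items that are positives in the test set; `.mean()` averages over users. The NumPy sketch below reimplements that idea on a dense toy score matrix. It is an illustration, not LightFM's implementation; the optional `train_positives` mask mimics excluding already-seen items (LightFM's `precision_at_k` exposes this via its `train_interactions` argument), and users without any test positives are dropped before averaging.

```python
import numpy as np

def precision_at_k_sketch(scores, test_positives, k=5, train_positives=None):
    """Per-user precision@k on dense arrays.

    scores:          (n_users, n_items) predicted scores
    test_positives:  boolean (n_users, n_items) mask of test interactions
    train_positives: optional boolean mask of items to exclude from the ranking
    """
    scores = np.asarray(scores, dtype=float).copy()
    if train_positives is not None:
        scores[train_positives] = -np.inf  # push already-seen items to the bottom
    top_k = np.argsort(-scores, axis=1)[:, :k]           # top-k item indices per user
    hits = np.take_along_axis(test_positives, top_k, axis=1)
    has_test = test_positives.any(axis=1)                # skip users with no test items
    return hits.mean(axis=1)[has_test]

# Toy example: 3 users, 4 items.
scores = np.array([[0.9, 0.1, 0.5, 0.3],
                   [0.2, 0.8, 0.4, 0.6],
                   [0.1, 0.2, 0.3, 0.4]])
test = np.array([[True, False, False, False],
                 [False, False, True, False],
                 [False, False, False, False]])
train = np.zeros_like(test)
train[1, 1] = True  # user 1 already interacted with item 1 in training

p = precision_at_k_sketch(scores, test, k=2)                               # [0.5, 0.0]
p_masked = precision_at_k_sketch(scores, test, k=2, train_positives=train)  # [0.5, 0.5]
```

Masking training items can change the metric noticeably, because items the user already rated tend to receive high scores.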

Inference

item_size = len(data["item_feature_labels"])
item_ids=np.array(range(item_size))
model.predict(user_ids=0, item_ids=item_ids)
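`model.predict` only returns one score per item; turning the scores into recommendations means sorting them and usually hiding items the user has already interacted with. A minimal sketch, assuming a 1-D score array like the one returned by `model.predict(user_ids=0, ...)`; the helper name `top_n_items` is hypothetical. With the MovieLens data, the known items for user 0 could be taken from `data["train"].tocsr()[0].indices` and the labels from `data["item_labels"]`.

```python
import numpy as np

def top_n_items(scores, known_items, labels, n=3):
    """Return labels of the n highest-scoring items the user has not interacted with."""
    scores = np.asarray(scores, dtype=float).copy()
    scores[known_items] = -np.inf          # hide items the user already rated
    best = np.argsort(-scores)[:n]          # indices in descending score order
    return [labels[i] for i in best]

# Toy usage with made-up scores; item 1 ("B") is treated as already seen.
recs = top_n_items([0.3, 2.1, -0.5, 1.4, 0.9], [1], ["A", "B", "C", "D", "E"], n=3)
print(recs)  # ['D', 'E', 'A']
```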

Google Colab: lightfm-minimal-example

Handling the cold-start problem

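# user_index: index of a known user, or None for a new (cold-start) user
# user_features: feature matrix whose row 0 describes the new user (same columns as in training)
item_ids = np.array(target_item_indices, dtype=np.int32)
if user_index is not None:
    predictions = model.predict(int(user_index), item_ids)  # a single int is broadcast to all items
else:
    predictions = model.predict(0, item_ids, user_features=user_features)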

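For a new user, pass a feature matrix describing that user via user_features. It must have the same feature columns as the user feature matrix used in training, which only works if the model was fit with user features; it can be built with lightfm.data.Dataset.build_user_features.
model.get_user_representations(features) does not build this matrix: it takes such a feature matrix and returns the resulting (user_biases, user_embeddings), which can also be used to score items directly.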
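Why this works for users with no interactions: a user's representation is the weighted sum of the embeddings of that user's features, so a brand-new user described only by metadata still gets a vector and a bias. The NumPy sketch below uses random, untrained parameters and a hypothetical 5-feature metadata vocabulary; the row normalisation mirrors `normalize=True` in `Dataset.build_user_features` (feature weights in a row sum to 1). It illustrates the idea and is not LightFM's code.

```python
import numpy as np

rng = np.random.default_rng(42)
n_user_features, n_items, dim = 5, 4, 3

# Hypothetical learned parameters for user metadata features (e.g. age band, country).
feature_emb = rng.normal(size=(n_user_features, dim))
feature_bias = rng.normal(size=n_user_features)
item_repr = rng.normal(size=(n_items, dim))
item_bias = rng.normal(size=n_items)

# A new user with no interactions, described only by features 1 and 3.
new_user_row = np.zeros(n_user_features)
new_user_row[[1, 3]] = 1.0
new_user_row /= new_user_row.sum()  # normalise so the feature weights sum to 1

# Representation and bias are weighted sums over the user's features.
user_vec = new_user_row @ feature_emb
user_b = new_user_row @ feature_bias

# Same scoring rule as for known users: dot product plus both biases.
scores = item_repr @ user_vec + user_b + item_bias
```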
References
- LightFM: handling user and item cold-start
