IIC 3633 - Recommender Systems - PUC Chile

Denis Parra
Assistant Professor, DCC, PUC Chile

  • Until a few years ago, the vast majority of advanced recommendation models, based on matrix factorization, depended on explicit user preferences in the form of ratings.
  • But ratings (explicit feedback) are hard to obtain.
  • Alternatively, we can use implicit feedback, which has the following problems:
    • There is no negative feedback.
    • It is noisy.
    • It is hard to quantify preference and the confidence in those preferences.
    • Suitable evaluation metrics are lacking (RMSE and MAE would not work well).

Paper 1

Hu, Y., Koren, Y., & Volinsky, C. (2008).

Collaborative filtering for implicit feedback datasets.

In ICDM'08. Eighth IEEE International Conference on Data Mining (pp. 263-272).

Ratings: a scarce resource

  • Although SVD++ takes implicit feedback into account, this model is optimized specifically for implicit feedback.
  • It starts from binary values indicating whether an item was consumed or not: \(p_{ui} = 1\) if \(r_{ui} > 0\), and \(p_{ui} = 0\) if \(r_{ui} = 0\).

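Implicit Feedback Model - Hu et al.

  • The confidence in observing \(p_{ui}\) is also modeled, through the variable \(c_{ui}\) (\(\alpha\) = 40, using cross-validation):

    \(c_{ui} = 1 + \alpha r_{ui}\)

    \(r_{ui}\) is, in this case, the implicit feedback (e.g. plays)

  • The function we want to minimize is then

    \(\min_{x_*, y_*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)\)

    where \(x_u\) and \(y_i\) are the latent factor vectors of user \(u\) and item \(i\).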

Implicit Feedback Model - Hu et al. II

  • Parameter learning (latent factors): ALS instead of SGD.
  • \(c_{ui}\) can take different forms. One alternative is

    \(c_{ui} = 1 + \alpha \log(1 + r_{ui} / \epsilon)\)

  • In this way, the implicit feedback \(r_{ui}\) is decomposed into \(p_{ui}\) (preferences) and \(c_{ui}\) (confidence level).
  • The model handles all user-item combinations (n * m) in linear time by exploiting the algebraic structure of the variables.


  • Digital TV service; data collected from 300,000 set-top boxes.
  • Over a 4-week period, 17,000 unique TV programs.
  • \(r_{ui}\): how many times user \(u\) watched program \(i\) during the 4-week period
  • After data aggregation and cleaning, \(|r_{ui}|\) (number of non-zero values): 32 million

Evaluation and results

  • \(rank_{ui}\): percentile ranking of program \(i\) in the recommendation list of user \(u\).
  • If \(rank_{ui}\) = 0%, program \(i\) has been predicted as the most relevant for user \(u\); if \(rank_{ui}\) = 100%, program \(i\) is the least desired.
  • Expected percentile ranking \(\overline{rank}\): the smaller the better
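As a rough sketch of how this metric can be computed: Hu et al. define the expected percentile ranking as the average of \(rank_{ui}\) weighted by the feedback observed in the test period, i.e. \(\sum_{u,i} r^t_{ui} \, rank_{ui} / \sum_{u,i} r^t_{ui}\). The function name and the toy arrays below are hypothetical, and the result is returned as a fraction in [0, 1] rather than a percentage.

```python
import numpy as np

def expected_percentile_rank(scores, r_test):
    """Weighted mean percentile rank; 0.0 = always on top, 1.0 = always last."""
    n_users, n_items = scores.shape
    # order[u] lists item ids from best to worst predicted score for user u.
    order = np.argsort(-scores, axis=1)
    # positions[u, i] = place of item i in user u's list (0 = top recommendation).
    positions = np.empty_like(order)
    rows = np.arange(n_users)[:, None]
    positions[rows, order] = np.arange(n_items)[None, :]
    pct = positions / (n_items - 1)
    # Weight each percentile by the amount of test-period feedback r^t_ui.
    return float((r_test * pct).sum() / r_test.sum())

# Made-up example: one user, three items.
scores = np.array([[0.9, 0.1, 0.5]])
r_test = np.array([[1.0, 0.0, 1.0]])
epr = expected_percentile_rank(scores, r_test)
```

Under this definition a random ordering yields an expected value of about 0.5 (50%), which is the natural baseline when reading the results.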

Results I

Results II

Paper 2

Parra, D., & Amatriain, X. (2011).
Walk the Talk: Analyzing the Relation between Implicit and Explicit Feedback for Preference Elicitation.
In User Modeling, Adaptation and Personalization (pp. 255-268). Springer Berlin Heidelberg.

  • Is it possible to map implicit behavior to explicit preference (ratings)?
  • Which variables better account for the number of times a user listens to online albums? [Baltrunas & Amatriain CARS ‘09 workshop – RecSys 2009.]
  • OUR APPROACH: Study with users
    • Part I: Ask users to rate 100 albums (how to sample)
    • Part II: Build a model to map collected implicit feedback and context to explicit feedback

Walk the Talk (2011)

Walk the Talk - II

  • Requirements to participate in the study: age > 18, scrobblings > 5000

Data Sampling for the User Study

  • How many and which items (albums) should be shown to users?
    • Implicit Feedback (IF): playcount for a user on a given album. Changed to scale [1-3], 3 means being more listened to.
    • Global Popularity (GP): global playcount for all users on a given album. Changed to scale [1-3], 3 means being more listened to.
    • Recentness (RE): time elapsed since user played a given album. Changed to scale [1-3], 3 means being listened to more recently.
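The slides do not say how raw values were mapped to the [1-3] scale. A minimal sketch, assuming a tercile split (an assumption made here for illustration, not the study's documented procedure), could look like the following; `to_three_levels` and the sample values are hypothetical.

```python
import numpy as np

def to_three_levels(values):
    """Map raw values to levels 1, 2, 3 using tercile cut points (assumed binning)."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [1 / 3, 2 / 3])
    # digitize returns 0, 1 or 2 depending on which tercile the value falls in.
    return np.digitize(values, cuts, right=True) + 1

# Made-up playcounts: larger playcount -> higher IF level.
playcounts = [1, 2, 3, 4, 5, 6, 7, 8, 9]
if_levels = to_three_levels(playcounts)

# For recentness the raw variable is elapsed time, so the scale is flipped:
# the smallest elapsed time (most recent play) should receive level 3.
days_since_play = [1, 2, 3, 4, 5, 6, 7, 8, 9]
re_levels = 4 - to_three_levels(days_since_play)
```

The same helper can be applied to global playcounts to obtain GP levels.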

Regression Analysis

  • Including Recentness increases R2 by more than 10% [ 1 -> 2]
  • Including GP increases R2, but not by much compared to RE + IF [ 1 -> 3]
  • Not including GP, but including the interaction between IF and RE, increases the variance of the dependent variable (DV) explained by the regression model. [ 2 -> 4 ]
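The model comparison above can be sketched as four nested ordinary least squares fits. The data here are synthetic (randomly generated, with arbitrary coefficients chosen only for illustration), so the resulting R2 values say nothing about the study itself; the helper `r_squared` and the model labels are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return float(1.0 - (resid @ resid) / (centered @ centered))

# Synthetic predictors on the [1-3] scale and a synthetic rating (not study data).
rng = np.random.default_rng(0)
n = 500
IF = rng.integers(1, 4, n).astype(float)
RE = rng.integers(1, 4, n).astype(float)
GP = rng.integers(1, 4, n).astype(float)
rating = 1.0 + 0.6 * IF + 0.4 * RE + 0.1 * IF * RE + rng.normal(0.0, 0.5, n)

designs = {
    "1: IF": [IF],
    "2: IF + RE": [IF, RE],
    "3: IF + RE + GP": [IF, RE, GP],
    "4: IF + RE + IF*RE": [IF, RE, IF * RE],
}
r2 = {name: r_squared(np.column_stack(cols), rating) for name, cols in designs.items()}
for name, value in r2.items():
    print(f"model {name}: R2 = {value:.3f}")
```

Because the models are nested, in-sample R2 can only stay equal or grow as terms are added, which is why the slides focus on how large each increase is.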

Regression Analysis 2

  • RMSE1: includes the ratings = 0.
  • We tested conclusions of regression analysis by predicting the score, checking RMSE in 10-fold cross validation.
  • Results of regression analysis are supported.
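A minimal sketch of this kind of check, assuming a plain linear model fitted by least squares on each training split: the function `kfold_rmse` and the toy data are hypothetical and only illustrate the mechanics of 10-fold cross-validated RMSE.

```python
import numpy as np

def kfold_rmse(X, y, k=10, seed=0):
    """Mean RMSE of an OLS model over k cross-validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    rmses = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # everything outside the held-out fold
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        err = y[fold] - X1[fold] @ beta
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(rmses))

# Toy data (made up): a perfectly linear relation, so the held-out error is ~0.
x = np.arange(50, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * x[:, 0]
cv_rmse = kfold_rmse(x, y)
```

Comparing this held-out RMSE across the candidate models is one way to confirm that an R2 gain is not just an artifact of adding parameters.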

Conclusions of Part I

  • Using a linear model, Implicit feedback and recentness can help to predict explicit feedback (in the form of ratings)
  • Global popularity doesn’t show a significant improvement in the prediction task
  • Our model can help to relate implicit and explicit feedback, helping to evaluate and compare explicit and implicit recommender systems.

Part II

  • Implicit Feedback Recommendation via Implicit-to-Explicit OLR Mapping (RecSys 2011, CARS Workshop)
    • Consider ratings as ordinal variables
    • Use mixed models to account for non-independence of observations
    • Compare with state-of-the-art implicit feedback algorithm

Assumptions in Study I

  • Linear Regression did not account for the nested nature of ratings

  • And ratings were treated as continuous, when they are actually ordinal.

Modelo II: Ordinal Logistic Regression

  • Actually Mixed-Effects Ordinal Multinomial Logistic Regression
  • Mixed-effects: Nested nature of ratings
  • We obtain a distribution over ratings (ordinal multinomial) for each (USER, ITEM) pair -> we predict the rating using the expected value. … And we can compare the inferred ratings with a method that directly uses implicit information (playcounts) to recommend (Hu, Koren & Volinsky, 2008)
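The "expected value" step can be sketched with a standard cumulative-logit (proportional odds) formulation, where \(P(Y \le k) = \sigma(\theta_k - \eta)\). Everything concrete below is an assumption for illustration: the thresholds, the 1-5 rating levels, and the idea that \(\eta\) already combines the fixed effects (e.g. IF, RE) with a per-user random intercept.

```python
import numpy as np

def rating_distribution(eta, thresholds):
    """Category probabilities of a cumulative-logit model: P(Y <= k) = sigmoid(theta_k - eta)."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(thresholds, dtype=float) - eta)))
    cum = np.concatenate([cum, [1.0]])      # the last category closes the distribution
    return np.diff(cum, prepend=0.0)        # P(Y = k) = P(Y <= k) - P(Y <= k-1)

def expected_rating(eta, thresholds, levels):
    """Predicted rating = expected value of the ordinal distribution."""
    return float(np.dot(rating_distribution(eta, thresholds), levels))

# Hypothetical thresholds for a 5-level scale; eta would be, e.g.,
# b_if * IF + b_re * RE + user_intercept in a mixed-effects fit.
thresholds = [-2.0, -1.0, 1.0, 2.0]
levels = np.array([1, 2, 3, 4, 5])
neutral = expected_rating(0.0, thresholds, levels)
engaged = expected_rating(1.5, thresholds, levels)
```

Taking the expected value turns the per-pair distribution into a single score that can be ranked, which is what makes the comparison with a playcount-based recommender possible.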

Ordinal Logistic Regression Mapping

  • Model

  • Predicted values


  • D1: users, albums, if, re, gp, ratings, demographics/consumption
  • D2: users, albums, if, re, gp, NO RATINGS.


Conclusions and current work

Paper 3

Xing Yi, Liangjie Hong, Erheng Zhong, Nathan Nan Liu, and Suju Rajan. 2014.

Beyond clicks: dwell time for personalization.

ACM RecSys 2014.

Dwell Time

  • Method to compute fine-grained dwell time at web scale
    • Focus Blur (FB) and Last Event (LE) methods: server-side methods
    • Focus Blur is closer to the client-side measurement, so it is the one used
  • Dwell time varies by device (correlation between)
  • Raw dwell-time distributions vary considerably by content type, but the distributions of log dwell time are at least bell-shaped

Dwell Time II

  • Challenge: dwell-time normalization, to extract an engagement signal that is comparable across devices -> they normalize
    • Option 1: dwell time is used in a learning-to-rank approach (with dwell time as the target) to rank items
    • Evaluation on Yahoo! logs
    • Option 2: use dwell time directly in CF-based recommendation
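The slides do not spell out the normalization. One plausible sketch, given that log dwell times are roughly bell-shaped, is a z-score of log dwell time computed separately for each (device, content type) group. This is an assumption made here for illustration rather than the paper's exact procedure; `normalize_dwell` and the sample records are hypothetical.

```python
import numpy as np
from collections import defaultdict

def normalize_dwell(records):
    """records: list of (device, content_type, dwell_seconds) -> list of z-scores."""
    logs = defaultdict(list)
    for device, ctype, secs in records:
        logs[(device, ctype)].append(np.log(secs))
    # Mean and standard deviation of log dwell time within each group.
    stats = {key: (np.mean(vals), np.std(vals)) for key, vals in logs.items()}
    normalized = []
    for device, ctype, secs in records:
        mu, sigma = stats[(device, ctype)]
        normalized.append(float((np.log(secs) - mu) / sigma) if sigma > 0 else 0.0)
    return normalized

# Made-up log lines: desktop readers dwell longer in absolute terms.
records = [
    ("desktop", "article", 10.0), ("desktop", "article", 100.0), ("desktop", "article", 1000.0),
    ("mobile", "article", 5.0), ("mobile", "article", 50.0), ("mobile", "article", 500.0),
]
z = normalize_dwell(records)
```

After this step a 100-second desktop read and a 50-second mobile read both map to roughly zero, that is, typical engagement for their own group, which is the kind of cross-device comparability the challenge asks for.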

Events: Server- and Client-Side

Dwell Time across Devices

Dwell Time vs. Article Length

Dwell Time vs. Number of Photos

Slideshows across Devices

Video Consumption across Devices




