How to Build a User-Creator Affinity Model for a Short Video Platform

Written by shauryauppal | Published 2022/09/05
Tech Story Tags: recommendation-systems | machine-learning | data-science | entertainment | tiktok | artificial-intelligence | graph-data | user-creator-affinity | web-monetization

TLDR: Short videos have become the new darling of the digital mediascape. After the internet boom in India, many influencers are emerging daily. We all have our favorite creators and can spend hours watching their content. For a platform like ours, we needed a user-creator affinity recommendation model. We recommend creator stories to users based on an affinity (likeability) factor, where a consumer's (user's) likeability for a creator is defined by Follow, Profile Visit, Like, Comment, Share, etc. We divided the problem into two parts: first, find the true high-affinity creators for a consumer using MCDM; then, find creators similar to those high-affinity creators.

Abstract

In the last couple of years, short videos have become the new darling of the digital mediascape. After the internet boom in India, many influencers are emerging daily. We all have our favorite creators and can spend hours watching their content. For a platform like ours, we needed a user-creator affinity recommendation model that recommends creator stories to users based on an affinity (likeability) factor, where a consumer's (user's) likeability for a creator is defined by Follow, Profile Visit, Like, Comment, Share, etc.

Overview

Affinity means a natural liking for and understanding of someone or something. Affinity is temporal: it changes with time and with a user's interest niche. Our goal is to capture user-creator affinity strength, which also captures users' interest niches, i.e., what type of stories a consumer (user) prefers.

Business Goals

  1. Improve the stories recommendation algorithm so that users' session time increases.
  2. Improve users' niche discovery for content.
  3. Improve the visibility of long-tail creators and content discovery based on the consumer's likeability factor.

Expected Outcome

Recommend a list of story IDs from creators with whom the consumer's affinity is high.

NOTE: A creator is also a user on the platform. Hence, I will refer to users who watch a creator's videos as consumers.

Interaction between Creator-Consumer on Roposo

Out of these different consumer-creator interactions, we picked the profile visit as the stronger signal for mapping out similarity between creators.

Approach

High-Level Approach

We divided the problem into 2 parts:

  1. For each consumer (user), find the top K creators with the highest True Affinity score using TOPSIS, a Multi-Criteria Decision Making (MCDM) technique, where affinity is defined by like, follow, profile visit, comment, loop_count, perc_seen (percentage of the video seen relative to its duration), etc.
  2. After finding these True Top K creators with MCDM, we take their embeddings (computed with the node2vec embedding methodology), find their nearest neighbors, and recommend those similar creators.

In summary: First, we find the true high-affinity creators for a consumer based on MCDM. Then we find creators similar to those high-affinity creators.

Implementation Details (TLDR): refer to the Approach figure above for the steps

Step 1: Finding the True Top Affinity Creators for a Consumer (User) from Interactions

This is a multi-criteria decision-making (MCDM), or multi-criteria decision analysis (MCDA), problem: for each consumer (user), we want to rank all the creators the consumer interacted with in the last 30 days.

Consumer-Creator Interaction Data

We use TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), an MCDM algorithm, to rank creators in order of affinity. TOPSIS is based on the concept that the chosen alternative should have the shortest geometric distance from the ideal solution and the longest geometric distance from the worst (negative-ideal) solution.
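The TOPSIS ranking can be sketched in plain Python so each step is visible: normalise every criterion, weight it, find the ideal and worst values, then score each creator by relative closeness to the ideal. This is a minimal sketch, not the author's code (which points to Scikit-Criteria); the criteria columns, weights and interaction values below are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) with TOPSIS; higher closeness means closer to the ideal.

    matrix:  one row per creator, one column per affinity criterion
    weights: one weight per criterion
    benefit: True if a higher value is better for that criterion, False if lower is better
    """
    n_crit = len(weights)
    # 1. Vector-normalise every criterion column (guard against all-zero columns).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    # 2. Scale each normalised value by its criterion weight.
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 3. Ideal (best) and negative-ideal (worst) value of every criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti_ideal = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        # 4. Euclidean distance to both reference points.
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        # 5. Relative closeness to the ideal solution, in [0, 1].
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical 30-day interaction summary of one consumer with three creators.
# Columns: [profile_visits, likes, avg_perc_seen]
creators = ["creator_a", "creator_b", "creator_c"]
interactions = [
    [5, 12, 0.9],
    [1, 2, 0.4],
    [3, 7, 0.7],
]
scores = topsis(interactions, weights=[0.4, 0.3, 0.3], benefit=[True, True, True])
ranked = sorted(zip(creators, scores), key=lambda cs: cs[1], reverse=True)
print(ranked)  # creator_a first, creator_c second, creator_b last
```

Here creator_a is best on every criterion, so it coincides with the ideal and scores 1.0, while creator_b coincides with the worst solution and scores 0.0. Cost-type criteria (where lower is better) are handled by passing `False` in `benefit`.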

Scikit-Criteria: Link

One can check out my blogs to get a detailed understanding of MCDM: Ranking of entities with Multi-Criteria Decision Making Methods (MCDM) — Part One | Ranking and Selection of the best with Multi-Criteria Decision Making (MCDM) — Part Two

Summary: After Step 1, for every consumer we have the set of creators the consumer interacted with in the last 30 days, ranked by affinity factors.

Step 2 & Step 3: Creator Graph (Profile Visits to Embedding)

We constructed a creator-creator graph based on consumers' profile visits. Two creators are connected when a consumer's visits to their profiles co-occurred on the same day.

The edge weights were defined by co-occurrence strength, i.e., the number of times a consumer's profile visits to the two creators co-occurred.
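The graph construction can be sketched as grouping profile visits into (consumer, day) buckets and counting every creator pair that shares a bucket. This is an illustrative sketch, assuming a hypothetical `(consumer_id, day, creator_id)` event format and assuming repeat visits to the same creator within one bucket count once; `build_cooccurrence_graph` is a hypothetical name.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(visits):
    """Build weighted creator-creator edges from profile-visit events.

    visits: iterable of (consumer_id, day, creator_id) tuples.
    Returns {(creator_u, creator_v): weight} with creator_u < creator_v, where the weight
    is the number of (consumer, day) buckets in which both profiles were visited.
    """
    # Group the creators each consumer visited on each day (repeat visits collapse).
    buckets = defaultdict(set)
    for consumer_id, day, creator_id in visits:
        buckets[(consumer_id, day)].add(creator_id)
    edges = Counter()
    for creators in buckets.values():
        # Every pair of creators visited in the same bucket co-occurs once.
        for u, v in combinations(sorted(creators), 2):
            edges[(u, v)] += 1
    return dict(edges)


# Hypothetical profile-visit log.
visits = [
    ("u1", "2022-09-01", "c1"), ("u1", "2022-09-01", "c2"), ("u1", "2022-09-01", "c3"),
    ("u2", "2022-09-01", "c1"), ("u2", "2022-09-01", "c2"),
    ("u1", "2022-09-02", "c3"),
]
graph = build_cooccurrence_graph(visits)
print(graph)  # {('c1', 'c2'): 2, ('c1', 'c3'): 1, ('c2', 'c3'): 1}
```

The pair (c1, c2) gets weight 2 because two different consumers visited both profiles on the same day, while u1's lone visit on 2022-09-02 creates no edge.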

We computed the creator embeddings with Node2vec+ (Paper Link), which trains a word2vec skip-gram model on random walks over the graph.
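The walk-generation half of this step can be sketched with the original node2vec second-order bias; note this is plain node2vec, not Node2vec+, which changes how edge weights affect the bias. The edges, walk count and walk length below are hypothetical, and the helper names are illustrative.

```python
import random


def to_adjacency(edges):
    """Turn {(u, v): weight} into an undirected adjacency map {u: {v: weight}}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk (original node2vec bias, not Node2vec+)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = list(adj[cur])
        if not neighbours:
            break
        if len(walk) == 1:
            # First step: sample proportionally to edge weight.
            weights = [adj[cur][x] for x in neighbours]
        else:
            prev = walk[-2]
            weights = []
            for x in neighbours:
                w = adj[cur][x]
                if x == prev:            # distance 0 from prev: return parameter p
                    weights.append(w / p)
                elif x in adj[prev]:     # distance 1 from prev: unbiased
                    weights.append(w)
                else:                    # distance 2 from prev: in-out parameter q
                    weights.append(w / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p, q, seed=0):
    """Generate num_walks walks per node, shuffling the start order each round."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


# Hypothetical weighted co-occurrence edges between creators.
edges = {("c1", "c2"): 2, ("c1", "c3"): 1, ("c2", "c3"): 1, ("c3", "c4"): 1}
adj = to_adjacency(edges)
walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
```

The walks then serve as "sentences" for the skip-gram step; for example, gensim's `Word2Vec(walks, vector_size=..., window=..., sg=1, min_count=1)` trains skip-gram embeddings on them. The embedding dimension and window size are not given in the original.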

Node2vec Params: How to set p and q?

In the figure, the top and bottom panels correspond to node2vec embeddings generated with q = 0.5 and q = 2, respectively. In the top panel, nodes that fall into the same local network neighborhood (i.e., homophily) are colored the same. In the bottom panel, by contrast, structurally equivalent nodes are colored the same.

In this setting, with p = 1 and q = 0.5, node2vec discovers clusters (communities) of characters that frequently interact with each other, since the edges between nodes are based on co-appearances.
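For reference, this is the search bias from the original node2vec paper. Here t is the node the walk just came from, v the current node, x a candidate next node (with (v, x) an edge), w_{vx} the edge weight, d_{tx} the shortest-path distance between t and x, and Z a normalising constant. Node2vec+ keeps p and q but refines how weighted edges enter the bias, so treat this as the baseline definition rather than the exact Node2vec+ rule.

```latex
P(c_{i} = x \mid c_{i-1} = v) = \frac{\pi_{vx}}{Z}, \qquad
\pi_{vx} = \alpha_{pq}(t, x)\, w_{vx}, \qquad
\alpha_{pq}(t, x) =
\begin{cases}
  1/p & \text{if } d_{tx} = 0 \\
  1   & \text{if } d_{tx} = 1 \\
  1/q & \text{if } d_{tx} = 2
\end{cases}
```

With p = 1 and q = 0.5 the three cases get factors 1, 1 and 2, so a step that moves away from t gets twice the unnormalised weight of returning or staying near t. This outward, DFS-like bias is what the node2vec paper associates with homophily-style communities.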

Summary: After Steps 2 and 3, we have creator embeddings computed from a creator-creator graph built on profile-visit co-occurrence.

Step 4: Recommending Top Creators

We now have the true consumer (user)-creator affinity ranking from MCDM, and we have embeddings of all (active) creators on our platform.

We pick the top 5 True Affinity Creators from the MCDM ranking and retrieve their nearest neighbours to get the top 100 high-affinity creators.

Why were the top 5 True Affinity Creators picked as query vectors? Why not pick only the top 1, or average the top 5 into a mean vector and show creators similar to that query vector in embedding space?

The idea of picking the top 5 creators is inspired by Pinterest's research paper PinnerSage.

A user cannot be represented by a single “interest” embedding. Even with movies, most people are interested in multiple genres, like horror, action, sci-fi, comedy, etc. To capture a user's multiple interests, we pick the top 5 creators from the ranked set.
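A toy example shows why a single mean vector can mislead. The 2-D embeddings below are hypothetical, not real platform data: the consumer's two true-affinity creators sit in a "horror" region and a "comedy" region. Averaging them yields a query whose nearest neighbour is an in-between creator matching neither interest, while one query per creator retrieves a neighbour from each interest.

```python
import math


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


# Hypothetical 2-D creator embeddings: two distinct interests plus an in-between creator.
creators = {
    "horror_1": (1.0, 0.0),
    "horror_2": (0.95, 0.05),
    "comedy_1": (0.0, 1.0),
    "comedy_2": (0.05, 0.95),
    "generic": (0.7, 0.7),
}
top_affinity = ["horror_1", "comedy_1"]  # the consumer's true-affinity creators


def nearest(query, exclude):
    """Most cosine-similar creator to the query, skipping excluded creators."""
    candidates = [c for c in creators if c not in exclude]
    return max(candidates, key=lambda c: cosine(query, creators[c]))


# Single averaged query: lands between the two interests.
mean_q = tuple(sum(creators[c][i] for c in top_affinity) / len(top_affinity) for i in range(2))
mean_pick = nearest(mean_q, exclude=top_affinity)
# One query per true-affinity creator: one neighbour per interest.
multi_picks = [nearest(creators[c], exclude=top_affinity) for c in top_affinity]
print(mean_pick, multi_picks)  # generic ['horror_2', 'comedy_2']
```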

For fast vector similarity search over the creator embeddings, we used ScaNN, an Approximate Nearest Neighbour (ANN) algorithm.
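Step 4 can be sketched with exact (brute-force) cosine search standing in for ScaNN; at platform scale an ANN index such as ScaNN would replace the inner loop. Two choices here are assumptions, not details from the original: a candidate's score is its highest similarity to any of the top-5 query creators, and the query creators themselves are excluded from the results. `recommend_creators` is a hypothetical name.

```python
import heapq
import math


def normalise(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def recommend_creators(embeddings, true_affinity_ranked, n_queries=5, top_n=100):
    """Multi-query nearest-neighbour retrieval over creator embeddings.

    embeddings: {creator_id: vector}
    true_affinity_ranked: the consumer's creators ranked by MCDM (best first)
    Returns up to top_n (creator_id, score) pairs, best first.
    """
    queries = [c for c in true_affinity_ranked[:n_queries] if c in embeddings]
    unit = {c: normalise(v) for c, v in embeddings.items()}
    best = {}
    for q in queries:
        for c, vec in unit.items():
            if c in queries:  # assumption: do not re-recommend the query creators
                continue
            sim = sum(a * b for a, b in zip(unit[q], vec))
            # Assumption: keep a candidate's best similarity over all query creators.
            if sim > best.get(c, float("-inf")):
                best[c] = sim
    return heapq.nlargest(top_n, best.items(), key=lambda kv: kv[1])


# Hypothetical 2-D creator embeddings.
embeddings = {
    "a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.2, 0.9], "e": [-1.0, 0.0],
}
recs = recommend_creators(embeddings, ["a", "c"], top_n=2)
print([creator for creator, _ in recs])  # ['b', 'd']
```

Keeping the maximum similarity per candidate lets each query creator contribute its own neighbours, which matches the multi-interest motivation; summing or averaging similarities would be alternative merge rules.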

Approach Summary (Recap)

  • We use MCDM to rank creators for each consumer using the interaction features (affinity-defining features). This ranked set of creators forms the True Affinity Creators for a consumer (user).
  • Next, we create a creator-creator graph based on the co-occurrence of profile visits within a consumer's session.
  • On this graph, we apply the random-walk algorithm Node2Vec+ with chosen breadth-first search and depth-first search parameters (p and q). This gives us a vector representation of each creator.
  • Finally, we pick the top 5 creators from the True Affinity set (the ranked set based on MCDM) for a consumer and use the creator embeddings to find the top 100 most similar creators from the entire creator pool. For fast vector similarity search, we use the ScaNN algorithm.

Stories Recommendation

Our expected outcome is a list of story IDs. Hence, from the top 100 affinity creators for each consumer (from the approach above), we pick each creator's latest unwatched story and add it to the consumer's recommendation pool; the stories are ranked by the creator's similarity score with respect to the consumer's true affinity creators.
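The story-selection step might look like this, assuming a hypothetical `Story` record with a creation timestamp and a set of story IDs the consumer has already watched; neither schema comes from the original, and `build_story_pool` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Story:
    story_id: str
    creator_id: str
    created_at: int  # hypothetical schema: e.g. a Unix timestamp


def build_story_pool(recommended_creators, stories, watched_story_ids):
    """Pick each recommended creator's latest unwatched story, ranked by creator similarity.

    recommended_creators: list of (creator_id, similarity) pairs
    """
    latest = {}
    for s in stories:
        if s.story_id in watched_story_ids:
            continue
        current = latest.get(s.creator_id)
        if current is None or s.created_at > current.created_at:
            latest[s.creator_id] = s
    ranked = sorted(recommended_creators, key=lambda cs: cs[1], reverse=True)
    # Creators with no unwatched story contribute nothing to the pool.
    return [latest[c].story_id for c, _ in ranked if c in latest]


# Hypothetical stories; the consumer has already watched s2.
stories = [Story("s1", "b", 100), Story("s2", "b", 200), Story("s3", "d", 150), Story("s4", "e", 300)]
pool = build_story_pool([("d", 0.8), ("b", 0.95)], stories, watched_story_ids={"s2"})
print(pool)  # ['s1', 's3']
```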

Conclusion

Combining TOPSIS (MCDM) with Node2Vec+ not only ranks creators for each consumer but also helps us find similarities between creators of the same niche using a profile-visit co-occurrence graph.

Written by shauryauppal | Data Scientist | Applied Scientist | Research Consultant | Startup Builder
Published by HackerNoon on 2022/09/05