Learning Style Space with a Human Touch

Motivation:

Rocksbox was founded on a simple principle: that modern women want to access the latest trends and explore their own fashion sense without spending money on things they don’t want. We offer a monthly subscription in which each member receives a set of three pieces of jewelry curated just for her. We establish a unique relationship with each of our members by learning her style preferences, with the goal of getting her items that she’ll love in every box. If she really loves a piece, she can buy it right at home after having experienced it in real life. One of the ways that we learn her preferences is through the wish list, where members express interest in pieces that they’d like to see in their boxes. Together with feedback on items she’s experienced in her box, the wish list lets us learn her personal style.

Personalization and the automated learning of style space are ripe for Data Science. The attributes of the jewelry (e.g. metal, stone type, color, shape…) and the combinations of these attributes are the key determinants of the style of a piece. Similarly, customers have a range of dimensions and preferences that are uniquely combined in each person. The monthly subscription provides a platform for recurring data collection on both the product and the customer, so we can quickly learn about both parts of the equation. Together, these factors make Rocksbox well positioned to augment the domain knowledge of our stylists with high quality recommendations for each box that’s curated.

Personalization of the member experience is a fundamental value proposition of ours and one that we believe helps to create enduring and valuable member relationships. Similar to the original Netflix experience, our members can rate the items they receive and tell us how well they match their personal style. Our stylists, who curate each box that goes out to our members, are awash in feedback data and face a puzzle in matching each member’s varied feedback to our inventory in order to create a set that she’ll love. How can we more efficiently learn our members’ personal style and help our stylists create awesome experiences in every box? Specifically, how can we learn the similarity between our products in order to both place them within a learned style space and use that style space to drive personalized recommendations?

Inferring Her Style Space: By knowing where in style space the products that she has previously liked and bought reside, we can recommend pieces that are close enough to her known style preferences to be safe bets, yet controllably dissimilar to her historical likes so as to provide a zone of serendipitous discovery.

Data Collection through Internal “Crowd-Sourcing”:

An important but often overlooked aspect of many classes of machine learning is the need for labeled data. Public datasets frequently come already labeled. However, many real datasets captured by businesses either lack labels or lack the right type of labels to facilitate machine learning. In order to learn and model jewelry product attribute space, we first needed high quality attributes on our products. Initially, the attributes on our products were minimal and not deeply meaningful. They consisted mainly of metal tone (i.e. gold or silver) and high level subjective style labels such as “boho” and “edgy”. These labels can mean different things to our stylists than to our members, and have only a loose semantic relationship to product attributes. To address this, we launched a multi-functional effort within Rocksbox, leveraging the strengths of our merchandising, engineering and data science teams. Despite the complexity, we undertook this effort because we believe that modeling style space is fundamental to our ability to personalize each set according to our member’s preferences.

Data Science worked with our merchandising team to define over one hundred new attributes covering physical attributes such as clasp type, shapes, and secondary colors, as well as style elements such as studded, fringe, and perforated. We chose the attributes we thought most likely to discriminate pieces along style preferences in the desired style space. The growth in attributes was necessary for Data Science but presented a problem of scale. We had thousands of active products that needed to be backfilled with these new attributes. We considered generic crowd-sourcing but quickly discounted it due to the jewelry-specific domain knowledge that we believed was necessary to accurately tag our products. Instead, we “crowd-sourced” internally to Rocksbox, with a few employees tagging the products over the course of a couple of weeks. This itself was a learning process. We learned which attributes we had clearly defined and which we had not. We also learned about attributes that we had left out and that occurred frequently enough to warrant inclusion in the tagging set. For instance, we realized that we had left out the attributes crystal clusters and chevron, and that we needed to better define the difference between an ear huggie and an ear jacket.

Product Tagging: Products were previously tagged using a limited set of tag categories. To facilitate style space learning, the number of tag categories was increased over twofold to include more discriminating physical attributes of the jewelry.

Dimensionality Reduction and Normalization:

With new product attributes in hand, we sought to develop a vector space model in which each product would be represented by a vector learned from its attributes. We sought to use this vector space model to assess the similarity between products in our inventory, thereby forming the backbone of a content based recommendation engine. Backfilling our inventory with the new tags wasn’t the only challenge that the growth in the number of product attributes created. In machine learning, high feature dimensionality increases the amount of training data needed to learn the importance of any one feature. Furthermore, with a large number of sparsely supported features (203 features, with the average feature supported by 136 examples and the median by just 36 examples), the most common features will tend to dominate and less common features will lose importance. In many applications, it is the rare features that are most discriminating of the individual entities that you are trying to model. To account for both the increased dimensionality and sparseness of the product attribute data, as well as to counteract the dominance of common attributes, we applied TFIDF normalization and principal component analysis (PCA) to our data.
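As a rough illustration of the two transformations named above, the following sketch applies TFIDF weighting to a binary product-by-attribute matrix and then projects it with PCA. It is not our production code: the toy matrix, the function names (`tfidf_weight`, `pca_project`), the smoothed IDF variant, and the row normalization are all assumptions made for the example.

```python
import numpy as np

def tfidf_weight(X):
    """Apply TFIDF weighting to a binary product-by-attribute matrix."""
    X = np.asarray(X, dtype=float)
    n_products = X.shape[0]
    doc_freq = X.sum(axis=0)  # number of products carrying each attribute
    # Smoothed IDF (one common variant, assumed here): rarer attributes
    # receive larger weights than common ones.
    idf = np.log((1.0 + n_products) / (1.0 + doc_freq)) + 1.0
    weighted = X * idf
    # L2-normalize each row so heavily tagged products do not dominate.
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    return weighted / norms

def pca_project(X, n_components):
    """Project rows of X onto the top principal components via SVD."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Toy data: 6 products x 5 binary attributes (made up for illustration).
attributes = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
])
weighted = tfidf_weight(attributes)
style_vectors, explained = pca_project(weighted, n_components=2)
print(style_vectors.shape, explained.round(3))
```

In the toy matrix the first attribute appears on four of six products and the third on only two, so after weighting the rarer attribute carries more of each product's vector, which is the effect the normalization is meant to produce.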

PCA is a tried and true dimensionality reduction technique that applies a linear transformation to the data by projecting the original data onto a reduced set of axes that account for the majority of the variation observed in the original data. However, PCA doesn’t come free. Each new axis in the projected data set is composed of parts of the original features. This linear projection makes interpretability significantly more challenging. The resulting components don’t have a straightforward relationship back to the original features. Furthermore, there is no single best method to choose the number of components (axes) to use. Depending on your application you may want to choose the number of components that captures the most variance observed in the data, or you may simply want to reduce 1000s of dimensions to 100s to facilitate machine learning that might suffer from very wide data. We chose the clusterability of products in the resulting style vector space as a metric to choose the appropriate number of components in PCA. This was based on our goal of efficiently discriminating products in the reduced dimensionality vector space. By driving coherent clusters in the reduced dimensionality space, we could be assured that we had defined regions of product attribute space that our products mapped to.
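One way to picture choosing the number of components by clusterability is sketched below. The clustering algorithm and the clusterability metric are not specified above, so k-means and the mean silhouette score are stand-ins chosen for illustration, and the synthetic data, candidate values, and function names are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette(X, labels):
    """Mean silhouette score in [-1, 1]; higher means tighter clusters."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = dists[i, same].mean()  # mean distance within own cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def pick_n_components(X, candidates, k):
    """Score each candidate component count by clusterability."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = {}
    for n in candidates:
        projected = centered @ vt[:n].T
        scores[n] = silhouette(projected, kmeans_labels(projected, k))
    return max(scores, key=scores.get), scores

# Synthetic data: three groups separated along two of eight dimensions.
rng = np.random.default_rng(42)
group_centers = np.zeros((3, 8))
group_centers[0, 0], group_centers[1, 1], group_centers[2, 0] = 5.0, 5.0, -5.0
X = np.vstack([c + rng.normal(scale=1.0, size=(20, 8)) for c in group_centers])
best_n, scores = pick_n_components(X, candidates=[2, 4, 8], k=3)
print(best_n, scores)
```

The number of clusters `k` is itself a free parameter here; in practice it would need to be swept alongside the number of components or fixed from domain knowledge.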

Product Feature Space: Visualization of product clusters in learned attribute space. The space was reduced to 2 dimensions for display. Sampling of products from these clusters reveals distinct regions of product attribute space.

Along the road to successfully learning a vector space representation of product attributes, we encountered challenges that made us wonder whether this approach would work out at all. With data transformation, there is as much art as there is science that goes into a successful approach. Initially, we assumed that the high level labels on our products such as “boho”, “glam”, “classic”, and “chic” had low semantic meaning and would bias the learning of style similarity away from what could be learned from the product attributes. Furthermore, given the sparsity of the data, we expected that features with very low support in the dataset would hamper our ability to infer product similarity. Contrary to our expectations, including the high level style labels helped bolster PCA because many of these style labels covary with sparse, granular attributes.

Product Similarity in the Learned Style Space:

To further assess the quality of the style vector space that we had learned, we used the cosine distance between the resulting product style vectors to fetch similar products. For each product in our inventory, we fetched the 10 most similar products based on the cosine distance between the seed product and the rest of the inventory. The visual similarity among the retrieved products was striking and gave us confidence that we had learned a robust product attribute vector space.

Similar Products: The 10 most similar products for a given seed product can be queried using the vector space model. While images were not used to assess similarity, the similar products share visually similar styles. This suggests that the product attributes effectively capture the visual styles of the products. From the set of 10 most similar products, an algorithm could choose a subset of these products based on other attributes that are relevant such as popularity or price.
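The similarity query described above can be sketched as follows, assuming the style vectors are held in a NumPy array with one row per product. The function name `most_similar` and the toy vectors are hypothetical.

```python
import numpy as np

def most_similar(style_vectors, seed_index, top_k=10):
    """Return (index, cosine distance) of the top_k products nearest the seed."""
    vectors = np.asarray(style_vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0.0] = 1.0  # avoid division by zero for empty vectors
    unit = vectors / norms[:, None]
    # Cosine distance = 1 - cosine similarity with the seed vector.
    cosine_distance = 1.0 - unit @ unit[seed_index]
    cosine_distance[seed_index] = np.inf  # never return the seed itself
    limit = min(top_k, len(vectors) - 1)
    order = np.argsort(cosine_distance)[:limit]
    return [(int(i), float(cosine_distance[i])) for i in order]

# Toy 2-dimensional style vectors for five products.
vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.7, 0.7],
])
neighbors = most_similar(vectors, seed_index=0, top_k=3)
print(neighbors)
```

For an inventory of a few thousand products a brute-force comparison like this is cheap; an approximate nearest neighbor index would only become worthwhile at much larger scale.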

Once we had a jewelry attribute style space where we could reasonably assess the similarity between products, we sought to build a content based recommendation engine. By knowing where in style space the products that she has previously liked and bought reside, we can recommend pieces that are close enough to her known style preferences to be safe bets, yet controllably dissimilar to her historical likes so as to provide a zone of serendipitous discovery. We found that traditional content based recommenders that leverage user item preferences and item attributes tend to overweight historical style preferences (i.e. serving up pieces based on previous interactions but not serving enough diversity for discovery). We need to balance staying within her taste graph against the fatigue that comes from serving up items that are too similar to those she’s previously purchased. To address these limitations, we developed a taste graph based approach to recommending similar items. The 15 most recent items that have received positive feedback (liked or loved), been wishlisted, or been purchased become the starting nodes in our query. From those starting nodes we fetch the N most similar items. In this manner we can control how similar or dissimilar we want the fetched items to be. There is a lot of flexibility here. For instance, we could fetch pieces less similar to pieces already bought, under the assumption that she won’t want to buy or experience more pieces that are very similar to ones she already owns.

Before returning the set of recommendations, we score the resulting set of products. The scoring incorporates both similarity to the starting product node and the weighted popularity of the item in terms of both feedback and purchases. Specifically, we compute a popularity score for each product that is a normalized weighted sum of the feedback for the item and the purchase frequency of the item. This popularity score is then weighted by the feedback on the seed piece (i.e. whether it was liked, loved or bought). Each product receives a score that is its similarity to the seed piece plus its weighted popularity score. In this manner, rather than relying on raw similarity alone, we rank items based on how likely they are to deliver a positive member experience and business impact.
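To make the scoring concrete, here is a minimal sketch under stated assumptions. The actual weights are not given above, so `SEED_WEIGHTS`, `FEEDBACK_WEIGHT`, and `PURCHASE_WEIGHT` are invented values, and the `Candidate` fields are hypothetical names for the inputs the text describes. Similarity is taken as 1 minus the cosine distance to the seed.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration.
SEED_WEIGHTS = {"liked": 0.5, "loved": 0.8, "wishlisted": 0.6, "bought": 1.0}
FEEDBACK_WEIGHT = 0.6
PURCHASE_WEIGHT = 0.4

@dataclass
class Candidate:
    product_id: str
    similarity: float              # 1 - cosine distance to the seed piece
    positive_feedback_rate: float  # share of ratings that were liked or loved
    purchase_rate: float           # share of shipments that ended in a purchase

def popularity(c: Candidate) -> float:
    """Normalized weighted sum of feedback and purchase frequency."""
    total = FEEDBACK_WEIGHT + PURCHASE_WEIGHT
    return (FEEDBACK_WEIGHT * c.positive_feedback_rate
            + PURCHASE_WEIGHT * c.purchase_rate) / total

def score(c: Candidate, seed_signal: str) -> float:
    """Similarity to the seed plus popularity weighted by the seed's feedback."""
    return c.similarity + SEED_WEIGHTS[seed_signal] * popularity(c)

def rank(candidates, seed_signal, top_n=40):
    """Sort candidates by descending score and keep the top_n."""
    return sorted(candidates, key=lambda c: score(c, seed_signal), reverse=True)[:top_n]

candidates = [
    Candidate("A", similarity=0.95, positive_feedback_rate=0.2, purchase_rate=0.1),
    Candidate("B", similarity=0.80, positive_feedback_rate=0.9, purchase_rate=0.5),
    Candidate("C", similarity=0.60, positive_feedback_rate=0.5, purchase_rate=0.5),
]
ranked = rank(candidates, seed_signal="loved")
print([c.product_id for c in ranked])
```

In this toy example the most similar piece (A) is outranked by a slightly less similar but far more popular one (B), which is the intended effect of blending popularity into the ranking.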

Recommendation pipeline: With the goal of surfacing pieces that she’ll love, we applied unsupervised machine learning to learn a product attribute based similarity space. Using the products that she has previously rated or expressed interest in (wish list), we can query the learned jewelry feature space to find the pieces that are stylistically similar to her expressed preferences. This set of pieces is ranked based on similarity to the seed products and popularity. The top 40 ranked pieces are surfaced as recommendations.

To estimate how likely this approach is to deliver negative experiences for our members, we looked at the incidence of recommending items that a member has previously disliked. For customers who are newer to our service and for those from whom we have less feedback data, we found that we return items that she has disliked about 1% of the time. This drops to below 0.5% once we have more than 10 data points coming from either feedback or the wish list. In tracking the success of this recommender in production, we can compare the performance of recommended items to non-recommended items in general, and also the performance of the same items when they are recommended and when they aren’t.
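The incidence check above amounts to a simple ratio. A minimal sketch, assuming recommendations and dislikes are available as per-member collections (the data layout, IDs, and function name are hypothetical):

```python
def disliked_rate(recommendations, dislikes):
    """Share of recommended items that the member had previously disliked."""
    total = hits = 0
    for member_id, items in recommendations.items():
        disliked = dislikes.get(member_id, set())
        total += len(items)
        hits += sum(1 for item in items if item in disliked)
    return hits / total if total else 0.0

# Toy data: two members, four recommendations each, one prior dislike.
recommendations = {"m1": ["a", "b", "c", "d"], "m2": ["e", "f", "g", "h"]}
dislikes = {"m1": {"b"}, "m2": set()}
rate = disliked_rate(recommendations, dislikes)
print(f"{rate:.1%}")
```

Computing the same ratio separately for members above and below a feedback-count threshold would reproduce the kind of split reported above.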

Since deploying the product similarity recommender to our wish list, the wish list CTR has more than doubled (from ~3 to ~7 wish list actions per session). This indicates that the recommender is surfacing relevant items that resonate with our members’ personal styles.

Retrospective:

We built a graph based product recommender from scratch, leveraging human tagging of our inventory and unsupervised learning to learn style space based on product attributes. We’re excited to have launched this recommender to power our wish list, where members can express interest in pieces and help stylists to create boxes that our members will love. We learned that even with small data, it’s possible to build a high quality recommender that goes beyond what typical content based recommenders can do by exploring known style space as well as allowing for serendipitous discovery. We welcome your thoughts and questions in the comments. Rocksbox is hiring and doing awesome things to transform the jewelry and fashion rental spaces with data. If you’re interested in what we are doing, we’d love to talk!