Predicting the Popularity of Retail Products Using Amazon SageMaker, AWS Marketplace & AWS Data Exchange

Many modern, agile organizations increasingly rely on insights from third-party data to complement their internal, first-party data and make smarter business decisions. Layering data from multiple third-party sources lets them adapt to dynamic business models, which ultimately improves customer success.

AWS Data Exchange helps organizations make better-informed, data-driven decisions faster. This post shows how to use Amazon SageMaker, AWS Marketplace, and AWS Data Exchange to derive actionable business decisions from third-party data.

Customers can use AWS Data Exchange to improve efficiency, reduce costs, and accelerate business outcomes. With third-party data, end users can:

· Write algorithms to train custom machine learning (ML) models

· Build dashboards of highly localized data points that affect operational decision-making and business continuity

· Extract more value from data, including proprietary data, to support point of sale (POS), supply chain, human resources, customer loyalty, and other business functions

· Pilot new technologies with minimal upfront investment

· Solve process challenges that could hurt customer satisfaction scores

· Use a data mesh to address regulatory reporting challenges related to data ownership, governance, and lineage

For most B2B professionals, optimizing customer experience goes hand in hand with engineering a sound framework for governing decoupled data.

According to an estimate by the McKinsey Global Institute, data and analytics could generate between $9.5 trillion and $15.5 trillion in value a year if embedded at scale.

Let’s now look at how Amazon SageMaker, AWS Marketplace, and AWS Data Exchange can be combined to predict the popularity of retail products. Pairing third-party datasets with advanced analytics and machine learning helps retail organizations make more informed business decisions.

How to Use Amazon SageMaker, AWS Marketplace, and AWS Data Exchange to Predict the Popularity of Retail Products

Advanced analytics, data visualization tools, and machine learning algorithms help many retail customers gain a competitive edge. To scale up your machine learning and analytics, however, you need access to high-quality datasets. Otherwise, the familiar rule of “garbage in, garbage out” applies.

Some of this data is available in-house, but data scientists often need high-quality third-party data to complement it. AWS Data Exchange makes it easier for customers to find, subscribe to, and use third-party data in AWS. It offers more than 2,000 data products from over 100 qualified data providers. Users can subscribe to these products, load the data into Amazon S3 buckets, and analyze it with AWS analytics and ML services.

The popularity of retail products can be predicted using a dataset from AWS Data Exchange. Amazon SageMaker and a third-party algorithm from AWS Marketplace can then be used to analyze the data and generate actionable business insights. Let’s consider an experiment designed by AWS.

How to Deliver a Custom ML Model to Predict the Popularity of Bath Products: The Retail Angle

A retail store approached an ML startup working on decision support systems (DSS) to deliver a custom ML model that predicts the popularity of bath products. The goal was simple: increase sales and store revenue by stocking the most popular products. The startup’s solution architect therefore had to build a model that predicts a bath product’s popularity from attributes such as its name and category. The architect would then use the model to advise on which products to stock.

The retail client had already purchased the Retail data set package through AWS Data Exchange. The package contains real-world data on bath products from large chains such as Bath & Body Works and Bed Bath & Beyond. To train the machine learning model, the solution architect used Intel’s DAAL decision forest classification algorithm, available in AWS Marketplace.

The experiment was set up as follows:

· The solution architect obtained permission to use AWS Data Exchange and related services to subscribe to and export datasets. He did this by attaching the AWSDataExchangeSubscriberFullAccess AWS managed policy to his IAM principal, which grants the permissions needed to use AWS Data Exchange as a subscriber.

· Using AWS Identity and Access Management (IAM), the solution architect confirmed that he had access to the S3 bucket to which the dataset was to be exported.

· The architect familiarized himself with the core AWS Data Exchange concepts, such as data sets, revisions, and assets.

· Finally, the architect confirmed that the IAM principal had the AmazonSageMakerFullAccess managed IAM policy attached.
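One way to double-check the two managed-policy prerequisites is sketched below. This assumes the principal is an IAM role and that you have already listed its attached policies. For example, boto3's `iam.list_attached_role_policies` returns them under `AttachedPolicies`. The helpers `arns_from_response` and `missing_policies` are hypothetical names used for illustration.

```python
# Hedged sketch: verify that an IAM principal has the two AWS managed policies
# named in the prerequisites. Listing the attached policies is left to the
# caller; for a role, boto3's iam.list_attached_role_policies returns them
# under "AttachedPolicies" (paginate if "IsTruncated" is true).

REQUIRED_POLICY_ARNS = {
    "arn:aws:iam::aws:policy/AWSDataExchangeSubscriberFullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
}


def arns_from_response(response):
    """Extract policy ARNs from a list_attached_role_policies-style response."""
    return [policy["PolicyArn"] for policy in response.get("AttachedPolicies", [])]


def missing_policies(attached_policy_arns):
    """Required managed-policy ARNs that are not attached, in sorted order."""
    return sorted(REQUIRED_POLICY_ARNS - set(attached_policy_arns))


if __name__ == "__main__":
    # Illustrative response shape; with boto3 this would come from
    # boto3.client("iam").list_attached_role_policies(RoleName=...).
    response = {"AttachedPolicies": [
        {"PolicyName": "AmazonSageMakerFullAccess",
         "PolicyArn": "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"},
    ]}
    print(missing_policies(arns_from_response(response)))
```

An empty list means both prerequisites are in place. Checking access to the S3 bucket itself is a separate step.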

An Overview of the Solution

Since the retail client had already procured the dataset, the next logical step was to export it to an S3 bucket. The client did this by following the AWS Data Exchange documentation.
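The export step can be sketched under stated assumptions. The sketch only builds the keyword arguments for boto3's `dataexchange` client: `create_job` with type `EXPORT_REVISIONS_TO_S3` for a one-off export, and `create_event_action` for exporting future revisions automatically. The argument shapes reflect our reading of that API and should be checked against the current API reference. The data set ID, revision ID, and bucket name are placeholders.

```python
# Hedged sketch: request arguments for exporting an AWS Data Exchange revision
# to S3 (a one-off export job) and for auto-exporting future revisions.
# Shapes follow our reading of boto3's "dataexchange" client
# (create_job / create_event_action); verify against the API reference.
# All IDs and bucket names used below are placeholders.

# Example S3 key pattern built from Data Exchange dynamic references.
DEFAULT_KEY_PATTERN = "${Revision.CreatedAt}/${Asset.Name}"


def export_job_request(data_set_id, revision_id, bucket, key_pattern=DEFAULT_KEY_PATTERN):
    """Keyword arguments for dataexchange.create_job(**kwargs)."""
    return {
        "Type": "EXPORT_REVISIONS_TO_S3",
        "Details": {
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                "RevisionDestinations": [
                    {"RevisionId": revision_id, "Bucket": bucket, "KeyPattern": key_pattern}
                ],
            }
        },
    }


def auto_export_request(data_set_id, bucket, key_pattern=DEFAULT_KEY_PATTERN):
    """Keyword arguments for dataexchange.create_event_action(**kwargs)."""
    return {
        "Event": {"RevisionPublished": {"DataSetId": data_set_id}},
        "Action": {
            "ExportRevisionToS3": {
                "RevisionDestination": {"Bucket": bucket, "KeyPattern": key_pattern}
            }
        },
    }


if __name__ == "__main__":
    job = export_job_request("example-data-set-id", "example-revision-id", "example-retail-bucket")
    print(job)
    # With boto3 installed and credentials configured, the calls would look like:
    # import boto3
    # dx = boto3.client("dataexchange")
    # job_id = dx.create_job(**job)["Id"]
    # dx.start_job(JobId=job_id)
    # dx.create_event_action(**auto_export_request("example-data-set-id", "example-retail-bucket"))
```

Keeping the request construction separate from the network calls makes the payloads easy to inspect and test before anything touches AWS.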

Data providers often publish updated revisions of their datasets. To ensure that the solution architect was always working with the latest version, the client followed the AWS documentation to set up a process that automatically loads new revisions from AWS Data Exchange into the S3 bucket. The solution architect then followed a four-step approach:

1. Subscribing to the Third-Party Algorithm Available in AWS Marketplace – This step is optional, as any suitable algorithm can be used to train the model.
2. Setting Up the Amazon SageMaker Notebook Instance – The notebook instance provided the environment for running the data preparation code.
3. Custom Model Training – This step used the third-party data and the third-party algorithm.
4. Testing the Model – The architect entered Ginger orange 3-wick candle as a product name, and the model predicted whether that product would be popular.

Let’s discuss each of these four steps in detail:

Step 1: Subscribe to the Algorithm

To subscribe to the Intel® DAAL DecisionForest Classification algorithm in AWS Marketplace, follow these steps:

a) Open the Intel® DAAL DecisionForest Classification listing in AWS Marketplace.

b) Read the Highlights, Product Overview, Usage Information, and Additional resources sections, and note the supported instance types.

c) Choose Continue to Subscribe.

d) Review the End-user license agreement, Support Terms, and Pricing Information, then choose Accept Offer to agree to the pricing, EULA, and support terms of the listing.

Instead of the Intel® DAAL DecisionForest Classification algorithm, you can train your model with built-in Amazon SageMaker algorithms that support classification, such as Linear Learner and XGBoost. You can also use algorithms from the major frameworks that SageMaker supports.

Step 2: Creating an Amazon SageMaker Notebook Instance

The architect created a SageMaker notebook instance to analyze the data and train the ML model to predict popular bath products. The following steps were performed in the notebook:

1. Data Analysis and Feature Engineering

The architect explored the dataset for promising features. Candidates included price, name, category, reviews, ratings, promotions applied, how well each product was received in the market, and how long it lasted there.

· To keep the model simple, the architect chose name and category as the features.

· The architect combined review counts, review ratings, and the time each product stayed on the market into a single outcome variable indicating whether the product became popular.

· The architect cleaned the product name feature by removing all special characters, converting numerals to words, and lowercasing the text. The few null values were dropped.

· The data showed that popular products tended to have shorter names, so the architect created a product-name-length feature.

· The categories turned out to be very broad. For example, three-wick candles and single-wick candles fell into the same category. The architect therefore extracted common name suffixes into a new sub-category feature.

· The architect generated embeddings from the product name column and visualized them in a t-SNE plot.
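The cleaning and feature steps above can be sketched roughly in standard-library Python. Several details are illustrative assumptions rather than the architect's actual choices: the popularity thresholds, measuring name length in words, and mining sub-categories as trailing word n-grams shared by at least two names. The embedding and t-SNE step is omitted.

```python
# Hedged sketch of the feature engineering described above. The popularity
# thresholds, the word-count name length, and the suffix-mining rule are
# illustrative assumptions, not the architect's actual choices.
import re
from collections import Counter

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out 0-99; larger numbers are spelled digit by digit (a simplification)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    return " ".join(ONES[int(d)] for d in str(n))


def clean_name(name):
    """Lowercase, spell out numerals, and replace special characters with spaces."""
    name = re.sub(r"\d+", lambda m: f" {number_to_words(int(m.group()))} ", name.lower())
    name = re.sub(r"[^a-z ]", " ", name)
    return " ".join(name.split())


def is_popular(review_count, avg_rating, days_on_market,
               min_reviews=50, min_rating=4.0, min_days=180):
    """Binary outcome combining reviews, rating, and market longevity.
    The default thresholds are hypothetical placeholders."""
    return int(review_count >= min_reviews and avg_rating >= min_rating
               and days_on_market >= min_days)


def common_suffixes(names, max_words=3, min_count=2):
    """Trailing word n-grams (up to max_words) shared by at least min_count names."""
    counts = Counter()
    for name in names:
        words = name.split()
        for k in range(1, min(max_words, len(words)) + 1):
            counts[" ".join(words[-k:])] += 1
    return {suffix for suffix, count in counts.items() if count >= min_count}


def sub_category(name, suffixes):
    """Longest known suffix that the cleaned name ends with, else 'other'."""
    words = name.split()
    for k in range(len(words), 0, -1):
        candidate = " ".join(words[-k:])
        if candidate in suffixes:
            return candidate
    return "other"


if __name__ == "__main__":
    raw = ["Ginger Orange 3-Wick Candle", "Vanilla 3-Wick Candle", "Orange Hand Soap"]
    names = [clean_name(n) for n in raw]
    suffixes = common_suffixes(names)
    for name in names:
        # cleaned name, name length in words, mined sub-category
        print(name, len(name.split()), sub_category(name, suffixes))
```

Preferring the longest matching suffix keeps "three wick candle" distinct from a generic "candle", which is the kind of split the broad categories were hiding.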

2. Data Preparation for Training

· Many algorithms, including the one used here, accept only numeric input, so the architect one-hot encoded the category and sub-category features.

· The dataset was shuffled, split into training and test sets, and uploaded to S3.
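The preparation steps above are sketched below as one-hot encoding, a seeded shuffle and split, and header-less CSV serialization. Putting the label in the first column follows the CSV convention of SageMaker built-in algorithms such as XGBoost. Whether the Marketplace algorithm expects the same layout is an assumption to check against its listing. The sample rows are made-up inputs for illustration only.

```python
# Hedged sketch of the preparation steps above: one-hot encode the categorical
# features, shuffle, split, and serialize to CSV. Label-first, header-less CSV
# follows the SageMaker built-in XGBoost convention; the Marketplace
# algorithm's expected layout is an assumption.
import csv
import io
import random


def one_hot(rows, columns):
    """Return (header, encoded rows) with one 0/1 column per category value."""
    vocab = {c: sorted({row[c] for row in rows}) for c in columns}
    header = [f"{c}={v}" for c in columns for v in vocab[c]]
    encoded = [[1 if row[c] == v else 0 for c in columns for v in vocab[c]]
               for row in rows]
    return header, encoded


def shuffle_split(items, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed and return (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def to_csv(records):
    """Serialize records as header-less CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"category": "candles", "sub_category": "three wick candle", "name_length": 5, "popular": 1},
        {"category": "soaps", "sub_category": "hand soap", "name_length": 3, "popular": 0},
        {"category": "candles", "sub_category": "other", "name_length": 2, "popular": 1},
        {"category": "soaps", "sub_category": "hand soap", "name_length": 4, "popular": 0},
        {"category": "candles", "sub_category": "three wick candle", "name_length": 4, "popular": 0},
    ]
    header, encoded = one_hot(rows, ["category", "sub_category"])
    # label, name length, then one-hot columns
    records = [[r["popular"], r["name_length"], *e] for r, e in zip(rows, encoded)]
    train, test = shuffle_split(records)
    print(header)
    print(to_csv(train))
    # Upload with boto3, e.g.:
    # boto3.client("s3").put_object(Bucket="example-bucket", Key="train/train.csv", Body=to_csv(train))
```

Building the vocabulary from the full dataset before splitting keeps the training and test files aligned column for column.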

3. Specifying Hyperparameters and Running a Training Job

The training job used the data in the architect’s S3 bucket and the algorithm subscribed to in AWS Marketplace. The architect specified hyperparameters as required by the algorithm’s specification.

· The architect set the hyperparameter values and trained a model that achieved 70% accuracy.

· Once the model was trained, there were two options:

A) Run batch inference.

B) Deploy an Amazon SageMaker endpoint for real-time inference. The architect chose this option.
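For option B, a real-time request can be prepared as sketched below. The sketch assumes the endpoint accepts CSV rows (`text/csv`) whose columns match the training data: name length followed by the one-hot category and sub-category columns, with no label. The endpoint name and vocabulary are hypothetical. Check the algorithm's listing for the input formats it actually supports.

```python
# Hedged sketch: build one CSV feature row for real-time inference.
# Assumes (hypothetically) that the endpoint accepts text/csv and that the
# column order is: name length, one-hot category, one-hot sub-category.


def feature_row(name_length, category, sub_category, vocab):
    """Serialize one product's features as a CSV line without a label column."""
    values = [name_length]
    values += [1 if category == v else 0 for v in vocab["category"]]
    values += [1 if sub_category == v else 0 for v in vocab["sub_category"]]
    return ",".join(str(v) for v in values)


if __name__ == "__main__":
    # Hypothetical vocabulary; in practice, reuse the one built for training.
    vocab = {"category": ["candles", "soaps"],
             "sub_category": ["hand soap", "other", "three wick candle"]}
    payload = feature_row(5, "candles", "three wick candle", vocab)
    print(payload)
    # With boto3 and a deployed endpoint, the request would look like:
    # import boto3
    # runtime = boto3.client("sagemaker-runtime")
    # response = runtime.invoke_endpoint(EndpointName="bath-products-endpoint",
    #                                    ContentType="text/csv", Body=payload)
    # print(response["Body"].read().decode("utf-8"))
```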

4. Testing the Trained Model

· When the architect entered Ginger orange 3-wick candle as the product name, the model predicted whether it would be popular.

· The architect then tried a few other product names, such as Orange vanilla hand soap and Orange hand soap.

· The model was further tuned by adjusting and optimizing its hyperparameters to improve accuracy.

· The architect continued running inference against the tuned model until he had a list of products and categories to recommend to the client.
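The evaluate, tune, and rank loop above can be outlined with three small helpers: accuracy on held-out labels, an exhaustive grid search, and ranking of candidate product names. In the sketch, `evaluate` and `predict_proba` are hypothetical stand-ins for a training job and an endpoint call. The hyperparameter names in the usage example are also hypothetical and are not the Intel® DAAL algorithm's actual parameters. SageMaker's automatic model tuning is a managed alternative to a hand-rolled grid.

```python
# Hedged sketch of the evaluate-tune-rank loop described above. `evaluate`
# and `predict_proba` stand in for a training job and an endpoint call; the
# hyperparameter names in the usage example are hypothetical.
import itertools


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(grid, evaluate):
    """Try every combination in `grid`; return (best config, best score)."""
    keys = sorted(grid)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def rank_candidates(names, predict_proba):
    """Order candidate product names by predicted popularity, highest first;
    ties are broken alphabetically."""
    return sorted(names, key=lambda n: (-predict_proba(n), n))


if __name__ == "__main__":
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
    # Toy scorer; a real `evaluate` would train a model and return its accuracy.
    cfg, score = grid_search({"num_trees": [50, 100], "max_depth": [4, 8]},
                             lambda c: c["num_trees"] / 100 + c["max_depth"] / 10)
    print(cfg, score)
    # Toy predictor favoring shorter names, echoing the shorter-names observation.
    stub = lambda name: 1 / len(name.split())
    print(rank_candidates(["Orange vanilla hand soap", "Orange hand soap",
                           "Ginger orange 3-wick candle"], stub))
```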

Alternative Approaches

The architect could extend the exercise by adding features such as price and promotions. He could also add synthetic features based on how frequently specific fragrance ingredients occur. For instance, such features could help predict whether lavender- or coconut-scented products are more likely to be popular.
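The scent-frequency idea can be sketched as follows. The scent list is a hypothetical input, and matching is limited to single whole words. Multi-word scents such as "sea salt" would need phrase matching.

```python
# Hedged sketch of the scent-frequency features suggested above. The scent
# list is a hypothetical input, and matching is on single whole words only.
import re
from collections import Counter


def words_in(name):
    """Set of lowercase alphabetic words in a product name."""
    return set(re.findall(r"[a-z]+", name.lower()))


def scent_frequencies(names, scents):
    """How many product names mention each scent."""
    counts = Counter()
    for name in names:
        words = words_in(name)
        counts.update(s for s in scents if s in words)
    return counts


def scent_features(name, scents, frequencies):
    """Per-scent indicator columns plus the summed frequency of scents present."""
    present = [s for s in scents if s in words_in(name)]
    features = {f"has_{s}": int(s in present) for s in scents}
    features["scent_frequency"] = sum(frequencies[s] for s in present)
    return features


if __name__ == "__main__":
    scents = ["lavender", "coconut"]
    catalog = ["Lavender Hand Soap", "Coconut Lavender Candle", "Orange Hand Soap"]
    freq = scent_frequencies(catalog, scents)
    print(scent_features("Lavender Body Wash", scents, freq))
```

The indicator columns let the model learn per-scent effects, while the summed frequency gives it a single signal for how common a product's scents are across the catalog.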

Wrap Up

This post walked through a scenario in which a solution architect delivered a custom ML model that predicts the popularity of retail bath products, helping the client optimize sales revenue.

The retail client was advised to stock specific products based on the model’s predictions. You can likewise use relevant real-world third-party data and algorithms to train effective custom ML models. Depending on the terms of use of the third-party products you procured from AWS Marketplace, you may even be able to list and monetize the trained models.

For more information about Amazon SageMaker algorithms and model packages, see the AWS documentation. AWS Data Exchange lets you find, subscribe to, and use data products, and it can retrieve new revisions automatically. Together, AWS Data Exchange and Amazon SageMaker let you build end-to-end machine learning workflows.

Finally, AWS Marketplace can also supply algorithms and model packages for your machine learning workloads.