Helping My Engineering Friend Build a Text Classifier

Written by shauryauppal | Published 2023/03/01


My engineering friend, let’s address him as Mr. Wolf 🐺 (identity hidden), requested a 1:1 call to help him fix his classification model. This happened months ago, and I thought I’d share our conversation as a learning opportunity for others.

To be honest, he is the one who called his model an “AI-based classifier”; I have just polished his wording here. 😂

Real-life based story of #engineermeetsdatascientist

It was his first experience building a DS model. Mr. Wolf scanned through many YouTube influencer videos and, from some random godly-looking “intelligent” data scientists, learned how to build a data science classification model, or, to be precise, a text classification model.


[Problem Statement] Mr. Wolf, after a few videos, felt he knew enough and went ahead to build his self-proclaimed “AI-Based Classifier,” which detects the topic with the highest probability and responds with a canned response corresponding to that topic.

The classifier was trained on questions mostly related to two topics (shipping plans and order cancellation costs). Mr. Wolf, with his partial data science and domain knowledge, converted the text to embeddings and then created a binary classification model. Ignoring the fact that there might be user questions that belong to neither of these two topics, he deployed the model to his production system.

He became a rock star after model deployment, earning his workplace the title of “AI-DRIVEN COMPANY.” Woohooooo!!!

A day or two later, during post-deployment analysis, his happiness was gone, and he now had to save face. The model was producing incorrect results for queries that did not belong to either of the two trained topics. He called me to help fix the sham. 😂

The first thing I said to my friend Mr. Wolf was: “Bhai sun meri baat, kisi bhi YouTuber ka code copy karke hero mat bano; pehle usska background and experience dekho, GURU banane se pehle.”

Translation: (My first piece of advice was not to blindly copy any influencer’s random code snippets, and to first look into a person’s background and experience before appointing them as your GURU.)

After understanding the details of the problem and his currently deployed model, which classifies user queries into two categories (order cancellations and shipping plans), I proposed the following two solutions:

Short-term, inelegant solution (not recommended for long-term use)

Implement an automatic response feature that only activates when the class confidence is above a 0.8 threshold. This would help reduce “false positives,” in which the model responds to queries that are not related to shipping plans or order cancellation. However, this solution may lead to an increase in false negatives and result in an influx of common queries being sent to human customer support agents until the model is improved.

Note: The 0.8 threshold is just an assumption. The ideal threshold can be determined from a receiver operating characteristic (ROC) plot by picking the point that best trades off a low false positive rate against a high true positive rate, e.g., the point that maximizes TPR − FPR (Youden’s J statistic).
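A minimal sketch of what that could look like, assuming a fitted scikit-learn-style classifier and a held-out validation set (`model`, `X_val`, and `y_val` below are placeholders, not code from Mr. Wolf’s system):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Assumptions: `model` is any fitted scikit-learn-style classifier over the
# two topics, and (X_val, y_val) is a held-out validation set.
THRESHOLD = 0.8  # placeholder; tune on validation data

def auto_respond(x):
    """Send a canned response only when the top-class confidence clears the bar."""
    p = model.predict_proba([x])[0]
    top_class, confidence = int(np.argmax(p)), float(np.max(p))
    if confidence >= THRESHOLD:
        return top_class   # send the canned response for this topic
    return None            # route the query to a human agent instead

# One way to pick the threshold, per the ROC note above: maximize TPR - FPR
# (Youden's J statistic) on the validation set.
fpr, tpr, thresholds = roc_curve(y_val, model.predict_proba(X_val)[:, 1])
best_threshold = thresholds[np.argmax(tpr - fpr)]
```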

Fundamentally, what was done wrong?

Mr. Wolf, while building the classification model, trained it on only the ~40% of the data covering the two topics, without treating the remaining ~60% as a DO_NOT_RESPOND class. As a result, the model treated the problem as a binary classification problem rather than a multi-class classification problem.

Long-Term Solution

Implement a multi-class classifier with three categories:

  • Class1: shipping plans
  • Class2: order cancellation cost
  • Class3: negative samples (`DO_NOT_RESPOND`)

As more queries come in, it is important to keep the model up to date so it doesn’t degrade as the query distribution drifts. To improve the model’s understanding of the queries and their classes, we should periodically retrain or refresh the model on all three classes.

Online learning with Vowpal Wabbit is also a possible solution to handle the drift issue.
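For instance, here is a minimal online-learning sketch with the `vowpalwabbit` package (assuming its 9.x `Workspace` API; the example queries and label numbering are made up for illustration):

```python
from vowpalwabbit import Workspace

# --oaa 3: one-against-all over our 3 classes
# (1 = shipping plans, 2 = order cancellation cost, 3 = DO_NOT_RESPOND)
model = Workspace("--oaa 3 --quiet")

# VW consumes examples one at a time ("label | features"), so the model
# can keep learning from fresh queries even after deployment.
model.learn("1 | what shipping plans do you offer")
model.learn("2 | how much does it cost to cancel my order")
model.learn("3 | what is the weather like today")

predicted_class = model.predict("| do you ship internationally")
```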

Now, let me share some NLP classifier methods we can apply. The common recipe is:

  1. Convert the query text to an embedding.

  2. Build a multi-class classifier with the embedding as the features and the topic as the target.

Method1: TfidfVectorizer with a spaCy tokenizer and a machine learning classifier
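A minimal sketch of Method1, assuming scikit-learn and spaCy with the `en_core_web_sm` model installed (`train_texts`/`train_labels` are placeholder training arrays):

```python
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def spacy_tokenizer(text):
    # Lemmatize, lowercase, and drop stopwords/punctuation
    return [t.lemma_.lower() for t in nlp(text) if t.is_alpha and not t.is_stop]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=spacy_tokenizer)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(train_texts, train_labels)   # labels: 0, 1, 2 (incl. DO_NOT_RESPOND)
print(pipe.predict(["how do i cancel my order"]))
```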

Method2: Using Facebook’s FastText library for word embeddings
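A sketch with fastText’s supervised mode, which trains its own word embeddings internally (assuming the `fasttext` package and a hypothetical `train.txt` file in fastText’s `__label__` format):

```python
import fasttext

# train.txt lines look like:
#   __label__shipping what shipping plans do you offer
#   __label__do_not_respond what is the weather today
model = fasttext.train_supervised(input="train.txt", epoch=25, wordNgrams=2)

labels, probs = model.predict("how much does it cost to cancel my order")
```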

Method3: Converting text to vectors using Doc2Vec from Gensim, plus a machine learning classifier
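A sketch of Method3; Gensim does not ship a pretrained Doc2Vec, so we fit one on our own query corpus first (`train_texts`/`train_labels` are placeholders):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# Train Doc2Vec on our own queries, one tagged document per query
docs = [TaggedDocument(words=text.lower().split(), tags=[i])
        for i, text in enumerate(train_texts)]
d2v = Doc2Vec(docs, vector_size=100, min_count=2, epochs=40)

# Infer a fixed-size vector per query, then fit any classifier on top
X = [d2v.infer_vector(t.lower().split()) for t in train_texts]
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
```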

Method4: A combination of Word2Vec with an Average Pooling strategy or a TFIDF Weighted Pooling strategy
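A sketch of the average pooling variant: each query vector is the mean of its word vectors (the TFIDF-weighted variant would instead weight each word vector by its TFIDF score before averaging):

```python
import numpy as np
from gensim.models import Word2Vec

corpus = [t.lower().split() for t in train_texts]   # placeholder corpus
w2v = Word2Vec(corpus, vector_size=100, min_count=1)

def avg_pool(text):
    # Average the vectors of the words we have embeddings for
    vecs = [w2v.wv[w] for w in text.lower().split() if w in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([avg_pool(t) for t in train_texts])   # features for any classifier
```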

Method5: An advanced approach using Google’s BERT model to obtain document embeddings and classify them into three different topics/categories
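One convenient route to BERT-style document embeddings is the `sentence-transformers` package (the model name below is an assumption for illustration; any BERT-family encoder works similarly):

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(train_texts)          # one dense vector per query
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
```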

Method6: If the training dataset is small, then a zero-shot classifier is the best approach.

Try ktrain’s Zero-Shot Classifier; it takes only 4–5 lines of code, as the sketch below shows.
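A sketch assuming ktrain’s `ZeroShotClassifier`, which wraps a natural language inference model under the hood (the label strings below mirror our three classes):

```python
from ktrain.text.zsl import ZeroShotClassifier

zsl = ZeroShotClassifier()   # downloads an NLI model on first use
labels = ["shipping plans", "order cancellation cost", "other"]
preds = zsl.predict("how much does it cost to cancel my order?",
                    labels=labels, include_labels=True)
```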

If you are looking for the exact text to embedding code, check out my newsletter Edition 6, “Building a Job Recommendation Strategy for LinkedIn and XING."

If you have something interesting for me, let’s connect and discuss it in detail. MAIL ME

Mr. Wolf listened carefully to my list of ideas and approaches and said, “You are my GURU now.”

Originally published here.


Written by shauryauppal | Data Scientist | Applied Scientist | Research Consultant | Startup Builder