The necessity of tagged data is well established in the field of analytics. All its forms – exploratory, diagnostic, predictive, and prescriptive analytics – have well-defined use cases in which tagged data adds value.
For instance: Banks often have rules and models that can intelligently classify a customer’s expenditure into categories like movies, online shopping, monthly bill payments, online flight bookings, etc. Once this tagged data is available, exploratory analysis is run to identify common and dominant spending patterns. These insights are then used to recommend the right banking products, offers, etc. to the customer.
On similar lines, our AI Engine, Raman, has been smartly tagging campaigns since January 2019. We use this feature to parse and understand the content of multi-channel campaigns intelligently and use the information to provide richer analytics and insights to clients.
For instance: We parse the title and body of an app push notification campaign deployed by an e-commerce brand and tag it as a “Seasonal”, “Weekend”, “Brand Promo”, “Product Promo”, or “Discount” campaign. A campaign can have multiple such tags associated with it.
Here’s how these auto-generated campaign tags work:
- Exploratory: Understand what type of campaign works or doesn’t work by analyzing the tags. It also helps in understanding the effect of combining a few tags
- Predictive: Understand who may or may not engage with a campaign, using tags as a dimension of analysis. This is part of our AI-led Predictive Segments capability
- Prescriptive: Suggest what type of campaign works at a holistic and individual customer level, and create content accordingly
To understand the content of a campaign we need to focus on two key elements – title of the campaign and the creative of the campaign. Here is what the overall technical landscape looks like:
Let’s now dive deeper into how the concepts of Natural Language Understanding can be used to parse and extract tags from the campaign title. We’ll define the problem statement, work with some dummy data, and explore how to extract relevant tags from the text.
To perform advanced campaign analysis, it is critical to understand what types of titles work and what don’t. Tagging these campaign titles manually is painstaking and time-consuming. Enter our AI Engine, Raman, which tags these titles automatically, without any manual intervention, through advanced ML models.
Here are a few examples:
Note: One campaign title can have multiple tags, but to simplify the concept – we’ve kept only the most relevant tag here.
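To make the mapping concrete, here are a few hypothetical title-to-tag pairs (these are illustrative examples, not actual client data):

```python
# Illustrative campaign titles, each mapped to its single most relevant tag.
examples = [
    ("Weekend Bonanza! Flat 50% off on all footwear", "Weekend"),
    ("Diwali Dhamaka: lights, decor and more", "Seasonal"),
    ("Nike Days are here - shop the collection", "Brand Promo"),
    ("Extra 20% off with code SAVE20", "Discount"),
]

for title, tag in examples:
    print(f"{title!r} -> {tag}")
```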
To solve this problem, we followed these steps:
- First and most importantly, we manually tagged a set of campaign titles to serve as model inputs, and selected the most appropriate tags
- Data pre-processing of campaign titles
- Modelling with help of RASA
- Validating model results
Let’s understand each of these steps in detail:
1. Manually tagging campaign titles and selecting tags:
We performed manual tagging based on our own domain knowledge and validated the tags with subject matter experts. While tagging for the first time, we tagged each campaign title at the most granular level. After tagging, we performed EDA on the number of campaign titles present per tag. Based on the results, we decided either to merge a few tags or to split a tag into more granular ones, so that the number of campaign titles per tag is neither too large nor too small.
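The per-tag frequency check described above can be sketched with a simple count (the tag values here are illustrative):

```python
from collections import Counter

# One manually assigned tag per labelled campaign title (illustrative data).
labelled_tags = ["Discount", "Discount", "Weekend", "Seasonal",
                 "Discount", "Brand Promo", "Weekend", "Discount"]

tag_counts = Counter(labelled_tags)
print(tag_counts.most_common())

# Tags with very few titles are candidates for merging into a broader tag;
# tags that dominate the distribution are candidates for splitting.
```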
2. Campaign title pre-processing:
Data pre-processing is an essential step in building a machine learning model; the quality of the results depends on how well the data has been pre-processed. The same applies to NLP tasks. We performed the following pre-processing steps on the campaign title to improve model performance –
- Lower-casing the entire campaign title
- Stop-word removal: Stop words are very commonly used words (a, an, the, etc.). These words add little value as they appear in most campaign titles and subject lines
- Handling emojis and punctuation: Emojis and punctuation have a significant impact on how a campaign title reads. So, we converted emojis and a few punctuation marks into text form for better understanding
- Stemming: The process of transforming a word to its root form (e.g., “booking” → “book”)
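A minimal, standard-library-only sketch of these steps (the stop-word list, emoji map, and suffix-stripping stemmer below are simplified stand-ins for what a real pipeline, e.g. NLTK, would provide):

```python
import re

# Tiny illustrative stand-ins; a production pipeline would use fuller lists.
STOP_WORDS = {"a", "an", "the", "is", "on", "for", "and", "of", "to"}
EMOJI_MAP = {"🔥": " fire ", "🎉": " party ", "⏰": " alarm "}

def stem(word: str) -> str:
    # Crude suffix stripping; a real pipeline would use a Porter stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(title: str) -> str:
    title = title.lower()                           # 1. lower-casing
    for emoji, text in EMOJI_MAP.items():           # 2. emojis -> text
        title = title.replace(emoji, text)
    title = re.sub(r"[!?.,:;]", " ", title)         # 2. strip punctuation
    tokens = [t for t in title.split() if t not in STOP_WORDS]  # 3. stop words
    return " ".join(stem(t) for t in tokens)        # 4. stemming

print(preprocess("🔥 The Biggest Sale of the Year is ON!"))
# -> fire biggest sale year
```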
3. Modelling with RASA:
We used the RASA stack for modelling. We chose RASA over other options because it is open source, easy to customize, and does not require hosting your data in the cloud. This means you can host it on your own server, unlike other frameworks. It lacks a web interface for easy implementation, but you can implement it easily by creating JSON or markdown files. If you need more control over customization and don’t want to share your private data in the cloud, RASA is the best option available.
RASA comes with two components:
- RASA NLU library provides the functionality of intent classification and entity extraction from the user input
  - Intent Classification: Classifies user input into predefined categories
  - Entity Extraction: Identifies important information in the user input
- RASA CORE framework is a dialogue engine for building AI assistants. Rather than a bunch of if/else statements, it uses a Machine Learning model trained on example conversations to decide the next best action.
These two components of RASA work independently of each other, so RASA lets you use the NLU component on its own. This is useful when you just need your input text categorized into a given set of categories, or need to extract some useful words from the input string.
For training purposes, RASA NLU needs training data and a pipeline to process that data.
The training data for RASA NLU is structured into different parts:
- Common Examples: Input text pre-tagged with intents and entities
- Synonyms: Similar words for already extracted entities, e.g., mapping “Chinease” to “Chinese”
- Regex Features: Regex patterns for entity extraction
- Lookup Tables: Case-sensitive lists of words that can be mapped to a particular entity
“Common Examples” is the only mandatory input for the RASA NLU model, but providing the other inputs helps boost model performance. Even for “Common Examples”, you need to provide at least six examples per intent or entity for good model performance. Moreover, add at least two intents to get proper results from the interpreter.
RASA supports two input formats for training data: Markdown and JSON. Both take the same input; they differ only in how you provide the training data.
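Here’s what a small slice of such training data could look like in the Markdown format (the intent names, entity names, and examples below are illustrative, not our production data):

```md
## intent:discount
- flat [50%](discount_value) off on all footwear
- extra [20%](discount_value) off with code SAVE20

## intent:seasonal
- diwali sale is live now
- christmas special collection is here

## synonym:Chinese
- Chinease

## regex:discount_value
- [0-9]{1,2}%
```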
4. Training and validating model results:
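With the Python API of the rasa_nlu versions available at the time, the training step looks roughly like this (the data, config, and model names below are assumptions for illustration):

```python
from rasa_nlu.training_data import load_data
from rasa_nlu import config
from rasa_nlu.model import Trainer

# Load the Markdown/JSON training data and the pipeline configuration
training_data = load_data("nlu_data.md")
trainer = Trainer(config.load("nlu_config.yml"))

# Train, then persist the model to the specified directory
interpreter = trainer.train(training_data)
model_directory = trainer.persist("./models", fixed_model_name="campaign_tagger")

# The same interpreter (or one loaded from model_directory) parses new titles
print(interpreter.parse("flat 50% off on all footwear"))
```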
This trains the model with the given input data and training pipeline, and saves the model in the specified directory. The NLU interpreter then uses this model to predict the intent and entities for any new user input. The interpreter provides the output in the following format:
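The parse output of RASA NLU is a JSON structure along these lines (the intent names, entity values, and confidence scores below are illustrative):

```json
{
  "intent": {"name": "discount", "confidence": 0.87},
  "entities": [
    {
      "start": 5,
      "end": 8,
      "value": "50%",
      "entity": "discount_value",
      "extractor": "ner_crf"
    }
  ],
  "intent_ranking": [
    {"name": "discount", "confidence": 0.87},
    {"name": "seasonal", "confidence": 0.08}
  ],
  "text": "flat 50% off on all footwear"
}
```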
It returns the extracted entities (if any) along with the intents and their confidence scores, in ranked order. As the output shows, RASA correctly classified the out-of-sample text into the right category and extracted entities as per the domain need. With proper training data and a custom pipeline, you can easily create a custom RASA NLU model to accomplish your task.
Campaign smart tagging plays a key role in deploying and measuring the performance of various multi-channel campaigns. This was our attempt at breaking down how the magic of AI makes this possible at scale. We shall continue to demystify the cutting-edge technology that enables you to personalize and delight your customers at scale in future posts as well!