This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free).
A brief look at the provided data.
The data is contained in three files:
- portfolio.json — containing offer ids and meta data about each offer (duration, type, etc.)
- profile.json — demographic data for each customer
- transcript.json — records for transactions, offers received, offers viewed, and offers completed
Here is the schema and explanation of each variable in the files:

portfolio.json
- id (string) — offer id
- offer_type (string) — type of offer, i.e. BOGO, discount, informational
- difficulty (int) — minimum required spend to complete an offer
- reward (int) — reward given for completing an offer
- duration (int) — time for offer to be open, in days
- channels (list of strings)

profile.json
- age (int) — age of the customer
- became_member_on (int) — date when customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer id
- income (float) — customer’s income

transcript.json
- event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str) — customer id
- time (int) — time in hours since start of test; the data begins at time t=0
- value (dict of strings) — either an offer id or transaction amount depending on the record
In this project, customer demographics and transaction records are analyzed to predict whether a customer is likely to accept or reject a given offer. This is essential information for any company, helping it target advertisements and offers at the customers most likely to be influenced by a particular offer.
The project is broken down into two parts:
- Data Exploration and Feature Creation
- Modelling and Predictions.
Data Exploration and Analysis
Let's start by going through the data files. Since the variables are already explained above, we begin by visualizing the distribution of customers by age.
It can be seen that the distribution contains an unrealistically high frequency count close to the age of 120. On closer inspection, the values in this age group turned out to consist of incomplete information (missing income values) and were removed from further analysis.
Next, the customers were grouped by gender to see if the distribution was balanced.
The gender “O” (Other) exists in very small numbers relative to male and female profiles. Considering that gender-specific behavior could affect how an offer is perceived, gender “O” does not provide enough samples. Due to this imbalance, these entries were removed from the dataset.
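The two cleaning steps above can be sketched in pandas; the toy frame below stands in for the real profile.json (which would be loaded with pd.read_json), and the specific values are made up for illustration:

```python
import pandas as pd
import numpy as np

# Toy stand-in for profile.json
profile = pd.DataFrame({
    "age":    [25, 118, 40, 118, 33, 52],
    "income": [52000.0, np.nan, 76000.0, np.nan, 48000.0, 61000.0],
    "gender": ["M", None, "F", None, "O", "F"],
})

# The age ~118 rows carry no income/gender information, so drop
# every row with a missing income
profile = profile[profile["income"].notna()]

# Gender "O" is too rare to learn from, so it is removed as well
profile = profile[profile["gender"].isin(["M", "F"])]

print(profile.shape)  # (3, 3)
```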
Another interesting aspect to look at is how long customers have been members of the Starbucks app. To visualize this, the date when the most recent customer became a member was used as a reference point for computing every customer's membership length.
The figure shows strong grouping in how the customer count rises or drops on certain days. This provides an interesting viewpoint and can be used to split customers into categories such as new members, long-term members, etc.
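The tenure computation and categorization described above can be sketched as follows. The YYYYMMDD parsing matches the became_member_on format in the schema, but the bin edges and category names are illustrative assumptions, not the article's exact groups:

```python
import pandas as pd

# Toy became_member_on values in the dataset's YYYYMMDD integer format
profile = pd.DataFrame({"became_member_on": [20130715, 20160101, 20171230, 20180420]})

# Parse the integer dates and measure tenure against the newest member
joined = pd.to_datetime(profile["became_member_on"], format="%Y%m%d")
profile["member_days"] = (joined.max() - joined).dt.days

# Bucket tenure into rough membership categories
# (bin edges are illustrative: <1 year, 1-3 years, >3 years)
profile["member_type"] = pd.cut(
    profile["member_days"],
    bins=[-1, 365, 3 * 365, float("inf")],
    labels=["new", "regular", "long_term"],
)

print(profile["member_type"].tolist())  # ['long_term', 'regular', 'new', 'new']
```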
This is exactly what is looked at next. Let's look at the distribution of customers by gender and by the longevity of their membership as Starbucks customers.
Understandably, in the majority of cases males outnumber females, since females have a lower count in the dataset overall. However, it is interesting to see that long-term members exist in almost equal numbers for both genders. In percentage terms, this means a higher share of females are long-term members than their male counterparts.
Finally, these categorical features are converted to numeric values for use in the modelling later on. Here is how the profile dataset looks in the end.
This file contains the different types of offers provided by Starbucks: a total of 10 offers (4 BOGO, 4 discount and 2 informational). Offers of the same type differ in features such as reward, duration, difficulty and/or channel of transmission.
The portfolio is used in parallel with the transcript file to create the target variable and other features, which will be discussed later on. For the time being, the channels and offer id features are label encoded. The final version of the portfolio can be seen below.
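One plausible way to encode these features is sketched below; the exact encoder choices in the original analysis are not shown, so using MultiLabelBinarizer for the list-valued channels column and LabelEncoder for the id are assumptions, and the offer ids are shortened toy values:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Toy stand-in for portfolio.json
portfolio = pd.DataFrame({
    "id": ["ae264e", "4d5c57", "3f207d"],
    "offer_type": ["bogo", "bogo", "informational"],
    "channels": [["email", "mobile", "social"],
                 ["web", "email"],
                 ["web", "email", "mobile"]],
})

# Expand the list-valued channels column into one indicator column per channel
mlb = MultiLabelBinarizer()
channel_dummies = pd.DataFrame(
    mlb.fit_transform(portfolio["channels"]),
    columns=mlb.classes_, index=portfolio.index,
)
portfolio = pd.concat([portfolio.drop(columns="channels"), channel_dummies], axis=1)

# Map the long offer id strings to small integers
portfolio["offer_id"] = LabelEncoder().fit_transform(portfolio["id"])

print(portfolio[["email", "mobile", "social", "web", "offer_id"]])
```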
The transcript file is essential to framing and solving the problem statement. It contains all the offer records (receiving, viewing, completing) as well as all the transactions made by each customer. It is used to analyze how customers reacted to an offer, and together with profile and portfolio it shows whether a customer was actually influenced by an offer.
NOTE: It is very important to understand what constitutes offer influence. It can be defined with the rules below.
- An offer had influence if, after receiving it, the customer viewed it before completing it, all within the given offer period.
- A received offer had no influence if the customer completed it but did not view it, or viewed it only after completing it.
- Informational offers have no completion event. If a customer viewed an informational offer and then made a transaction within its advertisement period, the customer is considered influenced by the advertisement. If a customer made a transaction but did not look at the informational advertisement, then even though a transaction was made, there was no influence.
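Under these rules, a single received offer can be labeled with a small helper. The function name, its time-based arguments, and the simplification of treating an informational offer's qualifying transaction as its "completion" are assumptions made for illustration, not the article's exact implementation:

```python
def offer_influence(received_t, viewed_t, completed_t, duration_days):
    """Label one received offer as influential (1) or not (0).

    Times are in hours since the start of the test, matching the transcript
    schema. viewed_t / completed_t are None when that event never happened.
    For informational offers, completed_t stands for the time of a
    transaction made during the advertisement window.
    """
    expiry = received_t + duration_days * 24  # duration is given in days

    if viewed_t is None or completed_t is None:
        return 0  # never viewed, or never completed / no transaction

    # Influence requires received -> viewed -> completed, all inside the window
    if received_t <= viewed_t <= completed_t <= expiry:
        return 1
    return 0
```

For example, an offer received at t=0, viewed at t=6 and completed at t=120 within a 7-day window is labeled 1, while the same offer viewed only at t=132 (after completion) is labeled 0.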
Let's have a look at a real example.
We can see that the offer received, viewed and completed are all sequential and within the offer duration period. This offer therefore had a positive influence on the customer.
In this case, the offer was viewed only after it was completed. This means the purchase was not driven by the offer, therefore there was no influence.
We can see here that the first offer received a positive response/influence (1) and the second offer received no influence/response (0).
Finally the features above are combined with the Response column being the target variable. The final dataset after feature scaling can be seen below.
Before we start with data modelling, let's talk about a good evaluation metric. Remember, we are dealing with a binary classification problem, so accuracy could be a good starting metric. However, accuracy can be misleading, especially with imbalanced classes. Therefore the F1-score, the harmonic mean of precision and recall, is used as the evaluation metric, since it provides a better evaluation for imbalanced classes.
- Recall is the ratio `tp / (tp + fn)`, where `tp` is the number of true positives and `fn` the number of false negatives.
- Precision is the ratio `tp / (tp + fp)`, where `tp` is the number of true positives and `fp` the number of false positives.
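These definitions can be checked numerically with scikit-learn; the toy label vectors below are made up for illustration and chosen so that tp=3, fp=1, fn=1:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print(f1_score(y_true, y_pred))         # 2 * 0.75 * 0.75 / (0.75 + 0.75) = 0.75
```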
Another metric that will be used to evaluate the results is the confusion matrix. A confusion matrix provides a great visual representation of how the model performed in terms of the predictions it made. It groups the results into true positives, true negatives, false positives and false negatives showing how the model performed for both classes.
In binary classification, the count of true negatives is C[0, 0], false negatives C[1, 0], true positives C[1, 1] and false positives C[0, 1].
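scikit-learn's confusion_matrix follows exactly this layout, which ravel() makes explicit (reusing the toy labels from the metric example, with tp=3, fp=1, fn=1, tn=1):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)

# ravel() flattens the 2x2 matrix row by row:
# C[0,0]=tn, C[0,1]=fp, C[1,0]=fn, C[1,1]=tp
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 1 1 1 3
```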
Modelling and Predictions
This task is essentially a binary classification problem. For modelling, the XGBoost classifier is used. XGBoost is a state-of-the-art distributed gradient boosting framework that has frequently outperformed other classification algorithms. To start off, the estimator was initialized and tested with its default values.
The initial results produced a decent F1-score of 84.86%. The confusion matrix can be seen below. It shows that the model has a slightly higher tendency to predict that offers have an influence.
Next, a randomized grid search is performed to tune the hyperparameters and see if the model's performance can be improved. Stratified k-fold cross-validation is used, preserving the sample percentage of each class in every fold.
Based on the best estimator parameters obtained from the search, the model is run again with those parameters. The F1-score improves to 86.24%. Looking at the confusion matrix, both the true positive and true negative counts have increased while the false positive and false negative counts have decreased, showing overall better predictions for both classes.
Furthermore, let's have a look at the feature importances generated.
The amount spent seems to have the highest influence on the model's decisions. However, the majority of the other features also play a significant role and cannot be neglected.
In this article, information was analyzed that mimicked data from the Starbucks app. Information such as customer demographics, customer transactions and offer types were compounded together to see the influence different offers had on different customer types.
Initially, different types of customers were visualized and grouped; later on, feature engineering was done bearing in mind what exactly constitutes an influential offer.
Finally, data modelling was performed with f1-score and confusion matrix chosen as the evaluation metrics providing decent results in differentiating whether a customer would be influenced by a certain offer or not.
Of course, there is always room for improvement, and the following are a few ways the predictions could be improved further.
- More Feature Engineering: More intricate features could be created leading to better predictions
- Other Estimators: Although XGBoost outperforms several other estimators, there can be cases where another classifier may outperform XGBoost.
- More intensive hyperparameter tuning: rather than a randomized search, an exhaustive grid search could be used, time permitting, to tune the hyperparameters even more.
The complete code of the analysis can be found on my GitHub Repository.