Assignment 3: Machine Learning in Retail
Brief
You are part of a team of data analysts that was contracted by Turtle Games, a game manufacturer and retailer with a global customer base. To improve overall sales performance, you should explore how customers accumulate loyalty points, whether they can be segmented into groups, how reviews can be used to understand the business's reputation and whether predictive models can be built using existing data from the customer loyalty scheme.
Problem statement
Turtle Games’ sales have stagnated. Likely causes include marketing processes that fail to leverage the loyalty programme. Customers accumulate points, but the associated data is not used in targeted initiatives. Additionally, customer reviews have not been evaluated for sentiment. Together, these factors leave Turtle Games with underdeveloped insights and marketing strategies.
Objective
Build models that predict loyalty and gauge sentiment from reviews to inform new marketing strategies, increase engagement and boost sales.
Tools and data
Tools: Python, R
Python Libraries: pandas (data operations), NumPy (numeric operations), datetime (date manipulation), seaborn and matplotlib (visualisations), re (regular expressions), WordCloud and STOPWORDS (word-cloud creation), scikit-learn (predictive analysis), NLTK (Natural Language Processing), scipy.stats (statistical analysis and tests).
R Libraries: tidyverse (data cleaning, visualisations), skimr (data summaries).
Data: A CSV file containing 2,000 customer records describing demographics (income, education, etc.), loyalty point accumulation and text from online reviews.
Method
- Data cleaning and exploration using R. High-level visualisations were produced to highlight the business problem.
- Simple Linear Regression and Regression Decision Trees to predict the thresholds at which most loyalty points accumulate and to assess the explanatory power of my models.
- K-Means clustering to identify discrete customer segments.
- Natural Language Processing and Sentiment Analysis to understand customer feedback at scale and identify any risks to my proposed strategy.
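The regression step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the data is synthetic, and the two features (a 1-100 spending score and income in thousands) and their coefficients are hypothetical. It fits one simple linear regression per predictor to compare R², then reads a candidate threshold from the root split of a shallow regression tree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the customer file (hypothetical features and coefficients).
rng = np.random.default_rng(42)
n = 500
spending = rng.uniform(1, 100, n)   # hypothetical spending score, 1-100
income = rng.uniform(10, 110, n)    # hypothetical income, in thousands
loyalty = 15 * spending + 10 * income + rng.normal(0, 100, n)

# Simple linear regression: one predictor at a time, compared on R^2.
r2 = {}
for name, x in {"spending": spending, "income": income}.items():
    X1 = x.reshape(-1, 1)
    r2[name] = LinearRegression().fit(X1, loyalty).score(X1, loyalty)

# Shallow regression tree: each split value is a candidate threshold.
features = ["spending", "income"]
X = np.column_stack([spending, income])
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, loyalty)
root_feature = features[tree.tree_.feature[0]]
root_threshold = tree.tree_.threshold[0]

print(r2)
print(f"root split: {root_feature} <= {root_threshold:.1f}")
print(export_text(tree, feature_names=features))
```

Comparing R² across single-predictor fits is one simple way to gauge explanatory power; capping the tree depth keeps the number of thresholds small enough to translate into reward tiers.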
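The K-Means step can be sketched in the same hedged way. The two features, the three synthetic customer groups, and the use of standardisation plus silhouette scores to choose k are all assumptions for illustration; the project may have chosen k differently, for example from an elbow plot of inertia alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic customers around three hypothetical (income, spending) centres.
rng = np.random.default_rng(0)
centres = np.array([[20, 80], [55, 50], [90, 20]])
X = np.vstack([c + rng.normal(0, 5, size=(100, 2)) for c in centres])

# Standardise so both features contribute equally to Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# For each k, record inertia (for an elbow plot) and the silhouette score.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    scores[k] = (km.inertia_, silhouette_score(X_scaled, km.labels_))

# Pick the k with the highest silhouette, then refit to get final segments.
best_k = max(scores, key=lambda k: scores[k][1])
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
print(best_k, np.bincount(final.labels_))
```

The fitted `final.labels_` would then be joined back to each customer record to build the personas used for tailored rewards.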
Insights
- Spending as Primary Driver: Total spending is the main driver of loyalty point accumulation for the majority of customers.
- Income-Driven Thresholds: High-value customer segments were identified where point accumulation thresholds correlated more strongly with income.
- Customer Sentiment: Customer reviews were found to be overwhelmingly positive, and statistical analysis of the review text surfaced no blockers to my proposed strategy.
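A toy version of the review-scoring idea behind the sentiment insight is shown below. It is a stand-in, not the project's NLTK pipeline: the stopword set, the two word lists and the three reviews are invented for illustration, and a real analysis would use a full stopword list and an established scorer such as NLTK's VADER. It cleans text with `re`, counts word frequencies (the input a word cloud needs) and reports the share of positive reviews.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; not a real sentiment lexicon.
STOP = {"the", "a", "and", "is", "it", "to", "this", "for", "was", "of", "my"}
POSITIVE = {"great", "fun", "love", "good", "excellent"}
NEGATIVE = {"boring", "broken", "bad", "poor", "missing"}


def tokens(text: str) -> list[str]:
    """Lower-case, keep alphabetic runs, drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP]


def polarity(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    toks = tokens(text)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits


# Made-up example reviews, not taken from the Turtle Games data.
reviews = [
    "Great game, my kids love it!",
    "Fun for the whole family.",
    "Box arrived broken and a piece was missing.",
]
freq = Counter(t for r in reviews for t in tokens(r))
scores = [polarity(r) for r in reviews]
share_positive = sum(s > 0 for s in scores) / len(scores)
print(scores, round(share_positive, 2))
```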
Recommendations
- Launch targeted rewards programmes built around the predictive thresholds identified.
- Tailor rewards specifically to the customer personas developed from the clustering process.
- Target adverts at the specific audiences identified through segmentation.
Files
Submitted project files and Facilitator Feedback