Leveraging LLMs to uncover experience gaps at Nike Soho

Fredrik Sterner Cederlöf
Published in UX Collective
15 min read · Sep 18, 2024

Nike Soho Customer Insights based on Google Reviews. Photo by Jerome on Unsplash.
Image of a Kung Fu restaurant
Photo by Andreas Immke on Unsplash

Suddenly, you possess a new skillset

Boost customer experience with generative AI

Process map from customer service cases to a generative AI solution
CAIr — A built-in RAG solution with Azure AI and Salesforce Service Cloud

Experience drivers

Workflow

AI-powered workflow through ChatGPT 4o, Visual Studio Code and a Streamlit web application

1. Data collection & preparation

Google Reviews

ChatGPT prompt: Structure Google review data into columns and create a table
Nike Soho Google Review example
Example: Nike Soho Google review comment
import os

import cv2
import pandas as pd
import pytesseract
from PIL import Image


# Function to extract text from an image with Tesseract OCR
def extract_text_from_image(image_path):
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)
    return text


# Function to detect the number of yellow stars in an image
def detect_stars(image_path):
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None instead of raising when a file cannot be read
        return 0
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Define the range for yellow color in HSV
    lower_yellow = (20, 100, 100)
    upper_yellow = (30, 255, 255)

    # Threshold the image to get only yellow colors
    mask = cv2.inRange(hsv, lower_yellow, upper_yellow)

    # Find contours of the stars
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    # Filter out small contours that are not stars
    star_contours = [c for c in contours if cv2.contourArea(c) > 100]

    return len(star_contours)


# Directory containing the images
image_directory = "./ReviewImages"

# List to store the data
all_data = []

# Iterate over each image in the directory (sorted for a stable row order)
for filename in sorted(os.listdir(image_directory)):
    if filename.lower().endswith((".png", ".jpg")):
        image_path = os.path.join(image_directory, filename)
        text = extract_text_from_image(image_path)
        star_count = detect_stars(image_path)
        all_data.append([filename, text, star_count])

# Create a DataFrame from the list
df = pd.DataFrame(all_data, columns=["Image File", "Extracted Text", "Star Rating"])

# Save the DataFrame to an Excel file
df.to_excel("extracted_reviews.xlsx", index=False)

# Display the DataFrame
print(df)

Data cleaning and structuring

ChatGPT prompt: Structure and parse data into proper columns based on the
uploaded template and make sure that names are placed in a different
column called Reviewer Name; the text always starts with the name of the
reviewer. Where it says "Translated by Google - See original", only
write the language within () in a separate column named Language.
ChatGPT prompt: Provide Python code that structures the Google reviews
based on these conditions and saves the result as an Excel file.
ChatGPT prompt example: Translate text to numbers for star rating,
please make the following changes.

Replace * with Rating: 1
Replace kk with Rating: 2
Replace kkk with Rating: 3
Replace we wk &k *& ~~ with Rating: 4
Replace wk ee with Rating: 5
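The replacements listed above amount to a lookup table from garbled OCR output to star ratings. Here is a minimal sketch of that idea, assuming each quoted string is one complete OCR rendering of the star row. OCR noise varies between screenshots, so real data would likely need more entries. The names `OCR_STAR_MAP` and `ocr_stars_to_rating` are made up for this illustration.

```python
from typing import Optional

# Garbled OCR renderings of the star row, copied from the prompt above.
# Assumption: each string is one complete rendering, matched exactly.
OCR_STAR_MAP = {
    "*": 1,
    "kk": 2,
    "kkk": 3,
    "we wk &k *& ~~": 4,
    "wk ee": 5,
}


def ocr_stars_to_rating(raw: str) -> Optional[int]:
    """Return the star rating for a garbled OCR string, or None if unknown."""
    return OCR_STAR_MAP.get(raw.strip())


print(ocr_stars_to_rating(" kkk "))  # 3
print(ocr_stars_to_rating("???"))    # None
```

Returning `None` for unknown strings keeps unmapped rows visible, so they can be reviewed by hand instead of being silently assigned a rating.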
import re

import pandas as pd

# Load the data from the provided Excel file
file_path = 'extracted_reviews.xlsx'
df = pd.read_excel(file_path)


# Define a function to parse the extracted text into the desired attributes
def parse_review_text(text):
    try:
        # The reviewer name is the first line of the extracted text
        reviewer_name = re.search(r"^[^\n]*", text).group(0).strip()

        reviews_count_match = re.search(r"(\d+) reviews", text)
        photos_count_match = re.search(r"(\d+) photos", text)
        # Accept "3 months ago" as well as "a year ago", "a month ago", "an hour ago"
        time_ago_match = re.search(r"\b((?:\d+|an?)\s+\w+ ago)", text)
        review_text_match = re.search(r"ago\n(.*)", text, re.DOTALL)

        reviews_count = reviews_count_match.group(1).strip() if reviews_count_match else None
        photos_count = photos_count_match.group(1).strip() if photos_count_match else None
        time_ago = time_ago_match.group(1).strip() if time_ago_match else None
        if time_ago:
            # Normalize "a year ago" to "1 year ago" (and likewise for other units)
            time_ago = re.sub(r"^an?\b", "1", time_ago)
        review_text = review_text_match.group(1).strip().replace('\n', ' ') if review_text_match else None

        # Reviews without a translation marker are assumed to be in English
        language = "English"
        language_match = re.search(r"Translated by Google - See original \((\w+)\)", text)
        if language_match:
            language = language_match.group(1).strip()

        return reviewer_name, reviews_count, photos_count, time_ago, review_text, language
    except (AttributeError, TypeError):
        # TypeError covers empty cells, which pandas loads as NaN (a float)
        return None, None, None, None, None, None


# Apply the parsing function to each row
parsed_data = df['Extracted Text'].apply(parse_review_text)

# Create a new DataFrame with the structured data
structured_df = pd.DataFrame(
    parsed_data.tolist(),
    columns=['Reviewer Name', 'Reviews Count', 'Photos Count', 'Time Ago', 'Review Text', 'Language'],
)

# Add the 'Rating' column
structured_df.insert(3, 'Rating', df['Star Rating'])

# Remove rows where 'Review Text' is None or empty
structured_df = structured_df[structured_df['Review Text'].notna() & (structured_df['Review Text'] != '')]

# Save the structured DataFrame to an Excel file
structured_excel_path = 'structured_reviews.xlsx'
structured_df.to_excel(structured_excel_path, index=False)

# Display the structured DataFrame
print(structured_df.head())
Snapshot of our created data frame stored in structured_reviews.xlsx
ChatGPT prompt: Attached dataset contains Google Reviews from a 
Nike retailer in Soho NY. Could you provide a summary of the dataset's
contents and suggest recommendations for further analysis?
Output
The result will yield 8 columns of data, capturing comments from 621
reviews of the Nike Soho Retailer in New York.

Dataset
ID: Unique identifier for each review
Reviewer Name: Name of the reviewer
Reviews Count: Number of reviews the reviewer has written
Photos Count: Number of photos the reviewer has uploaded
Rating: Rating given by the reviewer
Time Ago: Time since the review was posted
Review Text: Text of the review
Language: Language of the review

Recommendations for further analysis (summarized version)
- Rating distribution: distribution of star ratings, compared with the average rating
- Sentiment analysis: determine the overall tone and correlate it with ratings
- Review frequency: review frequency over time, to evaluate recency trends
- Reviewer insights: segment reviewers by review count and check whether their ratings differ
- Language distribution: distribution of languages and differences between them
- Text analysis: keyword and topic analysis to find common themes
Star rating distribution of Google Reviews
ChatGPT prompt Rating: Visualize the distribution of ratings from 1 to 5 as 
a percentage of the total using a bar chart. When done export data as Excel.
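The numbers behind such a chart are easy to check by hand. The sketch below computes each star rating's share of the total using only the standard library. It illustrates the calculation and is not the code ChatGPT ran; the sample ratings are invented. With pandas, `df['Rating'].value_counts(normalize=True)` gives the same shares as fractions.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return {star: percent of total} for stars 1-5, rounded to one decimal."""
    counts = Counter(ratings)
    total = len(ratings)
    if total == 0:
        return {star: 0.0 for star in range(1, 6)}
    return {star: round(100 * counts.get(star, 0) / total, 1) for star in range(1, 6)}


# Invented ratings for illustration only, not the Nike Soho data.
sample = [5, 5, 4, 1, 5, 3, 1, 5]
for star, pct in rating_distribution(sample).items():
    print(f"{star} stars: {pct}%")
```

Listing all five star levels explicitly keeps a level with zero reviews on the chart instead of dropping it.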
Average sentiment score over time
ChatGPT prompt Sentiment: 
Visualize sentiment score over time based on review text divided by quarter
in a line graph. Add number of reviews as value label.
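The dataset only stores a relative "Time Ago" value, so a quarterly view has to estimate each review's date first. This is a rough sketch under stated assumptions: sentiment scores already exist, `today` is the date the reviews were collected, and months and years are approximated as 30 and 365 days. Values like "2 years ago" are coarse, so the quarter assignment is only approximate. The function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

# Approximate length of each unit in days (an assumption; months and years vary).
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def approx_quarter(time_ago, today):
    """Map a relative string such as '3 months ago' to a label like '2024-Q2'."""
    match = re.match(r"(\d+|an?)\s+(day|week|month|year)s?\s+ago", time_ago.strip())
    if not match:
        return None
    amount = 1 if match.group(1) in ("a", "an") else int(match.group(1))
    posted = today - timedelta(days=amount * UNIT_DAYS[match.group(2)])
    return f"{posted.year}-Q{(posted.month - 1) // 3 + 1}"


def sentiment_by_quarter(rows, today):
    """rows: iterable of (time_ago, sentiment). Returns {quarter: (mean, count)}."""
    buckets = defaultdict(list)
    for time_ago, sentiment in rows:
        quarter = approx_quarter(time_ago, today)
        if quarter is not None:
            buckets[quarter].append(sentiment)
    return {q: (round(sum(v) / len(v), 2), len(v)) for q, v in sorted(buckets.items())}


# Invented rows for illustration only.
rows = [("3 months ago", 0.4), ("3 months ago", 0.2), ("a year ago", -0.1)]
print(sentiment_by_quarter(rows, today=date(2024, 9, 18)))
```

The count returned next to each mean corresponds to the value label requested in the prompt, and it shows which quarters rest on only a handful of reviews.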
Google review feedback themes
ChatGPT prompt Feedback themes: Please identify and categorize the main 
themes from the review texts using topic modeling, then provide the
percentage of reviews associated with each theme.
Average review rating for each language
ChatGPT prompt Review per language: Visualize average review score for each 
language with a bar chart. Then conduct a review summary of the languages with
lowest scores and what they are complaining about.
Output:
Both French and Spanish-speaking customers have relatively lower average
ratings compared to other languages, which could indicate specific
concerns or dissatisfaction among these groups. This aligns with the
earlier finding that French-speaking reviewers expressed concerns
about product choices and store-related issues.
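A per-language average is only as reliable as the number of reviews behind it. The sketch below shows one way to compute the averages while filtering out languages with too few reviews. The `min_reviews` threshold is an assumption added here, not something from the original analysis, and the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def average_rating_by_language(rows, min_reviews=1):
    """rows: iterable of (language, rating).

    Returns [(language, mean rating, review count)] sorted lowest mean first,
    skipping languages with fewer than min_reviews reviews.
    """
    grouped = defaultdict(list)
    for language, rating in rows:
        grouped[language].append(rating)
    summary = [
        (language, round(mean(ratings), 2), len(ratings))
        for language, ratings in grouped.items()
        if len(ratings) >= min_reviews
    ]
    return sorted(summary, key=lambda item: item[1])


# Invented rows for illustration only.
rows = [("English", 5), ("English", 4), ("French", 2), ("French", 3), ("Spanish", 1)]
print(average_rating_by_language(rows, min_reviews=2))
```

Here the single Spanish review is excluded, so one unhappy visitor does not stand in for a whole language group.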

2. Data analysis and processing

OpenAI interface showing how to upload an Excel file in GPT 4o
OpenAI — ChatGPT 4o interface

Feedback topics and sentiment score

ChatGPT prompt: Analyze and classify all customer feedback into as many 
specific topics as possible based on its theme. Avoid using a generic
'Other' category if possible, and instead, create additional topics that
capture unique aspects of the feedback.

Then, perform a sentiment analysis on the 'Review Text' and add the
sentiment score to a new column named 'Sentiment,' rounding the numbers
down to two decimals.
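"Rounding down" can be read two ways: flooring the value, or simply rounding to the nearest two decimals. The small sketch below shows the difference, which helps when checking what ChatGPT actually returned. The helper `round_down` is hypothetical, and it uses `decimal` to avoid floating-point surprises when flooring.

```python
from decimal import Decimal, ROUND_FLOOR


def round_down(value, decimals=2):
    """Round toward negative infinity at the given number of decimals."""
    quantum = Decimal(10) ** -decimals  # Decimal('0.01') for two decimals
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_FLOOR))


print(round_down(0.789))   # 0.78 (floored)
print(round(0.789, 2))     # 0.79 (rounded to nearest)
print(round_down(-0.123))  # -0.13 (floor moves negative scores further from zero)
```

For negative sentiment scores, flooring makes the value slightly more negative, so the two readings of the prompt can produce visibly different columns.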
Output
The customer feedback has been analyzed and contains the following topics:

Customer Service, Product/Inventory, Location/Facility, Pricing,
Parking/Access, Cleanliness, Checkout/Payment, Operating Hours,
Return/Exchange Policy, Ambiance.

The "Miscellaneous" category is present for reviews that do not clearly
fit into the other, more specific categories.
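For a quick sanity check of the LLM's classification, a plain keyword rule can serve as a baseline. This is a deliberately simpler technique than what ChatGPT does, swapped in for illustration. The topic names come from the output above, while the keyword lists are guesses that would need tuning against real reviews.

```python
import re

# Hypothetical keyword lists: topic names are from the output above,
# the keywords are illustrative guesses, not derived from the dataset.
TOPIC_KEYWORDS = {
    "Customer Service": ["staff", "service", "employee", "helpful", "rude"],
    "Product/Inventory": ["stock", "size", "selection", "shoes", "inventory"],
    "Pricing": ["price", "expensive", "cheap", "discount"],
    "Checkout/Payment": ["checkout", "line", "queue", "cashier", "payment"],
}


def tag_topics(review_text):
    """Return every topic whose keywords appear as whole words in the review."""
    words = set(re.findall(r"[a-z']+", review_text.lower()))
    topics = [topic for topic, kws in TOPIC_KEYWORDS.items() if words & set(kws)]
    # Fall back to the catch-all bucket, mirroring the "Miscellaneous" category
    return topics or ["Miscellaneous"]


print(tag_topics("The staff was rude and the line was long"))
print(tag_topics("Nice building"))
```

Comparing the share of reviews that land in "Miscellaneous" under both approaches gives a rough sense of how much the LLM adds over simple rules.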
Snapshot of data containing topics and sentiment scores

Count topics and average scores

ChatGPT prompt: Count the occurrences of each topic and summarize the 
corresponding reviews. For each topic, calculate the average rating and
the average sentiment score, rounding both to two decimals.

Then, create a new DataFrame named ExperienceDrivers that includes the topic,
the total number of occurrences, the average rating, and the average
sentiment score and the summarized reviews.
Snapshot of The ExperienceDrivers data frame including feedback summaries
Output

- Topic: The specific theme identified in the customer feedback
- Occurrences: The total number of reviews mentioning this topic
- Average Rating: The average rating associated with the topic
- Average Sentiment: The average sentiment score calculated from the reviews
- Summarized Reviews: An AI-based summary of each topic

Data visualization

ChatGPT prompt: I need to understand the main experience drivers to identify 
gaps and opportunities for improvement. Please visualize a bubble chart with
the average sentiment score on the x-axis and the average rating on
the y-axis for each identified topic.

The size of each bubble should represent the number of reviews associated
with that topic. Ensure that each topic is clearly labeled on the plot to
highlight the relationship between sentiment, customer ratings, and review
volume.
Bubble chart containing experience drivers
ChatGPT prompt: First break up each sentence on separate rows, then analyze 
and classify all customer feedback into as many specific topics as possible
based on its theme. Avoid using a generic 'Other' or 'Miscellaneous' category
if possible, and instead, create additional topics that capture unique aspects
of the feedback.

Then, perform a sentiment analysis on the 'Review Text' and add the sentiment
score to a new column named 'Sentiment,' rounding the numbers down to
two decimals.
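The first step of this prompt, one sentence per row, can also be done locally before anything is sent to the model. Here is a rough sketch with a simple punctuation rule. It is not a full sentence tokenizer, so abbreviations such as "St." would be split incorrectly. The function names and sample review are invented.

```python
import re


def split_into_sentences(review_text):
    """Split after ., ! or ? when followed by whitespace (a rough rule)."""
    parts = re.split(r"(?<=[.!?])\s+", review_text.strip())
    return [part for part in parts if part]


def explode_reviews(reviews):
    """reviews: iterable of (review_id, rating, text). Yields one row per sentence."""
    for review_id, rating, text in reviews:
        for sentence in split_into_sentences(text):
            yield review_id, rating, sentence


# Invented review for illustration only.
for row in explode_reviews([(1, 4, "Great store! Staff was helpful. Long line though.")]):
    print(row)
```

Each sentence inherits the star rating of the whole review, so a complaint inside a 5-star review still carries a rating of 5. That is worth keeping in mind when reading average ratings per topic.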
Snapshot of all sentences including related topic and tonality
Experience drivers/topics scatter plot done with Python
Updated number of experience drivers/topics

Speak with your customers

OpenAI GPT 4o interface using the voice feature

3. Visualization and deployment

ChatGPT prompt: Summarize customer feedback by topic, counting entries, 
calculating average sentiment and rating (rounded to two decimals),
and generating a 5-sentence summary for each topic. The summary should
identify gaps and pain points (negative feedback) and highlight what's
working well (positive feedback).

The final data frame should include Topic, Frequency, Average Sentiment,
Average Rating, and AI Summary.
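Before asking for summaries, it can help to see which sentences would count as pain points and which as strengths for each topic. The sketch below separates them by the sign of the sentiment score. The zero threshold, the function name, and the sample rows are assumptions for illustration, and writing the summary itself is still left to the model.

```python
from collections import defaultdict


def split_feedback_by_tone(rows, threshold=0.0):
    """rows: iterable of (topic, sentiment, text).

    Returns {topic: {"pain_points": [...], "strengths": [...]}}. Scores equal
    to the threshold are treated as neutral and left out.
    """
    grouped = defaultdict(lambda: {"pain_points": [], "strengths": []})
    for topic, sentiment, text in rows:
        if sentiment < threshold:
            grouped[topic]["pain_points"].append(text)
        elif sentiment > threshold:
            grouped[topic]["strengths"].append(text)
    return dict(grouped)


# Invented rows for illustration only.
rows = [
    ("Customer Service", -0.6, "Staff ignored us."),
    ("Customer Service", 0.8, "Very helpful team."),
    ("Pricing", 0.0, "Prices are what they are."),
]
print(split_feedback_by_tone(rows))
```

Keeping the two lists side by side makes it easier to verify that a generated summary reflects both the gaps and what is working well.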
Experience drivers including AI-based summary
# app.py
import pandas as pd
import plotly.express as px
import streamlit as st

# Load the data from the Excel file
file_path = 'topic.xlsx'
df = pd.read_excel(file_path, sheet_name='Reclassified_Customer_Feedback_')

# Grouping the data by 'Feedback' and calculating the frequency, average sentiment, and average rating
feedback_summary = df.groupby('Feedback').agg(
    Frequency=('Feedback', 'size'),
    Average_Sentiment=('Sentiment', 'mean'),
    Average_Rating=('Rating', 'mean')
).reset_index()

# Rounding the sentiment and rating to two decimal places
feedback_summary['Average_Sentiment'] = feedback_summary['Average_Sentiment'].round(2)
feedback_summary['Average_Rating'] = feedback_summary['Average_Rating'].round(2)

# Scatter plot with Plotly; bubble size shows how often each topic occurs
fig = px.scatter(
    feedback_summary,
    x='Average_Sentiment',
    y='Average_Rating',
    size='Frequency',
    color='Feedback',
    hover_name='Feedback',
    title='Feedback Summary: Sentiment vs. Rating',
    labels={
        'Average_Sentiment': 'Sentiment',
        'Average_Rating': 'Google Review Rating'
    }
)

# Streamlit app title and plot display
st.title('Feedback Sentiment and Rating Analysis')
st.plotly_chart(fig)

streamlit run app.py
Streamlit application showing Experience Drivers
Nike Soho experience driver matrix visualised as a scatter plot


Written by Fredrik Sterner Cederlöf

Global Head of Customer Insights, AI & Analytics at Polestar. Currently on sabbatical and residing in Tokyo. CEO & Principal Analyst at Insight Works AB.
