Leveraging LLMs to uncover experience gaps at Nike Soho

Analyzing customer feedback and revealing insights that enhance customer satisfaction.

Published in

UX Collective

15 min readSep 18, 2024

Nike Soho Customer Insights based on Google Reviews. Photo by Jerome on Unsplash.

In 2018, I began a pivotal journey into R and statistics. Having worked with digital development since the late ’90s, I realized that continue moving into senior CX and analytics roles required a stronger theoretical foundation. Leading teams effectively wasn’t just about passion, stamina, and enthusiasm — it also demanded a thorough grasp of the field’s intricacies.

Determined to enhance my expertise, I immersed myself in a wide range of open resources related to R and statistical analysis. This dedication finally paid off and shortly after that, I secured a new role where I built a Global insight and analytics team at the EV company Polestar.

As I progressed, the Self-Determination Theory hit me, shifting focus from hands-on coding to strategic work and team management. This evolution not only broadened my capabilities but also amplified my ability to drive impactful initiatives. Leading and managing a team required my full attention, leaving little time for coding. For better or for worse.

Image of a Kung Fu restaurant — Photo by Andreas Immke on Unsplash

Suddenly, you posses a new skillset

Now, living in Tokyo as a Shufu (head of the home), I’ve decided to reflect on the journey that brought me here. So I went back in time. After one week of coding with the help of LLMs, I felt like Neo learning Kung Fu in The Matrix — waking to endless possibilities and thinking, “I know Python!”.

Of course, true expertise comes from continuous learning; at my current stage, my skills are only as strong as my latest prompt. Nevertheless, with a virtual and tireless Python assistant by my side, coupled with my experience in observing and leading data scientists, I felt as if I were wielding a samurai sword — ready to tackle and build any coding challenge that came my way.

Boost customer experience with generative AI

Having built a Voice of the Customer Program from the ground up, we quickly realized that integrating machine learning (ML) with customer feedback not only saved time but also revealed valuable opportunities that had previously been hidden. With the release of large language models (LLMs), the gap between analysis and findings narrowed significantly, boosting our ability to understand and respond to customer needs at an accelerated pace.

While large language models (LLMs) are undeniably powerful, I still recommend using supervised ML for tasks like text classification and topic analysis, as the results from LLMs can vary. Supervised models offer more consistency and control in these scenarios.

Looking back, the breakthrough was our smooth adoption of Generative AI, exemplified by CAIr — a Retrieval-Augmented Generation (RAG) solution for formulating customer responses saving up to 4 min/customer inquiry. Proudly with the same customer satisfaction levels as non-AI-generated answers, the solution became a Microsoft Azure OpenAI case study.

Process map from customer service cases to a generative AI solution — CAIr — A built-in RAG solution with Azure AI and Salesforce Service Cloud

Out of curiosity and earlier experience, I began exploring ChatGPT’s potential to accelerate opportunities from customer feedback. My goal was to assess how effectively it could support and maybe even replace modern text analytics tools.

After writing a post on how to identify customer experience gaps using prompts alone, it quickly led to the development of something more advanced: an Experience Driver Matrix Application. When used effectively, this tool helps uncover the key factors driving brand perceptions.

Experience drivers

Reading through endless customer feedback is not time-efficient. Experience drivers are therefore critical and commonly utilized when aiming to enhance a brand’s experience. These drivers encompass elements of a product, service, or interaction that significantly influence customer perceptions and behaviors based on feedback.

Understanding experience drivers is crucial, as they directly impact key outcomes such as customer satisfaction, loyalty, and advocacy. Traditionally, this functionality is built into Feedback Management platforms. The question then became — how far could I push this by relying solely on ChatGPT?

To illustrate this, I decided to focus on the only brand that makes me feel like the athlete I never became: Nike. As for the data source, Nike Soho Google Reviews.

Workflow

To transform reviews into actionable opportunities, we must carefully plan our workflow. While creating effective prompts may require some trial and error, the process is relatively straightforward and essential for success.

We start by gathering and structuring the review data, then move on to analysis and data processing. Finally, we visualize the key experience drivers and deploy a custom web application to bring the findings to life.

AI-powered workflow through ChatGPT 4o, Visual Studie Code and Streamlit web application

01. Data collection & preparation

While it’s possible to generate dummy data with ChatGPT, authentic customer feedback is essential for meaningful analysis. One of the most accessible and valuable sources for real customer feedback is of course Google Reviews.

Google Reviews

While numerous advanced data scraping methods and tools exist — each with its strengths and weaknesses — leveraging the Google Translate API proved ineffective for my needs due to challenges like infinite scroll handling and strict quota limits. Consequently, I chose a more manual approach which is capturing screenshots of individual Google reviews and employing OCR (Optical Character Recognition) with Pytesseract to extract data.

To streamline this process, I wrote a GPT prompt designed to extract and organize text from uploaded screenshots into structured columns. Once I was happy with the results, I proceeded to automate the workflow, ensuring efficiency and scalability for future data extraction tasks.

ChatGPT prompt: Strucuture Google review data into columns and create a table

Google Reviews OCR-scraping with CV2 and Pytesserac
The script outlined below automates textual content extraction and yellow star ratings. It processes screenshots stored in the local ReviewImages folder and utilizes Optical Character Recognition (OCR) to retrieve textual data and identify yellow stars by filtering them based on color.

Extracted data is then organized into a structured table and exported to Excel. This streamlined process effectively prepares us for the subsequent phase, which involves parsing unstructured text.

Example: Nike Soho Google review comment

import cv2
import pytesseract
from PIL import Image
import pandas as pd
import os

# Function to extract text from an image
def extract_text_from_image(image_path):
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)
    return text

# Function to detect the number of yellow stars in an image
def detect_stars(image_path):
    img = cv2.imread(image_path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Define the range for yellow color in HSV
    lower_yellow = (20, 100, 100)
    upper_yellow = (30, 255, 255)

    # Threshold the image to get only yellow colors
    mask = cv2.inRange(hsv, lower_yellow, upper_yellow)

    # Find contours of the stars
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    # Filter out small contours that are not stars
    star_contours = [c for c in contours if cv2.contourArea(c) > 100]

    return len(star_contours)

# Directory containing the images
image_directory = "./ReviewImages"

# List to store the data
all_data = []

# Iterate over each image in the directory
for filename in os.listdir(image_directory):
    if filename.endswith(".png") or filename.endswith(".jpg"):
        image_path = os.path.join(image_directory, filename)
        text = extract_text_from_image(image_path)
        star_count = detect_stars(image_path)
        all_data.append([filename, text, star_count])

# Create a DataFrame from the list
df = pd.DataFrame(all_data, columns=["Image File", "Extracted Text", 
"Star Rating"])

# Save the DataFrame to an Excel file
df.to_excel("extracted_reviews.xlsx", index=False)

# Display the DataFrame
print(df)

Data cleaning and structuring

The next phase involves data structuring. To achieve this I supplied ChatGPT with an Excel template outlining the specific format needed. Utilizing this template, the code effectively organizes the review data by parsing text to identify key attributes, including the reviewer’s name, number of reviews, presence of photos, review date, review content, and language.

ChatGPT prompt: Strucuture and parse data into proper columns based on 
uploaded template and make sure that names are placed in a different 
column called Reviewer Name, the text always starts with the name of the 
reviewer. Where it says "Translated by Google - See original", only 
write the language within () in a separate column named Language.

ChatGPT prompt: Provide a Python code that structure the google review 
based on data theese conditions and saves it as an excel.

ChatGPT prompt example: Translate text to numbers for star rating, 
please conduct following changes. 

Replace * with Rating: 1 
Replace kk with Rating: 2 
Replace kkk with Rating: 3 
Replace we wk &k *& ~~ with Rating: 4 
Replace wk ee with Rating: 5

The code structures our data into a new data frame, including star ratings, and filters out incomplete reviews. (Make sure to provide text parsing instructions until you reach the desired data quality for further analysis.).

import pandas as pd
import re

# Load the data from the provided Excel file
file_path = 'extracted_reviews.xlsx'
df = pd.read_excel(file_path)

# Define a function to parse the extracted text into the desired attributes
def parse_review_text(text):
    try:
        # Use regular expressions to extract the relevant data
        reviewer_name = re.search(r"^[^\n]*", text).group(0).strip()
        
        reviews_count_match = re.search(r"(\d+) reviews", text)
        photos_count_match = re.search(r"(\d+) photos", text)
        time_ago_match = re.search(r"(\d+\s+\w+ ago|a year ago)", text)
        review_text_match = re.search(r"ago\n(.*)", text, re.DOTALL)
        
        reviews_count = reviews_count_match.group(1).strip() if reviews_count_match else None
        photos_count = photos_count_match.group(1).strip() if photos_count_match else None
        time_ago = time_ago_match.group(1).strip() if time_ago_match else None
        if time_ago == "a year ago":
            time_ago = "1 year ago"
        review_text = review_text_match.group(1).strip().replace('\n', ' ') if review_text_match else None
        
        language = "English"
        language_match = re.search(r"Translated by Google - See original \((\w+)\)", text)
        if language_match:
            language = language_match.group(1).strip()

        return reviewer_name, reviews_count, photos_count, time_ago, review_text, language
    except AttributeError:
        return None, None, None, None, None, None

# Apply the parsing function to each row
parsed_data = df['Extracted Text'].apply(parse_review_text)

# Create a new DataFrame with the structured data
structured_df = pd.DataFrame(parsed_data.tolist(), columns=['Reviewer Name', 'Reviews Count', 'Photos Count', 'Time Ago', 'Review Text', 'Language'])

# Add the 'Rating' column
structured_df.insert(3, 'Rating', df['Star Rating'])

# Remove rows where 'Review Text' is None or empty
structured_df = structured_df[structured_df['Review Text'].notna() & (structured_df['Review Text'] != '')]

# Save the structured DataFrame to an Excel file
structured_excel_path = 'structured_reviews.xlsx'
structured_df.to_excel(structured_excel_path, index=False)

# Display the structured DataFrame
print(structured_df.head())

Snapshot of our created data frame stored in structured_reviews.xlsx

Next, let’s familiarize ourselves with the dataset before building an experience driver analysis matrix. Begin by uploading the complete Excel file into ChatGPT. Then, prompt and generate a data summary that describes the data including recommendations for analysis.

ChatGPT prompt: Attached dataset contains Google Reviews from a 
Nike retailer in Soho NY. Could you provide a summary of the dataset's 
contents and suggest recommendations for further analysis?

Output
The result will yield 8 columns of data, capturing comments from 621 
reviews of the Nike Soho Retailer in New York.

Dataset
ID: Unique identifier for each review
Reviewer Name: Name of the reviewer
Reviews Count: Number of reviews the reviewer has written
Photos Count: Number of photos the reviewer has uploaded
Rating: Rating given by the reviewer
Time Ago: Time since the review was posted
Review Text: Text of the review
Language: Language of the review

Recommendations for further analysis (summarised vers.)
- Rating Distribution: Distribution of star ratings, comparing avg. rating
- Sentiment analysis: Determine the overall tone, correlate with ratings
- Review frequency: Frequency over time, evaluate recency trends
- Reviewer Insights: Segment based on review count, differ in rating
- Language Distribution: Distribution of languages, differences 
- Text Analysis: Keyword and topic analysis, common themes

Let’s proceed with the recommended visualizations to gain deeper insights into customer satisfaction and common feedback themes.

Star rating distribution of Google Reviews

ChatGPT prompt Rating: Visualize the distribution of ratings from 1 to 5 as 
a percentage of the total using a bar chart. When done export data as Excel.

ChatGPT prompt Sentiment: 
Visualize sentiment score over time based on review text divided by quarter 
in a line graph. Add number of reviews as value label.

ChatGPT prompt Feedback themes: Please identify and categorize the main 
themes from the review texts using topic modeling, then provide the 
percentage of reviews associated with each theme.

ChatGPT prompt Review per language: Visualize average review score for each 
language with a bar chart. Then conduct a review summary of the languages with 
lowest scores and what they are complaining about.

Output:
Both French and Spanish-speaking customers have relatively lower average 
ratings compared to other languages, which could indicate specific 
concerns or dissatisfaction among these groups. This aligns with the 
earlier finding that French-speaking reviewers expressed concerns 
about product choices and store-related issues.

It’s worth noting that cultural differences play a significant role in how feedback and complaints are provided. For example, in France 🇫🇷, there’s even a word for complaining just for the fun of it — râler.

With this method, you can analyze data, visualize trends, identify themes in text, and export the results to CSV, Excel, Python, or R for further analysis. Ensure that you operate in a secure environment to prevent your data from being stored or used for future training.

2. Data analysis and processing

Let’s prepare our dataset of Google Reviews and upload Excel in ChatGPT. Then prompt instructions to add topics and calculate sentiment scores.

OpenAI interface showing how to upload excel file in GPT 4o — OpenAI — ChatGPT 4o interface

Feedback topics and sentiment score

ChatGPT prompt: Analyze and classify all customer feedback into as many 
specific topics as possible based on its theme. Avoid using a generic 
'Other' category if possible, and instead, create additional topics that 
capture unique aspects of the feedback. 

Then, perform a sentiment analysis on the 'Review Text' and add the 
sentiment score to a new column named 'Sentiment,' rounding the numbers 
down to two decimals.

Output
The customer feedback has been analyzed and contains following topics:

Customer Service, Product/Inventory, Location/Facility
Pricing, Parking/Access, Cleanliness, Checkout/Payment, Operating Hours
Return/Exchange Policy, Ambiance.

The "Miscellaneous" category is present for reviews that do not clearly 
fit into the other, more specific categories.

Snapshot of data containing topics and sentiment scores

Count topics and average scores

To visualize the results, we’ll create a data frame with topics, frequency, average ratings, and tonality based on sentiment scores.

ChatGPT prompt: Count the occurrences of each topic and summarize the 
corresponding reviews. For each topic, calculate the average rating and 
the average sentiment score, rounding both to two decimals. 

Then, create a new DataFrame named ExperienceDrivers that includes the topic, 
the total number of occurrences, the average rating, and the average 
sentiment score and the summarized reviews.

Snapshot of The ExperienceDrivers data frame including feedback summaries

Output

- Topic: The specific theme identified in the customer feedback
- Occurrences: The total number of reviews mentioning this topic
- Average Rating: The average rating associated with the topic
- Average Sentiment: The average sentiment score calculated from the reviews
- Summarized Reviews: A AI based summary of each topic

Data visualization

Bubble charts are a versatile tool for visualizing multi-dimensional data and are best used in scenarios where you need to compare and analyze three or more variables simultaneously. Let’s proceed with detailed instructions on which variables to use for the X and Y axes, and how they relate to each other. Remember to be precise and clear in your instructions, as these are meant to guide you and future you.

ChatGPT prompt: I need to understand the main experience drivers to identify 
gaps and opportunities for improvement. Please visualize a bubble chart with 
the average sentiment score on the x-axis and the average rating on 
the y-axis for each identified topic. 

The size of each bubble should represent the number of reviews associated 
with that topic. Ensure that each topic is clearly labeled on the plot to 
highlight the relationship between sentiment, customer ratings, and review 
volume.

Bubble chart containing experience drivers

A trained analyst’s eye can easily deduce that operating hours and return/exchange policies appear to be drivers of negative customer experiences. However, these issues are not frequently mentioned. The challenge with conducting topic classification on a single review is that you sometimes miss the details. To enhance accuracy we can easily break down each review into individual sentences and then reclassify them.

ChatGPT prompt: First break up each sentence on separate rows, then analyze 
and classify all customer feedback into as many specific topics as possible 
based on its theme. Avoid using a generic 'Other' and 'Miscellaneous' category
if possible, and instead, create additional topics that capture unique aspects 
of the feedback. 

Then, perform a sentiment analysis on the 'Review Text' and add the sentiment 
score to a new column named 'Sentiment,' rounding the numbers down to 
two decimals.

Snapshot of all sentences including related topic and tonality

This offers a new perspective that extends beyond main experience themes captured from only a full review.

By honing in on these key experience drivers, you can implement targeted improvements that yield the most substantial positive impact. Now, imagine analyzing 400,000 rows of customer feedback. For a large organization, deploying this capability at scale could result in significant time and cost savings.

Ideally, stay away from the lower left corner, where negative sentiment and low ratings converge. Conversely, focus on the drivers that frequently appear in positive reviews which contribute significantly to customer satisfaction.

Experience drivers/topics scatter plot done with Python — Updated number of experience drivers/topics

While there are multiple ways to visualize data with ChatGPT, I recommend using dynamic charts outside of ChatGPT for enhanced flexibility and interactivity.

Speak with your customers

One of the most powerful features is the ability to interact directly with your data. While you can certainly type queries to gain deeper insights, ChatGPT goes a step further by enabling open, conversational interaction. This allows you to dive deeper into pain points, explore opportunities, and uncover suggested solutions alongside a dedicated, voice-based AI assistant.

OpenAI GPT 4o interface using the voice feature

3. Visualization and deployment

While static charts can be visually appealing, bubble charts truly shine when used as dynamic tools. They allow you to zoom in and out, and hover over data points to reveal additional context. For professional applications, it’s advisable to integrate ‘local data’ directly with OpenAI APIs for internal usage.

For now, we’ll focus on using earlier Excel, preparing it for visualization with Python’s Plotly, and deploying it through the open-source Python framework Streamlit.

ChatGPT prompt: Summarize customer feedback by topic, counting entries, 
calculating average sentiment and rating (rounded to two decimals), 
and generating a 5-sentence summary for each topic. The summary should 
identify gaps and pain points (negative feedback) and highlight what's 
working well (positive feedback). 

The final data frame should include Topic, Frequency, Average Sentiment, 
Average Rating, and AI Summary.

Experience drivers including AI-based summary

Once we’re satisfied with the results, we can proceed by using GPT-4o to generate a simplified Python code that will create a scatter plot with displayed data. Before running the code, make sure you have installed the necessary libraries.

ChatGPT prompt: Summarize customer feedback by topic, counting entries, 
calculating average sentiment and rating (rounded to two decimals), 
and generating a 5-sentence summary for each topic. The summary should 
identify gaps and pain points (negative feedback) and highlight what's 
working well (positive feedback). 

The final data frame should include Topic, Frequency, Average Sentiment, 
Average Rating, and AI Summary.

import streamlit as st
import pandas as pd
import plotly.express as px

# Load the data from the Excel file
file_path = 'topic.xlsx'
df = pd.read_excel(file_path, sheet_name='Reclassified_Customer_Feedback_')

# Grouping the data by 'Feedback' and calculating the frequency, average sentiment, and average rating
feedback_summary = df.groupby('Feedback').agg(
    Frequency=('Feedback', 'size'),
    Average_Sentiment=('Sentiment', 'mean'),
    Average_Rating=('Rating', 'mean')
).reset_index()

# Rounding the sentiment and rating to two decimal places
feedback_summary['Average_Sentiment'] = feedback_summary['Average_Sentiment'].round(2)
feedback_summary['Average_Rating'] = feedback_summary['Average_Rating'].round(2)

# Scatter plot with Plotly
fig = px.scatter(
    feedback_summary,
    x='Average_Sentiment',
    y='Average_Rating',
    size='Frequency',
    color='Feedback',
    hover_name='Feedback',
    title='Feedback Summary: Sentiment vs. Rating',
    labels={
        'Average_Sentiment': 'Sentiment',
        'Average_Rating': 'Google Review Rating'
    }
)

# Streamlit app title and plot display
st.title('Feedback Sentiment and Rating Analysis')
st.plotly_chart(fig)

Run the Streamlit Application: Save the code in a file, for example, app.py, and run it using:

streamlit run app.py

Streamlit application showing Experience Drivers

For presentations and storytelling, I suggest showcasing the data in a more visually impactful format, as shown below.

At this stage, we’ve successfully gathered, structured, and visualized the experience drivers, making it ready for further analysis by querying the data frame for opportunities, solutions, and deeper context — rivaling or even surpassing what established Feedback Management platforms can achieve.

The next step would be to extract opportunities into a customer journey framework to transition from insights to action in a structured and strategic manner.