What does a UX expert think of the design of ChatGPT?

Jeff Axup, Ph.D.

Published in UX Collective

14 min read · Jan 31, 2023

DALL-E-generated art about collaborative idea-generation between ChatGPT and intelligent mammals. Alternative explanation: UX expert being surprised by design quality of research prototype.

This article shows the results of a UX review of the ChatGPT design.
The top personas, use cases, and 11 design attributes are discussed and evaluated (see Use Cases section).
The product performed quite well with a pass ✅ for 15/17 factors; the remaining 2 were unresolved 🤔 because further testing is needed (see Findings Summary section).
The product has a novel approach to search, structuring the UI as ongoing conversations, and a number of possible future design directions are discussed (see Discussion section).

As a UX expert, I commonly get asked:

“So, what do you think of the design of this product?”
(shows me home page of web site or mobile app)

My answer is typically:

“UX encompasses a lot of underlying aspects of the product design, so I don’t know enough about the product to give you a useful opinion yet.”

This is the same answer anyone with real expertise in the field should give. Designers with insufficient interaction design experience tend to gravitate immediately towards the look and feel, branding, color choices, and adherence to common design patterns or style guides they are familiar with. These things* are much less important than the following design attributes, which can make or break a product, and can’t be easily fixed:

Personas: Is the product design appropriate for the target audience? Is it clear the designers knew who their target personas were?
Use Cases: Is the product design focused on the top three use cases? Is it clear the designers knew what their top use cases were, and what user goals for those were?
Happy Path: Is the happy path for the target user types seamless and offering clear value at the end? Is it clear the designers knew what the happy path was, and then helped QA exhaustively test it end-to-end before release?
Mental Model: Is the mental model for the product clearly articulated in a non-technical and simple manner, regardless of the complexity of the underlying technology and tasks? Does the IA map clearly to that mental model?
Navigation: Is it always clear where I am, how to get to where I want to be, how to get back to where I was, and how to avoid losing anything?
Task Completeness: Do tasks feel “simple and sufficiently complete” (i.e. a good MVP) or do they feel “incomplete and lacking necessary features to proceed” (i.e. an MVP that didn’t focus on the right use cases or was released too early)?
Business Goals: Have business goals for the product been supported by the design? A product design that doesn’t support business goals is at high risk of failure.
Emotional Response: How does the user feel about the product after initial onboarding? Excited? Exhilarated? Raring to do more? Lost? Disappointed? Frustrated? Underwhelmed?
Feedback: Is there any obvious way for users to provide feedback, report problems, or help the product become better? If not, it shows a team that isn’t truly user-centered.
Perceived Value: Is the design “immediately useful or rewarding”? Or is the user left wondering how to proceed, or how to start getting value?
Growth Plan: Does the product seem designed to scale into long-term usage? Think: automation, syncing, bulk-management, monitoring, alerting, stickiness, speed of access.

* Please note that “aesthetics” is not one of the bullet points. While it is an important design consideration, many highly usable products don’t look great, and don’t need to, in order to be profitable products that people enjoy using. If they are particularly bad, aesthetics can be easily fixed later. Underlying architectural, mental model, and use case problems get baked in, and are often prohibitively expensive to go back and fix. The only aesthetic elements I initially look at are: whether the design is “clean”, logically organized, and appropriate for the target audience.

I was not part of the ChatGPT design group, and I can only guess at things like target personas, use cases, and business goals for the product. These are typically things I would at least touch on before offering an initial impression of design quality for a product.

Personas

Initially:

👩‍💻 Software developers
🤓 Startup folks
👨‍🎓 Younger demographic that plays with new apps/tech (aka students)
👨‍🎤 Edgy creative types (aka artists)
👨‍💼 News media / investors

Long-term:

👤 Anyone who currently uses Google (i.e. has a smart phone or Internet access and searches for things).

Perhaps I am being a little broad on this one, and it is common for startups to say their target persona is “everyone” (they are usually wrong.) However, in this case, target demographics could be extremely broad if the interface is sufficiently non-technical, simple to use, and easy to get value from across a broad range of scenarios.

You could argue this is a different market from Google. Google is about finding an answer, or finding a web site to answer a question. ChatGPT is about having a conversation where you learn and co-create, which is slightly different than finding a web page. (To be fair, Google has already started to transition towards question-answering.) I personally think that the two markets overlap a great deal.

Use Cases

I tested the product using the top five most common and important use cases I could think up, as well as some common heuristic evaluation methods.

UC-1: As a Student, access the product and get fully onboarded.

The onboarding sequence presented no problems. It is concise and uses larger fonts and a simple sequence of steps. It does verify both your email and a phone number, presumably to cut down on bot-abuse. It drops the user directly into the main screen, which has a number of pre-populated examples for how to begin using the system. The onboarding won’t win any awards for creativity, but it does get the job done efficiently, which is what really matters.

UC-2: As a member of News Media, understand what the product is for and how to begin using it.

ChatGPT has many features to support initial use.

Initial use of ChatGPT is actually pretty good. They kept it minimal and focused on the target use cases. The happy path (get going on your first interesting conversation) has a plush red-carpet reaching out to the user. This is via the top-left cell of the table placed in the center of the screen (where the user’s eyes probably first land). Clicking it automatically kicks off the happy path flow.

There are also multiple other access points to begin the happy path in the top-left and center-bottom, so the user can hardly help but stumble into the right sequence. Kudos for the onboarding help table structure, which is organized around categories of the mental model: Examples (how to interact), Capabilities (what it can do, and also how to interact with it effectively), and Limitations (what would not be effective to do). This is a very user-centered table, and it helps the user get more out of the system over time.

There’s something else clever here: the product is literally asking to be interacted with. It is eager to start a conversation with you, and it tells you it’s just a nice informal chat, and here are the types of things it would like to talk about. In short — it is friendly and inviting. Compare this with the front page of Google. Google really has some room for design improvements, but on the plus side the name “Google” is way more inviting and fun than the name “ChatGPT”.

Google search front page is not as friendly and doesn’t onboard users very well in comparison to ChatGPT.

UC-3: As a Startup Person, use it for a personal task and achieve a goal.

My first task was to ask about passive income strategies that I might be able to form a side business with.

ChatGPT can offer advice about advanced topics, explain differences and confirm new ideas.

It actually did quite well. Many of the ideas are commonplace and well known, but some of them were more edgy and creative. It was able to compare and contrast different ideas it presented, which is pretty impressive. When I asked it to dig into a specific sub-category, it was able to provide additional ideas, which supports iterative creative processes. It was also able to let me know if my own idea was valid or not, although it thinks a bit too much like a lawyer for my taste.

It can also personalize responses using specific scenarios.

The ability to ask for clarifications or dig into specific topics is particularly engaging and helpful. Being able to personalize the question with your own example or frame of reference is extremely valuable to people trying to understand a topic. It’s also something which a class lecturer would struggle with due to student/teacher ratios and the level of personalization needed. This is very powerful tech, as long as it is accurate.

UC-4: As a Creative Type, use the product long-term for repeated use.

ChatGPT automatically keeps an audit log of past conversations without being asked, organized by date, with the newest on top, and the button to initiate a new one on top. It also remembers the context of previous conversations and lets you continue with old brainstorms later on. This demonstrates a better memory and greater patience than most human conversational partners.

This use case is very hard to evaluate in only a few days. I do anticipate that if ChatGPT becomes a daily-use item, you would want to be able to ask Siri or Alexa your questions, or speak directly into an iPhone app or an earbud instead of typing. In short, this is great for an MVP, but if it falls under the category of “heavy repeated use” it will need to get more streamlined than it is now.

UC-5: As a Developer, use ChatGPT APIs to interactively do a task for an external user.

TBD. Check back later to see how this one came out, but it looks fairly approachable.

General Observations

Despite all of the advanced automation going on, it is clear the design/research team is also interested in user feedback and training their algorithm. There is a thumbs-up/down on every response, and instead of offering a dumb and useless selection of canned-response categories like most forms do (which isn’t very actionable), they request a free-form response. They are undoubtedly automatically doing language processing on it for trends. This is what every site on the Internet should be doing for responses.

There are a few minor inconsistencies with common style conventions, but nothing that is likely to create severe usability problems, particularly at this stage of product complexity, with relatively few competing features.

“Regenerate response” should be renamed “Try Again”, or “Tell me this a different way”, which is more accurate and less technical.

Findings Summary

✅ UC-1: Access the product and get fully onboarded.
Result: Onboarding was simple, rapid, and had no blockers. (Good) My attempt to create a duplicate account for testing purposes failed. (Probably also good to avoid abuse of the system.)
✅ UC-2: Understand what the product is for and how to begin using it.
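Result: Initial use was minimal and focused on the target use cases, with multiple entry points into the happy path and an onboarding table organized around the mental model.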
✅ UC-3: Use it for a personal task and achieve a goal.
Result: I learned something new on both queries. I can’t say the same for most parties I go to.
✅ UC-4: Use the product long-term for repeated use.
Result: This is hard to evaluate after only a few days of usage, but I suspect it will be “good enough” for an MVP. Clearly they built minimal and effective features to support this.
🤔 UC-5: As a Developer, use ChatGPT APIs to interactively do a task for an external user.
Result: TBD, probably fine.
✅ General Observations
Result: User feedback is easy to find and provides high-quality feedback. Minor design problems can be easily rectified.
✅ Personas
Result: With a few minor exceptions, they have taken a highly technical product and provided a simple and approachable interface a novice could use. They could easily expand into different types of use cases specific to particular personas/roles (e.g. Architects or Programmers).
✅ Use Cases
Result: All of the use cases passed with flying colors. Products often fall over during onboarding, initial use, or extended use.
✅ Happy Path
Result: The designers clearly knew what the happy path was and designed around it. Many, many products do not do this.
✅ Mental Model
Result: The first thing the user sees after logging in is a depiction of the mental model in the table. They also see this again whenever they start a new conversation. Impressive.
✅ Navigation
Result: While this is a fairly simple product, the navigation is still fairly good. You land on a home page. When you ask a question, it automatically generates a title for the conversation and saves it. Discussions are structured in a logical way. Everything auto-saves. This product has a lot of underlying complexity which is hidden well.
✅ Task Completeness
Result: Starting a conversation is by its nature ongoing. A real support call person would say “Did I fully answer your question? Thank you for coming.” ChatGPT doesn’t do that. On the one hand it would be more polite if it did, on the other, this is implicitly inviting you to come back and chat more later, which is kind of nice.
✅ Business Goals
Result: With over a million users in the first month, it seems likely these are being met. The question remains whether it will be free to use like Google (which would make it a public good).
✅ Emotional Response
Result: As I say again below, not many products make you raise your eyebrows and say “wow”. Also telling you something useful that you didn’t already know is pretty impressive.
✅ Feedback
Result: They have regular thumbs-up/down buttons, and the ability to explain in plain text why you felt that way, which is impressive. They may have options to handle feature requests under Discord, but I would have liked to see a ‘submit feature request’ button more prominently placed for broader ideas that aren’t specifically relating to a particular chat response.
✅ Perceived Value
Result: I got more “wow” moments out of this product than most products I try. It’s immediately possible to see the value and reasons to come back. Not true of many other products.
🤔 Growth Plan
Result: Maybe. This has an open API behind it and it currently supports unlimited question-asking. Their servers are already straining under the load. But this is just a “research preview” so arguably it is still learning and doesn’t need a plan for growth just yet.
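
The 15/17 figure quoted at the top of the article can be checked by tallying the verdicts in the list above. The following is only a minimal sketch of that arithmetic: the dictionary keys are transcribed from the Findings Summary, while the `FINDINGS` name, the `tally` helper, and the "pass"/"unresolved" labels are illustrative assumptions, not part of any real review tool.

```python
from collections import Counter

# Verdicts transcribed from the Findings Summary above.
# "pass" stands for the green check, "unresolved" for the thinking face.
# The structure and names here are hypothetical, for illustration only.
FINDINGS = {
    "UC-1": "pass",
    "UC-2": "pass",
    "UC-3": "pass",
    "UC-4": "pass",
    "UC-5": "unresolved",
    "General Observations": "pass",
    "Personas": "pass",
    "Use Cases": "pass",
    "Happy Path": "pass",
    "Mental Model": "pass",
    "Navigation": "pass",
    "Task Completeness": "pass",
    "Business Goals": "pass",
    "Emotional Response": "pass",
    "Feedback": "pass",
    "Perceived Value": "pass",
    "Growth Plan": "unresolved",
}


def tally(findings):
    """Return (passed, unresolved, total) for a verdict dictionary."""
    counts = Counter(findings.values())
    return counts["pass"], counts["unresolved"], len(findings)


passed, unresolved, total = tally(FINDINGS)
print(f"{passed}/{total} passed, {unresolved} unresolved")
# prints "15/17 passed, 2 unresolved"
```

Keeping the verdicts in one flat structure like this makes it easy to re-run the tally once the two open items (UC-5 and Growth Plan) have been tested further.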

Discussion / Personal Opinions

Pros

When I review products they usually fail on about half the above criteria. ChatGPT pretty much got all green checks, and it’s only an MVP that’s a few months old.
Structuring the interface around conversations is either “obvious” or “brilliant” depending on how you look at it. Google does not do this, and seems more focused on finding one web site or one answer, and then being done. ChatGPT could easily have gone a similar route with a simple one-off search and result interface that is recycled each time. Turning it into an ongoing conversation with no necessary ending makes it more akin to a conference workshop on a particular topic that recurs each year with new ideas exchanged.
I have always expected that the rise of AI and robots would look more like a collaboration than a replacement or takeover. ChatGPT falls in line with that. You are smarter when you are augmented. You are more productive when you don’t have to do the drudgery. You are more successful when you can build off of others’ work to create new things. This interface is structured to support that type of interaction.

Cons

The name is terrible. It doesn’t roll off the tongue, it is hard to remember, and it stands for “Chat Generative Pre-trained Transformer”. This will need to be rebranded once it leaves the research phase. Perhaps they left the awkward name just to reinforce that it will just be an underlying service protocol eventually.
In a similar fashion to search engines and news outlets, there will be concerns over how factual data is. People will use this and expect answers that are true, and then plan based on that. So, it better know that moon landings happened and that elections weren’t fraudulent.
You can’t forward a response to another person or reference it as a source in a document or research. This hinders group collaboration and accountability.
ChatGPT will still be another walled-garden of information, with limits on what it contains and what it is allowed to discuss. (e.g. Robinhood Snacks doesn’t discuss its parent company Robinhood, and promoted Medium authors can’t discuss Medium.) Similarly, ChatGPT won’t talk about things that it considers confidential but that are actually already public knowledge.

Ideas

Prototypes: It could be argued that ChatGPT is just a technology prototype, and not an actual product concept in its “final” form. This may be true, but it already has over a million users, and the team likely chose a design and interaction model they thought might work for real product use cases. It’s probably still fair to critique it.
Personification: Why isn’t this product using more personification? “Ask Jeeves” and “Clippy” come to mind. Sometimes it is a conscious choice to avoid personification, because it sets unrealistic expectations for interactions, and doesn’t foster an accurate mental model. My bet is the designers avoided it on purpose, much like the Google team did. That said: a limited amount of it might actually improve usability and enjoyment. It doesn’t need a talking face animation, a cartoon person, or a hotel concierge desk branding scheme.
But what about:
- Speaking politely like a butler or concierge with a similar tone?
- Naming the product with a human name or something indicating a butler or assistant?
- Having the product get to know you over time (like your family chauffeur) who knows where you want to go, and can highlight things of high interest to you?
Group Collaboration: The interaction paradigm is built around the concept of “ongoing conversations”. We might be tempted to use the metaphors of a “salon”, or “academic conference”, or even a “forum”, but all of these are public and collaborative groups — which ChatGPT is not currently. Which brings up the question of whether it would benefit from being more than a “teacher” or “advisor” or “butler” and more of a “salon host” to facilitate problem solving, resolution finding, consensus building, collective bargaining, plan development, and similar things.
Automating Education: Student / teacher ratios are very high, students need more personalization than ever, and there aren’t enough qualified teachers available. While real humans are definitely important in the education process, it is clear that more personalized and accurate learning is needed, and labor costs are only going up. It will need to be automated to a large degree. There is also a lot of potential for developing nations where you can have a college professor in your pocket, potentially for free.

Conclusion

This article has combined a UX expert and heuristic review of ChatGPT with a broader discussion of how the product interacts with users, and what it might be able to do in the future.

Community Questions

Do you find ChatGPT easy to use? What needs improvement?
What do you think of this process for evaluating UX quality of a product?
Do you think personification would be a good or bad thing for them to add?
How could ChatGPT be leveraged into a group collaboration or education tool?

My opinions are my own and not related to any current or past employers. You should make your own design and investing decisions. I hope you find my ideas thought-provoking.