UX Metrics: Why not measure aesthetics?

Extending proven user experience measures to account for the aesthetic-usability effect.

Published in

UX Collective

Jan 10, 2019

As designers, we conduct user research, create prototypes, run usability tests, and the rest, all in hopes that it will result in products that deliver great experiences for users. But how do we know if we’re making a difference?

As management expert Peter Drucker is purported to have said:

“If you can’t measure it, you can’t improve it.”

And so… we measure, in hopes of continually improving UX. We use an array of mechanisms to determine whether the prototypes we create and the products we ship provide more value and are more enjoyable to use than previous versions or their competitors. But are we measuring the right things?

System Usability Scale

One tried-and-true UX measurement tool, which many readers are probably familiar with, is the System Usability Scale (SUS). SUS has been around since the 1980s and is widely regarded as a solid indicator of a user’s subjective assessment of a product’s usability. Unfortunately, it does have downsides.

For one, it requires 10 responses in total. While that might not seem excessive, this number can cause users to quit before finishing, or to rush through without thoroughly considering each statement, leading to inaccurate answers that contaminate your results. This is especially true when SUS is added to (often at the end of) a larger survey, usability study, or user research questionnaire, when respondents may hit a decision-fatigue wall.

Also, while the exact wording of SUS statements varies between versions, the statements often alternate between positive and negative tone; for example:

I think that I would like to use this system frequently. (positive)
I found the system unnecessarily complex. (negative)

This has a few problematic side effects. For one, research has demonstrated that mixing tones in SUS has little, if any, practical value (which has led to all-positive versions of SUS). On top of that, users who skim rather than read carefully may score their agreement or disagreement incorrectly, because they accidentally apply the same tone (usually positive) to every statement. Finally, the alternating positive/negative wording requires some awkward number crunching to calculate the overall SUS score.
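To make that number crunching concrete, here is a minimal Python sketch of the standard scoring procedure for the classic alternating-tone SUS: ten responses on a 1 to 5 scale, odd-numbered statements positive, even-numbered statements negative, with the 0 to 40 sum multiplied by 2.5. The function name `sus_score` is hypothetical, and the sketch assumes the original item order rather than an all-positive variant.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score the classic alternating-tone SUS on a 0-100 scale.

    Assumes ten responses on a 1-5 scale, with odd-numbered statements
    worded positively and even-numbered statements worded negatively.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS responses must be between 1 and 5")

    total = 0
    for number, response in enumerate(responses, start=1):
        if number % 2 == 1:
            # Positive statement: agreement is good, so shift 1-5 down to 0-4.
            total += response - 1
        else:
            # Negative statement: disagreement is good, so invert 1-5 to 4-0.
            total += 5 - response

    # The contributions sum to 0-40; multiplying by 2.5 rescales to 0-100.
    return total * 2.5


print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # 85.0
```

Note how the same raw answer (say, a 4) adds 3 points on an odd item but only 1 point on an even item, which is exactly the inversion a skimming respondent can get wrong.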

In brief, SUS is a great tool that has its shortcomings.

Is there a simpler way?

UMUX-Lite says YES! It reduces the 10 SUS statements to just two:

[This system’s] capabilities meet my requirements.
[This system] is easy to use.

The first statement measures perceived utility: Does this deliver value to me? Is it useful to me? The second measures usability: Can I figure it out? Does it make sense to me?

Fantastic! We can measure UX with just two responses. But wait… what about aesthetics?

There’s compelling evidence showing that the aesthetics of a product influence perception of user experience. So why not measure it?

Enter the UUP Score

To account for the influence of aesthetics on overall UX ratings, I extended UMUX-Lite with a third statement that captures a user’s opinion of the product’s aesthetics. The result is the UUP Score (Utility, Usability, and Presentation Score, pronounced “up score”).

UUP Score

[This system’s] capabilities meet my requirements.
[This system] is easy to use.
[This system] is aesthetically appealing.

The aesthetic statement focuses on the look and feel of a product. It asks respondents to judge whether the product is appealing to them and whether its style is appropriate for the context. What counts as aesthetically appealing for a financial system will obviously be very different from what appeals in a children’s game.

All statements are scored on a 0 to 5 scale, where 0 means strongly disagree and 5 means strongly agree. This gives a clear indication of how users perceive utility, usability, and presentation. Each measure can provide useful insights on its own, but how do the three work together to give us an overall user experience score?

Example UUP Score matrix as it may appear in a survey tool, with numeric values for each option.

UUP Score is best gathered via a survey tool, even for in-person or moderated research. Because of its matrix format, it does not lend itself to verbal question and response. A survey tool can also let you assign numeric values to responses, which simplifies processing results from multiple participants.
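Once the survey tool exports numeric values, combining results from multiple participants takes only a few lines. The Python sketch below assumes a hypothetical export with one record per participant, keyed by the made-up names `utility`, `usability`, and `presentation`, each holding a 0 to 5 value; real survey tools will use their own column names and formats.

```python
from statistics import mean

# Hypothetical keys for the three UUP statements in a survey export.
STATEMENTS = ("utility", "usability", "presentation")


def average_responses(responses: list[dict[str, int]]) -> dict[str, float]:
    """Average each UUP statement across participants (0-5 scale)."""
    if not responses:
        raise ValueError("at least one participant is required")

    averages = {}
    for statement in STATEMENTS:
        values = [participant[statement] for participant in responses]
        if any(not 0 <= v <= 5 for v in values):
            raise ValueError(f"{statement} responses must be between 0 and 5")
        averages[statement] = float(mean(values))
    return averages


participants = [
    {"utility": 4, "usability": 3, "presentation": 5},
    {"utility": 5, "usability": 4, "presentation": 3},
]
print(average_responses(participants))
# {'utility': 4.5, 'usability': 3.5, 'presentation': 4.0}
```

Validating the 0 to 5 range up front catches mis-mapped answer options (for example, a tool that exports 1 to 6) before they skew the averages.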

Determining total score

We could calculate a total UX rating by simply averaging the three scores, but are utility, usability, and presentation really equal?

If a product lacks utility (i.e., I have no use for it), usability and presentation don’t matter much. For instance, if I really need to drive in a nail, I’ll take an awkward and ugly hammer over the best screwdriver in the world.

Utility is also more important than usability. Fred Davis (creator of the Technology Acceptance Model) showed that usefulness is 1.5 times more important than ease of use as a predictor of actual product usage.

To account for this imbalance between the three measures, the total UX rating weights each individual item: utility counts 3x, usability 2x, and presentation 1x.

Expressed as an equation (resulting in a score from 0 to 5):

Total = ((Utility * 3) + (Usability * 2) + Presentation) / 6
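The same formula as a minimal Python sketch. The function name `uup_total` and the `WEIGHTS` table are illustrative names of my own; the weights and the divisor of 6 come straight from the equation above.

```python
# Weights from the article: utility 3x, usability 2x, presentation 1x.
WEIGHTS = {"utility": 3, "usability": 2, "presentation": 1}


def uup_total(utility: float, usability: float, presentation: float) -> float:
    """Weighted UUP total, on the same 0-5 scale as the individual statements."""
    scores = {"utility": utility, "usability": usability, "presentation": presentation}
    for name, value in scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5, got {value}")

    weighted_sum = sum(WEIGHTS[name] * value for name, value in scores.items())
    # Dividing by the total weight (3 + 2 + 1 = 6) keeps the result within 0-5.
    return weighted_sum / sum(WEIGHTS.values())


print(round(uup_total(4, 3, 5), 2))  # 3.83  (23 / 6)
```

Because the weights favor utility, two products with the same simple average can land far apart: `uup_total(5, 4, 2)` gives 25/6 (about 4.17), while `uup_total(2, 4, 5)` gives 19/6 (about 3.17), even though both average about 3.67 unweighted.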

Utility, Usability, and Presentation Pyramid

Score targets

Individual respondents’ answers of 2 or below are negative and 3 or above are positive, with 4 or 5 considered on target for a design.

When looking at average scores for individual statement responses or the calculated total UX score across a set of participants, ranges are as follows:

POOR (<=2): Any ratings in this range call for further investigation to resolve the root cause of such a low score.
NEEDS WORK (2.1 to 3.4): Not negative, but not that positive either.
GOOD (3.5 to 4.4): 3.5 and above is on track with results you want to see.
EXCELLENT (4.5 to 5): 4.5 and above indicates exceptional user satisfaction.
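The ranges above can be applied mechanically with a small Python sketch like the one below. The function name `score_band` is hypothetical. Because the ranges are listed at one-decimal precision, the sketch assumes cut-offs at 2, 3.5, and 4.5, so averages that fall in the small gaps between listed ranges (such as 2.05 or 4.45) are assigned by those cut-offs.

```python
def score_band(average: float) -> str:
    """Map an averaged UUP score (statement or total, 0-5) to a target range."""
    # Assumed cut-offs at 2, 3.5, and 4.5; gap values such as 2.05 or 4.45
    # are placed by these comparisons rather than by the listed ranges.
    if not 0 <= average <= 5:
        raise ValueError(f"average must be between 0 and 5, got {average}")
    if average <= 2:
        return "POOR"
    if average < 3.5:
        return "NEEDS WORK"
    if average < 4.5:
        return "GOOD"
    return "EXCELLENT"


for score in (1.8, 2.7, 23 / 6, 4.6):
    print(f"{score:.2f} -> {score_band(score)}")
# 1.80 -> POOR
# 2.70 -> NEEDS WORK
# 3.83 -> GOOD
# 4.60 -> EXCELLENT
```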

This graph illustrates score targets averaged across a group of respondents.

Target ranges for averaged UUP scores.

Presentation matters, but only so much

UMUX-Lite doesn’t measure aesthetics at all, which, given the aesthetic-usability effect, may be problematic. That said, beauty isn’t everything. We need utility and usability from our products, arguably more than we need stunning looks.

The UUP Score strives to give presentation, usability, and utility a properly balanced influence on our measurements of user experience, and to do so in a short, simple format that users are likely to respond to.

Application of UUP

As a follow-up to this article, in September 2022 I wrote “3 practical uses of the UUP UX metric: Measuring perceptions and acting on findings,” which goes into depth about specific ways to apply UUP in product design. Check it out for ideas on how to incorporate UUP UX metrics into your process.

Along with the articles and supporting research linked above, I especially want to thank Didier Chincholle, Jeff Patton, and Catherine Chiodo, who each shared unique insights with me that greatly influenced and clarified my thoughts on this topic.