The importance of motivation in usability testing

Stuart Reeves
Published in UX Collective
12 min read · Jan 7, 2019


Testing is a totemic feature of the technological era. Digital things in particular, with their ever-increasing complexity, have intensified and accelerated the need to test. A great example of this has been the growth and widespread adoption of usability testing — sometimes called user testing (cf. https://twitter.com/mulegirl/status/1107082527578058752).

Over more than 30 years, this kind of testing has come to feature centrally in human-centred approaches to digital product and service design. It’s one of the Human-Computer Interaction (HCI) research field’s biggest impacts. And yet we don’t know much about how usability testing is actually practised by UX and design professionals in industry.

This is a problem.

No matter what unique selling point UX and design work might be pitched on, something resembling ‘testing’ seems to be a pervasively adopted, deeply routine part of that work. This lack of knowledge persists in spite of testing’s importance, and in spite of the vast body of HCI research literature on usability evaluation.

To address this issue I’ve been video recording what happens in industry usability labs. My research is primarily targeted at academic HCI communities, but it might have some interesting aspects for UX and design practitioners, which is why I chose to write this translation of a recent paper.

First, here’s a brief summary before I delve deeper:

  1. Participants’ usability problems are not necessarily the same as usability problems articulated by evaluators. So, although testing environments in UX and design are primarily set up to surface problems, there is a complex relationship between what is treated as a problem by participants and evaluators, and what ultimately becomes a test finding.
  2. For a variety of reasons, usability problems are only problems because they are seen as such; they can just as easily be made to ‘go away’ if there is reason to do so. This is not necessarily something to do with ‘bad practice’ on the part of evaluators but instead about the role of motivated (and skilled) observation by them.
  3. The net result of points (1) and (2) is that the test is itself a feature of dealing with usability problems. Working out where to draw the line in terms of ‘what counts’ and what doesn’t is an ongoing challenge for evaluators rather than something established rigidly a priori. Concerns for this fit into the motivated way evaluators look at the test.

If you are looking for key principles that connect the above points, they are as follows: a) usability problems are not intrinsic to the object under test; b) there is a motivatedness to testing as it is undertaken by its participants and stakeholders, a motivatedness that shapes findings. Perhaps these seem obvious, but there is a persistent tendency in both the academic and practitioner literatures not only to treat usability problems as inherent to artefacts but also to strip out motivatedness as a critical feature of testing’s work.

More on this later.

Usability testing in action

Here I present three examples from my video recordings of testing (drawn from a much broader multi-site study). The UX agency in question is involved in a website redesign project and is testing a prototype. Particular focus is on the part of the site that deals with complaints from members of the public. Six people are watching the usability testing sessions taking place in an adjacent room: a couple of the agency’s UX researchers, the client who has commissioned the testing, and three (external) designers who developed the prototype for the client. The video and audio of the moderator and participants are being streamed live into the observation room where they all sit.

Looking for trouble: Participants’ problems ≠ evaluators’ problems (Ex. 1)

We’ll start by looking at how usability problems are noticed by the team watching (transcripts have been simplified). Here the moderator has asked the participant to fill in a complaints form. She is now at the stage that involves a login element.

Participant: “So I’m guessing that’s done and I continue. Noooo, log in or register, no don’t want that. That’s a bit annoying. ((turns to Moderator)) I hate logging in and registering to every[thing], so if I made a complaint I’d rather you just asked for my details and then contact me rather than me have to log in as if I’m gonna make a complaint every week. I think logging in and registering makes me feel like I’m a regular user ((laughing))! And so I’ll be complaining about something every five minutes.”

Stakeholders watching the test while the participant produces a response to the login stage of the form

Obviously, this kind of response to a login form is highly familiar. The participant here resists being made to feel — categorised as — a “regular user” of the site. The reason for her resistance to the login process is the implication that she might also be seen as a “regular complainer” — not good.

Immediately after this moment, the stakeholders in the observation room discuss the participant’s response. The client begins:

Client: We actually want to deter that, we don’t want people logging in [to] register a complaint every week cause [if people do] ten or so [complaints] like within in a short space of time we’d almost like kind of…

UX researcher: Ignore them.

Client: Yeah, but we’ll kind of just tell them ‘we can see your complaint’ […] in a polite way, cause we just haven’t got the resource to basically deal with people almost acting as like police themselves.

The interesting thing here is that the various stakeholders — the client, the UX researcher, etc. — work to transform the participant’s concern about being wrongly categorised as a complainer. Specifically, the client shifts the problem away from the discomfort reported by the participant, and towards the organisational “human resource” challenges of dealing with users “acting as police” — or “regular complainers” as we might call them.

The point here is that there is a distinction between a participant’s problem and stakeholders’ version of that problem. The evaluation of usability is about transforming and recontextualising problems as they are seen to emerge from the test. It’s not about ‘detecting’ them. (There is a whole other class of usability problems which are not seen to be emerging from ‘think aloud’ style reports from participants, but I can’t cover these here; see my paper instead.)

More broadly we learn here that usability problems emerging during testing don’t autonomously present themselves out of thin air as ‘findings’. It’s not a matter of simply ‘reading’ them off, like words on a page, or ‘detecting’ them like astronomical objects. Problems themselves have to be worked out and worked up by stakeholders into sculpted findings. They do this moment-by-moment by ‘looking for trouble’. They have to learn the right way of ‘seeing’ problems that participants might be encountering and reporting. Stakeholders have to bring problems to the attention of others, and they have to make a version of the problem relevant by transforming it in particular ways that take into account a whole range of contingent matters.

This characterisation of ‘detecting’ usability problems seems considerably more complex and involved than what I’d call the intrinsic view of usability adopted by the Nielsens and Molichs of the usability world. In that view, finding usability problems is more a matter of detection: calibrate your equipment (i.e., the human evaluators) appropriately and you will successfully achieve correct identification. That’s what discussions about things like ‘how many users’ are about, and the reason why one might run Comparative Usability Evaluations (and then be a bit alarmed that they are not “reproducible”).
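As a rough illustration of that detection-and-calibration logic (and not something drawn from my study), the arithmetic usually invoked in ‘how many users’ debates assumes that each participant independently uncovers any given problem with some fixed probability, a figure often quoted as around 31% following Nielsen and Landauer. A minimal sketch in Python:

```python
# A minimal sketch of the 'intrinsic' discovery model behind "how many users"
# debates: each participant is assumed to uncover any given problem
# independently with a fixed probability p. The default p = 0.31 is the
# figure often quoted after Nielsen & Landauer, not a value from this study.

def proportion_found(n_participants: int, p: float = 0.31) -> float:
    """Expected proportion of problems uncovered after n participants."""
    return 1 - (1 - p) ** n_participants

if __name__ == "__main__":
    for n in (1, 3, 5, 10, 15):
        print(f"{n:>2} participants -> ~{proportion_found(n):.0%} of problems")
    # With p = 0.31, five participants come out at roughly 85% of problems
    # 'detected' -- exactly the calibration arithmetic that treats problems
    # as intrinsic to the artefact, waiting to be found.
```

The point of the examples above is precisely that, in practice, the work of finding problems doesn’t reduce to this kind of calibration.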

Getting rid of problems (Ex. 2)

Even if the distinction between participants’ problems and the version of them shaped up by evaluators seems obvious, it’s important to realise that it has significant implications. For instance, it’s precisely because of this flexible relationship that it becomes possible to make those problems dissolve entirely. In this next example we’ll look at this phenomenon.

Here, another participant is responding to a question from the moderator. The moderator has asked the participant about her overall impression of the website under test. The participant is talking about a page she is viewing that lists prior complaints; she offers a few criticisms of this page and others she has seen.

Participant: I would say it’s […] very, um wordy? And at times […] — what I was saying about the titles and things? — I think there could be more use of colour and stuff in those, just because it’s on your eyes […] it’s very white.

At this point the designers watching the test interject, to the shared acknowledgement / agreement of the UX researchers and client (not shown):

Designer 1: It’s just poor [information architecture]. ((turns to the others)) It’s a bit wireframey this […] still needs a bit of love.

Designer 2: It was pretty quick.

Stakeholders discussing the “wireframey” aspect of the prototype

In this way the participant’s problems with the site prototype in front of her — the wordiness, the absence of colour, and “whiteness” — are effectively done away with by the stakeholders. In this case, the designers in particular offer various reasons to discount the participant’s various reported problems:

  • that the information architecture of the design is not currently well realised;
  • that prototypes are “wireframey” and thus necessarily wordy and full of whitespace;
  • that it’s a prototype and as such still needs work done; and
  • that the prototype was constructed quickly.

These might look like ‘excuses’ but really most of this is just rehearsing familiar points about the nature of prototypes: naturally provisional, always incomplete, and often hastily put together. In other words, nothing to see here!

By stating in so many words that ‘it’s just a prototype’, the stakeholders establish that this is the ‘true’ source of the participant’s problems, and therefore not something they really need to worry about. In another time and place what the participant says might have been treated as significant, and led to a finding. But here the stakeholders feel the need to explicitly label the participant’s criticisms of the prototype as irrelevant (perhaps anticipating that they could be taken as relevant and heading this off), and in doing so ensure that the potential problem is dissolved.

Of course there are many other ways to get rid of problems surfaced in usability testing. For example, elsewhere in my video recordings, stakeholders use the limitations of the test itself as a way to defuse potential problems, but also to teach one another how to work out what might and might not be relevant.

This is what we’ll look at next.

The complexities of problem relevance: the test is a feature of the test (Ex. 3)

First let’s recap. In usability testing, stakeholders are constantly trying to work out which of participants’ problems are relevant. But the way they gauge that relevance is contingent upon the unfolding test situation. What’s more, stakeholders work to transform those problems from participants’ problems into problems that fit with their concerns as stakeholders. So, usability problems really aren’t inherent in the artefact, but they do tend to look that way from a distance.

This third example sounds a note of caution over thinking that working out problem relevance is a simple binary distinction. (Elsewhere I’ve seen usability evaluation treated as a binary classification problem.) It also examines how the situation includes the contours of the test itself, and how this can be important in determining relevance.

The same participant as Ex. 2 has been browsing around the site, having been asked by the moderator to locate what the website’s organisation “stood for”. While watching this unfold, one of the designers begins describing what he sees as a problem related to the participant’s browsing behaviour thus far.

Designer: She didn’t scroll down below the four icons did she? […] She just stopped every time.

Client: Could you do what you did with arrows to kind of show something down? […] I guess that adds stuff to the page but…

Designer: It’s just working on the spacing below the panel. Just quite a lot before you can see anything.

The designer’s comment about the “four icons” refers to a set of links to further information about the organisation. These icons are situated below a large banner (“panel”) at the top of the homepage. Having established this as a problem and then having discussed its possible solution (“arrows”, “spacing”, etc.), one of the UX researchers interjects:

UX researcher: I think it’s sometimes a bit artificial in user testing and in reality they would scroll down. Because they’re here the… sort of here we’re very focussed on doing the task.

UX researcher explaining how “it’s sometimes a bit artificial in user testing”

Ultimately the test report did make a recommendation on adjustments to the homepage banner size (it being thought too big), adjustments which were indeed carried through to the final implementation. As such this example sits in between Ex. 1 (spotting a problem successfully) and Ex. 2 (doing away with a potential problem). It’s neither one nor the other.

Something else is also going on here.

At first glance it might look like the UX researcher is attempting to shut this problem down, finding a reason to do away with it as in Ex. 2. But on closer inspection I don’t think that is what actually happens at all. Now, to be clear, I’m not interested in whether the UX researcher’s comments are justified or not (I’m sure UX practitioners would themselves debate such a point). Rather, I’m interested in how he tutorialises the problem being discussed by the designer and client, turning it into a place to reflect on the nature of testing itself.

This is part and parcel of running a usability test: attending to issues about what the test is meant to be doing for its stakeholders, what is part of the test and what is not, and, as in this case, what is to be counted as ‘normal’ participant behaviour and what instead might be ‘test’ behaviour (and therefore potentially misleading).

How is this connected with stakeholders determining the relevance of the participants’ problems they witness during the test? Doing away with a problem — dissolving it — is always a possibility, but you need to develop skill in doing so. On the one hand you might be overly defensive and seek to dismiss problems by various means, as I outlined in the previous section. On the other, you might be overly receptive to any and all possible problems. The skill, then, is developing a balanced approach to seeing participants’ problems that is also sensitive to all the contingencies I’ve talked about. So, usability evaluators need to develop critical ways of treating possible usability problems as they emerge, and a key part of that is being aware, reflexively, of the thing that frames this: the parameters of the usability test. Evaluators are working to find the sense of a potential usability problem while taking into account the test circumstances they are in.

So is a usability problem intrinsic to the thing being tested?

You might think that doing evaluations or usability testing is about detecting problems that are somehow intrinsic to designed artefacts — that issues of usability are somehow inherent to the design inscribed in the object. But the study here suggests otherwise.

I’m not saying there is anything fundamentally, inherently ‘broken’ with the procedures of usability testing (they certainly can always be improved, though). They pretty much work, and reliably so, to locate and document certain kinds of design problems and opportunities in digital systems. Putting such a system in front of a potential ‘user’ and getting them to ‘have a go’ is an incredibly powerful tool.

Rather than viewing a test as a kind of science experiment, we should see it as a complex social interaction that is hell-bent on collaboratively working to produce findings.

Conclusion

Academic HCI has spent a huge amount of time improving our knowledge of evaluation methods (e.g., ‘how many users’ type questions, comparative usability evaluations). But this is simply not the same as understanding the ways in which testing gets done in practice. This matters because many product and service outcomes are directly influenced by testing, so what goes into testing shapes those outcomes.

Better understanding of UX and design’s adoption of usability testing is important for at least a couple of reasons. Firstly, academic HCI research needs to better understand what industry professionals do if it wants to live up to its ideals of impacting “researchers and practitioners” (a phrase frequently tossed about in HCI papers). Secondly, by looking at professionals’ everyday work practices, we can learn about important concepts at play in the course of testing that might otherwise have been overlooked. Those concepts may be useful for UX and design practitioners reflecting on what might be taking place in their own practices. And, of course, they might also be useful for HCI researchers trying to develop concrete, appropriate and novel approaches to evaluation that they intend to be adopted in practice.

If you want to know more, you can read my ACM Transactions on Computer-Human Interaction article “How UX practitioners produce findings in usability testing”. (In this post I’ve talked mostly about “problems” rather than “troubles”, the language used in the paper for particular reasons.)

Stuart’s research is supported by the UK Engineering and Physical Sciences Research Council [grant numbers EP/M02315X/1, EP/K025848/1].


Academic at School of Computer Science, University of Nottingham, UK. I do research on human-computer interaction. http://www.cs.nott.ac.uk/~str