Net Promoter Score (NPS) is not harmful. Believing in silver bullets is.

A reflection triggered by Jared Spool’s article “NPS considered harmful (and what UX Professionals can do about it)”.

Published in UX Collective, Jan 1, 2018

If you throw the phrase ‘considered harmful’ into Google Scholar you will find over 1 740 000 records. The first one seems to come from 1968. I didn’t have the time to go through all of them (obviously) but many are written in the spirit that something is seriously bad and needs to be abolished right away. I have the impression that the phrase ‘considered harmful’ aims to evoke a heightened emotional state that pushes people towards extremes: to say ‘yea’ or ‘nay’ to things, to see the world in black and white.

Prof. Dave Snowden, the author of the Cynefin framework, who has spent decades dancing with the subject of complexity, recently said:

— “Dichotomy will be the death of thinking”.

It is hard not to recall his words when someone rings bells announcing a fire in the professional world of UX.

Let me be extremely clear here. I am not a fan of NPS. I don’t recommend it to my clients and, whenever possible, I try to find other, more reliable and actionable ways to measure how customers feel about the brand and its offerings. But I realize that I can do that because at some point in time a given company decided to use NPS.

One more observation in this respect: although studies show that NPS doesn’t correlate with customer growth and loyalty, there seems to be a correlation between businesses which adopted NPS and a number of pro-customer changes within their services. So, although NPS may not impact the external world, it can be seen as impactful on the internal company culture.

Evolution not revolution

I believe that the development of customer-centricity is an evolutionary process and we are at the beginning of it. I clearly remember the early 2000s, when HCI professionals met at conferences such as CHI and cried on each other’s shoulders, complaining how business never wanted to listen to them. How they were the last to be hired and the first to be fired.

We have come a long way in the last 20 years. The need for usability and customer experience is broadly recognized. The fashion for the Design Thinking approach is spreading like wildfire. Designers and design researchers are more and more valued as business collaborators. Things are going our way.

Let’s for a moment assume that I am correct in stating that customer-centricity is an evolutionary process in which certain characteristics of successful businesses are inherited over generations. If we further assume that customer-centricity is the correct direction in the development of business, we can then say that businesses which keep on believing in the financial game as the basis of a successful business strategy are more likely to become extinct, while those seeing customer-centricity as a potential for growth will go through the process of anagenesis (a change within a given species). Yet, it is not going to happen overnight, is it? Evolution happens in steps. What would happen if we chose to see NPS as one of the steps on this evolutionary ladder?

Rarely discussed reasons for adopting NPS

Implementing NPS could be generally considered a good sign: a sign that a business previously focused on financial or technological aspects has started to realize that there is a human at the receiving end who is going to use, and love or hate, its product. Such a realization makes the C-level managers start racking their brains over how to refocus their employees to include the customer perspective in their work.

If you talk to these managers, they quite openly admit that the world of experience design is alien to them. Their management schools and MBA programs never trained them to unpack the psychological, sociological and philosophical aspects of how people perceive the world and how they engage with it through the offerings they choose. The vast majority of those managers don’t speak design and don’t have a clue what a design approach is. Often the idea is simple to them: let’s fix what doesn’t work and all will be all right.

Those managers want a hammer, a lever that looks similar to what they understand and use. Typically their toolbox consists of various sorts of numeric KPIs (EBITDA, ARPU, gross adds, etc.). As the majority of businesses run on quantitative data, it feels comfortable to add one more number to the pot. It feels like comparing apples with apples and not with carrots.

Note that top managers who are educated in UX or CX rarely choose NPS as a measurement tool. They understand its shortcomings and often turn to other (often qualitative) methods and to CX-centric KPIs that are alternatives to NPS.

NPS pushes all the right buttons in CX-inexperienced managers: it is strongly skewed towards the positive end. Many managers go on the highest alert hearing that only 9s and 10s are seen as promoters and that 8s and 7s are neutrals. Having a pretty long scale showing how much you are failing to deliver on the customer front feels powerful. So the idea behind NPS feels right on the surface. The problems begin with its implementation in the organization. But before we get to it, let’s unpack a few things about NPS.
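
To make the arithmetic behind that skew concrete, here is a minimal Python sketch of the commonly used NPS calculation: the percentage of promoters (9s and 10s) minus the percentage of detractors. Treating scores of 0 to 6 as detractors is the usual convention rather than something stated above, and the function name and the sample scores are made up purely for illustration.

```python
def nps(scores):
    """Return the Net Promoter Score for a list of 0-10 answers.

    Assumed convention: 9-10 = promoter, 7-8 = neutral, 0-6 = detractor.
    """
    if not scores:
        raise ValueError("need at least one score")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Neutrals count only in the denominator, never in the numerator.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical survey: 4 promoters, 3 neutrals, 3 detractors
sample = [10, 9, 9, 10, 8, 7, 8, 6, 5, 3]
print(nps(sample))  # 10.0
```

Seven out of ten of these invented customers answered 7 or higher, yet the score lands at a modest 10, which is exactly the effect that puts managers on alert.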

Relationship and transactional NPS

There are two sorts of NPS: a relationship measure that tries to capture the warmth of the feelings customers have towards a company over regular periods of time and a transactional one attempting to understand how those feelings fluctuate at each touch-point.

Relationship NPS has some obvious advantages: it is easy and fast to run, and it allows for benchmarking against competitors. It can be seen as putting a hand on a child’s forehead to check if she is developing a fever. Certainly, such a measure is hardly reliable as to the causes of a possible illness, but it could identify the first signs that things might go wrong.

Yet, relationship NPS rarely exists on its own as it is seen as non-actionable. It is often accompanied by its younger brother: a transactional NPS. Transactional NPS asks the very same question (“How likely are you to recommend…”) with respect to every touch-point. So, you might get asked how likely you are to recommend a company based on the last bank transfer or the last invoice. There are a number of problems with it.

First, getting such surveys over and over again is annoying. As much as getting constant feedback is important to the business, it is irritating (to say the least) to the customers. A study we did with one client showed that when the transactional NPS was repeated with the same set of clients after three months, the results were significantly lower for the entire population. The touch-point itself hadn’t changed in a way that would explain such a result. We called a number of customers and heard that they didn’t see the point of answering the same question about the same thing again. As they didn’t have another means to express their annoyance, they used the lowered score to do so.

The second problem with transactional NPS is that you never know whether a result reflects a particular transaction or is a generic expression of the attitude towards the company. We did a number of analyses of the qualitative feedback left behind in the transactional NPS surveys. Approximately 40% of the comments relate to the overall feelings towards the company. If you realize that only 10% of the data set is accompanied by qualitative feedback, it is not hard to see that perhaps the collected data doesn’t refer to what it should.

The third problem relates to the internal complexity of each touch-point. If you come to a shop, there is quite a large number of factors that influence your perception. You might be disappointed by the contrast between the marketing promise and the actual terms of the offering. You might meet a consultant who is just not right for you. Or you might have quarreled with your partner in the morning and nothing anyone can do is going to make you feel better. An answer from the NPS survey is a tangled amalgam of all these feelings. So, in fact, you don’t know what you just measured: perhaps even the impact of the weather on people’s mood.

The next problem regards the timing of the survey. There is an ongoing discussion about what the best moment to ask about a given touch-point is. Some people believe it should happen right away, others that the experience should sink in, so the survey question should pop up no earlier than a week or two later. The answer to this question should be simple, as the phenomenon of global and episodic experiences is well described, but somehow there is confusion around it. Let me unpack it. Schwarz, Kahneman and Xu say:

“When people describe their current feelings, the feelings themselves are accessible to introspection, allowing for accurate reports on the basis of experiential information. […] Once the feeling dissipated, the affective experiences need to be reconstructed on the basis of episodic or semantic information. When the report pertains to a specific recent episode, people can draw on episodic memory, retrieving specific moments and details of the recent past. In contrast, global reports of past feelings are based on semantic knowledge. […] The actual experience does not figure prominently in these global reports because the experience itself is no longer accessible to introspection.”

So, in other words, if we ask a question about an experience during the experience itself or right after, we have a chance to capture the actual emotional reaction to it. If we ask it at a later moment, we will get the report summarizing all past experiences with the offering making it impossible to dissect the actual impressions of a particular event.

Last but not least, transactional NPS doesn’t capture the data from customers who weren’t served. If you came to a bank or insurance branch and left annoyed without getting what you came for, there is no way your impressions would be included in the data set. And let’s be honest: customers who weren’t served are the most dissatisfied ones.

Thus, there are a number of problems with both types of NPS measurement that one needs to be aware of when choosing it as a tool for assessing the quality of customer experience.

The NPS and the Likert scales

Now let’s discuss the NPS scale.

Typically, survey scales known as Likert scales, named after their creator, who introduced them in 1932, consist of 3, 5 or 7 choice points. The larger the number of choice points, the more sensitive the scale is considered.

In some surveys sensitivity is a good thing and in others it is not. For example, if you want to measure the emotional well-being of a depressed person, a three-point scale might not be the most effective tool. On the other hand, if you want someone to assess how clean a room was, providing a 7-point scale might be too much.

Importantly, Likert assumed that the distance between the choice points on the scale is equal. Yet, for many people the distances between the different impressions are not the same. So, the equal spacing of the items on a scale suggests equal distances between the choices, which may not be true.

Let’s now talk about the 10-point scale. It is interesting to realize that the ‘Decile’ scale was applied in 1923 by M. Freyd, even before the Likert scale was formulated. It consisted of a number of statements corresponding to different levels of construct ‘strength’ along the numbers from 0 to 10 (sounds familiar? Yeah, this is the length of the NPS scale). The decimal numbering system was considered intuitive and easy for people to follow. Freyd then introduced a ‘Graphic rating method’, in which the response is marked along a line.

He recommended scoring the responses by dividing the line into 10 or 20 equal intervals but not anchoring the responses on a given point of the scale itself. So, in a way he made the scoring continuous rather than discrete, and the scale itself left more room for individual variation in rating. Scale flexibility enables the collection of discriminative data, particularly in cases where results are skewed.

Over the years, such scales were often constructed with only two end-points. It is interesting to know that research shows that the end-defined format of a scale (like NPS) appears not to bias the data when compared with the results gathered using a Likert scale with all points defined.

What does it all mean for the NPS scale? First of all, it is not uncommon in the scientific community to have a scale that reaches 10 or more points and has only the ends defined rather than all the points on the scale. It is also not unusual to have a scale that is skewed in one direction. The skew determines the way in which the data should be calculated. Assuming that NPS aims to collect a large number of data points, its calculation is likely to differ from the traditional mean and median calculation that is typical for the Likert scale. Not being a statistician, I don’t dare to go any deeper into this subject, but I would like to point out that the issue of NPS calculation is not as black and white as it may appear on the surface.
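
Without going any deeper into the statistics, a small Python sketch can at least show that the NPS calculation and a plain mean can tell different stories about the same answers. It assumes the usual NPS bands (9 to 10 promoters, 0 to 6 detractors), and the three groups of scores are invented for illustration, not taken from any real survey.

```python
def mean(scores):
    return sum(scores) / len(scores)


def nps(scores):
    # Assumed convention: 9-10 = promoter, 0-6 = detractor.
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Three hypothetical groups of ten customers, all with a mean of 8
samples = {
    "everyone lukewarm": [8] * 10,
    "polarized": [10] * 6 + [5] * 4,
    "no detractors": [9] * 4 + [8] * 2 + [7] * 4,
}

for label, scores in samples.items():
    print(f"{label}: mean={mean(scores):.1f}, NPS={nps(scores):+.0f}")
# everyone lukewarm: mean=8.0, NPS=+0
# polarized: mean=8.0, NPS=+20
# no detractors: mean=8.0, NPS=+40
```

The mean cannot tell these three groups apart, while NPS spreads them from 0 to +40. Whether that sensitivity to the shape of the distribution is a strength or a weakness depends on what one is trying to learn.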

So, if the scale construction generally makes sense, where is the problem?

NPS as a KPI

The biggest problem with NPS is, in my view, its use as a KPI (Key Performance Indicator). There is a law from 1975, named after the economist Charles Goodhart, that points at the problem at hand:

“When a measure becomes a target, it ceases to be a good measure.”

Goodhart himself put it this way: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes”. The law was later used to criticize Margaret Thatcher’s government for trying to conduct monetary policy for the United Kingdom based on targets for broad and narrow money.

In other words, when any measure (be it NPS, CSI or FCR) becomes a performance target, it stops being a measurement tool and becomes a political weapon. Much like polls. It is used to enforce certain actions, to allocate or take away budgets and influence. It is turned into a number that can be gamed (and it is easier to game one number than to game many).

I remember a client story that illustrates that phenomenon. An unreasonably high NPS value was set as a KPI for its sales branches. The consultants knew that the score they received might vary based on things they did not have much impact on. And getting 9s and 8s was not really an option if they wanted to get a bonus. So, they printed an A4 page saying: “If you liked how I have served you, please give me a 10” accompanied by a humongous smiley face. Once you opened the folder with your contract and saw this sheet of paper, there was no way you wouldn’t smile. And give a 10.

In such a way the consultants gamed the NPS and got to the required target. Did it measure what it was supposed to measure? I don’t think so. But the KPI was achieved and everybody could sleep peacefully.

NPS as a silver bullet

The attraction of the single number in the NPS score has another major flaw. It is easy to see it as a silver bullet of Customer Experience strategy. Sure, NPS captures a large number of business aspects. But when it goes down, it is hard to point out what went wrong. Therefore, it needs to be supplemented with a set of indicators that are, actually, actionable. This is why the relationship NPS is often accompanied by the transactional NPS. Having both measures offers the illusion that one can capture the overall attitude and the other detect the points of failure. And as I have explained earlier, this is not quite true.

NPS as a direction

It is also crucial to realize that NPS is a benchmarking tool showing only how one business performs relative to another. It doesn’t show whether this is a battle among offerings that truly delight clients. It may just be an indication of the best choice among many bad ones.

NPS only pretends to be a proactive, strategic tool. In fact it is an evaluative assessment mainly focused on the usability aspects of a given offering. NPS is the measurement you use when you don’t know what exactly you want to measure. So you shoot in the dark, hoping that the answers will tell you what to do.

Using NPS as a tool for setting the direction for CX runs into one more serious problem: positive adaptation. Positive adaptation is defined as our ability to successfully adapt to life tasks in the face of adverse conditions. Positive adaptation helps us to bounce back from a negative experience with competent functioning.

Ok, what does it have to do with CX? As much as we are able to adapt to negative conditions, the same ability applies to positive conditions as well. In other words, something that one day felt extraordinary in a service or a product becomes a baseline the next day. Therefore, reactively following the outcomes of the NPS surveys is like running after a rabbit with zero chance of catching it.

Is there then a way to measure CX?

Let’s first dissect what CX means. There is probably an array of definitions, but my personal favourite (following the research work of Marc Hassenzahl) boils down to satisfying two types of needs: pragmatic and emotional.

Pragmatic needs are those related to functionality and usability. They address quantifiable aspects such as effectiveness, efficiency, understandability, learnability, etc. There is a bunch of broadly available research tools and methods that measure the success rate in this respect.

The tougher task is to address the emotional needs, which are subjective, situated and temporal. What’s even harder, it is also a matter of choice how you define which emotional needs you want to satisfy. Because a business is very much like a human: it is not quite that easy to be smart, beautiful, rich and…

Thus, to satisfy emotional needs, each business needs to choose what qualities will define its character. Does it want to be edgy and fashionable or perhaps homey and familiar? Defining business character (not in the marketing but in the behavioural sense) is hard: it demands making choices about whom you want to cater to. It doesn’t allow you to stay in the middle among other indistinguishable businesses measured only by NPS. It demands opting for some traits and opting out of others. Once a business chooses which edges it wants to embrace, a measurement tool can be created to check the extent to which the defined character matches the perceptions of customers.

Such measurement tools are unique to each business and need to be constructed according to the defined strategic qualities. They help a business stay unique in the market, but at the same time they are not able to serve as a benchmark. So, NPS might prove useful as a complementary tool setting the forever moving baseline for staying in the game of CX.

Experience is predominantly qualitative

It is true that experiences are qualitative in nature and therefore the research tools that measure them are likely to be qualitative too. That doesn’t mean that they cannot be tracked.

There is a methodology developed decades ago by Tom Gilb termed EVO that aims to measure even things that seem unmeasurable. Methods such as EVO give an opportunity to propose a new approach in measuring CX that will be satisfying both for UX and CX designers and for the C-level managers who want to be convinced that their decisions are optimal in a business sense.

Tom Gilb at TEDx Trondheim

If fate gives you lemons, make lemonade

It may sound as if I am trying to say that if we have NPS, we should use it. And I guess I am saying just that. Because it is easy to stay righteous and say that NPS is a bad, bad measurement tool. It might make many of us feel superior, but it will not advance the value of UX and CX for business. I don’t believe that any C-level manager who has invested millions into setting up an NPS measurement structure will want to listen to such a stance.

So, if we stop looking at NPS as either good or bad and start seeing it as an evolutionary step, a whole new perspective opens up. A perspective that gives us space to build on the NPS structure and propose new tools and methods that help businesses find alternatives for setting their CX-related KPIs. Because I guess we all agree that such KPIs are a good idea, right?

If we agree with the assumption that many top-level managers are relatively unaware of what designing for experience is, the first step might be to use the NPS scores to explain to them what Customer Experience can be divided into. I am a great fan of a method called Love and Divorce Letters, which captures sentiment towards a brand that can be linked to the Relationship NPS score. We have run such a study for a number of clients and it certainly opens up a space to discuss what kind of customer experience a company would like to deliver. And once we define that, we can start building CX and UX measures that are truly impactful in the ways we envision, making NPS what it is made for: a baseline benchmarking tool.

P.S. [1] The article “Go To Statement Considered Harmful” by Edsger W. Dijkstra, written in 1968, called for the abolition of “goto” statements in programming languages. And in this particular case he was right to ring the bell. And he succeeded.

P.S. [2] The first time I encountered the phrase ‘considered harmful’ in HCI was in the context of a very influential article written by Bill Buxton and Saul Greenberg under the title “Usability evaluation considered harmful (some of the time)”. Its original title was: “Usability evaluation considered harmful”. The title stirred a lot of emotions in the review committee and a major discussion in the HCI community about the value of usability evaluation. And once more it turned out that the issue is not black and white. That for some goals usability evaluation is good and for others it is not. The authors agreed with the point raised, changed the title and added the statement: some of the time. It is to me another example of unnecessary dichotomy, where thinking of the value of usability evaluation as a continuum might be more effective.

P.S. [3] This article was triggered by the discussion around Jared’s article on our Polish UX Forum: the Facebook group Usability.pl. I am most grateful for it as it forced me to finally put in order all my unpolished (and sometimes, even in my own head, seemingly contradictory) thoughts about NPS.

P.S. [4] My ease in accepting the degrees of failure on the NPS scale might stem from the Dutch grading system, which is built on a very similar scale: 9 and 10 mean that you did very well, 8 and 7 indicate a good level of knowledge, 6 means that you barely slipped through. Anything below 6 marks the degree to which you failed. Sounds familiar?

__________________________________________________________

Aga Szóstek, PhD is an experience designer with over 19 years of practice in both the academic and business worlds. She is the author of “The Umami Strategy: stand out by mixing business with experience design”, the creator of tools supporting designers in the ideation process (Seed Cards), and a co-host of the Catching The Next Wave podcast.