Don’t build trust with AI, calibrate it
Designing AI systems with the right level of trust
AI operates on probabilities and uncertainties. Whether it’s object recognition, price prediction, or a Large Language Model (LLM), AI can make mistakes.
The right amount of trust is a key ingredient in a successful AI-empowered system. Users should trust the AI enough to extract value from the tech, but not so much that they’re blinded to its potential errors.
When designing for AI, we should carefully calibrate trust instead of making users rely on the system blindly.

How people trust AI systems
According to a meta-analysis by Harvard researchers, people often place too much trust in AI systems when presented with final predictions. Simply providing explanations for AI predictions doesn’t solve the problem: explanations act as a signal of AI competence rather than drawing attention to AI mistakes.
Sure, a few wrong turns won’t ruin your journey if you’re just seeking movie or music recommendations. But in critical decision-making situations, this approach can make users “co-dependent on AI and susceptible to AI mistakes.”
The trust issue becomes even more critical with LLM chatbots. These interactions naturally create more trust and even emotional connection.
Evaluate risk
Every person involved in the creation of AI, at any step, is responsible for its impact. The first thing AI designers have to do is understand the risk of their solution.
How critical is it if AI makes a mistake, and how likely is it to happen?
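One way to make that question concrete is a simple severity-times-likelihood score. A minimal sketch, with illustrative scales and thresholds that would need to be adapted to each product; nothing here is a prescribed standard:

```typescript
// Hypothetical risk scoring: severity of a wrong AI output times how likely it is.
// Scales and cut-offs are illustrative, not prescriptive.
type Severity = 1 | 2 | 3 | 4;   // 1 = minor annoyance, 4 = harm to health, safety, or rights
type Likelihood = 1 | 2 | 3 | 4; // 1 = rare, 4 = frequent

function riskLevel(severity: Severity, likelihood: Likelihood): "low" | "medium" | "high" {
  const score = severity * likelihood;
  if (score >= 9) return "high";   // e.g. medical or legal decisions: add human oversight
  if (score >= 4) return "medium"; // e.g. price predictions: show confidence, allow opt-out
  return "low";                    // e.g. movie recommendations: lighter safeguards
}

console.log(riskLevel(4, 3)); // "high"
```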

If the available training data is biased, messy, subjective, or discriminatory, it will almost certainly lead to harmful results. In that case, the team's first priority should be to prevent that harm.
For example, a recidivism risk assessment algorithm used in some US states is biased against black people. It predicts a score for the likelihood of committing a future crime and often rates black defendants higher. The algorithm is trained on data that reflects years of biased human decisions in the justice system.
Keep in mind that LLM chatbots, with their open-ended conversational interaction, can be harder to control than single-answer solutions with more restrictive templates.
The risk also depends on the situation and the people who are supposed to use the system. While doctors have the expertise to weigh medical AI suggestions, it would be dangerous to let teenagers seek mental health support from an AI. Snapchat, we are looking at you.
Set expectations
Misconceptions about AI are common. Some users may trust it too much, while others may be too skeptical.
Be clear about the system’s limits and capabilities. Focus on the benefits that users get, not the technology. Let people know what they can (and can’t) expect from the AI.

If the content is generated by an LLM, it’s good practice to be clear about that too.

Make your system explainable
AI should be designed so people can easily understand how it reaches its decisions. The goal is to explain the decision process, not just justify the outcome.
Data behind recommendations
When dealing with sensitive decisions, AI must provide clear explanations of its recommendations, including the data used and the reasoning behind them.
Avoid overwhelming users with too much information at once. Consider progressive disclosure when designing for explainability.

Identify the moments where users need to use their own judgment because of the lack of data.
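As an illustration of progressive disclosure, a recommendation could carry a short summary by default and expose the underlying data, reasoning, and data gaps only on request. A minimal sketch with hypothetical field names, not a prescribed schema:

```typescript
// Hypothetical shape of an explainable recommendation, layered for progressive disclosure.
interface Explanation {
  summary: string;        // always visible: a one-line reason
  dataUsed?: string[];    // revealed on "Why this?": which data points were considered
  reasoning?: string;     // revealed on a second click: how the data led to the result
  missingData?: string[]; // flags where the user should apply their own judgment
}

interface Recommendation {
  title: string;
  explanation: Explanation;
}

// Render only the layers the user has asked for.
function renderExplanation(rec: Recommendation, level: 0 | 1 | 2): string {
  const { summary, dataUsed, reasoning, missingData } = rec.explanation;
  const parts = [summary];
  if (level >= 1 && dataUsed) parts.push(`Based on: ${dataUsed.join(", ")}`);
  if (level >= 2 && reasoning) parts.push(reasoning);
  if (missingData?.length) parts.push(`We had no data about: ${missingData.join(", ")}`);
  return parts.join("\n");
}
```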

Confidence display
Instead of saying why or how, we can show how certain the AI is about its prediction and which alternatives it considered.
The confidence can be presented as a numerical value, as a category, or as alternative suggestions.

If confidence levels don’t impact user decision-making, it’s okay to leave them out.
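A minimal sketch of how a raw model probability could be mapped to a category, or shown as alternative suggestions. The bucket boundaries and example labels are assumptions for illustration only:

```typescript
// Map a raw probability to a human-readable confidence category.
// The 0.9 / 0.6 cut-offs are illustrative and should be tuned per product.
function confidenceLabel(probability: number): "High" | "Medium" | "Low" {
  if (probability >= 0.9) return "High";
  if (probability >= 0.6) return "Medium";
  return "Low";
}

// Or show the top alternatives the model considered instead of a single answer.
interface Prediction { label: string; probability: number; }

function topAlternatives(predictions: Prediction[], count = 3): Prediction[] {
  return [...predictions].sort((a, b) => b.probability - a.probability).slice(0, count);
}

const preds = [
  { label: "Golden Retriever", probability: 0.72 },
  { label: "Labrador", probability: 0.21 },
  { label: "Beagle", probability: 0.04 },
];
console.log(confidenceLabel(preds[0].probability)); // "Medium"
console.log(topAlternatives(preds, 2));             // the two most likely labels
```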
LLM explainability
Many common design patterns for trust calibration, such as confidence levels, don’t work with LLMs. Explainability is not a strong point of LLMs either. Answers should be grounded in data, but there is still a chance that the provided data is interpreted incorrectly or that important information is missing.
Make sure users can verify the answers with citations and sources.
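One way to support that verification is to return every grounded answer together with the passages it relied on and render them as citations. A minimal sketch using a hypothetical response shape, not any specific provider’s API:

```typescript
// Hypothetical shape of a grounded LLM answer: the text plus the sources it relied on.
interface Source {
  title: string;
  url: string;
  quote: string; // the passage the answer is grounded in, so users can verify it
}

interface GroundedAnswer {
  text: string;
  sources: Source[];
}

// Render the answer with numbered citations the user can follow.
function renderWithCitations(answer: GroundedAnswer): string {
  // If the model returned no sources, say so instead of implying the answer is verified.
  if (answer.sources.length === 0) {
    return `${answer.text}\n\nNo sources available. Please verify this answer independently.`;
  }
  const citations = answer.sources
    .map((s, i) => `[${i + 1}] ${s.title} (${s.url})`)
    .join("\n");
  return `${answer.text}\n\nSources:\n${citations}`;
}
```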
When a comprehensive explanation isn’t available
Sometimes, users won’t have access to the full decision process, such as with financial investment algorithms.
There may also be legal and ethical constraints on collecting and communicating the data sources used in AI. Be careful when deciding what information can be shared.
In such cases, the best tactic is to be transparent about the limitation itself.
Empowering users with control
Let users control the situation
In some situations, human control is crucial, for example, when safety or health is at stake.
In such cases, Harvard researchers suggest designing for the process that leads to a decision and helping people make it themselves, rather than handing users the AI’s end decision. Even explainable and interpretable end suggestions add cognitive load in critical situations.

For example, in aviation, a team of researchers found that shifting the goal of decision-support tools toward understanding the situation and away from the decision itself may increase trust and reliance.
Another technique is to put a human expert in the middle. For example, instead of giving the predictions directly to the customers, let an expert in the company decide how to use AI predictions and what to tell the customer.
Let users control AI outputs
The interaction with AI doesn’t have to be a one-way street. Users can, and should, have the ability to steer the AI’s output to better suit their needs. Here are a few possible techniques, with a small sketch after the list:
- Changing preferences
With preference settings, users can tailor the AI’s responses to their likes and dislikes. Let users tweak and fine-tune what they want from the AI, and make sure the AI visibly reacts to those changes to keep their trust.

- Giving feedback and rating AI suggestions
By giving feedback or rating the AI’s suggestions, users should be able to adapt the system to their needs and preferences.

- Opting out
Making it possible to opt out of AI suggestions increases users’ freedom and control. Imagine a navigation app nudging a driver onto a faster route while they’re juggling passengers and traffic. It might be helpful, but it can be dangerous as well.
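To make these three levers concrete, here is a minimal sketch of what a per-user control object might look like. Every field name is a hypothetical example, not a prescribed schema:

```typescript
// Hypothetical per-user controls covering the three techniques above.
interface AIUserControls {
  // Changing preferences: steer future outputs.
  preferences: {
    tone: "neutral" | "friendly" | "concise";
    topicsToAvoid: string[];
  };
  // Giving feedback: the system should visibly adapt to these signals.
  feedback: Array<{ suggestionId: string; rating: "up" | "down"; comment?: string }>;
  // Opting out: suggestions are never shown when this is false.
  aiSuggestionsEnabled: boolean;
}

const defaults: AIUserControls = {
  preferences: { tone: "neutral", topicsToAvoid: [] },
  feedback: [],
  aiSuggestionsEnabled: true, // must be easy to switch off at any moment
};
```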

Monitoring and gathering feedback
Regular check-ins on how our AI is performing, coupled with real conversations with users, are vital for a successful implementation. Monitoring should include user behavior metrics as well as explicit feedback.
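A minimal sketch of the kinds of signals such monitoring could combine, with hypothetical event names; in practice these would feed whatever analytics pipeline the team already uses:

```typescript
// Hypothetical monitoring events: implicit behavior plus explicit feedback.
type MonitoringEvent =
  | { kind: "suggestion_shown"; suggestionId: string }
  | { kind: "suggestion_accepted"; suggestionId: string }
  | { kind: "suggestion_overridden"; suggestionId: string } // the user chose their own answer
  | { kind: "explicit_rating"; suggestionId: string; rating: 1 | 2 | 3 | 4 | 5 };

// A crude acceptance rate: how often shown suggestions are actually used.
function acceptanceRate(events: MonitoringEvent[]): number {
  const shown = events.filter((e) => e.kind === "suggestion_shown").length;
  const accepted = events.filter((e) => e.kind === "suggestion_accepted").length;
  return shown === 0 ? 0 : accepted / shown;
}
```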
It’s not just about creating AI; it’s about evolving it, shaping it, and making sure it fits in where it’s needed. In the end, the aim is to keep improving, adapting, and ensuring that our AI-enabled tools are truly empowering the ones they’re designed for.
Sources:
- People + AI playbook, Google
- Guidelines for Human-AI Interaction, Microsoft
- UX: Designing for Copilot, Microsoft
- Design for AI, IBM
- Zana Buçinca, Alexandra Chouldechova, Jennifer Wortman Vaughan, and Krzysztof Z. Gajos. 2022. “Beyond End Predictions: Stop Putting Machine Learning First and Design Human-Centered AI for Decision Support.”
- Cara Storath, Zelun Tony Zhang, Yuanting Liu, and Heinrich Hussmann. “Building trust by supporting situation awareness: Exploring pilots’ design requirements for decision support tools.”
- Algorithms Were Supposed to Reduce Bias in Criminal Justice — Do They?, Boston University
- Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias. ProPublica