When are we going to start designing AI with purpose?

For an industry that prides itself on moving fast, the tech community has been remarkably slow to adapt to what makes designing with AI different. Machine learning is an intrinsically fuzzy science, yet when it inevitably returns unpredictable results, we tend to react like it’s a puzzle to be solved; believing that with enough algorithmic brilliance, we can eventually fit all the pieces into place and render something approaching objective truth. But objectivity and truth are often far afield from the true promise of AI, as we’ll soon discuss.

Josh Lovejoy
UX Collective

--

I think a lot of the confusion stems from language; in particular the way we talk about “machine-like efficiency”. Machines are expected to make precise measurements about whatever they’re pointed at; to produce “data”. But machine learning doesn’t produce data. Machine learning produces predictions about how observations in the present overlap with patterns from the past. In this way, it’s literally an inversion of the classic if-this-then-that logic that’s driven conventional software development for so long. My colleague Rick Barraza has a great way of describing the distinction:

  • In traditional programming, people input *rules* that guide how the system produces *outcomes*.
  • In machine learning, people input *outcomes* that guide how the system produces *rules*.
Diagram: In traditional programming, deterministic rules produce outcomes. In ML, outcomes produce probabilistic rules.
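To make the inversion concrete, here’s a minimal sketch; the spam-filter scenario, the tiny hand-labeled dataset, and the scikit-learn classifier are stand-ins I’ve chosen for illustration, not a prescription for how any particular system is built.

```python
# Traditional programming: we author the rule, the system produces the outcome.
def is_spam_rule_based(subject: str) -> bool:
    banned_words = {"winner", "free", "urgent"}  # a rule written by hand
    return any(word in subject.lower() for word in banned_words)

# Machine learning: we supply example outcomes, the system derives the "rule".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

subjects = ["You are a winner!", "Free gift, urgent reply needed",
            "Team meeting moved to 3pm", "Quarterly report attached"]
labels = [1, 1, 0, 0]  # outcomes we labeled: 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(subjects), labels)

# The "rule" now lives inside the fitted model as learned probabilities, and its
# outputs are predictions about overlap with the past, not measurements.
print(model.predict(vectorizer.transform(["Urgent: free tickets inside"])))
```

Notice that the second half never states a rule; whatever “rule” exists was distilled from a handful of labeled outcomes, which is exactly why the examples we choose matter so much.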

This dynamic of learning — through examples, trials, errors, and corrections — has been intentionally designed to mimic human cognition. Yet amidst the hype of AI, we seem to continually forget — or neglect — the outsized and active role that other people play in early childhood development.

“Teachable moments”, we call them when helping children learn to navigate the world. We recognize that rules — whether they’re about grammar, jaywalking, or even “the golden rule” — are essential but insufficient on their own; experience is the best educator and slip-ups are to be expected. The difference, of course, is that we don’t ask preschoolers to curate our news, drive our cars, or mediate our social interactions with other grown-ups.

As anyone with childcare experience can attest, it’s mostly a lost cause to expect a coherent answer when you ask a young child *why* they behaved inappropriately … scribbled with crayons on the walls … grabbed a toy away from someone … nudged a glass off the edge of a table … These weren’t “good” or “bad” or “unethical” actions, and they certainly weren’t done with consideration for downstream impact. In fact, they weren’t done with consideration at all. They were done … because they could. Because they wanted to see what would happen. Kids being kids, testing their boundaries.

As parents and caregivers, we recognize that it’s our responsibility to advise and not simply admonish. We respond by drawing connections between cause and effect: “When you behave like that, … we have to do a bunch of work to clean the wall … your friend will throw a fit because they’ll want their toy back … there’ll be sharp pieces of glass on the floor that somebody could step on…” We try to teach children to be more purposeful in the future by describing what could be done better next time (while taking deep breaths and hoping they’re actually paying attention).

We expect this kind of “I wonder what would happen if…” behavior from kids as part of finding and testing their boundaries. And because we expect it, we prepare ourselves to react in intentional ways and to communicate constructively about the repercussions of crossing those boundaries. After all, learning can only happen when we make room for mistakes.

In the field of AI, we’re working with a technology that learns in a strikingly childlike way — a technology that forms its understanding of the world through a lens shaped and mediated by its designers; by us. So we need to ask ourselves: are we prepared to shape it intentionally?

Diagram: AI has a “conceptual model” of the world that a designer tries to express, but it changes depending on the user.
Unlike conventional software, the probabilistic nature of AI introduces a third “conceptual model” into the design process. While the designer attempts to convey the system’s functionality to the user through a “system image” (e.g., an app user interface), the AI is fundamentally affected by the user, their context and expectations. The result is a triad of conceptual models that are interdependent but disconnected.

AI by Design

As a Designer, I think intentionality is part of the job description.

Design is a considered and considerate act of service for someone other than yourself. It means listening to the needs of the people we intend to serve so we can be their advocates. It means having answers when we’re asked why something we designed behaved the way it did. And while there’s no such thing as a perfectly safe technology, there’s a world of difference between “I didn’t mean to” and “I actively tried to avoid”.

Intentional AI needs to be designed. It needs to happen on purpose.

Over the remainder of this essay, I’m going to argue that what we have on our hands with the current state of AI is a fundamentally misunderstood Design material. I’m going to propose that instead of trying to optimize for the reliability of AI systems, we need to be optimizing for the reliability of Human:AI collaboration.

To that end, I’m going to introduce three conversations that I think every product team should have prior to breaking ground on AI development. The point of these conversations isn’t to plan for every little detail or solve all your problems in advance. The point is to engage with one another, to promote intellectual curiosity, to make space for common sense, and to commit to acting with purpose. The point is to be able to say what you’ll do and do what you say.

The three conversations every AI product team needs to have

  1. Capability: What is uniquely AI and what is uniquely human?
  2. Accuracy: What does “working as-intended” mean for a probabilistic system?
  3. Learnability: How will people build — and rebuild — trust in something that’s inherently fallible?

Note: The following exercises are intended to be part of a multi-disciplinary approach that presumes the inclusion of User Research. Getting on the same page with your colleagues is critical, but there’s no substitute for learning from your users.

A primer before the first conversation: Automation is a spectrum

Before we get into the conversations with your team, I want to provide one key bit of framing: automation isn’t an all-or-nothing value. Your team will need a shared language for how much control your AI should exercise, how much control should remain in the hands of its users, and when.

While we may be only about a decade into the deep learning revolution, we’re far from a blank slate when it comes to the study of people engaging with automated systems. One Human:Computer Interaction (HCI) model in particular has become my go-to whenever I need to talk about automation in a more granular way. Let’s unpack it.

There are ten *levels* of automation, spanning a spectrum of human control:

  1. Offers no assistance, nothing is automated.
  2. Offers a complete set of possible decisions/actions for the human to approve.
  3. Offers a narrowed set of possible decisions/actions for the human to approve.
  4. Offers one decision/action and one alternative for the human to approve.
  5. Offers one decision/action for the human to approve.
  6. Allows the human a restricted time to veto before automatic execution.
  7. Executes automatically, then necessarily informs the human.
  8. Informs the human only if asked to.
  9. Informs the human only if it decides to.
  10. Decides everything, acts autonomously, ignoring the human.

We can use these 10 levels of automation to make distinct design choices for each of the four *types* of automation:

  1. Information acquisition: The process of sensing and registering data. This can include activities like choosing what to pay attention to, how often to pay attention to it, through which sensory channels, and over what periods of time. For instance, your car is capturing all sorts of data without your involvement while you drive, from the rotations of your engine’s crankshaft to the rate of fuel it’s consuming. An SLR camera, by contrast, puts most of the “acquisition” decisions in the hands of the photographer, including how closely they stand to their subject, the focal distance of the lens, and the amount of light to expose the sensor to.
  2. Information analysis: The process of synthesizing and representing data. This can include activities like choosing how literally or abstractly to present information, in what format, and alongside which reference points. A weather app, for example, might visualize an incoming storm system by animating changes in wind speed, temperature, and humidity. But it could also display the same information in a table, or simply show an icon of an umbrella with rain drops.
  3. Decision selection: The process of selecting from an available set of actions. This can include activities like choosing how many actions should be surfaced to the user, if any, and whether they should be treated as “options” or “recommendations”. A financial app, for example, might highlight a menu of different investing approaches while an entertainment app might just queue up the next show to watch in a series.
  4. Action implementation: Finally, the process of executing a decision. This last phase is the most definitive, but it’s also impossible to design for it without considering the prior stages. For example, a user might prefer to manually turn on the air conditioning before pre-heating their oven (instead of waiting for the thermostat to automatically register an increase in temperature), but they’re still relying on passive automated measurements of the room’s temperature. Conversely, a user might appreciate a location-based reminder automatically popping up on their phone, but only because they’d manually specified the location themselves.
Table: The different “types” of automation can have different “levels” of automation for the same system.
The level of automation can vary considerably across different stages of automation. A “lane assist” system in a car, for instance, passively gathers and analyzes information such as the proximity to lane demarcations and other cars, then represents its analysis through a single audio notification, before ultimately ceding control to the driver about whether to actually change lanes. A camera’s auto-focus system, meanwhile, has no control over where it’s pointed, but takes full control over the act of focusing after the photographer chooses from a small set of automatically-provided targets.
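One way to keep that shared language tangible is to jot down, for each capability, a level for each type of automation. The sketch below is only my shorthand for the lane-assist reading above; the enum names and the profile format are illustrative, not a standard taxonomy or API.

```python
from enum import IntEnum

# The ten levels listed above, as a shared vocabulary for the team.
class AutomationLevel(IntEnum):
    NO_ASSISTANCE = 1
    OFFERS_ALL_OPTIONS = 2
    OFFERS_NARROWED_OPTIONS = 3
    OFFERS_ONE_PLUS_ALTERNATIVE = 4
    OFFERS_ONE = 5
    VETO_WINDOW = 6
    EXECUTES_THEN_INFORMS = 7
    INFORMS_IF_ASKED = 8
    INFORMS_IF_IT_DECIDES = 9
    FULLY_AUTONOMOUS = 10

# One profile per capability: a level for each of the four types of automation.
lane_assist = {
    "information_acquisition": AutomationLevel.FULLY_AUTONOMOUS,    # passively senses lane markings and nearby cars
    "information_analysis": AutomationLevel.EXECUTES_THEN_INFORMS,  # analyzes proximity, then plays an alert
    "decision_selection": AutomationLevel.NO_ASSISTANCE,            # the driver decides whether to change lanes
    "action_implementation": AutomationLevel.NO_ASSISTANCE,         # the driver steers
}
```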

It’s not lost on me that a 4x10 matrix of options might seem a little daunting at face value. Yet for all the theoretical shades of grey it poses, there’s a single question that virtually everything ladders up to: what does a person need in order to feel confident taking accountability?

With that question in mind, we can start at the end — with people signing up to be accountable — and work our way backward.

With human accountability comes a new wrinkle in the design of AI: embracing human frailty as an essential input to your technical architecture (and sunsetting the played-out paradigm of people just being there as a fallback if the machine fails). As part of that, your team will need to consider the following four human performance criteria — what Dr. Mica Endsley dubs the “automation conundrum” — as fundamental constraints in your design process:

  1. Mental workload: The fidelity of the automation needs to match or exceed the fidelity of the human cognitive process that it’s offloading. People pay attention to all sorts of little details while performing a task; and because we think of them as “little” they often go unspoken. Think about the difference between the way a driver uses eye contact with other drivers at a 4-way stop (high fidelity) compared to the way they maintain appropriate speed on the highway (low fidelity).
  2. Situation awareness: The less actively someone is engaged in an activity, the more effort will be required for them to re-establish context when needed (i.e., “take the wheel back”). When people stop performing a task, they also stop paying attention to the contextual cues that support that task, which unfortunately was at the root of Uber’s self-driving car accident that killed a pedestrian in 2018. While the “driver” was charged with a crime, the failure lay in the design of a system that presumed a person could maintain perfect vigilance while not directly operating the vehicle.
  3. Complacency: The more reliable the automation becomes, the less someone will think about their own role or responsibility and the more they’ll presume the system has things under control. If something is perceived as being accurate, why bother spending energy on it yourself?
  4. Skill degradation: Simply put: use it or lose it.

No matter how you slice it, AI will always depend on people to some degree; the question is just when and how much. And if people need to be involved at all in the operation of an automated system, then their involvement needs to be considered throughout the operation. Which leads us into the first conversation.

The first conversation: Capability

Your team needs to clearly articulate what is uniquely human and what is uniquely AI about your product’s capability. AI-powered systems are at their most useful when designed to augment people’s awareness or support their ability to spend time on activities that are more intrinsically human; and therefore more intrinsically satisfying. Below is a simplified comparison to help avoid getting your wires crossed.

  • It’s uniquely human to thrive on novelty and creativity but get easily distracted; while it’s uniquely AI to thrive on repetition but never lose focus.
  • It’s uniquely human to perceive a variety of stimuli all at once; while it’s uniquely AI to be insensitive to mood or distractions.
  • It’s uniquely human to ask questions out of pure curiosity; while it’s uniquely AI to respond instantly and multi-task with precision.
  • It’s uniquely human to adapt behavior rapidly but forget details of past events; while it’s uniquely AI to remember everything it was taught but struggle to unlearn past patterns.
Table: What is uniquely human vs. uniquely AI (as described in the accompanying text)

Hopefully the comparisons above help jumpstart creative thinking, but they may still feel a little bit vague. So I recommend an exercise that is more relatable but might seem a little counter-intuitive at first glance.

Imagine as a team: What if your product experiences were collaborations between people?

Role-playing interpersonal collaboration is a great way to generate insights about the benefits and capabilities of probabilistic systems because other people are the probabilistic systems we all have the most experience with! (More on this in the last conversation)

Discuss how your users would achieve their goals (i.e. get their jobs done) if they were to partner with one, two, or even dozens of other people who were “experts” in the jobs that your app is trying to help perform. For example, if you were designing a system to help monitor how effectively people were socially distancing in a store, first imagine the roles that real live humans would play if they were to do it. Where would they position themselves? What would they be looking for? How would they deal with ambiguous viewing angles? What would they do if they thought people were standing too close to each other? What would they communicate to a supervisor if they needed to escalate? (Related: The role of human experts in the design of autonomous photography)

Consider how much interpersonal sensitivity goes into all the above, especially confronting patrons, and how fuzzy the actual “detection” part is. Instead of modeling human judgment as something to be replicated or outperformed, you could explore how store workers would benefit most from the unique capabilities of AI: spending more time on meaningful activities, avoiding ignorable distractions, being more present in the places that matter, or having more precise recall.

For example, perhaps by reflecting on past aggregate trends, workers could team up in parts of the store that are likely to be more populated, to support one another. Or a large monitor could be set up with augmented overlays projected on top of “personal bubbles” when people get too close to one another (faces and other personal features should be obscured). In this way, some of the brunt of confrontation could be lifted from store workers, and shoppers might self-regulate their social distance more actively.

AI systems trained with sufficiently representative and diverse examples of useful outcomes can pore over mountains of data, applying the fuzzy logic they’ve learned without ever getting tired, distracted, or forgetful. But they will also struggle with unforeseen and unprecedented conditions, which brings us to the second conversation.

The second conversation: Accuracy

Your team needs to agree on how you’ll measure both the accuracy and inaccuracy of the AI in your product’s experience.

As discussed in the prior section, while AI excels at finding-and-repeating, it’s a uniquely human trait to feel unsatisfied with the status quo and seek out change. As a result of this quirky tension between prior habits and future aspirations, it’s been my experience that the majority of unintended outcomes in AI-powered products are the result of one (or both) of the following:

  1. The team hasn’t gotten specific enough about which parts of the past they want the AI to learn from, and which parts they don’t want it to learn from. (Remember, we only call something “data” when it’s something we’ve chosen to pay attention to)
  2. The team hasn’t gotten specific enough about the characteristics of a more ideal future.
Diagram: An image search for “physicist” returns only men in the results because it’s not optimized for representation.
The optimization goals of an AI can play a dramatic role in users’ perceptions of accuracy. In the above example, a discussion about the fairness of the results (i.e., there are no women or people of color) should actually be a discussion about how the model was trained to “fill in the blanks” in cases where users only enter a single search term.

Conversations about past and future states can get a little existential, so I suggest approaching the issue from a grounded perspective. Discuss as a team: How will you create measures for when your AI is working-as-intended, and measures for when it’s not working-as-intended?

To start, I recommend that each person on the team puts together two lists:

  1. Which operating conditions *should* play a role in the AI’s *accuracy*? For example, if your goal is to transcribe speech, aspects like ambient room noise, the type of microphone used, or how loudly people are speaking should have an impact on accuracy.
  2. Which operating conditions *shouldn’t* play a role in the AI’s *inaccuracy*? For example, continuing with the case of speech transcription from above, aspects like the gender identity or accent of the speaker ideally shouldn’t cause the system to be less accurate.

For both of the above lists, the idea is to brainstorm things that are likely to have an impact on the AI’s performance, but to distinguish between (list #1) contributing factors that can help set the user up for success and (list #2) confounding factors that you’ll need to mitigate against through Design and Engineering. While real life doesn’t happen in controlled conditions, it should be possible to spot irrelevant details.

When putting together your lists, try to use plainspoken language (avoid technical jargon) wherever possible so that a diversity of disciplines can be invited to take part. Ultimately, the goal of this exercise is to give your team the time and space to consider the people, environments, and devices that your AI is likely to encounter in the wild.

With your lists in-hand, the next step is to sort them into two categories:

  1. Perceiver-independent: Characteristics that can be assessed in a standardized way, regardless of the person doing the measuring. For example, measuring the number of words spoken per minute or the frequency range of someone’s voice.
  2. Perceiver-dependent: Characteristics that are trickier to assess because they’re subjective to the person doing the measuring and affected by context, culture, perspective, experience, or a wide variety of other cognitive quirks. For example, categorizing sentiment (e.g., lighthearted, serious, sarcastic) or instances of interruption.

Finally, it’s time to synthesize the results. Share your lists and categorizations, cluster the similarities, and discuss the differences.

Diagram: Perceiver-dependent and perceiver-independent characteristics (as described in the accompanying text)
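If it’s useful, the output of this exercise can be captured as structured data that anyone on the team can revisit and query later. The sketch below reuses the speech-transcription example; the field names and the particular categorizations are mine and purely illustrative, not a canonical checklist.

```python
from dataclasses import dataclass

@dataclass
class OperatingCondition:
    name: str
    should_affect_accuracy: bool  # list #1 (contributing factor) vs. list #2 (confounding factor)
    perceiver_dependent: bool     # subjective to whoever is doing the measuring?

# Speech-transcription example from the text (categorizations are illustrative).
conditions = [
    OperatingCondition("ambient room noise",        should_affect_accuracy=True,  perceiver_dependent=False),
    OperatingCondition("microphone type",           should_affect_accuracy=True,  perceiver_dependent=False),
    OperatingCondition("words spoken per minute",   should_affect_accuracy=True,  perceiver_dependent=False),
    OperatingCondition("speaker's accent",          should_affect_accuracy=False, perceiver_dependent=True),
    OperatingCondition("perceived sentiment",       should_affect_accuracy=False, perceiver_dependent=True),
    OperatingCondition("instances of interruption", should_affect_accuracy=False, perceiver_dependent=True),
]

# The more conditions land in this bucket, the stronger the case for keeping
# a human in the loop to guide and grow with the AI.
needs_human_judgment = [c.name for c in conditions if c.perceiver_dependent]
```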

The more operating conditions your team comes up with that are perceiver-dependent — about measures of both accuracy and inaccuracy — the more your design will need to consider a human-in-the-loop to guide and grow with the AI. Perceiver-dependent characteristics are just that: dependent on people and culture, and therefore always in a state of evolution. Which brings us to the last conversation.

The third conversation: Learnability

Your team needs to approach the collaboration between people and AI as one built on shared intuition.

As I mentioned earlier, we’ve got plenty of experience working with probabilistic systems already. They’re called people. Yet the lessons learned in fields like psychology and sociology about Human:Human collaboration are rarely applied to Human:AI collaboration. Even the words we choose to use are noteworthy. For instance, I mostly hear people in the tech industry talking about Human:AI interaction. But if we only focus on the interactions, we miss crucial facets of teamwork like cooperation and common ground.

When two people meet for the first time, they don’t implicitly trust each other. Sure, there are a few things they can hopefully take for granted (like respecting personal space or speaking in complete sentences), but there’s simply no instruction manual in the world that can replace the dance of trying to figure someone out.

My colleague Dr. Arathi Sethumadhavan taught me that the process of building trust between people happens in two stages:

  1. Predictability: The more I can trust my predictions about how the other person will behave, the better I can calibrate how I might depend on them.
  2. Dependability: As I grow to trust in the quality of my predictions about how the other person will behave, I can start to depend on them in experimental ways. And the more confidently I can depend on the other person, the better I can calibrate my trust in them.

Note that there’s no mention of effectiveness in the above model. Trust doesn’t mean we need or expect the other person to be perfect at everything. Far from it. Often the most trustworthy people are the ones whose strengths and deficits are known to us.

Building appropriate trust

The “predictability → dependability” model of Human:Human trust construction can be applied quite directly to the design of Human:AI collaboration. But there’s one enormously important curveball to contend with first: Automation bias.

Automation bias is an unconscious preference for the outputs of automated systems over human judgment. We tend to expect mechanical things to be inherently more objective and true, even in the face of obvious errors or contradictory evidence (think back to the “machine-like efficiency” comment from the introduction). Consider a time when, say, you’ve followed the directions from your GPS despite consciously thinking “this can’t be the right way”.

Because of automation bias, users will be prone to ignore their own skepticism and leap into over-reliance.

Diagram: Predictability and dependability build trust over time, but automation bias causes people to over-rely too quickly.
The foundations of trust in Human:Human collaborations are largely transferable to Human:AI collaborations. The critical difference is that when people interact with AI there is an “automation bias” that causes them to initially over-rely on the system.

Without the opportunity to build their own personal mental model about an AI’s idiosyncrasies, users won’t be able to reconcile its performance against what they predicted it was or wasn’t likely to do. And further, they won’t be able to judge whether they should or shouldn’t have depended on it for that task in the first place; falling into one of the following failure modes:

  • Misuse: People using the AI in unreliable ways because they didn’t understand or appreciate its intended uses/limitations; or
  • Disuse: People not using the AI because its performance didn’t match their expectations about its intended uses/limitations.

We can short-circuit both the misuse and disuse of AI by flipping the script on the conventional wisdom of “seamlessness”. Instead of doing everything in our power to “save” users from spending mental effort, we should be embracing the seams. Instead of striving for more trust in AI, we should be striving for appropriate trust in AI. Because like any teacher:apprentice relationship, learning requires a mutual acknowledgment of fallibility.

Diagram: AI performance is only reliable when attributes “match” between the training conditions and operating conditions.
Machine learning systems are not reliable or unreliable in isolation. Learned systems don’t produce “errors” so much as they surface mismatches between (1) training examples, conditions, and goals and (2) operating behaviors, conditions, and goals. The diagram above illustrates the types of overlap necessary for reliability.

The exponential leap that AI offers isn’t that our tools will work perfectly for us right out of the box, it’s that we can make our tools work better for us by using them. But we’ll only be able to make that leap if we prioritize the efficacy of people over the performance of AI.

New affordances and new signifiers

Like so many Designers, I was introduced to the concept of “affordances” by Don Norman’s book The Design of Everyday Things (though they were originally introduced by the psychologist James Gibson in 1966). Affordances are the potential actions inherent to a thing. For example, a chair affords sitting, a door affords opening, a button affords pressing, and a book affords page-turning. But without careful consideration, these potential actions can be misunderstood or even imperceptible. Try starting a web search with the words “how to restart” and the autosuggestions will paint a telling picture for you about missed affordances in modern gadgetry.

To unlock affordances, we need “signifiers” that communicate where potential actions should take place. A chair’s swooped seat and angled back are signifiers for the place where we should sit. A door’s handle and hinges are signifiers for the way we should open it. A button’s size and relative elevation are signifiers for how we should press it. A book’s cover and binding are signifiers for where we flip to the next page.

So what are AI’s affordances and signifiers?

AI’s foundational affordance is that it can form intuition. But as a community of practitioners, we’re still in the wilderness when it comes to designing signifiers that effectively communicate those intuitions (spoiler: thumbs-up and thumbs-down simply don’t cut it on their own).

To make matters even more complex, the machine learning models underlying AI are rarely trained with the necessary flexibility to be able to adapt to real-world scenarios (per the second conversation above).

The interplay between horse and rider provides a helpful analogy for the type of learnability that we’re after, where the rider expects to be able to take certain things for granted about the horse’s intuition (e.g., avoiding obstacles in its path), but they also expect to play a critical role in its education (e.g., learning which path is the desired path).

Bringing together everything we’ve discussed in this section (building predictability → dependability, contending with automation bias, being upfront about fallibility, and the many lessons learned from Human:Human collaboration), discuss as a team: What signifiers will help your users construct their own intuition about the AI’s intuition?

Signifier #1: Reference points

Surface examples of what the AI was “looking at” — or an example from its training data that’s similar to what the AI was “looking at” — when it made a prediction.

Reference points are useful any time you’re presenting inferences to the user. They’re like a collaborator saying, “here’s what I observed that made me think that”, or “here’s what that reminded me of”.

A theoretical meeting app presents reference points and keywords behind topic clusters in a transcript.

Signifier #2: Optionality

Offer multiple predictions, even if the AI is significantly more confident about one of them.

Optionality is useful whenever the user is trying to make a decision. Being able to weigh a handful of options before acting helps people build confidence. It’s like a collaborator saying, “I think this is the best path, but it’s not the only one I considered”.

A theoretical image search app presents different ways that a term like “lotus” might be interpreted.
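Under the hood, optionality can be as simple as surfacing the top few ranked interpretations with their confidences instead of silently committing to the single best one. A minimal sketch, with made-up labels and probabilities for the “lotus” example:

```python
import numpy as np

def top_k_interpretations(probabilities: np.ndarray, labels: list[str], k: int = 3):
    """Return the k most likely labels with their confidences, even when one dominates."""
    ranked = np.argsort(probabilities)[::-1][:k]
    return [(labels[i], float(probabilities[i])) for i in ranked]

# e.g., an image search for "lotus"
labels = ["lotus flower", "Lotus (car)", "lotus position (yoga)"]
probabilities = np.array([0.86, 0.09, 0.05])
print(top_k_interpretations(probabilities, labels))
# [('lotus flower', 0.86), ('Lotus (car)', 0.09), ('lotus position (yoga)', 0.05)]
```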

Signifier #3: Nearest neighbors

Surface predictions based on inputs that are close-to-but-not-quite what the AI was “looking at”.

Nearest neighbors are useful whenever you’re presenting recommendations or sorted results. They’re like a collaborator saying, “aspects of this remind me of other things I’ve observed before”.

A theoretical music app presents tracks that share similarities with a liked song.
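As a rough sketch of the mechanics, assuming each item lives in a shared embedding space (the toy vectors and track names below are invented):

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, catalog: dict[str, np.ndarray], k: int = 3):
    """Rank catalog items by cosine similarity to the query embedding."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(name, cosine(query, vector)) for name, vector in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

liked_song = np.array([0.9, 0.1, 0.4])        # embedding of the song the user liked
catalog = {
    "Track A": np.array([0.8, 0.2, 0.5]),
    "Track B": np.array([0.1, 0.9, 0.2]),
    "Track C": np.array([0.7, 0.0, 0.6]),
}
print(nearest_neighbors(liked_song, catalog))  # "aspects of this remind me of..."
```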

Signifier #4: Card sorting

Arrange items in the order the AI found most “interesting” and was most “confident” about, while also following three requirements: (1) Don’t explicitly label interestingness or confidence per item; allow that to be conveyed implicitly. (2) Render a consistent number of items in the list, leaving room for the discovery of false negatives. (3) Support a complementary mode that sorts the items using a simple rule, such as chronological or alphabetical.

Card sorting is particularly useful for avoiding the frustration and confusion of false negatives (missing results or options). It’s like a collaborator saying, “I’ve prioritized this list for you, but feel free to scan through it in case I misjudged something.”

A theoretical investing app presents a sorted view of watched stocks without hiding low-scorers.
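A small sketch of those three requirements in practice, using the watched-stocks example; the data model and the interest score are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class WatchedStock:
    ticker: str
    added_on: str          # ISO date the user added it to their watchlist
    interest_score: float  # the model's score; conveyed only implicitly, via ordering

def sort_cards(stocks: list[WatchedStock], mode: str = "ai") -> list[WatchedStock]:
    """Sort without ever hiding items, so false negatives stay discoverable."""
    if mode == "ai":
        return sorted(stocks, key=lambda s: s.interest_score, reverse=True)
    # complementary mode: a simple, predictable rule the user can always fall back to
    return sorted(stocks, key=lambda s: s.added_on)

watchlist = [
    WatchedStock("AAAA", "2021-03-02", 0.91),
    WatchedStock("BBBB", "2021-01-15", 0.12),
    WatchedStock("CCCC", "2021-02-20", 0.57),
]
assert len(sort_cards(watchlist, mode="ai")) == len(watchlist)  # nothing gets filtered out
```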

Signifier #5: Semantic ladder

Support a mode that analyzes inputs at more rudimentary semantic* levels so the user can understand whether they’re “looking at” the same thing as the AI.

A semantic ladder is useful any time there’s likely to be a mismatch between the user and an AI. It turns an “error” into an opportunity to get on the same page, like a collaborator saying, “let’s break this task down into simpler chunks until we find a starting point that we’re both confident about”.

* Semantics is the study of how things get their meaning. Using computer vision as an example, climbing down the “semantic ladder” might go something like “object → thingness → shapes → edges → contrast → light”. It’s a chain of concepts that helps us assign meaning to something. In this case, light is needed to recognize contrast, and contrast is needed to recognize edges, and so forth. This video of the SeeingAI app offers a fantastic demonstration of the semantic ladder signifier, where a user who is Blind works with the AI to calibrate their understanding about whether their kitchen lights are on.

A theoretical voice typing app presents low confidence transcriptions using “sound-it-out” phonetic spelling. In this example, the highlighted term is intended to be “CSAT”, shorthand for “Customer Satisfaction” that’s often used in tech product development.
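A sketch of how a semantic ladder might be wired up, assuming each rung exposes its own analysis and confidence; the rung names echo the footnote above, and the toy analyses are placeholders for real models:

```python
from typing import Callable, Tuple

Analysis = Callable[[object], Tuple[str, float]]  # returns (description, confidence)

def climb_down(scene: object, rungs: list[tuple[str, Analysis]],
               min_confidence: float = 0.8) -> str:
    """Step down from richer to more rudimentary analyses and report the
    first rung where confidence clears the bar."""
    for rung_name, analyze in rungs:
        description, confidence = analyze(scene)
        if confidence >= min_confidence:
            return f"{rung_name}: {description} ({confidence:.0%} confident)"
    return "I couldn't find a confident starting point; try changing the conditions."

# Toy demo: object recognition is unsure, but a cruder "light" analysis is not.
rungs = [
    ("object",   lambda scene: ("a countertop?", 0.41)),
    ("shapes",   lambda scene: ("rectangular regions", 0.62)),
    ("contrast", lambda scene: ("a strong bright/dark split", 0.77)),
    ("light",    lambda scene: ("the lights appear to be on", 0.95)),
]
print(climb_down(scene=None, rungs=rungs))
```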

Signifier #6: Simulation

Provide a way to play with how the AI’s predictions might behave under a variety of relevant conditions.

Simulation is especially useful for helping users weigh tradeoffs. It’s like a collaborator saying “let’s roleplay through a few likely scenarios to help us prep”.

A theoretical driving navigation app presents simulated routes based on configurable optimization goals.

Signifier #7: Calibration

Support a mode where the AI can passively compile observations to identify relevant baselines and spectrums.

Calibration is useful for personalization or if the user is offloading a task that they’d otherwise perform themselves. It’s like a collaborator saying, “let me shadow you for a little while as I’m getting up to speed”.

A theoretical writing app presents feedback options that can be tuned to an individual and their goals.

Signifier #8: Two minds

Communicate the AI’s confidence about both “sides” of a prediction: confidence that something is x and confidence that something isn’t x.

Two minds are useful whenever “System 2” thinking is needed, such as high-stakes or safety-critical situations. It’s like a collaborator saying, “This may be what we’re looking for, but we should also check whether it’s something we’re trying to avoid”.

A theoretical field guide app presents results about both safe and unsafe plants.
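A minimal sketch of reporting both “sides”, using the field-guide scenario; the two scores, the thresholds, and the wording are all placeholders rather than recommendations:

```python
def two_minds_verdict(p_is_target: float, p_is_lookalike_hazard: float,
                      confirm_at: float = 0.9, warn_at: float = 0.05) -> str:
    """Only express confidence when both 'minds' agree; otherwise defer to the user."""
    if p_is_target >= confirm_at and p_is_lookalike_hazard <= warn_at:
        return "Likely a match, and unlikely to be something you're trying to avoid."
    if p_is_lookalike_hazard > warn_at:
        return "Caution: this also resembles something you're trying to avoid."
    return "Not confident either way; please gather more evidence."

# e.g., a foraging app checking an edible plant against its toxic look-alikes
print(two_minds_verdict(p_is_target=0.88, p_is_lookalike_hazard=0.21))
```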

Signifier #9: Sampling

Provide a way to orient the AI’s attention by remixing aspects of what it’s observed.

Sampling is useful for exploration, especially when trying to break free from “echo chambers”. It’s like a collaborator saying, “Tell me which aspects of this interest you and which don’t so we can play with some other options.”

A theoretical video app presents options to watch based on individual facets of a source video.

Onward, with purpose

The wild potential of a machine-that-learns is spellbinding. Personally speaking, the promise of AI lights up my imagination with dreams about universal accessibility through digital synesthesia (the process of translating intuitions between different mediums, e.g., “seeing” what something sounds like or “hearing” what something looks like) and universal education by integrating curricula into the lived experiences of students (e.g., language immersion through augmented reality translation, or learning about music theory by sampling bird songs and street sounds from the neighborhood), greatly inspired by the Young Lady’s Illustrated Primer, of course.

It’s because of these dreams that I care so much about being purposeful in the design of Human:AI collaboration. I worry that in the fervor of helping machines reach their potential, we’ll accidentally neglect the fullness of our own needs, especially the quieter ones; like our needs for rest, affection, and freedom.

But I’m optimistic.

I hope that over the course of this essay, the considerations and techniques I’ve introduced have offered a useful balance of philosophy and pragmatism. Sure, there’s a healthy dose of gravity that comes with this territory (“with great power…”), but I also hope you’re feeling like it’s doable.

Above all, I hope that I’ve landed the following: that there is no perfect model for doing the “right thing”. The best we can do is to be intentional; to nurture the tensions that will naturally arise when we allow ourselves and our colleagues to remain curious long enough to learn what it takes to be accountable for the things we’ll bring into this world.
