The troubled state of screen readers in multilingual situations

What happens when blind and visually-impaired internet users speak multiple languages? I find out, and also give you some tips!

Xurxe Toivo García
UX Collective

--

A purple open laptop showing the infinity symbol on the screen. Digital isometric illustration by Xurxe Toivo García.

If you already know how screen readers work, and/or don’t have time to read about the experiments, feel free to go straight to the Conclusions section!

I’m a Junior Developer and Accessibility Specialist at Wunder. I got here over the course of one year or two decades, depending on how you look at it. But before I became a coder, I was a scientist.

I have two bachelor’s degrees (in Agricultural Sciences and Plant Science) from Cornell, an Ivy League university in the US. There are many ways in which my scientific training benefits my work as a developer:

  • Ecology helps me understand large, complex systems, both agricultural and technological.
  • Physiology enables me to dissect both natural and human-made processes.
  • Chemistry lets me see how pieces fit together, whether they are atoms in a molecule or components in an interface.
  • And, of course, my many years of research experience allow me to design experiments, collect data, and crunch numbers, not only when it comes to studying plants, but also this article’s topic: screen readers.

I performed these experiments and wrote this article because, as far as I know, this type of knowledge wasn’t recorded anywhere. Firstly, it’s already hard enough to find tutorials and instructions about how to set up screen readers so they even work in multilingual situations. And then there’s the matter that’s more relevant to my work: I couldn’t find a simple source that discussed multilanguage bugs in screen readers, which is ridiculous considering that every single screen-reading software has quite a few of these (not to mention that screen reader users are as capable of speaking multiple languages as anyone else).

Anyways: on with the study!

P.S. A lot of links in this article go to our Wunderpedia, which Wunder launched this past May on Global Accessibility Awareness Day. I also recently wrote an article about how Accessibility on Wunderpedia came to be.

Screen reader basics

Screen readers are one kind of digital assistive technology that people with disabilities can use to interact with devices and services. The archetypal screen reader user is a fully blind person, although other users (such as people with low vision, partial vision, dyslexia, etc) can also benefit from them.

In order to serve all these profiles, screen readers do a lot more than just read the text that’s on the screen; they also provide information about the kinds of text on the screen, as well as about non-text elements… if the code behind them is properly implemented, of course. I’m a web developer, so from this point onward I’ll be talking about screen readers in the context of websites and HTML code.

When we see a few words in a larger font at the top of the page, we understand it’s a heading. When we see bullet points, we understand it’s a list. When we see underlined text, we understand it’s a link. In order for a blind person to access the meaning encoded in both the text itself and its visual attributes, the screen reader needs to announce (or “tell about”) both aspects. In order for a screen reader to access both, they need to be communicated properly by the code.

A screen reader announces the content as it encounters it when moving the reading cursor through the page. The reading cursor is just where the screen reader “is currently at”, and many screen readers add an outline around the element as a visual indicator of said cursor.

The reading cursor can move automatically during continuous reading mode, or you can control where it goes with keyboard and/or touch shortcuts, depending on what the device supports. Additionally, people who cannot operate a regular keyboard or touch device may use supplementary assistive technologies, such as voice recognition and switch devices.

Take, for example, the following HTML snippet:

<main>
  <h1>Fruits</h1>
  <ul>
    <li>Apple</li>
    <li>Orange</li>
  </ul>
</main>

A screen reader doesn’t simply say: “Fruits. Apple. Orange”. Instead, it could announce it like this (depending on the software and settings):

“Main. Heading level 1: Fruits. List, 2 items. List item: Apple. List item: Orange. End of list. End of main.”

See how much more meaning and context the second announcement conveys?

Text alternatives

Image content includes photos, illustrations, drawings, icons, graphs, diagrams, and so on. It can play one of two roles:

  • Informative: when it conveys information, and that information is not provided in the surrounding text.
  • Decorative: when it doesn’t convey information; or when it does, but the information is already provided in the surrounding text.

Informative image content must have a suitable text alternative, which can then be announced by screen readers. Text alternatives can be provided in several ways:

  • The alt attribute (for elements that support it, namely <img>, <area>, and <input type="image">).
  • ARIA attributes, specifically aria-label and aria-labelledby (for elements that don’t support the alt attribute). ARIA stands for Accessible Rich Internet Applications and, when properly used, extends the accessibility of basic HTML code.
  • Visually-hidden text (again, for elements that don’t support the alt attribute). This can be achieved with CSS styling in multiple ways.
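To make the first two approaches concrete, here is a minimal sketch (the file name, icon, and label text are invented for illustration, not taken from the experiments):

```html
<!-- <img> supports the alt attribute directly -->
<img src="apple.png" alt="A red apple on a wooden table">

<!-- An icon-only button doesn't support alt, so aria-label provides its accessible name -->
<button aria-label="Search">
  <svg aria-hidden="true" focusable="false" width="16" height="16">
    <!-- magnifying-glass icon paths would go here -->
  </svg>
</button>
```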

Decorative image content must not have a text alternative:

  • If the element supports the alt attribute, it must be left empty (as in alt=""), instead of being omitted altogether. An empty alt attribute tells the screen reader that you didn’t just forget to add a text alternative, but that you’re consciously marking this image content as decorative. On the other hand, if you omit the attribute altogether, screen readers will read the name of the file (which, nowadays, is often an annoyingly long, random string of letters and numbers).
  • For other elements, simply don’t use aria-label, aria-labelledby, or visually-hidden text. Depending on the situation, you may need to add aria-hidden="true" in order to make the screen reader ignore the element altogether (more on this later).
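For example (again a hedged sketch, with made-up file names):

```html
<!-- Decorative image: the empty alt tells screen readers to skip it -->
<img src="divider.png" alt="">

<!-- Decorative inline SVG: no alt support, so aria-hidden removes it
     from the accessibility tree entirely -->
<svg aria-hidden="true" focusable="false" width="100" height="4">
  <!-- ornamental flourish -->
</svg>
```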

Voices and languages

The screen reader voice is the voice synthesizer used to announce the text content at any given time. You can think of it as the screen reader’s “accent”.

The language is the language of the text itself. This text can come from your operating system, from the screen reader UI, or from the web content.

In order to help screen readers figure out which voice they need to use for a given webpage, you must declare the document language; for example, the following code declares that the page is in English:

<html lang="en">
Hello
</html>

If any part of the page is in a different language, you have to declare that too; for example, let’s add a bit of Finnish text:

<html lang="en">
Hello,
<span lang="fi">moi</span>
</html>

When the voice and language match, everything is great. But when they don’t, you can end up with an English voice completely butchering Finnish text, or vice versa.

Based on how carefully and thoroughly we declared the languages above, screen readers should have no language issues, right?

Unfortunately, at the current time, the interactions between all major screen readers and browsers are full of language bugs.

When I started my job at Wunder, I became a much more frequent screen reader user. I would be testing client websites and notice constant mismatches between voices and languages. The first few times I thought the developers must not have declared the language correctly, but when I inspected the code, there it was: a perfectly valid and correct lang attribute. Then I began to notice minor patterns, until I finally said: enough is enough! I must roll up my sleeves, put my science glasses on, and figure out what’s going on here.

Common testing environment

I performed three experiments. In all cases, I included:

  • Major devices: desktop and mobile.
  • Major operating systems: macOS, iOS, Windows, and Android.
  • Major browsers: Safari, Chrome, Firefox, Opera, Opera Touch, and Edge.
  • Major screen readers: VoiceOver, NVDA, JAWS, and TalkBack.
  • Major reading modes: continuous reading, keyboard shortcuts, and touch gestures.

I used the latest releases of all the software, and made sure versions didn’t change between tests. You can check the versions and other technical details in the testing environment in experiment 1 or the testing environment in experiment 2 (they’re exactly the same).

Obviously, not all combinations work (for example, Safari is only available on Apple devices). Here is the actual list of hardware and software combinations:

  • VoiceOver (desktop): MacBook Pro + macOS + Safari, Chrome, Firefox, Opera, Edge.
  • VoiceOver (mobile): iPhone + iOS + Safari, Chrome, Firefox, Opera Touch, Edge.
  • NVDA: Asus laptop + Windows + Chrome, Firefox, Opera, Edge.
  • JAWS: Asus laptop + Windows + Chrome, Firefox, Opera, Edge.
  • TalkBack: Samsung phone + Android + Chrome, Firefox, Opera Touch, Edge.

In both experiments, the operating system, screen reader UI, and browser UI were in Finnish, while the website was not.

Experiment 1: indicator elements

I made a simple website with the following elements:

  • A: the title of the page (<title>).
  • B: a top-level heading (<h1>).
  • C: a paragraph (<p>).
  • D: an ordered list (<ol>) with two items (<li>).
  • E: an informative image (<img>) with an alt attribute.
  • F: an informative image (<img>) with an alt attribute and an explicit lang attribute.
  • G: a decorative image (<img>) with an empty alt attribute.
  • H: a button (<button>) with visible text.
  • I: a portion of text (<span>) in a different language, with an explicit lang attribute.
  • J: a link (<a>).

The entire website is in English, with the exception of that one portion of text, which is in Spanish.
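The article doesn’t reproduce the test page’s markup, but a page containing elements A to J might look roughly like this (all of the content below is invented for illustration; it is not the author’s actual test page):

```html
<html lang="en">                                          <!-- document language -->
  <head>
    <title>Fruit salad recipe</title>                     <!-- A -->
  </head>
  <body>
    <main>
      <h1>Fruit salad</h1>                                <!-- B -->
      <p>A quick recipe for a summer day.</p>             <!-- C -->
      <ol>                                                <!-- D -->
        <li>Chop the fruit</li>
        <li>Mix everything together</li>
      </ol>
      <img src="apple.png" alt="A red apple">             <!-- E -->
      <img src="pera.png" lang="es" alt="Una pera verde"> <!-- F -->
      <img src="border.png" alt="">                       <!-- G -->
      <button>Start over</button>                         <!-- H -->
      <p>My favorite fruit is the
        <span lang="es">manzana</span>.</p>               <!-- I -->
      <a href="https://example.com/recipes">More</a>      <!-- J -->
    </main>
  </body>
</html>
```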

For each test case, I wrote down what the screen reader announced, and in what voice. In addition to the 10 elements above, I also took note of the same information for the element descriptions. All the raw data can be found in the experiment 1 data file on the GitHub repository.

After that, I assigned scores from 0 (complete and utter failure) to 3 (perfect announcement). What constituted “perfect announcement” depended on the test case:

  • For element descriptions: the language and voice match (either both in Finnish or both in English).
  • For the decorative image with empty alt (G): the element is skipped completely.
  • For the portion in Spanish (I): the text is announced in a Spanish voice.
  • For all others: the text content is announced in an English voice.

You can find the experiment 1 scoring criteria on the GitHub repository, and all the scores in the Google Sheet, under the experiment 1 Scores tab. I also did some handy-dandy data analysis under the experiment 1 Analysis tab. There is a lot to unpack there, so let me share the most important findings now.

Key results of experiment 1

All combinations had major issues that can severely impact users, but the issues varied wildly.

The element with the most language trouble (average score: 1.23 out of 3) was the page title (A) [Figure 1]. This is a relatively minor problem, because it’s often only heard once, and it should anyway be identical or very similar to the top-level heading of the page.

Figure 1
Figure 1: Average scores of screen reader multilanguage performance in indicator HTML code (by test case, all).

The second most significant problem occurred in informative images with alt text (E and F). Whether the image had an explicit lang attribute made no difference [Figure 1]. They were announced perfectly (score 3) in just 25% of the test cases, and poorly (score 0 or 1) 68% of the time [Figure 2].

In contrast, decorative images with an empty alt (G) had the fewest issues (average score: 2.68 out of 3) [Figure 1]. This is not surprising, given that the correct screen reader behavior in this test case is to skip the element (arguably the easiest task ever). Still, some test cases did not achieve this.

Figure 2
Figure 2: Scoring of screen reader multilanguage performance in indicator HTML code (by test case, all).

No screen reader did well across the board. For example, TalkBack did pretty well in general, but quite poorly in combination with Firefox [Figure 3].

Likewise, no browser did well across the board. For example, Chrome did quite well in two cases (JAWS and TalkBack), decently in two cases (VoiceOver on desktop, and NVDA), and poorly in one case (VoiceOver on mobile) [Figure 3].

It’s often said that VoiceOver works the best with Safari, but based on my data, that’s not the case. VoiceOver on desktop did equally well with Safari, Chrome, and Edge. VoiceOver on mobile did equally poorly with all browsers [Figure 3] (more on this in a couple of paragraphs).

Figure 3
Figure 3: Average scores of screen reader multilanguage performance in indicator HTML code (by screen reader and browser combination, all)

Both VoiceOver (desktop) and NVDA had inconsistent behavior that changed according to the reading mode. For example, on VoiceOver for desktop, paragraph text would be read with a Finnish voice (wrong) during continuous reading, and with an English voice (right) when using the keyboard to move the reading cursor.

VoiceOver for mobile is a bit special. In this case, you can perform a simple swipe gesture to change the screen reader language; in contrast, with other screen readers, you have to dig out some settings menu, and often restart your computer or relaunch the program. Since VoiceOver (mobile) was in Finnish, it described the elements in Finnish language and voice, and it read the text content in a Finnish voice, ignoring any language declarations.

Therefore, in the testing environment I designed, VoiceOver for mobile did very poorly. In a real life scenario, I could have just swiped to change the language to English the moment I realized that the content was in English. This is not a big deal if there’s only one language on the page, but as soon as you have multiple smaller bits in different languages, it quickly becomes annoying and cumbersome.

Experiment 2: accessible labelling

I made a simple close button to use as a case study:

<button>
  <span aria-hidden="true">X</span>
</button>

I used aria-hidden="true" to hide the X from screen readers; this is because in this case it doesn’t represent a letter, but the close button convention we should all be familiar with. Then, I tried different ways to provide the word “close” as a text alternative (lettering continues from previous experiment):

  • K: Using aria-label.
  • L: Using aria-label and an explicit lang attribute.
  • M: Using aria-labelledby and referring to a visually-hidden label outside the labelled element.
  • N: Using aria-labelledby and referring to a visually-hidden and aria-hidden label outside the labelled element.
  • O: Using aria-labelledby and referring to a visually-hidden label inside the labelled element.
  • P: Using aria-labelledby and referring to a visually-hidden and aria-hidden label inside the labelled element.
  • Q: Using a visually-hidden label inside the labelled element.
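As a sketch of what three of these variants look like in markup (the class name visually-hidden and the id close-label are assumptions for illustration, not necessarily the author’s exact code):

```html
<!-- K: aria-label on the button itself -->
<button aria-label="Close">
  <span aria-hidden="true">X</span>
</button>

<!-- M: aria-labelledby pointing to a visually-hidden element outside the button -->
<span id="close-label" class="visually-hidden">Close</span>
<button aria-labelledby="close-label">
  <span aria-hidden="true">X</span>
</button>

<!-- Q: visually-hidden text inside the button, no ARIA labelling at all -->
<button>
  <span aria-hidden="true">X</span>
  <span class="visually-hidden">Close</span>
</button>
```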

The page was in English, and I recorded what the screen reader announced, and in what voice. You can find the raw data in the experiment 2 data file on the GitHub repository.

After that, I assigned scores between 0 and 3 to each test case (you can find the full experiment 2 scoring criteria on the GitHub repository):

  • 3: Perfect announcement, with voices and languages matched correctly.
  • 2: The voices and languages are matched correctly, but there’s repetition (for example, the accessible label being announced multiple times).
  • 1: The accessible label is announced in the right voice, but the name of the element is announced in the wrong voice. Or, the accessible label is announced multiple times and not all of them correctly.
  • 0: the accessible label is announced in the wrong voice, or there’s some other big issue.

The reason why announcing the name of the element in the wrong voice doesn’t automatically deduct all points is the following: if you’re a frequent screen reader user, you can get used to phrases like “button” and “heading level 1” being mispronounced just from hearing them so much. An accessible label, on the other hand, could be anything, and that means it can be very hard to understand if it’s announced in the wrong voice.

You can find all the scores in the Google Sheet, under the experiment 2 Scores tab. I also did some handy-dandy data analysis under the experiment 2 Analysis tab. Once again, let me give you the highlights.

Key results of experiment 2

All ARIA-based approaches (K to P) led to language issues in more than 50% of test cases [Figure 4].

Figure 4
Figure 4: Scoring of screen reader multilanguage performance in different accessible labelling approaches (by test case, all)

Using a visually-hidden text element (approach Q) led to the fewest language issues [Figure 4]. Furthermore, if we exclude from the analysis the combinations that gave uniformly bad results (score of 0 or 1), approach Q was announced perfectly 100% of the time [Figure 5].

Figure 5
Figure 5: Scoring of screen reader multilanguage performance in different accessible labelling approaches (by test case, filtered)

As in the previous experiment:

  • No screen reader did well with all browsers.
  • No browser did well with all screen readers.
  • Both VoiceOver (desktop) and NVDA had inconsistent behavior that changed according to the reading mode.
  • VoiceOver on mobile, as a special case, announced everything in Finnish language and/or voice when the language of the program was Finnish. When I swiped to change its language to English, everything was read in English language and/or voice (this data is not included in the analysis because it doesn’t fall under the scope of the experiment design).

Experiment 3 (bonus): text provided in HTML attributes

For this experiment, I didn’t collect any new data; I simply repurposed some of the raw data from the other two experiments, to specifically look at what happens when the screen reader has to read text that is provided as an HTML attribute.

To this end, I took the two test cases involving informative images from experiment 1 (E and F), and re-scored them according to the criteria in experiment 2. I also took test cases K to P from experiment 2 (I excluded case Q because it provides the text alternatives using visually-hidden text, not an HTML attribute).

Once again, you can find all the scores in the Google Sheet, under the experiment 3 Scores tab. I also did some handy-dandy data analysis under the experiment 3 Analysis tab.

Key results of experiment 3

The cases with the highest failure rate were those in which the screen reader had to announce text provided as an attribute of the same element (cases E and F of images using alt, and cases K and L of buttons using aria-label) [Figure 6].

Figure 6
Figure 6: Average scores of screen reader multilanguage performance in different text alternative approaches (by test case, all)

The other cases (M to P, where the text was in a different element, which was then accessed by ID reference with the help of aria-labelledby) did somewhat better on VoiceOver for desktop only.

Test case M (using aria-labelledby and referring to a visually-hidden label outside the labelled element) got a lot of middle-of-the-road scores because the word “close” was often read twice (once as the label of the button, once as the content of the outside element itself), and often not both times with the right voice.

Overall, there were very strong patterns across multiple variables. Just to name a few:

  • By test case and screen reader: test cases E, F, K, and L were always read poorly by VoiceOver (scores 0 and 1).
  • By screen reader and browser: NVDA always did poorly on Chrome (scores 0 and 1), but well on Firefox (scores 2 and 3).
  • By screen reader: JAWS did poorly on all test cases and browsers.
  • Etc.

Conclusions

The state of multilanguage support at the interface between all major screen readers and browsers is truly appalling. As a web developer (and a junior at that), I lack the programming skills to determine whose fault it is. Is it the screen readers? The browsers? Both? Probably both.

Still. I wonder, how is it possible that we’re in the year 2020 and having all these issues? The internet has been around for three decades. JAWS, since 1995. VoiceOver, NVDA, and Talkback, since the 2000s. Is it that most developers behind these technologies are monolingual? Or are they working under the assumption that most users with visual impairments can’t speak multiple languages? Again, probably both.

Even if we lower our standards and figure: hey, as long as I have one browser that works well (in terms of language) with my screen reader of choice, I can get by. Unfortunately, there are a couple of problems with that:

  • Based on experiment 1, it’s not possible to find a working browser match for several of the screen readers, even when we stick to basic, simple HTML. For example, VoiceOver (both desktop and mobile) performed relatively poorly with all browsers, and Apple users simply have no proper alternatives.
  • A combination that works well today can break tomorrow. For example, according to experiment 1, Firefox performed well with NVDA but poorly with all others. But guess what? Less than two months earlier, Firefox version 72.0.2 had been doing pretty well with JAWS, but quite poorly with the rest. The archetypal, completely blind screen reader user relies on familiarity to remain oriented in the world, and things suddenly changing (or being forced to switch to a different browser because the current one doesn’t get along with their screen reader anymore) can be confusing and draining.

P.S. The earlier data is not included here because there were a few things I forgot to account for, such as browsers automatically updating while I was in the middle of the experiment. My science brain simply could not abide by poorly-collected data, which is why I repeated the data collection phase.

About text alternatives

Now, here is a prickly situation. The first, broader issue is that text alternatives are often lacking (in presence and/or quality), because most developers don’t need them or don’t know about people who need them; so they forget to add them, forget to translate them in a multilingual site, or do try to implement them, but they do it incorrectly.

  • Based on experiment 2, we could decide to never use ARIA attributes to provide text alternatives. We could use the visually-hidden text approach, which worked perfectly (except for in the software combinations that led to all-around failure).
  • However, from experiment 3 we can clearly see that the real issue is that most combinations didn’t announce the text provided by HTML attributes in the correct voice. We know it’s possible (TalkBack managed just fine!), but for whatever reason most others don’t deliver.
  • The previous point applies not only to ARIA attributes (which have only existed since 2014), but to the alt attribute as well, which has been around as long as <img> and the other elements that support it (specifically, since HTML 2.0 came out in 1995). And I would certainly never tell you to stop using alt to provide text alternatives to images.

Advice

So, if you’re a developer, or otherwise involved in making or testing websites, here’s my advice:

  • Learn the basics of at least one of the major screen readers. You don’t need to know every tidbit of functionality, just some basic keypresses and/or gestures for webpages. You can use my data and your own experience to figure out the quirks and bugs of your setup.
  • Learn the difference between informative and decorative images, as well as good text alternative practices.
  • Use the alt attribute for elements that support it (<img>, <area>, and <input type="image">).
  • For elements that don’t support the alt attribute, it’s up to you: you can either go with the visually-hidden text approach (not the gold standard, but the option with better language support at the current time), or use ARIA attributes (what most sources usually recommend, but keep in mind that it won’t work properly until screen readers and/or browsers get their collective act together when it comes to multilingual situations).
  • We can’t stop using ARIA-based labelling altogether, because it’s required in certain cases (for example, when distinguishing identical ARIA landmarks).
  • On the other hand, visually-hidden text can be used for more than just providing text alternatives. For example, if your page has several sections, and this information is conveyed visually through a change of background color, layout, etc. it probably needs a heading of the appropriate level. If you don’t want to show it to everyone, you can at least show it to screen reader users.
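For instance, the landmark case mentioned above can only be solved with ARIA, since two identical <nav> elements are otherwise indistinguishable to screen reader users (the label text here is illustrative):

```html
<!-- Two <nav> landmarks on one page need distinct accessible names -->
<nav aria-label="Main">
  <!-- site-wide navigation links -->
</nav>

<nav aria-label="Breadcrumb">
  <!-- breadcrumb trail for the current page -->
</nav>
```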

For the purpose of hiding text visually, do not use display: none or visibility: hidden. Those CSS styles hide the content from everyone, including screen reader users.

There are many different ways to hide elements visually but not from screen readers. Here is the CSS styling I currently recommend for simple cases (I’m planning on writing an article about it soon):

.visually-hidden:not(:focus):not(:focus-within):not(:active) {
  position: absolute;
  height: 1px;
  width: 1px;
  border: 0;
  padding: 0;
  margin: 0;
  overflow: hidden;
  clip: rect(1px 1px 1px 1px); /* IE6, IE7 */
  clip: rect(1px, 1px, 1px, 1px);
  white-space: nowrap;
}

P.S. It’s not identical to the code I used in these experiments, but it behaves the same in the test cases examined.
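As a usage sketch of that class (the section content is invented for illustration), a heading can be exposed to screen reader users without being rendered visibly:

```html
<section>
  <!-- Announced by screen readers, but not shown on screen -->
  <h2 class="visually-hidden">Latest articles</h2>
  <!-- article cards distinguished visually by layout and background color -->
</section>
```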

What’s next?

I want to make two public commitments:

  1. I will repeat these tests in a year and write a follow-up article.
  2. I will do my best to bring these concerns to the people in power. I don’t know how successful this endeavor will be if I go through the general support or feedback channels for the organizations in charge of these screen readers and browsers, so if you have inside contacts, please do let me know!

Thank you for reading, and for helping us make the internet a more accessible, diverse, and inclusive place!

The UX Collective donates US$1 for each article published in our platform. This story contributed to UX Para Minas Pretas (UX For Black Women), a Brazilian organization focused on promoting equity of Black women in the tech industry through initiatives of action, empowerment, and knowledge sharing. Silence against systemic racism is not an option. Build the design community you believe in.

--


He/him. Web developer and accessibility specialist. Ivy League and UWC graduate. From the Galician ethnic minority in Spain.