Designing NLP features to help students read

UX research into Natural Language Processing methods that can help students cope better with their reading tasks.

Yotam Abraham
UX Collective


Group of students learning on a wooden table
Photo by Startup Stock Photos from Pexels

NLP (natural language processing) is a growing field of research within artificial intelligence. It turns text into numeric vectors in order to find deep structural connections in language and relations between pieces of knowledge. Its main tasks help people analyze massive amounts of text quickly.

NLP As a Learning Tool

Currently, there is a lot of exciting exploration around the new state-of-the-art language models. In production, several effective NLP systems already help companies analyze their users and customers better. In this article, I would like to think through and show how NLP can be a useful tool for students and educational institutions. I will demonstrate how summarization, semantic similarity, and keyword extraction can be excellent learning tools for every student and reader.

About two years ago, I started learning and researching Natural Language Processing, trying to find systems and features that could help me learn and read better. I love to read and learn new things, and most of my formal education and self-learning is based on reading. I also have a background in UI/UX design and, recently, some successful experiments with NLP code in Python notebooks.

I present here three use cases that I’ve researched, with relevant notebooks and designed prototypes, based on stable methods and existing state-of-the-art models such as BERT, GPT-2, and other models from Hugging Face.

Woman flipping through a professional Journal
Photo by Karolina Grabowska from Pexels

Skimming Articles

Reading articles is an essential task for students and researchers. Skimming is defined as the action of reading something quickly to note only the essential points.

For students and academics, skimming can be a great way to get a quick overview of the paper and decide if it’s relevant to their research. It’s also an excellent method to get the main points and understand the article structure as pre-read before going into close reading.

With this in mind, I designed the ‘Skimming’ feature around extractive summarization. The method is relatively cheap to compute in production, and several libraries and companies offer it as a ready-made service.

UX consideration for data structure

An extractive summarizer is an NLP function that finds the main sentences in a given text. There are several open-source libraries available, and some free demos to try with a few lines of code. For reports, short blogs, and news articles, extractive summarization can yield good results (try the amazing newspaper3k library for news articles). But for long and complex texts like scientific articles and books, I think the extractive summarizer is just a helper, and it’s not yet a sufficient tool to rely on as a pure summarizer.
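To make the idea of "finding the main sentences" concrete, here is a minimal sketch of an extractive summarizer. It is not the model behind the prototype: it swaps in plain word-frequency scoring so that it runs with the Python standard library only. The names `split_sentences`, `tokenize`, `extractive_summary`, and the tiny `STOP_WORDS` list are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny stop-word list for illustration only; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
              "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with"}


def split_sentences(text):
    """Naive sentence splitter: break on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Because the output is a list of sentences copied from the source, the interface can show them as a summary or locate them again inside the original text.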

To optimize this feature for books, it might be better to render each chapter with its own extractive summary. For articles, it might be more efficient to render paragraphs (and easier for data preparation). You can also control the maximum length of the output (in sentences), so it’s pretty flexible. Dividing a document into pieces (paragraphs, chapters) and rendering each one can make the process slower for dynamic inference. But with static inference (for example, when all documents are in the database in advance and summaries are computed ahead of time), readers can get a better user experience quite easily.
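The static-inference idea can be sketched as a small precompute step: split every document into units, summarize each unit once, and store the result so the reading interface only performs a lookup. Everything here is hypothetical, including the `lead_sentences` placeholder (it simply returns the first sentences and stands in for a real summarizer), the blank-line paragraph split, and the in-memory dictionary that plays the role of the database.

```python
import re


def lead_sentences(text, max_sentences):
    """Placeholder summarizer: return the first max_sentences sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sentences[:max_sentences]


def split_units(document):
    """Split a document into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def precompute_summaries(documents, summarize=lead_sentences, max_sentences=1):
    """Static inference: summarize every unit of every document once, ahead of time."""
    store = {}
    for doc_id, text in documents.items():
        for index, unit in enumerate(split_units(text)):
            store[(doc_id, index)] = summarize(unit, max_sentences)
    return store


documents = {"article-1": "First point. More detail here.\n\nSecond point! Extra context."}
store = precompute_summaries(documents)

# At read time the interface only looks up stored results; no model call is needed.
print(store[("article-1", 1)])
```

For a book, `split_units` would split on chapter boundaries instead, and `max_sentences` would likely be larger.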

Futuristic human figure with annotated brain looking inside
Photo by meo from Pexels

Auto-highlights And Keywords Prompt

Pre-highlights have become very common in many news articles and content sites. In this example, I created a design to apply this idea to every article you read automatically.

For auto-highlighting, I used the same Extractive Summarizer method, just with a different UX approach and a somewhat more complex software implementation.

UX consideration for data structure

An extractive summarizer works by ‘ranking’ the relevant phrases and keywords with a machine learning prediction. With this logic and some dev work, you can set your own minimum ranking score, and you can also limit the output by defining how many sentences the system should highlight. I designed this feature to render each paragraph separately, and I think it’s best to limit the highlights to one or two sentences in each ‘p’. Rendering the entire document as a whole can instead concentrate too many highlights in only a few sections of the text (in the abstract and the conclusion, for example); learn more in this article.
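The two controls described here, a minimum score and a cap on highlights per paragraph, might look like the sketch below. The scoring is a simple word-frequency stand-in rather than a machine learning prediction, and `highlights`, `max_per_paragraph`, and `min_score` are names invented for this example.

```python
import re
from collections import Counter


def sentences_of(paragraph):
    """Naive sentence splitter: break on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def words_of(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def highlights(paragraphs, max_per_paragraph=1, min_score=0.0):
    """Pick up to max_per_paragraph sentences per paragraph that reach min_score."""
    # Word frequencies come from the whole document, but ranking happens per paragraph.
    freq = Counter(w for p in paragraphs for w in words_of(p))

    def score(sentence):
        words = words_of(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    result = []
    for index, paragraph in enumerate(paragraphs):
        sentences = sentences_of(paragraph)
        top = sorted(sentences, key=score, reverse=True)[:max_per_paragraph]
        # Keep the document order so highlights can be applied while rendering.
        result.extend((index, s) for s in sentences if s in top and score(s) >= min_score)
    return result
```

Ranking inside each paragraph is what spreads the highlights across the text; ranking all sentences of the document together would let one dense section take every slot.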

Another helper to use is keyword extraction. It is a standard method in many systems, and you can easily take it to production with static inference. For this case study (a relatively short scientific article), I found that extracting keywords for the whole document makes more sense than rendering and extracting each paragraph. For books, it might be better to render chapters.
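As a rough illustration of whole-document keyword extraction, the sketch below counts words after removing stop words and returns the most frequent ones. Production systems typically use stronger methods; the function name `extract_keywords`, the short `STOP_WORDS` list, and the minimum word length are assumptions for this example.

```python
import re
from collections import Counter

# Tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
              "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with"}


def extract_keywords(text, top_k=5):
    """Return the top_k most frequent non-stop-words in the text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS and len(w) > 2]
    return [word for word, _ in Counter(words).most_common(top_k)]
```

For a book, the same function could be called once per chapter instead of once per document.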

Prototyping Semantic Similarity For Reading Marks

‘Semantic Similarity’ is often used in search engines to find related content and to optimize recommendation systems. In this example, I want to show that Semantic Similarity can be powerful for a single text, or small corpora, to improve students’ reading comprehension.

Male student marking a sentence inside a book
Photo by Oladimeji Ajegbile from Pexels

When reading scientific articles, textbooks, and non-fiction, highlighting the relevant sections is a typical behavior (in print and digital). Many students I interviewed say that they can’t read without highlighting. It helps them stay focused while reading and enables them to track the vital sections for a later overview.

When Semantic Similarity is combined with ‘K-Nearest Neighbor Search,’ your marks become a dynamic tool that improves your orientation in the document and helps you connect the dots, literally.

UX consideration for data structure:

Data Limitation

The reader’s natural marking behavior is not limited to single sentences, but in most cases it stays within a paragraph. This is why I was looking for a model that can encode as many tokens as possible per unit of text.

From my research, I found that in most cases Semantic Similarity is performed with sentence representation models, which embed the entire sentence as a single vector (rather than one vector per word). The most common models are InferSent by the Facebook AI team and the Universal Sentence Encoder by Google AI. Another way to vectorize big chunks of text is with BERT, as you can learn from this great tutorial by ChrisMcCormickAI.

Paragraph-level vectorizing is intended mainly for books. For short academic articles, matching sentences might yield better results. In any case, this feature needs to be investigated more thoroughly.
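One way to handle the choice between paragraphs and sentences is to prefer paragraphs and fall back to sentences only when a paragraph exceeds what the model can encode. The sketch below assumes a hypothetical `max_tokens` limit and approximates tokens by whitespace-separated words; a real encoder has its own tokenizer and its own limit.

```python
import re


def split_into_units(text, max_tokens=128):
    """Use paragraphs as units; fall back to sentences when a paragraph is too long."""
    units = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Assumption: word count is a rough proxy for the encoder's token count.
        if len(paragraph.split()) <= max_tokens:
            units.append(paragraph)
        else:
            units.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip())
    return units
```

Each returned unit would then be embedded as one vector, so a reader's mark can be matched against whole paragraphs where possible and against sentences elsewhere.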

Presenting data

K-Nearest Neighbor Search is a method to make things searchable by ranking other units of text against the selected text according to their ‘Euclidean distance.’ A 100% match returns the same section. In some cases, a 50% match may turn out to be more relevant than a 70% one. This can differ from paragraph to paragraph, and it’s entirely subjective from reader to reader. (The raw result is a distance, not a percentage, but with a few steps you can convert it to %.)
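The ranking and the conversion to a percentage can be sketched as follows. The `embed` function is a toy stand-in (normalized word counts) for a real sentence encoder, and the distance-to-percent mapping is just one possible choice; both are assumptions for illustration, as are the names `nearest_units` and `min_percent`. The `min_percent` parameter plays the role of a reader-controlled threshold.

```python
import math
import re
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding: L2-normalized word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vector = [float(counts[w]) for w in vocabulary]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def nearest_units(marked, units, k=3, min_percent=0.0):
    """Return up to k (percent, unit) pairs closest to the marked text."""
    vocabulary = sorted(set(re.findall(r"[a-z]+", " ".join(units + [marked]).lower())))
    query = embed(marked, vocabulary)
    results = []
    for unit in units:
        distance = math.dist(query, embed(unit, vocabulary))
        # Unit-length vectors are at most 2.0 apart, so this maps distance onto 0..100.
        percent = 100.0 * (1.0 - distance / 2.0)
        if percent >= min_percent:
            results.append((round(percent, 1), unit))
    return sorted(results, reverse=True)[:k]
```

Marking a sentence that is itself one of the units returns that unit at 100%, since its distance to itself is zero. Raising `min_percent` narrows the results, and lowering it surfaces looser connections.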

This consideration led me to design the slider (in the last part of the video), to give readers more flexibility and control over the results they want, and the feeling of a ‘knowledge playground.’

Conclusion

This article is a modest attempt to bridge the gap between NLP and UX. In this study, I found that NLP can be useful for small corpora and even a single text, with an available tech stack that developers can apply today. NLP technologies have grown fast over the last three years, and thanks to the Hugging Face library, very promising language models are much more accessible for everyone to build with.

The UX Collective donates US$1 for each article published in our platform. This story contributed to UX Para Minas Pretas (UX For Black Women), a Brazilian organization focused on promoting equity of Black women in the tech industry through initiatives of action, empowerment, and knowledge sharing. Silence against systemic racism is not an option. Build the design community you believe in.
