UX Collective

We believe designers are thinkers as much as they are makers. https://linktr.ee/uxc

How to read LLM benchmarks

Gautham Srinivas
Published in UX Collective
5 min read · Nov 24, 2024

A table showing benchmark results for different LLMs
Source: Anthropic’s Claude 3.5 Sonnet blog post

Why Benchmarks

How Benchmarks Work

CoT & Few-shot

Zoomed-in version of the previous table, showing CoT and few-shot settings
Zooming into the same Claude table
ChatGPT showing the difference between answers with and without CoT

The Problem with These Benchmarks

How to Choose for Yourself
