Things we learned about LLMs in 2024

Simon Willison

Dec 31, 2024

Key themes and pivotal moments from the last 12 months in Large Language Models

6 Comments

Kelsey Trimbur

Jan 9 (edited)

This is so comprehensive. Every time I thought of a question, it was covered in a later section of your post. Thank you!

Subbu Vincent

Jan 24 (edited)

Hi Simon, a long time ago I saw your post on the NY Times story about Sam Altman's firing, discussing how anonymous sourcing was signaled in language. My area of interest and intervention at the Markkula Center for Applied Ethics (Santa Clara University) is, broadly, journalistic sourcing.

Our 2024 learning: we proposed a new benchmark for LLMs on annotating sourcing (as a route to assessing media) and compared 5 models. The preprint (published Jan 3, 2025) is linked below, and we posted the dataset and prompts on Hugging Face. The findings show how LLMs struggle with an area I call source justifications.

https://arxiv.org/abs/2501.00164

Would love your and your readers' feedback.

Nathan Lambert

Jan 2

Will get you Ai2 on the >GPT-4 Elo list, stat.

Samuel Roland

Jan 2

Just a tremendously informative wrap-up, appreciate you taking the time. Do you know of any good resources, other than personal testing, for creating evals? I've had success doing it myself, but the number of tasks means I don't really have enough examples to generalize good lessons.
