5 Comments
Bill Prin

"Vibe coding" terminology is in a weird spot because it if means what you suggest - using AI to write all the code without caring about its quality - it leaves a huge vacuum for this other use case, where you let the AI write most of the code but you DO spend a lot of time and energy trying to make it production worthy. Improving testing, improving prompts, etc.

One replacement word would be "AI-assisted coding" but, again, someone using Copilot a bit is different than someone trying to build a startup with 90% AI-generated code.

Yegge himself tried to coin CHOP (chat-oriented programming), so if he's writing a book calling it "vibe coding", the nomenclature war has been lost already.

Also, thanks for sharing the workflow for sharing videos with an LLM. The use case I'm interested in is breaking down YouTube videos to study writing and visual FX beats; I'll reference your work.

Ken Kahn

Claude said: "Your understanding of Top-P sampling is partially correct, but I should clarify an important distinction.

When you set Top-P to 0.5, you're not exactly "filtering out tokens in the lower half of the probability distribution." Instead, you're selecting the smallest set of highest-probability tokens whose cumulative probability exceeds 0.5 (or 50%).

The key difference is that Top-P doesn't simply cut off based on where the 50% mark falls in the distribution. It works by:

1. Sorting tokens by probability (highest to lowest)

2. Adding tokens to the selection set one by one until their cumulative probability exceeds the threshold P

3. Then sampling from just those tokens

For example, if your top three tokens had probabilities of 0.3, 0.25, and 0.2, a Top-P of 0.5 would select just the first two tokens (0.3 + 0.25 = 0.55, which exceeds 0.5), even though that's not half of all possible tokens.

The number of tokens included can vary greatly depending on how the probability is distributed. If one token has a probability of 0.6, then with P=0.5, only that single token would be considered.

This adaptive behavior is what makes Top-P (nucleus sampling) particularly useful compared to the fixed cutoff of Top-K."
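For anyone who wants to see that selection logic concretely, here's a minimal Python sketch of the steps Claude describes. The probability distribution is a made-up toy example, not real model output:

    import random

    def top_p_sample(token_probs, p=0.5):
        # Nucleus (Top-P) sampling: keep the smallest set of highest-probability
        # tokens whose cumulative probability exceeds p, then sample from that set.
        ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
        nucleus, cumulative = [], 0.0
        for token, prob in ranked:
            nucleus.append((token, prob))
            cumulative += prob
            if cumulative > p:  # stop as soon as the threshold is exceeded
                break
        tokens, weights = zip(*nucleus)
        return random.choices(tokens, weights=weights, k=1)[0]

    # The example from above: 0.3 + 0.25 = 0.55 > 0.5, so only the first
    # two tokens make the nucleus; the 0.2 token and below are excluded.
    print(top_p_sample({"the": 0.3, "a": 0.25, "an": 0.2, "one": 0.15, "some": 0.1}))

Note that if the top-ranked token alone had probability 0.6, the loop would break after one iteration, reproducing the single-token case described above.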

Andreas

Simon, I'm reading your newsletter and most of it is over my head. However, I'm trying to use your llm command-line tool, and I understand OpenAI doesn't give out free tokens anymore. Which model would I have to buy tokens for to get your tool to work, or can tokens be used with any model? Sorry for being a total noob here.

Simon Willison

Another free option is OpenRouter - they have a collection of free (albeit rate-limited) models: https://simonwillison.net/2025/Mar/10/llm-openrouter-04/
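If it's useful, this is roughly what the setup looks like with the llm-openrouter plugin (the model ID on the last line is a placeholder, not a specific recommendation):

    llm install llm-openrouter
    llm keys set openrouter    # paste your OpenRouter API key when prompted
    llm models                 # free OpenRouter models have IDs ending in :free
    llm -m openrouter/PROVIDER/MODEL:free "Say hello"    # placeholder model ID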

Andrew Sanchez

You can use ollama (https://ollama.com/) and the llm-ollama plugin (https://github.com/taketwo/llm-ollama) for free.
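A minimal sketch of that setup, assuming ollama is already installed and running; the model name is just an example, any model from the Ollama library works:

    llm install llm-ollama
    ollama pull llama3.2    # downloads the model to run locally
    llm -m llama3.2 "Say hello"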
