Discussion about this post

Bill Prin:

"Vibe coding" terminology is in a weird spot because it if means what you suggest - using AI to write all the code without caring about its quality - it leaves a huge vacuum for this other use case, where you let the AI write most of the code but you DO spend a lot of time and energy trying to make it production worthy. Improving testing, improving prompts, etc.

One replacement term would be "AI-assisted coding" but, again, someone using Copilot a bit is different from someone trying to build a startup with 90% AI-generated code.

Yegge himself tried to coin CHOP (chat-oriented programming), so if he's writing a book calling it "vibe coding", the nomenclature war has already been lost.

Also, thanks for sharing the workflow for feeding videos to an LLM. The use case I'm interested in is breaking down YouTube videos to study their writing and visual-FX beats; I'll reference your work.

Ken Kahn:

Claude said "Your understanding of Top-P sampling is partially correct, but I should clarify an important distinction.

When you set Top-P to 0.5, you're not exactly "filtering out tokens in the lower half of the probability distribution." Instead, you're selecting the smallest set of highest-probability tokens whose cumulative probability exceeds 0.5 (or 50%).

The key difference is that Top-P doesn't simply cut off based on where the 50% mark falls in the distribution. It works by:

1. Sorting tokens by probability (highest to lowest)

2. Adding tokens to the selection set one by one until their cumulative probability exceeds the threshold P

3. Then sampling from just those tokens

For example, if your top three tokens had probabilities of 0.3, 0.25, and 0.2, a Top-P of 0.5 would select just the first two tokens (0.3 + 0.25 = 0.55, which exceeds 0.5), even though that's not half of all possible tokens.

The number of tokens included can vary greatly depending on how the probability is distributed. If one token has a probability of 0.6, then with P=0.5, only that single token would be considered.

This adaptive behavior is what makes Top-P (nucleus sampling) particularly useful compared to the fixed cutoff of Top-K."
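A minimal Python sketch of the three-step procedure Claude describes, assuming the next-token probabilities arrive as a NumPy vector over the vocabulary (the function name top_p_sample is illustrative, not from any particular library):

import numpy as np

def top_p_sample(probs, p=0.5, rng=None):
    # Nucleus (Top-P) sampling: keep the smallest set of highest-probability
    # tokens whose cumulative probability exceeds p, then sample from that set.
    if rng is None:
        rng = np.random.default_rng()
    # 1. Sort token indices by probability, highest to lowest
    order = np.argsort(probs)[::-1]
    # 2. Accumulate until the running total strictly exceeds the threshold p
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p, side="right")) + 1
    nucleus = order[:cutoff]
    # 3. Renormalize the surviving tokens and sample from just those
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

# The example from the comment: with p=0.5, only the first two tokens
# survive, since 0.3 + 0.25 = 0.55 already exceeds 0.5.
probs = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
print(top_p_sample(probs))  # prints token index 0 or 1

Note how the nucleus size adapts to the shape of the distribution: a peaked distribution (one token at 0.6) yields a one-token nucleus, while a flat one keeps many tokens, which is the advantage over Top-K's fixed cutoff.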
